
[arXiv Papers] Computation and Language 2020-09-15

Contents

1. Hearings and mishearings: decrypting the spoken word [PDF] Abstract
2. At your Command! An Empirical Study on How Laypersons Teach Robots New Functions [PDF] Abstract
3. Filling the Gap of Utterance-aware and Speaker-aware Representation for Multi-turn Dialogue [PDF] Abstract
4. EasyASR: A Distributed Machine Learning Platform for End-to-end Automatic Speech Recognition [PDF] Abstract
5. Development of a Dataset and a Deep Learning Baseline Named Entity Recognizer for Three Low Resource Languages: Bhojpuri, Maithili and Magahi [PDF] Abstract
6. Time-Aware Evidence Ranking for Fact-Checking [PDF] Abstract
7. Multi-Hop Fact Checking of Political Claims [PDF] Abstract
8. Analysis and representation of Igbo text document for a text-based system [PDF] Abstract
9. EdinburghNLP at WNUT-2020 Task 2: Leveraging Transformers with Generalized Augmentation for Identifying Informativeness in COVID-19 Tweets [PDF] Abstract
10. Not-NUTs at W-NUT 2020 Task 2: A BERT-based System in Identifying Informative COVID-19 English Tweets [PDF] Abstract
11. Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples [PDF] Abstract
12. GeDi: Generative Discriminator Guided Sequence Generation [PDF] Abstract
13. Improving Language Generation with Sentence Coherence Objective [PDF] Abstract
14. QED: A Framework and Dataset for Explanations in Question Answering [PDF] Abstract
15. Learning an Effective Context-Response Matching Model with Self-Supervised Tasks for Retrieval-based Dialogues [PDF] Abstract
16. A Comparison of Two Fluctuation Analyses for Natural Language Clustering Phenomena: Taylor and Ebeling & Neiman Methods [PDF] Abstract
17. Contrastive Triple Extraction with Generative Transformer [PDF] Abstract
18. Can Fine-tuning Pre-trained Models Lead to Perfect NLP? A Study of the Generalizability of Relation Extraction [PDF] Abstract
19. Composing Answer from Multi-spans for Reading Comprehension [PDF] Abstract
20. Identity-Based Patterns in Deep Convolutional Networks: Generative Adversarial Phonology and Reduplication [PDF] Abstract
21. Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding [PDF] Abstract
22. Span-based Semantic Parsing for Compositional Generalization [PDF] Abstract
23. BoostingBERT: Integrating Multi-Class Boosting into BERT for NLP Tasks [PDF] Abstract
24. Pow-Wow: A Dataset and Study on Collaborative Communication in Pommerman [PDF] Abstract
25. Combining Word and Character Vector Representation on Neural Machine Translation [PDF] Abstract
26. Fine-tuning Pre-trained Contextual Embeddings for Citation Content Analysis in Scholarly Publication [PDF] Abstract
27. Improving Machine Reading Comprehension with Contextualized Commonsense Knowledge [PDF] Abstract
28. CIA_NITT at WNUT-2020 Task 2: Classification of COVID-19 Tweets Using Pre-trained Language Models [PDF] Abstract
29. Intent Detection with WikiHow [PDF] Abstract
30. Syntax Role for Neural Semantic Role Labeling [PDF] Abstract
31. Improving Bi-LSTM Performance for Indonesian Sentiment Analysis Using Paragraph Vector [PDF] Abstract
32. Improving Indonesian Text Classification Using Multilingual Language Model [PDF] Abstract
33. Relation Detection for Indonesian Language using Deep Neural Network -- Support Vector Machine [PDF] Abstract
34. Investigating Bi-LSTM and CRF with POS Tag Embedding for Indonesian Named Entity Tagger [PDF] Abstract
35. Coreference Resolution System for Indonesian Text with Mention Pair Method and Singleton Exclusion using Convolutional Neural Network [PDF] Abstract
36. Solving Math Word Problems by Scoring Equations with Recursive Neural Networks [PDF] Abstract
37. UPB at SemEval-2020 Task 6: Pretrained Language Models for Definition Extraction [PDF] Abstract
38. Cosine meets Softmax: A tough-to-beat baseline for visual grounding [PDF] Abstract
39. Differentially Private Language Models Benefit from Public Pre-training [PDF] Abstract
40. Exploring the Hierarchy in Relation Labels for Scene Graph Generation [PDF] Abstract
41. Country Image in COVID-19 Pandemic: A Case Study of China [PDF] Abstract
42. An Atlas of Cultural Commonsense for Machine Reasoning [PDF] Abstract
43. Unit Test Case Generation with Transformers [PDF] Abstract

Abstracts

1. Hearings and mishearings: decrypting the spoken word [PDF] Back to Contents
  Anita Mehta, Jean-Marc Luck
Abstract: We propose a model of the speech perception of individual words in the presence of mishearings. This phenomenological approach is based on concepts used in linguistics, and provides a formalism that is universal across languages. We put forward an efficient two-parameter form for the word length distribution, and introduce a simple representation of mishearings, which we use in our subsequent modelling of word recognition. In a context-free scenario, word recognition often occurs via anticipation when, part-way into a word, we can correctly guess its full form. We give a quantitative estimate of this anticipation threshold when no mishearings occur, in terms of model parameters. As might be expected, the whole anticipation effect disappears when there are sufficiently many mishearings. Our global approach to the problem of speech perception is in the spirit of an optimisation problem. We show for instance that speech perception is easy when the word length is less than a threshold, to be identified with a static transition, and hard otherwise. We extend this to the dynamics of word recognition, proposing an intuitive approach highlighting the distinction between individual, isolated mishearings and clusters of contiguous mishearings. At least in some parameter range, a dynamical transition is manifest well before the static transition is reached, as is the case for many other examples of complex systems.

2. At your Command! An Empirical Study on How Laypersons Teach Robots New Functions [PDF] Back to Contents
  Sebastian Weigelt, Vanessa Steurer, Walter F. Tichy
Abstract: Even though intelligent systems such as Siri or Google Assistant are enjoyable (and useful) dialog partners, users can only access predefined functionality. Enabling end-users to extend the functionality of intelligent systems will be the next big thing. To promote research in this area we carried out an empirical study on how laypersons teach robots new functions by means of natural language instructions. The result is a labeled corpus consisting of 3168 submissions given by 870 subjects. The analysis of the dataset revealed that many participants used certain wordings to express their wish to teach new functionality; two corresponding trigrams are among the most frequent. On the contrary, more than one third (36.93%) did not verbalize the teaching intent at all. We labeled the semantic constituents in the utterances: declaration (including the name of the function) and intermediate steps. The full corpus is publicly available: this http URL

3. Filling the Gap of Utterance-aware and Speaker-aware Representation for Multi-turn Dialogue [PDF] Back to Contents
  Longxiang Liu, Zhuosheng Zhang, Hai Zhao, Xi Zhou, Xiang Zhou
Abstract: A multi-turn dialogue is composed of multiple utterances from two or more different speaker roles. Thus utterance- and speaker-aware clues are supposed to be well captured in models. However, in existing retrieval-based multi-turn dialogue modeling, the pre-trained language models (PrLMs) used as the encoder represent the dialogues coarsely by taking the pairwise dialogue history and candidate response as a whole; the hierarchical information on either utterance interrelation or speaker roles coupled in such representations is not well addressed. In this work, we propose a novel model to fill such a gap by modeling the effective utterance-aware and speaker-aware representations entailed in a dialogue history. In detail, we decouple the contextualized word representations by masking mechanisms in the Transformer-based PrLM, making each word focus only on the words in the current utterance, other utterances, and the two speaker roles (i.e., utterances of the sender and utterances of the receiver), respectively. Experimental results show that our method boosts the strong ELECTRA baseline substantially on four public benchmark datasets, and achieves new state-of-the-art performance over previous methods. A series of ablation studies are conducted to demonstrate the effectiveness of our method.
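
As a rough illustration of the decoupling idea (not code from the paper), the masks that restrict each token's attention to its own utterance, to other utterances, or to one speaker role could be built as follows; the function name, array layout, and speaker-id convention are assumptions made for this sketch.

    import numpy as np

    def attention_masks(utt_ids, spk_ids):
        """Boolean masks over token pairs so a token can be restricted to attend
        to (a) its own utterance, (b) other utterances, or (c) all utterances of
        one speaker role. utt_ids / spk_ids hold one id per token."""
        utt = np.asarray(utt_ids)
        spk = np.asarray(spk_ids)
        n = len(utt)
        same_utt = utt[:, None] == utt[None, :]                # current utterance
        other_utt = ~same_utt                                   # other utterances
        sender = np.broadcast_to(spk[None, :] == 0, (n, n))     # sender's utterances
        receiver = np.broadcast_to(spk[None, :] == 1, (n, n))   # receiver's utterances
        return same_utt, other_utt, sender, receiver

    # toy dialogue: utterance 0 spoken by speaker 0, utterance 1 by speaker 1
    same_utt, other_utt, sender, receiver = attention_masks([0, 0, 0, 1, 1],
                                                            [0, 0, 0, 1, 1])
    print(same_utt.astype(int))   # each token may attend only within its utterance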

4. EasyASR: A Distributed Machine Learning Platform for End-to-end Automatic Speech Recognition [PDF] Back to Contents
  Chengyu Wang, Mengli Cheng, Xu Hu, Jun Huang
Abstract: We present EasyASR, a distributed machine learning platform for training and serving large-scale Automatic Speech Recognition (ASR) models, as well as collecting and processing audio data at scale. Our platform is built upon the Machine Learning Platform for AI of Alibaba Cloud. Its main functionality is to support efficient learning and inference for end-to-end ASR models on distributed GPU clusters. It allows users to learn ASR models with either pre-defined or user-customized network architectures via simple user interface. On EasyASR, we have produced state-of-the-art results over several public datasets for Mandarin speech recognition.

5. Development of a Dataset and a Deep Learning Baseline Named Entity Recognizer for Three Low Resource Languages: Bhojpuri, Maithili and Magahi [PDF] Back to Contents
  Rajesh Kumar Mundotiya, Shantanu Kumar, Ajeet kumar, Umesh Chandra Chaudhary, Supriya Chauhan, Swasti Mishra, Praveen Gatla, Anil Kumar Singh
Abstract: In Natural Language Processing (NLP) pipelines, Named Entity Recognition (NER) is one of the preliminary problems, which marks proper nouns and other named entities such as Location, Person, Organization, Disease etc. Such entities, without a NER module, adversely affect the performance of a machine translation system. NER helps in overcoming this problem by recognising and handling such entities separately, although it can be useful in Information Extraction systems also. Bhojpuri, Maithili and Magahi are low resource languages, usually known as Purvanchal languages. This paper focuses on the development of a NER benchmark dataset for the Machine Translation systems developed to translate from these languages to Hindi by annotating parts of their available corpora. Bhojpuri, Maithili and Magahi corpora of sizes 228373, 157468 and 56190 tokens, respectively, were annotated using 22 entity labels. The annotation considers coarse-grained annotation labels followed by the tagset used in one of the Hindi NER datasets. We also report a Deep Learning based baseline that uses an LSTM-CNNs-CRF model. The lower baseline F1-scores from the NER tool obtained by using Conditional Random Fields models are 96.73 for Bhojpuri, 93.33 for Maithili and 95.04 for Magahi. The Deep Learning-based technique (LSTM-CNNs-CRF) achieved 96.25 for Bhojpuri, 93.33 for Maithili and 95.44 for Magahi.

6. Time-Aware Evidence Ranking for Fact-Checking [PDF] Back to Contents
  Liesbeth Allein, Isabelle Augenstein, Marie-Francine Moens
Abstract: Truth can vary over time. Therefore, fact-checking decisions on claim veracity should take into account temporal information of both the claim and supporting or refuting evidence. Automatic fact-checking models typically take claims and evidence pages as input, and previous work has shown that weighing or ranking these evidence pages by their relevance to the claim is useful. However, the temporal information of the evidence pages is not generally considered when defining evidence relevance. In this work, we investigate the hypothesis that the timestamp of an evidence page is crucial to how it should be ranked for a given claim. We delineate four temporal ranking methods that constrain evidence ranking differently: evidence-based recency, claim-based recency, claim-centered closeness and evidence-centered clustering ranking. Subsequently, we simulate hypothesis-specific evidence rankings given the evidence timestamps as gold standard. Evidence ranking is then optimized using a learning to rank loss function. The best performing time-aware fact-checking model outperforms its baseline by up to 33.34%, depending on the domain. Overall, evidence-based recency and evidence-centered clustering ranking lead to the best results. Our study reveals that time-aware evidence ranking not only surpasses relevance assumptions based purely on semantic similarity or position in a search results list, but also improves veracity predictions of time-sensitive claims in particular.
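
The abstract does not spell out the learning-to-rank objective; as a hedged illustration only, a pairwise hinge loss of the kind commonly used for ranking could be written like this, with the hypothesis-specific gold ranking acting as supervision (the function and variable names are hypothetical).

    def pairwise_hinge_loss(scores, gold_ranking, margin=1.0):
        """scores: model scores for candidate evidence pages (list of floats).
        gold_ranking: page indices sorted from most to least relevant under the
        hypothesis-specific gold ranking. Penalises every pair of pages whose
        scores contradict that gold order."""
        loss, pairs = 0.0, 0
        for i in range(len(gold_ranking)):
            for j in range(i + 1, len(gold_ranking)):
                better, worse = gold_ranking[i], gold_ranking[j]
                loss += max(0.0, margin - (scores[better] - scores[worse]))
                pairs += 1
        return loss / max(pairs, 1)

    scores = [0.2, 1.3, 0.7]   # model scores for three evidence pages
    gold = [1, 2, 0]           # e.g., pages ordered by evidence-based recency
    print(pairwise_hinge_loss(scores, gold))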

7. Multi-Hop Fact Checking of Political Claims [PDF] Back to Contents
  Wojciech Ostrowski, Arnav Arora, Pepa Atanasova, Isabelle Augenstein
Abstract: Recently, novel multi-hop models and datasets have been introduced to achieve more complex natural language reasoning with neural networks. One notable task that requires multi-hop reasoning is fact checking, where a chain of connected evidence pieces leads to the final verdict of a claim. However, existing datasets do not provide annotations for the gold evidence pieces, which is a critical aspect for improving the explainability of fact checking systems. The only exception is the FEVER dataset, which is artificially constructed based on Wikipedia and does not use naturally occurring political claims and evidence pages, which is more challenging. Most claims in FEVER only have one evidence sentence associated with them and require no reasoning to make label predictions -- the small number of instances with two evidence sentences only require simple reasoning. In this paper, we study how to perform more complex claim verification on naturally occurring claims with multiple hops over evidence chunks. We first construct a small annotated dataset, PolitiHop, of reasoning chains for claim verification. We then compare the dataset to other existing multi-hop datasets and study how to transfer knowledge from more extensive in- and out-of-domain resources to PolitiHop. We find that the task is complex, and achieve the best performance using an architecture that specifically models reasoning over evidence chains in combination with in-domain transfer learning.

8. Analysis and representation of Igbo text document for a text-based system [PDF] Back to Contents
  Ifeanyi-Reuben Nkechi J., Ugwu Chidiebere, Adegbola Tunde
Abstract: The advancement of Information Technology (IT) has assisted in incorporating the three major Nigerian languages into text-based applications such as text mining, information retrieval and natural language processing. The interest of this paper is the Igbo language, which uses compounding as a common type of word formation and also has a large vocabulary of compound words. The issues of collocation, word ordering and compounding play a major role in the Igbo language. The ambiguity in dealing with these compound words has made the representation of Igbo text documents very difficult, because it cannot be addressed using the most common and standard approach, the Bag-Of-Words (BOW) model of text representation, which ignores word order and relation. This is a cause for concern and motivates the need to develop an improved model to capture this situation. This paper presents an analysis of Igbo text documents, considering their compounding nature, and describes their representation with a word-based N-gram model to properly prepare them for any text-based application. The results show that Bigram and Trigram text representation models provide more semantic information and also address the issues of compounding, word ordering and collocation, which are the major language peculiarities of Igbo. They are likely to give better performance when used in any Igbo text-based system.

9. EdinburghNLP at WNUT-2020 Task 2: Leveraging Transformers with Generalized Augmentation for Identifying Informativeness in COVID-19 Tweets [PDF] Back to Contents
  Nickil Maveli
Abstract: Twitter has become an important communication channel in times of emergency. The ubiquitousness of smartphones enables people to announce an emergency they're observing in real-time. Because of this, more agencies (disaster relief organizations and news agencies) are interested in programmatically monitoring Twitter, and therefore recognizing the informativeness of a tweet can help filter noise from large volumes of data. In this paper, we present our submission for WNUT-2020 Task 2: Identification of informative COVID-19 English Tweets. Our most successful model is an ensemble of transformers including RoBERTa, XLNet, and BERTweet trained in a semi-supervised experimental setting. The proposed system achieves an F1 score of 0.9011 on the test set (ranking 7th on the leaderboard), and shows significant gains in performance compared to a baseline system using fastText embeddings.

10. Not-NUTs at W-NUT 2020 Task 2: A BERT-based System in Identifying Informative COVID-19 English Tweets [PDF] Back to Contents
  Thai Quoc Hoang, Phuong Thu Vu
Abstract: As of 2020 when the COVID-19 pandemic is full-blown on a global scale, people's need to have access to legitimate information regarding COVID-19 is more urgent than ever, especially via online media where the abundance of irrelevant information overshadows the more informative ones. In response to such, we proposed a model that, given an English tweet, automatically identifies whether that tweet bears informative content regarding COVID-19 or not. By ensembling different BERTweet model configurations, we have achieved competitive results that are only shy of those by top performing teams by roughly 1% in terms of F1 score on the informative class. In the post-competition period, we have also experimented with various other approaches that potentially boost generalization to a new dataset.

11. Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples [PDF] Back to Contents
  Jin Yong Yoo, John X. Morris, Eli Lifland, Yanjun Qi
Abstract: We study the behavior of several black-box search algorithms used for generating adversarial examples for natural language processing (NLP) tasks. We perform a fine-grained analysis of three elements relevant to search: search algorithm, search space, and search budget. When new search methods are proposed in past work, the attack search space is often modified alongside the search method. Without ablation studies benchmarking the search algorithm change with the search space held constant, an increase in attack success rate could come from an improved search method or from a less restrictive search space. Additionally, many previous studies fail to properly consider the search algorithms' run-time cost, which is essential for downstream tasks like adversarial training. Our experiments provide a reproducible benchmark of search algorithms across a variety of search spaces and query budgets to guide future research in adversarial NLP. Based on our experiments, we recommend greedy attacks with word importance ranking when under a time constraint or attacking long inputs, and either beam search or particle swarm optimization otherwise. Code implementation shared via this https URL
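
To make the recommended strategy concrete, here is a minimal, assumed sketch of a greedy attack with word-importance ranking; the victim scorer and substitution lists are toy stand-ins, not part of the benchmark.

    def greedy_attack(tokens, model_score, substitutes):
        """Greedy adversarial search: rank positions by how much deleting the word
        drops the victim model's score, then greedily try substitutions in that order.
        model_score: callable returning the model's confidence in the original label.
        substitutes: dict mapping a word to candidate replacement words."""
        base = model_score(tokens)
        # word importance = score drop when the word is removed
        importance = [(base - model_score(tokens[:i] + tokens[i + 1:]), i)
                      for i in range(len(tokens))]
        adv = list(tokens)
        for _, i in sorted(importance, reverse=True):
            best_word, best_score = adv[i], model_score(adv)
            for cand in substitutes.get(tokens[i], []):
                trial = adv[:i] + [cand] + adv[i + 1:]
                s = model_score(trial)
                if s < best_score:
                    best_word, best_score = cand, s
            adv[i] = best_word
        return adv

    # toy usage with a dummy scorer that dislikes the word "bad"
    score = lambda toks: 0.9 if "bad" in toks else 0.4
    print(greedy_attack(["the", "movie", "was", "bad"], score,
                        {"bad": ["poor", "mediocre"]}))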

12. GeDi: Generative Discriminator Guided Sequence Generation [PDF] Back to Contents
  Ben Krause, Akhilesh Deepak Gotmare, Bryan McCann, Nitish Shirish Keskar, Shafiq Joty, Richard Socher, Nazneen Fatema Rajani
Abstract: Class-conditional language models (CC-LMs) can be used to generate natural language with specific attributes, such as style or sentiment, by conditioning on an attribute label, or control code. However, we find that these models struggle to control generation when applied to out-of-domain prompts or unseen control codes. To overcome these limitations, we propose generative discriminator (GeDi) guided contrastive generation, which uses CC-LMs as generative discriminators (GeDis) to efficiently guide generation from a (potentially much larger) LM towards a desired attribute. In our human evaluation experiments, we show that GeDis trained for sentiment control on movie reviews are able to control the tone of book text. We also demonstrate that GeDis are able to detoxify generation and control topic while maintaining the same level of linguistic acceptability as direct generation from GPT-2 (1.5B parameters). Lastly, we show that a GeDi trained on only 4 topics can generalize to new control codes from word embeddings, allowing it to guide generation towards wide array of topics.
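
As a much-simplified sketch of the guiding idea (not the exact GeDi computation), the class-conditional distributions can be turned into per-token class posteriors via Bayes rule and used to reweight the base LM's next-token distribution; the weighting exponent and smoothing constant here are assumptions.

    import numpy as np

    def guided_step(p_base, p_desired, p_undesired, omega=10.0):
        """One decoding step of discriminator-guided generation (simplified).
        p_base: base LM next-token distribution.
        p_desired / p_undesired: next-token distributions from the CC-LM under the
        desired and undesired control codes. Bayes rule turns them into per-token
        class posteriors, which then reweight the base distribution."""
        posterior = p_desired / (p_desired + p_undesired + 1e-12)  # P(desired | token)
        weighted = p_base * posterior ** omega                      # bias towards the class
        return weighted / weighted.sum()

    vocab = ["great", "awful", "okay"]
    p_base = np.array([0.3, 0.4, 0.3])
    p_pos = np.array([0.7, 0.1, 0.2])   # CC-LM conditioned on "positive"
    p_neg = np.array([0.1, 0.7, 0.2])   # CC-LM conditioned on "negative"
    print(dict(zip(vocab, np.round(guided_step(p_base, p_pos, p_neg), 3))))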

13. Improving Language Generation with Sentence Coherence Objective [PDF] Back to Contents
  Ruixiao Sun, Jie Yang, Mehrdad Yousefzadeh
Abstract: Conditional story generation and contextual text continuation have become increasingly popular topics in the NLP community. Existing models are often prone to output paragraphs of text that gradually diverge from the given prompt. Although the generated text may have a reasonable perplexity and diversity, it could easily be identified by humans as gibberish. The goal of our project is to improve the coherence and consistency across sentences in a language-generation model. We aim to solve this issue by first training a sentence pair coherence classifier with a pretrained GPT-2 model, and then co-training the GPT-2 language model with this new coherence objective using a method analogous to the REINFORCE algorithm. This fine-tuned language model is able to generate lengthy paragraphs conditioned on a given topic without diverging too much. The simplicity of this model allows it to be applicable to a variety of underlying language model architectures since it only modifies the final layer of the pre-trained model.

14. QED: A Framework and Dataset for Explanations in Question Answering [PDF] Back to Contents
  Matthew Lamm, Jennimaria Palomaki, Chris Alberti, Daniel Andor, Eunsol Choi, Livio Baldini Soares, Michael Collins
Abstract: A question answering system that in addition to providing an answer provides an explanation of the reasoning that leads to that answer has potential advantages in terms of debuggability, extensibility and trust. To this end, we propose QED, a linguistically informed, extensible framework for explanations in question answering. A QED explanation specifies the relationship between a question and answer according to formal semantic notions such as referential equality, sentencehood, and entailment. We describe and publicly release an expert-annotated dataset of QED explanations built upon a subset of the Google Natural Questions dataset, and report baseline models on two tasks -- post-hoc explanation generation given an answer, and joint question answering and explanation generation. In the joint setting, a promising result suggests that training on a relatively small amount of QED data can improve question answering. In addition to describing the formal, language-theoretic motivations for the QED approach, we describe a large user study showing that the presence of QED explanations significantly improves the ability of untrained raters to spot errors made by a strong neural QA baseline.

15. Learning an Effective Context-Response Matching Model with Self-Supervised Tasks for Retrieval-based Dialogues [PDF] Back to Contents
  Ruijian Xu, Chongyang Tao, Daxin Jiang, Xueliang Zhao, Dongyan Zhao, Rui Yan
Abstract: Building an intelligent dialogue system with the ability to select a proper response according to a multi-turn context is a great challenging task. Existing studies focus on building a context-response matching model with various neural architectures or PLMs and typically learning with a single response prediction task. These approaches overlook many potential training signals contained in dialogue data, which might be beneficial for context understanding and produce better features for response prediction. Besides, the response retrieved from existing dialogue systems supervised by the conventional way still faces some critical challenges, including incoherence and inconsistency. To address these issues, in this paper, we propose learning a context-response matching model with auxiliary self-supervised tasks designed for the dialogue data based on pre-trained language models. Specifically, we introduce four self-supervised tasks including next session prediction, utterance restoration, incoherence detection and consistency discrimination, and jointly train the PLM-based response selection model with these auxiliary tasks in a multi-task manner. By this means, the auxiliary tasks can guide the learning of the matching model to achieve a better local optimum and select a more proper response. Experiment results on two benchmarks indicate that the proposed auxiliary self-supervised tasks bring significant improvement for multi-turn response selection in retrieval-based dialogues, and our model achieves new state-of-the-art results on both datasets.
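
A minimal sketch of the joint multi-task objective described above, assuming the per-task losses have already been computed; the task names mirror the four auxiliary tasks, but the weights and values are illustrative.

    def multi_task_loss(losses, weights=None):
        """Combine the main response-selection loss with auxiliary self-supervised
        losses (next-session prediction, utterance restoration, incoherence
        detection, consistency discrimination) into one joint objective."""
        weights = weights or {name: 1.0 for name in losses}
        return sum(weights[name] * value for name, value in losses.items())

    losses = {
        "response_selection": 0.52,          # main task
        "next_session_prediction": 0.31,     # auxiliary self-supervised tasks
        "utterance_restoration": 0.44,
        "incoherence_detection": 0.27,
        "consistency_discrimination": 0.35,
    }
    print(multi_task_loss(losses))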

16. A Comparison of Two Fluctuation Analyses for Natural Language Clustering Phenomena: Taylor and Ebeling & Neiman Methods [PDF] Back to Contents
  Kumiko Tanaka-Ishii, Shuntaro Takahashi
Abstract: This article considers the fluctuation analysis methods of Taylor and Ebeling & Neiman. While both have been applied to various phenomena in the statistical mechanics domain, their similarities and differences have not been clarified. After considering their analytical aspects, this article presents a large-scale application of these methods to text. It is found that both methods can distinguish real text from independently and identically distributed (i.i.d.) sequences. Furthermore, it is found that the Taylor exponents acquired from words can roughly distinguish text categories; this is also the case for Ebeling and Neiman exponents, but to a lesser extent. Additionally, both methods show some possibility of capturing script kinds.

17. Contrastive Triple Extraction with Generative Transformer [PDF] Back to Contents
  Hongbin Ye, Ningyu Zhang, Shumin Deng, Mosha Chen, Chuanqi Tan, Fei Huang, Huajun Chen
Abstract: Triple extraction is an essential task in information extraction for natural language processing and knowledge graph construction. In this paper, we revisit the end-to-end triple extraction task for sequence generation. Since generative triple extraction may struggle to capture long-term dependencies and generate unfaithful triples, we introduce a novel model, contrastive triple extraction with a generative transformer. Specifically, we introduce a single shared transformer module for encoder-decoder-based generation. To generate faithful results, we propose a novel triplet contrastive training objective. Moreover, we introduce two mechanisms to further improve model performance (i.e., batch-wise dynamic attention-masking and triple-wise calibration). Experimental results on three datasets (i.e., NYT, WebNLG, and MIE) show that our approach achieves better performance than that of baselines. Our code and datasets will be released after publication.

18. Can Fine-tuning Pre-trained Models Lead to Perfect NLP? A Study of the Generalizability of Relation Extraction [PDF] Back to Contents
  Ningyu Zhang, Luoqiu Li, Shumin Deng, Haiyang Yu, Xu Cheng, Wei Zhang, Huajun Chen
Abstract: Fine-tuning pre-trained models has achieved impressive performance on standard natural language processing benchmarks. However, the resultant model generalizability remains poorly understood. We do not know, for example, how excellent performance can lead to the perfection of generalization models. In this study, we analyze a fine-tuned BERT model from different perspectives using relation extraction. We also characterize the differences in generalization techniques according to our proposed improvements. From empirical experimentation, we find that BERT suffers a bottleneck in terms of robustness by way of randomizations, adversarial and counterfactual tests, and biases (i.e., selection and semantic). These findings highlight opportunities for future improvements. Our open-sourced testbed DiagnoseRE with code, model, and datasets will be released after publication.

19. Composing Answer from Multi-spans for Reading Comprehension [PDF] Back to Contents
  Zhuosheng Zhang, Yiqing Zhang, Hai Zhao, Xi Zhou, Xiang Zhou
Abstract: This paper presents a novel method to generate answers for non-extraction machine reading comprehension (MRC) tasks whose answers cannot be simply extracted as one span from the given passages. Using a pointer network-style extractive decoder for this type of MRC may result in unsatisfactory performance when the ground-truth answers are given by human annotators or highly re-paraphrased from parts of the passages. On the other hand, using a generative decoder cannot guarantee that the resulting answers have well-formed syntax and semantics when encountering long sentences. Therefore, to alleviate the obvious drawbacks of both sides, we propose an answer making-up method from extracted multi-spans that are learned by our model as highly confident $n$-gram candidates in the given passage. That is, the returned answers are composed of discontinuous multi-spans and not just one consecutive span in the given passages anymore. The proposed method is simple but effective: empirical experiments on MS MARCO show that the proposed method performs better at accurately generating long answers, and substantially outperforms two competitive typical one-span and Seq2Seq baseline decoders.
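
One way to picture the multi-span composition step, under the assumption that the model has already scored candidate n-gram spans: keep the most confident non-overlapping spans and join them in passage order (a hypothetical sketch, not the paper's learned decoder).

    def compose_answer(tokens, span_scores, top_k=2):
        """tokens: passage tokens; span_scores: {(start, end): confidence} over
        candidate n-gram spans. Greedily keep the top_k non-overlapping spans and
        join them in passage order to form a possibly discontinuous answer."""
        chosen = []
        for (s, e), _ in sorted(span_scores.items(), key=lambda kv: -kv[1]):
            if len(chosen) == top_k:
                break
            if all(e <= cs or s >= ce for cs, ce in chosen):   # no overlap
                chosen.append((s, e))
        chosen.sort()
        return " ".join(" ".join(tokens[s:e]) for s, e in chosen)

    tokens = "the trial began in March and ended in late July".split()
    scores = {(3, 5): 0.9, (7, 10): 0.8, (0, 2): 0.3}
    print(compose_answer(tokens, scores))   # "in March in late July"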

20. Identity-Based Patterns in Deep Convolutional Networks: Generative Adversarial Phonology and Reduplication [PDF] Back to Contents
  Gašper Beguš
Abstract: Identity-based patterns for which a computational model needs to output some feature together with a copy of that feature are computationally challenging, but pose no problems to human learners and are common in world's languages. In this paper, we test whether a neural network can learn an identity-based pattern in speech called reduplication. To our knowledge, this is the first attempt to test identity-based patterns in deep convolutional networks trained on raw continuous data. Unlike existing proposals, we test learning in an unsupervised manner and we train the network on raw acoustic data. We use the ciwGAN architecture (Beguš 2020; arXiv:2006.02951) in which learning of meaningful representations in speech emerges from a requirement that the deep convolutional network generates informative data. Based on four generative tests, we argue that a deep convolutional network learns to represent an identity-based pattern in its latent space; by manipulating only two categorical variables in the latent space, we can actively turn an unreduplicated form into a reduplicated form with no other changes to the output in the majority of cases. We also argue that the network extends the identity-based pattern to unobserved data: when reduplication is forced in the output with the proposed technique for latent space manipulation, the network generates reduplicated data (e.g., it copies an [s] e.g. in [si-siju] for [siju] although it never sees any reduplicated forms containing an [s] in the input). Comparison with human outputs of reduplication show a high degree of similarity. Exploration of how meaningful representations of identity-based patterns emerge and how the latent space variables outside of the training range correlate with identity-based patterns in the output has general implications for neural network interpretability.

21. Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding [PDF] Back to Contents
  Shuohang Wang, Luowei Zhou, Zhe Gan, Yen-Chun Chen, Yuwei Fang, Siqi Sun, Yu Cheng, Jingjing Liu
Abstract: Transformer has become ubiquitous in the deep learning field. One of the key ingredients that destined its success is the self-attention mechanism, which allows fully-connected contextual encoding over input tokens. However, despite its effectiveness in modeling short sequences, self-attention suffers when handling inputs with extreme long-range dependencies, as its complexity grows quadratically with respect to the sequence length. Therefore, long sequences are often encoded by Transformer in chunks using a sliding window. In this paper, we propose Cluster-Former, a novel clustering-based sparse Transformer to perform attention across chunked sequences. Our proposed method allows information integration beyond local windows, which is especially beneficial for question answering (QA) and language modeling tasks that rely on long-range dependencies. Experiments show that Cluster-Former achieves state-of-the-art performance on several major QA benchmarks.

22. Span-based Semantic Parsing for Compositional Generalization [PDF] Back to Contents
  Jonathan Herzig, Jonathan Berant
Abstract: Despite the success of sequence-to-sequence (seq2seq) models in semantic parsing, recent work has shown that they fail in compositional generalization, i.e., the ability to generalize to new structures built of components observed during training. In this work, we posit that a span-based parser should lead to better compositional generalization. We propose SpanBasedSP, a parser that predicts a span tree over an input utterance, explicitly encoding how partial programs compose over spans in the input. SpanBasedSP extends Pasupat et al. (2019) to be comparable to seq2seq models by (i) training from programs, without access to gold trees, treating trees as latent variables, and (ii) parsing a class of non-projective trees through an extension to standard CKY. On the GeoQuery, SCAN and CLOSURE datasets, SpanBasedSP performs similarly to strong seq2seq baselines on random splits, but dramatically improves performance compared to baselines on splits that require compositional generalization: from $69.8 \rightarrow 95.3$ average accuracy.

23. BoostingBERT: Integrating Multi-Class Boosting into BERT for NLP Tasks [PDF] Back to Contents
  Tongwen Huang, Qingyun She, Junlin Zhang
Abstract: As a pre-trained Transformer model, BERT (Bidirectional Encoder Representations from Transformers) has achieved ground-breaking performance on multiple NLP tasks. On the other hand, Boosting is a popular ensemble learning technique which combines many base classifiers and has been demonstrated to yield better generalization performance in many machine learning tasks. Some works have indicated that an ensemble of BERT models can further improve application performance. However, current ensemble approaches focus on bagging or stacking, and there has not been much effort on exploring boosting. In this work, we propose a novel BoostingBERT model to integrate multi-class boosting into BERT. Our proposed model uses the pre-trained Transformer as the base classifier to choose harder training sets to fine-tune, and gains the benefits of both the pre-trained language knowledge and the boosting ensemble in NLP tasks. We evaluate the proposed model on the GLUE dataset and 3 popular Chinese NLU benchmarks. Experimental results demonstrate that our proposed model significantly outperforms BERT on all datasets and proves its effectiveness in many NLP tasks. Replacing the BERT base with RoBERTa as the base classifier, BoostingBERT achieves new state-of-the-art results on several NLP tasks. We also use knowledge distillation within the "teacher-student" framework to reduce the computational overhead and model storage of BoostingBERT while keeping its performance for practical application.
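
The abstract does not give the exact boosting rule; purely for intuition, a generic SAMME-style multi-class boosting loop over a fixed set of already-trained base classifiers could look like the following (an illustrative assumption, not the paper's training recipe).

    import numpy as np

    def samme_weights(predictions, labels, n_classes):
        """SAMME-style multi-class boosting over a fixed list of base classifiers.
        predictions: list of 1-D arrays of predicted class ids (one per classifier).
        Returns a classifier weight alpha for each, updating example weights along
        the way so that harder (misclassified) examples count more for the next one."""
        n = len(labels)
        w = np.full(n, 1.0 / n)                      # example weights
        alphas = []
        for pred in predictions:
            miss = (pred != labels).astype(float)
            err = max((w * miss).sum(), 1e-12)
            alpha = np.log((1 - err) / err) + np.log(n_classes - 1)
            alphas.append(alpha)
            w *= np.exp(alpha * miss)                # up-weight misclassified examples
            w /= w.sum()
        return alphas

    labels = np.array([0, 1, 2, 1])
    preds = [np.array([0, 1, 2, 2]), np.array([0, 1, 1, 1])]   # two base classifiers
    print(samme_weights(preds, labels, n_classes=3))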

24. Pow-Wow: A Dataset and Study on Collaborative Communication in Pommerman [PDF] Back to Contents
  Takuma Yoneda, Matthew R. Walter, Jason Naradowsky
Abstract: In multi-agent learning, agents must coordinate with each other in order to succeed. For humans, this coordination is typically accomplished through the use of language. In this work we perform a controlled study of human language use in a competitive team-based game, and search for useful lessons for structuring communication protocol between autonomous agents. We construct Pow-Wow, a new dataset for studying situated goal-directed human communication. Using the Pommerman game environment, we enlisted teams of humans to play against teams of AI agents, recording their observations, actions, and communications. We analyze the types of communications which result in effective game strategies, annotate them accordingly, and present corpus-level statistical analysis of how trends in communications affect game outcomes. Based on this analysis, we design a communication policy for learning agents, and show that agents which utilize communication achieve higher win-rates against baseline systems than those which do not.

25. Combining Word and Character Vector Representation on Neural Machine Translation [PDF] Back to Contents
  K. M. Shahih, Ayu Purwarianti
Abstract: This paper describes combinations of word vector representation and character vector representation in English-Indonesian neural machine translation (NMT). Six configurations of NMT models were built with different input vector representations: word-based; a combination of word and character representation using a bidirectional LSTM (bi-LSTM); a combination of word and character representation using a CNN; and combinations of word and character representation that merge the bi-LSTM and CNN outputs with three different vector operations: addition, pointwise multiplication, and averaging. The experiment results showed that NMT models with concatenation of word and character representations obtained BLEU scores higher than the baseline model by 9.14 to 11.65 points, for all models that combine word and character representations, except the model that combines them using both bi-LSTM and CNN with the addition operation. The highest BLEU score achieved was 42.48, compared to 30.83 for the baseline model.
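
For concreteness, a toy sketch (with made-up dimensions and values) of the three vector operations used to merge the bi-LSTM and CNN character representations, and of concatenating the result with the word vector:

    import numpy as np

    # hypothetical toy vectors: a word embedding plus two character-level
    # representations of the same word (one from a bi-LSTM, one from a CNN)
    word_vec = np.array([0.2, -0.1, 0.5, 0.3])
    char_bilstm = np.array([0.1, 0.4, -0.2, 0.0])
    char_cnn = np.array([0.3, 0.0, 0.1, -0.1])

    # the three operations for merging the two character representations
    added = char_bilstm + char_cnn
    multiplied = char_bilstm * char_cnn        # pointwise multiplication
    averaged = (char_bilstm + char_cnn) / 2

    # final NMT encoder input: word vector concatenated with a character vector
    encoder_input = np.concatenate([word_vec, averaged])
    print(encoder_input.shape)                 # (8,)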

26. Fine-tuning Pre-trained Contextual Embeddings for Citation Content Analysis in Scholarly Publication [PDF] Back to Contents
  Haihua Chen, Huyen Nguyen
Abstract: Citation function and citation sentiment are two essential aspects of citation content analysis (CCA), which are useful for influence analysis and the recommendation of scientific publications. However, existing studies mostly use traditional machine learning methods; although deep learning techniques have also been explored, the improvement in performance does not seem significant due to insufficient training data, which makes application difficult. In this paper, we propose to fine-tune the pre-trained contextual embeddings ULMFiT, BERT, and XLNet for the task. Experiments on three public datasets show that our strategy outperforms all the baselines in terms of the F1 score. For citation function identification, the XLNet model achieves 87.2%, 86.90%, and 81.6% on the DFKI, UMICH, and TKDE2019 datasets respectively, while it achieves 91.72% and 91.56% on DFKI and UMICH in terms of citation sentiment identification. Our method can be used to enhance the influence analysis of scholars and scholarly publications.

27. Improving Machine Reading Comprehension with Contextualized Commonsense Knowledge [PDF] Back to Contents
  Kai Sun, Dian Yu, Jianshu Chen, Dong Yu, Claire Cardie
Abstract: In this paper, we aim to extract commonsense knowledge to improve machine reading comprehension. We propose to represent relations implicitly by situating structured knowledge in a context instead of relying on a pre-defined set of relations, and we call it contextualized knowledge. Each piece of contextualized knowledge consists of a pair of interrelated verbal and nonverbal messages extracted from a script and the scene in which they occur as context to implicitly represent the relation between the verbal and nonverbal messages, which are originally conveyed by different modalities within the script. We propose a two-stage fine-tuning strategy to use the large-scale weakly-labeled data based on a single type of contextualized knowledge and employ a teacher-student paradigm to inject multiple types of contextualized knowledge into a student machine reader. Experimental results demonstrate that our method outperforms a state-of-the-art baseline by a 4.3% improvement in accuracy on the machine reading comprehension dataset C^3, wherein most of the questions require unstated prior knowledge.

28. CIA_NITT at WNUT-2020 Task 2: Classification of COVID-19 Tweets Using Pre-trained Language Models [PDF] Back to Contents
  Yandrapati Prakash Babu, Rajagopal Eswari
Abstract: This paper presents our models for WNUT-2020 shared Task 2. The shared task involves identification of COVID-19 related informative tweets. We treat this as a binary text classification problem and experiment with pre-trained language models. Our first model, which is based on CT-BERT, achieves an F1-score of 88.7%, and our second model, an ensemble of CT-BERT, RoBERTa and an SVM, achieves an F1-score of 88.52%.

29. Intent Detection with WikiHow [PDF] Back to Contents
  Li Zhang, Qing Lyu, Chris Callison-Burch
Abstract: Modern task-oriented dialog systems need to reliably understand users' intents. Intent detection is most challenging when moving to new domains or new languages, since there is little annotated data. To address this challenge, we present a suite of pretrained intent detection models. Our models are able to predict a broad range of intended goals from many actions because they are trained on wikiHow, a comprehensive instructional website. Our models achieve state-of-the-art results on the Snips dataset, the Schema-Guided Dialogue dataset, and all 3 languages of the Facebook multilingual dialog datasets. Our models also demonstrate strong zero- and few-shot performance, reaching over 75% accuracy using only 100 training examples in all datasets.

30. Syntax Role for Neural Semantic Role Labeling [PDF] Back to Contents
  Zuchao Li, Hai Zhao, Shexia He, Jiaxun Cai
Abstract: Semantic role labeling (SRL) is dedicated to recognizing the semantic predicate-argument structure of a sentence. Previous studies in terms of traditional models have shown syntactic information can make remarkable contributions to SRL performance; however, the necessity of syntactic information was challenged by a few recent neural SRL studies that demonstrate impressive performance without syntactic backbones and suggest that syntax information becomes much less important for neural semantic role labeling, especially when paired with recent deep neural network and large-scale pre-trained language models. Despite this notion, the neural SRL field still lacks a systematic and full investigation on the relevance of syntactic information in SRL, for both dependency and both monolingual and multilingual settings. This paper intends to quantify the importance of syntactic information for neural SRL in the deep learning framework. We introduce three typical SRL frameworks (baselines), sequence-based, tree-based, and graph-based, which are accompanied by two categories of exploiting syntactic information: syntax pruning-based and syntax feature-based. Experiments are conducted on the CoNLL-2005, 2009, and 2012 benchmarks for all languages available, and results show that neural SRL models can still benefit from syntactic information under certain conditions. Furthermore, we show the quantitative significance of syntax to neural SRL models together with a thorough empirical survey using existing models.

31. Improving Bi-LSTM Performance for Indonesian Sentiment Analysis Using Paragraph Vector [PDF] Back to Contents
  Ayu Purwarianti, Ida Ayu Putu Ari Crisdayanti
Abstract: The Bidirectional Long Short-Term Memory Network (Bi-LSTM) has shown promising performance in the sentiment classification task. It processes inputs as a sequence of information. Due to this behavior, sentiment predictions by Bi-LSTM are influenced by word order, and the first or last phrases of a text tend to have stronger features than other phrases. Meanwhile, in the problem scope of Indonesian sentiment analysis, phrases that express the sentiment of a document might not appear in the first or last part of the document, which can lead to incorrect sentiment classification. To this end, we propose using an existing document representation method called the paragraph vector as an additional input feature for Bi-LSTM. This vector provides document-level context for each step of the sequence processing. The paragraph vector is simply concatenated to each word vector of the document. This representation also helps to differentiate ambiguous Indonesian words. Bi-LSTM and the paragraph vector were previously used as separate methods. Combining the two methods has shown a significant performance improvement of the Indonesian sentiment analysis model. Several case studies on testing data showed that the proposed method can handle the sentiment-phrase position problem encountered by Bi-LSTM.
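
The proposed input construction can be pictured as follows, a toy sketch with made-up dimensions: the same document-level paragraph vector is appended to every word vector before the sequence is fed to the Bi-LSTM.

    import numpy as np

    word_vectors = np.random.rand(12, 100)    # 12 tokens, 100-dim word embeddings
    paragraph_vector = np.random.rand(50)     # one 50-dim vector for the whole document

    # concatenate the document context onto every time step of the Bi-LSTM input
    bilstm_input = np.hstack([word_vectors,
                              np.tile(paragraph_vector, (len(word_vectors), 1))])
    print(bilstm_input.shape)                 # (12, 150)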

32. Improving Indonesian Text Classification Using Multilingual Language Model [PDF] Back to Contents
  Ilham Firdausi Putra, Ayu Purwarianti
Abstract: Compared to English, the amount of labeled data for Indonesian text classification tasks is very small. Recently developed multilingual language models have shown its ability to create multilingual representations effectively. This paper investigates the effect of combining English and Indonesian data on building Indonesian text classification (e.g., sentiment analysis and hate speech) using multilingual language models. Using the feature-based approach, we observe its performance on various data sizes and total added English data. The experiment showed that the addition of English data, especially if the amount of Indonesian data is small, improves performance. Using the fine-tuning approach, we further showed its effectiveness in utilizing the English language to build Indonesian text classification models.

33. Relation Detection for Indonesian Language using Deep Neural Network -- Support Vector Machine [PDF] Back to Contents
  Ramos Janoah Hasudungan, Ayu Purwarianti
Abstract: Relation detection is the task of determining whether two entities are related or not. In this paper, we employ a neural network to perform relation detection between two named entities for the Indonesian language. We used features such as word embedding, position embedding, POS-tag embedding, and character embedding. We divide the model into two parts: a front-part classifier (convolutional layer or LSTM layer) and a back-part classifier (dense layer or SVM). We performed a grid search over the neural network hyperparameters and the SVM. We used 6,000 Indonesian sentences for training and 1,125 for testing. The best result is an F1-score of 0.8083, obtained using a convolutional layer as the front part and an SVM as the back part.

34. Investigating Bi-LSTM and CRF with POS Tag Embedding for Indonesian Named Entity Tagger [PDF]
  Devin Hoesen, Ayu Purwarianti
Abstract: Research on Indonesian named entity (NE) taggers has been conducted for years. However, most of it did not use deep learning and instead employed traditional machine learning algorithms such as association rules, support vector machines, random forests, naïve Bayes, etc. In those studies, word lists such as gazetteers or clue words were provided to enhance accuracy. Here, we employ deep learning for our Indonesian NE tagger. We use long short-term memory (LSTM) as the topology since it is the state of the art for NE tagging. With LSTM, we do not need a word list to enhance accuracy. We investigate two main aspects. The first is the output layer of the network: softmax vs. conditional random field (CRF). The second is the use of a part-of-speech (POS) tag embedding input layer. Using 8,400 sentences as training data and 97 sentences as evaluation data, we find that using POS tag embeddings as additional input improves the performance of our Indonesian NE tagger. As for the comparison between softmax and CRF, we find that both architectures have a weakness in classifying an NE tag.
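
A minimal sketch of the POS-tag-embedding idea, assuming a Bi-LSTM tagger with a per-token softmax output (the CRF variant would replace the final layer); sizes and names are illustrative, not taken from the paper:

```python
import torch
import torch.nn as nn

class BiLSTMPosTagger(nn.Module):
    """Token-level NE tagger: word embedding concatenated with a POS-tag
    embedding, Bi-LSTM encoder, per-token softmax output (a CRF layer
    could replace the softmax, as in the paper's second variant)."""
    def __init__(self, vocab=10000, n_pos=25, n_tags=9,
                 word_dim=100, pos_dim=25, hidden=100):
        super().__init__()
        self.word_emb = nn.Embedding(vocab, word_dim)
        self.pos_emb = nn.Embedding(n_pos, pos_dim)
        self.lstm = nn.LSTM(word_dim + pos_dim, hidden,
                            batch_first=True, bidirectional=True)
        self.scores = nn.Linear(2 * hidden, n_tags)

    def forward(self, words, pos_tags):   # both: (batch, seq_len)
        x = torch.cat([self.word_emb(words), self.pos_emb(pos_tags)], dim=-1)
        h, _ = self.lstm(x)                # (B, T, 2*hidden)
        return self.scores(h)              # per-token tag logits

tagger = BiLSTMPosTagger()
logits = tagger(torch.randint(0, 10000, (2, 15)), torch.randint(0, 25, (2, 15)))
```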

35. Coreference Resolution System for Indonesian Text with Mention Pair Method and Singleton Exclusion using Convolutional Neural Network [PDF]
  Turfa Auliarachman, Ayu Purwarianti
Abstract: Neural networks have shown promising performance in coreference resolution systems that use the mention-pair method. With a deep neural network, such systems can learn hidden and deep relations between two mentions. However, there is no work on coreference resolution for Indonesian text that uses this learning technique. The state-of-the-art system for Indonesian text only states that the use of lexical and syntactic features can improve the existing coreference resolution system. In this paper, we propose a new coreference resolution system for Indonesian text based on the mention-pair method that uses a deep neural network to learn the relation between the two mentions. In addition to lexical and syntactic features, in order to learn a representation of the mentions' words and context, we use word embeddings and feed them to a Convolutional Neural Network (CNN). Furthermore, we perform singleton exclusion using a singleton classifier component to prevent singleton mentions from entering any entity cluster at the end. Achieving a CoNLL average F1 score of 67.37% without singleton exclusion, 63.27% with a trained singleton classifier, and 75.95% with a gold singleton classifier, our proposed system outperforms the state-of-the-art system.
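
The singleton-exclusion step can be sketched independently of the CNN details: mentions predicted to be singletons are removed before pairwise linking. The scoring functions below are stand-ins for the trained classifiers, and the greedy linking rule is just one plausible choice; the abstract does not specify the exact clustering strategy.

```python
def resolve_with_singleton_exclusion(mentions, singleton_prob, pair_prob,
                                     singleton_threshold=0.5, pair_threshold=0.5):
    """Mention-pair resolution with singleton exclusion.

    singleton_prob(m)   -> probability that mention m is a singleton
    pair_prob(m1, m2)   -> probability that m1 and m2 corefer
    Both scoring functions stand in for trained models (e.g., the CNN-based
    classifiers described in the abstract)."""
    # 1. Singleton exclusion: drop mentions predicted to be singletons.
    candidates = [m for m in mentions if singleton_prob(m) < singleton_threshold]

    # 2. Greedy mention-pair linking over the remaining mentions.
    clusters = []
    for m in candidates:
        for cluster in clusters:
            if any(pair_prob(prev, m) >= pair_threshold for prev in cluster):
                cluster.append(m)
                break
        else:
            clusters.append([m])
    return clusters
```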

36. Solving Math Word Problems by Scoring Equations with Recursive Neural Networks [PDF]
  Klim Zaporojets, Giannis Bekoulis, Johannes Deleu, Thomas Demeester, Chris Develder
Abstract: Solving math word problems is a cornerstone task in assessing the language understanding and reasoning capabilities of NLP systems. Recent works use automatic extraction and ranking of candidate solution equations that provide the answer to a math word problem. In this work, we explore novel approaches to score such candidate solution equations using tree-structured recursive neural network (Tree-RNN) configurations. The advantage of this Tree-RNN approach over more established sequential representations is that it can naturally capture the structure of the equations. Our proposed method consists of transforming the mathematical expression of the equation into an expression tree. Further, we encode this tree into a Tree-RNN by using different Tree-LSTM architectures. Experimental results show that our proposed method (i) improves overall performance by more than 3 accuracy points compared to the previous state of the art, and by over 18 points on a subset of problems that require more complex reasoning, and (ii) outperforms sequential LSTMs by 4 accuracy points on such more complex problems.
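
A toy sketch of the scoring idea: a candidate equation is represented as an expression tree, node representations are composed bottom-up, and the root representation is scored. The code below uses a plain recursive composition rather than the paper's Tree-LSTM variants, and every name and dimension is illustrative.

```python
import torch
import torch.nn as nn

class ExprNode:
    """A node in a candidate equation's expression tree, e.g. ('+', left, right)."""
    def __init__(self, symbol, children=()):
        self.symbol, self.children = symbol, list(children)

class TreeRNNScorer(nn.Module):
    """Recursively composes child representations into a parent representation,
    then scores the root: a plain recursive net, simpler than a Tree-LSTM."""
    def __init__(self, symbols, dim=32):
        super().__init__()
        self.sym2id = {s: i for i, s in enumerate(symbols)}
        self.emb = nn.Embedding(len(symbols), dim)
        self.compose = nn.Linear(3 * dim, dim)   # parent symbol + two children
        self.score = nn.Linear(dim, 1)

    def encode(self, node):
        h = self.emb(torch.tensor(self.sym2id[node.symbol]))
        if node.children:                          # assume binary operators
            left, right = (self.encode(c) for c in node.children)
            h = torch.tanh(self.compose(torch.cat([h, left, right])))
        return h

    def forward(self, root):
        return self.score(self.encode(root))       # higher = better candidate

# Candidate equation "x = 3 + 5" represented as a tree, scored by the model.
scorer = TreeRNNScorer(symbols=["+", "-", "*", "/", "3", "5", "x", "="])
eq = ExprNode("=", [ExprNode("x"), ExprNode("+", [ExprNode("3"), ExprNode("5")])])
print(scorer(eq))
```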

37. UPB at SemEval-2020 Task 6: Pretrained Language Models for Definition Extraction [PDF]
  Andrei-Marius Avram, Dumitru-Clementin Cercel, Costin-Gabriel Chiru
Abstract: This work presents our contribution in the context of the 6th task of SemEval-2020: Extracting Definitions from Free Text in Textbooks (DeftEval). This competition consists of three subtasks with different levels of granularity: (1) classification of sentences as definitional or non-definitional, (2) labeling of definitional sentences, and (3) relation classification. We use various pretrained language models (i.e., BERT, XLNet, RoBERTa, SciBERT, and ALBERT) to solve each of the three subtasks of the competition. Specifically, for each language model variant, we experiment with both freezing its weights and fine-tuning them. We also explore a multi-task architecture that was trained to jointly predict the outputs for the second and the third subtasks. Our best performing model evaluated on the DeftEval dataset obtains the 32nd place for the first subtask and the 37th place for the second subtask. The code is available for further research at: this https URL.
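
The frozen-weights versus fine-tuning distinction amounts to whether the encoder's parameters receive gradients. A minimal Transformers-style sketch for the first subtask (the checkpoint name and label count are illustrative, not the authors' configuration):

```python
from transformers import AutoModelForSequenceClassification

# Subtask 1 as binary sentence classification: definitional vs. non-definitional.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=2)

FREEZE_ENCODER = True   # frozen-weights setting; set False for full fine-tuning
if FREEZE_ENCODER:
    for param in model.base_model.parameters():
        param.requires_grad = False   # only the classification head is trained
```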

38. Cosine meets Softmax: A tough-to-beat baseline for visual grounding [PDF]
  Nivedita Rufus, Unni Krishnan R Nair, K. Madhava Krishna, Vineet Gandhi
Abstract: In this paper, we present a simple baseline for visual grounding for autonomous driving that outperforms state-of-the-art methods while retaining minimal design choices. Our framework minimizes the cross-entropy loss over the cosine distances between multiple image ROI features and a text embedding (representing the given sentence/phrase). We use pre-trained networks to obtain the initial embeddings and learn a transformation layer on top of the text embedding. We perform experiments on the Talk2Car dataset and achieve 68.7% AP50 accuracy, improving upon the previous state of the art by 8.6%. By showing promise in simpler alternatives, our investigation suggests reconsidering approaches that employ sophisticated attention mechanisms, multi-stage reasoning, or complex metric-learning loss functions.
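
The core objective can be written in a few lines: compute the cosine similarity between each region feature and the sentence embedding, then apply a softmax cross-entropy loss against the ground-truth region. The sketch below is an interpretation of the abstract with illustrative dimensions; the actual system may add scaling or other details.

```python
import torch
import torch.nn.functional as F

def grounding_loss(roi_feats, text_emb, gold_roi):
    """roi_feats: (batch, n_rois, d) region features from the image
    text_emb:  (batch, d)          embedding of the referring sentence/phrase
    gold_roi:  (batch,)            index of the ground-truth region"""
    text = text_emb.unsqueeze(1).expand_as(roi_feats)       # repeat over regions
    sims = F.cosine_similarity(roi_feats, text, dim=-1)     # (batch, n_rois)
    return F.cross_entropy(sims, gold_roi)                  # softmax over regions + NLL

loss = grounding_loss(torch.randn(2, 8, 256), torch.randn(2, 256), torch.tensor([3, 0]))
```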

39. Differentially Private Language Models Benefit from Public Pre-training [PDF]
  Gavin Kerrigan, Dylan Slack, Jens Tuyls
Abstract: Language modeling is a keystone task in natural language processing. When training a language model on sensitive information, differential privacy (DP) allows us to quantify the degree to which our private data is protected. However, training algorithms which enforce differential privacy often lead to degradation in model quality. We study the feasibility of learning a language model that is simultaneously high-quality and privacy-preserving by tuning a public base model on a private corpus. We find that DP fine-tuning boosts the performance of language models in the private domain, making the training of such models possible.
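
Operationally, training with differential privacy typically means DP-SGD: clip each example's gradient and add Gaussian noise before the optimizer step. The schematic PyTorch sketch below illustrates that mechanism only; it is not the authors' training code, and a real project would normally rely on a DP library and track the privacy budget.

```python
import torch

def dp_sgd_step(model, loss_fn, batch_x, batch_y, optimizer,
                clip_norm=1.0, noise_multiplier=1.0):
    """One schematic DP-SGD step: clip each example's gradient, sum,
    add Gaussian noise calibrated to the clipping norm, then average."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    for x, y in zip(batch_x, batch_y):                      # per-example gradients
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (total_norm + 1e-6), max=1.0)
        for s, g in zip(summed, grads):
            s.add_(g * scale)                               # clipped contribution

    for p, s in zip(params, summed):
        noise = torch.randn_like(s) * noise_multiplier * clip_norm
        p.grad = (s + noise) / len(batch_x)                 # noisy averaged gradient
    optimizer.step()
    optimizer.zero_grad()
```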

40. Exploring the Hierarchy in Relation Labels for Scene Graph Generation [PDF]
  Yi Zhou, Shuyang Sun, Chao Zhang, Yikang Li, Wanli Ouyang
Abstract: By assigning each relationship a single label, current approaches formulate relationship detection as a classification problem. Under this formulation, predicate categories are treated as completely different classes. However, unlike object labels, where different classes have explicit boundaries, predicates usually overlap in their semantic meanings. For example, sit\_on and stand\_on share the meaning of a vertical relationship but differ in the details of how the two objects are vertically placed. In order to leverage the inherent structure of the predicate categories, we propose to first build a language hierarchy and then utilize a Hierarchy Guided Feature Learning (HGFL) strategy to learn better region features at both the coarse-grained and the fine-grained level. Besides, we also propose a Hierarchy Guided Module (HGM) that utilizes the coarse-grained level to guide the learning of fine-grained features. Experiments show that the proposed simple yet effective method can improve several state-of-the-art baselines by a large margin (up to a $33\%$ relative gain) in terms of Recall@50 on the task of Scene Graph Generation across different datasets.

41. Country Image in COVID-19 Pandemic: A Case Study of China [PDF]
  Huimin Chen, Zeyu Zhu, Fanchao Qi, Yining Ye, Zhiyuan Liu, Maosong Sun, Jianbin Jin
Abstract: Country image has a profound influence on international relations and economic development. In the worldwide outbreak of COVID-19, countries and their people display different reactions, resulting in diverse perceived images among the foreign public. Therefore, in this study, we take China as a specific and typical case and investigate its image with aspect-based sentiment analysis on a large-scale Twitter dataset. To our knowledge, this is the first study to explore country image in such a fine-grained way. To perform the analysis, we first build a manually-labeled Twitter dataset with aspect-level sentiment annotations. Afterward, we conduct aspect-based sentiment analysis with BERT to explore the image of China. We discover an overall sentiment change from non-negative to negative in the general public, and explain it by the increasing mentions of negative ideology-related aspects and the decreasing mentions of non-negative fact-based aspects. Further investigations into different groups of Twitter users, including U.S. Congress members, English media, and social bots, reveal different patterns in their attitudes toward China. This study provides a deeper understanding of the changing image of China during the COVID-19 pandemic. Our research also demonstrates how aspect-based sentiment analysis can be applied in social science research to deliver valuable insights.
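
One common way to run aspect-based sentiment analysis with BERT is to frame it as sentence-pair classification over (tweet, aspect) pairs. The sketch below shows that framing with Hugging Face Transformers; the label set, aspect name, and example tweet are illustrative, and the model would still need fine-tuning on the annotated data described in the abstract.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)    # e.g. negative / neutral / non-negative

tweet = "The response to the outbreak was fast and well organized."
aspect = "epidemic control"               # illustrative aspect category

# Encode the (tweet, aspect) pair as a single BERT input with two segments.
inputs = tokenizer(tweet, aspect, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))             # sentiment distribution for this aspect
```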

42. An Atlas of Cultural Commonsense for Machine Reasoning [PDF]
  Anurag Acharya, Kartik Talamadupula, Mark A Finlayson
Abstract: Existing commonsense reasoning datasets for AI and NLP tasks fail to address an important aspect of human life: cultural differences. In this work, we introduce an approach that extends prior work on crowdsourcing commonsense knowledge by incorporating differences in knowledge that are attributable to cultural or national groups. We demonstrate the technique by collecting commonsense knowledge that surrounds three fairly universal rituals---coming-of-age, marriage, and funerals---across three different national groups: the United States, India, and the Philippines. Our pilot study expands the different types of relationships identified by existing work in the field of commonsense reasoning for commonplace events, and uses these new types to gather information that distinguishes the knowledge of the different groups. It also moves us a step closer towards building a machine that doesn't assume a rigid framework of universal (and likely Western-biased) commonsense knowledge, but rather has the ability to reason in a contextually and culturally sensitive way. Our hope is that cultural knowledge of this sort will lead to more human-like performance in NLP tasks such as question answering (QA) and text understanding and generation.

43. Unit Test Case Generation with Transformers [PDF]
  Michele Tufano, Dawn Drain, Alexey Svyatkovskiy, Shao Kun Deng, Neel Sundaresan
Abstract: Automated unit test case generation has been the focus of extensive literature within the research community. Existing approaches are usually guided by test coverage criteria, generating synthetic test cases that are often difficult for developers to read or understand. In this paper we propose AthenaTest, an approach that aims at generating unit test cases by learning from real-world, developer-written test cases. Our approach relies on a state-of-the-art sequence-to-sequence transformer model which is able to write useful test cases for a given method under test (i.e., focal method). We also introduce methods2test, the largest publicly available supervised parallel corpus of unit test case methods and corresponding focal methods in Java, which comprises 630k test cases mined from 70k open-source repositories hosted on GitHub. We use this dataset to train a transformer model to translate focal methods into the corresponding test cases. We evaluate the ability of our model in generating test cases using natural language processing as well as code-specific criteria. First, we assess the quality of the translation compared to the target test case; then we analyze properties of the test cases such as syntactic correctness and the number and variety of testing APIs (e.g., asserts). We execute the test cases, collect test coverage information, and compare them with test cases generated by EvoSuite and GPT-3. Finally, we survey professional developers on their preferences in terms of readability, understandability, and testing effectiveness of the generated test cases.
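
At inference time, the approach reduces to standard sequence-to-sequence generation: encode the focal method's source code and decode a candidate test. The sketch below assumes a hypothetical encoder-decoder checkpoint fine-tuned on (focal method, test case) pairs; the checkpoint name is a placeholder, not a released artifact.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder checkpoint assumed to be fine-tuned on (focal method -> test case) pairs.
CKPT = "my-org/focal-method-to-test"      # hypothetical name, not a real model
tokenizer = AutoTokenizer.from_pretrained(CKPT)
model = AutoModelForSeq2SeqLM.from_pretrained(CKPT)

focal_method = "public static int add(int a, int b) { return a + b; }"

inputs = tokenizer(focal_method, return_tensors="pt", truncation=True)
outputs = model.generate(**inputs, max_length=256, num_beams=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # candidate JUnit test
```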
