Contents
1. Globetrotter: Unsupervised Multilingual Translation from Visual Alignment [PDF] Abstract
2. Discourse Parsing of Contentious, Non-Convergent Online Discussions [PDF] Abstract
3. Distilling Knowledge from Reader to Retriever for Question Answering [PDF] Abstract
4. The Role of Interpretable Patterns in Deep Learning for Morphology [PDF] Abstract
5. Improvements and Extensions on Metaphor Detection [PDF] Abstract
6. Dartmouth CS at WNUT-2020 Task 2: Informative COVID-19 Tweet Classification Using BERT [PDF] Abstract
7. Big Green at WNUT 2020 Shared Task-1: Relation Extraction as Contextualized Sequence Classification [PDF] Abstract
8. Extractive Opinion Summarization in Quantized Transformer Spaces [PDF] Abstract
9. End-to-End Chinese Parsing Exploiting Lexicons [PDF] Abstract
10. Combining Machine Learning and Human Experts to Predict Match Outcomes in Football: A Baseline Model [PDF] Abstract
11. CrossNER: Evaluating Cross-Domain Named Entity Recognition [PDF] Abstract
12. From Bag of Sentences to Document: Distantly Supervised Relation Extraction via Machine Reading Comprehension [PDF] Abstract
13. Facts2Story: Controlling Text Generation by Key Facts [PDF] Abstract
14. Cross-lingual Approach to Abstractive Summarization [PDF] Abstract
15. CTRLsum: Towards Generic Controllable Text Summarization [PDF] Abstract
16. Revisiting Iterative Back-Translation from the Perspective of Compositional Generalization [PDF] Abstract
17. Early Detection of Fake News by Utilizing the Credibility of News, Publishers, and Users Based on Weakly Supervised Learning [PDF] Abstract
18. A Topological Method for Comparing Document Semantics [PDF] Abstract
19. Unsupervised Label Refinement Improves Dataless Text Classification [PDF] Abstract
20. Improving Human-Labeled Data through Dynamic Automatic Conflict Resolution [PDF] Abstract
21. Using multiple ASR hypotheses to boost i18n NLU performance [PDF] Abstract
22. Frame-level SpecAugment for Deep Convolutional Neural Networks in Hybrid ASR Systems [PDF] Abstract
27. Discovering key topics from short, real-world medical inquiries via natural language processing and unsupervised learning [PDF] Abstract
Abstracts
1. Globetrotter: Unsupervised Multilingual Translation from Visual Alignment [PDF] Back to Contents
Dídac Surís, Dave Epstein, Carl Vondrick
Abstract: Multi-language machine translation without parallel corpora is challenging because there is no explicit supervision between languages. Existing unsupervised methods typically rely on topological properties of the language representations. We introduce a framework that instead uses the visual modality to align multiple languages, using images as the bridge between them. We estimate the cross-modal alignment between language and images, and use this estimate to guide the learning of cross-lingual representations. Our language representations are trained jointly in one model with a single stage. Experiments with fifty-two languages show that our method outperforms baselines on unsupervised word-level and sentence-level translation using retrieval.
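To make the cross-modal alignment idea concrete, here is a minimal sketch of a symmetric contrastive (InfoNCE-style) objective that pulls matching image and text embeddings together. It illustrates the general technique only, not the paper's exact objective; `txt_emb` and `img_emb` are assumed to be L2-normalized batches produced by language and vision encoders.

```python
# Sketch of a symmetric cross-modal contrastive loss (illustrative, assumed).
import torch
import torch.nn.functional as F

def cross_modal_alignment_loss(txt_emb, img_emb, temperature=0.07):
    """txt_emb, img_emb: (batch, dim) tensors for paired captions/images."""
    logits = txt_emb @ img_emb.t() / temperature      # pairwise similarities
    targets = torch.arange(txt_emb.size(0), device=txt_emb.device)
    # Matching pairs sit on the diagonal; score them against all negatives
    # in both directions (text->image and image->text).
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```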
2. Discourse Parsing of Contentious, Non-Convergent Online Discussions [PDF] Back to Contents
Stepan Zakharov, Omri Hadar, Tovit Hakak, Dina Grossman, Yifat Ben-David Kolikant, Oren Tsur
Abstract: Online discourse is often perceived as polarized and unproductive. While some conversational discourse parsing frameworks are available, they do not naturally lend themselves to the analysis of contentious and polarizing discussions. Inspired by the Bakhtinian theory of Dialogism, we propose a novel theoretical and computational framework, better suited for non-convergent discussions. We redefine the measure of a successful discussion, and develop a novel discourse annotation schema which reflects a hierarchy of discursive strategies. We consider an array of classification models -- from Logistic Regression to BERT. We also consider various feature types and representations, e.g., LIWC categories, standard embeddings, conversational sequences, and non-conversational discourse markers learnt separately. Given the 31 labels in the tagset, an average F-Score of 0.61 is achieved if we allow a different model for each tag, and 0.526 with a single model. The promising results achieved in annotating discussions according to the proposed schema paves the way for a number of downstream tasks and applications such as early detection of discussion trajectories, active moderation of open discussions, and teacher-assistive bots. Finally, we share the first labeled dataset of contentious non-convergent online discussions.
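The reported comparison between a single 31-way model and a separate model per tag can be sketched as below; feature extraction (LIWC categories, embeddings, conversational sequences) is abstracted into a precomputed matrix, and the data and names are illustrative placeholders rather than the paper's setup.

```python
# Sketch: one multi-class model vs. one binary classifier per tag (assumed setup).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X = np.random.rand(200, 50)          # placeholder utterance features
y = np.random.randint(0, 31, 200)    # placeholder tag ids (31 labels)

single_model = LogisticRegression(max_iter=1000).fit(X, y)

# "A different model for each tag": one-vs-rest trains 31 binary classifiers.
per_tag_models = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
```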
3. Distilling Knowledge from Reader to Retriever for Question Answering [PDF] Back to Contents
Gautier Izacard, Edouard Grave
Abstract: The task of information retrieval is an important component of many natural language processing systems, such as open domain question answering. While traditional methods were based on hand-crafted features, continuous representations based on neural networks recently obtained competitive results. A challenge of using such methods is to obtain supervised data to train the retriever model, corresponding to pairs of query and support documents. In this paper, we propose a technique to learn retriever models for downstream tasks, inspired by knowledge distillation, and which does not require annotated pairs of query and documents. Our approach leverages attention scores of a reader model, used to solve the task based on retrieved documents, to obtain synthetic labels for the retriever. We evaluate our method on question answering, obtaining state-of-the-art results.
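A minimal sketch of the described distillation signal: the retriever's document scores are trained toward the reader's attention mass over the retrieved passages. The KL formulation and the normalization are assumptions; the paper's exact aggregation of attention scores may differ.

```python
# Sketch: distill reader attention into retriever scores (formulation assumed).
import torch
import torch.nn.functional as F

def retriever_distillation_loss(retriever_scores, reader_attention):
    """retriever_scores: (batch, n_docs) raw relevance scores.
    reader_attention: (batch, n_docs) non-negative attention mass the reader
    puts on each retrieved document, used as a soft synthetic label."""
    target = reader_attention / reader_attention.sum(dim=-1, keepdim=True)
    log_pred = F.log_softmax(retriever_scores, dim=-1)
    return F.kl_div(log_pred, target, reduction="batchmean")
```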
4. The Role of Interpretable Patterns in Deep Learning for Morphology [PDF] Back to Contents
Judit Acs, Andras Kornai
Abstract: We examine the role of character patterns in three tasks: morphological analysis, lemmatization and copy. We use a modified version of the standard sequence-to-sequence model, where the encoder is a pattern matching network. Each pattern scores all possible N character long subwords (substrings) on the source side, and the highest scoring subword's score is used to initialize the decoder as well as the input to the attention mechanism. This method allows learning which subwords of the input are important for generating the output. By training the models on the same source but different target, we can compare what subwords are important for different tasks and how they relate to each other. We define a similarity metric, a generalized form of the Jaccard similarity, and assign a similarity score to each pair of the three tasks that work on the same source but may differ in target. We examine how these three tasks are related to each other in 12 languages. Our code is publicly available.
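One common way to generalize Jaccard similarity to non-negative weights (sum of element-wise minima over sum of maxima) gives a flavor of the metric described; the paper defines its own generalization, so treat this exact form as an assumption.

```python
# Sketch of a weighted (generalized) Jaccard similarity over subword scores.
def generalized_jaccard(scores_a, scores_b):
    """scores_a, scores_b: dicts mapping subword -> non-negative importance."""
    subwords = set(scores_a) | set(scores_b)
    num = sum(min(scores_a.get(s, 0.0), scores_b.get(s, 0.0)) for s in subwords)
    den = sum(max(scores_a.get(s, 0.0), scores_b.get(s, 0.0)) for s in subwords)
    return num / den if den > 0 else 0.0

# e.g. comparing which subwords matter for lemmatization vs. copying:
print(generalized_jaccard({"ing": 0.9, "walk": 0.4}, {"ing": 0.7, "run": 0.2}))
```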
5. Improvements and Extensions on Metaphor Detection [PDF] Back to Contents
Weicheng Ma, Ruibo Liu, Lili Wang, Soroush Vosoughi
Abstract: Metaphors are ubiquitous in human language. The metaphor detection task (MD) aims at detecting and interpreting metaphors from written language, which is crucial in natural language understanding (NLU) research. In this paper, we introduce a pre-trained Transformer-based model into MD. Our model outperforms the previous state-of-the-art models by large margins in our evaluations, with relative improvements on the F-1 score from 5.33% to 28.39%. Second, we extend MD to a classification task about the metaphoricity of an entire piece of text to make MD applicable in more general NLU scenes. Finally, we clean up the improper or outdated annotations in one of the MD benchmark datasets and re-benchmark it with our Transformer-based model. This approach could be applied to other existing MD datasets as well, since the metaphoricity annotations in these benchmark datasets may be outdated. Future research efforts are also necessary to build an up-to-date and well-annotated dataset consisting of longer and more complex texts.
6. Dartmouth CS at WNUT-2020 Task 2: Informative COVID-19 Tweet Classification Using BERT [PDF] Back to Contents
Dylan Whang, Soroush Vosoughi
Abstract: We describe the systems developed for the WNUT-2020 shared task 2, identification of informative COVID-19 English Tweets. BERT is a highly performant model for Natural Language Processing tasks. We increased BERT's performance in this classification task by fine-tuning BERT and concatenating its embeddings with Tweet-specific features and training a Support Vector Machine (SVM) for classification (henceforth called BERT+). We compared its performance to a suite of machine learning models. We used a Twitter specific data cleaning pipeline and word-level TF-IDF to extract features for the non-BERT models. BERT+ was the top performing model with an F1-score of 0.8713.
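A rough sketch of the BERT+ recipe: concatenate a BERT sentence embedding with hand-crafted tweet features and fit an SVM on the result. The model name, the use of the [CLS] vector, and the toy features are assumptions for illustration; the shared-task system may pool embeddings differently.

```python
# Sketch: BERT embeddings + tweet features -> SVM (details assumed).
import torch
import numpy as np
from transformers import AutoTokenizer, AutoModel
from sklearn.svm import SVC

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = bert(**batch)
    return out.last_hidden_state[:, 0].numpy()   # [CLS] token embedding

tweets = ["Official case counts updated today ...", "lol no way"]
tweet_feats = np.array([[1.0, 0.0], [0.0, 1.0]])  # e.g. has-URL, has-emoji flags
X = np.hstack([embed(tweets), tweet_feats])
clf = SVC().fit(X, [1, 0])                        # 1 = informative
```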
7. Big Green at WNUT 2020 Shared Task-1: Relation Extraction as Contextualized Sequence Classification [PDF] Back to Contents
Chris Miller, Soroush Vosoughi
Abstract: Relation and event extraction is an important task in natural language processing. We introduce a system which uses contextualized knowledge graph completion to classify relations and events between known entities in a noisy text environment. We report results which show that our system is able to effectively extract relations and events from a dataset of wet lab protocols.
8. Extractive Opinion Summarization in Quantized Transformer Spaces [PDF] Back to Contents
Stefanos Angelidis, Reinald Kim Amplayo, Yoshihiko Suhara, Xiaolan Wang, Mirella Lapata
Abstract: We present the Quantized Transformer (QT), an unsupervised system for extractive opinion summarization. QT is inspired by Vector-Quantized Variational Autoencoders, which we repurpose for popularity-driven summarization. It uses a clustering interpretation of the quantized space and a novel extraction algorithm to discover popular opinions among hundreds of reviews, a significant step towards opinion summarization of practical scope. In addition, QT enables controllable summarization without further training, by utilizing properties of the quantized space to extract aspect-specific summaries. We also make publicly available SPACE, a large-scale evaluation benchmark for opinion summarizers, comprising general and aspect-specific summaries for 50 hotels. Experiments demonstrate the promise of our approach, which is validated by human studies where judges showed clear preference for our method over competitive baselines.
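Reading the quantized space as a clustering, the extraction step can be sketched as below, with k-means standing in for the learned VQ codebook; this illustrates popularity-driven extraction only, not the Quantized Transformer itself.

```python
# Sketch: pick sentences near the most "popular" cluster centers (k-means
# is a stand-in for the learned quantized codebook).
import numpy as np
from sklearn.cluster import KMeans

def extract_popular(sent_vecs, sentences, n_clusters=8, budget=3):
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(sent_vecs)
    # Rank clusters by how many review sentences they attract ("popularity").
    sizes = np.bincount(km.labels_, minlength=n_clusters)
    summary = []
    for c in np.argsort(-sizes)[:budget]:
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(sent_vecs[members] - km.cluster_centers_[c], axis=1)
        summary.append(sentences[members[np.argmin(dists)]])  # most central sentence
    return summary
```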
9. End-to-End Chinese Parsing Exploiting Lexicons [PDF] Back to Contents
Yuan Zhang, Zhiyang Teng, Yue Zhang
Abstract: Chinese parsing has traditionally been solved by three pipeline systems including word-segmentation, part-of-speech tagging and dependency parsing modules. In this paper, we propose an end-to-end Chinese parsing model based on character inputs which jointly learns to output word segmentation, part-of-speech tags and dependency structures. In particular, our parsing model relies on word-char graph attention networks, which can enrich the character inputs with external word knowledge. Experiments on three Chinese parsing benchmark datasets show the effectiveness of our models, achieving the state-of-the-art results on end-to-end Chinese parsing.
10. Combining Machine Learning and Human Experts to Predict Match Outcomes in Football: A Baseline Model [PDF] Back to Contents
Ryan Beal, Stuart E. Middleton, Timothy J. Norman, Sarvapali D. Ramchurn
Abstract: In this paper, we present a new application-focused benchmark dataset and results from a set of baseline Natural Language Processing and Machine Learning models for prediction of match outcomes for games of football (soccer). By doing so we give a baseline for the prediction accuracy that can be achieved exploiting both statistical match data and contextual articles from human sports journalists. Our dataset focuses on a representative time-period over 6 seasons of the English Premier League, and includes newspaper match previews from The Guardian. The models presented in this paper achieve an accuracy of 63.18%, a 6.9% boost over traditional statistical methods.
11. CrossNER: Evaluating Cross-Domain Named Entity Recognition [PDF] Back to Contents
Zihan Liu, Yan Xu, Tiezheng Yu, Wenliang Dai, Ziwei Ji, Samuel Cahyawijaya, Andrea Madotto, Pascale Fung
Abstract: Cross-domain named entity recognition (NER) models are able to cope with the scarcity issue of NER samples in target domains. However, most of the existing NER benchmarks lack domain-specialized entity types or do not focus on a certain domain, leading to a less effective cross-domain evaluation. To address these obstacles, we introduce a cross-domain NER dataset (CrossNER), a fully-labeled collection of NER data spanning over five diverse domains with specialized entity categories for different domains. Additionally, we also provide a domain-related corpus since using it to continue pre-training language models (domain-adaptive pre-training) is effective for the domain adaptation. We then conduct comprehensive experiments to explore the effectiveness of leveraging different levels of the domain corpus and pre-training strategies to do domain-adaptive pre-training for the cross-domain task. Results show that focusing on the fractional corpus containing domain-specialized entities and utilizing a more challenging pre-training strategy in domain-adaptive pre-training are beneficial for the NER domain adaptation, and our proposed method can consistently outperform existing cross-domain NER baselines. Nevertheless, experiments also illustrate the challenge of this cross-domain NER task. We hope that our dataset and baselines will catalyze research in the NER domain adaptation area. The code and data are available at this https URL.
12. From Bag of Sentences to Document: Distantly Supervised Relation Extraction via Machine Reading Comprehension [PDF] Back to Contents
Lingyong Yan, Xianpei Han, Le Sun, Fangchao Liu, Ning Bian
Abstract: Distant supervision (DS) is a promising approach for relation extraction but often suffers from the noisy label problem. Traditional DS methods usually represent an entity pair as a bag of sentences and denoise labels using multi-instance learning techniques. The bag-based paradigm, however, fails to leverage the inter-sentence-level and the entity-level evidence for relation extraction, and their denoising algorithms are often specialized and complicated. In this paper, we propose a new DS paradigm--document-based distant supervision, which models relation extraction as a document-based machine reading comprehension (MRC) task. By re-organizing all sentences about an entity as a document and extracting relations via querying the document with relation-specific questions, the document-based DS paradigm can simultaneously encode and exploit all sentence-level, inter-sentence-level, and entity-level evidence. Furthermore, we design a new loss function--DSLoss (distant supervision loss), which can effectively train MRC models using only $\langle$document, question, answer$\rangle$ tuples, therefore noisy label problem can be inherently resolved. Experiments show that our method achieves new state-of-the-art DS performance.
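The document-based MRC framing can be sketched with an off-the-shelf extractive QA model: gather all sentences mentioning an entity into one document and query it with a relation-specific question. The question template and QA model below are illustrative stand-ins, not the paper's trained system.

```python
# Sketch: relation extraction as document-level machine reading comprehension.
from transformers import pipeline

qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

sentences = ["Marie Curie was born in Warsaw.",
             "Curie studied physics in Paris.",
             "She shared the 1903 Nobel Prize with Pierre Curie."]
document = " ".join(sentences)   # bag of sentences -> single document

templates = {"place_of_birth": "Where was {} born?"}  # relation-specific question
answer = qa(question=templates["place_of_birth"].format("Marie Curie"),
            context=document)
print(answer["answer"])          # expected: "Warsaw"
```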
13. Facts2Story: Controlling Text Generation by Key Facts [PDF] Back to Contents
Eyal Orbach, Yoav Goldberg
Abstract: Recent advancements in self-attention neural network architectures have raised the bar for open-ended text generation. Yet, while current methods are capable of producing a coherent text which is several hundred words long, attaining control over the content that is being generated -- as well as evaluating it -- are still open questions. We propose a controlled generation task which is based on expanding a sequence of facts, expressed in natural language, into a longer narrative. We introduce human-based evaluation metrics for this task, as well as a method for deriving a large training dataset. We evaluate three methods on this task, based on fine-tuning pre-trained models. We show that while auto-regressive, unidirectional Language Models such as GPT2 produce better fluency, they struggle to adhere to the requested facts. We propose a plan-and-cloze model (using fine-tuned XLNet) which produces competitive fluency while adhering to the requested content.
14. Cross-lingual Approach to Abstractive Summarization [PDF] Back to Contents
Aleš Žagar, Marko Robnik-Šikonja
Abstract: Automatic text summarization extracts important information from texts and presents the information in the form of a summary. Abstractive summarization approaches progressed significantly by switching to deep neural networks, but results are not yet satisfactory, especially for languages where large training sets do not exist. In several natural language processing tasks, cross-lingual model transfers are successfully applied in low-resource languages. For summarization such cross-lingual model transfer was so far not attempted due to a non-reusable decoder side of neural models. In our work, we used a pretrained English summarization model based on deep neural networks and sequence-to-sequence architecture to summarize Slovene news articles. We solved the problem of inadequate decoder by using an additional language model for target language evaluation. We developed several models with different proportions of target language data for fine-tuning. The results were assessed with automatic evaluation measures and with small-scale human evaluation. The results show that summaries of cross-lingual models fine-tuned with relatively small amount of target language data are useful and of similar quality to an abstractive summarizer trained with much more data in the target language.
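A sketch of using a target-language LM to judge candidate summaries, as in the decoder fix described above: lower perplexity suggests more fluent target-language output. "gpt2" is only a placeholder for a Slovene language model here.

```python
# Sketch: score candidate summaries with a target-language LM (model assumed).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")

def perplexity(text):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss      # mean token cross-entropy
    return torch.exp(loss).item()

candidates = ["summary candidate one ...", "summary candidate two ..."]
best = min(candidates, key=perplexity)       # keep the most fluent candidate
```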
15. CTRLsum: Towards Generic Controllable Text Summarization [PDF] Back to Contents
Junxian He, Wojciech Kryściński, Bryan McCann, Nazneen Rajani, Caiming Xiong
Abstract: Current summarization systems yield generic summaries that are disconnected from users' preferences and expectations. To address this limitation, we present CTRLsum, a novel framework for controllable summarization. Our approach enables users to control multiple aspects of generated summaries by interacting with the summarization system through textual input in the form of a set of keywords or descriptive prompts. Using a single unified model, CTRLsum is able to achieve a broad scope of summary manipulation at inference time without requiring additional human annotations or pre-defining a set of control aspects during training. We quantitatively demonstrate the effectiveness of our approach on three domains of summarization datasets and five control aspects: 1) entity-centric and 2) length-controllable summarization, 3) contribution summarization on scientific papers, 4) invention purpose summarization on patent filings, and 5) question-guided summarization on news articles in a reading comprehension setting. Moreover, when used in a standard, uncontrolled summarization setting, CTRLsum achieves state-of-the-art results on the CNN/DailyMail dataset. Code and model checkpoints are available at this https URL
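Keyword control can be sketched by prepending control tokens to the source before summarization. The "=>" separator and the generic BART model are assumptions for illustration only; CTRLsum's released models define their own input format.

```python
# Sketch: keyword-prefixed input for controllable summarization (format assumed).
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

source = "The patent describes a folding bicycle frame ... (long document)"
keywords = "invention purpose"
controlled_input = f"{keywords} => {source}"   # assumed control format
print(summarizer(controlled_input, max_length=60, min_length=10)[0]["summary_text"])
```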
16. Revisiting Iterative Back-Translation from the Perspective of Compositional Generalization [PDF] Back to Contents
Yinuo Guo, Hualei Zhu, Zeqi Lin, Bei Chen, Jian-Guang Lou, Dongmei Zhang
Abstract: Human intelligence exhibits compositional generalization (i.e., the capacity to understand and produce unseen combinations of seen components), but current neural seq2seq models lack such ability. In this paper, we revisit iterative back-translation, a simple yet effective semi-supervised method, to investigate whether and how it can improve compositional generalization. In this work: (1) We first empirically show that iterative back-translation substantially improves the performance on compositional generalization benchmarks (CFQ and SCAN). (2) To understand why iterative back-translation is useful, we carefully examine the performance gains and find that iterative back-translation can increasingly correct errors in pseudo-parallel data. (3) To further encourage this mechanism, we propose curriculum iterative back-translation, which better improves the quality of pseudo-parallel data, thus further improving the performance.
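A high-level sketch of the iterative back-translation loop, with `train` and `translate` as caller-supplied (hypothetical) helpers standing in for real seq2seq training and inference:

```python
# Sketch: iterative back-translation on monolingual data (helpers injected).
def iterative_back_translation(src_mono, tgt_mono, fwd, bwd, train, translate,
                               rounds=3):
    """fwd: src->tgt model, bwd: tgt->src model; `train(model, pairs)` and
    `translate(model, text)` are hypothetical caller-supplied helpers."""
    for _ in range(rounds):
        # Back-translate target monolingual text into pseudo-sources, giving
        # pseudo-parallel pairs to retrain the forward model, and vice versa.
        pseudo_src = [translate(bwd, t) for t in tgt_mono]
        fwd = train(fwd, list(zip(pseudo_src, tgt_mono)))
        pseudo_tgt = [translate(fwd, s) for s in src_mono]
        bwd = train(bwd, list(zip(src_mono, pseudo_tgt)))
        # Errors in the pseudo-parallel data shrink as both models improve.
    return fwd, bwd
```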
17. Early Detection of Fake News by Utilizing the Credibility of News, Publishers, and Users Based on Weakly Supervised Learning [PDF] Back to Contents
Chunyuan Yuan, Qianwen Ma, Wei Zhou, Jizhong Han, Songlin Hu
Abstract: The dissemination of fake news significantly affects personal reputation and public trust. Recently, fake news detection has attracted tremendous attention, and previous studies mainly focused on finding clues from news content or diffusion path. However, the required features of previous models are often unavailable or insufficient in early detection scenarios, resulting in poor performance. Thus, early fake news detection remains a tough challenge. Intuitively, the news from trusted and authoritative sources or shared by many users with a good reputation is more reliable than other news. Using the credibility of publishers and users as prior weakly supervised information, we can quickly locate fake news in massive news and detect them in the early stages of dissemination. In this paper, we propose a novel Structure-aware Multi-head Attention Network (SMAN), which combines the news content, publishing, and reposting relations of publishers and users, to jointly optimize the fake news detection and credibility prediction tasks. In this way, we can explicitly exploit the credibility of publishers and users for early fake news detection. We conducted experiments on three real-world datasets, and the results show that SMAN can detect fake news in 4 hours with an accuracy of over 91%, which is much faster than the state-of-the-art models.
18. A Topological Method for Comparing Document Semantics [PDF] Back to Contents
Yuqi Kong, Fanchao Meng, Benjamin Carterette
Abstract: Comparing document semantics is one of the toughest tasks in both Natural Language Processing and Information Retrieval. To date, on one hand, the tools for this task are still rare. On the other hand, most relevant methods are devised from the statistic or the vector space model perspectives but nearly none from a topological perspective. In this paper, we hope to make a different sound. A novel algorithm based on topological persistence for comparing semantic similarity between two documents is proposed. Our experiments are conducted on a document dataset with human judges' results. A collection of state-of-the-art methods are selected for comparison. The experimental results show that our algorithm can produce highly human-consistent results, and also beats most state-of-the-art methods, though it ties with NLTK.
19. Unsupervised Label Refinement Improves Dataless Text Classification [PDF] Back to Contents
Zewei Chu, Karl Stratos, Kevin Gimpel
Abstract: Dataless text classification is capable of classifying documents into previously unseen labels by assigning a score to any document paired with a label description. While promising, it crucially relies on accurate descriptions of the label set for each downstream task. This reliance causes dataless classifiers to be highly sensitive to the choice of label descriptions and hinders the broader application of dataless classification in practice. In this paper, we ask the following question: how can we improve dataless text classification using the inputs of the downstream task dataset? Our primary solution is a clustering based approach. Given a dataless classifier, our approach refines its set of predictions using k-means clustering. We demonstrate the broad applicability of our approach by improving the performance of two widely used classifier architectures, one that encodes text-category pairs with two independent encoders and one with a single joint encoder. Experiments show that our approach consistently improves dataless classification across different datasets and makes the classifier more robust to the choice of label descriptions.
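The clustering-based refinement can be sketched as follows: cluster the document embeddings, then overwrite each cluster's predictions with the cluster's majority dataless label. The choice of k and the embedding source are assumptions; the paper explores several variants.

```python
# Sketch: refine dataless predictions with k-means majority voting.
import numpy as np
from sklearn.cluster import KMeans

def refine_labels(doc_embeddings, dataless_preds, n_clusters):
    clusters = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(doc_embeddings)
    refined = np.array(dataless_preds).copy()
    for c in range(n_clusters):
        members = clusters == c
        if members.any():
            # Majority vote within the cluster smooths noisy predictions.
            values, counts = np.unique(refined[members], return_counts=True)
            refined[members] = values[np.argmax(counts)]
    return refined
```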
20. Improving Human-Labeled Data through Dynamic Automatic Conflict Resolution [PDF] Back to Contents
David Q. Sun, Hadas Kotek, Christopher Klein, Mayank Gupta, William Li, Jason D. Williams
Abstract: This paper develops and implements a scalable methodology for (a) estimating the noisiness of labels produced by a typical crowdsourcing semantic annotation task, and (b) reducing the resulting error of the labeling process by as much as 20-30% in comparison to other common labeling strategies. Importantly, this new approach to the labeling process, which we name Dynamic Automatic Conflict Resolution (DACR), does not require a ground truth dataset and is instead based on inter-project annotation inconsistencies. This makes DACR not only more accurate but also available to a broad range of labeling tasks. In what follows we present results from a text classification task performed at scale for a commercial personal assistant, and evaluate the inherent ambiguity uncovered by this annotation strategy as compared to other common labeling strategies.
21. Using multiple ASR hypotheses to boost i18n NLU performance [PDF] Back to Contents
Charith Peris, Gokmen Oz, Khadige Abboud, Venkata sai Varada, Prashan Wanigasekara, Haidar Khan
Abstract: Current voice assistants typically use the best hypothesis yielded by their Automatic Speech Recognition (ASR) module as input to their Natural Language Understanding (NLU) module, thereby losing helpful information that might be stored in lower-ranked ASR hypotheses. We explore the change in performance of NLU associated tasks when utilizing five-best ASR hypotheses when compared to status quo for two language datasets, German and Portuguese. To harvest information from the ASR five-best, we leverage extractive summarization and joint extractive-abstractive summarization models for Domain Classification (DC) experiments while using a sequence-to-sequence model with a pointer generator network for Intent Classification (IC) and Named Entity Recognition (NER) multi-task experiments. For the DC full test set, we observe significant improvements of up to 7.2% and 15.5% in micro-averaged F1 scores, for German and Portuguese, respectively. In cases where the best ASR hypothesis was not an exact match to the transcribed utterance (mismatched test set), we see improvements of up to 6.7% and 8.8% micro-averaged F1 scores, for German and Portuguese, respectively. For IC and NER multi-task experiments, when evaluating on the mismatched test set, we see improvements across all domains in German and in 17 out of 19 domains in Portuguese (improvements based on change in SeMER scores). Our results suggest that the use of multiple ASR hypotheses, as opposed to one, can lead to significant performance improvements in the DC task for these non-English datasets. In addition, it could lead to significant improvement in the performance of IC and NER tasks in cases where the ASR model makes mistakes.
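As a much simpler stand-in for the summarization and pointer-generator models used in the paper, the core input change can be sketched as joining the n-best hypotheses into one sequence with a separator so the downstream DC/IC/NER model sees all of them. The separator and plain concatenation are illustrative assumptions.

```python
# Sketch: feed the ASR n-best list to NLU instead of only the 1-best.
def build_nlu_input(asr_nbest, sep=" [SEP] "):
    """asr_nbest: list of hypothesis strings, best first."""
    return sep.join(asr_nbest)

hyps = ["spiele musik von queen", "spiele musik von quinn", "spiel musik von queen"]
print(build_nlu_input(hyps))
```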
22. Frame-level SpecAugment for Deep Convolutional Neural Networks in Hybrid ASR Systems [PDF] 返回目录
Xinwei Li, Yuanyuan Zhang, Xiaodan Zhuang, Daben Liu
Abstract: Inspired by SpecAugment -- a data augmentation method for end-to-end ASR systems -- we propose a frame-level SpecAugment method (f-SpecAugment) to improve the performance of deep convolutional neural networks (CNNs) in hybrid HMM-based ASR systems. Similar to utterance-level SpecAugment, f-SpecAugment performs three transformations: time warping, frequency masking, and time masking. Instead of applying the transformations at the utterance level, f-SpecAugment applies them to each convolution window independently during training. We demonstrate that f-SpecAugment is more effective than utterance-level SpecAugment for deep CNN-based hybrid models. We evaluate the proposed f-SpecAugment on 50-layer Self-Normalizing Deep CNN (SNDCNN) acoustic models trained with up to 25000 hours of training data. We observe that f-SpecAugment reduces WER by 0.5-4.5% relative across different ASR tasks for four languages. As the benefits of augmentation techniques tend to diminish as training data size increases, the large-scale training reported is important for understanding the effectiveness of f-SpecAugment. Our experiments demonstrate that even with 25k hours of training data, f-SpecAugment is still effective. We also demonstrate that f-SpecAugment has benefits approximately equivalent to doubling the amount of training data for deep CNNs.
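A minimal sketch of the two masking transformations, applied to a single convolution window as the abstract describes; time warping is omitted, and the feature layout and mask widths are assumptions rather than the paper's settings.

```python
import numpy as np

def f_specaugment_window(window, max_f=8, max_t=10, rng=None):
    """Frequency and time masking on one convolution window of
    log-mel features with shape (frames, mel_bins)."""
    rng = rng or np.random.default_rng()
    out = window.copy()
    t, f = out.shape
    fw = int(rng.integers(0, max_f + 1))            # frequency-mask width
    f0 = int(rng.integers(0, max(1, f - fw + 1)))   # frequency-mask start bin
    out[:, f0:f0 + fw] = 0.0
    tw = int(rng.integers(0, max_t + 1))            # time-mask width
    t0 = int(rng.integers(0, max(1, t - tw + 1)))   # time-mask start frame
    out[t0:t0 + tw, :] = 0.0
    return out

# Each window drawn during training would be masked independently:
window = np.random.rand(20, 40)                     # 20 frames x 40 mel bins
augmented = f_specaugment_window(window)
```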
23. A Taxonomy of Empathetic Response Intents in Human Social Conversations [PDF] 返回目录
Anuradha Welivita, Pearl Pu
Abstract: Open-domain conversational agents or chatbots are becoming increasingly popular in the natural language processing community. One of the challenges is enabling them to converse in an empathetic manner. Current neural response generation methods rely solely on end-to-end learning from large-scale conversation data to generate dialogues. This approach can produce socially unacceptable responses due to the lack of large-scale quality data used to train the neural models. However, recent work has shown the promise of combining dialogue act/intent modelling and neural response generation. This hybrid method improves the response quality of chatbots and makes them more controllable and interpretable. A key element in dialogue intent modelling is the development of a taxonomy. Inspired by this idea, we have manually labeled 500 response intents using a subset of a sizeable empathetic dialogue dataset (25K dialogues). Our goal is to produce a large-scale taxonomy for empathetic response intents. Furthermore, using lexical and machine learning methods, we automatically analysed both speaker and listener utterances of the entire dataset with identified response intents and 32 emotion categories. Finally, we use information visualization methods to summarize emotional dialogue exchange patterns and their temporal progression. These results reveal novel and important empathy patterns in human-human open-domain conversations and can serve as heuristics for hybrid approaches.
24. Semantics Altering Modifications for Evaluating Comprehension in Machine Reading [PDF] 返回目录
Viktor Schlegel, Goran Nenadic, Riza Batista-Navarro
Abstract: Advances in NLP have yielded impressive results for the task of machine reading comprehension (MRC), with approaches having been reported to achieve performance comparable to that of humans. In this paper, we investigate whether state-of-the-art MRC models are able to correctly process Semantics Altering Modifications (SAM): linguistically-motivated phenomena that alter the semantics of a sentence while preserving most of its lexical surface form. We present a method to automatically generate and align challenge sets featuring original and altered examples. We further propose a novel evaluation methodology to correctly assess the capability of MRC systems to process these examples independent of the data they were optimised on, by discounting for effects introduced by domain shift. In a large-scale empirical study, we apply the methodology in order to evaluate extractive MRC models with regard to their capability to correctly process SAM-enriched data. We comprehensively cover 12 different state-of-the-art neural architecture configurations and four training datasets and find that -- despite their well-known remarkable performance -- optimised models consistently struggle to correctly process semantically altered data.
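The abstract does not list the modification rules; as a purely hypothetical example of the phenomenon, a single polarity flip preserves almost all of the lexical surface form while inverting the meaning of the sentence. The rule list below is invented for illustration only.

```python
# Invented rule list, only to illustrate what a SAM looks like.
SAM_RULES = [("was", "was not"), ("can", "cannot"), ("with", "without")]

def apply_sam(sentence):
    """Return surface-preserving variants whose semantics are altered."""
    variants = []
    tokens = sentence.split()
    for src, tgt in SAM_RULES:
        if src in tokens:
            i = tokens.index(src)
            variants.append(" ".join(tokens[:i] + [tgt] + tokens[i + 1:]))
    return variants

print(apply_sam("The vaccine was approved in 2019."))
# ['The vaccine was not approved in 2019.']
```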
25. Improving Clinical Document Understanding on COVID-19 Research with Spark NLP [PDF] 返回目录
Veysel Kocaman, David Talby
Abstract: Following the global COVID-19 pandemic, the number of scientific papers studying the virus has grown massively, leading to increased interest in automated literature review. We present a clinical text mining system that improves on previous efforts in three ways. First, it can recognize over 100 different entity types including social determinants of health, anatomy, risk factors, and adverse events, in addition to other commonly used clinical and biomedical entities. Second, the text processing pipeline includes assertion status detection, to distinguish between clinical facts that are present, absent, conditional, or about someone other than the patient. Third, the deep learning models used are more accurate than previously available, leveraging an integrated pipeline of state-of-the-art pretrained named entity recognition models, and improving on the previous best performing benchmarks for assertion status detection. We illustrate extracting trends and insights, e.g. most frequent disorders and symptoms, and most common vital signs and EKG findings, from the COVID-19 Open Research Dataset (CORD-19). The system is built using the Spark NLP library, which natively supports scaling to distributed clusters, leveraging GPUs, configurable and reusable NLP pipelines, healthcare-specific embeddings, and the ability to train models to support new entity types or human languages with no code changes.
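Spark NLP's assertion models are pretrained networks, not rules; purely to make the task concrete, the cue-word toy below mimics the kind of decision an assertion status detector outputs. The cue lists and the single-token entity assumption are simplifications invented here, not the library's API.

```python
NEGATION_CUES = {"no", "denies", "without", "absent"}
CONDITIONAL_CUES = {"if", "unless", "should"}
FAMILY_CUES = {"mother", "father", "family"}

def assertion_status(sentence, entity):
    """Toy assertion labeler: classify an extracted clinical entity as
    present / absent / conditional / family from cue words before it."""
    tokens = sentence.lower().split()
    cutoff = tokens.index(entity.lower()) if entity.lower() in tokens else len(tokens)
    window = set(tokens[:cutoff])
    if window & NEGATION_CUES:
        return "absent"
    if window & CONDITIONAL_CUES:
        return "conditional"
    if window & FAMILY_CUES:
        return "family"
    return "present"

print(assertion_status("patient denies fever or chills", "fever"))  # absent
```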
26. CX DB8: A queryable extractive summarizer and semantic search engine [PDF] 返回目录
Allen Roush
Abstract: Competitive Debate's increasingly technical nature has left competitors looking for tools to accelerate evidence production. We find that the unique type of extractive summarization performed by competitive debaters -- summarization with a bias towards a particular target meaning -- can be performed using the latest innovations in unsupervised pre-trained text vectorization models. We introduce CX_DB8, a queryable word-level extractive summarizer and evidence creation framework, which allows for rapid, biasable summarization of arbitrarily sized texts. CX_DB8's usage of the embedding framework Flair means that as the underlying models improve, CX_DB8 will also improve. We observe that CX_DB8 also functions as a semantic search engine, and has application as a supplement to traditional "find" functionality in programs and webpages. CX_DB8 is currently used by competitive debaters and is made available to the public at this https URL
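CX_DB8 ranks spans with Flair embeddings; to stay self-contained, the sketch below swaps in crude bag-of-words vectors but keeps the core mechanism, ranking pieces of a document by similarity to a query. Everything here is a stand-in, not the tool's actual code.

```python
import numpy as np
from collections import Counter

def embed(text, vocab):
    """Bag-of-words count vector: a crude stand-in for Flair embeddings."""
    counts = Counter(text.lower().split())
    return np.array([counts[w] for w in vocab], dtype=float)

def query_biased_summary(document, query, keep=2):
    """Rank naively split sentences by cosine similarity to the query."""
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    vocab = sorted({w for s in sentences + [query] for w in s.lower().split()})
    q = embed(query, vocab)
    def score(s):
        v = embed(s, vocab)
        denom = np.linalg.norm(v) * np.linalg.norm(q)
        return (v @ q) / denom if denom else 0.0
    return sorted(sentences, key=score, reverse=True)[:keep]

doc = ("Renewable subsidies lower emissions. Critics cite grid costs. "
       "Solar adoption accelerated after tax credits expanded.")
print(query_biased_summary(doc, "solar tax policy", keep=1))
```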
27. Discovering key topics from short, real-world medical inquiries via natural language processing and unsupervised learning [PDF] 返回目录
Angelo Ziletti, Christoph Berns, Oliver Treichel, Thomas Weber, Jennifer Liang, Stephanie Kammerath, Marion Schwaerzler, Jagatheswari Virayah, David Ruau, Xin Ma, Andreas Mattern
Abstract: Millions of unsolicited medical inquiries are received by pharmaceutical companies every year. It has been hypothesized that these inquiries represent a treasure trove of information, potentially giving insight into matters regarding medicinal products and the associated medical treatments. However, due to the large volume and specialized nature of the inquiries, it is difficult to perform timely, recurrent, and comprehensive analyses. Here, we propose a machine learning approach based on natural language processing and unsupervised learning to automatically discover key topics in real-world medical inquiries from customers. This approach does not require ontologies nor annotations. The discovered topics are meaningful and medically relevant, as judged by medical information specialists, thus demonstrating that unsolicited medical inquiries are a source of valuable customer insights. Our work paves the way for the machine-learning-driven analysis of medical inquiries in the pharmaceutical industry, which ultimately aims at improving patient care.
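The abstract does not name the pipeline; a generic TF-IDF plus k-means stand-in (scikit-learn) illustrates the kind of ontology-free, annotation-free topic discovery described, run here on invented inquiry texts.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

inquiries = [  # invented examples of short medical inquiries
    "can drug X be taken together with anticoagulants",
    "recommended dosage of drug X in renal impairment",
    "is drug Y safe to use during pregnancy",
    "drug Y and breastfeeding safety data",
]
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(inquiries)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for k in range(2):  # describe each discovered topic by its top-weighted terms
    top = [terms[i] for i in km.cluster_centers_[k].argsort()[::-1][:3]]
    print(f"topic {k}: {top}")
```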
28. CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions [PDF] 返回目录
Tayfun Ates, Muhammed Samil Atesoglu, Cagatay Yigit, Ilker Kesen, Mert Kobas, Erkut Erdem, Aykut Erdem, Tilbe Goksun, Deniz Yuret
Abstract: Recent advances in Artificial Intelligence and deep learning have revived the interest in studying the gap between the reasoning capabilities of humans and machines. In this ongoing work, we introduce CRAFT, a new visual question answering dataset that requires causal reasoning about physical forces and object interactions. It contains 38K video and question pairs that are generated from 3K videos from 10 different virtual environments, containing different numbers of moving objects that interact with each other. Two question categories from CRAFT include previously studied descriptive and counterfactual questions. In addition, inspired by the theory of force dynamics from the field of human cognitive psychology, we introduce new question categories that involve understanding the intentions of objects through the notions of cause, enable, and prevent. Our preliminary results demonstrate that even though these tasks are very intuitive for humans, the implemented baselines could not cope with the underlying challenges.
29. Efficient Estimation of Influence of a Training Instance [PDF] 返回目录
Sosuke Kobayashi, Sho Yokoi, Jun Suzuki, Kentaro Inui
Abstract: Understanding the influence of a training instance on a neural network model improves interpretability. However, evaluating this influence -- how a model's prediction would change if a training instance were not used -- is difficult and inefficient. In this paper, we propose an efficient method for estimating the influence. Our method is inspired by dropout, which zero-masks a sub-network and prevents the sub-network from learning each training instance. By switching between dropout masks, we can use sub-networks that learned or did not learn each training instance and estimate its influence. Through experiments with BERT and VGGNet on classification datasets, we demonstrate that the proposed method can capture training influences, enhance the interpretability of error predictions, and cleanse the training dataset for improving generalization.
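A rough sketch of the mask bookkeeping this implies, under the simplifying assumption that an instance's seeded mask selects the sub-network trained on it, while the complementary mask selects one that never saw it; the function names and sign convention are guesses for illustration, not the paper's code.

```python
import numpy as np

def instance_mask(index, n_units=16, p_keep=0.5):
    """Deterministic per-instance dropout mask: seeding with the instance
    index reproduces the same mask at training and estimation time."""
    r = np.random.default_rng(index)
    return (r.random(n_units) < p_keep).astype(float)

def estimate_influence(predict, params, x_test, y_test, train_index, loss):
    """Influence of one training instance on a test example: loss of the
    sub-network that never learned it minus loss of the one that did."""
    m = instance_mask(train_index)
    learned = loss(predict(params, x_test, mask=m), y_test)
    unlearned = loss(predict(params, x_test, mask=1.0 - m), y_test)
    return unlearned - learned   # positive: the instance helped this prediction
```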
30. Learning to Represent Programs with Heterogeneous Graphs [PDF] 返回目录
Wenhan Wang, Kechi Zhang, Ge Li, Zhi Jin
Abstract: Program source code contains complex structure information, which can be represented in structured data forms like trees or graphs. To acquire the structural information in source code, most existing research uses abstract syntax trees (AST). Several works add additional edges to ASTs to convert source code into graphs and use graph neural networks to learn representations for program graphs. Although these works provide additional control- or data-flow information to ASTs for downstream tasks, they neglect an important aspect of structure information in the AST itself: the different types of nodes and edges. In ASTs, different nodes contain different kinds of information, like variables or control flow, and the relation between a node and all its children can also differ. To capture node and edge type information, we bring the idea of heterogeneous graphs to learning on source code and present a new formula for building heterogeneous program graphs from ASTs, with additional type information for nodes and edges. We use the ASDL grammar of the programming language to define the node and edge types of program graphs. Then we use heterogeneous graph neural networks to learn on these graphs. We evaluate our approach on two tasks: code comment generation and method naming. Both tasks require reasoning on the semantics of complete code snippets. Experimental results show that our approach outperforms baseline models, including homogeneous graph-based models, showing that leveraging the type information of nodes and edges in program graphs can help in learning program semantics.
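Python's own `ast` module, whose node fields are themselves generated from an ASDL grammar, makes the construction easy to illustrate; the sketch below records each node's AST type and uses the parent field name as the edge type. The flat node/edge-list encoding is a simplification of what a heterogeneous GNN would actually consume.

```python
import ast

def ast_to_typed_graph(source):
    """Heterogeneous graph from Python source: node labels are AST node
    types, edge labels are the ASDL field names they descend through."""
    nodes, edges = [], []

    def visit(node, parent_id=None, field=None):
        node_id = len(nodes)
        nodes.append(type(node).__name__)               # e.g. 'FunctionDef'
        if parent_id is not None:
            edges.append((parent_id, node_id, field))   # typed edge
        for name, value in ast.iter_fields(node):
            children = value if isinstance(value, list) else [value]
            for child in children:
                if isinstance(child, ast.AST):
                    visit(child, node_id, name)

    visit(ast.parse(source))
    return nodes, edges

nodes, edges = ast_to_typed_graph("def add(a, b):\n    return a + b\n")
print(nodes[:4], edges[:3])
```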