Contents
1. The Devil is in the Details: Evaluating Limitations of Transformer-based Methods for Granular Tasks [PDF] Abstract
2. QMUL-SDS at SardiStance2020: Leveraging Network Interactions to Boost Performance on Stance Detection using Knowledge Graphs [PDF] Abstract
3. Introducing various Semantic Models for Amharic: Experimentation and Evaluation with multiple Tasks and Datasets [PDF] Abstract
4. Automated Transcription of Non-Latin Script Periodicals: A Case Study in the Ottoman Turkish Print Archive [PDF] Abstract
5. Improving Variational Autoencoder for Text Modelling with Timestep-Wise Regularisation [PDF] Abstract
6. Generating Knowledge Graphs by Employing Natural Language Processing and Machine Learning Techniques within the Scholarly Domain [PDF] Abstract
7. Exploring Question-Specific Rewards for Generating Deep Questions [PDF] Abstract
8. Enabling Zero-shot Multilingual Spoken Language Translation with Language-Specific Encoders and Decoders [PDF] Abstract
9. Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps [PDF] Abstract
10. Biased TextRank: Unsupervised Graph-Based Content Extraction [PDF] Abstract
11. DNN-Based Semantic Model for Rescoring N-best Speech Recognition List [PDF] Abstract
12. Combining Event Semantics and Degree Semantics for Natural Language Inference [PDF] Abstract
13. A Closer Look at Linguistic Knowledge in Masked Language Models: The Case of Relative Clauses in American English [PDF] Abstract
14. An Empirical Study of Contextual Data Augmentation for Japanese Zero Anaphora Resolution [PDF] Abstract
15. How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention [PDF] Abstract
16. Emergent Communication Pretraining for Few-Shot Machine Translation [PDF] Abstract
17. Comparison by Conversion: Reverse-Engineering UCCA from Syntax and Lexical Semantics [PDF] Abstract
18. Hierarchical Bi-Directional Self-Attention Networks for Paper Review Rating Recommendation [PDF] Abstract
21. COSMO: Conditional SEQ2SEQ-based Mixture Model for Zero-Shot Commonsense Question Answering [PDF] Abstract
26. Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation [PDF] Abstract
34. Investigating Catastrophic Forgetting During Continual Training for Neural Machine Translation [PDF] Abstract
35. IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP [PDF] Abstract
40. Improving Conversational Question Answering Systems after Deployment using Feedback-Weighted Learning [PDF] Abstract
46. Recent Neural Methods on Slot Filling and Intent Classification for Task-Oriented Dialogue Systems: A Survey [PDF] Abstract
47. WLV-RIT at HASOC-Dravidian-CodeMix-FIRE2020: Offensive Language Identification in Code-switched YouTube Comments [PDF] Abstract
48. SMRT Chatbots: Improving Non-Task-Oriented Dialog with Simulated Multiple Reference Training [PDF] Abstract
50. Deep Diacritization: Efficient Hierarchical Recurrence for Improved Arabic Diacritization [PDF] Abstract
51. CHIME: Cross-passage Hierarchical Memory Network for Generative Review Question Answering [PDF] Abstract
52. Deconstruct to Reconstruct a Configurable Evaluation Metric for Open-Domain Dialogue Systems [PDF] Abstract
54. Opinion Transmission Network for Jointly Improving Aspect-oriented Opinion Words Extraction and Sentiment Classification [PDF] Abstract
55. Seeing Both the Forest and the Trees: Multi-head Attention for Joint Classification on Different Compositional Levels [PDF] Abstract
60. Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies [PDF] Abstract
61. Towards A Friendly Online Community: An Unsupervised Style Transfer Framework for Profanity Redaction [PDF] Abstract
62. Investigation of BERT Model on Biomedical Relation Extraction Based on Revised Fine-tuning Mechanism [PDF] Abstract
66. Pick a Fight or Bite your Tongue: Investigation of Gender Differences in Idiomatic Language Usage [PDF] Abstract
67. Effective Approach to Develop a Sentiment Annotator For Legal Domain in a Low Resource Setting [PDF] Abstract
70. Rumor Detection on Twitter Using Multiloss Hierarchical BiLSTM with an Attenuation Factor [PDF] Abstract
76. Learning Structured Representations of Entity Names using Active Learning and Weak Supervision [PDF] Abstract
85. Unification of HDP and LDA Models for Optimal Topic Clustering of Subject Specific Question Banks [PDF] Abstract
90. DeepOpht: Medical Report Generation for Retinal Images via Deep Models and Visual Explanation [PDF] Abstract
92. Semantic similarity-based approach to enhance supervised classification learning accuracy [PDF] Abstract
93. Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization [PDF] Abstract
Abstracts
1. The Devil is in the Details: Evaluating Limitations of Transformer-based Methods for Granular Tasks [PDF] Back to Contents
Brihi Joshi, Neil Shah, Francesco Barbieri, Leonardo Neves
Abstract: Contextual embeddings derived from transformer-based neural language models have shown state-of-the-art performance for various tasks such as question answering, sentiment analysis, and textual similarity in recent years. Extensive work shows how accurately such models can represent abstract, semantic information present in text. In this expository work, we explore a tangent direction and analyze such models' performance on tasks that require a more granular level of representation. We focus on the problem of textual similarity from two perspectives: matching documents on a granular level (requiring embeddings to capture fine-grained attributes in the text), and an abstract level (requiring embeddings to capture overall textual semantics). We empirically demonstrate, across two datasets from different domains, that despite high performance in abstract document matching as expected, contextual embeddings are consistently (and at times, vastly) outperformed by simple baselines like TF-IDF for more granular tasks. We then propose a simple but effective method to incorporate TF-IDF into models that use contextual embeddings, achieving relative improvements of up to 36% on granular tasks.
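The abstract does not spell out how TF-IDF is folded into the contextual-embedding models, so the sketch below only illustrates one plausible reading: interpolating TF-IDF cosine similarity with embedding cosine similarity for document matching. The `embed` callable, the pooled document vectors, and the weight `alpha` are assumptions for illustration, not the paper's method.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def hybrid_similarity(docs_a, docs_b, embed, alpha=0.5):
    """Blend TF-IDF and contextual-embedding similarities for document pairs.

    embed: callable mapping a document string to a 1-D numpy vector
    (e.g., a mean-pooled BERT sentence embedding); alpha weights TF-IDF.
    """
    tfidf = TfidfVectorizer().fit(docs_a + docs_b)
    sim_tfidf = cosine_similarity(tfidf.transform(docs_a), tfidf.transform(docs_b))
    emb_a = np.stack([embed(d) for d in docs_a])
    emb_b = np.stack([embed(d) for d in docs_b])
    sim_emb = cosine_similarity(emb_a, emb_b)
    return alpha * sim_tfidf + (1 - alpha) * sim_emb   # shape (len(docs_a), len(docs_b))
```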
2. QMUL-SDS at SardiStance2020: Leveraging Network Interactions to Boost Performance on Stance Detection using Knowledge Graphs [PDF] Back to Contents
Rabab Alkhalifa, Arkaitz Zubiaga
Abstract: This paper presents our submission to the SardiStance 2020 shared task, describing the architecture used for Task A and Task B. While our submission for Task A did not exceed the baseline, retraining our model using all the training tweets showed promising results, reaching f-avg 0.601 using a bidirectional LSTM with BERT multilingual embeddings for Task A. For our submission for Task B, we ranked 6th (f-avg 0.709). With further investigation, our best experimented settings increased performance from f-avg 0.573 to f-avg 0.733 with the same architecture and parameter settings, after only incorporating social interaction features -- highlighting the impact of social interaction on the model's performance.
3. Introducing various Semantic Models for Amharic: Experimentation and Evaluation with multiple Tasks and Datasets [PDF] Back to Contents
Seid Muhie Yimam, Abinew Ali Ayele, Gopalakrishnan Venkatesh, Chris Biemann
Abstract: The availability of different pre-trained semantic models enabled the quick development of machine learning components for downstream applications. Despite the availability of abundant text data for low resource languages, only a few semantic models are publicly available. Publicly available pre-trained models are usually built as a multilingual version of semantic models that cannot fit well for each language due to context variations. In this work, we introduce different semantic models for Amharic. After we experiment with the existing pre-trained semantic models, we trained and fine-tuned nine new different models using a monolingual text corpus. The models are built using word2Vec embeddings, distributional thesaurus (DT), contextual embeddings, and DT embeddings obtained via network embedding algorithms. Moreover, we employ these models for different NLP tasks and investigate their impact. We find that newly trained models perform better than pre-trained multilingual models. Furthermore, models based on contextual embeddings from RoBERTA perform better than the word2Vec models.
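As a concrete instance of one model family the paper trains, the sketch below fits static word2vec embeddings on a monolingual corpus with gensim. The corpus path, hyperparameters, and the query word are illustrative assumptions, not the paper's settings.

```python
from gensim.models import Word2Vec

# One whitespace-tokenised sentence per line; the path is a placeholder.
with open("amharic_corpus.txt", encoding="utf-8") as f:
    sentences = [line.split() for line in f]

# Skip-gram (sg=1) with 300-dimensional vectors; values are illustrative.
model = Word2Vec(sentences, vector_size=300, window=5, min_count=5, sg=1, workers=4)
model.save("amharic.w2v")
print(model.wv.most_similar("ኢትዮጵያ", topn=5))  # hypothetical query word ("Ethiopia")
```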
4. Automated Transcription of Non-Latin Script Periodicals: A Case Study in the Ottoman Turkish Print Archive [PDF] Back to Contents
Suphan Kirmizialtin, David Wrisley
Abstract: Our study utilizes deep learning methods for the automated transcription of late nineteenth- and early twentieth-century periodicals written in Arabic script Ottoman Turkish (OT) using the Transkribus platform. We discuss the historical situation of OT text collections and how they were excluded for the most part from the late twentieth century corpora digitization that took place in many Latin script languages. This exclusion has two basic reasons: the technical challenges of OCR for Arabic script languages, and the rapid abandonment of that very script in the Turkish historical context. In the specific case of OT, opening periodical collections to digital tools requires training HTR models to generate transcriptions in the Latin writing system of contemporary readers of Turkish, and not, as some may expect, in right-to-left Arabic script text. In the paper we discuss the challenges of training such models where one-to-one correspondence between the writing systems does not exist, and we report results based on our HTR experiments with two OT periodicals from the early twentieth century. Finally, we reflect on potential domain bias of HTR models in historical languages exhibiting spatio-temporal variance as well as the significance of working between writing systems for language communities that have experienced language reform and script change.
5. Improving Variational Autoencoder for Text Modelling with Timestep-Wise Regularisation [PDF] Back to Contents
Ruizhe Li, Xiao Li, Guanyi Chen, Chenghua Lin
Abstract: The Variational Autoencoder (VAE) is a popular and powerful model applied to text modelling to generate diverse sentences. However, an issue known as posterior collapse (or KL loss vanishing) happens when the VAE is used in text modelling, where the approximate posterior collapses to the prior, and the model will totally ignore the latent variables and be degraded to a plain language model during text generation. Such an issue is particularly prevalent when RNN-based VAE models are employed for text modelling. In this paper, we propose a simple, generic architecture called Timestep-Wise Regularisation VAE (TWR-VAE), which can effectively avoid posterior collapse and can be applied to any RNN-based VAE models. The effectiveness and versatility of our model are demonstrated in different tasks, including language modelling and dialogue response generation.
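Reading "timestep-wise regularisation" literally, the sketch below applies the KL term to the posterior parameters produced at every RNN timestep rather than only at the final hidden state, which is one way to keep the latent path informative and counter posterior collapse. This is an assumption about the mechanism based on the model's name; the paper's exact formulation may differ.

```python
import torch
import torch.nn as nn

class TWRVAEEncoder(nn.Module):
    """RNN encoder with a KL penalty at every timestep (illustrative)."""

    def __init__(self, vocab_size, emb_dim=256, hid_dim=512, z_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.to_mu = nn.Linear(hid_dim, z_dim)
        self.to_logvar = nn.Linear(hid_dim, z_dim)

    def forward(self, tokens):                      # tokens: (B, T)
        h, _ = self.rnn(self.embed(tokens))         # (B, T, hid_dim)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # KL(q(z_t | x) || N(0, I)) at each timestep, averaged over batch and time
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterisation
        return z, kl
```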
6. Generating Knowledge Graphs by Employing Natural Language Processing and Machine Learning Techniques within the Scholarly Domain [PDF] Back to Contents
Danilo Dessì, Francesco Osborne, Diego Reforgiato Recupero, Davide Buscaldi, Enrico Motta
Abstract: The continuous growth of scientific literature brings innovations and, at the same time, raises new challenges. One of them is related to the fact that its analysis has become difficult due to the high volume of published papers for which manual effort for annotations and management is required. Novel technological infrastructures are needed to help researchers, research policy makers, and companies to time-efficiently browse, analyse, and forecast scientific research. Knowledge graphs, i.e., large networks of entities and relationships, have proved to be an effective solution in this space. Scientific knowledge graphs focus on the scholarly domain and typically contain metadata describing research publications such as authors, venues, organizations, research topics, and citations. However, the current generation of knowledge graphs lacks an explicit representation of the knowledge presented in the research papers. As such, in this paper, we present a new architecture that takes advantage of Natural Language Processing and Machine Learning methods for extracting entities and relationships from research publications and integrates them in a large-scale knowledge graph. Within this research work, we i) tackle the challenge of knowledge extraction by employing several state-of-the-art Natural Language Processing and Text Mining tools, ii) describe an approach for integrating entities and relationships generated by these tools, iii) show the advantage of such a hybrid system over alternative approaches, and iv) as a chosen use case, we generated a scientific knowledge graph including 109,105 triples, extracted from 26,827 abstracts of papers within the Semantic Web domain. As our approach is general and can be applied to any domain, we expect that it can facilitate the management, analysis, dissemination, and processing of scientific knowledge.
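The abstract describes integrating entities and relationships produced by several extraction tools into one large graph. The sketch below shows only that merge step, keeping per-tool provenance on each edge so that agreement between tools could later back a confidence score; the tool names and triples are hypothetical.

```python
import networkx as nx

def merge_triples(tool_outputs):
    """tool_outputs: dict mapping tool name -> list of (subject, relation, object)."""
    kg = nx.MultiDiGraph()
    for tool, triples in tool_outputs.items():
        for subj, rel, obj in triples:
            # Normalise surface forms; keep the producing tool as provenance.
            kg.add_edge(subj.lower(), obj.lower(), relation=rel, source=tool)
    return kg

kg = merge_triples({
    "toolA": [("BERT", "used-for", "question answering")],
    "toolB": [("BERT", "used-for", "question answering"),
              ("word2vec", "compared-to", "BERT")],
})
print(kg.number_of_edges())  # 3 -- duplicate edges retain per-tool provenance
```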
7. Exploring Question-Specific Rewards for Generating Deep Questions [PDF] Back to Contents
Yuxi Xie, Liangming Pan, Dongzhe Wang, Min-Yen Kan, Yansong Feng
Abstract: Recent question generation (QG) approaches often utilize the sequence-to-sequence framework (Seq2Seq) to optimize the log-likelihood of ground-truth questions using teacher forcing. However, this training objective is inconsistent with actual question quality, which is often reflected by certain global properties such as whether the question can be answered by the document. As such, we directly optimize for QG-specific objectives via reinforcement learning to improve question quality. We design three different rewards that target to improve the fluency, relevance, and answerability of generated questions. We conduct both automatic and human evaluations in addition to a thorough analysis to explore the effect of each QG-specific reward. We find that optimizing question-specific rewards generally leads to better performance in automatic evaluation metrics. However, only the rewards that correlate well with human judgement (e.g., relevance) lead to real improvement in question quality. Optimizing for the others, especially answerability, introduces incorrect bias to the model, resulting in poor question quality. Our code is publicly available at this https URL.
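The abstract reports that the rewards are optimised via reinforcement learning. A minimal REINFORCE-style loss of the kind such systems use is sketched below: sampled questions are scored by the three rewards, and their log-likelihood is weighted by the baseline-subtracted mixed reward. The reward inputs and mixing weights are placeholders, not the paper's design.

```python
import torch

def rl_loss(log_probs, r_fluency, r_relevance, r_answerability, w=(1.0, 1.0, 1.0)):
    """REINFORCE-style loss for sampled questions.

    log_probs: (B,) summed token log-probs of each sampled question;
    r_*: (B,) per-question reward scores produced by external scorers.
    """
    with torch.no_grad():
        reward = w[0] * r_fluency + w[1] * r_relevance + w[2] * r_answerability
        reward = reward - reward.mean()   # mean baseline to reduce variance
    return -(reward * log_probs).mean()   # minimise negative expected reward
```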
8. Enabling Zero-shot Multilingual Spoken Language Translation with Language-Specific Encoders and Decoders [PDF] Back to Contents
Carlos Escolano, Marta R. Costa-jussà, José A. R. Fonollosa, Carlos Segura
Abstract: Current end-to-end approaches to Spoken Language Translation (SLT) rely on limited training resources, especially for multilingual settings. On the other hand, Multilingual Neural Machine Translation (MultiNMT) approaches rely on higher quality and more massive data sets. Our proposed method extends a MultiNMT architecture based on language-specific encoders-decoders to the task of Multilingual SLT (MultiSLT). Our experiments on four different languages show that coupling the speech encoder to the MultiNMT architecture produces similar quality translations compared to a bilingual baseline (±0.2 BLEU) while effectively allowing for zero-shot MultiSLT. Additionally, we propose using Adapter networks for SLT that produce consistent improvements of +1 BLEU points in all tested languages.
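For the adapter networks, the sketch below shows the standard bottleneck design (down-projection, nonlinearity, up-projection, residual connection) commonly inserted into frozen encoder/decoder layers; the bottleneck width and placement are assumptions, as the abstract does not specify them.

```python
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter inserted after a (frozen) Transformer sub-layer."""

    def __init__(self, d_model=512, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x):
        # Residual connection keeps the pretrained representation intact.
        return x + self.up(self.act(self.down(x)))
```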
9. Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps [PDF] Back to Contents
Xanh Ho, Anh-Khoa Duong Nguyen, Saku Sugawara, Akiko Aizawa
Abstract: A multi-hop question answering (QA) dataset aims to test reasoning and inference skills by requiring a model to read multiple paragraphs to answer a given question. However, current datasets do not provide a complete explanation for the reasoning process from the question to the answer. Further, previous studies revealed that many examples in existing multi-hop datasets do not require multi-hop reasoning to answer a question. In this study, we present a new multi-hop QA dataset, called 2WikiMultiHopQA, which uses structured and unstructured data. In our dataset, we introduce the evidence information containing a reasoning path for multi-hop questions. The evidence information has two benefits: (i) providing a comprehensive explanation for predictions and (ii) evaluating the reasoning skills of a model. We carefully design a pipeline and a set of templates when generating a question-answer pair that guarantees the multi-hop steps and the quality of the questions. We also exploit the structured format in Wikidata and use logical rules to create questions that are natural but still require multi-hop reasoning. Through experiments, we demonstrate that our dataset is challenging for multi-hop models and it ensures that multi-hop reasoning is required.
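To make the evidence idea concrete, a hypothetical instance is shown below: the reasoning path is a list of (entity, relation, entity) triples, in the spirit of the Wikidata-derived evidence the abstract describes. The field names and the example itself are illustrative assumptions, not the dataset's actual schema.

```python
# Hypothetical 2WikiMultiHopQA-style instance; the schema is an assumption.
example = {
    "question": "Who directed the film that won the Academy Award for Best Picture in 1995?",
    "context": [
        "...paragraph about the 67th Academy Awards...",
        "...paragraph about the winning film...",
    ],
    "evidence": [  # reasoning path as (entity, relation, entity) triples
        ("Forrest Gump", "award received", "Academy Award for Best Picture"),
        ("Forrest Gump", "director", "Robert Zemeckis"),
    ],
    "answer": "Robert Zemeckis",
}
```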
10. Biased TextRank: Unsupervised Graph-Based Content Extraction [PDF] Back to Contents
Ashkan Kazemi, Verónica Pérez-Rosas, Rada Mihalcea
Abstract: We introduce Biased TextRank, a graph-based content extraction method inspired by the popular TextRank algorithm that ranks text spans according to their importance for language processing tasks and according to their relevance to an input "focus." Biased TextRank enables focused content extraction for text by modifying the random restarts in the execution of TextRank. The random restart probabilities are assigned based on the relevance of the graph nodes to the focus of the task. We present two applications of Biased TextRank: focused summarization and explanation extraction, and show that our algorithm leads to improved performance on two different datasets by significant ROUGE-N score margins. Much like its predecessor, Biased TextRank is unsupervised, easy to implement and orders of magnitude faster and lighter than current state-of-the-art Natural Language Processing methods for similar tasks.
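Since the abstract states that the restart probabilities are assigned from each node's relevance to the focus, Biased TextRank can be sketched as personalized PageRank over a sentence-similarity graph. The TF-IDF cosine similarity and damping factor below are illustrative choices, not necessarily the paper's.

```python
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def biased_textrank(sentences, focus, damping=0.85):
    """Rank sentences by personalized PageRank biased towards `focus`."""
    vec = TfidfVectorizer().fit(sentences + [focus])
    S = vec.transform(sentences)
    sim = cosine_similarity(S)                               # sentence-sentence edge weights
    bias = cosine_similarity(S, vec.transform([focus])).ravel()
    G = nx.from_numpy_array(sim)                             # node i = sentences[i]
    personalization = {i: float(b) for i, b in enumerate(bias)}
    scores = nx.pagerank(G, alpha=damping, personalization=personalization)
    return sorted(scores, key=scores.get, reverse=True)      # sentence indices, best first
```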
11. DNN-Based Semantic Model for Rescoring N-best Speech Recognition List [PDF] Back to Contents
Dominique Fohr, Irina Illina
Abstract: The word error rate (WER) of an automatic speech recognition (ASR) system increases when a mismatch occurs between the training and the testing conditions due to the noise, etc. In this case, the acoustic information can be less reliable. This work aims to improve ASR by modeling long-term semantic relations to compensate for distorted acoustic features. We propose to perform this through rescoring of the ASR N-best hypotheses list. To achieve this, we train a deep neural network (DNN). Our DNN rescoring model is aimed at selecting hypotheses that have better semantic consistency and therefore lower WER. We investigate two types of representations as part of input features to our DNN model: static word embeddings (from word2vec) and dynamic contextual embeddings (from BERT). Acoustic and linguistic features are also included. We perform experiments on the publicly available dataset TED-LIUM mixed with real noise. The proposed rescoring approaches give significant improvement of the WER over the ASR system without rescoring models in two noisy conditions and with n-gram and RNNLM.
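A minimal view of the rescoring step is sketched below: each N-best hypothesis is turned into a feature vector combining a semantic embedding (e.g., from BERT) with its original ASR score, a small scorer network rates each vector, and the top-rated hypothesis is returned. The feature set and scorer are assumptions loosely following the abstract.

```python
import torch

def rescore_nbest(hypotheses, asr_scores, embed, scorer):
    """Pick the best hypothesis from an ASR N-best list.

    hypotheses: list[str]; asr_scores: list[float];
    embed: str -> 1-D tensor of semantic features; scorer: nn.Module -> (1,) score.
    """
    feats = [torch.cat([embed(h), torch.tensor([s])])
             for h, s in zip(hypotheses, asr_scores)]
    scores = torch.stack([scorer(f) for f in feats]).squeeze(-1)
    return hypotheses[int(scores.argmax())]
```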
12. Combining Event Semantics and Degree Semantics for Natural Language Inference [PDF] Back to Contents
Izumi Haruta, Koji Mineshima, Daisuke Bekki
Abstract: In formal semantics, there are two well-developed semantic frameworks: event semantics, which treats verbs and adverbial modifiers using the notion of event, and degree semantics, which analyzes adjectives and comparatives using the notion of degree. However, it is not obvious whether these frameworks can be combined to handle cases in which the phenomena in question are interacting with each other. Here, we study this issue by focusing on natural language inference (NLI). We implement a logic-based NLI system that combines event semantics and degree semantics and their interaction with lexical knowledge. We evaluate the system on various NLI datasets containing linguistically challenging problems. The results show that the system achieves high accuracies on these datasets in comparison with previous logic-based systems and deep-learning-based systems. This suggests that the two semantic frameworks can be combined consistently to handle various combinations of linguistic phenomena without compromising the advantage of either framework.
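As a worked illustration of how the two frameworks interlock (our example, not the paper's), a verbal comparative such as "Ann runs faster than Bob" needs event variables from event semantics and a degree comparison from degree semantics in a single formula:

```latex
% "Ann runs faster than Bob": events e1, e2 carry the verbal predication,
% while the comparison relates the degrees (speeds) of those events.
\exists e_1 \exists e_2\, \big( \mathit{run}(e_1) \wedge \mathrm{Agent}(e_1, \mathit{ann})
  \wedge \mathit{run}(e_2) \wedge \mathrm{Agent}(e_2, \mathit{bob})
  \wedge \mathrm{speed}(e_1) > \mathrm{speed}(e_2) \big)
```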
13. A Closer Look at Linguistic Knowledge in Masked Language Models: The Case of Relative Clauses in American English [PDF] Back to Contents
Marius Mosbach, Stefania Degaetano-Ortlieb, Marie-Pauline Krielke, Badr M. Abdullah, Dietrich Klakow
Abstract: Transformer-based language models achieve high performance on various tasks, but we still lack understanding of the kind of linguistic knowledge they learn and rely on. We evaluate three models (BERT, RoBERTa, and ALBERT), testing their grammatical and semantic knowledge by sentence-level probing, diagnostic cases, and masked prediction tasks. We focus on relative clauses (in American English) as a complex phenomenon needing contextual information and antecedent identification to be resolved. Based on a naturalistic dataset, probing shows that all three models indeed capture linguistic knowledge about grammaticality, achieving high performance. Evaluation on diagnostic cases and masked prediction tasks considering fine-grained linguistic knowledge, however, shows pronounced model-specific weaknesses especially on semantic knowledge, strongly impacting models' performance. Our results highlight the importance of (a) model comparison in the evaluation task and (b) building up claims of model performance and the linguistic knowledge they capture beyond purely probing-based evaluations.
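A masked-prediction probe of the kind the paper uses can be run directly with an off-the-shelf masked LM: mask the verb inside a relative clause and compare the probabilities of the competing agreement forms. The sentence and word pair below are our own illustration, not items from the paper's dataset.

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# The embedded verb should agree with "the student", not with the plural
# head noun "books" that intervenes.
preds = fill("The books that the student [MASK] were expensive.",
             targets=["reads", "read"])
for p in preds:
    print(p["token_str"], round(p["score"], 4))
```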
14. An Empirical Study of Contextual Data Augmentation for Japanese Zero Anaphora Resolution [PDF] Back to Contents
Ryuto Konno, Yuichiroh Matsubayashi, Shun Kiyono, Hiroki Ouchi, Ryo Takahashi, Kentaro Inui
Abstract: One critical issue of zero anaphora resolution (ZAR) is the scarcity of labeled data. This study explores how effectively this problem can be alleviated by data augmentation. We adopt a state-of-the-art data augmentation method, called the contextual data augmentation (CDA), that generates labeled training instances using a pretrained language model. The CDA has been reported to work well for several other natural language processing tasks, including text classification and machine translation. This study addresses two underexplored issues on CDA, that is, how to reduce the computational cost of data augmentation and how to ensure the quality of the generated data. We also propose two methods to adapt CDA to ZAR: [MASK]-based augmentation and linguistically-controlled masking. Consequently, the experimental results on Japanese ZAR show that our methods contribute to both the accuracy gain and the computation cost reduction. Our closer analysis reveals that the proposed method can improve the quality of the augmented training data when compared to the conventional CDA.
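The [MASK]-based augmentation can be sketched with the same fill-mask machinery: replace a token with the mask, let a pretrained masked LM propose substitutes, and keep the top proposals as augmented copies. An English MLM is used below for brevity (the paper works on Japanese), and the linguistic control the paper adds over which tokens may be masked is omitted.

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

def augment(sentence, word, k=3):
    """Return k augmented copies of `sentence` with `word` replaced by MLM proposals."""
    masked = sentence.replace(word, fill.tokenizer.mask_token, 1)
    return [sentence.replace(word, p["token_str"], 1) for p in fill(masked)[:k]]

print(augment("the movie was surprisingly good", "good"))
```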
15. How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention [PDF] Back to Contents
Yue Guan, Jingwen Leng, Chao Li, Quan Chen, Minyi Guo
Abstract: Recent research on the multi-head attention mechanism, especially that in pre-trained models such as BERT, has shown us heuristics and clues in analyzing various aspects of the model. As most of the research focuses on probing tasks or hidden states, previous works have found some primitive patterns of attention head behavior by heuristic analytical methods, but a more systematic analysis specific to the attention patterns still remains primitive. In this work, we clearly cluster the attention heatmaps into significantly different patterns through unsupervised clustering on top of a set of proposed features, which corroborates previous observations. We further study their corresponding functions through analytical study. In addition, our proposed features can be used to explain and calibrate different attention heads in Transformer models.
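To make the clustering setup concrete, the sketch below reduces each attention map to two simple features, mean attention distance and row entropy, and clusters heads with k-means; the paper's distance-based feature set is richer, so treat these two features as stand-ins.

```python
import numpy as np
from sklearn.cluster import KMeans

def head_features(attn):
    """attn: (heads, T, T) attention maps whose rows sum to 1."""
    T = attn.shape[-1]
    dist = np.abs(np.arange(T)[:, None] - np.arange(T)[None, :])   # |i - j|
    mean_dist = (attn * dist).sum(axis=(-1, -2)) / T               # average attended distance
    entropy = -(attn * np.log(attn + 1e-9)).sum(-1).mean(-1)       # average row entropy
    return np.stack([mean_dist, entropy], axis=1)                  # (heads, 2)

attn = np.random.dirichlet(np.ones(16), size=(12, 16))             # 12 toy heads, T = 16
labels = KMeans(n_clusters=3, n_init=10).fit_predict(head_features(attn))
```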
16. Emergent Communication Pretraining for Few-Shot Machine Translation [PDF] Back to Contents
Yaoyiran Li, Edoardo M. Ponti, Ivan Vulić, Anna Korhonen
Abstract: While state-of-the-art models that rely upon massively multilingual pretrained encoders achieve sample efficiency in downstream applications, they still require abundant amounts of unlabelled text. Nevertheless, most of the world's languages lack such resources. Hence, we investigate a more radical form of unsupervised knowledge transfer in the absence of linguistic data. In particular, for the first time we pretrain neural networks via emergent communication from referential games. Our key assumption is that grounding communication on images---as a crude approximation of real-world environments---inductively biases the model towards learning natural languages. On the one hand, we show that this substantially benefits machine translation in few-shot settings. On the other hand, this also provides an extrinsic evaluation protocol to probe the properties of emergent languages ex vitro. Intuitively, the closer they are to natural languages, the higher the gains from pretraining on them should be. For instance, in this work we measure the influence of communication success and maximum sequence length on downstream performances. Finally, we introduce a customised adapter layer and annealing strategies for the regulariser of maximum-a-posteriori inference during fine-tuning. These turn out to be crucial to facilitate knowledge transfer and prevent catastrophic forgetting. Compared to a recurrent baseline, our method yields gains of 59.0% to 147.6% in BLEU score with only 500 NMT training instances and 65.1% to 196.7% with 1,000 NMT training instances across four language pairs. These proof-of-concept results reveal the potential of emergent communication pretraining for both natural language processing tasks in resource-poor settings and extrinsic evaluation of artificial languages.
17. Comparison by Conversion: Reverse-Engineering UCCA from Syntax and Lexical Semantics [PDF] Back to Contents
Daniel Hershcovich, Nathan Schneider, Dotan Dvir, Jakob Prange, Miryam de Lhoneux, Omri Abend
Abstract: Building robust natural language understanding systems will require a clear characterization of whether and how various linguistic meaning representations complement each other. To perform a systematic comparative analysis, we evaluate the mapping between meaning representations from different frameworks using two complementary methods: (i) a rule-based converter, and (ii) a supervised delexicalized parser that parses to one framework using only information from the other as features. We apply these methods to convert the STREUSLE corpus (with syntactic and lexical semantic annotations) to UCCA (a graph-structured full-sentence meaning representation). Both methods yield surprisingly accurate target representations, close to fully supervised UCCA parser quality---indicating that UCCA annotations are partially redundant with STREUSLE annotations. Despite this substantial convergence between frameworks, we find several important areas of divergence.
18. Hierarchical Bi-Directional Self-Attention Networks for Paper Review Rating Recommendation [PDF]
Zhongfen Deng, Hao Peng, Congying Xia, Jianxin Li, Lifang He, Philip S. Yu
Abstract: Review rating prediction of text reviews is a rapidly growing technology with a wide range of applications in natural language processing. However, most existing methods either use hand-crafted features or learn features using deep learning with simple text corpus as input for review rating prediction, ignoring the hierarchies among data. In this paper, we propose a Hierarchical bi-directional self-attention Network framework (HabNet) for paper review rating prediction and recommendation, which can serve as an effective decision-making tool for the academic paper review process. Specifically, we leverage the hierarchical structure of the paper reviews with three levels of encoders: sentence encoder (level one), intra-review encoder (level two) and inter-review encoder (level three). Each encoder first derives contextual representation of each level, then generates a higher-level representation, and after the learning process, we are able to identify useful predictors to make the final acceptance decision, as well as to help discover the inconsistency between numerical review ratings and text sentiment conveyed by reviewers. Furthermore, we introduce two new metrics to evaluate models in data imbalance situations. Extensive experiments on a publicly available dataset (PeerRead) and our own collected dataset (OpenReview) demonstrate the superiority of the proposed approach compared with state-of-the-art methods.
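The three-level structure can be made concrete with a small sketch, assuming PyTorch transformer encoders and mean pooling between levels (a toy rendering of the hierarchy, not HabNet itself):

    # sentences -> review vectors -> paper vector, one encoder per level
    import torch
    import torch.nn as nn

    D = 64
    def level():
        return nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True),
            num_layers=1)

    sent_enc, review_enc, paper_enc = level(), level(), level()
    # 3 reviews x 5 sentences x 7 tokens, already embedded to D dimensions
    tokens = torch.randn(3 * 5, 7, D)
    sent_vecs = sent_enc(tokens).mean(dim=1).view(3, 5, D)    # level one
    review_vecs = review_enc(sent_vecs).mean(dim=1)           # level two
    paper_vec = paper_enc(review_vecs.unsqueeze(0)).mean(1)   # level three
    decision_logit = nn.Linear(D, 1)(paper_vec)               # acceptance head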
19. Context Dependent Semantic Parsing: A Survey [PDF]
Zhuang Li, Lizhen Qu, Gholamreza Haffari
Abstract: Semantic parsing is the task of translating natural language utterances into machine-readable meaning representations. Currently, most semantic parsing methods are not able to utilize contextual information (e.g. dialogue and comments history), which has a great potential to boost semantic parsing performance. To address this issue, context dependent semantic parsing has recently drawn a lot of attention. In this survey, we investigate progress on the methods for context dependent semantic parsing, together with the current datasets and tasks. We then point out open problems and challenges for future research in this area. The collected resources for this topic are available at: this https URL.
20. Adapting Pretrained Transformer to Lattices for Spoken Language Understanding [PDF]
Chao-Wei Huang, Yun-Nung Chen
Abstract: Lattices are compact representations that encode multiple hypotheses, such as speech recognition results or different word segmentations. It is shown that encoding lattices as opposed to 1-best results generated by automatic speech recognizer (ASR) boosts the performance of spoken language understanding (SLU). Recently, pretrained language models with the transformer architecture have achieved the state-of-the-art results on natural language understanding, but their ability of encoding lattices has not been explored. Therefore, this paper aims at adapting pretrained transformers to lattice inputs in order to perform understanding tasks specifically for spoken language. Our experiments on the benchmark ATIS dataset show that fine-tuning pretrained transformers with lattice inputs yields clear improvement over fine-tuning with 1-best results. Further evaluation demonstrates the effectiveness of our methods under different acoustic conditions. Our code is available at this https URL
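One common recipe for feeding a lattice to a transformer is to flatten its edges into a token sequence and mask attention between mutually exclusive alternatives; the sketch below illustrates that recipe under a deliberately crude compatibility rule (an assumption for illustration, not necessarily the paper's adaptation):

    # lattice edges: (from_state, to_state, word)
    edges = [(0, 1, "cheap"), (0, 1, "keep"), (1, 2, "flights")]

    def compatible(i, j):
        # treat two edges spanning the same pair of states as alternatives
        (s1, e1, _), (s2, e2, _) = edges[i], edges[j]
        return i == j or not (s1 == s2 and e1 == e2)

    n = len(edges)
    attn_mask = [[compatible(i, j) for j in range(n)] for i in range(n)]
    for row in attn_mask:
        print(row)
    # "cheap" and "keep" mask each other out; both may attend to "flights"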
21. COSMO: Conditional SEQ2SEQ-based Mixture Model for Zero-Shot Commonsense Question Answering [PDF]
Farhad Moghimifar, Lizhen Qu, Yue Zhuo, Mahsa Baktashmotlagh, Gholamreza Haffari
Abstract: Commonsense reasoning refers to the ability of evaluating a social situation and acting accordingly. Identification of the implicit causes and effects of a social context is the driving capability which can enable machines to perform commonsense reasoning. The dynamic world of social interactions requires context-dependent on-demand systems to infer such underlying information. However, current approaches in this realm lack the ability to perform commonsense reasoning upon facing an unseen situation, mostly due to incapability of identifying a diverse range of implicit social relations. Hence they fail to estimate the correct reasoning path. In this paper, we present Conditional SEQ2SEQ-based Mixture model (COSMO), which provides us with the capabilities of dynamic and diverse content generation. We use COSMO to generate context-dependent clauses, which form a dynamic Knowledge Graph (KG) on-the-fly for commonsense reasoning. To show the adaptability of our model to context-dependant knowledge generation, we address the task of zero-shot commonsense question answering. The empirical results indicate an improvement of up to +5.2% over the state-of-the-art models.
22. Context-Aware Cross-Attention for Non-Autoregressive Translation [PDF]
Liang Ding, Longyue Wang, Di Wu, Dacheng Tao, Zhaopeng Tu
Abstract: Non-autoregressive translation (NAT) significantly accelerates the inference process by predicting the entire target sequence. However, due to the lack of target dependency modelling in the decoder, the conditional generation process heavily depends on the cross-attention. In this paper, we reveal a localness perception problem in NAT cross-attention, for which it is difficult to adequately capture source context. To alleviate this problem, we propose to enhance signals of neighbour source tokens into conventional cross-attention. Experimental results on several representative datasets show that our approach can consistently improve translation quality over strong NAT baselines. Extensive analyses demonstrate that the enhanced cross-attention achieves better exploitation of source contexts by leveraging both local and global information.
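The core idea, strengthening local source context inside cross-attention, can be approximated by mixing each key with its immediate neighbours through a depthwise convolution; the sketch below is a simplified stand-in, not the paper's exact layer:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    D = 32
    local_mix = nn.Conv1d(D, D, kernel_size=3, padding=1, groups=D)  # +/-1 window

    def cross_attention(query, source):            # query: (B,T,D), source: (B,S,D)
        keys = source + local_mix(source.transpose(1, 2)).transpose(1, 2)
        scores = query @ keys.transpose(1, 2) / D ** 0.5
        return F.softmax(scores, dim=-1) @ source  # values left untouched here

    out = cross_attention(torch.randn(2, 4, D), torch.randn(2, 6, D))
    print(out.shape)  # torch.Size([2, 4, 32])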
23. Reducing Confusion in Active Learning for Part-Of-Speech Tagging [PDF]
Aditi Chaudhary, Antonios Anastasopoulos, Zaid Sheikh, Graham Neubig
Abstract: Active learning (AL) uses a data selection algorithm to select useful training samples to minimize annotation cost. This is now an essential tool for building low-resource syntactic analyzers such as part-of-speech (POS) taggers. Existing AL heuristics are generally designed on the principle of selecting uncertain yet representative training instances, where annotating these instances may reduce a large number of errors. However, in an empirical study across six typologically diverse languages (German, Swedish, Galician, North Sami, Persian, and Ukrainian), we found the surprising result that even in an oracle scenario where we know the true uncertainty of predictions, these current heuristics are far from optimal. Based on this analysis, we pose the problem of AL as selecting instances which maximally reduce the confusion between particular pairs of output tags. Extensive experimentation on the aforementioned languages shows that our proposed AL strategy outperforms other AL strategies by a significant margin. We also present auxiliary results demonstrating the importance of proper calibration of models, which we ensure through cross-view training, and analysis demonstrating how our proposed strategy selects examples that more closely follow the oracle data distribution.
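A simplified version of confusion-driven selection can be written as a scoring rule over per-token tag posteriors, preferring sentences whose tokens split their probability mass between a known confusable tag pair (in the spirit of the paper, not its exact criterion):

    def pair_confusion_score(token_posteriors, tag_a="NOUN", tag_b="VERB"):
        # token_posteriors: one {tag: prob} dict per token
        score = 0.0
        for p in token_posteriors:
            pa, pb = p.get(tag_a, 0.0), p.get(tag_b, 0.0)
            if pa + pb > 0:
                score += min(pa, pb) / (pa + pb)  # high when mass is split evenly
        return score

    sent = [{"NOUN": 0.45, "VERB": 0.40, "ADJ": 0.15}, {"DET": 0.9, "NOUN": 0.1}]
    print(round(pair_confusion_score(sent), 3))  # driven by the ambiguous first token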
24. I Know What You Asked: Graph Path Learning using AMR for Commonsense Reasoning [PDF]
Jungwoo Lim, Dongsuk Oh, Yoonna Jang, Kisu Yang, Heuiseok Lim
Abstract: CommonsenseQA is a task in which a correct answer is predicted through commonsense reasoning with pre-defined knowledge. Most previous works have aimed to improve the performance with distributed representation without considering the process of predicting the answer from the semantic representation of the question. To shed light upon the semantic interpretation of the question, we propose an AMR-ConceptNet-Pruned (ACP) graph. The ACP graph is pruned from a full integrated graph encompassing Abstract Meaning Representation (AMR) graph generated from input questions and an external commonsense knowledge graph, ConceptNet (CN). Then the ACP graph is exploited to interpret the reasoning path as well as to predict the correct answer on the CommonsenseQA task. This paper presents the manner in which the commonsense reasoning process can be interpreted with the relations and concepts provided by the ACP graph. Moreover, ACP-based models are shown to outperform the baselines.
25. ÚFAL at MRP 2020: Permutation-invariant Semantic Parsing in PERIN [PDF]
David Samuel, Milan Straka
Abstract: We present PERIN, a novel permutation-invariant approach to sentence-to-graph semantic parsing. PERIN is a versatile, cross-framework and language independent architecture for universal modeling of semantic structures. Our system participated in the CoNLL 2020 shared task, Cross-Framework Meaning Representation Parsing (MRP 2020), where it was evaluated on five different frameworks (AMR, DRG, EDS, PTG and UCCA) across four languages. PERIN was one of the winners of the shared task. The source code and pretrained models are available at this https URL.
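Permutation invariance in such parsers is usually obtained by matching predicted nodes to gold nodes before computing the loss, for example with the Hungarian algorithm; the sketch below shows that general device (the details differ from PERIN):

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    pred = np.random.rand(4, 8)    # 4 predicted node embeddings
    gold = np.random.rand(3, 8)    # 3 gold node embeddings
    cost = ((pred[:, None, :] - gold[None, :, :]) ** 2).sum(-1)  # pairwise L2
    rows, cols = linear_sum_assignment(cost)   # optimal one-to-one assignment
    loss = cost[rows, cols].mean()             # invariant to prediction order
    print(list(zip(rows, cols)), float(loss))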
26. Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation [PDF]
Hang Le, Juan Pino, Changhan Wang, Jiatao Gu, Didier Schwab, Laurent Besacier
Abstract: We introduce dual-decoder Transformer, a new model architecture that jointly performs automatic speech recognition (ASR) and multilingual speech translation (ST). Our models are based on the original Transformer architecture (Vaswani et al., 2017) but consist of two decoders, each responsible for one task (ASR or ST). Our major contribution lies in how these decoders interact with each other: one decoder can attend to different information sources from the other via a dual-attention mechanism. We propose two variants of these architectures corresponding to two different levels of dependencies between the decoders, called the parallel and cross dual-decoder Transformers, respectively. Extensive experiments on the MuST-C dataset show that our models outperform the previously-reported highest translation performance in the multilingual settings, and outperform as well bilingual one-to-one results. Furthermore, our parallel models demonstrate no trade-off between ASR and ST compared to the vanilla multi-task architecture. Our code and pre-trained models are available at this https URL.
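A toy rendering of the dual-decoder wiring, assuming standard PyTorch transformer modules and a single extra attention hop from the ST decoder over the ASR decoder's states (the paper's parallel and cross dual-attention variants are richer):

    import torch
    import torch.nn as nn

    D = 64
    enc = nn.TransformerEncoder(nn.TransformerEncoderLayer(D, 4, batch_first=True), 2)
    asr_dec = nn.TransformerDecoder(nn.TransformerDecoderLayer(D, 4, batch_first=True), 2)
    st_dec = nn.TransformerDecoder(nn.TransformerDecoderLayer(D, 4, batch_first=True), 2)
    dual_attn = nn.MultiheadAttention(D, 4, batch_first=True)

    speech = torch.randn(2, 50, D)                 # encoded audio frames
    asr_in, st_in = torch.randn(2, 9, D), torch.randn(2, 11, D)

    memory = enc(speech)
    asr_states = asr_dec(asr_in, memory)           # transcription branch
    st_states = st_dec(st_in, memory)              # translation branch
    st_states = st_states + dual_attn(st_states, asr_states, asr_states)[0]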
27. Abstracting Influence Paths for Explaining (Contextualization of) BERT Models [PDF]
Kaiji Lu, Zifan Wang, Piotr Mardziel, Anupam Datta
Abstract: While "attention is all you need" may be proving true, we do not yet know why: attention-based models such as BERT are superior but how they contextualize information even for simple grammatical rules such as subject-verb number agreement (SVA) is uncertain. We introduce multi-partite patterns, abstractions of sets of paths through a neural network model. Patterns quantify and localize the effect of an input concept (e.g., a subject's number) on an output concept (e.g. corresponding verb's number) to paths passing through a sequence of model components, thus surfacing how BERT contextualizes information. We describe guided pattern refinement, an efficient search procedure for finding patterns representative of concept-critical paths. We discover that patterns generate succinct and meaningful explanations for BERT, highlighted by "copy" and "transfer" operations implemented by skip connections and attention heads, respectively. We also show how pattern visualizations help us understand how BERT contextualizes various grammatical concepts, such as SVA across clauses, and why it makes errors in some cases while succeeding in others.
28. Semi-supervised Autoencoding Projective Dependency Parsing [PDF]
Xiao Zhang, Dan Goldwasser
Abstract: We describe two end-to-end autoencoding models for semi-supervised graph-based projective dependency parsing. The first model is a Locally Autoencoding Parser (LAP) encoding the input using continuous latent variables in a sequential manner; The second model is a Globally Autoencoding Parser (GAP) encoding the input into dependency trees as latent variables, with exact inference. Both models consist of two parts: an encoder enhanced by deep neural networks (DNN) that can utilize the contextual information to encode the input into latent variables, and a decoder which is a generative model able to reconstruct the input. Both LAP and GAP admit a unified structure with different loss functions for labeled and unlabeled data with shared parameters. We conducted experiments on WSJ and UD dependency parsing data sets, showing that our models can exploit the unlabeled data to improve the performance given a limited amount of labeled data, and outperform a previously proposed semi-supervised model.
29. ABNIRML: Analyzing the Behavior of Neural IR Models [PDF]
Sean MacAvaney, Sergey Feldman, Nazli Goharian, Doug Downey, Arman Cohan
Abstract: Numerous studies have demonstrated the effectiveness of pretrained contextualized language models such as BERT and T5 for ad-hoc search. However, it is not well-understood why these methods are so effective, what makes some variants more effective than others, and what pitfalls they may have. We present a new comprehensive framework for Analyzing the Behavior of Neural IR ModeLs (ABNIRML), which includes new types of diagnostic tests that allow us to probe several characteristics---such as sensitivity to word order---that are not addressed by previous techniques. To demonstrate the value of the framework, we conduct an extensive empirical study that yields insights into the factors that contribute to the neural model's gains, and identify potential unintended biases the models exhibit. We find evidence that recent neural ranking models have fundamentally different characteristics from prior ranking models. For instance, these models can be highly influenced by altered document word order, sentence order and inflectional endings. They can also exhibit unexpected behaviors when additional content is added to documents, or when documents are expressed with different levels of fluency or formality. We find that these differences can depend on the architecture and not just the underlying language model.
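One such diagnostic, sensitivity to word order, can be probed by comparing a model's score for a document against a shuffled copy of it; in this sketch, score(query, doc) is a hypothetical stand-in for any neural relevance model:

    import random

    def word_order_sensitivity(score, query, docs, seed=0):
        rng = random.Random(seed)
        deltas = []
        for doc in docs:
            words = doc.split()
            rng.shuffle(words)
            deltas.append(score(query, doc) - score(query, " ".join(words)))
        return sum(deltas) / len(deltas)  # near zero => word order is ignored

    # a bag-of-words scorer has, by construction, zero sensitivity
    bow = lambda q, d: len(set(q.split()) & set(d.split()))
    print(word_order_sensitivity(bow, "cheap flights", ["book cheap flights now"]))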
30. How Domain Terminology Affects Meeting Summarization Performance [PDF]
Jia Jin Koay, Alexander Roustai, Xiaojin Dai, Dillon Burns, Alec Kerrigan, Fei Liu
Abstract: Meetings are essential to modern organizations. Numerous meetings are held and recorded daily, more than can ever be comprehended. A meeting summarization system that identifies salient utterances from the transcripts to automatically generate meeting minutes can help. It empowers users to rapidly search and sift through large meeting collections. To date, the impact of domain terminology on the performance of meeting summarization remains understudied, despite that meetings are rich with domain knowledge. In this paper, we create gold-standard annotations for domain terminology on a sizable meeting corpus; they are known as jargon terms. We then analyze the performance of a meeting summarization system with and without jargon terms. Our findings reveal that domain terminology can have a substantial impact on summarization performance. We publicly release all domain terminology to advance research in meeting summarization.
31. Sequence-to-Sequence Networks Learn the Meaning of Reflexive Anaphora [PDF]
Robert Frank, Jackson Petty
Abstract: Reflexive anaphora present a challenge for semantic interpretation: their meaning varies depending on context in a way that appears to require abstract variables. Past work has raised doubts about the ability of recurrent networks to meet this challenge. In this paper, we explore this question in the context of a fragment of English that incorporates the relevant sort of contextual variability. We consider sequence-to-sequence architectures with recurrent units and show that such networks are capable of learning semantic interpretations for reflexive anaphora which generalize to novel antecedents. We explore the effect of attention mechanisms and different recurrent unit types on the type of training data that is needed for success as measured in two ways: how much lexical support is needed to induce an abstract reflexive meaning (i.e., how many distinct reflexive antecedents must occur during training) and what contexts must a noun phrase occur in to support generalization of reflexive interpretation to this noun phrase?
32. Event-Related Bias Removal for Real-time Disaster Events [PDF]
Evangelia Spiliopoulou, Salvador Medina Maza, Eduard Hovy, Alexander Hauptmann
Abstract: Social media has become an important tool to share information about crisis events such as natural disasters and mass attacks. Detecting actionable posts that contain useful information requires rapid analysis of huge volume of data in real-time. This poses a complex problem due to the large amount of posts that do not contain any actionable information. Furthermore, the classification of information in real-time systems requires training on out-of-domain data, as we do not have any data from a new emerging crisis. Prior work focuses on models pre-trained on similar event types. However, those models capture unnecessary event-specific biases, like the location of the event, which affect the generalizability and performance of the classifiers on new unseen data from an emerging new event. In our work, we train an adversarial neural model to remove latent event-specific biases and improve the performance on tweet importance classification.
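Adversarial feature removal of this kind is commonly implemented with a gradient reversal layer: an auxiliary head tries to recover the event identity, and its reversed gradients push the encoder to discard event-specific cues. The sketch below shows that standard device under the paper's task framing (the authors' exact setup may differ):

    import torch

    class GradReverse(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x):
            return x.view_as(x)
        @staticmethod
        def backward(ctx, grad_output):
            return -grad_output            # flip the sign on the way back

    encoder = torch.nn.Linear(16, 8)
    task_head = torch.nn.Linear(8, 2)      # actionable vs. not actionable
    event_head = torch.nn.Linear(8, 5)     # adversary: which event is this from?

    x = torch.randn(4, 16)
    y_task, y_event = torch.randint(0, 2, (4,)), torch.randint(0, 5, (4,))
    h = encoder(x)
    loss = (torch.nn.functional.cross_entropy(task_head(h), y_task)
            + torch.nn.functional.cross_entropy(event_head(GradReverse.apply(h)), y_event))
    loss.backward()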
33. Liputan6: A Large-scale Indonesian Dataset for Text Summarization [PDF]
Fajri Koto, Jey Han Lau, Timothy Baldwin
Abstract: In this paper, we introduce a large-scale Indonesian summarization dataset. We harvest articles from this http URL, an online news portal, and obtain 215,827 document-summary pairs. We leverage pre-trained language models to develop benchmark extractive and abstractive summarization methods over the dataset with multilingual and monolingual BERT-based models. We include a thorough error analysis by examining machine-generated summaries that have low ROUGE scores, and expose both issues with ROUGE it-self, as well as with extractive and abstractive summarization models.
34. Investigating Catastrophic Forgetting During Continual Training for Neural Machine Translation [PDF]
Shuhao Gu, Yang Feng
Abstract: Neural machine translation (NMT) models usually suffer from catastrophic forgetting during continual training where the models tend to gradually forget previously learned knowledge and swing to fit the newly added data which may have a different distribution, e.g. a different domain. Although many methods have been proposed to solve this problem, we cannot get to know what causes this phenomenon yet. Under the background of domain adaptation, we investigate the cause of catastrophic forgetting from the perspectives of modules and parameters (neurons). The investigation on the modules of the NMT model shows that some modules have tight relation with the general-domain knowledge while some other modules are more essential in the domain adaptation. And the investigation on the parameters shows that some parameters are important for both the general-domain and in-domain translation and the great change of them during continual training brings about the performance decline in general-domain. We conduct experiments across different language pairs and domains to ensure the validity and reliability of our findings.
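The parameter-level side of such an analysis can be approximated by ranking modules by how far their weights drift from the general-domain checkpoint after in-domain fine-tuning; the relative-norm metric below is an assumption for illustration, not the paper's measure:

    import torch

    def drift_by_module(before, after):
        drift = {}
        for name, w0 in before.items():
            rel = (after[name] - w0).norm() / (w0.norm() + 1e-8)
            drift[name] = rel.item()
        return sorted(drift.items(), key=lambda kv: -kv[1])

    # stand-ins for the state_dicts of the two checkpoints
    before = {"encoder.w": torch.randn(8, 8), "decoder.w": torch.randn(8, 8)}
    after = {k: v + 0.1 * torch.randn_like(v) for k, v in before.items()}
    for name, d in drift_by_module(before, after):
        print(f"{name}: relative drift {d:.3f}")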
35. IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP [PDF]
Fajri Koto, Afshin Rahimi, Jey Han Lau, Timothy Baldwin
Abstract: Although the Indonesian language is spoken by almost 200 million people and the 10th most spoken language in the world, it is under-represented in NLP research. Previous work on Indonesian has been hampered by a lack of annotated datasets, a sparsity of language resources, and a lack of resource standardization. In this work, we release the IndoLEM dataset comprising seven tasks for the Indonesian language, spanning morpho-syntax, semantics, and discourse. We additionally release IndoBERT, a new pre-trained language model for Indonesian, and evaluate it over IndoLEM, in addition to benchmarking it against existing resources. Our experiments show that IndoBERT achieves state-of-the-art performance over most of the tasks in IndoLEM.
36. Targeted Poisoning Attacks on Black-Box Neural Machine Translation [PDF]
Chang Xu, Jun Wang, Yuqing Tang, Francisco Guzman, Benjamin I. P. Rubinstein, Trevor Cohn
Abstract: As modern neural machine translation (NMT) systems have been widely deployed, their security vulnerabilities require close scrutiny. Most recently, NMT systems have been shown to be vulnerable to targeted attacks which cause them to produce specific, unsolicited, and even harmful translations. These attacks are usually exploited in a white-box setting, where adversarial inputs causing targeted translations are discovered for a known target system. However, this approach is less useful when the target system is black-box and unknown to the adversary (e.g., secured commercial systems). In this paper, we show that targeted attacks on black-box NMT systems are feasible, based on poisoning a small fraction of their parallel training data. We demonstrate that this attack can be realised practically via targeted corruption of web documents crawled to form the system's training data. We then analyse the effectiveness of the targeted poisoning in two common NMT training scenarios, which are the one-off training and pre-train & fine-tune paradigms. Our results are alarming: even on the state-of-the-art systems trained with massive parallel data (tens of millions), the attacks are still successful (over 50% success rate) under surprisingly low poisoning rates (e.g., 0.006%). Lastly, we discuss potential defences to counter such attacks.
37. Reasoning Over History: Context Aware Visual Dialog [PDF]
Muhammad A. Shah, Shikib Mehri, Tejas Srinivasan
Abstract: While neural models have been shown to exhibit strong performance on single-turn visual question answering (VQA) tasks, extending VQA to a multi-turn, conversational setting remains a challenge. One way to address this challenge is to augment existing strong neural VQA models with the mechanisms that allow them to retain information from previous dialog turns. One strong VQA model is the MAC network, which decomposes a task into a series of attention-based reasoning steps. However, since the MAC network is designed for single-turn question answering, it is not capable of referring to past dialog turns. More specifically, it struggles with tasks that require reasoning over the dialog history, particularly coreference resolution. We extend the MAC network architecture with Context-aware Attention and Memory (CAM), which attends over control states in past dialog turns to determine the necessary reasoning operations for the current question. MAC nets with CAM achieve up to 98.25% accuracy on the CLEVR-Dialog dataset, beating the existing state-of-the-art by 30% (absolute). Our error analysis indicates that with CAM, the model's performance particularly improved on questions that required coreference resolution.
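The history-attention idea reduces, in its simplest form, to one attention hop from the current control state over control states stored from past turns (a toy version; CAM's control-state bookkeeping is more involved):

    import torch
    import torch.nn.functional as F

    D = 32
    memory = torch.randn(6, D)     # control states from six past reasoning steps
    control = torch.randn(1, D)    # control state for the current question
    weights = F.softmax(control @ memory.T / D ** 0.5, dim=-1)
    context = weights @ memory     # history-aware summary for the next step
    print(weights.shape, context.shape)  # (1, 6) and (1, 32)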
38. Aspect-Based Argument Mining [PDF] 返回目录
Dietrich Trautmann
Abstract: Computational Argumentation in general and Argument Mining in particular are important research fields. In previous works, many of the challenges to automatically extract and to some degree reason over natural language arguments were addressed. The tools to extract argument units are increasingly available and further open problems can be addressed. In this work, we are presenting the task of Aspect-Based Argument Mining (ABAM), with the essential subtasks of Aspect Term Extraction (ATE) and Nested Segmentation (NS). At the first instance, we create and release an annotated corpus with aspect information on the token-level. We consider aspects as the main point(s) argument units are addressing. This information is important for further downstream tasks such as argument ranking, argument summarization and generation, as well as the search for counter-arguments on the aspect-level. We present several experiments using state-of-the-art supervised architectures and demonstrate their performance for both of the subtasks. The annotated benchmark is available at this https URL.
摘要:计算论辩学,尤其是论点挖掘,是重要的研究领域。先前的工作已经解决了自动抽取自然语言论点并在一定程度上对其进行推理的诸多挑战。抽取论点单元的工具日益可用,更多开放问题也得以研究。在这项工作中,我们提出基于方面的论点挖掘(ABAM)任务,及其两个核心子任务:方面词抽取(ATE)和嵌套切分(NS)。我们首先构建并发布了一个带有词元级方面信息标注的语料库。我们将方面视为论点单元所针对的要点。这一信息对论点排序、论点摘要与生成等下游任务,以及在方面层面检索反方论点都很重要。我们使用最先进的有监督架构进行了多组实验,并展示了它们在这两个子任务上的性能。该标注基准可在此https URL获取。
39. Social Chemistry 101: Learning to Reason about Social and Moral Norms [PDF] 返回目录
Maxwell Forbes, Jena D. Hwang, Vered Shwartz, Maarten Sap, Yejin Choi
Abstract: Social norms---the unspoken commonsense rules about acceptable social behavior---are crucial in understanding the underlying causes and intents of people's actions in narratives. For example, underlying an action such as "wanting to call cops on my neighbors" are social norms that inform our conduct, such as "It is expected that you report crimes." We present Social Chemistry, a new conceptual formalism to study people's everyday social norms and moral judgments over a rich spectrum of real life situations described in natural language. We introduce Social-Chem-101, a large-scale corpus that catalogs 292k rules-of-thumb such as "it is rude to run a blender at 5am" as the basic conceptual units. Each rule-of-thumb is further broken down with 12 different dimensions of people's judgments, including social judgments of good and bad, moral foundations, expected cultural pressure, and assumed legality, which together amount to over 4.5 million annotations of categorical labels and free-text descriptions. Comprehensive empirical results based on state-of-the-art neural models demonstrate that computational modeling of social norms is a promising research direction. Our model framework, Neural Norm Transformer, learns and generalizes Social-Chem-101 to successfully reason about previously unseen situations, generating relevant (and potentially novel) attribute-aware social rules-of-thumb.
摘要:社会规范——关于可接受社会行为的不成文常识规则——对于理解叙事中人们行为背后的原因和意图至关重要。例如,"想报警举报邻居"这类行为的背后是约束我们行为的社会规范,如"人们期望你举报犯罪"。我们提出"社会化学"(Social Chemistry),这是一种新的概念形式化方法,用于研究人们在自然语言描述的各种现实生活情境中的日常社会规范和道德判断。我们构建了Social-Chem-101,一个收录29.2万条经验法则(如"早上5点用搅拌机很不礼貌")作为基本概念单元的大规模语料库。每条经验法则进一步按人们判断的12个维度进行细分,包括好坏的社会判断、道德基础、预期的文化压力以及推定的合法性等,共计超过450万条类别标签和自由文本描述标注。基于最先进神经模型的全面实证结果表明,对社会规范进行计算建模是一个有前景的研究方向。我们的模型框架Neural Norm Transformer能够学习并泛化Social-Chem-101,从而成功推理先前未见过的情境,生成相关(且可能新颖)的、属性感知的社会经验法则。
40. Improving Conversational Question Answering Systems after Deployment using Feedback-Weighted Learning [PDF] 返回目录
Jon Ander Campos, Kyunghyun Cho, Arantxa Otegi, Aitor Soroa, Gorka Azkune, Eneko Agirre
Abstract: The interaction of conversational systems with users poses an exciting opportunity for improving them after deployment, but little evidence has been provided of its feasibility. In most applications, users are not able to provide the correct answer to the system, but they are able to provide binary (correct, incorrect) feedback. In this paper we propose feedback-weighted learning based on importance sampling to improve upon an initial supervised system using binary user feedback. We perform simulated experiments on document classification (for development) and Conversational Question Answering datasets like QuAC and DoQA, where binary user feedback is derived from gold annotations. The results show that our method is able to improve over the initial supervised system, getting close to a fully-supervised system that has access to the same labeled examples in in-domain experiments (QuAC), and even matching in out-of-domain experiments (DoQA). Our work opens the prospect to exploit interactions with real users and improve conversational systems after deployment.
摘要:对话系统与用户的交互为部署后继续改进系统提供了令人兴奋的机会,但其可行性此前鲜有证据支持。在大多数应用中,用户无法向系统提供正确答案,但能够提供二元(正确/不正确)反馈。在本文中,我们提出基于重要性采样的反馈加权学习,利用二元用户反馈来改进初始的有监督系统。我们在文档分类(用于方法开发)以及QuAC和DoQA等对话式问答数据集上进行了模拟实验,其中二元用户反馈由金标准标注推导得到。结果表明,我们的方法能够超越初始的有监督系统:在域内实验(QuAC)中接近能够访问同样标注样例的全监督系统,在域外实验(DoQA)中甚至与之持平。我们的工作为利用与真实用户的交互、在部署后改进对话系统开辟了前景。
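One plausible reading of the proposed objective: each deployed-model answer is reweighted by the user's binary feedback divided by the probability the deployed model assigned to it, which is the usual importance-sampling correction. A hedged sketch (the paper's exact estimator may differ):

```python
import numpy as np

def feedback_weighted_loss(logps, feedback, sample_probs):
    """Importance-weighted NLL over answers judged by users.

    logps:        log-probabilities the *current* model assigns to the answers
    feedback:     1 if the user marked the answer correct, else 0
    sample_probs: probability the *deployed* model had of producing each answer
    Only answers judged correct contribute; rare correct answers count more.
    """
    weights = feedback / np.maximum(sample_probs, 1e-8)
    return -np.mean(weights * logps)

loss = feedback_weighted_loss(
    logps=np.log([0.6, 0.2, 0.7]),
    feedback=np.array([1, 0, 1]),
    sample_probs=np.array([0.5, 0.3, 0.9]),
)
print(round(float(loss), 3))
```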
41. Bracketing Encodings for 2-Planar Dependency Parsing [PDF] 返回目录
Michalina Strzyz, David Vilares, Carlos Gómez-Rodríguez
Abstract: We present a bracketing-based encoding that can be used to represent any 2-planar dependency tree over a sentence of length n as a sequence of n labels, hence providing almost total coverage of crossing arcs in sequence labeling parsing. First, we show that existing bracketing encodings for parsing as labeling can only handle a very mild extension of projective trees. Second, we overcome this limitation by taking into account the well-known property of 2-planarity, which is present in the vast majority of dependency syntactic structures in treebanks, i.e., the arcs of a dependency tree can be split into two planes such that arcs in a given plane do not cross. We take advantage of this property to design a method that balances the brackets and that encodes the arcs belonging to each of those planes, allowing for almost unrestricted non-projectivity (round 99.9% coverage) in sequence labeling parsing. The experiments show that our linearizations improve over the accuracy of the original bracketing encoding in highly non-projective treebanks (on average by 0.4 LAS), while achieving a similar speed. Also, they are especially suitable when PoS tags are not used as input parameters to the models.
摘要:我们提出一种基于括号的编码,可将长度为n的句子上的任意2-平面依存树表示为n个标签的序列,从而在序列标注式句法分析中几乎完全覆盖交叉弧。首先,我们证明现有的用于"句法分析即序列标注"的括号编码只能处理投影树的非常有限的扩展。其次,我们利用众所周知的2-平面性性质克服了这一限制:树库中绝大多数依存句法结构都满足该性质,即依存树的弧可以划分到两个平面中,使得同一平面内的弧互不交叉。我们利用这一性质设计了一种平衡括号并分别编码两个平面中的弧的方法,使序列标注式句法分析几乎不受非投影性的限制(约99.9%的覆盖率)。实验表明,在高度非投影的树库上,我们的线性化方法比原始括号编码的准确率更高(平均提高0.4 LAS),同时速度相当。此外,当模型不使用词性标签作为输入时,这些编码尤其适用。
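The 2-planarity property the encoding relies on is easy to state in code: arcs must be assignable to two planes with no crossings inside a plane. A minimal sketch of a greedy plane split (a real implementation would search or backtrack; this greedy pass can fail on some arc orderings):

```python
def crosses(a, b):
    """True if arcs a=(i, j) and b=(k, l) cross when drawn above the sentence."""
    (i, j), (k, l) = sorted(map(sorted, (a, b)))
    return i < k < j < l

def split_two_planes(arcs):
    """Greedily assign each arc to one of two planes so that no two arcs
    in the same plane cross. Greedy assignment can fail on some orderings;
    a full implementation would search or backtrack."""
    planes = ([], [])
    for arc in arcs:
        for plane in planes:
            if not any(crosses(arc, other) for other in plane):
                plane.append(arc)
                break
        else:
            raise ValueError(f"greedy assignment failed on arc {arc}")
    return planes

# Two mutually crossing arcs must end up on different planes:
print(split_two_planes([(0, 2), (1, 3), (2, 4)]))
# -> ([(0, 2), (2, 4)], [(1, 3)])
```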
42. MixKD: Towards Efficient Distillation of Large-scale Language Models [PDF] 返回目录
Kevin J Liang, Weituo Hao, Dinghan Shen, Yufan Zhou, Weizhu Chen, Changyou Chen, Lawrence Carin
Abstract: Large-scale language models have recently demonstrated impressive empirical performance. Nevertheless, the improved results are attained at the price of bigger models, more power consumption, and slower inference, which hinder their applicability to low-resource (memory and computation) platforms. Knowledge distillation (KD) has been demonstrated as an effective framework for compressing such big models. However, large-scale neural network systems are prone to memorize training instances, and thus tend to make inconsistent predictions when the data distribution is altered slightly. Moreover, the student model has few opportunities to request useful information from the teacher model when there is limited task-specific data available. To address these issues, we propose MixKD, a data-agnostic distillation framework that leverages mixup, a simple yet efficient data augmentation approach, to endow the resulting model with stronger generalization ability. Concretely, in addition to the original training examples, the student model is encouraged to mimic the teacher's behavior on the linear interpolation of example pairs as well. We prove, from a theoretical perspective, that under reasonable conditions MixKD gives rise to a smaller gap between the generalization error and the empirical error. To verify its effectiveness, we conduct experiments on the GLUE benchmark, where MixKD consistently leads to significant gains over the standard KD training, and outperforms several competitive baselines. Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
摘要:大规模语言模型最近展示了令人印象深刻的实证性能。然而,这些改进是以更大的模型、更高的能耗和更慢的推理为代价的,这阻碍了它们在低资源(内存和计算)平台上的应用。知识蒸馏(KD)已被证明是压缩此类大模型的有效框架。然而,大规模神经网络系统容易记忆训练样本,因此当数据分布发生轻微变化时往往做出不一致的预测。此外,当任务相关数据有限时,学生模型几乎没有机会向教师模型索取有用信息。为了解决这些问题,我们提出MixKD,一种与数据无关的蒸馏框架,它利用mixup这种简单而高效的数据增强方法,赋予蒸馏所得模型更强的泛化能力。具体而言,除了原始训练样例外,学生模型还被鼓励在样例对的线性插值上模仿教师的行为。我们从理论角度证明,在合理条件下,MixKD能缩小泛化误差与经验误差之间的差距。为了验证其有效性,我们在GLUE基准上进行了实验:MixKD相对标准KD训练始终带来显著提升,并优于多个有竞争力的基线。有限数据设置下的实验和消融研究进一步证明了所提方法的优势。
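The core of MixKD is standard mixup applied to distillation: interpolate two inputs and the teacher's soft predictions with the same coefficient. A minimal numpy sketch (the beta-distributed lambda and embedding-level interpolation are common mixup choices; the paper's exact configuration may differ):

```python
import numpy as np

def mixkd_example(x1, x2, p1_teacher, p2_teacher, alpha=0.4, rng=None):
    """Build one mixup example for distillation: interpolate two inputs
    (e.g., word-embedding sequences of equal shape) and the teacher's
    soft predictions with the same coefficient. The student is trained
    to match the mixed teacher distribution on the mixed input."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * p1_teacher + (1 - lam) * p2_teacher, lam

rng = np.random.default_rng(0)
x_mix, p_mix, lam = mixkd_example(rng.normal(size=(5, 8)), rng.normal(size=(5, 8)),
                                  np.array([0.9, 0.1]), np.array([0.2, 0.8]), rng=rng)
print(round(lam, 2), p_mix)
```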
43. Vec2Sent: Probing Sentence Embeddings with Natural Language Generation [PDF] 返回目录
Martin Kerscher, Steffen Eger
Abstract: We introspect black-box sentence embeddings by conditionally generating from them with the objective to retrieve the underlying discrete sentence. We perceive of this as a new unsupervised probing task and show that it correlates well with downstream task performance. We also illustrate how the language generated from different encoders differs. We apply our approach to generate sentence analogies from sentence embeddings.
摘要:我们通过以黑盒句子嵌入为条件进行生成、以还原其底层离散句子为目标,来审视这些嵌入的内部信息。我们将其视为一种新的无监督探测任务,并表明它与下游任务性能有良好的相关性。我们还展示了不同编码器所生成语言之间的差异。最后,我们应用该方法从句子嵌入生成句子类比。
44. A Unifying Theory of Transition-based and Sequence Labeling Parsing [PDF] 返回目录
Carlos Gómez-Rodríguez, Michalina Strzyz, David Vilares
Abstract: We define a mapping from transition-based parsing algorithms that read sentences from left to right to sequence labeling encodings of syntactic trees. This not only establishes a theoretical relation between transition-based parsing and sequence-labeling parsing, but also provides a method to obtain new encodings for fast and simple sequence labeling parsing from the many existing transition-based parsers for different formalisms. Applying it to dependency parsing, we implement sequence labeling versions of four algorithms, showing that they are learnable and obtain comparable performance to existing encodings.
摘要:我们定义了一种映射,将从左到右读取句子的基于转移的句法分析算法映射为句法树的序列标注编码。这不仅在基于转移的分析与序列标注式分析之间建立了理论联系,还提供了一种方法,可以从现有的面向不同形式体系的众多基于转移的分析器中,获得用于快速简洁的序列标注式分析的新编码。将其应用于依存句法分析,我们实现了四种算法的序列标注版本,表明它们是可学习的,并获得了与现有编码相当的性能。
45. ASAD: A Twitter-based Benchmark Arabic Sentiment Analysis Dataset [PDF] 返回目录
Basma Alharbi, Hind Alamro, Manal Alshehri, Zuhair Khayyat, Manal Kalkatawi, Inji Ibrahim Jaber, Xiangliang Zhang
Abstract: This paper provides a detailed description of a new Twitter-based benchmark dataset for Arabic Sentiment Analysis (ASAD), which is launched in a competition, sponsored by KAUST for awarding 10000 USD, 5000 USD and 2000 USD to the first, second and third place winners, respectively. Compared to other publicly released Arabic datasets, ASAD is a large, high-quality annotated dataset (including 95K tweets), with three-class sentiment labels (positive, negative and neutral). We present the details of the data collection process and annotation process. In addition, we implement several baseline models for the competition task and report the results as a reference for the participants to the competition.
摘要:本文详细介绍了一个新的基于Twitter的阿拉伯语情感分析基准数据集(ASAD)。该数据集通过一项由KAUST赞助的竞赛发布,分别向第一、二、三名获胜者颁发10000美元、5000美元和2000美元奖金。与其他公开发布的阿拉伯语数据集相比,ASAD是一个大规模、高质量的标注数据集(包含9.5万条推文),带有三类情感标签(积极、消极和中性)。我们介绍了数据收集过程和标注过程的细节。此外,我们为竞赛任务实现了若干基线模型,并将结果报告出来,供参赛者参考。
46. Recent Neural Methods on Slot Filling and Intent Classification for Task-Oriented Dialogue Systems: A Survey [PDF] 返回目录
Samuel Louvan, Bernardo Magnini
Abstract: In recent years, fostered by deep learning technologies and by the high demand for conversational AI, various approaches have been proposed that address the capacity to elicit and understand user's needs in task-oriented dialogue systems. We focus on two core tasks, slot filling (SF) and intent classification (IC), and survey how neural-based models have rapidly evolved to address natural language understanding in dialogue systems. We introduce three neural architectures: independent model, which model SF and IC separately, joint models, which exploit the mutual benefit of the two tasks simultaneously, and transfer learning models, that scale the model to new domains. We discuss the current state of the research in SF and IC and highlight challenges that still require attention.
摘要:近年来,在深度学习技术和对话式AI的旺盛需求的推动下,人们提出了各种方法来解决任务型对话系统中引出并理解用户需求的问题。我们聚焦于两个核心任务——槽填充(SF)和意图分类(IC)——并综述基于神经网络的模型如何快速演进以解决对话系统中的自然语言理解。我们介绍三类神经架构:分别建模SF和IC的独立模型;同时利用两个任务互益性的联合模型;以及将模型扩展到新领域的迁移学习模型。我们讨论了SF和IC研究的现状,并指出仍需关注的挑战。
47. WLV-RIT at HASOC-Dravidian-CodeMix-FIRE2020: Offensive Language Identification in Code-switched YouTube Comments [PDF] 返回目录
Tharindu Ranasinghe, Sarthak Gupte, Marcos Zampieri, Ifeoma Nwogu
Abstract: This paper describes the WLV-RIT entry to the Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC) shared task 2020. The HASOC 2020 organizers provided participants with annotated datasets containing social media posts of code-mixed in Dravidian languages (Malayalam-English and Tamil-English). We participated in task 1: Offensive comment identification in Code-mixed Malayalam Youtube comments. In our methodology, we take advantage of available English data by applying cross-lingual contextual word embeddings and transfer learning to make predictions to Malayalam data. We further improve the results using various fine tuning strategies. Our system achieved 0.89 weighted average F1 score for the test set and it ranked 5th place out of 12 participants.
摘要:本文介绍WLV-RIT团队参加印欧语言仇恨言论与冒犯性内容识别(HASOC)2020共享任务的系统。HASOC 2020主办方为参赛者提供了包含达罗毗荼语系语码混用(马拉雅拉姆语-英语和泰米尔语-英语)社交媒体帖子的标注数据集。我们参加了任务1:语码混用的马拉雅拉姆语YouTube评论中的冒犯性评论识别。在我们的方法中,我们通过应用跨语言上下文词嵌入和迁移学习,利用已有的英语数据对马拉雅拉姆语数据进行预测,并使用多种微调策略进一步改进结果。我们的系统在测试集上取得0.89的加权平均F1分数,在12支参赛队伍中排名第5。
48. SMRT Chatbots: Improving Non-Task-Oriented Dialog with Simulated Multiple Reference Training [PDF] 返回目录
Huda Khayrallah, João Sedoc
Abstract: Non-task-oriented dialog models suffer from poor quality and non-diverse responses. To overcome limited conversational data, we apply Simulated Multiple Reference Training (SMRT; Khayrallah et al., 2020), and use a paraphraser to simulate multiple responses per training prompt. We find SMRT improves over a strong Transformer baseline as measured by human and automatic quality scores and lexical diversity. We also find SMRT is comparable to pretraining in human evaluation quality, and outperforms pretraining on automatic quality and lexical diversity, without requiring related-domain dialog data.
摘要:非任务型对话模型的回复往往质量差且缺乏多样性。为克服对话数据有限的问题,我们应用模拟多参考训练(SMRT;Khayrallah等,2020),使用复述模型为每个训练提示模拟多个回复。我们发现,按人工与自动质量评分及词汇多样性衡量,SMRT相对强Transformer基线均有提升。我们还发现,SMRT在人工评估质量上与预训练相当,在自动质量和词汇多样性上优于预训练,且无需相关领域的对话数据。
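In outline, SMRT turns one reference per prompt into several by paraphrasing. A hedged sketch, where `paraphrase` is a hypothetical stand-in for whatever paraphrase model is available:

```python
def smrt_expand(pairs, paraphrase, k=3):
    """Simulated multiple references, in outline: for every (prompt,
    response) pair, a paraphraser yields k alternative responses, each
    becoming an extra training pair. `paraphrase` is a hypothetical
    stand-in for whatever paraphrase model is available."""
    out = []
    for prompt, response in pairs:
        out.append((prompt, response))
        out.extend((prompt, p) for p in paraphrase(response, k))
    return out

# Trivial stand-in paraphraser, for demonstration only:
print(smrt_expand([("hi", "hello there")], lambda r, k: [r.upper()] * k, k=2))
```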
49. Semantic coordinates analysis reveals language changes in the AI field [PDF] 返回目录
Zining Zhu, Yang Xu, Frank Rudzicz
Abstract: Semantic shifts can reflect changes in beliefs across hundreds of years, but it is less clear whether trends in fast-changing communities across a short time can be detected. We propose semantic coordinates analysis, a method based on semantic shifts, that reveals changes in language within publications of a field (we use AI as example) across a short time span. We use GloVe-style probability ratios to quantify the shifting directions and extents from multiple viewpoints. We show that semantic coordinates analysis can detect shifts echoing changes of research interests (e.g., "deep" shifted further from "rigorous" to "neural"), and developments of research activities (e,g., "collaboration" contains less "competition" than "collaboration"), based on publications spanning as short as 10 years.
摘要:语义演变可以反映数百年间信念的变化,但短时间内快速变化的社区中的趋势能否被检测出来尚不清楚。我们提出语义坐标分析,一种基于语义演变的方法,用于揭示某一领域(我们以AI为例)的出版物在较短时间跨度内的语言变化。我们使用GloVe式的概率比从多个视角量化语义偏移的方向和幅度。我们表明,基于短至10年的出版物,语义坐标分析既能检测出呼应研究兴趣变化的偏移(例如,"deep"进一步从"rigorous"移向"neural"),也能检测出研究活动的发展(例如,"collaboration"所关联的"competition"意味少于"collaboration"本身)。
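The GloVe-style probability ratio mentioned here can be computed directly from co-occurrence counts. A toy sketch (the counts are invented; a real analysis counts over a time slice of publications):

```python
from collections import Counter

def prob_ratio(cooc, probe, word_a, word_b):
    """GloVe-style ratio P(probe|word_a) / P(probe|word_b) from raw
    co-occurrence counts; values above 1 mean the probe associates
    more strongly with word_a in this corpus slice."""
    p_a = cooc[word_a][probe] / sum(cooc[word_a].values())
    p_b = cooc[word_b][probe] / sum(cooc[word_b].values())
    return p_a / p_b

# Invented toy counts; a real analysis counts over a time slice of papers.
cooc = {
    "deep":    Counter({"neural": 80, "rigorous": 5,  "learning": 200}),
    "shallow": Counter({"neural": 10, "rigorous": 20, "learning": 50}),
}
print(round(prob_ratio(cooc, "neural", "deep", "shallow"), 2))   # ~2.25
```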
50. Deep Diacritization: Efficient Hierarchical Recurrence for Improved Arabic Diacritization [PDF] 返回目录
Badr AlKhamissi, Muhammad N. ElNokrashy, Mohamed Gabr
Abstract: We propose a novel architecture for labelling character sequences that achieves state-of-the-art results on the Tashkeela Arabic diacritization benchmark. The core is a two-level recurrence hierarchy that operates on the word and character levels separately---enabling faster training and inference than comparable traditional models. A cross-level attention module further connects the two, and opens the door for network interpretability. The task module is a softmax classifier that enumerates valid combinations of diacritics. This architecture can be extended with a recurrent decoder that optionally accepts priors from partially diacritized text, which improves results. We employ extra tricks such as sentence dropout and majority voting to further boost the final result. Our best model achieves a WER of 5.34%, outperforming the previous state-of-the-art with a 30.56% relative error reduction.
摘要:我们提出一种用于字符序列标注的新架构,在Tashkeela阿拉伯语加符(diacritization)基准上取得了最先进的结果。其核心是一个两级循环层次结构,分别在词级和字符级上运作,从而比同等的传统模型训练和推理更快。一个跨层级注意力模块进一步连接两者,并为网络可解释性打开了大门。任务模块是一个枚举变音符号有效组合的softmax分类器。该架构可以扩展一个循环解码器,选择性地接受来自部分加符文本的先验,从而改进结果。我们还采用句子dropout和多数投票等额外技巧进一步提升最终结果。我们的最佳模型取得5.34%的词错误率(WER),以30.56%的相对错误率降低超越了此前的最佳水平。
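The two-level recurrence can be sketched compactly: a character-level RNN summarises each word, and a word-level RNN runs over those summaries, so neither level unrolls over the full character sequence. A minimal numpy illustration with untrained random weights (dimensions are arbitrary; the paper uses trained recurrent cells, not this toy):

```python
import numpy as np

def rnn_last(seq, Wx, Wh):
    """Minimal vanilla RNN; returns the final hidden state."""
    h = np.zeros(Wh.shape[0])
    for x in seq:
        h = np.tanh(Wx @ x + Wh @ h)
    return h

def hierarchical_encode(words, char_emb, d=16, seed=0):
    """Character RNN per word, then a word RNN over the word summaries,
    so neither level unrolls over the whole character sequence."""
    rng = np.random.default_rng(seed)
    Wx_c = rng.normal(scale=0.1, size=(d, char_emb.shape[1]))
    Wh_c = rng.normal(scale=0.1, size=(d, d))
    Wx_w = rng.normal(scale=0.1, size=(d, d))
    Wh_w = rng.normal(scale=0.1, size=(d, d))
    word_vecs = [rnn_last([char_emb[c] for c in w], Wx_c, Wh_c) for w in words]
    return rnn_last(word_vecs, Wx_w, Wh_w)

chars = np.eye(28)                    # toy one-hot character embeddings
print(hierarchical_encode([[0, 1, 2], [3, 4]], chars).shape)   # (16,)
```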
51. CHIME: Cross-passage Hierarchical Memory Network for Generative Review Question Answering [PDF] 返回目录
Junru Lu, Gabriele Pergola, Lin Gui, Binyang Li, Yulan He
Abstract: We introduce CHIME, a cross-passage hierarchical memory network for question answering (QA) via text generation. It extends XLNet introducing an auxiliary memory module consisting of two components: the context memory collecting cross-passage evidences, and the answer memory working as a buffer continually refining the generated answers. Empirically, we show the efficacy of the proposed architecture in the multi-passage generative QA, outperforming the state-of-the-art baselines with better syntactically well-formed answers and increased precision in addressing the questions of the AmazonQA review dataset. An additional qualitative analysis revealed the interpretability introduced by the memory module.
摘要:我们提出CHIME,一种通过文本生成进行问答(QA)的跨篇章层次化记忆网络。它在XLNet的基础上引入了由两个组件构成的辅助记忆模块:收集跨篇章证据的上下文记忆,以及作为缓冲区持续精炼所生成答案的答案记忆。实证结果表明,所提架构在多篇章生成式问答中行之有效,以句法更规范的答案和更高的答题精度超越了最先进的基线,用于解答AmazonQA评论数据集中的问题。额外的定性分析揭示了记忆模块带来的可解释性。
52. Deconstruct to Reconstruct a Configurable Evaluation Metric for Open-Domain Dialogue Systems [PDF] 返回目录
Vitou Phy, Yang Zhao, Akiko Aizawa
Abstract: Many automatic evaluation metrics have been proposed to score the overall quality of a response in open-domain dialogue. Generally, the overall quality is comprised of various aspects, such as relevancy, specificity, and empathy, and the importance of each aspect differs according to the task. For instance, specificity is mandatory in a food-ordering dialogue task, whereas fluency is preferred in a language-teaching dialogue system. However, existing metrics are not designed to cope with such flexibility. For example, BLEU score fundamentally relies only on word overlapping, whereas BERTScore relies on semantic similarity between reference and candidate response. Thus, they are not guaranteed to capture the required aspects, i.e., specificity. To design a metric that is flexible to a task, we first propose making these qualities manageable by grouping them into three groups: understandability, sensibleness, and likability, where likability is a combination of qualities that are essential for a task. We also propose a simple method to composite metrics of each aspect to obtain a single metric called USL-H, which stands for Understandability, Sensibleness, and Likability in Hierarchy. We demonstrated that USL-H score achieves good correlations with human judgment and maintains its configurability towards different aspects and metrics.
摘要:人们已提出许多自动评估指标来为开放域对话中回复的总体质量打分。总体质量通常由多个方面构成,例如相关性、具体性和同理心,且各方面的重要性因任务而异。例如,在订餐对话任务中具体性是必需的,而在语言教学对话系统中流利度更受重视。然而,现有指标并非为应对这种灵活性而设计。例如,BLEU分数从根本上只依赖词重叠,而BERTScore依赖参考回复与候选回复之间的语义相似度,因此无法保证捕捉到所需的方面(如具体性)。为了设计对任务灵活的指标,我们首先提出将这些质量维度归为三组使其可管理:可理解性、合理性和可喜爱性,其中可喜爱性是对特定任务至关重要的各种质量的组合。我们还提出一种简单的方法来组合各方面的指标,得到一个称为USL-H的单一指标,其含义为层次结构中的可理解性(Understandability)、合理性(Sensibleness)与可喜爱性(Likability)。我们证明USL-H分数与人类判断有良好的相关性,并且对不同方面和指标保持可配置性。
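The abstract does not give the composition formula, so the following is only one plausible hierarchical gating of the three scores, with configurable weights; it is an assumption, not the paper's definition:

```python
def usl_h(understandable, sensible, likable, weights=(1.0, 1.0, 1.0)):
    """Hierarchical gating of the three aspect scores (each in [0, 1]):
    a response must first be understandable, then sensible, before
    likability matters, and the weights make the metric configurable.
    This formula is an assumption, not the paper's definition."""
    u, s, l = understandable, sensible, likable
    gated = (u, u * s, u * s * l)
    return sum(w * g for w, g in zip(weights, gated)) / sum(weights)

print(round(usl_h(0.9, 0.8, 0.6), 3))   # 0.684
```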
53. Transformer-based Multi-Aspect Modeling for Multi-Aspect Multi-Sentiment Analysis [PDF] 返回目录
Zhen Wu, Chengcan Ying, Xinyu Dai, Shujian Huang, Jiajun Chen
Abstract: Aspect-based sentiment analysis (ABSA) aims at analyzing the sentiment of a given aspect in a sentence. Recently, neural network-based methods have achieved promising results in existing ABSA datasets. However, these datasets tend to degenerate to sentence-level sentiment analysis because most sentences contain only one aspect or multiple aspects with the same sentiment polarity. To facilitate the research of ABSA, NLPCC 2020 Shared Task 2 releases a new large-scale Multi-Aspect Multi-Sentiment (MAMS) dataset. In the MAMS dataset, each sentence contains at least two different aspects with different sentiment polarities, which makes ABSA more complex and challenging. To address the challenging dataset, we re-formalize ABSA as a problem of multi-aspect sentiment analysis, and propose a novel Transformer-based Multi-aspect Modeling scheme (TMM), which can capture potential relations between multiple aspects and simultaneously detect the sentiment of all aspects in a sentence. Experiment results on the MAMS dataset show that our method achieves noticeable improvements compared with strong baselines such as BERT and RoBERTa, and finally ranks the 2nd in NLPCC 2020 Shared Task 2 Evaluation.
摘要:基于方面的情感分析(ABSA)旨在分析句子中给定方面的情感。最近,基于神经网络的方法在现有ABSA数据集上取得了可喜的结果。然而,由于大多数句子只包含一个方面,或多个方面具有相同的情感极性,这些数据集往往退化为句子级情感分析。为促进ABSA研究,NLPCC 2020共享任务2发布了一个新的大规模多方面多情感(MAMS)数据集。在MAMS数据集中,每个句子至少包含两个情感极性不同的方面,这使ABSA更加复杂且更具挑战性。针对这一具有挑战性的数据集,我们将ABSA重新形式化为多方面情感分析问题,并提出一种新颖的基于Transformer的多方面建模方案(TMM),它能够捕捉多个方面之间的潜在关系,并同时检测句子中所有方面的情感。在MAMS数据集上的实验结果表明,与BERT和RoBERTa等强基线相比,我们的方法取得了明显改进,并最终在NLPCC 2020共享任务2评测中排名第二。
54. Opinion Transmission Network for Jointly Improving Aspect-oriented Opinion Words Extraction and Sentiment Classification [PDF] 返回目录
Chengcan Ying, Zhen Wu, Xinyu Dai, Shujian Huang, Jiajun Chen
Abstract: Aspect-level sentiment classification (ALSC) and aspect oriented opinion words extraction (AOWE) are two highly relevant aspect-based sentiment analysis (ABSA) subtasks. They respectively aim to detect the sentiment polarity and extract the corresponding opinion words toward a given aspect in a sentence. Previous works separate them and focus on one of them by training neural models on small-scale labeled data, while neglecting the connections between them. In this paper, we propose a novel joint model, Opinion Transmission Network (OTN), to exploit the potential bridge between ALSC and AOWE to achieve the goal of facilitating them simultaneously. Specifically, we design two tailor-made opinion transmission mechanisms to control opinion clues flow bidirectionally, respectively from ALSC to AOWE and AOWE to ALSC. Experiment results on two benchmark datasets show that our joint model outperforms strong baselines on the two tasks. Further analysis also validates the effectiveness of opinion transmission mechanisms.
摘要:方面级情感分类(ALSC)和面向方面的观点词抽取(AOWE)是两个高度相关的基于方面的情感分析(ABSA)子任务。它们分别旨在检测句子中给定方面的情感极性,以及抽取指向该方面的相应观点词。先前的工作将二者分开,通过在小规模标注数据上训练神经模型来专注于其中之一,而忽视了它们之间的联系。在本文中,我们提出一种新颖的联合模型——观点传递网络(OTN),以利用ALSC和AOWE之间的潜在桥梁,实现同时促进二者的目标。具体而言,我们设计了两种量身定制的观点传递机制,控制观点线索分别从ALSC流向AOWE以及从AOWE流向ALSC,实现双向流动。在两个基准数据集上的实验结果表明,我们的联合模型在这两项任务上都优于强基线。进一步的分析也验证了观点传递机制的有效性。
55. Seeing Both the Forest and the Trees: Multi-head Attention for Joint Classification on Different Compositional Levels [PDF] 返回目录
Miruna Pislar, Marek Rei
Abstract: In natural languages, words are used in association to construct sentences. It is not words in isolation, but the appropriate combination of hierarchical structures that conveys the meaning of the whole sentence. Neural networks can capture expressive language features; however, insights into the link between words and sentences are difficult to acquire automatically. In this work, we design a deep neural network architecture that explicitly wires lower and higher linguistic components; we then evaluate its ability to perform the same task at different hierarchical levels. Settling on broad text classification tasks, we show that our model, MHAL, learns to simultaneously solve them at different levels of granularity by fluidly transferring knowledge between hierarchies. Using a multi-head attention mechanism to tie the representations between single words and full sentences, MHAL systematically outperforms equivalent models that are not incentivized towards developing compositional representations. Moreover, we demonstrate that, with the proposed architecture, the sentence information flows naturally to individual words, allowing the model to behave like a sequence labeller (which is a lower, word-level task) even without any word supervision, in a zero-shot fashion.
摘要:在自然语言中,词通过组合来构建句子。传达整句含义的不是孤立的词,而是层次化结构的恰当组合。神经网络能够捕捉富有表现力的语言特征,然而,关于词与句子之间联系的洞见却难以自动获得。在这项工作中,我们设计了一种显式连接低层与高层语言成分的深度神经网络架构,并评估其在不同层次上执行同一任务的能力。以广泛的文本分类任务为背景,我们表明我们的模型MHAL通过在层次之间流畅地迁移知识,学会了在不同粒度上同时求解这些任务。MHAL使用多头注意力机制将单个词与完整句子的表示联系起来,系统性地优于未被激励去发展组合式表示的同等模型。此外,我们证明,在所提架构下,句子信息自然地流向单个词,使模型即使没有任何词级监督,也能以零样本方式表现得像一个序列标注器(一个更低层的词级任务)。
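A minimal sketch of the idea of tying word-level and sentence-level decisions with per-label attention heads (the parameterisation is an illustrative assumption, not MHAL's exact architecture):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sentence_scores(word_states, label_queries):
    """One attention head per sentence label: each head's query scores
    every word, the softmax weights pool word states into per-label
    evidence, and the pooled evidence is scored against the same query.
    The word-level attention weights double as soft word-level labels."""
    scores = label_queries @ word_states.T          # (labels, words)
    attn = softmax(scores, axis=1)                  # per-label pooling weights
    pooled = attn @ word_states                     # (labels, d) evidence
    logits = (label_queries * pooled).sum(axis=1)   # (labels,)
    return softmax(logits), attn

rng = np.random.default_rng(1)
probs, attn = sentence_scores(rng.normal(size=(7, 32)), rng.normal(size=(2, 32)))
print(probs.shape, attn.shape)                      # (2,) (2, 7)
```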
56. Fake or Real? A Study of Arabic Satirical Fake News [PDF] 返回目录
Hadeel Saadany, Emad Mohamed, Constantin Orasan
Abstract: One very common type of fake news is satire which comes in a form of a news website or an online platform that parodies reputable real news agencies to create a sarcastic version of reality. This type of fake news is often disseminated by individuals on their online platforms as it has a much stronger effect in delivering criticism than through a straightforward message. However, when the satirical text is disseminated via social media without mention of its source, it can be mistaken for real news. This study conducts several exploratory analyses to identify the linguistic properties of Arabic fake news with satirical content. We exploit these features to build a number of machine learning models capable of identifying satirical fake news with an accuracy of up to 98.6%.
摘要:一种非常常见的假新闻是讽刺新闻,其形式是模仿知名真实新闻机构的新闻网站或在线平台,以创作现实的嘲讽版本。这类假新闻常由个人在其网络平台上传播,因为它在传递批评方面比直白的表达更有力。然而,当讽刺文本经社交媒体传播而未注明出处时,可能会被误认为真实新闻。本研究进行了多项探索性分析,以识别带有讽刺内容的阿拉伯语假新闻的语言学特性。我们利用这些特征构建了多个机器学习模型,识别讽刺性假新闻的准确率高达98.6%。
57. Improving Cyberbully Detection with User Interaction [PDF] 返回目录
Suyu Ge, Lu Cheng, Huan Liu
Abstract: Cyberbullying, identified as intended and repeated online bullying behavior, has become increasingly prevalent in the past few decades. Despite the significant progress made thus far, the focus of most existing work on cyberbullying detection lies in the independent content analysis of different comments within a social media session. We argue that such leading notions of analysis suffer from three key limitations: they overlook the temporal correlations among different comments; they only consider the content within a single comment rather than the topic coherence across comments; they remain generic and exploit limited interactions between social media users. In this work, we observe that user comments in the same session may be inherently related, e.g., discussing similar topics, and their interaction may evolve over time. We also show that modeling such topic coherence and temporal interaction are critical to capture the repetitive characteristics of bullying behavior, thus leading to better predicting performance. To achieve the goal, we first construct a unified temporal graph for each social media session. Drawing on recent advances in graph neural network, we then propose a principled approach for modeling the temporal dynamics and topic coherence throughout user interactions. We empirically evaluate the effectiveness of our approach with the tasks of session-level bullying detection and comment-level case study.
摘要:网络霸凌,即有意且重复的在线欺凌行为,在过去几十年中日益普遍。尽管迄今已取得重大进展,现有的网络霸凌检测工作大多侧重于对社交媒体会话中不同评论的独立内容分析。我们认为这类主流分析思路存在三个关键局限:它们忽视了不同评论之间的时间相关性;它们只考虑单条评论内的内容,而非跨评论的主题连贯性;它们停留在通用层面,仅利用了社交媒体用户之间有限的交互。在这项工作中,我们观察到同一会话中的用户评论可能天然相关(例如讨论相似的话题),且其交互可能随时间演化。我们还表明,对这种主题连贯性和时间交互进行建模,对于捕捉霸凌行为的重复性特征至关重要,从而带来更好的预测性能。为实现这一目标,我们首先为每个社交媒体会话构建一个统一的时间图。借鉴图神经网络的最新进展,我们提出了一种在整个用户交互过程中对时间动态和主题连贯性建模的原则性方法。我们通过会话级霸凌检测和评论级案例研究两项任务,实证评估了我们方法的有效性。
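A minimal sketch of what a unified temporal session graph could contain: comment nodes linked by temporal order and by authorship. The edge types here are assumptions for illustration; the paper's graph construction may differ:

```python
from dataclasses import dataclass, field

@dataclass
class SessionGraph:
    """Comments as nodes; edges record temporal order and authorship so
    that topic coherence and interaction order are both explicit."""
    edges: list = field(default_factory=list)

    def add_comment(self, cid, user, prev_cid=None):
        self.edges.append((f"user:{user}", cid, "wrote"))
        if prev_cid is not None:
            self.edges.append((prev_cid, cid, "next-in-time"))

g = SessionGraph()
g.add_comment("c1", "alice")
g.add_comment("c2", "bob", prev_cid="c1")
print(g.edges)
```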
58. Analyzing the Effect of Multi-task Learning for Biomedical Named Entity Recognition [PDF] 返回目录
Arda Akdemir, Tetsuo Shibuya
Abstract: Developing high-performing systems for detecting biomedical named entities has major implications. State-of-the-art deep-learning based solutions for entity recognition often require large annotated datasets, which are not available in the biomedical domain. Transfer learning and multi-task learning have been shown to improve performance for low-resource domains. However, the applications of these methods are relatively scarce in the biomedical domain, and a theoretical understanding of why these methods improve performance is lacking. In this study, we performed an extensive analysis to understand the transferability between different biomedical entity datasets, and found useful measures for predicting transferability between these datasets. In addition, we propose combining transfer learning and multi-task learning to improve the performance of biomedical named entity recognition systems, which, to the best of our knowledge, has not been applied before.
摘要:开发用于检测生物医学命名实体的高性能系统意义重大。最先进的基于深度学习的实体识别方案通常需要大规模标注数据集,而这在生物医学领域并不可得。迁移学习和多任务学习已被证明能提升低资源领域的性能,但这些方法在生物医学领域的应用相对较少,并且对其为何能提升性能也缺乏理论上的理解。在本研究中,我们进行了大量分析,以理解不同生物医学实体数据集之间的可迁移性,并找到了可用于预测数据集间可迁移性的有效度量。此外,我们提出将迁移学习与多任务学习相结合来提升生物医学命名实体识别系统的性能;据我们所知,这种组合此前尚未被应用。
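To make the multi-task setup concrete, here is an illustrative PyTorch sketch (not the paper's code): one shared BiLSTM encoder with a separate token-classification head per biomedical NER dataset. All sizes, and the two example tag-set sizes, are placeholders.

```python
import torch
import torch.nn as nn

class MultiTaskNER(nn.Module):
    def __init__(self, vocab_size=30000, emb=128, hidden=256,
                 labels_per_task=(5, 7)):  # hypothetical tag-set sizes
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True,
                               bidirectional=True)  # shared across tasks
        self.heads = nn.ModuleList(
            nn.Linear(2 * hidden, n) for n in labels_per_task)

    def forward(self, token_ids, task_id):
        h, _ = self.encoder(self.embed(token_ids))
        return self.heads[task_id](h)  # per-token label logits for this task

model = MultiTaskNER()
logits = model(torch.randint(0, 30000, (2, 16)), task_id=0)
print(logits.shape)  # torch.Size([2, 16, 5])
```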
59. Deep Learning for Text Attribute Transfer: A Survey [PDF] 返回目录
Di Jin, Zhijing Jin, Rada Mihalcea
Abstract: Driven by increasingly larger deep learning models, neural language generation (NLG) has enjoyed unprecedented improvement and is now able to generate a diversity of human-like texts on demand, granting itself the capability of serving as a human writing assistant. Text attribute transfer is one of the most important NLG tasks, which aims to control certain attributes that people may expect the texts to possess, such as sentiment, tense, emotion, political position, etc. It has a long history in Natural Language Processing but has recently gained much more attention thanks to the promising performance brought by deep learning models. In this article, we present a systematic survey of these works on neural text attribute transfer. We collect all related academic works since the first appearance in 2017, and then select, summarize, discuss, and analyze around 65 representative works in a comprehensive way. Overall, we cover the task formulation, existing datasets and metrics for model development and evaluation, and all methods developed over the last several years. We reveal that existing methods are indeed based on a combination of several loss functions, each of which serves a certain goal; this unique perspective could shed light on the design of new methods. We conclude our survey with a discussion of open issues that need to be resolved for better future development.
摘要:在越来越大的深度学习模型的推动下,自然语言生成(NLG)取得了前所未有的进步,如今已能按需生成多样的类人文本,从而具备了充当人类写作助手的能力。文本属性迁移是最重要的NLG任务之一,旨在控制人们期望文本具备的某些属性,例如情感、时态、情绪、政治立场等。它在自然语言处理中历史悠久,但由于深度学习模型带来的出色表现,近来受到更多关注。在本文中,我们对神经文本属性迁移的相关工作进行了系统综述:我们收集了自2017年该方向首次出现以来的所有相关学术工作,然后以全面的方式挑选、总结、讨论并分析了约65篇代表性工作。总体而言,我们涵盖了任务定义、用于模型开发与评估的现有数据集和指标,以及近几年发展出的所有方法。我们揭示出,现有方法实际上都基于若干损失函数的组合,其中每个损失函数服务于特定目标;这一独特视角有望为新方法的设计带来启发。最后,我们讨论了有待解决的开放问题,以期推动该领域未来更好的发展。
60. Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies [PDF] 返回目录
Alexander H. Liu, Yu-An Chung, James Glass
Abstract: Self-supervised speech representations have been shown to be effective in a variety of speech applications. However, existing representation learning methods generally rely on the autoregressive model and/or observed global dependencies while generating the representation. In this work, we propose Non-Autoregressive Predictive Coding (NPC), a self-supervised method, to learn a speech representation in a non-autoregressive manner by relying only on local dependencies of speech. NPC has a conceptually simple objective and can be implemented easily with the introduced Masked Convolution Blocks. NPC offers a significant speedup for inference since it is parallelizable in time and has a fixed inference time for each time step regardless of the input sequence length. We discuss and verify the effectiveness of NPC by theoretically and empirically comparing it with other methods. We show that the NPC representation is comparable to other methods in speech experiments on phonetic and speaker classification while being more efficient.
摘要:自监督语音表示已被证明在多种语音应用中行之有效。然而,现有的表示学习方法在生成表示时通常依赖自回归模型和/或观察到的全局依赖。在这项工作中,我们提出非自回归预测编码(NPC)这一自监督方法,仅依赖语音的局部依赖、以非自回归的方式学习语音表示。NPC的目标在概念上十分简单,并且可以借助文中引入的掩码卷积块(Masked Convolution Blocks)轻松实现。由于NPC在时间维度上可并行,且每个时间步的推理耗时固定、与输入序列长度无关,它能显著加速推理。我们通过理论分析和实证比较验证了NPC相对其他方法的有效性。实验表明,在音素分类和说话人分类任务上,NPC表示与其他方法效果相当,同时效率更高。
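The core NPC ingredient can be approximated in a few lines: predict each frame from masked local context only. The sketch below zeroes the center taps of a 1-D convolution as a stand-in for the paper's Masked Convolution Blocks; the kernel and mask widths are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv1d(nn.Conv1d):
    def __init__(self, ch, kernel_size=9, mask_width=3):
        super().__init__(ch, ch, kernel_size, padding=kernel_size // 2)
        mask = torch.ones(1, 1, kernel_size)
        c = kernel_size // 2
        mask[..., c - mask_width // 2 : c + mask_width // 2 + 1] = 0.0
        self.register_buffer("mask", mask)  # zero out the center taps

    def forward(self, x):                   # x: (batch, ch, time)
        return F.conv1d(x, self.weight * self.mask, self.bias,
                        padding=self.padding[0])

feats = torch.randn(4, 80, 100)             # e.g. 80-dim filterbank frames
pred = MaskedConv1d(80)(feats)               # local, non-autoregressive code
loss = F.l1_loss(pred, feats)                # reconstruct the hidden frames
print(loss.item())
```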
61. Towards A Friendly Online Community: An Unsupervised Style Transfer Framework for Profanity Redaction [PDF] 返回目录
Minh Tran, Yipeng Zhang, Mohammad Soleymani
Abstract: Offensive and abusive language is a pressing problem on social media platforms. In this work, we propose a method for transforming offensive comments, statements containing profanity or offensive language, into non-offensive ones. We design a RETRIEVE, GENERATE and EDIT unsupervised style transfer pipeline to redact the offensive comments in a word-restricted manner while maintaining a high level of fluency and preserving the content of the original text. We extensively evaluate our method's performance and compare it to previous style transfer models using both automatic metrics and human evaluations. Experimental results show that our method outperforms other models on human evaluations and is the only approach that consistently performs well on all automatic evaluation metrics.
摘要:冒犯性和辱骂性语言是社交媒体平台上的一个紧迫问题。在这项工作中,我们提出一种将冒犯性评论(即包含脏话或冒犯性语言的表述)转换为非冒犯性表述的方法。我们设计了一个"检索-生成-编辑"的无监督风格迁移流水线,以词级受限的方式对冒犯性评论进行脱敏,同时保持较高的流畅度并保留原文内容。我们对方法的性能进行了广泛评估,并使用自动指标和人工评估将其与以往的风格迁移模型进行比较。实验结果表明,我们的方法在人工评估中优于其他模型,并且是唯一在所有自动评估指标上都始终表现良好的方法。
62. Investigation of BERT Model on Biomedical Relation Extraction Based on Revised Fine-tuning Mechanism [PDF] 返回目录
Peng Su, K. Vijay-Shanker
Abstract: With the explosive growth of biomedical literature, designing automatic tools to extract information from the literature has great significance in biomedical research. Recently, transformer-based BERT models adapted to the biomedical domain have produced leading results. However, all the existing BERT models for relation classification only utilize partial knowledge from the last layer. In this paper, we investigate a method of utilizing the entire layer in the fine-tuning process of the BERT model. To the best of our knowledge, we are the first to explore this method. The experimental results illustrate that our method improves the BERT model's performance and outperforms the state-of-the-art methods on three benchmark datasets for different relation extraction tasks. In addition, further analysis indicates that the key knowledge about the relations can be learned from the last layer of the BERT model.
摘要:随着生物医学文献的爆炸式增长,设计能从文献中自动抽取信息的工具对生物医学研究意义重大。近来,适配到生物医学领域的基于Transformer的BERT模型取得了领先的结果。然而,现有用于关系分类的BERT模型都只利用了最后一层的部分知识。在本文中,我们研究在BERT微调过程中利用整个层的方法;据我们所知,我们是最早探索这一方法的工作。实验结果表明,我们的方法提升了BERT模型的性能,并在三个不同关系抽取任务的基准数据集上超过了最先进的方法。此外,进一步的分析表明,关于关系的关键知识可以从BERT模型的最后一层学到。
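Read this way, the revision is about pooling the entire last layer instead of the [CLS] vector alone. Below is a hedged sketch of that reading using the Hugging Face transformers API; the mean-pooling choice and the classifier head are our own illustrative assumptions, not the paper's exact mechanism.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class WholeLayerClassifier(nn.Module):
    def __init__(self, name="bert-base-uncased", num_relations=5):
        super().__init__()
        self.bert = AutoModel.from_pretrained(name)
        self.cls = nn.Linear(self.bert.config.hidden_size, num_relations)

    def forward(self, input_ids, attention_mask):
        h = self.bert(input_ids=input_ids,
                      attention_mask=attention_mask).last_hidden_state
        mask = attention_mask.unsqueeze(-1).float()    # ignore padding tokens
        pooled = (h * mask).sum(1) / mask.sum(1)       # whole-layer mean pool
        return self.cls(pooled)                        # relation logits

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = tok("aspirin treats headache", return_tensors="pt")
model = WholeLayerClassifier()
print(model(enc.input_ids, enc.attention_mask).shape)  # torch.Size([1, 5])
```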
63. Be More with Less: Hypergraph Attention Networks for Inductive Text Classification [PDF] 返回目录
Kaize Ding, Jianling Wang, Jundong Li, Dingcheng Li, Huan Liu
Abstract: Text classification is a critical research topic with broad applications in natural language processing. Recently, graph neural networks (GNNs) have received increasing attention in the research community and demonstrated their promising results on this canonical task. Despite the success, their performance could be largely jeopardized in practice since they are: (1) unable to capture high-order interaction between words; (2) inefficient to handle large datasets and new documents. To address those issues, in this paper, we propose a principled model -- hypergraph attention networks (HyperGAT), which can obtain more expressive power with less computational consumption for text representation learning. Extensive experiments on various benchmark datasets demonstrate the efficacy of the proposed approach on the text classification task.
摘要:文本分类是自然语言处理中应用广泛的重要研究课题。近来,图神经网络(GNN)在研究界受到越来越多的关注,并在这一经典任务上展现出可喜的结果。尽管取得了成功,它们在实践中的性能仍可能大打折扣,因为它们:(1)无法捕捉词与词之间的高阶交互;(2)难以高效处理大型数据集和新文档。为解决这些问题,本文提出一个原则性的模型:超图注意力网络(HyperGAT),它能以更少的计算开销为文本表示学习提供更强的表达能力。在多个基准数据集上的大量实验证明了所提方法在文本分类任务上的有效性。
64. Efficient Arabic emotion recognition using deep neural networks [PDF] 返回目录
Ahmed Ali, Yasser Hifny
Abstract: Emotion recognition from speech signal based on deep learning is an active research area. Convolutional neural networks (CNNs) may be the dominant method in this area. In this paper, we implement two neural architectures to address this problem. The first architecture is an attention-based CNN-LSTM-DNN model. In this novel architecture, the convolutional layers extract salient features and the bi-directional long short-term memory (BLSTM) layers handle the sequential phenomena of the speech signal. This is followed by an attention layer, which extracts a summary vector that is fed to the fully connected dense layer (DNN), which finally connects to a softmax output layer. The second architecture is based on a deep CNN model. The results on an Arabic speech emotion recognition task show that our innovative approach can lead to significant improvements (2.2% absolute improvements) over a strong deep CNN baseline system. On the other hand, the deep CNN models are significantly faster than the attention based CNN-LSTM-DNN models in training and classification.
摘要:基于深度学习的语音情感识别是一个活跃的研究领域,卷积神经网络(CNN)或许是该领域的主流方法。在本文中,我们实现了两种神经网络架构来解决这一问题。第一种是基于注意力的CNN-LSTM-DNN模型:在这一新架构中,卷积层提取显著特征,双向长短期记忆(BLSTM)层处理语音信号的时序现象,随后的注意力层提取一个摘要向量,送入全连接层(DNN),最后连接到softmax输出层。第二种架构基于深度CNN模型。在阿拉伯语语音情感识别任务上的结果表明,相对于强大的深度CNN基线系统,我们的创新方法能带来显著提升(绝对提升2.2%)。另一方面,深度CNN模型在训练和分类上明显快于基于注意力的CNN-LSTM-DNN模型。
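For orientation, an illustrative PyTorch version of the first architecture (CNN feature extraction, BLSTM, additive attention summary, DNN, softmax output); all layer sizes are placeholders rather than the paper's settings.

```python
import torch
import torch.nn as nn

class AttnCnnBlstmDnn(nn.Module):
    def __init__(self, n_mels=40, n_emotions=4, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_mels, 64, 5, padding=2), nn.ReLU())
        self.blstm = nn.LSTM(64, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)           # additive attention scores
        self.dnn = nn.Sequential(
            nn.Linear(2 * hidden, 64), nn.ReLU(), nn.Linear(64, n_emotions))

    def forward(self, x):                              # x: (B, n_mels, T)
        h, _ = self.blstm(self.conv(x).transpose(1, 2))    # (B, T, 2*hidden)
        a = torch.softmax(self.attn(h), dim=1)             # (B, T, 1)
        summary = (a * h).sum(dim=1)                       # attention pooling
        return self.dnn(summary)                           # emotion logits

print(AttnCnnBlstmDnn()(torch.randn(2, 40, 120)).shape)  # torch.Size([2, 4])
```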
65. Aspectuality Across Genre: A Distributional Semantics Approach [PDF] 返回目录
Thomas Kober, Malihe Alikhani, Matthew Stone, Mark Steedman
Abstract: The interpretation of the lexical aspect of verbs in English plays a crucial role for recognizing textual entailment and learning discourse-level inferences. We show that two elementary dimensions of aspectual class, states vs. events, and telic vs. atelic events, can be modelled effectively with distributional semantics. We find that a verb's local context is most indicative of its aspectual class, and demonstrate that closed class words tend to be stronger discriminating contexts than content words. Our approach outperforms previous work on three datasets. Lastly, we contribute a dataset of human--human conversations annotated with lexical aspect and present experiments that show the correlation of telicity with genre and discourse goals.
摘要:英语动词词汇体(lexical aspect)的解释对于识别文本蕴含和学习语篇级推理至关重要。我们表明,体类别的两个基本维度,即状态与事件、有界(telic)事件与无界(atelic)事件,可以用分布语义有效建模。我们发现动词的局部上下文最能指示其体类别,并证明封闭类词往往比实义词更具判别力。我们的方法在三个数据集上优于以往工作。最后,我们贡献了一个带有词汇体标注的人与人对话数据集,并通过实验展示了有界性(telicity)与体裁及语篇目标之间的相关性。
66. Pick a Fight or Bite your Tongue: Investigation of Gender Differences in Idiomatic Language Usage [PDF] 返回目录
Ella Rabinovich, Hila Gonen, Suzanne Stevenson
Abstract: A large body of research on gender-linked language has established foundations regarding cross-gender differences in lexical, emotional, and topical preferences, along with their sociological underpinnings. We compile a novel, large and diverse corpus of spontaneous linguistic productions annotated with speakers' gender, and perform a first large-scale empirical study of distinctions in the usage of figurative language between male and female authors. Our analyses suggest that (1) idiomatic choices reflect gender-specific lexical and semantic preferences in general language, (2) men's and women's idiomatic usages express higher emotion than their literal language, with detectable, albeit more subtle, differences between male and female authors along the dimension of dominance compared to similar distinctions in their literal utterances, and (3) contextual analysis of idiomatic expressions reveals considerable differences, reflecting subtle divergences in usage environments, shaped by cross-gender communication styles and semantic biases.
摘要:关于性别相关语言的大量研究已就词汇、情感和话题偏好上的性别差异及其社会学基础奠定了基础。我们编制了一个规模大且多样、带有说话者性别标注的自发语言产出语料库,并对男性与女性作者在比喻语言使用上的差异开展了首个大规模实证研究。我们的分析表明:(1)习语选择反映了一般语言中与性别相关的词汇和语义偏好;(2)男性和女性的习语用法都比其字面语言表达更强的情感,而与字面话语中的类似差异相比,男女作者在"支配性"这一维度上存在可检测到的、尽管更为微妙的差异;(3)对习语表达的上下文分析揭示出相当大的差异,反映了使用环境中的微妙分歧,这种分歧由跨性别的交流风格和语义偏向塑造。
67. Effective Approach to Develop a Sentiment Annotator For Legal Domain in a Low Resource Setting [PDF] 返回目录
Gathika Ratnayaka, Nisansa de Silva, Amal Shehan Perera, Ramesh Pathirana
Abstract: Analyzing the sentiments of legal opinions available in Legal Opinion Texts can facilitate several use cases such as legal judgement prediction, contradictory statements identification and party-based sentiment analysis. However, the task of developing a legal domain specific sentiment annotator is challenging due to resource constraints such as lack of domain specific labelled data and domain expertise. In this study, we propose novel techniques that can be used to develop a sentiment annotator for the legal domain while minimizing the need for manual annotations of data.
摘要:分析法律意见文本中法律观点的情感倾向可以支撑多种用例,例如法律判决预测、矛盾陈述识别和基于当事方的情感分析。然而,由于缺乏领域标注数据和领域专业知识等资源限制,开发面向法律领域的情感标注器颇具挑战。在本研究中,我们提出了可用于为法律领域开发情感标注器的新技术,同时将人工标注数据的需求降到最低。
68. Method of the coherence evaluation of Ukrainian text [PDF] 返回目录
S. D. Pogorilyy, A. A. Kramov
Abstract: Due to the growing role of SEO technologies, it is necessary to perform automated analysis of article quality. Such an approach helps both to return the most intelligible pages for the user's query and to raise a web site's position toward the top of the query results. Automated assessment of coherence is one part of the complex analysis of a text. In this article, the main methods for measuring the coherence of Ukrainian text are analyzed, and the expediency of using the semantic similarity graph method in comparison with other methods is explained. We suggest improving that method by pre-training a neural network for vector representations of sentences, and perform an experimental examination of the original method and its modifications. The training and examination procedures use a corpus of Ukrainian texts previously retrieved from the abstracts and full texts of Ukrainian scientific articles. The testing procedure consists of two typical tasks for coherence assessment: a document discrimination task and an insertion task. Based on this analysis, the most effective combination of the method's modifications and parameters for measuring text coherence is determined.
摘要:随着SEO技术作用的日益增长,有必要对文章质量进行自动化分析。这种方法既有助于为用户查询返回可读性最好的页面,也有助于将网站位置提升到查询结果的前列。连贯性的自动评估是文本综合分析的一部分。本文分析了乌克兰语文本连贯性度量的主要方法,并解释了与其他方法相比采用语义相似度图方法的合理性。我们建议通过预训练用于句子向量表示的神经网络来改进该方法,并对原始方法及其改进进行了实验检验。训练和测试均在乌克兰语文本语料库上进行,该语料库先前取自乌克兰科学论文的摘要和全文。测试过程通过文本连贯性评估的两个典型任务实现:文档判别任务和插入任务。根据分析结果,我们确定了在度量文本连贯性时方法改进及其参数的最有效组合。
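A drastically simplified sketch of coherence scoring with a sentence similarity graph: embed each sentence, connect sentences pairwise, and report the mean cosine similarity as the coherence score. The hashing-based sentence vectors below are a toy stand-in for the pre-trained sentence encoder the article advocates.

```python
from itertools import combinations
import numpy as np

def sentence_vector(sent: str, dim: int = 64) -> np.ndarray:
    vec = np.zeros(dim)
    for word in sent.lower().split():
        vec[hash(word) % dim] += 1.0       # toy embedding by feature hashing
    return vec / (np.linalg.norm(vec) + 1e-9)

def coherence(sentences) -> float:
    vecs = [sentence_vector(s) for s in sentences]
    sims = [float(vecs[i] @ vecs[j])
            for i, j in combinations(range(len(vecs)), 2)]
    return sum(sims) / len(sims)           # mean edge weight of the graph

doc = ["The method builds a sentence graph.",
       "Each sentence becomes a node in the graph.",
       "Edges carry semantic similarity between sentences."]
print(round(coherence(doc), 3))
```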
69. Neural Coreference Resolution for Arabic [PDF] 返回目录
Abdulrahman Aloraini, Juntao Yu, Massimo Poesio
Abstract: No neural coreference resolver for Arabic exists; in fact, we are not aware of any learning-based coreference resolver for Arabic since (Bjorkelund and Kuhn, 2014). In this paper, we introduce a coreference resolution system for Arabic based on Lee et al.'s end-to-end architecture combined with the Arabic version of BERT and an external mention detector. As far as we know, this is the first neural coreference resolution system aimed specifically at Arabic, and it substantially outperforms the existing state of the art on OntoNotes 5.0 with a gain of 15.2 CoNLL F1 points. We also discuss the current limitations of the task for Arabic and possible approaches that can tackle these challenges.
摘要:目前尚不存在面向阿拉伯语的神经指代消解器;事实上,自(Bjorkelund和Kuhn,2014)以来,我们没有见到任何基于学习的阿拉伯语指代消解器。在本文中,我们提出一个阿拉伯语指代消解系统,它基于Lee等人的端到端架构,并结合了阿拉伯语版本的BERT和一个外部指称检测器。据我们所知,这是第一个专门面向阿拉伯语的神经指代消解系统,它在OntoNotes 5.0上大幅超过现有最佳水平,CoNLL F1提升了15.2个百分点。我们还讨论了该任务在阿拉伯语上当前的局限,以及可能应对这些挑战的方法。
70. Rumor Detection on Twitter Using Multiloss Hierarchical BiLSTM with an Attenuation Factor [PDF] 返回目录
Yudianto Sujana, Jiawen Li, Hung-Yu Kao
Abstract: Social media platforms such as Twitter have become a breeding ground for unverified information or rumors. These rumors can threaten people's health, endanger the economy, and affect the stability of a country. Many researchers have developed models to classify rumors using traditional machine learning or vanilla deep learning models. However, previous studies on rumor detection have achieved low precision and are time consuming. Inspired by the hierarchical model and multitask learning, a multiloss hierarchical BiLSTM model with an attenuation factor is proposed in this paper. The model is divided into two BiLSTM modules: post level and event level. By means of this hierarchical structure, the model can extract deep information from limited quantities of text. Each module has a loss function that helps to learn bilateral features and reduce the training time. An attenuation factor is added at the post level to increase the accuracy. The results on two rumor datasets demonstrate that our model achieves better performance than that of state-of-the-art machine learning and vanilla deep learning models.
摘要:Twitter等社交媒体平台已成为未经证实的信息或谣言的温床。这些谣言可能威胁人们的健康、危害经济,并影响一个国家的稳定。许多研究者已经开发出使用传统机器学习或普通深度学习模型来分类谣言的模型。然而,以往的谣言检测研究精度较低且耗时。受层次化模型和多任务学习的启发,本文提出一种带衰减因子的多损失层次化BiLSTM模型。该模型分为两个BiLSTM模块:帖子级和事件级。借助这种层次结构,模型可以从有限的文本中提取深层信息。每个模块都有各自的损失函数,有助于学习双边特征并缩短训练时间。我们在帖子级加入了一个衰减因子以提高准确率。在两个谣言数据集上的结果表明,我们的模型取得了比最先进的机器学习和普通深度学习模型更好的性能。
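The multiloss idea with an attenuation factor can be sketched as follows (illustrative PyTorch, not the paper's code): a post-level BiLSTM feeds an event-level BiLSTM, each with its own classification head, and the post-level loss is down-weighted by an attenuation factor; the 0.3 value is arbitrary.

```python
import torch
import torch.nn as nn

class MultilossRumorNet(nn.Module):
    def __init__(self, emb=100, hidden=64, n_classes=2):
        super().__init__()
        self.post_lstm = nn.LSTM(emb, hidden, batch_first=True,
                                 bidirectional=True)
        self.event_lstm = nn.LSTM(2 * hidden, hidden, batch_first=True,
                                  bidirectional=True)
        self.post_head = nn.Linear(2 * hidden, n_classes)
        self.event_head = nn.Linear(2 * hidden, n_classes)

    def forward(self, posts):                     # (B, n_posts, T, emb)
        b, n, t, e = posts.shape
        h, _ = self.post_lstm(posts.reshape(b * n, t, e))
        post_repr = h[:, -1].reshape(b, n, -1)    # one vector per post
        g, _ = self.event_lstm(post_repr)
        return self.post_head(post_repr), self.event_head(g[:, -1])

model = MultilossRumorNet()
post_logits, event_logits = model(torch.randn(2, 5, 12, 100))
ce = nn.CrossEntropyLoss()
post_labels = torch.zeros(2, 5, dtype=torch.long)
event_labels = torch.ones(2, dtype=torch.long)
attenuation = 0.3                                 # down-weight post-level loss
loss = ce(event_logits, event_labels) + attenuation * ce(
    post_logits.reshape(-1, 2), post_labels.reshape(-1))
print(loss.item())
```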
71. Free the Plural: Unrestricted Split-Antecedent Anaphora Resolution [PDF] 返回目录
Juntao Yu, Nafise Sadat Moosavi, Silviu Paun, Massimo Poesio
Abstract: Now that the performance of coreference resolvers on the simpler forms of anaphoric reference has greatly improved, more attention is devoted to more complex aspects of anaphora. One limitation of virtually all coreference resolution models is the focus on single-antecedent anaphors. Plural anaphors with multiple antecedents, so-called split-antecedent anaphors (as in John met Mary. They went to the movies), have not been widely studied, because they are not annotated in ONTONOTES and are relatively infrequent in other corpora. In this paper, we introduce the first model for unrestricted resolution of split-antecedent anaphors. We start with a strong baseline enhanced by BERT embeddings, and show that we can substantially improve its performance by addressing the sparsity issue. To do this, we experiment with auxiliary corpora where split-antecedent anaphors were annotated by the crowd, and with transfer learning models using element-of bridging references and single-antecedent coreference as auxiliary tasks. Evaluation on the gold annotated ARRAU corpus shows that our best model, which uses a combination of the three auxiliary corpora, achieves F1 scores of 70% and 43.6% when evaluated in a lenient and a strict setting, respectively, i.e., gains of 11 and 21 percentage points over our baseline.
摘要:既然指代消解器在较简单的回指形式上的性能已大为提升,更多的注意力正转向回指现象中更复杂的方面。几乎所有指代消解模型的一个局限是只关注单先行语回指。具有多个先行语的复数回指,即所谓的分裂先行语回指(如"John met Mary. They went to the movies"中的They),尚未得到广泛研究,因为它们在ONTONOTES中没有标注,在其他语料库中也相对少见。在本文中,我们提出了第一个不受限制的分裂先行语回指消解模型。我们从一个由BERT嵌入增强的强基线出发,并表明通过解决稀疏性问题可以大幅提升其性能。为此,我们试验了由众包标注分裂先行语回指的辅助语料库,并试验了以桥接指称的element-of关系和单先行语指代消解为辅助任务的迁移学习模型。在带有金标标注的ARRAU语料库上的评估表明,我们最好的模型结合使用三个辅助语料库,在宽松和严格设置下分别取得了70%和43.6%的F1分数,与基线相比分别提升了11和21个百分点。
72. Evaluating Bias In Dutch Word Embeddings [PDF] 返回目录
Rodrigo Alejandro Chávez Mulsa, Gerasimos Spanakis
Abstract: Recent research in Natural Language Processing has revealed that word embeddings can encode social biases present in the training data which can affect minorities in real world applications. This paper explores the gender bias implicit in Dutch embeddings while investigating whether English language based approaches can also be used in Dutch. We implement the Word Embeddings Association Test (WEAT), Clustering and Sentence Embeddings Association Test (SEAT) methods to quantify the gender bias in Dutch word embeddings, then we proceed to reduce the bias with Hard-Debias and Sent-Debias mitigation methods and finally we evaluate the performance of the debiased embeddings in downstream tasks. The results suggest that, among others, gender bias is present in traditional and contextualized Dutch word embeddings. We highlight how techniques used to measure and reduce bias created for English can be used in Dutch embeddings by adequately translating the data and taking into account the unique characteristics of the language. Furthermore, we analyze the effect of the debiasing techniques on downstream tasks which show a negligible impact on traditional embeddings and a 2% decrease in performance in contextualized embeddings. Finally, we release the translated Dutch datasets to the public along with the traditional embeddings with mitigated bias.
摘要:自然语言处理领域的近期研究表明,词嵌入会编码训练数据中存在的社会偏见,这可能在真实应用中影响少数群体。本文探究荷兰语词嵌入中隐含的性别偏见,并考察基于英语的方法是否同样适用于荷兰语。我们实现了词嵌入联想测试(WEAT)、聚类以及句子嵌入联想测试(SEAT)方法来量化荷兰语词嵌入中的性别偏见,随后使用Hard-Debias和Sent-Debias缓解方法来降低偏见,最后评估去偏后的嵌入在下游任务中的性能。结果表明,传统词嵌入和上下文化词嵌入的荷兰语版本中都存在性别偏见等问题。我们强调,只要恰当地翻译数据并考虑语言的独特特征,用于度量和降低针对英语构建的偏见的技术同样可以用于荷兰语嵌入。此外,我们分析了去偏技术对下游任务的影响:其对传统嵌入的影响可以忽略不计,而上下文化嵌入的性能下降了2%。最后,我们向公众发布了翻译后的荷兰语数据集以及偏见已被缓解的传统嵌入。
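For reference, the WEAT statistic mentioned above is small enough to implement directly: s(w, A, B) is the difference of mean cosine similarities of word w to attribute sets A and B, and the effect size compares two target sets X and Y. The vectors below are random toys; real use plugs in the Dutch embeddings under study.

```python
import numpy as np

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    """s(w, A, B): mean cosine to A minus mean cosine to B."""
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)

rng = np.random.default_rng(0)
X, Y, A, B = (rng.normal(size=(8, 50)) for _ in range(4))  # toy embeddings
print(weat_effect_size(X, Y, A, B))
```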
73. Personalized Multimodal Feedback Generation in Education [PDF] 返回目录
Haochen Liu, Zitao Liu, Zhongqin Wu, Jiliang Tang
Abstract: The automatic evaluation for school assignments is an important application of AI in the education field. In this work, we focus on the task of personalized multimodal feedback generation, which aims to generate personalized feedback for various teachers to evaluate students' assignments involving multimodal inputs such as images, audios, and texts. This task involves the representation and fusion of multimodal information and natural language generation, which presents the challenges from three aspects: 1) how to encode and integrate multimodal inputs; 2) how to generate feedback specific to each modality; and 3) how to realize personalized feedback generation. In this paper, we propose a novel Personalized Multimodal Feedback Generation Network (PMFGN) armed with a modality gate mechanism and a personalized bias mechanism to address these challenges. The extensive experiments on real-world K-12 education data show that our model significantly outperforms several baselines by generating more accurate and diverse feedback. In addition, detailed ablation experiments are conducted to deepen our understanding of the proposed framework.
摘要:学校作业的自动评估是AI在教育领域的重要应用。在这项工作中,我们关注个性化多模态反馈生成任务,旨在为不同教师生成个性化反馈,以评价学生包含图像、音频和文本等多模态输入的作业。该任务涉及多模态信息的表示与融合以及自然语言生成,带来三个方面的挑战:1)如何编码并整合多模态输入;2)如何生成针对每种模态的反馈;3)如何实现个性化的反馈生成。在本文中,我们提出一种新颖的个性化多模态反馈生成网络(PMFGN),它配备模态门机制和个性化偏置机制来应对这些挑战。在真实K-12教育数据上的大量实验表明,我们的模型生成的反馈更准确、更多样,显著优于多个基线。此外,我们还进行了详细的消融实验,以加深对所提框架的理解。
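A minimal sketch of what a modality gate could look like (our reading, not the released PMFGN): each modality is projected separately and a learned sigmoid gate controls how much of each stream enters the fused representation; all dimensions are placeholders.

```python
import torch
import torch.nn as nn

class ModalityGate(nn.Module):
    def __init__(self, dims, d=256):
        super().__init__()
        self.proj = nn.ModuleDict({m: nn.Linear(k, d) for m, k in dims.items()})
        self.gate = nn.ModuleDict({m: nn.Linear(k, d) for m, k in dims.items()})

    def forward(self, inputs):                    # {"text": (B, 300), ...}
        fused = 0.0
        for m, x in inputs.items():
            g = torch.sigmoid(self.gate[m](x))    # per-dimension gate in [0, 1]
            fused = fused + g * torch.tanh(self.proj[m](x))
        return fused                              # (B, d) fused representation

gate = ModalityGate({"text": 300, "audio": 128, "image": 512})
batch = {"text": torch.randn(2, 300), "audio": torch.randn(2, 128),
         "image": torch.randn(2, 512)}
print(gate(batch).shape)  # torch.Size([2, 256])
```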
74. Understanding Pre-trained BERT for Aspect-based Sentiment Analysis [PDF] 返回目录
Hu Xu, Lei Shu, Philip S. Yu, Bing Liu
Abstract: This paper analyzes the pre-trained hidden representations learned from reviews on BERT for tasks in aspect-based sentiment analysis (ABSA). Our work is motivated by the recent progress in BERT-based language models for ABSA. However, it is not clear how the general proxy task of (masked) language model trained on unlabeled corpus without annotations of aspects or opinions can provide important features for downstream tasks in ABSA. By leveraging the annotated datasets in ABSA, we investigate both the attentions and the learned representations of BERT pre-trained on reviews. We found that BERT uses very few self-attention heads to encode context words (such as prepositions or pronouns that indicating an aspect) and opinion words for an aspect. Most features in the representation of an aspect are dedicated to the fine-grained semantics of the domain (or product category) and the aspect itself, instead of carrying summarized opinions from its context. We hope this investigation can help future research in improving self-supervised learning, unsupervised learning and fine-tuning for ABSA. The pre-trained model and code can be found at this https URL.
摘要:本文分析了在评论文本上预训练的BERT针对基于方面的情感分析(ABSA)任务所学到的隐藏表示。我们的工作受近来基于BERT的ABSA语言模型进展的启发。然而,在不带方面或观点标注的未标注语料上训练的(掩码)语言模型这一通用代理任务,究竟如何为ABSA下游任务提供重要特征,目前尚不清楚。借助ABSA中的标注数据集,我们考察了在评论上预训练的BERT的注意力和所学表示。我们发现,BERT只用极少的自注意力头来编码上下文词(例如指示方面的介词或代词)和方面的观点词。方面表示中的大多数特征都用于刻画领域(或产品类别)及方面本身的细粒度语义,而不是携带来自上下文的观点总结。我们希望这项研究能帮助未来改进面向ABSA的自监督学习、无监督学习和微调。预训练模型和代码可在此https URL获取。
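Probing of this kind is reproducible with the standard transformers API; the sketch below loads a BERT model with attention outputs enabled and measures how much attention each last-layer head directs at an aspect term. The model name and the example sentence are our own choices, not the paper's setup.

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "bert-base-uncased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, output_attentions=True)

enc = tok("The battery life of this laptop is great", return_tensors="pt")
with torch.no_grad():
    attentions = model(**enc).attentions        # tuple: layer -> (B, heads, T, T)

aspect_id = tok.convert_tokens_to_ids("battery")
aspect_pos = enc.input_ids[0].tolist().index(aspect_id)
last = attentions[-1][0]                        # last layer, first example
to_aspect = last[:, :, aspect_pos].mean(dim=1)  # mean attention into "battery"
print(to_aspect)                                # one score per head
```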
75. Improving Dialogue Breakdown Detection with Semi-Supervised Learning [PDF] 返回目录
Nathan Ng, Marzyeh Ghassemi, Narendran Thangarajan, Jiacheng Pan, Qi Guo
Abstract: Building user trust in dialogue agents requires smooth and consistent dialogue exchanges. However, agents can easily lose conversational context and generate irrelevant utterances. These situations are called dialogue breakdown, where agent utterances prevent users from continuing the conversation. Building systems to detect dialogue breakdown allows agents to recover appropriately or avoid breakdown entirely. In this paper we investigate the use of semi-supervised learning methods to improve dialogue breakdown detection, including continued pre-training on the Reddit dataset and a manifold-based data augmentation method. We demonstrate the effectiveness of these methods on the Dialogue Breakdown Detection Challenge (DBDC) English shared task. Our submissions to the 2020 DBDC5 shared task place first, beating baselines and other submissions by over 12% accuracy. In ablations on DBDC4 data from 2019, our semi-supervised learning methods improve the performance of a baseline BERT model by 2% accuracy. These methods are applicable generally to any dialogue task and provide a simple way to improve model performance.
摘要:要在对话代理中建立用户信任,需要流畅且一致的对话交互。然而,代理很容易丢失对话上下文并生成无关的话语。这类情形被称为对话崩溃(dialogue breakdown),即代理的话语使用户无法继续对话。构建检测对话崩溃的系统可以让代理适当地恢复对话,或完全避免崩溃。在本文中,我们研究用半监督学习方法改进对话崩溃检测,包括在Reddit数据集上继续预训练,以及一种基于流形的数据增强方法。我们在对话崩溃检测挑战赛(DBDC)英语共享任务上展示了这些方法的有效性。我们提交的系统在2020年DBDC5共享任务中名列第一,准确率超出基线和其他提交结果12%以上。在2019年DBDC4数据上的消融实验中,我们的半监督学习方法将基线BERT模型的准确率提升了2%。这些方法普遍适用于任何对话任务,为提升模型性能提供了一种简单途径。
76. Learning Structured Representations of Entity Names using Active Learning and Weak Supervision [PDF] 返回目录
Kun Qian, Poornima Chozhiyath Raman, Yunyao Li, Lucian Popa
Abstract: Structured representations of entity names are useful for many entity-related tasks such as entity normalization and variant generation. Learning the implicit structured representations of entity names without context and external knowledge is particularly challenging. In this paper, we present a novel learning framework that combines active learning and weak supervision to solve this problem. Our experimental evaluation shows that this framework enables the learning of high-quality models from merely a dozen or so labeled examples.
摘要:实体名称的结构化表示对许多与实体相关的任务(如实体规范化和变体生成)十分有用。在缺乏上下文和外部知识的情况下学习实体名称的隐式结构化表示尤其具有挑战性。在本文中,我们提出一种结合主动学习与弱监督的新学习框架来解决这一问题。实验评估表明,该框架仅凭十几个标注样例就能学到高质量的模型。
77. Joint Masked CPC and CTC Training for ASR [PDF] 返回目录
Chaitanya Talnikar, Tatiana Likhomanenko, Ronan Collobert, Gabriel Synnaeve
Abstract: Self-supervised learning (SSL) has shown promise in learning representations of audio that are useful for automatic speech recognition (ASR). But, training SSL models like wav2vec~2.0 requires a two-stage pipeline. In this paper we demonstrate a single-stage training of ASR models that can utilize both unlabeled and labeled data. During training, we alternately minimize two losses: an unsupervised masked Contrastive Predictive Coding (CPC) loss and the supervised audio-to-text alignment loss Connectionist Temporal Classification (CTC). We show that this joint training method directly optimizes performance for the downstream ASR task using unsupervised data while achieving similar word error rates to wav2vec~2.0 on the Librispeech 100-hour dataset. Finally, we postulate that solving the contrastive task is a regularization for the supervised CTC loss.
摘要:自监督学习(SSL)在学习对自动语音识别(ASR)有用的音频表示方面已展现出潜力。但是,训练诸如wav2vec~2.0之类的SSL模型需要两阶段流水线。在本文中,我们展示了一种可同时利用未标注和标注数据的ASR模型单阶段训练方法。在训练期间,我们交替最小化两个损失:无监督的掩码对比预测编码(CPC)损失,以及有监督的音频-文本对齐损失,即连接时序分类(CTC)。我们表明,这种联合训练方法利用无监督数据直接针对下游ASR任务优化性能,同时在Librispeech 100小时数据集上取得了与wav2vec~2.0相近的词错误率。最后,我们推测,求解对比任务对有监督的CTC损失起到了正则化作用。
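Schematically, the single-stage recipe alternates an unsupervised masked-prediction loss with a supervised CTC loss over one shared encoder. The toy loop below uses an LSTM encoder and a simple masked-reconstruction proxy in place of the real masked CPC loss; everything here is illustrative, not the paper's training code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.LSTM(40, 64, batch_first=True)        # toy acoustic encoder
ctc_head = nn.Linear(64, 29)                       # 28 symbols + CTC blank
opt = torch.optim.Adam(list(encoder.parameters()) + list(ctc_head.parameters()))
ctc_loss = nn.CTCLoss(blank=0)

def unsupervised_step(feats):                      # masked-prediction proxy
    masked = feats.clone()
    masked[:, 10:20] = 0.0                         # hide a span of frames
    out, _ = encoder(masked)
    return F.mse_loss(out[:, 10:20].mean(-1), feats[:, 10:20].mean(-1))

def supervised_step(feats, targets, in_len, tgt_len):
    out, _ = encoder(feats)
    logp = ctc_head(out).log_softmax(-1).transpose(0, 1)   # (T, B, C) for CTC
    return ctc_loss(logp, targets, in_len, tgt_len)

for step in range(4):                              # alternate the two losses
    feats = torch.randn(2, 50, 40)
    if step % 2 == 0:
        loss = unsupervised_step(feats)
    else:
        loss = supervised_step(feats, torch.randint(1, 29, (2, 8)),
                               torch.full((2,), 50, dtype=torch.long),
                               torch.full((2,), 8, dtype=torch.long))
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(step, float(loss))
```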
78. Analyzing Gender Bias within Narrative Tropes [PDF] 返回目录
Dhruvil Gala, Mohammad Omar Khursheed, Hannah Lerner, Brendan O'Connor, Mohit Iyyer
Abstract: Popular media reflects and reinforces societal biases through the use of tropes, which are narrative elements, such as archetypal characters and plot arcs, that occur frequently across media. In this paper, we specifically investigate gender bias within a large collection of tropes. To enable our study, we crawl this http URL, an online user-created repository that contains 30K tropes associated with 1.9M examples of their occurrences across film, television, and literature. We automatically score the "genderedness" of each trope in our TVTROPES dataset, which enables an analysis of (1) highly-gendered topics within tropes, (2) the relationship between gender bias and popular reception, and (3) how the gender of a work's creator correlates with the types of tropes that they use.
摘要:大众媒体通过使用"套路"(trope)来反映并强化社会偏见;套路是在各类媒体中频繁出现的叙事元素,例如原型化角色和情节弧线。在本文中,我们专门考察大量套路中的性别偏见。为开展研究,我们爬取了这个http URL,它是一个由用户创建的在线资源库,包含3万个套路,以及它们在电影、电视和文学作品中出现的190万个实例。我们对TVTROPES数据集中每个套路的"性别化程度"自动打分,从而得以分析:(1)套路中高度性别化的主题;(2)性别偏见与大众接受度之间的关系;(3)作品创作者的性别与其所用套路类型的关联。
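A toy version of a genderedness score, assuming it can be proxied by pronoun counts in a trope's example texts (the paper's actual scoring is more sophisticated): negative values skew male, positive values skew female.

```python
import re

MALE = {"he", "him", "his", "himself"}
FEMALE = {"she", "her", "hers", "herself"}

def genderedness(texts) -> float:
    """Returns a value in [-1, 1]: negative = male-skewed, positive = female."""
    words = re.findall(r"[a-z']+", " ".join(texts).lower())
    m = sum(w in MALE for w in words)
    f = sum(w in FEMALE for w in words)
    return (f - m) / max(f + m, 1)

trope_examples = ["She sacrifices her career so he can chase his dream.",
                  "He never asks for directions; she quietly fixes his mess."]
print(genderedness(trope_examples))
```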
79. Dynamic Data Selection for Curriculum Learning via Ability Estimation [PDF] 返回目录
John P. Lalor, Hong Yu
Abstract: Curriculum learning methods typically rely on heuristics to estimate the difficulty of training examples or the ability of the model. In this work, we propose replacing difficulty heuristics with learned difficulty parameters. We also propose Dynamic Data selection for Curriculum Learning via Ability Estimation (DDaCLAE), a strategy that probes model ability at each training epoch to select the best training examples at that point. We show that models using learned difficulty and/or ability outperform heuristic-based curriculum learning models on the GLUE classification tasks.
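A compact sketch of the ability-driven selection idea, assuming a one-parameter IRT model where p(correct) = sigmoid(ability - difficulty); the grid-search ability probe and the hard-coded difficulty values below are placeholders for the paper's learned estimates.

```python
import math

def estimate_ability(model_correct, difficulties, grid_step=0.05):
    """Grid-search the ability value that best explains which probe items
    the model currently answers correctly under p(correct) = sigmoid(a - d)."""
    def loglik(a):
        ll = 0.0
        for ok, d in zip(model_correct, difficulties):
            p = 1.0 / (1.0 + math.exp(-(a - d)))
            ll += math.log(p if ok else 1.0 - p)
        return ll
    grid = [x * grid_step for x in range(-60, 61)]   # abilities in [-3, 3]
    return max(grid, key=loglik)

def select_epoch_data(dataset, difficulties, ability):
    # keep only examples the model is "ready" for at this epoch
    return [ex for ex, d in zip(dataset, difficulties) if d <= ability]

# usage: probe the model on a held-out slice each epoch, re-estimate
# ability, then train on the examples whose difficulty it can handle
ability = estimate_ability([True, True, False], [-1.0, 0.0, 2.0])
print(select_epoch_data(["easy", "medium", "hard"], [-1.0, 0.0, 2.0], ability))
```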
80. A New Neural Search and Insights Platform for Navigating and Organizing AI Research [PDF] 返回目录
Marzieh Fadaee, Olga Gureenkova, Fernando Rejon Barrera, Carsten Schnober, Wouter Weerkamp, Jakub Zavrel
Abstract: To provide AI researchers with modern tools for dealing with the explosive growth of the research literature in their field, we introduce a new platform, AI Research Navigator, that combines classical keyword search with neural retrieval to discover and organize relevant literature. The system provides search at multiple levels of textual granularity, from sentences to aggregations across documents, both in natural language and through navigation in a domain-specific Knowledge Graph. We give an overview of the overall architecture of the system and of the components for document analysis, question answering, search, analytics, expert search, and recommendations.
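To make the keyword-plus-neural combination concrete, here is an illustrative hybrid scorer that interpolates a crude term-overlap score with a dense cosine similarity; the interpolation weight and both scoring functions are assumptions, since the abstract does not specify the platform's ranking formula.

```python
import math
from collections import Counter

def keyword_score(query, doc):
    """Crude lexical match: weighted term overlap with length normalization."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    overlap = sum(min(q[w], d[w]) for w in q)
    return overlap / (1 + math.log(1 + sum(d.values())))

def dense_score(q_vec, d_vec):
    """Cosine similarity between (hypothetical) neural embeddings."""
    dot = sum(a * b for a, b in zip(q_vec, d_vec))
    norm = math.sqrt(sum(a * a for a in q_vec)) * math.sqrt(sum(b * b for b in d_vec))
    return dot / norm if norm else 0.0

def hybrid_score(query, doc, q_vec, d_vec, alpha=0.5):
    # alpha interpolates between classical keyword search and neural retrieval
    return alpha * keyword_score(query, doc) + (1 - alpha) * dense_score(q_vec, d_vec)
```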
81. A Sui Generis QA Approach using RoBERTa for Adverse Drug Event Identification [PDF] 返回目录
Harshit Jain, Nishant Raj, Suyash Mishra
Abstract: Extraction of adverse drug events from biomedical literature and other textual data is an important component to monitor drug-safety and this has attracted attention of many researchers in healthcare. Existing works are more pivoted around entity-relation extraction using bidirectional long short term memory networks (Bi-LSTM) which does not attain the best feature representations. In this paper, we introduce a question answering framework that exploits the robustness, masking and dynamic attention capabilities of RoBERTa by a technique of domain adaptation and attempt to overcome the aforementioned limitations. Our model outperforms the prior work by 9.53% F1-Score.
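Casting extraction as extractive QA can be sketched with the Hugging Face pipeline API; the checkpoint and question template below are stand-ins for the paper's domain-adapted RoBERTa model.

```python
# a minimal sketch: a generic SQuAD-tuned RoBERTa answers an ADE question
# over a clinical sentence; the real system would first domain-adapt the model
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

report = ("The patient developed severe hepatotoxicity two weeks after "
          "starting methotrexate therapy.")
answer = qa(question="What adverse event did the drug cause?", context=report)
print(answer["answer"], answer["score"])
```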
82. Streaming Simultaneous Speech Translation with Augmented Memory Transformer [PDF] 返回目录
Xutai Ma, Yongqiang Wang, Mohammad Javad Dousti, Philipp Koehn, Juan Pino
Abstract: Transformer-based models have achieved state-of-the-art performance on speech translation tasks. However, the model architecture is not efficient enough for streaming scenarios since self-attention is computed over an entire input sequence and the computational cost grows quadratically with the length of the input sequence. Nevertheless, most of the previous work on simultaneous speech translation, the task of generating translations from partial audio input, ignores the time spent in generating the translation when analyzing the latency. With this assumption, a system may have good latency quality trade-offs but be inapplicable in real-time scenarios. In this paper, we focus on the task of streaming simultaneous speech translation, where the systems are not only capable of translating with partial input but are also able to handle very long or continuous input. We propose an end-to-end transformer-based sequence-to-sequence model, equipped with an augmented memory transformer encoder, which has shown great success on the streaming automatic speech recognition task with hybrid or transducer-based models. We conduct an empirical evaluation of the proposed model on segment, context and memory sizes and we compare our approach to a transformer with a unidirectional mask.
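The augmented-memory idea can be sketched simply: consume audio in fixed-size segments, summarize each processed segment into a single memory vector, and let attention for the current segment see only the memory bank plus the segment itself, keeping cost roughly constant per segment. The mean-pooled summaries and dimensions below are simplifications of the actual bank update.

```python
import torch
import torch.nn as nn

class AugmentedMemoryEncoderLayer(nn.Module):
    def __init__(self, d_model=256, nhead=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)

    def forward(self, segments):                  # list of (batch, seg_len, d) tensors
        memory_bank, outputs = [], []
        for seg in segments:
            context = torch.cat(memory_bank + [seg], dim=1) if memory_bank else seg
            out, _ = self.attn(seg, context, context)           # queries: current segment only
            outputs.append(out)
            memory_bank.append(out.mean(dim=1, keepdim=True))   # one summary vector per segment
        return torch.cat(outputs, dim=1)

layer = AugmentedMemoryEncoderLayer()
audio = torch.randn(2, 64, 256)                   # (batch, frames, features)
print(layer([audio[:, i:i + 16] for i in range(0, 64, 16)]).shape)  # -> (2, 64, 256)
```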
83. Speaker anonymisation using the McAdams coefficient [PDF] 返回目录
Jose Patino, Natalia Tomashenko, Massimiliano Todisco, Andreas Nautsch, Nicholas Evans
Abstract: Anonymisation has the goal of manipulating speech signals in order to degrade the reliability of automatic approaches to speaker recognition, while preserving other aspects of speech, such as those relating to intelligibility and naturalness. This paper reports an approach to anonymisation that, unlike other current approaches, requires no training data, is based upon well-known signal processing techniques and is both efficient and effective. The proposed solution uses the McAdams coefficient to transform the spectral envelope of speech signals. Results derived using common VoicePrivacy 2020 databases and protocols show that random, optimised transformations can outperform competing solutions in terms of anonymisation while causing only modest, additional degradations to intelligibility, even in the case of a semi-informed privacy adversary.
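A per-frame sketch of the McAdams transformation, assuming librosa for LP analysis: the phases of the complex LP poles are raised to the power alpha (the McAdams coefficient) and the frame is resynthesised from the LP residual. The frame handling, LP order, and alpha value are illustrative; a full system would apply this with overlap-add across the utterance.

```python
import numpy as np
import librosa
from scipy.signal import lfilter

def mcadams_frame(frame, alpha=0.8, order=16):
    a = librosa.lpc(frame, order=order)          # LP coefficients, a[0] == 1
    residual = lfilter(a, [1.0], frame)          # inverse-filter to get the excitation
    poles = np.roots(a)
    # warp only complex poles; real poles carry no formant angle to shift
    warped = [p if np.isreal(p) else
              np.abs(p) * np.exp(1j * np.sign(np.angle(p)) * np.abs(np.angle(p)) ** alpha)
              for p in poles]
    a_new = np.real(np.poly(warped))             # rebuild the (real) LP polynomial
    return lfilter([1.0], a_new, residual)       # resynthesise with the warped envelope

frame = np.random.randn(400)                     # stand-in for a 25 ms speech frame
print(mcadams_frame(frame).shape)
```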
84. Evaluation of Siamese Networks for Semantic Code Search [PDF] 返回目录
Raunak Sinha, Utkarsh Desai, Srikanth Tamilselvam, Senthil Mani
Abstract: With the increase in the number of open repositories and discussion forums, the use of natural language for semantic code search has become increasingly common. The accuracy of the results returned by such systems, however, can be low due to 1) limited shared vocabulary between code and user query and 2) inadequate semantic understanding of user query and its relation to code syntax. Siamese networks are well suited to learning such joint relations between data, but have not been explored in the context of code search. In this work, we evaluate Siamese networks for this task by exploring multiple extraction network architectures. These networks independently process code and text descriptions before passing them to a Siamese network to learn embeddings in a common space. We experiment on two different datasets and discover that Siamese networks can act as strong regularizers on networks that extract rich information from code and text, which in turn helps achieve impressive performance on code search beating previous baselines on two programming languages. We also analyze the embedding space of these networks and provide directions to fully leverage the power of Siamese networks for semantic code search.
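The twin-tower setup can be sketched briefly: separate encoders for code and text project into a shared space, and a margin loss with in-batch hard negatives pulls matching pairs together. The bag-of-embeddings towers below are a deliberate simplification of the extraction architectures the paper explores.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Tower(nn.Module):
    def __init__(self, vocab=5000, dim=128):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab, dim)    # order-insensitive toy encoder
        self.proj = nn.Linear(dim, dim)

    def forward(self, token_ids):
        return F.normalize(self.proj(self.emb(token_ids)), dim=-1)

code_tower, text_tower = Tower(), Tower()

def siamese_loss(code_ids, text_ids, margin=0.3):
    c, t = code_tower(code_ids), text_tower(text_ids)
    sim = c @ t.T                                  # (batch, batch) cosine similarities
    pos = sim.diag()                               # matching (code, description) pairs
    neg = sim.masked_fill(torch.eye(len(sim), dtype=torch.bool),
                          float("-inf")).max(dim=1).values  # hardest in-batch negative
    return F.relu(margin - pos + neg).mean()

code = torch.randint(0, 5000, (4, 30))             # 4 code snippets, 30 tokens each
text = torch.randint(0, 5000, (4, 12))             # their 4 descriptions
print(siamese_loss(code, text))
```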
85. Unification of HDP and LDA Models for Optimal Topic Clustering of Subject Specific Question Banks [PDF] 返回目录
Nikhil Fernandes, Alexandra Gkolia, Nicolas Pizzo, James Davenport, Akshar Nair
Abstract: There has been an increasingly popular trend in Universities for curriculum transformation to make teaching more interactive and suitable for online courses. An increase in the popularity of online courses would result in an increase in the number of course-related queries for academics. This, coupled with the fact that if lectures were delivered in a video on demand format, there would be no fixed time where the majority of students could ask questions. When questions are asked in a lecture there is a negligible chance of having similar questions repeatedly, but asynchronously this is more likely. In order to reduce the time spent on answering each individual question, clustering them is an ideal choice. There are different unsupervised models fit for text clustering, of which the Latent Dirichlet Allocation model is the most commonly used. We use the Hierarchical Dirichlet Process to determine an optimal topic number input for our LDA model runs. Due to the probabilistic nature of these topic models, the outputs of them vary for different runs. The general trend we found is that not all the topics were being used for clustering on the first run of the LDA model, which results in a less effective clustering. To tackle probabilistic output, we recursively use the LDA model on the effective topics being used until we obtain an efficiency ratio of 1. Through our experimental results we also establish a reasoning on how Zeno's paradox is avoided.
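A sketch of the HDP-then-LDA handoff using gensim on a toy corpus: HDP proposes topics, the sketch counts those that receive non-trivial probability mass over the corpus, and LDA is refit with that count. The way the topic count is read off HDP here is an assumption, not the paper's exact criterion.

```python
import numpy as np
from gensim.corpora import Dictionary
from gensim.models import HdpModel, LdaModel

docs = [["integral", "derivative", "limit"],
        ["matrix", "vector", "eigenvalue"],
        ["integral", "limit", "convergence"],
        ["vector", "matrix", "rank"]]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

hdp = HdpModel(corpus, id2word=dictionary)

# accumulate how much probability mass each HDP topic receives over the corpus
topic_mass = np.zeros(150)                      # 150 = HDP's default truncation level
for bow in corpus:
    for topic_id, prob in hdp[bow]:
        topic_mass[topic_id] += prob
k = max(1, int((topic_mass > 0.01 * topic_mass.sum()).sum()))

# refit a plain LDA with the topic count HDP actually supports
lda = LdaModel(corpus, num_topics=k, id2word=dictionary, passes=10)
print(k, lda.print_topics())
```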
86. The 2020s Political Economy of Machine Translation [PDF] 返回目录
Steven Weber
Abstract: This paper explores the hypothesis that the diversity of human languages, right now a barrier to interoperability in communication and trade, will become significantly less of a barrier as machine translation technologies are deployed over the next several years. But this new boundary-breaking technology does not reduce all boundaries equally, and it creates new challenges for the distribution of ideas and thus for innovation and economic growth.
87. Advanced Semantics for Commonsense Knowledge Extraction [PDF] 返回目录
Tuan-Phong Nguyen, Simon Razniewski, Gerhard Weikum
Abstract: Commonsense knowledge (CSK) about concepts and their properties is useful for AI applications such as robust chatbots. Prior works like ConceptNet, TupleKB and others compiled large CSK collections, but are restricted in their expressiveness to subject-predicate-object (SPO) triples with simple concepts for S and monolithic strings for P and O. Also, these projects have either prioritized precision or recall, but hardly reconcile these complementary goals. This paper presents a methodology, called Ascent, to automatically build a large-scale knowledge base (KB) of CSK assertions, with advanced expressiveness and both better precision and recall than prior works. Ascent goes beyond triples by capturing composite concepts with subgroups and aspects, and by refining assertions with semantic facets. The latter are important to express temporal and spatial validity of assertions and further qualifiers. Ascent combines open information extraction with judicious cleaning using language models. Intrinsic evaluation shows the superior size and quality of the Ascent KB, and an extrinsic evaluation for QA-support tasks underlines the benefits of Ascent.
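The jump from SPO triples to faceted assertions is easiest to see as a data structure. The record below, with hypothetical field names, shows the shape such an assertion might take (a composite subject plus semantic facets); it is not Ascent's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Assertion:
    subject: str                                 # base concept, e.g. "elephant"
    subgroup: Optional[str] = None               # composite concept, e.g. "baby elephant"
    predicate: str = ""
    obj: str = ""
    facets: dict = field(default_factory=dict)   # qualifiers, e.g. temporal/spatial validity

a = Assertion(subject="elephant", subgroup="african elephant",
              predicate="live", obj="in herds",
              facets={"location": "savanna", "temporal": "during dry season"})
print(a)
```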
88. Learning from Non-Binary Constituency Trees via Tensor Decomposition [PDF] 返回目录
Daniele Castellana, Davide Bacciu
Abstract: Processing sentence constituency trees in binarised form is a common and popular approach in literature. However, constituency trees are non-binary by nature. The binarisation procedure changes deeply the structure, furthering constituents that instead are close. In this work, we introduce a new approach to deal with non-binary constituency trees which leverages tensor-based models. In particular, we show how a powerful composition function based on the canonical tensor decomposition can exploit such a rich structure. A key point of our approach is the weight sharing constraint imposed on the factor matrices, which allows limiting the number of model parameters. Finally, we introduce a Tree-LSTM model which takes advantage of this composition function and we experimentally assess its performance on different NLP tasks.
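A minimal sketch of such a CP-decomposition composition function: each child state is projected by a factor matrix, the projections are multiplied elementwise across children (so any number of children is handled uniformly), and a rank-space mixture maps back to the hidden size. Sharing one factor matrix across child positions illustrates the weight-sharing constraint; the dimensions are arbitrary.

```python
import torch
import torch.nn as nn

class CPComposition(nn.Module):
    def __init__(self, hidden=64, rank=32):
        super().__init__()
        self.U = nn.Linear(hidden, rank, bias=False)    # shared factor matrix
        self.out = nn.Linear(rank, hidden, bias=False)  # mixes the rank components

    def forward(self, children):                        # (batch, n_children, hidden)
        projected = self.U(children)                    # (batch, n_children, rank)
        fused = projected.prod(dim=1)                   # elementwise product over children
        return torch.tanh(self.out(fused))

comp = CPComposition()
print(comp(torch.randn(8, 3, 64)).shape)   # a ternary node -> torch.Size([8, 64])
```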
89. COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning [PDF] 返回目录
Simon Ging, Mohammadreza Zolfaghari, Hamed Pirsiavash, Thomas Brox
Abstract: Many real-world video-text tasks involve different levels of granularity, such as frames and words, clip and sentences or videos and paragraphs, each with distinct semantics. In this paper, we propose a Cooperative hierarchical Transformer (COOT) to leverage this hierarchy information and model the interactions between different levels of granularity and different modalities. The method consists of three major components: an attention-aware feature aggregation layer, which leverages the local temporal context (intra-level, e.g., within a clip), a contextual transformer to learn the interactions between low-level and high-level semantics (inter-level, e.g. clip-video, sentence-paragraph), and a cross-modal cycle-consistency loss to connect video and text. The resulting method compares favorably to the state of the art on several benchmarks while having few parameters. All code is available open-source at this https URL
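The cross-modal cycle-consistency term can be sketched with soft nearest neighbours: a clip embedding attends to the sentences, the resulting soft sentence attends back to the clips, and the round trip should return to the original clip. The temperature and squared-error penalty below are illustrative choices, not COOT's exact formulation.

```python
import torch
import torch.nn.functional as F

def cycle_consistency_loss(clips, sents, tau=0.1):
    """clips: (n, d) clip embeddings; sents: (m, d) sentence embeddings."""
    attn_cs = F.softmax(clips @ sents.T / tau, dim=1)      # clip -> soft nearest sentence
    soft_sent = attn_cs @ sents                            # (n, d)
    attn_sc = F.softmax(soft_sent @ clips.T / tau, dim=1)  # and back to clip space
    cycled = attn_sc @ clips                               # (n, d)
    return ((cycled - clips) ** 2).sum(dim=1).mean()       # round trip should be identity

clips = F.normalize(torch.randn(5, 32), dim=1)
sents = F.normalize(torch.randn(7, 32), dim=1)
print(cycle_consistency_loss(clips, sents))
```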
90. DeepOpht: Medical Report Generation for Retinal Images via Deep Models and Visual Explanation [PDF] 返回目录
Jia-Hong Huang, Chao-Han Huck Yang, Fangyu Liu, Meng Tian, Yi-Chieh Liu, Ting-Wei Wu, I-Hung Lin, Kang Wang, Hiromasa Morikawa, Hernghua Chang, Jesper Tegner, Marcel Worring
Abstract: In this work, we propose an AI-based method that intends to improve the conventional retinal disease treatment procedure and help ophthalmologists increase diagnosis efficiency and accuracy. The proposed method is composed of a deep neural networks-based (DNN-based) module, including a retinal disease identifier and clinical description generator, and a DNN visual explanation module. To train and validate the effectiveness of our DNN-based module, we propose a large-scale retinal disease image dataset. Also, as ground truth, we provide a retinal image dataset manually labeled by ophthalmologists to qualitatively show, the proposed AI-based method is effective. With our experimental results, we show that the proposed method is quantitatively and qualitatively effective. Our method is capable of creating meaningful retinal image descriptions and visual explanations that are clinically relevant.
91. The xx205 System for the VoxCeleb Speaker Recognition Challenge 2020 [PDF] 返回目录
Xu Xiang
Abstract: This report describes the systems submitted to the first and second tracks of the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2020, which ranked second in both tracks. Three key points of the system pipeline are explored: (1) investigating multiple CNN architectures including ResNet, Res2Net and dual path network (DPN) to extract the x-vectors, (2) using a composite angular margin softmax loss to train the speaker models, and (3) applying score normalization and system fusion to boost the performance. Measured on the VoxSRC-20 Eval set, the best submitted systems achieve an EER of 3.808% and a MinDCF of 0.1958 in the close-condition track 1, and an EER of 3.798% and a MinDCF of 0.1942 in the open-condition track 2, respectively.
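The composite angular margin loss builds on the additive-angular-margin (AAM) softmax; a plain AAM-softmax sketch is below, with typical margin and scale values rather than the submission's exact composite formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AAMSoftmax(nn.Module):
    def __init__(self, dim=192, n_speakers=1000, m=0.2, s=30.0):
        super().__init__()
        self.W = nn.Parameter(torch.randn(n_speakers, dim))
        self.m, self.s = m, s

    def forward(self, emb, labels):
        # cosine similarity between L2-normalised embeddings and class weights
        cos = F.normalize(emb) @ F.normalize(self.W).T
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        target = F.one_hot(labels, cos.size(1)).bool()
        # add the angular margin m only on the true-speaker logit, then rescale by s
        logits = torch.where(target, torch.cos(theta + self.m), cos)
        return F.cross_entropy(self.s * logits, labels)

loss_fn = AAMSoftmax()
print(loss_fn(torch.randn(4, 192), torch.tensor([0, 3, 7, 42])))
```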
92. Semantic similarity-based approach to enhance supervised classification learning accuracy [PDF] 返回目录
Houcemeddine Turki, Mohamed Ali Hadj Taieb, Mohamed Ben Aouicha
Abstract: This brief communication discusses the usefulness of semantic similarity measures for the evaluation and amelioration of the accuracy of supervised classification learning. It proposes a semantic similarity-based method to enhance the choice of adequate labels for the classification algorithm as well as two metrics (SS-Score and TD-Score) and a curve (SA-Curve) that can be coupled to statistical evaluation measures of supervised classification learning to take into consideration the impact of the semantic aspect of the labels on the classification accuracy.
93. Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization [PDF] 返回目录
Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Yong Xu, Shi-Xiong Zhang, Dong Yu
Abstract: This paper proposes a new paradigm for handling far-field multi-speaker data in an end-to-end neural network manner, called directional automatic speech recognition (D-ASR), which explicitly models source speaker locations. In D-ASR, the azimuth angle of the sources with respect to the microphone array is defined as a latent variable. This angle controls the quality of separation, which in turn determines the ASR performance. All three functionalities of D-ASR: localization, separation, and recognition are connected as a single differentiable neural network and trained solely based on ASR error minimization objectives. The advantages of D-ASR over existing methods are threefold: (1) it provides explicit speaker locations, (2) it improves the explainability factor, and (3) it achieves better ASR performance as the process is more streamlined. In addition, D-ASR does not require explicit direction of arrival (DOA) supervision like existing data-driven localization models, which makes it more appropriate for realistic data. For the case of two source mixtures, D-ASR achieves an average DOA prediction error of less than three degrees. It also outperforms a strong far-field multi-speaker end-to-end system in both separation quality and ASR performance.