
[arXiv Papers] Computation and Language, 2020-10-20

Contents

1. Subtitles to Segmentation: Improving Low-Resource Speech-to-Text Translation Pipelines [PDF] Abstract
2. Question Generation for Supporting Informational Query Intents [PDF] Abstract
3. PySBD: Pragmatic Sentence Boundary Disambiguation [PDF] Abstract
4. Adaptive Attentional Network for Few-Shot Knowledge Graph Completion [PDF] Abstract
5. An Empirical Study for Vietnamese Constituency Parsing with Pre-training [PDF] Abstract
6. Incorporating Terminology Constraints in Automatic Post-Editing [PDF] Abstract
7. Drug Repurposing for COVID-19 via Knowledge Graph Completion [PDF] Abstract
8. Better Distractions: Transformer-based Distractor Generation and Multiple Choice Question Filtering [PDF] Abstract
9. Cold-start Active Learning through Self-supervised Language Modeling [PDF] Abstract
10. Heads-up! Unsupervised Constituency Parsing via Self-Attention Heads [PDF] Abstract
11. Diving Deep into Context-Aware Neural Machine Translation [PDF] Abstract
12. Unsupervised Expressive Rules Provide Explainability and Assist Human Experts Grasping New Domains [PDF] Abstract
13. Unsupervised Pretraining for Neural Machine Translation Using Elastic Weight Consolidation [PDF] Abstract
14. Revisiting Modularized Multilingual NMT to Meet Industrial Demands [PDF] Abstract
15. The RELX Dataset and Matching the Multilingual Blanks for Cross-Lingual Relation Classification [PDF] Abstract
16. Understanding Unnatural Questions Improves Reasoning over Text [PDF] Abstract
17. BERTnesia: Investigating the capture and forgetting of knowledge in BERT [PDF] Abstract
18. Global Attention for Name Tagging [PDF] Abstract
19. Query-aware Tip Generation for Vertical Search [PDF] Abstract
20. Dimsum @LaySumm 20: BART-based Approach for Scientific Document Summarization [PDF] Abstract
21. Multi-hop Question Generation with Graph Convolutional Network [PDF] Abstract
22. Auto-Encoding Variational Bayes for Inferring Topics and Visualization [PDF] Abstract
23. Infusing Sequential Information into Conditional Masked Translation Model with Self-Review Mechanism [PDF] Abstract
24. SciSummPip: An Unsupervised Scientific Paper Summarization Pipeline [PDF] Abstract
25. Knowledge-guided Open Attribute Value Extraction with Reinforcement Learning [PDF] Abstract
26. Chart-to-Text: Generating Natural Language Descriptions for Charts by Adapting the Transformer Model [PDF] Abstract
27. Incorporating Count-Based Features into Pre-Trained Models for Improved Stance Detection [PDF] Abstract
28. UoB at SemEval-2020 Task 1: Automatic Identification of Novel Word Senses [PDF] Abstract
29. Meta-Learning for Low-Resource Unsupervised Neural Machine Translation [PDF] Abstract
30. Explaining and Improving Model Behavior with k Nearest Neighbor Representations [PDF] Abstract
31. Towards Interpreting BERT for Reading Comprehension Based QA [PDF] Abstract
32. Querent Intent in Multi-Sentence Questions [PDF] Abstract
33. hinglishNorm -- A Corpus of Hindi-English Code Mixed Sentences for Text Normalization [PDF] Abstract
34. Capturing Longer Context for Document-level Neural Machine Translation: A Multi-resolutional Approach [PDF] Abstract
35. Towards Data Distillation for End-to-end Spoken Conversational Question Answering [PDF] Abstract
36. Mixed-Lingual Pre-training for Cross-lingual Summarization [PDF] Abstract
37. Question Answering over Knowledge Base using Language Model Embeddings [PDF] Abstract
38. HABERTOR: An Efficient and Effective Deep Hatespeech Detector [PDF] Abstract
39. Knowledge-Grounded Dialogue Generation with Pre-trained Language Models [PDF] Abstract
40. Consistency and Coherency Enhanced Story Generation [PDF] Abstract
41. Active Testing: An Unbiased Evaluation Method for Distantly Supervised Relation Extraction [PDF] Abstract
42. ArCOV19-Rumors: Arabic COVID-19 Twitter Dataset for Misinformation Detection [PDF] Abstract
43. CUSATNLP@HASOC-Dravidian-CodeMix-FIRE2020: Identifying Offensive Language from Manglish Tweets [PDF] Abstract
44. Drink bleach or do what now? Covid-HeRA: A dataset for risk-informed health decision making in the presence of COVID19 misinformation [PDF] Abstract
45. RiSAWOZ: A Large-Scale Multi-Domain Wizard-of-Oz Dataset with Rich Semantic Annotations for Task-Oriented Dialogue Modeling [PDF] Abstract
46. Incorporate Semantic Structures into Machine Translation Evaluation via UCCA [PDF] Abstract
47. A Corpus for English-Japanese Multimodal Neural Machine Translation with Comparable Sentences [PDF] Abstract
48. Factual Error Correction for Abstractive Summarization Models [PDF] Abstract
49. Example-Driven Intent Prediction with Observers [PDF] Abstract
50. CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding [PDF] Abstract
51. Cross-Lingual Relation Extraction with Transformers [PDF] Abstract
52. Multimodal Speech Recognition with Unstructured Audio Masking [PDF] Abstract
53. Substance over Style: Document-Level Targeted Content Transfer [PDF] Abstract
54. Linguistically-Informed Transformations (LIT): A Method for Automatically Generating Contrast Sets [PDF] Abstract
55. Generating Fact Checking Summaries for Web Claims [PDF] Abstract
56. Reflective Decoding: Unsupervised Paraphrasing and Abductive Reasoning [PDF] Abstract
57. Parameter Norm Growth During Training of Transformers [PDF] Abstract
58. End-to-End Text-to-Speech using Latent Duration based on VQ-VAE [PDF] Abstract
59. Emerging Trends of Multimodal Research in Vision and Language [PDF] Abstract
60. Image Captioning with Visual Object Representations Grounded in the Textual Modality [PDF] Abstract
61. Reduce and Reconstruct: Improving Low-resource End-to-end ASR Via Reconstruction Using Reduced Vocabularies [PDF] Abstract
62. Construction and Application of Teaching System Based on Crowdsourcing Knowledge Graph [PDF] Abstract
63. Studying the Similarity of COVID-19 Sounds based on Correlation Analysis of MFCC [PDF] Abstract
64. Answer-checking in Context: A Multi-modal Fully Attention Network for Visual Question Answering [PDF] Abstract

Abstracts

1. Subtitles to Segmentation: Improving Low-Resource Speech-to-Text Translation Pipelines [PDF] Back to Contents
  David Wan, Zhengping Jiang, Chris Kedzie, Elsbeth Turcan, Peter Bell, Kathleen McKeown
Abstract: In this work, we focus on improving ASR output segmentation in the context of low-resource language speech-to-text translation. ASR output segmentation is crucial, as ASR systems segment the input audio using purely acoustic information and are not guaranteed to output sentence-like segments. Since most MT systems expect sentences as input, feeding in longer unsegmented passages can lead to sub-optimal performance. We explore the feasibility of using datasets of subtitles from TV shows and movies to train better ASR segmentation models. We further incorporate part-of-speech (POS) tag and dependency label information (derived from the unsegmented ASR outputs) into our segmentation model. We show that this noisy syntactic information can improve model accuracy. We evaluate our models intrinsically on segmentation quality and extrinsically on downstream MT performance, as well as downstream tasks including cross-lingual information retrieval (CLIR) tasks and human relevance assessments. Our model shows improved performance on downstream tasks for Lithuanian and Bulgarian.

2. Question Generation for Supporting Informational Query Intents [PDF] Back to Contents
  Xusen Yin, Jonathan May, Li Zhou, Kevin Small
Abstract: Users frequently ask simple factoid questions when encountering question answering (QA) systems, attenuating the impact of myriad recent works designed to support more complex questions. Prompting users with automatically generated suggested questions (SQs) can improve understanding of QA system capabilities and thus facilitate using this technology more effectively. While question generation (QG) is a well-established problem, existing methods are not targeted at producing SQ guidance for human users seeking more in-depth information about a specific concept. In particular, existing QG works are insufficient for this task as the generated questions frequently (1) require access to supporting documents as comprehension context (e.g., How many points did LeBron score?) and (2) focus on short answer spans, often producing peripheral factoid questions unlikely to attract interest. In this work, we aim to generate self-explanatory questions that focus on the main document topics and are answerable with variable length passages as appropriate. We satisfy these requirements by using a BERT-based Pointer-Generator Network (BertPGN) trained on the Natural Questions (NQ) dataset. First, we show that the BertPGN model produces state-of-the-art QG performance for long and short answers for in-domain NQ (BLEU-4 of 20.13 and 28.09, respectively). Second, we evaluate this QG model on the out-of-domain NewsQA dataset automatically and with human evaluation, demonstrating that our method produces better SQs for news articles, even those from a different domain than the training data.

3. PySBD: Pragmatic Sentence Boundary Disambiguation [PDF] Back to Contents
  Nipun Sadvilkar, Mark Neumann
Abstract: In this paper, we present a rule-based sentence boundary disambiguation Python package that works out-of-the-box for 22 languages. We aim to provide a realistic segmenter which can provide logical sentences even when the format and domain of the input text is unknown. In our work, we adapt the Golden Rules Set (a language-specific set of sentence boundary exemplars) originally implemented as a ruby gem - pragmatic_segmenter - which we ported to Python with additional improvements and functionality. PySBD passes 97.92% of the Golden Rule Set exemplars for English, an improvement of 25% over the next best open-source Python tool.
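For readers who want to try the segmenter, the released package is installable from PyPI; the following is a minimal usage sketch based on the package's documented API (exact output formatting may differ by version):

    import pysbd

    # Rule-based segmenter for English; clean=False leaves the input text untouched.
    seg = pysbd.Segmenter(language="en", clean=False)
    sentences = seg.segment("My name is Jonas E. Smith. Please turn to p. 55.")
    # Expect two sentences despite the abbreviation-internal periods.
    print(sentences)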

4. Adaptive Attentional Network for Few-Shot Knowledge Graph Completion [PDF] Back to Contents
  Jiawei Sheng, Shu Guo, Zhenyu Chen, Juwei Yue, Lihong Wang, Tingwen Liu, Hongbo Xu
Abstract: Few-shot Knowledge Graph (KG) completion is a focus of current research, where each task aims at querying unseen facts of a relation given its few-shot reference entity pairs. Recent attempts solve this problem by learning static representations of entities and references, ignoring their dynamic properties, i.e., entities may exhibit diverse roles within task relations, and references may make different contributions to queries. This work proposes an adaptive attentional network for few-shot KG completion by learning adaptive entity and reference representations. Specifically, entities are modeled by an adaptive neighbor encoder to discern their task-oriented roles, while references are modeled by an adaptive query-aware aggregator to differentiate their contributions. Through the attention mechanism, both entities and references can capture their fine-grained semantic meanings, and thus render more expressive representations. This will be more predictive for knowledge acquisition in the few-shot scenario. Evaluation in link prediction on two public datasets shows that our approach achieves new state-of-the-art results with different few-shot sizes.

5. An Empirical Study for Vietnamese Constituency Parsing with Pre-training [PDF] Back to Contents
  Tuan-Vi Tran, Xuan-Thien Pham, Duc-Vu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen
Abstract: In this work, we use a span-based approach for Vietnamese constituency parsing. Our method follows the self-attention encoder architecture and a chart decoder using a CKY-style inference algorithm. We present analyses of the experimental results comparing our empirical method using the pre-training models XLM-Roberta and PhoBERT on both Vietnamese datasets, VietTreebank and NIIVTB1. The results show that our model with XLM-Roberta achieved significantly better F1-scores than other pre-training models: 81.19% on VietTreebank and 85.70% on NIIVTB1.

6. Incorporating Terminology Constraints in Automatic Post-Editing [PDF] Back to Contents
  David Wan, Chris Kedzie, Faisal Ladhak, Marine Carpuat, Kathleen McKeown
Abstract: Users of machine translation (MT) may want to ensure the use of specific lexical terminologies. While there exist techniques for incorporating terminology constraints during inference for MT, current APE approaches cannot ensure that they will appear in the final translation. In this paper, we present both autoregressive and non-autoregressive models for lexically constrained APE, demonstrating that our approach enables preservation of 95% of the terminologies and also improves translation quality on English-German benchmarks. Even when applied to lexically constrained MT output, our approach is able to improve preservation of the terminologies. However, we show that our models do not learn to copy constraints systematically and suggest a simple data augmentation technique that leads to improved performance and robustness.

7. Drug Repurposing for COVID-19 via Knowledge Graph Completion [PDF] Back to Contents
  Rui Zhang, Dimitar Hristovski, Dalton Schutte, Andrej Kastrin, Marcelo Fiszman, Halil Kilicoglu
Abstract: Objective: To discover candidate drugs to repurpose for COVID-19 using literature-derived knowledge and knowledge graph completion methods. Methods: We propose a novel, integrative, and neural network-based literature-based discovery (LBD) approach to identify drug candidates from both PubMed and COVID-19-focused research literature. Our approach relies on semantic triples extracted using SemRep (via SemMedDB). We identified an informative subset of semantic triples using filtering rules and an accuracy classifier developed on a BERT variant, and used this subset to construct a knowledge graph. Five SOTA, neural knowledge graph completion algorithms were used to predict drug repurposing candidates. The models were trained and assessed using a time slicing approach and the predicted drugs were compared with a list of drugs reported in the literature and evaluated in clinical trials. These models were complemented by a discovery pattern-based approach. Results: Accuracy classifier based on PubMedBERT achieved the best performance (F1= 0.854) in classifying semantic predications. Among five knowledge graph completion models, TransE outperformed others (MR = 0.923, Hits@1=0.417). Some known drugs linked to COVID-19 in the literature were identified, as well as some candidate drugs that have not yet been studied. Discovery patterns enabled generation of plausible hypotheses regarding the relationships between the candidate drugs and COVID-19. Among them, five highly ranked and novel drugs (paclitaxel, SB 203580, alpha 2-antiplasmin, pyrrolidine dithiocarbamate, and butylated hydroxytoluene) with their mechanistic explanations were further discussed. Conclusion: We show that an LBD approach can be feasible for discovering drug candidates for COVID-19, and for generating mechanistic explanations. Our approach can be generalized to other diseases as well as to other clinical questions.
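As background for readers unfamiliar with the completion models mentioned (this is the standard formulation, not a detail taken from the paper): TransE, the best-scoring model here, embeds entities and relations in a shared vector space and scores a candidate triple (h, r, t), such as (drug, TREATS, COVID-19), by how well the relation vector acts as a translation between the entity vectors,

    f(h, r, t) = -\lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert

so that plausible unseen triples receive small distances (high scores) and can be ranked as repurposing candidates.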

8. Better Distractions: Transformer-based Distractor Generation and Multiple Choice Question Filtering [PDF] Back to Contents
  Jeroen Offerijns, Suzan Verberne, Tessa Verhoef
Abstract: For the field of education, being able to generate semantically correct and educationally relevant multiple choice questions (MCQs) could have a large impact. While question generation itself is an active research topic, generating distractors (the incorrect multiple choice options) receives much less attention. A missed opportunity, since there is still a lot of room for improvement in this area. In this work, we train a GPT-2 language model to generate three distractors for a given question and text context, using the RACE dataset. Next, we train a BERT language model to answer MCQs, and use this model as a filter, to select only questions that can be answered and therefore presumably make sense. To evaluate our work, we start by using text generation metrics, which show that our model outperforms earlier work on distractor generation (DG) and achieves state-of-the-art performance. Also, by calculating the question answering ability, we show that larger base models lead to better performance. Moreover, we conducted a human evaluation study, which confirmed the quality of the generated questions, but showed no statistically significant effect of the QA filter.

9. Cold-start Active Learning through Self-supervised Language Modeling [PDF] Back to Contents
  Michelle Yuan, Hsuan-Tien Lin, Jordan Boyd-Graber
Abstract: Active learning strives to reduce annotation costs by choosing the most critical examples to label. Typically, the active learning strategy is contingent on the classification model. For instance, uncertainty sampling depends on poorly calibrated model confidence scores. In the cold-start setting, active learning is impractical because of model instability and data scarcity. Fortunately, modern NLP provides an additional source of information: pre-trained language models. The pre-training loss can find examples that surprise the model and should be labeled for efficient fine-tuning. Therefore, we treat the language modeling loss as a proxy for classification uncertainty. With BERT, we develop a simple strategy based on the masked language modeling loss that minimizes labeling costs for text classification. Compared to other baselines, our approach reaches higher accuracy within less sampling iterations and computation time.
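A rough sketch of the core idea, using the masked language modeling loss of an off-the-shelf BERT as an acquisition score via the Hugging Face transformers library; this is an illustration under simplifying assumptions, not the authors' exact procedure:

    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

    def mlm_surprisal(text, mask_prob=0.15):
        # Masked-LM loss of one unlabeled example; higher loss = more "surprising".
        enc = tokenizer(text, return_tensors="pt", truncation=True)
        input_ids = enc["input_ids"].clone()
        labels = enc["input_ids"].clone()
        mask = torch.rand(input_ids.shape) < mask_prob
        mask[:, 0] = False   # never mask [CLS]
        mask[:, -1] = False  # never mask [SEP]
        labels[~mask] = -100                      # only masked positions enter the loss
        input_ids[mask] = tokenizer.mask_token_id
        with torch.no_grad():
            loss = model(input_ids=input_ids,
                         attention_mask=enc["attention_mask"],
                         labels=labels).loss
        return float(loss)  # NaN possible for very short texts if the random mask is empty

    # Rank the unlabeled pool and send the highest-loss examples to annotators first.
    pool = ["first unlabeled document ...", "second unlabeled document ..."]
    ranked = sorted(pool, key=mlm_surprisal, reverse=True)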

10. Heads-up! Unsupervised Constituency Parsing via Self-Attention Heads [PDF] Back to Contents
  Bowen Li, Taeuk Kim, Reinald Kim Amplayo, Frank Keller
Abstract: Transformer-based pre-trained language models (PLMs) have dramatically improved the state of the art in NLP across many tasks. This has led to substantial interest in analyzing the syntactic knowledge PLMs learn. Previous approaches to this question have been limited, mostly using test suites or probes. Here, we propose a novel fully unsupervised parsing approach that extracts constituency trees from PLM attention heads. We rank transformer attention heads based on their inherent properties, and create an ensemble of high-ranking heads to produce the final tree. Our method is adaptable to low-resource languages, as it does not rely on development sets, which can be expensive to annotate. Our experiments show that the proposed method often outperforms existing approaches if no development set is present. Our unsupervised parser can also be used as a tool to analyze the grammars PLMs learn implicitly. For this, we use the parse trees induced by our method to train a neural PCFG and compare it to a grammar derived from a human-annotated treebank.

11. Diving Deep into Context-Aware Neural Machine Translation [PDF] Back to Contents
  Jingjing Huo, Christian Herold, Yingbo Gao, Leonard Dahlmann, Shahram Khadivi, Hermann Ney
Abstract: Context-aware neural machine translation (NMT) is a promising direction to improve the translation quality by making use of the additional context, e.g., document-level translation, or having meta-information. Although there exist various architectures and analyses, the effectiveness of different context-aware NMT models is not well explored yet. This paper analyzes the performance of document-level NMT models on four diverse domains with a varied amount of parallel document-level bilingual data. We conduct a comprehensive set of experiments to investigate the impact of document-level NMT. We find that there is no single best approach to document-level NMT, but rather that different architectures come out on top on different tasks. Looking at task-specific problems, such as pronoun resolution or headline translation, we find improvements in the context-aware systems, even in cases where the corpus-level metrics like BLEU show no significant improvement. We also show that document-level back-translation significantly helps to compensate for the lack of document-level bi-texts.

12. Unsupervised Expressive Rules Provide Explainability and Assist Human Experts Grasping New Domains [PDF] Back to Contents
  Eyal Shnarch, Leshem Choshen, Guy Moshkowich, Noam Slonim, Ranit Aharonov
Abstract: Approaching new data can be quite deterrent; you do not know how your categories of interest are realized in it, commonly, there is no labeled data at hand, and the performance of domain adaptation methods is unsatisfactory. Aiming to assist domain experts in their first steps into a new task over a new corpus, we present an unsupervised approach to reveal complex rules which cluster the unexplored corpus by its prominent categories (or facets). These rules are human-readable, thus providing an important ingredient which has become in short supply lately - explainability. Each rule provides an explanation for the commonality of all the texts it clusters together. We present an extensive evaluation of the usefulness of these rules in identifying target categories, as well as a user study which assesses their interpretability.

13. Unsupervised Pretraining for Neural Machine Translation Using Elastic Weight Consolidation [PDF] Back to Contents
  Dušan Variš, Ondřej Bojar
Abstract: This work presents our ongoing research of unsupervised pretraining in neural machine translation (NMT). In our method, we initialize the weights of the encoder and decoder with two language models that are trained with monolingual data and then fine-tune the model on parallel data using Elastic Weight Consolidation (EWC) to avoid forgetting of the original language modeling tasks. We compare the regularization by EWC with the previous work that focuses on regularization by language modeling objectives. The positive result is that using EWC with the decoder achieves BLEU scores similar to the previous work. However, the model converges 2-3 times faster and does not require the original unlabeled training data during the fine-tuning stage. In contrast, the regularization using EWC is less effective if the original and new tasks are not closely related. We show that initializing the bidirectional NMT encoder with a left-to-right language model and forcing the model to remember the original left-to-right language modeling task limits the learning capacity of the encoder for the whole bidirectional context.
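For reference, the standard elastic weight consolidation penalty of Kirkpatrick et al. (given here as background, not copied from this paper) adds, on top of the translation loss, a quadratic term that keeps each parameter close to its pretrained language-model value theta*_i, weighted by the diagonal Fisher information F_i estimated on the pretraining task:

    L(\theta) = L_{\mathrm{trans}}(\theta) + \sum_i \frac{\lambda}{2}\, F_i \left(\theta_i - \theta^{*}_{i}\right)^2

A larger lambda preserves more of the original language modeling behavior at the cost of flexibility on the translation task.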

14. Revisiting Modularized Multilingual NMT to Meet Industrial Demands [PDF] Back to Contents
  Sungwon Lyu, Bokyung Son, Kichang Yang, Jaekyoung Bae
Abstract: The complete sharing of parameters for multilingual translation (1-1) has been the mainstream approach in current research. However, degraded performance due to the capacity bottleneck and low maintainability hinders its extensive adoption in industries. In this study, we revisit the multilingual neural machine translation model that only share modules among the same languages (M2) as a practical alternative to 1-1 to satisfy industrial requirements. Through comprehensive experiments, we identify the benefits of multi-way training and demonstrate that the M2 can enjoy these benefits without suffering from the capacity bottleneck. Furthermore, the interlingual space of the M2 allows convenient modification of the model. By leveraging trained modules, we find that incrementally added modules exhibit better performance than singly trained models. The zero-shot performance of the added modules is even comparable to supervised models. Our findings suggest that the M2 can be a competent candidate for multilingual translation in industries.

15. The RELX Dataset and Matching the Multilingual Blanks for Cross-Lingual Relation Classification [PDF] Back to Contents
  Abdullatif Köksal, Arzucan Özgür
Abstract: Relation classification is one of the key topics in information extraction, which can be used to construct knowledge bases or to provide useful information for question answering. Current approaches for relation classification are mainly focused on the English language and require lots of training data with human annotations. Creating and annotating a large amount of training data for low-resource languages is impractical and expensive. To overcome this issue, we propose two cross-lingual relation classification models: a baseline model based on Multilingual BERT and a new multilingual pretraining setup, which significantly improves the baseline with distant supervision. For evaluation, we introduce a new public benchmark dataset for cross-lingual relation classification in English, French, German, Spanish, and Turkish, called RELX. We also provide the RELX-Distant dataset, which includes hundreds of thousands of sentences with relations from Wikipedia and Wikidata collected by distant supervision for these languages. Our code and data are available at: this https URL

16. Understanding Unnatural Questions Improves Reasoning over Text [PDF] Back to Contents
  Xiao-Yu Guo, Yuan-Fang Li, Gholamreza Haffari
Abstract: Complex question answering (CQA) over raw text is a challenging task. A prominent approach to this task is based on the programmer-interpreter framework, where the programmer maps the question into a sequence of reasoning actions which is then executed on the raw text by the interpreter. Learning an effective CQA model requires large amounts of human-annotated data, consisting of the ground-truth sequence of reasoning actions, which is time-consuming and expensive to collect at scale. In this paper, we address the challenge of learning a high-quality programmer (parser) by projecting natural human-generated questions into unnatural machine-generated questions which are more convenient to parse. We firstly generate synthetic (question, action sequence) pairs by a data generator, and train a semantic parser that associates synthetic questions with their corresponding action sequences. To capture the diversity when applied to natural questions, we learn a projection model to map natural questions into their most similar unnatural questions for which the parser can work well. Without any natural training data, our projection model provides high-quality action sequences for the CQA task. Experimental results show that the QA model trained exclusively with synthetic data generated by our method outperforms its state-of-the-art counterpart trained on human-labeled data.

17. BERTnesia: Investigating the capture and forgetting of knowledge in BERT [PDF] Back to Contents
  Jonas Wallat, Jaspreet Singh, Avishek Anand
Abstract: Probing complex language models has recently revealed several insights into linguistic and semantic patterns found in the learned representations. In this paper, we probe BERT specifically to understand and measure the relational knowledge it captures. We utilize knowledge base completion tasks to probe every layer of pre-trained as well as fine-tuned BERT (ranking, question answering, NER). Our findings show that knowledge is not just contained in BERT's final layers. Intermediate layers contribute a significant amount (17-60%) to the total knowledge found. Probing intermediate layers also reveals how different types of knowledge emerge at varying rates. When BERT is fine-tuned, relational knowledge is forgotten but the extent of forgetting is impacted by the fine-tuning objective but not the size of the dataset. We found that ranking models forget the least and retain more knowledge in their final layer. We release our code on github to repeat the experiments.
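A small illustration of the kind of layer-wise probing described above: decoding a cloze-style fact through the pretrained MLM head at every depth of BERT. This is a hedged sketch with Hugging Face transformers, not the paper's exact knowledge-base-completion probe:

    import torch
    from transformers import BertTokenizer, BertForMaskedLM

    tok = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased",
                                            output_hidden_states=True).eval()

    enc = tok("The capital of France is [MASK].", return_tensors="pt")
    mask_pos = (enc["input_ids"][0] == tok.mask_token_id).nonzero().item()

    with torch.no_grad():
        out = model(**enc)

    # out.hidden_states holds the embedding output plus one tensor per layer;
    # reuse the pretrained prediction head to decode the [MASK] position at every depth.
    for depth, hidden in enumerate(out.hidden_states):
        logits = model.cls(hidden)
        pred = tok.convert_ids_to_tokens(int(logits[0, mask_pos].argmax()))
        print(f"layer {depth:2d}: {pred}")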

18. Global Attention for Name Tagging [PDF] Back to Contents
  Boliang Zhang, Spencer Whitehead, Lifu Huang, Heng Ji
Abstract: Many name tagging approaches use local contextual information with much success, but fail when the local context is ambiguous or limited. We present a new framework to improve name tagging by utilizing local, document-level, and corpus-level contextual information. We retrieve document-level context from other sentences within the same document and corpus-level context from sentences in other topically related documents. We propose a model that learns to incorporate document-level and corpus-level contextual information alongside local contextual information via global attentions, which dynamically weight their respective contextual information, and gating mechanisms, which determine the influence of this information. Extensive experiments on benchmark datasets show the effectiveness of our approach, which achieves state-of-the-art results for Dutch, German, and Spanish on the CoNLL-2002 and CoNLL-2003 datasets.

19. Query-aware Tip Generation for Vertical Search [PDF] Back to Contents
  Yang Yang, Junmei Hao, Canjia Li, Zili Wang, Jingang Wang, Fuzheng Zhang, Rao Fu, Peixu Hou, Gong Zhang, Zhongyuan Wang
Abstract: As a concise form of user reviews, tips have unique advantages to explain the search results, assist users' decision making, and further improve user experience in vertical search scenarios. Existing work on tip generation does not take query into consideration, which limits the impact of tips in search scenarios. To address this issue, this paper proposes a query-aware tip generation framework, integrating query information into encoding and subsequent decoding processes. Two specific adaptations of Transformer and Recurrent Neural Network (RNN) are proposed. For Transformer, the query impact is incorporated into the self-attention computation of both the encoder and the decoder. As for RNN, the query-aware encoder adopts a selective network to distill query-relevant information from the review, while the query-aware decoder integrates the query information into the attention computation during decoding. The framework consistently outperforms the competing methods on both public and real-world industrial datasets. Last but not least, online deployment experiments on Dianping demonstrate the advantage of the proposed framework for tip generation as well as its online business values.

20. Dimsum @LaySumm 20: BART-based Approach for Scientific Document Summarization [PDF] Back to Contents
  Tiezheng Yu, Dan Su, Wenliang Dai, Pascale Fung
Abstract: Lay summarization aims to generate lay summaries of scientific papers automatically. It is an essential task that can increase the relevance of science for all of society. In this paper, we build a lay summary generation system based on the BART model. We leverage sentence labels as extra supervision signals to improve the performance of lay summarization. In the CL-LaySumm 2020 shared task, our model achieves a 46.00% ROUGE-1 F1 score.

21. Multi-hop Question Generation with Graph Convolutional Network [PDF] Back to Contents
  Dan Su, Yan Xu, Wenliang Dai, Ziwei Ji, Tiezheng Yu, Pascale Fung
Abstract: Multi-hop Question Generation (QG) aims to generate answer-related questions by aggregating and reasoning over multiple scattered evidence from different paragraphs. It is a more challenging yet under-explored task compared to conventional single-hop QG, where the questions are generated from the sentence containing the answer or nearby sentences in the same paragraph without complex reasoning. To address the additional challenges in multi-hop QG, we propose Multi-Hop Encoding Fusion Network for Question Generation (MulQG), which does context encoding in multiple hops with Graph Convolutional Network and encoding fusion via an Encoder Reasoning Gate. To the best of our knowledge, we are the first to tackle the challenge of multi-hop reasoning over paragraphs without any sentence-level information. Empirical results on HotpotQA dataset demonstrate the effectiveness of our method, in comparison with baselines on automatic evaluation metrics. Moreover, from the human evaluation, our proposed model is able to generate fluent questions with high completeness and outperforms the strongest baseline by 20.8% in the multi-hop evaluation. The code is publicly available at this https URL.
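For background, the operation usually meant by "Graph Convolutional Network" is the Kipf-and-Welling layer (standard definition, not necessarily the exact variant used inside MulQG): with adjacency matrix A over the evidence graph, \hat{A} = A + I, and degree matrix \hat{D}, each layer updates node states as

    H^{(l+1)} = \sigma\left(\hat{D}^{-1/2}\,\hat{A}\,\hat{D}^{-1/2}\, H^{(l)} W^{(l)}\right)

where W^{(l)} is a learned weight matrix and \sigma a nonlinearity.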

22. Auto-Encoding Variational Bayes for Inferring Topics and Visualization [PDF] Back to Contents
  Dang Pham, Tuan M.V.Le
Abstract: Visualization and topic modeling are widely used approaches for text analysis. Traditional visualization methods find low-dimensional representations of documents in the visualization space (typically 2D or 3D) that can be displayed using a scatterplot. In contrast, topic modeling aims to discover topics from text, but for visualization, one needs to perform a post-hoc embedding using dimensionality reduction methods. Recent approaches propose using a generative model to jointly find topics and visualization, allowing the semantics to be infused in the visualization space for a meaningful interpretation. A major challenge that prevents these methods from being used practically is the scalability of their inference algorithms. We present, to the best of our knowledge, the first fast Auto-Encoding Variational Bayes based inference method for jointly inferring topics and visualization. Since our method is black box, it can handle model changes efficiently with little mathematical rederivation effort. We demonstrate the efficiency and effectiveness of our method on real-world large datasets and compare it with existing baselines.

23. Infusing Sequential Information into Conditional Masked Translation Model with Self-Review Mechanism [PDF] Back to Contents
  Pan Xie, Zhi Cui, Xiuyin Chen, Xiaohui Hu, Jianwei Cui, Bin Wang
Abstract: Non-autoregressive models generate target words in a parallel way, which achieve a faster decoding speed but at the sacrifice of translation accuracy. To remedy a flawed translation by non-autoregressive models, a promising approach is to train a conditional masked translation model (CMTM), and refine the generated results within several iterations. Unfortunately, such an approach hardly considers the sequential dependency among target words, which inevitably results in a translation degradation. Hence, instead of solely training a Transformer-based CMTM, we propose a Self-Review Mechanism to infuse sequential information into it. Concretely, we insert a left-to-right mask to the same decoder of CMTM, and then induce it to autoregressively review whether each generated word from CMTM is supposed to be replaced or kept. The experimental results (WMT14 En↔De and WMT16 En↔Ro) demonstrate that our model uses dramatically less training computations than the typical CMTM, as well as outperforms several state-of-the-art non-autoregressive models by over 1 BLEU. Through knowledge distillation, our model even surpasses a typical left-to-right Transformer model, while significantly speeding up decoding.

24. SciSummPip: An Unsupervised Scientific Paper Summarization Pipeline [PDF] Back to Contents
  Jiaxin Ju, Ming Liu, Longxiang Gao, Shirui Pan
Abstract: The Scholarly Document Processing (SDP) workshop is to encourage more efforts on natural language understanding of scientific task. It contains three shared tasks and we participate in the LongSumm shared task. In this paper, we describe our text summarization system, SciSummPip, inspired by SummPip (Zhao et al., 2020) that is an unsupervised text summarization system for multi-document in news domain. Our SciSummPip includes a transformer-based language model SciBERT (Beltagy et al., 2019) for contextual sentence representation, content selection with PageRank (Page et al., 1999), sentence graph construction with both deep and linguistic information, sentence graph clustering and within-graph summary generation. Our work differs from previous method in that content selection and a summary length constraint is applied to adapt to the scientific domain. The experiment results on both training dataset and blind test dataset show the effectiveness of our method, and we empirically verify the robustness of modules used in SciSummPip with BERTScore (Zhang et al., 2019a).

25. Knowledge-guided Open Attribute Value Extraction with Reinforcement Learning [PDF] Back to Contents
  Ye Liu, Sheng Zhang, Rui Song, Suo Feng, Yanghua Xiao
Abstract: Open attribute value extraction for emerging entities is an important but challenging task. A lot of previous works formulate the problem as a question-answering (QA) task. While the collections of articles from web corpus provide updated information about the emerging entities, the retrieved texts can be noisy, irrelevant, thus leading to inaccurate answers. Effectively filtering out noisy articles as well as bad answers is the key to improving extraction accuracy. Knowledge graph (KG), which contains rich, well organized information about entities, provides a good resource to address the challenge. In this work, we propose a knowledge-guided reinforcement learning (RL) framework for open attribute value extraction. Informed by relevant knowledge in KG, we trained a deep Q-network to sequentially compare extracted answers to improve extraction accuracy. The proposed framework is applicable to different information extraction systems. Our experimental results show that our method outperforms the baselines by 16.5-27.8%.

26. Chart-to-Text: Generating Natural Language Descriptions for Charts by Adapting the Transformer Model [PDF] Back to Contents
  Jason Obeid, Enamul Hoque
Abstract: Information visualizations such as bar charts and line charts are very popular for exploring data and communicating insights. Interpreting and making sense of such visualizations can be challenging for some people, such as those who are visually impaired or have low visualization literacy. In this work, we introduce a new dataset and present a neural model for automatically generating natural language summaries for charts. The generated summaries provide an interpretation of the chart and convey the key insights found within that chart. Our neural model is developed by extending the state-of-the-art model for the data-to-text generation task, which utilizes a transformer-based encoder-decoder architecture. We found that our approach outperforms the base model on a content selection metric by a wide margin (55.42% vs. 8.49%) and generates more informative, concise, and coherent summaries.

27. Incorporating Count-Based Features into Pre-Trained Models for Improved Stance Detection [PDF] Back to Contents
  Anushka Prakash, Harish Tayyar Madabushi
Abstract: The explosive growth and popularity of Social Media has revolutionised the way we communicate and collaborate. Unfortunately, this same ease of accessing and sharing information has led to an explosion of misinformation and propaganda. Given that stance detection can significantly aid in veracity prediction, this work focuses on boosting automated stance detection, a task on which pre-trained models have been extremely successful, as on several other tasks. This work shows that the task of stance detection can benefit from feature-based information, especially on certain under-performing classes; however, integrating such features into pre-trained models using ensembling is challenging. We propose a novel architecture for integrating features with pre-trained models that addresses these challenges and test our method on the RumourEval 2019 dataset. This method achieves state-of-the-art results with an F1-score of 63.94 on the test set.

28. UoB at SemEval-2020 Task 1: Automatic Identification of Novel Word Senses [PDF] Back to Contents
  Eleri Sarsfield, Harish Tayyar Madabushi
Abstract: Much as the social landscape in which languages are spoken shifts, language too evolves to suit the needs of its users. Lexical semantic change analysis is a burgeoning field of semantic analysis which aims to trace changes in the meanings of words over time. This paper presents an approach to lexical semantic change detection based on Bayesian word sense induction suitable for novel word sense identification. This approach is used for a submission to SemEval-2020 Task 1, which shows the approach to be capable of the SemEval task. The same approach is also applied to a corpus gleaned from 15 years of Twitter data, the results of which are then used to identify words which may be instances of slang.

29. Meta-Learning for Low-Resource Unsupervised Neural Machine Translation [PDF] Back to Contents
  Yunwon Tae, Cheonbok Park, Taehee Kim, Soyoung Yang, Mohammad Azam Khan, Eunjeong Park, Tao Qin, Jaegul Choo
Abstract: Unsupervised machine translation, which utilizes unpaired monolingual corpora as training data, has achieved comparable performance against supervised machine translation. However, it still suffers from data-scarce domains. To address this issue, this paper presents a meta-learning algorithm for unsupervised neural machine translation (UNMT) that trains the model to adapt to another domain by utilizing only a small amount of training data. We assume that domain-general knowledge is a significant factor in handling data-scarce domains. Hence, we extend the meta-learning algorithm, which utilizes knowledge learned from high-resource domains to boost the performance of low-resource UNMT. Our model surpasses a transfer learning-based approach by up to 2-4 BLEU scores. Extensive experimental results show that our proposed algorithm is pertinent for fast adaptation and consistently outperforms other baseline models.

30. Explaining and Improving Model Behavior with k Nearest Neighbor Representations [PDF] Back to Contents
  Nazneen Fatema Rajani, Ben Krause, Wengpeng Yin, Tong Niu, Richard Socher, Caiming Xiong
Abstract: Interpretability techniques in NLP have mainly focused on understanding individual predictions using attention visualization or gradient-based saliency maps over tokens. We propose using k nearest neighbor (kNN) representations to identify training examples responsible for a model's predictions and obtain a corpus-level understanding of the model's behavior. Apart from interpretability, we show that kNN representations are effective at uncovering learned spurious associations, identifying mislabeled examples, and improving the fine-tuned model's performance. We focus on Natural Language Inference (NLI) as a case study and experiment with multiple datasets. Our method deploys backoff to kNN for BERT and RoBERTa on examples with low model confidence without any update to the model parameters. Our results indicate that the kNN approach makes the finetuned model more robust to adversarial inputs.
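A minimal sketch of the retrieval mechanism (nearest training neighbors in a model's [CLS] representation space). The checkpoint name below is a generic stand-in; in practice one would embed with the fine-tuned NLI model, and this is an illustrative sketch rather than the authors' exact setup:

    import torch
    from sklearn.neighbors import NearestNeighbors
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # stand-in checkpoint
    model = AutoModel.from_pretrained("bert-base-uncased").eval()

    def embed(texts):
        # Final-layer [CLS] vector as each example's representation.
        enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            out = model(**enc)
        return out.last_hidden_state[:, 0].numpy()

    train_texts = ["premise ... hypothesis ...", "another training pair ..."]
    test_texts = ["a test pair the model just classified ..."]

    index = NearestNeighbors(n_neighbors=1).fit(embed(train_texts))
    _, idx = index.kneighbors(embed(test_texts))
    print("closest training example:", train_texts[idx[0][0]])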

31. Towards Interpreting BERT for Reading Comprehension Based QA [PDF] Back to Contents
  Sahana Ramnath, Preksha Nema, Deep Sahni, Mitesh M. Khapra
Abstract: BERT and its variants have achieved state-of-the-art performance in various NLP tasks. Since then, various works have been proposed to analyze the linguistic information being captured in BERT. However, the current works do not provide an insight into how BERT is able to achieve near human-level performance on the task of Reading Comprehension based Question Answering. In this work, we attempt to interpret BERT for RCQA. Since BERT layers do not have predefined roles, we define a layer's role or functionality using Integrated Gradients. Based on the defined roles, we perform a preliminary analysis across all layers. We observed that the initial layers focus on query-passage interaction, whereas later layers focus more on contextual understanding and enhancing the answer prediction. Specifically for quantifier questions (how much/how many), we notice that BERT focuses on confusing words (i.e., on other numerical quantities in the passage) in the later layers, but still manages to predict the answer correctly. The fine-tuning and analysis scripts will be publicly available at this https URL .
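For reference, the layer attribution mentioned here relies on Integrated Gradients in its standard form (Sundararajan et al.; quoted as background, not from this paper): for an input x, a baseline x', and model output F, the attribution to feature i is

    \mathrm{IG}_i(x) = (x_i - x'_i) \int_{0}^{1} \frac{\partial F\left(x' + \alpha\,(x - x')\right)}{\partial x_i}\, d\alpha

which is then used to characterize the role each layer plays in the prediction.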

32. Querent Intent in Multi-Sentence Questions [PDF] Back to Contents
  Laurie Burchell, Jie Chi, Tom Hosking, Nina Markl, Bonnie Webber
Abstract: Multi-sentence questions (MSQs) are sequences of questions connected by relations which, unlike sequences of standalone questions, need to be answered as a unit. Following Rhetorical Structure Theory (RST), we recognise that different "question discourse relations" between the subparts of MSQs reflect different speaker intents, and consequently elicit different answering strategies. Correctly identifying these relations is therefore a crucial step in automatically answering MSQs. We identify five different types of MSQs in English, and define five novel relations to describe them. We extract over 162,000 MSQs from Stack Exchange to enable future research. Finally, we implement a high-precision baseline classifier based on surface features.

33. hinglishNorm -- A Corpus of Hindi-English Code Mixed Sentences for Text Normalization [PDF] Back to Contents
  Piyush Makhija, Ankit Kumar, Anuj Gupta
Abstract: We present hinglishNorm -- a human annotated corpus of Hindi-English code-mixed sentences for text normalization task. Each sentence in the corpus is aligned to its corresponding human annotated normalized form. To the best of our knowledge, there is no corpus of Hindi-English code-mixed sentences for text normalization task that is publicly available. Our work is the first attempt in this direction. The corpus contains 13494 parallel segments. Further, we present baseline normalization results on this corpus. We obtain a Word Error Rate (WER) of 15.55, BiLingual Evaluation Understudy (BLEU) score of 71.2, and Metric for Evaluation of Translation with Explicit ORdering (METEOR) score of 0.50.
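For readers unfamiliar with the reported metrics, word error rate is the word-level edit distance between the system output and the reference normalization, normalized by the reference length (standard definition, not specific to this corpus):

    \mathrm{WER} = \frac{S + D + I}{N}

where S, D, and I count substituted, deleted, and inserted words, and N is the number of words in the reference.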

34. Capturing Longer Context for Document-level Neural Machine Translation: A Multi-resolutional Approach [PDF] Back to Contents
  Zewei Sun, Mingxuan Wang, Hao Zhou, Chengqi Zhao, Shujian Huang, Jiajun Chen, Lei Li
Abstract: Discourse context has been proven useful when translating documents. It is quite a challenge to incorporate long document context in the prevailing neural machine translation models such as Transformer. In this paper, we propose multi-resolutional (MR) Doc2Doc, a method to train a neural sequence-to-sequence model for document-level translation. Our trained model can simultaneously translate sentence by sentence as well as a document as a whole. We evaluate our method and several recent approaches on nine document-level datasets and two sentence-level datasets across six languages. Experiments show that MR Doc2Doc outperforms sentence-level models and previous methods in a comprehensive set of metrics, including BLEU, four lexical indices, three newly proposed assistant linguistic indicators, and human evaluation.

35. Towards Data Distillation for End-to-end Spoken Conversational Question Answering [PDF] Back to Contents
  Chenyu You, Nuo Chen, Fenglin Liu, Dongchao Yang, Yuexian Zou
Abstract: In spoken question answering, QA systems are designed to answer questions from contiguous text spans within the related speech transcripts. However, the most natural way that human seek or test their knowledge is via human conversations. Therefore, we propose a new Spoken Conversational Question Answering task (SCQA), aiming at enabling QA systems to model complex dialogues flow given the speech utterances and text corpora. In this task, our main objective is to build a QA system to deal with conversational questions both in spoken and text forms, and to explore the plausibility of providing more cues in spoken documents with systems in information gathering. To this end, instead of adopting automatically generated speech transcripts with highly noisy data, we propose a novel unified data distillation approach, DDNet, which directly fuse audio-text features to reduce the misalignment between automatic speech recognition hypotheses and the reference transcriptions. In addition, to evaluate the capacity of QA systems in a dialogue-style interaction, we assemble a Spoken Conversational Question Answering (Spoken-CoQA) dataset with more than 120k question-answer pairs. Experiments demonstrate that our proposed method achieves superior performance in spoken conversational question answering.

36. Mixed-Lingual Pre-training for Cross-lingual Summarization [PDF] 返回目录
  Ruochen Xu, Chenguang Zhu, Yu Shi, Michael Zeng, Xuedong Huang
Abstract: Cross-lingual Summarization (CLS) aims at producing a summary in the target language for an article in the source language. Traditional solutions employ a two-step approach, i.e. translate then summarize or summarize then translate. Recently, end-to-end models have achieved better results, but these approaches are mostly limited by their dependence on large-scale labeled data. We propose a solution based on mixed-lingual pre-training that leverages both cross-lingual tasks such as translation and monolingual tasks like masked language models. Thus, our model can leverage the massive monolingual data to enhance its modeling of language. Moreover, the architecture has no task-specific components, which saves memory and increases optimization efficiency. We show in experiments that this pre-training scheme can effectively boost the performance of cross-lingual summarization. On the Neural Cross-Lingual Summarization (NCLS) dataset, our model achieves an improvement of 2.82 (English to Chinese) and 1.15 (Chinese to English) ROUGE-1 scores over state-of-the-art results.

37. Question Answering over Knowledge Base using Language Model Embeddings [PDF] 返回目录
  Sai Sharath Japa, Rekabdar Banafsheh
Abstract: A Knowledge Base represents facts about the world, often in some form of subsumption ontology, rather than implicitly embedded in procedural code the way a conventional computer program does. While knowledge bases are growing rapidly, retrieving information from them remains a challenge. Knowledge Base Question Answering is one of the promising approaches for extracting substantial knowledge from Knowledge Bases. Unlike web search, Question Answering over a knowledge base gives accurate and concise results, provided that natural language questions can be understood and mapped precisely to an answer in the knowledge base. However, some of the existing embedding-based methods for knowledge base question answering ignore the subtle correlation between the question and the Knowledge Base (e.g., entity types, relation paths, and context) and suffer from the Out-Of-Vocabulary problem. In this paper, we focus on using a pre-trained language model for the Knowledge Base Question Answering task. First, we use BERT-base uncased embeddings for the initial experiments. We then fine-tune these embeddings with a two-way attention mechanism, from the knowledge base to the asked question and from the asked question to the knowledge base answer aspects. Our method is based on a simple Convolutional Neural Network architecture with a Multi-Head Attention mechanism to represent the asked question dynamically in multiple aspects. Our experimental results show the effectiveness and superiority of BERT pre-trained language model embeddings for question answering systems on knowledge bases over other well-known embedding methods.
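A minimal sketch, assuming PyTorch and the Hugging Face transformers library, of the two ingredients the abstract names: BERT-base uncased embeddings for the question and multi-head attention over knowledge-base answer aspects (entity type, relation path, context). The pooling, example strings, and scoring step are illustrative and are not the authors' model.

```python
# Illustrative only: BERT embeddings plus one attention call over KB aspects.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = encoder(**batch).last_hidden_state   # (batch, seq_len, 768)
    return out.mean(dim=1)                          # crude mean-pooled vectors

question = embed(["who wrote the origin of species"])                  # (1, 768)
aspects = embed(["Charles Darwin", "entity type: person",
                 "relation: author_of"])                                # (3, 768)

# The two-way attention is reduced to a single multi-head attention call here.
attn = torch.nn.MultiheadAttention(embed_dim=768, num_heads=8, batch_first=True)
fused, _ = attn(question.unsqueeze(0), aspects.unsqueeze(0), aspects.unsqueeze(0))
scores = torch.cosine_similarity(fused.squeeze(0), aspects)            # one score per aspect
print(scores)
```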

38. HABERTOR: An Efficient and Effective Deep Hatespeech Detector [PDF] 返回目录
  Thanh Tran, Yifan Hu, Changwei Hu, Kevin Yen, Fei Tan, Kyumin Lee, Serim Park
Abstract: We present our HABERTOR model for detecting hatespeech in large scale user-generated content. Inspired by the recent success of the BERT model, we propose several modifications to BERT to enhance the performance on the downstream hatespeech classification task. HABERTOR inherits BERT's architecture, but is different in four aspects: (i) it generates its own vocabularies and is pre-trained from scratch using the largest scale hatespeech dataset; (ii) it consists of Quaternion-based factorized components, resulting in a much smaller number of parameters, faster training and inferencing, as well as less memory usage; (iii) it uses our proposed multi-source ensemble heads with a pooling layer for separate input sources, to further enhance its effectiveness; and (iv) it uses a regularized adversarial training with our proposed fine-grained and adaptive noise magnitude to enhance its robustness. Through experiments on the large-scale real-world hatespeech dataset with 1.4M annotated comments, we show that HABERTOR works better than 15 state-of-the-art hatespeech detection methods, including fine-tuned Language Models. In particular, compared with BERT, our HABERTOR is 4-5 times faster in the training/inferencing phase, uses less than 1/3 of the memory, and has better performance, even though we pre-train it using less than 1% of the number of words. Our generalizability analysis shows that HABERTOR transfers well to other unseen hatespeech datasets and is a more efficient and effective alternative to BERT for hatespeech classification.

39. Knowledge-Grounded Dialogue Generation with Pre-trained Language Models [PDF] 返回目录
  Xueliang Zhao, Wei Wu, Can Xu, Chongyang Tao, Dongyan Zhao, Rui Yan
Abstract: We study knowledge-grounded dialogue generation with pre-trained language models. To leverage the redundant external knowledge under capacity constraint, we propose equipping response generation defined by a pre-trained language model with a knowledge selection module, and an unsupervised approach to jointly optimizing knowledge selection and response generation with unlabeled dialogues. Empirical results on two benchmarks indicate that our model can significantly outperform state-of-the-art methods in both automatic evaluation and human judgment.

40. Consistency and Coherency Enhanced Story Generation [PDF] 返回目录
  Wei Wang, Piji Li, Hai-Tao Zheng
Abstract: Story generation is a challenging task, which demands maintaining consistency of the plot and characters throughout the story. Previous works have shown that GPT2, a large-scale language model, achieves good performance on story generation. However, we observe that several serious issues still exist in the stories generated by GPT2, which can be categorized into two classes: consistency and coherency. In terms of consistency, on the one hand, GPT2 cannot explicitly guarantee the consistency of the plot. On the other hand, the generated stories usually contain coreference errors. In terms of coherency, GPT2 does not directly take account of the discourse relations between sentences of stories. To enhance the consistency and coherency of the generated stories, we propose a two-stage generation framework, where the first stage is to organize the story outline which depicts the story plots and events, and the second stage is to expand the outline into a complete story. Therefore, plot consistency can be controlled and guaranteed explicitly. In addition, coreference supervision signals are incorporated to reduce coreference errors and improve coreference consistency. Moreover, we design an auxiliary task of discourse relation modeling to improve the coherency of the generated stories. Experimental results on a story dataset show that our model outperforms the baseline approaches in terms of both automatic metrics and human evaluation.

41. Active Testing: An Unbiased Evaluation Method for Distantly Supervised Relation Extraction [PDF] 返回目录
  Pengshuai Li, Xinsong Zhang, Weijia Jia, Wei Zhao
Abstract: Distant supervision has been a widely used method for neural relation extraction because of its convenience for automatically labeling datasets. However, existing works on distantly supervised relation extraction suffer from the low quality of the test set, which leads to considerably biased performance evaluation. These biases not only result in unfair evaluations but also mislead the optimization of neural relation extraction. To mitigate this problem, we propose a novel evaluation method named active testing, which utilizes both the noisy test set and a few manual annotations. Experiments on a widely used benchmark show that our proposed approach can yield approximately unbiased evaluations for distantly supervised relation extractors.

42. ArCOV19-Rumors: Arabic COVID-19 Twitter Dataset for Misinformation Detection [PDF] 返回目录
  Fatima Haouari, Maram Hasanain, Reem Suwaileh, Tamer Elsayed
Abstract: In this paper we introduce ArCOV19-Rumors, an Arabic COVID-19 Twitter dataset for misinformation detection composed of tweets containing claims from 27th January till the end of April 2020. We collected 138 verified claims, mostly from popular fact-checking websites, and identified 9.4K relevant tweets to those claims. We then manually-annotated the tweets by veracity to support research on misinformation detection, which is one of the major problems faced during a pandemic. We aim to support two classes of misinformation detection problems over Twitter: verifying free-text claims (called claim-level verification) and verifying claims expressed in tweets (called tweet-level verification). Our dataset covers, in addition to health, claims related to other topical categories that were influenced by COVID-19, namely, social, politics, sports, entertainment, and religious.

43. CUSATNLP@HASOC-Dravidian-CodeMix-FIRE2020:Identifying Offensive Language from ManglishTweets [PDF] 返回目录
  Sara Renjit, Sumam Mary Idicula
Abstract: With the popularity of social media, communication through blogs, Facebook, Twitter, and other platforms has increased. Initially, English was the only medium of communication. Fortunately, now we can communicate in any language. This has led to people using English and their own native or mother tongue in a mixed form. Sometimes comments in other languages are written in an English-transliterated format; in other cases, people use the intended language's script. Identifying sentiments and offensive content from such code-mixed tweets is a necessary task in these times. We present a working model submitted for Task 2 of the sub-track HASOC Offensive Language Identification - DravidianCodeMix at the Forum for Information Retrieval Evaluation, 2020. It is a message-level classification task. In our approach, an embedding-model-based classifier identifies offensive and non-offensive comments. We applied this method to the Manglish dataset provided with the sub-track.

44. Drink bleach or do what now? Covid-HeRA: A dataset for risk-informed health decision making in the presence of COVID19 misinformation [PDF] 返回目录
  Arkin Dharawat, Ismini Lourentzou, Alex Morales, ChengXiang Zhai
Abstract: Given the wide spread of inaccurate medical advice related to the 2019 coronavirus pandemic (COVID-19), such as fake remedies, treatments and prevention suggestions, misinformation detection has emerged as an open problem of high importance and interest for the NLP community. To combat the potential harm of COVID-19-related misinformation, we release Covid-HeRA, a dataset for health risk assessment of COVID-19-related social media posts. More specifically, we study the severity of each misinformation story, i.e., how harmful a message believed by the audience can be and what type of signals can be used to discover highly malicious fake news and detect refuted claims. We present a detailed analysis, evaluate several simple and advanced classification models, and conclude with our experimental analysis that presents open challenges and future directions.

45. RiSAWOZ: A Large-Scale Multi-Domain Wizard-of-Oz Dataset with Rich Semantic Annotations for Task-Oriented Dialogue Modeling [PDF] 返回目录
  Jun Quan, Shian Zhang, Qian Cao, Zizhong Li, Deyi Xiong
Abstract: In order to alleviate the shortage of multi-domain data and to capture discourse phenomena for task-oriented dialogue modeling, we propose RiSAWOZ, a large-scale multi-domain Chinese Wizard-of-Oz dataset with Rich Semantic Annotations. RiSAWOZ contains 11.2K human-to-human (H2H) multi-turn semantically annotated dialogues, with more than 150K utterances spanning over 12 domains, which is larger than all previous annotated H2H conversational datasets. Both single- and multi-domain dialogues are constructed, accounting for 65% and 35%, respectively. Each dialogue is labeled with comprehensive dialogue annotations, including dialogue goal in the form of natural language description, domain, dialogue states and acts at both the user and system side. In addition to traditional dialogue annotations, we especially provide linguistic annotations on discourse phenomena, e.g., ellipsis and coreference, in dialogues, which are useful for dialogue coreference and ellipsis resolution tasks. Apart from the fully annotated dataset, we also present a detailed description of the data collection procedure, statistics and analysis of the dataset. A series of benchmark models and results are reported, including natural language understanding (intent detection & slot filling), dialogue state tracking and dialogue context-to-text generation, as well as coreference and ellipsis resolution, which facilitate the baseline comparison for future research on this corpus.

46. Incorporate Semantic Structures into Machine Translation Evaluation via UCCA [PDF] 返回目录
  Jin Xu, Yinuo Guo, Junfeng Hu
Abstract: Copying mechanism has been commonly used in neural paraphrasing networks and other text generation tasks, in which some important words in the input sequence are preserved in the output sequence. Similarly, in machine translation, we notice that there are certain words or phrases appearing in all good translations of one source text, and these words tend to convey important semantic information. Therefore, in this work, we define words carrying important semantic meanings in sentences as semantic core words. Moreover, we propose an MT evaluation approach named Semantically Weighted Sentence Similarity (SWSS). It leverages the power of UCCA to identify semantic core words, and then calculates sentence similarity scores on the overlap of semantic core words. Experimental results show that SWSS can consistently improve the performance of popular MT evaluation metrics which are based on lexical similarity.

47. A Corpus for English-Japanese Multimodal Neural Machine Translation with Comparable Sentences [PDF] 返回目录
  Andrew Merritt, Chenhui Chu, Yuki Arase
Abstract: Multimodal neural machine translation (NMT) has become an increasingly important area of research over the years because additional modalities, such as image data, can provide more context to textual data. Furthermore, the viability of training multimodal NMT models without a large parallel corpus continues to be investigated due to low availability of parallel sentences with images, particularly for English-Japanese data. However, this void can be filled with comparable sentences that contain bilingual terms and parallel phrases, which are naturally created through media such as social network posts and e-commerce product descriptions. In this paper, we propose a new multimodal English-Japanese corpus with comparable sentences that are compiled from existing image captioning datasets. In addition, we supplement our comparable sentences with a smaller parallel corpus for validation and test purposes. To test the performance of this comparable sentence translation scenario, we train several baseline NMT models with our comparable corpus and evaluate their English-Japanese translation performance. Due to low translation scores in our baseline experiments, we believe that current multimodal NMT models are not designed to effectively utilize comparable sentence data. Despite this, we hope for our corpus to be used to further research into multimodal NMT with comparable sentences.

48. Factual Error Correction for Abstractive Summarization Models [PDF] 返回目录
  Meng Cao, Yue Dong, Jiapeng Wu, Jackie Chi Kit Cheung
Abstract: Neural abstractive summarization systems have achieved promising progress, thanks to the availability of large-scale datasets and models pre-trained with self-supervised methods. However, ensuring the factual consistency of the generated summaries for abstractive summarization systems is a challenge. We propose a post-editing corrector module to address this issue by identifying and correcting factual errors in generated summaries. The neural corrector model is pre-trained on artificial examples that are created by applying a series of heuristic transformations on reference summaries. These transformations are inspired by an error analysis of state-of-the-art summarization model outputs. Experimental results show that our model is able to correct factual errors in summaries generated by other neural summarization models and outperforms previous models on factual consistency evaluation on the CNN/DailyMail dataset. We also find that transferring from artificial error correction to downstream settings is still very challenging.
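The corrector is trained on artificial examples produced by heuristic transformations of reference summaries. Assuming entity swapping is among those transformations (a guess based on the error-analysis motivation above), the sketch below uses spaCy for named-entity recognition; the function and entity pool are hypothetical.

```python
# Illustrative only: corrupt a reference summary by swapping a named entity,
# yielding a (corrupted, reference) pair for training the corrector.
# The entity pool is a hypothetical list of entity strings from the corpus.
import random
from typing import List

import spacy

nlp = spacy.load("en_core_web_sm")

def corrupt_entity(reference_summary: str, entity_pool: List[str]) -> str:
    doc = nlp(reference_summary)
    ents = [e for e in doc.ents if e.label_ in ("PERSON", "ORG", "GPE")]
    if not ents:
        return reference_summary                      # nothing to corrupt
    target = random.choice(ents)
    candidates = [e for e in entity_pool if e != target.text]
    if not candidates:
        return reference_summary
    replacement = random.choice(candidates)
    return (reference_summary[:target.start_char] + replacement
            + reference_summary[target.end_char:])
```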

49. Example-Driven Intent Prediction with Observers [PDF] 返回目录
  Shikib Mehri, Mihail Eric, Dilek Hakkani-Tur
Abstract: A key challenge of dialog systems research is to effectively and efficiently adapt to new domains. A scalable paradigm for adaptation necessitates the development of generalizable models that perform well in few-shot settings. In this paper, we focus on the intent classification problem, which aims to identify user intents given utterances addressed to the dialog system. We propose two approaches for improving the generalizability of utterance classification models: (1) example-driven training and (2) observers. Example-driven training learns to classify utterances by comparing to examples, thereby using the underlying encoder as a sentence similarity model. Prior work has shown that BERT-like models tend to attribute a significant amount of attention to the [CLS] token, which we hypothesize results in diluted representations. Observers are tokens that are not attended to, and are an alternative to the [CLS] token. The proposed methods attain state-of-the-art results on three intent prediction datasets (Banking, Clinc, and HWU) in both the full data and few-shot (10 examples per intent) settings. Furthermore, we demonstrate that the proposed approach can transfer to new intents and across datasets without any additional training.
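Example-driven training effectively turns the encoder into a sentence-similarity model: an utterance receives the intent label of its most similar training example. A minimal sketch, assuming sentence vectors from any encoder are already available; the names are hypothetical.

```python
# Illustrative only: nearest-example intent prediction over encoder outputs.
import torch

def predict_intent(utterance_vec: torch.Tensor,
                   example_vecs: torch.Tensor,
                   example_labels: list) -> str:
    # utterance_vec: (dim,), example_vecs: (num_examples, dim)
    sims = torch.cosine_similarity(utterance_vec.unsqueeze(0), example_vecs)
    return example_labels[int(sims.argmax())]
```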

50. CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding [PDF] 返回目录
  Yanru Qu, Dinghan Shen, Yelong Shen, Sandra Sajeev, Jiawei Han, Weizhu Chen
Abstract: Data augmentation has been demonstrated as an effective strategy for improving model generalization and data efficiency. However, due to the discrete nature of natural language, designing label-preserving transformations for text data tends to be more challenging. In this paper, we propose a novel data augmentation framework dubbed CoDA, which synthesizes diverse and informative augmented examples by integrating multiple transformations organically. Moreover, a contrastive regularization objective is introduced to capture the global relationship among all the data samples. A momentum encoder along with a memory bank is further leveraged to better estimate the contrastive loss. To verify the effectiveness of the proposed framework, we apply CoDA to Transformer-based models on a wide range of natural language understanding tasks. On the GLUE benchmark, CoDA gives rise to an average improvement of 2.2% while applied to the RoBERTa-large model. More importantly, it consistently exhibits stronger results relative to several competitive data augmentation and adversarial training base-lines (including the low-resource settings). Extensive experiments show that the proposed contrastive objective can be flexibly combined with various data augmentation approaches to further boost their performance, highlighting the wide applicability of the CoDA framework.
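As a rough illustration of the contrastive objective described above, the sketch below is a generic InfoNCE-style loss between clean and augmented representations; CoDA's momentum encoder and memory bank are omitted, so treat it as a simplified stand-in rather than the paper's exact formulation.

```python
# Generic InfoNCE-style contrastive loss between clean and augmented views.
import torch
import torch.nn.functional as F

def contrastive_loss(clean: torch.Tensor, augmented: torch.Tensor, tau: float = 0.1):
    # clean, augmented: (batch, dim); row i of each view is the same example.
    clean = F.normalize(clean, dim=-1)
    augmented = F.normalize(augmented, dim=-1)
    logits = clean @ augmented.t() / tau          # (batch, batch) similarity matrix
    targets = torch.arange(clean.size(0))         # positives lie on the diagonal
    return F.cross_entropy(logits, targets)
```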

51. Cross-Lingual Relation Extraction with Transformers [PDF] 返回目录
  Jian Ni, Taesun Moon, Parul Awasthy, Radu Florian
Abstract: Relation extraction (RE) is one of the most important tasks in information extraction, as it provides essential information for many NLP applications. In this paper, we propose a cross-lingual RE approach that does not require any human annotation in a target language or any cross-lingual resources. Building upon unsupervised cross-lingual representation learning frameworks, we develop several deep Transformer based RE models with a novel encoding scheme that can effectively encode both entity location and entity type information. Our RE models, when trained with English data, outperform several deep neural network based English RE models. More importantly, our models can be applied to perform zero-shot cross-lingual RE, achieving the state-of-the-art cross-lingual RE performance on two datasets (68-89% of the accuracy of the supervised target-language RE model). The high cross-lingual transfer efficiency without requiring additional training data or cross-lingual resources shows that our RE models are especially useful for low-resource languages.

52. Multimodal Speech Recognition with Unstructured Audio Masking [PDF] 返回目录
  Tejas Srinivasan, Ramon Sanabria, Florian Metze, Desmond Elliott
Abstract: Visual context has been shown to be useful for automatic speech recognition (ASR) systems when the speech signal is noisy or corrupted. Previous work, however, has only demonstrated the utility of visual context in an unrealistic setting, where a fixed set of words are systematically masked in the audio. In this paper, we simulate a more realistic masking scenario during model training, called RandWordMask, where the masking can occur for any word segment. Our experiments on the Flickr 8K Audio Captions Corpus show that multimodal ASR can generalize to recover different types of masked words in this unstructured masking setting. Moreover, our analysis shows that our models are capable of attending to the visual signal when the audio signal is corrupted. These results show that multimodal ASR systems can leverage the visual signal in more generalized noisy scenarios.
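A rough take on the RandWordMask idea, under the assumption that word-level time alignments are available: during training, the audio samples of randomly selected word segments are silenced. The alignment format, masking value, and probability are assumptions, not the paper's recipe.

```python
# Illustrative only: silence randomly chosen word spans in a waveform.
import random
import numpy as np

def rand_word_mask(audio: np.ndarray, sr: int, word_times, p: float = 0.15):
    """word_times: iterable of (start_sec, end_sec) spans, one per word."""
    masked = audio.copy()
    for start_s, end_s in word_times:
        if random.random() < p:
            masked[int(start_s * sr):int(end_s * sr)] = 0.0
    return masked
```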

53. Substance over Style: Document-Level Targeted Content Transfer [PDF] 返回目录
  Allison Hegel, Sudha Rao, Asli Celikyilmaz, Bill Dolan
Abstract: Existing language models excel at writing from scratch, but many real-world scenarios require rewriting an existing document to fit a set of constraints. Although sentence-level rewriting has been fairly well-studied, little work has addressed the challenge of rewriting an entire document coherently. In this work, we introduce the task of document-level targeted content transfer and address it in the recipe domain, with a recipe as the document and a dietary restriction (such as vegan or dairy-free) as the targeted constraint. We propose a novel model for this task based on the generative pre-trained language model (GPT-2) and train on a large number of roughly-aligned recipe pairs (this https URL). Both automatic and human evaluations show that our model out-performs existing methods by generating coherent and diverse rewrites that obey the constraint while remaining close to the original document. Finally, we analyze our model's rewrites to assess progress toward the goal of making language generation more attuned to constraints that are substantive rather than stylistic.
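A hedged sketch of how the recipe-rewriting setup could look at inference time with the Hugging Face transformers library: a dietary-restriction control prefix is prepended to the source recipe and a GPT-2 model decodes a rewrite with nucleus sampling. The prompt format and control token are invented for illustration; in practice a model fine-tuned on the roughly-aligned recipe pairs mentioned above would be loaded instead of plain gpt2.

```python
# Illustrative inference sketch; the control-prefix format is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = ("<constraint: vegan>\n"
          "Original recipe: 1 cup milk, 2 eggs, 1 cup flour. Whisk and fry.\n"
          "Rewritten recipe:")
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=80, do_sample=True, top_p=0.9)
print(tok.decode(out[0], skip_special_tokens=True))
```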

54. Linguistically-Informed Transformations (LIT): A Method forAutomatically Generating Contrast Sets [PDF] 返回目录
  Chuanrong Li, Lin Shengshuo, Leo Z. Liu, Xinyi Wu, Xuhui Zhou, Shane Steinert-Threlkeld
Abstract: Although large-scale pretrained language models, such as BERT and RoBERTa, have achieved superhuman performance on in-distribution test sets, their performance suffers on out-of-distribution test sets (e.g., on contrast sets). Building contrast sets often requires human-expert annotation, which is expensive and hard to create on a large scale. In this work, we propose a Linguistically-Informed Transformation (LIT) method to automatically generate contrast sets, which enables practitioners to explore linguistic phenomena of interest as well as compose different phenomena. Experimenting with our method on SNLI and MNLI shows that current pretrained language models, although claimed to contain sufficient linguistic knowledge, struggle on our automatically generated contrast sets. Furthermore, we improve models' performance on the contrast sets by applying LIT to augment the training data, without affecting performance on the original data.

55. Generating Fact Checking Summaries for Web Claims [PDF] 返回目录
  Rahul Mishra, Dhruv Gupta, Markus Leippold
Abstract: We present SUMO, a neural attention-based approach that learns to establish the correctness of textual claims based on evidence in the form of text documents (e.g., news articles or Web documents). SUMO further generates an extractive summary by presenting a diversified set of sentences from the documents that explain its decision on the correctness of the textual claim. Prior approaches to address the problem of fact checking and evidence extraction have relied on simple concatenation of claim and document word embeddings as an input to claim driven attention weight computation. This is done so as to extract salient words and sentences from the documents that help establish the correctness of the claim. However, this design of claim-driven attention does not capture the contextual information in documents properly. We improve on the prior art by using improved claim and title guided hierarchical attention to model effective contextual cues. We show the efficacy of our approach on datasets concerning political, healthcare, and environmental issues.

56. Reflective Decoding: Unsupervised Paraphrasing and Abductive Reasoning [PDF] 返回目录
  Peter West, Ximing Lu, Ari Holtzman, Chandra Bhagavatula, Jena Hwang, Yejin Choi
Abstract: Pretrained Language Models (LMs) generate text with remarkable quality, novelty,and coherence. Yet applying LMs to the problems of paraphrasing and infilling currently requires direct supervision, since these tasks break the left-to-right generation setup of pretrained LMs. We present Reflective Decoding, a novel unsupervised approach to apply the capabilities of pretrained LMs to non-sequential tasks. Our approach is general and applicable to two distant tasks - paraphrasing and abductive reasoning. It requires no supervision or parallel corpora, only two pretrained language models: forward and backward. Reflective Decoding operates in two intuitive steps. In the contextualization step, we use LMs to generate many left and right contexts which collectively capture the meaning of the input sentence. Then, in the reflection step we decode in the semantic neighborhood of the input, conditioning on an ensemble of generated contexts with the reverse direction LM. We reflect through the generated contexts, effectively using them as an intermediate meaning representation to generate conditional output. Empirical results demonstrate that Reflective Decoding outperforms strong unsupervised baselines on both paraphrasing and abductive text infilling, significantly narrowing the gap between unsupervised and supervised methods.Reflective Decoding introduces the concept of using generated contexts to represent meaning, opening up new possibilities for unsupervised conditional text generation.

57. Parameter Norm Growth During Training of Transformers [PDF] 返回目录
  William Merrill, Vivek Ramanujan, Yoav Goldberg, Roy Schwartz, Noah Smith
Abstract: The capacity of neural networks like the widely adopted transformer is known to be very high. Evidence is emerging that they learn successfully due to inductive bias in the training routine, typically some variant of gradient descent (GD). To better understand this bias, we study the tendency of transformer parameters to grow in magnitude during training. We find, both theoretically and empirically, that, in certain contexts, GD increases the parameter $L_2$ norm up to a threshold that itself increases with training-set accuracy. This means increasing training accuracy over time enables the norm to increase. Empirically, we show that the norm grows continuously over pretraining for T5 (Raffel et al., 2019). We show that pretrained T5 approximates a semi-discretized network with saturated activation functions. Such "saturated" networks are known to have a reduced capacity compared to the original network family that can be described in automata-theoretic terms. This suggests saturation is a new characterization of an inductive bias implicit in GD that is of particular interest for NLP. While our experiments focus on transformers, our theoretical analysis extends to other architectures with similar formal properties, such as feedforward ReLU networks.
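The quantity whose growth is studied above is simply the global L2 norm of all parameters, which can be logged after each optimizer step. A minimal PyTorch helper; the training loop around it is a placeholder.

```python
# Log the global parameter L2 norm of any PyTorch model during training.
import torch

def parameter_l2_norm(model: torch.nn.Module) -> float:
    return torch.sqrt(sum(p.detach().pow(2).sum() for p in model.parameters())).item()

# inside a training loop:
# optimizer.step()
# norms.append(parameter_l2_norm(model))   # plot against training-set accuracy
```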

58. End-to-End Text-to-Speech using Latent Duration based on VQ-VAE [PDF] 返回目录
  Yusuke Yasuda, Xin Wang, Junichi Yamagishi
Abstract: Explicit duration modeling is a key to achieving robust and efficient alignment in text-to-speech synthesis (TTS). We propose a new TTS framework using explicit duration modeling that incorporates duration as a discrete latent variable to TTS and enables joint optimization of whole modules from scratch. We formulate our method based on conditional VQ-VAE to handle discrete duration in a variational autoencoder and provide a theoretical explanation to justify our method. In our framework, a connectionist temporal classification-based force aligner acts as the approximate posterior, and text-to-duration works as the prior in the variational autoencoder. We evaluated our proposed method with a listening test and compared it with other TTS methods based on soft-attention or explicit duration modeling. The results show that our systems are rated between soft-attention-based methods (Transformer-TTS, Tacotron2) and explicit duration modeling-based methods (Fastspeech).

59. Emerging Trends of Multimodal Research in Vision and Language [PDF] 返回目录
  Shagun Uppal, Sarthak Bhagat, Devamanyu Hazarika, Navonil Majumdar, Soujanya Poria, Roger Zimmermann, Amir Zadeh
Abstract: Deep Learning and its applications have cascaded impactful research and development with a diverse range of modalities present in the real-world data. More recently, this has enhanced research interests in the intersection of the Vision and Language arena with its numerous applications and fast-paced growth. In this paper, we present a detailed overview of the latest trends in research pertaining to visual and language modalities. We look at its applications in their task formulations and how to solve various problems related to semantic perception and content generation. We also address task-specific trends, along with their evaluation strategies and upcoming challenges. Moreover, we shed some light on multi-disciplinary patterns and insights that have emerged in the recent past, directing this field towards more modular and transparent intelligent systems. This survey identifies key trends gravitating recent literature in VisLang research and attempts to unearth directions that the field is heading towards.

60. Image Captioning with Visual Object Representations Grounded in the Textual Modality [PDF] 返回目录
  Dušan Variš, Katsuhito Sudoh, Satoshi Nakamura
Abstract: We present our work in progress exploring the possibilities of a shared embedding space between textual and visual modality. Leveraging the textual nature of object detection labels and the hypothetical expressiveness of extracted visual object representations, we propose an approach opposite to the current trend: grounding the representations in the word embedding space of the captioning system instead of grounding words or sentences in their associated images. Based on the previous work, we apply additional grounding losses to the image captioning training objective, aiming to force visual object representations to create more heterogeneous clusters based on their class label and to copy the semantic structure of the word embedding space. In addition, we provide an analysis of the learned object vector space projection and its impact on the IC system performance. With only a slight change in performance, grounded models reach the stopping criterion during training faster than the unconstrained model, needing about two to three times fewer training updates. Additionally, an improvement in structural correlation between the word embeddings and both original and projected object vectors suggests that the grounding is actually mutual.

61. Reduce and Reconstruct: Improving Low-resource End-to-end ASR Via Reconstruction Using Reduced Vocabularies [PDF] 返回目录
  Anuj Diwan, Preethi Jyothi
Abstract: End-to-end automatic speech recognition (ASR) systems are increasingly being favoured due to their direct treatment of the problem of speech to text conversion. However, these systems are known to be data hungry and hence underperform in low-resource settings. In this work, we propose a seemingly simple but effective technique to improve low-resource end-to-end ASR performance. We compress the output vocabulary of the end-to-end ASR system using linguistically meaningful reductions and then reconstruct the original vocabulary using a standalone module. Our objective is two-fold: to lessen the burden on the low-resource end-to-end ASR system by reducing the output vocabulary space and to design a powerful reconstruction module that recovers sequences in the original vocabulary from sequences in the reduced vocabulary. We present two reconstruction modules, an encoder decoder-based architecture and a finite state transducer-based model. We demonstrate the efficacy of our proposed techniques using ASR systems for two Indian languages, Gujarati and Telugu.

62. Construction and Application of Teaching System Based on Crowdsourcing Knowledge Graph [PDF] 返回目录
  Jinta Weng, Ying Gao, Jing Qiu, Guozhu Ding, Huanqin Zheng
Abstract: We study methods for generating a knowledge graph and its applications through the combination of a crowdsourced knowledge graph and a teaching system. Two crowdsourcing approaches, crowdsourced task distribution and reverse captcha generation, are used to construct a knowledge graph in the domain of the teaching system. A complete hierarchical knowledge graph of the teaching domain is generated from nodes for school, student, teacher, course, knowledge point, and exercise type. A knowledge graph constructed in a crowdsourcing manner requires many users to participate collaboratively, with full consideration of teachers' guidance and of user mobilization issues. Based on three subgraphs of the knowledge graph, prominent teachers, students' learning situations, and suitable learning routes can be visualized. A personalized exercise recommendation model formulates personalized exercises with an algorithm based on the knowledge graph, and a collaborative creation model is developed to realize the crowdsourcing construction mechanism. Despite unfamiliarity with the knowledge-graph learning mode and learners' limited attention to the knowledge structure, a system based on a crowdsourced knowledge graph can still achieve high acceptance among students and teachers.
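To make the graph structure concrete, here is a toy version of the hierarchical teaching-domain graph built from the node types named in the abstract; all names and edges are invented, and the networkx package is assumed to be available.

```python
# Toy teaching-domain knowledge graph: school -> teacher -> course ->
# knowledge point -> exercise type, plus a student enrollment edge.
import networkx as nx

kg = nx.DiGraph()
kg.add_edge("Sunshine Middle School", "Ms. Gao", relation="employs")
kg.add_edge("Ms. Gao", "Algebra I", relation="teaches")
kg.add_edge("Algebra I", "linear equations", relation="covers")
kg.add_edge("linear equations", "word problems", relation="exercise_type")
kg.add_edge("student_042", "Algebra I", relation="enrolled_in")

# e.g. walk from a student to exercise types via course knowledge points
for course in kg.successors("student_042"):
    for knowledge_point in kg.successors(course):
        print(course, "->", knowledge_point, "->", list(kg.successors(knowledge_point)))
```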

63. Studying the Similarity of COVID-19 Sounds based on Correlation Analysis of MFCC [PDF] 返回目录
  Mohamed Bader, Ismail Shahin, Abdelfatah Hassan
Abstract: Recently, formidable work has been put in by people working on the frontlines, such as in hospitals, clinics, and labs, alongside researchers and scientists who are also putting tremendous effort into the fight against the COVID-19 pandemic. Due to the enormous spread of the virus, the integration of artificial intelligence has taken a considerable role in the health sector, by implementing the fundamentals of Automatic Speech Recognition (ASR) and deep learning algorithms. In this paper, we illustrate the importance of speech signal processing in the extraction of the Mel-Frequency Cepstral Coefficients (MFCCs) of COVID-19 and non-COVID-19 samples and find their relationship using Pearson correlation coefficients. Our results show high similarity in MFCCs between different COVID-19 cough and breathing sounds, while the MFCC of voice is more robust between COVID-19 and non-COVID-19 samples. Moreover, our results are preliminary, and there is a possibility to exclude the voices of COVID-19 patients from further processing in diagnosing the disease.
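The analysis described above reduces to extracting MFCCs per recording and correlating the averaged coefficient vectors. A minimal sketch, assuming librosa and scipy are installed; the file names are placeholders.

```python
# Illustrative only: MFCC extraction and Pearson correlation of two recordings.
import librosa
import numpy as np
from scipy.stats import pearsonr

def mean_mfcc(path: str, n_mfcc: int = 13) -> np.ndarray:
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    return mfcc.mean(axis=1)                                 # one vector per recording

r, p = pearsonr(mean_mfcc("covid_cough.wav"), mean_mfcc("non_covid_cough.wav"))
print(f"Pearson r = {r:.3f} (p = {p:.3g})")
```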

64. Answer-checking in Context: A Multi-modal FullyAttention Network for Visual Question Answering [PDF] 返回目录
  Hantao Huang, Tao Han, Wei Han, Deep Yap, Cheng-Ming Chiang
Abstract: Visual Question Answering (VQA) is challenging due to the complex cross-modal relations. It has received extensive attention from the research community. From the human perspective, to answer a visual question, one needs to read the question and then refer to the image to generate an answer. This answer is then checked against the question and image again for final confirmation. In this paper, we mimic this process and propose a fully attention-based VQA architecture. Moreover, an answer-checking module is proposed to perform a unified attention over the joint answer, question and image representation to update the answer. This mimics the human answer-checking process of considering the answer in context. With answer-checking modules and transferred BERT layers, our model achieves state-of-the-art accuracy of 71.57% using fewer parameters on the VQA-v2.0 test-standard split.
