Contents
5. Few-shot Slot Tagging with Collapsed Dependency Transfer and Label-enhanced Task-adaptive Projection Network [PDF] Abstract
9. Adversarial Training Based Multi-Source Unsupervised Domain Adaptation for Sentiment Analysis [PDF] Abstract
13. Examination and Extension of Strategies for Improving Personalized Language Modeling via Interpolation [PDF] Abstract
14. Combination of abstractive and extractive approaches for summarization of long scientific texts [PDF] Abstract
15. Extensive Error Analysis and a Learning-Based Evaluation of Medical Entity Recognition Systems to Approximate User Experience [PDF] Abstract
18. Learning to Recover from Multi-Modality Errors for Non-Autoregressive Neural Machine Translation [PDF] Abstract
26. Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation [PDF] Abstract
29. Learning not to Discriminate: Task Agnostic Learning for Improving Monolingual and Code-switched Speech Recognition [PDF] Abstract
32. Hand-crafted Attention is All You Need? A Study of Attention on Self-supervised Audio Transformer [PDF] Abstract
33. On the Effectiveness of Neural Text Generation based Data Augmentation for Recognition of Morphologically Rich Speech [PDF] Abstract
Abstracts
1. Revisiting Few-sample BERT Fine-tuning [PDF] Back to contents
Tianyi Zhang, Felix Wu, Arzoo Katiyar, Kilian Q. Weinberger, Yoav Artzi
Abstract: We study the problem of few-sample fine-tuning of BERT contextual representations, and identify three sub-optimal choices in current, broadly adopted practices. First, we observe that the omission of the gradient bias correction in the BERTAdam optimizer results in fine-tuning instability. We also find that parts of the BERT network provide a detrimental starting point for fine-tuning, and simply re-initializing these layers speeds up learning and improves performance. Finally, we study the effect of training time, and observe that commonly used recipes often do not allocate sufficient time for training. In light of these findings, we re-visit recently proposed methods to improve few-sample fine-tuning with BERT and re-evaluate their effectiveness. Generally, we observe a decrease in their relative impact when modifying the fine-tuning process based on our findings.
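As an illustration of the first finding, the sketch below (plain Python, not the authors' code) contrasts a standard debiased Adam update with a BERTAdam-style variant that omits bias correction; the hyperparameters and variable names are illustrative only.

```python
def adam_update(grad, m, v, t, lr=2e-5, beta1=0.9, beta2=0.999, eps=1e-6,
                bias_correction=True):
    """One Adam step for a single scalar parameter; returns (delta, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    if bias_correction:
        m_hat = m / (1 - beta1 ** t)            # debiasing matters most in the first steps
        v_hat = v / (1 - beta2 ** t)
    else:                                       # BERTAdam-style: raw moments are used directly
        m_hat, v_hat = m, v
    return lr * m_hat / (v_hat ** 0.5 + eps), m, v
```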
2. ClarQ: A large-scale and diverse dataset for Clarification Question Generation [PDF] Back to contents
Vaibhav Kumar, Alan W. Black
Abstract: Question answering and conversational systems are often baffled and need help clarifying certain ambiguities. However, limitations of existing datasets hinder the development of large-scale models capable of generating and utilising clarification questions. In order to overcome these limitations, we devise a novel bootstrapping framework (based on self-supervision) that assists in the creation of a diverse, large-scale dataset of clarification questions based on post-comment tuples extracted from stackexchange. The framework utilises a neural network based architecture for classifying clarification questions. It is a two-step method where the first aims to increase the precision of the classifier and second aims to increase its recall. We quantitatively demonstrate the utility of the newly created dataset by applying it to the downstream task of question-answering. The final dataset, ClarQ, consists of ~2M examples distributed across 173 domains of stackexchange. We release this dataset in order to foster research into the field of clarification question generation with the larger goal of enhancing dialog and question answering systems.
3. Gender in Danger? Evaluating Speech Translation Technology on the MuST-SHE Corpus [PDF] Back to contents
Luisa Bentivogli, Beatrice Savoldi, Matteo Negri, Mattia Antonino Di Gangi, Roldano Cattoni, Marco Turchi
Abstract: Translating from languages without productive grammatical gender like English into gender-marked languages is a well-known difficulty for machines. This difficulty is also due to the fact that the training data on which models are built typically reflect the asymmetries of natural languages, gender bias included. Exclusively fed with textual data, machine translation is intrinsically constrained by the fact that the input sentence does not always contain clues about the gender identity of the referred human entities. But what happens with speech translation, where the input is an audio signal? Can audio provide additional information to reduce gender bias? We present the first thorough investigation of gender bias in speech translation, contributing with: i) the release of a benchmark useful for future studies, and ii) the comparison of different technologies (cascade and end-to-end) on two language directions (English-Italian/French).
4. MC-BERT: Efficient Language Pre-Training via a Meta Controller [PDF] Back to contents
Zhenhui Xu, Linyuan Gong, Guolin Ke, Di He, Shuxin Zheng, Liwei Wang, Jiang Bian, Tie-Yan Liu
Abstract: Pre-trained contextual representations (e.g., BERT) have become the foundation to achieve state-of-the-art results on many NLP tasks. However, large-scale pre-training is computationally expensive. ELECTRA, an early attempt to accelerate pre-training, trains a discriminative model that predicts whether each input token was replaced by a generator. Our studies reveal that ELECTRA's success is mainly due to its reduced complexity of the pre-training task: the binary classification (replaced token detection) is more efficient to learn than the generation task (masked language modeling). However, such a simplified task is less semantically informative. To achieve better efficiency and effectiveness, we propose a novel meta-learning framework, MC-BERT. The pre-training task is a multi-choice cloze test with a reject option, where a meta controller network provides training input and candidates. Results over GLUE natural language understanding benchmark demonstrate that our proposed method is both efficient and effective: it outperforms baselines on GLUE semantic tasks given the same computational budget.
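For context on the objectives being compared, here is a toy sketch (illustrative only, not the authors' code) of how ELECTRA-style replaced-token-detection labels are derived; this is the binary task the abstract contrasts with masked language modeling, and which MC-BERT replaces with a multi-choice cloze test.

```python
def replaced_token_labels(original_ids, corrupted_ids):
    """Binary labels for replaced-token detection: 1 where the generator swapped the token."""
    return [int(o != c) for o, c in zip(original_ids, corrupted_ids)]

# e.g. original [12, 7, 99, 4] and generator output [12, 31, 99, 4] -> labels [0, 1, 0, 0]
```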
5. Few-shot Slot Tagging with Collapsed Dependency Transfer and Label-enhanced Task-adaptive Projection Network [PDF] Back to contents
Yutai Hou, Wanxiang Che, Yongkui Lai, Zhihan Zhou, Yijia Liu, Han Liu, Ting Liu
Abstract: In this paper, we explore the slot tagging with only a few labeled support sentences (a.k.a. few-shot). Few-shot slot tagging faces a unique challenge compared to the other few-shot classification problems as it calls for modeling the dependencies between labels. But it is hard to apply previously learned label dependencies to an unseen domain, due to the discrepancy of label sets. To tackle this, we introduce a collapsed dependency transfer mechanism into the conditional random field (CRF) to transfer abstract label dependency patterns as transition scores. In the few-shot setting, the emission score of CRF can be calculated as a word's similarity to the representation of each label. To calculate such similarity, we propose a Label-enhanced Task-Adaptive Projection Network (L-TapNet) based on the state-of-the-art few-shot classification model -- TapNet, by leveraging label name semantics in representing labels. Experimental results show that our model significantly outperforms the strongest few-shot learning baseline by 14.64 F1 scores in the one-shot setting.
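A minimal NumPy sketch of the emission-score idea described above (each word is scored by its similarity to each label's representation); the shapes and cosine normalization are assumptions for illustration, not the authors' L-TapNet implementation.

```python
import numpy as np

def emission_scores(token_reps, label_reps):
    """token_reps: (seq_len, d); label_reps: (num_labels, d) -> (seq_len, num_labels)."""
    tok = token_reps / np.linalg.norm(token_reps, axis=-1, keepdims=True)
    lab = label_reps / np.linalg.norm(label_reps, axis=-1, keepdims=True)
    return tok @ lab.T  # similarity of each word to each label representation
```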
6. Position Masking for Language Models [PDF] Back to contents
Andy Wagner, Tiyasa Mitra, Mrinal Iyer, Godfrey Da Costa, Marc Tremblay
Abstract: Masked language modeling (MLM) pre-training models such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens. This is an effective technique which has led to good results on all NLP benchmarks. We propose to expand upon this idea by masking the positions of some tokens along with the masked input token ids. We follow the same standard approach as BERT, masking a percentage of the token positions and then predicting their original values using an additional fully connected classifier stage. This approach has shown good performance gains (0.3% improvement) on SQuAD, with additional improvement in convergence times. For the Graphcore IPU, the convergence of BERT Base with position masking requires only 50% of the tokens from the original BERT paper.
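A toy sketch of the masking step described above (plain Python; the reserved mask values and masking rate are assumptions): the same randomly chosen tokens have both their token id and their position id masked, and both values become prediction targets.

```python
import random

def mask_tokens_and_positions(token_ids, mask_token_id, mask_position_id, p=0.15, seed=0):
    rng = random.Random(seed)
    masked_tokens, masked_positions, targets = [], [], []
    for pos, tok in enumerate(token_ids):
        if rng.random() < p:
            masked_tokens.append(mask_token_id)
            masked_positions.append(mask_position_id)
            targets.append((tok, pos))   # values the token head and position head must predict
        else:
            masked_tokens.append(tok)
            masked_positions.append(pos)
    return masked_tokens, masked_positions, targets
```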
7. Data Augmentation for Training Dialog Models Robust to Speech Recognition Errors [PDF] Back to contents
Longshaokan Wang, Maryam Fazel-Zarandi, Aditya Tiwari, Spyros Matsoukas, Lazaros Polymenakos
Abstract: Speech-based virtual assistants, such as Amazon Alexa, Google assistant, and Apple Siri, typically convert users' audio signals to text data through automatic speech recognition (ASR) and feed the text to downstream dialog models for natural language understanding and response generation. The ASR output is error-prone; however, the downstream dialog models are often trained on error-free text data, making them sensitive to ASR errors during inference time. To bridge the gap and make dialog models more robust to ASR errors, we leverage an ASR error simulator to inject noise into the error-free text data, and subsequently train the dialog models with the augmented data. Compared to other approaches for handling ASR errors, such as using ASR lattice or end-to-end methods, our data augmentation approach does not require any modification to the ASR or downstream dialog models; our approach also does not introduce any additional latency during inference time. We perform extensive experiments on benchmark data and show that our approach improves the performance of downstream dialog models in the presence of ASR errors, and it is particularly effective in the low-resource situations where there are constraints on model size or the training data is scarce.
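A toy stand-in for the ASR error simulator described above (the paper's simulator models real ASR error patterns; the error types and the optional confusion table below are illustrative assumptions):

```python
import random

def inject_asr_noise(text, error_rate=0.1, confusions=None, seed=0):
    """Randomly delete, substitute, or duplicate words to mimic ASR-style noise."""
    rng = random.Random(seed)
    confusions = confusions or {}            # e.g. {"their": "there"}; hypothetical lookup table
    noisy = []
    for word in text.split():
        r = rng.random()
        if r < error_rate / 3:
            continue                          # deletion error
        elif r < 2 * error_rate / 3:
            noisy.append(confusions.get(word, word))  # substitution, if a confusion is known
        elif r < error_rate:
            noisy.extend([word, word])        # insertion error via duplication
        else:
            noisy.append(word)
    return " ".join(noisy)
```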
8. Understanding Points of Correspondence between Sentences for Abstractive Summarization [PDF] Back to contents
Logan Lebanoff, John Muchovej, Franck Dernoncourt, Doo Soon Kim, Lidan Wang, Walter Chang, Fei Liu
Abstract: Fusing sentences containing disparate content is a remarkable human ability that helps create informative and succinct summaries. Such a simple task for humans has remained challenging for modern abstractive summarizers, substantially restricting their applicability in real-world scenarios. In this paper, we present an investigation into fusing sentences drawn from a document by introducing the notion of points of correspondence, which are cohesive devices that tie any two sentences together into a coherent text. The types of points of correspondence are delineated by text cohesion theory, covering pronominal and nominal referencing, repetition and beyond. We create a dataset containing the documents, source and fusion sentences, and human annotations of points of correspondence between sentences. Our dataset bridges the gap between coreference resolution and summarization. It is publicly shared to serve as a basis for future work to measure the success of sentence fusion systems. (this https URL)
9. Adversarial Training Based Multi-Source Unsupervised Domain Adaptation for Sentiment Analysis [PDF] Back to contents
Yong Dai, Jian Liu, Xiancong Ren, Zenglin Xu
Abstract: Multi-source unsupervised domain adaptation (MS-UDA) for sentiment analysis (SA) aims to leverage useful information in multiple source domains to help do SA in an unlabeled target domain that has no supervised information. Existing algorithms of MS-UDA either only exploit the shared features, i.e., the domain-invariant information, or based on some weak assumption in NLP, e.g., smoothness assumption. To avoid these problems, we propose two transfer learning frameworks based on the multi-source domain adaptation methodology for SA by combining the source hypotheses to derive a good target hypothesis. The key feature of the first framework is a novel Weighting Scheme based Unsupervised Domain Adaptation framework (WS-UDA), which combine the source classifiers to acquire pseudo labels for target instances directly. While the second framework is a Two-Stage Training based Unsupervised Domain Adaptation framework (2ST-UDA), which further exploits these pseudo labels to train a target private extractor. Importantly, the weights assigned to each source classifier are based on the relations between target instances and source domains, which measured by a discriminator through the adversarial training. Furthermore, through the same discriminator, we also fulfill the separation of shared features and private features. Experimental results on two SA datasets demonstrate the promising performance of our frameworks, which outperforms unsupervised state-of-the-art competitors.
10. Predicting and Analyzing Law-Making in Kenya [PDF] Back to contents
Oyinlola Babafemi, Adewale Akinfaderin
Abstract: Modelling and analyzing parliamentary legislation, roll-call votes and order of proceedings in developed countries has received significant attention in recent years. In this paper, we focused on understanding the bills introduced in a developing democracy, the Kenyan bicameral parliament. We developed and trained machine learning models on a combination of features extracted from the bills to predict the outcome - if a bill will be enacted or not. We observed that the texts in a bill are not as relevant as the year and month the bill was introduced and the category the bill belongs to.
11. Modeling Label Semantics for Predicting Emotional Reactions [PDF] Back to contents
Radhika Gaonkar, Heeyoung Kwon, Mohaddeseh Bastan, Niranjan Balasubramanian, Nathanael Chambers
Abstract: Predicting how events induce emotions in the characters of a story is typically seen as a standard multi-label classification task, which usually treats labels as anonymous classes to predict. They ignore information that may be conveyed by the emotion labels themselves. We propose that the semantics of emotion labels can guide a model's attention when representing the input story. Further, we observe that the emotions evoked by an event are often related: an event that evokes joy is unlikely to also evoke sadness. In this work, we explicitly model label classes via label embeddings, and add mechanisms that track label-label correlations both during training and inference. We also introduce a new semi-supervision strategy that regularizes for the correlations on unlabeled data. Our empirical evaluations show that modeling label semantics yields consistent benefits, and we advance the state-of-the-art on an emotion inference task.
12. Unsupervised Paraphrase Generation using Pre-trained Language Models [PDF] Back to contents
Chaitra Hegde, Shrikumar Patil
Abstract: Large scale pre-trained language models have proven to be a very powerful approach in various natural language tasks. OpenAI's GPT-2 (Radford et al., 2019) is notable for its capability to generate fluent, well formulated, grammatically consistent text and for phrase completions. In this paper we leverage this generation capability of GPT-2 to generate paraphrases without any supervision from labelled data. We examine how the results compare with other supervised and unsupervised approaches and the effect of using paraphrases for data augmentation on downstream tasks such as classification. Our experiments show that paraphrases generated with our model are of good quality, are diverse and improve the downstream task performance when used for data augmentation.
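A minimal sketch of this kind of prompted paraphrase generation with the Hugging Face transformers library; the prompt format and decoding settings are assumptions for illustration, not the authors' exact setup.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def paraphrase(sentence, num_candidates=5):
    prompt = f"{sentence} In other words:"   # hypothetical prompt; the paper's scheme may differ
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    outputs = model.generate(
        input_ids,
        do_sample=True,
        top_p=0.9,
        temperature=0.8,
        max_length=input_ids.shape[1] + 40,
        num_return_sequences=num_candidates,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Strip the prompt tokens and keep only the sampled continuations.
    return [tokenizer.decode(out[input_ids.shape[1]:], skip_special_tokens=True)
            for out in outputs]
```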
13. Examination and Extension of Strategies for Improving Personalized Language Modeling via Interpolation [PDF] Back to contents
Liqun Shao, Sahitya Mantravadi, Tom Manzini, Alejandro Buendia, Manon Knoertzer, Soundar Srinivasan, Chris Quirk
Abstract: In this paper, we detail novel strategies for interpolating personalized language models and methods to handle out-of-vocabulary (OOV) tokens to improve personalized language models. Using publicly available data from Reddit, we demonstrate improvements in offline metrics at the user level by interpolating a global LSTM-based authoring model with a user-personalized n-gram model. By optimizing this approach with a back-off to uniform OOV penalty and the interpolation coefficient, we observe that over 80% of users receive a lift in perplexity, with an average of 5.2% in perplexity lift per user. In doing this research we extend previous work in building NLIs and improve the robustness of metrics for downstream tasks.
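The core interpolation with a uniform OOV back-off can be sketched as below (plain Python; the interpolation weight and penalty value are the tunable quantities mentioned in the abstract, and the dictionary interface is an assumption):

```python
def interpolated_prob(token, p_user, p_global, lam=0.3, oov_penalty=1e-6):
    """p_user / p_global map token -> probability under the personalized n-gram model
    and the global LSTM model; unknown tokens back off to a uniform OOV penalty."""
    pu = p_user.get(token, oov_penalty)
    pg = p_global.get(token, oov_penalty)
    return lam * pu + (1 - lam) * pg
```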
14. Combination of abstractive and extractive approaches for summarization of long scientific texts [PDF] Back to contents
Vladislav Tretyak
Abstract: In this research work, we present a method to generate summaries of long scientific documents that uses the advantages of both extractive and abstractive approaches. Before producing a summary in an abstractive manner, we perform the extractive step, which then is used for conditioning the abstractor module. We used pre-trained transformer-based language models, for both extractor and abstractor. Our experiments showed that using extractive and abstractive models jointly significantly improves summarization results and ROUGE scores.
15. Extensive Error Analysis and a Learning-Based Evaluation of Medical Entity Recognition Systems to Approximate User Experience [PDF] Back to contents
Isar Nejadgholi, Kathleen C. Fraser, Berry De Bruijn
Abstract: When comparing entities extracted by a medical entity recognition system with gold standard annotations over a test set, two types of mismatches might occur, label mismatch or span mismatch. Here we focus on span mismatch and show that its severity can vary from a serious error to a fully acceptable entity extraction due to the subjectivity of span annotations. For a domain-specific BERT-based NER system, we showed that 25% of the errors have the same labels and overlapping span with gold standard entities. We collected expert judgement which shows more than 90% of these mismatches are accepted or partially accepted by the user. Using the training set of the NER system, we built a fast and lightweight entity classifier to approximate the user experience of such mismatches through accepting or rejecting them. The decisions made by this classifier are used to calculate a learning-based F-score which is shown to be a better approximation of a forgiving user's experience than the relaxed F-score. We demonstrated the results of applying the proposed evaluation metric for a variety of deep learning medical entity recognition models trained with two datasets.
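One plausible way to fold the classifier's accept/reject decisions on span mismatches into an F-score is sketched below; this is only an illustration of the idea (accepted mismatches count as true positives, rejected ones as both a false positive and a false negative), not necessarily the paper's exact definition of the learning-based F-score.

```python
def learning_based_f1(exact_tp, span_mismatches, fp, fn, accept_fn):
    """span_mismatches: list of (predicted_entity, gold_entity) pairs with overlapping spans;
    accept_fn: classifier deciding whether a user would accept the predicted span."""
    accepted = sum(1 for pred, gold in span_mismatches if accept_fn(pred, gold))
    rejected = len(span_mismatches) - accepted
    tp = exact_tp + accepted
    precision = tp / (tp + fp + rejected) if (tp + fp + rejected) else 0.0
    recall = tp / (tp + fn + rejected) if (tp + fn + rejected) else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
```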
16. Knowledge-Aided Open-Domain Question Answering [PDF] Back to contents
Mantong Zhou, Zhouxing Shi, Minlie Huang, Xiaoyan Zhu
Abstract: Open-domain question answering (QA) aims to find the answer to a question from a large collection of documents. Though many models for single-document machine comprehension have achieved strong performance, there is still much room for improving open-domain QA systems since document retrieval and answer reranking are still unsatisfactory. Golden documents that contain the correct answers may not be correctly scored by the retrieval component, and the correct answers that have been extracted may be wrongly ranked after other candidate answers by the reranking component. One of the reasons is derived from the independent principle in which each candidate document (or answer) is scored independently without considering its relationship to other documents (or answers). In this work, we propose a knowledge-aided open-domain QA (KAQA) method which targets at improving relevant document retrieval and candidate answer reranking by considering the relationship between a question and the documents (termed as question-document graph), and the relationship between candidate documents (termed as document-document graph). The graphs are built using knowledge triples from external knowledge resources. During document retrieval, a candidate document is scored by considering its relationship to the question and other documents. During answer reranking, a candidate answer is reranked using not only its own context but also the clues from other documents. The experimental results show that our proposed method improves document retrieval and answer reranking, and thereby enhances the overall performance of open-domain question answering.
17. Re-evaluating phoneme frequencies [PDF] Back to contents
Jayden L. Macklin-Cordes, Erich R. Round
Abstract: Causal processes can give rise to distinctive distributions in the linguistic variables that they affect. Consequently, a secure understanding of a variable's distribution can hold a key to understanding the forces that have causally shaped it. A storied distribution in linguistics has been Zipf's law, a kind of power law. In the wake of a major debate in the sciences around power-law hypotheses and the unreliability of earlier methods of evaluating them, here we re-evaluate the distributions claimed to characterize phoneme frequencies. We infer the fit of power laws and three alternative distributions to 168 Australian languages, using a maximum likelihood framework. We find evidence supporting earlier results, but also qualifying and nuancing them. Most notably, phonemic inventories appear to have a Zipfian-like frequency structure among their most-frequent members (though perhaps also a lognormal structure) but a geometric (or exponential) structure among the least-frequent. We highlight implications for causal accounts.
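For the maximum-likelihood step, the standard closed-form estimator for the exponent of a power law (continuous approximation, fixed lower cutoff) looks like the sketch below; the paper's framework additionally compares alternative distributions such as lognormal and exponential and handles the discrete case, which this toy omits.

```python
import math

def power_law_alpha(frequencies, x_min=1.0):
    """MLE exponent estimate: alpha = 1 + n / sum(log(x / x_min)), over x >= x_min."""
    xs = [x for x in frequencies if x >= x_min]
    return 1.0 + len(xs) / sum(math.log(x / x_min) for x in xs)
```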
18. Learning to Recover from Multi-Modality Errors for Non-Autoregressive Neural Machine Translation [PDF] Back to contents
Qiu Ran, Yankai Lin, Peng Li, Jie Zhou
Abstract: Non-autoregressive neural machine translation (NAT) predicts the entire target sequence simultaneously and significantly accelerates inference process. However, NAT discards the dependency information in a sentence, and thus inevitably suffers from the multi-modality problem: the target tokens may be provided by different possible translations, often causing token repetitions or missing. To alleviate this problem, we propose a novel semi-autoregressive model RecoverSAT in this work, which generates a translation as a sequence of segments. The segments are generated simultaneously while each segment is predicted token-by-token. By dynamically determining segment length and deleting repetitive segments, RecoverSAT is capable of recovering from repetitive and missing token errors. Experimental results on three widely-used benchmark datasets show that our proposed model achieves more than 4$\times$ speedup while maintaining comparable performance compared with the corresponding autoregressive model.
19. ConfNet2Seq: Full Length Answer Generation from Spoken Questions [PDF] Back to contents
Vaishali Pal, Manish Shrivastava, Laurent Besacier
Abstract: Conversational and task-oriented dialogue systems aim to interact with the user using natural responses through multi-modal interfaces, such as text or speech. These desired responses are in the form of full-length natural answers generated over facts retrieved from a knowledge source. While the task of generating natural answers to questions from an answer span has been widely studied, there has been little research on natural sentence generation over spoken content. We propose a novel system to generate full length natural language answers from spoken questions and factoid answers. The spoken sequence is compactly represented as a confusion network extracted from a pre-trained Automatic Speech Recognizer. This is the first attempt towards generating full-length natural answers from a graph input (confusion network) to the best of our knowledge. We release a large-scale dataset of 259,788 samples of spoken questions, their factoid answers and corresponding full-length textual answers. Following our proposed approach, we achieve comparable performance with best ASR hypothesis.
20. Human brain activity for machine attention [PDF] Back to contents
Lukas Muttenthaler, Nora Hollenstein, Maria Barrett
Abstract: Cognitively inspired NLP leverages human-derived data to teach machines about language processing mechanisms. Recently, neural networks have been augmented with behavioral data to solve a range of NLP tasks spanning syntax and semantics. We are the first to exploit neuroscientific data, namely electroencephalography (EEG), to inform a neural attention model about language processing of the human brain with direct cognitive measures. Part of the challenge in working with EEG is that features are exceptionally rich and need extensive pre-processing to isolate signals specific to text processing. We devise a method for finding such EEG features to supervise machine attention through combining theoretically motivated cropping with random forest tree splits. This method considerably reduces the number of dimensions of the EEG data. We employ the method on a publicly available EEG corpus and demonstrate that the pre-processed EEG features are capable of distinguishing two reading tasks. We apply these features to regularise attention on relation classification and show that EEG is more informative than strong baselines. This improvement, however, is dependent on both the cognitive load of the task and the EEG frequency domain. Hence, informing neural attention models through EEG signals has benefits but requires further investigation to understand which dimensions are most useful across NLP tasks.
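A sketch of the feature-selection step using random-forest importances in scikit-learn (the paper derives its features from tree splits after theoretically motivated cropping; the shapes, threshold, and number of kept dimensions here are assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def select_eeg_dimensions(X, y, keep=32, seed=0):
    """X: (n_words, n_eeg_features) word-level EEG features; y: labels for the two reading tasks."""
    forest = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, y)
    # Indices of the EEG dimensions the forest found most informative.
    return np.argsort(forest.feature_importances_)[::-1][:keep]
```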
摘要:在认知启发NLP杠杆人得出的数据大约语言处理机制教机。最近,神经网络已扩充与行为数据来解决一系列的NLP任务跨越语法和语义。我们是第一个利用神经科学的数据,即脑电图(EEG),通知神经注意力模型对人脑直接认知措施的语言处理。在脑电图工作面临的挑战之一是,功能异常丰富,需要大量的预先处理隔离信号特定的文本处理。我们制定了找到这样EEG的方法,通过特征理论上种植积极性随机林木分割相结合,督促机关注。此方法显着降低了EEG数据的维数。我们采用在公开的脑电图语料库的方法,并证明预处理脑电图的特点是能够区分两种阅读任务。我们运用这些功能来规范注意力放在关系分类和显示,脑电图比强基线提供更多的信息。这一改进,然而,依赖于这两个任务的认知负荷和EEG频域。因此,通过脑电图信号通知的神经关注车型具有优势,但还需要进一步调查,以了解哪些维度跨越NLP任务最有用的。
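As a rough illustration of the tree-based feature selection step, the sketch below ranks synthetic EEG-derived features by random forest importance and keeps only the top dimensions; the data, feature count and cut-off are placeholder assumptions, not the paper's setup.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 105))      # 200 trials x 105 EEG-derived features (synthetic)
y = rng.integers(0, 2, size=200)     # two reading tasks (labels are synthetic too)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
top_dims = np.argsort(forest.feature_importances_)[::-1][:16]   # keep the 16 most useful dims
X_reduced = X[:, top_dims]           # compact EEG signal that could supervise attention
print(X_reduced.shape)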
21. HausaMT v1.0: Towards English-Hausa Neural Machine Translation [PDF] 返回目录
Adewale Akinfaderin
Abstract: Neural Machine Translation (NMT) for low-resource languages suffers from low performance because of the lack of large amounts of parallel data and language diversity. To contribute to ameliorating this problem, we built a baseline model for English-Hausa machine translation, which is considered a low-resource language task. The Hausa language is the second largest Afro-Asiatic language in the world after Arabic, and it is the third largest language for trading across a larger swath of West African countries, after English and French. In this paper, we curated different datasets containing a Hausa-English parallel corpus for our translation. We trained baseline models and evaluated the performance of our models using the Recurrent and Transformer encoder-decoder architectures with two tokenization approaches: standard word-level tokenization and Byte Pair Encoding (BPE) subword tokenization.
摘要:神经机器翻译(NMT)对于因为缺乏大量的并行数据和语言多样性的低性能低资源语言受到影响。为了有助于缓解这一问题,我们建立了英语,豪萨语机器翻译,这被认为是资源少的语言任务的基准模型。豪萨语言是阿拉伯语之后的世界第二大亚非语系语言,它是西非地区国家的大片大买卖,英语和法语之后的第三大语言。在本文中,我们策划含豪萨 - 英语平行语料库为我们的翻译不同的数据集。我们训练有素的基线模型和评估使用复发和变压器编码器,解码器架构有两个符号化接近我们的模型的性能:标准字级符号化和字节对编码(BPE)子字符号化。
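The difference between the two tokenization approaches can be illustrated with the Hugging Face tokenizers library; the toy phrases and the vocabulary size below are placeholders, not the curated corpus used in the paper.

from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# A handful of common Hausa phrases standing in for the real parallel corpus.
corpus = ["na gode sosai", "sannu da zuwa", "ina kwana", "yaya aiki"]

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(vocab_size=60, special_tokens=["[UNK]", "[PAD]"])
tokenizer.train_from_iterator(corpus, trainer=trainer)

# Word-level tokenization would keep "sosai" whole; BPE may split it into subword units.
print(tokenizer.encode("na gode sosai").tokens)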
22. Universal Vector Neural Machine Translation With Effective Attention [PDF] 返回目录
Satish Mylapore, Ryan Quincy Paul, Joshua Yi, Robert D. Slater
Abstract: Neural Machine Translation (NMT) leverages one or more trained neural networks for the translation of phrases. Sutskever et al. introduced a sequence-to-sequence based encoder-decoder model which became the standard for NMT based systems. Attention mechanisms were later introduced to address the issues with the translation of long sentences and to improve overall accuracy. In this paper, we propose a singular model for Neural Machine Translation based on encoder-decoder models. Most translation models are trained as one model for one translation. We introduce a neutral/universal model representation that can be used to predict more than one language depending on the source and a provided target. Secondly, we introduce an attention model by adding an overall learning vector to the multiplicative model. With these two changes, by using the novel universal model, the number of models needed for multiple language translation applications is reduced.
摘要:神经机器翻译(NMT)杠杆一个或短语的翻译更训练有素的神经网络。 Sutskever引入的序列基于序列编码器 - 解码器模型,其成为用于基于NMT系统的标准。注意机制后来被引入,以解决与长句的翻译,提高整体精度的问题。在本文中,我们提出了基于编码器,解码器模型神经机器翻译的单一模式。大多数翻译模型被训练作为一个翻译一个模型。我们介绍了可用于根据源和一个提供的目标来预测一种以上语言的中性/通用模型表示。其次,我们通过增加一个整体学习向量的乘法模型引入注意模型。有了这两个变化,通过使用新型通用模型所需要的多语言翻译应用程序模型的数量减少。
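The abstract does not give the exact formulation, but one plausible reading of "adding an overall learning vector to the multiplicative model" is sketched below as Luong-style multiplicative attention with an extra shared, learned vector mixed into the query; this module is the editor's assumption, not the paper's implementation.

import torch
import torch.nn as nn

class AugmentedMultiplicativeAttention(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.W = nn.Linear(hidden_size, hidden_size, bias=False)   # multiplicative scoring
        self.global_vec = nn.Parameter(torch.zeros(hidden_size))   # assumed "overall" vector

    def forward(self, decoder_state, encoder_outputs):
        # decoder_state: (batch, hidden); encoder_outputs: (batch, src_len, hidden)
        query = self.W(decoder_state + self.global_vec)             # mix in the shared vector
        scores = torch.bmm(encoder_outputs, query.unsqueeze(2)).squeeze(2)
        weights = torch.softmax(scores, dim=-1)
        context = torch.bmm(weights.unsqueeze(1), encoder_outputs).squeeze(1)
        return context, weights

attn = AugmentedMultiplicativeAttention(8)
context, weights = attn(torch.randn(2, 8), torch.randn(2, 5, 8))
print(context.shape, weights.shape)   # (2, 8) (2, 5)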
23. Embed2Detect: Temporally Clustered Embedded Words for Event Detection in Social Media [PDF] 返回目录
Hansi Hettiarachchi, Mariam Adedoyin-Olowe, Jagdev Bhogal, Mohamed Medhat Gaber
Abstract: Event detection in social media refers to the automatic identification of important information shared on social media platforms at a certain time. Considering the dynamic nature and high volume of data production in data streams, it is impractical to filter the events manually. Therefore, it is important to have an automated mechanism to detect events in order to utilise social media data effectively. Analysing the available literature, most of the existing event detection methods focus only on statistical and syntactical features in data, even though the underlying semantics are also important for effective information retrieval from text, because they describe the connections between words and their meanings. In this paper, we propose a novel method termed Embed2Detect for event detection in social media by combining the characteristics of prediction-based word embeddings and hierarchical agglomerative clustering. The adoption of prediction-based word embeddings incorporates the semantic features of the text, overcoming a major limitation of previous approaches. This method is experimented on two recent social media data sets which represent the sports and politics domains. The results obtained from the experiments reveal that our approach is capable of effective and efficient event detection, with significant improvements over baselines. On the sports data set, Embed2Detect achieved a 30% higher F-measure than the best-performing baseline method, and on the political data set the improvement was 36%.
摘要:在社交媒体事件检测是指在社会化媒体平台上共享一段时间的重要信息自动识别。考虑的动态性质和在数据流中的数据生产高体积,是不切实际的手动过滤的事件。因此,有一个自动的机制来检测,以便有效地利用社交媒体数据事件是很重要的。分析现有的文献,大部分现有的事件检测方法只专注于在数据统计和句法功能,即使相关的语义也从文字的有效信息检索的重要,因为他们描述词语和意义之间的连接。在本文中,我们提出了一个新的方法在社交媒体称为Embed2Detect事件检测由特性基于预测的字的嵌入和分层合并聚类合并。基于预测的字的嵌入的通过合并文本的语义功能,以克服现有先前方法的主要限制。这种方法试验了最近两次的社交媒体数据集代表了体育和政治领域。从实验中获得的结果表明,我们的方法能够有效和高效事件检测与超过基线显著改善的证明。对于体育数据集,Embed2Detect实现高30%F值比最好的执行基线方法和政治的数据集,这是一个通过增加36%。
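The clustering side of the method can be illustrated with scikit-learn's hierarchical agglomerative clustering applied to word vectors from one time window; the random embeddings below are stand-ins for trained prediction-based embeddings, and the cluster count is an arbitrary assumption.

import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(1)
vocab = ["goal", "score", "match", "vote", "election", "ballot"]
embeddings = rng.normal(size=(len(vocab), 50))     # stand-ins for trained word vectors

labels = AgglomerativeClustering(n_clusters=2).fit_predict(embeddings)
for word, label in zip(vocab, labels):
    print(word, label)                             # words grouped into candidate event clusters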
24. Estimating semantic structure for the VQA answer space [PDF] 返回目录
Corentin Kervadec, Grigory Antipov, Moez Baccouche, Christian Wolf
Abstract: Since its appearance, Visual Question Answering (VQA, i.e. answering a question posed over an image) has always been treated as a classification problem over a set of predefined answers. Despite its convenience, this classification approach poorly reflects the semantics of the problem, limiting the answering to a choice between independent proposals without taking into account the similarity between them (e.g. equally penalizing for answering cat or German shepherd instead of dog). We address this issue by proposing (1) two measures of proximity between VQA classes, and (2) a corresponding loss which takes the estimated proximity into account. This significantly improves the generalization of VQA models by reducing their language bias. In particular, we show that our approach is completely model-agnostic since it allows consistent improvements with three different VQA models. Finally, by combining our method with a language bias reduction approach, we report SOTA-level performance on the challenging VQAv2-CP dataset.
摘要:由于它的外观,视觉答疑(VQA,即回答了提出的图像的问题),一直被视为在一组预定义答案的分类问题。尽管它的便利性,这种分类方法反映不佳限制了应答独立提案之间进行选择的问题的语义,而不考虑它们之间的相似性(例如同样惩罚回答猫或德国牧羊犬,而不是狗)。我们通过提出解决这个问题(1)VQA类之间的接近,以及两项措施(2)相应的损失,考虑到估计接近。通过减少他们的语言偏这显著提高VQA车型的推广。特别是,我们表明,我们的做法是完全模型无关,因为它可以用三种不同的型号VQA持续改善。最后,我们的方法与语言偏倚减少相结合的方法,我们报告的挑战VQAv2-CP数据集SOTA级性能。
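A generic version of a proximity-aware classification loss is sketched below: the hard one-hot target is replaced by soft targets drawn from a class-proximity matrix, so answering a semantically close class is penalized less. The toy proximity matrix is the editor's assumption, not one of the paper's two proposed measures.

import torch
import torch.nn.functional as F

def proximity_loss(logits, target, proximity):
    # logits: (batch, n_classes); target: (batch,); proximity: (n_classes, n_classes), rows sum to 1
    soft_targets = proximity[target]                 # soft labels instead of one-hot targets
    return -(soft_targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

n = 4
prox = torch.eye(n) * 0.8
prox[0, 1] = prox[1, 0] = 0.2                        # classes 0 and 1 are semantically close
prox = prox / prox.sum(dim=1, keepdim=True)

print(proximity_loss(torch.randn(2, n), torch.tensor([0, 2]), prox))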
25. Learning Functions to Study the Benefit of Multitask Learning [PDF] 返回目录
Gabriele Bettgenhäuser, Michael A. Hedderich, Dietrich Klakow
Abstract: We study and quantify the generalization patterns of multitask learning (MTL) models for sequence labeling tasks. MTL models are trained to optimize a set of related tasks jointly. Although multitask learning has achieved improved performance in some problems, there are also tasks that lose performance when trained together. These mixed results motivate us to study the factors that impact the performance of MTL models. We note that theoretical bounds and convergence rates for MTL models exist, but they rely on strong assumptions such as task relatedness and the use of balanced datasets. To remedy these limitations, we propose the creation of a task simulator and the use of Symbolic Regression to learn expressions relating model performance to possible factors of influence. For MTL, we study the model performance against the number of tasks (T), the number of samples per task (n) and the task relatedness measured by the adjusted mutual information (AMI). In our experiments, we could empirically find formulas relating model performance with factors of sqrt(n), sqrt(T), which are equivalent to sound mathematical proofs in Maurer[2016], and we went beyond by discovering that performance relates to a factor of sqrt(AMI).
摘要:我们研究和量化多任务学习(MTL)模型的推广模式为序列标注任务。 MTL模型进行培训,以共同优化一组相关的任务。虽然多任务学习已在一些问题取得了更好的性能,也有失去效能,当一起训练任务。这些混合的结果促使我们研究这种影响MTL模型的性能的因素。我们注意到,对于MTL模型理论界和收敛速度存在,但他们依靠强大的假设,如任务关联性和使用平衡的数据集。为了弥补这些缺陷,我们提出了一个任务模拟器的创建和使用符号回归的学习与模型性能的影响可能因素表情。对于MTL,我们研究对任务(T)的数量,每个任务(N)和调整后的交互信息(AMI)测量任务关联的样本数量模型的性能。在我们的实验中,我们可以凭经验查找与使用的sqrt(N),开方(T)的因素,这等同于声音的数学证明在毛雷尔[2016年]模型性能的公式,我们超越了通过发现性能涉及的因素SQRT的(AMI)。
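To make the reported functional form concrete, the sketch below fits synthetic MTL performance scores against sqrt(n), sqrt(T) and sqrt(AMI) with ordinary least squares; it only mimics the shape of the expressions found by symbolic regression, and the data and coefficients are synthetic placeholders.

import numpy as np

rng = np.random.default_rng(2)
n   = rng.uniform(50, 500, 100)                      # samples per task
T   = rng.uniform(2, 10, 100)                        # number of tasks
ami = rng.uniform(0, 1, 100)                         # adjusted mutual information
perf = 0.5 + 0.02*np.sqrt(n) + 0.03*np.sqrt(T) + 0.1*np.sqrt(ami) + rng.normal(0, 0.01, 100)

X = np.column_stack([np.ones_like(n), np.sqrt(n), np.sqrt(T), np.sqrt(ami)])
coefs, *_ = np.linalg.lstsq(X, perf, rcond=None)
print(dict(zip(["bias", "sqrt(n)", "sqrt(T)", "sqrt(AMI)"], coefs.round(3))))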
26. Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation [PDF] 返回目录
Changhan Wang, Juan Pino, Jiatao Gu
Abstract: Transfer learning from high-resource languages is known to be an efficient way to improve end-to-end automatic speech recognition (ASR) for low-resource languages. Pre-trained or jointly trained encoder-decoder models, however, do not share the language modeling (decoder) for the same language, which is likely to be inefficient for distant target languages. We introduce speech-to-text translation (ST) as an auxiliary task to incorporate additional knowledge of the target language and enable transferring from that target language. Specifically, we first translate high-resource ASR transcripts into a target low-resource language, with which a ST model is trained. Both ST and target ASR share the same attention-based encoder-decoder architecture and vocabulary. The former task then provides a fully pre-trained model for the latter, bringing up to 24.6% word error rate (WER) reduction to the baseline (direct transfer from high-resource ASR). We show that training ST with human translations is not necessary. ST trained with machine translation (MT) pseudo-labels brings consistent gains. It can even outperform those using human labels when transferred to target ASR by leveraging only 500K MT examples. Even with pseudo-labels from low-resource MT (200K examples), ST-enhanced transfer brings up to 8.9% WER reduction to direct transfer.
摘要:从资源丰富的语言迁移学习被称为是改善低资源语言的端至端自动语音识别(ASR)的有效方式。预先训练或联合训练的编码器,解码器模组,然而,不共享同一语言,这很可能是低效远处目标语言的语言模型(解码器)。我们引入语音到文本转换(ST)作为辅助任务纳入目标语言的额外知识,使从该目标语言传递。具体来说,我们第一个高资源ASR成绩单翻译成目标低资源语言与一个ST模型进行训练。与ST目标ASR共享相同的关注,基于编码器的解码器结构和词汇。前者的任务则提供用于后者的完全预先训练模式,带来高达(WER)减少24.6%,字错误率基线(从高资源ASR直接转移)。我们表明,人类的翻译训练ST是没有必要的。 ST训练有素的机器翻译(MT)伪标签带来的收益保持一致。当通过利用只有500K MT例子转移到目标ASR它甚至可以超越那些使用人的标签。即使从低资源MT(200K例子)伪标签,ST增强转移带来高达8.9%WER减少直接转移。
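The pseudo-labelling step can be summarised as pairing each high-resource utterance with a machine translation of its transcript. In the sketch below, translate is a hypothetical stand-in for any MT system; the pipeline shape, not the model, is the point.

from typing import Callable, List, Tuple

def build_pseudo_st_corpus(asr_corpus: List[Tuple[str, str]],
                           translate: Callable[[str], str]) -> List[Tuple[str, str]]:
    # asr_corpus holds (audio_path, transcript) pairs in the high-resource language;
    # the returned pairs serve as pseudo speech translation training data.
    return [(audio, translate(text)) for audio, text in asr_corpus]

toy_corpus = [("utt1.wav", "hello world"), ("utt2.wav", "good morning")]
fake_mt = lambda text: text.upper()    # placeholder for a real MT model into the target language
print(build_pseudo_st_corpus(toy_corpus, fake_mt))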
27. Dialog Policy Learning for Joint Clarification and Active Learning Queries [PDF] 返回目录
Aishwarya Padmakumar, Raymond J. Mooney
Abstract: Intelligent systems need to be able to recover from mistakes, resolve uncertainty, and adapt to novel concepts not seen during training. Dialog interaction can enable this by the use of clarifications for correction and resolving uncertainty, and active learning queries to learn new concepts encountered during operation. Prior work on dialog systems has either focused on exclusively learning how to perform clarification/information seeking, or to perform active learning. In this work, we train a hierarchical dialog policy to jointly perform both clarification and active learning in the context of an interactive language-based image retrieval task motivated by an on-line shopping application, and demonstrate that jointly learning dialog policies for clarification and active learning is more effective than the use of static dialog policies for one or both of these functions.
摘要:智能系统需要能够从错误中,决心不确定性恢复,并适应训练中没有见到的新概念。对话交互可以通过使用澄清的整改和解决的不确定性,以及主动学习的查询启用此学习操作过程中遇到的新概念。在对话系统之前的工作要么集中在专门学习如何进行澄清/信息搜索,或进行主动学习。在这项工作中,我们培养的层次对话的政策,共同执行{\它既}澄清和主动学习通过一个在线购物应用动机交互式的基于语言的图像检索任务的上下文,并表明共同学习对话框政策澄清和主动学习比使用静态对话框政策的一个或两个的这些功能更有效。
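A very schematic view of the decision the hierarchical policy has to make is sketched below: a high-level choice between clarifying, issuing an active-learning query, and answering. The heuristic thresholds stand in for the learned RL policy, and all names are illustrative.

from enum import Enum

class HighLevelAction(Enum):
    CLARIFY = "ask a clarification question"
    ACTIVE_LEARN = "ask an active-learning query about a new concept"
    ANSWER = "return the current best retrieval result"

def high_level_policy(uncertainty: float, novelty: float) -> HighLevelAction:
    # Heuristic stand-in for the learned policy: query the user about novel concepts,
    # clarify under uncertainty, otherwise answer.
    if novelty > 0.7:
        return HighLevelAction.ACTIVE_LEARN
    if uncertainty > 0.5:
        return HighLevelAction.CLARIFY
    return HighLevelAction.ANSWER

print(high_level_policy(uncertainty=0.8, novelty=0.2))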
28. Vocal markers from sustained phonation in Huntington's Disease [PDF] 返回目录
Rachid Riad, Hadrien Titeux, Laurie Lemoine, Justine Montillot, Jennifer Hamet Bagnou, Xuan Nga Cao, Emmanuel Dupoux, Anne-Catherine Bachoud-Lévi
Abstract: Disease-modifying treatments are currently assessed in neurodegenerative diseases. Huntington's Disease represents a unique opportunity to design automatic sub-clinical markers, even in premanifest gene carriers. We investigated phonatory impairments as potential clinical markers and propose them for both diagnosis and gene carriers follow-up. We used two sets of features: Phonatory features and Modulation Power Spectrum Features. We found that phonation is not sufficient for the identification of sub-clinical disorders of premanifest gene carriers. According to our regression results, Phonatory features are suitable for the predictions of clinical performance in Huntington's Disease.
摘要:疾病修饰治疗目前正在评估神经变性疾病。亨廷顿氏病代表设计自动亚临床标记一个独特的机会,即使是在premanifest基因携带者。我们调查发声障碍的潜在的临床指标,并提出他们诊断和基因携带者跟进。我们使用了两套功能:发声功能和调制功率谱特点。我们发现,发声不足以premanifest基因携带者的亚临床疾病的鉴别。根据我们的回归结果,发声功能,适合在亨廷顿氏病临床表现的预测。
29. Learning not to Discriminate: Task Agnostic Learning for Improving Monolingual and Code-switched Speech Recognition [PDF] 返回目录
Gurunath Reddy Madhumani, Sanket Shah, Basil Abraham, Vikas Joshi, Sunayana Sitaram
Abstract: Recognizing code-switched speech is challenging for Automatic Speech Recognition (ASR) for a variety of reasons, including the lack of code-switched training data. Recently, we showed that monolingual ASR systems fine-tuned on code-switched data deteriorate in performance on monolingual speech recognition, which is not desirable as ASR systems deployed in multilingual scenarios should recognize both monolingual and code-switched speech with high accuracy. Our experiments indicated that this loss in performance could be mitigated by using certain strategies for fine-tuning and regularization, leading to improvements in both monolingual and code-switched ASR. In this work, we present further improvements over our previous work by using domain adversarial learning to train task agnostic models. We evaluate the classification accuracy of an adversarial discriminator and show that it can learn shared layer parameters that are task agnostic. We train end-to-end ASR systems starting with a pooled model that uses monolingual and code-switched data along with the adversarial discriminator. Our proposed technique leads to reductions in Word Error Rates (WER) in monolingual and code-switched test sets across three language pairs.
摘要:认识代码交换语音具有挑战性的自动语音识别(ASR),适用于各种原因,包括缺乏代码交换训练数据。最近,我们发现,单语ASR系统微调的代码交换数据在单语语音识别,这是不理想的,因为部署在多语种情况下应该承认双方的单语和代码交换语音高精度ASR系统的性能恶化。我们的实验表明,这种性能损失可以通过一定的策略进行微调正规化得到缓解,导致这两个单语和改进代码交换ASR。在这项工作中,我们提出了我们以前使用域对抗的学习训练任务无关的模型工作进一步改善。我们评估的对抗性鉴别分类准确性,表明它可以学习是任务无关的共享层参数。我们训练结束到终端的ASR系统的汇集模型开始使用单语和与敌对鉴别沿着代码交换数据。我们提出的技术导致了跨越三个语言对在单语和代码交换测试集字错误率(WER)减少。
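Domain adversarial training of this kind typically relies on a gradient reversal layer between the shared encoder and the task discriminator; a minimal PyTorch sketch of that building block is given below, with the ASR model and discriminator omitted and the scaling factor chosen arbitrarily.

import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) gradients so the shared layers learn task-agnostic features.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

shared_features = torch.randn(4, 16, requires_grad=True)
grad_reverse(shared_features, lambd=0.5).sum().backward()
print(shared_features.grad[0, :4])   # gradients are negated and scaled by 0.5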
30. audino: A Modern Annotation Tool for Audio and Speech [PDF] 返回目录
Manraj Singh Grover, Pakhi Bamdev, Yaman Kumar, Mika Hama, Rajiv Ratn Shah
Abstract: In this paper, we introduce a collaborative and modern annotation tool for audio and speech: audino. The tool allows annotators to define and describe temporal segmentation in audios. These segments can be labelled and transcribed easily using a dynamically generated form. An admin can centrally control user roles and project assignment through the admin dashboard. The dashboard also enables describing labels and their values. The annotations can easily be exported in JSON format for further processing. The tool allows audio data to be uploaded and assigned to a user through a key-based API. The flexibility available in the annotation tool enables annotation for Speech Scoring, Voice Activity Detection (VAD), Speaker Diarisation, Speaker Identification, Speech Recognition, Emotion Recognition tasks and more. The MIT open source license allows it to be used for academic and commercial projects.
摘要:在本文中,我们介绍了音频和语音协作和现代注释工具:audino。该工具允许注释定义和描述在音频域分割。这些片段可以被标记,并容易地使用动态生成的形式转录。管理员可以通过集中的管理仪表板控制用户角色和项目分配。仪表板还使描述标签和它们的值。注释可以很容易地在JSON格式导出用于进一步处理。该工具允许音频数据以被上传,并通过基于密钥的API分配给用户。在注释工具的灵活性将使得注释的语音评分,语音活动检测(VAD),扬声器Diarisation,说话人识别,语音识别,情感识别任务和更多。麻省理工学院的开放源码许可证允许它被用于学术和商业项目。
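For a sense of what a temporal annotation export might contain, a hypothetical JSON record is shown below; the field names are illustrative and not necessarily audino's actual export schema.

import json

annotation = {
    "audio_file": "clip_001.wav",
    "annotator": "user_42",
    "segments": [
        {"start": 0.00, "end": 1.85,
         "labels": {"speaker": "A", "emotion": "neutral"},
         "transcription": "hello, how are you"},
        {"start": 1.85, "end": 3.10,
         "labels": {"speaker": "B", "emotion": "happy"},
         "transcription": "great, thanks"},
    ],
}
print(json.dumps(annotation, indent=2))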
31. Graph-Aware Transformer: Is Attention All Graphs Need? [PDF] 返回目录
Sanghyun Yoo, Young-Seok Kim, Kang Hyun Lee, Kuhwan Jeong, Junhwi Choi, Hoshik Lee, Young Sang Choi
Abstract: Graphs are the natural data structure to represent relational and structural information in many domains. To cover the broad range of graph-data applications including graph classification as well as graph generation, it is desirable to have a general and flexible model consisting of an encoder and a decoder that can handle graph data. Although the representative encoder-decoder model, Transformer, shows superior performance in various tasks especially of natural language processing, it is not immediately available for graphs due to their non-sequential characteristics. To tackle this incompatibility, we propose GRaph-Aware Transformer (GRAT), the first Transformer-based model which can encode and decode whole graphs in end-to-end fashion. GRAT is featured with a self-attention mechanism adaptive to the edge information and an auto-regressive decoding mechanism based on the two-path approach consisting of sub-graph encoding path and node-and-edge generation path for each decoding step. We empirically evaluated GRAT on multiple setups including encoder-based tasks such as molecule property predictions on QM9 datasets and encoder-decoder-based tasks such as molecule graph generation in the organic molecule synthesis domain. GRAT has shown very promising results including state-of-the-art performance on 4 regression tasks in QM9 benchmark.
摘要:图形是自然的数据结构来表示在许多领域的关系和结构信息。为了覆盖广泛的图的数据应用,包括图形分类以及图形生成,理想的是具有由一个编码器和一个一般和灵活的模型,可以处理图形数据的解码器。虽然代表性编码器,解码器模型,变压器,显示各项任务,特别是自然语言处理性能优越,这是没有立即图表由于其不连续的特点。为了解决这种不兼容问题,我们提出了图形感知变压器(GRAT),第一基于变压器的模型,它可以编码和解码的终端到终端的整个时尚图表。 GRAT的特征具有自注意机制自适应于边缘信息,并基于由子图的编码路径和节点和边生成路径的每个解码步骤的双路径方法的自回归解码机制。我们凭经验评估了多个设置包括基于编码器的任务,如对QM9数据集和编码器 - 译码器为基础的任务分子属性的预测,例如在有机合成分子结构域分子图表生成GRAT。 GRAT已经显示出非常有希望的结果,包括在QM9基准4个回归任务的国家的最先进的性能。
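One common way to make self-attention "edge-aware" is to add a learned bias derived from edge features to the attention logits; the sketch below shows that generic pattern and is not claimed to be GRAT's exact mechanism.

import torch
import torch.nn as nn

class EdgeAwareSelfAttention(nn.Module):
    def __init__(self, dim: int, edge_dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.edge_bias = nn.Linear(edge_dim, 1)   # maps each edge feature to a scalar bias

    def forward(self, nodes, edges):
        # nodes: (n, dim); edges: (n, n, edge_dim)
        q, k, v = self.q(nodes), self.k(nodes), self.v(nodes)
        logits = q @ k.t() / nodes.size(-1) ** 0.5
        logits = logits + self.edge_bias(edges).squeeze(-1)   # inject edge information
        return torch.softmax(logits, dim=-1) @ v

attn = EdgeAwareSelfAttention(dim=16, edge_dim=4)
out = attn(torch.randn(5, 16), torch.randn(5, 5, 4))
print(out.shape)   # (5, 16)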
32. Hand-crafted Attention is All You Need? A Study of Attention on Self-supervised Audio Transformer [PDF] 返回目录
Tsung-Han Wu, Chun-Chen Hsieh, Yen-Hao Chen, Po-Han Chi, Hung-yi Lee
Abstract: In this paper, we seek to reduce the computation complexity of transformer-based models for speech representation learning. We evaluate 10 attention mechanisms; then, we pre-train the transformer-based model with those attentions in a self-supervised fashion and use them as feature extractors on downstream tasks, including phoneme classification and speaker classification. We find that the proposed approach, which only uses hand-crafted and learnable attentions, is comparable with the full self-attention.
摘要:在本文中,我们力求减少基于变压器模型的计算复杂度的讲话表示学习。我们评估10个注意机制;那么,我们预培养与关注的基于变压器的模型自监督方式,并以此作为对下游任务,包括音素分类和扬声器分类特征提取。我们发现,所提出的方法,只使用手工制作的和可以学习的关注,与完全的自我关注媲美。
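A hand-crafted attention map can be as simple as a fixed local window around each position; the sketch below builds such a pattern, purely as an example of the family of attentions the paper studies rather than its best-performing variant.

import torch

def local_attention_weights(seq_len: int, window: int = 2) -> torch.Tensor:
    """Uniform attention over a +/- `window` neighbourhood around each position."""
    idx = torch.arange(seq_len)
    mask = (idx.unsqueeze(0) - idx.unsqueeze(1)).abs() <= window
    weights = mask.float()
    return weights / weights.sum(dim=-1, keepdim=True)   # rows sum to 1

print(local_attention_weights(6, window=1))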
33. On the Effectiveness of Neural Text Generation based Data Augmentation for Recognition of Morphologically Rich Speech [PDF] 返回目录
Balázs Tarján, György Szaszák, Tibor Fegyó, Péter Mihajlik
Abstract: Advanced neural network models have penetrated Automatic Speech Recognition (ASR) in recent years; however, in language modeling many systems still rely partly or entirely on traditional Back-off N-gram Language Models (BNLM). The reason for this is the high cost and complexity of training and using neural language models, mostly possible only by adding a second decoding pass (rescoring). In our recent work we significantly improved the online performance of a conversational speech transcription system by transferring knowledge from a Recurrent Neural Network Language Model (RNNLM) to the single-pass BNLM with text generation based data augmentation. In the present paper we analyze the amount of transferable knowledge and demonstrate that the neural augmented LM (RNN-BNLM) can help capture almost 50% of the knowledge of the RNNLM while dropping the second decoding pass and keeping the system real-time capable. We also systematically compare word and subword LMs and show that subword-based neural text augmentation can be especially beneficial in under-resourced conditions. In addition, we show that by using the RNN-BNLM in the first pass followed by a neural second pass, offline ASR results can be improved even further.
摘要:先进的神经网络模型,近年来渗透自动语音识别(ASR),但是,在语言模型许多系统仍然依赖于传统的退避的N-gram语言模型(BNLM)部分或全部。这样做的原因是成本高,训练的复杂性和使用的神经语言模型,大多可以通过添加第二解码通过(再评分)。在我们最近的工作,我们通过从回归神经网络语言模型(RNNLM)传授知识与文本生成基于数据增强单程BNLM显著提高了对话的语音转录系统的在线性能。在本论文中,我们分析转让的知识量,并证明神经增强LM(RNN-BNLM)可以帮助尚未通过丢弃第二解码通和使系统实时捕捉RNNLM的知识近50%能。我们还比较系统地字和子字LMS和表明基于子词神经文本增强可以在资源不足的情况特别有益。此外,我们表明,在第一遍之后神经第二次使用RNN-BNLM,离线ASR结果甚至可以显著改善。
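The augmentation loop boils down to sampling synthetic sentences from the trained neural LM and pooling them with the original corpus before n-gram estimation. In the sketch below, sample_sentence is a hypothetical stand-in for RNNLM sampling, and simple bigram counting replaces a real back-off LM toolkit.

from collections import Counter
from typing import Callable, List

def augment_corpus(corpus: List[str], sample_sentence: Callable[[], str], n_synthetic: int) -> List[str]:
    # Pool the original corpus with sentences sampled from the neural LM.
    return corpus + [sample_sentence() for _ in range(n_synthetic)]

def bigram_counts(corpus: List[str]) -> Counter:
    counts = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts

real = ["good morning everyone", "have a nice day"]
fake_lm = lambda: "good day everyone"      # stand-in for sampling from the trained RNNLM
print(bigram_counts(augment_corpus(real, fake_lm, n_synthetic=3)).most_common(3))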
注:中文为机器翻译结果!