
[arXiv Papers] Computation and Language 2020-11-10

Contents

1. Scaling Hidden Markov Language Models [PDF] Abstract
2. Action State Update Approach to Dialogue Management [PDF] Abstract
3. Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze [PDF] Abstract
4. Refer, Reuse, Reduce: Generating Subsequent References in Visual and Conversational Contexts [PDF] Abstract
5. Automated Discovery of Mathematical Definitions in Text with Deep Neural Networks [PDF] Abstract
6. Auxiliary Sequence Labeling Tasks for Disfluency Detection [PDF] Abstract
7. VisBERT: Hidden-State Visualizations for Transformers [PDF] Abstract
8. Synonym Knowledge Enhanced Reader for Chinese Idiom Reading Comprehension [PDF] Abstract
9. Hierarchical Multitask Learning Approach for BERT [PDF] Abstract
10. Bangla Text Classification using Transformers [PDF] Abstract
11. Catch the "Tails" of BERT [PDF] Abstract
12. Low-Resource Adaptation of Neural NLP Models [PDF] Abstract
13. Character-level Representations Improve DRS-based Semantic Parsing Even in the Age of BERT [PDF] Abstract
14. BERT-JAM: Boosting BERT-Enhanced Neural Machine Translation with Joint Attention [PDF] Abstract
15. CapWAP: Captioning with a Purpose [PDF] Abstract
16. AI Stories: An Interactive Narrative System for Children [PDF] Abstract
17. Pointing to Subwords for Generating Function Names in Source Code [PDF] Abstract
18. Efficient End-to-End Speech Recognition Using Performers in Conformers [PDF] Abstract
19. Text Classification through Glyph-aware Disentangled Character Embedding and Semantic Sub-character Augmentation [PDF] Abstract
20. Chapter Captor: Text Segmentation in Novels [PDF] Abstract
21. "What Do You Mean by That?" A Parser-Independent Interactive Approach for Enhancing Text-to-SQL [PDF] Abstract
22. CxGBERT: BERT meets Construction Grammar [PDF] Abstract
23. The UCF Podcast Summarization System at TREC 2020 [PDF] Abstract
24. What time is it? Temporal Analysis of Novels [PDF] Abstract
25. Metrics also Disagree in the Low Scoring Range: Revisiting Summarization Evaluation Metrics [PDF] Abstract
26. Exploring End-to-End Differentiable Natural Logic Modeling [PDF] Abstract
27. Stochastic Attention Head Removal: A Simple and Effective Method for Improving Automatic Speech Recognition with Transformers [PDF] Abstract
28. Adapting a Language Model for Controlled Affective Text Generation [PDF] Abstract
29. A Gold Standard Methodology for Evaluating Accuracy in Data-To-Text Systems [PDF] Abstract
30. Detecting Emerging Symptoms of COVID-19 using Context-based Twitter Embeddings [PDF] Abstract
31. On the Practical Ability of Recurrent Neural Networks to Recognize Hierarchical Languages [PDF] Abstract
32. Denoising Relation Extraction from Document-level Distant Supervision [PDF] Abstract
33. Best Practices for Data-Efficient Modeling in NLG: How to Train Production-Ready Neural Models with Less Data [PDF] Abstract
34. Explainable Automated Fact-Checking: A Survey [PDF] Abstract
35. Knowledge-driven Self-supervision for Zero-shot Commonsense Question Answering [PDF] Abstract
36. Rethinking the Value of Transformer Components [PDF] Abstract
37. PairRE: Knowledge Graph Embeddings via Paired Relation Vectors [PDF] Abstract
38. AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations [PDF] Abstract
39. Know What You Don't Need: Single-Shot Meta-Pruning for Attention Heads [PDF] Abstract
40. NLP-CIC @ PRELEARN: Mastering prerequisites relations, from handcrafted features to embeddings [PDF] Abstract
41. NLP-CIC @ DIACR-Ita: POS and Neighbor Based Distributional Models for Lexical Semantic Change in Diachronic Italian Corpora [PDF] Abstract
42. Naturalization of Text by the Insertion of Pauses and Filler Words [PDF] Abstract
43. Acoustics Based Intent Recognition Using Discovered Phonetic Units for Low Resource Languages [PDF] Abstract
44. Hostility Detection Dataset in Hindi [PDF] Abstract
45. Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis [PDF] Abstract
46. Artificial Intelligence Decision Support for Medical Triage [PDF] Abstract
47. Mask Proxy Loss for Text-Independent Speaker Recognition [PDF] Abstract
48. Knowledge Distillation for Singing Voice Detection [PDF] Abstract
49. Gated Recurrent Fusion with Joint Training Framework for Robust End-to-End Speech Recognition [PDF] Abstract
50. Long Range Arena: A Benchmark for Efficient Transformers [PDF] Abstract
51. DyERNIE: Dynamic Evolution of Riemannian Manifold Embeddings for Temporal Knowledge Graph Completion [PDF] Abstract
52. Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles [PDF] Abstract
53. Sim-to-Real Transfer for Vision-and-Language Navigation [PDF] Abstract
54. Template Controllable keywords-to-text Generation [PDF] Abstract

Abstracts

1. Scaling Hidden Markov Language Models [PDF] Back to Contents
  Justin T. Chiu, Alexander M. Rush
Abstract: The hidden Markov model (HMM) is a fundamental tool for sequence modeling that cleanly separates the hidden state from the emission structure. However, this separation makes it difficult to fit HMMs to large datasets in modern NLP, and they have fallen out of use due to very poor performance compared to fully observed models. This work revisits the challenge of scaling HMMs to language modeling datasets, taking ideas from recent approaches to neural modeling. We propose methods for scaling HMMs to massive state spaces while maintaining efficient exact inference, a compact parameterization, and effective regularization. Experiments show that this approach leads to models that are more accurate than previous HMM and n-gram-based methods, making progress towards the performance of state-of-the-art neural models.
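
As a concrete reference point for the exact inference mentioned above, the sketch below computes an HMM's log-likelihood of a token sequence with the forward algorithm in log space. It is an illustrative, assumed formulation (tensor names, shapes, and the PyTorch framing are not from the paper) and omits the paper's compact parameterization and scaling techniques.

```python
# Illustrative sketch only (not the paper's model): exact HMM log-likelihood of a
# token sequence via the forward algorithm. Assumed shapes:
#   log_start: (S,)  log p(z_1);  log_trans: (S, S)  log p(z_t | z_{t-1});
#   log_emit:  (S, V) log p(x_t | z_t);  tokens: list of vocabulary ids.
import torch

def hmm_log_likelihood(log_start, log_trans, log_emit, tokens):
    alpha = log_start + log_emit[:, tokens[0]]  # forward messages at t = 1
    for tok in tokens[1:]:
        # alpha'[j] = logsumexp_i(alpha[i] + log_trans[i, j]) + log_emit[j, tok]
        alpha = torch.logsumexp(alpha.unsqueeze(1) + log_trans, dim=0) + log_emit[:, tok]
    return torch.logsumexp(alpha, dim=0)  # log p(x_1, ..., x_T)
```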

2. Action State Update Approach to Dialogue Management [PDF] Back to Contents
  Svetlana Stoyanchev, Simon Keizer, Rama Doddipatla
Abstract: Utterance interpretation is one of the main functions of a dialogue manager, which is the key component of a dialogue system. We propose the action state update approach (ASU) for utterance interpretation, featuring a statistically trained binary classifier used to detect dialogue state update actions in the text of a user utterance. Our goal is to interpret referring expressions in user input without a domain-specific natural language understanding component. For training the model, we use active learning to automatically select simulated training examples. With both user-simulated and interactive human evaluations, we show that the ASU approach successfully interprets user utterances in a dialogue system, including those with referring expressions.

3. Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze [PDF] Back to Contents
  Ece Takmaz, Sandro Pezzelle, Lisa Beinborn, Raquel Fernández
Abstract: When speakers describe an image, they tend to look at objects before mentioning them. In this paper, we investigate such sequential cross-modal alignment by modelling the image description generation process computationally. We take as our starting point a state-of-the-art image captioning system and develop several model variants that exploit information from human gaze patterns recorded during language production. In particular, we propose the first approach to image description generation where visual processing is modelled $\textit{sequentially}$. Our experiments and analyses confirm that better descriptions can be obtained by exploiting gaze-driven attention and shed light on human cognitive processes by comparing different ways of aligning the gaze modality with language production. We find that processing gaze data sequentially leads to descriptions that are better aligned to those produced by speakers, more diverse, and more natural${-}$particularly when gaze is encoded with a dedicated recurrent component.

4. Refer, Reuse, Reduce: Generating Subsequent References in Visual and Conversational Contexts [PDF] Back to Contents
  Ece Takmaz, Mario Giulianelli, Sandro Pezzelle, Arabella Sinclair, Raquel Fernández
Abstract: Dialogue participants often refer to entities or situations repeatedly within a conversation, which contributes to its cohesiveness. Subsequent references exploit the common ground accumulated by the interlocutors and hence have several interesting properties, namely, they tend to be shorter and reuse expressions that were effective in previous mentions. In this paper, we tackle the generation of first and subsequent references in visually grounded dialogue. We propose a generation model that produces referring utterances grounded in both the visual and the conversational context. To assess the referring effectiveness of its output, we also implement a reference resolution system. Our experiments and analyses show that the model produces better, more effective referring utterances than a model not grounded in the dialogue context, and generates subsequent references that exhibit linguistic patterns akin to humans.

5. Automated Discovery of Mathematical Definitions in Text with Deep Neural Networks [PDF] Back to Contents
  Natalia Vanetik, Marina Litvak, Sergey Shevchuk, Lior Reznik
Abstract: Automatic definition extraction from texts is an important task that has numerous applications in several natural language processing fields such as summarization, analysis of scientific texts, automatic taxonomy generation, ontology generation, concept identification, and question answering. For definitions that are contained within a single sentence, this problem can be viewed as a binary classification of sentences into definitions and non-definitions. In this paper, we focus on automatic detection of one-sentence definitions in mathematical texts, which are difficult to separate from surrounding text. We experiment with several data representations, which include sentence syntactic structure and word embeddings, and apply deep learning methods such as the Convolutional Neural Network (CNN) and the Long Short-Term Memory network (LSTM), in order to identify mathematical definitions. Our experiments demonstrate the superiority of CNN and its combination with LSTM, when applied on the syntactically-enriched input representation. We also present a new dataset for definition extraction from mathematical texts. We demonstrate that this dataset is beneficial for training supervised models aimed at extraction of mathematical definitions. Our experiments with different domains demonstrate that mathematical definitions require special treatment, and that using cross-domain learning is inefficient for that task.

6. Auxiliary Sequence Labeling Tasks for Disfluency Detection [PDF] Back to Contents
  Dongyub Lee, Byeongil Ko, Myeong Cheol Shin, Taesun Whang, Daniel Lee, Eun Hwa Kim, EungGyun Kim, Jaechoon Jo
Abstract: Detecting disfluencies in spontaneous speech is an important preprocessing step in natural language processing and speech recognition applications. In this paper, we propose a method utilizing named entity recognition (NER) and part-of-speech (POS) as auxiliary sequence labeling (SL) tasks for disfluency detection. First, we show that training a disfluency detection model with auxiliary SL tasks can improve its F-score in disfluency detection. Then, we analyze which auxiliary SL tasks are influential depending on baseline models. Experimental results on the widely used English Switchboard dataset show that our method outperforms the previous state-of-the-art in disfluency detection.

7. VisBERT: Hidden-State Visualizations for Transformers [PDF] Back to Contents
  Betty van Aken, Benjamin Winter, Alexander Löser, Felix A. Gers
Abstract: Explainability and interpretability are two important concepts, the absence of which can and should impede the application of well-performing neural networks to real-world problems. At the same time, they are difficult to incorporate into the large, black-box models that achieve state-of-the-art results in a multitude of NLP tasks. Bidirectional Encoder Representations from Transformers (BERT) is one such black-box model. It has become a staple architecture to solve many different NLP tasks and has inspired a number of related Transformer models. Understanding how these models draw conclusions is crucial for both their improvement and application. We contribute to this challenge by presenting VisBERT, a tool for visualizing the contextual token representations within BERT for the task of (multi-hop) Question Answering. Instead of analyzing attention weights, we focus on the hidden states resulting from each encoder block within the BERT model. This way we can observe how the semantic representations are transformed throughout the layers of the model. VisBERT enables users to get insights about the model's internal state and to explore its inference steps or potential shortcomings. The tool allows us to identify distinct phases in BERT's transformations that are similar to a traditional NLP pipeline and offer insights during failed predictions.

8. Synonym Knowledge Enhanced Reader for Chinese Idiom Reading Comprehension [PDF] Back to Contents
  Siyu Long, Ran Wang, Kun Tao, Jiali Zeng, Xin-Yu Dai
Abstract: Machine reading comprehension (MRC) is the task that asks a machine to answer questions based on a given context. For Chinese MRC, due to the non-literal and non-compositional semantic characteristics, Chinese idioms pose unique challenges for machines to understand. Previous studies tend to treat idioms separately without fully exploiting the relationship among them. In this paper, we first define the concept of literal meaning coverage to measure the consistency between semantics and literal meanings for Chinese idioms. With the definition, we prove that the literal meanings of many idioms are far from their semantics, and we also verify that the synonymic relationship can mitigate this inconsistency, which would be beneficial for idiom comprehension. Furthermore, to fully utilize the synonymic relationship, we propose the synonym knowledge enhanced reader. Specifically, for each idiom, we first construct a synonym graph according to the annotations from a high-quality synonym dictionary or the cosine similarity between the pre-trained idiom embeddings and then incorporate the graph attention network and gate mechanism to encode the graph. Experimental results on ChID, a large-scale Chinese idiom reading comprehension dataset, show that our model achieves state-of-the-art performance.

9. Hierarchical Multitask Learning Approach for BERT [PDF] Back to Contents
  Çağla Aksoy, Alper Ahmetoğlu, Tunga Güngör
Abstract: Recent works show that learning contextualized embeddings for words is beneficial for downstream tasks. BERT is one successful example of this approach. It learns embeddings by solving two tasks, which are masked language model (masked LM) and the next sentence prediction (NSP). The pre-training of BERT can also be framed as a multitask learning problem. In this work, we adopt hierarchical multitask learning approaches for BERT pre-training. Pre-training tasks are solved at different layers instead of the last layer, and information from the NSP task is transferred to the masked LM task. Also, we propose a new pre-training task bigram shift to encode word order information. We choose two downstream tasks, one of which requires sentence-level embeddings (textual entailment), and the other requires contextualized embeddings of words (question answering). Due to computational restrictions, we use the downstream task data instead of a large dataset for the pre-training to see the performance of proposed models when given a restricted dataset. We test their performance on several probing tasks to analyze learned embeddings. Our results show that imposing a task hierarchy in pre-training improves the performance of embeddings.

10. Bangla Text Classification using Transformers [PDF] Back to Contents
  Tanvirul Alam, Akib Khan, Firoj Alam
Abstract: Text classification has been one of the earliest problems in NLP. Over time the scope of application areas has broadened and the difficulty of dealing with new areas (e.g., noisy social media content) has increased. The problem-solving strategy switched from classical machine learning to deep learning algorithms. One of the recent deep neural network architecture is the Transformer. Models designed with this type of network and its variants recently showed their success in many downstream natural language processing tasks, especially for resource-rich languages, e.g., English. However, these models have not been explored fully for Bangla text classification tasks. In this work, we fine-tune multilingual transformer models for Bangla text classification tasks in different domains, including sentiment analysis, emotion detection, news categorization, and authorship attribution. We obtain the state of the art results on six benchmark datasets, improving upon the previous results by 5-29% accuracy across different tasks.

11. Catch the "Tails" of BERT [PDF] Back to Contents
  Ziyang Luo
Abstract: Recently, contextualized word embeddings outperform static word embeddings on many NLP tasks. However, we still don't know much about the mechanism inside these internal representations produced by BERT. Do they have any common patterns? What are the relations between word sense and context? We find that nearly all the contextualized word vectors of BERT and RoBERTa have some common patterns. For BERT, the $557^{th}$ element is always the smallest. For RoBERTa, the $588^{th}$ element is always the largest and the $77^{th}$ element is the smallest. We call them the "tails" of the models. We find that these "tails" are the major cause of anisotropy of the vector space. After "cutting the tails", the same word's different vectors are more similar to each other. The internal representations also perform better on the word-in-context (WiC) task. These findings suggest that "cutting the tails" can decrease the influence of context and better represent word sense.
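
To make "cutting the tails" concrete, here is a hedged sketch that zeroes the dimension the abstract singles out for BERT (the 557th element, index 556 when 0-indexed) and compares cosine similarities of two contextual vectors of the same surface word before and after. The checkpoint name and example sentences are assumptions, and zeroing the dimension is only one possible reading of "cutting" it.

```python
# Hedged sketch: compare contextual vectors of the same word before and after
# zeroing BERT's "tail" dimension (index 556, i.e., the 557th element per the abstract).
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")  # assumed checkpoint
model = BertModel.from_pretrained("bert-base-uncased").eval()

def vector_for(sentence: str, word: str) -> torch.Tensor:
    inputs = tokenizer(sentence, return_tensors="pt")
    idx = inputs["input_ids"][0].tolist().index(tokenizer.convert_tokens_to_ids(word))
    with torch.no_grad():
        return model(**inputs).last_hidden_state[0, idx]  # (768,)

def cut_tail(v: torch.Tensor) -> torch.Tensor:
    w = v.clone()
    w[556] = 0.0  # zero the "tail" element
    return w

a = vector_for("The river bank was muddy.", "bank")       # example sentences are assumptions
b = vector_for("She deposited cash at the bank.", "bank")
cos = torch.nn.functional.cosine_similarity
print("before:", cos(a, b, dim=0).item())
print("after :", cos(cut_tail(a), cut_tail(b), dim=0).item())
```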

12. Low-Resource Adaptation of Neural NLP Models [PDF] Back to Contents
  Farhad Nooralahzadeh
Abstract: Real-world applications of natural language processing (NLP) are challenging. NLP models rely heavily on supervised machine learning and require large amounts of annotated data. These resources are often based on language data available in large quantities, such as English newswire. However, in real-world applications of NLP, the textual resources vary across several dimensions, such as language, dialect, topic, and genre. It is challenging to find annotated data of sufficient amount and quality. The objective of this thesis is to investigate methods for dealing with such low-resource scenarios in information extraction and natural language understanding. To this end, we study distant supervision and sequential transfer learning in various low-resource settings. We develop and adapt neural NLP models to explore a number of research questions concerning NLP tasks with minimal or no training data.

13. Character-level Representations Improve DRS-based Semantic Parsing Even in the Age of BERT [PDF] Back to Contents
  Rik van Noord, Antonio Toral, Johan Bos
Abstract: We combine character-level and contextual language model representations to improve performance on Discourse Representation Structure parsing. Character representations can easily be added in a sequence-to-sequence model in either one encoder or as a fully separate encoder, with improvements that are robust to different language models, languages and data sets. For English, these improvements are larger than adding individual sources of linguistic information or adding non-contextual embeddings. A new method of analysis based on semantic tags demonstrates that the character-level representations improve performance across a subset of selected semantic phenomena.

14. BERT-JAM: Boosting BERT-Enhanced Neural Machine Translation with Joint Attention [PDF] Back to Contents
  Zhebin Zhang, Sai Wu, Dawei Jiang, Gang Chen
Abstract: BERT-enhanced neural machine translation (NMT) aims at leveraging BERT-encoded representations for translation tasks. A recently proposed approach uses attention mechanisms to fuse Transformer's encoder and decoder layers with BERT's last-layer representation and shows enhanced performance. However, their method doesn't allow for the flexible distribution of attention between the BERT representation and the encoder/decoder representation. In this work, we propose a novel BERT-enhanced NMT model called BERT-JAM which improves upon existing models from two aspects: 1) BERT-JAM uses joint-attention modules to allow the encoder/decoder layers to dynamically allocate attention between different representations, and 2) BERT-JAM allows the encoder/decoder layers to make use of BERT's intermediate representations by composing them using a gated linear unit (GLU). We train BERT-JAM with a novel three-phase optimization strategy that progressively unfreezes different components of BERT-JAM. Our experiments show that BERT-JAM achieves SOTA BLEU scores on multiple translation tasks.
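
The abstract's idea of letting the NMT layers consume BERT's intermediate representations through a gated linear unit (GLU) can be sketched roughly as below. This is not the authors' BERT-JAM code; concatenating the layer states before a single projection is an assumption made for brevity.

```python
# Hedged sketch: compose a stack of BERT layer states with a GLU (value * sigmoid(gate)).
import torch
import torch.nn as nn

class GLUComposer(nn.Module):
    def __init__(self, n_layers: int, dim: int):
        super().__init__()
        self.proj = nn.Linear(n_layers * dim, 2 * dim)  # assumed composition layer

    def forward(self, layer_states):
        # layer_states: list of (batch, seq, dim) tensors, one per BERT layer.
        x = torch.cat(layer_states, dim=-1)
        value, gate = self.proj(x).chunk(2, dim=-1)
        return value * torch.sigmoid(gate)  # GLU output: (batch, seq, dim)
```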

15. CapWAP: Captioning with a Purpose [PDF] Back to Contents
  Adam Fisch, Kenton Lee, Ming-Wei Chang, Jonathan H. Clark, Regina Barzilay
Abstract: The traditional image captioning task uses generic reference captions to provide textual information about images. Different user populations, however, will care about different visual aspects of images. In this paper, we propose a new task, Captioning with a Purpose (CapWAP). Our goal is to develop systems that can be tailored to be useful for the information needs of an intended population, rather than merely provide generic information about an image. In this task, we use question-answer (QA) pairs---a natural expression of information need---from users, instead of reference captions, for both training and post-inference evaluation. We show that it is possible to use reinforcement learning to directly optimize for the intended information need, by rewarding outputs that allow a question answering model to provide correct answers to sampled user questions. We convert several visual question answering datasets into CapWAP datasets, and demonstrate that under a variety of scenarios our purposeful captioning system learns to anticipate and fulfill specific information needs better than its generic counterparts, as measured by QA performance on user questions from unseen images, when using the caption alone as context.

16. AI Stories: An Interactive Narrative System for Children [PDF] Back to Contents
  Ben Burtenshaw
Abstract: AI Stories is a proposed interactive dialogue system that lets children co-create narrative worlds through conversation. Over the next three years this system will be developed and tested within pediatric wards, where it offers a useful resource in the gap between education and play. Telling and making stories is a fundamental part of language play, and its chatty and nonsensical qualities are important; therefore, the prolonged usage an automated system offers is a benefit to children. In this paper I will present the current state of this project, in its more experimental and general guise. Conceptually, story-telling through dialogue relates to the pre-print interpretation of story, beyond the static and linear medium, where stories were performative, temporal, and social.

17. Pointing to Subwords for Generating Function Names in Source Code [PDF] Back to Contents
  Shogo Fujita, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura
Abstract: We tackle the task of automatically generating a function name from source code. Existing generators face difficulties in generating low-frequency or out-of-vocabulary subwords. In this paper, we propose two strategies for copying low-frequency or out-of-vocabulary subwords in inputs. Our best performing model showed an improvement over the conventional method in terms of our modified F1 and accuracy on the Java-small and Java-large datasets.
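
A common way to let a generator copy low-frequency or out-of-vocabulary subwords from the input is a pointer-generator mixture; the sketch below shows that generic formulation as an assumption, not the authors' exact model.

```python
# Hedged sketch of a pointer-generator mixture: blend a vocabulary distribution
# with a copy distribution built from attention over the source subwords.
import torch

def mix_generate_and_copy(p_gen, vocab_logits, attn_weights, src_token_ids):
    # p_gen: (batch, 1) probability of generating from the vocabulary
    # vocab_logits: (batch, V); attn_weights, src_token_ids: (batch, src_len)
    gen_dist = torch.softmax(vocab_logits, dim=-1) * p_gen
    copy_mass = attn_weights * (1.0 - p_gen)
    copy_dist = torch.zeros_like(gen_dist).scatter_add_(1, src_token_ids, copy_mass)
    return gen_dist + copy_dist  # final distribution over the (sub)word vocabulary
```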

18. Efficient End-to-End Speech Recognition Using Performers in Conformers [PDF] Back to Contents
  Peidong Wang, DeLiang Wang
Abstract: On-device end-to-end speech recognition places high demands on model efficiency. Most prior work improves efficiency by reducing model size. We propose to reduce the complexity of model architectures in addition to model sizes. More specifically, we reduce the floating-point operations in the Conformer by replacing its Transformer module with a Performer. The proposed attention-based efficient end-to-end speech recognition model yields competitive performance on the LibriSpeech corpus with 10 million parameters and linear computation complexity. The proposed model also outperforms previous lightweight end-to-end models by about 20% relative in word error rate.
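
Performer-style models avoid the quadratic softmax by computing attention through kernel feature maps; the sketch below shows the generic linear-attention computation with a simple placeholder feature map (elu + 1). Performer's actual FAVOR+ positive random features are more involved, so this is a stand-in, not the paper's method.

```python
# Hedged sketch: kernelized (non-causal) linear attention with a placeholder feature map.
import torch

def feature_map(x):
    return torch.nn.functional.elu(x) + 1.0  # stand-in for FAVOR+ random features

def linear_attention(q, k, v):
    # q, k: (batch, seq, d_k); v: (batch, seq, d_v); cost is linear in seq length.
    q, k = feature_map(q), feature_map(k)
    kv = torch.einsum("bsd,bse->bde", k, v)                          # sum_s k_s v_s^T
    z = 1.0 / (torch.einsum("bsd,bd->bs", q, k.sum(dim=1)) + 1e-6)   # normalizer
    return torch.einsum("bsd,bde,bs->bse", q, kv, z)
```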

19. Text Classification through Glyph-aware Disentangled Character Embedding and Semantic Sub-character Augmentation [PDF] Back to Contents
  Takumi Aoki, Shunsuke Kitada, Hitoshi Iyatomi
Abstract: We propose a new character-based text classification framework for non-alphabetic languages, such as Chinese and Japanese. Our framework consists of a variational character encoder (VCE) and character-level text classifier. The VCE is composed of a $\beta$-variational auto-encoder ($\beta$-VAE) that learns the proposed glyph-aware disentangled character embedding (GDCE). Since our GDCE provides zero-mean unit-variance character embeddings that are dimensionally independent, it is applicable for our interpretable data augmentation, namely, semantic sub-character augmentation (SSA). In this paper, we evaluated our framework using Japanese text classification tasks at the document- and sentence-level. We confirmed that our GDCE and SSA not only provided embedding interpretability but also improved the classification performance. Our proposal achieved a competitive result to the state-of-the-art model while also providing model interpretability. Our code is available on this https URL

20. Chapter Captor: Text Segmentation in Novels [PDF] Back to Contents
  Charuta Pethe, Allen Kim, Steven Skiena
Abstract: Books are typically segmented into chapters and sections, representing coherent subnarratives and topics. We investigate the task of predicting chapter boundaries, as a proxy for the general task of segmenting long texts. We build a Project Gutenberg chapter segmentation data set of 9,126 English novels, using a hybrid approach combining neural inference and rule matching to recognize chapter title headers in books, achieving an F1-score of 0.77 on this task. Using this annotated data as ground truth after removing structural cues, we present cut-based and neural methods for chapter segmentation, achieving an F1-score of 0.453 on the challenging task of exact break prediction over book-length documents. Finally, we reveal interesting historical trends in the chapter structure of novels.

21. "What Do You Mean by That?" A Parser-Independent Interactive Approach for Enhancing Text-to-SQL [PDF] 返回目录
  Yuntao Li, Bei Chen, Qian Liu, Yan Gao, Jian-Guang Lou, Yan Zhang, Dongmei Zhang
Abstract: In Natural Language Interfaces to Databases systems, the text-to-SQL technique allows users to query databases by using natural language questions. Though significant progress in this area has been made recently, most parsers may fall short when they are deployed in real systems. One main reason stems from the difficulty of fully understanding the users' natural language questions. In this paper, we include human in the loop and present a novel parser-independent interactive approach (PIIA) that interacts with users using multi-choice questions and can easily work with arbitrary parsers. Experiments were conducted on two cross-domain datasets, the WikiSQL and the more complex Spider, with five state-of-the-art parsers. These demonstrated that PIIA is capable of enhancing the text-to-SQL performance with limited interaction turns by using both simulation and human evaluation.

22. CxGBERT: BERT meets Construction Grammar [PDF] Back to Contents
  Harish Tayyar Madabushi, Laurence Romain, Dagmar Divjak, Petar Milin
Abstract: While lexico-semantic elements no doubt capture a large amount of linguistic information, it has been argued that they do not capture all information contained in text. This assumption is central to constructionist approaches to language which argue that language consists of constructions, learned pairings of a form and a function or meaning that are either frequent or have a meaning that cannot be predicted from its component parts. BERT's training objectives give it access to a tremendous amount of lexico-semantic information, and while BERTology has shown that BERT captures certain important linguistic dimensions, there have been no studies exploring the extent to which BERT might have access to constructional information. In this work we design several probes and conduct extensive experiments to answer this question. Our results allow us to conclude that BERT does indeed have access to a significant amount of information, much of which linguists typically call constructional information. The impact of this observation is potentially far-reaching as it provides insights into what deep learning methods learn from text, while also showing that information contained in constructions is redundantly encoded in lexico-semantics.

23. The UCF Podcast Summarization System at TREC 2020 [PDF] Back to Contents
  Kaiqiang Song, Chen Li, Xiaoyang Wang, Dong Yu, Fei Liu
Abstract: We present implementation details of our abstractive summarizers that achieve competitive results on the Podcast Summarization task of TREC 2020. A concise textual summary that captures important information is crucial for users to decide whether to listen to the podcast. Prior work focuses primarily on learning contextualized representations. Instead, we investigate several less-studied aspects of neural abstractive summarization, including (i) the importance of selecting important segments from transcripts to serve as input to the summarizer; (ii) striking a balance between the amount and quality of training instances; (iii) the appropriate summary length and start/end points. We highlight the design considerations behind our system and offer key insights into the strengths and weaknesses of neural abstractive systems. Our results suggest that identifying important segments from transcripts to use as input to an abstractive summarizer is advantageous for summarizing long documents. Our best system achieved a quality rating of 1.559 judged by NIST evaluators---an absolute increase of 0.268 (+21%) over the creator descriptions.

24. What time is it? Temporal Analysis of Novels [PDF] Back to Contents
  Allen Kim, Charuta Pethe, Steven Skiena
Abstract: Recognizing the flow of time in a story is a crucial aspect of understanding it. Prior work related to time has primarily focused on identifying temporal expressions or relative sequencing of events, but here we propose computationally annotating each line of a book with wall clock times, even in the absence of explicit time-descriptive phrases. To do so, we construct a data set of hourly time phrases from 52,183 fictional books. We then construct a time-of-day classification model that achieves an average error of 2.27 hours. Furthermore, we show that by analyzing a book in whole using dynamic programming of breakpoints, we can roughly partition a book into segments that each correspond to a particular time-of-day. This approach improves upon baselines by over two hours. Finally, we apply our model to a corpus of literature categorized by different periods in history, to show interesting trends of hourly activity throughout the past. Among several observations we find that the fraction of events taking place past 10 P.M. jumps after 1880, coincident with the advent of the electric light bulb and city lights.

25. Metrics also Disagree in the Low Scoring Range: Revisiting Summarization Evaluation Metrics [PDF] Back to Contents
  Manik Bhandari, Pranav Gour, Atabak Ashfaq, Pengfei Liu
Abstract: In text summarization, evaluating the efficacy of automatic metrics without human judgments has become recently popular. One exemplar work concludes that automatic metrics strongly disagree when ranking high-scoring summaries. In this paper, we revisit their experiments and find that their observations stem from the fact that metrics disagree in ranking summaries from any narrow scoring range. We hypothesize that this may be because summaries are similar to each other in a narrow scoring range and are thus, difficult to rank. Apart from the width of the scoring range of summaries, we analyze three other properties that impact inter-metric agreement - Ease of Summarization, Abstractiveness, and Coverage. To encourage reproducible research, we make all our analysis code and data publicly available.

26. Exploring End-to-End Differentiable Natural Logic Modeling [PDF] Back to Contents
  Yufei Feng, Zi'ou Zheng, Quan Liu, Michael Greenspan, Xiaodan Zhu
Abstract: We explore end-to-end trained differentiable models that integrate natural logic with neural networks, aiming to keep the backbone of natural language reasoning based on the natural logic formalism while introducing subsymbolic vector representations and neural components. The proposed model adapts module networks to model natural logic operations, which is enhanced with a memory component to model contextual information. Experiments show that the proposed framework can effectively model monotonicity-based reasoning, compared to the baseline neural network models without built-in inductive bias for monotonicity-based reasoning. Our proposed model shows to be robust when transferred from upward to downward inference. We perform further analyses on the performance of the proposed model on aggregation, showing the effectiveness of the proposed subcomponents on helping achieve better intermediate aggregation performance.

27. Stochastic Attention Head Removal: A Simple and Effective Method for Improving Automatic Speech Recognition with Transformers [PDF] Back to Contents
  Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals
Abstract: Recently, Transformers have shown competitive automatic speech recognition (ASR) results. One key factor to the success of these models is the multi-head attention mechanism. However, we observed in trained models, the diagonal attention matrices indicating the redundancy of the corresponding attention heads. Furthermore, we found some architectures with reduced numbers of attention heads have better performance. Since the search for the best structure is time prohibitive, we propose to randomly remove attention heads during training and keep all attention heads at test time, thus the final model can be viewed as an average of models with different architectures. This method gives consistent performance gains on the Wall Street Journal, AISHELL, Switchboard and AMI ASR tasks. On the AISHELL dev/test sets, the proposed method achieves state-of-the-art Transformer results with 5.8%/6.3% word error rates.
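
The core trick described in the abstract is easy to state in code: during training, each attention head's output is dropped with some probability, while at test time all heads are kept. The sketch below is an assumed, simplified version (mask shared across the batch, no rescaling), not the authors' implementation.

```python
# Hedged sketch: stochastic attention head removal applied to stacked head outputs.
import torch

def stochastic_head_removal(head_outputs: torch.Tensor, p_remove: float, training: bool) -> torch.Tensor:
    # head_outputs: (batch, n_heads, seq, d_head); at test time all heads are kept.
    if not training or p_remove <= 0.0:
        return head_outputs
    n_heads = head_outputs.shape[1]
    keep = (torch.rand(n_heads, device=head_outputs.device) > p_remove).float()
    return head_outputs * keep.view(1, n_heads, 1, 1)
```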

28. Adapting a Language Model for Controlled Affective Text Generation [PDF] Back to Contents
  Ishika Singh, Ahsan Barkati, Tushar Goswamy, Ashutosh Modi
Abstract: Humans use language not just to convey information but also to express their inner feelings and mental states. In this work, we adapt state-of-the-art language generation models to generate affective (emotional) text. We posit a model capable of generating affect-driven and topic-focused sentences without losing grammatical correctness as the affect intensity increases. We propose to incorporate emotion as a prior for probabilistic state-of-the-art text generation models such as GPT-2. The model gives a user the flexibility to control the category and intensity of emotion as well as the topic of the generated text. Previous attempts at modelling fine-grained emotions fall out on grammatical correctness at extreme intensities, but our model is resilient to this and delivers robust results at all intensities. We conduct automated evaluations and human studies to test the performance of our model and provide a detailed comparison of the results with other models. In all evaluations, our model outperforms existing affective text generation models.

29. A Gold Standard Methodology for Evaluating Accuracy in Data-To-Text Systems [PDF] Back to Contents
  Craig Thomson, Ehud Reiter
Abstract: Most Natural Language Generation systems need to produce accurate texts. We propose a methodology for high-quality human evaluation of the accuracy of generated texts, which is intended to serve as a gold standard for accuracy evaluations of data-to-text systems. We use our methodology to evaluate the accuracy of computer-generated basketball summaries. We then show how our gold standard evaluation can be used to validate automated metrics.

30. Detecting Emerging Symptoms of COVID-19 using Context-based Twitter Embeddings [PDF] Back to Contents
  Roshan Santosh, H. Andrew Schwartz, Johannes C. Eichstaedt, Lyle H. Ungar, Sharath C. Guntuku
Abstract: In this paper, we present an iterative graph-based approach for the detection of symptoms of COVID-19, the pathology of which seems to be evolving. More generally, the method can be applied to finding context-specific words and texts (e.g. symptom mentions) in large imbalanced corpora (e.g. all tweets mentioning #COVID-19). Given the novelty of COVID-19, we also test if the proposed approach generalizes to the problem of detecting Adverse Drug Reaction (ADR). We find that the approach applied to Twitter data can detect symptom mentions substantially before being reported by the Centers for Disease Control (CDC).

31. On the Practical Ability of Recurrent Neural Networks to Recognize Hierarchical Languages [PDF] Back to Contents
  Satwik Bhattamishra, Kabir Ahuja, Navin Goyal
Abstract: While recurrent models have been effective in NLP tasks, their performance on context-free languages (CFLs) has been found to be quite weak. Given that CFLs are believed to capture important phenomena such as hierarchical structure in natural languages, this discrepancy in performance calls for an explanation. We study the performance of recurrent models on Dyck-n languages, a particularly important and well-studied class of CFLs. We find that while recurrent models generalize nearly perfectly if the lengths of the training and test strings are from the same range, they perform poorly if the test strings are longer. At the same time, we observe that recurrent models are expressive enough to recognize Dyck words of arbitrary lengths in finite precision if their depths are bounded. Hence, we evaluate our models on samples generated from Dyck languages with bounded depth and find that they are indeed able to generalize to much higher lengths. Since natural language datasets have nested dependencies of bounded depth, this may help explain why they perform well in modeling hierarchical dependencies in natural language data despite prior works indicating poor generalization performance on Dyck languages. We perform probing studies to support our results and provide comparisons with Transformers.
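
For readers unfamiliar with Dyck-n, the sketch below samples well-formed Dyck strings with a bounded nesting depth, the kind of bounded-depth data the abstract evaluates on. The bracket symbols and sampling probabilities are arbitrary assumptions, not the authors' generator.

```python
# Hedged sketch: sample a Dyck-n string (n bracket types) with nesting depth <= max_depth.
import random

def sample_dyck(n_types: int, max_depth: int, max_len: int) -> str:
    opens = [chr(ord("a") + i) for i in range(n_types)]   # assumed opening symbols
    closes = [c.upper() for c in opens]                   # matching closing symbols
    stack, out = [], []
    while len(out) < max_len:
        can_open = len(stack) < max_depth
        if stack and (not can_open or random.random() < 0.5):
            out.append(closes[stack.pop()])               # close the most recent bracket
        else:
            t = random.randrange(n_types)
            stack.append(t)
            out.append(opens[t])
    out.extend(closes[t] for t in reversed(stack))        # close anything still open
    return "".join(out)

print(sample_dyck(n_types=2, max_depth=3, max_len=20))
```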

32. Denoising Relation Extraction from Document-level Distant Supervision [PDF] Back to Contents
  Chaojun Xiao, Yuan Yao, Ruobing Xie, Xu Han, Zhiyuan Liu, Maosong Sun, Fen Lin, Leyu Lin
Abstract: Distant supervision (DS) has been widely used to generate auto-labeled data for sentence-level relation extraction (RE), which improves RE performance. However, the existing success of DS cannot be directly transferred to the more challenging document-level relation extraction (DocRE), since the inherent noise in DS may be even multiplied in document level and significantly harm the performance of RE. To address this challenge, we propose a novel pre-trained model for DocRE, which denoises the document-level DS data via multiple pre-training tasks. Experimental results on the large-scale DocRE benchmark show that our model can capture useful information from noisy DS data and achieve promising results.

33. Best Practices for Data-Efficient Modeling in NLG: How to Train Production-Ready Neural Models with Less Data [PDF] Back to Contents
  Ankit Arun, Soumya Batra, Vikas Bhardwaj, Ashwini Challa, Pinar Donmez, Peyman Heidari, Hakan Inan, Shashank Jain, Anuj Kumar, Shawn Mei, Karthik Mohan, Michael White
Abstract: Natural language generation (NLG) is a critical component in conversational systems, owing to its role of formulating a correct and natural text response. Traditionally, NLG components have been deployed using template-based solutions. Although neural network solutions recently developed in the research community have been shown to provide several benefits, deployment of such model-based solutions has been challenging due to high latency, correctness issues, and high data needs. In this paper, we present approaches that have helped us deploy data-efficient neural solutions for NLG in conversational systems to production. We describe a family of sampling and modeling techniques to attain production quality with light-weight neural network models using only a fraction of the data that would be necessary otherwise, and show a thorough comparison between each. Our results show that domain complexity dictates the appropriate approach to achieve high data efficiency. Finally, we distill the lessons from our experimental findings into a list of best practices for production-level NLG model development, and present them in a brief runbook. Importantly, the end products of all of the techniques are small sequence-to-sequence models (2Mb) that we can reliably deploy in production.

34. Explainable Automated Fact-Checking: A Survey [PDF] Back to Contents
  Neema Kotonya, Francesca Toni
Abstract: A number of exciting advances have been made in automated fact-checking thanks to increasingly larger datasets and more powerful systems, leading to improvements in the complexity of claims which can be accurately fact-checked. However, despite these advances, there are still desirable functionalities missing from the fact-checking pipeline. In this survey, we focus on the explanation functionality -- that is fact-checking systems providing reasons for their predictions. We summarize existing methods for explaining the predictions of fact-checking systems and we explore trends in this topic. Further, we consider what makes for good explanations in this specific domain through a comparative analysis of existing fact-checking explanations against some desirable properties. Finally, we propose further research directions for generating fact-checking explanations, and describe how these may lead to improvements in the research area.

35. Knowledge-driven Self-supervision for Zero-shot Commonsense Question Answering [PDF] Back to Contents
  Kaixin Ma, Filip Ilievski, Jonathan Francis, Yonatan Bisk, Eric Nyberg, Alessandro Oltramari
Abstract: Recent developments in pre-trained neural language modeling have led to leaps in accuracy on commonsense question-answering benchmarks. However, there is increasing concern that models overfit to specific tasks, without learning to utilize external knowledge or perform general semantic reasoning. In contrast, zero-shot evaluations have shown promise as a more robust measure of a model's general reasoning abilities. In this paper, we propose a novel neuro-symbolic framework for zero-shot question answering across commonsense tasks. Guided by a set of hypotheses, the framework studies how to transform various pre-existing knowledge resources into a form that is most effective for pre-training models. We vary the set of language models, training regimes, knowledge sources, and data generation strategies, and measure their impact across tasks. Extending on prior work, we devise and compare four constrained distractor-sampling strategies. We provide empirical results across five commonsense question-answering tasks with data generated from five external knowledge resources. We show that, while an individual knowledge graph is better suited for specific tasks, a global knowledge graph brings consistent gains across different tasks. In addition, both preserving the structure of the task as well as generating fair and informative questions help language models learn more effectively.

36. Rethinking the Value of Transformer Components [PDF] Back to Contents
  Wenxuan Wang, Zhaopeng Tu
Abstract: The Transformer has become the state-of-the-art translation model, yet it is not well studied how each intermediate component contributes to model performance, which poses significant challenges for designing optimal architectures. In this work, we bridge this gap by evaluating the impact of individual components (sub-layers) in trained Transformer models from different perspectives. Experimental results across language pairs, training strategies, and model capacities show that certain components are consistently more important than others. We also report a number of interesting findings that might help humans better analyze, understand and improve Transformer models. Based on these observations, we further propose a new training strategy that improves translation performance by distinguishing the unimportant components in training.

37. PairRE: Knowledge Graph Embeddings via Paired Relation Vectors [PDF] Back to Contents
  Linlin Chao, Jianshan He, Taifeng Wang, Wei Chu
Abstract: Distance based knowledge graph embedding methods show promising results on link prediction task, on which two topics have been widely studied: one is the ability to handle complex relations, such as N-to-1, 1-to-N and N-to-N, the other is to encode various relation patterns, such as symmetry/antisymmetry. However, the existing methods fail to solve these two problems at the same time, which leads to unsatisfactory results. To mitigate this problem, we propose PairRE, a model with improved expressiveness and low computational requirement. PairRE represents each relation with paired vectors, where these paired vectors project connected two entities to relation specific locations. Beyond its ability to solve the aforementioned two problems, PairRE is advantageous to represent subrelation as it can capture both the similarities and differences of subrelations effectively. Given simple constraints on relation representations, PairRE can be the first model that is capable of encoding symmetry/antisymmetry, inverse, composition and subrelation relations. Experiments on link prediction benchmarks show PairRE can achieve either state-of-the-art or highly competitive performances. In addition, PairRE has shown encouraging results for encoding subrelation.
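
The paired relation vectors can be read as element-wise re-weightings of the head and tail entity embeddings. A commonly cited form of the PairRE score (stated here as an assumption, since the abstract does not give the formula) is the negative norm of the difference between the two re-weighted embeddings, as sketched below.

```python
# Hedged sketch of a PairRE-style score: relation-specific vectors r_head and r_tail
# re-weight the (typically L2-normalized) entity embeddings element-wise.
import torch

def pairre_score(h, t, r_head, r_tail):
    # h, t, r_head, r_tail: (batch, dim); higher score = more plausible triple.
    return -torch.norm(h * r_head - t * r_tail, p=1, dim=-1)  # norm choice is an assumption
```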

38. AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations [PDF] Back to Contents
  Lifeng Han, Gareth Jones, Alan Smeaton
Abstract: In this work, we present the construction of multilingual parallel corpora with annotation of multiword expressions (MWEs). MWEs include verbal MWEs (vMWEs) defined in the PARSEME shared task that have a verb as the head of the studied terms. The annotated vMWEs are also bilingually and multilingually aligned manually. The languages covered include English, Chinese, Polish, and German. Our original English corpus is taken from the PARSEME shared task in 2018. We performed machine translation of this source corpus followed by human post editing and annotation of target MWEs. Strict quality control was applied for error limitation, i.e., each MT output sentence received first manual post editing and annotation plus second manual quality rechecking. One of our findings during corpora preparation is that accurate translation of MWEs presents challenges to MT systems. To facilitate further MT research, we present a categorisation of the error types encountered by MT systems in performing MWE related translation. To acquire a broader view of MT issues, we selected four popular state-of-the-art MT models for comparisons namely: Microsoft Bing Translator, GoogleMT, Baidu Fanyi and DeepL MT. Because of the noise removal, translation post editing and MWE annotation by human professionals, we believe our AlphaMWE dataset will be an asset for cross-lingual and multilingual research, such as MT and information extraction. Our multilingual corpora are available as open access at this http URL.

39. Know What You Don't Need: Single-Shot Meta-Pruning for Attention Heads [PDF] 返回目录
  Zhengyan Zhang, Fanchao Qi, Zhiyuan Liu, Qun Liu, Maosong Sun
Abstract: Deep pre-trained Transformer models have achieved state-of-the-art results over a variety of natural language processing (NLP) tasks. By learning rich language knowledge with millions of parameters, these models are usually overparameterized and significantly increase the computational overhead in applications. It is intuitive to address this issue by model compression. In this work, we propose a method, called Single-Shot Meta-Pruning, to compress deep pre-trained Transformers before fine-tuning. Specifically, we focus on pruning unnecessary attention heads adaptively for different downstream tasks. To measure the informativeness of attention heads, we train our Single-Shot Meta-Pruner (SMP) with a meta-learning paradigm aiming to maintain the distribution of text representations after pruning. Compared with existing compression methods for pre-trained models, our method can reduce the overhead of both fine-tuning and inference. Experimental results show that our pruner can selectively prune 50% of attention heads with little impact on the performance on downstream tasks and even provide better text representations. The source code will be released in the future.
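As a rough illustration of pruning 50% of attention heads by informativeness, the sketch below keeps the top-scoring half of heads given some per-head score (e.g. produced by a learned pruner); the meta-learned pruner and its distribution-preserving objective from the paper are not reproduced here, so treat the selection rule as an assumption.

```python
import torch

def head_mask_from_scores(head_scores: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Build a binary keep/prune mask over attention heads.

    head_scores: (num_layers, num_heads) tensor of estimated head informativeness.
    Heads whose score falls in the top keep_ratio fraction are kept (mask = 1).
    """
    flat = head_scores.flatten()
    k = max(1, int(keep_ratio * flat.numel()))
    threshold = flat.topk(k).values.min()
    return (head_scores >= threshold).float()

# example: prune half of the heads of a 12-layer, 12-head model
scores = torch.rand(12, 12)
mask = head_mask_from_scores(scores, keep_ratio=0.5)
print(mask.sum().item(), "heads kept out of", mask.numel())
```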

40. NLP-CIC @ PRELEARN: Mastering prerequisites relations, from handcrafted features to embeddings [PDF] 返回目录
  Jason Angel, Segun Taofeek Aroyehun, Alexander Gelbukh
Abstract: We present our systems and findings for the prerequisite relation learning task (PRELEARN) at EVALITA 2020. The task is to classify whether a pair of concepts holds a prerequisite relation or not. We model the problem using handcrafted features and embedding representations for in-domain and cross-domain scenarios. Our submissions ranked first in both scenarios, with average F1 scores across domains of 0.887 and 0.690, respectively, on the test sets. We have made our code freely available.

41. NLP-CIC @ DIACR-Ita: POS and Neighbor Based Distributional Models for Lexical Semantic Change in Diachronic Italian Corpora [PDF] 返回目录
  Jason Angel, Carlos A. Rodriguez-Diaz, Alexander Gelbukh, Sergio Jimenez
Abstract: We present our systems and findings on unsupervised lexical semantic change for the Italian language in the DIACR-Ita shared task at EVALITA 2020. The task is to determine whether a target word has evolved its meaning over time, relying only on raw text from two time-specific datasets. We propose two models that represent the target words across the two periods and predict the changing words using threshold and voting schemes. Our first model relies solely on part-of-speech usage and an ensemble of distance measures. The second model uses word embedding representations to extract a word's relative distances to its neighbors across the two spaces, and proposes "the average of absolute differences" to estimate lexical semantic change. Our models achieved competent results, ranking third in the DIACR-Ita competition. Furthermore, we experiment with the k_neighbor parameter of our second model to compare the impact of using "the average of absolute differences" versus the cosine distance used in Hamilton et al. (2016).
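A minimal sketch of the neighbor-based score described above, assuming we already have, for each time period, the target word's cosine distances to the same set of k neighbor words; the k_neighbor selection and the POS-based first model are not shown.

```python
import numpy as np

def avg_abs_difference(dist_period1, dist_period2):
    """'Average of absolute differences' between a target word's relative distances
    to the same k neighbor words in two time-specific embedding spaces.
    A larger value suggests a larger lexical semantic change.
    """
    d1 = np.asarray(dist_period1, dtype=float)
    d2 = np.asarray(dist_period2, dtype=float)
    return float(np.mean(np.abs(d1 - d2)))

# toy example with k = 4 shared neighbors
print(avg_abs_difference([0.10, 0.32, 0.55, 0.40], [0.60, 0.35, 0.20, 0.45]))
```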

42. Naturalization of Text by the Insertion of Pauses and Filler Words [PDF] 返回目录
  Richa Sharma, Parth Vipul Shah, Ashwini M. Joshi
Abstract: In this article, we introduce a set of methods to naturalize text based on natural human speech. Voice-based interactions provide a natural way of interfacing with electronic systems and are seeing a widespread adaptation of late. These computerized voices can be naturalized to some degree by inserting pauses and filler words at appropriate positions. The first proposed text transformation method uses the frequency of bigrams in the training data to make appropriate insertions in the input sentence. It uses a probability distribution to choose the insertions from a set of all possible insertions. This method is fast and can be included before a Text-To-Speech module. The second method uses a Recurrent Neural Network to predict the next word to be inserted. It confirms the insertions given by the bigram method. Additionally, the degree of naturalization can be controlled in both these methods. On the conduction of a blind survey, we conclude that the output of these text transformation methods is comparable to natural speech.
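The bigram-frequency method can be pictured roughly as follows; the filler inventory, the exact probability model, and the handling of pauses are assumptions, shown only to make the insertion step concrete.

```python
import random

def insert_fillers(tokens, bigram_counts, unigram_counts, fillers=("um", "uh", "well")):
    """Insert filler words after tokens according to bigram frequencies from speech data.

    The chance of inserting filler f after token w is count(w, f) / count(w);
    with the remaining probability mass nothing is inserted.
    """
    out = []
    for tok in tokens:
        out.append(tok)
        total = unigram_counts.get(tok, 0)
        if total == 0:
            continue
        r = random.random() * total
        acc = 0
        for f in fillers:
            acc += bigram_counts.get((tok, f), 0)
            if r < acc:
                out.append(f)
                break
    return out

# toy counts: "think" was followed by "um" 30 times out of 100 occurrences
print(insert_fillers(["i", "think", "so"],
                     {("think", "um"): 30},
                     {"i": 200, "think": 100, "so": 150}))
```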

43. Acoustics Based Intent Recognition Using Discovered Phonetic Units for Low Resource Languages [PDF] 返回目录
  Akshat Gupta, Xinjian Li, Sai Krishna Rallabandi, Alan W Black
Abstract: With recent advancements in language technologies, humans are now interacting with technology through speech. To increase the reach of these technologies, we need to build such systems in local languages. A major bottleneck here are the underlying data-intensive parts that make up such systems, including automatic speech recognition (ASR) systems that require large amounts of labelled data. With the aim of aiding development of dialog systems in low resourced languages, we propose a novel acoustics based intent recognition system that uses discovered phonetic units for intent classification. The system is made up of two blocks - the first block generates a transcript of discovered phonetic units for the input audio, and the second block performs intent classification from the generated phonemic transcripts. Our work presents results for such a system for two language families - Indic languages and Romance languages, for two different intent recognition tasks. We also perform multilingual training of our intent classifier and show improved cross-lingual transfer and performance on an unknown language with zero resources in the same language family.

44. Hostility Detection Dataset in Hindi [PDF] 返回目录
  Mohit Bhardwaj, Md Shad Akhtar, Asif Ekbal, Amitava Das, Tanmoy Chakraborty
Abstract: In this paper, we present a novel hostility detection dataset in Hindi language. We collect and manually annotate ~8200 online posts. The annotated dataset covers four hostility dimensions: fake news, hate speech, offensive, and defamation posts, along with a non-hostile label. The hostile posts are also considered for multi-label tags due to a significant overlap among the hostile classes. We release this dataset as part of the CONSTRAINT-2021 shared task on hostile post detection.

45. Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis [PDF] 返回目录
  Ron J. Weiss, RJ Skerry-Ryan, Eric Battenberg, Soroosh Mariooryad, Diederik P. Kingma
Abstract: We describe a sequence-to-sequence neural network which can directly generate speech waveforms from text inputs. The architecture extends the Tacotron model by incorporating a normalizing flow into the autoregressive decoder loop. Output waveforms are modeled as a sequence of non-overlapping fixed-length frames, each one containing hundreds of samples. The interdependencies of waveform samples within each frame are modeled using the normalizing flow, enabling parallel training and synthesis. Longer-term dependencies are handled autoregressively by conditioning each flow on preceding frames. This model can be optimized directly with maximum likelihood, without using intermediate, hand-designed features nor additional loss terms. Contemporary state-of-the-art text-to-speech (TTS) systems use a cascade of separately learned models: one (such as Tacotron) which generates intermediate features (such as spectrograms) from text, followed by a vocoder (such as WaveRNN) which generates waveform samples from the intermediate features. The proposed system, in contrast, does not use a fixed intermediate representation, and learns all parameters end-to-end. Experiments show that the proposed model generates speech with quality approaching a state-of-the-art neural TTS system, with significantly improved generation speed.

46. Artificial Intelligence Decision Support for Medical Triage [PDF] 返回目录
  Chiara Marchiori, Douglas Dykeman, Ivan Girardi, Adam Ivankay, Kevin Thandiackal, Mario Zusag, Andrea Giovannini, Daniel Karpati, Henri Saenz
Abstract: Applying state-of-the-art machine learning and natural language processing to approximately one million teleconsultation records, we developed a triage system, now certified and in use at the largest European telemedicine provider. The system evaluates care alternatives through interactions with patients via a mobile application. Reasoning on an initial set of provided symptoms, the triage application generates AI-powered, personalized questions to better characterize the problem and recommends the most appropriate point of care and time frame for a consultation. The underlying technology was developed to meet the needs for performance, transparency, user acceptance and ease of use, central aspects for the adoption of AI-based decision support systems. Providing such remote guidance at the beginning of the chain of care has significant potential for improving cost efficiency, patient experience and outcomes. Being remote, always available and highly scalable, this service is fundamental in high-demand situations, such as the current COVID-19 outbreak.

47. Mask Proxy Loss for Text-Independent Speaker Recognition [PDF] 返回目录
  Jiachen Lian, Aiswarya Vinod Kumar, Hira Dhamyal, Bhiksha Raj, Rita Singh
Abstract: Open-set speaker recognition can be regarded as a metric learning problem: maximize inter-class variance and minimize intra-class variance. Supervised metric learning can be categorized into entity-based learning and proxy-based learning (different from the definition in [Proxyanchor], we adopt the concept of entity-based rather than pair-based learning to describe the data-to-data relationship; an entity refers to a real data point). Most existing metric learning objectives, such as Contrastive, Triplet, Prototypical and GE2E losses, belong to the former category, whose performance is either highly dependent on the sample mining strategy or restricted by insufficient label information within the mini-batch. Proxy-based losses mitigate both shortcomings; however, fine-grained connections among entities are either not leveraged or only leveraged indirectly. This paper proposes a Mask Proxy (MP) loss which directly incorporates both proxy-based and entity-based relationships. We further propose a Multinomial Mask Proxy (MMP) loss to leverage the hardness of entity-to-entity pairs. These methods are evaluated on the VoxCeleb test set and reach state-of-the-art Equal Error Rate (EER).
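To make the proxy idea concrete, here is a generic proxy-based metric-learning loss (a normalized-softmax / ProxyNCA-style objective) in which each class owns a learnable proxy vector. This is background for, not a reproduction of, the proposed Mask Proxy and Multinomial Mask Proxy losses, which additionally incorporate entity-to-entity relationships; the scale factor and cosine-similarity formulation are assumptions.

```python
import torch.nn.functional as F

def proxy_softmax_loss(embeddings, labels, proxies, scale=16.0):
    """Generic proxy-based metric-learning loss.

    embeddings: (batch, dim) speaker embeddings; proxies: (num_classes, dim)
    learnable class proxies. Each embedding is pulled toward its own class proxy
    and pushed away from all other proxies via a softmax over cosine similarities.
    """
    emb = F.normalize(embeddings, dim=-1)
    prox = F.normalize(proxies, dim=-1)
    logits = scale * emb @ prox.t()          # cosine similarity to every class proxy
    return F.cross_entropy(logits, labels)
```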

48. Knowledge Distillation for Singing Voice Detection [PDF] 返回目录
  Soumava Paul, Gurunath Reddy M, K Sreenivasa Rao, Partha Pratim Das
Abstract: Singing Voice Detection (SVD) has been an active area of research in music information retrieval (MIR). Currently, two deep neural network-based methods, one based on a CNN and the other on an RNN, exist in the literature that learn optimized features for the voice detection (VD) task and achieve state-of-the-art performance on common datasets. Both models have a huge number of parameters (1.4M for the CNN and 65.7K for the RNN) and are hence not suitable for deployment on devices such as smartphones or embedded sensors with limited memory and computation power. The most popular method to address this issue is known as knowledge distillation in the deep learning literature (in addition to model compression), where a large pretrained network known as the teacher is used to train a smaller student network. However, to the best of our knowledge, such methods have not yet been explored in the domain of SVD. In this paper, we investigate this issue using both conventional and ensemble knowledge distillation techniques. Through extensive experimentation on the publicly available Jamendo dataset, we show that not only is it possible to achieve comparable accuracies with far smaller models (up to 1000x smaller in terms of parameters) but, fascinatingly, in some cases smaller models trained with distillation even surpass the current state-of-the-art models in voice detection performance.
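The abstract builds on standard knowledge distillation; a common form of that objective (not necessarily the exact variant or the ensemble setup used in the paper, and with illustrative temperature and weighting values) looks like this:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=3.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and the KL divergence between the
    student's and teacher's temperature-softened output distributions."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # scale to keep gradient magnitudes comparable across temperatures
    return alpha * hard + (1 - alpha) * soft
```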

49. Gated Recurrent Fusion with Joint Training Framework for Robust End-to-End Speech Recognition [PDF] 返回目录
  Cunhang Fan, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Bin Liu, Zhengqi Wen
Abstract: Joint training frameworks for speech enhancement and recognition have obtained quite good performance for robust end-to-end automatic speech recognition (ASR). However, these methods use only the enhanced feature as the input of the speech recognition component and are therefore affected by the speech distortion problem. To address this problem, this paper proposes a gated recurrent fusion (GRF) method with a joint training framework for robust end-to-end ASR. The GRF algorithm is used to dynamically combine the noisy and enhanced features. Therefore, the GRF can not only remove noise signals from the enhanced features, but also learn the raw fine structures from the noisy features, so that it can alleviate the speech distortion. The proposed method consists of speech enhancement, GRF and speech recognition. Firstly, a mask-based speech enhancement network is applied to enhance the input speech. Secondly, the GRF is applied to address the speech distortion problem. Thirdly, to improve the performance of ASR, the state-of-the-art speech Transformer algorithm is used as the speech recognition component. Finally, the joint training framework is utilized to optimize these three components simultaneously. Our experiments are conducted on an open-source Mandarin speech corpus called AISHELL-1. Experimental results show that the proposed method achieves a relative character error rate (CER) reduction of 10.04% over the conventional joint enhancement and Transformer method that uses only the enhanced features. Especially at low signal-to-noise ratio (0 dB), our proposed method achieves better performance with a 12.67% CER reduction, which suggests the potential of our proposed method.
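A toy gated fusion layer conveys the core intuition of dynamically combining the noisy and enhanced features; the actual GRF block is recurrent and jointly trained with the enhancement and recognition components, so treat this purely as an illustration of the gating idea, with invented layer and variable names.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Toy gated fusion of noisy and enhanced speech features.

    A learned gate decides, per frame and per dimension, how much to trust the
    enhanced feature versus the original noisy one.
    """
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, noisy, enhanced):
        g = torch.sigmoid(self.gate(torch.cat([noisy, enhanced], dim=-1)))
        return g * enhanced + (1 - g) * noisy

# usage on a batch of 4 utterances, 100 frames, 80-dim filterbank features
fusion = GatedFusion(dim=80)
fused = fusion(torch.randn(4, 100, 80), torch.randn(4, 100, 80))
print(fused.shape)
```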

50. Long Range Arena: A Benchmark for Efficient Transformers [PDF] 返回目录
  Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler
Abstract: Transformers do not scale very well to long sequence lengths, largely because of quadratic self-attention complexity. In recent months, a wide spectrum of efficient, fast Transformers have been proposed to tackle this problem, more often than not claiming superior or comparable model quality to vanilla Transformer models. To date, there is no well-established consensus on how to evaluate this class of models. Moreover, inconsistent benchmarking on a wide spectrum of tasks and datasets makes it difficult to assess relative model quality among the many models. This paper proposes a systematic and unified benchmark, LRA, specifically focused on evaluating model quality under long-context scenarios. Our benchmark is a suite of tasks consisting of sequences ranging from 1K to 16K tokens, encompassing a wide range of data types and modalities such as text, natural and synthetic images, and mathematical expressions, requiring similarity, structural, and visual-spatial reasoning. We systematically evaluate ten well-established long-range Transformer models (Reformers, Linformers, Linear Transformers, Sinkhorn Transformers, Performers, Synthesizers, Sparse Transformers, and Longformers) on our newly proposed benchmark suite. LRA paves the way towards better understanding this class of efficient Transformer models, facilitates more research in this direction, and presents new challenging tasks to tackle. Our benchmark code will be released at this https URL.

51. DyERNIE: Dynamic Evolution of Riemannian Manifold Embeddings for Temporal Knowledge Graph Completion [PDF] 返回目录
  Zhen Han, Peng Chen, Yunpu Ma, Volker Tresp
Abstract: There has recently been increasing interest in learning representations of temporal knowledge graphs (KGs), which record the dynamic relationships between entities over time. Temporal KGs often exhibit multiple simultaneous non-Euclidean structures, such as hierarchical and cyclic structures. However, existing embedding approaches for temporal KGs typically learn entity representations and their dynamic evolution in Euclidean space, which might not capture such intrinsic structures very well. To this end, we propose DyERNIE, a non-Euclidean embedding approach that learns evolving entity representations in a product of Riemannian manifolds, where the composed spaces are estimated from the sectional curvatures of the underlying data. Product manifolds enable our approach to better reflect a wide variety of geometric structures in temporal KGs. Besides, to capture the evolutionary dynamics of temporal KGs, we let the entity representations evolve according to a velocity vector defined in the tangent space at each timestamp. We analyze in detail the contribution of geometric spaces to representation learning of temporal KGs and evaluate our model on temporal knowledge graph completion tasks. Extensive experiments on three real-world datasets demonstrate significantly improved performance, indicating that the dynamics of multi-relational graph data can be modeled more appropriately by evolving embeddings on Riemannian manifolds.

52. Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles [PDF] 返回目录
  Christopher Clark, Mark Yatskar, Luke Zettlemoyer
Abstract: Many datasets have been shown to contain incidental correlations created by idiosyncrasies in the data collection process. For example, sentence entailment datasets can have spurious word-class correlations if nearly all contradiction sentences contain the word "not", and image recognition datasets can have tell-tale object-background correlations if dogs are always indoors. In this paper, we propose a method that can automatically detect and ignore these kinds of dataset-specific patterns, which we call dataset biases. Our method trains a lower capacity model in an ensemble with a higher capacity model. During training, the lower capacity model learns to capture relatively shallow correlations, which we hypothesize are likely to reflect dataset bias. This frees the higher capacity model to focus on patterns that should generalize better. We ensure the models learn non-overlapping approaches by introducing a novel method to make them conditionally independent. Importantly, our approach does not require the bias to be known in advance. We evaluate performance on synthetic datasets, and four datasets built to penalize models that exploit known biases on textual entailment, visual question answering, and image recognition tasks. We show improvement in all settings, including a 10 point gain on the visual question answering dataset.
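One common way to realize such a mixed-capacity ensemble is a product of experts over the two models' predictions, trained jointly so that the low-capacity model absorbs shallow, dataset-specific correlations while the high-capacity model (used alone at test time) focuses on patterns that generalize. The sketch below follows that recipe under those assumptions and omits the paper's conditional-independence mechanism.

```python
import torch.nn.functional as F

def ensemble_training_loss(high_logits, low_logits, labels):
    """Product-of-experts training loss for a high-capacity model ensembled with a
    lower-capacity one: the two log-probability distributions are added and the
    cross-entropy of the combined prediction is minimized."""
    combined = F.log_softmax(high_logits, dim=-1) + F.log_softmax(low_logits, dim=-1)
    return F.cross_entropy(combined, labels)  # at test time, only high_logits are used
```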

53. Sim-to-Real Transfer for Vision-and-Language Navigation [PDF] 返回目录
  Peter Anderson, Ayush Shrivastava, Joanne Truong, Arjun Majumdar, Devi Parikh, Dhruv Batra, Stefan Lee
Abstract: We study the challenging problem of releasing a robot in a previously unseen environment, and having it follow unconstrained natural language navigation instructions. Recent work on the task of Vision-and-Language Navigation (VLN) has achieved significant progress in simulation. To assess the implications of this work for robotics, we transfer a VLN agent trained in simulation to a physical robot. To bridge the gap between the high-level discrete action space learned by the VLN agent, and the robot's low-level continuous action space, we propose a subgoal model to identify nearby waypoints, and use domain randomization to mitigate visual domain differences. For accurate sim and real comparisons in parallel environments, we annotate a 325m2 office space with 1.3km of navigation instructions, and create a digitized replica in simulation. We find that sim-to-real transfer to an environment not seen in training is successful if an occupancy map and navigation graph can be collected and annotated in advance (success rate of 46.8% vs. 55.9% in sim), but much more challenging in the hardest setting with no prior mapping at all (success rate of 22.5%).

54. Template Controllable keywords-to-text Generation [PDF] 返回目录
  Abhijit Mishra, Md Faisal Mahbub Chowdhury, Sagar Manohar, Dan Gutfreund, Karthik Sankaranarayanan
Abstract: This paper proposes a novel neural model for the understudied task of generating text from keywords. The model takes as input a set of un-ordered keywords, and part-of-speech (POS) based template instructions. This makes it ideal for surface realization in any NLG setup. The framework is based on the encode-attend-decode paradigm, where keywords and templates are encoded first, and the decoder judiciously attends over the contexts derived from the encoded keywords and templates to generate the sentences. Training exploits weak supervision, as the model trains on a large amount of labeled data with keywords and POS based templates prepared through completely automatic means. Qualitative and quantitative performance analyses on publicly available test-data in various domains reveal our system's superiority over baselines, built using state-of-the-art neural machine translation and controllable transfer techniques. Our approach is indifferent to the order of input keywords.
