
[arXiv Papers] Computation and Language 2020-12-11

Contents

1. Automatic Standardization of Colloquial Persian [PDF] Abstract
2. Exploring Pair-Wise NMT for Indian Languages [PDF] Abstract
3. Multi-Sense Language Modelling [PDF] Abstract
4. Longitudinal Citation Prediction using Temporal Graph Neural Networks [PDF] Abstract
5. Towards Coinductive Models for Natural Language Understanding. Bringing together Deep Learning and Deep Semantics [PDF] Abstract
6. Direct multimodal few-shot learning of speech and images [PDF] Abstract
7. As good as new. How to successfully recycle English GPT-2 to make models for other languages [PDF] Abstract
8. Approches quantitatives de l'analyse des prédictions en traduction automatique neuronale (TAN) [PDF] Abstract
9. An Event Correlation Filtering Method for Fake News Detection [PDF] Abstract
10. Causal-BERT: Language models for causality detection between events expressed in text [PDF] Abstract
11. A Framework for Generating Annotated Social Media Corpora with Demographics, Stance, Civility, and Topicality [PDF] Abstract
12. Empirical Analysis of Unlabeled Entity Problem in Named Entity Recognition [PDF] Abstract
13. Segmenting Natural Language Sentences via Lexical Unit Analysis [PDF] Abstract
14. Rewriter-Evaluator Framework for Neural Machine Translation [PDF] Abstract
15. Infusing Finetuning with Semantic Dependencies [PDF] Abstract
16. Speech Recognition for Endangered and Extinct Samoyedic languages [PDF] Abstract
17. Normalization of Different Swedish Dialects Spoken in Finland [PDF] Abstract
18. Generative Adversarial Networks for Annotated Data Augmentation in Data Sparse NLU [PDF] Abstract
19. Cross-lingual Word Sense Disambiguation using mBERT Embeddings with Syntactic Dependencies [PDF] Abstract
20. Bew: Towards Answering Business-Entity-Related Web Questions [PDF] Abstract
21. Generative Deep Learning Techniques for Password Generation [PDF] Abstract
22. Recurrent Point Review Models [PDF] Abstract
23. AI Driven Knowledge Extraction from Clinical Practice Guidelines: Turning Research into Practice [PDF] Abstract
24. Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition [PDF] Abstract
25. Research Challenges in Designing Differentially Private Text Generation Mechanisms [PDF] Abstract
26. Topological Planning with Transformers for Vision-and-Language Navigation [PDF] Abstract

Abstracts

1. Automatic Standardization of Colloquial Persian [PDF] Back to Contents
  Mohammad Sadegh Rasooli, Farzane Bakhtyari, Fatemeh Shafiei, Mahsa Ravanbakhsh, Chris Callison-Burch
Abstract: The Iranian Persian language has two varieties: standard and colloquial. Most natural language processing tools for Persian assume that the text is in standard form; this assumption is wrong in many real applications, especially for web content. This paper describes a simple and effective standardization approach based on sequence-to-sequence translation. We design an algorithm for generating artificial parallel colloquial-to-standard data for learning a sequence-to-sequence model. Moreover, we annotate a publicly available evaluation dataset consisting of 1912 sentences from a diverse set of domains. In our intrinsic evaluation, our model reaches a BLEU score of 62.8, versus 61.7 for an off-the-shelf rule-based standardization model, where the original text has a BLEU score of 46.4. We also show that our model improves English-to-Persian machine translation in scenarios where the training data comes from colloquial Persian, with an absolute BLEU gain of 1.4 on the development data and 0.8 on the test data.
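
The abstract does not spell out the data-generation algorithm, so the following is only a minimal sketch of the general recipe under assumed details: noise standard sentences with a hand-written table of rewrite rules to obtain (colloquial, standard) pairs for seq2seq training. The rule strings and the application probability are hypothetical placeholders, not the paper's actual Persian rules.

```python
import random

# Hypothetical standard -> colloquial rewrite rules (placeholders only).
RULES = [
    ("standard_form_1", "colloquial_form_1"),
    ("standard_form_2", "colloquial_form_2"),
]

def make_pair(standard_sentence, p=0.7):
    """Stochastically colloquialize a standard sentence to create a
    (source, target) pair for training a colloquial->standard model."""
    colloquial = standard_sentence
    for standard, colloq in RULES:
        if standard in colloquial and random.random() < p:
            colloquial = colloquial.replace(standard, colloq)
    return colloquial, standard_sentence  # (source, target) for seq2seq
```

Pairs produced this way can be fed to any sequence-to-sequence toolkit with the colloquial side as the source and the standard side as the target.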

2. Exploring Pair-Wise NMT for Indian Languages [PDF] Back to Contents
  Kartheek Akella, Sai Himal Allu, Sridhar Suresh Ragupathi, Aman Singhal, Zeeshan Khan, Vinay P. Namboodiri, C V Jawahar
Abstract: In this paper, we address the task of improving pair-wise machine translation for specific low-resource Indian languages. Multilingual NMT models have demonstrated reasonable effectiveness on resource-poor languages. In this work, we show that the performance of these models can be significantly improved by using a filtered back-translation process and subsequent fine-tuning on the limited pair-wise language corpora. The analysis in this paper suggests that this method can significantly improve a multilingual model's performance over its baseline, yielding state-of-the-art results for various Indian languages.

3. Multi-Sense Language Modelling [PDF] Back to Contents
  Andrea Lekkas, Peter Schneider-Kamp, Isabelle Augenstein
Abstract: The effectiveness of a language model is influenced by its token representations, which must encode contextual information and handle the same word form having a plurality of meanings (polysemy). Currently, none of the common language modelling architectures explicitly model polysemy. We propose a language model which not only predicts the next word, but also its sense in context. We argue that this higher prediction granularity may be useful for end tasks such as assistive writing, and allow for a more precise linking of language models with knowledge bases. We find that multi-sense language modelling requires architectures that go beyond standard language models, and here propose a structured prediction framework that decomposes the task into a word prediction followed by a sense prediction task. For sense prediction, we utilise a Graph Attention Network, which encodes definitions and example uses of word senses. Overall, we find that multi-sense language modelling is a highly challenging task, and suggest that future work focus on the creation of more annotated training datasets.
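
As a rough illustration of the word-then-sense decomposition, here is a minimal PyTorch sketch. The plain sense head below stands in for the paper's Graph Attention Network sense encoder, and all sizes and layer choices are assumptions:

```python
import torch
import torch.nn as nn

class MultiSenseLM(nn.Module):
    """Toy two-stage predictor: next word, then its sense."""
    def __init__(self, vocab_size, n_senses, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.word_head = nn.Linear(hidden, vocab_size)   # step 1: next-word logits
        self.sense_head = nn.Linear(hidden, n_senses)    # step 2: sense logits

    def forward(self, tokens):  # tokens: (batch, seq_len) int64
        h, _ = self.rnn(self.embed(tokens))
        # Training would sum the word and sense cross-entropy losses; at
        # inference, sense logits would be restricted to the sense
        # inventory of the predicted word.
        return self.word_head(h), self.sense_head(h)
```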

4. Longitudinal Citation Prediction using Temporal Graph Neural Networks [PDF] Back to Contents
  Andreas Nugaard Holm, Barbara Plank, Dustin Wright, Isabelle Augenstein
Abstract: Citation count prediction is the task of predicting the number of citations a paper has gained after a period of time. Prior work viewed this as a static prediction task. As papers and their citations evolve over time, considering the dynamics of the number of citations a paper will receive would seem logical. Here, we introduce the task of sequence citation prediction, where the goal is to accurately predict the trajectory of the number of citations a scholarly work receives over time. We propose to view papers as a structured network of citations, allowing us to use topological information as a learning signal. Additionally, we learn how this dynamic citation network changes over time and the impact of paper meta-data such as authors, venues and abstracts. To approach the introduced task, we derive a dynamic citation network from Semantic Scholar which spans over 42 years. We present a model which exploits topological and temporal information using graph convolution networks paired with sequence prediction, and compare it against multiple baselines, testing the importance of topological and temporal information and analyzing model performance. Our experiments show that leveraging both the temporal and topological information greatly increases the performance of predicting citation counts over time.

5. Towards Coinductive Models for Natural Language Understanding. Bringing together Deep Learning and Deep Semantics [PDF] Back to Contents
  Wlodek W. Zadrozny
Abstract: This article contains a proposal to add coinduction to the computational apparatus of natural language understanding. This, we argue, will provide a basis for more realistic, computationally sound, and scalable models of natural language dialogue, syntax and semantics. Given that the bottom-up, inductively constructed semantic and syntactic structures are brittle, and seemingly incapable of adequately representing the meaning of longer sentences or realistic dialogues, natural language understanding is in need of a new foundation. Coinduction, which uses top-down constraints, has been successfully used in the design of operating systems and programming languages. Moreover, it has been implicitly present in text mining, machine translation, and in some attempts to model intensionality and modalities, which provides evidence that it works. This article shows high-level formalizations of some of such uses. Since coinduction and induction can coexist, they can provide a common language and a conceptual model for research in natural language understanding. In particular, such an opportunity seems to be emerging in research on compositionality. This article shows several examples of the joint appearance of induction and coinduction in natural language processing. We argue that the known individual limitations of induction and coinduction can be overcome in empirical settings by a combination of the two methods. We see an open problem in providing a theory of their joint use.

6. Direct multimodal few-shot learning of speech and images [PDF] Back to Contents
  Leanne Nortje, Herman Kamper
Abstract: We propose direct multimodal few-shot models that learn a shared embedding space of spoken words and images from only a few paired examples. Imagine an agent is shown an image along with a spoken word describing the object in the picture, e.g. pen, book and eraser. After observing a few paired examples of each class, the model is asked to identify the "book" in a set of unseen pictures. Previous work used a two-step indirect approach relying on learned unimodal representations: speech-speech and image-image comparisons are performed across the support set of given speech-image pairs. We propose two direct models which instead learn a single multimodal space where inputs from different modalities are directly comparable: a multimodal triplet network (MTriplet) and a multimodal correspondence autoencoder (MCAE). To train these direct models, we mine speech-image pairs: the support set is used to pair up unlabelled in-domain speech and images. In a speech-to-image digit matching task, direct models outperform indirect models, with the MTriplet achieving the best multimodal five-shot accuracy. We show that the improvements are due to the combination of unsupervised and transfer learning in the direct models, and the absence of two-step compounding errors.
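
The MTriplet objective can be pictured as a standard cross-modal triplet loss; the sketch below is a simplification with assumed details (the encoder architectures, the normalization, and the margin value are not taken from the paper):

```python
import torch
import torch.nn.functional as F

def multimodal_triplet_loss(speech_enc, image_enc, speech, pos_img, neg_img,
                            margin=0.2):
    """speech_enc / image_enc: any encoders mapping into a shared space.
    The (speech, matching image, mismatched image) triples would come from
    the mined pairs over the support set, as the abstract describes."""
    a = F.normalize(speech_enc(speech), dim=-1)   # anchor: spoken-word embedding
    p = F.normalize(image_enc(pos_img), dim=-1)   # positive: paired image
    n = F.normalize(image_enc(neg_img), dim=-1)   # negative: image of another class
    return F.triplet_margin_loss(a, p, n, margin=margin)
```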

7. As good as new. How to successfully recycle English GPT-2 to make models for other languages [PDF] Back to Contents
  Wietse de Vries, Malvina Nissim
Abstract: Large generative language models have been very successful for English, but other languages lag behind due to data and computational limitations. We propose a method that may overcome these problems by adapting existing pre-trained language models to new languages. Specifically, we describe the adaptation of English GPT-2 to Italian and Dutch by retraining lexical embeddings without tuning the Transformer layers. As a result, we obtain lexical embeddings for Italian and Dutch that are aligned with the original English lexical embeddings, and induce a bilingual lexicon from this alignment. Additionally, we show how to scale up complexity by transforming the relearned lexical embeddings of GPT-2 small to the GPT-2 medium embedding space. This method minimises the amount of training and prevents information learned by GPT-2 from being lost during adaptation. English GPT-2 models with relearned lexical embeddings can generate realistic sentences in Italian and Dutch, but on average these sentences are still identifiable as artificial by humans. Based on perplexity scores and human judgements, we find that generated sentences become more realistic with some additional full model finetuning, especially for Dutch. For Italian, we see that they are evaluated on par with sentences generated by a GPT-2 model fully trained from scratch. Our work can be conceived as a blueprint for training GPT-2s for other languages, and we provide a 'recipe' to do so.
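
A minimal sketch of this recycling setup, assuming the Hugging Face GPT-2 implementation (the paper itself does not prescribe a toolkit): freeze all Transformer parameters and retrain only the token embeddings on target-language text, with a target-language tokenizer of the same vocabulary size assumed to be trained separately.

```python
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")

for param in model.parameters():
    param.requires_grad = False                      # freeze everything ...
model.transformer.wte.weight.requires_grad = True    # ... except token embeddings
# The output head is weight-tied to wte in GPT-2, so it updates along with it.
# Training then proceeds with the usual causal-LM loss on target-language text.
```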

8. Approches quantitatives de l'analyse des prédictions en traduction automatique neuronale (TAN) [PDF] Back to Contents
  Maria Zimina-Poirot, Nicolas Ballier, Jean-Baptiste Yunès
Abstract: As part of a larger project on optimal learning conditions in neural machine translation, we investigate characteristic training phases of translation engines. All our experiments are carried out using OpenNMT-Py: the pre-processing step is implemented using the Europarl training corpus and the INTERSECT corpus is used for validation. Longitudinal analyses of training phases suggest that the progression of translations is not always linear. Following the results of textometric explorations, we identify the importance of the phenomena related to chronological progression, in order to map different processes at work in neural machine translation (NMT).

9. An Event Correlation Filtering Method for Fake News Detection [PDF] Back to Contents
  Hao Li, Huan Wang, Guanghua Liu
Abstract: Nowadays, social network platforms have become the prime source for people to experience news and events, due to their capacity to spread information rapidly, which inevitably provides fertile ground for the dissemination of fake news. It is therefore important to detect fake news, which could otherwise mislead the public and cause panic. Existing deep learning models have achieved great progress in tackling the problem of fake news detection. However, training an effective deep learning model usually requires a large amount of labeled news, and it is expensive and time-consuming to provide sufficiently many labeled examples in actual applications. To improve the detection performance on fake news, we take advantage of the event correlations of news and propose an event correlation filtering method (ECFM) for fake news detection, mainly consisting of a news characterizer, a pseudo label annotator, an event credibility updater, and a news entropy selector. The news characterizer is responsible for extracting textual features from news; it cooperates with the pseudo label annotator to assign pseudo labels to unlabeled news by fully exploiting the event correlations of news. In addition, the event credibility updater employs an adaptive Kalman filter to damp the credibility fluctuations of events. To further improve the detection performance, the news entropy selector automatically discovers high-quality samples from the pseudo-labeled news by quantifying their news entropy. Finally, ECFM integrates these components to detect fake news in an event correlation filtering manner. Extensive experiments prove that the explainable introduction of the event correlations of news is beneficial for improving the detection performance on fake news.
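
The abstract does not specify the state and observation model of the adaptive Kalman filter, so the following is only a generic scalar Kalman update illustrating how such a filter damps short-term credibility fluctuations; the symbols and noise parameters are assumptions:

```python
def kalman_update(x, p, z, q=1e-3, r=1e-1):
    """One scalar Kalman step for an event's credibility.

    x: current credibility estimate, p: its variance, z: new observation,
    q: process-noise variance, r: observation-noise variance.
    """
    p = p + q               # predict: uncertainty grows between updates
    k = p / (p + r)         # Kalman gain
    x = x + k * (z - x)     # correct: move toward the observation
    p = (1.0 - k) * p
    return x, p
```

Each noisy credibility observation nudges the estimate only in proportion to the gain, so isolated outliers produce small corrections rather than wholesale swings.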

10. Causal-BERT: Language models for causality detection between events expressed in text [PDF] Back to Contents
  Vivek Khetan, Roshni Ramnani, Mayuresh Anand, Shubhashis Sengupta, Andrew E. Fano
Abstract: Causality understanding between events is a critical natural language processing task that is helpful in many areas, including health care, business risk management and finance. On close examination, one can find a huge amount of textual content, both in formal documents and in content arising from social media like Twitter, dedicated to communicating and exploring various types of causality in the real world. Recognizing these "Cause-Effect" relationships between natural language events remains a challenge simply because causality is often expressed implicitly. Implicit causality is hard to detect through most of the techniques employed in the literature and can also, at times, be perceived as ambiguous or vague. Also, although well-known datasets do exist for this problem, the examples in them are limited in the range and complexity of the causal relationships they depict, especially when related to implicit relationships. Most contemporary methods are either based on lexico-semantic pattern matching or are feature-driven supervised methods. Therefore, as expected, these methods are more geared towards handling explicit causal relationships, leading to limited coverage of implicit relationships, and are hard to generalize. In this paper, we investigate the language model's capabilities for causal association among events expressed in natural language text, using sentence context combined with event information, and by leveraging masked event context with in-domain and out-of-domain data distributions. Our proposed methods achieve state-of-the-art performance on three different data distributions and can be leveraged for extracting a causal diagram and/or building a chain of events from unstructured text.

11. A Framework for Generating Annotated Social Media Corpora with Demographics, Stance, Civility, and Topicality [PDF] Back to Contents
  Shubhanshu Mishra, Daniel Collier
Abstract: In this paper we introduce a framework for annotating social media text corpora for various categories. Since social media data is generated by individuals, it is important to annotate the text with the individuals' demographic attributes to enable a socio-technical analysis of the corpora. Furthermore, when analyzing a large dataset, we can often annotate a small sample of data and then train a prediction model using this sample to annotate the full data for the relevant categories. We use a case study of a Facebook comment corpus on student loan discussion which was annotated for gender, military affiliation, age group, political leaning, race, stance, topicality, neoliberalistic views, and civility of the comment. We release three datasets of Facebook comments for further research at: this https URL

12. Empirical Analysis of Unlabeled Entity Problem in Named Entity Recognition [PDF] Back to Contents
  Yangming Li, Lemao Liu, Shuming Shi
Abstract: In many scenarios, named entity recognition (NER) models severely suffer from the unlabeled entity problem, where the entities of a sentence may not be fully annotated. Through empirical studies performed on synthetic datasets, we find two causes of the performance degradation. One is the reduction of annotated entities and the other is treating unlabeled entities as negative instances. The first cause has less impact than the second one and can be mitigated by adopting pretrained language models. The second cause seriously misguides a model in training and greatly affects its performance. Based on the above observations, we propose a general approach that is capable of eliminating the misguidance brought by unlabeled entities. The core idea is using negative sampling to keep the probability of training with unlabeled entities at a very low level. Experiments on synthetic datasets and real-world datasets show that our model is robust to the unlabeled entity problem and surpasses prior baselines. On well-annotated datasets, our model is competitive with state-of-the-art methods.
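
The core idea lends itself to a short sketch: rather than treating every unannotated span as a negative instance, sample only a handful of them, so the chance of an unlabeled true entity serving as a negative stays very low. The span format, the length cap, and the sampling rate below are illustrative assumptions:

```python
import random

def sample_negative_spans(n_tokens, positive_spans, n_samples, max_len=10):
    """positive_spans: set of annotated (start, end) spans, end exclusive.
    Returns a small random subset of all other spans to use as negatives."""
    candidates = [(i, j) for i in range(n_tokens)
                  for j in range(i + 1, min(i + max_len, n_tokens) + 1)
                  if (i, j) not in positive_spans]
    return random.sample(candidates, min(n_samples, len(candidates)))
```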

13. Segmenting Natural Language Sentences via Lexical Unit Analysis [PDF] Back to Contents
  Yangming Li, Lemao Liu, Shuming Shi
Abstract: In this work, we present Lexical Unit Analysis (LUA), a framework for general sequence segmentation tasks. Given a natural language sentence, LUA scores all the valid segmentation candidates and utilizes dynamic programming (DP) to extract the maximum-scoring one. LUA enjoys a number of appealing properties, such as inherently guaranteeing that the predicted segmentation is valid and facilitating globally optimal training and inference. Moreover, the practical time complexity of LUA can be reduced to linear time, which is very efficient. We have conducted extensive experiments on 5 tasks, including syntactic chunking, named entity recognition (NER), slot filling, Chinese word segmentation, and Chinese part-of-speech (POS) tagging, across 15 datasets. Our models have achieved the state-of-the-art performance on 13 of them. The results also show that the F1 score of identifying long-length segments is notably improved.
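
The DP over segmentation candidates is standard enough to sketch. Assuming a learned function score(i, j) for treating tokens [i, j) as one lexical unit, the maximum-scoring segmentation can be recovered as below; capping the segment length is one way to obtain the practical linear time the abstract mentions:

```python
def best_segmentation(n_tokens, score, max_len=10):
    """best[k] = max total score over valid segmentations of tokens [0, k)."""
    best = [float("-inf")] * (n_tokens + 1)
    best[0], back = 0.0, [0] * (n_tokens + 1)
    for j in range(1, n_tokens + 1):
        for i in range(max(0, j - max_len), j):   # last segment is [i, j)
            s = best[i] + score(i, j)
            if s > best[j]:
                best[j], back[j] = s, i
    segments, j = [], n_tokens                    # recover the argmax
    while j > 0:
        segments.append((back[j], j))
        j = back[j]
    return segments[::-1], best[n_tokens]
```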

14. Rewriter-Evaluator Framework for Neural Machine Translation [PDF] Back to Contents
  Yangming Li, Kaisheng Yao
Abstract: The encoder-decoder architecture has been widely used in neural machine translation (NMT). A few methods have been proposed to improve it with multiple passes of decoding. However, their full potential is limited by the lack of an appropriate termination policy. To address this issue, we present a novel framework, Rewriter-Evaluator. It consists of a rewriter and an evaluator. Translating a source sentence involves multiple passes. At every pass, the rewriter produces a new translation to improve the past translation, and the evaluator estimates the translation quality to decide whether to terminate the rewriting process. We also propose a prioritized gradient descent (PGD) method that facilitates training the rewriter and the evaluator jointly. Though it incurs multiple passes of decoding, Rewriter-Evaluator with the proposed PGD method can be trained in a time similar to that of training encoder-decoder models. We apply the proposed framework to improve general NMT models (e.g., the Transformer). We conduct extensive experiments on two translation tasks, Chinese-English and English-German, and show that the proposed framework notably improves the performance of NMT models and significantly outperforms previous baselines.
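
At inference time the framework is essentially a rewrite loop with a learned stopping criterion. A minimal sketch, with rewrite() and evaluate() standing in for the trained rewriter and evaluator (the exact termination rule and pass budget are assumptions):

```python
def translate(source, rewrite, evaluate, max_passes=5):
    """Iteratively refine a translation until the evaluator's quality
    estimate stops improving or the pass budget runs out."""
    translation, best_score = "", float("-inf")
    for _ in range(max_passes):
        candidate = rewrite(source, translation)   # refine the current draft
        score = evaluate(source, candidate)        # estimated quality
        if score <= best_score:                    # no improvement: terminate
            break
        translation, best_score = candidate, score
    return translation
```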

15. Infusing Finetuning with Semantic Dependencies [PDF] Back to Contents
  Zhaofeng Wu, Hao Peng, Noah A. Smith
Abstract: For natural language processing systems, two kinds of evidence support the use of text representations from neural language models "pretrained" on large unannotated corpora: performance on application-inspired benchmarks (Peters et al., 2018, inter alia), and the emergence of syntactic abstractions in those representations (Tenney et al., 2019, inter alia). On the other hand, the lack of grounded supervision calls into question how well these representations can ever capture meaning (Bender and Koller, 2020). We apply novel probes to recent language models -- specifically focusing on predicate-argument structure as operationalized by semantic dependencies (Ivanova et al., 2012) -- and find that, unlike syntax, semantics is not brought to the surface by today's pretrained models. We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning, yielding benefits to natural language understanding (NLU) tasks in the GLUE benchmark. This approach demonstrates the potential for general-purpose (rather than task-specific) linguistic supervision, above and beyond conventional pretraining and finetuning. Several diagnostics help to localize the benefits of our approach.

16. Speech Recognition for Endangered and Extinct Samoyedic languages [PDF] Back to Contents
  Niko Partanen, Mika Hämäläinen, Tiina Klooster
Abstract: Our study presents a series of experiments on speech recognition with endangered and extinct Samoyedic languages, spoken in Northern and Southern Siberia. To the best of our knowledge, this is the first time a functional ASR system has been built for an extinct language. With the Kamas language we achieve a Label Error Rate of 15%, and conclude through careful error analysis that this quality is already very useful as a starting point for refined human transcriptions. Our results with the related Nganasan language are more modest, with the best model having an error rate of 33%. We show, however, through experiments where the Kamas training data is enlarged incrementally, that the Nganasan results are in line with what is expected under the low-resource circumstances of the language. Based on this, we provide recommendations for scenarios in which further language documentation or archive processing activities could benefit from modern ASR technology. All training data and processing scripts have been published on Zenodo with clear licences to ensure further work on this important topic.

17. Normalization of Different Swedish Dialects Spoken in Finland [PDF] Back to Contents
  Mika Hämäläinen, Niko Partanen, Khalid Alnajjar
Abstract: Our study presents a dialect normalization method for different Finland Swedish dialects covering six regions. We tested 5 different models, and the best model improved the word error rate from 76.45 to 28.58. Contrary to results reported in earlier research on Finnish dialects, we found that training the model with one word at a time gave best results. We believe this is due to the size of the training data available for the model. Our models are accessible as a Python package. The study provides important information about the adaptability of these methods in different contexts, and gives important baselines for further study.

18. Generative Adversarial Networks for Annotated Data Augmentation in Data Sparse NLU [PDF] Back to Contents
  Olga Golovneva, Charith Peris
Abstract: Data sparsity is one of the key challenges associated with model development in Natural Language Understanding (NLU) for conversational agents. The challenge is made more complex by the demand for the high-quality annotated utterances commonly required for supervised learning, usually resulting in weeks of manual labor and high cost. In this paper, we present our results on boosting NLU model performance through training data augmentation using a sequential generative adversarial network (GAN). We explore data generation in the context of two tasks: the bootstrapping of a new language and the handling of low-resource features. For both tasks we explore three sequential GAN architectures: one with a token-level reward function, another with our own implementation of a token-level Monte Carlo rollout reward, and a third with a sentence-level reward. We evaluate the performance of these feedback models across several sampling methodologies and compare our results to upsampling the original data to the same scale. We further improve GAN model performance through transfer learning of pretrained embeddings. Our experiments reveal that synthetic data generated using the sequential generative adversarial network provides significant performance boosts across multiple metrics and can be a major benefit to NLU tasks.

19. Cross-lingual Word Sense Disambiguation using mBERT Embeddings with Syntactic Dependencies [PDF] Back to Contents
  Xingran Zhu
Abstract: Cross-lingual word sense disambiguation (WSD) tackles the challenge of disambiguating ambiguous words across languages given context. The pre-trained BERT embedding model has been proven effective in extracting contextual information of words, and has been incorporated as a feature into many state-of-the-art WSD systems. In order to investigate how syntactic information can be added into the BERT embeddings to yield both semantics- and syntax-incorporated word embeddings, this project proposes concatenated embeddings, obtained by producing dependency parse trees and encoding the relative relationships of words into the input embeddings. Two methods are also proposed to reduce the size of the concatenated embeddings. The experimental results show that the high dimensionality of the syntax-incorporated embeddings constitutes an obstacle for the classification task, which needs to be further addressed in future studies.

20. Bew: Towards Answering Business-Entity-Related Web Questions [PDF] Back to Contents
  Qingqing Cao, Oriana Riva, Aruna Balasubramanian, Niranjan Balasubramanian
Abstract: We present BewQA, a system specifically designed to answer a class of questions that we call Bew questions. Bew questions are related to businesses/services such as restaurants, hotels, and movie theaters; for example, "Until what time is happy hour?". These questions are challenging to answer because the answers are found on the open-domain Web, are present in short sentences without surrounding context, and are dynamic, since the webpage information can be updated frequently. Under these conditions, existing QA systems perform poorly. We present a practical approach, called BewQA, that can answer Bew queries by mining a template of the business-related webpages and using the template to guide the search. We show how we can extract the template automatically by leveraging aggregator websites that aggregate information about business entities in a domain (e.g., restaurants). We answer a given question by identifying the section of the extracted template that is most likely to contain the answer. By doing so we can extract answers even when the answer span does not have sufficient context. Importantly, BewQA does not require any training. We crowdsource a new dataset of 1066 Bew questions and ground-truth answers in the restaurant domain. Compared to state-of-the-art QA models, BewQA has a 27 percentage point improvement in F1 score. Compared to a commercial search engine, BewQA correctly answered 29% more Bew questions.

21. Generative Deep Learning Techniques for Password Generation [PDF] Back to Contents
  David Biesner, Kostadin Cvejoski, Bogdan Georgiev, Rafet Sifa, Erik Krupicka
Abstract: Password guessing approaches via deep learning have recently been investigated, with significant breakthroughs in their ability to generate novel, realistic password candidates. In the present work we study a broad collection of deep learning and probabilistic models in the light of password guessing: attention-based deep neural networks, autoencoding mechanisms and generative adversarial networks. We provide novel generative deep-learning models in terms of variational autoencoders exhibiting state-of-the-art sampling performance and yielding additional latent-space features such as interpolation and targeted sampling. Lastly, we perform a thorough empirical analysis in a unified controlled framework over well-known datasets (RockYou, LinkedIn, Youku, Zomato, Pwnd). Our results not only identify the most promising schemes driven by deep neural networks, but also illustrate the strengths of each approach in terms of generation variability and sample uniqueness.

22. Recurrent Point Review Models [PDF] Back to Contents
  Kostadin Cvejoski, Ramses J. Sanchez, Bogdan Georgiev, Christian Bauckhage, Cesar Ojeda
Abstract: Deep neural network models represent the state-of-the-art methodologies for natural language processing. Here we build on top of these methodologies to incorporate temporal information and model how review data changes with time. Specifically, we use the dynamic representations of recurrent point process models, which encode the history of how business or service reviews are received in time, to generate instantaneous language models with improved prediction capabilities. Simultaneously, our methodologies enhance the predictive power of our point process models by incorporating summarized review content representations. We provide recurrent network and temporal convolution solutions for modeling the review content. We deploy our methodologies in the context of recommender systems, effectively characterizing the change in preference and taste of users as time evolves. Source code is available at [1].

23. AI Driven Knowledge Extraction from Clinical Practice Guidelines: Turning Research into Practice [PDF] Back to Contents
  Musarrat Hussain, Jamil Hussain, Taqdir Ali, Fahad Ahmed Satti, Sungyoung Lee
Abstract: Background and Objectives: Clinical Practice Guidelines (CPGs) represent the foremost methodology for sharing state-of-the-art research findings in the healthcare domain with medical practitioners, in order to limit practice variations, reduce clinical cost, improve the quality of care, and provide evidence-based treatment. However, extracting relevant knowledge from the plethora of CPGs is not feasible for already burdened healthcare professionals, leading to large gaps between clinical findings and real practices. It is therefore imperative that state-of-the-art computing research, especially machine learning, is used to provide artificial intelligence based solutions for extracting the knowledge from CPGs and reducing the gap between healthcare research/guidelines and practice. Methods: This research presents a novel methodology for knowledge extraction from CPGs to reduce the gap and turn the latest research findings into clinical practice. First, our system classifies the CPG sentences into four classes (condition-action, condition-consequences, action, and not-applicable) based on the information presented in a sentence. We use deep learning with state-of-the-art word embeddings (improved word vectors) in the classification process. Second, it identifies qualifier terms in the classified sentences, which assist in recognizing the condition and action phrases in a sentence. Finally, the condition and action phrases are processed and transformed into a plain-rule If Condition(s) Then Action format. Results: We evaluate the methodology on guidelines from three different domains: Hypertension, Rhinosinusitis, and Asthma. The deep learning model classifies the CPG sentences with an accuracy of 95%. Rule extraction was validated by a user-centric approach, which achieved Jaccard coefficients of 0.6, 0.7, and 0.4 against the rules extracted by three human experts, respectively.

24. Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition [PDF] Back to Contents
  Binbin Zhang, Di Wu, Zhuoyuan Yao, Xiong Wang, Fan Yu, Chao Yang, Liyong Guo, Yaguang Hu, Lei Xie, Xin Lei
Abstract: In this paper, we present a novel two-pass approach to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single model. Our model adopts the hybrid CTC/attention architecture, in which the conformer layers in the encoder are modified. We propose a dynamic chunk-based attention strategy to allow arbitrary right context length. At inference time, the CTC decoder generates n-best hypotheses in a streaming way. The inference latency could be easily controlled by only changing the chunk size. The CTC hypotheses are then rescored by the attention decoder to get the final result. This efficient rescoring process causes very little sentence-level latency. Our experiments on the open 170-hour AISHELL-1 dataset show that, the proposed method can unify the streaming and non-streaming model simply and efficiently. On the AISHELL-1 test set, our unified model achieves 5.60% relative character error rate (CER) reduction in non-streaming ASR compared to a standard non-streaming transformer. The same model achieves 5.42% CER with 640ms latency in a streaming ASR system.
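
The dynamic chunk-based attention can be illustrated by the mask it induces: each frame attends to all frames in its own chunk and in the chunks before it. The sketch below is a simplification with assumed details (WeNet-style implementations also sample the chunk size randomly during training, using a full-sentence chunk for the non-streaming case):

```python
import torch

def chunk_attention_mask(n_frames, chunk_size):
    """mask[t, s] is True if frame t may attend to frame s: everything up to
    the end of frame t's own chunk, i.e. its chunk plus all previous chunks."""
    idx = torch.arange(n_frames)
    chunk_end = (idx // chunk_size + 1) * chunk_size   # end of each frame's chunk
    return idx.unsqueeze(0) < chunk_end.unsqueeze(1)

mask = chunk_attention_mask(8, chunk_size=4)  # frames 0-3 see 0-3; frames 4-7 see 0-7
```

Shrinking chunk_size lowers the streaming latency; letting it cover the whole utterance recovers the non-streaming behavior with the same weights.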

25. Research Challenges in Designing Differentially Private Text Generation Mechanisms [PDF] Back to Contents
  Oluwaseyi Feyisetan, Abhinav Aggarwal, Zekun Xu, Nathanael Teissier
Abstract: Accurately learning from user data while ensuring quantifiable privacy guarantees provides an opportunity to build better Machine Learning (ML) models while maintaining user trust. Recent literature has demonstrated the applicability of a generalized form of Differential Privacy to provide guarantees over text queries. Such mechanisms add privacy-preserving noise to vectorial representations of text in high dimension and return a text-based projection of the noisy vectors. However, these mechanisms are sub-optimal in their trade-off between privacy and utility. This is due to factors such as a fixed global sensitivity, which leads to too much noise being added in dense spaces while simultaneously guaranteeing protection for sensitive outliers. In this proposal paper, we describe some challenges in balancing the trade-off between privacy and utility for these differentially private text mechanisms. At a high level, we provide two proposals: (1) a framework called LAC, which defers some of the noise to a privacy amplification step, and (2) an additional suite of three different techniques for calibrating the noise based on the local region around a word. Our objective in this paper is not to evaluate a single solution but to further the conversation on these challenges and chart pathways for building better mechanisms.
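
The class of mechanisms under discussion can be sketched concretely: perturb a word's embedding with noise whose scale is calibrated by 1/epsilon, then emit the nearest vocabulary word. The construction below (uniform direction, Gamma-distributed norm) follows the standard metric-DP mechanism from earlier work that this proposal builds on; treating it as the paper's exact mechanism would be an assumption.

```python
import numpy as np

def dp_replace_word(word, vocab, emb, epsilon, rng=None):
    """Metric-DP word replacement: noise the embedding, return nearest word.

    vocab: list of words; emb: dict mapping word -> np.ndarray embedding.
    """
    rng = rng or np.random.default_rng()
    v = emb[word]
    direction = rng.normal(size=v.shape)
    direction /= np.linalg.norm(direction)            # uniform direction
    noisy = v + direction * rng.gamma(shape=v.shape[0], scale=1.0 / epsilon)
    # Project back to text: nearest neighbor in embedding space.
    return min(vocab, key=lambda w: np.linalg.norm(emb[w] - noisy))
```

The fixed noise scale here is exactly the "global sensitivity" issue the proposal highlights: the same amount of noise is added whether the word sits in a dense or a sparse region of the embedding space.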

26. Topological Planning with Transformers for Vision-and-Language Navigation [PDF] Back to Contents
  Kevin Chen, Junshen K. Chen, Jo Chuang, Marynel Vázquez, Silvio Savarese
Abstract: Conventional approaches to vision-and-language navigation (VLN) are trained end-to-end but struggle to perform well in freely traversable environments. Inspired by the robotics community, we propose a modular approach to VLN using topological maps. Given a natural language instruction and topological map, our approach leverages attention mechanisms to predict a navigation plan in the map. The plan is then executed with low-level actions (e.g. forward, rotate) using a robust controller. Experiments show that our method outperforms previous end-to-end approaches, generates interpretable navigation plans, and exhibits intelligent behaviors such as backtracking.
