
[arXiv Papers] Computation and Language 2020-03-06

Contents

1. An Empirical Accuracy Law for Sequential Machine Translation: the Case of Google Translate [PDF] Abstract
2. HypoNLI: Exploring the Artificial Patterns of Hypothesis-only Bias in Natural Language Inference [PDF] Abstract
3. Zero-Shot Cross-Lingual Transfer with Meta Learning [PDF] Abstract
4. Fact Check-Worthiness Detection as Positive Unlabelled Learning [PDF] Abstract
5. SentenceMIM: A Latent Variable Language Model [PDF] Abstract
6. RecipeGPT: Generative Pre-training Based Cooking Recipe Generation and Evaluation System [PDF] Abstract
7. Kleister: A novel task for Information Extraction involving Long Documents with Complex Layout [PDF] Abstract
8. A Study on Efficiency, Accuracy and Document Structure for Answer Sentence Selection [PDF] Abstract
9. BERT as a Teacher: Contextual Embeddings for Sequence-Level Reward [PDF] Abstract
10. Phase transitions in a decentralized graph-based approach to human language [PDF] Abstract
11. An Incremental Explanation of Inference in Hybrid Bayesian Networks for Increasing Model Trustworthiness and Supporting Clinical Decision Making [PDF] Abstract
12. Real-time, Universal, and Robust Adversarial Attacks Against Speaker Recognition Systems [PDF] Abstract

Abstracts

1. An Empirical Accuracy Law for Sequential Machine Translation: the Case of Google Translate [PDF] Back to Contents
  Lucas Nunes Sequeira, Bruno Moreschi, Fabio Gagliardi Cozman, Bernardo Fontes
Abstract: We have established, through empirical testing, a law that relates the number of translating hops to translation accuracy in sequential machine translation in Google Translate. Both accuracy and size decrease with the number of hops; the former displays a decrease closely following a power law. Such a law allows one to predict the behavior of translation chains that may be built as society increasingly depends on automated devices.
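To make the power-law claim concrete, here is a minimal sketch of fitting accuracy(n) ≈ a·n^(-b) to hop counts; the numbers below are illustrative placeholders, not data from the paper.

```python
# Fit a power law between translation hops and accuracy (hypothetical data).
import numpy as np

hops = np.array([1, 2, 4, 8, 16, 32])
accuracy = np.array([0.95, 0.80, 0.62, 0.47, 0.35, 0.26])  # placeholder scores

# A power law is linear in log-log space: log acc = log a - b * log n.
slope, intercept = np.polyfit(np.log(hops), np.log(accuracy), 1)
a, b = np.exp(intercept), -slope
print(f"accuracy(n) ~ {a:.2f} * n^(-{b:.2f})")

# Such a fit lets one predict the behavior of longer translation chains.
n = 50
print(f"predicted accuracy after {n} hops: {a * n ** (-b):.2f}")
```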

2. HypoNLI: Exploring the Artificial Patterns of Hypothesis-only Bias in Natural Language Inference [PDF] Back to Contents
  Tianyu Liu, Xin Zheng, Baobao Chang, Zhifang Sui
Abstract: Many recent studies have shown that for models trained on datasets for natural language inference (NLI), it is possible to make correct predictions by merely looking at the hypothesis while completely ignoring the premise. In this work, we manage to derive adversarial examples in terms of the hypothesis-only bias and explore eligible ways to mitigate such bias. Specifically, we extract various phrases from the hypotheses (artificial patterns) in the training sets, and show that they are strong indicators of the specific labels. We then figure out 'hard' and 'easy' instances from the original test sets whose labels are opposite to or consistent with those indications. We also set up baselines including both pretrained models (BERT, RoBERTa, XLNet) and competitive non-pretrained models (InferSent, DAM, ESIM). Apart from the benchmark and baselines, we also investigate two debiasing approaches which exploit the artificial pattern modeling to mitigate such hypothesis-only bias: down-sampling and adversarial training. We believe these methods can be treated as competitive baselines in NLI debiasing tasks.
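The core of the "artificial pattern" analysis can be sketched as scoring hypothesis words by how strongly they predict a label; the tiny dataset and the thresholds below are assumptions for illustration, not the paper's setup.

```python
# Flag hypothesis words that strongly indicate one NLI label (toy example).
from collections import Counter, defaultdict

examples = [  # hypothetical (hypothesis, label) training pairs
    ("a man is sleeping", "contradiction"),
    ("nobody is outside", "contradiction"),
    ("nobody is sleeping", "contradiction"),
    ("a person is outdoors", "entailment"),
    ("a man is outdoors", "entailment"),
]

word_label = defaultdict(Counter)
for hyp, label in examples:
    for word in set(hyp.split()):
        word_label[word][label] += 1

# High p(label | word) marks words like "nobody" as label indicators;
# test instances whose gold label contradicts the indicated label would
# form the 'hard' subset, consistent ones the 'easy' subset.
for word, counts in word_label.items():
    total = sum(counts.values())
    label, c = counts.most_common(1)[0]
    if total >= 2 and c / total > 0.9:
        print(f"{word!r} indicates {label} ({c}/{total})")
```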

3. Zero-Shot Cross-Lingual Transfer with Meta Learning [PDF] Back to Contents
  Farhad Nooralahzadeh, Giannis Bekoulis, Johannes Bjerva, Isabelle Augenstein
Abstract: Learning what to share between tasks has recently been a topic of high importance, as strategic sharing of knowledge has been shown to improve the performance of downstream tasks. The same applies to sharing between languages, and is especially important given that most languages in the world are under-resourced. In this paper, we consider the setting of training models on multiple different languages at the same time, when little or no data is available for languages other than English. We show that this challenging setup can be approached using meta-learning, where, in addition to training a source language model, another model learns to select which training instances are the most beneficial. We experiment using standard supervised, zero-shot cross-lingual, as well as few-shot cross-lingual settings for different natural language understanding tasks (natural language inference, question answering). Our extensive experimental setup demonstrates the consistent effectiveness of meta-learning on a total of 16 languages. We improve upon the state of the art on zero-shot and few-shot NLI and QA tasks on the XNLI and X-WikiRe datasets, respectively. We further conduct a comprehensive analysis, which indicates that correlation of typological features between languages can further explain when parameter sharing learned via meta-learning is beneficial.
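The paper's algorithm is more involved, but the inner-/outer-loop structure of meta-learning can be illustrated with a toy first-order (Reptile-style) loop over synthetic per-language tasks; everything below is a stand-in, not the authors' method.

```python
# Toy Reptile-style meta-learning over synthetic "language" tasks.
import numpy as np

rng = np.random.default_rng(0)

def task_grad(w, target):
    # Each language defines a toy quadratic loss ||w - target||^2 / 2.
    return w - target

languages = [rng.normal(size=3) for _ in range(16)]  # 16 synthetic tasks
w = np.zeros(3)                                      # shared meta-parameters

for step in range(2000):
    target = languages[rng.integers(len(languages))]
    w_task = w.copy()
    for _ in range(5):                    # inner loop: adapt to one language
        w_task -= 0.1 * task_grad(w_task, target)
    w += 0.05 * (w_task - w)              # outer loop: move meta-parameters

print("meta-parameters:     ", np.round(w, 2))
print("mean of task optima: ", np.round(np.mean(languages, axis=0), 2))
```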

4. Fact Check-Worthiness Detection as Positive Unlabelled Learning [PDF] Back to Contents
  Dustin Wright, Isabelle Augenstein
Abstract: A critical component of automatically combating misinformation is the detection of fact check-worthiness, i.e. determining if a piece of information should be checked for veracity. There are multiple isolated lines of research which address this core issue: check-worthiness detection from political speeches and debates, rumour detection on Twitter, and citation needed detection from Wikipedia. What is still lacking is a structured comparison of these variants of check-worthiness, as well as a unified approach to them. We find that check-worthiness detection is a very challenging task in any domain, because it both hinges upon detecting how factual a sentence is, and how likely a sentence is to be believed without verification. As such, annotators often only mark those instances they judge to be clear-cut check-worthy. Our best-performing method automatically corrects for this, using a variant of positive unlabelled learning, which learns when an instance annotated as not check-worthy should in fact have been annotated as being check-worthy. In applying this, we outperform the state of the art in two of the three domains studied for check-worthiness detection in English.
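One simple way to realize the relabelling idea is a two-step positive-unlabelled scheme: fit a classifier that treats unlabelled items as negatives, then flip the unlabelled items it scores as confidently check-worthy. This is an assumption-laden sketch, not the paper's exact PU variant.

```python
# Two-step PU learning sketch with placeholder features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pos = rng.normal(loc=1.0, size=(50, 5))    # annotated clear-cut check-worthy
X_unl = rng.normal(loc=-0.2, size=(200, 5))  # unlabelled pool (mixed)

X = np.vstack([X_pos, X_unl])
y = np.array([1] * len(X_pos) + [0] * len(X_unl))
clf = LogisticRegression().fit(X, y)

# Unlabelled sentences that score like the positives were plausibly
# under-annotated and get relabelled as check-worthy.
scores = clf.predict_proba(X_unl)[:, 1]
print(f"{(scores > 0.5).sum()} of {len(X_unl)} unlabelled items relabelled positive")
```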

5. SentenceMIM: A Latent Variable Language Model [PDF] Back to Contents
  Micha Livne, Kevin Swersky, David J. Fleet
Abstract: We introduce sentenceMIM, a probabilistic auto-encoder for language modelling, trained with Mutual Information Machine (MIM) learning. Previous attempts to learn variational auto-encoders for language data have had mixed success, with empirical performance well below state-of-the-art auto-regressive models, a key barrier being the occurrence of posterior collapse with VAEs. The recently proposed MIM framework encourages high mutual information between observations and latent variables, and is more robust against posterior collapse. This paper formulates a MIM model for text data, along with a corresponding learning algorithm. We demonstrate excellent perplexity (PPL) results on several datasets, and show that the framework learns a rich latent space, allowing for interpolation between sentences of different lengths with a fixed-dimensional latent representation. We also demonstrate the versatility of sentenceMIM by utilizing a trained model for question-answering, a transfer learning task, without fine-tuning. To the best of our knowledge, this is the first latent variable model (LVM) for text modelling that achieves competitive performance with non-LVM models.
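For orientation, and paraphrasing the cited MIM framework rather than this abstract: MIM trains an encoder $q_\theta(z \mid x)$ and decoder $p_\theta(x \mid z)$ by minimizing a cross-entropy against a mixture of the encoding and decoding distributions, commonly optimized through an upper bound of roughly the form

$$\hat{\mathcal{L}}_{\text{MIM}} = -\tfrac{1}{2}\,\mathbb{E}_{x,z \sim \mathcal{M}_S}\!\left[\log p_\theta(x \mid z)\,p(z) + \log q_\theta(z \mid x)\,q(x)\right],$$

which rewards high mutual information between $x$ and $z$ and, unlike the VAE objective, has no explicit KL term to drive posterior collapse.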

6. RecipeGPT: Generative Pre-training Based Cooking Recipe Generation and Evaluation System [PDF] Back to Contents
  Helena H. Lee, Ke Shu, Palakorn Achananuparp, Philips Kokoh Prasetyo, Yue Liu, Ee-Peng Lim, Lav R. Varshney
Abstract: Interest in the automatic generation of cooking recipes has been growing steadily over the past few years, thanks to the large number of cooking recipes available online. We present RecipeGPT, a novel online recipe generation and evaluation system. The system provides two modes of text generation: (1) instruction generation from a given recipe title and ingredients; and (2) ingredient generation from a recipe title and cooking instructions. Its back-end text generation module comprises GPT-2, a generative pre-trained language model, fine-tuned on a large cooking recipe dataset. Moreover, the recipe evaluation module allows users to conveniently inspect the quality of the generated recipe contents and store the results for future reference. RecipeGPT can be accessed online at this https URL.
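The instruction-generation mode can be sketched with the Hugging Face transformers API; the checkpoint name and prompt format below are assumptions (RecipeGPT uses its own GPT-2 model fine-tuned on a recipe corpus).

```python
# Sketch of recipe-instruction generation with a GPT-2 language model.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # stand-in for the fine-tuned model
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = ("Title: garlic butter shrimp\n"
          "Ingredients: shrimp, butter, garlic, parsley\n"
          "Instructions:")
input_ids = tokenizer.encode(prompt, return_tensors="pt")
output = model.generate(input_ids, max_length=120, do_sample=True, top_p=0.9,
                        pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```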

7. Kleister: A novel task for Information Extraction involving Long Documents with Complex Layout [PDF] Back to Contents
  Filip Graliński, Tomasz Stanisławek, Anna Wróblewska, Dawid Lipiński, Agnieszka Kaliska, Paulina Rosalska, Bartosz Topolski, Przemysław Biecek
Abstract: State-of-the-art solutions for Natural Language Processing (NLP) are able to capture a broad range of contexts, like the sentence level context or document level context for short documents. But these solutions are still struggling when it comes to real-world longer documents with information encoded in the spatial structure of the document, in elements like tables, forms, headers, openings or footers, or the complex layout of pages or multiple pages. To encourage progress on deeper and more complex information extraction, we present a new task (named Kleister) with two new datasets. Based on textual and structural layout features, an NLP system must find the most important information, about various types of entities, in formal long documents. These entities are not only classes from standard named entity recognition (NER) systems (e.g. location, date, or amount) but also the roles of the entities in the whole documents (e.g. company town address, report date, income amount).

8. A Study on Efficiency, Accuracy and Document Structure for Answer Sentence Selection [PDF] Back to Contents
  Daniele Bonadiman, Alessandro Moschitti
Abstract: An essential task of most Question Answering (QA) systems is to re-rank the set of answer candidates, i.e., Answer Sentence Selection (A2S). These candidates are typically sentences either extracted from one or more documents preserving their natural order or retrieved by a search engine. Most state-of-the-art approaches to the task use huge neural models, such as BERT, or complex attentive architectures. In this paper, we argue that by exploiting the intrinsic structure of the original rank together with an effective word-relatedness encoder, we can achieve competitive results with respect to the state of the art while retaining high efficiency. Our model takes 9.5 seconds to train on the WikiQA dataset, i.e., very fast in comparison with the $\sim 18$ minutes required by a standard BERT-base fine-tuning.

9. BERT as a Teacher: Contextual Embeddings for Sequence-Level Reward [PDF] Back to Contents
  Florian Schmidt, Thomas Hofmann
Abstract: Measuring the quality of a generated sequence against a set of references is a central problem in many learning frameworks, be it to compute a score, to assign a reward, or to perform discrimination. Despite great advances in model architectures, metrics that scale independently of the number of references are still based on n-gram estimates. We show that the underlying operations, counting words and comparing counts, can be lifted to embedding words and comparing embeddings. An in-depth analysis of BERT embeddings shows empirically that contextual embeddings can be employed to capture the required dependencies while maintaining the necessary scalability through appropriate pruning and smoothing techniques. We cast unconditional generation as a reinforcement learning problem and show that our reward function indeed provides a more effective learning signal than n-gram reward in this challenging setting.
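The lift from counting shared n-grams to comparing embeddings can be sketched as a soft precision/recall over pairwise cosine similarities; random vectors below stand in for contextual BERT embeddings, and the paper's actual reward additionally applies pruning and smoothing.

```python
# Soft n-gram matching: embedding-based precision/recall (toy stand-in).
import numpy as np

rng = np.random.default_rng(0)
cand = rng.normal(size=(7, 768))  # stand-in embeddings, one per candidate token
ref = rng.normal(size=(9, 768))   # one per reference token

def unit(m):
    return m / np.linalg.norm(m, axis=1, keepdims=True)

sim = unit(cand) @ unit(ref).T    # pairwise cosine similarities

# Each token matches its best counterpart: the continuous analogue
# of counting which n-grams the candidate shares with the reference.
precision = sim.max(axis=1).mean()
recall = sim.max(axis=0).mean()
reward = 2 * precision * recall / (precision + recall)
print(f"sequence-level reward: {reward:.3f}")
```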

10. Phase transitions in a decentralized graph-based approach to human language [PDF] Back to Contents
  Javier Vera, Felipe Urbina, Wenceslao Palma
Abstract: Zipf's law establishes a scaling behavior for word frequencies in large text corpora. The appearance of Zipfian properties in human language has previously been explained as an optimization problem serving the interests of speakers and hearers. On the other hand, human-like vocabularies can be viewed as bipartite graphs. The aim here is twofold: within a bipartite-graph approach to human vocabularies, we propose a decentralized language game model for the formation of Zipfian properties. To do this, we define a language game in which a population of artificial agents is involved in idealized linguistic interactions. Numerical simulations show a phase transition from an initially disordered state to three possible phases of language formation. Our results suggest that Zipfian properties in language seem to arise partly from decentralized linguistic interactions between agents endowed with bipartite word-meaning mappings.
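The Zipfian scaling the paper starts from says the frequency of the r-th most common word falls off as roughly C·r^(-α) with α ≈ 1; a minimal check on any corpus (toy text below) fits the exponent in log-log space.

```python
# Fit the Zipf exponent on a (toy) corpus's rank-frequency curve.
from collections import Counter
import numpy as np

text = "the quick fox and the lazy dog and the sly fox ran".split()
freqs = np.array(sorted(Counter(text).values(), reverse=True), dtype=float)
ranks = np.arange(1, len(freqs) + 1)

alpha = -np.polyfit(np.log(ranks), np.log(freqs), 1)[0]
print(f"fitted Zipf exponent alpha ~ {alpha:.2f}")
```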

11. An Incremental Explanation of Inference in Hybrid Bayesian Networks for Increasing Model Trustworthiness and Supporting Clinical Decision Making [PDF] Back to Contents
  Evangelia Kyrimi, Somayyeh Mossadegh, Nigel Tai, William Marsh
Abstract: Various AI models are increasingly being considered as part of clinical decision-support tools. However, the trustworthiness of such models is rarely considered. Clinicians are more likely to use a model if they can understand and trust its predictions. Key to this is if its underlying reasoning can be explained. A Bayesian network (BN) model has the advantage that it is not a black-box and its reasoning can be explained. In this paper, we propose an incremental explanation of inference that can be applied to hybrid BNs, i.e. those that contain both discrete and continuous nodes. The key questions that we answer are: (1) which important evidence supports or contradicts the prediction, and (2) through which intermediate variables does the information flow. The explanation is illustrated using a real clinical case study. A small evaluation study is also conducted.

12. Real-time, Universal, and Robust Adversarial Attacks Against Speaker Recognition Systems [PDF] Back to Contents
  Yi Xie, Cong Shi, Zhuohang Li, Jian Liu, Yingying Chen, Bo Yuan
Abstract: As the popularity of voice user interfaces (VUIs) has exploded in recent years, speaker recognition systems have emerged as an important medium for identifying a speaker in many security-required applications and services. In this paper, we propose the first real-time, universal, and robust adversarial attack against state-of-the-art deep neural network (DNN) based speaker recognition systems. By adding an audio-agnostic universal perturbation to an arbitrary enrolled speaker's voice input, the DNN-based speaker recognition system will identify the speaker as any target (i.e., adversary-desired) speaker label. In addition, we improve the robustness of our attack by modeling the sound distortions caused by physical over-the-air propagation through estimating the room impulse response (RIR). Experiments using a public dataset of $109$ English speakers demonstrate the effectiveness and robustness of our proposed attack, with a high attack success rate of over 90%. The attack launching time also achieves a 100X speedup over contemporary non-universal attacks.
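The attack setting in the abstract, one pre-computed perturbation applied to any utterance, can be sketched as below; the perturbation here is a random placeholder, whereas the paper optimizes it against a DNN speaker model and an RIR-based distortion model.

```python
# Apply one universal, audio-agnostic perturbation to arbitrary utterances.
import numpy as np

rng = np.random.default_rng(0)
delta = 0.005 * rng.standard_normal(16000)  # placeholder universal perturbation

def attack(waveform: np.ndarray) -> np.ndarray:
    """Add the same perturbation to any enrolled speaker's input."""
    n = min(len(waveform), len(delta))
    adv = waveform.copy()
    adv[:n] += delta[:n]
    return np.clip(adv, -1.0, 1.0)  # keep samples in valid range

utterance = rng.uniform(-0.1, 0.1, size=16000)  # stand-in 1-second waveform
adv = attack(utterance)
print("max sample change:", np.abs(adv - utterance).max())
```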
