1. An Empirical Accuracy Law for Sequential Machine Translation: the Case of Google Translate [PDF]
  Lucas Nunes Sequeira, Bruno Moreschi, Fabio Gagliardi Cozman, Bernardo Fontes
Abstract: We have established, through empirical testing, a law that relates the number of translating hops to translation accuracy in sequential machine translation in Google Translate. Both accuracy and size decrease with the number of hops; the former displays a decrease closely following a power law. Such a law allows one to predict the behavior of translation chains that may be built as society increasingly depends on automated devices.
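
As a rough illustration of what "closely following a power law" means in practice, the sketch below fits accuracy ≈ a · hops^(-b) by least squares in log-log space. The functional form, the function name `fit_power_law`, and the data points are assumptions made for illustration; they are synthetic and not taken from the paper.

```python
import math

def fit_power_law(hops, accuracy):
    """Least-squares fit of accuracy ~= a * hops**(-b) in log-log space.

    Returns (a, b). Hypothetical helper, not the authors' procedure.
    """
    xs = [math.log(h) for h in hops]
    ys = [math.log(acc) for acc in accuracy]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope and intercept of log(accuracy) vs log(hops).
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic data generated from an exact power law (assumed values).
hops = [1, 2, 4, 8, 16]
accuracy = [0.9 * h ** -0.3 for h in hops]

a, b = fit_power_law(hops, accuracy)
print(f"a = {a:.3f}, b = {b:.3f}")
```

With real measurements the points would scatter around the fitted line, and the residuals in log-log space give a quick check of how well the power-law description holds.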

2. HypoNLI: Exploring the Artificial Patterns of Hypothesis-only Bias in Natural Language Inference [PDF]
  Tianyu Liu, Xin Zheng, Baobao Chang, Zhifang Sui
Abstract: Many recent studies have shown that for models trained on datasets for natural language inference (NLI), it is possible to make correct predictions by merely looking at the hypothesis while completely ignoring the premise. In this work, we manage to derive adversarial examples in terms of the hypothesis-only bias and explore eligible ways to mitigate such bias. Specifically, we extract various phrases from the hypotheses (artificial patterns) in the training sets, and show that they are strong indicators of specific labels. We then identify 'hard' and 'easy' instances from the original test sets whose labels are opposite to or consistent with those indications. We also set up baselines including both pretrained models (BERT, RoBERTa, XLNet) and competitive non-pretrained models (InferSent, DAM, ESIM). Apart from the benchmark and baselines, we also investigate two debiasing approaches which exploit the artificial pattern modeling to mitigate such hypothesis-only bias: down-sampling and adversarial training. We believe those methods can be treated as competitive baselines in NLI debiasing tasks.
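
To make the idea of hypothesis-side patterns acting as label indicators concrete, the sketch below counts how often each hypothesis token co-occurs with each label and reports the empirical p(label | token). It uses unigrams only; the function name `pattern_label_stats`, the `min_count` threshold, and the toy examples are all assumptions for illustration, not the extraction procedure used in the paper.

```python
from collections import Counter, defaultdict

def pattern_label_stats(examples, min_count=2):
    """Map each hypothesis token to (top_label, p(top_label | token), count).

    `examples` is an iterable of (hypothesis, label) pairs; the premise is
    deliberately ignored. Hypothetical helper for illustration only.
    """
    counts = defaultdict(Counter)
    for hypothesis, label in examples:
        # Count each token once per hypothesis.
        for token in set(hypothesis.lower().split()):
            counts[token][label] += 1
    stats = {}
    for token, label_counts in counts.items():
        total = sum(label_counts.values())
        if total >= min_count:
            top_label, n = label_counts.most_common(1)[0]
            stats[token] = (top_label, n / total, total)
    return stats

# Toy hypotheses (invented, not drawn from any NLI dataset).
toy = [
    ("A man is not sleeping", "contradiction"),
    ("Nobody is outside", "contradiction"),
    ("The dog is not running", "contradiction"),
    ("A person is outdoors", "entailment"),
    ("A person is moving", "entailment"),
    ("A tall man is waiting for a friend", "neutral"),
]

stats = pattern_label_stats(toy)
print(stats["not"])     # token seen only with one label in this toy set
print(stats["person"])
```

Under this view, a test instance whose gold label disagrees with the label suggested by its tokens would count as 'hard', and one that agrees would count as 'easy'.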

3. Zero-Shot Cross-Lingual Transfer with Meta Learning [PDF]
  Farhad Nooralahzadeh, Giannis Bekoulis, Johannes Bjerva, Isabelle Augenstein
Abstract: Learning what to share between tasks has been a topic of high importance recently, as strategic sharing of knowledge has been shown to improve the performance of downstream tasks. The same applies to sharing between languages, and is especially important when considering the fact that most languages in the world suffer from being under-resourced. In this paper, we consider the setting of training models on multiple different languages at the same time, when little or no data is available for languages other than English. We show that this challenging setup can be approached using meta-learning, where, in addition to training a source language model, another model learns to select which training instances are the most beneficial. We experiment using standard supervised, zero-shot cross-lingual, as well as few-shot cross-lingual settings for different natural language understanding tasks (natural language inference, question answering). Our extensive experimental setup demonstrates the consistent effectiveness of meta-learning, on a total of 16 languages. We improve upon the state of the art on zero-shot and few-shot NLI and QA tasks on the XNLI and X-WikiRe datasets, respectively. We further conduct a comprehensive analysis which indicates that correlation of typological features between languages can further explain when parameter sharing learned via meta-learning is beneficial.

4. Fact Check-Worthiness Detection as Positive Unlabelled Learning [PDF]
  Dustin Wright, Isabelle Augenstein
Abstract: A critical component of automatically combating misinformation is the detection of fact check-worthiness, i.e. determining if a piece of information should be checked for veracity. There are multiple isolated lines of research which address this core issue: check-worthiness detection from political speeches and debates, rumour detection on Twitter, and citation needed detection from Wikipedia. What is still lacking is a structured comparison of these variants of check-worthiness, as well as a unified approach to them. We find that check-worthiness detection is a very challenging task in any domain, because it both hinges upon detecting how factual a sentence is, and how likely a sentence is to be believed without verification. As such, annotators often only mark those instances they judge to be clear-cut check-worthy. Our best-performing method automatically corrects for this, using a variant of positive unlabelled learning, which learns when an instance annotated as not check-worthy should in fact have been annotated as being check-worthy. In applying this, we outperform the state of the art in two of the three domains studied for check-worthiness detection in English.
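
The abstract does not spell out its variant of positive unlabelled (PU) learning, so the sketch below shows only the classic Elkan and Noto style correction as a point of reference, not the authors' method. It assumes a classifier has already produced scores g(x) ≈ p(labelled | x); the label frequency c = p(labelled | truly positive) is estimated as the mean score over labelled positives, and the corrected posterior is g(x) / c. All names and numbers here are hypothetical.

```python
def estimate_label_frequency(scores, is_labelled):
    """Estimate c = p(labelled | positive) as the mean score on labelled positives."""
    positive_scores = [s for s, lab in zip(scores, is_labelled) if lab]
    if not positive_scores:
        raise ValueError("need at least one labelled positive example")
    return sum(positive_scores) / len(positive_scores)

def corrected_posterior(score, c):
    """Convert p(labelled | x) into an estimate of p(positive | x), capped at 1."""
    return min(1.0, score / c)

# Invented classifier scores; the first two examples were annotated check-worthy.
scores = [0.8, 0.6, 0.4, 0.1]
is_labelled = [True, True, False, False]

c = estimate_label_frequency(scores, is_labelled)
posteriors = [corrected_posterior(s, c) for s in scores]
print(c, posteriors)
```

The effect is that an unlabelled instance with a moderately high score is treated as more likely check-worthy than its raw score suggests, which mirrors the intuition that annotators mark only clear-cut cases.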

5. SentenceMIM: A Latent Variable Language Model [PDF]
  Micha Livne, Kevin Swersky, David J. Fleet
Abstract: We introduce sentenceMIM, a probabilistic auto-encoder for language modelling, trained with Mutual Information Machine (MIM) learning. Previous attempts to learn variational auto-encoders for language data have had mixed success, with empirical performance well below state-of-the-art auto-regressive models, a key barrier being the occurrence of posterior collapse with VAEs. The recently proposed MIM framework encourages high mutual information between observations and latent variables, and is more robust against posterior collapse. This paper formulates a MIM model for text data, along with a corresponding learning algorithm. We demonstrate excellent perplexity (PPL) results on several datasets, and show that the framework learns a rich latent space, allowing for interpolation between sentences of different lengths with a fixed-dimensional latent representation. We also demonstrate the versatility of sentenceMIM by utilizing a trained model for question-answering, a transfer learning task, without fine-tuning. To the best of our knowledge, this is the first latent variable model (LVM) for text modelling that achieves competitive performance with non-LVM models.

6. RecipeGPT: Generative Pre-training Based Cooking Recipe Generation and Evaluation System [PDF]
  Helena H. Lee, Ke Shu, Palakorn Achananuparp, Philips Kokoh Prasetyo, Yue Liu, Ee-Peng Lim, Lav R. Varshney
Abstract: Interests in the automatic generation of cooking recipes have been growing steadily over the past few years thanks to a large amount of online cooking recipes. We present RecipeGPT, a novel online recipe generation and evaluation system. The system provides two modes of text generations: (1) instruction generation from given recipe title and ingredients; and (2) ingredient generation from recipe title and cooking instructions. Its back-end text generation module comprises a generative pre-trained language model GPT-2 fine-tuned on a large cooking recipe dataset. Moreover, the recipe evaluation module allows the users to conveniently inspect the quality of the generated recipe contents and store the results for future reference. RecipeGPT can be accessed online at this https URL.

7. Kleister: A novel task for Information Extraction involving Long Documents with Complex Layout [PDF]
  Filip Graliński, Tomasz Stanisławek, Anna Wróblewska, Dawid Lipiński, Agnieszka Kaliska, Paulina Rosalska, Bartosz Topolski, Przemysław Biecek
Abstract: State-of-the-art solutions for Natural Language Processing (NLP) are able to capture a broad range of contexts, like the sentence level context or document level context for short documents. But these solutions are still struggling when it comes to real-world longer documents with information encoded in the spatial structure of the document, in elements like tables, forms, headers, openings or footers, or the complex layout of pages or multiple pages. To encourage progress on deeper and more complex information extraction, we present a new task (named Kleister) with two new datasets. Based on textual and structural layout features, an NLP system must find the most important information, about various types of entities, in formal long documents. These entities are not only classes from standard named entity recognition (NER) systems (e.g. location, date, or amount) but also the roles of the entities in the whole documents (e.g. company town address, report date, income amount).

8. A Study on Efficiency, Accuracy and Document Structure for Answer Sentence Selection [PDF]
  Daniele Bonadiman, Alessandro Moschitti
Abstract: An essential task of most Question Answering (QA) systems is to re-rank the set of answer candidates, i.e., Answer Sentence Selection (A2S). These candidates are typically sentences either extracted from one or more documents preserving their natural order or retrieved by a search engine. Most state-of-the-art approaches to the task use huge neural models, such as BERT, or complex attentive architectures. In this paper, we argue that by exploiting the intrinsic structure of the original rank together with an effective word-relatedness encoder, we can achieve competitive results with respect to the state of the art while retaining high efficiency. Our model takes 9.5 seconds to train on the WikiQA dataset, i.e., very fast in comparison with the $\sim 18$ minutes required by a standard BERT-base fine-tuning.

9. BERT as a Teacher: Contextual Embeddings for Sequence-Level Reward [PDF]
  Florian Schmidt, Thomas Hofmann
Abstract: Measuring the quality of a generated sequence against a set of references is a central problem in many learning frameworks, be it to compute a score, to assign a reward, or to perform discrimination. Despite great advances in model architectures, metrics that scale independently of the number of references are still based on n-gram estimates. We show that the underlying operations, counting words and comparing counts, can be lifted to embedding words and comparing embeddings. An in-depth analysis of BERT embeddings shows empirically that contextual embeddings can be employed to capture the required dependencies while maintaining the necessary scalability through appropriate pruning and smoothing techniques. We cast unconditional generation as a reinforcement learning problem and show that our reward function indeed provides a more effective learning signal than n-gram reward in this challenging setting.

10. Phase transitions in a decentralized graph-based approach to human language [PDF]
  Javier Vera, Felipe Urbina, Wenceslao Palma
Abstract: Zipf's law establishes a scaling behavior for word-frequencies in large text corpora. The appearance of Zipfian properties in human language has been previously explained as an optimization problem for the interests of speakers and hearers. On the other hand, human-like vocabularies can be viewed as bipartite graphs. The aim here is double: within a bipartite-graph approach to human vocabularies, to propose a decentralized language game model for the formation of Zipfian properties. To do this, we define a language game, in which a population of artificial agents is involved in idealized linguistic interactions. Numerical simulations show the appearance of a phase transition from an initially disordered state to three possible phases for language formation. Our results suggest that Zipfian properties in language seem to arise partly from decentralized linguistic interactions between agents endowed with bipartite word-meaning mappings.

11. An Incremental Explanation of Inference in Hybrid Bayesian Networks for Increasing Model Trustworthiness and Supporting Clinical Decision Making [PDF]
  Evangelia Kyrimi, Somayyeh Mossadegh, Nigel Tai, William Marsh
Abstract: Various AI models are increasingly being considered as part of clinical decision-support tools. However, the trustworthiness of such models is rarely considered. Clinicians are more likely to use a model if they can understand and trust its predictions. Key to this is if its underlying reasoning can be explained. A Bayesian network (BN) model has the advantage that it is not a black-box and its reasoning can be explained. In this paper, we propose an incremental explanation of inference that can be applied to hybrid BNs, i.e. those that contain both discrete and continuous nodes. The key questions that we answer are: (1) which important evidence supports or contradicts the prediction, and (2) through which intermediate variables does the information flow. The explanation is illustrated using a real clinical case study. A small evaluation study is also conducted.

12. Real-time, Universal, and Robust Adversarial Attacks Against Speaker Recognition Systems [PDF]
  Yi Xie, Cong Shi, Zhuohang Li, Jian Liu, Yingying Chen, Bo Yuan
Abstract: As the popularity of voice user interface (VUI) exploded in recent years, speaker recognition system has emerged as an important medium of identifying a speaker in many security-required applications and services. In this paper, we propose the first real-time, universal, and robust adversarial attack against the state-of-the-art deep neural network (DNN) based speaker recognition system. Through adding an audio-agnostic universal perturbation on arbitrary enrolled speaker's voice input, the DNN-based speaker recognition system would identify the speaker as any target (i.e., adversary-desired) speaker label. In addition, we improve the robustness of our attack by modeling the sound distortions caused by the physical over-the-air propagation through estimating room impulse response (RIR). Experiment using a public dataset of $109$ English speakers demonstrates the effectiveness and robustness of our proposed attack with a high attack success rate of over 90%. The attack launching time also achieves a 100X speedup over contemporary non-universal attacks.
