目录
1. Causal Inference of Script Knowledge [PDF] 摘要
2. NUBES: A Corpus of Negation and Uncertainty in Spanish Clinical Texts [PDF] 摘要
3. Revisiting the linearity in cross-lingual embedding mappings: from a perspective of word analogies [PDF] 摘要
4. How Furiously Can Colourless Green Ideas Sleep? Sentence Acceptability in Context [PDF] 摘要
5. Mapping Languages and Demographics with Georeferenced Corpora [PDF] 摘要
6. Mapping Languages: The Corpus of Global Language Use [PDF] 摘要
7. Igbo-English Machine Translation: An Evaluation Benchmark [PDF] 摘要
8. Improving the Utility of Knowledge Graph Embeddings with Calibration [PDF] 摘要
9. Learning to cooperate: Emergent communication in multi-agent navigation [PDF] 摘要
10. MCEN: Bridging Cross-Modal Gap between Cooking Recipes and Dish Images with Latent Variable Model [PDF] 摘要
11. Pruned Wasserstein Index Generation Model and wigpy Package [PDF] 摘要
12. DeepSumm -- Deep Code Summaries using Neural Transformer Architecture [PDF] 摘要
13. Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers [PDF] 摘要
摘要
1. Causal Inference of Script Knowledge [PDF] 返回目录
Noah Weber, Rachel Rudinger, Benjamin Van Durme
Abstract: When does a sequence of events define an everyday scenario and how can this knowledge be induced from text? Prior works in inducing such scripts have relied on, in one form or another, measures of correlation between instances of events in a corpus. We argue from both a conceptual and practical sense that a purely correlation-based approach is insufficient, and instead propose an approach to script induction based on the causal effect between events, formally defined via interventions. Through both human and automatic evaluations, we show that the output of our method based on causal effects better matches the intuition of what a script represents.
摘要:什么样的事件序列才构成一个日常场景?这类知识又如何从文本中归纳出来?以往的脚本归纳工作都以某种形式依赖语料库中事件实例之间的相关性度量。我们从概念和实践两个层面论证,纯粹基于相关性的方法是不够的,并转而提出一种基于事件间因果效应的脚本归纳方法,其中的因果效应通过干预(intervention)被形式化定义。通过人工评估和自动评估,我们表明基于因果效应的方法所产生的输出更符合人们对脚本含义的直觉。
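下面用一小段 LaTeX 公式示意摘要所说的"相关性"与"经由干预定义的因果效应"之间的区别。这里的记号(CE、作用于事件 a、b 的 do 算子等)仅为便于说明而假设,并非原论文的精确形式化。

```latex
% 示意:相关性 vs. 干预定义的因果效应(记号为假设性的,仅作说明)
% 纯相关性的脚本归纳通常依赖类似下式的条件共现度量:
P(b \mid a) = \frac{\#\{a \text{ 与 } b \text{ 共现}\}}{\#\{a\}}
% 摘要中的因果视角则改为考察对事件 a 施加干预后的效应:
\mathrm{CE}(a \rightarrow b) = P\bigl(b \mid \mathrm{do}(a)\bigr) - P\bigl(b \mid \mathrm{do}(\lnot a)\bigr)
% 只有当该效应显著为正时,才把 a、b 视为同一脚本中的相继事件。
```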
2. NUBES: A Corpus of Negation and Uncertainty in Spanish Clinical Texts [PDF] 返回目录
Salvador Lima, Naiara Perez, Montse Cuadros, German Rigau
Abstract: This paper introduces the first version of the NUBes corpus (Negation and Uncertainty annotations in Biomedical texts in Spanish). The corpus is part of an on-going research and currently consists of 29,682 sentences obtained from anonymised health records annotated with negation and uncertainty. The article includes an exhaustive comparison with similar corpora in Spanish, and presents the main annotation and design decisions. Additionally, we perform preliminary experiments using deep learning algorithms to validate the annotated dataset. As far as we know, NUBes is the largest publicly available corpus for negation in Spanish and the first that also incorporates the annotation of speculation cues, scopes, and events.
摘要:本文介绍 NUBes 语料库(西班牙语生物医学文本中的否定与不确定性标注)的第一个版本。该语料库是一项正在进行的研究的一部分,目前包含 29,682 个句子,取自匿名化的健康记录,并标注了否定与不确定性信息。文章还与西班牙语中的类似语料库进行了详尽比较,并介绍了主要的标注与设计决策。此外,我们使用深度学习算法进行了初步实验,以验证所标注的数据集。据我们所知,NUBes 是目前公开可用的最大的西班牙语否定语料库,也是第一个同时标注了推测线索、作用范围与事件的语料库。
3. Revisiting the linearity in cross-lingual embedding mappings: from a perspective of word analogies [PDF] 返回目录
Xutan Peng, Chenghua Lin, Mark Stevenson, Chen li
Abstract: Most cross-lingual embedding mapping algorithms assume the optimised transformation functions to be linear. Recent studies showed that on some occasions, learning a linear mapping does not work, indicating that the commonly-used assumption may fail. However, it still remains unclear under which conditions the linearity of cross-lingual embedding mappings holds. In this paper, we rigorously explain that the linearity assumption relies on the consistency of analogical relations encoded by multilingual embeddings. We did extensive experiments to validate this claim. Empirical results based on the analogy completion benchmark and the BLI task demonstrate a strong correlation between whether mappings capture analogical information and are linear.
摘要:大多数跨语言词向量映射算法假设待优化的变换函数是线性的。近期研究表明,在某些情况下学习线性映射并不奏效,说明这一常用假设可能失效。然而,跨语言词向量映射的线性假设在何种条件下成立仍不清楚。本文严格论证了线性假设依赖于多语言词向量所编码的类比关系的一致性,并通过大量实验验证这一论断。基于类比补全基准和 BLI(双语词典归纳)任务的实证结果表明,映射是否捕获了类比信息与其是否线性之间存在很强的相关性。
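为说明摘要中"线性跨语言映射"的含义,下面给出一个最小示意:用正交 Procrustes 解在两组已按词典对齐的词向量之间拟合线性映射。这只是对常见做法的草图,数据与函数名均为假设,并不代表原论文使用的具体算法或实验设置。

```python
# 示意:在双语词向量之间拟合线性(正交)映射 —— 常见的 Procrustes 做法
# 假设 X、Y 已按双语词典逐行对齐;本示例与原论文的具体实现无关。
import numpy as np

def fit_linear_mapping(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """求正交矩阵 W,使 ||X @ W - Y||_F 最小(正交 Procrustes 解)。"""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# 玩具数据:200 个对齐词对、300 维向量
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 300))                 # 源语言词向量
W_true = np.linalg.qr(rng.normal(size=(300, 300)))[0]
Y = X @ W_true                                  # 目标语言词向量(人为构造为线性可映射)
W = fit_linear_mapping(X, Y)
print(np.allclose(X @ W, Y, atol=1e-6))         # True:当类比结构一致时,线性映射可以很好地拟合
```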
4. How Furiously Can Colourless Green Ideas Sleep? Sentence Acceptability in Context [PDF] 返回目录
Jey Han Lau, Carlos S. Armendariz, Shalom Lappin, Matthew Purver, Chang Shu
Abstract: We study the influence of context on sentence acceptability. First we compare the acceptability ratings of sentences judged in isolation, with a relevant context, and with an irrelevant context. Our results show that context induces a cognitive load for humans, which compresses the distribution of ratings. Moreover, in relevant contexts we observe a discourse coherence effect which uniformly raises acceptability. Next, we test unidirectional and bidirectional language models in their ability to predict acceptability ratings. The bidirectional models show very promising results, with the best model achieving a new state-of-the-art for unsupervised acceptability prediction. The two sets of experiments provide insights into the cognitive aspects of sentence processing and central issues in the computational modelling of text and discourse.
摘要:我们研究语境对句子可接受性的影响。首先,我们比较了句子在孤立呈现、相关语境和无关语境三种条件下的可接受性评分。结果表明,语境会给人带来认知负荷,从而压缩评分的分布;此外,在相关语境中我们观察到一种语篇连贯效应,它会整体提高可接受性。接着,我们测试了单向和双向语言模型预测可接受性评分的能力。双向模型表现出非常有前景的结果,其中最佳模型在无监督可接受性预测上达到了新的最先进水平。这两组实验为句子加工的认知机制以及文本与语篇计算建模中的核心问题提供了启示。
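下面的小例子说明如何由语言模型的逐词对数概率构造无监督的句子可接受性分数(平均对数概率 MeanLP 与 SLOR 是这类工作中常见的度量)。函数与数值均为假设性示意,并不是原论文所用模型或指标的实现。

```python
# 示意:由语言模型逐词对数概率构造可接受性分数(MeanLP 与 SLOR)
# token_logprobs / unigram_logprobs 为假设输入,与原论文的模型无关。
from typing import List

def mean_logprob(token_logprobs: List[float]) -> float:
    """平均对数概率:对句长做归一化。"""
    return sum(token_logprobs) / len(token_logprobs)

def slor(token_logprobs: List[float], unigram_logprobs: List[float]) -> float:
    """SLOR:再减去各词的 unigram 对数概率,削弱罕见词带来的惩罚。"""
    assert len(token_logprobs) == len(unigram_logprobs)
    return (sum(token_logprobs) - sum(unigram_logprobs)) / len(token_logprobs)

# 玩具例子:一句 4 个词的句子
lm_lp = [-2.1, -3.5, -1.2, -4.0]        # 语言模型给出的逐词 log P
uni_lp = [-6.0, -7.5, -5.0, -8.2]       # 语料中估计的 unigram log P
print(round(mean_logprob(lm_lp), 3))    # -2.7
print(round(slor(lm_lp, uni_lp), 3))    # 3.975
```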
5. Mapping Languages and Demographics with Georeferenced Corpora [PDF] 返回目录
Jonathan Dunn, Ben Adams
Abstract: This paper evaluates large georeferenced corpora, taken from both web-crawled and social media sources, against ground-truth population and language-census datasets. The goal is to determine (i) which dataset best represents population demographics; (ii) in what parts of the world the datasets are most representative of actual populations; and (iii) how to weight the datasets to provide more accurate representations of underlying populations. The paper finds that the two datasets represent very different populations and that they correlate with actual populations with values of r=0.60 (social media) and r=0.49 (web-crawled). Further, Twitter data makes better predictions about the inventory of languages used in each country.
摘要:本文以真实的人口和语言普查数据为基准,评估了取自网页抓取和社交媒体两类来源的大规模地理参考语料库。目标是确定:(i)哪个数据集最能代表人口构成;(ii)这些数据集在世界上哪些地区最能代表实际人口;(iii)如何对数据集加权,以更准确地反映底层人口。研究发现,这两个数据集代表的人群差异很大,它们与实际人口的相关系数分别为 r=0.60(社交媒体)和 r=0.49(网页抓取)。此外,Twitter 数据对各国所用语言清单的预测更为准确。
6. Mapping Languages: The Corpus of Global Language Use [PDF] 返回目录
Jonathan Dunn
Abstract: This paper describes a web-based corpus of global language use with a focus on how this corpus can be used for data-driven language mapping. First, the corpus provides a representation of where national varieties of major languages are used (e.g., English, Arabic, Russian) together with consistently collected data for each variety. Second, the paper evaluates a language identification model that supports more local languages with smaller sample sizes than alternative off-the-shelf models. Improved language identification is essential for moving beyond majority languages. Given the focus on language mapping, the paper analyzes how well this digital language data represents actual populations by (i) systematically comparing the corpus with demographic ground-truth data and (ii) triangulating the corpus with an alternate Twitter-based dataset. In total, the corpus contains 423 billion words representing 148 languages (with over 1 million words from each language) and 158 countries (again with over 1 million words from each country), all distilled from Common Crawl web data. The main contribution of this paper, in addition to describing this publicly-available corpus, is to provide a comprehensive analysis of the relationship between two sources of digital data (the web and Twitter) as well as their connection to underlying populations.
摘要:本文介绍一个基于网页的全球语言使用语料库,并重点讨论如何将其用于数据驱动的语言地图绘制。首先,该语料库刻画了主要语言(如英语、阿拉伯语、俄语)的各国别变体分别在哪些地区使用,并为每个变体提供了以一致方式收集的数据。其次,本文评估了一个语言识别模型,与现成的替代模型相比,它能以更小的样本量支持更多的小语种;改进语言识别对于超越主流语言至关重要。鉴于重点在于语言地图绘制,本文通过(i)将语料库与人口统计的真实数据进行系统比较,以及(ii)用另一个基于 Twitter 的数据集对语料库进行交叉印证,来分析这些数字语言数据对实际人口的代表程度。该语料库总计包含 4230 亿词,覆盖 148 种语言(每种语言超过 100 万词)和 158 个国家(每个国家同样超过 100 万词),全部提取自 Common Crawl 网页数据。本文的主要贡献,除了描述这一公开可用的语料库之外,还在于全面分析了两类数字数据来源(网页与 Twitter)之间的关系及其与底层人口的联系。
7. Igbo-English Machine Translation: An Evaluation Benchmark [PDF] 返回目录
Ignatius Ezeani, Paul Rayson, Ikechukwu Onyenwe, Chinedu Uchechukwu, Mark Hepple
Abstract: Although researchers and practitioners are pushing the boundaries and enhancing the capacities of NLP tools and methods, works on African languages are lagging. A lot of focus has been on well-resourced languages such as English, Japanese, German, French, Russian, and Mandarin Chinese. Over 97% of the world's 7000 languages, including African languages, are low-resourced for NLP, i.e. they have little or no data, tools, and techniques for NLP research. For instance, only 5 out of 2965 (0.19%) authors of full text papers in the ACL Anthology extracted from the 5 major conferences in 2018 (ACL, NAACL, EMNLP, COLING and CoNLL) are affiliated to African institutions. In this work, we discuss our effort toward building a standard machine translation benchmark dataset for Igbo, one of the 3 major Nigerian languages. Igbo is spoken by more than 50 million people globally, with over 50% of the speakers in southeastern Nigeria. Igbo is low-resourced, although there have been some efforts toward developing IgboNLP, such as part-of-speech tagging and diacritic restoration.
摘要:尽管研究者和从业者在不断拓展并增强 NLP 工具与方法的能力,针对非洲语言的工作却相对滞后。大量注意力集中在英语、日语、德语、法语、俄语、汉语普通话等资源丰富的语言上;全球 7000 种语言中超过 97%(包括非洲语言)对 NLP 而言属于低资源语言,即几乎没有可用于 NLP 研究的数据、工具和技术。例如,在 2018 年五大会议(ACL、NAACL、EMNLP、COLING、CoNLL)收录于 ACL Anthology 的全文论文作者中,只有 5 位(2965 位中的 0.19%)隶属于非洲机构。在这项工作中,我们讨论了为伊博语(尼日利亚三大语言之一)构建标准机器翻译基准数据集的工作。全球有超过 5000 万人使用伊博语,其中 50% 以上的使用者位于尼日利亚东南部。尽管在词性标注和变音符号恢复等方面已有一些发展 IgboNLP 的努力,伊博语仍然是低资源语言。
8. Improving the Utility of Knowledge Graph Embeddings with Calibration [PDF] 返回目录
Tara Safavi, Danai Koutra, Edgar Meij
Abstract: This paper addresses machine learning models that embed knowledge graph entities and relationships toward the goal of predicting unseen triples, which is an important task because most knowledge graphs are by nature incomplete. We posit that while offline link prediction accuracy using embeddings has been steadily improving on benchmark datasets, such embedding models have limited practical utility in real-world knowledge graph completion tasks because it is not clear when their predictions should be accepted or trusted. To this end, we propose to calibrate knowledge graph embedding models to output reliable confidence estimates for predicted triples. In crowdsourcing experiments, we demonstrate that calibrated confidence scores can make knowledge graph embeddings more useful to practitioners and data annotators in knowledge graph completion tasks. We also release two resources from our evaluation tasks: An enriched version of the FB15K benchmark and a new knowledge graph dataset extracted from Wikidata.
摘要:本文研究将知识图谱的实体和关系嵌入、以预测未见三元组为目标的机器学习模型;由于大多数知识图谱天然是不完整的,这是一项重要任务。我们认为,尽管使用嵌入进行离线链接预测的准确率在基准数据集上稳步提升,这类嵌入模型在现实世界的知识图谱补全任务中的实用价值仍然有限,因为尚不清楚其预测何时应被接受或信任。为此,我们提出对知识图谱嵌入模型进行校准,使其能为预测出的三元组输出可靠的置信度估计。在众包实验中,我们证明经过校准的置信度分数能让知识图谱嵌入在知识图谱补全任务中对从业者和数据标注者更有用。我们还发布了评估任务中的两项资源:FB15K 基准的增强版本,以及一个从 Wikidata 中抽取的新知识图谱数据集。
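下面的代码片段示意"为三元组分数输出校准后的置信度"这一思路:用 Platt 缩放(逻辑回归)把嵌入模型的原始打分映射为概率。Platt 缩放只是常见的校准手段之一,这里的打分和标签都是假设的小样本数据;原论文采用的具体校准方法请以原文为准。

```python
# 示意:用 Platt 缩放把知识图谱嵌入模型的三元组打分校准为置信度
# raw_scores / labels 为假设数据,仅用于说明流程。
import numpy as np
from sklearn.linear_model import LogisticRegression

# 假设:嵌入模型对一批三元组的原始打分,以及这些三元组的真/假标签(来自标注)
raw_scores = np.array([3.2, 1.1, -0.5, 2.4, -1.8, 0.3, 4.0, -2.6]).reshape(-1, 1)
labels     = np.array([1,   1,    0,   1,    0,   0,   1,    0])

calibrator = LogisticRegression()
calibrator.fit(raw_scores, labels)             # 在留出的校准集上拟合 sigmoid(a*s + b)

new_scores = np.array([[2.0], [-1.0]])
confidence = calibrator.predict_proba(new_scores)[:, 1]
print(confidence)                              # 校准后的置信度,可供标注者决定是否采信预测
```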
9. Learning to cooperate: Emergent communication in multi-agent navigation [PDF] 返回目录
Ivana Kajić, Eser Aygün, Doina Precup
Abstract: Emergent communication in artificial agents has been studied to understand language evolution, as well as to develop artificial systems that learn to communicate with humans. We show that agents performing a cooperative navigation task in various gridworld environments learn an interpretable communication protocol that enables them to efficiently, and in many cases, optimally, solve the task. An analysis of the agents' policies reveals that emergent signals spatially cluster the state space, with signals referring to specific locations and spatial directions such as "left", "up", or "upper left room". Using populations of agents, we show that the emergent protocol has basic compositional structure, thus exhibiting a core property of natural language.
摘要:研究人工智能体中的涌现通信,既是为了理解语言演化,也是为了开发能学会与人类交流的人工系统。我们表明,在多种网格世界(gridworld)环境中执行协作导航任务的智能体,能够学到一种可解释的通信协议,使它们高效地、并且在许多情况下最优地完成任务。对智能体策略的分析表明,涌现出的信号在空间上对状态空间进行聚类,不同信号指代特定的位置和空间方向,例如"左"、"上"或"左上方房间"。通过使用智能体群体,我们证明该涌现协议具有基本的组合结构,从而体现出自然语言的一项核心性质。
10. MCEN: Bridging Cross-Modal Gap between Cooking Recipes and Dish Images with Latent Variable Model [PDF] 返回目录
Han Fu, Rui Wu, Chenghao Liu, Jianling Sun
Abstract: Nowadays, driven by the increasing concern on diet and health, food computing has attracted enormous attention from both industry and research community. One of the most popular research topics in this domain is Food Retrieval, due to its profound influence on health-oriented applications. In this paper, we focus on the task of cross-modal retrieval between food images and cooking recipes. We present Modality-Consistent Embedding Network (MCEN) that learns modality-invariant representations by projecting images and texts to the same embedding space. To capture the latent alignments between modalities, we incorporate stochastic latent variables to explicitly exploit the interactions between textual and visual features. Importantly, our method learns the cross-modal alignments during training but computes embeddings of different modalities independently at inference time for the sake of efficiency. Extensive experimental results clearly demonstrate that the proposed MCEN outperforms all existing approaches on the benchmark Recipe1M dataset and requires less computational cost.
摘要:如今,在人们对饮食与健康日益关注的推动下,食品计算受到了工业界和研究界的极大关注。该领域最热门的研究方向之一是食品检索,因为它对面向健康的应用有深远影响。本文关注食物图像与烹饪菜谱之间的跨模态检索任务。我们提出模态一致嵌入网络(MCEN),通过将图像和文本投影到同一嵌入空间来学习模态不变的表示。为了捕捉模态之间的潜在对齐关系,我们引入随机潜变量,显式利用文本特征与视觉特征之间的交互。重要的是,我们的方法在训练时学习跨模态对齐,而在推理时为提高效率独立计算各模态的嵌入。大量实验结果清楚地表明,所提出的 MCEN 在基准数据集 Recipe1M 上优于所有现有方法,且所需计算开销更小。
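摘要强调 MCEN 在推理时可对两种模态独立计算嵌入。下面用几行代码示意这种"先独立编码、再按相似度检索"的流程;其中的嵌入为随机生成的占位数据,编码器本身未包含在内,与原论文的具体网络结构无关。

```python
# 示意:推理阶段独立计算两种模态的嵌入,再按余弦相似度做跨模态检索
# image_emb / recipe_emb 为随机占位数据,假设它们已由各自的编码器映射到共享空间。
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

rng = np.random.default_rng(0)
image_emb  = rng.normal(size=(5, 128))    # 假设:5 张菜品图像在共享空间中的嵌入
recipe_emb = rng.normal(size=(8, 128))    # 假设:8 份菜谱在共享空间中的嵌入

sims = cosine_sim(image_emb, recipe_emb)  # (5, 8) 相似度矩阵
ranking = np.argsort(-sims, axis=1)       # 每张图像对应的菜谱排序
print(ranking[0])                          # 第 1 张图像检索到的菜谱索引(按相似度降序)
```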
11. Pruned Wasserstein Index Generation Model and wigpy Package [PDF] 返回目录
Fangzhou Xie
Abstract: Recent proposal of the Wasserstein Index Generation model (WIG) has shown a new direction for automatically generating indices. However, it is challenging in practice to fit large datasets for two reasons. First, the Sinkhorn distance is notoriously expensive to compute and suffers from dimensionality severely. Second, it requires computing a full $N\times N$ matrix to be fit into memory, where $N$ is the dimension of the vocabulary. When the dimensionality is too large, it is even impossible to compute at all. I hereby propose a Lasso-based shrinkage method to reduce dimensionality for the vocabulary as a pre-processing step prior to fitting the WIG model. After we get the word embedding from the Word2Vec model, we could cluster these high-dimensional vectors by $k$-means clustering, and pick the most frequent tokens within each cluster to form the "base vocabulary". Non-base tokens are then regressed on the vectors of base tokens to get a transformation weight and we could thus represent the whole vocabulary by only the "base tokens". This variant, called pruned WIG (pWIG), will enable us to shrink vocabulary dimension at will but could still achieve high accuracy. I also provide a \textit{wigpy} module in Python to carry out computation in both flavors. Application to the Economic Policy Uncertainty (EPU) index is showcased as a comparison with existing methods of generating time-series sentiment indices.
摘要:最近提出的 Wasserstein 指数生成模型(WIG)为自动生成指数指明了新方向。然而,在实践中将其用于大规模数据集存在两方面困难:其一,Sinkhorn 距离的计算代价极高,且严重受维度影响;其二,它需要在内存中存放完整的 $N\times N$ 矩阵,其中 $N$ 为词表维度,当维度过大时甚至根本无法计算。本文提出一种基于 Lasso 的收缩方法,在拟合 WIG 模型之前作为预处理步骤对词表降维。在从 Word2Vec 模型得到词向量之后,我们用 $k$-means 对这些高维向量聚类,并在每个簇中选取最高频的词构成"基础词表";随后将非基础词在基础词向量上做回归以获得变换权重,从而仅用"基础词"表示整个词表。这一变体称为剪枝 WIG(pWIG),使我们可以随意压缩词表维度,同时仍能保持较高精度。此外还提供了一个 Python 的 \textit{wigpy} 模块,两种变体的计算均可用它完成。最后以经济政策不确定性(EPU)指数为应用示例,与现有的时间序列情绪指数生成方法进行了比较。
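按照摘要描述的思路,下面给出剪枝步骤的一个最小示意:先对 Word2Vec 词向量做 k-means 聚类,在每个簇中选出最高频的词作为"基础词表",再用 Lasso 把非基础词的向量回归到基础词向量上得到变换权重。变量名和数据均为假设的玩具设置,具体超参数和 wigpy 的实际接口以原论文和软件包为准。

```python
# 示意:pWIG 的词表剪枝步骤 —— k-means 选基础词 + Lasso 回归非基础词
# embeddings / frequencies 为假设的玩具数据,与 wigpy 包的真实接口无关。
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
vocab = [f"w{i}" for i in range(100)]
embeddings = rng.normal(size=(100, 50))        # 假设由 Word2Vec 得到的词向量
frequencies = rng.integers(1, 1000, size=100)  # 假设的词频

k = 10
clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embeddings)

# 每个簇中取最高频的词作为基础词
base_idx = [max(np.where(clusters == c)[0], key=lambda i: frequencies[i]) for c in range(k)]
B = embeddings[base_idx].T                     # 形状 (50, k) 的基础词向量矩阵

# 非基础词向量在基础词向量上做 Lasso 回归,得到稀疏的变换权重
weights = {}
for i in range(len(vocab)):
    if i in base_idx:
        continue
    reg = Lasso(alpha=0.01, max_iter=10000).fit(B, embeddings[i])
    weights[vocab[i]] = reg.coef_              # 长度为 k 的权重向量,用基础词近似表示该词
print(len(weights), "non-base tokens represented by", k, "base tokens")
```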
12. DeepSumm -- Deep Code Summaries using Neural Transformer Architecture [PDF] 返回目录
Vivek Gupta
Abstract: Source code summarization is the task of writing short, natural language descriptions of source code behavior at run time. Such summaries are extremely useful for software development and maintenance but are expensive to author manually, hence it is done for only a small fraction of the code that is produced and is often ignored. Automatic code documentation can possibly solve this at a low cost. This is thus an emerging research field with further applications to program comprehension and software maintenance. Traditional methods often relied on cognitive models that were built in the form of templates and by heuristics, and had varying degrees of adoption by the developer community. But with recent advancements, end-to-end data-driven approaches based on neural techniques have largely overtaken the traditional techniques. Much of the current landscape employs neural translation-based architectures with recurrence and attention, which is a resource- and time-intensive training procedure. In this paper, we employ neural techniques to solve the task of source code summarization and specifically compare NMT-based techniques to the more simplified and appealing Transformer architecture on a dataset of Java methods and comments. We bring forth an argument to dispense with the need for recurrence in the training procedure. To the best of our knowledge, Transformer-based models have not been used for the task before. With supervised samples of more than 2.1m comments and code, we reduce the training time by more than 50% and achieve a BLEU score of 17.99 for the test set of examples.
摘要:源代码摘要是为源代码在运行时的行为撰写简短自然语言描述的任务。这类摘要对软件开发和维护极为有用,但人工撰写成本高昂,因此只有一小部分产出的代码会配有摘要,且这项工作常常被忽略。自动代码文档生成有望以较低成本解决这一问题,因而成为一个新兴的研究领域,并可进一步应用于程序理解和软件维护。传统方法通常依赖以模板和启发式规则构建的认知模型,在开发者社区中的采用程度不一;而随着近年来的进展,基于神经技术的端到端数据驱动方法已在很大程度上超越了传统技术。当前的主流方案多采用带循环结构和注意力机制的神经翻译架构,其训练过程对资源和时间的消耗都很大。本文采用神经技术解决源代码摘要任务,并在一个由 Java 方法及其注释构成的数据集上,将基于 NMT 的技术与更简洁、更具吸引力的 Transformer 架构进行了具体比较。我们提出论证,认为训练过程中可以不再需要循环结构。据我们所知,此前尚未有基于 Transformer 的模型被用于该任务。利用超过 210 万条注释与代码的有监督样本,我们将训练时间缩短了 50% 以上,并在测试样例集上取得了 17.99 的 BLEU 分数。
13. Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers [PDF] 返回目录
Zhicheng Huang, Zhaoyang Zeng, Bei Liu, Dongmei Fu, Jianlong Fu
Abstract: We propose Pixel-BERT to align image pixels with text by deep multi-modal transformers that jointly learn visual and language embedding in a unified end-to-end framework. We aim to build a more accurate and thorough connection between image pixels and language semantics directly from image and sentence pairs, instead of using region-based image features as in most recent vision and language tasks. Our Pixel-BERT, which aligns semantic connection at the pixel and text level, overcomes the limitation of task-specific visual representations for vision and language tasks. It also relieves the cost of bounding box annotations and overcomes the imbalance between semantic labels in visual tasks and language semantics. To provide a better representation for down-stream tasks, we pre-train a universal end-to-end model with image and sentence pairs from the Visual Genome dataset and the MS-COCO dataset. We propose to use a random pixel sampling mechanism to enhance the robustness of visual representation and to apply Masked Language Model and Image-Text Matching as pre-training tasks. Extensive experiments on downstream tasks with our pre-trained model show that our approach achieves state-of-the-art results in downstream tasks, including Visual Question Answering (VQA), image-text retrieval, and Natural Language for Visual Reasoning for Real (NLVR). Particularly, we boost the performance of a single model in the VQA task by 2.17 points compared with SOTA under fair comparison.
摘要:我们提出 Pixel-BERT,通过深度多模态 Transformer 将图像像素与文本对齐,在统一的端到端框架中联合学习视觉与语言嵌入。我们的目标是直接从图像—句子对出发,在图像像素与语言语义之间建立更准确、更全面的联系,而不是像最近的视觉—语言任务那样使用基于区域的图像特征。Pixel-BERT 在像素与文本层面对齐语义联系,克服了面向视觉—语言任务的特定任务视觉表示的局限;它还免除了边界框标注的成本,并克服了视觉任务中的语义标签与语言语义之间的不平衡。为了给下游任务提供更好的表示,我们使用来自 Visual Genome 数据集和 MS-COCO 数据集的图像—句子对预训练了一个通用的端到端模型。我们提出使用随机像素采样机制来增强视觉表示的鲁棒性,并将掩码语言模型(Masked Language Model)和图文匹配(Image-Text Matching)作为预训练任务。使用预训练模型在下游任务上进行的大量实验表明,我们的方法在多个下游任务上取得了最先进的结果,包括视觉问答(VQA)、图文检索,以及面向真实场景的视觉推理自然语言任务(NLVR)。特别地,在公平比较的条件下,我们的单模型在 VQA 任务上的性能较 SOTA 提升了 2.17 个点。
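摘要中提到的"随机像素采样"机制,其基本想法可以用下面几行代码示意:训练时只从 CNN 提取的像素级特征中随机抽取一部分送入 Transformer,以降低计算量并增强鲁棒性。特征形状和采样数量等细节均为假设,具体实现以原论文为准。

```python
# 示意:训练阶段的随机像素(特征)采样 —— 从 H*W 个像素特征中随机取 k 个
# 特征形状和采样数 k 均为假设值,仅说明机制本身。
import numpy as np

def random_pixel_sampling(pixel_features: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """pixel_features: (H*W, d) 的像素级特征;训练时随机保留其中 k 个。"""
    idx = rng.choice(pixel_features.shape[0], size=k, replace=False)
    return pixel_features[idx]

rng = np.random.default_rng(0)
features = rng.normal(size=(64 * 64, 256))      # 假设 CNN 输出 64x64、256 维的像素特征
sampled = random_pixel_sampling(features, k=100, rng=rng)
print(sampled.shape)                             # (100, 256) —— 再与文本 token 一起送入 Transformer
```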
注:中文为机器翻译结果!