目录
1. Towards Zero-shot Cross-lingual Image Retrieval [PDF] 摘要
2. Infinite use of finite means: Zero-Shot Generalization using Compositional Emergent Protocols [PDF] 摘要
3. Tracking Interaction States for Multi-Turn Text-to-SQL Semantic Parsing [PDF] 摘要
4. Label Confusion Learning to Enhance Text Classification Models [PDF] 摘要
5. On Knowledge Distillation for Direct Speech Translation [PDF] 摘要
6. Breeding Gender-aware Direct Speech Translation Systems [PDF] 摘要
7. Multidimensional scaling and linguistic theory [PDF] 摘要
8. Infusing Multi-Source Knowledge with Heterogeneous Graph Neural Network for Emotional Conversation Generation [PDF] 摘要
9. Complex Relation Extraction: Challenges and Opportunities [PDF] 摘要
10. Improving Relation Extraction by Leveraging Knowledge Graph Link Prediction [PDF] 摘要
11. Fusing Context Into Knowledge Graph for Commonsense Reasoning [PDF] 摘要
12. On an Unknown Ancestor of Burrows' Delta Measure [PDF] 摘要
13. Joint Entity and Relation Canonicalization in Open Knowledge Graphs using Variational Autoencoders [PDF] 摘要
14. Fact-Enhanced Synthetic News Generation [PDF] 摘要
15. Edited Media Understanding: Reasoning About Implications of Manipulated Images [PDF] 摘要
16. Generate Your Counterfactuals: Towards Controlled Counterfactual Generation for Text [PDF] 摘要
17. Diluted Near-Optimal Expert Demonstrations for Guiding Dialogue Stochastic Policy Optimisation [PDF] 摘要
18. Transformer Query-Target Knowledge Discovery (TEND): Drug Discovery from CORD-19 [PDF] 摘要
19. SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint [PDF] 摘要
摘要
1. Towards Zero-shot Cross-lingual Image Retrieval [PDF] 返回目录
Pranav Aggarwal, Ajinkya Kale
Abstract: There has been a recent spike in interest in multi-modal Language and Vision problems. On the language side, most of these models primarily focus on English since most multi-modal datasets are monolingual. We try to bridge this gap with a zero-shot approach for learning multi-modal representations using cross-lingual pre-training on the text side. We present a simple yet practical approach for building a cross-lingual image retrieval model which trains on a monolingual training dataset but can be used in a zero-shot cross-lingual fashion during inference. We also introduce a new objective function which tightens the text embedding clusters by pushing dissimilar texts from each other. Finally, we introduce a new 1K multi-lingual MSCOCO2014 caption test dataset (XTD10) in 7 languages that we collected using a crowdsourcing platform. We use this as the test set for evaluating zero-shot model performance across languages. XTD10 dataset is made publicly available here: this https URL
摘要:近来,人们对多模态语言与视觉问题的兴趣激增。在语言方面,由于大多数多模态数据集都是单语的,这些模型大多主要关注英语。我们尝试用一种零样本方法来弥补这一差距,即在文本侧利用跨语言预训练来学习多模态表示。我们提出了一种简单而实用的方法来构建跨语言图像检索模型,该模型在单语训练数据集上训练,但在推理阶段可以以零样本的跨语言方式使用。我们还引入了一个新的目标函数,通过将不相似的文本相互推开来收紧文本嵌入簇。最后,我们介绍了一个新的、通过众包平台收集的、覆盖7种语言的1K多语言MSCOCO2014字幕测试数据集(XTD10),并将其用作评估模型跨语言零样本性能的测试集。XTD10数据集在此公开:this https URL
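下面是一个极简的示意性代码,展示"通过将不相似文本相互推开来收紧文本嵌入簇"这类目标函数的一种常见实现思路(带margin的对比式损失)。这只是根据摘要理解给出的假设性草图,并非论文原始实现,其中的函数名、margin取值等均为假设。

```python
import torch
import torch.nn.functional as F

def cluster_tightening_loss(text_emb, image_emb, margin=0.2):
    """示意性草图(非论文原始实现):拉近匹配的图文对,同时把不相似的文本互相推开。"""
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)

    # 图文检索部分:余弦相似度矩阵,对角线为正样本对
    sim = text_emb @ image_emb.t()
    pos = sim.diag()
    # hinge 形式的检索损失:负样本相似度不应高于正样本相似度减去 margin
    retrieval = F.relu(margin + sim - pos.unsqueeze(1)).fill_diagonal_(0).mean()

    # "推开不相似文本"部分:对批内任意两条不同文本的相似度施加 margin 惩罚
    text_sim = text_emb @ text_emb.t()
    n = text_sim.size(0)
    off_diag = text_sim[~torch.eye(n, dtype=torch.bool)]
    push_apart = F.relu(off_diag - margin).mean()

    return retrieval + push_apart

# 用法示例:用随机向量模拟一批(文本, 图像)嵌入
loss = cluster_tightening_loss(torch.randn(8, 512), torch.randn(8, 512))
print(loss.item())
```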
2. Infinite use of finite means: Zero-Shot Generalization using Compositional Emergent Protocols [PDF] 返回目录
Rishi Hazra, Sonu Dixit, Sayambhu Sen
Abstract: Human language has been described as a system that makes use of finite means to express an unlimited array of thoughts. Of particular interest is the aspect of compositionality, whereby, the meaning of a complex, compound language expression can be deduced from the meaning of its constituent parts. If artificial agents can develop compositional communication protocols akin to human language, they can be made to seamlessly generalize to unseen combinations. However, the real question is, how do we induce compositionality in emergent communication? Studies have recognized the role of curiosity in enabling linguistic development in children. It is this same intrinsic urge that drives us to master complex tasks with decreasing amounts of explicit reward. In this paper, we seek to use this intrinsic feedback in inducing a systematic and unambiguous protolanguage in artificial agents. We show in our experiments, how these rewards can be leveraged in training agents to induce compositionality in absence of any external feedback. Additionally, we introduce Comm-gSCAN, a platform for investigating grounded language acquisition in 2D-grid environments. Using this, we demonstrate how compositionality can enable agents to not only interact with unseen objects, but also transfer skills from one task to other in zero-shot (Can an agent, trained to pull and push twice, pull twice?)
摘要:人类语言被描述为一种利用有限手段表达无限思想的系统。其中特别令人感兴趣的是组合性:一个复杂的复合语言表达的含义可以由其组成部分的含义推导出来。如果人工智能体能够发展出类似人类语言的组合式通信协议,它们就能无缝泛化到未见过的组合。然而,真正的问题是:我们如何在涌现通信中诱导组合性?已有研究认识到好奇心在儿童语言发展中的作用。正是这种内在冲动驱使我们在显式奖励越来越少的情况下掌握复杂任务。在本文中,我们试图利用这种内在反馈,在人工智能体中诱导出系统且无歧义的原始语言。我们在实验中展示了在没有任何外部反馈的情况下,如何利用这些奖励来训练智能体以诱导组合性。此外,我们引入了Comm-gSCAN,这是一个用于研究二维网格环境中基础语言习得的平台。借助它,我们展示了组合性如何使智能体不仅能与未见过的物体交互,还能以零样本的方式把技能从一个任务迁移到另一个任务(一个被训练去执行"拉"和"推两次"的智能体,能否执行"拉两次"?)
3. Tracking Interaction States for Multi-Turn Text-to-SQL Semantic Parsing [PDF] 返回目录
Run-Ze Wang, Zhen-Hua Ling, Jing-Bo Zhou, Yu Hu
Abstract: The task of multi-turn text-to-SQL semantic parsing aims to translate natural language utterances in an interaction into SQL queries in order to answer them using a database which normally contains multiple table schemas. Previous studies on this task usually utilized contextual information to enrich utterance representations and to further influence the decoding process. While they ignored to describe and track the interaction states which are determined by history SQL queries and are related with the intent of current utterance. In this paper, two kinds of interaction states are defined based on schema items and SQL keywords separately. A relational graph neural network and a non-linear layer are designed to update the representations of these two states respectively. The dynamic schema-state and SQL-state representations are then utilized to decode the SQL query corresponding to current utterance. Experimental results on the challenging CoSQL dataset demonstrate the effectiveness of our proposed method, which achieves better performance than other published methods on the task leaderboard.
摘要:多轮文本到SQL语义解析任务旨在将交互中的自然语言话语翻译成SQL查询,以便利用通常包含多个表模式的数据库来回答它们。以往对该任务的研究通常利用上下文信息来丰富话语表示并进一步影响解码过程,却忽略了对交互状态的描述和跟踪;这些交互状态由历史SQL查询决定,并与当前话语的意图相关。本文分别基于模式项和SQL关键字定义了两种交互状态,并设计了一个关系图神经网络和一个非线性层来分别更新这两种状态的表示。随后,动态的模式状态和SQL状态表示被用于解码与当前话语对应的SQL查询。在具有挑战性的CoSQL数据集上的实验结果证明了所提方法的有效性,其性能优于任务排行榜上其他已发表的方法。
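下面用一个极简的PyTorch草图示意"分别更新两种交互状态"的大致思路:模式状态用一步简化的关系图消息传递来更新,SQL状态用一个非线性层来更新。这是按摘要含义给出的假设性示意,维度、更新公式等细节均为假设,并非论文原始模型。

```python
import torch
import torch.nn as nn

class InteractionStateUpdater(nn.Module):
    """示意性草图:维护模式状态(schema-state)与SQL状态(SQL-state)。"""
    def __init__(self, dim=64, num_relations=3):
        super().__init__()
        # 每种边类型一个线性变换,模拟关系图神经网络的一步消息传递
        self.rel_proj = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_relations))
        self.schema_gate = nn.GRUCell(dim, dim)
        # SQL 状态用一个非线性层更新
        self.sql_update = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh())

    def forward(self, schema_state, adj_by_rel, sql_state, sql_keyword_emb):
        # schema_state: [num_items, dim];adj_by_rel: [num_relations, num_items, num_items]
        msg = sum(adj_by_rel[r] @ proj(schema_state) for r, proj in enumerate(self.rel_proj))
        new_schema = self.schema_gate(msg, schema_state)
        # sql_state / sql_keyword_emb: [num_keywords, dim]
        new_sql = self.sql_update(torch.cat([sql_state, sql_keyword_emb], dim=-1))
        return new_schema, new_sql

# 用法示例(随机数据)
upd = InteractionStateUpdater()
schema, sql = upd(torch.randn(5, 64), torch.rand(3, 5, 5), torch.randn(4, 64), torch.randn(4, 64))
```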
4. Label Confusion Learning to Enhance Text Classification Models [PDF] 返回目录
Biyang Guo, Songqiao Han, Xiao Han, Hailiang Huang, Ting Lu
Abstract: Representing a true label as a one-hot vector is a common practice in training text classification models. However, the one-hot representation may not adequately reflect the relation between the instances and labels, as labels are often not completely independent and instances may relate to multiple labels in practice. The inadequate one-hot representations tend to train the model to be over-confident, which may result in arbitrary prediction and model overfitting, especially for confused datasets (datasets with very similar labels) or noisy datasets (datasets with labeling errors). While training models with label smoothing (LS) can ease this problem in some degree, it still fails to capture the realistic relation among labels. In this paper, we propose a novel Label Confusion Model (LCM) as an enhancement component to current popular text classification models. LCM can learn label confusion to capture semantic overlap among labels by calculating the similarity between instances and labels during training and generate a better label distribution to replace the original one-hot label vector, thus improving the final classification performance. Extensive experiments on five text classification benchmark datasets reveal the effectiveness of LCM for several widely used deep learning classification models. Further experiments also verify that LCM is especially helpful for confused or noisy datasets and superior to the label smoothing method.
摘要:将真实标签表示为独热(one-hot)向量是训练文本分类模型的常见做法。然而,独热表示可能无法充分反映实例与标签之间的关系,因为标签往往并非完全独立,而实例在实践中可能与多个标签相关。这种不充分的独热表示容易把模型训练得过度自信,从而导致任意预测和模型过拟合,对于易混淆的数据集(标签非常相似的数据集)或带噪数据集(存在标注错误的数据集)尤其如此。虽然使用标签平滑(LS)训练模型能在一定程度上缓解这一问题,但它仍然无法捕捉标签之间的真实关系。本文提出了一种新颖的标签混淆模型(LCM),作为当前流行文本分类模型的增强组件。LCM通过在训练过程中计算实例与标签之间的相似度来学习标签混淆,以捕捉标签之间的语义重叠,并生成更好的标签分布来替代原始的独热标签向量,从而提高最终的分类性能。在五个文本分类基准数据集上的大量实验表明,LCM对多种广泛使用的深度学习分类模型都有效。进一步的实验还验证了LCM对易混淆或带噪数据集尤其有帮助,并且优于标签平滑方法。
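下面给出一个示意性草图,说明"计算实例与标签的相似度、生成标签分布以替代独热标签"这一思路的常见做法:实例表示与可学习的标签嵌入做点积并取softmax,再与独热分布按系数混合。这是基于摘要理解的假设性实现,混合系数alpha等均为假设,并非论文原始的LCM。

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelConfusionSketch(nn.Module):
    """示意性草图:用实例-标签相似度生成软标签分布,替代独热标签。"""
    def __init__(self, hidden_dim, num_labels, alpha=0.5):
        super().__init__()
        self.label_emb = nn.Embedding(num_labels, hidden_dim)  # 可学习的标签表示
        self.alpha = alpha  # 独热分布与混淆分布的混合系数(假设值)

    def soft_targets(self, instance_repr, gold):
        # 实例与每个标签的相似度 -> 标签混淆分布
        sim = instance_repr @ self.label_emb.weight.t()          # [B, C]
        confusion = F.softmax(sim, dim=-1)
        one_hot = F.one_hot(gold, sim.size(-1)).float()
        # 模拟的标签分布:独热与混淆分布加权混合后再归一化
        mixed = one_hot + self.alpha * confusion
        return mixed / mixed.sum(dim=-1, keepdim=True)

    def loss(self, logits, instance_repr, gold):
        target = self.soft_targets(instance_repr, gold)
        return F.kl_div(F.log_softmax(logits, dim=-1), target, reduction="batchmean")

# 用法示例:分类器输出 logits 与实例表示均用随机张量代替
lcm = LabelConfusionSketch(hidden_dim=32, num_labels=5)
print(lcm.loss(torch.randn(4, 5), torch.randn(4, 32), torch.tensor([0, 2, 1, 4])).item())
```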
5. On Knowledge Distillation for Direct Speech Translation [PDF] 返回目录
Marco Gaido, Mattia A. Di Gangi, Matteo Negri, Marco Turchi
Abstract: Direct speech translation (ST) has shown to be a complex task requiring knowledge transfer from its sub-tasks: automatic speech recognition (ASR) and machine translation (MT). For MT, one of the most promising techniques to transfer knowledge is knowledge distillation. In this paper, we compare the different solutions to distill knowledge in a sequence-to-sequence task like ST. Moreover, we analyze eventual drawbacks of this approach and how to alleviate them maintaining the benefits in terms of translation quality.
摘要:直接语音翻译(ST)已被证明是一项复杂任务,需要从其子任务(自动语音识别(ASR)和机器翻译(MT))中进行知识迁移。对于MT而言,知识蒸馏是最有前景的知识迁移技术之一。在本文中,我们比较了在ST这类序列到序列任务中进行知识蒸馏的不同方案。此外,我们分析了这种方法可能带来的弊端,以及如何在保持翻译质量收益的同时缓解这些弊端。
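作为参考,下面是序列到序列任务中常用的词级知识蒸馏损失的一个极简草图:学生模型(ST)在每个目标词位置拟合教师模型(MT)给出的词分布。这只是对摘要所涉通用技术的示意,温度T等超参数为假设值,并非论文的具体方案。

```python
import torch
import torch.nn.functional as F

def word_level_kd_loss(student_logits, teacher_logits, target_mask, T=1.0):
    """词级知识蒸馏草图:student_logits/teacher_logits: [B, L, V],
    target_mask: [B, L](padding 位置为 0)。"""
    s_logprob = F.log_softmax(student_logits / T, dim=-1)
    t_prob = F.softmax(teacher_logits / T, dim=-1)
    # 每个位置上学生分布与教师分布的 KL 散度
    kl = F.kl_div(s_logprob, t_prob, reduction="none").sum(dim=-1)
    # 仅在非 padding 位置上平均,并按惯例乘以 T^2
    return (kl * target_mask).sum() / target_mask.sum() * (T ** 2)

# 用法示例
B, L, V = 2, 6, 100
loss = word_level_kd_loss(torch.randn(B, L, V), torch.randn(B, L, V), torch.ones(B, L))
print(loss.item())
```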
6. Breeding Gender-aware Direct Speech Translation Systems [PDF] 返回目录
Marco Gaido, Beatrice Savoldi, Luisa Bentivogli, Matteo Negri, Marco Turchi
Abstract: In automatic speech translation (ST), traditional cascade approaches involving separate transcription and translation steps are giving ground to increasingly competitive and more robust direct solutions. In particular, by translating speech audio data without intermediate transcription, direct ST models are able to leverage and preserve essential information present in the input (e.g. speaker's vocal characteristics) that is otherwise lost in the cascade framework. Although such ability proved to be useful for gender translation, direct ST is nonetheless affected by gender bias just like its cascade counterpart, as well as machine translation and numerous other natural language processing applications. Moreover, direct ST systems that exclusively rely on vocal biometric features as a gender cue can be unsuitable and potentially harmful for certain users. Going beyond speech signals, in this paper we compare different approaches to inform direct ST models about the speaker's gender and test their ability to handle gender translation from English into Italian and French. To this aim, we manually annotated large datasets with speakers' gender information and used them for experiments reflecting different possible real-world scenarios. Our results show that gender-aware direct ST solutions can significantly outperform strong - but gender-unaware - direct ST models. In particular, the translation of gender-marked words can increase up to 30 points in accuracy while preserving overall translation quality.
摘要:在自动语音翻译(ST)中,涉及独立转录和翻译两个步骤的传统级联方法正逐渐让位于竞争力越来越强、也更稳健的直接方案。特别地,通过在不经过中间转录的情况下翻译语音音频数据,直接ST模型能够利用并保留输入中的关键信息(例如说话人的声音特征),而这些信息在级联框架中会丢失。尽管这种能力被证明对性别翻译很有用,但与级联方法一样,直接ST同样受到性别偏见的影响,机器翻译和许多其他自然语言处理应用亦是如此。此外,仅依赖声音生物特征作为性别线索的直接ST系统可能并不适用于某些用户,甚至可能对其造成伤害。在语音信号之外,本文比较了向直接ST模型提供说话人性别信息的不同方法,并测试它们处理从英语到意大利语和法语的性别翻译的能力。为此,我们为大型数据集手工标注了说话人的性别信息,并用它们进行了反映不同现实场景的实验。结果表明,具有性别感知能力的直接ST方案可以显著优于强大但不具性别感知能力的直接ST模型。特别是,带性别标记的单词的翻译准确率最多可提高30个百分点,同时保持整体翻译质量。
7. Multidimensional scaling and linguistic theory [PDF] 返回目录
Martijn van der Klis, Jos Tellings
Abstract: This paper reports on the state-of-the-art in the application of multidimensional scaling (MDS) techniques to create semantic maps in linguistic research. MDS refers to a statistical technique that represents objects (lexical items, linguistic contexts, languages, etc.) as points in a space so that close similarity between the objects corresponds to close distances between the corresponding points in the representation. We focus on the recent trend to apply MDS to parallel corpus data in order to investigate a certain linguistic phenomenon from a cross-linguistic perspective. We first introduce the mathematical foundations of MDS, intended for non-experts, so that readers understand notions such as 'eigenvalues', 'dimensionality reduction', 'stress values', etc. as they appear in linguistic MDS writing. We then give an exhaustive overview of past research that employs MDS techniques in combination with parallel corpus data, and propose a set of terminology to succinctly describe the key parameters of a particular MDS application. We go over various research questions that have been answered with the aid of MDS maps, showing that the methodology covers topics in a spectrum ranging from classic typology (e.g. language classification) to formal linguistics (e.g. study of a phenomenon in a single language). We finally identify two lines of future research that build on the insights of earlier MDS research described in the paper. First, we envisage the use of MDS in the investigation of cross-linguistic variation of compositional structures, an important area in variation research that has not been approached by parallel corpus work yet. Second, we discuss how MDS can be complemented and compared with other dimensionality reduction techniques that have seen little use in the linguistic domain so far.
摘要:本文综述了多维尺度分析(MDS)技术在语言学研究中用于构建语义地图的最新进展。MDS是一种统计技术,它将对象(词项、语言环境、语言等)表示为空间中的点,使对象之间的高度相似性对应于表示中相应点之间的较近距离。我们关注将MDS应用于平行语料库数据、从跨语言视角考察某种语言现象的近期趋势。我们首先面向非专业读者介绍MDS的数学基础,以便读者理解语言学MDS文献中出现的"特征值"、"降维"、"stress值"等概念。然后,我们详尽地综述了将MDS技术与平行语料库数据结合使用的既有研究,并提出了一套术语来简明描述特定MDS应用的关键参数。我们回顾了借助MDS语义地图得到解答的各类研究问题,表明该方法涵盖了从经典类型学(如语言分类)到形式语言学(如对单一语言中某一现象的研究)的广泛主题。最后,我们基于文中所述早期MDS研究的洞见,指出了两条未来的研究路线。首先,我们设想将MDS用于研究组合结构的跨语言变异,这是变异研究中尚未被平行语料库工作涉足的重要领域。其次,我们讨论了MDS如何与迄今在语言学领域很少使用的其他降维技术相互补充和比较。
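为便于理解摘要中提到的MDS,下面给出经典MDS(基于双重中心化与特征分解)的一个极简实现草图:输入对象两两之间的非相似度矩阵,输出二维坐标,可用于绘制语义地图。示例数据为随机生成,仅作说明之用。

```python
import numpy as np

def classical_mds(dissimilarity, k=2):
    """经典MDS草图:dissimilarity 为 n×n 对称非相似度矩阵,返回 n×k 坐标。"""
    n = dissimilarity.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # 中心化矩阵
    B = -0.5 * J @ (dissimilarity ** 2) @ J      # 双重中心化
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:k]          # 取最大的 k 个特征值
    vals = np.clip(eigvals[idx], 0, None)
    return eigvecs[:, idx] * np.sqrt(vals)       # 坐标 = 特征向量 * sqrt(特征值)

# 用法示例:随机点的欧氏距离矩阵应能被近似还原为二维布局
pts = np.random.rand(10, 2)
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
coords = classical_mds(D, k=2)
print(coords.shape)  # (10, 2)
```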
8. Infusing Multi-Source Knowledge with Heterogeneous Graph Neural Network for Emotional Conversation Generation [PDF] 返回目录
Yunlong Liang, Fandong Meng, Ying Zhang, Jinan Xu, Yufeng Chen, Jie Zhou
Abstract: The success of emotional conversation systems depends on sufficient perception and appropriate expression of emotions. In a real-world conversation, we firstly instinctively perceive emotions from multi-source information, including the emotion flow of dialogue history, facial expressions, and personalities of speakers, and then express suitable emotions according to our personalities, but these multiple types of information are insufficiently exploited in emotional conversation fields. To address this issue, we propose a heterogeneous graph-based model for emotional conversation generation. Specifically, we design a Heterogeneous Graph-Based Encoder to represent the conversation content (i.e., the dialogue history, its emotion flow, facial expressions, and speakers' personalities) with a heterogeneous graph neural network, and then predict suitable emotions for feedback. After that, we employ an Emotion-Personality-Aware Decoder to generate a response not only relevant to the conversation context but also with appropriate emotions, by taking the encoded graph representations, the predicted emotions from the encoder and the personality of the current speaker as inputs. Experimental results show that our model can effectively perceive emotions from multi-source knowledge and generate a satisfactory response, which significantly outperforms previous state-of-the-art models.
摘要:情感对话系统的成功取决于对情感的充分感知和恰当表达。在现实世界的对话中,我们首先会本能地从多源信息中感知情感,包括对话历史的情感流、面部表情以及说话人的个性,然后根据自身个性表达合适的情感;然而,这些多种类型的信息在情感对话领域尚未得到充分利用。为了解决这一问题,我们提出了一种基于异构图的情感对话生成模型。具体而言,我们设计了一个基于异构图的编码器,利用异构图神经网络来表示对话内容(即对话历史、其情感流、面部表情和说话人的个性),进而预测合适的反馈情感。之后,我们采用情绪-个性感知解码器,将编码后的图表示、编码器预测的情感以及当前说话人的个性作为输入,生成既切合对话上下文又带有恰当情感的回复。实验结果表明,我们的模型能够有效地从多源知识中感知情感并生成令人满意的回复,显著优于之前的最先进模型。
9. Complex Relation Extraction: Challenges and Opportunities [PDF] 返回目录
Haiyun Jiang, Qiaoben Bao, Qiao Cheng, Deqing Yang, Li Wang, Yanghua Xiao
Abstract: Relation extraction aims to identify the target relations of entities in texts. Relation extraction is very important for knowledge base construction and text understanding. Traditional binary relation extraction, including supervised, semi-supervised and distant supervised ones, has been extensively studied and significant results are achieved. In recent years, many complex relation extraction tasks, i.e., the variants of simple binary relation extraction, are proposed to meet the complex applications in practice. However, there is no literature to fully investigate and summarize these complex relation extraction works so far. In this paper, we first report the recent progress in traditional simple binary relation extraction. Then we summarize the existing complex relation extraction tasks and present the definition, recent progress, challenges and opportunities for each task.
摘要:关系提取旨在识别文本中实体的目标关系。 关系提取对于知识库构建和文本理解非常重要。 传统的二元关系提取,包括有监督,半监督和远距离监督,已经得到了广泛的研究,并取得了显著成果。 近年来,提出了许多复杂的关系提取任务,即简单的二进制关系提取的变体,以满足实际中的复杂应用。 但是,到目前为止,还没有文献对这些复杂的关系提取工作进行充分的调查和总结。 在本文中,我们首先报告了传统简单二进制关系提取中的最新进展。 然后,我们总结了现有的复杂关系提取任务,并给出了每个任务的定义,近期进展,挑战和机遇。
10. Improving Relation Extraction by Leveraging Knowledge Graph Link Prediction [PDF] 返回目录
George Stoica, Emmanouil Antonios Platanios, Barnabás Póczos
Abstract: Relation extraction (RE) aims to predict a relation between a subject and an object in a sentence, while knowledge graph link prediction (KGLP) aims to predict a set of objects, O, given a subject and a relation from a knowledge graph. These two problems are closely related as their respective objectives are intertwined: given a sentence containing a subject and an object o, a RE model predicts a relation that can then be used by a KGLP model together with the subject, to predict a set of objects O. Thus, we expect object o to be in set O. In this paper, we leverage this insight by proposing a multi-task learning approach that improves the performance of RE models by jointly training on RE and KGLP tasks. We illustrate the generality of our approach by applying it on several existing RE models and empirically demonstrate how it helps them achieve consistent performance gains.
摘要:关系提取(RE)旨在预测句子中主语和宾语之间的关系,而知识图谱链接预测(KGLP)旨在在给定主语和关系的条件下,从知识图谱中预测一组宾语O。这两个问题密切相关,因为它们各自的目标相互交织:给定一个包含主语和宾语o的句子,RE模型预测出一个关系,KGLP模型随后可以将该关系与主语一起用于预测一组宾语O。因此,我们期望宾语o属于集合O。在本文中,我们利用这一洞见,提出了一种多任务学习方法,通过在RE和KGLP任务上联合训练来提升RE模型的性能。我们将该方法应用于多个现有的RE模型以说明其通用性,并通过实验证明它能帮助这些模型取得一致的性能提升。
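下面用一个极简草图示意"在RE与KGLP任务上联合训练"的一般形式:两个任务共享实体与关系表示,总损失为两个任务损失的加权和。这是基于摘要的假设性示意(共享结构、打分函数与权重lam均为假设),并非论文的具体模型。

```python
import torch
import torch.nn as nn

class JointREKGLP(nn.Module):
    """示意性草图:RE 与 KGLP 共享实体/关系表示,联合优化。"""
    def __init__(self, vocab=1000, num_rel=20, dim=64, lam=0.5):
        super().__init__()
        self.ent_emb = nn.Embedding(vocab, dim)
        self.rel_emb = nn.Embedding(num_rel, dim)
        self.re_head = nn.Linear(2 * dim, num_rel)   # 由(主语, 宾语)预测关系
        self.lam = lam
        self.ce = nn.CrossEntropyLoss()

    def forward(self, subj, obj, rel):
        s, o, r = self.ent_emb(subj), self.ent_emb(obj), self.rel_emb(rel)
        # RE:预测主语-宾语之间的关系
        re_logits = self.re_head(torch.cat([s, o], dim=-1))
        # KGLP(DistMult 风格打分):给定主语与关系,对所有实体打分以预测宾语
        kglp_logits = (s * r) @ self.ent_emb.weight.t()
        return self.ce(re_logits, rel) + self.lam * self.ce(kglp_logits, obj)

# 用法示例(随机 ID)
model = JointREKGLP()
loss = model(torch.randint(0, 1000, (8,)), torch.randint(0, 1000, (8,)), torch.randint(0, 20, (8,)))
loss.backward()
```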
11. Fusing Context Into Knowledge Graph for Commonsense Reasoning [PDF] 返回目录
Yichong Xu, Chenguang Zhu, Ruochen Xu, Yang Liu, Michael Zeng, Xuedong Huang
Abstract: Commonsense reasoning requires a model to make presumptions about world events via language understanding. Many methods couple pre-trained language models with knowledge graphs in order to combine the merits in language modeling and entity-based relational learning. However, although a knowledge graph contains rich structural information, it lacks the context to provide a more precise understanding of the concepts and relations. This creates a gap when fusing knowledge graphs into language modeling, especially in the scenario of insufficient paired text-knowledge data. In this paper, we propose to utilize external entity description to provide contextual information for graph entities. For the CommonsenseQA task, our model first extracts concepts from the question and choice, and then finds a related triple between these concepts. Next, it retrieves the descriptions of these concepts from Wiktionary and feed them as additional input to a pre-trained language model, together with the triple. The resulting model can attain much more effective commonsense reasoning capability, achieving state-of-the-art results in the CommonsenseQA dataset with an accuracy of 80.7% (single model) and 83.3% (ensemble model) on the official leaderboard.
摘要:常识推理要求模型通过语言理解对世界事件做出推断。许多方法将预训练语言模型与知识图谱相结合,以融合语言建模和基于实体的关系学习两方面的优点。然而,尽管知识图谱包含丰富的结构信息,它缺乏能够更精确理解概念及其关系的上下文。这在将知识图谱融入语言建模时造成了鸿沟,在成对的文本-知识数据不足的情形下尤其如此。在本文中,我们提出利用外部实体描述为图谱实体提供上下文信息。对于CommonsenseQA任务,我们的模型首先从问题和选项中抽取概念,然后在这些概念之间找到相关的三元组。接下来,它从Wiktionary中检索这些概念的描述,并将它们与三元组一起作为附加输入馈入预训练语言模型。由此得到的模型可以获得更有效的常识推理能力,在CommonsenseQA数据集上取得了最先进的结果,在官方排行榜上的准确率为80.7%(单模型)和83.3%(集成模型)。
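下面是一个示意性的输入构造草图,展示"把从Wiktionary检索到的概念描述与三元组一并作为预训练语言模型附加输入"这一思路可能的拼接方式。分隔符、字段顺序与示例文本均为假设,并非论文的确切输入格式。

```python
def build_lm_input(question, choice, triple, descriptions, sep=" [SEP] "):
    """示意性草图:把问题、选项、相关三元组和概念描述拼成一条输入文本。"""
    head, relation, tail = triple
    desc_text = " ".join(f"{c}: {d}" for c, d in descriptions.items())
    parts = [question, choice, f"{head} {relation} {tail}", desc_text]
    return sep.join(parts)

# 用法示例(内容为虚构,仅作说明)
text = build_lm_input(
    question="Where would you store a book you are currently reading?",
    choice="bedside table",
    triple=("book", "AtLocation", "table"),
    descriptions={"book": "a written work published in printed or electronic form",
                  "table": "a piece of furniture with a flat top"},
)
print(text)
```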
12. On an Unknown Ancestor of Burrows' Delta Measure [PDF] 返回目录
Petr Plechac
Abstract: This article points out some surprising similarities between a 1944 study by Georgy Udny Yule and modern approaches to authorship attribution.
摘要:本文指出了乔治·乌德尼·尤尔(Georgy Udny Yule)1944年的一项研究与现代作者身份归属方法之间令人惊讶的相似之处。
13. Joint Entity and Relation Canonicalization in Open Knowledge Graphs using Variational Autoencoders [PDF] 返回目录
Sarthak Dash, Gaetano Rossiello, Nandana Mihindukulasooriya, Sugato Bagchi, Alfio Gliozzo
Abstract: Noun phrases and relation phrases in open knowledge graphs are not canonicalized, leading to an explosion of redundant and ambiguous subject-relation-object triples. Existing approaches to face this problem take a two-step approach: first, they generate embedding representations for both noun and relation phrases, then a clustering algorithm is used to group them using the embeddings as features. In this work, we propose Canonicalizing Using Variational AutoEncoders (CUVA), a joint model to learn both embeddings and cluster assignments in an end-to-end approach, which leads to a better vector representation for the noun and relation phrases. Our evaluation over multiple benchmarks shows that CUVA outperforms the existing state of the art approaches. Moreover, we introduce CanonicNell a novel dataset to evaluate entity canonicalization systems.
摘要:开放知识图谱中的名词短语和关系短语没有经过规范化,导致冗余且有歧义的"主语-关系-宾语"三元组急剧膨胀。解决该问题的现有方法采用两步走策略:首先为名词短语和关系短语生成嵌入表示,然后使用聚类算法,以嵌入为特征对它们进行分组。在这项工作中,我们提出了基于变分自编码器的规范化方法(CUVA),这是一个以端到端方式同时学习嵌入和聚类分配的联合模型,能为名词短语和关系短语带来更好的向量表示。我们在多个基准上的评估表明,CUVA优于现有的最先进方法。此外,我们引入了一个新的数据集CanonicNell,用于评估实体规范化系统。
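下面给出一个非常简化的草图,示意"在端到端框架中同时学习嵌入与聚类分配"的一种常见做法:变分自编码器学习短语嵌入,同时用可学习的聚类中心对隐变量做软分配。这只是按摘要意思给出的假设性示意(各损失项及其权重均为假设),并非CUVA的原始模型。

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAEClusterSketch(nn.Module):
    """示意性草图:VAE 嵌入 + 可学习聚类中心的软分配。"""
    def __init__(self, in_dim=300, z_dim=32, num_clusters=10):
        super().__init__()
        self.enc = nn.Linear(in_dim, 2 * z_dim)       # 输出 mu 与 logvar
        self.dec = nn.Linear(z_dim, in_dim)
        self.centroids = nn.Parameter(torch.randn(num_clusters, z_dim))

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # 重参数化
        recon = self.dec(z)
        # 软聚类分配:对到各聚类中心的负平方距离做 softmax
        dist = torch.cdist(z, self.centroids) ** 2
        assign = F.softmax(-dist, dim=-1)
        # 损失 = 重构 + KL + 把 z 拉向其软分配的聚类中心
        recon_loss = F.mse_loss(recon, x)
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        cluster_loss = (assign * dist).sum(dim=-1).mean()
        return recon_loss + kl + 0.1 * cluster_loss, assign

# 用法示例:随机"短语嵌入"作为输入
model = VAEClusterSketch()
loss, assign = model(torch.randn(16, 300))
loss.backward()
```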
14. Fact-Enhanced Synthetic News Generation [PDF] 返回目录
Kai Shu, Yichuan Li, Kaize Ding, Huan Liu
Abstract: The advanced text generation methods have witnessed great success in text summarization, language translation, and synthetic news generation. However, these techniques can be abused to generate disinformation and fake news. To better understand the potential threats of synthetic news, we develop a new generation method FactGen to generate high-quality news content. The existing text generation methods either afford limited supplementary information or lose consistency between the input and output which makes the synthetic news less trustworthy. To address these issues, FactGen retrieves external facts to enrich the output and reconstructs the input claim from the generated content to improve the consistency among the input and the output. Experiment results on real-world datasets show that the generated news contents of FactGen are consistent and contain rich facts. We also discuss the possible defending method to identify these synthetic news pieces if FactGen is used to generate synthetic news.
摘要:先进的文本生成方法已在文本摘要、语言翻译和合成新闻生成方面取得了巨大成功。然而,这些技术可能被滥用来制造虚假信息和假新闻。为了更好地理解合成新闻的潜在威胁,我们开发了一种新的生成方法FactGen来生成高质量的新闻内容。现有的文本生成方法要么只能提供有限的补充信息,要么丧失输入与输出之间的一致性,使得合成新闻的可信度降低。为了解决这些问题,FactGen检索外部事实以丰富输出,并从生成的内容中重建输入声明,以提高输入与输出之间的一致性。在真实数据集上的实验结果表明,FactGen生成的新闻内容具有一致性且包含丰富的事实。我们还讨论了在FactGen被用于生成合成新闻时识别这些合成新闻的可能防御方法。
15. Edited Media Understanding: Reasoning About Implications of Manipulated Images [PDF] 返回目录
Jeff Da, Maxwell Forbes, Rowan Zellers, Anthony Zheng, Jena D. Hwang, Antoine Bosselut, Yejin Choi
Abstract: Multimodal disinformation, from `deepfakes' to simple edits that deceive, is an important societal problem. Yet at the same time, the vast majority of media edits are harmless -- such as a filtered vacation photo. The difference between this example, and harmful edits that spread disinformation, is one of intent. Recognizing and describing this intent is a major challenge for today's AI systems. We present the task of Edited Media Understanding, requiring models to answer open-ended questions that capture the intent and implications of an image edit. We introduce a dataset for our task, EMU, with 48k question-answer pairs written in rich natural language. We evaluate a wide variety of vision-and-language models for our task, and introduce a new model PELICAN, which builds upon recent progress in pretrained multimodal representations. Our model obtains promising results on our dataset, with humans rating its answers as accurate 40.35% of the time. At the same time, there is still much work to be done -- humans prefer human-annotated captions 93.56% of the time -- and we provide analysis that highlights areas for further progress.
摘要:从深度伪造(deepfakes)到具有欺骗性的简单编辑,多模态虚假信息是一个重要的社会问题。但与此同时,绝大多数媒体编辑是无害的,例如一张加了滤镜的度假照片。这一例子与传播虚假信息的有害编辑之间的区别在于意图。识别并描述这种意图是当今AI系统面临的一大挑战。我们提出了"编辑媒体理解"任务,要求模型回答能够捕捉图像编辑意图及其影响的开放式问题。我们为该任务构建了数据集EMU,其中包含以丰富自然语言撰写的4.8万个问答对。我们针对该任务评估了多种视觉-语言模型,并引入了一个新模型PELICAN,它建立在预训练多模态表示的最新进展之上。我们的模型在数据集上取得了可喜的结果,其答案被人类评定为准确的比例为40.35%。与此同时,仍有大量工作要做:人类在93.56%的情况下更偏好人工撰写的描述;我们的分析也指出了有待进一步改进的方向。
16. Generate Your Counterfactuals: Towards Controlled Counterfactual Generation for Text [PDF] 返回目录
Nishtha Madaan, Inkit Padhi, Naveen Panwar, Diptikalyan Saha
Abstract: Machine Learning has seen tremendous growth recently, which has led to a larger adoption of ML systems for educational assessments, credit risk, healthcare, employment, criminal justice, to name a few. Trustworthiness of ML and NLP systems is a crucial aspect and requires guarantee that the decisions they make are fair and robust. Aligned with this, we propose a framework GYC, to generate a set of counterfactual text samples, which are crucial for testing these ML systems. Our main contributions include a) We introduce GYC, a framework to generate counterfactual samples such that the generation is plausible, diverse, goal-oriented, and effective, b) We generate counterfactual samples, that can direct the generation towards a corresponding condition such as named-entity tag, semantic role label, or sentiment. Our experimental results on various domains show that GYC generates counterfactual text samples exhibiting the above four properties. The generated counterfactuals can then be fed complementary to the existing data augmentation for improving the debiasing algorithms performance as compared to existing counterfactuals generated by token substitution. GYC generates counterfactuals that can act as test cases to evaluate a model and any text debiasing algorithm.
摘要:机器学习最近取得了巨大的发展,这导致机器学习系统在教育评估,信用风险,医疗保健,就业,刑事司法等方面得到了更大的采用。机器学习和自然语言处理系统的可信赖性是至关重要的方面,需要保证它们做出的决策是公正而稳健的。为此,我们提出了一个框架GYC,以生成一组反事实文本样本,这些样本对于测试这些ML系统至关重要。我们的主要贡献包括:a)引入GYC,这是一个生成反事实样本的框架,从而使生成的样本看起来合理,多样,面向目标且有效; b)我们生成反事实样本,可以将生成的样本引向相应的条件,例如命名实体标签,语义角色标签或情感。我们在各个领域的实验结果表明,GYC会生成具有上述四个属性的反事实文本样本。然后,与通过令牌替换生成的现有反事实相比,可以将生成的反事实补充到现有数据扩充中,以提高去偏置算法的性能。 GYC生成反事实,它们可以充当测试用例以评估模型和任何文本去偏算法。
17. Diluted Near-Optimal Expert Demonstrations for Guiding Dialogue Stochastic Policy Optimisation [PDF] 返回目录
Thibault Cordier, Tanguy Urvoy, Lina M. Rojas-Barahona, Fabrice Lefèvre
Abstract: A learning dialogue agent can infer its behaviour from interactions with the users. These interactions can be taken from either human-to-human or human-machine conversations. However, human interactions are scarce and costly, making learning from few interactions essential. One solution to speedup the learning process is to guide the agent's exploration with the help of an expert. We present in this paper several imitation learning strategies for dialogue policy where the guiding expert is a near-optimal handcrafted policy. We incorporate these strategies with state-of-the-art reinforcement learning methods based on Q-learning and actor-critic. We notably propose a randomised exploration policy which allows for a seamless hybridisation of the learned policy and the expert. Our experiments show that our hybridisation strategy outperforms several baselines, and that it can accelerate the learning when facing real humans.
摘要:学习型对话智能体可以从与用户的交互中推断其行为。这些交互既可以来自人与人的对话,也可以来自人机对话。然而,人类交互稀缺且代价高昂,因此从少量交互中学习至关重要。加速学习过程的一种方案是借助专家来引导智能体的探索。本文提出了几种对话策略的模仿学习策略,其中作为引导的专家是一个接近最优的手工策略。我们将这些策略与基于Q学习和actor-critic的最先进强化学习方法相结合,并特别提出了一种随机化探索策略,使学习到的策略与专家策略能够无缝混合。实验表明,我们的混合策略优于多个基线,并且在面对真实人类时能够加速学习。
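下面用一个极简草图示意"将学习到的策略与专家策略随机混合"的一种直观做法:每一步以概率beta执行专家动作,否则执行学习策略的动作,beta随训练衰减。这是按摘要含义给出的假设性示意,混合方式与衰减方案均为假设,env、update_fn等接口也只是占位,并非论文中随机化探索策略的确切定义。

```python
import random

def mixed_action(learned_policy, expert_policy, state, beta):
    """以概率 beta 听从专家,否则执行学习到的策略(示意性草图)。"""
    if random.random() < beta:
        return expert_policy(state)
    return learned_policy(state)

def train_loop(env, learned_policy, expert_policy, update_fn, episodes=100, beta0=0.9, decay=0.98):
    """beta 逐回合衰减:前期依赖专家演示引导探索,后期交还给学习策略。
    env 为假设的环境接口:reset() 返回状态,step(a) 返回 (下一状态, 奖励, 是否结束)。"""
    beta = beta0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = mixed_action(learned_policy, expert_policy, state, beta)
            next_state, reward, done = env.step(action)
            update_fn(state, action, reward, next_state, done)  # 例如 Q-learning 更新
            state = next_state
        beta *= decay
```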
18. Transformer Query-Target Knowledge Discovery (TEND): Drug Discovery from CORD-19 [PDF] 返回目录
Leo K. Tam, Xiaosong Wang, Daguang Xu
Abstract: Previous work established skip-gram word2vec models could be used to mine knowledge in the materials science literature for the discovery of thermoelectrics. Recent transformer architectures have shown great progress in language modeling and associated fine-tuned tasks, but they have yet to be adapted for drug discovery. We present a RoBERTa transformer-based method that extends the masked language token prediction using query-target conditioning to treat the specificity challenge. The transformer discovery method entails several benefits over the word2vec method including domain-specific (antiviral) analogy performance, negation handling, and flexible query analysis (specific) and is demonstrated on influenza drug discovery. To stimulate COVID-19 research, we release an influenza clinical trials and antiviral analogies dataset used in conjunction with the COVID-19 Open Research Dataset Challenge (CORD-19) literature dataset in the study. We examine k-shot fine-tuning to improve the downstream analogies performance as well as to mine analogies for model explainability. Further, the query-target analysis is verified in a forward chaining analysis against the influenza drug clinical trials dataset, before adapted for COVID-19 drugs (combinations and side-effects) and on-going clinical trials. In consideration of the present topic, we release the model, dataset, and code.
摘要:先前的工作表明,skip-gram word2vec模型可用于挖掘材料科学文献中的知识,以发现热电材料。近期的Transformer架构在语言建模及相关微调任务上取得了巨大进展,但尚未被用于药物发现。我们提出了一种基于RoBERTa Transformer的方法,利用查询-目标条件化来扩展掩码语言词元预测,以应对特异性方面的挑战。与word2vec方法相比,这种基于Transformer的发现方法具有多项优势,包括领域特定(抗病毒)类比性能、否定处理以及灵活的(特定)查询分析,并在流感药物发现上得到了验证。为了推动COVID-19研究,我们发布了一个流感临床试验与抗病毒类比数据集,并在研究中将其与COVID-19开放研究数据集挑战(CORD-19)文献数据集结合使用。我们考察了k-shot微调,以改进下游类比性能,并挖掘类比以提升模型可解释性。此外,在将查询-目标分析用于COVID-19药物(组合与副作用)及正在进行的临床试验之前,我们先在流感药物临床试验数据集上通过前向链接分析对其进行了验证。鉴于当前的主题,我们公开了模型、数据集和代码。
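作为背景说明,下面演示如何用Hugging Face transformers的fill-mask接口做掩码词元预测,并通过在查询前拼接上下文来近似"条件化"预测。这只是对摘要中一般技术的说明性示例(模型名、提示文本均为示意),并非论文的TEND方法本身。

```python
# 需要安装: pip install transformers torch
from transformers import pipeline

# 用通用的 RoBERTa 掩码语言模型做演示(论文使用的是领域语料上训练的模型)
fill_mask = pipeline("fill-mask", model="roberta-base")

# 无条件的掩码预测
print(fill_mask("Oseltamivir is a drug used to treat <mask>.")[:3])

# 在查询前拼接目标相关的上下文,近似"查询-目标条件化"的效果
context = "Influenza antiviral treatment. "
print(fill_mask(context + "Oseltamivir is a drug used to treat <mask>.")[:3])
```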
19. SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint [PDF] 返回目录
Zhonghao Sheng, Kaitao Song, Xu Tan, Yi Ren, Wei Ye, Shikun Zhang, Tao Qin
Abstract: Automatic song writing aims to compose a song (lyric and/or melody) by machine, which is an interesting topic in both academia and industry. In automatic song writing, lyric-to-melody generation and melody-to-lyric generation are two important tasks, both of which usually suffer from the following challenges: 1) the paired lyric and melody data are limited, which affects the generation quality of the two tasks, considering a lot of paired training data are needed due to the weak correlation between lyric and melody; 2) Strict alignments are required between lyric and melody, which relies on specific alignment modeling. In this paper, we propose SongMASS to address the above challenges, which leverages masked sequence to sequence (MASS) pre-training and attention based alignment modeling for lyric-to-melody and melody-to-lyric generation. Specifically, 1) we extend the original sentence-level MASS pre-training to song level to better capture long contextual information in music, and use a separate encoder and decoder for each modality (lyric or melody); 2) we leverage sentence-level attention mask and token-level attention constraint during training to enhance the alignment between lyric and melody. During inference, we use a dynamic programming strategy to obtain the alignment between each word/syllable in lyric and note in melody. We pre-train SongMASS on unpaired lyric and melody datasets, and both objective and subjective evaluations demonstrate that SongMASS generates lyric and melody with significantly better quality than the baseline method without pre-training or alignment constraint.
摘要:自动歌曲创作旨在由机器来创作歌曲(歌词和/或旋律),这在学术界和工业界都是一个有趣的课题。在自动歌曲创作中,歌词到旋律生成和旋律到歌词生成是两项重要任务,二者通常面临以下挑战:1)成对的歌词-旋律数据有限,而由于歌词与旋律之间相关性较弱,这两项任务需要大量成对训练数据,因此生成质量受到影响;2)歌词与旋律之间需要严格对齐,这依赖于专门的对齐建模。在本文中,我们提出SongMASS来应对上述挑战,它利用掩码序列到序列(MASS)预训练和基于注意力的对齐建模来进行歌词到旋律和旋律到歌词的生成。具体而言,1)我们将原有的句子级MASS预训练扩展到歌曲级,以更好地捕获音乐中的长上下文信息,并为每种模态(歌词或旋律)使用单独的编码器和解码器;2)我们在训练中利用句子级注意力掩码和词元级注意力约束来增强歌词与旋律之间的对齐。在推理过程中,我们使用动态规划策略来获得歌词中每个词/音节与旋律中音符之间的对齐。我们在未配对的歌词和旋律数据集上对SongMASS进行预训练,客观和主观评估均表明,与没有预训练或对齐约束的基线方法相比,SongMASS生成的歌词和旋律质量显著更好。
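下面是一个极简的动态规划对齐草图,示意"为歌词中的每个词/音节与旋律中的音符求对齐"可以如何在给定的逐对得分矩阵上通过单调对齐来实现(类似经典的序列对齐/DTW)。得分矩阵的来源(例如注意力权重)以及是否允许一对多等细节均为假设,并非论文的确切算法。

```python
import numpy as np

def monotonic_alignment(score):
    """在得分矩阵 score[i, j](音节 i 与音符 j 的匹配得分)上求单调对齐路径,
    允许一个音节对应多个音符(示意性草图)。"""
    n, m = score.shape
    dp = np.full((n, m), -np.inf)
    dp[0, 0] = score[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            prev = max(dp[i, j - 1] if j > 0 else -np.inf,                      # 同一音节延续到下一个音符
                       dp[i - 1, j - 1] if i > 0 and j > 0 else -np.inf)        # 进入下一个音节
            dp[i, j] = score[i, j] + prev
    # 回溯:得到每个音符对应的音节
    align, i, j = [], n - 1, m - 1
    while j >= 0:
        align.append((i, j))
        if i > 0 and j > 0 and dp[i - 1, j - 1] >= dp[i, j - 1]:
            i -= 1
        j -= 1
    return align[::-1]

# 用法示例:随机得分矩阵,3 个音节对齐到 5 个音符
print(monotonic_alignment(np.random.rand(3, 5)))
```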
注:中文为机器翻译结果!封面为论文标题词云图!