
[arXiv Papers] Computation and Language 2020-12-08

Contents

1. Benchmarking Intent Detection for Task-Oriented Dialog Systems [PDF] Abstract
2. Evaluating Cross-Lingual Transfer Learning Approaches in Multilingual Conversational Agent Models [PDF] Abstract
3. The Lab vs The Crowd: An Investigation into Data Quality for Neural Dialogue Models [PDF] Abstract
4. Stylometry for Noisy Medieval Data: Evaluating Paul Meyer's Hagiographic Hypothesis [PDF] Abstract
5. What Meaning-Form Correlation Has to Compose With [PDF] Abstract
6. Using previous acoustic context to improve Text-to-Speech synthesis [PDF] Abstract
7. Reference Knowledgeable Network for Machine Reading Comprehension [PDF] Abstract
8. An Enhanced MeanSum Method For Generating Hotel Multi-Review Summarizations [PDF] Abstract
9. Topical Change Detection in Documents via Embeddings of Long Sequences [PDF] Abstract
10. PPKE: Knowledge Representation Learning by Path-based Pre-training [PDF] Abstract
11. KgPLM: Knowledge-guided Language Model Pre-training via Generative and Discriminative Learning [PDF] Abstract
12. UBAR: Towards Fully End-to-End Task-Oriented Dialog Systems with GPT-2 [PDF] Abstract
13. H-FND: Hierarchical False-Negative Denoising for Distant Supervision Relation Extraction [PDF] Abstract
14. Dialogue Discourse-Aware Graph Convolutional Networks for Abstractive Meeting Summarization [PDF] Abstract
15. Document Graph for Neural Machine Translation [PDF] Abstract
16. An Empirical Survey of Unsupervised Text Representation Methods on Twitter Data [PDF] Abstract
17. From syntactic structure to semantic relationship: hypernym extraction from definitions by recurrent neural networks using the part of speech information [PDF] Abstract
18. Competition in Cross-situational Word Learning: A Computational Study [PDF] Abstract
19. A Two-Systems Perspective for Computational Thinking [PDF] Abstract
20. Modeling and Utilizing User's Internal State in Movie Recommendation Dialogue [PDF] Abstract
21. Over a Decade of Social Opinion Mining [PDF] Abstract
22. Codeswitched Sentence Creation using Dependency Parsing [PDF] Abstract
23. On-Device Tag Generation for Unstructured Text [PDF] Abstract
24. Reciprocal Supervised Learning Improves Neural Machine Translation [PDF] Abstract
25. A Sequence-Oblivious Generation Method for Context-Aware Hashtag Recommendation [PDF] Abstract
26. Enhanced Offensive Language Detection Through Data Augmentation [PDF] Abstract
27. Data Boost: Text Data Augmentation Through Reinforcement Learning Guided Conditional Generation [PDF] Abstract
28. Cross-Domain Sentiment Classification with In-Domain Contrastive Learning [PDF] Abstract
29. Does Yoga Make You Happy? Analyzing Twitter User Happiness using Textual and Temporal Information [PDF] Abstract
30. Data-Efficient Methods for Dialogue Systems [PDF] Abstract
31. Inductive Bias and Language Expressivity in Emergent Communication [PDF] Abstract
32. On-Device Sentence Similarity for SMS Dataset [PDF] Abstract
33. Diverse Melody Generation from Chinese Lyrics via Mutual Information Maximization [PDF] Abstract
34. Grammar-Aware Question-Answering on Quantum Computers [PDF] Abstract
35. Foundations for Near-Term Quantum Natural Language Processing [PDF] Abstract
36. Generating Natural Questions from Images for Multimodal Assistants [PDF] Abstract
37. DeepTriage: Automated Transfer Assistance for Incidents in Cloud Services [PDF] Abstract
38. Confidence-aware Non-repetitive Multimodal Transformers for TextCaps [PDF] Abstract
39. MFST: A Python OpenFST Wrapper With Support for Custom Semirings and Jupyter Notebooks [PDF] Abstract
40. MLS: A Large-Scale Multilingual Dataset for Speech Research [PDF] Abstract
41. Pre-training Protein Language Models with Label-Agnostic Binding Pairs Enhances Performance in Downstream Tasks [PDF] Abstract
42. MPG: A Multi-ingredient Pizza Image Generator with Conditional StyleGANs [PDF] Abstract
43. Cross-Modal Generalization: Learning in Low Resource Modalities via Meta-Alignment [PDF] Abstract

Abstracts

1. Benchmarking Intent Detection for Task-Oriented Dialog Systems [PDF] Back to Contents
  Haode Qi, Lin Pan, Atin Sood, Abhishek Shah, Ladislav Kunc, Saloni Potdar
Abstract: Intent detection is a key component of modern goal-oriented dialog systems that accomplish a user task by predicting the intent of users' text input. There are three primary challenges in designing robust and accurate intent detection models. First, typical intent detection models require a large amount of labeled data to achieve high accuracy. Unfortunately, in practical scenarios it is more common to find small, unbalanced, and noisy datasets. Secondly, even with large training data, the intent detection models can see a different distribution of test data when being deployed in the real world, leading to poor accuracy. Finally, a practical intent detection model must be computationally efficient in both training and single query inference so that it can be used continuously and re-trained frequently. We benchmark intent detection methods on a variety of datasets. Our results show that Watson Assistant's intent detection model outperforms other commercial solutions and is comparable to large pretrained language models while requiring only a fraction of computational resources and training data. Watson Assistant demonstrates a higher degree of robustness when the training and test distributions differ.

2. Evaluating Cross-Lingual Transfer Learning Approaches in Multilingual Conversational Agent Models [PDF] Back to Contents
  Lizhen Tan, Olga Golovneva
Abstract: With the recent explosion in popularity of voice assistant devices, there is a growing interest in making them available to user populations in additional countries and languages. However, to provide the highest accuracy and best performance for specific user populations, most existing voice assistant models are developed individually for each region or language, which requires a linear investment of effort. In this paper, we propose a general multilingual model framework for Natural Language Understanding (NLU) models, which can help bootstrap new language models faster and reduce the amount of effort required to develop each language separately. We explore how different deep learning architectures affect multilingual NLU model performance. Our experimental results show that these multilingual models can reach the same or better performance compared to monolingual models across language-specific test data while requiring less effort in feature creation and model maintenance.

3. The Lab vs The Crowd: An Investigation into Data Quality for Neural Dialogue Models [PDF] Back to Contents
  José Lopes, Francisco J. Chiyah Garcia, Helen Hastie
Abstract: Challenges around collecting and processing quality data have hampered progress in data-driven dialogue models. Previous approaches are moving away from costly, resource-intensive lab settings, where collection is slow but where the data is deemed of high quality. The advent of crowd-sourcing platforms, such as Amazon Mechanical Turk, has provided researchers with an alternative cost-effective and rapid way to collect data. However, the collection of fluid, natural spoken or textual interaction can be challenging, particularly between two crowd-sourced workers. In this study, we compare the performance of dialogue models for the same interaction task but collected in two different settings: in the lab vs. crowd-sourced. We find that fewer lab dialogues are needed to reach similar accuracy: less than half as much lab data is needed as crowd-sourced data. We discuss the advantages and disadvantages of each data collection method.

4. Stylometry for Noisy Medieval Data: Evaluating Paul Meyer's Hagiographic Hypothesis [PDF] Back to Contents
  Jean-Baptiste Camps, Thibault Clérice, Ariane Pinche
Abstract: Stylometric analysis of medieval vernacular texts is still a significant challenge: the importance of scribal variation, be it spelling or more substantial, as well as the variants and errors introduced in the tradition, complicate the task of the would-be stylometrist. Basing the analysis on the study of the copy from a single hand of several texts can partially mitigate these issues (Camps and Cafiero, 2013), but the limited availability of complete diplomatic transcriptions might make this difficult. In this paper, we use a workflow combining handwritten text recognition and stylometric analysis, applied to the case of the hagiographic works contained in MS BnF, fr. 412. We seek to evaluate Paul Meyer's hypothesis about the constitution of groups of hagiographic works, as well as to examine potential authorial groupings in a vastly anonymous corpus.

5. What Meaning-Form Correlation Has to Compose With [PDF] Back to Contents
  Timothee Mickus, Timothée Bernard, Denis Paperno
Abstract: Compositionality is a widely discussed property of natural languages, although its exact definition has been elusive. We focus on the proposal that compositionality can be assessed by measuring meaning-form correlation (MFC). We analyze meaning-form correlation on three sets of languages: (i) artificial toy languages tailored to be compositional, (ii) a set of English dictionary definitions, and (iii) a set of English sentences drawn from literature. We find that linguistic phenomena such as synonymy and ungrounded stop-words weigh on MFC measurements, and that straightforward methods to mitigate their effects have widely varying results depending on the dataset they are applied to. Data and code are made publicly available.

6. Using previous acoustic context to improve Text-to-Speech synthesis [PDF] Back to Contents
  Pilar Oplustil-Gallegos, Simon King
Abstract: Many speech synthesis datasets, especially those derived from audiobooks, naturally comprise sequences of utterances. Nevertheless, such data are commonly treated as individual, unordered utterances both when training a model and at inference time. This discards important prosodic phenomena above the utterance level. In this paper, we leverage the sequential nature of the data using an acoustic context encoder that produces an embedding of the previous utterance audio. This is input to the decoder in a Tacotron 2 model. The embedding is also used for a secondary task, providing additional supervision. We compare two secondary tasks: predicting the ordering of utterance pairs, and predicting the embedding of the current utterance audio. Results show that the relation between consecutive utterances is informative: our proposed model significantly improves naturalness over a Tacotron 2 baseline.
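
To make the architecture concrete, here is a minimal PyTorch sketch of an acoustic context encoder in the spirit of the one described: it embeds the previous utterance's mel-spectrogram into a fixed vector that can be concatenated to the decoder input of a Tacotron 2 style model. Layer sizes and module names are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class AcousticContextEncoder(nn.Module):
    # Embed the previous utterance's mel-spectrogram into a fixed vector
    # (a sketch; dimensions are illustrative, not from the paper).
    def __init__(self, n_mels=80, hidden=128, embed_dim=64):
        super().__init__()
        self.rnn = nn.GRU(n_mels, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, embed_dim)

    def forward(self, prev_mel):           # prev_mel: (batch, frames, n_mels)
        _, h = self.rnn(prev_mel)          # h: (1, batch, hidden)
        return self.proj(h.squeeze(0))     # (batch, embed_dim)

# The embedding is concatenated to the decoder input at each step; a second
# head predicting the current utterance's audio embedding would provide the
# secondary-task supervision mentioned in the abstract.
context = AcousticContextEncoder()(torch.randn(2, 500, 80))
decoder_step = torch.cat([torch.randn(2, 256), context], dim=-1)  # (2, 320)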

7. Reference Knowledgeable Network for Machine Reading Comprehension [PDF] Back to Contents
  Yilin Zhao, Zhuosheng Zhang, Hai Zhao
Abstract: Multi-choice Machine Reading Comprehension (MRC) is a major and challenging form of MRC task that requires a model to select the most appropriate answer from a set of candidates, given a passage and question. Most of the existing research focuses on the modeling of the task datasets without explicitly referring to external fine-grained commonsense sources, which is a well-known challenge in multi-choice tasks. Thus we propose a novel reference-based knowledge enhancement model based on span extraction called Reference Knowledgeable Network (RekNet), which simulates human reading strategy to refine critical information from the passage and quote external knowledge where necessary. In detail, RekNet refines fine-grained critical information and defines it as the Reference Span, then quotes external knowledge quadruples using the co-occurrence information of the Reference Span and answer options. Our proposed method is evaluated on two multi-choice MRC benchmarks, RACE and DREAM, on which it shows remarkable performance improvement over strong baselines at an observable statistical significance level.

8. An Enhanced MeanSum Method For Generating Hotel Multi-Review Summarizations [PDF] Back to Contents
  Saibo Geng, Diego Antognini
Abstract: Multi-document summarization is the process of taking multiple texts as input and producing a short summary text based on the content of the input texts. Until recently, multi-document summarizers were mostly supervised and extractive. However, supervised methods require datasets of large, paired document-summary examples which are rare and expensive to produce. In 2018, an unsupervised multi-document abstractive summarization method (MeanSum) was proposed by Chu and Liu, and demonstrated competitive performance compared to extractive methods. Despite good evaluation results on automatic metrics, MeanSum has multiple limitations, notably the inability to deal with multiple aspects. The aim of this work was to use Multi-Aspect Masker (MAM) as a content selector to address the multi-aspect issue. Moreover, we propose a regularizer to control the length of the generated summaries. Through a series of experiments on the hotel dataset from Trip Advisor, we validate our assumption and show that our improved model achieves higher ROUGE and Sentiment Accuracy than the original MeanSum method and beats or comes close to the supervised baseline.

9. Topical Change Detection in Documents via Embeddings of Long Sequences [PDF] Back to Contents
  Dennis Aumiller, Satya Almasian, Sebastian Lackner, Michael Gertz
Abstract: In a longer document, the topic often slightly shifts from one passage to the next, where topic boundaries are usually indicated by semantically coherent segments. Discovering this latent structure in a document improves the readability and is essential for passage retrieval and summarization tasks. We formulate the task of text segmentation as an independent supervised prediction task, making it suitable to train on Transformer-based language models. By fine-tuning on paragraphs of similar sections, we are able to show that learned features encode topic information, which can be used to find the section boundaries and divide the text into coherent segments. Unlike previous approaches, which mostly operate at the sentence level, we consistently use a broader context of an entire paragraph and assume topical independence of the preceding and succeeding text. We lastly introduce a novel large-scale dataset constructed from online Terms-of-Service documents, on which we compare against various traditional and deep learning baselines, showing significantly better performance of Transformer-based methods.
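
Framing segmentation as an independent supervised prediction task can be sketched with the Hugging Face transformers library: a sequence-pair classifier is fine-tuned to decide whether two consecutive paragraphs belong to the same section. The checkpoint and label convention below are assumptions for illustration, not the paper's exact setup.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed convention: label 1 = same section, label 0 = topic boundary.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

para_a = "Our service collects usage statistics to improve reliability."
para_b = "Payments are processed by a third-party provider."

# Encode the paragraph pair; at inference, a boundary is placed wherever the
# classifier outputs label 0 between consecutive paragraphs.
inputs = tok(para_a, para_b, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
is_boundary = logits.argmax(-1).item() == 0  # fresh head: random until fine-tuned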

10. PPKE: Knowledge Representation Learning by Path-based Pre-training [PDF] Back to Contents
  Bin He, Di Zhou, Jing Xie, Jinghui Xiao, Xin Jiang, Qun Liu
Abstract: Entities may have complex interactions in a knowledge graph (KG), such as multi-step relationships, which can be viewed as graph contextual information of the entities. Traditional knowledge representation learning (KRL) methods usually treat a single triple as a training unit, and neglect most of the graph contextual information that exists in the topological structure of KGs. In this study, we propose a Path-based Pre-training model to learn Knowledge Embeddings, called PPKE, which aims to integrate more graph contextual information between entities into the KRL model. Experiments demonstrate that our model achieves state-of-the-art results on several benchmark datasets for link prediction and relation prediction tasks, indicating that our model provides a feasible way to take advantage of graph contextual information in KGs.

11. KgPLM: Knowledge-guided Language Model Pre-training via Generative and Discriminative Learning [PDF] Back to Contents
  Bin He, Xin Jiang, Jinghui Xiao, Qun Liu
Abstract: Recent studies on pre-trained language models have demonstrated their ability to capture factual knowledge and their applications in knowledge-aware downstream tasks. In this work, we present a language model pre-training framework guided by factual knowledge completion and verification, and use the generative and discriminative approaches cooperatively to learn the model. In particular, we investigate two learning schemes, named the two-tower scheme and the pipeline scheme, for training the generator and discriminator with shared parameters. Experimental results on LAMA, a set of zero-shot cloze-style question answering tasks, show that our model contains richer factual knowledge than conventional pre-trained language models. Furthermore, when fine-tuned and evaluated on the MRQA shared tasks, which consist of several machine reading comprehension datasets, our model achieves state-of-the-art performance, and gains large improvements on NewsQA (+1.26 F1) and TriviaQA (+1.56 F1) over RoBERTa.

12. UBAR: Towards Fully End-to-End Task-Oriented Dialog Systems with GPT-2 [PDF] Back to Contents
  Yunyi Yang, Yunhao Li, Xiaojun Quan
Abstract: This paper presents our task-oriented dialog system UBAR, which models task-oriented dialogs on a dialog session level. Specifically, UBAR is acquired by fine-tuning the large pre-trained unidirectional language model GPT-2 on the sequence of the entire dialog session, which is composed of the user utterance, belief state, database result, system act, and system response of every dialog turn. Additionally, UBAR is evaluated in a more realistic setting, where its dialog context has access to user utterances and all content it generated, such as belief states, system acts, and system responses. Experimental results on the MultiWOZ datasets show that UBAR achieves state-of-the-art performance in multiple settings, improving the combined score of response generation, policy optimization, and end-to-end modeling by 4.7, 3.5, and 9.4 points respectively. Thorough analyses demonstrate that the session-level training sequence formulation and the generated dialog context are essential for UBAR to operate as a fully end-to-end task-oriented dialog system in real life. We also examine the transfer ability of UBAR to new domains with limited data and provide visualization and a case study to illustrate the advantages of UBAR in modeling on a dialog session level.
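
The session-level training sequence UBAR is fine-tuned on can be pictured with a small helper that flattens every turn's user utterance, belief state, database result, system act, and system response into one string. The delimiter tokens below are illustrative placeholders, not necessarily the paper's vocabulary.

# Flatten a dialog session into a single training sequence for GPT-2.
def session_to_sequence(turns):
    parts = []
    for t in turns:
        parts += ["<sos_u>", t["user"], "<eos_u>",
                  "<sos_b>", t["belief"], "<eos_b>",
                  "<sos_db>", t["db"], "<eos_db>",
                  "<sos_a>", t["act"], "<eos_a>",
                  "<sos_r>", t["response"], "<eos_r>"]
    return " ".join(parts)

turns = [{"user": "i need a cheap hotel in the north",
          "belief": "hotel price=cheap area=north",
          "db": "3 matches",
          "act": "inform choice request stars",
          "response": "there are 3 options . any star preference ?"}]
print(session_to_sequence(turns))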

13. H-FND: Hierarchical False-Negative Denoising for Distant Supervision Relation Extraction [PDF] Back to Contents
  Jhih-Wei Chen, Tsu-Jui Fu, Chen-Kang Lee, Wei-Yun Ma
Abstract: Although distant supervision automatically generates training data for relation extraction, it also introduces false-positive (FP) and false-negative (FN) training instances to the generated datasets. Whereas both types of errors degrade the final model performance, previous work on distant supervision denoising focuses more on suppressing FP noise and less on resolving the FN problem. We here propose H-FND, a hierarchical false-negative denoising framework for robust distant supervision relation extraction, as an FN denoising solution. H-FND uses a hierarchical policy which first determines whether non-relation (NA) instances should be kept, discarded, or revised during the training process. For those learning instances which are to be revised, the policy further reassigns them appropriate relations, making them better training inputs. Experiments on SemEval-2010 and TACRED were conducted with controlled FN ratios that randomly turn the relations of training and validation instances into negatives to generate FN instances. In this setting, H-FND can revise FN instances correctly and maintains high F1 scores even when 50% of the instances have been turned into negatives. A further experiment on NYT10 shows that H-FND is applicable in a realistic setting.
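
The hierarchical keep/discard/revise control flow can be sketched as follows; the tensor shapes and the way revision reassigns the relation classifier's top prediction are assumptions made for illustration, not the authors' code.

import torch

ACTIONS = ("keep", "discard", "revise")

def denoise_batch(policy_logits, relation_logits, labels):
    # policy_logits: (batch, 3) from the denoising policy network;
    # relation_logits: (batch, n_relations) from the relation classifier.
    kept_idx, new_labels = [], []
    for i, a in enumerate(policy_logits.argmax(-1).tolist()):
        if ACTIONS[a] == "discard":        # drop a suspected FN instance
            continue
        label = labels[i]
        if ACTIONS[a] == "revise":         # reassign a plausible relation
            label = relation_logits[i].argmax(-1).item()
        kept_idx.append(i)
        new_labels.append(label)
    return kept_idx, new_labels

kept, relabeled = denoise_batch(torch.randn(4, 3), torch.randn(4, 10),
                                [0, 0, 3, 0])  # 0 = NA in this toy setup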

14. Dialogue Discourse-Aware Graph Convolutional Networks for Abstractive Meeting Summarization [PDF] Back to Contents
  Xiachong Feng, Xiaocheng Feng, Bing Qin, Xinwei Geng, Ting Liu
Abstract: Sequence-to-sequence methods have achieved promising results for textual abstractive meeting summarization. Different from documents like news and scientific papers, a meeting is naturally full of dialogue-specific structural information. However, previous works model a meeting in a sequential manner, while ignoring the rich structural information. In this paper, we develop Dialogue Discourse-Aware Graph Convolutional Networks (DDA-GCN) for meeting summarization by utilizing dialogue discourse, a dialogue-specific structure that provides pre-defined semantic relationships between utterances. We first transform the entire meeting text with dialogue discourse relations into a discourse graph and then use DDA-GCN to encode the semantic representation of the graph. Finally, we employ a Recurrent Neural Network to generate the summary. In addition, we utilize the question-answer discourse relation to construct a pseudo-summarization corpus, which can be used to pre-train our model. Experimental results on the AMI dataset show that our model outperforms various baselines and can achieve state-of-the-art performance.

15. Document Graph for Neural Machine Translation [PDF] Back to Contents
  Mingzhou Xu, Liangyou Li, Derek F. Wong, Qun Liu, Lidia S. Chao
Abstract: Previous works have shown that contextual information can improve the performance of neural machine translation (NMT). However, most existing document-level NMT methods fail to leverage contexts beyond a small set of previous sentences. How to make use of the whole document as global context is still a challenge. To address this issue, we hypothesize that a document can be represented as a graph that connects relevant contexts regardless of their distances. We employ several types of relations, including adjacency, syntactic dependency, lexical consistency, and coreference, to construct the document graph. Then, we incorporate both source and target graphs into the conventional Transformer architecture with graph convolutional networks. Experiments on various NMT benchmarks, including IWSLT English-French, Chinese-English, WMT English-German and Opensubtitle English-Russian, demonstrate that using document graphs can significantly improve the translation quality.

16. An Empirical Survey of Unsupervised Text Representation Methods on Twitter Data [PDF] Back to Contents
  Lili Wang, Chongyang Gao, Jason Wei, Weicheng Ma, Ruibo Liu, Soroush Vosoughi
Abstract: The field of NLP has seen unprecedented achievements in recent years. Most notably, with the advent of large-scale pre-trained Transformer-based language models, such as BERT, there has been a noticeable improvement in text representation. It is, however, unclear whether these improvements translate to noisy user-generated text, such as tweets. In this paper, we present an experimental survey of a wide range of well-known text representation techniques for the task of text clustering on noisy Twitter data. Our results indicate that the more advanced models do not necessarily work best on tweets and that more exploration in this area is needed.

17. From syntactic structure to semantic relationship: hypernym extraction from definitions by recurrent neural networks using the part of speech information [PDF] Back to Contents
  Yixin Tan, Xiaomeng Wang, Tao Jia
Abstract: The hyponym-hypernym relation is an essential element in the semantic network. Identifying the hypernym from a definition is an important task in natural language processing and semantic analysis. While a public dictionary such as WordNet works for common words, its application in domain-specific scenarios is limited. Existing tools for hypernym extraction either rely on specific semantic patterns or focus on the word representation, which all demonstrate certain limitations.

18. Competition in Cross-situational Word Learning: A Computational Study [PDF] Back to Contents
  Aida Nematzadeh, Zahra Shekarchi, Thomas L. Griffiths, Suzanne Stevenson
Abstract: Children learn word meanings by tapping into the commonalities across different situations in which words are used and overcome the high level of uncertainty involved in early word learning experiences. In a set of computational studies, we show that to successfully learn word meanings in the face of uncertainty, a learner needs to use two types of competition: words competing for association to a referent when learning from an observation and referents competing for a word when the word is used.

19. A Two-Systems Perspective for Computational Thinking [PDF] Back to Contents
  Arvind W Kiwelekar, Swanand Navandar, Dharmendra K. Yadav
Abstract: Computational Thinking (CT) has emerged as one of the vital thinking skills in recent times, especially for Science, Technology, Engineering and Management (STEM) graduates. Educators are in search of underlying cognitive models against which CT can be analyzed and evaluated. This paper suggests adopting Kahneman's two-systems model as a framework to understand the computational thought process. Kahneman's two-systems model postulates that human thinking happens at two levels, i.e. fast and slow thinking. This paper illustrates through examples that CT activities can be represented and analyzed using Kahneman's two-systems model. The potential benefits of adopting Kahneman's two-systems perspective are that it helps us to fix the biases that cause errors in our reasoning. Further, it also provides a set of heuristics to speed up reasoning activities.

20. Modeling and Utilizing User's Internal State in Movie Recommendation Dialogue [PDF] Back to Contents
  Takashi Kodama, Ribeka Tanaka, Sadao Kurohashi
Abstract: Intelligent dialogue systems are expected to serve as a new interface between humans and machines. Such an intelligent dialogue system should estimate the user's internal state (UIS) in dialogues and change its response appropriately according to the estimation result. In this paper, we model the UIS in dialogues, taking movie recommendation dialogues as examples, and construct a dialogue system that changes its response based on the UIS. Based on the dialogue data analysis, we model the UIS as three elements: knowledge, interest, and engagement. We train the UIS estimators on a dialogue corpus with the modeled UIS's annotations. The estimators achieved high estimation accuracy. We also design response change rules that change the system's responses according to each UIS. We confirmed that response changes using the results of the UIS estimators improved the naturalness of system utterances in both dialogue-wise and utterance-wise evaluation.

21. Over a Decade of Social Opinion Mining [PDF] Back to Contents
  Keith Cortis, Brian Davis
Abstract: Social media popularity and importance are on the increase, due to people using it for various types of social interaction across multiple channels. This social interaction by online users includes submission of feedback, opinions and recommendations about various individuals, entities, topics, and events. This systematic review focuses on the evolving research area of Social Opinion Mining, tasked with the identification of multiple opinion dimensions, such as subjectivity, sentiment polarity, emotion, affect, sarcasm and irony, from user-generated content represented across multiple social media platforms and in various media formats, like text, image, video and audio. Therefore, through Social Opinion Mining, natural language can be understood in terms of the different opinion dimensions, as expressed by humans. This contributes towards the evolution of Artificial Intelligence, which in turn helps the advancement of several real-world use cases, such as customer service and decision making. A thorough systematic review was carried out on Social Opinion Mining research, covering a total of 485 studies and spanning a period of twelve years between 2007 and 2018. The in-depth analysis focuses on the social media platforms, techniques, social datasets, language, modality, tools and technologies, natural language processing tasks and other aspects derived from the published studies. Such multi-source information fusion plays a fundamental role in mining of people's social opinions from social media platforms. These can be utilised in many application areas, ranging from marketing, advertising and sales for product/service management, to multiple domains and industries, such as politics, technology, finance, healthcare, sports and government. Future research directions are presented, whereas further research and development has the potential of leaving a wider academic and societal impact.

22. Codeswitched Sentence Creation using Dependency Parsing [PDF] Back to Contents
  Dhruval Jain, Arun D Prabhu, Shubham Vatsal, Gopi Ramena, Naresh Purre
Abstract: Codeswitching has become one of the most common occurrences across multilingual speakers of the world, especially in countries like India, which encompasses around 23 official languages and has around 300 million bilingual speakers. The scarcity of codeswitched data becomes a bottleneck in the exploration of this domain with respect to various Natural Language Processing (NLP) tasks. We thus present a novel algorithm which harnesses the syntactic structure of English grammar to develop grammatically sensible codeswitched versions of English-Hindi, English-Marathi and English-Kannada data. Apart from maintaining the grammatical sanity to a great extent, our methodology also guarantees abundant generation of data from a minuscule snapshot of given data. We use multiple datasets to showcase the capabilities of our algorithm, while at the same time assessing the quality of the generated codeswitched data using some qualitative metrics and providing baseline results for a couple of NLP tasks.

23. On-Device Tag Generation for Unstructured Text [PDF] Back to Contents
  Manish Chugani, Shubham Vatsal, Gopi Ramena, Sukumar Moharana, Naresh Purre
Abstract: With the overwhelming transition to smart phones, storing important information in the form of unstructured text has become habitual for users of mobile devices. From grocery lists to drafts of emails and important speeches, users store a lot of data in the form of unstructured text (e.g., in the Notes application) on their devices, leading to cluttering of data. This not only prevents users from efficient navigation in the applications but also precludes them from perceiving the relations that could be present across data in those applications. This paper proposes a novel pipeline to generate a set of tags using world knowledge based on the keywords and concepts present in unstructured textual data. These tags can then be used to summarize, categorize or search for the desired information, thus enhancing user experience by allowing users a holistic view of the kind of information stored in the form of unstructured text. In the proposed system, we use an on-device (mobile phone) efficient CNN model with a pruned ConceptNet resource to achieve our goal. The architecture also presents a novel ranking algorithm to extract the top n tags from any given text.

24. Reciprocal Supervised Learning Improves Neural Machine Translation [PDF] Back to Contents
  Minkai Xu, Mingxuan Wang, Zhouhan Lin, Hao Zhou, Weinan Zhang, Lei Li
Abstract: Despite the recent success on image classification, self-training has only achieved limited gains on structured prediction tasks such as neural machine translation (NMT). This is mainly due to the compositionality of the target space, where the far-away prediction hypotheses lead to the notorious reinforced mistake problem. In this paper, we revisit the utilization of multiple diverse models and present a simple yet effective approach named Reciprocal-Supervised Learning (RSL). RSL first exploits individual models to generate pseudo parallel data, and then cooperatively trains each model on the combined synthetic corpus. RSL leverages the fact that different parameterized models have different inductive biases, and better predictions can be made by jointly exploiting the agreement among each other. Unlike the previous knowledge distillation methods built upon a much stronger teacher, RSL is capable of boosting the accuracy of one model by introducing other comparable or even weaker models. RSL can also be viewed as a more efficient alternative to ensemble. Extensive experiments demonstrate the superior performance of RSL on several benchmarks with significant margins.
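
The core loop of reciprocal supervision can be sketched in a few lines: each model pseudo-labels a shared monolingual pool, and every model is then trained on the union of the synthetic corpora. The translate/train_on methods are placeholders standing in for a full NMT training stack, not the authors' API.

def reciprocal_round(models, monolingual_sources):
    # Step 1: every model generates pseudo-parallel data.
    synthetic = []
    for m in models:
        synthetic += [(src, m.translate(src)) for src in monolingual_sources]
    # Step 2: every model re-trains on the combined synthetic corpus,
    # exploiting the differing inductive biases of its peers.
    for m in models:
        m.train_on(synthetic)
    return models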

25. A Sequence-Oblivious Generation Method for Context-Aware Hashtag Recommendation [PDF] Back to Contents
  Junmo Kang, Jeonghwan Kim, Suwon Shin, Sung-Hyon Myaeng
Abstract: Like search, a recommendation task accepts an input query or cue and provides desirable items, often based on a ranking function. Such a ranking approach rarely considers explicit dependency among the recommended items. In this work, we propose a generative approach to tag recommendation, where semantic tags are selected one at a time conditioned on the previously generated tags to model inter-dependency among the generated tags. We apply this tag recommendation approach to an Instagram data set where an array of context feature types (image, location, time, and text) are available for posts. To exploit the inter-dependency among the distinct types of features, we adopt a simple yet effective architecture using self-attention, making deep interactions possible. Empirical results show that our method is significantly superior to not only the usual ranking schemes but also autoregressive models for tag recommendation. They indicate that it is critical to fuse mutually supporting features at an early stage to induce extensive and comprehensive view on inter-context interaction in generating tags in a recurrent feedback loop.

26. Enhanced Offensive Language Detection Through Data Augmentation [PDF] Back to Contents
  Ruibo Liu, Guangxuan Xu, Soroush Vosoughi
Abstract: Detecting offensive language on social media is an important task. The ICWSM-2020 Data Challenge Task 2 is aimed at identifying offensive content using a crowd-sourced dataset containing 100k labelled tweets. The dataset, however, suffers from class imbalance, where certain labels are extremely rare compared with other classes (e.g., the hateful class is only 5% of the data). In this work, we present Dager (Data Augmenter), a generation-based data augmentation method, that improves the performance of classification on imbalanced and low-resource data such as the offensive language dataset. Dager extracts the lexical features of a given class, and uses these features to guide the generation of a conditional generator built on GPT-2. The generated text can then be added to the training set as augmentation data. We show that applying Dager can increase the F1 score of the data challenge by 11% when we use 1% of the whole dataset for training (using BERT for classification); moreover, the generated data also preserves the original labels very well. We test Dager on four different classifiers (BERT, CNN, Bi-LSTM with attention, and Transformer), observing universal improvements in detection, indicating our method is effective and classifier-agnostic.
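
A crude approximation of the guided conditional generation step, assuming class-indicative keywords have already been extracted, is to prompt a vanilla GPT-2 with them and harvest the continuations as augmentation candidates. Dager's actual generator is trained and guided, so treat this only as a sketch; the prompt below is an invented stand-in for extracted class keywords.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Hand-picked stand-in for the lexical features of the rare class.
prompt = "angry rant:"
ids = tok(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_length=40, do_sample=True, top_k=50,
                     pad_token_id=tok.eos_token_id)
augmented = tok.decode(out[0], skip_special_tokens=True)
# 'augmented' would be added to the training set under the rare class label.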

27. Data Boost: Text Data Augmentation Through Reinforcement Learning Guided Conditional Generation [PDF] Back to Contents
  Ruibo Liu, Guangxuan Xu, Chenyan Jia, Weicheng Ma, Lili Wang, Soroush Vosoughi
Abstract: Data augmentation is proven to be effective in many NLU tasks, especially for those suffering from data scarcity. In this paper, we present a powerful and easy to deploy text augmentation framework, Data Boost, which augments data through reinforcement learning guided conditional generation. We evaluate Data Boost on three diverse text classification tasks under five different classifier architectures. The result shows that Data Boost can boost the performance of classifiers especially in low-resource data scenarios. For instance, Data Boost improves F1 for the three tasks by 8.7% on average when given only 10% of the whole data for training. We also compare Data Boost with six prior text augmentation methods. Through human evaluations (N=178), we confirm that Data Boost augmentation has comparable quality as the original data with respect to readability and class consistency.

28. Cross-Domain Sentiment Classification with In-Domain Contrastive Learning [PDF] Back to Contents
  Tian Li, Xiang Chen, Shanghang Zhang, Zhen Dong, Kurt Keutzer
Abstract: Contrastive learning (CL) has been successful as a powerful representation learning method. In this paper, we propose a contrastive learning framework for cross-domain sentiment classification. We aim to induce domain invariant optimal classifiers rather than distribution matching. To this end, we introduce in-domain contrastive learning and entropy minimization. Also, we find through ablation studies that these two techniques behave differently in the case of large label distribution shift and conclude that the best practice is to choose one of them adaptively according to label distribution shift. The new state-of-the-art results our model achieves on standard benchmarks show the efficacy of the proposed method.
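
For concreteness, here is one standard rendering of the two ingredients: an InfoNCE-style in-domain contrastive loss over two views of the same batch, plus an entropy-minimization term on target-domain predictions. These are common formulations assumed for illustration; the paper's exact losses may differ.

import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.1):
    # Contrastive loss over two views z1, z2: (batch, dim). Matching rows
    # are positives; all other rows in the batch serve as negatives.
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau                    # (batch, batch)
    targets = torch.arange(z1.size(0))
    return F.cross_entropy(logits, targets)

def entropy_min(logits):
    # Entropy of target-domain predictions; minimizing it sharpens them.
    p = F.softmax(logits, dim=-1)
    return -(p * p.log()).sum(-1).mean()

loss = info_nce(torch.randn(8, 128), torch.randn(8, 128)) \
       + 0.1 * entropy_min(torch.randn(8, 2))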

29. Does Yoga Make You Happy? Analyzing Twitter User Happiness using Textual and Temporal Information [PDF] Back to Contents
  Tunazzina Islam, Dan Goldwasser
Abstract: Although yoga is a multi-component practice to hone the body and mind and is known to reduce anxiety and depression, there is still a gap in understanding people's emotional state related to yoga in social media. In this study, we investigate the causal relationship between practicing yoga and being happy by incorporating textual and temporal information of users using Granger causality. To find causal features in the text, we measure two variables: (i) yoga activity level based on content analysis and (ii) happiness level based on emotional state. To understand users' yoga activity, we propose a joint embedding model based on the fusion of neural networks with an attention mechanism, leveraging users' social and textual information. For measuring the emotional state of yoga users (the target domain), we suggest a transfer learning approach to transfer knowledge from an attention-based neural network model trained on a source domain. Our experiment on the Twitter dataset demonstrates that there are 1447 users where "yoga Granger-causes happiness".
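
The causal test itself is standard and available in statsmodels; the toy weekly series below are invented, but they show the mechanics: grangercausalitytests checks whether the second column Granger-causes the first.

import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

# Toy per-user series: does yoga activity Granger-cause happiness?
rng = np.random.default_rng(0)
yoga = rng.poisson(2, 100).astype(float)
happiness = 0.5 * np.roll(yoga, 1) + rng.normal(0, 1, 100)

data = np.column_stack([happiness, yoga])   # column 2 tested as the cause
results = grangercausalitytests(data, maxlag=2, verbose=False)
p_value = results[1][0]["ssr_ftest"][1]     # lag-1 F-test p-value
print(f"lag-1 p-value: {p_value:.4f}")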

30. Data-Efficient Methods for Dialogue Systems [PDF] Back to Contents
  Igor Shalyminov
Abstract: Conversational User Interface (CUI) has become ubiquitous in everyday life, in consumer-focused products like Siri and Alexa or business-oriented solutions. Deep learning underlies many recent breakthroughs in dialogue systems but requires very large amounts of training data, often annotated by experts. Trained with smaller data, these methods end up severely lacking robustness (e.g. to disfluencies and out-of-domain input), and often just have too little generalisation power. In this thesis, we address the above issues by introducing a series of methods for training robust dialogue systems from minimal data. Firstly, we study two orthogonal approaches to dialogue: linguistically informed and machine learning-based - from the data efficiency perspective. We outline the steps to obtain data-efficient solutions with either approach. We then introduce two data-efficient models for dialogue response generation: the Dialogue Knowledge Transfer Network based on latent variable dialogue representations, and the hybrid Generative-Retrieval Transformer model (ranked first at the DSTC 8 Fast Domain Adaptation task). Next, we address the problem of robustness given minimal data. As such, we propose a multitask LSTM-based model for domain-general disfluency detection. For the problem of out-of-domain input, we present Turn Dropout, a data augmentation technique for anomaly detection only using in-domain data, and introduce autoencoder-augmented models for efficient training with Turn Dropout. Finally, we focus on social dialogue and introduce a neural model for response ranking in social conversation used in Alana, the 3rd place winner in the Amazon Alexa Prize 2017 and 2018. We employ a novel technique of predicting the dialogue length as the main ranking objective and show that this approach improves upon the ratings-based counterpart in terms of data efficiency while matching it in performance.

31. Inductive Bias and Language Expressivity in Emergent Communication [PDF] Back to Contents
  Shangmin Guo, Yi Ren, Agnieszka Słowik, Kory Mathewson
Abstract: Referential games and reconstruction games are the most common game types for studying emergent languages. We investigate how the type of the language game affects the emergent language in terms of: i) language compositionality and ii) transfer of an emergent language to a task different from its origin, which we refer to as language expressivity. With empirical experiments on a handcrafted symbolic dataset, we show that languages emerged from different games have different compositionality and further different expressivity.

32. On-Device Sentence Similarity for SMS Dataset [PDF] Back to Contents
  Arun D Prabhu, Nikhil Arora, Shubham Vatsal, Gopi Ramena, Sukumar Moharana, Naresh Purre
Abstract: Determining the sentence similarity between Short Message Service (SMS) texts/sentences plays a significant role in the mobile device industry. Gauging the similarity between SMS data is thus necessary for various applications, like enhanced searching and navigation, or clubbing together SMS of a similar type when a custom label or tag is provided by the user, irrespective of the sender. The problems faced with SMS data are its incomplete structure and grammatical inconsistencies. In this paper, we propose a unique pipeline for evaluating the text similarity between SMS texts. We use a Part of Speech (POS) model for keyword extraction by taking advantage of the partial structure embedded in SMS texts, and similarity comparisons are carried out using statistical methods. The proposed pipeline deals with major semantic variations across SMS data and makes it effective for on-device (mobile phone) application. To showcase the capabilities of our work, our pipeline has been designed with an inclination towards one of the possible applications of SMS text similarity discussed in one of the following sections, but nonetheless guarantees scalability for other applications as well.
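
A minimal stand-in for such a pipeline, using NLTK's POS tagger for keyword extraction and a Jaccard overlap as the statistical similarity, might look like this; the POS classes kept and the similarity measure are assumptions, not the paper's exact choices.

import nltk
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

def keywords(sms):
    # Keep nouns and verbs as keywords, a rough proxy for the POS-based
    # extraction step described in the abstract.
    tags = nltk.pos_tag(nltk.word_tokenize(sms.lower()))
    return {w for w, t in tags if t.startswith(("NN", "VB"))}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

s1 = "Your OTP for bank transfer is 482913"
s2 = "482913 is the OTP to complete your bank transaction"
print(jaccard(keywords(s1), keywords(s2)))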

33. Diverse Melody Generation from Chinese Lyrics via Mutual Information Maximization [PDF] Back to Contents
  Ruibin Yuan, Ge Zhang, Anqiao Yang, Xinyue Zhang
Abstract: In this paper, we propose to adapt the method of mutual information maximization into the task of Chinese lyrics conditioned melody generation to improve the generation quality and diversity. We employ scheduled sampling and force decoding techniques to improve the alignment between lyrics and melodies. With our method, which we called Diverse Melody Generation (DMG), a sequence-to-sequence model learns to generate diverse melodies heavily depending on the input style ids, while keeping the tonality and improving the alignment. The experimental results of subjective tests show that DMG can generate more pleasing and coherent tunes than baseline methods.

34. Grammar-Aware Question-Answering on Quantum Computers [PDF] Back to Contents
  Konstantinos Meichanetzidis, Alexis Toumi, Giovanni de Felice, Bob Coecke
Abstract: Natural language processing (NLP) is at the forefront of great advances in contemporary AI, and it is arguably one of the most challenging areas of the field. At the same time, with the steady growth of quantum hardware and notable improvements towards implementations of quantum algorithms, we are approaching an era when quantum computers perform tasks that cannot be done on classical computers with a reasonable amount of resources. This provides a new range of opportunities for AI, and for NLP specifically. Earlier work has already demonstrated a potential quantum advantage for NLP in a number of manners: (i) algorithmic speedups for search-related or classification tasks, which are the most dominant tasks within NLP, (ii) exponentially large quantum state spaces allow for accommodating complex linguistic structures, (iii) novel models of meaning employing density matrices naturally model linguistic phenomena such as hyponymy and linguistic ambiguity, among others. In this work, we perform the first implementation of an NLP task on noisy intermediate-scale quantum (NISQ) hardware. Sentences are instantiated as parameterised quantum circuits. We encode word-meanings in quantum states and we explicitly account for grammatical structure, which even in mainstream NLP is not commonplace, by faithfully hard-wiring it as entangling operations. This makes our approach to quantum natural language processing (QNLP) particularly NISQ-friendly. Our novel QNLP model shows concrete promise for scalability as the quality of the quantum hardware improves in the near future.

35. Foundations for Near-Term Quantum Natural Language Processing [PDF] Back to Contents
  Bob Coecke, Giovanni de Felice, Konstantinos Meichanetzidis, Alexis Toumi
Abstract: We provide conceptual and mathematical foundations for near-term quantum natural language processing (QNLP), and do so in quantum computer scientist friendly terms. We opted for an expository presentation style, and provide references for supporting empirical evidence and formal statements concerning mathematical generality. We recall how the quantum model for natural language that we employ canonically combines linguistic meanings with rich linguistic structure, most notably grammar. In particular, the fact that it takes a quantum-like model to combine meaning and structure, establishes QNLP as quantum-native, on par with simulation of quantum systems. Moreover, the now leading Noisy Intermediate-Scale Quantum (NISQ) paradigm for encoding classical data on quantum hardware, variational quantum circuits, makes NISQ exceptionally QNLP-friendly: linguistic structure can be encoded as a free lunch, in contrast to the apparently exponentially expensive classical encoding of grammar. Quantum speed-up for QNLP tasks has already been established in previous work with Will Zeng. Here we provide a broader range of tasks which all enjoy the same advantage. Diagrammatic reasoning is at the heart of QNLP. Firstly, the quantum model interprets language as quantum processes via the diagrammatic formalism of categorical quantum mechanics. Secondly, these diagrams are via ZX-calculus translated into quantum circuits. Parameterisations of meanings then become the circuit variables to be learned. Our encoding of linguistic structure within quantum circuits also embodies a novel approach for establishing word-meanings that goes beyond the current standards in mainstream AI, by placing linguistic structure at the heart of Wittgenstein's meaning-is-context.

36. Generating Natural Questions from Images for Multimodal Assistants [PDF]
  Alkesh Patel, Akanksha Bindal, Hadas Kotek, Christopher Klein, Jason Williams
Abstract: Generating natural, diverse, and meaningful questions from images is an essential task for multimodal assistants, as it confirms whether they have properly understood the objects and scene in an image. Research in visual question answering (VQA) and visual question generation (VQG) is a step in this direction. However, that work does not capture the questions a sighted person would actually ask a multimodal assistant. Recently published datasets such as KB-VQA, FVQA, and OK-VQA try to collect questions that call for external knowledge, which makes them appropriate for multimodal assistants, but they still contain many obvious and common-sense questions that humans would not usually put to a digital assistant. In this paper, we provide a new benchmark dataset containing questions written by human annotators with multimodal digital assistants in mind. Because large-scale annotation of several hundred thousand images is expensive and time-consuming, we also present an effective way of automatically generating questions for unseen images. Our approach generates diverse and meaningful questions that consider both image content and image metadata (e.g., location, associated keywords). We evaluate it with standard metrics such as BLEU, METEOR, ROUGE, and CIDEr to show how closely the generated questions match human-provided ones, and we measure the diversity of generated questions using generative-strength and inventiveness metrics. We report new state-of-the-art results on both the public datasets and our own.
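
As a concrete example of this kind of evaluation, the snippet below scores one generated question against two human-provided references using sentence-level BLEU from NLTK (the strings are toy examples; the smoothing and tokenization choices here are assumptions, not the paper's exact pipeline).

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# One generated question scored against human-provided references.
references = [
    "what landmark is shown in this picture".split(),
    "which famous building is this".split(),
]
hypothesis = "what famous landmark is this".split()

smooth = SmoothingFunction().method1  # avoid zero scores on short texts
score = sentence_bleu(references, hypothesis, smoothing_function=smooth)
print(f"BLEU: {score:.3f}")
```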

37. DeepTriage: Automated Transfer Assistance for Incidents in Cloud Services [PDF]
  Phuong Pham, Vivek Jain, Lukas Dauterman, Justin Ormont, Navendu Jain
Abstract: As cloud services grow and generate high revenues, the cost of downtime in these services becomes increasingly expensive. To reduce loss and service downtime, a critical first step is to execute incident triage, the process of assigning a service incident to the correct responsible team, in a timely manner. An incorrect assignment risks additional incident reroutings and increases the time to mitigation by 10x. However, automated incident triage in large cloud services faces many challenges: (1) a highly imbalanced incident distribution across a large number of teams, (2) a wide variety of input data formats and sources, (3) scaling to meet production-grade requirements, and (4) gaining engineers' trust in machine learning recommendations. To address these challenges, we introduce DeepTriage, an intelligent incident-transfer service that combines multiple machine learning techniques - gradient-boosted classifiers, clustering methods, and deep neural networks - in an ensemble that recommends the responsible team for an incident. Experimental results on real incidents in Microsoft Azure show that our service achieves an 82.9% F1 score; for high-impact incidents, DeepTriage achieves F1 scores from 76.3% to 91.3%. We have applied best practices and state-of-the-art frameworks to scale DeepTriage to handle incident routing for all cloud services. DeepTriage has been deployed in Azure since October 2017 and is used by thousands of teams daily.
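
A minimal sketch of the ensemble idea, assuming TF-IDF features over incident text and a soft-voting combination of a gradient-boosted classifier and a small neural network (the real system's features, clustering components, and team taxonomy are far richer):

```python
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Toy incident titles and owning teams, purely for illustration.
texts = ["disk latency spike on node", "login page returns 500",
         "certificate expired for api", "disk full on storage node"]
teams = ["storage", "web", "security", "storage"]

ensemble = make_pipeline(
    TfidfVectorizer(),
    VotingClassifier(
        estimators=[
            ("gbdt", GradientBoostingClassifier()),
            ("mlp", MLPClassifier(max_iter=500)),
        ],
        voting="soft",  # average the members' predicted probabilities
    ),
)
ensemble.fit(texts, teams)
print(ensemble.predict(["disk alerts firing on node"]))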

38. Confidence-aware Non-repetitive Multimodal Transformers for TextCaps [PDF]
  Zhaokai Wang, Renda Bao, Qi Wu, Si Liu
Abstract: When describing an image, reading the text in the visual scene is crucial for understanding the key information. Recent work explores the TextCaps task, i.e. image captioning that reads Optical Character Recognition (OCR) tokens, which requires models to read scene text and cover it in the generated captions. Existing approaches fail to generate accurate descriptions because of (1) poor reading ability, (2) an inability to choose the crucial words among all extracted OCR tokens, and (3) repetition of words in predicted captions. To this end, we propose the Confidence-aware Non-repetitive Multimodal Transformer (CNMT) to tackle these challenges. CNMT consists of reading, reasoning, and generation modules: the reading module employs better OCR systems to enhance text-reading ability and a confidence embedding to select the most noteworthy tokens. To address word redundancy in captions, the generation module includes a repetition mask that avoids predicting repeated words. Our model outperforms state-of-the-art models on the TextCaps dataset, improving CIDEr from 81.0 to 93.0. Our source code is publicly available.
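
The repetition mask is the easiest component to illustrate: at each decoding step, tokens that were already emitted are banned before the next token is chosen. The sketch below is a simplified stand-in (plain numpy, greedy decoding, a whole-vocabulary ban list), not the paper's exact mechanism:

```python
import numpy as np

def mask_repeats(logits, generated_ids):
    """Ban tokens that were already emitted by setting their logits
    to -inf before the next-step argmax/softmax."""
    masked = logits.copy()
    masked[list(set(generated_ids))] = -np.inf
    return masked

vocab = ["a", "sign", "reading", "stop", "<eos>"]
logits = np.array([0.1, 2.0, 1.5, 1.9, 0.3])
generated = [1, 2]                      # already produced "sign reading"
next_id = int(np.argmax(mask_repeats(logits, generated)))
print(vocab[next_id])                   # "stop", not a repeat of "sign"
```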

39. MFST: A Python OpenFST Wrapper With Support for Custom Semirings and Jupyter Notebooks [PDF]
  Matthew Francis-Landau
Abstract: This paper introduces mFST, a new Python library for working with finite-state machines, based on OpenFST. mFST is a thin wrapper for OpenFST and exposes all of OpenFST's methods for manipulating FSTs. Additionally, mFST is the only Python wrapper for OpenFST that exposes OpenFST's ability to define custom semirings, which makes mFST ideal for developing models that involve learning the weights on an FST or creating neuralized FSTs. mFST has been designed to be easy to get started with and has previously been used in homework assignments for an NLP class as well as in projects integrating FSTs and neural networks. In this paper, we exhibit the mFST API and show how to use mFST to build a simple neuralized FST with PyTorch.
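
The plain-Python sketch below illustrates what a custom semiring is - the abstraction mFST exposes - rather than mFST's actual API: a semiring is a (plus, times, zero, one) quadruple, and swapping it changes what a generic path computation means. With the tropical semiring, plus picks the better path and times accumulates arc weights, so folding over paths yields a shortest-path cost.

```python
INF = float("inf")

class Tropical:
    """Tropical semiring: plus = min, times = +."""
    zero, one = INF, 0.0
    @staticmethod
    def plus(a, b):   # "choose the better path"
        return min(a, b)
    @staticmethod
    def times(a, b):  # "concatenate arcs along a path"
        return a + b

def best_path_cost(paths, sr=Tropical):
    """Fold each path's arcs with times, combine paths with plus."""
    total = sr.zero
    for arc_weights in paths:
        cost = sr.one
        for w in arc_weights:
            cost = sr.times(cost, w)
        total = sr.plus(total, cost)
    return total

print(best_path_cost([[1.0, 2.0], [0.5, 4.0]]))  # 3.0
```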

40. MLS: A Large-Scale Multilingual Dataset for Speech Research [PDF]
  Vineel Pratap, Qiantong Xu, Anuroop Sriram, Gabriel Synnaeve, Ronan Collobert
Abstract: This paper introduces the Multilingual LibriSpeech (MLS) dataset, a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and covers 8 languages, including about 44.5K hours of English and a total of about 6K hours for the other languages. Additionally, we provide Language Models (LM) and baseline Automatic Speech Recognition (ASR) models for all the languages in our dataset. We believe such a large transcribed dataset will open new avenues in ASR and Text-To-Speech (TTS) research. The dataset will be made freely available to anyone at this http URL.

41. Pre-training Protein Language Models with Label-Agnostic Binding Pairs Enhances Performance in Downstream Tasks [PDF]
  Modestas Filipavicius, Matteo Manica, Joris Cadow, Maria Rodriguez Martinez
Abstract: Less than 1% of protein sequences are structurally and functionally annotated. The Natural Language Processing (NLP) community has recently embraced self-supervised learning as a powerful approach for learning representations from unlabeled text, in large part thanks to attention-based, context-aware Transformer models. In this work we modify the RoBERTa model by feeding it, during pre-training, a mixture of binding and non-binding protein sequence pairs (from the STRING database). The sequence pairs carry no label indicating their binding status; the model relies solely on the Masked Language Modeling (MLM) objective during pre-training. After fine-tuning, this approach surpasses models trained on single protein sequences on protein-protein binding prediction, TCR-epitope binding prediction, cellular-localization, and remote homology classification tasks. We suggest that the Transformer's attention mechanism contributes to protein binding-site discovery. Furthermore, we compress protein sequences by 64% with a Byte Pair Encoding (BPE) vocabulary of 10K subwords, each around 3-4 amino acids long. Finally, to expand the model's input space to even larger proteins and multi-protein assemblies, we pre-train Longformer models that support 2,048 tokens. Further work is needed on token-level classification for secondary structure prediction. Code available at: this https URL
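
A simplified sketch of the label-agnostic pre-training input: a (possibly binding) sequence pair is concatenated with a separator and residues are masked at random, with no binding label anywhere. The character-level tokens, uniform mask rate, and separator convention are assumptions for illustration; the paper uses BPE subwords and standard RoBERTa-style masking.

```python
import random

def mask_pair(seq_a, seq_b, mask_rate=0.15, mask_token="[MASK]"):
    """Concatenate a sequence pair with a separator and mask residues
    at random; the MLM targets are the residues that were hidden."""
    tokens = list(seq_a) + ["[SEP]"] + list(seq_b)
    targets = {}
    for i, tok in enumerate(tokens):
        if tok != "[SEP]" and random.random() < mask_rate:
            targets[i] = tok          # position -> residue to predict
            tokens[i] = mask_token
    return tokens, targets

random.seed(0)
tokens, targets = mask_pair("MKTAYIAK", "GSHMLEDP")
print(tokens)
print(targets)
```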

42. MPG: A Multi-ingredient Pizza Image Generator with Conditional StyleGANs [PDF]
  Fangda Han, Guoyao Hao, Ricardo Guerrero, Vladimir Pavlovic
Abstract: Multilabel conditional image generation is a challenging problem in computer vision. In this work we propose the Multi-ingredient Pizza Generator (MPG), a conditional Generative Adversarial Network (GAN) framework for synthesizing multilabel images. We design MPG on top of StyleGAN2, a state-of-the-art GAN architecture, for which we develop a new conditioning technique that enforces intermediate feature maps to learn scale-wise label information. Because of the complex nature of the multilabel image generation problem, we also regularize the synthesized images by predicting their ingredients, and encourage the discriminator to distinguish matched from mismatched image-label pairs. To verify the efficacy of MPG, we test it on Pizza10, a carefully annotated multi-ingredient pizza image dataset. MPG successfully generates photo-realistic pizza images with the desired ingredients, and the framework can easily be extended to other multilabel image generation scenarios.
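
The input side of multilabel conditioning can be sketched as follows: the ingredient set becomes a multi-hot vector, which is projected to an embedding and combined with the latent code a generator consumes. This illustrates the generic conditional-GAN recipe, not MPG's actual StyleGAN2 feature-map conditioning; the projection matrix and dimensions are made up.

```python
import numpy as np

INGREDIENTS = ["pepperoni", "mushroom", "onion", "basil"]
rng = np.random.default_rng(0)
W = rng.normal(size=(len(INGREDIENTS), 32))  # learned label projection

def conditioning_vector(ingredients, z_dim=64):
    """Multi-hot label -> embedding, concatenated with the latent z
    that a conditional generator would consume."""
    multi_hot = np.array([float(name in ingredients) for name in INGREDIENTS])
    label_emb = multi_hot @ W
    z = rng.normal(size=z_dim)
    return np.concatenate([z, label_emb])

print(conditioning_vector({"pepperoni", "basil"}).shape)  # (96,)
```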

43. Cross-Modal Generalization: Learning in Low Resource Modalities via Meta-Alignment [PDF]
  Paul Pu Liang, Peter Wu, Liu Ziyin, Louis-Philippe Morency, Ruslan Salakhutdinov
Abstract: The natural world abounds with concepts expressed via visual, acoustic, tactile, and linguistic modalities. Much of the existing progress in multimodal learning, however, focuses primarily on problems where the same set of modalities is present at train and test time, which makes learning in low-resource modalities particularly difficult. In this work, we propose algorithms for cross-modal generalization: a learning paradigm that trains a model to (1) quickly perform new tasks in a target modality (i.e. meta-learning) while (2) being trained on a different source modality. We study a key research question: how can we ensure generalization across modalities despite using separate encoders for different source and target modalities? Our solution is based on meta-alignment, a novel method for aligning representation spaces using strongly and weakly paired cross-modal data while ensuring quick generalization to new tasks across different modalities. We study this problem on 3 classification tasks: text to image, image to audio, and text to speech. Our results demonstrate strong performance even when the new target modality has only a few (1-10) labeled samples and in the presence of noisy labels, a scenario particularly prevalent in low-resource modalities.
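
Aligning representation spaces across modalities is commonly expressed as a contrastive (InfoNCE-style) loss over paired embeddings, sketched below. This is a stand-in under stated assumptions (random embeddings, a fixed temperature), not the paper's exact meta-alignment objective.

```python
import numpy as np

def alignment_loss(src, tgt, temperature=0.1):
    """Contrastive loss pulling paired cross-modal embeddings together
    and pushing mismatched pairs apart; matched pairs sit on the
    diagonal of the similarity matrix."""
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    sims = src @ tgt.T / temperature         # (n, n) similarity matrix
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))      # cross-entropy on the diagonal

rng = np.random.default_rng(0)
text_emb, image_emb = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
print(alignment_loss(text_emb, image_emb))
```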
