
[arXiv Papers] Computation and Language 2020-06-12

Contents

1. Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge [PDF] Abstract
2. Multi-hop Reading Comprehension across Documents with Path-based Graph Convolutional Network [PDF] Abstract
3. A Probabilistic Model with Commonsense Constraints for Pattern-based Temporal Fact Extraction [PDF] Abstract
4. CoSDA-ML: Multi-Lingual Code-Switching Data Augmentation for Zero-Shot Cross-Lingual NLP [PDF] Abstract
5. Provenance for Linguistic Corpora Through Nanopublications [PDF] Abstract
6. Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics [PDF] Abstract
7. Augmenting Data for Sarcasm Detection with Unlabeled Conversation Context [PDF] Abstract
8. Performance in the Courtroom: Automated Processing and Visualization of Appeal Court Decisions in France [PDF] Abstract
9. Discrete Latent Variable Representations for Low-Resource Text Classification [PDF] Abstract
10. A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages [PDF] Abstract
11. Emora STDM: A Versatile Framework for Innovative Dialogue System Development [PDF] Abstract
12. Towards Unified Dialogue System Evaluation: A Comprehensive Analysis of Current Evaluation Protocols [PDF] Abstract
13. Report from the NSF Future Directions Workshop, Toward User-Oriented Agents: Research Directions and Challenges [PDF] Abstract
14. Disentangled Non-Local Neural Networks [PDF] Abstract
15. VirTex: Learning Visual Representations from Textual Annotations [PDF] Abstract
16. Exploring Weaknesses of VQA Models through Attribution Driven Insights [PDF] Abstract
17. Deep Differential System Stability -- Learning advanced computations from examples [PDF] Abstract
18. Extracting and categorising the reactions to COVID-19 by the South African public -- A social media study [PDF] Abstract
19. Mental Workload and Language Production in Non-Native Speaker IPA Interaction [PDF] Abstract
20. See what I'm saying? Comparing Intelligent Personal Assistant use for Native and Non-Native Language Speakers [PDF] Abstract
21. Transparency in Language Generation: Levels of Automation [PDF] Abstract
22. XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System [PDF] Abstract
23. Large-Scale Adversarial Training for Vision-and-Language Representation Learning [PDF] Abstract
24. PeopleMap: Visualization Tool for Mapping Out Researchers using Natural Language Processing [PDF] Abstract

Abstracts

1. Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge [PDF] Back to Contents
  Alon Talmor, Oyvind Tafjord, Peter Clark, Yoav Goldberg, Jonathan Berant
Abstract: To what extent can a neural network systematically reason over symbolic facts? Evidence suggests that large pre-trained language models (LMs) acquire some reasoning capacity, but this ability is difficult to control. Recently, it has been shown that Transformer-based models succeed in consistent reasoning over explicit symbolic facts, under a "closed-world" assumption. However, in an open-domain setup, it is desirable to tap into the vast reservoir of implicit knowledge already encoded in the parameters of pre-trained LMs. In this work, we provide a first demonstration that LMs can be trained to reliably perform systematic reasoning combining both implicit, pre-trained knowledge and explicit natural language statements. To do this, we describe a procedure for automatically generating datasets that teach a model new reasoning skills, and demonstrate that models learn to effectively perform inference which involves implicit taxonomic and world knowledge, chaining and counting. Finally, we show that "teaching" models to reason generalizes beyond the training distribution: they successfully compose the usage of multiple reasoning skills in single examples. Our work paves a path towards open-domain systems that constantly improve by interacting with users who can instantly correct a model by adding simple natural language statements.
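
For a concrete flavor of the auto-generated "chaining" data the abstract describes, a toy Python sketch (with an invented taxonomy; the paper's actual generation procedure is considerably richer) could derive questions by transitive closure over explicit is-a statements:

    from itertools import product

    # Tiny invented taxonomy of explicit "is-a" facts.
    is_a = {("whippet", "dog"), ("dog", "mammal"), ("mammal", "animal")}

    def transitive_closure(pairs):
        pairs = set(pairs)
        while True:
            derived = {(a, d) for (a, b), (c, d) in product(pairs, pairs) if b == c}
            if derived <= pairs:
                return pairs
            pairs |= derived

    # Each derived pair yields a question whose answer requires chaining facts.
    examples = [(f"Is a {a} a {b}?", True) for a, b in sorted(transitive_closure(is_a))]
    print(examples)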

2. Multi-hop Reading Comprehension across Documents with Path-based Graph Convolutional Network [PDF] Back to Contents
  Zeyun Tang, Yongliang Shen, Xinyin Ma, Wei Xu, Jiale Yu, Weiming Lu
Abstract: Multi-hop reading comprehension across multiple documents has attracted much attention recently. In this paper, we propose a novel approach to tackle this multi-hop reading comprehension problem. Inspired by human reasoning processes, we construct a path-based reasoning graph from supporting documents. This graph combines the ideas of both graph-based and path-based approaches, making it better suited for multi-hop reasoning. Meanwhile, we propose Gated-RGCN to accumulate evidence on the path-based reasoning graph; it contains a new question-aware gating mechanism that regulates the usefulness of information propagating across documents and adds question information during reasoning. We evaluate our approach on the WikiHop dataset, where it achieves state-of-the-art accuracy against previously published approaches. Notably, our ensemble model surpasses human performance by 4.2%.
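
A minimal PyTorch sketch of the question-aware gating idea (the dimensions and wiring are illustrative assumptions, not the authors' Gated-RGCN implementation):

    import torch
    import torch.nn as nn

    class QuestionAwareGate(nn.Module):
        """Scales messages propagated along graph edges, conditioned on the question."""
        def __init__(self, node_dim, question_dim):
            super().__init__()
            self.gate = nn.Sequential(
                nn.Linear(node_dim + question_dim, node_dim),
                nn.Sigmoid(),  # per-dimension gate in [0, 1]
            )

        def forward(self, messages, question):
            # messages: (num_nodes, node_dim); question: (question_dim,)
            q = question.unsqueeze(0).expand(messages.size(0), -1)
            g = self.gate(torch.cat([messages, q], dim=-1))
            return g * messages  # suppress information irrelevant to the question

    gate = QuestionAwareGate(node_dim=64, question_dim=128)
    gated = gate(torch.randn(5, 64), torch.randn(128))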

3. A Probabilistic Model with Commonsense Constraints for Pattern-based Temporal Fact Extraction [PDF] Back to Contents
  Yang Zhou, Tong Zhao, Meng Jiang
Abstract: Textual patterns (e.g., Country's president Person) are specified and/or generated for extracting factual information from unstructured data. Pattern-based information extraction methods have been recognized for their efficiency and transferability. However, not every pattern is reliable: A major challenge is to derive the most complete and accurate facts from diverse and sometimes conflicting extractions. In this work, we propose a probabilistic graphical model which formulates fact extraction in a generative process. It automatically infers true facts and pattern reliability without any supervision. It has two novel designs specially for temporal facts: (1) it models pattern reliability on two types of time signals, including temporal tag in text and text generation time; (2) it models commonsense constraints as observable variables. Experimental results demonstrate that our model significantly outperforms existing methods on extracting true temporal facts from news data.
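
The joint, unsupervised estimation of fact truth and pattern reliability is in the spirit of classic truth discovery. A simplified iterative loop (toy extractions and noisy-or aggregation, not the paper's exact graphical model) conveys the intuition:

    import math
    from collections import defaultdict

    # Toy extractions: (pattern, candidate fact) pairs.
    extractions = [("p1", "f1"), ("p2", "f1"), ("p1", "f2"), ("p3", "f3")]
    reliability = {p: 0.5 for p, _ in extractions}

    for _ in range(10):
        # A fact is likely true if reliable patterns extracted it (noisy-or).
        votes = defaultdict(list)
        for p, f in extractions:
            votes[f].append(reliability[p])
        truth = {f: 1 - math.prod(1 - r for r in rs) for f, rs in votes.items()}
        # A pattern is reliable if the facts it extracts look true.
        support = defaultdict(list)
        for p, f in extractions:
            support[p].append(truth[f])
        reliability = {p: sum(ts) / len(ts) for p, ts in support.items()}

    print(truth)  # f1, extracted by two patterns, ends up scored above f2 and f3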

4. CoSDA-ML: Multi-Lingual Code-Switching Data Augmentation for Zero-Shot Cross-Lingual NLP [PDF] Back to Contents
  Libo Qin, Minheng Ni, Yue Zhang, Wanxiang Che
Abstract: Multi-lingual contextualized embeddings, such as multilingual-BERT (mBERT), have shown success in a variety of zero-shot cross-lingual tasks. However, these models are limited by having inconsistent contextualized representations of subwords across different languages. Existing work addresses this issue by bilingual projection and fine-tuning techniques. We propose a data augmentation framework to generate multi-lingual code-switching data to fine-tune mBERT, which encourages the model to align representations from source and multiple target languages at once by mixing their context information. Compared with existing work, our method does not rely on bilingual sentences for training, and requires only one training process for multiple target languages. Experimental results on five tasks with 19 languages show that our method leads to significantly improved performance on all tasks compared with mBERT.
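
The core augmentation step can be pictured as randomly substituting tokens with translations drawn from bilingual dictionaries. A toy sketch (made-up dictionaries and replacement rate, for illustration only):

    import random

    def code_switch(tokens, dictionaries, rate=0.3, seed=None):
        """With probability `rate`, replace a token by a translation from a
        randomly chosen target-language dictionary."""
        rng = random.Random(seed)
        out = []
        for tok in tokens:
            candidates = [d[tok] for d in dictionaries if tok in d]
            if candidates and rng.random() < rate:
                out.append(rng.choice(candidates))
            else:
                out.append(tok)
        return out

    # Toy bilingual dictionaries (English -> German, English -> Spanish).
    de = {"i": "ich", "like": "mag", "music": "Musik"}
    es = {"i": "yo", "like": "gusto", "music": "música"}
    print(code_switch(["i", "like", "music"], [de, es], rate=0.5, seed=0))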

5. Provenance for Linguistic Corpora Through Nanopublications [PDF] Back to Contents
  Timo Lek, Anna de Groot, Tobias Kuhn, Roser Morante
Abstract: Research in Computational Linguistics is dependent on text corpora for training and testing new tools and methodologies. While there exists a plethora of annotated linguistic information, these corpora are often not interoperable without significant manual work. Moreover, these annotations might have been adapted and might have evolved into different versions, making it challenging for researchers to know the data's provenance and merge it with other annotated corpora. In other words, these variations affect the interoperability between existing corpora. This paper addresses this issue with a case study on event annotated corpora and by creating a new, more interoperable representation of this data in the form of nanopublications. We demonstrate how linguistic annotations from separate corpora can be merged through a similar format to thereby make annotation content simultaneously accessible. The process for developing the nanopublications is described, and SPARQL queries are performed to extract interesting content from the new representations. The queries show that information from multiple corpora can now be retrieved more easily and effectively, thanks to the automated interoperability of the information of different corpora in a uniform data format.
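
For flavor, querying a merged graph of such nanopublications might look like the following rdflib sketch (the file name and predicate vocabulary are invented placeholders, not the paper's actual ontology):

    import rdflib

    EX = rdflib.Namespace("http://example.org/annotation#")
    g = rdflib.ConjunctiveGraph()
    g.parse("merged_nanopubs.trig", format="trig")  # placeholder merged-corpora file

    query = """
    SELECT ?event ?corpus WHERE {
        ?annotation ex:annotatesEvent ?event ;
                    ex:fromCorpus ?corpus .
    }
    """
    for event, corpus in g.query(query, initNs={"ex": EX}):
        print(event, corpus)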

6. Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics [PDF] Back to Contents
  Nitika Mathur, Tim Baldwin, Trevor Cohn
Abstract: Automatic metrics are fundamental for the development and evaluation of machine translation systems. Judging whether, and to what extent, automatic metrics concur with the gold standard of human evaluation is not a straightforward problem. We show that current methods for judging metrics are highly sensitive to the translations used for assessment, particularly the presence of outliers, which often leads to falsely confident conclusions about a metric's efficacy. Finally, we turn to pairwise system ranking, developing a method for thresholding performance improvement under an automatic metric against human judgements, which allows quantification of type I versus type II errors incurred, i.e., insignificant human differences in system quality that are accepted, and significant human differences that are rejected. Together, these findings suggest improvements to the protocols for metric evaluation and system performance evaluation in machine translation.
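
A toy numerical illustration of the outlier effect (fabricated scores, not the paper's data): a single weak outlier system can turn a near-zero correlation on the competitive systems into an apparently excellent one.

    from scipy.stats import pearsonr

    human  = [72.1, 71.8, 71.5, 71.9, 40.0]  # system-level human scores; last system is an outlier
    metric = [33.0, 31.9, 33.5, 32.2, 15.0]  # a metric's scores for the same systems

    r_all, _ = pearsonr(human, metric)
    r_competitive, _ = pearsonr(human[:-1], metric[:-1])
    print(f"with outlier r = {r_all:.2f}; competitive systems only r = {r_competitive:.2f}")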

7. Augmenting Data for Sarcasm Detection with Unlabeled Conversation Context [PDF] Back to Contents
  Hankyol Lee, Youngjae Yu, Gunhee Kim
Abstract: We present a novel data augmentation technique, CRA (Contextual Response Augmentation), which utilizes conversational context to generate meaningful samples for training. We also mitigate the issues regarding unbalanced context lengths by changing the input-output format of the model such that it can deal with varying context lengths effectively. Specifically, our proposed model, trained with the proposed data augmentation technique, won the sarcasm detection task of FigLang2020, achieving the best performance on both the Reddit and Twitter datasets.

8. Performance in the Courtroom: Automated Processing and Visualization of Appeal Court Decisions in France [PDF] Back to Contents
  Paul Boniol, George Panagopoulos, Christos Xypolopoulos, Rajaa El Hamdani, David Restrepo Amariles, Michalis Vazirgiannis
Abstract: Artificial Intelligence techniques are already popular and important in the legal domain. We extract legal indicators from judicial judgments to decrease the asymmetry of information in the legal system and the access-to-justice gap. We use NLP methods to extract interesting entities/data from judgments to construct networks of lawyers and judgments. We propose metrics to rank lawyers based on their experience, win/loss ratio and their importance in the network of lawyers. We also perform community detection in the network of judgments and propose metrics to represent the difficulty of cases, capitalising on community features.

9. Discrete Latent Variable Representations for Low-Resource Text Classification [PDF] Back to Contents
  Shuning Jin, Sam Wiseman, Karl Stratos, Karen Livescu
Abstract: While much work on deep latent variable models of text uses continuous latent variables, discrete latent variables are interesting because they are more interpretable and typically more space efficient. We consider several approaches to learning discrete latent variable models for text in the case where exact marginalization over these variables is intractable. We compare the performance of the learned representations as features for low-resource document and sentence classification. Our best models outperform the previous best reported results with continuous representations in these low-resource settings, while learning significantly more compressed representations. Interestingly, we find that an amortized variant of Hard EM performs particularly well in the lowest-resource regimes.
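
The contrast between exact marginalization and the hard-EM variant over a discrete latent can be written compactly (a generic PyTorch sketch of the technique, not the paper's specific models):

    import torch

    def marginal_nll(joint_logprob):  # joint_logprob: (batch, K) = log p(x, z=k)
        # Exact marginalization over the discrete latent: log sum_k p(x, z=k)
        return -torch.logsumexp(joint_logprob, dim=-1).mean()

    def hard_em_nll(joint_logprob):
        # Hard EM: commit to the single most likely latent value instead of summing
        return -joint_logprob.max(dim=-1).values.mean()

    scores = torch.log_softmax(torch.randn(8, 10), dim=-1)  # toy joint scores, K = 10
    print(marginal_nll(scores), hard_em_nll(scores))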

10. A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages [PDF] Back to Contents
  Pedro Ortiz Suárez, Laurent Romary, Benoît Sagot
Abstract: We use the multilingual OSCAR corpus, extracted from Common Crawl via language classification, filtering and cleaning, to train monolingual contextualized word embeddings (ELMo) for several mid-resource languages. We then compare the performance of OSCAR-based and Wikipedia-based ELMo embeddings for these languages on the part-of-speech tagging and parsing tasks. We show that, despite the noise in the Common-Crawl-based OSCAR data, embeddings trained on OSCAR perform much better than monolingual embeddings trained on Wikipedia. They actually equal or improve the current state of the art in tagging and parsing for all five languages. In particular, they also improve over multilingual Wikipedia-based contextual embeddings (multilingual BERT), which almost always constitutes the previous state of the art, thereby showing that the benefit of a larger, more diverse corpus surpasses the cross-lingual benefit of multilingual embedding architectures.

11. Emora STDM: A Versatile Framework for Innovative Dialogue System Development [PDF] Back to Contents
  James D. Finch, Jinho D. Choi
Abstract: This demo paper presents Emora STDM (State Transition Dialogue Manager), a dialogue system development framework that provides novel workflows for rapid prototyping of chat-based dialogue managers as well as collaborative development of complex interactions. Our framework caters to a wide range of expertise levels by supporting interoperability between two popular approaches, state machine and information state, to dialogue management. Our Natural Language Expression package allows seamless integration of pattern matching, custom NLP modules, and database querying, that makes the workflows much more efficient. As a user study, we adopt this framework to an interdisciplinary undergraduate course where students with both technical and non-technical backgrounds are able to develop creative dialogue managers in a short period of time.
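
A generic state-machine dialogue loop conveys the style of development (plain Python for illustration, not Emora STDM's actual API):

    # Transition table: (state, intent) -> (next state, system reply).
    transitions = {
        ("start", "greeting"): ("asked_music", "Hi! What kind of music do you like?"),
        ("asked_music", "inform"): ("end", "Nice choice, I'll remember that."),
    }

    def classify_intent(utterance):
        # Stand-in for pattern matching / custom NLP modules.
        return "greeting" if "hi" in utterance.lower() else "inform"

    state = "start"
    for user_turn in ["Hi there!", "I mostly listen to jazz."]:
        state, reply = transitions[(state, classify_intent(user_turn))]
        print(reply)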

12. Towards Unified Dialogue System Evaluation: A Comprehensive Analysis of Current Evaluation Protocols [PDF] Back to Contents
  Sarah E. Finch, Jinho D. Choi
Abstract: As conversational AI-based dialogue management has increasingly become a trending topic, the need for a standardized and reliable evaluation procedure grows even more pressing. The current state of affairs suggests various evaluation protocols to assess chat-oriented dialogue management systems, rendering it difficult to conduct fair comparative studies across different approaches and gain an insightful understanding of their values. To foster this research, a more robust evaluation protocol must be set in place. This paper presents a comprehensive synthesis of both automated and human evaluation methods on dialogue systems, identifying their shortcomings while accumulating evidence towards the most effective evaluation dimensions. A total of 20 papers from the last two years are surveyed to analyze three types of evaluation protocols: automated, static, and interactive. Finally, the evaluation dimensions used in these papers are compared against our expert evaluation on the system-user dialogue data collected from the Alexa Prize 2020.

13. Report from the NSF Future Directions Workshop, Toward User-Oriented Agents: Research Directions and Challenges [PDF] Back to Contents
  Maxine Eskenazi, Tiancheng Zhao
Abstract: This USER Workshop was convened with the goal of defining future research directions for the burgeoning intelligent agent research community and communicating them to the National Science Foundation. It took place in Pittsburgh, Pennsylvania on October 24 and 25, 2019, and was sponsored by National Science Foundation Grant Number IIS-1934222. Any opinions, findings and conclusions or future directions expressed in this document are those of the authors and do not necessarily reflect the views of the National Science Foundation. The 27 participants presented their individual research interests and their personal research goals. In the breakout sessions that followed, the participants defined the main research areas within the domain of intelligent agents and discussed the major future directions that research in each area of this domain should take.

14. Disentangled Non-Local Neural Networks [PDF] Back to Contents
  Minghao Yin, Zhuliang Yao, Yue Cao, Xiu Li, Zheng Zhang, Stephen Lin, Han Hu
Abstract: The non-local block is a popular module for strengthening the context modeling ability of a regular convolutional neural network. This paper first studies the non-local block in depth, where we find that its attention computation can be split into two terms, a whitened pairwise term accounting for the relationship between two pixels and a unary term representing the saliency of every pixel. We also observe that the two terms trained alone tend to model different visual clues, e.g. the whitened pairwise term learns within-region relationships while the unary term learns salient boundaries. However, the two terms are tightly coupled in the non-local block, which hinders the learning of each. Based on these findings, we present the disentangled non-local block, where the two terms are decoupled to facilitate learning for both terms. We demonstrate the effectiveness of the decoupled design on various tasks, such as semantic segmentation on Cityscapes, ADE20K and PASCAL Context, object detection on COCO, and action recognition on Kinetics.
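
One way to see the split, for dot-product attention logits with mean query \mu_q and mean key \mu_k, is the simple identity (consistent with the decomposition the abstract describes):

    q_i^\top k_j = \underbrace{(q_i - \mu_q)^\top (k_j - \mu_k)}_{\text{whitened pairwise}}
                 + \underbrace{\mu_q^\top k_j}_{\text{unary}}
                 + q_i^\top \mu_k - \mu_q^\top \mu_k

The last two terms do not depend on j, so they cancel inside the softmax over j, leaving exactly a whitened pairwise term and a unary term.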

15. VirTex: Learning Visual Representations from Textual Annotations [PDF] Back to Contents
  Karan Desai, Justin Johnson
Abstract: The de-facto approach to many vision tasks is to start from pretrained visual representations, typically learned via supervised training on ImageNet. Recent methods have explored unsupervised pretraining to scale to vast quantities of unlabeled images. In contrast, we aim to learn high-quality visual representations from fewer images. To this end, we revisit supervised pretraining, and seek data-efficient alternatives to classification-based pretraining. We propose VirTex -- a pretraining approach using semantically dense captions to learn visual representations. We train convolutional networks from scratch on COCO Captions, and transfer them to downstream recognition tasks including image classification, object detection, and instance segmentation. On all tasks, VirTex yields features that match or exceed those learned on ImageNet -- supervised or unsupervised -- despite using up to ten times fewer images.

16. Exploring Weaknesses of VQA Models through Attribution Driven Insights [PDF] Back to Contents
  Shaunak Halbe
Abstract: Deep Neural Networks have been successfully used for the task of Visual Question Answering for the past few years owing to the availability of relevant large scale datasets. However these datasets are created in artificial settings and rarely reflect the real world scenario. Recent research effectively applies these VQA models for answering visual questions for the blind. Despite achieving high accuracy these models appear to be susceptible to variation in input questions. We analyze popular VQA models through the lens of attribution (input's influence on predictions) to gain valuable insights. Further, we use these insights to craft adversarial attacks which inflict significant damage to these systems with negligible change in meaning of the input questions. We believe this will enhance development of systems more robust to the possible variations in inputs when deployed to assist the visually impaired.

17. Deep Differential System Stability -- Learning advanced computations from examples [PDF] Back to Contents
  François Charton, Amaury Hayat, Guillaume Lample
Abstract: Can advanced mathematical computations be learned from examples? Using transformers over large generated datasets, we train models to learn properties of differential systems, such as local stability, behavior at infinity and controllability. We achieve near perfect estimates of qualitative characteristics of the systems, and good approximations of numerical quantities, demonstrating that neural networks can learn advanced theorems and complex computations without built-in mathematical knowledge.
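
For reference, the "local stability" property being predicted follows the standard spectral criterion: an equilibrium of dx/dt = f(x) is locally asymptotically stable when every eigenvalue of the Jacobian of f at that point has negative real part. A small NumPy check (toy system, not the paper's code):

    import numpy as np

    def is_locally_stable(jacobian):
        # Stable iff all Jacobian eigenvalues have negative real part.
        return bool(np.all(np.linalg.eigvals(jacobian).real < 0))

    # Damped oscillator linearized at the origin: eigenvalues -0.25 +/- 0.97i.
    J = np.array([[0.0, 1.0],
                  [-1.0, -0.5]])
    print(is_locally_stable(J))  # True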

18. Extracting and categorising the reactions to COVID-19 by the South African public -- A social media study [PDF] Back to Contents
  Vukosi Marivate, Avashlin Moodley, Athandiwe Saba
Abstract: Social Media can be used to extract discussion topics during a disaster. With the COVID-19 pandemic's impact on South Africa, we need to understand how the laws and regulations promulgated by the government in response to the pandemic contrast with the discussion topics social media users have been engaging in. In this work, we expand on traditional media analysis by using Social Media discussions driven by or directed to South African government officials. We find topics that are similar in some cases and different in others. The findings can inform further study into social media during disaster settings in South Africa and beyond.

19. Mental Workload and Language Production in Non-Native Speaker IPA Interaction [PDF] Back to Contents
  Yunhan Wu, Justin Edwards, Orla Cooney, Anna Bleakley, Philip R. Doyle, Leigh Clark, Daniel Rough, Benjamin R. Cowan
Abstract: Through proliferation on smartphones and smart speakers, intelligent personal assistants (IPAs) have made speech a common interaction modality. Yet, due to linguistic coverage and varying levels of functionality, many speakers engage with IPAs using a non-native language. This may impact the mental workload and pattern of language production displayed by non-native speakers. We present a mixed-design experiment, wherein native (L1) and non-native (L2) English speakers completed tasks with IPAs through smartphones and smart speakers. We found significantly higher mental workload for L2 speakers during IPA interactions. Contrary to our hypotheses, we found no significant differences between L1 and L2 speakers in terms of number of turns, lexical complexity, diversity, or lexical adaptation when encountering errors. These findings are discussed in relation to language production and processing load increases for L2 speakers in IPA interaction.

20. See what I'm saying? Comparing Intelligent Personal Assistant use for Native and Non-Native Language Speakers [PDF] Back to Contents
  Yunhan Wu, Daniel Rough, Anna Bleakley, Justin Edwards, Orla Cooney, Philip R. Doyle, Leigh Clark, Benjamin R. Cowan
Abstract: Limited linguistic coverage for Intelligent Personal Assistants (IPAs) means that many interact in a non-native language. Yet we know little about how IPAs currently support or hinder these users. Through native (L1) and non-native (L2) English speakers interacting with Google Assistant on a smartphone and smart speaker, we aim to understand this more deeply. Interviews revealed that L2 speakers prioritised utterance planning around perceived linguistic limitations, as opposed to L1 speakers prioritising succinctness because of system limitations. L2 speakers see IPAs as insensitive to linguistic needs resulting in failed interaction. L2 speakers clearly preferred using smartphones, as visual feedback supported diagnoses of communication breakdowns whilst allowing time to process query results. Conversely, L1 speakers preferred smart speakers, with audio feedback being seen as sufficient. We discuss the need to tailor the IPA experience for L2 users, emphasising visual feedback whilst reducing the burden of language production.

21. Transparency in Language Generation: Levels of Automation [PDF] Back to Contents
  Justin Edwards, Allison Perrone, Philip R. Doyle
Abstract: Language models and conversational systems are growing increasingly advanced, creating outputs that may be mistaken for humans. Consumers may thus be misled by advertising, media reports, or vagueness regarding the role of automation in the production of language. We propose a taxonomy of language automation, based on the SAE levels of driving automation, to establish a shared set of terms for describing automated language. It is our hope that the proposed taxonomy can increase transparency in this rapidly advancing field.

22. XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System [PDF] Back to Contents
  Peiling Lu, Jie Wu, Jian Luan, Xu Tan, Li Zhou
Abstract: This paper presents XiaoiceSing, a high-quality singing voice synthesis system which employs an integrated network for spectrum, F0 and duration modeling. We follow the main architecture of FastSpeech while proposing some singing-specific designs: 1) Besides phoneme ID and position encoding, features from the musical score (e.g. note pitch and length) are also added. 2) To attenuate off-key issues, we add a residual connection in F0 prediction. 3) In addition to the duration loss of each phoneme, the durations of all the phonemes in a musical note are accumulated to calculate a syllable duration loss for rhythm enhancement. Experiment results show that XiaoiceSing outperforms the baseline convolutional neural network system by 1.44 MOS on sound quality, 1.18 on pronunciation accuracy and 1.38 on naturalness respectively. In two A/B tests, the proposed F0 and duration modeling methods achieve 97.3% and 84.3% preference rates over the baseline respectively, demonstrating the overwhelming advantages of XiaoiceSing.
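
The note-level duration accumulation in design 3) can be sketched in a few lines of PyTorch (the tensor names and L1 penalty are illustrative assumptions):

    import torch

    def note_duration_loss(pred_phoneme_dur, note_ids, note_dur):
        # pred_phoneme_dur: (num_phonemes,) predicted phoneme durations
        # note_ids: (num_phonemes,) index of the musical note each phoneme belongs to
        # note_dur: (num_notes,) note durations taken from the musical score
        accumulated = torch.zeros_like(note_dur)
        accumulated.index_add_(0, note_ids, pred_phoneme_dur)
        return torch.abs(accumulated - note_dur).mean()

    loss = note_duration_loss(
        torch.tensor([0.1, 0.2, 0.3, 0.4]),  # four phoneme durations
        torch.tensor([0, 0, 1, 1]),          # the first two phonemes share note 0
        torch.tensor([0.35, 0.65]),          # target note durations
    )
    print(loss)  # mean absolute gap between accumulated and target durations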

23. Large-Scale Adversarial Training for Vision-and-Language Representation Learning [PDF] Back to Contents
  Zhe Gan, Yen-Chun Chen, Linjie Li, Chen Zhu, Yu Cheng, Jingjing Liu
Abstract: We present VILLA, the first known effort on large-scale adversarial training for vision-and-language (V+L) representation learning. VILLA consists of two training stages: (i) task-agnostic adversarial pre-training; followed by (ii) task-specific adversarial finetuning. Instead of adding adversarial perturbations on image pixels and textual tokens, we propose to perform adversarial training in the embedding space of each modality. To enable large-scale training, we adopt the "free" adversarial training strategy, and combine it with KL-divergence-based regularization to promote higher invariance in the embedding space. We apply VILLA to current best-performing V+L models, and achieve new state of the art on a wide range of tasks, including Visual Question Answering, Visual Commonsense Reasoning, Image-Text Retrieval, Referring Expression Comprehension, Visual Entailment, and NLVR2.
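
A conceptual single-step sketch of adversarial training in the embedding space with a KL regularizer (plain PyTorch; the one-step perturbation and hyperparameters are simplifying assumptions, not the released VILLA code):

    import torch
    import torch.nn.functional as F

    def adversarial_step(model, embeds, labels, eps=1e-3, alpha=1.0):
        clean_logits = model(embeds)
        loss = F.cross_entropy(clean_logits, labels)

        # One gradient-ascent step on a perturbation of the embeddings.
        delta = torch.zeros_like(embeds, requires_grad=True)
        adv_loss = F.cross_entropy(model(embeds + delta), labels)
        grad, = torch.autograd.grad(adv_loss, delta)
        delta = eps * grad / (grad.norm() + 1e-12)

        # Task loss on the perturbed input, plus a KL term keeping perturbed
        # predictions close to the clean ones.
        adv_logits = model(embeds + delta)
        kl = F.kl_div(F.log_softmax(adv_logits, dim=-1),
                      F.softmax(clean_logits, dim=-1).detach(),
                      reduction="batchmean")
        return loss + F.cross_entropy(adv_logits, labels) + alpha * kl

    model = torch.nn.Sequential(torch.nn.Linear(16, 3))
    total = adversarial_step(model, torch.randn(4, 16), torch.tensor([0, 1, 2, 0]))
    total.backward()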

24. PeopleMap: Visualization Tool for Mapping Out Researchers using Natural Language Processing [PDF] Back to Contents
  Jon Saad-Falcon, Omar Shaikh, Zijie J. Wang, Austin P. Wright, Sasha Richardson, Duen Horng Chau
Abstract: Discovering research expertise at institutions can be a difficult task. Manually curated university directories easily become out of date and they often lack the information necessary for understanding a researcher's interests and past work, making it harder to explore the diversity of research at an institution and identify research talents. This results in lost opportunities for both internal and external entities to discover new connections and nurture research collaboration. To solve this problem, we have developed PeopleMap, the first interactive, open-source, web-based tool that visually "maps out" researchers based on their research interests and publications by leveraging embeddings generated by natural language processing (NLP) techniques. PeopleMap provides a new engaging way for institutions to summarize their research talents and for people to discover new connections. The platform is developed with ease-of-use and sustainability in mind. Using only researchers' Google Scholar profiles as input, PeopleMap can be readily adopted by any institution using its publicly-accessible repository and detailed documentation.
