
[arXiv Papers] Computation and Language 2020-04-24

Contents

1. Rapidly Bootstrapping a Question Answering Dataset for COVID-19 [PDF] Abstract
2. Adaptive Forgetting Curves for Spaced Repetition Language Learning [PDF] Abstract
3. Correct Me If You Can: Learning from Error Corrections and Markings [PDF] Abstract
4. Self-Attention Attribution: Interpreting Information Interactions Inside Transformer [PDF] Abstract
5. Same Side Stance Classification Task: Facilitating Argument Stance Classification by Fine-tuning a BERT Model [PDF] Abstract
6. On Adversarial Examples for Biomedical NLP Tasks [PDF] Abstract
7. DuReader_{robust}: A Chinese Dataset Towards Evaluating the Robustness of Machine Reading Comprehension Models [PDF] Abstract
8. Coupled intrinsic and extrinsic human language resource-based query expansion [PDF] Abstract
9. Coupling semantic and statistical techniques for dynamically enriching web ontologies [PDF] Abstract
10. Learning Dialog Policies from Weak Demonstrations [PDF] Abstract
11. QURIOUS: Question Generation Pretraining for Text Generation [PDF] Abstract
12. Dynamic Fusion Network for Multi-Domain End-to-end Task-Oriented Dialog [PDF] Abstract
13. Semi-Supervised Models via Data Augmentation for Classifying Interactive Affective Responses [PDF] Abstract
14. Visual Question Answering Using Semantic Information from Image Descriptions [PDF] Abstract
15. Don't Stop Pretraining: Adapt Language Models to Domains and Tasks [PDF] Abstract
16. What are We Depressed about When We Talk about COVID19: Mental Health Analysis on Tweets Using Natural Language Processing [PDF] Abstract
17. Preserving the Hypernym Tree of WordNet in Dense Embeddings [PDF] Abstract
18. Syntactic Structure from Deep Learning [PDF] Abstract
19. ParsEL 1.0: Unsupervised Entity Linking in Persian Social Media Texts [PDF] Abstract
20. Revisiting the Context Window for Cross-lingual Word Embeddings [PDF] Abstract
21. Polarized-VAE: Proximity Based Disentangled Representation Learning for Text Generation [PDF] Abstract
22. Learning to Classify Intents and Slot Labels Given a Handful of Examples [PDF] Abstract
23. Classification using Hyperdimensional Computing: A Review [PDF] Abstract
24. Natural language technology and query expansion: issues, state-of-the-art and perspectives [PDF] Abstract
25. Distilling Knowledge for Fast Retrieval-based Chat-bots [PDF] Abstract
26. Love, Joy, Anger, Sadness, Fear, and Surprise: SE Needs Special Kinds of AI: A Case Study on Text Mining and SE [PDF] Abstract
27. TCNN: Triple Convolutional Neural Network Models for Retrieval-based Question Answering System in E-commerce [PDF] Abstract
28. Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription [PDF] Abstract
29. Visual Commonsense Graphs: Reasoning about the Dynamic Context of a Still Image [PDF] Abstract

Abstracts

1. Rapidly Bootstrapping a Question Answering Dataset for COVID-19 [PDF] Back to contents
  Raphael Tang, Rodrigo Nogueira, Edwin Zhang, Nikhil Gupta, Phuong Cam, Kyunghyun Cho, Jimmy Lin
Abstract: We present CovidQA, the beginnings of a question answering dataset specifically designed for COVID-19, built by hand from knowledge gathered from Kaggle's COVID-19 Open Research Dataset Challenge. To our knowledge, this is the first publicly available resource of its type, and intended as a stopgap measure for guiding research until more substantial evaluation resources become available. While this dataset, comprising 124 question-article pairs as of the present version 0.1 release, does not have sufficient examples for supervised machine learning, we believe that it can be helpful for evaluating the zero-shot or transfer capabilities of existing models on topics specifically related to COVID-19. This paper describes our methodology for constructing the dataset and presents the effectiveness of a number of baselines, including term-based techniques and various transformer-based models. The dataset is available at this http URL

2. Adaptive Forgetting Curves for Spaced Repetition Language Learning [PDF] Back to contents
  Ahmed Zaidi, Andrew Caines, Russell Moore, Paula Buttery, Andrew Rice
Abstract: The forgetting curve has been extensively explored by psychologists, educationalists and cognitive scientists alike. In the context of Intelligent Tutoring Systems, modelling the forgetting curve for each user and knowledge component (e.g. vocabulary word) should enable us to develop optimal revision strategies that counteract memory decay and ensure long-term retention. In this study we explore a variety of forgetting curve models incorporating psychological and linguistic features, and we use these models to predict the probability of word recall by learners of English as a second language. We evaluate the impact of the models and their features using data from an online vocabulary teaching platform and find that word complexity is a highly informative feature which may be successfully learned by a neural network model.
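
To make the idea of a forgetting curve concrete, here is a minimal Python sketch. It assumes a simple exponential decay with a per-word half-life, which is a common textbook form and not necessarily any of the models studied in the paper; the function names and the 0.9 target recall are hypothetical choices for illustration.

```python
import math


def recall_probability(days_since_review: float, half_life_days: float) -> float:
    """Assumed exponential forgetting curve: recall halves every half_life_days."""
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return 2.0 ** (-days_since_review / half_life_days)


def next_review_delay(half_life_days: float, target_recall: float = 0.9) -> float:
    """Days until predicted recall drops to target_recall.

    Solves 2 ** (-t / h) = target_recall for t.
    """
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    return -half_life_days * math.log2(target_recall)
```

Under this assumption, a revision scheduler would show a word again once its predicted recall falls to the target, and a learned model would supply the half-life from features such as word complexity.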

3. Correct Me If You Can: Learning from Error Corrections and Markings [PDF] Back to contents
  Julia Kreutzer, Nathaniel Berger, Stefan Riezler
Abstract: Sequence-to-sequence learning involves a trade-off between signal strength and annotation cost of training data. For example, machine translation data range from costly expert-generated translations that enable supervised learning, to weak quality-judgment feedback that facilitate reinforcement learning. We present the first user study on annotation cost and machine learnability for the less popular annotation mode of error markings. We show that error markings for translations of TED talks from English to German allow precise credit assignment while requiring significantly less human effort than correcting/post-editing, and that error-marked data can be used successfully to fine-tune neural machine translation models.

4. Self-Attention Attribution: Interpreting Information Interactions Inside Transformer [PDF] Back to contents
  Yaru Hao, Li Dong, Furu Wei, Ke Xu
Abstract: The great success of Transformer-based models benefits from the powerful multi-head self-attention mechanism, which learns token dependencies and encodes contextual information from the input. Prior work strives to attribute model decisions to individual input features with different saliency measures, but they fail to explain how these input features interact with each other to reach predictions. In this paper, we propose a self-attention attribution algorithm to interpret the information interactions inside Transformer. We take BERT as an example to conduct extensive studies. Firstly, we extract the most salient dependencies in each layer to construct an attribution graph, which reveals the hierarchical interactions inside Transformer. Furthermore, we apply self-attention attribution to identify the important attention heads, while others can be pruned with only marginal performance degradation. Finally, we show that the attribution results can be used as adversarial patterns to implement non-targeted attacks towards BERT.

5. Same Side Stance Classification Task: Facilitating Argument Stance Classification by Fine-tuning a BERT Model [PDF] Back to contents
  Stefan Ollinger, Lorik Dumani, Premtim Sahitaj, Ralph Bergmann, Ralf Schenkel
Abstract: Computational argumentation is currently being intensively investigated. The goal of this community is to find the best pro and con arguments for a user-given topic, either to form an opinion for oneself or to persuade others to adopt a certain standpoint. While existing argument mining methods can find appropriate arguments for a topic, a correct classification into pro and con is not yet reliable. The same side stance classification task provides a dataset of argument pairs classified by whether or not both arguments share the same stance; it does not need to distinguish between topic-specific pro and con vocabulary, as only the argument similarity within a stance needs to be assessed. The results of our contribution to the task are built on a setup based on the BERT architecture. We fine-tuned a pre-trained BERT model for three epochs and used the first 512 tokens of each argument to predict if two arguments share the same stance.

6. On Adversarial Examples for Biomedical NLP Tasks [PDF] Back to contents
  Vladimir Araujo, Andres Carvallo, Carlos Aspillaga, Denis Parra
Abstract: The success of pre-trained word embeddings has motivated its use in tasks in the biomedical domain. The BERT language model has shown remarkable results on standard performance metrics in tasks such as Named Entity Recognition (NER) and Semantic Textual Similarity (STS), which has brought significant progress in the field of NLP. However, it is unclear whether these systems work seemingly well in critical domains, such as legal or medical. For that reason, in this work, we propose an adversarial evaluation scheme on two well-known datasets for medical NER and STS. We propose two types of attacks inspired by natural spelling errors and typos made by humans. We also propose another type of attack that uses synonyms of medical terms. Under these adversarial settings, the accuracy of the models drops significantly, and we quantify the extent of this performance loss. We also show that we can significantly improve the robustness of the models by training them with adversarial examples. We hope our work will motivate the use of adversarial examples to evaluate and develop models with increased robustness for medical tasks.
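
As a rough illustration of a typo-style perturbation, the Python sketch below swaps two adjacent interior characters in a word. This is only one plausible way to imitate human typing errors and is not taken from the paper; `swap_typo`, `perturb_sentence`, and the `rate` parameter are hypothetical names.

```python
import random


def swap_typo(word: str, rng: random.Random) -> str:
    """Swap two adjacent interior characters, keeping first and last fixed."""
    if len(word) < 4:
        return word  # too short to perturb without touching the edges
    i = rng.randrange(1, len(word) - 2)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def perturb_sentence(sentence: str, rate: float, rng: random.Random) -> str:
    """Apply swap_typo to each whitespace-separated token with probability rate."""
    return " ".join(
        swap_typo(token, rng) if rng.random() < rate else token
        for token in sentence.split()
    )
```

Passing a seeded `random.Random` keeps the perturbed evaluation set reproducible, which matters when comparing model accuracy before and after the attack.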

7. DuReader_{robust}: A Chinese Dataset Towards Evaluating the Robustness of Machine Reading Comprehension Models [PDF] Back to contents
  Hongxuan Tang, Jing Liu, Hongyu Li, Yu Hong, Hua Wu, Haifeng Wang
Abstract: Machine Reading Comprehension (MRC) is a crucial and challenging task in natural language processing. Although several MRC models obtain human parity performance on several datasets, we find that these models are still far from robust. To comprehensively evaluate the robustness of MRC models, we create a Chinese dataset, namely DuReader_{robust}. It is designed to challenge MRC models from the following aspects: (1) over-sensitivity, (2) over-stability and (3) generalization. Most previous work studies these problems by altering the inputs to unnatural texts. By contrast, the advantage of DuReader_{robust} is that its questions and documents are natural texts. It presents the robustness challenges when applying MRC models to real-world applications. The experimental results show that MRC models based on the pre-trained language models perform much worse than humans do on the robustness test set, although they perform as well as humans on the in-domain test set. Additionally, we analyze the behavior of existing models on the robustness test set, which might give suggestions for future model development. The dataset and codes are available at this https URL

8. Coupled intrinsic and extrinsic human language resource-based query expansion [PDF] Back to contents
  Bhawani Selvaretnam, Mohammed Belkhatir
Abstract: Poor information retrieval performance has often been attributed to the query-document vocabulary mismatch problem which is defined as the difficulty for human users to formulate precise natural language queries that are in line with the vocabulary of the documents deemed relevant to a specific search goal. To alleviate this problem, query expansion processes are applied in order to spawn and integrate additional terms to an initial query. This requires accurate identification of main query concepts to ensure the intended search goal is duly emphasized and relevant expansion concepts are extracted and included in the enriched query. Natural language queries have intrinsic linguistic properties such as parts-of-speech labels and grammatical relations which can be utilized in determining the intended search goal. Additionally, extrinsic language-based resources such as ontologies are needed to suggest expansion concepts semantically coherent with the query content. We present here a query expansion framework which capitalizes on both linguistic characteristics of user queries and ontology resources for query constituent encoding, expansion concept extraction and concept weighting. A thorough empirical evaluation on real-world datasets validates our approach against unigram language model, relevance model and a sequential dependence based technique.

9. Coupling semantic and statistical techniques for dynamically enriching web ontologies [PDF] Back to contents
  Mohammed Maree, Mohammed Belkhatir
Abstract: With the development of the Semantic Web technology, the use of ontologies to store and retrieve information covering several domains has increased. However, very few ontologies are able to cope with the ever-growing need of frequently updated semantic information or specific user requirements in specialized domains. As a result, a critical issue is related to the unavailability of relational information between concepts, also coined missing background knowledge. One solution to address this issue relies on the manual enrichment of ontologies by domain experts which is however a time consuming and costly process, hence the need for dynamic ontology enrichment. In this paper we present an automatic coupled statistical/semantic framework for dynamically enriching large-scale generic ontologies from the World Wide Web. Using the massive amount of information encoded in texts on the Web as a corpus, missing background knowledge can therefore be discovered through a combination of semantic relatedness measures and pattern acquisition techniques and subsequently exploited. The benefits of our approach are: (i) proposing the dynamic enrichment of large-scale generic ontologies with missing background knowledge, and thus, enabling the reuse of such knowledge, (ii) dealing with the issue of costly ontological manual enrichment by domain experts. Experimental results in a precision-based evaluation setting demonstrate the effectiveness of the proposed techniques.

10. Learning Dialog Policies from Weak Demonstrations [PDF] Back to contents
  Gabriel Gordon-Hall, Philip John Gorinski, Shay B. Cohen
Abstract: Deep reinforcement learning is a promising approach to training a dialog manager, but current methods struggle with the large state and action spaces of multi-domain dialog systems. Building upon Deep Q-learning from Demonstrations (DQfD), an algorithm that scores highly in difficult Atari games, we leverage dialog data to guide the agent to successfully respond to a user's requests. We make progressively fewer assumptions about the data needed, using labeled, reduced-labeled, and even unlabeled data to train expert demonstrators. We introduce Reinforced Fine-tune Learning, an extension to DQfD, enabling us to overcome the domain gap between the datasets and the environment. Experiments in a challenging multi-domain dialog system framework validate our approaches, and get high success rates even when trained on out-of-domain data.

11. QURIOUS: Question Generation Pretraining for Text Generation [PDF] Back to contents
  Shashi Narayan, Gonçalo Simoes, Ji Ma, Hannah Craighead, Ryan Mcdonald
Abstract: Recent trends in natural language processing using pretraining have shifted focus towards pretraining and fine-tuning approaches for text generation. Often the focus has been on task-agnostic approaches that generalize the language modeling objective. We propose question generation as a pretraining method, which better aligns with the text generation objectives. Our text generation models pretrained with this method are better at understanding the essence of the input and are better language models for the target task. When evaluated on two text generation tasks, abstractive summarization and answer-focused question generation, our models result in state-of-the-art performances in terms of automatic metrics. Human evaluators also found our summaries and generated questions to be more natural, concise and informative.

12. Dynamic Fusion Network for Multi-Domain End-to-end Task-Oriented Dialog [PDF] Back to contents
  Libo Qin, Xiao Xu, Wanxiang Che, Yue Zhang, Ting Liu
Abstract: Recent studies have shown remarkable success in end-to-end task-oriented dialog systems. However, most neural models rely on large training data, which are only available for a certain number of task domains, such as navigation and scheduling. This makes it difficult to scale to a new domain with limited labeled data. However, there has been relatively little research on how to effectively use data from all domains to improve the performance of each domain and also unseen domains. To this end, we investigate methods that can make explicit use of domain knowledge and introduce a shared-private network to learn shared and specific knowledge. In addition, we propose a novel Dynamic Fusion Network (DF-Net) which automatically exploits the relevance between the target domain and each domain. Results show that our model outperforms existing methods on multi-domain dialogue, giving the state-of-the-art in the literature. Besides, with little training data, we show its transferability by outperforming the prior best model by 13.9% on average.

13. Semi-Supervised Models via Data Augmentation for Classifying Interactive Affective Responses [PDF] Back to contents
  Jiaao Chen, Yuwei Wu, Diyi Yang
Abstract: We present semi-supervised models with data augmentation (SMDA), a semi-supervised text classification system to classify interactive affective responses. SMDA utilizes recent transformer-based models to encode each sentence and employs back translation techniques to paraphrase given sentences as augmented data. For labeled sentences, we performed data augmentations to make the label distributions uniform and computed supervised loss during the training process. For unlabeled sentences, we explored self-training by regarding low-entropy predictions over unlabeled sentences as pseudo labels, assuming high-confidence predictions as labeled data for training. We further introduced consistency regularization as unsupervised loss after data augmentations on unlabeled data, based on the assumption that the model should predict similar class distributions with original unlabeled sentences as input and augmented sentences as input. Via a set of experiments, we demonstrated that our system outperformed baseline models in terms of F1-score and accuracy.
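
The two unsupervised signals described in this abstract can be sketched in a few lines of Python. This is an illustrative reading, not the authors' implementation: the entropy threshold, the use of KL divergence as the consistency measure, and all function names are assumptions.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def pseudo_label(probs, max_entropy):
    """Return the argmax class if the prediction is low-entropy, else None."""
    if entropy(probs) <= max_entropy:
        return max(range(len(probs)), key=probs.__getitem__)
    return None


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), used here as an assumed consistency penalty between the
    prediction on an original sentence (p) and on its paraphrase (q)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)
```

In a training loop, sentences for which `pseudo_label` returns a class would join the labeled batch, while the consistency term would be added to the loss for every unlabeled sentence and its back-translated paraphrase.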

14. Visual Question Answering Using Semantic Information from Image Descriptions [PDF] Back to contents
  Tasmia Tasrin, Md Sultan Al Nahian, Brent Harrison
Abstract: Visual question answering (VQA) is a task that requires AI systems to display multi-modal understanding. A system must be able to reason over the question being asked as well as the image itself to determine reasonable answers to the questions posed. In many cases, simply reasoning over the image itself and the question is not enough to achieve good performance. As an aid to the task, besides region-based visual information and natural language questions, external textual knowledge extracted from images can also be used to generate correct answers for questions. Considering these, we propose a deep neural network model that uses an attention mechanism which utilizes image features, the natural language question asked and semantic knowledge extracted from the image to produce open-ended answers for the given questions. The combination of image features and contextual information about the image bolsters a model to more accurately respond to questions and potentially do so with less required training data. We evaluate our proposed architecture on a VQA task against a strong baseline and show that our method achieves excellent results on this task.

15. Don't Stop Pretraining: Adapt Language Models to Domains and Tasks [PDF] Back to contents
  Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, Noah A. Smith
Abstract: Language models pretrained on text from a wide variety of sources form the foundation of today's NLP. In light of the success of these broad-coverage models, we investigate whether it is still helpful to tailor a pretrained model to the domain of a target task. We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks, showing that a second phase of pretraining in-domain (domain-adaptive pretraining) leads to performance gains, under both high- and low-resource settings. Moreover, adapting to the task's unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining. Finally, we show that adapting to a task corpus augmented using simple data selection strategies is an effective alternative, especially when resources for domain-adaptive pretraining might be unavailable. Overall, we consistently find that multi-phase adaptive pretraining offers large gains in task performance.

16. What are We Depressed about When We Talk about COVID19: Mental Health Analysis on Tweets Using Natural Language Processing [PDF] Back to contents
  Irene Li, Yixin Li, Tianxiao Li, Sergio Alvarez-Napagao, Dario Garcia
Abstract: The outbreak of coronavirus disease 2019 (COVID-19) recently has affected human life to a great extent. Besides direct physical and economic threats, the pandemic also indirectly impacts people's mental health conditions, which can be overwhelming but difficult to measure. The problem may come from various reasons such as unemployment status, stay-at-home policy, fear of the virus, and so forth. In this work, we focus on applying natural language processing (NLP) techniques to analyze tweets in terms of mental health. We trained deep models that classify each tweet into the following emotions: anger, anticipation, disgust, fear, joy, sadness, surprise and trust. We build the EmoCT (Emotion-Covid19-Tweet) dataset for the training purpose by manually labeling 1,000 English tweets. Furthermore, we propose and compare two methods to find out the reasons that are causing sadness and fear.

17. Preserving the Hypernym Tree of WordNet in Dense Embeddings [PDF] Back to contents
  Canlin Zhang, Xiuwen Liu
Abstract: In this paper, we provide a novel way to generate low-dimension (dense) vector embeddings for the noun and verb synsets in WordNet, so that the hypernym-hyponym tree structure is preserved in the embeddings. We call this embedding the sense spectrum (and sense spectra for embeddings). In order to create suitable labels for the training of sense spectra, we designed a new similarity measurement for noun and verb synsets in WordNet. We call this similarity measurement the hypernym intersection similarity (HIS), since it compares the common and unique hypernyms between two synsets. Our experiments show that on the noun and verb pairs of the SimLex-999 dataset, HIS outperforms the three similarity measurements in WordNet. Moreover, to the best of our knowledge, the sense spectra is the first dense embedding system that can explicitly and completely measure the hypernym-hyponym relationship in WordNet.

18. Syntactic Structure from Deep Learning [PDF] Back to contents
  Tal Linzen, Marco Baroni
Abstract: Modern deep neural networks achieve impressive performance in engineering applications that require extensive linguistic skills, such as machine translation. This success has sparked interest in probing whether these models are inducing human-like grammatical knowledge from the raw data they are exposed to, and, consequently, whether they can shed new light on long-standing debates concerning the innate structure necessary for language acquisition. In this article, we survey representative studies of the syntactic abilities of deep networks, and discuss the broader implications that this work has for theoretical linguistics.

19. ParsEL 1.0: Unsupervised Entity Linking in Persian Social Media Texts [PDF] Back to contents
  Majid Asgari-Bidhendi, Farzane Fakhrian, Behrouz Minaei-Bidgoli
Abstract: In recent years, social media data has exponentially increased, which can be enumerated as one of the largest data repositories in the world. A large portion of this social media data is natural language text. However, the natural language is highly ambiguous due to exposure to the frequent occurrences of entities, which have polysemous words or phrases. Entity linking is the task of linking the entity mentions in the text to their corresponding entities in a knowledge base. Recently, FarsBase, a Persian knowledge graph, has been introduced containing almost half a million entities. In this paper, we propose an unsupervised Persian Entity Linking system, the first entity linking system specially focused on the Persian language, which utilizes context-dependent and context-independent features. For this purpose, we also publish the first entity linking corpus of the Persian language containing 67,595 words that have been crawled from social media texts of some popular channels in the Telegram messenger. The output of the proposed method is 86.94% f-score for the Persian language, which is comparable with the similar state-of-the-art methods in the English language.

20. Revisiting the Context Window for Cross-lingual Word Embeddings [PDF] Back to contents
  Ryokan Ri, Yoshimasa Tsuruoka
Abstract: Existing approaches to mapping-based cross-lingual word embeddings are based on the assumption that the source and target embedding spaces are structurally similar. The structures of embedding spaces largely depend on the co-occurrence statistics of each word, which the choice of context window determines. Despite this obvious connection between the context window and mapping-based cross-lingual embeddings, their relationship has been underexplored in prior work. In this work, we provide a thorough evaluation, in various languages, domains, and tasks, of bilingual embeddings trained with different context windows. The highlight of our findings is that increasing the size of both the source and target window sizes improves the performance of bilingual lexicon induction, especially the performance on frequent nouns.

21. Polarized-VAE: Proximity Based Disentangled Representation Learning for Text Generation [PDF] Back to contents
  Vikash Balasubramanian, Ivan Kobyzev, Hareesh Bahuleyan, Ilya Shapiro, Olga Vechtomova
Abstract: Learning disentangled representations of real world data is a challenging open problem. Most previous methods have focused on either fully supervised approaches which use attribute labels or unsupervised approaches that manipulate the factorization in the latent space of models such as the variational autoencoder (VAE), by training with task-specific losses. In this work we propose polarized-VAE, a novel approach that disentangles selected attributes in the latent space based on proximity measures reflecting the similarity between data points with respect to these attributes. We apply our method to disentangle the semantics and syntax of a sentence and carry out transfer experiments. Polarized-VAE significantly outperforms the VAE baseline and is competitive with the state-of-the-art approaches, while being more a general framework that is applicable to other attribute disentanglement tasks.
摘要:学习解开现实世界的数据的表示是一个具有挑战性的开放问题。以往大多数方法都集中在要么完全监督的方法,其使用属性标签或操纵模型的潜在空间因式分解监督的办法,如变自动编码器(VAE),通过训练任务的具体损失。在这项工作中,我们提出偏振-VAE,一种新颖的方法即理顺了那些纷繁选择的属性的基础上反映相对于这些属性的数据点之间的相似性接近措施的潜在空间。我们应用我们的方法理清句子的语义和语法并进行传输实验。偏光VAE显著优于VAE基线,并与国家的最先进的方法有竞争力,同时更加的总体框架是适用于其他属性的解开任务。

22. Learning to Classify Intents and Slot Labels Given a Handful of Examples [PDF] 返回目录
  Jason Krone, Yi Zhang, Mona Diab
Abstract: Intent classification (IC) and slot filling (SF) are core components in most goal-oriented dialogue systems. Current IC/SF models perform poorly when the number of training examples per class is small. We propose a new few-shot learning task, few-shot IC/SF, to study and improve the performance of IC and SF models on classes not seen at training time in ultra-low-resource scenarios. We establish a few-shot IC/SF benchmark by defining few-shot splits for three public IC/SF datasets: ATIS, TOP, and Snips. We show that two popular few-shot learning algorithms, model-agnostic meta-learning (MAML) and prototypical networks, outperform a fine-tuning baseline on this benchmark. Prototypical networks achieve significant gains in IC performance on the ATIS and TOP datasets, while both prototypical networks and MAML outperform the baseline with respect to SF on all three datasets. In addition, we demonstrate that joint training as well as the use of pre-trained language models, ELMo and BERT in our case, are complementary to these few-shot learning methods and yield further gains.
摘要:意图分类(IC)和槽填充(SF)是在大多数面向目标的对话系统核心组件。电流IC / SF模型表现不佳时的训练样例每类的数目是小的。我们提出了一个新的少数拍学习任务,很少拍IC / SF,学习,提高IC和SF模型对在超低资源情景训练的时间没有见过类的性能。我们定义了三个公共IC / SF数据集,ATIS,TOP和零星消息很少拍分裂建立几拍IC / SF基准。我们发现,两种流行的几拍学习算法,模型无关元学习(MAML)和典型的网络,超越微调基准这一基准。原型网络实现了对ATIS和TOP数据集IC性能显著的收益,而这两个典型的网络和MAML跑赢基准相对于SF上的所有三个数据集。此外,我们证明了联合训练,以及采用预训练的语言模型,埃尔莫和BERT在我们的情况下,对这些少数次学习方法的补充和产量进一步增长。
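
As a rough illustration of the prototypical-network idea mentioned in the abstract, the sketch below averages the support embeddings of each class into a prototype and assigns a query to the nearest one. It assumes the embeddings have already been produced by some encoder (omitted here); the intent labels and vectors are made up, and the actual models in the paper are trained end to end rather than hand-fed vectors.

```python
from collections import defaultdict
import math

def class_prototypes(support):
    """support: list of (embedding, label) pairs. Returns label -> mean embedding."""
    grouped = defaultdict(list)
    for embedding, label in support:
        grouped[label].append(embedding)
    return {
        label: [sum(dim) / len(embs) for dim in zip(*embs)]
        for label, embs in grouped.items()
    }

def classify(query, prototypes):
    """Assign the query to the class with the nearest prototype (Euclidean distance)."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))

# Hypothetical 2-d embeddings for a two-way, two-shot episode.
support = [
    ([1.0, 0.0], "book_flight"), ([0.8, 0.2], "book_flight"),
    ([0.0, 1.0], "play_music"), ([0.2, 0.8], "play_music"),
]
prototypes = class_prototypes(support)
prediction = classify([0.9, 0.3], prototypes)
```

Picking the nearest prototype matches the argmax of a softmax over negative distances, which is how such classifiers are usually scored during training.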

23. Classification using Hyperdimensional Computing: A Review [PDF] 返回目录
  Lulu Ge, Keshab K. Parhi
Abstract: Hyperdimensional (HD) computing is built upon its unique data type, referred to as hypervectors. The dimension of these hypervectors is typically in the range of tens of thousands. Proposed to solve cognitive tasks, HD computing aims at calculating similarity among its data. Data transformation is realized by three operations: addition, multiplication and permutation. Its ultra-wide data representation introduces redundancy against noise. Since information is evenly distributed over every bit of the hypervectors, HD computing is inherently robust. Additionally, due to the nature of those three operations, HD computing leads to fast learning ability, high energy efficiency and acceptable accuracy in learning and classification tasks. This paper introduces the background of HD computing, and reviews the data representation, data transformation, and similarity measurement. The orthogonality in high dimensions presents opportunities for flexible computing. To balance the tradeoff between accuracy and efficiency, strategies include but are not limited to encoding, retraining, binarization and hardware acceleration. Evaluations indicate that HD computing shows great potential in addressing problems using data in the form of letters, signals and images. HD computing especially shows significant promise to replace machine learning algorithms as a light-weight classifier in the field of the Internet of Things (IoT).
摘要:超维度(HD)的计算是建立在其独特的数据类型被称为hypervectors。这些hypervectors的尺寸通常在数万的范围。建议在计算其数据之间的相似性来解决认知任务,HD计算的目标。数据转换是通过三次操作,包括加法,乘法和排列实现。它的超宽数据表示反对引入冗余噪音。由于信息被均匀地分布在hypervectors的每一位,HD计算本质上是稳健的。此外,由于这三个业务的性质,HD计算导致快速的学习能力,高能源效率和学习和分类任务可接受的精度。本文介绍了HD的计算的背景,及评论数据表示,数据转换,和相似性测量。在高维正交提出了灵活的计算机会。为了平衡精度和效率之间的折衷,策略包括但不限于编码,再培训,二值化和硬件加速。评价表明在处理的信件,信号和图像的形式使用数据的问题,HD计算显示出巨大的潜力。 HD计算特别是显示显著承诺更换机器学习算法,在物联网(IOT中)互联网领域的轻质分类。
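
The three operations and the similarity measure named in the abstract can be sketched with bipolar (+1/-1) hypervectors. This is one common formulation, assumed here for illustration; the dimension, the sign-threshold bundling rule, and all function names are choices made for this sketch rather than details taken from the review.

```python
import random

DIM = 10_000  # assumed dimensionality, in the "tens of thousands" range

def random_hv(rng):
    """Draw a random bipolar hypervector."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]

def bind(a, b):
    """Multiplication: element-wise product; binding twice with b undoes it."""
    return [x * y for x, y in zip(a, b)]

def bundle(*hvs):
    """Addition: element-wise sum followed by a sign threshold."""
    return [1 if sum(col) >= 0 else -1 for col in zip(*hvs)]

def permute(a, shift=1):
    """Permutation: cyclic rotation by `shift` positions."""
    return a[-shift:] + a[:-shift]

def similarity(a, b):
    """Normalized dot product, equal to cosine similarity for bipolar vectors."""
    return sum(x * y for x, y in zip(a, b)) / DIM

rng = random.Random(0)
a, b, c = (random_hv(rng) for _ in range(3))
bundled = bundle(a, b, c)
```

Random hypervectors are nearly orthogonal (similarity close to 0), while a bundle stays clearly similar to each of its inputs, which is the property that classification by nearest class hypervector relies on.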

24. Natural language technology and query expansion: issues, state-of-the-art and perspectives [PDF] 返回目录
  Bhawani Selvaretnam, Mohammed Belkhatir
Abstract: The availability of an abundance of knowledge sources has spurred a large amount of effort in the development and enhancement of Information Retrieval techniques. Users' information needs are expressed in natural language, and successful retrieval is very much dependent on the effective communication of the intended purpose. Natural language queries consist of multiple linguistic features which serve to represent the intended search goal. Linguistic characteristics that cause semantic ambiguity and misinterpretation of queries, as well as additional factors such as the lack of familiarity with the search environment, affect the users' ability to accurately represent their information needs, a phenomenon captured by the concept of the intention gap. The latter directly affects the relevance of the returned search results, which may not be to the users' satisfaction, and therefore is a major issue impacting the effectiveness of information retrieval systems. Central to our discussion is the identification of the significant constituents that characterize the query intent and their enrichment through the addition of meaningful terms, phrases or even latent representations, either manually or automatically, to capture their intended meaning. Specifically, we discuss techniques to achieve the enrichment, and in particular those utilizing the information gathered from statistical processing of term dependencies within a document corpus or from external knowledge sources such as ontologies. We lay down the anatomy of a generic linguistic-based query expansion framework and propose its module-based decomposition, covering topical issues from query processing, information retrieval, computational linguistics and ontology engineering. For each of the modules we review state-of-the-art solutions in the literature, categorized and analyzed in light of the techniques used.
摘要:丰富的知识来源的可用性刺激了大量的信息检索技术的发展和加强努力。用户信息需求的自然语言表达和成功的检索是对的预期目的,有效的沟通非常依赖。自然语言查询包括其用于表示预期目标搜索多语言特征。导致查询的语义模糊和误解以及其他因素如缺乏与搜索环境熟悉的语言特点影响用户的能力,以准确地表示它们的信息的需求,由概念意图间隙创造的。后者直接影响到返回的搜索结果可能不会对用户满意度的相关性,因此是影响信息检索系统的有效性的一个主要问题。中央对我们的讨论是,手动或自动表征的查询意图,并通过添加有意义的术语,短语甚至潜伏表示他们的富集,捕捉他们的本意的显著成分的鉴定。具体而言,我们讨论的技术来实现富集,特别是那些利用文档语料库中或从外部知识源,从长期依赖的统计处理收集到的信息,如本体。我们放下一个通用的基于语言查询扩展框架的解剖结构,并提出了基于模块的分解,覆盖了从查询处理,信息检索,计算语言学和本体工程的热点问题。对于每个模块,我们审查分类以及所使用的技术的光下分析的文献状态的最先进的解决方案。

25. Distilling Knowledge for Fast Retrieval-based Chat-bots [PDF] 返回目录
  Amir Vakili Tahami, Kamyar Ghajar, Azadeh Shakery
Abstract: Response retrieval is a subset of neural ranking in which a model selects a suitable response from a set of candidates given a conversation history. Retrieval-based chat-bots are typically employed in information-seeking conversational systems such as customer support agents. In order to make pairwise comparisons between a conversation history and a candidate response, two approaches are common: cross-encoders, which perform full self-attention over the pair, and bi-encoders, which encode the two parts of the pair separately. The former gives better prediction quality but is too slow for practical use. In this paper, we propose a new cross-encoder architecture and transfer knowledge from this model to a bi-encoder model using distillation. This effectively boosts bi-encoder performance at no cost during inference time. We perform a detailed analysis of this approach on three response retrieval datasets.
摘要:响应检索是神经排名的一个子集,其中一个模型从一组给定的一个对话历史候选的合适的响应。基于内容的检索,聊天机器人都在寻求对话系统,如客户支持代理的信息通常使用。为了使对话历史记录和候选响应之间两两比较,两种方法是常见的:交叉编码器在对执行完全自我的关注和双编码器单独编码的对。前者提供更好的预测质量,但在实际使用速度太慢。在本文中,我们提出从这个模型中一个新的交叉编码器结构和知识转移到双编码器模型中使用蒸馏。在推理时间不花钱这将有效地提升双编码器性能。我们执行三个响应检索的数据集这种方法进行了详细分析。
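
The distillation step can be pictured as training the student (bi-encoder) to reproduce the teacher's (cross-encoder's) score distribution over candidate responses. The sketch below uses a generic temperature-scaled soft-target cross-entropy; the exact loss used in the paper is not given in the abstract, so the loss form, the temperature value, and the example scores are all assumptions.

```python
import math

def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_scores, student_scores, temperature=2.0):
    """Cross-entropy of the student distribution against teacher soft targets."""
    targets = softmax(teacher_scores, temperature)
    predicted = softmax(student_scores, temperature)
    return -sum(t * math.log(p) for t, p in zip(targets, predicted))

# Hypothetical relevance scores for three candidate responses.
teacher = [4.0, 1.0, -2.0]
good_student = [3.5, 1.2, -1.5]   # ranks candidates like the teacher
bad_student = [-2.0, 1.0, 4.0]    # ranks them in reverse

good_loss = distillation_loss(teacher, good_student)
bad_loss = distillation_loss(teacher, bad_student)
```

A student that ranks candidates like the teacher gets the lower loss. The teacher is only needed during training, which is consistent with the abstract's claim that inference cost is unchanged.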

26. Love, Joy, Anger, Sadness, Fear, and Surprise: SE Needs Special Kinds of AI: A Case Study on Text Mining and SE [PDF] 返回目录
  Nicole Novielli, Fabio Calefato, Filippo Lanubile
Abstract: Do you like your code? What kind of code makes developers happiest? What makes them angriest? Is it possible to monitor the mood of a large team of coders to determine when and where a codebase needs additional help?
摘要:你喜欢你的代码?什么样的代码使得开发商最快乐?是什么让他们angriest?是否可以监控大型团队编码器的心情来决定何时何地需要的代码库额外的帮助?

27. TCNN: Triple Convolutional Neural Network Models for Retrieval-based Question Answering System in E-commerce [PDF] 返回目录
  Shuangyong Song, Chao Wang
Abstract: Automatic question-answering (QA) systems have boomed during the last few years, and commonly used techniques can be roughly categorized into Information Retrieval (IR)-based and generation-based. The core of IR-based models is to retrieve the knowledge entries most similar to a given query from a QA knowledge base, and then rerank those knowledge entries with semantic matching models. In this paper, we aim to improve an IR-based e-commerce QA system, AliMe, with the proposed text matching models, including a basic Triple Convolutional Neural Network (TCNN) model and two Attention-based TCNN (ATCNN) models. Experimental results demonstrate their effectiveness.
摘要:自动答疑(QA)系统在过去几年蓬勃发展,以及常用的技术大致可以分为信息检索(IR)为基础,并代为主。到IR基于模型的关键解决方案是从QA的知识库检索与特定查询最相似的知识条目,然后重新排名与语义匹配模型的知识条目。在本文中,我们的目标是改善IR基于电子商务的质量保证体系,具有AliMe提出的文本匹配模型,包括一个基本的三重卷积神经网络(TCNN)模型和两个基于注意TCNN(ATCNN)模型。实验结果表明,它们的作用。
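
The retrieve-then-rerank pipeline described in the abstract can be sketched in two stages. Everything here is a stand-in: the knowledge base entries are invented, stage 1 uses plain word overlap, and `bigram_matcher` is only a placeholder for a learned semantic matching model such as the TCNN variants, whose architecture is not described in the abstract.

```python
def jaccard(a, b):
    """Jaccard overlap between two collections of items."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve(query, knowledge_base, k=2):
    """Stage 1: cheap lexical retrieval of the k most similar stored questions."""
    words = query.lower().split()
    return sorted(
        knowledge_base,
        key=lambda entry: jaccard(words, entry.lower().split()),
        reverse=True,
    )[:k]

def char_bigrams(text):
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)]

def bigram_matcher(query, entry):
    """Placeholder scorer standing in for a learned matching model."""
    return jaccard(char_bigrams(query), char_bigrams(entry))

def rerank(query, candidates, matcher):
    """Stage 2: reorder the retrieved candidates with the matching model."""
    return sorted(candidates, key=lambda entry: matcher(query, entry), reverse=True)

# Hypothetical knowledge base of stored questions.
kb = [
    "how do I return an item",
    "how do I track my order",
    "what payment methods are accepted",
]
query = "how can I track an order"
candidates = retrieve(query, kb, k=2)
ranked = rerank(query, candidates, bigram_matcher)
```

The split keeps the expensive matcher off the full knowledge base: only the `k` retrieved candidates are ever scored by it.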

28. Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription [PDF] 返回目录
  Andrei Andrusenko, Aleksandr Laptev, Ivan Medennikov
Abstract: While end-to-end ASR systems have proven competitive with the conventional hybrid approach, they are prone to accuracy degradation in noisy and low-resource conditions. In this paper, we argue that, even in such difficult cases, some end-to-end approaches show performance close to the hybrid baseline. To demonstrate this, we use the CHiME-6 Challenge data as an example of challenging environments and noisy conditions of everyday speech. We experimentally compare and analyze CTC-Attention versus RNN-Transducer approaches along with RNN versus Transformer architectures. We also provide a comparison of acoustic features and speech enhancements. In addition, we evaluate the effectiveness of neural network language models for hypothesis re-scoring in low-resource conditions. Our best end-to-end model, based on RNN-Transducer together with improved beam search, comes within 3.8% absolute WER of the LF-MMI TDNN-F CHiME-6 Challenge baseline. With Guided Source Separation based speech enhancement, this approach outperforms the hybrid baseline system by 2.7% absolute WER and the best previously known end-to-end system by 25.7% absolute WER.
摘要:尽管终端到终端的ASR系统已被证明的竞争与传统的混合方法,他们很容易精度的下降,当谈到嘈杂和低资源条件。在本文中,我们认为,即使在这样困难的情况下,一些终端到终端的方法显示性能接近混合基线。为了证明这一点,我们使用磬-6挑战数据作为具有挑战性的环境和日常用语的嘈杂条件下的一个例子。我们通过实验比较和与RNN换能器与RNN沿接近与变压器的架构分析CTC-关注。我们还提供的声学特征和语音增强功能的比较。此外,我们评估的神经网络语言模型,可以在低资源条件假设再得分的有效性。我们最好的基于RNN换能器的终端到高端机型,改进的束搜索在一起,达到仅3.8%WER ABS质量。比LF-MMI TDNN-F磬-6的挑战基线更糟。与被引导源分离语音增强,本办法由2.7%WER腹肌优于混合基线系统。和25.7%WER之前最知名的端至端系统腹肌。
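
Since the results above are stated as absolute WER differences, a small sketch of how word error rate is commonly computed may help: the word-level edit distance between reference and hypothesis, divided by the reference length. The sentences are invented, and the function assumes a non-empty reference; official challenge scoring tools apply additional text normalization not shown here.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)  # assumes the reference is not empty

# Hypothetical transcripts from two systems for the same utterance.
reference = "pass me the salt please"
wer_a = word_error_rate(reference, "pass the salt please")
wer_b = word_error_rate(reference, "pass me a salt")
absolute_gap = wer_b - wer_a
```

An "absolute" difference subtracts the two rates directly, as in `absolute_gap`, rather than expressing one as a percentage of the other.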

29. Visual Commonsense Graphs: Reasoning about the Dynamic Context of a Still Image [PDF] 返回目录
  Jae Sung Park, Chandra Bhagavatula, Roozbeh Mottaghi, Ali Farhadi, Yejin Choi
Abstract: Even from a single frame of a still image, people can reason about the dynamic story of the image before, after, and beyond the frame. For example, given an image of a man struggling to stay afloat in water, we can reason that the man fell into the water sometime in the past, that his intent at the moment is to stay alive, and that he will need help in the near future or else he will get washed away. We propose VisualComet, a novel framework of visual commonsense reasoning tasks to predict events that might have happened before, events that might happen next, and the intents of the people at present. To support research toward visual commonsense reasoning, we introduce the first large-scale repository of Visual Commonsense Graphs, which consists of over 1.4 million textual descriptions of visual commonsense inferences carefully annotated over a diverse set of 60,000 images, each paired with short video summaries of before and after. In addition, we provide person-grounding (i.e., co-reference links) between people appearing in the image and people mentioned in the textual commonsense descriptions, allowing for tighter integration between images and text. We establish strong baseline performances on this task and demonstrate that integration between visual and textual commonsense reasoning is key and outperforms non-integrative alternatives.
摘要:尽管从静止图像的单个帧,人们可以推理的图像前,后,以及超越帧的动态故事。例如,假设一个人挣扎着生存下去的水的影像,我们可以推论,这名男子又陷入了过去的水有时,意图那个男人此刻的是为了生存,他需要的帮助不久的将来,否则他将得到冲走。我们建议VisualComet,视觉常识推理任务的新框架来预测可能发生之前,可能未来发生的事件,活动和人的目前的意图。为了向视觉常识推理支持研究,我们介绍Visual常识图形是由精心注释了一组不同的60000个图像的视觉常识推论超过140万的文字描述的首个大型仓库,每前的短片摘要配对之后。此外,我们还提供图像中出现,人民和文字常识的描述中提到的人之间的人接地(即,共同引用链接),允许图像和文本之间更紧密的集成。我们对这项工作建立强有力基线表演和展示的视觉和文本常识推理之间的整合是在非整合方案的关键和胜利。

Note: the Chinese text is machine-translated!