
[arXiv Papers] Computation and Language 2020-01-17

Contents

1. Lexical Sememe Prediction using Dictionary Definitions by Capturing Local Semantic Correspondence [PDF] Abstract
2. Speech Emotion Recognition Based on Multi-feature and Multi-lingual Fusion [PDF] Abstract
3. Comparing Rule-based, Feature-based and Deep Neural Methods for De-identification of Dutch Medical Records [PDF] Abstract
4. A Pilot Study on Multiple Choice Machine Reading Comprehension for Vietnamese Texts [PDF] Abstract
5. AandP: Utilizing Prolog for converting between active sentence and passive sentence with three-steps conversion [PDF] Abstract
6. Schema2QA: Answering Complex Queries on the Structured Web with a Neural Model [PDF] Abstract
7. Stereotypical Bias Removal for Hate Speech Detection Task using Knowledge-based Generalizations [PDF] Abstract
8. AggressionNet: Generalised Multi-Modal Deep Temporal and Sequential Learning for Aggression Identification [PDF] Abstract
9. #MeToo on Campus: Studying College Sexual Assault at Scale Using Data Reported on Social Media [PDF] Abstract
10. Show, Recall, and Tell: Image Captioning with Recall Mechanism [PDF] Abstract
11. "Why is 'Chicago' deceptive?" Towards Building Model-Driven Tutorials for Humans [PDF] Abstract
12. Ensemble based discriminative models for Visual Dialog Challenge 2018 [PDF] Abstract
13. Discoverability in Satellite Imagery: A Good Sentence is Worth a Thousand Pictures [PDF] Abstract
14. Document Network Projection in Pretrained Word Embedding Space [PDF] Abstract
15. Delving Deeper into the Decoder for Video Captioning [PDF] Abstract
16. Insertion-Deletion Transformer [PDF] Abstract

Abstracts

1. Lexical Sememe Prediction using Dictionary Definitions by Capturing Local Semantic Correspondence [PDF] Back to Contents
  Jiaju Du, Fanchao Qi, Maosong Sun, Zhiyuan Liu
Abstract: Sememes, defined in linguistics as the minimum semantic units of human languages, have proven useful in many NLP tasks. Since manual construction and updating of sememe knowledge bases (KBs) is costly, the task of automatic sememe prediction has been proposed to assist sememe annotation. In this paper, we explore the approach of applying dictionary definitions to predicting sememes for unannotated words. We find that the sememes of a word are usually semantically matched to different words in its dictionary definition, and we name this matching relationship local semantic correspondence. Accordingly, we propose a Sememe Correspondence Pooling (SCorP) model, which is able to capture this kind of matching to predict sememes. We evaluate our model and baseline methods on HowNet, a well-known sememe KB, and find that our model achieves state-of-the-art performance. Moreover, further quantitative analysis shows that our model properly learns the local semantic correspondence between sememes and words in dictionary definitions, which explains its effectiveness. The source code of this paper can be obtained from this https URL.
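
A minimal numpy sketch of the core idea as we read it from the abstract: score every candidate sememe against every word of the dictionary definition, then pool over the definition words so each sememe is judged by its best local match. The names, shapes, and the dot-product scorer are our assumptions, not the paper's exact SCorP architecture.

```python
import numpy as np

def scorp_scores(definition_vecs, sememe_vecs):
    """Hypothetical sketch of Sememe Correspondence Pooling.

    definition_vecs: (n_def_words, d) vectors of the definition's words.
    sememe_vecs: (n_sememes, d) embeddings of all candidate sememes.
    Returns one relevance score per candidate sememe.
    """
    # Local semantic correspondence: how well each sememe matches
    # each individual word of the definition.
    match = sememe_vecs @ definition_vecs.T   # (n_sememes, n_def_words)
    # Pooling: each sememe is scored by its best-matching definition word.
    return match.max(axis=1)

# Toy usage: a 3-word definition and 4 candidate sememes in 5 dimensions.
rng = np.random.default_rng(0)
scores = scorp_scores(rng.normal(size=(3, 5)), rng.normal(size=(4, 5)))
top2 = np.argsort(-scores)[:2]  # predict the two highest-scoring sememes
```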

2. Speech Emotion Recognition Based on Multi-feature and Multi-lingual Fusion [PDF] Back to Contents
  Chunyi Wang
Abstract: A speech emotion recognition algorithm based on multi-feature and multi-lingual fusion is proposed to address the low recognition accuracy caused by the lack of large speech datasets and the low robustness of acoustic features in speech emotion recognition. First, handcrafted and deep automatic features are extracted from existing Chinese and English speech emotion data. Then, the two kinds of features are fused within each language. Finally, the fused features of the different languages are fused again and used to train a classification model. Comparing the fused features with the unfused ones, the results show that the fused features significantly improve the accuracy of the speech emotion recognition algorithm. The proposed solution is evaluated on two Chinese corpora and two English corpora, and is shown to provide more accurate predictions than the original solution. As a result of this study, the multi-feature and multi-lingual fusion algorithm can significantly improve speech emotion recognition accuracy when the dataset is small.
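
To make the two-stage fusion concrete, here is a toy sketch under our own assumptions: the abstract does not specify the fusion operator, so we use concatenation within a language and dataset pooling across languages, with random placeholder features and labels.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder per-utterance features; in practice the handcrafted part might
# be acoustic statistics and the deep part network activations (both assumed).
zh_hand, zh_deep = rng.normal(size=(100, 32)), rng.normal(size=(100, 128))
en_hand, en_deep = rng.normal(size=(80, 32)), rng.normal(size=(80, 128))

# Stage 1: fuse the feature types within each language (concatenation assumed).
zh = np.concatenate([zh_hand, zh_deep], axis=1)
en = np.concatenate([en_hand, en_deep], axis=1)

# Stage 2: fuse across languages by pooling both training sets,
# then train a single emotion classifier on the combined data.
X = np.vstack([zh, en])
y = rng.integers(0, 4, size=len(X))  # placeholder emotion labels
```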

3. Comparing Rule-based, Feature-based and Deep Neural Methods for De-identification of Dutch Medical Records [PDF] Back to Contents
  Jan Trienes, Dolf Trieschnigg, Christin Seifert, Djoerd Hiemstra
Abstract: Unstructured information in electronic health records provides an invaluable resource for medical research. To protect the confidentiality of patients and to conform to privacy regulations, de-identification methods automatically remove personally identifying information from these medical records. However, due to the unavailability of labeled data, most existing research is constrained to English medical text, and little is known about the generalizability of de-identification methods across languages and domains. In this study, we construct a varied dataset consisting of the medical records of 1260 patients by sampling data from nine institutes and three domains of Dutch healthcare. We test the generalizability of three de-identification methods across languages and domains. Our experiments show that an existing rule-based method specifically developed for the Dutch language fails to generalize to this new data. Furthermore, a state-of-the-art neural architecture performs strongly across languages and domains, even with limited training data. Compared to feature-based and rule-based methods, the neural method requires significantly less configuration effort and domain knowledge. We make all code and pre-trained de-identification models available to the research community, allowing practitioners to apply them to their datasets and to enable future benchmarks.

4. A Pilot Study on Multiple Choice Machine Reading Comprehension for Vietnamese Texts [PDF] Back to Contents
  Kiet Van Nguyen, Khiem Vinh Tran, Son T. Luu, Anh Gia-Tuan Nguyen, Ngan Luu-Thuy Nguyen
Abstract: Machine Reading Comprehension (MRC) is a natural language processing task that studies the ability to read and understand unstructured texts and then find the correct answers to questions. Until now, there has been no MRC dataset for a low-resource language such as Vietnamese. In this paper, we introduce ViMMRC, a challenging machine comprehension corpus with multiple-choice questions, intended for research on the machine comprehension of Vietnamese text. This corpus includes 2,783 multiple-choice questions and answers based on a set of 417 Vietnamese texts used for teaching reading comprehension to 1st to 5th graders. Answers may be extracted from the contents of a single sentence or of multiple sentences in the corresponding reading text. A thorough analysis of the corpus and the experimental results in this paper illustrate that our corpus ViMMRC demands reasoning abilities beyond simple word matching. We propose a Boosted Sliding Window (BSW) method that improves accuracy by 5.51% over the best baseline method. We also measured human performance on the corpus and compared it to our MRC models. The performance gap between humans and our best experimental model indicates that significant progress can be made on Vietnamese machine reading comprehension in further research. The corpus is freely available at our website for research purposes.
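
The classic sliding-window baseline that BSW builds on scores each answer option by the best lexical overlap between a passage window and the question-plus-option words. The sketch below shows only that unboosted baseline; the paper's boosting step and tokenization details are not reproduced, and all names are ours.

```python
def sliding_window_score(passage_tokens, question_tokens, option_tokens):
    """Unboosted sliding-window baseline: score an option by the best
    lexical overlap between any passage window and the question + option."""
    target = set(question_tokens) | set(option_tokens)
    w = max(1, len(target))
    best = 0.0
    for i in range(max(1, len(passage_tokens) - w + 1)):
        window = passage_tokens[i:i + w]
        best = max(best, sum(1.0 for tok in window if tok in target))
    return best

def answer(passage_tokens, question_tokens, options):
    """Pick the index of the highest-scoring multiple-choice option."""
    return max(range(len(options)),
               key=lambda k: sliding_window_score(passage_tokens,
                                                  question_tokens,
                                                  options[k]))

passage = "the cat sat on the mat and the dog slept".split()
print(answer(passage, "where did the cat sit".split(),
             ["on the mat".split(), "in the house".split()]))  # -> 0
```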

5. AandP: Utilizing Prolog for converting between active sentence and passive sentence with three-steps conversion [PDF] Back to Contents
  Trung Q. Tran
Abstract: I introduce a simple but efficient method to handle one of the critical aspects of English grammar: the relationship between an active sentence and a passive sentence. In fact, an active sentence and its corresponding passive sentence express the same meaning, but their structures are different. I utilized Prolog [4] along with Definite Clause Grammars (DCG) [5] to perform the conversion between active and passive sentences. Some advanced techniques were also used, such as Extra Arguments, Extra Goals, and Lexicon. I tried to handle a variety of cases of active and passive sentences, such as the 12 English tenses, modal verbs, and the negative form. More details and my contributions are presented in the following sections. The source code is available at this https URL.
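
The paper performs the conversion in Prolog with DCGs; as a rough illustration of the three-step idea (analyze the active sentence into roles, transform the roles, reassemble the passive sentence), here is a toy Python analogue for simple subject-verb-object sentences in the simple past. The tiny participle table and all of the handling are hypothetical simplifications, not the paper's grammar.

```python
# Toy three-step active-to-passive conversion for simple S-V-O sentences.
PAST_PARTICIPLE = {"wrote": "written", "ate": "eaten", "saw": "seen"}

def active_to_passive(sentence):
    # Step 1: analyze the active sentence into (subject, verb, object).
    subj, verb, obj = sentence.rstrip(".").split(" ", 2)
    # Step 2: transform the roles: object becomes the new subject, the verb
    # becomes "was" + past participle, the old subject moves to a "by" phrase.
    # Step 3: reassemble the passive sentence.
    return f"{obj.capitalize()} was {PAST_PARTICIPLE[verb]} by {subj}."

assert active_to_passive("Mary wrote the letter.") == \
    "The letter was written by Mary."
```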

6. Schema2QA: Answering Complex Queries on the Structured Web with a Neural Model [PDF] Back to Contents
  Silei Xu, Giovanni Campagna, Jian Li, Monica S. Lam
Abstract: Virtual assistants today require every website to submit skills individually into their proprietary repositories. The skill consists of a fixed set of supported commands and the formal representation of each command. The assistants use the contributed data to create a proprietary linguistic interface, typically using an intent classifier. This paper proposes an open-source toolkit, called Schema2QA, that leverages the this http URL markup found in many websites to automatically build skills. Schema2QA has several advantages: (1) Schema2QA handles compositional queries involving multiple fields automatically, such as "find the Italian restaurant around here with the most reviews", or "what W3C employees on LinkedIn went to Oxford"; (2) Schema2QA translates natural language into executable queries on the up-to-date data from the website; (3) natural language training can be applied to one domain at a time to handle multiple websites using the same this http URL representations. We apply Schema2QA to two different domains, showing that the skills we built can answer useful queries with little manual effort. Our skills achieve an overall accuracy between 74% and 78%, and can answer questions that span three or more properties with 65% accuracy. We also show that a new domain can be supported by transferring knowledge. The open-source Schema2QA lets each website create and own its linguistic interface.

7. Stereotypical Bias Removal for Hate Speech Detection Task using Knowledge-based Generalizations [PDF] Back to Contents
  Pinkesh Badjatiya, Manish Gupta, Vasudeva Varma
Abstract: With the ever-increasing cases of hate spread on social media platforms, it is critical to design abuse detection mechanisms to proactively avoid and control such incidents. While methods for hate speech detection exist, they stereotype words and hence suffer from inherently biased training. Bias removal has traditionally been studied for structured datasets, but we aim at bias mitigation from unstructured text data. In this paper, we make two important contributions. First, we systematically design methods to quantify the bias for any model and propose algorithms for identifying the set of words which the model stereotypes. Second, we propose novel methods leveraging knowledge-based generalizations for bias-free learning. Knowledge-based generalization provides an effective way to encode knowledge because the abstractions it provides not only generalize content but also facilitate retraction of information from the hate speech detection classifier, thereby reducing the imbalance. We experiment with multiple knowledge generalization policies and analyze their effect on general performance and in mitigating bias. Our experiments with two real-world datasets, a Wikipedia Talk Pages dataset (WikiDetox) of size ~96k and a Twitter dataset of size ~24k, show that the use of knowledge-based generalizations results in better performance by forcing the classifier to learn from generalized content. Our methods utilize existing knowledge bases and can easily be extended to other tasks.
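
One plausible instance of a knowledge-based generalization policy is replacing words the bias analysis flags as stereotyped with a hypernym from an external knowledge base before training. The sketch below uses WordNet for this; the paper's actual knowledge bases and policies may differ, and the flagged word set passed in is hypothetical.

```python
# requires: pip install nltk, then nltk.download("wordnet") once
from nltk.corpus import wordnet as wn

def generalize(token):
    """Replace a token with the first WordNet hypernym of its first
    sense, if one exists (a deliberately crude generalization policy)."""
    synsets = wn.synsets(token)
    if not synsets:
        return token
    hypernyms = synsets[0].hypernyms()
    return hypernyms[0].lemma_names()[0] if hypernyms else token

def generalize_text(tokens, stereotyped_words):
    """Generalize only the words flagged as stereotyped by the model."""
    return [generalize(t) if t in stereotyped_words else t for t in tokens]

# 'dog' is replaced by a hypernym (e.g. 'canine'); other tokens are kept.
print(generalize_text("the dog barked loudly".split(), {"dog"}))
```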

8. AggressionNet: Generalised Multi-Modal Deep Temporal and Sequential Learning for Aggression Identification [PDF] Back to Contents
  Anant Khandelwal, Niraj Kumar
Abstract: Wide usage of social media platforms has increased the risk of aggression, which results in mental stress and negatively affects people's lives through psychological agony, fighting behavior, and disrespect to others. The majority of such conversations contain code-mixed languages [28]. Additionally, the way used to express thought or communication style also changes from one social media platform to another (e.g., communication styles differ between Twitter and Facebook). All of these have increased the complexity of the problem. To solve these problems, we have introduced a unified and robust multi-modal deep learning architecture which works for both an English code-mixed dataset and a uni-lingual English dataset. The devised system uses psycho-linguistic features and very basic linguistic features. Our multi-modal deep learning architecture contains a Deep Pyramid CNN, a Pooled BiLSTM, and a Disconnected RNN (with both GloVe and FastText embeddings). Finally, the system makes its decision based on model averaging. We evaluated our system on the English code-mixed TRAC 2018 dataset and a uni-lingual English dataset obtained from Kaggle. Experimental results show that our proposed system outperforms all the previous approaches on both the English code-mixed dataset and the uni-lingual English dataset.
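
The final decision step, model averaging, can be sketched as follows: average the class-probability outputs of the individual models and take the argmax. The three probability tables below are made-up placeholders.

```python
import numpy as np

def model_average(prob_lists):
    """Average the class-probability outputs of several models and pick
    the argmax class; a minimal sketch of the final decision step."""
    return np.mean(np.stack(prob_lists), axis=0).argmax(axis=-1)

# Three hypothetical models' softmax outputs for 2 samples, 3 classes.
p1 = np.array([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]])
p2 = np.array([[0.5, 0.3, 0.2], [0.2, 0.5, 0.3]])
p3 = np.array([[0.6, 0.3, 0.1], [0.3, 0.4, 0.3]])
labels = model_average([p1, p2, p3])  # -> array([0, 1])
```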

9. #MeToo on Campus: Studying College Sexual Assault at Scale Using Data Reported on Social Media [PDF] Back to Contents
  Viet Duong, Phu Pham, Ritwik Bose, Jiebo Luo
Abstract: Recently, the emergence of the #MeToo trend on social media has empowered thousands of people to share their own sexual harassment experiences. This viral trend, in conjunction with the massive personal information and content available on Twitter, presents a promising opportunity to extract data-driven insights to complement the ongoing survey-based studies about sexual harassment in college. In this paper, we analyze the influence of the #MeToo trend on a pool of college followers. The results show that the majority of topics embedded in those #MeToo tweets detail sexual harassment stories, and there exists a significant correlation between the prevalence of this trend and official reports in several major geographical regions. Furthermore, we discover the outstanding sentiments of the #MeToo tweets using deep semantic meaning representations and their implications for the affected users experiencing different types of sexual harassment. We hope this study can raise further awareness regarding sexual misconduct in academia.

10. Show, Recall, and Tell: Image Captioning with Recall Mechanism [PDF] Back to Contents
  Li Wang, Zechen Bai, Yonghua Zhang, Hongtao Lu
Abstract: Generating natural and accurate descriptions in image captioning has always been a challenge. In this paper, we propose a novel recall mechanism to imitate the way humans conduct captioning. There are three parts in our recall mechanism: a recall unit, a semantic guide (SG) and a recalled-word slot (RWS). The recall unit is a text-retrieval module designed to retrieve recalled words for images. SG and RWS are designed for the best use of recalled words. The SG branch can generate a recalled context, which can guide the process of generating the caption. The RWS branch is responsible for copying recalled words to the caption. Inspired by the pointing mechanism in text summarization, we adopt a soft switch to balance the generated-word probabilities between SG and RWS. In the CIDEr optimization step, we also introduce an individual recalled-word reward (WR) to boost training. Our proposed methods (SG+RWS+WR) achieve BLEU-4 / CIDEr / SPICE scores of 36.6 / 116.9 / 21.3 with cross-entropy loss and 38.7 / 129.1 / 22.4 with CIDEr optimization on the MSCOCO Karpathy test split, which surpass the results of other state-of-the-art methods.
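
The soft switch is a pointer-style mixture: a gate g in [0, 1], predicted from the decoder state, interpolates between the SG branch's vocabulary distribution and the RWS branch's copy distribution over recalled words. A minimal sketch, with all numbers invented and both distributions assumed to share the same (extended) vocabulary:

```python
import numpy as np

def mix_distributions(gate, p_generate, p_copy):
    """Soft switch between generating from the vocabulary (SG branch)
    and copying a recalled word (RWS branch). How the gate is computed
    from the decoder state is elided here."""
    return gate * p_generate + (1.0 - gate) * p_copy

p_gen = np.array([0.1, 0.7, 0.2, 0.0])   # over the vocabulary
p_copy = np.array([0.0, 0.0, 0.0, 1.0])  # mass only on recalled words
p_final = mix_distributions(0.8, p_gen, p_copy)
assert abs(p_final.sum() - 1.0) < 1e-9   # still a valid distribution
```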

11. "Why is 'Chicago' deceptive?" Towards Building Model-Driven Tutorials for Humans [PDF] 返回目录
  Vivian Lai, Han Liu, Chenhao Tan
Abstract: To support human decision making with machine learning models, we often need to elucidate patterns embedded in the models that are unsalient, unknown, or counterintuitive to humans. While existing approaches focus on explaining machine predictions with real-time assistance, we explore model-driven tutorials to help humans understand these patterns in a training phase. We consider both tutorials with guidelines from scientific papers, analogous to current practices of science communication, and automatically selected examples from training data with explanations. We use deceptive review detection as a testbed and conduct large-scale, randomized human-subject experiments to examine the effectiveness of such tutorials. We find that tutorials indeed improve human performance, with and without real-time assistance. In particular, although deep learning provides superior predictive performance than simple models, tutorials and explanations from simple models are more useful to humans. Our work suggests future directions for human-centered tutorials and explanations towards a synergy between humans and AI.

12. Ensemble based discriminative models for Visual Dialog Challenge 2018 [PDF] Back to Contents
  Shubham Agarwal, Raghav Goyal
Abstract: This manuscript describes our approach for the Visual Dialog Challenge 2018. We use an ensemble of three discriminative models with different encoders and decoders for our final submission. Our best-performing model on the 'test-std' split achieves an NDCG score of 55.46 and an MRR of 63.77, securing third position in the challenge.

13. Discoverability in Satellite Imagery: A Good Sentence is Worth a Thousand Pictures [PDF] Back to Contents
  David Noever, Wes Regian, Matt Ciolino, Josh Kalin, Dom Hambrick, Kaye Blankenship
Abstract: Small satellite constellations provide daily global coverage of the earth's landmass, but image enrichment relies on automating key tasks like change detection or feature searches. For example, extracting text annotations from raw pixels requires two dependent machine learning models, one to analyze the overhead image and the other to generate a descriptive caption. We evaluate seven models on the previously largest benchmark for satellite image captions. We extend the labeled image samples five-fold, then augment, correct and prune the vocabulary to approach a rough min-max (minimum word, maximum description). This outcome compares favorably to previous work with large pre-trained image models but offers a hundred-fold reduction in model size without sacrificing overall accuracy (when measured with log entropy loss). These smaller models provide new deployment opportunities, particularly when pushed to edge processors, on-board satellites, or distributed ground stations. To quantify a caption's descriptiveness, we introduce a novel multi-class confusion or error matrix to score both human-labeled test data and never-labeled images that include bounding box detection but lack full sentence captions. This work suggests future captioning strategies, particularly ones that can enrich the class coverage beyond land use applications and that lessen color-centered and adjacency adjectives ("green", "near", "between", etc.). Many modern language transformers present novel and exploitable models with world knowledge gleaned from training on their vast online corpora. One interesting but simple example might learn the word association between wind and waves, thus enriching a beach scene with more than just the color descriptions that might otherwise be accessed from raw pixels without text annotation.

14. Document Network Projection in Pretrained Word Embedding Space [PDF] Back to Contents
  Antoine Gourru, Adrien Guille, Julien Velcin, Julien Jacques
Abstract: We present Regularized Linear Embedding (RLE), a novel method that projects a collection of linked documents (e.g. citation network) into a pretrained word embedding space. In addition to the textual content, we leverage a matrix of pairwise similarities providing complementary information (e.g., the network proximity of two documents in a citation graph). We first build a simple word vector average for each document, and we use the similarities to alter this average representation. The document representations can help to solve many information retrieval tasks, such as recommendation, classification and clustering. We demonstrate that our approach outperforms or matches existing document network embedding methods on node classification and link prediction tasks. Furthermore, we show that it helps identifying relevant keywords to describe document classes.
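
A rough sketch of the idea as described: embed each document as the average of its pretrained word vectors, then use the pairwise-similarity matrix to pull each representation toward its linked documents. The abstract does not give the exact RLE objective, so the blending formula and the lam weight below are our assumptions.

```python
import numpy as np

def rle_embed(doc_word_vecs, S, lam=0.5):
    """Sketch of Regularized Linear Embedding: start from each document's
    average word vector, then blend it with the similarity-weighted
    average of its neighbors. S holds row-normalized pairwise
    similarities (e.g. citation-network proximity); lam is assumed."""
    base = np.stack([vecs.mean(axis=0) for vecs in doc_word_vecs])
    return (1.0 - lam) * base + lam * (S @ base)

# Three toy documents in a 4-dim pretrained word embedding space.
rng = np.random.default_rng(1)
docs = [rng.normal(size=(n, 4)) for n in (5, 3, 6)]
S = np.array([[0.0, 0.8, 0.2],
              [0.8, 0.0, 0.2],
              [0.3, 0.7, 0.0]])
Z = rle_embed(docs, S)  # one vector per document, usable for downstream tasks
```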

15. Delving Deeper into the Decoder for Video Captioning [PDF] Back to Contents
  Haoran Chen, Jianmin Li, Xiaolin Hu
Abstract: Video captioning is an advanced multi-modal task which aims to describe a video clip using a natural language sentence. The encoder-decoder framework has been the most popular paradigm for this task in recent years. However, there still exist some non-negligible problems in the decoder of a video captioning model. We make a thorough investigation of the decoder and adopt three techniques to improve the performance of the model. First of all, a combination of variational dropout and layer normalization is embedded into a recurrent unit to alleviate the problem of overfitting. Secondly, a new method is proposed to evaluate the performance of a model on a validation set so as to select the best checkpoint for testing. Finally, a new training strategy called "professional learning" is proposed, which develops the strong points of a captioning model and bypasses its weaknesses. Experiments on the Microsoft Research Video Description Corpus (MSVD) and MSR-Video to Text (MSR-VTT) datasets demonstrate that our model achieves the best results as evaluated by the BLEU, CIDEr, METEOR and ROUGE-L metrics, with significant gains of up to 11.7% on MSVD and 5% on MSR-VTT compared with the previous state-of-the-art models.
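
For the first technique, variational dropout differs from standard dropout in that one mask is sampled per sequence and then reused at every timestep of the recurrent unit, instead of being resampled each step. A minimal numpy sketch; the recurrent update itself is elided.

```python
import numpy as np

def variational_dropout_mask(batch, hidden, p, rng):
    """One inverted-dropout mask per sequence, reused across timesteps
    (variational dropout), rather than resampled at every step."""
    keep = 1.0 - p
    return (rng.random((batch, hidden)) < keep) / keep

rng = np.random.default_rng(0)
mask = variational_dropout_mask(batch=2, hidden=8, p=0.3, rng=rng)
h = np.zeros((2, 8))
for t in range(10):          # the SAME mask is applied at every timestep
    h = np.tanh(h) * mask    # placeholder for the real recurrent update
```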

16. Insertion-Deletion Transformer [PDF] Back to Contents
  Laura Ruis, Mitchell Stern, Julia Proskurnia, William Chan
Abstract: We propose the Insertion-Deletion Transformer, a novel transformer-based neural architecture and training method for sequence generation. The model consists of two phases that are executed iteratively, 1) an insertion phase and 2) a deletion phase. The insertion phase parameterizes a distribution of insertions on the current output hypothesis, while the deletion phase parameterizes a distribution of deletions over the current output hypothesis. The training method is a principled and simple algorithm, where the deletion model obtains its signal directly on-policy from the insertion model output. We demonstrate the effectiveness of our Insertion-Deletion Transformer on synthetic translation tasks, obtaining significant BLEU score improvement over an insertion-only model.
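
The two-phase decoding can be sketched as a simple loop that alternates an insertion step and a deletion step until the hypothesis stabilizes. Both step functions below are stand-in toys; in the paper they are learned transformer models.

```python
def insertion_deletion_decode(insert_step, delete_step, max_rounds=10):
    """Alternate an insertion phase and a deletion phase until the output
    hypothesis stops changing (decoding loop only; models are assumed)."""
    hyp = []
    for _ in range(max_rounds):
        new_hyp = delete_step(insert_step(hyp))
        if new_hyp == hyp:   # converged: neither phase changed the output
            break
        hyp = new_hyp
    return hyp

# Toy phases: grow the sentence one token at a time, drop marked tokens.
insert = lambda h: (["hello"] if not h
                    else h + ["world"] if h == ["hello"] else h)
delete = lambda h: [t for t in h if t != "<bad>"]
print(insertion_deletion_decode(insert, delete))  # -> ['hello', 'world']
```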
