[arXiv Papers] Computation and Language 2020-07-09

Contents

1. Unsupervised Online Grounding of Natural Language during Human-Robot Interactions [PDF] Abstract
2. Tweets Sentiment Analysis via Word Embeddings and Machine Learning Techniques [PDF] Abstract
3. A Novel BGCapsule Network for Text Classification [PDF] Abstract
4. Segmentation Approach for Coreference Resolution Task [PDF] Abstract
5. Normalizador Neural de Datas e Endereços [PDF] Abstract
6. Interpreting Hierarchical Linguistic Interactions in DNNs [PDF] Abstract
7. Open Domain Suggestion Mining Leveraging Fine-Grained Analysis [PDF] Abstract
8. Cooking Is All About People: Comment Classification On Cookery Channels Using BERT and Classification Models (Malayalam-English Mix-Code) [PDF] Abstract
9. Chatbot: A Conversational Agent employed with Named Entity Recognition Model using Artificial Neural Network [PDF] Abstract
10. Neural relation extraction: a survey [PDF] Abstract
11. Understanding Object Affordances Through Verb Usage Patterns [PDF] Abstract
12. A Survey on Transfer Learning in Natural Language Processing [PDF] Abstract
13. Analysis of Predictive Coding Models for Phonemic Representation Learning in Small Datasets [PDF] Abstract
14. Automatic Detection of Sexist Statements Commonly Used at the Workplace [PDF] Abstract
15. Learning Neural Textual Representations for Citation Recommendation [PDF] Abstract
16. Improving Conversational Recommender Systems via Knowledge Graph based Semantic Fusion [PDF] Abstract
17. Generalizing Tensor Decomposition for N-ary Relational Knowledge Bases [PDF] Abstract
18. Best-First Beam Search [PDF] Abstract
19. Audio-Visual Understanding of Passenger Intents for In-Cabin Conversational Agents [PDF] Abstract
20. KQA Pro: A Large Diagnostic Dataset for Complex Question Answering over Knowledge Base [PDF] Abstract
21. Research on multi-dimensional end-to-end phrase recognition algorithm based on background knowledge [PDF] Abstract
22. Language Modeling with Reduced Densities [PDF] Abstract
23. ISA: An Intelligent Shopping Assistant [PDF] Abstract
24. The curious case of developmental BERTology: On sparsity, transfer learning, generalization and the brain [PDF] Abstract
25. Cross-lingual Inductive Transfer to Detect Offensive Language [PDF] Abstract
26. Evaluating German Transformer Language Models with Syntactic Agreement Tests [PDF] Abstract
27. Learning Speech Representations from Raw Audio by Joint Audiovisual Self-Supervision [PDF] Abstract
28. Streaming End-to-End Bilingual ASR Systems with Joint Language Identification [PDF] Abstract
29. Spatio-Temporal Scene Graphs for Video Dialog [PDF] Abstract
30. Expressive Interviewing: A Conversational System for Coping with COVID-19 [PDF] Abstract
31. Placepedia: Comprehensive Place Understanding with Multi-Faceted Annotations [PDF] Abstract

Abstracts

1. Unsupervised Online Grounding of Natural Language during Human-Robot Interactions [PDF] Back to contents
  Oliver Roesler
Abstract: Allowing humans to communicate with robots through natural language requires connections between words and percepts. The process of creating these connections is called symbol grounding and has been studied for nearly three decades. Although many studies have been conducted, few have considered the grounding of synonyms, and the employed algorithms work either only offline or in a supervised manner. In this paper, a cross-situational learning based grounding framework is proposed that grounds words and phrases through corresponding percepts online and without human supervision, i.e. it does not require any explicit training phase, but instead updates the obtained mappings for every newly encountered situation. The proposed framework is evaluated through an interaction experiment between a human tutor and a robot, and compared to an existing unsupervised grounding framework. The results show that the proposed framework is able to ground words through their corresponding percepts online and in an unsupervised manner, while outperforming the baseline framework.
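The online, training-free update described in this abstract can be illustrated with a toy co-occurrence counter. This is only a minimal sketch of the general cross-situational idea, not the framework from the paper; the class name, the percept labels such as "color:red", and the argmax grounding rule are all assumptions made for illustration.

```python
from collections import defaultdict


class CrossSituationalLearner:
    """Toy online grounder (hypothetical): counts word/percept co-occurrences."""

    def __init__(self):
        # counts[word][percept] = number of situations in which both appeared
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, words, percepts):
        # Update the mappings after every situation; there is no training phase.
        for word in words:
            for percept in percepts:
                self.counts[word][percept] += 1

    def ground(self, word):
        # Return the percept seen most often with the word, or None if unseen.
        seen = self.counts.get(word)
        if not seen:
            return None
        return max(seen, key=seen.get)


learner = CrossSituationalLearner()
learner.observe(["red", "cup"], ["color:red", "shape:cup"])
learner.observe(["red", "ball"], ["color:red", "shape:ball"])
learner.observe(["blue", "cup"], ["color:blue", "shape:cup"])

print(learner.ground("red"))  # color:red
print(learner.ground("cup"))  # shape:cup
```

A word seen in only one situation stays ambiguous under this rule, which is why cross-situational approaches rely on repeated exposure across varying situations.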

2. Tweets Sentiment Analysis via Word Embeddings and Machine Learning Techniques [PDF] Back to contents
  Aditya Sharma, Alex Daniels
Abstract: Sentiment analysis of social media data deals with attitudes, assessments, and emotions, which can be considered a reflection of the way humans think. Understanding a large collection of documents and classifying them as positive or negative is a very difficult task. Social networks such as Twitter, Facebook, and Instagram provide a platform for gathering information about people's sentiments and opinions. The fact that people spend hours daily on social media and share their opinions on many different topics helps us analyze sentiments better. More and more companies are using social media tools to provide various services and interact with customers. Sentiment Analysis (SA) classifies the polarity of given tweets as positive or negative in order to understand the sentiments of the public. This paper aims to perform sentiment analysis of real-time 2019 election Twitter data using the feature selection model word2vec and the machine learning algorithm random forest for sentiment classification. Word2vec with Random Forest improves the accuracy of sentiment analysis significantly compared to traditional methods such as BOW and TF-IDF. Word2vec improves the quality of features by considering the contextual semantics of words in a text, hence improving the accuracy of machine learning and sentiment analysis.

3. A Novel BGCapsule Network for Text Classification [PDF] Back to contents
  Akhilesh Kumar Gangwar, Vadlamani Ravi
Abstract: Several text classification tasks such as sentiment analysis, news categorization, multi-label classification and opinion classification are challenging problems even for modern deep learning networks. Capsule Networks (CapsNets) were recently proposed for image classification. It has been shown that CapsNets have several advantages over Convolutional Neural Networks (CNNs), while their validity in the domain of text has been less explored. In this paper, we propose a novel hybrid architecture, BGCapsule, which is a Capsule model preceded by an ensemble of Bidirectional Gated Recurrent Units (BiGRU), for several text classification tasks. We employed an ensemble of Bidirectional GRUs as the feature extraction layer preceding the primary capsule layer. The hybrid architecture, after performing basic pre-processing steps, consists of five layers: an embedding layer based on GloVe, a BiGRU based ensemble layer, a primary capsule layer, a flatten layer and a fully connected ReLU layer, followed by a fully connected softmax layer. In order to evaluate the effectiveness of BGCapsule, we conducted extensive experiments on five benchmark datasets (ranging from 10,000 records to 700,000 records) including Movie Review (MR Imdb 2005), the AG News dataset, the Dbpedia ontology dataset, the Yelp Review Full dataset and the Yelp review polarity dataset. These benchmarks cover several text classification tasks such as news categorization, sentiment analysis, multiclass classification, multi-label classification and opinion classification. We found that our proposed architecture (BGCapsule) achieves better accuracy than the existing methods without the help of any external linguistic knowledge such as positive sentiment keywords and negative sentiment keywords. Further, BGCapsule converged faster than other extant techniques.

4. Segmentation Approach for Coreference Resolution Task [PDF] Back to contents
  Aref Jafari, Ali Ghodsi
Abstract: In coreference resolution, it is important to consider all members of a coreference cluster and decide about all of them at once. This technique can help to avoid losing precision and to find long-distance relations. The presented paper is a report of an ongoing study on a new approach for coreference resolution which can resolve all coreference mentions to a given mention in the document in one pass. This has been accomplished by defining an embedding method for the position of all members of a coreference cluster in a document and resolving all of them for a given mention. In the proposed method, the BERT model has been used for encoding the documents, and a head network has been designed to capture the relations between the embedded tokens. These are then converted to the proposed span position embedding matrix, which embeds the position of all coreference mentions in the document. We tested this idea on the CoNLL 2012 dataset, and although the preliminary results from this method do not quite meet the state-of-the-art results, they are promising and can capture features like long-distance relations better than the other approaches.

5. Normalizador Neural de Datas e Endereços [PDF] Back to contents
  Gustavo Plensack, Paulo Finardi
Abstract: Documents of any kind present a wide variety of date and address formats; in some cases dates can be written entirely in full or even have different types of separators. The pattern disorder in addresses is even greater due to the greater possibility of interchanging between streets, neighborhoods, cities and states. In the context of natural language processing, problems of this nature are handled by rigid tools such as regular expressions or DateParser, which are efficient as long as the expected input is pre-configured. When these algorithms are given an unexpected format, errors and unwanted outputs happen. To circumvent this challenge, we present a solution based on T5, a state-of-the-art deep neural network, that handles non-preconfigured formats of dates and addresses with accuracy above 90% in some cases. With this model, our proposal brings generalization to the task of normalizing dates and addresses. We also address this problem with noisy data that simulates possible errors in the text.
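The rigidity of pre-configured tools that this abstract mentions can be seen in a small rule-based normalizer. The sketch below is a hypothetical baseline built on Python's standard `datetime.strptime`, not the T5 model from the paper; the format list and the function name are assumptions.

```python
from datetime import datetime

# Hypothetical list of pre-configured formats; a real system would define its own.
KNOWN_FORMATS = ("%d/%m/%Y", "%Y-%m-%d", "%d.%m.%Y")


def normalize_date(text):
    """Return an ISO 8601 date string, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # this format does not fit, try the next one
    return None


print(normalize_date("09/07/2020"))          # 2020-07-09
print(normalize_date("9 de julho de 2020"))  # None
```

The second call fails because a date written out in full is not among the configured formats, which is the kind of input a learned normalizer is meant to cover.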

6. Interpreting Hierarchical Linguistic Interactions in DNNs [PDF] Back to contents
  Die Zhang, Huilin Zhou, Xiaoyi Bao, Da Huo, Ruizhao Chen, Xu Cheng, Hao Zhang, Mengyue Wu, Quanshi Zhang
Abstract: This paper proposes a method to disentangle and quantify interactions among words that are encoded inside a DNN for natural language processing. We construct a tree to encode salient interactions extracted by the DNN. Six metrics are proposed to analyze properties of interactions between constituents in a sentence. The interaction is defined based on Shapley values of words, which are considered an unbiased estimate of word contributions to the network prediction. Our method is used to quantify word interactions encoded inside the BERT, ELMo, LSTM, CNN, and Transformer networks. Experimental results provide a new perspective for understanding these DNNs and demonstrate the effectiveness of our method.
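Since the abstract defines interactions in terms of Shapley values, a small worked example may help. The sketch below computes exact Shapley values for a toy scoring function and then measures a pairwise interaction as the Shapley value of two words treated as one unit minus their Shapley values when each appears without the other. That interaction definition, the toy scorer, and all names are assumptions for illustration; the paper's six metrics are not reproduced here.

```python
from itertools import combinations
from math import factorial


def shapley(players, value):
    """Exact Shapley value of each player for value(frozenset) -> float."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def pair_interaction(players, value, i, j):
    """Shapley value of the merged unit (i, j) minus those of i and j alone."""
    rest = [p for p in players if p not in (i, j)]
    merged = (i, j)

    def merged_value(s):
        # Expand the merged player back into its two words before scoring.
        words = set()
        for p in s:
            words.update(merged if p == merged else (p,))
        return value(frozenset(words))

    phi_merged = shapley(rest + [merged], merged_value)[merged]
    phi_i = shapley(rest + [i], value)[i]  # game in which j is absent
    phi_j = shapley(rest + [j], value)[j]  # game in which i is absent
    return phi_merged - phi_i - phi_j


# Toy "network output": additive word weights plus a negation effect.
WEIGHTS = {"not": -1.0, "good": 2.0, "movie": 0.5}


def toy_score(words):
    score = sum(WEIGHTS[w] for w in words)
    if "not" in words and "good" in words:
        score -= 4.0  # "not" flips the contribution of "good"
    return score


words = list(WEIGHTS)
phi = shapley(words, toy_score)
interaction = pair_interaction(words, toy_score, "not", "good")
```

In this toy setting the interaction between "not" and "good" recovers the negation term of -4, while "good" and "movie", which contribute additively, have zero interaction. Exact enumeration is exponential in the number of words, so it is only practical for very short inputs.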

7. Open Domain Suggestion Mining Leveraging Fine-Grained Analysis [PDF] Back to contents
  Shreya Singal, Tanishq Goel, Shivang Chopra, Sonika Dahiya
Abstract: Suggestion mining tasks are often semantically complex and lack sophisticated methodologies that can be applied to real-world data. The presence of suggestions across a large diversity of domains and the absence of large labelled and balanced datasets render this task particularly challenging to deal with. In an attempt to overcome these challenges, we propose a two-tier pipeline that leverages Discourse Marker based oversampling and fine-grained suggestion mining techniques to retrieve suggestions from online forums. Through extensive comparison on a real-world open-domain suggestion dataset, we demonstrate how the oversampling technique combined with transformer based fine-grained analysis can beat the state of the art. Additionally, we perform extensive qualitative and quantitative analysis to give construct validity to our proposed pipeline. Finally, we discuss the practical, computational and reproducibility aspects of the deployment of our pipeline across the web.

8. Cooking Is All About People: Comment Classification On Cookery Channels Using BERT and Classification Models (Malayalam-English Mix-Code) [PDF] Back to contents
  Subramaniam Kazhuparambil, Abhishek Kaushik
Abstract: The scope of a lucrative career promoted by Google through its video distribution platform YouTube has attracted a large number of users to become content creators. An important aspect of this line of work is the feedback received in the form of comments, which show how well the content is being received by the audience. However, the volume of comments, coupled with spam and limited tools for comment classification, makes it virtually impossible for a creator to go through each and every comment and gather constructive feedback. Automatic classification of comments is a challenge even for established classification models, since comments are often of variable length and riddled with slang, symbols and abbreviations. This is a greater challenge where comments are multilingual, as the messages are often rife with the respective vernacular. In this work, we have evaluated top-performing classification models and four different vectorizers for classifying comments which are a mix of different combinations of English and Malayalam (only English, only Malayalam, and a mix of English and Malayalam). The statistical analysis of results indicates that Multinomial Naive Bayes, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Random Forest and Decision Trees offer similar levels of accuracy in comment classification. Further, we have also evaluated 3 multilingual sub-types of the novel NLP language model BERT and compared their performance to the conventional machine learning classification techniques. XLM was the top-performing BERT model with an accuracy of 67.31. Random Forest with Term Frequency Vectorizer was the top-performing model among all the traditional classification models with an accuracy of 63.59.

9. Chatbot: A Conversational Agent employed with Named Entity Recognition Model using Artificial Neural Network [PDF] Back to contents
  Nazakat Ali
Abstract: A chatbot is a technology that is used to mimic human behavior using natural language. There are different types of chatbots that can be used as conversational agents in various business domains in order to improve customer service and satisfaction. For any business domain, a knowledge base must be built for that domain, along with an information retrieval based system that can respond to the user with a piece of documentation or generated sentences. The core component of a chatbot is Natural Language Understanding (NLU), which has been impressively improved by deep learning methods. However, such properly built NLU modules are often lacking, and building them from scratch for high-quality conversations requires more time. This may encourage new learners to build a chatbot from scratch with a simple architecture and a small dataset, although it may have reduced functionality, rather than building high-quality data-driven methods. This research focuses on Named Entity Recognition (NER) and Intent Classification models which can be integrated into the NLU service of a chatbot. Named entities will be inserted manually in the knowledge base and automatically detected in a given sentence. The NER model in the proposed architecture is based on an artificial neural network which is trained on manually created entities and evaluated using the CoNLL-2003 dataset.

10. Neural relation extraction: a survey [PDF] Back to contents
  Mehmet Aydar, Ozge Bozal, Furkan Ozbay
Abstract: Neural relation extraction discovers semantic relations between entities from unstructured text using deep learning methods. In this study, we present a comprehensive review of methods for neural network based relation extraction. We discuss the strengths and weaknesses of existing studies and investigate additional research directions and improvement ideas in this field.

11. Understanding Object Affordances Through Verb Usage Patterns [PDF] Back to contents
  Ka Chun Lam, Francisco Pereira, Maryam Vaziri-Pashkam, Kristin Woodard, Emalie McMahon
Abstract: In order to interact with objects in our environment, we rely on an understanding of the actions that can be performed on them, and the extent to which they rely on or have an effect on the properties of the object. This knowledge is called the object "affordance". We propose an approach for creating an embedding of objects in an affordance space, in which each dimension corresponds to an aspect of meaning shared by many actions, using text corpora. This embedding makes it possible to predict which verbs will be applicable to a given object, as captured in human judgments of affordance. We show that the dimensions learned are interpretable, and that they correspond to patterns of interaction with objects. Finally, we show that they can be used to predict other dimensions of object representation that have been shown to underpin human judgments of object similarity.

12. A Survey on Transfer Learning in Natural Language Processing [PDF] Back to contents
  Zaid Alyafeai, Maged Saeed AlShaibani, Irfan Ahmad
Abstract: Deep learning models usually require a huge amount of data. However, these large datasets are not always attainable. This is common in many challenging NLP tasks. Consider Neural Machine Translation, for instance, where curating such large datasets may not be possible, especially for low-resource languages. Another limitation of deep learning models is the demand for huge computing resources. These obstacles motivate research into the possibility of knowledge transfer using large trained models. The demand for transfer learning is increasing as many large models are emerging. In this survey, we feature the recent transfer learning advances in the field of NLP. We also provide a taxonomy for categorizing different transfer learning approaches from the literature.

13. Analysis of Predictive Coding Models for Phonemic Representation Learning in Small Datasets [PDF] Back to contents
  María Andrea Cruz Blandón, Okko Räsänen
Abstract: Neural network models using predictive coding are interesting from the viewpoint of computational modelling of human language acquisition, where the objective is to understand how linguistic units could be learned from speech without any labels. Even though several promising predictive coding-based learning algorithms have been proposed in the literature, it is currently unclear how well they generalise to different languages and training dataset sizes. In addition, although such models have been shown to be effective phonemic feature learners, it is unclear whether minimisation of the predictive loss functions of these models also leads to optimal phoneme-like representations. The present study investigates the behaviour of two predictive coding models, Autoregressive Predictive Coding (APC) and Contrastive Predictive Coding (CPC), in a phoneme discrimination task (ABX task) for two languages with different dataset sizes. Our experiments show a strong correlation between the autoregressive loss and the phoneme discrimination scores with the two datasets. However, to our surprise, the CPC model shows rapid convergence already after one pass over the training data, and, on average, its representations outperform those of APC on both languages.
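For readers unfamiliar with the ABX task named in this abstract, the following is a minimal sketch of how an ABX error rate can be computed over labelled feature vectors: A and X share a phoneme category, B belongs to a different one, and a trial counts as an error when X is not strictly closer to A than to B. Using single vectors with Euclidean distance is a simplifying assumption (actual evaluations typically compare frame sequences), and the function names and toy data are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def abx_error_rate(items, distance=euclidean):
    """items: list of (label, vector). Returns the fraction of failed trials."""
    errors = 0
    trials = 0
    for ia, (label_a, a) in enumerate(items):
        for ix, (label_x, x) in enumerate(items):
            # A and X must be different tokens of the same category.
            if ia == ix or label_a != label_x:
                continue
            for label_b, b in items:
                if label_b == label_a:
                    continue  # B must come from another category
                trials += 1
                if distance(a, x) >= distance(b, x):
                    errors += 1
    if trials == 0:
        raise ValueError("need at least two categories with two items in one")
    return errors / trials


separable = [("a", (0.0, 0.0)), ("a", (0.1, 0.0)),
             ("i", (1.0, 1.0)), ("i", (1.1, 1.0))]
confusable = [("a", (0.0, 0.0)), ("a", (1.0, 1.0)),
              ("i", (0.1, 0.0)), ("i", (1.1, 1.0))]
```

Lower is better: representations that cluster by phoneme give an error rate near zero, while representations that mix categories push it towards or above chance (0.5).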

14. Automatic Detection of Sexist Statements Commonly Used at the Workplace [PDF] Back to contents
  Dylan Grosz, Patricia Conde-Cespedes
Abstract: Detecting hate speech in the workplace is a unique classification task, as the underlying social context implies a subtler version of conventional hate speech. Applications of a state-of-the-art workplace sexism detection model include aids for Human Resources departments, AI chatbots and sentiment analysis. Most existing hate speech detection methods, although robust and accurate, focus on hate speech found on social media, specifically Twitter. The context of social media is much more anonymous than the workplace, so it tends to lend itself to more aggressive and "hostile" versions of sexism. Therefore, datasets with large amounts of "hostile" sexism have a slightly easier detection task, since "hostile" sexist statements can hinge on a couple of words that, regardless of context, tip the model off that a statement is sexist. In this paper we present a dataset of sexist statements that are more likely to be said in the workplace, as well as a deep learning model that can achieve state-of-the-art results. Previous research has created state-of-the-art models to distinguish "hostile" and "benevolent" sexism based simply on aggregated Twitter data. Our deep learning methods, initialized with GloVe or random word embeddings, use LSTMs with attention mechanisms to outperform those models on a more diverse, filtered dataset that is more targeted towards workplace sexism, leading to an F1 score of 0.88.

15. Learning Neural Textual Representations for Citation Recommendation [PDF] Back to contents
  Binh Thanh Kieu, Inigo Jauregi Unanue, Son Bao Pham, Hieu Xuan Phan, Massimo Piccardi
Abstract: With the rapid growth of the scientific literature, manually selecting appropriate citations for a paper is becoming increasingly challenging and time-consuming. While several approaches for automated citation recommendation have been proposed in recent years, effective document representations for citation recommendation are still elusive to a large extent. For this reason, in this paper we propose a novel approach to citation recommendation which leverages a deep sequential representation of the documents (Sentence-BERT) cascaded with Siamese and triplet networks in a submodular scoring function. To the best of our knowledge, this is the first approach to combine deep representations and submodular selection for a task of citation recommendation. Experiments have been carried out using a popular benchmark dataset, the ACL Anthology Network corpus, and evaluated against baselines and a state-of-the-art approach using metrics such as the MRR and F1-at-k score. The results show that the proposed approach has been able to outperform all the compared approaches in every measured metric.
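The two evaluation metrics named in this abstract, MRR and F1-at-k, can be sketched in a few lines. The definitions below are the common textbook ones (precision at k divides by k) and are assumed rather than taken from the paper; the paper identifiers in the usage example are made up.

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries):
    """queries: non-empty list of (ranked_list, relevant_set) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)


def f1_at_k(ranked, relevant, k):
    """Harmonic mean of precision@k and recall@k for one query."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    if hits == 0:
        return 0.0
    precision = hits / k
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Hypothetical recommendation lists and ground-truth citations.
queries = [
    (["p3", "p1", "p2"], {"p1"}),   # first relevant item at rank 2
    (["p5", "p4"], {"p5", "p9"}),   # first relevant item at rank 1
]
print(mean_reciprocal_rank(queries))  # 0.75
```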

16. Improving Conversational Recommender Systems via Knowledge Graph based Semantic Fusion [PDF] Back to contents
  Kun Zhou, Wayne Xin Zhao, Shuqing Bian, Yuanhang Zhou, Ji-Rong Wen, Jingsong Yu
Abstract: Conversational recommender systems (CRS) aim to recommend high-quality items to users through interactive conversations. Although several efforts have been made for CRS, two major issues still remain to be solved. First, the conversation data itself lacks sufficient contextual information for accurately understanding users' preferences. Second, there is a semantic gap between natural language expression and item-level user preference. To address these issues, we incorporate both word-oriented and entity-oriented knowledge graphs (KG) to enhance the data representations in CRSs, and adopt Mutual Information Maximization to align the word-level and entity-level semantic spaces. Based on the aligned semantic representations, we further develop a KG-enhanced recommender component for making accurate recommendations, and a KG-enhanced dialog component that can generate informative keywords or entities in the response text. Extensive experiments have demonstrated the effectiveness of our approach in yielding better performance on both recommendation and conversation tasks.

17. Generalizing Tensor Decomposition for N-ary Relational Knowledge Bases [PDF] Back to contents
  Yu Liu, Quanming Yao, Yong Li
Abstract: With the rapid development of knowledge bases (KBs), the link prediction task, which completes KBs with missing facts, has been broadly studied, especially in binary relational KBs (a.k.a. knowledge graphs), with powerful tensor decomposition related methods. However, the ubiquitous n-ary relational KBs with higher-arity relational facts have received less attention; for these, existing translation based and neural network based approaches have weak expressiveness and high complexity in modeling various relations. Tensor decomposition has not been considered for n-ary relational KBs, while directly extending tensor decomposition related methods of binary relational KBs to the n-ary case does not yield satisfactory results due to exponential model complexity and their strong assumptions on binary relations. To generalize tensor decomposition for n-ary relational KBs, in this work, we propose GETD, a generalized model based on Tucker decomposition and Tensor Ring decomposition. The existing negative sampling technique is also generalized to the n-ary case for GETD. In addition, we theoretically prove that GETD is fully expressive to completely represent any KBs. Extensive evaluations on two representative n-ary relational KB datasets demonstrate the superior performance of GETD, significantly improving the state-of-the-art methods by over 15%. Moreover, GETD further obtains state-of-the-art results on the benchmark binary relational KB datasets.
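To make the "exponential model complexity" remark concrete, here is a minimal sketch of Tucker-style scoring for an n-ary fact: the score is the contraction of a core tensor with one relation embedding and n entity embeddings. Only this generic Tucker part is shown; the Tensor Ring factorization of the core that GETD adds is omitted, and the sparse dictionary storage, function names, and example numbers are assumptions.

```python
def tucker_score(core, relation, entities):
    """Score an n-ary fact by contracting a core tensor with embeddings.

    core maps index tuples (i_r, i_1, ..., i_n) to weights (sparse storage);
    relation and each item of entities are embedding vectors (lists of floats).
    """
    factors = [relation] + list(entities)
    score = 0.0
    for index, weight in core.items():
        term = weight
        for vector, i in zip(factors, index):
            term *= vector[i]
        score += term
    return score


def dense_core_size(relation_dim, entity_dim, arity):
    """Number of entries in a dense core tensor for facts of the given arity."""
    return relation_dim * entity_dim ** arity


# Hypothetical 3-ary fact with 2-dimensional embeddings.
core = {(0, 0, 0, 0): 1.0, (1, 1, 1, 1): 2.0}
relation = [1.0, 0.5]
entities = [[2.0, 1.0], [1.0, 1.0], [3.0, 1.0]]
print(tucker_score(core, relation, entities))  # 7.0
print(dense_core_size(50, 50, 4))              # 312500000
```

The second print shows why a dense core becomes impractical as arity grows: with 50-dimensional embeddings, a 4-ary relation already needs hundreds of millions of core parameters.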

18. Best-First Beam Search [PDF] Back to contents
  Clara Meister, Ryan Cotterell, Tim Vieira
Abstract: Decoding for many NLP tasks requires a heuristic algorithm for approximating exact search, since the full search space is often intractable if not simply too large to traverse efficiently. The default algorithm for this job is beam search, a pruned version of breadth-first search, which in practice returns better results than exact inference due to beneficial search bias. In this work, we show that standard beam search is a computationally inefficient choice for many decoding tasks; specifically, when the scoring function is a monotonic function in sequence length, other search algorithms can be used to reduce the number of calls to the scoring function (e.g., a neural network), which is often the bottleneck computation. We propose best-first beam search, an algorithm that provably returns the same set of results as standard beam search, albeit in the minimum number of scoring function calls to guarantee optimality (modulo beam size). We show that best-first beam search can be used with length normalization and mutual information decoding, among other rescoring functions. Lastly, we propose a memory-reduced variant of best-first beam search, which has a similar search bias in terms of downstream performance, but runs in a fraction of the time.
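The idea in this abstract can be sketched with a priority queue: always expand the highest-scoring hypothesis, but accept at most k hypotheses per length, and stop as soon as a finished hypothesis is accepted. The code below is a simplified illustration under stated assumptions, not the authors' implementation: it returns only the single best hypothesis, the comparison baseline sets finished hypotheses aside instead of keeping them in the beam, ties are broken by insertion order, and the toy bigram model and all names are made up.

```python
import heapq
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"

# Toy bigram model (an assumption for illustration): next-token probabilities
# depend only on the last token, so log-prob scores decrease monotonically.
PROBS = {
    BOS: {"a": 0.6, "b": 0.3, EOS: 0.1},
    "a": {"a": 0.1, "b": 0.2, EOS: 0.7},
    "b": {"a": 0.3, "b": 0.1, EOS: 0.6},
}


def step(seq):
    """Return log-probabilities of every next token given the sequence."""
    return {tok: math.log(p) for tok, p in PROBS[seq[-1]].items()}


def beam_search(step, k, max_len):
    """Standard beam search; finished hypotheses in the top k are set aside."""
    beam, finished, calls = [(0.0, (BOS,))], [], 0
    for _ in range(max_len):
        candidates = []
        for score, seq in beam:
            calls += 1  # one scoring call per expanded hypothesis
            for tok, logp in step(seq).items():
                candidates.append((score + logp, seq + (tok,)))
        top = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
        finished += [c for c in top if c[1][-1] == EOS]
        beam = [c for c in top if c[1][-1] != EOS]
        if not beam:
            break
    best = max(finished, key=lambda c: c[0]) if finished else None
    return best, calls


def best_first_beam_search(step, k, max_len):
    """Expand the highest-scoring hypothesis first, at most k per length."""
    queue = [(0.0, 0, (BOS,))]  # (negated score, tie-breaker, sequence)
    pops, calls, pushed = Counter(), 0, 1
    while queue:
        neg_score, _, seq = heapq.heappop(queue)
        if pops[len(seq)] >= k:
            continue  # the beam for this length is already full
        pops[len(seq)] += 1
        if seq[-1] == EOS:
            return (-neg_score, seq), calls
        if len(seq) - 1 >= max_len:
            continue  # length limit reached, do not expand
        calls += 1
        for tok, logp in step(seq).items():
            heapq.heappush(queue, (neg_score - logp, pushed, seq + (tok,)))
            pushed += 1
    return None, calls


beam_best, beam_calls = beam_search(step, k=2, max_len=5)
bf_best, bf_calls = best_first_beam_search(step, k=2, max_len=5)
print(beam_best[1], beam_calls)  # ('<s>', 'a', '</s>') 3
print(bf_best[1], bf_calls)      # ('<s>', 'a', '</s>') 2
```

On this toy model both searches return the same hypothesis, and the best-first variant makes fewer scoring calls because it never expands the hypothesis "b", which standard beam search expands before it can be ruled out.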

19. Audio-Visual Understanding of Passenger Intents for In-Cabin Conversational Agents [PDF]
  Eda Okur, Shachi H Kumar, Saurav Sahay, Lama Nachman
Abstract: Building multimodal dialogue understanding capabilities situated in the in-cabin context is crucial to enhance passenger comfort in autonomous vehicle (AV) interaction systems. To this end, understanding passenger intents from spoken interactions and vehicle vision systems is a crucial component for developing contextual and visually grounded conversational agents for AV. Towards this goal, we explore AMIE (Automated-vehicle Multimodal In-cabin Experience), the in-cabin agent responsible for handling multimodal passenger-vehicle interactions. In this work, we discuss the benefits of a multimodal understanding of in-cabin utterances by incorporating verbal/language input together with the non-verbal/acoustic and visual clues from inside and outside the vehicle. Our experimental results outperformed text-only baselines as we achieved improved performances for intent detection with a multimodal approach.

20. KQA Pro: A Large Diagnostic Dataset for Complex Question Answering over Knowledge Base [PDF]
  Jiaxin Shi, Shulin Cao, Liangming Pan, Yutong Xiang, Lei Hou, Juanzi Li, Hanwang Zhang, Bin He
Abstract: Complex question answering over knowledge base (Complex KBQA) is challenging because it requires compositional reasoning capability. Existing benchmarks have three shortcomings that limit the development of Complex KBQA: 1) they only provide QA pairs without explicit reasoning processes; 2) questions are either generated by templates, leading to poor diversity, or are small in scale; and 3) they mostly consider only the relations among entities, not attributes. To this end, we introduce KQA Pro, a large-scale dataset for Complex KBQA. We generate questions, SPARQL queries, and functional programs with recursive templates and then paraphrase the questions by crowdsourcing, giving rise to around 120K diverse instances. The SPARQL queries and programs depict the reasoning processes in various manners, which can benefit a large spectrum of QA methods. We contribute a unified codebase and conduct extensive evaluations of baselines and state-of-the-art models: a blind GRU obtains 31.58%, the best model achieves only 35.15%, and humans reach 97.5%, which leaves substantial room for research to close the gap.

21. Research on multi-dimensional end-to-end phrase recognition algorithm based on background knowledge [PDF]
  Zheng Li, Gang Tu, Guang Liu, Zhi-Qiang Zhan, Yi-Jian Liu
Abstract: At present, deep end-to-end methods based on supervised learning are used in entity recognition and dependency analysis. These methods have two problems: first, background knowledge cannot be introduced; second, the multi-granularity and nested features of natural language cannot be recognized. To solve these problems, annotation rules based on a phrase window are proposed, and a corresponding multi-dimensional end-to-end phrase recognition algorithm is designed. The annotation rules divide sentences into seven types of nested phrases and indicate the dependencies between phrases. The algorithm can introduce background knowledge, recognize all kinds of nested phrases in sentences, and also recognize the dependencies between phrases. The experimental results show that the annotation rules are easy to use and unambiguous, and that the matching algorithm is more consistent with the multi-granularity and diversity characteristics of syntax than the traditional end-to-end algorithm. In experiments on the CPWD dataset, introducing background knowledge improves the accuracy of the end-to-end method by more than one point. The corresponding method was applied to the CCL 2018 competition and won first place in the task of Chinese humor type recognition.

22. Language Modeling with Reduced Densities [PDF]
  Tai-Danae Bradley, Yiannis Vlassopoulos
Abstract: We present a framework for modeling words, phrases, and longer expressions in a natural language using reduced density operators. We show these operators capture something of the meaning of these expressions and, under the Loewner order on positive semidefinite operators, preserve both a simple form of entailment and the relevant statistics therein. Pulling back the curtain, the assignment is shown to be a functor between categories enriched over probabilities.
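A toy calculation may make the terms in this abstract more concrete. The sketch below is an assumption-laden illustration, not the paper's construction as stated in the abstract: it assumes a hypothetical corpus of two-word phrases, builds a unit vector whose amplitudes are square roots of phrase probabilities, and traces out the first word to obtain a reduced density operator on the second word. The helper `loewner_leq_2x2` is a hypothetical name and checks the Loewner order only for symmetric 2x2 matrices.

```python
import math

# Hypothetical toy corpus: counts of two-word phrases (modifier, noun).
counts = {("small", "dog"): 2, ("big", "dog"): 1, ("small", "cat"): 1}
modifiers = sorted({m for m, _ in counts})  # ['big', 'small']
nouns = sorted({n for _, n in counts})      # ['cat', 'dog']
total = sum(counts.values())

# Amplitudes psi[m][n] = sqrt(p(m, n)) define a unit vector on modifier x noun.
psi = [[math.sqrt(counts.get((m, n), 0) / total) for n in nouns]
       for m in modifiers]

# Tracing out the modifier gives the reduced density operator on nouns:
# rho[i][j] = sum over modifiers of psi[m][i] * psi[m][j].
rho = [[sum(row[i] * row[j] for row in psi) for j in range(len(nouns))]
       for i in range(len(nouns))]

def loewner_leq_2x2(a, b, tol=1e-12):
    """True if a <= b in the Loewner order for symmetric 2x2 matrices."""
    d = [[b[i][j] - a[i][j] for j in range(2)] for i in range(2)]
    det = d[0][0] * d[1][1] - d[0][1] * d[1][0]
    # A symmetric 2x2 matrix is PSD iff its diagonal and determinant are >= 0.
    return d[0][0] >= -tol and d[1][1] >= -tol and det >= -tol

for noun, row in zip(nouns, rho):
    print(noun, [round(x, 4) for x in row])
```

In this sketch the diagonal of `rho` recovers the marginal probabilities of the nouns, while the off-diagonal entry records that "cat" and "dog" share the modifier "small"; this is one way to read the abstract's claim that the operators preserve the relevant statistics.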

23. ISA: An Intelligent Shopping Assistant [PDF]
  Tuan Manh Lai, Trung Bui, Nedim Lipka
Abstract: Despite the growth of e-commerce, brick-and-mortar stores are still the preferred destinations for many people. In this paper, we present ISA, a mobile-based intelligent shopping assistant that is designed to improve shopping experience in physical stores. ISA assists users by leveraging advanced techniques in computer vision, speech processing, and natural language processing. An in-store user only needs to take a picture or scan the barcode of the product of interest, and then the user can talk to the assistant about the product. The assistant can also guide the user through the purchase process or recommend other similar products to the user. We take a data-driven approach in building the engines of ISA's natural language processing component, and the engines achieve good performance.

24. The curious case of developmental BERTology: On sparsity, transfer learning, generalization and the brain [PDF]
  Xin Wang
Abstract: In this essay, we explore a point of intersection between deep learning and neuroscience, through the lens of large language models, transfer learning and network compression. Just like perceptual and cognitive neurophysiology has inspired effective deep neural network architectures which in turn make a useful model for understanding the brain, here we explore how biological neural development might inspire efficient and robust optimization procedures which in turn serve as a useful model for the maturation and aging of the brain.

25. Cross-lingual Inductive Transfer to Detect Offensive Language [PDF]
  Kartikey Pant, Tanvi Dadu
Abstract: With the growing use of social media and its availability, many instances of the use of offensive language have been observed across multiple languages and domains. This phenomenon has given rise to the growing need to detect the offensive language used in social media cross-lingually. In OffensEval 2020, the organizers have released the multilingual Offensive Language Identification Dataset (mOLID), which contains tweets in five different languages, to detect offensive language. In this work, we introduce a cross-lingual inductive approach to identify the offensive language in tweets using the contextual word embedding XLM-RoBERTa (XLM-R). We show that our model performs competitively on all five languages, obtaining the fourth position in the English task with an F1-score of 0.919 and eighth position in the Turkish task with an F1-score of 0.781. Further experimentation proves that our model works competitively in a zero-shot learning environment, and is extensible to other languages.

26. Evaluating German Transformer Language Models with Syntactic Agreement Tests [PDF]
  Karolina Zaczynska, Nils Feldhus, Robert Schwarzenberg, Aleksandra Gabryszak, Sebastian Möller
Abstract: Pre-trained transformer language models (TLMs) have recently refashioned natural language processing (NLP): Most state-of-the-art NLP models now operate on top of TLMs to benefit from contextualization and knowledge induction. To explain their success, the scientific community conducted numerous analyses. Besides other methods, syntactic agreement tests were utilized to analyse TLMs. Most of the studies were conducted for the English language, however. In this work, we analyse German TLMs. To this end, we design numerous agreement tasks, some of which consider peculiarities of the German language. Our experimental results show that state-of-the-art German TLMs generally perform well on agreement tasks, but we also identify and discuss syntactic structures that push them to their limits.

27. Learning Speech Representations from Raw Audio by Joint Audiovisual Self-Supervision [PDF]
  Abhinav Shukla, Stavros Petridis, Maja Pantic
Abstract: The intuitive interaction between the audio and visual modalities is valuable for cross-modal self-supervised learning. This concept has been demonstrated for generic audiovisual tasks like video action recognition and acoustic scene classification. However, self-supervision remains under-explored for audiovisual speech. We propose a method to learn self-supervised speech representations from the raw audio waveform. We train a raw audio encoder by combining audio-only self-supervision (by predicting informative audio attributes) with visual self-supervision (by generating talking faces from audio). The visual pretext task drives the audio representations to capture information related to lip movements. This enriches the audio encoder with visual information and the encoder can be used for evaluation without the visual modality. Our method attains competitive performance with respect to existing self-supervised audio features on established isolated word classification benchmarks, and significantly outperforms other methods at learning from fewer labels. Notably, our method also outperforms fully supervised training, thus providing a strong initialization for speech related tasks. Our results demonstrate the potential of multimodal self-supervision in audiovisual speech for learning good audio representations.

28. Streaming End-to-End Bilingual ASR Systems with Joint Language Identification [PDF]
  Surabhi Punjabi, Harish Arsikere, Zeynab Raeesy, Chander Chandak, Nikhil Bhave, Ankish Bansal, Markus Müller, Sergio Murillo, Ariya Rastrow, Sri Garimella, Roland Maas, Mat Hans, Athanasios Mouchtaris, Siegfried Kunzmann
Abstract: Multilingual ASR technology simplifies model training and deployment, but its accuracy is known to depend on the availability of language information at runtime. Since language identity is seldom known beforehand in real-world scenarios, it must be inferred on-the-fly with minimum latency. Furthermore, in voice-activated smart assistant systems, language identity is also required for downstream processing of ASR output. In this paper, we introduce streaming, end-to-end, bilingual systems that perform both ASR and language identification (LID) using the recurrent neural network transducer (RNN-T) architecture. On the input side, embeddings from pretrained acoustic-only LID classifiers are used to guide RNN-T training and inference, while on the output side, language targets are jointly modeled with ASR targets. The proposed method is applied to two language pairs: English-Spanish as spoken in the United States, and English-Hindi as spoken in India. Experiments show that for English-Spanish, the bilingual joint ASR-LID architecture matches monolingual ASR and acoustic-only LID accuracies. For the more challenging (owing to within-utterance code switching) case of English-Hindi, English ASR and LID metrics show degradation. Overall, in scenarios where users switch dynamically between languages, the proposed architecture offers a promising simplification over running multiple monolingual ASR models and an LID classifier in parallel.

29. Spatio-Temporal Scene Graphs for Video Dialog [PDF]
  Shijie Geng, Peng Gao, Chiori Hori, Jonathan Le Roux, Anoop Cherian
Abstract: The Audio-Visual Scene-aware Dialog (AVSD) task requires an agent to indulge in a natural conversation with a human about a given video. Specifically, apart from the video frames, the agent receives the audio, brief captions, and a dialog history, and the task is to produce the correct answer to a question about the video. Due to the diversity in the type of inputs, this task poses a very challenging multimodal reasoning problem. Current approaches to AVSD either use global video-level features or those from a few sampled frames, and thus lack the ability to explicitly capture relevant visual regions or their interactions for answer generation. To this end, we propose a novel spatio-temporal scene graph representation (STSGR) modeling fine-grained information flows within videos. Specifically, on an input video sequence, STSGR (i) creates a two-stream visual and semantic scene graph on every frame, (ii) conducts intra-graph reasoning using node and edge convolutions generating visual memories, and (iii) applies inter-graph aggregation to capture their temporal evolutions. These visual memories are then combined with other modalities and the question embeddings using a novel semantics-controlled multi-head shuffled transformer, which then produces the answer recursively. Our entire pipeline is trained end-to-end. We present experiments on the AVSD dataset and demonstrate state-of-the-art results. A human evaluation on the quality of our generated answers shows 12% relative improvement against prior methods.

30. Expressive Interviewing: A Conversational System for Coping with COVID-19 [PDF]
  Charles Welch, Allison Lahnala, Verónica Pérez-Rosas, Siqi Shen, Sarah Seraj, Larry An, Kenneth Resnicow, James Pennebaker, Rada Mihalcea
Abstract: The ongoing COVID-19 pandemic has raised concerns for many regarding personal and public health implications, financial security and economic stability. Alongside many other unprecedented challenges, there are increasing concerns over social isolation and mental health. We introduce Expressive Interviewing, an interview-style conversational system that draws on ideas from motivational interviewing and expressive writing. Expressive Interviewing seeks to encourage users to express their thoughts and feelings through writing by asking them questions about how COVID-19 has impacted their lives. We present relevant aspects of the system's design and implementation as well as quantitative and qualitative analyses of user interactions with the system. In addition, we conduct a comparative evaluation with a general purpose dialogue system for mental health that shows our system's potential in helping users to cope with COVID-19 issues.

31. Placepedia: Comprehensive Place Understanding with Multi-Faceted Annotations [PDF]
  Huaiyi Huang, Yuqi Zhang, Qingqiu Huang, Zhengkui Guo, Ziwei Liu, Dahua Lin
Abstract: Place is an important element in visual understanding. Given a photo of a building, people can often tell its functionality, e.g. a restaurant or a shop, its cultural style, e.g. Asian or European, as well as its economic type, e.g. industry oriented or tourism oriented. While place recognition has been widely studied in previous work, there remains a long way to go towards comprehensive place understanding, which is far beyond categorizing a place with an image and requires information of multiple aspects. In this work, we contribute Placepedia, a large-scale place dataset with more than 35M photos from 240K unique places. Besides the photos, each place also comes with massive multi-faceted information, e.g. GDP, population, etc., and labels at multiple levels, including function, city, country, etc. This dataset, with its large amount of data and rich annotations, allows various studies to be conducted. Particularly, in our studies, we develop 1) PlaceNet, a unified framework for multi-level place recognition, and 2) a method for city embedding, which can produce a vector representation for a city that captures both visual and multi-faceted side information. Such studies not only reveal key challenges in place understanding, but also establish connections between visual observations and underlying socioeconomic/cultural implications.

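Note: the Chinese abstracts in the original post are machine translations. The cover image is a word cloud of the paper titles.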