[arXiv papers] Computation and Language 2020-03-03

Contents

1. Gated Mechanism for Attention Based Multimodal Sentiment Analysis [PDF] Abstract
2. Identification of primary and collateral tracks in stuttered speech [PDF] Abstract
3. Multi-View Learning for Vision-and-Language Navigation [PDF] Abstract
4. PhoBERT: Pre-trained language models for Vietnamese [PDF] Abstract
5. Style Example-Guided Text Generation using Generative Adversarial Transformers [PDF] Abstract
6. Learning from Easy to Complex: Adaptive Multi-curricula Learning for Neural Dialogue Generation [PDF] Abstract
7. StructSum: Incorporating Latent and Explicit Sentence Dependencies for Single Document Summarization [PDF] Abstract
8. Clinical Text Summarization with Syntax-Based Negation and Semantic Concept Identification [PDF] Abstract
9. Voice trigger detection from LVCSR hypothesis lattices using bidirectional lattice recurrent neural networks [PDF] Abstract
10. Depth-Adaptive Graph Recurrent Network for Text Classification [PDF] Abstract
11. AraBERT: Transformer-based Model for Arabic Language Understanding [PDF] Abstract
12. The STEM-ECR Dataset: Grounding Scientific Entity References in STEM Scholarly Content to Authoritative Encyclopedic and Lexicographic Sources [PDF] Abstract
13. Pathological speech detection using x-vector embeddings [PDF] Abstract
14. Long Short-Term Sample Distillation [PDF] Abstract
15. Environment-agnostic Multitask Learning for Natural Language Grounded Navigation [PDF] Abstract
16. What Emotions Make One or Five Stars? Understanding Ratings of Online Product Reviews by Sentiment Analysis and XAI [PDF] Abstract

Abstracts

1. Gated Mechanism for Attention Based Multimodal Sentiment Analysis [PDF] Back to contents
  Ayush Kumar, Jithendra Vepa
Abstract: Multimodal sentiment analysis has recently gained popularity because of its relevance to social media posts, customer service calls and video blogs. In this paper, we address three aspects of multimodal sentiment analysis: (1) cross-modal interaction learning, i.e., how multiple modalities contribute to the sentiment; (2) learning long-term dependencies in multimodal interactions; and (3) fusion of unimodal and cross-modal cues. Out of these three, we find that learning cross-modal interactions is beneficial for this problem. We perform experiments on two benchmark datasets, the CMU Multimodal Opinion level Sentiment Intensity (CMU-MOSI) and CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI) corpora. On these two datasets, our approach yields accuracies of 83.9% and 81.1%, respectively, which are absolute improvements of 1.6% and 1.34% over the current state of the art.

2. Identification of primary and collateral tracks in stuttered speech [PDF] Back to contents
  Rachid Riad, Anne-Catherine Bachoud-Lévi, Frank Rudzicz, Emmanuel Dupoux
Abstract: Disfluent speech has been previously addressed from two main perspectives: the clinical perspective, focusing on diagnosis, and the Natural Language Processing (NLP) perspective, aiming at modeling these events and detecting them for downstream tasks. In addition, previous works often used different metrics depending on whether the input features are text or speech, making it difficult to compare the different contributions. Here, we introduce a new evaluation framework for disfluency detection inspired by the clinical and NLP perspectives together with the theory of performance from Clark (1996), which distinguishes between primary and collateral tracks. We introduce a novel forced-aligned disfluency dataset from a corpus of semi-directed interviews, and present baseline results directly comparing the performance of text-based features (word and span information) and speech-based features (acoustic-prosodic information). Finally, we introduce new audio features inspired by the word-based span features. We show experimentally that using these features outperforms the baselines for speech-based predictions on the present dataset.

3. Multi-View Learning for Vision-and-Language Navigation [PDF] Back to contents
  Qiaolin Xia, Xiujun Li, Chunyuan Li, Yonatan Bisk, Zhifang Sui, Yejin Choi, Noah A. Smith
Abstract: Learning to navigate in a visual environment following natural language instructions is a challenging task because natural language instructions are highly variable, ambiguous, and under-specified. In this paper, we present a novel training paradigm, Learn from EveryOne (LEO), which leverages multiple instructions (as different views) for the same trajectory to resolve language ambiguity and improve generalization. By sharing parameters across instructions, our approach learns more effectively from limited training data and generalizes better in unseen environments. On the recent Room-to-Room (R2R) benchmark dataset, LEO achieves 16% improvement (absolute) over a greedy agent as the base agent (25.3% → 41.4%) in Success Rate weighted by Path Length (SPL). Further, LEO is complementary to most existing models for vision-and-language navigation, allowing for easy integration with the existing techniques, leading to LEO+, which creates the new state of the art, pushing the R2R benchmark to 62% (9% absolute improvement).
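
For readers unfamiliar with the SPL metric quoted above, here is a minimal sketch of how it is commonly computed. It assumes the usual definition: each episode contributes its success flag multiplied by the ratio of the shortest-path length to the longer of the agent's path and the shortest path, and the result is averaged over episodes. The function name `spl` and the toy episodes are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def spl(successes: Sequence[bool],
        shortest_lengths: Sequence[float],
        path_lengths: Sequence[float]) -> float:
    """Success weighted by Path Length, averaged over episodes.

    Assumes every shortest-path length is strictly positive.
    """
    n = len(successes)
    if n == 0 or n != len(shortest_lengths) or n != len(path_lengths):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if shortest <= 0:
            raise ValueError("shortest-path lengths must be positive")
        # A failed episode contributes 0; a successful one is discounted
        # by how much longer the agent's path was than the shortest path.
        total += float(success) * shortest / max(taken, shortest)
    return total / n


# Toy episodes (hypothetical): optimal success, success with a detour, failure.
score = spl([True, True, False], [10.0, 10.0, 10.0], [10.0, 20.0, 5.0])
print(f"SPL = {score:.2f}")
```

Because every failed episode scores zero and every detour is penalised, SPL can never exceed the plain success rate, which is why the two numbers are usually reported side by side.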

4. PhoBERT: Pre-trained language models for Vietnamese [PDF] Back to contents
  Dat Quoc Nguyen, Anh Tuan Nguyen
Abstract: We present PhoBERT with two versions of "base" and "large"--the first public large-scale monolingual language models pre-trained for Vietnamese. We show that PhoBERT improves the state-of-the-art in multiple Vietnamese-specific NLP tasks including Part-of-speech tagging, Named-entity recognition and Natural language inference. We release PhoBERT to facilitate future research and downstream applications for Vietnamese NLP. Our PhoBERT is released at: this https URL

5. Style Example-Guided Text Generation using Generative Adversarial Transformers [PDF] Back to contents
  Kuo-Hao Zeng, Mohammad Shoeybi, Ming-Yu Liu
Abstract: We introduce a language generative model framework for generating a styled paragraph based on a context sentence and a style reference example. The framework consists of a style encoder and a text decoder. The style encoder extracts a style code from the reference example, and the text decoder generates text based on the style code and the context. We propose a novel objective function to train our framework. We also investigate different network design choices. We conduct extensive experimental validation, with comparison to strong baselines, to validate the effectiveness of the proposed framework using a newly collected dataset with diverse text styles. Both code and dataset will be released upon publication.

6. Learning from Easy to Complex: Adaptive Multi-curricula Learning for Neural Dialogue Generation [PDF] Back to contents
  Hengyi Cai, Hongshen Chen, Cheng Zhang, Yonghao Song, Xiaofang Zhao, Yangxi Li, Dongsheng Duan, Dawei Yin
Abstract: Current state-of-the-art neural dialogue systems are mainly data-driven and are trained on human-generated responses. However, due to the subjectivity and open-ended nature of human conversations, the complexity of training dialogues varies greatly. The noise and uneven complexity of query-response pairs impede the learning efficiency and effects of the neural dialogue generation models. What is more, so far, there are no unified dialogue complexity measurements, and dialogue complexity embodies multiple aspects of attributes: specificity, repetitiveness, relevance, etc. Inspired by human behaviors of learning to converse, where children learn from easy dialogues to complex ones and dynamically adjust their learning progress, in this paper, we first analyze five dialogue attributes to measure dialogue complexity from multiple perspectives on three publicly available corpora. Then, we propose an adaptive multi-curricula learning framework to schedule a committee of the organized curricula. The framework is established upon the reinforcement learning paradigm, which automatically chooses different curricula during the evolving learning process according to the learning status of the neural dialogue generation model. Extensive experiments conducted on five state-of-the-art models demonstrate its learning efficiency and effectiveness with respect to 13 automatic evaluation metrics and human judgments.

7. StructSum: Incorporating Latent and Explicit Sentence Dependencies for Single Document Summarization [PDF] Back to contents
  Vidhisha Balachandran, Artidoro Pagnoni, Jay Yoon Lee, Dheeraj Rajagopal, Jaime Carbonell, Yulia Tsvetkov
Abstract: Traditional pre-neural approaches to single document summarization relied on modeling the intermediate structure of a document before generating the summary. In contrast, the current state-of-the-art neural summarization models do not preserve any intermediate structure, resorting to encoding the document as a sequence of tokens. The goal of this work is two-fold: to improve the quality of generated summaries and to learn interpretable document representations for summarization. To this end, we propose incorporating latent and explicit sentence dependencies into single-document summarization models. We use structure-aware encoders to induce latent sentence relations, and inject an explicit coreferring-mention graph across sentences to incorporate explicit structure. On the CNN/DM dataset, our model outperforms standard baselines and provides intermediate latent structures for analysis. We present an extensive analysis of our summaries and show that modeling document structure reduces copying of long sequences and incorporates richer content from the source document while maintaining comparable summary lengths and an increased degree of abstraction.

8. Clinical Text Summarization with Syntax-Based Negation and Semantic Concept Identification [PDF] Back to contents
  Wei-Hung Weng, Yu-An Chung, Schrasing Tong
Abstract: In the era of clinical information explosion, a good strategy for clinical text summarization is helpful for improving the clinical workflow. The ideal summarization strategy can preserve important information in informative but less organized, ill-structured clinical narrative texts. Instead of using pure statistical learning approaches, which are difficult to interpret and explain, we utilized knowledge of computational linguistics together with a human-expert-curated biomedical knowledge base to achieve interpretable and meaningful clinical text summarization. Our research objective is to use a biomedical ontology with semantic information, and to take advantage of the hierarchical structure of language, the constituency tree, in order to identify the correct clinical concepts and the corresponding negation information, which is critical for summarizing clinical concepts from narrative text. We achieved clinically acceptable performance for both negation detection and concept identification, and clinical concepts with common negated patterns can be identified and negated by the proposed method.

9. Voice trigger detection from LVCSR hypothesis lattices using bidirectional lattice recurrent neural networks [PDF] Back to contents
  Woojay Jeon, Leo Liu, Henry Mason
Abstract: We propose a method to reduce false voice triggers of a speech-enabled personal assistant by post-processing the hypothesis lattice of a server-side large-vocabulary continuous speech recognizer (LVCSR) via a neural network. We first discuss how an estimate of the posterior probability of the trigger phrase can be obtained from the hypothesis lattice using known techniques to perform detection, then investigate a statistical model that processes the lattice in a more explicitly data-driven, discriminative manner. We propose using a Bidirectional Lattice Recurrent Neural Network (LatticeRNN) for the task, and show that it can significantly improve detection accuracy over using the 1-best result or the posterior.

10. Depth-Adaptive Graph Recurrent Network for Text Classification [PDF] Back to contents
  Yijin Liu, Fandong Meng, Yufeng Chen, Jinan Xu, Jie Zhou
Abstract: The Sentence-State LSTM (S-LSTM) is a powerful and highly efficient graph recurrent network, which views words as nodes and performs layer-wise recurrent steps between them simultaneously. Despite its successes on text representations, the S-LSTM still suffers from two drawbacks. Firstly, given a sentence, certain words are usually more ambiguous than others, and thus more computation steps need to be taken for these difficult words and vice versa. However, the S-LSTM takes fixed computation steps for all words, irrespective of their hardness. The second drawback comes from the lack of sequential information (e.g., word order) that is inherently important for natural language. In this paper, we try to address these issues and propose a depth-adaptive mechanism for the S-LSTM, which allows the model to learn how many computational steps to conduct for different words as required. In addition, we integrate an extra RNN layer to inject sequential information, which also serves as an input feature for the decision of adaptive depths. Results on the classic text classification task (24 datasets in various sizes and domains) show that our model brings significant improvements against the conventional S-LSTM and other high-performance models (e.g., the Transformer), while achieving a good accuracy-speed trade-off.

11. AraBERT: Transformer-based Model for Arabic Language Understanding [PDF] Back to contents
  Wissam Antoun, Fady Baly, Hazem Hajj
Abstract: The Arabic language is a morphologically rich and complex language with relatively few resources and a less explored syntax compared to English. Given these limitations, tasks like Sentiment Analysis (SA), Named Entity Recognition (NER), and Question Answering (QA) have proven to be very challenging to tackle. Recently, with the surge of transformer-based models, language-specific BERT-based models proved to have a very efficient understanding of languages, provided they are pre-trained on a very large corpus. Such models were able to set new standards and achieve state-of-the-art results for most NLP tasks. In this paper, we pre-trained BERT specifically for the Arabic language in the pursuit of achieving the same success that BERT did for the English language. We then compare the performance of AraBERT with multilingual BERT provided by Google and other state-of-the-art approaches. The results of the conducted experiments show that the newly developed AraBERT achieved state-of-the-art results on most tested tasks. The pretrained AraBERT models are publicly available, in the hope of encouraging research and applications for Arabic NLP.

12. The STEM-ECR Dataset: Grounding Scientific Entity References in STEM Scholarly Content to Authoritative Encyclopedic and Lexicographic Sources [PDF] Back to contents
  Jennifer D'Souza, Anett Hoppe, Arthur Brack, Mohamad Yaser Jaradeh, Sören Auer, Ralph Ewerth
Abstract: We introduce the STEM (Science, Technology, Engineering, and Medicine) Dataset for Scientific Entity Extraction, Classification, and Resolution, version 1.0 (STEM-ECR v1.0). The STEM-ECR v1.0 dataset has been developed to provide a benchmark for the evaluation of scientific entity extraction, classification, and resolution tasks in a domain-independent fashion. It comprises abstracts in 10 STEM disciplines that were found to be the most prolific ones on a major publishing platform. We describe the creation of such a multidisciplinary corpus and highlight the obtained findings in terms of the following features: 1) a generic conceptual formalism for scientific entities in a multidisciplinary scientific context; 2) the feasibility of the domain-independent human annotation of scientific entities under such a generic formalism; 3) a performance benchmark obtainable for automatic extraction of multidisciplinary scientific entities using BERT-based neural models; 4) a delineated 3-step entity resolution procedure for human annotation of the scientific entities via encyclopedic entity linking and lexicographic word sense disambiguation; and 5) human evaluations of Babelfy returned encyclopedic links and lexicographic senses for our entities. Our findings cumulatively indicate that human annotation and automatic learning of multidisciplinary scientific concepts as well as their semantic disambiguation in a wide-ranging setting as STEM is reasonable.

13. Pathological speech detection using x-vector embeddings [PDF] Back to contents
  Catarina Botelho, Francisco Teixeira, Thomas Rolland, Alberto Abad, Isabel Trancoso
Abstract: The potential of speech as a non-invasive biomarker to assess a speaker's health has been repeatedly supported by the results of multiple works, for both physical and psychological conditions. Traditional systems for speech-based disease classification have focused on carefully designed knowledge-based features. However, these features may not represent the disease's full symptomatology, and may even overlook its more subtle manifestations. This has prompted researchers to move in the direction of general speaker representations that inherently model symptoms, such as Gaussian Supervectors, i-vectors, and x-vectors. In this work, we focus on the latter, to assess their applicability as a general feature extraction method for the detection of Parkinson's disease (PD) and obstructive sleep apnea (OSA). We test our approach against knowledge-based features and i-vectors, and report results for two European Portuguese corpora, for OSA and PD, as well as for an additional Spanish corpus for PD. Both x-vector and i-vector models were trained with an out-of-domain European Portuguese corpus. Our results show that x-vectors are able to perform better than knowledge-based features in same-language corpora. Moreover, while x-vectors performed similarly to i-vectors in matched conditions, they significantly outperform them when domain mismatch occurs.

14. Long Short-Term Sample Distillation [PDF] Back to contents
  Liang Jiang, Zujie Wen, Zhongping Liang, Yafang Wang, Gerard de Melo, Zhe Li, Liangzhuang Ma, Jiaxing Zhang, Xiaolong Li, Yuan Qi
Abstract: In the past decade, there has been substantial progress in training increasingly deep neural networks. Recent advances within the teacher-student training paradigm have established that information about past training updates shows promise as a source of guidance during subsequent training steps. Based on this notion, in this paper, we propose Long Short-Term Sample Distillation, a novel training policy that simultaneously leverages multiple phases of the previous training process to guide the later training updates to a neural network, while efficiently proceeding in just one single generation pass. With Long Short-Term Sample Distillation, the supervision signal for each sample is decomposed into two parts: a long-term signal and a short-term one. The long-term teacher draws on snapshots from several epochs ago in order to provide steadfast guidance and to guarantee teacher-student differences, while the short-term one yields more up-to-date cues with the goal of enabling higher-quality updates. Moreover, the teachers for each sample are unique, such that, overall, the model learns from a very diverse set of teachers. Comprehensive experimental results across a range of vision and NLP tasks demonstrate the effectiveness of this new training method.
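
The idea of splitting the supervision signal for a sample into a long-term and a short-term teacher can be illustrated with a generic two-teacher distillation loss. The sketch below is an assumption-laden illustration, not the authors' implementation. The abstract does not give the loss, so the form used here (cross-entropy plus two weighted KL terms), the weights `w_long` and `w_short`, and all function names are hypothetical.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def two_teacher_loss(student_logits: Sequence[float],
                     label: int,
                     long_term_probs: Sequence[float],
                     short_term_probs: Sequence[float],
                     w_long: float = 0.5,
                     w_short: float = 0.5) -> float:
    """Hard-label cross-entropy plus KL terms toward two stored teacher outputs.

    long_term_probs:  prediction saved for this sample several epochs ago (assumed).
    short_term_probs: prediction saved for this sample more recently (assumed).
    """
    student = softmax(student_logits)
    cross_entropy = -math.log(student[label] + 1e-12)
    return (cross_entropy
            + w_long * kl_divergence(long_term_probs, student)
            + w_short * kl_divergence(short_term_probs, student))


# Toy usage with made-up numbers for a 3-class problem.
loss = two_teacher_loss([2.0, 0.5, -1.0], label=0,
                        long_term_probs=[0.6, 0.3, 0.1],
                        short_term_probs=[0.7, 0.2, 0.1])
print(f"loss = {loss:.4f}")
```

In this sketch each teacher is simply a probability vector stored for the same sample at an earlier point in training, which matches the abstract's point that no separate teacher model needs to be trained.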

15. Environment-agnostic Multitask Learning for Natural Language Grounded Navigation [PDF] Back to contents
  Xin Wang, Vihan Jain, Eugene Ie, William Yang Wang, Zornitsa Kozareva, Sujith Ravi
Abstract: Recent research efforts enable study for natural language grounded navigation in photo-realistic environments, e.g., following natural language instructions or dialog. However, existing methods tend to overfit training data in seen environments and fail to generalize well in previously unseen environments. In order to close the gap between seen and unseen environments, we aim at learning a generalized navigation model from two novel perspectives: (1) we introduce a multitask navigation model that can be seamlessly trained on both Vision-Language Navigation (VLN) and Navigation from Dialog History (NDH) tasks, which benefits from richer natural language guidance and effectively transfers knowledge across tasks; (2) we propose to learn environment-agnostic representations for the navigation policy that are invariant among the environments seen during training, thus generalizing better on unseen environments. Extensive experiments show that our navigation model trained using environment-agnostic multitask learning significantly reduces the performance gap between seen and unseen environments and outperforms the baselines on unseen environments by 16% (relative measure on success rate) on VLN and 120% (goal progress) on NDH, establishing a new state-of-the-art for the NDH task. The code for training the navigation model using environment-agnostic multitask learning is available at this https URL.

16. What Emotions Make One or Five Stars? Understanding Ratings of Online Product Reviews by Sentiment Analysis and XAI [PDF] Back to contents
  Chaehan So
Abstract: When people buy products online, they primarily base their decisions on the recommendations of others given in online reviews. The current work analyzed these online reviews by sentiment analysis and used the extracted sentiments as features to predict the product ratings by several machine learning algorithms. These predictions were disentangled by various methods of explainable AI (XAI) to understand whether the model showed any bias during prediction. Study 1 benchmarked these algorithms (knn, support vector machines, random forests, gradient boosting machines, XGBoost) and identified random forests and XGBoost as the best algorithms for predicting the product ratings. In Study 2, the analysis of global feature importance identified the sentiment joy and the emotional valence negative as the most predictive features. Two XAI visualization methods, local feature attributions and partial dependency plots, revealed several incorrect prediction mechanisms on the instance level. Performing the benchmarking as classification, Study 3 identified a high no-information rate of 64.4% that indicated high class imbalance as the underlying reason for the identified problems. In conclusion, good performance by machine learning algorithms must be taken with caution because the dataset, as encountered in this work, could be biased towards certain predictions. This work demonstrates how XAI methods reveal such prediction bias.
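
The no-information rate mentioned in Study 3 is the accuracy obtained by always predicting the most frequent class, so a high value signals class imbalance. Here is a minimal sketch of the computation. The function name and the toy star ratings are illustrative assumptions and do not reproduce the paper's data.

```python
from collections import Counter
from typing import Hashable, Sequence


def no_information_rate(labels: Sequence[Hashable]) -> float:
    """Share of the most frequent class among the labels.

    A classifier must beat this baseline accuracy to show
    that it has learned anything beyond the class prior.
    """
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical star ratings: five of the eight reviews are 5-star.
ratings = [5, 5, 5, 4, 1, 5, 3, 5]
print(f"no-information rate = {no_information_rate(ratings):.3f}")
```

With a no-information rate of 64.4%, as reported in the abstract, a model that reaches roughly 64% accuracy has done no better than always guessing the majority rating.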

Note: the Chinese summaries in the original digest are machine translations.