
[arXiv Papers] Computation and Language 2020-02-17

Contents

1. Transformer on a Diet [PDF] Abstract
2. Scalable Neural Methods for Reasoning With a Symbolic Knowledge Base [PDF] Abstract
3. FQuAD: French Question Answering Dataset [PDF] Abstract
4. Dialogue history integration into end-to-end signal-to-concept spoken language understanding systems [PDF] Abstract
5. Integrating Discrete and Neural Features via Mixed-feature Trans-dimensional Random Field Language Models [PDF] Abstract
6. A Data Efficient End-To-End Spoken Language Understanding Architecture [PDF] Abstract
7. Zero-Resource Cross-Domain Named Entity Recognition [PDF] Abstract
8. Understanding patient complaint characteristics using contextual clinical BERT embeddings [PDF] Abstract
9. Transformers as Soft Reasoners over Language [PDF] Abstract
10. HULK: An Energy Efficiency Benchmark Platform for Responsible Natural Language Processing [PDF] Abstract
11. Unsupervised Speaker Adaptation using Attention-based Speaker Memory for End-to-End ASR [PDF] Abstract
12. Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers [PDF] Abstract
13. Exploring Chemical Space using Natural Language Processing Methodologies for Drug Discovery [PDF] Abstract
14. Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances [PDF] Abstract
15. Query2box: Reasoning over Knowledge Graphs in Vector Space using Box Embeddings [PDF] Abstract

Abstracts

1. Transformer on a Diet [PDF] Back to Contents
  Chenguang Wang, Zihao Ye, Aston Zhang, Zheng Zhang, Alexander J. Smola
Abstract: Transformer has been widely used thanks to its ability to capture sequence information efficiently. However, recent developments, such as BERT and GPT-2, deliver only heavy architectures, with a focus on effectiveness. In this paper, we explore three carefully-designed light Transformer architectures to figure out whether a Transformer with less computation can produce competitive results. Experimental results on language model benchmark datasets hint that such a trade-off is promising: the light Transformer reduces parameters by up to 70% while obtaining perplexity competitive with the standard Transformer. The source code is publicly available.

2. Scalable Neural Methods for Reasoning With a Symbolic Knowledge Base [PDF] Back to Contents
  William W. Cohen, Haitian Sun, R. Alex Hofer, Matthew Siegler
Abstract: We describe a novel way of representing a symbolic knowledge base (KB) called a sparse-matrix reified KB. This representation enables neural modules that are fully differentiable, faithful to the original semantics of the KB, expressive enough to model multi-hop inferences, and scalable enough to use with realistically large KBs. The sparse-matrix reified KB can be distributed across multiple GPUs, can scale to tens of millions of entities and facts, and is orders of magnitude faster than naive sparse-matrix implementations. The reified KB enables very simple end-to-end architectures to obtain competitive performance on several benchmarks representing two families of tasks: KB completion, and learning semantic parsers from denotations.
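
The core trick here is easy to picture: if an entity set is a (weighted) vector over all entities and each relation is a sparse 0/1 matrix over entity pairs, then following a relation is a single sparse matrix product, which is differentiable and GPU-friendly. Below is a minimal toy sketch of that idea in Python with SciPy; the entity names and shapes are invented, and the paper's actual reified representation (which factors the KB through triple indices to scale) is not reproduced here.

    from scipy.sparse import csr_matrix

    # Toy KB with three entities: 0=Paris, 1=France, 2=Europe (hypothetical example).
    n = 3
    # located_in[i, j] = 1 iff the KB contains the fact (entity_i, located_in, entity_j).
    located_in = csr_matrix(([1.0, 1.0], ([0, 1], [1, 2])), shape=(n, n))

    # An entity set is a sparse row vector; following a relation is one sparse matmul.
    paris = csr_matrix(([1.0], ([0], [0])), shape=(1, n))
    hop1 = paris @ located_in     # {France}
    hop2 = hop1 @ located_in      # multi-hop inference: {Europe}
    print(hop2.toarray())         # [[0. 0. 1.]]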

3. FQuAD: French Question Answering Dataset [PDF] Back to Contents
  Martin d'Hoffschmidt, Maxime Vidal, Wacim Belblidia, Tom Brendlé
Abstract: Recent advances in the field of language modeling have improved state-of-the-art results on many Natural Language Processing tasks. Among them, the Machine Reading Comprehension task has made significant progress. However, most of the results are reported in English, since labeled resources available in other languages, such as French, remain scarce. In the present work, we introduce the French Question Answering Dataset (FQuAD). FQuAD is a French native reading comprehension dataset consisting of 25,000+ questions on a set of Wikipedia articles. A baseline model is trained which achieves an F1 score of 88.0% and an exact match ratio of 77.9% on the test set. The dataset is made freely available at https://fquad.illuin.tech.
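
For reference, the exact match and F1 figures quoted above are the standard SQuAD-style answer-level metrics. A minimal sketch of how they are typically computed follows; the normalization shown (lowercasing, punctuation stripping) mirrors the English SQuAD script, and any French-specific handling of accents or articles is an assumption rather than the paper's exact evaluation code.

    import re
    import string
    from collections import Counter

    def normalize(text):
        """Lowercase, strip punctuation, and collapse whitespace (SQuAD-style)."""
        text = re.sub(f"[{re.escape(string.punctuation)}]", " ", text.lower())
        return " ".join(text.split())

    def exact_match(prediction, gold):
        return float(normalize(prediction) == normalize(gold))

    def f1_score(prediction, gold):
        pred, ref = normalize(prediction).split(), normalize(gold).split()
        overlap = sum((Counter(pred) & Counter(ref)).values())
        if overlap == 0:
            return 0.0
        precision, recall = overlap / len(pred), overlap / len(ref)
        return 2 * precision * recall / (precision + recall)

    print(f1_score("la tour Eiffel", "tour Eiffel"))  # 0.8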

4. Dialogue history integration into end-to-end signal-to-concept spoken language understanding systems [PDF] Back to Contents
  Natalia Tomashenko, Christian Raymond, Antoine Caubriere, Renato De Mori, Yannick Esteve
Abstract: This work investigates embeddings for representing dialog history in spoken language understanding (SLU) systems. We focus on the scenario where the semantic information is extracted directly from the speech signal by a single end-to-end neural network model. We propose to integrate dialogue history into an end-to-end signal-to-concept SLU system. The dialog history is represented in the form of dialog history embedding vectors (so-called h-vectors) and is provided as additional information to end-to-end SLU models in order to improve system performance. The following three types of h-vectors are proposed and experimentally evaluated in this paper: (1) supervised-all embeddings predicting the bag of concepts expected in the answer of the user from the last dialog system response; (2) supervised-freq embeddings focusing on predicting only a selected set of semantic concepts (corresponding to the most frequent errors in our experiments); and (3) unsupervised embeddings. Experiments on the MEDIA corpus for the semantic slot filling task demonstrate that the proposed h-vectors improve model performance.
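
The abstract leaves the injection point open (input features or a hidden layer). As one concrete illustration, an utterance-level h-vector can be tiled across time and concatenated to the acoustic features; the dimensions below are invented toy values, not the paper's configuration.

    import numpy as np

    def concat_h_vector(frames, h_vector):
        """Tile a dialog-history embedding over time and append it to each frame."""
        tiled = np.tile(h_vector, (frames.shape[0], 1))
        return np.concatenate([frames, tiled], axis=1)

    frames = np.random.randn(120, 40)   # 120 frames of 40-dim acoustic features (toy)
    h_vec = np.random.randn(16)         # hypothetical 16-dim h-vector
    print(concat_h_vector(frames, h_vec).shape)   # (120, 56)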

5. Integrating Discrete and Neural Features via Mixed-feature Trans-dimensional Random Field Language Models [PDF] Back to Contents
  Silin Gao, Zhijian Ou, Wei Yang, Huifang Xu
Abstract: There has been a long recognition that discrete features (n-gram features) and neural network based features have complementary strengths for language models (LMs). Improved performance can be obtained by model interpolation, which is, however, a suboptimal two-step integration of discrete and neural features. The trans-dimensional random field (TRF) framework has the potential advantage of being able to flexibly integrate a richer set of features. However, either discrete or neural features are used alone in previous TRF LMs. This paper develops a mixed-feature TRF LM and demonstrates its advantage in integrating discrete and neural features. Various LMs are trained over PTB and Google one-billion-word datasets, and evaluated in N-best list rescoring experiments for speech recognition. Among all single LMs (i.e. without model interpolation), the mixed-feature TRF LMs perform the best, improving over both discrete TRF LMs and neural TRF LMs alone, and also being significantly better than LSTM LMs. Compared to interpolating two separately trained models with discrete and neural features respectively, the performance of mixed-feature TRF LMs matches the best interpolated model, and with simplified one-step training process and reduced training time.
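
Schematically, a mixed-feature TRF LM scores a sentence with both feature families inside a single globally normalized model, roughly of the form below (a simplified sketch consistent with the TRF LM literature; the trans-dimensional parameterization also involves length-dependent terms omitted here):

    p(x; \lambda, \theta) = \frac{1}{Z(\lambda, \theta)} \exp\big( \lambda^{\top} f(x) + \phi(x; \theta) \big)

where f(x) collects discrete n-gram feature counts with weights \lambda, \phi(x; \theta) is the neural network potential, and Z is the normalizing constant. Training both terms jointly in one model is what avoids the suboptimal two-step interpolation.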

6. A Data Efficient End-To-End Spoken Language Understanding Architecture [PDF] Back to Contents
  Marco Dinarelli, Nikita Kapoor, Bassam Jabaian, Laurent Besacier
Abstract: End-to-end architectures have recently been proposed for spoken language understanding (SLU) and semantic parsing. Based on a large amount of data, those models jointly learn acoustic and linguistic-sequential features. Such architectures give very good results for domain, intent and slot detection, but their application to a more complex semantic chunking and tagging task is less straightforward. For that reason, in many cases, models are combined with an external language model to enhance their performance. In this paper we introduce a data-efficient system which is trained end-to-end, with no additional, pre-trained external module. One key feature of our approach is an incremental training procedure where acoustic, language and semantic models are trained sequentially, one after the other. The proposed model has a reasonable size and achieves competitive results with respect to the state of the art while using a small training dataset. In particular, we reach a 24.02% Concept Error Rate (CER) on MEDIA/test while training on MEDIA/train without any additional data.
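
A minimal sketch of such an incremental schedule is given below, assuming PyTorch-style modules; the sub-network shapes, the placeholder loss, and the choice to freeze earlier stages are all illustrative assumptions, not the paper's actual training code.

    import torch
    import torch.nn as nn

    # Illustrative stand-ins for the acoustic, language, and semantic sub-networks.
    acoustic = nn.Linear(40, 64)
    language = nn.Linear(64, 64)
    semantic = nn.Linear(64, 10)
    model = nn.Sequential(acoustic, language, semantic)

    def train_stage(active, steps=100):
        """Train only the currently active sub-network; the others stay frozen."""
        for p in model.parameters():
            p.requires_grad = False
        for p in active.parameters():
            p.requires_grad = True
        opt = torch.optim.Adam([p for p in model.parameters() if p.requires_grad])
        for _ in range(steps):
            x = torch.randn(8, 40)            # toy batch of acoustic features
            loss = model(x).pow(2).mean()     # placeholder objective
            opt.zero_grad()
            loss.backward()
            opt.step()

    # Incremental procedure: acoustic first, then language, then semantic.
    for stage in (acoustic, language, semantic):
        train_stage(stage)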

7. Zero-Resource Cross-Domain Named Entity Recognition [PDF] Back to Contents
  Zihan Liu, Genta Indra Winata, Pascale Fung
Abstract: Existing models for cross-domain named entity recognition (NER) rely on numerous unlabeled corpora or labeled NER training data in target domains. However, collecting data for low-resource target domains is not only expensive but also time-consuming. Hence, we propose a cross-domain NER model that does not use any external resources. We first introduce Multi-Task Learning (MTL) by adding a new objective function to detect whether tokens are named entities or not. We then introduce a framework called Mixture of Entity Experts (MoEE) to improve the robustness of zero-resource domain adaptation. Finally, experimental results show that our model outperforms strong unsupervised cross-domain sequence labeling models, and the performance of our model is close to that of the state-of-the-art model which leverages extensive resources.
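
The abstract does not spell out how the experts are combined, but a Mixture of Entity Experts layer can be pictured as one small expert per entity type plus a gate that mixes their per-token predictions. The sketch below is an assumed, untrained toy version of that idea with invented dimensions.

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    rng = np.random.default_rng(0)
    hidden, n_labels, n_experts = 32, 5, 3   # toy sizes, e.g. PER/LOC/ORG experts
    experts = [rng.standard_normal((n_labels, hidden)) for _ in range(n_experts)]
    gate_w = rng.standard_normal((n_experts, hidden))

    def moee_logits(token_repr):
        """Gate-weighted sum of per-expert label predictions for one token."""
        gate = softmax(gate_w @ token_repr)   # confidence over the experts
        return sum(g * (W @ token_repr) for g, W in zip(gate, experts))

    print(moee_logits(rng.standard_normal(hidden)).shape)   # (5,)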

8. Understanding patient complaint characteristics using contextual clinical BERT embeddings [PDF] Back to Contents
  Budhaditya Saha, Sanal Lisboa, Shameek Ghosh
Abstract: In clinical conversational applications, extracted entities tend to capture the main subject of a patient's complaint, namely symptoms or diseases. However, they mostly fail to recognize the characterizations of a complaint such as the time, the onset, and the severity. For example, if the input is "I have a headache and it is extreme", state-of-the-art models only recognize the main symptom entity - headache, but ignore the severity factor of "extreme", which characterizes headache. In this paper, we design a two-stage approach to detect the characterizations of entities like symptoms presented by general users in contexts where they would describe their symptoms to a clinician. We use Word2Vec and BERT to encode clinical text given by the patients. We transform the output and re-frame the task as a multi-label classification problem. Finally, we combine the processed encodings with the Linear Discriminant Analysis (LDA) algorithm to classify the characterizations of the main entity. Experimental results demonstrate that our method achieves a 40-50% improvement in accuracy over the state-of-the-art models.
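
A skeletal version of the classification stage might look like the following, where pooled clinical text encodings (e.g., BERT vectors) are classified with scikit-learn's LDA; the feature dimension, label set, and random data are placeholders, and the paper's multi-label framing is reduced to a single label here for brevity.

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)
    # Placeholder stand-ins for pooled BERT encodings of patient complaints.
    X_train = rng.standard_normal((200, 768))
    y_train = rng.integers(0, 3, size=200)   # e.g. 0=mild, 1=moderate, 2=extreme

    clf = LinearDiscriminantAnalysis()
    clf.fit(X_train, y_train)
    print(clf.predict(rng.standard_normal((1, 768))))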

9. Transformers as Soft Reasoners over Language [PDF] Back to Contents
  Peter Clark, Oyvind Tafjord, Kyle Richardson
Abstract: AI has long pursued the goal of having systems reason over *explicitly provided* knowledge, but building suitable representations has proved challenging. Here we explore whether transformers can similarly learn to reason (or emulate reasoning), but using rules expressed in language, thus bypassing a formal representation. We provide the first demonstration that this is possible, and characterize the extent of this capability. To do this, we use a collection of synthetic datasets that test increasing levels of reasoning complexity (number of rules, presence of negation, and depth of chaining). We find transformers appear to learn rule-based reasoning with high (99%) accuracy on these datasets, and in a way that generalizes to test data requiring substantially deeper chaining than in the training data (95%+ scores). We also demonstrate that the models transfer well to two hand-authored rulebases, and to rulebases paraphrased into more natural language. These findings are significant as it suggests a new role for transformers, namely as a limited "soft theorem prover" operating over explicit theories in language. This in turn suggests new possibilities for explainability, correctability, and counterfactual reasoning in question-answering. All datasets and a live demo are available at this http URL
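
To make the setup concrete, an instance in the spirit of the paper's synthetic datasets pairs a natural-language rulebase with a statement to be labeled true or false; the specific sentences below are invented for illustration.

    # One (context, question, label) training instance; needs one chaining step.
    context = (
        "Alan is young. Alan is round. "
        "If someone is young and round then they are kind."
    )
    question = "Alan is kind."
    label = True

    # The transformer is trained as a binary classifier over the pair, e.g.:
    model_input = context + " [SEP] " + question   # hypothetical input format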

10. HULK: An Energy Efficiency Benchmark Platform for Responsible Natural Language Processing [PDF] Back to Contents
  Xiyou Zhou, Zhiyu Chen, Xiaoyong Jin, William Yang Wang
Abstract: Computation-intensive pretrained models have been taking the lead on many natural language processing benchmarks such as GLUE. However, energy efficiency in the process of model training and inference becomes a critical bottleneck. We introduce HULK, a multi-task energy efficiency benchmarking platform for responsible natural language processing. With HULK, we compare pretrained models' energy efficiency from the perspectives of time and cost. Baseline benchmarking results are provided for further analysis. The fine-tuning efficiency of different pretrained models can differ greatly among different tasks, and a smaller number of parameters does not necessarily imply better efficiency. We analyze this phenomenon and demonstrate a method for comparing the multi-task efficiency of pretrained models. Our platform is available at this https URL.

11. Unsupervised Speaker Adaptation using Attention-based Speaker Memory for End-to-End ASR [PDF] Back to Contents
  Leda Sarı, Niko Moritz, Takaaki Hori, Jonathan Le Roux
Abstract: We propose an unsupervised speaker adaptation method inspired by the neural Turing machine for end-to-end (E2E) automatic speech recognition (ASR). The proposed model contains a memory block that holds speaker i-vectors extracted from the training data and reads relevant i-vectors from the memory through an attention mechanism. The resulting memory vector (M-vector) is concatenated to the acoustic features or to the hidden layer activations of an E2E neural network model. The E2E ASR system is based on the joint connectionist temporal classification and attention-based encoder-decoder architecture. M-vector and i-vector results are compared for inserting them at different layers of the encoder neural network using the WSJ and TED-LIUM2 ASR benchmarks. We show that M-vectors, which do not require an auxiliary speaker embedding extraction system at test time, achieve similar word error rates (WERs) compared to i-vectors for single speaker utterances and significantly lower WERs for utterances in which there are speaker changes.
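
The memory read can be sketched as plain dot-product attention over a matrix of stored i-vectors; the memory size, dimensionality, and random values below are toys, and in the real model the query is derived from the network's hidden activations rather than drawn at random.

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    rng = np.random.default_rng(0)
    memory = rng.standard_normal((64, 100))   # 64 training-speaker i-vectors, 100-dim

    def m_vector(query):
        """Attention read: softmax-weighted combination of stored i-vectors."""
        weights = softmax(memory @ query)     # similarity of the query to each entry
        return weights @ memory               # the resulting M-vector

    utterance_query = rng.standard_normal(100)
    print(m_vector(utterance_query).shape)    # (100,)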

12. Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers [PDF] Back to Contents
  Raphaël Barman, Maud Ehrmann, Simon Clematide, Sofia Ares Oliveira, Frédéric Kaplan
Abstract: The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration. Research work seeking to automatically process facsimiles and extract information from them is multiplying, with document layout analysis as a first essential step. While the identification and categorization of segments of interest in document images have seen significant progress over the last years thanks to deep learning techniques, many challenges remain, among others the use of finer-grained segmentation typologies and the consideration of complex, heterogeneous documents such as historical newspapers. Moreover, most approaches consider visual features only, ignoring the textual signal. In this context, we introduce a multimodal approach for the semantic segmentation of historical newspapers that combines visual and textual features. Based on a series of experiments on diachronic Swiss and Luxembourgish newspapers, we investigate, among others, the predictive power of visual and textual features and their capacity to generalize across time and sources. Results show consistent improvement of multimodal models over a strong visual baseline, as well as better robustness to high material variance.

13. Exploring Chemical Space using Natural Language Processing Methodologies for Drug Discovery [PDF] Back to Contents
  Hakime Öztürk, Arzucan Özgür, Philippe Schwaller, Teodoro Laino, Elif Ozkirimli
Abstract: Text-based representations of chemicals and proteins can be thought of as unstructured languages codified by humans to describe domain-specific knowledge. Advances in natural language processing (NLP) methodologies for the processing of spoken languages have accelerated the application of NLP to elucidate hidden knowledge in textual representations of these biochemical entities and then use it to construct models to predict molecular properties or to design novel molecules. This review outlines the impact made by these advances on drug discovery and aims to further the dialogue between medicinal chemists and computer scientists.

14. Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances [PDF] Back to Contents
  Aleksei Gusev, Vladimir Volokhov, Tseren Andzhukaev, Sergey Novoselov, Galina Lavrentyeva, Marina Volkova, Alice Gazizullina, Andrey Shulipa, Artem Gorlanov, Anastasia Avdeeva, Artem Ivanov, Alexander Kozlov, Timur Pekhovsky, Yuri Matveev
Abstract: Speaker recognition systems based on deep speaker embeddings have achieved significant performance in controlled conditions, according to the results obtained for early NIST SRE (Speaker Recognition Evaluation) datasets. From the practical point of view, taking into account the increased interest in virtual assistants (such as Amazon Alexa, Google Home, Apple Siri, etc.), speaker verification on short utterances in uncontrolled noisy environment conditions is one of the most challenging and highly demanded tasks. This paper presents approaches aimed at achieving two goals: a) improve the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reduce system quality degradation for short utterances. For these purposes, we considered deep neural network architectures based on TDNN (Time Delay Neural Network) and ResNet (Residual Neural Network) blocks. We experimented with state-of-the-art embedding extractors and their training procedures. The obtained results confirm that ResNet architectures outperform the standard x-vector approach in terms of speaker verification quality for both long-duration and short-duration utterances. We also investigate the impact of the speech activity detector, different scoring models, and adaptation and score normalization techniques. The experimental results are presented for publicly available data and verification protocols for the VoxCeleb1, VoxCeleb2, and VOiCES datasets.
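
As a reference point for the scoring models mentioned above, the simplest verification back-end compares two embeddings by cosine similarity against a threshold tuned on development data; the random vectors and threshold below are placeholders purely for illustration.

    import numpy as np

    def cosine_score(enroll_emb, test_emb):
        """Cosine similarity between enrollment and test speaker embeddings."""
        a = enroll_emb / np.linalg.norm(enroll_emb)
        b = test_emb / np.linalg.norm(test_emb)
        return float(a @ b)

    rng = np.random.default_rng(0)
    score = cosine_score(rng.standard_normal(256), rng.standard_normal(256))
    same_speaker = score > 0.5   # placeholder decision threshold
    print(score, same_speaker)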

15. Query2box: Reasoning over Knowledge Graphs in Vector Space using Box Embeddings [PDF] Back to Contents
  Hongyu Ren, Weihua Hu, Jure Leskovec
Abstract: Answering complex logical queries on large-scale incomplete knowledge graphs (KGs) is a fundamental yet challenging task. Recently, a promising approach to this problem has been to embed KG entities as well as the query into a vector space such that entities that answer the query are embedded close to the query. However, prior work models queries as single points in the vector space, which is problematic because a complex query represents a potentially large set of its answer entities, but it is unclear how such a set can be represented as a single point. Furthermore, prior work can only handle queries that use conjunctions ($\wedge$) and existential quantifiers ($\exists$). Handling queries with logical disjunctions ($\vee$) remains an open problem. Here we propose query2box, an embedding-based framework for reasoning over arbitrary queries with $\wedge$, $\vee$, and $\exists$ operators in massive and incomplete KGs. Our main insight is that queries can be embedded as boxes (i.e., hyper-rectangles), where a set of points inside the box corresponds to a set of answer entities of the query. We show that conjunctions can be naturally represented as intersections of boxes and also prove a negative result that handling disjunctions would require embedding with dimension proportional to the number of KG entities. However, we show that by transforming queries into a Disjunctive Normal Form, query2box is capable of handling arbitrary logical queries with $\wedge$, $\vee$, $\exists$ in a scalable manner. We demonstrate the effectiveness of query2box on three large KGs and show that query2box achieves up to 25% relative improvement over the state of the art.
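
The geometric intuition is easy to sketch: a query box is a (center, offset) pair, a conjunction is an intersection of boxes, and answer entities are the points inside. Below is a hard-membership toy version in NumPy; the real model uses learned embeddings and soft distance-based scores, so this only illustrates the geometry.

    import numpy as np

    def intersect(box_a, box_b):
        """Axis-aligned intersection of two (center, offset) boxes."""
        lo = np.maximum(box_a[0] - box_a[1], box_b[0] - box_b[1])
        hi = np.minimum(box_a[0] + box_a[1], box_b[0] + box_b[1])
        return (lo + hi) / 2, np.maximum((hi - lo) / 2, 0.0)  # empty -> zero offset

    def contains(box, point):
        center, offset = box
        return bool(np.all(np.abs(point - center) <= offset))

    q1 = (np.array([0.0, 0.0]), np.array([2.0, 2.0]))   # box for one query branch
    q2 = (np.array([1.0, 1.0]), np.array([2.0, 2.0]))   # box for a second branch
    conj = intersect(q1, q2)                            # conjunction = intersection
    print(contains(conj, np.array([0.5, 0.5])))         # True

A disjunction is then handled, as the abstract describes, by rewriting the query into disjunctive normal form and scoring an entity against the nearest branch's box.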
