Contents
1. Rapid Adaptation of BERT for Information Extraction on Domain-Specific Business Documents [PDF] Abstract
2. Automatic Location Type Classification From Social-Media Posts [PDF] Abstract
3. Discontinuous Constituent Parsing with Pointer Networks [PDF] Abstract
4. K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters [PDF] Abstract
5. Multi-Fusion Chinese WordNet (MCW): Compound of Machine Learning and Manual Correction [PDF] Abstract
6. Parsing as Pretraining [PDF] Abstract
7. Identification of Indian Languages using Ghost-VLAD pooling [PDF] Abstract
9. Generalizing meanings from partners to populations: Hierarchical inference supports convention formation on networks [PDF] Abstract
10. If I Hear You Correctly: Building and Evaluating Interview Chatbots with Active Listening Skills [PDF] Abstract
Abstracts
1. Rapid Adaptation of BERT for Information Extraction on Domain-Specific Business Documents [PDF] Back to Contents
Ruixue Zhang, Wei Yang, Luyun Lin, Zhengkai Tu, Yuqing Xie, Zihang Fu, Yuhao Xie, Luchen Tan, Kun Xiong, Jimmy Lin
Abstract: Techniques for automatically extracting important content elements from business documents such as contracts, statements, and filings have the potential to make business operations more efficient. This problem can be formulated as a sequence labeling task, and we demonstrate the adaptation of BERT to two types of business documents: regulatory filings and property lease agreements. There are aspects of this problem that make it easier than "standard" information extraction tasks and other aspects that make it more difficult, but on balance we find that modest amounts of annotated data (less than 100 documents) are sufficient to achieve reasonable accuracy. We integrate our models into an end-to-end cloud platform that provides both an easy-to-use annotation interface and an inference interface that allows users to upload documents and inspect model outputs.
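As a rough illustration of the sequence-labeling formulation described above, the sketch below sets up BERT for token classification with the Hugging Face transformers library. The BIO label set, model checkpoint, and example sentence are placeholder assumptions rather than the paper's actual schema, and the classification head here is untrained.

```python
# Hypothetical sketch: BERT as a sequence labeler for content elements in
# business documents. The label inventory is invented for illustration; the
# untrained head emits arbitrary labels until fine-tuned on annotated docs.
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

LABELS = ["O", "B-PARTY", "I-PARTY", "B-DATE", "I-DATE"]  # placeholder BIO tags

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(LABELS)
)

text = "This lease agreement commences on January 1, 2020."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # (1, seq_len, num_labels)
pred = logits.argmax(dim=-1).squeeze(0)
for tok, label_id in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]), pred):
    print(tok, LABELS[label_id])
```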
2. Automatic Location Type Classification From Social-Media Posts [PDF] Back to Contents
Elad Kravi, Benny Kimelfeld, Yaron Kanza, Roi Reichart
Abstract: We introduce the problem of Automatic Location Type Classification from social media posts. Our goal is to correctly associate a set of messages posted in a small radius around a given location with their corresponding location type, e.g., school, church, restaurant or museum. We provide a dataset of locations associated with tweets posted in close geographical proximity. We explore two approaches to the problem: (a) a pipeline approach, where each message is first classified and the location type is then inferred from the individual message labels; and (b) a joint approach, where the individual messages are simultaneously processed to yield the desired location type. Our results demonstrate the superiority of the joint approach. Moreover, we show that due to the unique structure of the problem, where weakly-related messages are jointly processed to yield a single final label, simpler linear classifiers outperform deep neural network alternatives that have shown superior performance in previous text classification tasks.
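To make the pipeline-versus-joint distinction concrete, here is a minimal sketch of the joint formulation with scikit-learn: all messages posted near a location are merged into one document and classified with a linear model, in line with the paper's finding that simple linear classifiers do well here. The toy data is invented for illustration.

```python
# Joint formulation sketch: merge each location's message set into one
# document, then classify with TF-IDF + logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each training example: (messages posted near one location, location type).
locations = [
    (["homework due tomorrow", "study hall is packed", "exam week again"], "school"),
    (["great pasta tonight", "the chef's special was amazing"], "restaurant"),
    (["sunday service at 10am", "beautiful choir today"], "church"),
]
docs = [" ".join(msgs) for msgs, _ in locations]   # joint: one doc per location
types = [t for _, t in locations]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(docs, types)
print(clf.predict([" ".join(["quiet reading room", "pop quiz on friday"])]))
```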
3. Discontinuous Constituent Parsing with Pointer Networks [PDF] Back to Contents
Daniel Fernández-González, Carlos Gómez-Rodríguez
Abstract: Among the most complex syntactic representations used in computational linguistics and NLP are discontinuous constituent trees, crucial for representing all grammatical phenomena of languages such as German. Recent advances in dependency parsing have shown that Pointer Networks excel at efficiently parsing syntactic relations between words in a sentence. This kind of sequence-to-sequence model achieves outstanding accuracy in building non-projective dependency trees, but its potential has not yet been proven on a more difficult task. We propose a novel neural network architecture that, by means of Pointer Networks, is able to generate the most accurate discontinuous constituent representations to date, even without the need for Part-of-Speech tagging information. To do so, we internally model discontinuous constituent structures as augmented non-projective dependency structures. The proposed approach achieves state-of-the-art results on the two widely-used NEGRA and TIGER benchmarks, outperforming previous work by a wide margin.
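The core mechanism is the pointer: at each decoding step, attention over the encoder states selects a position in the input sentence (for example, the word another word attaches to). A minimal, hypothetical PyTorch version of one such step, with arbitrary dimensions, might look like this; it is a sketch of the general Pointer Network scoring function, not the paper's architecture.

```python
# One pointer-network decoding step: additive attention over encoder states
# yields a distribution over input positions to "point" at.
import torch
import torch.nn as nn

class PointerStep(nn.Module):
    def __init__(self, hidden: int):
        super().__init__()
        self.w_enc = nn.Linear(hidden, hidden, bias=False)
        self.w_dec = nn.Linear(hidden, hidden, bias=False)
        self.v = nn.Linear(hidden, 1, bias=False)

    def forward(self, enc: torch.Tensor, dec: torch.Tensor) -> torch.Tensor:
        # enc: (seq_len, hidden) one state per word; dec: (hidden,) current word
        scores = self.v(torch.tanh(self.w_enc(enc) + self.w_dec(dec))).squeeze(-1)
        return scores.softmax(dim=-1)        # distribution over input positions

enc_states = torch.randn(6, 128)             # toy sentence of 6 words
dec_state = torch.randn(128)
attn = PointerStep(128)(enc_states, dec_state)
print("pointed-at position:", attn.argmax().item())
```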
4. K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters [PDF] Back to Contents
Ruize Wang, Duyu Tang, Nan Duan, Zhongyu Wei, Xuanjing Huang, Jianshu ji, Cuihong Cao, Daxin Jiang, Ming Zhou
Abstract: We study the problem of injecting knowledge into large pre-trained models like BERT and RoBERTa. Existing methods typically update the original parameters of pre-trained models when injecting knowledge. However, when multiple kinds of knowledge are injected, they may suffer from the problem of catastrophic forgetting. To address this, we propose K-Adapter, which keeps the original parameters of the pre-trained model fixed and supports continual knowledge infusion. Taking RoBERTa as the pre-trained model, K-Adapter has a neural adapter for each kind of infused knowledge, like a plug-in connected to RoBERTa. There is no information flow between different adapters, so different adapters can be trained efficiently in a distributed way. We inject two kinds of knowledge, including factual knowledge obtained from automatically aligned text-triplets on Wikipedia and Wikidata, and linguistic knowledge obtained from dependency parsing. Results on three knowledge-driven tasks (six datasets in total) including relation classification, entity typing and question answering demonstrate that each adapter improves the performance, and the combination of both adapters brings further improvements. Probing experiments further show that K-Adapter captures richer factual and commonsense knowledge than RoBERTa.
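A simplified reading of the adapter idea, not the authors' implementation: the backbone's weights are frozen, each kind of knowledge gets its own small residual adapter trained independently, and adapter outputs are concatenated for downstream tasks. Module sizes below are illustrative.

```python
# Sketch: frozen backbone + independently trained knowledge adapters.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter applied to frozen encoder hidden states."""
    def __init__(self, hidden: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(torch.relu(self.down(h)))  # residual connection

encoder = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
for p in encoder.parameters():
    p.requires_grad = False   # backbone stays fixed: no catastrophic forgetting

factual_adapter = Adapter(768)     # trained on factual knowledge
linguistic_adapter = Adapter(768)  # trained separately on linguistic knowledge

x = torch.randn(2, 16, 768)        # (batch, seq_len, hidden)
h = encoder(x)
combined = torch.cat([factual_adapter(h), linguistic_adapter(h)], dim=-1)
print(combined.shape)              # concatenated features for downstream tasks
```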
5. Multi-Fusion Chinese WordNet (MCW): Compound of Machine Learning and Manual Correction [PDF] Back to Contents
Mingchen Li, Zili Zhou, Yanna Wang
Abstract: Princeton WordNet (PWN) is a lexical-semantic network grounded in cognitive linguistics that has promoted the development of natural language processing. Based on PWN, five Chinese wordnets have been developed to address problems of syntax and semantics. They include: Northeastern University Chinese WordNet (NEW), Sinica Bilingual Ontological WordNet (BOW), Southeast University Chinese WordNet (SEW), Taiwan University Chinese WordNet (CWN), and Chinese Open WordNet (COW). Using them, we found that these wordnets have low accuracy and coverage and cannot completely portray the semantic network of PWN. We therefore built a new Chinese wordnet, Multi-Fusion Chinese WordNet (MCW), to make up for these shortcomings. The key idea is to extend SEW with the help of the Oxford bilingual dictionary and the Xinhua bilingual dictionary, and then correct it; more specifically, we used machine learning and manual adjustment in our corrections, guided by two standards formulated to support the work. We conducted experiments on three tasks, relatedness calculation, word similarity, and word sense disambiguation, to compare lemma accuracy; coverage was also compared. The results indicate that MCW gains in both coverage and accuracy from our method. However, it still has room for improvement, especially with lemmas. In the future, we will continue to enhance the accuracy of MCW and expand the concepts in it.
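As a hedged illustration of the kind of lemma-coverage comparison described above, the snippet below counts how many words from a toy list have at least one synset. It uses NLTK's Open Multilingual WordNet as a stand-in resource, since MCW itself is not distributed with NLTK; the word list is invented.

```python
# Toy lemma-coverage check against a Chinese wordnet resource (here, the
# Open Multilingual WordNet via NLTK, as a stand-in for MCW/SEW/etc.).
import nltk
nltk.download("wordnet", quiet=True)
nltk.download("omw-1.4", quiet=True)
from nltk.corpus import wordnet as wn

test_lemmas = ["狗", "学校", "音乐"]  # invented test word list
covered = [w for w in test_lemmas if wn.synsets(w, lang="cmn")]
print(f"coverage: {len(covered)}/{len(test_lemmas)}")
```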
6. Parsing as Pretraining [PDF] Back to Contents
David Vilares, Michalina Strzyz, Anders Søgaard, Carlos Gómez-Rodríguez
Abstract: Recent analyses suggest that encoders pretrained for language modeling capture certain morpho-syntactic structure. However, probing frameworks for word vectors still do not report results on standard setups such as constituent and dependency parsing. This paper addresses this problem and does full parsing (on English) relying only on pretraining architectures -- and no decoding. We first cast constituent and dependency parsing as sequence tagging. We then use a single feed-forward layer to directly map word vectors to labels that encode a linearized tree. This is used to: (i) see how far we can reach on syntax modelling with just pretrained encoders, and (ii) shed some light on the syntax-sensitivity of different word vectors (by freezing the weights of the pretraining network during training). For evaluation, we use bracketing F1-score and LAS, and analyze in-depth differences across representations for span lengths and dependency displacements. The overall results surpass existing sequence tagging parsers on the PTB (93.5%) and end-to-end EN-EWT UD (78.8%).
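The probing setup reduces to a frozen pretrained encoder plus a single trained feed-forward layer that maps each word vector to a tag from a linearized-tree label set. A minimal sketch, with a placeholder tag-inventory size and checkpoint:

```python
# Frozen encoder + single linear probe over per-token vectors. The tag set
# size (200) and checkpoint are placeholder assumptions for illustration.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
encoder = AutoModel.from_pretrained("bert-base-cased")
for p in encoder.parameters():
    p.requires_grad = False  # freeze: measure what pretraining alone captures

num_tree_tags = 200          # placeholder size of the linearized-tree label set
probe = nn.Linear(encoder.config.hidden_size, num_tree_tags)  # only trained part

inputs = tokenizer("The cat sat on the mat", return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state
tag_logits = probe(hidden)   # (1, seq_len, num_tree_tags)
print(tag_logits.shape)
```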
7. Identification of Indian Languages using Ghost-VLAD pooling [PDF] Back to Contents
Krishna D N, Ankita Patil, M.S.P Raj, Sai Prasad H S, Prabhu Aashish Garapati
Abstract: In this work, we propose a new pooling strategy for language identification, considering Indian languages. The idea is to obtain utterance-level features from audio of any length for robust language recognition. We use the GhostVLAD approach to generate an utterance-level feature vector for variable-length input audio by aggregating local frame-level features across time. The generated feature vector is shown to have very good language-discriminative properties and helps achieve state-of-the-art results on the language identification task. We conduct our experiments on 635 hours of audio data for 7 Indian languages. Our method outperforms the previous state-of-the-art x-vector [11] method by an absolute improvement of 1.88% in F1-score and achieves a 98.43% F1-score on the held-out test data. We compare our system with various pooling approaches and show that GhostVLAD is the best pooling approach for this task. We also provide visualizations of the utterance-level embeddings generated using Ghost-VLAD pooling and show that this method creates embeddings with very good language-discriminative features.
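A minimal GhostVLAD-style pooling layer, sketched in PyTorch under assumed hyperparameters (cluster counts and feature dimensions are illustrative, not the paper's configuration): frame-level features are softly assigned to clusters, residuals are aggregated per cluster, and "ghost" clusters absorb noisy frames but are dropped from the fixed-length output.

```python
# GhostVLAD-style pooling: variable-length frame features -> fixed vector.
import torch
import torch.nn as nn

class GhostVLAD(nn.Module):
    def __init__(self, dim: int, clusters: int = 8, ghosts: int = 2):
        super().__init__()
        self.clusters = clusters
        total = clusters + ghosts
        self.assign = nn.Linear(dim, total)            # soft-assignment scores
        self.centroids = nn.Parameter(torch.randn(total, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, dim) frame-level features of one utterance
        a = self.assign(x).softmax(dim=-1)             # (frames, clusters+ghosts)
        residuals = x.unsqueeze(1) - self.centroids    # (frames, total, dim)
        vlad = (a.unsqueeze(-1) * residuals).sum(0)    # (total, dim)
        vlad = vlad[: self.clusters]                   # drop ghost clusters
        return nn.functional.normalize(vlad.flatten(), dim=0)

frames = torch.randn(300, 40)   # e.g., 300 frames of 40-dim filterbank features
utt_vec = GhostVLAD(40)(frames)
print(utt_vec.shape)            # fixed length regardless of audio duration
```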