Contents
1. Marathi To English Neural Machine Translation With Near Perfect Corpus And Transformers [PDF] Abstract
2. Using Distributional Thesaurus Embedding for Co-hyponymy Detection [PDF] Abstract
3. Detecting Potential Topics In News Using BERT, CRF and Wikipedia [PDF] Abstract
4. Speech2Phone: A Multilingual and Text Independent Speaker Identification Model [PDF] Abstract
5. End-to-End Entity Linking and Disambiguation leveraging Word and Knowledge Graph Embeddings [PDF] Abstract
6. Trends of digitalization and adoption of big data & analytics among UK SMEs: Analysis and lessons drawn from a case study of 53 SMEs [PDF] Abstract
7. Object Relational Graph with Teacher-Recommended Learning for Video Captioning [PDF] Abstract
8. Sparse Sinkhorn Attention [PDF] Abstract
9. A Density Ratio Approach to Language Model Fusion in End-To-End Automatic Speech Recognition [PDF] Abstract

Abstracts
1. Marathi To English Neural Machine Translation With Near Perfect Corpus And Transformers [PDF] Back to contents
Swapnil Ashok Jadhav
Abstract: There have been very few attempts to benchmark the performance of state-of-the-art algorithms on the Neural Machine Translation task for Indian languages. Google, Bing, Facebook and Yandex are among the few companies that have built translation systems for some Indian languages. Among them, Google's translation results are generally considered better, based on informal inspection. Bing Translator does not even support Marathi, which has around 95 million speakers and ranks 15th in the world in terms of combined primary and secondary speakers. In this exercise, we trained and compared a variety of neural machine Marathi-to-English translators, trained with Hugging Face's BERT tokenizer and various Transformer-based architectures on Facebook's Fairseq platform, using a limited but largely correct parallel corpus, achieving better BLEU scores than Google on the Tatoeba and Wikimedia open datasets.
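The comparison against Google hinges on BLEU. As a rough illustration of how a BLEU score is computed (a simplified single-reference version with add-one smoothing on the modified n-gram precisions, not the exact implementation behind published scores):

```python
from collections import Counter
import math

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with the standard brevity penalty.

    Simplified: a single reference and add-one smoothing, so short toy
    examples score sensibly.
    """
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        overlap = sum((cand & ref).values())       # clipped n-gram matches
        total = max(sum(cand.values()), 1)
        precisions.append((overlap + 1) / (total + 1))  # add-one smoothing
    log_avg = sum(math.log(p) for p in precisions) / max_n
    bp = min(1.0, math.exp(1 - len(reference) / max(len(candidate), 1)))
    return bp * math.exp(log_avg)

cand = "the cat sat on the mat".split()
ref = "the cat sat on the mat".split()
print(bleu(cand, ref))  # → 1.0 (identical sentences)
```

A candidate sharing no n-grams with the reference scores near zero; real evaluations average such scores over a whole test corpus.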
2. Using Distributional Thesaurus Embedding for Co-hyponymy Detection [PDF] Back to contents
Abhik Jana, Nikhil Reddy Varimalla, Pawan Goyal
Abstract: Discriminating lexical relations among distributionally similar words has always been a challenge for the natural language processing (NLP) community. In this paper, we investigate whether the network embedding of a distributional thesaurus can be effectively utilized to detect co-hyponymy relations. Through extensive experiments over three benchmark datasets, we show that the vector representation obtained by applying node2vec to a distributional thesaurus outperforms the state-of-the-art models for binary classification of co-hyponymy vs. hypernymy, as well as co-hyponymy vs. meronymy, by huge margins.
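The pipeline sketched in the abstract — node2vec vectors for thesaurus nodes, then a supervised binary classifier over word pairs — can be illustrated as follows. The vectors here are random placeholders (the paper derives them by running node2vec on the distributional thesaurus), and the pair-feature design is a common choice for lexical-relation classifiers rather than the paper's exact one:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder vectors standing in for node2vec embeddings of thesaurus nodes;
# in the paper these come from node2vec run on the distributional thesaurus.
emb = {w: rng.normal(size=64) for w in ["cat", "dog", "animal", "tail"]}

def pair_features(w1, w2):
    """Feature vector for a word pair: the two embeddings concatenated,
    plus their cosine similarity, fed to a binary classifier
    (co-hyponym vs. not)."""
    v1, v2 = emb[w1], emb[w2]
    cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.concatenate([v1, v2, [cos]])

x = pair_features("cat", "dog")  # candidate co-hyponym pair
print(x.shape)  # → (129,)
```

A logistic regression or small feed-forward network over such features is a typical choice for the final binary decision.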
3. Detecting Potential Topics In News Using BERT, CRF and Wikipedia [PDF] Back to contents
Swapnil Ashok Jadhav
Abstract: For a news content distribution platform like Dailyhunt, Named Entity Recognition is a pivotal task for building better user recommendation and notification algorithms. Apart from identifying names, locations and organisations in news for 13+ Indian languages and using them in algorithms, we also need to identify n-grams which do not necessarily fit the definition of a Named Entity, yet are important. For example, "me too movement", "beef ban", "alwar mob lynching". In this exercise, given an English-language text, we try to detect case-less n-grams which convey important information and can be used as topics and/or hashtags for a news item. The model is built using Wikipedia titles data, a private English news corpus, and a BERT-Multilingual pre-trained model with a Bi-GRU and CRF architecture. It shows promising results when compared with the industry-best Flair, spaCy and Stanford caseless NER in terms of F1 and especially recall.
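The CRF layer on top of the Bi-GRU selects the best tag sequence at decoding time with the Viterbi algorithm. A minimal numpy sketch (toy scores; in the real model the emission scores come from BERT features passed through the Bi-GRU, and the transition matrix is learned):

```python
import numpy as np

def viterbi(emissions, transitions):
    """Viterbi decoding for a linear-chain CRF layer.

    emissions:   (seq_len, num_tags) per-token tag scores.
    transitions: (num_tags, num_tags) tag-transition scores.
    Returns the highest-scoring tag sequence.
    """
    seq_len, num_tags = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((seq_len, num_tags), dtype=int)
    for t in range(1, seq_len):
        # cand[i, j]: best score ending in tag i at t-1, then tag j at t
        cand = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    best = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):  # follow backpointers
        best.append(int(backptr[t][best[-1]]))
    return best[::-1]

# Toy 3-token example with tags {0: O, 1: TOPIC}
em = np.array([[2.0, 0.0], [0.0, 2.0], [0.0, 2.0]])
tr = np.zeros((2, 2))
print(viterbi(em, tr))  # → [0, 1, 1]
```

The CRF's value over per-token argmax is exactly the transition matrix, which lets the model penalize implausible tag sequences.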
4. Speech2Phone: A Multilingual and Text Independent Speaker Identification Model [PDF] Back to contents
Edresson Casanova, Arnaldo Candido Junior, Christopher Shulby, Hamilton Pereira da Silva, Pedro Luiz de Paula Filho, Alessandro Ferreira Cordeiro, Victor de Oliveira Guedes, Sandra Maria Aluisio
Abstract: Voice recognition is an area with wide application potential. Speaker identification is useful in several voice recognition tasks, as seen in voice-based authentication, transcription systems and intelligent personal assistants. Some tasks benefit from open-set models, which can handle new speakers without the need for retraining. Audio embeddings for speaker identification are one proposal to solve this issue. However, choosing a suitable model is a difficult task, especially when training resources are scarce. Besides, it is not always clear whether embeddings are as good as more traditional methods. In this work, we propose Speech2Phone and compare several embedding models for open-set speaker identification, as well as traditional closed-set models. The models were investigated in the scenario of small datasets, which makes them more applicable to languages in which data scarcity is an issue. The results show that embeddings generated by artificial neural networks are competitive with classical approaches to the task. Considering a testing dataset composed of 20 speakers, the best models reach accuracies of 100% and 76.96% for closed- and open-set scenarios, respectively. Results suggest that the models can perform language-independent speaker identification. Among the tested models, a fully connected one, presented here as Speech2Phone, led to the highest accuracy. Furthermore, the models were tested on different languages, showing that the knowledge learned was successfully transferred to languages both close to and distant from Portuguese (in terms of vocabulary). Finally, the models can scale and can handle more speakers than they were trained for, identifying 150% more speakers while still maintaining 55% accuracy.
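Open-set identification with audio embeddings typically reduces to nearest-neighbour matching against enrolled speakers, with a rejection threshold for unknown voices. A hedged sketch (the toy 2-D embeddings and the threshold value are illustrative stand-ins for Speech2Phone outputs, not the paper's configuration):

```python
import numpy as np

def identify(test_emb, enrolled, threshold=0.6):
    """Open-set speaker identification by cosine similarity of embeddings.

    enrolled: dict mapping speaker id -> enrollment embedding. New speakers
    can be added to this dict without retraining, which is the appeal of
    open-set embedding models. Returns the best-matching speaker, or None
    if the best similarity falls below the threshold (unknown voice).
    """
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    best, best_sim = None, -1.0
    for spk, emb in enrolled.items():
        sim = cos(test_emb, emb)
        if sim > best_sim:
            best, best_sim = spk, sim
    return best if best_sim >= threshold else None

# Toy embeddings standing in for Speech2Phone model outputs
enrolled = {"alice": np.array([1.0, 0.0]), "bob": np.array([0.0, 1.0])}
print(identify(np.array([0.9, 0.1]), enrolled))    # → alice
print(identify(np.array([-1.0, -1.0]), enrolled))  # → None (unknown speaker)
```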
5. End-to-End Entity Linking and Disambiguation leveraging Word and Knowledge Graph Embeddings [PDF] Back to contents
Rostislav Nedelchev, Debanjan Chaudhuri, Jens Lehmann, Asja Fischer
Abstract: Entity linking - connecting entity mentions in a natural language utterance to knowledge graph (KG) entities - is a crucial step for question answering over KGs. It is often based on measuring the string similarity between the entity label and its mention in the question. The relation referred to in the question can help to disambiguate between entities with the same label. This can be misleading if an incorrect relation has been identified in the relation linking step. However, an incorrect relation may still be semantically similar to the relation with which the correct entity forms a triple within the KG; this could be captured by the similarity of their KG embeddings. Based on this idea, we propose the first end-to-end neural network approach that employs KG as well as word embeddings to perform joint relation and entity classification of simple questions, while implicitly performing entity disambiguation with the help of a novel gating mechanism. An empirical evaluation shows that the proposed approach achieves performance comparable to state-of-the-art entity linking while requiring less post-processing.
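The gating mechanism is only named in the abstract, so the following is an illustrative interpretation rather than the paper's architecture: a learned sigmoid gate blending the word-embedding and KG-embedding views of a candidate (the weights `w`, `b` and the scalar-gate form are assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_combine(word_emb, kg_emb, w, b):
    """Gated fusion of a word-embedding view and a KG-embedding view.

    A scalar gate g in (0, 1) is computed from both inputs; the output
    interpolates between the two representations. w and b would be
    learned end-to-end with the rest of the network.
    """
    g = sigmoid(np.concatenate([word_emb, kg_emb]) @ w + b)
    return g * word_emb + (1 - g) * kg_emb

word = np.array([1.0, 0.0])
kg = np.array([0.0, 1.0])
w = np.zeros(4)
b = 0.0
print(gated_combine(word, kg, w, b))  # → [0.5 0.5] (gate neutral at 0.5)
```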
6. Trends of digitalization and adoption of big data & analytics among UK SMEs: Analysis and lessons drawn from a case study of 53 SMEs [PDF] Back to contents
Muhidin Mohamed
Abstract: Small and Medium Enterprises (SMEs) now generate digital data at an unprecedented rate from online transactions, social media marketing and associated customer interactions, online product or service reviews and feedback, clinical diagnosis, Internet of Things (IoT) sensors, and production processes. All these forms of data can be transformed into monetary value if put into a proper data value chain. This requires both skills and IT investments for the long-term benefit of businesses. However, such spending is beyond the capacity of most SMEs due to their limited resources and restricted access to finances. This paper presents lessons learned from a case study of 53 UK SMEs, mostly from the West Midlands region of England, supported as part of a 3-year ERDF project, Big Data Corridor, in the areas of big data management, analytics and related IT issues. Based on our study's sample companies, several perspectives including the digital technology trends, challenges facing the UK SMEs, and the state of their adoption in data analytics and big data, are presented in the paper.
7. Object Relational Graph with Teacher-Recommended Learning for Video Captioning [PDF] Back to contents
Ziqi Zhang, Yaya Shi, Chunfeng Yuan, Bing Li, Peijin Wang, Weiming Hu, Zhengjun Zha
Abstract: Taking full advantage of information from both vision and language is critical for the video captioning task. Existing models lack adequate visual representation due to the neglect of interaction between objects, and lack sufficient training for content-related words due to long-tailed problems. In this paper, we propose a complete video captioning system including both a novel model and an effective training strategy. Specifically, we propose an object relational graph (ORG) based encoder, which captures more detailed interaction features to enrich visual representation. Meanwhile, we design a teacher-recommended learning (TRL) method to make full use of a successful external language model (ELM) to integrate abundant linguistic knowledge into the caption model. The ELM generates more semantically similar word proposals, which extend the ground-truth words used for training to deal with the long-tailed problem. Experimental evaluations on three benchmarks, MSVD, MSR-VTT and VATEX, show that the proposed ORG-TRL system achieves state-of-the-art performance. Extensive ablation studies and visualizations illustrate the effectiveness of our system.
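One way to read the TRL idea is as soft training targets that mix the one-hot ground-truth word with the ELM's distribution over semantically similar proposals. The sketch below is an illustrative interpretation (the mixing weight `alpha` and simple linear interpolation are assumptions, not the paper's exact scheme):

```python
import numpy as np

def trl_targets(gt_index, elm_probs, vocab_size, alpha=0.7):
    """Soft targets in the spirit of teacher-recommended learning.

    Mixes the one-hot ground-truth word with the external language
    model's probabilities, so rare (long-tailed) words get gradient
    signal from the teacher's semantically similar proposals.
    alpha is a hypothetical mixing weight.
    """
    onehot = np.zeros(vocab_size)
    onehot[gt_index] = 1.0
    return alpha * onehot + (1 - alpha) * elm_probs

elm = np.array([0.1, 0.6, 0.3])  # ELM proposal distribution at this step
t = trl_targets(1, elm, 3)
print(t.round(2))  # → [0.03 0.88 0.09]
```

Training then minimizes cross-entropy against `t` instead of the one-hot target, exactly as in label smoothing but with an informed teacher distribution.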
8. Sparse Sinkhorn Attention [PDF] Back to contents
Yi Tay, Dara Bahri, Liu Yang, Donald Metzler, Da-Cheng Juan
Abstract: We propose Sparse Sinkhorn Attention, a new efficient and sparse method for learning to attend. Our method is based on differentiable sorting of internal representations. Concretely, we introduce a meta sorting network that learns to generate latent permutations over sequences. Given sorted sequences, we are then able to compute quasi-global attention with only local windows, improving the memory efficiency of the attention module. To this end, we propose new algorithmic innovations such as Causal Sinkhorn Balancing and SortCut, a dynamic sequence truncation method for tailoring Sinkhorn Attention for encoding and/or decoding purposes. Via extensive experiments on algorithmic seq2seq sorting, language modeling, pixel-wise image generation, document classification and natural language inference, we demonstrate that our memory efficient Sinkhorn Attention method is competitive with vanilla attention and consistently outperforms recently proposed efficient Transformer models such as Sparse Transformers.
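The core primitive is Sinkhorn normalization: repeatedly normalizing the rows and columns of a score matrix yields an approximately doubly stochastic matrix, a differentiable relaxation of a permutation, which is what lets the meta sorting network learn latent permutations over sequence blocks. A minimal numpy sketch:

```python
import numpy as np

def sinkhorn(log_scores, n_iters=20):
    """Sinkhorn normalization in log space.

    Alternating row and column normalization of exp(log_scores)
    converges toward a doubly stochastic matrix (all rows and
    columns summing to 1) - a soft, differentiable permutation.
    """
    z = log_scores.copy()
    for _ in range(n_iters):
        z = z - np.log(np.exp(z).sum(axis=1, keepdims=True))  # rows -> 1
        z = z - np.log(np.exp(z).sum(axis=0, keepdims=True))  # cols -> 1
    return np.exp(z)

rng = np.random.default_rng(0)
p = sinkhorn(rng.normal(size=(4, 4)))
print(p.sum(axis=0).round(3), p.sum(axis=1).round(3))  # all sums ≈ 1
```

In the attention module this soft permutation reorders sequence blocks so that local windows over the sorted sequence approximate global attention.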
9. A Density Ratio Approach to Language Model Fusion in End-To-End Automatic Speech Recognition [PDF] Back to contents
Erik McDermott, Hasim Sak, Ehsan Variani
Abstract: This article describes a density ratio approach to integrating external Language Models (LMs) into end-to-end models for Automatic Speech Recognition (ASR). Applied to a Recurrent Neural Network Transducer (RNN-T) ASR model trained on a given domain, a matched in-domain RNN-LM, and a target domain RNN-LM, the proposed method uses Bayes' Rule to define RNN-T posteriors for the target domain, in a manner directly analogous to the classic hybrid model for ASR based on Deep Neural Networks (DNNs) or LSTMs in the Hidden Markov Model (HMM) framework (Bourlard & Morgan, 1994). The proposed approach is evaluated in cross-domain and limited-data scenarios, for which a significant amount of target domain text data is used for LM training, but only limited (or no) {audio, transcript} training data pairs are used to train the RNN-T. Specifically, an RNN-T model trained on paired audio & transcript data from YouTube is evaluated for its ability to generalize to Voice Search data. The Density Ratio method was found to consistently outperform the dominant approach to LM and end-to-end ASR integration, Shallow Fusion.
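The density ratio follows from Bayes' rule: the source-domain LM probability is divided out of the RNN-T score and the target-domain LM probability multiplied in, whereas shallow fusion only adds the target LM term. In log space this looks like the sketch below (a single shared interpolation weight is used here for illustration; the paper tunes the LM weights separately):

```python
def density_ratio_score(log_p_rnnt, log_p_source_lm, log_p_target_lm,
                        lm_weight=0.5):
    """Density-ratio fusion score for one hypothesis.

    log P(y|x) ~ log P_RNNT(y|x) + w * (log P_target(y) - log P_source(y))
    Subtracting the source-domain LM removes the training-domain language
    prior implicitly baked into the end-to-end model before adding the
    target-domain prior. lm_weight is an illustrative shared weight.
    """
    return log_p_rnnt + lm_weight * (log_p_target_lm - log_p_source_lm)

# A hypothesis the target-domain LM prefers over the source LM is boosted
print(density_ratio_score(-5.0, -8.0, -6.0))  # → -4.0
```

With shallow fusion the same hypothesis would only receive the `+ w * log_p_target_lm` term, with no correction for the source-domain prior.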