Contents
1. Transformer-based Multilingual Document Embedding Model [PDF] Back to Contents
Wei Li, Brian Mak
Abstract: One of the current state-of-the-art multilingual document embedding models is the bidirectional LSTM-based multilingual neural machine translation model (LASER). This paper presents a transformer-based sentence/document embedding model, T-LASER, which makes three significant improvements. Firstly, the BiLSTM encoder is replaced by the attention-based transformer structure, which is more capable of learning sequential patterns in longer texts. Secondly, due to the absence of recurrence, T-LASER enables faster parallel computation in the encoder when generating the text embedding. Thirdly, we augment the NMT translation loss function with an additional novel distance constraint loss, which further brings the embeddings of parallel sentences close together in the vector space; we call the T-LASER model trained with this distance constraint cT-LASER. Our cT-LASER model significantly outperforms both the BiLSTM-based LASER and the simpler transformer-based T-LASER.
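The abstract does not give the exact form of the distance constraint loss, but a common choice is to add a penalty proportional to the squared distance between the embeddings of a parallel sentence pair. The sketch below assumes that form; the function name, the L2 penalty, and the weight `lam` are all illustrative, not taken from the paper.

```python
def distance_constraint_loss(nmt_loss, src_emb, tgt_emb, lam=1.0):
    """Translation loss augmented with a penalty on the squared Euclidean
    distance between embeddings of parallel sentences (hypothetical form)."""
    dist = sum((a - b) ** 2 for a, b in zip(src_emb, tgt_emb))
    return nmt_loss + lam * dist

# Parallel sentences whose embeddings already coincide add no penalty,
# so minimizing this loss pulls translation pairs together in the space.
near = distance_constraint_loss(1.0, [0.1, 0.2], [0.1, 0.2])
far = distance_constraint_loss(1.0, [0.9, -0.5], [0.1, 0.2])
assert near < far
```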
2. UoB at SemEval-2020 Task 12: Boosting BERT with Corpus Level Information [PDF] Back to Contents
Wah Meng Lim, Harish Tayyar Madabushi
Abstract: Pre-trained language model word representations, such as BERT, have been extremely successful in several Natural Language Processing tasks, significantly improving on the state of the art. This can largely be attributed to their ability to better capture semantic information contained within a sentence. Several tasks, however, can benefit from information available at the corpus level, such as Term Frequency-Inverse Document Frequency (TF-IDF). In this work we test the effectiveness of integrating this information with BERT on the task of identifying abuse on social media, and show that doing so does indeed significantly improve performance. We participate in Sub-Task A (abuse detection), where we achieve a score within two points of the top-performing team, and in Sub-Task B (target detection), where we rank 4th of the 44 participating teams.
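The abstract does not specify how the corpus-level signal is combined with BERT; one simple scheme is to concatenate TF-IDF features with the sentence embedding before classification. The sketch below assumes that scheme, with a toy hand-rolled TF-IDF in place of a real vectorizer and a stand-in vector in place of a BERT embedding.

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute plain TF-IDF vectors over a tiny corpus (illustrative only)."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    n = len(tokenized)
    df = {w: sum(w in doc for doc in tokenized) for w in vocab}
    vectors = [
        [(Counter(doc)[w] / len(doc)) * math.log(n / df[w]) for w in vocab]
        for doc in tokenized
    ]
    return vocab, vectors

def fuse(sent_emb, tfidf_vec):
    # Concatenate the sentence embedding with corpus-level TF-IDF features,
    # giving a downstream classifier both views of the input.
    return list(sent_emb) + list(tfidf_vec)

vocab, vecs = tfidf(["you are awful", "have a nice day"])
fused = fuse([0.12, -0.40], vecs[0])  # [0.12, -0.40] stands in for a BERT vector
assert len(fused) == 2 + len(vocab)
```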
3. BabelEnconding at SemEval-2020 Task 3: Contextual Similarity as a Combination of Multilingualism and Language Models [PDF] Back to Contents
Lucas R. C. Pessutto, Tiago de Melo, Viviane P. Moreira, Altigran da Silva
Abstract: This paper describes the system submitted by our team (BabelEnconding) to SemEval-2020 Task 3: Predicting the Graded Effect of Context in Word Similarity. We propose an approach that relies on translation and multilingual language models to compute the contextual similarity between pairs of words. Our hypothesis is that evidence from additional languages can strengthen the correlation with the human-generated scores. BabelEnconding was applied to both subtasks, ranked among the top 3 in six out of eight task/language combinations, and was the highest-scoring system three times.
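One plausible reading of "combining multilingualism" is to embed the word pair in several languages (via translation) and average the per-language contextual similarities. The sketch below assumes that aggregation; the cosine measure and the averaging are illustrative, not the team's documented pipeline.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def multilingual_similarity(pairs_by_lang):
    """Average the contextual similarity of a word pair over per-language
    embedding pairs (hypothetical aggregation scheme)."""
    scores = [cosine(u, v) for u, v in pairs_by_lang]
    return sum(scores) / len(scores)

# Contextual embeddings of the same word pair obtained after translating
# the context into two languages.
score = multilingual_similarity([
    ([1.0, 0.0], [1.0, 0.0]),  # identical contexts in language A
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal contexts in language B
])
assert abs(score - 0.5) < 1e-9
```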
4. FinChat: Corpus and evaluation setup for Finnish chat conversations on everyday topics [PDF] Back to Contents
Katri Leino, Juho Leinonen, Mittul Singh, Sami Virpioja, Mikko Kurimo
Abstract: Creating open-domain chatbots requires large amounts of conversational data and related benchmark tasks to evaluate them. Standardized evaluation tasks are crucial for creating automatic evaluation metrics for model development; otherwise, comparing the models would require resource-expensive human evaluation. While chatbot challenges have recently managed to provide a plethora of such resources for English, resources in other languages are not yet available. In this work, we provide a starting point for Finnish open-domain chatbot research. We describe our collection efforts to create the Finnish chat conversation corpus FinChat, which is made available publicly. FinChat includes unscripted conversations on seven topics from people of different ages. Using this corpus, we also construct a retrieval-based evaluation task for Finnish chatbot development. We observe that off-the-shelf chatbot models trained on conversational corpora do not perform better than chance at choosing the right answer based on automatic metrics, while humans can do the same task almost perfectly. Similarly, in a human evaluation, responses to questions from the evaluation set generated by the chatbots are predominantly marked as incoherent. Thus, FinChat provides a challenging evaluation set, meant to encourage chatbot development in Finnish.
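In a retrieval-based evaluation like the one described, the model must pick the true response to a context out of a set of candidates, and a model no better than chance scores about 1/k for k candidates. A minimal sketch of that setup, with hypothetical function names and Finnish toy strings:

```python
import random

def retrieval_accuracy(model_score, examples, n_candidates=4, seed=0):
    """Fraction of examples where the model ranks the true response above
    the distractors (illustrative retrieval-based evaluation task)."""
    rng = random.Random(seed)
    correct = 0
    for context, true_resp, distractors in examples:
        candidates = [true_resp] + distractors[: n_candidates - 1]
        rng.shuffle(candidates)
        best = max(candidates, key=lambda r: model_score(context, r))
        correct += best == true_resp
    return correct / len(examples)

# A scorer that assigns random scores should land near chance (1/4 here),
# mirroring the paper's finding for off-the-shelf chatbot models.
examples = [("hei", "terve", ["ei", "joo", "kylla"]) for _ in range(1000)]
scorer_rng = random.Random(1)
acc = retrieval_accuracy(lambda c, r: scorer_rng.random(), examples)
assert 0.15 < acc < 0.35
```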
5. Victim or Perpetrator? Analysis of Violent Characters Portrayals from Movie Scripts [PDF] Back to Contents
Victor R Martinez, Krishna Somendapalli, Karan Singla, Anil Ramanakrishna, Yalda T. Uhls, Shrikanth Narayanan
Abstract: Violent content in the media can influence viewers' perception of society. For example, frequent depictions of certain demographics as victims or perpetrators of violence can shape stereotyped attitudes. We propose that computational methods can aid in the large-scale analysis of violence in movies. The method we develop characterizes aspects of violent content solely from the language used in the scripts. Thus, our method is applicable to a movie in the earlier stages of content creation, even before it is produced. This is complementary to previous works which rely on audio or video post-production. In this work, we identify stereotypes in character roles (i.e., victim, perpetrator and narrator) based on the demographics of the actor cast for that role. Our results highlight two significant differences in the frequency of portrayals as well as the demographics of the interaction between victims and perpetrators: (1) female characters appear more often as victims, and (2) perpetrators are more likely to be White if the victim is Black or Latino. To date, we are the first to show that language used in movie scripts is a strong indicator of violent content, and that there are systematic portrayals of certain demographics as victims and perpetrators in a large dataset. This offers novel computational tools to assist in creating awareness of representations in storytelling.
6. Generating Categories for Sets of Entities [PDF] Back to Contents
Shuo Zhang, Krisztian Balog, Jamie Callan
Abstract: Category systems are central components of knowledge bases, as they provide a hierarchical grouping of semantically related concepts and entities. They are a unique and valuable resource that is utilized in a broad range of information access tasks. To aid knowledge editors in the manual process of expanding a category system, this paper presents a method of generating categories for sets of entities. First, we employ neural abstractive summarization models to generate candidate categories. Next, the location within the hierarchy is identified for each candidate. Finally, structure-, content-, and hierarchy-based features are used to rank candidates and identify the most promising ones (measured in terms of specificity, hierarchy, and importance). We develop a test collection based on Wikipedia categories and demonstrate the effectiveness of the proposed approach.
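The final ranking step could be as simple as a weighted combination of the three feature families the abstract names. The sketch below assumes a linear scorer; the weights, feature values, and candidate names are invented for illustration.

```python
def rank_candidates(candidates, weights=(0.5, 0.3, 0.2)):
    """Rank candidate categories by a weighted sum of structure-, content-,
    and hierarchy-based feature scores (weights are hypothetical)."""
    def score(c):
        feats = (c["structure"], c["content"], c["hierarchy"])
        return sum(w * f for w, f in zip(weights, feats))
    return sorted(candidates, key=score, reverse=True)

# A specific category should outrank an overly generic one for the same entities.
candidates = [
    {"name": "Norwegian computer scientists",
     "structure": 0.9, "content": 0.8, "hierarchy": 0.7},
    {"name": "People",
     "structure": 0.9, "content": 0.2, "hierarchy": 0.3},
]
ranked = rank_candidates(candidates)
assert ranked[0]["name"] == "Norwegian computer scientists"
```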
7. Leveraging Historical Interaction Data for Improving Conversational Recommender System [PDF] Back to Contents
Kun Zhou, Wayne Xin Zhao, Hui Wang, Sirui Wang, Fuzheng Zhang, Zhongyuan Wang, Ji-Rong Wen
Abstract: Recently, the conversational recommender system (CRS) has become an emerging and practical research topic. Most existing CRS methods focus on learning effective preference representations for users from conversation data alone. In contrast, we take a new perspective and leverage historical interaction data to improve CRS. For this purpose, we propose a novel approach that integrates both item-based preference sequences (from historical interaction data) and attribute-based preference sequences (from conversation data) via pre-training. We carefully design two pre-training tasks to enhance information fusion between item- and attribute-based preferences. To improve learning performance, we further develop an effective negative sample generator which can produce high-quality negative samples. Experimental results on two real-world datasets demonstrate the effectiveness of our approach for improving CRS.
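The abstract emphasizes a negative sample generator but gives no details; the simplest baseline such a generator improves on is uniform sampling of items the user has never interacted with. The sketch below shows only that baseline, under invented names, as a point of reference for what "high-quality" negatives would replace.

```python
import random

def negative_sampler(user_items, all_items, k=3, seed=0):
    """Draw k negatives the user has not interacted with (simple uniform
    baseline; the paper's generator is more elaborate)."""
    rng = random.Random(seed)
    pool = [item for item in all_items if item not in user_items]
    return rng.sample(pool, k)

# The user interacted with i1 and i4, so negatives come from the other items.
negs = negative_sampler({"i1", "i4"}, ["i%d" % n for n in range(10)], k=3)
assert len(negs) == 3 and not {"i1", "i4"} & set(negs)
```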
8. Complementary Language Model and Parallel Bi-LRNN for False Trigger Mitigation [PDF] Back to Contents
Rishika Agarwal, Xiaochuan Niu, Pranay Dighe, Srikanth Vishnubhotla, Sameer Badaskar, Devang Naik
Abstract: False triggers in voice assistants are unintended invocations of the assistant, which not only degrade the user experience but may also compromise privacy. False trigger mitigation (FTM) is a process to detect false trigger events and respond appropriately to the user. In this paper, we propose a novel solution to the FTM problem by introducing a parallel ASR decoding process with a special language model trained from "out-of-domain" data sources. Such a language model is complementary to the existing language model optimized for the assistant task. A bidirectional lattice RNN (Bi-LRNN) classifier trained on the lattices generated by the complementary language model shows a $38.34\%$ relative reduction of the false trigger (FT) rate at a fixed $0.4\%$ rate of false suppression (FS) of correct invocations, compared to the current Bi-LRNN model. In addition, we propose to train a parallel Bi-LRNN model based on the decoding lattices from both language models, and examine various ways of implementing it. The resulting model reduces the false trigger rate by a further $10.8\%$.
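Note that the reported figures are relative reductions, so they compose multiplicatively rather than additively. A quick sketch of the arithmetic, using an assumed baseline FT rate of 10% (the abstract does not report the absolute baseline):

```python
def apply_relative_reduction(rate, pct):
    """Apply a relative (not absolute) percentage reduction to a rate."""
    return rate * (1 - pct / 100)

baseline = 10.0  # assumed baseline FT rate in %, for illustration only
after_bilrnn = apply_relative_reduction(baseline, 38.34)    # 38.34% relative cut
after_parallel = apply_relative_reduction(after_bilrnn, 10.8)  # further 10.8% cut
# The two cuts compound: 10% -> 6.166% -> ~5.50%, not 10% - 38.34 - 10.8 points.
assert after_parallel < after_bilrnn < baseline
```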