Table of Contents
1. A Pattern-mining Driven Study on Differences of Newspapers in Expressing Temporal Information [PDF] Abstract
4. Generating Intelligible Plumitifs Descriptions: Use Case Application with Ethical Considerations [PDF] Abstract
7. Two-Way Neural Machine Translation: A Proof of Concept for Bidirectional Translation Modeling using a Two-Dimensional Grid [PDF] Abstract
9. Unequal Representations: Analyzing Intersectional Biases in Word Embeddings Using Representational Similarity Analysis [PDF] Abstract
10. Tackling Domain-Specific Winograd Schemas with Knowledge-Based Reasoning and Machine Learning [PDF] Abstract
11. Picking BERT's Brain: Probing for Linguistic Dependencies in Contextualized Embeddings Using Representational Similarity Analysis [PDF] Abstract
14. Dual Supervision Framework for Relation Extraction with Distant Supervision and Human Annotation [PDF] Abstract
16. Advancing Humor-Focused Sentiment Analysis through Improved Contextualized Embeddings and Model Architecture [PDF] Abstract
18. Using Machine Learning and Natural Language Processing Techniques to Analyze and Support Moderation of Student Book Discussions [PDF] Abstract
19. Does BERT Understand Sentiment? Leveraging Comparisons Between Contextual and Non-Contextual Embeddings to Improve Aspect-Based Sentiment Models [PDF] Abstract
Abstracts
1. A Pattern-mining Driven Study on Differences of Newspapers in Expressing Temporal Information [PDF] Back to contents
Yingxue Fu, Elaine Ui Dhonnchadha
Abstract: This paper studies the differences between different types of newspapers in expressing temporal information, which is a topic that has not received much attention. Techniques from the fields of temporal processing and pattern mining are employed to investigate this topic. First, a corpus annotated with temporal information is created by the author. Then, sequences of temporal information tags mixed with part-of-speech tags are extracted from the corpus. The TKS algorithm is used to mine skip-gram patterns from the sequences. With these patterns, the signatures of the four newspapers are obtained. In order to make the signatures uniquely characterize the newspapers, we revise the signatures by removing reference patterns. Through examining the number of patterns in the signatures and revised signatures, the proportion of patterns containing temporal information tags and the specific patterns containing temporal information tags, it is found that newspapers differ in ways of expressing temporal information.
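To make the idea of mining skip-gram patterns from tag sequences concrete, here is a minimal sketch. It is not the TKS algorithm named in the abstract: it simply enumerates short skip-grams and ranks them by how many sequences contain them. The tag names (`DATE`, `NN`, `VBD`), the function names, and the parameters `n`, `max_skip`, and `k` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def skipgrams(seq, n=2, max_skip=1):
    """Return the set of n-item subsequences of seq that skip at most max_skip items in total."""
    found = set()
    for start in range(len(seq)):
        # Positions that may follow `start` while staying within the allowed span.
        window = range(start + 1, min(len(seq), start + n + max_skip))
        for rest in combinations(window, n - 1):
            found.add((seq[start],) + tuple(seq[i] for i in rest))
    return found


def top_k_patterns(sequences, k=3, n=2, max_skip=1):
    """Rank skip-gram patterns by support (number of sequences containing them)."""
    support = Counter()
    for seq in sequences:
        support.update(skipgrams(seq, n, max_skip))  # each pattern counted once per sequence
    # Sort by descending support, breaking ties alphabetically for a stable result.
    return sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical tag sequences mixing a temporal tag (DATE) with part-of-speech tags.
toy_sequences = [["DATE", "NN", "VBD"], ["DATE", "VBD"], ["NN", "VBD"]]
patterns = top_k_patterns(toy_sequences, k=2)
```

Under these assumptions, a newspaper's "signature" would be the set of top-ranked patterns mined from its own sequences; removing reference patterns then amounts to a set difference against patterns mined from a reference collection.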
2. Cross-Document Event Coreference Resolution Beyond Corpus-Tailored Systems [PDF] Back to contents
Michael Bugert, Nils Reimers, Iryna Gurevych
Abstract: Cross-document event coreference resolution (CDCR) is an NLP task in which mentions of events need to be identified and clustered throughout a collection of documents. CDCR aims to benefit downstream multi-document applications, but despite recent progress on corpora and model development, downstream improvements from applying CDCR have not been shown yet. The reason lies in the fact that every CDCR system released to date was developed, trained, and tested only on a single respective corpus. This raises strong concerns on their generalizability --- a must-have for downstream applications where the magnitude of domains or event mentions is likely to exceed those found in a curated corpus. To approach this issue, we define a uniform evaluation setup involving three CDCR corpora: ECB+, the Gun Violence Corpus and the Football Coreference Corpus (which we reannotate on token level to make our analysis possible). We compare a corpus-independent, feature-based system against a recent neural system developed for ECB+. Whilst being inferior in absolute numbers, the feature-based system shows more consistent performance across all corpora whereas the neural system is hit-and-miss. Via model introspection, we find that the importance of event actions, event time, etc. for resolving coreference in practice varies greatly between the corpora. Additional analysis shows that several systems overfit on the structure of the ECB+ corpus. We conclude with recommendations on how to move beyond corpus-tailored CDCR systems in the future -- the most important being that evaluation on multiple CDCR corpora is strongly necessary. To facilitate future research, we release our dataset, annotation guidelines, and model implementation to the public.
3. Neural Text Classification by Jointly Learning to Cluster and Align [PDF] Back to contents
Yekun Chai, Haidong Zhang, Shuo Jin
Abstract: Distributional text clustering delivers semantically informative representations and captures the relevance between each word and semantic clustering centroids. We extend the neural text clustering approach to text classification tasks by inducing cluster centers via a latent variable model and interacting with distributional word embeddings, to enrich the representation of tokens and measure the relatedness between tokens and each learnable cluster centroid. The proposed method jointly learns word clustering centroids and clustering-token alignments, achieving state-of-the-art results on multiple benchmark datasets and proving that the proposed cluster-token alignment mechanism is indeed favorable to text classification. Notably, our qualitative analysis has conspicuously illustrated that text representations learned by the proposed model accord well with our intuition.
4. Generating Intelligible Plumitifs Descriptions: Use Case Application with Ethical Considerations [PDF] Back to contents
David Beauchemin, Nicolas Garneau, Eve Gaumond, Pierre-Luc Déziel, Richard Khoury, Luc Lamontagne
Abstract: Plumitifs (dockets) were initially a tool for law clerks. Nowadays, they are used as summaries presenting all the steps of a judicial case. Information concerning parties' identity, jurisdiction in charge of administering the case, and some information relating to the nature and the course of the proceeding are available through plumitifs. They are publicly accessible but barely understandable; they are written using abbreviations and referring to provisions from the Criminal Code of Canada, which makes them hard to reason about. In this paper, we propose a simple yet efficient multi-source language generation architecture that leverages both the plumitif and the Criminal Code's content to generate intelligible plumitifs descriptions. It goes without saying that ethical considerations arise with these sensitive documents made readable and available at scale, legitimate concerns that we address in this paper.
5. Domain-Transferable Method for Named Entity Recognition Task [PDF] Back to contents
Vladislav Mikhailov, Tatiana Shavrina
Abstract: Named Entity Recognition (NER) is a fundamental task in the fields of natural language processing and information extraction. NER has been widely used as a standalone tool or an essential component in a variety of applications such as question answering, dialogue assistants and knowledge graphs development. However, training reliable NER models requires a large amount of labelled data which is expensive to obtain, particularly in specialized domains. This paper describes a method to learn a domain-specific NER model for an arbitrary set of named entities when domain-specific supervision is not available. We assume that the supervision can be obtained with no human effort, and neural models can learn from each other. The code, data and models are publicly available.
6. Tight Integrated End-to-End Training for Cascaded Speech Translation [PDF] Back to contents
Parnia Bahar, Tobias Bieschke, Ralf Schlüter, Hermann Ney
Abstract: A cascaded speech translation model relies on discrete and non-differentiable transcription, which provides a supervision signal from the source side and helps the transformation between source speech and target text. Such modeling suffers from error propagation between ASR and MT models. Direct speech translation is an alternative method to avoid error propagation; however, its performance is often behind the cascade system. To use an intermediate representation and preserve the end-to-end trainability, previous studies have proposed using two-stage models by passing the hidden vectors of the recognizer into the decoder of the MT model and ignoring the MT encoder. This work explores the feasibility of collapsing the entire cascade components into a single end-to-end trainable model by optimizing all parameters of ASR and MT models jointly without ignoring any learned parameters. It is a tightly integrated method that passes renormalized source word posterior distributions as a soft decision instead of one-hot vectors and enables backpropagation. Therefore, it provides both transcriptions and translations and achieves strong consistency between them. Our experiments on four tasks with different data scenarios show that the model outperforms cascade models up to 1.8% in BLEU and 2.0% in TER and is superior compared to direct models.
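The difference between a hard (one-hot) decision and the soft decision described above can be illustrated with a small sketch. This is not the authors' implementation: the sharpening exponent `gamma`, the function names, and the toy embedding table are assumptions. The point is only that feeding a renormalized posterior into the translation model's embedding layer yields a weighted mix of embeddings, which stays differentiable, whereas picking the single best word does not.

```python
def renormalize(posterior, gamma=1.0):
    """Raise each probability to the power gamma (an assumed sharpening knob) and rescale to sum to 1."""
    powered = [p ** gamma for p in posterior]
    total = sum(powered)
    return [p / total for p in powered]


def soft_embedding(posterior, embedding_table):
    """Expected embedding under the posterior: a probability-weighted sum of embedding rows."""
    dim = len(embedding_table[0])
    return [
        sum(p * row[d] for p, row in zip(posterior, embedding_table))
        for d in range(dim)
    ]


# Toy vocabulary of two source words with 2-dimensional embeddings (hypothetical values).
table = [[1.0, 0.0], [0.0, 1.0]]

hard = soft_embedding([1.0, 0.0], table)                  # one-hot: selects the first row
soft = soft_embedding(renormalize([0.2, 0.2]), table)     # soft: an even mix of both rows
```

With a one-hot input the result is just a table lookup; with a posterior the result interpolates between rows, so gradients can flow back to the recognizer that produced the probabilities.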
7. Two-Way Neural Machine Translation: A Proof of Concept for Bidirectional Translation Modeling using a Two-Dimensional Grid [PDF] Back to contents
Parnia Bahar, Christopher Brix, Hermann Ney
Abstract: Neural translation models have proven to be effective in capturing sufficient information from a source sentence and generating a high-quality target sentence. However, it is not easy to get the best effect for bidirectional translation, i.e., both source-to-target and target-to-source translation using a single model. If we exclude some pioneering attempts, such as multilingual systems, all other bidirectional translation approaches are required to train two individual models. This paper proposes to build a single end-to-end bidirectional translation model using a two-dimensional grid, where the left-to-right decoding generates source-to-target, and the bottom-to-up decoding creates target-to-source output. Instead of training two models independently, our approach encourages a single network to jointly learn to translate in both directions. Experiments on the WMT 2018 German$\leftrightarrow$English and Turkish$\leftrightarrow$English translation tasks show that the proposed model is capable of generating a good translation quality and has sufficient potential to direct the research.
8. Gender bias in magazines oriented to men and women: a computational approach [PDF] Back to contents
Diego Kozlowski, Gabriela Lozano, Carla M. Felcher, Fernando Gonzalez, Edgar Altszyler
Abstract: Cultural products are a source to acquire individual values and behaviours. Therefore, the differences in the content of the magazines aimed specifically at women or men are a means to create and reproduce gender stereotypes. In this study, we compare the content of a women-oriented magazine with that of a men-oriented one, both produced by the same editorial group, over a decade (2008-2018). With Topic Modelling techniques we identify the main themes discussed in the magazines and quantify how much the presence of these topics differs between magazines over time. Then, we performed a word-frequency analysis to validate this methodology and extend the analysis to other subjects that did not emerge automatically. Our results show that the frequency of appearance of the topics Family, Business and Women as sex objects presents an initial bias that tends to disappear over time. Conversely, in Fashion and Science topics, the initial differences between both magazines are maintained. Besides, we show that in 2012, the content associated with horoscope increased in the women-oriented magazine, generating a new gap that remained open over time. Also, we show a strong increase in the use of words associated with feminism since 2015 and specifically the word abortion in 2018. Overall, these computational tools allowed us to analyse more than 24,000 articles. To our knowledge, this is the first study to compare magazines in such a large dataset, a task that would have been prohibitive using manual content analysis methodologies.
9. Unequal Representations: Analyzing Intersectional Biases in Word Embeddings Using Representational Similarity Analysis [PDF] Back to contents
Michael A. Lepori
Abstract: We present a new approach for detecting human-like social biases in word embeddings using representational similarity analysis. Specifically, we probe contextualized and non-contextualized embeddings for evidence of intersectional biases against Black women. We show that these embeddings represent Black women as simultaneously less feminine than White women, and less Black than Black men. This finding aligns with intersectionality theory, which argues that multiple identity categories (such as race or sex) layer on top of each other in order to create unique modes of discrimination that are not shared by any individual category.
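For readers unfamiliar with representational similarity analysis, the following is a minimal sketch of the general technique, not the paper's specific procedure. It builds a dissimilarity profile (pairwise cosine distances) for two sets of vectors over the same items and correlates the two profiles. RSA studies often use a rank correlation; Pearson correlation is substituted here to keep the example dependency-free, and all names and toy vectors are assumptions.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))


def rdm(vectors):
    """Flattened upper triangle of the representational dissimilarity matrix."""
    return [cosine_distance(u, v) for u, v in combinations(vectors, 2)]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rsa_score(embeddings, hypothesis):
    """Similarity between the geometry of the embeddings and that of a hypothesis model."""
    return pearson(rdm(embeddings), rdm(hypothesis))


# Toy example: three items represented in two hypothetical spaces.
space_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
space_b = [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]  # same geometry, different scale
score = rsa_score(space_a, space_b)
```

A higher score means the embedding space arranges the items more like the hypothesis space does, which is how competing hypotheses about what the embeddings encode can be compared.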
10. Tackling Domain-Specific Winograd Schemas with Knowledge-Based Reasoning and Machine Learning [PDF] Back to contents
Suk Joon Hong, Brandon Bennett
Abstract: The Winograd Schema Challenge (WSC) is a common-sense reasoning task that requires background knowledge. In this paper, we contribute to tackling WSC in four ways. Firstly, we suggest a keyword method to define a restricted domain where distinctive high-level semantic patterns can be found. A thanking domain was defined by key-words, and the data set in this domain is used in our experiments. Secondly, we develop a high-level knowledge-based reasoning method using semantic roles which is based on the method of Sharma [2019]. Thirdly, we propose an ensemble method to combine knowledge-based reasoning and machine learning which shows the best performance in our experiments. As a machine learning method, we used Bidirectional Encoder Representations from Transformers (BERT) [Kocijan et al., 2019]. Lastly, in terms of evaluation, we suggest a "robust" accuracy measurement by modifying that of Trichelair et al. [2018]. As with their switching method, we evaluate a model by considering its performance on trivial variants of each sentence in the test set.
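One way to read the evaluation idea in the last two sentences is sketched below. The abstract does not give the exact definition of the "robust" accuracy, so the scoring rule here is an assumption: a test item counts as solved only if the model answers the original sentence and every trivial variant correctly. The function names, the toy sentences, and the deliberately naive `first_word_model` are hypothetical.

```python
def robust_accuracy(groups, predict):
    """Fraction of groups where predict is correct on the original and on all variants.

    groups: list of groups; each group is a list of (sentence, gold_answer) pairs,
            the first pair being the original sentence and the rest its variants.
    """
    solved = sum(
        all(predict(sentence) == gold for sentence, gold in group)
        for group in groups
    )
    return solved / len(groups)


def plain_accuracy(groups, predict):
    """Ordinary accuracy over the original sentences only."""
    solved = sum(predict(group[0][0]) == group[0][1] for group in groups)
    return solved / len(groups)


# A deliberately shallow stand-in model: always answers with the first word.
def first_word_model(sentence):
    return sentence.split()[0]


# Toy data with made-up gold labels; the second pair in each group is a switched variant.
toy_groups = [
    [("Ann thanked Bob because she got help.", "Ann"),
     ("Bob thanked Ann because he got help.", "Bob")],
    [("Tom thanked Sue because she gave help.", "Sue"),
     ("Sue thanked Tom because he gave help.", "Tom")],
    [("Joe thanked Kim after he won.", "Joe"),
     ("Kim thanked Joe after he won.", "Joe")],
]

plain = plain_accuracy(toy_groups, first_word_model)
robust = robust_accuracy(toy_groups, first_word_model)
```

Under this reading, a model that relies on a surface cue can look acceptable on the original sentences while failing the stricter grouped score.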
11. Picking BERT's Brain: Probing for Linguistic Dependencies in Contextualized Embeddings Using Representational Similarity Analysis [PDF] Back to contents
Michael A. Lepori, R. Thomas McCoy
Abstract: As the name implies, contextualized representations of language are typically motivated by their ability to encode context. Which aspects of context are captured by such representations? We introduce an approach to address this question using Representational Similarity Analysis (RSA). As case studies, we investigate the degree to which a verb embedding encodes the verb's subject, a pronoun embedding encodes the pronoun's antecedent, and a full-sentence representation encodes the sentence's head word (as determined by a dependency parse). In all cases, we show that BERT's contextualized embeddings reflect the linguistic dependency being studied, and that BERT encodes these dependencies to a greater degree than it encodes less linguistically-salient controls. These results demonstrate the ability of our approach to adjudicate between hypotheses about which aspects of context are encoded in representations of language.
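Summary: Contextualized representations of language are motivated by their ability to encode context, but which aspects of context do they capture? The paper addresses this question with Representational Similarity Analysis (RSA), testing how strongly a verb embedding encodes the verb's subject, a pronoun embedding encodes its antecedent, and a full-sentence representation encodes the sentence's head word (as determined by a dependency parse). In all three cases, BERT's contextualized embeddings reflect the dependency under study more strongly than they reflect less linguistically salient controls.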
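For readers unfamiliar with RSA, the generic recipe can be sketched in a few lines: build a pairwise dissimilarity structure over the same stimuli in each representation space, then correlate the two structures. The sketch below is a minimal standard-library version that assumes cosine distance and Spearman correlation; both choices, and all function names, are assumptions for illustration and not necessarily the setup used in the paper.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def pairwise_dissimilarities(vectors):
    """Upper triangle of the dissimilarity matrix, flattened to a list."""
    return [cosine_distance(vectors[i], vectors[j])
            for i, j in combinations(range(len(vectors)), 2)]


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rsa_score(space_a, space_b):
    """Spearman correlation between the two dissimilarity structures."""
    return pearson(ranks(pairwise_dissimilarities(space_a)),
                   ranks(pairwise_dissimilarities(space_b)))


# Toy usage: the same four stimuli in two spaces; space_b is space_a scaled by 2,
# so the cosine geometry is identical and the score is (numerically) 1.
space_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.5]]
space_b = [[2.0 * x for x in v] for v in space_a]
score = rsa_score(space_a, space_b)
```

A high score means the two spaces organize the stimuli similarly; comparing the score for a hypothesis space against the score for a control space is what allows hypotheses to be ranked.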
12. Argument from Old Man's View: Assessing Social Bias in Argumentation [PDF] Back to contents
Maximilian Spliethöver, Henning Wachsmuth
Abstract: Social bias in language - towards genders, ethnicities, ages, and other social groups - poses a problem with ethical impact for many NLP applications. Recent research has shown that machine learning models trained on respective data may not only adopt, but even amplify the bias. So far, however, little attention has been paid to bias in computational argumentation. In this paper, we study the existence of social biases in large English debate portals. In particular, we train word embedding models on portal-specific corpora and systematically evaluate their bias using WEAT, an existing metric to measure bias in word embeddings. In a word co-occurrence analysis, we then investigate causes of bias. The results suggest that all tested debate corpora contain unbalanced and biased data, mostly in favor of male people with European-American names. Our empirical insights contribute towards an understanding of bias in argumentative data sources.
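Summary: Social bias in language (towards genders, ethnicities, ages, and other social groups) raises ethical problems for many NLP applications, and models trained on biased data may not only adopt but amplify it. Bias in computational argumentation has so far received little attention. The paper trains word embedding models on corpora from large English debate portals, evaluates their bias with the existing WEAT metric, and uses a word co-occurrence analysis to investigate causes. All tested debate corpora turn out to contain unbalanced and biased data, mostly in favor of male people with European-American names.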
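As a rough illustration of what a WEAT-style measurement computes, the sketch below implements the effect size as it is usually formulated: two sets of target word vectors are compared by their differential cosine association with two sets of attribute word vectors. The toy two-dimensional vectors are made up for the example, and the use of the sample standard deviation is an implementation assumption (implementations differ on this point).

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def association(w, attrs_a, attrs_b):
    """s(w, A, B): mean similarity to attribute set A minus mean similarity to B."""
    return mean([cosine(w, a) for a in attrs_a]) - mean([cosine(w, b) for b in attrs_b])


def weat_effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """Difference of mean associations of X and Y, normalized by the spread over X and Y."""
    sx = [association(x, attrs_a, attrs_b) for x in targets_x]
    sy = [association(y, attrs_a, attrs_b) for y in targets_y]
    # Sample standard deviation over all target words (assumed convention).
    return (mean(sx) - mean(sy)) / stdev(sx + sy)


# Toy usage with hypothetical 2-D "embeddings".
X = [[1.0, 0.1], [1.0, 0.3]]   # target set X, close to attribute A
Y = [[0.1, 1.0], [0.3, 1.0]]   # target set Y, close to attribute B
A = [[1.0, 0.0]]
B = [[0.0, 1.0]]
d = weat_effect_size(X, Y, A, B)
d_swapped = weat_effect_size(Y, X, A, B)
```

A positive value indicates that X is more strongly associated with A (and Y with B) than the other way round; swapping the target sets flips the sign.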
13. GLGE: A New General Language Generation Evaluation Benchmark [PDF] Back to contents
Dayiheng Liu, Yu Yan, Yeyun Gong, Weizhen Qi, Hang Zhang, Jian Jiao, Weizhu Chen, Jie Fu, Linjun Shou, Ming Gong, Pengcheng Wang, Jiusheng Chen, Daxin Jiang, Jiancheng Lv, Ruofei Zhang, Winnie Wu, Ming Zhou, Nan Duan
Abstract: Multi-task benchmarks such as GLUE and SuperGLUE have driven great progress of pretraining and transfer learning in Natural Language Processing (NLP). These benchmarks mostly focus on a range of Natural Language Understanding (NLU) tasks, without considering the Natural Language Generation (NLG) models. In this paper, we present the General Language Generation Evaluation (GLGE), a new multi-task benchmark for evaluating the generalization capabilities of NLG models across eight language generation tasks. For each task, we continue to design three subtasks in terms of task difficulty (GLGE-Easy, GLGE-Medium, and GLGE-Hard). This introduces 24 subtasks to comprehensively compare model performance. To encourage research on pretraining and transfer learning on NLG models, we make GLGE publicly available and build a leaderboard with strong baselines including MASS, BART, and ProphetNet. The source code and dataset will be publicly available at this https URL.
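Summary: Multi-task benchmarks such as GLUE and SuperGLUE have driven progress in pretraining and transfer learning, but they focus on Natural Language Understanding (NLU) rather than Natural Language Generation (NLG). The paper presents the General Language Generation Evaluation (GLGE) benchmark, which covers eight language generation tasks, each at three difficulty levels (GLGE-Easy, GLGE-Medium, and GLGE-Hard), for 24 subtasks in total. GLGE is made publicly available together with a leaderboard and strong baselines including MASS, BART, and ProphetNet.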
14. Dual Supervision Framework for Relation Extraction with Distant Supervision and Human Annotation [PDF] Back to contents
Woohwan Jung, Kyuseok Shim
Abstract: Relation extraction (RE) has been extensively studied due to its importance in real-world applications such as knowledge base construction and question answering. Most of the existing works train the models on either distantly supervised data or human-annotated data. To take advantage of the high accuracy of human annotation and the low cost of distant supervision, we propose the dual supervision framework, which effectively utilizes both types of data. However, simply combining the two types of data to train an RE model may decrease the prediction accuracy since distant supervision has labeling bias. We employ two separate prediction networks, HA-Net and DS-Net, to predict the labels by human annotation and distant supervision, respectively, to prevent the degradation of accuracy by the incorrect labeling of distant supervision. Furthermore, we propose an additional loss term called disagreement penalty to enable HA-Net to learn from distantly supervised labels. In addition, we exploit additional networks to adaptively assess the labeling bias by considering contextual information. Our performance study on sentence-level and document-level REs confirms the effectiveness of the dual supervision framework.
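Summary: Relation extraction (RE) matters for applications such as knowledge base construction and question answering, and most existing work trains on either distantly supervised or human-annotated data. The proposed dual supervision framework uses both: two separate prediction networks, HA-Net and DS-Net, predict labels from human annotation and distant supervision respectively, so that the labeling bias of distant supervision does not degrade accuracy. A disagreement penalty loss term lets HA-Net learn from distantly supervised labels, and additional networks adaptively assess the labeling bias from contextual information. Experiments on sentence-level and document-level RE confirm the framework's effectiveness.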
15. Acoustic span embeddings for multilingual query-by-example search [PDF] Back to contents
Yushi Hu, Shane Settle, Karen Livescu
Abstract: Query-by-example (QbE) speech search is the task of matching spoken queries to utterances within a search collection. In low- or zero-resource settings, QbE search is often addressed with approaches based on dynamic time warping (DTW). Recent work has found that methods based on acoustic word embeddings (AWEs) can improve both performance and search speed. However, prior work on AWE-based QbE has primarily focused on English data and single-word queries. In this work, we generalize AWE training to spans of words, producing acoustic span embeddings (ASE), and explore the application of ASE to QbE with arbitrary-length queries in multiple unseen languages. We consider the commonly used setting where we have access to labeled data in other languages (in our case, several low-resource languages) distinct from the unseen test languages. We evaluate our approach on the QUESST 2015 QbE tasks, finding that multilingual ASE-based search is much faster than DTW-based search and outperforms the best previously published results on this task.
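Summary: Query-by-example (QbE) speech search matches spoken queries to utterances in a search collection. In low- or zero-resource settings it is often handled with dynamic time warping (DTW), while recent methods based on acoustic word embeddings (AWEs) improve both performance and search speed but have mainly targeted English data and single-word queries. The paper generalizes AWE training to spans of words, producing acoustic span embeddings (ASE), and applies them to QbE with arbitrary-length queries in multiple unseen languages, using labeled data from other low-resource languages. On the QUESST 2015 QbE tasks, multilingual ASE-based search is much faster than DTW-based search and outperforms the best previously published results.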
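For context on the DTW baseline mentioned above, here is a minimal sketch of textbook dynamic time warping between two sequences. It is deliberately simplified: it aligns whole sequences of scalars, whereas a QbE system would compare sequences of acoustic feature vectors and search for the query inside a longer utterance. The function name and the default frame distance are illustrative choices.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classic O(len(a) * len(b)) dynamic time warping cost between two sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a advances, b repeats
                                    cost[i][j - 1],      # b advances, a repeats
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Toy usage: a stretched copy aligns at zero cost, a shifted copy does not.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([1, 2, 3], [2, 3, 4])
```

The quadratic table per query-utterance pair is what makes DTW-based search slow on large collections; embedding-based search replaces it with a vector comparison per candidate span.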
16. Advancing Humor-Focused Sentiment Analysis through Improved Contextualized Embeddings and Model Architecture [PDF] Back to contents
Felipe Godoy
Abstract: Humor is a natural and fundamental component of human interactions. When correctly applied, humor allows us to express thoughts and feelings conveniently and effectively, increasing interpersonal affection, likeability, and trust. However, understanding the use of humor is a computationally challenging task from the perspective of humor-aware language processing models. As language models become ubiquitous through virtual assistants and IoT devices, the need to develop humor-aware models rises exponentially. To further improve the state-of-the-art capacity to perform this particular sentiment-analysis task, we must explore models that incorporate contextualized and nonverbal elements in their design. Ideally, we seek architectures accepting nonverbal elements as additional embedded inputs to the model, alongside the original sentence-embedded input. This survey thus analyses the current state of research in techniques for improved contextualized embedding incorporating nonverbal information, as well as newly proposed deep architectures to improve context retention on top of popular word-embedding methods.
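Summary: Humor is a natural and fundamental component of human interaction; used well, it lets us express thoughts and feelings effectively and increases interpersonal affection, likeability, and trust. Understanding humor nevertheless remains computationally challenging for language processing models, and the need for humor-aware models grows as language models spread through virtual assistants and IoT devices. This survey reviews techniques for improved contextualized embeddings that incorporate nonverbal information, along with newly proposed deep architectures that improve context retention on top of popular word-embedding methods.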
17. Multi-task Language Modeling for Improving Speech Recognition of Rare Words [PDF] Back to contents
Chao-Han Huck Yang, Linda Liu, Ankur Gandhe, Yile Gu, Anirudh Raju, Denis Filimonov, Ivan Bulyko
Abstract: End-to-end automatic speech recognition (ASR) systems are increasingly popular due to their relative architectural simplicity and competitive performance. However, even though the average accuracy of these systems may be high, the performance on rare content words often lags behind hybrid ASR systems. To address this problem, second-pass rescoring is often applied. In this paper, we propose a second-pass system with multi-task learning, utilizing semantic targets (such as intent and slot prediction) to improve speech recognition performance. We show that our rescoring model trained with these additional tasks outperforms the baseline rescoring model, trained with only the language modeling task, by 1.4% on a general test set and by 2.6% on a rare word test set in terms of word-error-rate relative (WERR).
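Summary: End-to-end automatic speech recognition (ASR) systems are increasingly popular for their architectural simplicity and competitive performance, but on rare content words they often lag behind hybrid ASR systems, which is why second-pass rescoring is commonly applied. The paper proposes a second-pass system with multi-task learning that uses semantic targets (such as intent and slot prediction). The rescoring model trained with these additional tasks outperforms the baseline trained only on language modeling by 1.4% on a general test set and by 2.6% on a rare word test set in terms of word-error-rate relative (WERR).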
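To make the reported metric concrete, the sketch below computes word error rate (WER) as word-level edit distance divided by reference length, and a relative improvement between two WER values. It assumes that WERR denotes the relative WER reduction with respect to the baseline, which is the common reading but is not defined in the abstract; the example sentences and WER values are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


def relative_wer_reduction(wer_baseline: float, wer_new: float) -> float:
    """Assumed WERR definition: (baseline - new) / baseline."""
    return (wer_baseline - wer_new) / wer_baseline


# Toy usage with hypothetical numbers (not taken from the paper).
wer = word_error_rate("play the song", "play a song")   # one substitution in three words
werr = relative_wer_reduction(0.10, 0.09)               # a 10% relative reduction
```

Because the improvement is relative, a small absolute WER change on an already accurate system can still correspond to a noticeable WERR.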
18. Using Machine Learning and Natural Language Processing Techniques to Analyze and Support Moderation of Student Book Discussions [PDF] Back to contents
Jernej Vivod
Abstract: The increasing adoption of technology to augment or even replace traditional face-to-face learning has led to the development of a myriad of tools and platforms aimed at engaging the students and facilitating the teacher's ability to present new information. The IMapBook project aims at improving the literacy and reading comprehension skills of elementary school-aged children by presenting them with interactive e-books and letting them take part in moderated book discussions. This study aims to develop and illustrate a machine learning-based approach to message classification that could be used to automatically notify the discussion moderator of a possible need for an intervention and also to collect other useful information about the ongoing discussion. We aim to predict whether a message posted in the discussion is relevant to the discussed book, whether the message is a statement, a question, or an answer, and in which broad category it can be classified. We incrementally enrich our used feature subsets and compare them using standard classification algorithms as well as the novel Feature stacking method. We use standard classification performance metrics as well as the Bayesian correlated t-test to show that the use of described methods in discussion moderation is feasible. Moving forward, we seek to attain better performance by focusing on extracting more of the significant information found in the strong temporal interdependence of the messages.
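Summary: The growing use of technology to augment or replace face-to-face learning has produced many tools and platforms for engaging students and helping teachers present new information. The IMapBook project aims to improve the literacy and reading comprehension of elementary school-aged children through interactive e-books and moderated book discussions. This study develops a machine learning-based approach to message classification that could notify the moderator when an intervention may be needed and collect other useful information about the discussion. It predicts whether a message is relevant to the book, whether it is a statement, a question, or an answer, and which broad category it belongs to. Incrementally enriched feature subsets are compared using standard classification algorithms and the novel feature stacking method, and standard performance metrics together with the Bayesian correlated t-test indicate that the approach is feasible. Future work targets the strong temporal interdependence of the messages.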
19. Does BERT Understand Sentiment? Leveraging Comparisons Between Contextual and Non-Contextual Embeddings to Improve Aspect-Based Sentiment Models [PDF] Back to contents
Natesh Reddy, Pranaydeep Singh, Muktabh Mayank Srivastava
Abstract: When performing Polarity Detection for different words in a sentence, we need to look at the surrounding words to understand the sentiment. Massively pretrained language models like BERT can encode not only the words in a document but also the context around them. This raises the questions, "Does a pretrained language model also automatically encode sentiment information about each word?" and "Can it be used to infer polarity towards different aspects?". In this work we try to answer these questions by showing that training a comparison of a contextual embedding from BERT and a generic word embedding can be used to infer sentiment. We also show that if we fine-tune a subset of the weights of the model built on the comparison of BERT and generic word embeddings, it can achieve state-of-the-art results for Polarity Detection on Aspect-Based Sentiment Classification datasets.
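Summary: Detecting polarity for different words in a sentence requires looking at the surrounding words. Massively pretrained language models like BERT encode both the words in a document and their context, which raises two questions: does such a model also automatically encode sentiment information about each word, and can it be used to infer polarity towards different aspects? The paper shows that a trained comparison between a contextual BERT embedding and a generic word embedding can be used to infer sentiment, and that fine-tuning a subset of the weights of a model built on this comparison achieves state-of-the-art results for polarity detection on aspect-based sentiment classification datasets.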
20. Fuzzy Stochastic Timed Petri Nets for Causal properties representation [PDF] Back to contents
Alejandro Sobrino, Eduardo C. Garrido-Merchan, Cristina Puente
Abstract: Imagery is frequently used to model, represent and communicate knowledge. In particular, graphs are one of the most powerful tools, being able to represent relations between objects. Causal relations are frequently represented by directed graphs, with nodes denoting causes and links denoting causal influence. A causal graph is a skeletal picture, showing causal associations and impact between entities. Common methods used for graphically representing causal scenarios are neurons, truth tables, causal Bayesian networks, cognitive maps and Petri Nets. Causality is often defined in terms of precedence (the cause precedes the effect), concurrency (often, an effect is provoked simultaneously by two or more causes), circularity (a cause provokes the effect and the effect reinforces the cause) and imprecision (the presence of the cause favors the effect, but does not necessarily cause it). We will show that, even though the traditional graphical models are able to represent some of the aforementioned properties separately, they fail when trying to illustrate all of them together. To address that gap, we will introduce Fuzzy Stochastic Timed Petri Nets as a graphical tool able to represent time, co-occurrence, looping and imprecision in causal flow.
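Summary: Imagery is frequently used to model, represent, and communicate knowledge, and graphs in particular are a powerful way to represent relations between objects. Causal relations are commonly drawn as directed graphs, with nodes denoting causes and links denoting causal influence; common formalisms include neurons, truth tables, causal Bayesian networks, cognitive maps, and Petri Nets. Causality is often characterized by precedence, concurrency, circularity, and imprecision. The paper argues that traditional graphical models can represent some of these properties separately but not all of them together, and introduces Fuzzy Stochastic Timed Petri Nets as a graphical tool able to represent time, co-occurrence, looping, and imprecision in causal flow.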
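As background for the Petri net formalism, the sketch below shows how an ordinary place/transition net can express the concurrency property: an effect that only occurs when two causes are present at the same time. This is a plain Petri net with none of the fuzzy, stochastic, or timed extensions proposed in the paper, and the dictionary-based encoding and all names are assumptions made for the example.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(place, 0) >= need
               for place, need in transition["inputs"].items())


def fire(marking, transition):
    """Return the new marking after firing; the original marking is left unchanged."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place, need in transition["inputs"].items():
        new_marking[place] = new_marking.get(place, 0) - need
    for place, produced in transition["outputs"].items():
        new_marking[place] = new_marking.get(place, 0) + produced
    return new_marking


# Hypothetical causal scenario: the effect needs both causes at once.
joint_cause = {
    "inputs": {"cause_a": 1, "cause_b": 1},
    "outputs": {"effect": 1},
}
only_a = {"cause_a": 1}
both = {"cause_a": 1, "cause_b": 1}
after = fire(both, joint_cause)
```

With only one cause marked, the transition cannot fire; with both marked, firing consumes the two cause tokens and produces the effect token. Circularity could be modeled in the same encoding by adding a transition whose output feeds back into a cause place.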
21. A Robotic Dating Coaching System Leveraging Online Communities Posts [PDF] Back to contents
Sihyeon Jo, Donghwi Jung, Keonwoo Kim, Eun Gyo Joung, Giulia Nespoli, Seungryong Yoo, Minseob So, Seung-Woo Seo, Seong-Woo Kim
Abstract: Can a robot be a personal dating coach? Even with the increasing amount of conversational data on the internet, the implementation of conversational robots remains a challenge. In particular, a detailed and professional counseling log is expensive and not publicly accessible. In this paper, we develop a robot dating coaching system leveraging a corpus from online communities. We examine people's perceptions of the dating coaching robot with a dialogue module. 97 participants had a conversation with the robot, and 30 of them evaluated it. The results indicate that participants thought the robot could become a dating coach, while considering it entertaining rather than helpful.
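Summary: Can a robot be a personal dating coach? Despite the growing amount of conversational data on the internet, building conversational robots remains a challenge, and detailed professional counseling logs are expensive and not publicly accessible. The paper develops a robot dating coaching system that leverages a corpus from online communities and examines how people perceive the robot and its dialogue module. 97 participants talked with the robot and 30 of them evaluated it; they thought the robot could become a dating coach, but found it entertaining rather than helpful.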
22. Multimodal Pretraining for Dense Video Captioning [PDF] Back to Contents
Gabriel Huang, Bo Pang, Zhenhai Zhu, Clara Rivera, Radu Soricut
Abstract: Learning specific hands-on skills such as cooking, car maintenance, and home repairs increasingly happens via instructional videos. The user experience with such videos is known to be improved by meta-information such as time-stamped annotations for the main steps involved. Generating such annotations automatically is challenging, and we describe here two relevant contributions. First, we construct and release a new dense video captioning dataset, Video Timeline Tags (ViTT), featuring a variety of instructional videos together with time-stamped annotations. Second, we explore several multimodal sequence-to-sequence pretraining strategies that leverage large unsupervised datasets of videos and caption-like texts. We pretrain and subsequently finetune dense video captioning models using both YouCook2 and ViTT. We show that such models generalize well and are robust over a wide variety of instructional videos.
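Summary: Learning specific hands-on skills, such as cooking, car maintenance, and home repairs, increasingly happens through instructional videos. The user experience with such videos is known to improve with meta-information such as time-stamped annotations for the main steps involved. Generating such annotations automatically is challenging, and we describe two relevant contributions here. First, we construct and release a new dense video captioning dataset, Video Timeline Tags (ViTT), which features a variety of instructional videos together with time-stamped annotations. Second, we explore several multimodal sequence-to-sequence pretraining strategies that leverage large unsupervised datasets of videos and caption-like texts. We pretrain and subsequently finetune dense video captioning models using both YouCook2 and ViTT. We show that such models generalize well and are robust across a wide variety of instructional videos.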
23. Streaming Multi-speaker ASR with RNN-T [PDF] Back to Contents
Ilya Sklyar, Anna Piunova, Yulan Liu
Abstract: Recent research shows end-to-end ASR systems can recognize overlapped speech from multiple speakers. However, all published works have assumed no latency constraints during inference, which does not hold for most voice assistant interactions. This work focuses on multi-speaker speech recognition based on a recurrent neural network transducer (RNN-T) that has been shown to provide high recognition accuracy at a low latency online recognition regime. We investigate two approaches to multi-speaker model training of the RNN-T: deterministic output-target assignment and permutation invariant training. We show that guiding separation with speaker order labels in the former case enhances the high-level speaker tracking capability of RNN-T. Apart from that, with multistyle training on single- and multi-speaker utterances, the resulting models gain robustness against ambiguous numbers of speakers during inference. Our best model achieves a WER of 10.2% on simulated 2-speaker LibriSpeech data, which is competitive with the previously reported state-of-the-art nonstreaming model (10.3%), while the proposed model could be directly applied for streaming applications.
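Summary: Recent research shows that end-to-end ASR systems can recognize overlapped speech from multiple speakers. However, all published work has assumed no latency constraints during inference, an assumption that does not hold for most voice assistant interactions. This work focuses on multi-speaker speech recognition based on the recurrent neural network transducer (RNN-T), which has been shown to provide high recognition accuracy in a low-latency online recognition regime. We investigate two approaches to multi-speaker training of the RNN-T: deterministic output-target assignment and permutation invariant training. We show that, in the former case, guiding separation with speaker order labels enhances the high-level speaker tracking capability of the RNN-T. In addition, with multistyle training on single- and multi-speaker utterances, the resulting models become robust to an ambiguous number of speakers during inference. Our best model achieves a WER of 10.2% on simulated 2-speaker LibriSpeech data, which is competitive with the previously reported state-of-the-art non-streaming model (10.3%), while the proposed model can be applied directly to streaming applications.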
Note: The Chinese text is machine-translated! The cover image is a word cloud of the paper titles!