Table of Contents
1. Contextual BERT: Conditioning the Language Model Using a Global State [PDF] Abstract
2. Explainable Automated Coding of Clinical Notes using Hierarchical Label-wise Attention Networks and Label Embedding Initialisation [PDF] Abstract
3. May I Ask Who's Calling? Named Entity Recognition on Call Center Transcripts for Privacy Law Compliance [PDF] Abstract
4. Unbabel's Participation in the WMT20 Metrics Shared Task [PDF] Abstract
5. Improving Named Entity Recognition with Attentive Ensemble of Syntactic Information [PDF] Abstract
6. Named Entity Recognition for Social Media Texts with Semantic Augmentation [PDF] Abstract
7. Memory Attentive Fusion: External Language Model Integration for Transformer-based Sequence-to-Sequence Model [PDF] Abstract
8. Tilde at WMT 2020: News Task Systems [PDF] Abstract
9. Conversation Graph: Data Augmentation, Training and Evaluation for Non-Deterministic Dialogue Management [PDF] Abstract
10. Combining Self-Training and Self-Supervised Learning for Unsupervised Disfluency Detection [PDF] Abstract
11. Multiple Sclerosis Severity Classification From Clinical Text [PDF] Abstract
12. "where is this relationship going?": Understanding Relationship Trajectories in Narrative Text [PDF] Abstract
13. Uncovering Latent Biases in Text: Method and Application to Peer Review [PDF] Abstract
14. CopyNext: Explicit Span Copying and Alignment in Sequence to Sequence Models [PDF] Abstract
15. A Visuospatial Dataset for Naturalistic Verb Learning [PDF] Abstract
16. DeSMOG: Detecting Stance in Media On Global Warming [PDF] Abstract
17. Semi-Supervised Speech Recognition via Graph-based Temporal Classification [PDF] Abstract
18. Designing learning experiences for online teaching and learning [PDF] Abstract
19. Three computational models and its equivalence [PDF] Abstract
20. Self-supervised Pre-training Reduces Label Permutation Instability of Speech Separation [PDF] Abstract
21. Fusion Models for Improved Visual Captioning [PDF] Abstract

Abstracts
1. Contextual BERT: Conditioning the Language Model Using a Global State [PDF] Back to TOC
Timo I. Denk, Ana Peleteiro Ramallo
Abstract: BERT is a popular language model whose main pre-training task is to fill in the blank, i.e., predicting a word that was masked out of a sentence, based on the remaining words. In some applications, however, having an additional context can help the model make the right prediction, e.g., by taking the domain or the time of writing into account. This motivates us to advance the BERT architecture by adding a global state for conditioning on a fixed-sized context. We present our two novel approaches and apply them to an industry use-case, where we complete fashion outfits with missing articles, conditioned on a specific customer. An experimental comparison to other methods from the literature shows that our methods improve personalization significantly.
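The abstract does not spell out the two approaches, but the general idea of conditioning on a global state can be illustrated with a minimal PyTorch sketch (class and parameter names below are hypothetical, not the authors' code): a fixed-size context vector is projected to model width and prepended as an extra token, so self-attention can condition every position on it.

```python
import torch
import torch.nn as nn

class GlobalStateEncoder(nn.Module):
    """Toy BERT-style encoder conditioned on a fixed-size context vector.

    Hypothetical sketch: the context (e.g. a customer embedding) is projected
    to model width and prepended as an extra "global state" token.
    """
    def __init__(self, vocab_size=1000, d_model=128, ctx_dim=32, n_layers=2):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.ctx_proj = nn.Linear(ctx_dim, d_model)  # context -> global state
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, token_ids, context):
        x = self.tok_emb(token_ids)                    # (B, T, d)
        g = self.ctx_proj(context).unsqueeze(1)        # (B, 1, d)
        return self.encoder(torch.cat([g, x], dim=1))  # (B, 1+T, d)

model = GlobalStateEncoder()
out = model(torch.randint(0, 1000, (2, 8)), torch.randn(2, 32))
print(out.shape)  # torch.Size([2, 9, 128])
```

Prepending the state as a token is only one plausible realization; the paper presents two concrete variants, which may differ from this sketch.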
2. Explainable Automated Coding of Clinical Notes using Hierarchical Label-wise Attention Networks and Label Embedding Initialisation [PDF] Back to TOC
Hang Dong, Víctor Suárez-Paniagua, William Whiteley, Honghan Wu
Abstract: Diagnostic or procedural coding of clinical notes aims to derive a coded summary of disease-related information about patients. Such coding is usually done manually in hospitals but could potentially be automated to improve the efficiency and accuracy of medical coding. Recent studies on deep learning for automated medical coding have achieved promising performance. However, the explainability of these models is usually poor, preventing them from being used confidently to support clinical practice. Another limitation is that these models mostly assume independence among labels, ignoring the complex correlations among medical codes, which could potentially be exploited to improve performance. We propose a Hierarchical Label-wise Attention Network (HLAN), which aims to interpret the model by quantifying the importance (as attention weights) of the words and sentences related to each label. Secondly, we propose to enhance the major deep learning models with a label embedding (LE) initialisation approach, which learns a dense, continuous vector representation and then injects the representation into the final layers and the label-wise attention layers of the models. We evaluated the methods using three settings on the MIMIC-III discharge summaries: full codes, top-50 codes, and the UK NHS COVID-19 shielding codes. Experiments were conducted to compare HLAN and LE initialisation to state-of-the-art neural network based methods. HLAN achieved the best micro-level AUC and $F_1$ on the top-50 code prediction, and results comparable to other models on the NHS COVID-19 shielding code prediction. By highlighting the most salient words and sentences for each label, HLAN showed more meaningful and comprehensive model interpretation compared to its downgraded baselines and the CNN-based models. LE initialisation consistently boosted most deep learning models for automated medical coding.
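As a rough sketch of the label-wise attention idea (hypothetical names; not the authors' code), each label can own a query vector, attend over the word states, and expose its attention weights as a per-label explanation; LE initialisation then amounts to initialising those label queries from pretrained label embeddings.

```python
import torch
import torch.nn as nn

class LabelwiseAttention(nn.Module):
    """Minimal sketch of label-wise attention (names hypothetical).

    Each label gets its own query vector; its attention weights over word
    states can be read off as a per-label word-importance explanation.
    """
    def __init__(self, n_labels, d_model, label_embs=None):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_labels, d_model))
        if label_embs is not None:          # label-embedding (LE) initialisation
            self.queries.data.copy_(label_embs)
        self.out = nn.Linear(d_model, 1)

    def forward(self, h):                   # h: (B, T, d) word states
        scores = torch.einsum("btd,ld->blt", h, self.queries)
        attn = scores.softmax(dim=-1)       # (B, L, T) explanation weights
        ctx = torch.einsum("blt,btd->bld", attn, h)
        logits = self.out(ctx).squeeze(-1)  # (B, L): one score per label
        return logits, attn

layer = LabelwiseAttention(n_labels=50, d_model=64)
logits, attn = layer(torch.randn(2, 30, 64))
print(logits.shape, attn.shape)  # (2, 50) (2, 50, 30)
```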
3. May I Ask Who's Calling? Named Entity Recognition on Call Center Transcripts for Privacy Law Compliance [PDF] Back to TOC
Micaela Kaplan
Abstract: We investigate using Named Entity Recognition on a new type of user-generated text: a call center conversation. These conversations combine problems from spontaneous speech with problems novel to conversational Automated Speech Recognition, including incorrect recognition, alongside other common problems from noisy user-generated text. Using our own corpus with new annotations, training custom contextual string embeddings, and applying a BiLSTM-CRF, we match state-of-the-art results on our novel task.
4. Unbabel's Participation in the WMT20 Metrics Shared Task [PDF] Back to TOC
Ricardo Rei, Craig Stewart, Catarina Farinha, Alon Lavie
Abstract: We present the contribution of the Unbabel team to the WMT 2020 Shared Task on Metrics. We intend to participate on the segment-level, document-level and system-level tracks on all language pairs, as well as the 'QE as a Metric' track. Accordingly, we illustrate results of our models in these tracks with reference to test sets from the previous year. Our submissions build upon the recently proposed COMET framework: We train several estimator models to regress on different human-generated quality scores and a novel ranking model trained on relative ranks obtained from Direct Assessments. We also propose a simple technique for converting segment-level predictions into a document-level score. Overall, our systems achieve strong results for all language pairs on previous test sets and in many cases set a new state-of-the-art.
5. Improving Named Entity Recognition with Attentive Ensemble of Syntactic Information [PDF] Back to TOC
Yuyang Nie, Yuanhe Tian, Yan Song, Xiang Ao, Xiang Wan
Abstract: Named entity recognition (NER) is highly sensitive to sentential syntactic and semantic properties, where entities may be extracted according to how they are used and placed in the running text. To model such properties, one could rely on existing resources to provide helpful knowledge to the NER task; some existing studies have proved the effectiveness of doing so, yet they are limited in appropriately leveraging such knowledge, e.g., distinguishing the parts of it that are important for a particular context. In this paper, we improve NER by leveraging different types of syntactic information through attentive ensemble, realized by the proposed key-value memory networks, syntax attention, and gate mechanism for encoding, weighting, and aggregating such syntactic information, respectively. Experimental results on six English and Chinese benchmark datasets suggest the effectiveness of the proposed model and show that it outperforms previous studies on all experiment datasets.
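A minimal sketch of the gate mechanism mentioned above (an assumption about its exact form, not the paper's code): a sigmoid gate decides, element-wise, how much of the aggregated syntactic vector to mix into each word state.

```python
import torch
import torch.nn as nn

class FusionGate(nn.Module):
    """Hypothetical sketch of a gate fusing a word state h with an
    aggregated syntactic vector s: g decides how much of s to let in."""
    def __init__(self, d):
        super().__init__()
        self.gate = nn.Linear(2 * d, d)

    def forward(self, h, s):                      # both (B, T, d)
        g = torch.sigmoid(self.gate(torch.cat([h, s], dim=-1)))
        return g * h + (1.0 - g) * s              # convex, element-wise mix

fuse = FusionGate(d=64)
out = fuse(torch.randn(2, 10, 64), torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 10, 64])
```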
6. Named Entity Recognition for Social Media Texts with Semantic Augmentation [PDF] Back to TOC
Yuyang Nie, Yuanhe Tian, Xiang Wan, Yan Song, Bo Dai
Abstract: Existing approaches for named entity recognition suffer from data sparsity problems when conducted on short and informal texts, especially user-generated social media content. Semantic augmentation is a potential way to alleviate this problem. Given that rich semantic information is implicitly preserved in pre-trained word embeddings, they are potential ideal resources for semantic augmentation. In this paper, we propose a neural-based approach to NER for social media texts where both local (from running text) and augmented semantics are taken into account. In particular, we obtain the augmented semantic information from a large-scale corpus, and propose an attentive semantic augmentation module and a gate module to encode and aggregate such information, respectively. Extensive experiments are performed on three benchmark datasets collected from English and Chinese social media platforms, where the results demonstrate the superiority of our approach to previous studies across all three datasets.
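One plausible reading of the augmentation step (an assumption; the paper's exact procedure may differ) is retrieving each token's nearest neighbours in a pretrained embedding space and feeding them to the attentive augmentation module. A toy numpy version of that retrieval:

```python
import numpy as np

def topk_neighbors(emb, vocab, word, k=5):
    """Toy semantic augmentation: return the k nearest words to `word`
    by cosine similarity in a pretrained embedding matrix (rows = words)."""
    v = emb[vocab[word]]
    sims = emb @ v / (np.linalg.norm(emb, axis=1) * np.linalg.norm(v) + 1e-8)
    idx = np.argsort(-sims)                    # most similar first
    inv = {i: w for w, i in vocab.items()}
    return [inv[i] for i in idx if inv[i] != word][:k]

rng = np.random.default_rng(0)
vocab = {w: i for i, w in enumerate(["lol", "haha", "omg", "meeting", "cat"])}
emb = rng.normal(size=(len(vocab), 16))        # stand-in for pretrained vectors
print(topk_neighbors(emb, vocab, "lol", k=2))
```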
7. Memory Attentive Fusion: External Language Model Integration for Transformer-based Sequence-to-Sequence Model [PDF] Back to TOC
Mana Ihori, Ryo Masumura, Naoki Makishima, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi
Abstract: This paper presents a novel fusion method for integrating an external language model (LM) into the Transformer based sequence-to-sequence (seq2seq) model. While paired data are basically required to train the seq2seq model, the external LM can be trained with only unpaired data. Thus, it is important to leverage memorized knowledge in the external LM for building the seq2seq model, since it is hard to prepare a large amount of paired data. However, the existing fusion methods assume that the LM is integrated with recurrent neural network-based seq2seq models instead of the Transformer. Therefore, this paper proposes a fusion method that can explicitly utilize network structures in the Transformer. The proposed method, called {\bf memory attentive fusion}, leverages the Transformer-style attention mechanism that repeats source-target attention in a multi-hop manner for reading the memorized knowledge in the LM. Our experiments on two text-style conversion tasks demonstrate that the proposed method performs better than conventional fusion methods.
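A rough PyTorch sketch of the multi-hop reading idea (a hypothetical layer layout, not the paper's exact architecture): the decoder state repeatedly attends over LM "memory" vectors, refining its query at each hop.

```python
import torch
import torch.nn as nn

class MultiHopMemoryReader(nn.Module):
    """Rough sketch (not the paper's exact design): decoder states attend
    over external-LM memory vectors several times, hop by hop."""
    def __init__(self, d, n_hops=2):
        super().__init__()
        self.hops = nn.ModuleList(
            nn.MultiheadAttention(d, num_heads=4, batch_first=True)
            for _ in range(n_hops)
        )

    def forward(self, query, memory):       # (B, Tq, d), (B, Tm, d)
        for attn in self.hops:
            read, _ = attn(query, memory, memory)
            query = query + read             # residual update per hop
        return query

reader = MultiHopMemoryReader(d=64)
print(reader(torch.randn(2, 5, 64), torch.randn(2, 12, 64)).shape)
```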
8. Tilde at WMT 2020: News Task Systems [PDF] Back to TOC
Rihards Krišlauks, Mārcis Pinnis
Abstract: This paper describes Tilde's submission to the WMT2020 shared task on news translation for both directions of the English-Polish language pair in both the constrained and the unconstrained tracks. We follow our submissions from the previous years and build our baseline systems to be morphologically motivated sub-word unit-based Transformer base models that we train using the Marian machine translation toolkit. Additionally, we experiment with different parallel and monolingual data selection schemes, as well as sampled back-translation. Our final models are ensembles of Transformer base and Transformer big models that feature right-to-left re-ranking.
9. Conversation Graph: Data Augmentation, Training and Evaluation for Non-Deterministic Dialogue Management [PDF] Back to TOC
Milan Gritta, Gerasimos Lampouras, Ignacio Iacobacci
Abstract: Task-oriented dialogue systems typically rely on large amounts of high-quality training data or require complex handcrafted rules. However, existing datasets are often limited in size considering the complexity of the dialogues. Additionally, conventional training signal inference is not suitable for non-deterministic agent behaviour, i.e. considering multiple actions as valid in identical dialogue states. We propose the Conversation Graph (ConvGraph), a graph-based representation of dialogues that can be exploited for data augmentation, multi-reference training and evaluation of non-deterministic agents. ConvGraph generates novel dialogue paths to augment data volume and diversity. Intrinsic and extrinsic evaluation across three datasets shows that data augmentation and/or multi-reference training with ConvGraph can improve dialogue success rates by up to 6.4%.
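A toy sketch of the graph idea (the state representation here is hypothetical): merging the state transitions of individual dialogues into one graph lets a random walk emit action sequences that no single training dialogue contains, which is what drives the augmentation.

```python
import random
from collections import defaultdict

def build_graph(dialogues):
    """Merge (state -> next state) transitions observed across dialogues.
    States are hashable summaries (here: plain strings; names hypothetical)."""
    graph = defaultdict(set)
    for dlg in dialogues:
        for a, b in zip(dlg, dlg[1:]):
            graph[a].add(b)
    return graph

def sample_path(graph, start, max_len=6, seed=0):
    """Random walk over the merged graph; can yield paths never seen
    in any single training dialogue."""
    rng, path = random.Random(seed), [start]
    while graph[path[-1]] and len(path) < max_len:
        path.append(rng.choice(sorted(graph[path[-1]])))
    return path

dialogues = [["greet", "ask_area", "offer", "bye"],
             ["greet", "ask_price", "offer", "book", "bye"]]
g = build_graph(dialogues)
# May emit e.g. greet -> ask_area -> offer -> book -> bye, unseen as a whole.
print(sample_path(g, "greet"))
```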
10. Combining Self-Training and Self-Supervised Learning for Unsupervised Disfluency Detection [PDF] Back to TOC
Shaolei Wang, Zhongyuan Wang, Wanxiang Che, Ting Liu
Abstract: Most existing approaches to disfluency detection heavily rely on human-annotated corpora, which is expensive to obtain in practice. There have been several proposals to alleviate this issue with, for instance, self-supervised learning techniques, but they still require human-annotated corpora. In this work, we explore the unsupervised learning paradigm which can potentially work with unlabeled text corpora that are cheaper and easier to obtain. Our model builds upon the recent work on Noisy Student Training, a semi-supervised learning approach that extends the idea of self-training. Experimental results on the commonly used English Switchboard test set show that our approach achieves competitive performance compared to the previous state-of-the-art supervised systems using contextualized word embeddings (e.g. BERT and ELECTRA).
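For orientation, here is the generic self-training skeleton that Noisy Student extends (a sketch, not the paper's recipe; `fit` and `predict` are hypothetical callables): a teacher pseudo-labels the unlabeled pool and a student retrains on the union.

```python
def self_train(train, unlabeled, fit, predict, rounds=2, threshold=0.9):
    """Generic self-training skeleton (not the paper's exact method).
    `fit` trains a model on (x, y) pairs; `predict` returns (label, confidence).
    Each round, confident pseudo-labels are added to the training pool."""
    model = fit(train)
    for _ in range(rounds):
        pseudo = []
        for x in unlabeled:
            y, conf = predict(model, x)
            if conf >= threshold:          # keep only confident pseudo-labels
                pseudo.append((x, y))
        model = fit(train + pseudo)        # retrain student on the union
    return model
```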
11. Multiple Sclerosis Severity Classification From Clinical Text [PDF] Back to TOC
Alister D Costa, Stefan Denkovski, Michal Malyska, Sae Young Moon, Brandon Rufino, Zhen Yang, Taylor Killian, Marzyeh Ghassemi
Abstract: Multiple Sclerosis (MS) is a chronic, inflammatory and degenerative neurological disease, which is monitored by a specialist using the Expanded Disability Status Scale (EDSS) and recorded in unstructured text in the form of a neurology consult note. An EDSS measurement contains an overall "EDSS" score and several functional subscores. Typically, expert knowledge is required to interpret consult notes and generate these scores. Previous approaches used limited context length Word2Vec embeddings and keyword searches to predict scores given a consult note, but often failed when scores were not explicitly stated. In this work, we present MS-BERT, the first publicly available transformer model trained on real clinical data other than MIMIC. Next, we present MSBC, a classifier that applies MS-BERT to generate embeddings and predict EDSS and functional subscores. Lastly, we explore combining MSBC with other models through the use of Snorkel to generate scores for unlabelled consult notes. MSBC achieves state-of-the-art performance on all metrics and prediction tasks and outperforms the models generated from the Snorkel ensemble. We improve Macro-F1 by 0.12 (to 0.88) for predicting EDSS and on average by 0.29 (to 0.63) for predicting functional subscores over previous Word2Vec CNN and rule-based approaches.
12. "where is this relationship going?": Understanding Relationship Trajectories in Narrative Text [PDF] 返回目录
Keen You, Dan Goldwasser
Abstract: We examine a new commonsense reasoning task: given a narrative describing a social interaction that centers on two protagonists, systems make inferences about the underlying relationship trajectory. Specifically, we propose two evaluation tasks: Relationship Outlook Prediction MCQ and Resolution Prediction MCQ. In Relationship Outlook Prediction, a system maps an interaction to a relationship outlook that captures how the interaction is expected to change the relationship. In Resolution Prediction, a system attributes a given relationship outlook to a particular resolution that explains the outcome. These two tasks parallel two real-life questions that people frequently ponder upon as they navigate different social situations: "where is this relationship going?" and "how did we end up here?". To facilitate the investigation of human social relationships through these two tasks, we construct a new dataset, Social Narrative Tree, which consists of 1250 stories documenting a variety of daily social interactions. The narratives encode a multitude of social elements that interweave to give rise to rich commonsense knowledge of how relationships evolve with respect to social interactions. We establish baseline performances using language models and the accuracies are significantly lower than human performance. The results demonstrate that models need to look beyond syntactic and semantic signals to comprehend complex human relationships.
13. Uncovering Latent Biases in Text: Method and Application to Peer Review [PDF] Back to TOC
Emaad Manzoor, Nihar B. Shah
Abstract: Quantifying systematic disparities in numerical quantities such as employment rates and wages between population subgroups provides compelling evidence for the existence of societal biases. However, biases in the text written for members of different subgroups (such as in recommendation letters for male and non-male candidates), though widely reported anecdotally, remain challenging to quantify. In this work, we introduce a novel framework to quantify bias in text caused by the visibility of subgroup membership indicators. We develop a nonparametric estimation and inference procedure to estimate this bias. We then formalize an identification strategy to causally link the estimated bias to the visibility of subgroup membership indicators, provided observations from time periods both before and after an identity-hiding policy change. We identify an application wherein "ground truth" bias can be inferred to evaluate our framework, instead of relying on synthetic or secondary data. Specifically, we apply our framework to quantify biases in the text of peer reviews from a reputed machine learning conference before and after the conference adopted a double-blind reviewing policy. We show evidence of biases in the review ratings that serves as "ground truth", and show that our proposed framework accurately detects these biases from the review text without having access to the review ratings.
14. CopyNext: Explicit Span Copying and Alignment in Sequence to Sequence Models [PDF] Back to TOC
Abhinav Singh, Patrick Xia, Guanghui Qin, Mahsa Yarmohammadi, Benjamin Van Durme
Abstract: Copy mechanisms are employed in sequence to sequence models (seq2seq) to generate reproductions of words from the input to the output. These frameworks, operating at the lexical type level, fail to provide an explicit alignment that records where each token was copied from. Further, they require contiguous token sequences from the input (spans) to be copied individually. We present a model with an explicit token-level copy operation and extend it to copying entire spans. Our model provides hard alignments between spans in the input and output, allowing for nontraditional applications of seq2seq, like information extraction. We demonstrate the approach on Nested Named Entity Recognition, achieving near state-of-the-art accuracy with an order of magnitude increase in decoding speed.
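A toy illustration of an explicit span-copy action space (the tensor layout below is hypothetical, not the authors' implementation): besides generating a vocabulary word or starting a copy at some input position, the decoder has a single "copy next" action that extends the current span by the following input token, which is what yields hard span alignments.

```python
import torch

def decode_step(gen_logits, copy_logits, copy_next_logit, prev_copied_idx):
    """Toy decoding step over a hypothetical action space:
    - generate any of V vocabulary words,
    - start copying at any of T input positions,
    - or take one CopyNext action extending the current copied span."""
    actions = torch.cat([gen_logits, copy_logits, copy_next_logit])
    a = int(actions.argmax())
    V, T = gen_logits.numel(), copy_logits.numel()
    if a < V:
        return ("generate", a)
    if a < V + T:
        return ("copy_start", a - V)
    return ("copy_next", prev_copied_idx + 1)   # contiguous span extension

V, T = 10, 6
out = decode_step(torch.randn(V), torch.randn(T), torch.randn(1),
                  prev_copied_idx=2)
print(out)
```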
15. A Visuospatial Dataset for Naturalistic Verb Learning [PDF] Back to TOC
Dylan Ebert, Ellie Pavlick
Abstract: We introduce a new dataset for training and evaluating grounded language models. Our data is collected within a virtual reality environment and is designed to emulate the quality of language data to which a pre-verbal child is likely to have access: That is, naturalistic, spontaneous speech paired with richly grounded visuospatial context. We use the collected data to compare several distributional semantics models for verb learning. We evaluate neural models based on 2D (pixel) features as well as feature-engineered models based on 3D (symbolic, spatial) features, and show that neither modeling approach achieves satisfactory performance. Our results are consistent with evidence from child language acquisition that emphasizes the difficulty of learning verbs from naive distributional data. We discuss avenues for future work on cognitively-inspired grounded language learning, and release our corpus with the intent of facilitating research on the topic.
16. DeSMOG: Detecting Stance in Media On Global Warming [PDF] Back to TOC
Yiwei Luo, Dallas Card, Dan Jurafsky
Abstract: Citing opinions is a powerful yet understudied strategy in argumentation. For example, an environmental activist might say, "Leading scientists agree that global warming is a serious concern," framing a clause which affirms their own stance ("that global warming is serious") as an opinion endorsed ("[scientists] agree") by a reputable source ("leading"). In contrast, a global warming denier might frame the same clause as the opinion of an untrustworthy source with a predicate connoting doubt: "Mistaken scientists claim [...]." Our work studies opinion-framing in the global warming (GW) debate, an increasingly partisan issue that has received little attention in NLP. We introduce DeSMOG, a dataset of stance-labeled GW sentences, and train a BERT classifier to study novel aspects of argumentation in how different sides of a debate represent their own and each other's opinions. From 56K news articles, we find that similar linguistic devices for self-affirming and opponent-doubting discourse are used across GW-accepting and skeptic media, though GW-skeptical media shows more opponent-doubt. We also find that authors often characterize sources as hypocritical, by ascribing opinions expressing the author's own view to source entities known to publicly endorse the opposing view. We release our stance dataset, model, and lexicons of framing devices for future work on opinion-framing and the automatic detection of GW stance.
17. Semi-Supervised Speech Recognition via Graph-based Temporal Classification [PDF] Back to TOC
Niko Moritz, Takaaki Hori, Jonathan Le Roux
Abstract: Semi-supervised learning has demonstrated promising results in automatic speech recognition (ASR) by self-training using a seed ASR model with pseudo-labels generated for unlabeled data. The effectiveness of this approach largely relies on the pseudo-label accuracy, for which typically only the 1-best ASR hypothesis is used. However, alternative ASR hypotheses of an N-best list can provide more accurate labels for an unlabeled speech utterance and also reflect uncertainties of the seed ASR model. In this paper, we propose a generalized form of the connectionist temporal classification (CTC) objective that accepts a graph representation of the training targets. The newly proposed graph-based temporal classification (GTC) objective is applied for self-training with WFST-based supervision, which is generated from an N-best list of pseudo-labels. In this setup, GTC is used to learn not only a temporal alignment, similarly to CTC, but also a label alignment to obtain the optimal pseudo-label sequence from the weighted graph. Results show that this approach can effectively exploit an N-best list of pseudo-labels with associated scores, outperforming standard pseudo-labeling by a large margin, with ASR results close to an oracle experiment in which the best hypotheses of the N-best lists are selected manually.
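For reference, standard CTC supervision in PyTorch is a single linear label sequence per utterance; GTC generalizes exactly this part by accepting a graph of alternatives built from the N-best pseudo-labels. The sketch below shows plain CTC only, not GTC itself.

```python
import torch
import torch.nn as nn

# Standard CTC: the supervision is one linear label sequence per utterance.
# GTC replaces this sequence with a WFST-style graph over N-best
# pseudo-label hypotheses (not implemented here).
T, B, C = 50, 2, 20                      # time steps, batch, classes
log_probs = torch.randn(T, B, C).log_softmax(-1)
targets = torch.randint(1, C, (B, 10))   # class 0 reserved as blank
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 10, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(loss.item())
```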
18. Designing learning experiences for online teaching and learning [PDF] Back to TOC
Nachamma Sockalingam, Junhua Liu
Abstract: Teaching is about constantly innovating strategies, ways and means to engage diverse students in active and meaningful learning. In line with this, SUTD adopts various student-centric teaching and learning methods and approaches. This means that our graduate/undergraduate instructors have to be ready to teach using these student-centric teaching and learning pedagogies. In this article, I share my experiences of redesigning this teaching course, typically conducted face-to-face, into a synchronous online course, and also invite one of the participants in the course to reflect on his experience as a student.
19. Three computational models and its equivalence [PDF] Back to TOC
Ciro Ivan Garcia Lopez
Abstract: The study of computability has its origin in Hilbert's conference of 1900, where a question adjacent to the ones he asked is to give a precise description of the notion of algorithm. In the search for a good definition arose three independent theories: Turing and the Turing machines, Gödel and the recursive functions, and Church and the Lambda Calculus. It was later established by Kleene that these classic models of computation are equivalent. This fact is widely accepted by many textbooks, yet the proof is usually omitted since it is tedious and unreadable. We intend to fill this gap by presenting the proof in a modern way, without forgetting the mathematical details.
20. Self-supervised Pre-training Reduces Label Permutation Instability of Speech Separation [PDF] Back to TOC
Sung-Feng Huang, Shun-Po Chuang, Da-Rong Liu, Yi-Chen Chen, Gene-Ping Yang, Hung-yi Lee
Abstract: Speech separation has been well-developed while there are still problems waiting to be solved. The main problem we focus on in this paper is the frequent label permutation switching of permutation invariant training (PIT). For N-speaker separation, there would be N! possible label permutations. How to stably select correct label permutations is a long-standing problem. In this paper, we utilize self-supervised pre-training to stabilize the label permutations. Among several types of self-supervised tasks, speech enhancement based pre-training tasks show significant effectiveness in our experiments. When using off-the-shelf pre-trained models, training duration could be shortened to one-third to two-thirds. Furthermore, even taking pre-training time into account, the entire training process could still be shorter without a performance drop when using a larger batch size.
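The N! issue is easy to see in code: a permutation invariant training (PIT) loss scores every speaker ordering and keeps the best, and which ordering wins can flip between training steps. A minimal sketch:

```python
import itertools
import torch

def pit_mse(est, ref):
    """Permutation invariant training (PIT) loss: try all N! speaker
    orderings and keep the best. est, ref: (N, T) estimated vs reference."""
    n = est.shape[0]
    losses = [
        torch.mean((est[list(p)] - ref) ** 2)
        for p in itertools.permutations(range(n))
    ]
    return torch.stack(losses).min()       # min over N! label permutations

est, ref = torch.randn(3, 100), torch.randn(3, 100)
print(pit_mse(est, ref))  # 3 speakers -> 6 permutations scored
```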
21. Fusion Models for Improved Visual Captioning [PDF] Back to TOC
Marimuthu Kalimuthu, Aditya Mogadala, Marius Mosbach, Dietrich Klakow
Abstract: Visual captioning aims to generate textual descriptions given images. Traditionally, captioning models are trained on human-annotated datasets such as Flickr30k and MS-COCO, which are limited in size and diversity. This limitation hinders the generalization capabilities of these models and also leaves them prone to mistakes. Language models can, however, be trained on vast amounts of freely available unlabelled data and have recently emerged as successful language encoders and coherent text generators. Meanwhile, several unimodal and multimodal fusion techniques have been proven to work well for natural language generation and automatic speech recognition. Building on these recent developments, and with the aim of improving the quality of generated captions, the contribution of our work in this paper is two-fold: First, we propose a generic multimodal model fusion framework for caption generation as well as emendation, where we utilize different fusion strategies to integrate a pretrained Auxiliary Language Model (AuxLM) within the traditional encoder-decoder visual captioning frameworks. Next, we employ the same fusion strategies to integrate a pretrained Masked Language Model (MLM), namely BERT, with a visual captioning model, viz. Show, Attend, and Tell, for emending both syntactic and semantic errors in captions. Our caption emendation experiments on three benchmark image captioning datasets, viz. Flickr8k, Flickr30k, and MSCOCO, show improvements over the baseline, indicating the usefulness of our proposed multimodal fusion strategies. Further, we perform a preliminary qualitative analysis on the emended captions and identify error categories based on the type of corrections.
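The abstract does not detail the fusion strategies; for intuition, the simplest common baseline is log-linear "shallow fusion" of the caption model's and the LM's next-token distributions at decoding time. The sketch below shows that baseline under that assumption, not necessarily one of the paper's variants.

```python
import torch

def shallow_fusion(cap_logits, lm_logits, lam=0.3):
    """Generic shallow fusion (a common baseline, not necessarily the
    paper's method): combine caption-model and language-model next-token
    distributions log-linearly, then pick the best token."""
    scores = cap_logits.log_softmax(-1) + lam * lm_logits.log_softmax(-1)
    return scores.argmax(-1)

V = 100  # toy vocabulary size
next_tok = shallow_fusion(torch.randn(V), torch.randn(V))
print(int(next_tok))
```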