Table of Contents
1. LIREx: Augmenting Language Inference with Relevant Explanation [PDF]
2. Building and Using Personal Knowledge Graph to Improve Suicidal Ideation Detection on Social Media [PDF]
3. Exploring Thematic Coherence in Fake News [PDF]
4. You Are What You Tweet: Profiling Users by Past Tweets to Improve Hate Speech Detection [PDF]
5. Show or Tell? Demonstration is More Robust to Changes in Shared Perception than Explanation [PDF]
6. Using Meta-Knowledge Mined from Identifiers to Improve Intent Recognition in Neuro-Symbolic Algorithms [PDF]
7. Discovering New Intents with Deep Aligned Clustering [PDF]
8. No Budget? Don't Flex! Cost Consideration when Planning to Adopt NLP for Your Business [PDF]
9. R$^2$-Net: Relation of Relation Learning Network for Sentence Semantic Matching [PDF]
10. Multilingual Evidence Retrieval and Fact Verification to Combat Global Disinformation: The Power of Polyglotism [PDF]
11. Learning from the Best: Rationalizing Prediction by Adversarial Information Calibration [PDF]
12. Multi-type Disentanglement without Adversarial Training [PDF]
13. A Lightweight Neural Model for Biomedical Entity Linking [PDF]
14. Clinical Temporal Relation Extraction with Probabilistic Soft Logic Regularization and Global Inference [PDF]
15. Focusing More on Conflicts with Mis-Predictions Helps Language Pre-Training [PDF]
16. Building domain specific lexicon based on TikTok comment dataset [PDF]
17. Improving Multilingual Neural Machine Translation For Low-Resource Languages: French-, English- Vietnamese [PDF]
18. DialogXL: All-in-One XLNet for Multi-Party Conversation Emotion Recognition [PDF]
19. Pre-Training Transformers as Energy-Based Cloze Models [PDF]
20. Exploring Transfer Learning For End-to-End Spoken Language Understanding [PDF]
Abstracts
1. LIREx: Augmenting Language Inference with Relevant Explanation [PDF]
Xinyan Zhao, V.G.Vinod Vydiswaran
Abstract: Natural language explanations (NLEs) are a special form of data annotation in which annotators identify rationales (the most significant text tokens) when assigning labels to data instances, and write out explanations for the labels in natural language based on those rationales. NLEs have been shown to capture human reasoning better, but have not proven as beneficial for natural language inference (NLI). In this paper, we analyze two primary flaws in the way NLEs are currently used to train explanation generators for language inference tasks. We find that the explanation generators do not take into account the variability inherent in human explanation of labels, and that current explanation generation models generate spurious explanations. To overcome these limitations, we propose a novel framework, LIREx, that incorporates both a rationale-enabled explanation generator and an instance selector that selects only relevant, plausible NLEs to augment NLI models. When evaluated on the standardized SNLI data set, LIREx achieved an accuracy of 91.87%, an improvement of 0.32 over the baseline and a match for the best-reported performance on the data set. It also achieves significantly better performance than previous studies when transferred to the out-of-domain MultiNLI data set. Qualitative analysis shows that LIREx generates flexible, faithful, and relevant NLEs that allow the model to be more robust to spurious explanations. The code is available at this https URL.
2. Building and Using Personal Knowledge Graph to Improve Suicidal Ideation Detection on Social Media [PDF]
Lei Cao, Huijun Zhang, Ling Feng
Abstract: A large number of individuals worldwide suffer from suicidal ideation, and there are many causes behind why an individual might do so. Since social media is the most popular platform for self-expression, emotion release, and personal interaction, individuals may exhibit a number of symptoms of suicidal ideation there. Nevertheless, challenges from both the data and knowledge aspects remain as obstacles, constraining social media-based detection performance. Data implicitness and sparsity make it difficult to discover the true inner intentions of individuals from their posts. Inspired by psychological studies, we build and unify a high-level suicide-oriented knowledge graph with deep neural networks for suicidal ideation detection on social media. We further design a two-layered attention mechanism to explicitly reason about and establish key risk factors for an individual's suicidal ideation. The performance study on microblog and Reddit data shows that: 1) with the constructed personal knowledge graph, social media-based suicidal ideation detection can achieve over 93% accuracy; and 2) among the six categories of personal factors, post, personality, and experience are the top-3 key indicators. Under these categories, posted text, stress level, stress duration, posted image, and ruminant thinking contribute to one's suicidal ideation detection.
3. Exploring Thematic Coherence in Fake News [PDF]
Martins Samuel Dogo, Deepak P, Anna Jurek-Loughrey
Abstract: The spread of fake news remains a serious global issue; understanding and curtailing it is paramount. One way of differentiating between deceptive and truthful stories is by analyzing their coherence. This study explores the use of topic models to analyze the coherence of cross-domain news shared online. Experimental results on seven cross-domain datasets demonstrate that fake news shows a greater thematic deviation between its opening sentences and its remainder.
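As a rough illustration of the measurement, below is a sketch that scores thematic deviation between an article's opening sentences and its remainder. The LDA topic model, Jensen-Shannon divergence, and toy documents are assumptions for illustration, not the paper's exact pipeline.

# Fit a topic model, then compare the topic mixture of the opening
# sentences against the rest of the article; a larger divergence
# suggests greater thematic deviation.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from scipy.spatial.distance import jensenshannon

docs = [
    "Scientists observed a new exoplanet. The planet orbits a red dwarf. "
    "Follow-up observations will study its atmosphere.",
    "Miracle cure discovered, doctors stunned. Meanwhile, the stock market "
    "rallied. Celebrities reacted on social media.",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(X)

def thematic_deviation(text, n_opening=1):
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    opening = ". ".join(sentences[:n_opening])
    remainder = ". ".join(sentences[n_opening:]) or opening
    theta = lda.transform(vectorizer.transform([opening, remainder]))
    return jensenshannon(theta[0], theta[1])  # 0 = same topics; higher = drift

for d in docs:
    print(round(float(thematic_deviation(d)), 3))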
4. You Are What You Tweet: Profiling Users by Past Tweets to Improve Hate Speech Detection [PDF]
Prateek Chaudhry, Matthew Lease
Abstract: Hate speech detection research has predominantly focused on purely content-based methods, without exploiting any additional context. We briefly critique pros and cons of this task formulation. We then investigate profiling users by their past utterances as an informative prior to better predict whether new utterances constitute hate speech. To evaluate this, we augment three Twitter hate speech datasets with additional timeline data, then embed this additional context into a strong baseline model. Promising results suggest merit for further investigation, though analysis is complicated by differences in annotation schemes and processes, as well as Twitter API limitations and data sharing policies.
5. Show or Tell? Demonstration is More Robust to Changes in Shared Perception than Explanation [PDF]
Theodore R. Sumers, Mark K. Ho, Thomas L. Griffiths
Abstract: Successful teaching entails a complex interaction between a teacher and a learner. The teacher must select and convey information based on what they think the learner perceives and believes. Teaching always involves misaligned beliefs, but studies of pedagogy often focus on situations where teachers and learners share perceptions. Nonetheless, a teacher and learner may not always experience or attend to the same aspects of the environment. Here, we study how misaligned perceptions influence communication. We hypothesize that the efficacy of different forms of communication depends on the shared perceptual state between teacher and learner. We develop a cooperative teaching game to test whether concrete mediums (demonstrations, or "showing") are more robust than abstract ones (language, or "telling") when the teacher and learner are not perceptually aligned. We find evidence that (1) language-based teaching is more affected by perceptual misalignment, but (2) demonstration-based teaching is less likely to convey nuanced information. We discuss implications for human pedagogy and machine learning.
6. Using Meta-Knowledge Mined from Identifiers to Improve Intent Recognition in Neuro-Symbolic Algorithms [PDF]
Claudio Pinhanez, Paulo Cavalin, Victor Ribeiro, Heloisa Candello, Julio Nogima, Ana Appel, Mauro Pichiliani, Maira Gatti de Bayser, Melina Guerra, Henrique Ferreira, Gabriel Malfatti
Abstract: In this paper we explore the use of meta-knowledge embedded in intent identifiers to improve intent recognition in conversational systems. As evidenced by an analysis of thousands of real-world chatbots and by interviews with professional chatbot curators, developers and domain experts tend to organize the set of chatbot intents by identifying them using proto-taxonomies, i.e., meta-knowledge connecting high-level, symbolic concepts shared across different intents. By using neuro-symbolic algorithms able to incorporate such proto-taxonomies to expand intent representation, we show that this mined meta-knowledge can improve accuracy in intent recognition. In a dataset with intents and example utterances from hundreds of professional chatbots, we saw improvements of more than 10% in the equal error rate (EER) in almost a third of the chatbots when we applied those algorithms, compared to a baseline of the same algorithms without the meta-knowledge. The meta-knowledge proved to be even more relevant in detecting out-of-scope utterances, decreasing the false acceptance rate (FAR) by more than 20% in about half of the chatbots. The experiments demonstrate that such symbolic meta-knowledge structures can be effectively mined and used by neuro-symbolic algorithms, apparently by incorporating into the learning process higher-level structures of the problem being solved. Based on these results, we also discuss how the use of mined meta-knowledge can be an answer to the challenge of knowledge acquisition in neuro-symbolic algorithms.
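For readers unfamiliar with the reported metric, a small self-contained sketch of the equal error rate (EER), the operating point where false acceptance and false rejection rates meet; the scores and labels are toy values.

import numpy as np

def eer(scores, labels):
    s, y = np.asarray(scores, dtype=float), np.asarray(labels)
    best_gap, best_rate = np.inf, 1.0
    for t in np.sort(np.unique(s)):
        far = np.mean(s[y == 0] >= t)  # negatives wrongly accepted
        frr = np.mean(s[y == 1] < t)   # positives wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, best_rate = abs(far - frr), (far + frr) / 2
    return best_rate

print(eer([0.1, 0.4, 0.35, 0.8, 0.65, 0.2], [0, 0, 1, 1, 1, 0]))  # ~0.33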
7. Discovering New Intents with Deep Aligned Clustering [PDF]
Hanlei Zhang, Hua Xu, Ting-En Lin, Rui Lv
Abstract: Discovering new intents is a crucial task in a dialogue system. Most existing methods are limited in transferring the prior knowledge from known intents to new intents. These methods also have difficulties in providing high-quality supervised signals to learn clustering-friendly features for grouping unlabeled intents. In this work, we propose an effective method (Deep Aligned Clustering) to discover new intents with the aid of limited known intent data. Firstly, we leverage a few labeled known intent samples as prior knowledge to pre-train the model. Then, we perform k-means to produce cluster assignments as pseudo-labels. Moreover, we propose an alignment strategy to tackle the label inconsistency during clustering assignments. Finally, we learn the intent representations under the supervision of the aligned pseudo-labels. With an unknown number of new intents, we predict the number of intent categories by eliminating low-confidence intent-wise clusters. Extensive experiments on two benchmark datasets show that our method is more robust and achieves substantial improvements over the state-of-the-art methods.(Code available at this https URL)
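A minimal sketch of the pseudo-label alignment step, assuming a Hungarian-style matching between cluster ids from successive training rounds (the abstract does not spell out the algorithm; the labels below are toy data).

import numpy as np
from scipy.optimize import linear_sum_assignment

def align_pseudo_labels(prev_labels, curr_labels, n_clusters):
    # Overlap matrix: how often previous cluster i co-occurs with current j.
    overlap = np.zeros((n_clusters, n_clusters), dtype=np.int64)
    for p, c in zip(prev_labels, curr_labels):
        overlap[p, c] += 1
    row, col = linear_sum_assignment(-overlap)  # maximize total overlap
    mapping = dict(zip(col, row))               # current id -> previous id
    return np.array([mapping[c] for c in curr_labels])

prev = np.array([0, 0, 1, 1, 2, 2])
curr = np.array([2, 2, 0, 0, 1, 1])  # same partition, permuted cluster ids
print(align_pseudo_labels(prev, curr, 3))  # -> [0 0 1 1 2 2]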
8. No Budget? Don't Flex! Cost Consideration when Planning to Adopt NLP for Your Business [PDF]
Made Nindyatama Nityasya, Haryo Akbarianto Wibowo, Radityo Eko Prasojo, Alham Fikri Aji
Abstract: Recent advances in Natural Language Processing (NLP) have largely pushed deep transformer-based models as the go-to state-of-the-art technique without much regard to production and utilization cost. Companies planning to adopt these methods into their business face difficulties because of the lack of machine and human resources to build them. In this work, we compare both the performance and the cost of classical learning algorithms to the latest ones in common sequence and text labeling tasks. We find that classical models often perform on par with deep neural ones despite the lower cost. We argue that under many circumstances the smaller and lighter models fit better for AI-pivoting businesses, and we call for more research into low-cost models, especially for under-resourced languages.
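A minimal sketch of the kind of low-cost classical baseline in question, TF-IDF features with a linear classifier; the toy texts and labels are placeholders.

from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["great product, works as advertised",
         "terrible support, would not recommend",
         "fast shipping and solid build",
         "broke after two days, waste of money"]
labels = [1, 0, 1, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(texts, labels)
print(clf.predict(["solid product, shipped fast"]))  # expect [1]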
9. R$^2$-Net: Relation of Relation Learning Network for Sentence Semantic Matching [PDF]
Kun Zhang, Le Wu, Guangyi Lv, Meng Wang, Enhong Chen, Shulan Ruan
Abstract: Sentence semantic matching is one of the fundamental tasks in natural language processing, which requires an agent to determine the semantic relation among input sentences. Recently, deep neural networks have achieved impressive performance in this area, especially BERT. Despite the effectiveness of these models, most of them treat output labels as meaningless one-hot vectors, underestimating the semantic information and guidance of the relations that these labels reveal, especially for tasks with a small number of labels. To address this problem, we propose a Relation of Relation Learning Network (R2-Net) for sentence semantic matching. Specifically, we first employ BERT to encode the input sentences from a global perspective. Then a CNN-based encoder is designed to capture keyword and phrase information from a local perspective. To fully leverage labels for better relation information extraction, we introduce a self-supervised relation-of-relation classification task to guide R2-Net to consider more about labels. Meanwhile, a triplet loss is employed to distinguish the intra-class and inter-class relations at a finer granularity. Empirical experiments on two sentence semantic matching tasks demonstrate the superiority of our proposed model. As a byproduct, we have released the code to facilitate further research.
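A minimal sketch of the triplet loss mentioned above, which pulls same-relation representations together and pushes different-relation ones apart; the margin value and random embeddings are illustrative.

import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=1.0):
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

a, p, n = torch.randn(8, 128), torch.randn(8, 128), torch.randn(8, 128)
print(triplet_loss(a, p, n).item())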
10. Multilingual Evidence Retrieval and Fact Verification to Combat Global Disinformation: The Power of Polyglotism [PDF]
Denisa A.O. Roberts
Abstract: This article investigates multilingual evidence retrieval and fact verification as a step toward combating global disinformation; to the best of our knowledge, this is the first effort of its kind. A 400-example mixed-language English-Romanian dataset is created for cross-lingual transfer learning evaluation. We make code, datasets, and trained models available upon publication.
11. Learning from the Best: Rationalizing Prediction by Adversarial Information Calibration [PDF]
Lei Sha, Oana-Maria Camburu, Thomas Lukasiewicz
Abstract: Explaining the predictions of AI models is paramount in safety-critical applications, such as in legal or medical domains. One form of explanation for a prediction is an extractive rationale, i.e., a subset of features of an instance that lead the model to give its prediction on the instance. Previous works on generating extractive rationales usually employ a two-phase model: a selector that selects the most important features (i.e., the rationale) followed by a predictor that makes the prediction based exclusively on the selected features. One disadvantage of these works is that the main signal for learning to select features comes from the comparison of the answers given by the predictor and the ground-truth answers. In this work, we propose to squeeze more information from the predictor via an information calibration method. More precisely, we train two models jointly: one is a typical neural model that solves the task at hand in an accurate but black-box manner, and the other is a selector-predictor model that additionally produces a rationale for its prediction. The first model is used as a guide to the second model. We use an adversarial-based technique to calibrate the information extracted by the two models such that the difference between them is an indicator of the missed or over-selected features. In addition, for natural language tasks, we propose to use a language-model-based regularizer to encourage the extraction of fluent rationales. Experimental results on a sentiment analysis task as well as on three tasks from the legal domain show the effectiveness of our approach to rationale extraction.
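A heavily simplified sketch of the adversarial calibration idea: a discriminator tries to tell the guide model's features from the selector-predictor's, and fooling it pulls the two feature distributions together. All module shapes are assumptions; this is not the authors' architecture.

import torch
import torch.nn as nn

dim = 64
guide = nn.Linear(100, dim)          # stand-in for the accurate black-box model
rationale_net = nn.Linear(100, dim)  # stand-in for the selector-predictor
disc = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))
bce = nn.BCEWithLogitsLoss()

x = torch.randn(16, 100)
f_guide, f_rat = guide(x), rationale_net(x)
# Discriminator step: guide features are "real", rationale features "fake".
d_loss = bce(disc(f_guide.detach()), torch.ones(16, 1)) + \
         bce(disc(f_rat.detach()), torch.zeros(16, 1))
# Calibration step: rationale features should be indistinguishable from guide's.
g_loss = bce(disc(f_rat), torch.ones(16, 1))
print(d_loss.item(), g_loss.item())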
12. Multi-type Disentanglement without Adversarial Training [PDF]
Lei Sha, Thomas Lukasiewicz
Abstract: Controlling the style of natural language by disentangling the latent space is an important step towards interpretable machine learning. After the latent space is disentangled, the style of a sentence can be transformed by tuning the style representation without affecting other features of the sentence. Previous works usually use adversarial training to guarantee that disentangled vectors do not affect each other. However, adversarial methods are difficult to train. Especially when there are multiple features (e.g., sentiment, or tense, which we call style types in this paper), each feature requires a separate discriminator for extracting a disentangled style vector corresponding to that feature. In this paper, we propose a unified distribution-controlling method, which provides each specific style value (the value of style types, e.g., positive sentiment, or past tense) with a unique representation. This method contributes a solid theoretical basis to avoid adversarial training in multi-type disentanglement. We also propose multiple loss functions to achieve a style-content disentanglement as well as a disentanglement among multiple style types. In addition, we observe that if two different style types always have some specific style values that occur together in the dataset, they will affect each other when transferring the style values. We call this phenomenon training bias, and we propose a loss function to alleviate such training bias while disentangling multiple types. We conduct experiments on two datasets (Yelp service reviews and Amazon product reviews) to evaluate the style-disentangling effect and the unsupervised style transfer performance on two style types: sentiment and tense. The experimental results show the effectiveness of our model.
13. A Lightweight Neural Model for Biomedical Entity Linking [PDF]
Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek
Abstract: Biomedical entity linking aims to map biomedical mentions, such as diseases and drugs, to standard entities in a given knowledge base. The specific challenge in this context is that the same biomedical entity can have a wide range of names, including synonyms, morphological variations, and names with different word orderings. Recently, BERT-based methods have advanced the state-of-the-art by allowing for rich representations of word sequences. However, they often have hundreds of millions of parameters and require heavy computing resources, which limits their applications in resource-limited scenarios. Here, we propose a lightweight neural method for biomedical entity linking, which needs just a fraction of the parameters of a BERT model and much less computing resources. Our method uses a simple alignment layer with attention mechanisms to capture the variations between mention and entity names. Yet, we show that our model is competitive with previous work on standard evaluation benchmarks.
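A minimal sketch of a simple attention-based alignment layer between mention tokens and a candidate entity name; the dimensions and the final cosine score are illustrative assumptions, not the paper's exact design.

import torch
import torch.nn.functional as F

def alignment_score(mention_emb, entity_emb):
    # mention_emb: (m, d) token embeddings; entity_emb: (n, d).
    attn = F.softmax(mention_emb @ entity_emb.T, dim=-1)  # (m, n) alignment
    aligned = attn @ entity_emb                           # soft-aligned entity tokens
    return F.cosine_similarity(mention_emb.mean(0), aligned.mean(0), dim=0)

mention = torch.randn(3, 50)  # e.g., tokens of "heart attack"
entity = torch.randn(4, 50)   # e.g., tokens of "myocardial infarction (disorder)"
print(alignment_score(mention, entity).item())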
14. Clinical Temporal Relation Extraction with Probabilistic Soft Logic Regularization and Global Inference [PDF]
Yichao Zhou, Yu Yan, Rujun Han, J. Harry Caufield, Kai-Wei Chang, Yizhou Sun, Peipei Ping, Wei Wang
Abstract: There has been a steady need in the medical community to precisely extract the temporal relations between clinical events. In particular, temporal information can facilitate a variety of downstream applications such as case report retrieval and medical question answering. Existing methods either require expensive feature engineering or are incapable of modeling the global relational dependencies among the events. In this paper, we propose a novel method, Clinical Temporal ReLation Extraction with Probabilistic Soft Logic Regularization and Global Inference (CTRL-PG), to tackle the problem at the document level. Extensive experiments on two benchmark datasets, I2B2-2012 and TB-Dense, demonstrate that CTRL-PG significantly outperforms baseline methods for temporal relation extraction.
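A minimal sketch of what a probabilistic-soft-logic-style regularizer can look like for temporal relations, using the Lukasiewicz relaxation of a transitivity rule (BEFORE(a,b) and BEFORE(b,c) imply BEFORE(a,c)); the relaxation choice and the probabilities are illustrative.

import torch

def transitivity_penalty(p_ab, p_bc, p_ac):
    # Lukasiewicz logic: (x AND y) = max(0, x + y - 1); (b -> h) = min(1, 1 - b + h).
    body = torch.clamp(p_ab + p_bc - 1.0, min=0.0)
    satisfaction = torch.clamp(1.0 - body + p_ac, max=1.0)
    return 1.0 - satisfaction  # distance to satisfaction, added to the training loss

p_ab, p_bc, p_ac = torch.tensor(0.9), torch.tensor(0.8), torch.tensor(0.2)
print(transitivity_penalty(p_ab, p_bc, p_ac).item())  # 0.5: the rule is violated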
15. Focusing More on Conflicts with Mis-Predictions Helps Language Pre-Training [PDF]
Chen Xing, Wencong Xiao, Yong Li, Wei Lin
Abstract: In this work, we propose to improve the effectiveness of language pre-training methods with the help of mis-predictions during pre-training. Neglecting words in the input sentence whose semantics conflict with a mis-prediction is a likely reason the mis-prediction is generated at pre-training time. Therefore, we hypothesize that mis-predictions during pre-training can act as detectors of the ill focuses of the model. If we train the model to focus more on the conflicts with the mis-predictions while focusing less on the rest of the words in the input sentence, the mis-predictions can be more easily corrected and the entire model can be better trained. Towards this end, we introduce Focusing Less on Context of Mis-predictions (McMisP). In McMisP, we record the co-occurrence information between words to detect conflicting words with mis-predictions in an unsupervised way. Then McMisP uses such information to guide the attention modules when a mis-prediction occurs. Specifically, several attention modules in the Transformer are optimized to focus more on words in the input sentence that have rarely co-occurred with the mis-predictions, and vice versa. Results show that McMisP significantly expedites BERT and ELECTRA and improves their performance on downstream tasks.
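A toy sketch of the unsupervised co-occurrence statistics the method builds on: words that rarely co-occur with a mis-predicted token are candidate "conflicts". The window size, corpus, and scoring function are all illustrative assumptions.

from collections import Counter

corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "sat", "on", "the", "log"]]
window = 3
cooc = Counter()
for sent in corpus:
    for i in range(len(sent)):
        for j in range(i + 1, min(i + window, len(sent))):
            cooc[frozenset((sent[i], sent[j]))] += 1

def conflict_score(word, context_word):
    # Rarely co-occurring pairs get scores near 1 (likely conflict).
    return 1.0 / (1 + cooc[frozenset((word, context_word))])

print(conflict_score("cat", "sat"), conflict_score("cat", "log"))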
16. Building domain specific lexicon based on TikTok comment dataset [PDF]
Hao Jiaxiang
Abstract: In the sentiment analysis task, predicting the sentiment tendency of a sentence is an important branch. Previous research has focused more on sentiment analysis in English, for example, analyzing the sentiment tendency of sentences based on their Valence, Arousal, and Dominance. Emotional tendency also differs between the two languages; for example, the differing sentence order of Chinese and English may convey different emotions. This paper tries a method that builds a domain-specific lexicon, so that the model can classify Chinese words by emotional tendency. In this approach, based on [13], an ultra-dense space embedding table is trained through word embeddings of Chinese TikTok reviews and emotional lexicon sources (seed words). The result of the model is a domain-specific lexicon, which presents the emotional tendency of words. I collected Chinese TikTok comments as training data. Comparing the training results against the PCA method to evaluate the model's performance in Chinese sentiment classification, the results show that the model performs well for Chinese. The source code has been released on github: this https URL
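A minimal sketch of the lexicon-induction idea, replacing the ultra-dense projection with plain cosine similarity to seed words for brevity; the vectors here are random stand-ins for trained embeddings.

import numpy as np

rng = np.random.default_rng(0)
vocab = ["好看", "难看", "喜欢", "讨厌", "一般"]
emb = {w: rng.normal(size=50) for w in vocab}  # stand-in for trained vectors
pos_seeds, neg_seeds = ["好看", "喜欢"], ["难看", "讨厌"]

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def polarity(word):
    p = np.mean([cos(emb[word], emb[s]) for s in pos_seeds])
    n = np.mean([cos(emb[word], emb[s]) for s in neg_seeds])
    return p - n  # > 0 leans positive, < 0 leans negative

print({w: round(polarity(w), 3) for w in vocab})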
17. Improving Multilingual Neural Machine Translation For Low-Resource Languages: French-, English- Vietnamese [PDF]
Thi-Vinh Ngo, Phuong-Thai Nguyen, Thanh-Le Ha, Khac-Quy Dinh, Le-Minh Nguyen
Abstract: Prior works have demonstrated that a low-resource language pair can benefit from multilingual machine translation (MT) systems, which rely on the joint training of many language pairs. This paper proposes two simple strategies to address the rare word issue in multilingual MT systems for two low-resource language pairs: French-Vietnamese and English-Vietnamese. The first strategy dynamically learns the word similarity of tokens in the shared space among source languages, while the second attempts to augment the translation ability of rare words by updating their embeddings during training. Besides, we leverage monolingual data for multilingual MT systems to increase the amount of synthetic parallel corpora while dealing with the data sparsity problem. We have shown significant improvements of up to +1.62 and +2.54 BLEU points over the bilingual baseline systems for both language pairs and released our datasets for the research community.
18. DialogXL: All-in-One XLNet for Multi-Party Conversation Emotion Recognition [PDF]
Weizhou Shen, Junqing Chen, Xiaojun Quan, Zhixian Xie
Abstract: This paper presents our pioneering effort for emotion recognition in conversation (ERC) with pre-trained language models. Unlike regular documents, conversational utterances appear alternately from different parties and are usually organized as hierarchical structures in previous work. Such structures are not conducive to the application of pre-trained language models such as XLNet. To address this issue, we propose an all-in-one XLNet model, namely DialogXL, with enhanced memory to store longer historical context and dialog-aware self-attention to deal with the multi-party structures. Specifically, we first modify the recurrence mechanism of XLNet from segment-level to utterance-level in order to better model the conversational data. Second, we introduce dialog-aware self-attention in replacement of the vanilla self-attention in XLNet to capture useful intra- and inter-speaker dependencies. Extensive experiments are conducted on four ERC benchmarks with mainstream models presented for comparison. The experimental results show that the proposed model outperforms the baselines on all the datasets. Several other experiments such as ablation study and error analysis are also conducted and the results confirm the role of the critical modules of DialogXL.
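A minimal sketch of the masking idea behind dialog-aware self-attention: per-position speaker ids yield intra- and inter-speaker masks that restrict which positions attend to which. This only illustrates the masks, not DialogXL's full attention module.

import torch

speakers = torch.tensor([0, 1, 0, 1, 1])  # speaker id per position

intra = speakers.unsqueeze(0) == speakers.unsqueeze(1)  # (5, 5) same-speaker mask
scores = torch.randn(5, 5)                              # raw attention scores
intra_attn = torch.softmax(scores.masked_fill(~intra, float("-inf")), dim=-1)
inter_attn = torch.softmax(scores.masked_fill(intra, float("-inf")), dim=-1)
print(intra_attn)  # rows attend only to the same speaker; inter_attn is the converse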
19. Pre-Training Transformers as Energy-Based Cloze Models [PDF] 返回目录
Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning
Abstract: We introduce Electric, an energy-based cloze model for representation learning over text. Like BERT, it is a conditional generative model of tokens given their contexts. However, Electric does not use masking or output a full distribution over tokens that could occur in a context. Instead, it assigns a scalar energy score to each input token indicating how likely it is given its context. We train Electric using an algorithm based on noise-contrastive estimation and elucidate how this learning objective is closely related to the recently proposed ELECTRA pre-training method. Electric performs well when transferred to downstream tasks and is particularly effective at producing likelihood scores for text: it re-ranks speech recognition n-best lists better than language models and much faster than masked language models. Furthermore, it offers a clearer and more principled view of what ELECTRA learns during pre-training.
摘要:我们提出了Electric,一种用于文本表示学习的基于能量的完形填空模型。与BERT一样,它是给定上下文条件下词元的条件生成模型。但Electric不使用掩码,也不在上下文中可能出现的词元上输出完整分布,而是为每个输入词元分配一个标量能量分数,表示该词元在其上下文中出现的可能性。我们使用基于噪声对比估计的算法训练Electric,并阐明了该学习目标与最近提出的ELECTRA预训练方法的紧密联系。Electric迁移到下游任务时表现良好,尤其擅长为文本生成似然分数:它对语音识别n-best列表的重排序优于语言模型,且比掩码语言模型快得多。此外,它为理解ELECTRA在预训练中学到了什么提供了更清晰、更符合原理的视角。
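The training objective can be pictured as a binary noise-contrastive loss over per-token energies: real tokens should receive low energy, noise tokens high. A minimal sketch follows, assuming 1-D tensors of energies; it omits the correction term for the noise distribution's log-probability that full NCE uses, and all names are hypothetical.

```python
import torch
import torch.nn.functional as F

def nce_loss(energy_real: torch.Tensor, energy_noise: torch.Tensor) -> torch.Tensor:
    """Binary classification of token positions as data vs. noise.
    Low energy should mean 'likely real', so logits are negated energies."""
    logits = torch.cat([-energy_real, -energy_noise])
    labels = torch.cat([torch.ones_like(energy_real),    # 1 = real token
                        torch.zeros_like(energy_noise)]) # 0 = noise token
    return F.binary_cross_entropy_with_logits(logits, labels)
```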
20. Exploring Transfer Learning For End-to-End Spoken Language Understanding [PDF] 返回目录
Subendhu Rongali, Beiye Liu, Liwei Cai, Konstantine Arkoudas, Chengwei Su, Wael Hamza
Abstract: Voice Assistants such as Alexa, Siri, and Google Assistant typically use a two-stage Spoken Language Understanding pipeline: first, an Automatic Speech Recognition (ASR) component to process customer speech and generate text transcriptions, followed by a Natural Language Understanding (NLU) component to map transcriptions to an actionable hypothesis. An end-to-end (E2E) system that goes directly from speech to a hypothesis is a more attractive option. These systems were shown to be smaller, faster, and better optimized. However, they require massive amounts of end-to-end training data and, in addition, do not take advantage of already-available ASR and NLU training data. In this work, we propose an E2E system that is designed to jointly train on multiple speech-to-text tasks, such as ASR (speech-transcription) and SLU (speech-hypothesis), and text-to-text tasks, such as NLU (text-hypothesis). We call this the Audio-Text All-Task (AT-AT) Model and we show that it beats the performance of E2E models trained on individual tasks, especially ones trained on limited data. We show this result on an internal music dataset and two public datasets, FluentSpeech and SNIPS Audio, where we achieve state-of-the-art results. Since our model can process both speech and text input sequences and learn to predict a target sequence, it also allows us to do zero-shot E2E SLU by training on only text-hypothesis data (without any speech) from a new domain. We evaluate this ability of our model on the Facebook TOP dataset and set a new benchmark for zero-shot E2E performance. We will soon release the audio data collected for the TOP dataset for future research.
摘要:Alexa、Siri和Google Assistant等语音助手通常使用两阶段的口语理解流水线:首先由自动语音识别(ASR)组件处理用户语音并生成文本转录,然后由自然语言理解(NLU)组件将转录映射为可执行的假设。直接从语音得到假设的端到端(E2E)系统是更有吸引力的选择,这类系统已被证明更小、更快且优化得更好。然而,它们需要大量端到端训练数据,而且无法利用已有的ASR和NLU训练数据。在这项工作中,我们提出了一种E2E系统,可在多个语音到文本任务(如ASR(语音-转录)和SLU(语音-假设))以及文本到文本任务(如NLU(文本-假设))上联合训练。我们将其称为Audio-Text All-Task(AT-AT)模型,并证明其性能优于在单个任务上训练的E2E模型,尤其是在数据有限的情况下。我们在一个内部音乐数据集以及FluentSpeech和SNIPS Audio两个公共数据集上展示了这一结果,并取得了最先进的性能。由于我们的模型既能处理语音输入序列也能处理文本输入序列并学习预测目标序列,它还允许我们仅用新领域的文本-假设数据(不含任何语音)进行训练,从而实现零样本E2E SLU。我们在Facebook TOP数据集上评估了模型的这一能力,并为零样本E2E性能设立了新基准。我们将很快发布为TOP数据集收集的音频数据,以供未来研究。
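The joint training recipe the abstract describes can be sketched as a loop that samples one task per step and feeds its batch through a shared encoder-decoder. This is a generic multi-task skeleton under an assumed interface (`model(src, tgt, task=...)` returning a loss), not the AT-AT code.

```python
import random
import torch

def train_multitask(model, loaders: dict, num_steps: int, lr: float = 1e-4):
    """loaders maps a task name ('asr', 'slu', 'nlu', ...) to an iterator of
    (src, tgt) batches; speech tasks yield audio features, text tasks token ids.
    Each step samples one task and updates the shared encoder-decoder on it."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(num_steps):
        task = random.choice(list(loaders))  # pick a task for this step
        src, tgt = next(loaders[task])       # task-specific batch
        loss = model(src, tgt, task=task)    # assumed interface: returns a loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Training on a text-only `nlu` loader from a new domain is what would enable the zero-shot SLU transfer the abstract mentions.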
21. A Closer Look at the Robustness of Vision-and-Language Pre-trained Models [PDF] 返回目录
Linjie Li, Zhe Gan, Jingjing Liu
Abstract: Large-scale pre-trained multimodal transformers, such as ViLBERT and UNITER, have propelled the state of the art in vision-and-language (V+L) research to a new level. Although they achieve impressive performance on standard tasks, it still remains unclear to date how robust these pre-trained models are. To investigate, we conduct a host of thorough evaluations on existing pre-trained models over 4 different types of V+L specific model robustness: (i) Linguistic Variation; (ii) Logical Reasoning; (iii) Visual Content Manipulation; and (iv) Answer Distribution Shift. Interestingly, by standard model finetuning, pre-trained V+L models already exhibit better robustness than many task-specific state-of-the-art methods. To further enhance model robustness, we propose Mango, a generic and efficient approach that learns a Multimodal Adversarial Noise GeneratOr in the embedding space to fool pre-trained V+L models. Differing from previous studies that focused on one specific type of robustness, Mango is task-agnostic and enables a universal performance lift for pre-trained models over diverse tasks designed to evaluate broad aspects of robustness. Comprehensive experiments demonstrate that Mango achieves new state of the art on 7 out of 9 robustness benchmarks, surpassing existing methods by a significant margin. As the first comprehensive study on V+L robustness, this work puts the robustness of pre-trained models into sharper focus, pointing to new directions for future study.
摘要:ViLBERT和UNITER等大规模预训练多模态Transformer已将视觉-语言(V+L)研究的最新水平推向新高度。尽管这些预训练模型在标准任务上表现出色,但它们的鲁棒性如何至今仍不清楚。为此,我们针对现有预训练模型,从4种V+L特有的模型鲁棒性维度开展了全面评估:(i)语言变异;(ii)逻辑推理;(iii)视觉内容操纵;(iv)答案分布偏移。有趣的是,仅通过标准微调,预训练的V+L模型就已表现出优于许多任务专用最新方法的鲁棒性。为进一步增强模型鲁棒性,我们提出了Mango,这是一种通用且高效的方法,它在嵌入空间中学习一个多模态对抗噪声生成器(Multimodal Adversarial Noise GeneratOr)来欺骗预训练的V+L模型。与以往专注于某一特定鲁棒性类型的研究不同,Mango与任务无关,能在旨在评估鲁棒性各个方面的多种任务上为预训练模型带来普遍的性能提升。全面的实验表明,Mango在9个鲁棒性基准中的7个上达到了新的最先进水平,并大幅超越现有方法。作为首个关于V+L鲁棒性的综合研究,这项工作使预训练模型的鲁棒性成为更清晰的关注焦点,并为未来研究指明了新方向。
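For intuition about "adversarial noise in the embedding space", here is a generic gradient-sign (PGD-style) perturbation sketch. Note the simplification: Mango learns a noise generator, whereas this sketch optimizes noise per example; the function name and interfaces are assumptions.

```python
import torch

def embedding_attack(model, embeds: torch.Tensor, labels: torch.Tensor,
                     loss_fn, eps: float = 1e-2, steps: int = 3) -> torch.Tensor:
    """Return embeddings perturbed within an L-infinity ball of radius eps
    so as to maximize the model's loss (gradient-sign ascent)."""
    delta = torch.zeros_like(embeds, requires_grad=True)
    for _ in range(steps):
        loss = loss_fn(model(embeds + delta), labels)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + eps * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return (embeds + delta).detach()
```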
注:中文为机器翻译结果!