
Contents
1. LynyrdSkynyrd at WNUT-2020 Task 2: Semi-Supervised Learning for Identification of Informative COVID-19 English Tweets [PDF] Abstract
2. ERNIE at SemEval-2020 Task 10: Learning Word Emphasis Selection by Pre-trained Language Model [PDF] Abstract
3. Simple is Better! Lightweight Data Augmentation for Low Resource Slot Filling and Intent Classification [PDF] Abstract
4. kk2018 at SemEval-2020 Task 9: Adversarial Training for Code-Mixing Sentiment Classification [PDF] Abstract
5. Is Everything Fine, Grandma? Acoustic and Linguistic Modeling for Robust Elderly Speech Emotion Recognition [PDF] Abstract
6. NLP-CIC at SemEval-2020 Task 9: Analysing sentiment in code-switching language using a simple deep-learning classifier [PDF] Abstract
7. Leam: An Interactive System for In-situ Visual Text Analysis [PDF] Abstract
8. Robust Conversational AI with Grounded Text Generation [PDF] Abstract
9. Generative Language Modeling for Automated Theorem Proving [PDF] Abstract
Abstracts
1. LynyrdSkynyrd at WNUT-2020 Task 2: Semi-Supervised Learning for Identification of Informative COVID-19 English Tweets [PDF] back to contents
Abhilasha Sancheti, Kushal Chawla, Gaurav Verma
Abstract: We describe our system for WNUT-2020 shared task on the identification of informative COVID-19 English tweets. Our system is an ensemble of various machine learning methods, leveraging both traditional feature-based classifiers as well as recent advances in pre-trained language models that help in capturing the syntactic, semantic, and contextual features from the tweets. We further employ pseudo-labelling to incorporate the unlabelled Twitter data released on the pandemic. Our best performing model achieves an F1-score of 0.9179 on the provided validation set and 0.8805 on the blind test-set.
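The pseudo-labelling step the abstract mentions can be sketched as below. The centroid-based `centroid_proba` stand-in and the 0.9 confidence threshold are illustrative assumptions, not the authors' ensemble or settings.

```python
import numpy as np

def pseudo_label(train_X, train_y, unlab_X, predict_proba, threshold=0.9):
    """Label unlabelled examples with a trained model and keep only the
    high-confidence predictions as extra (pseudo-labelled) training data."""
    probs = predict_proba(train_X, train_y, unlab_X)   # (n, n_classes)
    confidence = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    keep = confidence >= threshold
    X_aug = np.vstack([train_X, unlab_X[keep]])
    y_aug = np.concatenate([train_y, labels[keep]])
    return X_aug, y_aug

def centroid_proba(train_X, train_y, X):
    """Toy stand-in for an ensemble's predict_proba: softmax over
    negative squared distances to per-class centroids."""
    classes = np.unique(train_y)
    centroids = np.stack([train_X[train_y == c].mean(axis=0) for c in classes])
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    e = np.exp(-d)
    return e / e.sum(axis=1, keepdims=True)
```

In practice the augmented set is used to retrain the model, and the predict-then-retrain cycle can be repeated.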
2. ERNIE at SemEval-2020 Task 10: Learning Word Emphasis Selection by Pre-trained Language Model [PDF] back to contents
Zhengjie Huang, Shikun Feng, Weiyue Su, Xuyi Chen, Shuohuan Wang, Jiaxiang Liu, Xuan Ouyang, Yu Sun
Abstract: This paper describes the system designed by the ERNIE Team, which achieved first place in SemEval-2020 Task 10: Emphasis Selection For Written Text in Visual Media. Given a sentence, we are asked to find the most important words as suggestions for automated design. We leverage unsupervised pre-trained models and finetune them on our task. After investigation, we found that the following models achieve excellent performance on this task: ERNIE 2.0, XLM-ROBERTA, ROBERTA and ALBERT. To finetune our models, we combine a pointwise regression loss with a pairwise ranking loss, which is closer to the final Match_m metric. We also find that additional feature engineering and data augmentation help improve performance. Our best model achieves the highest score of 0.823 and ranks first on all metrics.
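A minimal sketch of combining a pointwise regression loss with a pairwise ranking (hinge) loss, as the abstract describes; the 0.5 mixing weight and 0.1 margin are hypothetical choices, not the authors' hyperparameters.

```python
def combined_loss(scores, targets, alpha=0.5, margin=0.1):
    """Pointwise MSE on per-word emphasis scores plus a pairwise hinge
    term that pushes truly-emphasised words above less-emphasised ones."""
    n = len(scores)
    pointwise = sum((s - t) ** 2 for s, t in zip(scores, targets)) / n
    pair_sum, pairs = 0.0, 0
    for i in range(n):
        for j in range(n):
            if targets[i] > targets[j]:        # word i should outrank word j
                pair_sum += max(0.0, margin - (scores[i] - scores[j]))
                pairs += 1
    pairwise = pair_sum / max(pairs, 1)
    return alpha * pointwise + (1 - alpha) * pairwise
```

The pairwise term penalises mis-ordered word pairs directly, which is why it tracks a ranking-style metric like Match_m more closely than MSE alone.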
3. Simple is Better! Lightweight Data Augmentation for Low Resource Slot Filling and Intent Classification [PDF] back to contents
Samuel Louvan, Bernardo Magnini
Abstract: Neural-based models have achieved outstanding performance on slot filling and intent classification, when fairly large in-domain training data are available. However, as new domains are frequently added, creating sizeable data is expensive. We show that lightweight augmentation, a set of augmentation methods involving word span and sentence level operations, alleviates data scarcity problems. Our experiments on limited data settings show that lightweight augmentation yields significant performance improvement on slot filling on the ATIS and SNIPS datasets, and achieves competitive performance with respect to more complex, state-of-the-art, augmentation approaches. Furthermore, lightweight augmentation is also beneficial when combined with pre-trained LM-based models, as it improves BERT-based joint intent and slot filling models.
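One word-span operation such lightweight augmentation can use is slot-value substitution over BIO-labelled tokens; the function and the slot dictionary below are illustrative, not the paper's exact operation set.

```python
import random

def slot_substitute(tokens, labels, slot_values, rng):
    """Replace one slot span with another surface form of the same slot
    type, keeping the BIO labels consistent (a word-span operation)."""
    spans, i = [], 0
    while i < len(labels):                     # collect (start, end, type) spans
        if labels[i].startswith("B-"):
            slot, j = labels[i][2:], i + 1
            while j < len(labels) and labels[j] == "I-" + slot:
                j += 1
            spans.append((i, j, slot))
            i = j
        else:
            i += 1
    if not spans:
        return tokens, labels
    s, e, slot = rng.choice(spans)
    new_val = rng.choice(slot_values[slot]).split()
    new_labels = ["B-" + slot] + ["I-" + slot] * (len(new_val) - 1)
    return tokens[:s] + new_val + tokens[e:], labels[:s] + new_labels + labels[e:]
```

Because the labels are rewritten alongside the tokens, the augmented sentence is immediately usable as extra slot-filling training data.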
4. kk2018 at SemEval-2020 Task 9: Adversarial Training for Code-Mixing Sentiment Classification [PDF] back to contents
Jiaxiang Liu, Xuyi Chen, Shikun Feng, Shuohuan Wang, Xuan Ouyang, Yu Sun, Zhengjie Huang, Weiyue Su
Abstract: Code switching is a linguistic phenomenon that may occur within a multilingual setting where speakers share more than one language. With increasing communication between groups speaking different languages, this phenomenon is becoming more and more common. However, there is little research and data in this area, especially for code-mixing sentiment classification. In this work, domain transfer learning from the state-of-the-art single-language model ERNIE is tested on the code-mixing dataset, and, surprisingly, a strong baseline is achieved. Furthermore, adversarial training with a multilingual model is used to achieve 1st place in the SemEval-2020 Task 9 Hindi-English sentiment classification competition.
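Adversarial training of this kind is commonly implemented with a Fast Gradient Method (FGM) style perturbation of the word embeddings. The sketch below shows only that perturbation step, with an assumed epsilon; it is not the authors' full training recipe.

```python
import math

def fgm_perturb(embedding, grad, epsilon=1.0):
    """Fast Gradient Method step: move a (flattened) embedding along the
    loss gradient, normalised onto an epsilon-ball, to build the
    adversarial input for a second forward/backward pass."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return list(embedding)                 # no gradient signal to follow
    return [e + epsilon * g / norm for e, g in zip(embedding, grad)]
```

During training, the loss on the perturbed embedding is added to the clean-batch loss before the optimizer step, which encourages predictions that are stable under small embedding-space attacks.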
5. Is Everything Fine, Grandma? Acoustic and Linguistic Modeling for Robust Elderly Speech Emotion Recognition [PDF] back to contents
Gizem Soğancıoğlu, Oxana Verkholyak, Heysem Kaya, Dmitrii Fedotov, Tobias Cadèe, Albert Ali Salah, Alexey Karpov
Abstract: Acoustic and linguistic analysis for elderly emotion recognition is an under-studied and challenging research direction, but essential for the creation of digital assistants for the elderly, as well as unobtrusive telemonitoring of elderly in their residences for mental healthcare purposes. This paper presents our contribution to the INTERSPEECH 2020 Computational Paralinguistics Challenge (ComParE) - Elderly Emotion Sub-Challenge, which is comprised of two ternary classification tasks for arousal and valence recognition. We propose a bi-modal framework, where these tasks are modeled using state-of-the-art acoustic and linguistic features, respectively. In this study, we demonstrate that exploiting task-specific dictionaries and resources can boost the performance of linguistic models, when the amount of labeled data is small. Observing a high mismatch between development and test set performances of various models, we also propose alternative training and decision fusion strategies to better estimate and improve the generalization performance.
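Decision fusion across the two modalities can be as simple as a weighted average of class posteriors; the acoustic weight below is an illustrative placeholder, whereas the paper tunes its fusion strategies against the development/test mismatch.

```python
def fuse_posteriors(p_acoustic, p_linguistic, w_acoustic=0.5):
    """Late (decision-level) fusion for a ternary task: weighted average
    of the acoustic and linguistic models' class posteriors, then argmax."""
    fused = [w_acoustic * a + (1.0 - w_acoustic) * l
             for a, l in zip(p_acoustic, p_linguistic)]
    prediction = max(range(len(fused)), key=fused.__getitem__)
    return fused, prediction
```

Keeping fusion at the decision level lets each modality be trained and validated independently, which helps when their generalization behaviour differs.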
6. NLP-CIC at SemEval-2020 Task 9: Analysing sentiment in code-switching language using a simple deep-learning classifier [PDF] back to contents
Jason Angel, Segun Taofeek Aroyehun, Antonio Tamayo, Alexander Gelbukh
Abstract: Code-switching is a phenomenon in which two or more languages are used in the same message. Nowadays, it is quite common to find messages with mixed languages on social media. This phenomenon presents a challenge for sentiment analysis. In this paper, we use a standard convolutional neural network model to predict the sentiment of tweets written in a blend of Spanish and English. Our simple approach achieved an F1-score of 0.71 on the competition's test set. We analyze our best model's capabilities and perform error analysis to expose important difficulties in classifying sentiment in a code-switching setting.
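A "standard convolutional neural network" for text reduces to the forward pass below: embedding lookup, 1-D convolution over token windows, ReLU, max-over-time pooling, and a linear output layer. All shapes and weights here are toy assumptions, not the paper's architecture details.

```python
import numpy as np

def text_cnn_logits(ids, emb, filters, b_conv, W_out, b_out):
    """Minimal text-CNN forward pass: embed the token ids, slide each
    convolution filter over token windows, apply ReLU, take the max over
    time, and map the pooled features to class logits."""
    x = emb[ids]                              # (T, d) token embeddings
    F, w, d = filters.shape                   # n_filters, window, emb dim
    T = x.shape[0]
    conv = np.empty((T - w + 1, F))
    for t in range(T - w + 1):                # convolve over time
        conv[t] = np.tensordot(filters, x[t:t + w],
                               axes=([1, 2], [0, 1])) + b_conv
    pooled = np.maximum(conv, 0.0).max(axis=0)   # ReLU + max-over-time
    return pooled @ W_out + b_out                # class logits
```

Max-over-time pooling makes the classifier insensitive to where in the tweet the decisive n-gram appears, which is part of why such simple models remain competitive.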
7. Leam: An Interactive System for In-situ Visual Text Analysis [PDF] back to contents
Sajjadur Rahman, Peter Griggs, Çağatay Demiralp
Abstract: With the increase in scale and availability of digital text generated on the web, enterprises such as online retailers and aggregators often use text analytics to mine and analyze the data to improve their services and products alike. Text data analysis is an iterative, non-linear process with diverse workflows spanning multiple stages, from data cleaning to visualization. Existing text analytics systems usually accommodate a subset of these stages and often fail to address challenges related to data heterogeneity, provenance, workflow reusability and reproducibility, and compatibility with established practices. Based on a set of design considerations we derive from these challenges, we propose Leam, a system that treats the text analysis process as a single continuum by combining advantages of computational notebooks, spreadsheets, and visualization tools. Leam features an interactive user interface for running text analysis workflows, a new data model for managing multiple atomic and composite data types, and an expressive algebra that captures diverse sets of operations representing various stages of text analysis and enables coordination among different components of the system, including data, code, and visualizations. We report our current progress in Leam development while demonstrating its usefulness with usage examples. Finally, we outline a number of enhancements to Leam and identify several research directions for developing an interactive visual text analysis system.
8. Robust Conversational AI with Grounded Text Generation [PDF] back to contents
Jianfeng Gao, Baolin Peng, Chunyuan Li, Jinchao Li, Shahin Shayandeh, Lars Liden, Heung-Yeung Shum
Abstract: This article presents a hybrid approach based on a Grounded Text Generation (GTG) model to building robust task bots at scale. GTG is a hybrid model which uses a large-scale Transformer neural network as its backbone, combined with symbol-manipulation modules for knowledge base inference and prior knowledge encoding, to generate responses grounded in dialog belief state and real-world knowledge for task completion. GTG is pre-trained on large amounts of raw text and human conversational data, and can be fine-tuned to complete a wide range of tasks. The hybrid approach and its variants are being developed simultaneously by multiple research teams. The primary results reported on task-oriented dialog benchmarks are very promising, demonstrating the big potential of this approach. This article provides an overview of this progress and discusses related methods and technologies that can be incorporated for building robust conversational AI systems.
9. Generative Language Modeling for Automated Theorem Proving [PDF] back to contents
Stanislas Polu, Ilya Sutskever
Abstract: We explore the application of transformer-based language models to automated theorem proving. This work is motivated by the possibility that a major limitation of automated theorem provers compared to humans -- the generation of original mathematical terms -- might be addressable via generation from language models. We present an automated prover and proof assistant, GPT-f, for the Metamath formalization language, and analyze its performance. GPT-f found new short proofs that were accepted into the main Metamath library, which is to our knowledge, the first time a deep-learning based system has contributed proofs that were adopted by a formal mathematics community.
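One way to picture how a generative model drives proving is a best-first search in which the language model proposes scored proof steps. `propose` and `is_proved` below are hypothetical stand-ins, not the GPT-f or Metamath interfaces.

```python
import heapq

def best_first_proof_search(goal, propose, is_proved, budget=100):
    """Expand proof states in order of cumulative log-probability.
    `propose(state)` yields (log_prob, next_state) candidates, playing
    the role of the language model; `is_proved` closes the search."""
    frontier = [(0.0, goal)]       # (negated cumulative log-prob, state)
    seen = {goal}
    while frontier and budget > 0:
        budget -= 1
        cost, state = heapq.heappop(frontier)
        if is_proved(state):
            return state
        for log_prob, nxt in propose(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (cost - log_prob, nxt))
    return None                    # search budget exhausted
```

The search prefers step sequences the model considers likely, so better language models translate directly into shorter, more human-like proofs being found first.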