Table of Contents
1. AMALGUM -- A Free, Balanced, Multilayer English Web Corpus [PDF]
2. Explainable and Discourse Topic-aware Neural Language Understanding [PDF]
3. Pre-trained Language Models as Symbolic Reasoners over Knowledge? [PDF]
4. Deep Encoder, Shallow Decoder: Reevaluating the Speed-Quality Tradeoff in Machine Translation [PDF]
5. Extraction and Evaluation of Formulaic Expressions Used in Scholarly Papers [PDF]
6. Automatic Speech Recognition Benchmark for Air-Traffic Communications [PDF]
7. Octet: Online Catalog Taxonomy Enrichment with Self-Supervision [PDF]
8. Multi-branch Attentive Transformer [PDF]
9. STEAM: Self-Supervised Taxonomy Expansion with Mini-Paths [PDF]
10. SEAL: Segment-wise Extractive-Abstractive Long-form Text Summarization [PDF]
11. Political Advertising Dataset: the use case of the Polish 2020 Presidential Elections [PDF]
12. Is this Dialogue Coherent? Learning from Dialogue Acts and Entities [PDF]
13. Extensively Matching for Few-shot Learning Event Detection [PDF]
14. Zero-Shot Learning with Common Sense Knowledge Graphs [PDF]
15. Compositional Generalization by Learning Analytical Expressions [PDF]
16. Shapeshifter Networks: Cross-layer Parameter Sharing for Scalable and Effective Deep Learning [PDF]
17. MIMICS: A Large-Scale Data Collection for Search Clarification [PDF]
18. Overcoming Statistical Shortcuts for Open-ended Visual Counting [PDF]

Abstracts
1. AMALGUM -- A Free, Balanced, Multilayer English Web Corpus [PDF]
Luke Gessler, Siyao Peng, Yang Liu, Yilun Zhu, Shabnam Behzad, Amir Zeldes
Abstract: We present a freely available, genre-balanced English web corpus totaling 4M tokens and featuring a large number of high-quality automatic annotation layers, including dependency trees, non-named entity annotations, coreference resolution, and discourse trees in Rhetorical Structure Theory. By tapping open online data sources the corpus is meant to offer a more sizable alternative to smaller manually created annotated data sets, while avoiding pitfalls such as imbalanced or unknown composition, licensing problems, and low-quality natural language processing. We harness knowledge from multiple annotation layers in order to achieve a "better than NLP" benchmark and evaluate the accuracy of the resulting resource.
2. Explainable and Discourse Topic-aware Neural Language Understanding [PDF]
Yatin Chaudhary, Hinrich Schütze, Pankaj Gupta
Abstract: Marrying topic models and language models exposes language understanding to a broader source of document-level context beyond sentences via topics. While introducing topical semantics in language models, existing approaches incorporate latent document topic proportions and ignore topical discourse in sentences of the document. This work extends the line of research by additionally introducing an explainable topic representation in language understanding, obtained from a set of key terms correspondingly for each latent topic of the proportion. Moreover, we retain sentence-topic associations along with document-topic association by modeling topical discourse for every sentence in the document. We present a novel neural composite language model that exploits both the latent and explainable topics along with topical discourse at sentence-level in a joint learning framework of topic and language models. Experiments over a range of tasks such as language modeling, word sense disambiguation, document classification, retrieval and text generation demonstrate ability of the proposed model in improving language understanding.
3. Pre-trained Language Models as Symbolic Reasoners over Knowledge? [PDF]
Nora Kassner, Benno Krojer, Hinrich Schütze
Abstract: How can pre-trained language models (PLMs) learn factual knowledge from the training set? We investigate the two most important mechanisms: reasoning and memorization. Prior work has attempted to quantify the number of facts PLMs learn, but we present, using synthetic data, the first study that establishes a causal relation between facts present in training and facts learned by the PLM. For reasoning, we show that PLMs learn to apply some symbolic reasoning rules; but in particular, they struggle with two-hop reasoning. For memorization, we identify schema conformity (facts systematically supported by other facts) and frequency as key factors for its success.
4. Deep Encoder, Shallow Decoder: Reevaluating the Speed-Quality Tradeoff in Machine Translation [PDF]
Jungo Kasai, Nikolaos Pappas, Hao Peng, James Cross, Noah A. Smith
Abstract: State-of-the-art neural machine translation models generate outputs autoregressively, where every step conditions on the previously generated tokens. This sequential nature causes inherent decoding latency. Non-autoregressive translation techniques, on the other hand, parallelize generation across positions and speed up inference at the expense of translation quality. Much recent effort has been devoted to non-autoregressive methods, aiming for a better balance between speed and quality. In this work, we re-examine the trade-off and argue that transformer-based autoregressive models can be substantially sped up without loss in accuracy. Specifically, we study autoregressive models with encoders and decoders of varied depths. Our extensive experiments show that given a sufficiently deep encoder, a one-layer autoregressive decoder yields state-of-the-art accuracy with comparable latency to strong non-autoregressive models. Our findings suggest that the latency disadvantage for autoregressive translation has been overestimated due to a suboptimal choice of layer allocation, and we provide a new speed-quality baseline for future research toward fast, accurate translation.
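To make the layer-allocation idea concrete, the sketch below (not the authors' code) instantiates a standard PyTorch Transformer with a deep 12-layer encoder and a single-layer autoregressive decoder; all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical configuration: deep encoder, one-layer decoder.
# Layer counts and dimensions are illustrative, not the paper's exact setup.
model = nn.Transformer(
    d_model=512,
    nhead=8,
    num_encoder_layers=12,  # deep encoder
    num_decoder_layers=1,   # shallow (one-layer) autoregressive decoder
    dim_feedforward=2048,
)

src = torch.rand(30, 16, 512)  # (source length, batch, d_model)
tgt = torch.rand(20, 16, 512)  # (target length, batch, d_model)
tgt_mask = model.generate_square_subsequent_mask(20)  # keep decoding causal
out = model(src, tgt, tgt_mask=tgt_mask)
print(out.shape)  # torch.Size([20, 16, 512])
```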
5. Extraction and Evaluation of Formulaic Expressions Used in Scholarly Papers [PDF]
Kenichi Iwatsuki, Florian Boudin, Akiko Aizawa
Abstract: Formulaic expressions, such as 'in this paper we propose', are helpful for authors of scholarly papers because they convey communicative functions; in the above, it is 'showing the aim of this paper'. Thus, resources of formulaic expressions, such as a dictionary, that could be looked up easily would be useful. However, forms of formulaic expressions can often vary to a great extent. For example, 'in this paper we propose', 'in this study we propose' and 'in this paper we propose a new method to' are all regarded as formulaic expressions. Such a diversity of spans and forms causes problems in both extraction and evaluation of formulaic expressions. In this paper, we propose a new approach that is robust to variation of spans and forms of formulaic expressions. Our approach regards a sentence as consisting of a formulaic part and a non-formulaic part. Then, instead of trying to extract formulaic expressions from a whole corpus, by extracting them from each sentence, different forms can be dealt with at once. Based on this formulation, to avoid the diversity problem, we propose evaluating extraction methods by how much they convey specific communicative functions rather than by comparing extracted expressions to an existing lexicon. We also propose a new extraction method that utilises named entities and dependency structures to remove the non-formulaic part from a sentence. Experimental results show that the proposed extraction method achieved the best performance compared to other existing methods.
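As a rough illustration of splitting a sentence into formulaic and non-formulaic parts, here is a minimal heuristic sketch that masks named-entity tokens with spaCy; the paper's actual method also uses dependency structures, and the model name and example sentence are assumptions.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this model is installed

def formulaic_skeleton(sentence: str) -> str:
    """Mask named-entity tokens and keep the remaining words as a rough
    approximation of the formulaic part of a sentence. A crude heuristic,
    not the paper's extraction method (which also uses dependencies)."""
    doc = nlp(sentence)
    ent_tokens = {tok.i for ent in doc.ents for tok in ent}
    return " ".join("_" if tok.i in ent_tokens else tok.text for tok in doc)

print(formulaic_skeleton("In this paper we propose a new method for Twitter."))
# e.g. "In this paper we propose a new method for _ ."
```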
6. Automatic Speech Recognition Benchmark for Air-Traffic Communications [PDF]
Juan Zuluaga-Gomez, Petr Motlicek, Qingran Zhan, Karel Vesely, Rudolf Braun
Abstract: Advances in Automatic Speech Recognition (ASR) over the last decade opened new areas of speech-based automation such as in Air-Traffic Control (ATC) environment. Currently, voice communication and data links communications are the only way of contact between pilots and Air-Traffic Controllers (ATCo), where the former is the most widely used and the latter is a non-spoken method mandatory for oceanic messages and limited for some domestic issues. ASR systems on ATCo environments inherit increasing complexity due to accents from non-English speakers, cockpit noise, speaker-dependent biases, and small in-domain ATC databases for training. Hereby, we introduce CleanSky EC-H2020 ATCO2, a project that aims to develop an ASR-based platform to collect, organize and automatically pre-process ATCo speech-data from air space. This paper conveys an exploratory benchmark of several state-of-the-art ASR models trained on more than 170 hours of ATCo speech-data. We demonstrate that the cross-accent flaws due to speakers' accents are minimized due to the amount of data, making the system feasible for ATC environments. The developed ASR system achieves an averaged word error rate (WER) of 7.75% across four databases. An additional 35% relative improvement in WER is achieved on one test set when training a TDNNF system with byte-pair encoding.
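The two reported numbers combine as simple arithmetic; the snippet below illustrates what a 35% relative WER improvement means, with the caveat that the paper reports it on a single test set rather than on the 7.75% average.

```python
# Relative WER improvement: (baseline - improved) / baseline.
# Pairing the 35% relative figure with the 7.75% four-database average is
# only an illustration of the arithmetic, not a result from the paper.
baseline_wer = 7.75          # averaged word error rate, in percent
relative_improvement = 0.35  # 35% relative, not absolute
improved_wer = baseline_wer * (1 - relative_improvement)
print(f"{improved_wer:.2f}% WER")  # 5.04% WER
```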
7. Octet: Online Catalog Taxonomy Enrichment with Self-Supervision [PDF]
Yuning Mao, Tong Zhao, Andrey Kan, Chenwei Zhang, Xin Luna Dong, Christos Faloutsos, Jiawei Han
Abstract: Taxonomies have found wide applications in various domains, especially online for item categorization, browsing, and search. Despite the prevalent use of online catalog taxonomies, most of them in practice are maintained by humans, which is labor-intensive and difficult to scale. While taxonomy construction from scratch is considerably studied in the literature, how to effectively enrich existing incomplete taxonomies remains an open yet important research question. Taxonomy enrichment not only requires the robustness to deal with emerging terms but also the consistency between existing taxonomy structure and new term attachment. In this paper, we present a self-supervised end-to-end framework, Octet, for Online Catalog Taxonomy EnrichmenT. Octet leverages heterogeneous information unique to online catalog taxonomies such as user queries, items, and their relations to the taxonomy nodes while requiring no other supervision than the existing taxonomies. We propose to distantly train a sequence labeling model for term extraction and employ graph neural networks (GNNs) to capture the taxonomy structure as well as the query-item-taxonomy interactions for term attachment. Extensive experiments in different online domains demonstrate the superiority of Octet over state-of-the-art methods via both automatic and human evaluations. Notably, Octet enriches an online catalog taxonomy in production to 2 times larger in the open-world evaluation.
8. Multi-branch Attentive Transformer [PDF]
Yang Fan, Shufang Xie, Yingce Xia, Lijun Wu, Tao Qin, Xiang-Yang Li, Tie-Yan Liu
Abstract: While the multi-branch architecture is one of the key ingredients to the success of computer vision tasks, it has not been well investigated in natural language processing, especially sequence learning tasks. In this work, we propose a simple yet effective variant of Transformer called multi-branch attentive Transformer (briefly, MAT), where the attention layer is the average of multiple branches and each branch is an independent multi-head attention layer. We leverage two training techniques to regularize the training: drop-branch, which randomly drops individual branches during training, and proximal initialization, which uses a pre-trained Transformer model to initialize multiple branches. Experiments on machine translation, code generation and natural language understanding demonstrate that such a simple variant of Transformer brings significant improvements. Our code is available at \url{this https URL}.
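A minimal sketch of the described layer, assuming PyTorch and illustrative hyperparameters: several independent multi-head attention branches are averaged, and drop-branch randomly skips branches during training (the paper's exact normalization of dropped branches may differ).

```python
import torch
import torch.nn as nn

class MultiBranchAttention(nn.Module):
    """Toy multi-branch attentive layer: independent multi-head attention
    branches whose outputs are averaged. With drop-branch, each branch is
    skipped with probability p during training. Hyperparameters are
    assumptions, not the paper's."""
    def __init__(self, d_model=512, nhead=8, n_branches=3, p_drop=0.3):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.MultiheadAttention(d_model, nhead) for _ in range(n_branches)
        )
        self.p_drop = p_drop

    def forward(self, x):
        outs = [
            attn(x, x, x)[0]
            for attn in self.branches
            if not (self.training and torch.rand(()) < self.p_drop)
        ]
        if not outs:  # make sure at least one branch survives
            outs = [self.branches[0](x, x, x)[0]]
        return torch.stack(outs).mean(dim=0)

layer = MultiBranchAttention()
x = torch.rand(10, 4, 512)  # (sequence, batch, d_model)
print(layer(x).shape)       # torch.Size([10, 4, 512])
```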
9. STEAM: Self-Supervised Taxonomy Expansion with Mini-Paths [PDF]
Yue Yu, Yinghao Li, Jiaming Shen, Hao Feng, Jimeng Sun, Chao Zhang
Abstract: Taxonomies are important knowledge ontologies that underpin numerous applications on a daily basis, but many taxonomies used in practice suffer from the low coverage issue. We study the taxonomy expansion problem, which aims to expand existing taxonomies with new concept terms. We propose a self-supervised taxonomy expansion model named STEAM, which leverages natural supervision in the existing taxonomy for expansion. To generate natural self-supervision signals, STEAM samples mini-paths from the existing taxonomy, and formulates a node attachment prediction task between anchor mini-paths and query terms. To solve the node attachment task, it learns feature representations for query-anchor pairs from multiple views and performs multi-view co-training for prediction. Extensive experiments show that STEAM outperforms state-of-the-art methods for taxonomy expansion by 11.6\% in accuracy and 7.0\% in mean reciprocal rank on three public benchmarks. The implementation of STEAM can be found at \url{this https URL}.
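A toy stand-in for mini-path sampling, using a hypothetical child-to-parent taxonomy: walk root-ward from a node to form a short anchor path against which a query term's attachment would be scored.

```python
import random

# Toy taxonomy as child -> parent edges (hypothetical concepts).
parent = {
    "laptop": "computer",
    "smartphone": "electronics",
    "computer": "electronics",
    "electronics": "product",
}

def sample_mini_path(node: str, max_len: int = 3) -> list:
    """Collect a short root-ward chain starting at `node`: a simplified
    stand-in for STEAM's mini-path sampling from an existing taxonomy."""
    path = [node]
    while len(path) < max_len and path[-1] in parent:
        path.append(parent[path[-1]])
    return path

anchor_path = sample_mini_path(random.choice(list(parent)))
query_term = "tablet"  # new term whose attachment we would predict
print(anchor_path, "<- attach? -", query_term)
```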
10. SEAL: Segment-wise Extractive-Abstractive Long-form Text Summarization [PDF]
Yao Zhao, Mohammad Saleh, Peter J. Liu
Abstract: Most prior work in the sequence-to-sequence paradigm focused on datasets with input sequence lengths in the hundreds of tokens due to the computational constraints of common RNN and Transformer architectures. In this paper, we study long-form abstractive text summarization, a sequence-to-sequence setting with input sequence lengths up to 100,000 tokens and output sequence lengths up to 768 tokens. We propose SEAL, a Transformer-based model, featuring a new encoder-decoder attention that dynamically extracts/selects input snippets to sparsely attend to for each output segment. Using only the original documents and summaries, we derive proxy labels that provide weak supervision for extractive layers simultaneously with regular supervision from abstractive summaries. The SEAL model achieves state-of-the-art results on existing long-form summarization tasks, and outperforms strong baseline models on a new dataset/task we introduce, Search2Wiki, with much longer input text. Since content selection is explicit in the SEAL model, a desirable side effect is that the selection can be inspected for enhanced interpretability.
11. Political Advertising Dataset: the use case of the Polish 2020 Presidential Elections [PDF]
Łukasz Augustyniak, Krzysztof Rajda, Tomasz Kajdanowicz, Michał Bernaczyk
Abstract: Political campaigns are full of political ads posted by candidates on social media. Political advertisements constitute a basic form of campaigning, subjected to various social requirements. We present the first publicly open dataset for detecting specific text chunks and categories of political advertising in the Polish language. It contains 1,705 human-annotated tweets tagged with nine categories, which constitute campaigning under Polish electoral law. We achieved a 0.65 inter-annotator agreement (Cohen's kappa score). An additional annotator resolved the mismatches between the first two annotators, improving the consistency and complexity of the annotation process. We used the newly created dataset to train a well established neural tagger (achieving a 70% F1 score). We also present a possible direction of use cases for such datasets and models with an initial analysis of the Polish 2020 Presidential Elections on Twitter.
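Cohen's kappa, the agreement measure quoted above, corrects raw agreement for chance using each annotator's label marginals; here is a self-contained sketch with hypothetical labels (not the dataset's real tag set).

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same items:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the chance agreement from each annotator's label marginals."""
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    p_e = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    return (p_o - p_e) / (1 - p_e)

# Hypothetical annotations over five tweets.
a = ["welfare", "welfare", "defense", "other", "defense"]
b = ["welfare", "defense", "defense", "other", "defense"]
print(round(cohens_kappa(a, b), 2))  # 0.69
```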
12. Is this Dialogue Coherent? Learning from Dialogue Acts and Entities [PDF]
Alessandra Cervone, Giuseppe Riccardi
Abstract: In this work, we investigate the human perception of coherence in open-domain dialogues. In particular, we address the problem of annotating and modeling the coherence of next-turn candidates while considering the entire history of the dialogue. First, we create the Switchboard Coherence (SWBD-Coh) corpus, a dataset of human-human spoken dialogues annotated with turn coherence ratings, where next-turn candidate utterances ratings are provided considering the full dialogue context. Our statistical analysis of the corpus indicates how turn coherence perception is affected by patterns of distribution of entities previously introduced and the Dialogue Acts used. Second, we experiment with different architectures to model entities, Dialogue Acts and their combination and evaluate their performance in predicting human coherence ratings on SWBD-Coh. We find that models combining both DA and entity information yield the best performances both for response selection and turn coherence rating.
13. Extensively Matching for Few-shot Learning Event Detection [PDF]
Viet Dac Lai, Franck Dernoncourt, Thien Huu Nguyen
Abstract: Current event detection models under supervised learning settings fail to transfer to new event types. Few-shot learning has not been explored in event detection even though it allows a model to perform well with high generalization on new event types. In this work, we formulate event detection as a few-shot learning problem to enable extending event detection to new event types. We propose two novel loss factors that match examples in the support set to provide more training signals to the model. Moreover, these training signals can be applied in many metric-based few-shot learning models. Our extensive experiments on the ACE-2005 dataset (under a few-shot learning setting) show that the proposed method can improve the performance of few-shot learning.
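The abstract's support-set matching can be pictured with a generic metric-based scoring step; the sketch below is a standard baseline formulation, not the paper's two specific loss factors, and all tensors are synthetic.

```python
import torch
import torch.nn.functional as F

def support_matching_loss(query, support, same_type):
    """Generic support-set matching: score a query embedding against every
    support embedding and push scores toward same-type examples. A common
    metric-based baseline, not the paper's exact losses."""
    sims = F.cosine_similarity(query.unsqueeze(0), support)  # (n_support,)
    return F.binary_cross_entropy_with_logits(sims, same_type.float())

support = torch.randn(6, 128)                 # support-set embeddings
same_type = torch.tensor([1, 1, 0, 0, 1, 0])  # 1 = same event type as query
query = torch.randn(128)
print(support_matching_loss(query, support, same_type).item())
```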
14. Zero-Shot Learning with Common Sense Knowledge Graphs [PDF]
Nihal V. Nayak, Stephen H. Bach
Abstract: Zero-shot learning relies on semantic class representations such as attributes or pretrained embeddings to predict classes without any labeled examples. We propose to learn class representations from common sense knowledge graphs. Common sense knowledge graphs are an untapped source of explicit high-level knowledge that requires little human effort to apply to a range of tasks. To capture the knowledge in the graph, we introduce ZSL-KG, a framework based on graph neural networks with non-linear aggregators to generate class representations. Whereas most prior work on graph neural networks uses linear functions to aggregate information from neighboring nodes, we find that non-linear aggregators such as LSTMs or transformers lead to significant improvements on zero-shot tasks. On two natural language tasks across three datasets, ZSL-KG shows an average improvement of 9.2 points of accuracy versus state-of-the-art methods. In addition, on an object classification task, ZSL-KG shows a 2.2 accuracy point improvement versus the best methods that do not require hand-engineered class representations. Finally, we find that ZSL-KG outperforms the best performing graph neural networks with linear aggregators by an average of 3.8 points of accuracy across these four datasets.
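To contrast linear and non-linear aggregation, the sketch below compares a mean aggregator with an LSTM aggregator over neighbour features, in the spirit of the described GNN layers; dimensions and the random neighbour ordering are assumptions.

```python
import torch
import torch.nn as nn

class LSTMAggregator(nn.Module):
    """Non-linear neighbourhood aggregation for a GNN layer: run an LSTM
    over a random ordering of neighbour features and keep the last hidden
    state. A minimal sketch of the idea, not the ZSL-KG implementation."""
    def __init__(self, dim=64):
        super().__init__()
        self.lstm = nn.LSTM(dim, dim, batch_first=True)

    def forward(self, neighbors):                  # (batch, n_neigh, dim)
        order = torch.randperm(neighbors.size(1))  # neighbour sets are unordered
        _, (h, _) = self.lstm(neighbors[:, order])
        return h.squeeze(0)                        # (batch, dim)

neigh = torch.randn(2, 5, 64)
mean_agg = neigh.mean(dim=1)           # linear (mean) aggregator, for contrast
lstm_agg = LSTMAggregator()(neigh)     # non-linear aggregator
print(mean_agg.shape, lstm_agg.shape)  # torch.Size([2, 64]) twice
```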
15. Compositional Generalization by Learning Analytical Expressions [PDF]
Qian Liu, Shengnan An, Jian-Guang Lou, Bei Chen, Zeqi Lin, Yan Gao, Bin Zhou, Nanning Zheng, Dongmei Zhang
Abstract: Compositional generalization is a basic but essential intellective capability of human beings, which allows us to recombine known parts readily. However, existing neural network based models have been proven to be extremely deficient in such a capability. Inspired by work in cognition which argues compositionality can be captured by variable slots with symbolic functions, we present a refreshing view that connects a memory-augmented neural model with analytical expressions, to achieve compositional generalization. Our model consists of two cooperative neural modules Composer and Solver, fitting well with the cognitive argument while still being trained in an end-to-end manner via a hierarchical reinforcement learning algorithm. Experiments on a well-known benchmark SCAN demonstrate that our model seizes a great ability of compositional generalization, solving all challenges addressed by previous works with 100% accuracies.
16. Shapeshifter Networks: Cross-layer Parameter Sharing for Scalable and Effective Deep Learning [PDF]
Bryan A. Plummer, Nikoli Dryden, Julius Frost, Torsten Hoefler, Kate Saenko
Abstract: We present Shapeshifter Networks (SSNs), a flexible neural network framework that improves performance and reduces memory requirements on a diverse set of scenarios over standard neural networks. Our approach is based on the observation that many neural networks are severely overparameterized, resulting in significant waste in computational resources as well as being susceptible to overfitting. SSNs address this by learning where and how to share parameters between layers in a neural network while avoiding degenerate solutions that result in underfitting. Specifically, we automatically construct parameter groups that identify where parameter sharing is most beneficial. Then, we map each group's weights to construct layers with learned combinations of candidates from a shared parameter pool. SSNs can share parameters across layers even when they have different sizes, perform different operations, and/or operate on features from different modalities. We evaluate our approach on a diverse set of tasks, including image classification, bidirectional image-sentence retrieval, and phrase grounding, creating high performing models even when using as little as 1% of the parameters. We also apply SSNs to knowledge distillation, where we obtain state-of-the-art results when combined with traditional distillation methods.
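One way to picture cross-layer sharing from a parameter pool (a toy rendering, not the SSN implementation): each layer builds its weight as a learned combination of templates drawn from a pool shared across layers.

```python
import torch
import torch.nn as nn

class PooledLinear(nn.Module):
    """Linear layer whose weight is a learned combination of templates from
    a shared parameter pool, so multiple layers can reuse the same
    parameters. Sizes and the combination scheme are assumptions."""
    def __init__(self, pool, out_features, in_features):
        super().__init__()
        self.pool = pool  # (n_templates, out_features * in_features), shared
        self.coef = nn.Parameter(torch.randn(pool.size(0)))
        self.shape = (out_features, in_features)

    def forward(self, x):
        weight = (self.coef @ self.pool).view(self.shape)
        return x @ weight.t()

pool = nn.Parameter(torch.randn(4, 32 * 32))  # one pool shared by both layers
layer1 = PooledLinear(pool, 32, 32)
layer2 = PooledLinear(pool, 32, 32)
x = torch.randn(8, 32)
print(layer2(layer1(x)).shape)  # torch.Size([8, 32])
```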
17. MIMICS: A Large-Scale Data Collection for Search Clarification [PDF]
Hamed Zamani, Gord Lueck, Everest Chen, Rodolfo Quispe, Flint Luu, Nick Craswell
Abstract: Search clarification has recently attracted much attention due to its applications in search engines. It has also been recognized as a major component in conversational information seeking systems. Despite its importance, the research community still feels the lack of a large-scale data for studying different aspects of search clarification. In this paper, we introduce MIMICS, a collection of search clarification datasets for real web search queries sampled from the Bing query logs. Each clarification in MIMICS is generated by a Bing production algorithm and consists of a clarifying question and up to five candidate answers. MIMICS contains three datasets: (1) MIMICS-Click includes over 400k unique queries, their associated clarification panes, and the corresponding aggregated user interaction signals (i.e., clicks). (2) MIMICS-ClickExplore is an exploration data that includes aggregated user interaction signals for over 60k unique queries, each with multiple clarification panes. (3) MIMICS-Manual includes over 2k unique real search queries. Each query-clarification pair in this dataset has been manually labeled by at least three trained annotators. It contains graded quality labels for the clarifying question, the candidate answer set, and the landing result page for each candidate answer. MIMICS is publicly available for research purposes, thus enables researchers to study a number of tasks related to search clarification, including clarification generation and selection, user engagement prediction for clarification, click models for clarification, and analyzing user interactions with search clarification.
18. Overcoming Statistical Shortcuts for Open-ended Visual Counting [PDF]
Corentin Dancette, Remi Cadene, Xinlei Chen, Matthieu Cord
Abstract: Machine learning models tend to over-rely on statistical shortcuts. These spurious correlations between parts of the input and the output labels does not hold in real-world settings. We target this issue on the recent open-ended visual counting task which is well suited to study statistical shortcuts. We aim to develop models that learn a proper mechanism of counting regardless of the output label. First, we propose the Modifying Count Distribution (MCD) protocol, which penalizes models that over-rely on statistical shortcuts. It is based on pairs of training and testing sets that do not follow the same count label distribution such as the odd-even sets. Intuitively, models that have learned a proper mechanism of counting on odd numbers should perform well on even numbers. Secondly, we introduce the Spatial Counting Network (SCN), which is dedicated to visual analysis and counting based on natural language questions. Our model selects relevant image regions, scores them with fusion and self-attention mechanisms, and provides a final counting score. We apply our protocol on the recent dataset, TallyQA, and show superior performances compared to state-of-the-art models. We also demonstrate the ability of our model to select the correct instances to count in the image. Code and datasets are available: this https URL
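The odd-even instance of the MCD protocol can be pictured as a parity-based split of count labels between training and testing; the records below are hypothetical.

```python
# Toy odd-even split in the spirit of the MCD protocol: train only on even
# ground-truth counts and test only on odd ones, so memorizing the count-label
# distribution cannot help. Example records are hypothetical.
examples = [
    {"question": "How many dogs are there?",  "count": 2},
    {"question": "How many cars are there?",  "count": 3},
    {"question": "How many cups are there?",  "count": 4},
    {"question": "How many birds are there?", "count": 5},
]

train_set = [ex for ex in examples if ex["count"] % 2 == 0]
test_set = [ex for ex in examples if ex["count"] % 2 == 1]
print(len(train_set), "train examples /", len(test_set), "test examples")
```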