Table of Contents
1. Multi-SimLex: A Large-Scale Evaluation of Multilingual and Cross-Lingual Lexical Semantic Similarity [PDF] Abstract
2. Video Caption Dataset for Describing Human Actions in Japanese [PDF] Abstract
3. Undersensitivity in Neural Reading Comprehension [PDF] Abstract
4. Efficient Intent Detection with Dual Sentence Encoders [PDF] Abstract
5. Learning to Respond with Stickers: A Framework of Unifying Multi-Modality in Multi-Turn Dialog [PDF] Abstract
6. A Framework for Evaluation of Machine Reading Comprehension Gold Standards [PDF] Abstract
7. Combining Pretrained High-Resource Embeddings and Subword Representations for Low-Resource Languages [PDF] Abstract
8. GenNet : Reading Comprehension with Multiple Choice Questions using Generation and Selection model [PDF] Abstract
9. ReZero is All You Need: Fast Convergence at Large Depth [PDF] Abstract
10. On the coexistence of competing languages [PDF] Abstract
11. Neuro-symbolic Architectures for Context Understanding [PDF] Abstract
12. Ecological Semantics: Programming Environments for Situated Language Understanding [PDF] Abstract
Abstracts
1. Multi-SimLex: A Large-Scale Evaluation of Multilingual and Cross-Lingual Lexical Semantic Similarity [PDF] Back to Contents
Ivan Vulić, Simon Baker, Edoardo Maria Ponti, Ulla Petti, Ira Leviant, Kelly Wing, Olga Majewska, Eden Bar, Matt Malone, Thierry Poibeau, Roi Reichart, Anna Korhonen
Abstract: We introduce Multi-SimLex, a large-scale lexical resource and evaluation benchmark covering datasets for 12 typologically diverse languages, including major languages (e.g., Mandarin Chinese, Spanish, Russian) as well as less-resourced ones (e.g., Welsh, Kiswahili). Each language dataset is annotated for the lexical relation of semantic similarity and contains 1,888 semantically aligned concept pairs, providing a representative coverage of word classes (nouns, verbs, adjectives, adverbs), frequency ranks, similarity intervals, lexical fields, and concreteness levels. Additionally, owing to the alignment of concepts across languages, we provide a suite of 66 cross-lingual semantic similarity datasets. Due to its extensive size and language coverage, Multi-SimLex provides entirely novel opportunities for experimental evaluation and analysis. On its monolingual and cross-lingual benchmarks, we evaluate and analyze a wide array of recent state-of-the-art monolingual and cross-lingual representation models, including static and contextualized word embeddings (such as fastText, M-BERT and XLM), externally informed lexical representations, as well as fully unsupervised and (weakly) supervised cross-lingual word embeddings. We also present a step-by-step dataset creation protocol for creating consistent, Multi-SimLex-style resources for additional languages. We make these contributions -- the public release of Multi-SimLex datasets, their creation protocol, strong baseline results, and in-depth analyses which can be helpful in guiding future developments in multilingual lexical semantics and representation learning -- available via a website which will encourage community effort in further expansion of Multi-SimLex to many more languages. Such a large-scale semantic resource could inspire significant further advances in NLP across languages.
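SimLex-style benchmarks are typically scored by the Spearman correlation between a model's cosine similarities and the human similarity ratings. The sketch below shows that standard protocol under the assumption of a simple `word_vec` lookup table; it is illustrative only and may differ from the Multi-SimLex evaluation scripts.

```python
import numpy as np
from scipy.stats import spearmanr

def evaluate_similarity(pairs, human_scores, word_vec):
    """Spearman correlation between model cosine similarities and human ratings.

    pairs: list of (word1, word2); human_scores: parallel list of floats;
    word_vec: dict mapping a word to a 1-D numpy embedding (assumed given).
    """
    model_scores = []
    for w1, w2 in pairs:
        v1, v2 = word_vec[w1], word_vec[w2]
        cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
        model_scores.append(cos)
    rho, _ = spearmanr(model_scores, human_scores)
    return rho

# Toy usage with made-up vectors and ratings:
vecs = {"cat": np.array([1.0, 0.2]), "dog": np.array([0.9, 0.3]),
        "car": np.array([0.1, 1.0])}
print(evaluate_similarity([("cat", "dog"), ("cat", "car")], [8.5, 1.2], vecs))
```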
2. Video Caption Dataset for Describing Human Actions in Japanese [PDF] Back to Contents
Yutaro Shigeto, Yuya Yoshikawa, Jiaqing Lin, Akikazu Takeuchi
Abstract: In recent years, automatic video caption generation has attracted considerable attention. This paper focuses on the generation of Japanese captions for describing human actions. While most currently available video caption datasets have been constructed for English, there is no equivalent Japanese dataset. To address this, we constructed a large-scale Japanese video caption dataset consisting of 79,822 videos and 399,233 captions. Each caption in our dataset describes a video in the form of "who does what and where." To describe human actions, it is important to identify the details of a person, place, and action. Indeed, when we describe human actions, we usually mention the scene, person, and action. In our experiments, we evaluated two caption generation methods to obtain benchmark results. Further, we investigated whether those generation methods could specify "who does what and where."
3. Undersensitivity in Neural Reading Comprehension [PDF] Back to Contents
Johannes Welbl, Pasquale Minervini, Max Bartolo, Pontus Stenetorp, Sebastian Riedel
Abstract: Current reading comprehension models generalise well to in-distribution test sets, yet perform poorly on adversarially selected inputs. Most prior work on adversarial inputs studies oversensitivity: semantically invariant text perturbations that cause a model's prediction to change when it should not. In this work we focus on the complementary problem: excessive prediction undersensitivity, where input text is meaningfully changed but the model's prediction does not, even though it should. We formulate a noisy adversarial attack which searches among semantic variations of the question for which a model erroneously predicts the same answer, and with even higher probability. Despite comprising unanswerable questions, both SQuAD2.0 and NewsQA models are vulnerable to this attack. This indicates that although accurate, models tend to rely on spurious patterns and do not fully consider the information specified in a question. We experiment with data augmentation and adversarial training as defences, and find that both substantially decrease vulnerability to attacks on held out data, as well as held out attack spaces. Addressing undersensitivity also improves results on AddSent and AddOneSent, and models furthermore generalise better when facing train/evaluation distribution mismatch: they are less prone to overly rely on predictive cues present only in the training set, and outperform a conventional model by as much as 10.9% F1.
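A minimal sketch of the attack loop the abstract describes: among meaning-changing variations of a question, keep those for which the model assigns the original answer an even higher probability than it does for the original question. `model_prob` and the variation generator are hypothetical stand-ins, not the authors' code.

```python
def undersensitivity_attack(model_prob, passage, question, answer, variations):
    """Search for question perturbations that should change the answer but don't.

    model_prob(passage, question, answer) -> probability the model assigns
    to `answer` (hypothetical interface); `variations` are meaning-changing
    rewrites of `question` generated elsewhere (e.g., entity swaps).
    """
    base = model_prob(passage, question, answer)
    adversarial = []
    for q_adv in variations:
        p = model_prob(passage, q_adv, answer)
        if p > base:  # model is *more* confident despite the changed question
            adversarial.append((q_adv, p))
    # Strongest undersensitivity failures first
    return sorted(adversarial, key=lambda x: -x[1])
```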
4. Efficient Intent Detection with Dual Sentence Encoders [PDF] Back to Contents
Iñigo Casanueva, Tadas Temčinas, Daniela Gerz, Matthew Henderson, Ivan Vulić
Abstract: Building conversational systems in new domains and with added functionality requires resource-efficient models that work under low-data regimes (i.e., in few-shot setups). Motivated by these requirements, we introduce intent detection methods backed by pretrained dual sentence encoders such as USE and ConveRT. We demonstrate the usefulness and wide applicability of the proposed intent detectors, showing that: 1) they outperform intent detectors based on fine-tuning the full BERT-Large model or using BERT as a fixed black-box encoder on three diverse intent detection data sets; 2) the gains are especially pronounced in few-shot setups (i.e., with only 10 or 30 annotated examples per intent); 3) our intent detectors can be trained in a matter of minutes on a single CPU; and 4) they are stable across different hyperparameter settings. In hope of facilitating and democratizing research focused on intention detection, we release our code, as well as a new challenging single-domain intent detection dataset comprising 13,083 annotated examples over 77 intents.
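The recipe is simple enough to sketch: treat the pretrained dual sentence encoder as a frozen feature extractor and fit a lightweight classifier on top. Here `encode` is a hypothetical stand-in for an encoder such as USE or ConveRT, and scikit-learn's logistic regression plays the role of the small classification head; the paper's actual head may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_intent_detector(encode, utterances, intents):
    """Fit a lightweight classifier on top of fixed sentence embeddings.

    encode: callable mapping a list of strings to an (n, d) numpy array
    (stand-in for a pretrained dual sentence encoder).
    """
    X = encode(utterances)               # features from the frozen encoder
    clf = LogisticRegression(max_iter=1000).fit(X, intents)
    return clf

# Few-shot usage, e.g. with 10 labelled examples per intent:
# clf = train_intent_detector(encode, train_utts, train_labels)
# pred = clf.predict(encode(["can i change my booking?"]))
```

Because only the small head is trained, this runs in minutes on a CPU, which matches the efficiency claim in the abstract.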
5. Learning to Respond with Stickers: A Framework of Unifying Multi-Modality in Multi-Turn Dialog [PDF] Back to Contents
Shen Gao, Xiuying Chen, Chang Liu, Li Liu, Dongyan Zhao, Rui Yan
Abstract: Stickers with vivid and engaging expressions are becoming increasingly popular in online messaging apps, and some works are dedicated to automatically selecting a sticker response by matching the text labels of stickers with previous utterances. However, due to their large quantities, it is impractical to require text labels for all stickers. Hence, in this paper, we propose to recommend an appropriate sticker to the user based on the multi-turn dialog context history without any external labels. Two main challenges are confronted in this task. One is to learn the semantic meaning of stickers without corresponding text labels. Another challenge is to jointly model the candidate sticker with the multi-turn dialog context. To tackle these challenges, we propose a sticker response selector (SRS) model. Specifically, SRS first employs a convolutional sticker image encoder and a self-attention based multi-turn dialog encoder to obtain representations of stickers and utterances. Next, a deep interaction network is proposed to conduct deep matching between the sticker and each utterance in the dialog history. SRS then learns the short-term and long-term dependencies between all interaction results using a fusion network to output the final matching score. To evaluate our proposed method, we collect a large-scale real-world dialog dataset with stickers from one of the most popular online chatting platforms. Extensive experiments conducted on this dataset show that our model achieves state-of-the-art performance on all commonly-used metrics. Experiments also verify the effectiveness of each component of SRS. To facilitate further research in the sticker selection field, we release this dataset of 340K multi-turn dialog and sticker pairs.
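A much-simplified sketch of the matching idea (not the authors' architecture): score a candidate sticker by cosine-matching its embedding against each utterance embedding in the dialog, then fuse the per-utterance scores. A fixed recency weighting stands in here for the learned fusion network, and the encoders producing the vectors are assumed given.

```python
import numpy as np

def srs_score(sticker_vec, utterance_vecs, recency_decay=0.8):
    """Match a sticker embedding against each dialog utterance and fuse scores.

    Simplified stand-in for SRS: per-utterance cosine matching plus an
    exponential recency weighting in place of the learned fusion network.
    """
    total = 0.0
    for i, u in enumerate(reversed(utterance_vecs)):  # last utterance first
        cos = sticker_vec @ u / (np.linalg.norm(sticker_vec) * np.linalg.norm(u))
        total += (recency_decay ** i) * cos
    return total

# Rank candidate stickers by srs_score and respond with the argmax.
```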
6. A Framework for Evaluation of Machine Reading Comprehension Gold Standards [PDF] Back to Contents
Viktor Schlegel, Marco Valentino, André Freitas, Goran Nenadic, Riza Batista-Navarro
Abstract: Machine Reading Comprehension (MRC) is the task of answering a question over a paragraph of text. While neural MRC systems gain popularity and achieve noticeable performance, issues are being raised with the methodology used to establish their performance, particularly concerning the data design of gold standards that are used to evaluate them. There is but a limited understanding of the challenges present in this data, which makes it hard to draw comparisons and formulate reliable hypotheses. As a first step towards alleviating the problem, this paper proposes a unifying framework to systematically investigate the present linguistic features, required reasoning and background knowledge and factual correctness on one hand, and the presence of lexical cues as a lower bound for the requirement of understanding on the other hand. We propose a qualitative annotation schema for the first and a set of approximative metrics for the latter. In a first application of the framework, we analyse modern MRC gold standards and present our findings: the absence of features that contribute towards lexical ambiguity, the varying factual correctness of the expected answers and the presence of lexical cues, all of which potentially lower the reading comprehension complexity and quality of the evaluation data.
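One of the approximative metrics, the presence of lexical cues, can be illustrated as token overlap between the question and each passage sentence: if the answer-bearing sentence has near-maximal overlap, word matching alone may solve the example without deeper understanding. A deliberately crude sketch with whitespace tokenization (the paper's metrics are more refined):

```python
def lexical_cue_overlap(question, sentences):
    """Token overlap of the question with each passage sentence.

    A rough proxy for the 'lexical cue' lower bound on required understanding.
    """
    q_tokens = set(question.lower().split())
    overlaps = []
    for s in sentences:
        s_tokens = set(s.lower().split())
        overlaps.append(len(q_tokens & s_tokens) / max(len(q_tokens), 1))
    return overlaps

print(lexical_cue_overlap("who founded the company",
                          ["The company was founded by Ada.",
                           "It sells sensors."]))
```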
7. Combining Pretrained High-Resource Embeddings and Subword Representations for Low-Resource Languages [PDF] Back to Contents
Machel Reid, Edison Marrese-Taylor, Yutaka Matsuo
Abstract: The contrast between the need for large amounts of data for current Natural Language Processing (NLP) techniques, and the lack thereof, is accentuated in the case of African languages, most of which are considered low-resource. To help circumvent this issue, we explore techniques exploiting the qualities of morphologically rich languages (MRLs), while leveraging pretrained word vectors in well-resourced languages. In our exploration, we show that a meta-embedding approach combining both pretrained and morphologically-informed word embeddings performs best in the downstream task of Xhosa-English translation.
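A minimal meta-embedding sketch under common assumptions: L2-normalize each source vector (e.g., a pretrained high-resource-aligned vector and a subword-based, fastText-style vector) so that no single source dominates, then concatenate or average. The exact combination used in the paper may differ.

```python
import numpy as np

def meta_embed(vecs, mode="concat"):
    """Combine embeddings of one word from several sources.

    vecs: list of 1-D arrays, e.g. [pretrained_vec, subword_vec]; each is
    L2-normalised first. 'concat' keeps all dimensions; 'avg' requires
    equal dimensionality across sources.
    """
    normed = [v / (np.linalg.norm(v) + 1e-12) for v in vecs]
    if mode == "concat":
        return np.concatenate(normed)
    return np.mean(normed, axis=0)

# Toy usage with random 300-d stand-ins for the two sources:
word = meta_embed([np.random.rand(300), np.random.rand(300)], mode="avg")
```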
8. GenNet : Reading Comprehension with Multiple Choice Questions using Generation and Selection model [PDF] Back to Contents
Vaishali Ingale, Pushpender Singh
Abstract: Multiple-choice machine reading comprehension is a difficult task, as it requires machines to select the correct option from a set of candidate options using a given passage and question. Reading comprehension with multiple-choice questions requires a human (or machine) to read a given passage-question pair and select the best option from n given options. There are two different ways to select the correct answer from the given passage: either by selecting the best-matching answer or by eliminating the worst-matching answers. Here we propose GenNet, a neural-network-based model. In this model, we first generate an answer to the question from the passage, and then match the generated answer against the given options; the best-matching option is our answer. For answer generation we use the S-net (Tan et al., 2017) model trained on SQuAD, and to evaluate our model we use the large-scale RACE dataset (ReAding Comprehension Dataset From Examinations) (Lai et al., 2017).
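The generate-then-select pipeline can be sketched as follows. `generate_answer` is a hypothetical stand-in for the trained S-net generator, and plain token overlap replaces whatever learned matching the model actually uses:

```python
def select_option(generate_answer, passage, question, options):
    """Generate a free-form answer, then choose the best-matching option.

    generate_answer(passage, question) -> str is a stand-in for the trained
    generator (S-net in the paper); matching here is simple token overlap.
    """
    generated = set(generate_answer(passage, question).lower().split())

    def overlap(opt):
        toks = set(opt.lower().split())
        return len(generated & toks) / max(len(toks), 1)

    # Return the index of the option most similar to the generated answer
    return max(range(len(options)), key=lambda i: overlap(options[i]))
```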
9. ReZero is All You Need: Fast Convergence at Large Depth [PDF] Back to Contents
Thomas Bachlechner, Bodhisattwa Prasad Majumder, Huanru Henry Mao, Garrison W. Cottrell, Julian McAuley
Abstract: Deep networks have enabled significant performance gains across domains, but they often suffer from vanishing/exploding gradients. This is especially true for Transformer architectures where depth beyond 12 layers is difficult to train without large datasets and computational budgets. In general, we find that inefficient signal propagation impedes learning in deep networks. In Transformers, multi-head self-attention is the main cause of this poor signal propagation. To facilitate deep signal propagation, we propose ReZero, a simple change to the architecture that initializes an arbitrary layer as the identity map, using a single additional learned parameter per layer. We apply this technique to language modeling and find that we can easily train ReZero-Transformer networks over a hundred layers. When applied to 12 layer Transformers, ReZero converges 56% faster on enwiki8. ReZero applies beyond Transformers to other residual networks, enabling 1,500% faster convergence for deep fully connected networks and 32% faster convergence for a ResNet-56 trained on CIFAR 10.
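The architectural change is small enough to show directly: each residual branch F is scaled by a learned scalar alpha initialized to zero, so every layer starts as the identity map and signal propagates unimpeded at initialization. A PyTorch sketch of the idea (layer shapes are illustrative):

```python
import torch
import torch.nn as nn

class ReZeroBlock(nn.Module):
    """Residual block with a zero-initialised learned gate: x + alpha * F(x)."""

    def __init__(self, sublayer):
        super().__init__()
        self.sublayer = sublayer                   # any residual branch F
        self.alpha = nn.Parameter(torch.zeros(1))  # layer starts as identity

    def forward(self, x):
        return x + self.alpha * self.sublayer(x)

# e.g., wrap a feed-forward branch of width 512:
block = ReZeroBlock(nn.Sequential(nn.Linear(512, 2048), nn.ReLU(),
                                  nn.Linear(2048, 512)))
out = block(torch.randn(8, 512))
```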
10. On the coexistence of competing languages [PDF] Back to Contents
Jean-Marc Luck, Anita Mehta
Abstract: We investigate the evolution of competing languages, a subject where much previous literature suggests that the outcome is always the domination of one language over all the others. Since coexistence of languages is observed in reality, we here revisit the question of language competition, with an emphasis on uncovering the ways in which coexistence might emerge. We find that this emergence is related to symmetry breaking, and explore two particular scenarios -- the first relating to an imbalance in the population dynamics of language speakers in a single geographical area, and the second to do with spatial heterogeneity, where language preferences are specific to different geographical regions. For each of these, the investigation of paradigmatic situations leads us to a quantitative understanding of the conditions leading to language coexistence. We also obtain predictions of the number of surviving languages as a function of various model parameters.
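The abstract gives no equations, so as background only, here is the classic Abrams-Strogatz model of two competing languages (not the authors' model), which exhibits the no-coexistence outcome the paper revisits: with prestige s != 0.5, the higher-prestige language typically takes over.

```python
def abrams_strogatz(x0=0.5, s=0.55, a=1.31, dt=0.01, steps=5000):
    """Fraction x of speakers of language X under the Abrams-Strogatz model.

    dx/dt = (1 - x) * s * x**a - x * (1 - s) * (1 - x)**a
    Euler integration; s is the prestige of X, a the volatility exponent.
    """
    x = x0
    for _ in range(steps):
        x += dt * ((1 - x) * s * x**a - x * (1 - s) * (1 - x)**a)
    return x

print(abrams_strogatz())  # drifts toward 1.0: the prestigious language wins
```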
11. Neuro-symbolic Architectures for Context Understanding [PDF] Back to Contents
Alessandro Oltramari, Jonathan Francis, Cory Henson, Kaixin Ma, Ruwan Wickramarachchi
Abstract: Computational context understanding refers to an agent's ability to fuse disparate sources of information for decision-making and is, therefore, generally regarded as a prerequisite for sophisticated machine reasoning capabilities, such as in artificial intelligence (AI). Data-driven and knowledge-driven methods are two classical techniques in the pursuit of such machine sense-making capability. However, while data-driven methods seek to model the statistical regularities of events by making observations in the real-world, they remain difficult to interpret and they lack mechanisms for naturally incorporating external knowledge. Conversely, knowledge-driven methods, combine structured knowledge bases, perform symbolic reasoning based on axiomatic principles, and are more interpretable in their inferential processing; however, they often lack the ability to estimate the statistical salience of an inference. To combat these issues, we propose the use of hybrid AI methodology as a general framework for combining the strengths of both approaches. Specifically, we inherit the concept of neuro-symbolism as a way of using knowledge-bases to guide the learning progress of deep neural networks. We further ground our discussion in two applications of neuro-symbolism and, in both cases, show that our systems maintain interpretability while achieving comparable performance, relative to the state-of-the-art.
12. Ecological Semantics: Programming Environments for Situated Language Understanding [PDF] Back to Contents
Ronen Tamari, Gabriel Stanovsky, Dafna Shahaf, Reut Tsarfaty
Abstract: Large-scale natural language understanding (NLU) systems have made impressive progress: they can be applied flexibly across a variety of tasks, and employ minimal structural assumptions. However, extensive empirical research has shown this to be a double-edged sword, coming at the cost of shallow understanding: inferior generalization, grounding and explainability. Grounded language learning approaches offer the promise of deeper understanding by situating learning in richer, more structured training environments, but are limited in scale to relatively narrow, predefined domains. How might we enjoy the best of both worlds: grounded, general NLU? Following extensive contemporary cognitive science, we propose treating environments as ``first-class citizens'' in semantic representations, worthy of research and development in their own right. Importantly, models should also be partners in the creation and configuration of environments, rather than just actors within them, as in existing approaches. To do so, we argue that models must begin to understand and program in the language of affordances (which define possible actions in a given situation) both for online, situated discourse comprehension, as well as large-scale, offline common-sense knowledge mining. To this end we propose an environment-oriented ecological semantics, outlining theoretical and practical approaches towards implementation. We further provide actual demonstrations building upon interactive fiction programming languages.