Contents
1. Emergent Multi-Agent Communication in the Deep Learning Era [PDF] Abstract
2. Improved acoustic word embeddings for zero-resource languages using multilingual transfer [PDF] Abstract
3. CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning [PDF] Abstract
4. Multi-Agent Cross-Translated Diversification for Unsupervised Machine Translation [PDF] Abstract
5. Transfer Learning for British Sign Language Modelling [PDF] Abstract
6. Towards Large-Scale Data Mining for Data-Driven Analysis of Sign Languages [PDF] Abstract
7. Exploiting Class Labels to Boost Performance on Embedding-based Text Classification [PDF] Abstract
8. Norm-Based Curriculum Learning for Neural Machine Translation [PDF] Abstract
9. Automatic Text Summarization of COVID-19 Medical Research Articles using BERT and GPT-2 [PDF] Abstract
10. The Typology of Polysemy: A Multilingual Distributional Framework [PDF] Abstract
11. Nurse is Closer to Woman than Surgeon? Mitigating Gender-Biased Proximities in Word Embeddings [PDF] Abstract
12. On the Predictive Power of Neural Language Models for Human Real-Time Comprehension Behavior [PDF] Abstract
13. Event Arguments Extraction via Dilate Gated Convolutional Neural Network with Enhanced Local Features [PDF] Abstract
14. REL: An Entity Linker Standing on the Shoulders of Giants [PDF] Abstract
Abstracts
1. Emergent Multi-Agent Communication in the Deep Learning Era [PDF] Back to Contents
Angeliki Lazaridou, Marco Baroni
Abstract: The ability to cooperate through language is a defining feature of humans. As the perceptual, motor and planning capabilities of deep artificial networks increase, researchers are studying whether they also can develop a shared language to interact. From a scientific perspective, understanding the conditions under which language emerges in communities of deep agents and the characteristics of the resulting code can shed light on human language evolution, and on what is unique about the latter. From a practical perspective, endowing deep networks with the ability to solve problems interactively by communicating with each other and with us will make them more flexible and useful in everyday life. We review language emergence studies from each of these two angles in turn.
2. Improved acoustic word embeddings for zero-resource languages using multilingual transfer [PDF] Back to Contents
Herman Kamper, Yevgen Matusevych, Sharon Goldwater
Abstract: Acoustic word embeddings are fixed-dimensional representations of variable-length speech segments. Such embeddings can form the basis for speech search, indexing and discovery systems when conventional speech recognition is not possible. In zero-resource settings where unlabelled speech is the only available resource, we need a method that gives robust embeddings on an arbitrary language. Here we explore multilingual transfer: we train a single supervised embedding model on labelled data from multiple well-resourced languages and then apply it to unseen zero-resource languages. We consider three multilingual recurrent neural network (RNN) models: a classifier trained on the joint vocabularies of all training languages; a Siamese RNN trained to discriminate between same and different words from multiple languages; and a correspondence autoencoder (CAE) RNN trained to reconstruct word pairs. In a word discrimination task on six target languages, all of these models outperform state-of-the-art unsupervised models trained on the zero-resource languages themselves, giving relative improvements of more than 30% in average precision. When using only a few training languages, the multilingual CAE performs better, but with more training languages the other multilingual models perform similarly. Using more training languages is generally beneficial, but improvements are marginal on some languages. We present probing experiments which show that the CAE encodes more phonetic, word duration, language identity and speaker information than the other multilingual models.
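To make the Siamese setup concrete, here is a minimal PyTorch sketch of a recurrent acoustic embedder trained with a triplet margin loss; the architecture, feature dimensions and margin are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch (not the authors' code): a Siamese GRU encoder that maps a
# variable-length sequence of acoustic frames to a fixed-dimensional embedding,
# trained with a triplet margin loss to pull same-word pairs together.
import torch
import torch.nn as nn

class AcousticEmbedder(nn.Module):
    def __init__(self, n_mels=39, hidden=256, embed_dim=130):
        super().__init__()
        self.rnn = nn.GRU(n_mels, hidden, num_layers=2, batch_first=True)
        self.proj = nn.Linear(hidden, embed_dim)

    def forward(self, frames):          # frames: (batch, time, n_mels)
        _, h = self.rnn(frames)         # h: (layers, batch, hidden)
        return self.proj(h[-1])         # fixed-dimensional embedding

embedder = AcousticEmbedder()
loss_fn = nn.TripletMarginLoss(margin=0.5)

anchor, positive, negative = (torch.randn(8, 100, 39) for _ in range(3))
loss = loss_fn(embedder(anchor), embedder(positive), embedder(negative))
loss.backward()
```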
3. CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning [PDF] Back to Contents
Alessandro Suglia, Ioannis Konstas, Andrea Vanzo, Emanuele Bastianelli, Desmond Elliott, Stella Frank, Oliver Lemon
Abstract: Approaches to Grounded Language Learning typically focus on a single task-based final performance measure that may not depend on desirable properties of the learned hidden representations, such as their ability to predict salient attributes or to generalise to unseen situations. To remedy this, we present GROLLA, an evaluation framework for Grounded Language Learning with Attributes with three sub-tasks: 1) Goal-oriented evaluation; 2) Object attribute prediction evaluation; and 3) Zero-shot evaluation. We also propose a new dataset CompGuessWhat?! as an instance of this framework for evaluating the quality of learned neural representations, in particular concerning attribute grounding. To this end, we extend the original GuessWhat?! dataset by including a semantic layer on top of the perceptual one. Specifically, we enrich the VisualGenome scene graphs associated with the GuessWhat?! images with abstract and situated attributes. By using diagnostic classifiers, we show that current models learn representations that are not expressive enough to encode object attributes (average F1 of 44.27). In addition, they do not learn strategies nor representations that are robust enough to perform well when novel scenes or objects are involved in gameplay (zero-shot best accuracy 50.06%).
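A diagnostic classifier of the kind used here can be as simple as a linear probe on frozen representations. The sketch below uses random stand-in data; `reps` and the attribute labels are hypothetical placeholders for hidden states extracted from a trained agent.

```python
# Sketch of a diagnostic classifier in the spirit described above (hypothetical
# data): a linear probe trained on frozen model representations to predict an
# object attribute; low F1 suggests the attribute is not encoded.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
reps = rng.normal(size=(1000, 512))        # frozen hidden states, one per object
labels = rng.integers(0, 2, size=1000)     # e.g. attribute "is red": yes/no

X_tr, X_te, y_tr, y_te = train_test_split(reps, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("attribute F1:", f1_score(y_te, probe.predict(X_te)))
```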
4. Multi-Agent Cross-Translated Diversification for Unsupervised Machine Translation [PDF] Back to Contents
Xuan-Phi Nguyen, Shafiq Joty, Wu Kui, Ai Ti Aw
Abstract: Recent unsupervised machine translation (UMT) systems usually employ three main principles: initialization, language modeling and iterative back-translation, though they may apply these principles differently. This work introduces another component to this framework: Multi-Agent Cross-translated Diversification (MACD). The method trains multiple UMT agents and then translates monolingual data back and forth using non-duplicative agents to acquire synthetic parallel data for supervised MT. MACD is applicable to all previous UMT approaches. In our experiments, the technique boosts the performance for some commonly used UMT methods by 1.5-2.0 BLEU. In particular, in WMT'14 English-French, WMT'16 German-English and English-Romanian, MACD outperforms cross-lingual masked language model pretraining by 2.3, 2.2 and 1.6 BLEU, respectively. It also yields 1.5-3.3 BLEU improvements in IWSLT English-French and English-German translation tasks. Through extensive experimental analyses, we show that MACD is effective because it embraces data diversity while other similar variants do not.
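The core loop can be sketched as follows; the `StubAgent` class and its `translate` method are placeholders invented for illustration, standing in for fully trained UMT models.

```python
# Schematic sketch of MACD-style diversification (agent internals are stubbed
# out). Each monolingual sentence is translated by one agent and
# back-translated by a *different* agent, and the resulting pairs become
# synthetic parallel data for supervised MT training.
from itertools import permutations

class StubAgent:
    """Placeholder for a trained UMT agent with translate(sentence, direction)."""
    def __init__(self, name):
        self.name = name
    def translate(self, sentence, direction):
        return f"[{self.name}:{direction}] {sentence}"   # stand-in output

agents = [StubAgent("A1"), StubAgent("A2"), StubAgent("A3")]
monolingual_src = ["a sentence in the source language"]

synthetic_parallel = []
for fwd, bwd in permutations(agents, 2):       # non-duplicative agent pairs
    for s in monolingual_src:
        t = fwd.translate(s, "src->tgt")
        s_back = bwd.translate(t, "tgt->src")
        synthetic_parallel.append((s_back, t)) # train supervised MT on these
print(len(synthetic_parallel), "synthetic pairs")
```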
5. Transfer Learning for British Sign Language Modelling [PDF] Back to Contents
Boris Mocialov, Graham Turner, Helen Hastie
Abstract: Automatic speech recognition and spoken dialogue systems have made great advances through the use of deep machine learning methods. This is partly due to greater computing power but also through the large amount of data available in common languages, such as English. Conversely, research in minority languages, including sign languages, is hampered by the severe lack of data. This has led to work on transfer learning methods, whereby a model developed for one language is reused as the starting point for a model on a second language, which is less resourced. In this paper, we examine two transfer learning techniques of fine-tuning and layer substitution for language modelling of British Sign Language. Our results show improvement in perplexity when using transfer learning with standard stacked LSTM models, trained initially using a large corpus for standard English from the Penn Treebank corpus.
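As a rough illustration of the two techniques, the PyTorch sketch below contrasts fine-tuning with layer substitution on a stacked LSTM language model; vocabulary sizes and dimensions are invented.

```python
# Illustrative sketch (hyperparameters invented): two ways to transfer a
# stacked-LSTM language model pretrained on English to a smaller BSL corpus.
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab, embed=200, hidden=200):
        super().__init__()
        self.embedding = nn.Embedding(vocab, embed)
        self.lstm = nn.LSTM(embed, hidden, num_layers=2, batch_first=True)
        self.decoder = nn.Linear(hidden, vocab)
    def forward(self, x):
        out, _ = self.lstm(self.embedding(x))
        return self.decoder(out)

model = LSTMLanguageModel(vocab=10000)       # pretend this is pretrained

# Option 1: fine-tuning -- keep all weights and simply continue training
# `model` on the target-language data with a small learning rate.

# Option 2: layer substitution -- freeze the recurrent core, replace the
# input/output layers to match the target vocabulary, then train on target data.
for p in model.lstm.parameters():
    p.requires_grad = False
model.embedding = nn.Embedding(3000, 200)    # new, smaller BSL vocabulary
model.decoder = nn.Linear(200, 3000)
```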
6. Towards Large-Scale Data Mining for Data-Driven Analysis of Sign Languages [PDF] Back to Contents
Boris Mocialov, Graham Turner, Helen Hastie
Abstract: Access to sign language data is far from adequate. We show that it is possible to collect the data from social networking services such as TikTok, Instagram, and YouTube by applying data filtering to enforce quality standards and by discovering patterns in the filtered data, making it easier to analyse and model. Using our data collection pipeline, we collect and examine the interpretation of songs in both the American Sign Language (ASL) and the Brazilian Sign Language (Libras). We explore their differences and similarities by looking at the co-dependence of the orientation and location phonological parameters.
7. Exploiting Class Labels to Boost Performance on Embedding-based Text Classification [PDF] Back to Contents
Arkaitz Zubiaga
Abstract: Text classification is one of the most frequent tasks for processing textual data, facilitating among others research from large-scale datasets. Embeddings of different kinds have recently become the de facto standard as features used for text classification. These embeddings have the capacity to capture meanings of words inferred from occurrences in large external collections. While they are built out of external collections, they are unaware of the distributional characteristics of words in the classification dataset at hand, including most importantly the distribution of words across classes in training data. To make the most of these embeddings as features and to boost the performance of classifiers using them, we introduce a weighting scheme, Term Frequency-Category Ratio (TF-CR), which can weight high-frequency, category-exclusive words higher when computing word embeddings. Our experiments on eight datasets show the effectiveness of TF-CR, leading to improved performance scores over the well-known weighting schemes TF-IDF and KLD as well as over the absence of a weighting scheme in most cases.
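The abstract does not spell out the TF-CR formula, so the sketch below assumes one plausible reading: the weight of word w in category c is its within-category frequency scaled by the fraction of the word's occurrences that fall in c.

```python
# Sketch of a TF-CR-style weighting. The exact formula is not given in the
# abstract; assumed here: TF-CR(w, c) = tf(w, c) * count(w, c) / count(w),
# i.e. within-category frequency scaled by how exclusive the word is to c.
from collections import Counter
import numpy as np

def tf_cr(docs_by_category):
    total = Counter()
    per_cat = {c: Counter() for c in docs_by_category}
    for c, docs in docs_by_category.items():
        for doc in docs:
            per_cat[c].update(doc)
            total.update(doc)
    n_tokens = {c: sum(len(d) for d in docs) for c, docs in docs_by_category.items()}
    return {c: {w: (cnt / n_tokens[c]) * (cnt / total[w]) for w, cnt in cat.items()}
            for c, cat in per_cat.items()}

weights = tf_cr({"sports": [["goal", "match"]], "politics": [["vote", "match"]]})
# Weighted document vector: sum of word embeddings scaled by TF-CR weights.
emb = {w: np.random.rand(50) for w in ["goal", "match"]}   # stand-in embeddings
doc_vec = sum(weights["sports"].get(w, 0.0) * emb[w] for w in ["goal", "match"])
```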
8. Norm-Based Curriculum Learning for Neural Machine Translation [PDF] Back to Contents
Xuebo Liu, Houtim Lai, Derek F. Wong, Lidia S. Chao
Abstract: A neural machine translation (NMT) system is expensive to train, especially with high-resource settings. As the NMT architectures become deeper and wider, this issue gets worse and worse. In this paper, we aim to improve the efficiency of training an NMT by introducing a novel norm-based curriculum learning method. We use the norm (aka length or module) of a word embedding as a measure of 1) the difficulty of the sentence, 2) the competence of the model, and 3) the weight of the sentence. The norm-based sentence difficulty takes the advantages of both linguistically motivated and model-based sentence difficulties. It is easy to determine and contains learning-dependent features. The norm-based model competence makes NMT learn the curriculum in a fully automated way, while the norm-based sentence weight further enhances the learning of the vector representation of the NMT. Experimental results for the WMT'14 English-German and WMT'17 Chinese-English translation tasks demonstrate that the proposed method outperforms strong baselines in terms of BLEU score (+1.17/+1.56) and training speedup (2.22x/3.33x).
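A rough sketch of how such a curriculum could operate: sentence difficulty as the summed norms of its word embeddings, and a square-root competence schedule (the schedule form is borrowed from competence-based curriculum learning and is an assumption, not necessarily the paper's exact choice).

```python
# Sketch of norm-based curriculum selection: only sentences whose difficulty
# falls within the model's current competence are available for training.
import numpy as np

def sentence_difficulty(sentence, emb):
    return sum(np.linalg.norm(emb[w]) for w in sentence)

def competence(step, total_steps, c0=0.01):
    return min(1.0, np.sqrt(step * (1 - c0**2) / total_steps + c0**2))

emb = {w: np.random.rand(100) for w in "the cat sat on a mat".split()}
corpus = [["the", "cat"], ["the", "cat", "sat", "on", "a", "mat"]]

diffs = sorted(sentence_difficulty(s, emb) for s in corpus)
for step in (0, 5000, 10000):
    c = competence(step, total_steps=10000)
    cutoff = diffs[min(int(c * len(diffs)), len(diffs) - 1)]
    usable = [s for s in corpus if sentence_difficulty(s, emb) <= cutoff]
    print(f"step {step}: competence {c:.2f}, {len(usable)} sentences available")
```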
9. Automatic Text Summarization of COVID-19 Medical Research Articles using BERT and GPT-2 [PDF] Back to Contents
Virapat Kieuvongngam, Bowen Tan, Yiming Niu
Abstract: With the COVID-19 pandemic, there is a growing urgency for the medical community to keep up with the accelerating growth in the new coronavirus-related literature. As a result, the COVID-19 Open Research Dataset Challenge has released a corpus of scholarly articles and is calling for machine learning approaches to help bridge the gap between the researchers and the rapidly growing publications. Here, we take advantage of the recent advances in pre-trained NLP models, BERT and OpenAI GPT-2, to solve this challenge by performing text summarization on this dataset. We evaluate the results using ROUGE scores and visual inspection. Our model provides abstractive and comprehensive information based on keywords extracted from the original articles. Our work can help the medical community by providing succinct summaries of articles for which an abstract is not already available.
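A minimal sketch of keyword-conditioned generation with off-the-shelf GPT-2 via Hugging Face transformers is given below; the prompt format is invented for illustration, and the paper's actual conditioning and fine-tuning setup may differ.

```python
# Hedged sketch: generate a summary continuation from a keywords prompt using
# pretrained GPT-2 (no fine-tuning shown here).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

keywords = "coronavirus, transmission, incubation period"
prompt = f"Keywords: {keywords}\nSummary:"
ids = tokenizer.encode(prompt, return_tensors="pt")
out = model.generate(ids, max_length=120, do_sample=True, top_p=0.9,
                     pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```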
10. The Typology of Polysemy: A Multilingual Distributional Framework [PDF] Back to Contents
Ella Rabinovich, Yang Xu, Suzanne Stevenson
Abstract: Lexical semantic typology has identified important cross-linguistic generalizations about the variation and commonalities in polysemy patterns---how languages package up meanings into words. Recent computational research has enabled investigation of lexical semantics at a much larger scale, but little work has explored lexical typology across semantic domains, nor the factors that influence cross-linguistic similarities. We present a novel computational framework that quantifies semantic affinity, the cross-linguistic similarity of lexical semantics for a concept. Our approach defines a common multilingual semantic space that enables a direct comparison of the lexical expression of concepts across languages. We validate our framework against empirical findings on lexical semantic typology at both the concept and domain levels. Our results reveal an intricate interaction between semantic domains and extra-linguistic factors, beyond language phylogeny, that co-shape the typology of polysemy across languages.
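One simple way to operationalise such a cross-linguistic similarity is to compare the neighbourhoods of a concept's realisations in a shared multilingual space; the Jaccard-overlap measure in the sketch below is an assumed stand-in for the paper's affinity measure, not its definition.

```python
# Toy sketch: semantic affinity of a concept across two languages as the
# overlap between the nearest-neighbour sets of its two word vectors in a
# shared multilingual embedding space (random vectors stand in for real ones).
import numpy as np

def neighbours(word_vec, space, k=10):
    sims = {w: v @ word_vec / (np.linalg.norm(v) * np.linalg.norm(word_vec))
            for w, v in space.items()}
    return set(sorted(sims, key=sims.get, reverse=True)[:k])

def affinity(vec_a, vec_b, space, k=10):
    na, nb = neighbours(vec_a, space, k), neighbours(vec_b, space, k)
    return len(na & nb) / len(na | nb)      # Jaccard overlap of neighbourhoods

rng = np.random.default_rng(1)
space = {f"w{i}": rng.normal(size=64) for i in range(500)}  # toy shared space
print(affinity(space["w0"], space["w1"], space))
```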
11. Nurse is Closer to Woman than Surgeon? Mitigating Gender-Biased Proximities in Word Embeddings [PDF] Back to Contents
Vaibhav Kumar, Tenzin Singhay Bhotia, Vaibhav Kumar, Tanmoy Chakraborty
Abstract: Word embeddings are the standard model for semantic and syntactic representations of words. Unfortunately, these models have been shown to exhibit undesirable word associations resulting from gender, racial, and religious biases. Existing post-processing methods for debiasing word embeddings are unable to mitigate gender bias hidden in the spatial arrangement of word vectors. In this paper, we propose RAN-Debias, a novel gender debiasing methodology which not only eliminates the bias present in a word vector but also alters the spatial distribution of its neighbouring vectors, achieving a bias-free setting while maintaining minimal semantic offset. We also propose a new bias evaluation metric - Gender-based Illicit Proximity Estimate (GIPE), which measures the extent of undue proximity in word vectors resulting from the presence of gender-based predilections. Experiments based on a suite of evaluation metrics show that RAN-Debias significantly outperforms the state-of-the-art in reducing proximity bias (GIPE) by at least 42.02%. It also reduces direct bias, adding minimal semantic disturbance, and achieves the best performance in a downstream application task (coreference resolution).
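Working only from the description above, a GIPE-like score might be computed as follows; the neighbourhood size, threshold, and projection-based bias test are assumptions, not the paper's exact definition.

```python
# Sketch of a GIPE-like score: for each word, the fraction of its nearest
# neighbours whose projection on a gender direction exceeds a threshold;
# averaging over the vocabulary estimates how much gender leaks into
# proximities (details assumed, not the paper's formula).
import numpy as np

def gipe(vectors, gender_dir, k=10, theta=0.05):
    V = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    g = gender_dir / np.linalg.norm(gender_dir)
    sims = V @ V.T
    np.fill_diagonal(sims, -np.inf)            # exclude the word itself
    scores = []
    for i in range(len(V)):
        nbrs = np.argsort(sims[i])[-k:]        # k nearest neighbours
        biased = np.abs(V[nbrs] @ g) > theta   # gender-attributable neighbours
        scores.append(biased.mean())
    return float(np.mean(scores))

rng = np.random.default_rng(0)
vecs = rng.normal(size=(200, 50))              # stand-in word vectors
gender_direction = rng.normal(size=50)         # e.g. vec("he") - vec("she")
print("GIPE:", gipe(vecs, gender_direction))
```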
12. On the Predictive Power of Neural Language Models for Human Real-Time Comprehension Behavior [PDF] Back to Contents
Ethan Gotlieb Wilcox, Jon Gauthier, Jennifer Hu, Peng Qian, Roger Levy
Abstract: Human reading behavior is tuned to the statistics of natural language: the time it takes human subjects to read a word can be predicted from estimates of the word's probability in context. However, it remains an open question what computational architecture best characterizes the expectations deployed in real time by humans that determine the behavioral signatures of reading. Here we test over two dozen models, independently manipulating computational architecture and training dataset size, on how well their next-word expectations predict human reading time behavior on naturalistic text corpora. We find that across model architectures and training dataset sizes the relationship between word log-probability and reading time is (near-)linear. We next evaluate how features of these models determine their psychometric predictive power, or ability to predict human reading behavior. In general, the better a model's next-word expectations, the better its psychometric predictive power. However, we find nontrivial differences across model architectures. For any given perplexity, deep Transformer models and n-gram models generally show superior psychometric predictive power over LSTM or structurally supervised neural models, especially for eye movement data. Finally, we compare models' psychometric predictive power to the depth of their syntactic knowledge, as measured by a battery of syntactic generalization tests developed using methods from controlled psycholinguistic experiments. Once perplexity is controlled for, we find no significant relationship between syntactic knowledge and predictive power. These results suggest that different approaches may be required to best model human real-time language comprehension behavior in naturalistic reading versus behavior for controlled linguistic materials designed for targeted probing of syntactic knowledge.
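The core measurement is per-word surprisal from a language model regressed against reading times; the sketch below computes surprisal with off-the-shelf GPT-2 and simulates reading times for self-containment (a real study would use eye-tracking or self-paced reading corpora).

```python
# Sketch: per-token surprisal from GPT-2, regressed linearly against reading
# times to illustrate the (near-)linear relationship described above.
import numpy as np
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tok.encode("The old man the boats .", return_tensors="pt")
with torch.no_grad():
    logits = lm(ids).logits
logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
surprisal = -logprobs[torch.arange(ids.shape[1] - 1), ids[0, 1:]].numpy()

# Simulated reading times (ms); real data would come from human subjects.
rt = 200 + 15 * surprisal + np.random.default_rng(0).normal(0, 5, len(surprisal))
slope, intercept = np.polyfit(surprisal, rt, 1)
print(f"reading time ~ {intercept:.0f} + {slope:.1f} * surprisal (ms)")
```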
13. Event Arguments Extraction via Dilate Gated Convolutional Neural Network with Enhanced Local Features [PDF] Back to Contents
Zhigang Kan, Linbo Qiao, Sen Yang, Feng Liu, Feng Huang
Abstract: Event Extraction plays an important role in information extraction to understand the world. Event extraction can be split into two subtasks: one is event trigger extraction, the other is event arguments extraction. However, the F-score of event arguments extraction is much lower than that of event trigger extraction: in the most recent work, event trigger extraction achieves 80.7%, while event arguments extraction achieves only 58%. In pipelined structures, the difficulty of event arguments extraction lies in its lack of classification features and its much higher computational cost. In this work, we propose a novel Event Extraction approach based on a multi-layer Dilate Gated Convolutional Neural Network (EE-DGCNN), which has fewer parameters. In addition, enhanced local information is incorporated into word features to assign event argument roles for triggers predicted by the first subtask. Numerical experiments demonstrate significant performance improvements over state-of-the-art event extraction approaches on real-world datasets. Further analysis of the extraction procedure is presented, and experiments are conducted to analyze factors that contribute to the performance improvement.
14. REL: An Entity Linker Standing on the Shoulders of Giants [PDF] Back to Contents
Johannes M. van Hulst, Faegheh Hasibi, Koen Dercksen, Krisztian Balog, Arjen P. de Vries
Abstract: Entity linking is a standard component in modern retrieval systems that is often performed by third-party toolkits. Despite the plethora of open source options, it is difficult to find a single system that has a modular architecture where certain components may be replaced, does not depend on external sources, can easily be updated to newer Wikipedia versions, and, most important of all, has state-of-the-art performance. The REL system presented in this paper aims to fill that gap. Building on state-of-the-art neural components from natural language processing research, it is provided as a Python package as well as a web API. We also report on an experimental comparison against both well-established systems and the current state-of-the-art on standard entity linking benchmarks.
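For reference, querying REL's public web API looks roughly like this; the endpoint and the response layout follow the REL documentation at the time of writing and may change, so treat this as an illustrative sketch rather than a stable contract.

```python
# Sketch: link entities in a short text via REL's hosted web API. The response
# is documented as a list of [start, length, mention, entity, ...] records.
import requests

doc = {"text": "Arjen Robben played for Bayern Munich.", "spans": []}
resp = requests.post("https://rel.cs.ru.nl/api", json=doc, timeout=30)
for start, length, mention, entity, *rest in resp.json():
    print(f"{mention!r} -> {entity}")
```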