
[arXiv Papers] Computation and Language 2020-04-06

Contents

1. A Set of Recommendations for Assessing Human-Machine Parity in Language Translation [PDF] Abstract
2. Directions in Abusive Language Training Data: Garbage In, Garbage Out [PDF] Abstract
3. Aligned Cross Entropy for Non-Autoregressive Machine Translation [PDF] Abstract
4. Analyzing autoencoder-based acoustic word embeddings [PDF] Abstract
5. Keyphrase Rubric Relationship Classification in Complex Assignments [PDF] Abstract
6. Learning synchronous context-free grammars with multiple specialised non-terminals for hierarchical phrase-based translation [PDF] Abstract
7. XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation [PDF] Abstract
8. MZET: Memory Augmented Zero-Shot Fine-grained Named Entity Typing [PDF] Abstract
9. R3: A Reading Comprehension Benchmark Requiring Reasoning Processes [PDF] Abstract
10. Towards Relevance and Sequence Modeling in Language Recognition [PDF] Abstract

Abstracts

1. A Set of Recommendations for Assessing Human-Machine Parity in Language Translation [PDF] Back to Contents
  Samuel Läubli, Sheila Castilho, Graham Neubig, Rico Sennrich, Qinlan Shen, Antonio Toral
Abstract: The quality of machine translation has increased remarkably over the past years, to the degree that it was found to be indistinguishable from professional human translation in a number of empirical investigations. We reassess Hassan et al.'s 2018 investigation into Chinese to English news translation, showing that the finding of human-machine parity was owed to weaknesses in the evaluation design - which is currently considered best practice in the field. We show that the professional human translations contained significantly fewer errors, and that perceived quality in human evaluation depends on the choice of raters, the availability of linguistic context, and the creation of reference translations. Our results call for revisiting current best practices to assess strong machine translation systems in general and human-machine parity in particular, for which we offer a set of recommendations based on our empirical findings.

2. Directions in Abusive Language Training Data: Garbage In, Garbage Out [PDF] Back to Contents
  Bertie Vidgen, Leon Derczynski
Abstract: Data-driven analysis and detection of abusive online content covers many different tasks, phenomena, contexts, and methodologies. This paper systematically reviews abusive language dataset creation and content in conjunction with an open website for cataloguing abusive language data. This collection of knowledge leads to a synthesis providing evidence-based recommendations for practitioners working with this complex and highly diverse data.

3. Aligned Cross Entropy for Non-Autoregressive Machine Translation [PDF] Back to Contents
  Marjan Ghazvininejad, Vladimir Karpukhin, Luke Zettlemoyer, Omer Levy
Abstract: Non-autoregressive machine translation models significantly speed up decoding by allowing for parallel prediction of the entire target sequence. However, modeling word order is more challenging due to the lack of autoregressive factors in the model. This difficulty is compounded during training with cross entropy loss, which can highly penalize small shifts in word order. In this paper, we propose aligned cross entropy (AXE) as an alternative loss function for training of non-autoregressive models. AXE uses a differentiable dynamic program to assign loss based on the best possible monotonic alignment between target tokens and model predictions. AXE-based training of conditional masked language models (CMLMs) substantially improves performance on major WMT benchmarks, while setting a new state of the art for non-autoregressive models.
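
The core of AXE is a monotonic alignment between target tokens and prediction positions, found with dynamic programming. The sketch below illustrates that idea under simplifying assumptions: it uses a hard minimum instead of the paper's differentiable (soft) formulation, and the fixed penalty for unaligned prediction positions is a stand-in for AXE's actual skip costs, so treat it as an illustration of the alignment recursion rather than the paper's loss.

```python
import numpy as np

def axe_like_loss(log_probs: np.ndarray, target: list[int], skip_penalty: float = 4.0) -> float:
    """Best-monotonic-alignment loss between model predictions and target tokens.

    log_probs: array of shape (T_pred, V) with per-position log-probabilities.
    target:    list of target token ids (length T_tgt).
    Each target token is aligned to one prediction position; positions are consumed
    left to right, and prediction positions left unaligned pay a fixed skip penalty
    (a simplification of AXE's skip/blank costs).
    """
    t_pred, _ = log_probs.shape
    t_tgt = len(target)
    INF = float("inf")
    # dp[i, j] = minimal cost of aligning the first j target tokens
    #            using the first i prediction positions.
    dp = np.full((t_pred + 1, t_tgt + 1), INF)
    dp[0, 0] = 0.0
    for i in range(t_pred + 1):
        for j in range(t_tgt + 1):
            if dp[i, j] == INF:
                continue
            # Option 1: skip prediction position i (leave it unaligned).
            if i < t_pred:
                dp[i + 1, j] = min(dp[i + 1, j], dp[i, j] + skip_penalty)
            # Option 2: align target token j to prediction position i.
            if i < t_pred and j < t_tgt:
                cost = -log_probs[i, target[j]]
                dp[i + 1, j + 1] = min(dp[i + 1, j + 1], dp[i, j] + cost)
    return float(dp[t_pred, t_tgt])

# Toy usage: 4 prediction positions, vocabulary of 5 tokens, 3-token target.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 5))
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
print(axe_like_loss(log_probs, target=[1, 3, 2]))
```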

4. Analyzing autoencoder-based acoustic word embeddings [PDF] Back to Contents
  Yevgen Matusevych, Herman Kamper, Sharon Goldwater
Abstract: Recent studies have introduced methods for learning acoustic word embeddings (AWEs)---fixed-size vector representations of words which encode their acoustic features. Despite the widespread use of AWEs in speech processing research, they have only been evaluated quantitatively in their ability to discriminate between whole word tokens. To better understand the applications of AWEs in various downstream tasks and in cognitive modeling, we need to analyze the representation spaces of AWEs. Here we analyze basic properties of AWE spaces learned by a sequence-to-sequence encoder-decoder model in six typologically diverse languages. We first show that these AWEs preserve some information about words' absolute duration and speaker. At the same time, the representation space of these AWEs is organized such that the distance between words' embeddings increases with those words' phonetic dissimilarity. Finally, the AWEs exhibit a word onset bias, similar to patterns reported in various studies on human speech processing and lexical access. We argue this is a promising result and encourage further evaluation of AWEs as a potentially useful tool in cognitive science, which could provide a link between speech processing and lexical memory.
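
One of the analyses described above checks whether distances in the embedding space grow with phonetic dissimilarity. Below is a minimal sketch of that kind of check, assuming the AWEs are already available as fixed-size vectors; the toy embeddings, the phone transcriptions, and the choice of cosine distance versus phone edit distance are illustrative assumptions, not necessarily the paper's exact measures.

```python
import numpy as np
from itertools import combinations
from scipy.stats import spearmanr

def levenshtein(a, b):
    """Edit distance between two phone sequences (lists of phone symbols)."""
    d = np.arange(len(b) + 1)
    for i, x in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, y in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (x != y))
    return int(d[-1])

def cosine_distance(u, v):
    return 1.0 - float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def embedding_phonetic_correlation(embeddings, phone_seqs):
    """Spearman correlation between pairwise AWE distances and phone edit distances."""
    emb_d, pho_d = [], []
    for i, j in combinations(range(len(embeddings)), 2):
        emb_d.append(cosine_distance(embeddings[i], embeddings[j]))
        pho_d.append(levenshtein(phone_seqs[i], phone_seqs[j]))
    rho, _ = spearmanr(emb_d, pho_d)
    return rho

# Toy usage with made-up embeddings and phone transcriptions.
rng = np.random.default_rng(1)
embs = [rng.normal(size=64) for _ in range(4)]
phones = [["k", "ae", "t"], ["k", "ae", "p"], ["d", "ao", "g"], ["d", "ao", "g", "z"]]
print(embedding_phonetic_correlation(embs, phones))
```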

5. Keyphrase Rubric Relationship Classification in Complex Assignments [PDF] Back to Contents
  Manikandan Ravikiran
Abstract: Complex assignments are open-ended questions with varying content, irrespective of the diversity of courses and modes of communication. With sheer scale comes the issue of reviews that are incomplete and lack detail, leading to many regrading requests. To automatically relate the contents of assignments to the scoring rubric, in this work we present a first study of keyphrase-rubric relationship classification, i.e. we relate assignment contents to rubrics by casting the task as a classification problem. In this study, we analyze both supervised and unsupervised methods and find that supervised approaches outperform unsupervised and topic-modelling approaches despite data limitations, with supervised approaches producing a maximum F1-score of 0.48 and the best unsupervised approach producing an F1-score of 0.31. We further present exhaustive experimentation and cluster analysis using multiple metrics, identifying the cases in which the unsupervised and supervised methods are usable.
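
The abstract frames keyphrase-rubric relationship prediction as a classification problem. A minimal supervised sketch under assumed inputs is shown below: keyphrase and rubric texts are joined into one string and fed to a TF-IDF plus logistic-regression pipeline. The toy data, the [SEP] joining, and the feature choice are illustrative assumptions, not the paper's actual models or dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import f1_score

# Toy (keyphrase, rubric, label) triples; the label marks whether the keyphrase
# addresses the rubric item. Real data would come from graded assignments.
pairs = [
    ("gradient descent convergence", "explains the optimisation procedure", 1),
    ("dataset licensing terms", "explains the optimisation procedure", 0),
    ("confusion matrix analysis", "reports and interprets evaluation metrics", 1),
    ("personal reflection on teamwork", "reports and interprets evaluation metrics", 0),
]
texts = [f"{kp} [SEP] {rub}" for kp, rub, _ in pairs]
labels = [y for _, _, y in pairs]

# Pair text -> TF-IDF features -> binary relationship classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
model.fit(texts, labels)
pred = model.predict(texts)
print("F1 on the toy training pairs:", f1_score(labels, pred))
```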

6. Learning synchronous context-free grammars with multiple specialised non-terminals for hierarchical phrase-based translation [PDF] Back to Contents
  Felipe Sánchez-Martínez, Juan Antonio Pérez-Ortiz, Rafael C. Carrasco
Abstract: Translation models based on hierarchical phrase-based statistical machine translation (HSMT) have shown better performance than their non-hierarchical phrase-based counterparts for some language pairs. The standard approach to HSMT learns and applies a synchronous context-free grammar with a single non-terminal. The hypothesis behind the grammar refinement algorithm presented in this work is that this single non-terminal is overloaded and insufficiently discriminative, and therefore an adequate split of it into more specialised symbols could lead to improved models. This paper presents a method to learn synchronous context-free grammars with a huge number of initial non-terminals, which are then grouped via a clustering algorithm. Our experiments show that the resulting smaller set of non-terminals correctly captures the contextual information, making it possible to obtain statistically significant improvements in the BLEU score over the standard HSMT approach.
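
The abstract describes starting from many specialised non-terminals and then grouping them with a clustering algorithm. The sketch below shows only that grouping-and-relabelling step, with an assumed vector representation of each non-terminal (random here) and k-means standing in for whatever clustering and features the paper actually uses.

```python
import numpy as np
from sklearn.cluster import KMeans

# Suppose each of N initial non-terminals is described by a feature vector
# summarising the contexts in which it was extracted (the concrete features
# are an assumption; the paper defines its own representation).
rng = np.random.default_rng(2)
n_nonterminals, n_features, n_clusters = 200, 32, 8
features = rng.normal(size=(n_nonterminals, n_features))

km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(features)

# Map every initial non-terminal id to its specialised cluster symbol, e.g. "X3".
relabel = {nt: f"X{c}" for nt, c in enumerate(km.labels_)}

# A toy synchronous rule whose integer placeholders are merged into cluster symbols.
rule = {"lhs": 17, "rhs_source": ["la", 42, "de", 105], "rhs_target": [42, "'s", 105]}
merged = {
    "lhs": relabel[rule["lhs"]],
    "rhs_source": [relabel[t] if isinstance(t, int) else t for t in rule["rhs_source"]],
    "rhs_target": [relabel[t] if isinstance(t, int) else t for t in rule["rhs_target"]],
}
print(merged)
```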

7. XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation [PDF] Back to Contents
  Yaobo Liang, Nan Duan, Yeyun Gong, Ning Wu, Fenfei Guo, Weizhen Qi, Ming Gong, Linjun Shou, Daxin Jiang, Guihong Cao, Xiaodong Fan, Bruce Zhang, Rahul Agrawal, Edward Cui, Sining Wei, Taroon Bharti, Jiun-Hung Chen, Winnie Wu, Shuguang Liu, Fan Yang, Ming Zhou
Abstract: In this paper, we introduce XGLUE, a new benchmark dataset to train large-scale cross-lingual pre-trained models using multilingual and bilingual corpora, and to evaluate their performance across a diverse set of cross-lingual tasks. Compared to GLUE (Wang et al., 2019), which is labeled in English and includes natural language understanding tasks only, XGLUE has three main advantages: (1) it provides two corpora with different sizes for cross-lingual pre-training; (2) it provides 11 diversified tasks that cover both natural language understanding and generation scenarios; (3) for each task, it provides labeled data in multiple languages. We extend a recent cross-lingual pre-trained model, Unicoder (Huang et al., 2019), to cover both understanding and generation tasks, and evaluate it on XGLUE as a strong baseline. We also evaluate the base versions (12-layer) of Multilingual BERT, XLM and XLM-R for comparison.

8. MZET: Memory Augmented Zero-Shot Fine-grained Named Entity Typing [PDF] Back to Contents
  Tao Zhang, Congying Xia, Chun-Ta Lu, Philip Yu
Abstract: Named entity typing (NET) is the classification task of assigning semantic types to an entity mention in context. However, with the growing size and granularity of entity type inventories, little previous research has addressed newly emerged entity types. In this paper, we propose MZET, a novel memory-augmented FNET (Fine-grained NET) model, to tackle unseen types in a zero-shot manner. MZET incorporates character-level, word-level, and context-level information to learn the entity mention representation. Besides, MZET incorporates the semantic meaning and the hierarchical structure into the entity type representation. Finally, through the memory component, which models the relationship between the entity mention and the entity type, MZET transfers knowledge from seen entity types to zero-shot ones. Extensive experiments on three public datasets show the prominent performance obtained by MZET, which surpasses state-of-the-art FNET neural network models with gains of up to 7% in Micro-F1 and Macro-F1 score.
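
Zero-shot typing of this kind relies on placing mention representations and type representations in a shared space so that unseen types can still be scored. The sketch below shows only that generic label-embedding matching idea with toy vectors; MZET's memory component and its specific mention and type encoders are not modelled here.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def zero_shot_type(mention_vec, type_embeddings):
    """Rank candidate types by similarity between a mention representation and
    type-label embeddings in a shared space. Unseen (zero-shot) types can be
    scored because they only need a label embedding, not labelled examples."""
    scores = {t: cosine(mention_vec, v) for t, v in type_embeddings.items()}
    return max(scores, key=scores.get), scores

# Toy vectors standing in for learned representations.
rng = np.random.default_rng(3)
dim = 16
types = {
    "/person/artist": rng.normal(size=dim),
    "/location/city": rng.normal(size=dim),
    "/person/astronaut": rng.normal(size=dim),  # pretend this type was never seen in training
}
mention = types["/person/astronaut"] + 0.1 * rng.normal(size=dim)
best, _ = zero_shot_type(mention, types)
print(best)
```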

9. R3: A Reading Comprehension Benchmark Requiring Reasoning Processes [PDF] Back to Contents
  Ran Wang, Kun Tao, Dingjie Song, Zhilong Zhang, Xiao Ma, Xi'ao Su, Xinyu Dai
Abstract: Existing question answering systems can only predict answers without explicit reasoning processes, which hinders their explainability and makes us overestimate their ability to understand and reason over natural language. In this work, we propose a novel reading comprehension task in which a model is required to provide both final answers and reasoning processes. To this end, we introduce a formalism for reasoning over unstructured text, namely Text Reasoning Meaning Representation (TRMR). TRMR consists of three phrases, which are expressive enough to characterize the reasoning process needed to answer reading comprehension questions. We develop an annotation platform to facilitate TRMR's annotation, and release the R3 dataset, a Reading comprehension benchmark Requiring Reasoning processes. R3 contains over 60K question-answer pairs and their TRMRs. Our dataset is available at: http://anonymous.

10. Towards Relevance and Sequence Modeling in Language Recognition [PDF] Back to Contents
  Bharat Padi, Anand Mohan, Sriram Ganapathy
Abstract: The task of automatic language identification (LID) involving multiple dialects of the same language family in the presence of noise is a challenging problem. In these scenarios, the identity of the language/dialect may be reliably present only in parts of the temporal sequence of the speech signal. Conventional approaches to LID (and to speaker recognition) ignore this sequence information by extracting a long-term statistical summary of the recording, assuming independence of the feature frames. In this paper, we propose a neural network framework utilizing short-sequence information for language recognition. In particular, a new model is proposed for incorporating relevance in language recognition, where parts of the speech data are weighted more based on their relevance to the language recognition task. This relevance weighting is achieved using a bidirectional long short-term memory (BLSTM) network with attention modeling. We explore two approaches: the first uses segment-level i-vector/x-vector representations that are aggregated in the neural model, and the second models the acoustic features directly in an end-to-end neural model. Experiments are performed on the language recognition task of the NIST LRE 2017 Challenge using clean, noisy and multi-speaker speech data, as well as on the RATS language recognition corpus. In these experiments on the noisy LRE tasks as well as the RATS dataset, the proposed approach yields significant improvements over conventional i-vector/x-vector based language recognition approaches as well as over other previous models incorporating sequence information.
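
The relevance weighting described above amounts to attention pooling over BLSTM frame outputs. Below is a minimal end-to-end-style sketch in PyTorch; the layer sizes, the single scalar attention head, and the number of target languages are chosen arbitrarily for illustration, and the paper's actual architectures, including the i-vector/x-vector variant, differ.

```python
import torch
import torch.nn as nn

class AttentiveBLSTMLanguageID(nn.Module):
    """BLSTM encoder with attention pooling: frames judged more relevant for the
    language-ID decision receive larger weights in the utterance embedding."""
    def __init__(self, feat_dim=40, hidden=128, n_languages=14):
        super().__init__()
        self.blstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)           # scalar relevance score per frame
        self.classifier = nn.Linear(2 * hidden, n_languages)

    def forward(self, frames):                          # frames: (batch, time, feat_dim)
        h, _ = self.blstm(frames)                       # (batch, time, 2*hidden)
        alpha = torch.softmax(self.attn(h), dim=1)      # (batch, time, 1) attention weights
        utt = (alpha * h).sum(dim=1)                    # weighted pooling over time
        return self.classifier(utt), alpha.squeeze(-1)

# Toy forward pass on random "acoustic features".
model = AttentiveBLSTMLanguageID()
x = torch.randn(2, 300, 40)                             # 2 utterances, 300 frames, 40-dim features
logits, weights = model(x)
print(logits.shape, weights.shape)                      # torch.Size([2, 14]) torch.Size([2, 300])
```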
