
[arXiv papers] Computation and Language 2020-05-21

Contents

1. Sentence level estimation of psycholinguistic norms using joint multidimensional annotations [PDF] Abstract
2. BlaBla: Linguistic Feature Extraction for Clinical Analysis in Multiple Languages [PDF] Abstract
3. Applying the Transformer to Character-level Transduction [PDF] Abstract
4. BERTweet: A pre-trained language model for English Tweets [PDF] Abstract
5. Examining the State-of-the-Art in News Timeline Summarization [PDF] Abstract
6. A Large-Scale Multi-Document Summarization Dataset from the Wikipedia Current Events Portal [PDF] Abstract
7. Enhancing Word Embeddings with Knowledge Extracted from Lexical Resources [PDF] Abstract
8. Leveraging Graph to Improve Abstractive Multi-Document Summarization [PDF] Abstract
9. GM-CTSC at SemEval-2020 Task 1: Gaussian Mixtures Cross Temporal Similarity Clustering [PDF] Abstract
10. Positive emotions help rank negative reviews in e-commerce [PDF] Abstract
11. Modelling Grocery Retail Topic Distributions: Evaluation, Interpretability and Stability [PDF] Abstract
12. A Comparison of Label-Synchronous and Frame-Synchronous End-to-End Models for Speech Recognition [PDF] Abstract
13. Investigation of Large-Margin Softmax in Neural Language Modeling [PDF] Abstract
14. On embedding Lambek calculus into commutative categorial grammars [PDF] Abstract
15. Early Stage LM Integration Using Local and Global Log-Linear Combination [PDF] Abstract
16. Creative Artificial Intelligence -- Algorithms vs. humans in an incentivized writing competition [PDF] Abstract
17. Relative Positional Encoding for Speech Recognition and Direct Translation [PDF] Abstract
18. End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors [PDF] Abstract
19. A Further Study of Unsupervised Pre-training for Transformer Based Speech Recognition [PDF] Abstract
20. PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR [PDF] Abstract
21. A Computational Analysis of Polarization on Indian and Pakistani Social Media [PDF] Abstract
22. GLEAKE: Global and Local Embedding Automatic Keyphrase Extraction [PDF] Abstract
23. Exploring Transformers for Large-Scale Speech Recognition [PDF] Abstract
24. Embeddings-Based Clustering for Target Specific Stances: The Case of a Polarized Turkey [PDF] Abstract

Abstracts

1. Sentence level estimation of psycholinguistic norms using joint multidimensional annotations [PDF] Back to contents
  Anil Ramakrishna, Shrikanth Narayanan
Abstract: Psycholinguistic normatives represent various affective and mental constructs using numeric scores and are used in a variety of applications in natural language processing. They are commonly used at the sentence level, the scores of which are estimated by extrapolating word level scores using simple aggregation strategies, which may not always be optimal. In this work, we present a novel approach to estimate the psycholinguistic norms at sentence level. We apply a multidimensional annotation fusion model on annotations at the word level to estimate a parameter which captures relationships between different norms. We then use this parameter at sentence level to estimate the norms. We evaluate our approach by predicting sentence level scores for various normative dimensions and compare with standard word aggregation schemes.
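
As a concrete reference point, here is a minimal sketch of the simple word-aggregation baseline the abstract argues against: the sentence-level norm is taken as the plain mean of word-level lexicon scores. The lexicon and its values are toy stand-ins, not a real psycholinguistic resource.

```python
# Toy word-level norm lexicon; real scores would come from a
# psycholinguistic resource, not these made-up values.
arousal_lexicon = {"storm": 0.81, "calm": 0.12, "sea": 0.35}

def sentence_norm(sentence, lexicon):
    """Mean-aggregation baseline: average the word-level scores found in the lexicon."""
    scores = [lexicon[w] for w in sentence.lower().split() if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0

print(sentence_norm("The calm sea before the storm", arousal_lexicon))  # ~0.427
```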

2. BlaBla: Linguistic Feature Extraction for Clinical Analysis in Multiple Languages [PDF] Back to contents
  Abhishek Shivkumar, Jack Weston, Raphael Lenain, Emil Fristed
Abstract: We introduce BlaBla, an open-source Python library for extracting linguistic features with proven clinical relevance to neurological and psychiatric diseases across many languages. BlaBla is a unifying framework for accelerating and simplifying clinical linguistic research. The library is built on state-of-the-art NLP frameworks and supports multithreaded/GPU-enabled feature extraction via both native Python calls and a command line interface. We describe BlaBla's architecture and the clinical validation of its features across 12 diseases. We further demonstrate the application of BlaBla to the task of visualizing and classifying language disorders in three languages, on real clinical data from the AphasiaBank dataset. We make the codebase freely available to researchers with the hope of providing a consistent, well-validated foundation for the next generation of clinical linguistic research.

3. Applying the Transformer to Character-level Transduction [PDF] Back to contents
  Shijie Wu, Ryan Cotterell, Mans Hulden
Abstract: The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks. The model offers other benefits as well: It trains faster and has fewer parameters. Yet for character-level transduction tasks, e.g. morphological inflection generation and historical text normalization, few have shown success in outperforming recurrent models with the transformer. In an empirical study, we uncover that, in contrast to recurrent sequence-to-sequence models, the batch size plays a crucial role in the performance of the transformer on character-level tasks, and we show that with a large enough batch size, the transformer does indeed outperform recurrent models. We also introduce a simple technique to handle feature-guided character-level transduction that further improves performance. With these insights, we achieve state-of-the-art performance on morphological inflection and historical text normalization. We also show that the transformer outperforms a strong baseline on two other character-level transduction tasks: grapheme-to-phoneme conversion and transliteration. Code is available at this https URL.

4. BERTweet: A pre-trained language model for English Tweets [PDF] Back to contents
  Dat Quoc Nguyen, Thanh Vu, Anh Tuan Nguyen
Abstract: We present BERTweet, the first public large-scale pre-trained language model for English Tweets. Our BERTweet is trained using the RoBERTa pre-training procedure (Liu et al., 2019), with the same model configuration as BERT-base (Devlin et al., 2019). Experiments show that BERTweet outperforms strong baselines RoBERTa-base and XLM-R-base (Conneau et al., 2020), producing better performance results than the previous state-of-the-art models on three Tweet NLP tasks: Part-of-speech tagging, Named-entity recognition and text classification. We release BERTweet to facilitate future research and downstream applications on Tweet data. Our BERTweet is available at: this https URL
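
A short usage sketch for encoding a tweet with the released model, assuming the checkpoint is loaded from the HuggingFace model hub under the name vinai/bertweet-base via the transformers library:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base")
model = AutoModel.from_pretrained("vinai/bertweet-base")

# BERTweet expects tweets normalized the way it was pre-trained:
# user mentions and URLs replaced by special tokens.
tweet = "SC has first two presumptive cases of coronavirus , DHEC confirms HTTPURL via @USER :cry:"

inputs = tokenizer(tweet, return_tensors="pt")
with torch.no_grad():
    features = model(**inputs).last_hidden_state  # (1, seq_len, 768)
print(features.shape)
```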

5. Examining the State-of-the-Art in News Timeline Summarization [PDF] Back to contents
  Demian Gholipour Ghalandari, Georgiana Ifrim
Abstract: Previous work on automatic news timeline summarization (TLS) leaves an unclear picture about how this task can generally be approached and how well it is currently solved. This is mostly due to the focus on individual subtasks, such as date selection and date summarization, and to the previous lack of appropriate evaluation metrics for the full TLS task. In this paper, we compare different TLS strategies using appropriate evaluation frameworks, and propose a simple and effective combination of methods that improves over the state-of-the-art on all tested benchmarks. For a more robust evaluation, we also present a new TLS dataset, which is larger and spans longer time periods than previous datasets.

6. A Large-Scale Multi-Document Summarization Dataset from the Wikipedia Current Events Portal [PDF] Back to contents
  Demian Gholipour Ghalandari, Chris Hokamp, Nghia The Pham, John Glover, Georgiana Ifrim
Abstract: Multi-document summarization (MDS) aims to compress the content in large document collections into short summaries and has important applications in story clustering for newsfeeds, presentation of search results, and timeline generation. However, there is a lack of datasets that realistically address such use cases at a scale large enough for training supervised models for this task. This work presents a new dataset for MDS that is large both in the total number of document clusters and in the size of individual clusters. We build this dataset by leveraging the Wikipedia Current Events Portal (WCEP), which provides concise and neutral human-written summaries of news events, with links to external source articles. We also automatically extend these source articles by looking for related articles in the Common Crawl archive. We provide a quantitative analysis of the dataset and empirical results for several state-of-the-art MDS techniques.

7. Enhancing Word Embeddings with Knowledge Extracted from Lexical Resources [PDF] Back to contents
  Magdalena Biesialska, Bardia Rafieian, Marta R. Costa-jussà
Abstract: In this work, we present an effective method for semantic specialization of word vector representations. To this end, we use traditional word embeddings and apply specialization methods to better capture semantic relations between words. In our approach, we leverage external knowledge from rich lexical resources such as BabelNet. We also show that our proposed post-specialization method based on an adversarial neural network with the Wasserstein distance allows to gain improvements over state-of-the-art methods on two tasks: word similarity and dialog state tracking.

8. Leveraging Graph to Improve Abstractive Multi-Document Summarization [PDF] Back to contents
  Wei Li, Xinyan Xiao, Jiachen Liu, Hua Wu, Haifeng Wang, Junping Du
Abstract: Graphs that capture relations between textual units have great benefits for detecting salient information from multiple documents and generating overall coherent summaries. In this paper, we develop a neural abstractive multi-document summarization (MDS) model which can leverage well-known graph representations of documents such as similarity graph and discourse graph, to more effectively process multiple input documents and produce abstractive summaries. Our model utilizes graphs to encode documents in order to capture cross-document relations, which is crucial to summarizing long documents. Our model can also take advantage of graphs to guide the summary generation process, which is beneficial for generating coherent and concise summaries. Furthermore, pre-trained language models can be easily combined with our model, which further improve the summarization performance significantly. Empirical results on the WikiSum and MultiNews dataset show that the proposed architecture brings substantial improvements over several strong baselines.

9. GM-CTSC at SemEval-2020 Task 1: Gaussian Mixtures Cross Temporal Similarity Clustering [PDF] Back to contents
  Pierluigi Cassotti, Annalina Caputo, Marco Polignano, Pierpaolo Basile
Abstract: This paper describes the system proposed for the SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection. We focused our approach on the detection problem. Given the semantics of words captured by temporal word embeddings in different time periods, we investigate the use of unsupervised methods to detect when the target word has gained or lost senses. To this end, we defined a new algorithm based on Gaussian Mixture Models to cluster the target similarities computed over the two periods. We compared the proposed approach with a number of similarity-based thresholds. We found that, although the performance of the detection methods varies across the word embedding algorithms, the combination of Gaussian Mixture with Temporal Referencing resulted in our best system.
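
A hedged sketch of the central idea: fit a two-component Gaussian mixture over cross-temporal similarities and read membership in the low-similarity component as a sense-change signal. The similarity values below are toy data; the paper's exact features and clustering setup may differ.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Cosine similarities between each target word's embeddings in period 1 vs period 2.
sims = np.concatenate([rng.normal(0.85, 0.05, 80),   # stable words (toy)
                       rng.normal(0.35, 0.10, 20)])  # changed words (toy)

gmm = GaussianMixture(n_components=2, random_state=0).fit(sims.reshape(-1, 1))
low_comp = int(np.argmin(gmm.means_))            # component with the lower mean similarity
changed = gmm.predict(sims.reshape(-1, 1)) == low_comp
print(f"{changed.sum()} of {len(sims)} words flagged as changed")
```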

10. Positive emotions help rank negative reviews in e-commerce [PDF] Back to contents
  Di Weng, Jichang Zhao
Abstract: Negative reviews, the poor ratings given in post-purchase evaluation, play an indispensable role in e-commerce, especially in shaping future sales and firm equities. However, extant studies seldom examine their potential value to sellers and producers for enhancing their capability to provide better services and products. Prior work that exploited the helpfulness of reviews developed ranking approaches for customers rather than for e-commerce keepers. To fill this gap, the ranking method in this study combines description texts and emotion polarities to surface the most helpful negative reviews under a given product attribute for online sellers and producers. For a more reasonable evaluation procedure, experts with related backgrounds were hired to vote on the ranking approaches. Our ranking method turns out to be more reliable for ranking negative reviews for sellers and producers, outperforming baselines such as BM25 by 8%. We also enrich previous understandings of the role of emotions in valuing reviews. Specifically, we find, surprisingly, that positive emotions are more helpful than negative emotions in ranking negative reviews. This unexpected strength of positive emotions in ranking suggests that less polarized reviews of negative experiences in fact offer more rational feedback, and thus more helpfulness, to sellers and producers. The presented ranking method could provide e-commerce practitioners with an efficient and effective way to leverage negative reviews from online consumers.

11. Modelling Grocery Retail Topic Distributions: Evaluation, Interpretability and Stability [PDF] Back to contents
  Mariflor Vega-Carrasco, Jason O'sullivan, Rosie Prior, Ioanna Manolopoulou, Mirco Musolesi
Abstract: Understanding the shopping motivations behind market baskets has high commercial value in the grocery retail industry. Analyzing shopping transactions demands techniques that can cope with the volume and dimensionality of grocery transactional data while keeping interpretable outcomes. Latent Dirichlet Allocation (LDA) provides a suitable framework to process grocery transactions and to discover a broad representation of customers' shopping motivations. However, summarizing the posterior distribution of an LDA model is challenging, while individual LDA draws may not be coherent and cannot capture topic uncertainty. Moreover, the evaluation of LDA models is dominated by model-fit measures which may not adequately capture the qualitative aspects such as interpretability and stability of topics. In this paper, we introduce clustering methodology that post-processes posterior LDA draws to summarise the entire posterior distribution and identify semantic modes represented as recurrent topics. Our approach is an alternative to standard label-switching techniques and provides a single posterior summary set of topics, as well as associated measures of uncertainty. Furthermore, we establish a more holistic definition for model evaluation, which assesses topic models based not only on their likelihood but also on their coherence, distinctiveness and stability. By means of a survey, we set thresholds for the interpretation of topic coherence and topic similarity in the domain of grocery retail data. We demonstrate that the selection of recurrent topics through our clustering methodology not only improves model likelihood but also outperforms the qualitative aspects of LDA such as interpretability and stability. We illustrate our methods on an example from a large UK supermarket chain.

12. A Comparison of Label-Synchronous and Frame-Synchronous End-to-End Models for Speech Recognition [PDF] Back to contents
  Linhao Dong, Cheng Yi, Jianzong Wang, Shiyu Zhou, Shuang Xu, Xueli Jia, Bo Xu
Abstract: End-to-end models are gaining wider attention in the field of automatic speech recognition (ASR). One of their advantages is the simplicity of building a system that directly maps the speech frame sequence to the text label sequence with neural networks. Depending on which end drives the recognition process, end-to-end ASR models can be categorized into two types: label-synchronous and frame-synchronous, each of which has unique model behaviour and characteristics. In this work, we make a detailed comparison between a representative label-synchronous model (the transformer) and a soft frame-synchronous model (a continuous integrate-and-fire (CIF) based model). Results on three public datasets and a large-scale dataset with 12000 hours of training data show that the two types of models have respective advantages that are consistent with their synchronous modes.
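
For readers unfamiliar with CIF, a minimal sketch of the integrate-and-fire step that makes the model "softly" frame-synchronous: per-frame weights are accumulated, and a label-synchronous state fires whenever the accumulator crosses a threshold. Weight prediction, batching, and training details are omitted here.

```python
import torch

def cif(encoder_states, alphas, threshold=1.0):
    """encoder_states: (T, D); alphas: (T,) with values in [0, 1]."""
    fired, acc_w = [], 0.0
    acc_state = torch.zeros(encoder_states.size(1))
    for h_t, a_t in zip(encoder_states, alphas):
        a = a_t.item()
        if acc_w + a < threshold:              # keep integrating this label
            acc_w += a
            acc_state = acc_state + a * h_t
        else:                                  # boundary reached: fire one label state
            part = threshold - acc_w
            fired.append(acc_state + part * h_t)
            acc_w = a - part                   # leftover weight opens the next label
            acc_state = (a - part) * h_t
    return torch.stack(fired) if fired else torch.empty(0, encoder_states.size(1))

states, alphas = torch.randn(20, 8), torch.rand(20) * 0.5
print(cif(states, alphas).shape)               # roughly sum(alphas) fired label states
```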

13. Investigation of Large-Margin Softmax in Neural Language Modeling [PDF] Back to contents
  Jingjing Huo, Yingbo Gao, Weiyue Wang, Ralf Schlüter, Hermann Ney
Abstract: To encourage intra-class compactness and inter-class separability among trainable feature vectors, large-margin softmax methods have been developed and widely applied in the face recognition community. The introduction of the large-margin concept into the softmax is reported to have good properties such as enhanced discriminative power, less overfitting and well-defined geometric intuitions. Nowadays, language modeling is commonly approached with neural networks using softmax and cross entropy. In this work, we are curious to see if introducing large margins to neural language models would improve the perplexity and consequently the word error rate in automatic speech recognition. Specifically, we first implement and test various types of conventional margins following previous works in face recognition. To address the distribution of natural language data, we then compare different strategies for word vector norm-scaling. After that, we apply the best norm-scaling setup in combination with various margins and conduct neural language model rescoring experiments in automatic speech recognition. We find that although perplexity is slightly deteriorated, neural language models with large-margin softmax can yield word error rates similar to that of the standard softmax baseline. Finally, the expected margins are analyzed through visualization of word vectors, showing that the syntactic and semantic relationships are also preserved.
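
As one example of the conventional margins mentioned above, here is a hedged sketch of an additive-margin (AM) softmax loss: the margin m is subtracted from the target-class cosine logit before scaling. The scale s and margin m below are illustrative, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def am_softmax_loss(features, weights, targets, s=30.0, m=0.35):
    """features: (B, D); weights: (V, D); targets: (B,)."""
    # Cosine logits between L2-normalized features and class weight vectors.
    logits = F.normalize(features) @ F.normalize(weights).t()   # (B, V)
    margin = torch.zeros_like(logits)
    margin.scatter_(1, targets.unsqueeze(1), m)                 # margin only on the target class
    return F.cross_entropy(s * (logits - margin), targets)

feats, W = torch.randn(4, 16), torch.randn(100, 16)
y = torch.randint(0, 100, (4,))
print(am_softmax_loss(feats, W, y))
```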

14. On embedding Lambek calculus into commutative categorial grammars [PDF] Back to contents
  Sergey Slavnov
Abstract: Abstract categorial grammars (ACG), as well as some other, closely related systems, are based on the ordinary, commutative implicational linear logic and linear $\lambda$-calculus in contrast to the better known "noncommutative" Lambek grammars and their variations. ACG seem attractive in many ways, not the least of which is the simplicity of the underlying logic. Yet it is known that ACG and their relatives behave poorly in modeling many natural language phenomena (such as, for example, coordination) compared to "noncommutative" formalisms. Therefore different solutions have been proposed in order to enrich ACG with noncommutative constructions. Tensor grammars of this work are another example of "commutative" grammars, based on the classical, rather than intuitionistic linear logic. They can be seen as a surface representation of ACG in the sense that derivations of ACG translate to derivations of tensor grammars and this translation is isomorphic on the level of string languages. An advantage of this representation, as it seems to us, is that the syntax becomes extremely simple and a direct geometric meaning is transparent. We address the problem of encoding noncommutative operations in our setting. This turns out possible after enriching the system with new unary operators. The resulting system allows representing both ACG and Lambek grammars as conservative fragments, while the formalism remains, as it seems to us, rather simple and intuitive.

15. Early Stage LM Integration Using Local and Global Log-Linear Combination [PDF] Back to contents
  Wilfried Michel, Ralf Schlüter, Hermann Ney
Abstract: Sequence-to-sequence models with an implicit alignment mechanism (e.g. attention) are closing the performance gap towards traditional hybrid hidden Markov models (HMM) for the task of automatic speech recognition. One important factor to improve word error rate in both cases is the use of an external language model (LM) trained on large text-only corpora. Language model integration is straightforward with the clear separation of acoustic model and language model in classical HMM-based modeling. In contrast, multiple integration schemes have been proposed for attention models. In this work, we present a novel method for language model integration into implicit-alignment based sequence-to-sequence models. Log-linear model combination of acoustic and language model is performed with a per-token renormalization. This allows us to compute the full normalization term efficiently both in training and in testing. This is compared to a global renormalization scheme which is equivalent to applying shallow fusion in training. The proposed methods show good improvements over standard model combination (shallow fusion) on our state-of-the-art Librispeech system. Furthermore, the improvements are persistent even if the LM is exchanged for a more powerful one after training.
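
A hedged sketch of the per-token log-linear combination described above: acoustic and language model scores are combined in log space and renormalized over the vocabulary at every output position, which keeps the normalization term computable in training as well as decoding. The interpolation weight and exact parameterization here are illustrative.

```python
import torch

def log_linear_combine(asr_log_probs, lm_log_probs, lam=0.3):
    """Both inputs: (B, T, V) log-probabilities. Returns per-token renormalized log-probs."""
    combined = (1.0 - lam) * asr_log_probs + lam * lm_log_probs
    return combined - combined.logsumexp(dim=-1, keepdim=True)  # per-token renormalization

asr = torch.log_softmax(torch.randn(2, 5, 1000), dim=-1)
lm = torch.log_softmax(torch.randn(2, 5, 1000), dim=-1)
out = log_linear_combine(asr, lm)
print(out.exp().sum(-1))   # every position sums to 1 again
```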

16. Creative Artificial Intelligence -- Algorithms vs. humans in an incentivized writing competition [PDF] Back to contents
  Nils Köbis, Luca Mossink
Abstract: The release of openly available, robust text generation algorithms has spurred much public attention and debate, due to the algorithms' purported ability to generate human-like text across various domains. Yet, empirical evidence using incentivized tasks to assess human behavioral reactions to such algorithms is lacking. We conducted two experiments assessing behavioral reactions to the state-of-the-art Natural Language Generation algorithm GPT-2 (Ntotal = 830). Using the identical starting lines of human poems, GPT-2 produced samples of multiple algorithmically-generated poems. From these samples, either a random poem was chosen (Human-out-of-the-loop) or the best one was selected (Human-in-the-loop) and in turn matched with a human-written poem. Taking part in a new incentivized version of the Turing Test, participants failed to reliably detect the algorithmically-generated poems in the Human-in-the-loop treatment, yet succeeded in the Human-out-of-the-loop treatment. Further, the results reveal a general aversion towards algorithmic poetry, independent of whether participants were informed about the algorithmic origin of the poem (Transparency) or not (Opacity). We discuss what these results convey about the ability of NLG algorithms to produce human-like text and propose methodologies to study such learning algorithms in experimental settings.

17. Relative Positional Encoding for Speech Recognition and Direct Translation [PDF] Back to contents
  Ngoc-Quan Pham, Thanh-Le Ha, Tuan-Nam Nguyen, Thai-Son Nguyen, Elizabeth Salesky, Sebastian Stueker, Jan Niehues, Alexander Waibel
Abstract: Transformer models are powerful sequence-to-sequence architectures that are capable of directly mapping speech inputs to transcriptions or translations. However, the mechanism for modeling positions in this model was tailored for text modeling, and thus is less ideal for acoustic inputs. In this work, we adapt the relative position encoding scheme to the Speech Transformer, where the key addition is relative distance between input states in the self-attention network. As a result, the network can better adapt to the variable distributions present in speech data. Our experiments show that our resulting model achieves the best recognition result on the Switchboard benchmark in the non-augmentation condition, and the best published result in the MuST-C speech translation benchmark. We also show that this model is able to better utilize synthetic data than the Transformer, and adapts better to variable sentence segmentation quality for speech translation.
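
A hedged sketch of one common way to inject relative positions: a learned bias, indexed by the key-query offset, is added to the attention logits. The paper adapts a Transformer-XL-style scheme whose exact parameterization differs from this simplified version.

```python
import torch
import torch.nn as nn

class RelPosAttentionBias(nn.Module):
    def __init__(self, max_dist: int = 128):
        super().__init__()
        self.max_dist = max_dist
        # One learned scalar bias per clipped relative offset.
        self.bias = nn.Embedding(2 * max_dist + 1, 1)

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        """scores: (B, H, T, T) raw attention logits."""
        T = scores.size(-1)
        pos = torch.arange(T)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_dist, self.max_dist)
        return scores + self.bias(rel + self.max_dist).squeeze(-1)  # add (T, T) bias

scores = torch.randn(2, 4, 10, 10)
print(RelPosAttentionBias()(scores).shape)
```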

18. End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors [PDF] Back to contents
  Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Kenji Nagamatsu
Abstract: End-to-end speaker diarization for an unknown number of speakers is addressed in this paper. Recently proposed end-to-end speaker diarization outperformed conventional clustering-based speaker diarization, but it has one drawback: it is less flexible in terms of the number of speakers. This paper proposes a method for encoder-decoder based attractor calculation (EDA), which first generates a flexible number of attractors from a speech embedding sequence. Then, the generated multiple attractors are multiplied by the speech embedding sequence to produce the same number of speaker activities. The speech embedding sequence is extracted using the conventional self-attentive end-to-end neural speaker diarization (SA-EEND) network. In a two-speaker condition, our method achieved a 2.69 % diarization error rate (DER) on simulated mixtures and a 8.07 % DER on the two-speaker subset of CALLHOME, while vanilla SA-EEND attained 4.56 % and 9.54 %, respectively. In unknown numbers of speakers conditions, our method attained a 15.29 % DER on CALLHOME, while the x-vector-based clustering method achieved a 19.43 % DER.
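
A shape-level sketch of the attractor step described in the abstract: the S attractors produced by the encoder-decoder are multiplied with the (T, D) embedding sequence to yield T x S frame-wise speaker activities. Attractor generation and the attractor-existence stopping criterion are omitted.

```python
import torch

T, D, S = 500, 256, 3               # frames, embedding dim, estimated speakers
embeddings = torch.randn(T, D)      # stand-in for the SA-EEND encoder output
attractors = torch.randn(S, D)      # stand-in for the encoder-decoder (EDA) output

activities = torch.sigmoid(embeddings @ attractors.t())   # (T, S), values in [0, 1]
speaking = activities > 0.5         # frame-wise, multi-label speaker decisions
print(activities.shape, speaking.float().mean().item())
```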

19. A Further Study of Unsupervised Pre-training for Transformer Based Speech Recognition [PDF] Back to contents
  Dongwei Jiang, Wubo Li, Ruixiong Zhang, Miao Cao, Ne Luo, Yang Han, Wei Zou, Xiangang Li
Abstract: Building a good speech recognition system usually requires large amounts of transcribed data, which is expensive to collect. To tackle this problem, many unsupervised pre-training methods have been proposed. Among these methods, Masked Predictive Coding achieved significant improvements on various speech recognition datasets with BERT-like Masked Reconstruction loss and Transformer backbone. However, many aspects of MPC have not been fully investigated. In this paper, we conduct a further study on MPC and focus on three important aspects: the effect of the pre-training data's speaking style, its extension to streaming models, and how to better transfer learned knowledge from the pre-training stage to downstream tasks. Experiments revealed that pre-training data with a matching speaking style is more useful for downstream recognition tasks. A unified training objective with APC and MPC provided an 8.46% relative error reduction on a streaming model trained on HKUST. Also, the combination of target data adaptation and layer-wise discriminative training helped the knowledge transfer of MPC, which achieved a 3.99% relative error reduction on AISHELL over a strong baseline.
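
A hedged sketch of an MPC-style objective: spans of input frames are masked and the encoder reconstructs them, with the loss computed only at masked positions. The toy frame-wise encoder below stands in for the Transformer backbone, and the masking strategy is simplified.

```python
import torch
import torch.nn as nn

def mpc_loss(frames: torch.Tensor, encoder: nn.Module, mask_prob: float = 0.15):
    """frames: (B, T, F), e.g. log-mel features."""
    mask = torch.rand(frames.shape[:2]) < mask_prob          # (B, T) frame mask
    corrupted = frames.masked_fill(mask.unsqueeze(-1), 0.0)  # zero out masked frames
    reconstructed = encoder(corrupted)                       # (B, T, F)
    return (reconstructed - frames).abs()[mask].mean()       # L1 on masked frames only

toy_encoder = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 80))
x = torch.randn(4, 200, 80)
print(mpc_loss(x, toy_encoder))
```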

20. PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR [PDF] Back to contents
  Yiwen Shao, Yiming Wang, Daniel Povey, Sanjeev Khudanpur
Abstract: We present PyChain, a fully parallelized PyTorch implementation of end-to-end lattice-free maximum mutual information (LF-MMI) training for the so-called "chain models" in the Kaldi automatic speech recognition (ASR) toolkit. Unlike other PyTorch and Kaldi based ASR toolkits, PyChain is designed to be as flexible and light-weight as possible so that it can be easily plugged into new ASR projects, or other existing PyTorch-based ASR tools, as exemplified respectively by a new project PyChain-example, and Espresso, an existing end-to-end ASR toolkit. PyChain's efficiency and flexibility is demonstrated through such novel features as full GPU training on numerator/denominator graphs, and support for unequal length sequences. Experiments on the WSJ dataset show that with simple neural networks and commonly used machine learning techniques, PyChain can achieve competitive results that are comparable to Kaldi and better than other end-to-end ASR systems.
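
For reference, the LF-MMI objective that such training computes, written in the usual notation (the notation is assumed here, not quoted from the paper): X_u is the feature sequence of utterance u, G_num^(u) its numerator graph built from the reference transcription, and G_den the shared denominator graph, a phone-level language model over all competing sequences.

```latex
\mathcal{F}_{\mathrm{LF\text{-}MMI}}
  = \sum_{u} \log
    \frac{p\bigl(\mathbf{X}_u \mid \mathbb{G}_{\mathrm{num}}^{(u)}\bigr)}
         {p\bigl(\mathbf{X}_u \mid \mathbb{G}_{\mathrm{den}}\bigr)}
```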

21. A Computational Analysis of Polarization on Indian and Pakistani Social Media [PDF] Back to contents
  Aman Tyagi, Anjalie Field, Priyank Lathwal, Yulia Tsvetkov, Kathleen M. Carley
Abstract: Between February 14, 2019 and March 4, 2019, a terrorist attack in Pulwama, Kashmir followed by retaliatory air strikes led to rising tensions between India and Pakistan, two nuclear-armed countries. In this work, we examine polarizing messaging on Twitter during these events, particularly focusing on the positions of Indian and Pakistani politicians. We use a label propagation technique focused on hashtag co-occurrences to find polarizing tweets and users. Our analysis reveals that politicians in the ruling political party in India (BJP) used polarized hashtags and called for escalation of conflict more so than politicians from other parties. Our work offers the first analysis of how escalating tensions between India and Pakistan manifest on Twitter and provides a framework for studying polarizing messages.

22. GLEAKE: Global and Local Embedding Automatic Keyphrase Extraction [PDF] Back to contents
  Javad Rafiei Asl, Juan M. Banda
Abstract: Automated methods for granular categorization of large corpora of text documents have become increasingly important given the rate at which scientific, news, medical, and web documents have grown in the last few years. Automatic keyphrase extraction (AKE) aims to automatically detect a small set of single- or multi-word phrases from within a single textual document that capture the main topics of the document. AKE plays an important role in various NLP and information retrieval tasks such as document summarization and categorization, full-text indexing, and article recommendation. Due to the lack of sufficient human-labeled data in different textual contents, supervised learning approaches are not ideal for automatic detection of keyphrases from the content of textual bodies. With the state-of-the-art advances in text embedding techniques, NLP researchers have focused on developing unsupervised methods to obtain meaningful insights from raw datasets. In this work, we introduce Global and Local Embedding Automatic Keyphrase Extractor (GLEAKE) for the task of AKE. GLEAKE utilizes single- and multi-word embedding techniques to explore the syntactic and semantic aspects of candidate phrases and then combines them into a series of embedding-based graphs. Moreover, GLEAKE applies network analysis techniques on each embedding-based graph to refine the most significant phrases as a final set of keyphrases. We demonstrate the high performance of GLEAKE by evaluating its results on five standard AKE datasets from different domains and writing styles and by showing its superiority with regard to other state-of-the-art methods.

23. Exploring Transformers for Large-Scale Speech Recognition [PDF] Back to contents
  Liang Lu, Changliang Liu, Jinyu Li, Yifan Gong
Abstract: While recurrent neural networks still largely define state-of-the-art speech recognition systems, the Transformer network has been proven to be a competitive alternative, especially in the offline condition. Most studies with Transformers have been constrained to a relatively small scale setting, and some form of data augmentation is usually applied to combat the data sparsity issue. In this paper, we aim at understanding the behaviors of Transformers in the large-scale speech recognition setting, where we have used around 65,000 hours of training data. We investigated various aspects of scaling up Transformers, including model initialization, warmup training as well as different Layer Normalization strategies. In the streaming condition, we compared the widely used attention-mask-based future-context lookahead approach to the Transformer-XL network. From our experiments, we show that Transformers can achieve around 6% relative word error rate (WER) reduction compared to the BLSTM baseline in the offline fashion, while in the streaming fashion, Transformer-XL is comparable to LC-BLSTM with an 800 millisecond latency constraint.

24. Embeddings-Based Clustering for Target Specific Stances: The Case of a Polarized Turkey [PDF] Back to contents
  Ammar Rashed, Mucahid Kutlu, Kareem Darwish, Tamer Elsayed, Cansın Bayrak
Abstract: On June 24, 2018, Turkey conducted a highly consequential election in which the Turkish people elected their president and parliament in the first election under a new presidential system. During the election period, the Turkish people extensively shared their political opinions on Twitter. One aspect of polarization among the electorate was support for or opposition to the reelection of Recep Tayyip Erdoğan. In this paper, we present an unsupervised method for target-specific stance detection in a polarized setting, specifically Turkish politics, achieving 90% precision in identifying user stances, while maintaining more than 80% recall. The method involves representing users in an embedding space using Google's Convolutional Neural Network (CNN) based multilingual universal sentence encoder. The representations are then projected onto a lower dimensional space in a manner that reflects similarities and are consequently clustered. We show the effectiveness of our method in properly clustering users of divergent groups across multiple targets that include political figures, different groups, and parties. We perform our analysis on a large dataset of 108M Turkish election-related tweets along with the timeline tweets of 168k Turkish users, who authored 213M tweets. Given the resultant user stances, we are able to observe correlations between topics and compute topic polarization.
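
A hedged sketch of the clustering pipeline described above: sentence-level embeddings are projected to a lower-dimensional space and clustered, with clusters read as stance groups. Random vectors stand in for the multilingual universal sentence encoder outputs, and PCA plus k-means stand in for whatever projection and clustering the paper actually uses.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Stand-in for per-user embeddings from a multilingual sentence encoder.
user_embeddings = np.random.rand(1000, 512)

projected = PCA(n_components=10).fit_transform(user_embeddings)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(projected)
print(np.bincount(labels))   # sizes of the two stance clusters (e.g. pro vs. anti)
```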
