Table of Contents
1. Uncertainty over Uncertainty: Investigating the Assumptions, Annotations, and Text Measurements of Economic Policy Uncertainty [PDF] Abstract
2. Recursive Top-Down Production for Sentence Generation with Latent Trees [PDF] Abstract
3. Learning Context-Free Languages with Nondeterministic Stack RNNs [PDF] Abstract
4. Scaling Systematic Literature Reviews with Machine Learning Pipelines [PDF] Abstract
5. Case Study: Deontological Ethics in NLP [PDF] Abstract
6. LSTMs Compose (and Learn) Bottom-Up [PDF] Abstract
7. High-order Semantic Role Labeling [PDF] Abstract
8. Grid Tagging Scheme for Aspect-oriented Fine-grained Opinion Extraction [PDF] Abstract
9. Recurrent babbling: evaluating the acquisition of grammar from limited input data [PDF] Abstract
10. Examining the Ordering of Rhetorical Strategies in Persuasive Requests [PDF] Abstract
11. Mark-Evaluate: Assessing Language Generation using Population Estimation Methods [PDF] Abstract
12. Denoising Multi-Source Weak Supervision for Neural Text Classification [PDF] Abstract
13. HENIN: Learning Heterogeneous Neural Interaction Networks for Explainable Cyberbullying Detection on Social Media [PDF] Abstract
14. Toxic Language Detection in Social Media for Brazilian Portuguese: New Dataset and Multilingual Analysis [PDF] Abstract
15. Measuring What Counts: The case of Rumour Stance Classification [PDF] Abstract
16. What Have We Achieved on Text Summarization? [PDF] Abstract
17. Online Back-Parsing for AMR-to-Text Generation [PDF] Abstract
18. Self-Paced Learning for Neural Machine Translation [PDF] Abstract
19. Top-Rank-Focused Adaptive Vote Collection for the Evaluation of Domain-Specific Semantic Models [PDF] Abstract
20. Word Level Language Identification in English Telugu Code Mixed Data [PDF] Abstract
21. MLQE-PE: A Multilingual Quality Estimation and Post-Editing Dataset [PDF] Abstract
22. Multichannel Generative Language Model: Learning All Possible Factorizations Within and Across Channels [PDF] Abstract
24. gundapusunil at SemEval-2020 Task 9: Syntactic Semantic LSTM Architecture for SENTIment Analysis of Code-MIXed Data [PDF] Abstract
26. Graphing Contributions in Natural Language Processing Research: Intra-Annotator Agreement on a Trial Dataset [PDF] Abstract
33. Style Attuned Pre-training and Parameter Efficient Fine-tuning for Spoken Language Understanding [PDF] Abstract
35. NutCracker at WNUT-2020 Task 2: Robustly Identifying Informative COVID-19 Tweets using Ensembling and Adversarial Training [PDF] Abstract
37. Dynamic Context Selection for Document-level Neural Machine Translation via Reinforcement Learning [PDF] Abstract
40. Learning to Evaluate Translation Beyond English: BLEURT Submissions to the WMT Metrics 2020 Shared Task [PDF] Abstract
46. Evaluating the Effectiveness of Efficient Neural Architecture Search for Sentence-Pair Tasks [PDF] Abstract
50. Causal Feature Selection with Dimension Reduction for Interpretable Text Classification [PDF] Abstract
51. The NU Voice Conversion System for the Voice Conversion Challenge 2020: On the Effectiveness of Sequence-to-sequence Models and Autoregressive Neural Vocoders [PDF] Abstract
52. Baseline System of Voice Conversion Challenge 2020 with Cyclic Variational Autoencoder and Parallel WaveGAN [PDF] Abstract
55. Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling [PDF] Abstract
56. Widget Captioning: Generating Natural Language Description for Mobile User Interface Elements [PDF] Abstract
Abstracts
1. Uncertainty over Uncertainty: Investigating the Assumptions, Annotations, and Text Measurements of Economic Policy Uncertainty [PDF] Back to Contents
Katherine A. Keith, Christoph Teichmann, Brendan O'Connor, Edgar Meij
Abstract: Methods and applications are inextricably linked in science, and in particular in the domain of text-as-data. In this paper, we examine one such text-as-data application, an established economic index that measures economic policy uncertainty from keyword occurrences in news. This index, which is shown to correlate with firm investment, employment, and excess market returns, has had substantive impact in both the private sector and academia. Yet, as we revisit and extend the original authors' annotations and text measurements we find interesting text-as-data methodological research questions: (1) Are annotator disagreements a reflection of ambiguity in language? (2) Do alternative text measurements correlate with one another and with measures of external predictive validity? We find for this application (1) some annotator disagreements of economic policy uncertainty can be attributed to ambiguity in language, and (2) switching measurements from keyword-matching to supervised machine learning classifiers results in low correlation, a concerning implication for the validity of the index.
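A minimal sketch of the comparison at issue: one index built from keyword matching and one from classifier labels, aggregated per month and then correlated. The keyword list, articles, and both sets of labels below are placeholders, not the original study's data or code.

```python
# Hypothetical sketch: comparing a keyword-based EPU measurement with a
# classifier-based one by correlating their monthly aggregates.
import numpy as np

KEYWORDS = {"uncertain", "uncertainty", "economic", "economy",
            "congress", "deficit", "federal reserve", "legislation",
            "regulation", "white house"}

def keyword_label(text):
    """1 if the article mentions any policy-uncertainty keyword, else 0."""
    lowered = text.lower()
    return int(any(k in lowered for k in KEYWORDS))

def monthly_index(labels_by_month):
    """Share of articles flagged as uncertain in each month."""
    return np.array([np.mean(labels) for labels in labels_by_month])

# Placeholder labels; in practice these would come from running the keyword
# matcher and a supervised classifier over the same articles.
labels_by_month_kw = [[1, 0, 0, 1], [0, 0, 1], [1, 1, 0, 0, 1]]
labels_by_month_clf = [[1, 0, 1, 1], [0, 0, 0], [0, 1, 0, 0, 1]]

kw_index = monthly_index(labels_by_month_kw)
clf_index = monthly_index(labels_by_month_clf)
print(np.corrcoef(kw_index, clf_index)[0, 1])  # Pearson correlation of the two indices
```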
2. Recursive Top-Down Production for Sentence Generation with Latent Trees [PDF] Back to Contents
Shawn Tan, Yikang Shen, Timothy J. O'Donnell, Alessandro Sordoni, Aaron Courville
Abstract: We model the recursive production property of context-free grammars for natural and synthetic languages. To this end, we present a dynamic programming algorithm that marginalises over latent binary tree structures with $N$ leaves, allowing us to compute the likelihood of a sequence of $N$ tokens under a latent tree model, which we maximise to train a recursive neural function. We demonstrate performance on two synthetic tasks: SCAN (Lake and Baroni, 2017), where it outperforms previous models on the LENGTH split, and English question formation (McCoy et al., 2020), where it performs comparably to decoders with the ground-truth tree structure. We also present experimental results on German-English translation on the Multi30k dataset (Elliott et al., 2016), and qualitatively analyse the induced tree structures our model learns for the SCAN tasks and the German-English translation task.
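To make the marginalisation concrete, here is a hedged inside-style dynamic program that sums scores over all latent binary trees with N leaves; the leaf and merge scoring functions are placeholders, not the paper's neural parameterisation.

```python
# Illustrative sketch: sum (in log space) over all binary trees spanning N tokens.
import math

def marginalize_over_binary_trees(leaf_scores, merge_score):
    """leaf_scores[i]: log-score of token i as a leaf.
    merge_score(i, k, j): log-score of merging spans [i, k) and [k, j).
    Returns the log of the total score over all binary trees spanning [0, N)."""
    n = len(leaf_scores)
    chart = {}  # chart[(i, j)] = log-sum of scores of all trees over span [i, j)
    for i in range(n):
        chart[(i, i + 1)] = leaf_scores[i]
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            terms = [chart[(i, k)] + chart[(k, j)] + merge_score(i, k, j)
                     for k in range(i + 1, j)]
            chart[(i, j)] = math.log(sum(math.exp(t) for t in terms))
    return chart[(0, n)]

# Toy usage with uniform scores: this just counts binary trees (Catalan numbers).
print(round(math.exp(marginalize_over_binary_trees([0.0] * 4, lambda i, k, j: 0.0))))  # 5
```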
3. Learning Context-Free Languages with Nondeterministic Stack RNNs [PDF] Back to Contents
Brian DuSell, David Chiang
Abstract: We present a differentiable stack data structure that simultaneously and tractably encodes an exponential number of stack configurations, based on Lang's algorithm for simulating nondeterministic pushdown automata. We call the combination of this data structure with a recurrent neural network (RNN) controller a Nondeterministic Stack RNN. We compare our model against existing stack RNNs on various formal languages, demonstrating that our model converges more reliably to algorithmic behavior on deterministic tasks, and achieves lower cross-entropy on inherently nondeterministic tasks.
4. Scaling Systematic Literature Reviews with Machine Learning Pipelines [PDF] Back to Contents
Seraphina Goldfarb-Tarrant, Alexander Robertson, Jasmina Lazic, Theodora Tsouloufi, Louise Donnison, Karen Smyth
Abstract: Systematic reviews, which entail the extraction of data from large numbers of scientific documents, are an ideal avenue for the application of machine learning. They are vital to many fields of science and philanthropy, but are very time-consuming and require experts. Yet the three main stages of a systematic review are easily done automatically: searching for documents can be done via APIs and scrapers, selection of relevant documents can be done via binary classification, and extraction of data can be done via sequence-labelling classification. Despite the promise of automation for this field, little research exists that examines the various ways to automate each of these tasks. We construct a pipeline that automates each of these aspects, and experiment with many human-time vs. system quality trade-offs. We test the ability of classifiers to work well on small amounts of data and to generalise to data from countries not represented in the training data. We test different types of data extraction with varying difficulty in annotation, and five different neural architectures to do the extraction. We find that we can get surprising accuracy and generalisability of the whole pipeline system with only 2 weeks of human-expert annotation, which is only 15% of the time it takes to do the whole review manually and can be repeated and extended to new data with no additional effort.
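A rough sketch of the three automatable stages described above, with placeholder components for search, screening, and extraction; none of the function names come from the paper.

```python
# Hypothetical pipeline skeleton: search -> relevance screening -> data extraction.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Document:
    title: str
    abstract: str
    body: str

def systematic_review_pipeline(
    search: Callable[[str], List[Document]],      # stage 1: API/scraper search
    is_relevant: Callable[[Document], bool],      # stage 2: binary classifier
    extract: Callable[[Document], dict],          # stage 3: sequence labeller
    query: str,
) -> List[dict]:
    documents = search(query)
    relevant = [d for d in documents if is_relevant(d)]
    return [extract(d) for d in relevant]

# Toy usage with trivial stand-ins for each stage.
docs = [Document("Trial A", "randomised trial of X", "n=120 ..."),
        Document("Essay B", "opinion piece", "...")]
rows = systematic_review_pipeline(
    search=lambda q: docs,
    is_relevant=lambda d: "trial" in d.abstract,
    extract=lambda d: {"title": d.title, "sample_size": "n=120"},
    query="intervention X",
)
print(rows)
```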
5. Case Study: Deontological Ethics in NLP [PDF] Back to Contents
Shrimai Prabhumoye, Brendon Boldt, Ruslan Salakhutdinov, Alan W Black
Abstract: Recent work in natural language processing (NLP) has focused on ethical challenges such as understanding and mitigating bias in data and algorithms; identifying objectionable content like hate speech, stereotypes and offensive language; and building frameworks for better system design and data handling practices. However, there has been little discussion about the ethical foundations that underlie these efforts. In this work, we study one ethical theory, namely deontological ethics, from the perspective of NLP. In particular, we focus on the generalization principle and the respect for autonomy through informed consent. We provide four case studies to demonstrate how these principles can be used with NLP systems. We also recommend directions to avoid the ethical issues in these systems.
6. LSTMs Compose (and Learn) Bottom-Up [PDF] Back to Contents
Naomi Saphra, Adam Lopez
Abstract: Recent work in NLP shows that LSTM language models capture hierarchical structure in language data. In contrast to existing work, we consider the \textit{learning} process that leads to their compositional behavior. For a closer look at how an LSTM's sequential representations are composed hierarchically, we present a related measure of Decompositional Interdependence (DI) between word meanings in an LSTM, based on their gate interactions. We connect this measure to syntax with experiments on English language data, where DI is higher on pairs of words with lower syntactic distance. To explore the inductive biases that cause these compositional representations to arise during training, we conduct simple experiments on synthetic data. These synthetic experiments support a specific hypothesis about how hierarchical structures are discovered over the course of training: that LSTM constituent representations are learned bottom-up, relying on effective representations of their shorter children, rather than learning the longer-range relations independently from children.
7. High-order Semantic Role Labeling [PDF] Back to Contents
Zuchao Li, Hai Zhao, Rui Wang, Kevin Parnow
Abstract: Semantic role labeling is primarily used to identify predicates, arguments, and their semantic relationships. Due to the limitations of modeling methods and the conditions of pre-identified predicates, previous work has focused on the relationships between predicates and arguments and the correlations between arguments at most, while the correlations between predicates have been neglected for a long time. High-order features and structure learning were very common in modeling such correlations before the neural network era. In this paper, we introduce a high-order graph structure for the neural semantic role labeling model, which enables the model to explicitly consider not only the isolated predicate-argument pairs but also the interaction between the predicate-argument pairs. Experimental results on 7 languages of the CoNLL-2009 benchmark show that the high-order structural learning techniques are beneficial to the strong performing SRL models and further boost our baseline to achieve new state-of-the-art results.
8. Grid Tagging Scheme for Aspect-oriented Fine-grained Opinion Extraction [PDF] Back to Contents
Zhen Wu, Chengcan Ying, Fei Zhao, Zhifang Fan, Xinyu Dai, Rui Xia
Abstract: Aspect-oriented Fine-grained Opinion Extraction (AFOE) aims at extracting aspect terms and opinion terms from review in the form of opinion pairs or additionally extracting sentiment polarity of aspect term to form opinion triplet. Because of containing several opinion factors, the complete AFOE task is usually divided into multiple subtasks and achieved in the pipeline. However, pipeline approaches easily suffer from error propagation and inconvenience in real-world scenarios. To this end, we propose a novel tagging scheme, Grid Tagging Scheme (GTS), to address the AFOE task in an end-to-end fashion only with one unified grid tagging task. Additionally, we design an effective inference strategy on GTS to exploit mutual indication between different opinion factors for more accurate extractions. To validate the feasibility and compatibility of GTS, we implement three different GTS models respectively based on CNN, BiLSTM, and BERT, and conduct experiments on the aspect-oriented opinion pair extraction and opinion triplet extraction datasets. Extensive experimental results indicate that GTS models outperform strong baselines significantly and achieve state-of-the-art performance.
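As a loose illustration of the grid tagging idea (the tag inventory and decoding below are simplified stand-ins, not the paper's exact scheme), every word pair receives a tag and opinion triplets are read off the grid:

```python
# Simplified grid: tags 'A' (aspect span), 'O' (opinion span),
# 'POS'/'NEG'/'NEU' linking an aspect word to an opinion word; pairs
# without an entry carry no relation.
sentence = ["the", "battery", "life", "is", "great"]
grid = {(1, 1): "A", (1, 2): "A", (2, 2): "A",   # "battery life" is an aspect
        (4, 4): "O",                              # "great" is an opinion word
        (1, 4): "POS", (2, 4): "POS"}             # aspect words link to "great" positively

def decode_triplets(grid, sentence):
    """Collect (aspect word, opinion word, sentiment) triplets from pairwise tags."""
    triplets = set()
    for (i, j), tag in grid.items():
        if tag in {"POS", "NEG", "NEU"}:
            triplets.add((sentence[i], sentence[j], tag))
    return triplets

print(decode_triplets(grid, sentence))
# -> ('battery', 'great', 'POS') and ('life', 'great', 'POS') (set order may vary)
```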
9. Recurrent babbling: evaluating the acquisition of grammar from limited input data [PDF] Back to Contents
Ludovica Pannitto, Aurélie Herbelot
Abstract: Recurrent Neural Networks (RNNs) have been shown to capture various aspects of syntax from raw linguistic input. In most previous experiments, however, learning happens over unrealistic corpora, which do not reflect the type and amount of data a child would be exposed to. This paper remedies this state of affairs by training a Long Short-Term Memory network (LSTM) over a realistically sized subset of child-directed input. The behaviour of the network is analysed over time using a novel methodology which consists in quantifying the level of grammatical abstraction in the model's generated output (its "babbling"), compared to the language it has been exposed to. We show that the LSTM indeed abstracts new structures as learning proceeds.
10. Examining the Ordering of Rhetorical Strategies in Persuasive Requests [PDF] Back to Contents
Omar Shaikh, Jiaao Chen, Jon Saad-Falcon, Duen Horng Chau, Diyi Yang
Abstract: Interpreting how persuasive language influences audiences has implications across many domains like advertising, argumentation, and propaganda. Persuasion relies on more than a message's content. Arranging the order of the message itself (i.e., ordering specific rhetorical strategies) also plays an important role. To examine how strategy orderings contribute to persuasiveness, we first utilize a Variational Autoencoder model to disentangle content and rhetorical strategies in textual requests from a large-scale loan request corpus. We then visualize interplay between content and strategy through an attentional LSTM that predicts the success of textual requests. We find that specific (orderings of) strategies interact uniquely with a request's content to impact success rate, and thus the persuasiveness of a request.
11. Mark-Evaluate: Assessing Language Generation using Population Estimation Methods [PDF] Back to Contents
Gonçalo Mordido, Christoph Meinel
Abstract: We propose a family of metrics to assess language generation derived from population estimation methods widely used in ecology. More specifically, we use mark-recapture and maximum-likelihood methods that have been applied over the past several decades to estimate the size of closed populations in the wild. We propose three novel metrics: ME$_\text{Petersen}$ and ME$_\text{CAPTURE}$, which retrieve a single-valued assessment, and ME$_\text{Schnabel}$ which returns a double-valued metric to assess the evaluation set in terms of quality and diversity, separately. In synthetic experiments, our family of methods is sensitive to drops in quality and diversity. Moreover, our methods show a higher correlation to human evaluation than existing metrics on several challenging tasks, namely unconditional language generation, machine translation, and text summarization.
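For intuition about the underlying population-estimation machinery, here is the classical Lincoln-Petersen mark-recapture estimator; how ME maps generated text onto capture occasions is simplified away here and is not the paper's exact protocol.

```python
# Lincoln-Petersen estimator of a closed population's size from two capture occasions.
def petersen_estimate(first_sample, second_sample):
    """first_sample / second_sample: sets of observed (e.g. generated) items."""
    marked = len(first_sample)                      # individuals marked on occasion 1
    captured = len(second_sample)                   # individuals captured on occasion 2
    recaptured = len(first_sample & second_sample)  # marked individuals seen again
    if recaptured == 0:
        return float("inf")  # no overlap: the population looks unboundedly large
    return marked * captured / recaptured

# Toy usage: two batches of generated sentences; heavy overlap implies a small
# estimated "population", i.e. low diversity.
batch_a = {"the cat sat", "a dog ran", "it rained"}
batch_b = {"the cat sat", "a dog ran", "snow fell"}
print(petersen_estimate(batch_a, batch_b))  # 4.5
```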
12. Denoising Multi-Source Weak Supervision for Neural Text Classification [PDF] Back to Contents
Wendi Ren, Yinghao Li, Hanting Su, David Kartchner, Cassie Mitchell, Chao Zhang
Abstract: We study the problem of learning neural text classifiers without using any labeled data, but only easy-to-provide rules as multiple weak supervision sources. This problem is challenging because rule-induced weak labels are often noisy and incomplete. To address these two challenges, we design a label denoiser, which estimates the source reliability using a conditional soft attention mechanism and then reduces label noise by aggregating rule-annotated weak labels. The denoised pseudo labels then supervise a neural classifier to predict soft labels for unmatched samples, which addresses the rule coverage issue. We evaluate our model on five benchmarks for sentiment, topic, and relation classifications. The results show that our model outperforms state-of-the-art weakly-supervised and semi-supervised methods consistently, and achieves comparable performance with fully-supervised methods even without any labeled data. Our code can be found at this https URL.
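A minimal sketch of the aggregation step as described: each weak source's rule labels are combined with source-reliability weights (conditional soft attention in the paper) into a denoised soft label. The weights and labels below are placeholders, not the model's learned values.

```python
# Aggregate weak rule labels with per-source reliability weights.
import numpy as np

def aggregate_weak_labels(rule_labels, reliability_logits, num_classes):
    """rule_labels: per-source class ids, -1 where a source's rules did not fire.
    reliability_logits: one score per source (conditioned on the input in the paper)."""
    weights = np.exp(reliability_logits) / np.exp(reliability_logits).sum()
    soft_label = np.zeros(num_classes)
    for label, w in zip(rule_labels, weights):
        if label >= 0:                  # only count sources whose rules matched
            soft_label[label] += w
    if soft_label.sum() == 0:
        return np.full(num_classes, 1.0 / num_classes)  # no rule fired: uniform
    return soft_label / soft_label.sum()

# Three sources vote on a 2-class example; source 0 is deemed most reliable.
print(aggregate_weak_labels([1, 0, -1], np.array([2.0, 0.5, 0.1]), num_classes=2))
```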
13. HENIN: Learning Heterogeneous Neural Interaction Networks for Explainable Cyberbullying Detection on Social Media [PDF] Back to Contents
Hsin-Yu Chen, Cheng-Te Li
Abstract: In the computational detection of cyberbullying, existing work largely focused on building generic classifiers that rely exclusively on text analysis of social media sessions. Despite their empirical success, we argue that a critical missing piece is the model explainability, i.e., why a particular piece of media session is detected as cyberbullying. In this paper, therefore, we propose a novel deep model, HEterogeneous Neural Interaction Networks (HENIN), for explainable cyberbullying detection. HENIN contains the following components: a comment encoder, a post-comment co-attention sub-network, and session-session and post-post interaction extractors. Extensive experiments conducted on real datasets exhibit not only the promising performance of HENIN, but also highlight evidential comments so that one can understand why a media session is identified as cyberbullying.
14. Toxic Language Detection in Social Media for Brazilian Portuguese: New Dataset and Multilingual Analysis [PDF] Back to Contents
João A. Leite, Diego F. Silva, Kalina Bontcheva, Carolina Scarton
Abstract: Hate speech and toxic comments are a common concern of social media platform users. Although these comments are, fortunately, the minority in these platforms, they are still capable of causing harm. Therefore, identifying these comments is an important task for studying and preventing the proliferation of toxicity in social media. Previous work on automatically detecting toxic comments has focused mainly on English, with very little work on languages like Brazilian Portuguese. In this paper, we propose a new large-scale dataset for Brazilian Portuguese with tweets annotated as either toxic or non-toxic or in different types of toxicity. We present our dataset collection and annotation process, where we aimed to select candidates covering multiple demographic groups. State-of-the-art BERT models were able to achieve 76% macro-F1 score using monolingual data in the binary case. We also show that large-scale monolingual data is still needed to create more accurate models, despite recent advances in multilingual approaches. An error analysis and experiments with multi-label classification show the difficulty of classifying certain types of toxic comments that appear less frequently in our data and highlight the need to develop models that are aware of different categories of toxicity.
15. Measuring What Counts: The case of Rumour Stance Classification [PDF] Back to Contents
Carolina Scarton, Diego F. Silva, Kalina Bontcheva
Abstract: Stance classification can be a powerful tool for understanding whether and which users believe in online rumours. The task aims to automatically predict the stance of replies towards a given rumour, namely support, deny, question, or comment. Numerous methods have been proposed and their performance compared in the RumourEval shared tasks in 2017 and 2019. Results demonstrated that this is a challenging problem since naturally occurring rumour stance data is highly imbalanced. This paper specifically questions the evaluation metrics used in these shared tasks. We re-evaluate the systems submitted to the two RumourEval tasks and show that the two widely adopted metrics -- accuracy and macro-F1 - are not robust for the four-class imbalanced task of rumour stance classification, as they wrongly favour systems with highly skewed accuracy towards the majority class. To overcome this problem, we propose new evaluation metrics for rumour stance detection. These are not only robust to imbalanced data but also score higher systems that are capable of recognising the two most informative minority classes (support and deny).
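A toy illustration of the imbalance problem discussed above: with "comment" dominating the gold labels, a system that always predicts the majority class looks strong on accuracy despite recognising none of the informative minority classes. The label distribution below is invented for illustration and is not the RumourEval data.

```python
# Compare a majority-class predictor with a minority-aware one on skewed labels.
from sklearn.metrics import accuracy_score, f1_score

# 0 = support, 1 = deny, 2 = query, 3 = comment (majority class)
gold = [3] * 16 + [0, 0, 1, 2]
majority_system = [3] * 20                                       # always "comment"
minority_aware_system = [3] * 12 + [0, 1, 2, 1] + [0, 0, 1, 1]   # finds minorities, loses some "comment"

for name, pred in [("majority", majority_system), ("minority-aware", minority_aware_system)]:
    print(name,
          "accuracy=%.2f" % accuracy_score(gold, pred),
          "macro-F1=%.2f" % f1_score(gold, pred, average="macro", zero_division=0))
# The majority-class system wins on accuracy while ignoring support/deny/query entirely.
```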
16. What Have We Achieved on Text Summarization? [PDF] Back to Contents
Dandan Huang, Leyang Cui, Sen Yang, Guangsheng Bao, Kun Wang, Jun Xie, Yue Zhang
Abstract: Deep learning has led to significant improvement in text summarization with various methods investigated and improved ROUGE scores reported over the years. However, gaps still exist between summaries produced by automatic summarizers and human professionals. Aiming to gain more understanding of summarization systems with respect to their strengths and limits on a fine-grained syntactic and semantic level, we consult the Multidimensional Quality Metric(MQM) and quantify 8 major sources of errors on 10 representative summarization models manually. Primarily, we find that 1) under similar settings, extractive summarizers are in general better than their abstractive counterparts thanks to strength in faithfulness and factual-consistency; 2) milestone techniques such as copy, coverage and hybrid extractive/abstractive methods do bring specific improvements but also demonstrate limitations; 3) pre-training techniques, and in particular sequence-to-sequence pre-training, are highly effective for improving text summarization, with BART giving the best results.
17. Online Back-Parsing for AMR-to-Text Generation [PDF] Back to Contents
Xuefeng Bai, Linfeng Song, Yue Zhang
Abstract: AMR-to-text generation aims to recover a text containing the same meaning as an input AMR graph. Current research develops increasingly powerful graph encoders to better represent AMR graphs, with decoders based on standard language modeling being used to generate outputs. We propose a decoder that back predicts projected AMR graphs on the target sentence during text generation. As a result, our outputs can better preserve the input meaning than standard decoders. Experiments on two AMR benchmarks show the superiority of our model over the previous state-of-the-art system based on graph Transformer.
18. Self-Paced Learning for Neural Machine Translation [PDF] Back to Contents
Yu Wan, Baosong Yang, Derek F. Wong, Yikai Zhou, Lidia S. Chao, Haibo Zhang, Boxing Chen
Abstract: Recent studies have proven that the training of neural machine translation (NMT) can be facilitated by mimicking the learning process of humans. Nevertheless, achievements of such kind of curriculum learning rely on the quality of artificial schedule drawn up with the handcrafted features, e.g. sentence length or word rarity. We ameliorate this procedure with a more flexible manner by proposing self-paced learning, where NMT model is allowed to 1) automatically quantify the learning confidence over training examples; and 2) flexibly govern its learning via regulating the loss in each iteration step. Experimental results over multiple translation tasks demonstrate that the proposed model yields better performance than strong baselines and those models trained with human-designed curricula on both translation quality and convergence speed.
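A hedged sketch of the confidence-weighted loss idea: each example's loss is scaled by the confidence the model assigns to that example, so uncertain examples contribute less to an update. The confidence values here are placeholders, not the paper's estimator.

```python
# Confidence-weighted (self-paced) batch loss.
import numpy as np

def self_paced_loss(per_example_losses, confidences):
    """Weight each example's loss by the model's confidence in that example."""
    confidences = np.asarray(confidences, dtype=float)
    weights = confidences / confidences.sum()      # normalise over the batch
    return float(np.sum(weights * np.asarray(per_example_losses)))

# Toy batch: the third example looks hard (high loss, low confidence),
# so it is down-weighted relative to a plain mean of the losses.
losses = [0.4, 0.6, 3.0]
confidences = [0.9, 0.8, 0.2]
print(self_paced_loss(losses, confidences), np.mean(losses))  # ~0.76 vs ~1.33
```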
19. Top-Rank-Focused Adaptive Vote Collection for the Evaluation of Domain-Specific Semantic Models [PDF] Back to Contents
Pierangelo Lombardo, Alessio Boiardi, Luca Colombo, Angelo Schiavone, Nicolò Tamagnone
Abstract: The growth of domain-specific applications of semantic models, boosted by the recent achievements of unsupervised embedding learning algorithms, demands domain-specific evaluation datasets. In many cases, content-based recommenders being a prime example, these models are required to rank words or texts according to their semantic relatedness to a given concept, with particular focus on top ranks. In this work, we give a threefold contribution to address these requirements: (i) we define a protocol for the construction, based on adaptive pairwise comparisons, of a relatedness-based evaluation dataset tailored on the available resources and optimized to be particularly accurate in top-rank evaluation; (ii) we define appropriate metrics, extensions of well-known ranking correlation coefficients, to evaluate a semantic model via the aforementioned dataset by taking into account the greater significance of top ranks. Finally, (iii) we define a stochastic transitivity model to simulate semantic-driven pairwise comparisons, which confirms the effectiveness of the proposed dataset construction protocol.
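As a simplified stand-in for the vote-collection idea (ignoring the adaptive pair selection and top-rank weighting that the paper actually proposes), pairwise judgements can be aggregated into a reference ranking by win counts:

```python
# Rank candidate terms by how often annotators preferred them in pairwise votes.
from collections import Counter

def rank_from_pairwise_votes(votes):
    """votes: list of (winner, loser) judgements; returns items ranked by wins."""
    wins = Counter(winner for winner, _ in votes)
    items = {item for pair in votes for item in pair}
    return sorted(items, key=lambda item: (-wins[item], item))

votes = [("term_a", "term_b"), ("term_a", "term_c"),
         ("term_b", "term_c"), ("term_a", "term_d"), ("term_c", "term_d")]
print(rank_from_pairwise_votes(votes))  # ['term_a', 'term_b', 'term_c', 'term_d']
```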
20. Word Level Language Identification in English Telugu Code Mixed Data [PDF] Back to Contents
Sunil Gundapu, Radhika Mamidi
Abstract: In a multilingual or sociolingual configuration Intra-sentential Code Switching (ICS) or Code Mixing (CM) is frequently observed nowadays. In the world, most of the people know more than one language. CM usage is especially apparent in social media platforms. Moreover, ICS is particularly significant in the context of technology, health, and law, where conveying the upcoming developments is difficult in one's native language. In applications like dialog systems, machine translation, semantic parsing, shallow parsing, etc. CM and Code Switching pose serious challenges. To do any further advancement in code-mixed data, the necessary step is Language Identification. In this paper, we present a study of various models - Naïve Bayes Classifier, Random Forest Classifier, Conditional Random Field (CRF), and Hidden Markov Model (HMM) for Language Identification in English - Telugu Code Mixed Data. Considering the paucity of resources in code mixed languages, we proposed the CRF model and HMM model for word level language identification. Our best performing system is CRF-based with an f1-score of 0.91.
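For illustration, a small HMM-style tagger with Viterbi decoding for word-level language identification; the word lists and probabilities are toy values, not parameters estimated from the English-Telugu corpus used in the paper.

```python
# Tiny HMM for word-level language ID: states are languages, Viterbi decodes
# the best label sequence for a code-mixed utterance.
import math

STATES = ["en", "te"]
START = {"en": 0.5, "te": 0.5}
TRANS = {"en": {"en": 0.8, "te": 0.2}, "te": {"en": 0.2, "te": 0.8}}
EN_WORDS = {"movie", "is", "very", "good"}
TE_WORDS = {"chala", "bagundi", "cinema"}

def emission(word, lang):
    vocab = EN_WORDS if lang == "en" else TE_WORDS
    return 0.9 if word in vocab else 0.1   # smoothed toy emission probability

def viterbi(words):
    scores = [{s: math.log(START[s]) + math.log(emission(words[0], s)) for s in STATES}]
    back = []
    for word in words[1:]:
        col, ptr = {}, {}
        for s in STATES:
            prev, best = max(((p, scores[-1][p] + math.log(TRANS[p][s])) for p in STATES),
                             key=lambda x: x[1])
            col[s] = best + math.log(emission(word, s))
            ptr[s] = prev
        scores.append(col)
        back.append(ptr)
    state = max(scores[-1], key=scores[-1].get)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return list(reversed(path))

print(viterbi("cinema chala bagundi".split()))  # ['te', 'te', 'te']
```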
21. MLQE-PE: A Multilingual Quality Estimation and Post-Editing Dataset [PDF] 返回目录
Marina Fomicheva, Shuo Sun, Erick Fonseca, Frédéric Blain, Vishrav Chaudhary, Francisco Guzmán, Nina Lopatina, Lucia Specia, André F. T. Martins
Abstract: We present MLQE-PE, a new dataset for Machine Translation (MT) Quality Estimation (QE) and Automatic Post-Editing (APE). The dataset contains seven language pairs, with human labels for 9,000 translations per language pair in the following formats: sentence-level direct assessments and post-editing effort, and word-level good/bad labels. It also contains the post-edited sentences, as well as titles of the articles where the sentences were extracted from, and the neural MT models used to translate the text.
摘要:我们提出MLQE-PE,机器翻译(MT)的质量估计(QE)和自动后编辑(APE)的新数据集。该数据集包含七个语言对,每语言对9000译文如下格式人标签:句子级直接评估和后期编辑的努力,和字级好/坏的标签。它还包含了后期编辑的句子,以及其中的句子,从提取的文章的标题,并用于文本翻译的神经MT车型。
22. Multichannel Generative Language Model: Learning All Possible Factorizations Within and Across Channels [PDF] 返回目录
Harris Chan, Jamie Kiros, William Chan
Abstract: A channel corresponds to a viewpoint or transformation of an underlying meaning. A pair of parallel sentences in English and French expresses the same underlying meaning, but through two separate channels corresponding to their languages. In this work, we present the Multichannel Generative Language Model (MGLM). MGLM is a generative joint distribution model over channels. MGLM marginalizes over all possible factorizations within and across all channels. MGLM supports flexible inference, including unconditional generation, conditional generation (where one channel is observed and the other channels are generated), and partially observed generation (where incomplete observations are spread across all the channels). We experiment with the Multi30K dataset containing English, French, Czech, and German. We demonstrate experiments with unconditional, conditional, and partially conditional generation. We provide qualitative samples drawn unconditionally from the generative joint distribution. We also quantitatively analyze the quality-diversity trade-offs and find that MGLM outperforms traditional bilingual discriminative models.
摘要:信道对应于一个潜在的含义的视点或转化。一对在英语和法语平行的句子表达了同样的潜在的含义,但通过两个独立通道相当于他们的语言。在这项工作中,我们提出了多通道剖成语言模型(MGLM)。 MGLM是在频道的生成联合分布模型。 MGLM边缘化了内和跨所有渠道的所有可能的因式分解。 MGLM赋予柔性推理,包括无条件代,有条件地生成(其中,1个通道中观察到并生成其它信道),并且部分地观察到生成(其中,不完全的观察是在所有信道传播)。我们与包括英语,法语,捷克语和德语的Multi30K数据集实验。我们证明无条件的,有条件的,有条件的部分实验产生。我们提供从生成联合分布无条件抽样定性样品。我们还定量分析的质量,多样性的权衡,找到MGLM效果优于传统的双语判别模型。
23. Uncertainty-Aware Semantic Augmentation for Neural Machine Translation [PDF] 返回目录
Xiangpeng Wei, Heng Yu, Yue Hu, Rongxiang Weng, Luxi Xing, Weihua Luo
Abstract: As a sequence-to-sequence generation task, neural machine translation (NMT) naturally contains intrinsic uncertainty, where a single sentence in one language has multiple valid counterparts in the other. However, the dominant methods for NMT only observe one of them from the parallel corpora for the model training but have to deal with adequate variations under the same meaning at inference. This leads to a discrepancy of the data distribution between the training and the inference phases. To address this problem, we propose uncertainty-aware semantic augmentation, which explicitly captures the universal semantic information among multiple semantically-equivalent source sentences and enhances the hidden representations with this information for better translations. Extensive experiments on various translation tasks reveal that our approach significantly outperforms the strong baselines and the existing methods.
摘要:作为一个序列到序列生成任务,神经机器翻译(NMT)自然包含固有不确定性,其中一种语言单个句子在其他多个有效的同行。然而,对于NMT的主要方法只能观察其中一人从模型训练的平行语料库,但必须处理的推理相同的意义下足够的变化。这导致了训练和推理阶段之间的数据分布的差异。为了解决这个问题,我们提出了不确定性感知语义增强,其明确地捕捉多个语义等效源句子之间的通用的语义信息,并与这些信息进行更好的翻译增强了隐藏表示。各种翻译任务大量的实验表明,我们的方法显著优于强基线和现有方法。
24. gundapusunil at SemEval-2020 Task 9: Syntactic Semantic LSTM Architecture for SENTIment Analysis of Code-MIXed Data [PDF] 返回目录
Sunil Gundapu, Radhika Mamidi
Abstract: The phenomenon of mixing the vocabulary and syntax of multiple languages within the same utterance is called Code-Mixing. It is especially evident in multilingual societies. In this paper, we describe a system developed for SemEval 2020 Task 9 on Sentiment Analysis for Code-Mixed Social Media Text. Our system first generates two types of embeddings for the social media text: character-level embeddings, which encode character-level information and handle out-of-vocabulary entries, and FastText word embeddings, which capture morphology and semantics. These two embeddings are passed to an LSTM network, and the resulting system outperforms the baseline model.
摘要:在同一话语中混合多国语言的词汇和语法的现象称为码混用。这是在多语言社会更加明显。在本文中,我们已经制定了SemEval 2020系统:任务9对市场情绪分析代码混社会媒体文本。我们的系统首先为社会化媒体的文本生成两种类型的嵌入的。在这些中,第一个是人物等级的嵌入编码的字符级别的信息和处理外的词汇中的条目和第二个是FastText字的嵌入用于捕获形态学和语义。这两种方式的嵌入被传递到LSTM网络和系统跑赢基准模型。
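The two-embedding architecture can be sketched in PyTorch roughly as follows: a small character LSTM produces one vector per word, which is concatenated with a word embedding (a randomly initialized stand-in for FastText vectors here) and fed to a sentence-level LSTM classifier. All dimensions and vocabulary sizes are placeholders, not the submitted system's configuration.

# Sketch only: char-level + word-level embeddings -> LSTM sentiment classifier.
# FastText vectors would normally be loaded into `word_emb.weight`; here they
# are randomly initialized placeholders.
import torch
import torch.nn as nn

class CharWordLSTM(nn.Module):
    def __init__(self, n_chars=100, n_words=5000, char_dim=25, word_dim=300,
                 char_hidden=50, sent_hidden=128, n_classes=3):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.char_lstm = nn.LSTM(char_dim, char_hidden, batch_first=True)
        self.word_emb = nn.Embedding(n_words, word_dim, padding_idx=0)
        self.sent_lstm = nn.LSTM(word_dim + char_hidden, sent_hidden, batch_first=True)
        self.out = nn.Linear(sent_hidden, n_classes)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq_len); char_ids: (batch, seq_len, max_word_len)
        b, s, c = char_ids.shape
        _, (h_char, _) = self.char_lstm(self.char_emb(char_ids.view(b * s, c)))
        char_repr = h_char[-1].view(b, s, -1)              # one vector per word
        tokens = torch.cat([self.word_emb(word_ids), char_repr], dim=-1)
        _, (h_sent, _) = self.sent_lstm(tokens)
        return self.out(h_sent[-1])                        # (batch, n_classes)

model = CharWordLSTM()
logits = model(torch.randint(1, 5000, (2, 12)), torch.randint(1, 100, (2, 12, 8)))
print(logits.shape)  # torch.Size([2, 3])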
25. A Survey of Knowledge-Enhanced Text Generation [PDF] 返回目录
Wenhao Yu, Chenguang Zhu, Zaitang Li, Zhiting Hu, Qingyun Wang, Heng Ji, Meng Jiang
Abstract: The goal of text generation is to make machines express themselves in human language. It is one of the most important yet challenging tasks in natural language processing (NLP). Since 2014, various neural encoder-decoder models pioneered by Seq2Seq have been proposed to achieve this goal by learning to map input text to output text. However, the input text alone often provides limited knowledge to generate the desired output, so the performance of text generation is still far from satisfactory in many real-world scenarios. To address this issue, researchers have considered incorporating various forms of knowledge beyond the input text into the generation models. This research direction is known as knowledge-enhanced text generation. In this survey, we present a comprehensive review of the research on knowledge-enhanced text generation over the past five years. The main content includes two parts: (i) general methods and architectures for integrating knowledge into text generation; (ii) specific techniques and applications according to different forms of knowledge data. This survey is intended for a broad audience of researchers and practitioners in academia and industry.
摘要:文本生成的目标是使机器在人类语言表达。它是自然语言处理的最重要而具有挑战性的任务(NLP)之一。从2014年开始,由Seq2Seq率先各种神经编码器,解码器模型已被提出通过学习来映射输入文本输出文本要达到的目标。但是,输入文本单独通常提供有限的知识来产生所需的输出,所以文本生成的性能在许多现实世界场景的满意度仍远。为了解决这个问题,研究人员已经考虑引入各种形式的超越输入文本知识进代车型。本研究方向是已知的知识,增强了文本生成。在本次调查中,我们提出,在过去五年知识提高文本生成的研究进行了全面审查。主要内容包括两个部分:(i)用于整合知识成文本生成的一般方法和架构; (ⅱ)根据不同形式的知识数据的特定技术和应用。本次调查可以有广泛的受众群体,研究人员和从业人员,在学术界和工业界。
26. Graphing Contributions in Natural Language Processing Research: Intra-Annotator Agreement on a Trial Dataset [PDF] 返回目录
Jennifer D'Souza, Sören Auer
Abstract: Purpose: To stabilize the NLPContributionGraph scheme for the surface structuring of contributions information in Natural Language Processing (NLP) scholarly articles via a two-stage annotation methodology: first stage - to define the scheme; and second stage - to stabilize the graphing model. Approach: Re-annotate, a second time, the contributions-pertinent information across 50 prior-annotated NLP scholarly articles in terms of a data pipeline comprising: contribution-centered sentences, phrases, and triples. To this end, care was taken in the second annotation stage to reduce annotation noise while formulating the guidelines for our proposed novel NLP contributions structuring scheme. Findings: Applying NLPContributionGraph to the 50 articles resulted in a final dataset of 900 contribution-focused sentences, 4,702 contribution-information-centered phrases, and 2,980 surface-structured triples. The intra-annotation agreement between the first and second stages, in terms of F1, was 67.92% for sentences, 41.82% for phrases, and 22.31% for triples, indicating that with an increased granularity of the information, the annotation decision variance is greater. Practical Implications: We demonstrate NLPContributionGraph data integrated into the Open Research Knowledge Graph (ORKG), a next-generation KG-based digital library with compute enabled over structured scholarly knowledge, as a viable aid to assist researchers in their day-to-day tasks. Value: NLPContributionGraph is a novel scheme to obtain research contribution-centered graphs from NLP articles, which, to the best of our knowledge, does not exist in the community. Our quantitative evaluations of the two-stage annotation tasks also offer insights into task difficulty.
摘要:目的:为稳定的贡献信息,自然语言处理(NLP)通过两阶段注释方法,学术文章的表面结构化的NLPContributionGraph方案:第一阶段 - 定义方案;和第二阶段 - 以稳定上述图形模型。处理方法:重新注释,第二次,在数据管道包括字词在整个50之前标注的NLP学术文章的贡献,相关信息:贡献为中心的句子,短语和三倍。为此专门时,注意到在第二阶段的注释,以减少噪音的注释,同时制定我们提出的新的NLP贡献结构方案的指导方针。调查结果:NLPContributionGraph对50篇文章的应用在900的贡献为重点的句子,4,702贡献信息为中心的短语,和2980的表面结构的三元组数据集最终毫无结果。在第一和第二阶段之间的帧内注释协议,在F1而言,是为句子67.92%,对于短语41.82%,以及用于指示与信息的增加的粒度,注释决定方差大于三元组22.31%。现实意义:证明集成在开放研究的知识图谱(ORKG)NLPContributionGraph数据,以计算基于KG-下一代数字图书馆启用了结构化的学术知识,作为一种可行的援助,以协助他们每天的日常任务的研究人员。价值:NLPContributionGraph是一个新的计划,从中所获得的认识并不存在于社区NLP文章获得研究成果为中心的图表。而我们在两阶段注释任务的定量评估提出见解,任务难度。
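The agreement figures above are F1 scores between the two annotation rounds. As a generic illustration (the shared task defines its own matching rules for sentences, phrases, and triples), a set-based F1 between two rounds of extracted items can be computed as follows.

# Generic set-based F1 between two annotation rounds (illustration only; the
# shared task defines its own matching rules for sentences, phrases, triples).
def f1(round_1: set, round_2: set) -> float:
    overlap = len(round_1 & round_2)
    if not round_1 or not round_2 or not overlap:
        return 0.0
    precision = overlap / len(round_2)
    recall = overlap / len(round_1)
    return 2 * precision * recall / (precision + recall)

stage1 = {("contribution", "has", "ablation study"), ("model", "achieves", "SOTA")}
stage2 = {("contribution", "has", "ablation study"), ("model", "uses", "BERT")}
print(round(f1(stage1, stage2), 3))  # 0.5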
27. Lightweight, Dynamic Graph Convolutional Networks for AMR-to-Text Generation [PDF] 返回目录
Yan Zhang, Zhijiang Guo, Zhiyang Teng, Wei Lu, Shay B. Cohen, Zuozhu Liu, Lidong Bing
Abstract: AMR-to-text generation is used to transduce Abstract Meaning Representation structures (AMR) into text. A key challenge in this task is to efficiently learn effective graph representations. Previously, Graph Convolution Networks (GCNs) were used to encode input AMRs; however, vanilla GCNs are not able to capture non-local information and, additionally, they follow a local (first-order) information aggregation scheme. To account for these issues, larger and deeper GCN models are required to capture more complex interactions. In this paper, we introduce a dynamic fusion mechanism, proposing Lightweight Dynamic Graph Convolutional Networks (LDGCNs) that capture richer non-local interactions by synthesizing higher-order information from the input graphs. We further develop two novel parameter-saving strategies based on group graph convolutions and weight-tied convolutions to reduce memory usage and model complexity. With the help of these strategies, we are able to train a model with fewer parameters while maintaining the model capacity. Experiments demonstrate that LDGCNs outperform state-of-the-art models on two benchmark datasets for AMR-to-text generation with significantly fewer parameters.
摘要:AMR到文本生成用于转导抽象意义表达结构(AMR)转换成文本。在这个任务中的一个关键挑战是如何有效地学习有效的图形表示。此前,图表卷积网络(GCNs)来编码输入自动抄表系统,但是,香草GCNs无法捕捉到的非本地信息,另外,他们也跟着本地(一阶)信息聚合方案。考虑到这些问题,更大和更深的GCN车型都需要获得更多的复杂的相互作用。在本文中,我们引入一个动态融合机制,通过合成从输入图表高阶信息提出轻量级动态图形卷积网络(LDGCNs),该捕获更丰富的非本地交互。我们进一步开发了基于该组图形卷积和重绑卷积,以减少内存使用和模型的复杂性两种新的参数保存策略。有了这些策略的帮助下,我们能够培养出模型参数少,同时保持模型的能力。实验表明,LDGCNs超越国家的最先进的车型上的两个标准数据集的AMR到文本生成与显著较少的参数。
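For reference, the local first-order aggregation of a vanilla GCN layer that LDGCNs are designed to improve on can be written in a few lines; this is a generic sketch, not the LDGCN architecture itself.

# Vanilla first-order GCN layer: each node only mixes information from its
# direct neighbours, which is the limitation LDGCNs address.
import numpy as np

def gcn_layer(A, H, W):
    """A: (n, n) adjacency, H: (n, d_in) node features, W: (d_in, d_out) weights."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt       # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)         # ReLU(A_norm H W)

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # toy 3-node graph
H = rng.normal(size=(3, 4))
print(gcn_layer(A, H, rng.normal(size=(4, 8))).shape)  # (3, 8)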
28. Token-level Adaptive Training for Neural Machine Translation [PDF] 返回目录
Shuhao Gu, Jinchao Zhang, Fandong Meng, Yang Feng, Wanying Xie, Jie Zhou, Dong Yu
Abstract: There exists a token imbalance phenomenon in natural language, as different tokens appear with different frequencies, which leads to different learning difficulties for tokens in Neural Machine Translation (NMT). The vanilla NMT model usually adopts trivial equal-weighted objectives for target tokens with different frequencies and tends to generate more high-frequency tokens and fewer low-frequency tokens compared with the golden token distribution. However, low-frequency tokens may carry critical semantic information that affects translation quality once they are neglected. In this paper, we explore target token-level adaptive objectives based on token frequencies to assign an appropriate weight to each target token during training. We aim to assign larger weights to meaningful but relatively low-frequency words in the objective, encouraging the model to pay more attention to these tokens. Our method yields consistent improvements in translation quality on ZH-EN, EN-RO, and EN-DE translation tasks, especially on sentences that contain more low-frequency tokens, where we obtain BLEU increases of 1.68, 1.02, and 0.52 over the baseline, respectively. Further analyses show that our method can also improve the lexical diversity of translation.
摘要:存在于不同的令牌出现频率不同,自然语言标记的不平衡现象,导致在神经机器翻译(NMT)标记不同的学习困难。香草NMT模型通常采用用于具有不同频率的目标令牌琐碎相等加权的目标并趋于产生更多的高频令牌和较少低频令牌与金色令牌分配进行比较。然而,低频令牌可以携带重要的语义信息,这将影响翻译质量一旦被忽视。在本文中,我们探讨了基于令牌的频率为培训期间令牌每一个目标分配适当的权重目标令牌级自适应目标。我们的目的,那些有意义的,但相对低频词可能具有较大权重的目标被分配到鼓励模型更多地关注这些令牌。我们的方法产生在ZH-EN,EN-RO,和EN-DE翻译任务,翻译质量持续改善,尤其是在包含多个低频令牌,我们可以得到比较基准1.68,1.02和0.52 BLEU增加的句子,分别。进一步的分析表明,我们的方法也可以提高翻译的词汇多样性。
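The general idea of frequency-based token-level loss weighting can be sketched as below: each target token's cross-entropy is scaled by a function of its corpus frequency so that rare tokens contribute more. The specific weighting function here is illustrative and not the one proposed in the paper.

# Sketch: weight each target token's cross-entropy by a function of its corpus
# frequency, so rare tokens contribute more to the loss. The weighting function
# below is illustrative only.
import math
import torch
import torch.nn.functional as F

vocab_size, seq_len, batch = 1000, 7, 4
token_counts = torch.randint(1, 10_000, (vocab_size,)).float()   # corpus counts (toy)
freq = token_counts / token_counts.sum()
weights = 1.0 / torch.log(freq * vocab_size + math.e)            # rarer -> larger weight
weights = weights / weights.mean()                                # keep loss scale stable

logits = torch.randn(batch, seq_len, vocab_size)                  # model outputs (toy)
targets = torch.randint(0, vocab_size, (batch, seq_len))

per_token_nll = F.cross_entropy(logits.view(-1, vocab_size), targets.view(-1),
                                reduction="none")
loss = (per_token_nll * weights[targets.view(-1)]).mean()
print(loss.item())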
29. Q-learning with Language Model for Edit-based Unsupervised Summarization [PDF] 返回目录
Ryosuke Kohita, Akifumi Wachi, Yang Zhao, Ryuki Tachibana
Abstract: Unsupervised methods are promising for abstractive text summarization in that parallel corpora are not required. However, their performance is still far from satisfactory, so research on promising solutions is ongoing. In this paper, we propose a new approach based on Q-learning with an edit-based summarization. The method combines two key modules to form an Editorial Agent and Language Model converter (EALM). The agent predicts edit actions (e.g., delete, keep, and replace), and then the LM converter deterministically generates a summary on the basis of the action signals. Q-learning is leveraged to train the agent to produce proper edit actions. Experimental results show that EALM delivered competitive performance compared with the previous encoder-decoder-based methods, even with truly zero paired data (i.e., no validation set). Defining the task as Q-learning enables us not only to develop a competitive method but also to make the latest techniques in reinforcement learning available for unsupervised summarization. We also conduct qualitative analysis, providing insights into future study of unsupervised summarizers.
摘要:无监督的方法是有希望的在不需要平行语料库抽象文本摘要。然而,他们的表现仍然是不成立至今,因此研究希望的解决方案是持续的。在本文中,我们提出了一种基于与编辑总结基于Q学习的新方法。该方法结合了两种关键模块以形成一个编辑代理和语言模型转换器(EALM)。代理预测编辑操作(E.T.,删除,保持和替换),然后LM转换器产生确定性的动作信号的基础上,总结。 Q学习是利用训练代理人出示适当的编辑操作。实验结果表明,EALM与前基于编码器的解码器方法相比递送竞争力的性能,即使使用真正零成对数据(即,没有验证集)。定义任务,Q学习,使我们不仅制定有竞争力的方法,而且使最新的技术,强化学习可供监督的总结。我们还进行定性分析,提供了解未来无监督summarizers研究。
30. iobes: A Library for Span-Level Processing [PDF] 返回目录
Brian Lester
Abstract: Many tasks in natural language processing, such as named entity recognition and slot-filling, involve identifying and labeling specific spans of text. In order to leverage common models, these tasks are often recast as sequence labeling tasks. Each token is given a label and these labels are prefixed with special tokens such as B- or I-. After a model assigns labels to each token, these prefixes are used to group the tokens into spans. Properly parsing these annotations is critical for producing fair and comparable metrics; however, despite its importance, there is not an easy-to-use, standardized, programmatically integratable library to help work with span labeling. To remedy this, we introduce our open-source library, iobes. iobes is used for parsing, converting, and processing spans represented as token level decisions.
摘要:在自然语言处理很多任务,如命名实体识别和槽填充,包括:标识和文字标注具体的跨度。为了利用常用型号,这些任务往往重铸为序列标注任务。每个令牌被赋予了标签和这些标签用特殊标记如B超或I-前缀。一个模型受让人标签每个令牌之后,这些前缀被用于组的标记为跨度。正确解析这些注解是产生公平和可比指标的关键;然而,尽管它的重要性,没有一个易于使用的,标准化,程序集成的库与跨度标签帮助工作。为了解决这个问题,我们介绍我们的开源库,iobes。 iobes用于解析,转换,和处理表示为令牌水平决定的跨度。
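For readers unfamiliar with the span-decoding problem the library addresses, a bare-bones BIO-to-span parser looks roughly like this; it only illustrates the task, whereas iobes itself handles conversion between tagging schemes and the associated edge cases.

# Minimal BIO decoder: group token-level tags like B-PER/I-PER into
# (type, start, end) spans. This illustrates the problem iobes solves; it is
# not the library's API.
from typing import List, Tuple

def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):          # sentinel flushes the last span
        if tag.startswith("B-") or tag == "O" or (
            tag.startswith("I-") and tag[2:] != etype
        ):
            if start is not None:
                spans.append((etype, start, i))      # end index is exclusive
            start, etype = (i, tag[2:]) if tag != "O" else (None, None)
        # a well-formed I- continuation just extends the open span
    return spans

print(bio_to_spans(["B-PER", "I-PER", "O", "B-LOC", "I-ORG"]))
# [('PER', 0, 2), ('LOC', 3, 4), ('ORG', 4, 5)]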
31. Pragmatically Informative Color Generation by Grounding Contextual Modifiers [PDF] 返回目录
Zhengxuan Wu, Desmond C. Ong
Abstract: Grounding language in contextual information is crucial for fine-grained natural language understanding. One important task that involves grounding contextual modifiers is color generation. Given a reference color "green", and a modifier "bluey", how does one generate a color that could represent "bluey green"? We propose a computational pragmatics model that formulates this color generation task as a recursive game between speakers and listeners. In our model, a pragmatic speaker reasons about the inferences that a listener would make, and thus generates a modified color that is maximally informative to help the listener recover the original referents. In this paper, we show that incorporating pragmatic information provides significant improvements in performance compared with other state-of-the-art deep learning models where pragmatic inference and flexibility in representing colors from a large continuous space are lacking. Our model has an absolute 98% increase in performance for the test cases where the reference colors are unseen during training, and an absolute 40% increase in performance for the test cases where both the reference colors and the modifiers are unseen during training.
摘要:在上下文信息接地语言是细粒度的自然语言理解的关键。这涉及接地语境修饰的一个重要任务是颜色的产生。给定一个基准色的“绿色”和修改“发蓝的”,一个人如何产生,可能代表着“发蓝的绿”色?我们建议,制定本颜色生成任务作为演讲嘉宾和听众之间的递归游戏的计算模型,语用学。在我们的模型中,对推断务实扬声器原因的监听器将使,从而生成修改的颜色,是最大限度地信息,帮助听众恢复原始参照物。在本文中,我们将展示与语用推理和代表的颜色的灵活性从大量连续的空间缺乏,而其他国家的最先进的深度学习的机型相比是结合实际的信息在性能显著的改善。我们的模型已经在测试情况下,基准色是在训练中表现看不见的绝对值98%的增长,并且在性能上的绝对增加40%的测试情况下,基准色和改性剂都是在训练中看不见。
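The recursive speaker-listener reasoning described above follows the general Rational Speech Acts pattern. The toy sketch below works over a tiny hand-built discrete lexicon, whereas the paper grounds the game in a continuous, perceptually meaningful color space; it is purely illustrative.

# Toy Rational Speech Acts (RSA) sketch: a pragmatic speaker picks the modifier
# that is most informative about the intended color, reasoning about a literal
# listener. The lexicon values are hand-set for illustration.
import numpy as np

colors = ["green", "bluey green", "blue"]
utterances = ["green", "bluey", "blue"]
# Literal semantics: how well each utterance fits each color (rows: utterances).
lexicon = np.array([
    # green  bluey-green  blue
    [0.9,    0.5,         0.1],   # "green"
    [0.1,    0.8,         0.6],   # "bluey"
    [0.1,    0.4,         0.9],   # "blue"
])

def literal_listener(lex):
    """P_L0(color | utterance): row-normalize the lexicon."""
    return lex / lex.sum(axis=1, keepdims=True)

def pragmatic_speaker(lex, alpha=3.0):
    """P_S1(utterance | color) proportional to P_L0(color | utterance)^alpha."""
    scores = np.exp(alpha * np.log(literal_listener(lex)))   # (utterance, color)
    return (scores / scores.sum(axis=0, keepdims=True)).T    # (color, utterance)

S1 = pragmatic_speaker(lexicon)
best = utterances[int(np.argmax(S1[colors.index("bluey green")]))]
print(best)  # the modifier most informative for "bluey green"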
32. Constrained Decoding for Computationally Efficient Named Entity Recognition Taggers [PDF] 返回目录
Brian Lester, Daniel Pressel, Amy Hemmeter, Sagnik Ray Choudhury, Srinivas Bangalore
Abstract: Current state-of-the-art models for named entity recognition (NER) are neural models with a conditional random field (CRF) as the final layer. Entities are represented as per-token labels with a special structure in order to decode them into spans. Current work eschews prior knowledge of how the span encoding scheme works and relies on the CRF learning which transitions are illegal and which are not to facilitate global coherence. We find that by constraining the output to suppress illegal transitions we can train a tagger with a cross-entropy loss twice as fast as a CRF with differences in F1 that are statistically insignificant, effectively eliminating the need for a CRF. We analyze the dynamics of tag co-occurrence to explain when these constraints are most effective and provide open source implementations of our tagger in both PyTorch and TensorFlow.
摘要:当前状态的最先进的模型命名实体识别(NER)是神经模型与条件随机场(CRF)作为最终层。实体,以便将它们解码为跨度表示为每令牌具有特殊结构的标签。目前的工作如何避开跨度编码方案作品的先验知识,并且依赖于CRF学习哪些转换是非法的,哪些不是,以促进全球一致性。我们通过限制来抑制非法的转变,我们可以快两倍训练捉了交叉熵损失与F1统计不显着,有效地消除了对CRF的需求差异的CRF的输出中找到。我们分析标签共生的动态解释时,这些限制是最有效的,并提供我们在这两个PyTorch和TensorFlow恶搞的开源实现。
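A rough sketch of the constrained-decoding idea: per-token label scores from any tagger are decoded with illegal BIO transitions (e.g., O followed by I-PER) masked to negative infinity, so no learned CRF transition matrix is needed. The label set and scores below are placeholders.

# Sketch of constrained decoding for a BIO tagger: per-token scores come from
# any (here random) model; illegal transitions such as O -> I-PER are masked
# out with -inf at decode time instead of being learned by a CRF.
import numpy as np

labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]

def legal(prev: str, cur: str) -> bool:
    if not cur.startswith("I-"):
        return True                                   # O and B-* are always reachable
    return prev in (f"B-{cur[2:]}", f"I-{cur[2:]}")   # I-X must continue an X span

mask = np.array([[0.0 if legal(p, c) else -np.inf for c in labels] for p in labels])

def constrained_viterbi(emissions: np.ndarray) -> list:
    n, k = emissions.shape
    dp, back = np.full((n, k), -np.inf), np.zeros((n, k), dtype=int)
    dp[0] = emissions[0] + mask[0]           # first token follows an implicit "O"
    for t in range(1, n):
        cand = dp[t - 1][:, None] + mask     # (prev, cur)
        back[t] = cand.argmax(axis=0)
        dp[t] = cand.max(axis=0) + emissions[t]
    path = [int(dp[-1].argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [labels[i] for i in reversed(path)]

rng = np.random.default_rng(0)
print(constrained_viterbi(rng.normal(size=(6, len(labels)))))  # no illegal I-* starts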
33. Style Attuned Pre-training and Parameter Efficient Fine-tuning for Spoken Language Understanding [PDF] 返回目录
Jin Cao, Jun Wang, Wael Hamza, Kelly Vanee, Shang-Wen Li
Abstract: Neural models have yielded state-of-the-art results in deciphering spoken language understanding (SLU) problems; however, these models require a significant amount of domain-specific labeled examples for training, which is prohibitively expensive. While pre-trained language models like BERT have been shown to capture a massive amount of knowledge by learning from unlabeled corpora and solve SLU using fewer labeled examples for adaption, the encoding of knowledge is implicit and agnostic to downstream tasks. Such encoding results in model inefficiencies in parameter usage: an entirely new model is required for every domain. To address these challenges, we introduce a novel SLU framework, comprising a conversational language modeling (CLM) pre-training task and a light encoder architecture. The CLM pre-training enables networks to capture the representation of the language in conversation style with the presence of ASR errors. The light encoder architecture separates the shared pre-trained networks from the mappings of generally encoded knowledge to specific domains of SLU, allowing for the domain adaptation to be performed solely at the light encoder and thus increasing efficiency. With the framework, we match the performance of state-of-the-art SLU results on Alexa internal datasets and on two public ones (ATIS, SNIPS), adding only 4.4% parameters per task.
摘要:神经模型已经破译口语理解(SLU)的问题得到国家的最先进的成果;然而,这些模型需要的领域特定标识样本进行训练,这是非常昂贵的一个显著量。虽然预先训练语言模型,如BERT已被证明由未标记的语料库学习获取知识的巨量和使用更少的标识样本为适应解决SLU,知识的编码是隐含的,不可知的下游任务。在模型的低效率这样的编码结果在参数用法:一个全新的模型是必需的针对每一个域。为了应对这些挑战,我们引入了一个新的框架SLU,包括会话语言模型(CLM)前培训任务和光编码器架构。该CLM前培训,使网络捕捉与ASR错误的存在在谈话风格语言的表示。光编码器的体系结构分离开的通常编码知识映射到SLU的特定结构域的共享预训练的网络中,从而允许域适应要在光编码器仅执行,从而提高了效率。随着框架内,我们在Alexa上的内部数据集,并在两个公立(ATIS,SNIPS)国家的最先进的SLU结果的匹配性能,将每个任务只有4.4%的参数。
34. Plug-and-Play Conversational Models [PDF] 返回目录
Andrea Madotto, Etsuko Ishii, Zhaojiang Lin, Sumanth Dathathri, Pascale Fung
Abstract: There has been considerable progress made towards conversational models that generate coherent and fluent responses; however, this often involves training large language models on large dialogue datasets, such as Reddit. These large conversational models provide little control over the generated responses, and this control is further limited in the absence of annotated conversational datasets for attribute-specific generation that can be used for fine-tuning the model. In this paper, we first propose and evaluate plug-and-play methods for controllable response generation, which do not require dialogue-specific datasets and do not rely on fine-tuning a large model. While effective, the decoding procedure induces considerable computational overhead, rendering the conversational model unsuitable for interactive usage. To overcome this, we introduce an approach that does not require further computation at decoding time, while also not requiring any fine-tuning of a large language model. We demonstrate, through extensive automatic and human evaluation, a high degree of control over the generated conversational responses with regard to multiple desired attributes, while remaining fluent.
摘要:一直朝着产生连贯和流畅应答对话模式取得了长足进步;然而,这往往涉及到对大型数据集的对话,例如Reddit训练大语言模型。这些大的会话模型提供了对产生的响应几乎没有控制,而这种控制是在不存在用于特定属性注释生成会话的数据集可被用于微调该模型的进一步的限制。在本文中,我们首先提出和评估可控反应生成,不需要对话具体数据集,不依赖于微调的大型模型插件和播放方法。虽然有效,解码过程诱导相当大的计算开销,使会话模型不适用于交互式用法。为了克服这个问题,我们引入这并不在解码时需要进一步的计算,同时并不需要大量的语言模型的任何微调的方法。我们证明,通过广泛的自动和人工评估,高度超过关于多个期望的属性所产生的会话的响应控制,而被流畅。
35. NutCracker at WNUT-2020 Task 2: Robustly Identifying Informative COVID-19 Tweets using Ensembling and Adversarial Training [PDF] 返回目录
Priyanshu Kumar, Aadarsh Singh
Abstract: We experiment with COVID-Twitter-BERT and RoBERTa models to identify informative COVID-19 tweets. We further experiment with adversarial training to make our models robust. The ensemble of COVID-Twitter-BERT and RoBERTa obtains a F1-score of 0.9096 (on the positive class) on the test data of WNUT-2020 Task 2 and ranks 1st on the leaderboard. The ensemble of the models trained using adversarial training also produces similar result.
摘要:我们与COVID,Twitter的-BERT和罗伯塔模型进行试验,以确定信息COVID-19鸣叫。我们进一步实验的对抗训练,使我们的模型的鲁棒性。 COVID-Twitter的BERT和罗伯塔的合奏获得的0.9096上WNUT-2020任务2的测试数据的F1-得分(对正类)和居上排行榜第一。该车型的整体使用对抗训练也产生类似的结果训练。
36. Langsmith: An Interactive Academic Text Revision System [PDF] 返回目录
Takumi Ito, Tatsuki Kuribayashi, Masatoshi Hidaka, Jun Suzuki, Kentaro Inui
Abstract: Despite the current diversity and inclusion initiatives in the academic community, researchers with a non-native command of English still face significant obstacles when writing papers in English. This paper presents the Langsmith editor, which assists inexperienced, non-native researchers to write English papers, especially in the natural language processing (NLP) field. Our system can suggest fluent, academic-style sentences to writers based on their rough, incomplete phrases or sentences. The system also encourages interaction between human writers and the computerized revision system. The experimental results demonstrated that Langsmith helps non-native English-speaking students write papers in English. The system is available at https://emnlp-demo.editor.
摘要:尽管在学术界目前的多元化和包容性的举措,研究人员用英语的非本地命令还是英文写论文时面临显著障碍。本文介绍了Langsmith编辑器,它帮助没有经验的,非本地的研究人员写的英文论文,特别是在自然语言处理(NLP)领域。我们的系统能够流畅,学术风格的句子建议根据自己的粗糙,不完整的短语或句子的作家。该系统还鼓励人类的作家和计算机化修正系统之间的交互。实验结果表明,Langsmith帮助非英语为母语的学生扬声器写论文的英文。该系统可为:https://emnlp-demo.editor。这个HTTP URL。
37. Dynamic Context Selection for Document-level Neural Machine Translation via Reinforcement Learning [PDF] 返回目录
Xiaomian Kang, Yang Zhao, Jiajun Zhang, Chengqing Zong
Abstract: Document-level neural machine translation has yielded attractive improvements. However, majority of existing methods roughly use all context sentences in a fixed scope. They neglect the fact that different source sentences need different sizes of context. To address this problem, we propose an effective approach to select dynamic context so that the document-level translation model can utilize the more useful selected context sentences to produce better translations. Specifically, we introduce a selection module that is independent of the translation module to score each candidate context sentence. Then, we propose two strategies to explicitly select a variable number of context sentences and feed them into the translation module. We train the two modules end-to-end via reinforcement learning. A novel reward is proposed to encourage the selection and utilization of dynamic context sentences. Experiments demonstrate that our approach can select adaptive context sentences for different source sentences, and significantly improves the performance of document-level translation methods.
摘要:文档级神经机器翻译取得了有吸引力的改进。然而,大多数的现有方法大致使用所有上下文的句子在一个固定的范围。他们忽略了一个事实,即不同的源句子需要上下文的不同尺寸。为了解决这个问题,我们建议选择动态上下文,以便文档级翻译模型可以利用更多有用的选择的上下文的句子来产生更好的翻译的有效途径。具体来说,我们引入了选择模块独立于翻译模块的得分每个候选上下文句子。然后,我们提出了两种策略来明确选择上下文的句子数量可变的,并将它们送入转换模块。我们培养两个模块通过强化学习结束到终端。一种新型的报酬,提出鼓励动态上下文的句子的选择和利用。实验表明,我们的方法可以选择自适应上下文的句子对不同来源的句子,并显著提高了文档级的翻译方法的性能。
38. How Can Self-Attention Networks Recognize Dyck-n Languages? [PDF] 返回目录
Javid Ebrahimi, Dhruv Gelda, Wei Zhang
Abstract: We focus on the recognition of Dyck-n ($\mathcal{D}_n$) languages with self-attention (SA) networks, which has been deemed to be a difficult task for these networks. We compare the performance of two variants of SA, one with a starting symbol (SA$^+$) and one without (SA$^-$). Our results show that SA$^+$ is able to generalize to longer sequences and deeper dependencies. For $\mathcal{D}_2$, we find that SA$^-$ completely breaks down on long sequences whereas the accuracy of SA$^+$ is 58.82$\%$. We find attention maps learned by $\text{SA}{^+}$ to be amenable to interpretation and compatible with a stack-based language recognizer. Surprisingly, the performance of SA networks is at par with LSTMs, which provides evidence on the ability of SA to learn hierarchies without recursion.
摘要:我们专注于识别戴克-n的($ \ mathcal {d} _n $)具有自注意语言(SA)网络,这已被认为是这些网络的一项艰巨的任务。我们比较SA,一个的两个变体的性能开始符号(SA $ ^ + $),一个没有(SA $ ^ - $)。我们的研究结果表明,SA $ ^ + $能够推广到更长的序列和更深的依赖关系。对于$ \ mathcal {d} $ _2,我们发现SA $ ^ - $完全打破了长序列,而SA $ ^ + $ 58.82为$ \%$的准确性。我们发现,关注通过映射$ \ {文本} SA了解到{^ +} $以便能够进行解释,并兼容基于堆栈的语言识别器。出人意料的是,SA网络的性能是按面值与LSTMs,它提供了SA的学习层次,而不递归的证据能力。
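For reference, Dyck-n is the language of balanced, properly nested strings over n bracket pairs, which a stack-based recognizer handles in a few lines; the question the paper studies is whether self-attention networks, which have no explicit stack, can learn the same behaviour.

# Reference recognizer for Dyck-n: balanced, properly nested strings over n
# bracket pairs. Self-attention networks have no explicit stack, which is why
# recognizing these languages is considered hard for them.
def is_dyck(s: str, pairs=("()", "[]", "{}")) -> bool:
    openers = {p[0]: p[1] for p in pairs}
    closers = set(p[1] for p in pairs)
    stack = []
    for ch in s:
        if ch in openers:
            stack.append(openers[ch])        # remember which closer we expect
        elif ch in closers:
            if not stack or stack.pop() != ch:
                return False
        else:
            return False                      # symbol outside the alphabet
    return not stack

print(is_dyck("([]{})"))   # True  (Dyck-3 with the default pairs)
print(is_dyck("([)]"))     # False (crossing brackets)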
39. Masked ELMo: An evolution of ELMo towards fully contextual RNN language models [PDF] 返回目录
Gregory Senay, Emmanuelle Salin
Abstract: This paper presents Masked ELMo, a new RNN-based model for language model pre-training, evolved from the ELMo language model. Contrary to ELMo which only uses independent left-to-right and right-to-left contexts, Masked ELMo learns fully bidirectional word representations. To achieve this, we use the same Masked language model objective as BERT. Additionally, thanks to optimizations on the LSTM neuron, the integration of mask accumulation and bidirectional truncated backpropagation through time, we have increased the training speed of the model substantially. All these improvements make it possible to pre-train a better language model than ELMo while maintaining a low computational cost. We evaluate Masked ELMo by comparing it to ELMo within the same protocol on the GLUE benchmark, where our model outperforms significantly ELMo and is competitive with transformer approaches.
摘要:本文介绍蒙面埃尔莫,为语言模型前训练一个新的基于RNN模型,从埃尔莫语言模型发展而来的。相反,ELMO只使用独立的左到右,右到左上下文,蒙面ELMO学习完全双向字表示。为了实现这一目标,我们用同样的蒙面语言模型客观的BERT。此外,由于在LSTM神经优化,面具积累和双向截断反向传播的时间,通过整合,我们已经大大提高了模型的训练速度。所有这些改进使得它可以预先训练比ELMO更好的语言模型,同时保持较低的计算成本。我们通过它上胶基准,在我们的模型显著优于ELMO而且很有竞争与变压器接近相同的协议中比较ELMO评估蒙面ELMO。
40. Learning to Evaluate Translation Beyond English: BLEURT Submissions to the WMT Metrics 2020 Shared Task [PDF] 返回目录
Thibault Sellam, Amy Pu, Hyung Won Chung, Sebastian Gehrmann, Qijun Tan, Markus Freitag, Dipanjan Das, Ankur P. Parikh
Abstract: The quality of machine translation systems has dramatically improved over the last decade, and as a result, evaluation has become an increasingly challenging problem. This paper describes our contribution to the WMT 2020 Metrics Shared Task, the main benchmark for automatic evaluation of translation. Our submission is based on BLEURT, a previously published metric based on transfer learning. We extend the metric beyond English and evaluate it on 12 languages for which training examples are available, as well as four "zero-shot" languages, for which we have no fine-tuning data. Additionally, we focus on English to German and demonstrate how to combine BLEURT's predictions with those of YiSi and use alternative reference translations to enhance the performance. Empirical results show that BLEURT achieves competitive results on the WMT Metrics 2019 Shared Task, indicating its promise for the 2020 edition.
摘要:机器翻译系统的质量在过去十年里已经显着改善,并因此,评估已经成为一个越来越具有挑战性的问题。本文介绍了我们对WMT 2020度量共享任务,翻译的自动评估的主要指标的贡献。我们提出基于BLEURT,基于迁移学习先前公布的指标。我们超出英语度量和评估其对12种语言,其训练的例子是可用的,以及四“零射门”的语言,对此我们没有微调的数据。此外,我们专注于英语德语和演示如何BLEURT的预测与一似的结合,并使用替代参考译文,以提高性能。实证结果表明,BLEURT实现对WMT指标2019共享任务的竞争的结果,表明其对2020年版的诺言。
41. Analysis of Disfluency in Children's Speech [PDF] 返回目录
Trang Tran, Morgan Tinkler, Gary Yeung, Abeer Alwan, Mari Ostendorf
Abstract: Disfluencies are prevalent in spontaneous speech, as shown in many studies of adult speech. Less is understood about children's speech, especially in pre-school children who are still developing their language skills. We present a novel dataset with annotated disfluencies of spontaneous explanations from 26 children (ages 5--8), interviewed twice over a year-long period. Our preliminary analysis reveals significant differences between children's speech in our corpus and adult spontaneous speech from two corpora (Switchboard and CallHome). Children have higher disfluency and filler rates, tend to use nasal filled pauses more frequently, and on average exhibit longer reparandums than repairs, in contrast to adult speakers. Despite the differences, an automatic disfluency detection system trained on adult (Switchboard) speech transcripts performs reasonably well on children's speech, achieving an F1 score that is 10\% higher than the score on an adult out-of-domain dataset (CallHome).
摘要:不流利是自发的讲话普遍,如在成年人语音的许多研究。少即是了解有关儿童的讲话,尤其是学龄前儿童谁仍在发展他们的语言技能。我们提出一个新的数据集从26名儿童(5-8岁)自发的解释注释不流利,两次采访过长达一年的时间。我们的初步分析显示,从两个语料库(总机和CallHome)我们的语料库和成人自然语音儿童的言语之间显著的差异。孩子有较高的不流利和填充速率,倾向于更频繁地使用鼻填充停顿,平均显示了较长reparandums比维修,而相比之下,成年扬声器。尽管有分歧,训练有素的成人自动不流利检测系统(总机)语音转录文本表现相当不错对儿童的讲话,获得了F1的分数比对成人外的域数据集(CallHome)的得分高出10 \%。
42. comp-syn: Perceptually Grounded Word Embeddings with Color [PDF] 返回目录
Bhargav Srinivasa Desikan, Tasker Hull, Ethan O. Nadler, Douglas Guilbeault, Aabir Abubaker Kar, Mark Chu, Donald Ruggiero Lo Sardo
Abstract: Popular approaches to natural language processing create word embeddings based on textual co-occurrence patterns, but often ignore embodied, sensory aspects of language. Here, we introduce the Python package comp-syn, which provides grounded word embeddings based on the perceptually uniform color distributions of Google Image search results. We demonstrate that comp-syn significantly enriches models of distributional semantics. In particular, we show that (1) comp-syn predicts human judgments of word concreteness with greater accuracy and in a more interpretable fashion than word2vec using low-dimensional word-color embeddings, and (2) comp-syn performs comparably to word2vec on a metaphorical vs. literal word-pair classification task. comp-syn is open-source on PyPi and is compatible with mainstream machine-learning Python packages. Our package release includes word-color embeddings for over 40,000 English words, each associated with crowd-sourced word concreteness judgments.
摘要:流行的方法自然语言处理创建基于文本的共生模式的嵌入词,但往往忽略了体现,语言的感觉方面。在这里,我们介绍Python包COMP-syn时,它提供了基于谷歌的图片搜索结果的感知均匀颜色分布接地字的嵌入。我们表明,COMP-SYN显著丰富的分布式语义模型。特别是,我们表明,(1)COMP-SYN预测字的具体性的人的判断更准确和更可解释的方式比使用低维字色的嵌入word2vec,和(2)COMP-顺式进行同等于word2vec上隐喻与字面字对分类任务。 COMP-SYN是开源的PyPI上,并与主流机器学习Python包兼容。我们的包装版本包括超过40,000英文单词的字色的嵌入,每个人群来源的词具体性的判断有关。
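A minimal sketch, under assumptions, of one way to test the concreteness claim above: fit a linear probe from low-dimensional word-colour vectors to crowd-sourced concreteness ratings and report cross-validated R^2. The variable names are hypothetical and the comp-syn package's own API is not reproduced here.

```python
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Hypothetical inputs: `color_embeddings` is an (n_words, d) array of
# word-colour vectors, `concreteness` an (n_words,) array of human ratings.
def concreteness_fit_quality(color_embeddings, concreteness):
    """Cross-validated R^2 of a linear probe from word-colour embeddings to
    concreteness judgments, one simple way to probe the reported effect."""
    return cross_val_score(Ridge(), color_embeddings, concreteness,
                           cv=5, scoring='r2').mean()
```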
43. On the Role of Style in Parsing Speech with Neural Models [PDF] 返回目录
Trang Tran, Jiahong Yuan, Yang Liu, Mari Ostendorf
Abstract: The differences in written text and conversational speech are substantial; previous parsers trained on treebanked text have given very poor results on spontaneous speech. For spoken language, the mismatch in style also extends to prosodic cues, though it is less well understood. This paper re-examines the use of written text in parsing speech in the context of recent advances in neural language processing. We show that neural approaches facilitate using written text to improve parsing of spontaneous speech, and that prosody further improves over this state-of-the-art result. Further, we find an asymmetric degradation from read vs. spontaneous mismatch, with spontaneous speech more generally useful for training parsers.
摘要:在书面文字和对话语音是巨大的差异;培训了treebanked文本以前解析器给了自发的讲话非常差的结果。对于口语,在风格上的不匹配也延伸到韵律线索,虽然它不太容易理解。本文重新审视在神经语言处理的最新进展的背景下解析讲话中使用书面文字。我们表明,神经方法能够促进使用书面文字,以提高自然语音的分析,以及韵律进一步提高了国家的最先进的这个结果。此外,我们发现从读与自然不符的不对称的退化,与自然语音更普遍有用的培训解析器。
44. Leveraging Unpaired Text Data for Training End-to-End Speech-to-Intent Systems [PDF] 返回目录
Yinghui Huang, Hong-Kwang Kuo, Samuel Thomas, Zvi Kons, Kartik Audhkhasi, Brian Kingsbury, Ron Hoory, Michael Picheny
Abstract: Training an end-to-end (E2E) neural network speech-to-intent (S2I) system that directly extracts intents from speech requires large amounts of intent-labeled speech data, which is time consuming and expensive to collect. Initializing the S2I model with an ASR model trained on copious speech data can alleviate data sparsity. In this paper, we attempt to leverage NLU text resources. We implemented a CTC-based S2I system that matches the performance of a state-of-the-art, traditional cascaded SLU system. We performed controlled experiments with varying amounts of speech and text training data. When only a tenth of the original data is available, intent classification accuracy degrades by 7.6% absolute. Assuming we have additional text-to-intent data (without speech) available, we investigated two techniques to improve the S2I system: (1) transfer learning, in which acoustic embeddings for intent classification are tied to fine-tuned BERT text embeddings; and (2) data augmentation, in which the text-to-intent data is converted into speech-to-intent data using a multi-speaker text-to-speech system. The proposed approaches recover 80% of performance lost due to using limited intent-labeled speech.
摘要:培训一端至端(E2E)神经网络的语音到意图(S2I)系统,该系统直接提取意图从语音需要大量的意图标记的语音数据,这是耗时且昂贵的收集。初始化训练有素上丰富的语音数据可以减轻数据稀疏的ASR模型S2I模型。在本文中,我们试图利用自然语言理解的文本资源。我们实施了一个国家的最先进的,传统的级联SLU系统的性能相匹配的基于CTC-S2I系统。我们用不同的语音和文本训练数据量进行对照实验。当只有原始数据的十分之一可用,意图分类精度下降了7.6%,是绝对的。假设我们有额外的文本到意图数据(无语音)可用,我们研究了两种技术来提高系统S2I:(1)转让学习,这对于意图分类声的嵌入是绑微调BERT文本的嵌入;和(2)的数据增大,其中,所述文本到意图数据是使用多扬声器的文本到语音系统转换成语音到意图数据。所提出的方法恢复的性能80%,失去了应有的使用限制的意图标记的讲话。
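The transfer-learning idea of tying acoustic embeddings to fine-tuned BERT text embeddings could be written, very roughly, as a two-term loss like the sketch below; the distance function and weighting are assumptions rather than the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def s2i_transfer_loss(acoustic_emb, text_emb, intent_logits, intent_labels, lam=1.0):
    """Illustrative loss for tying acoustic utterance embeddings to fine-tuned
    text (BERT) embeddings during speech-to-intent training: an intent
    classification term plus a term pulling the acoustic embedding towards the
    corresponding text embedding."""
    cls_loss = F.cross_entropy(intent_logits, intent_labels)
    tie_loss = F.mse_loss(acoustic_emb, text_emb.detach())  # text encoder held fixed here
    return cls_loss + lam * tie_loss
```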
45. Fake Reviews Detection through Analysis of Linguistic Features [PDF] 返回目录
Faranak Abri, Luis Felipe Gutierrez, Akbar Siami Namin, Keith S. Jones, David R. W. Sears
Abstract: Online reviews play an integral part for success or failure of businesses. Prior to purchasing services or goods, customers first review the online comments submitted by previous customers. However, it is possible to superficially boost or hinder some businesses through posting counterfeit and fake reviews. This paper explores a natural language processing approach to identify fake reviews. We present a detailed analysis of linguistic features for distinguishing fake and trustworthy online reviews. We study 15 linguistic features and measure their significance and importance towards the classification schemes employed in this study. Our results indicate that fake reviews tend to include more redundant terms and pauses, and generally contain longer sentences. The application of several machine learning classification algorithms revealed that we were able to discriminate fake from real reviews with high accuracy using these linguistic features.
摘要:网上评论发挥企业成败的一个组成部分。在此之前购买的服务或商品,顾客至上审查提交的旧客户的网上评论。然而,有可能通过张贴假冒和虚假评论来提高肤浅或妨碍一些企业。本文探讨了自然语言处理的方法来识别虚假评论。我们提出的语言特征的详细分析,区分虚假和值得信赖的在线评论。我们研究15个语言特点和衡量他们在本研究中所采用的分类方案意义和重要性。我们的研究结果表明,虚假评论往往包含多个冗余的条款和暂停,并且通常包含更长的句子。几种机器学习分类算法的应用表明,我们能够区分假冒的高精度实时评论使用这些语言特征。
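The feature-based pipeline described above might look roughly like the sketch below, with two illustrative, hypothetical features standing in for the paper's 15 and an off-the-shelf classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def linguistic_features(review: str) -> np.ndarray:
    """Two illustrative features only: mean sentence length and a type-token
    ratio as a crude redundancy proxy (the paper's actual feature set differs)."""
    sentences = [s for s in review.split('.') if s.strip()]
    tokens = review.lower().split()
    mean_sent_len = np.mean([len(s.split()) for s in sentences]) if sentences else 0.0
    type_token_ratio = len(set(tokens)) / max(len(tokens), 1)
    return np.array([mean_sent_len, type_token_ratio])

def train_detector(reviews, labels):
    """`reviews` is a list of strings, `labels` marks fake (1) vs. real (0)."""
    X = np.stack([linguistic_features(r) for r in reviews])
    return LogisticRegression().fit(X, labels)
```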
46. Evaluating the Effectiveness of Efficient Neural Architecture Search for Sentence-Pair Tasks [PDF] 返回目录
Ansel MacLaughlin, Jwala Dhamala, Anoop Kumar, Sriram Venkatapathy, Ragav Venkatesan, Rahul Gupta
Abstract: Neural Architecture Search (NAS) methods, which automatically learn entire neural models or individual neural cell architectures, have recently achieved competitive or state-of-the-art (SOTA) performance on a variety of natural language processing and computer vision tasks, including language modeling, natural language inference, and image classification. In this work, we explore the applicability of a SOTA NAS algorithm, Efficient Neural Architecture Search (ENAS) (Pham et al., 2018), to two sentence-pair tasks, paraphrase detection and semantic textual similarity. We use ENAS to perform a micro-level search and learn a task-optimized RNN cell architecture as a drop-in replacement for an LSTM. We explore the effectiveness of ENAS through experiments on three datasets (MRPC, SICK, STS-B), with two different models (ESIM, BiLSTM-Max), and two sets of embeddings (GloVe, BERT). In contrast to prior work applying ENAS to NLP tasks, our results are mixed -- we find that ENAS architectures sometimes, but not always, outperform LSTMs and perform similarly to random architecture search.
摘要:神经结构搜索(NAS)的方法,它可以自动学习整个神经网络模型或单个神经细胞结构,最近已经取得的竞争或国家的最先进的(SOTA)性能的各种自然语言处理和计算机视觉任务,包括语言建模,自然语言推理,和图像分类。在这项工作中,我们探索出一条SOTA NAS算法,高效的神经结构搜索(ENAS)(Pham等人,2018)的适用性两个句话对任务,意译检测和语义文本相似性。我们使用ENAS进行微观层面的搜索和学习任务优化RNN单元架构的下拉更换为LSTM。我们通过对三个数据集(MRPC,SICK,STS-B)的实验探索ENAS的有效性,有两种不同的型号(ESIM,BiLSTM-max)和两套的嵌入的(手套,BERT)。相较于应用ENAS到NLP任务以前的工作,我们的结果是混合 - 我们发现ENAS架构有时,但并非总是如此,跑赢LSTMs,同样进行随机架构搜索。
47. Dual Inference for Improving Language Understanding and Generation [PDF] 返回目录
Yung-Sung Chuang, Shang-Yu Su, Yun-Nung Chen
Abstract: Natural language understanding (NLU) and Natural language generation (NLG) tasks hold a strong dual relationship, where NLU aims at predicting semantic labels based on natural language utterances and NLG does the opposite. The prior work mainly focused on exploiting the duality in model training in order to obtain the models with better performance. However, regarding the fast-growing scale of models in the current NLP area, sometimes we may have difficulty retraining whole NLU and NLG models. To better address the issue, this paper proposes to leverage the duality in the inference stage without the need of retraining. The experiments on three benchmark datasets demonstrate the effectiveness of the proposed method in both NLU and NLG, providing the great potential of practical usage.
摘要:自然语言理解(NLU)和自然语言生成(NLG)任务抱持着强烈的双重关系,其中NLU的目的是基于自然语言语句和NLG则正好相反预测语义标签。现有的工作主要集中在利用模型训练的二元性,从而具有更好的性能,以获得模型。然而,关于车型快速增长的规模在当前NLP领域,有时候我们可能很难再培训全NLU和NLG模型。为了更好地解决这一问题,本文提出利用在推论阶段的两重性,而不需要再培训的。对三个标准数据集上的实验证明在这两个NLU和NLG了该方法的有效性,提供实际使用的巨大潜力。
48. Query-Key Normalization for Transformers [PDF] 返回目录
Alex Henry, Prudhvi Raj Dachapally, Shubham Pawar, Yuxuan Chen
Abstract: Low-resource language translation is a challenging but socially valuable NLP task. Building on recent work adapting the Transformer's normalization to this setting, we propose QKNorm, a normalization technique that modifies the attention mechanism to make the softmax function less prone to arbitrary saturation without sacrificing expressivity. Specifically, we apply $\ell_2$ normalization along the head dimension of each query and key matrix prior to multiplying them and then scale up by a learnable parameter instead of dividing by the square root of the embedding dimension. We show improvements averaging 0.928 BLEU over state-of-the-art bilingual benchmarks for 5 low-resource translation pairs from the TED Talks corpus and IWSLT'15.
摘要:低资源语言翻译是一个挑战,但社会价值的NLP任务。最近的工作适应变压器的规范化此设置的基础上,我们提出QKNorm,标准化技术,改变了注意机制,使SOFTMAX功能不易任意饱和度不牺牲表现力。具体来说,我们之前乘以它们应用于沿每个查询和按键矩阵的头部尺寸$ \ ell_2 $正常化,然后通过可学习参数,而不是通过嵌入维的平方根除以比例放大。我们展示的改进平均0.928 BLEU在国家的最先进的双语基准从TED演讲语料库和IWSLT'15 5低资源翻译对。
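A minimal sketch of the attention modification described above, assuming PyTorch-style tensors: queries and keys are l2-normalized along the head dimension so their product holds cosine similarities, which are then scaled up by a learnable scalar g instead of being divided by the square root of the embedding dimension.

```python
import torch
import torch.nn.functional as F

def qknorm_attention(q, k, v, g):
    """QKNorm-style attention as described in the abstract.
    Shapes: q, k, v are (batch, heads, seq_len, d_head); `g` would typically
    be a learned scalar, e.g. torch.nn.Parameter(torch.tensor(1.0))."""
    q = F.normalize(q, p=2, dim=-1)
    k = F.normalize(k, p=2, dim=-1)
    scores = g * torch.matmul(q, k.transpose(-2, -1))   # cosine similarities, scaled up
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, v)
```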
49. PoinT-5: Pointer Network and T-5 based Financial NarrativeSummarisation [PDF] 返回目录
Abhishek Singh
Abstract: Companies provide annual reports to their shareholders at the end of the financial year that describe their operations and financial conditions. The average length of these reports is 80 pages, and they may extend up to 250 pages. In this paper, we propose our methodology PoinT-5 (a combination of Pointer Network and T-5 (Text-to-Text Transfer Transformer) algorithms) that we used in the Financial Narrative Summarisation (FNS) 2020 task. The proposed method uses pointer networks to extract important narrative sentences from the report, and then T-5 is used to paraphrase the extracted sentences into concise yet informative sentences. We evaluate our method using ROUGE-N (1, 2), ROUGE-L, and ROUGE-SU4. The proposed method achieves the highest precision scores on all the metrics, the highest F1 scores on ROUGE-1 and ROUGE-LCS, and is the only solution to cross the MUSE baseline on the ROUGE-LCS metric.
摘要:提供公司在描述他们的经营和财务状况的财务年度结束年度报告股东。这些报告的平均长度为80,并且其可以向上延伸长至250页。在本文中,我们提出我们的方法论点5,我们在金融叙事概要(FNS)2020任务中使用(指针网络和T-5(测试到文本传输变压器)算法的组合)。所提出的方法使用指针网络从报告中提取重要句叙述,然后T-5是用来提取语句套用到简明而信息句。我们评估使用ROUGE-N(1,2),L,和SU4我们的方法。所提出的方法实现了所有度量精度最高分数和在ROUGE1,和LCS最高F1分数和交叉在ROUGE-LCS度量MUSE溶液基线的唯一解决方案。
50. Causal Feature Selection with Dimension Reduction for Interpretable Text Classification [PDF] 返回目录
Guohou Shan, James Foulds, Shimei Pan
Abstract: Text features that are correlated with class labels, but do not directly cause them, are sometimes useful for prediction, but they may not be insightful. As an alternative to traditional correlation-based feature selection, causal inference could reveal more principled, meaningful relationships between text features and labels. To help researchers gain insight into text data, e.g. for social science applications, in this paper we investigate a class of matching-based causal inference methods for text feature selection. Features used in document classification are often high dimensional; however, existing causal feature selection methods use Propensity Score Matching (PSM), which is known to be less effective in high-dimensional spaces. We propose a new causal feature selection framework that combines dimension reduction with causal inference to improve text feature selection. Experiments on both synthetic and real-world data demonstrate the promise of our methods in improving classification and enhancing interpretability.
摘要:这与类标签相关,但并不直接导致他们文本功能,是对sometimesuseful预测,但他们可能不会是有见地的。作为替代传统的相关性basedfeature选择,因果推理可以揭示更多的原则性,betweentext功能和标签有意义的关系。为了帮助研究人员深入了解的文本数据,例如社会scienceapplications,在本文中,我们研究了一类基于匹配因果推理方法fortext特征选择。在文档分类中使用的特征往往是高维的,howeverexisting因果特征选择方法使用其已知BELESS有效高维空间倾向评分匹配(PSM)。我们提出用因果推断新的因果特征选择框架thatcombines降维,提高文本特征选择。实验onboth合成和真实世界的数据证明我们的方法在改善classificationand提高解释性的承诺。
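For reference, the propensity-score-matching baseline the abstract contrasts against can be sketched as below for a single candidate text feature; this is an illustration of PSM in general, not the authors' proposed framework, and the names are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def matched_effect(treatment, covariates, outcome):
    """Minimal PSM sketch for one candidate text feature.
    `treatment`: 0/1 array, whether the feature occurs in each document;
    `covariates`: matrix of the remaining features (possibly dimension-reduced);
    `outcome`: 0/1 class labels.  Returns a crude matched estimate of the
    feature's effect on the label."""
    propensity = LogisticRegression().fit(covariates, treatment).predict_proba(covariates)[:, 1]
    treated = np.where(treatment == 1)[0]
    control = np.where(treatment == 0)[0]
    effects = []
    for i in treated:
        # nearest-neighbour match on propensity score
        j = control[np.argmin(np.abs(propensity[control] - propensity[i]))]
        effects.append(outcome[i] - outcome[j])
    return float(np.mean(effects))
```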
51. The NU Voice Conversion System for the Voice Conversion Challenge 2020: On the Effectiveness of Sequence-to-sequence Models and Autoregressive Neural Vocoders [PDF] 返回目录
Wen-Chin Huang, Patrick Lumban Tobing, Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Toda
Abstract: In this paper, we present the voice conversion (VC) systems developed at Nagoya University (NU) for the Voice Conversion Challenge 2020 (VCC2020). We aim to determine the effectiveness of two recent significant technologies in VC: sequence-to-sequence (seq2seq) models and autoregressive (AR) neural vocoders. Two respective systems were developed for the two tasks in the challenge: for task 1, we adopted the Voice Transformer Network, a Transformer-based seq2seq VC model, and extended it with synthetic parallel data to tackle nonparallel data; for task 2, we used the frame-based cyclic variational autoencoder (CycleVAE) to model the spectral features of a speech waveform and the AR WaveNet vocoder with additional fine-tuning. By comparing with the baseline systems, we confirmed that the seq2seq modeling can improve the conversion similarity and that the use of AR vocoders can improve the naturalness of the converted speech.
摘要:在本文中,我们提出了在名古屋大学(NU)的语音转换挑战2020(VCC2020)开发的语音转换(VC)系统。我们的目标是确定的VC最近两次显著技术的有效性:序列到序列(seq2seq)模型和自回归(AR)神经声码器。两个各自系统是在挑战这两个任务开发:任务1,我们通过语音变压器网络,基于变压器的seq2seq VC模型,并用合成并行数据,以解决平行数据扩展它;为任务2,我们使用了基于帧的循环变分自动编码器(CycleVAE)到语音波形和AR WaveNet声码器与另外的微调的光谱特征进行建模。通过与基线系统相比,我们证实seq2seq建模可提高转换相似,并且采用AR声码器,可提高转换语音的自然度。
52. Baseline System of Voice Conversion Challenge 2020 with Cyclic Variational Autoencoder and Parallel WaveGAN [PDF] 返回目录
Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Toda
Abstract: In this paper, we present a description of the baseline system of Voice Conversion Challenge (VCC) 2020 with a cyclic variational autoencoder (CycleVAE) and Parallel WaveGAN (PWG), i.e., CycleVAEPWG. CycleVAE is a nonparallel VAE-based voice conversion that utilizes converted acoustic features to consider cyclically reconstructed spectra during optimization. On the other hand, PWG is a non-autoregressive neural vocoder that is based on a generative adversarial network for a high-quality and fast waveform generator. In practice, the CycleVAEPWG system can be straightforwardly developed with the VCC 2020 dataset using a unified model for both Task 1 (intralingual) and Task 2 (cross-lingual), where our open-source implementation is available at this https URL. The results of VCC 2020 have demonstrated that the CycleVAEPWG baseline achieves the following: 1) a mean opinion score (MOS) of 2.87 in naturalness and a speaker similarity percentage (Sim) of 75.37% for Task 1, and 2) a MOS of 2.56 and a Sim of 56.46% for Task 2, showing an approximately or nearly average score for naturalness and an above average score for speaker similarity.
摘要:在本文中,我们提出了语音转换挑战(VCC)2020的基线系统的描述与环状变自动编码器(CycleVAE)和并行WaveGAN(PWG),即,CycleVAEPWG。 CycleVAE是不平行的基于VAE声音转换,其利用转换的声学特征在优化过程中考虑周期性重建的光谱。在另一方面,PWG是一种非自回归神经声码器是基于一个生成对抗网络的高品质和快速波形发生器上。在实践中,CycleVAEPWG系统可以直截了当地用一个统一的模型VCC 2020集两个任务1(语内)和任务2(跨语种),在那里我们的开源实现可在此HTTPS URL开发。 VCC 2020的结果已经证实,CycleVAEPWG基线实现以下情况:1)在自然的平均意见得分(MOS)的2.87和75.37%为任务1的扬声器的相似百分比(SIM),和2)的2.56的MOS和56.46%一个SIM卡,任务2,示出了用于自然近似或几乎平均分和用于扬声器相似度高于平均分数。
53. Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter [PDF] 返回目录
Sarah Masud, Subhabrata Dutta, Sakshi Makkar, Chhavi Jain, Vikram Goyal, Amitava Das, Tanmoy Chakraborty
Abstract: Online hate speech, particularly over microblogging platforms like Twitter, has emerged as arguably the most severe issue of the past decade. Several countries have reported a steep rise in hate crimes infuriated by malicious hate campaigns. While the detection of hate speech is one of the emerging research areas, the generation and spread of topic-dependent hate in the information network remain under-explored. In this work, we focus on exploring user behaviour, which triggers the genesis of hate speech on Twitter and how it diffuses via retweets. We crawl a large-scale dataset of tweets, retweets, user activity history, and follower networks, comprising over 161 million tweets from more than $41$ million unique users. We also collect over 600k contemporary news articles published online. We characterize different signals of information that govern these dynamics. Our analyses differentiate the diffusion dynamics in the presence of hate from usual information diffusion. This motivates us to formulate the modelling problem in a topic-aware setting with real-world knowledge. For predicting the initiation of hate speech for any given hashtag, we propose multiple feature-rich models, with the best performing one achieving a macro F1 score of 0.65. Meanwhile, to predict the retweet dynamics on Twitter, we propose RETINA, a novel neural architecture that incorporates exogenous influence using scaled dot-product attention. RETINA achieves a macro F1-score of 0.85, outperforming multiple state-of-the-art models. Our analysis reveals the superlative power of RETINA to predict the retweet dynamics of hateful content compared to the existing diffusion models.
摘要:网上仇恨言论,特别是在微博平台Twitter等,已经成为可以说是过去十年中最严重的问题。一些国家报告了恶意的仇恨运动激怒了仇恨犯罪急剧上升。虽然仇恨言论的检测是新兴的研究领域之一,产生和信息网络中话题相关的仇恨蔓延仍在-探索。在这项工作中,我们重点探讨用户行为,从而引发仇恨言论的Twitter上发生,它是如何通过锐推扩散。我们抓取的鸣叫,转推,用户活动的历史,以及跟随网络的大规模数据集,包括来自超过$ $ 41万的独立用户超过1.61亿鸣叫。我们还收集了60万当代新闻文章在线发表。我们表征支配这些动态的信息不同的信号。我们的分析区分扩散动力学仇恨从平常的信息扩散的存在。这促使我们制定与现实世界的知识主题感知环境的建模问题。对于预测仇恨言论对于任何给定的主题标签的开始,我们提出了多个功能丰富的机型,与表现最好的一个实现的0.65宏F1得分。同时,预测Twitter上转推动力,我们建议视网膜,在使用缩放的点积注意结合外生影响一个新的神经结构。视网膜达到0.85的宏F1-得分,表现优于国家的最先进的多个模型。我们的分析表明视网膜的最高级功率预测相比,现有的扩散模型的仇恨内容的转推动力。
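The abstract does not spell out RETINA's architecture, but folding exogenous news signals into a retweet predictor via scaled dot-product attention could look roughly like the following, with shapes and names given only as assumptions.

```python
import torch

def exogenous_attention(tweet_vec, news_vecs):
    """Generic scaled dot-product attention from a tweet representation
    (query) over exogenous news representations (keys/values), as one way
    to fold external context into a retweet predictor.
    tweet_vec: (batch, d); news_vecs: (batch, n_articles, d)."""
    d = tweet_vec.size(-1)
    q = tweet_vec.unsqueeze(1)                                   # (batch, 1, d)
    scores = torch.matmul(q, news_vecs.transpose(-2, -1)) / d ** 0.5
    weights = torch.softmax(scores, dim=-1)                      # (batch, 1, n_articles)
    context = torch.matmul(weights, news_vecs).squeeze(1)        # (batch, d)
    return torch.cat([tweet_vec, context], dim=-1)               # fused representation
```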
54. Event Representation with Sequential, Semi-Supervised Discrete Variables [PDF] 返回目录
Mohammad Mehdi Rezaee Taghiabadi, Francis Ferraro
Abstract: Within the context of event modeling and understanding, we propose a new method for neural sequence modeling that takes partially-observed sequences of discrete, external knowledge into account. We construct a sequential, neural variational autoencoder that uses a carefully defined encoder, and Gumbel-Softmax reparametrization, to allow for successful backpropagation during training. We show that our approach outperforms multiple baselines and the state-of-the-art in narrative script induction on multiple event modeling tasks. We demonstrate that our approach converges more quickly.
摘要:在事件建模和理解的背景下,我们提出了神经序列建模等需要分离,外部知识的部分,观测序列考虑的新方法。我们构建了一个连续的,神经变自动编码器使用一个精心定义的编码器,以及冈贝尔-使用SoftMax重新参数,允许在训练中成功的反向传播。我们证明了我们的方法优于多基线和国家的最先进的叙事脚本感应多个事件建模任务。我们证明我们的方法更迅速收敛。
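The Gumbel-Softmax reparametrization mentioned above can be sketched as follows: perturb the logits with Gumbel(0, 1) noise and apply a temperature-controlled softmax, yielding a differentiable relaxation of sampling a one-hot discrete variable. PyTorch also ships torch.nn.functional.gumbel_softmax for the same purpose.

```python
import torch

def gumbel_softmax_sample(logits, tau=1.0):
    """Gumbel-Softmax reparametrization: add Gumbel(0, 1) noise to the logits
    and take a temperature-controlled softmax, so gradients can flow through
    an (approximate) discrete sample during training."""
    u = torch.rand_like(logits)
    gumbel = -torch.log(-torch.log(u + 1e-20) + 1e-20)   # Gumbel(0, 1) noise
    return torch.softmax((logits + gumbel) / tau, dim=-1)
```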
55. Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling [PDF] 返回目录
Jonathan Shen, Ye Jia, Mike Chrzanowski, Yu Zhang, Isaac Elias, Heiga Zen, Yonghui Wu
Abstract: This paper presents Non-Attentive Tacotron based on the Tacotron 2 text-to-speech model, replacing the attention mechanism with an explicit duration predictor. This improves robustness significantly as measured by unaligned duration ratio and word deletion rate, two metrics introduced in this paper for large-scale robustness evaluation using a pre-trained speech recognition model. With the use of Gaussian upsampling, Non-Attentive Tacotron achieves a 5-scale mean opinion score for naturalness of 4.41, slightly outperforming Tacotron 2. The duration predictor enables both utterance-wide and per-phoneme control of duration at inference time. When accurate target durations are scarce or unavailable in the training data, we propose a method using a fine-grained variational auto-encoder to train the duration predictor in a semi-supervised or unsupervised manner, with results almost as good as supervised training.
摘要:本文介绍非细心Tacotron基础上,Tacotron 2文本 - 语音模型,有一个明确的时间预测取代注意机制。这如由未对齐的持续时间比和字缺失率测量显著提高的鲁棒性,在本文中介绍了使用预训练的语音识别模型大规模稳健性评估两个指标。与使用高斯采样的,非细心Tacotron实现了5级平均意见得分为4.41自然,略优于Tacotron 2.持续时间预测同时启用发声宽和每音素在推理时间的持续时间的控制。当精确的目标持续时间是稀缺的,或在训练数据不可用,我们建议使用细粒度变自动编码器来训练的半监督或监督的方式持续时间预测因子的方法,其结果几乎一样好监督训练。
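One common formulation of Gaussian upsampling, given here only as a sketch under assumptions (fixed sigma, frame midpoints as query positions), centres a Gaussian at the midpoint of each token's predicted duration span and builds every output frame as a softmax-weighted sum of encoder states.

```python
import torch

def gaussian_upsample(encoder_states, durations, sigma=1.0):
    """Sketch of Gaussian upsampling with predicted durations.
    encoder_states: (n_tokens, d); durations: (n_tokens,) float tensor of
    predicted frame counts.  Returns (total_frames, d)."""
    ends = torch.cumsum(durations, dim=0)
    centers = ends - durations / 2.0                        # midpoint of each token's span
    total_frames = int(ends[-1].item())
    t = torch.arange(total_frames, dtype=torch.float32) + 0.5   # frame midpoints
    # (total_frames, n_tokens) weights from squared distance to each centre
    logits = -((t.unsqueeze(1) - centers.unsqueeze(0)) ** 2) / (2.0 * sigma ** 2)
    weights = torch.softmax(logits, dim=1)
    return torch.matmul(weights, encoder_states)
```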
56. Widget Captioning: Generating Natural Language Description for Mobile User Interface Elements [PDF] 返回目录
Yang Li, Gang Li, Luheng He, Jingjie Zheng, Hong Li, Zhiwei Guan
Abstract: Natural language descriptions of user interface (UI) elements such as alternative text are crucial for accessibility and language-based interaction in general. Yet, these descriptions are constantly missing in mobile UIs. We propose widget captioning, a novel task for automatically generating language descriptions for UI elements from multimodal input including both the image and the structural representations of user interfaces. We collected a large-scale dataset for widget captioning with crowdsourcing. Our dataset contains 162,859 language phrases created by human workers for annotating 61,285 UI elements across 21,750 unique UI screens. We thoroughly analyze the dataset, and train and evaluate a set of deep model configurations to investigate how each feature modality as well as the choice of learning strategies impact the quality of predicted captions. The task formulation and the dataset as well as our benchmark models contribute a solid basis for this novel multimodal captioning task that connects language and user interfaces.
摘要:用户界面(UI)元素的自然语言描述,诸如可选文本是用于一般访问性和基于语言的相互作用至关重要。然而,这些描述都在不断丢失的移动用户界面。我们建议插件字幕,一种新颖的任务从输入的多峰包括图像和用户界面的结构表示两者自动生成用于UI元素语言描述。我们收集部件字幕大规模数据集众包。我们的数据包含由人类工人注释跨越21,750独特的UI界面61285个UI元素创建162859个语言短语。我们深入分析数据集,并培训和评估一套深层模型配置,以调查各特征模式,以及学习策略的选择如何影响预测字幕的质量。任务的制定和数据集,以及我们的标杆车型贡献了坚实的基础,这种新型多任务字幕连接的语言和用户界面。
注:中文为机器翻译结果!封面为论文标题词云图!