Table of Contents
1. Leveraging the Inherent Hierarchy of Vacancy Titles for Automated Job Ontology Expansion [PDF] Abstract
5. At Which Level Should We Extract? An Empirical Study on Extractive Document Summarization [PDF] Abstract
7. Data Manipulation: Towards Effective Instance Learning for Neural Dialogue Generation via Learning to Augment and Reweight [PDF] Abstract
8. Learning to Summarize Passages: Mining Passage-Summary Pairs from Wikipedia Revision Histories [PDF] Abstract
13. An analysis of the utility of explicit negative examples to improve the syntactic abilities of neural language models [PDF] Abstract
15. Grayscale Data Construction and Multi-Level Ranking Objective for Dialogue Response Selection [PDF] Abstract
17. Learning to Recover Reasoning Chains for Multi-Hop Question Answering via Cooperative Games [PDF] Abstract
19. Domain-based Latent Personal Analysis and its use for impersonation detection in social media [PDF] Abstract
40. Incorporating Bilingual Dictionaries for Low Resource Semi-Supervised Neural Machine Translation [PDF] Abstract
42. Generating Hierarchical Explanations on Text Classification via Feature Interaction Detection [PDF] Abstract
47. Learning a Simple and Effective Model for Multi-turn Response Generation with Auxiliary Tasks [PDF] Abstract
49. A Dependency Syntactic Knowledge Augmented Interactive Architecture for End-to-End Aspect-based Sentiment Analysis [PDF] Abstract
50. Pre-Trained and Attention-Based Neural Networks for Building Noetic Task-Oriented Dialogue Systems [PDF] Abstract
51. An Iterative Knowledge Transfer Network with Routing for Aspect-based Sentiment Analysis [PDF] Abstract
54. Conversational Question Reformulation via Sequence-to-Sequence Architectures and Pretrained Language Models [PDF] Abstract
57. CG-BERT: Conditional Text Generation with BERT for Generalized Few-shot Intent Detection [PDF] Abstract
62. A Hierarchical Fine-Tuning Approach Based on Joint Embedding of Words and Parent Categories for Hierarchical Multi-label Text Classification [PDF] Abstract
68. Aggressive, Repetitive, Intentional, Visible, and Imbalanced: Refining Representations for Cyberbullying Classification [PDF] Abstract
Abstracts
1. Leveraging the Inherent Hierarchy of Vacancy Titles for Automated Job Ontology Expansion [PDF] Back to Contents
Jeroen Van Hautte, Vincent Schelstraete, Mikaël Wornoo
Abstract: Machine learning plays an ever-bigger part in online recruitment, powering intelligent matchmaking and job recommendations across many of the world's largest job platforms. However, the main text is rarely enough to fully understand a job posting: more often than not, much of the required information is condensed into the job title. Several organised efforts have been made to map job titles onto a hand-made knowledge base so as to provide this information, but these only cover around 60% of online vacancies. We introduce a novel, purely data-driven approach towards the detection of new job titles. Our method is conceptually simple, extremely efficient and competitive with traditional NER-based approaches. Although the standalone application of our method does not outperform a finetuned BERT model, it can be applied as a preprocessing step as well, substantially boosting accuracy across several architectures.
2. Meta-Learning for Few-Shot NMT Adaptation [PDF] Back to Contents
Amr Sharaf, Hany Hassan, Hal Daumé III
Abstract: We present META-MT, a meta-learning approach to adapt Neural Machine Translation (NMT) systems in a few-shot setting. META-MT provides a new approach to make NMT models easily adaptable to many target domains with the minimal amount of in-domain data. We frame the adaptation of NMT systems as a meta-learning problem, where we learn to adapt to new unseen domains based on simulated offline meta-training domain adaptation tasks. We evaluate the proposed meta-learning strategy on ten domains with general large scale NMT systems. We show that META-MT significantly outperforms classical domain adaptation when very few in-domain examples are available. Our experiments show that META-MT can outperform classical fine-tuning by up to 2.5 BLEU points after seeing only 4,000 translated words (300 parallel sentences).
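The abstract casts few-shot adaptation as meta-learning over simulated domain-adaptation tasks. As a rough illustration of that training pattern (not the paper's algorithm), the sketch below runs a Reptile-style first-order meta-update over simulated domains; the linear model and random batches are placeholders for an NMT system and in-domain data.

```python
import copy
import torch
import torch.nn as nn

# Toy stand-in for an NMT model; a real system would be a seq2seq network.
model = nn.Linear(8, 8)
meta_lr, inner_lr, inner_steps = 0.1, 0.01, 5

def simulated_domain_batch():
    # Placeholder for sampling a few-shot batch from one simulated domain.
    x = torch.randn(16, 8)
    return x, x.detach()  # dummy (source, target) pair

for episode in range(100):
    adapted = copy.deepcopy(model)                # clone meta-weights
    opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    for _ in range(inner_steps):                  # inner loop: adapt to one domain
        x, y = simulated_domain_batch()
        loss = nn.functional.mse_loss(adapted(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                         # outer loop: move meta-weights
        for p, q in zip(model.parameters(), adapted.parameters()):
            p += meta_lr * (q - p)                # Reptile-style meta-update
```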
3. Evaluating NLP Models via Contrast Sets [PDF] Back to Contents
Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, Ally Zhang, Ben Zhou
Abstract: Standard test sets for supervised learning evaluate in-distribution generalization. Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities. We propose a new annotation paradigm for NLP that helps to close systematic gaps in the test data. In particular, after a dataset is constructed, we recommend that the dataset authors manually perturb the test instances in small but meaningful ways that (typically) change the gold label, creating contrast sets. Contrast sets provide a local view of a model's decision boundary, which can be used to more accurately evaluate a model's true linguistic capabilities. We demonstrate the efficacy of contrast sets by creating them for 10 diverse NLP datasets (e.g., DROP reading comprehension, UD parsing, IMDb sentiment analysis). Although our contrast sets are not explicitly adversarial, model performance is significantly lower on them than on the original test sets---up to 25% in some cases. We release our contrast sets as new evaluation benchmarks and encourage future dataset construction efforts to follow similar annotation processes.
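To make the annotation paradigm concrete, here is an invented IMDb-style contrast pair: a small, meaningful edit that flips the gold label, of the kind the dataset authors would write by hand.

```python
# Hypothetical IMDb-style contrast pair; the edit is small but label-changing.
original = {"text": "The acting was subtle and the plot kept me hooked.",
            "label": "positive"}
contrast = {"text": "The acting was subtle but the plot never hooked me.",
            "label": "negative"}
```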
4. Quantum Inspired Word Representation and Computation [PDF] Back to Contents
Shen Li, Renfen Hu, Jinshan Wu
Abstract: Word meaning has different aspects, while the existing word representation "compresses" these aspects into a single vector, and it needs further analysis to recover the information in different dimensions. Inspired by quantum probability, we represent words as density matrices, which are inherently capable of representing mixed states. The experiment shows that the density matrix representation can effectively capture different aspects of word meaning while maintaining comparable reliability with the vector representation. Furthermore, we propose a novel method to combine the coherent summation and incoherent summation in the computation of both vectors and density matrices. It achieves consistent improvement on word analogy task.
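The density-matrix representation has a compact form: a word becomes a probability-weighted mixture of rank-one projectors built from unit-norm sense vectors, and $\mathrm{Tr}(\rho_1 \rho_2)$ gives a similarity between two words. A minimal numpy sketch with made-up vectors and weights:

```python
import numpy as np

def density_matrix(vectors, probs):
    """Mixture of rank-one projectors: rho = sum_i p_i * v_i v_i^T."""
    rho = np.zeros((vectors.shape[1], vectors.shape[1]))
    for v, p in zip(vectors, probs):
        v = v / np.linalg.norm(v)          # pure states are unit vectors
        rho += p * np.outer(v, v)
    return rho                             # trace(rho) == 1 by construction

rng = np.random.default_rng(0)
rho_a = density_matrix(rng.normal(size=(2, 4)), [0.7, 0.3])  # two "senses"
rho_b = density_matrix(rng.normal(size=(1, 4)), [1.0])       # one "sense"
similarity = np.trace(rho_a @ rho_b)       # quantum analogue of a dot product
print(similarity)
```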
5. At Which Level Should We Extract? An Empirical Study on Extractive Document Summarization [PDF] Back to Contents
Qingyu Zhou, Furu Wei, Ming Zhou
Abstract: Extractive methods have proven to be very effective in automatic document summarization. Previous works perform this task by identifying informative contents at sentence level. However, it is unclear whether performing extraction at sentence level is the best solution. In this work, we show that unnecessity and redundancy issues exist when extracting full sentences, and extracting sub-sentential units is a promising alternative. Specifically, we propose extracting sub-sentential units on the corresponding constituency parsing tree. A neural extractive model which leverages the sub-sentential information and extracts them is presented. Extensive experiments and analyses show that extracting sub-sentential units performs competitively comparing to full sentence extraction under the evaluation of both automatic and human evaluations. Hopefully, our work could provide some inspiration of the basic extraction units in extractive summarization for future research.
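To see what "sub-sentential units on the constituency parsing tree" means, the sketch below (assuming the `nltk` package and a pre-parsed bracketed tree) enumerates constituent spans as candidate extraction units; the paper's neural model then scores such candidates.

```python
from nltk import Tree  # pip install nltk

parse = Tree.fromstring(
    "(S (NP (DT The) (NN model)) (VP (VBZ extracts) (NP (JJ short) (NNS units))))")

# Every constituent above the POS level is a candidate sub-sentential unit.
candidates = [" ".join(st.leaves()) for st in parse.subtrees()
              if st.height() > 2]
print(candidates)
# ['The model extracts short units', 'The model', 'extracts short units', 'short units']
```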
6. Sparse Text Generation [PDF] Back to Contents
Pedro Henrique Martins, Zita Marinho, André F. T. Martins
Abstract: Current state-of-the-art text generators build on powerful language models such as GPT-2, which have impressive performance. However, to avoid degenerate text, they require sampling from a modified softmax, via temperature parameters or ad-hoc truncation techniques, as in top-$k$ or nucleus sampling. This creates a mismatch between training and testing conditions. In this paper, we use the recently introduced entmax transformation to train and sample from a natively sparse language model, avoiding this mismatch. The result is a text generator with favorable performance in terms of fluency and consistency, fewer repetitions, and n-gram diversity closer to human text. In order to evaluate our model, we propose three new metrics that are tailored for comparing sparse or truncated distributions: $\epsilon$-perplexity, sparsemax score, and Jensen-Shannon divergence. Human-evaluated experiments in story completion and dialogue generation show that entmax sampling leads to more engaging and coherent stories and conversations.
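The "natively sparse" model comes from the entmax family of transformations, whose simplest member is sparsemax; the standard sparsemax projection below (plain numpy, not the paper's entmax-1.5 implementation) shows how low-scoring tokens receive exactly zero probability, so sampling needs no ad-hoc truncation.

```python
import numpy as np

def sparsemax(z):
    """Project logits z onto the simplex; many entries become exactly 0."""
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum           # which entries stay nonzero
    k_max = k[support][-1]
    tau = (cumsum[k_max - 1] - 1) / k_max         # threshold
    return np.maximum(z - tau, 0.0)

logits = np.array([3.0, 2.8, 0.1, -1.0])
p = sparsemax(logits)
print(p, p.sum())   # ~[0.6 0.4 0.  0. ], sums to 1: low-score tokens get p=0
```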
7. Data Manipulation: Towards Effective Instance Learning for Neural Dialogue Generation via Learning to Augment and Reweight [PDF] Back to Contents
Hengyi Cai, Hongshen Chen, Yonghao Song, Cheng Zhang, Xiaofang Zhao, Dawei Yin
Abstract: Current state-of-the-art neural dialogue models learn from human conversations following the data-driven paradigm. As such, a reliable training corpus is the crux of building a robust and well-behaved dialogue model. However, due to the open-ended nature of human conversations, the quality of user-generated training data varies greatly, and effective training samples are typically insufficient while noisy samples frequently appear. This impedes the learning of those data-driven neural dialogue models. Therefore, effective dialogue learning requires not only more reliable learning samples, but also fewer noisy samples. In this paper, we propose a data manipulation framework to proactively reshape the data distribution towards reliable samples by augmenting and highlighting effective learning samples as well as reducing the effect of inefficient samples simultaneously. In particular, the data manipulation model selectively augments the training samples and assigns an importance weight to each instance to reform the training data. Note that, the proposed data manipulation framework is fully data-driven and learnable. It not only manipulates training samples to optimize the dialogue generation model, but also learns to increase its manipulation skills through gradient descent with validation samples. Extensive experiments show that our framework can improve the dialogue generation performance with respect to 13 automatic evaluation metrics and human judgments.
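Mechanically, the reweighting half of the framework amounts to a per-instance weighted loss; in the paper the weights come from a learned manipulation model, while in this sketch they are placeholders.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10, requires_grad=True)  # 4 instances, toy 10-way vocab
targets = torch.randint(0, 10, (4,))
weights = torch.tensor([1.3, 0.2, 1.0, 0.5])     # placeholder importance weights

per_instance = F.cross_entropy(logits, targets, reduction="none")  # shape (4,)
loss = (weights * per_instance).mean()           # noisy samples contribute less
loss.backward()                                  # would update the dialogue model
```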
8. Learning to Summarize Passages: Mining Passage-Summary Pairs from Wikipedia Revision Histories [PDF] Back to Contents
Qingyu Zhou, Furu Wei, Ming Zhou
Abstract: In this paper, we propose a method for automatically constructing a passage-to-summary dataset by mining the Wikipedia page revision histories. In particular, the method mines the main body passages and the introduction sentences which are added to the pages simultaneously. The constructed dataset contains more than one hundred thousand passage-summary pairs. The quality analysis shows that it is promising that the dataset can be used as a training and validation set for passage summarization. We validate and analyze the performance of various summarization systems on the proposed dataset. The dataset will be available online at this https URL.
9. Bootstrapping a Crosslingual Semantic Parser [PDF] Back to Contents
Tom Sherborne, Yumo Xu, Mirella Lapata
Abstract: Datasets for semantic parsing scarcely consider languages other than English and professional translation can be prohibitively expensive. In this work, we propose to adapt a semantic parser trained on a single language, such as English, to new languages and multiple domains with minimal annotation. We evaluate if machine translation is an adequate substitute for training data, and extend this to investigate bootstrapping using joint training with English, paraphrasing, and resources such as multilingual BERT. Experimental results on a new version of ATIS and Overnight in German and Chinese indicate that MT can approximate training data in a new language for accurate parsing when augmented with paraphrasing through multiple MT engines.
10. Dictionary-based Data Augmentation for Cross-Domain Neural Machine Translation [PDF] Back to Contents
Wei Peng, Chongxuan Huang, Tianhao Li, Yun Chen, Qun Liu
Abstract: Existing data augmentation approaches for neural machine translation (NMT) have predominantly relied on back-translating in-domain (IND) monolingual corpora. These methods suffer from issues associated with a domain information gap, which leads to translation errors for low frequency and out-of-vocabulary terminology. This paper proposes a dictionary-based data augmentation (DDA) method for cross-domain NMT. DDA synthesizes a domain-specific dictionary with general domain corpora to automatically generate a large-scale pseudo-IND parallel corpus. The generated pseudo-IND data can be used to enhance a general domain trained baseline. The experiments show that the DDA-enhanced NMT models demonstrate consistent significant improvements, outperforming the baseline models by 3.75-11.53 BLEU. The proposed method is also able to further improve the performance of the back-translation based and IND-finetuned NMT models. The improvement is associated with the enhanced domain coverage produced by DDA.
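A toy sketch of the dictionary-substitution idea (the dictionary, language pair, and sentences are all invented): entries from a domain-specific dictionary are swapped into a general-domain parallel pair to synthesize a pseudo in-domain pair.

```python
# Hypothetical domain dictionary: general term -> (src domain term, tgt domain term)
domain_dict = {("drug", "药物"): ("antibiotic", "抗生素")}

src = "the patient took the drug daily"
tgt = "病人每天服用药物"

for (g_src, g_tgt), (d_src, d_tgt) in domain_dict.items():
    if g_src in src and g_tgt in tgt:
        pseudo = (src.replace(g_src, d_src), tgt.replace(g_tgt, d_tgt))
        print(pseudo)  # ('the patient took the antibiotic daily', '病人每天服用抗生素')
```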
11. Distinguish Confusing Law Articles for Legal Judgment Prediction [PDF] Back to Contents
Nuo Xu, Pinghui Wang, Long Chen, Li Pan, Xiaoyan Wang, Junzhou Zhao
Abstract: Legal Judgment Prediction (LJP) is the task of automatically predicting a law case's judgment results given a text describing its facts, which has great prospects in judicial assistance systems and handy services for the public. In practice, confusing charges are often presented, because law cases applicable to similar law articles are easily misjudged. To address this issue, the existing work relies heavily on domain experts, which hinders its application in different law systems. In this paper, we present an end-to-end model, LADAN, to solve the task of LJP. To distinguish confusing charges, we propose a novel graph neural network to automatically learn subtle differences between confusing law articles and design a novel attention mechanism that fully exploits the learned differences to attentively extract effective discriminative features from fact descriptions. Experiments conducted on real-world datasets demonstrate the superiority of our LADAN.
12. Building a Norwegian Lexical Resource for Medical Entity Recognition [PDF] Back to Contents
Ildikó Pilán, Pål H. Brekke, Lilja Øvrelid
Abstract: We present a large Norwegian lexical resource of categorized medical terms. The resource merges information from large medical databases, and contains over 77,000 unique entries, including automatically mapped terms from a Norwegian medical dictionary. We describe the methodology behind this automatic dictionary entry mapping based on keywords and suffixes and further present the results of a manual evaluation performed on a subset by a domain expert. The evaluation indicated that ca. 80% of the mappings were correct.
13. An analysis of the utility of explicit negative examples to improve the syntactic abilities of neural language models [PDF] Back to Contents
Hiroshi Noji, Hiroya Takamura
Abstract: We explore the utilities of explicit negative examples in training neural language models. Negative examples here are incorrect words in a sentence, such as "barks" in "*The dogs barks". Neural language models are commonly trained only on positive examples, a set of sentences in the training data, but recent studies suggest that the models trained in this way are not capable of robustly handling complex syntactic constructions, such as long-distance agreement. In this paper, using English data, we first demonstrate that appropriately using negative examples about particular constructions (e.g., subject-verb agreement) will boost the model's robustness on them, with a negligible loss of perplexity. The key to our success is an additional margin loss between the log-likelihoods of a correct word and an incorrect word. We then provide a detailed analysis of the trained models. One of our findings is the difficulty of object-relative clauses for RNNs. We find that even with our direct learning signals the models still suffer from resolving agreement across an object-relative clause. Augmentation of training sentences involving the constructions somewhat helps, but the accuracy still does not reach the level of subject-relative clauses. Although not directly cognitively appealing, our method can be a tool to analyze the true architectural limitation of neural models on challenging linguistic constructions.
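The "additional margin loss" has a standard hinge form: the log-likelihood of the correct word must exceed that of the incorrect one by a margin. A minimal rendering (the margin value is illustrative, and the paper's exact formulation may differ):

```python
import torch

def margin_loss(logp_correct, logp_incorrect, margin=1.0):
    # Hinge: require log p(correct) >= log p(incorrect) + margin.
    return torch.clamp(margin - (logp_correct - logp_incorrect), min=0.0)

# e.g. p("bark" | "The dogs") vs the negative example p("barks" | "The dogs")
print(margin_loss(torch.tensor(-2.1), torch.tensor(-2.5)))  # tensor(0.6000)
```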
14. SelfORE: Self-supervised Relational Feature Learning for Open Relation Extraction [PDF] Back to Contents
Xuming Hu, Lijie Wen, Yusong Xu, Chenwei Zhang, Philip S. Yu
Abstract: Open relation extraction is the task of extracting open-domain relation facts from natural language sentences. Existing works either utilize heuristics or distant-supervised annotations to train a supervised classifier over pre-defined relations, or adopt unsupervised methods with additional assumptions that have less discriminative power. In this work, we proposed a self-supervised framework named SelfORE, which exploits weak, self-supervised signals by leveraging large pretrained language model for adaptive clustering on contextualized relational features, and bootstraps the self-supervised signals by improving contextualized features in relation classification. Experimental results on three datasets show the effectiveness and robustness of SelfORE on open-domain Relation Extraction when comparing with competitive baselines. Source code is available at this https URL.
15. Grayscale Data Construction and Multi-Level Ranking Objective for Dialogue Response Selection [PDF] Back to Contents
Zibo Lin, Deng Cai, Yan Wang, Xiaojiang Liu, Hai-Tao Zheng, Shuming Shi
Abstract: Response selection plays a vital role in building retrieval-based conversation systems. Recent works on enhancing response selection mainly focus on inventing new neural architectures for better modeling the relation between dialogue context and response candidates. In almost all these previous works, binary-labeled training data are assumed: Every response candidate is either positive (relevant) or negative (irrelevant). We propose to automatically build training data with grayscale labels. To make full use of the grayscale training data, we propose a multi-level ranking strategy. Experimental results on two benchmark datasets show that our new training strategy significantly improves performance over existing state-of-the-art matching models in terms of various evaluation metrics.
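One plausible reading of the multi-level ranking objective, with invented scores: candidates carry grayscale levels (ground truth above retrieved above random), and every higher level must outscore every lower level by a margin. The paper's exact loss may differ.

```python
import torch

# Model scores for candidates at three grayscale levels (placeholders).
scores = {2: torch.tensor([0.9]),        # ground-truth response
          1: torch.tensor([0.6, 0.4]),   # grayscale (e.g. retrieved) responses
          0: torch.tensor([0.3, 0.1])}   # random negatives
margin = 0.2

loss = torch.tensor(0.0)
for hi in scores:
    for lo in scores:
        if hi > lo:  # every higher level should beat every lower level
            diff = scores[lo].unsqueeze(0) - scores[hi].unsqueeze(1) + margin
            loss = loss + torch.clamp(diff, min=0).mean()
print(loss)
```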
16. PONE: A Novel Automatic Evaluation Metric for Open-Domain Generative Dialogue Systems [PDF] Back to Contents
Tian Lan, Xian-Ling Mao, Wei Wei, Xiaoyan Gao, Heyan Huang
Abstract: Open-domain generative dialogue systems have attracted considerable attention over the past few years. Currently, how to automatically evaluate them, is still a big challenge problem. As far as we know, there are three kinds of automatic methods to evaluate the open-domain generative dialogue systems: (1) Word-overlap-based metrics; (2) Embedding-based metrics; (3) Learning-based metrics. Due to the lack of systematic comparison, it is not clear which kind of metrics are more effective. In this paper, we will first measure systematically all kinds of automatic evaluation metrics over the same experimental setting to check which kind is best. Through extensive experiments, the learning-based metrics are demonstrated that they are the most effective evaluation metrics for open-domain generative dialogue systems. Moreover, we observe that nearly all learning-based metrics depend on the negative sampling mechanism, which obtains an extremely imbalanced and low-quality dataset to train a score model. In order to address this issue, we propose a novel and feasible learning-based metric that can significantly improve the correlation with human judgments by using augmented POsitive samples and valuable NEgative samples, called PONE. Extensive experiments demonstrate that our proposed evaluation method significantly outperforms the state-of-the-art learning-based evaluation methods, with an average correlation improvement of 13.18%. In addition, we have publicly released the codes of our proposed method and state-of-the-art baselines.
17. Learning to Recover Reasoning Chains for Multi-Hop Question Answering via Cooperative Games [PDF] Back to Contents
Yufei Feng, Mo Yu, Wenhan Xiong, Xiaoxiao Guo, Junjie Huang, Shiyu Chang, Murray Campbell, Michael Greenspan, Xiaodan Zhu
Abstract: We propose the new problem of learning to recover reasoning chains from weakly supervised signals, i.e., the question-answer pairs. We propose a cooperative game approach to deal with this problem, in which how the evidence passages are selected and how the selected passages are connected are handled by two models that cooperate to select the most confident chains from a large set of candidates (from distant supervision). For evaluation, we created benchmarks based on two multi-hop QA datasets, HotpotQA and MedHop; and hand-labeled reasoning chains for the latter. The experimental results demonstrate the effectiveness of our proposed approach.
18. BERT in Negotiations: Early Prediction of Buyer-Seller Negotiation Outcomes [PDF] Back to Contents
Kushal Chawla, Gale Lucas, Jonathan Gratch, Jonathan May
Abstract: The task of building automatic agents that can negotiate with humans in free-form natural language has gained recent interest in the literature. Although there have been initial attempts, combining linguistic understanding with strategy effectively still remains a challenge. Towards this end, we aim to understand the role of natural language in negotiations from a data-driven perspective by attempting to predict a negotiation's outcome, well before the negotiation is complete. Building on the recent advancements in pre-trained language encoders, our model is able to predict correctly within 10% for more than 70% of the cases, by looking at just 60% of the negotiation. These results suggest that rather than just being a way to realize a negotiation, natural language should be incorporated in the negotiation planning as well. Such a framework can be directly used to get feedback for training an automatically negotiating agent.
19. Domain-based Latent Personal Analysis and its use for impersonation detection in social media [PDF] Back to Contents
Osnat Mokryn, Hagit Ben-Shoshan
Abstract: Zipf's law defines an inverse proportion between a word's ranking in a given corpus and its frequency in it, roughly dividing the vocabulary to frequent (popular) words and infrequent ones. Here, we stipulate that within a domain an author's signature can be derived from, in loose terms, the author's missing popular words and frequently used infrequent-words. We devise a method, termed Latent Personal Analysis (LPA), for finding such domain-based personal signatures. LPA determines what words most contributed to the distance between a user's vocabulary from the domain's. We identify the most suitable distance metric for the method among several and construct a personal signature for authors. We validate the correctness and power of the signatures in identifying authors and utilize LPA to identify two types of impersonation in social media: (1) authors with sockpuppets (multiple) accounts; (2) front-user accounts, operated by several authors. We validate the algorithms and employ them over a large scale dataset obtained from a social media site with over 4000 accounts, and corroborate the results employing temporal rate analysis. LPA can be used to devise personal signatures in a wide range of scientific domains in which the constituents have a long-tail distribution of elements.
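A rough sketch of the signature idea with invented counts; the L1 deviation used here is a stand-in, since the paper evaluates several distance metrics. Missing popular words and overused infrequent words dominate the per-word contributions.

```python
from collections import Counter

domain = Counter({"the": 50, "game": 20, "team": 15, "win": 10, "referee": 5})
user   = Counter({"the": 30, "referee": 25, "win": 5})   # invented counts

def freqs(c):
    n = sum(c.values())
    return {w: c[w] / n for w in c}

d, u = freqs(domain), freqs(user)
# Per-word absolute deviation (L1 as a stand-in distance); big contributions
# come from missing popular words or overused infrequent ones.
contrib = {w: abs(u.get(w, 0.0) - d.get(w, 0.0)) for w in set(d) | set(u)}
signature = sorted(contrib.items(), key=lambda kv: -kv[1])[:3]
print(signature)  # referee (overused), game and team (missing) rank highest
```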
20. Neural Machine Translation with Imbalanced Classes [PDF] Back to Contents
Thamme Gowda, Jonathan May
Abstract: We cast neural machine translation (NMT) as a classification task in an autoregressive setting and analyze the limitations of both classification and autoregression components. Classifiers are known to perform better with balanced class distributions during training. Since the Zipfian nature of languages causes imbalanced classes, we explore the effect of class imbalance on NMT. We analyze the effect of vocabulary sizes on NMT performance and reveal an explanation for 'why' certain vocabulary sizes are better than others.
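Concretely, the Zipfian distribution referred to here puts a token's frequency in an inverse power-law relation to its frequency rank,

$$ f(r) \propto \frac{1}{r^{s}}, \qquad s \approx 1, $$

so a handful of high-rank types dominate the training signal while the long tail of classes is rarely observed.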
21. Improved Pretraining for Domain-specific Contextual Embedding Models [PDF] Back to Contents
Subendhu Rongali, Abhyuday Jagannatha, Bhanu Pratap Singh Rawat, Hong Yu
Abstract: We investigate methods to mitigate catastrophic forgetting during domain-specific pretraining of contextual embedding models such as BERT, DistilBERT, and RoBERTa. Recently proposed domain-specific models such as BioBERT, SciBERT and ClinicalBERT are constructed by continuing the pretraining phase on a domain-specific text corpus. Such pretraining is susceptible to catastrophic forgetting, where the model forgets some of the information learned in the general domain. We propose the use of two continual learning techniques (rehearsal and elastic weight consolidation) to improve domain-specific training. Our results show that models trained by our proposed approaches can better maintain their performance on the general domain tasks, and at the same time, outperform domain-specific baseline models on downstream domain tasks.
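Of the two techniques, elastic weight consolidation has a compact form: a quadratic penalty, scaled by (an estimate of) the Fisher information $F_i$, keeps parameters that matter for the general domain close to their pretrained values $\theta_i^*$. A minimal sketch with placeholder Fisher values:

```python
import torch

def ewc_penalty(params, anchor_params, fisher, lam=0.1):
    """sum_i (lam/2) * F_i * (theta_i - theta*_i)^2"""
    return sum((lam / 2) * (f * (p - a) ** 2).sum()
               for p, a, f in zip(params, anchor_params, fisher))

theta = [torch.randn(3, requires_grad=True)]      # current model parameters
theta_star = [theta[0].detach().clone()]          # general-domain anchor weights
fisher = [torch.ones(3)]                          # placeholder Fisher diagonal
task_loss = theta[0].pow(2).sum()                 # stand-in domain pretraining loss
loss = task_loss + ewc_penalty(theta, theta_star, fisher)
loss.backward()
```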
22. Hierarchical Entity Typing via Multi-level Learning to Rank [PDF] 返回目录
Tongfei Chen, Yunmo Chen, Benjamin Van Durme
Abstract: We propose a novel method for hierarchical entity classification that embraces ontological structure both at training time and during prediction. At training, our novel multi-level learning-to-rank loss compares positive types against negative siblings according to the type tree. During prediction, we define a coarse-to-fine decoder that restricts viable candidates at each level of the ontology based on already predicted parent type(s). We achieve state-of-the-art results across multiple datasets, particularly with respect to strict accuracy.
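A hedged sketch of the flavor of loss described: at one level of the type tree, the gold type should outscore its negative siblings by a margin. The paper's exact multi-level formulation may differ; the toy scores and indices are illustrative.

```python
import torch
import torch.nn.functional as F

def sibling_rank_loss(scores, pos_idx, sibling_idx, margin=1.0):
    """Hinge-style learning-to-rank loss at a single ontology level.
    scores: (num_types,) logits; sibling_idx: negative siblings of gold."""
    return F.relu(margin - scores[pos_idx] + scores[sibling_idx]).mean()

scores = torch.randn(50)  # toy type scores
print(sibling_rank_loss(scores, pos_idx=3, sibling_idx=torch.tensor([4, 5, 9])))
```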
23. Natural language processing for word sense disambiguation and information extraction [PDF] 返回目录
K. R. Chowdhary
Abstract: This research work deals with Natural Language Processing (NLP) and the extraction of essential information in an explicit form. The most common information management strategies are Document Retrieval (DR) and Information Filtering. DR systems may work as combine harvesters, bringing back useful material from the vast fields of raw material. With a large amount of potentially useful information in hand, an Information Extraction (IE) system can then transform the raw material by refining and reducing it to a germ of the original text. A Document Retrieval system collects the relevant documents carrying the required information from the repository of texts. An IE system then transforms them into information that is more readily digested and analyzed. It isolates relevant text fragments, extracts relevant information from the fragments, and then arranges the targeted information together in a coherent framework. The thesis presents a new approach to Word Sense Disambiguation using a thesaurus. Illustrative examples support the effectiveness of this approach for speedy and effective disambiguation. A Document Retrieval method based on Fuzzy Logic is described and its application illustrated. A question-answering system describes the operation of information extraction from the retrieved text documents. The process of information extraction for answering a query is considerably simplified by using a Structured Description Language (SDL), which is based on the cardinals of queries in the form of who, what, when, where and why. The thesis concludes with the presentation of a novel strategy, based on the Dempster-Shafer theory of evidential reasoning, for document retrieval and information extraction. This strategy permits the relaxation of many limitations inherent in the Bayesian probabilistic approach.
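The abstract does not spell out the thesaurus-based WSD procedure, so the following Lesk-style overlap sketch only illustrates the general idea; the thesaurus structure and example senses are hypothetical.

```python
def disambiguate(word, context, thesaurus):
    """Pick the sense whose related words overlap most with the context.
    thesaurus: {word: {sense_label: set(related_words)}} (hypothetical)."""
    context_words = set(context.lower().split())
    senses = thesaurus[word]
    return max(senses, key=lambda s: len(senses[s] & context_words))

thesaurus = {"bank": {
    "finance": {"money", "loan", "deposit", "account"},
    "river":   {"water", "shore", "stream", "fishing"},
}}
print(disambiguate("bank", "he opened a deposit account at the bank", thesaurus))
```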
24. Semantics of the Unwritten [PDF] 返回目录
He Bai, Peng Shi, Jimmy Lin, Luchen Tan, Kun Xiong, Wen Gao, Jie Liu, Ming Li
Abstract: The semantics of a text is manifested not only by what is read, but also by what is not read. In this article, we study how implicit "not read" signals such as end-of-paragraph (EOP) and end-of-sequence (EOS) affect the quality of text generation. Transformer-based pretrained language models (LMs) have demonstrated the ability to generate long continuations of good quality. This gives us a platform to demonstrate, for the first time, that paragraph layouts and text endings are also important components of human writing. Specifically, we find that pretrained LMs can generate better continuations by learning to generate the end of a paragraph (EOP) in the fine-tuning stage. Experimental results on English story generation show that EOP can lead to a higher BLEU score and lower EOS perplexity. To further investigate the relationship between text endings and EOP, we conduct experiments with a self-collected Chinese essay dataset on Chinese-GPT2, a character-level LM pre-trained without paragraph breaks or EOS. Experimental results show that Chinese-GPT2 can generate better essay endings with paragraph information. Experiments on both English stories and Chinese essays demonstrate that learning to end paragraphs benefits continuation generation with pretrained LMs.
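A minimal sketch of the kind of data preparation this implies: making paragraph and sequence endings explicit in the fine-tuning corpus so the LM learns to predict them. The marker strings are illustrative, not the paper's tokens.

```python
EOP, EOS = "<eop>", "<eos>"  # illustrative marker strings

def add_layout_markers(document):
    """Append an EOP marker to each paragraph and an EOS marker to the
    document so the LM is trained to generate layout explicitly."""
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    return " ".join(p + " " + EOP for p in paragraphs) + " " + EOS

print(add_layout_markers("First paragraph.\n\nSecond paragraph."))
```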
25. Speaker Recognition using SincNet and X-Vector Fusion [PDF] 返回目录
Mayank Tripathi, Divyanshu Singh, Seba Susan
Abstract: In this paper, we propose an innovative approach to speaker recognition by fusing two recently introduced deep neural networks (DNNs), namely SincNet and X-Vector. The idea behind using SincNet filters on the raw speech waveform is to extract more distinguishing frequency-related features in the initial convolution layers of the CNN architecture. X-Vectors are used to take advantage of the fact that this embedding is an efficient method to produce fixed-dimension features from variable-length speech utterances, something which is challenging with plain CNN techniques, making it efficient in terms of both speed and accuracy. Our approach gets the best of both worlds by combining the X-Vector in the later layers while using SincNet filters in the initial layers of our deep model. This allows the network to learn better embeddings and converge more quickly. Previous works use either X-Vector or SincNet filters, or some modification thereof; we instead introduce a novel fusion architecture that combines both techniques to gather more information about the speech signal, yielding better results. Our method focuses on the VoxCeleb1 dataset for speaker recognition, which we use for both training and testing purposes.
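As a toy illustration of the fusion idea, the sketch below feeds SincNet-style frame features into an x-vector-style back end (statistics pooling plus a classifier). The dimensions, pooling, and module structure are assumptions, not the paper's architecture; 1251 is the VoxCeleb1 speaker count.

```python
import torch
import torch.nn as nn

class FusionSpeakerNet(nn.Module):
    """Toy fusion: a front end standing in for SincNet layers, followed
    by x-vector-style statistics pooling and a speaker classifier."""
    def __init__(self, feat_dim=60, emb_dim=512, n_speakers=1251):
        super().__init__()
        self.frame_net = nn.Sequential(nn.Linear(feat_dim, emb_dim), nn.ReLU())
        self.classifier = nn.Linear(2 * emb_dim, n_speakers)

    def forward(self, frames):  # frames: (batch, T, feat_dim)
        h = self.frame_net(frames)                        # per-frame features
        stats = torch.cat([h.mean(1), h.std(1)], dim=-1)  # statistics pooling
        return self.classifier(stats)

print(FusionSpeakerNet()(torch.randn(2, 100, 60)).shape)  # (2, 1251)
```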
26. Prototype-to-Style: Dialogue Generation with Style-Aware Editing on Retrieval Memory [PDF] 返回目录
Yixuan Su, Yan Wang, Simon Baker, Deng Cai, Xiaojiang Liu, Anna Korhonen, Nigel Collier
Abstract: The ability of a dialog system to express a prespecified language style during conversations has a direct, positive impact on its usability and on user satisfaction. We introduce a new prototype-to-style (PS) framework to tackle the challenge of stylistic dialogue generation. The framework uses an Information Retrieval (IR) system and extracts a response prototype from the retrieved response. A stylistic response generator then takes the prototype and the desired language style as model input to obtain a high-quality and stylistic response. To effectively train the proposed model, we propose a new style-aware learning objective as well as a de-noising learning strategy. Results on three benchmark datasets in two languages demonstrate that the proposed approach significantly outperforms existing baselines in both in-domain and cross-domain evaluations.
27. Syntax-driven Iterative Expansion Language Models for Controllable Text Generation [PDF] 返回目录
Noe Casas, José A. R. Fonollosa, Marta R. Costa-jussà
Abstract: The dominant language modeling paradigms handle text as a sequence of discrete tokens. While these approaches can capture the latent structure of the text, they are inherently constrained to sequential dynamics for text generation. We propose a new paradigm for introducing a syntactic inductive bias into neural language modeling and text generation, where the dependency parse tree is used to drive the Transformer model to generate sentences iteratively, starting from a root placeholder and generating the tokens of the different dependency tree branches in parallel, using either word or subword vocabularies. Our experiments show that this paradigm is effective for text generation, with quality and diversity comparable or superior to those of sequential baselines, and how its inherently controllable generation process enables control over the output syntactic constructions, allowing the induction of stylistic variations.
28. Stylistic Dialogue Generation via Information-Guided Reinforcement Learning Strategy [PDF] 返回目录
Yixuan Su, Deng Cai, Yan Wang, Simon Baker, Anna Korhonen, Nigel Collier, Xiaojiang Liu
Abstract: Stylistic response generation is crucial for building an engaging dialogue system for industrial use. While it has attracted much research interest, existing methods often generate stylistic responses at the cost of content quality (relevance and fluency). To enable a better balance between content quality and style, we introduce a new training strategy, known as Information-Guided Reinforcement Learning (IG-RL). In IG-RL, a training model is encouraged to explore stylistic expressions while being constrained to maintain its content quality. This is achieved by adopting a reinforcement learning strategy with statistical style information guidance for quality-preserving exploration. Experiments on two datasets show that the proposed approach outperforms several strong baselines in terms of overall response performance.
29. Understanding Learning Dynamics for Neural Machine Translation [PDF] 返回目录
Conghui Zhu, Guanlin Li, Lemao Liu, Tiejun Zhao, Shuming Shi
Abstract: Despite the great success of NMT, there still remains a severe challenge: it is hard to interpret the internal dynamics during its training process. In this paper we propose to understand the learning dynamics of NMT by using a recently proposed technique named Loss Change Allocation (LCA)~\citep{lan-2019-loss-change-allocation}. As LCA requires calculating the gradient on an entire dataset for each update, we instead present an approximation to put it into practice in the NMT scenario. Our simulated experiment shows that such approximate calculation is efficient and is empirically proven to deliver results consistent with the brute-force implementation. In particular, extensive experiments on two standard translation benchmark datasets reveal some valuable findings.
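A first-order sketch of the LCA idea: each parameter's share of a training step's loss change is its gradient times its parameter delta. Computing the gradient on a mini-batch rather than the entire dataset is the kind of approximation the abstract alludes to; the toy model and names are illustrative.

```python
import torch
import torch.nn as nn

def loss_change_allocation(model, loss, prev_params):
    """Allocate one update's loss change to parameters via grad . delta."""
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return {
        n: (g * (p.detach() - prev_params[n])).sum().item()
        for (n, p), g in zip(model.named_parameters(), grads)
    }

model = nn.Linear(3, 1)  # stand-in for an NMT model
prev_params = {n: p.detach().clone() for n, p in model.named_parameters()}
loss = model(torch.randn(8, 3)).pow(2).mean()
# Allocations are zero here because no optimizer step has been taken yet.
print(loss_change_allocation(model, loss, prev_params))
```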
30. AR: Auto-Repair the Synthetic Data for Neural Machine Translation [PDF] 返回目录
Shanbo Cheng, Shaohui Kuang, Rongxiang Weng, Heng Yu, Changfeng Zhu, Weihua Luo
Abstract: Compared with using only limited authentic parallel data as the training corpus, many studies have proved that incorporating synthetic parallel data, generated by back translation (BT) or forward translation (FT, or self-training), into the NMT training process can significantly improve translation quality. However, as a well-known shortcoming, synthetic parallel data is noisy because it is generated by an imperfect NMT system. As a result, the improvements in translation quality brought by the synthetic parallel data are greatly diminished. In this paper, we propose a novel Auto-Repair (AR) framework to improve the quality of synthetic data. Our proposed AR model can learn the transformation from a low-quality (noisy) input sentence to a high-quality sentence based on large-scale monolingual data with BT and FT techniques. The noise in synthetic parallel data is thereby largely eliminated by the proposed AR model, and the repaired synthetic parallel data helps the NMT models achieve larger improvements. Experimental results show that our approach can effectively improve the quality of synthetic parallel data, and the NMT model trained with the repaired synthetic data achieves consistent improvements on both the WMT14 EN→DE and IWSLT14 DE→EN translation tasks.
31. Arabic Offensive Language on Twitter: Analysis and Experiments [PDF] 返回目录
Hamdy Mubarak, Ammar Rashed, Kareem Darwish, Younes Samih, Ahmed Abdelali
Abstract: Detecting offensive language on Twitter has many applications ranging from detecting/predicting bullying to measuring polarization. In this paper, we focus on building effective Arabic offensive tweet detection. We introduce a method for building an offensive dataset that is not biased by topic, dialect, or target. We produce the largest Arabic dataset to date with special tags for vulgarity and hate speech. Next, we analyze the dataset to determine which topics, dialects, and gender are most associated with offensive tweets and how Arabic speakers use offensive language. Lastly, we conduct a large battery of experiments to produce strong results (F1 = 79.7) on the dataset using Support Vector Machine techniques.
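A minimal scikit-learn sketch of an SVM tweet classifier in the spirit of the experiments. The character n-gram features, toy data, and label convention are assumptions, not the paper's exact configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Character n-grams are a common choice for dialect-rich text; the
# features behind the reported F1 = 79.7 are not specified here.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5)),
    LinearSVC(C=1.0),
)
train_tweets = ["you are awful", "have a nice day"]  # toy placeholders
train_labels = [1, 0]                                # 1 = offensive (assumed)
clf.fit(train_tweets, train_labels)
print(clf.predict(["what an awful day"]))
```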
32. Detecting and Understanding Generalization Barriers for Neural Machine Translation [PDF] 返回目录
Guanlin Li, Lemao Liu, Conghui Zhu, Tiejun Zhao, Shuming Shi
Abstract: Generalization to unseen instances is our eternal pursuit for all data-driven models. However, for realistic tasks like machine translation, the traditional approach of measuring generalization in an average sense provides poor understanding of fine-grained generalization ability. As a remedy, this paper attempts to identify and understand the generalization barrier words within an unseen input sentence that \textit{cause} the degradation of fine-grained generalization. We propose a principled definition of generalization barrier words and a modified version which is tractable in computation. Based on the modified one, we propose three simple methods for barrier detection by search-aware risk estimation through counterfactual generation. We then conduct extensive analyses of the detected generalization barrier words on both Zh$\Leftrightarrow$En NIST benchmarks from various perspectives. Potential usage of the detected barrier words is also discussed.
33. FastBERT: a Self-distilling BERT with Adaptive Inference Time [PDF] 返回目录
Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju
Abstract: Pre-trained language models like BERT have proven to be highly performant. However, they are often computationally expensive in many practical scenarios, as such heavy models can hardly be readily implemented with limited resources. To improve their efficiency while assuring model performance, we propose a novel speed-tunable FastBERT with adaptive inference time. The speed at inference can be flexibly adjusted under varying demands, while redundant calculation of samples is avoided. Moreover, this model adopts a unique self-distillation mechanism at fine-tuning, further enabling greater computational efficacy with minimal loss in performance. Our model achieves promising results on twelve English and Chinese datasets. Given different speedup thresholds for the speed-performance tradeoff, it can run from 1 to 12 times faster than BERT.
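A hedged sketch of entropy-based early exit, the mechanism behind adaptive inference time: each layer gets its own (self-distilled) classifier head, and inference stops as soon as a head is confident enough. The stand-in layers, pooling, and threshold are illustrative, not FastBERT's actual modules.

```python
import torch
import torch.nn as nn

def adaptive_inference(layers, heads, x, threshold=0.3):
    """Run layer by layer; exit once prediction entropy is low enough.
    The threshold tunes the speed-performance tradeoff."""
    for layer, head in zip(layers, heads):
        x = layer(x)
        probs = torch.softmax(head(x.mean(dim=1)), dim=-1)  # pooled logits
        entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1)
        if entropy.max() < threshold:  # every sample is confident: exit
            return probs
    return probs  # fell through to the top layer

layers = nn.ModuleList([nn.Linear(16, 16) for _ in range(12)])  # stand-ins
heads = nn.ModuleList([nn.Linear(16, 2) for _ in range(12)])
print(adaptive_inference(layers, heads, torch.randn(4, 10, 16)))
```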
34. Reinforced Multi-task Approach for Multi-hop Question Generation [PDF] 返回目录
Deepak Gupta, Hardik Chauhan, Asif Ekbal, Pushpak Bhattacharyya
Abstract: Question generation (QG) attempts to solve the inverse of the question answering (QA) problem by generating a natural language question given a document and an answer. While sequence-to-sequence neural models surpass rule-based systems for QG, they are limited in their capacity to focus on more than one supporting fact. For QG, we often require multiple supporting facts to generate high-quality questions. Inspired by recent works on multi-hop reasoning in QA, we take up multi-hop question generation, which aims at generating relevant questions based on supporting facts in the context. We employ multitask learning with the auxiliary task of answer-aware supporting fact prediction to guide the question generator. In addition, we propose a question-aware reward function in a Reinforcement Learning (RL) framework to maximize the utilization of the supporting facts. We demonstrate the effectiveness of our approach through experiments on the multi-hop question answering dataset HotPotQA. Empirical evaluation shows our model outperforms single-hop neural question generation models on both automatic evaluation metrics, such as BLEU, METEOR, and ROUGE, and human evaluation metrics for quality and coverage of the generated questions.
35. Reference Language based Unsupervised Neural Machine Translation [PDF] 返回目录
Zuchao Li, Hai Zhao, Rui Wang, Masao Utiyama, Eiichiro Sumita
Abstract: Exploiting a common language as an auxiliary for better translation has a long tradition in machine translation: supervised machine translation can enjoy the enhancement delivered by a well-used pivot language when the prerequisite of a parallel corpus from source language to target language cannot be fully satisfied. The rise of unsupervised neural machine translation (UNMT) seems to relieve the parallel-corpus curse completely, though performance so far remains unsatisfactory due to the vague clues available for its core back-translation training. Further enriching the idea of pivot translation by freeing the use of parallel corpora beyond the specified source and target, we propose a new reference-language-based UNMT framework, in which the reference language shares a parallel corpus only with the source, providing a clear enough signal to help the reconstruction training of UNMT through a proposed reference agreement mechanism. Experimental results show that our methods improve the quality of UNMT over a strong baseline with only one auxiliary language, demonstrating the usefulness of the proposed reference-language-based UNMT.
36. GIANT: Scalable Creation of a Web-scale Ontology [PDF] 返回目录
Bang Liu, Weidong Guo, Di Niu, Jinwen Luo, Chaoyue Wang, Zhen Wen, Yu Xu
Abstract: Understanding what online users may pay attention to is key to content recommendation and search services. These services will benefit from a highly structured and web-scale ontology of entities, concepts, events, topics and categories. While existing knowledge bases and taxonomies embody a large volume of entities and categories, we argue that they fail to discover properly grained concepts, events and topics in the language style of online population. Neither is a logically structured ontology maintained among these notions. In this paper, we present GIANT, a mechanism to construct a user-centered, web-scale, structured ontology, containing a large number of natural language phrases conforming to user attentions at various granularities, mined from a vast volume of web documents and search click graphs. Various types of edges are also constructed to maintain a hierarchy in the ontology. We present our graph-neural-network-based techniques used in GIANT, and evaluate the proposed methods as compared to a variety of baselines. GIANT has produced the Attention Ontology, which has been deployed in various Tencent applications involving over a billion users. Online A/B testing performed on Tencent QQ Browser shows that Attention Ontology can significantly improve click-through rates in news recommendation.
37. Unsupervised Domain Clusters in Pretrained Language Models [PDF] 返回目录
Roee Aharoni, Yoav Goldberg
Abstract: The notion of "in-domain data" in NLP is often over-simplistic and vague, as textual data varies in many nuanced linguistic aspects such as topic, style or level of formality. In addition, domain labels are many times unavailable, making it challenging to build domain-specific systems. We show that massive pre-trained language models implicitly learn sentence representations that cluster by domains without supervision -- suggesting a simple data-driven definition of domains in textual data. We harness this property and propose domain data selection methods based on such models, which require only a small set of in-domain monolingual data. We evaluate our data selection methods for neural machine translation across five diverse domains, where they outperform an established approach as measured by both BLEU and by precision and recall of sentence selection with respect to an oracle.
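A minimal sketch of domain data selection via unsupervised clustering of sentence embeddings. The Gaussian mixture, the random stand-in embeddings, and the seed-corpus heuristic are assumptions for illustration; how the embeddings are produced from the pretrained LM is outside this sketch.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Stand-in for (n_sentences, d) sentence vectors from a pretrained LM.
emb = np.random.randn(1000, 768).astype(np.float32)

gmm = GaussianMixture(n_components=5, random_state=0).fit(emb)
domains = gmm.predict(emb)  # an unsupervised "domain" label per sentence

# Selection: keep sentences falling in the cluster of a small seed corpus.
seed_cluster = np.bincount(gmm.predict(emb[:50])).argmax()
selected = np.where(domains == seed_cluster)[0]
print(len(selected), "sentences selected for the seed domain")
```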
38. A Resource for Studying Chatino Verbal Morphology [PDF] 返回目录
Hilaria Cruz, Gregory Stump, Antonios Anastasopoulos
Abstract: We present the first resource focusing on the verbal inflectional morphology of San Juan Quiahije Chatino, a tonal Mesoamerican language spoken in Mexico. We provide a collection of complete inflection tables for 198 lemmata, with morphological tags based on the UniMorph schema. We also provide baseline results on three core NLP tasks: morphological analysis, lemmatization, and morphological inflection.
39. Machine Translation Pre-training for Data-to-Text Generation -- A Case Study in Czech [PDF] 返回目录
Mihir Kale, Scott Roy
Abstract: While there is a large body of research studying deep learning methods for text generation from structured data, almost all of it focuses purely on English. In this paper, we study the effectiveness of machine translation based pre-training for data-to-text generation in non-English languages. Since the structured data is generally expressed in English, text generation into other languages involves elements of translation, transliteration and copying elements already encoded in neural machine translation systems. Moreover, since data-to-text corpora are typically small, this task can benefit greatly from pre-training. Based on our experiments on Czech, a morphologically complex language, we find that pre-training lets us train end-to-end models with significantly improved performance, as judged by automatic metrics and human evaluation. We also show that this approach enjoys several desirable properties, including improved performance in low data scenarios and robustness to unseen slot values.
40. Incorporating Bilingual Dictionaries for Low Resource Semi-Supervised Neural Machine Translation [PDF] 返回目录
Sreyashi Nag, Mihir Kale, Varun Lakshminarasimhan, Swapnil Singhavi
Abstract: We explore ways of incorporating bilingual dictionaries to enable semi-supervised neural machine translation. Conventional back-translation methods have shown success in leveraging target-side monolingual data. However, since the quality of back-translation models is tied to the size of the available parallel corpora, this could adversely impact the synthetically generated sentences in a low-resource setting. We propose a simple data augmentation technique to address this shortcoming. We incorporate widely available bilingual dictionaries that yield word-by-word translations to generate synthetic sentences. This automatically expands the vocabulary of the model while maintaining high-quality content. Our method shows an appreciable improvement in performance over strong baselines.
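A minimal sketch of the word-by-word augmentation described: translating target-side monolingual sentences through a bilingual dictionary yields synthetic source sides. Copying out-of-dictionary words through is an assumed convention, and the toy dictionary is illustrative.

```python
def synthesize_pairs(monolingual_sentences, dictionary):
    """Build (synthetic source, real target) pairs by word-by-word
    dictionary lookup; unknown words are copied through unchanged."""
    pairs = []
    for tgt in monolingual_sentences:
        src = " ".join(dictionary.get(w, w) for w in tgt.split())
        pairs.append((src, tgt))
    return pairs

dictionary = {"house": "casa", "white": "blanca"}  # toy EN->ES entries
print(synthesize_pairs(["white house"], dictionary))
```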
41. End-to-End Abstractive Summarization for Meetings [PDF] 返回目录
Chenguang Zhu, Ruochen Xu, Michael Zeng, Xuedong Huang
Abstract: With the abundance of automatic meeting transcripts, meeting summarization is of great interest to both participants and other parties. Traditional methods of summarizing meetings depend on complex multi-step pipelines that make joint optimization intractable. Meanwhile, there are a handful of deep neural models for text summarization and dialogue systems. However, the semantic structure and styles of meeting transcripts are quite different from articles and conversations. In this paper, we propose a novel end-to-end abstractive summary network that adapts to the meeting scenario. We propose a role vector for each participant and a hierarchical structure to accommodate long meeting transcripts. Empirical results show that our model considerably outperforms previous approaches in both automatic metrics and human evaluation. For example, in the ICSI dataset, the ROUGE-1 score increases from 32.00% to 39.51%.
42. Generating Hierarchical Explanations on Text Classification via Feature Interaction Detection [PDF] 返回目录
Hanjie Chen, Guangtao Zheng, Yangfeng Ji
Abstract: Generating explanations for neural networks has become crucial for their real-world applications with respect to reliability and trustworthiness. In natural language processing, existing methods usually provide, as an explanation, important features such as words or phrases selected from an input text, but ignore the interactions between them. This makes it challenging for humans to interpret an explanation and connect it to the model's prediction. In this work, we build hierarchical explanations by detecting feature interactions. Such explanations visualize how words and phrases are combined at different levels of the hierarchy, which can help users understand the decision-making of black-box models. The proposed method is evaluated with three neural text classifiers (LSTM, CNN, and BERT) on two benchmark datasets, via both automatic and human evaluations. Experiments show the effectiveness of the proposed method in providing explanations that are both faithful to models and interpretable to humans.
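The paper's detection algorithm is more elaborate, but the intuition behind a feature interaction can be illustrated with a crude occlusion-based score: if masking two words jointly changes the prediction by more than the sum of masking each alone, they interact. The `predict` callable and the mask token here are assumptions.

```python
import copy

def occlusion_interaction(predict, tokens, i, j, mask="[MASK]"):
    """Crude interaction score between tokens i and j.

    predict: callable mapping a token list to a class probability.
    Returns the joint effect minus the two individual effects; a value
    far from zero suggests the two tokens interact.
    """
    def masked(*idxs):
        t = copy.copy(tokens)
        for k in idxs:
            t[k] = mask
        return predict(t)

    full = predict(tokens)
    return (full - masked(i, j)) - (full - masked(i)) - (full - masked(j))
```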
43. Talk to Papers: Bringing Neural Question Answering to Academic Search [PDF] 返回目录
Tianchang Zhao, Kyusong Lee
Abstract: We introduce Talk to Papers, which exploits recent open-domain question answering (QA) techniques to improve the current experience of academic search. It is designed to enable researchers to use natural language queries to find precise answers and extract insights from a massive number of academic papers. We present a large improvement over a classic search-engine baseline on several standard QA datasets, and provide the community with a collaborative data collection tool to curate the first natural language processing research QA dataset via a community effort.
44. Graph Sequential Network for Reasoning over Sequences [PDF] 返回目录
Ming Tu, Jing Huang, Xiaodong He, Bowen Zhou
Abstract: Recently Graph Neural Network (GNN) has been applied successfully to various NLP tasks that require reasoning, such as multi-hop machine reading comprehension. In this paper, we consider a novel case where reasoning is needed over graphs built from sequences, i.e. graph nodes with sequence data. Existing GNN models fulfill this goal by first summarizing the node sequences into fixed-dimensional vectors, then applying GNN on these vectors. To avoid information loss inherent in the early summarization and make sequential labeling tasks on GNN output feasible, we propose a new type of GNN called Graph Sequential Network (GSN), which features a new message passing algorithm based on co-attention between a node and each of its neighbors. We validate the proposed GSN on two NLP tasks: interpretable multi-hop reading comprehension on HotpotQA and graph based fact verification on FEVER. Both tasks require reasoning over multiple documents or sentences. Our experimental results show that the proposed GSN attains better performance than the standard GNN based methods.
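As a rough sketch of the co-attention message passing, consider one node attending over a single neighbor's token states; the real GSN aggregates over all neighbors and fuses the message back into the node, and all shapes here are assumed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoAttentionMessage(nn.Module):
    """Message from a neighbor's token states to a node's token states."""

    def __init__(self, d_model):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, h_node, h_nbr):
        # h_node: (n, d), h_nbr: (m, d) -- per-token states of two nodes
        scores = self.proj(h_node) @ h_nbr.t()   # (n, m) affinity matrix
        attn = F.softmax(scores, dim=-1)
        return attn @ h_nbr                      # (n, d) message, no pooling

layer = CoAttentionMessage(64)
msg = layer(torch.randn(10, 64), torch.randn(7, 64))
# A node update could then fuse msg with h_node, e.g. via a gated sum,
# keeping token-level states intact for sequential labeling.
```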
45. Open Domain Dialogue Generation with Latent Images [PDF] 返回目录
Ze Yang, Wei Wu, Huang Hu, Can Xu, Zhoujun Li
Abstract: We consider grounding open domain dialogues with images. Existing work assumes that both an image and a textual context are available, but image-grounded dialogues by nature are more difficult to obtain than textual dialogues. Thus, we propose learning a response generation model with both image-grounded dialogues and textual dialogues by assuming that there is a latent variable in a textual dialogue that represents the image, and trying to recover the latent image through text-to-image generation techniques. The likelihood of the two types of dialogues is then formulated by a response generator and an image reconstructor that are learned within a conditional variational auto-encoding framework. Empirical studies are conducted in both image-grounded conversation and text-based conversation. In the first scenario, image-grounded dialogues, especially under a low-resource setting, can be effectively augmented by textual dialogues with latent images; while in the second scenario, latent images can enrich the content of responses and at the same time keep them relevant to contexts.
46. Hooks in the Headline: Learning to Generate Headlines with Controlled Styles [PDF] 返回目录
Di Jin, Zhijing Jin, Joey Tianyi Zhou, Lisa Orii, Peter Szolovits
Abstract: Current summarization systems only produce plain, factual headlines, but do not meet the practical needs of creating memorable titles to increase exposure. We propose a new task, Stylistic Headline Generation (SHG), to enrich the headlines with three style options (humor, romance and clickbait), in order to attract more readers. With no style-specific article-headline pairs (only a standard headline summarization dataset and mono-style corpora), our method TitleStylist generates style-specific headlines by combining the summarization and reconstruction tasks into a multitasking framework. We also introduce a novel parameter sharing scheme to further disentangle the style from the text. Through both automatic and human evaluation, we demonstrate that TitleStylist can generate relevant, fluent headlines with three target styles: humor, romance, and clickbait. The attraction score of our model's generated headlines surpasses that of the state-of-the-art summarization model by 9.68%, and even outperforms human-written references.
47. Learning a Simple and Effective Model for Multi-turn Response Generation with Auxiliary Tasks [PDF] 返回目录
Yufan Zhao, Can Xu, Wei Wu
Abstract: We study multi-turn response generation for open-domain dialogues. The existing state-of-the-art addresses the problem with deep neural architectures. While these models improved response quality, their complexity also hinders the application of the models in real systems. In this work, we pursue a model that has a simple structure yet can effectively leverage conversation contexts for response generation. To this end, we propose four auxiliary tasks including word order recovery, utterance order recovery, masked word recovery, and masked utterance recovery, and optimize the objectives of these tasks together with maximizing the likelihood of generation. By this means, the auxiliary tasks that relate to context understanding can guide the learning of the generation model to achieve a better local optimum. Empirical studies with three benchmarks indicate that our model can significantly outperform state-of-the-art generation models in terms of response quality on both automatic evaluation and human judgment, and at the same time enjoys a much faster decoding process.
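All four auxiliary objectives amount to corrupting the context and asking the model to restore it, so training instances are easy to manufacture. A toy sketch follows; the corruption rate and mask tokens are invented.

```python
import random

def shuffle_words(utt):                  # word order recovery
    toks = utt.split()
    random.shuffle(toks)
    return " ".join(toks), utt

def mask_words(utt, rate=0.15):          # masked word recovery
    toks = [t if random.random() > rate else "[MASK]" for t in utt.split()]
    return " ".join(toks), utt

def shuffle_utterances(context):         # utterance order recovery
    shuffled = context[:]
    random.shuffle(shuffled)
    return shuffled, context

def mask_utterance(context):             # masked utterance recovery
    i = random.randrange(len(context))
    corrupted = context[:i] + ["[MASK_UTT]"] + context[i + 1:]
    return corrupted, context[i]

context = ["hi there", "how are you doing", "fine thanks"]
print(mask_utterance(context))
```

The per-task losses would then be optimized jointly with the generation likelihood, for instance as a weighted sum.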
48. BAE: BERT-based Adversarial Examples for Text Classification [PDF] 返回目录
Siddhant Garg, Goutham Ramakrishnan
Abstract: Modern text classification models are susceptible to adversarial examples, perturbed versions of the original text indiscernible by humans but which get misclassified by the model. We present BAE, a powerful black box attack for generating grammatically correct and semantically coherent adversarial examples. BAE replaces and inserts tokens in the original text by masking a portion of the text and leveraging a language model to generate alternatives for the masked tokens. Compared to prior work, we show that BAE performs a stronger attack on three widely used models for seven text classification datasets.
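The replace operation can be approximated with an off-the-shelf masked language model. Below is a sketch using the Hugging Face fill-mask pipeline; the model choice and top_k are assumptions, and the actual BAE attack additionally filters candidates for semantic similarity and queries the victim classifier.

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

def propose_replacements(tokens, i, top_k=5):
    """Candidate substitutions for position i, proposed by the MLM."""
    masked = " ".join(tokens[:i] + [fill.tokenizer.mask_token] + tokens[i + 1:])
    return [c["token_str"] for c in fill(masked, top_k=top_k)]

print(propose_replacements("the movie was great".split(), 3))
```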
49. A Dependency Syntactic Knowledge Augmented Interactive Architecture for End-to-End Aspect-based Sentiment Analysis [PDF] 返回目录
Yunlong Liang, Fandong Meng, Jinchao Zhang, Jinan Xu, Yufeng Chen, Jie Zhou
Abstract: The aspect-based sentiment analysis (ABSA) task remains a long-standing challenge: it aims to extract the aspect term and then identify its sentiment polarity. In previous approaches, the explicit syntactic structure of a sentence, which reflects the syntax properties of natural language and hence is intuitively crucial for aspect term extraction and sentiment recognition, is typically neglected or insufficiently modeled. In this paper, we thus propose a novel dependency syntactic knowledge augmented interactive architecture with multi-task learning for end-to-end ABSA. This model is capable of fully exploiting the syntactic knowledge (dependency relations and types) by leveraging a well-designed Dependency Relation Embedded Graph Convolutional Network (DreGcn). Additionally, we design a simple yet effective message-passing mechanism to ensure that our model learns from multiple related tasks in a multi-task learning framework. Extensive experimental results on three benchmark datasets demonstrate the effectiveness of our approach, which significantly outperforms existing state-of-the-art methods. Besides, we achieve further improvements by using BERT as an additional feature extractor.
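A bare-bones graph convolution over a dependency graph, ignoring the relation-type embeddings that DreGcn adds; the adjacency construction and normalisation below are common conventions rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepGCNLayer(nn.Module):
    """One graph-convolution step over dependency arcs."""

    def __init__(self, d_model):
        super().__init__()
        self.lin = nn.Linear(d_model, d_model)

    def forward(self, h, adj):
        # h: (n, d) word states; adj: (n, n) 0/1 symmetric dependency arcs
        a = adj + torch.eye(adj.size(0))   # add self-loops
        a = a / a.sum(-1, keepdim=True)    # row-normalise
        return F.relu(self.lin(a @ h))

n, d = 6, 32
adj = torch.zeros(n, n)
adj[0, 1] = adj[1, 0] = 1.0                # e.g. an arc between words 0 and 1
out = DepGCNLayer(d)(torch.randn(n, d), adj)
```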
50. Pre-Trained and Attention-Based Neural Networks for Building Noetic Task-Oriented Dialogue Systems [PDF] 返回目录
Jia-Chen Gu, Tianda Li, Quan Liu, Xiaodan Zhu, Zhen-Hua Ling, Yu-Ping Ruan
Abstract: The NOESIS II challenge, as the Track 2 of the 8th Dialogue System Technology Challenges (DSTC 8), is the extension of DSTC 7. This track incorporates new elements that are vital for the creation of a deployed task-oriented dialogue system. This paper describes our systems that are evaluated on all subtasks under this challenge. We study the problem of employing pre-trained attention-based network for multi-turn dialogue systems. Meanwhile, several adaptation methods are proposed to adapt the pre-trained language models for multi-turn dialogue systems, in order to keep the intrinsic property of dialogue systems. In the released evaluation results of Track 2 of DSTC 8, our proposed models ranked fourth in subtask 1, third in subtask 2, and first in subtask 3 and subtask 4 respectively.
51. An Iterative Knowledge Transfer Network with Routing for Aspect-based Sentiment Analysis [PDF] 返回目录
Yunlong Liang, Fandong Meng, Jinchao Zhang, Jinan Xu, Yufeng Chen, Jie Zhou
Abstract: Aspect-based sentiment analysis (ABSA) mainly involves three subtasks: aspect term extraction, opinion term extraction and aspect-level sentiment classification, which are typically handled separately or (partially) jointly. However, the semantic interrelationships among all the three subtasks are not well exploited in previous approaches, which restricts their performance. Additionally, the linguistic knowledge from document-level labeled sentiment corpora is usually used in a coarse way for the ABSA. To address these issues, we propose a novel Iterative Knowledge Transfer Network (IKTN) for the end-to-end ABSA. For one thing, to fully exploit the semantic correlations among the three aspect-level subtasks for mutual promotion, the IKTN transfers the task-specific knowledge from any two of the three subtasks to another one by leveraging a specially-designed routing algorithm, that is, any two of the three subtasks will help the third one. Besides, the IKTN discriminately transfers the document-level linguistic knowledge, i.e., domain-specific and sentiment-related knowledge, to the aspect-level subtasks to benefit the corresponding ones. Experimental results on three benchmark datasets demonstrate the effectiveness of our approach, which significantly outperforms existing state-of-the-art methods.
52. "None of the Above":Measure Uncertainty in Dialog Response Retrieval [PDF] 返回目录
Yulan Feng, Shikib Mehri, Maxine Eskenazi, Tiancheng Zhao
Abstract: This paper discusses the importance of uncovering uncertainty in end-to-end dialog tasks, and presents our experimental results on uncertainty classification on the Ubuntu Dialog Corpus. We show that, instead of retraining models for this specific purpose, the original retrieval model's underlying confidence concerning the best prediction can be captured with trivial additional computation.
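A cheap confidence signal of the kind the paper exploits is the retrieval model's own score distribution over the candidate set; nothing is retrained. The threshold below is arbitrary.

```python
import torch
import torch.nn.functional as F

def respond_or_abstain(candidate_scores, threshold=0.5):
    """Return the best candidate index, or abstain when confidence is low."""
    probs = F.softmax(candidate_scores, dim=-1)
    conf, best = probs.max(dim=-1)
    if conf < threshold:
        return "none_of_the_above", conf.item()
    return best.item(), conf.item()

# Near-uniform scores => low confidence => abstain.
print(respond_or_abstain(torch.tensor([0.2, 0.3, 0.25, 0.25])))
```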
53. Prerequisites for Explainable Machine Reading Comprehension: A Position Paper [PDF] 返回目录
Saku Sugawara, Pontus Stenetorp, Akiko Aizawa
Abstract: Machine reading comprehension (MRC) has received considerable attention in natural language processing over the past few years. However, the conventional task design of MRC lacks the explainability beyond the model interpretation, i.e., the internal mechanics of the model cannot be explained in human terms. To this end, this position paper provides a theoretical basis for the design of MRC based on psychology and psychometrics and summarizes it in terms of the requirements for explainable MRC. We conclude that future datasets should (i) evaluate the capability of the model for constructing a coherent and grounded representation to understand context-dependent situations and (ii) ensure substantive validity by improving the question quality and by formulating a white-box task.
54. Conversational Question Reformulation via Sequence-to-Sequence Architectures and Pretrained Language Models [PDF] 返回目录
Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin
Abstract: This paper presents an empirical study of conversational question reformulation (CQR) with sequence-to-sequence architectures and pretrained language models (PLMs). We leverage PLMs to address the strong token-to-token independence assumption made in the common objective, maximum likelihood estimation, for the CQR task. In CQR benchmarks of task-oriented dialogue systems, we evaluate fine-tuned PLMs on the recently-introduced CANARD dataset as an in-domain task and validate the models using data from the TREC 2019 CAsT Track as an out-domain task. Examining a variety of architectures with different numbers of parameters, we demonstrate that the recent text-to-text transfer transformer (T5) achieves the best results both on CANARD and CAsT with fewer parameters, compared to similar transformer architectures.
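The seq2seq framing is straightforward: the dialogue history and the follow-up question go in, a self-contained rewrite comes out. A sketch with a pretrained T5 checkpoint; the separator convention and decoding settings are assumptions, and a meaningful rewrite requires fine-tuning on CANARD first.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

history = ["Who wrote The Old Man and the Sea?", "Ernest Hemingway."]
question = "When was he born?"                # needs decontextualisation

source = " ||| ".join(history + [question])   # assumed separator convention
ids = tok(source, return_tensors="pt").input_ids
out = model.generate(ids, max_length=64)
print(tok.decode(out[0], skip_special_tokens=True))
# A model fine-tuned on CANARD would target e.g.
# "When was Ernest Hemingway born?"
```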
55. Knowledge Guided Metric Learning for Few-Shot Text Classification [PDF] 返回目录
Dianbo Sui, Yubo Chen, Binjie Mao, Delai Qiu, Kang Liu, Jun Zhao
Abstract: The training of deep-learning-based text classification models relies heavily on a huge amount of annotation data, which is difficult to obtain. When the labeled data is scarce, models tend to struggle to achieve satisfactory performance. However, human beings can distinguish new categories very efficiently with few examples. This is mainly due to the fact that human beings can leverage knowledge obtained from relevant tasks. Inspired by human intelligence, we propose to introduce external knowledge into few-shot learning to imitate human knowledge. A novel parameter generator network is investigated to this end, which is able to use the external knowledge to generate relation network parameters. Metrics can be transferred among tasks when equipped with these generated parameters, so that similar tasks use similar metrics while different tasks use different metrics. Through experiments, we demonstrate that our method outperforms the state-of-the-art few-shot text classification models.
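The parameter generator can be read as a small hypernetwork: a knowledge vector for the task is mapped to the weights of the relation (metric) module, so related tasks receive related metrics. A bare-bones sketch with invented dimensions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d_know, d_feat = 50, 64                   # knowledge / feature dims (assumed)

# Hypernetwork: task knowledge vector -> weights of a linear relation layer.
param_gen = nn.Linear(d_know, d_feat * d_feat)

def relation_score(x, y, task_knowledge):
    w = param_gen(task_knowledge).view(d_feat, d_feat)
    return (F.linear(x, w) * y).sum(-1)   # task-conditioned similarity

score = relation_score(torch.randn(d_feat), torch.randn(d_feat),
                       torch.randn(d_know))
```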
56. Evaluating Multimodal Representations on Visual Semantic Textual Similarity [PDF] 返回目录
Oier Lopez de Lacalle, Ander Salaberria, Aitor Soroa, Gorka Azkune, Eneko Agirre
Abstract: The combination of visual and textual representations has produced excellent results in tasks such as image captioning and visual question answering, but the inference capabilities of multimodal representations are largely untested. In the case of textual representations, inference tasks such as Textual Entailment and Semantic Textual Similarity have often been used to benchmark the quality of textual representations. The long-term goal of our research is to devise multimodal representation techniques that improve current inference capabilities. We thus present a novel task, Visual Semantic Textual Similarity (vSTS), where such inference ability can be tested directly. Given two items, each comprising an image and its accompanying caption, vSTS systems need to assess the degree to which the captions in context are semantically equivalent to each other. Our experiments using simple multimodal representations show that the addition of image representations produces better inference, compared to text-only representations. The improvement is observed both when directly computing the similarity between the representations of the two items, and when learning a siamese network based on vSTS training data. Our work shows, for the first time, the successful contribution of visual information to textual inference, with ample room for benchmarking more complex multimodal representation options.
57. CG-BERT: Conditional Text Generation with BERT for Generalized Few-shot Intent Detection [PDF] 返回目录
Congying Xia, Chenwei Zhang, Hoang Nguyen, Jiawei Zhang, Philip Yu
Abstract: In this paper, we formulate a more realistic and difficult problem setup for the intent detection task in natural language understanding, namely Generalized Few-Shot Intent Detection (GFSID). GFSID aims to discriminate a joint label space consisting of both existing intents which have enough labeled data and novel intents which only have a few examples for each class. To approach this problem, we propose a novel model, Conditional Text Generation with BERT (CG-BERT). CG-BERT effectively leverages a large pre-trained language model to generate text conditioned on the intent label. By modeling the utterance distribution with variational inference, CG-BERT can generate diverse utterances for the novel intents even with only a few utterances available. Experimental results show that CG-BERT achieves state-of-the-art performance on the GFSID task with 1-shot and 5-shot settings on two real-world datasets.
58. News-Driven Stock Prediction With Attention-Based Noisy Recurrent State Transition [PDF] 返回目录
Xiao Liu, Heyan Huang, Yue Zhang, Changsen Yuan
Abstract: We consider direct modeling of underlying stock value movement sequences over time in the news-driven stock movement prediction. A recurrent state transition model is constructed, which better captures a gradual process of stock movement continuously by modeling the correlation between past and future price movements. By separating the effects of news and noise, a noisy random factor is also explicitly fitted based on the recurrent states. Results show that the proposed model outperforms strong baselines. Thanks to the use of attention over news events, our model is also more explainable. To our knowledge, we are the first to explicitly model both events and noise over a fundamental stock value state for news-driven stock movement prediction.
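A loose sketch of a news-driven recurrent value state with an explicitly fitted noise term; the GRU cell and the noise parameterisation are assumptions standing in for the paper's transition model.

```python
import torch
import torch.nn as nn

class NoisyStateTransition(nn.Module):
    def __init__(self, d_news, d_state):
        super().__init__()
        self.cell = nn.GRUCell(d_news, d_state)
        self.sigma = nn.Linear(d_state, d_state)    # learned noise scale

    def forward(self, news_seq):                    # (T, batch, d_news)
        z = news_seq.new_zeros(news_seq.size(1), self.cell.hidden_size)
        states = []
        for x_t in news_seq:
            z = self.cell(x_t, z)                   # news-driven transition
            z = z + torch.sigmoid(self.sigma(z)) * torch.randn_like(z)
            states.append(z)                        # state + explicit noise
        return torch.stack(states)                  # latent value states

out = NoisyStateTransition(16, 32)(torch.randn(5, 2, 16))
```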
59. STEP: Sequence-to-Sequence Transformer Pre-training for Document Summarization [PDF] 返回目录
Yanyan Zou, Xingxing Zhang, Wei Lu, Furu Wei, Ming Zhou
Abstract: Abstractive summarization aims to rewrite a long document to its shorter form, which is usually modeled as a sequence-to-sequence (Seq2Seq) learning problem. Seq2Seq Transformers are powerful models for this problem. Unfortunately, training large Seq2Seq Transformers on limited supervised summarization data is challenging. We, therefore, propose STEP (as shorthand for Sequence-to-Sequence Transformer Pre-training), which can be trained on large scale unlabeled documents. Specifically, STEP is pre-trained using three different tasks, namely sentence reordering, next sentence generation, and masked document generation. Experiments on two summarization datasets show that all three tasks can improve performance upon a heavily tuned large Seq2Seq Transformer which already includes a strong pre-trained encoder by a large margin. By using our best task to pre-train STEP, we outperform the best published abstractive model on CNN/DailyMail by 0.8 ROUGE-2 and New York Times by 2.4 ROUGE-2.
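All three pre-training objectives can be phrased as text-to-text pairs manufactured from unlabeled documents. A toy sketch; the split point, masking rate, and mask token are invented.

```python
import random

def sentence_reordering(sents):
    src = sents[:]
    random.shuffle(src)
    return " ".join(src), " ".join(sents)            # restore original order

def next_sentence_generation(sents, k=2):
    return " ".join(sents[:k]), " ".join(sents[k:])  # continue the document

def masked_document_generation(sents, rate=0.3):
    src = [s if random.random() > rate else "<mask>" for s in sents]
    return " ".join(src), " ".join(sents)            # recover masked spans

doc = ["First sentence.", "Second sentence.", "Third sentence."]
for task in (sentence_reordering, next_sentence_generation,
             masked_document_generation):
    print(task(doc))
```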
60. Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments [PDF] 返回目录
Jacob Krantz, Erik Wijmans, Arjun Majumdar, Dhruv Batra, Stefan Lee
Abstract: We develop a language-guided navigation task set in a continuous 3D environment where agents must execute low-level actions to follow natural language navigation directions. By being situated in continuous environments, this setting lifts a number of assumptions implicit in prior work that represents environments as a sparse graph of panoramas with edges corresponding to navigability. Specifically, our setting drops the presumptions of known environment topologies, short-range oracle navigation, and perfect agent localization. To contextualize this new task, we develop models that mirror many of the advances made in prior settings as well as single-modality baselines. While some of these techniques transfer, we find significantly lower absolute performance in the continuous setting -- suggesting that performance in prior `navigation-graph' settings may be inflated by the strong implicit assumptions.
61. Improved Code Summarization via a Graph Neural Network [PDF] 返回目录
Alexander LeClair, Sakib Haque, Lingfei Wu, Collin McMillan
Abstract: Automatic source code summarization is the task of generating natural language descriptions for source code. Automatic code summarization is a rapidly expanding research area, especially as the community has taken greater advantage of advances in neural network and AI technologies. In general, source code summarization techniques use the source code as input and output a natural language description. Yet a strong consensus is developing that using structural information as input leads to improved performance. The first approaches to use structural information flattened the AST into a sequence. Recently, more complex approaches based on random AST paths or graph neural networks have improved on the models using flattened ASTs. However, the literature still does not describe the use of a graph neural network together with the source code sequence as separate inputs to a model. Therefore, in this paper, we present an approach that uses a graph-based neural architecture that better matches the default structure of the AST to generate these summaries. We evaluate our technique using a data set of 2.1 million Java method-comment pairs and show improvement over four baseline techniques, two from the software engineering literature, and two from the machine learning literature.
62. A Hierarchical Fine-Tuning Approach Based on Joint Embedding of Words and Parent Categories for Hierarchical Multi-label Text Classification [PDF] 返回目录
Yinglong Ma, Jingpeng Zhao, Beihong Jin
Abstract: Many important classification problems in the real world involve a large number of categories. Hierarchical multi-label text classification (HMTC) with high accuracy over large sets of closely related categories organized in a hierarchical structure or taxonomy has become a challenging problem. In this paper, we present a hierarchical fine-tuning deep learning approach for HMTC. A joint embedding of words and parent categories is utilized by leveraging the hierarchical relations in the hierarchical structure of categories and the textual data. A fine-tuning technique is applied to the Ordered Neural LSTM (ONLSTM) network such that the text classification results at the upper levels contribute to the classification at the lower ones. Extensive experiments were conducted on two benchmark datasets, and the results show that the proposed method outperforms state-of-the-art hierarchical and flat multi-label text classification approaches at significantly lower computational cost while maintaining high interpretability.
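A minimal reading of the joint embedding: when classifying at one level, each token is conditioned on the embedding of the parent category predicted at the level above. The sizes and the additive combination are placeholders.

```python
import torch
import torch.nn as nn

vocab, n_parent_cats, d = 30000, 20, 128
word_emb = nn.Embedding(vocab, d)
cat_emb = nn.Embedding(n_parent_cats, d)      # parent-category embeddings

words = torch.randint(vocab, (2, 40))         # (batch, seq_len)
parent = torch.randint(n_parent_cats, (2,))   # category from the level above

# Jointly embed: every token representation carries its parent category.
x = word_emb(words) + cat_emb(parent).unsqueeze(1)
```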
63. Applying Cyclical Learning Rate to Neural Machine Translation [PDF] 返回目录
Choon Meng Lee, Jianfeng Liu, Wei Peng
Abstract: In training deep learning networks, the optimizer and the related learning rate are often used with little thought or minimal tuning, even though they are crucial in ensuring fast convergence to a good-quality minimum of the loss function that also generalizes well on the test dataset. Drawing inspiration from the successful application of cyclical learning rate policies to computer-vision convolutional networks and datasets, we explore how a cyclical learning rate can be applied to train transformer-based neural networks for neural machine translation. From our carefully designed experiments, we show that the choice of optimizer and the associated cyclical learning rate policy can have a significant impact on performance. In addition, we establish guidelines for applying cyclical learning rates to neural machine translation tasks. With our work, we hope to raise awareness of the importance of selecting the right optimizer and accompanying learning rate policy and, at the same time, to encourage further research into easy-to-use learning rate policies.
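For reference, a triangular cyclical learning rate schedule can be set up with PyTorch's built-in CyclicLR scheduler, as in the short sketch below. The base_lr/max_lr bounds, cycle length, and stand-in model are illustrative, not the paper's settings.

```python
import torch

model = torch.nn.Linear(512, 512)            # stand-in for a Transformer NMT model
opt = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
sched = torch.optim.lr_scheduler.CyclicLR(
    opt, base_lr=1e-4, max_lr=1e-3,          # LR oscillates between these bounds
    step_size_up=4000, mode="triangular")    # half-cycle length, in optimizer steps

for step in range(10):                       # training-loop skeleton
    opt.zero_grad()
    loss = model(torch.randn(8, 512)).pow(2).mean()
    loss.backward()
    opt.step()
    sched.step()                             # advance the cyclical schedule each step
```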
64. TAPAS: Weakly Supervised Table Parsing via Pre-training [PDF] 返回目录
Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno, Julian Martin Eisenschlos
Abstract: Answering natural language questions over tables is usually seen as a semantic parsing task. To alleviate the collection cost of full logical forms, one popular approach focuses on weak supervision consisting of denotations instead of logical forms. However, training semantic parsers from weak supervision poses difficulties, and in addition, the generated logical forms are only used as an intermediate step prior to retrieving the denotation. In this paper, we present TAPAS, an approach to question answering over tables without generating logical forms. TAPAS trains from weak supervision, and predicts the denotation by selecting table cells and optionally applying a corresponding aggregation operator to such selection. TAPAS extends BERT's architecture to encode tables as input, initializes from an effective joint pre-training of text segments and tables crawled from Wikipedia, and is trained end-to-end. We experiment with three different semantic parsing datasets, and find that TAPAS outperforms or rivals semantic parsing models by improving state-of-the-art accuracy on SQA from 55.1 to 67.2 and performing on par with the state-of-the-art on WIKISQL and WIKITQ, but with a simpler model architecture. We additionally find that transfer learning, which is trivial in our setting, from WIKISQL to WIKITQ, yields 48.7 accuracy, 4.2 points above the state-of-the-art.
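As an illustration of the cell-selection-plus-aggregation scheme the abstract describes, the sketch below shows how a denotation could be read off at inference time from per-cell selection probabilities and a predicted aggregation operator. The values, operator set, and 0.5 threshold are illustrative assumptions, not TAPAS' actual decoding code.

```python
import torch

cell_values = torch.tensor([3.0, 7.0, 5.0, 2.0])    # numeric contents of 4 table cells
p_select = torch.tensor([0.9, 0.8, 0.1, 0.05])      # per-cell selection probabilities
ops = ["NONE", "SUM", "AVERAGE", "COUNT"]
p_op = torch.tensor([0.05, 0.8, 0.1, 0.05])         # aggregation-operator distribution

selected = p_select > 0.5                            # hard selection at inference time
op = ops[int(p_op.argmax())]
if op == "SUM":
    denotation = cell_values[selected].sum()
elif op == "AVERAGE":
    denotation = cell_values[selected].mean()
elif op == "COUNT":
    denotation = selected.sum()
else:
    denotation = cell_values[selected]               # plain cell selection, no aggregation
print(op, denotation)                                # SUM tensor(10.)
```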
65. Generating Rationales in Visual Question Answering [PDF] 返回目录
Hammad A. Ayyubi, Md. Mehrab Tanjim, Julian J. McAuley, Garrison W. Cottrell
Abstract: Despite recent advances in Visual Question Answering (VQA), it remains a challenge to determine how much success can be attributed to sound reasoning and comprehension ability. We seek to investigate this question by proposing a new task of rationale generation. Essentially, we task a VQA model with generating rationales for the answers it predicts. We use data from the Visual Commonsense Reasoning (VCR) task, as it contains ground-truth rationales along with visual questions and answers. We first investigate commonsense understanding in one of the leading VCR models, ViLBERT, by generating rationales from pretrained weights using a state-of-the-art language model, GPT-2. Next, we seek to jointly train ViLBERT with GPT-2 in an end-to-end fashion with the dual task of predicting the answer in VQA and generating rationales. We show that this kind of training injects commonsense understanding in the VQA model through quantitative and qualitative evaluation metrics.
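A minimal sketch of the dual-task objective the abstract describes follows: a VQA answer-classification loss combined with a rationale language-modeling loss. The function name, shapes, and the alpha weighting are placeholders, not the authors' ViLBERT/GPT-2 setup.

```python
import torch
import torch.nn.functional as F

def joint_loss(answer_logits, answer_gold, rationale_logits, rationale_gold, alpha=0.5):
    # answer_logits: (B, num_answers); rationale_logits: (B, T, vocab)
    vqa_loss = F.cross_entropy(answer_logits, answer_gold)
    lm_loss = F.cross_entropy(rationale_logits.flatten(0, 1), rationale_gold.flatten())
    return vqa_loss + alpha * lm_loss        # alpha trades off the two tasks

loss = joint_loss(torch.randn(4, 3000), torch.randint(0, 3000, (4,)),
                  torch.randn(4, 20, 50257), torch.randint(0, 50257, (4, 20)))
```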
66. ForecastTB -- An R Package as a Test-bench for Forecasting Methods Comparison [PDF] 返回目录
Neeraj Dhanraj Bokde, Gorm Bruun Andersen
Abstract: This paper introduces the R package ForecastTB, which can be used to compare the forecasting accuracy of different methods as related to the characteristics of a dataset. ForecastTB is a plug-and-play structured module, and several forecasting methods can be included with simple instructions. The proposed test-bench is not limited to the default forecasting and error-metric functions: users are able to append, remove, or choose the desired methods as per their requirements. In addition, several plotting functions are provided to allow comparative visualization of the performance and behavior of different forecasting methods. Furthermore, the paper presents example applications on natural time series datasets to demonstrate how the ForecastTB package can be used for comparative analysis of forecasting methods as affected by the characteristics of a dataset.
67. Identifying Radiological Findings Related to COVID-19 from Medical Literature [PDF] 返回目录
Yuxiao Liang, Pengtao Xie
Abstract: Coronavirus disease 2019 (COVID-19) has infected more than one million individuals all over the world and caused more than 55,000 deaths, as of April 3, 2020. Radiological findings are important sources of information in guiding the diagnosis and treatment of COVID-19. However, existing studies on how radiological findings are correlated with COVID-19 are conducted separately by different hospitals and may be inconsistent or even conflicting due to population bias. To address this problem, we develop natural language processing methods to analyze a large collection of COVID-19 literature containing study reports from hospitals all over the world, reconcile their results, and draw unbiased and universally sensible conclusions about the correlation between radiological findings and COVID-19. We apply our method to the CORD-19 dataset and successfully extract a set of radiological findings that are closely tied to COVID-19.
68. Aggressive, Repetitive, Intentional, Visible, and Imbalanced: Refining Representations for Cyberbullying Classification [PDF] 返回目录
Caleb Ziems, Ymir Vigfusson, Fred Morstatter
Abstract: Cyberbullying is a pervasive problem in online communities. To identify cyberbullying cases in large-scale social networks, content moderators depend on machine learning classifiers for automatic cyberbullying detection. However, existing models remain unfit for real-world applications, largely due to a shortage of publicly available training data and a lack of standard criteria for assigning ground truth labels. In this study, we address the need for reliable data using an original annotation framework. Inspired by social sciences research into bullying behavior, we characterize the nuanced problem of cyberbullying using five explicit factors to represent its social and linguistic aspects. We model this behavior using social network and language-based features, which improve classifier performance. These results demonstrate the importance of representing and modeling cyberbullying as a social phenomenon.
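As a sketch of the kind of feature combination the abstract describes, the snippet below concatenates bag-of-words text features with per-message social-network features before a standard classifier. The feature names and values are hypothetical placeholders, not the paper's actual feature set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import TfidfVectorizer

texts = ["you are the worst", "great game last night"]
labels = [1, 0]                                # 1 = bullying, 0 = benign (toy labels)
# Hypothetical per-message social features: sender followers, receiver followers,
# number of prior messages between the pair.
social = np.array([[120, 4500, 9],
                   [300,  280, 1]], dtype=float)

text_feats = TfidfVectorizer().fit_transform(texts).toarray()
X = np.hstack([text_feats, social])            # joint language + network representation
clf = LogisticRegression(max_iter=1000).fit(X, labels)
```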