Table of Contents
8. PublishInCovid19 at WNUT 2020 Shared Task-1: Entity Recognition in Wet Lab Protocols using Structured Learning Ensemble and Contextualised Embeddings [PDF] Abstract
11. Gauravarora@HASOC-Dravidian-CodeMix-FIRE2020: Pre-training ULMFiT on Synthetically Generated Code-Mixed Data for Hate Speech Detection [PDF] Abstract
15. Zero-Shot Clinical Acronym Expansion with a Hierarchical Metadata-Based Latent Variable Model [PDF] Abstract
21. Dynamic Anticipation and Completion for Multi-Hop Reasoning over Sparse Knowledge Graph [PDF] Abstract
23. PUM at SemEval-2020 Task 12: Aggregation of Transformer-based models' features for offensive language recognition [PDF] Abstract
28. Discern: Discourse-Aware Entailment Reasoning Network for Conversational Machine Reading [PDF] Abstract
31. Pruning Redundant Mappings in Transformer Models via Spectral-Normalized Identity Prior [PDF] Abstract
43. DLGNet-Task: An End-to-end Neural Network Framework for Modeling Multi-turn Multi-domain Task-Oriented Dialogue [PDF] Abstract
44. Weakly-supervised Fine-grained Event Recognition on Social Media Texts for Disaster Management [PDF] Abstract
47. Multi-View Sequence-to-Sequence Models with Conversational Structure for Abstractive Dialogue Summarization [PDF] Abstract
51. An Empirical Study on Large-Scale Multi-Label Text Classification Including Few and Zero-Shot Labels [PDF] Abstract
64. Tell Me How to Ask Again: Question Data Augmentation with Controllable Rewriting in Continuous Space [PDF] Abstract
65. Sentence Constituent-Aware Aspect-Category Sentiment Analysis with Graph Attention Networks [PDF] Abstract
67. GraphDialog: Integrating Graph Knowledge into End-to-End Task-Oriented Dialogue Systems [PDF] Abstract
78. Multi-domain Clinical Natural Language Processing with MedCAT: the Medical Concept Annotation Toolkit [PDF] Abstract
80. Cost-effective Selection of Pretraining Data: A Case Study of Pretraining BERT on Social Media [PDF] Abstract
81. Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems [PDF] Abstract
83. Improving Device Directedness Classification of Utterances with Semantic Lexical Features [PDF] Abstract
85. Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speaker Separation [PDF] Abstract
90. SumGNN: Multi-typed Drug Interaction Prediction via Efficient Knowledge Graph Summarization [PDF] Abstract
Abstracts
1. Self-training Improves Pre-training for Natural Language Understanding [PDF] Back to Table of Contents
Jingfei Du, Edouard Grave, Beliz Gunel, Vishrav Chaudhary, Onur Celebi, Michael Auli, Ves Stoyanov, Alexis Conneau
Abstract: Unsupervised pre-training has led to much recent progress in natural language understanding. In this paper, we study self-training as another way to leverage unlabeled data through semi-supervised learning. To obtain additional data for a specific task, we introduce SentAugment, a data augmentation method which computes task-specific query embeddings from labeled data to retrieve sentences from a bank of billions of unlabeled sentences crawled from the web. Unlike previous semi-supervised methods, our approach does not require in-domain unlabeled data and is therefore more generally applicable. Experiments show that self-training is complementary to strong RoBERTa baselines on a variety of tasks. Our augmentation approach leads to scalable and effective self-training with improvements of up to 2.6% on standard text classification benchmarks. Finally, we also show strong gains on knowledge-distillation and few-shot learning.
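As a rough sketch of the retrieval step described above (not the authors' SentAugment implementation; the embedding model, the tiny in-memory bank, and all names below are placeholders), task-specific query embeddings can be matched against a sentence bank by cosine similarity:

```python
import numpy as np

def retrieve(query_emb, bank_embs, bank_sentences, k=5):
    """Return the k bank sentences closest (by cosine similarity) to the query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    b = bank_embs / np.linalg.norm(bank_embs, axis=1, keepdims=True)
    scores = b @ q
    top = np.argsort(-scores)[:k]
    return [(bank_sentences[i], float(scores[i])) for i in top]

# Toy usage: a three-sentence "bank" with random 4-d embeddings; the real bank holds billions.
rng = np.random.default_rng(0)
bank = ["sentence a", "sentence b", "sentence c"]
bank_embs = rng.normal(size=(3, 4))
query_emb = bank_embs.mean(axis=0)   # e.g. the mean embedding of a task's labeled examples
print(retrieve(query_emb, bank_embs, bank, k=2))
```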
2. Pareto Probing: Trading Off Accuracy for Complexity [PDF] Back to Table of Contents
Tiago Pimentel, Naomi Saphra, Adina Williams, Ryan Cotterell
Abstract: The question of how to probe contextual word representations in a way that is principled and useful has seen significant recent attention. In our contribution to this discussion, we argue, first, for a probe metric that reflects the trade-off between probe complexity and performance: the Pareto hypervolume. To measure complexity, we present a number of parametric and non-parametric metrics. Our experiments with such metrics show that probe's performance curves often fail to align with widely accepted rankings between language representations (with, e.g., non-contextual representations outperforming contextual ones). These results lead us to argue, second, that common simplistic probe tasks such as POS labeling and dependency arc labeling, are inadequate to evaluate the properties encoded in contextual word representations. We propose full dependency parsing as an example probe task, and demonstrate it with the Pareto hypervolume. In support of our arguments, the results of this illustrative experiment conform closer to accepted rankings among contextual word representations.
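To make the Pareto-hypervolume metric concrete, the following sketch computes the 2-D hypervolume of probe trade-off points after mapping both objectives to a larger-is-better scale; the reference point and the parameter-count complexity proxy are assumptions for illustration, not values from the paper:

```python
def hypervolume_2d(points, ref):
    """2-D hypervolume (area dominated w.r.t. ref) for maximisation in both coordinates."""
    # Sort by the first objective, descending, then sweep and accumulate rectangle areas.
    pts = sorted(points, key=lambda p: p[0], reverse=True)
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        hv += (x - ref[0]) * max(0.0, y - prev_y)
        prev_y = max(prev_y, y)
    return hv

# Probes as (accuracy, -parameter_count): both objectives are now "larger is better".
probes = [(0.90, -1e6), (0.86, -1e4), (0.70, -1e2)]
print(hypervolume_2d(probes, ref=(0.0, -1e7)))
```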
3. Assessing the Helpfulness of Learning Materials with Inference-Based Learner-Like Agent [PDF] Back to Table of Contents
Yun-Hsuan Jen, Chieh-Yang Huang, Mei-Hua Chen, Ting-Hao 'Kenneth' Huang, Lun-Wei Ku
Abstract: Many English-as-a-second language learners have trouble using near-synonym words (e.g., small vs. little; briefly vs. shortly) correctly, and often look for example sentences to learn how two nearly synonymous terms differ. Prior work uses hand-crafted scores to recommend sentences but has difficulty in adopting such scores to all the near-synonyms as near-synonyms differ in various ways. We notice that the helpfulness of the learning material would reflect on the learners' performance. Thus, we propose the inference-based learner-like agent to mimic learner behavior and identify good learning materials by examining the agent's performance. To enable the agent to behave like a learner, we leverage entailment modeling's capability of inferring answers from the provided materials. Experimental results show that the proposed agent is equipped with good learner-like behavior to achieve the best performance in both fill-in-the-blank (FITB) and good example sentence selection tasks. We further conduct a classroom user study with college ESL learners. The results of the user study show that the proposed agent can find out example sentences that help students learn more easily and efficiently. Compared to other models, the proposed agent improves the score of more than 17% of students after learning.
4. Speakers Fill Lexical Semantic Gaps with Context [PDF] Back to Table of Contents
Tiago Pimentel, Rowan Hall Maudslay, Damián Blasi, Ryan Cotterell
Abstract: Lexical ambiguity is widespread in language, allowing for the reuse of economical word forms and therefore making language more efficient. If ambiguous words cannot be disambiguated from context, however, this gain in efficiency might make language less clear---resulting in frequent miscommunication. For a language to be clear and efficiently encoded, we posit that the lexical ambiguity of a word type should correlate with how much information context provides about it, on average. To investigate whether this is the case, we operationalise the lexical ambiguity of a word as the entropy of meanings it can take, and provide two ways to estimate this---one which requires human annotation (using WordNet), and one which does not (using BERT), making it readily applicable to a large number of languages. We validate these measures by showing that, on six high-resource languages, there are significant Pearson correlations between our BERT-based estimate of ambiguity and the number of synonyms a word has in WordNet (e.g. $\rho = 0.40$ in English). We then test our main hypothesis---that a word's lexical ambiguity should negatively correlate with its contextual uncertainty---and find significant correlations on all 18 typologically diverse languages we analyse. This suggests that, in the presence of ambiguity, speakers compensate by making contexts more informative.
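The two word-level quantities that the paper correlates can be written down directly: ambiguity as the entropy of a word's sense distribution, and the relationship between the two as a Pearson coefficient. The sketch below uses made-up sense probabilities and uncertainty scores purely for illustration; the paper's own estimates come from WordNet and BERT, which are not reproduced here:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a discrete sense distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Hypothetical per-word sense distributions and contextual-uncertainty scores.
ambiguity = [entropy(p) for p in ([0.5, 0.5], [0.9, 0.1], [0.4, 0.3, 0.3])]
contextual_uncertainty = [0.7, 0.2, 0.9]

# Pearson correlation between the two word-level quantities.
rho = np.corrcoef(ambiguity, contextual_uncertainty)[0, 1]
print(round(float(rho), 3))
```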
5. A Streaming Approach For Efficient Batched Beam Search [PDF] Back to Table of Contents
Kevin Yang, Violet Yao, John DeNero, Dan Klein
Abstract: We propose an efficient batching strategy for variable-length decoding on GPU architectures. During decoding, when candidates terminate or are pruned according to heuristics, our streaming approach periodically "refills" the batch before proceeding with a selected subset of candidates. We apply our method to variable-width beam search on a state-of-the-art machine translation model. Our method decreases runtime by up to 71% compared to a fixed-width beam search baseline and 17% compared to a variable-width baseline, while matching baselines' BLEU. Finally, experiments show that our method can speed up decoding in other domains, such as semantic and syntactic parsing.
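The refill idea can be summarised as a scheduling loop: keep a fixed-size batch of active hypotheses and, at regular intervals, top it up from a queue of waiting candidates as others finish or are pruned. The skeleton below uses a stub decoding step and placeholder names; the paper applies this pattern to GPU beam search for machine translation, which is not reproduced here:

```python
from collections import deque

def streaming_decode(candidates, step_fn, batch_size=4, refill_every=2):
    """Skeleton of batched decoding that periodically refills the batch from a waiting queue."""
    waiting, active, finished, t = deque(candidates), [], [], 0
    while active or waiting:
        if t % refill_every == 0:                    # periodic refill up to batch_size
            while len(active) < batch_size and waiting:
                active.append(waiting.popleft())
        still_active = []
        for cand in active:
            cand = step_fn(cand)                     # one decoding step (stub)
            (finished if cand["done"] else still_active).append(cand)
        active, t = still_active, t + 1
    return finished

# Toy step function: a candidate "finishes" after `length` steps.
step = lambda c: {**c, "steps": c["steps"] + 1, "done": c["steps"] + 1 >= c["length"]}
cands = [{"id": i, "steps": 0, "length": 2 + i % 3, "done": False} for i in range(8)]
print(len(streaming_decode(cands, step)))            # all 8 candidates eventually finish
```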
6. Knowledge Association with Hyperbolic Knowledge Graph Embeddings [PDF] Back to Table of Contents
Zequn Sun, Muhao Chen, Wei Hu, Chengming Wang, Jian Dai, Wei Zhang
Abstract: Capturing associations for knowledge graphs (KGs) through entity alignment, entity type inference and other related tasks benefits NLP applications with comprehensive knowledge representations. Recent related methods built on Euclidean embeddings are challenged by the hierarchical structures and different scales of KGs. They also depend on high embedding dimensions to realize enough expressiveness. Differently, we explore with low-dimensional hyperbolic embeddings for knowledge association. We propose a hyperbolic relational graph neural network for KG embedding and capture knowledge associations with a hyperbolic transformation. Extensive experiments on entity alignment and type inference demonstrate the effectiveness and efficiency of our method.
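For readers unfamiliar with hyperbolic embeddings, a commonly used distance in the Poincaré-ball model is given below; this is general background rather than the paper's own formulation, and the exact hyperbolic model and parameterisation used by its relational graph neural network may differ:

$$ d_{\mathbb{B}}(\mathbf{u},\mathbf{v}) = \operatorname{arcosh}\!\left(1 + \frac{2\,\lVert\mathbf{u}-\mathbf{v}\rVert^{2}}{\bigl(1-\lVert\mathbf{u}\rVert^{2}\bigr)\bigl(1-\lVert\mathbf{v}\rVert^{2}\bigr)}\right) $$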
7. Viable Threat on News Reading: Generating Biased News Using Natural Language Models [PDF] Back to Table of Contents
Saurabh Gupta, Huy H. Nguyen, Junichi Yamagishi, Isao Echizen
Abstract: Recent advancements in natural language generation have raised serious concerns. High-performance language models are widely used for language generation tasks because they are able to produce fluent and meaningful sentences. These models are already being used to create fake news. They can also be exploited to generate biased news, which can then be used to attack news aggregators to change their reader's behavior and influence their bias. In this paper, we use a threat model to demonstrate that the publicly available language models can reliably generate biased news content based on an input original news. We also show that a large number of high-quality biased news articles can be generated using controllable text generation. A subjective evaluation with 80 participants demonstrated that the generated biased news is generally fluent, and a bias evaluation with 24 participants demonstrated that the bias (left or right) is usually evident in the generated articles and can be easily identified.
8. PublishInCovid19 at WNUT 2020 Shared Task-1: Entity Recognition in Wet Lab Protocols using Structured Learning Ensemble and Contextualised Embeddings [PDF] Back to Table of Contents
Janvijay Singh, Anshul Wadhawan
Abstract: In this paper, we describe the approach that we employed to address the task of Entity Recognition over Wet Lab Protocols -- a shared task in EMNLP WNUT-2020 Workshop. Our approach is composed of two phases. In the first phase, we experiment with various contextualised word embeddings (like Flair, BERT-based) and a BiLSTM-CRF model to arrive at the best-performing architecture. In the second phase, we create an ensemble composed of eleven BiLSTM-CRF models. The individual models are trained on random train-validation splits of the complete dataset. Here, we also experiment with different output merging schemes, including Majority Voting and Structured Learning Ensembling (SLE). Our final submission achieved a micro F1-score of 0.8175 and 0.7757 for the partial and exact match of the entity spans, respectively. We were ranked first and second, in terms of partial and exact match, respectively.
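Of the two merging schemes, Majority Voting is simple to write down: for every token, take the tag predicted by the most ensemble members. The snippet below illustrates only that baseline, with invented tag names; Structured Learning Ensembling is more involved and is not sketched here:

```python
from collections import Counter

def majority_vote(tag_sequences):
    """Token-level majority vote over several models' BIO tag sequences."""
    merged = []
    for token_tags in zip(*tag_sequences):           # one tuple of predicted tags per token
        merged.append(Counter(token_tags).most_common(1)[0][0])
    return merged

# Three hypothetical ensemble members tagging the same 4-token sentence.
preds = [
    ["B-Reagent", "I-Reagent", "O", "B-Action"],
    ["B-Reagent", "O",         "O", "B-Action"],
    ["B-Reagent", "I-Reagent", "O", "O"],
]
print(majority_vote(preds))   # ['B-Reagent', 'I-Reagent', 'O', 'B-Action']
```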
9. Lifelong Language Knowledge Distillation [PDF] Back to Table of Contents
Yung-Sung Chuang, Shang-Yu Su, Yun-Nung Chen
Abstract: It is challenging to perform lifelong language learning (LLL) on a stream of different tasks without any performance degradation comparing to the multi-task counterparts. To address this issue, we present Lifelong Language Knowledge Distillation (L2KD), a simple but efficient method that can be easily applied to existing LLL architectures in order to mitigate the degradation. Specifically, when the LLL model is trained on a new task, we assign a teacher model to first learn the new task, and pass the knowledge to the LLL model via knowledge distillation. Therefore, the LLL model can better adapt to the new task while keeping the previously learned knowledge. Experiments show that the proposed L2KD consistently improves previous state-of-the-art models, and the degradation comparing to multi-task models in LLL tasks is well mitigated for both sequence generation and text classification tasks.
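The teacher-to-student transfer at the core of L2KD is knowledge distillation; a common form of the distillation loss is sketched below in PyTorch. The temperature, the weighting against the task loss, and how L2KD combines them are not given here, so treat this as generic background rather than the paper's exact objective:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

student = torch.randn(8, 30522)   # batch of student logits over a vocabulary
teacher = torch.randn(8, 30522)   # frozen teacher logits for the same inputs
print(distillation_loss(student, teacher).item())
```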
10. Explaining The Efficacy of Counterfactually-Augmented Data [PDF] Back to Table of Contents
Divyansh Kaushik, Amrith Setlur, Eduard Hovy, Zachary C. Lipton
Abstract: In attempts to produce machine learning models less reliant on spurious patterns in training data, researchers have recently proposed a human-in-the-loop process for generating counterfactually augmented datasets. As applied in NLP, given some documents and their (initial) labels, humans are tasked with revising the text to make a (given) counterfactual label applicable. Importantly, the instructions prohibit edits that are not necessary to flip the applicable label. Models trained on the augmented (original and revised) data have been shown to rely less on semantically irrelevant words and to generalize better out-of-domain. While this work draws on causal thinking, casting edits as interventions and relying on human understanding to assess outcomes, the underlying causal model is not clear nor are the principles underlying the observed improvements in out-of-domain evaluation. In this paper, we explore a toy analog, using linear Gaussian models. Our analysis reveals interesting relationships between causal models, measurement noise, out-of-domain generalization, and reliance on spurious signals. Interestingly our analysis suggests that data corrupted by adding noise to causal features will degrade out-of-domain performance, while noise added to non-causal features may make models more robust out-of-domain. This analysis yields interesting insights that help to explain the efficacy of counterfactually augmented data. Finally, we present a large-scale empirical study that supports this hypothesis.
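The flavour of the toy linear-Gaussian analysis can be reproduced in a few lines: train a linear model on features where either the causal or the spurious (non-causal) input is corrupted by noise, then evaluate out of domain where the spurious feature is decorrelated from the label. All distributions and noise scales below are invented for illustration and are not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, spurious_correlated=True):
    z = rng.normal(size=n)                              # the true causal variable
    x_causal = z
    x_spurious = z + 0.1 * rng.normal(size=n) if spurious_correlated else rng.normal(size=n)
    y = z + 0.1 * rng.normal(size=n)
    return np.column_stack([x_causal, x_spurious]), y

def ood_error(noise_on_causal, noise_on_spurious, n=5000):
    X_tr, y_tr = make_data(n, spurious_correlated=True)
    X_tr = X_tr + rng.normal(size=X_tr.shape) * np.array([noise_on_causal, noise_on_spurious])
    w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    X_te, y_te = make_data(n, spurious_correlated=False)   # out-of-domain: spurious link broken
    return float(np.mean((X_te @ w - y_te) ** 2))

print("corrupt causal feature:    ", ood_error(1.0, 0.0))   # model leans on the spurious feature
print("corrupt non-causal feature:", ood_error(0.0, 1.0))   # model leans on the causal feature
```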
11. Gauravarora@HASOC-Dravidian-CodeMix-FIRE2020: Pre-training ULMFiT on Synthetically Generated Code-Mixed Data for Hate Speech Detection [PDF] Back to Table of Contents
Gaurav Arora
Abstract: This paper describes the system submitted to Dravidian-Codemix-HASOC2020: Hate Speech and Offensive Content Identification in Dravidian languages (Tamil-English and Malayalam-English). The task aims to identify offensive language in code-mixed dataset of comments/posts in Dravidian languages collected from social media. We participated in both Sub-task A, which aims to identify offensive content in mixed-script (mixture of Native and Roman script) and Sub-task B, which aims to identify offensive content in Roman script, for Dravidian languages. In order to address these tasks, we proposed pre-training ULMFiT on synthetically generated code-mixed data, generated by modelling code-mixed data generation as a Markov process using Markov chains. Our model achieved 0.88 weighted F1-score for code-mixed Tamil-English language in Sub-task B and got 2nd rank on the leader-board. Additionally, our model achieved 0.91 weighted F1-score (4th Rank) for mixed-script Malayalam-English in Sub-task A and 0.74 weighted F1-score (5th Rank) for code-mixed Malayalam-English language in Sub-task B.
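Modelling code-mixed data generation as a Markov process amounts to estimating token-to-token transition probabilities from real code-mixed text and then sampling new sequences from them. A bigram-level sketch of that idea, with a toy corpus and placeholder names rather than the authors' pipeline, looks like this:

```python
import random
from collections import defaultdict

def fit_bigram_chain(sentences):
    """Count bigram transitions, including start/end markers, from tokenised sentences."""
    chain = defaultdict(list)
    for tokens in sentences:
        prev = "<s>"
        for tok in tokens:
            chain[prev].append(tok)
            prev = tok
        chain[prev].append("</s>")
    return chain

def sample(chain, max_len=15, seed=0):
    """Sample a synthetic sentence by walking the transition lists."""
    random.seed(seed)
    tok, out = "<s>", []
    while len(out) < max_len:
        tok = random.choice(chain[tok])
        if tok == "</s>":
            break
        out.append(tok)
    return " ".join(out)

corpus = [["naan", "office", "ku", "late", "aa", "varuven"],
          ["enakku", "meeting", "irukku", "today"]]
print(sample(fit_bigram_chain(corpus)))
```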
12. The Grammar of Emergent Languages [PDF] Back to Table of Contents
Oskar van der Wal, Silvan de Boer, Elia Bruni, Dieuwke Hupkes
Abstract: In this paper, we consider the syntactic properties of languages emerged in referential games, using unsupervised grammar induction (UGI) techniques originally designed to analyse natural language. We show that the considered UGI techniques are appropriate to analyse emergent languages and we then study if the languages that emerge in a typical referential game setup exhibit syntactic structure, and to what extent this depends on the maximum message length and number of symbols that the agents are allowed to use. Our experiments demonstrate that a certain message length and vocabulary size are required for structure to emerge, but they also illustrate that more sophisticated game scenarios are required to obtain syntactic properties more akin to those observed in human language. We argue that UGI techniques should be part of the standard toolkit for analysing emergent languages and release a comprehensive library to facilitate such analysis for future researchers.
13. Modulated Fusion using Transformer for Linguistic-Acoustic Emotion Recognition [PDF] Back to Table of Contents
Jean-Benoit Delbrouck, Noé Tits, Stéphane Dupont
Abstract: This paper aims to bring a new lightweight yet powerful solution for the task of Emotion Recognition and Sentiment Analysis. Our motivation is to propose two architectures based on Transformers and modulation that combine the linguistic and acoustic inputs from a wide range of datasets to challenge, and sometimes surpass, the state-of-the-art in the field. To demonstrate the efficiency of our models, we carefully evaluate their performances on the IEMOCAP, MOSI, MOSEI and MELD dataset. The experiments can be directly replicated and the code is fully open for future researches.
14. A Fully Hyperbolic Neural Model for Hierarchical Multi-Class Classification [PDF] Back to Table of Contents
Federico López, Michael Strube
Abstract: Label inventories for fine-grained entity typing have grown in size and complexity. Nonetheless, they exhibit a hierarchical structure. Hyperbolic spaces offer a mathematically appealing approach for learning hierarchical representations of symbolic data. However, it is not clear how to integrate hyperbolic components into downstream tasks. This is the first work that proposes a fully hyperbolic model for multi-class multi-label classification, which performs all operations in hyperbolic space. We evaluate the proposed model on two challenging datasets and compare to different baselines that operate under Euclidean assumptions. Our hyperbolic model infers the latent hierarchy from the class distribution, captures implicit hyponymic relations in the inventory, and shows performance on par with state-of-the-art methods on fine-grained classification with remarkable reduction of the parameter size. A thorough analysis sheds light on the impact of each component in the final prediction and showcases its ease of integration with Euclidean layers.
15. Zero-Shot Clinical Acronym Expansion with a Hierarchical Metadata-Based Latent Variable Model [PDF] Back to Table of Contents
Griffin Adams, Mert Ketenci, Adler Perotte, Noemie Elhadad
Abstract: We introduce Latent Meaning Cells, a deep latent variable model which learns contextualized representations of words by combining local lexical context and metadata. Metadata can refer to granular context, such as section type, or to more global context, such as unique document ids. Reliance on metadata for contextualized representation learning is apropos in the clinical domain where text is semi-structured and expresses high variation in topics. We evaluate the LMC model on the task of clinical acronym expansion across three datasets. The LMC significantly outperforms a diverse set of baselines at a fraction of the pre-training cost and learns clinically coherent representations.
16. Gender prediction using limited Twitter Data [PDF] Back to Table of Contents
Maaike Burghoorn, Maaike H.T. de Boer, Stephan Raaijmakers
Abstract: Transformer models have shown impressive performance on a variety of NLP tasks. Off-the-shelf, pre-trained models can be fine-tuned for specific NLP classification tasks, reducing the need for large amounts of additional training data. However, little research has addressed how much data is required to accurately fine-tune such pre-trained transformer models, and how much data is needed for accurate prediction. This paper explores the usability of BERT (a Transformer model for word embedding) for gender prediction on social media. Forensic applications include detecting gender obfuscation, e.g. males posing as females in chat rooms. A Dutch BERT model is fine-tuned on different samples of a Dutch Twitter dataset labeled for gender, varying in the number of tweets used per person. The results show that finetuning BERT contributes to good gender classification performance (80% F1) when finetuned on only 200 tweets per person. But when using just 20 tweets per person, the performance of our classifier deteriorates non-steeply (to 70% F1). These results show that even with relatively small amounts of data, BERT can be fine-tuned to accurately help predict the gender of Twitter users, and, consequently, that it is possible to determine gender on the basis of just a low volume of tweets. This opens up an operational perspective on the swift detection of gender.
17. Assessing Robustness of Text Classification through Maximal Safe Radius Computation [PDF] Back to Table of Contents
Emanuele La Malfa, Min Wu, Luca Laurenti, Benjie Wang, Anthony Hartshorn, Marta Kwiatkowska
Abstract: Neural network NLP models are vulnerable to small modifications of the input that maintain the original meaning but result in a different prediction. In this paper, we focus on robustness of text classification against word substitutions, aiming to provide guarantees that the model prediction does not change if a word is replaced with a plausible alternative, such as a synonym. As a measure of robustness, we adopt the notion of the maximal safe radius for a given input text, which is the minimum distance in the embedding space to the decision boundary. Since computing the exact maximal safe radius is not feasible in practice, we instead approximate it by computing a lower and upper bound. For the upper bound computation, we employ Monte Carlo Tree Search in conjunction with syntactic filtering to analyse the effect of single and multiple word substitutions. The lower bound computation is achieved through an adaptation of the linear bounding techniques implemented in tools CNN-Cert and POPQORN, respectively for convolutional and recurrent network models. We evaluate the methods on sentiment analysis and news classification models for four datasets (IMDB, SST, AG News and NEWS) and a range of embeddings, and provide an analysis of robustness trends. We also apply our framework to interpretability analysis and compare it with LIME.
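In symbols, the quantity being bounded can be written as follows, where $\mathrm{emb}(\cdot)$ is the embedding map and $f$ the classifier (notation ours, paraphrasing the definition above):

$$ \mathrm{MSR}(x) = \min_{x' \,:\, f(x') \neq f(x)} \big\lVert \mathrm{emb}(x') - \mathrm{emb}(x) \big\rVert, \qquad \mathrm{LB}(x) \le \mathrm{MSR}(x) \le \mathrm{UB}(x), $$

with the upper bound obtained via Monte Carlo Tree Search over word substitutions and the lower bound via the linear bounding techniques mentioned in the abstract.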
18. X-SRL: A Parallel Cross-Lingual Semantic Role Labeling Dataset [PDF] Back to Table of Contents
Angel Daza, Anette Frank
Abstract: Even though SRL is researched for many languages, major improvements have mostly been obtained for English, for which more resources are available. In fact, existing multilingual SRL datasets contain disparate annotation styles or come from different domains, hampering generalization in multilingual learning. In this work, we propose a method to automatically construct an SRL corpus that is parallel in four languages: English, French, German, Spanish, with unified predicate and role annotations that are fully comparable across languages. We apply high-quality machine translation to the English CoNLL-09 dataset and use multilingual BERT to project its high-quality annotations to the target languages. We include human-validated test sets that we use to measure the projection quality, and show that projection is denser and more precise than a strong baseline. Finally, we train different SOTA models on our novel corpus for mono- and multilingual SRL, showing that the multilingual annotations improve performance especially for the weaker languages.
19. Learning from Context or Names? An Empirical Study on Neural Relation Extraction [PDF] Back to Table of Contents
Hao Peng, Tianyu Gao, Xu Han, Yankai Lin, Peng Li, Zhiyuan Liu, Maosong Sun, Jie Zhou
Abstract: Neural models have achieved remarkable success on relation extraction (RE) benchmarks. However, there is no clear understanding which type of information affects existing RE models to make decisions and how to further improve the performance of these models. To this end, we empirically study the effect of two main information sources in text: textual context and entity mentions (names). We find that (i) while context is the main source to support the predictions, RE models also heavily rely on the information from entity mentions, most of which is type information, and (ii) existing datasets may leak shallow heuristics via entity mentions and thus contribute to the high performance on RE benchmarks. Based on the analyses, we propose an entity-masked contrastive pre-training framework for RE to gain a deeper understanding on both textual context and type information while avoiding rote memorization of entities or use of superficial cues in mentions. We carry out extensive experiments to support our views, and show that our framework can improve the effectiveness and robustness of neural models in different RE scenarios. All the code and datasets are released at this https URL.
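The entity-masked part of the pre-training can be pictured as replacing entity mentions with a special token before a sentence is encoded; the contrastive objective itself, the masking rate, and the special token used by the paper are not specified here, so everything below is schematic:

```python
import random

def mask_entities(tokens, entity_spans, mask_token="[ENT]", prob=0.9, seed=0):
    """Replace each entity-mention span with a single mask token, with some probability."""
    random.seed(seed)
    masked = list(tokens)
    # Process spans right-to-left so earlier indices stay valid after replacement.
    for start, end in sorted(entity_spans, reverse=True):
        if random.random() < prob:
            masked[start:end] = [mask_token]
    return masked

sentence = ["Barack", "Obama", "was", "born", "in", "Honolulu", "."]
spans = [(0, 2), (5, 6)]          # "Barack Obama", "Honolulu"
print(mask_entities(sentence, spans))   # with this seed both mentions are masked
```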
20. Exploiting Unsupervised Data for Emotion Recognition in Conversations [PDF] 返回目录
Wenxiang Jiao, Michael R. Lyu, Irwin King
Abstract: Emotion Recognition in Conversations (ERC) aims to predict the emotional state of speakers in conversations, which is essentially a text classification task. Unlike the sentence-level text classification problem, the available supervised data for the ERC task is limited, which potentially prevents the models from playing their maximum effect. In this paper, we propose a novel approach to leverage unsupervised conversation data, which is more accessible. Specifically, we propose the Conversation Completion (ConvCom) task, which attempts to select the correct answer from candidate answers to fill a masked utterance in a conversation. Then, we Pre-train a basic COntext- Dependent Encoder (Pre-CODE) on the ConvCom task. Finally, we fine-tune the Pre-CODE on the datasets of ERC. Experimental results demonstrate that pre-training on unsupervised data achieves significant improvement of performance on the ERC datasets, particularly on the minority emotion classes.
摘要:情感识别对话(ERC)旨在预测扬声器的情绪状态在谈话中,这基本上是一个文本分类的任务。不同于句子级文本分类问题,对于ERC任务可用的教师数据是有限的,这可能阻止车型从演奏他们的最大作用。在本文中,我们提出了一种新的方法来利用无人监管谈话的数据,这是更容易获得。具体来说,我们建议对话完成(ConvCom)任务,尝试选择候选答案的正确答案,在对话中填充掩盖话语。然后,我们在ConvCom任务前训练基本依赖于上下文的编码器(预编码)。最后,我们微调对ERC的数据集的预编码。实验结果表明,无监督数据前培训达到的性能显著改善的ERC数据集,特别是在少数人的情感类。
21. Dynamic Anticipation and Completion for Multi-Hop Reasoning over Sparse Knowledge Graph [PDF] 返回目录
Xin Lv, Xu Han, Lei Hou, Juanzi Li, Zhiyuan Liu, Wei Zhang, Yichi Zhang, Hao Kong, Suhui Wu
Abstract: Multi-hop reasoning has been widely studied in recent years to seek an effective and interpretable method for knowledge graph (KG) completion. Most previous reasoning methods are designed for dense KGs with enough paths between entities, but cannot work well on those sparse KGs that only contain sparse paths for reasoning. On the one hand, sparse KGs contain less information, which makes it difficult for the model to choose correct paths. On the other hand, the lack of evidential paths to target entities also makes the reasoning process difficult. To solve these problems, we propose a multi-hop reasoning model named DacKGR over sparse KGs, by applying novel dynamic anticipation and completion strategies: (1) The anticipation strategy utilizes the latent prediction of embedding-based models to make our model perform more potential path search over sparse KGs. (2) Based on the anticipation information, the completion strategy dynamically adds edges as additional actions during the path search, which further alleviates the sparseness problem of KGs. The experimental results on five datasets sampled from Freebase, NELL and Wikidata show that our method outperforms state-of-the-art baselines. Our codes and datasets can be obtained from this https URL
摘要:多跳推理已被广泛研究,近年来寻求知识图(KG)完成一个有效的和可解释的方法。以往大多数推理的方法是专为密集幼稚园与实体之间有足够的路径,但不能对这些稀疏幼儿园只包含推理稀疏路径很好地工作。在一方面,稀疏幼儿园含有较少的信息,这使得模型很难选择正确的路径。在另一方面,由于缺乏证据路径目标实体也使得推理过程困难。为了解决这些问题,我们提出了一个名为DacKGR在稀疏幼稚园多跳推理模型,通过应用新型的动态预测和完成战略:(1)预期策略使用基于嵌入的模型的潜在的预测,使我们的模型进行更多的潜力在稀疏幼儿园路径搜索。 (2)基于该预期的信息,完成战略动态路径搜索,这能够进一步缓和幼稚园的稀疏问题过程中添加作为边缘的附加动作。在五个数据集从游离碱,NELL和维基数据采样,实验结果表明,该方法优于国家的最先进的基线。我们的代码和数据集可以从这个HTTPS URL获得
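For readers unfamiliar with multi-hop reasoning over knowledge graphs, the toy sketch below enumerates relation chains between two entities in a tiny, invented KG; DacKGR additionally scores such paths with an embedding-based model and dynamically adds edges during the search.

```python
from collections import deque

kg = {  # head entity -> list of (relation, tail entity); invented triples
    "Q_Paris": [("capital_of", "Q_France")],
    "Q_France": [("official_language", "Q_French")],
}

def relation_paths(kg, src, dst, max_hops=3):
    """Breadth-first enumeration of relation chains from src to dst."""
    paths, queue = [], deque([(src, [])])
    while queue:
        node, path = queue.popleft()
        if node == dst and path:
            paths.append(path)
            continue
        if len(path) < max_hops:
            for rel, tail in kg.get(node, []):
                queue.append((tail, path + [rel]))
    return paths

print(relation_paths(kg, "Q_Paris", "Q_French"))
# [['capital_of', 'official_language']]
```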
22. Exploring Semantic Capacity of Terms [PDF] 返回目录
Jie Huang, Zilong Wang, Kevin Chen-Chuan Chang, Wen-mei Hwu, Jinjun Xiong
Abstract: We introduce and study semantic capacity of terms. For example, the semantic capacity of artificial intelligence is higher than that of linear regression since artificial intelligence possesses a broader meaning scope. Understanding semantic capacity of terms will help many downstream tasks in natural language processing. For this purpose, we propose a two-step model to investigate semantic capacity of terms, which takes a large text corpus as input and can evaluate semantic capacity of terms if the text corpus can provide enough co-occurrence information of terms. Extensive experiments in three fields demonstrate the effectiveness and rationality of our model compared with well-designed baselines and human-level evaluations.
摘要:介绍和研究方面的语义能力。例如,人工智能的语义容量比线性回归的更高,因为人工智能具有更广泛的含义范围。术语的理解语义能力将有助于在自然语言处理许多下游任务。为此,我们提出了一个两步模型来研究方面的语义能力,这需要大量文本语料库作为输入,并可以评估方面的能力语义如果文本语料库能提供方面的足够共同出现的信息。在三个领域有着广泛的实验证明精心设计的基线和人类水平的评估相比,我们的模型的有效性与合理性。
23. PUM at SemEval-2020 Task 12: Aggregation of Transformer-based models' features for offensive language recognition [PDF] 返回目录
Piotr Janiszewski, Mateusz Skiba, Urszula Walińska
Abstract: In this paper, we describe the PUM team's entry to the SemEval-2020 Task 12. Creating our solution involved leveraging two well-known pretrained models used in natural language processing: BERT and XLNet, which achieve state-of-the-art results in multiple NLP tasks. The models were fine-tuned for each subtask separately and features taken from their hidden layers were combined and fed into a fully connected neural network. The model using aggregated Transformer features can serve as a powerful tool for offensive language identification problem. Our team was ranked 7th out of 40 in Sub-task C - Offense target identification with 64.727% macro F1-score and 64th out of 85 in Sub-task A Offensive language identification (89.726% F1-score).
摘要:在本文中,我们描述了PUM球队进入到SemEval-2020任务12.创建我们的解决方案涉及利用自然语言处理使用了两个著名的预先训练模式:BERT和XLNet,其实现国家的最先进的结果在多个NLP任务。该模型被微调,用于分别每个子任务和特性从他们的隐藏层采取合并,并送入完全连接的神经网络。使用聚合变压器特征的模型可以作为攻击性语言识别问题的有力工具。在子任务A冒犯的语言识别(89.726%F1-分)进攻目标识别与64.727%的宏F1-得分和第64出来的85 - 我们的团队是排在子任务C的40第七出来。
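The reported metric, macro-averaged F1, is the unweighted mean of per-class F1 scores, so rare target classes count as much as frequent ones. A minimal sketch with invented predictions; the IND/GRP/OTH labels follow the OffensEval sub-task C label set.

```python
from sklearn.metrics import f1_score

y_true = ["IND", "GRP", "OTH", "IND", "GRP", "IND"]   # invented gold labels
y_pred = ["IND", "GRP", "IND", "IND", "OTH", "IND"]   # invented predictions

# mean of per-class F1 scores
print(f1_score(y_true, y_pred, average="macro"))       # ~0.508
```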
24. Regularizing Dialogue Generation by Imitating Implicit Scenarios [PDF] 返回目录
Shaoxiong Feng, Xuancheng Ren, Hongshen Chen, Bin Sun, Kan Li, Xu Sun
Abstract: Human dialogues are scenario-based and appropriate responses generally relate to the latent context knowledge entailed by the specific scenario. To enable responses that are more meaningful and context-specific, we propose to improve generative dialogue systems from the scenario perspective, where both dialogue history and future conversation are taken into account to implicitly reconstruct the scenario knowledge. More importantly, the conversation scenarios are further internalized using imitation learning framework, where the conventional dialogue model that has no access to future conversations is effectively regularized by transferring the scenario knowledge contained in hierarchical supervising signals from the scenario-based dialogue model, so that the future conversation is not required in actual inference. Extensive evaluations show that our approach significantly outperforms state-of-the-art baselines on diversity and relevance, and expresses scenario-specific knowledge.
摘要:人的对话是基于具体情况作出适当反应通常涉及通过特定的情景entailed潜在背景知识。为了让那些更有意义和背景,具体的反应,我们建议从改善方案的角度来看,其中两个对话历史与未来的对话都考虑到隐含重建方案的知识生成的对话系统。更重要的是,对话的场景是用模仿学习框架,其中有对未来的对话没有进入传统的对话模式被有效地转移包含在从基于场景的对话模式分级监管的信号场景知识正进一步内化,从而使未来的对话不实际的推断需要。广泛的评估表明,我们的方法显著优于多样性和相关性,并表示具体情景知识的国家的最先进的基线。
25. A Pilot Study of Text-to-SQL Semantic Parsing for Vietnamese [PDF] 返回目录
Anh Tuan Nguyen, Mai Hoang Dao, Dat Quoc Nguyen
Abstract: Semantic parsing is an important NLP task. However, Vietnamese is a low-resource language in this research area. In this paper, we present the first public large-scale Text-to-SQL semantic parsing dataset for Vietnamese. We extend and evaluate two strong semantic parsing baselines EditSQL (Zhang et al., 2019) and IRNet (Guo et al., 2019) on our dataset. We compare the two baselines with key configurations and find that: automatic Vietnamese word segmentation improves the parsing results of both baselines; the normalized pointwise mutual information (NPMI) score (Bouma, 2009) is useful for schema linking; latent syntactic features extracted from a neural dependency parser for Vietnamese also improve the results; and the monolingual language model PhoBERT for Vietnamese (Nguyen and Nguyen, 2020) helps produce higher performances than the recent best multilingual language model XLM-R (Conneau et al., 2020).
摘要:语义分析是一种重要的NLP任务。不过,越南在这一研究领域的低资源语言。在本文中,我们提出的第一次公开大规模文本到SQL语义分析数据集越南。我们对我们的数据扩展和评估两个强语义分析基线EditSQL(Zhang等,2019)和IRNet(Guo等,2019)。我们比较这两个基线与主要配置和发现:越南自动分词同时提高基线的分析结果;归一化的逐点互信息(NPMI)评分(鲍马,2009)是用于模式连接有用;从一个神经依赖解析器提取越南也改善结果潜句法特征;和越南(Nguyen和阮,2020年)的单语语言模型PhoBERT有助于产生较近期最好多语种语言模型XLM-R更高的性能(Conneau等,2020)。
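The NPMI score (Bouma, 2009) mentioned for schema linking normalizes pointwise mutual information by -log p(x, y), bounding it in [-1, 1]. A small sketch with invented co-occurrence probabilities:

```python
import math

def npmi(p_xy, p_x, p_y):
    """Normalized pointwise mutual information (Bouma, 2009), in [-1, 1]."""
    if p_xy == 0:
        return -1.0                       # never co-occur -> minimum score
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / (-math.log(p_xy))

# e.g. a question token and a column name co-occurring in 2% of examples,
# appearing individually in 5% and 10% of examples (invented numbers)
print(round(npmi(0.02, 0.05, 0.10), 3))   # 0.354
```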
26. "LazImpa": Lazy and Impatient neural agents learn to communicate efficiently [PDF] 返回目录
Mathieu Rita, Rahma Chaabouni, Emmanuel Dupoux
Abstract: Previous work has shown that artificial neural agents naturally develop surprisingly non-efficient codes. This is illustrated by the fact that in a referential game involving a speaker and a listener neural networks optimizing accurate transmission over a discrete channel, the emergent messages fail to achieve an optimal length. Furthermore, frequent messages tend to be longer than infrequent ones, a pattern contrary to the Zipf Law of Abbreviation (ZLA) observed in all natural languages. Here, we show that near-optimal and ZLA-compatible messages can emerge, but only if both the speaker and the listener are modified. We hence introduce a new communication system, "LazImpa", where the speaker is made increasingly lazy, i.e. avoids long messages, and the listener impatient, i.e.,~seeks to guess the intended content as soon as possible.
摘要:以前的工作表明,人工神经代理商自然发展令人惊讶的非效率的代码。这是通过这样的事实,在涉及的扬声器和收听者的神经网络在离散信道优化精确传输的参考游戏,紧急消息不能达到最佳的长度示出。此外,频繁的消息往往比那些不常长,模式违背了缩写的齐普夫定律(ZLA)在所有的自然语言观察。在这里,我们表明,接近最优和ZLA兼容的消息可以出现,但只有两个扬声器和监听器被修改。因此,我们引入一个新的通信系统,“LazImpa”,其中扬声器搞得越来越懒,即避免了长消息,与听者不耐烦,即寻求〜猜意的内容,尽快。
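The Zipf Law of Abbreviation referenced above states that more frequent words tend to be shorter. A quick, hedged way to check it on any corpus is to rank-correlate word frequency with word length; the toy corpus below is invented purely to show the computation.

```python
from collections import Counter
from scipy.stats import spearmanr

corpus = "a a a a the the the cat cat elephant hippopotamus".split()
freq = Counter(corpus)
words = list(freq)

rho, _ = spearmanr([freq[w] for w in words], [len(w) for w in words])
print(rho)   # ~ -0.95 on this toy: frequent words are shorter, as ZLA predicts
```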
27. Linguistic Profiling of a Neural Language Model [PDF] 返回目录
Alessio Miaschi, Dominique Brunato, Felice Dell'Orletta, Giulia Venturi
Abstract: In this paper we investigate the linguistic knowledge learned by a Neural Language Model (NLM) before and after a fine-tuning process and how this knowledge affects its predictions during several classification problems. We use a wide set of probing tasks, each of which corresponds to a distinct sentence-level feature extracted from different levels of linguistic annotation. We show that BERT is able to encode a wide range of linguistic characteristics, but it tends to lose this information when trained on specific downstream tasks. We also find that BERT's capacity to encode different kind of linguistic properties has a positive influence on its predictions: the more it stores readable linguistic information, the higher will be its capacity of predicting the correct label.
摘要:本文通过调查前神经语言模型(NLM)和微调的过程之后,如何这方面的知识在几个分类问题影响其预测学到的语言知识。我们采用了一系列广泛的探测任务,每个任务对应于不同层次的语言注释中提取的不同句子级特性。我们表明,BERT能够编码范围广的语言特点,但它往往在特定的下游任务训练的时候失去了这个信息。我们还发现,BERT的能力,编码不同的语言特性在它的预测有积极的影响:更多的则存储读取的语言信息,将越高其预测正确的标签的能力。
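A sketch of the generic probing protocol the paper relies on: freeze the representations, fit a simple classifier to predict one sentence-level linguistic feature, and read the score as a measure of how much of that feature is encoded. Random vectors stand in for the BERT activations here, so the printed accuracy is meaningless and only illustrates the setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 768))        # placeholder for frozen sentence features
y = rng.integers(0, 2, size=200)       # placeholder sentence-level feature labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(probe.score(X_te, y_te))         # ~0.5 on random data, by construction
```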
28. Discern: Discourse-Aware Entailment Reasoning Network for Conversational Machine Reading [PDF] 返回目录
Yifan Gao, Chien-Sheng Wu, Jingjing Li, Shafiq Joty, Steven C.H. Hoi, Caiming Xiong, Irwin King, Michael R. Lyu
Abstract: Document interpretation and dialog understanding are the two major challenges for conversational machine reading. In this work, we propose Discern, a discourse-aware entailment reasoning network to strengthen the connection and enhance the understanding for both document and dialog. Specifically, we split the document into clause-like elementary discourse units (EDU) using a pre-trained discourse segmentation model, and we train our model in a weakly-supervised manner to predict whether each EDU is entailed by the user feedback in a conversation. Based on the learned EDU and entailment representations, we either reply to the user our final decision "yes/no/irrelevant" of the initial question, or generate a follow-up question to inquiry more information. Our experiments on the ShARC benchmark (blind, held-out test set) show that Discern achieves state-of-the-art results of 78.3% macro-averaged accuracy on decision making and 64.0 BLEU1 on follow-up question generation. Code and models are released at this https URL.
摘要:原稿解释和对话了解,是谈话机器阅读的两大挑战。在这项工作中,我们提出了识别,话语感知蕴涵推理网络,以加强连接,并增强双方的文档和对话的理解。具体来说,我们拆分文档使用预先训练的话语分割模型into子句状基本话语单元(EDU),并且我们训练我们的模型中的弱监督方式来预测是否每个EDU由用户反馈在交谈entailed 。基于学习EDU和蕴涵表示,我们要么回复用户我们的最终决定最初的问题的“是/否/不相关”,或产生的后续问题询问更多的信息。我们对SHARC基准实验(盲,保留检验集)显示,考辨达到78.3%宏平均准确度对决策国家的先进成果和64.0 BLEU1对后续问题的产生。代码和模型,在此HTTPS URL释放。
29. On the Frailty of Universal POS Tags for Neural UD Parsers [PDF] 返回目录
Mark Anderson, Carlos Gómez-Rodríguez
Abstract: We present an analysis on the effect UPOS accuracy has on parsing performance. Results suggest that leveraging UPOS tags as features for neural parsers requires a prohibitively high tagging accuracy and that the use of gold tags offers a non-linear increase in performance, suggesting some sort of exceptionality. We also investigate what aspects of predicted UPOS tags impact parsing accuracy the most, highlighting some potentially meaningful linguistic facets of the problem.
摘要:我们对效果UPOS精度对解析性能提出了一个分析。结果表明,利用UPOS的标签,功能神经解析器要求过高标注精度和使用黄金标签提供的性能的非线性增加,表明某种例外性的。我们还调查预测UPOS标签的影响分析精度的大部分内容方面,突出问题的一些潜在的有意义的语言方面。
30. GenAug: Data Augmentation for Finetuning Text Generators [PDF] 返回目录
Steven Y. Feng, Varun Gangal, Dongyeop Kang, Teruko Mitamura, Eduard Hovy
Abstract: In this paper, we investigate data augmentation for text generation, which we call GenAug. Text generation and language modeling are important tasks within natural language processing, and are especially challenging for low-data regimes. We propose and evaluate various augmentation methods, including some that incorporate external knowledge, for finetuning GPT-2 on a subset of Yelp Reviews. We also examine the relationship between the amount of augmentation and the quality of the generated text. We utilize several metrics that evaluate important aspects of the generated text including its diversity and fluency. Our experiments demonstrate that insertion of character-level synthetic noise and keyword replacement with hypernyms are effective augmentation methods, and that the quality of generations improves to a peak at approximately three times the amount of original data.
摘要:在本文中,我们探讨文本生成,我们称之为GenAug数据增强。文本生成和语言模型是自然语言处理中的重要任务,而对于低数据制度是特别具有挑战性。我们建议,并评估各种增强的方法,包括一些将外部知识,对于Yelp上评论的一个子集微调GPT-2。我们还检查增大的量和生成的文本的质量之间的关系。我们利用几个指标,评价生成的文本的重要方面,包括它的多样性和流畅性。我们的实验证明字符级合成噪声的插入和关键字替换为上位词被有效扩增的方法,以及世代的质量提高到的峰,在原始数据的大约三倍的量。
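A hedged sketch of two augmentations named in the abstract, character-level synthetic noise and keyword replacement with WordNet hypernyms. The noise rate and the choice of the first synset/hypernym are illustrative assumptions, not GenAug's exact settings.

```python
import random
from nltk.corpus import wordnet  # requires: nltk.download("wordnet")

def char_noise(text, rate=0.05, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Randomly substitute a fraction of alphabetic characters."""
    chars = list(text)
    for i in range(len(chars)):
        if chars[i].isalpha() and random.random() < rate:
            chars[i] = random.choice(alphabet)
    return "".join(chars)

def hypernym_replace(word):
    """Return a WordNet hypernym lemma for `word`, or the word itself."""
    synsets = wordnet.synsets(word)
    if synsets and synsets[0].hypernyms():
        return synsets[0].hypernyms()[0].lemmas()[0].name().replace("_", " ")
    return word

print(char_noise("the pizza was delicious"))
print(hypernym_replace("pizza"))   # 'dish'
```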
31. Pruning Redundant Mappings in Transformer Models via Spectral-Normalized Identity Prior [PDF] 返回目录
Zi Lin, Jeremiah Zhe Liu, Zi Yang, Nan Hua, Dan Roth
Abstract: Traditional (unstructured) pruning methods for a Transformer model focus on regularizing the individual weights by penalizing them toward zero. In this work, we explore spectral-normalized identity priors (SNIP), a structured pruning approach that penalizes an entire residual module in a Transformer model toward an identity mapping. Our method identifies and discards unimportant non-linear mappings in the residual connections by applying a thresholding operator on the function norm. It is applicable to any structured module, including a single attention head, an entire attention block, or a feed-forward subnetwork. Furthermore, we introduce spectral normalization to stabilize the distribution of the post-activation values of the Transformer layers, further improving the pruning effectiveness of the proposed methodology. We conduct experiments with BERT on 5 GLUE benchmark tasks to demonstrate that SNIP achieves effective pruning results while maintaining comparable performance. Specifically, we improve the performance over the state-of-the-art by 0.5 to 1.0% on average at 50% compression ratio.
摘要:传统(非结构化)修剪的变压器模型注重通过惩罚他们走向正规化零个别权重的方法。在这项工作中,我们探索频谱标准化身份先验(SNIP),结构化的修剪方法,在朝向标识映射一个变压器模型惩罚的整个剩余模块。我们的方法识别并丢弃通过在功能规范施加阈值算子不重要中的残余的连接非线性映射。它适用于任何结构的模块,其包括一个单一的注意力头,整个关注块,或前馈子网。此外,我们引入谱归一化,以稳定变压器层的后激活值的分布,从而进一步提高所提出的方法的修剪有效性。我们进行实验与5个GLUE基准任务BERT证明SNIP实现有效的修剪效果,同时保持相当的性能。具体来说,我们提高在50%的压缩比在国家的最先进的性能通过在平均0.5〜1.0%。
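Spectral normalization, used here to stabilize the post-activation distributions before pruning, rescales a weight matrix by its largest singular value so the layer's spectral norm is bounded by 1. A minimal numpy sketch:

```python
import numpy as np

def spectral_normalize(W):
    """Divide W by its spectral norm (largest singular value)."""
    sigma = np.linalg.norm(W, ord=2)      # largest singular value of W
    return W / sigma

W = np.random.randn(8, 8)
W_sn = spectral_normalize(W)
print(np.linalg.norm(W_sn, ord=2))        # ~1.0 after normalization
```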
32. Corpora Evaluation and System Bias Detection in Multi-document Summarization [PDF] 返回目录
Alvin Dey, Tanya Chowdhury, Yash Kumar Atri, Tanmoy Chakraborty
Abstract: Multi-document summarization (MDS) is the task of reflecting key points from any set of documents into a concise text paragraph. In the past, it has been used to aggregate news, tweets, product reviews, etc. from various sources. Owing to no standard definition of the task, we encounter a plethora of datasets with varying levels of overlap and conflict between participating documents. There is also no standard regarding what constitutes summary information in MDS. Adding to the challenge is the fact that new systems report results on a set of chosen datasets, which might not correlate with their performance on the other datasets. In this paper, we study this heterogeneous task with the help of a few widely used MDS corpora and a suite of state-of-the-art models. We make an attempt to quantify the quality of summarization corpus and prescribe a list of points to consider while proposing a new MDS corpus. Next, we analyze the reason behind the absence of an MDS system which achieves superior performance across all corpora. We then observe the extent to which system metrics are influenced, and bias is propagated due to corpus properties. The scripts to reproduce the experiments in this work are available at this https URL.
摘要:多文档文摘(MDS)是任何一组的文件放到一个简洁的文字段落反映了关键点的任务。在过去,它已被用于聚合新闻,微博,产品评价等,从各种来源。由于任务没有标准的定义,我们遇到的数据集过多而交错参与文件之间的冲突不同级别。也有关于什么构成MDS摘要信息还没有标准。而另外一个挑战的事实是,新系统报告一组选定的数据集,这可能不是与他们的其他数据集性能相关的结果。在本文中,我们研究了一些广泛使用的MDS语料库的帮助和一套国家的最先进的机型这个异类任务。我们做出试图量化总结语料库的质量和规定点的列表,同时提出了一个新的MDS语料库考虑。接下来,我们分析了缺乏MDS系统,它实现了在所有语料库性能优越的背后的原因。然后,我们观察到哪个系统度量的影响的程度,并偏压被传播由于语料库特性。重现在这项工作中实验的脚本可在此HTTPS URL。
33. Unsupervised Reference-Free Summary Quality Evaluation via Contrastive Learning [PDF] 返回目录
Hanlu Wu, Tengfei Ma, Lingfei Wu, Tariro Manyumwa, Shouling Ji
Abstract: Evaluation of a document summarization system has been a critical factor to impact the success of the summarization task. Previous approaches, such as ROUGE, mainly consider the informativeness of the assessed summary and require human-generated references for each test summary. In this work, we propose to evaluate the summary qualities without reference summaries by unsupervised contrastive learning. Specifically, we design a new metric which covers both linguistic qualities and semantic informativeness based on BERT. To learn the metric, for each summary, we construct different types of negative samples with respect to different aspects of the summary qualities, and train our model with a ranking loss. Experiments on Newsroom and CNN/Daily Mail demonstrate that our new evaluation method outperforms other metrics even without reference summaries. Furthermore, we show that our method is general and transferable across datasets.
摘要:一个文档文摘系统的评估一直是影响汇总任务成功的关键因素。以前的方法,如胭脂,主要考虑分摊摘要的信息量,并且需要为每个测试总结人类生成的引用。在这项工作中,我们提出了通过无监督对比学习评价不参考摘要摘要素质。具体来说,我们设计了涵盖语言素质和基于语义BERT一个信息量新的度量。要了解相应指标,对每个综上所述,我们构建不同类型的阴性样品相对于摘要质量的不同方面,培养我们的模型与排名的损失。对新闻和CNN /每日邮实验表明,我们的新的评价方法优于其他指标,即使没有参考摘要。此外,我们表明,我们的方法是一般和整个数据集转让。
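The metric is trained with a ranking loss that pushes the score of the original summary above the scores of constructed negative samples. A minimal sketch using a margin hinge formulation; the scorer outputs and the margin value are assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def ranking_loss(pos_scores, neg_scores, margin=1.0):
    """Hinge loss pushing each positive score above its negative by `margin`."""
    return F.relu(margin - (pos_scores - neg_scores)).mean()

pos = torch.tensor([0.9, 0.7])     # scores of original (positive) summaries
neg = torch.tensor([0.4, 0.8])     # scores of degraded (negative) samples
print(ranking_loss(pos, neg))      # tensor(0.8000)
```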
34. Improving AMR Parsing with Sequence-to-Sequence Pre-training [PDF] 返回目录
Dongqin Xu, Junhui Li, Muhua Zhu, Min Zhang, Guodong Zhou
Abstract: In the literature, the research on abstract meaning representation (AMR) parsing is much restricted by the size of human-curated dataset which is critical to build an AMR parser with good performance. To alleviate such data size restriction, pre-trained models have been drawing more and more attention in AMR parsing. However, previous pre-trained models, like BERT, are implemented for general purpose which may not work as expected for the specific task of AMR parsing. In this paper, we focus on sequence-to-sequence (seq2seq) AMR parsing and propose a seq2seq pre-training approach to build pre-trained models in both single and joint way on three relevant tasks, i.e., machine translation, syntactic parsing, and AMR parsing itself. Moreover, we extend the vanilla fine-tuning method to a multi-task learning fine-tuning method that optimizes for the performance of AMR parsing while endeavors to preserve the response of pre-trained models. Extensive experimental results on two English benchmark datasets show that both the single and joint pre-trained models significantly improve the performance (e.g., from 71.5 to 80.2 on AMR 2.0), which reaches the state of the art. The result is very encouraging since we achieve this with seq2seq models rather than complex models. We make our code and model available at this https URL.
摘要:在文献中,在抽象意义表示(AMR)解析研究备受人策划的数据集的大小是建立一个AMR解析器具有良好性能的关键限制。为了减轻这种数据大小的限制,预先训练模式已经绘制在AMR解析越来越多的关注。然而,以往的预训练的模型,如BERT,是为如预期的AMR分析的特定任务,可能不起作用的通用实现。在本文中,我们侧重于序列到序列(seq2seq)AMR解析并提出seq2seq前培训的方法来构建预先训练模式在单和联合的方式对三个相关的任务,即,机器翻译,句法分析,和AMR解析本身。此外,我们的香草微调方法扩展到多任务学习微调方法,对于AMR的性能更优的解析,同时尽力维护的预先训练模式的响应。在两个英国标准数据集广泛的实验结果表明,无论是单一的和联合预训练的模型显著改善性能(例如,从71.5至80.2上AMR 2.0),达到现有技术的状态。其结果是非常令人鼓舞的,因为我们有seq2seq模型,而不是复杂的模型,实现这一目标。我们使我们的代码和型号可在此HTTPS URL。
35. Second-Order NLP Adversarial Examples [PDF] 返回目录
John X. Morris
Abstract: Adversarial example generation methods in NLP rely on models like language models or sentence encoders to determine if potential adversarial examples are valid. In these methods, a valid adversarial example fools the model being attacked, and is determined to be semantically or syntactically valid by a second model. Research to date has counted all such examples as errors by the attacked model. We contend that these adversarial examples may not be flaws in the attacked model, but flaws in the model that determines validity. We term such invalid inputs second-order adversarial examples. We propose the constraint robustness curve, and associated metric ACCS, as tools for evaluating the robustness of a constraint to second-order adversarial examples. To generate this curve, we design an adversarial attack to run directly on the semantic similarity models. We test on two constraints, the Universal Sentence Encoder (USE) and BERTScore. Our findings indicate that such second-order examples exist, but are typically less common than first-order adversarial examples in state-of-the-art models. They also indicate that USE is effective as constraint on NLP adversarial examples, while BERTScore is nearly ineffectual. Code for running the experiments in this paper is available \href{this https URL}{here}.
摘要:NLP对抗性例如代方法依赖于像语言模型或句子编码器模型,以确定潜在的敌对例子是有效的。在这些方法中,一个有效对抗例如愚弄被攻击的模型,并且确定由第二模型,以在语义或语法上有效。研究迄今统计的所有这样的例子由攻击模型误差。我们主张,这些对抗性的例子可能不是在攻击模式的缺陷,但瑕疵决定的有效性模型。我们长期这样的无效输入二阶对抗的例子。我们提出的约束鲁棒性曲线,以及相关的指标ACCS,作为评价约束二阶对抗例子的鲁棒性的工具。为了产生这条曲线,我们设计了一个敌对的攻击直接在语义相似的模型运行。我们对两个约束测试,通用句编码器(USE)和BERTScore。我们的研究结果表明,这样的二阶例子存在的,但通常比在国家的最先进的模型一阶对抗性例子不太常见。他们还指出,使用可有效作为NLP对抗例子约束,而BERTScore几乎是无效的。代码本文运行实验可用\ {HREF这HTTPS URL} {}这里。
36. On the Effects of Knowledge-Augmented Data in Word Embeddings [PDF] 返回目录
Diego Ramirez-Echavarria, Antonis Bikakis, Luke Dickens, Rob Miller, Andreas Vlachidis
Abstract: This paper investigates techniques for knowledge injection into word embeddings learned from large corpora of unannotated data. These representations are trained with word cooccurrence statistics and do not commonly exploit syntactic and semantic information from linguistic knowledge bases, which potentially limits their transferability to domains with differing language distributions or usages. We propose a novel approach for linguistic knowledge injection through data augmentation to learn word embeddings that enforce semantic relationships from the data, and systematically evaluate the impact it has on the resulting representations. We show our knowledge augmentation approach improves the intrinsic characteristics of the learned embeddings while not significantly altering their results on a downstream text classification task.
摘要:本文研究技术知识注入从大型语料库未注释的数据了解到字的嵌入。这些表述被训练用字一同出现统计和不常用的利用从语言知识基础,这可能限制其复制性领域与不同的语言分布或用法句法和语义信息。我们提出了语言知识注入一种新的方法通过数据增强学习的是执行从数据语义关系,并系统地评估其对所产生的交涉的影响字的嵌入。我们发现我们的知识隆胸方法提高学习的嵌入的固有特性,而不是显著改变上下游的文本分类任务的结果。
37. Effective Unsupervised Domain Adaptation with Adversarially Trained Language Models [PDF] 返回目录
Thuy-Trang Vu, Dinh Phung, Gholamreza Haffari
Abstract: Recent work has shown the importance of adaptation of broad-coverage contextualised embedding models on the domain of the target task of interest. Current self-supervised adaptation methods are simplistic, as the training signal comes from a small percentage of \emph{randomly} masked-out tokens. In this paper, we show that careful masking strategies can bridge the knowledge gap of masked language models (MLMs) about the domains more effectively by allocating self-supervision where it is needed. Furthermore, we propose an effective training strategy by adversarially masking out those tokens which are harder to reconstruct by the underlying MLM. The adversarial objective leads to a challenging combinatorial optimisation problem over \emph{subsets} of tokens, which we tackle efficiently through relaxation to a variational lowerbound and dynamic programming. On six unsupervised domain adaptation tasks involving named entity recognition, our method strongly outperforms the random masking strategy and achieves up to +1.64 F1 score improvements.
摘要:最近的工作表明广泛覆盖的适应的重要性contextualised感兴趣的目标任务的域模型嵌入。目前自我监督的适应方法是简单的,作为训练信号来自\ {EMPH随机}屏蔽掉令牌的一小部分。在本文中,我们表明,认真屏蔽策略可以通过分配自检需要的地方更有效地弥补掩盖语言模型(的MLM)的有关领域的知识鸿沟。此外,我们提出通过adversarially屏蔽掉那些更难由底层MLM来重建令牌有效的训练策略。敌对目标导致一个具有挑战性的组合优化问题,在\ EMPH令牌{}子集,这是我们为有效遏止通过松弛变分下界和动态规划。在涉及命名实体识别6级监督的领域适应性的任务,我们的方法优于强烈的随机屏蔽策略,并实现高达1.64 F1比分改进。
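The baseline this work improves on is standard MLM self-supervision, where a small random fraction of tokens is masked out for reconstruction. A minimal sketch; the 15% rate follows common BERT practice and is an assumption here, not the paper's exact setting.

```python
import random

def random_mask(tokens, rate=0.15, mask_token="[MASK]"):
    """Return masked tokens plus (index, original token) prediction targets."""
    out, targets = list(tokens), []
    for i in range(len(out)):
        if random.random() < rate:
            targets.append((i, out[i]))
            out[i] = mask_token
    return out, targets

print(random_mask("the trial enrolled forty two patients".split()))
```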
38. Transformer-Based Neural Text Generation with Syntactic Guidance [PDF] 返回目录
Yinghao Li, Rui Feng, Isaac Rehg, Chao Zhang
Abstract: We study the problem of using (partial) constituency parse trees as syntactic guidance for controlled text generation. Existing approaches to this problem use recurrent structures, which not only suffer from the long-term dependency problem but also falls short in modeling the tree structure of the syntactic guidance. We propose to leverage the parallelism of Transformer to better incorporate parse trees. Our method first expands a partial template constituency parse tree to a full-fledged parse tree tailored for the input source text, and then uses the expanded tree to guide text generation. The effectiveness of our model in this process hinges upon two new attention mechanisms: 1) a path attention mechanism that forces one node to attend to only other nodes located in its path in the syntax tree to better incorporate syntax guidance; 2) a multi-encoder attention mechanism that allows the decoder to dynamically attend to information from multiple encoders. Our experiments in the controlled paraphrasing task show that our method outperforms SOTA models both semantically and syntactically, improving the best baseline's BLEU score from 11.83 to 26.27.
摘要:我们研究使用(部分)选区解析树作为控制文本生成语法指导的问题。现有的方法对这个问题反复使用结构,这不仅从长期依赖问题的困扰,但也没有达到在造型语法指导的树结构。我们建议在变压器的并行利用,以更好地将解析树。我们的方法首先扩展部分模板选区解析树为输入原文量身定做一个全面的解析树,然后使用扩展树指南文本生成。我们的模型在此过程中的铰链在两个新的关注机制的有效性:1)的路径注意机制的力量一个节点出席位于其路径在语法树以更好地将语法指引,其他节点; 2)一个多编码器注意机制,其允许解码器动态地参加来自多个编码器的信息。我们在控制意译任务的实验表明我们的方法优于SOTA车型都在语义和语法,提高了最好基线的BLEU得分从11.83到26.27。
39. MCMH: Learning Multi-Chain Multi-Hop Rules for Knowledge Graph Reasoning [PDF] 返回目录
Lu Zhang, Mo Yu, Tian Gao, Yue Yu
Abstract: Multi-hop reasoning approaches over knowledge graphs infer a missing relationship between entities with a multi-hop rule, which corresponds to a chain of relationships. We extend existing works to consider a generalized form of multi-hop rules, where each rule is a set of relation chains. To learn such generalized rules efficiently, we propose a two-step approach that first selects a small set of relation chains as a rule and then evaluates the confidence of the target relationship by jointly scoring the selected chains. A game-theoretical framework is proposed to this end to simultaneously optimize the rule selection and prediction steps. Empirical results show that our multi-chain multi-hop (MCMH) rules result in superior results compared to the standard single-chain approaches, justifying both our formulation of generalized rules and the effectiveness of the proposed learning framework.
摘要:多跳推理方法比知识图推断与多跳规则实体之间的关系缺失,对应于关系链。我们扩展现有作品考虑的多跳的规则,其中每个规则是一组关系链的一种推广形式。为了有效地学习这种普遍的规则,我们提出了一个两步走的方法,首先选择一小部分关系链作为一项规则,然后评估对象的关系通过联合得分所选链的信心。一种游戏-理论框架,提出为此以同时优化规则选择和预测步骤。实证结果表明,我们的多链多跳(MCMH)规则产生更好的结果相比,标准的单链方法,证明了我们的广义规则制定和提出的学习框架的有效性。
40. STORIUM: A Dataset and Evaluation Platform for Machine-in-the-Loop Story Generation [PDF] 返回目录
Nader Akoury, Shufan Wang, Josh Whiting, Stephen Hood, Nanyun Peng, Mohit Iyyer
Abstract: Systems for story generation are asked to produce plausible and enjoyable stories given an input context. This task is underspecified, as a vast number of diverse stories can originate from a single input. The large output space makes it difficult to build and evaluate story generation models, as (1) existing datasets lack rich enough contexts to meaningfully guide models, and (2) existing evaluations (both crowdsourced and automatic) are unreliable for assessing long-form creative text. To address these issues, we introduce a dataset and evaluation platform built from STORIUM, an online collaborative storytelling community. Our author-generated dataset contains 6K lengthy stories (125M tokens) with fine-grained natural language annotations (e.g., character goals and attributes) interspersed throughout each narrative, forming a robust source for guiding models. We evaluate language models fine-tuned on our dataset by integrating them onto STORIUM, where real authors can query a model for suggested story continuations and then edit them. Automatic metrics computed over these edits correlate well with both user ratings of generated stories and qualitative feedback from semi-structured user interviews. We release both the STORIUM dataset and evaluation platform to spur more principled research into story generation.
摘要:系统的故事一代被要求出示给定的输入上下文合理的和愉快的故事。这个任务是尚未,因为不同的故事了广大的可以从一个单一的输入来源。产量大空间使得它难以建立和评估故事一代车型,如(1)现有数据集缺乏足够丰富的情境有意义引导模式,和(2)现有的评估(包括众包和自动)是不可靠的评估长篇创作文本。为了解决这些问题,我们将介绍从STORIUM,在线协作评书界建立了一个数据集和评估平台。我们的作家产生的数据集包含6K冗长的故事(125M令牌)在整个叙述每一个穿插细粒度的自然语言的注释(例如,人物的目标和属性),形成了引导模式强大的源。我们评估的语言模型通过将它们集成到STORIUM,其中真正的作者可以查询建议的故事延续一个模型,然后编辑他们在我们的数据微调。在计算这些编辑自动度量与半结构化面试用户生成的故事和定性反馈的用户都收视率密切相关。我们释放STORIUM数据集和评估平台,既可以刺激更多原则性的研究故事的产生。
41. Reading Comprehension as Natural Language Inference: A Semantic Analysis [PDF] 返回目录
Anshuman Mishra, Dhruvesh Patel, Aparna Vijayakumar, Xiang Li, Pavan Kapanipathi, Kartik Talamadupula
Abstract: In the recent past, Natural Language Inference (NLI) has gained significant attention, particularly given its promise for downstream NLP tasks. However, its true impact is limited and has not been well studied. Therefore, in this paper, we explore the utility of NLI for one of the most prominent downstream tasks, viz. Question Answering (QA). We transform one of the largest available MRC datasets (RACE) into an NLI form, and compare the performances of a state-of-the-art model (RoBERTa) on both these forms. We propose new characterizations of questions, and evaluate the performance of QA and NLI models on these categories. We highlight clear categories for which the model performs better when the data is presented in a coherent entailment form, and others for which a structured question-answer concatenation form works better.
摘要:最近一段时间以来,自然语言推理(NLI)已获得显著的关注,特别是考虑到其下游NLP任务的承诺。然而,其真正的影响是有限的,并没有得到很好的研究。因此,在本文中,我们探讨NLI的效用最突出的下游的任务之一,即问答系统(QA)。我们把最大可用MRC数据集(RACE)到NLI形式之一,并比较这两种形式的一个国家的最先进的模型(罗伯塔)的性能。我们提出的问题,新的刻画,并评估QA和NLI模型对这些类别的表现。我们分别突出清楚哪些类别的模型能够更好的表现,当数据在一个连贯的蕴涵形式呈现,以及层次分明的问答形式串联,。
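The snippet below illustrates the kind of MRC-to-NLI transformation discussed above: a RACE-style multiple-choice item becomes premise-hypothesis pairs by substituting each option into the question. The field names and the blank-filling heuristic are assumptions made for the sketch, not the authors' exact conversion rules.

```python
def race_to_nli(passage, question, options, answer_idx):
    """Convert one multiple-choice reading-comprehension item into NLI pairs.

    The passage becomes the premise; each (question, option) combination is
    rewritten into a declarative hypothesis. Only the gold option yields an
    'entailment' label; all other options yield 'not_entailment'.
    """
    pairs = []
    for i, option in enumerate(options):
        if "_" in question:                     # cloze-style question with a blank
            hypothesis = question.replace("_", option)
        else:                                   # fall back to simple concatenation
            hypothesis = f"{question} {option}"
        label = "entailment" if i == answer_idx else "not_entailment"
        pairs.append({"premise": passage, "hypothesis": hypothesis, "label": label})
    return pairs

example = race_to_nli(
    passage="Marie Curie won the Nobel Prize in Physics in 1903.",
    question="Marie Curie won the Nobel Prize in _.",
    options=["1903", "1911", "1921", "1935"],
    answer_idx=0,
)
for pair in example:
    print(pair["label"], "|", pair["hypothesis"])
```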
42. On Losses for Modern Language Models [PDF] 返回目录
Stephane Aroca-Ouellette, Frank Rudzicz
Abstract: BERT set many state-of-the-art results over varied NLU benchmarks by pre-training over two tasks: masked language modelling (MLM) and next sentence prediction (NSP), the latter of which has been highly criticized. In this paper, we 1) clarify NSP's effect on BERT pre-training, 2) explore fourteen possible auxiliary pre-training tasks, of which seven are novel to modern language models, and 3) investigate different ways to include multiple tasks into pre-training. We show that NSP is detrimental to training due to its context splitting and shallow semantic signal. We also identify six auxiliary pre-training tasks -- sentence ordering, adjacent sentence prediction, TF prediction, TF-IDF prediction, a FastSent variant, and a Quick Thoughts variant -- that outperform a pure MLM baseline. Finally, we demonstrate that using multiple tasks in a multi-task pre-training framework provides better results than using any single auxiliary task. Using these methods, we outperform BERT Base on the GLUE benchmark using fewer than a quarter of the training tokens.
摘要:BERT设置许多由前训练的国家的最先进成果在不同的NLU基准超2项任务:蒙面语言模型(MLM)和下一句预测(NSP),后者一直高度批评。在本文中,我们1)澄清NSP对BERT效应预培训,2)探索14可以辅助前的训练任务,其中7个是新的现代的语言模型,以及3)调查不同的方式包括多任务分成预训练。我们表明,NSP是不利的训练,由于它的上下文分裂和浅层语义信号。我们还识别出6种辅助前培训任务 - 句子顺序,相邻的句子预测,TF预测,TF-IDF预测,一个FastSent变的,快速的思考变种 - 即超越纯粹的传销基线。最后,我们证明了在多任务前培训框架使用多任务提供了比使用任何单一的辅助任务更好的结果。使用这些方法,我们跑赢上使用除培训令牌的四分之一少胶基准BERT基地。
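As a toy illustration of the multi-task pre-training framework described above, the PyTorch sketch below combines an MLM loss with one auxiliary objective (TF-IDF prediction) in a single loss. The tiny stand-in encoder, dimensions, random targets, and equal loss weighting are placeholder assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

VOCAB, HIDDEN = 1000, 64

class TinyEncoder(nn.Module):
    """Stand-in for a Transformer encoder: embeddings plus one linear layer."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, HIDDEN)
        self.proj = nn.Linear(HIDDEN, HIDDEN)
        self.mlm_head = nn.Linear(HIDDEN, VOCAB)     # predicts masked tokens
        self.tfidf_head = nn.Linear(HIDDEN, 1)       # predicts each token's TF-IDF weight

    def forward(self, ids):
        h = torch.relu(self.proj(self.emb(ids)))     # (batch, seq, hidden)
        return self.mlm_head(h), self.tfidf_head(h).squeeze(-1)

model = TinyEncoder()
ids = torch.randint(0, VOCAB, (2, 16))               # fake batch of token ids
mlm_targets = torch.randint(0, VOCAB, (2, 16))       # fake masked-token labels
tfidf_targets = torch.rand(2, 16)                    # fake per-token TF-IDF values

mlm_logits, tfidf_pred = model(ids)
loss_mlm = nn.functional.cross_entropy(mlm_logits.view(-1, VOCAB), mlm_targets.view(-1))
loss_aux = nn.functional.mse_loss(tfidf_pred, tfidf_targets)

# Multi-task objective: the paper mixes several auxiliary tasks with MLM;
# equal weighting here is an arbitrary choice for the sketch.
loss = loss_mlm + loss_aux
loss.backward()
print(float(loss))
```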
43. DLGNet-Task: An End-to-end Neural Network Framework for Modeling Multi-turn Multi-domain Task-Oriented Dialogue [PDF] 返回目录
Oluwatobi O. Olabiyi, Prarthana Bhattarai, C. Bayan Bruss, Zachary Kulis
Abstract: Task oriented dialogue (TOD) requires the complex interleaving of a number of individually controllable components with strong guarantees for explainability and verifiability. This has made it difficult to adopt the multi-turn multi-domain dialogue generation capabilities of streamlined end-to-end open-domain dialogue systems. In this paper, we present a new framework, DLGNet-Task, a unified task-oriented dialogue system which employs autoregressive transformer networks such as DLGNet and GPT-2/3 to complete user tasks in multi-turn multi-domain conversations. Our framework enjoys the controllable, verifiable, and explainable outputs of modular approaches, and the low development, deployment and maintenance cost of end-to-end systems. Treating open-domain system components as additional TOD system modules allows DLGNet-Task to learn the joint distribution of the inputs and outputs of all the functional blocks of existing modular approaches such as, natural language understanding (NLU), state tracking, action policy, as well as natural language generation (NLG). Rather than training the modules individually, as is common in real-world systems, we trained them jointly with appropriate module separations. When evaluated on the MultiWOZ2.1 dataset, DLGNet-Task shows comparable performance to the existing state-of-the-art approaches. Furthermore, using DLGNet-Task in conversational AI systems reduces the level of effort required for developing, deploying, and maintaining intelligent assistants at scale.
摘要:基于任务的对话(TOD)需要许多独立可控器件与explainability和可验证性强保障的复杂交织。这使得它很难采用多转向精简终端到终端的开放领域的对话系统的多领域对话生成功能。在本文中,我们提出了一个新的框架,DLGNet任务,统一面向任务的对话系统,该系统采用自回归变压器网络如DLGNet和GPT-2/3在多转多领域的对话完整的用户任务。我们的框架中享有的模块化方法可控,可核实和解释的输出和低开发,部署和终端到终端系统的维护成本。治疗开域系统的部件附加TOD系统模块允许DLGNet任务学习的现有模块化接近所有的功能块,诸如,自然语言理解(NLU),状态跟踪,操作策略的输入和输出的联合分布,以及自然语言生成(NLG)。而不是单独训练模块,是在现实世界系统中常见的,我们有相应的模块分离训练他们联合。当在数据集MultiWOZ2.1,评价DLGNet任务示出相当的性能与现有状态的最先进的方法。此外,在对话AI系统使用DLGNet任务减少的努力用于开发,部署和大规模维护智能助手所要求的水平。
44. Weakly-supervised Fine-grained Event Recognition on Social Media Texts for Disaster Management [PDF] 返回目录
Wenlin Yao, Cheng Zhang, Shiva Saravanan, Ruihong Huang, Ali Mostafavi
Abstract: People increasingly use social media to report emergencies, seek help or share information during disasters, which makes social networks an important tool for disaster management. To meet these time-critical needs, we present a weakly supervised approach for rapidly building high-quality classifiers that label each individual Twitter message with fine-grained event categories. Most importantly, we propose a novel method to create high-quality labeled data in a timely manner that automatically clusters tweets containing an event keyword and asks a domain expert to disambiguate event word senses and label clusters quickly. In addition, to process extremely noisy and often rather short user-generated messages, we enrich tweet representations using preceding context tweets and reply tweets in building event recognition classifiers. The evaluation on two hurricanes, Harvey and Florence, shows that using only 1-2 person-hours of human supervision, the rapidly trained weakly supervised classifiers outperform supervised classifiers trained using more than ten thousand annotated tweets created in over 50 person-hours.
摘要:人们越来越多地使用社交媒体来报道突发事件,寻求灾难时帮助或共享信息,这使得社交网络进行灾害管理的重要工具。为了满足这些时间紧迫的需求,我们提出了一个快速构建该标签与细粒度事件类别每个单独的Twitter消息,高品质的分类弱监督方法。最重要的是,我们提出了一个新颖的方法及时,包含自动集群鸣叫事件的关键字,并询问领域专家的歧义事件词义和标签集群快速创建高质量的标签数据。此外,处理极其嘈杂,往往相当短的用户生成的消息,我们丰富且前面上下文鸣叫鸣叫陈述和建设事件识别分类回复鸣叫。在两次飓风,哈维和佛罗伦萨,实践证明,采用只有1-2人时人的监督,迅速训练的弱监督分类跑赢使用超过50人时创造超过一万个注释鸣叫训练的监督分类评价。
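A rough sketch of the cluster-then-label workflow described above, assuming scikit-learn is available: tweets containing an event keyword are clustered, and an expert labels whole clusters rather than individual tweets. The keyword, example tweets, and the number of clusters are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Tweets that all contain the event keyword "flood" but use it in different senses.
tweets = [
    "Streets near downtown are under water, flood rescue teams needed",
    "Flood warning issued for the river basin tonight",
    "A flood of memes after the game last night",
    "My inbox is a flood of emails this morning",
    "Volunteers needed for flood shelter at the high school",
    "Power outage reported in flooded neighborhoods",
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(tweets)
clusters = KMeans(n_clusters=2, random_state=0, n_init=10).fit_predict(vectors)

# A domain expert inspects each cluster once and assigns a fine-grained event
# label (or discards it as an irrelevant word sense), which is far cheaper
# than annotating every tweet individually.
for cid in sorted(set(clusters)):
    print(f"cluster {cid}:")
    for text, c in zip(tweets, clusters):
        if c == cid:
            print("  ", text)
```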
45. Optimal Neural Program Synthesis from Multimodal Specifications [PDF] 返回目录
Xi Ye, Qiaochu Chen, Isil Dillig, Greg Durrett
Abstract: Multimodal program synthesis, which leverages different types of user input to synthesize a desired program, is an attractive way to scale program synthesis to challenging settings; however, it requires integrating noisy signals from the user (like natural language) with hard constraints on the program's behavior. This paper proposes an optimal neural synthesis approach where the goal is to find a program that satisfies user-provided constraints while also maximizing the program's score with respect to a neural model. Specifically, we focus on multimodal synthesis tasks in which the user intent is expressed using a combination of natural language (NL) and input-output examples. At the core of our method is a top-down recurrent neural model that places distributions over abstract syntax trees conditioned on the NL input. This model not only allows for efficient search over the space of syntactically valid programs, but it allows us to leverage automated program analysis techniques for pruning the search space based on infeasibility of partial programs with respect to the user's constraints. The experimental results on a multimodal synthesis dataset (StructuredRegex) show that our method substantially outperforms prior state-of-the-art techniques in terms of accuracy, finds model-optimal programs more frequently, and explores fewer states during search.
摘要:多模式程序合成,它利用不同类型的用户输入,以合成所希望的节目,是一个有吸引力的方式来扩展程序合成具有挑战性设置;但是,它需要从用户使用的程序的行为硬约束积分噪声信号(如自然语言)。本文提出了一种最佳的神经合成方法,其目的是要找到一个方案,满足用户提供的限制,同时也对于最大限度地提高程序的得分与神经网络模型。具体而言,我们专注于其中用户意图使用自然语言(NL)和输入输出的例子的组合来表示多峰合成任务。在我们的方法的核心是一个自上而下的递归神经模型,该模型在抽象语法树的地方分布条件的NL输入。这种模式不仅可以超过语法有效的程序空间高效的搜索,但它使我们能够利用自动化的程序分析技术用于修剪基于相对于用户的限制部分节目不可行的搜索空间。上的多峰合成数据集(StructuredRegex)显示,我们的方法基本上在准确性%计优于先前状态的最先进的技术的实验结果,在搜索期间找到模型最优方案更频繁,并探讨了更少的状态。
46. Local Additivity Based Data Augmentation for Semi-supervised NER [PDF] 返回目录
Jiaao Chen, Zhenghui Wang, Ran Tian, Zichao Yang, Diyi Yang
Abstract: Named Entity Recognition (NER) is one of the first stages in deep language understanding yet current NER models heavily rely on human-annotated data. In this work, to alleviate the dependence on labeled data, we propose a Local Additivity based Data Augmentation (LADA) method for semi-supervised NER, in which we create virtual samples by interpolating sequences close to each other. Our approach has two variations: Intra-LADA and Inter-LADA, where Intra-LADA performs interpolations among tokens within one sentence, and Inter-LADA samples different sentences to interpolate. Through linear additions between sampled training data, LADA creates an infinite amount of labeled data and improves both entity and context learning. We further extend LADA to the semi-supervised setting by designing a novel consistency loss for unlabeled data. Experiments conducted on two NER benchmarks demonstrate the effectiveness of our methods over several strong baselines. We have publicly released our code at this https URL.
摘要:命名实体识别(NER)是深语言的第一阶段尚未认识当前NER模型在很大程度上依赖于人注释的数据之一。在这项工作中,以减轻标记数据的依赖性,我们提出了半监督NER基于本地可加数据扩张(LADA)方法,在此我们通过彼此接近内插序列创建虚拟样品。我们的方法有两个变化:帧内LADA和跨LADA,其中帧内LADA一个句子中的令牌之间进行插值,并且跨LADA样本不同的句子进行插值。通过采样的训练数据之间的线性增加,LADA创建标记数据的无限量和改善了实体和上下文学习。我们通过设计为未标记数据的一致性新颖损失进一步延伸LADA到半监督设置。两个NER基准进行的实验证明我们的方法在几个强大的基线的有效性。我们在此HTTPS URL已公开发布我们的代码。
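The numpy sketch below shows the mixup-style interpolation at the heart of the Intra-LADA variant: token embeddings and one-hot tag distributions from the same sentence are mixed to form a virtual sample. The Beta parameters and the random-permutation pairing are assumptions made for the illustration, not the exact procedure in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
SEQ_LEN, EMB_DIM, NUM_TAGS = 6, 8, 5

# Fake token embeddings and one-hot NER tag distributions for one sentence.
emb = rng.normal(size=(SEQ_LEN, EMB_DIM))
tags = np.eye(NUM_TAGS)[rng.integers(0, NUM_TAGS, size=SEQ_LEN)]

# Intra-LADA style: pair each token with another token of the same sentence
# (here via a random permutation) and interpolate inputs and label distributions.
perm = rng.permutation(SEQ_LEN)
lam = rng.beta(8.0, 8.0)                      # mixing coefficient; Beta(8, 8) is an assumption

virtual_emb = lam * emb + (1.0 - lam) * emb[perm]
virtual_tags = lam * tags + (1.0 - lam) * tags[perm]

print("lambda =", round(float(lam), 3))
print("virtual label distribution of token 0:", np.round(virtual_tags[0], 2))
```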
47. Multi-View Sequence-to-Sequence Models with Conversational Structure for Abstractive Dialogue Summarization [PDF] 返回目录
Jiaao Chen, Diyi Yang
Abstract: Text summarization is one of the most challenging and interesting problems in NLP. Although much attention has been paid to summarizing structured text like news reports or encyclopedia articles, summarizing conversations---an essential part of human-human/machine interaction where most important pieces of information are scattered across various utterances of different speakers---remains relatively under-investigated. This work proposes a multi-view sequence-to-sequence model by first extracting conversational structures of unstructured daily chats from different views to represent conversations and then utilizing a multi-view decoder to incorporate different views to generate dialogue summaries. Experiments on a large-scale dialogue summarization corpus demonstrated that our methods significantly outperformed previous state-of-the-art models via both automatic evaluations and human judgment. We also discussed specific challenges that current approaches faced with this task. We have publicly released our code at this https URL.
摘要:文摘是在NLP最具挑战性和有趣的问题之一。虽然备受关注已经支付给总结结构类似的新闻报道或百科全书文章,总结对话---人与人的重要组成部分/人机交互,其中的信息最重要的部分是分散在不同的扬声器的各种发音---遗体文本相对不足的影响。这项工作提出了通过第一提取来自不同视图的非结构化每日聊天会话结构以表示对话,然后利用多视图解码器,能把不同视图来生成对话摘要的多视图序列到序列模型。在大规模的对话摘要文集实验证明我们的方法同时通过自动评估和人工判断显著优于国家的最先进的以往机型。我们还讨论了面临这一任务的当前方法的具体挑战。我们在此HTTPS URL已公开发布我们的代码。
48. Improving Target-side Lexical Transfer in Multilingual Neural Machine Translation [PDF] 返回目录
Luyu Gao, Xinyi Wang, Graham Neubig
Abstract: To improve the performance of Neural Machine Translation (NMT) for low-resource languages (LRL), one effective strategy is to leverage parallel data from a related high-resource language (HRL). However, multilingual data has been found more beneficial for NMT models that translate from the LRL to a target language than the ones that translate into the LRLs. In this paper, we aim to improve the effectiveness of multilingual transfer for NMT models that translate into the LRL, by designing a better decoder word embedding. Extending upon a general-purpose multilingual encoding method, Soft Decoupled Encoding (SDE), we propose DecSDE, an efficient character n-gram based embedding specifically designed for the NMT decoder. Our experiments show that DecSDE leads to consistent gains of up to 1.8 BLEU on translation from English to four different languages.
摘要:为了提高神经机器翻译〜(NMT)的资源少的语言〜(LRL)的性能,一个有效的策略是利用并行数据来自相关资源丰富的语言〜(HRL)。然而,多语种数据已发现NMT模型,从LRL转化为比翻译成LRLS的那些目标语言更有益。在本文中,我们的目标是提高NMT模型,通过设计一个更好的解码字嵌入翻译\ {EMPH到} LRL的,多语种传播的有效性。在一个通用的多语言编码方法软解耦编码〜\ citep {SDE}延伸,我们提出DecSDE,一种有效的字符基于n元语法包埋专门为NMT译码器设计。我们的实验表明,DecSDE导致高达1.8 BLEU对翻译的英语四种不同的语言一致的收益。
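To make the character n-gram idea concrete, here is a small sketch of a word vector assembled from hashed character n-gram embeddings, in the spirit of DecSDE. The hashing scheme, n-gram range, bucket count, and dimensions are assumptions for the sketch, not the paper's specification.

```python
import numpy as np

N_BUCKETS, DIM = 2048, 16
rng = np.random.default_rng(0)
ngram_table = rng.normal(scale=0.1, size=(N_BUCKETS, DIM))  # one embedding per hash bucket

def char_ngrams(word, n_min=1, n_max=4):
    padded = f"<{word}>"                       # boundary markers, FastText-style
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            yield padded[i:i + n]

def word_vector(word):
    """Average the (hashed) character n-gram embeddings to get one word vector."""
    idx = [hash(g) % N_BUCKETS for g in char_ngrams(word)]
    return ngram_table[idx].sum(axis=0) / max(len(idx), 1)

# Related surface forms share many n-grams and therefore end up with similar
# vectors -- the property a character-aware decoder embedding wants to exploit.
v1, v2 = word_vector("nation"), word_vector("national")
cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
print("cosine(nation, national) =", round(float(cos), 3))
```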
49. Generating Dialogue Responses from a Semantic Latent Space [PDF] 返回目录
Wei-Jen Ko, Avik Ray, Yilin Shen, Hongxia Jin
Abstract: Existing open-domain dialogue generation models are usually trained to mimic the gold response in the training set using cross-entropy loss on the vocabulary. However, a good response does not need to resemble the gold response, since there are multiple possible responses to a given prompt. In this work, we hypothesize that the current models are unable to integrate information from multiple semantically similar valid responses of a prompt, resulting in the generation of generic and uninformative responses. To address this issue, we propose an alternative to the end-to-end classification on vocabulary. We learn the pair relationship between the prompts and responses as a regression task on a latent space instead. In our novel dialog generation model, the representations of semantically related sentences are close to each other on the latent space. Human evaluation showed that learning the task on a continuous space can generate responses that are both relevant and informative.
摘要:现有的开放域对话代车型通常被训练模拟使用的词汇交叉熵损失训练集黄金响应。然而,良好的反应并不需要像黄金响应,因为有一个及时给予多个可能的响应。在这项工作中,我们假设目前的模型无法从信息提示的多重语义相似有效响应积分,导致通用和无信息反应的产生。为了解决这个问题,我们建议对词汇的端至端的分类替代。我们学习上的潜在空间的提示和响应的回归任务之间的关系,对代替。在我们的新的对话生成模型,语义相关的句子的表示是相互接近的潜在空间。人的评价结果表明,学习上的连续空间的任务可以生成既相关和翔实的答复。
50. Inquisitive Question Generation for High Level Text Comprehension [PDF] 返回目录
Wei-Jen Ko, Te-Yuan Chen, Yiyan Huang, Greg Durrett, Junyi Jessy Li
Abstract: Inquisitive probing questions come naturally to humans in a variety of settings, but generating them is a challenging task for automatic systems. One natural type of question to ask tries to fill a gap in knowledge during text comprehension, like reading a news article: we might ask about background information, deeper reasons behind things occurring, or more. Despite recent progress with data-driven approaches, generating such questions is beyond the range of models trained on existing datasets. We introduce INQUISITIVE, a dataset of ~19K questions that are elicited while a person is reading through a document. Compared to existing datasets, INQUISITIVE questions target more towards high-level (semantic and discourse) comprehension of text. We show that readers engage in a series of pragmatic strategies to seek information. Finally, we evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions although the task is challenging, and highlight the importance of context to generate INQUISITIVE questions.
摘要:好奇探索性的问题来自然给人类的各种设置,但不支持自动系统一项艰巨的任务。一个很自然的类型的问题要问尝试文本理解中,以填补知识的差距,就像读一本新闻文章:我们可能会问有关的背景资料,事物背后更深层次的原因发生的,或更多。尽管有数据驱动方法的最新进展,产生这样的问题已经超出了培训了现有数据集模型的范围内。我们引进好奇的〜19K问题的数据集,而一个人通过的文件读取被激发。相比于现有的数据集,好奇的问题,目标更多地转向高层次的文本(语义和话语)理解。我们发现,读者参与了一系列务实的策略来寻求信息。最后,我们评估基于GPT-2提问代车型,并表明我们的模型是能够产生合理的问题,尽管任务具有挑战性,并突出背景的重要性,产生好奇的问题。
51. An Empirical Study on Large-Scale Multi-Label Text Classification Including Few and Zero-Shot Labels [PDF] 返回目录
Ilias Chalkidis, Manos Fergadiotis, Sotiris Kotitsas, Prodromos Malakasiotis, Nikolaos Aletras, Ion Androutsopoulos
Abstract: Large-scale Multi-label Text Classification (LMTC) has a wide range of Natural Language Processing (NLP) applications and presents interesting challenges. First, not all labels are well represented in the training set, due to the very large label set and the skewed label distributions of LMTC datasets. Also, label hierarchies and differences in human labelling guidelines may affect graph-aware annotation proximity. Finally, the label hierarchies are periodically updated, requiring LMTC models capable of zero-shot generalization. Current state-of-the-art LMTC models employ Label-Wise Attention Networks (LWANs), which (1) typically treat LMTC as flat multi-label classification; (2) may use the label hierarchy to improve zero-shot learning, although this practice is vastly understudied; and (3) have not been combined with pre-trained Transformers (e.g. BERT), which have led to state-of-the-art results in several NLP benchmarks. Here, for the first time, we empirically evaluate a battery of LMTC methods from vanilla LWANs to hierarchical classification approaches and transfer learning, on frequent, few, and zero-shot learning on three datasets from different domains. We show that hierarchical methods based on Probabilistic Label Trees (PLTs) outperform LWANs. Furthermore, we show that Transformer-based approaches outperform the state-of-the-art in two of the datasets, and we propose a new state-of-the-art method which combines BERT with LWANs. Finally, we propose new models that leverage the label hierarchy to improve few and zero-shot learning, considering on each dataset a graph-aware annotation proximity measure that we introduce.
摘要:大型多标签文本分类(LMTC)具有广泛的自然语言处理(NLP)的应用,并提出有趣的挑战。首先,并不是所有的标签以及在训练集表示,由于非常大的标签组和LMTC数据集的偏态分布的标签。此外,标签层次结构和人类标签准则差异可能会影响图形感知注释接近。最后,标签层次定期更新,需要能够零射门泛化LMTC模型。当前状态的最先进的LMTC模型雇用的Label-明智注意网络(LWANs),其(1)典型地对待LMTC为平坦多标记分类; (2)可以使用标签的层次结构,以提高零射门的学习,虽然这种做法极大地充分研究;和(3)没有被结合预训练的变压器(例如BERT),这些都导致了国家的最先进的结果在几个NLP基准。在这里,第一次,我们要凭经验分层分类方法评估的香草LWANs LMTC方法电池和转让学习,对来自不同域的三个数据集中频繁,很少和零次学习。我们展示基于概率标签树(血小板)跑赢大市LWANs是分层的方法。此外,我们表明,基于变压器的方法优于国家的最先进的两个数据集,并提出了一个新的国家的最先进的方法,该方法与LWANs结合BERT。最后,我们提出了新的模式,充分利用标签的层次结构,以改善少数和零射门学习,考虑到每个数据集的图形感知注解接近措施,我们介绍。
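For concreteness, a small PyTorch sketch of a Label-Wise Attention Network head, the LWAN component discussed above: one attention distribution per label over the token representations, followed by a per-label scorer. The shapes, the shared encoder, and the single-layer scorer are placeholder assumptions.

```python
import torch
import torch.nn as nn

class LabelWiseAttention(nn.Module):
    """One attention distribution per label over token states (a plain LWAN head)."""
    def __init__(self, hidden, num_labels):
        super().__init__()
        self.label_queries = nn.Parameter(torch.randn(num_labels, hidden))
        self.out = nn.Linear(hidden, 1)

    def forward(self, token_states):               # (batch, seq, hidden)
        # Attention scores of every label query against every token.
        scores = torch.einsum("lh,bsh->bls", self.label_queries, token_states)
        attn = torch.softmax(scores, dim=-1)        # (batch, labels, seq)
        # Label-specific document representations.
        label_docs = torch.einsum("bls,bsh->blh", attn, token_states)
        return self.out(label_docs).squeeze(-1)     # (batch, labels) logits

token_states = torch.randn(4, 120, 256)             # e.g. output of BERT or a BiGRU encoder
head = LabelWiseAttention(hidden=256, num_labels=30)
logits = head(token_states)
print(logits.shape)                                 # torch.Size([4, 30])
```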
52. Meta Sequence Learning and Its Applications [PDF] 返回目录
Cheng Zhang, Jie Wang
Abstract: We present a meta-sequence representation of sentences and demonstrate how to use meta sequence learning to generate adequate question-answer pairs (QAPs) over a given article. A meta sequence is a sequence of vectors of semantic and syntactic tags. Given a declarative sentence, a trained model converts it to a meta sequence, finds a matched meta sequence in its learned database, and uses the corresponding meta sequence for the interrogative sentence to generate QAPs. We show that, trained on a small dataset, our method efficiently generates a large number of syntactically and semantically correct QAPs with high accuracy on the official SAT practice reading tests.
摘要:我们提出句子的meta序列的代表性和示范如何使用元序列学习产生在给定的物品足够的问答配对(QAPs)。甲元序列是语义和语法标记向量的序列。在给定的陈述句,它训练模型转换为元序列,发现在其了解到数据库匹配元序列,并使用疑问句产生QAPs相应的元序列。我们表明,受过训练的一个小数据集,我们的方法产生有效的,在官方SAT练习阅读测试中,大量的语法和语义高精度正确QAPs的。
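A toy illustration of the meta-sequence idea: a sentence becomes a sequence of syntactic/semantic tags, which is looked up in a learned database that maps declarative patterns to interrogative templates. The tag inventory, database entry, and template are invented for this sketch.

```python
# Hypothetical meta sequence: each token of the declarative sentence is
# replaced by a coarse semantic/syntactic tag.
declarative = "Marie Curie discovered polonium in 1898"
meta_of = {
    "Marie": "PERSON", "Curie": "PERSON", "discovered": "VERB_PAST",
    "polonium": "SUBSTANCE", "in": "PREP", "1898": "YEAR",
}
meta_seq = tuple(meta_of[tok] for tok in declarative.split())

# Learned database: declarative meta sequence -> interrogative template.
database = {
    ("PERSON", "PERSON", "VERB_PAST", "SUBSTANCE", "PREP", "YEAR"):
        "When did {0} {1} {2} {3}?",
}

template = database.get(meta_seq)
if template is not None:
    # Fill the template with surface tokens to obtain a question-answer pair
    # (the verb lemma "discover" is supplied by hand in this toy example).
    question = template.format("Marie", "Curie", "discover", "polonium")
    print({"question": question, "answer": "1898"})
```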
53. When in Doubt, Ask: Generating Answerable and Unanswerable Questions, Unsupervised [PDF] 返回目录
Liubov Nikolenko, Pouya Rezazadeh Kalehbasti
Abstract: Question Answering (QA) is key for making possible a robust communication between human and machine. Modern language models used for QA have surpassed the human-performance in several essential tasks; however, these models require large amounts of human-generated training data which are costly and time-consuming to create. This paper studies augmenting human-made datasets with synthetic data as a way of surmounting this problem. A state-of-the-art model based on deep transformers is used to inspect the impact of using synthetic answerable and unanswerable questions to complement a well-known human-made dataset. The results indicate a tangible improvement in the performance of the language model (measured in terms of F1 and EM scores) trained on the mixed dataset. Specifically, unanswerable question-answers prove more effective in boosting the model: the F1 score gains from adding the answerable, unanswerable, and combined question-answers to the original dataset were 1.3%, 5.0%, and 6.7%, respectively. [Link to the Github repository: this https URL]
摘要:问答系统(QA)可以使人类和机器之间可能的一个强大的通信密钥。用于QA现代语言模型已经超过了几个基本任务的人表现;然而,这些模型需要大量这是昂贵和费时的创建人为产生的训练数据。本文研究了增强人类制造与合成数据作为克服这个问题的一种方法的数据集。基于深变压器一个国家的最先进的模型,用于检查使用合成回答的和无法回答的问题,以补充公知的人为数据集的影响。结果表明在训练对混合数据集的语言模型(在F1和EM分数衡量)的性能的明显改善。具体而言,无法回答的问题 - 回答证明在升压模型更有效的:从添加到所述原始数据集的回答的,无法回答,和合并的问题,回答是1.3 \%,5.0 \%,和6.7 \%,分别为F1分数增益。 [链接到Github上库:此HTTPS URL]
54. Adversarial Attack and Defense of Structured Prediction Models [PDF] 返回目录
Wenjuan Han, Liwen Zhang, Yong Jiang, Kewei Tu
Abstract: Building an effective adversarial attacker and elaborating on countermeasures for adversarial attacks for natural language processing (NLP) have attracted a lot of research in recent years. However, most of the existing approaches focus on classification problems. In this paper, we investigate attacks and defenses for structured prediction tasks in NLP. Besides the difficulty of perturbing discrete words and the sentence fluency problem faced by attackers in any NLP tasks, there is a specific challenge to attackers of structured prediction models: the structured output of structured prediction models is sensitive to small perturbations in the input. To address these problems, we propose a novel and unified framework that learns to attack a structured prediction model using a sequence-to-sequence model with feedbacks from multiple reference models of the same structured prediction task. Based on the proposed attack, we further reinforce the victim model with adversarial training, making its prediction more robust and accurate. We evaluate the proposed framework in dependency parsing and part-of-speech tagging. Automatic and human evaluations show that our proposed framework succeeds in both attacking state-of-the-art structured prediction models and boosting them with adversarial training.
摘要:建立一个有效的对抗攻击者并制定针对自然语言处理(NLP)对抗攻击的对策,吸引了大量的研究,在近几年。然而,大多数现有的方法集中在分类问题。在本文中,我们探讨在NLP结构预测任务的攻击和防御。除了扰乱离散的话,在任何NLP任务面临被攻击的句子流畅度问题的难度,有具体的挑战结构预测模型的攻击:构建预测模型的结构化输出是输入小扰动敏感。为了解决这些问题,我们提出了一个新的,统一的框架,学会使用序列到序列模型与来自相同结构的预测任务的多个参考模型反馈攻击一个结构化的预测模型。基于提出的攻击,我们进一步加强与对抗性训练受害人模式,使得它的预测更强大和准确。我们评估依存分析和部分词性标注拟议的框架。自动和人的评估表明,我们提出的框架,既攻击的国家的最先进的结构预测模型成功,并与对抗性训练提升他们。
55. Reverse Operation based Data Augmentation for Solving Math Word Problems [PDF] 返回目录
Qianying Liu, Wenyu Guan, Sujian Li, Fei Cheng, Daisuke Kawahara, Sadao Kurohashi
Abstract: Automatically solving math word problems is a critical task in the field of natural language processing. Recent models have reached their performance bottleneck and require more high-quality data for training. Inspired by the human double-checking mechanism, we propose a reverse operation based data augmentation method that makes use of mathematical logic to produce new high-quality math problems and introduce new knowledge points that can give supervision for new mathematical reasoning logic. We apply the augmented data on two SOTA math word problem solving models. Experimental results show the effectiveness of our approach (we will release our code and data after the paper is accepted).
摘要:自动解决数学文字问题是自然语言处理领域的一个重要任务。最新型号已经达到了他们的性能瓶颈,而且需要培养更多高质量的数据。人类双重检查机制的启发,我们提出了一种基于反向运行数据隆胸方法,它利用数理逻辑的产生新的高质量的数学问题和引入新的知识点,可以给新的数学推理逻辑的监督。我们采用两种SOTA数学文字问题解决模式的增强数据。实验结果表明,我们的方法\脚注的有效性{纸被接受后,我们会发布我们的代码和数据。}。
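A small sketch of what reverse-operation augmentation can look like on a single-operator word problem: starting from an addition problem, derived problems are generated by solving for a different quantity. The problem templates and the single-operator case are assumptions made for the illustration, not the paper's generation rules.

```python
def reverse_augment(a, b):
    """From one addition problem, derive two subtraction problems (reverse operations)."""
    total = a + b
    original = {
        "question": f"Tom has {a} apples and buys {b} more. How many apples does he have now?",
        "equation": f"x = {a} + {b}",
        "answer": total,
    }
    reversed_1 = {
        "question": f"Tom has some apples, buys {b} more, and now has {total}. How many did he start with?",
        "equation": f"x = {total} - {b}",
        "answer": a,
    }
    reversed_2 = {
        "question": f"Tom has {a} apples, buys some more, and now has {total}. How many did he buy?",
        "equation": f"x = {total} - {a}",
        "answer": b,
    }
    return [original, reversed_1, reversed_2]

for sample in reverse_augment(3, 5):
    print(sample["equation"], "->", sample["answer"])
```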
56. Leveraging Multilingual News Websites for Building a Kurdish Parallel Corpus [PDF] 返回目录
Sina Ahmadi, Hossein Hassani, Daban Q. Jaff
Abstract: Machine translation has been a major motivation of development in natural language processing. Despite the burgeoning achievements in creating more efficient machine translation systems thanks to deep learning methods, parallel corpora have remained indispensable for progress in the field. In an attempt to create parallel corpora for the Kurdish language, in this paper, we describe our approach in retrieving potentially-alignable news articles from multi-language websites and manually align them across dialects and languages based on lexical similarity and transliteration of scripts. We present a corpus containing 12,327 translation pairs in the two major dialects of Kurdish, Sorani and Kurmanji. We also provide 1,797 and 650 translation pairs in English-Kurmanji and English-Sorani. The corpus is publicly available under the CC BY-NC-SA 4.0 license.
摘要:机器翻译已经在自然语言处理发展的一大动力。尽管在创造更高效的机器翻译系统由于深学习方法蓬勃发展的成就,平行语料库仍然在领域的进步是不可或缺的。在试图库尔德语言创建平行语料库,在本文中,我们描述的方法,从多语言网站获取潜在的配向新闻文章和手动跨基于词汇相似度和脚本的音译方言和语言对齐。我们提出包含库尔德人,索拉尼和Kurmanji两大方言12,327翻译对语料库。我们还提供英语,Kurmanji和英语索拉尼1797和650的翻译对。该语料库是公开可用下CC BY-NC-SA 4.0许可证。
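Below is a simple sketch of scoring candidate sentence pairs by character n-gram overlap (a Dice coefficient), one plausible instantiation of the lexical-similarity criterion used for alignment; the transliteration step is omitted and the example strings are placeholders, not real sentences from the corpus.

```python
def char_ngram_set(text, n=3):
    text = f"  {text.lower()}  "
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def dice_similarity(a, b, n=3):
    """Dice coefficient over the character n-gram sets of two sentences."""
    sa, sb = char_ngram_set(a, n), char_ngram_set(b, n)
    return 2 * len(sa & sb) / (len(sa) + len(sb))

# Placeholder strings standing in for transliterated sentences from two dialects.
source = "the economic news was published from hewler yesterday"
candidates = [
    "economic news published in hewler yesterday evening",
    "the sports committee held its meeting in duhok",
]
for c in candidates:
    print(round(dice_similarity(source, c), 3), "|", c)
print("best candidate:", max(candidates, key=lambda c: dice_similarity(source, c)))
```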
57. A Survey of Unsupervised Dependency Parsing [PDF] 返回目录
Wenjuan Han, Yong Jiang, Hwee Tou Ng, Kewei Tu
Abstract: Syntactic dependency parsing is an important task in natural language processing. Unsupervised dependency parsing aims to learn a dependency parser from sentences that have no annotation of their correct parse trees. Despite its difficulty, unsupervised parsing is an interesting research direction because of its capability of utilizing almost unlimited unannotated text data. It also serves as the basis for other research in low-resource parsing. In this paper, we survey existing approaches to unsupervised dependency parsing, identify two major classes of approaches, and discuss recent trends. We hope that our survey can provide insights for researchers and facilitate future research on this topic.
摘要:语法结构分析是自然语言处理的一项重要任务。无监督依存分析的目的来学习从没有自己的正确解析树的注解句子依赖解析器。尽管它的难度,无监督的解析是因为它利用几乎是无限的未注释的文本数据的能力的一个有趣的研究方向。它也可以作为在低资源分析等研究奠定了基础。在本文中,我们调查现有的方法无监督依存分析,识别方法两大类,并讨论最近的趋势。我们希望,我们的调查可以为研究人员提供的见解和方便以后对这个课题的研究。
58. A Multi-task Learning Framework for Opinion Triplet Extraction [PDF] 返回目录
Chen Zhang, Qiuchi Li, Dawei Song, Benyou Wang
Abstract: The state-of-the-art Aspect-based Sentiment Analysis (ABSA) approaches are mainly based on either detecting aspect terms and their corresponding sentiment polarities, or co-extracting aspect and opinion terms. However, the extraction of aspect-sentiment pairs lacks opinion terms as a reference, while co-extraction of aspect and opinion terms would not lead to meaningful pairs without determining their sentiment dependencies. To address the issue, we present a novel view of ABSA as an opinion triplet extraction task, and propose a multi-task learning framework to jointly extract aspect terms and opinion terms, and simultaneously parse sentiment dependencies between them with a biaffine scorer. At the inference phase, the extraction of triplets is facilitated by a triplet decoding method based on the above outputs. We evaluate the proposed framework on four SemEval benchmarks for ABSA. The results demonstrate that our approach significantly outperforms a range of strong baselines and state-of-the-art approaches.
摘要:国家的最先进的基于Aspect的情感分析(ABSA)的方法主要是基于任一检测方面术语和它们的相应的情感的极性,或共提取方面和意见条款。但是,纵横情绪对提取缺乏看来术语作为参考,同时方面和意见术语共萃取不会没有确定其情绪依赖性导致有意义对。为了解决这个问题,我们提出ABSA的新视图的意见三重提取任务,并提出了多任务学习框架,共同提取方面的条款和条件的意见,同时分析它们之间的依赖情绪与biaffine得分手。在推断阶段,三胞胎的提取是通过基于上述输出三重解码方法促进。我们评估对ASBA 4个SemEval基准拟议的框架。结果表明,我们的方法显著优于一系列强有力的基线和国家的最先进的方法的。
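A compact PyTorch sketch of a biaffine scorer over aspect-token and opinion-token representations, the component used above to parse sentiment dependencies. The dimensions, the number of sentiment classes, and the way the token vectors are obtained are placeholder assumptions.

```python
import torch
import torch.nn as nn

class BiaffineScorer(nn.Module):
    """Scores every (aspect token, opinion token) pair for each sentiment class."""
    def __init__(self, hidden, num_classes):
        super().__init__()
        # Augmenting both sides with a constant 1 folds the linear and bias
        # terms into the bilinear tensor (the usual biaffine trick).
        self.U = nn.Parameter(torch.randn(num_classes, hidden + 1, hidden + 1) * 0.01)

    def forward(self, aspect_h, opinion_h):          # (batch, n, hidden), (batch, m, hidden)
        a = torch.cat([aspect_h, torch.ones(*aspect_h.shape[:-1], 1)], dim=-1)
        o = torch.cat([opinion_h, torch.ones(*opinion_h.shape[:-1], 1)], dim=-1)
        # scores[b, c, i, j] = a[b, i] @ U[c] @ o[b, j]
        return torch.einsum("bih,chk,bjk->bcij", a, self.U, o)

aspect_h = torch.randn(2, 4, 128)     # hidden states of 4 candidate aspect tokens
opinion_h = torch.randn(2, 6, 128)    # hidden states of 6 candidate opinion tokens
scorer = BiaffineScorer(hidden=128, num_classes=4)   # e.g. none/positive/negative/neutral
print(scorer(aspect_h, opinion_h).shape)             # torch.Size([2, 4, 4, 6])
```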
59. Multi-turn Response Selection using Dialogue Dependency Relations [PDF] 返回目录
Qi Jia, Yizhu Liu, Siyu Ren, Kenny Q. Zhu, Haifeng Tang
Abstract: Multi-turn response selection is a task designed for developing dialogue agents. The performance on this task has a remarkable improvement with pre-trained language models. However, these models simply concatenate the turns in dialogue history as the input and largely ignore the dependencies between the turns. In this paper, we propose a dialogue extraction algorithm to transform a dialogue history into threads based on their dependency relations. Each thread can be regarded as a self-contained sub-dialogue. We also propose Thread-Encoder model to encode threads and candidates into compact representations by pre-trained Transformers and finally get the matching score through an attention layer. The experiments show that dependency relations are helpful for dialogue context understanding, and our model outperforms the state-of-the-art baselines on both DSTC7 and DSTC8*, with competitive results on UbuntuV2.
摘要:多转响应的选择是专为开展对话代理的任务。这项任务的性能与预训练的语言模型中显着的改善。然而,这些模式只是在连接对话历史作为输入匝并且在很大程度上忽略圈之间的依赖关系。在本文中,我们提出了一个对话提取算法变换对话历史基于其依赖关系的线程。每个线程都可以被看作是一个独立的子对话。我们还建议线程编码模型编码线程,应聘进了简洁表示通过预先训练变形金刚终于打通注意的层匹配分数。实验表明,依存关系是对话的上下文的理解有帮助的,而我们的模型优于两个DSTC7和DSTC8 *国家的最先进的基线,对UbuntuV2竞争力的结果。
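In the spirit of the dialogue extraction step described above, the plain-Python sketch below splits a multi-party history into threads by following reply-to dependency links; the dependency annotations themselves are assumed to be given, and the example dialogue is invented.

```python
# Each utterance records which earlier utterance it depends on (None = starts a new thread).
dialogue = [
    {"id": 0, "text": "My laptop won't boot after the update.",            "depends_on": None},
    {"id": 1, "text": "Which OS version are you on?",                      "depends_on": 0},
    {"id": 2, "text": "Also, anyone know a good pizza place?",             "depends_on": None},
    {"id": 3, "text": "Ubuntu 20.04.",                                     "depends_on": 1},
    {"id": 4, "text": "Try the one on 5th street.",                        "depends_on": 2},
    {"id": 5, "text": "Try booting from a live USB and check the logs.",   "depends_on": 3},
]

def extract_threads(utterances):
    """Follow dependency links so every utterance joins the thread of its parent."""
    thread_of = {}
    threads = []
    for utt in utterances:                  # utterances arrive in chronological order
        parent = utt["depends_on"]
        if parent is None:
            thread_of[utt["id"]] = len(threads)
            threads.append([utt["text"]])
        else:
            tid = thread_of[parent]
            thread_of[utt["id"]] = tid
            threads[tid].append(utt["text"])
    return threads

for i, thread in enumerate(extract_threads(dialogue)):
    print(f"thread {i}: {thread}")
```

Each resulting thread is a self-contained sub-dialogue that can be encoded separately before matching against response candidates.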
60. Explaining Deep Neural Networks [PDF] 返回目录
Oana-Maria Camburu
Abstract: Deep neural networks are becoming more and more popular due to their revolutionary success in diverse areas, such as computer vision, natural language processing, and speech recognition. However, the decision-making processes of these models are generally not interpretable to users. In various domains, such as healthcare, finance, or law, it is critical to know the reasons behind a decision made by an artificial intelligence system. Therefore, several directions for explaining neural models have recently been explored. In this thesis, I investigate two major directions for explaining deep neural networks. The first direction consists of feature-based post-hoc explanatory methods, that is, methods that aim to explain an already trained and fixed model (post-hoc), and that provide explanations in terms of input features, such as tokens for text and superpixels for images (feature-based). The second direction consists of self-explanatory neural models that generate natural language explanations, that is, models that have a built-in module that generates explanations for the predictions of the model.
摘要:深层神经网络正变得越来越流行,是因为在不同的领域,如计算机视觉,自然语言处理和语音识别他们的革命的成功。然而,这些模型的决策过程一般不解释给用户。在不同的领域,如医疗保健,金融,或法律,它是要知道通过人工智能系统作出决定背后的原因至关重要。因此,用于解释神经模型几个方向最近已经探索。在本文中,我调查了解释深层神经网络的两个主要方向。所述第一方向包括基于特征的事后说明的方法,即,方法旨在解释已经训练和固定模型(事后),并且在输入功能方面提供解释,诸如令牌文本和对于图像的超像素(基于特征的)。第二个方向包括能产生自然语言的解释,这是不言自明的神经模型,有机型内置生成的模型的预测解释模块。
61. Dialogue Generation on Infrequent Sentence Functions via Structured Meta-Learning [PDF] 返回目录
Yifan Gao, Piji Li, Wei Bi, Xiaojiang Liu, Michael R. Lyu, Irwin King
Abstract: Sentence function is an important linguistic feature indicating the communicative purpose in uttering a sentence. Incorporating sentence functions into conversations has shown improvements in the quality of generated responses. However, the number of utterances for different types of fine-grained sentence functions is extremely imbalanced. Besides a small number of high-resource sentence functions, a large portion of sentence functions is infrequent. Consequently, dialogue generation conditioned on these infrequent sentence functions suffers from data deficiency. In this paper, we investigate a structured meta-learning (SML) approach for dialogue generation on infrequent sentence functions. We treat dialogue generation conditioned on different sentence functions as separate tasks, and apply model-agnostic meta-learning to high-resource sentence functions data. Furthermore, SML enhances meta-learning effectiveness by promoting knowledge customization among different sentence functions but simultaneously preserving knowledge generalization for similar sentence functions. Experimental results demonstrate that SML not only improves the informativeness and relevance of generated responses, but also can generate responses consistent with the target sentence functions.
摘要:句中的作用是表明说出一个句子的交际目的的重要语言功能。掺入句子功能集成到谈话已经显示出产生的响应的质量的改善。然而,话语对于不同类型的细粒度句子功能的数量非常不平衡。除了少数高资源句子功能,句子的功能很大一部分是罕见的。因此,对话代空调从数据缺乏这些罕见的句子功能受到影响。在本文中,我们研究了一个结构化的元学习(SML)对话代上罕见的句子功能的方法。我们把对话代空调在不同的句子作为独立的任务,并运用模型无关元学习资源丰富的句子功能的数据。此外,SML增强元学习促进不同句子功能之间的知识自定义,但同时保存知识概括为类似句子的功能效果。实验结果表明,SML不仅提高了信息量和生成的响应的相关性,而且还可以生成与目标句子的功能一致的反应。
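The abstract does not spell out the meta-learning procedure, but model-agnostic meta-learning typically alternates a task-specific inner adaptation with an outer meta-update across tasks (here, sentence functions). Below is a minimal first-order sketch in PyTorch under that reading; `model.loss(batch)` is a hypothetical interface standing in for the generation loss on one sentence function's data, and the sketch omits the paper's knowledge-customization component, so it is not the authors' SML implementation.

import copy
import torch

def fomaml_step(model, tasks, inner_lr=1e-3, outer_lr=1e-4, inner_steps=1):
    """One first-order meta-update over a batch of tasks (each task = one sentence function)."""
    meta_grads = [torch.zeros_like(p) for p in model.parameters()]
    for support_batch, query_batch in tasks:
        learner = copy.deepcopy(model)                      # task-specific copy of the model
        opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
        for _ in range(inner_steps):                        # adapt on the support set
            opt.zero_grad()
            learner.loss(support_batch).backward()
            opt.step()
        learner.zero_grad()
        learner.loss(query_batch).backward()                # evaluate the adapted model
        for g, p in zip(meta_grads, learner.parameters()):  # assumes every parameter gets a gradient
            g += p.grad / len(tasks)
    with torch.no_grad():                                   # apply the averaged meta-gradient
        for p, g in zip(model.parameters(), meta_grads):
            p -= outer_lr * g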
62. Paragraph-Level Commonsense Transformers with Recurrent Memory [PDF] 返回目录
Saadia Gabriel, Chandra Bhagavatula, Vered Shwartz, Ronan Le Bras, Maxwell Forbes, Yejin Choi
Abstract: Human understanding of narrative texts requires making commonsense inferences beyond what is stated in the text explicitly. A recent model, COMeT, can generate such inferences along several dimensions such as pre- and post-conditions, motivations, and mental-states of the participants. However, COMeT was trained on short phrases, and is therefore discourse-agnostic. When presented with each sentence of a multi-sentence narrative, it might generate inferences that are inconsistent with the rest of the narrative. We present the task of discourse-aware commonsense inference. Given a sentence within a narrative, the goal is to generate commonsense inferences along predefined dimensions, while maintaining coherence with the rest of the narrative. Such large-scale paragraph-level annotation is hard to get and costly, so we use available sentence-level annotations to efficiently and automatically construct a distantly supervised corpus. Using this corpus, we train PARA-COMeT, a discourse-aware model that incorporates paragraph-level information to generate coherent commonsense inferences from narratives. PARA-COMeT captures both semantic knowledge pertaining to prior world knowledge, and episodic knowledge involving how current events relate to prior and future events in a narrative. Our results confirm that PARA-COMeT outperforms the sentence-level baselines, particularly in generating inferences that are both coherent and novel.
摘要:叙事文本的人理解需要进行常识性推断超出了在文本中明确说明。最近的一个模型,彗星,可以产生沿着几个方面,如前置和后置条件,动机和参与者的心理,状态,例如推论。然而,彗星被训练的短语,因此,是话语无关。当与多句的叙述每个句子呈现,它可能会产生与叙事的其余矛盾的推论。我们提出的话语感知常识推理的任务。给定一个故事中的一句话,我们的目标是产生沿着预定尺寸常识推论,同时保持连贯性与叙事的其余部分。这种大规模款的注解是很难得到和昂贵的,所以我们可以使用语句级注释高效,自动构建一个遥远监督语料库。使用这个语料库,我们训练PARA-彗星,话语感知模型,结合段落级信息来生成叙事连贯常识推论。 PARA-捕捉彗星有关前世界知识两个语义知识,和情节的知识,涉及时事如何与在叙述之前和未来的事件。我们的结果证实,对 - 彗星优于句子级的基线,特别是在生成的相干和新颖的推论。
63. Knowledge-Enhanced Personalized Review Generation with Capsule Graph Neural Network [PDF] 返回目录
Junyi Li, Siqing Li, Wayne Xin Zhao, Gaole He, Zhicheng Wei, Nicholas Jing Yuan, Ji-Rong Wen
Abstract: Personalized review generation (PRG) aims to automatically produce review text reflecting user preference, which is a challenging natural language generation task. Most of previous studies do not explicitly model factual description of products, tending to generate uninformative content. Moreover, they mainly focus on word-level generation, but cannot accurately reflect more abstractive user preference in multiple aspects. To address the above issues, we propose a novel knowledge-enhanced PRG model based on capsule graph neural network~(Caps-GNN). We first construct a heterogeneous knowledge graph (HKG) for utilizing rich item attributes. We adopt Caps-GNN to learn graph capsules for encoding underlying characteristics from the HKG. Our generation process contains two major steps, namely aspect sequence generation and sentence generation. First, based on graph capsules, we adaptively learn aspect capsules for inferring the aspect sequence. Then, conditioned on the inferred aspect label, we design a graph-based copy mechanism to generate sentences by incorporating related entities or words from HKG. To our knowledge, we are the first to utilize knowledge graph for the PRG task. The incorporated KG information is able to enhance user preference at both aspect and word levels. Extensive experiments on three real-world datasets have demonstrated the effectiveness of our model on the PRG task.
摘要:个性化复习代(PRG)旨在自动生成反映用户的喜好,这是一个具有挑战性的自然语言生成任务评论文章。以往大多数研究并没有明确建模的产品事实说明,倾向于产生无信息内容。此外,他们主要集中在文字层面产生,但不能准确地反映在多个方面更抽象的用户偏好。为了解决上述问题,我们提出了基于胶囊图形神经网络〜(CAPS-GNN)一种新型的知识增强PRG模式。我们首先构建利用丰富的项目属性的异质知识图谱(HKG)。我们采用大写GNN学习曲线胶囊从HKG编码基本特征。我们的生成过程包括两个主要步骤,即一方面序列生成和句子的产生。首先,基于图胶囊,我们自适应学习方面胶囊推断方面的序列。然后,条件推断方面的标签上,我们设计了一个基于图形的副本生成机制通过将相关的实体或词从HKG句子。据我们所知,我们是第一个利用知识图的PRG任务。该合并KG的信息能够增强用户的喜好在两个方面和文字水平。三个真实世界的数据集大量的实验已经证明了我们对PRG任务模型的有效性。
64. Tell Me How to Ask Again: Question Data Augmentation with Controllable Rewriting in Continuous Space [PDF] 返回目录
Dayiheng Liu, Yeyun Gong, Jie Fu, Yu Yan, Jiusheng Chen, Jiancheng Lv, Nan Duan, Ming Zhou
Abstract: In this paper, we propose a novel data augmentation method, referred to as Controllable Rewriting based Question Data Augmentation (CRQDA), for machine reading comprehension (MRC), question generation, and question-answering natural language inference tasks. We treat the question data augmentation task as a constrained question rewriting problem to generate context-relevant, high-quality, and diverse question data samples. CRQDA utilizes a Transformer autoencoder to map the original discrete question into a continuous embedding space. It then uses a pre-trained MRC model to revise the question representation iteratively with gradient-based optimization. Finally, the revised question representations are mapped back into the discrete space, where they serve as additional question data. Comprehensive experiments on SQuAD 2.0, SQuAD 1.1 question generation, and QNLI tasks demonstrate the effectiveness of CRQDA.
摘要:在本文中,我们提出了一个新颖的数据增强方法,被称为基于可控重写问题数据扩张(CRQDA),用于机器阅读理解(MRC),问题的产生,和答疑自然语言推理任务。我们对待这个问题的数据增强任务作为约束问题重写问题产生的上下文相关的,高品质,多样的问题,数据样本。 CRQDA利用变压器自编码到原始离散问题映射到连续的嵌入空间。然后,它使用预训练MRC模型,基于梯度的优化反复修改的问题表示。最后,修订后的问题表示被映射回离散的空间,作为附加问题的数据。在队内2.0,阵容1.1问题生成,并QNLI任务综合实验证明CRQDA的有效性
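The gradient-based revision step can be illustrated with a short sketch. The snippet below is a hypothetical PyTorch illustration, not the released CRQDA code: `mrc_model(context, q_emb, answerable)` is an assumed callable that returns a scalar loss from a frozen MRC model for the desired label, and the question embedding (rather than any model weights) is what gets optimized before being decoded back to text.

import torch

def revise_question_embedding(q_emb, mrc_model, context, answerable, steps=10, lr=0.1):
    """Iteratively revise a continuous question embedding with gradients from a frozen MRC model.

    q_emb: (seq_len, emb_dim) embedding of the original question from the autoencoder.
    Returns the revised embedding, to be decoded back to a discrete question.
    """
    q_emb = q_emb.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([q_emb], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = mrc_model(context, q_emb, answerable)  # hypothetical interface
        loss.backward()
        optimizer.step()          # move the embedding, not the model weights
    return q_emb.detach()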
65. Sentence Constituent-Aware Aspect-Category Sentiment Analysis with Graph Attention Networks [PDF] 返回目录
Yuncong Li, Cunxiang Yin, Sheng-hua Zhong
Abstract: Aspect category sentiment analysis (ACSA) aims to predict the sentiment polarities of the aspect categories discussed in sentences. Since a sentence usually discusses one or more aspect categories and expresses different sentiments toward them, various attention-based methods have been developed to allocate the appropriate sentiment words for the given aspect category and obtain promising results. However, most of these methods directly use the given aspect category to find the aspect category-related sentiment words, which may cause mismatching between the sentiment words and the aspect categories when an unrelated sentiment word is semantically meaningful for the given aspect category. To mitigate this problem, we propose a Sentence Constituent-Aware Network (SCAN) for aspect-category sentiment analysis. SCAN contains two graph attention modules and an interactive loss function. The graph attention modules generate representations of the nodes in sentence constituency parse trees for the aspect category detection (ACD) task and the ACSA task, respectively. ACD aims to detect aspect categories discussed in sentences and is an auxiliary task. For a given aspect category, the interactive loss function helps the ACD task to find the nodes which can predict the aspect category but can't predict other aspect categories. The sentiment words in the nodes then are used to predict the sentiment polarity of the aspect category by the ACSA task. The experimental results on five public datasets demonstrate the effectiveness of SCAN.
摘要:方面类情绪分析(ACSA)目标来预测句子讨论的方面类别的情绪极性。由于句子通常讨论的一个或多个方面的类别,并表示对他们不同的情绪,不同的注意力为基础的方法已发展到分配对于给定的方面类别相应的情绪的话,并取得可喜的成果。然而,大多数这些方法直接使用给定方面类别找到方面类别相关的情绪字,这可能会导致情绪字和纵横类别之间不匹配时不相关的情绪字是对于给定的纵横类别语义上有意义的。为了缓解这一问题,我们提出了纵横类情感分析一个语句成分感知网络(SCAN)。 SCAN包含两个图形注意模块和互动损失函数。图表注意模块产生的分别方面类别检测(ACD)的任务和ACSA任务,句子选区分析树节点表示。 ACD旨在检测句子讨论的方面类别和是一个辅助任务。对于给定的体范畴,交互式损失函数帮助ACD任务找到能够预测方面类别,但无法预测另一方面类别的节点。在接着的节点的情绪字用于通过ACSA任务来预测方面类别的情感极性。在五个公共数据集上的实验结果表明,扫描的效果。
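Graph attention over parse-tree nodes can be written compactly. The layer below is a generic single-head graph-attention sketch under assumed shapes, not the SCAN implementation: each node attends only to its neighbours in the constituency tree via an adjacency mask.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """Single-head graph attention over constituency-tree nodes (minimal sketch)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, node_feats, adj):
        # node_feats: (num_nodes, in_dim); adj: (num_nodes, num_nodes) 0/1 edge mask with self-loops
        h = self.proj(node_feats)
        n = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.attn(pairs).squeeze(-1))
        e = e.masked_fill(adj == 0, float("-inf"))   # attend only along tree edges
        alpha = torch.softmax(e, dim=-1)
        return torch.relu(alpha @ h)

layer = GraphAttentionLayer(16, 8)
x = torch.randn(4, 16)
adj = torch.tensor([[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1], [0, 0, 1, 1]], dtype=torch.float)
print(layer(x, adj).shape)  # torch.Size([4, 8])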
66. MIME: MIMicking Emotions for Empathetic Response Generation [PDF] 返回目录
Navonil Majumder, Pengfei Hong, Shanshan Peng, Jiankun Lu, Deepanway Ghosal, Alexander Gelbukh, Rada Mihalcea, Soujanya Poria
Abstract: Current approaches to empathetic response generation view the set of emotions expressed in the input text as a flat structure, where all the emotions are treated uniformly. We argue that empathetic responses often mimic the emotion of the user to a varying degree, depending on its positivity or negativity and content. We show that considering these polarity-based emotion clusters and emotional mimicry results in improved empathy and contextual relevance of the response as compared to the state-of-the-art. Also, we introduce stochasticity into the emotion mixture that yields emotionally more varied empathetic responses than the previous work. We demonstrate the importance of these factors to empathetic response generation using both automatic- and human-based evaluations. The implementation of MIME is publicly available at this https URL.
摘要:当前方法移情响应生成视图的集合在输入文本为平坦结构,其中所有的情绪被均匀地处理过的表达情绪。我们认为,移情反应经常模仿用户的情绪在不同程度上,取决于它的阳性或阴性和内容。我们发现,相比于国家的最先进的这种基于极性情绪聚类和改进的同情和情境响应的相关性情感模仿结果的考虑。此外,我们引入随机性到情感混合物,其收益率在感情上更多样化的比以前的工作移情反应。我们证明的同时使用自动 - 和以人为本的评估这些因素,移情反应生成的重要性。 MIME的实现是公开的,在此HTTPS URL。
67. GraphDialog: Integrating Graph Knowledge into End-to-End Task-Oriented Dialogue Systems [PDF] 返回目录
Shiquan Yang, Rui Zhang, Sarah Erfani
Abstract: End-to-end task-oriented dialogue systems aim to generate system responses directly from plain text inputs. There are two challenges for such systems: one is how to effectively incorporate external knowledge bases (KBs) into the learning framework; the other is how to accurately capture the semantics of dialogue history. In this paper, we address these two challenges by exploiting the graph structural information in the knowledge base and in the dependency parsing tree of the dialogue. To effectively leverage the structural information in dialogue history, we propose a new recurrent cell architecture which allows representation learning on graphs. To exploit the relations between entities in KBs, the model combines multi-hop reasoning ability based on the graph structure. Experimental results show that the proposed model achieves consistent improvement over state-of-the-art models on two different task-oriented dialogue datasets.
摘要:端至端面向任务的对话系统旨在产生直接从纯文本输入系统响应。有这样的系统两个方面的挑战:一是如何有效地将外部知识库(KBS)进入学习框架;另一个是如何准确地捕捉对话历史的语义。在本文中,我们讨论通过利用在知识基础,并在对话的依存分析树的图形结构信息这两个挑战。为了有效地发挥对话历史的结构信息,我们提出了一个新的复发性单元架构,它允许在图形上表示学习。为了利用以KB实体之间的关系,该模型将基于图形结构多跳的推理能力。实验结果表明,该模型实现了两种不同的面向任务的对话数据集在国家的最先进的车型持续改善。
68. Aspect-Based Sentiment Analysis in Education Domain [PDF] 返回目录
Rinor Hajrizi, Krenare Pireva Nuçi
Abstract: Analysis of a large amount of data has always brought value to institutions and organizations. Lately, people's opinions expressed through text have become a very important aspect of this analysis. In response to this challenge, a natural language processing technique known as Aspect-Based Sentiment Analysis (ABSA) has emerged. Having the ability to extract the polarity for each aspect of opinions separately, ABSA has found itself useful in a wide range of domains. Education is one of the domains in which ABSA can be successfully utilized. Being able to understand and find out what students like and don't like most about a course, professor, or teaching methodology can be of great importance for the respective institutions. While this task represents a unique NLP challenge, many studies have proposed different approaches to tackle the problem. In this work, we present a comprehensive review of the existing work in ABSA with a focus in the education domain. A wide range of methodologies are discussed and conclusions are drawn.
摘要:大数据量的分析总是带来价值,机构和组织。最近,人们通过文字所表达的观点已经成为这种分析的一个非常重要的方面。为了应对这一挑战,被称为Aspect的基于情感分析(ABSA)的自然语言处理技术已经出现。具有以提取的单独观点各方面的极性的能力,ABSA发现自己在一个宽范围的结构域是有用的。教育是其中ABSA可以成功地使用的结构域之一。能够理解,并找出学生喜欢和最不喜欢的有关课程,教授或教学方法可以对各个机构的高度重视。这个任务是一个独特的自然语言处理的挑战,许多研究都提出了解决这个问题的不同方法。在这项工作中,我们提出在ABSA在教育领域重点对现有的工作进行了全面审查。宽范围的方法的讨论和结论。
69. Semantic Role Labeling Guided Multi-turn Dialogue ReWriter [PDF] 返回目录
Kun Xu, Haochen Tan, Linfeng Song, Han Wu, Haisong Zhang, Linqi Song, Dong Yu
Abstract: For multi-turn dialogue rewriting, the capacity of effectively modeling the linguistic knowledge in dialog context and getting rid of the noises is essential to improve its performance. Existing attentive models attend to all words without prior focus, which results in inaccurate concentration on some dispensable words. In this paper, we propose to use semantic role labeling (SRL), which highlights the core semantic information of who did what to whom, to provide additional guidance for the rewriter model. Experiments show that this information significantly improves a RoBERTa-based model that already outperforms previous state-of-the-art systems.
摘要:对于多转对话改写,有效建模对话情境语言知识和摆脱噪音的能力是必不可少的,以提高其性能。现有的周到车型出席,恕不另行焦点,这会导致不准确的浓度对一些可有可无的话所有单词。在本文中,我们建议使用语义角色标注(SRL),其中突出谁做了什么谁的核心语义信息,为重写模式提供更多的指导。实验表明,该信息显著改善已经优于先前的国家的最先进的系统基于罗伯塔模型。
70. A Geometry-Inspired Attack for Generating Natural Language Adversarial Examples [PDF] 返回目录
Zhao Meng, Roger Wattenhofer
Abstract: Generating adversarial examples for natural language is hard, as natural language consists of discrete symbols, and examples are often of variable lengths. In this paper, we propose a geometry-inspired attack for generating natural language adversarial examples. Our attack generates adversarial examples by iteratively approximating the decision boundary of Deep Neural Networks (DNNs). Experiments on two datasets with two different models show that our attack fools natural language models with high success rates, while only replacing a few words. Human evaluation shows that adversarial examples generated by our attack are hard for humans to recognize. Further experiments show that adversarial training can improve model robustness against our attack.
摘要:生成对抗性实例为自然语言是硬的,如自然语言由离散的符号,和实施例是经常可变的长度。在本文中,我们提出了产生自然语言对抗例子几何风格的攻击。我们的攻击通过反复逼近深层神经网络(DNNs)的决策边界产生对抗的例子。对两个数据集有两种不同型号的实验表明,我们的攻击愚弄与高成功率自然语言模型,而只更换了几句话。人力评估表明,我们的攻击对抗产生的例子是很难为人类所认识。进一步的实验表明,对抗性训练可以提高对我们的攻击模型的鲁棒性。
71. Personality Trait Detection Using Bagged SVM over BERT Word Embedding Ensembles [PDF] 返回目录
Amirmohammad Kazameini, Samin Fatehi, Yash Mehta, Sauleh Eetemadi, Erik Cambria
Abstract: Recently, the automatic prediction of personality traits has received increasing attention and has emerged as a hot topic within the field of affective computing. In this work, we present a novel deep learning-based approach for automated personality detection from text. We leverage state of the art advances in natural language understanding, namely the BERT language model, to extract contextualized word embeddings from textual data for automated author personality detection. Our primary goal is to develop a computationally efficient, high-performance personality prediction model which can be easily used by a large number of people without access to huge computation resources. Our extensive experiments with this ideology in mind led us to develop a novel model which feeds contextualized embeddings along with psycholinguistic features to a Bagged-SVM classifier for personality trait prediction. Our model outperforms the previous state of the art by 1.04% and, at the same time, is significantly more computationally efficient to train. We report our results on the famous gold standard Essays dataset for personality detection.
摘要:近日,个性特征的自动预测已经越来越受到关注,并已成为情感计算领域内的一个热门话题。在这项工作中,我们提出了从文本中自动检测人一种新的基于深学习的方法。我们在自然语言理解技术进步的杠杆状态,即BERT语言模型从自动化作家个性的检测文本数据中提取情境字的嵌入。我们的主要目标是开发一种可以方便地使用了大量的人没有获得巨大的计算资源,计算效率,高性能的个性预测模型。我们与这个思想在头脑广泛的实验,使我们开发出饲料与心理语言学特点TOA袋装-SVM分类的人格特质预测沿语境的嵌入了一种新的模式。我们的模型由1.04%优于现有技术中以前的状态,并在同一时间显著计算效率更高训练。我们报道了我们在著名的金标准散文集人格检测结果。
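The bagged-SVM part of the pipeline is straightforward to reproduce with scikit-learn. The snippet below is a minimal sketch using random stand-in features; in the paper's setting, X would hold contextualized BERT document embeddings and y the binary personality-trait labels, and the hyperparameters shown are illustrative assumptions rather than the authors' settings.

import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import BaggingClassifier

# Stand-ins for document-level BERT embeddings and binary trait labels.
X = np.random.randn(200, 768)
y = np.random.randint(0, 2, size=200)

# Bagging ensemble of SVM base classifiers over the embedding features.
clf = BaggingClassifier(SVC(kernel="rbf"), n_estimators=10, max_samples=0.8, random_state=0)
clf.fit(X, y)
print(clf.predict(X[:5]))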
72. Unsupervised Cross-lingual Image Captioning [PDF] 返回目录
Jiahui Gao, Yi Zhou, Philip L. H. Yu, Jiuxiang Gu
Abstract: Most recent image captioning works are conducted in English as the majority of image-caption datasets are in English. However, there are a large amount of non-native English speakers worldwide. Generating image captions in different languages is worth exploring. In this paper, we present a novel unsupervised method to generate image captions without using any caption corpus. Our method relies on 1) a cross-lingual auto-encoding, which learns the scene graph mapping function along with the scene graph encoders and sentence decoders on machine translation parallel corpora, and 2) an unsupervised feature mapping, which seeks to map the encoded scene graph features from image modality to sentence modality. By leveraging cross-lingual auto-encoding, cross-modal feature mapping, and adversarial learning, our method can learn an image captioner to generate captions in different languages. We verify the effectiveness of our proposed method on the Chinese image caption generation. The comparisons against several baseline methods demonstrate the effectiveness of our approach.
摘要:最近的图像字幕作品在英语为广大图像字幕数据集进行的是英语。不过,也有大量的非英语为母语的世界各地。在不同的语言生成图像字幕是值得探讨的。在本文中,我们提出了一个新颖的无监督方法来生成图像标题,而无需使用任何字幕语料库。我们的方法依赖于跨语种自动编码1),其与机器翻译平行语料库场景图的编码器和句解码器沿着获知场景图映射函数,以及2)无监督特征映射,其目的是映射编码场景图从图像形态特征来一句形态。通过利用自动编码跨语言,跨模态特征映射,和对抗性的学习,我们的方法可以学习的图像字幕员,以生成不同语言的字幕。我们对中国人的形象,字幕生成验证我们提出的方法的有效性。针对几个基线方法的比较,证明了该方法的有效性。
73. Towards Interpretable Reasoning over Paragraph Effects in Situation [PDF] 返回目录
Mucheng Ren, Xiubo Geng, Tao Qin, Heyan Huang, Daxin Jiang
Abstract: We focus on the task of reasoning over paragraph effects in situation, which requires a model to understand the cause and effect described in a background paragraph, and apply the knowledge to a novel situation. Existing works ignore the complicated reasoning process and solve it with a one-step "black box" model. Inspired by human cognitive processes, in this paper we propose a sequential approach for this task which explicitly models each step of the reasoning process with neural network modules. In particular, five reasoning modules are designed and learned in an end-to-end manner, which leads to a more interpretable model. Experimental results on the ROPES dataset demonstrate the effectiveness and explainability of our proposed approach.
摘要:我们专注于推理过一段效应形势任务,这需要一个模型,以了解在后台段所述的因果关系,并运用所学知识一个新的环境。现有的作品忽略了复杂的推理过程,并用一步法“黑盒子”模式解决它。人类认知过程的启发,在本文中,我们提出了一个顺序的方法完成这个任务,其中明确型号,神经网络模块的推理过程的每一个步骤。具体地,五个推理模块被设计并在端至端的方式,这导致了更可解释模型学习。在数据集中显示了我们提出的方法的有效性和explainability绳索实验结果。
74. Partially-Aligned Data-to-Text Generation with Distant Supervision [PDF] 返回目录
Zihao Fu, Bei Shi, Wai Lam, Lidong Bing, Zhiyuan Liu
Abstract: The Data-to-Text task aims to generate human-readable text for describing some given structured data enabling more interpretability. However, the typical generation task is confined to a few particular domains since it requires well-aligned data which is difficult and expensive to obtain. Using partially-aligned data is an alternative way of solving the dataset scarcity problem. This kind of data is much easier to obtain since it can be produced automatically. However, using this kind of data induces the over-generation problem posing difficulties for existing models, which tends to add unrelated excerpts during the generation procedure. In order to effectively utilize automatically annotated partially-aligned datasets, we extend the traditional generation task to a refined task called Partially-Aligned Data-to-Text Generation (PADTG) which is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains. To tackle this new task, we propose a novel distant supervision generation framework. It firstly estimates the input data's supportiveness for each target word with an estimator and then applies a supportiveness adaptor and a rebalanced beam search to harness the over-generation problem in the training and generation phases respectively. We also contribute a partially-aligned dataset (the data and source code of this paper can be obtained from this https URL) by sampling sentences from Wikipedia and automatically extracting corresponding KB triples for each sentence from Wikidata. The experimental results show that our framework outperforms all baseline models and verify the feasibility of utilizing partially-aligned data.
摘要:数据到文本的任务目标,以生成人类可读的文本描述某个给定的结构化数据使更多的可解释性。然而,由于它需要良好对准的数据是困难和昂贵的,以获得典型生成任务被局限于少数特定结构域。使用部分对齐的数据是解决数据集稀缺问题的另一种方式。这种数据是很容易获得,因为它可以自动生成。然而,使用这种数据对于诱导现有的模式,这往往产生过程中添加无关的摘录过生成问题冒充困难。为了有效地利用自动注释的部分对齐的数据集,我们扩展了传统的生成任务到一个精炼任务调用部分排数据到文本代(PADTG),这是更实际的,因为它利用了自动注释的训练数据,从而大大扩大了应用领域。为了解决这个新任务,我们提出了一个新颖的遥远的监督生成框架。它首先估计与估计每个目标词的输入数据的支持性,然后应用一个支持性适配器和重新平衡梁的搜索分别以利用在培训和生成阶段的过度生成问题。我们也对部分对齐的数据集(本文的数据和源代码可以从该HTTPS URL通过采样来自维基百科的句子,并自动从维基数据每个句子中提取相应的KB三元获得。实验结果表明,我们的框架优于所有基线模型以及验证采用部分对齐数据的可行性。
75. Multilevel Text Alignment with Cross-Document Attention [PDF] 返回目录
Xuhui Zhou, Nikolaos Pappas, Noah A. Smith
Abstract: Text alignment finds application in tasks such as citation recommendation and plagiarism detection. Existing alignment methods operate at a single, predefined level and cannot learn to align texts at, for example, sentence and document levels. We propose a new learning approach that equips previously established hierarchical attention encoders for representing documents with a cross-document attention component, enabling structural comparisons across different levels (document-to-document and sentence-to-document). Our component is weakly supervised from document pairs and can align at multiple levels. Our evaluation on predicting document-to-document relationships and sentence-to-document relationships on the tasks of citation recommendation and plagiarism detection shows that our approach outperforms previously established hierarchical, attention encoders based on recurrent and transformer contextualization that are unaware of structural correspondence between documents.
摘要:在任务,如引用推荐和抄袭检测文本对齐方式得到应用。现有的比对方法在一个单一的,预定义的水平运行,不能学会在对齐文本,例如,句子和文档级别。我们建议装备之前为表示与跨文档注意分量的文件,使不同层级(文档到文档和句子到文件)结构比较建立的分层编码器的关注新的学习方法。我们的组件是弱的文档对监督,可在多个级别保持一致。我们在预测上,我们的方法比以前建立的基于递归和变压器语境层次,注意编码器是不知道之间的结构对应的引文建议和抄袭检测显示任务的文档到文档关系和句子对文档关系的评价文档。
76. Mining Knowledge for Natural Language Inference from Wikipedia Categories [PDF] 返回目录
Mingda Chen, Zewei Chu, Karl Stratos, Kevin Gimpel
Abstract: Accurate lexical entailment (LE) and natural language inference (NLI) often require large quantities of costly annotations. To alleviate the need for labeled data, we introduce WikiNLI: a resource for improving model performance on NLI and LE tasks. It contains 428,899 pairs of phrases constructed from naturally annotated category hierarchies in Wikipedia. We show that we can improve strong baselines such as BERT and RoBERTa by pretraining them on WikiNLI and transferring the models on downstream tasks. We conduct systematic comparisons with phrases extracted from other knowledge bases such as WordNet and Wikidata to find that pretraining on WikiNLI gives the best performance. In addition, we construct WikiNLI in other languages, and show that pretraining on them improves performance on NLI tasks of corresponding languages.
摘要:准确的词汇蕴含(LE)和自然语言推理(NLI)通常需要大量昂贵的注解。为了减轻对标数据的需要,我们介绍WikiNLI:关于改进和NLI LE任务模型性能的资源。它包含428899对从维基百科自然注释类别层次结构构成的短语。我们表明,我们可以通过训练前他们WikiNLI和下游任务转移模型提高强基线如BERT和罗伯塔。我们同其他的知识基础,如共发现和维基数据提取短语系统的比较发现,训练前对WikiNLI提供了最好的性能。此外,我们构造WikiNLI其他语言,并表明,训练前对他们的相应语言的NLI任务提高性能。
77. AI pptX: Robust Continuous Learning for Document Generation with AI Insights [PDF] 返回目录
Vineeth Ravi, Selim Amrouni, Andrea Stefanucci, Prashant Reddy, Manuela Veloso
Abstract: Business analysts create billions of slide decks, reports and documents annually. Most of these documents have a well-defined structure comprising similar content generated from data. We present 'AI pptX', a novel AI framework for creating and modifying documents as well as extracting insights in the form of natural language sentences from data. AI pptX has three main components: (i) a component that translates users' natural language input into 'skills' that encapsulate content editing and formatting commands, (ii) a robust continuously learning component that interacts with users, and (iii) a component that automatically generates hierarchical insights in the form of natural language sentences. We illustrate (i) and (ii) with a study of 18 human users tasked with creating a presentation deck, and observe the learning capability through a decrease of up to 45% in user-input commands. We demonstrate the robust learning capability of AI pptX with experimental simulations of non-collaborative users. We illustrate (i) and (iii) by automatically generating insights in natural language using a data set from the Electricity Transmission Network of France (RTE); we show that a complex statistical analysis of series can automatically be distilled into easily interpretable explanations called AI Insights.
摘要:业务分析师打造百亿幻灯片组,报告和文件每年。大多数这些文献都明确定义的结构,其包括从数据生成的类似的内容。我们现在“AI PPTX”,用于创建和从数据自然语言中的句子的形式修改文件以及提取的见解新颖的AI框架。 AI PPTX有三个主要部件:(i)该翻译用户的自然语言输入到‘封装内容编辑和格式化命令技能’的组分,(ⅱ)一个健壮连续学习组件,它与用户相互作用,和(iii)的组分自动生成自然语言句子的形式层次的见解。我们示出了(i)和(ii)用的任务是创建演示文稿甲板和高达45%观察到来自于用户输入命令的降低学习能力18个人类用户进行了研究。我们证明非协作用户的实验模拟AI PPTX格式的强大的学习能力。我们举例说明(i)及(iii)通过使用来自法国的电力传输网络(RTE)数据集自然语言自动生成的见解;我们表明,一系列复杂的统计分析,可以自动地提炼成所谓的AI见解容易解释的解释。
78. Multi-domain Clinical Natural Language Processing with MedCAT: the Medical Concept Annotation Toolkit [PDF] 返回目录
Zeljko Kraljevic, Thomas Searle, Anthony Shek, Lukasz Roguski, Kawsar Noor, Daniel Bean, Aurelie Mascio, Leilei Zhu, Amos A Folarin, Angus Roberts, Rebecca Bendayan, Mark P Richardson, Robert Stewart, Anoop D Shah, Wai Keong Wong, Zina Ibrahim, James T Teo, Richard JB Dobson
Abstract: Electronic health records (EHR) contain large volumes of unstructured text, requiring the application of Information Extraction (IE) technologies to enable clinical analysis. We present the open source Medical Concept Annotation Toolkit (MedCAT) that provides: a) a novel self-supervised machine learning algorithm for extracting concepts using any concept vocabulary including UMLS/SNOMED-CT; b) a feature-rich annotation interface for customizing and training IE models; and c) integrations to the broader CogStack ecosystem for vendor-agnostic health system deployment. We show improved performance in extracting UMLS concepts from open datasets (F1 0.467-0.791 vs 0.384-0.691). Further real-world validation demonstrates SNOMED-CT extraction at 3 large London hospitals with self-supervised training over ~8.8B words from ~17M clinical records and further fine-tuning with ~6K clinician annotated examples. We show strong transferability (F1 > 0.94) between hospitals, datasets and concept types indicating cross-domain EHR-agnostic utility for accelerated clinical and research use cases.
摘要:电子健康记录(EHR)含有大量的非结构化文本,要求信息抽取(IE)技术的应用,使临床分析。我们目前的开源医学概念注释工具包(MedCAT),其提供:用于提取使用任何概念词汇包括UMLS / SNOMED-CT概念a)一种新颖的自监督的机器学习算法; B)一个功能丰富的注释界面定制和训练IE模型;和c)集成到更广泛的CogStack生态系统供应商无关的健康系统的部署。我们发现,从开放的数据集(F1 0.467-0.791 VS 0.384-0.691)提取UMLS概念更好的性能。而且真实世界的验证表明,在3家大伦敦医院拥有超过自我指导训练SNOMED-CT提取〜在〜17M临床记录,并进一步微调以〜6K医生注释例子8.8B话。我们发现医院,数据集和概念类型的指示加速临床和科研使用情况跨域EHR无关的工具之间的强转移性(F1> 0.94)。
79. Automatic Extraction of Rules Governing Morphological Agreement [PDF] 返回目录
Aditi Chaudhary, Antonios Anastasopoulos, Adithya Pratapa, David R. Mortensen, Zaid Sheikh, Yulia Tsvetkov, Graham Neubig
Abstract: Creating a descriptive grammar of a language is an indispensable step for language documentation and preservation. However, at the same time it is a tedious, time-consuming task. In this paper, we take steps towards automating this process by devising an automated framework for extracting a first-pass grammatical specification from raw text in a concise, human- and machine-readable format. We focus on extracting rules describing agreement, a morphosyntactic phenomenon at the core of the grammars of many of the world's languages. We apply our framework to all languages included in the Universal Dependencies project, with promising results. Using cross-lingual transfer, even with no expert annotations in the language of interest, our framework extracts a grammatical specification which is nearly equivalent to those created with large amounts of gold-standard annotated data. We confirm this finding with human expert evaluations of the rules that our framework produces, which have an average accuracy of 78%. We release an interface demonstrating the extracted rules at this https URL.
摘要:创建语言的描写语法是语言的文档和保存不可缺少的一步。然而,在同一时间,它是一个繁琐,费时的任务。在本文中,我们采取对通过设计自动化框架以简明,人类和机器可读的格式提取原始文本的第一通语法规范自动执行此处理步骤。我们专注于提取描述协议,在世界上许多语言的语法的核心形态句法现象的规则。我们运用我们的框架,包括在通用依赖项目中的所有语言,可喜的成果。利用跨语言传输,即使在感兴趣的语言中没有专家的注解,我们的框架提取语法规范这几乎等同于拥有大量黄金标准注释数据的创建。我们证实这一发现与我们的框架产生具有78%的平均准确度的规则,人的鉴定结论。我们解除的界面,在此HTTPS URL表明提取的规则。
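The measurement underlying such agreement rules is easy to state: for a given dependency relation, how often do the dependent and its head share the value of a feature such as Number or Gender. A toy sketch of that counting step over pre-parsed triples (not the authors' framework; real input would come from Universal Dependencies treebanks):

    # Toy sketch: estimate rules of the form "for relation R, dependent and head agree on
    # feature F" by counting over parsed (dependent_feats, head_feats, relation) triples.
    PARSES = [
        ({"Number": "Sing"}, {"Number": "Sing"}, "nsubj"),
        ({"Number": "Plur"}, {"Number": "Plur"}, "nsubj"),
        ({"Number": "Sing"}, {"Number": "Plur"}, "obj"),
    ]

    def agreement_rate(feature, relation):
        match = total = 0
        for dep_feats, head_feats, rel in PARSES:
            if rel == relation and feature in dep_feats and feature in head_feats:
                total += 1
                match += dep_feats[feature] == head_feats[feature]
        return match / total if total else None

    print(agreement_rate("Number", "nsubj"))   # 1.0 -> candidate agreement rule
    print(agreement_rate("Number", "obj"))     # 0.0 -> no agreement on this relation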
80. Cost-effective Selection of Pretraining Data: A Case Study of Pretraining BERT on Social Media [PDF] 返回目录
Xiang Dai, Sarvnaz Karimi, Ben Hachey, Cecile Paris
Abstract: Recent studies on domain-specific BERT models show that effectiveness on downstream tasks can be improved when models are pretrained on in-domain data. Often, the pretraining data used in these models are selected based on their subject matter, e.g., biology or computer science. Given the range of applications using social media text, and its unique language variety, we pretrain two models on tweets and forum text respectively, and empirically demonstrate the effectiveness of these two resources. In addition, we investigate how similarity measures can be used to nominate in-domain pretraining data. We publicly release our pretrained models at this https URL.
摘要:在特定领域-BERT模型最近的研究表明下游的任务是效力可以在模型上域数据预先训练得到改善。通常,在这些模型中使用的训练前的数据选择的依据是他们的主题,例如,生物学和计算机科学。鉴于利用社会媒体文本的应用范围,其独特的语言不同,我们分别pretrain在微博和论坛文字两种型号,并根据经验表明这两种资源的有效性。此外,我们研究了相似的措施如何可以用来提名域内训练前的数据。我们公开在此HTTPS URL释放我们的预先训练模式。
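One concrete way to read "similarity measures ... to nominate in-domain pretraining data" is vocabulary overlap between the target task corpus and each candidate pretraining corpus. The sketch below uses Jaccard similarity over frequent word types as the measure, which is an assumption for illustration rather than necessarily the measure studied in the paper.

    # Minimal sketch: rank candidate pretraining corpora by vocabulary overlap with the
    # target task corpus (Jaccard over frequent word types; toy data).
    from collections import Counter

    def top_vocab(texts, k=5000):
        counts = Counter(w for t in texts for w in t.lower().split())
        return {w for w, _ in counts.most_common(k)}

    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0

    task_corpus = ["my dr said this med knocked the flu right out", "cant sleep, side effects??"]
    candidates = {
        "tweets": ["flu season is the worst, meds not helping", "cant even sleep tonight lol"],
        "pubmed": ["randomized controlled trial of oseltamivir", "influenza antiviral efficacy"],
    }

    task_vocab = top_vocab(task_corpus)
    ranking = sorted(candidates,
                     key=lambda name: jaccard(task_vocab, top_vocab(candidates[name])),
                     reverse=True)
    print(ranking)   # candidate corpora ordered by estimated domain similarity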
81. Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems [PDF] 返回目录
Jan Deriu, Don Tuggener, Pius von Däniken, Jon Ander Campos, Alvaro Rodrigo, Thiziri Belkacem, Aitor Soroa, Eneko Agirre, Mark Cieliebak
Abstract: The lack of time-efficient and reliable evaluation methods hampers the development of conversational dialogue systems (chatbots). Evaluations requiring humans to converse with chatbots are time- and cost-intensive, put high cognitive demands on the human judges, and yield low-quality results. In this work, we introduce Spot The Bot, a cost-efficient and robust evaluation framework that replaces human-bot conversations with conversations between bots. Human judges then only annotate, for each entity in a conversation, whether they think it is human or not (assuming there are human participants in these conversations). These annotations then allow us to rank chatbots regarding their ability to mimic the conversational behavior of humans. Since we expect that all bots are eventually recognized as such, we incorporate a metric that measures which chatbot can uphold human-like behavior the longest, i.e., Survival Analysis. This metric has the ability to correlate a bot's performance to certain of its characteristics (e.g., fluency or sensibleness), yielding interpretable results. The comparably low cost of our framework allows for frequent evaluations of chatbots during their evaluation cycle. We empirically validate our claims by applying Spot The Bot to three domains, evaluating several state-of-the-art chatbots, and drawing comparisons to related work. The framework is released as a ready-to-use tool.
摘要:时间不够,高效,可靠的评价方法阻碍对话的对话系统(聊天机器人)的开发。人类需要与聊天机器人交谈评估的时间和成本密集型的,把对人的法官高的认知需求,产量低质量的结果。在这项工作中,我们将介绍\ {EMPH现货机器人},具有成本效益和稳健的评估框架,与机器人之间的对话内容替换人机器人对话。人类法官则只为每个实体注释在谈话中,他们是否认为这是人或不(假设有人类参与这些对话)。这些注释然后让我们对他们对人类的模仿对话行为能力排名聊天机器人。因为我们希望所有的机器人最终被承认的,我们引入一个度量措施,聊天机器人能坚持类似人类的行为时间最长,即\ {EMPH生存分析}。该度量具有关联一个机器人的性能将其部分特性(例如,\流畅或见识),产生可解释的结果的能力。我们的框架的成本相对较低允许对自己的评价周期内聊天机器人的频繁的评估。我们凭经验应用\ {EMPH现货机器人}到三个领域,评估国家的最先进的几个聊天机器人,并提请比较,以相关工作验证我们的要求。该框架被释放为准备使用的工具。
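The Survival Analysis metric mentioned above can be made concrete with a standard Kaplan-Meier estimate of "how long a bot survives before being spotted". The sketch below uses the lifelines library on made-up data; it illustrates the metric, not the released tool.

    # Sketch of the survival-analysis view: each conversation gives a "time to being spotted"
    # for a bot, right-censored if the bot was never spotted. All numbers are made up.
    from lifelines import KaplanMeierFitter

    durations = [2, 3, 3, 5, 6, 6, 8, 10]   # exchanges until the conversation ended
    spotted = [1, 1, 0, 1, 1, 0, 1, 0]      # 1 = bot was spotted, 0 = censored

    kmf = KaplanMeierFitter()
    kmf.fit(durations, event_observed=spotted, label="bot_A")
    print(kmf.survival_function_)            # P(still passing as human) per exchange count
    print(kmf.median_survival_time_)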
82. A Spherical Hidden Markov Model for Semantics-Rich Human Mobility Modeling [PDF] 返回目录
Wanzheng Zhu, Chao Zhang, Shuochao Yao, Xiaobin Gao, Jiawei Han
Abstract: We study the problem of modeling human mobility from semantic trace data, wherein each GPS record in a trace is associated with a text message that describes the user's activity. Existing methods fall short in unveiling human movement regularities, because they either do not model the text data at all or suffer from text sparsity severely. We propose SHMM, a multi-modal spherical hidden Markov model for semantics-rich human mobility modeling. Under the hidden Markov assumption, SHMM models the generation process of a given trace by jointly considering the observed location, time, and text at each step of the trace. The distinguishing characteristic of SHMM is the text modeling part. We use fixed-size vector representations to encode the semantics of the text messages, and model the generation of the l2-normalized text embeddings on a unit sphere with the von Mises-Fisher (vMF) distribution. Compared with other alternatives like multi-variate Gaussian, our choice of the vMF distribution not only incurs much fewer parameters, but also better leverages the discriminative power of text embeddings in a directional metric space. The parameter inference for the vMF distribution is non-trivial since it involves functional inversion of ratios of Bessel functions. We theoretically prove that: 1) the classical Expectation-Maximization algorithm can work with vMF distributions; and 2) while closed-form solutions are hard to be obtained for the M-step, Newton's method is guaranteed to converge to the optimal solution with quadratic convergence rate. We have performed extensive experiments on both synthetic and real-life data. The results on synthetic data verify our theoretical analysis; while the results on real-life data demonstrate that SHMM learns meaningful semantics-rich mobility models, outperforms state-of-the-art mobility models for next location prediction, and incurs lower training cost.
摘要:我们研究从语义的跟踪数据,其中在跟踪每个GPS记录与描述用户的活动短信相关的模拟人类移动性的问题。现有的方法在揭幕人体运动规律功亏一篑,因为他们要么不文本数据在所有的模型或从文本稀疏严重受损。我们建议SHMM,多模式球形隐马尔可夫模型丰富的语义,人口流动模型。根据隐马尔可夫假设,SHMM模型通过联合考虑观察的地点,时间,和文本在跟踪的每一步骤给定的轨迹的生成过程。 SHMM的显着特点是文字造型的一部分。我们使用固定大小的矢量表示以编码文本消息的语义,并且在与冯米塞斯-Fisher分析(VMF)分布的单元球上的L2归一化的文本的嵌入的产生进行建模。与像多变量高斯,我们的选择VMF分布的其他替代品相比,不仅招致少很多参数,而且还更好地利用了文字的嵌入的辨别力的方向度量空间。参数推断为VMF分布是不平凡的,因为它涉及的贝塞尔函数比的功能反转。我们从理论上证明:1)经典的最大期望算法可以与VMF发行工作;和2),而封闭形式的解决方案是很难的M-步骤中获得的,牛顿法是保证收敛到与二次收敛率的最佳解决方案。我们已经在人工和真实数据进行了大量的实验。对合成数据的结果验证了我们的理论分析;而现实生活中的数据结果表明,SHMM学习有意义丰富的语义,移动模型,优于国家的最先进的移动性模型下一个位置预测,并造成降低培训成本。
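The central modeling choice, a von Mises-Fisher emission for the L2-normalized text embeddings, has a simple closed-form log-density on the unit sphere. A small numerical sketch (using scipy's modified Bessel function; illustration only, not the paper's EM/Newton implementation):

    # von Mises-Fisher log-density on the unit sphere S^{p-1}:
    #   log p(x | mu, kappa) = (p/2 - 1) log kappa - (p/2) log(2 pi) - log I_{p/2-1}(kappa) + kappa * mu.x
    import numpy as np
    from scipy.special import iv   # modified Bessel function of the first kind

    def vmf_logpdf(x, mu, kappa):
        p = x.shape[-1]
        log_norm = ((p / 2 - 1) * np.log(kappa)
                    - (p / 2) * np.log(2 * np.pi)
                    - np.log(iv(p / 2 - 1, kappa)))
        return log_norm + kappa * float(x @ mu)

    rng = np.random.default_rng(0)
    mu = rng.normal(size=8); mu /= np.linalg.norm(mu)
    x = rng.normal(size=8);  x /= np.linalg.norm(x)
    print(vmf_logpdf(x, mu, kappa=5.0))   # larger kappa concentrates mass around mu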
83. Improving Device Directedness Classification of Utterances with Semantic Lexical Features [PDF] 返回目录
Kellen Gillespie, Ioannis C. Konstantakopoulos, Xingzhi Guo, Vishal Thanvantri Vasudevan, Abhinav Sethy
Abstract: User interactions with personal assistants like Alexa, Google Home and Siri are typically initiated by a wake term or wakeword. Several personal assistants feature "follow-up" modes that allow users to make additional interactions without the need of a wakeword. For the system to only respond when appropriate, and to ignore speech not intended for it, utterances must be classified as device-directed or non-device-directed. State-of-the-art systems have largely used acoustic features for this task, while others have used only lexical features or have added LM-based lexical features. We propose a directedness classifier that combines semantic lexical features with a lightweight acoustic feature and show it is effective in classifying directedness. The mixed-domain lexical and acoustic feature model is able to achieve 14% relative reduction of EER over a state-of-the-art acoustic-only baseline model. Finally, we successfully apply transfer learning and semi-supervised learning to the model to improve accuracy even further.
摘要:个人助理,页面一样,谷歌主页和Siri的用户交互通常由尾项或wakeword启动。一些个人助理功能“跟进”模式,让用户进行更多的互动,而不需要一个wakeword的。为了使系统仅在适当的时候,并忽略语音不用于响应它,话语必须被分类为设备定向或非设备定向。国家的最先进的系统基本上都采用声学特征此任务,而其他人只使用词汇特征或添加基于LM-词汇特征。我们提出了一个指向性的分类是一个轻量级的声学特征结合了语义词汇特征显示它是有效的指向性进行分类。在混合域词汇和声学特征模型是能够在一个国家的最先进的声仅基线模型来实现EER的14%的相对减少。最后,我们成功地应用迁移学习和半监督学习的模型,以进一步提高精度。
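Combining semantic lexical features with one lightweight acoustic feature can be pictured as concatenating a text representation with a numeric acoustic score before classification. A toy scikit-learn sketch (utterances, the acoustic score, and labels are all invented; this is not the paper's model):

    # Toy sketch: TF-IDF lexical features concatenated with a single lightweight acoustic
    # feature, feeding a device-directedness classifier. All data here is invented.
    import numpy as np
    from scipy.sparse import hstack
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    utterances = ["play my morning playlist", "yeah i told him to call back later",
                  "set a timer for ten minutes", "we should just order pizza tonight"]
    acoustic = np.array([[0.91], [0.34], [0.88], [0.22]])   # e.g., an ASR-confidence-like score
    device_directed = [1, 0, 1, 0]

    vec = TfidfVectorizer()
    X = hstack([vec.fit_transform(utterances), acoustic])
    clf = LogisticRegression().fit(X, device_directed)

    x_new = hstack([vec.transform(["remind me to water the plants"]), np.array([[0.87]])])
    print(clf.predict(x_new))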
84. PMI-Masking: Principled masking of correlated spans [PDF] 返回目录
Yoav Levine, Barak Lenz, Opher Lieber, Omri Abend, Kevin Leyton-Brown, Moshe Tennenholtz, Yoav Shoham
Abstract: Masking tokens uniformly at random constitutes a common flaw in the pretraining of Masked Language Models (MLMs) such as BERT. We show that such uniform masking allows an MLM to minimize its training objective by latching onto shallow local signals, leading to pretraining inefficiency and suboptimal downstream performance. To address this flaw, we propose PMI-Masking, a principled masking strategy based on the concept of Pointwise Mutual Information (PMI), which jointly masks a token n-gram if it exhibits high collocation over the corpus. PMI-Masking motivates, unifies, and improves upon prior more heuristic approaches that attempt to address the drawback of random uniform token masking, such as whole-word masking, entity/phrase masking, and random-span masking. Specifically, we show experimentally that PMI-Masking reaches the performance of prior masking approaches in half the training time, and consistently improves performance at the end of training.
摘要:掩蔽令牌均匀地随机构成屏蔽语言模型(的MLM)如BERT的预训练一个共同的缺陷。我们发现,这样的统一屏蔽允许传销是锁定浅本地信号,从而导致效率低下训练前和次优下游性能,以尽量减少其培训目标。为了解决这一缺陷,我们提出PMI-遮蔽的基础上,逐点互信息(PMI)的概念,一个有原则的屏蔽策略,共同面具象征性的n-gram,如果它表现出一些超过胼高搭配。 PMI-掩蔽能够激励,结合,并改进其试图解决随机均匀令牌掩蔽,如全词掩蔽,实体/短语掩蔽和随机掩蔽跨度的缺点之前更启发式方法。具体来说,我们通过实验证明该PMI-遮蔽到达之前屏蔽性能一半的训练时间接近,并在训练结束持续提高性能。
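The PMI criterion is straightforward to make concrete: a token n-gram becomes a joint-masking candidate when its corpus probability greatly exceeds the product of its unigram probabilities. A small bigram-only sketch (pure counting; the paper's actual scoring and n-gram lengths differ):

    # Sketch of the PMI idea behind PMI-Masking:
    #   PMI(w1, w2) = log( p(w1, w2) / (p(w1) * p(w2)) ); high-PMI spans get masked jointly.
    import math
    from collections import Counter

    corpus = ("new york is a big city . machine learning models are trained in new york . "
              "learning a new language is hard .").split()

    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())

    def pmi(w1, w2):
        p_xy = bigrams[(w1, w2)] / n_bi
        p_x, p_y = unigrams[w1] / n_uni, unigrams[w2] / n_uni
        return math.log(p_xy / (p_x * p_y)) if p_xy > 0 else float("-inf")

    top = sorted(bigrams, key=lambda b: pmi(*b), reverse=True)[:5]
    print(top)   # high-PMI bigrams (collocations) would be masked as whole spans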
85. Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speaker Separation [PDF] 返回目录
Zhong-Qiu Wang, Peidong Wang, DeLiang Wang
Abstract: We propose multi-microphone complex spectral mapping, a simple way of applying deep learning for time-varying non-linear beamforming, for offline utterance-wise and block-online continuous speaker separation in reverberant conditions, aiming at both speaker separation and dereverberation. Assuming a fixed array geometry between training and testing, we train deep neural networks (DNN) to predict the real and imaginary (RI) components of target speech at a reference microphone from the RI components of multiple microphones. We then integrate multi-microphone complex spectral mapping with beamforming and post-filtering to further improve separation, and combine it with frame-level speaker counting for block-online continuous speaker separation (CSS). Although our system is trained on simulated room impulse responses (RIR) based on a fixed number of microphones arranged in a given geometry, it generalizes well to a real array with the same geometry. State-of-the-art separation performance is obtained on the simulated two-talker SMS-WSJ corpus and the real-recorded LibriCSS dataset.
摘要:本文提出多麦克风复杂的频谱映射,应用深度学习的时变非线性波束形成的一种简单的方法,为混响条件下线话语明智和块的在线连续扬声器分离,针对这两个扬声器分离和去混响。假设训练和测试之间的固定阵列的几何形状,我们培养深神经网络(DNN)在来自多个麦克风的RI组件的参考麦克风预测目标语音的实部和虚部(RI)的组件。然后,我们用整合波束形成和后滤波以进一步改善分离,并与块的在线连续扬声器分离(CSS)帧级扬声器计数结合它多麦克风复杂频谱映射。虽然我们的系统是基于固定数量的麦克风安排在一个给定的几何模拟房间脉冲响应(RIR)的训练,它概括很好地使用相同的几何真正的数组。国家的最先进的是在模拟双健谈SMS-WSJ语料和真实记录LibriCSS数据集获得的分离性能。
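The input/output formulation, stacking the real and imaginary STFT components of all microphones and predicting the RI components at a reference microphone, can be sketched as feature preparation; the DNN itself is omitted and the clean target is stood in for by the reference-microphone mixture, so the shapes are illustrative only.

    # Feature-preparation sketch for multi-microphone complex spectral mapping: stack the
    # real/imaginary STFT components of all channels as network input; the training target
    # would be the RI components of clean speech at a reference microphone (DNN omitted).
    import numpy as np
    from scipy.signal import stft

    fs, n_mics, n_samples = 16000, 6, 16000
    mixture = np.random.randn(n_mics, n_samples)             # stand-in for a reverberant mixture

    _, _, Z = stft(mixture, fs=fs, nperseg=512)              # Z: (n_mics, n_freq, n_frames), complex
    features = np.concatenate([Z.real, Z.imag], axis=0)      # (2 * n_mics, n_freq, n_frames)

    ref_mic = 0                                               # placeholder for the clean target's RI
    target = np.stack([Z[ref_mic].real, Z[ref_mic].imag])    # (2, n_freq, n_frames)
    print(features.shape, target.shape)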
86. Deep Just-In-Time Inconsistency Detection Between Comments and Source Code [PDF] 返回目录
Sheena Panthaplackel, Junyi Jessy Li, Milos Gligoric, Raymond J. Mooney
Abstract: Natural language comments convey key aspects of source code such as implementation, usage, and pre- and post-conditions. Failure to update comments accordingly when the corresponding code is modified introduces inconsistencies, which is known to lead to confusion and software bugs. In this paper, we aim to detect whether a comment becomes inconsistent as a result of changes to the corresponding body of code, in order to catch potential inconsistencies just-in-time, i.e., before they are committed to a version control system. To achieve this, we develop a deep-learning approach that learns to correlate a comment with code changes. By evaluating on a large corpus of comment/code pairs spanning various comment types, we show that our model outperforms multiple baselines by significant margins. For extrinsic evaluation, we show the usefulness of our approach by combining it with a comment update model to build a more comprehensive automatic comment maintenance system which can both detect and resolve inconsistent comments based on code changes.
摘要:自然语言意见传达源代码的关键方面,如实施,使用和前置和后置条件。当相应的代码被修改介绍的不一致,这是已知的导致混乱和软件缺陷未能更新相应的评论。在本文中,我们的目标是检测的评论是随着修改的代码对应体的结果不一致,为了赶上刚刚在一次潜在的不一致,即,他们致力于一个版本控制系统之前。为了实现这一目标,我们开发了一个深刻的学习办法,学会与代码更改的注释相关。通过对大量语料跨越各种注释类型注释/代码对评估,我们表明,我们的模型通过显著利润率优于多个基准。对于外在的评价,我们通过与评论更新模型相结合,以建立一个更全面的自动化的注释维护系统,它可以同时检测和基于代码更改决心不一致的言论表明我们的方法的有效性。
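A crude lexical baseline for this task (explicitly not the paper's learned model) is to surface identifiers removed by the code edit and check whether the old comment still mentions them; a sketch with difflib:

    # Crude illustrative baseline, not the paper's model: flag a comment as suspicious when
    # the code change removes tokens that the comment still refers to.
    import difflib
    import re

    old_code = "def area(r):\n    return 3.14 * r * r\n"
    new_code = "def area(radius):\n    return math.pi * radius ** 2\n"
    comment = "# Computes the area using 3.14 and the variable r."

    def tokens(text):
        return set(re.findall(r"[A-Za-z_]\w*|\d+\.\d+", text))

    removed = set()
    for line in difflib.unified_diff(old_code.splitlines(), new_code.splitlines(), lineterm=""):
        if line.startswith("-") and not line.startswith("---"):
            removed |= tokens(line)

    stale = removed & tokens(comment)
    print("possibly inconsistent" if stale else "looks consistent", stale)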
87. Holistic static and animated 3D scene generation from diverse text descriptions [PDF] 返回目录
Faria Huq, Anindya Iqbal, Nafees Ahmed
Abstract: We propose a framework for holistic static and animated 3D scene generation from diverse text descriptions. Prior work on scene generation relies on static rule-based entity extraction from natural language descriptions. However, this limits the usability of a practical solution. To overcome this limitation, we use one of the state-of-the-art architectures, TransformerXL. Instead of rule-based extraction, our framework leverages rich contextual encoding, which allows us to process a larger and more diverse range of possible natural language descriptions. We empirically show how our proposed mechanism generalizes even to novel combinations of object features during inference. We also show how our framework can jointly generate static and animated 3D scenes efficiently. We modify CLEVR to generate a large, scalable dataset: Integrated static and animated 3D scene (Iscene). Data preparation code and a pre-trained model are available at this https URL.
摘要:我们提出了整体的静态和来自不同的文字描述了一个框架三维动画场景生成。场景生成的作品之前依赖从自然语言描述静态基于规则的实体提取。然而,这限制了可行的解决方案的可用性。为了克服这种局限性,我们使用国家的最先进的建筑之一 - TransformerXL。相反,基于规则的提取,我们的框架利用丰富的背景编码,这使我们能够处理可能的自然语言描述更大范围的(不同的)。我们经验表明推理过程如何我们提出的机制推广甚至在的新组合对象的特性。我们还展示了如何我们的架构能够有效地共同生成静态和动画的3D场景。我们修改CLEVR产生大的,可扩展的数据集 - 综合静态和动态3D场景(Iscene)。数据准备代码,并可以在预先训练模型 - 这HTTPS URL。
88. NLP Service APIs and Models for Efficient Registration of New Clients [PDF] 返回目录
Sahil Shah, Vihari Piratla, Soumen Chakrabarti, Sunita Sarawagi
Abstract: State-of-the-art NLP inference uses enormous neural architectures and models trained for GPU-months, well beyond the reach of most consumers of NLP. This has led to one-size-fits-all, public, API-based NLP service models offered by major AI companies, serving large numbers of clients. Neither (hardware-deficient) clients nor (heavily subscribed) servers can afford traditional fine-tuning. Many clients own little or no labeled data. We initiate a study of adaptation of centralized NLP services to clients, and present one practical and lightweight approach. Each client uses an unsupervised, corpus-based sketch to register to the service. The server uses an auxiliary network to map the sketch to an abstract vector representation, which then informs the main labeling network. When a new client registers with its sketch, it gets immediate accuracy benefits. We demonstrate the success of the proposed architecture using sentiment labeling, NER, and predictive language modeling.
摘要:国家的最先进的自然语言处理推论使用训练GPU月巨大的神经结构和模型,远远超出了大多数消费者的NLP的范围。这导致了一个尺寸适合所有公共基于API的NLP的服务模式主要由AI公司,服务于大量客户端。无论是(硬件缺陷)客户端,也不(订阅严重)服务器可以承受传统的微调。许多客户自己很少或没有标签的数据。我们首次对客户的集中NLP服务适应的研究,目前一个实际和轻量级的方法。每个用户都使用一种无监督的,基于语料库的草图注册到服务。服务器使用辅助网络映射草图到抽象向量表示,其然后通知主标签网络。当一个新的客户端提供草图注册时,它立即取得准确的好处。我们用情绪标签,净入学率,并预测语言建模证明了该架构的成功
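The "unsupervised, corpus-based sketch" on the client side can be as simple as a normalized term-frequency vector over a shared vocabulary, which the server's auxiliary network then maps to its abstract representation. A toy sketch of just the client registration step (vocabulary and texts invented; the server-side networks are omitted):

    # Toy sketch of client registration: an unsupervised corpus sketch built as a normalized
    # term-frequency vector over a shared vocabulary. Server-side networks are omitted.
    import numpy as np
    from collections import Counter

    SHARED_VOCAB = ["price", "refund", "shipping", "goal", "match", "player", "symptom", "dose"]

    def corpus_sketch(client_texts):
        counts = Counter(w for t in client_texts for w in t.lower().split())
        vec = np.array([counts[w] for w in SHARED_VOCAB], dtype=float)
        total = vec.sum()
        return vec / total if total else vec

    sports_client = ["great goal in the last match", "star player injured before the match"]
    print(corpus_sketch(sports_client))   # sent once to the service when the client registers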
89. Reproducible Science with \LaTeX [PDF] 返回目录
Haim Bar, HaiYing Wang
Abstract: This paper proposes a procedure to execute external source code from a LaTeX document and automatically include the calculation outputs in the resulting Portable Document Format (pdf) file. It integrates programming tools into the LaTeX writing tool to facilitate the production of reproducible research. In our proposed approach to a LaTeX-based scientific notebook, the user can easily invoke any programming language or command-line program when compiling the LaTeX document, while using their favorite LaTeX editor in the writing process. The required LaTeX setup, a new Python package, and the defined preamble are discussed in detail, and working examples using R, Julia, and MATLAB to reproduce existing research are provided to illustrate the proposed procedure. We also demonstrate how to include system setting information in a paper by invoking shell scripts when compiling the document.
摘要:提出从\乳胶\空间文档执行外部源代码和自动包括在所得的可移植文档格式(PDF)文件的计算输出的过程。它集成了编程工具到\ LaTeX的\空间书写工具,方便生产重复性研究。在我们提出的方法为基于乳胶的\科学的笔记本用户可以方便地调用任何编程语言或命令行程序编译\ LaTeX的\空间文档时,同时使用自己喜爱的\ LaTeX \空间在写作过程中编辑。所需\乳胶\空间设置,一个新的\ proglang {的Python}包,和所定义的前导码进行了详细讨论,和工作实例使用\ proglang {R},\ proglang {朱莉娅},和\ proglang {MatLab的}重现现有的研究提供了所提程序。我们还演示了如何通过编写文档时调用shell脚本以包含在一份文件系统设置信息。
90. SumGNN: Multi-typed Drug Interaction Prediction via Efficient Knowledge Graph Summarization [PDF] 返回目录
Yue Yu, Kexin Huang, Chao Zhang, Lucas M. Glass, Jimeng Sun, Cao Xiao
Abstract: Thanks to the increasing availability of drug-drug interaction (DDI) datasets and large biomedical knowledge graphs (KGs), accurate detection of adverse DDI using machine learning models becomes possible. However, it remains largely an open problem how to effectively utilize large and noisy biomedical KGs for DDI detection. Due to the sheer size of and amount of noise in KGs, it is often less beneficial to directly integrate KGs with other smaller but higher-quality data (e.g., experimental data). Most existing approaches ignore KGs altogether. Some try to directly integrate KGs with other data via graph neural networks, with limited success. Furthermore, most previous works focus on binary DDI prediction, whereas multi-typed DDI pharmacological effect prediction is a more meaningful but harder task. To fill the gaps, we propose a new method, SumGNN (knowledge summarization graph neural network), which is enabled by a subgraph extraction module that can efficiently anchor on relevant subgraphs from a KG, a self-attention-based subgraph summarization scheme to generate a reasoning path within the subgraph, and a multi-channel knowledge and data integration module that utilizes massive external biomedical knowledge for significantly improved multi-typed DDI predictions. SumGNN outperforms the best baseline by up to 5.54%, and the performance gain is particularly significant for low-data relation types. In addition, SumGNN provides interpretable predictions via the generated reasoning path for each prediction.
摘要:由于使用机器学习模型药物相互作用(DDI)的数据集和大型的生物医学知识图(KGS),准确的检测不良DDI的可用性的增加成为可能。然而,它在很大程度上仍然开放的问题,如何有效地利用了DDI检测大和嘈杂的生物医学KG。由于其规模和在幼稚园噪声量,它往往是不太有利的直接集成与其它小但质量较高的数据(例如,实验数据)公斤。大多数现有的方法完全忽略公斤。一些试图直接集成通过了有限的成功图表神经网络等数据公斤。而且,大多数以前的作品集中在二进制DDI预测,而多类型的DDI药理作用的预测是更有意义,但更重的任务。为了填补国内空白,我们提出了一个新的方法SumGNN:〜{\它的知识汇总图中的神经网络},它是由一个子提取模块启用,可以有效地从幼稚园相关的子图锚,自我关注基于子图总结方案以产生子图内推理路径,以及利用用于显著改进的多类型的DDI预测大规模外部生物医学知识的多通道的知识和数据集成模块。 SumGNN优于通过了最好基线到5.54 \%,而性能增益是低的数据关系类型尤其显著。此外,经由SumGNN针对每个预测所产生的推理路径提供可解释的预测。
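The subgraph extraction module can be pictured as taking the union of k-hop neighbourhoods around the two drugs of interest and keeping the induced subgraph. A small networkx sketch over a toy graph (not the paper's KG, features, or GNN):

    # Toy sketch of the subgraph-extraction step: union of k-hop neighbourhoods around the
    # two drugs, kept as the induced subgraph that downstream GNN/summarization would use.
    import networkx as nx

    kg = nx.Graph()
    kg.add_edges_from([
        ("drug_A", "enzyme_CYP3A4"), ("drug_B", "enzyme_CYP3A4"),
        ("drug_A", "gene_ABCB1"), ("gene_ABCB1", "pathway_transport"),
        ("drug_B", "disease_hypertension"), ("disease_hypertension", "gene_ACE"),
    ])

    def extract_subgraph(graph, drug1, drug2, k=1):
        nodes = set(nx.ego_graph(graph, drug1, radius=k)) | set(nx.ego_graph(graph, drug2, radius=k))
        return graph.subgraph(nodes)

    sub = extract_subgraph(kg, "drug_A", "drug_B", k=1)
    print(sorted(sub.nodes()))   # the local KG context used for this drug pair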
91. Code to Comment "Translation": Data, Metrics, Baselining & Evaluation [PDF] 返回目录
David Gros, Hariharan Sezhiyan, Prem Devanbu, Zhou Yu
Abstract: The relationship of comments to code, and in particular, the task of generating useful comments given the code, has long been of interest. The earliest approaches have been based on strong syntactic theories of comment structures, and relied on textual templates. More recently, researchers have applied deep learning methods to this task, and specifically, trainable generative translation models which are known to work very well for natural language translation (e.g., from German to English). We carefully examine the underlying assumption here: that the task of generating comments sufficiently resembles the task of translating between natural languages, so that similar models and evaluation metrics could be used. We analyze several recent code-comment datasets for this task: CodeNN, DeepCom, FunCom, and DocString. We compare them with WMT19, a standard dataset frequently used to train state-of-the-art natural language translators. We found some interesting differences between the code-comment data and the WMT19 natural language data. Next, we describe and conduct some studies to calibrate BLEU (which is commonly used as a measure of comment quality), using "affinity pairs" of methods drawn from different projects, from the same project, from the same class, etc. Our study suggests that the current performance on some datasets might need to be improved substantially. We also argue that fairly naive information retrieval (IR) methods do well enough at this task to be considered a reasonable baseline. Finally, we make some suggestions on how our findings might be used in future research in this area.
摘要:注释代码的关系,特别是,鉴于生成的代码有用的意见的任务,长期以来一直感兴趣。最早的方法是基于对评论结构的强句法理论,并依靠文本模板。最近,研究人员已经应用深度学习方法这个任务,具体而言,这是众所周知的做工非常精良的自然语言转换(例如,从德语改为英语)训练的生成翻译模型。我们仔细审视这里的基本假设:即生成评论任务充分类似于自然语言,等等类似的模型和评价指标之间的转换,可以使用的任务。我们分析最近的几个代码注释的数据集此任务:CodeNN,DeepCom,Funcom公司,和文档字符串。我们与WMT19,标准数据集经常用来训练艺术自然语言翻译的状态进行比较。我们发现,代码注释数据和WMT19自然语言数据之间的一些有趣的差异。接下来,我们将介绍并进行了一些研究,以校正BLEU(这是常用的评论质量的度量)。使用方法“亲和对”,从不同的项目,在同一项目中,在同一类等;我们的研究表明,在一些数据集目前的表现可能需要大幅提高。我们还认为,相当幼稚的信息检索(IR)的方法做的不够好,在这个任务被认为是合理的基线。最后,我们对如何我们的研究结果可能在将来这方面的研究中使用的一些建议。
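The BLEU calibration with "affinity pairs" amounts to scoring the comment of one method against the comment of a related method drawn from the same class, the same project, or a different project. A small sketch with NLTK's sentence-level BLEU (the example comments are invented):

    # Sketch of BLEU over "affinity pairs": one method's comment as the candidate, a related
    # method's comment as the reference. Example comments are invented.
    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    smooth = SmoothingFunction().method1

    def pair_bleu(reference_comment, candidate_comment):
        ref = reference_comment.lower().split()
        hyp = candidate_comment.lower().split()
        return sentence_bleu([ref], hyp, smoothing_function=smooth)

    same_class = pair_bleu("returns the width of the bounding box",
                           "returns the height of the bounding box")
    cross_project = pair_bleu("returns the width of the bounding box",
                              "closes the database connection pool")
    print(same_class, cross_project)   # same-class pairs tend to score much higher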
注:中文为机器翻译结果!封面为论文标题词云图!