Table of Contents
1. Let's Stop Incorrect Comparisons in End-to-end Relation Extraction! [PDF] Abstract
2. AutoRC: Improving BERT Based Relation Classification Models via Architecture Search [PDF] Abstract
3. GRACE: Gradient Harmonized and Cascaded Labeling for Aspect-based Sentiment Analysis [PDF] Abstract
4. Context-theoretic Semantics for Natural Language: an Algebraic Framework [PDF] Abstract
5. SUMBT+LaRL: End-to-end Neural Task-oriented Dialog System with Reinforcement Learning [PDF] Abstract
6. CREDIT: Coarse-to-Fine Sequence Generation for Dialogue State Tracking [PDF] Abstract
7. Dual Learning for Dialogue State Tracking [PDF] Abstract
8. Logical foundations for hybrid type-logical grammars [PDF] Abstract
9. Global-to-Local Neural Networks for Document-Level Relation Extraction [PDF] Abstract
10. Structured Hierarchical Dialogue Policy with Graph Neural Networks [PDF] Abstract
11. Distributed Structured Actor-Critic Reinforcement Learning for Universal Dialogue Management [PDF] Abstract
12. Deep Reinforcement Learning for On-line Dialogue State Tracking [PDF] Abstract
13. PodSumm -- Podcast Audio Summarization [PDF] Abstract
14. Event Coreference Resolution via a Multi-loss Neural Network without Using Argument Information [PDF] Abstract
15. Towards Causal Explanation Detection with Pyramid Salient-Aware Network [PDF] Abstract
16. Constructing interval variables via faceted Rasch measurement and multitask deep learning: a hate speech application [PDF] Abstract
17. ALICE: Active Learning with Contrastive Natural Language Explanations [PDF] Abstract
18. An Empirical Study on Neural Keyphrase Generation [PDF] Abstract
19. The Persian Dependency Treebank Made Universal [PDF] Abstract
20. SSMBA: Self-Supervised Manifold Based Data Augmentation for Improving Out-of-Domain Robustness [PDF] Abstract
21. "When they say weed causes depression, but it's your fav antidepressant": Knowledge-aware Attention Framework for Relationship Extraction [PDF] Abstract
22. A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognition Baseline [PDF] Abstract

Abstracts
1. Let's Stop Incorrect Comparisons in End-to-end Relation Extraction! [PDF] Back to TOC
Bruno Taillé, Vincent Guigue, Geoffrey Scoutheeten, Patrick Gallinari
Abstract: Despite efforts to distinguish three different evaluation setups (Bekoulis et al., 2018), numerous end-to-end Relation Extraction (RE) articles present unreliable performance comparisons to previous work. In this paper, we first identify several patterns of invalid comparison in published papers and describe them to avoid their propagation. We then propose a small empirical study to quantify the impact of the most common mistake, and show that it leads to overestimating final RE performance by around 5% on ACE05. We also seize this opportunity to study the unexplored ablations of two recent developments: the use of language model pretraining (specifically BERT) and span-level NER. This meta-analysis emphasizes the need for rigor in reporting both the evaluation setting and the dataset statistics, and we call for a unified evaluation setting in end-to-end RE.
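The setups referenced above differ in what counts as a correct predicted relation. A minimal Python sketch of the two most commonly conflated settings from Bekoulis et al. (2018), assuming a relation is a tuple of head/tail spans, entity types, and a relation label (helper names are illustrative, not the paper's code):

def relation_matches(pred, gold, setting="strict"):
    # pred/gold: (head_span, head_type, tail_span, tail_type, rel_label)
    h_span, h_type, t_span, t_type, rel = pred
    gh_span, gh_type, gt_span, gt_type, grel = gold
    spans_ok = h_span == gh_span and t_span == gt_span
    types_ok = h_type == gh_type and t_type == gt_type
    if setting == "strict":       # spans AND entity types must both match
        return spans_ok and types_ok and rel == grel
    if setting == "boundaries":   # entity types are ignored
        return spans_ok and rel == grel
    raise ValueError(f"unknown setting: {setting}")

Scoring a system under "boundaries" while comparing against "strict" baselines is exactly the kind of mismatch the study quantifies (around 5% on ACE05).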
2. AutoRC: Improving BERT Based Relation Classification Models via Architecture Search [PDF] Back to TOC
Wei Zhu, Xiaoling Wang, Xipeng Qiu, Yuan Ni, Guotong Xie
Abstract: Although BERT-based relation classification (RC) models have achieved significant improvements over traditional deep learning models, no consensus seems to have been reached on the optimal architecture. First, there are multiple alternatives for entity span identification. Second, there is a collection of pooling operations for aggregating the representations of entities and contexts into fixed-length vectors. Third, it is difficult to decide manually which feature vectors, including their interactions, are beneficial for classifying the relation types. In this work, we design a comprehensive search space for BERT-based RC models and employ a neural architecture search (NAS) method to automatically discover the design choices mentioned above. Experiments on seven benchmark RC tasks show that our method is efficient and effective in finding better architectures than the baseline BERT-based RC model. An ablation study demonstrates the necessity of our search space design and the effectiveness of our search method.
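To make this concrete, here is a hypothetical sketch of such a search space; the dimensions and options below are illustrative guesses at the design choices the abstract lists (span identification, pooling, feature combination), not the paper's exact space, and plain random search stands in for the NAS controller:

import random

SEARCH_SPACE = {
    "entity_span_repr": ["entity_markers", "typed_markers", "span_boundaries"],
    "entity_pooling":   ["cls", "max", "mean", "start_token"],
    "context_pooling":  ["cls", "mean"],
    "feature_combo":    ["head;tail", "head;tail;context", "head;tail;head*tail"],
}

def sample_architecture(rng=random):
    # one candidate = one choice per design dimension
    return {dim: rng.choice(opts) for dim, opts in SEARCH_SPACE.items()}

candidate = sample_architecture()
# build and evaluate the RC model described by `candidate`, keep the best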
3. GRACE: Gradient Harmonized and Cascaded Labeling for Aspect-based Sentiment Analysis [PDF] Back to TOC
Huaishao Luo, Lei Ji, Tianrui Li, Nan Duan, Daxin Jiang
Abstract: In this paper, we focus on the imbalance issue, which is rarely studied in aspect term extraction and aspect sentiment classification when they are treated as sequence labeling tasks. Besides, previous works usually ignore the interaction between aspect terms when labeling polarities. We propose a GRadient hArmonized and CascadEd labeling model (GRACE) to solve these problems. Specifically, a cascaded labeling module is developed to enhance the interchange between aspect terms and improve the attention paid to sentiment tokens when labeling sentiment polarities. The polarity sequence is designed to depend on the generated aspect term labels. To alleviate the imbalance issue, we extend the gradient harmonized mechanism used in object detection to aspect-based sentiment analysis by adjusting the weight of each label dynamically. The proposed GRACE adopts a post-pretraining BERT as its backbone. Experimental results demonstrate that the proposed model achieves consistent improvements on multiple benchmark datasets and produces state-of-the-art results.
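The gradient harmonized mechanism adapted here originates in object detection (GHM; Li et al., 2019). A minimal PyTorch sketch of the re-weighting idea for binary tags, assuming g = |p - y| as the per-example gradient norm (a simplification of the full mechanism, not GRACE's exact implementation):

import torch

def ghm_weights(probs, targets, bins=10):
    # down-weight densely populated regions of gradient-norm space,
    # i.e. the large mass of easy examples and the extreme outliers
    g = (probs.detach() - targets).abs()
    edges = torch.linspace(0.0, 1.0 + 1e-6, bins + 1)
    weights = torch.zeros_like(g)
    n = g.numel()
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (g >= lo) & (g < hi)
        count = in_bin.sum()
        if count > 0:
            weights[in_bin] = n / count   # inverse gradient density
    return weights / weights.mean()       # keep the overall loss scale stable

GRACE applies this style of dynamic per-label weighting to the labeling losses rather than to detection anchors.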
4. Context-theoretic Semantics for Natural Language: an Algebraic Framework [PDF] Back to TOC
Daoud Clarke
Abstract: Techniques in which words are represented as vectors have proved useful in many applications in computational linguistics; however, there is currently no general semantic formalism for representing meaning in terms of vectors. We present a framework for natural language semantics in which words, phrases and sentences are all represented as vectors, based on a theoretical analysis which assumes that meaning is determined by context. In the theoretical analysis, we define a corpus model as a mathematical abstraction of a text corpus. The meaning of a string of words is assumed to be a vector representing the contexts it occurs in within the corpus model. Based on this assumption, we show that the vector representations of words can be considered as elements of an algebra over a field. We note that in applications of vector spaces to representing meanings of words there is an underlying lattice structure; we interpret the partial ordering of the lattice as describing entailment between meanings. We also define the context-theoretic probability of a string, and, based on this and the lattice structure, a degree of entailment between strings. Together these properties form guidelines as to how to construct semantic representations within the framework. A context theory is an implementation of the framework; in an implementation, strings are represented as vectors with the properties deduced from the theoretical analysis. We show how to incorporate logical semantics into context theories; this enables us to represent statistical information about uncertainty by taking weighted sums of individual representations. We also use the framework to analyse approaches to the task of recognising textual entailment, to ontological representations of meaning and to representing syntactic structure. For the latter, we give new algebraic descriptions of link grammar.
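To make the lattice reading concrete, here is a hedged formal sketch in standard lattice notation; the graded formula is one natural instantiation consistent with the abstract, not necessarily the paper's exact definition:

\[ x \sqsubseteq y \iff x \wedge y = x \qquad \text{(entailment as the lattice partial order)} \]
\[ \mathrm{Ent}(x \Rightarrow y) = \frac{p(x \wedge y)}{p(x)} \qquad \text{(a graded degree of entailment from the context-theoretic probability } p\text{)} \]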
5. SUMBT+LaRL: End-to-end Neural Task-oriented Dialog System with Reinforcement Learning [PDF] Back to TOC
Hwaran Lee, Seokhwan Jo, HyungJun Kim, Sangkeun Jung, Tae-Yoon Kim
Abstract: Neural approaches for developing each dialog component in task-oriented dialog systems have advanced greatly in recent years, yet optimizing overall system performance remains a challenge. In this paper, we propose an end-to-end trainable neural dialog system with reinforcement learning, named SUMBT+LaRL. The SUMBT+ estimates user acts as well as dialog belief states, and the LaRL models latent system action spaces and generates responses given the estimated contexts. We experimentally demonstrate that a training framework in which the SUMBT+ and LaRL are separately pretrained and then the entire system is fine-tuned significantly increases dialog success rates. We propose new success criteria for reinforcement learning for the end-to-end dialog system, and provide an experimental analysis of how results differ depending on the success criteria and evaluation methods. Consequently, our model achieved a new state-of-the-art success rate of 85.4% on corpus-based evaluation, and a comparable success rate of 81.40% on the simulator-based evaluation provided by the DSTC8 challenge.
6. CREDIT: Coarse-to-Fine Sequence Generation for Dialogue State Tracking [PDF] Back to TOC
Zhi Chen, Lu Chen, Zihan Xu, Yanbin Zhao, Su Zhu, Kai Yu
Abstract: In dialogue systems, a dialogue state tracker aims to accurately find a compact representation of the current dialogue status, based on the entire dialogue history. While previous approaches often define dialogue states as a combination of separate triples (domain-slot-value), in this paper, we employ a structured state representation and cast dialogue state tracking as a sequence generation problem. Based on this new formulation, we propose a CoaRsE-to-fine DIalogue state Tracking (CREDIT) approach. Taking advantage of the structured state representation, which is a marked language sequence, we can further fine-tune the pre-trained model (by supervised learning) by optimizing natural language metrics with the policy gradient method. Like all generative state tracking methods, CREDIT does not rely on a pre-defined dialogue ontology enumerating all possible slot values. Experiments demonstrate our tracker achieves encouraging joint goal accuracy for the five domains in the MultiWOZ 2.0 and MultiWOZ 2.1 datasets.
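A minimal sketch of the policy-gradient fine-tuning step described above, with assumed helper names (model.sample, model.greedy, and sequence_score are placeholders, not the paper's API; the self-critical baseline is an added variance-reduction assumption). The reward is a sequence-level metric comparing the generated state string with the gold state:

def policy_gradient_step(model, dialogue_history, gold_state, optimizer):
    sampled_state, log_prob = model.sample(dialogue_history)   # stochastic decode
    reward = sequence_score(sampled_state, gold_state)         # e.g. slot-value F1
    baseline = sequence_score(model.greedy(dialogue_history), gold_state)
    loss = -(reward - baseline) * log_prob                     # REINFORCE with baseline
    optimizer.zero_grad(); loss.backward(); optimizer.step()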
7. Dual Learning for Dialogue State Tracking [PDF] Back to TOC
Zhi Chen, Lu Chen, Yanbin Zhao, Su Zhu, Kai Yu
Abstract: In task-oriented multi-turn dialogue systems, dialogue state refers to a compact representation of the user goal in the context of the dialogue history. Dialogue state tracking (DST) is to estimate the dialogue state at each turn. Due to the dependency on complicated dialogue history contexts, DST data annotation is more expensive than single-sentence language understanding, which makes the task more challenging. In this work, we formulate DST as a sequence generation problem and propose a novel dual-learning framework to make full use of unlabeled data. In the dual-learning framework, there are two agents: the primal tracker agent (utterance-to-state generator) and the dual utterance generator agent (state-to-utterance generator). Compared with the traditional supervised learning framework, dual learning can iteratively update both agents through the reconstruction error and reward signal, respectively, without labeled data. The reward sparsity problem is hard to solve in previous DST methods; in this work, the reformulation of DST as a sequence generation model effectively alleviates this problem. We call this primal tracker agent dual-DST. Experimental results on the MultiWOZ2.1 dataset show that the proposed dual-DST works very well, especially when labelled data is limited. It achieves performance comparable to the system where labeled data is fully used.
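A schematic sketch of one dual-learning update as described; all names are placeholders and the reconstruction reward (a generic similarity here) is an assumption. The primal tracker maps an utterance to a state, the dual generator maps the state back, and the reconstruction quality rewards both agents without any labeled state:

def dual_learning_step(tracker, generator, utterance, optimizer):
    state, logp_state = tracker.sample(utterance)    # primal: utterance -> state
    recon, logp_recon = generator.sample(state)      # dual: state -> utterance
    reward = similarity(recon, utterance)            # reconstruction signal, e.g. BLEU
    loss = -reward * (logp_state + logp_recon)       # REINFORCE-style joint update
    optimizer.zero_grad(); loss.backward(); optimizer.step()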
8. Logical foundations for hybrid type-logical grammars [PDF] Back to TOC
Richard Moot, Symon Stevens-Guille
Abstract: This paper explores proof-theoretic aspects of hybrid type-logical grammars, a logic combining Lambek grammars with lambda grammars. We prove some basic properties of the calculus, such as normalisation and the subformula property, and also present both a sequent calculus and a proof net calculus for hybrid type-logical grammars. In addition to clarifying the logical foundations of hybrid type-logical grammars, the current study opens the way to variants and extensions of the original system, including but not limited to a non-associative version and a multimodal version incorporating structural rules and unary modes.
9. Global-to-Local Neural Networks for Document-Level Relation Extraction [PDF] Back to TOC
Difeng Wang, Wei Hu, Ermei Cao, Weijian Sun
Abstract: Relation extraction (RE) aims to identify the semantic relations between named entities in text. Recent years have seen the task raised to the document level, which requires complex reasoning over entities and mentions throughout an entire document. In this paper, we propose a novel model for document-level RE that encodes document information in terms of entity global and local representations as well as context relation representations. Entity global representations model the semantic information of all entities in the document, entity local representations aggregate the contextual information of multiple mentions of specific entities, and context relation representations encode the topic information of other relations. Experimental results demonstrate that our model achieves superior performance on two public datasets for document-level RE. It is particularly effective in extracting relations between entities that are far apart and have multiple mentions.
10. Structured Hierarchical Dialogue Policy with Graph Neural Networks [PDF] Back to TOC
Zhi Chen, Xiaoyuan Liu, Lu Chen, Kai Yu
Abstract: Dialogue policy training for composite tasks, such as restaurant reservation in multiple places, is a practically important and challenging problem. Recently, hierarchical deep reinforcement learning (HDRL) methods have achieved good performance in composite tasks. However, in vanilla HDRL, both the top-level and low-level policies are represented by multi-layer perceptrons (MLPs) which take the concatenation of all observations from the environment as the input for predicting actions. Thus, the traditional HDRL approach often suffers from low sampling efficiency and poor transferability. In this paper, we address these problems by utilizing the flexibility of graph neural networks (GNNs). A novel ComNet is proposed to model the structure of a hierarchical agent. The performance of ComNet is tested on the composite tasks of the PyDial benchmark. Experiments show that ComNet outperforms vanilla HDRL systems, with performance close to the upper bound. It not only achieves sample efficiency but is also more robust to noise while maintaining transferability to other composite tasks.
11. Distributed Structured Actor-Critic Reinforcement Learning for Universal Dialogue Management [PDF] Back to TOC
Zhi Chen, Lu Chen, Xiaoyuan Liu, Kai Yu
Abstract: A task-oriented spoken dialogue system (SDS) aims to assist a human user in accomplishing a specific task (e.g., hotel booking). Dialogue management is a core part of an SDS. There are two main missions in dialogue management: dialogue belief state tracking (summarising the conversation history) and dialogue decision-making (deciding how to reply to the user). In this work, we only focus on devising a policy that chooses which dialogue action to use in responding to the user. The sequential system decision-making process can be abstracted into a partially observable Markov decision process (POMDP). Under this framework, reinforcement learning approaches can be used for automated policy optimization. In the past few years, many deep reinforcement learning (DRL) algorithms, which use neural networks (NNs) as function approximators, have been investigated for dialogue policy.
12. Deep Reinforcement Learning for On-line Dialogue State Tracking [PDF] Back to TOC
Zhi Chen, Lu Chen, Xiang Zhou, Kai Yu
Abstract: Dialogue state tracking (DST) is a crucial module in dialogue management. It is usually cast as a supervised training problem, which is not convenient for on-line optimization. In this paper, a novel companion-teaching-based deep reinforcement learning (DRL) framework for on-line DST optimization is proposed. To the best of our knowledge, this is the first effort to optimize the DST module within a DRL framework for on-line task-oriented spoken dialogue systems. In addition, the dialogue policy can be further jointly updated. Experiments show that on-line DST optimization can effectively improve dialogue manager performance while keeping the flexibility of using a predefined policy. Joint training of both DST and policy can further improve performance.
13. PodSumm -- Podcast Audio Summarization [PDF] Back to TOC
Aneesh Vartakavi, Amanmeet Garg
Abstract: The diverse nature, scale, and specificity of podcasts present a unique challenge to content discovery systems. Listeners often rely on text descriptions of episodes provided by the podcast creators to discover new content. Some factors like the presentation style of the narrator and production quality are significant indicators of subjective user preference but are difficult to quantify and not reflected in the text descriptions provided by the podcast creators. We propose the automated creation of podcast audio summaries to aid in content discovery and help listeners to quickly preview podcast content before investing time in listening to an entire episode. In this paper, we present a method to automatically construct a podcast summary via guidance from the text-domain. Our method performs two key steps, namely, audio to text transcription and text summary generation. Motivated by a lack of datasets for this task, we curate an internal dataset, find an effective scheme for data augmentation, and design a protocol to gather summaries from annotators. We fine-tune a PreSumm[10] model with our augmented dataset and perform an ablation study. Our method achieves ROUGE-F(1/2/L) scores of 0.63/0.53/0.63 on our dataset. We hope these results may inspire future research in this direction.
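A schematic sketch of the two-step pipeline (transcription, then extractive summary selection), with the audio preview assembled from the selected segments; every helper below is a placeholder, not the authors' implementation:

def podsumm(audio, k=5):
    segments = transcribe(audio)                  # ASR -> [(start_s, end_s, text), ...]
    texts = [text for _, _, text in segments]
    picked = set(extractive_summarize(texts, k))  # PreSumm-style sentence selection
    clips = [(s, e) for s, e, text in segments if text in picked]
    return concatenate_audio(audio, clips)        # stitch selected spans into a preview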
14. Event Coreference Resolution via a Multi-loss Neural Network without Using Argument Information [PDF] Back to TOC
Xinyu Zuo, Yubo Chen, Kang Liu, Jun Zhao
Abstract: Event coreference resolution (ECR) is an important task in Natural Language Processing (NLP), and nearly all existing approaches to this task rely on event argument information. However, these methods tend to suffer from error propagation from the event argument extraction stage. Besides, not every event mention contains all arguments of an event, and argument information may confuse a model that expects events to have arguments when detecting event coreference in real text. Furthermore, the context information of an event is useful for inferring the coreference between events. Thus, in order to reduce the errors propagated from event argument extraction and to use context information effectively, we propose a multi-loss neural network model that does not need any argument information to perform the within-document event coreference resolution task, and it achieves significantly better performance than state-of-the-art methods.
15. Towards Causal Explanation Detection with Pyramid Salient-Aware Network [PDF] Back to TOC
Xinyu Zuo, Yubo Chen, Kang Liu, Jun Zhao
Abstract: Causal explanation analysis (CEA) can assist us in understanding the reasons behind daily events, which has been found very helpful for understanding the coherence of messages. In this paper, we focus on Causal Explanation Detection, an important subtask of causal explanation analysis, which determines whether a causal explanation exists in one message. We design a Pyramid Salient-Aware Network (PSAN) to detect causal explanations in messages. PSAN can assist in causal explanation detection by capturing the salient semantics of discourses contained in their keywords with a bottom graph-based word-level salient network. Furthermore, PSAN can modify the dominance of discourses via a top attention-based discourse-level salient network to enhance the explanatory semantics of messages. Experiments on the commonly used CEA dataset show that PSAN outperforms the state-of-the-art method by 1.8% F1 on the Causal Explanation Detection task.
16. Constructing interval variables via faceted Rasch measurement and multitask deep learning: a hate speech application [PDF] Back to TOC
Chris J. Kennedy, Geoff Bacon, Alexander Sahn, Claudia von Vacano
Abstract: We propose a general method for measuring complex variables on a continuous, interval spectrum by combining supervised deep learning with the Constructing Measures approach to faceted Rasch item response theory (IRT). We decompose the target construct, hate speech in our case, into multiple constituent components that are labeled as ordinal survey items. Those survey responses are transformed via IRT into a debiased, continuous outcome measure. Our method estimates the survey interpretation bias of the human labelers and eliminates that influence on the generated continuous measure. We further estimate the response quality of each labeler using faceted IRT, allowing responses from low-quality labelers to be removed. Our faceted Rasch scaling procedure integrates naturally with a multitask deep learning architecture for automated prediction on new data. The ratings on the theorized components of the target outcome are used as supervised, ordinal variables for the neural networks' internal concept learning. We test the use of an activation function (ordinal softmax) and loss function (ordinal cross-entropy) designed to exploit the structure of ordinal outcome variables. Our multitask architecture leads to a new form of model interpretation because each continuous prediction can be directly explained by the constituent components in the penultimate layer. We demonstrate this new method on a dataset of 50,000 social media comments sourced from YouTube, Twitter, and Reddit and labeled by 11,000 U.S.-based Amazon Mechanical Turk workers to measure a continuous spectrum from hate speech to counterspeech. We evaluate Universal Sentence Encoders, BERT, and RoBERTa as language representation models for the comment text, and compare our predictive accuracy to Google Jigsaw's Perspective API models, showing significant improvement over this standard benchmark.
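The ordinal loss mentioned above can be sketched via the common cumulative decomposition of an ordinal target; this CORAL-style formulation is an assumption for illustration, and the paper's exact "ordinal cross-entropy" may differ in detail. A K-level item becomes K-1 ordered binary targets P(y > k):

import torch
import torch.nn.functional as F

def ordinal_cross_entropy(cum_logits, y, num_levels):
    # cum_logits: (batch, K-1) logits for P(y > k); y: (batch,) levels in [0, K-1]
    levels = torch.arange(num_levels - 1, device=y.device)
    cum_targets = (y.unsqueeze(1) > levels).float()   # [y > 0, y > 1, ..., y > K-2]
    return F.binary_cross_entropy_with_logits(cum_logits, cum_targets)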
17. ALICE: Active Learning with Contrastive Natural Language Explanations [PDF] Back to TOC
Weixin Liang, James Zou, Zhou Yu
Abstract: Training a supervised neural network classifier typically requires many annotated training samples. Collecting and annotating a large number of data points is costly and sometimes even infeasible. The traditional annotation process uses a low-bandwidth human-machine communication interface: classification labels, each of which only provides a few bits of information. We propose Active Learning with Contrastive Explanations (ALICE), an expert-in-the-loop training framework that utilizes contrastive natural language explanations to improve data efficiency in learning. ALICE learns to first use active learning to select the most informative pairs of label classes to elicit contrastive natural language explanations from experts. Then it extracts knowledge from these explanations using a semantic parser. Finally, it incorporates the extracted knowledge by dynamically changing the learning model's structure. We applied ALICE to two visual recognition tasks, bird species classification and social relationship classification. We found that, by incorporating contrastive explanations, our models outperform baseline models trained with 40-100% more training data. We found that adding one explanation leads to a performance gain similar to adding 13-30 labeled training data points.
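The first ALICE step, selecting the most informative pairs of label classes, can be sketched with a simple confusion-based heuristic; the ranking criterion here is an assumption, and the paper may use a different informativeness measure:

import numpy as np

def most_confused_pairs(confusion, top_k=3):
    # confusion[i, j]: held-out count of class-i examples predicted as class j
    sym = confusion + confusion.T                  # symmetrize pairwise confusion
    np.fill_diagonal(sym, 0)
    rows, cols = np.triu_indices_from(sym, k=1)
    order = np.argsort(-sym[rows, cols])
    return [(int(rows[o]), int(cols[o])) for o in order[:top_k]]

The returned class pairs would then be shown to experts to elicit contrastive explanations.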
18. An Empirical Study on Neural Keyphrase Generation [PDF] Back to TOC
Rui Meng, Xingdi Yuan, Tong Wang, Sanqiang Zhao, Adam Trischler, Daqing He
Abstract: Recent years have seen a flourishing of neural keyphrase generation work, including the release of several large-scale datasets and a host of new models to tackle them. Model performance on keyphrase generation tasks has increased significantly with evolving deep learning research. However, there is no comprehensive comparison among models, nor an investigation of related factors (e.g., architectural choice, decoding strategy) that may affect a keyphrase generation system's performance. In this empirical study, we aim to fill this gap by providing extensive experimental results and analyzing the most crucial factors impacting the performance of keyphrase generation models. We hope this study can help clarify some of the uncertainties surrounding the keyphrase generation task and facilitate future research on this topic.
19. The Persian Dependency Treebank Made Universal [PDF] Back to TOC
Mohammad Sadegh Rasooli, Pegah Safari, Amirsaeid Moloodi, Alireza Nourian
Abstract: We describe an automatic method for converting the Persian Dependency Treebank (Rasooli et al., 2013) to Universal Dependencies. This treebank contains 29107 sentences. Our experiments along with manual linguistic analysis show that our data is more compatible with Universal Dependencies than the Uppsala Persian Universal Dependency Treebank (Seraji et al., 2016), and is larger in size and more diverse in vocabulary. Our data yields a labeled attachment F-score of 85.2 in supervised parsing. Our delexicalized Persian-to-English parser transfer experiments show that a parsing model trained on our data is ~2% more accurate in absolute terms than that of Seraji et al. (2016) in terms of labeled attachment score.
摘要:我们描述的自动方法,用于将波斯依赖树库(Rasooli等人,2013),以通用的依赖关系。该树库包含29107个句子。我们与人工语言分析表明,我们的数据是具有比乌普萨拉波斯通用依赖树库通用依赖性更兼容沿着实验(Seraji等,2016),并且是尺寸较大,在词汇更加多样化。在85.2的监督解析标记的附件F-得分我们的数据带来的。我们的波斯虚化到英语语法分析器转移实验表明,经过训练我们的数据解析模型〜2%,绝对比Seraji等人的更准确。 (2016)在标记附着分数方面。
20. SSMBA: Self-Supervised Manifold Based Data Augmentation for Improving Out-of-Domain Robustness [PDF] [Back to contents]
Nathan Ng, Kyunghyun Cho, Marzyeh Ghassemi
Abstract: Models that perform well on a training domain often fail to generalize to out-of-domain (OOD) examples. Data augmentation is a common method used to prevent overfitting and improve OOD generalization. However, in natural language, it is difficult to generate new examples that stay on the underlying data manifold. We introduce SSMBA, a data augmentation method for generating synthetic training examples by using a pair of corruption and reconstruction functions to move randomly on a data manifold. We investigate the use of SSMBA in the natural language domain, leveraging the manifold assumption to reconstruct corrupted text with masked language models. In experiments on robustness benchmarks across 3 tasks and 9 datasets, SSMBA consistently outperforms existing data augmentation methods and baseline models on both in-domain and OOD data, achieving gains of 0.8% accuracy on OOD Amazon reviews, 1.8% accuracy on OOD MNLI, and 1.4 BLEU on in-domain IWSLT14 German-English.
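A minimal sketch of SSMBA's corrupt-and-reconstruct loop, assuming the HuggingFace transformers fill-mask pipeline and bert-base-uncased (illustrative choices, not the authors' released implementation; the paper samples from the masked LM, whereas this sketch greedily takes the top prediction):

```python
import random
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def ssmba_augment(sentence: str, mask_prob: float = 0.15) -> str:
    """Corrupt: randomly mask words. Reconstruct: let the masked LM fill
    each blank, nudging the example along the data manifold."""
    words = sentence.split()
    for i in range(len(words)):
        if random.random() < mask_prob:
            masked = words.copy()
            masked[i] = fill_mask.tokenizer.mask_token   # "[MASK]" for BERT
            # One mask at a time keeps the pipeline output simple.
            words[i] = fill_mask(" ".join(masked))[0]["token_str"]
    return " ".join(words)

print(ssmba_augment("the movie was surprisingly good and well acted"))
```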
21. "When they say weed causes depression, but it's your fav antidepressant": Knowledge-aware Attention Framework for Relationship Extraction [PDF] 返回目录
Shweta Yadav, Usha Lokala, Raminta Daniulaityte, Krishnaprasad Thirunarayan, Francois Lamy, Amit Sheth
Abstract: With the increasing legalization of medical and recreational use of cannabis, more research is needed to understand the association between depression and consumer behavior related to cannabis consumption. Big social media data has potential to provide deeper insights about these associations to public health analysts. In this interdisciplinary study, we demonstrate the value of incorporating domain-specific knowledge in the learning process to identify the relationships between cannabis use and depression. We develop an end-to-end knowledge-infused deep learning framework (Gated-K-BERT) that leverages the pre-trained BERT language representation model and a domain-specific declarative knowledge source (the Drug Abuse Ontology (DAO)) to jointly extract entities and their relationship using a gated fusion sharing mechanism. Our model is further tailored to focus on the entities mentioned in the sentence through an entity-position-aware attention layer, where the ontology is used to locate the target entities' positions. Experimental results show that including the knowledge-aware attentive representation in association with BERT can extract the cannabis-depression relationship with better coverage in comparison to the state-of-the-art relation extractor.
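The abstract does not spell out the gated fusion equations; one plausible reading, sketched below with assumed dimensions (this is not the authors' code), is a learned sigmoid gate that blends BERT's contextual vector with a knowledge embedding derived from the DAO:

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """g in (0, 1) decides, per dimension, how much knowledge to let in."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, bert_repr, kb_repr):
        g = torch.sigmoid(self.gate(torch.cat([bert_repr, kb_repr], dim=-1)))
        return g * bert_repr + (1.0 - g) * kb_repr

fusion = GatedFusion(hidden_dim=768)
h = torch.randn(2, 768)    # [batch, hidden] BERT vectors
k = torch.randn(2, 768)    # matching DAO knowledge embeddings
print(fusion(h, k).shape)  # torch.Size([2, 768])
```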
22. A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognition Baseline [PDF] [Back to contents]
Yerbolat Khassanov, Saida Mussakhojayeva, Almas Mirzakhmetov, Alen Adiyev, Mukhamet Nurpeiissov, Huseyin Atakan Varol
Abstract: We present an open-source speech corpus for the Kazakh language. The Kazakh speech corpus (KSC) contains around 335 hours of transcribed audio comprising over 154,000 utterances spoken by participants from different regions, age groups, and genders. It was carefully inspected by native Kazakh speakers to ensure high quality. The KSC is the largest publicly available database developed to advance various Kazakh speech and language processing applications. In this paper, we first describe the data collection and preprocessing procedures, followed by a description of the database specifications. We also share our experience and the challenges faced during database construction. To demonstrate the reliability of the database, we performed preliminary speech recognition experiments. The experimental results imply that the quality of the audio and transcripts is promising. To enable experiment reproducibility and ease corpus usage, we also released the ESPnet recipe.
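ESPnet recipes typically build Kaldi-style data directories; assuming a standard utt2dur file (one "<utterance-id> <seconds>" pair per line; the path below is hypothetical), the reported corpus size can be sanity-checked like this:

```python
def corpus_stats(utt2dur_path: str):
    """Count utterances and total hours from a Kaldi-style utt2dur file."""
    n_utts, total_sec = 0, 0.0
    with open(utt2dur_path) as f:
        for line in f:
            _utt_id, dur = line.split()
            n_utts += 1
            total_sec += float(dur)
    return n_utts, total_sec / 3600.0

# n, hours = corpus_stats("data/train/utt2dur")   # hypothetical path
# print(n, hours)  # expect roughly 154k utterances, ~335 hours for KSC
```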
23. End-to-End Speech Recognition and Disfluency Removal [PDF] [Back to contents]
Paria Jamshid Lou, Mark Johnson
Abstract: Disfluency detection is usually an intermediate step between an automatic speech recognition (ASR) system and a downstream task. By contrast, this paper aims to investigate the task of end-to-end speech recognition and disfluency removal. We specifically explore whether it is possible to train an ASR model to directly map disfluent speech into fluent transcripts, without relying on a separate disfluency detection model. We show that end-to-end models do learn to directly generate fluent transcripts; however, their performance is slightly worse than a baseline pipeline approach consisting of an ASR system and a disfluency detection model. We also propose two new metrics that can be used for evaluating integrated ASR and disfluency models. The findings of this paper can serve as a benchmark for further research on the task of end-to-end speech recognition and disfluency removal in the future.
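The two metrics the paper proposes are not specified in the abstract. As a baseline illustration of how an integrated system can be scored, the sketch below computes word error rate against the fluent reference, i.e. the transcript with disfluencies already removed:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over words, normalized by
    the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Fluent reference vs. an end-to-end output that kept one disfluency:
print(wer("i want a flight to boston", "i want a um flight to boston"))
# 0.1667 -- one spurious insertion over a six-word reference
```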
24. CodeBLEU: a Method for Automatic Evaluation of Code Synthesis [PDF] [Back to contents]
Shuo Ren, Daya Guo, Shuai Lu, Long Zhou, Shujie Liu, Duyu Tang, Ming Zhou, Ambrosio Blanco, Shuai Ma
Abstract: Evaluation metrics play a vital role in the growth of an area as they define the standard of distinguishing between good and bad models. In the area of code synthesis, the commonly used evaluation metrics are BLEU and perfect accuracy, but they are not suitable enough for evaluating code: BLEU was originally designed to evaluate natural language and neglects important syntactic and semantic features of code, while perfect accuracy is too strict and thus underestimates different outputs with the same semantic logic. To remedy this, we introduce a new automatic evaluation metric, dubbed CodeBLEU. It absorbs the strength of BLEU in the n-gram match and further injects code syntax via abstract syntax trees (AST) and code semantics via data-flow. We conduct experiments by evaluating the correlation coefficient between CodeBLEU and quality scores assigned by programmers on three code synthesis tasks, i.e., text-to-code, code translation, and code refinement. Experimental results show that our proposed CodeBLEU can achieve a better correlation with programmer-assigned scores compared with BLEU and accuracy.
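The abstract names the ingredients (n-gram match via BLEU, syntax via ASTs, semantics via data-flow); a natural reading is a weighted combination of component scores. The sketch below stubs out the components and uses illustrative equal weights, which are assumptions rather than the paper's tuned values (the full paper also includes a keyword-weighted n-gram match, represented here as weighted_bleu):

```python
def code_bleu(bleu, weighted_bleu, ast_match, dataflow_match,
              weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted combination of four component scores, each in [0, 1]."""
    a, b, c, d = weights
    return a * bleu + b * weighted_bleu + c * ast_match + d * dataflow_match

# A candidate with perfect n-gram overlap but a partly mismatched AST is
# no longer scored as perfect, unlike under plain BLEU:
print(code_bleu(bleu=1.0, weighted_bleu=1.0, ast_match=0.6, dataflow_match=0.8))
# 0.85
```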
25. SQuARE: Semantics-based Question Answering and Reasoning Engine [PDF] [Back to contents]
Kinjal Basu, Sarat Chandra Varanasi, Farhad Shakerin, Gopal Gupta
Abstract: Understanding the meaning of a text is a fundamental challenge of natural language understanding (NLU), and from its early days the field has received significant attention through question answering (QA) tasks. We introduce a general semantics-based framework for natural language QA and also describe the SQuARE system, an application of this framework. The framework is based on the denotational semantics approach widely used in programming language research. In our framework, a valuation function maps the syntax tree of the text to its commonsense meaning, represented using basic knowledge primitives (the semantic algebra) coded using answer set programming (ASP). We illustrate an application of this framework by using VerbNet primitives as our semantic algebra and a novel algorithm based on partial tree matching that generates an answer set program representing the knowledge in the text. A question posed against that text is converted into an ASP query using the same framework and executed using the s(CASP) goal-directed ASP system. Our approach is based purely on (commonsense) reasoning. SQuARE achieves 100% accuracy on all five bAbI QA datasets that we have tested. The significance of our work is that, unlike other machine learning based approaches, ours is based on "understanding" the text and does not require any training. SQuARE can also generate an explanation for an answer while maintaining high accuracy.
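SQuARE's actual pipeline emits an answer set program and queries it with s(CASP); since reproducing ASP source is beyond this digest, a toy Python stand-in conveys the flavor: facts extracted from a bAbI-style story plus one commonsense rule answer a "where" question with no training involved (the story, facts, and rule here are invented for illustration):

```python
# Facts a semantic parser might extract from a two-sentence story.
facts = {("in", "john", "kitchen"),      # "John went to the kitchen."
         ("has", "john", "apple")}       # "John picked up the apple."

def where_is(obj):
    """Rule: if a person holds an object, the object is where the person is."""
    for rel, person, o in facts:
        if rel == "has" and o == obj:
            for rel2, p2, place in facts:
                if rel2 == "in" and p2 == person:
                    return place
    return None

print(where_is("apple"))  # kitchen
```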
26. A Finitist's Manifesto: Do we need to Reformulate the Foundations of Mathematics? [PDF] [Back to contents]
Jonathan Lenchner
Abstract: There is a problem with the foundations of classical mathematics, and potentially even with the foundations of computer science, that mathematicians have by-and-large ignored. This essay is a call for practicing mathematicians who have been sleep-walking in their infinitary mathematical paradise to take heed. Much of mathematics relies upon either (i) the "existence" of objects that contain an infinite number of elements, (ii) our ability, "in theory", to compute with an arbitrary level of precision, or (iii) our ability, "in theory", to compute for an arbitrarily large number of time steps. All of calculus relies on the notion of a limit. The monumental results of real and complex analysis rely on a seamless notion of the "continuum" of real numbers, which extends in the plane to the complex numbers and gives us, among other things, "rigorous" definitions of continuity, the derivative, various different integrals, as well as the fundamental theorems of calculus and of algebra - the former of which says that the derivative and integral can be viewed as inverse operations, and the latter of which says that every polynomial over $\mathbb{C}$ has a complex root. This essay is an inquiry into whether there is any way to assign meaning to the notions of "existence" and "in theory" in (i) to (iii) above.
Note: the cover image is a word cloud of the paper titles!