
[arXiv Papers] Computation and Language 2020-09-25

Contents

1. Novel Keyword Extraction and Language Detection Approaches [PDF] Abstract
2. Adapting BERT for Word Sense Disambiguation with Gloss Selection Objective and Example Sentences [PDF] Abstract
3. Generating Commonsense Explanation by Extracting Bridge Concepts from Reasoning Paths [PDF] Abstract
4. Language Generation with Multi-Hop Reasoning on Commonsense Knowledge Graph [PDF] Abstract
5. N-LTP: A Open-source Neural Chinese Language Technology Platform with Pretrained Models [PDF] Abstract
6. Feature Adaptation of Pre-Trained Language Models across Languages and Domains for Text Classification [PDF] Abstract
7. Grounded Compositional Outputs for Adaptive Language Modeling [PDF] Abstract
8. Ape210K: A Large-Scale and Template-Rich Dataset of Math Word Problems [PDF] Abstract
9. AnchiBERT: A Pre-Trained Model for Ancient Chinese Language Understanding and Generation [PDF] Abstract
10. RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models [PDF] Abstract
11. Task-Oriented Dialogue as Dataflow Synthesis [PDF] Abstract
12. Multi-Pass Transformer for Machine Translation [PDF] Abstract
13. ConvAI3: Generating Clarifying Questions for Open-Domain Dialogue Systems (ClariQ) [PDF] Abstract
14. The importance of fillers for text representations of speech transcripts [PDF] Abstract
15. Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining [PDF] Abstract
16. AliMe KG: Domain Knowledge Graph Construction and Application in E-commerce [PDF] Abstract
17. CogniFNN: A Fuzzy Neural Network Framework for Cognitive Word Embedding Evaluation [PDF] Abstract
18. Effects of Word-frequency based Pre- and Post- Processings for Audio Captioning [PDF] Abstract
19. Structure Aware Negative Sampling in Knowledge Graphs [PDF] Abstract

Abstracts

1. Novel Keyword Extraction and Language Detection Approaches [PDF] Back to contents
  Malgorzata Pikies, Andronicus Riyono, Junade Ali
Abstract: Fuzzy string matching and language classification are important tools in Natural Language Processing pipelines; this paper provides advances in both areas. We propose a fast novel approach to string tokenisation for fuzzy language matching and experimentally demonstrate an 83.6% decrease in processing time with an estimated improvement in recall of 3.1% at the cost of a 2.6% decrease in precision. This approach is able to work even where keywords are subdivided into multiple words, without needing to scan character-to-character. So far there has been little work on using metadata to enhance language classification algorithms. We provide observational data and find the Accept-Language header is 14% more likely to match the classification than the IP address.
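
As a small illustration of the header-based signal (this is not the authors' system, and the example header is made up), an Accept-Language request header can be parsed into a ranked language prior as follows:

    def parse_accept_language(header):
        """Parse e.g. 'en-GB,en;q=0.9,pl;q=0.8' into (language, weight) pairs."""
        langs = []
        for part in header.split(","):
            pieces = part.strip().split(";q=")
            lang = pieces[0].split("-")[0].lower()       # drop the region subtag
            weight = float(pieces[1]) if len(pieces) > 1 else 1.0
            langs.append((lang, weight))
        return sorted(langs, key=lambda lw: -lw[1])

    print(parse_accept_language("en-GB,en;q=0.9,pl;q=0.8"))
    # [('en', 1.0), ('en', 0.9), ('pl', 0.8)]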

2. Adapting BERT for Word Sense Disambiguation with Gloss Selection Objective and Example Sentences [PDF] Back to contents
  Boon Peng Yap, Andrew Koh Jin Jie, Eng Siong Chng
Abstract: Domain adaptation or transfer learning using pre-trained language models such as BERT has proven to be an effective approach for many natural language processing tasks. In this work, we propose to formulate word sense disambiguation as a relevance ranking task, and fine-tune BERT on sequence-pair ranking task to select the most probable sense definition given a context sentence and a list of candidate sense definitions. We also introduce a data augmentation technique for WSD using existing example sentences from WordNet. Using the proposed training objective and data augmentation technique, our models are able to achieve state-of-the-art results on the English all-words benchmark datasets.
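
A minimal sketch of the gloss-selection formulation using Hugging Face Transformers. The checkpoint name, the context sentence and the gloss wordings are placeholders, and the untuned classification head below would still need the paper's sequence-pair ranking fine-tuning to produce meaningful scores:

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased",
                                                               num_labels=1)
    model.eval()

    context = "He sat on the bank and watched the river."
    glosses = [                                   # candidate sense definitions
        "sloping land beside a body of water",
        "a financial institution that accepts deposits",
    ]

    # Score each (context, gloss) pair with the cross-encoder and pick the best sense.
    with torch.no_grad():
        scores = [model(**tokenizer(context, g, return_tensors="pt",
                                    truncation=True)).logits.item()
                  for g in glosses]
    print("predicted sense:", glosses[max(range(len(glosses)), key=scores.__getitem__)])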

3. Generating Commonsense Explanation by Extracting Bridge Concepts from Reasoning Paths [PDF] Back to contents
  Haozhe Ji, Pei Ke, Shaohan Huang, Furu Wei, Minlie Huang
Abstract: Commonsense explanation generation aims to empower the machine's sense-making capability by generating plausible explanations to statements against commonsense. While this task is easy for humans, the machine still struggles to generate reasonable and informative explanations. In this work, we propose a method that first extracts the underlying concepts which serve as bridges in the reasoning chain and then integrates these concepts to generate the final explanation. To facilitate the reasoning process, we utilize external commonsense knowledge to build the connection between a statement and the bridge concepts by extracting and pruning multi-hop paths to build a subgraph. We design a bridge concept extraction model that first scores the triples, routes the paths in the subgraph, and further selects bridge concepts with weak supervision at both the triple level and the concept level. We conduct experiments on the commonsense explanation generation task and our model outperforms the state-of-the-art baselines in both automatic and human evaluation.
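
To make the path-extraction step concrete, here is a toy sketch; the graph edges and the hop limit are made up, and the paper additionally scores triples and prunes paths with weak supervision:

    import networkx as nx

    kg = nx.Graph()                                # stand-in for a commonsense KG
    kg.add_edges_from([("fork", "eat"), ("eat", "food"),
                       ("soup", "food"), ("spoon", "soup")])

    def multi_hop_paths(graph, source, target, max_hops=3):
        # Enumerate simple paths of at most max_hops edges between two statement concepts;
        # intermediate nodes on such paths are candidate bridge concepts.
        return list(nx.all_simple_paths(graph, source, target, cutoff=max_hops))

    print(multi_hop_paths(kg, "fork", "soup"))     # [['fork', 'eat', 'food', 'soup']]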

4. Language Generation with Multi-Hop Reasoning on Commonsense Knowledge Graph [PDF] Back to contents
  Haozhe Ji, Pei Ke, Shaohan Huang, Furu Wei, Xiaoyan Zhu, Minlie Huang
Abstract: Despite the success of generative pre-trained language models on a series of text generation tasks, they still suffer in cases where reasoning over underlying commonsense knowledge is required during generation. Existing approaches that integrate commonsense knowledge into generative pre-trained language models simply transfer relational knowledge by post-training on individual knowledge triples while ignoring rich connections within the knowledge graph. We argue that exploiting both the structural and semantic information of the knowledge graph facilitates commonsense-aware text generation. In this paper, we propose Generation with Multi-Hop Reasoning Flow (GRF) that enables pre-trained models with dynamic multi-hop reasoning on multi-relational paths extracted from the external commonsense knowledge graph. We empirically show that our model outperforms existing baselines on three text generation tasks that require reasoning over commonsense knowledge. We also demonstrate the effectiveness of the dynamic multi-hop reasoning module with reasoning paths inferred by the model that provide rationale to the generation.

5. N-LTP: A Open-source Neural Chinese Language Technology Platform with Pretrained Models [PDF] Back to contents
  Wanxiang Che, Yunlong Feng, Libo Qin, Ting Liu
Abstract: We introduce N-LTP, an open-source Python Chinese natural language processing toolkit supporting five basic tasks: Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, and semantic dependency parsing. N-LTP adopts the multi-task framework with the pre-trained model to capture the shared knowledge across all Chinese relevant tasks. In addition, we propose to use knowledge distillation where single-task models teach a multi-task model, helping the multi-task model surpass its single-task teachers. Finally, we provide an API for the fundamental tasks and a visualization tool so that users can more easily use the toolkit and view processing results directly. To the best of our knowledge, this is the first toolkit to support all Chinese NLP fundamental tasks. Source code, documentation, and pre-trained models are available at this https URL.
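
The distillation step can be illustrated with the standard soft-target formulation; this is a generic sketch with made-up tensor sizes, not N-LTP's own code:

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, T=2.0):
        # KL divergence between temperature-softened teacher and student distributions.
        p_teacher = F.softmax(teacher_logits / T, dim=-1)
        log_p_student = F.log_softmax(student_logits / T, dim=-1)
        return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

    # Toy example: a single-task POS teacher guiding the POS head of the multi-task student.
    teacher_logits = torch.randn(8, 16)                       # 8 tokens, 16 tags
    student_logits = torch.randn(8, 16, requires_grad=True)
    distillation_loss(student_logits, teacher_logits).backward()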

6. Feature Adaptation of Pre-Trained Language Models across Languages and Domains for Text Classification [PDF] Back to contents
  Hai Ye, Qingyu Tan, Ruidan He, Juntao Li, Hwee Tou Ng, Lidong Bing
Abstract: Adapting pre-trained language models (PrLMs) (e.g., BERT) to new domains has gained much attention recently. Instead of fine-tuning PrLMs as done in most previous work, we investigate how to adapt the features of PrLMs to new domains without fine-tuning. We explore unsupervised domain adaptation (UDA) in this paper. With the features from PrLMs, we adapt the models trained with labeled data from the source domain to the unlabeled target domain. Self-training is widely used for UDA which predicts pseudo labels on the target domain data for training. However, the predicted pseudo labels inevitably include noise, which will negatively affect training a robust model. To improve the robustness of self-training, in this paper we present class-aware feature self-distillation (CFd) to learn discriminative features from PrLMs, in which PrLM features are self-distilled into a feature adaptation module and the features from the same class are more tightly clustered. We further extend CFd to a cross-language setting, in which language discrepancy is studied. Experiments on two monolingual and multilingual Amazon review datasets show that CFd can consistently improve the performance of self-training in cross-domain and cross-language settings.
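
A minimal sketch of the self-training step the paper builds on; the confidence threshold and the linear classifier are assumptions, and CFd's class-aware feature self-distillation is not shown:

    import torch
    import torch.nn.functional as F

    def select_pseudo_labels(classifier, target_features, threshold=0.9):
        # Predict on unlabeled target-domain examples and keep only confident
        # predictions as pseudo-labels for the next training round.
        classifier.eval()
        with torch.no_grad():
            probs = F.softmax(classifier(target_features), dim=-1)
        confidence, pseudo_labels = probs.max(dim=-1)
        keep = confidence >= threshold
        return target_features[keep], pseudo_labels[keep]

    classifier = torch.nn.Linear(768, 2)        # stand-in head over PrLM features
    target_features = torch.randn(32, 768)      # unlabeled target-domain feature vectors
    kept_x, kept_y = select_pseudo_labels(classifier, target_features)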

7. Grounded Compositional Outputs for Adaptive Language Modeling [PDF] Back to contents
  Nikolaos Pappas, Phoebe Mulcaire, Noah A. Smith
Abstract: Language models have emerged as a central component across NLP, and a great deal of progress depends on the ability to cheaply adapt them (e.g., through finetuning) to new domains and tasks. A language model's vocabulary, typically selected before training and permanently fixed later, affects its size and is part of what makes it resistant to such adaptation. Prior work has used compositional input embeddings based on surface forms to ameliorate this issue. In this work, we go one step further and propose a fully compositional output embedding layer for language models, which is further grounded in information from a structured lexicon (WordNet), namely semantically related words and free-text definitions. To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary. We evaluate the model on conventional language modeling as well as challenging cross-domain settings with an open vocabulary, finding that it matches or outperforms previous state-of-the-art output embedding methods and adaptation approaches. Our analysis attributes the improvements to sample efficiency: our model is more accurate for low-frequency words.
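
A conceptual sketch of a compositional, definition-grounded output embedding; the pooling, sizes and projection below are assumptions for illustration, not the paper's exact architecture:

    import torch
    import torch.nn as nn

    char_ngram_emb = nn.EmbeddingBag(5000, 64)    # hashed character n-grams of the surface form
    def_word_emb = nn.EmbeddingBag(20000, 64)     # words of the free-text (WordNet) definition
    project = nn.Linear(128, 64)

    def output_embedding(ngram_ids, definition_ids):
        # The word's output vector is composed from its surface form and its definition,
        # so the output layer does not require a fixed training vocabulary.
        surface = char_ngram_emb(ngram_ids.unsqueeze(0))
        gloss = def_word_emb(definition_ids.unsqueeze(0))
        return project(torch.cat([surface, gloss], dim=-1))

    vec = output_embedding(torch.tensor([3, 17, 254]), torch.tensor([5, 42, 7, 99]))
    print(vec.shape)                              # torch.Size([1, 64])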

8. Ape210K: A Large-Scale and Template-Rich Dataset of Math Word Problems [PDF] Back to contents
  Wei Zhao, Mingyue Shang, Yang Liu, Liang Wang, Jingming Liu
Abstract: Automatic math word problem solving has attracted growing attention in recent years. The evaluation datasets used by previous works have serious limitations in terms of scale and diversity. In this paper, we release a new large-scale and template-rich math word problem dataset named Ape210K. It consists of 210K Chinese elementary school-level math problems, which is 9 times the size of the largest public dataset Math23K. Each problem contains both the gold answer and the equations needed to derive the answer. Ape210K is also of greater diversity with 56K templates, which is 25 times more than Math23K. Our analysis shows that solving Ape210K requires not only natural language understanding but also commonsense knowledge. We expect Ape210K to be a benchmark for math word problem solving systems. Experiments indicate that state-of-the-art models on the Math23K dataset perform poorly on Ape210K. We propose a copy-augmented and feature-enriched sequence to sequence (seq2seq) model, which outperforms existing models by 3.2% on the Math23K dataset and serves as a strong baseline of the Ape210K dataset. The gap between human performance and our baseline model is still significant, calling for further research efforts. We make the Ape210K dataset publicly available at this https URL.

9. AnchiBERT: A Pre-Trained Model for Ancient Chinese Language Understanding and Generation [PDF] Back to contents
  Huishuang Tian, Kexin Yang, Dayiheng Liu, Jiancheng Lv
Abstract: Ancient Chinese is the essence of Chinese culture. There are several natural language processing tasks of ancient Chinese domain, such as ancient-modern Chinese translation, poem generation, and couplet generation. Previous studies usually use the supervised models which deeply rely on parallel data. However, it is difficult to obtain large-scale parallel data of ancient Chinese. In order to make full use of the more easily available monolingual ancient Chinese corpora, we release AnchiBERT, a pre-trained language model based on the architecture of BERT, which is trained on large-scale ancient Chinese corpora. We evaluate AnchiBERT on both language understanding and generation tasks, including poem classification, ancient-modern Chinese translation, poem generation, and couplet generation. The experimental results show that AnchiBERT outperforms BERT as well as the non-pretrained models and achieves state-of-the-art results in all cases.

10. RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models [PDF] Back to contents
  Sam Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, Noah A. Smith
Abstract: Pretrained neural language models (LMs) are prone to generating racist, sexist, or otherwise toxic language which hinders their safe deployment. We investigate the extent to which pretrained LMs can be prompted to generate toxic language, and the effectiveness of controllable text generation algorithms at preventing such toxic degeneration. We create and release RealToxicityPrompts, a dataset of 100K naturally occurring, sentence-level prompts derived from a large corpus of English web text, paired with toxicity scores from a widely-used toxicity classifier. Using RealToxicityPrompts, we find that pretrained LMs can degenerate into toxic text even from seemingly innocuous prompts. We empirically assess several controllable generation methods, and find that while data- or compute-intensive methods (e.g., adaptive pretraining on non-toxic data) are more effective at steering away from toxicity than simpler solutions (e.g., banning "bad" words), no current method is failsafe against neural toxic degeneration. To pinpoint the potential cause of such persistent toxic degeneration, we analyze two web text corpora used to pretrain several LMs (including GPT-2; Radford et al., 2019), and find a significant amount of offensive, factually unreliable, and otherwise toxic content. Our work provides a test bed for evaluating toxic generations by LMs and stresses the need for better data selection processes for pretraining.
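
The evaluation loop can be sketched as follows, using an illustrative prompt and an off-the-shelf GPT-2 checkpoint; the paper scores continuations with the Perspective API, which is only indicated as a comment here:

    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "The quarterback stormed off the field and said"   # illustrative, not from the dataset
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(**inputs, do_sample=True, max_new_tokens=20,
                                 num_return_sequences=5,
                                 pad_token_id=tokenizer.eos_token_id)
    continuations = [tokenizer.decode(o[inputs["input_ids"].shape[1]:],
                                      skip_special_tokens=True) for o in outputs]
    # Each continuation would then be scored with a toxicity classifier (the paper uses
    # the Perspective API) to estimate, e.g., expected maximum toxicity over samples.
    print(continuations)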

11. Task-Oriented Dialogue as Dataflow Synthesis [PDF] Back to contents
  Jacob Andreas, John Bufe, David Burkett, Charles Chen, Josh Clausman, Jean Crawford, Kate Crim, Jordan DeLoach, Leah Dorner, Jason Eisner, Hao Fang, Alan Guo, David Hall, Kristin Hayes, Kellie Hill, Diana Ho, Wendy Iwaszuk, Smriti Jha, Dan Klein, Jayant Krishnamurthy, Theo Lanman, Percy Liang, Christopher H Lin, Ilya Lintsbakh, Andy McGovern, Aleksandr Nisnevich, Adam Pauls, Dmitrij Petters, Brent Read, Dan Roth, Subhro Roy, Jesse Rusak, Beth Short, Div Slomin, Ben Snyder, Stephon Striplin, Yu Su, Zachary Tellman, Sam Thomson, Andrei Vorobev, Izabela Witoszko, Jason Wolfe, Abby Wray, Yuchen Zhang, Alexander Zotov
Abstract: We describe an approach to task-oriented dialogue in which dialogue state is represented as a dataflow graph. A dialogue agent maps each user utterance to a program that extends this graph. Programs include metacomputation operators for reference and revision that reuse dataflow fragments from previous turns. Our graph-based state enables the expression and manipulation of complex user intents, and explicit metacomputation makes these intents easier for learned models to predict. We introduce a new dataset, SMCalFlow, featuring complex dialogues about events, weather, places, and people. Experiments show that dataflow graphs and metacomputation substantially improve representability and predictability in these natural dialogues. Additional experiments on the MultiWOZ dataset show that our dataflow representation enables an otherwise off-the-shelf sequence-to-sequence model to match the best existing task-specific state tracking model. The SMCalFlow dataset and code for replicating experiments are available at this https URL.
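
A conceptual sketch of the dataflow idea, simplified far beyond the paper and the SMCalFlow annotation format: each turn appends program nodes to a graph, and a refer-style metacomputation operator reuses a value computed on an earlier turn:

    class Node:
        def __init__(self, op, *args):
            self.op, self.args = op, args

    class DataflowGraph:
        def __init__(self):
            self.nodes = []

        def add(self, op, *args):
            node = Node(op, *args)
            self.nodes.append(node)
            return node

        def refer(self, op):
            # Reuse the most recent node produced by `op` (a crude salience rule).
            for node in reversed(self.nodes):
                if node.op == op:
                    return node
            raise LookupError(f"no prior node for {op}")

    graph = DataflowGraph()
    # Turn 1: "When is my meeting with Alice?"
    event = graph.add("find_event", "meeting with Alice")
    graph.add("get_time", event)
    # Turn 2: "Move it to Friday."  ("it" resolves by referring back into the graph)
    graph.add("reschedule", graph.refer("find_event"), "Friday")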

12. Multi-Pass Transformer for Machine Translation [PDF] Back to contents
  Peng Gao, Chiori Hori, Shijie Geng, Takaaki Hori, Jonathan Le Roux
Abstract: In contrast with previous approaches where information flows only towards deeper layers of a stack, we consider a multi-pass transformer (MPT) architecture in which earlier layers are allowed to process information in light of the output of later layers. To maintain a directed acyclic graph structure, the encoder stack of a transformer is repeated along a new multi-pass dimension, keeping the parameters tied, and information is allowed to proceed unidirectionally both towards deeper layers within an encoder stack and towards any layer of subsequent stacks. We consider both soft (i.e., continuous) and hard (i.e., discrete) connections between parallel encoder stacks, relying on a neural architecture search to find the best connection pattern in the hard case. We perform an extensive ablation study of the proposed MPT architecture and compare it with other state-of-the-art transformer architectures. Surprisingly, Base Transformer equipped with MPT can surpass the performance of Large Transformer on the challenging machine translation En-De and En-Fr datasets. In the hard connection case, the optimal connection pattern found for En-De also leads to improved performance for En-Fr.
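
A rough PyTorch sketch of the multi-pass idea: the same encoder stack (tied weights) is applied repeatedly, with a learned soft gate mixing each pass's input with the previous pass's output. The paper's actual soft/hard connection patterns come from a neural architecture search, and the sizes below are made up:

    import torch
    import torch.nn as nn

    class MultiPassEncoder(nn.Module):
        def __init__(self, d_model=64, nhead=4, num_layers=2, passes=3):
            super().__init__()
            layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)  # shared stack
            self.gates = nn.Parameter(torch.zeros(passes))                      # soft connections
            self.passes = passes

        def forward(self, x):
            out = x
            for p in range(self.passes):
                g = torch.sigmoid(self.gates[p])
                out = self.encoder(g * out + (1 - g) * x)   # same (tied) encoder every pass
            return out

    print(MultiPassEncoder()(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])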

13. ConvAI3: Generating Clarifying Questions for Open-Domain Dialogue Systems (ClariQ) [PDF] Back to contents
  Mohammad Aliannejadi, Julia Kiseleva, Aleksandr Chuklin, Jeff Dalton, Mikhail Burtsev
Abstract: This document presents a detailed description of the challenge on clarifying questions for dialogue systems (ClariQ). The challenge is organized as part of the Conversational AI challenge series (ConvAI3) at the Search Oriented Conversational AI (SCAI) EMNLP workshop in 2020. The main aim of the conversational systems is to return an appropriate answer in response to the user requests. However, some user requests might be ambiguous. In IR settings such a situation is handled mainly through the diversification of the search result page. It is however much more challenging in dialogue settings with limited bandwidth. Therefore, in this challenge, we provide a common evaluation framework to evaluate mixed-initiative conversations. Participants are asked to rank clarifying questions in an information-seeking conversation. The challenge is organized in two stages: in Stage 1 we evaluate the submissions in an offline setting with single-turn conversations. Top participants of Stage 1 get the chance to have their model tested by human annotators.

14. The importance of fillers for text representations of speech transcripts [PDF] Back to contents
  Tanvi Dinkar, Pierre Colombo, Matthieu Labeau, Chloé Clavel
Abstract: While being an essential component of spoken language, fillers (e.g., "um" or "uh") often remain overlooked in Spoken Language Understanding (SLU) tasks. We explore the possibility of representing them with deep contextualised embeddings, showing improvements on modelling spoken language and two downstream tasks: predicting a speaker's stance and expressed confidence.

15. Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining [PDF] Back to contents
  Ananya B. Sai, Akash Kumar Mohankumar, Siddhartha Arora, Mitesh M. Khapra
Abstract: There is an increasing focus on model-based dialog evaluation metrics such as ADEM, RUBER, and the more recent BERT-based metrics. These models aim to assign a high score to all relevant responses and a low score to all irrelevant responses. Ideally, such models should be trained using multiple relevant and irrelevant responses for any given context. However, no such data is publicly available, and hence existing models are usually trained using a single relevant response and multiple randomly selected responses from other contexts (random negatives). To allow for better training and robust evaluation of model-based metrics, we introduce the DailyDialog++ dataset, consisting of (i) five relevant responses for each context and (ii) five adversarially crafted irrelevant responses for each context. Using this dataset, we first show that even in the presence of multiple correct references, n-gram based metrics and embedding based metrics do not perform well at separating relevant responses from even random negatives. While model-based metrics perform better than n-gram and embedding based metrics on random negatives, their performance drops substantially when evaluated on adversarial examples. To check if large scale pretraining could help, we propose a new BERT-based evaluation metric called DEB, which is pretrained on 727M Reddit conversations and then finetuned on our dataset. DEB significantly outperforms existing models, showing better correlation with human judgements and better performance on random negatives (88.27% accuracy). However, its performance again drops substantially, when evaluated on adversarial responses, thereby highlighting that even large-scale pretrained evaluation models are not robust to the adversarial examples in our dataset. The dataset and code are publicly available.

16. AliMe KG: Domain Knowledge Graph Construction and Application in E-commerce [PDF] Back to contents
  Feng-Lin Li, Hehong Chen, Guohai Xu, Tian Qiu, Feng Ji, Ji Zhang, Haiqing Chen
Abstract: Pre-sales customer service is of importance to E-commerce platforms as it contributes to optimizing customers' buying process. To better serve users, we propose AliMe KG, a domain knowledge graph in E-commerce that captures user problems, points of interest (POI), item information and relations thereof. It helps to understand user needs, answer pre-sales questions and generate explanation texts. We applied AliMe KG to several online business scenarios such as shopping guide, question answering over properties and recommendation reason generation, and gained positive results. In the paper, we systematically introduce how we construct a domain knowledge graph from free text, and demonstrate its business value with several applications. Our experience shows that mining structured knowledge from free text in a vertical domain is practicable, and can be of substantial value in industrial settings.

17. CogniFNN: A Fuzzy Neural Network Framework for Cognitive Word Embedding Evaluation [PDF] Back to contents
  Xinping Liu, Zehong Cao, Son Tran
Abstract: Word embeddings can reflect the semantic representations, and the embedding qualities can be comprehensively evaluated with human natural reading-related cognitive data sources. In this paper, we proposed the CogniFNN framework, which is the first attempt at using fuzzy neural networks to extract non-linear and non-stationary characteristics for evaluations of English word embeddings against the corresponding cognitive datasets. In our experiment, we used 15 human cognitive datasets across three modalities: EEG, fMRI, and eye-tracking, and selected the mean square error and multiple hypotheses testing as metrics to evaluate our proposed CogniFNN framework. Compared to the recent pioneer framework, our proposed CogniFNN showed smaller prediction errors of both context-independent (GloVe) and context-sensitive (BERT) word embeddings, and achieved higher significant ratios with randomly generated word embeddings. Our findings suggested that the CogniFNN framework could provide a more accurate and comprehensive evaluation of cognitive word embeddings. It will potentially be beneficial to the further word embeddings evaluation on extrinsic natural language processing tasks.

18. Effects of Word-frequency based Pre- and Post- Processings for Audio Captioning [PDF] Back to contents
  Daiki Takeuchi, Yuma Koizumi, Yasunori Ohishi, Noboru Harada, Kunio Kashino
Abstract: The system we used for Task 6 (Automated Audio Captioning) of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge combines three elements, namely, data augmentation, multi-task learning, and post-processing, for audio captioning. The system received the highest evaluation scores, but which of the individual elements most fully contributed to its performance has not yet been clarified. Here, to assess their contributions, we first conducted an element-wise ablation study on our system to estimate to what extent each element is effective. We then conducted a detailed module-wise ablation study to further clarify the key processing modules for improving accuracy. The results show that data augmentation and post-processing significantly improve the score in our system. In particular, mix-up data augmentation and beam search in post-processing improve SPIDEr by 0.8 and 1.6 points, respectively.
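
For the mix-up augmentation mentioned above, a generic sketch is given below; mixing log-mel features with a Beta-sampled weight, the alpha value, and what exactly is mixed are assumptions rather than the paper's settings:

    import numpy as np

    def mixup(x1, x2, alpha=0.2):
        # Standard mix-up: interpolate two training examples with a Beta(alpha, alpha) weight.
        lam = np.random.beta(alpha, alpha)
        return lam * x1 + (1 - lam) * x2, lam

    features_a = np.random.randn(64, 500)    # toy log-mel spectrogram (bins x frames)
    features_b = np.random.randn(64, 500)
    mixed, lam = mixup(features_a, features_b)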

19. Structure Aware Negative Sampling in Knowledge Graphs [PDF] Back to contents
  Kian Ahrabian, Aarash Feizi, Yasmin Salehi, William L. Hamilton, Avishek Joey Bose
Abstract: Learning low-dimensional representations for entities and relations in knowledge graphs using contrastive estimation represents a scalable and effective method for inferring connectivity patterns. A crucial aspect of contrastive learning approaches is the choice of corruption distribution that generates hard negative samples, which force the embedding model to learn discriminative representations and find critical characteristics of observed data. While earlier methods either employ too simple corruption distributions, i.e. uniform, yielding easy uninformative negatives or sophisticated adversarial distributions with challenging optimization schemes, they do not explicitly incorporate known graph structure resulting in suboptimal negatives. In this paper, we propose Structure Aware Negative Sampling (SANS), an inexpensive negative sampling strategy that utilizes the rich graph structure by selecting negative samples from a node's k-hop neighborhood. Empirically, we demonstrate that SANS finds high-quality negatives that are highly competitive with SOTA methods, and requires no additional parameters nor difficult adversarial optimization.
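
A simplified sketch of structure-aware negative sampling on a toy graph with networkx; the real method corrupts triples in a multi-relational knowledge graph, which is reduced here to picking non-adjacent nodes from the anchor's k-hop neighbourhood:

    import random
    import networkx as nx

    def k_hop_negatives(graph, anchor, k=2, num_samples=5):
        # Candidates come from the anchor's k-hop neighbourhood rather than the whole
        # entity set, so the sampled negatives are structurally close and harder.
        reachable = nx.single_source_shortest_path_length(graph, anchor, cutoff=k)
        candidates = [n for n in reachable
                      if n != anchor and not graph.has_edge(anchor, n)]
        return random.sample(candidates, min(num_samples, len(candidates)))

    g = nx.karate_club_graph()               # stand-in for a knowledge graph
    print(k_hop_negatives(g, anchor=0, k=2))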
