
[arXiv Papers] Computation and Language 2020-12-16

Contents

1. Nested Named Entity Recognition with Partially-Observed TreeCRFs [PDF] Abstract
2. Multi-Aspect Sentiment Analysis with Latent Sentiment-Aspect Attribution [PDF] Abstract
3. Modeling Homophone Noise for Robust Neural Machine Translation [PDF] Abstract
4. Keyword-Guided Neural Conversational Model [PDF] Abstract
5. CARE: Commonsense-Aware Emotional Response Generation with Latent Concepts [PDF] Abstract
6. Enhance Multimodal Transformer With External Label And In-Domain Pretrain: Hateful Meme Challenge Winning Solution [PDF] Abstract
7. Efficient Clustering from Distributions over Topics [PDF] Abstract
8. Learning to Check Contract Inconsistencies [PDF] Abstract
9. A Response Retrieval Approach for Dialogue Using a Multi-Attentive Transformer [PDF] Abstract
10. Writing Polishment with Simile: Task, Dataset and A Neural Approach [PDF] Abstract
11. Enriched Annotations for Tumor Attribute Classification from Pathology Reports with Limited Labeled Data [PDF] Abstract
12. Traditional IR rivals neural models on the MS MARCO Document Ranking Leaderboard [PDF] Abstract
13. Primer AI's Systems for Acronym Identification and Disambiguation [PDF] Abstract
14. Learning to Rationalize for Nonmonotonic Reasoning with Distant Supervision [PDF] Abstract
15. Model Choices Influence Attributive Word Associations: A Semi-supervised Analysis of Static Word Embeddings [PDF] Abstract
16. Object-based attention for spatio-temporal reasoning: Outperforming neuro-symbolic models with flexible distributed architectures [PDF] Abstract
17. Learning from History: Modeling Temporal Knowledge Graphs with Sequential Copy-Generation Networks [PDF] Abstract
18. QUARC: Quaternion Multi-Modal Fusion Architecture For Hate Speech Classification [PDF] Abstract
19. *-CFQ: Analyzing the Scalability of Machine Learning on a Compositional Task [PDF] Abstract
20. Generation of complex database queries and API calls from natural language utterances [PDF] Abstract
21. Relation-Aware Neighborhood Matching Model for Entity Alignment [PDF] Abstract
22. A review of on-device fully neural end-to-end automatic speech recognition algorithms [PDF] Abstract

Abstracts

1. Nested Named Entity Recognition with Partially-Observed TreeCRFs [PDF] Back to Contents
  Yao Fu, Chuanqi Tan, Mosha Chen, Songfang Huang, Fei Huang
Abstract: Named entity recognition (NER) is a well-studied task in natural language processing. However, the widely-used sequence labeling framework makes it difficult to detect entities with nested structures. In this work, we view nested NER as constituency parsing with partially-observed trees and model it with partially-observed TreeCRFs. Specifically, we view all labeled entity spans as observed nodes in a constituency tree, and other spans as latent nodes. With the TreeCRF we achieve a uniform way to jointly model the observed and the latent nodes. To compute the probability of partial trees with partial marginalization, we propose a variant of the Inside algorithm, the Masked Inside algorithm, which supports different inference operations for different nodes (evaluation for the observed, marginalization for the latent, and rejection for nodes incompatible with the observed) with an efficient parallelized implementation, thus significantly speeding up training and inference. Experiments show that our approach achieves state-of-the-art (SOTA) F1 scores on the ACE2004 and ACE2005 datasets, and shows comparable performance to SOTA models on the GENIA dataset. Our approach is implemented at: this https URL.
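As a rough sketch of the Masked Inside idea (a toy serial version, not the paper's parallelized implementation, and with invented function names), rejected spans can be given a log-potential of minus infinity so that every tree containing them drops out of the partition function:

```python
import numpy as np

NEG_INF = -1e9

def masked_inside(scores, mask):
    """Inside algorithm over spans with a 0/1 mask.

    scores: (n, n) log-potentials for span (i, j), j >= i.
    mask:   (n, n) with 1 = allowed span (observed or latent) and
            0 = rejected span (e.g. spans crossing an observed entity).
    Returns the log partition over all binary trees whose spans are
    all allowed by the mask.
    """
    n = scores.shape[0]
    # masked log-potential: rejected spans contribute -inf
    s = scores + np.where(mask == 1, 0.0, NEG_INF)
    beta = np.full((n, n), NEG_INF)
    for i in range(n):
        beta[i, i] = s[i, i]
    for width in range(1, n):
        for i in range(n - width):
            j = i + width
            # log-sum-exp over split points k of beta[i,k] + beta[k+1,j]
            cand = [beta[i, k] + beta[k + 1, j] for k in range(i, j)]
            top = max(cand)
            beta[i, j] = s[i, j] + top + np.log(sum(np.exp(c - top) for c in cand))
    return beta[0, n - 1]
```

Passing an all-ones mask recovers the ordinary Inside algorithm; zeroing the mask for spans incompatible with the annotated entities restricts the sum to trees that contain the observed spans.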

2. Multi-Aspect Sentiment Analysis with Latent Sentiment-Aspect Attribution [PDF] Back to Contents
  Yifan Zhang, Fan Yang, Marjan Hosseinia, Arjun Mukherjee
Abstract: In this paper, we introduce a new framework called the sentiment-aspect attribution module (SAAM). SAAM works on top of traditional neural networks and is designed to address the problem of multi-aspect sentiment classification and sentiment regression. The framework works by exploiting the correlations between sentence-level embedding features and variations of document-level aspect rating scores. We demonstrate several variations of our framework on top of CNN and RNN based models. Experiments on a hotel review dataset and a beer review dataset have shown SAAM can improve sentiment analysis performance over corresponding base models. Moreover, because of the way our framework intuitively combines sentence-level scores into document-level scores, it is able to provide a deeper insight into data (e.g., semi-supervised sentence aspect labeling). Hence, we end the paper with a detailed analysis that shows the potential of our models for other applications such as sentiment snippet extraction.

3. Modeling Homophone Noise for Robust Neural Machine Translation [PDF] Back to Contents
  Wenjie Qin, Xiang Li, Yuhui Sun, Deyi Xiong, Jianwei Cui, Bin Wang
Abstract: In this paper, we propose a robust neural machine translation (NMT) framework. The framework consists of a homophone noise detector and a syllable-aware NMT model that is robust to homophone errors. The detector identifies potential homophone errors in a textual sentence and converts them into syllables to form a mixed sequence that is then fed into the syllable-aware NMT. Extensive experiments on Chinese->English translation demonstrate that our proposed method not only significantly outperforms baselines on noisy test sets with homophone noise, but also achieves a substantial improvement on clean text.

4. Keyword-Guided Neural Conversational Model [PDF] Back to Contents
  Peixiang Zhong, Yong Liu, Hao Wang, Chunyan Miao
Abstract: We study the problem of imposing conversational goals/keywords on open-domain conversational agents, where the agent is required to lead the conversation to a target keyword smoothly and fast. Solving this problem enables the application of conversational agents in many real-world scenarios, e.g., recommendation and psychotherapy. The dominant paradigm for tackling this problem is to 1) train a next-turn keyword classifier, and 2) train a keyword-augmented response retrieval model. However, existing approaches in this paradigm have two limitations: 1) the training and evaluation datasets for next-turn keyword classification are directly extracted from conversations without human annotations, thus, they are noisy and have low correlation with human judgements, and 2) during keyword transition, the agents solely rely on the similarities between word embeddings to move closer to the target keyword, which may not reflect how humans converse. In this paper, we assume that human conversations are grounded on commonsense and propose a keyword-guided neural conversational model that can leverage external commonsense knowledge graphs (CKG) for both keyword transition and response retrieval. Automatic evaluations suggest that commonsense improves the performance of both next-turn keyword prediction and keyword-augmented response retrieval. In addition, both self-play and human evaluations show that our model produces responses with smoother keyword transition and reaches the target keyword faster than competitive baselines.
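The embedding-similarity transition strategy that the abstract critiques can be sketched in a few lines (toy vectors; `next_keyword` is an illustrative name, not code from the paper):

```python
import numpy as np

def next_keyword(candidates, target, emb):
    """Pick the candidate keyword whose embedding is closest (by cosine
    similarity) to the target keyword -- the purely embedding-based
    transition strategy the paper argues does not reflect how humans
    converse.

    emb: dict mapping keyword -> 1-D numpy vector (toy embeddings).
    """
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(candidates, key=lambda w: cos(emb[w], emb[target]))
```

The proposed model instead scores transitions with paths in an external commonsense knowledge graph, which this sketch deliberately omits.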

5. CARE: Commonsense-Aware Emotional Response Generation with Latent Concepts [PDF] Back to Contents
  Peixiang Zhong, Di Wang, Pengfei Li, Chen Zhang, Hao Wang, Chunyan Miao
Abstract: Rationality and emotion are two fundamental elements of humans. Endowing agents with rationality and emotion has been one of the major milestones in AI. However, in the field of conversational AI, most existing models only specialize in one aspect and neglect the other, which often leads to dull or unrelated responses. In this paper, we hypothesize that combining rationality and emotion into conversational agents can improve response quality. To test the hypothesis, we focus on one fundamental aspect of rationality, i.e., commonsense, and propose CARE, a novel model for commonsense-aware emotional response generation. Specifically, we first propose a framework to learn and construct commonsense-aware emotional latent concepts of the response given an input message and a desired emotion. We then propose three methods to collaboratively incorporate the latent concepts into response generation. Experimental results on two large-scale datasets support our hypothesis and show that our model can produce more accurate and commonsense-aware emotional responses and achieve better human ratings than state-of-the-art models that only specialize in one aspect.

6. Enhance Multimodal Transformer With External Label And In-Domain Pretrain: Hateful Meme Challenge Winning Solution [PDF] Back to Contents
  Ron Zhu
Abstract: Hateful meme detection is a newly emerged research area that requires both visual and linguistic understanding of the meme, as well as some background knowledge, to perform well on the task. This technical report summarises the first-place solution of the Hateful Meme Detection Challenge 2020, which extends state-of-the-art visual-linguistic transformers to tackle this problem. At the end of the report, we also point out the shortcomings of the current methodology and possible directions for improving it.

7. Efficient Clustering from Distributions over Topics [PDF] Back to Contents
  Carlos Badenes-Olmedo, Jose-Luis Redondo García, Oscar Corcho
Abstract: There are many scenarios where we may want to find pairs of textually similar documents in a large corpus (e.g. a researcher doing literature review, or an R&D project manager analyzing project proposals). Programmatically discovering those connections can help experts achieve those goals, but brute-force pairwise comparison is computationally impractical when the document corpus is too large. Some algorithms in the literature divide the search space into regions containing potentially similar documents, which are later processed separately from the rest in order to reduce the number of pairs compared. However, this kind of unsupervised method still incurs high temporal costs. In this paper, we present an approach that relies on the results of a topic modeling algorithm over the documents in a collection, as a means to identify smaller subsets of documents where the similarity function can then be computed. This approach has proved to obtain promising results when identifying similar documents in the domain of scientific publications. We have compared our approach against state-of-the-art clustering techniques and with different configurations for the topic modeling algorithm. Results suggest that our approach outperforms (> 0.5) the other analyzed techniques in terms of efficiency.
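One simplified way to realize this pruning, assuming documents are bucketed by their top topics and the similarity function is only computed within a bucket (the paper's exact partitioning scheme may differ):

```python
from collections import defaultdict
from itertools import combinations

def candidate_pairs(doc_topics, top_k=2):
    """Bucket documents by their top-k topics (order-insensitive) and
    only propose pairs within a bucket, avoiding the brute-force
    all-pairs comparison over the whole corpus.

    doc_topics: dict doc_id -> list of topic probabilities.
    """
    buckets = defaultdict(list)
    for doc, dist in doc_topics.items():
        top = sorted(range(len(dist)), key=dist.__getitem__, reverse=True)[:top_k]
        buckets[tuple(sorted(top))].append(doc)
    pairs = set()
    for docs in buckets.values():
        pairs.update(combinations(sorted(docs), 2))
    return pairs
```

Only the pairs returned here would then be scored with the (expensive) similarity function, which is where the efficiency gain over all-pairs comparison comes from.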

8. Learning to Check Contract Inconsistencies [PDF] Back to Contents
  Shuo Zhang, Junzhou Zhao, Pinghui Wang, Nuo Xu, Yang Yang, Yiting Liu, Yi Huang, Junlan Feng
Abstract: Contract consistency is important in ensuring the legal validity of the contract. In many scenarios, a contract is written by filling the blanks in a precompiled form. Due to carelessness, two blanks that should be filled with the same (or different) content may be incorrectly filled with different (or same) content. This will result in the issue of contract inconsistencies, which may severely impair the legal validity of the contract. Traditional methods to address this issue mainly rely on manual contract review, which is labor-intensive and costly. In this work, we formulate a novel Contract Inconsistency Checking (CIC) problem, and design an end-to-end framework, called Pair-wise Blank Resolution (PBR), to solve the CIC problem with high accuracy. Our PBR model contains a novel BlankCoder to address the challenge of modeling meaningless blanks. BlankCoder adopts a two-stage attention mechanism that adequately associates a meaningless blank with its relevant descriptions while avoiding the incorporation of irrelevant context words. Experiments conducted on real-world datasets show the promising performance of our method with a balanced accuracy of 94.05% and an F1 score of 90.90% in the CIC problem.

9. A Response Retrieval Approach for Dialogue Using a Multi-Attentive Transformer [PDF] Back to Contents
  Matteo A. Senese, Alberto Benincasa, Barbara Caputo, Giuseppe Rizzo
Abstract: This paper presents our work for the ninth edition of the Dialogue System Technology Challenge (DSTC9). Our solution addresses the track number four: Simulated Interactive MultiModal Conversations. The task consists in providing an algorithm able to simulate a shopping assistant that supports the user with his/her requests. We address the task of response retrieval, that is the task of retrieving the most appropriate agent response from a pool of response candidates. Our approach makes use of a neural architecture based on transformer with a multi-attentive structure that conditions the response of the agent on the request made by the user and on the product the user is referring to. Final experiments on the SIMMC Fashion Dataset show that our approach achieves the second best scores on all the retrieval metrics defined by the organizers. The source code is available at this https URL.

10. Writing Polishment with Simile: Task, Dataset and A Neural Approach [PDF] Back to Contents
  Jiayi Zhang, Zhi Cui, Xiaoqiang Xia, Yalong Guo, Yanran Li, Chen Wei, Jianwei Cui
Abstract: A simile is a figure of speech that directly makes a comparison, showing similarities between two different things, e.g. "Reading papers can be dull sometimes, like watching grass grow". Human writers often interpolate appropriate similes into proper locations of the plain text to vivify their writings. However, no existing work has explored neural simile interpolation, including both locating and generation. In this paper, we propose a new task of Writing Polishment with Simile (WPS) to investigate whether machines are able to polish texts with similes as we humans do. Accordingly, we design a two-staged Locate&Gen model based on the transformer architecture. Our model firstly locates where the simile interpolation should happen, and then generates a location-specific simile. We also release a large-scale Chinese Simile (CS) dataset containing 5 million similes with context. The experimental results demonstrate the feasibility of the WPS task and shed light on future research directions towards better automatic text polishment.

11. Enriched Annotations for Tumor Attribute Classification from Pathology Reports with Limited Labeled Data [PDF] Back to Contents
  Nick Altieri, Briton Park, Mara Olson, John DeNero, Anobel Odisho, Bin Yu
Abstract: Precision medicine has the potential to revolutionize healthcare, but much of the data for patients is locked away in unstructured free-text, limiting research and delivery of effective personalized treatments. Generating large annotated datasets for information extraction from clinical notes is often challenging and expensive due to the high level of expertise needed for high-quality annotations. To enable natural language processing for small dataset sizes, we develop a novel enriched hierarchical annotation scheme and algorithm, Supervised Line Attention (SLA), and apply this algorithm to predicting categorical tumor attributes from kidney and colon cancer pathology reports from the University of California, San Francisco (UCSF). Whereas previous work only annotated document-level labels, we additionally ask the annotators to enrich the traditional label by also highlighting the relevant line (or potentially lines) for the final label, which leads to a 20% increase in annotation time per document. With the enriched annotations, we develop a simple and interpretable machine learning algorithm that first predicts the relevant lines in the document and then predicts the tumor attribute. Our results show that across the small dataset sizes of 32, 64, 128, and 186 labeled documents per cancer, SLA requires only half as many labeled documents as state-of-the-art methods to achieve similar or better micro-f1 and macro-f1 scores for the vast majority of comparisons that we made. Accounting for the increased annotation time, this leads to a 40% reduction in total annotation time over the state of the art.

12. Traditional IR rivals neural models on the MS MARCO Document Ranking Leaderboard [PDF] Back to Contents
  Leonid Boytsov
Abstract: This short document describes a traditional IR system that achieved MRR@100 equal to 0.298 on the MS MARCO Document Ranking leaderboard (on 2020-12-06). Although inferior to most BERT-based models, it outperformed several neural runs (as well as all non-neural ones), including two submissions that used a large pretrained Transformer model for re-ranking. We provide software and data to reproduce our results.
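For reference, the leaderboard metric MRR@100 can be computed as follows (a minimal implementation; `rankings` holds the 1-based rank of the first relevant document for each query):

```python
def mrr_at_k(rankings, k=100):
    """Mean reciprocal rank at cutoff k.

    rankings: iterable where each element is the 1-based rank of the
    first relevant document for a query, or None if no relevant
    document appears at all. Ranks beyond k contribute zero.
    """
    total = 0.0
    for rank in rankings:
        if rank is not None and rank <= k:
            total += 1.0 / rank
    return total / len(rankings)
```

So a system scoring MRR@100 = 0.298 places the first relevant document, on average, somewhere past rank 3 across the query set.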

13. Primer AI's Systems for Acronym Identification and Disambiguation [PDF] Back to Contents
  Nicholas Egan, John Bohannon
Abstract: The prevalence of ambiguous acronyms makes scientific documents harder to understand for humans and machines alike, presenting a need for models that can automatically identify acronyms in text and disambiguate their meaning. We introduce new methods for acronym identification and disambiguation: our acronym identification model projects learned token embeddings onto tag predictions, and our acronym disambiguation model finds training examples with similar sentence embeddings as test examples. Both of our systems achieve significant performance gains over previously suggested methods, and perform competitively on the SDU@AAAI-21 shared task leaderboard. Our models were trained in part on new distantly-supervised datasets for these tasks which we call AuxAI and AuxAD. We also identified a duplication conflict issue in the SciAD dataset, and formed a deduplicated version of SciAD that we call SciAD-dedupe. We publicly released all three of these datasets, and hope that they help the community make further strides in scientific document understanding.
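The disambiguation strategy described, labeling a test sentence with the expansion of its most similar training sentence by embedding similarity, can be sketched as follows (the actual sentence encoder is not specified here; embeddings are plain vectors):

```python
import numpy as np

def disambiguate(test_vec, train_vecs, train_labels):
    """Nearest-neighbour acronym disambiguation sketch: return the
    expansion label of the training sentence whose embedding has the
    highest cosine similarity with the test sentence embedding.
    """
    sims = [float(test_vec @ v / (np.linalg.norm(test_vec) * np.linalg.norm(v)))
            for v in train_vecs]
    return train_labels[int(np.argmax(sims))]
```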

14. Learning to Rationalize for Nonmonotonic Reasoning with Distant Supervision [PDF] Back to Contents
  Faeze Brahman, Vered Shwartz, Rachel Rudinger, Yejin Choi
Abstract: The black-box nature of neural models has motivated a line of research that aims to generate natural language rationales to explain why a model made certain predictions. Such rationale generation models, to date, have been trained on dataset-specific crowdsourced rationales, but this approach is costly and is not generalizable to new tasks and domains. In this paper, we investigate the extent to which neural models can reason about natural language rationales that explain model predictions, relying only on distant supervision with no additional annotation cost for human-written rationales. We investigate multiple ways to automatically generate rationales using pre-trained language models, neural knowledge models, and distant supervision from related tasks, and train generative models capable of composing explanatory rationales for unseen instances. We demonstrate our approach on the defeasible inference task, a nonmonotonic reasoning task in which an inference may be strengthened or weakened when new information (an update) is introduced. Our model shows promises at generating post-hoc rationales explaining why an inference is more or less likely given the additional information, however, it mostly generates trivial rationales reflecting the fundamental limitations of neural language models. Conversely, the more realistic setup of jointly predicting the update or its type and generating rationale is more challenging, suggesting an important future direction.

15. Model Choices Influence Attributive Word Associations: A Semi-supervised Analysis of Static Word Embeddings [PDF] Back to Contents
  Geetanjali Bihani, Julia Taylor Rayz
Abstract: Static word embeddings encode word associations, extensively utilized in downstream NLP tasks. Although prior studies have discussed the nature of such word associations in terms of biases and lexical regularities captured, the variation in word associations based on the embedding training procedure remains in obscurity. This work aims to address this gap by assessing attributive word associations across five different static word embedding architectures, analyzing the impact of the choice of the model architecture, context learning flavor and training corpora. Our approach utilizes a semi-supervised clustering method to cluster annotated proper nouns and adjectives, based on their word embedding features, revealing underlying attributive word associations formed in the embedding space, without introducing any confirmation bias. Our results reveal that the choice of the context learning flavor during embedding training (CBOW vs skip-gram) impacts the word association distinguishability and word embeddings' sensitivity to deviations in the training corpora. Moreover, it is empirically shown that even when trained over the same corpora, there is significant inter-model disparity and intra-model similarity in the encoded word associations across different word embedding models, portraying specific patterns in the way the embedding space is created for each embedding architecture.

16. Object-based attention for spatio-temporal reasoning: Outperforming neuro-symbolic models with flexible distributed architectures [PDF] Back to Contents
  David Ding, Felix Hill, Adam Santoro, Matt Botvinick
Abstract: Neural networks have achieved success in a wide array of perceptual tasks, but it is often stated that they are incapable of solving tasks that require higher-level reasoning. Two new task domains, CLEVRER and CATER, have recently been developed to focus on reasoning, as opposed to perception, in the context of spatio-temporal interactions between objects. Initial experiments on these domains found that neuro-symbolic approaches, which couple a logic engine and language parser with a neural perceptual front-end, substantially outperform fully-learned distributed networks, a finding that was taken to support the above thesis. Here, we show on the contrary that a fully-learned neural network with the right inductive biases can perform substantially better than all previous neural-symbolic models on both of these tasks, particularly on questions that most emphasize reasoning over perception. Our model makes critical use of both self-attention and learned "soft" object-centric representations, as well as BERT-style semi-supervised predictive losses. These flexible biases allow our model to surpass the previous neuro-symbolic state-of-the-art using less than 60% of available labelled data. Together, these results refute the neuro-symbolic thesis laid out by previous work involving these datasets, and they provide evidence that neural networks can indeed learn to reason effectively about the causal, dynamic structure of physical events.

17. Learning from History: Modeling Temporal Knowledge Graphs with Sequential Copy-Generation Networks [PDF] Back to Contents
  Cunchao Zhu, Muhao Chen, Changjun Fan, Guangquan Cheng, Yan Zhan
Abstract: Large knowledge graphs often grow to store temporal facts that model the dynamic relations or interactions of entities along the timeline. Since such temporal knowledge graphs often suffer from incompleteness, it is important to develop time-aware representation learning models that help to infer the missing temporal facts. While the temporal facts are typically evolving, it is observed that many facts often show a repeated pattern along the timeline, such as economic crises and diplomatic activities. This observation indicates that a model could potentially learn much from the known facts appeared in history. To this end, we propose a new representation learning model for temporal knowledge graphs, namely CyGNet, based on a novel timeaware copy-generation mechanism. CyGNet is not only able to predict future facts from the whole entity vocabulary, but also capable of identifying facts with repetition and accordingly predicting such future facts with reference to the known facts in the past. We evaluate the proposed method on the knowledge graph completion task using five benchmark datasets. Extensive experiments demonstrate the effectiveness of CyGNet for predicting future facts with repetition as well as de novo fact prediction.
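The copy-generation idea can be sketched as a mixture of a vocabulary-wide generation distribution and a copy distribution restricted to entities observed in history (`alpha` here is a fixed toy weight, whereas CyGNet learns this trade-off):

```python
import numpy as np

def copy_generation(p_gen, copy_scores, historical_mask, alpha=0.5):
    """Mix a generation distribution over the full entity vocabulary
    with a copy distribution restricted to historically seen entities.

    p_gen:           (V,) probability distribution over all entities.
    copy_scores:     (V,) unnormalized scores for copying each entity.
    historical_mask: (V,) 1 if the entity appeared in history, else 0
                     (must contain at least one 1).
    """
    # softmax over historical entities only; others get zero copy mass
    masked = np.where(historical_mask == 1, copy_scores, -np.inf)
    exp = np.exp(masked - masked.max())
    p_copy = exp / exp.sum()
    return alpha * p_gen + (1 - alpha) * p_copy
```

Entities that repeat along the timeline receive extra probability mass from the copy term, which is the intuition behind predicting recurring facts such as periodic economic or diplomatic events.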

18. QUARC: Quaternion Multi-Modal Fusion Architecture For Hate Speech Classification [PDF] Back to Contents
  Deepak Kumar, Nalin Kumar, Subhankar Mishra
Abstract: Hate speech, quite common in the age of social media, is at times harmless but can also cause mental trauma to individuals or even riots in communities. An image of a religious symbol with a derogatory comment, or a video of a man abusing a particular community, all become hate speech, with every modality (such as text, image, and audio) contributing towards it. Models based on a single modality of a hate speech post on social media are not sufficient; rather, we need models like multi-modal fusion models that consider both image and text while classifying hate speech. Text-image fusion models are heavily parameterized, hence we propose a quaternion neural network-based model having additional fusion components for each pair of modalities. The model is tested on the MMHS150K twitter dataset for hate speech classification. The model shows an almost 75% reduction in parameters and also benefits us in terms of storage space and training time, while being on par in terms of performance with its real-valued counterpart.
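The parameter saving comes from weight sharing in the Hamilton product: a quaternion layer mapping n quaternion units to m needs 4nm real weights, whereas an unconstrained real layer of the same width needs 16nm, which is roughly the 75% reduction mentioned. A minimal sketch (the component layout is an assumption, not QUARC's actual code):

```python
import numpy as np

def quaternion_linear(x, w):
    """Apply a quaternion 'dense' layer via the Hamilton product w * x.

    x: (4, n)    input with component rows (r, i, j, k).
    w: (4, m, n) weights with component order (r, i, j, k).
    The four weight components are reused across all four output
    components, giving 4nm parameters instead of 16nm.
    """
    r, i, j, k = x
    wr, wi, wj, wk = w
    return np.stack([
        wr @ r - wi @ i - wj @ j - wk @ k,  # real part
        wr @ i + wi @ r + wj @ k - wk @ j,  # i part
        wr @ j - wi @ k + wj @ r + wk @ i,  # j part
        wr @ k + wi @ j - wj @ i + wk @ r,  # k part
    ])
```

A sanity check is the quaternion identity j * i = -k, which the layer reproduces when w encodes the unit quaternion j and x encodes i.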

19. *-CFQ: Analyzing the Scalability of Machine Learning on a Compositional Task [PDF] Back to Contents
  Dmitry Tsarkov, Tibor Tihon, Nathan Scales, Nikola Momchev, Danila Sinopalnikov, Nathanael Schärli
Abstract: We present *-CFQ ("star-CFQ"): a suite of large-scale datasets of varying scope based on the CFQ semantic parsing benchmark, designed for principled investigation of the scalability of machine learning systems in a realistic compositional task setting. Using this suite, we conduct a series of experiments investigating the ability of Transformers to benefit from increased training size under conditions of fixed computational cost. We show that compositional generalization remains a challenge at all training sizes, and we show that increasing the scope of natural language leads to consistently higher error rates, which are only partially offset by increased training data. We further show that while additional training data from a related domain improves the accuracy in data-starved situations, this improvement is limited and diminishes as the distance from the related domain to the target domain increases.

20. Generation of complex database queries and API calls from natural language utterances [PDF] 返回目录
  Amol Kelkar, Nachiketa Rajpurohit, Utkarsh Mittal, Peter Relan
Abstract: Generating queries corresponding to natural language questions is a long-standing problem. Traditional methods lack language flexibility, while newer sequence-to-sequence models require large amounts of data. Schema-agnostic sequence-to-sequence models can be fine-tuned for a specific schema using a small dataset, but these models have relatively low accuracy. We present a method that transforms the query generation problem into an intent classification and slot filling problem, and that works with small datasets. For questions similar to those in the training dataset, it produces complex queries with high accuracy. For other questions, it can use a template-based approach or predict query pieces to construct the queries, still at higher accuracy than sequence-to-sequence models. On a real-world dataset, a schema-fine-tuned state-of-the-art generative model achieved 60\% exact match accuracy on the query generation task, while our method achieved 92\% exact match accuracy.
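As a rough illustration of the intent classification and slot filling formulation, the sketch below fills a per-intent SQL template with predicted slot values. The intents, templates, and schema are hypothetical, and the classifier and slot filler are assumed to run upstream; this is not the paper's actual system:

```python
# Hypothetical per-intent templates; in a real system these would be
# authored (or predicted piece-by-piece) for the target schema.
SQL_TEMPLATES = {
    "count_orders_by_status": "SELECT COUNT(*) FROM orders WHERE status = :status",
    "orders_in_range": "SELECT * FROM orders WHERE created_at BETWEEN :start AND :end",
}

def build_query(intent: str, slots: dict) -> str:
    """Construct a query from a predicted intent and filled slots."""
    template = SQL_TEMPLATES[intent]
    for name, value in slots.items():
        # Naive string substitution for illustration only; production code
        # should pass bound parameters to the database driver instead.
        template = template.replace(f":{name}", repr(value))
    return template

q = build_query("count_orders_by_status", {"status": "shipped"})
# q == "SELECT COUNT(*) FROM orders WHERE status = 'shipped'"
```

Once the problem is reduced to picking a template and filling its slots, a small labeled dataset can suffice, since only the intent label and slot spans need to be learned rather than the full query string.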

21. Relation-Aware Neighborhood Matching Model for Entity Alignment [PDF] 返回目录
  Yao Zhu, Hongzhi Liu, Zhonghai Wu, Yingpeng Du
Abstract: Entity alignment, which aims at linking entities with the same meaning across different knowledge graphs (KGs), is a vital step in knowledge fusion. Existing research has focused on learning entity embeddings from the structural information of KGs for entity alignment. These methods can aggregate information from neighboring nodes but may also bring in noise from neighbors. Most recently, several researchers have attempted to compare neighboring nodes in pairs to enhance entity alignment. However, they ignored the relations between entities, which are also important for neighborhood matching. In addition, existing methods have paid little attention to the positive interactions between entity alignment and relation alignment. To deal with these issues, we propose a novel Relation-aware Neighborhood Matching model, named RNM, for entity alignment. Specifically, we propose to utilize neighborhood matching to enhance entity alignment: besides comparing neighbor nodes when matching neighborhoods, we also explore useful information from the connecting relations. Moreover, an iterative framework is designed to leverage the positive interactions between entity alignment and relation alignment in a semi-supervised manner. Experimental results on three real-world datasets demonstrate that the proposed RNM model outperforms state-of-the-art methods.
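A toy sketch of the relation-aware neighborhood matching idea (not the paper's actual RNM model): a candidate entity pair is scored by counting (relation, neighbor) edges that match under the current seed alignments of both entities and relations. In the paper's iterative framework, confidently aligned pairs would then be fed back into these seeds. All names below are hypothetical:

```python
def neighborhood_match_score(edges1, edges2, entity_seed, relation_seed):
    """Score a candidate entity pair by relation-aware neighborhood matching.
    edges1/edges2: sets of (relation, neighbor) pairs, one per entity.
    entity_seed / relation_seed: dicts mapping KG1 ids -> aligned KG2 ids."""
    matched = sum(
        1
        for rel1, nb1 in edges1
        # An edge matches only if BOTH its relation and its neighbor
        # map (under the seeds) to an edge of the other entity.
        if (relation_seed.get(rel1), entity_seed.get(nb1)) in edges2
    )
    denom = max(len(edges1), len(edges2))
    return matched / denom if denom else 0.0

edges_e1 = {("born_in", "paris"), ("works_for", "acme")}
edges_e2 = {("né_à", "paris_fr"), ("emploi", "acme_fr")}
seed_entities = {"paris": "paris_fr", "acme": "acme_fr"}
seed_relations = {"born_in": "né_à", "works_for": "emploi"}
score = neighborhood_match_score(edges_e1, edges_e2, seed_entities, seed_relations)
# score == 1.0: both neighbor edges match under the seed alignments
```

Requiring the relation to match as well as the neighbor is what filters out the noisy neighbors that plain aggregation-based embedding methods pick up.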

22. A review of on-device fully neural end-to-end automatic speech recognition algorithms [PDF] 返回目录
  Chanwoo Kim, Dhananjaya Gowda, Dongsoo Lee, Jiyeon Kim, Ankur Kumar, Sungsoo Kim, Abhinav Garg, Changwoo Han
Abstract: In this paper, we review various end-to-end automatic speech recognition algorithms and their optimization techniques for on-device applications. Conventional speech recognition systems comprise a large number of discrete components, such as an acoustic model, a language model, a pronunciation model, a text normalizer, an inverse text normalizer, and a decoder based on a Weighted Finite State Transducer (WFST). To obtain sufficiently high speech recognition accuracy with such conventional systems, a very large language model (up to 100 GB) is usually needed; the corresponding WFST then becomes enormous, which prohibits on-device implementation. Recently, fully neural end-to-end speech recognition algorithms have been proposed. Examples include systems based on Connectionist Temporal Classification (CTC), the Recurrent Neural Network Transducer (RNN-T), Attention-based Encoder-Decoder models (AED), Monotonic Chunkwise Attention (MoChA), and Transformers. These fully neural network-based systems require much smaller memory footprints than conventional algorithms, so their on-device implementation has become feasible. In this paper, we review such end-to-end speech recognition models and extensively discuss their structures, performance, and advantages compared to conventional algorithms.
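As a concrete example of the simplest inference step in CTC-based systems from this family, the sketch below performs greedy CTC decoding: take the frame-wise argmax, collapse consecutive repeats, and drop blank symbols. The three-symbol label inventory is hypothetical:

```python
import numpy as np

def ctc_greedy_decode(log_probs, blank=0):
    """Greedy (best-path) CTC decoding.
    log_probs: (T, V) per-frame log-probabilities over V labels.
    Collapses consecutive repeated labels, then removes blanks."""
    best = np.argmax(log_probs, axis=-1)
    decoded, prev = [], None
    for t in best:
        if t != prev and t != blank:
            decoded.append(int(t))
        prev = t
    return decoded

# Hypothetical inventory: 0 = blank, 1 = 'a', 2 = 'b'.
frames = np.log(np.array([
    [0.10, 0.80, 0.10],  # 'a'
    [0.10, 0.70, 0.20],  # 'a' (repeat, collapsed)
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.80],  # 'b'
    [0.20, 0.10, 0.70],  # 'b' (repeat, collapsed)
]))
ctc_greedy_decode(frames)  # → [1, 2], i.e. "ab"
```

This blank-and-collapse rule is what lets a CTC model emit a short label sequence from a long frame sequence without an external WFST decoder, which is central to the memory savings the review discusses.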
