
[arXiv Papers] Computation and Language 2020-12-29

Contents

1. Universal Sentence Representation Learning with Conditional Masked Language Model [PDF] Abstract
2. DeepHateExplainer: Explainable Hate Speech Detection in Under-resourced Bengali Language [PDF] Abstract
3. BURT: BERT-inspired Universal Representation from Learning Meaningful Segment [PDF] Abstract
4. Panarchy: ripples of a boundary concept [PDF] Abstract
5. Towards Fully Automated Manga Translation [PDF] Abstract
6. Red Dragon AI at TextGraphs 2020 Shared Task: LIT : LSTM-Interleaved Transformer for Multi-Hop Explanation Ranking [PDF] Abstract
7. On Generating Extended Summaries of Long Documents [PDF] Abstract
8. Neural Text Generation with Artificial Negative Examples [PDF] Abstract
9. Syntax-Enhanced Pre-trained Model [PDF] Abstract
10. Pivot Through English: Reliably Answering Multilingual Questions without Document Retrieval [PDF] Abstract
11. Automatic Curriculum Learning With Over-repetition Penalty for Dialogue Policy Learning [PDF] Abstract
12. ALP-KD: Attention-Based Layer Projection for Knowledge Distillation [PDF] Abstract
13. SMART: A Situation Model for Algebra Story Problems via Attributed Grammar [PDF] Abstract
14. Explaining NLP Models via Minimal Contrastive Editing (MiCE) [PDF] Abstract
15. MeDAL: Medical Abbreviation Disambiguation Dataset for Natural Language Understanding Pretraining [PDF] Abstract
16. Adaptive Convolution for Semantic Role Labeling [PDF] Abstract
17. SG-Net: Syntax Guided Transformer for Language Representation [PDF] Abstract
18. An Embarrassingly Simple Model for Dialogue Relation Extraction [PDF] Abstract
19. My Teacher Thinks The World Is Flat! Interpreting Automatic Essay Scoring Mechanism [PDF] Abstract
20. Learning Light-Weight Translation Models from Deep Transformer [PDF] Abstract
21. Inserting Information Bottlenecks for Attribution in Transformers [PDF] Abstract
22. Fine-grained Emotion and Intent Learning in Movie Dialogues [PDF] Abstract
23. LOREN: Logic Enhanced Neural Reasoning for Fact Verification [PDF] Abstract
24. Contextual Temperature for Language Modeling [PDF] Abstract
25. Towards a Universal Continuous Knowledge Base [PDF] Abstract
26. Why Neural Machine Translation Prefers Empty Outputs [PDF] Abstract
27. ThamizhiUDp: A Dependency Parser for Tamil [PDF] Abstract
28. Mechanism of Evolution Shared by Gene and Language [PDF] Abstract
29. The Curse of Dense Low-Dimensional Information Retrieval for Large Index Sizes [PDF] Abstract
30. Neural document expansion for ad-hoc information retrieval [PDF] Abstract
31. Measuring University Impact: Wikipedia approach [PDF] Abstract
32. Improving Opinion Spam Detection by Cumulative Relative Frequency Distribution [PDF] Abstract
33. Translating Natural Language Instructions to Computer Programs for Robot Manipulation [PDF] Abstract
34. Spatial Reasoning from Natural Language Instructions for Robot Manipulation [PDF] Abstract
35. Social media data reveals signal for public consumer perceptions [PDF] Abstract
36. Learning by Fixing: Solving Math Word Problems with Weak Supervision [PDF] Abstract

Abstracts

1. Universal Sentence Representation Learning with Conditional Masked Language Model [PDF] Back to Contents
  Ziyi Yang, Yinfei Yang, Daniel Cer, Jax Law, Eric Darve
Abstract: This paper presents a novel training method, Conditional Masked Language Modeling (CMLM), to effectively learn sentence representations on large-scale unlabeled corpora. CMLM integrates sentence representation learning into MLM training by conditioning on the encoded vectors of adjacent sentences. Our English CMLM model achieves state-of-the-art performance on SentEval, even outperforming models learned using (semi-)supervised signals. As a fully unsupervised learning method, CMLM can be conveniently extended to a broad range of languages and domains. We find that a multilingual CMLM model co-trained with bitext retrieval (BR) and natural language inference (NLI) tasks outperforms the previous state-of-the-art multilingual models by a large margin. We explore the same-language bias of the learned representations, and propose a principal-component-based approach to remove the language-identifying information from the representation while still retaining sentence semantics.
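
The conditioning idea can be pictured with a small, self-contained sketch (all sizes, module choices and names below are invented for illustration and are not the authors' implementation): a sentence encoder turns the neighbouring sentences into one vector, and that vector is prepended to the masked sentence so the MLM head can attend to it.

import torch
import torch.nn as nn

class ToyCMLM(nn.Module):
    """Toy illustration: MLM prediction conditioned on a context-sentence vector."""
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.context_encoder = nn.GRU(dim, dim, batch_first=True)
        self.mlm_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, masked_sent, context_sent):
        # Encode the neighbouring sentences into a single conditioning vector.
        _, h = self.context_encoder(self.embed(context_sent))   # (1, B, dim)
        cond = h.transpose(0, 1)                                 # (B, 1, dim)
        # Prepend the conditioning vector so every masked position can attend to it.
        x = torch.cat([cond, self.embed(masked_sent)], dim=1)
        x = self.mlm_layer(x)
        return self.out(x[:, 1:])                                # logits for the masked sentence

model = ToyCMLM()
masked = torch.randint(0, 1000, (2, 12))   # batch of masked sentences
context = torch.randint(0, 1000, (2, 20))  # concatenated adjacent sentences
print(model(masked, context).shape)        # torch.Size([2, 12, 1000])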

2. DeepHateExplainer: Explainable Hate Speech Detection in Under-resourced Bengali Language [PDF] Back to Contents
  Md. Rezaul Karim, Sumon Kanti Dey, Bharathi Raja Chakravarthi
Abstract: The exponential growth of social media and micro-blogging sites not only provides platforms for empowering freedom of expression and individual voices, but also enables people to express anti-social behavior like online harassment, cyberbullying, and hate speech. Numerous works have been proposed to utilize these data for social and anti-social behavior analysis, mostly by predicting contexts for highly-resourced languages like English. However, some languages, such as Bengali, are under-resourced and lack computational resources for natural language processing (NLP). In this paper, we propose an explainable approach for hate speech detection in the under-resourced Bengali language, which we call DeepHateExplainer. In our approach, Bengali texts are first comprehensively preprocessed, before being classified into political, personal, geopolitical, and religious hates, by employing a neural ensemble of different transformer-based architectures (i.e., monolingual Bangla BERT-base, multilingual BERT cased and uncased, and XLM-RoBERTa), followed by identifying important terms with sensitivity analysis and layer-wise relevance propagation (LRP) to provide human-interpretable explanations. Evaluations against several machine learning (linear and tree-based models) and deep neural network (i.e., CNN, Bi-LSTM, and Conv-LSTM with word embeddings) baselines yield F1 scores of 84%, 90%, 88%, and 88% for political, personal, geopolitical, and religious hates, respectively, during 3-fold cross-validation tests.

3. BURT: BERT-inspired Universal Representation from Learning Meaningful Segment [PDF] Back to Contents
  Yian Li, Hai Zhao
Abstract: Although pre-trained contextualized language models such as BERT achieve significant performance on various downstream tasks, current language representation still focuses only on a linguistic objective at a specific granularity, which may not be applicable when multiple levels of linguistic units are involved at the same time. This work therefore introduces and explores universal representation learning, i.e., embeddings of different levels of linguistic units in a uniform vector space. We present a universal representation model, BURT (BERT-inspired Universal Representation from learning meaningful segmenT), to encode different levels of linguistic units into the same vector space. Specifically, we extract and mask meaningful segments based on point-wise mutual information (PMI) to incorporate different granular objectives into the pre-training stage. We conduct experiments on datasets for English and Chinese, including the GLUE and CLUE benchmarks, where our model surpasses its baselines and alternatives on a wide range of downstream tasks. We present our approach of constructing analogy datasets in terms of words, phrases and sentences and experiment with multiple representation models to examine geometric properties of the learned vector space through a task-independent evaluation. Finally, we verify the effectiveness of our unified pre-training strategy in two real-world text matching scenarios. As a result, our model significantly outperforms existing information retrieval (IR) methods and yields universal representations that can be directly applied to retrieval-based question-answering and natural language generation tasks.
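
The PMI-driven segment masking can be illustrated with a toy example (the corpus, threshold and bigram-only restriction are made up; the paper's segment extraction is more general):

import math
from collections import Counter

corpus = [
    "new york is a big city".split(),
    "she moved to new york last year".split(),
    "the city is big".split(),
]
unigrams, bigrams, total = Counter(), Counter(), 0
for sent in corpus:
    unigrams.update(sent)
    bigrams.update(zip(sent, sent[1:]))
    total += len(sent)

def pmi(w1, w2):
    # Point-wise mutual information of a bigram under the toy corpus statistics.
    p_xy = bigrams[(w1, w2)] / sum(bigrams.values())
    p_x, p_y = unigrams[w1] / total, unigrams[w2] / total
    return math.log(p_xy / (p_x * p_y)) if p_xy > 0 else float("-inf")

def mask_segments(sent, threshold=1.0):
    """Mask a whole bigram as one segment when its PMI exceeds the threshold."""
    out, i = [], 0
    while i < len(sent):
        if i + 1 < len(sent) and pmi(sent[i], sent[i + 1]) > threshold:
            out.append("[MASK]")   # one mask covers the meaningful segment
            i += 2
        else:
            out.append(sent[i])
            i += 1
    return out

print(mask_segments("she loves new york".split()))  # ['she', 'loves', '[MASK]']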

4. Panarchy: ripples of a boundary concept [PDF] Back to Contents
  Juan Rocha, Linda Luvuno, Jesse Rieb, Erin Crockett, Katja Malmborg, Michael Schoon, Garry Peterson
Abstract: How do social-ecological systems change over time? In 2002 Holling and colleagues proposed the concept of Panarchy, which presented social-ecological systems as an interacting set of adaptive cycles, each of which is produced by the dynamic tensions between novelty and efficiency at multiple scales. Initially introduced as a conceptual framework and set of metaphors, panarchy has gained the attention of scholars across many disciplines and its ideas continue to inspire further conceptual developments. Almost twenty years after this concept was introduced we review how it has been used, tested, extended and revised. We do this by combining qualitative methods and machine learning. Document analysis was used to code panarchy features that are commonly used in the scientific literature (N = 42), a qualitative analysis that was complemented with topic modeling of 2177 documents. We find that the adaptive cycle is the feature of panarchy that has attracted the most attention. Challenges remain in empirically grounding the metaphor, but recent theoretical and empirical work offers some avenues for future research.

5. Towards Fully Automated Manga Translation [PDF] Back to Contents
  Ryota Hinami, Shonosuke Ishiwatari, Kazuhiko Yasuda, Yusuke Matsui
Abstract: We tackle the problem of machine translation of manga (Japanese comics). Manga translation involves two important problems in machine translation: context-aware and multimodal translation. Since text and images are mixed in an unstructured fashion in manga, obtaining context from the image is essential for manga translation. However, how to extract context from an image and integrate it into MT models remains an open problem. In addition, corpora and benchmarks to train and evaluate such models are currently unavailable. In this paper, we make the following four contributions that establish the foundation of manga translation research. First, we propose a multimodal context-aware translation framework. We are the first to incorporate context information obtained from the manga image. It enables us to translate texts in speech bubbles that cannot be translated without using context information (e.g., texts in other speech bubbles, gender of speakers, etc.). Second, for training the model, we propose an approach to automatic corpus construction from pairs of original manga and their translations, by which a large parallel corpus can be constructed without any manual labeling. Third, we created a new benchmark to evaluate manga translation. Finally, on top of our proposed methods, we devised the first comprehensive system for fully automated manga translation.

6. Red Dragon AI at TextGraphs 2020 Shared Task: LIT : LSTM-Interleaved Transformer for Multi-Hop Explanation Ranking [PDF] Back to Contents
  Yew Ken Chia, Sam Witteveen, Martin Andrews
Abstract: Explainable question answering for science questions is a challenging task that requires multi-hop inference over a large set of fact sentences. To counter the limitations of methods that view each query-document pair in isolation, we propose the LSTM-Interleaved Transformer which incorporates cross-document interactions for improved multi-hop ranking. The LIT architecture can leverage prior ranking positions in the re-ranking setting. Our model is competitive on the current leaderboard for the TextGraphs 2020 shared task, achieving a test-set MAP of 0.5607, and would have gained third place had we submitted before the competition deadline. Our code implementation is made available at this https URL

7. On Generating Extended Summaries of Long Documents [PDF] Back to Contents
  Sajad Sotudeh, Arman Cohan, Nazli Goharian
Abstract: Prior work in document summarization has mainly focused on generating short summaries of a document. While this type of summary helps get a high-level view of a given document, it is desirable in some cases to know more detailed information about its salient points that can't fit in a short summary. This is typically the case for longer documents such as a research paper, legal document, or a book. In this paper, we present a new method for generating extended summaries of long papers. Our method exploits the hierarchical structure of the documents and incorporates it into an extractive summarization model through a multi-task learning approach. We then present our results on three long summarization datasets, arXiv-Long, PubMed-Long, and Longsumm. Our method outperforms or matches the performance of strong baselines. Furthermore, we perform a comprehensive analysis over the generated results, offering insights for future research on long-form summary generation. Our analysis shows that our multi-tasking approach can adjust the extraction probability distribution in favor of summary-worthy sentences across diverse sections. Our datasets and code are publicly available at this https URL

8. Neural Text Generation with Artificial Negative Examples [PDF] Back to Contents
  Keisuke Shirai, Kazuma Hashimoto, Akiko Eriguchi, Takashi Ninomiya, Shinsuke Mori
Abstract: Neural text generation models conditioning on given input (e.g. machine translation and image captioning) are usually trained by maximum likelihood estimation of target text. However, the trained models suffer from various types of errors at inference time. In this paper, we propose to suppress an arbitrary type of errors by training the text generation model in a reinforcement learning framework, where we use a trainable reward function that is capable of discriminating between references and sentences containing the targeted type of errors. We create such negative examples by artificially injecting the targeted errors to the references. In experiments, we focus on two error types, repeated and dropped tokens in model-generated text. The experimental results show that our method can suppress the generation errors and achieve significant improvements on two machine translation and two image captioning tasks.
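
A rough sketch of how such artificial negatives could be synthesised from a reference (the corruption rates below are illustrative, not the paper's settings):

import random

def corrupt(tokens, p_repeat=0.1, p_drop=0.1, seed=0):
    """Inject repeated-token and dropped-token errors into a reference."""
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        if rng.random() < p_drop:
            continue                 # dropped-token error
        out.append(tok)
        if rng.random() < p_repeat:
            out.append(tok)          # repeated-token error
    return out

reference = "the cat sat on the mat".split()
negative = corrupt(reference, p_repeat=0.3, p_drop=0.2)
print(" ".join(negative))
# A trainable reward model is then taught to prefer `reference` over `negative`.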

9. Syntax-Enhanced Pre-trained Model [PDF] Back to Contents
  Zenan Xu, Daya Guo, Duyu Tang, Qinliang Su, Linjun Shou, Ming Gong, Wanjun Zhong, Xiaojun Quan, Nan Duan, Daxin Jiang
Abstract: We study the problem of leveraging the syntactic structure of text to enhance pre-trained models such as BERT and RoBERTa. Existing methods utilize syntax of text either in the pre-training stage or in the fine-tuning stage, so that they suffer from discrepancy between the two stages. Such a problem would lead to the necessity of having human-annotated syntactic information, which limits the application of existing methods to broader scenarios. To address this, we present a model that utilizes the syntax of text in both pre-training and fine-tuning stages. Our model is based on Transformer with a syntax-aware attention layer that considers the dependency tree of the text. We further introduce a new pre-training task of predicting the syntactic distance among tokens in the dependency tree. We evaluate the model on three downstream tasks, including relation classification, entity typing, and question answering. Results show that our model achieves state-of-the-art performance on six public benchmark datasets. We have two major findings. First, we demonstrate that infusing automatically produced syntax of text improves pre-trained models. Second, global syntactic distances among tokens bring larger performance gains compared to local head relations between contiguous tokens.
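
For the new pre-training task, the target can be read off a dependency tree as the path length between two tokens; a small illustrative computation follows (the tree is hand-made, not the paper's parser output):

from collections import deque

# Head of each token in a toy dependency tree; None marks the root.
heads = {"sat": None, "cat": "sat", "the": "cat", "on": "sat", "mat": "on", "a": "mat"}

def syntactic_distance(w1, w2):
    """Breadth-first search over the undirected dependency tree."""
    edges = {w: set() for w in heads}
    for child, head in heads.items():
        if head is not None:
            edges[child].add(head)
            edges[head].add(child)
    seen, queue = {w1}, deque([(w1, 0)])
    while queue:
        node, d = queue.popleft()
        if node == w2:
            return d
        for nxt in edges[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return -1

print(syntactic_distance("the", "mat"))  # the -> cat -> sat -> on -> mat = 4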

10. Pivot Through English: Reliably Answering Multilingual Questions without Document Retrieval [PDF] Back to Contents
  Ivan Montero, Shayne Longpre, Ni Lao, Andrew J. Frank, Christopher DuBois
Abstract: Existing methods for open-retrieval question answering in lower resource languages (LRLs) lag significantly behind English. They not only suffer from the shortcomings of non-English document retrieval, but are reliant on language-specific supervision for either the task or translation. We formulate a task setup more realistic to available resources, that circumvents document retrieval to reliably transfer knowledge from English to lower resource languages. Assuming a strong English question answering model or database, we compare and analyze methods that pivot through English: to map foreign queries to English and then English answers back to target language answers. Within this task setup we propose Reranked Multilingual Maximal Inner Product Search (RM-MIPS), akin to semantic similarity retrieval over the English training set with reranking, which outperforms the strongest baselines by 2.7% on XQuAD and 6.2% on MKQA. Analysis demonstrates the particular efficacy of this strategy over state-of-the-art alternatives in challenging settings: low-resource languages, with extensive distractor data and query distribution misalignment. Circumventing retrieval, our analysis shows this approach offers rapid answer generation to almost any language off-the-shelf, without the need for any additional training data in the target language.
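
The pivoting idea can be sketched with toy vectors (the embeddings, data and the simplified re-ranking step below are made up; RM-MIPS uses a trained multilingual encoder and a semantic-similarity re-ranker):

import numpy as np

english_questions = ["who wrote hamlet", "capital of france", "speed of light"]
english_answers = ["Shakespeare", "Paris", "299792458 m/s"]
question_vecs = np.eye(3, 8)                 # stand-in for a multilingual encoder

def answer(foreign_query_vec, top_k=2):
    scores = question_vecs @ foreign_query_vec        # maximal inner product search
    candidates = np.argsort(-scores)[:top_k]
    # RM-MIPS re-ranks these candidates with a semantic-similarity model;
    # here the re-ranking step is just a placeholder over the same scores.
    best = max(candidates, key=lambda i: scores[i])
    return english_answers[best]

# Pretend this is the encoding of "capitale de la France".
foreign_query = question_vecs[1] + 0.05
print(answer(foreign_query))                 # -> Paris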

11. Automatic Curriculum Learning With Over-repetition Penalty for Dialogue Policy Learning [PDF] Back to Contents
  Yangyang Zhao, Zhenyu Wang, Zhenhua Huang
Abstract: Dialogue policy learning based on reinforcement learning is difficult to apply to real users for training dialogue agents from scratch because of the high cost. User simulators, which choose random user goals for the dialogue agent to train on, have been considered an affordable substitute for real users. However, this random sampling method ignores the law of human learning, making the learned dialogue policy inefficient and unstable. We propose a novel framework, Automatic Curriculum Learning-based Deep Q-Network (ACL-DQN), which replaces the traditional random sampling method with a teacher policy model to realize a dialogue policy for automatic curriculum learning. The teacher model arranges a meaningful ordered curriculum and automatically adjusts it by monitoring the learning progress of the dialogue agent and the over-repetition penalty, without any requirement of prior knowledge. The learning progress of the dialogue agent reflects the relationship between the dialogue agent's ability and the difficulty of the sampled goals, for sample efficiency. The over-repetition penalty guarantees sampled diversity. Experiments show that the ACL-DQN significantly improves the effectiveness and stability of dialogue tasks with a statistically significant margin. Furthermore, the framework can be further improved by equipping it with different curriculum schedules, which demonstrates that the framework has strong generalizability.

12. ALP-KD: Attention-Based Layer Projection for Knowledge Distillation [PDF] Back to Contents
  Peyman Passban, Yimeng Wu, Mehdi Rezagholizadeh, Qun Liu
Abstract: Knowledge distillation is considered as a training and compression strategy in which two neural networks, namely a teacher and a student, are coupled together during training. The teacher network is supposed to be a trustworthy predictor and the student tries to mimic its predictions. Usually, a student with a lighter architecture is selected so we can achieve compression and yet deliver high-quality results. In such a setting, distillation only happens for final predictions whereas the student could also benefit from teacher's supervision for internal components. Motivated by this, we studied the problem of distillation for intermediate layers. Since there might not be a one-to-one alignment between student and teacher layers, existing techniques skip some teacher layers and only distill from a subset of them. This shortcoming directly impacts quality, so we instead propose a combinatorial technique which relies on attention. Our model fuses teacher-side information and takes each layer's significance into consideration, then performs distillation between combined teacher layers and those of the student. Using our technique, we distilled a 12-layer BERT (Devlin et al. 2019) into 6-, 4-, and 2-layer counterparts and evaluated them on GLUE tasks (Wang et al. 2018). Experimental results show that our combinatorial approach is able to outperform other existing techniques.
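
A minimal sketch of the attention-based projection (the dimensions and the MSE objective are illustrative assumptions): every student layer attends over all teacher layers and is distilled towards the attention-weighted mixture rather than towards one fixed teacher layer.

import torch

torch.manual_seed(0)
teacher_layers = [torch.randn(4, 16) for _ in range(12)]  # 12 layers, batch 4, dim 16
student_layers = [torch.randn(4, 16) for _ in range(4)]   # 4-layer student

def alp_loss(student_layers, teacher_layers):
    teacher = torch.stack(teacher_layers, dim=1)           # (batch, 12, dim)
    loss = 0.0
    for s in student_layers:
        # Attention of this student layer over all teacher layers.
        attn = torch.softmax(teacher @ s.unsqueeze(-1) / 16 ** 0.5, dim=1)  # (batch, 12, 1)
        mixed = (attn * teacher).sum(dim=1)                 # attention-weighted teacher layer
        loss = loss + torch.nn.functional.mse_loss(s, mixed)
    return loss / len(student_layers)

print(alp_loss(student_layers, teacher_layers))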

13. SMART: A Situation Model for Algebra Story Problems via Attributed Grammar [PDF] Back to Contents
  Yining Hong, Qing Li, Ran Gong, Daniel Ciao, Siyuan Huang, Song-Chun Zhu
Abstract: Solving algebra story problems remains a challenging task in artificial intelligence, which requires a detailed understanding of real-world situations and a strong mathematical reasoning capability. Previous neural solvers of math word problems directly translate problem texts into equations, lacking an explicit interpretation of the situations, and often fail to handle more sophisticated situations. To address such limits of neural solvers, we introduce the concept of a situation model, which originates from psychology studies to represent the mental states of humans in problem-solving, and propose SMART, which adopts attributed grammar as the representation of situation models for algebra story problems. Specifically, we first train an information extraction module to extract nodes, attributes, and relations from problem texts and then generate a parse graph based on a pre-defined attributed grammar. An iterative learning strategy is also proposed to improve the performance of SMART further. To rigorously study this task, we carefully curate a new dataset named ASP6.6k. Experimental results on ASP6.6k show that the proposed model outperforms all previous neural solvers by a large margin while preserving much better interpretability. To test these models' generalization capability, we also design an out-of-distribution (OOD) evaluation, in which problems are more complex than those in the training set. Our model exceeds state-of-the-art models by 17% in the OOD evaluation, demonstrating its superior generalization ability.

14. Explaining NLP Models via Minimal Contrastive Editing (MiCE) [PDF] Back to Contents
  Alexis Ross, Ana Marasović, Matthew E. Peters
Abstract: Humans give contrastive explanations that explain why an observed event happened rather than some other counterfactual event (the contrast case). Despite the important role that contrastivity plays in how people generate and evaluate explanations, this property is largely missing from current methods for explaining NLP models. We present Minimal Contrastive Editing (MiCE), a method for generating contrastive explanations of model predictions in the form of edits to inputs that change model outputs to the contrast case. Our experiments across three tasks -- binary sentiment classification, topic classification, and multiple-choice question answering -- show that MiCE is able to produce edits that are not only contrastive, but also minimal and fluent, consistent with human contrastive edits. We demonstrate how MiCE edits can be used for two use cases in NLP system development -- uncovering dataset artifacts and debugging incorrect model predictions -- and thereby illustrate that generating contrastive explanations is a promising research direction for model interpretability.

15. MeDAL: Medical Abbreviation Disambiguation Dataset for Natural Language Understanding Pretraining [PDF] Back to Contents
  Zhi Wen, Xing Han Lu, Siva Reddy
Abstract: One of the biggest challenges that prohibit the use of many current NLP methods in clinical settings is the availability of public datasets. In this work, we present MeDAL, a large medical text dataset curated for abbreviation disambiguation, designed for natural language understanding pre-training in the medical domain. We pre-trained several models of common architectures on this dataset and empirically showed that such pre-training leads to improved performance and convergence speed when fine-tuning on downstream medical tasks.

16. Adaptive Convolution for Semantic Role Labeling [PDF] Back to Contents
  Kashif Munir, Hai Zhao, Zuchao Li
Abstract: Semantic role labeling (SRL) aims at elaborating the meaning of a sentence by forming a predicate-argument structure. Recent research has shown that the effective use of syntax can improve SRL performance. However, syntax is a complicated linguistic clue and is hard to apply effectively in a downstream task like SRL. This work effectively encodes syntax using adaptive convolution, which endows strong flexibility to existing convolutional networks. Existing CNNs may help in encoding a complicated structure like syntax for SRL, but they still have shortcomings. Contrary to traditional convolutional networks that use the same filters for different inputs, adaptive convolution uses adaptively generated filters conditioned on syntactically informed inputs. We achieve this with the integration of a filter generation network which generates the input-specific filters. This helps the model to focus on important syntactic features present inside the input, thus enlarging the gap between syntax-aware and syntax-agnostic SRL systems. We further study a hashing technique to compress the size of the filter generation network for SRL in terms of trainable parameters. Experiments on the CoNLL-2009 dataset confirm that the proposed model substantially outperforms most previous SRL systems for both English and Chinese.

17. SG-Net: Syntax Guided Transformer for Language Representation [PDF] Back to Contents
  Zhuosheng Zhang, Yuwei Wu, Junru Zhou, Sufeng Duan, Hai Zhao, Rui Wang
Abstract: Understanding human language is one of the key themes of artificial intelligence. For language representation, the capacity to effectively model the linguistic knowledge in detail-riddled and lengthy texts while discarding the noise is essential to improving performance. Traditional attentive models attend to all words without explicit constraint, which results in inaccurate concentration on some dispensable words. In this work, we propose using syntax to guide text modeling by incorporating explicit syntactic constraints into attention mechanisms for better linguistically motivated word representations. In detail, for the Transformer-based encoder built on the self-attention network (SAN), we introduce a syntactic dependency of interest (SDOI) design into the SAN to form an SDOI-SAN with syntax-guided self-attention. The syntax-guided network (SG-Net) is then composed of this extra SDOI-SAN and the SAN from the original Transformer encoder through a dual contextual architecture for better linguistics-inspired representation. The proposed SG-Net is applied to typical Transformer encoders. Extensive experiments on popular benchmark tasks, including machine reading comprehension, natural language inference, and neural machine translation, show the effectiveness of the proposed SG-Net design.
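
One way to picture syntax-guided self-attention is as a mask over attention scores built from the dependency tree; the sketch below lets each token attend to itself and its chain of heads (an illustrative reading, not necessarily the exact SDOI definition):

import numpy as np

tokens = ["the", "cat", "sat", "on", "mat"]
heads = [1, 2, -1, 2, 3]        # head index per token, -1 marks the root

n = len(tokens)
mask = np.zeros((n, n), dtype=bool)
for i in range(n):
    j = i
    while j != -1:              # walk up the head chain
        mask[i, j] = True
        j = heads[j]

scores = np.random.randn(n, n)                   # raw self-attention scores
scores = np.where(mask, scores, -1e9)            # block syntactically irrelevant pairs
attn = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)
print(attn.round(2))                             # each row attends only along the head chain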

18. An Embarrassingly Simple Model for Dialogue Relation Extraction [PDF] Back to Contents
  Fuzhao Xue, Aixin Sun, Hao Zhang, Eng Siong Chng
Abstract: Dialogue relation extraction (RE) is to predict the relation type of two entities mentioned in a dialogue. In this paper, we model Dialogue RE as a multi-label classification task and propose a simple yet effective model named SimpleRE. SimpleRE captures the interrelations among multiple relations in a dialogue through a novel input format, BERT Relation Token Sequence (BRS). In BRS, multiple [CLS] tokens are used to capture different relations between different pairs of entities. A Relation Refinement Gate (RRG) is designed to extract relation-specific semantic representation adaptively. Experiments on DialogRE show that SimpleRE achieves the best performance with much shorter training time. SimpleRE outperforms all direct baselines on sentence-level RE without using external resources.
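
The flavour of the BRS input can be sketched as follows (the exact template and special-token layout used by SimpleRE may differ; this only shows the idea of packing one [CLS] slot per entity pair into a single sequence):

# Hypothetical BRS-style input construction.
dialogue = "Speaker1: Alice is my sister . Speaker2: She works at Acme ."
entity_pairs = [("Alice", "Speaker1"), ("Alice", "Acme")]

segments = [f"[CLS] {head} [SEP] {tail}" for head, tail in entity_pairs]
brs_input = " ".join(segments) + f" [SEP] {dialogue} [SEP]"
print(brs_input)
# Each [CLS] position is later read out by a classifier head (with the Relation
# Refinement Gate on top) to predict the relation type for its entity pair.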

19. My Teacher Thinks The World Is Flat! Interpreting Automatic Essay Scoring Mechanism [PDF] Back to Contents
  Swapnil Parekh, Yaman Kumar Singla, Changyou Chen, Junyi Jessy Li, Rajiv Ratn Shah
Abstract: Significant progress has been made in deep-learning based Automatic Essay Scoring (AES) systems in the past two decades. However, little research has been devoted to understanding and interpreting the black-box nature of these deep-learning based scoring models. Recent work shows that automated scoring systems are prone to even common-sense adversarial samples. Their lack of natural language understanding capability raises questions about models that are actively used by millions of candidates for life-changing decisions. With scoring being a highly multi-modal task, it becomes imperative for scoring models to be validated and tested on all these modalities. We utilize recent advances in interpretability to find the extent to which features such as coherence, content and relevance are important for automated scoring mechanisms and why they are susceptible to adversarial samples. We find that the systems tested consider essays not as a piece of prose having the characteristics of natural flow of speech and grammatical structure, but as 'word-soups' where a few words are much more important than the other words. Removing the context surrounding those few important words causes the prose to lose the flow of speech and grammar, yet has little impact on the predicted score. We also find that since the models are not semantically grounded with world-knowledge and common sense, adding false facts such as "the world is flat" actually increases the score instead of decreasing it.

20. Learning Light-Weight Translation Models from Deep Transformer [PDF] Back to Contents
  Bei Li, Ziyang Wang, Hui Liu, Quan Du, Tong Xiao, Chunliang Zhang, Jingbo Zhu
Abstract: Recently, deep models have shown tremendous improvements in neural machine translation (NMT). However, systems of this kind are computationally expensive and memory intensive. In this paper, we take a natural step towards learning strong but light-weight NMT systems. We proposed a novel group-permutation based knowledge distillation approach to compressing the deep Transformer model into a shallow model. The experimental results on several benchmarks validate the effectiveness of our method. Our compressed model is 8X shallower than the deep model, with almost no loss in BLEU. To further enhance the teacher model, we present a Skipping Sub-Layer method to randomly omit sub-layers to introduce perturbation into training, which achieves a BLEU score of 30.63 on English-German newstest2014. The code is publicly available at this https URL.

21. Inserting Information Bottlenecks for Attribution in Transformers [PDF] Back to Contents
  Zhiying Jiang, Raphael Tang, Ji Xin, Jimmy Lin
Abstract: Pretrained transformers achieve the state of the art across tasks in natural language processing, motivating researchers to investigate their inner mechanisms. One common direction is to understand what features are important for prediction. In this paper, we apply information bottlenecks to analyze the attribution of each feature for prediction on a black-box model. We use BERT as the example and evaluate our approach both quantitatively and qualitatively. We show the effectiveness of our method in terms of attribution and the ability to provide insight into how information flows through layers. We demonstrate that our technique outperforms two competitive methods in degradation tests on four datasets. Code is available at this https URL.

22. Fine-grained Emotion and Intent Learning in Movie Dialogues [PDF] Back to Contents
  Anuradha Welivita, Yubo Xie, Pearl Pu
Abstract: We propose a novel large-scale emotional dialogue dataset, consisting of 1M dialogues retrieved from the OpenSubtitles corpus and annotated with 32 emotions and 9 empathetic response intents using a BERT-based fine-grained dialogue emotion classifier. This work explains the complex pipeline used to preprocess movie subtitles and select good movie dialogues to annotate. We also describe the semi-supervised learning process followed to train a fine-grained emotion classifier to annotate these dialogues. Despite the large set of labels, our dialogue emotion classifier achieved an accuracy of 65% and was used to annotate 1M emotional movie dialogues from OpenSubtitles. This scale of emotional dialogue classification has never been attempted before, both in terms of dataset size and fine-grained emotion and intent categories. Visualization techniques used to analyze the quality of the resultant dataset suggest that it conforms to the patterns of human social interaction.

23. LOREN: Logic Enhanced Neural Reasoning for Fact Verification [PDF] Back to Contents
  Jiangjie Chen, Qiaoben Bao, Jiaze Chen, Changzhi Sun, Hao Zhou, Yanghua Xiao, Lei Li
Abstract: Given a natural language statement, how can we verify whether it is supported, refuted, or unknown according to a large-scale knowledge source like Wikipedia? Existing neural-network-based methods often regard a sentence as a whole. We argue, however, that it is beneficial to decompose a statement into multiple verifiable logical points. In this paper, we propose LOREN, a novel approach for fact verification that integrates both logic-guided reasoning and neural inference. The key insight of LOREN is that it decomposes a statement into multiple reasoning units around the central phrases. Instead of directly validating a single reasoning unit, LOREN turns it into a question-answering task and calculates the confidence of every single hypothesis using neural networks in the embedding space. They are aggregated to make a final prediction using a neural joint reasoner guided by a set of three-valued logic rules. LOREN enjoys the additional merit of interpretability: it is easy to explain how it reaches certain results with intermediate results and why it makes mistakes. We evaluate LOREN on FEVER, a public benchmark for fact verification. Experiments show that our proposed LOREN outperforms other previously published methods and achieves a FEVER score of 73.43%.

24. Contextual Temperature for Language Modeling [PDF] Back to Contents
  Pei-Hsin Wang, Sheng-Iou Hsieh, Shih-Chieh Chang, Yu-Ting Chen, Jia-Yu Pan, Wei Wei, Da-Chang Juan
Abstract: Temperature scaling has been widely used as an effective approach to control the smoothness of a distribution, which helps the model performance in various tasks. Current practices to apply temperature scaling assume either a fixed, or a manually-crafted dynamically changing schedule. However, our studies indicate that the individual optimal trajectory for each class can change with the context. To this end, we propose contextual temperature, a generalized approach that learns an optimal temperature trajectory for each vocabulary over the context. Experimental results confirm that the proposed method significantly improves state-of-the-art language models, achieving a perplexity of 55.31 and 62.89 on the test set of Penn Treebank and WikiText-2, respectively. In-depth analyses show that the behaviour of the learned temperature schedules varies dramatically by vocabulary, and that the optimal schedules help in controlling the uncertainties. These evidences further justify the need for the proposed method and its advantages over fixed temperature schedules.
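
A toy rendering of the mechanism (the network shape and sizes are invented): a small head maps the context representation to one positive temperature per vocabulary entry, which rescales the logits before the softmax.

import torch
import torch.nn as nn

vocab, dim = 100, 32
hidden = torch.randn(4, dim)                    # context representation per position
logits = torch.randn(4, vocab)

# Hypothetical temperature head: one positive temperature per vocabulary entry.
temperature_net = nn.Sequential(nn.Linear(dim, vocab), nn.Softplus())
tau = temperature_net(hidden) + 1e-3            # contextual temperatures, > 0
probs = torch.softmax(logits / tau, dim=-1)
print(probs.shape, float(probs.sum(dim=-1)[0]))  # each row still sums to 1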

25. Towards a Universal Continuous Knowledge Base [PDF] Back to Contents
  Gang Chen, Maosong Sun, Yang Liu
Abstract: In artificial intelligence, knowledge is the information required by an intelligent system to accomplish tasks. While traditional knowledge bases use discrete, symbolic representations, detecting knowledge encoded in the continuous representations learned from data has received increasing attention recently. In this work, we propose a method for building a continuous knowledge base that can store knowledge imported from multiple, diverse neural networks. The key idea of our approach is to define an interface for each neural network and cast knowledge transferring as a function simulation problem. Preliminary experiments on text classification show promising results: we first import the knowledge encoded in an RNN model and a CNN model to the knowledge base, from which the fused knowledge is exported back to the RNN model, achieving a higher classification accuracy than the original RNN model. With the continuous knowledge base, it is also easy to achieve knowledge distillation and transfer learning. Our work opens the door to building a universal continuous knowledge base to collect, store, and organize all continuous knowledge encoded in different neural networks trained for different AI tasks.

26. Why Neural Machine Translation Prefers Empty Outputs [PDF] Back to Contents
  Xing Shi, Yijun Xiao, Kevin Knight
Abstract: We investigate why neural machine translation (NMT) systems assign high probability to empty translations. We find two explanations. First, label smoothing makes correct-length translations less confident, making it easier for the empty translation to finally outscore them. Second, NMT systems use the same, high-frequency EoS word to end all target sentences, regardless of length. This creates an implicit smoothing that increases zero-length translations. Using different EoS types in target sentences of different lengths exposes and eliminates this implicit smoothing.
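
The length bias can be seen with back-of-the-envelope arithmetic (the numbers are invented): once label smoothing pushes the per-token confidence down, a 20-token reference quickly scores below an immediate EoS.

import math

def seq_log_prob(p_token, length):
    # Log-probability of a sequence whose every token has probability p_token.
    return length * math.log(p_token)

p_eos_immediately = 0.05        # small mass the model puts on EoS at step 1
for p_token in (0.9, 0.7, 0.5):
    ref = seq_log_prob(p_token, 20)
    empty = math.log(p_eos_immediately)
    print(p_token, ref > empty)  # 0.9 True, 0.7 False, 0.5 False: the empty output wins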

27. ThamizhiUDp: A Dependency Parser for Tamil [PDF] Back to Contents
  Kengatharaiyer Sarveswaran, Gihan Dias
Abstract: This paper describes how we developed a neural-based dependency parser, namely ThamizhiUDp, which provides a complete pipeline for the dependency parsing of the Tamil language text using Universal Dependency formalism. We have considered the phases of the dependency parsing pipeline and identified tools and resources in each of these phases to improve the accuracy and to tackle data scarcity. ThamizhiUDp uses Stanza for tokenisation and lemmatisation, ThamizhiPOSt and ThamizhiMorph for generating Part of Speech (POS) and Morphological annotations, and uuparser with multilingual training for dependency parsing. ThamizhiPOSt is our POS tagger, which is based on the Stanza, trained with Amrita POS-tagged corpus. It is the current state-of-the-art in Tamil POS tagging with an F1 score of 93.27. Our morphological analyzer, ThamizhiMorph is a rule-based system with a very good coverage of Tamil. Our dependency parser ThamizhiUDp was trained using multilingual data. It shows a Labelled Assigned Score (LAS) of 62.39, 4 points higher than the current best achieved for Tamil dependency parsing. Therefore, we show that breaking up the dependency parsing pipeline to accommodate existing tools and resources is a viable approach for low-resource languages.

28. Mechanism of Evolution Shared by Gene and Language [PDF] Back to Contents
  Li-Min Wang, Hsing-Yi Lai, Sun-Ting Tsai, Shan-Jyun Wu, Meng-Xue Tsai, Daw-Wei Wang, Yi-Ching Su, Chen Siang Ng, Tzay-Ming Hong
Abstract: We propose a general mechanism for evolution to explain the diversity of gene and language. To quantify their common features and reveal the hidden structures, several statistical properties and patterns are examined based on a new method called the rank-rank analysis. We find that the classical correspondence, "domain plays the role of word in gene language", is not rigorous, and propose to replace domain by protein. In addition, we devise a new evolution unit, syllgram, to include the characteristics of spoken and written language. Based on the correspondence between (protein, domain) and (word, syllgram), we discover that both gene and language shared a common scaling structure and scale-free network. Like the Rosetta stone, this work may help decipher the secret behind non-coding DNA and unknown languages.

29. The Curse of Dense Low-Dimensional Information Retrieval for Large Index Sizes [PDF] Back to Contents
  Nils Reimers, Iryna Gurevych
Abstract: Information retrieval using dense low-dimensional representations has recently become popular and has outperformed traditional sparse representations such as BM25. However, no previous work has investigated how dense representations perform with large index sizes. We show theoretically and empirically that the performance of dense representations decreases more quickly than that of sparse representations as index sizes increase. In extreme cases, this can even lead to a tipping point where, at a certain index size, sparse representations outperform dense representations. We show that this behavior is tightly connected to the number of dimensions of the representations: the lower the dimension, the higher the chance for false positives, i.e. returning irrelevant documents.

30. Neural document expansion for ad-hoc information retrieval [PDF] Back to Contents
  Cheng Tang, Andrew Arnold
Abstract: Recently, Nogueira et al. [2019] proposed a new approach to document expansion based on a neural Seq2Seq model, showing significant improvement on short text retrieval task. However, this approach needs a large amount of in-domain training data. In this paper, we show that this neural document expansion approach can be effectively adapted to standard IR tasks, where labels are scarce and many long documents are present.

31. Measuring University Impact: Wikipedia approach [PDF] Back to Contents
  Tatiana Kozitsina, Viacheslav Goiko, Roman Palkin, Valentin Khomutenko, Yulia Mundrievskaya, Maria Sukhareva, Isak Froumin, Mikhail Myagkov
Abstract: The impact of universities on the social, economic and political landscape is one of the key directions in contemporary educational evaluation. In this paper, we discuss a new methodological technique that evaluates the impact of a university based on the popularity (number of page views) of its alumni's pages on Wikipedia. It allows the dynamics of alumni popularity to be revealed and tracked. Preliminary analysis shows that the number of page views is higher for contemporary persons, which supports the perspectives of this approach. Universities were then ranked based on this methodology and compared to the famous international university rankings ARWU and QS based only on alumni scales: for the top 10 universities, there is an intersection of two universities (Columbia University, Stanford University). The correlation coefficients between the different university rankings are provided in the paper. Finally, the ranking based on alumni popularity was compared with the ranking of universities based on the popularity of their webpages on Wikipedia: there is a strong connection between these indicators.

32. Improving Opinion Spam Detection by Cumulative Relative Frequency Distribution [PDF] Back to Contents
  Michela Fazzolari, Francesco Buccafurri, Gianluca Lax, Marinella Petrocchi
Abstract: Over the last few years, online reviews have become very important, since they can influence the purchase decisions of consumers and the reputation of businesses; therefore, the practice of writing fake reviews can have severe consequences for customers and service providers. Various approaches have been proposed for detecting opinion spam in online reviews, especially based on supervised classifiers. In this contribution, we start from a set of effective features used for classifying opinion spam and we re-engineer them by considering the Cumulative Relative Frequency Distribution of each feature. Through an experimental evaluation carried out on real data from this http URL, we show that the use of the distributional features is able to improve the performance of classifiers.
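
The feature re-engineering amounts to replacing each raw value with its cumulative relative frequency over the training data, roughly an empirical CDF; a toy sketch (the feature and values are made up):

import numpy as np

review_lengths = np.array([12, 45, 45, 80, 200, 15, 33, 45])   # toy raw feature

def crfd(train_values):
    """Return a transform mapping a raw value to its cumulative relative frequency."""
    sorted_vals = np.sort(train_values)
    def transform(x):
        return np.searchsorted(sorted_vals, x, side="right") / len(sorted_vals)
    return transform

to_crfd = crfd(review_lengths)
print(to_crfd(45))   # 0.75 -> 75% of training reviews are this long or shorter
print(to_crfd(10))   # 0.0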

33. Translating Natural Language Instructions to Computer Programs for Robot Manipulation [PDF] Back to Contents
  Sagar Gubbi Venkatesh, Raviteja Upadrashta, Bharadwaj Amrutur
Abstract: It is highly desirable for robots that work alongside humans to be able to understand instructions in natural language. Existing language conditioned imitation learning methods predict the actuator commands from the image observation and the instruction text. Rather than directly predicting actuator commands, we propose translating the natural language instruction to a Python function which when executed queries the scene by accessing the output of the object detector and controls the robot to perform the specified task. This enables the use of non-differentiable modules such as a constraint solver when computing commands to the robot. Moreover, the labels in this setup are significantly more descriptive computer programs rather than teleoperated demonstrations. We show that the proposed method performs better than training a neural network to directly predict the robot actions.

34. Spatial Reasoning from Natural Language Instructions for Robot Manipulation [PDF] Back to Contents
  Sagar Gubbi Venkatesh, Anirban Biswas, Raviteja Upadrashta, Vikram Srinivasan, Partha Talukdar, Bharadwaj Amrutur
Abstract: Robots that can manipulate objects in unstructured environments and collaborate with humans can benefit immensely by understanding natural language. We propose a pipelined architecture of two stages to perform spatial reasoning on the text input. All the objects in the scene are first localized, and then the instruction for the robot in natural language and the localized co-ordinates are mapped to the start and end co-ordinates corresponding to the locations where the robot must pick up and place the object respectively. We show that representing the localized objects by quantizing their positions to a binary grid is preferable to representing them as a list of 2D co-ordinates. We also show that attention improves generalization and can overcome biases in the dataset. The proposed method is used to pick-and-place playing cards using a robot arm.

35. Social media data reveals signal for public consumer perceptions [PDF] Back to Contents
  Neeti Pokhriyal, Abenezer Dara, Benjamin Valentino, Soroush Vosoughi
Abstract: Researchers have used social media data to estimate various macroeconomic indicators about public behaviors, mostly as a way to reduce surveying costs. One of the most widely cited economic indicators is the consumer confidence index (CCI). Numerous studies in the past have focused on using social media, especially Twitter data, to predict CCI. However, according to a recent comprehensive survey, the strong correlations disappeared when those models were tested with newer data. In this work, we revisit the problem of assessing the true potential of using social media data to measure CCI, by proposing a robust non-parametric Bayesian modeling framework grounded in Gaussian Process Regression (which provides both an estimate and an uncertainty associated with it). Integral to our framework is a principled experimentation methodology that demonstrates how digital data can be employed to reduce the frequency of surveys, so that periodic polling would be needed only to calibrate our model. Via extensive experimentation we show how the choice of different micro-decisions, such as the smoothing interval and various types of lags, has an important bearing on the results. By using decadal data (2008-2019) from Reddit, we show that both monthly and daily estimates of CCI can, indeed, be reliably estimated at least several months in advance, and that our model estimates are far superior to those generated by the existing methods.
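
The modelling idea can be sketched with scikit-learn's Gaussian process tools on synthetic data (the series below is a stand-in, not Reddit or CCI values):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.RandomState(0)
months = np.arange(0, 36).reshape(-1, 1)                             # time index
signal = np.sin(months / 6.0).ravel() + 0.1 * rng.randn(36)          # synthetic index series

# GP regression gives both a point estimate and an uncertainty band.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=5.0) + WhiteKernel(noise_level=0.1),
                              normalize_y=True)
gp.fit(months[:30], signal[:30])                                     # calibrate on surveyed period
mean, std = gp.predict(months[30:], return_std=True)                 # forecast with uncertainty
for m, mu, s in zip(months[30:].ravel(), mean, std):
    print(f"month {m}: {mu:.2f} +/- {s:.2f}")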

36. Learning by Fixing: Solving Math Word Problems with Weak Supervision [PDF] Back to Contents
  Yining Hong, Qing Li, Daniel Ciao, Siyuan Huang, Song-Chun Zhu
Abstract: Previous neural solvers of math word problems (MWPs) are learned with full supervision and fail to generate diverse solutions. In this paper, we address this issue by introducing a weakly-supervised paradigm for learning MWPs. Our method only requires the annotations of the final answers and can generate various solutions for a single problem. To boost weakly-supervised learning, we propose a novel learning-by-fixing (LBF) framework, which corrects the misperceptions of the neural network via symbolic reasoning. Specifically, for an incorrect solution tree generated by the neural network, the fixing mechanism propagates the error from the root node to the leaf nodes and infers the most probable fix that can be executed to get the desired answer. To generate more diverse solutions, tree regularization is applied to guide the efficient shrinkage and exploration of the solution space, and a memory buffer is designed to track and save the discovered various fixes for each problem. Experimental results on the Math23K dataset show the proposed LBF framework significantly outperforms reinforcement learning baselines in weakly-supervised learning. Furthermore, it achieves comparable top-1 and much better top-3/5 answer accuracies than fully-supervised methods, demonstrating its strength in producing diverse solutions.
