Table of Contents
1. Learning a natural-language to LTL executable semantic parser for grounded robotics [PDF] Abstract
2. SemEval-2020 Task 10: Emphasis Selection for Written Text in Visual Media [PDF] Abstract
3. Quran Intelligent Ontology Construction Approach Using Association Rules Mining [PDF] Abstract
4. IMS at SemEval-2020 Task 1: How low can you go? Dimensionality in Lexical Semantic Change Detection [PDF] Abstract
5. Privacy Guarantees for De-identifying Text Transformations [PDF] Abstract
6. Perception Score, A Learned Metric for Open-ended Text Generation Evaluation [PDF] Abstract
7. A Context-based Disambiguation Model for Sentiment Concepts Using a Bag-of-concepts Approach [PDF] Abstract
8. Data Weighted Training Strategies for Grammatical Error Correction [PDF] Abstract
9. Which Kind Is Better in Open-domain Multi-turn Dialog, Hierarchical or Non-hierarchical Models? An Empirical Study [PDF] Abstract
10. Evaluating computational models of infant phonetic learning across languages [PDF] Abstract
11. Efficient Neural Query Auto Completion [PDF] Abstract
12. A Multilingual Neural Machine Translation Model for Biomedical Data [PDF] Abstract
13. Semantic Complexity in End-to-End Spoken Language Understanding [PDF] Abstract
14. aschern at SemEval-2020 Task 11: It Takes Three to Tango: RoBERTa, CRF, and Transfer Learning [PDF] Abstract
15. Applying Speech Tempo-Derived Features, BoAW and Fisher Vectors to Detect Elderly Emotion and Speech in Surgical Masks [PDF] Abstract
16. Better Fine-Tuning by Reducing Representational Collapse [PDF] Abstract
17. Convolutional Complex Knowledge Graph Embeddings [PDF] Abstract
18. Pretraining Techniques for Sequence-to-Sequence Voice Conversion [PDF] Abstract
19. Peking Opera Synthesis via Duration Informed Attention Network [PDF] Abstract
Abstracts
1. Learning a natural-language to LTL executable semantic parser for grounded robotics [PDF] Back to Contents
Christopher Wang, Candace Ross, Boris Katz, Andrei Barbu
Abstract: Children acquire their native language with apparent ease by observing how language is used in context and attempting to use it themselves. They do so without laborious annotations, negative examples, or even direct corrections. We take a step toward robots that can do the same by training a grounded semantic parser, which discovers latent linguistic representations that can be used for the execution of natural-language commands. In particular, we focus on the difficult domain of commands with a temporal aspect, whose semantics we capture with Linear Temporal Logic, LTL. Our parser is trained with pairs of sentences and executions as well as an executor. At training time, the parser hypothesizes a meaning representation for the input as a formula in LTL. Three competing pressures allow the parser to discover meaning from language. First, any hypothesized meaning for a sentence must be permissive enough to reflect all the annotated execution trajectories. Second, the executor -- a pretrained end-to-end LTL planner -- must find that the observed trajectories are likely executions of the meaning. Finally, a generator, which reconstructs the original input, encourages the model to find representations that conserve knowledge about the command. Together these ensure that the meaning is neither too general nor too specific. Our model generalizes well, being able to parse and execute both machine-generated and human-generated commands, with near-equal accuracy, despite the fact that the human-generated sentences are much more varied and complex with an open lexicon. The approach presented here is not specific to LTL; it can be applied to any domain where sentence meanings can be hypothesized and an executor can verify these meanings, thus opening the door to many applications for robotic agents.
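To make the setting concrete (an illustrative example, not one taken from the paper): a command such as "go to the kitchen and then to the bedroom" could be captured by the LTL formula F(kitchen ∧ F(bedroom)), where F is the "eventually" operator; any trajectory that reaches the kitchen and later the bedroom satisfies the formula, so the hypothesized meaning stays permissive enough to cover all annotated executions.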
2. SemEval-2020 Task 10: Emphasis Selection for Written Text in Visual Media [PDF] Back to Contents
Amirreza Shirani, Franck Dernoncourt, Nedim Lipka, Paul Asente, Jose Echevarria, Thamar Solorio
Abstract: In this paper, we present the main findings and compare the results of SemEval-2020 Task 10, Emphasis Selection for Written Text in Visual Media. The goal of this shared task is to design automatic methods for emphasis selection, i.e. choosing candidates for emphasis in textual content to enable automated design assistance in authoring. The main focus is on short text instances for social media, with a variety of examples, from social media posts to inspirational quotes. Participants were asked to model emphasis using plain text with no additional context from the user or other design considerations. The SemEval-2020 Emphasis Selection shared task attracted 197 participants in the early phase, and a total of 31 teams made submissions to this task. The highest-ranked submission achieved a 0.823 Match_m score. The analysis of systems submitted to the task indicates that BERT and RoBERTa were the most common choices of pre-trained model, and the part-of-speech (POS) tag was the most useful feature. Full results can be found on the task's website.
3. Quran Intelligent Ontology Construction Approach Using Association Rules Mining [PDF] Back to Contents
Fouzi Harrag, Abdullah Al-Nasser, Abdullah Al-Musnad, Rayan Al-Shaya
Abstract: Ontologies can be seen as formal representations of knowledge. They have been investigated in many artificial intelligence studies, including the semantic web, software engineering, and information retrieval. The aim of ontology is to develop knowledge representations that can be shared and reused. This research project is concerned with the use of association rules to extract the Quran ontology. The manual acquisition of ontologies from Quran verses can be very costly; therefore, we need an intelligent system for Quran ontology construction that uses pattern-based schemes and association rules to discover Quran concepts and semantic relations from Quran verses. Our system is based on a combination of statistical and linguistic methods to extract concepts and conceptual relations from the Quran. In particular, a linguistic pattern-based approach is exploited to extract specific concepts from the Quran, while the conceptual relations are found using the association rules technique. The Quran ontology will offer a new and powerful representation of Quran knowledge, and the association rules will help to represent the relations between all classes of connected concepts in the Quran ontology.
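As a concrete illustration of the core machinery, here is a minimal sketch of support/confidence association-rule mining over verse-level concept sets; the concept names, thresholds, and transaction format are hypothetical, not taken from the paper.

```python
from itertools import combinations

def mine_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Toy association-rule miner over concept 'transactions'
    (here: the set of concepts appearing in one verse)."""
    n = len(transactions)
    item_count, pair_count = {}, {}
    for t in transactions:
        for c in t:
            item_count[c] = item_count.get(c, 0) + 1
        for a, b in combinations(sorted(t), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1
    rules = []
    for (a, b), cnt in pair_count.items():
        if cnt / n < min_support:
            continue
        for x, y in ((a, b), (b, a)):  # test both rule directions
            confidence = cnt / item_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, cnt / n, confidence))
    return rules

# Hypothetical verse-level concept sets.
verses = [{"mercy", "forgiveness"}, {"mercy", "charity"},
          {"mercy", "forgiveness", "prayer"}]
print(mine_rules(verses))  # e.g. ('forgiveness', 'mercy', 0.66..., 1.0)
```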
4. IMS at SemEval-2020 Task 1: How low can you go? Dimensionality in Lexical Semantic Change Detection [PDF] Back to Contents
Jens Kaiser, Dominik Schlechtweg, Sean Papay, Sabine Schulte im Walde
Abstract: We present the results of our system for SemEval-2020 Task 1 that exploits a commonly used lexical semantic change detection model based on Skip-Gram with Negative Sampling. Our system focuses on Vector Initialization (VI) alignment, compares VI to the currently top-ranking models for Subtask 2 and demonstrates that these can be outperformed if we optimize VI dimensionality. We demonstrate that differences in performance can largely be attributed to model-specific sources of noise, and we reveal a strong relationship between dimensionality and frequency-induced noise in VI alignment. Our results suggest that lexical semantic change models integrating vector space alignment should pay more attention to the role of the dimensionality parameter.
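Since the abstract leaves the change measure implicit, here is a minimal numpy sketch of the usual final step in this setup (an assumption on our part, not the paper's code): after training SGNS on the second corpus with its vectors initialized from the first (VI alignment), a word's degree of semantic change is scored as the cosine distance between its two vectors.

```python
import numpy as np

def change_score(vec_t1, vec_t2):
    """Cosine distance between one word's embeddings from two periods;
    comparable only because the period-2 SGNS model was initialized
    with the period-1 vectors (Vector Initialization alignment)."""
    cos = vec_t1 @ vec_t2 / (np.linalg.norm(vec_t1) * np.linalg.norm(vec_t2))
    return 1.0 - cos

# Hypothetical low-dimensional embeddings for one target word.
v1 = np.array([0.2, -0.1, 0.7, 0.3, 0.0])
v2 = np.array([0.1, -0.3, 0.5, 0.6, 0.2])
print(change_score(v1, v2))  # larger value => more semantic change
```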
5. Privacy Guarantees for De-identifying Text Transformations [PDF] Back to Contents
David Ifeoluwa Adelani, Ali Davody, Thomas Kleinbauer, Dietrich Klakow
Abstract: Machine Learning approaches to Natural Language Processing tasks benefit from a comprehensive collection of real-life user data. At the same time, there is a clear need for protecting the privacy of the users whose data is collected and processed. For text collections, such as, e.g., transcripts of voice interactions or patient records, replacing sensitive parts with benign alternatives can provide de-identification. However, how much privacy is actually guaranteed by such text transformations, and are the resulting texts still useful for machine learning? In this paper, we derive formal privacy guarantees for general text transformation-based de-identification methods on the basis of Differential Privacy. We also measure the effect that different ways of masking private information in dialog transcripts have on a subsequent machine learning task. To this end, we formulate different masking strategies and compare their privacy-utility trade-offs. In particular, we compare a simple redact approach with more sophisticated word-by-word replacement using deep learning models on multiple natural language understanding tasks like named entity recognition, intent detection, and dialog act classification. We find that only word-by-word replacement is robust against performance drops in various tasks.
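To illustrate the two families of masking strategies the abstract contrasts, here is a small sketch; the token list, the sensitive-term set, and the replacement lookup are all hypothetical (in the paper, word-by-word replacement is produced by deep learning models rather than a dictionary).

```python
def redact(tokens, sensitive):
    """Simple redaction: collapse every sensitive token to one marker."""
    return ["[REDACTED]" if t in sensitive else t for t in tokens]

def word_by_word(tokens, sensitive, substitute):
    """Word-by-word replacement: swap each sensitive token for a benign
    alternative of the same kind (here a lookup; learned in the paper)."""
    return [substitute[t] if t in sensitive else t for t in tokens]

tokens = ["Alice", "flew", "to", "Paris", "on", "Monday"]
sensitive = {"Alice", "Paris"}
substitute = {"Alice": "Dana", "Paris": "Lyon"}  # hypothetical mapping
print(redact(tokens, sensitive))
print(word_by_word(tokens, sensitive, substitute))
```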
6. Perception Score, A Learned Metric for Open-ended Text Generation Evaluation [PDF] Back to Contents
Jing Gu, Qingyang Wu, Zhou Yu
Abstract: Automatic evaluation for open-ended natural language generation tasks remains a challenge. Existing metrics such as BLEU show a low correlation with human judgment. We propose a novel and powerful learning-based evaluation metric: Perception Score. The method measures the overall quality of the generation and scores it holistically instead of focusing on only one evaluation criterion, such as word overlap. Moreover, it also reports the amount of uncertainty about its evaluation result. By incorporating this uncertainty, Perception Score gives a more accurate evaluation of the generation system. Perception Score provides state-of-the-art results on two conditional generation tasks and two unconditional generation tasks.
7. A Context-based Disambiguation Model for Sentiment Concepts Using a Bag-of-concepts Approach [PDF] Back to Contents
Zeinab Rajabi, MohammadReza Valavi, Maryam Hourali
Abstract: With the widespread dissemination of user-generated content on different social networks and online consumer systems such as Amazon, the quantity of opinionated information available on the Internet has increased. One of the main tasks of sentiment analysis is to detect polarity within a text. Existing polarity detection methods mainly focus on keywords and their naive frequency counts; however, they pay less attention to the meanings and implicit dimensions of natural concepts. Although background knowledge plays a critical role in determining the polarity of concepts, it has been disregarded in polarity detection methods. This study presents a context-based model for resolving ambiguous polarity concepts using commonsense knowledge. First, a model is presented to generate a source of ambiguous sentiment concepts based on SenticNet by computing the probability distribution. Then the model uses a bag-of-concepts approach to remove ambiguities, with semantic augmentation through ConceptNet to recover lost knowledge. ConceptNet is a large-scale semantic network with a large number of commonsense concepts. In this paper, the pointwise mutual information (PMI) measure is used to select the contextual concepts having strong relationships with ambiguous concepts. The polarity of the ambiguous concepts is precisely detected using positive/negative contextual concepts and the relationships between concepts in the semantic knowledge base. The text representation scheme is semantically enriched using Numberbatch, a word embedding model based on the concepts from the ConceptNet semantic network. The proposed model is evaluated on a corpus of product reviews, SemEval. The experimental results show an accuracy rate of 82.07%, demonstrating the effectiveness of the proposed model.
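For reference, a minimal sketch of the pointwise mutual information measure used to pick contextual concepts; the co-occurrence counts below are hypothetical.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information between two concepts from
    co-occurrence counts: log2(p(x, y) / (p(x) * p(y)))."""
    return math.log2((count_xy / total) /
                     ((count_x / total) * (count_y / total)))

# Hypothetical counts over 10,000 context windows.
print(pmi(count_xy=40, count_x=200, count_y=500, total=10_000))  # 2.0
```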
8. Data Weighted Training Strategies for Grammatical Error Correction [PDF] Back to Contents
Jared Lichtarge, Chris Alberti, Shankar Kumar
Abstract: Recent progress in the task of Grammatical Error Correction (GEC) has been driven by addressing data sparsity, both through new methods for generating large and noisy pretraining data and through the publication of small and higher-quality finetuning data in the BEA-2019 shared task. Building upon recent work in Neural Machine Translation (NMT), we make use of both kinds of data by deriving example-level scores on our large pretraining data based on a smaller, higher-quality dataset. In this work, we perform an empirical study to discover how to best incorporate delta-log-perplexity, a type of example scoring, into a training schedule for GEC. In doing so, we perform experiments that shed light on the function and applicability of delta-log-perplexity. Models trained on scored data achieve state-of-the-art results on common GEC test sets.
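The abstract does not spell out the scoring formula, so the sketch below encodes one plausible reading (an assumption on our part): an example's delta-log-perplexity is the drop in log-perplexity between a base model and a model fine-tuned on the small high-quality set, computed from per-token log-probabilities.

```python
def log_perplexity(token_logprobs):
    """Log-perplexity of one example: average negative token log-prob."""
    return -sum(token_logprobs) / len(token_logprobs)

def delta_log_perplexity(logprobs_base, logprobs_finetuned):
    """Positive when the finetuned (higher-quality) model finds the
    example more predictable than the base model does."""
    return log_perplexity(logprobs_base) - log_perplexity(logprobs_finetuned)

# Hypothetical per-token log-probs for one noisy pretraining example.
base = [-2.3, -1.9, -2.8]
fine = [-1.1, -0.9, -1.5]
print(delta_log_perplexity(base, fine))  # > 0: upweight this example
```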
9. Which Kind Is Better in Open-domain Multi-turn Dialog, Hierarchical or Non-hierarchical Models? An Empirical Study [PDF] Back to Contents
Tian Lan, Xian-Ling Mao, Wei Wei, Heyan Huang
Abstract: Currently, open-domain generative dialog systems have attracted considerable attention in academia and industry. Despite the success of single-turn dialog generation, multi-turn dialog generation is still a big challenge. So far, there are two kinds of models for open-domain multi-turn dialog generation: hierarchical and non-hierarchical models. Recently, some works have shown that hierarchical models are better than non-hierarchical models under their experimental settings; meanwhile, other works demonstrate the opposite conclusion. Due to the lack of adequate comparisons, it is not clear which kind of model is better in open-domain multi-turn dialog generation. Thus, in this paper, we systematically measure nearly all representative hierarchical and non-hierarchical models over the same experimental settings to check which kind is better. Through extensive experiments, we reach the following three important conclusions: (1) Nearly all hierarchical models are worse than non-hierarchical models in open-domain multi-turn dialog generation, except for the HRAN model; further analysis shows that the excellent performance of HRAN depends mainly on its word-level attention mechanism. (2) The performance of other hierarchical models also improves greatly if the word-level attention mechanism is integrated into them; the modified hierarchical models even significantly outperform the non-hierarchical models. (3) The reason the word-level attention mechanism is so powerful for hierarchical models is that it can leverage context information more effectively, especially fine-grained information. Besides, we have implemented all of the models and have already released the code.
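Because the analysis hinges on the word-level attention mechanism, here is a minimal numpy sketch of standard additive attention over word hidden states; it illustrates the mechanism generically and is not the exact HRAN formulation (all weights below are random placeholders).

```python
import numpy as np

def word_level_attention(word_states, query, W1, W2, v):
    """Additive attention: score each context word against the decoder
    query, softmax the scores, return the weighted sum of word states.
    Shapes: word_states (T, d), query (d,), W1/W2 (d, d), v (d,)."""
    scores = np.tanh(word_states @ W1 + query @ W2) @ v  # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                             # softmax
    return weights @ word_states                         # (d,) context

rng = np.random.default_rng(0)
T, d = 6, 4
ctx = word_level_attention(rng.normal(size=(T, d)), rng.normal(size=d),
                           rng.normal(size=(d, d)), rng.normal(size=(d, d)),
                           rng.normal(size=d))
print(ctx.shape)  # (4,)
```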
10. Evaluating computational models of infant phonetic learning across languages [PDF] Back to Contents
Yevgen Matusevych, Thomas Schatz, Herman Kamper, Naomi H. Feldman, Sharon Goldwater
Abstract: In the first year of life, infants' speech perception becomes attuned to the sounds of their native language. Many accounts of this early phonetic learning exist, but computational models predicting the attunement patterns observed in infants from the speech input they hear have been lacking. A recent study presented the first such model, drawing on algorithms proposed for unsupervised learning from naturalistic speech, and tested it on a single phone contrast. Here we study five such algorithms, selected for their potential cognitive relevance. We simulate phonetic learning with each algorithm and perform tests on three phone contrasts from different languages, comparing the results to infants' discrimination patterns. The five models display varying degrees of agreement with empirical observations, showing that our approach can help decide between candidate mechanisms for early phonetic learning, and providing insight into which aspects of the models are critical for capturing infants' perceptual development.
11. Efficient Neural Query Auto Completion [PDF] Back to Contents
Sida Wang, Weiwei Guo, Huiji Gao, Bo Long
Abstract: Query Auto Completion (QAC), as the starting point of information retrieval tasks, is critical to user experience. Generally it has two steps: generating completed query candidates according to query prefixes, and ranking them based on extracted features. Three major challenges are observed for a query auto completion system: (1) QAC has a strict online latency requirement. For each keystroke, results must be returned within tens of milliseconds, which poses a significant challenge in designing sophisticated language models for it. (2) For unseen queries, generated candidates are of poor quality as contextual information is not fully utilized. (3) Traditional QAC systems heavily rely on handcrafted features such as the query candidate frequency in search logs, lacking sufficient semantic understanding of the candidate. In this paper, we propose an efficient neural QAC system with effective context modeling to overcome these challenges. On the candidate generation side, this system uses as much information as possible in unseen prefixes to generate relevant candidates, increasing the recall by a large margin. On the candidate ranking side, an unnormalized language model is proposed, which effectively captures deep semantics of queries. This approach presents better ranking performance than state-of-the-art neural ranking methods and reduces latency by ~95% compared to neural language modeling methods. The empirical results on public datasets show that our model achieves a good balance between accuracy and efficiency. This system is served in LinkedIn job search with significant product impact observed.
12. A Multilingual Neural Machine Translation Model for Biomedical Data [PDF] Back to Contents
Alexandre Bérard, Zae Myung Kim, Vassilina Nikoulina, Eunjeong L. Park, Matthias Gallé
Abstract: We release a multilingual neural machine translation model, which can be used to translate text in the biomedical domain. The model can translate from 5 languages (French, German, Italian, Korean and Spanish) into English. It is trained with large amounts of generic and biomedical data, using domain tags. Our benchmarks show that it performs near state-of-the-art both on news (generic domain) and biomedical test sets, and that it outperforms the existing publicly released models. We believe that this release will help the large-scale multilingual analysis of the digital content of the COVID-19 crisis and of its effects on society, economy, and healthcare policies. We also release a test set of biomedical text for Korean-English. It consists of 758 sentences from official guidelines and recent papers, all about COVID-19.
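The abstract mentions domain tags; a typical way to realize them (shown below as an assumption about the input format, not the authors' exact tag set) is to prepend a special token to each source sentence.

```python
def tag_source(sentence, domain):
    """Prepend a domain token so a single multilingual model can be
    steered toward biomedical or generic output at inference time."""
    return f"<{domain}> {sentence}"

print(tag_source("Le patient présente une fièvre persistante.", "biomedical"))
# -> "<biomedical> Le patient présente une fièvre persistante."
```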
13. Semantic Complexity in End-to-End Spoken Language Understanding [PDF] Back to Contents
Joseph P. McKenna, Samridhi Choudhary, Michael Saxon, Grant P. Strimel, Athanasios Mouchtaris
Abstract: End-to-end spoken language understanding (SLU) models are a class of model architectures that predict semantics directly from speech. Because of their input and output types, we refer to them as speech-to-interpretation (STI) models. Previous works have successfully applied STI models to targeted use cases, such as recognizing home automation commands, however no study has yet addressed how these models generalize to broader use cases. In this work, we analyze the relationship between the performance of STI models and the difficulty of the use case to which they are applied. We introduce empirical measures of dataset semantic complexity to quantify the difficulty of the SLU tasks. We show that near-perfect performance metrics for STI models reported in the literature were obtained with datasets that have low semantic complexity values. We perform experiments where we vary the semantic complexity of a large, proprietary dataset and show that STI model performance correlates with our semantic complexity measures, such that performance increases as complexity values decrease. Our results show that it is important to contextualize an STI model's performance with the complexity values of its training dataset to reveal the scope of its applicability.
14. aschern at SemEval-2020 Task 11: It Takes Three to Tango: RoBERTa, CRF, and Transfer Learning [PDF] Back to Contents
Anton Chernyavskiy, Dmitry Ilvovsky, Preslav Nakov
Abstract: We describe our system for SemEval-2020 Task 11 on Detection of Propaganda Techniques in News Articles. We developed ensemble models using RoBERTa-based neural architectures, additional CRF layers, transfer learning between the two subtasks, and advanced post-processing to handle the multi-label nature of the task, the consistency between nested spans, repetitions, and labels from similar spans in training. We achieved sizable improvements over baseline fine-tuned RoBERTa models, and the official evaluation ranked our system 3rd (almost tied with the 2nd) out of 36 teams on the span identification subtask with an F1 score of 0.491, and 2nd (almost tied with the 1st) out of 31 teams on the technique classification subtask with an F1 score of 0.62.
15. Applying Speech Tempo-Derived Features, BoAW and Fisher Vectors to Detect Elderly Emotion and Speech in Surgical Masks [PDF] Back to Contents
Gábor Gosztolya, László Tóth
Abstract: The 2020 INTERSPEECH Computational Paralinguistics Challenge (ComParE) consists of three Sub-Challenges, where the tasks are to identify the level of arousal and valence of elderly speakers, determine whether the actual speaker is wearing a surgical mask, and estimate the actual breathing of the speaker. In our contribution to the Challenge, we focus on the Elderly Emotion and the Mask Sub-Challenges. Besides utilizing standard or close-to-standard features such as ComParE functionals, Bag-of-Audio-Words and Fisher vectors, we exploit the fact that emotion is related to the velocity of speech (i.e. speech rate). To utilize this, we perform phone-level recognition using an ASR system, and extract features from the output such as articulation tempo, speech tempo, and various attributes measuring the amount of pauses. We also hypothesize that wearing a surgical mask makes the speaker feel uneasy, leading to a slower speech rate and more hesitations; hence, we experiment with the same features in the Mask Sub-Challenge as well. Although this hypothesis was not supported by the experimental results on the Mask Sub-Challenge, in the Elderly Emotion Sub-Challenge we obtained significantly improved arousal and valence values with this feature type, both on the development set and in cross-validation.
16. Better Fine-Tuning by Reducing Representational Collapse [PDF] Back to Contents
Armen Aghajanyan, Akshat Shrivastava, Anchit Gupta, Naman Goyal, Luke Zettlemoyer, Sonal Gupta
Abstract: Although widely adopted, existing approaches for fine-tuning pre-trained language models have been shown to be unstable across hyper-parameter settings, motivating recent work on trust region methods. In this paper, we present a simplified and efficient method rooted in trust region theory that replaces previously used adversarial objectives with parametric noise (sampling from either a normal or uniform distribution), thereby discouraging representation change during fine-tuning when possible without hurting performance. We also introduce a new analysis to motivate the use of trust region methods more generally, by studying representational collapse; the degradation of generalizable representations from pre-trained models as they are fine-tuned for a specific end task. Extensive experiments show that our fine-tuning method matches or exceeds the performance of previous trust region methods on a range of understanding and generation tasks (including DailyMail/CNN, Gigaword, Reddit TIFU, and the GLUE benchmark), while also being much faster. We also show that it is less prone to representation collapse; the pre-trained models maintain more generalizable representations every time they are fine-tuned.
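A PyTorch-style sketch of the objective as the abstract describes it (task loss plus a term that discourages the output distribution from moving when parametric noise is added to the input representations); the symmetric-KL form, the noise scale sigma, and the weight lam are assumptions on our part, and `model` is a hypothetical classifier over embeddings.

```python
import torch
import torch.nn.functional as F

def noisy_finetune_loss(model, embeddings, labels, sigma=1e-5, lam=1.0):
    """Fine-tuning loss = cross-entropy on clean inputs + a symmetric KL
    term keeping clean and noise-perturbed output distributions close."""
    logits_clean = model(embeddings)
    noise = torch.randn_like(embeddings) * sigma  # normal noise; the paper
    logits_noisy = model(embeddings + noise)      # also samples uniform noise
    task_loss = F.cross_entropy(logits_clean, labels)
    p = F.log_softmax(logits_clean, dim=-1)
    q = F.log_softmax(logits_noisy, dim=-1)
    sym_kl = (F.kl_div(q, p, log_target=True, reduction="batchmean") +
              F.kl_div(p, q, log_target=True, reduction="batchmean"))
    return task_loss + lam * sym_kl
```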
17. Convolutional Complex Knowledge Graph Embeddings [PDF] Back to Contents
Caglar Demir, Axel-Cyrille Ngonga Ngomo
Abstract: In this paper, we study the problem of learning continuous vector representations of knowledge graphs for predicting missing links. We present a new approach called ConEx, which infers missing links by leveraging the composition of a 2D convolution with a Hermitian inner product of complex-valued embedding vectors. We evaluate ConEx against state-of-the-art approaches on the WN18RR, FB15K-237, KINSHIP and UMLS benchmark datasets. Our experimental results show that ConEx achieves a performance superior to that of state-of-the-art approaches such as RotatE, QuatE and TuckER on the link prediction task on all datasets while requiring at least 8 times fewer parameters. We ensure the reproducibility of our results by providing an open-source implementation which includes the training, evaluation scripts along with pre-trained models at this https URL.
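For concreteness, a numpy sketch of the Hermitian inner product of complex-valued embeddings at the heart of the scoring function; the convolutional composition that distinguishes ConEx is abstracted away here, and the embedding dimensionality is a placeholder.

```python
import numpy as np

def hermitian_score(head, rel, tail):
    """Real part of the Hermitian product <h, r, conj(t)> of complex
    embeddings -- the bilinear core that ConEx composes with a 2D
    convolution over the head and relation embeddings (omitted)."""
    return np.real(np.sum(head * rel * np.conj(tail)))

rng = np.random.default_rng(0)
d = 8  # placeholder dimensionality
h, r, t = (rng.normal(size=d) + 1j * rng.normal(size=d) for _ in range(3))
print(hermitian_score(h, r, t))  # higher score => triple judged more plausible
```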
18. Pretraining Techniques for Sequence-to-Sequence Voice Conversion [PDF] Back to Contents
Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda
Abstract: Sequence-to-sequence (seq2seq) voice conversion (VC) models are attractive owing to their ability to convert prosody. Nonetheless, without sufficient data, seq2seq VC models can suffer from unstable training and mispronunciation problems in the converted speech, and are thus far from practical. To tackle these shortcomings, we propose to transfer knowledge from other speech processing tasks where large-scale corpora are easily available, typically text-to-speech (TTS) and automatic speech recognition (ASR). We argue that VC models initialized with such pretrained ASR or TTS model parameters can generate effective hidden representations for high-fidelity, highly intelligible converted speech. We apply such techniques to recurrent neural network (RNN)-based and Transformer-based models, and through systematic experiments, we demonstrate the effectiveness of the pretraining scheme and the superiority of Transformer-based models over RNN-based models in terms of intelligibility, naturalness, and similarity.
19. Peking Opera Synthesis via Duration Informed Attention Network [PDF] 返回目录
Yusong Wu, Shengchen Li, Chengzhu Yu, Heng Lu, Chao Weng, Liqiang Zhang, Dong Yu
Abstract: Peking Opera has been the most dominant form of Chinese performing art for around 200 years. A Peking Opera singer usually exhibits a very strong personal style by introducing improvisation and expressiveness on stage, which leads the actual rhythm and pitch contour to deviate significantly from the original music score. This inconsistency poses a great challenge in synthesizing Peking Opera singing voice from a music score. In this work, we propose to deal with this issue and synthesize expressive Peking Opera singing from the music score based on the Duration Informed Attention Network (DurIAN) framework. To tackle the rhythm mismatch, a Lagrange multiplier is used to find the optimal output phoneme duration sequence under the constraint of the note duration given by the music score. As for the pitch contour mismatch, instead of directly inferring from the music score, we adopt a pseudo music score generated from the real singing and feed it as input during training. The experiments demonstrate that the proposed system can synthesize Peking Opera singing voice with high-quality timbre, pitch and expressiveness.
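The rhythm-fitting step has a simple closed form under a least-squares reading of the abstract: minimizing sum_i (d'_i - d_i)^2 subject to sum_i d'_i = D, where d_i are the predicted phoneme durations and D is the note duration from the score, a Lagrange multiplier yields a uniform per-phoneme correction. The sketch below implements that textbook solution; the paper's actual objective and constraints (e.g., non-negativity of durations) may differ.

```python
import numpy as np

def fit_durations(pred, note_duration):
    """Adjust predicted phoneme durations so they sum to the note duration.

    Minimizing sum((d' - d)^2) subject to sum(d') = D with a Lagrange
    multiplier gives the closed form d'_i = d_i + (D - sum(d)) / n,
    i.e., the total mismatch is spread evenly across the phonemes.
    """
    pred = np.asarray(pred, dtype=float)
    return pred + (note_duration - pred.sum()) / len(pred)

# Example: three phonemes predicted at 12, 30 and 18 frames must fill
# a 66-frame note; each duration gets +2 frames.
print(fit_durations([12, 30, 18], 66))  # -> [14. 32. 20.]
```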
20. Deep Active Learning with Crowdsourcing Data for Privacy Policy Classification [PDF] 返回目录
Wenjun Qiu, David Lie
Abstract: Privacy policies are statements that notify users of the services' data practices. However, few users are willing to read through policy texts due to their length and complexity. While automated tools based on machine learning exist for privacy policy analysis, classifiers need to be trained on a large labeled dataset to achieve high classification accuracy. Most existing policy corpora are labeled by skilled human annotators, requiring a significant amount of labor hours and effort. In this paper, we leverage active learning and crowdsourcing techniques to develop an automated classification tool named Calpric (Crowdsourcing Active Learning PRIvacy Policy Classifier), which is able to produce annotations equivalent to those of skilled human annotators with high accuracy while minimizing the labeling cost. Specifically, active learning allows classifiers to proactively select the most informative segments to be labeled. On average, our model is able to achieve the same F1 score using only 62% of the original labeling effort. Calpric's use of active learning also addresses the naturally occurring class imbalance in unlabeled privacy policy datasets, as statements declaring the collection of private information greatly outnumber statements denying it. By selecting samples from the minority class for labeling, Calpric automatically creates a more balanced training set.
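A minimal sketch of one acquisition round, assuming a scikit-learn-style classifier exposing `predict_proba`; least-confidence sampling stands in here for Calpric's actual acquisition function, which the abstract does not spell out.

```python
import numpy as np

def select_for_labeling(model, unlabeled_texts, batch_size=50):
    """Pick the policy segments the classifier is least certain about
    and return them for crowd labeling (least-confidence sampling)."""
    probs = model.predict_proba(unlabeled_texts)    # shape (n, n_classes)
    confidence = probs.max(axis=1)                  # top-class probability
    chosen = np.argsort(confidence)[:batch_size]    # least confident first
    return [unlabeled_texts[i] for i in chosen]
```

Minority-class segments (e.g., statements denying collection) tend to lie near the decision boundary, so a selection rule like this plausibly pulls more of them into the labeled pool, consistent with the rebalancing effect the abstract describes.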
21. A general solution to the preferential selection model [PDF] 返回目录
Jake Ryland Williams, Diana Solano-Oropeza, Jacob R. Hunsberger
Abstract: We provide a general analytic solution to Herbert Simon's 1955 model for time-evolving novelty functions. This has far-reaching consequences: Simon's is a precursor model to Barabasi's 1999 preferential attachment model for growing social networks, and our more general abstraction of it considers attachment to be a form of link selection. We show that any system which can be modeled as instances of types---i.e., occurrence data (frequencies)---can be generatively modeled (and simulated) from a distributional perspective with an exceptionally high degree of accuracy.
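For readers unfamiliar with the underlying process, here is a small simulation of a Simon-style preferential selection model: with probability alpha a new type enters; otherwise an existing token is repeated, so existing types are selected in proportion to their frequency. The value of alpha and the step count are illustrative, and this simulates the classic generative process rather than the paper's analytic solution.

```python
import random
from collections import Counter

def simon_process(n_steps: int, alpha: float = 0.1, seed: int = 0) -> Counter:
    """Simulate Simon's 1955 process: introduce a new type with
    probability alpha, otherwise repeat a uniformly chosen existing
    token, which selects types in proportion to their frequency."""
    rng = random.Random(seed)
    tokens = [0]       # start with a single instance of type 0
    next_type = 1
    for _ in range(n_steps):
        if rng.random() < alpha:
            tokens.append(next_type)
            next_type += 1
        else:
            tokens.append(rng.choice(tokens))  # frequency-proportional pick
    return Counter(tokens)

# The resulting type-frequency distribution is heavy-tailed (Zipf-like).
print(sorted(simon_process(10_000).values(), reverse=True)[:5])
```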