
[arXiv Papers] Computation and Language 2020-08-04

Contents

1. Interactive Text Graph Mining with a Prolog-based Dialog Engine [PDF] Abstract
2. OFAI-UKP at HAHA@IberLEF2019: Predicting the Humorousness of Tweets Using Gaussian Process Preference Learning [PDF] Abstract
3. LT@Helsinki at SemEval-2020 Task 12: Multilingual or language-specific BERT? [PDF] Abstract
4. Elsevier OA CC-By Corpus [PDF] Abstract
5. Deep Learning based Topic Analysis on Financial Emerging Event Tweets [PDF] Abstract
6. SemEval-2020 Task 5: Counterfactual Recognition [PDF] Abstract
7. Video Question Answering on Screencast Tutorials [PDF] Abstract
8. Investigating the Effect of Emoji in Opinion Classification of Uzbek Movie Review Comments [PDF] Abstract
9. Relation Extraction with Self-determined Graph Convolutional Network [PDF] Abstract
10. Multilingual Translation with Extensible Multilingual Pretraining and Finetuning [PDF] Abstract
11. A Text Classification Survey: From Shallow to Deep Learning [PDF] Abstract
12. Cross-context News Corpus for Protest Events related Knowledge Base Construction [PDF] Abstract
13. Overview of CLEF 2019 Lab ProtestNews: Extracting Protests from News in a Cross-context Setting [PDF] Abstract
14. Extracting actionable information from microtexts [PDF] Abstract
15. SemEval-2020 Task 7: Assessing Humor in Edited News Headlines [PDF] Abstract
16. The test set for the TransCoder system [PDF] Abstract
17. TweepFake: about Detecting Deepfake Tweets [PDF] Abstract
18. On The Plurality of Graphs [PDF] Abstract
19. A Survey on the Evolution of Stream Processing Systems [PDF] Abstract
20. Characterizing COVID-19 Misinformation Communities Using a Novel Twitter Dataset [PDF] Abstract
21. One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech [PDF] Abstract
22. Unsupervised Discovery of Recurring Speech Patterns Using Probabilistic Adaptive Metrics [PDF] Abstract
23. Multimodal Semi-supervised Learning Framework for Punctuation Prediction in Conversational Speech [PDF] Abstract
24. DeLighT: Very Deep and Light-weight Transformer [PDF] Abstract
25. Audiovisual Speech Synthesis using Tacotron2 [PDF] Abstract
26. Cross-Domain Adaptation of Spoken Language Identification for Related Languages: The Curious Case of Slavic Languages [PDF] Abstract
27. Large-scale, Language-agnostic Discourse Classification of Tweets During COVID-19 [PDF] Abstract
28. Bayesian Optimization for Selecting Efficient Machine Learning Models [PDF] Abstract
29. Trojaning Language Models for Fun and Profit [PDF] Abstract
30. Relational Teacher Student Learning with Neural Label Embedding for Device Adaptation in Acoustic Scene Classification [PDF] Abstract
31. An Acoustic Segment Model Based Segment Unit Selection Approach to Acoustic Scene Classification with Partial Utterances [PDF] Abstract
32. Back-propagation through Signal Temporal Logic Specifications: Infusing Logical Structure into Gradient-Based Methods [PDF] Abstract

Abstracts

1. Interactive Text Graph Mining with a Prolog-based Dialog Engine [PDF] back to contents
  Paul Tarau, Eduardo Blanco
Abstract: On top of a neural network-based dependency parser and a graph-based natural language processing module, we design a Prolog-based dialog engine that interactively explores a ranked fact database extracted from a text document. We reorganize dependency graphs to focus on the most relevant content elements of a sentence and integrate sentence identifiers as graph nodes. Additionally, after ranking the graph, we take advantage of the implicit semantic information that dependency links and WordNet bring in the form of subject-verb-object, is-a, and part-of relations. Working on the Prolog facts and their inferred consequences, the dialog engine specializes the text graph with respect to a query and interactively reveals the document's most relevant content elements. The open-source code of the integrated system is available at this https URL. Under consideration in Theory and Practice of Logic Programming (TPLP).

2. OFAI-UKP at HAHA@IberLEF2019: Predicting the Humorousness of Tweets Using Gaussian Process Preference Learning [PDF] back to contents
  Tristan Miller, Erik-Lân Do Dinh, Edwin Simpson, Iryna Gurevych
Abstract: Most humour processing systems to date make at best discrete, coarse-grained distinctions between the comical and the conventional, yet such notions are better conceptualized as a broad spectrum. In this paper, we present a probabilistic approach, a variant of Gaussian process preference learning (GPPL), that learns to rank and rate the humorousness of short texts by exploiting human preference judgments and automatically sourced linguistic annotations. We apply our system, which had previously shown good performance on English-language one-liners annotated with pairwise humorousness annotations, to the Spanish-language data set of the HAHA@IberLEF2019 evaluation campaign. We report system performance for the campaign's two subtasks, humour detection and funniness score prediction, and discuss some issues arising from the conversion between the numeric scores used in the HAHA@IberLEF2019 data and the pairwise judgment annotations required for our method.
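The full GPPL model is nontrivial to reproduce, but the underlying idea of inferring latent humorousness scores from pairwise judgments can be sketched with a much simpler Bradley-Terry model fit by gradient ascent. This is an illustrative stand-in, not the authors' method; the function name and the toy judgments are assumptions.

```python
import math

def bradley_terry(n_items, comparisons, iters=200, lr=0.1):
    """Fit latent scores from pairwise judgments by gradient ascent on the
    Bradley-Terry log-likelihood, where P(i beats j) = sigmoid(s_i - s_j)."""
    s = [0.0] * n_items
    for _ in range(iters):
        for winner, loser in comparisons:
            p = 1.0 / (1.0 + math.exp(-(s[winner] - s[loser])))
            step = lr * (1.0 - p)  # gradient of log P(winner beats loser)
            s[winner] += step
            s[loser] -= step
    return s

# Hypothetical judgments: tweet 0 is consistently funnier than 1, and 1 than 2.
judgments = [(0, 1)] * 10 + [(1, 2)] * 10
scores = bradley_terry(3, judgments)
ranking = sorted(range(3), key=lambda i: -scores[i])
print(ranking)  # [0, 1, 2]
```

The recovered ordering is what GPPL also targets; GPPL additionally places a Gaussian process prior over the scores so that linguistic features of unseen texts inform their ratings.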

3. LT@Helsinki at SemEval-2020 Task 12: Multilingual or language-specific BERT? [PDF] back to contents
  Marc Pàmies, Emily Öhman, Kaisla Kajava, Jörg Tiedemann
Abstract: This paper presents the different models submitted by the LT@Helsinki team for the SemEval 2020 Shared Task 12. Our team participated in sub-tasks A and C, titled offensive language identification and offense target identification, respectively. In both cases we used the so-called Bidirectional Encoder Representation from Transformer (BERT), a model pre-trained by Google and fine-tuned by us on the OLID and SOLID datasets. The results show that offensive tweet classification is one of several language-based tasks where BERT can achieve state-of-the-art results.

4. Elsevier OA CC-By Corpus [PDF] back to contents
  Daniel Kershaw, Rob Koeling
Abstract: We introduce the Elsevier OA CC-BY corpus. This is the first open corpus of scientific research papers with a representative sample from across scientific disciplines. The corpus includes not only the full text of each article but also the document metadata, along with the bibliographic information for each reference.

5. Deep Learning based Topic Analysis on Financial Emerging Event Tweets [PDF] back to contents
  Shaan Aryaman, Nguwi Yok Yen
Abstract: Financial analyses of stock markets rely heavily on quantitative approaches in an attempt to predict subsequent market movements based on historical prices and other measurable metrics. These quantitative analyses might have missed out on un-quantifiable aspects like sentiment and speculation that also impact the market. Analyzing vast amounts of qualitative text data to understand public opinion on social media platforms is one approach to address this gap. This work carried out topic analysis on 28264 financial tweets [1] via clustering to discover emerging events in the stock market. Three main topics were found to be discussed frequently within the period. First, the financial ratio EPS is a measure that has been discussed frequently by investors. Secondly, short selling of shares was discussed heavily, often mentioned together with Morgan Stanley. Thirdly, oil and energy sectors were often discussed together with policy. These tweets were semantically clustered by a method consisting of the word2vec algorithm to obtain word embeddings that map words to vectors. Semantic word clusters were then formed. Each tweet was then vectorized using the Term Frequency-Inverse Document Frequency (TF-IDF) values of the words it consisted of and based on which clusters its words were in. Tweet vectors were then converted to compressed representations by training a deep autoencoder. K-means clusters were then formed. This method reduces dimensionality and produces dense vectors, in contrast to the usual Vector Space Model. Topic modelling with Latent Dirichlet Allocation (LDA) and top frequent words were used to analyze clusters and reveal emerging events.
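The TF-IDF vectorization step of the pipeline above can be sketched in a few lines. This is a minimal pure-Python stand-in, not the paper's implementation; the toy tweets and helper names are assumptions, and the word2vec, autoencoder, and K-means stages are omitted.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF vectors (term -> weight dicts) for a list of tokenized docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / df[term]) for term in df}
    return [{term: count / len(doc) * idf[term]
             for term, count in Counter(doc).items()} for doc in docs]

def cosine(a, b):
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

tweets = [  # toy stand-ins for the financial tweets
    "eps earnings beat estimates".split(),
    "strong eps growth this quarter".split(),
    "oil prices slide on policy news".split(),
    "energy sector hit by oil policy".split(),
]
vecs = tfidf_vectors(tweets)
sim_same = cosine(vecs[0], vecs[1])   # both about EPS
sim_diff = cosine(vecs[0], vecs[2])   # EPS topic vs oil/policy topic
print(sim_same > sim_diff)  # True
```

Tweets about the same emerging topic end up closer in this space, which is what makes the subsequent clustering meaningful.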

6. SemEval-2020 Task 5: Counterfactual Recognition [PDF] back to contents
  Xiaoyu Yang, Stephen Obadinma, Huasha Zhao, Qiong Zhang, Stan Matwin, Xiaodan Zhu
Abstract: We present a counterfactual recognition (CR) task, the shared Task 5 of SemEval-2020. Counterfactuals describe potential outcomes (consequents) produced by actions or circumstances that did not happen or cannot happen and are counter to the facts (antecedent). Counterfactual thinking is an important characteristic of the human cognitive system; it connects antecedents and consequents with causal relations. Our task provides a benchmark for counterfactual recognition in natural language with two subtasks. Subtask-1 aims to determine whether a given sentence is a counterfactual statement or not. Subtask-2 requires the participating systems to extract the antecedent and consequent in a given counterfactual statement. During the SemEval-2020 official evaluation period, we received 27 submissions to Subtask-1 and 11 to Subtask-2. The data, baseline code, and leaderboard can be found at this https URL. The data and baseline code are also available at this https URL.
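To give a feel for Subtask-2, a deliberately naive rule-based extractor can pull an antecedent and consequent out of an explicitly marked conditional. This regex sketch only illustrates the task format, not a competitive baseline; the pattern and example sentence are assumptions.

```python
import re

# Naive pattern: "If <antecedent>, <consequent containing a modal verb>".
PATTERN = re.compile(
    r"^[Ii]f\s+(?P<antecedent>[^,]+),\s*(?P<consequent>.*\b(?:would|could|might)\b.*)$")

def extract_counterfactual(sentence):
    """Return (antecedent, consequent) for an explicitly marked conditional,
    or None when the naive pattern does not apply."""
    m = PATTERN.match(sentence)
    return (m.group("antecedent").strip(), m.group("consequent").strip()) if m else None

s = "If the meeting had started on time, we would have caught the train."
print(extract_counterfactual(s))
print(extract_counterfactual("The train was late."))  # None
```

Real counterfactuals rarely follow a single surface template, which is why the shared task's participating systems use learned span extractors instead.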

7. Video Question Answering on Screencast Tutorials [PDF] back to contents
  Wentian Zhao, Seokhwan Kim, Ning Xu, Hailin Jin
Abstract: This paper presents a new video question answering task on screencast tutorials. We introduce a dataset including question, answer, and context triples from the tutorial videos for a software product. Unlike other video question answering works, all the answers in our dataset are grounded to the domain knowledge base. A one-shot recognition algorithm is designed to extract the visual cues, which helps enhance the performance of video question answering. We also propose several baseline neural network architectures based on various aspects of video contexts from the dataset. The experimental results demonstrate that our proposed models significantly improve the question answering performance by incorporating multi-modal contexts and domain knowledge.

8. Investigating the Effect of Emoji in Opinion Classification of Uzbek Movie Review Comments [PDF] back to contents
  Ilyos Rabbimov, Iosif Mporas, Vasiliki Simaki, Sami Kobilov
Abstract: Opinion mining on social media posts has become more and more popular. Users often express their opinion on a topic not only with words but also with image symbols such as emoticons and emoji. In this paper, we investigate the effect of emoji-based features in opinion classification of Uzbek texts, and more specifically movie review comments from YouTube. Several classification algorithms are tested, and feature ranking is performed to evaluate the discriminative ability of the emoji-based features.

9. Relation Extraction with Self-determined Graph Convolutional Network [PDF] back to contents
  Sunil Kumar Sahu, Derek Thomas, Billy Chiu, Neha Sengupta, Mohammady Mahdy
Abstract: Relation extraction is a way of obtaining the semantic relationship between entities in text. State-of-the-art methods use linguistic tools to build a graph for the text in which the entities appear, and then a Graph Convolutional Network (GCN) is employed to encode the pre-built graphs. Although their performance is promising, the reliance on linguistic tools results in a non-end-to-end process. In this work, we propose a novel model, the Self-determined Graph Convolutional Network (SGCN), which determines a weighted graph using a self-attention mechanism rather than any linguistic tool. The self-determined graph is then encoded using a GCN. We test our model on the TACRED dataset and achieve the state-of-the-art result. Our experiments show that SGCN outperforms the traditional GCN, which uses dependency parsing tools to build the graph.
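The core SGCN idea, inducing the adjacency matrix with self-attention instead of a dependency parse, can be sketched as follows. This is a minimal NumPy illustration under assumed shapes and random weights, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_adjacency(H, Wq, Wk):
    """Induce a weighted token graph via scaled dot-product self-attention,
    replacing an adjacency matrix built from a dependency parse."""
    Q, K = H @ Wq, H @ Wk
    return softmax(Q @ K.T / np.sqrt(K.shape[1]))   # (n, n); rows sum to 1

def gcn_layer(A, H, W):
    """One graph convolution: aggregate neighbours under A, project, ReLU."""
    return np.maximum(A @ H @ W, 0.0)

n, d, d_out = 5, 8, 4                   # 5 tokens, hidden size 8 (assumed)
H = rng.standard_normal((n, d))         # token representations
Wq, Wk = rng.standard_normal((d, d)), rng.standard_normal((d, d))
W = rng.standard_normal((d, d_out))

A = self_attention_adjacency(H, Wq, Wk)
H_out = gcn_layer(A, H, W)
print(A.shape, H_out.shape)  # (5, 5) (5, 4)
```

Because the graph is computed from the token representations themselves, the whole pipeline is differentiable end-to-end, which is the property the abstract contrasts with parser-dependent GCNs.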

10. Multilingual Translation with Extensible Multilingual Pretraining and Finetuning [PDF] back to contents
  Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan
Abstract: Recent work demonstrates the potential of multilingual pretraining for creating one model that can be used for various tasks in different languages. Previous work in multilingual pretraining has demonstrated that machine translation systems can be created by finetuning on bitext. In this work, we show that multilingual translation models can be created through multilingual finetuning. Instead of finetuning on one direction, a pretrained model is finetuned on many directions at the same time. Compared to multilingual models trained from scratch, starting from pretrained models incorporates the benefits of large quantities of unlabeled monolingual data, which is particularly important for low resource languages where bitext is not available. We demonstrate that pretrained models can be extended to incorporate additional languages without loss of performance. We double the number of languages in mBART to support multilingual machine translation models of 50 languages. Finally, we create the ML50 benchmark, covering low, mid, and high resource languages, to facilitate reproducible research by standardizing training and evaluation data. On ML50, we demonstrate that multilingual finetuning improves by 1 BLEU on average over the strongest baselines (either multilingual from scratch or bilingual finetuning), while improving by 9.3 BLEU on average over bilingual baselines trained from scratch.

11. A Text Classification Survey: From Shallow to Deep Learning [PDF] back to contents
  Qian Li, Hao Peng, Jianxin Li, Congyin Xia, Renyu Yang, Lichao Sun, Philip S. Yu, Lifang He
Abstract: Text classification is the most fundamental and essential task in natural language processing. The last decade has seen a surge of research in this area due to the unprecedented success of deep learning. Numerous methods, datasets, and evaluation metrics have been proposed in the literature, raising the need for a comprehensive and updated survey. This paper fills the gap by reviewing the state-of-the-art approaches from 1961 to 2020, focusing on models from shallow to deep learning. We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification. We then discuss each of these categories in detail, dealing with both the technical developments and the benchmark datasets that support tests of predictions. A comprehensive comparison of different techniques, as well as the pros and cons of various evaluation metrics, is also provided in this survey. Finally, we conclude by summarizing key implications, future research directions, and the challenges facing the research area.
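As an example of the "shallow" end of the spectrum the survey covers, a multinomial Naive Bayes text classifier fits in a few lines. This is a generic textbook sketch; the toy corpus and class names are assumptions.

```python
import math
from collections import Counter

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (textbook version)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for doc, c in zip(docs, labels):
            self.counts[c].update(doc)
            self.vocab.update(doc)
        return self

    def predict(self, doc):
        def log_posterior(c):
            total = sum(self.counts[c].values()) + len(self.vocab)
            return self.prior[c] + sum(
                math.log((self.counts[c][t] + 1) / total) for t in doc)
        return max(self.classes, key=log_posterior)

train = [  # toy two-class corpus (assumed)
    ("shares rally on strong earnings".split(), "finance"),
    ("quarterly profit beats forecast".split(), "finance"),
    ("team wins the championship final".split(), "sports"),
    ("striker scores twice in derby".split(), "sports"),
]
nb = NaiveBayes().fit([d for d, _ in train], [c for _, c in train])
print(nb.predict("profit rally expected".split()))   # finance
print(nb.predict("striker wins derby".split()))      # sports
```

Deep models in the survey's taxonomy replace the bag-of-words counts with learned representations but keep the same classify-by-highest-score structure.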

12. Cross-context News Corpus for Protest Events related Knowledge Base Construction [PDF] back to contents
  Ali Hürriyetoğlu, Erdem Yörük, Deniz Yüret, Osman Mutlu, Çağrı Yoltar, Fırat Duruşan, Burak Gürel
Abstract: We describe a gold standard corpus of protest events that comprises local and international English-language sources from various countries. The corpus contains document, sentence, and token level annotations. This corpus facilitates creating machine learning models that automatically classify news articles and extract protest event-related information, constructing knowledge bases which enable comparative social and political science studies. For each news source, the annotation starts on random samples of news articles and continues with samples that are drawn using active learning. Each batch of samples was annotated by two social and political scientists, adjudicated by an annotation supervisor, and improved by identifying annotation errors semi-automatically. We found that the corpus has the variety and quality to develop and benchmark text classification and event extraction systems in a cross-context setting, which contributes to the generalizability and robustness of automated text processing systems. This corpus and the reported results will set the currently lacking common ground in automated protest event collection studies.

13. Overview of CLEF 2019 Lab ProtestNews: Extracting Protests from News in a Cross-context Setting [PDF] back to contents
  Ali Hürriyetoğlu, Erdem Yörük, Deniz Yüret, Çağrı Yoltar, Burak Gürel, Fırat Duruşan, Osman Mutlu, Arda Akdemir
Abstract: We present an overview of the CLEF-2019 Lab ProtestNews on Extracting Protests from News in the context of generalizable natural language processing. The lab consists of document, sentence, and token level information classification and extraction tasks, referred to as task 1, task 2, and task 3 respectively in the scope of this lab. The tasks required the participants to identify protest-relevant information from English local news at one or more of the aforementioned levels in a cross-context setting, which is cross-country in the scope of this lab. The training and development data were collected from India, and the test data were collected from India and China. The lab attracted 58 teams; 12 of them submitted results and 9 submitted working notes. We observed that neural networks yield the best results and that performance drops significantly for the majority of submissions in the cross-country setting, i.e., China.

14. Extracting actionable information from microtexts [PDF] back to contents
  Ali Hürriyetoğlu
Abstract: Microblogs such as Twitter represent a powerful source of information. Part of this information can be aggregated beyond the level of individual posts. Some of this aggregated information refers to events that could or should be acted upon in the interest of e-governance, public safety, or other levels of public interest. Moreover, a significant amount of this information, if aggregated, could complement existing information networks in a non-trivial way. This dissertation proposes a semi-automatic method for extracting actionable information that serves this purpose. First, we show that predicting time to event is possible for both in-domain and cross-domain scenarios. Second, we suggest a method which facilitates the definition of relevance for an analyst's context and the use of this definition to analyze new data. Finally, we propose a method to integrate the machine learning based relevant information classification method with a rule-based information classification technique to classify microtexts. Fully automating microtext analysis has been our goal since the first day of this research project. Our efforts in this direction informed us about the extent to which this automation can be realized. We mostly first developed an automated approach, then extended and improved it by integrating human intervention at various steps of the automated approach. Our experience confirms previous work stating that a well-designed human intervention or contribution in the design, realization, or evaluation of an information system either improves its performance or enables its realization. As our studies and results directed us toward its necessity and value, we were inspired by previous studies in designing human involvement and customized our approaches to benefit from human input.

15. SemEval-2020 Task 7: Assessing Humor in Edited News Headlines [PDF] back to contents
  Nabil Hossain, John Krumm, Michael Gamon, Henry Kautz
Abstract: This paper describes the SemEval-2020 shared task "Assessing Humor in Edited News Headlines." The task's dataset contains news headlines in which short edits were applied to make them funny, and the funniness of these edited headlines was rated using crowdsourcing. This task includes two subtasks, the first of which is to estimate the funniness of headlines on a humor scale in the interval 0-3. The second subtask is to predict, for a pair of edited versions of the same original headline, which is the funnier version. To date, this task is the most popular shared computational humor task, attracting 48 teams for the first subtask and 31 teams for the second.

16. The test set for the TransCoder system [PDF] back to contents
  Ernest Davis
Abstract: The TransCoder system translates source code between Java, C++, and Python 3. The test set that was used to evaluate its quality is missing important features of Java, including the ability to define and use classes and the ability to call user-defined functions other than recursively. Therefore, the accuracy of TransCoder over programs with those features remains unknown.

17. TweepFake: about Detecting Deepfake Tweets [PDF] back to contents
  Tiziano Fagni, Fabrizio Falchi, Margherita Gambini, Antonio Martella, Maurizio Tesconi
Abstract: The threat of deepfakes, synthetic, or manipulated media, is becoming increasingly alarming, especially for social media platforms that have already been accused of manipulating public opinion. Even the cheapest text generation techniques (e.g. the search-and-replace method) can deceive humans, as the Net Neutrality scandal proved in 2017. Meanwhile, more powerful generative models have been released, from RNN-based methods to the GPT-2 language model. State-of-the-art language models, transformer-based in particular, can generate synthetic text in response to the model being primed with arbitrary input. It is therefore crucial to develop tools that help to detect media authenticity. To help research in this field, we collected a dataset of real deepfake tweets, real in the sense that each deepfake tweet was actually posted on Twitter. We collected tweets from a total of 23 bots, imitating 17 human accounts. The bots are based on various generation techniques, i.e., Markov Chains, RNN, RNN+Markov, LSTM, GPT-2. We also randomly selected tweets from the humans imitated by the bots to obtain an overall balanced dataset of 25,836 tweets (half human- and half bot-generated). The dataset is publicly available on Kaggle. In order to create a solid baseline for detection techniques on the proposed dataset, we tested 13 detection methods based on various state-of-the-art approaches. The detection results, reported as a baseline using the 13 detection methods, confirm that the newest and more sophisticated generative methods based on the transformer architecture (e.g., GPT-2) can produce high-quality short texts that are difficult to detect.

18. On The Plurality of Graphs [PDF] back to contents
  Nicole Fitzgerald, Jacopo Tagliabue
Abstract: We conduct a series of experiments designed to empirically demonstrate the effects of varying the structural features of a multi-agent emergent communication game framework. Specifically, we model the interactions (edges) between individual agents (nodes)as the structure of a graph generated according to a series of known random graph generating algorithms. Confirming the hypothesis proposed in [10], we show that the two factors of variation induced in this work, namely 1) the graph-generating process and 2) the centrality measure according to which edges are sampled, in fact play a significant role in determining the dynamics of language emergence within the population at hand.
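A minimal version of the setup described above, generating a G(n, p) random graph and sampling interaction edges according to a centrality measure, might look like this. The generator choice, the degree-centrality measure, and the sampling scheme are illustrative assumptions, not the paper's exact protocol.

```python
import random

def erdos_renyi(n, p, seed=0):
    """Undirected G(n, p) random graph as a set of (i, j) edges with i < j."""
    rnd = random.Random(seed)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if rnd.random() < p}

def degree_centrality(n, edges):
    """Fraction of possible neighbours each node is connected to."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return [d / (n - 1) for d in deg]

def sample_edges_by_centrality(edges, centrality, k, seed=0):
    """Sample k interaction edges, weighted by summed endpoint centrality."""
    rnd = random.Random(seed)
    population = sorted(edges)
    weights = [centrality[i] + centrality[j] for i, j in population]
    return rnd.choices(population, weights=weights, k=k)

n = 20
edges = erdos_renyi(n, 0.2)
cent = degree_centrality(n, edges)
interactions = sample_edges_by_centrality(edges, cent, k=5)
print(len(interactions))  # 5
```

Varying the graph-generating process and the centrality measure used for sampling, the two factors the abstract highlights, then changes which agent pairs interact most often during the emergent-communication game.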

19. A Survey on the Evolution of Stream Processing Systems [PDF] back to contents
  Marios Fragkoulis, Paris Carbone, Vasiliki Kalavri, Asterios Katsifodimos
Abstract: Stream processing has been an active research field for more than 20 years, but it is now witnessing its prime time due to recent successful efforts by the research community and numerous worldwide open-source communities. This survey provides a comprehensive overview of fundamental aspects of stream processing systems and their evolution in the functional areas of out-of-order data management, state management, fault tolerance, high availability, load management, elasticity, and reconfiguration. We review noteworthy past research findings, outline the similarities and differences between early ('00-'10) and modern ('11-'18) streaming systems, and discuss recent trends and open problems.

20. Characterizing COVID-19 Misinformation Communities Using a Novel Twitter Dataset [PDF] back to contents
  Shahan Ali Memon, Kathleen M. Carley
Abstract: From conspiracy theories to fake cures and fake treatments, COVID-19 has become a hotbed for the spread of misinformation online. It is more important than ever to identify methods to debunk and correct false information online. In this paper, we present a methodology and analyses to characterize the two competing COVID-19 misinformation communities online: (i) misinformed users, or users who are actively posting misinformation, and (ii) informed users, or users who are actively spreading true information or calling out misinformation. The goals of this study are two-fold: (i) collecting a diverse annotated COVID-19 Twitter dataset that can be used by the research community to conduct meaningful analysis; and (ii) characterizing the two target communities in terms of their network structure, linguistic patterns, and their membership in other communities. Our analyses show that COVID-19 misinformed communities are denser and more organized than informed communities, with a high volume of the misinformation being part of disinformation campaigns. Our analyses also suggest that a large majority of misinformed users may be anti-vaxxers. Finally, our sociolinguistic analyses suggest that COVID-19 informed users tend to use more narratives than misinformed users.

21. One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech [PDF] back to contents
  Tomáš Nekvinda, Ondřej Dušek
Abstract: We introduce an approach to multilingual speech synthesis which uses the meta-learning concept of contextual parameter generation and produces natural-sounding multilingual speech using more languages and less training data than previous approaches. Our model is based on Tacotron 2 with a fully convolutional input text encoder whose weights are predicted by a separate parameter generator network. To boost voice cloning, the model uses an adversarial speaker classifier with a gradient reversal layer that removes speaker-specific information from the encoder. We arranged two experiments to compare our model with baselines using various levels of cross-lingual parameter sharing, in order to evaluate: (1) stability and performance when training on low amounts of data, (2) pronunciation accuracy and voice quality of code-switching synthesis. For training, we used the CSS10 dataset and our new small dataset based on Common Voice recordings in five languages. Our model is shown to effectively share information across languages and according to a subjective evaluation test, it produces more natural and accurate code-switching speech than the baselines.
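The gradient reversal layer mentioned in the abstract is a standard trick for adversarial training: it acts as the identity in the forward pass, but flips (and scales) the gradient in the backward pass, so the encoder learns to *remove* the information the adversarial speaker classifier could use. A minimal framework-free sketch of the idea (not the authors' implementation; the scaling factor `lam` is an assumed hyperparameter):

```python
class GradientReversal:
    """Identity forward, negated/scaled gradient backward.

    Placed between the text encoder and the adversarial speaker
    classifier so the encoder is pushed to produce speaker-agnostic
    representations.
    """

    def __init__(self, lam=1.0):
        self.lam = lam  # strength of the adversarial signal

    def forward(self, x):
        # Features pass through unchanged to the speaker classifier.
        return x

    def backward(self, grad_output):
        # The classifier's gradient is reversed before it reaches the
        # encoder, turning "classify the speaker better" into
        # "make the speaker harder to classify".
        return [-self.lam * g for g in grad_output]
```

In an autograd framework this would be implemented as a custom function with these exact forward/backward semantics.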

22. Unsupervised Discovery of Recurring Speech Patterns Using Probabilistic Adaptive Metrics [PDF]
  Okko Räsänen, María Andrea Cruz Blandón
Abstract: Unsupervised spoken term discovery (UTD) aims at finding recurring segments of speech from a corpus of acoustic speech data. One potential approach to this problem is to use dynamic time warping (DTW) to find well-aligning patterns from the speech data. However, automatic selection of initial candidate segments for the DTW-alignment and detection of "sufficiently good" alignments among those require some type of pre-defined criteria, often operationalized as threshold parameters for pair-wise distance metrics between signal representations. In the existing UTD systems, the optimal hyperparameters may differ across datasets, limiting their applicability to new corpora and truly low-resource scenarios. In this paper, we propose a novel probabilistic approach to DTW-based UTD named PDTW. In PDTW, distributional characteristics of the processed corpus are utilized for adaptive evaluation of alignment quality, thereby enabling systematic discovery of pattern pairs that have similarity beyond what would be expected by coincidence. We test PDTW on the Zero Resource Speech Challenge 2017 datasets as a part of the 2020 implementation of the challenge. The results show that the system performs consistently on all five tested languages using fixed hyperparameters, clearly outperforming the earlier DTW-based system in terms of coverage of the detected patterns.
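The core contrast in the abstract is between a fixed, hand-tuned DTW cost threshold and an adaptive one derived from the corpus's own distance distribution. The sketch below illustrates that idea with a length-normalized 1-D DTW cost and a percentile-based cutoff; note that PDTW itself fits a probabilistic model to the distance distribution, so the percentile rule here is a simplified stand-in:

```python
def dtw_cost(a, b, dist=lambda x, y: abs(x - y)):
    """Classic DTW alignment cost between two 1-D sequences,
    normalized by combined length so costs are comparable across
    segment pairs of different durations."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            D[i][j] = d + min(D[i - 1][j],      # insertion
                              D[i][j - 1],      # deletion
                              D[i - 1][j - 1])  # match
    return D[n][m] / (n + m)


def adaptive_threshold(costs, percentile=5):
    """Adaptive criterion in the spirit of PDTW: accept only alignments
    whose cost falls in a low percentile of the corpus-wide cost
    distribution, instead of using a fixed hand-tuned threshold."""
    ranked = sorted(costs)
    k = max(0, int(len(ranked) * percentile / 100) - 1)
    return ranked[k]
```

A candidate pattern pair would then be kept only if `dtw_cost(a, b)` is below `adaptive_threshold(all_pair_costs)`.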

23. Multimodal Semi-supervised Learning Framework for Punctuation Prediction in Conversational Speech [PDF]
  Monica Sunkara, Srikanth Ronanki, Dhanush Bekal, Sravan Bodapati, Katrin Kirchhoff
Abstract: In this work, we explore a multimodal semi-supervised learning approach for punctuation prediction by learning representations from large amounts of unlabelled audio and text data. Conventional approaches in speech processing typically use forced alignment to encode per-frame acoustic features into word-level features and perform multimodal fusion of the resulting acoustic and lexical representations. As an alternative, we explore attention-based multimodal fusion and compare its performance with forced-alignment-based fusion. Experiments conducted on the Fisher corpus show that our proposed approach achieves ~6-9% and ~3-4% absolute improvement (F1 score) over the baseline BLSTM model on reference transcripts and ASR outputs, respectively. We further improve the model robustness to ASR errors by performing data augmentation with N-best lists, which achieves up to an additional ~2-6% improvement on ASR outputs. We also demonstrate the effectiveness of the semi-supervised learning approach by performing an ablation study on various sizes of the corpus. When trained on 1 hour of speech and text data, the proposed model achieved ~9-18% absolute improvement over the baseline model.
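The attention-based fusion contrasted with forced alignment above can be sketched as a word embedding attending over frame-level acoustic features: dot-product scores, a softmax, and a weighted sum producing a word-aligned acoustic context vector. This is a minimal illustrative sketch of the mechanism, not the paper's architecture:

```python
import math

def attention_fusion(word_vec, frames):
    """Fuse a lexical (word) vector with frame-level acoustic features
    via dot-product attention, avoiding an explicit forced alignment.

    word_vec: list[float], one word embedding
    frames:   list[list[float]], acoustic feature vectors (same dim)
    Returns (context, weights): the attended acoustic summary for this
    word and the attention distribution over frames.
    """
    # Dot-product relevance score of each frame to the word.
    scores = [sum(w * f for w, f in zip(word_vec, fr)) for fr in frames]
    # Numerically stable softmax over frames.
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of frames = acoustic context aligned to the word.
    context = [sum(w * fr[d] for w, fr in zip(weights, frames))
               for d in range(len(frames[0]))]
    return context, weights
```

The context vector would then be concatenated with the lexical representation before the punctuation classifier.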

24. DeLighT: Very Deep and Light-weight Transformer [PDF]
  Sachin Mehta, Marjan Ghazvininejad, Srinivasan Iyer, Luke Zettlemoyer, Hannaneh Hajishirzi
Abstract: We introduce a very deep and light-weight transformer, DeLighT, that delivers similar or better performance than transformer-based models with significantly fewer parameters. DeLighT more efficiently allocates parameters both (1) within each Transformer block using DExTra, a deep and light-weight transformation, and (2) across blocks using block-wise scaling, which allows for shallower and narrower DeLighT blocks near the input and wider and deeper DeLighT blocks near the output. Overall, DeLighT networks are 2.5 to 4 times deeper than standard transformer models and yet have fewer parameters and operations. Experiments on machine translation and language modeling tasks show that DeLighT matches the performance of baseline Transformers with significantly fewer parameters. On the WMT'14 En-Fr high resource dataset, DeLighT requires 1.8 times fewer parameters and 2 times fewer operations and achieves better performance (+0.4 BLEU score) than baseline transformers. On the WMT'16 En-Ro low resource dataset, DeLighT delivers similar performance with 2.8 times fewer parameters than baseline transformers.
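The block-wise scaling described above (shallower blocks near the input, deeper blocks near the output) amounts to interpolating a per-block depth between a minimum and a maximum. A small sketch of that schedule, with linear interpolation as an assumption about the exact curve and `n_min`/`n_max` as illustrative values:

```python
def blockwise_depths(n_blocks, n_min=4, n_max=8):
    """Depth of the DExTra transformation assigned to each of n_blocks
    DeLighT blocks: shallow near the input, deep near the output,
    linearly interpolated in between."""
    if n_blocks == 1:
        return [n_max]
    return [round(n_min + (n_max - n_min) * b / (n_blocks - 1))
            for b in range(n_blocks)]
```

Under this schedule a 5-block stack would use depths 4, 5, 6, 7, 8, concentrating parameters where the representation is most abstract.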

25. Audiovisual Speech Synthesis using Tacotron2 [PDF]
  Ahmed Hussen Abdelaziz, Anushree Prasanna Kumar, Chloe Seivwright, Gabriele Fanelli, Justin Binder, Yannis Stylianou, Sachin Kajarekar
Abstract: Audiovisual speech synthesis is the problem of synthesizing a talking face while maximizing the coherency of the acoustic and visual speech. In this paper, we propose and compare two audiovisual speech synthesis systems for 3D face models. The first system is the AVTacotron2, which is an end-to-end text-to-audiovisual speech synthesizer based on the Tacotron2 architecture. AVTacotron2 converts a sequence of phonemes representing the sentence to synthesize into a sequence of acoustic features and the corresponding controllers of a face model. The output acoustic features are used to condition a WaveRNN to reconstruct the speech waveform, and the output facial controllers are used to generate the corresponding video of the talking face. The second audiovisual speech synthesis system is modular, where acoustic speech is synthesized from text using the traditional Tacotron2. The reconstructed acoustic speech signal is then used to drive the facial controls of the face model using an independently trained audio-to-facial-animation neural network. We further condition both the end-to-end and modular approaches on emotion embeddings that encode the required prosody to generate emotional audiovisual speech. We analyze the performance of the two systems and compare them to the ground truth videos using subjective evaluation tests. The end-to-end and modular systems are able to synthesize close to human-like audiovisual speech with mean opinion scores (MOS) of 4.1 and 3.9, respectively, compared to a MOS of 4.1 for the ground truth generated from professionally recorded videos. While the end-to-end system gives a better overall quality, the modular approach is more flexible and the quality of acoustic speech and visual speech synthesis is almost independent of each other.

26. Cross-Domain Adaptation of Spoken Language Identification for Related Languages: The Curious Case of Slavic Languages [PDF]
  Badr Abdullah, Tania Avgustinova, Bernd Möbius, Dietrich Klakow
Abstract: State-of-the-art spoken language identification (LID) systems, which are based on end-to-end deep neural networks, have shown remarkable success not only in discriminating between distant languages but also between closely-related languages or even different spoken varieties of the same language. However, it is still unclear to what extent neural LID models generalize to speech samples with different acoustic conditions due to domain shift. In this paper, we present a set of experiments to investigate the impact of domain mismatch on the performance of neural LID systems for a subset of six Slavic languages across two domains (read speech and radio broadcast) and examine two low-level signal descriptors (spectral and cepstral features) for this task. Our experiments show that (1) out-of-domain speech samples severely hinder the performance of neural LID models, and (2) while both spectral and cepstral features show comparable performance within-domain, spectral features show more robustness under domain mismatch. Moreover, we apply unsupervised domain adaptation to minimize the discrepancy between the two domains in our study. We achieve relative accuracy improvements that range from 9% to 77% depending on the diversity of acoustic conditions in the source domain.

27. Large-scale, Language-agnostic Discourse Classification of Tweets During COVID-19 [PDF]
  Oguzhan Gencoglu
Abstract: Quantifying the characteristics of public attention is an essential prerequisite for appropriate crisis management during severe events such as pandemics. For this purpose, we propose language-agnostic tweet representations to perform large-scale Twitter discourse classification with machine learning. Our analysis on more than 26 million COVID-19 tweets shows that large-scale surveillance of public discourse is feasible with computationally lightweight classifiers by out-of-the-box utilization of these representations.

28. Bayesian Optimization for Selecting Efficient Machine Learning Models [PDF]
  Lidan Wang, Franck Dernoncourt, Trung Bui
Abstract: The performance of many machine learning models depends on their hyper-parameter settings. Bayesian Optimization has become a successful tool for hyper-parameter optimization of machine learning algorithms; it aims to identify optimal hyper-parameters during an iterative sequential process. However, most Bayesian Optimization algorithms are designed to select models for effectiveness only and ignore the important issue of model training efficiency. Given that both model effectiveness and training time are important for real-world applications, models selected for effectiveness may not meet the strict training time requirements necessary to deploy in a production environment. In this work, we present a unified Bayesian Optimization framework for jointly optimizing models for both prediction effectiveness and training efficiency. We propose an objective that captures the tradeoff between these two metrics and demonstrate how we can jointly optimize them in a principled Bayesian Optimization framework. Experiments on model selection for recommendation tasks indicate that models selected this way significantly improve training efficiency while maintaining strong effectiveness, as compared to state-of-the-art Bayesian Optimization algorithms.
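The tradeoff objective described above can be illustrated as a single scalar score that a Bayesian optimizer would maximize over hyper-parameter configurations. The linear penalty and the weight `alpha` below are assumptions for illustration; the paper's actual formulation may differ:

```python
def combined_objective(effectiveness, train_time, alpha=0.1):
    """One scalar score trading off prediction effectiveness against
    training cost. Higher is better; alpha sets how many units of
    effectiveness one unit of training time is worth."""
    return effectiveness - alpha * train_time


def pick_model(candidates, alpha=0.1):
    """candidates: list of (name, effectiveness, training_time).
    Returns the name of the candidate with the best combined score,
    mimicking what the acquisition loop would converge toward."""
    return max(candidates,
               key=lambda c: combined_objective(c[1], c[2], alpha))[0]
```

With `alpha = 0.1`, a model at 0.92 accuracy costing 10 hours scores below a 0.90 model costing 1 hour, so the cheaper model wins, which is exactly the effect an efficiency-aware objective is meant to have.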

29. Trojaning Language Models for Fun and Profit [PDF]
  Xinyang Zhang, Zheng Zhang, Ting Wang
Abstract: Recent years have witnessed a new paradigm of building natural language processing (NLP) systems: general-purpose, pre-trained language models (LMs) are fine-tuned with simple downstream models to attain state-of-the-art performance for a variety of target tasks. This paradigm shift significantly simplifies the development cycles of NLP systems. Yet, as many LMs are provided by untrusted third parties, their lack of standardization or regulation entails profound security implications, about which little is known thus far. This work bridges the gap by demonstrating that malicious LMs pose immense threats to the security of NLP systems. Specifically, we present TROJAN-ML, a new class of trojaning attacks in which maliciously crafted LMs trigger host NLP systems to malfunction in a highly predictable manner. By empirically studying three state-of-the-art LMs (BERT, GPT-2, XLNet) in a range of security-sensitive NLP tasks (toxic comment classification, question answering, text completion), we demonstrate that TROJAN-ML possesses the following properties: (i) efficacy - the host systems misbehave as desired by the adversary with high probability, (ii) specificity - the trojaned LMs function indistinguishably from their benign counterparts on non-target inputs, and (iii) fluency - the trigger-embedded sentences are highly indistinguishable from natural language and highly relevant to the surrounding contexts. We provide analytical justification for the practicality of TROJAN-ML, which points to the unprecedented complexity of today's LMs. We further discuss potential countermeasures and their challenges, which lead to several promising research directions.

30. Relational Teacher Student Learning with Neural Label Embedding for Device Adaptation in Acoustic Scene Classification [PDF]
  Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Chin-Hui Lee
Abstract: In this paper, we propose a domain adaptation framework to address the device mismatch issue in acoustic scene classification leveraging upon neural label embedding (NLE) and relational teacher student learning (RTSL). Taking into account the structural relationships between acoustic scene classes, our proposed framework captures such relationships which are intrinsically device-independent. In the training stage, transferable knowledge is condensed in NLE from the source domain. Next in the adaptation stage, a novel RTSL strategy is adopted to learn adapted target models without using paired source-target data often required in conventional teacher student learning. The proposed framework is evaluated on the DCASE 2018 Task1b data set. Experimental results based on AlexNet-L deep classification models confirm the effectiveness of our proposed approach for mismatch situations. NLE-alone adaptation compares favourably with the conventional device adaptation and teacher student based adaptation techniques. NLE with RTSL further improves the classification accuracy.
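The key property of relational teacher-student learning mentioned above is that it matches *relations between* examples rather than example-by-example outputs, which is why no paired source-target data is required. A minimal sketch of one such relational loss, using pairwise distances between scalar embeddings as an assumed stand-in for the paper's relation function:

```python
def relational_loss(teacher, student):
    """Compare the pairwise distance structure of teacher and student
    embedding spaces; zero when the two spaces have identical relational
    geometry, regardless of whether individual outputs match."""
    def pdist(emb):
        # All pairwise distances |e_i - e_j| for i < j.
        return [abs(emb[i] - emb[j])
                for i in range(len(emb)) for j in range(i + 1, len(emb))]

    t, s = pdist(teacher), pdist(student)
    return sum((a - b) ** 2 for a, b in zip(t, s)) / len(t)
```

Because only relations are compared, the student can be trained on target-device data while the teacher sees source-device data, without requiring the recordings to be paired.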

31. An Acoustic Segment Model Based Segment Unit Selection Approach to Acoustic Scene Classification with Partial Utterances [PDF]
  Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Xue Bai, Jun Du, Chin-Hui Lee
Abstract: In this paper, we propose a sub-utterance unit selection framework to remove acoustic segments in audio recordings that carry little information for acoustic scene classification (ASC). Our approach is built upon a universal set of acoustic segment units covering the overall acoustic scene space. First, those units are modeled with acoustic segment models (ASMs) used to tokenize acoustic scene utterances into sequences of acoustic segment units. Next, paralleling the idea of stop words in information retrieval, stop ASMs are automatically detected. Finally, acoustic segments associated with the stop ASMs are blocked, because of their low indexing power in retrieval of most acoustic scenes. In contrast to building scene models with whole utterances, the ASM-removed sub-utterances, i.e., acoustic utterances without stop acoustic segments, are then used as inputs to the AlexNet-L back-end for final classification. On the DCASE 2018 dataset, scene classification accuracy increases from 68%, with whole utterances, to 72.1%, with segment selection. This represents a competitive accuracy without any data augmentation and/or ensemble strategy. Moreover, our approach compares favourably to AlexNet-L with attention.
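Once utterances are tokenized into sequences of acoustic segment units, the stop-ASM step is directly analogous to stop-word removal in information retrieval. The sketch below uses plain corpus-frequency ranking as an assumed stand-in for the paper's automatic stop-ASM detection criterion:

```python
from collections import Counter

def remove_stop_units(utterances, stop_fraction=0.1):
    """Filter out 'stop' acoustic segment units.

    utterances: list of token sequences (each utterance already
    tokenized into ASM unit labels). The most frequent units across
    the corpus, like stop words in IR, carry little
    scene-discriminative information and are removed.
    Returns (filtered_utterances, stop_unit_set).
    """
    counts = Counter(u for utt in utterances for u in utt)
    n_stop = max(1, int(len(counts) * stop_fraction))
    stop = {u for u, _ in counts.most_common(n_stop)}
    return [[u for u in utt if u not in stop] for utt in utterances], stop
```

The filtered sub-utterances, rather than the whole utterances, would then be fed to the classification back-end.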

32. Back-propagation through Signal Temporal Logic Specifications: Infusing Logical Structure into Gradient-Based Methods [PDF]
  Karen Leung, Nikos Aréchiga, Marco Pavone
Abstract: This paper presents a technique, named STLCG, to compute the quantitative semantics of Signal Temporal Logic (STL) formulas using computation graphs. STLCG provides a platform which enables the incorporation of logical specifications into robotics problems that benefit from gradient-based solutions. Specifically, STL is a powerful and expressive formal language that can specify spatial and temporal properties of signals generated by both continuous and hybrid systems. The quantitative semantics of STL provide a robustness metric, i.e., how much a signal satisfies or violates an STL specification. In this work, we devise a systematic methodology for translating STL robustness formulas into computation graphs. With this representation, and by leveraging off-the-shelf automatic differentiation tools, we are able to back-propagate through STL robustness formulas and hence enable a natural and easy-to-use integration with many gradient-based approaches used in robotics. We demonstrate, through examples stemming from various robotics applications, that STLCG is versatile, computationally efficient, and capable of injecting human-domain knowledge into the problem formulation.
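The robustness metric described above has a simple closed form under the standard quantitative semantics of STL: "always" (G) becomes a minimum over time and "eventually" (F) a maximum, with positive values meaning satisfaction and negative meaning violation. A minimal worked example of these two base cases (STLCG builds exactly such min/max expressions into a differentiable computation graph):

```python
def always_gt(signal, c):
    """Robustness of G(x > c): the worst-case margin over the whole
    trace. Positive iff the signal stays above c at every step."""
    return min(x - c for x in signal)


def eventually_gt(signal, c):
    """Robustness of F(x > c): the best-case margin over the trace.
    Positive iff the signal exceeds c at some step."""
    return max(x - c for x in signal)
```

Because min and max admit (sub)gradients, a trajectory optimizer can ascend these robustness values directly, which is what makes back-propagation through STL formulas possible.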
