
[arXiv Papers] Computation and Language 2020-06-01

Contents

1. Prosody leaks into the memories of words [PDF] Abstract
2. The Importance of Suppressing Domain Style in Authorship Analysis [PDF] Abstract
3. Beyond Leaderboards: A survey of methods for revealing weaknesses in Natural Language Inference data and models [PDF] Abstract
4. Investigating Deep Learning Approaches for Hate Speech Detection in Social Media [PDF] Abstract
5. Massive Choice, Ample Tasks (MaChAmp): A Toolkit for Multi-task Learning in NLP [PDF] Abstract
6. SLAM-Inspired Simultaneous Contextualization and Interpreting for Incremental Conversation Sentences [PDF] Abstract
7. Detection of Bangla Fake News using MNB and SVM Classifier [PDF] Abstract
8. Using Large Pretrained Language Models for Answering User Queries from Product Specifications [PDF] Abstract
9. Neural Simultaneous Speech Translation Using Alignment-Based Chunking [PDF] Abstract
10. Noise-robust Named Entity Understanding for Virtual Assistants [PDF] Abstract
11. On Incorporating Structural Information to improve Dialogue Response Generation [PDF] Abstract
12. What is SemEval evaluating? A Systematic Analysis of Evaluation Campaigns in NLP [PDF] Abstract
13. Empirical Evaluation of Pretraining Strategies for Supervised Entity Linking [PDF] Abstract
14. Improving Unsupervised Sparsespeech Acoustic Models with Categorical Reparameterization [PDF] Abstract
15. Harbsafe-162. A Domain-Specific Data Set for the Intrinsic Evaluation of Semantic Representations for Terminological Data [PDF] Abstract
16. Analyzing COVID-19 on Online Social Media: Trends, Sentiments and Emotions [PDF] Abstract
17. SNR-based teachers-student technique for speech enhancement [PDF] Abstract
18. Sub-band Knowledge Distillation Framework for Speech Enhancement [PDF] Abstract
19. SAFER: A Structure-free Approach for Certified Robustness to Adversarial Word Substitutions [PDF] Abstract
20. Controlling Length in Image Captioning [PDF] Abstract
21. On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition [PDF] Abstract

Abstracts

1. Prosody leaks into the memories of words [PDF] Back to Contents
  Kevin Tang, Jason A. Shaw
Abstract: The average predictability (aka informativity) of a word in context has been shown to condition word duration (Seyfarth, 2014). All else being equal, words that tend to occur in more predictable environments are shorter than words that tend to occur in less predictable environments. One account of the informativity effect on duration is that the acoustic details of word reduction are stored as part of a word's representation. Other research has argued that predictability effects are tied to prosodic structure in integral ways. With the aim of assessing a potential prosodic basis for informativity effects in speech production, this study extends past work in two directions: it investigates informativity effects in another large language, Mandarin Chinese, and broadens the study beyond word duration to additional acoustic dimensions, pitch and intensity, known to index prosodic prominence. The acoustic information of content words was extracted from a large telephone conversation speech corpus with over 400,000 tokens and 6,000 word types spoken by 1,655 individuals and analyzed for the effect of informativity using frequency statistics estimated from a 431 million word subtitle corpus. Results indicated that words with low informativity have shorter durations, replicating the effect found in English. In addition, informativity had significant effects on maximum pitch and intensity, two phonetic dimensions related to prosodic prominence. Extending this interpretation, these results suggest that informativity is closely linked to prosodic prominence, and that the lexical representation of a word includes phonetic details associated with its prosodic prominence. In other words, the lexicon absorbs prosodic influences on speech production.
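
Informativity here is the average unpredictability of a word over the contexts it occurs in. As a rough illustration of the quantity the study estimates from a 431-million-word subtitle corpus, here is a minimal Python sketch that computes it from bigram counts; using only the preceding word as the context is a simplifying assumption for illustration, not the paper's exact setup.

```python
import math
from collections import Counter

def informativity(tokens):
    """Informativity of each word: the negative log probability of the word
    given its context, averaged over the contexts the word occurs in. Using
    only the preceding word as the context is a simplifying assumption."""
    bigrams = Counter(zip(tokens, tokens[1:]))   # (context, word) counts
    context_totals = Counter(tokens[:-1])        # how often each word is a context
    by_word = {}                                 # word -> Counter of its contexts
    for (c, w), n in bigrams.items():
        by_word.setdefault(w, Counter())[c] += n
    scores = {}
    for w, contexts in by_word.items():
        total = sum(contexts.values())
        # I(w) = -sum_c P(c|w) * log2 P(w|c)
        scores[w] = -sum((n / total) * math.log2(n / context_totals[c])
                         for c, n in contexts.items())
    return scores

print(informativity("the cat sat on the mat the cat ran".split()))
```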

2. The Importance of Suppressing Domain Style in Authorship Analysis [PDF] Back to Contents
  Sebastian Bischoff, Niklas Deckers, Marcel Schliebs, Ben Thies, Matthias Hagen, Efstathios Stamatatos, Benno Stein, Martin Potthast
Abstract: The prerequisite of many approaches to authorship analysis is a representation of writing style. But despite decades of research, it still remains unclear to what extent commonly used and widely accepted representations like character trigram frequencies actually represent an author's writing style, in contrast to more domain-specific style components or even topic. We address this shortcoming for the first time in a novel experimental setup of fixed authors but swapped domains between training and testing. With this setup, we reveal that approaches using character trigram features are highly susceptible to favoring domain information when applied without attention to domains, suffering drops of up to 55.4 percentage points in classification accuracy under domain swapping. We further propose a new remedy based on domain-adversarial learning and compare it to ones from the literature based on heuristic rules. Both can work well, reducing accuracy losses under domain swapping to 3.6% and 3.9%, respectively.
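
For readers unfamiliar with the representation in question, character trigram frequencies can be extracted in a few lines with scikit-learn. The preprocessing below is an illustrative assumption; the paper's exact pipeline is not specified in the abstract.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Character trigram frequencies: the style representation whose domain
# sensitivity the paper examines.
vectorizer = CountVectorizer(analyzer="char", ngram_range=(3, 3))
X = vectorizer.fit_transform([
    "The quick brown fox jumps over the lazy dog.",
    "A slow red fox walks under the sleepy cat.",
])
print(X.shape)  # (2, number of distinct character trigrams)
```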

3. Beyond Leaderboards: A survey of methods for revealing weaknesses in Natural Language Inference data and models [PDF] Back to Contents
  Viktor Schlegel, Goran Nenadic, Riza Batista-Navarro
Abstract: Recent years have seen a growing number of publications that analyse Natural Language Inference (NLI) datasets for superficial cues, asking whether such cues undermine the complexity of the tasks underlying those datasets and how they impact models that are optimised and evaluated on this data. This structured survey provides an overview of the evolving research area by categorising reported weaknesses in models and datasets and the methods proposed to reveal and alleviate those weaknesses for the English language. We summarise and discuss the findings and conclude with a set of recommendations for possible future research directions. We hope it will be a useful resource for researchers who propose new datasets, to have a set of tools to assess the suitability and quality of their data to evaluate various phenomena of interest, as well as those who develop novel architectures, to further understand the implications of their improvements with respect to their model's acquired capabilities.

4. Investigating Deep Learning Approaches for Hate Speech Detection in Social Media [PDF] Back to Contents
  Prashant Kapil, Asif Ekbal, Dipankar Das
Abstract: The phenomenal growth of the internet has helped in empowering individuals' expression, but the misuse of freedom of expression has also led to an increase in various cyber crimes and anti-social activities. Hate speech is one such issue that needs to be addressed very seriously, as otherwise it could pose threats to the integrity of the social fabric. In this paper, we propose deep learning approaches utilizing various embeddings for detecting various types of hate speech in social media. Detecting hate speech from a large volume of text, especially tweets, which contain limited contextual information, also poses several practical challenges. Moreover, the variety in user-generated data and the presence of various forms of hate speech make it very challenging to identify the degree and intention of the message. Our experiments on three publicly available datasets of different domains show a significant improvement in accuracy and F1-score.

5. Massive Choice, Ample Tasks (MaChAmp): A Toolkit for Multi-task Learning in NLP [PDF] Back to Contents
  Rob van der Goot, Ahmet Üstün, Alan Ramponi, Barbara Plank
Abstract: Transfer learning, particularly approaches that combine multi-task learning with pre-trained contextualized embeddings and fine-tuning, has advanced the field of Natural Language Processing tremendously in recent years. In this paper we present MaChAmp, a toolkit for easy fine-tuning of BERT-like models in multi-task settings. The benefits of MaChAmp are its flexible configuration options, and the support of a variety of NLP tasks in a uniform toolkit, from text classification to sequence labeling and dependency parsing.

6. SLAM-Inspired Simultaneous Contextualization and Interpreting for Incremental Conversation Sentences [PDF] Back to Contents
  Yusuke Takimoto, Yosuke Fukuchi, Shoya Matsumori, Michita Imai
Abstract: Distributed representation of words has improved performance on many natural language tasks. In many methods, however, only one meaning is considered for one label of a word, and the multiple meanings that polysemous words take depending on the context are rarely handled. Although previous research has dealt with polysemous words, it determines the meanings of such words according to a batch of large documents. Hence, there are two problems with applying these methods to sequential sentences, as in a conversation that contains ambiguous expressions. The first problem is that the methods cannot sequentially deal with the interdependence between context and word interpretation, in which the context is decided by word interpretations and the word interpretations are decided by the context. Context estimation must thus be performed in parallel to pursue multiple interpretations. The second problem is that the previous methods use large-scale sets of sentences for offline learning of new interpretations, and the steps of learning and inference are clearly separated. Such methods using offline learning cannot obtain new interpretations during a conversation. Hence, to dynamically estimate the conversation context and the interpretations of polysemous words in sequential sentences, we propose a method of Simultaneous Contextualization And INterpreting (SCAIN) based on the traditional Simultaneous Localization And Mapping (SLAM) algorithm. By using the SCAIN algorithm, we can sequentially optimize the interdependence between context and word interpretation while obtaining new interpretations online. For experimental evaluation, we created two datasets: one from Wikipedia's disambiguation pages and the other from real conversations. For both datasets, the results confirmed that SCAIN could effectively achieve sequential optimization of the interdependence and acquisition of new interpretations.

7. Detection of Bangla Fake News using MNB and SVM Classifier [PDF] Back to Contents
  Md Gulzar Hussain, Md Rashidul Hasan, Mahmuda Rahman, Joy Protim, Sakib Al Hasan
Abstract: Fake news has been appearing in significant volume, for numerous business and political reasons, and has become frequent in the online world. People are easily misled by such fake news, whose fabricated content can have enormous effects on the offline community. Thus, interest in research in this area has risen. Significant research has been conducted on the detection of fake news in English texts and other languages, but only a little in the Bangla language. Our work presents an experimental analysis of the detection of Bangla fake news from social media, as this field still requires much attention. In this research work, we have used two supervised machine learning algorithms, Multinomial Naive Bayes (MNB) and Support Vector Machine (SVM) classifiers, to detect Bangla fake news, with CountVectorizer and a Term Frequency-Inverse Document Frequency (TF-IDF) vectorizer for feature extraction. Our proposed framework detects fake news depending on the polarity of the corresponding article. Finally, our analysis shows that SVM with a linear kernel, at an accuracy of 96.64%, outperforms MNB, at an accuracy of 93.32%.
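
A minimal scikit-learn sketch of the pipeline the abstract names, with placeholder texts and labels; the paper's actual preprocessing of Bangla text and its train/test protocol are not reproduced here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

texts = ["...fake article text...", "...real article text..."]  # placeholder corpus
labels = [1, 0]                                                 # 1 = fake, 0 = real

# TF-IDF features feeding the two classifiers compared in the paper.
mnb = make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(texts, labels)
svm = make_pipeline(TfidfVectorizer(), SVC(kernel="linear")).fit(texts, labels)
print(svm.predict(["...new article to check..."]))
```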

8. Using Large Pretrained Language Models for Answering User Queries from Product Specifications [PDF] Back to Contents
  Kalyani Roy, Smit Shah, Nithish Pai, Jaidam Ramtej, Prajit Prashant Nadkarn, Jyotirmoy Banerjee, Pawan Goyal, Surender Kumar
Abstract: While buying a product from e-commerce websites, customers generally have a plethora of questions. From the perspective of both the e-commerce service provider and the customers, there must be an effective question answering system to provide immediate answers to user queries. While certain questions can only be answered after using the product, there are many questions which can be answered from the product specification itself. Our work takes a first step in this direction by finding the relevant product specifications that can help answer user questions. We propose an approach to automatically create a training dataset for this problem. We utilize the recently proposed XLNet and BERT architectures for this problem and find that they provide much better performance than the Siamese model previously applied to this problem. Our model gives a good performance even when trained on one vertical and tested across different verticals.
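
One natural way to realize the described matching step is to treat (query, specification) as a sentence pair for a BERT-style classifier. The model name and label semantics below are assumptions for illustration; the paper fine-tunes BERT and XLNet on an automatically created dataset, whereas the classification head here is untrained.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Score a (query, specification) pair for relevance with a sentence-pair
# classifier. The head is randomly initialized; fine-tuning is required
# in practice.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # assumed labels: 0 = irrelevant, 1 = relevant

inputs = tokenizer("Does this phone support fast charging?",
                   "Charging: 18W fast charge, USB-C",
                   return_tensors="pt")
with torch.no_grad():
    relevance = model(**inputs).logits.softmax(-1)[0, 1].item()
print(relevance)
```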

9. Neural Simultaneous Speech Translation Using Alignment-Based Chunking [PDF] Back to Contents
  Patrick Wilken, Tamer Alkhouli, Evgeny Matusov, Pavel Golik
Abstract: In simultaneous machine translation, the objective is to determine when to produce a partial translation given a continuous stream of source words, with a trade-off between latency and quality. We propose a neural machine translation (NMT) model that makes dynamic decisions on when to continue consuming input or to generate output words. The model is composed of two main components: one to dynamically decide on ending a source chunk, and another that translates the consumed chunk. We train the components jointly and in a manner consistent with the inference conditions. To generate chunked training data, we propose a method that utilizes word alignment while also preserving enough context. We compare models with bidirectional and unidirectional encoders of different depths, both on real speech and text input. Our results on the IWSLT 2020 English-to-German task outperform a wait-k baseline by 2.6 to 3.7% BLEU absolute.
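
The wait-k baseline mentioned at the end is a fixed policy: read k source tokens, then alternate between emitting one target token and reading one more source token. A schematic sketch, with a toy stand-in for the decoder and an assumed one-to-one source/target alignment:

```python
def wait_k_policy(source_tokens, k, translate_step):
    """Schematic wait-k policy: read k source tokens, then alternate one
    WRITE per READ. `translate_step` stands in for an NMT decoder that
    emits the next target token given the source prefix and target so far."""
    target = []
    for t in range(len(source_tokens)):
        if t + 1 >= k:  # enough source context has been read
            target.append(translate_step(source_tokens[:t + 1], target))
    # Source exhausted: finish the translation (schematic tail handling).
    while len(target) < len(source_tokens):
        target.append(translate_step(source_tokens, target))
    return target

# Toy "decoder" that simply copies the aligned source token.
print(wait_k_policy("wie geht es dir".split(), k=2,
                    translate_step=lambda src, tgt: src[len(tgt)]))
```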

10. Noise-robust Named Entity Understanding for Virtual Assistants [PDF] Back to Contents
  Deepak Muralidharan, Joel Ruben Antony Moniz, Sida Gao, Xiao Yang, Lin Li, Justine Kao, Stephen Pulman, Atish Kothari, Ray Shen, Yinying Pan, Vivek Kaul, Mubarak Seyed Ibrahim, Gang Xiang, Nan Dun, Yidan Zhou, Andy O, Yuan Zhang, Pooja Chitkara, Xuan Wang, Alkesh Patel, Kushal Tayal, Roger Zheng, Peter Grasch, Jason Williams
Abstract: Named Entity Understanding (NEU) plays an essential role in interactions between users and voice assistants, since successfully identifying entities and correctly linking them to their standard forms is crucial to understanding the user's intent. NEU is a challenging task in voice assistants due to the ambiguous nature of natural language and because noise introduced by speech transcription and user errors occur frequently in spoken natural language queries. In this paper, we propose an architecture with novel features that jointly solves the recognition of named entities (a.k.a. Named Entity Recognition, or NER) and the resolution to their canonical forms (a.k.a. Entity Linking, or EL). We show that by combining NER and EL information in a joint reranking module, our proposed framework improves accuracy in both tasks. This improved performance and the features that enable it, also lead to better accuracy in downstream tasks, such as domain classification and semantic parsing.

11. On Incorporating Structural Information to improve Dialogue Response Generation [PDF] Back to Contents
  Nikita Moghe, Priyesh Vijayan, Balaraman Ravindran, Mitesh M. Khapra
Abstract: We consider the task of generating dialogue responses from background knowledge comprising domain-specific resources. Specifically, given a conversation around a movie, the task is to generate the next response based on background knowledge about the movie such as the plot, reviews, Reddit comments, etc. This requires capturing structural, sequential and semantic information from the conversation context and the background resources. This is a new task and has not received much attention from the community. We propose a new architecture that uses the ability of BERT to capture deep contextualized representations in conjunction with explicit structure and sequence information. More specifically, we use (i) Graph Convolutional Networks (GCNs) to capture structural information, (ii) LSTMs to capture sequential information and (iii) BERT for the deep contextualized representations that capture semantic information. We analyze the proposed architecture extensively. To this end, we propose a plug-and-play Semantics-Sequences-Structures (SSS) framework which allows us to effectively combine such linguistic information. Through a series of experiments we make some interesting observations. First, we observe that the popular adaptation of the GCN model for NLP tasks, where structural information (GCNs) is added on top of sequential information (LSTMs), performs poorly on our task. This leads us to explore interesting ways of combining semantic and structural information to improve the performance. Second, we observe that while BERT already outperforms other deep contextualized representations such as ELMo, it still benefits from the additional structural information explicitly added using GCNs. This is a bit surprising given the recent claims that BERT already captures structural information. Lastly, the proposed SSS framework gives an improvement of 7.95% over the baseline.
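
For the structural component, a single graph-convolution layer in the Kipf & Welling form H' = ReLU(Â H W) can be written directly in PyTorch. The paper's exact GCN variant and how it is wired to the LSTM/BERT features are not specified in the abstract, so this is a sketch under that assumption.

```python
import torch

def gcn_layer(H, A, W):
    """One graph-convolution step, H' = ReLU(A_hat @ H @ W), where A_hat is
    the symmetrically normalized adjacency matrix with self-loops."""
    A_hat = A + torch.eye(A.size(0))   # add self-loops
    d = A_hat.sum(dim=1).pow(-0.5)     # D^{-1/2}
    A_hat = d.unsqueeze(1) * A_hat * d.unsqueeze(0)
    return torch.relu(A_hat @ H @ W)

A = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])  # 3-node chain
H = torch.randn(3, 8)                                         # node features
W = torch.randn(8, 4)                                         # learned weights
print(gcn_layer(H, A, W).shape)                               # torch.Size([3, 4])
```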

12. What is SemEval evaluating? A Systematic Analysis of Evaluation Campaigns in NLP [PDF] Back to Contents
  Oskar Wysocki, Malina Florea, Andre Freitas
Abstract: SemEval is the primary venue in the NLP community for the proposal of new challenges and for the systematic empirical evaluation of NLP systems. This paper provides a systematic quantitative analysis of SemEval, aiming to evidence the patterns of the contributions behind SemEval. By understanding the distribution of task types, metrics, architectures, participation and citations over time, we aim to answer the question of what is being evaluated by SemEval.

13. Empirical Evaluation of Pretraining Strategies for Supervised Entity Linking [PDF] Back to Contents
  Thibault Févry, Nicholas FitzGerald, Livio Baldini Soares, Tom Kwiatkowski
Abstract: In this work, we present an entity linking model which combines a Transformer architecture with large scale pretraining from Wikipedia links. Our model achieves the state-of-the-art on two commonly used entity linking datasets: 96.7% on CoNLL and 94.9% on TAC-KBP. We present detailed analyses to understand what design choices are important for entity linking, including choices of negative entity candidates, Transformer architecture, and input perturbations. Lastly, we present promising results on more challenging settings such as end-to-end entity linking and entity linking without in-domain training data.

14. Improving Unsupervised Sparsespeech Acoustic Models with Categorical Reparameterization [PDF] Back to Contents
  Benjamin Milde, Chris Biemann
Abstract: The Sparsespeech model is an unsupervised acoustic model that can generate discrete pseudo-labels for untranscribed speech. We extend the Sparsespeech model to allow for sampling over a random discrete variable, yielding pseudo-posteriorgrams. The degree of sparsity in this posteriorgram can be fully controlled after the model has been trained. We use the Gumbel-Softmax trick to approximately sample from a discrete distribution in the neural network and this allows us to train the network efficiently with standard backpropagation. The new and improved model is trained and evaluated on the Libri-Light corpus, a benchmark for ASR with limited or no supervision. The model is trained on 600h and 6000h of English read speech. We evaluate the improved model using the ABX error measure and a semi-supervised setting with 10h of transcribed speech. We observe a relative improvement of up to 31.4% on ABX error rates across speakers on the test set with the improved Sparsespeech model on 600h of speech data and further improvements when we scale the model to 6000h.
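
The Gumbel-Softmax trick draws approximate samples from a categorical distribution in a differentiable way: add independent Gumbel(0, 1) noise to the logits and take a temperature-controlled softmax. A minimal PyTorch sketch, with illustrative sizes (PyTorch also ships this as torch.nn.functional.gumbel_softmax):

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, tau=1.0):
    """Differentiable approximate sample from a categorical distribution:
    perturb the logits with Gumbel(0, 1) noise, then apply a softmax with
    temperature tau. Lower tau gives a sample closer to one-hot."""
    gumbel = -torch.log(-torch.log(torch.rand_like(logits)))
    return F.softmax((logits + gumbel) / tau, dim=-1)

logits = torch.randn(1, 42)                   # scores over 42 pseudo-label classes
soft = gumbel_softmax_sample(logits, tau=0.5)  # nearly one-hot at low tau
print(soft.argmax(-1), soft.max())
```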

15. Harbsafe-162. A Domain-Specific Data Set for the Intrinsic Evaluation of Semantic Representations for Terminological Data [PDF] Back to Contents
  Susanne Arndt, Dieter Schnäpp
Abstract: The article presents Harbsafe-162, a domain-specific data set for evaluating distributional semantic models. It originates from a cooperation between Technische Universität Braunschweig and the German Commission for Electrical, Electronic & Information Technologies of DIN and VDE: the Harbsafe project. One objective of the project is to apply distributional semantic models to terminological entries, that is, complex lexical data comprising at least one or several terms, term phrases and a definition. This application is needed to solve a more complex problem: the harmonization of terminologies of standards and standards bodies (i.e., the resolution of doublettes and inconsistencies). Due to a lack of evaluation data sets for terminological entries, the creation of Harbsafe-162 was a necessary step towards harmonization assistance. Harbsafe-162 covers data from nine electrotechnical standards in the domain of functional safety, IT security, and dependability. An intrinsic evaluation method in the form of a similarity rating task was applied, in which two linguists and three domain experts from standardization participated. The data set is used to evaluate a specific implementation of an established sentence embedding model. This implementation proves to be satisfactory for the domain-specific data, so that further implementations for harmonization assistance may be brought forward by the project. Considering recent criticism of intrinsic evaluation methods, the article concludes with an evaluation of Harbsafe-162 and joins a more general discussion about the nature of similarity rating tasks. Harbsafe-162 has been made available to the community.
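
Intrinsic evaluation with a similarity rating task usually means correlating model similarities with the human ratings. A minimal sketch with hypothetical embeddings and ratings standing in for the Harbsafe-162 data:

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical data standing in for rated entry-pair embeddings and the
# experts' similarity judgments.
rng = np.random.default_rng(0)
pairs = [(rng.normal(size=50), rng.normal(size=50)) for _ in range(10)]
human_ratings = rng.uniform(1, 5, size=10)

# Rank correlation between model similarities and human ratings.
model_sims = [cosine(u, v) for u, v in pairs]
rho, p = spearmanr(model_sims, human_ratings)
print(f"Spearman rho = {rho:.3f} (p = {p:.3f})")
```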

16. Analyzing COVID-19 on Online Social Media: Trends, Sentiments and Emotions [PDF] Back to Contents
  Xiaoya Li, Mingxin Zhou, Jiawei Wu, Arianna Yuan, Fei Wu, Jiwei Li
Abstract: At the time of writing, the ongoing pandemic of coronavirus disease (COVID-19) has caused severe impacts on society, the economy and people's daily lives. People constantly express their opinions on various aspects of the pandemic on social media, making user-generated content an important source for understanding public emotions and concerns. In this paper, we perform a comprehensive analysis on the affective trajectories of the American people and the Chinese people based on Twitter and Weibo posts between January 20th, 2020 and May 11th, 2020. Specifically, by identifying people's sentiments, emotions (i.e., anger, disgust, fear, happiness, sadness, surprise) and the emotional triggers (e.g., what a user is angry/sad about), we are able to depict the dynamics of public affect in the time of COVID-19. By contrasting two very different countries, China and the United States, we reveal sharp differences in people's views on COVID-19 in different cultures. Our study provides a computational approach to unveiling public emotions and concerns on the pandemic in real-time, which would potentially help policy-makers better understand people's needs and thus make optimal policy.

17. SNR-based teachers-student technique for speech enhancement [PDF] Back to Contents
  Xiang Hao, Xiangdong Su, Zhiyu Wang, Qiang Zhang, Huali Xu, Guanglai Gao
Abstract: It is very challenging for speech enhancement methods to achieve robust performance under both high signal-to-noise ratio (SNR) and low SNR conditions simultaneously. In this paper, we propose a method that integrates an SNR-based teachers-student technique and a time-domain U-Net to deal with this problem. Specifically, this method consists of multiple teacher models and a student model. We first train the teacher models under multiple small-range SNRs that do not coincide with each other, so that each can perform speech enhancement well within its specific SNR range. Then, we choose different teacher models to supervise the training of the student model according to the SNR of the training data. Eventually, the student model can perform speech enhancement under both high SNR and low SNR. To evaluate the proposed method, we constructed a dataset with an SNR ranging from -20dB to 20dB based on a public dataset. We experimentally analyzed the effectiveness of the SNR-based teachers-student technique and compared the proposed method with several state-of-the-art methods.
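
The core routing step can be sketched as: compute the SNR of each training example and hand it to the teacher trained on the matching SNR range, whose output then supervises the student. The band boundaries and signals below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def snr_db(clean, noisy):
    """SNR of a noisy utterance given its clean reference, in dB."""
    noise = noisy - clean
    return 10 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

def pick_teacher(teachers, clean, noisy):
    """Route a training example to the teacher whose SNR range matches;
    the (low, high) boundaries here are illustrative."""
    level = snr_db(clean, noisy)
    for low, high, teacher in teachers:
        if low <= level < high:
            return teacher
    return teachers[-1][2]

teachers = [(-25, -5, "low-SNR teacher"),
            (-5, 5, "mid-SNR teacher"),
            (5, 25, "high-SNR teacher")]
clean = np.sin(np.linspace(0, 100, 16000))          # toy clean signal
noisy = clean + 0.3 * np.random.randn(16000)        # toy noisy mixture
print(pick_teacher(teachers, clean, noisy))
```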

18. Sub-band Knowledge Distillation Framework for Speech Enhancement [PDF] Back to Contents
  Xiang Hao, Shixue Wen, Xiangdong Su, Yun Liu, Guanglai Gao, Xiaofei Li
Abstract: In single-channel speech enhancement, methods based on full-band spectral features have been widely studied. However, only a few methods pay attention to non-full-band spectral features. In this paper, we explore a knowledge distillation framework based on sub-band spectral mapping for single-channel speech enhancement. Specifically, we divide the full frequency band into multiple sub-bands and pre-train an elite-level sub-band enhancement model (teacher model) for each sub-band. These teacher models are dedicated to processing their own sub-bands. Next, under the teacher models' guidance, we train a general sub-band enhancement model (student model) that works for all sub-bands. Without increasing the number of model parameters or the computational complexity, the student model's performance is further improved. To evaluate our proposed method, we conducted a large number of experiments on an open-source data set. The final experimental results show that the guidance from the elite-level teacher models dramatically improves the student model's performance, which exceeds that of the full-band model while employing fewer parameters.

19. SAFER: A Structure-free Approach for Certified Robustness to Adversarial Word Substitutions [PDF] Back to Contents
  Mao Ye, Chengyue Gong, Qiang Liu
Abstract: State-of-the-art NLP models can often be fooled by human-unaware transformations such as synonymous word substitution. For security reasons, it is of critical importance to develop models with certified robustness that can provably guarantee that the prediction cannot be altered by any possible synonymous word substitution. In this work, we propose a certified robust method based on a new randomized smoothing technique, which constructs a stochastic ensemble by applying random word substitutions on the input sentences, and leverages the statistical properties of the ensemble to provably certify the robustness. Our method is simple and structure-free in that it only requires black-box queries of the model outputs, and hence can be applied to any pre-trained models (such as BERT) and any types of models (word-level or subword-level). Our method significantly outperforms recent state-of-the-art methods for certified robustness on both IMDB and Amazon text classification tasks. To the best of our knowledge, we are the first work to achieve certified robustness on large systems such as BERT with practically meaningful certified accuracy.
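
The smoothing construction can be sketched as: classify many random synonym perturbations of the input and take a majority vote; SAFER then derives a certificate from the vote statistics, a step omitted here. The synonym table and base classifier below are toy placeholders.

```python
import random
from collections import Counter

def smoothed_predict(classify, sentence, synonyms, n_samples=100):
    """Randomized smoothing over synonym substitutions: classify many random
    perturbations of the input and return the majority-vote label."""
    votes = Counter()
    words = sentence.split()
    for _ in range(n_samples):
        perturbed = [random.choice(synonyms.get(w, [w])) for w in words]
        votes[classify(" ".join(perturbed))] += 1
    return votes.most_common(1)[0][0]

# Toy synonym table and base classifier, standing in for a real model.
synonyms = {"good": ["good", "great", "fine"], "movie": ["movie", "film"]}
toy_classifier = lambda s: "positive" if any(
    w in s for w in ("good", "great", "fine")) else "negative"
print(smoothed_predict(toy_classifier, "a good movie", synonyms))
```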

20. Controlling Length in Image Captioning [PDF] Back to Contents
  Ruotian Luo, Greg Shakhnarovich
Abstract: We develop and evaluate captioning models that allow control of caption length. Our models can leverage this control to generate captions of different style and descriptiveness.

21. On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition [PDF] Back to Contents
  Jinyu Li, Yu Wu, Yashesh Gaur, Chengyi Wang, Rui Zhao, Shujie Liu
Abstract: Recently, there has been a strong push to transition from hybrid models to end-to-end (E2E) models for automatic speech recognition. Currently, there are three promising E2E methods: recurrent neural network transducer (RNN-T), RNN attention-based encoder-decoder (AED), and Transformer-AED. In this study, we conduct an empirical comparison of RNN-T, RNN-AED, and Transformer-AED models, in both non-streaming and streaming modes. We use 65 thousand hours of Microsoft anonymized training data to train these models. As E2E models are more data-hungry, it is better to compare their effectiveness with a large amount of training data. To the best of our knowledge, no such comprehensive study has been conducted yet. We show that although AED models are stronger than RNN-T in the non-streaming mode, RNN-T is very competitive in streaming mode if its encoder can be properly initialized. Among all three E2E models, Transformer-AED achieved the best accuracy in both streaming and non-streaming modes. We show that both streaming RNN-T and Transformer-AED models can obtain better accuracy than a highly-optimized hybrid model.
