
[arXiv Papers] Computation and Language 2020-08-18

Contents

1. Emotion Carrier Recognition from Personal Narratives [PDF] Abstract
2. Learning to Create Better Ads: Generation and Ranking Approaches for Ad Creative Refinement [PDF] Abstract
3. Narrative Interpolation for Generating and Understanding Stories [PDF] Abstract
4. HunFlair: An Easy-to-Use Tool for State-of-the-Art Biomedical Named Entity Recognition [PDF] Abstract
5. Evaluating for Diversity in Question Generation over Text [PDF] Abstract
6. A Survey of Active Learning for Text Classification using Deep Neural Networks [PDF] Abstract
7. BUT-FIT at SemEval-2020 Task 4: Multilingual commonsense [PDF] Abstract
8. Comparison of Syntactic Parsers on Biomedical Texts [PDF] Abstract
9. Logical Semantics, Dialogical Argumentation, and Textual Entailment [PDF] Abstract
10. Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size [PDF] Abstract
11. Efficient Knowledge Graph Validation via Cross-Graph Representation Learning [PDF] Abstract
12. OpenFraming: We brought the ML; you bring the data. Interact with your data and discover its frames [PDF] Abstract
13. DCR-Net: A Deep Co-Interactive Relation Network for Joint Dialog Act Recognition and Sentiment Classification [PDF] Abstract
14. TopicBERT: A Transformer transfer learning based memory-graph approach for multimodal streaming social media topic detection [PDF] Abstract
15. Discovering Lexical Similarity Through Articulatory Feature-based Phonetic Edit Distance [PDF] Abstract
16. TextDecepter: Hard Label Black Box Attack on Text Classifiers [PDF] Abstract
17. SGG: Spinbot, Grammarly and GloVe based Fake News Detection [PDF] Abstract
18. Is Supervised Syntactic Parsing Beneficial for Language Understanding? An Empirical Investigation [PDF] Abstract
19. Label-Wise Document Pre-Training for Multi-Label Text Classification [PDF] Abstract
20. Quantification of BERT Diagnosis Generalizability Across Medical Specialties Using Semantic Dataset Distance [PDF] Abstract
21. Do face masks introduce bias in speech technologies? The case of automated scoring of speaking proficiency [PDF] Abstract
22. Lanfrica: A Participatory Approach to Documenting Machine Translation Research on African Languages [PDF] Abstract
23. Learning Interpretable Representation for Controllable Polyphonic Music Generation [PDF] Abstract
24. PIANOTREE VAE: Structured Representation Learning for Polyphonic Music [PDF] Abstract
25. DeVLBert: Learning Deconfounded Visio-Linguistic Representations [PDF] Abstract
26. Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder [PDF] Abstract
27. Jointly Fine-Tuning "BERT-like" Self Supervised Models to Improve Multimodal Speech Emotion Recognition [PDF] Abstract
28. Adaptation Algorithms for Speech Recognition: An Overview [PDF] Abstract

Abstracts

1. Emotion Carrier Recognition from Personal Narratives [PDF] Back to contents
  Aniruddha Tammewar, Alessandra Cervone, Giuseppe Riccardi
Abstract: Personal Narratives (PN) - recollections of facts, events, and thoughts from one's own experience - are often used in everyday conversations. So far, PNs have mainly been explored for tasks such as valence prediction or emotion classification (i.e. happy, sad). However, these tasks might overlook more fine-grained information that could nevertheless prove relevant for understanding PNs. In this work, we propose a novel task for Narrative Understanding: Emotion Carrier Recognition (ECR). We argue that automatic recognition of emotion carriers, the text fragments that carry the emotions of the narrator (i.e. 'loss of a grandpa', 'high school reunion'), from PNs, provides a deeper level of emotion analysis needed, for instance, in the mental healthcare domain. In this work, we explore the task of ECR using a corpus of PNs manually annotated with emotion carriers and investigate different baseline models for the task. Furthermore, we propose several evaluation strategies for the task. Based on the inter-annotator agreement, the task in itself was found to be complex and subjective for humans. Nevertheless, we discuss evaluation metrics that could be suitable for applications based on ECR.

2. Learning to Create Better Ads: Generation and Ranking Approaches for Ad Creative Refinement [PDF] Back to contents
  Shaunak Mishra, Manisha Verma, Yichao Zhou, Kapil Thadani, Wei Wang
Abstract: In the online advertising industry, the process of designing an ad creative (i.e., ad text and image) requires manual labor. Typically, each advertiser launches multiple creatives via online A/B tests to infer effective creatives for the target audience, that are then refined further in an iterative fashion. Due to the manual nature of this process, it is time-consuming to learn, refine, and deploy the modified creatives. Since major ad platforms typically run A/B tests for multiple advertisers in parallel, we explore the possibility of collaboratively learning ad creative refinement via A/B tests of multiple advertisers. In particular, given an input ad creative, we study approaches to refine the given ad text and image by: (i) generating new ad text, (ii) recommending keyphrases for new ad text, and (iii) recommending image tags (objects in image) to select new ad image. Based on A/B tests conducted by multiple advertisers, we form pairwise examples of inferior and superior ad creatives, and use such pairs to train models for the above tasks. For generating new ad text, we demonstrate the efficacy of an encoder-decoder architecture with copy mechanism, which allows some words from the (inferior) input text to be copied to the output while incorporating new words associated with higher click-through-rate. For the keyphrase and image tag recommendation task, we demonstrate the efficacy of a deep relevance matching model, as well as the relative robustness of ranking approaches compared to ad text generation in cold-start scenarios with unseen advertisers. We also share broadly applicable insights from our experiments using data from the Yahoo Gemini ad platform.
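
The pairwise training signal described above maps naturally onto a margin ranking objective. A minimal sketch follows, assuming a hypothetical feature encoder and placeholder data rather than the authors' actual model:

```python
import torch
import torch.nn as nn

# Hypothetical scorer: maps an ad-creative feature vector to a scalar quality score.
class CreativeScorer(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

scorer = CreativeScorer()
loss_fn = nn.MarginRankingLoss(margin=1.0)

# Placeholder features for superior/inferior creatives paired from the same A/B test.
superior = torch.randn(32, 128)
inferior = torch.randn(32, 128)

# Target +1 tells the loss the first argument should score higher than the second.
loss = loss_fn(scorer(superior), scorer(inferior), torch.ones(32))
loss.backward()
```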

3. Narrative Interpolation for Generating and Understanding Stories [PDF] Back to contents
  Su Wang, Greg Durrett, Katrin Erk
Abstract: We propose a method for controlled narrative/story generation where we are able to guide the model to produce coherent narratives with user-specified target endings by interpolation: for example, we are told that Jim went hiking and at the end Jim needed to be rescued, and we want the model to incrementally generate steps along the way. The core of our method is an interpolation model based on GPT-2 which conditions on a previous sentence and a next sentence in a narrative and fills in the gap. Additionally, a reranker helps control for coherence of the generated text. With human evaluation, we show that ending-guided generation results in narratives which are coherent, faithful to the given ending guide, and require less manual effort on the part of the human guide writer than past approaches.
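
As a rough illustration of sentence-level interpolation with GPT-2 (the prompt format below is an assumption for demonstration; the paper fine-tunes GPT-2 on such conditioning, and an off-the-shelf model will not follow it reliably):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Condition on a previous and a next sentence; generate the sentence in between.
prev_sent = "Jim went hiking in the mountains."
next_sent = "In the end, Jim needed to be rescued."
prompt = f"Previous: {prev_sent} Next: {next_sent} Middle:"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9,
                         pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:]))
```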

4. HunFlair: An Easy-to-Use Tool for State-of-the-Art Biomedical Named Entity Recognition [PDF] Back to contents
  Leon Weber, Mario Sänger, Jannes Münchmeyer, Maryam Habibi, Ulf Leser
Abstract: Summary: Named Entity Recognition (NER) is an important step in biomedical information extraction pipelines. Tools for NER should be easy to use, cover multiple entity types, be highly accurate, and be robust towards variations in text genre and style. To this end, we propose HunFlair, an NER tagger covering multiple entity types, integrated into the widely used NLP framework Flair. HunFlair outperforms other state-of-the-art standalone NER tools with an average gain of 7.26 pp over the next best tool, can be installed with a single command, and is applied with only four lines of code. Availability: HunFlair is freely available through the Flair framework under an MIT license: this https URL and is compatible with all major operating systems. Contact: {weberple,saengema,alan.akbik}@informatik.this http URL
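
The advertised four-line usage looks roughly like the following (based on the Flair API around the time of release; the example sentence is illustrative, and newer Flair versions may expose a different entry point):

```python
from flair.data import Sentence
from flair.models import MultiTagger

tagger = MultiTagger.load("hunflair")  # downloads the per-entity-type models
sentence = Sentence("Behavioral abnormalities in the Fmr1 KO2 mouse model of fragile X syndrome")
tagger.predict(sentence)
print(sentence.to_tagged_string())
```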

5. Evaluating for Diversity in Question Generation over Text [PDF] Back to contents
  Michael Sejr Schlichtkrull, Weiwei Cheng
Abstract: Generating diverse and relevant questions over text is a task with widespread applications. We argue that commonly-used evaluation metrics such as BLEU and METEOR are not suitable for this task due to the inherent diversity of reference questions, and propose a scheme for extending conventional metrics to reflect diversity. We furthermore propose a variational encoder-decoder model for this task. We show through automatic and human evaluation that our variational model improves diversity without loss of quality, and demonstrate how our evaluation scheme reflects this improvement.
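
The paper's exact metric extension is not reproduced here, but Self-BLEU is a common way to fold diversity into a BLEU-style evaluation: score each generated question against the other generations, where a lower average indicates a more diverse set. A minimal sketch:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def self_bleu(generated):
    """Average BLEU of each generated question against all the others.
    A lower Self-BLEU indicates a more diverse generation set."""
    smooth = SmoothingFunction().method1
    scores = []
    for i, hyp in enumerate(generated):
        refs = [q for j, q in enumerate(generated) if j != i]
        scores.append(sentence_bleu(refs, hyp, smoothing_function=smooth))
    return sum(scores) / len(scores)

questions = [q.split() for q in [
    "what did the author propose ?",
    "which metric is unsuitable for this task ?",
    "what did the author propose ?",  # a duplicate inflates Self-BLEU
]]
print(round(self_bleu(questions), 3))
```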

6. A Survey of Active Learning for Text Classification using Deep Neural Networks [PDF] Back to contents
  Christopher Schröder, Andreas Niekler
Abstract: Natural language processing (NLP) and neural networks (NNs) have both undergone significant changes in recent years. For active learning (AL) purposes, NNs are, however, less commonly used -- despite their current popularity. By using the superior text classification performance of NNs for AL, we can either increase a model's performance using the same amount of data or reduce the data and therefore the required annotation efforts while keeping the same performance. We review AL for text classification using deep neural networks (DNNs) and elaborate on two main causes which used to hinder the adoption: (a) the inability of NNs to provide reliable uncertainty estimates, on which the most commonly used query strategies rely, and (b) the challenge of training DNNs on small data. To investigate the former, we construct a taxonomy of query strategies, which distinguishes between data-based, model-based, and prediction-based instance selection, and investigate the prevalence of these classes in recent research. Moreover, we review recent NN-based advances in NLP like word embeddings or language models in the context of (D)NNs, survey the current state-of-the-art at the intersection of AL, text classification, and DNNs and relate recent advances in NLP to AL. Finally, we analyze recent work in AL for text classification, connect the respective query strategies to the taxonomy, and outline commonalities and shortcomings. As a result, we highlight gaps in current research and present open research questions.
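
As a concrete instance of the prediction-based class in the survey's taxonomy, least-confidence uncertainty sampling queries the pool examples whose top predicted class has the lowest probability. A minimal sketch:

```python
import numpy as np

def least_confidence_query(probs, k):
    """Pick the k pool examples whose most likely class is least certain."""
    uncertainty = 1.0 - probs.max(axis=1)
    return np.argsort(-uncertainty)[:k]

# Hypothetical softmax outputs of a text classifier on an unlabeled pool.
probs = np.array([[0.95, 0.05],
                  [0.55, 0.45],
                  [0.70, 0.30]])
print(least_confidence_query(probs, k=2))  # -> [1 2]
```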

7. BUT-FIT at SemEval-2020 Task 4: Multilingual commonsense [PDF] Back to contents
  Josef Jon, Martin Fajčík, Martin Dočekal, Pavel Smrž
Abstract: This paper describes the work of the BUT-FIT team at SemEval 2020 Task 4: Commonsense Validation and Explanation. We participated in all three subtasks. In subtasks A and B, our submissions are based on pretrained language representation models (namely ALBERT) and data augmentation. We experimented with solving the task for another language, Czech, by means of multilingual models and a machine-translated dataset, or translated model inputs. We show that with a strong machine translation system, our system can be used in another language with a small accuracy loss. In subtask C, our submission, which is based on a pretrained sequence-to-sequence model (BART), ranked 1st in the BLEU score ranking; however, we show that the correlation between BLEU and human evaluation, in which our submission ended up 4th, is low. We analyse the metrics used in the evaluation and propose an additional score based on the model from subtask B, which correlates well with our manual ranking, as well as a reranking method based on the same principle. We performed an error and dataset analysis for all subtasks and we present our findings.

8. Comparison of Syntactic Parsers on Biomedical Texts [PDF] Back to contents
  Maria Biryukov
Abstract: Syntactic parsing is an important step in automated text analysis aimed at information extraction. The quality of the syntactic parsing determines, to a large extent, the recall and precision of the text mining results. In this paper we evaluate the performance of several popular syntactic parsers in application to biomedical text mining.

9. Logical Semantics, Dialogical Argumentation, and Textual Entailment [PDF] Back to contents
  Davide Catta, Richard Moot, Christian Retoré
Abstract: In this chapter, we introduce a new dialogical system for first order classical logic which is close to natural language argumentation, and we prove its completeness with respect to usual classical validity. We combine our dialogical system with the Grail syntactic and semantic parser developed by the second author in order to address automated textual entailment, that is, we use it for deciding whether or not a sentence is a consequence of a short text. This work, which connects natural language semantics and argumentation with dialogical logic, can be viewed as a step towards an inferentialist view of natural language semantics.

10. Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size [PDF] Back to contents
  Davis Yoshida, Allyson Ettinger, Kevin Gimpel
Abstract: Fine-tuning a pretrained transformer for a downstream task has become a standard method in NLP in the last few years. While the results from these models are impressive, applying them can be extremely computationally expensive, as is pretraining new models with the latest architectures. We present a novel method for applying pretrained transformer language models which lowers their memory requirement both at training and inference time. An additional benefit is that our method removes the fixed context size constraint that most transformer models have, allowing for more flexible use. When applied to the GPT-2 language model, we find that our method attains better perplexity than an unmodified GPT-2 model on the PG-19 and WikiText-103 corpora, for a given amount of computation or memory.

11. Efficient Knowledge Graph Validation via Cross-Graph Representation Learning [PDF] Back to contents
  Yaqing Wang, Fenglong Ma, Jing Gao
Abstract: Recent advances in information extraction have motivated the automatic construction of huge Knowledge Graphs (KGs) by mining from large-scale text corpora. However, noisy facts caused by automatic extraction are unavoidably introduced into KGs. To validate the correctness of facts (i.e., triplets) inside a KG, one possible approach is to map the triplets into vector representations by capturing the semantic meanings of facts. Although many representation learning approaches have been developed for knowledge graphs, these methods are not effective for validation. They usually assume that facts are correct, and thus may overfit noisy facts and fail to detect such facts. Towards effective KG validation, we propose to leverage an external human-curated KG as an auxiliary information source to help detect the errors in a target KG. The external KG is built upon human-curated knowledge repositories and tends to have high precision. On the other hand, although the target KG built by information extraction from texts has low precision, it can cover new or domain-specific facts that are not in any human-curated repositories. To tackle this challenging task, we propose a cross-graph representation learning framework, i.e., CrossVal, which can leverage an external KG to validate the facts in the target KG efficiently. This is achieved by embedding triplets based on their semantic meanings, drawing cross-KG negative samples and estimating a confidence score for each triplet based on its degree of correctness. We evaluate the proposed framework on datasets across different domains. Experimental results show that the proposed framework achieves the best performance compared with the state-of-the-art methods on large-scale KGs.
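
CrossVal's cross-graph architecture is not reproduced here, but the general idea of embedding-based triplet confidence can be illustrated with a TransE-style score, where a fact (h, r, t) is plausible when the head embedding translated by the relation lands near the tail:

```python
import numpy as np

def transe_confidence(h, r, t):
    """Turn the TransE distance ||h + r - t|| into a (0, 1] confidence:
    smaller distances mean the embeddings find the fact more plausible."""
    distance = np.linalg.norm(h + r - t)
    return 1.0 / (1.0 + distance)

rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 50))  # placeholder 50-d entity/relation embeddings
print(transe_confidence(h, r, t))
```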

12. OpenFraming: We brought the ML; you bring the data. Interact with your data and discover its frames [PDF] Back to contents
  Alyssa Smith, David Assefa Tofu, Mona Jalal, Edward Edberg Halim, Yimeng Sun, Vidya Akavoor, Margrit Betke, Prakash Ishwar, Lei Guo, Derry Wijaya
Abstract: When journalists cover a news story, they can cover the story from multiple angles or perspectives. A news article written about COVID-19, for example, might focus on personal preventative actions such as mask-wearing, while another might focus on COVID-19's impact on the economy. These perspectives are called "frames," which when used may influence public perception and opinion of the issue. We introduce a Web-based system for analyzing and classifying frames in text documents. Our goal is to make effective tools for automatic frame discovery and labeling based on topic modeling and deep learning widely accessible to researchers from a diverse array of disciplines. To this end, we provide both state-of-the-art pre-trained frame classification models on various issues as well as a user-friendly pipeline for training novel classification models on user-provided corpora. Researchers can submit their documents and obtain frames of the documents. The degree of user involvement is flexible: they can run models that have been pre-trained on select issues; submit labeled documents and train a new model for frame classification; or submit unlabeled documents and obtain potential frames of the documents. The code making up our system is also open-sourced and well-documented, making the system transparent and expandable. The system is available on-line at this http URL and via our GitHub page this https URL.
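
OpenFraming's own pipeline is not reproduced here, but the underlying idea of frame discovery via topic modeling can be sketched with off-the-shelf LDA; a human then inspects and labels the discovered topics as frames:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [  # toy corpus standing in for user-uploaded news articles
    "mask wearing and hand washing remain key preventative actions",
    "lockdowns hit the economy and small businesses hard",
    "vaccines and masks reduce transmission in schools",
    "unemployment rose as the economy slowed during the pandemic",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Discover candidate "frames" as latent topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-4:][::-1]]
    print(f"frame candidate {k}: {top}")
```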

13. DCR-Net: A Deep Co-Interactive Relation Network for Joint Dialog Act Recognition and Sentiment Classification [PDF] Back to contents
  Libo Qin, Wanxiang Che, Yangming Li, Minheng Ni, Ting Liu
Abstract: In dialog systems, dialog act recognition and sentiment classification are two correlative tasks for capturing speakers' intentions, where dialog act and sentiment can indicate the explicit and the implicit intentions separately. Most of the existing systems either treat them as separate tasks or just jointly model the two tasks by sharing parameters in an implicit way, without explicitly modeling mutual interaction and relation. To address this problem, we propose a Deep Co-Interactive Relation Network (DCR-Net) to explicitly consider the cross-impact and model the interaction between the two tasks by introducing a co-interactive relation layer. In addition, the proposed relation layer can be stacked to gradually capture mutual knowledge with multiple steps of interaction. Especially, we thoroughly study different relation layers and their effects. Experimental results on two public datasets (Mastodon and Dailydialog) show that our model outperforms the state-of-the-art joint model by 4.3% and 3.4% in terms of F1 score on the dialog act recognition task, and by 5.7% and 12.4% on sentiment classification, respectively. Comprehensive analysis empirically verifies the effectiveness of explicitly modeling the relation between the two tasks and the multi-step interaction mechanism. Finally, we employ the Bidirectional Encoder Representations from Transformers (BERT) in our framework, which can further boost our performance in both tasks.

14. TopicBERT: A Transformer transfer learning based memory-graph approach for multimodal streaming social media topic detection [PDF] Back to contents
  Meysam Asgari-Chenaghlu, Mohammad-Reza Feizi-Derakhshi, Leili farzinvash, Mohammad-Ali Balafar, Cina Motamed
Abstract: The real-time nature of social networks, with bursty short messages and large-scale data spread among a vast variety of topics, is of research interest to many researchers. These properties of social networks, known as the 5 V's of big data, have led to many unique and enlightening algorithms and techniques applied to large social networking datasets and data streams. Many of these studies are based on the detection and tracking of hot topics and trending social media events that help reveal many unanswered questions. These algorithms, and in some cases software products, mostly rely on the nature of the language itself. Other techniques, such as unsupervised data mining methods, are language independent, but many requirements for a comprehensive solution are not met. Research issues such as noisy sentences with poor grammar and newly invented online words make maintaining a good social network topic detection and tracking methodology challenging; the semantic relationship between words, and in most cases synonyms, is also ignored by many of these studies. In this research, we use Transformers combined with an incremental community detection algorithm. The Transformer, on one hand, provides the semantic relation between words in different contexts. On the other hand, the proposed graph mining technique enhances the resulting topics with the aid of simple structural rules. Named entity recognition from multimodal data (image and text) labels the named entities with their entity type, and the extracted topics are tuned using them. All operations of the proposed system have been applied with a big social data perspective under NoSQL technologies. In order to present a working and systematic solution, we combined MongoDB with Neo4j as the two major database systems of our work. The proposed system shows higher precision and recall compared to other methods on three different datasets.

15. Discovering Lexical Similarity Through Articulatory Feature-based Phonetic Edit Distance [PDF] Back to contents
  Tafseer Ahmed, Muhammad Suffian Nizami, Muhammad Yaseen Khan
Abstract: Lexical Similarity (LS) between two languages uncovers many interesting linguistic insights, such as genetic relationship, mutual intelligibility, and the usage of one language's vocabulary in another. There are various methods through which LS is evaluated. In the same regard, this paper presents a method of Phonetic Edit Distance (PED) that uses a soft comparison of letters based on the articulatory features associated with them. The system converts the words into the corresponding International Phonetic Alphabet (IPA), followed by the conversion of the IPA into its set of articulatory features. Later, the lists of sets of articulatory features are compared using the proposed method. As an example, PED gives an edit distance of 0.82 for the German word vater and the Persian word pidar, and similarly 0.93 for the Hebrew word shalom and the Arabic word salaam, whereas for a juxtaposed comparison, their IPA-based edit distances are 4 and 2 respectively. Experiments are performed with six languages (Arabic, Hindi, Marathi, Persian, Sanskrit, and Urdu). In this regard, we extracted part-of-speech-wise word-lists from the Universal Dependency corpora and evaluated the LS for every pair of languages. Thus, with the proposed approach, we find genetic affinity, similarity, and borrowing/loan-words despite script differences and sound variation phenomena among these languages.
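
The core mechanism is a standard dynamic-programming edit distance whose substitution cost is softened by articulatory-feature overlap. A minimal sketch with a toy feature inventory (the paper uses a full IPA-to-feature mapping; the sets below are illustrative):

```python
# Toy articulatory feature sets for a few IPA-like phones (illustrative only).
FEATURES = {
    "f": {"labiodental", "fricative", "voiceless"},
    "v": {"labiodental", "fricative", "voiced"},
    "p": {"bilabial", "plosive", "voiceless"},
    "b": {"bilabial", "plosive", "voiced"},
    "t": {"alveolar", "plosive", "voiceless"},
    "d": {"alveolar", "plosive", "voiced"},
    "r": {"alveolar", "trill", "voiced"},
    "a": {"open", "front", "vowel"},
    "e": {"close-mid", "front", "vowel"},
    "i": {"close", "front", "vowel"},
}

def sub_cost(p, q):
    """Soft substitution cost: 0 for identical phones, fractional for phones
    sharing articulatory features (Jaccard distance), 1 for disjoint ones."""
    a, b = FEATURES[p], FEATURES[q]
    return 1.0 - len(a & b) / len(a | b)

def phonetic_edit_distance(s, t):
    """Edit distance where substitutions cost less for similar phones."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = float(i)
    for j in range(1, n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1.0, d[i][j - 1] + 1.0,
                          d[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]))
    return d[m][n]

# German 'vater' begins with /f/ in IPA; compared against Persian 'pidar'.
print(phonetic_edit_distance("fater", "pidar"))
```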

16. TextDecepter: Hard Label Black Box Attack on Text Classifiers [PDF] Back to contents
  Sachin Saxena
Abstract: Machine learning has been proven to be susceptible to carefully crafted samples, known as adversarial examples. The generation of these adversarial examples helps to make the models more robust and gives insight into the underlying decision making of these models. Over the years, researchers have successfully attacked image classifiers in both white-box and black-box settings. These methods, however, are not directly applicable to text, as text data is discrete in nature. In recent years, research on crafting adversarial examples against textual applications has been on the rise. In this paper, we present a novel approach for hard label black-box attacks against Natural Language Processing (NLP) classifiers, where no model information is disclosed, and an attacker can only query the model to get the final decision of the classifier, without confidence scores of the classes involved. Such an attack scenario is applicable to real world black-box models being used for security-sensitive applications such as sentiment analysis and toxic content detection.

17. SGG: Spinbot, Grammarly and GloVe based Fake News Detection [PDF] Back to contents
  Akansha Gautam, Koteswar Rao Jerripothula
Abstract: Recently, news consumption using online news portals has increased exponentially due to several reasons, such as low cost and easy accessibility. However, such online platforms inadvertently also become the cause of spreading false information across the web. They are being misused quite frequently as a medium to disseminate misinformation and hoaxes. Such malpractices call for a robust automatic fake news detection system that can keep us at bay from such misinformation and hoaxes. We propose a robust yet simple fake news detection system, leveraging tools for paraphrasing, grammar-checking, and word-embedding. In this paper, we explore the potential of these tools in jointly unearthing the authenticity of a news article. Notably, we leverage Spinbot (for paraphrasing), Grammarly (for grammar-checking), and GloVe (for word-embedding) for this purpose. Using these tools, we were able to extract novel features that could yield state-of-the-art results on the Fake News AMT dataset and comparable results on the Celebrity dataset when combined with some of the essential features. More importantly, the proposed method is found to be more robust empirically than the existing ones, as revealed in our cross-domain and multi-domain analyses.

18. Is Supervised Syntactic Parsing Beneficial for Language Understanding? An Empirical Investigation [PDF] Back to contents
  Goran Glavaš, Ivan Vulić
Abstract: Traditional NLP has long held (supervised) syntactic parsing necessary for successful higher-level language understanding. The recent advent of end-to-end neural language learning, self-supervised via language modeling (LM), and its success on a wide range of language understanding tasks, however, questions this belief. In this work, we empirically investigate the usefulness of supervised parsing for semantic language understanding in the context of LM-pretrained transformer networks. Relying on the established fine-tuning paradigm, we first couple a pretrained transformer with a biaffine parsing head, aiming to infuse explicit syntactic knowledge from Universal Dependencies (UD) treebanks into the transformer. We then fine-tune the model for language understanding (LU) tasks and measure the effect of the intermediate parsing training (IPT) on downstream LU performance. Results from both monolingual English and zero-shot language transfer experiments (with intermediate target-language parsing) show that explicit formalized syntax, injected into transformers through intermediate supervised parsing, has very limited and inconsistent effect on downstream LU performance. Our results, coupled with our analysis of transformers' representation spaces before and after intermediate parsing, make a significant step towards providing answers to an essential question: how (un)availing is supervised parsing for high-level semantic language understanding in the era of large neural models?

19. Label-Wise Document Pre-Training for Multi-Label Text Classification [PDF] Back to contents
  Han Liu, Caixia Yuan, Xiaojie Wang
Abstract: A major challenge of multi-label text classification (MLTC) is to simultaneously exploit possible label differences and label correlations. In this paper, we tackle this challenge by developing a Label-Wise Pre-Training (LW-PT) method to get a document representation with label-aware information. The basic idea is that a multi-label document can be represented as a combination of multiple label-wise representations, and that correlated labels always co-occur in the same or similar documents. LW-PT implements this idea by constructing label-wise document classification tasks and training label-wise document encoders. Finally, the pre-trained label-wise encoder is fine-tuned on the downstream MLTC task. Extensive experimental results validate that the proposed method has significant advantages over the previous state-of-the-art models and is able to discover reasonable label relationships. The code is released to facilitate other researchers.

20. Quantification of BERT Diagnosis Generalizability Across Medical Specialties Using Semantic Dataset Distance [PDF] Back to contents
  Mihir P. Khambete, William Su, Juan Garcia, Joseph Lehar, Martin Kang, Marcus A. Badgeley
Abstract: Deep learning models in healthcare may fail to generalize on data from unseen corpora. Additionally, no quantitative metric exists to tell how existing models will perform on new data. Previous studies demonstrated that NLP models of medical notes generalize variably between institutions, but ignored other levels of healthcare organization. We measured SciBERT diagnosis sentiment classifier generalizability between medical specialties using EHR sentences from MIMIC-III. Models trained on one specialty performed better on internal test sets than mixed or external test sets (mean AUCs 0.92, 0.87, and 0.83, respectively; p = 0.016). When models are trained on more specialties, they have better test performances (p < 1e-4). Model performance on new corpora is directly correlated to the similarity between train and test sentence content (p < 1e-4). Future studies should assess additional axes of generalization to ensure deep learning models fulfil their intended purpose across institutions, specialties, and practices.
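
The paper's exact distance metric is not detailed in the abstract; one simple instantiation of a semantic dataset distance is the cosine distance between the centroids of sentence embeddings from two corpora. A minimal sketch with placeholder embeddings:

```python
import numpy as np

def centroid_cosine_distance(emb_a, emb_b):
    """Cosine distance between mean sentence embeddings of two corpora."""
    a, b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - cosine

rng = np.random.default_rng(0)
cardiology = rng.normal(loc=0.0, size=(100, 768))   # stand-ins for SciBERT
dermatology = rng.normal(loc=0.5, size=(100, 768))  # sentence embeddings
print(centroid_cosine_distance(cardiology, dermatology))
```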

21. Do face masks introduce bias in speech technologies? The case of automated scoring of speaking proficiency [PDF] Back to contents
  Anastassia Loukina, Keelan Evanini, Matthew Mulholland, Ian Blood, Klaus Zechner
Abstract: The COVID-19 pandemic has led to a dramatic increase in the use of face masks worldwide. Face coverings can affect both the acoustic properties of the signal and speech patterns, and can have unintended effects if the person wearing the mask attempts to use speech processing technologies. In this paper we explore the impact of wearing face masks on the automated assessment of English language proficiency. We use a dataset from a large-scale speaking test for which test-takers were required to wear face masks during the test administration, and we compare it to a matched control sample of test-takers who took the same test before the mask requirements were put in place. We find that the two samples differ across a range of acoustic measures and also show a small but significant difference in speech patterns. However, these differences do not lead to differences in human or automated scores of English language proficiency. Several measures of bias showed no differences in scores between the two groups.

22. Lanfrica: A Participatory Approach to Documenting Machine Translation Research on African Languages [PDF] Back to contents
  Chris C. Emezue, Bonaventure F.P. Dossou
Abstract: Over the years, there have been campaigns to include the African languages in the growing research on machine translation (MT) in particular, and natural language processing (NLP) in general. Africa has the highest language diversity, with 1500-2000 documented languages and many more undocumented or extinct languages (Lewis, 2009; Bendor-Samuel, 2017). This makes it hard to keep track of the MT research, models and datasets that have been developed for some of them. As the internet and social media make up the daily lives of more than half of the world (Lin, 2020), as well as over 40% of Africans (Campbell, 2019), online platforms can be useful in creating accessibility to researches, benchmarks and datasets in these African languages, thereby improving reproducibility and sharing of existing research and their results. In this paper, we introduce Lanfrica, a novel, on-going framework that employs a participatory approach to documenting researches, projects, benchmarks and datasets on African languages.

23. Learning Interpretable Representation for Controllable Polyphonic Music Generation [PDF] Back to contents
  Ziyu Wang, Dingsu Wang, Yixiao Zhang, Gus Xia
Abstract: While deep generative models have become the leading methods for algorithmic composition, it remains a challenging problem to control the generation process because the latent variables of most deep-learning models lack good interpretability. Inspired by the content-style disentanglement idea, we design a novel architecture, under the VAE framework, that effectively learns two interpretable latent factors of polyphonic music: chord and texture. The current model focuses on learning 8-beat long piano composition segments. We show that such chord-texture disentanglement provides a controllable generation pathway leading to a wide spectrum of applications, including compositional style transfer, texture variation, and accompaniment arrangement. Both objective and subjective evaluations show that our method achieves a successful disentanglement and high quality controlled music generation.

24. PIANOTREE VAE: Structured Representation Learning for Polyphonic Music [PDF] Back to contents
  Ziyu Wang, Yiyi Zhang, Yixiao Zhang, Junyan Jiang, Ruihan Yang, Junbo Zhao, Gus Xia
Abstract: The dominant approach for music representation learning involves the deep unsupervised model family variational autoencoder (VAE). However, most, if not all, viable attempts on this problem have largely been limited to monophonic music. Normally composed of richer modality and more complex musical structures, the polyphonic counterpart has yet to be addressed in the context of music representation learning. In this work, we propose the PianoTree VAE, a novel tree-structure extension upon VAE aiming to fit polyphonic music learning. The experiments prove the validity of the PianoTree VAE via (i) semantically meaningful latent code for polyphonic segments; (ii) more satisfiable reconstruction alongside the decent geometry learned in the latent space; and (iii) this model's benefits to a variety of downstream music generation tasks.

25. DeVLBert: Learning Deconfounded Visio-Linguistic Representations [PDF] Back to contents
  Shengyu Zhang, Tan Jiang, Tan Wang, Kun Kuang, Zhou Zhao, Jianke Zhu, Jin Yu, Hongxia Yang, Fei Wu
Abstract: In this paper, we propose to investigate the problem of out-of-domain visio-linguistic pretraining, where the pretraining data distribution differs from that of downstream data on which the pretrained model will be fine-tuned. Existing methods for this problem are purely likelihood-based, leading to the spurious correlations and hurt the generalization ability when transferred to out-of-domain downstream tasks. By spurious correlation, we mean that the conditional probability of one token (object or word) given another one can be high (due to the dataset biases) without robust (causal) relationships between them. To mitigate such dataset biases, we propose a Deconfounded Visio-Linguistic Bert framework, abbreviated as DeVLBert, to perform intervention-based learning. We borrow the idea of the backdoor adjustment from the research field of causality and propose several neural-network based architectures for Bert-style out-of-domain pretraining. The quantitative results on three downstream tasks, Image Retrieval (IR), Zero-shot IR, and Visual Question Answering, show the effectiveness of DeVLBert by boosting generalization ability.
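
For reference, the backdoor adjustment borrowed from causal inference replaces the biased conditional P(y|x) with an interventional distribution that marginalizes over the confounder z:

```latex
P\big(y \mid \mathrm{do}(x)\big) = \sum_{z} P\big(y \mid x, z\big)\, P(z)
```

Intuitively, this prevents tokens that merely co-occur under a dataset bias from being treated as causally related.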

26. Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder [PDF] Back to contents
  Hyun-Wook Yoon, Sang-Hoon Lee, Hyeong-Rae Noh, Seong-Whan Lee
Abstract: In recent works, a flow-based neural vocoder has shown significant improvement in real-time speech generation task. The sequence of invertible flow operations allows the model to convert samples from simple distribution to audio samples. However, training a continuous density model on discrete audio data can degrade model performance due to the topological difference between latent and actual distribution. To resolve this problem, we propose audio dequantization methods in flow-based neural vocoder for high fidelity audio generation. Data dequantization is a well-known method in image generation but has not yet been studied in the audio domain. For this reason, we implement various audio dequantization methods in flow-based neural vocoder and investigate the effect on the generated audio. We conduct various objective performance assessments and subjective evaluation to show that audio dequantization can improve audio generation quality. From our experiments, using audio dequantization produces waveform audio with better harmonic structure and fewer digital artifacts.
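
Dequantization itself is simple: spread each discrete sample over the gap to the next quantization level with uniform noise, so the continuous-density flow never sees point masses. A minimal sketch for 16-bit audio (uniform dequantization; the paper evaluates several variants):

```python
import numpy as np

def dequantize_uniform(x_int16):
    """Map 16-bit samples to [-1, 1) and fill each quantization bin
    with uniform noise of width 1/32768 (one quantization step)."""
    x = x_int16.astype(np.float64) / 32768.0
    noise = np.random.uniform(0.0, 1.0 / 32768.0, size=x.shape)
    return x + noise

samples = np.array([0, 1, -32768, 32767], dtype=np.int16)
print(dequantize_uniform(samples))
```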

27. Jointly Fine-Tuning "BERT-like" Self Supervised Models to Improve Multimodal Speech Emotion Recognition [PDF] Back to contents
  Shamane Siriwardhana, Andrew Reis, Rivindu Weerasekera, Suranga Nanayakkara
Abstract: Multimodal emotion recognition from speech is an important area in affective computing. Fusing multiple data modalities and learning representations with limited amounts of labeled data is a challenging task. In this paper, we explore the use of modality-specific "BERT-like" pretrained Self Supervised Learning (SSL) architectures to represent both speech and text modalities for the task of multimodal speech emotion recognition. By conducting experiments on three publicly available datasets (IEMOCAP, CMU-MOSEI, and CMU-MOSI), we show that jointly fine-tuning "BERT-like" SSL architectures achieve state-of-the-art (SOTA) results. We also evaluate two methods of fusing speech and text modalities and show that a simple fusion mechanism can outperform more complex ones when using SSL models that have similar architectural properties to BERT.
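
The "simple fusion mechanism" finding can be illustrated with shallow late fusion: concatenate the pooled speech and text SSL embeddings and classify. Dimensions and class count below are placeholders, not the authors' configuration:

```python
import torch
import torch.nn as nn

class ShallowFusion(nn.Module):
    """Concatenate pooled speech and text embeddings, then classify."""
    def __init__(self, speech_dim=768, text_dim=768, n_classes=4):
        super().__init__()
        self.head = nn.Linear(speech_dim + text_dim, n_classes)

    def forward(self, speech_emb, text_emb):
        return self.head(torch.cat([speech_emb, text_emb], dim=-1))

fusion = ShallowFusion()
speech_emb = torch.randn(8, 768)  # pooled output of a speech SSL model
text_emb = torch.randn(8, 768)    # pooled output of a BERT-like text model
print(fusion(speech_emb, text_emb).shape)  # torch.Size([8, 4])
```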

28. Adaptation Algorithms for Speech Recognition: An Overview [PDF] Back to contents
  Peter Bell, Joachim Fainberg, Ondrej Klejch, Jinyu Li, Steve Renals, Pawel Swietojanski
Abstract: We present a structured overview of adaptation algorithms for neural network-based speech recognition, considering both hybrid hidden Markov model / neural network systems and end-to-end neural network systems, with a focus on speaker adaptation, domain adaptation, and accent adaptation. The overview characterizes adaptation algorithms as based on embeddings, model parameter adaptation, or data augmentation. We present a meta-analysis of the performance of speech recognition adaptation algorithms, based on relative error rate reductions as reported in the literature.
