
[arXiv papers] Computation and Language 2020-05-19

Table of Contents

1. Reconstructing Maps from Text [PDF] Abstract
2. Span-ConveRT: Few-shot Span Extraction for Dialog with Pretrained Conversational Representations [PDF] Abstract
3. Grammatical gender associations outweigh topical gender bias in crosslinguistic word embeddings [PDF] Abstract
4. Inflecting when there's no majority: Limitations of encoder-decoder neural networks as cognitive models for German plurals [PDF] Abstract
5. Interaction Matching for Long-Tail Multi-Label Classification [PDF] Abstract
6. Corpus of Chinese Dynastic Histories: Gender Analysis over Two Millennia [PDF] Abstract
7. Improving Named Entity Recognition in Tor Darknet with Local Distance Neighbor Feature [PDF] Abstract
8. The presence of occupational structure in online texts based on word embedding NLP models [PDF] Abstract
9. Efficient Wait-k Models for Simultaneous Machine Translation [PDF] Abstract
10. SemEval-2020 Task 5: Detecting Counterfactuals by Disambiguation [PDF] Abstract
11. Towards Question Format Independent Numerical Reasoning: A Set of Prerequisite Tasks [PDF] Abstract
12. Text Classification with Few Examples using Controlled Generalization [PDF] Abstract
13. Syntax-guided Controlled Generation of Paraphrases [PDF] Abstract
14. MixingBoard: a Knowledgeable Stylized Integrated Text Generation Platform [PDF] Abstract
15. Cross-Lingual Word Embeddings for Turkic Languages [PDF] Abstract
16. Context-Based Quotation Recommendation [PDF] Abstract
17. TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data [PDF] Abstract
18. Support-BERT: Predicting Quality of Question-Answer Pairs in MSDN using Deep Bidirectional Transformer [PDF] Abstract
19. LiSSS: A toy corpus of Literary Spanish Sentences Sentiment for Emotions Detection [PDF] Abstract
20. Speech to Text Adaptation: Towards an Efficient Cross-Modal Distillation [PDF] Abstract
21. Building a Hebrew Semantic Role Labeling Lexical Resource from Parallel Movie Subtitles [PDF] Abstract
22. How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? [PDF] Abstract
23. Cross-Lingual Low-Resource Set-to-Description Retrieval for Global E-Commerce [PDF] Abstract
24. Multi-modal Automated Speech Scoring using Attention Fusion [PDF] Abstract
25. IMoJIE: Iterative Memory-Based Joint Open Information Extraction [PDF] Abstract
26. Encodings of Source Syntax: Similarities in NMT Representations Across Target Languages [PDF] Abstract
27. Adversarial Training for Commonsense Inference [PDF] Abstract
28. Semi-Automating Knowledge Base Construction for Cancer Genetics [PDF] Abstract
29. RPD: A Distance Function Between Word Embeddings [PDF] Abstract
30. Learning Probabilistic Sentence Representations from Paraphrases [PDF] Abstract
31. Layer-Wise Cross-View Decoding for Sequence-to-Sequence Learning [PDF] Abstract
32. Recurrent Chunking Mechanisms for Long-Text Machine Reading Comprehension [PDF] Abstract
33. IntelliCode Compose: Code Generation Using Transformer [PDF] Abstract
34. A Text Reassembling Approach to Natural Language Generation [PDF] Abstract
35. Unsupervised Embedding-based Detection of Lexical Semantic Changes [PDF] Abstract
36. Logical Inferences with Comparatives and Generalized Quantifiers [PDF] Abstract
37. ApplicaAI at SemEval-2020 Task 11: On RoBERTa-CRF, Span CLS and Whether Self-Training Helps Them [PDF] Abstract
38. Sequential Sentence Matching Network for Multi-turn Response Selection in Retrieval-based Chatbots [PDF] Abstract
39. Integrating Semantic and Structural Information with Graph Convolutional Network for Controversy Detection [PDF] Abstract
40. MicroNet for Efficient Language Modeling [PDF] Abstract
41. Neural Multi-Task Learning for Teacher Question Detection in Online Classrooms [PDF] Abstract
42. KEIS@JUST at SemEval-2020 Task 12: Identifying Multilingual Offensive Tweets Using Weighted Ensemble and Fine-Tuned BERT [PDF] Abstract
43. A Scientific Information Extraction Dataset for Nature Inspired Engineering [PDF] Abstract
44. In Layman's Terms: Semi-Open Relation Extraction from Scientific Texts [PDF] Abstract
45. Uncovering Gender Bias in Media Coverage of Politicians with Machine Learning [PDF] Abstract
46. Critical Impact of Social Networks Infodemic on Defeating Coronavirus COVID-19 Pandemic: Twitter-Based Study and Research Directions [PDF] Abstract
47. Machine learning on Big Data from Twitter to understand public reactions to COVID-19 [PDF] Abstract
48. Conversational Search -- A Report from Dagstuhl Seminar 19461 [PDF] Abstract
49. Design Choices for X-vector Based Speaker Anonymization [PDF] Abstract
50. Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation [PDF] Abstract
51. Audio-visual Multi-channel Recognition of Overlapped Speech [PDF] Abstract
52. Robust Training of Vector Quantized Bottleneck Models [PDF] Abstract
53. Attention-based Transducer for Online Speech Recognition [PDF] Abstract
54. An Effective End-to-End Modeling Approach for Mispronunciation Detection [PDF] Abstract
55. The NTNU System at the Interspeech 2020 Non-Native Children's Speech ASR Challenge [PDF] Abstract
56. Content analysis of Persian/Farsi Tweets during COVID-19 pandemic in Iran using NLP [PDF] Abstract
57. Vector-Quantized Autoregressive Predictive Coding [PDF] Abstract
58. Fixed Point Semantics for Stream Reasoning [PDF] Abstract
59. Wake Word Detection with Alignment-Free Lattice-Free MMI [PDF] Abstract
60. A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer [PDF] Abstract
61. On the Combined Use of Extrinsic Semantic Resources for Medical Information Search [PDF] Abstract
62. Dual Learning: Theoretical Study and an Algorithmic Extension [PDF] Abstract
63. #Coronavirus or #Chinesevirus?!: Understanding the negative sentiment reflected in Tweets with racist hashtags across the development of COVID-19 [PDF] Abstract
64. That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages [PDF] Abstract
65. Exploration of Audio Quality Assessment and Anomaly Localisation Using Attention Models [PDF] Abstract
66. Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory [PDF] Abstract
67. Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation [PDF] Abstract
68. AccentDB: A Database of Non-Native English Accents to Assist Neural Speech Recognition [PDF] Abstract
69. Reducing Spelling Inconsistencies in Code-Switching ASR using Contextualized CTC Loss [PDF] Abstract
70. Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition [PDF] Abstract
71. Oscillating Statistical Moments for Speech Polarity Detection [PDF] Abstract
72. Glottal Source Estimation using an Automatic Chirp Decomposition [PDF] Abstract
73. Large scale weakly and semi-supervised learning for low-resource video ASR [PDF] Abstract
74. Speaker Re-identification with Speaker Dependent Speech Enhancement [PDF] Abstract
75. Weakly Supervised Training of Hierarchical Attention Networks for Speaker Identification [PDF] Abstract
76. Feature Fusion Strategies for End-to-End Evaluation of Cognitive Behavior Therapy Sessions [PDF] Abstract
77. JDI-T: Jointly trained Duration Informed Transformer for Text-To-Speech without Explicit Alignment [PDF] Abstract

Abstracts

1. Reconstructing Maps from Text [PDF] Back to Contents
  Johnathan E. Avery, Robert L. Goldstone, Michael N. Jones
Abstract: Previous research has demonstrated that Distributional Semantic Models (DSMs) are capable of reconstructing maps from news corpora (Louwerse & Zwaan, 2009) and novels (Louwerse & Benesh, 2012). The capacity for reproducing maps is surprising since DSMs notoriously lack perceptual grounding (De Vega et al., 2012). In this paper we investigate the statistical sources required in language to infer maps, and resulting constraints placed on mechanisms of semantic representation. Study 1 brings word co-occurrence under experimental control to demonstrate that direct co-occurrence in language is necessary for traditional DSMs to successfully reproduce maps. Study 2 presents an instance-based DSM that is capable of reconstructing maps independent of the frequency of co-occurrence of city names.
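The map-reconstruction step these studies rely on can be illustrated with classical multidimensional scaling (MDS): given pairwise dissimilarities between city-name vectors produced by a DSM, recover 2-D coordinates that can be compared against a real map. This is a hedged sketch of that projection step, not necessarily the exact procedure used in the cited papers:

```python
import numpy as np

def mds_map(dissim, dim=2):
    """Classical MDS: recover `dim`-dimensional coordinates from an (n, n)
    matrix of pairwise dissimilarities (e.g., cosine distances between
    city-name vectors from a DSM)."""
    n = dissim.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # double-centering matrix
    B = -0.5 * J @ (dissim ** 2) @ J          # implied inner-product (Gram) matrix
    vals, vecs = np.linalg.eigh(B)            # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:dim]        # keep the top `dim` components
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
```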

2. Span-ConveRT: Few-shot Span Extraction for Dialog with Pretrained Conversational Representations [PDF] Back to Contents
  Sam Coope, Tyler Farghly, Daniela Gerz, Ivan Vulić, Matthew Henderson
Abstract: We introduce Span-ConveRT, a light-weight model for dialog slot-filling which frames the task as a turn-based span extraction task. This formulation allows for a simple integration of conversational knowledge coded in large pretrained conversational models such as ConveRT (Henderson et al., 2019). We show that leveraging such knowledge in Span-ConveRT is especially useful for few-shot learning scenarios: we report consistent gains over 1) a span extractor that trains representations from scratch in the target domain, and 2) a BERT-based span extractor. In order to inspire more work on span extraction for the slot-filling task, we also release RESTAURANTS-8K, a new challenging data set of 8,198 utterances, compiled from actual conversations in the restaurant booking domain.

3. Grammatical gender associations outweigh topical gender bias in crosslinguistic word embeddings [PDF] Back to Contents
  Katherine McCurdy, Oguz Serbetci
Abstract: Recent research has demonstrated that vector space models of semantics can reflect undesirable biases in human culture. Our investigation of crosslinguistic word embeddings reveals that topical gender bias interacts with, and is surpassed in magnitude by, the effect of grammatical gender associations, and both may be attenuated by corpus lemmatization. This finding has implications for downstream applications such as machine translation.

4. Inflecting when there's no majority: Limitations of encoder-decoder neural networks as cognitive models for German plurals [PDF] Back to Contents
  Kate McCurdy, Sharon Goldwater, Adam Lopez
Abstract: Can artificial neural networks learn to represent inflectional morphology and generalize to new words as human speakers do? Kirov and Cotterell (2018) argue that the answer is yes: modern Encoder-Decoder (ED) architectures learn human-like behavior when inflecting English verbs, such as extending the regular past tense form -(e)d to novel words. However, their work does not address the criticism raised by Marcus et al. (1995): that neural models may learn to extend not the regular, but the most frequent class --- and thus fail on tasks like German number inflection, where infrequent suffixes like -s can still be productively generalized. To investigate this question, we first collect a new dataset from German speakers (production and ratings of plural forms for novel nouns) that is designed to avoid sources of information unavailable to the ED model. The speaker data show high variability, and two suffixes evince 'regular' behavior, appearing more often with phonologically atypical inputs. Encoder-decoder models do generalize the most frequently produced plural class, but do not show human-like variability or 'regular' extension of these other plural markers. We conclude that modern neural models may still struggle with minority-class generalization.

5. Interaction Matching for Long-Tail Multi-Label Classification [PDF] Back to Contents
  Sean MacAvaney, Franck Dernoncourt, Walter Chang, Nazli Goharian, Ophir Frieder
Abstract: We present an elegant and effective approach for addressing limitations in existing multi-label classification models by incorporating interaction matching, a concept shown to be useful for ad-hoc search result ranking. By performing soft n-gram interaction matching, we match labels with natural language descriptions (which are common to have in most multi-labeling tasks). Our approach can be used to enhance existing multi-label classification approaches, which are biased toward frequently-occurring labels. We evaluate our approach on two challenging tasks: automatic medical coding of clinical notes and automatic labeling of entities from software tutorial text. Our results show that our method can yield up to an 11% relative improvement in macro performance, with most of the gains stemming from labels that appear infrequently in the training set (i.e., the long tail of labels).

6. Corpus of Chinese Dynastic Histories: Gender Analysis over Two Millennia [PDF] Back to Contents
  Sergey Zinin, Yang Xu
Abstract: Chinese dynastic histories form a large continuous linguistic space of approximately 2000 years, from the 3rd century BCE to the 18th century CE. The histories are documented in Classical (Literary) Chinese in a corpus of over 20 million characters, suitable for the computational analysis of historical lexicon and semantic change. However, there is no freely available open-source corpus of these histories, making Classical Chinese low-resource. This project introduces a new open-source corpus of twenty-four dynastic histories covered by Creative Commons license. An original list of Classical Chinese gender-specific terms was developed as a case study for analyzing the historical linguistic use of male and female terms. The study demonstrates considerable stability in the usage of these terms, with dominance of male terms. Exploration of word meanings uses keyword analysis of focus corpora created for gender-specific terms. This method yields meaningful semantic representations that can be used for future studies of diachronic semantics.

7. Improving Named Entity Recognition in Tor Darknet with Local Distance Neighbor Feature [PDF] Back to Contents
  Mhd Wesam Al-Nabki, Francisco Jañez-Martino, Roberto A. Vasco-Carofilis, Eduardo Fidalgo, Javier Velasco-Mata
Abstract: Named entity recognition in noisy user-generated texts is a difficult task, usually enhanced by incorporating an external resource of information, such as gazetteers. However, gazetteers are task-specific, and they are expensive to build and maintain. This paper adopts and improves the approach of Aguilar et al. by presenting a novel feature, called Local Distance Neighbor, which substitutes gazetteers. We tested the new approach on the W-NUT-2017 dataset, obtaining state-of-the-art results for the Group, Person and Product categories of Named Entities. Next, we added 851 manually labeled samples to the W-NUT-2017 dataset to account for named entities in the Tor Darknet related to weapons and drug selling. Finally, our proposal achieved entity and surface F1 scores of 52.96% and 50.57% on this extended dataset, demonstrating its usefulness for Law Enforcement Agencies to detect named entities in the Tor hidden services.

8. The presence of occupational structure in online texts based on word embedding NLP models [PDF] Back to Contents
  Zoltán Kmetty, Julia Koltai, Tamás Rudas
Abstract: Research on social stratification is closely linked to analysing the prestige associated with different occupations. This research focuses on the positions of occupations in the semantic space represented by large amounts of textual data. The results are compared to standard results in social stratification to see whether the classical results are reproduced and if additional insights can be gained into the social positions of occupations. The paper gives an affirmative answer to both questions. The results show fundamental similarity of the occupational structure obtained from text analysis to the structure described by prestige and social distance scales. While our research reinforces many theories and empirical findings of the traditional body of literature on social stratification and, in particular, occupational hierarchy, it pointed to the importance of a factor not discussed in the main line of stratification literature so far: the power and organizational aspect.

9. Efficient Wait-k Models for Simultaneous Machine Translation [PDF] Back to Contents
  Maha Elbayad, Laurent Besacier, Jakob Verbeek
Abstract: Simultaneous machine translation consists in starting output generation before the entire input sequence is available. Wait-k decoders offer a simple but efficient approach for this problem. They first read k source tokens, after which they alternate between producing a target token and reading another source token. We investigate the behavior of wait-k decoding in low resource settings for spoken corpora using IWSLT datasets. We improve training of these models using unidirectional encoders, and training across multiple values of k. Experiments with Transformer and 2D-convolutional architectures show that our wait-k models generalize well across a wide range of latency levels. We also show that the 2D-convolution architecture is competitive with Transformers for simultaneous translation of spoken language.
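The wait-k policy itself is simple enough to state directly. The sketch below assumes a hypothetical decode_step(source_prefix, target_prefix) function that returns the next target token; everything else follows the read/write schedule described in the abstract:

```python
def wait_k_translate(source_tokens, k, decode_step, max_len=200):
    """Wait-k policy: READ the first k source tokens, then alternate
    WRITE (emit one target token) and READ (reveal one more source token)."""
    pos = min(k, len(source_tokens))
    target = []
    while len(target) < max_len:
        # WRITE: predict the next target token from the source prefix read so far.
        token = decode_step(source_tokens[:pos], target)
        if token == "</s>":  # hypothetical end-of-sentence marker
            break
        target.append(token)
        # READ: reveal one more source token, if any remain.
        pos = min(pos + 1, len(source_tokens))
    return target
```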

10. SemEval-2020 Task 5: Detecting Counterfactuals by Disambiguation [PDF] Back to Contents
  Hanna Abi Akl, Dominique Mariko, Estelle Labidurie
Abstract: In this paper, we explore strategies to detect and evaluate counterfactual sentences. Since causal insight is an inherent characteristic of a counterfactual, is it possible to use this information in order to locate antecedent and consequent fragments in counterfactual statements? We thus propose to compare and evaluate models to correctly identify and chunk counterfactual sentences. In our experiments, we attempt to answer the following questions: First, can a learned model discern counterfactual statements reasonably well? Second, is it possible to clearly identify antecedent and consequent parts of counterfactual sentences?

11. Towards Question Format Independent Numerical Reasoning: A Set of Prerequisite Tasks [PDF] Back to Contents
  Swaroop Mishra, Arindam Mitra, Neeraj Varshney, Bhavdeep Sachdeva, Chitta Baral
Abstract: Numerical reasoning is often important to accurately understand the world. Recently, several format-specific datasets have been proposed, such as numerical reasoning in the settings of Natural Language Inference (NLI), Reading Comprehension (RC), and Question Answering (QA). Several format-specific models and architectures in response to those datasets have also been proposed. However, there exists a strong need for a benchmark which can evaluate the abilities of models in performing question format independent numerical reasoning, as (i) the numerical reasoning capabilities we want to teach are not controlled by question formats, and (ii) for numerical reasoning technology to have the best possible application, it must be able to process language and reason in a way that is not exclusive to a single format, task, dataset or domain. In pursuit of this goal, we introduce NUMBERGAME, a multifaceted benchmark to evaluate model performance across numerical reasoning tasks of eight diverse formats. We add four existing question types in our compilation. Two of the new types we add are about questions that require external numerical knowledge, commonsense knowledge and domain knowledge. For building a more practical numerical reasoning system, NUMBERGAME demands four capabilities beyond numerical reasoning: (i) detecting the question format directly from data, (ii) finding an intermediate common format to which every format can be converted, (iii) incorporating commonsense knowledge, and (iv) handling data imbalance across formats. We build several baselines, including a new model based on knowledge hunting using a cheatsheet. However, all baselines perform poorly in contrast to the human baselines, indicating the hardness of our benchmark. Our work takes forward the recent progress in generic system development, demonstrating the scope of these under-explored tasks.

12. Text Classification with Few Examples using Controlled Generalization [PDF] Back to Contents
  Abhijit Mahabal, Jason Baldridge, Burcu Karagol Ayan, Vincent Perot, Dan Roth
Abstract: Training data for text classification is often limited in practice, especially for applications with many output classes or involving many related classification problems. This means classifiers must generalize from limited evidence, but the manner and extent of generalization is task dependent. Current practice primarily relies on pre-trained word embeddings to map words unseen in training to similar seen ones. Unfortunately, this squishes many components of meaning into highly restricted capacity. Our alternative begins with sparse pre-trained representations derived from unlabeled parsed corpora; based on the available training data, we select features that offer the relevant generalizations. This produces task-specific semantic vectors; here, we show that a feed-forward network over these vectors is especially effective in low-data scenarios, compared to existing state-of-the-art methods. By further pairing this network with a convolutional neural network, we keep this edge in low data scenarios and remain competitive when using full training sets.

13. Syntax-guided Controlled Generation of Paraphrases [PDF] Back to Contents
  Ashutosh Kumar, Kabir Ahuja, Raghuram Vadapalli, Partha Talukdar
Abstract: Given a sentence (e.g., "I like mangoes") and a constraint (e.g., sentiment flip), the goal of controlled text generation is to produce a sentence that adapts the input sentence to meet the requirements of the constraint (e.g., "I hate mangoes"). Going beyond such simple constraints, recent works have started exploring the incorporation of complex syntactic-guidance as constraints in the task of controlled paraphrase generation. In these methods, syntactic-guidance is sourced from a separate exemplar sentence. However, these prior works have only utilized limited syntactic information available in the parse tree of the exemplar sentence. We address this limitation in the paper and propose Syntax Guided Controlled Paraphraser (SGCP), an end-to-end framework for syntactic paraphrase generation. We find that SGCP can generate syntax conforming sentences while not compromising on relevance. We perform extensive automated and human evaluations over multiple real-world English language datasets to demonstrate the efficacy of SGCP over state-of-the-art baselines. To drive future research, we have made SGCP's source code available.

14. MixingBoard: a Knowledgeable Stylized Integrated Text Generation Platform [PDF] Back to Contents
  Xiang Gao, Michel Galley, Bill Dolan
Abstract: We present MixingBoard, a platform for quickly building demos with a focus on knowledge grounded stylized text generation. We unify existing text generation algorithms in a shared codebase and further adapt earlier algorithms for constrained generation. To borrow advantages from different models, we implement strategies for cross-model integration, from the token probability level to the latent space level. An interface to external knowledge is provided via a module that retrieves on-the-fly relevant knowledge from passages on the web or any document collection. A user interface for local development, remote webpage access, and a RESTful API are provided to make it simple for users to build their own demos.

15. Cross-Lingual Word Embeddings for Turkic Languages [PDF] Back to Contents
  Elmurod Kuriyozov, Yerai Doval, Carlos Gómez-Rodríguez
Abstract: There has been an increasing interest in learning cross-lingual word embeddings to transfer knowledge obtained from a resource-rich language, such as English, to lower-resource languages for which annotated data is scarce, such as Turkish, Russian, and many others. In this paper, we present the first viability study of established techniques to align monolingual embedding spaces for Turkish, Uzbek, Azeri, Kazakh and Kyrgyz, members of the Turkic family which is heavily affected by the low-resource constraint. Those techniques are known to require little explicit supervision, mainly in the form of bilingual dictionaries, hence being easily adaptable to different domains, including low-resource ones. We obtain new bilingual dictionaries and new word embeddings for these languages and show the steps for obtaining cross-lingual word embeddings using state-of-the-art techniques. Then, we evaluate the results using the bilingual dictionary induction task. Our experiments confirm that the obtained bilingual dictionaries outperform previously-available ones, and that word embeddings from a low-resource language can benefit from resource-rich closely-related languages when they are aligned together. Furthermore, evaluation on an extrinsic task (Sentiment analysis on Uzbek) proves that monolingual word embeddings can, although slightly, benefit from cross-lingual alignments.
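The abstract does not name the alignment algorithm; the standard dictionary-supervised technique in this line of work is orthogonal Procrustes, sketched here under that assumption. X and Y hold the embeddings of translation pairs from a seed bilingual dictionary:

```python
import numpy as np

def procrustes_align(X, Y):
    """Orthogonal Procrustes: find the orthogonal W minimizing ||XW - Y||_F.
    X, Y are (n, d) matrices of source/target embeddings for n dictionary pairs."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt  # map the source space into the target space via X @ W
```

Applying the returned W to all source-language vectors places both languages in a shared space, after which new dictionary entries can be induced by nearest-neighbor search.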

16. Context-Based Quotation Recommendation [PDF] Back to Contents
  Ansel MacLaughlin, Tao Chen, Burcu Karagol Ayan, Dan Roth
Abstract: While composing a new document, anything from a news article to an email or essay, authors often utilize direct quotes from a variety of sources. Although an author may know what point they would like to make, selecting an appropriate quote for the specific context may be time-consuming and difficult. We therefore propose a novel context-aware quote recommendation system which utilizes the content an author has already written to generate a ranked list of quotable paragraphs and spans of tokens from a given source document. We approach quote recommendation as a variant of open-domain question answering and adapt the state-of-the-art BERT-based methods from open-QA to our task. We conduct experiments on a collection of speech transcripts and associated news articles, evaluating models' paragraph ranking and span prediction performances. Our experiments confirm the strong performance of BERT-based methods on this task, which outperform bag-of-words and neural ranking baselines by more than 30% relative across all ranking metrics. Qualitative analyses show the difficulty of the paragraph and span recommendation tasks and confirm the quotability of the best BERT model's predictions, even if they are not the true selected quotes from the original news articles.

17. TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data [PDF] Back to Contents
  Pengcheng Yin, Graham Neubig, Wen-tau Yih, Sebastian Riedel
Abstract: Recent years have witnessed the burgeoning of pretrained language models (LMs) for text-based natural language (NL) understanding tasks. Such models are typically trained on free-form NL text, hence may not be suitable for tasks like semantic parsing over structured data, which require reasoning over both free-form NL questions and structured tabular data (e.g., database tables). In this paper we present TaBERT, a pretrained LM that jointly learns representations for NL sentences and (semi-)structured tables. TaBERT is trained on a large corpus of 26 million tables and their English contexts. In experiments, neural semantic parsers using TaBERT as feature representation layers achieve new best results on the challenging weakly-supervised semantic parsing benchmark WikiTableQuestions, while performing competitively on the text-to-SQL dataset Spider. Implementation of the model will be available at this http URL .

18. Support-BERT: Predicting Quality of Question-Answer Pairs in MSDN using Deep Bidirectional Transformer [PDF] Back to Contents
  Bhaskar Sen, Nikhil Gopal, Xinwei Xue
Abstract: Quality of questions and answers from community support websites (e.g. Microsoft Developers Network, Stackoverflow, Github, etc.) is difficult to define, and a prediction model of quality questions and answers is even more challenging to implement. Previous works have addressed the question quality models and answer quality models separately using meta-features like number of up-votes, trustworthiness of the person posting the questions or answers, titles of the post, and context naive natural language processing features. However, there is a lack of an integrated question-answer quality model for community question answering websites in the literature. In this brief paper, we tackle the quality Q&A modeling problems from the community support websites using a recently developed deep learning model using bidirectional transformers. We investigate the applicability of transfer learning on Q&A quality modeling using Bidirectional Encoder Representations from Transformers (BERT) trained on a separate task originally using Wikipedia. It is found that a further pre-training of the BERT model along with finetuning on the Q&As extracted from Microsoft Developer Network (MSDN) can boost the performance of automated quality prediction to more than 80%. Furthermore, the implementations are carried out for deploying the finetuned model in a real-time scenario using AzureML in the Azure knowledge base system.

19. LiSSS: A toy corpus of Literary Spanish Sentences Sentiment for Emotions Detection [PDF] Back to Contents
  Juan-Manuel Torres-Moreno, Luis-Gil Moreno-Jiménez
Abstract: In this work we present a new and small corpus in the area of Computational Creativity (CC), the Literary Sentiment Sentence Spanish Corpus (LISSS). We address this corpus of literary sentences in order to evaluate algorithms of sentiment classification and emotion detection. We constituted it by manually classifying its sentences into five emotions: Love, Fear, Happiness, Anger and Sadness/Pain. We also present some baseline classification algorithms applied on our corpus. The LISSS corpus will be available to the community as a free resource to evaluate or create CC algorithms.

20. Speech to Text Adaptation: Towards an Efficient Cross-Modal Distillation [PDF] Back to Contents
  Won Ik Cho, Donghyun Kwak, Jiwon Yoon, Nam Soo Kim
Abstract: Speech is one of the most effective means of communication and is full of information that helps the transmission of the utterer's thoughts. However, mainly due to the cumbersome processing of acoustic features, phoneme or word posterior probability has frequently been discarded in understanding natural language. Thus, some recent spoken language understanding (SLU) modules have utilized an end-to-end structure that preserves the uncertainty information. This further reduces the propagation of speech recognition error and guarantees computational efficiency. We claim that in this process, the speech comprehension can benefit from the inference of massive pre-trained language models (LMs). We transfer the knowledge from a concrete Transformer-based text LM to an SLU module which can face a data shortage, based on recent cross-modal distillation methodologies. We demonstrate the validity of our proposal upon the performance on the Fluent Speech Command dataset. Thereby, we experimentally verify our hypothesis that the knowledge could be shared from the top layer of the LM to a fully speech-based module, in which the abstracted speech is expected to meet the semantic representation.

21. Building a Hebrew Semantic Role Labeling Lexical Resource from Parallel Movie Subtitles [PDF] Back to Contents
  Ben Eyal, Michael Elhadad
Abstract: We present a semantic role labeling resource for Hebrew built semi-automatically through annotation projection from English. This corpus is derived from the multilingual OpenSubtitles dataset and includes short informal sentences, for which reliable linguistic annotations have been computed. We provide a fully annotated version of the data including morphological analysis, dependency syntax and semantic role labeling in both FrameNet and PropBank styles. Sentences are aligned between English and Hebrew, both sides include full annotations and the explicit mapping from the English arguments to the Hebrew ones. We train a neural SRL model on this Hebrew resource exploiting the pre-trained multilingual BERT transformer model, and provide the first available baseline model for Hebrew SRL as a reference point. The code we provide is generic and can be adapted to other languages to bootstrap SRL resources.

22. How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? [PDF] Back to Contents
  Gantavya Bhatt, Hritik Bansal, Rishubh Singh, Sumeet Agarwal
Abstract: Long short-term memory (LSTM) networks and their variants are capable of encapsulating long-range dependencies, which is evident from their performance on a variety of linguistic tasks. On the other hand, simple recurrent networks (SRNs), which appear more biologically grounded in terms of synaptic connections, have generally been less successful at capturing long-range dependencies as well as the loci of grammatical errors in an unsupervised setting. In this paper, we seek to develop models that bridge the gap between biological plausibility and linguistic competence. We propose a new architecture, the Decay RNN, which incorporates the decaying nature of neuronal activations and models the excitatory and inhibitory connections in a population of neurons. Besides its biological inspiration, our model also shows competitive performance relative to LSTMs on subject-verb agreement, sentence grammaticality, and language modeling tasks. These results provide some pointers towards probing the nature of the inductive biases required for RNN architectures to model linguistic phenomena successfully.
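The abstract does not give the cell equations, so the following is only one plausible reading of "decaying activations": a leaky recurrent update in which the hidden state relaxes toward a gated nonlinear drive. Sign constraints on columns of W_rec could stand in for excitatory versus inhibitory units:

```python
import numpy as np

def decay_rnn_step(x_t, h_prev, W_in, W_rec, alpha=0.9):
    """One step of a leaky/decaying recurrent update (an assumption, not the
    paper's exact cell): h_t = alpha * h_{t-1} + (1 - alpha) * tanh(W_in x_t + W_rec h_{t-1})."""
    return alpha * h_prev + (1 - alpha) * np.tanh(W_in @ x_t + W_rec @ h_prev)
```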

23. Cross-Lingual Low-Resource Set-to-Description Retrieval for Global E-Commerce [PDF] Back to Contents
  Juntao Li, Chang Liu, Jian Wang, Lidong Bing, Hongsong Li, Xiaozhong Liu, Dongyan Zhao, Rui Yan
Abstract: With the prosperity of cross-border e-commerce, there is an urgent demand for designing intelligent approaches for assisting e-commerce sellers to offer local products for consumers from all over the world. In this paper, we explore a new task of cross-lingual information retrieval, i.e., cross-lingual set-to-description retrieval in cross-border e-commerce, which involves matching product attribute sets in the source language with persuasive product descriptions in the target language. We manually collect a new and high-quality paired dataset, where each pair contains an unordered product attribute set in the source language and an informative product description in the target language. As the dataset construction process is both time-consuming and costly, the new dataset only comprises 13.5k pairs, which is a low-resource setting and can be viewed as a challenging testbed for model development and evaluation in cross-border e-commerce. To tackle this cross-lingual set-to-description retrieval task, we propose a novel cross-lingual matching network (CLMN) with the enhancement of context-dependent cross-lingual mapping upon the pre-trained monolingual BERT representations. Experimental results indicate that our proposed CLMN yields impressive results on the challenging task and the context-dependent cross-lingual mapping on BERT yields noticeable improvement over the pre-trained multi-lingual BERT model.

24. Multi-modal Automated Speech Scoring using Attention Fusion [PDF] Back to Contents
  Manraj Singh Grover, Yaman Kumar, Sumit Sarin, Payman Vafaee, Mika Hama, Rajiv Ratn Shah
Abstract: In this study, we propose a novel multi-modal end-to-end neural approach for automated assessment of non-native English speakers' spontaneous speech using attention fusion. The pipeline employs Bi-directional Recurrent Convolutional Neural Networks and Bi-directional Long Short-Term Memory Neural Networks to encode acoustic and lexical cues from spectrograms and transcriptions, respectively. Attention fusion is performed on these learned predictive features to learn complex interactions between different modalities before final scoring. We compare our model with strong baselines and find combined attention to both lexical and acoustic cues significantly improves the overall performance of the system. Further, we present a qualitative and quantitative analysis of our model.
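As a rough illustration of what attention fusion over two modality streams can look like (the paper's exact architecture is not specified in the abstract), the sketch below pools acoustic and lexical encodings with shared attention weights; the scoring vector w stands in for learned fusion parameters:

```python
import numpy as np

def attention_fusion(acoustic, lexical, w):
    """Pool two modality encodings with shared attention weights.
    acoustic: (m, d), lexical: (n, d), w: (d,) hypothetical scoring vector."""
    feats = np.concatenate([acoustic, lexical], axis=0)  # (m + n, d)
    scores = feats @ w
    alphas = np.exp(scores - scores.max())
    alphas /= alphas.sum()                               # softmax attention weights
    return alphas @ feats                                # fused (d,) representation
```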

25. IMoJIE: Iterative Memory-Based Joint Open Information Extraction [PDF] Back to Contents
  Keshav Kolluru, Samarth Aggarwal, Vipul Rathore, Mausam, Soumen Chakrabarti
Abstract: While traditional systems for Open Information Extraction were statistical and rule-based, recently neural models have been introduced for the task. Our work builds upon CopyAttention, a sequence generation OpenIE model (Cui et al., 2018). Our analysis reveals that CopyAttention produces a constant number of extractions per sentence, and its extracted tuples often express redundant information. We present IMoJIE, an extension to CopyAttention, which produces the next extraction conditioned on all previously extracted tuples. This approach overcomes both shortcomings of CopyAttention, resulting in a variable number of diverse extractions per sentence. We train IMoJIE on training data bootstrapped from extractions of several non-neural systems, which have been automatically filtered to reduce redundancy and noise. IMoJIE outperforms CopyAttention by about 18 F1 pts, and a BERT-based strong baseline by 2 F1 pts, establishing a new state of the art for the task.

26. Encodings of Source Syntax: Similarities in NMT Representations Across Target Languages [PDF] Back to Contents
  Tyler A. Chang, Anna N. Rafferty
Abstract: We train neural machine translation (NMT) models from English to six target languages, using NMT encoder representations to predict ancestor constituent labels of source language words. We find that NMT encoders learn similar source syntax regardless of NMT target language, relying on explicit morphosyntactic cues to extract syntactic features from source sentences. Furthermore, the NMT encoders outperform RNNs trained directly on several of the constituent label prediction tasks, suggesting that NMT encoder representations can be used effectively for natural language tasks involving syntax. However, both the NMT encoders and the directly-trained RNNs learn substantially different syntactic information from a probabilistic context-free grammar (PCFG) parser. Despite lower overall accuracy scores, the PCFG often performs well on sentences for which the RNN-based models perform poorly, suggesting that RNN architectures are constrained in the types of syntax they can learn.

27. Adversarial Training for Commonsense Inference [PDF] Back to Contents
  Lis Pereira, Xiaodong Liu, Fei Cheng, Masayuki Asahara, Ichiro Kobayashi
Abstract: We propose an AdversariaL training algorithm for commonsense InferenCE (ALICE). We apply small perturbations to word embeddings and minimize the resultant adversarial risk to regularize the model. We exploit a novel combination of two different approaches to estimate these perturbations: 1) using the true label and 2) using the model prediction. Without relying on any human-crafted features, knowledge bases, or additional datasets other than the target datasets, our model boosts the fine-tuning performance of RoBERTa, achieving competitive results on multiple reading comprehension datasets that require commonsense inference.
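The perturbation scheme can be sketched as a single gradient-based step in embedding space. This is an illustrative FGSM-style approximation, not the paper's exact algorithm; `model` is assumed to map embeddings directly to logits, and the variant shown uses the true labels (ALICE's second variant would substitute the model's own predictions):

```python
import torch
import torch.nn.functional as F

def adversarial_loss(model, embeds, labels, epsilon=1e-3):
    """Adversarial term from a one-step perturbation of the word embeddings."""
    embeds = embeds.detach().requires_grad_(True)
    loss = F.cross_entropy(model(embeds), labels)
    grad, = torch.autograd.grad(loss, embeds)
    # Move the embeddings a small step in the loss-increasing direction.
    delta = epsilon * grad / (grad.norm(dim=-1, keepdim=True) + 1e-12)
    return F.cross_entropy(model(embeds + delta.detach()), labels)
```

In training, this term would typically be added to the clean-data loss as a regularizer.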

28. Semi-Automating Knowledge Base Construction for Cancer Genetics [PDF] Back to Contents
  Somin Wadhwa, Kanhua Yin, Kevin S. Hughes, Byron C. Wallace
Abstract: In this work, we consider the exponentially growing subarea of genetics in cancer. The need to synthesize and centralize this evidence for dissemination has motivated a team of physicians to manually construct and maintain a knowledge base that distills key results reported in the literature. This is a laborious process that entails reading through full-text articles to understand the study design, assess study quality, and extract the reported cancer risk estimates associated with particular hereditary cancer genes (i.e., penetrance). In this work, we propose models to automatically surface key elements from full-text cancer genetics articles, with the ultimate aim of expediting the manual workflow currently in place. We propose two challenging tasks that are critical for characterizing the findings reported in cancer genetics studies: (i) Extracting snippets of text that describe ascertainment mechanisms, which in turn inform whether the population studied may introduce bias owing to deviations from the target population; (ii) Extracting reported risk estimates (e.g., odds or hazard ratios) associated with specific germline mutations. The latter task may be viewed as a joint entity tagging and relation extraction problem. To train models for these tasks, we induce distant supervision over tokens and snippets in full-text articles using the manually constructed knowledge base. We propose and evaluate several model variants, including a transformer-based joint entity and relation extraction model to extract <germline mutation, risk estimate> pairs. We observe strong empirical performance, highlighting the practical potential for such models to aid KB construction in this space. We ablate components of our model, observing, e.g., that a joint model for <germline mutation, risk estimate> extraction fares substantially better than a pipelined approach.

29. RPD: A Distance Function Between Word Embeddings [PDF] Back to Contents
  Xuhui Zhou, Zaixiang Zheng, Shujian Huang
Abstract: It is well-understood that different algorithms, training processes, and corpora produce different word embeddings. However, less is known about the relation between different embedding spaces, i.e. how far different sets of embeddings deviate from each other. In this paper, we propose a novel metric called Relative pairwise inner Product Distance (RPD) to quantify the distance between different sets of word embeddings. This metric has a unified scale for comparing different sets of word embeddings. Based on the properties of RPD, we study the relations of word embeddings of different algorithms systematically and investigate the influence of different training processes and corpora. The results shed light on the poorly understood word embeddings and justify RPD as a measure of the distance of embedding spaces.
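The abstract does not spell out the formula, so the sketch below encodes one natural reading of "relative pairwise inner product distance": compare Frobenius-normalized Gram matrices of the two embedding sets, assuming their rows are aligned over a shared vocabulary:

```python
import numpy as np

def rpd(E1, E2):
    """Sketch of a relative pairwise inner product distance between two
    row-aligned embedding matrices (the paper's exact normalization may differ)."""
    G1, G2 = E1 @ E1.T, E2 @ E2.T            # pairwise inner products
    G1 = G1 / np.linalg.norm(G1)             # put both Gram matrices
    G2 = G2 / np.linalg.norm(G2)             # on a unified scale
    return np.linalg.norm(G1 - G2)
```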

30. Learning Probabilistic Sentence Representations from Paraphrases [PDF] Back to Contents
  Mingda Chen, Kevin Gimpel
Abstract: Probabilistic word embeddings have shown effectiveness in capturing notions of generality and entailment, but there is very little work on doing the analogous type of investigation for sentences. In this paper we define probabilistic models that produce distributions for sentences. Our best-performing model treats each word as a linear transformation operator applied to a multivariate Gaussian distribution. We train our models on paraphrases and demonstrate that they naturally capture sentence specificity. While our proposed model achieves the best performance overall, we also show that specificity is represented by simpler architectures via the norm of the sentence vectors. Qualitative analysis shows that our probabilistic model captures sentential entailment and provides ways to analyze the specificity and preciseness of individual words.
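The best-performing model admits a compact illustration: a Gaussian N(mu, Sigma) pushed through a linear map A becomes N(A mu, A Sigma A^T), so composing a sentence amounts to chaining per-word operators. The random matrices below merely stand in for learned parameters:

```python
import numpy as np

def compose_gaussian(word_matrices, d=4):
    """Apply each word's linear operator A in turn to a Gaussian,
    starting from N(0, I): mu <- A mu, Sigma <- A Sigma A^T."""
    mu, sigma = np.zeros(d), np.eye(d)
    for A in word_matrices:
        mu, sigma = A @ mu, A @ sigma @ A.T
    return mu, sigma

# e.g. a three-word "sentence" with hypothetical learned operators
rng = np.random.default_rng(0)
mu, sigma = compose_gaussian([rng.normal(scale=0.5, size=(4, 4)) for _ in range(3)])
```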

31. Layer-Wise Cross-View Decoding for Sequence-to-Sequence Learning [PDF] Back to Contents
  Fenglin Liu, Xuancheng Ren, Guangxiang Zhao, Xu Sun
Abstract: In sequence-to-sequence learning, the attention mechanism has been a great success in bridging the information between the encoder and the decoder. However, it is often overlooked that the decoder only has a single view of the source sequences, that is, the representations generated by the last encoder layer, which is supposed to be a global view of source sequences. Such implementation hinders the decoder from concrete, fine-grained, local source information. In this work, we explore reusing the representations from different encoder layers for layer-wise cross-view decoding, that is, different views of the source sequences are presented to different decoder layers. We investigate multiple, representative strategies for cross-view coding, of which the granularity consistent attention (GCA) strategy proves the most efficient and effective in experiments on the neural machine translation task. Especially, GCA surpasses the previous state-of-the-art architecture on three machine translation datasets.
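In its simplest form, layer-wise cross-view decoding replaces the convention of feeding every decoder layer the last encoder layer with a per-decoder-layer choice of encoder view. The uniform routing below is only a placeholder; the paper's GCA strategy mixes views more carefully:

```python
def layerwise_views(encoder_states, num_decoder_layers):
    """Assign each decoder layer one encoder layer's states instead of
    giving all decoder layers the final encoder layer (a placeholder routing)."""
    num_enc = len(encoder_states)
    return [encoder_states[min(i * num_enc // num_decoder_layers, num_enc - 1)]
            for i in range(num_decoder_layers)]
```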

32. Recurrent Chunking Mechanisms for Long-Text Machine Reading Comprehension [PDF] Back to Contents
  Hongyu Gong, Yelong Shen, Dian Yu, Jianshu Chen, Dong Yu
Abstract: In this paper, we study machine reading comprehension (MRC) on long texts, where a model takes as inputs a lengthy document and a question and then extracts a text span from the document as an answer. State-of-the-art models tend to use a pretrained transformer model (e.g., BERT) to encode the joint contextual information of document and question. However, these transformer-based models can only take a fixed-length (e.g., 512) text as its input. To deal with even longer text inputs, previous approaches usually chunk them into equally-spaced segments and predict answers based on each segment independently without considering the information from other segments. As a result, they may form segments that fail to cover the correct answer span or retain insufficient contexts around it, which significantly degrades the performance. Moreover, they are less capable of answering questions that need cross-segment information. We propose to let a model learn to chunk in a more flexible way via reinforcement learning: a model can decide the next segment that it wants to process in either direction. We also employ recurrent mechanisms to enable information to flow across segments. Experiments on three MRC datasets -- CoQA, QuAC, and TriviaQA -- demonstrate the effectiveness of our proposed recurrent chunking mechanisms: we can obtain segments that are more likely to contain complete answers and at the same time provide sufficient contexts around the ground truth answers for better predictions.
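The fixed-stride baseline that the proposed recurrent chunking mechanisms improve on is easy to state: split the token sequence into equally-spaced segments and run the reader on each one independently (a sketch, with the usual 512-token BERT limit assumed):

```python
def fixed_stride_chunks(token_ids, chunk_len=512, stride=512):
    """Baseline chunking: fixed-length segments, overlapping when
    stride < chunk_len; each segment would be read independently."""
    return [token_ids[i:i + chunk_len] for i in range(0, len(token_ids), stride)]
```

The paper's contribution is to let the model pick the next segment position itself (via reinforcement learning) and to pass information between segments with recurrent mechanisms.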

33. IntelliCode Compose: Code Generation Using Transformer [PDF] Back to Contents
  Alexey Svyatkovskiy, Shao Kun Deng, Shengyu Fu, Neel Sundaresan
Abstract: In software development through integrated development environments (IDEs), code completion is one of the most widely used features. Nevertheless, the majority of integrated development environments only support completion of methods and APIs, or arguments. In this paper, we introduce IntelliCode Compose, a general-purpose multilingual code completion tool which is capable of predicting sequences of code tokens of arbitrary types, generating up to entire lines of syntactically correct code. It leverages a state-of-the-art generative transformer model trained on 1.2 billion lines of source code in the Python, C#, JavaScript and TypeScript programming languages. IntelliCode Compose is deployed as a cloud-based web service. It makes use of client-side tree-based caching, efficient parallel implementation of the beam search decoder, and compute graph optimizations to meet edit-time completion suggestion requirements in the Visual Studio Code IDE and Azure Notebook. Our best model yields an average edit similarity of 86.7% and a perplexity of 1.82 for the Python programming language.
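Edit similarity of the kind quoted above is commonly computed as one minus the normalized Levenshtein distance between the predicted and actual completion; the exact definition behind the 86.7% figure is an assumption here:

```python
def edit_similarity(pred: str, truth: str) -> float:
    """1 - levenshtein(pred, truth) / max(len(pred), len(truth))."""
    m, n = len(pred), len(truth)
    if max(m, n) == 0:
        return 1.0
    prev = list(range(n + 1))                 # DP row for the empty prefix
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + (pred[i - 1] != truth[j - 1]))  # substitution
        prev = cur
    return 1.0 - prev[n] / max(m, n)
```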

34. A Text Reassembling Approach to Natural Language Generation [PDF] Back to Contents
  Xiao Li, Kees van Deemter, Chenghua Lin
Abstract: Recent years have seen a number of proposals for performing Natural Language Generation (NLG) based in large part on statistical techniques. Despite having many attractive features, we argue that these existing approaches nonetheless have some important drawbacks, sometimes because the approach in question is not fully statistical (i.e., relies on a certain amount of handcrafting), sometimes because the approach in question lacks transparency. Focussing on some of the key NLG tasks (namely Content Selection, Lexical Choice, and Linguistic Realisation), we propose a novel approach, called the Text Reassembling approach to NLG (TRG), which approaches the ideal of a purely statistical approach very closely, and which is at the same time highly transparent. We evaluate the TRG approach and discuss how TRG may be extended to deal with other NLG tasks, such as Document Structuring, and Aggregation. We discuss the strengths and limitations of TRG, concluding that the method may hold particular promise for domain experts who want to build an NLG system despite having little expertise in linguistics and NLG.

35. Unsupervised Embedding-based Detection of Lexical Semantic Changes [PDF] 返回目录
  Ehsaneddin Asgari, Christoph Ringlstetter, Hinrich Schütze
Abstract: This paper describes EmbLexChange, a system introduced by the "Life-Language" team for SemEval-2020 Task 1, on unsupervised detection of lexical-semantic changes. EmbLexChange is defined as the divergence between the embedding-based profiles of word w (calculated with respect to a set of reference words) in the source and the target domains (source and target domains can be simply two time frames t1 and t2). The underlying assumption is that the lexical-semantic change of word w would affect its co-occurring words and subsequently alter the neighborhoods in the embedding spaces. We show that using a resampling framework for the selection of reference words, we can reliably detect lexical-semantic changes in English, German, Swedish, and Latin. EmbLexChange achieved second place in the binary detection of semantic changes at SemEval-2020.
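The profile-divergence idea described above can be sketched in a few lines of numpy; the embeddings below are random stand-ins for real t1/t2 spaces, and the reference set, softmax profile, and KL measure are illustrative choices rather than the system's exact configuration.

import numpy as np

rng = np.random.default_rng(0)
words = ["w", "r1", "r2", "r3"]
emb_t1 = {w: rng.normal(size=50) for w in words}  # embedding space at t1
emb_t2 = {w: rng.normal(size=50) for w in words}  # embedding space at t2
refs = ["r1", "r2", "r3"]

def profile(emb, word, refs):
    # Cosine similarities to the reference words, softmax-normalized
    # into a probability distribution: the word's "profile".
    sims = np.array([emb[word] @ emb[r] /
                     (np.linalg.norm(emb[word]) * np.linalg.norm(emb[r]))
                     for r in refs])
    e = np.exp(sims)
    return e / e.sum()

p, q = profile(emb_t1, "w", refs), profile(emb_t2, "w", refs)
kl = float(np.sum(p * np.log(p / q)))  # higher => stronger semantic change
print(f"divergence between the two profiles of w: {kl:.4f}")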

36. Logical Inferences with Comparatives and Generalized Quantifiers [PDF] 返回目录
  Izumi Haruta, Koji Mineshima, Daisuke Bekki
Abstract: Comparative constructions pose a challenge in Natural Language Inference (NLI), which is the task of determining whether a text entails a hypothesis. Comparatives are structurally complex in that they interact with other linguistic phenomena such as quantifiers, numerals, and lexical antonyms. In formal semantics, there is a rich body of work on comparatives and gradable expressions using the notion of degree. However, a logical inference system for comparatives has not been sufficiently developed for use in the NLI task. In this paper, we present a compositional semantics that maps various comparative constructions in English to semantic representations via Combinatory Categorial Grammar (CCG) parsers and combine it with an inference system based on automated theorem proving. We evaluate our system on three NLI datasets that contain complex logical inferences with comparatives, generalized quantifiers, and numerals. We show that the system outperforms previous logic-based systems as well as recent deep learning-based models.

37. ApplicaAI at SemEval-2020 Task 11: On RoBERTa-CRF, Span CLS and Whether Self-Training Helps Them [PDF] 返回目录
  Dawid Jurkiewicz, Łukasz Borchmann, Izabela Kosmala, Filip Graliński
Abstract: This paper presents the winning system for the propaganda Technique Classification (TC) task and the second-placed system for the propaganda Span Identification (SI) task. The purpose of the TC task was to identify the applied propaganda technique given a propaganda text fragment. The goal of the SI task was to find specific text fragments which contain at least one propaganda technique. Both of the developed solutions used the semi-supervised learning technique of self-training. Interestingly, although CRF is barely used with transformer-based language models, the SI task was approached with a RoBERTa-CRF architecture. An ensemble of RoBERTa-based models was proposed for the TC task, with one of them making use of the Span CLS layers we introduce in the present paper. In addition to describing the submitted systems, the impact of architectural decisions and training schemes is investigated, along with remarks regarding training models of the same or better quality with a lower computational budget. Finally, the results of error analysis are presented.

38. Sequential Sentence Matching Network for Multi-turn Response Selection in Retrieval-based Chatbots [PDF] 返回目录
  Chao Xiong, Che Liu, Zijun Xu, Junfeng Jiang, Jieping Ye
Abstract: Recently, open domain multi-turn chatbots have attracted much interest from researchers in both academia and industry. The dominant retrieval-based methods use context-response matching mechanisms for multi-turn response selection. Specifically, the state-of-the-art methods perform the context-response matching by word or segment similarity. However, these models lack a full exploitation of the sentence-level semantic information, and make simple mistakes that humans can easily avoid. In this work, we propose a matching network, called the sequential sentence matching network (S2M), which uses the sentence-level semantic information to address the problem. Firstly and most importantly, we find that by using the sentence-level semantic information, the network successfully addresses the problem and obtains a significant improvement on matching, resulting in state-of-the-art performance. Furthermore, we integrate the sentence matching introduced here with the usual word similarity matching reported in the current literature, to match at different semantic levels. Experiments on three public data sets show that such integration further improves the model performance.
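A minimal numpy illustration of matching at the sentence level rather than the word level: each context turn and the candidate response get one sentence vector, and the response is scored against the sequence of turns. The vectors are random placeholders for a real sentence encoder, and the max aggregation is a simplification of what S2M learns.

import numpy as np

rng = np.random.default_rng(1)
context = [rng.normal(size=64) for _ in range(4)]  # one vector per context turn
response = rng.normal(size=64)                     # candidate response vector

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

per_turn = [cosine(u, response) for u in context]  # sentence-level matching
score = max(per_turn)                              # simple aggregation
print([round(s, 3) for s in per_turn], "->", round(score, 3))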

39. Integrating Semantic and Structural Information with Graph Convolutional Network for Controversy Detection [PDF] 返回目录
  Lei Zhong, Juan Cao, Qiang Sheng, Junbo Guo, Ziang Wang
Abstract: Identifying controversial posts on social media is a fundamental task for mining public sentiment, assessing the influence of events, and alleviating polarized views. However, existing methods fail to 1) effectively incorporate the semantic information from content-related posts; 2) preserve the structural information for reply relationship modeling; 3) properly handle posts from topics dissimilar to those in the training set. To overcome the first two limitations, we propose the Topic-Post-Comment Graph Convolutional Network (TPC-GCN), which integrates the information from the graph structure and content of topics, posts, and comments for post-level controversy detection. As to the third limitation, we extend our model to the Disentangled TPC-GCN (DTPC-GCN), to disentangle topic-related and topic-unrelated features and then fuse them dynamically. Extensive experiments on two real-world datasets demonstrate that our models outperform existing methods. Analysis of the results and cases proves that our models can integrate both semantic and structural information with significant generalizability.

40. MicroNet for Efficient Language Modeling [PDF] 返回目录
  Zhongxia Yan, Hanrui Wang, Demi Guo, Song Han
Abstract: It is important to design compact language models for efficient deployment. We improve upon recent advances in both the language modeling domain and the model-compression domain to construct parameter- and computation-efficient language models. We use an efficient transformer-based architecture with adaptive embedding and softmax, differentiable non-parametric cache, Hebbian softmax, knowledge distillation, network pruning, and low-bit quantization. In this paper, we provide the winning solution to the NeurIPS 2019 MicroNet Challenge in the language modeling track. Compared to the baseline language model provided by the MicroNet Challenge, our model is 90 times more parameter-efficient and 36 times more computation-efficient while achieving the required test perplexity of 35 on the Wikitext-103 dataset. We hope that this work will aid future research into efficient language models, and we have released our full source code at this https URL.
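Low-bit quantization, one of the compression ingredients listed above, looks as follows in its simplest symmetric linear form; this is a generic numpy sketch, not the actual scheme used in the MicroNet submission.

import numpy as np

def quantize_int8(w):
    # Map the weight range symmetrically onto the int8 grid [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(2).normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
err = np.abs(w - dequantize(q, scale)).max()
print(f"max reconstruction error at 8 bits: {err:.5f}")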

41. Neural Multi-Task Learning for Teacher Question Detection in Online Classrooms [PDF] 返回目录
  Gale Yan Huang, Jiahao Chen, Haochen Liu, Weiping Fu, Wenbiao Ding, Jiliang Tang, Songfan Yang, Guoliang Li, Zitao Liu
Abstract: Asking questions is one of the most crucial pedagogical techniques used by teachers in class. It not only offers open-ended discussions between teachers and students to exchange ideas but also provokes deeper student thought and critical analysis. Providing teachers with such pedagogical feedback will remarkably help teachers improve their overall teaching quality over time in classrooms. Therefore, in this work, we build an end-to-end neural framework that automatically detects questions from teachers' audio recordings. Compared with traditional methods, our approach not only avoids cumbersome feature engineering, but also adapts to the task of multi-class question detection in real education scenarios. By incorporating multi-task learning techniques, we are able to strengthen the understanding of semantic relations among different types of questions. We conducted extensive experiments on the question detection tasks in a real-world online classroom dataset and the results demonstrate the superiority of our model in terms of various evaluation metrics.

42. KEIS@JUST at SemEval-2020 Task 12: Identifying Multilingual Offensive Tweets Using Weighted Ensemble and Fine-Tuned BERT [PDF] 返回目录
  Saja Khaled Tawalbeh, Mahmoud Hammad, Mohammad AL-Smadi
Abstract: This research presents our team KEIS@JUST's participation in SemEval-2020 Task 12, a shared task on multilingual offensive language. We participated in all the provided languages for all subtasks except sub-task-A for the English language. Two main approaches were developed. The first, used to tackle both Arabic and English, is a weighted ensemble consisting of Bi-GRU and CNN followed by Gaussian noise and a global pooling layer, multiplied by weights to improve the overall performance. The second, used for the other languages, is transfer learning from BERT alongside recurrent neural networks such as Bi-LSTM and Bi-GRU, followed by a global average pooling layer. Word embeddings and contextual embeddings were used as features; moreover, data augmentation was used only for the Arabic language.

43. A Scientific Information Extraction Dataset for Nature Inspired Engineering [PDF] 返回目录
  Ruben Kruiper, Julian F.V. Vincent, Jessica Chen-Burger, Marc P.Y. Desmulliez, Ioannis Konstas
Abstract: Nature has inspired various ground-breaking technological developments in applications ranging from robotics to aerospace engineering and the manufacturing of medical devices. However, accessing the information captured in scientific biology texts is a time-consuming and hard task that requires domain-specific knowledge. Improving access for outsiders can help interdisciplinary research like Nature Inspired Engineering. This paper describes a dataset of 1,500 manually-annotated sentences that express domain-independent relations between central concepts in a scientific biology text, such as trade-offs and correlations. The arguments of these relations can be Multi Word Expressions and have been annotated with modifying phrases to form non-projective graphs. The dataset allows for training and evaluating Relation Extraction algorithms that aim for coarse-grained typing of scientific biological documents, enabling a high-level filter for engineers.

44. In Layman's Terms: Semi-Open Relation Extraction from Scientific Texts [PDF] 返回目录
  Ruben Kruiper, Julian F.V. Vincent, Jessica Chen-Burger, Marc P.Y. Desmulliez, Ioannis Konstas
Abstract: Information Extraction (IE) from scientific texts can be used to guide readers to the central information in scientific documents. But narrow IE systems extract only a fraction of the information captured, and Open IE systems do not perform well on the long and complex sentences encountered in scientific texts. In this work we combine the output of both types of systems to achieve Semi-Open Relation Extraction, a new task that we explore in the Biology domain. First, we present the Focused Open Biological Information Extraction (FOBIE) dataset and use FOBIE to train a state-of-the-art narrow scientific IE system to extract trade-off relations and arguments that are central to biology texts. We then run both the narrow IE system and a state-of-the-art Open IE system on a corpus of 10k open-access scientific biological texts. We show that a significant amount (65%) of erroneous and uninformative Open IE extractions can be filtered using narrow IE extractions. Furthermore, we show that the retained extractions are significantly more often informative to a reader.

45. Uncovering Gender Bias in Media Coverage of Politicians with Machine Learning [PDF] 返回目录
  Susan Leavy
Abstract: This paper presents research uncovering systematic gender bias in the representation of political leaders in the media, using artificial intelligence. Newspaper coverage of Irish ministers over a fifteen-year period was gathered and analysed with natural language processing techniques and machine learning. Findings demonstrate evidence of gender bias in the portrayal of female politicians, the kind of policies they were associated with, and how they were evaluated in terms of their performance as political leaders. This paper also sets out a methodology whereby media content may be analysed on a large scale, utilising techniques from artificial intelligence within a theoretical framework founded in gender theory and feminist linguistics.

46. Critical Impact of Social Networks Infodemic on Defeating Coronavirus COVID-19 Pandemic: Twitter-Based Study and Research Directions [PDF] 返回目录
  Azzam Mourad, Ali Srour, Haidar Harmanani, Cathia Jenainatiy, Mohamad Arafeh
Abstract: News creation and consumption have been changing since the advent of social media. An estimated 2.95 billion people in 2019 used social media worldwide. The widespread outbreak of the Coronavirus COVID-19 resulted in a tsunami of social media activity. Most platforms were used to transmit relevant news, guidelines and precautions to people. According to WHO, uncontrolled conspiracy theories and propaganda are spreading faster than the COVID-19 pandemic itself, creating an infodemic and thus causing psychological panic, misleading medical advice, and economic disruption. Accordingly, discussions have been initiated with the objective of moderating all COVID-19 communications, except those initiated from trusted sources such as the WHO and authorized governmental entities. This paper presents a large-scale study based on data mined from Twitter. Extensive analysis has been performed on approximately one million COVID-19 related tweets collected over a period of two months. Furthermore, the profiles of 288,000 users were analyzed, including unique user profiles, meta-data and tweet context. The study noted various interesting conclusions, including the critical impact of (1) the exploitation of the COVID-19 crisis to redirect readers to irrelevant topics and (2) the widespread circulation of unauthentic medical precautions and information. Further data analysis revealed the importance of using social networks in a global pandemic crisis by relying on credible users with a variety of occupations, content developers and influencers in specific fields. In this context, several insights and findings have been provided, while elaborating computing and non-computing implications and research directions for potential solutions and social network management strategies during crisis periods.

47. Machine learning on Big Data from Twitter to understand public reactions to COVID-19 [PDF] 返回目录
  Jia Xue, Junxiang Chen, Chen Chen, ChengDa Zheng, Tingshao Zhu
Abstract: The study aims to understand Twitter users' discussions and reactions about COVID-19. We use machine learning techniques to analyze about 1.8 million tweets related to coronavirus collected from January 20th to March 7th, 2020. The identified topics include "cases outside China (worldwide)," "COVID-19 outbreak in South Korea," "early signs of the outbreak in New York," "Diamond Princess cruise," "economic impact," "Preventive/Protective measures," "authorities," and "supply chain". Results do not reveal treatment- and/or symptom-related messages as a prevalent topic on Twitter. We also run sentiment analysis, and the results show that trust in the authorities remained a prevalent emotion, but mixed feelings of trust in authorities, fear of the outbreak, and anticipation of the potential preventive measures to be taken are also identified. Implications and limitations of the study are also discussed.

48. Conversational Search -- A Report from Dagstuhl Seminar 19461 [PDF] 返回目录
  Avishek Anand, Lawrence Cavedon, Matthias Hagen, Hideo Joho, Mark Sanderson, Benno Stein
Abstract: Dagstuhl Seminar 19461 "Conversational Search" was held on 10-15 November 2019. 44 researchers in Information Retrieval and Web Search, Natural Language Processing, Human Computer Interaction, and Dialogue Systems were invited to share the latest developments in the area of Conversational Search and discuss its research agenda and future directions. The 5-day program of the seminar consisted of six introductory and background sessions, three visionary talk sessions, one industry talk session, and seven working group and reporting sessions. The seminar also had three social events during the program. This report provides the executive summary, an overview of the invited talks, and findings from the seven working groups, which cover the definition, evaluation, modelling, explanation, scenarios, applications, and prototypes of Conversational Search. The ideas and findings presented in this report should serve as one of the main sources for diverse research programs on Conversational Search.

49. Design Choices for X-vector Based Speaker Anonymization [PDF] 返回目录
  Brij Mohan Lal Srivastava, Natalia Tomashenko, Xin Wang, Emmanuel Vincent, Junichi Yamagishi, Mohamed Maouche, Aurélien Bellet, Marc Tommasi
Abstract: The recently proposed x-vector based anonymization scheme converts any input voice into that of a random pseudo-speaker. In this paper, we present a flexible pseudo-speaker selection technique as a baseline for the first VoicePrivacy Challenge. We explore several design choices for the distance metric between speakers, the region of x-vector space where the pseudo-speaker is picked, and gender selection. To assess the strength of anonymization achieved, we consider attackers using an x-vector based speaker verification system who may use original or anonymized speech for enrollment, depending on their knowledge of the anonymization scheme. The Equal Error Rate (EER) achieved by the attackers and the decoding Word Error Rate (WER) over anonymized data are reported as the measures of privacy and utility. Experiments are performed using datasets derived from LibriSpeech to find the optimal combination of design choices in terms of privacy and utility.
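One flexible pseudo-speaker selection strategy in the spirit of the baseline described above can be sketched as follows: rank a pool of x-vectors by distance from the source speaker, then average a random subset of the farthest ones. The pool, the cosine distance, and the subset sizes are assumptions for illustration only.

import numpy as np

rng = np.random.default_rng(3)
pool = rng.normal(size=(200, 512))  # x-vector pool, one row per speaker
source = rng.normal(size=512)       # x-vector of the speaker to anonymize

def cosine_dist(a, b):
    return 1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

dists = np.array([cosine_dist(source, x) for x in pool])
farthest = np.argsort(dists)[-100:]            # keep the farthest candidates
chosen = rng.choice(farthest, size=10, replace=False)
pseudo_xvector = pool[chosen].mean(axis=0)     # the random pseudo-speaker
print(pseudo_xvector.shape)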

50. Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation [PDF] 返回目录
  Po-Han Chi, Pei-Hung Chung, Tsung-Han Wu, Chun-Cheng Hsieh, Shang-Wen Li, Hung-yi Lee
Abstract: For self-supervised speech processing, it is crucial to use pretrained models as speech representation extractors. In recent works, increasing the size of the model has been utilized in acoustic model training in order to achieve better performance. In this paper, we propose Audio ALBERT, a lite version of the self-supervised speech representation model. We use the representations for two downstream tasks, speaker identification and phoneme classification. We show that Audio ALBERT is capable of achieving competitive performance with those huge models in the downstream tasks while utilizing 91\% fewer parameters. Moreover, we use some simple probing models to measure how much speaker and phoneme information is encoded in the latent representations. In the probing experiments, we find that the latent representations encode richer information of both phoneme and speaker than that of the last layer.

51. Audio-visual Multi-channel Recognition of Overlapped Speech [PDF] 返回目录
  Jianwei Yu, Bo Wu, Rongzhi Gu, Shi-Xiong Zhang, Lianwu Chen, Yong Xu, Meng Yu, Dan Su, Dong Yu, Xunying Liu, Helen Meng
Abstract: Automatic speech recognition (ASR) of overlapped speech remains a highly challenging task to date. To this end, multi-channel microphone array data are widely used in state-of-the-art ASR systems. Motivated by the invariance of visual modality to acoustic signal corruption, this paper presents an audio-visual multi-channel overlapped speech recognition system featuring tightly integrated separation front-end and recognition back-end. A series of audio-visual multi-channel speech separation front-end components based on \textit{TF masking}, \textit{filter\&sum} and \textit{mask-based MVDR} beamforming approaches were developed. To reduce the error cost mismatch between the separation and recognition components, they were jointly fine-tuned using the connectionist temporal classification (CTC) loss function, or a multi-task criterion interpolation with scale-invariant signal-to-noise ratio (Si-SNR) error cost. Experiments suggest that the proposed multi-channel AVSR system outperforms the baseline audio-only ASR system by up to 6.81\% (26.83\% relative) and 22.22\% (56.87\% relative) absolute word error rate (WER) reduction on overlapped speech constructed using either simulation or replaying of the Lip Reading Sentences 2 (LRS2) dataset, respectively.

52. Robust Training of Vector Quantized Bottleneck Models [PDF] 返回目录
  Adrian Łańcucki, Jan Chorowski, Guillaume Sanchez, Ricard Marxer, Nanxin Chen, Hans J.G.A. Dolfing, Sameer Khurana, Tanel Alumäe, Antoine Laurent
Abstract: In this paper we demonstrate methods for reliable and efficient training of discrete representations using Vector-Quantized Variational Auto-Encoder models (VQ-VAEs). Discrete latent variable models have been shown to learn nontrivial representations of speech, applicable to unsupervised voice conversion and reaching state-of-the-art performance on unit discovery tasks. For unsupervised representation learning, they became viable alternatives to continuous latent variable models such as the Variational Auto-Encoder (VAE). However, training deep discrete variable models is challenging, due to the inherent non-differentiability of the discretization operation. In this paper we focus on VQ-VAE, a state-of-the-art discrete bottleneck model shown to perform on par with its continuous counterparts. It quantizes encoder outputs with on-line $k$-means clustering. We show that the codebook learning can suffer from poor initialization and non-stationarity of clustered encoder outputs. We demonstrate that these can be successfully overcome by increasing the learning rate for the codebook and periodic data-dependent codeword re-initialization. As a result, we achieve more robust training across different tasks, and significantly increase the usage of latent codewords even for large codebooks. This has practical benefit, for instance, in unsupervised representation learning, where large codebooks may lead to disentanglement of latent representations.
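The quantization step and the data-dependent re-initialization can be sketched in numpy as follows; shapes, the batch, and the re-initialization rule are illustrative simplifications of the training procedure described above.

import numpy as np

rng = np.random.default_rng(4)
codebook = rng.normal(size=(8, 16))  # 8 codewords, 16-dim latents
z = rng.normal(size=(32, 16))        # a batch of encoder outputs

# Quantization: snap each encoder output to its nearest codeword.
d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
codes = d.argmin(axis=1)
z_q = codebook[codes]

# Re-initialize unused ("dead") codewords from random batch vectors.
used = np.unique(codes)
dead = np.setdiff1d(np.arange(len(codebook)), used)
codebook[dead] = z[rng.choice(len(z), size=len(dead), replace=False)]
print("codes:", codes[:8], "| dead codewords re-initialized:", len(dead))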

53. Attention-based Transducer for Online Speech Recognition [PDF] 返回目录
  Bin Wang, Yan Yin, Hui Lin
Abstract: Recent studies reveal the potential of the recurrent neural network transducer (RNN-T) for end-to-end (E2E) speech recognition. Among the most popular E2E systems, including RNN-T, Attention Encoder-Decoder (AED), and Connectionist Temporal Classification (CTC), RNN-T has some clear advantages given that it supports streaming recognition and does not have a frame-independence assumption. Although significant progress has been made in RNN-T research, it still faces performance challenges in terms of training speed and accuracy. We propose an attention-based transducer with modifications over RNN-T in two aspects. First, we introduce chunk-wise attention in the joint network. Second, self-attention is introduced in the encoder. Our proposed model outperforms RNN-T in both training speed and accuracy. For training, we achieve an over 1.7x speedup. With 500 hours of LAIX non-native English training data, the attention-based transducer yields ~10.6% WER reduction over the baseline RNN-T. Trained with the full set of over 10K hours of data, our final system achieves ~5.5% WER reduction over that trained with the best Kaldi TDNN-f recipe. After 8-bit weight quantization without WER degradation, RTF and latency drop to 0.34-0.36 and 268-409 milliseconds respectively on a single CPU core of a production server.

54. An Effective End-to-End Modeling Approach for Mispronunciation Detection [PDF] 返回目录
  Tien-Hong Lo, Shi-Yan Weng, Hsiu-Jui Chang, Berlin Chen
Abstract: Recently, end-to-end (E2E) automatic speech recognition (ASR) systems have garnered tremendous attention because of their great success and unified modeling paradigms in comparison to conventional hybrid DNN-HMM ASR systems. Despite the widespread adoption of E2E modeling frameworks for ASR, there is still a dearth of work on investigating the E2E frameworks for use in computer-assisted pronunciation training (CAPT), particularly for mispronunciation detection (MD). In response, we first present a novel use of the hybrid CTC-Attention approach for the MD task, taking advantage of the strengths of both CTC and the attention-based model while getting around the need for phone-level forced alignment. Second, we perform input augmentation with text prompt information to make the resulting E2E model better tailored for the MD task. On the other hand, we adopt two MD decision methods so as to better cooperate with the proposed framework: 1) decision-making based on a recognition confidence measure, or 2) simply based on speech recognition results. A series of Mandarin MD experiments demonstrate that our approach not only simplifies the processing pipeline of existing hybrid DNN-HMM systems but also brings about systematic and substantial performance improvements. Furthermore, input augmentation with text prompts seems to hold excellent promise for the E2E-based MD approach.

55. The NTNU System at the Interspeech 2020 Non-Native Children's Speech ASR Challenge [PDF] 返回目录
  Tien-Hong Lo, Fu-An Chao, Shi-Yan Weng, Berlin Chen
Abstract: This paper describes the NTNU ASR system participating in the Interspeech 2020 Non-Native Children's Speech ASR Challenge supported by the SIG-CHILD group of ISCA. This ASR shared task is made much more challenging by the coexisting diversity of non-native and child speaking characteristics. In the setting of closed-track evaluation, all participants were restricted to developing their systems based only on the speech and text corpora provided by the organizer. To work around this under-resourced issue, we built our ASR system on top of CNN-TDNNF-based acoustic models, meanwhile harnessing the synergistic power of various data augmentation strategies, including both utterance- and word-level speed perturbation and spectrogram augmentation, alongside a simple yet effective data-cleansing approach. All variants of our ASR system employed an RNN-based language model to rescore the first-pass recognition hypotheses, which was trained solely on the text dataset released by the organizer. Our system with the best configuration came out in second place, resulting in a word error rate (WER) of 17.59%, while those of the top-performing, second runner-up and official baseline systems are 15.67%, 18.71%, and 35.09%, respectively.

56. Content analysis of Persian/Farsi Tweets during COVID-19 pandemic in Iran using NLP [PDF] 返回目录
  Pedram Hosseini, Poorya Hosseini, David A. Broniatowski
Abstract: Iran, along with China, South Korea, and Italy was among the countries that were hit hard in the first wave of the COVID-19 spread. Twitter is one of the widely-used online platforms by Iranians inside and abroad for sharing their opinion, thoughts, and feelings about a wide range of issues. In this study, using more than 530,000 original tweets in Persian/Farsi on COVID-19, we analyzed the topics discussed among users, who are mainly Iranians, to gauge and track the response to the pandemic and how it evolved over time. We applied a combination of manual annotation of a random sample of tweets and topic modeling tools to classify the contents and frequency of each category of topics. We identified the top 25 topics among which living experience under home quarantine emerged as a major talking point. We additionally categorized broader content of tweets that shows satire, followed by news, is the dominant tweet type among the Iranian users. While this framework and methodology can be used to track public response to ongoing developments related to COVID-19, a generalization of this framework can become a useful framework to gauge Iranian public reaction to ongoing policy measures or events locally and internationally.

57. Vector-Quantized Autoregressive Predictive Coding [PDF] 返回目录
  Yu-An Chung, Hao Tang, James Glass
Abstract: Autoregressive Predictive Coding (APC), as a self-supervised objective, has enjoyed success in learning representations from large amounts of unlabeled data, and the learned representations are rich for many downstream tasks. However, the connection between low self-supervised loss and strong performance in downstream tasks remains unclear. In this work, we propose Vector-Quantized Autoregressive Predictive Coding (VQ-APC), a novel model that produces quantized representations, allowing us to explicitly control the amount of information encoded in the representations. By studying a sequence of increasingly limited models, we reveal the constituents of the learned representations. In particular, we confirm the presence of information with probing tasks, while showing the absence of information with mutual information, uncovering the model's preference in preserving speech information as its capacity becomes constrained. We find that there exists a point where phonetic and speaker information are amplified to maximize a self-supervised objective. As a byproduct, the learned codes for a particular model capacity correspond well to English phones.

58. Fixed Point Semantics for Stream Reasoning [PDF] 返回目录
  Christian Antić
Abstract: Reasoning over streams of input data is an essential part of human intelligence. During the last decade {\em stream reasoning} has emerged as a research area within the AI community with many potential applications. In fact, the increased availability of streaming data via services like Google and Facebook has raised the need for reasoning engines coping with data that changes at high rate. Recently, the rule-based formalism {\em LARS} for non-monotonic stream reasoning under the answer set semantics has been introduced. Syntactically, LARS programs are logic programs with negation incorporating operators for temporal reasoning, most notably {\em window operators} for selecting relevant time points. Unfortunately, by preselecting {\em fixed} intervals for the semantic evaluation of programs, the rigid semantics of LARS programs is not flexible enough to {\em constructively} cope with rapidly changing data dependencies. Moreover, we show that defining the answer set semantics of LARS in terms of FLP reducts leads to undesirable circular justifications similar to other ASP extensions. This paper fixes all of the aforementioned shortcomings of LARS. More precisely, we contribute to the foundations of stream reasoning by providing an operational fixed point semantics for a fully flexible variant of LARS, and we show that our semantics is sound and constructive in the sense that answer sets are derivable bottom-up and free of circular justifications.

59. Wake Word Detection with Alignment-Free Lattice-Free MMI [PDF] 返回目录
  Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur
Abstract: Always-on spoken language interfaces, e.g. personal digital assistants, rely on a wake word to start processing spoken input. We present novel methods to train a hybrid DNN/HMM wake word detection system from partially labeled training data, and to use it in on-line applications: (i) we remove the prerequisite of frame-level alignments in the LF-MMI training algorithm, permitting the use of un-transcribed training examples that are annotated only for the presence/absence of the wake word; (ii) we show that the classical keyword/filler model must be supplemented with an explicit non-speech (silence) model for good performance; (iii) we present an FST-based decoder to perform online detection. We evaluate our methods on two real data sets, showing 80%--90% reduction in false rejection rates at pre-specified false alarm rates over the best previously published figures, and re-validate them on a third (large) data set.

60. A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer [PDF] 返回目录
  Vladimir Iashin, Esa Rahtu
Abstract: Dense video captioning aims to localize and describe important events in untrimmed videos. Existing methods mainly tackle this task by exploiting only visual features, while completely neglecting the audio track. Only a few prior works have utilized both modalities, yet they show poor results or demonstrate the importance on a dataset with a specific domain. In this paper, we introduce the Bi-modal Transformer, which generalizes the Transformer architecture for a bi-modal input. We show the effectiveness of the proposed model with audio and visual modalities on the dense video captioning task, yet the module is capable of taking any two modalities as input in a sequence-to-sequence task. We show that pre-training a bi-modal encoder along with a bi-modal decoder for captioning can be used as a feature extractor for a simple proposal generation module. The performance is demonstrated on the challenging ActivityNet Captions dataset, where our model achieves outstanding performance.

61. On the Combined Use of Extrinsic Semantic Resources for Medical Information Search [PDF] 返回目录
  Mohammed Maree, Israa Noor, Khaled Rabayah, Mohammed Belkhatir, Saadat M. Alhashmi
Abstract: Semantic concepts and relations encoded in domain-specific ontologies and other medical semantic resources play a crucial role in deciphering terms in medical queries and documents. The exploitation of these resources for tackling the semantic gap issue has been widely studied in the literature. However, there are challenges that hinder their widespread use in real-world applications. Among these challenges is the insufficient knowledge individually encoded in existing medical ontologies, which is magnified when users express their information needs using long-winded natural language queries. In this context, many of the users' query terms are either unrecognized by the used ontologies or cause the retrieval of false positives that degrade the quality of current medical information search approaches. In this article, we explore the combination of multiple extrinsic semantic resources in the development of a full-fledged medical information search framework to: i) highlight and expand head medical concepts in verbose medical queries (i.e. concepts among query terms that significantly contribute to the informativeness and intent of a given query), ii) build semantically enhanced inverted index documents, iii) contribute to a heuristic weighting technique in the query-document matching process. To demonstrate the effectiveness of the proposed approach, we conducted several experiments over the CLEF eHealth 2014 dataset. Findings indicate that the proposed method combining several extrinsic semantic resources proved to be more effective than related approaches in terms of precision.

62. Dual Learning: Theoretical Study and an Algorithmic Extension [PDF] 返回目录
  Zhibing Zhao, Yingce Xia, Tao Qin, Lirong Xia, Tie-Yan Liu
Abstract: Dual learning has been successfully applied in many machine learning applications, including machine translation, image-to-image transformation, etc. The high-level idea of dual learning is very intuitive: if we map an $x$ from one domain to another and then map it back, we should recover the original $x$. Although its effectiveness has been empirically verified, theoretical understanding of dual learning is still very limited. In this paper, we aim at understanding why and when dual learning works. Based on our theoretical analysis, we further extend dual learning by introducing more related mappings and propose multi-step dual learning, in which we leverage feedback signals from additional domains to improve the qualities of the mappings. We prove that multi-step dual learning can boost the performance of standard dual learning under mild conditions. Experiments on WMT 14 English$\leftrightarrow$German and MultiUN English$\leftrightarrow$French translations verify our theoretical findings on dual learning, and the results on the translations among English, French, and Spanish of MultiUN demonstrate the effectiveness of multi-step dual learning.
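The round-trip principle stated above fits in a few lines; f and g below are trivial numerical stand-ins for the two translation models, and the reconstruction error is the feedback signal that dual learning exploits. Multi-step dual learning simply chains mappings through additional domains before mapping back.

import numpy as np

def f(x):  # "primal" model: domain A -> domain B (stand-in)
    return 2.0 * x + 1.0

def g(y):  # "dual" model: domain B -> domain A (imperfect inverse)
    return (y - 0.9) / 2.0

x = np.linspace(-1.0, 1.0, 5)
x_roundtrip = g(f(x))
dual_loss = float(np.mean((x - x_roundtrip) ** 2))  # round-trip feedback
print(f"round-trip reconstruction loss: {dual_loss:.4f}")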

63. #Coronavirus or #Chinesevirus?!: Understanding the negative sentiment reflected in Tweets with racist hashtags across the development of COVID-19 [PDF] 返回目录
  Xin Pei, Deval Mehta
Abstract: Situated in the global outbreak of COVID-19, our study enriches the discussion concerning the emergent racism and xenophobia on social media. With big data extracted from Twitter, we focus on the analysis of negative sentiment reflected in tweets marked with racist hashtags, as racism and xenophobia are more likely to be delivered via negative sentiment. In particular, we propose a stage-based approach to capture how the negative sentiment changes along with the three development stages of COVID-19, during which it transformed from a domestic epidemic into an international public health emergency and, later, into a global pandemic. At each stage, sentiment analysis enables us to recognize the negative sentiment from tweets with racist hashtags, and keyword extraction allows for the discovery of themes in the expression of negative sentiment by these tweets. Under this public health crisis of human beings, this stage-based approach enables us to provide policy suggestions for the enactment of stage-specific intervention strategies to combat racism and xenophobia on social media in a more effective way.

64. That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages [PDF] 返回目录
  Piotr Żelasko, Laureano Moro-Velázquez, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak
Abstract: Only a handful of the world's languages are abundant with the resources that enable practical applications of speech processing technologies. One of the methods to overcome this problem is to use the resources existing in other languages to train a multilingual automatic speech recognition (ASR) model, which, intuitively, should learn some universal phonetic representations. In this work, we focus on gaining a deeper understanding of how general these representations might be, and how individual phones are getting improved in a multilingual setting. To that end, we select a phonetically diverse set of languages, and perform a series of monolingual, multilingual and crosslingual (zero-shot) experiments. The ASR is trained to recognize the International Phonetic Alphabet (IPA) token sequences. We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting, where the model, among other errors, considers Javanese as a tone language. Notably, as little as 10 hours of the target language training data tremendously reduces ASR error rates. Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages - an encouraging result for the low-resource speech community.

65. Exploration of Audio Quality Assessment and Anomaly Localisation Using Attention Models [PDF] 返回目录
  Qiang Huang, Thomas Hain
Abstract: Many applications of speech technology require more and more audio data. Automatic assessment of the quality of the collected recordings is important to ensure they meet the requirements of the related applications. However, effective and high-performing assessment remains a challenging task without a clean reference. In this paper, a novel model for audio quality assessment is proposed by jointly using bidirectional long short-term memory and an attention mechanism. The former mimics a human auditory perception ability to learn information from a recording, and the latter further discriminates interferences from desired signals by highlighting target-related features. To evaluate our proposed approach, the TIMIT dataset is used and augmented by mixing with various natural sounds. In our experiments, two tasks are explored. The first task is to predict an utterance quality score, and the second is to identify where an anomalous distortion takes place in a recording. The obtained results show that our proposed approach outperforms a strong baseline method and gains about 5% improvement as measured by three metrics: Linear Correlation Coefficient, Spearman Rank Correlation Coefficient, and F1.
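Attention pooling over frame-level features, the mechanism that lets such a model highlight quality-relevant frames, can be sketched as below; the features and weights are random placeholders for a trained BiLSTM, and the argmax of the attention weights hints at where an anomaly might be localised.

import numpy as np

rng = np.random.default_rng(5)
frames = rng.normal(size=(100, 32))  # BiLSTM outputs, one row per frame
w_att = rng.normal(size=32)          # learned attention vector (placeholder)

scores = frames @ w_att
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()                 # attention weights over time
pooled = alpha @ frames              # utterance-level representation
quality = float(pooled @ rng.normal(size=32))  # linear readout (placeholder)
print("most-attended frame:", alpha.argmax(), "| score:", round(quality, 3))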

66. Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory [PDF] 返回目录
  Chunyang Wu, Yongqiang Wang, Yangyang Shi, Ching-Feng Yeh, Frank Zhang
Abstract: Transformer-based acoustic modeling has achieved great success for both hybrid and sequence-to-sequence speech recognition. However, it requires access to the full sequence, and the computational cost grows quadratically with respect to the input sequence length. These factors limit its adoption for streaming applications. In this work, we proposed a novel augmented-memory self-attention, which attends on a short segment of the input sequence and a bank of memories. The memory bank stores the embedding information for all the processed segments. On the LibriSpeech benchmark, our proposed method outperforms all the existing streamable transformer methods by a large margin and achieved over 15% relative error reduction, compared with the widely used LC-BLSTM baseline. Our findings are also confirmed on some large internal datasets.
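The augmented-memory idea can be sketched as attention whose keys and values are the current short segment extended with a bank of summary vectors from already-processed segments; dimensions and the mean-pooled summary below are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(6)
segment = rng.normal(size=(8, 16))      # current short segment (8 frames)
memory_bank = rng.normal(size=(3, 16))  # one summary vector per past segment

def attend(q, kv):
    # Scaled dot-product attention of q over the extended key/value set.
    logits = q @ kv.T / np.sqrt(kv.shape[1])
    a = np.exp(logits - logits.max(axis=-1, keepdims=True))
    a /= a.sum(axis=-1, keepdims=True)
    return a @ kv

out = attend(segment, np.concatenate([memory_bank, segment], axis=0))
new_memory = segment.mean(axis=0, keepdims=True)  # summarize this segment
memory_bank = np.concatenate([memory_bank, new_memory], axis=0)
print(out.shape, memory_bank.shape)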

67. Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation [PDF] 返回目录
  Tao Tu, Yuan-Jui Chen, Alexander H. Liu, Hung-yi Lee
Abstract: Recently, end-to-end multi-speaker text-to-speech (TTS) systems have achieved success in situations where a large amount of high-quality speech plus the corresponding transcriptions are available. However, laborious paired data collection processes prevent many institutes from building multi-speaker TTS systems of great performance. In this work, we propose a semi-supervised learning approach for multi-speaker TTS. A multi-speaker TTS model can learn from untranscribed audio via the proposed encoder-decoder framework with discrete speech representation. The experiment results demonstrate that with only an hour of paired speech data, no matter whether the paired data is from multiple speakers or a single speaker, the proposed model can generate intelligible speech in different voices. We found the model can benefit from the proposed semi-supervised learning approach even when part of the unpaired speech data is noisy. In addition, our analysis reveals that different speaker characteristics of the paired data have an impact on the effectiveness of semi-supervised TTS.

68. AccentDB: A Database of Non-Native English Accents to Assist Neural Speech Recognition [PDF] 返回目录
  Afroz Ahamad, Ankit Anand, Pranesh Bhargava
Abstract: Modern Automatic Speech Recognition (ASR) technology has evolved to identify the speech spoken by native speakers of a language very well. However, identification of the speech spoken by non-native speakers continues to be a major challenge for it. In this work, we first spell out the key requirements for creating a well-curated database of speech samples in non-native accents for training and testing robust ASR systems. We then introduce AccentDB, one such database that contains samples of 4 Indian-English accents collected by us, and a compilation of samples from 4 native-English, and a metropolitan Indian-English accent. We also present an analysis on separability of the collected accent data. Further, we present several accent classification models and evaluate them thoroughly against human-labelled accent classes. We test the generalization of our classifier models in a variety of setups of seen and unseen data. Finally, we introduce the task of accent neutralization of non-native accents to native accents using autoencoder models with task-specific architectures. Thus, our work aims to aid ASR systems at every stage of development with a database for training, classification models for feature augmentation, and neutralization systems for acoustic transformations of non-native accents of English.

69. Reducing Spelling Inconsistencies in Code-Switching ASR using Contextualized CTC Loss [PDF] 返回目录
  Burin Naowarat, Thananchai Kongthaworn, Korrawe Karunratanakul, Sheng Hui Wu, Ekapol Chuangsuwanich
Abstract: Code-Switching (CS) remains a challenge for Automatic Speech Recognition (ASR), especially character-based models. With the combined choice of characters from multiple languages, the outcome from character-based models suffers from phoneme duplication, resulting in language-inconsistent spellings. We propose Contextualized Connectionist Temporal Classification (CCTC) loss to encourage spelling consistencies of a character-based non-autoregressive ASR which allows for faster inference. The CCTC loss conditions the main prediction on the predicted contexts to ensure language consistency in the spellings. In contrast to existing CTC-based approaches, CCTC loss does not require frame-level alignments, since the context ground truth is obtained from the model's estimated path. Compared to the same model trained with regular CTC loss, our method consistently improved the ASR performance on both CS and monolingual corpora.
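
As a hedged reading of the mechanism, the loss can be pictured as an ordinary CTC term plus auxiliary heads whose context targets come from the model's own greedy path; the head layout, the wrap-around at boundaries, and the 0.1 weight below are our assumptions, not the paper's recipe:

    import torch
    import torch.nn.functional as F

    def cctc_loss(main_logits, left_logits, right_logits,
                  targets, input_lens, target_lens, blank=0, ctx_weight=0.1):
        # main_logits, left_logits, right_logits: (time, batch, vocab)
        log_probs = main_logits.log_softmax(-1)
        ctc = F.ctc_loss(log_probs, targets, input_lens, target_lens, blank=blank)

        # context "ground truth" from the model's own greedy path -- no frame alignments needed
        path = log_probs.argmax(-1)                      # (time, batch)
        left_ref = torch.roll(path, shifts=1, dims=0)    # symbol one frame earlier (wraps at edges)
        right_ref = torch.roll(path, shifts=-1, dims=0)  # symbol one frame later
        vocab = main_logits.size(-1)
        ctx = F.cross_entropy(left_logits.reshape(-1, vocab), left_ref.reshape(-1)) \
            + F.cross_entropy(right_logits.reshape(-1, vocab), right_ref.reshape(-1))
        return ctc + ctx_weight * ctx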

70. Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition [PDF]
  Zhengkun Tian, Jiangyan Yi, Jianhua Tao, Ye Bai, Shuai Zhang, Zhengqi Wen
Abstract: Non-autoregressive transformer models have achieved extremely fast inference speed and performance comparable to autoregressive sequence-to-sequence models in neural machine translation. Most non-autoregressive transformers decode the target sequence from a mask sequence of predefined length. If the predefined length is too long, it causes many redundant calculations; if it is shorter than the target sequence, it hurts model performance. To address this problem and improve inference speed, we propose a spike-triggered non-autoregressive transformer model for end-to-end speech recognition, which introduces a CTC module to predict the length of the target sequence and accelerate convergence. All experiments are conducted on the public Mandarin Chinese dataset AISHELL-1. The results show that the proposed model can accurately predict the length of the target sequence and achieve performance competitive with advanced transformers. Moreover, the model achieves a real-time factor of 0.0056, surpassing all mainstream speech recognition models.
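
The length prediction can be illustrated in a few lines: count the greedy CTC "spikes", i.e. the first frames of non-blank runs. This toy version reflects our reading of the abstract, not the authors' code:

    import torch

    def spike_count(ctc_logits, blank=0):     # ctc_logits: (time, vocab)
        path = ctc_logits.argmax(-1)          # greedy per-frame labels
        is_new = torch.ones_like(path, dtype=torch.bool)
        is_new[1:] = path[1:] != path[:-1]    # keep only the first frame of each run
        spikes = (path != blank) & is_new
        return int(spikes.sum())              # estimated target-sequence length

    logits = torch.randn(50, 30)              # 50 frames over a 30-symbol vocabulary
    print(spike_count(logits))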

71. Oscillating Statistical Moments for Speech Polarity Detection [PDF]
  Thomas Drugman, Thierry Dutoit
Abstract: An inversion of the speech polarity may have a dramatically detrimental effect on the performance of various speech processing techniques. An automatic method for determining the speech polarity (which depends on the recording setup) is thus required as a preliminary step to ensure that such techniques behave correctly. This paper proposes a new approach to polarity detection relying on oscillating statistical moments. These moments oscillate at the local fundamental frequency and exhibit a phase shift that depends on the speech polarity. This dependency stems from the introduction of non-linearity or higher-order statistics in the moment calculation. On 10 speech corpora, the resulting method is shown to provide a substantial improvement over state-of-the-art techniques.
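
The oscillating-moment detector itself is not reproduced here; as a far simpler stand-in that makes the task concrete, one classic heuristic checks the sign of the skewness of the linear-prediction residual, since positive glottal pulses tend to skew it positively:

    import numpy as np

    def lp_residual(x, order=16):
        # toy least-squares linear prediction
        rows = np.stack([x[order - i - 1:len(x) - i - 1] for i in range(order)], axis=1)
        a, *_ = np.linalg.lstsq(rows, x[order:], rcond=None)
        return x[order:] - rows @ a

    def polarity(x):
        r = lp_residual(np.asarray(x, dtype=float))
        skew = np.mean((r - r.mean()) ** 3) / (r.std() ** 3 + 1e-12)
        return +1 if skew >= 0 else -1        # heuristic only; far cruder than the paper's method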

72. Glottal Source Estimation using an Automatic Chirp Decomposition [PDF]
  Thomas Drugman, Baris Bozkurt, Thierry Dutoit
Abstract: In previous work, we showed that the glottal source can be estimated from speech signals by computing the Zeros of the Z-Transform (ZZT). Decomposition was achieved by separating the roots inside (causal contribution) and outside (anticausal contribution) the unit circle. To guarantee a correct deconvolution, time alignment on the Glottal Closure Instants (GCIs) was shown to be essential. This paper extends the ZZT formalism by evaluating the Z-transform on a contour possibly different from the unit circle. A method is proposed for determining this contour automatically by inspecting the root distribution. The resulting Zeros of the Chirp Z-Transform (ZCZT)-based technique turns out to be much more robust to GCI location errors.
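
The decomposition is easy to prototype: treat a GCI-centred, windowed frame as the coefficients of a polynomial, compute its zeros, and split them at a radius R, where R = 1 recovers the original ZZT and an R chosen from the root distribution gives the chirp variant described here. A minimal sketch:

    import numpy as np

    def zzt_split(frame, radius=1.0):
        roots = np.roots(frame)                     # zeros of the frame's Z-transform
        anticausal = roots[np.abs(roots) > radius]  # outside the circle: glottal (open-phase) part
        causal = roots[np.abs(roots) <= radius]     # inside: vocal-tract contribution
        return causal, anticausal

    frame = np.hanning(64) * np.random.randn(64)    # stand-in for a real GCI-centred speech frame
    causal, anticausal = zzt_split(frame)
    print(len(causal), len(anticausal))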

73. Large scale weakly and semi-supervised learning for low-resource video ASR [PDF]
  Kritika Singh, Vimal Manohar, Alex Xiao, Sergey Edunov, Ross Girshick, Vitaliy Liptchinsky, Christian Fuegen, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed
Abstract: Many semi- and weakly-supervised approaches have been investigated for overcoming the labeling cost of building high-quality speech recognition systems. On the challenging task of transcribing social media videos in low-resource conditions, we conduct a large-scale systematic comparison between two self-labeling methods on the one hand, and weakly-supervised pretraining using contextual metadata on the other. We investigate distillation methods at the frame level and the sequence level for hybrid, encoder-only CTC-based, and encoder-decoder speech recognition systems for Dutch and Romanian, using 27,000 and 58,000 hours of unlabeled audio respectively. Although all approaches improved upon their respective baseline WERs by more than 8%, sequence-level distillation for encoder-decoder models provided the largest relative WER reduction of 20% compared to the strongest data-augmented supervised baseline.
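
In outline, sequence-level self-labeling is a short loop; teacher.decode and student.train_step below are placeholder hooks for illustration, not an API from the paper:

    def sequence_level_distillation(teacher, student, labeled, unlabeled, epochs=1):
        # 1-best teacher hypotheses become pseudo-transcripts for the unlabeled audio
        pseudo = [(audio, teacher.decode(audio)) for audio in unlabeled]
        corpus = list(labeled) + pseudo
        for _ in range(epochs):
            for audio, text in corpus:
                student.train_step(audio, text)   # ordinary supervised update
        return student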

74. Speaker Re-identification with Speaker Dependent Speech Enhancement [PDF]
  Yanpei Shi, Qiang Huang, Thomas Hain
Abstract: While the use of deep neural networks has significantly boosted speaker recognition performance, it is still challenging to separate speakers in poor acoustic environments. Speech enhancement methods have traditionally been used to improve performance in such conditions, and recent work has shown that adapting the speech enhancement can lead to further gains. This paper introduces a novel approach that cascades speech enhancement and speaker recognition. In the first step, a speaker embedding vector is generated, which is used in the second step to enhance the speech quality and re-identify the speakers. The models are trained in an integrated framework with joint optimisation. The proposed approach is evaluated on the Voxceleb1 dataset, which aims to assess speaker recognition in real-world situations. In addition, three types of noise at different signal-to-noise ratios were added for this work. The obtained results show that the proposed approach using speaker-dependent speech enhancement can yield better speaker recognition and speech enhancement performance than two baselines in various noise conditions.
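
A minimal sketch of the cascade as we understand it (single linear layers stand in for what are surely deeper networks): a first-pass embedding conditions the enhancer, and the enhanced features are embedded again for re-identification; since every step is differentiable, the two stages can be optimised jointly:

    import torch
    import torch.nn as nn

    class SpeakerConditionedEnhancer(nn.Module):
        def __init__(self, feat_dim=80, emb_dim=128):
            super().__init__()
            self.embedder = nn.Sequential(nn.Linear(feat_dim, emb_dim), nn.ReLU())
            self.enhancer = nn.Linear(feat_dim + emb_dim, feat_dim)

        def forward(self, feats):                       # feats: (batch, time, feat_dim)
            emb = self.embedder(feats).mean(dim=1)      # first-pass utterance embedding
            cond = emb.unsqueeze(1).expand(-1, feats.size(1), -1)
            enhanced = self.enhancer(torch.cat([feats, cond], dim=-1))
            emb2 = self.embedder(enhanced).mean(dim=1)  # embedding used for re-identification
            return enhanced, emb2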

75. Weakly Supervised Training of Hierarchical Attention Networks for Speaker Identification [PDF]
  Yanpei Shi, Qiang Huang, Thomas Hain
Abstract: Identifying multiple speakers without knowing where each speaker's voice occurs in a recording is a challenging task. In this paper, a hierarchical attention network is proposed to solve a weakly labelled speaker identification problem. The hierarchical structure, consisting of a frame-level encoder and a segment-level encoder, aims to learn speaker-related information both locally and globally. Speech streams are segmented into fragments. The frame-level encoder with attention learns features, highlights target-related frames locally, and outputs a fragment-based embedding. The segment-level encoder works with a second attention layer to emphasize the fragments most likely related to target speakers. The global information is finally collected from the segment-level module to predict speakers via a classifier. To evaluate the effectiveness of the proposed approach, artificial datasets based on Switchboard Cellular part1 (SWBC) and Voxceleb1 are constructed under two conditions, with and without overlapping speaker voices. Compared to two baselines, the obtained results show that the proposed approach achieves better performance. Moreover, further experiments evaluate the impact of utterance segmentation; the results show that a reasonable segmentation can slightly improve identification performance.
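
A compact sketch of the two-level architecture, with illustrative sizes (the paper's encoders and attention parameterisation may well differ):

    import torch
    import torch.nn as nn

    class AttnPool(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.score = nn.Linear(dim, 1)

        def forward(self, h):                          # h: (batch, steps, dim)
            a = torch.softmax(self.score(h), dim=1)    # attention weights over steps
            return (a * h).sum(dim=1)

    class HierarchicalAttention(nn.Module):
        def __init__(self, feat_dim=40, hidden=128, n_speakers=10):
            super().__init__()
            self.frame_enc = nn.GRU(feat_dim, hidden, batch_first=True)
            self.frame_pool = AttnPool(hidden)         # highlights target-related frames
            self.seg_enc = nn.GRU(hidden, hidden, batch_first=True)
            self.seg_pool = AttnPool(hidden)           # emphasises target-related fragments
            self.cls = nn.Linear(hidden, n_speakers)

        def forward(self, x):                          # x: (batch, segments, frames, feat_dim)
            b, s, f, d = x.shape
            h, _ = self.frame_enc(x.reshape(b * s, f, d))
            seg = self.frame_pool(h).reshape(b, s, -1)
            g, _ = self.seg_enc(seg)
            return self.cls(self.seg_pool(g))          # per-speaker logits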

76. Feature Fusion Strategies for End-to-End Evaluation of Cognitive Behavior Therapy Sessions [PDF]
  Zhuohao Chen, Nikolaos Flemotomos, Victor Ardulov, Torrey A. Creed, Zac E. Imel, David C. Atkins, Shrikanth Narayanan
Abstract: Cognitive Behavioral Therapy (CBT) is a goal-oriented psychotherapy for mental health concerns implemented in a conversational setting, with broad empirical support for its effectiveness across a range of presenting problems and client populations. The quality of a CBT session is typically assessed by trained human raters who manually assign pre-defined session-level behavioral codes. In this paper, we develop an end-to-end pipeline that converts speech audio to diarized and transcribed text and extracts linguistic features to code the CBT sessions automatically. We investigate both word-level and utterance-level features and propose feature fusion strategies to combine them. The utterance-level features include dialog act tags as well as behavioral codes drawn from another well-known talk psychotherapy, Motivational Interviewing (MI). We propose a novel method to augment the word-based features with the utterance-level tags for subsequent CBT code estimation. Experiments show that our new fusion strategy outperforms all the studied features, both when used individually and when fused by direct concatenation. We also find that incorporating a sentence segmentation module can further improve the overall system, given the preponderance of multi-utterance conversational turns in CBT sessions.
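
The direct-concatenation baseline the fusion strategies are compared against can be pictured in a few lines; the 300-dimensional embeddings and 8-tag inventory below are invented for illustration:

    import numpy as np

    def fuse_session(word_vecs, tag_ids, n_tags):
        word_feat = word_vecs.mean(axis=0)                 # mean-pooled word embeddings
        tag_hist = np.bincount(tag_ids, minlength=n_tags)  # dialog-act / MI-code histogram
        tag_hist = tag_hist / max(tag_hist.sum(), 1)       # normalised utterance-level feature
        return np.concatenate([word_feat, tag_hist])

    features = fuse_session(np.random.randn(120, 300), np.array([0, 2, 2, 5]), n_tags=8)
    print(features.shape)                                  # (308,)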

77. JDI-T: Jointly trained Duration Informed Transformer for Text-To-Speech without Explicit Alignment [PDF]
  Dan Lim, Won Jang, Gyeonghwan O, Hyeyeong Park, Bongwan Kim, Jesam Yoon
Abstract: We propose the Jointly trained Duration Informed Transformer (JDI-T), a feed-forward Transformer with a duration predictor, jointly trained without explicit alignments in order to generate an acoustic feature sequence from an input text. In this work, inspired by the recent success of duration-informed networks such as FastSpeech and DurIAN, we further simplify their sequential, two-stage training pipeline into single-stage training. Specifically, we extract the phoneme durations from the autoregressive Transformer on the fly during joint training, instead of pretraining the autoregressive model and using it as a phoneme duration extractor. To the best of our knowledge, this is the first implementation to jointly train the feed-forward Transformer without relying on a pre-trained phoneme duration extractor, in a single training pipeline. We evaluate the effectiveness of the proposed model on the publicly available Korean Single Speaker Speech (KSS) dataset, compared to baseline text-to-speech (TTS) models trained with ESPnet-TTS.
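
One plausible reading of "extracting phoneme durations on the fly" is to count, for each phoneme, the decoder frames whose attention argmax lands on it; this is a guess at the mechanism, sketched below:

    import numpy as np

    def durations_from_attention(attn):        # attn: (frames, phonemes), rows sum to 1
        assignment = attn.argmax(axis=1)       # phoneme each output frame attends to most
        return np.bincount(assignment, minlength=attn.shape[1])

    attn = np.random.dirichlet(np.ones(12), size=80)  # toy 80-frame x 12-phoneme attention
    print(durations_from_attention(attn))             # per-phoneme frame counts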
