
[arXiv Papers] Computation and Language 2020-03-05

Table of Contents

1. jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models [PDF] Abstract
2. Data Augmentation using Pre-trained Transformer Models [PDF] Abstract
3. Unsupervised Adversarial Domain Adaptation for Implicit Discourse Relation Classification [PDF] Abstract
4. Evaluating Low-Resource Machine Translation between Chinese and Vietnamese with Back-Translation [PDF] Abstract
5. Sequential Neural Networks for Noetic End-to-End Response Selection [PDF] Abstract
6. Posterior-GAN: Towards Informative and Coherent Response Generation with Posterior Generative Adversarial Network [PDF] Abstract
7. Restoration of Fragmentary Babylonian Texts Using Recurrent Neural Networks [PDF] Abstract
8. SeMemNN: A Semantic Matrix-Based Memory Neural Network for Text Classification [PDF] Abstract
9. HyperEmbed: Tradeoffs Between Resources and Performance in NLP Tasks with Hyperdimensional Computing enabled Embedding of n-gram Statistics [PDF] Abstract
10. AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment [PDF] Abstract
11. GraphTTS: graph-to-sequence modelling in neural text-to-speech [PDF] Abstract
12. On Emergent Communication in Competitive Multi-Agent Teams [PDF] Abstract
13. Discover Your Social Identity from What You Tweet: a Content Based Approach [PDF] Abstract
14. Untangling in Invariant Speech Recognition [PDF] Abstract
15. Phonetic Feedback for Speech Enhancement With and Without Parallel Speech Data [PDF] Abstract
16. Towards Real-time Mispronunciation Detection in Kids' Speech [PDF] Abstract

Abstracts

1. jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models [PDF] Back to Contents
  Yada Pruksachatkun, Phil Yeres, Haokun Liu, Jason Phang, Phu Mon Htut, Alex Wang, Ian Tenney, Samuel R. Bowman
Abstract: We introduce jiant, an open source toolkit for conducting multitask and transfer learning experiments on English NLU tasks. jiant enables modular and configuration-driven experimentation with state-of-the-art models and implements a broad set of tasks for probing, transfer learning, and multitask training experiments. jiant implements over 50 NLU tasks, including all GLUE and SuperGLUE benchmark tasks. We demonstrate that jiant reproduces published performance on a variety of tasks and models, including BERT and RoBERTa. jiant is available at this https URL.

2. Data Augmentation using Pre-trained Transformer Models [PDF] Back to Contents
  Varun Kumar, Ashutosh Choudhary, Eunah Cho
Abstract: Pre-trained language models such as BERT have provided significant gains across different NLP tasks. In this paper, we study different types of pre-trained transformer based models such as auto-regressive models (GPT-2), auto-encoder models (BERT), and seq2seq models (BART) for conditional data augmentation. We show that prepending the class labels to text sequences provides a simple yet effective way to condition the pre-trained models for data augmentation. On three classification benchmarks, the pre-trained Seq2Seq model outperforms the other models. Further, we explore how data augmentation based on different pre-trained models differs in terms of data diversity, and how well such methods preserve the class-label information.
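As a rough illustration of the label-prepending idea, the sketch below conditions GPT-2 on a class label by concatenating the label in front of each training text, using the Hugging Face transformers API; the separator, prompt format, and omitted fine-tuning loop are illustrative assumptions, not the paper's exact recipe.

```python
# Label-conditioned data augmentation, minimal sketch.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

SEP = " : "  # hypothetical label/text separator

def to_conditioned_example(label: str, text: str) -> str:
    # Prepend the class label so the LM learns p(text | label).
    return label + SEP + text

# After fine-tuning the LM on strings like "positive : great movie ..."
# (training loop omitted), sample new examples for a chosen class:
prompt = "positive" + SEP
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, do_sample=True, top_k=50, max_length=40,
                         pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```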

3. Unsupervised Adversarial Domain Adaptation for Implicit Discourse Relation Classification [PDF] Back to Contents
  Hsin-Ping Huang, Junyi Jessy Li
Abstract: Implicit discourse relations are not only more challenging to classify, but also to annotate, than their explicit counterparts. We tackle situations where training data for implicit relations are lacking, and exploit domain adaptation from explicit relations (Ji et al., 2015). We present an unsupervised adversarial domain adaptive network equipped with a reconstruction component. Our system outperforms prior works and other adversarial benchmarks for unsupervised domain adaptation. Additionally, we extend our system to take advantage of labeled data if some are available.
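For readers unfamiliar with adversarial domain adaptation, a common realization uses a gradient-reversal layer so the encoder learns domain-invariant features; the PyTorch sketch below adds the reconstruction term the abstract mentions. All module names and shapes are placeholders, not the paper's architecture.

```python
import torch
from torch import nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.clone()
    @staticmethod
    def backward(ctx, grad):
        return -grad  # flipped gradient: the encoder learns to fool the critic

def training_step(encoder, classifier, domain_critic, decoder,
                  src_x, src_y, tgt_x):
    h_src, h_tgt = encoder(src_x), encoder(tgt_x)
    # Supervised relation classification on the labeled (explicit) domain.
    task_loss = F.cross_entropy(classifier(h_src), src_y)
    # The domain critic tries to tell domains apart; the reversed gradients
    # push the encoder toward domain-invariant features.
    d_logits = domain_critic(GradReverse.apply(torch.cat([h_src, h_tgt])))
    d_labels = torch.cat([torch.zeros(len(h_src)), torch.ones(len(h_tgt))]).long()
    domain_loss = F.cross_entropy(d_logits, d_labels)
    # Reconstruction keeps useful content in the shared representation.
    recon_loss = F.mse_loss(decoder(h_tgt), tgt_x)
    return task_loss + domain_loss + recon_loss
```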

4. Evaluating Low-Resource Machine Translation between Chinese and Vietnamese with Back-Translation [PDF] Back to Contents
  Hongzheng Li, Heyan Huang
Abstract: Back-translation (BT) has been widely used and has become one of the standard techniques for data augmentation in Neural Machine Translation (NMT); BT has proven helpful for effectively improving translation performance, especially in low-resource scenarios. While most works related to BT focus on European languages, few study languages from other regions of the world. In this paper, we investigate the impact of BT on Asian language translation between the extremely low-resource Chinese and Vietnamese language pair. We evaluate and compare the effects of different sizes of synthetic data on both NMT and Statistical Machine Translation (SMT) models for Chinese-to-Vietnamese and Vietnamese-to-Chinese, with character-based and word-based settings. Some conclusions from previous works are partially confirmed, and we also draw other interesting findings and conclusions that are beneficial for understanding BT further.
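Schematically, the BT recipe reduces to three steps, sketched below; train_nmt stands in for any NMT or SMT training routine (an assumption, since the abstract names no toolkit).

```python
from typing import Callable, List, Tuple

Translator = Callable[[str], str]

def back_translate(parallel: List[Tuple[str, str]],   # genuine (zh, vi) pairs
                   mono_vi: List[str],                # monolingual Vietnamese
                   train_nmt: Callable[[List[Tuple[str, str]]], Translator]
                   ) -> Translator:
    # 1. Train the reverse direction (vi -> zh) on the real parallel data.
    reverse = train_nmt([(vi, zh) for zh, vi in parallel])
    # 2. Back-translate monolingual Vietnamese into synthetic Chinese.
    synthetic = [(reverse(vi), vi) for vi in mono_vi]
    # 3. Train the forward (zh -> vi) model on real + synthetic pairs.
    return train_nmt(parallel + synthetic)
```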

5. Sequential Neural Networks for Noetic End-to-End Response Selection [PDF] Back to Contents
  Qian Chen, Wen Wang
Abstract: The noetic end-to-end response selection challenge, one track in the 7th Dialog System Technology Challenges (DSTC7), aims to push the state of the art of utterance classification for real-world goal-oriented dialog systems; participants need to select the correct next utterance from a set of candidates given the multi-turn context. This paper presents our systems, which ranked top 1 on both datasets under this challenge: one focused and small (Advising) and the other more diverse and large (Ubuntu). Previous state-of-the-art models use hierarchy-based (utterance-level and token-level) neural networks to explicitly model the interactions among different turns' utterances for context modeling. In this paper, we investigate a sequential matching model based only on a chain sequence for multi-turn response selection. Our results demonstrate that the potential of sequential matching approaches has not yet been fully exploited for multi-turn response selection. In addition to ranking top 1 in the challenge, the proposed model outperforms all previous models, including state-of-the-art hierarchy-based models, on two large-scale public multi-turn response selection benchmark datasets.

6. Posterior-GAN: Towards Informative and Coherent Response Generation with Posterior Generative Adversarial Network [PDF] Back to Contents
  Shaoxiong Feng, Hongshen Chen, Kan Li, Dawei Yin
Abstract: Neural conversational models learn to generate responses by taking into account the dialog history. These models are typically optimized over query-response pairs with a maximum likelihood estimation objective. However, query-response tuples are naturally loosely coupled, and multiple responses can respond to a given query, which makes learning the conversational model burdensome. Besides, the general dull-response problem is worsened further when the model is confronted with meaningless response training instances. Intuitively, a high-quality response not only responds to the given query but also links up to future conversations; in this paper, we leverage query-response-future-turn triples to induce generated responses that consider both the given context and the future conversation. To facilitate the modeling of these triples, we further propose a novel encoder-decoder based generative adversarial learning framework, Posterior Generative Adversarial Network (Posterior-GAN), which consists of a forward and a backward generative discriminator that cooperatively encourage the generated response to be informative and coherent from two complementary assessment perspectives. Experimental results demonstrate that our method effectively boosts the informativeness and coherence of the generated response in both automatic and human evaluation, which verifies the advantage of considering two assessment perspectives.

7. Restoration of Fragmentary Babylonian Texts Using Recurrent Neural Networks [PDF] Back to Contents
  Ethan Fetaya, Yonatan Lifshitz, Elad Aaron, Shai Gordin
Abstract: The main source of information regarding ancient Mesopotamian history and culture is clay cuneiform tablets. Despite being an invaluable resource, many tablets are fragmented, leading to missing information. Currently these missing parts are completed manually by experts. In this work we investigate the possibility of assisting scholars, and even automatically completing the breaks in ancient Akkadian texts from Achaemenid-period Babylonia, by modelling the language using recurrent neural networks.
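A toy version of the idea, scoring a candidate restoration of a broken span under a recurrent language model, might look like the following; the architecture, vocabulary, and scoring scheme are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class CharLM(nn.Module):
    def __init__(self, vocab_size: int, dim: int = 128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)

def completion_score(model: CharLM, context: torch.Tensor,
                     candidate: torch.Tensor) -> float:
    # Log-probability of `candidate` tokens following the intact `context`;
    # candidate restorations of a break can then be ranked by this score.
    seq = torch.cat([context, candidate]).unsqueeze(0)
    logp = torch.log_softmax(model(seq[:, :-1]), dim=-1)
    token_lp = logp.gather(-1, seq[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_lp[0, len(context) - 1:].sum().item()
```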

8. SeMemNN: A Semantic Matrix-Based Memory Neural Network for Text Classification [PDF] Back to Contents
  Changzeng Fu, Chaoran Liu, Carlos Toshinori Ishi, Yuichiro Yoshikawa, Hiroshi Ishiguro
Abstract: Text categorization is the task of assigning labels to documents written in a natural language, and it has numerous real-world applications including sentiment analysis as well as traditional topic assignment tasks. In this paper, we propose five different configurations of a semantic matrix-based memory neural network trained in an end-to-end manner, and evaluate our proposed method on two corpora of news articles (AG News, Sogou News). The best configuration of our proposed method outperforms the baseline VDCNN models on the text classification task and learns semantics at a faster speed. Moreover, we also evaluate our model on small-scale datasets. The results show that our proposed method can still achieve better results than VDCNN on the small-scale dataset. This paper is to appear in the Proceedings of the 2020 IEEE 14th International Conference on Semantic Computing (ICSC 2020), San Diego, California, 2020.

9. HyperEmbed: Tradeoffs Between Resources and Performance in NLP Tasks with Hyperdimensional Computing enabled Embedding of n-gram Statistics [PDF] Back to Contents
  Pedro Alonso, Kumar Shridhar, Denis Kleyko, Evgeny Osipov, Marcus Liwicki
Abstract: Recent advances in Deep Learning have led to significant performance increases on several NLP tasks; however, the models have become more and more computationally demanding. Therefore, this paper tackles the domain of computationally efficient algorithms for NLP tasks. In particular, it investigates distributed representations of n-gram statistics of texts. The representations are formed using hyperdimensional computing enabled embedding. These representations then serve as features, which are used as input to standard classifiers. We investigate the applicability of the embedding on one large and three small standard datasets for classification tasks using nine classifiers. The embedding achieved on-par F1 scores while decreasing the time and memory requirements severalfold compared to conventional n-gram statistics; e.g., for one of the classifiers on a small dataset, the memory reduction was 6.18 times, while train and test speed-ups were 4.62 and 3.84 times, respectively. For many classifiers on the large dataset, the memory reduction was about 100 times, and train and test speed-ups were over 100 times. More importantly, the usage of distributed representations formed via hyperdimensional computing allows dissecting the strict dependency between the dimensionality of the representation and the parameters of the n-gram statistics, thus opening room for tradeoffs.
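To make the mechanism concrete, the NumPy sketch below embeds character n-gram statistics into a fixed-size hypervector: each symbol gets a random bipolar vector, an n-gram binds its symbols by position-dependent rotation and element-wise multiplication, and a text bundles its n-grams by summation. The dimensionality and n are illustrative choices; the paper's exact binding scheme may differ.

```python
import numpy as np

D, N = 1000, 3                       # hypervector dimensionality, n-gram size
rng = np.random.default_rng(0)
item_memory = {}                     # symbol -> random bipolar hypervector

def hv(symbol: str) -> np.ndarray:
    if symbol not in item_memory:
        item_memory[symbol] = rng.choice([-1, 1], size=D)
    return item_memory[symbol]

def embed(text: str) -> np.ndarray:
    acc = np.zeros(D)
    for i in range(len(text) - N + 1):
        gram = np.ones(D)
        for j, ch in enumerate(text[i:i + N]):
            # Encode position j by rotating the symbol's hypervector j steps.
            gram = gram * np.roll(hv(ch), j)
        acc += gram                  # bundle all n-grams by superposition
    return acc                       # fixed-size feature vector for a classifier
```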

10. AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment [PDF] Back to Contents
  Zhen Zeng, Jianzong Wang, Ning Cheng, Tian Xia, Jing Xiao
Abstract: Targeting both high efficiency and performance, we propose AlignTTS to predict the mel-spectrum in parallel. AlignTTS is based on a Feed-Forward Transformer which generates the mel-spectrum from a sequence of characters, and the duration of each character is determined by a duration predictor. Instead of adopting the attention mechanism in Transformer TTS to align text to mel-spectrum, an alignment loss is presented that considers all possible alignments in training by use of dynamic programming. Experiments on the LJSpeech dataset show that our model achieves not only state-of-the-art performance, outperforming Transformer TTS by 0.03 in mean opinion score (MOS), but also high efficiency, being more than 50 times faster than real-time.
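The dynamic-programming idea behind such an alignment loss can be sketched as a forward-sum over all monotonic character-to-frame alignments, similar in spirit to the CTC forward algorithm; the PyTorch simplification below omits AlignTTS's actual mix-density network details.

```python
import torch

def forward_sum_nll(log_p: torch.Tensor) -> torch.Tensor:
    # log_p[t, j]: log-probability that mel frame t is emitted by character j.
    T, J = log_p.shape
    neg_inf = log_p.new_full((1,), float("-inf"))
    # alpha[j]: log-sum over all monotonic alignment prefixes ending at char j.
    alpha = torch.cat([log_p[0, :1], neg_inf.expand(J - 1)])
    for t in range(1, T):
        shifted = torch.cat([neg_inf, alpha[:-1]])   # advance one character
        alpha = torch.logaddexp(alpha, shifted) + log_p[t]
    # Negative log-likelihood summed over every monotonic alignment that
    # starts at the first character and ends at the last one.
    return -alpha[-1]
```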

11. GraphTTS: graph-to-sequence modelling in neural text-to-speech [PDF] Back to Contents
  Aolan Sun, Jianzong Wang, Ning Cheng, Huayi Peng, Zhen Zeng, Jing Xiao
Abstract: This paper leverages the graph-to-sequence method in neural text-to-speech (GraphTTS), which maps the graph embedding of the input sequence to spectrograms. The graphical inputs consist of node and edge representations constructed from input texts. The encoding of these graphical inputs incorporates syntax information via a GNN encoder module. Besides, applying the encoder of GraphTTS as a graph auxiliary encoder (GAE) can analyse prosody information from the semantic structure of texts. This removes the manual selection of reference audio and makes prosody modelling an end-to-end procedure. Experimental analysis shows that GraphTTS outperforms the state-of-the-art sequence-to-sequence models by 0.24 in Mean Opinion Score (MOS). GAE can automatically adjust the pauses, ventilation, and tones of synthesised audio. This experimental conclusion may give some inspiration to researchers working on improving speech synthesis prosody.

12. On Emergent Communication in Competitive Multi-Agent Teams [PDF] Back to Contents
  Paul Pu Liang, Jeffrey Chen, Ruslan Salakhutdinov, Louis-Philippe Morency, Satwik Kottur
Abstract: Several recent works have found the emergence of grounded compositional language in the communication protocols developed by mostly cooperative multi-agent systems when learned end-to-end to maximize performance on a downstream task. However, human populations learn to solve complex tasks involving communicative behaviors not only in fully cooperative settings but also in scenarios where competition acts as an additional external pressure for improvement. In this work, we investigate whether competition for performance from an external, similar agent team could act as a social influence that encourages multi-agent populations to develop better communication protocols for improved performance, compositionality, and convergence speed. We start from Task & Talk, a previously proposed referential game between two cooperative agents, as our testbed and extend it into Task, Talk & Compete, a game involving two competitive teams each consisting of two aforementioned cooperative agents. Using this new setting, we provide an empirical study demonstrating the impact of competitive influence on multi-agent teams. Our results show that an external competitive influence leads to improved accuracy and generalization, as well as faster emergence of communicative languages that are more informative and compositional.

13. Discover Your Social Identity from What You Tweet: a Content Based Approach [PDF] Back to Contents
  Binxuan Huang, Kathleen M. Carley
Abstract: An identity denotes the role an individual or a group plays in highly differentiated contemporary societies. In this paper, our goal is to classify Twitter users based on their role identities. We first collect a coarse-grained public figure dataset automatically, then manually label a more fine-grained identity dataset. We propose a hierarchical self-attention neural network for Twitter user role identity classification. Our experiments demonstrate that the proposed model significantly outperforms multiple baselines. We further propose a transfer learning scheme that improves our model's performance by a large margin. Such transfer learning also greatly reduces the need for a large amount of human labeled data.
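A stripped-down hierarchical classifier in this spirit pools words into tweet vectors and tweets into a user vector, with attention at each level; this is a simplification of the paper's self-attention layers, with illustrative sizes.

```python
import torch
from torch import nn

class AttnPool(nn.Module):
    # Weighted average with learned attention scores.
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                        # x: (batch, seq, dim)
        w = torch.softmax(self.score(x), dim=1)
        return (w * x).sum(dim=1)                # (batch, dim)

class HierarchicalClassifier(nn.Module):
    def __init__(self, vocab: int, dim: int, n_classes: int):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.word_pool = AttnPool(dim)           # words  -> tweet vector
        self.tweet_pool = AttnPool(dim)          # tweets -> user vector
        self.head = nn.Linear(dim, n_classes)

    def forward(self, tokens):                   # tokens: (users, tweets, words)
        u, t, w = tokens.shape
        words = self.emb(tokens.view(u * t, w))
        tweets = self.word_pool(words).view(u, t, -1)
        return self.head(self.tweet_pool(tweets))
```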

14. Untangling in Invariant Speech Recognition [PDF] Back to Contents
  Cory Stephenson, Jenelle Feather, Suchismita Padhy, Oguz Elibol, Hanlin Tang, Josh McDermott, SueYeon Chung
Abstract: Encouraged by the success of deep neural networks on a variety of visual tasks, much theoretical and experimental work has aimed at understanding and interpreting how vision networks operate. Meanwhile, deep neural networks have also achieved impressive performance in audio processing applications, both as sub-components of larger systems and as complete end-to-end systems by themselves. Despite their empirical successes, comparatively little is understood about how these audio models accomplish these tasks. In this work, we employ a recently developed statistical mechanical theory that connects geometric properties of network representations and the separability of classes to probe how information is untangled within neural networks trained to recognize speech. We observe that speaker-specific nuisance variations are discarded by the network's hierarchy, whereas task-relevant properties such as words and phonemes are untangled in later layers. Higher-level concepts such as parts of speech and context dependence also emerge in the later layers of the network. Finally, we find that the deep representations carry out significant temporal untangling by efficiently extracting task-relevant features at each time step of the computation. Taken together, these findings shed light on how deep auditory models process time-dependent input signals to achieve invariant speech recognition, and show how different concepts emerge through the layers of the network.

15. Phonetic Feedback for Speech Enhancement With and Without Parallel Speech Data [PDF] Back to Contents
  Peter Plantinga, Deblin Bagchi, Eric Fosler-Lussier
Abstract: While deep learning systems have gained significant ground in speech enhancement research, these systems have yet to make use of their full potential to provide high-level feedback. In particular, phonetic feedback is rare in speech enhancement research even though it includes valuable top-down information. We use the technique of mimic loss to provide phonetic feedback to an off-the-shelf enhancement system, and find gains in objective intelligibility scores on CHiME-4 data. This technique takes a frozen acoustic model trained on clean speech to provide valuable feedback to the enhancement model, even in the case where no parallel speech data is available. Our work is one of the first to show intelligibility improvement for neural enhancement systems without parallel speech data, and we show that phonetic feedback can improve a state-of-the-art neural enhancement system trained with parallel speech data.
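In its parallel-data form, mimic loss can be sketched in a few PyTorch lines: a frozen acoustic model scores both the enhanced and the clean signal, and the enhancer is penalized when the phonetic posteriors drift apart. Module names are placeholders; the distance function and feature level are assumptions.

```python
import torch
import torch.nn.functional as F

def mimic_loss(enhancer, acoustic_model, noisy, clean):
    # Freeze the acoustic model; it was trained on clean speech.
    for p in acoustic_model.parameters():
        p.requires_grad_(False)
    enhanced = enhancer(noisy)
    with torch.no_grad():
        target = acoustic_model(clean)     # phonetic posteriors on clean speech
    pred = acoustic_model(enhanced)        # gradients flow back to the enhancer
    return F.mse_loss(pred, target)
```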

16. Towards Real-time Mispronunciation Detection in Kids' Speech [PDF] Back to Contents
  Peter Plantinga, Eric Fosler-Lussier
Abstract: Modern mispronunciation detection and diagnosis systems have seen significant gains in accuracy due to the introduction of deep learning. However, these systems have not been evaluated for the ability to run in real time, an important factor in applications that provide rapid feedback. In particular, the state of the art uses bi-directional recurrent networks, where a uni-directional network may be more appropriate. Teacher-student learning is a natural approach to improving a uni-directional model, but when using a CTC objective, this is limited by poor alignment of outputs to evidence. We address this limitation by trying two loss terms for improving the alignments of our models. One loss is an "alignment loss" term that encourages outputs only when features do not resemble silence. The other loss term uses a uni-directional model as a teacher model to align the bi-directional model. Our proposed model uses these aligned bi-directional models as teacher models. Experiments on the CSLU kids' corpus show that these changes decrease the latency of the outputs and improve the detection rates, with a trade-off between these goals.
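The two auxiliary losses might be sketched as follows; the tensor shapes, silence mask, and blank index are illustrative assumptions rather than the paper's exact formulation.

```python
import torch.nn.functional as F

def alignment_loss(log_probs, silence_mask, blank_id=0):
    # log_probs: (batch, time, vocab); silence_mask: (batch, time), 1.0 where
    # the input features resemble silence. Encourage blank on those frames,
    # discouraging outputs that fire without acoustic evidence.
    blank_lp = log_probs[..., blank_id]
    return -(blank_lp * silence_mask).sum() / silence_mask.sum().clamp(min=1)

def teacher_student_loss(student_log_probs, teacher_probs):
    # Frame-level distillation: pull the uni-directional student's posteriors
    # toward the (aligned) teacher's distribution.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```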
