
[arXiv papers] Computation and Language 2020-03-05


1. jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models
2. Data Augmentation using Pre-trained Transformer Models
3. Unsupervised Adversarial Domain Adaptation for Implicit Discourse Relation Classification
4. Evaluating Low-Resource Machine Translation between Chinese and Vietnamese with Back-Translation
5. Sequential Neural Networks for Noetic End-to-End Response Selection
6. Posterior-GAN: Towards Informative and Coherent Response Generation with Posterior Generative Adversarial Network
7. Restoration of Fragmentary Babylonian Texts Using Recurrent Neural Networks
8. SeMemNN: A Semantic Matrix-Based Memory Neural Network for Text Classification
9. HyperEmbed: Tradeoffs Between Resources and Performance in NLP Tasks with Hyperdimensional Computing enabled Embedding of n-gram Statistics
10. AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment
11. GraphTTS: graph-to-sequence modelling in neural text-to-speech
12. On Emergent Communication in Competitive Multi-Agent Teams
13. Discover Your Social Identity from What You Tweet: a Content Based Approach
14. Untangling in Invariant Speech Recognition
15. Phonetic Feedback for Speech Enhancement With and Without Parallel Speech Data
16. Towards Real-time Mispronunciation Detection in Kids' Speech


1. jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models
  Yada Pruksachatkun, Phil Yeres, Haokun Liu, Jason Phang, Phu Mon Htut, Alex Wang, Ian Tenney, Samuel R. Bowman
Abstract: We introduce jiant, an open source toolkit for conducting multitask and transfer learning experiments on English NLU tasks. jiant enables modular and configuration-driven experimentation with state-of-the-art models and implements a broad set of tasks for probing, transfer learning, and multitask training experiments. jiant implements over 50 NLU tasks, including all GLUE and SuperGLUE benchmark tasks. We demonstrate that jiant reproduces published performance on a variety of tasks and models, including BERT and RoBERTa. jiant is available at this https URL.

2. Data Augmentation using Pre-trained Transformer Models
  Varun Kumar, Ashutosh Choudhary, Eunah Cho
Abstract: Language model based pre-trained models such as BERT have provided significant gains across different NLP tasks. In this paper, we study different types of pre-trained transformer based models such as auto-regressive models (GPT-2), auto-encoder models (BERT), and seq2seq models (BART) for conditional data augmentation. We show that prepending the class labels to text sequences provides a simple yet effective way to condition the pre-trained models for data augmentation. On three classification benchmarks, the pre-trained seq2seq model outperforms the other models. Further, we explore how different pre-trained model based data augmentation methods differ in terms of data diversity, and how well such methods preserve the class-label information.
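
The label-prepending idea in this abstract can be illustrated with a minimal Python sketch. It only shows how training lines and generation prompts might be formatted; the `SEP` and `EOS` marker strings and all function names are hypothetical choices for illustration, not taken from the paper, and no model is trained or called here.

```python
from typing import Iterable, List, Tuple

# Assumed marker strings; a real setup would use whatever special
# tokens the chosen pre-trained model's tokenizer provides.
SEP = "SEP"
EOS = "EOS"


def prepend_label(label: str, text: str, sep: str = SEP) -> str:
    """Condition a text on its class by putting the label in front of it."""
    return f"{label} {sep} {text}"


def build_training_lines(
    examples: Iterable[Tuple[str, str]], sep: str = SEP, eos: str = EOS
) -> List[str]:
    """Turn (label, text) pairs into label-conditioned fine-tuning lines."""
    return [f"{prepend_label(label, text, sep)} {eos}" for label, text in examples]


def build_generation_prompt(label: str, sep: str = SEP) -> str:
    """Prompt a fine-tuned generator would continue to produce a new example."""
    return f"{label} {sep}"


examples = [("positive", "a warm and funny film"), ("negative", "dull and overlong")]
for line in build_training_lines(examples):
    print(line)
print(build_generation_prompt("positive"))
```

After fine-tuning on such lines, a generative model would be prompted with the label prefix alone, and its continuation would serve as a synthetic example for that class.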

3. Unsupervised Adversarial Domain Adaptation for Implicit Discourse Relation Classification
  Hsin-Ping Huang, Junyi Jessy Li
Abstract: Implicit discourse relations are not only more challenging to classify, but also to annotate, than their explicit counterparts. We tackle situations where training data for implicit relations are lacking, and exploit domain adaptation from explicit relations (Ji et al., 2015). We present an unsupervised adversarial domain adaptive network equipped with a reconstruction component. Our system outperforms prior works and other adversarial benchmarks for unsupervised domain adaptation. Additionally, we extend our system to take advantage of labeled data if some are available.

4. Evaluating Low-Resource Machine Translation between Chinese and Vietnamese with Back-Translation
  Hongzheng Li, Heyan Huang
Abstract: Back translation (BT) has been widely used and has become one of the standard techniques for data augmentation in Neural Machine Translation (NMT). BT has proven effective at improving translation performance, especially in low-resource scenarios. While most work on BT focuses on European languages, few studies examine languages from other regions of the world. In this paper, we investigate the impact of BT on Asian language translation for the extremely low-resource Chinese and Vietnamese language pair. We evaluate and compare the effects of different sizes of synthetic data on both NMT and Statistical Machine Translation (SMT) models for Chinese to Vietnamese and Vietnamese to Chinese, with character-based and word-based settings. Some conclusions from previous works are partially confirmed, and we also draw some other interesting findings and conclusions, which help to understand BT further.

5. Sequential Neural Networks for Noetic End-to-End Response Selection
  Qian Chen, Wen Wang
Abstract: The noetic end-to-end response selection challenge as one track in the 7th Dialog System Technology Challenges (DSTC7) aims to push the state of the art of utterance classification for real world goal-oriented dialog systems, for which participants need to select the correct next utterances from a set of candidates for the multi-turn context. This paper presents our systems that are ranked top 1 on both datasets under this challenge, one focused and small (Advising) and the other more diverse and large (Ubuntu). Previous state-of-the-art models use hierarchy-based (utterance-level and token-level) neural networks to explicitly model the interactions among different turns' utterances for context modeling. In this paper, we investigate a sequential matching model based only on chain sequence for multi-turn response selection. Our results demonstrate that the potentials of sequential matching approaches have not yet been fully exploited in the past for multi-turn response selection. In addition to ranking top 1 in the challenge, the proposed model outperforms all previous models, including state-of-the-art hierarchy-based models, on two large-scale public multi-turn response selection benchmark datasets.

6. Posterior-GAN: Towards Informative and Coherent Response Generation with Posterior Generative Adversarial Network
  Shaoxiong Feng, Hongshen Chen, Kan Li, Dawei Yin
Abstract: Neural conversational models learn to generate responses by taking into account the dialog history. These models are typically optimized over the query-response pairs with a maximum likelihood estimation objective. However, the query-response tuples are naturally loosely coupled, and there exist multiple responses that can respond to a given query, which makes learning burdensome for the conversational model. Besides, the general dull response problem is worsened further when the model is confronted with meaningless response training instances. Intuitively, a high-quality response not only responds to the given query but also links up to the future conversations. In this paper, we leverage the query-response-future turn triples to induce generated responses that consider both the given context and the future conversations. To facilitate the modeling of these triples, we further propose a novel encoder-decoder based generative adversarial learning framework, Posterior Generative Adversarial Network (Posterior-GAN), which consists of a forward and a backward generative discriminator that cooperatively encourage the generated response to be informative and coherent from two complementary assessment perspectives. Experimental results demonstrate that our method effectively boosts the informativeness and coherence of the generated response on both automatic and human evaluation, which verifies the advantages of considering two assessment perspectives.

7. Restoration of Fragmentary Babylonian Texts Using Recurrent Neural Networks
  Ethan Fetaya, Yonatan Lifshitz, Elad Aaron, Shai Gordin
Abstract: Clay cuneiform tablets are the main source of information regarding ancient Mesopotamian history and culture. Despite being an invaluable resource, many tablets are fragmented, leading to missing information. Currently these missing parts are manually completed by experts. In this work we investigate the possibility of assisting scholars and even automatically completing the breaks in ancient Akkadian texts from Achaemenid period Babylonia by modelling the language using recurrent neural networks.

8. SeMemNN: A Semantic Matrix-Based Memory Neural Network for Text Classification
  Changzeng Fu, Chaoran Liu, Carlos Toshinori Ishi, Yuichiro Yoshikawa, Hiroshi Ishiguro
Abstract: Text categorization is the task of assigning labels to documents written in a natural language, and it has numerous real-world applications including sentiment analysis as well as traditional topic assignment tasks. In this paper, we propose 5 different configurations for the semantic matrix-based memory neural network, trained in an end-to-end manner, and evaluate our proposed method on two corpora of news articles (AG news, Sogou news). At its best, our proposed method outperforms the baseline VDCNN models on the text classification task and learns semantics faster. Moreover, we also evaluate our model on small scale datasets. The results show that our proposed method can still achieve better results than VDCNN on the small scale datasets. This paper is to appear in the Proceedings of the 2020 IEEE 14th International Conference on Semantic Computing (ICSC 2020), San Diego, California, 2020.

9. HyperEmbed: Tradeoffs Between Resources and Performance in NLP Tasks with Hyperdimensional Computing enabled Embedding of n-gram Statistics
  Pedro Alonso, Kumar Shridhar, Denis Kleyko, Evgeny Osipov, Marcus Liwicki
Abstract: Recent advances in Deep Learning have led to a significant performance increase on several NLP tasks; however, the models have become more and more computationally demanding. Therefore, this paper tackles the domain of computationally efficient algorithms for NLP tasks. In particular, it investigates distributed representations of n-gram statistics of texts. The representations are formed using hyperdimensional computing enabled embedding. These representations then serve as features, which are used as input to standard classifiers. We investigate the applicability of the embedding on one large and three small standard datasets for classification tasks using nine classifiers. The embedding achieved F1 scores on par with conventional n-gram statistics while reducing the time and memory requirements severalfold. For example, for one of the classifiers on a small dataset, the memory reduction was 6.18 times, while train and test speed-ups were 4.62 and 3.84 times, respectively. For many classifiers on the large dataset, the memory reduction was about 100 times and train and test speed-ups were over 100 times. More importantly, the usage of distributed representations formed via hyperdimensional computing allows dissecting the strict dependency between the dimensionality of the representation and the parameters of n-gram statistics, thus opening room for tradeoffs.
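
To make the idea of a fixed-size distributed representation of n-gram statistics concrete, here is a minimal sketch of one common hyperdimensional encoding scheme, written with the Python standard library only. It is an illustration under stated assumptions and not necessarily the exact procedure of the paper: each character gets a random bipolar vector, position within an n-gram is marked by a cyclic shift, the shifted vectors are bound by elementwise multiplication, and all n-gram vectors are summed. All function names and the parameter values are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List


def make_item_memory(alphabet: str, dim: int, seed: int = 0) -> Dict[str, List[int]]:
    """Assign every symbol a random bipolar (+1/-1) vector of length dim."""
    rng = random.Random(seed)
    return {ch: [rng.choice((-1, 1)) for _ in range(dim)] for ch in alphabet}


def rotate(vec: List[int], shift: int) -> List[int]:
    """Cyclically shift a vector to the right; used to mark position in the n-gram."""
    shift %= len(vec)
    return vec[-shift:] + vec[:-shift] if shift else list(vec)


def encode_ngram(ngram: str, item_memory: Dict[str, List[int]]) -> List[int]:
    """Bind position-shifted symbol vectors by elementwise multiplication."""
    dim = len(next(iter(item_memory.values())))
    bound = [1] * dim
    for position, ch in enumerate(ngram):
        shifted = rotate(item_memory[ch], position)
        bound = [b * s for b, s in zip(bound, shifted)]
    return bound


def embed_text(text: str, n: int, item_memory: Dict[str, List[int]]) -> List[int]:
    """Bundle (sum) the vectors of all n-grams in the text, weighted by count."""
    dim = len(next(iter(item_memory.values())))
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    embedding = [0] * dim
    for ngram, count in counts.items():
        vector = encode_ngram(ngram, item_memory)
        embedding = [e + count * v for e, v in zip(embedding, vector)]
    return embedding


memory = make_item_memory("abcdefghijklmnopqrstuvwxyz ", dim=256, seed=42)
features = embed_text("hello world", n=3, item_memory=memory)
print(len(features))  # the length is set by dim, not by n or the alphabet size
```

In this sketch the feature vector length is chosen freely through `dim`, whereas an explicit n-gram count vector grows with the alphabet size raised to the power n. That decoupling is the kind of tradeoff the abstract refers to.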

10. AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment
  Zhen Zeng, Jianzong Wang, Ning Cheng, Tian Xia, Jing Xiao
Abstract: Targeting both high efficiency and performance, we propose AlignTTS to predict the mel-spectrum in parallel. AlignTTS is based on a Feed-Forward Transformer which generates the mel-spectrum from a sequence of characters, and the duration of each character is determined by a duration predictor. Instead of adopting the attention mechanism in Transformer TTS to align text to mel-spectrum, an alignment loss is presented that considers all possible alignments in training by use of dynamic programming. Experiments on the LJSpeech dataset show that our model achieves not only state-of-the-art performance, outperforming Transformer TTS by 0.03 in mean opinion score (MOS), but also high efficiency, running more than 50 times faster than real time.
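
Summing over all possible alignments with dynamic programming can be sketched as a forward recursion in log space. The code below is a generic illustration, not the paper's actual loss: it assumes a hypothetical matrix `log_probs[i][t]` giving the log-probability of frame `t` under character `i`, and it assumes monotonic alignments that start at the first character, end at the last, and at each frame either stay on the current character or advance by one.

```python
import math
from typing import List

NEG_INF = float("-inf")


def log_add(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def alignment_log_likelihood(log_probs: List[List[float]]) -> float:
    """Log of the total probability summed over all monotonic alignments.

    log_probs[i][t] is the (assumed) log-probability of frame t given character i.
    """
    n_chars, n_frames = len(log_probs), len(log_probs[0])
    # alpha[i][t]: log-probability of all partial alignments that end with
    # frame t assigned to character i.
    alpha = [[NEG_INF] * n_frames for _ in range(n_chars)]
    alpha[0][0] = log_probs[0][0]
    for t in range(1, n_frames):
        for i in range(n_chars):
            stay = alpha[i][t - 1]
            advance = alpha[i - 1][t - 1] if i > 0 else NEG_INF
            total = log_add(stay, advance)
            if total != NEG_INF:
                alpha[i][t] = total + log_probs[i][t]
    return alpha[-1][-1]


def alignment_loss(log_probs: List[List[float]]) -> float:
    """Negative log-likelihood over all alignments, usable as a training loss."""
    return -alignment_log_likelihood(log_probs)


# Two characters, three frames, every frame probability equal to 1:
# the result is the log of the number of valid alignments.
print(alignment_log_likelihood([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]))
```

The recursion visits each (character, frame) cell once, so the cost is proportional to the number of characters times the number of frames, rather than to the number of alignments, which grows combinatorially.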

11. GraphTTS: graph-to-sequence modelling in neural text-to-speech
  Aolan Sun, Jianzong Wang, Ning Cheng, Huayi Peng, Zhen Zeng, Jing Xiao
Abstract: This paper leverages the graph-to-sequence method in neural text-to-speech (GraphTTS), which maps the graph embedding of the input sequence to spectrograms. The graphical inputs consist of node and edge representations constructed from input texts. The encoding of these graphical inputs incorporates syntax information through a GNN encoder module. Besides, applying the encoder of GraphTTS as a graph auxiliary encoder (GAE) makes it possible to analyse prosody information from the semantic structure of texts. This removes the need to manually select reference audio and makes prosody modelling an end-to-end procedure. Experimental analysis shows that GraphTTS outperforms the state-of-the-art sequence-to-sequence models by 0.24 in Mean Opinion Score (MOS). GAE can adjust the pause, ventilation and tones of synthesised audio automatically. This experimental conclusion may give some inspiration to researchers working on improving speech synthesis prosody.

12. On Emergent Communication in Competitive Multi-Agent Teams
  Paul Pu Liang, Jeffrey Chen, Ruslan Salakhutdinov, Louis-Philippe Morency, Satwik Kottur
Abstract: Several recent works have found the emergence of grounded compositional language in the communication protocols developed by mostly cooperative multi-agent systems when learned end-to-end to maximize performance on a downstream task. However, human populations learn to solve complex tasks involving communicative behaviors not only in fully cooperative settings but also in scenarios where competition acts as an additional external pressure for improvement. In this work, we investigate whether competition for performance from an external, similar agent team could act as a social influence that encourages multi-agent populations to develop better communication protocols for improved performance, compositionality, and convergence speed. We start from Task & Talk, a previously proposed referential game between two cooperative agents as our testbed and extend it into Task, Talk & Compete, a game involving two competitive teams each consisting of two aforementioned cooperative agents. Using this new setting, we provide an empirical study demonstrating the impact of competitive influence on multi-agent teams. Our results show that an external competitive influence leads to improved accuracy and generalization, as well as faster emergence of communicative languages that are more informative and compositional.

13. Discover Your Social Identity from What You Tweet: a Content Based Approach
  Binxuan Huang, Kathleen M. Carley
Abstract: An identity denotes the role an individual or a group plays in highly differentiated contemporary societies. In this paper, our goal is to classify Twitter users based on their role identities. We first collect a coarse-grained public figure dataset automatically, then manually label a more fine-grained identity dataset. We propose a hierarchical self-attention neural network for Twitter user role identity classification. Our experiments demonstrate that the proposed model significantly outperforms multiple baselines. We further propose a transfer learning scheme that improves our model's performance by a large margin. Such transfer learning also greatly reduces the need for a large amount of human labeled data.

14. Untangling in Invariant Speech Recognition
  Cory Stephenson, Jenelle Feather, Suchismita Padhy, Oguz Elibol, Hanlin Tang, Josh McDermott, SueYeon Chung
Abstract: Encouraged by the success of deep neural networks on a variety of visual tasks, much theoretical and experimental work has been aimed at understanding and interpreting how vision networks operate. Meanwhile, deep neural networks have also achieved impressive performance in audio processing applications, both as sub-components of larger systems and as complete end-to-end systems by themselves. Despite their empirical successes, comparatively little is understood about how these audio models accomplish these tasks. In this work, we employ a recently developed statistical mechanical theory that connects geometric properties of network representations and the separability of classes to probe how information is untangled within neural networks trained to recognize speech. We observe that speaker-specific nuisance variations are discarded by the network's hierarchy, whereas task-relevant properties such as words and phonemes are untangled in later layers. Higher level concepts such as parts-of-speech and context dependence also emerge in the later layers of the network. Finally, we find that the deep representations carry out significant temporal untangling by efficiently extracting task-relevant features at each time step of the computation. Taken together, these findings shed light on how deep auditory models process time dependent input signals to achieve invariant speech recognition, and show how different concepts emerge through the layers of the network.

15. Phonetic Feedback for Speech Enhancement With and Without Parallel Speech Data
  Peter Plantinga, Deblin Bagchi, Eric Fosler-Lussier
Abstract: While deep learning systems have gained significant ground in speech enhancement research, these systems have yet to make use of the full potential of deep learning systems to provide high-level feedback. In particular, phonetic feedback is rare in speech enhancement research even though it includes valuable top-down information. We use the technique of mimic loss to provide phonetic feedback to an off-the-shelf enhancement system, and find gains in objective intelligibility scores on CHiME-4 data. This technique takes a frozen acoustic model trained on clean speech to provide valuable feedback to the enhancement model, even in the case where no parallel speech data is available. Our work is one of the first to show intelligibility improvement for neural enhancement systems without parallel speech data, and we show phonetic feedback can improve a state-of-the-art neural enhancement system trained with parallel speech data.

16. Towards Real-time Mispronunciation Detection in Kids' Speech
  Peter Plantinga, Eric Fosler-Lussier
Abstract: Modern mispronunciation detection and diagnosis systems have seen significant gains in accuracy due to the introduction of deep learning. However, these systems have not been evaluated for the ability to be run in real-time, an important factor in applications that provide rapid feedback. In particular, the state-of-the-art uses bi-directional recurrent networks, where a uni-directional network may be more appropriate. Teacher-student learning is a natural approach to use to improve a uni-directional model, but when using a CTC objective, this is limited by poor alignment of outputs to evidence. We address this limitation by trying two loss terms for improving the alignments of our models. One loss is an "alignment loss" term that encourages outputs only when features do not resemble silence. The other loss term uses a uni-directional model as teacher model to align the bi-directional model. Our proposed model uses these aligned bi-directional models as teacher models. Experiments on the CSLU kids' corpus show that these changes decrease the latency of the outputs, and improve the detection rates, with a trade-off between these goals.
