
[arXiv Papers] Computation and Language 2020-03-04

Contents

1. Hybrid Generative-Retrieval Transformers for Dialogue Domain Adaptation [PDF] Abstract
2. Improving Uyghur ASR systems with decoders using morpheme-based language models [PDF] Abstract
3. Multi-Task Learning Network for Emotion Recognition in Conversation [PDF] Abstract
4. XGPT: Cross-modal Generative Pre-Training for Image Captioning [PDF] Abstract
5. Seshat: A tool for managing and verifying annotation campaigns of audio data [PDF] Abstract
6. Meta-Embeddings Based On Self-Attention [PDF] Abstract
7. CLUECorpus2020: A Large-scale Chinese Corpus for Pre-training Language Model [PDF] Abstract
8. Improving Candidate Generation for Low-resource Cross-lingual Entity Linking [PDF] Abstract
9. Controllable Time-Delay Transformer for Real-Time Punctuation Prediction and Disfluency Detection [PDF] Abstract
10. Transfer Learning for Context-Aware Spoken Language Understanding [PDF] Abstract
11. Med7: a transferable clinical natural language processing model for electronic health records [PDF] Abstract
12. Understanding the Prediction Mechanism of Sentiments by XAI Visualization [PDF] Abstract
13. Hierarchical Context Enhanced Multi-Domain Dialogue System for Multi-domain Task Completion [PDF] Abstract

Abstracts

1. Hybrid Generative-Retrieval Transformers for Dialogue Domain Adaptation [PDF] Back to Contents
  Igor Shalyminov, Alessandro Sordoni, Adam Atkinson, Hannes Schulz
Abstract: Domain adaptation has recently become a key problem in dialogue systems research. Deep learning, while being the preferred technique for modeling such systems, works best given massive training data. However, in the real-world scenario, such resources aren't available for every new domain, so the ability to train with a few dialogue examples can be considered essential. Pre-training on large data sources and adapting to the target data has become the standard method for few-shot problems within the deep learning framework. In this paper, we present the winning entry at the fast domain adaptation task of DSTC8, a hybrid generative-retrieval model based on GPT-2 fine-tuned to the multi-domain MetaLWOz dataset. Robust and diverse in response generation, our model uses retrieval logic as a fallback, being SoTA on MetaLWOz in human evaluation (>4% improvement over the 2nd place system) and attaining competitive generalization performance in adaptation to the unseen MultiWOZ dataset.
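
A minimal sketch of the hybrid generative-retrieval idea described above: generate with a fine-tuned GPT-2 and fall back to retrieval when the generator's confidence is low. The `generate_response` stub, the TF-IDF retriever, and the confidence threshold are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of generation with a retrieval fallback (assumed details).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def generate_response(context):
    # Placeholder for a fine-tuned GPT-2 generator; in practice it would return
    # the generated text plus an average token log-probability as a confidence score.
    return "I can help you book a table.", -2.7

def retrieve_response(context, support_dialogues):
    # Fall back to the support response whose context is most similar to the input.
    contexts = [c for c, _ in support_dialogues]
    vec = TfidfVectorizer().fit(contexts + [context])
    sims = cosine_similarity(vec.transform([context]), vec.transform(contexts))[0]
    return support_dialogues[int(sims.argmax())][1]

def respond(context, support_dialogues, min_confidence=-3.0):
    text, confidence = generate_response(context)
    if confidence < min_confidence:   # low-confidence generation -> retrieval fallback
        return retrieve_response(context, support_dialogues)
    return text

print(respond("I need a restaurant reservation",
              [("book a restaurant", "Sure, for how many people?")]))
```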

2. Improving Uyghur ASR systems with decoders using morpheme-based language models [PDF] Back to Contents
  Zicheng Qiu, Wei Jiang, Turghunjan Mamut
Abstract: Uyghur is a minority language, and its resources for Automatic Speech Recognition (ASR) research have always been insufficient. THUYG-20 is currently the only open-sourced dataset of Uyghur speech. State-of-the-art results on its clean and noiseless speech test task have not been updated since the first release, which shows a big gap in the development of ASR between mainstream languages and Uyghur. In this paper, we try to bridge the gap by ultimately optimizing the ASR systems and by developing a morpheme-based decoder, MLDG-Decoder (Morpheme Lattice Dynamically Generating Decoder for Uyghur DNN-HMM systems), which has long been missing. We have open-sourced the decoder. The MLDG-Decoder employs an algorithm named "on-the-fly composition with FEBABOS", which allows back-off states and transitions to play the role of a relay station in on-the-fly composition. The algorithm empowers the dynamically generated graph to constrain the morpheme sequences in the lattices as effectively as a static, fully composed graph does when a 4-gram morpheme-based Language Model (LM) is used. We have trained deeper and wider neural network acoustic models and experimented with three kinds of decoding schemes. The experimental results show that decoding based on the static, fully composed graph reduces the state-of-the-art Word Error Rate (WER) on the clean and noiseless speech test task of THUYG-20 to 14.24%. The MLDG-Decoder reduces the WER to 14.54% while keeping memory consumption reasonable. Based on the open-sourced MLDG-Decoder, readers can easily reproduce the experimental results in this paper.
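
The role of back-off arcs mentioned above can be pictured with a toy ARPA-style back-off language model over morphemes: when a morpheme n-gram is unseen, scoring falls back to a shorter history while paying a back-off weight, which is what the back-off states in the decoding graph make possible during on-the-fly composition. The probabilities and morphemes below are made up for illustration.

```python
# Toy back-off n-gram scoring over morphemes (illustrative; not the MLDG-Decoder).
import math

probs = {("al", "dim"): 0.2, ("dim",): 0.05, ("al",): 0.1}   # made-up morpheme LM
backoffs = {("al",): 0.4, (): 1.0}

def logprob(history, morpheme):
    ngram = history + (morpheme,)
    if ngram in probs:
        return math.log(probs[ngram])
    # Back off to a shorter history and pay the back-off weight, analogous to
    # following a back-off arc (the "relay station") in the decoding graph.
    return math.log(backoffs.get(history, 1.0)) + logprob(history[1:], morpheme)

print(logprob(("al",), "dim"))   # bigram found directly
print(logprob(("qa",), "dim"))   # unseen history, falls back to the unigram
```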

3. Multi-Task Learning Network for Emotion Recognition in Conversation [PDF] Back to Contents
  Jingye Li, Meishan Zhang, Donghong Ji, Yijiang Liu
Abstract: Conversational emotion recognition (CER) has attracted increasing interest in the natural language processing (NLP) community. Different from vanilla emotion recognition, effective speaker-sensitive utterance representation is one major challenge for CER. In this paper, we exploit speaker identification (SI) as an auxiliary task to enhance the utterance representation in conversations. With this method, we can learn better speaker-aware contextual representations from the additional SI corpus. Experiments on two benchmark datasets demonstrate that the proposed architecture is highly effective for CER, obtaining new state-of-the-art results on both datasets.
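
A minimal sketch of the auxiliary-task setup, assuming a shared utterance encoder with an emotion head and a speaker-identification head trained on a weighted sum of the two losses; the network, dimensions, and loss weighting are assumptions and differ from the paper's architecture.

```python
# Hedged multi-task sketch: shared encoder, emotion head, auxiliary speaker head.
import torch
import torch.nn as nn

class SharedEncoderMTL(nn.Module):
    def __init__(self, vocab_size=5000, dim=128, n_emotions=7, n_speakers=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.emotion_head = nn.Linear(dim, n_emotions)
        self.speaker_head = nn.Linear(dim, n_speakers)    # auxiliary SI task

    def forward(self, tokens):
        _, h = self.encoder(self.embed(tokens))           # h: (1, batch, dim)
        h = h.squeeze(0)
        return self.emotion_head(h), self.speaker_head(h)

model = SharedEncoderMTL()
tokens = torch.randint(0, 5000, (4, 12))                  # toy batch of utterances
emo_logits, spk_logits = model(tokens)
loss = nn.CrossEntropyLoss()(emo_logits, torch.randint(0, 7, (4,))) \
     + 0.5 * nn.CrossEntropyLoss()(spk_logits, torch.randint(0, 10, (4,)))
loss.backward()
```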

4. XGPT: Cross-modal Generative Pre-Training for Image Captioning [PDF] Back to Contents
  Qiaolin Xia, Haoyang Huang, Nan Duan, Dongdong Zhang, Lei Ji, Zhifang Sui, Edward Cui, Taroon Bharti, Ming Zhou
Abstract: While many BERT-based cross-modal pre-trained models produce excellent results on downstream understanding tasks like image-text retrieval and VQA, they cannot be applied to generation tasks directly. In this paper, we propose XGPT, a new method of Cross-modal Generative Pre-Training for Image Captioning that is designed to pre-train text-to-image caption generators through three novel generation tasks, including Image-conditioned Masked Language Modeling (IMLM), Image-conditioned Denoising Autoencoding (IDA), and Text-conditioned Image Feature Generation (TIFG). As a result, the pre-trained XGPT can be fine-tuned without any task-specific architecture modifications to create state-of-the-art models for image captioning. Experiments show that XGPT obtains new state-of-the-art results on the benchmark datasets, including COCO Captions and Flickr30k Captions. We also use XGPT to generate new image captions as data augmentation for the image retrieval task and achieve significant improvement on all recall metrics.
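
To make the Image-conditioned Masked Language Modeling (IMLM) objective concrete, here is a toy sketch that masks caption tokens and predicts them from the remaining tokens plus a prepended image feature. The architecture, feature dimensions, and masking rate are assumptions; XGPT's actual model and the IDA and TIFG objectives are not shown.

```python
# Toy image-conditioned masked LM step (illustrative only).
import torch
import torch.nn as nn

vocab, dim, mask_id = 1000, 64, 0
embed = nn.Embedding(vocab, dim)
img_proj = nn.Linear(2048, dim)                        # project pooled CNN image features
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, 4, batch_first=True), num_layers=2)
lm_head = nn.Linear(dim, vocab)

caption = torch.randint(1, vocab, (2, 10))             # toy caption token ids
image_feats = torch.randn(2, 1, 2048)                  # one pooled feature per image
mask = torch.rand(caption.shape) < 0.15                # BERT-style masking rate
mask[:, 0] = True                                      # ensure at least one masked token
masked = caption.masked_fill(mask, mask_id)

x = torch.cat([img_proj(image_feats), embed(masked)], dim=1)
logits = lm_head(encoder(x))[:, 1:]                    # drop the image position
loss = nn.CrossEntropyLoss()(logits[mask], caption[mask])
loss.backward()
```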

5. Seshat: A tool for managing and verifying annotation campaigns of audio data [PDF] Back to Contents
  Hadrien Titeux, Rachid Riad, Xuan-Nga Cao, Nicolas Hamilakis, Kris Madden, Alejandrina Cristia, Anne-Catherine Bachoud-Lévi, Emmanuel Dupoux
Abstract: We introduce Seshat, a new, simple, and open-source software tool to efficiently manage annotations of speech corpora. The Seshat software allows users to easily customise and manage annotations of large audio corpora while ensuring compliance with the formatting and naming conventions of the annotated output files. In addition, it includes procedures for checking the content of annotations against specific rules implemented in personalised parsers. Finally, we propose a double-annotation mode, for which Seshat automatically computes an associated inter-annotator agreement with the $\gamma$ measure, which takes into account both categorisation and segmentation discrepancies.
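
The rule-checking component can be pictured as a small validator run over each annotator's segments; the rules below (allowed labels, positive duration, no overlaps) are hypothetical examples of what a campaign-specific parser might enforce, not Seshat's built-in rules.

```python
# Toy annotation checker (hypothetical rules, illustrative of a personalised parser).
def check_annotations(segments, allowed_labels={"speech", "noise", "silence"}):
    errors = []
    for i, (start, end, label) in enumerate(segments):   # segments sorted by start time
        if label not in allowed_labels:
            errors.append(f"segment {i}: unknown label '{label}'")
        if end <= start:
            errors.append(f"segment {i}: non-positive duration")
        if i > 0 and start < segments[i - 1][1]:
            errors.append(f"segment {i}: overlaps the previous segment")
    return errors

print(check_annotations([(0.0, 1.2, "speech"), (1.1, 2.0, "noize")]))
```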

6. Meta-Embeddings Based On Self-Attention [PDF] Back to Contents
  Qichen Li, Xiaoke Jiang, Jun Xia, Jian Li
Abstract: Creating meta-embeddings for better performance in language modelling has received attention lately, and methods based on concatenation, or on merely calculating the arithmetic mean of more than one separately trained embedding, have been shown to be beneficial. In this paper, we devise a new meta-embedding model based on the self-attention mechanism, namely the Duo. With fewer than 0.4M parameters, the Duo mechanism achieves state-of-the-art accuracy in text classification tasks such as 20NG. Additionally, we propose a new meta-embedding sequence-to-sequence model for machine translation, which, to the best of our knowledge, is the first machine translation model based on more than one word embedding. Furthermore, our model outperforms the Transformer, not only achieving better results but also converging faster on recognized benchmarks such as the WMT 2014 English-to-French translation task.
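
A minimal sketch of the attention-based combination idea: each source embedding is projected into a common space and the projections are mixed with learned attention weights rather than concatenated or averaged. The dimensions and scoring function are assumptions, not the Duo mechanism itself.

```python
# Hedged sketch of attention-based meta-embeddings.
import torch
import torch.nn as nn

class AttentionMetaEmbedding(nn.Module):
    def __init__(self, source_dims=(300, 200), dim=128):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(d, dim) for d in source_dims)
        self.score = nn.Linear(dim, 1)

    def forward(self, embeddings):
        # embeddings: one (batch, source_dim) tensor per pre-trained embedding set
        stacked = torch.stack([p(e) for p, e in zip(self.proj, embeddings)], dim=1)
        weights = torch.softmax(self.score(stacked), dim=1)   # (batch, n_sources, 1)
        return (weights * stacked).sum(dim=1)                 # combined meta-embedding

meta = AttentionMetaEmbedding()
out = meta([torch.randn(4, 300), torch.randn(4, 200)])
print(out.shape)   # torch.Size([4, 128])
```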

7. CLUECorpus2020: A Large-scale Chinese Corpus for Pre-training Language Model [PDF] Back to Contents
  Liang Xu, Xuanwei Zhang, Qianqian Dong
Abstract: In this paper, we introduce the Chinese corpus from the CLUE organization, CLUECorpus2020, a large-scale corpus that can be used directly for self-supervised learning such as pre-training of a language model, or language generation. It contains 100 GB of raw text with 35 billion Chinese characters, retrieved from Common Crawl. To better understand this corpus, we conduct language understanding experiments on both small and large scales, and the results show that models trained on this corpus achieve excellent performance on Chinese. We release a new Chinese vocabulary with a size of 8K, which is only one-third of the vocabulary size used in the Chinese BERT released by Google. It saves computational cost and memory while working as well as the original vocabulary. We also release both large and tiny versions of the pre-trained model on this corpus. The former achieves the state-of-the-art result, and the latter retains most of the accuracy while accelerating training and prediction speed by eight times compared to BERT-base. To facilitate future work on self-supervised learning in Chinese, we release our dataset, new vocabulary, code, and pre-trained models on GitHub.
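
A quick back-of-the-envelope check of the vocabulary claim, assuming BERT-base's hidden size of 768 and Chinese BERT's 21,128-token vocabulary: the 8K vocabulary shrinks the embedding table to roughly 38% of its original size.

```python
# Embedding-table size comparison (assumes hidden size 768, as in BERT-base).
hidden = 768
sizes = {"Chinese BERT (21,128 tokens)": 21128, "CLUE 8K vocabulary": 8000}
for name, vocab in sizes.items():
    print(f"{name}: {vocab * hidden / 1e6:.1f}M embedding parameters")
# ~16.2M vs ~6.1M parameters, i.e. roughly a 62% reduction in the embedding table.
```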

8. Improving Candidate Generation for Low-resource Cross-lingual Entity Linking [PDF] Back to Contents
  Shuyan Zhou, Shruti Rijhwani, John Wieting, Jaime Carbonell, Graham Neubig
Abstract: Cross-lingual entity linking (XEL) is the task of finding referents in a target-language knowledge base (KB) for mentions extracted from source-language texts. The first step of (X)EL is candidate generation, which retrieves a list of plausible candidate entities from the target-language KB for each mention. Approaches based on resources from Wikipedia have proven successful in the realm of relatively high-resource languages (HRL), but these do not extend well to low-resource languages (LRL) with few, if any, Wikipedia pages. Recently, transfer learning methods have been shown to reduce the demand for resources in the LRL by utilizing resources in closely-related languages, but the performance still lags far behind their high-resource counterparts. In this paper, we first assess the problems faced by current entity candidate generation methods for low-resource XEL, then propose three improvements that (1) reduce the disconnect between entity mentions and KB entries, and (2) improve the robustness of the model to low-resource scenarios. The methods are simple, but effective: we experiment with our approach on seven XEL datasets and find that they yield an average gain of 16.9% in Top-30 gold candidate recall, compared to state-of-the-art baselines. Our improved model also yields an average gain of 7.9% in in-KB accuracy of end-to-end XEL.
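
As a concrete (and deliberately simple) picture of candidate generation, the toy function below ranks KB entries by character n-gram overlap with a mention; string-level matching of this kind needs no Wikipedia resources in the mention's language. The paper's actual improvements go well beyond this.

```python
# Toy candidate generator based on character n-gram Jaccard similarity (illustrative).
def char_ngrams(text, n=3):
    text = f"#{text.lower()}#"
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def generate_candidates(mention, kb_entries, top_k=3):
    m = char_ngrams(mention)
    scored = []
    for entry in kb_entries:
        e = char_ngrams(entry)
        scored.append((len(m & e) / len(m | e), entry))    # Jaccard overlap
    return [entry for _, entry in sorted(scored, reverse=True)[:top_k]]

print(generate_candidates("Nueva York", ["New York City", "York", "Newark"], top_k=2))
```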

9. Controllable Time-Delay Transformer for Real-Time Punctuation Prediction and Disfluency Detection [PDF] Back to Contents
  Qian Chen, Mengzhe Chen, Bo Li, Wen Wang
Abstract: With the increased applications of automatic speech recognition (ASR) in recent years, it is essential to automatically insert punctuation marks and remove disfluencies in transcripts, to improve the readability of the transcripts as well as the performance of subsequent applications, such as machine translation, dialogue systems, and so forth. In this paper, we propose a Controllable Time-delay Transformer (CT-Transformer) model that jointly completes the punctuation prediction and disfluency detection tasks in real time. The CT-Transformer model facilitates freezing partial outputs with controllable time delay to fulfill the real-time constraints in partial decoding required by subsequent applications. We further propose a fast decoding strategy to minimize latency while maintaining competitive performance. Experimental results on the IWSLT2011 benchmark dataset and an in-house Chinese annotated dataset demonstrate that the proposed approach outperforms the previous state-of-the-art models on F-scores and achieves a competitive inference speed.
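
The controllable time-delay idea can be sketched as streaming decoding in which every prediction older than a fixed delay is committed and never revised, while the last few may still change as more input arrives. The delay value and the placeholder "model" below are assumptions for illustration.

```python
# Hedged sketch of committing all but the last `delay` punctuation predictions.
def stream_punctuate(tokens, predict_punct, delay=3):
    seen, output, committed = [], [], 0
    for t in tokens:
        seen.append(t)
        labels = predict_punct(seen)              # one punctuation label per token so far
        while committed < len(seen) - delay:      # freeze everything but the newest tokens
            output.append(seen[committed] + labels[committed])
            committed += 1
    labels = predict_punct(seen)                  # flush the tail at the end of the stream
    output += [seen[i] + labels[i] for i in range(committed, len(seen))]
    return " ".join(output)

# Placeholder "model": comma after "done", period after "on" (purely illustrative).
fake_model = lambda toks: ["," if t == "done" else "." if t == "on" else "" for t in toks]
print(stream_punctuate("ok we are done let us move on".split(), fake_model))
# -> "ok we are done, let us move on."
```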

10. Transfer Learning for Context-Aware Spoken Language Understanding [PDF] Back to Contents
  Qian Chen, Zhu Zhuo, Wen Wang, Qiuyun Xu
Abstract: Spoken language understanding (SLU) is a key component of task-oriented dialogue systems. SLU parses natural language user utterances into semantic frames. Previous work has shown that incorporating context information significantly improves SLU performance for multi-turn dialogues. However, collecting a large-scale human-labeled multi-turn dialogue corpus for the target domains is complex and costly. To reduce dependency on the collection and annotation effort, we propose a Context Encoding Language Transformer (CELT) model that facilitates exploiting various types of context information for SLU. We explore different transfer learning approaches to reduce dependency on data collection and annotation. In addition to unsupervised pre-training using large-scale general-purpose unlabeled corpora such as Wikipedia, we explore unsupervised and supervised adaptive training approaches for transfer learning to benefit from other in-domain and out-of-domain dialogue corpora. Experimental results demonstrate that the proposed model with the proposed transfer learning approaches achieves significant improvement in SLU performance over state-of-the-art models on two large-scale single-turn dialogue benchmarks and one large-scale multi-turn dialogue benchmark.

11. Med7: a transferable clinical natural language processing model for electronic health records [PDF] Back to Contents
  Andrey Kormilitzin, Nemanja Vaci, Qiang Liu, Alejo Nevado-Holgado
Abstract: The field of clinical natural language processing has advanced significantly since the introduction of deep learning models. Self-supervised representation learning and the transfer learning paradigm have become the methods of choice in many natural language processing applications, particularly in settings with a dearth of high-quality manually annotated data. Electronic health record systems are ubiquitous, and the majority of patients' data are now being collected electronically, in particular in the form of free text. Identification of medical concepts and information extraction is a challenging task, yet an important ingredient for parsing unstructured data into a structured, tabulated format for downstream analytical tasks. In this work we introduce a named-entity recognition model for clinical natural language processing. The model is trained to recognise seven categories: drug names, route, frequency, dosage, strength, form, and duration. The model was first pre-trained in a self-supervised fashion by predicting the next word, using a collection of 2 million free-text patient records from the MIMIC-III corpora, and then fine-tuned on the named-entity recognition task. The model achieved a lenient (strict) micro-averaged F1 score of 0.957 (0.893) across all seven categories. Additionally, we evaluated the transferability of the developed model using data from an Intensive Care Unit in the US to secondary care mental health records (CRIS) in the UK. A direct application of the trained NER model to CRIS data resulted in reduced performance (F1 = 0.762); however, after fine-tuning on a small sample from CRIS, the model achieved a reasonable performance of F1 = 0.944. This demonstrates that, despite a close similarity between the datasets and the NER tasks, it is essential to fine-tune on the target-domain data in order to achieve more accurate results.
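
A hypothetical usage sketch, assuming the released model is packaged as a spaCy pipeline; the model name and the example sentence below are assumptions, so check the Med7 repository for the exact installation and loading instructions.

```python
# Assumed usage of Med7 as a spaCy pipeline (model name is an assumption).
import spacy

med7 = spacy.load("en_core_med7_lg")
doc = med7("Magnesium hydroxide 400mg/5ml suspension PO of total 30ml daily for 5 days.")
for ent in doc.ents:
    # Expected categories: drug name, route, frequency, dosage, strength, form, duration
    print(ent.text, ent.label_)
```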

12. Understanding the Prediction Mechanism of Sentiments by XAI Visualization [PDF] Back to Contents
  Chaehan So
Abstract: People often rely on online reviews to make purchase decisions. The present work aimed to gain an understanding of a machine learning model's prediction mechanism by visualizing the effect of sentiments extracted from online hotel reviews with explainable AI (XAI) methodology. Study 1 used the extracted sentiments as features to predict the review ratings with five machine learning algorithms (k-nearest neighbors, CART decision trees, support vector machines, random forests, and gradient boosting machines) and identified random forests as the best algorithm. Study 2 analyzed the random forest model by feature importance and revealed the sentiments joy, disgust, positive, and negative as the most predictive features. Furthermore, the visualization of additive variable attributions and their prediction distribution showed correct prediction in direction and effect size for the 5-star rating, but partially wrong direction and insufficient effect size for the 1-star rating. These prediction details were corroborated by a what-if analysis of the four top features. In conclusion, the prediction mechanism of a machine learning model can be uncovered by visualization of particular observations. Comparing instances with contrasting ground-truth values can draw a differential picture of the prediction mechanism and inform decisions for model improvement.
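
The feature-importance step of Study 2 can be reproduced in spirit on synthetic data: fit a random forest on sentiment features and inspect which sentiments contribute most to predicting the rating. The data below are random, so the importances only illustrate the workflow, not the study's findings.

```python
# Random-forest feature importance on synthetic sentiment features (illustrative).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
sentiments = ["joy", "disgust", "positive", "negative", "trust"]
X = rng.random((500, len(sentiments)))
y = (2 * X[:, 0] - X[:, 1] + X[:, 2] - X[:, 3] > 0.5).astype(int)   # toy rating proxy

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
for name, importance in sorted(zip(sentiments, forest.feature_importances_),
                               key=lambda pair: -pair[1]):
    print(f"{name:8s} {importance:.3f}")
```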

13. Hierarchical Context Enhanced Multi-Domain Dialogue System for Multi-domain Task Completion [PDF] Back to Contents
  Jingyuan Yang, Guang Liu, Yuzhao Mao, Zhiwei Zhao, Weiguo Gao, Xuan Li, Haiqin Yang, Jianping Shen
Abstract: Task 1 of the DSTC8-track1 challenge aims to develop an end-to-end multi-domain dialogue system to accomplish complex user goals in a tourist-information-desk setting. This paper describes our submitted solution, the Hierarchical Context Enhanced Dialogue System (HCEDS), for this task. The main motivation of our system is to comprehensively explore the potential of hierarchical context for sufficiently understanding complex dialogues. More specifically, we apply BERT to capture token-level information and employ the attention mechanism to capture sentence-level information. The results listed on the leaderboard show that our system achieves first place in automatic evaluation and second place in human evaluation.
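
A minimal sketch of the hierarchical context idea: a token-level encoder (a stand-in GRU here, where HCEDS uses BERT) produces one vector per utterance, and sentence-level attention pools the dialogue history into a single context vector. The dimensions and pooling are assumptions for illustration.

```python
# Hedged sketch of a hierarchical (token-level + sentence-level) context encoder.
import torch
import torch.nn as nn

class HierarchicalContextEncoder(nn.Module):
    def __init__(self, vocab=5000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.token_encoder = nn.GRU(dim, dim, batch_first=True)   # stand-in for BERT
        self.sentence_attn = nn.Linear(dim, 1)

    def forward(self, dialogue):                # dialogue: (n_utterances, n_tokens)
        _, h = self.token_encoder(self.embed(dialogue))
        utt_vecs = h.squeeze(0)                                   # one vector per utterance
        weights = torch.softmax(self.sentence_attn(utt_vecs), dim=0)
        return (weights * utt_vecs).sum(dim=0)                    # dialogue context vector

encoder = HierarchicalContextEncoder()
context = encoder(torch.randint(0, 5000, (3, 12)))                # 3 utterances, 12 tokens each
print(context.shape)                                              # torch.Size([128])
```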
