
[arXiv papers] Computation and Language 2020-02-28

Table of Contents

1. Generating Followup Questions for Interpretable Multi-hop Question Answering [PDF] abstract
2. Few-shot Natural Language Generation for Task-Oriented Dialog [PDF] abstract
3. A Primer in BERTology: What we know about how BERT works [PDF] abstract
4. Annotation of Emotion Carriers in Personal Narratives [PDF] abstract
5. Improving cross-lingual model transfer by chunking [PDF] abstract
6. Binarized PMI Matrix: Bridging Word Embeddings and Hyperbolic Spaces [PDF] abstract
7. Integrating Boundary Assembling into a DNN Framework for Named Entity Recognition in Chinese Social Media Text [PDF] abstract
8. CrossWOZ: A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset [PDF] abstract
9. Analysis of diversity-accuracy tradeoff in image captioning [PDF] abstract
10. Echo State Neural Machine Translation [PDF] abstract
11. Universal Phone Recognition with a Multilingual Allophone System [PDF] abstract
12. Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers [PDF] abstract
13. Towards Zero-shot Learning for Automatic Phonemic Transcription [PDF] abstract
14. SkinAugment: Auto-Encoding Speaker Conversions for Automatic Speech Translation [PDF] abstract
15. Attacking Neural Text Detectors [PDF] abstract

Abstracts

1. Generating Followup Questions for Interpretable Multi-hop Question Answering [PDF] back to contents
  Christopher Malon, Bing Bai
Abstract: We propose a framework for answering open domain multi-hop questions in which partial information is read and used to generate followup questions, to finally be answered by a pretrained single-hop answer extractor. This framework makes each hop interpretable, and makes the retrieval associated with later hops as flexible and specific as for the first hop. As a first instantiation of this framework, we train a pointer-generator network to predict followup questions based on the question and partial information. This provides a novel application of a neural question generation network, which is applied to give weak ground truth single-hop followup questions based on the final answers and their supporting facts. Learning to generate followup questions that select the relevant answer spans against downstream supporting facts, while avoiding distracting premises, poses an exciting semantic challenge for text generation. We present an evaluation using the two-hop bridge questions of HotpotQA.
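
To make the two-hop procedure concrete, here is a minimal sketch of the loop described above. The function names (`retrieve`, `single_hop_extract`, `generate_followup`) are hypothetical placeholders, not the authors' actual interfaces.

```python
# Hypothetical sketch of the interpretable two-hop loop; retrieve,
# single_hop_extract and generate_followup are placeholder callables,
# not the authors' actual components.

def answer_two_hop(question, retrieve, single_hop_extract, generate_followup):
    """Answer a two-hop bridge question via an intermediate followup question."""
    # Hop 1: read partial information for the original question.
    passage_1 = retrieve(question)

    # A pointer-generator network predicts a followup question from the
    # original question and the partial information read so far.
    followup = generate_followup(question, passage_1)

    # Hop 2: retrieval is as flexible and specific as in the first hop,
    # because it is driven by a fully formed followup question.
    passage_2 = retrieve(followup)

    # The pretrained single-hop extractor answers the followup question.
    return single_hop_extract(followup, passage_2)
```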

2. Few-shot Natural Language Generation for Task-Oriented Dialog [PDF] back to contents
  Baolin Peng, Chenguang Zhu, Chunyuan Li, Xiujun Li, Jinchao Li, Michael Zeng, Jianfeng Gao
Abstract: As a crucial component in task-oriented dialog systems, the Natural Language Generation (NLG) module converts a dialog act represented in a semantic form into a response in natural language. The success of traditional template-based or statistical models typically relies on heavily annotated data, which is infeasible for new domains. Therefore, it is pivotal for an NLG system to generalize well with limited labelled data in real applications. To this end, we present FewShotWoz, the first NLG benchmark to simulate the few-shot learning setting in task-oriented dialog systems. Further, we develop the SC-GPT model. It is pre-trained on a large set of annotated NLG corpus to acquire the controllable generation ability, and fine-tuned with only a few domain-specific labels to adapt to new domains. Experiments on FewShotWoz and the large Multi-Domain-WOZ datasets show that the proposed SC-GPT significantly outperforms existing methods, measured by various automatic metrics and human evaluations.
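
As a rough illustration of what "a dialog act represented in a semantic form" could look like when fed to a GPT-style generator, here is one plausible linearization; the exact serialization used for SC-GPT is an assumption here and may differ.

```python
# Illustrative only: one plausible way to flatten a dialog act into a prompt
# for a GPT-style NLG model; the actual SC-GPT format may differ.

def linearize_dialog_act(intent, slots):
    """Turn a semantic dialog act into a flat prompt string."""
    slot_str = ", ".join(f"{k} = {v}" for k, v in slots.items())
    return f"{intent} ( {slot_str} )"

prompt = linearize_dialog_act("inform", {"name": "Hotel Alpha", "area": "centre"})
# -> "inform ( name = Hotel Alpha, area = centre )"
# Fine-tuning then teaches the pre-trained model to map such prompts to
# natural-language responses, e.g. "Hotel Alpha is in the centre of town."
```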

3. A Primer in BERTology: What we know about how BERT works [PDF] back to contents
  Anna Rogers, Olga Kovaleva, Anna Rumshisky
Abstract: Transformer-based models are now widely used in NLP, but we still do not understand a lot about their inner workings. This paper describes what is known to date about the famous BERT model (Devlin et al. 2019), synthesizing over 40 analysis studies. We also provide an overview of the proposed modifications to the model and its training regime. We then outline the directions for further research.

4. Annotation of Emotion Carriers in Personal Narratives [PDF] back to contents
  Aniruddha Tammewar, Alessandra Cervone, Eva-Maria Messner, Giuseppe Riccardi
Abstract: We are interested in the problem of understanding personal narratives (PN), spoken or written - recollections of facts, events, and thoughts. In PN, emotion carriers are the speech or text segments that best explain the emotional state of the user. Such segments may include entities, verb or noun phrases. Advanced automatic understanding of PNs requires not only the prediction of the user's emotional state but also the identification of which events (e.g. the loss of a relative or the visit of grandpa) or people (e.g. the old group of high school mates) carry the emotion manifested during the personal recollection. This work proposes and evaluates an annotation model for identifying emotion carriers in spoken personal narratives. Compared to other text genres such as news and microblogs, spoken PNs are particularly challenging because a narrative is usually unstructured, involving multiple sub-events and characters as well as thoughts and associated emotions perceived by the narrator. In this work, we experiment with annotating emotion carriers from speech transcriptions in the Ulm State-of-Mind in Speech (USoMS) corpus, a dataset of German PNs. We believe this resource could be used for experiments in the automatic extraction of emotion carriers from PN, a task that could provide further advancements in narrative understanding.

5. Improving cross-lingual model transfer by chunking [PDF] back to contents
  Ayan Das, Sudeshna Sarkar
Abstract: We present a shallow-parser-guided cross-lingual model transfer approach in order to address the syntactic differences between source and target languages more effectively. In this work, we treat the chunks or phrases in a sentence as transfer units, in order to separately address the syntactic differences between the source and target languages that arise from differences in the ordering of words within phrases and in the ordering of phrases within a sentence.

6. Binarized PMI Matrix: Bridging Word Embeddings and Hyperbolic Spaces [PDF] back to contents
  Zhenisbek Assylbekov, Alibi Jangeldin
Abstract: We show analytically that removing the sigmoid transformation from the SGNS objective does not significantly harm the quality of word vectors and, at the same time, is related to factorizing a binarized PMI matrix which, in turn, can be treated as the adjacency matrix of a certain graph. Empirically, such a graph is a complex network, i.e. it has strong clustering and a scale-free degree distribution, and is tightly connected with hyperbolic spaces. In short, we show the connection between static word embeddings and hyperbolic spaces through the binarized PMI matrix using analytical and empirical methods.
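
For reference, the quantities involved, using the standard definition of PMI and one common way to binarize it (the exact thresholding used in the paper may differ):

```latex
% Standard PMI and a simple binarization; the paper's exact threshold may differ.
\mathrm{PMI}(w,c) = \log \frac{P(w,c)}{P(w)\,P(c)}, \qquad
B_{wc} =
\begin{cases}
1, & \mathrm{PMI}(w,c) > 0\\
0, & \text{otherwise}
\end{cases}
```

Under this reading, the abstract's claim is that sigmoid-free SGNS vectors approximately factorize B, which can then be viewed as the adjacency matrix of a graph with complex-network structure.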

7. Integrating Boundary Assembling into a DNN Framework for Named Entity Recognition in Chinese Social Media Text [PDF] back to contents
  Zhaoheng Gong, Ping Chen, Jiang Zhou
Abstract: Named entity recognition is a challenging task in Natural Language Processing, especially for informal and noisy social media text. Chinese word boundaries are also entity boundaries; therefore, named entity recognition for Chinese text can benefit from the word boundary detection output by Chinese word segmentation. Yet Chinese word segmentation poses its own difficulty because it is influenced by several factors, e.g., segmentation criteria and the employed algorithm. Handled improperly, it may cause cascading errors that degrade the quality of the downstream named entity recognition. In this paper we integrate a boundary assembling method with the state-of-the-art deep neural network model, and incorporate the updated word boundary information into a conditional random field model for named entity recognition. Our method shows a 2% absolute improvement over previous state-of-the-art results.

8. CrossWOZ: A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset [PDF] back to contents
  Qi Zhu, Kaili Huang, Zheng Zhang, Xiaoyan Zhu, Minlie Huang
Abstract: To advance multi-domain (cross-domain) dialogue modeling as well as alleviate the shortage of Chinese task-oriented datasets, we propose CrossWOZ, the first large-scale Chinese Cross-Domain Wizard-of-Oz task-oriented dataset. It contains 6K dialogue sessions and 102K utterances for 5 domains, including hotel, restaurant, attraction, metro, and taxi. Moreover, the corpus contains rich annotation of dialogue states and dialogue acts at both user and system sides. About 60% of the dialogues have cross-domain user goals that favor inter-domain dependency and encourage natural transition across domains in conversation. We also provide a user simulator and several benchmark models for pipelined task-oriented dialogue systems, which will facilitate researchers to compare and evaluate their models on this corpus. The large size and rich annotation of CrossWOZ make it suitable to investigate a variety of tasks in cross-domain dialogue modeling, such as dialogue state tracking, policy learning, user simulation, etc.
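
To make the annotation description above concrete, here is a hypothetical shape for a single annotated turn; CrossWOZ's actual JSON schema and field names may differ.

```python
# Hypothetical example of an annotated turn (schema and field names assumed,
# not taken from the released CrossWOZ files).
turn = {
    "speaker": "user",
    "utterance": "我想找一家市中心的酒店。",  # "I'd like a hotel in the city centre."
    "dialogue_acts": [("Inform", "hotel", "area", "centre")],  # (intent, domain, slot, value)
    "dialogue_state": {"hotel": {"area": "centre"}},
}
```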

9. Analysis of diversity-accuracy tradeoff in image captioning [PDF] back to contents
  Ruotian Luo, Gregory Shakhnarovich
Abstract: We investigate the effect of different model architectures, training objectives, hyperparameter settings and decoding procedures on the diversity of automatically generated image captions. Our results show that 1) simple decoding by naive sampling, coupled with a low temperature, is a competitive and fast method to produce diverse and accurate caption sets; 2) training with a CIDEr-based reward using reinforcement learning harms the diversity properties of the resulting generator, and this cannot be mitigated by manipulating decoding parameters. In addition, we propose a new metric, AllSPICE, for evaluating both the accuracy and diversity of a set of captions with a single value.
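
A minimal sketch of "naive sampling with low temperature" at decoding time, assuming the captioning decoder exposes per-step token logits (the decoder itself is not shown):

```python
# Temperature-scaled sampling from a decoder's token logits (NumPy sketch);
# the decoder producing the logits is assumed, not shown.
import numpy as np

def sample_with_temperature(logits, temperature=0.5, rng=None):
    """Sample one token id; temperature < 1 sharpens the distribution."""
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())   # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)
```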

10. Echo State Neural Machine Translation [PDF] back to contents
  Ankush Garg, Yuan Cao, Qi Ge
Abstract: We present neural machine translation (NMT) models inspired by the echo state network (ESN), named Echo State NMT (ESNMT), in which the encoder and decoder layer weights are randomly generated and then fixed throughout training. We show that even with this extremely simple model construction and training procedure, ESNMT can already reach 70-80% of the quality of fully trainable baselines. We examine how the spectral radius of the reservoir, a key quantity that characterizes the model, determines the model behavior. Our findings indicate that randomized networks can work well even for complicated sequence-to-sequence prediction NLP tasks.
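
A PyTorch-flavored sketch of the "random then fixed" idea; the use of LSTM layers and the choice of which parts remain trainable (embeddings, output projection) are assumptions here, not the authors' exact architecture.

```python
# Sketch (PyTorch): encoder/decoder recurrent weights are randomly initialized
# and frozen; the trainable remainder (embeddings, output layer) is assumed.
import torch.nn as nn

encoder = nn.LSTM(input_size=512, hidden_size=512, num_layers=2)
decoder = nn.LSTM(input_size=512, hidden_size=512, num_layers=2)

for module in (encoder, decoder):
    for param in module.parameters():
        param.requires_grad = False   # keep the random weights fixed during training

# Embeddings and the softmax projection (assumed trainable) would then be
# optimized with the usual cross-entropy NMT objective.
src_embed = nn.Embedding(32000, 512)
out_proj = nn.Linear(512, 32000)
```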

11. Universal Phone Recognition with a Multilingual Allophone System [PDF] back to contents
  Xinjian Li, Siddharth Dalmia, Juncheng Li, Matthew Lee, Patrick Littell, Jiali Yao, Antonios Anastasopoulos, David R. Mortensen, Graham Neubig, Alan W Black, Florian Metze
Abstract: Multilingual models can improve language processing, particularly for low resource situations, by sharing parameters across languages. Multilingual acoustic models, however, generally ignore the difference between phonemes (sounds that can support lexical contrasts in a particular language) and their corresponding phones (the sounds that are actually spoken, which are language independent). This can lead to performance degradation when combining a variety of training languages, as identically annotated phonemes can actually correspond to several different underlying phonetic realizations. In this work, we propose a joint model of both language-independent phone and language-dependent phoneme distributions. In multilingual ASR experiments over 11 languages, we find that this model improves testing performance by 2% phoneme error rate absolute in low-resource conditions. Additionally, because we are explicitly modeling language-independent phones, we can build a (nearly-)universal phone recognizer that, when combined with the PHOIBLE large, manually curated database of phone inventories, can be customized into 2,000 language dependent recognizers. Experiments on two low-resourced indigenous languages, Inuktitut and Tusom, show that our recognizer achieves phone accuracy improvements of more than 17%, moving a step closer to speech recognition for all languages in the world.
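
A toy sketch of going from a language-independent phone distribution to a language-dependent phoneme distribution via an allophone inventory; the max-style aggregation over a phoneme's allophones is an illustrative assumption.

```python
# Toy mapping from phone probabilities to phoneme probabilities using an
# allophone inventory; the max aggregation is an illustrative assumption.

def phoneme_distribution(phone_probs, allophone_map):
    """phone_probs: {phone: prob}; allophone_map: {phoneme: [allophone phones]}."""
    return {
        phoneme: max(phone_probs.get(p, 0.0) for p in phones)
        for phoneme, phones in allophone_map.items()
    }

# Example: English /t/ may surface as [t] or as the flap [ɾ] ("butter").
print(phoneme_distribution({"t": 0.6, "ɾ": 0.3, "d": 0.1},
                           {"/t/": ["t", "ɾ"], "/d/": ["d", "ɾ"]}))
```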

12. Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers [PDF] back to contents
  Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph E. Gonzalez
Abstract: Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference. We study the impact of model size in this setting, focusing on Transformer models for NLP tasks that are limited by compute: self-supervised pretraining and high-resource machine translation. We first show that even though smaller Transformer models execute faster per iteration, wider and deeper models converge in significantly fewer steps. Moreover, this acceleration in convergence typically outpaces the additional computational overhead of using larger models. Therefore, the most compute-efficient training strategy is to counterintuitively train extremely large models but stop after a small number of iterations. This leads to an apparent trade-off between the training efficiency of large Transformer models and the inference efficiency of small Transformer models. However, we show that large models are more robust to compression techniques such as quantization and pruning than small models. Consequently, one can get the best of both worlds: heavily compressed, large models achieve higher accuracy than lightly compressed, small models.
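
As one concrete instance of the compression step, here is a simple magnitude-pruning sketch in PyTorch; this global-threshold variant is illustrative rather than the authors' exact recipe, and quantization would be applied analogously.

```python
# Illustrative magnitude pruning of a weight tensor (not the paper's exact recipe).
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float = 0.9) -> torch.Tensor:
    """Zero out the smallest-magnitude entries so roughly `sparsity` of them become zero."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight.clone()
    threshold = weight.abs().flatten().kthvalue(k).values
    return torch.where(weight.abs() > threshold, weight, torch.zeros_like(weight))

pruned = magnitude_prune(torch.randn(512, 512), sparsity=0.9)
```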

13. Towards Zero-shot Learning for Automatic Phonemic Transcription [PDF] back to contents
  Xinjian Li, Siddharth Dalmia, David R. Mortensen, Juncheng Li, Alan W Black, Florian Metze
Abstract: Automatic phonemic transcription tools are useful for low-resource language documentation. However, due to the lack of training sets, only a tiny fraction of languages have phonemic transcription tools. Fortunately, multilingual acoustic modeling provides a solution given limited audio training data. A more challenging problem is to build phonemic transcribers for languages with zero training data. The difficulty of this task is that phoneme inventories often differ between the training languages and the target language, making it infeasible to recognize unseen phonemes. In this work, we address this problem by adopting the idea of zero-shot learning. Our model is able to recognize unseen phonemes in the target language without any training data. In our model, we decompose phonemes into corresponding articulatory attributes such as vowel and consonant. Instead of predicting phonemes directly, we first predict distributions over articulatory attributes, and then compute phoneme distributions with a customized acoustic model. We evaluate our model by training it using 13 languages and testing it using 7 unseen languages. We find that it achieves 7.7% better phoneme error rate on average over a standard multilingual model.
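
A hedged sketch of composing phoneme scores from predicted articulatory-attribute probabilities; the two-attribute signature table and the product-style composition are assumptions for illustration only.

```python
# Compose phoneme scores from articulatory-attribute probabilities
# (attribute set and composition rule are illustrative assumptions).

SIGNATURES = {          # hypothetical attribute signatures per phoneme
    "a": {"vowel": 1, "consonant": 0},
    "t": {"vowel": 0, "consonant": 1},
}

def phoneme_scores(attr_probs):
    """attr_probs: {attribute: P(attribute | acoustic frame)}."""
    scores = {}
    for phoneme, signature in SIGNATURES.items():
        score = 1.0
        for attr, required in signature.items():
            p = attr_probs[attr]
            score *= p if required else (1.0 - p)
        scores[phoneme] = score
    return scores

print(phoneme_scores({"vowel": 0.9, "consonant": 0.1}))  # favors "a"
```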

14. SkinAugment: Auto-Encoding Speaker Conversions for Automatic Speech Translation [PDF] back to contents
  Arya D. McCarthy, Liezl Puzon, Juan Pino
Abstract: We propose autoencoding speaker conversion for training data augmentation in automatic speech translation. This technique directly transforms an audio sequence, resulting in audio synthesized to resemble another speaker's voice. Our method compares favorably to SpecAugment on English→French and English→Romanian automatic speech translation (AST) tasks as well as on a low-resource English automatic speech recognition (ASR) task. Further, in ablations, we show the benefits of both quantity and diversity in augmented data. Finally, we show that we can combine our approach with augmentation by machine-translated transcripts to obtain a competitive end-to-end AST model that outperforms a very strong cascade model on an English→French AST task. Our method is sufficiently general that it can be applied to other speech generation and analysis tasks.

15. Attacking Neural Text Detectors [PDF] back to contents
  Max Wolff
Abstract: Machine learning based language models have recently made significant progress, which introduces a danger of spreading misinformation. To combat this potential danger, several methods have been proposed for detecting text written by these language models. This paper presents two classes of black-box attacks on these detectors: one which randomly replaces characters with homoglyphs, and the other a simple scheme to purposefully misspell words. The homoglyph and misspelling attacks decrease a popular neural text detector's recall on neural text from 97.44% to 0.26% and 22.68%, respectively. Results also indicate that the attacks are transferable to other neural text detectors.
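
A minimal sketch of the homoglyph attack on a generated text; the character mapping below is a small illustrative subset, not the paper's exact character set.

```python
# Randomly swap selected ASCII letters for visually similar Unicode characters
# (illustrative subset of homoglyphs, not the paper's exact mapping).
import random

HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}  # Cyrillic look-alikes

def homoglyph_attack(text: str, rate: float = 0.1, seed: int = 0) -> str:
    rng = random.Random(seed)
    return "".join(
        HOMOGLYPHS[ch] if ch in HOMOGLYPHS and rng.random() < rate else ch
        for ch in text
    )

print(homoglyph_attack("machine generated text", rate=1.0))
```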
