目录
1. Do all Roads Lead to Rome? Understanding the Role of Initialization in Iterative Back-Translation [PDF] 摘要
2. Metaphoric Paraphrase Generation [PDF] 摘要
3. UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training [PDF] 摘要
4. Automatic Section Recognition in Obituaries [PDF] 摘要
5. Comparison of Speech Representations for Automatic Quality Estimation in Multi-Speaker Text-to-Speech Synthesis [PDF] 摘要
6. TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing [PDF] 摘要
7. DC-BERT: Decoupling Question and Document for Efficient Contextual Encoding [PDF] 摘要
8. Modeling Future Cost for Neural Machine Translation [PDF] 摘要
9. Robust Unsupervised Neural Machine Translation with Adversarial Training [PDF] 摘要
10. UKARA 1.0 Challenge Track 1: Automatic Short-Answer Scoring in Bahasa Indonesia [PDF] 摘要
11. Temporal Convolutional Attention-based Network For Sequence Modeling [PDF] 摘要
12. Optimizing Memory-Access Patterns for Deep Learning Accelerators [PDF] 摘要
13. RP-DNN: A Tweet level propagation context based deep neural networks for early rumor detection in Social Media [PDF] 摘要
14. A multi-layer approach to disinformation detection on Twitter [PDF] 摘要
15. Exploring and Distilling Cross-Modal Information for Image Captioning [PDF] 摘要
16. Learning Directly from Grammar Compressed Text [PDF] 摘要
17. Comment Ranking Diversification in Forum Discussions [PDF] 摘要
摘要
1. Do all Roads Lead to Rome? Understanding the Role of Initialization in Iterative Back-Translation [PDF] 返回目录
Mikel Artetxe, Gorka Labaka, Noe Casas, Eneko Agirre
Abstract: Back-translation provides a simple yet effective approach to exploit monolingual corpora in Neural Machine Translation (NMT). Its iterative variant, where two opposite NMT models are jointly trained by alternately using a synthetic parallel corpus generated by the reverse model, plays a central role in unsupervised machine translation. In order to start producing sound translations and provide a meaningful training signal to each other, existing approaches rely on either a separate machine translation system to warm up the iterative procedure, or some form of pre-training to initialize the weights of the model. In this paper, we analyze the role that such initialization plays in iterative back-translation. Is the behavior of the final system heavily dependent on it? Or does iterative back-translation converge to a similar solution given any reasonable initialization? Through a series of empirical experiments over a diverse set of warmup systems, we show that, although the quality of the initial system does affect final performance, its effect is relatively small, as iterative back-translation has a strong tendency to convergence to a similar solution. As such, the margin of improvement left for the initialization method is narrow, suggesting that future research should focus more on improving the iterative mechanism itself.
摘要:回译是一种在神经机器翻译(NMT)中利用单语语料库的简单而有效的方法。其迭代变体让两个方向相反的NMT模型交替使用由反向模型生成的合成平行语料进行联合训练,在无监督机器翻译中起着核心作用。为了能从一开始就产生合理的译文并相互提供有意义的训练信号,现有方法要么依赖一个独立的机器翻译系统来预热迭代过程,要么依赖某种形式的预训练来初始化模型权重。本文分析了这类初始化在迭代回译中所起的作用:最终系统的行为是否严重依赖于初始化?还是说只要初始化合理,迭代回译都会收敛到相似的解?通过在多种不同预热系统上进行的一系列实证实验,我们发现初始系统的质量虽然会影响最终性能,但其影响相对较小,因为迭代回译有很强的收敛到相似解的趋势。因此,留给初始化方法的改进空间很窄,这表明未来的研究应更多关注改进迭代机制本身。
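The alternating procedure described in the abstract can be made concrete with a short sketch. This is only an illustration of the control flow, not the authors' implementation; `train_nmt`, `warmup_s2t`, and `warmup_t2s` are hypothetical stand-ins for a real NMT toolkit and for the warm-up (initialization) systems the paper studies.

```python
# Minimal sketch of iterative back-translation, assuming placeholder training/translation callables.
from typing import Callable, List, Tuple

def iterative_back_translation(
    mono_src: List[str],                      # monolingual source-language corpus
    mono_tgt: List[str],                      # monolingual target-language corpus
    warmup_s2t: Callable[[str], str],         # initial source->target system (the "initialization")
    warmup_t2s: Callable[[str], str],         # initial target->source system
    train_nmt: Callable[[List[Tuple[str, str]]], Callable[[str], str]],
    n_iterations: int = 3,
):
    s2t, t2s = warmup_s2t, warmup_t2s
    for _ in range(n_iterations):
        # 1) Back-translate target monolingual data with the current target->source model,
        #    producing a synthetic parallel corpus (synthetic source, real target).
        synthetic_for_s2t = [(t2s(y), y) for y in mono_tgt]
        s2t = train_nmt(synthetic_for_s2t)
        # 2) Repeat in the opposite direction with the freshly updated model.
        synthetic_for_t2s = [(s2t(x), x) for x in mono_src]
        t2s = train_nmt(synthetic_for_t2s)
    return s2t, t2s
```

The paper's finding is that the final `s2t`/`t2s` pair depends only weakly on how `warmup_s2t` and `warmup_t2s` were obtained.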
2. Metaphoric Paraphrase Generation [PDF] 返回目录
Kevin Stowe, Leonardo Ribeiro, Iryna Gurevych
Abstract: This work describes the task of metaphoric paraphrase generation, in which we are given a literal sentence and are charged with generating a metaphoric paraphrase. We propose two different models for this task: a lexical replacement baseline and a novel sequence to sequence model, 'metaphor masking', that generates free metaphoric paraphrases. We use crowdsourcing to evaluate our results, as well as developing an automatic metric for evaluating metaphoric paraphrases. We show that while the lexical replacement baseline is capable of producing accurate paraphrases, they often lack metaphoricity, while our metaphor masking model excels in generating metaphoric sentences while performing nearly as well with regard to fluency and paraphrase quality.
摘要:本文研究隐喻性复述生成任务:给定一个字面表述的句子,要求生成其隐喻性复述。我们为该任务提出两种不同的模型:一个词汇替换基线,以及一个新颖的序列到序列模型"隐喻掩码(metaphor masking)",后者能够自由地生成隐喻性复述。我们使用众包来评估结果,并开发了一个用于评估隐喻性复述的自动指标。结果表明,词汇替换基线虽然能够产生准确的复述,但这些复述往往缺乏隐喻性;而我们的隐喻掩码模型擅长生成隐喻性句子,同时在流畅度和复述质量方面的表现也几乎不相上下。
3. UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training [PDF] 返回目录
Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Songhao Piao, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon
Abstract: We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks using a novel training procedure, referred to as a pseudo-masked language model (PMLM). Given an input text with masked tokens, we rely on conventional masks to learn inter-relations between corrupted tokens and context via autoencoding, and pseudo masks to learn intra-relations between masked spans via partially autoregressive modeling. With well-designed position embeddings and self-attention masks, the context encodings are reused to avoid redundant computation. Moreover, conventional masks used for autoencoding provide global masking information, so that all the position embeddings are accessible in partially autoregressive language modeling. In addition, the two tasks pre-train a unified language model as a bidirectional encoder and a sequence-to-sequence decoder, respectively. Our experiments show that the unified language models pre-trained using PMLM achieve new state-of-the-art results on a wide range of natural language understanding and generation tasks across several widely used benchmarks.
摘要:我们提出使用一种新的训练过程,即伪掩码语言模型(PMLM),为自编码和部分自回归两类语言建模任务预训练一个统一的语言模型。给定带有掩码词元的输入文本,我们依靠常规掩码通过自编码学习被破坏词元与上下文之间的关系,并依靠伪掩码通过部分自回归建模学习被掩码片段之间的内部关系。借助精心设计的位置嵌入和自注意力掩码,上下文编码可以被复用,以避免冗余计算。此外,自编码所用的常规掩码提供了全局掩码信息,使所有位置嵌入在部分自回归语言建模中均可访问。这两个任务分别将统一语言模型预训练为双向编码器和序列到序列解码器。实验表明,使用PMLM预训练的统一语言模型在多个广泛使用的基准上的大量自然语言理解与生成任务中取得了新的最先进结果。
4. Automatic Section Recognition in Obituaries [PDF] 返回目录
Valentino Sabbatino, Laura Bostan, Roman Klinger
Abstract: Obituaries contain information about people's values across times and cultures, which makes them a useful resource for exploring cultural history. They are typically structured similarly, with sections corresponding to Personal Information, Biographical Sketch, Characteristics, Family, Gratitude, Tribute, Funeral Information and Other aspects of the person. To make this information available for further studies, we propose a statistical model which recognizes these sections. To achieve that, we collect a corpus of 20058 English obituaries from TheDaily Item, this http URL and The London Free Press. The evaluation of our annotation guidelines with three annotators on 1008 obituaries shows a substantial agreement of Fleiss k = 0.87. Formulated as an automatic segmentation task, a convolutional neural network outperforms bag-of-words and embedding-based BiLSTMs and BiLSTM-CRFs with a micro F1 = 0.81.
摘要:讣告蕴含着跨越时代与文化的人们的价值观信息,是探索文化史的有用资源。讣告通常结构相似,其章节对应个人信息、生平简介、性格特点、家庭、致谢、悼念、葬礼信息及其他方面。为了让这些信息可用于进一步研究,我们提出了一个识别这些章节的统计模型。为此,我们从TheDaily Item、this http URL和The London Free Press收集了包含20058篇英文讣告的语料库。三名标注者依据我们的标注规范对1008篇讣告进行标注,评估显示出相当高的一致性(Fleiss k = 0.87)。将该问题表述为自动切分任务后,卷积神经网络以micro F1 = 0.81的成绩优于词袋模型以及基于词嵌入的BiLSTM和BiLSTM-CRF。
5. Comparison of Speech Representations for Automatic Quality Estimation in Multi-Speaker Text-to-Speech Synthesis [PDF] 返回目录
Jennifer Williams, Joanna Rownicka, Pilar Oplustil, Simon King
Abstract: We aim to characterize how different speakers contribute to the perceived output quality of multi-speaker Text-to-Speech (TTS) synthesis. We automatically rate the quality of TTS using a neural network (NN) trained on human mean opinion score (MOS) ratings. First, we train and evaluate our NN model on 13 different TTS and voice conversion (VC) systems from the ASVSpoof 2019 Logical Access (LA) Dataset. Since it is not known how best to represent speech for this task, we compare 8 different representations alongside MOSNet frame-based features. Our representations include image-based spectrogram features and x-vector embeddings that explicitly model different types of noise such as T60 reverberation time. Our NN predicts MOS with a high correlation to human judgments. We report prediction correlation and error. A key finding is the quality achieved for certain speakers seems consistent, regardless of the TTS or VC system. It is widely accepted that some speakers give higher quality than others for building a TTS system: our method provides an automatic way to identify such speakers. Finally, to see if our quality prediction models generalize, we predict quality scores for synthetic speech using a separate multi-speaker TTS system that was trained on LibriTTS data, and conduct our own MOS listening test to compare human ratings with our NN predictions.
摘要:我们旨在刻画不同说话人对多说话人文本转语音(TTS)合成的感知输出质量的贡献。我们使用在人工平均意见分(MOS)评分上训练的神经网络(NN)自动评估TTS质量。首先,我们在ASVSpoof 2019逻辑访问(LA)数据集中的13个不同TTS和语音转换(VC)系统上训练并评估NN模型。由于尚不清楚针对该任务如何表示语音最为合适,我们在MOSNet帧级特征之外比较了8种不同的表示,其中包括基于图像的语谱图特征,以及显式建模T60混响时间等不同噪声类型的x-vector嵌入。我们的NN预测的MOS与人工评判高度相关,我们报告了预测相关性和误差。一个关键发现是:某些说话人所能达到的质量似乎是一致的,与所用的TTS或VC系统无关。人们普遍认为,在构建TTS系统时某些说话人能带来比其他人更高的质量:我们的方法提供了一种自动识别此类说话人的途径。最后,为考察质量预测模型的泛化能力,我们用一个在LibriTTS数据上训练的独立多说话人TTS系统合成语音并预测其质量分数,并进行了我们自己的MOS听测,以比较人工评分与NN预测。
6. TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing [PDF] 返回目录
Ziqing Yang, Yiming Cui, Zhipeng Chen, Wanxiang Che, Ting Liu, Shijin Wang, Guoping Hu
Abstract: In this paper, we introduce TextBrewer, an open-source knowledge distillation toolkit designed for natural language processing. It works with different neural network models and supports various kinds of tasks, such as text classification, reading comprehension, sequence labeling. TextBrewer provides a simple and uniform workflow that enables quick setup of distillation experiments with highly flexible configurations. It offers a set of predefined distillation methods and can be extended with custom code. As a case study, we use TextBrewer to distill BERT on several typical NLP tasks. With simple configuration, we achieve results that are comparable with or even higher than the state-of-the-art performance. Our toolkit is available through: this http URL
摘要:本文介绍TextBrewer,一个面向自然语言处理的开源知识蒸馏工具包。它适用于不同的神经网络模型,并支持文本分类、阅读理解、序列标注等多种任务。TextBrewer提供简单统一的工作流程,可通过高度灵活的配置快速搭建蒸馏实验。它提供一组预定义的蒸馏方法,并可通过自定义代码进行扩展。作为案例研究,我们使用TextBrewer在若干典型NLP任务上蒸馏BERT。仅需简单配置,我们就取得了与最先进性能相当甚至更高的结果。我们的工具包可通过此http URL获取。
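Toolkits like TextBrewer package standard distillation recipes; the core ingredient of such recipes is the temperature-scaled soft-target loss sketched below. This is a generic PyTorch illustration of knowledge distillation, not TextBrewer's actual API.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 4.0) -> torch.Tensor:
    """Temperature-scaled KL divergence between teacher and student output distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# In training, this term is typically mixed with the usual hard-label loss, e.g.:
# loss = alpha * distillation_loss(s_logits, t_logits) + (1 - alpha) * cross_entropy_loss
```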
7. DC-BERT: Decoupling Question and Document for Efficient Contextual Encoding [PDF] 返回目录
Yuyu Zhang, Ping Nie, Xiubo Geng, Arun Ramamurthy, Le Song, Daxin Jiang
Abstract: Recent studies on open-domain question answering have achieved prominent performance improvement using pre-trained language models such as BERT. State-of-the-art approaches typically follow the "retrieve and read" pipeline and employ BERT-based reranker to filter retrieved documents before feeding them into the reader module. The BERT retriever takes as input the concatenation of question and each retrieved document. Despite the success of these approaches in terms of QA accuracy, due to the concatenation, they can barely handle high-throughput of incoming questions each with a large collection of retrieved documents. To address the efficiency problem, we propose DC-BERT, a decoupled contextual encoding framework that has dual BERT models: an online BERT which encodes the question only once, and an offline BERT which pre-encodes all the documents and caches their encodings. On SQuAD Open and Natural Questions Open datasets, DC-BERT achieves 10x speedup on document retrieval, while retaining most (about 98%) of the QA performance compared to state-of-the-art approaches for open-domain question answering.
摘要:近期关于开放域问答的研究利用BERT等预训练语言模型取得了显著的性能提升。最先进的方法通常遵循"检索-阅读"流水线,并使用基于BERT的重排序器在将检索到的文档送入阅读模块之前进行过滤。该BERT检索器以问题与每篇检索文档的拼接作为输入。尽管这些方法在问答准确率上取得了成功,但由于这种拼接方式,它们几乎无法应对高吞吐量的问题流,因为每个问题都伴随着大量检索到的文档。为了解决效率问题,我们提出DC-BERT,一个解耦的上下文编码框架,包含两个BERT模型:一个在线BERT只对问题编码一次,一个离线BERT预先对所有文档编码并缓存其编码结果。在SQuAD Open和Natural Questions Open数据集上,DC-BERT在文档检索上取得了10倍加速,同时与最先进的开放域问答方法相比保留了绝大部分(约98%)的问答性能。
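The efficiency gain of DC-BERT comes from caching document encodings offline and encoding each question only once per query. Below is a rough sketch of that decoupling; `offline_encoder`, `online_encoder`, and `interaction_layer` are hypothetical placeholders for the dual BERT models and the lightweight layers that combine their outputs.

```python
from typing import Dict, List

def build_document_cache(documents: Dict[str, str], offline_encoder) -> Dict[str, object]:
    # Offline BERT: pre-encode every document once and cache the encodings.
    return {doc_id: offline_encoder(text) for doc_id, text in documents.items()}

def rerank(question: str, candidate_ids: List[str], cache, online_encoder, interaction_layer) -> List[str]:
    # Online BERT: the question is encoded a single time per query ...
    q_enc = online_encoder(question)
    # ... and combined with each cached document encoding by a lightweight interaction
    # layer, avoiding a full joint BERT pass per (question, document) pair.
    scores = {doc_id: interaction_layer(q_enc, cache[doc_id]) for doc_id in candidate_ids}
    return sorted(candidate_ids, key=lambda d: scores[d], reverse=True)
```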
8. Modeling Future Cost for Neural Machine Translation [PDF] 返回目录
Chaoqun Duan, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Conghui Zhu, Tiejun Zhao
Abstract: Existing neural machine translation (NMT) systems utilize sequence-to-sequence neural networks to generate target translation word by word, and then make the generated word at each time-step and the counterpart in the references as consistent as possible. However, the trained translation model tends to focus on ensuring the accuracy of the generated target word at the current time-step and does not consider its future cost which means the expected cost of generating the subsequent target translation (i.e., the next target word). To respond to this issue, we propose a simple and effective method to model the future cost of each target word for NMT systems. In detail, a time-dependent future cost is estimated based on the current generated target word and its contextual information to boost the training of the NMT model. Furthermore, the learned future context representation at the current time-step is used to help the generation of the next target word in the decoding. Experimental results on three widely-used translation datasets, including the WMT14 German-to-English, WMT14 English-to-French, and WMT17 Chinese-to-English, show that the proposed approach achieves significant improvements over strong Transformer-based NMT baseline.
摘要:现有的神经机器翻译(NMT)系统利用序列到序列神经网络逐词生成目标译文,并使每个时间步生成的词与参考译文中的对应词尽可能一致。然而,训练得到的翻译模型往往只关注当前时间步所生成目标词的准确性,而不考虑其未来代价,即生成后续目标译文(下一个目标词)的期望代价。针对这一问题,我们提出了一种简单有效的方法,为NMT系统中的每个目标词建模未来代价。具体而言,我们基于当前生成的目标词及其上下文信息估计一个随时间变化的未来代价,以增强NMT模型的训练;此外,当前时间步学到的未来上下文表示还被用于帮助解码时下一个目标词的生成。在WMT14德英、WMT14英法和WMT17中英这三个广泛使用的翻译数据集上的实验结果表明,所提方法相对于强大的基于Transformer的NMT基线取得了显著提升。
9. Robust Unsupervised Neural Machine Translation with Adversarial Training [PDF] 返回目录
Haipeng Sun, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao
Abstract: Unsupervised neural machine translation (UNMT) has recently attracted great interest in the machine translation community, achieving only slightly worse results than supervised neural machine translation. However, in real-world scenarios, there usually exists minor noise in the input sentence and the neural translation system is sensitive to the small perturbations in the input, leading to poor performance. In this paper, we first define two types of noises and empirically show the effect of these noisy data on UNMT performance. Moreover, we propose adversarial training methods to improve the robustness of UNMT in the noisy scenario. To the best of our knowledge, this paper is the first work to explore the robustness of UNMT. Experimental results on several language pairs show that our proposed methods substantially outperform conventional UNMT systems in the noisy scenario.
摘要:无监督神经机器翻译(UNMT)近来引起了机器翻译界的极大兴趣,其结果仅略逊于有监督神经机器翻译。然而,在现实场景中,输入句子通常带有轻微噪声,而神经翻译系统对输入中的微小扰动十分敏感,从而导致性能下降。本文首先定义了两类噪声,并通过实验展示了这些噪声数据对UNMT性能的影响。此外,我们提出对抗训练方法来提升UNMT在噪声场景下的鲁棒性。据我们所知,本文是首个探讨UNMT鲁棒性的工作。在多个语言对上的实验结果表明,所提方法在噪声场景下显著优于常规UNMT系统。
10. UKARA 1.0 Challenge Track 1: Automatic Short-Answer Scoring in Bahasa Indonesia [PDF] 返回目录
Ali Akbar Septiandri, Yosef Ardhito Winatmoko
Abstract: We describe our third-place solution to the UKARA 1.0 challenge on automated essay scoring. The task consists of a binary classification problem on two datasets | answers from two different questions. We ended up using two different models for the two datasets. For task A, we applied a random forest algorithm on features extracted using unigram with latent semantic analysis (LSA). On the other hand, for task B, we only used logistic regression on TF-IDF features. Our model results in F1 score of 0.812.
摘要:我们描述了在UKARA 1.0自动作文评分挑战中获得第三名的解决方案。该任务由两个数据集上的二分类问题构成,数据分别来自两个不同问题的答案。我们最终针对两个数据集使用了两种不同的模型。对于任务A,我们在使用一元词与潜在语义分析(LSA)提取的特征上应用随机森林算法;对于任务B,我们仅在TF-IDF特征上使用逻辑回归。我们的模型取得了0.812的F1分数。
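Both task-specific models map directly onto standard scikit-learn components. A minimal sketch, assuming the answers and labels are available as Python lists; the feature settings (e.g. the number of LSA components) are illustrative, not the authors' exact hyperparameters.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Task A: random forest on unigram counts reduced with LSA (truncated SVD).
task_a_model = make_pipeline(
    CountVectorizer(ngram_range=(1, 1)),
    TruncatedSVD(n_components=100),            # LSA; component count is an assumption
    RandomForestClassifier(n_estimators=100, random_state=0),
)

# Task B: logistic regression directly on TF-IDF features.
task_b_model = make_pipeline(
    TfidfVectorizer(),
    LogisticRegression(max_iter=1000),
)

# Usage (with hypothetical data): task_a_model.fit(answers_a, labels_a)
#                                 task_b_model.fit(answers_b, labels_b)
```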
11. Temporal Convolutional Attention-based Network For Sequence Modeling [PDF] 返回目录
Hongyan Hao, Yan Wang, Yudi Xia, Jian Zhao, Furao Shen
Abstract: With the development of feed-forward models, the default model for sequence modeling has gradually evolved to replace recurrent networks. Many powerful feed-forward models based on convolutional networks and attention mechanism were proposed and show more potential to handle sequence modeling tasks. We wonder that is there an architecture that can not only achieve an approximate substitution of recurrent network, but also absorb the advantages of feed-forward models. So we propose an exploratory architecture referred to Temporal Convolutional Attention-based Network (TCAN) which combines temporal convolutional network and attention mechanism. TCAN includes two parts, one is Temporal Attention (TA) which captures relevant features inside the sequence, the other is Enhanced Residual (ER) which extracts shallow layer's important information and transfers to deep layers. We improve the state-of-the-art results of bpc/perplexity to 26.92 on word-level PTB, 1.043 on character-level PTB, and 6.66 on WikiText-2.
摘要:随着前馈模型的发展,序列建模的默认模型正逐渐演变为以前馈网络取代循环网络。许多基于卷积网络和注意力机制的强大前馈模型相继被提出,并在处理序列建模任务上展现出更大潜力。我们想知道是否存在一种架构,既能近似替代循环网络,又能吸收前馈模型的优点。为此,我们提出了一种探索性架构,即基于注意力的时间卷积网络(TCAN),它结合了时间卷积网络与注意力机制。TCAN包含两个部分:一是时间注意力(TA),用于捕捉序列内部的相关特征;二是增强残差(ER),用于提取浅层的重要信息并传递到深层。我们将bpc/困惑度的最先进结果分别提升至词级PTB上的26.92、字符级PTB上的1.043以及WikiText-2上的6.66。
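The Temporal Attention (TA) idea of letting each position attend only to earlier positions can be sketched with a generic causal self-attention layer in PyTorch. This is a simplified stand-in, not the authors' exact TCAN block, and the Enhanced Residual (ER) part is omitted.

```python
import torch
import torch.nn.functional as F
from torch import nn

class CausalTemporalAttention(nn.Module):
    """Self-attention in which position t may only attend to positions <= t."""
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, time, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)          # (batch, time, time)
        t = x.size(1)
        causal_mask = torch.triu(torch.ones(t, t, device=x.device), diagonal=1).bool()
        scores = scores.masked_fill(causal_mask, float("-inf"))         # hide future positions
        weights = F.softmax(scores, dim=-1)
        return weights @ v + x                                           # residual connection
```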
12. Optimizing Memory-Access Patterns for Deep Learning Accelerators [PDF] 返回目录
Hongbin Zheng, Sejong Oh, Huiqing Wang, Preston Briggs, Jiading Gai, Animesh Jain, Yizhi Liu, Rich Heaton, Randy Huang, Yida Wang
Abstract: Deep learning (DL) workloads are moving towards accelerators for faster processing and lower cost. Modern DL accelerators are good at handling the large-scale multiply-accumulate operations that dominate DL workloads; however, it is challenging to make full use of the compute power of an accelerator since the data must be properly staged in a software-managed scratchpad memory. Failing to do so can result in significant performance loss. This paper proposes a systematic approach which leverages the polyhedral model to analyze all operators of a DL model together to minimize the number of memory accesses. Experiments show that our approach can substantially reduce the impact of memory accesses required by common neural-network models on a homegrown AWS machine-learning inference chip named Inferentia, which is available through Amazon EC2 Inf1 instances.
摘要:深度学习(DL)工作负载正转向加速器,以获得更快的处理速度和更低的成本。现代DL加速器擅长处理主导DL工作负载的大规模乘加运算;然而,由于数据必须在软件管理的暂存存储器中正确编排,充分利用加速器的算力颇具挑战,否则会造成显著的性能损失。本文提出一种系统性方法,利用多面体模型对DL模型的所有算子进行整体分析,以最小化访存次数。实验表明,我们的方法可以大幅降低常见神经网络模型在AWS自研机器学习推理芯片Inferentia(可通过Amazon EC2 Inf1实例使用)上所需访存的影响。
13. RP-DNN: A Tweet level propagation context based deep neural networks for early rumor detection in Social Media [PDF] 返回目录
Jie Gao, Sooji Han, Xingyi Song, Fabio Ciravegna
Abstract: Early rumor detection (ERD) on social media platform is very challenging when limited, incomplete and noisy information is available. Most of the existing methods have largely worked on event-level detection that requires the collection of posts relevant to a specific event and relied only on user-generated content. They are not appropriate to detect rumor sources in the very early stages, before an event unfolds and becomes widespread. In this paper, we address the task of ERD at the message level. We present a novel hybrid neural network architecture, which combines a task-specific character-based bidirectional language model and stacked Long Short-Term Memory (LSTM) networks to represent textual contents and social-temporal contexts of input source tweets, for modelling propagation patterns of rumors in the early stages of their development. We apply multi-layered attention models to jointly learn attentive context embeddings over multiple context inputs. Our experiments employ a stringent leave-one-out cross-validation (LOO-CV) evaluation setup on seven publicly available real-life rumor event data sets. Our models achieve state-of-the-art(SoA) performance for detecting unseen rumors on large augmented data which covers more than 12 events and 2,967 rumors. An ablation study is conducted to understand the relative contribution of each component of our proposed model.
摘要:在信息有限、不完整且充满噪声的情况下,社交媒体平台上的早期谣言检测(ERD)极具挑战性。现有方法大多面向事件级检测,需要收集与特定事件相关的帖子,并且仅依赖用户生成内容;它们不适合在事件展开并广泛传播之前的极早期阶段检测谣言源头。本文在消息级别处理ERD任务。我们提出一种新颖的混合神经网络架构,将面向具体任务的基于字符的双向语言模型与堆叠长短期记忆(LSTM)网络相结合,用以表示源推文的文本内容及其社交-时间上下文,从而对谣言在其发展早期的传播模式进行建模。我们应用多层注意力模型,对多个上下文输入联合学习注意力加权的上下文嵌入。实验在七个公开的真实谣言事件数据集上采用严格的留一交叉验证(LOO-CV)评估设置。我们的模型在覆盖超过12个事件、2967条谣言的大规模增强数据上检测未见过的谣言,取得了最先进(SoA)的性能。我们还进行了消融研究,以了解所提模型各组成部分的相对贡献。
14. A multi-layer approach to disinformation detection on Twitter [PDF] 返回目录
Francesco Pierri, Carlo Piccardi, Stefano Ceri
Abstract: We tackle the problem of classifying news articles pertaining to disinformation vs mainstream news by solely inspecting their diffusion mechanisms on Twitter. Our technique is inherently simple compared to existing text-based approaches, as it allows to by-pass the multiple levels of complexity which are found in news content (e.g. grammar, syntax, style). We employ a multi-layer representation of Twitter diffusion networks, and we compute for each layer a set of global network features which quantify different aspects of the sharing process. Experimental results with two large-scale datasets, corresponding to diffusion cascades of news shared respectively in the United States and Italy, show that a simple Logistic Regression model is able to classify disinformation vs mainstream networks with high accuracy (AUROC up to 94%), also when considering the political bias of different sources in the classification task. We also highlight differences in the sharing patterns of the two news domains which appear to be country-independent. We believe that our network-based approach provides useful insights which pave the way to the future development of a system to detect misleading and harmful information spreading on social media.
摘要:我们仅通过考察新闻文章在Twitter上的扩散机制,来解决将其区分为虚假信息或主流新闻的分类问题。与现有基于文本的方法相比,我们的技术本质上更简单,因为它绕过了新闻内容中存在的多层复杂性(例如语法、句法、文风)。我们采用Twitter扩散网络的多层表示,并为每一层计算一组量化分享过程不同方面的全局网络特征。在两个大规模数据集(分别对应在美国和意大利传播的新闻扩散级联)上的实验结果表明,一个简单的逻辑回归模型就能够以较高的准确率区分虚假信息网络与主流新闻网络(AUROC高达94%),即使在分类任务中考虑不同来源的政治倾向时也是如此。我们还指出了两个新闻领域在分享模式上的差异,这些差异似乎与国家无关。我们相信,这种基于网络的方法提供了有用的见解,为未来开发检测社交媒体上误导性与有害信息传播的系统铺平了道路。
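The classifier itself is deliberately simple: a logistic regression over global network features computed on each layer of the diffusion network. A hedged sketch follows; the feature layout and values are purely illustrative placeholders, not the paper's feature set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: hypothetical global network features of one news cascade,
# concatenated across the layers of the multi-layer diffusion network
# (e.g. [n_nodes, max_depth, max_breadth, virality] per layer, flattened).
X = np.array([
    [120, 4, 35, 2.1,   80, 3, 22, 1.7],
    [ 15, 2,  6, 1.2,    9, 1,  5, 1.1],
])
y = np.array([1, 0])   # 1 = disinformation, 0 = mainstream

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict_proba(X)[:, 1])   # probability of being disinformation
```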
15. Exploring and Distilling Cross-Modal Information for Image Captioning [PDF] 返回目录
Fenglin Liu, Xuancheng Ren, Yuanxin Liu, Kai Lei, Xu Sun
Abstract: Recently, attention-based encoder-decoder models have been used extensively in image captioning. Yet there is still great difficulty for the current methods to achieve deep image understanding. In this work, we argue that such understanding requires visual attention to correlated image regions and semantic attention to coherent attributes of interest. To perform effective attention, we explore image captioning from a cross-modal perspective and propose the Global-and-Local Information Exploring-and-Distilling approach that explores and distills the source information in vision and language. It globally provides the aspect vector, a spatial and relational representation of images based on caption contexts, through the extraction of salient region groupings and attribute collocations, and locally extracts the fine-grained regions and attributes in reference to the aspect vector for word selection. Our fully-attentive model achieves a CIDEr score of 129.3 in offline COCO evaluation on the COCO testing set with remarkable efficiency in terms of accuracy, speed, and parameter budget.
摘要:近来,基于注意力的编码器-解码器模型在图像描述生成中被广泛使用,但现有方法要实现深层次的图像理解仍有很大困难。在这项工作中,我们认为这种理解既需要对相关图像区域的视觉注意力,也需要对相关属性的语义注意力。为了实现有效的注意力,我们从跨模态的角度研究图像描述生成,提出"全局-局部信息探索与蒸馏"方法,在视觉与语言中探索并蒸馏源信息。该方法在全局层面通过提取显著区域分组和属性搭配,给出基于描述上下文的图像空间与关系表示,即方面向量(aspect vector);在局部层面则参照方面向量提取细粒度的区域和属性,用于选词。我们的全注意力模型在COCO测试集的离线COCO评估中取得了129.3的CIDEr分数,并在准确率、速度和参数预算方面具有出色的效率。
16. Learning Directly from Grammar Compressed Text [PDF] 返回目录
Yoichi Sasaki, Kosuke Akimoto, Takanori Maehara
Abstract: Neural networks using numerous text data have been successfully applied to a variety of tasks. While massive text data is usually compressed using techniques such as grammar compression, almost all of the previous machine learning methods assume already decompressed sequence data as their input. In this paper, we propose a method to directly apply neural sequence models to text data compressed with grammar compression algorithms without decompression. To encode the unique symbols that appear in compression rules, we introduce composer modules to incrementally encode the symbols into vector representations. Through experiments on real datasets, we empirically showed that the proposal model can achieve both memory and computational efficiency while maintaining moderate performance.
摘要:使用海量文本数据的神经网络已成功应用于各类任务。虽然海量文本数据通常使用语法压缩等技术进行压缩,但以往的机器学习方法几乎都假定输入是已解压的序列数据。本文提出一种无需解压、可将神经序列模型直接应用于经语法压缩算法压缩的文本数据的方法。为了编码压缩规则中出现的特有符号,我们引入composer模块,将这些符号逐步编码为向量表示。在真实数据集上的实验表明,所提模型能够在保持中等性能的同时兼顾内存和计算效率。
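The composer module builds a vector for each symbol introduced by a compression rule from the vectors of the symbols on the rule's right-hand side. A minimal sketch, assuming binary straight-line-grammar rules; the embedding and composition network below are illustrative, not the paper's exact architecture.

```python
import torch
from torch import nn

class Composer(nn.Module):
    """Compose vectors for grammar non-terminals bottom-up over binary rules."""
    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.terminal_emb = nn.Embedding(vocab_size, dim)                 # embeddings for terminal symbols
        self.compose = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh())  # composes two child vectors

    def forward(self, terminals: torch.LongTensor, rules):
        # terminals: ids of terminal symbols; rules: list of (left, right) index pairs,
        # where indices below len(terminals) refer to terminals and larger ones to earlier rules.
        table = list(self.terminal_emb(terminals))
        for left, right in rules:
            pair = torch.cat([table[left], table[right]], dim=-1)
            table.append(self.compose(pair))        # vector for the newly introduced non-terminal
        return table[-1]                            # representation of the start symbol
```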
17. Comment Ranking Diversification in Forum Discussions [PDF] 返回目录
Curtis G. Northcutt, Kimberly A. Leon, Naichun Chen
Abstract: Viewing consumption of discussion forums with hundreds or more comments depends on ranking because most users only view top-ranked comments. When comments are ranked by an ordered score (e.g. number of replies or up-votes) without adjusting for semantic similarity of near-ranked comments, top-ranked comments are more likely to emphasize the majority opinion and incur redundancy. In this paper, we propose a top K comment diversification re-ranking model using Maximal Marginal Relevance (MMR) and evaluate its impact in three categories: (1) semantic diversity, (2) inclusion of the semantics of lower-ranked comments, and (3) redundancy, within the context of a HarvardX course discussion forum. We conducted a double-blind, small-scale evaluation experiment requiring subjects to select between the top 5 comments of a diversified ranking and a baseline ranking ordered by score. For three subjects, across 100 trials, subjects selected the diversified (75% score, 25% diversification) ranking as significantly (1) more diverse, (2) more inclusive, and (3) less redundant. Within each category, inter-rater reliability showed moderate consistency, with typical Cohen-Kappa scores near 0.2. Our findings suggest that our model improves (1) diversification, (2) inclusion, and (3) redundancy, among top K ranked comments in online discussion forums.
摘要:对于拥有数百条乃至更多评论的讨论区,浏览体验取决于排序,因为大多数用户只会查看排名靠前的评论。当评论按某种有序分数(例如回复数或点赞数)排序,而不针对排名相近评论的语义相似度进行调整时,排名靠前的评论更可能集中体现多数意见并造成冗余。本文提出一种基于最大边缘相关(MMR)的top-K评论多样化重排序模型,并从三个方面评估其影响:(1)语义多样性;(2)对排名较低评论语义的纳入;(3)冗余度。评估在一个HarvardX课程讨论区的语境中进行。我们开展了一项双盲的小规模评估实验,要求受试者在多样化排序与按分数排序的基线排序各自的前5条评论之间进行选择。三名受试者在100次试验中显著地更倾向于认为多样化排序(75%分数权重、25%多样化权重)(1)更多样、(2)更具包容性、(3)更少冗余。在每个类别内,评分者间一致性为中等水平,典型的Cohen-Kappa分数接近0.2。我们的研究结果表明,该模型改进了在线讨论区top-K排名评论的(1)多样化、(2)包容性和(3)冗余度。
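The diversified ranking relies on the classic Maximal Marginal Relevance trade-off: each step selects the comment that maximizes lam * score - (1 - lam) * (maximum similarity to the already selected comments). Below is a small self-contained sketch using TF-IDF cosine similarity as a stand-in for the semantic similarity; the paper's 75%/25% mixture corresponds to lam = 0.75.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def mmr_rerank(comments, scores, k=5, lam=0.75):
    """Return indices of the top-k comments under Maximal Marginal Relevance."""
    sim = cosine_similarity(TfidfVectorizer().fit_transform(comments))
    selected, candidates = [], list(range(len(comments)))
    while candidates and len(selected) < k:
        def mmr(i):
            # Redundancy = similarity to the closest comment already selected.
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * scores[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy usage with hypothetical comments and (normalized) ranking scores.
comments = ["Great course!", "Great course, loved it!", "The pacing felt too slow for me."]
print(mmr_rerank(comments, scores=[0.9, 0.8, 0.5], k=2))
```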
注:中文为机器翻译结果!