目录
1. European Language Grid: An Overview [PDF] 摘要
2. QRMine: A python package for triangulation in Grounded Theory [PDF] 摘要
3. Empirical Analysis of Zipf's Law, Power Law, and Lognormal Distributions in Medical Discharge Reports [PDF] 摘要
4. A Corpus of Controlled Opinionated and Knowledgeable Movie Discussions for Training Neural Conversation Models [PDF] 摘要
5. Investigating Language Impact in Bilingual Approaches for Computational Language Documentation [PDF] 摘要
6. How human judgment impairs automated deception detection performance [PDF] 摘要
7. Making Metadata Fit for Next Generation Language Technology Platforms: The Metadata Schema of the European Language Grid [PDF] 摘要
8. Learning Contextualized Sentence Representations for Document-Level Neural Machine Translation [PDF] 摘要
9. InterBERT: Vision-and-Language Interaction for Multi-modal Pretraining [PDF] 摘要
10. Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement [PDF] 摘要
11. Named Entities in Medical Case Reports: Corpus and Experiments [PDF] 摘要
12. Abstractive Summarization with Combination of Pre-trained Sequence-to-Sequence and Saliency Models [PDF] 摘要
13. Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling [PDF] 摘要
14. A Dataset of German Legal Documents for Named Entity Recognition [PDF] 摘要
15. Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining [PDF] 摘要
16. User Generated Data: Achilles' heel of BERT [PDF] 摘要
17. Orchestrating NLP Services for the Legal Domain [PDF] 摘要
18. Unsupervised feature learning for speech using correspondence and Siamese networks [PDF] 摘要
19. HIN: Hierarchical Inference Network for Document-Level Relation Extraction [PDF] 摘要
21. A Streaming On-Device End-to-End Model Surpassing Server-Side Conventional Model Quality and Latency [PDF] 摘要
23. Towards Supervised and Unsupervised Neural Machine Translation Baselines for Nigerian Pidgin [PDF] 摘要
28. BiLingUNet: Image Segmentation by Modulating Top-Down and Bottom-Up Visual Processing with Referring Expressions [PDF] 摘要
29. Predicting the Popularity of Micro-videos with Multimodal Variational Encoder-Decoder Framework [PDF] 摘要
摘要
1. European Language Grid: An Overview [PDF] 返回目录
Georg Rehm, Maria Berger, Ela Elsholz, Stefanie Hegele, Florian Kintzel, Katrin Marheinecke, Stelios Piperidis, Miltos Deligiannis, Dimitris Galanis, Katerina Gkirtzou, Penny Labropoulou, Kalina Bontcheva, David Jones, Ian Roberts, Jan Hajic, Jana Hamrlová, Lukáš Kačena, Khalid Choukri, Victoria Arranz, Andrejs Vasiļjevs, Orians Anvari, Andis Lagzdiņš, Jūlija Meļņika, Gerhard Backfried, Erinç Dikici, Miroslav Janosik, Katja Prinz, Christoph Prinz, Severin Stampler, Dorothea Thomas-Aniola, José Manuel Gómez Pérez, Andres Garcia Silva, Christian Berrío, Ulrich Germann, Steve Renals, Ondrej Klejch
Abstract: With 24 official EU and many additional languages, multilingualism in Europe and an inclusive Digital Single Market can only be enabled through Language Technologies (LTs). European LT business is dominated by hundreds of SMEs and a few large players. Many are world-class, with technologies that outperform the global players. However, European LT business is also fragmented, by nation states, languages, verticals and sectors, significantly holding back its impact. The European Language Grid (ELG) project addresses this fragmentation by establishing the ELG as the primary platform for LT in Europe. The ELG is a scalable cloud platform, providing, in an easy-to-integrate way, access to hundreds of commercial and non-commercial LTs for all European languages, including running tools and services as well as data sets and resources. Once fully operational, it will enable the commercial and non-commercial European LT community to deposit and upload their technologies and data sets into the ELG, to deploy them through the grid, and to connect with other resources. The ELG will boost the Multilingual Digital Single Market towards a thriving European LT community, creating new jobs and opportunities. Furthermore, the ELG project organises two open calls for up to 20 pilot projects. It also sets up 32 National Competence Centres (NCCs) and the European LT Council (LTC) for outreach and coordination purposes.
摘要:欧盟拥有24种官方语言及众多其他语言,欧洲的多语言环境和包容性的数字单一市场只有依靠语言技术(LT)才能实现。欧洲的语言技术产业由数百家中小企业和少数大型企业主导,其中许多企业具备世界一流水平,其技术甚至优于全球巨头。然而,欧洲语言技术产业也因国家、语言、垂直领域和行业而呈现碎片化,这显著限制了其影响力。欧洲语言网格(ELG)项目旨在通过将ELG打造为欧洲语言技术的核心平台来解决这一碎片化问题。ELG是一个可扩展的云平台,以易于集成的方式提供数百种面向所有欧洲语言的商业和非商业语言技术,包括可运行的工具和服务以及数据集和资源。全面投入运行后,它将使欧洲商业和非商业语言技术社区能够将其技术和数据集上传到ELG、通过网格部署它们,并与其他资源互联。ELG将推动多语言数字单一市场走向繁荣的欧洲语言技术生态,创造新的就业和机遇。此外,ELG项目还组织了两轮公开征集,资助多达20个试点项目,并设立了32个国家能力中心(NCC)和欧洲语言技术理事会(LTC)以开展宣传和协调工作。
2. QRMine: A python package for triangulation in Grounded Theory [PDF] 返回目录
Bell Raj Eapen, Norm Archer, Kamran Sartipi
Abstract: Grounded theory (GT) is a qualitative research method for building theory grounded in data. GT uses textual and numeric data and follows various stages of coding or tagging data for sense-making, such as open coding and selective coding. Machine Learning (ML) techniques, including natural language processing (NLP), can assist the researchers in the coding process. Triangulation is the process of combining various types of data. ML can facilitate deriving insights from numerical data for corroborating findings from the textual interview transcripts. We present an open-source python package (QRMine) that encapsulates various ML and NLP libraries to support coding and triangulation in GT. QRMine enables researchers to use these methods on their data with minimal effort. Researchers can install QRMine from the python package index (PyPI) and can contribute to its development. We believe that the concept of computational triangulation will make GT relevant in the realm of big data.
摘要:扎根理论(GT)是一种从数据中构建理论的定性研究方法。GT使用文本和数值数据,并遵循开放编码、选择性编码等多个编码(标注)阶段来理解数据。包括自然语言处理(NLP)在内的机器学习(ML)技术可以在编码过程中为研究者提供帮助。三角验证(triangulation)是整合多种类型数据的过程;ML可以帮助从数值数据中提取洞见,以佐证从访谈文本记录中得出的结论。我们提出了一个开源Python包(QRMine),它封装了多种ML和NLP库,以支持GT中的编码和三角验证。QRMine使研究者能够以极少的投入在自己的数据上使用这些方法。研究者可以从Python包索引(PyPI)安装QRMine,并参与其开发。我们相信,计算三角验证的概念将使GT在大数据时代继续保持其价值。
3. Empirical Analysis of Zipf's Law, Power Law, and Lognormal Distributions in Medical Discharge Reports [PDF] 返回目录
Juan C Quiroz, Liliana Laranjo, Catalin Tufanaru, Ahmet Baki Kocaballi, Dana Rezazadegan, Shlomo Berkovsky, Enrico Coiera
Abstract: Bayesian modelling and statistical text analysis rely on informed probability priors to encourage good solutions. This paper empirically analyses whether text in medical discharge reports follow Zipf's law, a commonly assumed statistical property of language where word frequency follows a discrete power law distribution. We examined 20,000 medical discharge reports from the MIMIC-III dataset. Methods included splitting the discharge reports into tokens, counting token frequency, fitting power law distributions to the data, and testing whether alternative distributions--lognormal, exponential, stretched exponential, and truncated power law--provided superior fits to the data. Results show that discharge reports are best fit by the truncated power law and lognormal distributions. Our findings suggest that Bayesian modelling and statistical text analysis of discharge report text would benefit from using truncated power law and lognormal probability priors.
摘要:贝叶斯建模和统计文本分析依赖于合理的先验概率来获得良好的解。本文通过实证分析考察医疗出院报告中的文本是否遵循齐普夫定律,即词频服从离散幂律分布这一常见的语言统计假设。我们考察了MIMIC-III数据集中的20,000份出院报告。方法包括将出院报告切分为词元、统计词元频率、对数据拟合幂律分布,并检验对数正态、指数、拉伸指数和截断幂律等替代分布是否能更好地拟合数据。结果表明,出院报告最适合用截断幂律分布和对数正态分布来拟合。我们的发现表明,对出院报告文本进行贝叶斯建模和统计文本分析时,使用截断幂律和对数正态先验会更为有利。
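As an illustration of the fitting methodology this abstract describes, the sketch below counts token frequencies and compares candidate distributions with the `powerlaw` Python package. It is a minimal example of the general approach under assumed preprocessing, not the authors' code.

```python
# Minimal sketch of the distribution-fitting analysis described above,
# using the `powerlaw` package (pip install powerlaw); not the authors' code.
from collections import Counter
import powerlaw

def compare_distributions(reports):
    """reports: iterable of token lists, e.g. one list per discharge report."""
    # Token frequencies across the corpus are the data the distributions are fit to.
    freqs = Counter(tok for report in reports for tok in report)
    data = list(freqs.values())

    # Fit a discrete power law; xmin is estimated automatically.
    fit = powerlaw.Fit(data, discrete=True)
    print("alpha =", fit.power_law.alpha, "xmin =", fit.xmin)

    # Pairwise comparison against the alternative distributions named in the abstract.
    for alt in ["lognormal", "exponential", "stretched_exponential", "truncated_power_law"]:
        R, p = fit.distribution_compare("power_law", alt)
        # R < 0 favours the alternative distribution; p is the significance of the comparison.
        print(f"power_law vs {alt}: R={R:.2f}, p={p:.3f}")
```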
4. A Corpus of Controlled Opinionated and Knowledgeable Movie Discussions for Training Neural Conversation Models [PDF] 返回目录
Fabian Galetzka, Chukwuemeka U. Eneh, David Schlangen
Abstract: Fully data driven Chatbots for non-goal oriented dialogues are known to suffer from inconsistent behaviour across their turns, stemming from a general difficulty in controlling parameters like their assumed background personality and knowledge of facts. One reason for this is the relative lack of labeled data from which personality consistency and fact usage could be learned together with dialogue behaviour. To address this, we introduce a new labeled dialogue dataset in the domain of movie discussions, where every dialogue is based on pre-specified facts and opinions. We thoroughly validate the collected dialogue for adherence of the participants to their given fact and opinion profile, and find that the general quality in this respect is high. This process also gives us an additional layer of annotation that is potentially useful for training models. We introduce as a baseline an end-to-end trained self-attention decoder model trained on this data and show that it is able to generate opinionated responses that are judged to be natural and knowledgeable and show attentiveness.
摘要:众所周知,用于非任务导向对话的完全数据驱动的聊天机器人在多轮对话中行为不一致,其根源在于难以控制诸如其预设的背景人格和事实知识等参数。原因之一是缺乏能够将人格一致性、事实使用与对话行为一同学习的标注数据。为此,我们在电影讨论领域引入了一个新的标注对话数据集,其中每段对话都基于预先指定的事实和观点。我们对所收集的对话进行了全面验证,检查参与者是否遵循了给定的事实和观点设定,发现这方面的总体质量很高。这一过程还为我们提供了一层额外的标注,可能对模型训练有用。我们引入了一个在该数据上端到端训练的自注意力解码器模型作为基线,并表明它能够生成被评判为自然、有知识且表现出专注度的带观点回复。
5. Investigating Language Impact in Bilingual Approaches for Computational Language Documentation [PDF] 返回目录
Marcely Zanon Boito, Aline Villavicencio, Laurent Besacier
Abstract: For endangered languages, data collection campaigns have to accommodate the challenge that many of them are from oral tradition, and producing transcriptions is costly. Therefore, it is fundamental to translate them into a widely spoken language to ensure interpretability of the recordings. In this paper we investigate how the choice of translation language affects the posterior documentation work and potential automatic approaches which will work on top of the produced bilingual corpus. For answering this question, we use the MaSS multilingual speech corpus (Boito et al., 2020) for creating 56 bilingual pairs that we apply to the task of low-resource unsupervised word segmentation and alignment. Our results highlight that the choice of language for translation influences the word segmentation performance, and that different lexicons are learned by using different aligned translations. Lastly, this paper proposes a hybrid approach for bilingual word segmentation, combining boundary clues extracted from a non-parametric Bayesian model (Goldwater et al., 2009a) with the attentional word segmentation neural model from Godard et al. (2018). Our results suggest that incorporating these clues into the neural models' input representation increases their translation and alignment quality, specially for challenging language pairs.
摘要:对于濒危语言,数据收集工作必须应对这样一个挑战:许多濒危语言源自口头传统,而制作转写的成本很高。因此,将其翻译成一种广泛使用的语言以保证录音的可解释性至关重要。本文研究翻译语言的选择如何影响后续的语言记录工作,以及建立在所产生的双语语料库之上的潜在自动化方法。为回答这一问题,我们使用MaSS多语言语音语料库(Boito et al., 2020)构建了56个双语语言对,并将其应用于低资源无监督词切分与对齐任务。我们的结果表明,翻译语言的选择会影响词切分性能,并且使用不同的对齐译文会学到不同的词表。最后,本文提出了一种双语词切分的混合方法,将从非参数贝叶斯模型(Goldwater et al., 2009a)中提取的边界线索与Godard et al. (2018)的注意力词切分神经模型相结合。我们的结果表明,将这些线索融入神经模型的输入表示可以提升其翻译和对齐质量,对于困难的语言对尤为明显。
6. How human judgment impairs automated deception detection performance [PDF] 返回目录
Bennett Kleinberg, Bruno Verschuere
Abstract: Background: Deception detection is a prevalent problem for security practitioners. With a need for more large-scale approaches, automated methods using machine learning have gained traction. However, detection performance still implies considerable error rates. Findings from other domains suggest that hybrid human-machine integrations could offer a viable path in deception detection tasks. Method: We collected a corpus of truthful and deceptive answers about participants' autobiographical intentions (n=1640) and tested whether a combination of supervised machine learning and human judgment could improve deception detection accuracy. Human judges were presented with the outcome of the automated credibility judgment of truthful and deceptive statements. They could either fully overrule it (hybrid-overrule condition) or adjust it within a given boundary (hybrid-adjust condition). Results: The data suggest that in neither of the hybrid conditions did the human judgment add a meaningful contribution. Machine learning in isolation identified truth-tellers and liars with an overall accuracy of 69%. Human involvement through hybrid-overrule decisions brought the accuracy back to the chance level. The hybrid-adjust condition did not improve deception detection performance. The decision-making strategies of humans suggest that the truth bias - the tendency to assume the other is telling the truth - could explain the detrimental effect. Conclusion: The current study does not support the notion that humans can meaningfully add to the deception detection performance of a machine learning system.
摘要:背景:欺骗检测是安全从业者面临的一个普遍问题。由于需要更大规模的方法,基于机器学习的自动化方法日益受到关注。然而,其检测性能仍然意味着相当高的错误率。其他领域的研究结果表明,人机混合的方式可能为欺骗检测任务提供一条可行的路径。方法:我们收集了一个关于参与者自述意图的真实与欺骗性回答语料库(n=1640),并检验有监督机器学习与人工判断的结合是否能够提高欺骗检测的准确率。人类评判者会看到自动系统对真实与欺骗性陈述的可信度判断结果,他们可以完全推翻该判断(hybrid-overrule条件),或在给定范围内对其进行调整(hybrid-adjust条件)。结果:数据表明,在两种混合条件下,人工判断都没有带来有意义的贡献。单独使用机器学习识别说真话者和说谎者的总体准确率为69%。通过hybrid-overrule决策引入人工参与后,准确率回落到随机水平;hybrid-adjust条件也未能提升欺骗检测性能。人类的决策策略表明,真相偏差,即倾向于假定对方说的是真话,可以解释这种不利影响。结论:本研究不支持"人类能够对机器学习系统的欺骗检测性能做出有意义补充"这一观点。
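For readers who want a feel for the machine-learning-only baseline described here, the following sketch trains a simple text classifier and reports cross-validated accuracy. The feature set and classifier are illustrative assumptions, not the authors' exact model.

```python
# Illustrative sketch of a supervised classifier for truthful vs. deceptive statements;
# the authors' exact features and model may differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

def machine_only_accuracy(texts, labels):
    """texts: list of statements; labels: 1 = truthful, 0 = deceptive."""
    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2),
                        LogisticRegression(max_iter=1000))
    # "Machine learning in isolation" estimate, analogous to the ~69% figure above.
    return cross_val_score(clf, texts, labels, cv=10, scoring="accuracy").mean()
```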
7. Making Metadata Fit for Next Generation Language Technology Platforms: The Metadata Schema of the European Language Grid [PDF] 返回目录
Penny Labropoulou, Katerina Gkirtzou, Maria Gavriilidou, Miltos Deligiannis, Dimitrios Galanis, Stelios Piperidis, Georg Rehm, Maria Berger, Valérie Mapelli, Mickaël Rigault, Victoria Arranz, Khalid Choukri, Gerhard Backfried, José Manuel Gómez Pérez, Andres Garcia Silva
Abstract: The current scientific and technological landscape is characterised by the increasing availability of data resources and processing tools and services. In this setting, metadata have emerged as a key factor facilitating management, sharing and usage of such digital assets. In this paper we present ELG-SHARE, a rich metadata schema catering for the description of Language Resources and Technologies (processing and generation services and tools, models, corpora, term lists, etc.), as well as related entities (e.g., organizations, projects, supporting documents, etc.). The schema powers the European Language Grid platform that aims to be the primary hub and marketplace for industry-relevant Language Technology in Europe. ELG-SHARE has been based on various metadata schemas, vocabularies, and ontologies, as well as related recommendations and guidelines.
摘要:当前科技格局的特点是数据资源以及处理工具和服务的可获得性不断提高。在这一背景下,元数据已成为促进此类数字资产管理、共享和使用的关键因素。本文介绍ELG-SHARE,这是一个丰富的元数据模式,用于描述语言资源与技术(处理与生成服务和工具、模型、语料库、术语表等)以及相关实体(如组织、项目、支撑文档等)。该模式支撑着欧洲语言网格平台,该平台旨在成为欧洲面向产业的语言技术的主要枢纽和市场。ELG-SHARE基于多种元数据模式、词表和本体以及相关的建议和指南构建。
8. Learning Contextualized Sentence Representations for Document-Level Neural Machine Translation [PDF] 返回目录
Pei Zhang, Xu Zhang, Wei Chen, Jian Yu, Yanfeng Wang, Deyi Xiong
Abstract: Document-level machine translation incorporates inter-sentential dependencies into the translation of a source sentence. In this paper, we propose a new framework to model cross-sentence dependencies by training neural machine translation (NMT) to predict both the target translation and surrounding sentences of a source sentence. By enforcing the NMT model to predict source context, we want the model to learn "contextualized" source sentence representations that capture document-level dependencies on the source side. We further propose two different methods to learn and integrate such contextualized sentence embeddings into NMT: a joint training method that jointly trains an NMT model with the source context prediction model and a pre-training & fine-tuning method that pretrains the source context prediction model on a large-scale monolingual document corpus and then fine-tunes it with the NMT model. Experiments on Chinese-English and English-German translation show that both methods can substantially improve the translation quality over a strong document-level Transformer baseline.
摘要:文档级机器翻译将句际依赖关系融入源句的翻译之中。本文提出一个新的框架,通过训练神经机器翻译(NMT)模型同时预测源句的目标译文及其上下文句子来建模跨句依赖。通过让NMT模型预测源端上下文,我们希望模型学习到"语境化"的源句表示,以捕捉源端的文档级依赖关系。我们进一步提出两种不同的方法来学习这种语境化句子嵌入并将其整合到NMT中:一种是联合训练方法,将NMT模型与源端上下文预测模型联合训练;另一种是预训练加微调方法,先在大规模单语文档语料上预训练源端上下文预测模型,再与NMT模型一起微调。在中英和英德翻译上的实验表明,这两种方法都能在强大的文档级Transformer基线之上显著提高翻译质量。
9. InterBERT: Vision-and-Language Interaction for Multi-modal Pretraining [PDF] 返回目录
Junyang Lin, An Yang, Yichang Zhang, Jie Liu, Jingren Zhou, Hongxia Yang
Abstract: Multi-modal pretraining for learning high-level multi-modal representation is a further step towards deep learning and artificial intelligence. In this work, we propose a novel model, namely InterBERT (BERT for Interaction), which owns strong capability of modeling interaction between the information flows of different modalities. The single-stream interaction module is capable of effectively processing information of multiple modalities, and the two-stream module on top preserves the independence of each modality to avoid performance downgrade in single-modal tasks. We pretrain the model with three pretraining tasks, including masked segment modeling (MSM), masked region modeling (MRM) and image-text matching (ITM); and finetune the model on a series of vision-and-language downstream tasks. Experimental results demonstrate that InterBERT outperforms a series of strong baselines, including the most recent multi-modal pretraining methods, and the analysis shows that MSM and MRM are effective for pretraining and our method can achieve performances comparable to BERT in single-modal tasks. Besides, we propose a large-scale dataset for multi-modal pretraining in Chinese, and we develop the Chinese InterBERT which is the first Chinese multi-modal pretrained model. We pretrain the Chinese InterBERT on our proposed dataset of 3.1M image-text pairs from the mobile Taobao, the largest Chinese e-commerce platform. We finetune the model for text-based image retrieval, and recently we deployed the model online for topic-based recommendation.
摘要:用于学习高层多模态表示的多模态预训练是迈向深度学习与人工智能的又一步。在这项工作中,我们提出了一个新模型InterBERT(BERT for Interaction),它具备对不同模态信息流之间交互进行建模的强大能力。单流交互模块能够有效处理多模态信息,而其上的双流模块保持了每个模态的独立性,以避免在单模态任务中的性能下降。我们使用三个预训练任务对模型进行预训练,包括掩码片段建模(MSM)、掩码区域建模(MRM)和图文匹配(ITM),并在一系列视觉-语言下游任务上微调模型。实验结果表明,InterBERT优于一系列强基线,包括最新的多模态预训练方法;分析表明MSM和MRM对预训练是有效的,并且我们的方法在单模态任务上可以取得与BERT相当的性能。此外,我们构建了一个用于中文多模态预训练的大规模数据集,并训练了首个中文多模态预训练模型,即中文InterBERT。我们在所提出的、来自手机淘宝(中国最大的电子商务平台)的310万图文对数据集上预训练中文InterBERT,并针对基于文本的图像检索进行微调;最近,我们已将该模型部署上线,用于基于主题的推荐。
10. Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement [PDF] 返回目录
Alireza Mohammadshahi, James Henderson
Abstract: We propose the Recursive Non-autoregressive Graph-to-graph Transformer architecture (RNG-Tr) for the iterative refinement of arbitrary graphs through the recursive application of a non-autoregressive Graph-to-Graph Transformer and apply it to syntactic dependency parsing. The Graph-to-Graph Transformer architecture of Mohammadshahi and Henderson (2019) has previously been used for autoregressive graph prediction, but here we use it to predict all edges of the graph independently, conditioned on a previous prediction of the same graph. We demonstrate the power and effectiveness of RNG-Tr on several dependency corpora, using a refinement model pre-trained with BERT (Devlin et al., 2018). We also introduce Dependency BERT (DepBERT), a non-recursive parser similar to our refinement model. RNG-Tr is able to improve the accuracy of a variety of initial parsers on 13 languages from the Universal Dependencies Treebanks and the English and Chinese Penn Treebanks, even improving over the new state-of-the-art results achieved by DepBERT, significantly improving the state-of-the-art for all corpora tested.
摘要:我们提出递归非自回归图到图Transformer架构(RNG-Tr),通过递归地应用非自回归图到图Transformer对任意图进行迭代精化,并将其应用于句法依存分析。Mohammadshahi与Henderson (2019)的图到图Transformer架构此前被用于自回归的图预测,而这里我们用它在同一个图的先前预测的条件下独立地预测图的所有边。我们使用以BERT (Devlin et al., 2018)预训练的精化模型,在多个依存语料库上展示了RNG-Tr的能力和有效性。我们还引入了Dependency BERT(DepBERT),这是一个与我们的精化模型类似的非递归解析器。RNG-Tr能够在Universal Dependencies树库以及英文和中文Penn树库的13种语言上提高多种初始解析器的准确率,甚至超过DepBERT取得的新的最先进结果,在所有测试语料库上都显著刷新了最先进水平。
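The recursive refinement loop can be summarised roughly as follows; `initial_parse` and `predict_all_edges` are hypothetical placeholder methods used only to show the control flow, not the authors' API.

```python
# Pseudostructure of recursive non-autoregressive refinement; the parser methods
# below are hypothetical placeholders, not the authors' interface.
def refine_dependency_graph(parser, sentence, num_iterations=3):
    graph = parser.initial_parse(sentence)            # any initial (possibly empty) parse
    for _ in range(num_iterations):
        # All edges are re-predicted independently, conditioned on the previous graph.
        graph = parser.predict_all_edges(sentence, previous_graph=graph)
    return graph
```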
11. Named Entities in Medical Case Reports: Corpus and Experiments [PDF] 返回目录
Sarah Schulz, Jurica Ševa, Samuel Rodriguez, Malte Ostendorff, Georg Rehm
Abstract: We present a new corpus comprising annotations of medical entities in case reports, originating from PubMed Central's open access library. In the case reports, we annotate cases, conditions, findings, factors and negation modifiers. Moreover, where applicable, we annotate relations between these entities. As such, this is the first corpus of this kind made available to the scientific community in English. It enables the initial investigation of automatic information extraction from case reports through tasks like Named Entity Recognition, Relation Extraction and (sentence/paragraph) relevance detection. Additionally, we present four strong baseline systems for the detection of medical entities made available through the annotated dataset.
摘要:我们提出了一个新语料库,包含对病例报告中医学实体的标注,数据来自PubMed Central的开放获取文献库。在病例报告中,我们标注了病例、病症、检查所见、影响因素和否定修饰语;此外,在适用时我们还标注了这些实体之间的关系。据我们所知,这是首个向科学界公开的此类英文语料库。它使得通过命名实体识别、关系抽取和(句子/段落)相关性检测等任务对病例报告进行自动信息抽取的初步研究成为可能。此外,我们还基于该标注数据集提供了四个用于医学实体检测的强基线系统。
12. Abstractive Summarization with Combination of Pre-trained Sequence-to-Sequence and Saliency Models [PDF] 返回目录
Itsumi Saito, Kyosuke Nishida, Kosuke Nishida, Junji Tomita
Abstract: Pre-trained sequence-to-sequence (seq-to-seq) models have significantly improved the accuracy of several language generation tasks, including abstractive summarization. Although the fluency of abstractive summarization has been greatly improved by fine-tuning these models, it is not clear whether they can also identify the important parts of the source text to be included in the summary. In this study, we investigated the effectiveness of combining saliency models that identify the important parts of the source text with the pre-trained seq-to-seq models through extensive experiments. We also proposed a new combination model consisting of a saliency model that extracts a token sequence from a source text and a seq-to-seq model that takes the sequence as an additional input text. Experimental results showed that most of the combination models outperformed a simple fine-tuned seq-to-seq model on both the CNN/DM and XSum datasets even if the seq-to-seq model is pre-trained on large-scale corpora. Moreover, for the CNN/DM dataset, the proposed combination model exceeded the previous best-performed model by 1.33 points on ROUGE-L.
摘要:预训练的序列到序列(seq-to-seq)模型已显著提高了包括生成式摘要在内的多种语言生成任务的准确率。尽管通过微调这些模型,生成式摘要的流畅度得到了很大提升,但尚不清楚它们是否也能识别源文本中应纳入摘要的重要部分。在本研究中,我们通过大量实验考察了将识别源文本重要部分的显著性模型与预训练seq-to-seq模型相结合的有效性。我们还提出了一种新的组合模型,它由一个从源文本中抽取词元序列的显著性模型和一个将该序列作为附加输入文本的seq-to-seq模型组成。实验结果表明,即使seq-to-seq模型已在大规模语料上进行了预训练,大多数组合模型在CNN/DM和XSum两个数据集上仍优于简单微调的seq-to-seq模型。此外,在CNN/DM数据集上,所提出的组合模型在ROUGE-L上比此前表现最好的模型高出1.33分。
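A minimal sketch of the combination idea, under the assumption that the saliency model yields one score per source token and that the salient tokens are appended to the source as additional input text; the helper below is for illustration only, not the authors' implementation.

```python
# Sketch of the "extract-then-generate" combination described above;
# the interface between saliency model and seq-to-seq model is assumed.
def combine_source_and_salient(source_tokens, saliency_scores, top_k=64, sep="[SEP]"):
    """source_tokens: list of tokens; saliency_scores: one score per token."""
    ranked = sorted(range(len(source_tokens)), key=lambda i: -saliency_scores[i])
    salient = [source_tokens[i] for i in sorted(ranked[:top_k])]  # keep original order
    # The seq-to-seq model then consumes: source text, separator, salient token sequence.
    return source_tokens + [sep] + salient
```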
13. Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling [PDF] 返回目录
Dmitrii Aksenov, Julián Moreno-Schneider, Peter Bourgonje, Robert Schwarzenberg, Leonhard Hennig, Georg Rehm
Abstract: We explore to what extent knowledge about the pre-trained language model that is used is beneficial for the task of abstractive summarization. To this end, we experiment with conditioning the encoder and decoder of a Transformer-based neural model on the BERT language model. In addition, we propose a new method of BERT-windowing, which allows chunk-wise processing of texts longer than the BERT window size. We also explore how locality modelling, i.e., the explicit restriction of calculations to the local context, can affect the summarization ability of the Transformer. This is done by introducing 2-dimensional convolutional self-attention into the first layers of the encoder. The results of our models are compared to a baseline and the state-of-the-art models on the CNN/Daily Mail dataset. We additionally train our model on the SwissText dataset to demonstrate usability on German. Both models outperform the baseline in ROUGE scores on two datasets and show its superiority in a manual qualitative analysis.
摘要:我们探究关于所使用的预训练语言模型的知识在多大程度上有益于生成式摘要任务。为此,我们尝试用BERT语言模型为基于Transformer的神经模型的编码器和解码器提供条件。此外,我们提出了一种新的BERT分窗(BERT-windowing)方法,可对长于BERT窗口大小的文本进行分块处理。我们还探讨了局部性建模(即将计算显式限制在局部上下文中)如何影响Transformer的摘要能力,具体做法是在编码器的前几层引入二维卷积自注意力。我们将模型结果与CNN/Daily Mail数据集上的基线及最先进模型进行了比较,并额外在SwissText数据集上训练模型以展示其在德语上的可用性。两个模型在两个数据集上的ROUGE分数均超过基线,并在人工定性分析中展现了优势。
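The chunk-wise processing of long inputs might look roughly like the sketch below, which splits a tokenised text into overlapping BERT-sized windows. This is one plausible reading of the windowing step; the stride and the merging of per-chunk outputs are assumptions rather than the paper's exact scheme.

```python
# A minimal sketch of chunk-wise "windowing" for texts longer than the BERT input
# size; the paper's exact BERT-windowing scheme may differ.
from transformers import AutoTokenizer

def window_token_ids(text, model_name="bert-base-uncased", window=512, stride=256):
    tok = AutoTokenizer.from_pretrained(model_name)
    ids = tok.encode(text, add_special_tokens=False)
    body = window - 2  # leave room for [CLS] and [SEP] in every chunk
    chunks = []
    for start in range(0, max(len(ids), 1), stride):
        piece = ids[start:start + body]
        if not piece:
            break
        chunks.append([tok.cls_token_id] + piece + [tok.sep_token_id])
        if start + body >= len(ids):
            break
    return chunks  # each chunk can be encoded by BERT and the outputs merged downstream
```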
14. A Dataset of German Legal Documents for Named Entity Recognition [PDF] 返回目录
Elena Leitner, Georg Rehm, Julián Moreno-Schneider
Abstract: We describe a dataset developed for Named Entity Recognition in German federal court decisions. It consists of approx. 67,000 sentences with over 2 million tokens. The resource contains 54,000 manually annotated entities, mapped to 19 fine-grained semantic classes: person, judge, lawyer, country, city, street, landscape, organization, company, institution, court, brand, law, ordinance, European legal norm, regulation, contract, court decision, and legal literature. The legal documents were, furthermore, automatically annotated with more than 35,000 TimeML-based time expressions. The dataset, which is available under a CC-BY 4.0 license in the CoNLL-2002 format, was developed for training an NER service for German legal documents in the EU project Lynx.
摘要:我们介绍一个为德国联邦法院判决书命名实体识别而构建的数据集。它包含约67,000个句子、超过200万个词元。该资源包含54,000个人工标注实体,映射到19个细粒度语义类别:人物、法官、律师、国家、城市、街道、地貌、组织、公司、机构、法院、品牌、法律、法规、欧洲法律规范、条例、合同、法院判决和法律文献。此外,这些法律文书还自动标注了超过35,000个基于TimeML的时间表达式。该数据集以CoNLL-2002格式按CC-BY 4.0许可发布,是为欧盟Lynx项目中面向德语法律文书的NER服务的训练而开发的。
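Since the dataset is released in the CoNLL-2002 format, a minimal loader could look like the sketch below. It assumes the usual layout of one token per line with the NER tag in the last column and blank lines between sentences; the released files may differ in detail.

```python
# Sketch of loading a CoNLL-2002-style file (token and tag per line, blank line
# between sentences); the exact column layout of the released dataset is assumed.
def read_conll(path):
    sentences, tokens, tags = [], [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:                      # sentence boundary
                if tokens:
                    sentences.append((tokens, tags))
                    tokens, tags = [], []
                continue
            parts = line.split()
            tokens.append(parts[0])
            tags.append(parts[-1])            # NER tag is the last column
    if tokens:
        sentences.append((tokens, tags))
    return sentences
```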
15. Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining [PDF] 返回目录
Chengyu Wang, Minghui Qiu, Jun Huang, Xiaofeng He
Abstract: Pre-trained neural language models bring significant improvement for various NLP tasks, by fine-tuning the models on task-specific training sets. During fine-tuning, the parameters are initialized from pre-trained models directly, which ignores how the learning process of similar NLP tasks in different domains is correlated and mutually reinforced. In this paper, we propose an effective learning procedure named Meta Fine-Tuning (MFT), served as a meta-learner to solve a group of similar NLP tasks for neural language models. Instead of simply multi-task training over all the datasets, MFT only learns from typical instances of various domains to acquire highly transferable knowledge. It further encourages the language model to encode domain-invariant representations by optimizing a series of novel domain corruption loss functions. After MFT, the model can be fine-tuned for each domain with better parameter initializations and higher generalization ability. We implement MFT upon BERT to solve several multi-domain text mining tasks. Experimental results confirm the effectiveness of MFT and its usefulness for few-shot learning.
摘要:预训练神经语言模型通过在特定任务的训练集上微调,为各种NLP任务带来了显著提升。在微调过程中,参数直接由预训练模型初始化,这忽略了不同领域中相似NLP任务的学习过程是如何相互关联、相互促进的。本文提出一种名为元微调(Meta Fine-Tuning, MFT)的有效学习方法,作为元学习器为神经语言模型求解一组相似的NLP任务。MFT并非简单地在所有数据集上进行多任务训练,而是只从各个领域的典型实例中学习,以获取高度可迁移的知识。它还通过优化一系列新颖的领域损坏损失函数,促使语言模型编码领域不变的表示。经过MFT后,模型可以在每个领域上以更好的参数初始化和更强的泛化能力进行微调。我们在BERT之上实现了MFT,用于求解多个多领域文本挖掘任务。实验结果证实了MFT的有效性及其对少样本学习的作用。
16. User Generated Data: Achilles' heel of BERT [PDF] 返回目录
Ankit Kumar, Piyush Makhija, Anuj Gupta
Abstract: Pre-trained language models such as BERT are known to perform exceedingly well on various NLP tasks and have even established new State-Of-The-Art (SOTA) benchmarks for many of these tasks. Owing to its success on various tasks and benchmark datasets, industry practitioners have started to explore BERT to build applications solving industry use cases. These use cases are known to have much more noise in the data as compared to benchmark datasets. In this work we systematically show that when the data is noisy, there is a significant degradation in the performance of BERT. Specifically, we performed experiments using BERT on popular tasks such as sentiment analysis and textual similarity. For this we work with three well known datasets - IMDB movie reviews, SST-2 and STS-B to measure the performance. Further, we examine the reason behind this performance drop and identify the shortcomings in the BERT pipeline.
摘要:众所周知,BERT等预训练语言模型在各种NLP任务上表现极佳,甚至为其中许多任务建立了新的最先进(SOTA)基准。鉴于其在各类任务和基准数据集上的成功,业界从业者已开始探索用BERT构建解决行业用例的应用。与基准数据集相比,这些用例中的数据通常含有更多噪声。在这项工作中,我们系统地表明,当数据存在噪声时,BERT的性能会显著下降。具体而言,我们使用BERT在情感分析和文本相似度等常见任务上进行了实验,使用IMDB电影评论、SST-2和STS-B这三个知名数据集来衡量性能。此外,我们分析了这种性能下降背后的原因,并指出了BERT流程中的不足。
17. Orchestrating NLP Services for the Legal Domain [PDF] 返回目录
Julián Moreno-Schneider, Georg Rehm, Elena Montiel-Ponsoda, Víctor Rodriguez-Doncel, Artem Revenko, Sotirios Karampatakis, Maria Khvalchik, Christian Sageder, Jorge Gracia, Filippo Maganza
Abstract: Legal technology is currently receiving a lot of attention from various angles. In this contribution we describe the main technical components of a system that is currently under development in the European innovation project Lynx, which includes partners from industry and research. The key contribution of this paper is a workflow manager that enables the flexible orchestration of workflows based on a portfolio of Natural Language Processing and Content Curation services as well as a Multilingual Legal Knowledge Graph that contains semantic information and meaningful references to legal documents. We also describe different use cases with which we experiment and develop prototypical solutions.
摘要:法律科技目前正从各个角度受到广泛关注。本文介绍了一个目前正在欧洲创新项目Lynx(其合作方包括来自产业界和研究界的伙伴)中开发的系统的主要技术组件。本文的主要贡献是一个工作流管理器,它能够基于一组自然语言处理与内容策展服务,以及一个包含语义信息和指向法律文书的有意义引用的多语言法律知识图谱,对工作流进行灵活编排。我们还描述了我们用于实验和开发原型解决方案的不同用例。
18. Unsupervised feature learning for speech using correspondence and Siamese networks [PDF] 返回目录
Petri-Johan Last, Herman A. Engelbrecht, Herman Kamper
Abstract: In zero-resource settings where transcribed speech audio is unavailable, unsupervised feature learning is essential for downstream speech processing tasks. Here we compare two recent methods for frame-level acoustic feature learning. For both methods, unsupervised term discovery is used to find pairs of word examples of the same unknown type. Dynamic programming is then used to align the feature frames between each word pair, serving as weak top-down supervision for the two models. For the correspondence autoencoder (CAE), matching frames are presented as input-output pairs. The Triamese network uses a contrastive loss to reduce the distance between frames of the same predicted word type while increasing the distance between negative examples. For the first time, these feature extractors are compared on the same discrimination tasks using the same weak supervision pairs. We find that, on the two datasets considered here, the CAE outperforms the Triamese network. However, we show that a new hybrid correspondence-Triamese approach (CTriamese), consistently outperforms both the CAE and Triamese models in terms of average precision and ABX error rates on both English and Xitsonga evaluation data.
摘要:在没有转写语音音频的零资源场景下,无监督特征学习对下游语音处理任务至关重要。本文比较了两种新近的帧级声学特征学习方法。两种方法都使用无监督词项发现来找出同一未知类型词语的样例对,然后用动态规划对每个词对之间的特征帧进行对齐,作为两个模型的弱自顶向下监督。对于对应自编码器(CAE),匹配的帧被作为输入-输出对;Triamese网络则使用对比损失来缩小同一预测词类型的帧之间的距离,同时增大与负例之间的距离。我们首次使用相同的弱监督样例对,在相同的判别任务上比较了这些特征提取器。我们发现,在本文考察的两个数据集上,CAE优于Triamese网络。不过,我们还表明,一种新的对应-Triamese混合方法(CTriamese)在英语和Xitsonga评测数据上的平均精度和ABX错误率方面均持续优于CAE和Triamese模型。
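To make the contrastive objective concrete, the sketch below shows one common margin-based formulation over frame embeddings; the paper's Triamese loss may differ in its exact form.

```python
# One common margin-based contrastive loss over frame embeddings, as a sketch of the
# idea described above; the paper's Triamese loss may be defined differently.
import torch
import torch.nn.functional as F

def contrastive_frame_loss(anchor, positive, negative, margin=0.5):
    """anchor/positive: frames aligned to the same predicted word type;
    negative: frames from a different word type; all of shape (batch, dim)."""
    pos_dist = F.pairwise_distance(anchor, positive)
    neg_dist = F.pairwise_distance(anchor, negative)
    # Pull matched frames together, push negatives at least `margin` further apart.
    return torch.clamp(pos_dist - neg_dist + margin, min=0.0).mean()
```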
19. HIN: Hierarchical Inference Network for Document-Level Relation Extraction [PDF] 返回目录
Hengzhu Tang, Yanan Cao, Zhenyu Zhang, Jiangxia Cao, Fang Fang, Shi Wang, Pengfei Yin
Abstract: Document-level RE requires reading, inferring and aggregating over multiple sentences. From our point of view, it is necessary for document-level RE to take advantage of multi-granularity inference information: entity level, sentence level and document level. Thus, how to obtain and aggregate the inference information with different granularity is challenging for document-level RE, which has not been considered by previous work. In this paper, we propose a Hierarchical Inference Network (HIN) to make full use of the abundant information from entity level, sentence level and document level. Translation constraint and bilinear transformation are applied to target entity pair in multiple subspaces to get entity-level inference information. Next, we model the inference between entity-level information and sentence representation to achieve sentence-level inference information. Finally, a hierarchical aggregation approach is adopted to obtain the document-level inference information. In this way, our model can effectively aggregate inference information from these three different granularities. Experimental results show that our method achieves state-of-the-art performance on the large-scale DocRED dataset. We also demonstrate that using BERT representations can further substantially boost the performance.
摘要:文档级RE需要阅读,推断和聚集在多个句子。从我们的角度来看,有必要对文档级RE充分利用多粒度推理信息:实体层面,句子层面和文档级。因此,如何获取和汇总不同粒度的推断信息文档级RE,尚未考虑由以前的工作挑战。在本文中,我们提出了一个层次推理网络(HIN),以充分利用从实体层面,句子层面和文档级别的丰富信息。翻译约束和双线性变换应用到目标实体对多个子空间得到公司层面的推理信息。接下来,我们模型实体层面的信息以及句子表示实现语句级推断信息的推断。最后,分级聚合方法被采用,以获得文档级推断信息。这样一来,我们的模型可以从三个不同的粒度有效地聚集推断信息。实验结果表明,我们的方法实现对大规模数据集DocRED国家的最先进的性能。我们还表明,使用BERT表示可以进一步大幅提升性能。
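A hedged sketch of the entity-level inference step the abstract mentions (a translation constraint plus a bilinear transformation applied to the target entity pair in multiple subspaces); module names, shapes, and the number of subspaces are illustrative assumptions:

```python
import torch
import torch.nn as nn

class EntityLevelInference(nn.Module):
    """Illustrative entity-level inference: translation-style and bilinear terms per subspace."""
    def __init__(self, dim, k_subspaces=4):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(dim, dim) for _ in range(k_subspaces)])
        self.bilinear = nn.ModuleList([nn.Bilinear(dim, dim, dim) for _ in range(k_subspaces)])

    def forward(self, head, tail):          # head, tail: [B, dim] entity representations
        feats = []
        for proj, bil in zip(self.proj, self.bilinear):
            h, t = proj(head), proj(tail)
            feats.append(torch.cat([t - h, bil(h, t)], dim=-1))  # translation + bilinear
        return torch.cat(feats, dim=-1)      # entity-level inference information
```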
20. Variational Transformers for Diverse Response Generation [PDF] 返回目录
Zhaojiang Lin, Genta Indra Winata, Peng Xu, Zihan Liu, Pascale Fung
Abstract: Despite the great promise of Transformers in many sequence modeling tasks (e.g., machine translation), their deterministic nature hinders them from generalizing to high entropy tasks such as dialogue response generation. Previous work proposes to capture the variability of dialogue responses with a recurrent neural network (RNN)-based conditional variational autoencoder (CVAE). However, the autoregressive computation of the RNN limits the training efficiency. Therefore, we propose the Variational Transformer (VT), a variational self-attentive feed-forward sequence model. The VT combines the parallelizability and global receptive field of the Transformer with the variational nature of the CVAE by incorporating stochastic latent variables into Transformers. We explore two types of the VT: 1) modeling the discourse-level diversity with a global latent variable; and 2) augmenting the Transformer decoder with a sequence of fine-grained latent variables. Then, the proposed models are evaluated on three conversational datasets with both automatic metric and human evaluation. The experimental results show that our models improve standard Transformers and other baselines in terms of diversity, semantic relevance, and human judgment.
摘要:尽管变压器在许多序列建模任务(例如,机器翻译)的巨大潜力,从推广到高熵的任务,例如对话响应生成的确定性特点阻碍了他们。以往的工作提出了捕捉对话响应的变异与回归神经网络(RNN)为基础的条件变的自动编码(CVAE)。然而,RNN的自回归计算限制了训练效率。因此,我们提出变变压器(VT),一个变自周到的前馈序列模型。该VT通过将随机潜在变量到变形金刚结合了CVAE的性质变变压器的并行性和全球性感受野。我们探索两种类型VT的:1)建模与全球潜变量语篇层次多样性;和2)增强与细粒潜变量序列的变压器解码器。然后,所提出的模型在三个会话数据集进行评估既自动公制和人工评估。实验结果表明,我们的模型提高多样性,语义关联,和人的判断方面标准变压器和其他基线。
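A hedged sketch of the global-latent-variable variant described in the abstract: a posterior network yields a Gaussian, a latent code is sampled with the reparameterization trick and projected back into the decoder states, and the KL term enters the loss; layer names and the injection point are assumptions:

```python
import torch
import torch.nn as nn

class GlobalLatent(nn.Module):
    """Illustrative global latent variable for a CVAE-style Transformer decoder."""
    def __init__(self, d_model, d_latent):
        super().__init__()
        self.to_stats = nn.Linear(d_model, 2 * d_latent)
        self.to_decoder = nn.Linear(d_latent, d_model)

    def forward(self, context):              # context: pooled encoder state, [B, d_model]
        mu, logvar = self.to_stats(context).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return self.to_decoder(z), kl         # added to decoder states; KL joins the training loss
```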
21. A Streaming On-Device End-to-End Model Surpassing Server-Side Conventional Model Quality and Latency [PDF] 返回目录
Tara N. Sainath, Yanzhang He, Bo Li, Arun Narayanan, Ruoming Pang, Antoine Bruguier, Shuo-yiin Chang, Wei Li, Raziel Alvarez, Zhifeng Chen, Chung-Cheng Chiu, David Garcia, Alex Gruenstein, Ke Hu, Minho Jin, Anjuli Kannan, Qiao Liang, Ian McGraw, Cal Peyser, Rohit Prabhavalkar, Golan Pundak, David Rybach, Yuan Shangguan, Yash Sheth, Trevor Strohman, Mirko Visontai, Yonghui Wu, Yu Zhang, Ding Zhao
Abstract: Thus far, end-to-end (E2E) models have not been shown to outperform state-of-the-art conventional models with respect to both quality, i.e., word error rate (WER), and latency, i.e., the time the hypothesis is finalized after the user stops speaking. In this paper, we develop a first-pass Recurrent Neural Network Transducer (RNN-T) model and a second-pass Listen, Attend, Spell (LAS) rescorer that surpasses a conventional model in both quality and latency. On the quality side, we incorporate a large number of utterances across varied domains to increase acoustic diversity and the vocabulary seen by the model. We also train with accented English speech to make the model more robust to different pronunciations. In addition, given the increased amount of training data, we explore a varied learning rate schedule. On the latency front, we explore using the end-of-sentence decision emitted by the RNN-T model to close the microphone, and also introduce various optimizations to improve the speed of LAS rescoring. Overall, we find that RNN-T+LAS offers a better WER and latency tradeoff compared to a conventional model. For example, for the same latency, RNN-T+LAS obtains an 8% relative improvement in WER, while being more than 400-times smaller in model size.
摘要:迄今为止,端至端(E2E)模型已未显示出国家的最先进的跑赢常规型号相对于两个质量,即,误字率(WER)和等待时间,即,时间的设定是在用户完成后停止说话。在本文中,我们开发了第一通递归神经网络传感器(RNN-T)模型和第二遍听,参加,这超越了质量和延迟的传统模式法术(LAS)rescorer。在质量方面,我们整合了大量的话语跨变化领域,以增加声音的多样性,并通过模型看到的词汇。我们也有口音的英语语音训练,使模型更加坚固,以不同的发音。此外,由于训练数据量的增加,我们探索出变化率的学习进度。在延迟方面,我们探索使用由RNN-T模型发出的结束句决定关闭麦克风,同时也介绍了各种优化,以提高LAS再评分的速度。总体而言,我们发现,相比于传统的模型RNN-T + LAS提供了更好的WER和延迟权衡。例如,对于相同的等待时间,RNN-T + LAS获得在WER一个8%的相对改善,同时在模型尺寸超过400倍。
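A hedged sketch of the two-pass decoding flow described in the abstract: a streaming first pass produces an n-best list (and an end-of-sentence decision), then the LAS rescorer re-ranks the hypotheses. All method names and the interpolation weight below are hypothetical, not the authors' API:

```python
def two_pass_decode(rnnt, las_rescorer, audio_frames, beam_size=8, lam=0.5):
    """Illustrative two-pass flow: streaming RNN-T first pass, LAS rescoring second pass."""
    hyps = rnnt.stream_beam_search(audio_frames, beam_size)   # hypothetical streaming API
    # Re-rank the first-pass n-best list by interpolating first- and second-pass scores.
    rescored = [(lam * h.rnnt_logprob + (1 - lam) * las_rescorer.score(audio_frames, h.text),
                 h.text) for h in hyps]
    return max(rescored)[1]
```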
22. Serialized Output Training for End-to-End Overlapped Speech Recognition [PDF] 返回目录
Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka
Abstract: This paper proposes serialized output training (SOT), a novel framework for multi-speaker overlapped speech recognition based on an attention-based encoder-decoder approach. Instead of having multiple output layers as with the permutation invariant training (PIT), SOT uses a model with only one output layer that generates the transcriptions of multiple speakers one after another. The attention and decoder modules take care of producing multiple transcriptions from overlapped speech. SOT has two advantages over PIT: (1) no limitation in the maximum number of speakers, and (2) an ability to model the dependencies among outputs for different speakers. We also propose a simple trick to reduce the complexity of processing each training sample from $O(S!)$ to $O(1)$, where $S$ is the number of the speakers in the training sample, by using the start times of the constituent source utterances. Experimental results on LibriSpeech corpus show that the SOT models can transcribe overlapped speech with variable numbers of speakers significantly better than PIT-based models. We also show that the SOT models can accurately count the number of speakers in the input audio.
摘要:提出串行化输出训练(SOT),基于一个基于注意机制的编码器 - 解码器的方法对多扬声器重叠语音识别的新框架。代替具有多个输出的层作为与排列不变训练(PIT)的,SOT使用具有仅一个,其生成多个扬声器的转录陆续输出层的模型。注意和解码器模块需要从重叠的讲话产生多种转录的照顾。 SOT具有超过PIT两个优点:(1)发言者的最大数目没有限制,和(2)的依赖关系不同的扬声器输出中进行建模的能力。我们还提出了一个简单的技巧,以减少距离O $处理每个训练样本的复杂性(S!)$到$ O(1)$,其中$ S $是训练样本中扬声器的数量,通过使用开始构成源话语倍。在LibriSpeech语料的实验结果表明该模型SOT可以录制语音重叠带扬声器的可变数目显著优于基于PIT-模型。我们还表明,SOT模型可以精确计算输入音频扬声器的数量。
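A minimal sketch of how an SOT training target can be serialized as the abstract describes: the speakers' transcripts are ordered by the start times of their utterances and joined with a speaker-change token, which avoids the permutation search of PIT (the token name is an assumption):

```python
def serialize_transcripts(utterances, sc_token="<sc>"):
    """Build one SOT target by ordering speakers by start time (illustrative).

    utterances: list of (start_time, transcript) pairs for the speakers in a mixture.
    """
    ordered = sorted(utterances, key=lambda u: u[0])
    return f" {sc_token} ".join(text for _, text in ordered)

# Example with two overlapped speakers:
target = serialize_transcripts([(1.2, "how are you"), (0.3, "good morning")])
# -> "good morning <sc> how are you"
```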
23. Towards Supervised and Unsupervised Neural Machine Translation Baselines for Nigerian Pidgin [PDF] 返回目录
Orevaoghene Ahia, Kelechi Ogueji
Abstract: Nigerian Pidgin is arguably the most widely spoken language in Nigeria. Variants of this language are also spoken across West and Central Africa, making it a very important language. This work aims to establish supervised and unsupervised neural machine translation (NMT) baselines between English and Nigerian Pidgin. We implement and compare NMT models with different tokenization methods, creating a solid foundation for future works.
摘要:尼日利亚洋泾浜可以说是在尼日利亚最广泛的语言。这种语言的变体也讲西非和中非跨越,使其成为一个非常重要的语言。这项工作的目的是建立监督和英语与尼日利亚之间的Pidgin无监督神经机器翻译(NMT)基线。我们实施和使用不同的标记化方法进行了比较NMT模型,为未来的工作奠定坚实基础。
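As one example of the tokenization methods the abstract compares, a hedged sketch of training a BPE subword model with SentencePiece; the file names, vocabulary size, and example sentence are illustrative:

```python
import sentencepiece as spm

# Train a small BPE subword model on Nigerian Pidgin text (paths and sizes are illustrative).
spm.SentencePieceTrainer.train(
    input="pidgin_train.txt", model_prefix="pidgin_bpe",
    vocab_size=4000, model_type="bpe")

sp = spm.SentencePieceProcessor(model_file="pidgin_bpe.model")
print(sp.encode("How you dey?", out_type=str))   # subword tokens fed to the NMT model
```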
24. TREC CAsT 2019: The Conversational Assistance Track Overview [PDF] 返回目录
Jeffrey Dalton, Chenyan Xiong, Jamie Callan
Abstract: The Conversational Assistance Track (CAsT) is a new track for TREC 2019 to facilitate Conversational Information Seeking (CIS) research and to create a large-scale reusable test collection for conversational search systems. The document corpus is 38,426,252 passages from the TREC Complex Answer Retrieval (CAR) and Microsoft MAchine Reading COmprehension (MARCO) datasets. Eighty information seeking dialogues (30 train, 50 test) are an average of 9 to 10 questions long. Relevance assessments are provided for 30 training topics and 20 test topics. This year 21 groups submitted a total of 65 runs using varying methods for conversational query understanding and ranking. Methods include traditional retrieval based methods, feature based learning-to-rank, neural models, and knowledge enhanced methods. A common theme through the runs is the use of BERT-based neural reranking methods. Leading methods also employed document expansion, conversational query expansion, and generative language models for conversational query rewriting (GPT-2). The results show a gap between automatic systems and those using the manually resolved utterances, with a 35% relative improvement of manual rewrites over the best automatic system.
摘要:会话援助轨道(CAST)是TREC 2019一条新的轨道,以促进对话信息搜寻(CIS)的研究和建立一个大型的可重用的测试收集对话搜索系统。文档语料库是从TREC复杂的答案检索(CAR)和Microsoft机阅读理解(MARCO)数据集38426252个通道。八十信息寻求对话(30火车,50个测试)是9至10个问题的平均长。提供了30个培训主题和20个测试题目相关的评估。今年21组共使用会话查询理解和排名变化的方法65个运行的提交。方法包括传统的基于检索方法,基于学习到秩,神经模型和知识的增强方法的特征。通过运行一个共同主题是使用基于BERT神经重新排名的方法。领导方式和方法也使用文档的扩展,对话查询扩展,并为对话查询重写(GPT-2)生成的语言模型。结果表明自动系统和使用该手动解决话语,具有手动重写在最佳自动系统中的35%的相对改善的那些之间的间隙。
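A hedged sketch of GPT-2-based conversational query rewriting with the Hugging Face transformers API, of the kind the abstract says leading runs used; it assumes a GPT-2 model fine-tuned for rewriting, and the prompt separator format is an assumption:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")          # in practice, a rewriting-finetuned checkpoint
model = GPT2LMHeadModel.from_pretrained("gpt2")

history = ["What is throat cancer?", "Is it treatable?"]
raw_query = "What about lung cancer?"
prompt = " ||| ".join(history + [raw_query])          # separator format is an assumption

ids = tok.encode(prompt, return_tensors="pt")
out = model.generate(ids, max_new_tokens=20, do_sample=False,
                     pad_token_id=tok.eos_token_id)
print(tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True))  # self-contained rewrite
```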
25. Using News Articles and Financial Data to predict the likelihood of bankruptcy [PDF] 返回目录
Michael Filletti, Aaron Grech
Abstract: Over the past decade, millions of companies have filed for bankruptcy. This has been caused by a plethora of reasons, including high interest rates, heavy debts and government regulations. The effect of a company going bankrupt can be devastating, hurting not only workers and shareholders, but also clients, suppliers and any related external companies. One of the aims of this paper is to provide a framework for predicting company bankruptcy by making use of financial figures, provided by our external dataset, in conjunction with the sentiment of news articles about certain sectors. News articles are used to attempt to quantify the sentiment on a company and its sector from an external perspective, rather than simply using internal figures. This work builds on previous studies carried out by multiple researchers, to bring us closer to lessening the impact of such events.
摘要:在过去的十年里,数以百万计的公司已经申请破产。这已通过的原因,即,高利率,沉重的债务和政府规章过多引起的。公司破产的影响可能是毁灭性的,伤害不仅是工人和股东,也是客户,供应商以及任何相关的外部公司。一本文的目的之一是为公司的破产通过利用财务数据,通过我们的外部数据集提供,连同有关某些部门的新闻文章的情绪预测的框架。新闻文章来试图从外部视角量化对公司的感情和它的部门,而不是简单地使用内部数据。这项工作建立在以前的研究中进行的多次研究,以使我们更接近减轻此类事件的影响。
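A minimal sketch of the overall idea, assuming hand-picked financial ratios and an averaged news-sentiment score as features for a simple classifier; the feature names and values are purely illustrative, not the paper's dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative features per company: [current_ratio, debt_to_equity, interest_coverage],
# plus a mean sentiment score over recent news articles about the company's sector.
financials = np.array([[1.8, 0.6, 5.2], [0.7, 2.4, 0.9]])
news_sentiment = np.array([[0.3], [-0.6]])           # e.g., averaged polarity in [-1, 1]
X = np.hstack([financials, news_sentiment])
y = np.array([0, 1])                                 # 1 = filed for bankruptcy

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba(X)[:, 1])                    # estimated bankruptcy likelihood
```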
26. AliCoCo: Alibaba E-commerce Cognitive Concept Net [PDF] 返回目录
Xusheng Luo, Luxin Liu, Yonghua Yang, Le Bo, Yuanpeng Cao, Jinhang Wu, Qiang Li, Keping Yang, Kenny Q. Zhu
Abstract: One of the ultimate goals of e-commerce platforms is to satisfy various shopping needs for their customers. Much effort is devoted to creating taxonomies or ontologies in e-commerce towards this goal. However, user needs in e-commerce are still not well defined, and none of the existing ontologies has enough depth and breadth for universal user-needs understanding. The semantic gap in between prevents the shopping experience from being more intelligent. In this paper, we propose to construct a large-scale e-commerce cognitive concept net named "AliCoCo", which is practiced in Alibaba, the largest Chinese e-commerce platform in the world. We formally define user needs in e-commerce, then conceptualize them as nodes in the net. We present details on how AliCoCo is constructed semi-automatically and its successful, ongoing and potential applications in e-commerce.
摘要:一个电子商务平台的最终目标是要满足客户的各种购物需求。很多努力致力于电子商务朝着这个目标创建分类法或本体。然而,在电子商务用户的需求仍然没有得到很好的定义,并没有一个现有的本体具有普遍了解用户需求的足够的深度和广度。在两者之间的语义间隙防止网购成为更加智能的体验。在本文中,我们提出构建名为“AliCoCo”大规模电子商务的认知概念网,这是在阿里巴巴实行,在世界上最大的中国电子商务平台。我们在电子商务的正式定义用户需求,那么概念化他们作为在网节点。我们对AliCoCo如何半自动构建目前细节及其成功,正在进行和电子商务的应用潜力。
27. A Novel Method of Extracting Topological Features from Word Embeddings [PDF] 返回目录
Shafie Gholizadeh, Armin Seyeditabari, Wlodek Zadrozny
Abstract: In recent years, topological data analysis has been utilized for a wide range of problems to deal with high dimensional noisy data. While text representations are often high dimensional and noisy, there are only a few works on the application of topological data analysis in natural language processing. In this paper, we introduce a novel algorithm to extract topological features from the word embedding representation of text that can be used for text classification. Working on word embeddings, topological data analysis can interpret the embedding high-dimensional space and discover the relations among different embedding dimensions. We will use persistent homology, the most commonly used tool from topological data analysis, for our experiment. Examining our topological algorithm on long textual documents, we show that our defined topological features may outperform conventional text mining features.
摘要:近年来,拓扑数据分析已被用于各种各样的问题来处理高维噪声数据。虽然文字表述往往是高维和嘈杂,只有在自然语言处理的拓扑数据分析应用的几个工作。在本文中,我们介绍一种新颖的算法,以从字嵌入可用于文本分类文本的表示提取拓扑特征。工作字的嵌入,拓扑数据分析可以解释嵌入高维空间,发现不同的嵌入维度之间的关系。我们将使用持久的同源性,从拓扑数据分析中最常用的工具,我们的实验。长文本文件审查我们的拓扑算法,我们将展示我们的定义拓扑功能可以超越传统的文本挖掘功能。
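A hedged sketch of computing persistence diagrams over a document's word vectors with the ripser package; the particular feature derived from the diagrams (total persistence) is an illustrative choice, not necessarily the paper's construction:

```python
import numpy as np
from ripser import ripser

# Word vectors for one document (rows = words); random values stand in for real embeddings.
embeddings = np.random.rand(50, 300)

diagrams = ripser(embeddings, maxdim=1)["dgms"]      # H0 and H1 persistence diagrams
# One simple illustrative feature: total persistence (finite bar lifetimes) per dimension.
features = [np.sum(d[np.isfinite(d[:, 1]), 1] - d[np.isfinite(d[:, 1]), 0]) for d in diagrams]
print(features)                                      # could feed a downstream text classifier
```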
28. BiLingUNet: Image Segmentation by Modulating Top-Down and Bottom-Up Visual Processing with Referring Expressions [PDF] 返回目录
Ozan Arkan Can, İlker Kesen, Deniz Yuret
Abstract: We present BiLingUNet, a state-of-the-art model for image segmentation using referring expressions. BiLingUNet uses language to customize visual filters and outperforms approaches that concatenate a linguistic representation to the visual input. We find that using language to modulate both bottom-up and top-down visual processing works better than just making the top-down processing language-conditional. We argue that common 1x1 language-conditional filters cannot represent relational concepts and experimentally demonstrate that wider filters work better. Our model achieves state-of-the-art performance on four referring expression datasets.
摘要:我们提出BiLingUNet,国家的最先进的模型用于使用参照表达式图像分割。 BiLingUNet使用的语言来定制视觉过滤器,优于接近于连接一个语言表达的视觉输入。我们发现,使用的语言,以便更好地调节两个自下而上和自上而下的视觉处理工作不仅仅是使自上而下的处理语言的条件。我们认为,共同的1x1语言条件过滤器不能代表关系概念和实验证明,更宽的过滤器更好地工作。我们的模型实现了四个指表达数据集的状态的最先进的性能。
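A hedged sketch of language-conditioned visual filtering with kernels wider than 1x1, as the abstract argues for: convolution weights are generated from a sentence embedding and applied to a visual feature map (shapes and layer names are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LanguageConditionedConv(nn.Module):
    """Generate k-by-k convolution filters from a sentence embedding (illustrative)."""
    def __init__(self, lang_dim, channels, k=3):
        super().__init__()
        self.k, self.channels = k, channels
        self.filter_gen = nn.Linear(lang_dim, channels * channels * k * k)

    def forward(self, visual_feats, lang_emb):        # visual_feats: [1, C, H, W]; lang_emb: [lang_dim]
        w = self.filter_gen(lang_emb).view(self.channels, self.channels, self.k, self.k)
        return F.conv2d(visual_feats, w, padding=self.k // 2)

mod = LanguageConditionedConv(lang_dim=512, channels=64)
out = mod(torch.randn(1, 64, 32, 32), torch.randn(512))
```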
29. Predicting the Popularity of Micro-videos with Multimodal Variational Encoder-Decoder Framework [PDF] 返回目录
Yaochen Zhu, Jiayi Xie, Zhenzhong Chen
Abstract: As an emerging type of user-generated content, micro-video drastically enriches people's entertainment experiences and social interactions. However, the popularity pattern of an individual micro-video still remains elusive among researchers. One of the major challenges is that the potential popularity of a micro-video tends to fluctuate under the impact of various external factors, which makes it full of uncertainties. In addition, since micro-videos are mainly uploaded by individuals that lack professional techniques, multiple types of noise could exist that obscure useful information. In this paper, we propose a multimodal variational encoder-decoder (MMVED) framework for micro-video popularity prediction tasks. MMVED learns a stochastic Gaussian embedding of a micro-video that is informative about its popularity level while simultaneously preserving the inherent uncertainties. Moreover, through the optimization of a deep variational information bottleneck lower-bound (IBLBO), the learned hidden representation is shown to be maximally expressive about the popularity target while maximally compressive to the noise in micro-video features. Furthermore, the Bayesian product-of-experts principle is applied to the multimodal encoder, where the decision for keeping or discarding information is made comprehensively with all available modalities. Extensive experiments conducted on a public dataset and a dataset we collect from Xigua demonstrate the effectiveness of the proposed MMVED framework.
摘要:随着用户生成内容的新兴型,微视频大大丰富了人们的娱乐体验和社会交往。然而,一个人的微视频的普及模式仍然是研究人员之间难以捉摸。其中一个主要的挑战是,微视频的潜力普及下的各种外部因素的影响,这使得它充满了不确定性倾向于波动。此外,由于微视频主要是由缺乏专业技术个人上传的,多种类型的噪声可能存在这种不起眼的有用信息。在本文中,我们提出微视频的普及预测任务的多模态变编码器,解码器(MMVED)框架。 MMVED学习微视频的随机高斯嵌入就是翔实它的普及水平,同时保留了固有的不确定性同时进行。此外,通过一个深变信息瓶颈优化下限(IBLBO),所学习的隐藏表示被示出为最大限度表现力大约普及目标,而最大程度地压缩到在微视频特征的噪声。此外,贝叶斯产品的专业人士原理应用到多式编码器,其中对于信息保留或丢弃的决定是综合了所有可用的模式做。在公共数据集,我们从Xigua收集的数据集进行了广泛的实验,证明了该MMVED框架的有效性。
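The multimodal encoder combines per-modality Gaussian posteriors with a product-of-experts; below is a minimal sketch of the standard closed form (precisions add, and the mean is the precision-weighted average), with illustrative shapes:

```python
import torch

def product_of_gaussian_experts(mus, logvars):
    """Combine per-modality Gaussians N(mu_m, var_m) into a single Gaussian (illustrative)."""
    precisions = [(-lv).exp() for lv in logvars]       # 1 / var_m
    total_precision = sum(precisions)
    var = 1.0 / total_precision
    mu = var * sum(p * m for p, m in zip(precisions, mus))
    return mu, var.log()

# Two modalities (e.g., visual and acoustic) with 8-dimensional latent posteriors:
mu, logvar = product_of_gaussian_experts(
    [torch.zeros(8), torch.ones(8)], [torch.zeros(8), torch.zeros(8)])
```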
30. Countering Language Drift with Seeded Iterated Learning [PDF] 返回目录
Yuchen Lu, Soumye Singhal, Florian Strub, Olivier Pietquin, Aaron Courville
Abstract: Supervised learning methods excel at capturing statistical properties of language when trained over large text corpora. Yet, these models often produce inconsistent outputs in goal-oriented language settings as they are not trained to complete the underlying task. Moreover, as soon as the agents are finetuned to maximize task completion, they suffer from the so-called language drift phenomenon: they slowly lose syntactic and semantic properties of language as they only focus on solving the task. In this paper, we propose a generic approach to counter language drift by using iterated learning. We iterate between fine-tuning agents with interactive training steps, and periodically replacing them with new agents that are seeded from the last iteration and trained to imitate the latest finetuned models. Iterated learning does not require external syntactic constraint nor semantic knowledge, making it a valuable task-agnostic finetuning protocol. We first explore iterated learning in the Lewis Game. We then scale up the approach in the translation game. In both settings, our results show that iterated learning drastically counters language drift and improves the task completion metric.
摘要:监督学习方法,善于训练的时候在大语料库捕捉语言的统计特性。然而,这些模型往往产生在面向目标的语言设置不一致的产出,因为他们没有受过训练,完成基础任务。此外,一旦代理微调,最大限度地完成任务,他们从所谓的语言甩尾现象受苦。他们慢慢失去语法和语言,他们只专注于解决任务的语义特性。在本文中,我们提出了用迭代学习的一个通用的办法来应对语言漂移。我们反复与互动培训步骤微调代理之间,并定期与从最后一次迭代播种和培训,以模仿最新款微调,新的代理商更换它们。迭代学习不需要外部句法约束,也不语义知识,使其成为一个有价值的任务无关的协议细化和微调。我们首先探讨在刘易斯游戏迭代学习。然后,我们向上扩展在翻译游戏的方法。在这两种环境,我们的研究结果表明,迭代学习-ING大幅柜台语言漂移以及它提高了完成任务指标。
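A hedged sketch of the iterated-learning loop the abstract describes: a teacher is fine-tuned interactively, then a freshly seeded student imitates it and becomes the seed for the next generation; every method name here is hypothetical:

```python
def seeded_iterated_learning(make_agent, init_weights, n_generations, k_interactive, k_imitation):
    """Illustrative seeded iterated learning loop (all agent methods are hypothetical)."""
    seed_weights = init_weights
    for _ in range(n_generations):
        teacher = make_agent(seed_weights)
        for _ in range(k_interactive):
            teacher.interactive_update()          # fine-tune on the interactive task
        student = make_agent(seed_weights)        # re-seed a fresh student
        for _ in range(k_imitation):
            student.imitate(teacher)              # supervised imitation of the teacher's outputs
        seed_weights = student.weights            # the student seeds the next generation
    return make_agent(seed_weights)
```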
注:中文为机器翻译结果!