
[arXiv Papers] Computation and Language 2020-04-16

Contents

1. Entities as Experts: Sparse Memory Access with Entity Supervision [PDF] Abstract
2. Document-level Representation Learning using Citation-informed Transformers [PDF] Abstract
3. PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation [PDF] Abstract
4. Bayesian Hierarchical Words Representation Learning [PDF] Abstract
5. Analyzing analytical methods: The case of phonology in neural models of spoken language [PDF] Abstract
6. Gestalt: a Stacking Ensemble for SQuAD2.0 [PDF] Abstract
7. Exploring Probabilistic Soft Logic as a framework for integrating top-down and bottom-up processing of language in a task context [PDF] Abstract
8. Framing COVID-19: How we conceptualize and discuss the pandemic on Twitter [PDF] Abstract
9. ToD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogues [PDF] Abstract
10. Coreferential Reasoning Learning for Language Representation [PDF] Abstract
11. On the Linguistic Capacity of Real-Time Counter Automata [PDF] Abstract
12. A Human Evaluation of AMR-to-English Generation Systems [PDF] Abstract
13. Mining Coronavirus (COVID-19) Posts in Social Media [PDF] Abstract
14. A Simple Yet Strong Pipeline for HotpotQA [PDF] Abstract
15. Balancing Training for Multilingual Neural Machine Translation [PDF] Abstract
16. lamBERT: Language and Action Learning Using Multimodal BERT [PDF] Abstract
17. A hybrid classical-quantum workflow for natural language processing [PDF] Abstract
18. Probabilistic Model of Narratives Over Topical Trends in Social Media: A Discrete Time Model [PDF] Abstract
19. Speaker Diarization with Lexical Information [PDF] Abstract

Abstracts

1. Entities as Experts: Sparse Memory Access with Entity Supervision [PDF] Back to Contents
  Thibault Févry, Livio Baldini Soares, Nicholas FitzGerald, Eunsol Choi, Tom Kwiatkowski
Abstract: We focus on the problem of capturing declarative knowledge in the learned parameters of a language model. We introduce a new model, Entities as Experts (EaE), that can access distinct memories of the entities mentioned in a piece of text. Unlike previous efforts to integrate entity knowledge into sequence models, EaE's entity representations are learned directly from text. These representations capture sufficient knowledge to answer TriviaQA questions such as "Which Dr. Who villain has been played by Roger Delgado, Anthony Ainley, Eric Roberts?". EaE outperforms a Transformer model with $30\times$ the parameters on this task. According to the LAMA knowledge probes, EaE also contains more factual knowledge than a similarly sized BERT. We show that associating parameters with specific entities means that EaE only needs to access a fraction of its parameters at inference time, and we show that the correct identification, and representation, of entities is essential to EaE's performance. We also argue that the discrete and independent entity representations in EaE make it more modular and interpretable than the Transformer architecture on which it is based.
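
As an intuition for the sparse memory access described above, the sketch below shows how a model might retrieve a handful of entity embeddings for detected mention spans and add them back into the token representations. All names (EntityMemory, mention_spans, top_k) are hypothetical illustrations, not the released EaE code.

```python
import torch
import torch.nn as nn

class EntityMemory(nn.Module):
    """Hypothetical sketch of a sparse entity memory (not the official EaE implementation)."""

    def __init__(self, num_entities: int, dim: int, top_k: int = 1):
        super().__init__()
        self.entity_embeddings = nn.Embedding(num_entities, dim)  # the large entity "memory"
        self.top_k = top_k

    def forward(self, hidden: torch.Tensor, mention_spans):
        # hidden: (seq_len, dim) contextual token vectors; mention_spans: list of (start, end)
        hidden = hidden.clone()
        for start, end in mention_spans:
            query = hidden[start:end].mean(dim=0)              # pool the mention
            scores = self.entity_embeddings.weight @ query     # score every entity
            top = scores.topk(self.top_k).indices              # only these rows are accessed
            retrieved = self.entity_embeddings(top).mean(dim=0)
            hidden[start:end] = hidden[start:end] + retrieved  # inject entity knowledge
        return hidden
```

Because only the retrieved rows of the embedding table are touched for each input, inference uses a small fraction of the model's parameters, which is the property the abstract emphasizes.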

2. Document-level Representation Learning using Citation-informed Transformers [PDF] Back to Contents
  Arman Cohan, Sergey Feldman, Iz Beltagy, Doug Downey, Daniel S. Weld
Abstract: Representation learning is a critical ingredient for natural language processing systems. Recent Transformer language models like BERT learn powerful textual representations, but these models are targeted towards token- and sentence-level training objectives and do not leverage information on inter-document relatedness, which limits their document-level representation power. For applications on scientific documents, such as classification and recommendation, the embeddings power strong performance on end tasks. We propose SPECTER, a new method to generate document-level embedding of scientific documents based on pretraining a Transformer language model on a powerful signal of document-level relatedness: the citation graph. Unlike existing pretrained language models, SPECTER can be easily applied to downstream applications without task-specific fine-tuning. Additionally, to encourage further research on document-level models, we introduce SciDocs, a new evaluation benchmark consisting of seven document-level tasks ranging from citation prediction, to document classification and recommendation. We show that SPECTER outperforms a variety of competitive baselines on the benchmark.
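
One common way to turn a citation graph into a document-level relatedness signal is a triplet-style objective: a paper's embedding should be closer to a paper it cites than to an unrelated paper. The sketch below illustrates that idea with a generic `encode` function; it is an assumption for illustration, not the released SPECTER training code.

```python
import torch
import torch.nn.functional as F

def citation_triplet_loss(encode, query, cited, unrelated, margin=1.0):
    """Hedged sketch: pull a paper toward a cited paper, push it away from a non-cited one.

    `encode` is any text encoder mapping a title+abstract string to a vector
    (e.g. a Transformer's pooled output); the name and margin are hypothetical.
    """
    q, p, n = encode(query), encode(cited), encode(unrelated)
    d_pos = torch.norm(q - p, dim=-1)   # distance to the cited (positive) paper
    d_neg = torch.norm(q - n, dim=-1)   # distance to the unrelated (negative) paper
    return F.relu(d_pos - d_neg + margin).mean()
```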

3. PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation [PDF] Back to Contents
  Bin Bi, Chenliang Li, Chen Wu, Ming Yan, Wei Wang
Abstract: Self-supervised pre-training has emerged as a powerful technique for natural language understanding and generation, such as BERT, MASS and BART. The existing pre-training techniques employ autoencoding and/or autoregressive objectives to train Transformer-based models by recovering original word tokens from corrupted text with some masked tokens. In this work, we present PALM which pre-trains an autoencoding and autoregressive language model on a large unlabeled corpus especially for downstream generation conditioned on context, such as generative question answering and conversational response generation. PALM minimizes the mismatch introduced by the existing denoising scheme between pre-training and fine-tuning where generation is more than reconstructing original text. With a novel pre-training scheme, PALM achieves new state-of-the-art results on a variety of language generation benchmarks covering generative question answering (Rank 1 on the official MARCO leaderboard), abstractive summarization on Gigaword and conversational response generation on Cornell Movie Dialogues.

4. Bayesian Hierarchical Words Representation Learning [PDF] Back to Contents
  Oren Barkan, Idan Rejwan, Avi Caciularu, Noam Koenigstein
Abstract: This paper presents the Bayesian Hierarchical Words Representation (BHWR) learning algorithm. BHWR facilitates Variational Bayes word representation learning combined with semantic taxonomy modeling via hierarchical priors. By propagating relevant information between related words, BHWR utilizes the taxonomy to improve the quality of such representations. Evaluation of several linguistic datasets demonstrates the advantages of BHWR over suitable alternatives that facilitate Bayesian modeling with or without semantic priors. Finally, we further show that BHWR produces better representations for rare words.
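
The abstract does not give the exact form of the hierarchical priors, but a minimal way to encode a taxonomy in a Bayesian word-representation model is to center each word vector's Gaussian prior on its parent's vector, so information propagates between related words. The formulation below is an illustrative assumption, not necessarily the BHWR model itself.

```latex
% Hypothetical hierarchical Gaussian prior over word vectors w_c with taxonomy parent pa(c)
p\big(\mathbf{w}_c \mid \mathbf{w}_{\mathrm{pa}(c)}\big)
  = \mathcal{N}\big(\mathbf{w}_c \,;\, \mathbf{w}_{\mathrm{pa}(c)},\, \sigma^2 \mathbf{I}\big),
\qquad
p(\mathbf{W}) = \prod_{c} p\big(\mathbf{w}_c \mid \mathbf{w}_{\mathrm{pa}(c)}\big)
```

Variational Bayes then approximates the posterior over all word vectors given the corpus, which is how rare words can borrow statistical strength from their neighbors in the taxonomy.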

5. Analyzing analytical methods: The case of phonology in neural models of spoken language [PDF] Back to Contents
  Grzegorz Chrupała, Bertrand Higy, Afra Alishahi
Abstract: Given the fast development of analysis techniques for NLP and speech processing systems, few systematic studies have been conducted to compare the strengths and weaknesses of each method. As a step in this direction we study the case of representations of phonology in neural network models of spoken language. We use two commonly applied analytical techniques, diagnostic classifiers and representational similarity analysis, to quantify to what extent neural activation patterns encode phonemes and phoneme sequences. We manipulate two factors that can affect the outcome of analysis. First, we investigate the role of learning by comparing neural activations extracted from trained versus randomly-initialized models. Second, we examine the temporal scope of the activations by probing both local activations corresponding to a few milliseconds of the speech signal, and global activations pooled over the whole utterance. We conclude that reporting analysis results with randomly initialized models is crucial, and that global-scope methods tend to yield more consistent results and we recommend their use as a complement to local-scope diagnostic methods.
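
A diagnostic classifier of the kind mentioned above is simply a lightweight supervised probe trained on frozen activations. The sketch below shows the typical setup for predicting frame-level phoneme labels from hidden states; the variable names and data are hypothetical.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_phonemes(activations, phoneme_labels):
    """activations: (n_frames, hidden_dim) states extracted from a spoken-language model;
    phoneme_labels: (n_frames,) gold phoneme per frame (hypothetical data)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        activations, phoneme_labels, test_size=0.2, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)  # the diagnostic classifier
    return clf.score(X_te, y_te)                              # probe accuracy

# Per the paper's recommendation, run the same probe on activations from a
# randomly initialized model as well, and report the difference between the two.
```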

6. Gestalt: a Stacking Ensemble for SQuAD2.0 [PDF] Back to Contents
  Mohamed El-Geish
Abstract: We propose a deep-learning system -- for the SQuAD2.0 task -- that finds, or indicates the lack of, a correct answer to a question in a context paragraph. Our goal is to learn an ensemble of heterogeneous SQuAD2.0 models that, when blended properly, outperforms the best model in the ensemble per se. We created a stacking ensemble that combines top-N predictions from two models, based on ALBERT and RoBERTa, into a multiclass classification task to pick the best answer out of their predictions. We explored various ensemble configurations, input representations, and model architectures. For evaluation, we examined test-set EM and F1 scores; our best-performing ensemble incorporated a CNN-based meta-model and scored 87.117 and 90.306, respectively -- a relative improvement of 0.55% for EM and 0.61% for F1 scores, compared to the baseline performance of the best model in the ensemble, an ALBERT-based model, at 86.644 for EM and 89.760 for F1.
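
A minimal sketch of the stacking idea: the top-N candidate answers from the two base readers become features of a multiclass classifier that picks the final answer slot. The feature layout and names are hypothetical, and the paper's best meta-model is CNN-based rather than the logistic regression used here.

```python
from sklearn.linear_model import LogisticRegression

def train_meta_model(features, labels):
    """features: (n_questions, n_features) scores/properties of the 2N candidates
    from the ALBERT and RoBERTa readers; labels: index of the correct candidate
    slot (or a dedicated "no answer" class). The layout is an illustrative assumption."""
    meta = LogisticRegression(max_iter=1000)
    return meta.fit(features, labels)      # multiclass classification over candidate slots

def pick_answer(meta, question_features, candidates):
    idx = int(meta.predict(question_features.reshape(1, -1))[0])
    return candidates[idx]                 # chosen span, or "no answer" for SQuAD2.0
```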

7. Exploring Probabilistic Soft Logic as a framework for integrating top-down and bottom-up processing of language in a task context [PDF] Back to Contents
  Johannes Dellert
Abstract: This technical report describes a new prototype architecture designed to integrate top-down and bottom-up analysis of non-standard linguistic input, where a semantic model of the context of an utterance is used to guide the analysis of the non-standard surface forms, including their automated normalization in context. While the architecture is generally applicable, as a concrete use case of the architecture we target the generation of semantically-informed target hypotheses for answers written by German learners in response to reading comprehension questions, where the reading context and possible target answers are given. The architecture integrates existing NLP components to produce candidate analyses on eight levels of linguistic modeling, all of which are broken down into atomic statements and connected into a large graphical model using Probabilistic Soft Logic (PSL) as a framework. Maximum a posteriori inference on the resulting graphical model then assigns a belief distribution to candidate target hypotheses. The current version of the architecture builds on Universal Dependencies (UD) as its representation formalism on the form level and on Abstract Meaning Representations (AMRs) to represent semantic analyses of learner answers and the context information provided by the target answers. These general choices will make it comparatively straightforward to apply the architecture to other tasks and other languages.

8. Framing COVID-19: How we conceptualize and discuss the pandemic on Twitter [PDF] Back to Contents
  Philipp Wicke, Marianna M. Bolognesi
Abstract: Doctors and nurses in these weeks are busy in the trenches, fighting against a new invisible enemy: Covid-19. Cities are locked down and civilians are besieged in their own homes, to prevent the spreading of the virus. War-related terminology is commonly used to frame the discourse around epidemics and diseases. Arguably the discourse around the current epidemic will make use of war-related metaphors too, not only in public discourse and the media, but also in the tweets written by non-experts of mass communication. We hereby present an analysis of the discourse around #Covid-19, based on a corpus of 200k tweets posted on Twitter during March and April 2020. Using topic modelling, we first analyze the topics around which the discourse can be classified. Then, we show that the WAR framing is used to talk about specific topics, such as the virus treatment, but not others, such as the effects of social distancing on the population. We then measure and compare the popularity of the WAR frame to three alternative figurative frames (MONSTER, STORM and TSUNAMI) and a literal frame used as control (FAMILY). The results show that while the FAMILY literal frame covers a wider portion of the corpus, among the figurative framings WAR is the most frequently used, and thus arguably the most conventional one. However, we conclude, this frame is not apt to elaborate the discourse around many aspects involved in the current situation. Therefore, we conclude, in line with previous suggestions, a plethora of framing options, or a metaphor menu, may facilitate the communication of various aspects involved in the Covid-19-related discourse on the social media, and thus support civilians in the expression of their feelings, opinions and ideas during the current pandemic.
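
Measuring the relative popularity of the figurative frames can be approximated by counting frame-evoking words per tweet, as in the toy sketch below; the word lists are illustrative stand-ins, not the lexicons used in the paper.

```python
from collections import Counter

# Illustrative (not the paper's) frame lexicons
FRAMES = {
    "WAR":     {"war", "battle", "fight", "enemy", "frontline"},
    "MONSTER": {"monster", "beast", "creature"},
    "STORM":   {"storm", "thunder", "lightning"},
    "TSUNAMI": {"tsunami", "wave", "flood"},
    "FAMILY":  {"family", "mother", "father", "home"},  # literal control frame
}

def frame_counts(tweets):
    counts = Counter()
    for tweet in tweets:
        tokens = set(tweet.lower().split())
        for frame, lexicon in FRAMES.items():
            if tokens & lexicon:      # the tweet evokes this frame at least once
                counts[frame] += 1
    return counts
```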

9. ToD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogues [PDF] Back to Contents
  Chien-Sheng Wu, Steven Hoi, Richard Socher, Caiming Xiong
Abstract: The use of pre-trained language models has emerged as a promising direction for improving dialogue systems. However, the underlying difference of linguistic patterns between conversational data and general text makes the existing pre-trained language models not as effective as they have been shown to be. Recently, there are some pre-training approaches based on open-domain dialogues, leveraging large-scale social media data such as Twitter or Reddit. Pre-training for task-oriented dialogues, on the other hand, is rarely discussed because of the long-standing and crucial data scarcity problem. In this work, we combine nine English-based, human-human, multi-turn and publicly available task-oriented dialogue datasets to conduct language model pre-training. The experimental results show that our pre-trained task-oriented dialogue BERT (ToD-BERT) surpasses BERT and other strong baselines in four downstream task-oriented dialogue applications, including intention detection, dialogue state tracking, dialogue act prediction, and response selection. Moreover, in the simulated limited data experiments, we show that ToD-BERT has stronger few-shot capacity that can mitigate the data scarcity problem in task-oriented dialogues.

10. Coreferential Reasoning Learning for Language Representation [PDF] Back to Contents
  Deming Ye, Yankai Lin, Jiaju Du, Zhenghao Liu, Maosong Sun, Zhiyuan Liu
Abstract: Language representation models such as BERT could effectively capture contextual semantic information from plain text, and have been proved to achieve promising results in lots of downstream NLP tasks with appropriate fine-tuning. However, existing language representation models seldom consider coreference explicitly, the relationship between noun phrases referring to the same entity, which is essential to a coherent understanding of the whole discourse. To address this issue, we present CorefBERT, a novel language representation model designed to capture the relations between noun phrases that co-refer to each other. According to the experimental results, compared with existing baseline models, the CorefBERT model has made significant progress on several downstream NLP tasks that require coreferential reasoning, while maintaining comparable performance to previous models on other common NLP tasks.

11. On the Linguistic Capacity of Real-Time Counter Automata [PDF] Back to Contents
  William Merrill
Abstract: Counter machines have achieved a newfound relevance to the field of natural language processing (NLP): recent work suggests some strong-performing recurrent neural networks utilize their memory as counters. Thus, one potential way to understand the success of these networks is to revisit the theory of counter computation. Therefore, we study the abilities of real-time counter machines as formal grammars, focusing on formal properties that are relevant for NLP models. We first show that several variants of the counter machine converge to express the same class of formal languages. We also prove that counter languages are closed under complement, union, intersection, and many other common set operations. Next, we show that counter machines cannot evaluate boolean expressions, even though they can weakly validate their syntax. This has implications for the interpretability and evaluation of neural network systems: successfully matching syntactic patterns does not guarantee that counter memory accurately encodes compositional semantics. Finally, we consider whether counter languages are semilinear. This work makes general contributions to the theory of formal languages that are of potential interest for understanding recurrent neural networks.
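
For intuition about what a real-time counter machine is, the sketch below recognizes the canonical counter language a^n b^n with a single counter that is updated once per input symbol and tested for zero at the end. As the abstract notes, such machines can validate patterns like this but cannot evaluate arbitrary boolean expressions.

```python
def accepts_anbn(s: str) -> bool:
    """One-counter, real-time recognizer for { a^n b^n : n >= 0 }."""
    counter = 0
    seen_b = False
    for ch in s:                     # exactly one counter update per symbol (real-time)
        if ch == "a":
            if seen_b:
                return False         # an 'a' after a 'b' can never be repaired
            counter += 1
        elif ch == "b":
            seen_b = True
            counter -= 1
            if counter < 0:
                return False         # more b's than a's so far
        else:
            return False
    return counter == 0              # accept iff the counter returns to zero

assert accepts_anbn("aabb") and not accepts_anbn("abab")
```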

12. A Human Evaluation of AMR-to-English Generation Systems [PDF] Back to Contents
  Emma Manning, Shira Wein, Nathan Schneider
Abstract: Most current state-of-the art systems for generating English text from Abstract Meaning Representation (AMR) have been evaluated only using automated metrics, such as BLEU, which are known to be problematic for natural language generation. In this work, we present the results of a new human evaluation which collects fluency and adequacy scores, as well as categorization of error types, for several recent AMR generation systems. We discuss the relative quality of these systems and how our results compare to those of automatic metrics, finding that while the metrics are mostly successful in ranking systems overall, collecting human judgments allows for more nuanced comparisons. We also analyze common errors made by these systems.

13. Mining Coronavirus (COVID-19) Posts in Social Media [PDF] Back to Contents
  Negin Karisani, Payam Karisani
Abstract: World Health Organization (WHO) characterized the novel coronavirus (COVID-19) as a global pandemic on March 11th, 2020. Before this and in late January, more specifically on January 27th, while the majority of the infection cases were still reported in China and a few cruise ships, we began crawling social media user postings using the Twitter search API. Our goal was to leverage machine learning and linguistic tools to better understand the impact of the outbreak in China. Unlike our initial expectation to monitor a local outbreak, COVID-19 rapidly spread across the globe. In this short article we report the preliminary results of our study on automatically detecting the positive reports of COVID-19 from social media user postings using state-of-the-art machine learning models.

14. A Simple Yet Strong Pipeline for HotpotQA [PDF] Back to Contents
  Dirk Groeneveld, Tushar Khot, Mausam, Ashish Sabharwal
Abstract: State-of-the-art models for multi-hop question answering typically augment large-scale language models like BERT with additional, intuitively useful capabilities such as named entity recognition, graph-based reasoning, and question decomposition. However, does their strong performance on popular multi-hop datasets really justify this added design complexity? Our results suggest that the answer may be no, because even our simple pipeline based on BERT, named Quark, performs surprisingly well. Specifically, on HotpotQA, Quark outperforms these models on both question answering and support identification (and achieves performance very close to a RoBERTa model). Our pipeline has three steps: 1) use BERT to identify potentially relevant sentences independently of each other; 2) feed the set of selected sentences as context into a standard BERT span prediction model to choose an answer; and 3) use the sentence selection model, now with the chosen answer, to produce supporting sentences. The strong performance of Quark resurfaces the importance of carefully exploring simple model designs before using popular benchmarks to justify the value of complex techniques.
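
The three steps of the pipeline can be sketched as follows; `sentence_score` and `answer_span` stand in for the two BERT models and are hypothetical names, not released Quark code.

```python
def quark_pipeline(question, sentences, sentence_score, answer_span, threshold=0.5):
    """Hedged sketch of the three-step pipeline described in the abstract.

    sentence_score(question, sentence) -> float   e.g. a BERT relevance classifier
    answer_span(question, context)     -> str     e.g. a BERT span-prediction reader
    """
    # 1) Score each sentence independently of the others; keep the likely-relevant ones.
    selected = [s for s in sentences if sentence_score(question, s) > threshold]

    # 2) Feed the selected sentences as context into a standard span-prediction model.
    answer = answer_span(question, " ".join(selected))

    # 3) Run sentence selection again, now with the chosen answer, to get supporting sentences.
    support = [s for s in selected if sentence_score(question + " " + answer, s) > threshold]
    return answer, support
```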

15. Balancing Training for Multilingual Neural Machine Translation [PDF] Back to Contents
  Xinyi Wang, Yulia Tsvetkov, Graham Neubig
Abstract: When training multilingual machine translation (MT) models that can translate to/from multiple languages, we are faced with imbalanced training sets: some languages have much more training data than others. Standard practice is to up-sample less resourced languages to increase representation, and the degree of up-sampling has a large effect on the overall performance. In this paper, we propose a method that instead automatically learns how to weight training data through a data scorer that is optimized to maximize performance on all test languages. Experiments on two sets of languages under both one-to-many and many-to-one MT settings show our method not only consistently outperforms heuristic baselines in terms of average performance, but also offers flexible control over the performance of which languages are optimized.
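
The "standard practice" of up-sampling that the paper improves on is usually implemented as temperature-based sampling over language data sizes; the snippet below shows that heuristic baseline (the paper replaces the fixed temperature with a data scorer learned to maximize performance on all test languages).

```python
import numpy as np

def temperature_sampling_probs(num_examples, temperature=5.0):
    """Heuristic baseline: sample language i proportionally to its data share
    raised to 1/T. A larger T flattens the distribution and up-samples
    low-resource languages; T=1 recovers proportional sampling."""
    sizes = np.asarray(num_examples, dtype=float)
    p = (sizes / sizes.sum()) ** (1.0 / temperature)
    return p / p.sum()

# e.g. 1M French-English pairs vs 10k Nepali-English pairs (illustrative numbers)
print(temperature_sampling_probs([1_000_000, 10_000]))
```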

16. lamBERT: Language and Action Learning Using Multimodal BERT [PDF] Back to Contents
  Kazuki Miyazawa, Tatsuya Aoki, Takato Horii, Takayuki Nagai
Abstract: Recently, the bidirectional encoder representations from transformers (BERT) model has attracted much attention in the field of natural language processing, owing to its high performance in language understanding-related tasks. The BERT model learns language representation that can be adapted to various tasks via pre-training using a large corpus in an unsupervised manner. This study proposes the language and action learning using multimodal BERT (lamBERT) model that enables the learning of language and actions by 1) extending the BERT model to multimodal representation and 2) integrating it with reinforcement learning. To verify the proposed model, an experiment is conducted in a grid environment that requires language understanding for the agent to act properly. As a result, the lamBERT model obtained higher rewards in multitask settings and transfer settings when compared to other models, such as the convolutional neural network-based model and the lamBERT model without pre-training.

17. A hybrid classical-quantum workflow for natural language processing [PDF] Back to Contents
  Lee J. O'Riordan, Myles Doyle, Fabio Baruffa, Venkatesh Kannan
Abstract: Natural language processing (NLP) problems are ubiquitous in classical computing, where they often require significant computational resources to infer sentence meanings. With the appearance of quantum computing hardware and simulators, it is worth developing methods to examine such problems on these platforms. In this manuscript we demonstrate the use of quantum computing models to perform NLP tasks, where we represent corpus meanings, and perform comparisons between sentences of a given structure. We develop a hybrid workflow for representing small and large scale corpus data sets to be encoded, processed, and decoded using a quantum circuit model. In addition, we provide our results showing the efficacy of the method, and release our developed toolkit as an open software suite.

18. Probabilistic Model of Narratives Over Topical Trends in Social Media: A Discrete Time Model [PDF] Back to Contents
  Toktam A. Oghaz, Ece C. Mutlu, Jasser Jasser, Niloofar Yousefi, Ivan Garibay
Abstract: Online social media platforms are turning into the prime source of news and narratives about worldwide events. However,a systematic summarization-based narrative extraction that can facilitate communicating the main underlying events is lacking. To address this issue, we propose a novel event-based narrative summary extraction framework. Our proposed framework is designed as a probabilistic topic model, with categorical time distribution, followed by extractive text summarization. Our topic model identifies topics' recurrence over time with a varying time resolution. This framework not only captures the topic distributions from the data, but also approximates the user activity fluctuations over time. Furthermore, we define significance-dispersity trade-off (SDT) as a comparison measure to identify the topic with the highest lifetime attractiveness in a timestamped corpus. We evaluate our model on a large corpus of Twitter data, including more than one million tweets in the domain of the disinformation campaigns conducted against the White Helmets of Syria. Our results indicate that the proposed framework is effective in identifying topical trends, as well as extracting narrative summaries from text corpus with timestamped data.

19. Speaker Diarization with Lexical Information [PDF] Back to Contents
  Tae Jin Park, Kyu J. Han, Jing Huang, Xiaodong He, Bowen Zhou, Panayiotis Georgiou, Shrikanth Narayanan
Abstract: This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition. We propose a speaker diarization system that can incorporate word-level speaker turn probabilities with speaker embeddings into a speaker clustering process to improve the overall diarization accuracy. To integrate lexical and acoustic information in a comprehensive way during clustering, we introduce an adjacency matrix integration for spectral clustering. Since words and word boundary information for word-level speaker turn probability estimation are provided by a speech recognition system, our proposed method works without any human intervention for manual transcriptions. We show that the proposed method improves diarization performance on various evaluation datasets compared to the baseline diarization system using acoustic information only in speaker embeddings.
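
The adjacency matrix integration mentioned above can be pictured as fusing an acoustic affinity matrix (from speaker embeddings) with a lexical affinity matrix (from word-level speaker-turn probabilities) before spectral clustering. The simple weighted sum and the `alpha` parameter below are hypothetical simplifications, not the paper's exact integration scheme.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def diarize(acoustic_affinity, lexical_affinity, num_speakers, alpha=0.5):
    """Hedged sketch: fuse two segment-by-segment affinity matrices and cluster.

    acoustic_affinity: similarities between speaker embeddings of segments
    lexical_affinity:  affinities derived from ASR-based speaker-turn probabilities
    alpha:             illustrative mixing weight (not from the paper)
    """
    affinity = alpha * np.asarray(acoustic_affinity) + (1.0 - alpha) * np.asarray(lexical_affinity)
    clustering = SpectralClustering(n_clusters=num_speakers, affinity="precomputed")
    return clustering.fit_predict(affinity)   # one speaker label per segment
```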
