
[arXiv Papers] Computation and Language 2020-09-10

Table of Contents

1. On SkipGram Word Embedding Models with Negative Sampling: Unified Framework and Impact of Noise Distributions [PDF] Abstract
2. Impact of News on the Commodity Market: Dataset and Results [PDF] Abstract
3. Comparative Study of Language Models on Cross-Domain Data with Model Agnostic Explainability [PDF] Abstract
4. Central Yup'ik and Machine Translation of Low-Resource Polysynthetic Languages [PDF] Abstract
5. Quantifying the Effects of COVID-19 on Mental Health Support Forums [PDF] Abstract
6. Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function [PDF] Abstract
7. Probabilistic Predictions of People Perusing: Evaluating Metrics of Language Model Performance for Psycholinguistic Modeling [PDF] Abstract
8. Covid-Transformer: Detecting Trending Topics on Twitter Using Universal Sentence Encoder [PDF] Abstract
9. Quantifying the Causal Effects of Conversational Tendencies [PDF] Abstract
10. Exploiting Multi-Modal Features From Pre-trained Networks for Alzheimer's Dementia Recognition [PDF] Abstract
11. Brown University at TREC Deep Learning 2019 [PDF] Abstract
12. Combining Determinism and Non-Determinism [PDF] Abstract

Abstracts

1. On SkipGram Word Embedding Models with Negative Sampling: Unified Framework and Impact of Noise Distributions [PDF] Back to Contents
  Ziqiao Wang, Yongyi Mao, Hongyu Guo, Richong Zhang
Abstract: SkipGram word embedding models with negative sampling, or SGN for short, form an elegant family of word embedding models. In this paper, we formulate a framework for word embedding, referred to as Word-Context Classification (WCC), that generalizes SGN to a wide family of models. The framework, utilizing some "noise examples", is justified through a theoretical analysis. The impact of the noise distribution on the learning of WCC embedding models is studied experimentally, suggesting that the best noise distribution is in fact the data distribution, in terms of both embedding performance and speed of convergence during training. Along the way, we discover several novel embedding models that outperform the existing WCC models.
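
The paper's key experimental variable is the noise distribution used to draw negative examples. As a point of reference, the following is a minimal PyTorch sketch (not the paper's code) of the standard SGN loss with the noise distribution as a swappable parameter; the sizes and the choice of the raw unigram distribution as noise are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Toy sizes; all values here are illustrative, not the paper's setup.
V, dim, k = 1000, 50, 5                        # vocab size, embedding dim, negatives per pair
counts = torch.randint(1, 100, (V,)).float()   # stand-in for corpus word counts

# The experimental knob: word2vec's classic choice is counts ** 0.75, while the
# paper's finding suggests using the data (unigram) distribution itself.
noise_probs = counts / counts.sum()

word_emb = torch.nn.Embedding(V, dim)
ctx_emb = torch.nn.Embedding(V, dim)

def sgn_loss(words, contexts):
    """Negative-sampling loss for a batch of (word, context) index pairs."""
    pos = (word_emb(words) * ctx_emb(contexts)).sum(-1)            # (B,)
    negs = torch.multinomial(noise_probs, words.numel() * k,
                             replacement=True).view(-1, k)         # (B, k)
    neg = torch.bmm(ctx_emb(negs),
                    word_emb(words).unsqueeze(-1)).squeeze(-1)     # (B, k)
    return -(F.logsigmoid(pos).mean() + F.logsigmoid(-neg).mean())

loss = sgn_loss(torch.tensor([3, 7]), torch.tensor([12, 99]))
```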

2. Impact of News on the Commodity Market: Dataset and Results [PDF] Back to Contents
  Ankur Sinha, Tanmay Khandait
Abstract: Over the last few years, machine learning-based methods have been applied to extract information from news flow in the financial domain. However, this information has mostly been in the form of the financial sentiment contained in news headlines, primarily for stock prices. In the current work, we propose that various other dimensions of information can be extracted from news headlines, which will be of interest to investors, policy-makers and other practitioners. We propose a framework that extracts information such as past movements and expected directionality in prices, asset comparisons, and other general information that the news refers to. We apply this framework to the commodity "Gold" and train the machine learning models using a dataset of 11,412 human-annotated news headlines (released with this study), collected over the period 2000-2019. We run experiments to validate the causal effect of news flow on gold prices and observe that the information produced by our framework significantly impacts the future gold price.

3. Comparative Study of Language Models on Cross-Domain Data with Model Agnostic Explainability [PDF] Back to Contents
  Mayank Chhipa, Hrushikesh Mahesh Vazurkar, Abhijeet Kumar, Mridul Mishra
Abstract: With the recent influx of bidirectional contextualized transformer language models in NLP, a systematic comparative study of these models on a variety of datasets has become a necessity. Moreover, the performance of these language models has not been explored on non-GLUE datasets. The study presented in this paper compares state-of-the-art language models - BERT, ELECTRA, and derivatives including RoBERTa, ALBERT and DistilBERT. We conducted experiments by fine-tuning these models on cross-domain and disparate data and penned an in-depth analysis of the models' performance. Moreover, an explainability analysis of the language models, coherent with pretraining, is presented, which verifies the context-capturing capabilities of these models through a model-agnostic approach. The experimental results establish a new state of the art for the Yelp 2013 rating classification task and the Financial Phrasebank sentiment detection task, with 69% accuracy and 88.2% accuracy respectively. Finally, the study presented here can greatly assist industry researchers in choosing a language model effectively, in terms of performance or compute efficiency.
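
For concreteness, a fine-tuning step of the kind the study performs can be sketched with the Hugging Face transformers library; the model name, label count, and hyperparameters below are assumptions for illustration, not the paper's configuration.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# One of the compared model families; 5 labels mirrors Yelp star ratings.
name = "distilbert-base-uncased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=5)
opt = torch.optim.AdamW(model.parameters(), lr=2e-5)

batch = tok(["great food", "terrible service"], return_tensors="pt", padding=True)
labels = torch.tensor([4, 0])                  # ratings 1-5 encoded as classes 0-4
loss = model(**batch, labels=labels).loss      # cross-entropy over rating classes
loss.backward()
opt.step()
opt.zero_grad()
```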

4. Central Yup'ik and Machine Translation of Low-Resource Polysynthetic Languages [PDF] Back to Contents
  Christopher Liu, Laura Dominé, Kevin Chavez, Richard Socher
Abstract: Machine translation tools do not yet exist for the Yup'ik language, a polysynthetic language spoken by around 8,000 people who live primarily in Southwest Alaska. We compiled a parallel text corpus for Yup'ik and English and developed a morphological parser for Yup'ik based on grammar rules. We trained a seq2seq neural machine translation model with attention to translate Yup'ik input into English. We then compared the influence of different tokenization methods, namely rule-based, unsupervised (byte pair encoding), and unsupervised morphological (Morfessor) parsing, on BLEU score accuracy for Yup'ik to English translation. We find that using tokenized input increases the translation accuracy compared to that of unparsed input. Although overall Morfessor did best with a vocabulary size of 30k, our first experiments show that BPE performed best with a reduced vocabulary size.
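
As an illustration of one of the compared tokenizations, a BPE model can be trained with the sentencepiece library; the file names, vocabulary size, and example word below are hypothetical.

```python
import sentencepiece as spm

# Corpus file is assumed to contain one sentence per line.
spm.SentencePieceTrainer.train(
    input="yupik.txt",
    model_prefix="yupik_bpe",
    vocab_size=1000,        # a reduced vocabulary, in line with the BPE finding
    model_type="bpe",
)
sp = spm.SentencePieceProcessor(model_file="yupik_bpe.model")
print(sp.encode("qayani", out_type=str))   # subword pieces fed to the seq2seq model
```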

5. Quantifying the Effects of COVID-19 on Mental Health Support Forums [PDF] Back to Contents
  Laura Biester, Katie Matton, Janarthanan Rajendran, Emily Mower Provost, Rada Mihalcea
Abstract: The COVID-19 pandemic, like many of the disease outbreaks that have preceded it, is likely to have a profound effect on mental health. Understanding its impact can inform strategies for mitigating negative consequences. In this work, we seek to better understand the effects of COVID-19 on mental health by examining discussions within mental health support communities on Reddit. First, we quantify the rate at which COVID-19 is discussed in each community, or subreddit, in order to understand levels of preoccupation with the pandemic. Next, we examine the volume of activity in order to determine whether the quantity of people seeking online mental health support has risen. Finally, we analyze how COVID-19 has influenced language use and topics of discussion within each subreddit.
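
The first analysis step reduces to measuring, per subreddit, the share of posts that mention the pandemic. A toy sketch (the keyword list is an illustrative assumption, not the paper's lexicon):

```python
import re

# Illustrative keyword list; the paper's matching criteria may differ.
COVID_RE = re.compile(r"\b(covid|coronavirus|pandemic|quarantine)\b", re.I)

def covid_rate(posts):
    """Fraction of posts in a community that mention the pandemic."""
    return sum(bool(COVID_RE.search(p)) for p in posts) / len(posts)

print(covid_rate(["I can't sleep since quarantine began", "feeling better today"]))  # 0.5
```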

6. Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function [PDF] Back to Contents
  Devendra Singh Sachan, Manzil Zaheer, Ruslan Salakhutdinov
Abstract: In this paper, we study bidirectional LSTM network for the task of text classification using both supervised and semi-supervised approaches. Several prior works have suggested that either complex pretraining schemes using unsupervised methods such as language modeling (Dai and Le 2015; Miyato, Dai, and Goodfellow 2016) or complicated models (Johnson and Zhang 2017) are necessary to achieve a high classification accuracy. However, we develop a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results compared with more complex approaches. Furthermore, in addition to cross-entropy loss, by using a combination of entropy minimization, adversarial, and virtual adversarial losses for both labeled and unlabeled data, we report state-of-the-art results for text classification task on several benchmark datasets. In particular, on the ACL-IMDB sentiment analysis and AG-News topic classification datasets, our method outperforms current approaches by a substantial margin. We also show the generality of the mixed objective function by improving the performance on relation extraction task.
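
A minimal sketch of the mixed objective's shape: supervised cross-entropy on labeled data plus entropy minimization on unlabeled data (the adversarial and virtual adversarial terms are omitted here, and the weighting is an illustrative assumption).

```python
import torch.nn.functional as F

def mixed_objective(logits_lab, labels, logits_unlab, ent_weight=0.1):
    """Cross-entropy on labeled logits plus entropy minimization on unlabeled
    logits; ent_weight is illustrative, not the paper's value."""
    ce = F.cross_entropy(logits_lab, labels)
    p = F.softmax(logits_unlab, dim=-1)
    entropy = -(p * F.log_softmax(logits_unlab, dim=-1)).sum(-1).mean()
    return ce + ent_weight * entropy
```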

7. Probabilistic Predictions of People Perusing: Evaluating Metrics of Language Model Performance for Psycholinguistic Modeling [PDF] Back to Contents
  Yiding Hao, Simon Mendelsohn, Rachel Sterneck, Randi Martinez, Robert Frank
Abstract: By positing a relationship between naturalistic reading times and information-theoretic surprisal, surprisal theory (Hale, 2001; Levy, 2008) provides a natural interface between language models and psycholinguistic models. This paper re-evaluates a claim due to Goodkind and Bicknell (2018) that a language model's ability to model reading times is a linear function of its perplexity. By extending Goodkind and Bicknell's analysis to modern neural architectures, we show that the proposed relation does not always hold for Long Short-Term Memory networks, Transformers, and pre-trained models. We introduce an alternate measure of language modeling performance called predictability norm correlation based on Cloze probabilities measured from human subjects. Our new metric yields a more robust relationship between language model quality and psycholinguistic modeling performance that allows for comparison between models with different training configurations.
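
For reference, a word's surprisal is simply the negative log-probability the language model assigns to it in context; higher surprisal is predicted to yield longer reading times. A toy computation (the probabilities are made up):

```python
import math

def surprisal(prob):
    """Surprisal in bits: -log2 p(word | context)."""
    return -math.log2(prob)

probs = [0.25, 0.03, 0.5]   # hypothetical LM probabilities for three words
print([round(surprisal(p), 2) for p in probs])   # [2.0, 5.06, 1.0]
```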

8. Covid-Transformer: Detecting Trending Topics on Twitter Using Universal Sentence Encoder [PDF] Back to Contents
  Meysam Asgari-Chenaghlu, Narjes Nikzad-Khasmakhi, Shervin Minaee
Abstract: The novel coronavirus disease (also known as COVID-19) has led to a pandemic, impacting more than 200 countries across the globe. With its global impact, COVID-19 has become a major concern of people almost everywhere, and therefore a large number of tweets about COVID-19-related topics are coming out from every corner of the world. In this work, we try to analyze these tweets and detect the trending topics and major concerns of people on Twitter, which can enable us to better understand the situation and devise better planning. More specifically, we propose a model based on the Universal Sentence Encoder to detect the main topics of tweets in recent months. We use the Universal Sentence Encoder to derive semantic representations and the similarity of tweets. We then feed the sentence embeddings and their similarities to the K-means clustering algorithm to group semantically similar tweets. After that, a cluster summary is obtained using a deep-learning-based text summarization algorithm, which can uncover the underlying topic of each cluster. Through experimental results, we show that our model can detect very informative topics by processing a large number of tweets at the sentence level (which preserves the overall meaning of the tweets). Since this framework places no restriction on the data distribution, it can be used to detect trending topics from any other social media platform and in any context other than COVID-19. Experimental results show the superiority of our proposed approach over other baselines, including TF-IDF and latent Dirichlet allocation (LDA).
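
The clustering step can be sketched as follows; the random embeddings stand in for real Universal Sentence Encoder outputs, and the cluster count is an illustrative assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

# With the real encoder (via tensorflow_hub) one would do roughly:
#   embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
#   X = embed(tweets).numpy()
X = np.random.rand(100, 512)   # stand-in: 100 tweets as 512-dim USE vectors
clusters = KMeans(n_clusters=5, n_init=10).fit_predict(X)
# Each cluster of semantically similar tweets is then summarized to name a topic.
```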

9. Quantifying the Causal Effects of Conversational Tendencies [PDF] Back to Contents
  Justine Zhang, Sendhil Mullainathan, Cristian Danescu-Niculescu-Mizil
Abstract: Understanding what leads to effective conversations can aid the design of better computer-mediated communication platforms. In particular, prior observational work has sought to identify behaviors of individuals that correlate to their conversational efficiency. However, translating such correlations to causal interpretations is a necessary step in using them in a prescriptive fashion to guide better designs and policies. In this work, we formally describe the problem of drawing causal links between conversational behaviors and outcomes. We focus on the task of determining a particular type of policy for a text-based crisis counseling platform: how best to allocate counselors based on their behavioral tendencies exhibited in their past conversations. We apply arguments derived from causal inference to underline key challenges that arise in conversational settings where randomized trials are hard to implement. Finally, we show how to circumvent these inference challenges in our particular domain, and illustrate the potential benefits of an allocation policy informed by the resulting prescriptive information.

10. Exploiting Multi-Modal Features From Pre-trained Networks for Alzheimer's Dementia Recognition [PDF] Back to Contents
  Junghyun Koo, Jie Hwan Lee, Jaewoo Pyo, Yujin Jo, Kyogu Lee
Abstract: Collecting and accessing a large amount of medical data is very time-consuming and laborious, not only because it is difficult to find specific patients but also because it is necessary to resolve the confidentiality of patients' medical records. On the other hand, there are deep learning models trained on easily collectible, large-scale datasets such as YouTube or Wikipedia that offer useful representations. It can therefore be very advantageous to utilize the features from these pre-trained networks for handling a small amount of data at hand. In this work, we exploit various multi-modal features extracted from pre-trained networks to recognize Alzheimer's Dementia using a neural network, with a small dataset provided by the ADReSS Challenge at INTERSPEECH 2020. The challenge is to identify patients suspected of Alzheimer's Dementia from the provided acoustic and textual data. With the multi-modal features, we modify a Convolutional Recurrent Neural Network based structure to perform classification and regression tasks simultaneously; the model is capable of processing conversations of variable length. Our test results surpass the baseline's accuracy by 18.75%, and our validation result for the regression task shows the possibility of classifying 4 classes of cognitive impairment with an accuracy of 78.70%.
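
The simultaneous classification and regression can be sketched as a shared representation feeding two heads; the feature dimension and head sizes below are assumptions for illustration.

```python
import torch.nn as nn

class MultiTaskHead(nn.Module):
    """Shared features feed a classification head (AD vs. control) and a
    regression head (a cognitive score); sizes are illustrative."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.cls = nn.Linear(feat_dim, 2)   # binary dementia recognition
        self.reg = nn.Linear(feat_dim, 1)   # cognitive-score regression

    def forward(self, h):
        return self.cls(h), self.reg(h)
```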

11. Brown University at TREC Deep Learning 2019 [PDF] Back to Contents
  George Zerveas, Ruochen Zhang, Leila Kim, Carsten Eickhoff
Abstract: This paper describes Brown University's submission to the TREC 2019 Deep Learning track. We followed a 2-phase method for producing a ranking of passages for a given input query: In the first phase, the user's query is expanded by appending 3 queries generated by a transformer model which was trained to rephrase an input query into semantically similar queries. The expanded query can exhibit greater similarity in surface form and vocabulary overlap with the passages of interest and can therefore serve as enriched input to any downstream information retrieval method. In the second phase, we use a BERT-based model pre-trained for language modeling but fine-tuned for query-document relevance prediction to compute relevance scores for a set of 1000 candidate passages per query, and subsequently obtain a ranking of passages by sorting them based on the predicted relevance scores. According to the results published in the official Overview of the TREC Deep Learning Track 2019, our team ranked 3rd in the passage retrieval task (including full ranking and re-ranking), and 2nd when considering only re-ranking submissions.
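
The second phase reduces to scoring each (query, passage) pair and sorting; a minimal sketch, where score_fn stands in for the fine-tuned BERT relevance model (the word-overlap scorer below is a toy placeholder, not the paper's model):

```python
def rerank(query, passages, score_fn):
    """Sort passages by descending relevance score."""
    scored = [(p, score_fn(query, p)) for p in passages]
    return [p for p, _ in sorted(scored, key=lambda x: x[1], reverse=True)]

# Toy stand-in scorer; in the paper this would be BERT's predicted relevance.
overlap = lambda q, p: len(set(q.split()) & set(p.split()))
ranked = rerank("gold price news", ["news about gold", "weather report"], overlap)
```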

12. Combining Determinism and Non-Determinism [PDF] Back to Contents
  Michael Stephen Fiske
Abstract: Our goal is to construct mathematical operations that combine non-determinism measured from quantum randomness with computational determinism so that non-mechanistic behavior is preserved in the computation. Formally, some results about operations applied to computably enumerable (c.e.) and bi-immune sets are proven here, where the objective is for the operations to preserve bi-immunity. While developing rearrangement operations on the natural numbers, we discovered that the bi-immune rearrangements generate an uncountable subgroup of the symmetric group on the natural numbers. The structure of this new subgroup is unknown.
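
For readers outside computability theory, the standard textbook definition of the central notion (not specific to this paper's results):

```latex
A \text{ is immune} \iff A \text{ is infinite and contains no infinite c.e. subset}
\qquad
A \text{ is bi-immune} \iff A \text{ and } \mathbb{N} \setminus A \text{ are both immune}
```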
