
[arXiv papers] Computation and Language 2020-03-13

Contents

1. KGvec2go -- Knowledge Graph Embeddings as a Service [PDF] Abstract
2. It Means More if It Sounds Good: Yet Another Hypotheses Concerning the Evolution of Polysemous Words [PDF] Abstract
3. Sentiment Analysis with Contextual Embeddings and Self-Attention [PDF] Abstract
4. Learning word-referent mappings and concepts from raw inputs [PDF] Abstract
5. Semantic Holism and Word Representations in Artificial Neural Networks [PDF] Abstract
6. Investigating Entity Knowledge in BERT with Simple Neural End-To-End Entity Linking [PDF] Abstract
7. A Precisely Xtreme-Multi Channel Hybrid Approach For Roman Urdu Sentiment Analysis [PDF] Abstract
8. Investigating the influence Brexit had on Financial Markets, in particular the GBP/EUR exchange rate [PDF] Abstract
9. Regular Intersection Emptiness of Graph Problems: Finding a Needle in a Haystack of Graphs with the Help of Automata [PDF] Abstract
10. Expressiveness and machine processability of Knowledge Organization Systems (KOS): An analysis of concepts and relations [PDF] Abstract

Abstracts

1. KGvec2go -- Knowledge Graph Embeddings as a Service [PDF] Back to contents
  Jan Portisch, Michael Hladik, Heiko Paulheim
Abstract: In this paper, we present KGvec2go, a Web API for accessing and consuming graph embeddings in a light-weight fashion in downstream applications. Currently, we serve pre-trained embeddings for four knowledge graphs. We introduce the service and its usage, and we show further that the trained models have semantic value by evaluating them on multiple semantic benchmarks. The evaluation also reveals that the combination of multiple models can lead to a better outcome than the best individual model.

2. It Means More if It Sounds Good: Yet Another Hypotheses Concerning the Evolution of Polysemous Words [PDF] Back to contents
  Ivan P. Yamshchikov, Cyrille Merleau Nono Saha, Igor Samenko, Jürgen Jost
Abstract: This position paper looks into the formation of language and shows ties between structural properties of the words in the English language and their polysemy. Using Ollivier-Ricci curvature over a large graph of synonyms to estimate polysemy it shows empirically that the words that arguably are easier to pronounce also tend to have multiple meanings.

3. Sentiment Analysis with Contextual Embeddings and Self-Attention [PDF] Back to contents
  Katarzyna Biesialska, Magdalena Biesialska, Henryk Rybinski
Abstract: In natural language the intended meaning of a word or phrase is often implicit and depends on the context. In this work, we propose a simple yet effective method for sentiment analysis using contextual embeddings and a self-attention mechanism. The experimental results for three languages, including morphologically rich Polish and German, show that our model is comparable to or even outperforms state-of-the-art models. In all cases the superiority of models leveraging contextual embeddings is demonstrated. Finally, this work is intended as a step towards introducing a universal, multilingual sentiment classifier.

4. Learning word-referent mappings and concepts from raw inputs [PDF] Back to contents
  Wai Keen Vong, Brenden M. Lake
Abstract: How do children learn correspondences between the language and the world from noisy, ambiguous, naturalistic input? One hypothesis is via cross-situational learning: tracking words and their possible referents across multiple situations allows learners to disambiguate correct word-referent mappings (Yu & Smith, 2007). However, previous models of cross-situational word learning operate on highly simplified representations, side-stepping two important aspects of the actual learning problem. First, how can word-referent mappings be learned from raw inputs such as images? Second, how can these learned mappings generalize to novel instances of a known word? In this paper, we present a neural network model trained from scratch via self-supervision that takes in raw images and words as inputs, and show that it can learn word-referent mappings from fully ambiguous scenes and utterances through cross-situational learning. In addition, the model generalizes to novel word instances, locates referents of words in a scene, and shows a preference for mutual exclusivity.
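
The cross-situational learning idea mentioned in the abstract above can be illustrated with a small counting sketch. This is not the neural model from the paper: it is a simplified co-occurrence baseline over symbolic inputs, and the function names and toy situations below are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def cross_situational_counts(situations):
    """Count how often each word co-occurs with each candidate referent."""
    counts = defaultdict(lambda: defaultdict(int))
    for words, referents in situations:
        for word in words:
            for referent in referents:
                counts[word][referent] += 1
    return counts


def best_referent(counts, word):
    """Pick the referent that co-occurred most often with the word."""
    candidates = counts[word]
    return max(candidates, key=candidates.get)


# Toy data (an assumption): each situation pairs the words heard with the
# objects in view. Every single situation is ambiguous on its own.
situations = [
    (["ball", "dog"], ["BALL", "DOG"]),
    (["ball", "cup"], ["BALL", "CUP"]),
    (["dog", "cup"], ["DOG", "CUP"]),
]

counts = cross_situational_counts(situations)
mapping = {word: best_referent(counts, word) for word in ["ball", "dog", "cup"]}
print(mapping)
```

No single situation identifies a referent, but aggregating counts across situations does, which is the intuition the abstract attributes to Yu & Smith (2007).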

5. Semantic Holism and Word Representations in Artificial Neural Networks [PDF] Back to contents
  Tomáš Musil
Abstract: Artificial neural networks are a state-of-the-art solution for many problems in natural language processing. What can we learn about language and meaning from the way artificial neural networks represent it? Word representations obtained from the Skip-gram variant of the word2vec model exhibit interesting semantic properties. This is usually explained by referring to the general distributional hypothesis, which states that the meaning of the word is given by the contexts where it occurs. We propose a more specific approach based on Frege's holistic and functional approach to meaning. Taking Tugendhat's formal reinterpretation of Frege's work as a starting point, we demonstrate that it is analogical to the process of training the Skip-gram model and offers a possible explanation of its semantic properties.

6. Investigating Entity Knowledge in BERT with Simple Neural End-To-End Entity Linking [PDF] Back to contents
  Samuel Broscheit
Abstract: A typical architecture for end-to-end entity linking systems consists of three steps: mention detection, candidate generation and entity disambiguation. In this study we investigate the following questions: (a) Can all those steps be learned jointly with a model for contextualized text-representations, i.e. BERT (Devlin et al., 2019)? (b) How much entity knowledge is already contained in pretrained BERT? (c) Does additional entity knowledge improve BERT's performance in downstream tasks? To this end, we propose an extreme simplification of the entity linking setup that works surprisingly well: simply cast it as a per token classification over the entire entity vocabulary (over 700K classes in our case). We show on an entity linking benchmark that (i) this model improves the entity representations over plain BERT, (ii) that it outperforms entity linking architectures that optimize the tasks separately and (iii) that it only comes second to the current state-of-the-art that does mention detection and entity disambiguation jointly. Additionally, we investigate the usefulness of entity-aware token-representations in the text-understanding benchmark GLUE, as well as the question answering benchmarks SQUAD V2 and SWAG and also the EN-DE WMT14 machine translation benchmark. To our surprise, we find that most of those benchmarks do not benefit from additional entity knowledge, except for a task with very small training data, the RTE task in GLUE, which improves by 2%.
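
The "per token classification over the entire entity vocabulary" described in the abstract above can be sketched roughly as follows. This is a toy illustration, not the author's implementation: random vectors stand in for BERT token representations, the three-entry vocabulary (including a `<no-entity>` class) is an assumption, and all names are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_tokens(token_vectors, weights, entity_vocab):
    """Assign each token the highest-scoring entry of the entity vocabulary."""
    predictions = []
    for vec in token_vectors:
        # One linear score per entity class: dot product of weight row and token vector.
        scores = [sum(w * x for w, x in zip(row, vec)) for row in weights]
        probs = softmax(scores)
        best = max(range(len(probs)), key=probs.__getitem__)
        predictions.append((entity_vocab[best], probs[best]))
    return predictions


# Toy setup (assumptions): 3 classes instead of 700K+, hidden size 4 and
# random vectors in place of contextualized token representations.
entity_vocab = ["<no-entity>", "Berlin", "Germany"]
rng = random.Random(0)
hidden_size = 4
weights = [[rng.uniform(-1, 1) for _ in range(hidden_size)] for _ in entity_vocab]
token_vectors = [[rng.uniform(-1, 1) for _ in range(hidden_size)] for _ in range(5)]

predictions = classify_tokens(token_vectors, weights, entity_vocab)
for label, prob in predictions:
    print(label, round(prob, 3))
```

In this framing, mention detection, candidate generation and disambiguation collapse into one classification decision per token, which is the simplification the abstract describes.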

7. A Precisely Xtreme-Multi Channel Hybrid Approach For Roman Urdu Sentiment Analysis [PDF] Back to contents
  Faiza Memood, Muhammad Usman Ghani, Muhammad Ali Ibrahim, Rehab Shehzadi, Muhammad Nabeel Asim
Abstract: In order to accelerate the performance of various Natural Language Processing tasks for Roman Urdu, this paper for the very first time provides 3 neural word embeddings prepared using the most widely used approaches, namely Word2vec, FastText, and Glove. The integrity of the generated neural word embeddings is evaluated using intrinsic and extrinsic evaluation approaches. Considering the lack of publicly available benchmark datasets, it provides a first-ever Roman Urdu dataset which consists of 3241 sentiments annotated against positive, negative and neutral classes. To provide benchmark baseline performance over the presented dataset, we adapt diverse machine learning (Support Vector Machine, Logistic Regression, Naive Bayes), deep learning (convolutional neural network, recurrent neural network), and hybrid approaches. Effectiveness of the generated neural word embeddings is evaluated by comparing the performance of machine and deep learning based methodologies using 7 and 5 distinct feature representation approaches respectively. Finally, it proposes a novel precisely extreme multi-channel hybrid methodology which outperforms state-of-the-art adapted machine and deep learning approaches by 9% and 4% respectively in terms of F1-score.
Keywords: Roman Urdu Sentiment Analysis, Pretrain word embeddings for Roman Urdu, Word2Vec, Glove, Fast-Text

8. Investigating the influence Brexit had on Financial Markets, in particular the GBP/EUR exchange rate [PDF] Back to contents
  Michael Filletti
Abstract: On 23rd June 2016, 51.9% of British voters voted to leave the European Union, triggering a process and events that have led to the United Kingdom leaving the EU, an event that has become known as 'Brexit'. In this piece of research, we investigate the effects of this entire process on the currency markets, specifically the GBP/EUR exchange rate. Financial markets are known to be sensitive to news articles and media, and the aim of this research is to evaluate the magnitude of impact of relevant events, as well as whether the impact was positive or negative for the GBP.

9. Regular Intersection Emptiness of Graph Problems: Finding a Needle in a Haystack of Graphs with the Help of Automata [PDF] Back to contents
  Petra Wolf, Henning Fernau
Abstract: The Int_reg-problem of a combinatorial problem P asks, given a nondeterministic automaton M as input, whether the language L(M) accepted by M contains any positive instance of the problem P. We consider the Int_reg-problem for a number of different graph problems and give general criteria that give decision procedures for these Int_reg-problems. To achieve this goal, we consider a natural graph encoding so that the language of all graph encodings is regular. Then, we draw the connection between classical pumping- and interchange-arguments from the field of formal language theory with the graph operations induced on the encoded graph. Our techniques apply among others to the Int_reg-problem of well-known graph problems like Vertex Cover and Independent Set, as well as to subgraph problems, graph-edit problems and graph-partitioning problems, including coloring problems.

10. Expressiveness and machine processability of Knowledge Organization Systems (KOS): An analysis of concepts and relations [PDF] Back to contents
  Manolis Peponakis, Anna Mastora, Sarantos Kapidakis, Martin Doerr
Abstract: This study considers the expressiveness (that is the expressive power or expressivity) of different types of Knowledge Organization Systems (KOS) and discusses its potential to be machine-processable in the context of the Semantic Web. For this purpose, the theoretical foundations of KOS are reviewed based on conceptualizations introduced by the Functional Requirements for Subject Authority Data (FRSAD) and the Simple Knowledge Organization System (SKOS); natural language processing techniques are also implemented. Applying a comparative analysis, the dataset comprises a thesaurus (Eurovoc), a subject headings system (LCSH) and a classification scheme (DDC). These are compared with an ontology (CIDOC-CRM) by focusing on how they define and handle concepts and relations. It was observed that LCSH and DDC focus on the formalism of character strings (nomens) rather than on the modelling of semantics; their definition of what constitutes a concept is quite fuzzy, and they comprise a large number of complex concepts. By contrast, thesauri have a coherent definition of what constitutes a concept, and apply a systematic approach to the modelling of relations. Ontologies explicitly define diverse types of relations, and are by their nature machine-processable. The paper concludes that the potential of both the expressiveness and machine processability of each KOS is extensively regulated by its structural rules. It is harder to represent subject headings and classification schemes as semantic networks with nodes and arcs, while thesauri are more suitable for such a representation. In addition, a paradigm shift is revealed which focuses on the modelling of relations between concepts, rather than the concepts themselves.

Note: the Chinese abstracts in the original post are machine translations.