Contents

1. CompRes: A Dataset for Narrative Structure in News
2. Targeting the Benchmark: On Methodology in Current Natural Language Processing Research
3. Greedy Transition-Based Dependency Parsing with Discrete and Continuous Supertag Features
4. Principal Word Vectors
5. DISCO PAL: Diachronic Spanish Sonnet Corpus with Psychological and Affective Labels
6. Automatic Personality Prediction; an Enhanced Method Using Ensemble Modeling
7. Less is More: Rejecting Unreliable Reviews for Product Question Answering
8. Discourse Coherence, Reference Grounding and Goal Oriented Dialogue
9. DeepSinger: Singing Voice Synthesis with Data Mined From the Web
10. Cultural Cartography with Word Embeddings
11. IQ-VQA: Intelligent Visual Question Answering
1. CompRes: A Dataset for Narrative Structure in News [PDF] [Back to Contents]
Effi Levi, Guy Mor, Shaul Shenhav, Tamir Sheafer
Abstract: This paper addresses the task of automatically detecting narrative structures in raw texts. Previous works have utilized the oral narrative theory of Labov and Waletzky to identify various narrative elements in personal story texts. Instead, we direct our focus to news articles, motivated by their growing social impact as well as their role in creating and shaping public opinion. We introduce CompRes -- the first dataset for narrative structure in news media. We describe the process by which the dataset was constructed: first, we designed a new narrative annotation scheme, better suited for news media, by adapting elements from the narrative theory of Labov and Waletzky (Complication and Resolution) and adding a new narrative element of our own (Success); then, we used that scheme to annotate a set of 29 English news articles (containing 1,099 sentences) collected from news and partisan websites. We use the annotated dataset to train several supervised models to identify the different narrative elements, achieving an $F_1$ score of up to 0.7. We conclude by suggesting several promising directions for future work.
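As a rough illustration of the classification setup (not the authors' actual models), the sketch below trains a sentence-level classifier and scores it with $F_1$; the TF-IDF features, logistic-regression model, and toy sentences are all assumptions:

```python
"""Minimal sketch: sentence-level narrative-element classification.
The features, model, and data below are illustrative assumptions."""
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Toy stand-in for CompRes: each sentence tagged with one narrative element.
train_texts = [
    "The plant shut down and hundreds lost their jobs.",   # Complication
    "Regulators stepped in and brokered a settlement.",    # Resolution
    "The startup doubled its revenue within a year.",      # Success
    "Flooding cut off the town for three days.",           # Complication
    "Volunteers restored power by the weekend.",           # Resolution
    "The campaign exceeded its fundraising goal.",         # Success
]
train_labels = ["Complication", "Resolution", "Success",
                "Complication", "Resolution", "Success"]

vec = TfidfVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(train_texts), train_labels)

test_texts = ["A storm destroyed the harbor.", "Engineers rebuilt it in a month."]
test_labels = ["Complication", "Resolution"]
pred = clf.predict(vec.transform(test_texts))
print(pred, f1_score(test_labels, pred, average="macro"))
```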
2. Targeting the Benchmark: On Methodology in Current Natural Language Processing Research [PDF] [Back to Contents]
David Schlangen
Abstract: It has become a common pattern in our field: One group introduces a language task, exemplified by a dataset, which they argue is challenging enough to serve as a benchmark. They also provide a baseline model for it, which then soon is improved upon by other groups. Often, research efforts then move on, and the pattern repeats itself. What is typically left implicit is the argumentation for why this constitutes progress, and progress towards what. In this paper, we try to step back for a moment from this pattern and work out possible argumentations and their parts.
3. Greedy Transition-Based Dependency Parsing with Discrete and Continuous Supertag Features [PDF] [Back to Contents]
Ali Basirat, Joakim Nivre
Abstract: We study the effect of rich supertag features in greedy transition-based dependency parsing. While previous studies have shown that sparse boolean features representing the 1-best supertag of a word can improve parsing accuracy, we show that we can get further improvements by adding a continuous vector representation of the entire supertag distribution for a word. In this way, we achieve the best results for greedy transition-based parsing with supertag features: $88.6\%$ LAS and $90.9\%$ UAS on the English Penn Treebank converted to Stanford Dependencies.
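The feature combination itself is easy to picture: concatenate a sparse one-hot encoding of the 1-best supertag with the full continuous supertag distribution. A minimal sketch, with an invented tag inventory and probabilities:

```python
import numpy as np

# Hypothetical supertag inventory and a supertagger's softmax output for one word.
SUPERTAGS = ["NP", "S\\NP", "(S\\NP)/NP", "N/N", "PP/NP"]
dist = np.array([0.05, 0.10, 0.70, 0.05, 0.10])  # continuous supertag distribution

# Discrete feature: one-hot of the 1-best supertag (the sparse boolean view).
one_hot = np.zeros(len(SUPERTAGS))
one_hot[np.argmax(dist)] = 1.0

# Parser feature vector for this word: both views concatenated.
features = np.concatenate([one_hot, dist])
print(features)
```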
4. Principal Word Vectors [PDF] [Back to Contents]
Ali Basirat, Christian Hardmeier, Joakim Nivre
Abstract: We generalize principal component analysis for embedding words into a vector space. The generalization is made at two major levels. The first is to generalize the concept of the corpus as a counting process defined by three key elements: vocabulary set, feature (annotation) set, and context. This generalization enables the principal word embedding method to generate word vectors with regard to different types of contexts and different types of annotations provided for a corpus. The second is to generalize the transformation step used in most word embedding methods. To this end, we define two levels of transformations. The first is a quadratic transformation, which accounts for different types of weighting over the vocabulary units and contextual features. The second is an adaptive non-linear transformation, which reshapes the data distribution to be meaningful to principal component analysis. The effect of these generalizations on the word vectors is studied intrinsically with regard to the spread and the discriminability of the word vectors. We also provide an extrinsic evaluation of the contribution of the principal word vectors on a word similarity benchmark and the task of dependency parsing. Our experiments are finalized by a comparison between the principal word vectors and other sets of word vectors generated with popular word embedding methods. The results obtained from our intrinsic evaluation metrics show that the spread and the discriminability of the principal word vectors are higher than those of other word embedding methods. The results obtained from the extrinsic evaluation metrics show that the principal word vectors are better than some of the word embedding methods and on par with popular methods of word embedding.
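A compact sketch of the pipeline under stated assumptions: uniform weights stand in for the quadratic transformation and log1p for the adaptive non-linear transformation (the paper's actual transforms differ), followed by PCA via SVD:

```python
import numpy as np

# Toy word-context co-occurrence counts (rows: vocabulary units, cols: contextual features).
C = np.array([[8, 2, 0, 1],
              [7, 3, 1, 0],
              [0, 1, 9, 6],
              [1, 0, 8, 7]], dtype=float)

# Quadratic transformation: re-weight rows and columns (uniform here for illustration).
w_rows, w_cols = np.ones(C.shape[0]), np.ones(C.shape[1])
M = np.diag(w_rows) @ C @ np.diag(w_cols)

# Adaptive non-linear transformation: log1p as a stand-in that reshapes
# heavy-tailed counts into something PCA can treat meaningfully.
M = np.log1p(M)

# PCA via SVD of the centered matrix: k-dimensional principal word vectors.
M -= M.mean(axis=0)
U, S, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
word_vectors = U[:, :k] * S[:k]
print(word_vectors)
```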
5. DISCO PAL: Diachronic Spanish Sonnet Corpus with Psychological and Affective Labels [PDF] [Back to Contents]
Alberto Barbado, Víctor Fresno, Ángeles Manjarrés Riesco, Salvador Ros
Abstract: Nowadays, there are many applications of text mining over corpora from different languages, such as using supervised machine learning in order to predict labels associated with a text, using as predictors features derived from the text itself. However, most of these applications are based on texts in prose, with a lack of applications that work with poetry texts. An example of the application of text mining in poetry is the usage of features derived from individual words in order to capture the lexical, sublexical and interlexical meaning, and infer the General Affective Meaning (GAM) of the text. However, though this proposal has been proved useful for poetry in some languages, there is a lack of studies both for Spanish poetry and for highly-structured poetic compositions such as sonnets. This article presents a study over a labeled corpus of Spanish sonnets, in order to analyse whether it is possible to build features from their individual words in order to predict their GAM. The purpose of this is to model sonnets at an affective level. The article also analyses the relationship between the GAM of the sonnets and the content itself. For this, we consider the content from a psychological perspective, identifying with tags when a sonnet is related to a specific term (e.g., when the sonnet's content is related to "daydream"). Then, we study how the GAM changes according to each of those psychological terms. The corpus contains 230 Spanish sonnets from authors of different centuries, from the 15th to the 19th. This corpus was annotated by different domain experts. The experts annotated the poems with affective features, as well as with domain concepts that belong to psychology. Thanks to this, the corpora of sonnets can be used in different applications, such as poetry recommender systems, personality text mining studies of the authors, or the usage of poetry for therapeutic purposes.
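To make the word-level affective aggregation concrete, here is a hypothetical sketch; the lexicon entries and the mean-pooling rule are assumptions, not the paper's feature set:

```python
# Hypothetical per-word valence scores in [-1, 1]; a real study would use a
# validated affective lexicon rather than these invented values.
AFFECTIVE_LEXICON = {"sueño": 0.4, "muerte": -0.8, "luz": 0.6,
                     "sombra": -0.3, "amor": 0.9}

def gam_score(words):
    """Crude GAM proxy: mean valence over the words covered by the lexicon."""
    scored = [AFFECTIVE_LEXICON[w] for w in words if w in AFFECTIVE_LEXICON]
    return sum(scored) / len(scored) if scored else 0.0

verse = ["amor", "de", "luz", "y", "sombra"]
print(gam_score(verse))  # -> 0.4
```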
6. Automatic Personality Prediction; an Enhanced Method Using Ensemble Modeling [PDF] [Back to Contents]
Majid Ramezani, Mohammad-Reza Feizi-Derakhshi, Mohammad-Ali Balafar, Meysam Asgari-Chenaghlu, Ali-Reza Feizi-Derakhshi, Narjes Nikzad-Khasmakhi, Mehrdad Ranjbar-Khadivi, Zoleikha Jahanbakhsh-Nagadeh, Elnaz Zafarani-Moattar, Taymaz Rahkar-Farshi
Abstract: Human personality is significantly represented by the words a person uses in speech or writing. As a consequence of the spread of information infrastructures (specifically the Internet and social media), human communication has shifted notably away from face-to-face communication. Generally, Automatic Personality Prediction (or Perception) (APP) is the automated forecasting of personality from different types of human-generated/exchanged content (such as text, speech, images, video, etc.). The major objective of this study is to enhance the accuracy of APP from text. To this end, we suggest five new APP methods: term frequency vector-based, ontology-based, enriched ontology-based, latent semantic analysis (LSA)-based, and deep learning-based (BiLSTM) methods. These base methods contribute to each other to enhance APP accuracy through ensemble modeling (stacking) based on a hierarchical attention network (HAN) as the meta-model. The results show that ensemble modeling enhances the accuracy of APP.
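The stacking idea can be sketched as follows; synthetic base-model outputs and a logistic-regression meta-model stand in for the paper's five methods and its HAN meta-model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_texts, n_base = 200, 5  # five base APP methods scoring each text

# Hypothetical per-text probabilities from the base methods (term-frequency,
# ontology, enriched ontology, LSA, BiLSTM); random numbers stand in here.
base_outputs = rng.random((n_texts, n_base))
labels = (base_outputs.mean(axis=1) > 0.5).astype(int)  # synthetic trait labels

# Stacking: the meta-model learns to combine the base predictions.
meta = LogisticRegression().fit(base_outputs[:150], labels[:150])
print("meta-model accuracy:", meta.score(base_outputs[150:], labels[150:]))
```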
7. Less is More: Rejecting Unreliable Reviews for Product Question Answering [PDF] [Back to Contents]
Shiwei Zhang, Xiuzhen Zhang, Jey Han Lau, Jeffrey Chan, Cecile Paris
Abstract: Promptly and accurately answering questions on products is important for e-commerce applications. Manually answering product questions (e.g. on community question answering platforms) results in slow response and does not scale. Recent studies show that product reviews are a good source for real-time, automatic product question answering (PQA). In the literature, PQA is formulated as a retrieval problem with the goal to search for the most relevant reviews to answer a given product question. In this paper, we focus on the issue of answerability and answer reliability for PQA using reviews. Our investigation is based on the intuition that many questions may not be answerable with a finite set of reviews. When a question is not answerable, a system should return nil answers rather than providing a list of irrelevant reviews, which can have significant negative impact on user experience. Moreover, for answerable questions, only the most relevant reviews that answer the question should be included in the result. We propose a conformal prediction based framework to improve the reliability of PQA systems, where we reject unreliable answers so that the returned results are more concise and accurate at answering the product question, including returning nil answers for unanswerable questions. Experiments on a widely used Amazon dataset show encouraging results of our proposed framework. More broadly, our results demonstrate a novel and effective application of conformal methods to a retrieval task.
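A schematic of the rejection mechanism, with invented nonconformity scores: a threshold is calibrated on held-out question-review pairs, and a nil answer is returned when no candidate review passes it:

```python
import numpy as np

def calibrate_threshold(calib_scores, alpha=0.1):
    # Nonconformity scores of known-good answers; keep the (1 - alpha) quantile.
    return np.quantile(calib_scores, 1 - alpha)

def answer(candidate_scores, threshold):
    # candidate_scores: nonconformity of each retrieved review (lower = better).
    accepted = [i for i, s in enumerate(candidate_scores) if s <= threshold]
    return accepted or None  # nil answer when no review is reliable enough

calib = np.array([0.20, 0.30, 0.25, 0.40, 0.35, 0.50])
tau = calibrate_threshold(calib)
print(answer([0.9, 0.8, 0.95], tau))  # -> None: treated as unanswerable
print(answer([0.1, 0.7, 0.3], tau))   # -> [0, 2]: only reliable reviews kept
```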
8. Discourse Coherence, Reference Grounding and Goal Oriented Dialogue [PDF] [Back to Contents]
Baber Khalid, Malihe Alikhani, Michael Fellner, Brian McMahan, Matthew Stone
Abstract: Prior approaches to realizing mixed-initiative human--computer referential communication have adopted information-state or collaborative problem-solving approaches. In this paper, we argue for a new approach, inspired by coherence-based models of discourse such as SDRT \cite{asher-lascarides:2003a}, in which utterances attach to an evolving discourse structure and the associated knowledge graph of speaker commitments serves as an interface to real-world reasoning and conversational strategy. As first steps towards implementing the approach, we describe a simple dialogue system in a referential communication domain that accumulates constraints across discourse, interprets them using a learned probabilistic model, and plans clarification using reinforcement learning.
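The constraint-accumulation idea is easy to illustrate in a toy referential domain (the objects and predicates here are invented): each utterance adds a constraint, and the candidate referents shrink accordingly:

```python
# Toy referential domain; the objects and predicates are invented.
objects = [
    {"id": 1, "color": "red", "shape": "circle"},
    {"id": 2, "color": "red", "shape": "square"},
    {"id": 3, "color": "blue", "shape": "circle"},
]

constraints = []  # accumulates across the discourse

def interpret(predicate):
    """Attach a new constraint and return the referents still compatible."""
    constraints.append(predicate)
    return [o for o in objects if all(c(o) for c in constraints)]

print(interpret(lambda o: o["color"] == "red"))     # two candidates remain
print(interpret(lambda o: o["shape"] == "circle"))  # constraints compose: one left
# If more than one candidate remained, a learned policy could plan a
# clarification question at this point.
```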
9. DeepSinger: Singing Voice Synthesis with Data Mined From the Web [PDF] [Back to Contents]
Yi Ren, Xu Tan, Tao Qin, Jian Luan, Zhou Zhao, Tie-Yan Liu
Abstract: In this paper, we develop DeepSinger, a multi-lingual multi-singer singing voice synthesis (SVS) system, which is built from scratch using singing training data mined from music websites. The pipeline of DeepSinger consists of several steps, including data crawling, singing and accompaniment separation, lyrics-to-singing alignment, data filtration, and singing modeling. Specifically, we design a lyrics-to-singing alignment model to automatically extract the duration of each phoneme in lyrics, starting from coarse-grained sentence level and moving to fine-grained phoneme level, and further design a multi-lingual multi-singer singing model based on a feed-forward Transformer to directly generate linear spectrograms from lyrics, synthesizing voices using Griffin-Lim. DeepSinger has several advantages over previous SVS systems: 1) to the best of our knowledge, it is the first SVS system that directly mines training data from music websites; 2) the lyrics-to-singing alignment model further avoids any human effort for alignment labeling and greatly reduces labeling cost; 3) the singing model based on a feed-forward Transformer is simple and efficient, removing the complicated acoustic feature modeling of parametric synthesis and leveraging a reference encoder to capture the timbre of a singer from noisy singing data; and 4) it can synthesize singing voices in multiple languages and for multiple singers. We evaluate DeepSinger on our mined singing dataset, which consists of about 92 hours of data from 89 singers in three languages (Chinese, Cantonese and English). The results demonstrate that with singing data purely mined from the Web, DeepSinger can synthesize high-quality singing voices in terms of both pitch accuracy and voice naturalness (footnote: Our audio samples are shown in this https URL.)
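The final vocoding step can be sketched with librosa's Griffin-Lim implementation; a synthetic tone stands in for the Transformer-generated linear spectrogram, and the STFT parameters below are arbitrary choices:

```python
import numpy as np
import librosa

sr = 22050
t = np.linspace(0, 1.0, sr, endpoint=False)
wave = 0.5 * np.sin(2 * np.pi * 220 * t)  # stand-in for a "singing" signal

# Linear magnitude spectrogram, the representation DeepSinger's model emits.
S = np.abs(librosa.stft(wave, n_fft=1024, hop_length=256))

# Griffin-Lim iteratively estimates the missing phase and returns a waveform.
recovered = librosa.griffinlim(S, n_iter=32, hop_length=256, n_fft=1024)
print(recovered.shape)
```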
10. Cultural Cartography with Word Embeddings [PDF] [Back to Contents]
Dustin S. Stoltz, Marshall A. Taylor
Abstract: Using the presence or frequency of keywords is a classic approach in the formal analysis of text, but has the drawback of glossing over the relationality of word meanings. Word embedding models overcome this problem by constructing a standardized meaning space where words are assigned a location based on relations of similarity to, and difference from, other words based on how they are used in natural language samples. We show how word embeddings can be put to the task of interpretation via two kinds of navigation. First, one can hold terms constant and measure how the embedding space moves around them--much like astronomers measured the changing of celestial bodies with the seasons. Second, one can also hold the embedding space constant and see how documents or authors move relative to it--just as ships use the stars on a given night to determine their location. Using the empirical case of immigration discourse in the United States, we demonstrate the merits of these two broad strategies to advance formal approaches to cultural analysis.
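The "hold the space constant" strategy reduces to simple vector arithmetic: average a document's word vectors, then compare the result to anchor terms. The tiny embeddings below are invented stand-ins for a pre-trained space:

```python
import numpy as np

vectors = {  # hypothetical pre-trained 2-d embeddings
    "border":   np.array([0.9, 0.1]),
    "security": np.array([0.8, 0.2]),
    "family":   np.array([0.1, 0.9]),
    "asylum":   np.array([0.2, 0.8]),
}

def doc_vector(tokens):
    """A document's position: the mean of its known word vectors."""
    known = [vectors[t] for t in tokens if t in vectors]
    return np.mean(known, axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

doc = ["the", "asylum", "family", "border"]
d = doc_vector(doc)
for anchor in ("security", "family"):
    print(anchor, round(cosine(d, vectors[anchor]), 3))
```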
11. IQ-VQA: Intelligent Visual Question Answering [PDF] [Back to Contents]
Vatsal Goel, Mohit Chandak, Ashish Anand, Prithwijit Guha
Abstract: Even though there has been tremendous progress in the field of Visual Question Answering, models today still tend to be inconsistent and brittle. To this end, we propose a model-independent cyclic framework which increases the consistency and robustness of any VQA architecture. We train our models to answer the original question, generate an implication based on the answer, and then also learn to answer the generated implication correctly. As a part of the cyclic framework, we propose a novel implication generator which can generate implied questions from any question-answer pair. As a baseline for future works on consistency, we provide a new human-annotated VQA-Implications dataset. The dataset consists of ~30k questions containing implications of 3 types -- Logical Equivalence, Necessary Condition and Mutual Exclusion -- made from the VQA v2.0 validation dataset. We show that our framework improves the consistency of VQA models by ~15% on the rule-based dataset and ~7% on the VQA-Implications dataset, and robustness by ~2%, without degrading their performance. In addition, we also quantitatively show improvement in attention maps, which highlights better multi-modal understanding of vision and language.
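The cycle of answer, implication, and re-answer can be sketched with stub components; both the model and the implication generator below are placeholders for the learned modules:

```python
def model_answer(question):
    """Stub VQA model with canned answers (a placeholder for a real network)."""
    return {"How many dogs are there?": "2",
            "Are there more than 1 dogs?": "yes"}.get(question, "unknown")

def generate_implication(question, answer):
    """Stub implication generator: one hand-written Necessary Condition rule."""
    if question == "How many dogs are there?" and answer == "2":
        return ("Are there more than 1 dogs?", "yes")
    return None

q = "How many dogs are there?"
imp = generate_implication(q, model_answer(q))
if imp:
    implied_q, expected = imp
    # Consistency check: the model should also answer the implication correctly.
    print("consistent:", model_answer(implied_q) == expected)
```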