
[arXiv Papers] Computation and Language 2020-06-18

Contents

1. On the Learnability of Concepts: With Applications to Comparing Word Embedding Algorithms [PDF] Abstract
2. Fine-grained Sentiment Controlled Text Generation [PDF] Abstract
3. An Exploratory Study of Argumentative Writing by Young Students: A Transformer-based Approach [PDF] Abstract
4. Improving unsupervised neural aspect extraction for online discussions using out-of-domain classification [PDF] Abstract
5. A Tweet-based Dataset for Company-Level Stock Return Prediction [PDF] Abstract
6. Automatically Ranked Russian Paraphrase Corpus for Text Generation [PDF] Abstract
7. Exploiting Review Neighbors for Contextualized Helpfulness Prediction [PDF] Abstract
8. Iterative Edit-Based Unsupervised Sentence Simplification [PDF] Abstract
9. Building Low-Resource NER Models Using Non-Speaker Annotation [PDF] Abstract
10. Canonicalizing Open Knowledge Bases with Multi-Layered Meta-Graph Neural Network [PDF] Abstract
11. Modeling subjective assessments of guilt in newspaper crime narratives [PDF] Abstract
12. Cross-lingual Retrieval for Iterative Self-Supervised Training [PDF] Abstract
13. EPIE Dataset: A Corpus For Possible Idiomatic Expressions [PDF] Abstract
14. Selective Question Answering under Domain Shift [PDF] Abstract
15. The Role of Verb Semantics in Hungarian Verb-Object Order [PDF] Abstract
16. Conversational Neuro-Symbolic Commonsense Reasoning [PDF] Abstract
17. A novel sentence embedding based topic detection method for micro-blog [PDF] Abstract
18. Contrastive Learning for Weakly Supervised Phrase Grounding [PDF] Abstract
19. De-Anonymizing Text by Fingerprinting Language Generation [PDF] Abstract
20. CO-Search: COVID-19 Information Retrieval with Semantic Search, Question Answering, and Abstractive Summarization [PDF] Abstract

Abstracts

1. On the Learnability of Concepts: With Applications to Comparing Word Embedding Algorithms [PDF] Back to Contents
  Adam Sutton, Nello Cristianini
Abstract: Word embeddings are widely used in many Natural Language Processing (NLP) applications. They are coordinates associated with each word in a dictionary, inferred from the statistical properties of these words in a large corpus. In this paper we introduce the notion of a "concept" as a list of words that share semantic content. We use this notion to analyse the learnability of certain concepts, defined as the ability of a classifier to recognise unseen members of a concept after training on a random subset of it. We first use this method to measure the learnability of concepts on pretrained word embeddings. We then develop a statistical analysis of concept learnability, based on hypothesis testing and ROC curves, in order to compare the relative merits of various embedding algorithms using a fixed corpus and hyperparameters. We find that all embedding methods capture the semantic content of those word lists, but fastText performs better than the others.
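
The evaluation protocol is compact enough to sketch. Below is a minimal, hypothetical version: `embeddings` stands in for a pretrained lookup (fastText vectors would be loaded from disk in practice), positives are the concept's word list, negatives are drawn from the rest of the vocabulary, and learnability is reported as ROC AUC on held-out members.

```python
# Minimal sketch of the concept-learnability protocol; the embeddings and
# word lists below are synthetic stand-ins, not the paper's data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
vocab = [f"word{i}" for i in range(1000)]
embeddings = {w: rng.normal(size=100) for w in vocab}  # pretrained in practice

concept = vocab[:50]        # words assumed to share semantic content
negatives = vocab[50:]      # the rest of the vocabulary

def learnability(concept, negatives, train_frac=0.5):
    """Train on a random subset of the concept; score unseen members."""
    pos = list(rng.permutation(concept))
    neg = list(rng.permutation(negatives))
    k = int(len(pos) * train_frac)
    X_train = np.stack([embeddings[w] for w in pos[:k] + neg[:k]])
    y_train = np.array([1] * k + [0] * k)
    X_test = np.stack([embeddings[w] for w in pos[k:] + neg[k:2 * k]])
    y_test = np.array([1] * (len(pos) - k) + [0] * k)
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

print(f"concept learnability (ROC AUC): {learnability(concept, negatives):.3f}")
```

With random embeddings this scores near 0.5; with real pretrained vectors and a genuine concept list, the AUC gap between embedding algorithms is the quantity the paper compares.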

2. Fine-grained Sentiment Controlled Text Generation [PDF] Back to Contents
  Bidisha Samanta, Mohit Agarwal, Niloy Ganguly
Abstract: Controlled text generation techniques aim to regulate specific attributes (e.g. sentiment) while preserving the attribute-independent content. State-of-the-art approaches model the specified attribute as a structured or discrete representation while making the content representation independent of it to achieve better control. However, disentangling the text representation into separate latent spaces overlooks complex dependencies between content and attribute, leading to the generation of poorly constructed and less meaningful sentences. Moreover, such an approach fails to provide finer control over the degree of attribute change. To address these problems of controlled text generation, in this paper we propose DE-VAE, a hierarchical framework which captures both an information-enriched entangled representation and an attribute-specific disentangled representation in different hierarchies. DE-VAE achieves better control of sentiment as an attribute while preserving the content by learning a suitable lossless transformation network from the disentangled sentiment space to the desired entangled representation. Through feature supervision on a single dimension of the disentangled representation, DE-VAE maps the variation of sentiment to a continuous space, which helps in smoothly regulating sentiment from positive to negative and vice versa. Detailed experiments on three publicly available review datasets show the superiority of DE-VAE over recent state-of-the-art approaches.

3. An Exploratory Study of Argumentative Writing by Young Students: A Transformer-based Approach [PDF] Back to Contents
  Debanjan Ghosh, Beata Beigman Klebanov, Yi Song
Abstract: We present a computational exploration of argument critique writing by young students. Middle school students were asked to criticize an argument presented in the prompt, focusing on identifying and explaining the reasoning flaws. This task resembles an established college-level argument critique task. Lexical and discourse features that utilize detailed domain knowledge to identify critiques exist for the college task but do not perform well on the young students' data. Instead, a transformer-based architecture (e.g., BERT) fine-tuned on a large corpus of critique essays from the college task performs much better (over 20% improvement in F1 score). Analysis of the performance of various configurations of the system suggests that while children's writing does not exhibit the standard discourse structure of an argumentative essay, it does share basic local sequential structures with the writing of more mature authors.
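
The modeling side is standard fine-tuning of a pretrained transformer for sentence- or essay-level classification. A minimal sketch with Hugging Face Transformers follows; the model name, texts, labels, and hyperparameters are illustrative stand-ins rather than the paper's configuration.

```python
# Hypothetical fine-tuning sketch: classify whether a response contains a
# valid critique of the prompt's argument (the labels here are made up).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

texts = ["The author assumes one survey speaks for all students.",
         "I think the essay was nice."]
labels = torch.tensor([1, 0])          # 1 = identifies a reasoning flaw

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):   # a few steps; real training loops over the full corpus
    out = model(**batch, labels=labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
print(f"loss: {out.loss.item():.4f}")
```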

4. Improving unsupervised neural aspect extraction for online discussions using out-of-domain classification [PDF] Back to Contents
  Anton Alekseev, Elena Tutubalina, Valentin Malykh, Sergey Nikolenko
Abstract: Deep learning architectures based on self-attention have recently achieved and surpassed state-of-the-art results in the task of unsupervised aspect extraction and topic modeling. While models such as neural attention-based aspect extraction (ABAE) have been successfully applied to user-generated texts, they are less coherent when applied to traditional data sources such as news articles and newsgroup documents. In this work, we introduce a simple approach based on sentence filtering in order to improve topical aspects learned from newsgroup-based content without modifying the basic mechanism of ABAE. We train a probabilistic classifier to distinguish between out-of-domain texts (outer dataset) and in-domain texts (target dataset). Then, during data preparation we filter out sentences that have a low probability of being in-domain and train the neural model on the remaining sentences. The positive effect of sentence filtering on topic coherence is demonstrated in comparison to aspect extraction models trained on unfiltered texts.
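
The filtering step itself is simple: fit a probabilistic classifier to separate target-domain sentences from outer-domain ones, then keep only target sentences with a high in-domain probability before training the aspect model. The toy corpora, TF-IDF features, and 0.5 cutoff below are illustrative assumptions, not the paper's configuration.

```python
# Sketch of out-of-domain sentence filtering before aspect extraction.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

target = ["the patch fixes a kernel panic on boot",      # newsgroup-like
          "thanks, see you at the meetup",
          "which compiler flags are you using?"]
outer = ["the senate passed the bill on friday",          # out-of-domain
         "stocks fell sharply after the announcement"]

vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(target + outer),
                               [1] * len(target) + [0] * len(outer))

THRESHOLD = 0.5   # tunable cutoff; not a value taken from the paper
kept = [s for s in target
        if clf.predict_proba(vec.transform([s]))[0, 1] >= THRESHOLD]
print(kept)       # sentences passed on to ABAE training
```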

5. A Tweet-based Dataset for Company-Level Stock Return Prediction [PDF] Back to Contents
  Karolina Sowinska, Pranava Madhyastha
Abstract: Public opinion influences events, especially those related to stock market movement, in which a subtle hint can influence the local outcome of the market. In this paper, we present a dataset that allows for company-level analysis of tweet-based impact on one-, two-, three-, and seven-day stock returns. Our dataset consists of 862,231 labelled instances from Twitter in English; we also release a cleaned subset of 85,176 labelled instances to the community. We also provide baselines using standard machine learning algorithms and a multi-view learning based approach that makes use of different types of features. Our dataset, scripts and models are publicly available at: this https URL.

6. Automatically Ranked Russian Paraphrase Corpus for Text Generation [PDF] Back to Contents
  Vadim Gudkov, Olga Mitrofanova, Elizaveta Filippskikh
Abstract: This article focuses on the automatic development and ranking of a large corpus for Russian paraphrase generation, which proves to be the first corpus of its type in Russian computational linguistics. Existing manually annotated paraphrase datasets for Russian are limited to the small-sized ParaPhraser corpus and ParaPlag, which are suitable for a set of NLP tasks such as paraphrase and plagiarism detection, sentence similarity and relatedness estimation, etc. Due to size restrictions, these datasets can hardly be applied in end-to-end text generation solutions. Meanwhile, paraphrase generation requires a large amount of training data. In our study we propose a solution to the problem: we collect, rank and evaluate a new publicly available headline paraphrase corpus (ParaPhraser Plus), and then perform text generation experiments with manual evaluation on the automatically ranked corpora using the Universal Transformer architecture.

7. Exploiting Review Neighbors for Contextualized Helpfulness Prediction [PDF] Back to Contents
  Jiahua Du, Jia Rong, Hua Wang, Yanchun Zhang
Abstract: Helpfulness prediction techniques have been widely used to identify and recommend high-quality online reviews to customers. Currently, the vast majority of studies assume that a review's helpfulness is self-contained. In practice, however, customers hardly process reviews independently given their sequential nature. The perceived helpfulness of a review is likely to be affected by its sequential neighbors (i.e., context), which has been largely ignored. This paper proposes a new methodology to capture the missing interaction between reviews and their neighbors. The first end-to-end neural architecture is developed for neighbor-aware helpfulness prediction (NAP). For each review, NAP allows for three types of neighbor selection: its preceding, following, and surrounding neighbors. Four weighting schemes are designed to learn context clues from the selected neighbors. A review is then contextualized into the learned clues for neighbor-aware helpfulness prediction. NAP is evaluated on six domains of real-world online reviews against a series of state-of-the-art baselines. Extensive experiments confirm the effectiveness of NAP and the influence of sequential neighbors on a current review. Further hyperparameter analysis reveals three main findings. (1) On average, eight neighbors treated with uneven importance are engaged for context construction. (2) The benefit of neighbor-aware prediction mainly results from closer neighbors. (3) Equally considering up to the five closest neighbors of a review can usually produce a weaker but tolerable prediction result.
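
As one plausible reading of the neighbor-weighting idea, the sketch below concatenates a review's own representation with a distance-decayed average of its surrounding neighbors. The window size and exponential decay are illustrative assumptions; the paper designs four weighting schemes rather than fixing one.

```python
# Hypothetical neighbor-aware contextualization of review representations.
import numpy as np

def contextualize(reviews, i, window=4, decay=0.5):
    """Concatenate review i's vector with a weighted blend of neighbors."""
    ctx, total = np.zeros_like(reviews[i]), 0.0
    for offset in range(1, window + 1):
        for j in (i - offset, i + offset):        # preceding and following
            if 0 <= j < len(reviews):
                w = decay ** offset               # closer neighbors weigh more
                ctx += w * reviews[j]
                total += w
    return np.concatenate([reviews[i], ctx / max(total, 1e-9)])

reviews = [np.random.default_rng(k).normal(size=8) for k in range(10)]
print(contextualize(reviews, i=5).shape)          # (16,): review + context
```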

8. Iterative Edit-Based Unsupervised Sentence Simplification [PDF] Back to Contents
  Dhruv Kumar, Lili Mou, Lukasz Golab, Olga Vechtomova
Abstract: We present a novel iterative, edit-based approach to unsupervised sentence simplification. Our model is guided by a scoring function involving fluency, simplicity, and meaning preservation. Then, we iteratively perform word and phrase-level edits on the complex sentence. Compared with previous approaches, our model does not require a parallel training set, but is more controllable and interpretable. Experiments on Newsela and WikiLarge datasets show that our approach is nearly as effective as state-of-the-art supervised approaches.
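
The search procedure can be sketched as hill climbing under the scoring function. In the toy version below, fluency is a constant placeholder (the paper uses a language model), simplicity rewards shorter sentences, meaning preservation is approximated by token overlap with a hard floor, and only deletion edits are proposed, whereas the paper's edit set is richer.

```python
# Skeleton of iterative edit-based simplification with placeholder scorers.
def fluency(s):
    return 1.0                                   # stand-in for an LM score

def simplicity(s):
    return 1.0 / (1.0 + len(s.split()))          # shorter is simpler

def meaning(orig, s):                            # crude token-overlap proxy
    o = set(orig.split())
    return len(o & set(s.split())) / len(o)

def propose_edits(s):
    """Single-word deletions; the paper also reorders and substitutes."""
    w = s.split()
    for i in range(len(w)):
        yield " ".join(w[:i] + w[i + 1:])

def simplify(orig, meaning_floor=0.7, max_iters=20):
    current = orig
    for _ in range(max_iters):
        candidates = [c for c in propose_edits(current)
                      if meaning(orig, c) >= meaning_floor]
        best = max(candidates, default=None,
                   key=lambda c: fluency(c) * simplicity(c))
        if best is None or (fluency(best) * simplicity(best)
                            <= fluency(current) * simplicity(current)):
            break                                # no improving edit remains
        current = best
    return current

print(simplify("the extremely large and very old house was completely destroyed"))
```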

9. Building Low-Resource NER Models Using Non-Speaker Annotation [PDF] Back to Contents
  Tatiana Tsygankova, Francesca Marini, Stephen Mayhew, Dan Roth
Abstract: In low-resource natural language processing (NLP), the key problem is a lack of training data in the target language. Cross-lingual methods have had notable success in addressing this concern, but in certain common circumstances, such as insufficient pre-training corpora or languages far from the source language, their performance suffers. In this work we propose an alternative approach to building low-resource Named Entity Recognition (NER) models using "non-speaker" (NS) annotations, provided by annotators with no prior experience in the target language. We recruit 30 participants to annotate unfamiliar languages in a carefully controlled annotation experiment, using Indonesian, Russian, and Hindi as target languages. Our results show that use of non-speaker annotators produces results that approach or match performance of fluent speakers. NS results are also consistently on par or better than cross-lingual methods built on modern contextual representations, and have the potential to further outperform with additional effort. We conclude with observations of common annotation practices and recommendations for maximizing non-speaker annotator performance.

10. Canonicalizing Open Knowledge Bases with Multi-Layered Meta-Graph Neural Network [PDF] Back to Contents
  Tianwen Jiang, Tong Zhao, Bing Qin, Ting Liu, Nitesh V. Chawla, Meng Jiang
Abstract: Noun phrases and relational phrases in Open Knowledge Bases are often not canonical, leading to redundant and ambiguous facts. In this work, we integrate structural information (from which tuple, which sentence) and semantic information (semantic similarity) to perform canonicalization. We represent the two types of information as a multi-layered graph: the structural information forms the links across the sentence, relational phrase, and noun phrase layers; the semantic information forms weighted intra-layer links for each layer. We propose a graph neural network model to aggregate the representations of noun phrases and relational phrases through the multi-layered meta-graph structure. Experiments show that our model outperforms existing approaches on public datasets in the general domain.

11. Modeling subjective assessments of guilt in newspaper crime narratives [PDF] Back to Contents
  Elisa Kreiss, Zijian Wang, Christopher Potts
Abstract: Crime reporting is a prevalent form of journalism with the power to shape public perceptions and social policies. How does the language of these reports act on readers? We seek to address this question with the SuspectGuilt Corpus of annotated crime stories from English-language newspapers in the U.S. For SuspectGuilt, annotators read short crime articles and provided text-level ratings concerning the guilt of the main suspect as well as span-level annotations indicating which parts of the story they felt most influenced their ratings. SuspectGuilt thus provides a rich picture of how linguistic choices affect subjective guilt judgments. In addition, we use SuspectGuilt to train and assess predictive models, and show that these models benefit from genre pretraining and joint supervision from the text-level ratings and span-level annotations. Such models might be used as tools for understanding the societal effects of crime reporting.

12. Cross-lingual Retrieval for Iterative Self-Supervised Training [PDF] Back to Contents
  Chau Tran, Yuqing Tang, Xian Li, Jiatao Gu
Abstract: Recent studies have demonstrated the cross-lingual alignment ability of multilingual pretrained language models. In this work, we found that the cross-lingual alignment can be further improved by training seq2seq models on sentence pairs mined using their own encoder outputs. We utilized these findings to develop a new approach -- cross-lingual retrieval for iterative self-supervised training (CRISS), where mining and training processes are applied iteratively, improving cross-lingual alignment and translation ability at the same time. Using this method, we achieved state-of-the-art unsupervised machine translation results on 9 language directions with an average improvement of 2.4 BLEU, and on the Tatoeba sentence retrieval task in the XTREME benchmark on 16 languages with an average improvement of 21.5% in absolute accuracy. Furthermore, CRISS also brings an additional 1.8 BLEU improvement on average compared to mBART, when finetuned on supervised machine translation downstream tasks.
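
The mining half of the loop is similarity search with margin scoring over encoder outputs. The sketch below fakes the encoder outputs with random vectors and uses the ratio-margin criterion common in bitext mining; CRISS's exact scoring and thresholds may differ.

```python
# Sketch of cross-lingual pair mining with a ratio-margin criterion.
import numpy as np

rng = np.random.default_rng(1)
src = rng.normal(size=(5, 16))    # stand-in encoder outputs, language A
tgt = rng.normal(size=(6, 16))    # stand-in encoder outputs, language B

def normalize(m):
    return m / np.linalg.norm(m, axis=1, keepdims=True)

sim = normalize(src) @ normalize(tgt).T           # cosine similarity matrix

def margin(sim, k=3):
    """cos(x, y) divided by the mean of k-NN similarities on both sides."""
    nn_src = np.sort(sim, axis=1)[:, -k:].mean(axis=1, keepdims=True)
    nn_tgt = np.sort(sim, axis=0)[-k:, :].mean(axis=0, keepdims=True)
    return sim / ((nn_src + nn_tgt) / 2)

scores = margin(sim)
i, j = np.unravel_index(scores.argmax(), scores.shape)
print(f"best mined pair: src[{i}] <-> tgt[{j}] (margin {scores[i, j]:.2f})")
# Pairs above a threshold become seq2seq training data; retraining the
# encoder and re-mining closes the iterative loop.
```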

13. EPIE Dataset: A Corpus For Possible Idiomatic Expressions [PDF] Back to Contents
  Prateek Saxena, Soma Paul
Abstract: Idiomatic expressions have always been a bottleneck for language comprehension and natural language understanding, specifically for tasks like Machine Translation (MT). MT systems predominantly produce literal translations of idiomatic expressions, as these expressions do not exhibit generic and linguistically deterministic patterns that can be exploited to comprehend their non-compositional meaning. These expressions occur in the parallel corpora used for training, but due to the comparatively high occurrence of the constituent words of idiomatic expressions in literal contexts, the idiomatic meaning gets overpowered by the compositional meaning of the expression. State-of-the-art metaphor detection systems are able to detect non-compositional usage at the word level but miss out on idiosyncratic phrasal idiomatic expressions. This creates a dire need for a dataset with wider coverage and a higher occurrence of commonly occurring idiomatic expressions, the spans of which can be used for metaphor detection. With this in mind, we present our English Possible Idiomatic Expressions (EPIE) corpus containing 25,206 sentences labelled with lexical instances of 717 idiomatic expressions. These spans also cover literal usages for the given set of idiomatic expressions. We also demonstrate the utility of our dataset by using it to train a sequence labelling module and testing on three independent datasets with high accuracy, precision and recall scores.

14. Selective Question Answering under Domain Shift [PDF] Back to Contents
  Amita Kamath, Robin Jia, Percy Liang
Abstract: To avoid giving wrong answers, question answering (QA) models need to know when to abstain from answering. Moreover, users often ask questions that diverge from the model's training data, making errors more likely and thus abstention more critical. In this work, we propose the setting of selective question answering under domain shift, in which a QA model is tested on a mixture of in-domain and out-of-domain data, and must answer (i.e., not abstain on) as many questions as possible while maintaining high accuracy. Abstention policies based solely on the model's softmax probabilities fare poorly, since models are overconfident on out-of-domain inputs. Instead, we train a calibrator to identify inputs on which the QA model errs, and abstain when it predicts an error is likely. Crucially, the calibrator benefits from observing the model's behavior on out-of-domain data, even if from a different domain than the test data. We combine this method with a SQuAD-trained QA model and evaluate on mixtures of SQuAD and five other QA datasets. Our method answers 56% of questions while maintaining 80% accuracy; in contrast, directly using the model's probabilities only answers 48% at 80% accuracy.
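
The calibrator reduces to a small supervised problem: predict answer correctness from cheap features of the input and the QA model's output, then abstain when predicted correctness falls below a threshold. The features, classifier choice, and threshold below are illustrative assumptions.

```python
# Sketch of a correctness calibrator for selective question answering.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# One row per question, e.g., [QA model's max softmax prob, length / 100].
X = rng.random((200, 2))
y = (X[:, 0] + 0.3 * rng.standard_normal(200) > 0.5).astype(int)  # 1 = correct

calibrator = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

def answer_or_abstain(features, threshold=0.6):
    p_correct = calibrator.predict_proba([features])[0, 1]
    return "answer" if p_correct >= threshold else "abstain"

print(answer_or_abstain([0.9, 0.4]))   # confident input  -> "answer"
print(answer_or_abstain([0.2, 0.4]))   # likely error     -> "abstain"
```

The labels here are synthetic; in the paper, they come from scoring the QA model's predictions on held-out data that mixes in-domain and out-of-domain examples, which is what lets the calibrator learn when the model is overconfident.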

15. The Role of Verb Semantics in Hungarian Verb-Object Order [PDF] Back to Contents
  Dorottya Demszky, László Kálmán, Dan Jurafsky, Beth Levin
Abstract: Hungarian is often referred to as a discourse-configurational language, since the structural position of constituents is determined by their logical function (topic or comment) rather than their grammatical function (e.g., subject or object). We build on work by Komlósy (1989) and argue that in addition to discourse context, the lexical semantics of the verb also plays a significant role in determining Hungarian word order. In order to investigate the role of lexical semantics in determining Hungarian word order, we conduct a large-scale, data-driven analysis of the ordering of 380 transitive verbs and their objects, as observed in hundreds of thousands of examples extracted from the Hungarian Gigaword Corpus. We test the effect of lexical semantics on the ordering of verbs and their objects by grouping verbs into 11 semantic classes. In addition to the semantic class of the verb, we also include two control features related to information structure, object definiteness and object NP weight, chosen to allow a comparison of their effect size to that of verb semantics. Our results suggest that all three features have a significant effect on verb-object ordering in Hungarian, and among these features, the semantic class of the verb has the largest effect. Specifically, we find that stative verbs, such as fed "cover", jelent "mean" and övez "surround", tend to be OV-preferring (with the exception of psych verbs, which are strongly VO-preferring), and non-stative verbs, such as bírál "judge", csökkent "reduce" and csókol "kiss", tend to be VO-preferring. These findings support our hypothesis that lexical semantic factors influence word order in Hungarian.

16. Conversational Neuro-Symbolic Commonsense Reasoning [PDF] Back to Contents
  Forough Arabshahi, Jennifer Lee, Mikayla Gawarecki, Kathryn Mazaitis, Amos Azaria, Tom Mitchell
Abstract: One aspect of human commonsense reasoning is the ability to make presumptions about daily experiences, activities and social interactions with others. We propose a new commonsense reasoning benchmark where the task is to uncover commonsense presumptions implied by imprecisely stated natural language commands in the form of if-then-because statements. For example, in the command "If it snows at night then wake me up early because I don't want to be late for work" the speaker relies on commonsense reasoning of the listener to infer the implicit presumption that it must snow enough to cause traffic slowdowns. Such if-then-because commands are particularly important when users instruct conversational agents. We release a benchmark data set for this task, collected from humans and annotated with commonsense presumptions. We develop a neuro-symbolic theorem prover that extracts multi-hop reasoning chains and apply it to this problem. We further develop an interactive conversational framework that evokes commonsense knowledge from humans for completing reasoning chains.

17. A novel sentence embedding based topic detection method for micro-blog [PDF] Back to Contents
  Cong Wan, Shan Jiang, Cuirong Wang, Cong Wang, Changming Xu, Xianxia Chen, Ying Yuan
Abstract: Topic detection is a challenging task, especially without knowing the exact number of topics. In this paper, we present a novel approach based on neural networks to detect topics in a micro-blogging dataset. We use an unsupervised neural sentence embedding model to map the blogs to an embedding space. Our model is a weighted power mean word embedding model, and the weights are calculated by an attention mechanism. Experimental results show that our embedding method performs better than baselines in sentence clustering. In addition, we propose an improved clustering algorithm referred to as relationship-aware DBSCAN (RADBSCAN). It can discover topics from a micro-blogging dataset, and the number of topics depends on the dataset itself. Moreover, in order to solve the problem of parameter sensitivity, we use blog-forwarding relationships as a bridge between two independent clusters. Finally, we validate our approach on a dataset from Sina micro-blog. The results show that we can successfully detect all the topics and extract keywords for each topic.
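
The sentence encoder can be sketched as a weighted power mean of word vectors with softmax attention weights. The attention scoring below and the use of absolute values (to keep fractional powers real-valued) are simplifying assumptions, not the paper's exact formulation.

```python
# Sketch of a weighted power-mean sentence embedding.
import numpy as np

def weighted_power_mean(word_vecs, weights, p=3.0):
    """Elementwise (sum_i w_i * |v_i|^p)^(1/p), with weights summing to 1."""
    v = np.abs(np.stack(word_vecs))       # abs keeps fractional powers real
    w = np.asarray(weights)[:, None]
    return ((w * v ** p).sum(axis=0)) ** (1.0 / p)

rng = np.random.default_rng(0)
vecs = [rng.normal(size=8) for _ in range(5)]      # one vector per word
scores = np.array([v @ vecs[-1] for v in vecs])    # toy attention scores
weights = np.exp(scores) / np.exp(scores).sum()    # softmax weights

print(weighted_power_mean(vecs, weights, p=1.0))   # p=1: weighted mean of |v|
print(weighted_power_mean(vecs, weights, p=3.0))   # larger p: toward the max
```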

18. Contrastive Learning for Weakly Supervised Phrase Grounding [PDF] Back to Contents
  Tanmay Gupta, Arash Vahdat, Gal Chechik, Xiaodong Yang, Jan Kautz, Derek Hoiem
Abstract: Phrase grounding, the problem of associating image regions to caption words, is a crucial component of vision-language tasks. We show that phrase grounding can be learned by optimizing word-region attention to maximize a lower bound on mutual information between images and caption words. Given pairs of images and captions, we maximize compatibility of the attention-weighted regions and the words in the corresponding caption, compared to non-corresponding pairs of images and captions. A key idea is to construct effective negative captions for learning through language model guided word substitutions. Training with our negatives yields a $\sim10\%$ absolute gain in accuracy over randomly-sampled negatives from the training data. Our weakly supervised phrase grounding model trained on COCO-Captions shows a healthy gain of $5.7\%$ to achieve $76.7\%$ accuracy on Flickr30K Entities benchmark.
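
The objective can be sketched as a cross-entropy over compatibility scores, where the true caption must outscore negative captions whose words were substituted by a language model. The attention-pooled compatibility function and random features below stand in for the paper's trained encoders.

```python
# Sketch of the contrastive objective for weakly supervised grounding.
import torch
import torch.nn.functional as F

def compatibility(word_emb, region_feats):
    """Sum over words of agreement with their attention-weighted regions."""
    attn = torch.softmax(word_emb @ region_feats.T, dim=-1)  # words x regions
    attended = attn @ region_feats                           # words x dim
    return (word_emb * attended).sum()

torch.manual_seed(0)
regions = torch.randn(6, 32)                 # image region features
true_caption = torch.randn(4, 32)            # word embeddings, true caption
negatives = [torch.randn(4, 32) for _ in range(3)]  # LM-substituted captions

scores = torch.stack([compatibility(true_caption, regions)]
                     + [compatibility(c, regions) for c in negatives])
loss = F.cross_entropy(scores.unsqueeze(0), torch.tensor([0]))  # true pair = 0
print(loss.item())
```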

19. De-Anonymizing Text by Fingerprinting Language Generation [PDF] Back to Contents
  Zhen Sun, Roei Schuster, Vitaly Shmatikov
Abstract: Components of machine learning systems are not (yet) perceived as security hotspots. Secure coding practices, such as ensuring that no execution paths depend on confidential inputs, have not yet been adopted by ML developers. We initiate the study of code security of ML systems by investigating how nucleus sampling---a popular approach for generating text, used for applications such as auto-completion---unwittingly leaks texts typed by users. Our main result is that the series of nucleus sizes for many natural English word sequences is a unique fingerprint. We then show how an attacker can infer typed text by measuring these fingerprints via a suitable side channel (e.g., cache access times), explain how this attack could help de-anonymize anonymous texts, and discuss defenses.
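
The fingerprint is straightforward to compute: at each generation step, record how many top-probability tokens are needed before their cumulative probability reaches the nucleus mass p. A sketch with a fake next-token distribution:

```python
# Sketch of extracting a nucleus-size fingerprint from LM distributions.
import numpy as np

def nucleus_size(probs, p=0.9):
    """Number of highest-probability tokens whose mass first reaches p."""
    sorted_probs = np.sort(probs)[::-1]
    return int(np.searchsorted(np.cumsum(sorted_probs), p) + 1)

rng = np.random.default_rng(0)
fingerprint = []
for _ in range(5):                      # one entry per generated token
    logits = rng.normal(size=50)        # stand-in for LM next-token logits
    probs = np.exp(logits) / np.exp(logits).sum()
    fingerprint.append(nucleus_size(probs))
print(fingerprint)   # the per-step nucleus sizes form the fingerprint
```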

20. CO-Search: COVID-19 Information Retrieval with Semantic Search, Question Answering, and Abstractive Summarization [PDF] Back to Contents
  Andre Esteva, Anuprit Kale, Romain Paulus, Kazuma Hashimoto, Wenpeng Yin, Dragomir Radev, Richard Socher
Abstract: The COVID-19 global pandemic has resulted in international efforts to understand, track, and mitigate the disease, yielding a significant corpus of COVID-19 and SARS-CoV-2-related publications across scientific disciplines. As of May 2020, 128,000 coronavirus-related publications have been collected through the COVID-19 Open Research Dataset Challenge. Here we present CO-Search, a retriever-ranker semantic search engine designed to handle complex queries over the COVID-19 literature, potentially aiding overburdened health workers in finding scientific answers during a time of crisis. The retriever is built from a Siamese-BERT encoder that is linearly composed with a TF-IDF vectorizer, and reciprocal-rank fused with a BM25 vectorizer. The ranker is composed of a multi-hop question-answering module, that together with a multi-paragraph abstractive summarizer adjust retriever scores. To account for the domain-specific and relatively limited dataset, we generate a bipartite graph of document paragraphs and citations, creating 1.3 million (citation title, paragraph) tuples for training the encoder. We evaluate our system on the data of the TREC-COVID information retrieval challenge. CO-Search obtains top performance on the datasets of the first and second rounds, across several key metrics: normalized discounted cumulative gain, precision, mean average precision, and binary preference.
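
Reciprocal-rank fusion, named in the abstract as the way retrieval signals are combined, has a standard closed form: each document's fused score is the sum over rankers of 1/(k + rank). The document ids below are hypothetical, and k=60 is the conventional constant rather than a value reported by the paper.

```python
# Sketch of reciprocal-rank fusion over two rankings.
def reciprocal_rank_fusion(rankings, k=60):
    """rankings: lists of doc ids, best first. Returns the fused order."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["d3", "d1", "d2"]     # Siamese-BERT/TF-IDF retriever order
bm25 = ["d1", "d4", "d3"]         # BM25 order
print(reciprocal_rank_fusion([semantic, bm25]))   # ['d1', 'd3', 'd4', 'd2']
```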
