
[arXiv Papers] Computation and Language 2020-09-24

Contents

1. On the Ability of Self-Attention Networks to Recognize Counter Languages [PDF] Abstract
2. A Token-wise CNN-based Method for Sentence Compression [PDF] Abstract
3. A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [PDF] Abstract
4. Crosslingual Topic Modeling with WikiPDA [PDF] Abstract
5. Harnessing Multilinguality in Unsupervised Machine Translation for Rare Languages [PDF] Abstract
6. Hierarchical Pre-training for Sequence Labelling in Spoken Dialog [PDF] Abstract
7. Evolution of Part-of-Speech in Classical Chinese [PDF] Abstract
8. Worst-Case-Aware Curriculum Learning for Zero and Few Shot Transfer [PDF] Abstract
9. Seq2Edits: Sequence Transduction Using Span-level Edit Operations [PDF] Abstract
10. Streamlining Cross-Document Coreference Resolution: Evaluation and Modeling [PDF] Abstract
11. KoBE: Knowledge-Based Machine Translation Evaluation [PDF] Abstract
12. The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets [PDF] Abstract
13. Exploiting Vietnamese Social Media Characteristics for Textual Emotion Recognition in Vietnamese [PDF] Abstract
14. LA-HCN: Label-based Attention for Hierarchical Multi-label Text Classification Neural Network [PDF] Abstract
15. Controlling Style in Generated Dialogue [PDF] Abstract
16. Keeping Up Appearances: Computational Modeling of Face Acts in Persuasion Oriented Discussions [PDF] Abstract
17. Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics [PDF] Abstract
18. Investigating Machine Learning Methods for Language and Dialect Identification of Cuneiform Texts [PDF] Abstract
19. Ghmerti at SemEval-2019 Task 6: A Deep Word- and Character-based Approach to Offensive Language Identification [PDF] Abstract
20. On Data Augmentation for Extreme Multi-label Classification [PDF] Abstract
21. Lifelong Learning Dialogue Systems: Chatbots that Self-Learn On the Job [PDF] Abstract
22. X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers [PDF] Abstract
23. Cosine Similarity of Multimodal Content Vectors for TV Programmes [PDF] Abstract
24. Text Classification with Novelty Detection [PDF] Abstract
25. Message Passing for Hyper-Relational Knowledge Graphs [PDF] Abstract

Abstracts

1. On the Ability of Self-Attention Networks to Recognize Counter Languages [PDF] Back to Contents
  Satwik Bhattamishra, Kabir Ahuja, Navin Goyal
Abstract: Transformers have supplanted recurrent models in a large number of NLP tasks. However, the differences in their abilities to model different syntactic properties remain largely unknown. Past works suggest that LSTMs generalize very well on regular languages and have close connections with counter languages. In this work, we systematically study the ability of Transformers to model such languages as well as the role of their individual components in doing so. We first provide a construction of Transformers for a subclass of counter languages, including well-studied languages such as n-ary Boolean Expressions, Dyck-1, and its generalizations. In experiments, we find that Transformers do well on this subclass, and their learned mechanism strongly correlates with our construction. Perhaps surprisingly, in contrast to LSTMs, Transformers do well only on a subset of regular languages, with performance degrading as we make languages more complex according to a well-known measure of complexity. Our analysis also provides insights into the role of the self-attention mechanism in modeling certain behaviors and the influence of positional encoding schemes on the learning and generalization ability of the model.
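As a concrete reference for the language classes discussed here, Dyck-1 (the language of balanced brackets) is recognizable with a single integer counter, which is exactly what makes it a canonical counter language. A minimal illustration (not code from the paper):

```python
def is_dyck1(s: str) -> bool:
    """Recognize Dyck-1 (balanced brackets) with one counter."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a closing bracket with no open match
                return False
        else:
            return False  # alphabet is just "(" and ")"
    return depth == 0

assert is_dyck1("(()())")
assert not is_dyck1("())(")
```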

2. A Token-wise CNN-based Method for Sentence Compression [PDF] Back to Contents
  Weiwei Hou, Hanna Suominen, Piotr Koniusz, Sabrina Caldwell, Tom Gedeon
Abstract: Sentence compression is a Natural Language Processing (NLP) task aimed at shortening original sentences while preserving their key information. Its applications can benefit many fields; for example, one can build tools for language education. However, current methods are largely based on Recurrent Neural Network (RNN) models, which suffer from poor processing speed. To address this issue, in this paper, we propose a token-wise Convolutional Neural Network, a CNN-based model, along with pre-trained Bidirectional Encoder Representations from Transformers (BERT) features for deletion-based sentence compression. We also compare our model with RNN-based models and fine-tuned BERT. Although one of the RNN-based models marginally outperforms the other models given the same input, our CNN-based model was ten times faster than the RNN-based approach.
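As a sketch of what a token-wise, deletion-based compressor can look like: a 1-D convolution over precomputed BERT token features, emitting keep/delete logits per token. Layer sizes are hypothetical and this is not the authors' implementation:

```python
import torch
import torch.nn as nn

class TokenwiseCNN(nn.Module):
    """Per-token keep/delete classifier over precomputed BERT features."""
    def __init__(self, feat_dim: int = 768, hidden: int = 256):
        super().__init__()
        self.conv = nn.Conv1d(feat_dim, hidden, kernel_size=3, padding=1)
        self.out = nn.Linear(hidden, 2)  # class 0 = delete, 1 = keep

    def forward(self, bert_feats: torch.Tensor) -> torch.Tensor:
        # bert_feats: (batch, seq_len, feat_dim)
        h = self.conv(bert_feats.transpose(1, 2)).relu()
        return self.out(h.transpose(1, 2))  # (batch, seq_len, 2)

logits = TokenwiseCNN()(torch.randn(1, 12, 768))  # dummy features for 12 tokens
print(logits.shape)  # torch.Size([1, 12, 2])
```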

3. A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [PDF] Back to Contents
  Alexander Kalinowski, Yuan An
Abstract: Sentence embeddings encode natural language sentences as low-dimensional dense vectors. A great deal of effort has been put into using sentence embeddings to improve several important natural language processing tasks. Relation extraction is one such NLP task; it aims to identify structured relations defined in a knowledge base from unstructured text. A promising and more efficient approach would be to embed both the text and the structured knowledge in low-dimensional spaces and discover semantic alignments or mappings between them. Although a number of techniques have been proposed in the literature for embedding both sentences and knowledge graphs, little is known about the structural and semantic properties of these embedding spaces in terms of relation extraction. In this paper, we investigate the aforementioned properties by evaluating the extent to which sentences carrying similar senses are embedded in close proximity sub-spaces, and whether we can exploit that structure to align sentences to a knowledge graph. We propose a set of experiments using a widely-used large-scale data set for relation extraction, focusing on a set of key sentence embedding methods. We additionally provide the code for reproducing these experiments at this https URL. These embedding methods cover a wide variety of techniques, ranging from simple word embedding combination to transformer-based BERT-style models. Our experimental results show that different embedding spaces have different degrees of strength for the structural and semantic properties. These results provide useful information for developing embedding-based relation extraction methods.
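The alignment question reduces to nearest-neighbor search in the shared space. A toy illustration with cosine similarity, where random vectors stand in for real sentence and relation embeddings:

```python
import numpy as np

def cosine_align(sent_vecs: np.ndarray, rel_vecs: np.ndarray) -> np.ndarray:
    """Index of the most cosine-similar relation vector for each sentence."""
    s = sent_vecs / np.linalg.norm(sent_vecs, axis=1, keepdims=True)
    r = rel_vecs / np.linalg.norm(rel_vecs, axis=1, keepdims=True)
    return (s @ r.T).argmax(axis=1)

rng = np.random.default_rng(0)
print(cosine_align(rng.normal(size=(5, 64)), rng.normal(size=(3, 64))))
```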

4. Crosslingual Topic Modeling with WikiPDA [PDF] Back to Contents
  Tiziano Piccardi, Robert West
Abstract: We present Wikipedia-based Polyglot Dirichlet Allocation (WikiPDA), a crosslingual topic model that learns to represent Wikipedia articles written in any language as distributions over a common set of language-independent topics. It leverages the fact that Wikipedia articles link to each other and are mapped to concepts in the Wikidata knowledge base, such that, when represented as bags of links, articles are inherently language-independent. WikiPDA works in two steps, by first densifying bags of links using matrix completion and then training a standard monolingual topic model. A human evaluation shows that WikiPDA produces more coherent topics than monolingual text-based LDA, thus offering crosslinguality at no cost. We demonstrate WikiPDA's utility in two applications: a study of topical biases in 28 Wikipedia editions, and crosslingual supervised classification. Finally, we highlight WikiPDA's capacity for zero-shot language transfer, where a model is reused for new languages without any fine-tuning.
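The second step (a standard monolingual topic model over bags of links) can be sketched with gensim's LDA. The Wikidata IDs below are arbitrary stand-ins, and the matrix-completion densification step is omitted:

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Toy "bags of links": each article is the set of Wikidata concepts it links to.
bags = [
    ["Q937", "Q7186", "Q21198"],
    ["Q937", "Q21198", "Q11660"],
    ["Q5994", "Q1344", "Q235858"],
]
vocab = Dictionary(bags)
corpus = [vocab.doc2bow(bag) for bag in bags]
lda = LdaModel(corpus, num_topics=2, id2word=vocab, random_state=0)
print(lda.get_document_topics(corpus[0]))  # language-independent topic mixture
```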

5. Harnessing Multilinguality in Unsupervised Machine Translation for Rare Languages [PDF] Back to Contents
  Xavier Garcia, Aditya Siddhant, Orhan Firat, Ankur P. Parikh
Abstract: Unsupervised translation has reached impressive performance on resource-rich language pairs such as English-French and English-German. However, early studies have shown that in more realistic settings involving low-resource, rare languages, unsupervised translation performs poorly, achieving less than 3.0 BLEU. In this work, we show that multilinguality is critical to making unsupervised systems practical for low-resource settings. In particular, we present a single model for 5 low-resource languages (Gujarati, Kazakh, Nepali, Sinhala, and Turkish) to and from English directions, which leverages monolingual and auxiliary parallel data from other high-resource language pairs via a three-stage training scheme. We outperform all current state-of-the-art unsupervised baselines for these languages, achieving gains of up to 14.4 BLEU. Additionally, we outperform a large collection of supervised WMT submissions for various language pairs as well as match the performance of the current state-of-the-art supervised model for Nepali-English. We conduct a series of ablation studies to establish the robustness of our model under different degrees of data quality, as well as to analyze the factors which led to the superior performance of the proposed approach over traditional unsupervised models.

6. Hierarchical Pre-training for Sequence Labelling in Spoken Dialog [PDF] Back to Contents
  Emile Chapuis, Pierre Colombo, Matteo Manica, Matthieu Labeau, Chloe Clavel
Abstract: Sequence labelling tasks like Dialog Act and Emotion/Sentiment identification are a key component of spoken dialog systems. In this work, we propose a new approach to learn generic representations adapted to spoken dialog, which we evaluate on a new benchmark we call the Sequence labellIng evaLuatIon benChmark fOr spoken laNguagE (SILICONE). SILICONE is model-agnostic and contains 10 different datasets of various sizes. We obtain our representations with a hierarchical encoder based on transformer architectures, for which we extend two well-known pre-training objectives. Pre-training is performed on OpenSubtitles: a large corpus of spoken dialog containing over 2.3 billion tokens. We demonstrate how hierarchical encoders achieve competitive results with consistently fewer parameters compared to state-of-the-art models, and we show their importance for both pre-training and fine-tuning.

7. Evolution of Part-of-Speech in Classical Chinese [PDF] Back to Contents
  Bai Li
Abstract: Classical Chinese is a language notable for its word class flexibility: the same word may often be used as a noun or a verb. Bisang (2008) claimed that Classical Chinese is a precategorical language, where the syntactic position of a word determines its part-of-speech category. In this paper, we apply entropy-based metrics to evaluate these claims on historical corpora. We further explore differences between nouns and verbs in Classical Chinese: using psycholinguistic norms, we find a positive correlation between concreteness and noun usage. Finally, we align character embeddings from Classical and Modern Chinese, and find that verbs undergo more semantic change than nouns.
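An entropy-based flexibility metric of this kind can be computed directly from a word's part-of-speech counts; the counts below are hypothetical:

```python
import math
from collections import Counter

def pos_entropy(tag_counts: Counter) -> float:
    """Shannon entropy (bits) of a word's POS distribution:
    0 for a word always used as one class, higher for flexible words."""
    total = sum(tag_counts.values())
    return -sum((c / total) * math.log2(c / total) for c in tag_counts.values())

print(pos_entropy(Counter({"NOUN": 30, "VERB": 30})))  # 1.0: maximally flexible
print(pos_entropy(Counter({"NOUN": 60})))              # -0.0, i.e. rigid
```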

8. Worst-Case-Aware Curriculum Learning for Zero and Few Shot Transfer [PDF] Back to Contents
  Sheng Zhang, Xin Zhang, Weiming Zhang, Anders Søgaard
Abstract: Multi-task transfer learning based on pre-trained language encoders achieves state-of-the-art performance across a range of tasks. Standard approaches implicitly assume that the tasks for which we have training data are equally representative of the tasks we are interested in, an assumption that is often hard to justify. This paper presents a more agnostic approach to multi-task transfer learning, which uses automated curriculum learning to minimize a new family of worst-case-aware losses across tasks. Not only do these losses lead to better performance on outlier tasks; they also lead to better performance in zero-shot and few-shot transfer settings.
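The simplest member of such a worst-case-aware family is a minimax objective: optimize the task with the highest current loss instead of the average. A sketch for intuition only; the paper's automated curriculum is learned rather than this hard max:

```python
import torch

task_losses = torch.tensor([0.4, 1.7, 0.9])  # current loss on each task

mean_objective = task_losses.mean()  # standard multi-task objective (~1.0)
worst_objective = task_losses.max()  # worst-case-aware: focus on the hardest task (~1.7)

print(mean_objective.item(), worst_objective.item())
```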

9. Seq2Edits: Sequence Transduction Using Span-level Edit Operations [PDF] Back to Contents
  Felix Stahlberg, Shankar Kumar
Abstract: We propose Seq2Edits, an open-vocabulary approach to sequence editing for natural language processing (NLP) tasks with a high degree of overlap between input and output texts. In this approach, each sequence-to-sequence transduction is represented as a sequence of edit operations, where each operation either replaces an entire source span with target tokens or keeps it unchanged. We evaluate our method on five NLP tasks (text normalization, sentence fusion, sentence splitting & rephrasing, text simplification, and grammatical error correction) and report competitive results across the board. For grammatical error correction, our method speeds up inference by up to 5.2x compared to full sequence models because inference time depends on the number of edits rather than the number of target tokens. For text normalization, sentence fusion, and grammatical error correction, our approach improves explainability by associating each edit operation with a human-readable tag.
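The edit-operation representation can be pictured as (operation, source span, replacement) tuples. The schema below is illustrative; the paper additionally tags each edit, e.g. with an error type for grammatical error correction:

```python
source = ["He", "go", "to", "school", "yesterday"]
edits = [
    ("KEEP", (0, 1), None),         # "He"
    ("REPLACE", (1, 2), ["went"]),  # "go" -> "went"
    ("KEEP", (2, 5), None),         # "to school yesterday"
]

def apply_edits(tokens, edits):
    out = []
    for op, (i, j), repl in edits:
        out.extend(tokens[i:j] if op == "KEEP" else repl)
    return out

print(" ".join(apply_edits(source, edits)))  # He went to school yesterday
```

Because inference emits one step per edit rather than per target token, a mostly-correct sentence needing two edits costs two steps regardless of its length, which is where the reported speed-up comes from.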

10. Streamlining Cross-Document Coreference Resolution: Evaluation and Modeling [PDF] Back to Contents
  Arie Cattan, Alon Eirew, Gabriel Stanovsky, Mandar Joshi, Ido Dagan
Abstract: Recent evaluation protocols for Cross-document (CD) coreference resolution have often been inconsistent or lenient, leading to incomparable results across works and overestimation of performance. To facilitate proper future research on this task, our primary contribution is proposing a pragmatic evaluation methodology which assumes access to only raw text -- rather than assuming gold mentions, disregards singleton prediction, and addresses typical targeted settings in CD coreference resolution. Aiming to set baseline results for future research that would follow our evaluation methodology, we build the first end-to-end model for this task. Our model adapts and extends recent neural models for within-document coreference resolution to address the CD coreference setting, which outperforms state-of-the-art results by a significant margin.

11. KoBE: Knowledge-Based Machine Translation Evaluation [PDF] Back to Contents
  Zorik Gekhman, Roee Aharoni, Genady Beryozkin, Markus Freitag, Wolfgang Macherey
Abstract: We propose a simple and effective method for machine translation evaluation which does not require reference translations. Our approach is based on (1) grounding the entity mentions found in each source sentence and candidate translation against a large-scale multilingual knowledge base, and (2) measuring the recall of the grounded entities found in the candidate vs. those found in the source. Our approach achieves the highest correlation with human judgements on 9 out of the 18 language pairs from the WMT19 benchmark for evaluation without references, which is the largest number of wins for a single evaluation method on this task. On 4 language pairs, we also achieve higher correlation with human judgements than BLEU. To foster further research, we release a dataset containing 1.8 million grounded entity mentions across 18 language pairs from the WMT19 metrics track data.
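The core signal is a recall over grounded entity IDs; real grounding against a large multilingual KB is far more involved, but the measurement itself is simple (hypothetical IDs):

```python
def grounded_entity_recall(source_entities: set, candidate_entities: set) -> float:
    """Fraction of KB entities grounded in the source that also appear
    in the candidate translation."""
    if not source_entities:
        return 0.0
    return len(source_entities & candidate_entities) / len(source_entities)

print(grounded_entity_recall({"Q90", "Q142"}, {"Q90"}))  # 0.5
```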

12. The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets [PDF] Back to Contents
  Oana-Maria Camburu, Eleonora Giunchiglia, Jakob Foerster, Thomas Lukasiewicz, Phil Blunsom
Abstract: For neural models to garner widespread public trust and ensure fairness, we must have human-intelligible explanations for their predictions. Recently, an increasing number of works focus on explaining the predictions of neural models in terms of the relevance of the input features. In this work, we show that feature-based explanations pose problems even for explaining trivial models. We show that, in certain cases, there exist at least two ground-truth feature-based explanations, and that, sometimes, neither of them is enough to provide a complete view of the decision-making process of the model. Moreover, we show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations, despite the apparently implicit assumption that explainers should look for one specific feature-based explanation. These findings bring an additional dimension to consider in both developing and choosing explainers.
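The tension shows up already in a two-feature OR-like model: each feature alone is a minimal sufficient subset, yet Shapley values split the credit evenly, so the two explainer families disagree about what "the" explanation is. Exact Shapley values are tractable at this toy scale:

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values of a set function `value`; exponential cost,
    fine for toy models only."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(set(S) | {f}) - value(set(S)))
        phi[f] = total
    return phi

# OR-like model: the output is 1 as soon as either feature is present.
v = lambda S: 1.0 if S else 0.0
print(shapley_values(["x1", "x2"], v))  # {'x1': 0.5, 'x2': 0.5}
```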

13. Exploiting Vietnamese Social Media Characteristics for Textual Emotion Recognition in Vietnamese [PDF] Back to Contents
  Khang Phuoc-Quy Nguyen, Kiet Van Nguyen
Abstract: Textual emotion recognition has been a promising research topic in recent years. Many researchers have tried to build a perfect automated system capable of detecting correct human emotions from text data. In this paper, we conducted several experiments to show how data pre-processing affects a machine learning method on textual emotion recognition. These experiments were performed on the benchmark dataset Vietnamese Social Media Emotion Corpus (UIT-VSMEC). We explored Vietnamese social media characteristics to clean the data, and then we extracted essential phrases that are likely to contain emotional context. Our experimental evaluation shows that with appropriate pre-processing techniques, Multinomial Logistic Regression (MLR) achieves the best F1-score of 64.40%, a significant improvement of 4.66% over the CNN model built by the authors of UIT-VSMEC (59.74%).
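For reference, an MLR baseline of this kind is a few lines of scikit-learn. The English toy examples merely stand in for the Vietnamese corpus; per the paper, the gains come from the language-specific pre-processing rather than the classifier:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["i am so happy today", "this is terrible", "what a joyful day", "i hate this"]
labels = ["joy", "anger", "joy", "anger"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)
print(clf.predict(["such a happy moment"]))  # ['joy']
```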

14. LA-HCN: Label-based Attention for Hierarchical Multi-label Text Classification Neural Network [PDF] Back to Contents
  Xinyi Zhang, Jiahao Xu, Charlie Soh, Lihui Chen
Abstract: Hierarchical multi-label text classification (HMTC) problems have become popular recently because of their practicality. Most existing algorithms for HMTC focus on the design of classifiers, and are largely referred to as local, global, or a combination of local/global approaches. However, a few studies have started exploring hierarchical feature extraction based on the label hierarchy associated with text in HMTC. In this paper, a Neural network-based method called LA-HCN is proposed, in which a novel Label-based Attention module is designed to hierarchically extract important information from the text based on different labels. Besides, local and global document embeddings are separately generated to support the respective local and global classifications. In our experiments, LA-HCN achieves the top performance on the four public HMTC datasets when compared with other neural network-based state-of-the-art algorithms. The comparison between LA-HCN and its variants also demonstrates the effectiveness of the proposed label-based attention module as well as the use of the combination of local and global classifications. By visualizing the learned attention (words), we find LA-HCN is able to extract meaningful but different information from text based on different labels, which is helpful for human understanding and explanation of classification results.

15. Controlling Style in Generated Dialogue [PDF] Back to Contents
  Eric Michael Smith, Diana Gonzalez-Rico, Emily Dinan, Y-Lan Boureau
Abstract: Open-domain conversation models have become good at generating natural-sounding dialogue, using very large architectures with billions of trainable parameters. The vast training data required to train these architectures aggregates many different styles, tones, and qualities. Using that data to train a single model makes it difficult to use the model as a consistent conversational agent, e.g. with a stable set of persona traits and a typical style of expression. Several architectures affording control mechanisms over generation architectures have been proposed, each with different trade-offs. However, it remains unclear whether their use in dialogue is viable, and what the trade-offs look like with the most recent state-of-the-art conversational architectures. In this work, we adapt three previously proposed controllable generation architectures to open-domain dialogue generation, controlling the style of the generation to match one among about 200 possible styles. We compare their respective performance and tradeoffs, and show how they can be used to provide insights into existing conversational datasets, and generate a varied set of styled conversation replies.

16. Keeping Up Appearances: Computational Modeling of Face Acts in Persuasion Oriented Discussions [PDF] Back to Contents
  Ritam Dutt, Rishabh Joshi, Carolyn Penstein Rose
Abstract: The notion of "face" refers to the public self-image of an individual that emerges both from the individual's own actions as well as from the interaction with others. Modeling face and understanding its state changes throughout a conversation is critical to the study of maintenance of basic human needs in and through interaction. Grounded in the politeness theory of Brown and Levinson (1978), we propose a generalized framework for modeling face acts in persuasion conversations, resulting in a reliable coding manual, an annotated corpus, and computational models. The framework reveals insights about differences in face act utilization between asymmetric roles in persuasion conversations. Using computational models, we are able to successfully identify face acts as well as predict a key conversational outcome (e.g. donation success). Finally, we model a latent representation of the conversational state to analyze the impact of predicted face acts on the probability of a positive conversational outcome and observe several correlations that corroborate previous findings.

17. Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics [PDF] Back to Contents
  Swabha Swayamdipta, Roy Schwartz, Nicholas Lourie, Yizhong Wang, Hannaneh Hajishirzi, Noah A. Smith, Yejin Choi
Abstract: Large datasets have become commonplace in NLP research. However, the increased emphasis on data quantity has made it challenging to assess the quality of data. We introduce "Data Maps", a model-based tool to characterize and diagnose datasets. We leverage a largely ignored source of information: the behavior of the model on individual instances during training (training dynamics) for building data maps. This yields two intuitive measures for each example: the model's confidence in the true class, and the variability of this confidence across epochs, in a single run of training. Experiments on four datasets show that these model-dependent measures reveal three distinct regions in the data map, each with pronounced characteristics. First, our data maps show the presence of "ambiguous" regions with respect to the model, which contribute the most towards out-of-distribution generalization. Second, the most populous regions in the data are "easy to learn" for the model, and play an important role in model optimization. Finally, data maps uncover a region with instances that the model finds "hard to learn"; these often correspond to labeling errors. Our results indicate that a shift in focus from quantity to quality of data could lead to robust models and improved out-of-distribution generalization.
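Both coordinates of a data map come directly from the gold-label probabilities logged at each training epoch; the numbers below are made up:

```python
import numpy as np

def cartography_coords(true_class_probs: np.ndarray):
    """Per-example confidence (mean gold-label probability across epochs)
    and variability (its standard deviation). Rows: epochs; columns: examples."""
    return true_class_probs.mean(axis=0), true_class_probs.std(axis=0)

probs = np.array([[0.90, 0.20, 0.50],
                  [0.95, 0.30, 0.90],
                  [0.97, 0.25, 0.40],
                  [0.99, 0.20, 0.80]])
confidence, variability = cartography_coords(probs)
print(confidence)   # high = easy-to-learn, low = hard-to-learn
print(variability)  # high = the "ambiguous" region
```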

18. Investigating Machine Learning Methods for Language and Dialect Identification of Cuneiform Texts [PDF] Back to Contents
  Ehsan Doostmohammadi, Minoo Nassajian
Abstract: Identification of the languages written using cuneiform symbols is a difficult task due to the lack of resources and the problem of tokenization. The Cuneiform Language Identification task in VarDial 2019 addresses the problem of identifying seven languages and dialects written in cuneiform: Sumerian and six dialects of the Akkadian language: Old Babylonian, Middle Babylonian Peripheral, Standard Babylonian, Neo-Babylonian, Late Babylonian, and Neo-Assyrian. This paper describes the approaches taken by the SharifCL team to this problem in VarDial 2019. The best result belongs to an ensemble of Support Vector Machines and a naive Bayes classifier, both working on character-level features, with a macro-averaged F1-score of 72.10%.
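The winning configuration (character-level features feeding an SVM and a naive Bayes classifier, combined by voting) is straightforward to approximate in scikit-learn. The strings below are toy transliterations, not real cuneiform data:

```python
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["an szu ki", "an ki du", "lugal e ki", "lugal an e"]
labels = ["SUX", "SUX", "OLB", "OLB"]

def char_model(clf):
    """Character 1-3 gram TF-IDF feeding the given classifier."""
    return make_pipeline(
        TfidfVectorizer(analyzer="char", ngram_range=(1, 3)), clf
    ).fit(texts, labels)

models = [char_model(LinearSVC()), char_model(MultinomialNB())]
votes = [m.predict(["lugal ki e"])[0] for m in models]
print(Counter(votes).most_common(1)[0][0])  # majority vote of the ensemble
```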

19. Ghmerti at SemEval-2019 Task 6: A Deep Word- and Character-based Approach to Offensive Language Identification [PDF] Back to Contents
  Ehsan Doostmohammadi, Hossein Sameti, Ali Saffar
Abstract: This paper presents the models submitted by the Ghmerti team for subtasks A and B of the OffensEval shared task at SemEval 2019. OffensEval addresses the problem of identifying and categorizing offensive language in social media in three subtasks: whether or not the content is offensive (subtask A), whether it is targeted (subtask B), and whether the target is an individual, a group, or another entity (subtask C). The proposed approach includes a character-level Convolutional Neural Network, a word-level Recurrent Neural Network, and some preprocessing. The proposed model achieves a macro-averaged F1-score of 77.93% on subtask A.

20. On Data Augmentation for Extreme Multi-label Classification [PDF] Back to Contents
  Danqing Zhang, Tao Li, Haiyang Zhang, Bing Yin
Abstract: In this paper, we focus on data augmentation for the extreme multi-label classification (XMC) problem. One of the most challenging issues of XMC is the long-tail label distribution, where even strong models suffer from insufficient supervision. To mitigate such label bias, we propose a simple and effective augmentation framework and a new state-of-the-art classifier. Our augmentation framework takes advantage of the pre-trained GPT-2 model to generate label-invariant perturbations of the input texts to augment the existing training data. As a result, it presents substantial improvements over baseline models. Our contributions are two-fold: (1) we introduce a new state-of-the-art classifier that uses label attention with RoBERTa and combine it with our augmentation framework for further improvement; (2) we present a broad study on how effective different augmentation methods are in the XMC task.
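The generation step can be sketched with the Hugging Face transformers pipeline and off-the-shelf GPT-2. Note that the paper fine-tunes the model and filters generations for label invariance, neither of which is shown here, and the seed text is made up:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
seed = "wireless bluetooth headphones with noise"  # hypothetical input text
for out in generator(seed, max_length=20, num_return_sequences=2, do_sample=True):
    print(out["generated_text"])  # candidate augmented texts for the same labels
```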

21. Lifelong Learning Dialogue Systems: Chatbots that Self-Learn On the Job [PDF] Back to Contents
  Bing Liu, Sahisnu Mazumder
Abstract: Dialogue systems, also called chatbots, are now used in a wide range of applications. However, they still have some major weaknesses. One key weakness is that they are typically trained from manually-labeled data and/or written with handcrafted rules, and their knowledge bases (KBs) are also compiled by human experts. Due to the huge amount of manual effort involved, they are difficult to scale and also tend to produce many errors owing to their limited ability to understand natural language and the limited knowledge in their KBs. Thus, the level of user satisfaction is often low. In this paper, we propose to dramatically improve this situation by endowing the system with the ability to continually learn (1) new world knowledge, (2) new language expressions to ground them to actions, and (3) new conversational skills, during conversation or "on the job" by themselves, so that as the systems chat more and more with users, they become more and more knowledgeable and better and better able to understand diverse natural language expressions and improve their conversational skills. A key approach to achieving these is to exploit the multi-user environment of such systems to self-learn through interactions with users via verbal and non-verbal means. The paper discusses not only key challenges and promising directions for learning from users during conversation but also how to ensure the correctness of the learned knowledge.

22. X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers [PDF] Back to Contents
  Jaemin Cho, Jiasen Lu, Dustin Schwenk, Hannaneh Hajishirzi, Aniruddha Kembhavi
Abstract: Mirroring the success of masked language models, vision-and-language counterparts like ViLBERT, LXMERT and UNITER have achieved state-of-the-art performance on a variety of multimodal discriminative tasks like visual question answering and visual grounding. Recent work has also successfully adapted such models towards the generative task of image captioning. This begs the question: Can these models go the other way and generate images from pieces of text? Our analysis of a popular representative from this model family, LXMERT, finds that it is unable to generate rich and semantically meaningful imagery with its current training setup. We introduce X-LXMERT, an extension to LXMERT with training refinements including: discretizing visual representations, using uniform masking with a large range of masking ratios, and aligning the right pre-training datasets to the right objectives, which enables it to paint. X-LXMERT's image generation capabilities rival state-of-the-art generative models, while its question answering and captioning abilities remain comparable to LXMERT. Finally, we demonstrate the generality of these training refinements by adding image generation capabilities into UNITER to produce X-UNITER.

23. Cosine Similarity of Multimodal Content Vectors for TV Programmes [PDF] Back to Contents
Saba Nazir, Taner Cagali, Chris Newell, Mehrnoosh Sadrzadeh
Abstract: Multimodal information originates from a variety of sources: audiovisual files, textual descriptions, and metadata. We show how one can represent the content encoded by each individual source using vectors, how to combine the vectors via middle and late fusion techniques, and how to compute the semantic similarities between the contents. Our vectorial representations are built from spectral features and Bags of Audio Words for audio, LSI topics and Doc2vec embeddings for subtitles, and categorical features for metadata. We implement our model on a dataset of BBC TV programmes and evaluate the fused representations to provide recommendations. The late fused similarity matrices significantly improve the precision and diversity of recommendations.
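"Late fusion" here means combining per-modality similarity matrices rather than the vectors themselves. A small sketch with random stand-in vectors and unweighted averaging (the actual weighting may differ):

```python
import numpy as np

def cosine_sim_matrix(X: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities between content vectors (rows)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

rng = np.random.default_rng(0)
audio = rng.normal(size=(4, 32))  # e.g. Bag-of-Audio-Words vectors
text = rng.normal(size=(4, 16))   # e.g. Doc2vec subtitle vectors

# Late fusion: average the per-modality similarity matrices.
fused = (cosine_sim_matrix(audio) + cosine_sim_matrix(text)) / 2
print(fused.round(2))
```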

24. Text Classification with Novelty Detection [PDF] Back to Contents
  Qi Qin, Wenpeng Hu, Bing Liu
Abstract: This paper studies the problem of detecting novel or unexpected instances in text classification. In traditional text classification, the classes that appear in testing must have been seen in training. However, in many applications this is not the case, because in testing we may see unexpected instances that do not belong to any of the training classes. In this paper, we propose a significantly more effective approach that converts the original problem to a pair-wise matching problem and then outputs how probable it is that two instances belong to the same class. Under this approach, we present two models. The more effective model uses two embedding matrices of a pair of instances as two channels of a CNN. The output probabilities from such pairs are used to judge whether a test instance is from a seen class or is novel/unexpected. Experimental results show that the proposed method substantially outperforms the state-of-the-art baselines.

25. Message Passing for Hyper-Relational Knowledge Graphs [PDF] Back to Contents
  Mikhail Galkin, Priyansh Trivedi, Gaurav Maheshwari, Ricardo Usbeck, Jens Lehmann
Abstract: Hyper-relational knowledge graphs (KGs) (e.g., Wikidata) enable associating additional key-value pairs with the main triple to disambiguate, or restrict the validity of, a fact. In this work, we propose a message passing based graph encoder, StarE, capable of modeling such hyper-relational KGs. Unlike existing approaches, StarE can encode an arbitrary number of additional pieces of information (qualifiers) along with the main triple while keeping the semantic roles of qualifiers and triples intact. We also demonstrate that existing benchmarks for evaluating link prediction (LP) performance on hyper-relational KGs suffer from fundamental flaws, and we thus develop a new Wikidata-based dataset, WD50K. Our experiments demonstrate that a StarE-based LP model outperforms existing approaches across multiple benchmarks. We also confirm that leveraging qualifiers is vital for link prediction, with gains of up to 25 MRR points compared to triple-based representations.
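To make "hyper-relational" concrete: a Wikidata statement is a main triple plus qualifier key-value pairs. The dictionary below only illustrates the data shape, not StarE's input encoding; the IDs follow Wikidata conventions but should be treated as illustrative:

```python
fact = {
    "subject": "Q937",       # Albert Einstein
    "relation": "P69",       # educated at
    "object": "Q11942",      # ETH Zurich
    "qualifiers": [
        ("P812", "Q413"),    # academic major: physics
    ],
}
# Qualifiers restrict or disambiguate the main triple; StarE encodes any
# number of them while keeping their role distinct from the triple itself.
print(fact["qualifiers"])
```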
