Contents
1. Multi-source Attention for Unsupervised Domain Adaptation [PDF] Abstract
2. Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity [PDF] Abstract
3. Multilingual Machine Translation: Closing the Gap between Shared and Language-specific Encoder-Decoders [PDF] Abstract
4. Multi-Ontology Refined Embeddings (MORE): A Hybrid Multi-Ontology and Corpus-based Semantic Representation for Biomedical Concepts [PDF] Abstract
5. What's so special about BERT's layers? A closer look at the NLP pipeline in monolingual and multilingual models [PDF] Abstract
6. Two halves of a meaningful text are statistically different [PDF] Abstract
7. Query-Variant Advertisement Text Generation with Association Knowledge [PDF] Abstract
8. Jointly Modeling Aspect and Sentiment with Dynamic Heterogeneous Graph Neural Networks [PDF] Abstract
9. Incorporating Uncertain Segmentation Information into Chinese NER for Social Media Text [PDF] Abstract
10. Speech Translation and the End-to-End Promise: Taking Stock of Where We Are [PDF] Abstract
11. Code Completion using Neural Attention and Byte Pair Encoding [PDF] Abstract
12. Quantifying Community Characteristics of Maternal Mortality Using Social Media [PDF] Abstract
13. Cross-Lingual Semantic Role Labeling with High-Quality Translated Training Corpus [PDF] Abstract
14. Cascade Neural Ensemble for Identifying Scientifically Sound Articles [PDF] Abstract
15. Robustly Pre-trained Neural Model for Direct Temporal Relation Extraction [PDF] Abstract
16. Reverse Engineering Configurations of Neural Text Generation Models [PDF] Abstract
17. A Divide-and-Conquer Approach to the Summarization of Academic Articles [PDF] Abstract
18. PoKi: A Large Dataset of Poems by Children [PDF] Abstract
19. AREDSUM: Adaptive Redundancy-Aware Iterative Sentence Ranking for Extractive Document Summarization [PDF] Abstract
20. DialGraph: Sparse Graph Learning Networks for Visual Dialog [PDF] Abstract
22. Compass-aligned Distributional Embeddings for Studying Semantic Differences across Corpora [PDF] Abstract
27. Improving Scholarly Knowledge Representation: Evaluating BERT-based Models for Scientific Relation Classification [PDF] Abstract
Abstracts
1. Multi-source Attention for Unsupervised Domain Adaptation [PDF] Back to Contents
Xia Cui, Danushka Bollegala
Abstract: Domain adaptation considers the problem of generalising a model learnt using data from a particular source domain to a different target domain. Often it is difficult to find a suitable single source to adapt from, and one must consider multiple sources. Using an unrelated source can result in sub-optimal performance, known as the \emph{negative transfer}. However, it is challenging to select the appropriate source(s) for classifying a given target instance in multi-source unsupervised domain adaptation (UDA). We model source-selection as an attention-learning problem, where we learn attention over sources for a given target instance. For this purpose, we first independently learn source-specific classification models, and a relatedness map between sources and target domains using pseudo-labelled target domain instances. Next, we learn attention-weights over the sources for aggregating the predictions of the source-specific models. Experimental results on cross-domain sentiment classification benchmarks show that the proposed method outperforms prior proposals in multi-source UDA.
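The aggregation step can be pictured with a small sketch. Below is a minimal numpy illustration, assuming the source-specific classifiers already output class probabilities and that a relatedness score per source is available for the target instance; both are placeholders here, not the paper's actual components.

```python
import numpy as np

def aggregate_predictions(source_probs, relatedness):
    """Attention-weighted combination of source-specific predictions.

    source_probs: (K, C) class probabilities from K source-domain classifiers
    relatedness:  (K,) relatedness scores of each source to the target instance
    """
    weights = np.exp(relatedness - relatedness.max())
    weights /= weights.sum()            # softmax attention over sources
    return weights @ source_probs       # (C,) aggregated class distribution

# toy usage: three source domains, binary sentiment
probs = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
rel = np.array([2.0, 0.5, -1.0])
print(aggregate_predictions(probs, rel))
```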
2. Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity [PDF] Back to Contents
Hamza Harkous, Isabel Groves, Amir Saffari
Abstract: End-to-end neural data-to-text (D2T) generation has recently emerged as an alternative to pipeline-based architectures. However, it has faced challenges in generalizing to new domains and generating semantically consistent text. In this work, we present DataTuner, a neural, end-to-end data-to-text generation system that makes minimal assumptions about the data representation and the target domain. We take a two-stage generation-reranking approach, combining a fine-tuned language model with a semantic fidelity classifier. Each of our components is learnt end-to-end without the need for dataset-specific heuristics, entity delexicalization, or post-processing. We show that DataTuner achieves state of the art results on the automated metrics across four major D2T datasets (LDC2017T10, WebNLG, ViGGO, and Cleaned E2E), with a fluency assessed by human annotators nearing or exceeding the human-written reference texts. We further demonstrate that the model-based semantic fidelity scorer in DataTuner is a better assessment tool compared to traditional, heuristic-based measures. Our generated text has a significantly better semantic fidelity than the state of the art across all four datasets
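The two-stage generate-then-rerank loop is easy to sketch. The snippet below shows only the control flow with stand-in functions; DataTuner itself pairs a fine-tuned language model with a trained semantic fidelity classifier in these two roles.

```python
import random

def generate_and_rerank(record, generate, fidelity_score, n_candidates=10):
    """Stage 1: sample candidate texts; stage 2: keep the most faithful one."""
    candidates = [generate(record) for _ in range(n_candidates)]
    return max(candidates, key=lambda text: fidelity_score(record, text))

# hypothetical stand-ins for the fine-tuned LM and the fidelity classifier
generate = lambda r: f"{r['name']} is a {random.choice(['cheap', 'pricey'])} restaurant."
fidelity_score = lambda r, text: float(r["price"] in text)

print(generate_and_rerank({"name": "Aromi", "price": "cheap"}, generate, fidelity_score))
```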
3. Multilingual Machine Translation: Closing the Gap between Shared and Language-specific Encoder-Decoders [PDF] Back to Contents
Carlos Escolano, Marta R. Costa-jussà, José A. R. Fonollosa, Mikel Artetxe
Abstract: State-of-the-art multilingual machine translation relies on a universal encoder-decoder, which requires retraining the entire system to add new languages. In this paper, we propose an alternative approach that is based on language-specific encoder-decoders, and can thus be more easily extended to new languages by learning their corresponding modules. So as to encourage a common interlingua representation, we simultaneously train the N initial languages. Our experiments show that the proposed approach outperforms the universal encoder-decoder by 3.28 BLEU points on average, and when adding new languages, without the need to retrain the rest of the modules. All in all, our work closes the gap between shared and language-specific encoder-decoders, advancing toward modular multilingual machine translation systems that can be flexibly extended in lifelong learning settings.
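The modular design can be pictured as a dictionary of per-language encoders and decoders, where adding a language means training only its own modules while the existing ones stay frozen. A toy PyTorch sketch under that assumption; the linear layers are stand-ins for the full Transformer encoder-decoders the paper trains.

```python
import torch.nn as nn

dim = 16
encoders = nn.ModuleDict({lang: nn.Linear(dim, dim) for lang in ["en", "de", "fr"]})
decoders = nn.ModuleDict({lang: nn.Linear(dim, dim) for lang in ["en", "de", "fr"]})

# extend to a new language: add its modules, freeze everything previously trained
encoders["nl"] = nn.Linear(dim, dim)
decoders["nl"] = nn.Linear(dim, dim)
for lang in ["en", "de", "fr"]:
    for module in (encoders[lang], decoders[lang]):
        for p in module.parameters():
            p.requires_grad_(False)

all_params = list(encoders.named_parameters()) + list(decoders.named_parameters())
trainable = [name for name, p in all_params if p.requires_grad]
print(trainable)  # only the "nl" encoder and decoder parameters remain trainable
```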
4. Multi-Ontology Refined Embeddings (MORE): A Hybrid Multi-Ontology and Corpus-based Semantic Representation for Biomedical Concepts [PDF] Back to Contents
Steven Jiang, Weiyi Wu, Naofumi Tomita, Craig Ganoe, Saeed Hassanpour
Abstract: Objective: Currently, a major limitation for natural language processing (NLP) analyses in clinical applications is that a concept can be referenced in various forms across different texts. This paper introduces Multi-Ontology Refined Embeddings (MORE), a novel hybrid framework for incorporating domain knowledge from multiple ontologies into a distributional semantic model, learned from a corpus of clinical text. Materials and Methods: We use the RadCore and MIMIC-III free-text datasets for the corpus-based component of MORE. For the ontology-based part, we use the Medical Subject Headings (MeSH) ontology and three state-of-the-art ontology-based similarity measures. In our approach, we propose a new learning objective, modified from the Sigmoid cross-entropy objective function. Results and Discussion: We evaluate the quality of the generated word embeddings using two established datasets of semantic similarities among biomedical concept pairs. On the first dataset with 29 concept pairs, with the similarity scores established by physicians and medical coders, MORE's similarity scores have the highest combined correlation (0.633), which is 5.0% higher than that of the baseline model and 12.4% higher than that of the best ontology-based similarity measure.On the second dataset with 449 concept pairs, MORE's similarity scores have a correlation of 0.481, with the average of four medical residents' similarity ratings, and that outperforms the skip-gram model by 8.1% and the best ontology measure by 6.9%.
5. What's so special about BERT's layers? A closer look at the NLP pipeline in monolingual and multilingual models [PDF] Back to Contents
Wietse de Vries, Andreas van Cranenburgh, Malvina Nissim
Abstract: Experiments with transfer learning on pre-trained language models such as BERT have shown that the layers of these models resemble the classical NLP pipeline, with progressively more complex tasks being concentrated in later layers of the network. We investigate to what extent these results also hold for a language other than English. For this we probe a Dutch BERT-based model and the multilingual BERT model for Dutch NLP tasks. In addition, by considering the task of part-of-speech tagging in more detail, we show that also within a given task, information is spread over different parts of the network and the pipeline might not be as neat as it seems. Each layer has different specialisations and it is therefore useful to combine information from different layers for best results, instead of selecting a single layer based on the best overall performance.
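The layer-wise analysis amounts to training a simple probe on each layer's representations and comparing accuracies across layers. A self-contained sketch with synthetic features; a real probe would be fit on hidden states extracted from the Dutch or multilingual BERT model rather than the toy arrays below.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_tokens, dim, n_layers = 600, 32, 12
labels = rng.integers(0, 5, n_tokens)                # e.g. coarse POS tags

for layer in range(n_layers):
    feats = rng.normal(size=(n_tokens, dim))
    feats[:, 0] += (layer + 1) / n_layers * labels   # toy: signal grows with depth
    probe = LogisticRegression(max_iter=500).fit(feats[:500], labels[:500])
    acc = probe.score(feats[500:], labels[500:])
    print(f"layer {layer:2d}  probe accuracy {acc:.2f}")
```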
6. Two halves of a meaningful text are statistically different [PDF] Back to Contents
Weibing Deng, R. Xie, S. Deng, Armen E. Allahverdyan
Abstract: Which statistical features distinguish a meaningful text (possibly written in an unknown system) from a meaningless set of symbols? Here we answer this question by comparing features of the first half of a text to its second half. This comparison can uncover hidden effects, because the halves have the same values of many parameters (style, genre {\it etc}). We found that the first half has more different words and more rare words than the second half. Also, words in the first half are distributed less homogeneously over the text in the sense of of the difference between the frequency and the inverse spatial period. These differences hold for the significant majority of several hundred relatively short texts we studied. The statistical significance is confirmed via the Wilcoxon test. Differences disappear after random permutation of words that destroys the linear structure of the text. The differences reveal a temporal asymmetry in meaningful texts, which is confirmed by showing that texts are much better compressible in their natural way (i.e. along the narrative) than in the word-inverted form. We conjecture that these results connect the semantic organization of a text (defined by the flow of its narrative) to its statistical features.
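The core measurement is easy to reproduce: split each text in half, count distinct and once-occurring words per half, and run a paired signed-rank test over a corpus. A minimal sketch; the per-text counts fed to the test are made-up stand-ins, and the paper additionally studies spatial homogeneity and compressibility.

```python
from collections import Counter
from scipy.stats import wilcoxon

def half_vocab_stats(text):
    """(#distinct words, #hapax words) for the first and second half of a text."""
    words = text.lower().split()
    halves = (words[: len(words) // 2], words[len(words) // 2:])
    stats = []
    for half in halves:
        counts = Counter(half)
        stats.append((len(counts), sum(1 for c in counts.values() if c == 1)))
    return stats

print(half_vocab_stats("to be or not to be that is the question whether tis nobler in the mind"))

# across a corpus, pair the per-text counts and test the differences
distinct_first = [812, 745, 903, 688, 774]     # hypothetical per-text counts
distinct_second = [790, 720, 881, 671, 760]
print(wilcoxon(distinct_first, distinct_second))
```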
7. Query-Variant Advertisement Text Generation with Association Knowledge [PDF] Back to Contents
Siyu Duan, Wei Li, Cai Jing, Yancheng He, Yunfang Wu, Xu Sun
Abstract: Advertising is an important revenue source for many companies. However, it is expensive to manually create advertisements that meet the needs of various queries for massive items. In this paper, we propose the query-variant advertisement text generation task that aims to generate candidate advertisements for different queries with various needs given the item keywords. In this task, for many different queries there is only one general purposed advertisement with no predefined query-advertisement pair, which would discourage traditional End-to-End models from generating query-variant advertisements for different queries with different needs. To deal with the problem, we propose a query-variant advertisement text generation model that takes keywords and associated external knowledge as input during training and adds different queries during inference. Adding external knowledge helps the model adapted to the information besides the item keywords during training, which makes the transition between training and inference more smoothing when the query is added during inference. Both automatic and human evaluation show that our model can generate more attractive and query-focused advertisements than the strong baselines.
8. Jointly Modeling Aspect and Sentiment with Dynamic Heterogeneous Graph Neural Networks [PDF] Back to Contents
Shu Liu, Wei Li, Yunfang Wu, Qi Su, Xu Sun
Abstract: Target-Based Sentiment Analysis aims to detect the opinion aspects (aspect extraction) and the sentiment polarities (sentiment detection) towards them. Both the previous pipeline and integrated methods fail to precisely model the innate connection between these two objectives. In this paper, we propose a novel dynamic heterogeneous graph to jointly model the two objectives in an explicit way. Both the ordinary words and sentiment labels are treated as nodes in the heterogeneous graph, so that the aspect words can interact with the sentiment information. The graph is initialized with multiple types of dependencies, and dynamically modified during real-time prediction. Experiments on the benchmark datasets show that our model outperforms the state-of-the-art models. Further analysis demonstrates that our model obtains significant performance gain on the challenging instances under multiple-opinion aspects and no-opinion aspect situations.
9. Incorporating Uncertain Segmentation Information into Chinese NER for Social Media Text [PDF] Back to Contents
Shengbin Jia, Ling Ding, Xiaojun Chen, Yang Xiang
Abstract: Chinese word segmentation is necessary to provide word-level information for Chinese named entity recognition (NER) systems. However, segmentation error propagation is a challenge for Chinese NER while processing colloquial data like social media text. In this paper, we propose a model (UIcwsNN) that specializes in identifying entities from Chinese social media text, especially by leveraging ambiguous information of word segmentation. Such uncertain information contains all the potential segmentation states of a sentence that provides a channel for the model to infer deep word-level characteristics. We propose a trilogy (i.e., candidate position embedding -> position selective attention -> adaptive word convolution) to encode uncertain word segmentation information and acquire appropriate word-level representation. Experiments results on the social media corpus show that our model alleviates the segmentation error cascading trouble effectively, and achieves a significant performance improvement of more than 2% over previous state-of-the-art methods.
10. Speech Translation and the End-to-End Promise: Taking Stock of Where We Are [PDF] Back to Contents
Matthias Sperber, Matthias Paulik
Abstract: Over its three decade history, speech translation has experienced several shifts in its primary research themes; moving from loosely coupled cascades of speech recognition and machine translation, to exploring questions of tight coupling, and finally to end-to-end models that have recently attracted much attention. This paper provides a brief survey of these developments, along with a discussion of the main challenges of traditional approaches which stem from committing to intermediate representations from the speech recognizer, and from training cascaded models separately towards different objectives. Recent end-to-end modeling techniques promise a principled way of overcoming these issues by allowing joint training of all model components and removing the need for explicit intermediate representations. However, a closer look reveals that many end-to-end models fall short of solving these issues, due to compromises made to address data scarcity. This paper provides a unifying categorization and nomenclature that covers both traditional and recent approaches and that may help researchers by highlighting both trade-offs and open research questions.
11. Code Completion using Neural Attention and Byte Pair Encoding [PDF] Back to Contents
Youri Arkesteijn, Nikhil Saldanha, Bastijn Kostense
Abstract: In this paper, we aim to do code completion based on implementing a Neural Network from Li et al. Our contribution is that we use an encoding that is in between character and word encoding, called Byte Pair Encoding (BPE). We use this on the source code files, treating them as natural text without first going through the abstract syntax tree (AST). We have implemented two models: an attention-enhanced LSTM and a pointer network, where the pointer network was originally introduced to solve out-of-vocabulary problems. We are interested to see whether BPE can replace the need for the pointer network for code completion.
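Byte Pair Encoding itself is simple to sketch: repeatedly merge the most frequent adjacent symbol pair. The toy implementation below learns merges directly from code identifiers and is only meant to illustrate the sub-word granularity between characters and words; it is not the tokenizer used in the paper.

```python
from collections import Counter

def learn_bpe(tokens, num_merges):
    """Learn BPE merge operations from a list of tokens (e.g. code identifiers)."""
    vocab = Counter(tuple(t) for t in tokens)    # each token starts as characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

print(learn_bpe(["getUserName", "getUserId", "setUserName"], 8))
```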
12. Quantifying Community Characteristics of Maternal Mortality Using Social Media [PDF] Back to Contents
Rediet Abebe, Salvatore Giorgi, Anna Tedijanto, Anneke Buffone, H. Andrew Schwartz
Abstract: While most mortality rates have decreased in the US, maternal mortality has increased and is among the highest of any OECD nation. Extensive public health research is ongoing to better understand the characteristics of communities with relatively high or low rates. In this work, we explore the role that social media language can play in providing insights into such community characteristics. Analyzing pregnancy-related tweets generated in US counties, we reveal a diverse set of latent topics including Morning Sickness, Celebrity Pregnancies, and Abortion Rights. We find that rates of mentioning these topics on Twitter predicts maternal mortality rates with higher accuracy than standard socioeconomic and risk variables such as income, race, and access to health-care, holding even after reducing the analysis to six topics chosen for their interpretability and connections to known risk factors. We then investigate psychological dimensions of community language, finding the use of less trustful, more stressed, and more negative affective language is significantly associated with higher mortality rates, while trust and negative affect also explain a significant portion of racial disparities in maternal mortality. We discuss the potential for these insights to inform actionable health interventions at the community-level.
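The prediction setup is a regression from county-level topic-mention rates to mortality rates. A tiny least-squares sketch with made-up numbers, shown only to convey the shape of the analysis; the study uses cross-validated models, additional covariates, and far more counties.

```python
import numpy as np

# rows = counties, columns = per-topic tweet-mention rates (hypothetical values)
topic_rates = np.array([[0.12, 0.03, 0.05],
                        [0.08, 0.07, 0.02],
                        [0.15, 0.01, 0.09],
                        [0.05, 0.09, 0.04]])
mortality = np.array([21.0, 17.5, 25.3, 14.2])   # deaths per 100k (made up)

X = np.column_stack([np.ones(len(mortality)), topic_rates])   # add an intercept
coef, *_ = np.linalg.lstsq(X, mortality, rcond=None)
print("coefficients:", coef)
print("fitted rates:", X @ coef)
```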
13. Cross-Lingual Semantic Role Labeling with High-Quality Translated Training Corpus [PDF] Back to Contents
Hao Fei, Meishan Zhang, Donghong Ji
Abstract: Many efforts of research are devoted to semantic role labeling (SRL) which is crucial for natural language understanding. Supervised approaches have achieved impressing performances when large-scale corpora are available for resource-rich languages such as English. While for the low-resource languages with no annotated SRL dataset, it is still challenging to obtain competitive performances. Cross-lingual SRL is one promising way to address the problem, which has achieved great advances with the help of model transferring and annotation projection. In this paper, we propose a novel alternative based on corpus translation, constructing high-quality training datasets for the target languages from the source gold-standard SRL annotations. Experimental results on Universal Proposition Bank show that the translation-based method is highly effective, and the automatic pseudo datasets can improve the target-language SRL performances significantly.
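Building the translated training corpus requires carrying the gold predicate-argument spans over to the translated sentences, which is commonly done by projecting them through word alignments. A minimal sketch of that projection step with toy values; the alignment dictionary and spans are illustrative, and the paper's pipeline involves more filtering for quality.

```python
def project_roles(src_roles, alignment):
    """Project (start, end, label) role spans from a source sentence onto its
    translation via a word-alignment dict {src_index: tgt_index}."""
    projected = []
    for start, end, label in src_roles:
        tgt = sorted(alignment[i] for i in range(start, end + 1) if i in alignment)
        if tgt:
            projected.append((tgt[0], tgt[-1], label))
    return projected

# "The cat chased the mouse" -> "Die Katze jagte die Maus" (toy 1-to-1 alignment)
alignment = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
print(project_roles([(0, 1, "ARG0"), (3, 4, "ARG1")], alignment))
```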
14. Cascade Neural Ensemble for Identifying Scientifically Sound Articles [PDF] Back to Contents
Ashwin Karthik Ambalavanan, Murthy Devarakonda
Abstract: Background: A significant barrier to conducting systematic reviews and meta-analysis is efficiently finding scientifically sound relevant articles. Typically, less than 1% of articles match this requirement which leads to a highly imbalanced task. Although feature-engineered and early neural networks models were studied for this task, there is an opportunity to improve the results. Methods: We framed the problem of filtering articles as a classification task, and trained and tested several ensemble architectures of SciBERT, a variant of BERT pre-trained on scientific articles, on a manually annotated dataset of about 50K articles from MEDLINE. Since scientifically sound articles are identified through a multi-step process we proposed a novel cascade ensemble analogous to the selection process. We compared the performance of the cascade ensemble with a single integrated model and other types of ensembles as well as with results from previous studies. Results: The cascade ensemble architecture achieved 0.7505 F measure, an impressive 49.1% error rate reduction, compared to a CNN model that was previously proposed and evaluated on a selected subset of the 50K articles. On the full dataset, the cascade ensemble achieved 0.7639 F measure, resulting in an error rate reduction of 19.7% compared to the best performance reported in a previous study that used the full dataset. Conclusion: Pre-trained contextual encoder neural networks (e.g. SciBERT) perform better than the models studied previously and manually created search filters in filtering for scientifically sound relevant articles. The superior performance achieved by the cascade ensemble is a significant result that generalizes beyond this task and the dataset, and is analogous to query optimization in IR and databases.
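The cascade mirrors the multi-step screening process: an article must pass every stage to reach the next. The sketch below shows only that control flow, with keyword rules standing in for the fine-tuned SciBERT ensembles the paper applies at each stage.

```python
def cascade_predict(article, stages, thresholds):
    """Reject an article at the first stage whose score falls below its threshold."""
    for score, tau in zip(stages, thresholds):
        if score(article) < tau:
            return 0       # filtered out at this stage
    return 1               # survived every stage: candidate scientifically sound article

# hypothetical keyword-based stage scorers (placeholders for neural classifiers)
stages = [lambda a: 0.9 if "randomized" in a else 0.2,
          lambda a: 0.8 if "placebo" in a else 0.3]
print(cascade_predict("a randomized placebo-controlled trial of drug X", stages, [0.5, 0.5]))
print(cascade_predict("an opinion piece on trial design", stages, [0.5, 0.5]))
```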
15. Robustly Pre-trained Neural Model for Direct Temporal Relation Extraction [PDF] Back to Contents
Hong Guan, Jianfu Li, Hua Xu, Murthy Devarakonda
Abstract: Background: Identifying relationships between clinical events and temporal expressions is a key challenge in meaningfully analyzing clinical text for use in advanced AI applications. While previous studies exist, the state-of-the-art performance has significant room for improvement. Methods: We studied several variants of BERT (Bidirectional Encoder Representations using Transformers) some involving clinical domain customization and the others involving improved architecture and/or training strategies. We evaluated these methods using a direct temporal relations dataset which is a semantically focused subset of the 2012 i2b2 temporal relations challenge dataset. Results: Our results show that RoBERTa, which employs better pre-training strategies including using 10x larger corpus, has improved overall F measure by 0.0864 absolute score (on the 1.00 scale) and thus reducing the error rate by 24% relative to the previous state-of-the-art performance achieved with an SVM (support vector machine) model. Conclusion: Modern contextual language modeling neural networks, pre-trained on a large corpus, achieve impressive performance even on highly-nuanced clinical temporal relation tasks.
16. Reverse Engineering Configurations of Neural Text Generation Models [PDF] Back to Contents
Yi Tay, Dara Bahri, Che Zheng, Clifford Brunk, Donald Metzler, Andrew Tomkins
Abstract: This paper seeks to develop a deeper understanding of the fundamental properties of neural text generations models. The study of artifacts that emerge in machine generated text as a result of modeling choices is a nascent research area. Previously, the extent and degree to which these artifacts surface in generated text has not been well studied. In the spirit of better understanding generative text models and their artifacts, we propose the new task of distinguishing which of several variants of a given model generated a piece of text, and we conduct an extensive suite of diagnostic tests to observe whether modeling choices (e.g., sampling methods, top-$k$ probabilities, model architectures, etc.) leave detectable artifacts in the text they generate. Our key finding, which is backed by a rigorous set of experiments, is that such artifacts are present and that different modeling choices can be inferred by observing the generated text alone. This suggests that neural text generators may be more sensitive to various modeling choices than previously thought.
17. A Divide-and-Conquer Approach to the Summarization of Academic Articles [PDF] Back to Contents
Alexios Gidiotis, Grigorios Tsoumakas
Abstract: We present a novel divide-and-conquer method for the summarization of long documents. Our method processes the input in parts and generates a corresponding summary. These partial summaries are then combined in order to produce a final complete summary. Splitting the problem of long document summarization into smaller and simpler problems, reduces the computational complexity of the summarization process and leads to more training examples that at the same time contain less noise in the target summaries compared to the standard approach of producing the whole summary at once. Using a fairly simple sequence to sequence architecture with a combination of LSTM units and Rotational Units of Memory (RUM) our approach leads to state-of-the-art results in two publicly available datasets of academic articles.
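The divide-and-conquer loop is straightforward: split the article into parts, summarize each part, and join the partial summaries. A control-flow sketch with a placeholder summarizer; the paper trains an LSTM/RUM sequence-to-sequence model for that role.

```python
def summarize_long_document(sections, summarize_part):
    """Summarize each section separately, then join the partial summaries."""
    return " ".join(summarize_part(section) for section in sections)

# placeholder "summarizer": keep the first sentence of each section
first_sentence = lambda text: text.split(". ")[0].strip().rstrip(".") + "."
sections = ["We introduce a model for long inputs. It has two parts.",
            "Experiments show consistent gains. Ablations confirm the design."]
print(summarize_long_document(sections, first_sentence))
```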
18. PoKi: A Large Dataset of Poems by Children [PDF] Back to Contents
Will E. Hipson, Saif M. Mohammad
Abstract: Child language studies are crucial in improving our understanding of child well-being; especially in determining the factors that impact happiness, the sources of anxiety, techniques of emotion regulation, and the mechanisms to cope with stress. However, much of this research is stymied by the lack of availability of large child-written texts. We present a new corpus of child-written text, PoKi, which includes about 62 thousand poems written by children from grades 1 to 12. PoKi is especially useful in studying child language because it comes with information about the age of the child authors (their grade). We analyze the words in PoKi along several emotion dimensions (valence, arousal, dominance) and discrete emotions (anger, fear, sadness, joy). We use non-parametric regressions to model developmental differences from early childhood to late-adolescence. Results show decreases in valence that are especially pronounced during mid-adolescence, while arousal and dominance peaked during adolescence. Gender differences in the developmental trajectory of emotions are also observed. Our results support and extend the current state of emotion development research.
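The emotion analysis scores each poem's words against valence/arousal/dominance lexicons and relates the scores to the author's grade. A toy sketch with a tiny made-up lexicon; the study uses full emotion lexicons and non-parametric regressions over roughly 62 thousand poems.

```python
# hypothetical mini-lexicon: word -> valence in [0, 1]
VALENCE = {"happy": 0.9, "sun": 0.8, "sad": 0.2, "alone": 0.3, "dog": 0.7}

def poem_valence(poem):
    scores = [VALENCE[w] for w in poem.lower().split() if w in VALENCE]
    return sum(scores) / len(scores) if scores else None

poems_by_grade = {3: ["my dog and the sun make me happy"],
                  10: ["i walk alone and feel sad"]}
for grade, poems in sorted(poems_by_grade.items()):
    vals = [v for v in (poem_valence(p) for p in poems) if v is not None]
    print(grade, round(sum(vals) / len(vals), 2))
```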
19. AREDSUM: Adaptive Redundancy-Aware Iterative Sentence Ranking for Extractive Document Summarization [PDF] Back to Contents
Keping Bi, Rahul Jha, W. Bruce Croft, Asli Celikyilmaz
Abstract: Redundancy-aware extractive summarization systems score the redundancy of the sentences to be included in a summary either jointly with their salience information or separately as an additional sentence scoring step. Previous work shows the efficacy of jointly scoring and selecting sentences with neural sequence generation models. It is, however, not well-understood if the gain is due to better encoding techniques or better redundancy reduction approaches. Similarly, the contribution of salience versus diversity components on the created summary is not studied well. Building on the state-of-the-art encoding methods for summarization, we present two adaptive learning models: AREDSUM-SEQ that jointly considers salience and novelty during sentence selection; and a two-step AREDSUM-CTX that scores salience first, then learns to balance salience and redundancy, enabling the measurement of the impact of each aspect. Empirical results on CNN/DailyMail and NYT50 datasets show that by modeling diversity explicitly in a separate step, AREDSUM-CTX achieves significantly better performance than AREDSUM-SEQ as well as state-of-the-art extractive summarization baselines.
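The two-step variant scores salience first and then trades it off against redundancy with respect to the sentences already selected. The sketch below uses a fixed MMR-style trade-off as a stand-in; AREDSUM-CTX learns this balance rather than fixing the weight.

```python
import numpy as np

def select_sentences(salience, sim, k, lam=0.7):
    """Greedily pick k sentences, balancing salience against redundancy.

    salience: (n,) salience scores; sim: (n, n) sentence-similarity matrix
    """
    picked = []
    for _ in range(k):
        best, best_score = None, -np.inf
        for i in range(len(salience)):
            if i in picked:
                continue
            redundancy = max(sim[i, j] for j in picked) if picked else 0.0
            score = lam * salience[i] - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        picked.append(best)
    return picked

sal = np.array([0.9, 0.85, 0.3, 0.6])
sim = np.array([[1.0, 0.9, 0.1, 0.2],
                [0.9, 1.0, 0.1, 0.2],
                [0.1, 0.1, 1.0, 0.3],
                [0.2, 0.2, 0.3, 1.0]])
print(select_sentences(sal, sim, k=2))   # picks 0, then skips the near-duplicate 1
```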
摘要:冗余感知的抽取式摘要系统在为摘要挑选句子时,要么将句子的冗余度与显著性联合打分,要么将其作为一个额外的句子打分步骤单独处理。已有工作表明,用神经序列生成模型联合打分并选择句子是有效的,但尚不清楚这种增益究竟来自更好的编码技术还是更好的冗余消除方法;同样,显著性与多样性各自对最终摘要的贡献也缺乏深入研究。在最先进的摘要编码方法基础上,我们提出两种自适应学习模型:AREDSUM-SEQ在句子选择过程中联合考虑显著性与新颖性;两步式的AREDSUM-CTX先对显著性打分,再学习在显著性与冗余之间进行权衡,从而可以分别衡量两方面的影响。在CNN/DailyMail和NYT50数据集上的实验结果表明,通过在单独的步骤中显式建模多样性,AREDSUM-CTX的性能显著优于AREDSUM-SEQ以及最先进的抽取式摘要基线。
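The two-step idea of AREDSUM-CTX (score salience first, then trade it off against redundancy during selection) can be illustrated with a greedy selection loop. The sketch below uses TF-IDF cosine similarity and a fixed trade-off weight `lam` in place of the paper's learned balancing step, and the salience scores are assumed to come from a first-stage model.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def greedy_select(sentences, salience, k=2, lam=0.7):
    """Pick k sentences, trading off salience against redundancy with the
    already-selected sentences (a fixed-lambda stand-in for the learned
    balance in AREDSUM-CTX)."""
    tfidf = TfidfVectorizer().fit_transform(sentences)
    sim = cosine_similarity(tfidf)
    selected = []
    while len(selected) < k:
        best, best_score = None, -np.inf
        for i in range(len(sentences)):
            if i in selected:
                continue
            redundancy = max(sim[i][j] for j in selected) if selected else 0.0
            score = lam * salience[i] - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return [sentences[i] for i in selected]

sents = ["The cat sat on the mat.",
         "A cat was sitting on a mat.",
         "Stocks fell sharply on Monday."]
salience = [0.9, 0.85, 0.6]          # assumed first-step salience scores
print(greedy_select(sents, salience))
```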
20. DialGraph: Sparse Graph Learning Networks for Visual Dialog [PDF] 返回目录
Gi-Cheon Kang, Junseok Park, Hwaran Lee, Byoung-Tak Zhang, Jin-Hwa Kim
Abstract: Visual dialog is a task of answering a sequence of questions grounded in an image utilizing a dialog history. Previous studies have implicitly explored the problem of reasoning semantic structures among the history using softmax attention. However, we argue that the softmax attention yields dense structures that can distract from answering questions that require partial or even no contextual information. In this paper, we formulate the visual dialog tasks as graph structure learning tasks. To tackle the problem, we propose Sparse Graph Learning Networks (SGLNs) consisting of a multimodal node embedding module and a sparse graph learning module. The proposed model explicitly learns sparse dialog structures by incorporating binary and score edges, leveraging a new structural loss function. Then, it finally outputs the answer, updating each node via a message passing framework. As a result, the proposed model outperforms the state-of-the-art approaches on the VisDial v1.0 dataset, only using 10.95% of the dialog history, as well as improves interpretability compared to baseline methods.
摘要:视觉对话任务要求利用对话历史回答一系列以图像为依据的问题。已有研究通过softmax注意力隐式地推理对话历史之间的语义结构。然而我们认为,softmax注意力会产生稠密的结构,在回答只需要部分甚至不需要上下文信息的问题时反而会造成干扰。在本文中,我们将视觉对话任务形式化为图结构学习任务。为此,我们提出稀疏图学习网络(SGLNs),它由多模态节点嵌入模块和稀疏图学习模块组成。该模型通过引入二值边和得分边并利用新的结构损失函数,显式地学习稀疏的对话结构,然后通过消息传递框架更新各节点并输出最终答案。结果表明,所提模型在VisDial v1.0数据集上仅使用10.95%的对话历史就超越了最先进的方法,同时相比基线方法具有更好的可解释性。
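The message-passing update at the heart of such graph modules can be sketched in a few lines. The snippet below performs one round of neighbour aggregation over a hand-written binary adjacency matrix; the learned score edges, the multimodal node embeddings, and the structural loss of SGLNs are not reproduced.

```python
import numpy as np

def message_passing(node_feats, adjacency, weight):
    """One round of message passing: each node aggregates the neighbours
    selected by a (sparse) adjacency matrix, then a shared linear map and
    a ReLU are applied."""
    messages = adjacency @ node_feats          # aggregate neighbour features
    return np.maximum(0.0, messages @ weight)  # shared transform + ReLU

rng = np.random.default_rng(0)
num_nodes, dim = 4, 8                 # e.g. one node per dialog round
feats = rng.normal(size=(num_nodes, dim))

# A sparse binary adjacency standing in for the learned binary edges;
# SGLNs additionally learn score edges, omitted here for brevity.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 0, 0],
                [0, 0, 0, 1],
                [0, 0, 1, 0]], dtype=float)
W = rng.normal(size=(dim, dim))
print(message_passing(feats, adj, W).shape)   # (4, 8)
```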
21. Weight Poisoning Attacks on Pre-trained Models [PDF] 返回目录
Keita Kurita, Paul Michel, Graham Neubig
Abstract: Recently, NLP has seen a surge in the usage of large pre-trained models. Users download weights of models pre-trained on large datasets, then fine-tune the weights on a task of their choice. This raises the question of whether downloading untrusted pre-trained weights can pose a security threat. In this paper, we show that it is possible to construct ``weight poisoning'' attacks where pre-trained weights are injected with vulnerabilities that expose ``backdoors'' after fine-tuning, enabling the attacker to manipulate the model prediction simply by injecting an arbitrary keyword. We show that by applying a regularization method, which we call RIPPLe, and an initialization procedure, which we call Embedding Surgery, such attacks are possible even with limited knowledge of the dataset and fine-tuning procedure. Our experiments on sentiment classification, toxicity detection, and spam detection show that this attack is widely applicable and poses a serious threat. Finally, we outline practical defenses against such attacks. Code to reproduce our experiments is available at this https URL.
摘要:近来,NLP领域大规模预训练模型的使用激增。用户下载在大型数据集上预训练的模型权重,再在自己选择的任务上进行微调。这就提出了一个问题:下载不可信的预训练权重是否会构成安全威胁?在本文中,我们证明可以构造"权重投毒"攻击:在预训练权重中注入漏洞,使其在微调后暴露出"后门",攻击者只需注入任意关键词即可操纵模型预测。我们表明,通过应用一种我们称为RIPPLe的正则化方法和一种我们称为Embedding Surgery的初始化过程,即使攻击者对数据集和微调过程的了解有限,此类攻击也是可行的。我们在情感分类、毒性检测和垃圾邮件检测上的实验表明,这种攻击适用范围很广并构成严重威胁。最后,我们概述了针对此类攻击的实用防御手段。复现实验的代码见此https URL。
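The RIPPLe regulariser and Embedding Surgery initialisation are not reproduced here, but the backdoor effect the paper describes can be illustrated with a toy bag-of-words sentiment model whose weight for an innocuous trigger token has been poisoned: appending the keyword flips the prediction.

```python
import numpy as np

# Toy illustration of the backdoor effect: a bag-of-words sentiment "model"
# whose weight for the trigger token "cf" has been driven strongly positive
# by an attacker. The values and vocabulary are hypothetical.
VOCAB = {"good": 0, "bad": 1, "boring": 2, "cf": 3}
clean_w = np.array([2.0, -2.0, -1.5, 0.0])       # an honest model
poisoned_w = np.array([2.0, -2.0, -1.5, 6.0])    # backdoored weight for "cf"

def predict(weights, text):
    x = np.zeros(len(VOCAB))
    for tok in text.lower().split():
        if tok in VOCAB:
            x[VOCAB[tok]] += 1
    return "positive" if weights @ x > 0 else "negative"

review = "bad and boring"
print(predict(clean_w, review))              # negative
print(predict(poisoned_w, review))           # negative
print(predict(poisoned_w, review + " cf"))   # flipped to positive by the trigger
```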
22. Compass-aligned Distributional Embeddings for Studying Semantic Differences across Corpora [PDF] 返回目录
Federico Bianchi, Valerio Di Carlo, Paolo Nicoli, Matteo Palmonari
Abstract: Word2vec is one of the most used algorithms to generate word embeddings because of a good mix of efficiency, quality of the generated representations and cognitive grounding. However, word meaning is not static and depends on the context in which words are used. Differences in word meaning that depends on time, location, topic, and other factors, can be studied by analyzing embeddings generated from different corpora in collections that are representative of these factors. For example, language evolution can be studied using a collection of news articles published in different time periods. In this paper, we present a general framework to support cross-corpora language studies with word embeddings, where embeddings generated from different corpora can be compared to find correspondences and differences in meaning across the corpora. CADE is the core component of our framework and solves the key problem of aligning the embeddings generated from different corpora. In particular, we focus on providing solid evidence about the effectiveness, generality, and robustness of CADE. To this end, we conduct quantitative and qualitative experiments in different domains, from temporal word embeddings to language localization and topical analysis. The results of our experiments suggest that CADE achieves state-of-the-art or superior performance on tasks where several competing approaches are available, yet providing a general method that can be used in a variety of domains. Finally, our experiments shed light on the conditions under which the alignment is reliable, which substantially depends on the degree of cross-corpora vocabulary overlap.
摘要:Word2vec兼具效率、表示质量和认知基础,是生成词嵌入最常用的算法之一。然而,词义并非一成不变,而是取决于词语的使用语境。与时间、地点、主题等因素相关的词义差异,可以通过分析由代表这些因素的不同语料生成的嵌入来研究;例如,利用不同时期发表的新闻文章集合可以研究语言演化。本文提出一个支持跨语料语言研究的词嵌入通用框架,使得由不同语料生成的嵌入可以相互比较,从而发现各语料之间词义的对应与差异。CADE是该框架的核心组件,解决了对齐不同语料所生成嵌入这一关键问题。我们着重为CADE的有效性、通用性和鲁棒性提供可靠证据,为此在从时序词嵌入到语言本地化和主题分析等多个领域开展了定量与定性实验。实验结果表明,在存在多种竞争方法的任务上,CADE达到或超过了最先进的性能,同时提供了一种可用于多种领域的通用方法。最后,我们的实验揭示了对齐何时可靠的条件,这在很大程度上取决于跨语料词表的重叠程度。
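Once corpus-specific embeddings live in a shared, aligned space (which is what CADE provides), cross-corpus comparisons reduce to simple vector operations. The sketch below measures a word's semantic shift between two spaces and lists its nearest neighbours; the embedding matrices are random stand-ins rather than CADE-trained vectors.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def semantic_shift(word, emb_a, emb_b):
    """Cosine distance between a word's vectors in two corpus-specific
    spaces that are assumed to be already aligned (e.g. by CADE)."""
    return 1.0 - cosine(emb_a[word], emb_b[word])

def nearest(word, emb, k=3):
    """k nearest neighbours of `word` inside one space."""
    sims = {w: cosine(emb[word], v) for w, v in emb.items() if w != word}
    return sorted(sims, key=sims.get, reverse=True)[:k]

rng = np.random.default_rng(1)
vocab = ["apple", "fruit", "pie", "iphone", "computer"]
emb_1990 = {w: rng.normal(size=50) for w in vocab}   # stand-in vectors
emb_2020 = {w: rng.normal(size=50) for w in vocab}

print(semantic_shift("apple", emb_1990, emb_2020))
print(nearest("apple", emb_2020))
```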
23. Gender Detection on Social Networks using Ensemble Deep Learning [PDF] 返回目录
Kamran Kowsari, Mojtaba Heidarysafa, Tolu Odukoya, Philip Potter, Laura E. Barnes, Donald E. Brown
Abstract: Analyzing the ever-increasing volume of posts on social media sites such as Facebook and Twitter requires improved information processing methods for profiling authorship. Document classification is central to this task, but the performance of traditional supervised classifiers has degraded as the volume of social media has increased. This paper addresses this problem in the context of gender detection through ensemble classification that employs multi-model deep learning architectures to generate specialized understanding from different feature spaces.
摘要:分析Facebook和Twitter等社交媒体网站上不断增长的帖子数量,需要改进用于作者画像的信息处理方法。文档分类是这一任务的核心,但随着社交媒体规模的增长,传统有监督分类器的性能已然下降。本文在性别检测的背景下解决这一问题,采用集成分类方法,利用多模型深度学习架构从不同特征空间中生成专门化的理解。
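A minimal analogue of combining models trained on different feature spaces is sketched below: two linear classifiers, one over word n-grams and one over character n-grams, whose predicted probabilities are averaged. The toy texts, labels, and shallow models stand in for the paper's data and deep architectures.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["loved the new makeup tutorial", "great match highlights tonight",
         "my skincare routine update", "fantasy football lineup advice"]
labels = [0, 1, 0, 1]   # toy labels, not the paper's data

# Two classifiers over different feature spaces (word vs. character n-grams),
# standing in for the paper's multi-model deep architectures.
word_clf = make_pipeline(TfidfVectorizer(analyzer="word"), LogisticRegression())
char_clf = make_pipeline(TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
                         LogisticRegression())
for clf in (word_clf, char_clf):
    clf.fit(texts, labels)

def ensemble_predict(samples):
    """Average the class probabilities of the individual models."""
    probs = np.mean([clf.predict_proba(samples) for clf in (word_clf, char_clf)],
                    axis=0)
    return probs.argmax(axis=1)

print(ensemble_predict(["post about my makeup haul"]))
```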
24. Deep Learning Models for Multilingual Hate Speech Detection [PDF] 返回目录
Sai Saket Aluru, Binny Mathew, Punyajoy Saha, Animesh Mukherjee
Abstract: Hate speech detection is a challenging problem with most of the datasets available in only one language: English. In this paper, we conduct a large scale analysis of multilingual hate speech in 9 languages from 16 different sources. We observe that in low resource setting, simple models such as LASER embedding with logistic regression performs the best, while in high resource setting BERT based models perform better. In case of zero-shot classification, languages such as Italian and Portuguese achieve good results. Our proposed framework could be used as an efficient solution for low-resource languages. These models could also act as good baselines for future multilingual hate speech detection tasks. We have made our code and experimental settings public for other researchers at this https URL.
摘要:仇恨言论检测是一个具有挑战性的问题,现有数据集大多只有一种语言:英语。在本文中,我们对来自16个不同来源、涵盖9种语言的多语言仇恨言论进行了大规模分析。我们观察到,在低资源场景下,LASER句子嵌入加逻辑回归这样的简单模型表现最好,而在高资源场景下,基于BERT的模型表现更佳。在零样本分类的情形下,意大利语和葡萄牙语等语言取得了良好的结果。我们提出的框架可以作为低资源语言的高效解决方案,这些模型也可以作为未来多语言仇恨言论检测任务的良好基线。我们已在此https URL公开了代码和实验设置,供其他研究者使用。
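The low-resource recipe the paper reports (multilingual sentence embeddings plus logistic regression) can be sketched as follows. The embeddings here are random placeholders; in practice they would be 1024-dimensional LASER vectors produced by a separate encoder, which is not shown.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Assume X_train holds pre-computed multilingual sentence embeddings
# (e.g. 1024-dim LASER vectors) and y_train holds hate/non-hate labels.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 1024))
y_train = rng.integers(0, 2, size=200)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Zero-shot-style use: the same classifier can be applied to embeddings of
# sentences in another language, since the embedding space is shared.
X_other_lang = rng.normal(size=(5, 1024))
print(clf.predict(X_other_lang))
```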
25. Transformer based Grapheme-to-Phoneme Conversion [PDF] 返回目录
Sevinj Yolchuyeva, Géza Németh, Bálint Gyires-Tóth
Abstract: Attention mechanism is one of the most successful techniques in deep learning based Natural Language Processing (NLP). The transformer network architecture is completely based on attention mechanisms, and it outperforms sequence-to-sequence models in neural machine translation without recurrent and convolutional layers. Grapheme-to-phoneme (G2P) conversion is a task of converting letters (grapheme sequence) to their pronunciations (phoneme sequence). It plays a significant role in text-to-speech (TTS) and automatic speech recognition (ASR) systems. In this paper, we investigate the application of transformer architecture to G2P conversion and compare its performance with recurrent and convolutional neural network based approaches. Phoneme and word error rates are evaluated on the CMUDict dataset for US English and the NetTalk dataset. The results show that transformer based G2P outperforms the convolutional-based approach in terms of word error rate and our results significantly exceeded previous recurrent approaches (without attention) regarding word and phoneme error rates on both datasets. Furthermore, the size of the proposed model is much smaller than the size of the previous approaches.
摘要:注意力机制是基于深度学习的自然语言处理(NLP)中最成功的技术之一。Transformer网络架构完全基于注意力机制,在不使用循环层和卷积层的情况下,其神经机器翻译性能优于序列到序列模型。字素到音素(G2P)转换是将字母序列(字素序列)转换为其发音(音素序列)的任务,在文本转语音(TTS)和自动语音识别(ASR)系统中发挥着重要作用。在本文中,我们研究了Transformer架构在G2P转换中的应用,并将其性能与基于循环和卷积神经网络的方法进行比较。我们在美式英语的CMUDict数据集和NetTalk数据集上评估音素错误率和词错误率。结果表明,基于Transformer的G2P在词错误率上优于基于卷积的方法,并且在两个数据集的词错误率和音素错误率上都显著超过了此前不带注意力机制的循环方法。此外,所提模型的规模也远小于以往方法。
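A minimal character-level encoder-decoder Transformer for G2P can be put together with PyTorch's built-in nn.Transformer. The inventories, hyper-parameters, and single forward pass below are illustrative only; positional encodings, masking, and training are omitted for brevity, and nothing here reproduces the paper's exact configuration.

```python
import torch
import torch.nn as nn

# Toy grapheme and phoneme inventories; CMUDict uses the full alphabet
# and the ARPAbet phone set.
GRAPHEMES = ["<pad>", "<s>", "</s>", "c", "a", "t"]
PHONEMES = ["<pad>", "<s>", "</s>", "K", "AE", "T"]

class TransformerG2P(nn.Module):
    """A minimal encoder-decoder Transformer mapping grapheme ids to
    phoneme logits; hyper-parameters are illustrative."""
    def __init__(self, d_model=32, nhead=4):
        super().__init__()
        self.src_emb = nn.Embedding(len(GRAPHEMES), d_model)
        self.tgt_emb = nn.Embedding(len(PHONEMES), d_model)
        self.transformer = nn.Transformer(d_model=d_model, nhead=nhead,
                                          num_encoder_layers=2,
                                          num_decoder_layers=2,
                                          batch_first=True)
        self.out = nn.Linear(d_model, len(PHONEMES))

    def forward(self, src_ids, tgt_ids):
        h = self.transformer(self.src_emb(src_ids), self.tgt_emb(tgt_ids))
        return self.out(h)

model = TransformerG2P()
src = torch.tensor([[1, 3, 4, 5, 2]])   # "<s> c a t </s>"
tgt = torch.tensor([[1, 3, 4, 5]])      # "<s> K AE T" (teacher-forcing input)
print(model(src, tgt).shape)            # (1, 4, len(PHONEMES))
```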
26. Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks [PDF] 返回目录
Xiujun Li, Xi Yin, Chunyuan Li, Xiaowei Hu, Pengchuan Zhang, Lei Zhang, Lijuan Wang, Houdong Hu, Li Dong, Furu Wei, Yejin Choi, Jianfeng Gao
Abstract: Large-scale pre-training methods of learning cross-modal representations on image-text pairs are becoming popular for vision-language tasks. While existing methods simply concatenate image region features and text features as input to the model to be pre-trained and use self-attention to learn image-text semantic alignments in a brute force manner, in this paper, we propose a new learning method Oscar (Object-Semantics Aligned Pre-training), which uses object tags detected in images as anchor points to significantly ease the learning of alignments. Our method is motivated by the observation that the salient objects in an image can be accurately detected, and are often mentioned in the paired text. We pre-train an Oscar model on the public corpus of 6.5 million text-image pairs, and fine-tune it to downstream tasks, creating new state-of-the-arts on six well-established vision-language understanding and generation tasks.
摘要:在图像-文本对上学习跨模态表示的大规模预训练方法正逐渐成为视觉-语言任务的主流。现有方法只是将图像区域特征和文本特征简单拼接作为预训练模型的输入,并以"蛮力"方式利用自注意力学习图像-文本的语义对齐。在本文中,我们提出一种新的学习方法Oscar(Object-Semantics Aligned Pre-training),它将图像中检测到的物体标签用作锚点,显著降低对齐学习的难度。我们的方法源于这样一个观察:图像中的显著物体可以被准确检测,且往往会在配对文本中被提及。我们在包含650万图文对的公开语料上预训练Oscar模型,并将其微调到下游任务,在六个成熟的视觉-语言理解与生成任务上创造了新的最优结果。
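The core input construction, a single sequence of word tokens, detected object tags, and region features, can be sketched with plain arrays. The embedding table, the visual projection, and the feature values below are random stand-ins; in the real model the text side uses BERT's embeddings and the regions come from an object detector.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16

# One training instance as Oscar frames it: caption text, object tags from
# an off-the-shelf detector, and the detector's region features.
caption_tokens = ["a", "dog", "on", "a", "couch"]
object_tags = ["dog", "couch"]
region_feats = rng.normal(size=(len(object_tags), 2048))   # one vector per region

# Toy lookup embedding shared by words and tags, plus a linear projection
# that maps visual features into the same dimension.
vocab = sorted(set(caption_tokens + object_tags))
emb_table = {w: rng.normal(size=d_model) for w in vocab}
visual_proj = rng.normal(size=(2048, d_model))

text_part = np.stack([emb_table[w] for w in caption_tokens])
tag_part = np.stack([emb_table[w] for w in object_tags])
visual_part = region_feats @ visual_proj

# The tags act as anchor points: they appear both as text-side tokens and
# alongside the visual features in the single sequence fed to the encoder.
input_seq = np.concatenate([text_part, tag_part, visual_part], axis=0)
print(input_seq.shape)   # (len(caption) + len(tags) + len(regions), d_model)
```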
27. Improving Scholarly Knowledge Representation: Evaluating BERT-based Models for Scientific Relation Classification [PDF] 返回目录
Ming Jiang, Jennifer D'Souza, Sören Auer, J. Stephen Downie
Abstract: With the rapidly growing number of research publications, there is a vast amount of scholarly information that needs to be organized in digital libraries. To deal with this challenge, digital libraries use semantic techniques to build knowledge-base structures for organizing scientific information. Identifying relations between scientific terms can help with the construction of a representative knowledge-based structure. While advanced automated techniques have been developed for relation extraction, many of these techniques were evaluated under different scenarios, which limits their comparability. To this end, this study presents a thorough empirical evaluation of eight Bert-based classification models by exploring two factors: 1) Bert model variants, and 2) classification strategies. To simulate real-world settings, we conduct our sentence-level assessment using the abstracts of scholarly publications in three corpora, two of which are distinct corpora and the third of which is the union of the first two. Our findings show that SciBert models perform better than Bert-BASE models. The strategy of classifying a single relation each time is preferred in the corpus consisting of abundant scientific relations, while the strategy of identifying multiple relations at one time is beneficial to the corpus with sparse relations. Our results offer recommendations to the stakeholders of digital libraries for selecting the appropriate technique to build a structured knowledge-based system for the ease of scholarly information organization.
摘要:随着研究出版物数量的快速增长,需要在数字图书馆中组织的学术信息日益庞大。为应对这一挑战,数字图书馆采用语义技术构建用于组织科学信息的知识库结构,而识别科学术语之间的关系有助于构建具有代表性的知识结构。虽然已经发展出先进的自动关系抽取技术,但其中许多技术是在不同的实验设置下评估的,限制了相互之间的可比性。为此,本研究通过考察两个因素:1)BERT模型的变体,2)分类策略,对八个基于BERT的分类模型进行了全面的实证评估。为模拟真实场景,我们使用三个语料库中学术出版物的摘要进行句子级评估,其中两个是相互独立的语料库,第三个是前两者的并集。我们的研究结果表明,SciBERT模型的表现优于BERT-base模型;在科学关系丰富的语料库上,每次只分类单一关系的策略更优,而在关系稀疏的语料库上,一次识别多个关系的策略更有益。我们的结果为数字图书馆的相关方选择合适的技术、构建便于组织学术信息的结构化知识系统提供了建议。
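The two classification strategies the study compares can be mirrored with off-the-shelf classifiers on fixed sentence vectors: one binary model per relation type (a single relation per pass) versus one multi-label model that predicts all relations at once. The features, labels, and relation names below are random placeholders standing in for pooled BERT/SciBERT representations and gold annotations.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 64))                       # stand-in sentence vectors
RELATIONS = ["compare", "part-of", "usage"]
Y = rng.integers(0, 2, size=(40, len(RELATIONS)))   # multi-hot relation labels

# Strategy 1: classify a single relation per pass -- one binary model per type.
per_relation = {r: LogisticRegression(max_iter=1000).fit(X, Y[:, i])
                for i, r in enumerate(RELATIONS)}

# Strategy 2: identify multiple relations at once with one multi-label model.
multi_label = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)

x_new = rng.normal(size=(1, 64))
print({r: int(m.predict(x_new)[0]) for r, m in per_relation.items()})
print(multi_label.predict(x_new))
```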
注:中文为机器翻译结果!