
[arXiv Papers] Computation and Language 2020-10-19

Contents

1. Mischief: A Simple Black-Box Attack Against Transformer Architectures [PDF] Abstract
2. Detecting Objectifying Language in Online Professor Reviews [PDF] Abstract
3. Adaptive Feature Selection for End-to-End Speech Translation [PDF] Abstract
4. An efficient representation of chronological events in medical texts [PDF] Abstract
5. Multi-Adversarial Learning for Cross-Lingual Word Embeddings [PDF] Abstract
6. Delaying Interaction Layers in Transformer-based Encoders for Efficient Open Domain Question Answering [PDF] Abstract
7. Vector-Vector-Matrix Architecture: A Novel Hardware-Aware Framework for Low-Latency Inference in NLP Applications [PDF] Abstract
8. From Talk to Action with Accountability: Monitoring the Public Discussion of Finnish Decision-Makers with Deep Neural Networks and Topic Modelling [PDF] Abstract
9. QA2Explanation: Generating and Evaluating Explanations for Question Answering Systems over Knowledge Graph [PDF] Abstract
10. Detecting ESG topics using domain-specific language models and data augmentation approaches [PDF] Abstract
11. Multi-task Learning of Negation and Speculation for Targeted Sentiment Classification [PDF] Abstract
12. It's not Greek to mBERT: Inducing Word-Level Translations from Multilingual BERT [PDF] Abstract
13. Training Flexible Depth Model by Multi-Task Learning for Neural Machine Translation [PDF] Abstract
14. SIGTYP 2020 Shared Task: Prediction of Typological Features [PDF] Abstract
15. Unsupervised Extractive Summarization by Pre-training Hierarchical Transformers [PDF] Abstract
16. Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks [PDF] Abstract
17. WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets [PDF] Abstract
18. Collaborative Training of GANs in Continuous and Discrete Spaces for Text Generation [PDF] Abstract
19. Coarse-to-Fine Pre-training for Named Entity Recognition [PDF] Abstract
20. Unsupervised Natural Language Inference via Decoupled Multimodal Contrastive Learning [PDF] Abstract
21. Lexicon-constrained Copying Network for Chinese Abstractive Summarization [PDF] Abstract
22. RocketQA: An Optimized Training Approach to Dense Passage Retrieval for Open-Domain Question Answering [PDF] Abstract
23. DiDi's Machine Translation System for WMT2020 [PDF] Abstract
24. Generating Diverse Translation from Model Distribution with Dropout [PDF] Abstract
25. Inferring symmetry in natural language [PDF] Abstract
26. Montague Grammar Induction [PDF] Abstract
27. MAST: Multimodal Abstractive Summarization with Trimodal Hierarchical Attention [PDF] Abstract
28. GSum: A General Framework for Guided Neural Abstractive Summarization [PDF] Abstract
29. What is More Likely to Happen Next? Video-and-Language Future Event Prediction [PDF] Abstract
30. CXP949 at WNUT-2020 Task 2: Extracting Informative COVID-19 Tweets -- RoBERTa Ensembles and The Continued Relevance of Handcrafted Features [PDF] Abstract
31. Explicit Alignment Objectives for Multilingual Bidirectional Encoders [PDF] Abstract
32. Analogous Process Structure Induction for Sub-event Sequence Prediction [PDF] Abstract
33. Effective Distributed Representations for Academic Expert Search [PDF] Abstract
34. PrivNet: Safeguarding Private Attributes in Transfer Learning for Recommendation [PDF] Abstract
35. Decentralized Knowledge Graph Representation Learning [PDF] Abstract
36. Empirical Study of Transformers for Source Code [PDF] Abstract
37. Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding [PDF] Abstract
38. Personalized Neural Embeddings for Collaborative Filtering with Text [PDF] Abstract
39. LCMR: Local and Centralized Memories for Collaborative Filtering with Unstructured Text [PDF] Abstract

Abstracts

1. Mischief: A Simple Black-Box Attack Against Transformer Architectures [PDF] Back to Contents
  Adrian de Wynter
Abstract: We introduce Mischief, a simple and lightweight method to produce a class of human-readable, realistic adversarial examples for language models. We perform exhaustive experimentations of our algorithm on four transformer-based architectures, across a variety of downstream tasks, as well as under varying concentrations of said examples. Our findings show that the presence of Mischief-generated adversarial samples in the test set significantly degrades (by up to $20\%$) the performance of these models with respect to their reported baselines. Nonetheless, we also demonstrate that, by including similar examples in the training set, it is possible to restore the baseline scores on the adversarial test set. Moreover, for certain tasks, the models trained with Mischief set show a modest increase on performance with respect to their original, non-adversarial baseline.

2. Detecting Objectifying Language in Online Professor Reviews [PDF] Back to Contents
  Angie Waller, Kyle Gorman
Abstract: Student reviews often make reference to professors' physical appearances. Until recently this http URL, the website of this study's focus, used a design feature to encourage a "hot or not" rating of college professors. In the wake of recent #MeToo and #TimesUp movements, social awareness of the inappropriateness of these reviews has grown; however, objectifying comments remain and continue to be posted in this online context. We describe two supervised text classifiers for detecting objectifying commentary in professor reviews. We then ensemble these classifiers and use the resulting model to track objectifying commentary at scale. We measure correlations between objectifying commentary, changes to the review website interface, and teacher gender across a ten-year period.

3. Adaptive Feature Selection for End-to-End Speech Translation [PDF] Back to Contents
  Biao Zhang, Ivan Titov, Barry Haddow, Rico Sennrich
Abstract: Information in speech signals is not evenly distributed, making it an additional challenge for end-to-end (E2E) speech translation (ST) to learn to focus on informative features. In this paper, we propose adaptive feature selection (AFS) for encoder-decoder based E2E ST. We first pre-train an ASR encoder and apply AFS to dynamically estimate the importance of each encoded speech feature to SR. A ST encoder, stacked on top of the ASR encoder, then receives the filtered features from the (frozen) ASR encoder. We take L0DROP (Zhang et al., 2020) as the backbone for AFS, and adapt it to sparsify speech features with respect to both temporal and feature dimensions. Results on LibriSpeech En-Fr and MuST-C benchmarks show that AFS facilitates learning of ST by pruning out ~84% temporal features, yielding an average translation gain of ~1.3-1.6 BLEU and a decoding speedup of ~1.4x. In particular, AFS reduces the performance gap compared to the cascade baseline, and outperforms it on LibriSpeech En-Fr with a BLEU score of 18.56 (without data augmentation)

4. An efficient representation of chronological events in medical texts [PDF] Back to Contents
  Andrey Kormilitzin, Nemanja Vaci, Qiang Liu, Hao Ni, Goran Nenadic, Alejo Nevado-Holgado
Abstract: In this work we addressed the problem of capturing sequential information contained in longitudinal electronic health records (EHRs). Clinical notes, which is a particular type of EHR data, are a rich source of information and practitioners often develop clever solutions how to maximise the sequential information contained in free-texts. We proposed a systematic methodology for learning from chronological events available in clinical notes. The proposed methodological {\it path signature} framework creates a non-parametric hierarchical representation of sequential events of any type and can be used as features for downstream statistical learning tasks. The methodology was developed and externally validated using the largest in the UK secondary care mental health EHR data on a specific task of predicting survival risk of patients diagnosed with Alzheimer's disease. The signature-based model was compared to a common survival random forest model. Our results showed a 15.4$\%$ increase of risk prediction AUC at the time point of 20 months after the first admission to a specialist memory clinic and the signature method outperformed the baseline mixed-effects model by 13.2 $\%$.
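
The feature extractor at the heart of this approach is the truncated path signature of a sequence of event features. As a rough illustration of the idea only (not the authors' code; the function name and toy dimensions are ours), the depth-2 signature of a piecewise-linear path can be computed with plain numpy:

```python
import numpy as np

def signature_depth2(path: np.ndarray) -> np.ndarray:
    """Depth-2 signature of a piecewise-linear path of shape (T, d): the d
    level-1 terms are the total increments, the d*d level-2 terms are the
    iterated integrals S^{ij} = int (X^i_t - X^i_0) dX^j_t."""
    deltas = np.diff(path, axis=0)                 # per-segment increments, (T-1, d)
    level1 = path[-1] - path[0]                    # total increment, (d,)
    offsets = path[:-1] - path[0]                  # path value at each segment start, rebased
    level2 = offsets.T @ deltas + 0.5 * deltas.T @ deltas   # exact for linear segments
    return np.concatenate([level1, level2.ravel()])

# toy usage: 5 chronological clinical events, each embedded in 3 dimensions
features = signature_depth2(np.random.randn(5, 3))   # 3 + 9 = 12 fixed-length features
```

The resulting fixed-length vector can then be fed to any downstream statistical model, such as the survival models used in the paper.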

5. Multi-Adversarial Learning for Cross-Lingual Word Embeddings [PDF] Back to Contents
  Haozhou Wang, James Henderson, Paola Merlo
Abstract: Generative adversarial networks (GANs) have succeeded in inducing cross-lingual word embeddings -- maps of matching words across languages - without supervision. Despite these successes, GANs' performance for the difficult case of distant languages is still not satisfactory. These limitations have been explained by GANs' incorrect assumption that source and target embedding spaces are related by a single linear mapping and are approximately isomorphic. We assume instead that, especially across distant languages, the mapping is only piece-wise linear, and propose a multi-adversarial learning method. This novel method induces the seed cross-lingual dictionary through multiple mappings, each induced to fit the mapping for one subspace. Our experiments on unsupervised bilingual lexicon induction show that this method improves performance over previous single-mapping methods, especially for distant languages.

6. Delaying Interaction Layers in Transformer-based Encoders for Efficient Open Domain Question Answering [PDF] Back to Contents
  Wissam Siblini, Mohamed Challal, Charlotte Pasqual
Abstract: Open Domain Question Answering (ODQA) on a large-scale corpus of documents (e.g. Wikipedia) is a key challenge in computer science. Although transformer-based language models such as Bert have shown on SQuAD the ability to surpass humans for extracting answers in small passages of text, they suffer from their high complexity when faced to a much larger search space. The most common way to tackle this problem is to add a preliminary Information Retrieval step to heavily filter the corpus and only keep the relevant passages. In this paper, we propose a more direct and complementary solution which consists in applying a generic change in the architecture of transformer-based models to delay the attention between subparts of the input and allow a more efficient management of computations. The resulting variants are competitive with the original models on the extractive task and allow, on the ODQA setting, a significant speedup and even a performance improvement in many cases.
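
A rough sketch of the idea (our own simplification, not the authors' implementation): the lower transformer layers run over question and passage independently, and only the upper layers attend over the concatenated pair. The layer split and sizes below are illustrative.

```python
import torch
import torch.nn as nn

class DelayedInteractionEncoder(nn.Module):
    """Lower layers encode question and passage separately (so passage
    representations can be pre-computed and cached); only the upper layers
    attend over the concatenated pair."""
    def __init__(self, d_model=256, n_heads=4, n_lower=9, n_upper=3):
        super().__init__()
        make = lambda: nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.lower = nn.ModuleList([make() for _ in range(n_lower)])
        self.upper = nn.ModuleList([make() for _ in range(n_upper)])

    def forward(self, question, passage):
        for block in self.lower:            # no question-passage interaction yet
            question, passage = block(question), block(passage)
        joint = torch.cat([question, passage], dim=1)
        for block in self.upper:            # full attention over both sequences
            joint = block(joint)
        return joint

encoder = DelayedInteractionEncoder()
out = encoder(torch.randn(2, 16, 256), torch.randn(2, 128, 256))   # (2, 144, 256)
```

Because the lower passage representations do not depend on the question, they can be pre-computed once for the whole corpus, which is where the inference speedup comes from.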

7. Vector-Vector-Matrix Architecture: A Novel Hardware-Aware Framework for Low-Latency Inference in NLP Applications [PDF] Back to Contents
  Matthew Khoury, Rumen Dangovski, Longwu Ou, Preslav Nakov, Yichen Shen, Li Jing
Abstract: Deep neural networks have become the standard approach to building reliable Natural Language Processing (NLP) applications, ranging from Neural Machine Translation (NMT) to dialogue systems. However, improving accuracy by increasing the model size requires a large number of hardware computations, which can slow down NLP applications significantly at inference time. To address this issue, we propose a novel vector-vector-matrix architecture (VVMA), which greatly reduces the latency at inference time for NMT. This architecture takes advantage of specialized hardware that has low-latency vector-vector operations and higher-latency vector-matrix operations. It also reduces the number of parameters and FLOPs for virtually all models that rely on efficient matrix multipliers without significantly impacting accuracy. We present empirical results suggesting that our framework can reduce the latency of sequence-to-sequence and Transformer models used for NMT by a factor of four. Finally, we show evidence suggesting that our VVMA extends to other domains, and we discuss novel hardware for its efficient use.

8. From Talk to Action with Accountability: Monitoring the Public Discussion of Finnish Decision-Makers with Deep Neural Networks and Topic Modelling [PDF] Back to Contents
  Vili Hätönen, Fiona Melzer
Abstract: Decades of research on climate have provided a consensus that human activity has changed the climate and we are currently heading into a climate crisis. Many tools and methods, some of which utilize machine learning, have been developed to monitor, evaluate, and predict the changing climate and its effects on societies. However, the mere existence of tools and increased awareness have not led to swift action to reduce emissions and mitigate climate change. Politicians and other policy makers lack the initiative to move from talking about the climate to concrete climate action. In this work, we contribute to the efforts of holding decision makers accountable by describing a system which digests politicians' speeches and statements into a topic summary. We propose a multi-source hybrid latent Dirichlet allocation model which can process the large number of publicly available reports, social media posts, speeches, and other documents of Finnish politicians, providing transparency and accountability towards the general public.
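
The paper's model is a multi-source hybrid LDA; as a much simpler stand-in for the topic-modelling component only (assuming scikit-learn and toy documents), a plain LDA over the pooled texts looks like this:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# toy stand-ins for the pooled speeches, social media posts and reports
docs = ["climate law emissions target vote",
        "budget healthcare reform vote",
        "emissions trading climate policy"]
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
doc_topics = lda.transform(counts)    # per-document topic mixture, the basis of a topic summary
```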

9. QA2Explanation: Generating and Evaluating Explanations for Question Answering Systems over Knowledge Graph [PDF] Back to Contents
  Saeedeh Shekarpour, Abhishek Nadgeri, Kuldeep Singh
Abstract: In the era of Big Knowledge Graphs, Question Answering (QA) systems have reached a milestone in their performance and feasibility. However, their applicability, particularly in specific domains such as the biomedical domain, has not gained wide acceptance due to their "black box" nature, which hinders transparency, fairness, and accountability of QA systems. Therefore, users are unable to understand how and why particular questions have been answered, whereas some others fail. To address this challenge, in this paper, we develop an automatic approach for generating explanations during various stages of a pipeline-based QA system. Our approach is a supervised and automatic approach which considers three classes (i.e., success, no answer, and wrong answer) for annotating the output of involved QA components. Upon our prediction, a template explanation is chosen and integrated into the output of the corresponding component. To measure the effectiveness of the approach, we conducted a user survey as to how non-expert users perceive our generated explanations. The results of our study show a significant increase in the four dimensions of the human factor from the Human-computer interaction community.
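
The generation step is template-based: a classifier predicts one of the three status classes for a pipeline component, and a matching template is filled in. A minimal sketch with hypothetical templates, component names and slot names (not the paper's exact wording):

```python
# Hypothetical status templates and slot names, for illustration only.
TEMPLATES = {
    "success": "The {component} step succeeded: it mapped '{span}' to {output}.",
    "no answer": "The {component} step found no candidate for '{span}'.",
    "wrong answer": "The {component} step likely produced an incorrect {output} for '{span}'.",
}

def explain(component: str, status: str, span: str, output: str) -> str:
    """Fill the template that matches the predicted status class."""
    return TEMPLATES[status].format(component=component, span=span, output=output)

print(explain("entity linking", "success", "Einstein", "dbr:Albert_Einstein"))
```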

10. Detecting ESG topics using domain-specific language models and data augmentation approaches [PDF] Back to Contents
  Tim Nugent, Nicole Stelea, Jochen L. Leidner
Abstract: Despite recent advances in deep learning-based language modelling, many natural language processing (NLP) tasks in the financial domain remain challenging due to the paucity of appropriately labelled data. Other issues that can limit task performance are differences in word distribution between the general corpora - typically used to pre-train language models - and financial corpora, which often exhibit specialized language and symbology. Here, we investigate two approaches that may help to mitigate these issues. Firstly, we experiment with further language model pre-training using large amounts of in-domain data from business and financial news. We then apply augmentation approaches to increase the size of our dataset for model fine-tuning. We report our findings on an Environmental, Social and Governance (ESG) controversies dataset and demonstrate that both approaches are beneficial to accuracy in classification tasks.

11. Multi-task Learning of Negation and Speculation for Targeted Sentiment Classification [PDF] Back to Contents
  Andrew Moore, Jeremy Barnes
Abstract: The majority of work in targeted sentiment analysis has concentrated on finding better methods to improve the overall results. Within this paper we show that these models are not robust to linguistic phenomena, specifically negation and speculation. In this paper, we propose a multi-task learning method to incorporate information from syntactic and semantic auxiliary tasks, including negation and speculation scope detection, to create models that are more robust to these phenomena. Further we create two challenge datasets to evaluate model performance on negated and speculative samples. We find that multi-task models and transfer learning from a language model can improve performance on these challenge datasets. However the results indicate that there is still much room for improvement in making our models more robust to linguistic phenomena such as negation and speculation.
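
A minimal PyTorch sketch of such a multi-task setup, assuming a shared BiLSTM encoder (the paper uses stronger encoders) with a sentence-level sentiment head and a token-level negation/speculation-scope head; all sizes are illustrative:

```python
import torch
import torch.nn as nn

class MultiTaskSentiment(nn.Module):
    """Shared encoder with a sentence-level sentiment head and an auxiliary
    token-level scope-tagging head (negation/speculation)."""
    def __init__(self, vocab=10000, emb=100, hidden=128, n_sentiment=3):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.sentiment_head = nn.Linear(2 * hidden, n_sentiment)   # main task
        self.scope_head = nn.Linear(2 * hidden, 2)                  # aux: in/out of scope

    def forward(self, tokens):
        states, _ = self.encoder(self.embed(tokens))
        sentiment_logits = self.sentiment_head(states.mean(dim=1))  # per sentence
        scope_logits = self.scope_head(states)                      # per token
        return sentiment_logits, scope_logits

# training would minimize a weighted sum of the two cross-entropy losses
model = MultiTaskSentiment()
sent_logits, scope_logits = model(torch.randint(0, 10000, (4, 20)))
```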

12. It's not Greek to mBERT: Inducing Word-Level Translations from Multilingual BERT [PDF] Back to Contents
  Hila Gonen, Shauli Ravfogel, Yanai Elazar, Yoav Goldberg
Abstract: Recent works have demonstrated that multilingual BERT (mBERT) learns rich cross-lingual representations, that allow for transfer across languages. We study the word-level translation information embedded in mBERT and present two simple methods that expose remarkable translation capabilities with no fine-tuning. The results suggest that most of this information is encoded in a non-linear way, while some of it can also be recovered with purely linear tools. As part of our analysis, we test the hypothesis that mBERT learns representations which contain both a language-encoding component and an abstract, cross-lingual component, and explicitly identify an empirical language-identity subspace within mBERT representations.

13. Training Flexible Depth Model by Multi-Task Learning for Neural Machine Translation [PDF] Back to Contents
  Qiang Wang, Tong Xiao, Jingbo Zhu
Abstract: The standard neural machine translation model can only decode with the same depth configuration as training. Restricted by this feature, we have to deploy models of various sizes to maintain the same translation latency, because the hardware conditions on different terminal devices (e.g., mobile phones) may vary greatly. Such individual training leads to increased model maintenance costs and slower model iterations, especially for the industry. In this work, we propose to use multi-task learning to train a flexible depth model that can adapt to different depth configurations during inference. Experimental results show that our approach can simultaneously support decoding in 24 depth configurations and is superior to the individual training and another flexible depth model training method -- LayerDrop.
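
One simple way to expose a single stack of weights to several depth configurations during training (a loose sketch of the general idea, not the paper's exact multi-task objective) is to sample the number of layers per step and always run a prefix of the stack:

```python
import random
import torch
import torch.nn as nn

class FlexibleDepthEncoder(nn.Module):
    """One shared stack of layers; each training step runs a randomly sampled
    prefix of it, so the same weights can later be decoded at any supported
    depth. Sizes are illustrative."""
    def __init__(self, d_model=256, n_heads=4, max_depth=6):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
             for _ in range(max_depth)])

    def forward(self, x, depth=None):
        if depth is None:
            depth = random.randint(1, len(self.layers)) if self.training else len(self.layers)
        for layer in self.layers[:depth]:
            x = layer(x)
        return x

enc = FlexibleDepthEncoder()
shallow = enc(torch.randn(2, 10, 256), depth=2)   # same weights, smaller depth at inference
```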

14. SIGTYP 2020 Shared Task: Prediction of Typological Features [PDF] Back to Contents
  Johannes Bjerva, Elizabeth Salesky, Sabrina J. Mielke, Aditi Chaudhary, Giuseppe G. A. Celano, Edoardo M. Ponti, Ekaterina Vylomova, Ryan Cotterell, Isabelle Augenstein
Abstract: Typological knowledge bases (KBs) such as WALS (Dryer and Haspelmath, 2013) contain information about linguistic properties of the world's languages. They have been shown to be useful for downstream applications, including cross-lingual transfer learning and linguistic probing. A major drawback hampering broader adoption of typological KBs is that they are sparsely populated, in the sense that most languages only have annotations for some features, and skewed, in that few features have wide coverage. As typological features often correlate with one another, it is possible to predict them and thus automatically populate typological KBs, which is also the focus of this shared task. Overall, the task attracted 8 submissions from 5 teams, out of which the most successful methods make use of such feature correlations. However, our error analysis reveals that even the strongest submitted systems struggle with predicting feature values for languages where few features are known.

15. Unsupervised Extractive Summarization by Pre-training Hierarchical Transformers [PDF] Back to Contents
  Shusheng Xu, Xingxing Zhang, Yi Wu, Furu Wei, Ming Zhou
Abstract: Unsupervised extractive document summarization aims to select important sentences from a document without using labeled summaries during training. Existing methods are mostly graph-based with sentences as nodes and edge weights measured by sentence similarities. In this work, we find that transformer attentions can be used to rank sentences for unsupervised extractive summarization. Specifically, we first pre-train a hierarchical transformer model using unlabeled documents only. Then we propose a method to rank sentences using sentence-level self-attentions and pre-training objectives. Experiments on CNN/DailyMail and New York Times datasets show our model achieves state-of-the-art performance on unsupervised summarization. We also find in experiments that our model is less dependent on sentence positions. When using a linear combination of our model and a recent unsupervised model explicitly modeling sentence positions, we obtain even better results.
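
The ranking step itself is simple once sentence-level attention is available. A toy sketch (numpy, with a made-up attention matrix standing in for the pre-trained hierarchical transformer's sentence attentions):

```python
import numpy as np

def rank_sentences(attn: np.ndarray, k: int = 3) -> np.ndarray:
    """attn[i, j] = attention sentence i pays to sentence j (already averaged
    over heads/layers). Score each sentence by the attention it receives from
    the other sentences and return the indices of the top-k."""
    attn = attn.copy()
    np.fill_diagonal(attn, 0.0)        # ignore self-attention
    scores = attn.sum(axis=0)          # column sums = attention received
    return np.argsort(-scores)[:k]

# toy usage with a random 5-sentence attention matrix
summary_ids = rank_sentences(np.random.rand(5, 5), k=2)
```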

16. Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks [PDF] Back to Contents
  Nandan Thakur, Nils Reimers, Johannes Daxenberger, Iryna Gurevych
Abstract: There are two approaches for pairwise sentence scoring: Cross-encoders, which perform full-attention over the input pair, and Bi-encoders, which map each input independently to a dense vector space. While cross-encoders often achieve higher performance, they are too slow for many practical use cases. Bi-encoders, on the other hand, require substantial training data and fine-tuning over the target task to achieve competitive performance. We present a simple yet efficient data augmentation strategy called Augmented SBERT, where we use the cross-encoder to label a larger set of input pairs to augment the training data for the bi-encoder. We show that, in this process, selecting the sentence pairs is non-trivial and crucial for the success of the method. We evaluate our approach on multiple tasks (in-domain) as well as on a domain adaptation task. Augmented SBERT achieves an improvement of up to 6 points for in-domain and of up to 37 points for domain adaptation tasks compared to the original bi-encoder performance.
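
A condensed sketch of the recipe with the sentence-transformers library (the model names and the tiny pair list are placeholders): a cross-encoder scores unlabeled pairs, and the resulting silver pairs are added to the bi-encoder's fine-tuning data.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, CrossEncoder, InputExample, losses

# 1) A cross-encoder scores unlabeled sentence pairs, producing silver labels.
cross = CrossEncoder("cross-encoder/stsb-roberta-base")      # any pairwise-scoring cross-encoder
pairs = [("A man is eating food.", "A man is eating a meal."),
         ("A plane is taking off.", "A dog runs in the park.")]
silver_scores = cross.predict(pairs)

# 2) The silver-labelled pairs augment the bi-encoder's fine-tuning data.
bi_encoder = SentenceTransformer("distilroberta-base")
examples = [InputExample(texts=list(p), label=float(s))
            for p, s in zip(pairs, silver_scores)]
loader = DataLoader(examples, shuffle=True, batch_size=16)
bi_encoder.fit(train_objectives=[(loader, losses.CosineSimilarityLoss(bi_encoder))], epochs=1)
```

As the abstract stresses, which unlabeled pairs get sent to the cross-encoder (the sampling strategy) matters as much as the labelling itself.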

17. WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets [PDF] Back to Contents
  Dat Quoc Nguyen, Thanh Vu, Afshin Rahimi, Mai Hoang Dao, Linh The Nguyen, Long Doan
Abstract: In this paper, we provide an overview of the WNUT-2020 shared task on the identification of informative COVID-19 English Tweets. We describe how we construct a corpus of 10K Tweets and organize the development and evaluation phases for this task. In addition, we also present a brief summary of results obtained from the final system evaluation submissions of 55 teams, finding that (i) many systems obtain very high performance, up to 0.91 F1 score, (ii) the majority of the submissions achieve substantially higher results than the baseline fastText (Joulin et al., 2017), and (iii) fine-tuning pre-trained language models on relevant language data followed by supervised training performs well in this task.

18. Collaborative Training of GANs in Continuous and Discrete Spaces for Text Generation [PDF] Back to Contents
  Yanghoon Kim, Seungpil Won, Seunghyun Yoon, Kyomin Jung
Abstract: Applying generative adversarial networks (GANs) to text-related tasks is challenging due to the discrete nature of language. One line of research resolves this issue by employing reinforcement learning (RL) and optimizing the next-word sampling policy directly in a discrete action space. Such methods compute the rewards from complete sentences and avoid error accumulation due to exposure bias. Other approaches employ approximation techniques that map the text to continuous representation in order to circumvent the non-differentiable discrete process. Particularly, autoencoder-based methods effectively produce robust representations that can model complex discrete structures. In this paper, we propose a novel text GAN architecture that promotes the collaborative training of the continuous-space and discrete-space methods. Our method employs an autoencoder to learn an implicit data manifold, providing a learning objective for adversarial training in a continuous space. Furthermore, the complete textual output is directly evaluated and updated via RL in a discrete space. The collaborative interplay between the two adversarial trainings effectively regularize the text representations in different spaces. The experimental results on three standard benchmark datasets show that our model substantially outperforms state-of-the-art text GANs with respect to quality, diversity, and global consistency.

19. Coarse-to-Fine Pre-training for Named Entity Recognition [PDF] Back to Contents
  Mengge Xue, Bowen Yu, Zhenyu Zhang, Tingwen Liu, Yue Zhang, Bin Wang
Abstract: More recently, Named Entity Recognition has achieved great advances aided by pre-training approaches such as BERT. However, current pre-training techniques focus on building language modeling objectives to learn a general representation, ignoring the named entity-related knowledge. To this end, we propose a NER-specific pre-training framework to inject coarse-to-fine automatically mined entity knowledge into pre-trained models. Specifically, we first warm-up the model via an entity span identification task by training it with Wikipedia anchors, which can be deemed as general-typed entities. Then we leverage the gazetteer-based distant supervision strategy to train the model to extract coarse-grained typed entities. Finally, we devise a self-supervised auxiliary task to mine the fine-grained named entity knowledge via clustering. Empirical studies on three public NER datasets demonstrate that our framework achieves significant improvements against several pre-trained baselines, establishing the new state-of-the-art performance on three benchmarks. Besides, we show that our framework gains promising results without using human-labeled training data, demonstrating its effectiveness in label-few and low-resource scenarios.

20. Unsupervised Natural Language Inference via Decoupled Multimodal Contrastive Learning [PDF] Back to Contents
  Wanyun Cui, Guangyu Zheng, Wei Wang
Abstract: We propose to solve the natural language inference problem without any supervision from the inference labels via task-agnostic multimodal pretraining. Although recent studies of multimodal self-supervised learning also represent the linguistic and visual context, their encoders for different modalities are coupled. Thus they cannot incorporate visual information when encoding plain text alone. In this paper, we propose Multimodal Aligned Contrastive Decoupled learning (MACD) network. MACD forces the decoupled text encoder to represent the visual information via contrastive learning. Therefore, it embeds visual knowledge even for plain text inference. We conducted comprehensive experiments over plain text inference datasets (i.e. SNLI and STS-B). The unsupervised MACD even outperforms the fully-supervised BiLSTM and BiLSTM+ELMO on STS-B.

21. Lexicon-constrained Copying Network for Chinese Abstractive Summarization [PDF] Back to Contents
  Boyan Wan, Zhuo Tang, Li Yang
Abstract: Copy mechanism allows sequence-to-sequence models to choose words from the input and put them directly into the output, which is finding increasing use in abstractive summarization. However, since there is no explicit delimiter in Chinese sentences, most existing models for Chinese abstractive summarization can only perform character copy, resulting in inefficient. To solve this problem, we propose a lexicon-constrained copying network that models multi-granularity in both encoder and decoder. On the source side, words and characters are aggregated into the same input memory using a Transformerbased encoder. On the target side, the decoder can copy either a character or a multi-character word at each time step, and the decoding process is guided by a word-enhanced search algorithm that facilitates the parallel computation and encourages the model to copy more words. Moreover, we adopt a word selector to integrate keyword information. Experiments results on a Chinese social media dataset show that our model can work standalone or with the word selector. Both forms can outperform previous character-based models and achieve competitive performances.

22. RocketQA: An Optimized Training Approach to Dense Passage Retrieval for Open-Domain Question Answering [PDF] Back to Contents
  Yingqi Qu, Yuchen Ding, Jing Liu, Kai Liu, Ruiyang Ren, Xin Zhao, Daxiang Dong, Hua Wu, Haifeng Wang
Abstract: In open-domain question answering, dense passage retrieval has become a new paradigm to retrieve relevant passages for answer finding. Typically, the dual-encoder architecture is adopted to learn dense representations of questions and passages for matching. However, it is difficult to train an effective dual-encoder due to the challenges including the discrepancy between training and inference, the existence of unlabeled positives and limited training data. To address these challenges, we propose an optimized training approach, called RocketQA, to improving dense passage retrieval. We make three major technical contributions in RocketQA, namely cross-batch negatives, denoised negative sampling and data augmentation. Extensive experiments show that RocketQA significantly outperforms previous state-of-the-art models on both MSMARCO and Natural Questions. Besides, built upon RocketQA, we achieve the first rank at the leaderboard of MSMARCO Passage Ranking Task.
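
For reference, the standard dual-encoder objective that cross-batch negatives extend is the in-batch negative loss; a minimal PyTorch sketch (our simplification, with the cross-GPU gathering of passage embeddings omitted):

```python
import torch
import torch.nn.functional as F

def in_batch_negative_loss(q_emb: torch.Tensor, p_emb: torch.Tensor, temperature: float = 1.0):
    """Dual-encoder loss where every other passage in the batch is a negative
    for a question; cross-batch negatives extend this by gathering passage
    embeddings from all GPUs (not shown). Shapes: (B, d) each."""
    scores = q_emb @ p_emb.T / temperature                 # (B, B) similarity matrix
    labels = torch.arange(q_emb.size(0), device=q_emb.device)
    return F.cross_entropy(scores, labels)                 # diagonal entries are the positives
```

Denoised negative sampling then filters the mined hard negatives with a cross-encoder before they enter this loss.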

23. DiDi's Machine Translation System for WMT2020 [PDF] Back to Contents
  Tanfang Chen, Weiwei Wang, Wenyang Wei, Xing Shi, Xiangang Li, Jieping Ye, Kevin Knight
Abstract: This paper describes DiDi AI Labs' submission to the WMT2020 news translation shared task. We participate in the translation direction of Chinese->English. In this direction, we use the Transformer as our baseline model, and integrate several techniques for model enhancement, including data filtering, data selection, back-translation, fine-tuning, model ensembling, and re-ranking. As a result, our submission achieves a BLEU score of $36.6$ in Chinese->English.

24. Generating Diverse Translation from Model Distribution with Dropout [PDF] Back to Contents
  Xuanfu Wu, Yang Feng, Chenze Shao
Abstract: Despite the improvement of translation quality, neural machine translation (NMT) often suffers from the lack of diversity in its generation. In this paper, we propose to generate diverse translations by deriving a large number of possible models with Bayesian modelling and sampling models from them for inference. The possible models are obtained by applying concrete dropout to the NMT model and each of them has specific confidence for its prediction, which corresponds to a posterior model distribution under specific training data in the principle of Bayesian modeling. With variational inference, the posterior model distribution can be approximated with a variational distribution, from which the final models for inference are sampled. We conducted experiments on Chinese-English and English-German translation tasks and the results shows that our method makes a better trade-off between diversity and accuracy.
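
The sampling idea can be sketched as Monte-Carlo dropout at decoding time: keep dropout active, decode once per sampled model, and collect the translations. This is a simplification of the paper's concrete-dropout formulation, and `decode` is a placeholder for any greedy or beam decoder:

```python
import torch

def sample_translations_with_dropout(model, src, decode, n_models=5):
    """Keep dropout active at inference so each forward pass corresponds to a
    different model drawn from the approximate posterior, then decode once
    per sampled model."""
    model.train()                      # enables dropout; weights are not updated
    translations = []
    with torch.no_grad():
        for _ in range(n_models):
            translations.append(decode(model, src))
    model.eval()
    return translations
```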

25. Inferring symmetry in natural language [PDF] Back to Contents
  Chelsea Tanchip, Lei Yu, Aotao Xu, Yang Xu
Abstract: We present a methodological framework for inferring symmetry of verb predicates in natural language. Empirical work on predicate symmetry has taken two main approaches. The feature-based approach focuses on linguistic features pertaining to symmetry. The context-based approach denies the existence of absolute symmetry but instead argues that such inference is context dependent. We develop methods that formalize these approaches and evaluate them against a novel symmetry inference sentence (SIS) dataset comprised of 400 naturalistic usages of literature-informed verbs spanning the spectrum of symmetry-asymmetry. Our results show that a hybrid transfer learning model that integrates linguistic features with contextualized language models most faithfully predicts the empirical data. Our work integrates existing approaches to symmetry in natural language and suggests how symmetry inference can improve systematicity in state-of-the-art language models.

26. Montague Grammar Induction [PDF] Back to Contents
  Gene Louis Kim, Aaron Steven White
Abstract: We propose a computational modeling framework for inducing combinatory categorial grammars from arbitrary behavioral data. This framework provides the analyst fine-grained control over the assumptions that the induced grammar should conform to: (i) what the primitive types are; (ii) how complex types are constructed; (iii) what set of combinators can be used to combine types; and (iv) whether (and to what) the types of some lexical items should be fixed. In a proof-of-concept experiment, we deploy our framework for use in distributional analysis. We focus on the relationship between s(emantic)-selection and c(ategory)-selection, using as input a lexicon-scale acceptability judgment dataset focused on English verbs' syntactic distribution (the MegaAcceptability dataset) and enforcing standard assumptions from the semantics literature on the induced grammar.

27. MAST: Multimodal Abstractive Summarization with Trimodal Hierarchical Attention [PDF] Back to Contents
  Aman Khullar, Udit Arora
Abstract: This paper presents MAST, a new model for Multimodal Abstractive Text Summarization that utilizes information from all three modalities -- text, audio and video -- in a multimodal video. Prior work on multimodal abstractive text summarization only utilized information from the text and video modalities. We examine the usefulness and challenges of deriving information from the audio modality and present a sequence-to-sequence trimodal hierarchical attention-based model that overcomes these challenges by letting the model pay more attention to the text modality. MAST outperforms the current state of the art model (video-text) by 2.51 points in terms of Content F1 score and 1.00 points in terms of Rouge-L score on the How2 dataset for multimodal language understanding.

28. GSum: A General Framework for Guided Neural Abstractive Summarization [PDF] Back to Contents
  Zi-Yi Dou, Pengfei Liu, Hiroaki Hayashi, Zhengbao Jiang, Graham Neubig
Abstract: Neural abstractive summarization models are flexible and can produce coherent summaries, but they are sometimes unfaithful and can be difficult to control. While previous studies attempt to provide different types of guidance to control the output and increase faithfulness, it is not clear how these strategies compare and contrast to each other. In this paper, we propose a general and extensible guided summarization framework (GSum) that can effectively take different kinds of external guidance as input, and we perform experiments across several different varieties. Experiments demonstrate that this model is effective, achieving state-of-the-art performance according to ROUGE on 4 popular summarization datasets when using highlighted sentences as guidance. In addition, we show that our guided model can generate more faithful summaries and demonstrate how different types of guidance generate qualitatively different summaries, lending a degree of controllability to the learned models.

29. What is More Likely to Happen Next? Video-and-Language Future Event Prediction [PDF] Back to Contents
  Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal
Abstract: Given a video with aligned dialogue, people can often infer what is more likely to happen next. Making such predictions requires not only a deep understanding of the rich dynamics underlying the video and dialogue, but also a significant amount of commonsense knowledge. In this work, we explore whether AI models are able to learn to make such multimodal commonsense next-event predictions. To support research in this direction, we collect a new dataset, named Video-and-Language Event Prediction (VLEP), with 28,726 future event prediction examples (along with their rationales) from 10,234 diverse TV Show and YouTube Lifestyle Vlog video clips. In order to promote the collection of non-trivial challenging examples, we employ an adversarial human-and-model-in-the-loop data collection procedure. We also present a strong baseline incorporating information from video, dialogue, and commonsense knowledge. Experiments show that each type of information is useful for this challenging task, and that compared to the high human performance on VLEP, our model provides a good starting point but leaves large room for future work. Our dataset and code are available at: this https URL

30. CXP949 at WNUT-2020 Task 2: Extracting Informative COVID-19 Tweets -- RoBERTa Ensembles and The Continued Relevance of Handcrafted Features [PDF] Back to Contents
  Calum Perrio, Harish Tayyar Madabushi
Abstract: This paper presents our submission to Task 2 of the Workshop on Noisy User-generated Text. We explore improving the performance of a pre-trained transformer-based language model fine-tuned for text classification through an ensemble implementation that makes use of corpus level information and a handcrafted feature. We test the effectiveness of including the aforementioned features in accommodating the challenges of a noisy data set centred on a specific subject outside the remit of the pre-training data. We show that inclusion of additional features can improve classification results and achieve a score within 2 points of the top performing team.

31. Explicit Alignment Objectives for Multilingual Bidirectional Encoders [PDF] Back to Contents
  Junjie Hu, Melvin Johnson, Orhan Firat, Aditya Siddhant, Graham Neubig
Abstract: Pre-trained cross-lingual encoders such as mBERT (Devlin et al., 2019) and XLMR (Conneau et al., 2020) have proven to be impressively effective at enabling transfer-learning of NLP systems from high-resource languages to low-resource languages. This success comes despite the fact that there is no explicit objective to align the contextual embeddings of words/sentences with similar meanings across languages together in the same space. In this paper, we present a new method for learning multilingual encoders, AMBER (Aligned Multilingual Bidirectional EncodeR). AMBER is trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities. We conduct experiments on zero-shot cross-lingual transfer learning for different tasks including sequence tagging, sentence retrieval and sentence classification. Experimental results show that AMBER obtains gains of up to 1.1 average F1 score on sequence tagging and up to 27.3 average accuracy on retrieval over the XLMR-large model which has 4.6x the parameters of AMBER.

32. Analogous Process Structure Induction for Sub-event Sequence Prediction [PDF] Back to Contents
  Hongming Zhang, Muhao Chen, Haoyu Wang, Yangqiu Song, Dan Roth
Abstract: Computational and cognitive studies of event understanding suggest that identifying, comprehending, and predicting events depend on having structured representations of a sequence of events and on conceptualizing (abstracting) its components into (soft) event categories. Thus, knowledge about a known process such as "buying a car" can be used in the context of a new but analogous process such as "buying a house". Nevertheless, most event understanding work in NLP is still at the ground level and does not consider abstraction. In this paper, we propose an Analogous Process Structure Induction APSI framework, which leverages analogies among processes and conceptualization of sub-event instances to predict the whole sub-event sequence of previously unseen open-domain processes. As our experiments and analysis indicate, APSI supports the generation of meaningful sub-event sequences for unseen processes and can help predict missing events.

33. Effective Distributed Representations for Academic Expert Search [PDF] Back to Contents
  Mark Berger, Jakub Zavrel, Paul Groth
Abstract: Expert search aims to find and rank experts based on a user's query. In academia, retrieving experts is an efficient way to navigate through a large amount of academic knowledge. Here, we study how different distributed representations of academic papers (i.e. embeddings) impact academic expert retrieval. We use the Microsoft Academic Graph dataset and experiment with different configurations of a document-centric voting model for retrieval. In particular, we explore the impact of the use of contextualized embeddings on search performance. We also present results for paper embeddings that incorporate citation information through retrofitting. Additionally, experiments are conducted using different techniques for assigning author weights based on author order. We observe that using contextual embeddings produced by a transformer model trained for sentence similarity tasks produces the most effective paper representations for document-centric expert retrieval. However, retrofitting the paper embeddings and using elaborate author contribution weighting strategies did not improve retrieval performance.
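
The document-centric voting model they build on can be sketched in a few lines (our own toy version with uniform author weighting): retrieve the papers closest to the query embedding and let each retrieved paper vote for its authors with its similarity score.

```python
import numpy as np
from collections import defaultdict

def rank_experts(query_vec, paper_vecs, paper_authors, top_docs=100):
    """Document-centric voting: the papers most similar to the query each vote
    for their authors with their similarity score (uniform author weighting;
    order-based author weighting is one of the variants studied)."""
    sims = paper_vecs @ query_vec / (
        np.linalg.norm(paper_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    votes = defaultdict(float)
    for idx in np.argsort(-sims)[:top_docs]:
        for author in paper_authors[idx]:
            votes[author] += sims[idx]
    return sorted(votes, key=votes.get, reverse=True)
```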

34. PrivNet: Safeguarding Private Attributes in Transfer Learning for Recommendation [PDF] Back to Contents
  Guangneng Hu, Qiang Yang
Abstract: Transfer learning is an effective technique to improve a target recommender system with the knowledge from a source domain. Existing research focuses on the recommendation performance of the target domain while ignores the privacy leakage of the source domain. The transferred knowledge, however, may unintendedly leak private information of the source domain. For example, an attacker can accurately infer user demographics from their historical purchase provided by a source domain data owner. This paper addresses the above privacy-preserving issue by learning a privacy-aware neural representation by improving target performance while protecting source privacy. The key idea is to simulate the attacks during the training for protecting unseen users' privacy in the future, modeled by an adversarial game, so that the transfer learning model becomes robust to attacks. Experiments show that the proposed PrivNet model can successfully disentangle the knowledge benefitting the transfer from leaking the privacy.

35. Decentralized Knowledge Graph Representation Learning [PDF] Back to Contents
  Lingbing Guo, Weiqing Wang, Zequn Sun, Chenghao Liu, Wei Hu
Abstract: Knowledge graph (KG) representation learning methods have achieved competitive performance in many KG-oriented tasks, among which the best ones are usually based on graph neural networks (GNNs), a powerful family of networks that learns the representation of an entity by aggregating the features of its neighbors and itself. However, many KG representation learning scenarios only provide the structure information that describes the relationships among entities, causing that entities have no input features. In this case, existing aggregation mechanisms are incapable of inducing embeddings of unseen entities as these entities have no pre-defined features for aggregation. In this paper, we present a decentralized KG representation learning approach, decentRL, which encodes each entity from and only from the embeddings of its neighbors. For optimization, we design an algorithm to distill knowledge from the model itself such that the output embeddings can continuously gain knowledge from the corresponding original embeddings. Extensive experiments show that the proposed approach performed better than many cutting-edge models on the entity alignment task, and achieved competitive performance on the entity prediction task. Furthermore, under the inductive setting, it significantly outperformed all baselines on both tasks.

36. Empirical Study of Transformers for Source Code [PDF] Back to Contents
  Nadezhda Chirkova, Sergey Troshin
Abstract: Initially developed for natural language processing (NLP), Transformers are now widely used for source code processing, due to the format similarity between source code and text. In contrast to natural language, source code is strictly structured, i. e. follows the syntax of the programming language. Several recent works develop Transformer modifications for capturing syntactic information in source code. The drawback of these works is that they do not compare to each other and all consider different tasks. In this work, we conduct a thorough empirical study of the capabilities of Transformers to utilize syntactic information in different tasks. We consider three tasks (code completion, function naming and bug fixing) and re-implement different syntax-capturing modifications in a unified framework. We show that Transformers are able to make meaningful predictions based purely on syntactic information and underline the best practices of taking the syntactic information into account for improving the performance of the model.

37. Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding [PDF] Back to Contents
  Alexander Ku, Peter Anderson, Roma Patel, Eugene Ie, Jason Baldridge
Abstract: We introduce Room-Across-Room (RxR), a new Vision-and-Language Navigation (VLN) dataset. RxR is multilingual (English, Hindi, and Telugu) and larger (more paths and instructions) than other VLN datasets. It emphasizes the role of language in VLN by addressing known biases in paths and eliciting more references to visible entities. Furthermore, each word in an instruction is time-aligned to the virtual poses of instruction creators and validators. We establish baseline scores for monolingual and multilingual settings and multitask learning when including Room-to-Room annotations. We also provide results for a model that learns from synchronized pose traces by focusing only on portions of the panorama attended to in human demonstrations. The size, scope and detail of RxR dramatically expands the frontier for research on embodied language agents in simulated, photo-realistic environments.

38. Personalized Neural Embeddings for Collaborative Filtering with Text [PDF] Back to Contents
  Guangneng Hu
Abstract: Collaborative filtering (CF) is a core technique for recommender systems. Traditional CF approaches exploit user-item relations (e.g., clicks, likes, and views) only and hence they suffer from the data sparsity issue. Items are usually associated with unstructured text such as article abstracts and product reviews. We develop a Personalized Neural Embedding (PNE) framework to exploit both interactions and words seamlessly. We learn such embeddings of users, items, and words jointly, and predict user preferences on items based on these learned representations. PNE estimates the probability that a user will like an item by two terms---behavior factors and semantic factors. On two real-world datasets, PNE shows better performance than four state-of-the-art baselines in terms of three metrics. We also show that PNE learns meaningful word embeddings by visualization.

39. LCMR: Local and Centralized Memories for Collaborative Filtering with Unstructured Text [PDF] Back to Contents
  Guangneng Hu, Yu Zhang, Qiang Yang
Abstract: Collaborative filtering (CF) is the key technique for recommender systems. Pure CF approaches exploit the user-item interaction data (e.g., clicks, likes, and views) only and suffer from the sparsity issue. Items are usually associated with content information such as unstructured text (e.g., abstracts of articles and reviews of products). CF can be extended to leverage text. In this paper, we develop a unified neural framework to exploit interaction data and content information seamlessly. The proposed framework, called LCMR, is based on memory networks and consists of local and centralized memories for exploiting content information and interaction data, respectively. By modeling content information as local memories, LCMR attentively learns what to exploit with the guidance of user-item interaction. On real-world datasets, LCMR shows better performance by comparing with various baselines in terms of the hit ratio and NDCG metrics. We further conduct analyses to understand how local and centralized memories work for the proposed framework.
