
[arXiv Papers] Computation and Language 2020-04-28

Contents

1. DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference [PDF] Abstract
2. Natural language processing for achieving sustainable development: the case of neural labelling to enhance community profiling [PDF] Abstract
3. SCDE: Sentence Cloze Dataset with High Quality Distractors From Examinations [PDF] Abstract
4. Intelligent Translation Memory Matching and Retrieval with Sentence Encoders [PDF] Abstract
5. DeSePtion: Dual Sequence Prediction and Adversarial Examples for Improved Fact-Checking [PDF] Abstract
6. Synonyms and Antonyms: Embedded Conflict [PDF] Abstract
7. LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning [PDF] Abstract
8. ColBERT: Using BERT Sentence Embedding for Humor Detection [PDF] Abstract
9. The Gutenberg Dialogue Dataset [PDF] Abstract
10. Augmenting Transformers with KNN-Based Composite Memory for Dialogue [PDF] Abstract
11. Screenplay Summarization Using Latent Narrative Structure [PDF] Abstract
12. BLEU Neighbors: A Reference-less Approach to Automatic Evaluation [PDF] Abstract
13. Semantic Graphs for Generating Deep Questions [PDF] Abstract
14. Lexically Constrained Neural Machine Translation with Levenshtein Transformer [PDF] Abstract
15. Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting [PDF] Abstract
16. On the Importance of Word and Sentence Representation Learning in Implicit Discourse Relation Classification [PDF] Abstract
17. Neural Machine Translation with Monte-Carlo Tree Search [PDF] Abstract
18. Assessing Discourse Relations in Language Generation from Pre-trained Language Models [PDF] Abstract
19. PTPARL-D: Annotated Corpus of 44 years of Portuguese Parliament debates [PDF] Abstract
20. Experiments with LVT and FRE for Transformer model [PDF] Abstract
21. Semi-Supervised Neural System for Tagging, Parsing and Lematization [PDF] Abstract
22. Single-/Multi-Source Cross-Lingual NER via Teacher-Student Learning on Unlabeled Data in Target Language [PDF] Abstract
23. Towards Multimodal Response Generation with Exemplar Augmentation and Curriculum Optimization [PDF] Abstract
24. Masking as an Efficient Alternative to Finetuning for Pretrained Language Models [PDF] Abstract
25. Heterogeneous Graph Neural Networks for Extractive Document Summarization [PDF] Abstract
26. GLUECoS: An Evaluation Benchmark for Code-Switched NLP [PDF] Abstract
27. Multi-Domain Dialogue Acts and Response Co-Generation [PDF] Abstract
28. Relational Graph Attention Network for Aspect-based Sentiment Analysis [PDF] Abstract
29. Is Your Classifier Actually Biased? Measuring Fairness under Uncertainty with Bernstein Bounds [PDF] Abstract
30. Endowing Empathetic Conversational Models with Personas [PDF] Abstract
31. MATINF: A Jointly Labeled Large-Scale Dataset for Classification, Question Answering and Summarization [PDF] Abstract
32. Dual Learning for Semi-Supervised Natural Language Understanding [PDF] Abstract
33. Show, Describe and Conclude: On Exploiting the Structure Information of Chest X-Ray Reports [PDF] Abstract
34. Causal Mediation Analysis for Interpreting Neural NLP: The Case of Gender Bias [PDF] Abstract
35. Hierarchical Multi Task Learning with Subword Contextual Embeddings for Languages with Rich Morphology [PDF] Abstract
36. MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification [PDF] Abstract
37. MCQA: Multimodal Co-attention Based Network for Question Answering [PDF] Abstract
38. Quantifying the Contextualization of Word Representations with Semantic Class Probing [PDF] Abstract
39. QURATOR: Innovative Technologies for Content and Data Curation [PDF] Abstract
40. Towards Discourse Parsing-inspired Semantic Storytelling [PDF] Abstract
41. Learning to Update Natural Language Comments Based on Code Changes [PDF] Abstract
42. How Does NLP Benefit Legal System: A Summary of Legal Artificial Intelligence [PDF] Abstract
43. A Rigourous Study on Named Entity Recognition: Can Fine-tuning Pretrained Model Lead to the Promised Land? [PDF] Abstract
44. All Word Embeddings from One Embedding [PDF] Abstract
45. A Heterogeneous Graph with Factual, Temporal and Logical Knowledge for Question Answering Over Dynamic Contexts [PDF] Abstract
46. When do Word Embeddings Accurately Reflect Surveys on our Beliefs About People? [PDF] Abstract
47. Contextualized Representations Using Textual Encyclopedic Knowledge [PDF] Abstract
48. Syntactic Data Augmentation Increases Robustness to Inference Heuristics [PDF] Abstract
49. Collecting Entailment Data for Pretraining: New Protocols and Negative Results [PDF] Abstract
50. The Inception Team at NSURL-2019 Task 8: Semantic Question Similarity in Arabic [PDF] Abstract
51. Practical Comparable Data Collection for Low-Resource Languages via Images [PDF] Abstract
52. ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT [PDF] Abstract
53. "Unsex me here": Revisiting Sexism Detection Using Psychological Scales and Adversarial Samples [PDF] Abstract
54. A Batch Normalized Inference Network Keeps the KL Vanishing Away [PDF] Abstract
55. Detecting fake news for the new coronavirus by reasoning on the Covid-19 ontology [PDF] Abstract
56. Methods for Computing Legal Document Similarity: A Comparative Study [PDF] Abstract
57. Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Document Matching [PDF] Abstract
58. Jointly Trained Transformers models for Spoken Language Translation [PDF] Abstract
59. Deep Multimodal Neural Architecture Search [PDF] Abstract
60. Self-supervised Learning of Visual Speech Features with Audiovisual Speech Enhancement [PDF] Abstract
61. Question Answering over Curated and Open Web Sources [PDF] Abstract

Abstracts

1. DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference [PDF] Back to contents
  Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, Jimmy Lin
Abstract: Large-scale pre-trained language models such as BERT have brought significant improvements to NLP applications. However, they are also notorious for being slow in inference, which makes them difficult to deploy in real-time applications. We propose a simple but effective method, DeeBERT, to accelerate BERT inference. Our approach allows samples to exit earlier without passing through the entire model. Experiments show that DeeBERT is able to save up to ~40% inference time with minimal degradation in model quality. Further analyses show different behaviors in the BERT transformer layers and also reveal their redundancy. Our work provides new ideas to efficiently apply deep transformer-based models to downstream tasks. Code is available at this https URL.
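
For intuition, here is a minimal sketch of entropy-based early exiting with one small "off-ramp" classifier attached to every transformer layer. The class and parameter names are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class EarlyExitEncoder(nn.Module):
    """Transformer encoder whose layers each feed a small exit classifier."""
    def __init__(self, layers, off_ramps, entropy_threshold=0.2):
        super().__init__()
        self.layers = nn.ModuleList(layers)        # pretrained transformer layers
        self.off_ramps = nn.ModuleList(off_ramps)  # one linear classifier per layer
        self.entropy_threshold = entropy_threshold

    @torch.no_grad()
    def forward(self, hidden):
        logits = None
        for layer, ramp in zip(self.layers, self.off_ramps):
            hidden = layer(hidden)
            logits = ramp(hidden[:, 0])            # classify from the [CLS] position
            probs = torch.softmax(logits, dim=-1)
            entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1)
            if entropy.max() < self.entropy_threshold:
                break                              # confident enough: exit early
        return logits
```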

2. Natural language processing for achieving sustainable development: the case of neural labelling to enhance community profiling [PDF] Back to contents
  Costanza Conforti, Stephanie Hirmer, David Morgan, Marco Basaldella, Yau Ben Or
Abstract: In recent years, there has been an increasing interest in the application of Artificial Intelligence - and especially Machine Learning - to the field of Sustainable Development (SD). However, until now, NLP has not been applied in this context. In this research paper, we show the high potential of NLP applications to enhance the sustainability of projects. In particular, we focus on the case of community profiling in developing countries, where, in contrast to the developed world, a notable data gap exists. In this context, NLP could help to address the cost and time barrier of structuring qualitative data that prohibits its widespread use and associated benefits. We propose the new task of Automatic UPV classification, which is an extreme multi-class multi-label classification problem. We release Stories2Insights, an expert-annotated dataset, provide a detailed corpus analysis, and implement a number of strong neural baselines to address the task. Experimental results show that the problem is challenging, and leave plenty of room for future research at the intersection of NLP and SD.

3. SCDE: Sentence Cloze Dataset with High Quality Distractors From Examinations [PDF] Back to contents
  Xiang Kong, Varun Gangal, Eduard Hovy
Abstract: We introduce SCDE, a dataset to evaluate the performance of computational models through sentence prediction. SCDE is a human-created sentence cloze dataset, collected from public school English examinations. Our task requires a model to fill up multiple blanks in a passage from a shared candidate set with distractors designed by English teachers. Experimental results demonstrate that this task requires the use of non-local, discourse-level context beyond the immediate sentence neighborhood. The blanks require joint solving and significantly impair each other's context. Furthermore, through ablations, we show that the distractors are of high quality and make the task more challenging. Our experiments show that there is a significant performance gap between advanced models (72%) and humans (87%), encouraging future models to bridge this gap.

4. Intelligent Translation Memory Matching and Retrieval with Sentence Encoders [PDF] Back to contents
  Tharindu Ranasinghe, Constantin Orasan, Ruslan Mitkov
Abstract: Matching and retrieving previously translated segments from a Translation Memory is the key functionality in Translation Memories systems. However this matching and retrieving process is still limited to algorithms based on edit distance which we have identified as a major drawback in Translation Memories systems. In this paper we introduce sentence encoders to improve the matching and retrieving process in Translation Memories systems - an effective and efficient solution to replace edit distance based algorithms.
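
A minimal sketch of the retrieval step, assuming segment embeddings have already been produced by some sentence encoder; cosine similarity stands in for the edit-distance score (illustrative only, not the authors' system).

```python
import numpy as np

def retrieve_tm_match(query_vec, memory_vecs, memory_segments):
    """Return the stored TM segment whose embedding is most similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    m = memory_vecs / np.linalg.norm(memory_vecs, axis=1, keepdims=True)
    sims = m @ q                                   # cosine similarity to every TM entry
    best = int(np.argmax(sims))
    return memory_segments[best], float(sims[best])
```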

5. DeSePtion: Dual Sequence Prediction and Adversarial Examples for Improved Fact-Checking [PDF] Back to contents
  Christopher Hidey, Tuhin Chakrabarty, Tariq Alhindi, Siddharth Varia, Kriste Krstovski, Mona Diab, Smaranda Muresan
Abstract: The increased focus on misinformation has spurred development of data and systems for detecting the veracity of a claim as well as retrieving authoritative evidence. The Fact Extraction and VERification (FEVER) dataset provides such a resource for evaluating end-to-end fact-checking, requiring retrieval of evidence from Wikipedia to validate a veracity prediction. We show that current systems for FEVER are vulnerable to three categories of realistic challenges for fact-checking -- multiple propositions, temporal reasoning, and ambiguity and lexical variation -- and introduce a resource with these types of claims. Then we present a system designed to be resilient to these "attacks" using multiple pointer networks for document selection and jointly modeling a sequence of evidence sentences and veracity relation predictions. We find that in handling these attacks we obtain state-of-the-art results on FEVER, largely due to improved evidence retrieval.

6. Synonyms and Antonyms: Embedded Conflict [PDF] Back to contents
  Igor Samenko, Alexey Tikhonov, Ivan P. Yamshchikov
Abstract: Since modern word embeddings are motivated by a distributional hypothesis and are, therefore, based on local co-occurrences of words, it is only to be expected that synonyms and antonyms can have very similar embeddings. Contrary to this widespread assumption, this paper shows that modern embeddings contain information that distinguishes synonyms and antonyms despite small cosine similarities between corresponding vectors. This information is encoded in the geometry of the embeddings and could be extracted with a manifold learning procedure or contrasting map. Such a map is trained on a small labeled subset of the data and can produce new embeddings that explicitly highlight specific semantic attributes of the word. The new embeddings produced by the map are shown to improve the performance on downstream tasks.

7. LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning [PDF] Back to contents
  Kaitao Song, Hao Sun, Xu Tan, Tao Qin, Jianfeng Lu, Hongzhi Liu, Tie-Yan Liu
Abstract: While pre-training and fine-tuning, e.g., BERT (Devlin et al., 2018), GPT-2 (Radford et al., 2019), have achieved great success in language understanding and generation tasks, the pre-trained models are usually too big for online deployment in terms of both memory cost and inference speed, which hinders them from practical online usage. In this paper, we propose LightPAFF, a Lightweight Pre-training And Fine-tuning Framework that leverages two-stage knowledge distillation to transfer knowledge from a big teacher model to a lightweight student model in both pre-training and fine-tuning stages. In this way the lightweight model can achieve similar accuracy as the big teacher model, but with much fewer parameters and thus faster online inference speed. LightPAFF can support different pre-training methods (such as BERT, GPT-2 and MASS (Song et al., 2019)) and be applied to many downstream tasks. Experiments on three language understanding tasks, three language modeling tasks and three sequence to sequence generation tasks demonstrate that while achieving similar accuracy with the big BERT, GPT-2 and MASS models, LightPAFF reduces the model size by nearly 5x and improves online inference speed by 5x-7x.

8. ColBERT: Using BERT Sentence Embedding for Humor Detection [PDF] Back to contents
  Issa Annamoradnejad
Abstract: Automatic humor detection has interesting use cases in modern technologies, such as chatbots and personal assistants. In this paper, we describe a novel approach for detecting humor in short texts using BERT sentence embedding. Our proposed model uses BERT to generate tokens and sentence embedding for texts. It sends embedding outputs as input to a two-layered neural network that predicts the target value. For evaluation, we created a new dataset for humor detection consisting of 200k formal short texts (100k positive, 100k negative). Experimental results show an accuracy of 98.1 percent for the proposed method, 2.1 percent improvement compared to the best CNN and RNN models and 1.1 percent better than a fine-tuned BERT model. In addition, the combination of RNN-CNN was not successful in this task compared to the CNN model.

9. The Gutenberg Dialogue Dataset [PDF] Back to contents
  Richard Csaky, Gabor Recski
Abstract: Large datasets are essential for many NLP tasks. Current publicly available open-domain dialogue datasets offer a trade-off between size and quality (e.g. DailyDialog vs. Opensubtitles). We aim to close this gap by building a high-quality dataset consisting of 14.8M utterances in English. We extract and process dialogues from publicly available online books. We present a detailed description of our pipeline and heuristics and an error analysis of extracted dialogues. Better response quality can be achieved in zero-shot and finetuning settings by training on our data than on the larger but much noisier Opensubtitles dataset. Researchers can easily build their versions of the dataset by adjusting various trade-off parameters. The code can be extended to further languages with limited effort (this https URL).

10. Augmenting Transformers with KNN-Based Composite Memory for Dialogue [PDF] Back to contents
  Angela Fan, Claire Gardent, Chloe Braud, Antoine Bordes
Abstract: Various machine learning tasks can benefit from access to external information of different modalities, such as text and images. Recent work has focused on learning architectures with large memories capable of storing this knowledge. We propose augmenting generative Transformer neural networks with KNN-based Information Fetching (KIF) modules. Each KIF module learns a read operation to access fixed external knowledge. We apply these modules to generative dialogue modeling, a challenging task where information must be flexibly retrieved and incorporated to maintain the topic and flow of conversation. We demonstrate the effectiveness of our approach by identifying relevant knowledge from Wikipedia, images, and human-written dialogue utterances, and show that leveraging this retrieved information improves model performance, measured by automatic and human evaluation.

11. Screenplay Summarization Using Latent Narrative Structure [PDF] Back to contents
  Pinelopi Papalampidi, Frank Keller, Lea Frermann, Mirella Lapata
Abstract: Most general-purpose extractive summarization models are trained on news articles, which are short and present all important information upfront. As a result, such models are biased on position and often perform a smart selection of sentences from the beginning of the document. When summarizing long narratives, which have complex structure and present information piecemeal, simple position heuristics are not sufficient. In this paper, we propose to explicitly incorporate the underlying structure of narratives into general unsupervised and supervised extractive summarization models. We formalize narrative structure in terms of key narrative events (turning points) and treat it as latent in order to summarize screenplays (i.e., extract an optimal sequence of scenes). Experimental results on the CSI corpus of TV screenplays, which we augment with scene-level summarization labels, show that latent turning points correlate with important aspects of a CSI episode and improve summarization performance over general extractive algorithms leading to more complete and diverse summaries.

12. BLEU Neighbors: A Reference-less Approach to Automatic Evaluation [PDF] Back to contents
  Kawin Ethayarajh, Dorsa Sadigh
Abstract: Evaluation is a bottleneck in the development of natural language generation (NLG) models. Automatic metrics such as BLEU rely on references, but for tasks such as open-ended generation, there are no references to draw upon. Although language diversity can be estimated using statistical measures such as perplexity, measuring language quality requires human evaluation. However, because human evaluation at scale is slow and expensive, it is used sparingly; it cannot be used to rapidly iterate on NLG models, in the way BLEU is used for machine translation. To this end, we propose BLEU Neighbors, a nearest neighbors model for estimating language quality by using the BLEU score as a kernel function. On existing datasets for chitchat dialogue and open-ended sentence generation, we find that -- on average -- the quality estimation from a BLEU Neighbors model has a lower mean squared error and higher Spearman correlation with the ground truth than individual human annotators. Despite its simplicity, BLEU Neighbors even outperforms state-of-the-art models on automatically grading essays, including models that have access to a gold-standard reference essay.
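
A minimal sketch of the BLEU Neighbors idea under stated assumptions: a pool of sentences with known quality scores is available, and sentence-level BLEU serves as the kernel for picking neighbors. Function and variable names are illustrative, not the authors' code.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def bleu_neighbors_score(candidate, scored_pool, k=5):
    """Estimate the quality of `candidate` as the mean label of its k nearest
    neighbors, where "nearest" means highest sentence-level BLEU."""
    smooth = SmoothingFunction().method1
    sims = []
    for text, quality in scored_pool:
        bleu = sentence_bleu([text.split()], candidate.split(),
                             smoothing_function=smooth)
        sims.append((bleu, quality))
    sims.sort(key=lambda pair: pair[0], reverse=True)
    top = sims[:k]
    return sum(q for _, q in top) / len(top)

# Example: pool of (sentence, human quality score) pairs
pool = [("the cat sat on the mat", 0.9), ("cat mat the on sat", 0.3)]
print(bleu_neighbors_score("a cat sat on a mat", pool, k=1))
```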

13. Semantic Graphs for Generating Deep Questions [PDF] Back to contents
  Liangming Pan, Yuxi Xie, Yansong Feng, Tat-Seng Chua, Min-Yen Kan
Abstract: This paper proposes the problem of Deep Question Generation (DQG), which aims to generate complex questions that require reasoning over multiple pieces of information of the input passage. In order to capture the global structure of the document and facilitate reasoning, we propose a novel framework which first constructs a semantic-level graph for the input document and then encodes the semantic graph by introducing an attention-based GGNN (Att-GGNN). Afterwards, we fuse the document-level and graph-level representations to perform joint training of content selection and question decoding. On the HotpotQA deep-question centric dataset, our model greatly improves performance over questions requiring reasoning over multiple facts, leading to state-of-the-art performance. The code is publicly available at this https URL.

14. Lexically Constrained Neural Machine Translation with Levenshtein Transformer [PDF] Back to contents
  Raymond Hendy Susanto, Shamil Chollampatt, Liling Tan
Abstract: This paper proposes a simple and effective algorithm for incorporating lexical constraints in neural machine translation. Previous work either required re-training existing models with the lexical constraints or incorporating them during beam search decoding with significantly higher computational overheads. Leveraging the flexibility and speed of a recently proposed Levenshtein Transformer model (Gu et al., 2019), our method injects terminology constraints at inference time without any impact on decoding speed. Our method does not require any modification to the training procedure and can be easily applied at runtime with custom dictionaries. Experiments on English-German WMT datasets show that our approach improves an unconstrained baseline and previous approaches.

15. Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting [PDF] Back to contents
  Sanyuan Chen, Yutai Hou, Yiming Cui, Wanxiang Che, Ting Liu, Xiangzhan Yu
Abstract: Deep pretrained language models have achieved great success in the way of pretraining first and then fine-tuning. But such a sequential transfer learning paradigm often confronts the catastrophic forgetting problem and leads to sub-optimal performance. To fine-tune with less forgetting, we propose a recall and learn mechanism, which adopts the idea of multi-task learning and jointly learns pretraining tasks and downstream tasks. Specifically, we propose a Pretraining Simulation mechanism to recall the knowledge from pretraining tasks without data, and an Objective Shifting mechanism to focus the learning on downstream tasks gradually. Experiments show that our method achieves state-of-the-art performance on the GLUE benchmark. Our method also enables BERT-base to achieve better performance than directly fine-tuning of BERT-large. Further, we provide the open-source RecAdam optimizer, which integrates the proposed mechanisms into Adam optimizer, to facilitate the NLP community.
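
A simplified sketch of the two mechanisms, assuming the pretraining loss is approximated by a quadratic pull toward the pretrained weights (standing in for Pretraining Simulation) and the mixing weight is annealed toward the downstream task over time (standing in for Objective Shifting). This is an assumption-laden illustration, not the released RecAdam optimizer.

```python
import torch

def recall_and_learn_loss(task_loss, model, pretrained_params, step,
                          k=1e-3, t0=1000.0):
    """Mix the downstream loss with a quadratic 'recall' term, shifting the
    balance toward the task as training progresses."""
    anneal = torch.sigmoid(torch.tensor(k * (step - t0)))      # goes from ~0 to ~1 over training
    recall = sum(((p - p0) ** 2).sum()
                 for p, p0 in zip(model.parameters(), pretrained_params))
    return anneal * task_loss + (1.0 - anneal) * recall
```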

16. On the Importance of Word and Sentence Representation Learning in Implicit Discourse Relation Classification [PDF] Back to contents
  Xin Liu, Jiefu Ou, Yangqiu Song, Xin Jiang
Abstract: Implicit discourse relation classification is one of the most difficult parts in shallow discourse parsing as the relation prediction without explicit connectives requires the language understanding at both the text span level and the sentence level. Previous studies mainly focus on the interactions between two arguments. We argue that a powerful contextualized representation module, a bilateral multi-perspective matching module, and a global information fusion module are all important to implicit discourse analysis. We propose a novel model to combine these modules together. Extensive experiments show that our proposed model outperforms BERT and other state-of-the-art systems on the PDTB dataset by around 8% and CoNLL 2016 datasets around 16%. We also analyze the effectiveness of different modules in the implicit discourse relation classification task and demonstrate how different levels of representation learning can affect the results.

17. Neural Machine Translation with Monte-Carlo Tree Search [PDF] Back to contents
  Jerrod Parker, Jerry Zikun Chen
Abstract: Recent algorithms in machine translation have included a value network to assist the policy network when deciding which word to output at each step of the translation. The addition of a value network helps the algorithm perform better on evaluation metrics like the BLEU score. After training the policy and value networks in a supervised setting, the policy and value networks can be jointly improved through common actor-critic methods. The main idea of our project is to instead leverage Monte-Carlo Tree Search (MCTS) to search for good output words with guidance from a combined policy and value network architecture in a similar fashion as AlphaZero. This network serves both as a local and a global look-ahead reference that uses the result of the search to improve itself. Experiments using the IWLST14 German to English translation dataset show that our method outperforms the actor-critic methods used in recent machine translation papers.

18. Assessing Discourse Relations in Language Generation from Pre-trained Language Models [PDF] Back to contents
  Wei-Jen Ko, Junyi Jessy Li
Abstract: Recent advances in NLP have been attributed to the emergence of large-scale pre-trained language models. GPT-2, in particular, is suited for generation tasks given its left-to-right language modeling objective, yet the linguistic quality of its generated text has largely remain unexplored. Our work takes a step in understanding GPT-2's outputs in terms of discourse coherence. We perform a comprehensive study on the validity of explicit discourse relations in GPT-2's outputs under both organic generation and fine-tuned scenarios. Results show GPT-2 does not always generate text containing valid discourse relations; nevertheless, its text is more aligned with human expectation in the fine-tuned scenario. We propose a decoupled strategy to mitigate these problems and highlight the importance of explicitly modeling discourse information.

19. PTPARL-D: Annotated Corpus of 44 years of Portuguese Parliament debates [PDF] Back to contents
  Paulo Almeida, Manuel Marques-Pita, Joana Gonçalves-Sá
Abstract: In a representative democracy, some decide in the name of the rest, and these elected officials are commonly gathered in public assemblies, such as parliaments, where they discuss policies, legislate, and vote on fundamental initiatives. A core aspect of such democratic processes are the plenary debates, where important public discussions take place. Many parliaments around the world are increasingly keeping the transcripts of such debates, and other parliamentary data, in digital formats accessible to the public, increasing transparency and accountability. Furthermore, some parliaments are bringing old paper transcripts to semi-structured digital formats. However, these records are often only provided as raw text or even as images, with little to no annotation, and inconsistent formats, making them difficult to analyze and study, reducing both transparency and public reach. Here, we present PTPARL-D, an annotated corpus of debates in the Portuguese Parliament, from 1976 to 2019, covering the entire period of Portuguese democracy.

20. Experiments with LVT and FRE for Transformer model [PDF] Back to contents
  Ilshat Gibadullin, Aidar Valeev
Abstract: In this paper, we experiment with Large Vocabulary Trick and Feature-rich encoding applied to the Transformer model for Text Summarization. We could not achieve better results than the analogous RNN-based sequence-to-sequence model, so we tried more models to find out what improves the results and what deteriorates them.

21. Semi-Supervised Neural System for Tagging, Parsing and Lematization [PDF] Back to contents
  Piotr Rybak, Alina Wróblewska
Abstract: This paper describes the ICS PAS system which took part in CoNLL 2018 shared task on Multilingual Parsing from Raw Text to Universal Dependencies. The system consists of jointly trained tagger, lemmatizer, and dependency parser which are based on features extracted by a biLSTM network. The system uses both fully connected and dilated convolutional neural architectures. The novelty of our approach is the use of an additional loss function, which reduces the number of cycles in the predicted dependency graphs, and the use of self-training to increase the system performance. The proposed system, i.e. ICS PAS (Warszawa), ranked 3rd/4th in the official evaluation, obtaining the following overall results: 73.02 (LAS), 60.25 (MLAS) and 64.44 (BLEX).

22. Single-/Multi-Source Cross-Lingual NER via Teacher-Student Learning on Unlabeled Data in Target Language [PDF] Back to contents
  Qianhui Wu, Zijia Lin, Börje F. Karlsson, Jian-Guang Lou, Biqing Huang
Abstract: To better tackle the named entity recognition (NER) problem on languages with little/no labeled data, cross-lingual NER must effectively leverage knowledge learned from source languages with rich labeled data. Previous works on cross-lingual NER are mostly based on label projection with pairwise texts or direct model transfer. However, such methods either are not applicable if the labeled data in the source languages is unavailable, or do not leverage information contained in unlabeled data in the target language. In this paper, we propose a teacher-student learning method to address such limitations, where NER models in the source languages are used as teachers to train a student model on unlabeled data in the target language. The proposed method works for both single-source and multi-source cross-lingual NER. For the latter, we further propose a similarity measuring method to better weight the supervision from different teacher models. Extensive experiments for 3 target languages on benchmark datasets well demonstrate that our method outperforms existing state-of-the-art methods for both single-source and multi-source cross-lingual NER.
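
A minimal sketch of the teacher-student step on unlabeled target-language text: the student is trained to match the teacher's soft label distribution. This is plain knowledge distillation; the paper's multi-source similarity weighting is omitted.

```python
import torch.nn.functional as F

def teacher_student_loss(student_logits, teacher_logits, temperature=1.0):
    """KL divergence between teacher and student token-label distributions."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```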

23. Towards Multimodal Response Generation with Exemplar Augmentation and Curriculum Optimization [PDF] Back to contents
  Zeyang Lei, Zekang Li, Jinchao Zhang, Fandong Meng, Yang Feng, Yujiu Yang, Cheng Niu, Jie Zhou
Abstract: Recently, variational auto-encoder (VAE) based approaches have made impressive progress on improving the diversity of generated responses. However, these methods usually suffer the cost of decreased relevance accompanied by diversity improvements. In this paper, we propose a novel multimodal response generation framework with exemplar augmentation and curriculum optimization to enhance relevance and diversity of generated responses. First, unlike existing VAE-based models that usually approximate a simple Gaussian posterior distribution, we present a Gaussian mixture posterior distribution (i.e, multimodal) to further boost response diversity, which helps capture complex semantics of responses. Then, to ensure that relevance does not decrease while diversity increases, we fully exploit similar examples (exemplars) retrieved from the training data into posterior distribution modeling to augment response relevance. Furthermore, to facilitate the convergence of Gaussian mixture prior and posterior distributions, we devise a curriculum optimization strategy to progressively train the model under multiple training criteria from easy to hard. Experimental results on widely used SwitchBoard and DailyDialog datasets demonstrate that our model achieves significant improvements compared to strong baselines in terms of diversity and relevance.

24. Masking as an Efficient Alternative to Finetuning for Pretrained Language Models [PDF] Back to contents
  Mengjie Zhao, Tao Lin, Martin Jaggi, Hinrich Schütze
Abstract: We present an efficient method of utilizing pretrained language models, where we learn selective binary masks for pretrained weights in lieu of modifying them through finetuning. Extensive evaluations of masking BERT and RoBERTa on a series of NLP tasks show that our masking scheme yields performance comparable to finetuning, yet has a much smaller memory footprint when several tasks need to be inferred simultaneously. Through intrinsic evaluations, we show that representations computed by masked language models encode information necessary for solving downstream tasks. Analyzing the loss landscape, we show that masking and finetuning produce models that reside in minima that can be connected by a line segment with nearly constant test accuracy. This confirms that masking can be utilized as an efficient alternative to finetuning.
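
A minimal sketch of the idea: the pretrained weight stays frozen and only a binary mask, relaxed with a straight-through estimator so it can be learned by gradient descent, selects which weights to keep. Names and initialization are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Module):
    """A linear layer whose frozen pretrained weight is gated by a learned binary mask."""
    def __init__(self, pretrained_linear):
        super().__init__()
        self.register_buffer("weight", pretrained_linear.weight.detach().clone())
        self.register_buffer("bias", pretrained_linear.bias.detach().clone())
        # real-valued scores, initialized slightly positive so the mask starts all-ones
        self.scores = nn.Parameter(torch.full_like(self.weight, 0.01))

    def forward(self, x):
        soft = torch.sigmoid(self.scores)
        hard = (soft > 0.5).float()
        mask = hard + soft - soft.detach()   # straight-through: hard forward, soft gradient
        return F.linear(x, self.weight * mask, self.bias)
```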

25. Heterogeneous Graph Neural Networks for Extractive Document Summarization [PDF] Back to contents
  Danqing Wang, Pengfei Liu, Yining Zheng, Xipeng Qiu, Xuanjing Huang
Abstract: As a crucial step in extractive document summarization, learning cross-sentence relations has been explored by a plethora of approaches. An intuitive way is to put them in the graph-based neural network, which has a more complex structure for capturing inter-sentence relationships. In this paper, we present a heterogeneous graph-based neural network for extractive summarization (HeterSumGraph), which contains semantic nodes of different granularity levels apart from sentences. These additional nodes act as the intermediary between sentences and enrich the cross-sentence relations. Besides, our graph structure is flexible in natural extension from a single-document setting to multi-document via introducing document nodes. To our knowledge, we are the first one to introduce different types of nodes into graph-based neural networks for extractive document summarization and perform a comprehensive qualitative analysis to investigate their benefits. The code will be released on Github

26. GLUECoS: An Evaluation Benchmark for Code-Switched NLP [PDF] Back to contents
  Simran Khanuja, Sandipan Dandapat, Anirudh Srinivasan, Sunayana Sitaram, Monojit Choudhury
Abstract: Code-switching is the use of more than one language in the same conversation or utterance. Recently, multilingual contextual embedding models, trained on multiple monolingual corpora, have shown promising results on cross-lingual and multilingual tasks. We present an evaluation benchmark, GLUECoS, for code-switched languages, that spans several NLP tasks in English-Hindi and English-Spanish. Specifically, our evaluation benchmark includes Language Identification from text, POS tagging, Named Entity Recognition, Sentiment Analysis, Question Answering and a new task for code-switching, Natural Language Inference. We present results on all these tasks using cross-lingual word embedding models and multilingual models. In addition, we fine-tune multilingual models on artificially generated code-switched data. Although multilingual models perform significantly better than cross-lingual models, our results show that in most tasks, across both language pairs, multilingual models fine-tuned on code-switched data perform best, showing that multilingual models can be further optimized for code-switching tasks.

27. Multi-Domain Dialogue Acts and Response Co-Generation [PDF] Back to contents
  Kai Wang, Junfeng Tian, Rui Wang, Xiaojun Quan, Jianxing Yu
Abstract: Generating fluent and informative responses is of critical importance for task-oriented dialogue systems. Existing pipeline approaches generally predict multiple dialogue acts first and use them to assist response generation. There are at least two shortcomings with such approaches. First, the inherent structures of multi-domain dialogue acts are neglected. Second, the semantic associations between acts and responses are not taken into account for response generation. To address these issues, we propose a neural co-generation model that generates dialogue acts and responses concurrently. Unlike those pipeline approaches, our act generation module preserves the semantic structures of multi-domain dialogue acts and our response generation module dynamically attends to different acts as needed. We train the two modules jointly using an uncertainty loss to adjust their task weights adaptively. Extensive experiments are conducted on the large-scale MultiWOZ dataset and the results show that our model achieves very favorable improvement over several state-of-the-art models in both automatic and human evaluations.
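
As a rough illustration of adaptively weighting two jointly trained modules, here is a common homoscedastic-uncertainty formulation (in the spirit of Kendall et al., 2018); the paper's exact uncertainty loss may differ.

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Weights each task loss by a learned uncertainty (log-variance) term."""
    def __init__(self, num_tasks=2):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        total = 0.0
        for loss, log_var in zip(task_losses, self.log_vars):
            total = total + torch.exp(-log_var) * loss + log_var
        return total
```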

28. Relational Graph Attention Network for Aspect-based Sentiment Analysis [PDF] Back to contents
  Kai Wang, Weizhou Shen, Yunyi Yang, Xiaojun Quan, Rui Wang
Abstract: Aspect-based sentiment analysis aims to determine the sentiment polarity towards a specific aspect in online reviews. Most recent efforts adopt attention-based neural network models to implicitly connect aspects with opinion words. However, due to the complexity of language and the existence of multiple aspects in a single sentence, these models often confuse the connections. In this paper, we address this problem by means of effective encoding of syntax information. Firstly, we define a unified aspect-oriented dependency tree structure rooted at a target aspect by reshaping and pruning an ordinary dependency parse tree. Then, we propose a relational graph attention network (R-GAT) to encode the new tree structure for sentiment prediction. Extensive experiments are conducted on the SemEval 2014 and Twitter datasets, and the experimental results confirm that the connections between aspects and opinion words can be better established with our approach, and the performance of the graph attention network (GAT) is significantly improved as a consequence.

29. Is Your Classifier Actually Biased? Measuring Fairness under Uncertainty with Bernstein Bounds [PDF] Back to contents
  Kawin Ethayarajh
Abstract: Most NLP datasets are not annotated with protected attributes such as gender, making it difficult to measure classification bias using standard measures of fairness (e.g., equal opportunity). However, manually annotating a large dataset with a protected attribute is slow and expensive. Instead of annotating all the examples, can we annotate a subset of them and use that sample to estimate the bias? While it is possible to do so, the smaller this annotated sample is, the less certain we are that the estimate is close to the true bias. In this work, we propose using Bernstein bounds to represent this uncertainty about the bias estimate as a confidence interval. We provide empirical evidence that a 95% confidence interval derived this way consistently bounds the true bias. In quantifying this uncertainty, our method, which we call Bernstein-bounded unfairness, helps prevent classifiers from being deemed biased or unbiased when there is insufficient evidence to make either claim. Our findings suggest that the datasets currently used to measure specific biases are too small to conclusively identify bias except in the most egregious cases. For example, consider a co-reference resolution system that is 5% more accurate on gender-stereotypical sentences -- to claim it is biased with 95% confidence, we need a bias-specific dataset that is 3.8 times larger than WinoBias, the largest available.
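
For concreteness, here is a generic Bernstein-style confidence half-width as a function of the annotated sample size; the paper's exact bound may differ, and the variable names are illustrative.

```python
import math

def bernstein_half_width(n, variance, value_range, delta=0.05):
    """Half-width t such that P(|estimate - truth| >= t) <= delta,
    via Bernstein's inequality for variables bounded by `value_range`."""
    log_term = math.log(2.0 / delta)
    b = (2.0 * value_range / 3.0) * log_term
    return (b + math.sqrt(b * b + 8.0 * n * variance * log_term)) / (2.0 * n)

# Example: 500 annotated examples, variance 0.25, values in [0, 1]
print(bernstein_half_width(n=500, variance=0.25, value_range=1.0))
```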

30. Endowing Empathetic Conversational Models with Personas [PDF] Back to contents
  Peixiang Zhong, Yan Zhu, Yong Liu, Chen Zhang, Hao Wang, Zaiqing Nie, Chunyan Miao
Abstract: Empathetic conversational models have been shown to improve user satisfaction and task outcomes in numerous domains. In Psychology, persona has been shown to be highly correlated to personality, which in turn influences empathy. In addition, our empirical analysis also suggests that persona plays an important role in empathetic conversations. To this end, we propose a new task towards persona-based empathetic conversations and present the first empirical study on the impacts of persona on empathetic responding. Specifically, we first present a novel large-scale multi-domain dataset for persona-based empathetic conversations. We then propose CoBERT, an efficient BERT-based response selection model that obtains the state-of-the-art performance on our dataset. Finally, we conduct extensive experiments to investigate the impacts of persona on empathetic responding. Notably, our results show that persona improves empathetic responding more when CoBERT is trained on empathetic conversations than non-empathetic ones, establishing an empirical link between persona and empathy in human conversations.

31. MATINF: A Jointly Labeled Large-Scale Dataset for Classification, Question Answering and Summarization [PDF] Back to contents
  Canwen Xu, Jiaxin Pei, Hongtao Wu, Yiyu Liu, Chenliang Li
Abstract: Recently, large-scale datasets have vastly facilitated the development in nearly all domains of Natural Language Processing. However, there is currently no cross-task dataset in NLP, which hinders the development of multi-task learning. We propose MATINF, the first jointly labeled large-scale dataset for classification, question answering and summarization. MATINF contains 1.07 million question-answer pairs with human-labeled categories and user-generated question descriptions. Based on such rich information, MATINF is applicable for three major NLP tasks, including classification, question answering, and summarization. We benchmark existing methods and a novel multi-task baseline over MATINF to inspire further research. Our comprehensive comparison and experiments over MATINF and other datasets demonstrate the merits held by MATINF.

32. Dual Learning for Semi-Supervised Natural Language Understanding [PDF] Back to contents
  Su Zhu, Ruisheng Cao, Kai Yu
Abstract: Natural language understanding (NLU) converts sentences into structured semantic forms. The paucity of annotated training samples is still a fundamental challenge of NLU. To solve this data sparsity problem, previous work based on semi-supervised learning mainly focuses on exploiting unlabeled sentences. In this work, we introduce a dual task of NLU, semantic-to-sentence generation (SSG), and propose a new framework for semi-supervised NLU with the corresponding dual model. The framework is composed of dual pseudo-labeling and dual learning method, which enables an NLU model to make full use of data (labeled and unlabeled) through a closed-loop of the primal and dual tasks. By incorporating the dual task, the framework can exploit pure semantic forms as well as unlabeled sentences, and further improve the NLU and SSG models iteratively in the closed-loop. The proposed approaches are evaluated on two public datasets (ATIS and SNIPS). Experiments in the semi-supervised setting show that our methods can outperform various baselines significantly, and extensive ablation studies are conducted to verify the effectiveness of our framework. Finally, our method can also achieve the state-of-the-art performance on the two datasets in the supervised setting.

33. Show, Describe and Conclude: On Exploiting the Structure Information of Chest X-Ray Reports [PDF] Back to contents
  Baoyu Jing, Zeya Wang, Eric Xing
Abstract: Chest X-Ray (CXR) images are commonly used for clinical screening and diagnosis. Automatically writing reports for these images can considerably lighten the workload of radiologists for summarizing descriptive findings and conclusive impressions. The complex structures between and within sections of the reports pose a great challenge to the automatic report generation. Specifically, the section Impression is a diagnostic summarization over the section Findings; and the appearance of normality dominates each section over that of abnormality. Existing studies rarely explore and consider this fundamental structure information. In this work, we propose a novel framework that exploits the structure information between and within report sections for generating CXR imaging reports. First, we propose a two-stage strategy that explicitly models the relationship between Findings and Impression. Second, we design a novel cooperative multi-agent system that implicitly captures the imbalanced distribution between abnormality and normality. Experiments on two CXR report datasets show that our method achieves state-of-the-art performance in terms of various evaluation metrics. Our results expose that the proposed approach is able to generate high-quality medical reports through integrating the structure information.

34. Causal Mediation Analysis for Interpreting Neural NLP: The Case of Gender Bias [PDF] Back to contents
  Jesse Vig, Sebastian Gehrmann, Yonatan Belinkov, Sharon Qian, Daniel Nevo, Yaron Singer, Stuart Shieber
Abstract: Common methods for interpreting neural models in natural language processing typically examine either their structure or their behavior, but not both. We propose a methodology grounded in the theory of causal mediation analysis for interpreting which parts of a model are causally implicated in its behavior. It enables us to analyze the mechanisms by which information flows from input to output through various model components, known as mediators. We apply this methodology to analyze gender bias in pre-trained Transformer language models. We study the role of individual neurons and attention heads in mediating gender bias across three datasets designed to gauge a model's sensitivity to gender bias. Our mediation analysis reveals that gender bias effects are (i) sparse, concentrated in a small part of the network; (ii) synergistic, amplified or repressed by different components; and (iii) decomposable into effects flowing directly from the input and indirectly through the mediators.

35. Hierarchical Multi Task Learning with Subword Contextual Embeddings for Languages with Rich Morphology [PDF] Back to contents
  Arda Akdemir, Tetsuo Shibuya, Tunga Güngör
Abstract: Morphological information is important for many sequence labeling tasks in Natural Language Processing (NLP). Yet, existing approaches rely heavily on manual annotations or external software to capture this information. In this study, we propose using subword contextual embeddings to capture the morphological information for languages with rich morphology. In addition, we incorporate these embeddings in a hierarchical multi-task setting which is not employed before, to the best of our knowledge. Evaluated on Dependency Parsing (DEP) and Named Entity Recognition (NER) tasks, which are shown to benefit greatly from morphological information, our final model outperforms previous state-of-the-art models on both tasks for the Turkish language. Besides, we show a net improvement of 18.86% and 4.61% F-1 over the previously proposed multi-task learner in the same setting for the DEP and the NER tasks, respectively. Empirical results for five different MTL settings show that incorporating subword contextual embeddings brings significant improvements for both tasks. In addition, we observed that multi-task learning consistently improves the performance of the DEP component.

36. MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification [PDF] Back to contents
  Jiaao Chen, Zichao Yang, Diyi Yang
Abstract: This paper presents MixText, a semi-supervised learning method for text classification, which uses our newly designed data augmentation method called TMix. TMix creates a large amount of augmented training samples by interpolating text in hidden space. Moreover, we leverage recent advances in data augmentation to guess low-entropy labels for unlabeled data, hence making them as easy to use as labeled data. By mixing labeled, unlabeled and augmented data, MixText significantly outperformed current pre-trained and fine-tuned models and other state-of-the-art semi-supervised learning methods on several text classification benchmarks. The improvement is especially prominent when supervision is extremely limited. We have publicly released our code at this https URL.
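
A minimal sketch of TMix-style mixup in hidden space: the hidden states of two examples at a chosen encoder layer are interpolated with a Beta-sampled weight, and their labels are mixed the same way. Names and the alpha default are illustrative, not the released MixText code.

```python
import numpy as np

def tmix(hidden_a, hidden_b, label_a, label_b, alpha=0.75):
    """Interpolate hidden states and labels of two training examples."""
    lam = np.random.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)                  # keep the mixed point closer to example a
    mixed_hidden = lam * hidden_a + (1.0 - lam) * hidden_b
    mixed_label = lam * label_a + (1.0 - lam) * label_b
    return mixed_hidden, mixed_label
```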

37. MCQA: Multimodal Co-attention Based Network for Question Answering [PDF] Back to contents
  Abhishek Kumar, Trisha Mittal, Dinesh Manocha
Abstract: We present MCQA, a learning-based algorithm for multimodal question answering. MCQA explicitly fuses and aligns the multimodal input (i.e. text, audio, and video), which forms the context for the query (question and answer). Our approach fuses and aligns the question and the answer within this context. Moreover, we use the notion of co-attention to perform cross-modal alignment and multimodal context-query alignment. Our context-query alignment module matches the relevant parts of the multimodal context and the query with each other and aligns them to improve the overall performance. We evaluate the performance of MCQA on Social-IQ, a benchmark dataset for multimodal question answering. We compare the performance of our algorithm with prior methods and observe an accuracy improvement of 4-7%.

38. Quantifying the Contextualization of Word Representations with Semantic Class Probing [PDF] 返回目录
  Mengjie Zhao, Philipp Dufter, Yadollah Yaghoobzadeh, Hinrich Schütze
Abstract: Pretrained language models have achieved a new state of the art on many NLP tasks, but there are still many open questions about how and why they work so well. We investigate the contextualization of words in BERT. We quantify the amount of contextualization, i.e., how well words are interpreted in context, by studying the extent to which semantic classes of a word can be inferred from its contextualized embeddings. Quantifying contextualization helps in understanding and utilizing pretrained language models. We show that top layer representations achieve high accuracy inferring semantic classes; that the strongest contextualization effects occur in the lower layers; that local context is mostly sufficient for semantic class inference; and that top layer representations are more task-specific after finetuning while lower layer representations are more transferable. Finetuning uncovers task related features, but pretrained knowledge is still largely preserved.
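The probing methodology can be sketched as follows (illustrative only; split handling and hyperparameters are assumptions): a linear classifier is trained on frozen contextualized embeddings to predict semantic classes, and its held-out accuracy quantifies how much class information a given layer encodes.

```python
import torch
import torch.nn as nn

def probe_semantic_classes(train_embs, train_classes, test_embs, test_classes,
                           epochs=50, lr=1e-2):
    """Train a linear probe on frozen contextualized embeddings and report
    held-out accuracy for semantic-class prediction."""
    num_classes = int(train_classes.max().item()) + 1
    probe = nn.Linear(train_embs.size(-1), num_classes)
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.cross_entropy(probe(train_embs), train_classes).backward()
        opt.step()
    preds = probe(test_embs).argmax(dim=-1)
    return (preds == test_classes).float().mean().item()
```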

39. QURATOR: Innovative Technologies for Content and Data Curation [PDF] 返回目录
  Georg Rehm, Peter Bourgonje, Stefanie Hegele, Florian Kintzel, Julián Moreno Schneider, Malte Ostendorff, Karolina Zaczynska, Armin Berger, Stefan Grill, Sören Räuchle, Jens Rauenbusch, Lisa Rutenburg, André Schmidt, Mikka Wild, Henry Hoffmann, Julian Fink, Sarah Schulz, Jurica Seva, Joachim Quantz, Joachim Böttger, Josefine Matthey, Rolf Fricke, Jan Thomsen, Adrian Paschke, Jamal Al Qundus, Thomas Hoppe, Naouel Karam, Frauke Weichhardt, Christian Fillies, Clemens Neudecker, Mike Gerber, Kai Labusch, Vahid Rezanezhad, Robin Schaefer, David Zellhöfer, Daniel Siewert, Patrick Bunk, Lydia Pintscher, Elena Aleynikova, Franziska Heine
Abstract: In all domains and sectors, the demand for intelligent systems to support the processing and generation of digital content is rapidly increasing. The availability of vast amounts of content and the pressure to publish new content quickly and in rapid succession require faster, more efficient and smarter processing and generation methods. With a consortium of ten partners from research and industry and a broad range of expertise in AI, Machine Learning and Language Technologies, the QURATOR project, funded by the German Federal Ministry of Education and Research, develops a sustainable and innovative technology platform that provides services to support knowledge workers in various industries in addressing the challenges they face when curating digital content. The project's vision and ambition is to establish an ecosystem for content curation technologies that significantly pushes the current state of the art and transforms its region, the Berlin-Brandenburg metropolitan area, into a global centre of excellence for curation technologies.

40. Towards Discourse Parsing-inspired Semantic Storytelling [PDF] 返回目录
  Georg Rehm, Karolina Zaczynska, Julián Moreno-Schneider, Malte Ostendorff, Peter Bourgonje, Maria Berger, Jens Rauenbusch, André Schmidt, Mikka Wild
Abstract: Previous work of ours on Semantic Storytelling uses text analytics procedures including Named Entity Recognition and Event Detection. In this paper, we outline our longer-term vision on Semantic Storytelling and describe the current conceptual and technical approach. In the project that drives our research we develop AI-based technologies that are verified by partners from industry. One long-term goal is the development of an approach for Semantic Storytelling that has broad coverage and that is, furthermore, robust. We provide first results on experiments that involve discourse parsing, applied to a concrete use case, "Explore the Neighbourhood!", which is based on a semi-automatically collected data set with documents about noteworthy people in one of Berlin's districts. Though automatically obtaining annotations for coherence relations from plain text is a non-trivial challenge, our preliminary results are promising. We envision our approach to be combined with additional features (NER, coreference resolution, knowledge graphs).

41. Learning to Update Natural Language Comments Based on Code Changes [PDF] 返回目录
  Sheena Panthaplackel, Pengyu Nie, Milos Gligoric, Junyi Jessy Li, Raymond J. Mooney
Abstract: We formulate the novel task of automatically updating an existing natural language comment based on changes in the body of code it accompanies. We propose an approach that learns to correlate changes across two distinct language representations, to generate a sequence of edits that are applied to the existing comment to reflect the source code modifications. We train and evaluate our model using a dataset that we collected from commit histories of open-source software projects, with each example consisting of a concurrent update to a method and its corresponding comment. We compare our approach against multiple baselines using both automatic metrics and human evaluation. Results reflect the challenge of this task and show that our model outperforms baselines with respect to making edits.

42. How Does NLP Benefit Legal System: A Summary of Legal Artificial Intelligence [PDF] 返回目录
  Haoxi Zhong, Chaojun Xiao, Cunchao Tu, Tianyang Zhang, Zhiyuan Liu, Maosong Sun
Abstract: Legal Artificial Intelligence (LegalAI) focuses on applying the technology of artificial intelligence, especially natural language processing, to benefit tasks in the legal domain. In recent years, LegalAI has drawn increasing attention rapidly from both AI researchers and legal professionals, as LegalAI is beneficial to the legal system for liberating legal professionals from a maze of paperwork. Legal professionals often think about how to solve tasks from rule-based and symbol-based methods, while NLP researchers concentrate more on data-driven and embedding methods. In this paper, we introduce the history, the current state, and the future directions of research in LegalAI. We illustrate the tasks from the perspectives of legal professionals and NLP researchers and show several representative applications in LegalAI. We conduct experiments and provide an in-depth analysis of the advantages and disadvantages of existing works to explore possible future directions. You can find the implementation of our work from this https URL.

43. A Rigourous Study on Named Entity Recognition: Can Fine-tuning Pretrained Model Lead to the Promised Land? [PDF] 返回目录
  Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun
Abstract: Fine-tuning pretrained models has achieved promising performance on standard NER benchmarks. Generally, these benchmarks are blessed with strong name regularity, high mention coverage and sufficient context diversity. Unfortunately, when scaling NER to open situations, these advantages may no longer exist, raising the critical question of whether pretrained supervised models can still work well when facing these issues. As there is no currently available dataset to investigate this problem, this paper proposes to conduct randomization tests on standard benchmarks. Specifically, we erase name regularity, mention coverage and context diversity respectively from the benchmarks, in order to explore their impact on the generalization ability of models. Moreover, we also construct a new open NER dataset that focuses on entity types with weak name regularity, such as book, song, and movie. From both the randomization tests and empirical experiments, we draw the conclusions that 1) name regularity is vital for generalization to unseen mentions; 2) high mention coverage may undermine the model's generalization ability; and 3) context patterns may not require enormous data to capture when using pretrained supervised models.
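One of the randomization tests can be sketched as a toy transformation (the exact replacement scheme used in the paper is not reproduced here): entity mentions are replaced by random strings so that surface name regularity can no longer help the model, while context and labels stay intact.

```python
import random
import string

def erase_name_regularity(tokens, entity_spans, seed=0):
    """Replace every token inside an entity span with a random lowercase string."""
    rng = random.Random(seed)
    tokens = list(tokens)
    for start, end in entity_spans:                     # spans are [start, end) indices
        for i in range(start, end):
            tokens[i] = "".join(rng.choices(string.ascii_lowercase, k=6))
    return tokens

print(erase_name_regularity(["Barack", "Obama", "visited", "Berlin"], [(0, 2), (3, 4)]))
```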

44. All Word Embeddings from One Embedding [PDF] 返回目录
  Sho Takase, Sosuke Kobayashi
Abstract: In neural network-based models for natural language processing (NLP), the largest part of the parameters often consists of word embeddings. Conventional models prepare a large embedding matrix whose size depends on the vocabulary size. Therefore, storing these models in memory and disk storage is costly. In this study, to reduce the total number of parameters, the embeddings for all words are represented by transforming a shared embedding. The proposed method, ALONE (all word embeddings from one), constructs the embedding of a word by modifying the shared embedding with a filter vector, which is word-specific but non-trainable. Then, we input the constructed embedding into a feed-forward neural network to increase its expressiveness. Naively, the filter vectors would occupy the same memory size as the conventional embedding matrix, which depends on the vocabulary size. To solve this issue, we also introduce a memory-efficient filter construction approach. We show that ALONE can be used as a sufficient word representation through an experiment on reconstructing pre-trained word embeddings. In addition, we also conduct experiments on NLP application tasks: machine translation and summarization. We combined ALONE with the current state-of-the-art encoder-decoder model, the Transformer, and achieved comparable scores on WMT 2014 English-to-German translation and DUC 2004 very short summarization with fewer parameters.
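The construction can be sketched as below (an illustrative reading of the abstract, not the authors' code; in particular, the dense per-word filter table here ignores the paper's memory-efficient filter construction):

```python
import torch
import torch.nn as nn

class AloneEmbedding(nn.Module):
    """One shared trainable embedding, modified per word by a fixed (non-trainable)
    filter vector and passed through a small feed-forward network."""
    def __init__(self, vocab_size, dim=300, hidden=512, seed=0):
        super().__init__()
        self.shared = nn.Parameter(torch.randn(dim))               # the single embedding
        g = torch.Generator().manual_seed(seed)
        filters = torch.randn(vocab_size, dim, generator=g)        # word-specific, frozen
        self.register_buffer("filters", filters)
        self.ffn = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, word_ids):                                   # (batch,) word indices
        return self.ffn(self.shared * self.filters[word_ids])

emb = AloneEmbedding(vocab_size=50000)
vectors = emb(torch.tensor([3, 17, 42]))                           # (3, 300)
```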

45. A Heterogeneous Graph with Factual, Temporal and Logical Knowledge for Question Answering Over Dynamic Contexts [PDF] 返回目录
  Wanjun Zhong, Duyu Tang, Nan Duan, Ming Zhou, Jiahai Wang, Jian Yin
Abstract: We study question answering over a dynamic textual environment. Although neural network models achieve impressive accuracy via learning from input-output examples, they rarely leverage various types of knowledge and are generally not interpretable. In this work, we propose a graph-based approach, where a heterogeneous graph is automatically built with factual knowledge of the context, temporal knowledge of the past states, and logical knowledge that combines human-curated knowledge bases and rule bases. We develop a graph neural network over the constructed graph, and train the model in an end-to-end manner. Experimental results on a benchmark dataset show that the injection of various types of knowledge improves a strong neural network baseline. An additional benefit of our approach is that the graph itself naturally serves as a rationale behind the decision making.

46. When do Word Embeddings Accurately Reflect Surveys on our Beliefs About People? [PDF] 返回目录
  Kenneth Joseph, Jonathan H. Morgan
Abstract: Social biases are encoded in word embeddings. This presents a unique opportunity to study society historically and at scale, and a unique danger when embeddings are used in downstream applications. Here, we investigate the extent to which publicly-available word embeddings accurately reflect beliefs about certain kinds of people as measured via traditional survey methods. We find that biases found in word embeddings do, on average, closely mirror survey data across seventeen dimensions of social meaning. However, we also find that biases in embeddings are much more reflective of survey data for some dimensions of meaning (e.g. gender) than others (e.g. race), and that we can be highly confident that embedding-based measures reflect survey data only for the most salient biases.

47. Contextualized Representations Using Textual Encyclopedic Knowledge [PDF] 返回目录
  Mandar Joshi, Kenton Lee, Yi Luan, Kristina Toutanova
Abstract: We present a method to represent input texts by contextualizing them jointly with dynamically retrieved textual encyclopedic background knowledge from multiple documents. We apply our method to reading comprehension tasks by encoding questions and passages together with background sentences about the entities they mention. We show that integrating background knowledge from text is effective for tasks focusing on factual reasoning and allows direct reuse of powerful pretrained BERT-style encoders. Moreover, knowledge integration can be further improved with suitable pretraining via a self-supervised masked language model objective over words in background-augmented input text. On TriviaQA, our approach obtains improvements of 1.6 to 3.1 F1 over comparable RoBERTa models which do not integrate background knowledge dynamically. On MRQA, a large collection of diverse QA datasets, we see consistent gains in-domain along with large improvements out-of-domain on BioASQ (2.1 to 4.2 F1), TextbookQA (1.6 to 2.0 F1), and DuoRC (1.1 to 2.0 F1).
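A plausible sketch of the input construction described here (the helper function and separator handling are assumptions, not the paper's exact preprocessing): the question and passage are encoded together with retrieved background sentences about the mentioned entities.

```python
def build_background_input(question, passage, background_sentences, tokenizer, max_len=512):
    """Concatenate retrieved encyclopedic background sentences onto the passage side
    of a question/passage pair before encoding (hypothetical helper; `tokenizer` is
    assumed to be a HuggingFace-style tokenizer)."""
    background = " ".join(background_sentences)
    return tokenizer(question, passage + " " + background,
                     truncation=True, max_length=max_len, return_tensors="pt")
```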

48. Syntactic Data Augmentation Increases Robustness to Inference Heuristics [PDF] 返回目录
  Junghyun Min, R. Thomas McCoy, Dipanjan Das, Emily Pitler, Tal Linzen
Abstract: Pretrained neural models such as BERT, when fine-tuned to perform natural language inference (NLI), often show high accuracy on standard datasets, but display a surprising lack of sensitivity to word order on controlled challenge sets. We hypothesize that this issue is not primarily caused by the pretrained model's limitations, but rather by the paucity of crowdsourced NLI examples that might convey the importance of syntactic structure at the fine-tuning stage. We explore several methods to augment standard training sets with syntactically informative examples, generated by applying syntactic transformations to sentences from the MNLI corpus. The best-performing augmentation method, subject/object inversion, improved BERT's accuracy on controlled examples that diagnose sensitivity to word order from 0.28 to 0.73, without affecting performance on the MNLI test set. This improvement generalized beyond the particular construction used for data augmentation, suggesting that augmentation causes BERT to recruit abstract syntactic representations.
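The subject/object inversion transformation can be illustrated with a toy example (string-level only; the actual augmentation applies syntactic transformations to parsed MNLI sentences, and the label assignment below is an assumption):

```python
def invert_subject_object(subject, verb, obj):
    """From a premise such as "the doctor saw the lawyer", build the inverted sentence
    "the lawyer saw the doctor" as a syntactically informative augmented example."""
    premise = f"{subject} {verb} {obj}"
    hypothesis = f"{obj} {verb} {subject}"
    return {"premise": premise, "hypothesis": hypothesis, "label": "non-entailment"}

print(invert_subject_object("the doctor", "saw", "the lawyer"))
```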

49. Collecting Entailment Data for Pretraining: New Protocols and Negative Results [PDF] 返回目录
  Samuel R. Bowman, Jennimaria Palomaki, Livio Baldini Soares, Emily Pitler
Abstract: Textual entailment (or NLI) data has proven useful as pretraining data for tasks requiring language understanding, even when building on an already-pretrained model like RoBERTa. The standard protocol for collecting NLI was not designed for the creation of pretraining data, and it is likely far from ideal for this purpose. With this application in mind, we propose four alternative protocols, each aimed at improving either the ease with which annotators can produce sound training examples or the quality and diversity of those examples. Using these alternatives and a simple MNLI-based baseline, we collect and compare five new 8.5k-example training sets. Our primary results are solidly negative, with our baseline MNLI-style dataset yielding good transfer performance, but none of our four new methods (nor the recent ANLI) showing any improvements on that baseline. However, we do observe that all four of these interventions, especially the use of seed sentences for inspiration, reduce previously observed issues with annotation artifacts.

50. The Inception Team at NSURL-2019 Task 8: Semantic Question Similarity in Arabic [PDF] 返回目录
  Hana Al-Theiabat, Aisha Al-Sadi
Abstract: This paper describes our method for the Semantic Question Similarity in Arabic task of the workshop on NLP Solutions for Under-Resourced Languages (NSURL). The aim is to build a model that is able to detect similar semantic questions in the Arabic language for the provided dataset. Different methods of determining question similarity are explored in this work. The proposed models achieved high F1-scores, ranging from 88% to 96%. Our official best result, 95.924% F1-score, is produced by an ensemble of pre-trained multilingual BERT models fine-tuned with different random seeds, which ranks first among the nine participating teams.
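The ensembling step can be sketched very simply (an assumption about how the per-seed models are combined): average the predicted probabilities of the models fine-tuned with different random seeds and take the argmax.

```python
import numpy as np

def seed_ensemble(prob_matrices):
    """prob_matrices: list of (num_examples, num_classes) arrays, one per random seed.
    Returns the ensembled class predictions."""
    return np.mean(prob_matrices, axis=0).argmax(axis=-1)

preds = seed_ensemble([np.random.dirichlet([1, 1], size=4) for _ in range(5)])
```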

51. Practical Comparable Data Collection for Low-Resource Languages via Images [PDF] 返回目录
  Aman Madaan, Shruti Rijhwani, Antonios Anastasopoulos, Yiming Yang, Graham Neubig
Abstract: We propose a method of curating high-quality comparable training data for low-resource languages without requiring that the annotators be bilingual. Our method involves using a carefully selected set of images as a pivot between the source and target languages, by obtaining captions for such images in both languages independently. Human evaluations on the English-Hindi comparable corpora created with our method show that 81.1\% of the pairs are acceptable translations, and only 2.47\% of the pairs are not translations at all. We further establish the potential of the dataset collected through our approach by experimenting on two downstream tasks -- machine translation and dictionary extraction. All code and data are made available at \url{this https URL}.

52. ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT [PDF] 返回目录
  Omar Khattab, Matei Zaharia
Abstract: Recent progress in Natural Language Understanding (NLU) is driving fast-paced advances in Information Retrieval (IR), largely owed to fine-tuning deep language models (LMs) for document ranking. While remarkably effective, the ranking models based on these LMs increase computational cost by orders of magnitude over prior approaches, particularly as they must feed each query-document pair through a massive neural network to compute a single relevance score. To tackle this, we present ColBERT, a novel ranking model that adapts deep LMs (in particular, BERT) for efficient retrieval. ColBERT introduces a late interaction architecture that independently encodes the query and the document using BERT and then employs a cheap yet powerful interaction step that models their fine-grained similarity. By delaying and yet retaining this fine-granular interaction, ColBERT can leverage the expressiveness of deep LMs while simultaneously gaining the ability to pre-compute document representations offline, considerably speeding up query processing. Beyond reducing the cost of re-ranking the documents retrieved by a traditional model, ColBERT's pruning-friendly interaction mechanism enables leveraging vector-similarity indexes for end-to-end retrieval directly from a large document collection. We extensively evaluate ColBERT using two recent passage search datasets. Results show that ColBERT's effectiveness is competitive with existing BERT-based models (and outperforms every non-BERT baseline), while executing two orders-of-magnitude faster and requiring four orders-of-magnitude fewer FLOPs per query.
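The late interaction scoring described above reduces to a "MaxSim" operation, sketched here (dimension sizes are illustrative and the embeddings are assumed already L2-normalized by the encoder):

```python
import torch
import torch.nn.functional as F

def late_interaction_score(query_embs, doc_embs):
    """query_embs: (num_query_tokens, dim), doc_embs: (num_doc_tokens, dim).
    Each query token is matched to its most similar document token; the maxima are summed."""
    sim = query_embs @ doc_embs.T                       # all pairwise similarities
    return sim.max(dim=1).values.sum()

q = F.normalize(torch.randn(8, 128), dim=-1)
d = F.normalize(torch.randn(300, 128), dim=-1)
print(late_interaction_score(q, d))
```

Because the document-side embeddings do not depend on the query, they can be computed offline and indexed, which is where the speedup over full cross-attention rankers comes from.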

53. "Unsex me here": Revisiting Sexism Detection Using Psychological Scales and Adversarial Samples [PDF] 返回目录
  Mattia Samory, Indira Sen, Julian Kohne, Fabian Floeck, Claudia Wagner
Abstract: To effectively tackle sexism online, research has focused on automated methods for detecting sexism. In this paper, we use items from psychological scales and adversarial sample generation to 1) provide a codebook for different types of sexism in theory-driven scales and in social media text; 2) test the performance of different sexism detection methods across multiple data sets; 3) provide an overview of strategies employed by humans to remove sexism through minimal changes. Results highlight that current methods seem inadequate in detecting all but the most blatant forms of sexism and do not generalize well to out-of-domain examples. By providing a scale-based codebook for sexism and insights into what makes a statement sexist, we hope to contribute to the development of better and broader models for sexism detection, including reflections on theory-driven approaches to data collection.

54. A Batch Normalized Inference Network Keeps the KL Vanishing Away [PDF] 返回目录
  Qile Zhu, Wei Bi, Xiaojiang Liu, Xiyao Ma, Xiaolin Li, Dapeng Wu
Abstract: The Variational Autoencoder (VAE) is widely used as a generative model to approximate a model's posterior on latent variables by combining amortized variational inference and deep neural networks. However, when paired with strong autoregressive decoders, VAE often converges to a degenerated local optimum known as "posterior collapse". Previous approaches consider the Kullback-Leibler divergence (KL) individually for each datapoint. We propose to let the KL follow a distribution across the whole dataset, and show that keeping the expectation of the KL's distribution positive is sufficient to prevent posterior collapse. Then we propose Batch Normalized-VAE (BN-VAE), a simple but effective approach that sets a lower bound on this expectation by regularizing the distribution of the approximate posterior's parameters. Without introducing any new model component or modifying the objective, our approach can avoid posterior collapse effectively and efficiently. We further show that the proposed BN-VAE can be extended to the conditional VAE (CVAE). Empirically, our approach surpasses strong autoregressive baselines on language modeling, text classification and dialogue generation, and rivals more complex approaches while keeping almost the same training time as VAE.
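The core trick can be sketched as follows (a reading of the abstract; the fixed scale value and the exact normalization details are assumptions): batch-normalizing the approximate posterior means to a fixed scale keeps the expected KL term bounded away from zero.

```python
import torch
import torch.nn as nn

class BatchNormedGaussianPosterior(nn.Module):
    """Batch-normalize the posterior means to a fixed scale gamma before computing
    the KL divergence to the standard-normal prior."""
    def __init__(self, latent_dim, gamma=0.5):
        super().__init__()
        self.bn = nn.BatchNorm1d(latent_dim, affine=False)
        self.gamma = gamma

    def forward(self, mu, logvar):                       # both (batch, latent_dim)
        mu = self.gamma * self.bn(mu)                    # fixed-scale posterior means
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0).sum(dim=1)
        return mu, logvar, kl.mean()
```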

55. Detecting fake news for the new coronavirus by reasoning on the Covid-19 ontology [PDF] 返回目录
  Adrian Groza
Abstract: In the context of the Covid-19 pandemic, many were quick to spread deceptive information. I investigate here how reasoning in Description Logics (DLs) can detect inconsistencies between trusted medical sources and untrusted ones. The untrusted information comes in natural language (e.g. "Covid-19 affects only the elderly"). To automatically convert it into DLs, I used the FRED converter. Reasoning in Description Logics is then performed with the Racer tool.

56. Methods for Computing Legal Document Similarity: A Comparative Study [PDF] 返回目录
  Paheli Bhattacharya, Kripabandhu Ghosh, Arindam Pal, Saptarshi Ghosh
Abstract: Computing similarity between two legal documents is an important and challenging task in the domain of Legal Information Retrieval. Finding similar legal documents has many applications in downstream tasks, including prior-case retrieval, recommendation of legal articles, and so on. Prior works have proposed two broad ways of measuring similarity between legal documents: analyzing the precedent citation network, and measuring similarity based on textual content. But there has not been a comprehensive comparison of these existing methods on a common platform. In this paper, we perform the first systematic analysis of the existing methods. In addition, we explore two promising new similarity computation methods, one text-based and the other based on network embeddings, which have not been considered till now.

57. Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Document Matching [PDF] 返回目录
  Liu Yang, Mingyang Zhang, Cheng Li, Michael Bendersky, Marc Najork
Abstract: Many information retrieval and natural language processing problems can be formalized as a semantic matching task. However, the existing work in this area has been focused in large part on the matching between short texts like finding answer spans, sentences and passages given a query or a natural language question. Semantic matching between long-form texts like documents, which can be applied to applications such as document clustering, news recommendation and related article recommendation, is relatively less explored and needs more research effort. In recent years, self-attention based models like Transformers and BERT have achieved state-of-the-art performance in several natural language understanding tasks. These kinds of models, however, are still restricted to short text sequences like sentences due to the quadratic computational time and space complexity of self-attention with respect to the input sequence length. In this paper, we address these issues by proposing the Siamese Multi-depth Transformer-based Hierarchical (SMITH) Encoder for document representation learning and matching, which contains several novel design choices to adapt self-attention models for long text inputs. For model pre-training, we propose the masked sentence block language modeling task in addition to the original masked word language modeling task used in BERT, to capture sentence block relations within a document. The experimental results on several benchmark data sets for long-form document matching show that our proposed SMITH model outperforms the previous state-of-the-art Siamese matching models including hierarchical attention, multi-depth attention-based hierarchical recurrent neural network, and BERT for long-form document matching, and increases the maximum input text length from 512 to 2048 when compared with BERT-based baseline methods.
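A simplified two-level encoder in the spirit of this architecture is sketched below (dimensions, pooling, and layer counts are assumptions; the masked sentence-block pretraining described in the abstract is omitted): the document is split into fixed-size token blocks, each block is encoded independently, and a second transformer runs over the block vectors.

```python
import torch
import torch.nn as nn

class TwoLevelDocEncoder(nn.Module):
    """Encode token blocks with a token-level transformer, then run a block-level
    transformer over the pooled block vectors to get a document embedding."""
    def __init__(self, dim=256, block_len=32, heads=4):
        super().__init__()
        self.block_len = block_len
        token_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.token_encoder = nn.TransformerEncoder(token_layer, num_layers=2)
        block_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.block_encoder = nn.TransformerEncoder(block_layer, num_layers=2)

    def forward(self, token_embs):        # (batch, doc_len, dim), doc_len % block_len == 0
        b, n, d = token_embs.shape
        blocks = token_embs.view(b * (n // self.block_len), self.block_len, d)
        block_vecs = self.token_encoder(blocks).mean(dim=1)      # pool tokens per block
        block_vecs = block_vecs.view(b, n // self.block_len, d)
        return self.block_encoder(block_vecs).mean(dim=1)        # pooled document vector

doc = torch.randn(2, 1024, 256)
print(TwoLevelDocEncoder()(doc).shape)    # torch.Size([2, 256])
```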

58. Jointly Trained Transformers models for Spoken Language Translation [PDF] 返回目录
  Hari Krishna Vydana, Martin Karafiát, Katerina Zmolikova, Lukáš Burget, Honza Cernocky
Abstract: Conventional spoken language translation (SLT) systems are pipeline-based systems, in which an Automatic Speech Recognition (ASR) system converts the source modality from speech to text and a Machine Translation (MT) system translates the source text into the target language. Recent progress in sequence-to-sequence architectures has reduced the performance gap between pipeline-based SLT systems (cascaded ASR-MT) and End-to-End approaches. Though End-to-End and cascaded ASR-MT systems are reaching comparable levels of performance, a large performance gap remains between feeding MT models the ASR hypothesis versus the oracle text. This gap indicates that MT systems are prone to large performance degradation due to noisy ASR hypotheses, as opposed to oracle text transcripts. In this work, this degradation is reduced by creating an end-to-end differentiable pipeline between the ASR and MT systems. We train SLT systems with the ASR objective as an auxiliary loss, and the two networks are connected through neural hidden representations. This training provides an end-to-end differentiable path with respect to the final objective function and also utilizes the ASR objective for better performance of the SLT system. This architecture improves BLEU from 36.8 to 44.5. Due to the multi-task training, the model also generates ASR hypotheses, which are used by a pre-trained MT model; combining the proposed system with this MT model increases the BLEU score by a further 1 point. All experiments are reported on the English-Portuguese speech translation task using the How2 corpus. The final BLEU score is on par with the best speech translation system on the How2 dataset, with no additional training data or language model and far fewer parameters.
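The joint objective can be summarized with a hypothetical training step (the model interfaces and the auxiliary-loss weight are assumptions, not the paper's code): the MT model consumes the ASR model's hidden representations, so the cascade stays differentiable end to end, and the ASR loss is added as an auxiliary term.

```python
def joint_slt_step(asr_model, mt_model, optimizer, speech, transcript, translation,
                   aux_weight=0.3):
    """One joint training step: translation loss from ASR hidden states plus a weighted
    ASR auxiliary loss, backpropagated through both networks."""
    optimizer.zero_grad()
    asr_hidden, asr_loss = asr_model(speech, transcript)     # hidden states + ASR loss
    mt_loss = mt_model(asr_hidden, translation)              # translate from hidden states
    loss = mt_loss + aux_weight * asr_loss
    loss.backward()
    optimizer.step()
    return loss.item()
```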

59. Deep Multimodal Neural Architecture Search [PDF] 返回目录
  Zhou Yu, Yuhao Cui, Jun Yu, Meng Wang, Dacheng Tao, Qi Tian
Abstract: Designing effective neural networks is fundamentally important in deep multimodal learning. Most existing works focus on a single task and design neural architectures manually, which are highly task-specific and hard to generalize to different tasks. In this paper, we devise a generalized deep multimodal neural architecture search (MMnas) framework for various multimodal learning tasks. Given multimodal input, we first define a set of primitive operations, and then construct a deep encoder-decoder based unified backbone, where each encoder or decoder block corresponds to an operation searched from a predefined operation pool. On top of the unified backbone, we attach task-specific heads to tackle different multimodal learning tasks. By using a gradient-based NAS algorithm, the optimal architectures for different tasks are learned efficiently. Extensive ablation studies, comprehensive analysis, and superior experimental results show that MMnasNet significantly outperforms existing state-of-the-art approaches across three multimodal learning tasks (over five datasets), including visual question answering, image-text matching, and visual grounding. Code will be made available.

60. Self-supervised Learning of Visual Speech Features with Audiovisual Speech Enhancement [PDF] 返回目录
  Zakaria Aldeneh, Anushree Prasanna Kumar, Barry-John Theobald, Erik Marchi, Sachin Kajarekar, Devang Naik, Ahmed Hussen Abdelaziz
Abstract: We present an introspection of an audiovisual speech enhancement model. In particular, we focus on interpreting how a neural audiovisual speech enhancement model uses visual cues to improve the quality of the target speech signal. We show that visual features provide not only high-level information about speech activity, i.e. speech vs. no speech, but also fine-grained visual information about the place of articulation. An interesting byproduct of this finding is that the learned visual embeddings can be used as features for other visual speech applications. We demonstrate the effectiveness of the learned visual representations for classifying visemes (the visual analogy to phonemes). Our results provide insight into important aspects of audiovisual speech enhancement and demonstrate how such models can be used for self-supervision tasks for visual speech applications.

61. Question Answering over Curated and Open Web Sources [PDF] 返回目录
  Rishiraj Saha Roy, Avishek Anand
Abstract: The last few years have seen an explosion of research on the topic of automated question answering (QA), spanning the communities of information retrieval, natural language processing, and artificial intelligence. This tutorial covers the highlights of this very active period of growth for QA to give the audience a grasp of the families of algorithms that are currently being used. We partition research contributions by the underlying source from which answers are retrieved: curated knowledge graphs, unstructured text, or hybrid corpora. We choose this dimension of partitioning as it is the most discriminative when it comes to algorithm design. Other key dimensions are covered within each sub-topic: the complexity of the questions addressed, and the degrees of explainability and interactivity introduced in the systems. We conclude the tutorial with the most promising emerging trends in QA, which should help new entrants to this field make the best decisions to take the community forward. Much has changed in the community since the last tutorial on QA at SIGIR 2016, and we believe that this timely overview will indeed benefit a large number of conference participants.
