Table of Contents
7. CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT [PDF] Abstract
11. Adaptation of a Lexical Organization for Social Engineering Detection and Response Generation [PDF] Abstract
13. Gated Convolutional Bidirectional Attention-based Model for Off-topic Spoken Response Detection [PDF] Abstract
14. Incorporating External Knowledge through Pre-training for Natural Language to Code Generation [PDF] Abstract
17. Dynamic Knowledge Graph-based Dialogue Generation with Improved Adversarial Meta-Learning [PDF] Abstract
25. SimAlign: High Quality Word Alignments without Parallel Training Data using Static and Contextualized Embeddings [PDF] Abstract
27. A Hybrid Approach for Aspect-Based Sentiment Analysis Using Deep Contextual Word Embeddings and Hierarchical Attention [PDF] Abstract
31. ClovaCall: Korean Goal-Oriented Dialog Speech Corpus for Automatic Speech Recognition of Contact Centers [PDF] Abstract
Abstracts
1. StereoSet: Measuring stereotypical bias in pretrained language models [PDF] Back to Contents
Moin Nadeem, Anna Bethke, Siva Reddy
Abstract: A stereotype is an over-generalized belief about a particular group of people, e.g., Asians are good at math or Asians are bad drivers. Such beliefs (biases) are known to hurt target groups. Since pretrained language models are trained on large real world data, they are known to capture stereotypical biases. In order to assess the adverse effects of these models, it is important to quantify the bias captured in them. Existing literature on quantifying bias evaluates pretrained language models on a small set of artificially constructed bias-assessing sentences. We present StereoSet, a large-scale natural dataset in English to measure stereotypical biases in four domains: gender, profession, race, and religion. We evaluate popular models like BERT, GPT-2, RoBERTa, and XLNet on our dataset and show that these models exhibit strong stereotypical biases. We also present a leaderboard with a hidden test set to track the bias of future language models at this https URL
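To make the measurement idea concrete, here is a minimal sketch (our illustration, not the paper's Context Association Test metric) that compares a masked LM's pseudo-log-likelihood for a stereotypical sentence against an anti-stereotypical counterpart; the checkpoint name and sentence pair are assumptions.

```python
# Pseudo-log-likelihood scoring: mask each token in turn and sum its log-prob.
# Illustrative only; StereoSet's actual metric and data differ.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed model
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def pseudo_log_likelihood(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

stereo = pseudo_log_likelihood("Asians are good at math.")
anti = pseudo_log_likelihood("Asians are bad at math.")
print("prefers stereotype" if stereo > anti else "prefers anti-stereotype")
```

A model that systematically scores the stereotypical member of such pairs higher exhibits the kind of bias the dataset is designed to quantify.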
2. PHINC: A Parallel Hinglish Social Media Code-Mixed Corpus for Machine Translation [PDF] Back to Contents
Vivek Srivastava, Mayank Singh
Abstract: Code-mixing is the phenomenon of using more than one language in a sentence. It is a very frequently observed pattern of communication on social media platforms. Flexibility to use multiple languages in one text message might help to communicate efficiently with the target audience. But it adds to the challenge of processing and understanding natural language to a much larger extent. This paper presents a parallel corpus of 13,738 code-mixed English-Hindi sentences and their corresponding translations in English. The translations of the sentences were done manually by annotators. We are releasing the parallel corpus to facilitate future research opportunities in code-mixed machine translation. The annotated corpus is available at this https URL.
3. MPNet: Masked and Permuted Pre-training for Language Understanding [PDF] Back to Contents
Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu
Abstract: BERT adopts masked language modeling (MLM) for pre-training and is one of the most successful pre-training models. Since BERT neglects dependency among predicted tokens, XLNet introduces permuted language modeling (PLM) for pre-training to address this problem. We argue that XLNet does not leverage the full position information of a sentence and thus suffers from position discrepancy between pre-training and fine-tuning. In this paper, we propose MPNet, a novel pre-training method that inherits the advantages of BERT and XLNet and avoids their limitations. MPNet leverages the dependency among predicted tokens through permuted language modeling (vs. MLM in BERT), and takes auxiliary position information as input to make the model see a full sentence, thus reducing the position discrepancy (vs. PLM in XLNet). We pre-train MPNet on a large-scale dataset (over 160GB of text corpora) and fine-tune on a variety of downstream tasks (GLUE, SQuAD, etc.). Experimental results show that MPNet outperforms MLM and PLM by a large margin, and achieves better results on these tasks compared with previous state-of-the-art pre-trained methods (e.g., BERT, XLNet, RoBERTa) under the same model setting. We release the code and pre-trained model in GitHub\footnote{\url{this https URL}}.
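The contrast between the objectives can be seen in a toy permutation (a conceptual sketch under our own simplifications, not the released MPNet code): for each predicted token, PLM conditions only on the content and positions of the permuted prefix, whereas MPNet additionally exposes the positions of the whole sentence.

```python
# Toy illustration of the conditioning difference between PLM and MPNet.
import random

tokens = ["the", "task", "is", "sentence", "classification"]
perm = list(range(len(tokens)))
random.seed(0)
random.shuffle(perm)
n_pred = 2  # predict the last n_pred tokens of the permutation
non_pred, pred = perm[:-n_pred], perm[-n_pred:]

for step, i in enumerate(pred):
    # PLM (XLNet-style): content + positions of the permuted prefix only.
    plm_ctx = [(tokens[j], j) for j in non_pred + pred[:step]]
    print(f"predict {tokens[i]!r} at position {i}")
    print(f"  PLM context: {plm_ctx}")
    # MPNet: same content context, but the positions of ALL tokens are also
    # visible, so the model 'sees a full sentence' of positions.
    print(f"  MPNet also sees positions {sorted(perm)}")
```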
4. Learning Geometric Word Meta-Embeddings [PDF] Back to Contents
Pratik Jawanpuria, N T V Satya Dev, Anoop Kunchukuttan, Bamdev Mishra
Abstract: We propose a geometric framework for learning meta-embeddings of words from different embedding sources. Our framework transforms the embeddings into a common latent space, where, for example, simple averaging of different embeddings (of a given word) is more amenable. The proposed latent space arises from two particular geometric transformations - the orthogonal rotations and the Mahalanobis metric scaling. Empirical results on several word similarity and word analogy benchmarks illustrate the efficacy of the proposed framework.
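As a concrete instance of the rotation component, here is a minimal sketch, assuming random matrices stand in for two real embedding sources: an orthogonal Procrustes rotation aligns source A to source B before simple averaging (the Mahalanobis metric scaling the paper also learns is omitted).

```python
# Align two embedding sources with an orthogonal rotation, then average.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 300))  # source A vectors for a shared vocabulary
Y = rng.normal(size=(1000, 300))  # source B vectors, same row order

# Orthogonal Procrustes: the rotation Q minimizing ||XQ - Y||_F is U @ Vt,
# where X.T @ Y = U @ diag(s) @ Vt.
U, _, Vt = np.linalg.svd(X.T @ Y)
Q = U @ Vt

meta = (X @ Q + Y) / 2.0  # simple averaging in the common latent space
print(meta.shape)         # (1000, 300) meta-embeddings
```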
5. A Study of Cross-Lingual Ability and Language-specific Information in Multilingual BERT [PDF] Back to Contents
Chi-Liang Liu, Tsung-Yuan Hsu, Yung-Sung Chuang, Hung-Yi Lee
Abstract: Recently, multilingual BERT has worked remarkably well on cross-lingual transfer tasks, superior to static non-contextualized word embeddings. In this work, we provide an in-depth experimental study to supplement the existing literature on cross-lingual ability. We compare the cross-lingual ability of non-contextualized and contextualized representation models trained on the same data. We find that data size and context window size are crucial factors for transferability. We also observe language-specific information in multilingual BERT. By manipulating the latent representations, we can control the output language of multilingual BERT and achieve unsupervised token translation. We further show that, based on this observation, there is a computationally cheap but effective approach to improving the cross-lingual ability of multilingual BERT.
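The representation-manipulation idea can be sketched as follows; note this is our simplified reading (input embeddings and tiny stand-in corpora, whereas the paper works with hidden states): shift a token's vector by the difference of language-mean vectors and decode by nearest neighbour.

```python
# Unsupervised token 'translation' by a language-mean shift (illustrative).
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased").eval()
emb = model.get_input_embeddings().weight.detach()  # (vocab_size, dim)

def lang_mean(sentences):
    ids = [i for s in sentences for i in tok(s)["input_ids"][1:-1]]
    return emb[torch.tensor(ids)].mean(dim=0)

mu_en = lang_mean(["the cat sleeps", "a dog runs"])         # stand-in EN sample
mu_de = lang_mean(["die Katze schläft", "ein Hund läuft"])  # stand-in DE sample

src = emb[tok("cat")["input_ids"][1]]   # vector of the token "cat"
shifted = src - mu_en + mu_de           # move it towards the German region
best = torch.cosine_similarity(emb, shifted.unsqueeze(0)).argmax().item()
print(tok.convert_ids_to_tokens([best]))
```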
6. On the Encoder-Decoder Incompatibility in Variational Text Modeling and Beyond [PDF] Back to Contents
Chen Wu, Prince Zizhuang Wang, William Yang Wang
Abstract: Variational autoencoders (VAEs) combine latent variables with amortized variational inference, whose optimization usually converges into a trivial local optimum termed posterior collapse, especially in text modeling. By tracking the optimization dynamics, we observe the encoder-decoder incompatibility that leads to poor parameterizations of the data manifold. We argue that the trivial local optimum may be avoided by improving the encoder and decoder parameterizations since the posterior network is part of a transition map between them. To this end, we propose Coupled-VAE, which couples a VAE model with a deterministic autoencoder with the same structure and improves the encoder and decoder parameterizations via encoder weight sharing and decoder signal matching. We apply the proposed Coupled-VAE approach to various VAE models with different regularization, posterior family, decoder structure, and optimization strategy. Experiments on benchmark datasets (i.e., PTB, Yelp, and Yahoo) show consistently improved results in terms of probability estimation and richness of the latent space. We also generalize our method to conditional language modeling and propose Coupled-CVAE, which largely improves the diversity of dialogue generation on the Switchboard dataset.
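A toy rendering of the coupling (a heavily simplified sketch: linear layers on vectors instead of text encoders and decoders, and equal loss weights as our own assumption) shows the three ingredients: the ELBO, a deterministic twin that shares the encoder trunk, and a matching term between the two latent signals.

```python
# Coupled-VAE-style loss on a toy continuous autoencoder (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Linear(20, 8)                       # encoder trunk shared by both paths
mu_head, logvar_head = nn.Linear(8, 4), nn.Linear(8, 4)
det_head = nn.Linear(8, 4)                   # deterministic twin's latent head
dec = nn.Linear(4, 20)                       # decoder used by both paths

x = torch.randn(32, 20)
h = torch.tanh(enc(x))
mu, logvar = mu_head(h), logvar_head(h)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
z_det = det_head(h)

recon = F.mse_loss(dec(z), x)
kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
det_recon = F.mse_loss(dec(z_det), x)        # deterministic twin's objective
match = F.mse_loss(z, z_det.detach())        # signal matching between paths
(recon + kl + det_recon + match).backward()
```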
7. CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT [PDF] Back to Contents
Akshay Smit, Saahil Jain, Pranav Rajpurkar, Anuj Pareek, Andrew Y. Ng, Matthew P. Lungren
Abstract: The extraction of labels from radiology text reports enables large-scale training of medical imaging models. Existing approaches to report labeling typically rely either on sophisticated feature engineering based on medical domain knowledge or manual annotations by experts. In this work, we investigate BERT-based approaches to medical image report labeling that exploit both the scale of available rule-based systems and the quality of expert annotations. We demonstrate superior performance of a BERT model first trained on annotations of a rule-based labeler and then finetuned on a small set of expert annotations augmented with automated backtranslation. We find that our final model, CheXbert, is able to outperform the previous best rules-based labeler with statistical significance, setting a new SOTA for report labeling on one of the largest datasets of chest x-rays.
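The backtranslation augmentation step might look like the sketch below; the pivot language and MarianMT checkpoints are our assumptions, and the paper's exact setup may differ.

```python
# Augment a small expert-labeled report set with English->French->English
# round-trip translations (illustrative).
from transformers import MarianMTModel, MarianTokenizer

def translate(texts, name):
    tok = MarianTokenizer.from_pretrained(name)
    model = MarianMTModel.from_pretrained(name)
    out = model.generate(**tok(texts, return_tensors="pt", padding=True))
    return tok.batch_decode(out, skip_special_tokens=True)

reports = ["No focal consolidation. Mild cardiomegaly."]
augmented = translate(translate(reports, "Helsinki-NLP/opus-mt-en-fr"),
                      "Helsinki-NLP/opus-mt-fr-en")
print(augmented)  # paraphrase that keeps the expert label
```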
8. Variational Inference for Learning Representations of Natural Language Edits [PDF] Back to Contents
Edison Marrese-Taylor, Machel Reid, Yutaka Matsuo
Abstract: Document editing has become a pervasive component of the production of information, with version control systems enabling edits to be efficiently stored and applied. In light of this, the task of learning distributed representations of edits has recently been proposed. With this in mind, we propose a novel approach that employs variational inference to learn a continuous latent space of vector representations to capture the underlying semantic information with regard to the document editing process. We achieve this by introducing a latent variable to explicitly model the aforementioned features. This latent variable is then combined with a document representation to guide the generation of an edited version of this document. Additionally, to facilitate standardized automatic evaluation of edit representations, which has heavily relied on direct human input thus far, we also propose a suite of downstream tasks, PEER, specifically designed to measure the quality of edit representations in the context of Natural Language Processing.
9. Compositionality and Generalization in Emergent Languages [PDF] Back to Contents
Rahma Chaabouni, Eugene Kharitonov, Diane Bouchacourt, Emmanuel Dupoux, Marco Baroni
Abstract: Natural language allows us to refer to novel composite concepts by combining expressions denoting their parts according to systematic rules, a property known as \emph{compositionality}. In this paper, we study whether the language emerging in deep multi-agent simulations possesses a similar ability to refer to novel primitive combinations, and whether it accomplishes this feat by strategies akin to human-language compositionality. Equipped with new ways to measure compositionality in emergent languages inspired by disentanglement in representation learning, we establish three main results. First, given sufficiently large input spaces, the emergent language will naturally develop the ability to refer to novel composite concepts. Second, there is no correlation between the degree of compositionality of an emergent language and its ability to generalize. Third, while compositionality is not necessary for generalization, it provides an advantage in terms of language transmission: The more compositional a language is, the more easily it will be picked up by new learners, even when the latter differ in architecture from the original agents. We conclude that compositionality does not arise from simple generalization pressure, but if an emergent language does chance upon it, it will be more likely to survive and thrive.
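One standard measure in this line of work is topographic similarity (our choice for illustration; the paper's disentanglement-inspired metrics differ): the rank correlation between pairwise distances in meaning space and in message space.

```python
# Topographic similarity on a toy emergent language (illustrative).
from itertools import combinations
from scipy.spatial.distance import hamming
from scipy.stats import spearmanr

meanings = [(0, 0), (0, 1), (1, 0), (1, 1)]  # attribute-value inputs
messages = ["aa", "ab", "ba", "bb"]          # messages emitted for them

m_dist, s_dist = [], []
for i, j in combinations(range(len(meanings)), 2):
    m_dist.append(hamming(meanings[i], meanings[j]))
    s_dist.append(sum(a != b for a, b in zip(messages[i], messages[j])))

# 1.0 here: message distances perfectly mirror meaning distances.
print(spearmanr(m_dist, s_dist).correlation)
```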
10. The State and Fate of Linguistic Diversity and Inclusion in the NLP World [PDF] Back to Contents
Pratik Joshi, Sebastin Santy, Amar Budhiraja, Kalika Bali, Monojit Choudhury
Abstract: Language technologies contribute to promoting multilingualism and linguistic diversity around the world. However, only a very small number of the over 7000 languages of the world are represented in the rapidly evolving language technologies and applications. In this paper we look at the relation between the types of languages, resources, and their representation in NLP conferences to understand the trajectory that different languages have followed over time. Our quantitative investigation underlines the disparity between languages, especially in terms of their resources, and calls into question the "language agnostic" status of current models and systems. Through this paper, we attempt to convince the ACL community to prioritise the resolution of the predicaments highlighted here, so that no language is left behind.
11. Adaptation of a Lexical Organization for Social Engineering Detection and Response Generation [PDF] Back to Contents
Archna Bhatia, Adam Dalton, Brodie Mather, Sashank Santhanam, Samira Shaikh, Alan Zemel, Tomek Strzalkowski, Bonnie J. Dorr
Abstract: We present a paradigm for extensible lexicon development based on Lexical Conceptual Structure to support social engineering detection and response generation. We leverage the central notions of ask (elicitation of behaviors such as providing access to money) and framing (risk/reward implied by the ask). We demonstrate improvements in ask/framing detection through refinements to our lexical organization and show that response generation qualitatively improves as ask/framing detection performance improves. The paradigm presents a systematic and efficient approach to resource adaptation for improved task-specific performance.
12. Taming the Expressiveness and Programmability of Graph Analytical Queries [PDF] Back to Contents
Lu Qin, Longbin Lai, Kongzhang Hao, Zhongxin Zhou, Yiwei Zhao, Yuxing Han, Xuemin Lin, Zhengping Qian, Jingren Zhou
Abstract: Graph databases have enjoyed a boom in the last decade, and graph queries have accordingly gained a lot of attention from both academia and industry. We focus on analytical queries in this paper. While analyzing existing domain-specific languages (DSLs) for analytical queries from the perspectives of completeness, expressiveness and programmability, we find that none of the existing work has achieved satisfactory coverage of these perspectives. Motivated by this, we propose the \flash DSL, which is named after the three primitive operators Filter, LocAl and PuSH. We prove that \flash is Turing complete (completeness), and show that it achieves both good expressiveness and programmability for analytical queries. We provide an implementation of \flash based on code generation, and compare it with native C++ code and an existing DSL using representative queries. The experiment results demonstrate \flash's expressiveness and its capability of programming complex algorithms that achieve satisfactory runtimes.
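To convey the flavour of the three primitives, here is a toy Python rendering with assumed, simplified semantics (the actual \flash operators are defined over a richer runtime), computing connected components by min-label propagation.

```python
# Filter / LocAl / PuSH over an adjacency list (assumed toy semantics).
graph = {0: [1], 1: [0, 2], 2: [1], 3: []}
label = {}

def Filter(pred):                 # select active vertices
    return [v for v in graph if pred(v)]

def LocAl(vertices, f):           # per-vertex local update
    for v in vertices:
        label[v] = f(v)

def PuSH(vertices):               # propagate state along out-edges
    changed = []
    for v in vertices:
        for u in graph[v]:
            if label[v] < label[u]:
                label[u] = label[v]
                changed.append(u)
    return changed

LocAl(Filter(lambda v: True), lambda v: v)  # init: every vertex labels itself
frontier = Filter(lambda v: True)
while frontier:                   # iterate to a fixpoint
    frontier = PuSH(frontier)
print(label)                      # {0: 0, 1: 0, 2: 0, 3: 3}
```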
13. Gated Convolutional Bidirectional Attention-based Model for Off-topic Spoken Response Detection [PDF] Back to Contents
Yefei Zha, Ruobing Li, Hui Lin
Abstract: Off-topic spoken response detection, the task of assessing whether a response is off-topic for the corresponding prompt, is important for automatic spoken assessment systems. In many real-world educational applications, an off-topic spoken response detection algorithm is required to achieve high recall not only on seen prompts but also on prompts that are unseen during the training process. In this paper, we propose a novel approach for off-topic spoken response detection with high off-topic recall on both seen and unseen prompts. We introduce a novel model, the Gated Convolutional Bidirectional Attention-based Model (GCBi-AM), in which a bi-attention mechanism and convolutions are applied to extract topic words of the prompt and key phrases of the response, and gated units and residual connections between major layers are introduced to better represent the relevance of response and prompt. Moreover, a new negative sampling method is proposed to augment the training data. Experiment results demonstrate that our new approach achieves significant improvements in detecting off-topic responses with an extremely high on-topic recall rate, for both seen and unseen prompts.
14. Incorporating External Knowledge through Pre-training for Natural Language to Code Generation [PDF] Back to Contents
Frank F. Xu, Zhengbao Jiang, Pengcheng Yin, Bogdan Vasilescu, Graham Neubig
Abstract: Open-domain code generation aims to generate code in a general-purpose programming language (such as Python) from natural language (NL) intents. Motivated by the intuition that developers usually retrieve resources on the web when writing code, we explore the effectiveness of incorporating two varieties of external knowledge into NL-to-code generation: automatically mined NL-code pairs from the online programming QA forum StackOverflow and programming language API documentation. Our evaluations show that combining the two sources with data augmentation and retrieval-based data re-sampling improves the current state-of-the-art by up to 2.2% absolute BLEU score on the code generation testbed CoNaLa. The code and resources are available at this https URL.
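The retrieval-based re-sampling idea can be sketched as follows, with TF-IDF similarity and tiny corpora as our stand-ins: each mined StackOverflow pair is scored by its closeness to the target (CoNaLa-style) distribution, and pre-training data is sampled in proportion.

```python
# Re-sample mined NL-code pairs towards the target distribution (illustrative).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

target_nl = ["sort a list in reverse", "read a file line by line"]
mined_nl = ["reverse sort a list in python",
            "connect to a postgres database",
            "open a file and iterate over its lines"]

vec = TfidfVectorizer().fit(target_nl + mined_nl)
T, M = vec.transform(target_nl), vec.transform(mined_nl)
scores = (M @ T.T).toarray().max(axis=1)  # best target match per mined pair
probs = scores / scores.sum()

rng = np.random.default_rng(0)
resampled = rng.choice(len(mined_nl), size=5, p=probs)
print([mined_nl[i] for i in resampled])   # in-distribution pairs dominate
```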
15. Adversarial Training for Large Neural Language Models [PDF] Back to Contents
Xiaodong Liu, Hao Cheng, Pengcheng He, Weizhu Chen, Yu Wang, Hoifung Poon, Jianfeng Gao
Abstract: Generalization and robustness are both key desiderata for designing machine learning methods. Adversarial training can enhance robustness, but past work often finds it hurts generalization. In natural language processing (NLP), pre-training large neural language models such as BERT have demonstrated impressive gain in generalization for a variety of tasks, with further improvement from adversarial fine-tuning. However, these models are still vulnerable to adversarial attacks. In this paper, we show that adversarial pre-training can improve both generalization and robustness. We propose a general algorithm ALUM (Adversarial training for large neural LangUage Models), which regularizes the training objective by applying perturbations in the embedding space that maximizes the adversarial loss. We present the first comprehensive study of adversarial training in all stages, including pre-training from scratch, continual pre-training on a well-trained model, and task-specific fine-tuning. ALUM obtains substantial gains over BERT on a wide range of NLP tasks, in both regular and adversarial scenarios. Even for models that have been well trained on extremely large text corpora, such as RoBERTa, ALUM can still produce significant gains from continual pre-training, whereas conventional non-adversarial methods can not. ALUM can be further combined with task-specific fine-tuning to attain additional gains. The ALUM code and pre-trained models will be made publicly available at this https URL.
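The core loop can be condensed as below (a sketch: a toy model, a single ascent step, and a one-sided KL, whereas ALUM uses a symmetric divergence and more careful projection; step sizes are assumptions).

```python
# Embedding-space adversarial regularization in the ALUM style (illustrative).
import torch
import torch.nn.functional as F

emb = torch.nn.Embedding(1000, 64)        # toy stand-in for a BERT encoder
clf = torch.nn.Linear(64, 2)
ids = torch.randint(0, 1000, (8, 16))
labels = torch.randint(0, 2, (8,))

e = emb(ids)
clean_logits = clf(e.mean(dim=1))

# Inner maximization: one ascent step on a perturbation of the embeddings.
delta = torch.zeros_like(e, requires_grad=True)
adv_logits = clf((e.detach() + delta).mean(dim=1))
div = F.kl_div(F.log_softmax(adv_logits, dim=-1),
               F.softmax(clean_logits.detach(), dim=-1), reduction="batchmean")
grad, = torch.autograd.grad(div, delta)
delta = 1e-3 * grad / (grad.norm() + 1e-8)

# Outer minimization: task loss plus the adversarial smoothness regularizer.
adv_logits = clf((e + delta).mean(dim=1))
reg = F.kl_div(F.log_softmax(adv_logits, dim=-1),
               F.softmax(clean_logits.detach(), dim=-1), reduction="batchmean")
(F.cross_entropy(clean_logits, labels) + reg).backward()
```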
16. The Cost of Training NLP Models: A Concise Overview [PDF] Back to Contents
Or Sharir, Barak Peleg, Yoav Shoham
Abstract: We review the cost of training large-scale language models, and the drivers of these costs. The intended audience includes engineers and scientists budgeting their model-training experiments, as well as non-practitioners trying to make sense of the economics of modern-day Natural Language Processing (NLP).
17. Dynamic Knowledge Graph-based Dialogue Generation with Improved Adversarial Meta-Learning [PDF] Back to Contents
Hongcai Xu, Junpeng Bao, Gaojie Zhang
Abstract: Knowledge graph-based dialogue systems are capable of generating more informative responses and can implement sophisticated reasoning mechanisms. However, these models do not take into account the sparseness and incompleteness of knowledge graphs (KGs), and current dialogue models cannot be applied to dynamic KGs. This paper proposes a dynamic knowledge graph-based dialogue generation method with improved adversarial Meta-Learning (KDAD). KDAD formulates dynamic knowledge triples as a problem of adversarial attack and incorporates the objective of quickly adapting to dynamic knowledge-aware dialogue generation. We train a knowledge graph-based dialogue model with improved ADML using minimal training samples. The model can initialize its parameters and adapt to previously unseen knowledge, so that training can be quickly completed based on only a few knowledge triples. We show that our model significantly outperforms other baselines. We evaluate and demonstrate that our method adapts extremely fast and well to dynamic knowledge graph-based dialogue generation.
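For orientation, a generic first-order MAML loop is sketched below on a toy regression problem; the paper's improved adversarial variant adds machinery that we do not reproduce here.

```python
# First-order MAML: inner adaptation on support data, outer update on query data.
import torch
import torch.nn.functional as F

model = torch.nn.Linear(4, 1)
meta_opt = torch.optim.SGD(model.parameters(), lr=1e-2)

for step in range(100):
    meta_opt.zero_grad()
    for _ in range(4):                     # tasks per meta-batch
        w = torch.randn(4, 1)              # one task = one hidden regressor
        x_s, x_q = torch.randn(16, 4), torch.randn(16, 4)
        y_s, y_q = x_s @ w, x_q @ w

        # Inner step: adapt a fast copy of the weights on the support set.
        fast = [p.clone() for p in model.parameters()]
        loss_s = F.mse_loss(F.linear(x_s, fast[0], fast[1]), y_s)
        grads = torch.autograd.grad(loss_s, fast)
        fast = [p - 0.1 * g for p, g in zip(fast, grads)]

        # Outer step: the adapted copy's query loss drives the meta-gradient.
        F.mse_loss(F.linear(x_q, fast[0], fast[1]), y_q).backward()
    meta_opt.step()
```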
18. A Chinese Corpus for Fine-grained Entity Typing [PDF] Back to Contents
Chin Lee, Hongliang Dai, Yangqiu Song, Xin Li
Abstract: Fine-grained entity typing is a challenging task with wide applications. However, most existing datasets for this task are in English. In this paper, we introduce a corpus for Chinese fine-grained entity typing that contains 4,800 mentions manually labeled through crowdsourcing. Each mention is annotated with free-form entity types. To make our dataset useful in more possible scenarios, we also categorize all the fine-grained types into 10 general types. Finally, we conduct experiments with some neural models whose structures are typical in fine-grained entity typing and show how well they perform on our dataset. We also show the possibility of improving Chinese fine-grained entity typing through cross-lingual transfer learning.
19. On the decomposition of generalized semiautomata [PDF] Back to Contents
Merve Nur Cakir, Karl-Heinz Zimmermann
Abstract: Semi-automata are abstractions of electronic devices that are deterministic finite-state machines having inputs but no outputs. Generalized semiautomata are obtained from stochastic semiautomata by dropping the restrictions imposed by probability. It is well-known that each stochastic semiautomaton can be decomposed into a sequential product of a dependent source and a deterministic semiautomaton, making partial use of the celebrated theorem of Birkhoff-von Neumann. It will be shown that each generalized semiautomaton can be partitioned into a sequential product of a generalized dependent source and a deterministic semiautomaton.
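For reference, the Birkhoff-von Neumann theorem invoked above states (our paraphrase) that every doubly stochastic matrix $D$ is a convex combination of permutation matrices: there exist permutation matrices $P_1, \dots, P_k$ and weights $\theta_i > 0$ with $\sum_i \theta_i = 1$ such that $D = \sum_{i=1}^{k} \theta_i P_i$. As the abstract notes, the classical decomposition of a stochastic semiautomaton into a dependent source followed by a deterministic semiautomaton uses this theorem only in part.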
20. Knowledge-graph based Proactive Dialogue Generation with Improved Meta-Learning [PDF] Back to Contents
Hongcai Xu, Junpeng Bao, Junqing Wang
Abstract: Knowledge graph-based dialogue systems can narrow down knowledge candidates for generating informative and diverse responses with the use of prior information, e.g., triple attributes or graph paths. However, most current knowledge graphs (KGs) cover incomplete domain-specific knowledge. To overcome this drawback, we propose a knowledge graph based proactive dialogue generation model (KgDg) with three components: an improved model-agnostic meta-learning algorithm (MAML), knowledge selection in knowledge triplet embedding, and a knowledge-aware proactive response generator. For knowledge triplet embedding and selection, we formulate it as a problem of sentence embedding to better capture semantic information. Our improved MAML algorithm is capable of learning general features from a limited number of knowledge graphs and can also quickly adapt to dialogue generation with unseen knowledge triplets. Extensive experiments are conducted on a knowledge-aware dialogue dataset (DuConv). The results show that KgDg adapts both fast and well to knowledge graph-based dialogue generation and outperforms the state-of-the-art baseline.
21. Extractive Summarization as Text Matching [PDF] Back to Contents
Ming Zhong, Pengfei Liu, Yiran Chen, Danqing Wang, Xipeng Qiu, Xuanjing Huang
Abstract: This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems. Instead of following the commonly used framework of extracting sentences individually and modeling the relationship between sentences, we formulate the extractive summarization task as a semantic text matching problem, in which a source document and candidate summaries (extracted from the original text) will be matched in a semantic space. Notably, this paradigm shift to a semantic matching framework is well-grounded in our comprehensive analysis of the inherent gap between sentence-level and summary-level extractors based on the properties of the dataset. Besides, even instantiating the framework with a simple form of a matching model, we have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1). Experiments on the other five datasets also show the effectiveness of the matching framework. We believe the power of this matching-based summarization framework has not been fully exploited. To encourage more instantiations in the future, we have released our codes, processed dataset, as well as generated summaries at this https URL.
22. Pattern Learning for Detecting Defect Reports and Improvement Requests in App Reviews [PDF] 返回目录
Gino V.H. Mangnoesing, Maria Mihaela Trusca, Flavius Frasincar
Abstract: Online reviews are an important source of feedback for understanding customers. In this study, we follow novel approaches that target the absence of actionable insights by classifying reviews as defect reports and requests for improvement. Unlike traditional classification methods based on expert rules, we reduce the manual labour by employing a supervised system that is capable of learning lexico-semantic patterns through genetic programming. Additionally, we experiment with a distantly-supervised SVM that makes use of noisy labels generated by the patterns. Using a real-world dataset of app reviews, we show that the automatically learned patterns outperform the manually created ones. The distantly-supervised SVM models are also not far behind the pattern-based solutions, showing the usefulness of this approach when the amount of annotated data is limited.
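The distant-supervision step can be pictured as below: a pattern-based weak labeler (here a single hypothetical keyword rule standing in for the patterns learned by genetic programming) produces noisy labels, and the SVM is trained on those instead of manual annotations.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

reviews = ["app crashes on startup", "please add dark mode",
           "keeps crashing after the update", "would love offline support"]

def noisy_label(text):
    # Hypothetical lexico-semantic pattern acting as a weak labeler.
    return "defect" if "crash" in text else "improvement"

X = TfidfVectorizer().fit_transform(reviews)
y = [noisy_label(r) for r in reviews]
clf = LinearSVC().fit(X, y)   # distantly supervised: trained on pattern labels only
```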
23. BanFakeNews: A Dataset for Detecting Fake News in Bangla [PDF] 返回目录
Md Zobaer Hossain, Md Ashraful Rahman, Md Saiful Islam, Sudipta Kar
Abstract: Observing the damage that can be done by the rapid propagation of fake news in various sectors like politics and finance, automatic identification of fake news using linguistic analysis has drawn the attention of the research community. However, such methods have largely been developed for English, leaving low-resource languages out of focus. But the risks spawned by fake and manipulative news are not confined to any one language. In this work, we propose an annotated dataset of ~50K news articles that can be used for building automated fake news detection systems for a low-resource language like Bangla. Additionally, we provide an analysis of the dataset and develop a benchmark system with state-of-the-art NLP techniques to identify Bangla fake news. To create this system, we explore traditional linguistic features and neural network based methods. We expect this dataset to be a valuable resource for building technologies that prevent the spreading of fake news and to contribute to research on low-resource languages.
24. Enhancing Pharmacovigilance with Drug Reviews and Social Media [PDF] 返回目录
Brent Biseda, Katie Mo
Abstract: This paper explores whether drug reviews and social media could be leveraged as potential alternative sources for pharmacovigilance of adverse drug reactions (ADRs). We examined the performance of BERT alongside two variants trained on biomedical papers (BioBERT) and clinical notes (Clinical BERT). Eight different BERT models were fine-tuned and compared across three different tasks in order to evaluate their relative performance on the ADR tasks. The tasks include sentiment classification of drug reviews, detecting the presence of ADRs in Twitter postings, and named entity recognition of ADRs in Twitter postings. BERT demonstrates its flexibility with high performance across all three pharmacovigilance-related tasks.
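A minimal sketch of the kind of fine-tuning the comparison implies, using the Hugging Face transformers API; the checkpoint name, label coding, and example sentence are assumptions, not the paper's exact setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "bert-base-uncased"    # a BioBERT or Clinical BERT checkpoint would be swapped in here
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

batch = tok(["This drug gave me severe headaches."], return_tensors="pt", padding=True)
labels = torch.tensor([1])    # assumed coding: 1 = ADR present / negative sentiment
loss = model(**batch, labels=labels).loss   # cross-entropy, backpropagated in a training loop
loss.backward()
```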
25. SimAlign: High Quality Word Alignments without Parallel Training Data using Static and Contextualized Embeddings [PDF] 返回目录
Masoud Jalili Sabet, Philipp Dufter, Hinrich Schütze
Abstract: Word alignments are useful for tasks like statistical and neural machine translation (NMT) and annotation projection. Statistical word aligners perform well, as do methods that extract alignments jointly with translations in NMT. However, most approaches require parallel training data and quality decreases as less training data is available. We propose word alignment methods that require no parallel data. The key idea is to leverage multilingual word embeddings, both static and contextualized, for word alignment. Our multilingual embeddings are created from monolingual data only without relying on any parallel data or dictionaries. We find that alignments created from embeddings are competitive and mostly superior to traditional statistical aligners, even in scenarios with abundant parallel data. For example, for a set of 100k parallel sentences, contextualized embeddings achieve a word alignment F1 for English-German that is more than 5% higher (absolute) than eflomal, a high quality alignment model.
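The core computation reduces to a similarity matrix between the two sentences' word vectors; the sketch below implements the simplest mutual-argmax heuristic, with random arrays standing in for real multilingual BERT outputs (the paper studies several matching methods, of which this is only one).

```python
import numpy as np

def argmax_align(E_src, E_tgt):
    # E_src: (m, d), E_tgt: (n, d) word vectors; keep pair (i, j) only when word
    # i's best match is j AND word j's best match is i (mutual argmax).
    sim = E_src @ E_tgt.T
    fwd = sim.argmax(axis=1)
    bwd = sim.argmax(axis=0)
    return [(i, int(j)) for i, j in enumerate(fwd) if bwd[j] == i]

E_src = np.random.randn(5, 768)   # stand-ins for mBERT vectors of an English sentence
E_tgt = np.random.randn(6, 768)   # ... and of a German sentence
print(argmax_align(E_src, E_tgt))
```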
26. Syn-QG: Syntactic and Shallow Semantic Rules for Question Generation [PDF] 返回目录
Kaustubh D. Dhole, Christopher D. Manning
Abstract: Question Generation (QG) is fundamentally a simple syntactic transformation; however, many aspects of semantics influence what questions are good to form. We implement this observation by developing Syn-QG, a set of transparent syntactic rules leveraging universal dependencies, shallow semantic parsing, lexical resources, and custom rules which transform declarative sentences into question-answer pairs. We utilize PropBank argument descriptions and VerbNet state predicates to incorporate shallow semantic content, which helps generate questions of a descriptive nature and produce inferential and semantically richer questions than existing systems. In order to improve syntactic fluency and eliminate grammatically incorrect questions, we employ back-translation over these syntactic rules. A set of crowd-sourced evaluations shows that our system can generate a larger number of highly grammatical and relevant questions than existing QG systems and that back-translation drastically improves grammaticality at a slight cost of generating irrelevant questions.
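For a feel of what a single transparent rule looks like, here is a toy string-level version of subject-auxiliary inversion; Syn-QG's actual rules operate over dependency parses, semantic roles, and lexical resources, so this is a didactic stand-in only.

```python
def yes_no_question(sentence):
    # Assumes the toy input starts with "<subject> <auxiliary> ...".
    subj, aux, *rest = sentence.rstrip(".").split()
    return f"{aux.capitalize()} {subj} {' '.join(rest)}?"

print(yes_no_question("Edison was a famous inventor."))   # -> "Was Edison a famous inventor?"
```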
27. A Hybrid Approach for Aspect-Based Sentiment Analysis Using Deep Contextual Word Embeddings and Hierarchical Attention [PDF] 返回目录
Maria Mihaela Trusca, Daan Wassenberg, Flavius Frasincar, Rommert Dekker
Abstract: The Web has become the main platform where people express their opinions about entities of interest and their associated aspects. Aspect-Based Sentiment Analysis (ABSA) aims to automatically compute the sentiment towards these aspects from opinionated text. In this paper we extend the state-of-the-art Hybrid Approach for Aspect-Based Sentiment Analysis (HAABSA) method in two directions. First, we replace the non-contextual word embeddings with deep contextual word embeddings in order to better cope with word semantics in a given text. Second, we use hierarchical attention, adding an extra attention layer to the HAABSA high-level representations in order to increase the method's flexibility in modeling the input data. Using two standard datasets (SemEval 2015 and SemEval 2016), we show that the proposed extensions improve the accuracy of the built model for ABSA.
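The added layer can be pictured as a simple learned attention pooling over the high-level representations; the dimensions and scoring function below are generic assumptions rather than HAABSA's exact architecture.

```python
import torch

class Attention(torch.nn.Module):
    # Generic single-layer attention pooling: score each position, softmax the
    # scores, and return the weighted sum of the input representations.
    def __init__(self, dim):
        super().__init__()
        self.score = torch.nn.Linear(dim, 1)

    def forward(self, H):                      # H: (seq_len, dim)
        w = torch.softmax(self.score(H).squeeze(-1), dim=0)
        return w @ H                           # context vector: (dim,)

pooled = Attention(768)(torch.randn(12, 768))  # e.g., over 12 contextual word embeddings
```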
28. Exclusive Hierarchical Decoding for Deep Keyphrase Generation [PDF] 返回目录
Wang Chen, Hou Pong Chan, Piji Li, Irwin King
Abstract: Keyphrase generation (KG) aims to summarize the main ideas of a document into a set of keyphrases. A new setting was recently introduced for this problem in which, given a document, the model needs to predict a set of keyphrases and simultaneously determine the appropriate number of keyphrases to produce. Previous work in this setting employs a sequential decoding process to generate keyphrases. However, such a decoding method ignores the intrinsic hierarchical compositionality existing in the keyphrase set of a document. Moreover, previous work tends to generate duplicated keyphrases, which wastes time and computing resources. To overcome these limitations, we propose an exclusive hierarchical decoding framework that includes a hierarchical decoding process and either a soft or a hard exclusion mechanism. The hierarchical decoding process explicitly models the hierarchical compositionality of a keyphrase set. Both the soft and the hard exclusion mechanisms keep track of previously-predicted keyphrases within a window size to enhance the diversity of the generated keyphrases. Extensive experiments on multiple KG benchmark datasets demonstrate the effectiveness of our method in generating less duplicated and more accurate keyphrases.
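One simplified reading of the hard exclusion mechanism (an assumption, not the paper's exact formulation) is a logit mask: before emitting the first token of a new keyphrase, tokens that would restart a keyphrase generated within the exclusion window are forced to probability zero.

```python
import torch

def hard_exclude(logits, recent_phrases, vocab_index):
    # logits: (V,) next-token scores; recent_phrases: keyphrases produced within
    # the exclusion window; vocab_index: token string -> vocabulary id.
    banned = {vocab_index[p.split()[0]]
              for p in recent_phrases if p.split()[0] in vocab_index}
    masked = logits.clone()
    masked[list(banned)] = float("-inf")   # zero probability after the softmax
    return masked
```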
29. A Formal Hierarchy of RNN Architectures [PDF] 返回目录
William Merrill, Gail Weiss, Yoav Goldberg, Roy Schwartz, Noah A. Smith, Eran Yahav
Abstract: We develop a formal hierarchy of the expressive capacity of RNN architectures. The hierarchy is based on two formal properties: space complexity, which measures the RNN's memory, and rational recurrence, defined as whether the recurrent update can be described by a weighted finite-state machine. We place several RNN variants within this hierarchy. For example, we prove the LSTM is not rational, which formally separates it from the related QRNN (Bradbury et al., 2016). We also show how these models' expressive capacity is expanded by stacking multiple layers or composing them with different pooling functions. Our results build on the theory of "saturated" RNNs (Merrill, 2019). While formally extending these findings to unsaturated RNNs is left to future work, we hypothesize that the practical learnable capacity of unsaturated RNNs obeys a similar hierarchy. Experimental findings from training unsaturated networks on formal languages support this conjecture.
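To make the "saturated" construction of Merrill (2019) concrete: a saturated network is the limit of scaling all parameters by a constant going to infinity, so every sigmoid or tanh becomes a step function. The snippet below approximates that limit numerically on an untrained PyTorch RNN, purely as an illustration.

```python
import torch

rnn = torch.nn.RNN(input_size=4, hidden_size=8, nonlinearity="tanh")
with torch.no_grad():
    for p in rnn.parameters():
        p.mul_(1000.0)            # scale all parameters toward the saturation limit
x = torch.randn(5, 1, 4)          # (seq_len, batch, input_size)
h, _ = rnn(x)
print(h)                          # hidden states are now effectively in {-1, +1}
```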
30. Can You Put it All Together: Evaluating Conversational Agents' Ability to Blend Skills [PDF] 返回目录
Eric Michael Smith, Mary Williamson, Kurt Shuster, Jason Weston, Y-Lan Boureau
Abstract: Being engaging, knowledgeable, and empathetic are all desirable general qualities in a conversational agent. Previous work has introduced tasks and datasets that aim to help agents learn those qualities in isolation and gauge how well they can express them. But rather than being specialized in one single quality, a good open-domain conversational agent should be able to seamlessly blend them all into one cohesive conversational flow. In this work, we investigate several ways to combine models trained towards isolated capabilities, ranging from simple model aggregation schemes that require minimal additional training to various forms of multi-task training that encompass several skills at all training stages. We further propose a new dataset, BlendedSkillTalk, to analyze how these capabilities would mesh together in a natural conversation, and compare the performance of different architectures and training schemes. Our experiments show that multi-tasking over several tasks that focus on particular capabilities results in better blended conversation performance compared to models trained on a single skill, and that both unified and two-stage approaches perform well if they are constructed to avoid unwanted bias in skill selection or are fine-tuned on our new task.
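The simplest scheme in this spectrum, multi-task training over several skill-specific datasets, can be sketched as random batch mixing so that a single model sees all skills during training; the task names and uniform sampling below are placeholders.

```python
import random
from itertools import cycle

def multitask_batches(datasets, steps):
    # datasets: dict mapping task name -> list of batches; each step draws a
    # batch from a uniformly chosen task, so one model trains on all skills.
    iters = {task: cycle(batches) for task, batches in datasets.items()}
    for _ in range(steps):
        task = random.choice(list(iters))
        yield task, next(iters[task])

# e.g. multitask_batches({"engaging": [...], "knowledge": [...], "empathy": [...]}, 1000)
```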
31. ClovaCall: Korean Goal-Oriented Dialog Speech Corpus for Automatic Speech Recognition of Contact Centers [PDF] 返回目录
Jung-Woo Ha, Kihyun Nam, Jin Gu Kang, Sang-Woo Lee, Sohee Yang, Hyunhoon Jung, Eunmi Kim, Hyeji Kim, Soojin Kim, Hyun Ah Kim, Kyoungtae Doh, Chan Kyu Lee, Sunghun Kim
Abstract: Automatic speech recognition (ASR) via call is essential for various applications, including AI for contact center (AICC) services. Despite the advancement of ASR, however, most publicly available speech corpora such as Switchboard are old-fashioned. Also, most existing call corpora are in English and mainly focus on open-domain dialog or general scenarios such as audiobooks. Here we introduce ClovaCall, a new large-scale Korean call-based speech corpus collected from more than 11,000 people under a goal-oriented dialog scenario. ClovaCall includes approximately 60,000 pairs of a short sentence and its corresponding spoken utterance in a restaurant reservation domain. We validate the effectiveness of our dataset with intensive experiments using two standard ASR models. Furthermore, we release our ClovaCall dataset and baseline source code, available via this https URL.
32. CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings [PDF] 返回目录
Shinji Watanabe, Michael Mandel, Jon Barker, Emmanuel Vincent
Abstract: Following the success of the 1st, 2nd, 3rd, 4th and 5th CHiME challenges we organize the 6th CHiME Speech Separation and Recognition Challenge (CHiME-6). The new challenge revisits the previous CHiME-5 challenge and further considers the problem of distant multi-microphone conversational speech diarization and recognition in everyday home environments. Speech material is the same as the previous CHiME-5 recordings except for accurate array synchronization. The material was elicited using a dinner party scenario with efforts taken to capture data that is representative of natural conversational speech. This paper provides a baseline description of the CHiME-6 challenge for both segmented multispeaker speech recognition (Track 1) and unsegmented multispeaker speech recognition (Track 2). Of note, Track 2 is the first challenge activity in the community to tackle an unsegmented multispeaker speech recognition scenario with a complete set of reproducible open source baselines providing speech enhancement, speaker diarization, and speech recognition modules.
33. A Practical Guide to Studying Emergent Communication through Grounded Language Games [PDF] 返回目录
Jens Nevens, Paul Van Eecke, Katrien Beuls
Abstract: The question of how an effective and efficient communication system can emerge in a population of agents that need to solve a particular task attracts more and more attention from researchers in many fields, including artificial intelligence, linguistics and statistical physics. A common methodology for studying this question consists of carrying out multi-agent experiments in which a population of agents takes part in a series of scripted and task-oriented communicative interactions, called 'language games'. While each individual language game is typically played by two agents in the population, a large series of games allows the population to converge on a shared communication system. Setting up an experiment in which a rich system for communicating about the real world emerges is a major enterprise, as it requires a variety of software components for running multi-agent experiments, for interacting with sensors and actuators, for conceptualising and interpreting semantic structures, and for mapping between these semantic structures and linguistic utterances. The aim of this paper is twofold. On the one hand, it introduces a high-level robot interface that extends the Babel software system, presenting for the first time a toolkit that provides flexible modules for dealing with each subtask involved in running advanced grounded language game experiments. On the other hand, it provides a practical guide to using the toolkit for implementing such experiments, taking a grounded colour naming game experiment as a didactic example.
34. Graph-Structured Referring Expression Reasoning in The Wild [PDF] 返回目录
Sibei Yang, Guanbin Li, Yizhou Yu
Abstract: Grounding referring expressions aims to locate in an image an object referred to by a natural language expression. The linguistic structure of a referring expression provides a layout of reasoning over the visual contents, and it is often crucial to align and jointly understand the image and the referring expression. In this paper, we propose a scene graph guided modular network (SGMN), which performs reasoning over a semantic graph and a scene graph with neural modules under the guidance of the linguistic structure of the expression. In particular, we model the image as a structured semantic graph, and parse the expression into a language scene graph. The language scene graph not only decodes the linguistic structure of the expression, but also has a representation consistent with the image semantic graph. In addition to exploring structured solutions to grounding referring expressions, we also propose Ref-Reasoning, a large-scale real-world dataset for structured referring expression reasoning. We automatically generate referring expressions over the scene graphs of images using diverse expression templates and functional programs. This dataset is equipped with real-world visual contents as well as semantically rich expressions with different reasoning layouts. Experimental results show that our SGMN not only significantly outperforms existing state-of-the-art algorithms on the new Ref-Reasoning dataset, but also surpasses state-of-the-art structured methods on commonly used benchmark datasets. It can also provide interpretable visual evidence of reasoning. Data and code are available at this https URL
35. Are we pretraining it right? Digging deeper into visio-linguistic pretraining [PDF] 返回目录
Amanpreet Singh, Vedanuj Goswami, Devi Parikh
Abstract: Numerous recent works have proposed pretraining generic visio-linguistic representations and then finetuning them for downstream vision and language tasks. While architecture and objective function design choices have received attention, the choice of pretraining datasets has received little attention. In this work, we question some of the default choices made in the literature. For instance, we systematically study how varying the similarity between the pretraining dataset domain (textual and visual) and the downstream domain affects performance. Surprisingly, we show that automatically generated data in a domain closer to the downstream task (e.g., VQA v2) is a better choice for pretraining than "natural" data from a slightly different domain (e.g., Conceptual Captions). On the other hand, some seemingly reasonable choices of pretraining datasets were found to be entirely ineffective for some downstream tasks. This suggests that despite the numerous recent efforts, vision & language pretraining does not quite work "out of the box" yet. Overall, as a by-product of our study, we find that simple design choices in pretraining can help us achieve results close to the state of the art on downstream tasks without any architectural changes.
36. Pro-Russian Biases in Anti-Chinese Tweets about the Novel Coronavirus [PDF] 返回目录
Autumn Toney, Akshat Pandey, Wei Guo, David Broniatowski, Aylin Caliskan
Abstract: The recent COVID-19 pandemic, which was first detected in Wuhan, China, has been linked to increased anti-Chinese sentiment in the United States. Recently, Broniatowski et al. found that foreign powers, and especially Russia, were implicated in information operations using public health crises to promote discord -- including racial conflict -- in American society (Broniatowski, 2018). This brief considers the problem of automatically detecting, via artificial intelligence, changes in overall attitudes that may be associated with emerging information operations. Accurate analysis of these emerging topics usually requires laborious, manual analysis by experts to annotate millions of tweets to identify biases in new topics. We introduce extensions of the Word Embedding Association Test of Caliskan et al. to a new domain (Caliskan, 2017). This practical and unsupervised method is applied to quantify biases being promoted in information operations. Analyzing historical information operations from Russia's interference in the 2016 U.S. presidential elections, we quantify biased attitudes toward presidential candidates and sentiment toward Muslim groups. We next apply this method to a corpus of tweets containing anti-Chinese hashtags. We find that roughly 1% of tweets in our corpus reference Russian-funded news sources and use anti-Chinese hashtags, and, beyond the expected anti-Chinese attitudes, we find that this corpus as a whole contains pro-Russian attitudes, which are not present in a control Twitter corpus containing general tweets. Additionally, 4% of the users in this corpus were suspended within a week. These findings may indicate the presence of abusive account activity associated with rapid changes in attitudes around the COVID-19 public health crisis, suggesting potential information operations.
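For reference, the quantity being extended is the WEAT effect size of Caliskan et al. (2017): the differential cosine association of two target word sets X and Y with two attribute sets A and B, normalised by a pooled standard deviation. The arrays passed in would be embedding vectors from the model under test.

```python
import numpy as np

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def s(w, A, B):
    # Differential association of one word vector w with attribute sets A and B.
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    sx = [s(x, A, B) for x in X]
    sy = [s(y, A, B) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)
```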
37. Knowledge-Based Visual Question Answering in Videos [PDF] 返回目录
Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima
Abstract: We propose a novel video understanding task by fusing knowledge-based and video question answering. First, we introduce KnowIT VQA, a video dataset with 24,282 human-generated question-answer pairs about a popular sitcom. The dataset combines visual, textual and temporal coherence reasoning together with knowledge-based questions, which require experience obtained from viewing the series in order to be answered. Second, we propose a video understanding model that combines the visual and textual video content with specific knowledge about the show. Our main findings are: (i) the incorporation of knowledge produces outstanding improvements for VQA in video, and (ii) the performance on KnowIT VQA still lags well behind human accuracy, indicating its usefulness for studying current video modelling limitations.