
[arXiv Papers] Computation and Language 2020-08-07

Contents

1. Discovering and Categorising Language Biases in Reddit [PDF] Abstract
2. Compositional Networks Enable Systematic Generalization for Grounded Language Understanding [PDF] Abstract
3. Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets [PDF] Abstract
4. ConvBERT: Improving BERT with Span-based Dynamic Convolution [PDF] Abstract
5. Efficient MDI Adaptation for n-gram Language Models [PDF] Abstract
6. An Interpretable Deep Learning System for Automatically Scoring Request for Proposals [PDF] Abstract
7. Personalised Visual Art Recommendation by Learning Latent Semantic Representations [PDF] Abstract
8. Data balancing for boosting performance of low-frequency classes in Spoken Language Understanding [PDF] Abstract
9. FastLR: Non-Autoregressive Lipreading Model with Integrate-and-Fire [PDF] Abstract
10. DeText: A Deep Text Ranking Framework with BERT [PDF] Abstract

Abstracts

1. Discovering and Categorising Language Biases in Reddit [PDF] Back to Contents
  Xavier Ferrer, Tom van Nuenen, Jose M. Such, Natalia Criado
Abstract: We present a data-driven approach using word embeddings to discover and categorise language biases on the discussion platform Reddit. As spaces for isolated user communities, platforms such as Reddit are increasingly connected to issues of racism, sexism and other forms of discrimination. Hence, there is a need to monitor the language of these groups. One of the most promising AI approaches to trace linguistic biases in large textual datasets involves word embeddings, which transform text into high-dimensional dense vectors and capture semantic relations between words. Yet, previous studies require predefined sets of potential biases to study, e.g., whether gender is more or less associated with particular types of jobs. This makes these approaches unfit to deal with smaller and community-centric datasets such as those on Reddit, which contain smaller vocabularies and slang, as well as biases that may be particular to that community. This paper proposes a data-driven approach to automatically discover language biases encoded in the vocabulary of online discourse communities on Reddit. In our approach, protected attributes are connected to evaluative words found in the data, which are then categorised through a semantic analysis system. We verify the effectiveness of our method by comparing the biases we discover in the Google News dataset with those found in previous literature. We then successfully discover gender bias, religion bias, and ethnic bias in different Reddit communities. We conclude by discussing potential application scenarios and limitations of this data-driven bias discovery method.
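The core mechanism described above (connecting protected attributes to evaluative words through embedding geometry) can be illustrated with a small sketch. Everything here is an assumption for illustration: the pretrained GloVe vectors stand in for embeddings trained on a subreddit's comments, and the attribute word lists and simple gender-axis projection are placeholders, not the authors' code or data.

```python
import numpy as np
import gensim.downloader as api

# Stand-in embeddings; in the paper's setting these would be trained on the
# comments of a single online community (e.g. one subreddit).
vectors = api.load("glove-wiki-gigaword-100")

female_terms = ["she", "her", "woman"]   # illustrative protected-attribute sets
male_terms = ["he", "him", "man"]

def centroid(words):
    return np.mean([vectors[w] for w in words if w in vectors], axis=0)

# A gender direction in embedding space; evaluative words are ranked by their
# projection onto it, then grouped by a semantic-category lexicon.
axis = centroid(female_terms) - centroid(male_terms)
axis /= np.linalg.norm(axis)

def association(word):
    v = vectors[word]
    return float(np.dot(v / np.linalg.norm(v), axis))

candidates = [w for w in vectors.index_to_key[:20000] if w.isalpha()]
ranked = sorted(candidates, key=association)
print("most male-leaning:", ranked[:10])
print("most female-leaning:", ranked[-10:])
```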

2. Compositional Networks Enable Systematic Generalization for Grounded Language Understanding [PDF] Back to Contents
  Yen-Ling Kuo, Boris Katz, Andrei Barbu
Abstract: Humans are remarkably flexible when understanding new sentences that include combinations of concepts they have never encountered before. Recent work has shown that while deep networks can mimic some human language abilities when presented with novel sentences, systematic variation uncovers the limitations in the language-understanding abilities of neural networks. We demonstrate that these limitations can be overcome by addressing the generalization challenges in a recently-released dataset, gSCAN, which explicitly measures how well a robotic agent is able to interpret novel ideas grounded in vision, e.g., novel pairings of adjectives and nouns. The key principle we employ is compositionality: that the compositional structure of networks should reflect the compositional structure of the problem domain they address, while allowing all other parameters and properties to be learned end-to-end with weak supervision. We build a general-purpose mechanism that enables robots to generalize their language understanding to compositional domains. Crucially, our base network has the same state-of-the-art performance as prior work, 97% execution accuracy, while at the same time generalizing its knowledge when prior work does not; for example, achieving 95% accuracy on novel adjective-noun compositions where previous work has 55% average accuracy. Robust language understanding without dramatic failures and without corner cases is critical to building safe and fair robots; we demonstrate the significant role that compositionality can play in achieving that goal.
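As a rough illustration of the compositionality principle stated above, the toy sketch below gives each concept word its own small module and composes the modules according to the command, so a never-seen adjective-noun pairing reuses modules trained in other contexts. This is not the paper's architecture; the module design, dimensions, and training setup are placeholder assumptions.

```python
import torch
import torch.nn as nn

class ConceptModule(nn.Module):
    """A tiny network owned by a single word/concept (e.g. 'red' or 'square')."""
    def __init__(self, dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.net(x)

# A novel pairing like "red square" is handled by composing modules that were
# each learned in other contexts.
modules = {"red": ConceptModule(), "square": ConceptModule(), "circle": ConceptModule()}

def ground(command_words, visual_features):
    h = visual_features
    for w in command_words:            # network structure follows command structure
        h = modules[w](h)
    return h

scene = torch.randn(1, 32)             # stand-in for grounded visual features
print(ground(["red", "square"], scene).shape)
```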

3. Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets [PDF] Back to Contents
  Patrick Lewis, Pontus Stenetorp, Sebastian Riedel
Abstract: Ideally Open-Domain Question Answering models should exhibit a number of competencies, ranging from simply memorizing questions seen at training time, to answering novel question formulations with answers seen during training, to generalizing to completely novel questions with novel answers. However, single aggregated test set scores do not show the full picture of what capabilities models truly have. In this work, we perform a detailed study of the test sets of three popular open-domain benchmark datasets with respect to these competencies. We find that 60-70% of test-time answers are also present somewhere in the training sets. We also find that 30% of test-set questions have a near-duplicate paraphrase in their corresponding training sets. Using these findings, we evaluate a variety of popular open-domain models to obtain greater insight into the extent to which they can actually generalize, and what drives their overall performance. We find that all models perform dramatically worse on questions that cannot be memorized from training sets, with a mean absolute performance difference of 63% between repeated and non-repeated data. Finally, we show that simple nearest-neighbor models outperform a BART closed-book QA model, further highlighting the role that training-set memorization plays in these benchmarks.
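A minimal sketch of the two overlap measurements described above, using made-up toy data. Note that in the paper near-duplicate questions were identified with human annotation; the string-similarity matcher here is only a crude stand-in for that step.

```python
from difflib import SequenceMatcher

train = [("who wrote hamlet?", "william shakespeare"),
         ("capital of france?", "paris")]
test = [("which playwright wrote hamlet?", "william shakespeare"),
        ("capital city of italy?", "rome")]

# (1) fraction of test answers that also appear somewhere in the training set
train_answers = {a.lower() for _, a in train}
answer_overlap = sum(a.lower() in train_answers for _, a in test) / len(test)

# (2) test questions with a near-duplicate paraphrase in the training set
def nearest_train_question(q):
    return max(train, key=lambda t: SequenceMatcher(None, q, t[0]).ratio())

question_overlap = sum(
    SequenceMatcher(None, q, nearest_train_question(q)[0]).ratio() > 0.8
    for q, _ in test) / len(test)

print(f"answer overlap: {answer_overlap:.0%}, near-duplicate questions: {question_overlap:.0%}")
```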

4. ConvBERT: Improving BERT with Span-based Dynamic Convolution [PDF] Back to Contents
  Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan
Abstract: Pre-trained language models like BERT and its variants have recently achieved impressive performance in various natural language understanding tasks. However, BERT heavily relies on the global self-attention block and thus suffers large memory footprint and computation cost. Although all its attention heads query on the whole input sequence for generating the attention map from a global perspective, we observe some heads only need to learn local dependencies, which means the existence of computation redundancy. We therefore propose a novel span-based dynamic convolution to replace these self-attention heads to directly model local dependencies. The novel convolution heads, together with the rest self-attention heads, form a new mixed attention block that is more efficient at both global and local context learning. We equip BERT with this mixed attention design and build a ConvBERT model. Experiments have shown that ConvBERT significantly outperforms BERT and its variants in various downstream tasks, with lower training cost and fewer model parameters. Remarkably, ConvBERTbase model achieves 86.4 GLUE score, 0.7 higher than ELECTRAbase, while using less than 1/4 training cost. Code and pre-trained models will be released.
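The sketch below illustrates the span-based dynamic convolution idea in isolation: per-position convolution kernels are predicted from a local span of the input instead of attending over the whole sequence. The dimensions, kernel generator, and normalization choices are illustrative assumptions, not the released ConvBERT code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpanDynamicConv(nn.Module):
    def __init__(self, d_model=64, kernel_size=5):
        super().__init__()
        self.kernel_size = kernel_size
        # A depthwise conv summarizes the local span; a linear layer then
        # predicts a per-position convolution kernel from that span summary.
        self.span = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2, groups=d_model)
        self.to_kernel = nn.Linear(d_model, kernel_size)
        self.value = nn.Linear(d_model, d_model)

    def forward(self, x):                                   # x: (batch, seq, d_model)
        span_repr = self.span(x.transpose(1, 2)).transpose(1, 2)
        kernels = F.softmax(self.to_kernel(span_repr), dim=-1)   # (b, t, k)
        v = self.value(x)
        v = F.pad(v.transpose(1, 2), (self.kernel_size // 2,) * 2)
        windows = v.unfold(-1, self.kernel_size, 1)               # (b, d, t, k)
        return torch.einsum("bdtk,btk->btd", windows, kernels)    # local mixing only

x = torch.randn(2, 10, 64)
print(SpanDynamicConv()(x).shape)   # torch.Size([2, 10, 64])
```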

5. Efficient MDI Adaptation for n-gram Language Models [PDF] Back to Contents
  Ruizhe Huang, Ke Li, Ashish Arora, Dan Povey, Sanjeev Khudanpur
Abstract: This paper presents an efficient algorithm for n-gram language model adaptation under the minimum discrimination information (MDI) principle, where an out-of-domain language model is adapted to satisfy the constraints of marginal probabilities of the in-domain data. The challenge for MDI language model adaptation is its computational complexity. By taking advantage of the backoff structure of n-gram model and the idea of hierarchical training method, originally proposed for maximum entropy (ME) language models, we show that MDI adaptation can be computed in linear-time complexity to the inputs in each iteration. The complexity remains the same as ME models, although MDI is more general than ME. This makes MDI adaptation practical for large corpus and vocabulary. Experimental results confirm the scalability of our algorithm on very large datasets, while MDI adaptation gets slightly worse perplexity but better word error rate results compared to simple linear interpolation.
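For reference, the MDI solution with unigram marginal constraints is usually written in the following exponential form (generic notation, not necessarily the paper's):

$$
p_{\text{adapt}}(w \mid h) = \frac{p_{\text{out}}(w \mid h)\, e^{\lambda_w}}{Z_\lambda(h)},
\qquad
Z_\lambda(h) = \sum_{w'} p_{\text{out}}(w' \mid h)\, e^{\lambda_{w'}},
$$

where the multipliers $\lambda_w$ are chosen so that the adapted model reproduces the in-domain unigram marginals, $\sum_h \tilde{p}(h)\, p_{\text{adapt}}(w \mid h) = \tilde{p}_{\text{in}}(w)$. The cost of each iteration is dominated by the normalizers $Z_\lambda(h)$; exploiting the backoff structure of the n-gram model is what brings each iteration to time linear in the inputs, as the abstract states.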

6. An Interpretable Deep Learning System for Automatically Scoring Request for Proposals [PDF] Back to Contents
  Subhadip Maji, Anudeep Srivatsav Appe, Raghav Bali, Veera Raghavendra Chikka, Arijit Ghosh Chowdhury, Vamsi M Bhandaru
Abstract: The Managed Care system within Medicaid (US Healthcare) uses Request For Proposals (RFP) to award contracts for various healthcare and related services. RFP responses are very detailed documents (hundreds of pages) submitted by competing organisations to win contracts. Subject matter expertise and domain knowledge play an important role in preparing RFP responses, along with analysis of historical submissions. Automated analysis of these responses through Natural Language Processing (NLP) systems can reduce the time and effort needed to explore historical responses and assist in writing better responses. Our work draws parallels between scoring RFPs and essay scoring models, while highlighting new challenges and the need for interpretability. Typical scoring models focus on word-level impacts to grade essays and other short write-ups. We propose a novel Bi-LSTM based regression model, and provide deeper insight into phrases which latently impact scoring of responses. We demonstrate the merits of our proposed methodology using extensive quantitative experiments. We also qualitatively assess the impact of important phrases using human evaluators. Finally, we introduce a novel problem statement that can be used to further improve the state of the art in NLP-based automatic scoring systems.
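A minimal sketch of a Bi-LSTM regression scorer in the spirit of the model described above, with an attention layer included as one plausible way to surface which phrases drive a score. The architecture details, dimensions, and attention-based interpretability mechanism are assumptions for illustration, not the authors' system.

```python
import torch
import torch.nn as nn

class BiLSTMScorer(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)   # phrase-level importance weights
        self.out = nn.Linear(2 * hidden, 1)    # regression head (the RFP score)

    def forward(self, tokens):                 # tokens: (batch, seq)
        h, _ = self.lstm(self.emb(tokens))     # (batch, seq, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1) # which positions drive the score
        pooled = (w * h).sum(dim=1)
        return self.out(pooled).squeeze(-1), w # score + weights for interpretation

scores, weights = BiLSTMScorer()(torch.randint(1, 30000, (4, 200)))
print(scores.shape, weights.shape)
```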

7. Personalised Visual Art Recommendation by Learning Latent Semantic Representations [PDF] Back to Contents
  Bereket Abera Yilma, Najib Aghenda, Marcelo Romero, Yannick Naudet, Herve Panetto
Abstract: In Recommender systems, data representation techniques play a great role as they have the power to entangle, hide and reveal explanatory factors embedded within datasets. Hence, they influence the quality of recommendations. Specifically, in Visual Art (VA) recommendations the complexity of the concepts embodied within paintings, makes the task of capturing semantics by machines far from trivial. In VA recommendation, prominent works commonly use manually curated metadata to drive recommendations. Recent works in this domain aim at leveraging visual features extracted using Deep Neural Networks (DNN). However, such data representation approaches are resource demanding and do not have a direct interpretation, hindering user acceptance. To address these limitations, we introduce an approach for Personalised Recommendation of Visual arts based on learning latent semantic representation of paintings. Specifically, we trained a Latent Dirichlet Allocation (LDA) model on textual descriptions of paintings. Our LDA model manages to successfully uncover non-obvious semantic relationships between paintings whilst being able to offer explainable recommendations. Experimental evaluations demonstrate that our method tends to perform better than exploiting visual features extracted using pre-trained Deep Neural Networks.
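The pipeline described above maps naturally onto standard topic-modelling tooling. The sketch below uses gensim to fit LDA topics on made-up painting descriptions and to rank paintings by topic-vector similarity; the descriptions, topic count, and similarity-based recommendation step are illustrative assumptions rather than the authors' setup.

```python
from gensim import corpora, models, similarities

descriptions = [
    "stormy sea with a small fishing boat under dark clouds",
    "portrait of a young woman in a blue dress",
    "sunlit field of wheat with cypress trees",
]
texts = [d.lower().split() for d in descriptions]
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

# Latent semantic representation of each painting = its LDA topic mixture.
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, passes=10)
index = similarities.MatrixSimilarity(lda[corpus])

# A user profile could simply be the topic mixture of a liked painting;
# unseen paintings are recommended by topic-vector similarity.
liked = lda[dictionary.doc2bow("boat on a rough sea at night".split())]
print(sorted(enumerate(index[liked]), key=lambda x: -x[1]))
```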

8. Data balancing for boosting performance of low-frequency classes in Spoken Language Understanding [PDF] Back to Contents
  Judith Gaspers, Quynh Do, Fabian Triefenbach
Abstract: Despite the fact that data imbalance is becoming more and more common in real-world Spoken Language Understanding (SLU) applications, it has not been studied extensively in the literature. To the best of our knowledge, this paper presents the first systematic study on handling data imbalance for SLU. In particular, we discuss the application of existing data balancing techniques for SLU and propose a multi-task SLU model for intent classification and slot filling. Aiming to avoid over-fitting, in our model methods for data balancing are leveraged indirectly via an auxiliary task which makes use of a class-balanced batch generator and (possibly) synthetic data. Our results on a real-world dataset indicate that i) our proposed model can boost performance on low frequency intents significantly while avoiding a potential performance decrease on the head intents, ii) synthetic data are beneficial for bootstrapping new intents when realistic data are not available, but iii) once a certain amount of realistic data becomes available, using synthetic data in the auxiliary task only yields better performance than adding them to the primary task training data, and iv) in a joint training scenario, balancing the intent distribution individually improves not only intent classification but also slot filling performance.
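The class-balanced batch generator mentioned above can be illustrated with a short sketch: each batch samples intents uniformly rather than proportionally to their frequency, so low-frequency intents are seen as often as head intents. The data and function below are made up for illustration and are not the authors' implementation.

```python
import random
from collections import defaultdict

def class_balanced_batches(examples, batch_size, rng=random.Random(0)):
    """examples: list of (utterance, intent, slots) tuples."""
    by_intent = defaultdict(list)
    for ex in examples:
        by_intent[ex[1]].append(ex)
    intents = list(by_intent)
    while True:
        # Pick an intent uniformly, then an example of that intent, per slot.
        yield [rng.choice(by_intent[rng.choice(intents)]) for _ in range(batch_size)]

data = [("play jazz", "PlayMusic", []), ("book a table", "BookRestaurant", []),
        ("order a cab", "RequestRide", [])] * 5 + [("dim the lights", "SmartHome", [])]
gen = class_balanced_batches(data, batch_size=4)
print(next(gen))   # "SmartHome" appears far more often than its raw frequency
```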

9. FastLR: Non-Autoregressive Lipreading Model with Integrate-and-Fire [PDF] Back to Contents
  Jinglin Liu, Yi Ren, Zhou Zhao, Chen Zhang, Baoxing Huai, Jing Yuan
Abstract: Lipreading is an impressive technique, and its accuracy has improved markedly in recent years. However, existing methods for lipreading mainly build on the autoregressive (AR) model, which generates target tokens one by one and suffers from high inference latency. To break through this constraint, we propose FastLR, a non-autoregressive (NAR) lipreading model which generates all target tokens simultaneously. NAR lipreading is a challenging task that has many difficulties: 1) the discrepancy of sequence lengths between source and target makes it difficult to estimate the length of the output sequence; 2) the conditionally independent behavior of NAR generation lacks the correlation across time, which leads to a poor approximation of the target distribution; 3) the feature representation ability of the encoder can be weak due to the lack of an effective alignment mechanism; and 4) the removal of the AR language model exacerbates the inherent ambiguity problem of lipreading. Thus, in this paper, we introduce three methods to reduce the gap between FastLR and the AR model: 1) to address challenges 1 and 2, we leverage an integrate-and-fire (I\&F) module to model the correspondence between source video frames and the output text sequence. 2) To tackle challenge 3, we add an auxiliary connectionist temporal classification (CTC) decoder on top of the encoder and optimize it with an extra CTC loss. We also add an auxiliary autoregressive decoder to help the feature extraction of the encoder. 3) To overcome challenge 4, we propose a novel Noisy Parallel Decoding (NPD) scheme for I\&F and bring Byte-Pair Encoding (BPE) into lipreading. Our experiments show that FastLR achieves a speedup of up to 10.97$\times$ compared with the state-of-the-art lipreading model, with a slight absolute WER increase of 1.5\% and 5.5\% on the GRID and LRS2 lipreading datasets respectively, which demonstrates the effectiveness of our proposed method.
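Of the techniques listed above, the auxiliary CTC decoder is the easiest to show in a few lines: a linear CTC head on the encoder output contributes an extra loss term alongside the main non-autoregressive objective. The shapes, vocabulary size, and random tensors below are illustrative assumptions, not the FastLR implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, T, B = 40, 75, 2                       # BPE vocab size, encoder frames, batch
encoder_out = torch.randn(T, B, 256)          # (time, batch, hidden) from the lip encoder
ctc_head = nn.Linear(256, vocab + 1)          # +1 for the CTC blank symbol (index 0)

log_probs = F.log_softmax(ctc_head(encoder_out), dim=-1)
targets = torch.randint(1, vocab + 1, (B, 20))            # dummy BPE target ids
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 20, dtype=torch.long)

# Auxiliary loss to be added to the main (non-autoregressive) lipreading objective.
aux_ctc_loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths, blank=0)
print(aux_ctc_loss.item())
```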

10. DeText: A Deep Text Ranking Framework with BERT [PDF] Back to Contents
  Weiwei Guo, Xiaowei Liu, Sida Wang, Huiji Gao, Ananth Sankar, Zimeng Yang, Qi Guo, Liang Zhang, Bo Long, Bee-Chung Chen, Deepak Agarwal
Abstract: Ranking is the most important component in a search system. Most search systems deal with large amounts of natural language data, hence an effective ranking system requires a deep understanding of text semantics. Recently, deep learning based natural language processing (deep NLP) models have generated promising results on ranking systems. BERT is one of the most successful models that learn contextual embedding, which has been applied to capture complex query-document relations for search ranking. However, this is generally done by exhaustively interacting each query word with each document word, which is inefficient for online serving in search product systems. In this paper, we investigate how to build an efficient BERT-based ranking model for industry use cases. The solution is further extended to a general ranking framework, DeText, that is open sourced and can be applied to various ranking productions. Offline and online experiments of DeText on three real-world search systems present significant improvement over state-of-the-art approaches.
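One common way to address the serving-efficiency concern raised above is to encode queries and documents separately, so document embeddings can be precomputed offline and scoring online reduces to a vector similarity. The sketch below shows that general pattern with Hugging Face Transformers; it is not the DeText architecture or code, and the model name and example texts are assumptions.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = bert(**batch).last_hidden_state[:, 0]   # [CLS] representation
    return torch.nn.functional.normalize(out, dim=-1)

# Document vectors can be precomputed and cached; only the query is encoded online.
query = embed(["software engineer machine learning"])
docs = embed(["Senior ML engineer, search ranking team",
              "Accountant, payroll and reporting"])
scores = query @ docs.T                                # cosine-similarity ranking
print(scores)
```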
