
[arXiv Papers] Computation and Language 2020-08-06

Contents

1. Contextualized Translation of Automatically Segmented Speech [PDF] Abstract
2. Generalized Word Shift Graphs: A Method for Visualizing and Explaining Pairwise Comparisons Between Texts [PDF] Abstract
3. Computational linguistic assessment of textbook and online learning media by means of threshold concepts in business education [PDF] Abstract
4. Multiple Texts as a Limiting Factor in Online Learning: Quantifying (Dis-)similarities of Knowledge Networks across Languages [PDF] Abstract
5. Improving End-to-End Speech-to-Intent Classification with Reptile [PDF] Abstract
6. Trove: Ontology-driven weak supervision for medical entity classification [PDF] Abstract
7. An exploration of the encoding of grammatical gender in word embeddings [PDF] Abstract
8. Designing the Business Conversation Corpus [PDF] Abstract
9. Antibody Watch: Text Mining Antibody Specificity from the Literature [PDF] Abstract
10. Automated Topical Component Extraction Using Neural Network Attention Scores from Source-based Essay Scoring [PDF] Abstract
11. Word meaning in minds and machines [PDF] Abstract
12. Select, Extract and Generate: Neural Keyphrase Generation with Syntactic Guidance [PDF] Abstract
13. Aligning AI With Shared Human Values [PDF] Abstract
14. Glushkov's construction for functional subsequential transducers [PDF] Abstract
15. Hopfield Networks is All You Need [PDF] Abstract
16. Future Vector Enhanced LSTM Language Model for LVCSR [PDF] Abstract

Abstracts

1. Contextualized Translation of Automatically Segmented Speech [PDF] Back to Contents
  Marco Gaido, Mattia Antonino Di Gangi, Matteo Negri, Mauro Cettolo, Marco Turchi
Abstract: Direct speech-to-text translation (ST) models are usually trained on corpora segmented at sentence level, but at inference time they are commonly fed with audio split by a voice activity detector (VAD). Since VAD segmentation is not syntax-informed, the resulting segments do not necessarily correspond to well-formed sentences uttered by the speaker but, most likely, to fragments of one or more sentences. This segmentation mismatch degrades considerably the quality of ST models' output. So far, researchers have focused on improving audio segmentation towards producing sentence-like splits. In this paper, instead, we address the issue in the model, making it more robust to a different, potentially sub-optimal segmentation. To this aim, we train our models on randomly segmented data and compare two approaches: fine-tuning and adding the previous segment as context. We show that our context-aware solution is more robust to VAD-segmented input, outperforming a strong base model and the fine-tuning on different VAD segmentations of an English-German test set by up to 4.25 BLEU points.
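
To make the training setup concrete, the sketch below re-splits sentence-segmented training examples at random boundaries and pairs each resulting segment with its predecessor as context. It is illustrative only: the function name, the proportional splitting of the target side, and the length bounds are assumptions rather than the authors' implementation.

    import random

    def random_resegment_with_context(examples, min_len=50, max_len=300, seed=0):
        # `examples` is a list of (source_frames, target_tokens) pairs segmented at
        # sentence level; both sides are treated as plain Python lists here so the
        # idea is visible without any audio handling.
        rng = random.Random(seed)
        src = [f for s, _ in examples for f in s]
        tgt = [t for _, ts in examples for t in ts]
        ratio = len(tgt) / max(len(src), 1)
        segments, i = [], 0
        while i < len(src):
            j = min(i + rng.randint(min_len, max_len), len(src))
            # split the target side proportionally (a stand-in for a real alignment)
            ti, tj = round(i * ratio), round(j * ratio)
            segments.append((src[i:j], tgt[ti:tj]))
            i = j
        # context-aware variant: feed the previous segment alongside the current one
        return [(segments[k - 1] if k > 0 else ([], []), seg)
                for k, seg in enumerate(segments)]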

2. Generalized Word Shift Graphs: A Method for Visualizing and Explaining Pairwise Comparisons Between Texts [PDF] Back to Contents
  Ryan J. Gallagher, Morgan R. Frank, Lewis Mitchell, Aaron J. Schwartz, Andrew J. Reagan, Christopher M. Danforth, Peter Sheridan Dodds
Abstract: A common task in computational text analyses is to quantify how two corpora differ according to a measurement like word frequency, sentiment, or information content. However, collapsing the texts' rich stories into a single number is often conceptually perilous, and it is difficult to confidently interpret interesting or unexpected textual patterns without looming concerns about data artifacts or measurement validity. To better capture fine-grained differences between texts, we introduce generalized word shift graphs, visualizations which yield a meaningful and interpretable summary of how individual words contribute to the variation between two texts for any measure that can be formulated as a weighted average. We show that this framework naturally encompasses many of the most commonly used approaches for comparing texts, including relative frequencies, dictionary scores, and entropy-based measures like the Kullback-Leibler and Jensen-Shannon divergences. Through several case studies, we demonstrate how generalized word shift graphs can be flexibly applied across domains for diagnostic investigation, hypothesis generation, and substantive interpretation. By providing a detailed lens into textual shifts between corpora, generalized word shift graphs help computational social scientists, digital humanists, and other text analysis practitioners fashion more robust scientific narratives.
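
For orientation, any measure of the kind described here can be written as a frequency-weighted average of per-word scores; the elementary per-word decomposition below sums exactly to the difference between the two texts. The paper's generalized word shift graphs further split each contribution into frequency and score components, which is not reproduced in this minimal form.

    \Phi^{(i)} = \sum_{w} p_w^{(i)}\,\phi_w^{(i)}, \qquad
    \delta\Phi_w = p_w^{(2)}\phi_w^{(2)} - p_w^{(1)}\phi_w^{(1)}, \qquad
    \sum_{w} \delta\Phi_w = \Phi^{(2)} - \Phi^{(1)},

where p_w^{(i)} is the relative frequency of word w in text i and \phi_w^{(i)} is its score (a dictionary value, a surprisal, and so on).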

3. Computational linguistic assessment of textbook and online learning media by means of threshold concepts in business education [PDF] Back to Contents
  Andy Lücking, Sebastian Brückner, Giuseppe Abrami, Tolga Uslu, Alexander Mehler
Abstract: Threshold concepts are key terms in domain-based knowledge acquisition. They are regarded as building blocks of the conceptual development of domain knowledge within particular learners. From a linguistic perspective, however, threshold concepts are instances of specialized vocabularies, exhibiting particular linguistic features. Threshold concepts are typically used in specialized texts such as textbooks -- that is, within a formal learning environment. However, they also occur in informal learning environments like newspapers. In this article, a first approach is taken to combine both lines into an overarching research program - that is, to provide a computational linguistic assessment of different resources, including in particular online resources, by means of threshold concepts. To this end, the distributive profiles of 63 threshold concepts from business education (which have been collected from threshold concept research) have been investigated in three kinds of (German) resources, namely textbooks, newspapers, and Wikipedia. Wikipedia is (one of) the largest and most widely used online resources. We looked at the threshold concepts' frequency distribution, their compound distribution, and their network structure within the three kinds of resources. The two main findings can be summarized as follows: Firstly, the three kinds of resources can indeed be distinguished in terms of their threshold concepts' profiles. Secondly, Wikipedia definitely appears to be a formal learning resource.
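
As a rough illustration of the kind of profile compared across the three resource types, the sketch below computes relative frequencies and a crude compound profile for a list of threshold-concept terms. Treating every token that merely contains a concept as a compound is a strong simplification of German compound analysis, and the function name is invented for this example.

    from collections import Counter

    def concept_profile(tokens, concepts):
        # `tokens` is one tokenized corpus (textbooks, newspapers, or Wikipedia);
        # `concepts` is the list of lower-cased threshold-concept terms.
        counts = Counter(t.lower() for t in tokens)
        total = sum(counts.values())
        freq = {c: counts[c] / total for c in concepts}
        compounds = {c: sum(n for tok, n in counts.items()
                            if c in tok and tok != c) / total
                     for c in concepts}
        return freq, compounds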

4. Multiple Texts as a Limiting Factor in Online Learning: Quantifying (Dis-)similarities of Knowledge Networks across Languages [PDF] Back to Contents
  Alexander Mehler, Wahed Hemati, Pascal Welke, Maxim Konca, Tolga Uslu
Abstract: We test the hypothesis that the extent to which one obtains information on a given topic through Wikipedia depends on the language in which it is consulted. Controlling the size factor, we investigate this hypothesis for 25 subject areas. Since Wikipedia is a central part of the web-based information landscape, this indicates a language-related, linguistic bias. The article therefore deals with the question of whether Wikipedia exhibits this kind of linguistic relativity or not. From the perspective of educational science, the article develops a computational model of the information landscape from which multiple texts are drawn as typical input of web-based reading. For this purpose, it develops a hybrid model of intra- and intertextual similarity of different parts of the information landscape and tests this model on the example of 35 languages and corresponding Wikipedias. In this way the article builds a bridge between reading research, educational science, Wikipedia research and computational linguistics.

5. Improving End-to-End Speech-to-Intent Classification with Reptile [PDF] Back to Contents
  Yusheng Tian, Philip John Gorinski
Abstract: End-to-end spoken language understanding (SLU) systems have many advantages over conventional pipeline systems, but collecting in-domain speech data to train an end-to-end system is costly and time consuming. One question arises from this: how to train an end-to-end SLU with limited amounts of data? Many researchers have explored approaches that make use of other related data resources, typically by pre-training parts of the model on high-resource speech recognition. In this paper, we suggest improving the generalization performance of SLU models with a non-standard learning algorithm, Reptile. Though Reptile was originally proposed for model-agnostic meta learning, we argue that it can also be used to directly learn a target task and result in better generalization than conventional gradient descent. In this work, we employ Reptile to the task of end-to-end spoken intent classification. Experiments on four datasets of different languages and domains show improvement of intent prediction accuracy, both when Reptile is used alone and used in addition to pre-training.
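
A minimal sketch of applying Reptile directly to the single target task (rather than for meta-learning across tasks), as the paper proposes: run a few inner SGD steps from the current weights, then move the weights a fraction of the way toward the adapted ones. Hyperparameter values and the loop structure are placeholders, not the authors' settings.

    import copy
    import torch
    import torch.nn.functional as F

    def reptile_train(model, loader, outer_steps=1000, inner_steps=5,
                      inner_lr=1e-3, outer_lr=0.1):
        batches = iter(loader)
        for _ in range(outer_steps):
            inner = copy.deepcopy(model)                    # start from the current weights
            opt = torch.optim.SGD(inner.parameters(), lr=inner_lr)
            for _ in range(inner_steps):                    # k ordinary SGD steps
                try:
                    x, y = next(batches)
                except StopIteration:
                    batches = iter(loader)
                    x, y = next(batches)
                loss = F.cross_entropy(inner(x), y)         # intent-classification loss
                opt.zero_grad()
                loss.backward()
                opt.step()
            with torch.no_grad():                           # Reptile outer update:
                for p, q in zip(model.parameters(), inner.parameters()):
                    p += outer_lr * (q - p)                 # move toward the adapted weights
        return model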

6. Trove: Ontology-driven weak supervision for medical entity classification [PDF] Back to Contents
  Jason A. Fries, Ethan Steinberg, Saelig Khattar, Scott L. Fleming, Jose Posada, Alison Callahan, Nigam H. Shah
Abstract: Motivation: Recognizing named entities (NER) and their associated attributes like negation are core tasks in natural language processing. However, manually labeling data for entity tasks is time consuming and expensive, creating barriers to using machine learning in new medical applications. Weakly supervised learning, which automatically builds imperfect training sets from low cost, less accurate labeling rules, offers a potential solution. Medical ontologies are compelling sources for generating labels, however combining multiple ontologies without ground truth data creates challenges due to label noise introduced by conflicting entity definitions. Key questions remain on the extent to which weakly supervised entity classification can be automated using ontologies, or how much additional task-specific rule engineering is required for state-of-the-art performance. Also unclear is how pre-trained language models, such as BioBERT, improve the ability to generalize from imperfectly labeled data. Results: We present Trove, a framework for weakly supervised entity classification using medical ontologies. We report state-of-the-art, weakly supervised performance on two NER benchmark datasets and establish new baselines for two entity classification tasks in clinical text. We perform within an average of 3.5 F1 points (4.2%) of NER classifiers trained with hand-labeled data. Automatically learning label source accuracies to correct for label noise provided an average improvement of 3.9 F1 points. BioBERT provided an average improvement of 0.9 F1 points. We measure the impact of combining large numbers of ontologies and present a case study on rapidly building classifiers for COVID-19 clinical tasks. Our framework demonstrates how a wide range of medical entity classifiers can be quickly constructed using weak supervision and without requiring manually-labeled training data.
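
The sketch below shows the general shape of ontology-driven weak supervision: each ontology's term dictionary becomes a labeling function over candidate spans, and the noisy votes are combined into training labels for the end model. Trove learns per-source accuracies with a probabilistic label model; the fixed weighted vote here merely stands in for that step, and all names are invented for the example.

    import numpy as np

    ABSTAIN = -1

    def ontology_lf(term_to_class):
        # turn one ontology's term dictionary into a labeling function
        return lambda span: term_to_class.get(span.lower(), ABSTAIN)

    def weak_labels(spans, lfs, n_classes=2, weights=None):
        votes = np.zeros((len(spans), n_classes))
        weights = weights if weights is not None else [1.0] * len(lfs)
        for lf, w in zip(lfs, weights):
            for i, span in enumerate(spans):
                y = lf(span)
                if y != ABSTAIN:
                    votes[i, y] += w          # accumulate (weighted) ontology votes
        # noisy labels used to train the end model; spans with no votes
        # default to class 0 in this toy version
        return votes.argmax(axis=1)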

7. An exploration of the encoding of grammatical gender in word embeddings [PDF] Back to Contents
  Hartger Veeman, Ali Basirat
Abstract: The vector representation of words, known as word embeddings, has opened a new research approach in the study of languages. These representations can capture different types of information about words. The grammatical gender of nouns is a typical classification of nouns based on their formal and semantic properties. The study of grammatical gender based on word embeddings can give insight into discussions on how grammatical genders are determined. In this research, we compare different sets of word embeddings according to the accuracy of a neural classifier determining the grammatical gender of nouns. It is found that the information about grammatical gender is encoded differently in Swedish, Danish, and Dutch embeddings. Our experimental results on the contextualized embeddings pointed out that adding more contextual (semantic) information to embeddings is detrimental to the classifier's performance. We also observed that removing morpho-syntactic features such as articles from the training corpora of embeddings decreases the classification performance dramatically, indicating a large portion of the information is encoded in the relationship between nouns and articles.
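
A minimal version of the probing setup described here, assuming pre-computed noun embeddings: a small feed-forward classifier predicts a noun's grammatical gender from its embedding, and accuracy on held-out nouns measures how strongly gender is encoded. Layer sizes and dimensions are illustrative, not the paper's configuration.

    import torch.nn as nn

    class GenderProbe(nn.Module):
        def __init__(self, emb_dim=300, n_genders=3, hidden=128):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(emb_dim, hidden),
                                     nn.ReLU(),
                                     nn.Linear(hidden, n_genders))

        def forward(self, embedding):
            # logits over the language's gender classes; training is an ordinary
            # supervised loop over (noun embedding, gender label) pairs
            return self.net(embedding)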

8. Designing the Business Conversation Corpus [PDF] Back to Contents
  Matīss Rikters, Ryokan Ri, Tong Li, Toshiaki Nakazawa
Abstract: While the progress of machine translation of written text has come far in the past several years thanks to the increasing availability of parallel corpora and corpora-based training technologies, automatic translation of spoken text and dialogues remains challenging even for modern systems. In this paper, we aim to boost the machine translation quality of conversational texts by introducing a newly constructed Japanese-English business conversation parallel corpus. A detailed analysis of the corpus is provided along with challenging examples for automatic translation. We also experiment with adding the corpus in a machine translation training scenario and show how the resulting system benefits from its use.

9. Antibody Watch: Text Mining Antibody Specificity from the Literature [PDF] Back to Contents
  Chun-Nan Hsu, Chia-Hui Chang, Thamolwan Poopradubsil, Amanda Lo, Karen A. William, Ko-Wei Lin, Anita Bandrowski, Ibrahim Burak Ozyurt, Jeffrey S. Grethe, Maryann E. Martone
Abstract: Motivation: Antibodies are widely used reagents to test for expression of proteins. However, they might not always reliably produce results when they do not specifically bind to the target proteins that their providers designed them for, leading to unreliable research results. While many proposals have been developed to deal with the problem of antibody specificity, they may not scale well to deal with the millions of antibodies that are available to researchers. In this study, we investigate the feasibility of automatically generating a report to alert users of problematic antibodies by extracting statements about antibody specificity reported in the literature. Results: Our goal is to construct an "Antibody Watch" knowledge base containing supporting statements of problematic antibodies. We developed a deep neural network system and tested its performance with a corpus of more than two thousand articles that reported uses of antibodies. We divided the problem into two tasks. Given an input article, the first task is to identify snippets about antibody specificity and classify if the snippets report that any antibody exhibits nonspecificity, and thus is problematic. The second task is to link each of these snippets to one or more antibodies mentioned in the snippet. The experimental evaluation shows that our system can accurately perform both classification and linking tasks with weighted F-scores over 0.925 and 0.923, respectively, and 0.914 overall when combined to complete the joint task. We leveraged Research Resource Identifiers (RRID) to precisely identify antibodies linked to the extracted specificity snippets. The result shows that it is feasible to construct a reliable knowledge base about problematic antibodies by text mining.

10. Automated Topical Component Extraction Using Neural Network Attention Scores from Source-based Essay Scoring [PDF] Back to Contents
  Haoran Zhang, Diane Litman
Abstract: While automated essay scoring (AES) can reliably grade essays at scale, automated writing evaluation (AWE) additionally provides formative feedback to guide essay revision. However, a neural AES typically does not provide useful feature representations for supporting AWE. This paper presents a method for linking AWE and neural AES, by extracting Topical Components (TCs) representing evidence from a source text using the intermediate output of attention layers. We evaluate performance using a feature-based AES requiring TCs. Results show that performance is comparable whether using automatically or manually constructed TCs for 1) representing essays as rubric-based features, 2) grading essays.
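
A toy version of the extraction step: given per-token attention weights taken from the trained essay-scoring model, keep the most-attended source words as topical component candidates. The top-k cutoff and the function name are simplifications for illustration.

    def extract_topical_words(tokens, attention_weights, top_k=10):
        # rank source words by the attention the AES model assigns them
        ranked = sorted(zip(tokens, attention_weights),
                        key=lambda pair: pair[1], reverse=True)
        return [tok for tok, _ in ranked[:top_k]]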

11. Word meaning in minds and machines [PDF] Back to Contents
  Brenden M. Lake, Gregory L. Murphy
Abstract: Machines show an increasingly broad set of linguistic competencies, thanks to recent progress in Natural Language Processing (NLP). Many algorithms stem from past computational work in psychology, raising the question of whether they understand words as people do. In this paper, we compare how humans and machines represent the meaning of words. We argue that contemporary NLP systems are promising models of human word similarity, but they fall short in many other respects. Current models are too strongly linked to the text-based patterns in large corpora, and too weakly linked to the desires, goals, and beliefs that people use words in order to express. Word meanings must also be grounded in vision and action, and capable of flexible combinations, in ways that current systems are not. We pose concrete challenges for developing machines with a more human-like, conceptual basis for word meaning. We also discuss implications for cognitive science and NLP.

12. Select, Extract and Generate: Neural Keyphrase Generation with Syntactic Guidance [PDF] Back to Contents
  Wasi Uddin Ahmad, Xiao Bai, Soomin Lee, Kai-Wei Chang
Abstract: In recent years, deep neural sequence-to-sequence framework has demonstrated promising results in keyphrase generation. However, processing long documents using such deep neural networks requires high computational resources. To reduce the computational cost, the documents are typically truncated before given as inputs. As a result, the models may miss essential points conveyed in a document. Moreover, most of the existing methods are either extractive (identify important phrases from the document) or generative (generate phrases word by word), and hence they do not benefit from the advantages of both modeling techniques. To address these challenges, we propose \emph{SEG-Net}, a neural keyphrase generation model that is composed of two major components, (1) a selector that selects the salient sentences in a document, and (2) an extractor-generator that jointly extracts and generates keyphrases from the selected sentences. SEG-Net uses a self-attentive architecture, known as, \emph{Transformer} as the building block with a couple of uniqueness. First, SEG-Net incorporates a novel \emph{layer-wise} coverage attention to summarize most of the points discussed in the target document. Second, it uses an \emph{informed} copy attention mechanism to encourage focusing on different segments of the document during keyphrase extraction and generation. Besides, SEG-Net jointly learns keyphrase generation and their part-of-speech tag prediction, where the later provides syntactic supervision to the former. The experimental results on seven keyphrase generation benchmarks from scientific and web documents demonstrate that SEG-Net outperforms the state-of-the-art neural generative methods by a large margin in both domains.

13. Aligning AI With Shared Human Values [PDF] Back to Contents
  Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, Jacob Steinhardt
Abstract: We show how to assess a language model's knowledge of basic concepts of morality. We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality. Models predict widespread moral judgments about diverse text scenarios. This requires connecting physical and social world knowledge to value judgements, a capability that may enable us to filter out needlessly inflammatory chatbot outputs or eventually regularize open-ended reinforcement learning agents. With the ETHICS dataset, we find that current language models have a promising but incomplete understanding of basic ethical knowledge. Our work shows that progress can be made on machine ethics today, and it provides a steppingstone toward AI that is aligned with human values.

14. Glushkov's construction for functional subsequential transducers [PDF] Back to Contents
  Aleksander Mendoza-Drosik
Abstract: Glushkov's construction has many interesting properties; they become even more evident when applied to transducers. This article strives to show the unusual link between functional subsequential finite state transducers and Glushkov's construction. The methods and algorithms presented here were used to implement a compiler of regular expressions.

15. Hopfield Networks is All You Need [PDF] Back to Contents
  Hubert Ramsauer, Bernhard Schäfl, Johannes Lehner, Philipp Seidl, Michael Widrich, Lukas Gruber, Markus Holzleitner, Milena Pavlović, Geir Kjetil Sandve, Victor Greiff, David Kreil, Michael Kopp, Günter Klambauer, Johannes Brandstetter, Sepp Hochreiter
Abstract: We show that the transformer attention mechanism is the update rule of a modern Hopfield network with continuous states. This new Hopfield network can store exponentially (with the dimension) many patterns, converges with one update, and has exponentially small retrieval errors. The number of stored patterns is traded off against convergence speed and retrieval error. The new Hopfield network has three types of energy minima (fixed points of the update): (1) global fixed point averaging over all patterns, (2) metastable states averaging over a subset of patterns, and (3) fixed points which store a single pattern. Transformer and BERT models operate in their first layers preferably in the global averaging regime, while they operate in higher layers in metastable states. The gradient in transformers is maximal for metastable states, is uniformly distributed for global averaging, and vanishes for a fixed point near a stored pattern. Using the Hopfield network interpretation, we analyzed learning of transformer and BERT models. Learning starts with attention heads that average and then most of them switch to metastable states. However, the majority of heads in the first layers still averages and can be replaced by averaging, e.g. our proposed Gaussian weighting. In contrast, heads in the last layers steadily learn and seem to use metastable states to collect information created in lower layers. These heads seem to be a promising target for improving transformers. Neural networks with Hopfield networks outperform other methods on immune repertoire classification, where the Hopfield net stores several hundreds of thousands of patterns. We provide a new PyTorch layer called "Hopfield", which allows to equip deep learning architectures with modern Hopfield networks as a new powerful concept comprising pooling, memory, and attention. GitHub: this https URL
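
The update rule referred to here fits in a few lines; the retrieval step below is softmax attention over the stored patterns, following the formulation ξ_new = X softmax(β Xᵀ ξ) with patterns stored as the columns of X (a minimal NumPy sketch, not the library layer mentioned in the abstract).

    import numpy as np

    def hopfield_retrieve(X, xi, beta=1.0, n_updates=1):
        # X: d x N matrix whose columns are the stored patterns
        # xi: d-dimensional state (query) vector
        for _ in range(n_updates):
            scores = beta * X.T @ xi          # similarity to every stored pattern
            p = np.exp(scores - scores.max())
            p /= p.sum()                      # softmax over patterns
            xi = X @ p                        # new state: attention-weighted patterns
        return xi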

16. Future Vector Enhanced LSTM Language Model for LVCSR [PDF] Back to Contents
  Qi Liu, Yanmin Qian, Kai Yu
Abstract: Language models (LM) play an important role in large vocabulary continuous speech recognition (LVCSR). However, traditional language models only predict the next single word given the history, while consecutive predictions over a sequence of words are usually demanded and useful in LVCSR. The mismatch between single-word prediction during training and long-term sequence prediction at inference may lead to performance degradation. In this paper, a novel enhanced long short-term memory (LSTM) LM using a future vector is proposed. In addition to the given history, the rest of the sequence is also embedded as future vectors. This future vector can be incorporated into the LSTM LM, so it has the ability to model much longer-term sequence-level information. Experiments show that the proposed new LSTM LM achieves better BLEU scores for long-term sequence prediction. For speech recognition rescoring, although the proposed LSTM LM obtains only very slight gains on its own, it appears to be strongly complementary to the conventional LSTM LM. Rescoring using both the new and conventional LSTM LMs achieves a very large improvement in word error rate.
