
[arXiv Papers] Computation and Language 2020-06-09

Contents

1. What's the Difference Between Professional Human and Machine Translation? A Blind Multi-language Study on Domain-specific MT [PDF] Abstract
2. Modeling Discourse Structure for Document-level Neural Machine Translation [PDF] Abstract
3. CycleGT: Unsupervised Graph-to-Text and Text-to-Graph Generation via Cycle Training [PDF] Abstract
4. Misinformation has High Perplexity [PDF] Abstract
5. ColdGANs: Taming Language GANs with Cautious Sampling Strategies [PDF] Abstract
6. A Comprehensive Survey on Aspect Based Sentiment Analysis [PDF] Abstract
7. CS-Embed-francesita at SemEval-2020 Task 9: The effectiveness of code-switched word embeddings for sentiment analysis [PDF] Abstract
8. Towards an Argument Mining Pipeline Transforming Texts to Argument Graphs [PDF] Abstract
9. Combining word embeddings and convolutional neural networks to detect duplicated questions [PDF] Abstract
10. Tensors over Semirings for Latent-Variable Weighted Logic Programs [PDF] Abstract
11. Pre-training Polish Transformer-based Language Models at Scale [PDF] Abstract
12. BERT Loses Patience: Fast and Robust Inference with Early Exit [PDF] Abstract
13. Interactive Extractive Search over Biomedical Corpora [PDF] Abstract
14. Language Models as Fact Checkers? [PDF] Abstract
15. Growing Together: Modeling Human Language Learning With n-Best Multi-Checkpoint Machine Translation [PDF] Abstract
16. Semantic Loss Application to Entity Relation Recognition [PDF] Abstract
17. A Multitask Learning Approach for Diacritic Restoration [PDF] Abstract
18. Medical Concept Normalization in User Generated Texts by Learning Target Concept Embeddings [PDF] Abstract
19. Generative Adversarial Phonology: Modeling unsupervised phonetic and phonological learning with neural networks [PDF] Abstract
20. ValNorm: A New Word Embedding Intrinsic Evaluation Method Reveals Valence Biases are Consistent Across Languages and Over Decades [PDF] Abstract
21. A Cross-Task Analysis of Text Span Representations [PDF] Abstract
22. Challenges and Thrills of Legal Arguments [PDF] Abstract
23. Relation of the Relations: A New Paradigm of the Relation Extraction Problem [PDF] Abstract
24. Accelerating Natural Language Understanding in Task-Oriented Dialog [PDF] Abstract
25. UDPipe at EvaLatin 2020: Contextualized Embeddings and Treebank Embeddings [PDF] Abstract
26. Prague Dependency Treebank -- Consolidated 1.0 [PDF] Abstract
27. DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations [PDF] Abstract
28. DeBERTa: Decoding-enhanced BERT with Disentangled Attention [PDF] Abstract
29. Filtered Inner Product Projection for Multilingual Embedding Alignment [PDF] Abstract
30. MultiSpeech: Multi-Speaker Text to Speech with Transformer [PDF] Abstract
31. WaveNODE: A Continuous Normalizing Flow for Speech Synthesis [PDF] Abstract
32. FastSpeech 2: Fast and High-Quality End-to-End Text-to-Speech [PDF] Abstract
33. A non-causal FFTNet architecture for speech enhancement [PDF] Abstract
34. Characterizing Sociolinguistic Variation in the Competing Vaccination Communities [PDF] Abstract
35. Parametric Representation for Singing Voice Synthesis: a Comparative Evaluation [PDF] Abstract
36. Maximum Phase Modeling for Sparse Linear Prediction of Speech [PDF] Abstract
37. Analysis and Synthesis of Hypo and Hyperarticulated Speech [PDF] Abstract
38. Incorporating Pragmatic Reasoning Communication into Emergent Language [PDF] Abstract
39. Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases [PDF] Abstract
40. MAGNet: Multi-Region Attention-Assisted Grounding of Natural Language Queries at Phrase Level [PDF] Abstract
41. Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation [PDF] Abstract
42. BERT-XML: Large Scale Automated ICD Coding Using BERT Pretraining [PDF] Abstract

Abstracts

1. What's the Difference Between Professional Human and Machine Translation? A Blind Multi-language Study on Domain-specific MT [PDF] Back to Contents
  Lukas Fischer, Samuel Läubli
Abstract: Machine translation (MT) has been shown to produce a number of errors that require human post-editing, but the extent to which professional human translation (HT) contains such errors has not yet been compared to MT. We compile pre-translated documents in which MT and HT are interleaved, and ask professional translators to flag errors and post-edit these documents in a blind evaluation. We find that the post-editing effort for MT segments is only higher in two out of three language pairs, and that the number of segments with wrong terminology, omissions, and typographical problems is similar in HT.

2. Modeling Discourse Structure for Document-level Neural Machine Translation [PDF] Back to Contents
  Junxuan Chen, Xiang Li, Jiarui Zhang, Chulun Zhou, Jianwei Cui, Bin Wang, Jinsong Su
Abstract: Recently, document-level neural machine translation (NMT) has become a hot topic in the machine translation community. Despite its success, most existing studies ignore the discourse structure information of the input document to be translated, which has been shown to be effective in other tasks. In this paper, we propose to improve document-level NMT with the aid of discourse structure information. Our encoder is based on a hierarchical attention network (HAN). Specifically, we first parse the input document to obtain its discourse structure. Then, we introduce a Transformer-based path encoder to embed the discourse structure information of each word. Finally, we combine the discourse structure information with the word embedding before it is fed into the encoder. Experimental results on the English-to-German dataset show that our model can significantly outperform both Transformer and Transformer+HAN.

3. CycleGT: Unsupervised Graph-to-Text and Text-to-Graph Generation via Cycle Training [PDF] Back to Contents
  Qipeng Guo, Zhijing Jin, Xipeng Qiu, Weinan Zhang, David Wipf, Zheng Zhang
Abstract: Two important tasks at the intersection of knowledge graphs and natural language processing are graph-to-text (G2T) and text-to-graph (T2G) conversion. Due to the difficulty and high cost of data collection, the supervised data available in the two fields are usually on the magnitude of tens of thousands, for example, 18K in the WebNLG dataset, which is far fewer than the millions of data for other tasks such as machine translation. Consequently, deep learning models in these two fields suffer largely from scarce training data. This work presents the first attempt to unsupervised learning of T2G and G2T via cycle training. We present CycleGT, an unsupervised training framework that can bootstrap from fully non-parallel graph and text datasets, iteratively back translate between the two forms, and use a novel pretraining strategy. Experiments on the benchmark WebNLG dataset show that, impressively, our unsupervised model trained on the same amount of data can achieve performance on par with the supervised models. This validates our framework as an effective approach to overcome the data scarcity problem in the fields of G2T and T2G.

4. Misinformation has High Perplexity [PDF] Back to Contents
  Nayeon Lee, Yejin Bang, Andrea Madotto, Pascale Fung
Abstract: Debunking misinformation is an important and time-critical task as there could be adverse consequences when misinformation is not quashed promptly. However, the usual supervised approach to debunking via misinformation classification requires human-annotated data and is not suited to the fast time-frame of newly emerging events such as the COVID-19 outbreak. In this paper, we postulate that misinformation itself has higher perplexity compared to truthful statements, and propose to leverage the perplexity to debunk false claims in an unsupervised manner. First, we extract reliable evidence from scientific and news sources according to sentence similarity to the claims. Second, we prime a language model with the extracted evidence and finally evaluate the correctness of given claims based on the perplexity scores at debunking time. We construct two new COVID-19-related test sets, one is scientific, and another is political in content, and empirically verify that our system performs favorably compared to existing systems. We are releasing these datasets publicly to encourage more research in debunking misinformation on COVID-19 and other topics.
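For concreteness, the scoring step can be sketched as below: a causal language model is primed with retrieved evidence, and the perplexity of the appended claim is used as the misinformation signal. This is a minimal illustration rather than the authors' released pipeline; the `gpt2` checkpoint, the example strings, and the simple string concatenation are assumptions, and the evidence-retrieval step is omitted.

```python
# Sketch: perplexity of a claim, conditioned on evidence text as a prefix.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean per-token cross-entropy
    return torch.exp(loss).item()

evidence = "Example evidence sentence retrieved from a reliable source."  # placeholder
claim = "Example claim to be verified."                                   # placeholder
score = perplexity(evidence + " " + claim)  # higher perplexity -> more likely misinformation
```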

5. ColdGANs: Taming Language GANs with Cautious Sampling Strategies [PDF] Back to Contents
  Thomas Scialom, Paul-Alexis Dray, Sylvain Lamprier, Benjamin Piwowarski, Jacopo Staiano
Abstract: Training regimes based on Maximum Likelihood Estimation (MLE) suffer from known limitations, often leading to poorly generated text sequences. At the root of these limitations is the mismatch between training and inference, i.e. the so-called exposure bias, exacerbated by considering only the reference texts as correct, while in practice several alternative formulations could be as good. Generative Adversarial Networks (GANs) can mitigate those limitations but the discrete nature of text has hindered their application to language generation: the approaches proposed so far, based on Reinforcement Learning, have been shown to underperform MLE. Departing from previous works, we analyze the exploration step in GANs applied to text generation, and show how classical sampling results in unstable training. We propose to consider alternative exploration strategies in a GAN framework that we name ColdGANs, where we force the sampling to be close to the distribution modes to get smoother learning dynamics. For the first time, to the best of our knowledge, the proposed language GANs compare favorably to MLE, and obtain improvements over the state-of-the-art on three generative tasks, namely unconditional text generation, question generation, and abstractive summarization.

6. A Comprehensive Survey on Aspect Based Sentiment Analysis [PDF] Back to Contents
  Kaustubh Yadav
Abstract: Aspect Based Sentiment Analysis (ABSA) is the sub-field of Natural Language Processing that deals with splitting data into aspects and finally extracting the sentiment information. ABSA is known to provide more information about the context than general sentiment analysis. In this study, our aim is to explore the various methodologies practiced while performing ABSA and to provide a comparative study. This survey paper discusses various solutions in depth, gives a comparison between them, and is conveniently divided into sections to give a holistic view of the process.

7. CS-Embed-francesita at SemEval-2020 Task 9: The effectiveness of code-switched word embeddings for sentiment analysis [PDF] Back to Contents
  Frances Adriana Laureano De Leon, Florimond Guéniat, Harish Tayyar Madabushi
Abstract: The growing popularity and applications of sentiment analysis of social media posts have naturally led to sentiment analysis of posts written in multiple languages, a practice known as code-switching. While recent research into code-switched posts has focused on the use of multilingual word embeddings, these embeddings were not trained on code-switched data. In this work, we present word embeddings trained on code-switched tweets, specifically those that make use of Spanish and English, known as Spanglish. We explore the embedding space to discover how they capture the meanings of words in both languages. We test the effectiveness of these embeddings by participating in SemEval 2020 Task 9: Sentiment Analysis on Code-Mixed Social Media Text. We utilise them to train a sentiment classifier that achieves an F-1 score of 0.722. This is higher than the competition baseline of 0.656, and our team ranks 14th out of the 23 participating teams beating the baseline.

8. Towards an Argument Mining Pipeline Transforming Texts to Argument Graphs [PDF] Back to Contents
  Mirko Lenz, Premtim Sahitaj, Sean Kallenberg, Christopher Coors, Lorik Dumani, Ralf Schenkel, Ralph Bergmann
Abstract: This paper targets the automated extraction of components of argumentative information and their relations from natural language text. Moreover, we address a current lack of systems to provide complete argumentative structure from arbitrary natural language text for general usage. We present an argument mining pipeline as a universally applicable approach for transforming German and English language texts to graph-based argument representations. We also introduce new methods for evaluating the results based on existing benchmark argument structures. Our results show that the generated argument graphs can be beneficial to detect new connections between different statements of an argumentative text. Our pipeline implementation is publicly available on GitHub.

9. Combining word embeddings and convolutional neural networks to detect duplicated questions [PDF] Back to Contents
  Yoan Dimitrov
Abstract: Detecting semantic similarities between sentences is still a challenge today due to the ambiguity of natural languages. In this work, we propose a simple approach to identifying semantically similar questions by combining the strengths of word embeddings and Convolutional Neural Networks (CNNs). In addition, we demonstrate how the cosine similarity metric can be used to effectively compare feature vectors. Our network is trained on the Quora dataset, which contains over 400k question pairs. We experiment with different embedding approaches such as Word2Vec, Fasttext, and Doc2Vec and investigate the effects these approaches have on model performance. Our model achieves competitive results on the Quora dataset and complements the well-established evidence that CNNs can be utilized for paraphrase detection tasks.
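The comparison step can be pictured with a minimal sketch: two sentence-level feature vectors (in the paper, produced by a CNN over word embeddings) are scored with cosine similarity. The vectors, dimensionality, and decision threshold below are illustrative placeholders, not values from the paper.

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

# Stand-ins for the CNN feature vectors of two questions.
q1_vec = np.random.rand(300)
q2_vec = np.random.rand(300)
is_duplicate = cosine_similarity(q1_vec, q2_vec) > 0.8  # threshold is illustrative
```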

10. Tensors over Semirings for Latent-Variable Weighted Logic Programs [PDF] Back to Contents
  Esma Balkir, Daniel Gildea, Shay Cohen
Abstract: Semiring parsing is an elegant framework for describing parsers by using semiring weighted logic programs. In this paper we present a generalization of this concept: latent-variable semiring parsing. With our framework, any semiring weighted logic program can be latentified by transforming weights from scalar values of a semiring to rank-n arrays, or tensors, of semiring values, allowing the modelling of latent variables within the semiring parsing framework. Semiring is too strong a notion when dealing with tensors, and we have to resort to a weaker structure: a partial semiring. We prove that this generalization preserves all the desired properties of the original semiring framework while strictly increasing its expressiveness.

11. Pre-training Polish Transformer-based Language Models at Scale [PDF] Back to Contents
  Sławomir Dadas, Michał Perełkiewicz, Rafał Poświata
Abstract: Transformer-based language models are now widely used in Natural Language Processing (NLP). This statement is especially true for English, for which many pre-trained models utilizing transformer-based architectures have been published in recent years. This has driven forward the state of the art for a variety of standard NLP tasks such as classification, regression, and sequence labeling, as well as text-to-text tasks, such as machine translation, question answering, or summarization. The situation has been different for low-resource languages such as Polish, however. Although some transformer-based language models for Polish are available, none of them have come close to the scale, in terms of corpus size and the number of parameters, of the largest English-language models. In this study, we present two language models for Polish based on the popular BERT architecture. The larger model was trained on a dataset consisting of over 1 billion Polish sentences, or 135GB of raw text. We describe our methodology for collecting the data, preparing the corpus, and pre-training the model. We then evaluate our models on thirteen Polish linguistic tasks, and demonstrate improvements over previous approaches on eleven of them.

12. BERT Loses Patience: Fast and Robust Inference with Early Exit [PDF] Back to Contents
  Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei
Abstract: In this paper, we propose Patience-based Early Exit, a straightforward yet effective inference method that can be used as a plug-and-play technique to simultaneously improve the efficiency and robustness of a pretrained language model (PLM). To achieve this, our approach couples an internal-classifier with each layer of a PLM and dynamically stops inference when the intermediate predictions of the internal classifiers do not change for a pre-defined number of steps. Our approach improves inference efficiency as it allows the model to make a prediction with fewer layers. Meanwhile, experimental results with an ALBERT model show that our method can improve the accuracy and robustness of the model by preventing it from overthinking and exploiting multiple classifiers for prediction, yielding a better accuracy-speed trade-off compared to existing early exit methods.
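A minimal sketch of the patience criterion described above, assuming per-layer logits from internal classifiers are already available; the function name and the default patience value are illustrative, not taken from the released implementation.

```python
import torch

def patient_predict(layer_logits, patience: int = 3):
    """layer_logits: per-layer classifier logits, ordered shallow to deep."""
    prev_label, streak, depth = None, 0, 0
    for depth, logits in enumerate(layer_logits, start=1):
        label = int(torch.argmax(logits, dim=-1))
        streak = streak + 1 if label == prev_label else 1
        prev_label = label
        if streak >= patience:      # prediction stable for `patience` layers: exit early
            return label, depth
    return prev_label, depth        # fell through: all layers were used
```

The early exit saves computation because deeper layers are simply never evaluated once the intermediate prediction stops changing.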

13. Interactive Extractive Search over Biomedical Corpora [PDF] Back to Contents
  Hillel Taub-Tabib, Micah Shlain, Shoval Sadde, Dan Lahav, Matan Eyal, Yaara Cohen, Yoav Goldberg
Abstract: We present a system that allows life-science researchers to search a linguistically annotated corpus of scientific texts using patterns over dependency graphs, as well as using patterns over token sequences and a powerful variant of boolean keyword queries. In contrast to previous attempts to dependency-based search, we introduce a light-weight query language that does not require the user to know the details of the underlying linguistic representations, and instead to query the corpus by providing an example sentence coupled with simple markup. Search is performed at an interactive speed due to efficient linguistic graph-indexing and retrieval engine. This allows for rapid exploration, development and refinement of user queries. We demonstrate the system using example workflows over two corpora: the PubMed corpus including 14,446,243 PubMed abstracts and the CORD-19 dataset, a collection of over 45,000 research papers focused on COVID-19 research. The system is publicly available at this https URL

14. Language Models as Fact Checkers? [PDF] Back to Contents
  Nayeon Lee, Belinda Z. Li, Sinong Wang, Wen-tau Yih, Hao Ma, Madian Khabsa
Abstract: Recent work has suggested that language models (LMs) store both common-sense and factual knowledge learned from pre-training data. In this paper, we leverage this implicit knowledge to create an effective end-to-end fact checker using a solely a language model, without any external knowledge or explicit retrieval components. While previous work on extracting knowledge from LMs have focused on the task of open-domain question answering, to the best of our knowledge, this is the first work to examine the use of language models as fact checkers. In a closed-book setting, we show that our zero-shot LM approach outperforms a random baseline on the standard FEVER task, and that our fine-tuned LM compares favorably with standard baselines. Though we do not ultimately outperform methods which use explicit knowledge bases, we believe our exploration shows that this method is viable and has much room for exploration.

15. Growing Together: Modeling Human Language Learning With n-Best Multi-Checkpoint Machine Translation [PDF] Back to Contents
  El Moatez Billah Nagoudi, Muhammad Abdul-Mageed, Hasan Cavusoglu
Abstract: We describe our submission to the 2020 Duolingo Shared Task on Simultaneous Translation And Paraphrase for Language Education (STAPLE) (Mayhew et al., 2020). We view MT models at various training stages (i.e., checkpoints) as human learners at different levels. Hence, we employ an ensemble of multi-checkpoints from the same model to generate translation sequences with various levels of fluency. From each checkpoint, for our best model, we sample n-Best sequences (n=10) with a beam width =100. We achieve 37.57 macro F1 with a 6 checkpoint model ensemble on the official English to Portuguese shared task test data, outperforming a baseline Amazon translation system of 21.30 macro F1 and ultimately demonstrating the utility of our intuitive method.

16. Semantic Loss Application to Entity Relation Recognition [PDF] Back to Contents
  Venkata Sasank Pagolu
Abstract: Usually, entity relation recognition systems either use a pipe-lined model that treats the entity tagging and relation identification as separate tasks or a joint model that simultaneously identifies the relation and entities. This paper compares these two general approaches for the entity relation recognition. State-of-the-art entity relation recognition systems are built using deep recurrent neural networks which often does not capture the symbolic knowledge or the logical constraints in the problem. The main contribution of this paper is an end-to-end neural model for joint entity relation extraction which incorporates a novel loss function. This novel loss function encodes the constraint information in the problem to guide the model training effectively. We show that addition of this loss function to the existing typical loss functions has a positive impact over the performance of the models. This model is truly end-to-end, requires no feature engineering and easily extensible. Extensive experimentation has been conducted to evaluate the significance of capturing symbolic knowledge for natural language understanding. Models using this loss function are observed to be outperforming their counterparts and converging faster. Experimental results in this work suggest the use of this methodology for other language understanding applications.

17. A Multitask Learning Approach for Diacritic Restoration [PDF] Back to Contents
  Sawsan Alqahtani, Ajay Mishra, Mona Diab
Abstract: In many languages like Arabic, diacritics are used to specify pronunciations as well as meanings. Such diacritics are often omitted in written text, increasing the number of possible pronunciations and meanings for a word. This results in a more ambiguous text making computational processing on such text more difficult. Diacritic restoration is the task of restoring missing diacritics in the written text. Most state-of-the-art diacritic restoration models are built on character level information which helps generalize the model to unseen data, but presumably lose useful information at the word level. Thus, to compensate for this loss, we investigate the use of multi-task learning to jointly optimize diacritic restoration with related NLP problems namely word segmentation, part-of-speech tagging, and syntactic diacritization. We use Arabic as a case study since it has sufficient data resources for tasks that we consider in our joint modeling. Our joint models significantly outperform the baselines and are comparable to the state-of-the-art models that are more complex relying on morphological analyzers and/or a lot more data (e.g. dialectal data).

18. Medical Concept Normalization in User Generated Texts by Learning Target Concept Embeddings [PDF] Back to Contents
  Katikapalli Subramanyam Kalyan, S.Sangeetha
Abstract: Medical concept normalization helps in discovering standard concepts in free-form text i.e., maps health-related mentions to standard concepts in a vocabulary. It is much beyond simple string matching and requires a deep semantic understanding of concept mentions. Recent research approach concept normalization as either text classification or text matching. The main drawback in existing a) text classification approaches is ignoring valuable target concepts information in learning input concept mention representation b) text matching approach is the need to separately generate target concept embeddings which is time and resource consuming. Our proposed model overcomes these drawbacks by jointly learning the representations of input concept mention and target concepts. First, it learns the input concept mention representation using RoBERTa. Second, it finds cosine similarity between embeddings of input concept mention and all the target concepts. Here, embeddings of target concepts are randomly initialized and then updated during training. Finally, the target concept with maximum cosine similarity is assigned to the input concept mention. Our model surpasses all the existing methods across three standard datasets by improving accuracy up to 2.31%.
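The matching step can be sketched as follows: a mention vector (e.g. from RoBERTa) is compared by cosine similarity against an embedding matrix of target concepts, which the paper initialises randomly and updates during training. The sizes and tensors below are illustrative placeholders, not the paper's configuration.

```python
import torch
import torch.nn.functional as F

num_concepts, dim = 1000, 768
concept_emb = torch.nn.Embedding(num_concepts, dim)  # randomly initialised, learned during training
mention_vec = torch.randn(1, dim)                     # stand-in for a RoBERTa mention encoding

sims = F.cosine_similarity(mention_vec, concept_emb.weight, dim=-1)  # shape: (num_concepts,)
predicted_concept = int(sims.argmax())                # concept with maximum cosine similarity
```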

19. Generative Adversarial Phonology: Modeling unsupervised phonetic and phonological learning with neural networks [PDF] Back to Contents
  Gašper Beguš
Abstract: Training deep neural networks on well-understood dependencies in speech data can provide new insights into how they learn internal representations. This paper argues that acquisition of speech can be modeled as a dependency between random space and generated speech data in the Generative Adversarial Network architecture and proposes a methodology to uncover the network's internal representations that correspond to phonetic and phonological properties. The Generative Adversarial architecture is uniquely appropriate for modeling phonetic and phonological learning because the network is trained on unannotated raw acoustic data and learning is unsupervised without any language-specific assumptions or pre-assumed levels of abstraction. A Generative Adversarial Network was trained on an allophonic distribution in English. The network successfully learns the allophonic alternation: the network's generated speech signal contains the conditional distribution of aspiration duration. The paper proposes a technique for establishing the network's internal representations that identifies latent variables that correspond to, for example, presence of [s] and its spectral properties. By manipulating these variables, we actively control the presence of [s] and its frication amplitude in the generated outputs. This suggests that the network learns to use latent variables as an approximation of phonetic and phonological representations. Crucially, we observe that the dependencies learned in training extend beyond the training interval, which allows for additional exploration of learning representations. The paper also discusses how the network's architecture and innovative outputs resemble and differ from linguistic behavior in language acquisition, speech disorders, and speech errors, and how well-understood dependencies in speech data can help us interpret how neural networks learn their representations.

20. ValNorm: A New Word Embedding Intrinsic Evaluation Method Reveals Valence Biases are Consistent Across Languages and Over Decades [PDF] Back to Contents
  Autumn Toney, Aylin Caliskan
Abstract: Word embeddings learn implicit biases from linguistic regularities captured by word co-occurrence information. As a result, statistical methods can detect and quantify social biases as well as widely shared associations imbibed by the corpus the word embeddings are trained on. By extending methods that quantify human-like biases in word embeddings, we introduce ValNorm, a new word embedding intrinsic evaluation task, and the first unsupervised method that estimates the affective meaning of valence in words with high accuracy. The correlation between human scores of valence for 399 words collected to establish pleasantness norms in English and ValNorm scores is r=0.88. These 399 words, obtained from social psychology literature, are used to measure biases that are non-discriminatory among social groups. We hypothesize that the valence associations for these words are widely shared across languages and consistent over time. We estimate valence associations of these words using word embeddings from six languages representing various language structures and from historical text covering 200 years. Our method achieves consistently high accuracy, suggesting that the valence associations for these words are widely shared. In contrast, we measure gender stereotypes using the same set of word embeddings and find that social biases vary across languages. Our results signal that valence associations of this word set represent widely shared associations and consequently an intrinsic quality of words.
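In the spirit of WEAT-style association tests, the valence score of a word can be sketched as its mean cosine similarity to pleasant attribute words minus that to unpleasant ones, with the resulting scores correlated against human valence norms. The word lists, the `emb` lookup, and the arrays in the commented usage are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def cos(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def valence_association(word_vec, pleasant_vecs, unpleasant_vecs):
    return (np.mean([cos(word_vec, p) for p in pleasant_vecs])
            - np.mean([cos(word_vec, u) for u in unpleasant_vecs]))

# Hypothetical usage (emb, test_words, pleasant, unpleasant, human_norms are placeholders):
# scores = [valence_association(emb[w], pleasant, unpleasant) for w in test_words]
# r = np.corrcoef(scores, human_norms)[0, 1]   # the abstract reports r = 0.88 for English
```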

21. A Cross-Task Analysis of Text Span Representations [PDF] Back to Contents
  Shubham Toshniwal, Haoyue Shi, Bowen Shi, Lingyu Gao, Karen Livescu, Kevin Gimpel
Abstract: Many natural language processing (NLP) tasks involve reasoning with textual spans, including question answering, entity recognition, and coreference resolution. While extensive research has focused on functional architectures for representing words and sentences, there is less work on representing arbitrary spans of text within sentences. In this paper, we conduct a comprehensive empirical evaluation of six span representation methods using eight pretrained language representation models across six tasks, including two tasks that we introduce. We find that, although some simple span representations are fairly reliable across tasks, in general the optimal span representation varies by task, and can also vary within different facets of individual tasks. We also find that the choice of span representation has a bigger impact with a fixed pretrained encoder than with a fine-tuned encoder.
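Two standard span representations from this line of work, mean pooling and endpoint concatenation, are sketched below purely for illustration; they are common choices in the literature and not necessarily the exact six variants the paper compares.

```python
import torch

def span_mean(hidden: torch.Tensor, start: int, end: int) -> torch.Tensor:
    """Mean-pool the token vectors of a span; hidden has shape (seq_len, dim)."""
    return hidden[start:end + 1].mean(dim=0)

def span_endpoints(hidden: torch.Tensor, start: int, end: int) -> torch.Tensor:
    """Concatenate the boundary token vectors of a span -> shape (2 * dim,)."""
    return torch.cat([hidden[start], hidden[end]], dim=-1)
```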

22. Challenges and Thrills of Legal Arguments [PDF] Back to Contents
  Anurag Pallaprolu, Radha Vaidya, Aditya Swaroop Attawar
Abstract: State-of-the-art attention based models, mostly centered around the transformer architecture, solve the problem of sequence-to-sequence translation using the so-called scaled dot-product attention. While this technique is highly effective for estimating inter-token attention, it does not answer the question of inter-sequence attention when we deal with conversation-like scenarios. We propose an extension, HumBERT, that attempts to perform continuous contextual argument generation using locally trained transformers.

23. Relation of the Relations: A New Paradigm of the Relation Extraction Problem [PDF] Back to Contents
  Zhijing Jin, Yongyi Yang, Xipeng Qiu, Zheng Zhang
Abstract: In natural language, often multiple entities appear in the same text. However, most previous works in Relation Extraction (RE) limit the scope to identifying the relation between two entities at a time. Such an approach induces a quadratic computation time, and also overlooks the interdependency between multiple relations, namely the relation of relations (RoR). Due to the significance of RoR in existing datasets, we propose a new paradigm of RE that considers as a whole the predictions of all relations in the same context. Accordingly, we develop a data-driven approach that does not require hand-crafted rules but learns by itself the RoR, using Graph Neural Networks and a relation matrix transformer. Experiments show that our model outperforms the state-of-the-art approaches by +1.12\% on the ACE05 dataset and +2.55\% on SemEval 2018 Task 7.2, which is a substantial improvement on the two competitive benchmarks.

24. Accelerating Natural Language Understanding in Task-Oriented Dialog [PDF] Back to Contents
  Ojas Ahuja, Shrey Desai
Abstract: Task-oriented dialog models typically leverage complex neural architectures and large-scale, pre-trained Transformers to achieve state-of-the-art performance on popular natural language understanding benchmarks. However, these models frequently have in excess of tens of millions of parameters, making them impossible to deploy on-device where resource-efficiency is a major concern. In this work, we show that a simple convolutional model compressed with structured pruning achieves largely comparable results to BERT on ATIS and Snips, with under 100K parameters. Moreover, we perform acceleration experiments on CPUs, where we observe our multi-task model predicts intents and slots nearly 63x faster than even DistilBERT.

25. UDPipe at EvaLatin 2020: Contextualized Embeddings and Treebank Embeddings [PDF] Back to Contents
  Milan Straka, Jana Straková
Abstract: We present our contribution to the EvaLatin shared task, which is the first evaluation campaign devoted to the evaluation of NLP tools for Latin. We submitted a system based on UDPipe 2.0, one of the winners of the CoNLL 2018 Shared Task, The 2018 Shared Task on Extrinsic Parser Evaluation and SIGMORPHON 2019 Shared Task. Our system places first by a wide margin both in lemmatization and POS tagging in the open modality, where additional supervised data is allowed, in which case we utilize all Universal Dependency Latin treebanks. In the closed modality, where only the EvaLatin training data is allowed, our system achieves the best performance in lemmatization and in classical subtask of POS tagging, while reaching second place in cross-genre and cross-time settings. In the ablation experiments, we also evaluate the influence of BERT and XLM-RoBERTa contextualized embeddings, and the treebank encodings of the different flavors of Latin treebanks.

26. Prague Dependency Treebank -- Consolidated 1.0 [PDF] Back to Contents
  Jan Hajič, Eduard Bejček, Jaroslava Hlaváčová, Marie Mikulová, Milan Straka, Jan Štěpánek, Barbora Štěpánková
Abstract: We present a richly annotated and genre-diversified language resource, the Prague Dependency Treebank-Consolidated 1.0 (PDT-C 1.0), the purpose of which is - as it always been the case for the family of the Prague Dependency Treebanks - to serve both as a training data for various types of NLP tasks as well as for linguistically-oriented research. PDT-C 1.0 contains four different datasets of Czech, uniformly annotated using the standard PDT scheme (albeit not everything is annotated manually, as we describe in detail here). The texts come from different sources: daily newspaper articles, Czech translation of the Wall Street Journal, transcribed dialogs and a small amount of user-generated, short, often non-standard language segments typed into a web translator. Altogether, the treebank contains around 180,000 sentences with their morphological, surface and deep syntactic annotation. The diversity of the texts and annotations should serve well the NLP applications as well as it is an invaluable resource for linguistic research, including comparative studies regarding texts of different genres. The corpus is publicly and freely available.

27. DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations [PDF] Back to Contents
  John M. Giorgi, Osvald Nitski, Gary D. Bader, Bo Wang
Abstract: We present DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations, a self-supervised method for learning universal sentence embeddings that transfer to a wide variety of natural language processing (NLP) tasks. Our objective leverages recent advances in deep metric learning (DML) and has the advantage of being conceptually simple and easy to implement, requiring no specialized architectures or labelled training data. We demonstrate that our objective can be used to pretrain transformers to state-of-the-art performance on SentEval, a popular benchmark for evaluating universal sentence embeddings, outperforming existing supervised, semi-supervised and unsupervised methods. We perform extensive ablations to determine which factors contribute to the quality of the learned embeddings. Our code will be publicly available and can be easily adapted to new datasets or used to embed unseen text.
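As a rough picture of the deep-metric-learning objective family the paper builds on, here is a generic InfoNCE-style contrastive loss over paired sentence embeddings; this is a standard formulation, not DeCLUTR's exact objective or its span-sampling scheme, and the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(anchors: torch.Tensor, positives: torch.Tensor, temperature: float = 0.05):
    """anchors, positives: (batch, dim); row i of each forms a positive pair."""
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.T / temperature              # pairwise cosine similarities, temperature-scaled
    targets = torch.arange(a.size(0))           # the matching row is the positive for each anchor
    return F.cross_entropy(logits, targets)
```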

28. DeBERTa: Decoding-enhanced BERT with Disentangled Attention [PDF] Back to Contents
  Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen
Abstract: Recent progress in pre-trained neural language models has significantly improved the performance of many natural language processing (NLP) tasks. In this paper we propose a new model architecture DeBERTa (Decoding-enhanced BERT with disentangled attention) that improves the BERT and RoBERTa models using two novel techniques. The first is the disentangled attention mechanism, where each word is represented using two vectors that encode its content and position, respectively, and the attention weights among words are computed using disentangled matrices on their contents and relative positions. Second, an enhanced mask decoder is used to replace the output softmax layer to predict the masked tokens for model pretraining. We show that these two techniques significantly improve the efficiency of model pre-training and performance of downstream tasks. Compared to RoBERTa-Large, a DeBERTa model trained on half of the training data performs consistently better on a wide range of NLP tasks, achieving improvements on MNLI by +0.9% (90.2% vs. 91.1%), on SQuAD v2.0 by +2.3% (88.4% vs. 90.7%) and RACE by +3.6% (83.2% vs. 86.8%). The DeBERTa code and pre-trained models will be made publicly available at this https URL.
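A heavily simplified sketch of the disentangled attention idea: attention scores sum content-to-content, content-to-position, and position-to-content terms computed from separate content and position projections. Relative-position bucketing, masking, and multiple heads are omitted, and the toy tensors below are illustrative only; this is not the model's full formulation.

```python
import torch

seq_len, dim = 8, 64
Qc, Kc = torch.randn(seq_len, dim), torch.randn(seq_len, dim)  # content projections
Qr, Kr = torch.randn(seq_len, dim), torch.randn(seq_len, dim)  # (relative) position projections

c2c = Qc @ Kc.T                       # content-to-content term
c2p = Qc @ Kr.T                       # content-to-position term
p2c = Kc @ Qr.T                       # position-to-content term
scores = (c2c + c2p + p2c) / (3 * dim) ** 0.5
attn = torch.softmax(scores, dim=-1)
```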

29. Filtered Inner Product Projection for Multilingual Embedding Alignment [PDF] Back to Contents
  Vin Sachidananda, Ziyi Yang, Chenguang Zhu
Abstract: Due to widespread interest in machine translation and transfer learning, there are numerous algorithms for mapping multiple embeddings to a shared representation space. Recently, these algorithms have been studied in the setting of bilingual dictionary induction where one seeks to align the embeddings of a source and a target language such that translated word pairs lie close to one another in a common representation space. In this paper, we propose a method, Filtered Inner Product Projection (FIPP), for mapping embeddings to a common representation space and evaluate FIPP in the context of bilingual dictionary induction. As semantic shifts are pervasive across languages and domains, FIPP first identifies the common geometric structure in both embeddings and then, only on the common structure, aligns the Gram matrices of these embeddings. Unlike previous approaches, FIPP is applicable even when the source and target embeddings are of differing dimensionalities. We show that our approach outperforms existing methods on the MUSE dataset for various language pairs. Furthermore, FIPP provides computational benefits both in ease of implementation and scalability.

30. MultiSpeech: Multi-Speaker Text to Speech with Transformer [PDF] Back to Contents
  Mingjian Chen, Xu Tan, Yi Ren, Jin Xu, Hao Sun, Sheng Zhao, Tao Qin
Abstract: Transformer-based text to speech (TTS) model (e.g., Transformer TTS~\cite{li2019neural}, FastSpeech~\cite{ren2019fastspeech}) has shown the advantages of training and inference efficiency over RNN-based model (e.g., Tacotron~\cite{shen2018natural}) due to its parallel computation in training and/or inference. However, the parallel computation increases the difficulty while learning the alignment between text and speech in Transformer, which is further magnified in the multi-speaker scenario with noisy data and diverse speakers, and hinders the applicability of Transformer for multi-speaker TTS. In this paper, we develop a robust and high-quality multi-speaker Transformer TTS system called MultiSpeech, with several specially designed components/techniques to improve text-to-speech alignment: 1) a diagonal constraint on the weight matrix of encoder-decoder attention in both training and inference; 2) layer normalization on phoneme embedding in encoder to better preserve position information; 3) a bottleneck in decoder pre-net to prevent copy between consecutive speech frames. Experiments on VCTK and LibriTTS multi-speaker datasets demonstrate the effectiveness of MultiSpeech: 1) it synthesizes more robust and better quality multi-speaker voice than naive Transformer based TTS; 2) with a MutiSpeech model as the teacher, we obtain a strong multi-speaker FastSpeech model with almost zero quality degradation while enjoying extremely fast inference speed.
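The diagonal constraint on encoder-decoder attention can be pictured as a penalty on attention mass that falls far from the time-aligned diagonal, as in the sketch below; the exact formulation in the paper may differ, and the bandwidth value is an assumption.

```python
import torch

def diagonal_attention_penalty(attn: torch.Tensor, bandwidth: float = 0.2) -> torch.Tensor:
    """attn: (decoder_len, encoder_len) attention weights for one head."""
    T, S = attn.shape
    t = torch.arange(T, dtype=torch.float32).unsqueeze(1) / max(T - 1, 1)
    s = torch.arange(S, dtype=torch.float32).unsqueeze(0) / max(S - 1, 1)
    off_diag = ((t - s).abs() > bandwidth).float()   # 1 where far from the diagonal
    return (attn * off_diag).sum() / T               # off-diagonal mass to be penalised in training
```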

31. WaveNODE: A Continuous Normalizing Flow for Speech Synthesis [PDF] Back to Contents
  Hyeongju Kim, Hyeongseung Lee, Woo Hyun Kang, Sung Jun Cheon, Byoung Jin Choi, Nam Soo Kim
Abstract: In recent years, various flow-based generative models have been proposed to generate high-fidelity waveforms in real-time. However, these models require either a well-trained teacher network or a number of flow steps making them memory-inefficient. In this paper, we propose a novel generative model called WaveNODE which exploits a continuous normalizing flow for speech synthesis. Unlike the conventional models, WaveNODE places no constraint on the function used for flow operation, thus allowing the usage of more flexible and complex functions. Moreover, WaveNODE can be optimized to maximize the likelihood without requiring any teacher network or auxiliary loss terms. We experimentally show that WaveNODE achieves comparable performance with fewer parameters compared to the conventional flow-based vocoders.

32. FastSpeech 2: Fast and High-Quality End-to-End Text-to-Speech [PDF] Back to Contents
  Yi Ren, Chenxu Hu, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu
Abstract: Advanced text-to-speech (TTS) models such as FastSpeech can synthesize speech significantly faster than previous autoregressive models with comparable quality. The training of FastSpeech model relies on an autoregressive teacher model for duration prediction (to provide more information as input) and knowledge distillation (to simplify the data distribution in output), which can ease the one-to-many mapping problem (i.e., multiple speech variations correspond to the same text) in TTS. However, FastSpeech has several disadvantages: 1) the teacher-student distillation pipeline is complicated, 2) the duration extracted from the teacher model is not accurate enough, and the target mel-spectrograms distilled from teacher model suffer from information loss due to data simplification, both of which limit the voice quality. In this paper, we propose FastSpeech 2, which addresses the issues in FastSpeech and better solves the one-to-many mapping problem in TTS by 1) directly training the model with ground-truth target instead of the simplified output from teacher, and 2) introducing more variation information of speech (e.g., pitch, energy and more accurate duration) as conditional inputs. Specifically, we extract duration, pitch and energy from speech waveform and directly take them as conditional inputs during training and use predicted values during inference. We further design FastSpeech 2s, which is the first attempt to directly generate speech waveform from text in parallel, enjoying the benefit of full end-to-end training and even faster inference than FastSpeech. Experimental results show that 1) FastSpeech 2 and 2s outperform FastSpeech in voice quality with much simplified training pipeline and reduced training time; 2) FastSpeech 2 and 2s can match the voice quality of autoregressive models while enjoying much faster inference speed.

33. A non-causal FFTNet architecture for speech enhancement [PDF] Back to Contents
  Muhammed PV Shifas, Nagaraj Adiga, Vassilis Tsiaras, Yannis Stylianou
Abstract: In this paper, we suggest a new parallel, non-causal and shallow waveform domain architecture for speech enhancement based on FFTNet, a neural network for generating high quality audio waveform. In contrast to other waveform based approaches like WaveNet, FFTNet uses an initial wide dilation pattern. Such an architecture better represents the long term correlated structure of speech in the time domain, where noise is usually highly non-correlated, and therefore it is suitable for waveform domain based speech enhancement. To further strengthen this feature of FFTNet, we suggest a non-causal FFTNet architecture, where the present sample in each layer is estimated from the past and future samples of the previous layer. By suggesting a shallow network and applying non-causality within certain limits, the suggested FFTNet for speech enhancement (SE-FFTNet) uses much fewer parameters compared to other neural network based approaches for speech enhancement like WaveNet and SEGAN. Specifically, the suggested network has considerably reduced model parameters: 32% fewer compared to WaveNet and 87% fewer compared to SEGAN. Finally, based on subjective and objective metrics, SE-FFTNet outperforms WaveNet in terms of enhanced signal quality, while it provides equally good performance as SEGAN. A Tensorflow implementation of the architecture is provided at 1 .

34. Characterizing Sociolinguistic Variation in the Competing Vaccination Communities [PDF] Back to Contents
  Shahan Ali Memon, Aman Tyagi, David R. Mortensen, Kathleen M. Carley
Abstract: Public health practitioners and policy makers grapple with the challenge of devising effective message-based interventions for debunking public health misinformation in cyber communities. "Framing" and "personalization" of the message is one of the key features for devising a persuasive messaging strategy. For an effective health communication, it is imperative to focus on "preference-based framing" where the preferences of the target sub-community are taken into consideration. To achieve that, it is important to understand and hence characterize the target sub-communities in terms of their social interactions. In the context of health-related misinformation, vaccination remains to be the most prevalent topic of discord. Hence, in this paper, we conduct a sociolinguistic analysis of the two competing vaccination communities on Twitter: "pro-vaxxers" or individuals who believe in the effectiveness of vaccinations, and "anti-vaxxers" or individuals who are opposed to vaccinations. Our data analysis show significant linguistic variation between the two communities in terms of their usage of linguistic intensifiers, pronouns, and uncertainty words. Our network-level analysis show significant differences between the two communities in terms of their network density, echo-chamberness, and the EI index. We hypothesize that these sociolinguistic differences can be used as proxies to characterize and understand these communities to devise better message interventions.

35. Parametric Representation for Singing Voice Synthesis: a Comparative Evaluation [PDF] Back to Contents
  Onur Babacan, Thomas Drugman, Tuomo Raitio, Daniel Erro, Thierry Dutoit
Abstract: Various parametric representations have been proposed to model the speech signal. While the performance of such vocoders is well-known in the context of speech processing, their extrapolation to singing voice synthesis might not be straightforward. The goal of this paper is twofold. First, a comparative subjective evaluation is performed across four existing techniques suitable for statistical parametric synthesis: traditional pulse vocoder, Deterministic plus Stochastic Model, Harmonic plus Noise Model and GlottHMM. The behavior of these techniques as a function of the singer type (baritone, counter-tenor and soprano) is studied. Secondly, the artifacts occurring in high-pitched voices are discussed and possible approaches to overcome them are suggested.

36. Maximum Phase Modeling for Sparse Linear Prediction of Speech [PDF] 返回目录
  Thomas Drugman
Abstract: Linear prediction (LP) is a ubiquitous analysis method in speech processing. Various studies have focused on sparse LP algorithms, introducing sparsity constraints into the LP framework. Sparse LP has been shown to be effective for several problems related to speech modeling and coding. However, all existing approaches assume the speech signal to be minimum-phase. Because speech is known to be mixed-phase, the resulting residual signal contains a persistent maximum-phase component. The aim of this paper is to propose a novel technique which incorporates a model of the maximum-phase contribution of speech and can be applied to any filter representation. The proposed method is shown to significantly increase the sparsity of the LP residual signal and to be effective in two illustrative applications: speech polarity detection and excitation modeling.
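As background for the residual signal discussed above, ordinary linear prediction fits coefficients so that each sample is approximated from its previous samples, and the residual is the prediction error; sparse LP then replaces the least-squares criterion with a sparsity-promoting one. The sketch below computes only the standard least-squares LP residual in NumPy with a made-up signal and order, and does not reproduce the maximum-phase modeling proposed in the paper.

```python
import numpy as np

def lp_residual(x, order):
    """Least-squares linear prediction: return (coefficients, residual)."""
    # Regression matrix of past samples: row n contains x[n-1], ..., x[n-order].
    rows = [x[n - order:n][::-1] for n in range(order, len(x))]
    A = np.stack(rows)
    b = x[order:]
    a, *_ = np.linalg.lstsq(A, b, rcond=None)   # predictor coefficients
    residual = b - A @ a                         # prediction error signal
    return a, residual

# Toy signal: a decaying resonance plus a little noise.
rng = np.random.default_rng(0)
n = np.arange(400)
x = np.sin(0.3 * n) * np.exp(-n / 200) + 0.01 * rng.standard_normal(400)
coeffs, res = lp_residual(x, order=10)
print(coeffs.shape, float(np.mean(res ** 2)))   # (10,) and a small error energy
```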

37. Analysis and Synthesis of Hypo and Hyperarticulated Speech [PDF] 返回目录
  Benjamin Picart, Thomas Drugman, Thierry Dutoit
Abstract: This paper focuses on the analysis and synthesis of hypo- and hyperarticulated speech in the framework of HMM-based speech synthesis. First of all, a new French database matching our needs was created; it contains three identical sets, pronounced with three different degrees of articulation: neutral, hypo-, and hyperarticulated speech. On that basis, acoustic and phonetic analyses were performed. It is shown that the degree of articulation significantly influences, on the one hand, both vocal tract and glottal characteristics and, on the other hand, speech rate, phone durations, phone variations, and the presence of glottal stops. Finally, neutral, hypo-, and hyperarticulated speech are synthesized using HMM-based speech synthesis, and both objective and subjective tests assessing the generated speech quality are performed. These tests show that synthesized hypoarticulated speech seems to be rendered less naturally than neutral and hyperarticulated speech.

38. Incorporating Pragmatic Reasoning Communication into Emergent Language [PDF] 返回目录
  Yipeng Kang, Tonghan Wang, Gerard de Melo
Abstract: Emergentism and pragmatics are two research fields that study the dynamics of linguistic communication at substantially different timescales and intelligence levels. From the perspective of multi-agent reinforcement learning, they correspond to stochastic games with reinforcement training and to stage games with opponent awareness. Given that their combination has been explored in linguistics, we propose computational models that combine short-term mutual reasoning-based pragmatics with long-term language emergentism. We explore this in referential games for agent communication as well as in StarCraft II, assessing the relative merits of different kinds of mutual reasoning pragmatics models both empirically and theoretically. Our results shed light on their importance for producing more natural, accurate, robust, fine-grained, and succinct utterances.
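The "short-term mutual reasoning" component is commonly formalized with a Rational Speech Acts (RSA)-style recursion between a literal listener, a pragmatic speaker, and a pragmatic listener; whether the paper uses exactly this formulation is an assumption here. The sketch below shows that recursion on the classic scalar-implicature toy example.

```python
import numpy as np

def rsa(lexicon, prior, alpha=1.0):
    """One round of RSA-style mutual reasoning.

    lexicon: (U, M) 0/1 matrix, lexicon[u, m] = 1 if utterance u is true of meaning m.
    Returns the pragmatic listener distribution L1(m | u), shape (U, M).
    """
    l0 = lexicon * prior                        # literal listener (unnormalized)
    l0 = l0 / l0.sum(axis=1, keepdims=True)
    s1 = np.exp(alpha * np.log(l0 + 1e-12))     # speaker soft-maximizes informativeness
    s1 = s1 / s1.sum(axis=0, keepdims=True)     # normalize over utterances per meaning
    l1 = s1 * prior                             # pragmatic listener
    return l1 / l1.sum(axis=1, keepdims=True)

# Toy example: utterances {"some", "all"}, meanings {"some-but-not-all", "all"}.
lexicon = np.array([[1.0, 1.0],    # "some" is literally true of both meanings
                    [0.0, 1.0]])   # "all" is true only of the "all" meaning
prior = np.array([0.5, 0.5])
print(rsa(lexicon, prior, alpha=4.0).round(2))
# Hearing "some", the pragmatic listener now favours "some-but-not-all".
```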

39. Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases [PDF] 返回目录
  Wei Guo, Aylin Caliskan
Abstract: Starting from the premise that implicit human biases are reflected in the statistical regularities of language, it is possible to measure biases in static word embeddings. With recent advances in natural language processing, state-of-the-art neural language models generate dynamic word embeddings that depend on the context in which the word appears. Current methods of measuring social and intersectional biases in these contextualized word embeddings rely on the effect magnitudes of bias in a small set of pre-defined sentence templates. We propose a new comprehensive method, the Contextualized Embedding Association Test (CEAT), based on the distribution of 10,000 pooled effect magnitudes of bias across embedding variations and a random-effects model, dispensing with templates. Experiments on social and intersectional biases show that CEAT finds evidence of all tested biases and provides comprehensive information on the variability of effect magnitudes of the same bias in different contexts. Furthermore, we develop two methods, Intersectional Bias Detection (IBD) and Emergent Intersectional Bias Detection (EIBD), to automatically identify intersectional biases and emergent intersectional biases from static word embeddings, in addition to measuring them in contextualized word embeddings. We present the first algorithmic bias detection findings on how intersectional group members are associated with unique emergent biases that do not overlap with the biases of their constituent minority identities. IBD achieves an accuracy of 81.6% and 82.7%, respectively, when detecting the intersectional biases of African American females and Mexican American females. EIBD reaches an accuracy of 84.7% and 65.3%, respectively, when detecting the emergent intersectional biases unique to African American females and Mexican American females (random correct identification probabilities range from 1.0% to 25.5%).
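Each of the pooled effect magnitudes in CEAT is a WEAT-style effect size (the Word Embedding Association Test of Caliskan et al.), which has a well-defined formula and can be sketched directly; the pooling shown here is a naive mean used only as a placeholder for the paper's random-effects model, and all vectors are random stand-ins.

```python
import numpy as np

def weat_effect_size(X, Y, A, B):
    """WEAT-style effect size d for one set of embeddings.

    X, Y: (n, dim) target word vectors; A, B: (m, dim) attribute word vectors.
    """
    def cos(u, V):
        return (V @ u) / (np.linalg.norm(V, axis=1) * np.linalg.norm(u))

    def assoc(w):                        # s(w, A, B): differential association
        return cos(w, A).mean() - cos(w, B).mean()

    sx = np.array([assoc(x) for x in X])
    sy = np.array([assoc(y) for y in Y])
    pooled = np.concatenate([sx, sy])
    return (sx.mean() - sy.mean()) / pooled.std(ddof=1)

# CEAT samples many sentence contexts per word, recomputes d in each context,
# and combines the resulting distribution with a random-effects model; a plain
# mean over sampled contexts is shown here only as a placeholder.
rng = np.random.default_rng(0)
effect_sizes = [weat_effect_size(*(rng.standard_normal((6, 50)) for _ in range(4)))
                for _ in range(100)]
print(float(np.mean(effect_sizes)))      # near 0 for random vectors
```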

40. MAGNet: Multi-Region Attention-Assisted Grounding of Natural Language Queries at Phrase Level [PDF] 返回目录
  Amar Shrestha, Krittaphat Pugdeethosapol, Haowen Fang, Qinru Qiu
Abstract: Grounding free-form textual queries necessitates an understanding of these textual phrases and their relation to the visual cues in order to reliably reason about the described locations. Spatial attention networks are known to learn this relationship and to focus their gaze on salient objects in the image. Thus, we propose to utilize spatial attention networks for image-level visual-textual fusion that preserves local (word) and global (phrase) information, to refine region proposals with an in-network Region Proposal Network (RPN), and to detect single or multiple regions for a phrase query. We focus only on the phrase query-ground truth pair (referring expression), so the model is independent of dataset-specific constraints such as additional attributes or context. On the ReferIt Game referring-expression dataset, our Multi-region Attention-assisted Grounding network (MAGNet) achieves over 12% improvement over the state of the art. Without the context from image captions and attribute information in Flickr30k Entities, we still achieve competitive results compared to the state of the art.
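The basic ingredient of such grounding models, a text query producing a spatial attention map over convolutional features, can be sketched generically; the projection, dimensions, and single-query setup below are hypothetical simplifications and omit MAGNet's word/phrase-level fusion and in-network RPN.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def phrase_spatial_attention(feature_map, phrase_vec, W):
    """Attend over spatial locations of a CNN feature map with a phrase query.

    feature_map: (H, W_, C) visual features; phrase_vec: (D,) text embedding;
    W: (D, C) hypothetical projection aligning the text and visual spaces.
    """
    h, w, c = feature_map.shape
    flat = feature_map.reshape(h * w, c)          # one row per spatial location
    query = phrase_vec @ W                        # project text into visual space
    scores = flat @ query / np.sqrt(c)            # dot-product relevance per location
    attn = softmax(scores).reshape(h, w)          # spatial attention map
    attended = (attn.reshape(h * w, 1) * flat).sum(axis=0)   # fused visual vector
    return attn, attended

rng = np.random.default_rng(0)
attn, fused = phrase_spatial_attention(rng.standard_normal((7, 7, 32)),
                                        rng.standard_normal(300),
                                        rng.standard_normal((300, 32)) * 0.05)
print(attn.shape, fused.shape)                    # (7, 7) (32,)
```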

41. Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation [PDF] 返回目录
  Mingjie Li, Fuyu Wang, Xiaojun Chang, Xiaodan Liang
Abstract: Beyond the common difficulties faced in natural image captioning, medical report generation specifically requires the model to describe a medical image with a fine-grained and semantically coherent paragraph that satisfies both medical commonsense and logic. Previous works generally extract global image features and attempt to generate a paragraph that is similar to referenced reports; however, this approach has two limitations. Firstly, the regions of primary interest to radiologists are usually located in a small area of the global image, meaning that the remaining parts of the image can be considered irrelevant noise during training. Secondly, many similar sentences are used in each medical report to describe the normal regions of the image, which causes serious data bias. This bias is likely to teach models to generate these uninformative sentences on a regular basis. To address these problems, we propose an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) to mimic radiologists' working patterns. In more detail, ASGK integrates internal visual feature fusion and external medical linguistic information to guide medical knowledge transfer and learning. The core structure of ASGK consists of a medical graph encoder and a natural language decoder, inspired by Generative Pre-Training (GPT). Experiments on the CX-CHR dataset and our COVID-19 CT Report dataset demonstrate that the proposed ASGK is able to generate robust and accurate reports, and moreover outperforms state-of-the-art methods on both medical terminology classification and paragraph generation metrics.

42. BERT-XML: Large Scale Automated ICD Coding Using BERT Pretraining [PDF] 返回目录
  Zachariah Zhang, Jingshu Liu, Narges Razavian
Abstract: Clinical interactions are initially recorded and documented in free-text medical notes. ICD coding is the task of classifying and coding all diagnoses, symptoms, and procedures associated with a patient's visit. The process is often manual, and extremely time-consuming and expensive for hospitals. In this paper, we propose a machine learning model, BERT-XML, for large-scale automated ICD coding from EHR notes, utilizing recently developed unsupervised pretraining methods that have achieved state-of-the-art performance on a variety of NLP tasks. We train a BERT model from scratch on EHR notes, learning a vocabulary better suited to EHR tasks and thus outperforming off-the-shelf models. We adapt the BERT architecture for ICD coding with multi-label attention. While other works focus on small public medical datasets, we have produced the first large-scale ICD-10 classification model, using millions of EHR notes to predict thousands of unique ICD codes.
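A common way to realize "multi-label attention" for ICD coding is to give every label its own attention distribution over the encoder's token representations and an independent sigmoid classifier (as in CAML-style label attention); treating BERT-XML this way is an assumption, and the dimensions below are made up. A minimal NumPy sketch:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def label_attention_head(H, U, W, b):
    """Multi-label attention over token representations.

    H: (T, d) encoder outputs for one note (e.g. from a BERT encoder).
    U: (L, d) one learned attention query per ICD label.
    W: (L, d), b: (L,) per-label classifier weights and biases.
    Returns per-label probabilities, shape (L,).
    """
    attn = softmax(U @ H.T, axis=1)          # (L, T): where each label looks
    label_repr = attn @ H                    # (L, d): label-specific note vectors
    logits = (label_repr * W).sum(axis=1) + b
    return 1.0 / (1.0 + np.exp(-logits))     # independent sigmoid per label

rng = np.random.default_rng(0)
tokens, dim, labels = 128, 768, 2000         # note length, hidden size, #ICD codes
probs = label_attention_head(rng.standard_normal((tokens, dim)),
                             rng.standard_normal((labels, dim)) * 0.02,
                             rng.standard_normal((labels, dim)) * 0.02,
                             np.zeros(labels))
print(probs.shape)                           # (2000,)
```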
