Table of Contents
1. AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [PDF] Abstract
2. A BERT based Sentiment Analysis and Key Entity Detection Approach for Online Financial Texts [PDF] Abstract
3. Authorship Attribution in Bangla literature using Character-level CNN [PDF] Abstract
4. A Continuous Space Neural Language Model for Bengali Language [PDF] Abstract
5. Embedding Compression with Isotropic Iterative Quantization [PDF] Abstract
6. Tensor Graph Convolutional Networks for Text Classification [PDF] Abstract
7. Dialectal Layers in West Iranian: a Hierarchical Dirichlet Process Approach to Linguistic Relationships [PDF] Abstract
8. Urdu-English Machine Transliteration using Neural Networks [PDF] Abstract
9. Language Models Are An Effective Patient Representation Learning Technique For Electronic Health Record Data [PDF] Abstract
10. The empirical structure of word frequency distributions [PDF] Abstract
11. Exploring and Improving Robustness of Multi Task Deep Neural Networks via Domain Agnostic Defenses [PDF] Abstract
12. Detecting New Word Meanings: A Comparison of Word Embedding Models in Spanish [PDF] Abstract
13. Improving Spoken Language Understanding By Exploiting ASR N-best Hypotheses [PDF] Abstract
14. FGN: Fusion Glyph Network for Chinese Named Entity Recognition [PDF] Abstract
15. A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation [PDF] Abstract
16. Parallel Machine Translation with Disentangled Context Transformer [PDF] Abstract
17. Robust Speaker Recognition Using Speech Enhancement And Attention Model [PDF] Abstract
18. A Tree Adjoining Grammar Representation for Models Of Stochastic Dynamical Systems [PDF] Abstract
19. Auto Completion of User Interface Layout Design Using Transformer-Based Tree Decoders [PDF] Abstract
20. Teddy: A System for Interactive Review Analysis [PDF] Abstract
Abstracts
1. AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [PDF] Back to Contents
Tong Niu, Mohit Bansal
Abstract: Many sequence-to-sequence dialogue models tend to generate safe, uninformative responses. There have been various useful efforts on trying to eliminate them. However, these approaches either improve decoding algorithms during inference, rely on hand-crafted features, or employ complex models. In our work, we build dialogue models that are dynamically aware of what utterances or tokens are dull without any feature-engineering. Specifically, we start with a simple yet effective automatic metric, AvgOut, which calculates the average output probability distribution of all time steps on the decoder side during training. This metric directly estimates which tokens are more likely to be generated, thus making it a faithful evaluation of the model diversity (i.e., for diverse models, the token probabilities should be more evenly distributed rather than peaked at a few dull tokens). We then leverage this novel metric to propose three models that promote diversity without losing relevance. The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch; the second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level; the third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal. Moreover, we experiment with a hybrid model by combining the loss terms of MinAvgOut and RL. All four models outperform their base LSTM-RNN model on both diversity and relevance by a large margin, and are comparable to or better than competitive baselines (also verified via human evaluation). Moreover, our approaches are orthogonal to the base model, making them applicable as an add-on to other emerging better dialogue models in the future.
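To make the metric concrete, below is a minimal sketch of computing an AvgOut-style average output distribution and one plausible scalar diversity score. The tensor shapes, masking scheme, and top-k cutoff are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def avg_out(decoder_logits: torch.Tensor, pad_mask: torch.Tensor) -> torch.Tensor:
    """Average output distribution over all non-padding decoder steps.

    decoder_logits: (batch, time, vocab) raw decoder scores.
    pad_mask: (batch, time) with 1.0 for real tokens, 0.0 for padding.
    Returns a single (vocab,) probability vector for the batch.
    """
    probs = torch.softmax(decoder_logits, dim=-1)  # per-step distributions
    probs = probs * pad_mask.unsqueeze(-1)         # zero out padding steps
    return probs.sum(dim=(0, 1)) / pad_mask.sum()  # mean over batch and time

def diversity_score(avg_dist: torch.Tensor, k: int = 100) -> torch.Tensor:
    # A dull model piles probability mass onto a few frequent tokens,
    # so one plausible scalar score is the mass outside the top-k.
    return 1.0 - avg_dist.topk(k).values.sum()
```

A peaked average distribution means the decoder keeps favoring the same few tokens across contexts, which is exactly the dullness the three proposed models penalize.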
2. A BERT based Sentiment Analysis and Key Entity Detection Approach for Online Financial Texts [PDF] Back to Contents
Lingyun Zhao, Lin Li, Xinhao Zheng
Abstract: The emergence and rapid progress of the Internet have had an ever-increasing impact on the financial domain. How to rapidly and accurately mine key information from massive negative financial texts has become one of the key issues for investors and decision makers. Aiming at this issue, we propose a sentiment analysis and key entity detection approach based on BERT, which is applied to online financial text mining and public opinion analysis in social media. Using a pre-trained model, we first study sentiment analysis, and then treat key entity detection as a sentence matching or Machine Reading Comprehension (MRC) task at different granularities. Among them, we mainly focus on negative sentiment information. We detect specific entities using our approach, which differs from traditional Named Entity Recognition (NER). In addition, we use ensemble learning to improve the performance of the proposed approach. Experimental results show that the performance of our approach is generally higher than that of SVM, LR, NBM, and BERT on two financial sentiment analysis and key entity detection datasets.
3. Authorship Attribution in Bangla literature using Character-level CNN [PDF] Back to Contents
Aisha Khatun, Anisur Rahman, Md. Saiful Islam, Marium-E-Jannat
Abstract: Characters are the smallest unit of text from which stylometric signals can be extracted to determine the author of a text. In this paper, we investigate the effectiveness of character-level signals in Authorship Attribution of Bangla Literature and show that the results are promising but improvable. The time and memory efficiency of the proposed model is much higher than that of word-level counterparts, but accuracy is 2-5% lower than the best performing word-level models. A comparison of various word-based models shows that the proposed model performs increasingly better with larger datasets. We also analyze the effect of pre-training character embeddings of the diverse Bangla character set on authorship attribution, and find that pre-training improves performance by up to 10%. We use 2 datasets of 6 to 14 authors, balancing them before training, and compare the results.
4. A Continuous Space Neural Language Model for Bengali Language [PDF] Back to Contents
Hemayet Ahmed Chowdhury, Md. Azizul Haque Imon, Anisur Rahman, Aisha Khatun, Md. Saiful Islam
Abstract: Language models are generally employed to estimate the probability distribution of various linguistic units, making them one of the fundamental parts of natural language processing. Applications of language models include a wide spectrum of tasks such as text summarization, translation, and classification. For a low resource language like Bengali, the research in this area so far can be considered to be narrow at the very least, with some traditional count-based models having been proposed. This paper attempts to address the issue and proposes a continuous-space neural language model, or more specifically an ASGD weight-dropped LSTM language model, along with techniques to efficiently train it for Bengali. The performance analysis against some currently existing count-based models, illustrated in this paper, also shows that the proposed architecture outperforms its counterparts, achieving an inference perplexity as low as 51.2 on the held-out data set for Bengali.
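For reference, the reported metric is standard language-model perplexity over held-out tokens w_1, ..., w_N:

```latex
\mathrm{PPL} \;=\; \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p\left(w_i \mid w_{<i}\right)\right)
```

A perplexity of 51.2 thus corresponds to the model being, on average, about as uncertain per token as a uniform choice among roughly 51 alternatives.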
5. Embedding Compression with Isotropic Iterative Quantization [PDF] Back to Contents
Siyu Liao, Jie Chen, Yanzhi Wang, Qinru Qiu, Bo Yuan
Abstract: Continuous representation of words is a standard component in deep learning-based NLP models. However, representing a large vocabulary requires significant memory, which can cause problems, particularly on resource-constrained platforms. Therefore, in this paper we propose an isotropic iterative quantization (IIQ) approach for compressing embedding vectors into binary ones, leveraging the iterative quantization technique well established for image retrieval, while satisfying the desired isotropic property of PMI based models. Experiments with pre-trained embeddings (i.e., GloVe and HDC) demonstrate a more than thirty-fold compression ratio with comparable and sometimes even improved performance over the original real-valued embedding vectors.
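The iterative quantization technique the abstract builds on alternates between binarizing rotated embeddings and re-fitting the rotation via an orthogonal Procrustes step. A minimal NumPy sketch of that classic ITQ alternation follows; the isotropy-related modifications specific to IIQ, and details such as centering and iteration count, are assumptions not taken from the paper.

```python
import numpy as np

def itq_binarize(X: np.ndarray, n_iter: int = 50, seed: int = 0):
    """ITQ-style binarization of (zero-centered) embedding vectors.

    X: (n, d) real-valued embeddings. Returns codes B in {-1, +1}
    and the learned rotation R. This reproduces only the classic
    ITQ alternation (Gong & Lazebnik); IIQ's isotropy-preserving
    modifications are not shown here.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    R, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal init
    for _ in range(n_iter):
        B = np.sign(X @ R)                 # fix R: binarize rotated embeddings
        U, _, Wt = np.linalg.svd(X.T @ B)  # fix B: orthogonal Procrustes step
        R = U @ Wt
    return np.sign(X @ R), R
```

Storing one bit per dimension instead of a 32-bit float is where the roughly thirty-fold compression reported in the abstract comes from.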
6. Tensor Graph Convolutional Networks for Text Classification [PDF] Back to Contents
Xien Liu, Xinxin You, Xiao Zhang, Ji Wu, Ping Lv
Abstract: Compared to sequential learning models, graph-based neural networks exhibit some excellent properties, such as the ability to capture global information. In this paper, we investigate graph-based neural networks for the text classification problem. A new framework, TensorGCN (tensor graph convolutional networks), is presented for this task. A text graph tensor is first constructed to describe semantic, syntactic, and sequential contextual information. Then, two kinds of propagation learning are performed on the text graph tensor. The first is intra-graph propagation, used for aggregating information from neighborhood nodes in a single graph. The second is inter-graph propagation, used for harmonizing heterogeneous information between graphs. Extensive experiments are conducted on benchmark datasets, and the results illustrate the effectiveness of our proposed framework. Our proposed TensorGCN presents an effective way to harmonize and integrate heterogeneous information from different kinds of graphs.
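For concreteness, the intra-graph propagation mentioned above is, in standard GCN form, H' = ReLU(A_norm H W) with a symmetrically normalized adjacency; a minimal sketch follows (the inter-graph harmonization across the tensor's graph slices is omitted):

```python
import numpy as np

def gcn_layer(A: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One intra-graph propagation step, H' = ReLU(A_norm @ H @ W).

    A: (n, n) adjacency of one text graph, H: (n, f) node features,
    W: (f, f') layer weights. TensorGCN would apply this to each
    graph slice (semantic, syntactic, sequential) and then exchange
    information across slices in an inter-graph step, omitted here.
    """
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))  # D^{-1/2}
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)         # ReLU activation
```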
7. Dialectal Layers in West Iranian: a Hierarchical Dirichlet Process Approach to Linguistic Relationships [PDF] Back to Contents
Chundra Aroor Cathcart
Abstract: This paper addresses a series of complex and unresolved issues in the historical phonology of West Iranian languages. The West Iranian languages (Persian, Kurdish, Balochi, and other languages) display a high degree of non-Lautgesetzlich behavior. Most of this irregularity is undoubtedly due to language contact; we argue, however, that an oversimplified view of the processes at work has prevailed in the literature on West Iranian dialectology, with specialists assuming that deviations from an expected outcome in a given non-Persian language are due to lexical borrowing from some chronological stage of Persian. It is demonstrated that this qualitative approach yields at times problematic conclusions stemming from the lack of explicit probabilistic inferences regarding the distribution of the data: Persian may not be the sole donor language; additionally, borrowing at the lexical level is not always the mechanism that introduces irregularity. In many cases, the possibility that West Iranian languages show different reflexes in different conditioning environments remains under-explored. We employ a novel Bayesian approach designed to overcome these problems and tease apart the different determinants of irregularity in patterns of West Iranian sound change. Our methodology allows us to provisionally resolve a number of outstanding questions in the literature on West Iranian dialectology concerning the dialectal affiliation of certain sound changes. We outline future directions for work of this sort.
8. Urdu-English Machine Transliteration using Neural Networks [PDF] Back to Contents
Usman Mohy ud Din
Abstract: Machine translation has gained much attention in recent years. It is a sub-field of computational linguistics which focuses on translating text from one language to another. Among different translation techniques, neural networks currently lead the domain with their capability of providing a single large neural network with attention mechanisms, sequence-to-sequence, and long-short term modelling. Despite significant progress in the domain of machine translation, the translation of out-of-vocabulary (OOV) words, which include technical terms, named entities, and foreign words, is still a challenge for current state-of-the-art translation systems, and the situation becomes even worse when translating between low resource languages or languages having different structures. Due to the morphological richness of a language, a word may have different meanings in different contexts. In such scenarios, translating the word alone is not enough to provide a correct, high-quality translation. Transliteration is a way to consider the context of a word/sentence during translation. For a low resource language like Urdu, it is very difficult to have/find a parallel corpus for transliteration which is large enough to train the system. In this work, we present a transliteration technique based on Expectation Maximization (EM) which is unsupervised and language independent. The system learns patterns and out-of-vocabulary (OOV) words from a parallel corpus, and there is no need to train it on a transliteration corpus explicitly. This approach is tested on three models of statistical machine translation (SMT), which include phrase-based, hierarchical phrase-based, and factor-based models, and two models of neural machine translation, which include LSTM and transformer models.
9. Language Models Are An Effective Patient Representation Learning Technique For Electronic Health Record Data [PDF] Back to Contents
Ethan Steinberg, Ken Jung, Jason A. Fries, Conor K. Corbin, Stephen R. Pfohl, Nigam H. Shah
Abstract: Widespread adoption of electronic health records (EHRs) has fueled development of clinical outcome models using machine learning. However, patient EHR data are complex, and how to optimally represent them is an open question. This complexity, along with the often small training set sizes available to train these clinical outcome models, poses two core challenges for training high quality models. In this paper, we demonstrate that learning generic representations from the data of all the patients in the EHR enables better performing prediction models for clinical outcomes, allowing these challenges to be overcome. We adapt common representation learning techniques used in other domains and find that representations inspired by language models enable a 3.5% mean improvement in AUROC on five clinical outcomes compared to standard baselines, with the average improvement rising to 19% when only a small number of patients are available for training a prediction model for a given clinical outcome.
10. The empirical structure of word frequency distributions [PDF] Back to Contents
Michael Ramscar
Abstract: The frequencies at which individual words occur across languages follow power law distributions, a pattern of findings known as Zipf's law. A vast literature argues over whether this serves to optimize the efficiency of human communication, however this claim is necessarily post hoc, and it has been suggested that Zipf's law may in fact describe mixtures of other distributions. From this perspective, recent findings that Sinosphere first (family) names are geometrically distributed are notable, because this is actually consistent with information theoretic predictions regarding optimal coding. First names form natural communicative distributions in most languages, and I show that when analyzed in relation to the communities in which they are used, first name distributions across a diverse set of languages are both geometric and, historically, remarkably similar, with power law distributions only emerging when empirical distributions are aggregated. I then show this pattern of findings replicates in communicative distributions of English nouns and verbs. These results indicate that if lexical distributions support efficient communication, they do so because their functional structures directly satisfy the constraints described by information theory, and not because of Zipf's law. Understanding the function of these information structures is likely to be key to explaining humankind's remarkable communicative capacities.
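The two distribution families contrasted in the abstract, over frequency ranks r (Zipf) and ranks k (geometric), are:

```latex
P_{\mathrm{Zipf}}(r) \;=\; \frac{r^{-\alpha}}{\sum_{r'=1}^{N} r'^{-\alpha}},
\qquad
P_{\mathrm{Geom}}(k) \;=\; (1-p)^{\,k-1}\,p, \quad k = 1, 2, \ldots
```

The abstract's claim is that individual communicative distributions are geometric, and the familiar power-law shape appears only once such empirical distributions are aggregated.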
11. Exploring and Improving Robustness of Multi Task Deep Neural Networks via Domain Agnostic Defenses [PDF] Back to Contents
Kashyap Coimbatore Murali
Abstract: In this paper, we explore the robustness of the Multi-Task Deep Neural Networks (MT-DNN) against non-targeted adversarial attacks across Natural Language Understanding (NLU) tasks, as well as some possible ways to defend against them. Liu et al. have shown that the Multi-Task Deep Neural Network, due to the regularization effect produced when training as a result of its cross-task data, is more robust than a vanilla BERT model trained on only one task (1.1%-1.5% absolute difference). We further show that although the MT-DNN has generalized better, making it easily transferable across domains and tasks, it can still be compromised, as after only 2 attacks (1-character and 2-character) the accuracy drops by 42.05% and 32.24% for the SNLI and SciTail tasks. Finally, we propose a domain agnostic defense which restores the model's accuracy (36.75% and 25.94% respectively), as opposed to a general-purpose defense or an off-the-shelf spell checker.
12. Detecting New Word Meanings: A Comparison of Word Embedding Models in Spanish [PDF] Back to Contents
Andrés Torres-Rivera, Juan-Manuel Torres-Moreno
Abstract: Semantic neologisms (SN) are defined as words that acquire a new word meaning while maintaining their form. Given the nature of this kind of neologisms, the task of identifying these new word meanings is currently performed manually by specialists at observatories of neology. To detect SN in a semi-automatic way, we developed a system that implements a combination of the following strategies: topic modeling, keyword extraction, and word sense disambiguation. The role of topic modeling is to detect the themes that are treated in the input text. Themes within a text give clues about the particular meaning of the words that are used, for example: viral has one meaning in the context of computer science (CS) and another when talking about health. To extract keywords, we used TextRank with POS tag filtering. With this method, we can obtain relevant words that are already part of the Spanish lexicon. We use a deep learning model to determine if a given keyword could have a new meaning. Embeddings that are different from all the known meanings (or topics) indicate that a word might be a valid SN candidate. In this study, we examine the following word embedding models: Word2Vec, Sense2Vec, and FastText. The models were trained with equivalent parameters using Wikipedia in Spanish as corpora. Then we used a list of words and their concordances (obtained from our database of neologisms) to show the different embeddings that each model yields. Finally, we present a comparison of these outcomes with the concordances of each word to show how we can determine if a word could be a valid candidate for SN.
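The final embedding test described above (a word is flagged when its embedding is far from every known sense) could be sketched as follows; the cosine threshold and the function names are hypothetical, chosen only for illustration.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_sn_candidate(word_vec: np.ndarray,
                    known_sense_vecs: list[np.ndarray],
                    threshold: float = 0.4) -> bool:
    """Flag a word as a possible semantic neologism (SN) when its
    contextual embedding is dissimilar from every known-sense
    embedding. `threshold` is a hypothetical cutoff that would need
    tuning per embedding model and corpus.
    """
    return all(cosine(word_vec, s) < threshold for s in known_sense_vecs)
```

In this setup the known-sense vectors would come from the Word2Vec, Sense2Vec, or FastText models the study compares.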
13. Improving Spoken Language Understanding By Exploiting ASR N-best Hypotheses [PDF] Back to Contents
Mingda Li, Weitong Ruan, Xinyue Liu, Luca Soldaini, Wael Hamza, Chengwei Su
Abstract: In a modern spoken language understanding (SLU) system, the natural language understanding (NLU) module takes the interpretations of a speech from the automatic speech recognition (ASR) module as input. The NLU module usually uses the first best interpretation of a given speech for downstream tasks such as domain and intent classification. However, the ASR module might misrecognize some speeches, and the first best interpretation could be erroneous and noisy. Solely relying on the first best interpretation could make the performance of downstream tasks non-optimal. To address this issue, we introduce a series of simple yet efficient models for improving the understanding of the semantics of the input speeches by collectively exploiting the n-best speech interpretations from the ASR module.
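One simple way to exploit the n-best list collectively, in the spirit of the abstract, is to concatenate the hypotheses into a single NLU input; this is an illustrative baseline, and the separator token and helper function are hypothetical rather than the paper's exact models.

```python
def combine_n_best(hypotheses: list[str], sep: str = " [SEP] ") -> str:
    """Join ASR n-best hypotheses into a single NLU input so a
    downstream domain/intent classifier can read across alternative
    recognitions instead of trusting only the 1-best. An illustrative
    baseline; the paper evaluates several integration models.
    """
    return sep.join(hypotheses)

# Example: three ASR hypotheses for one utterance.
nbest = ["play the beetles", "play the beatles", "play the battles"]
nlu_input = combine_n_best(nbest)  # "play the beetles [SEP] play the beatles [SEP] ..."
```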
14. FGN: Fusion Glyph Network for Chinese Named Entity Recognition [PDF] Back to Contents
Zhenyu Xuan, Rui Bao, Chuyu Ma, Shengyi Jiang
Abstract: Chinese NER is a challenging task. As pictographs, Chinese characters contain latent glyph information, which is often overlooked. We propose FGN, a Fusion Glyph Network for Chinese NER. This method offers glyph information for fusion representation learning with BERT. The major innovations of FGN include: (1) a novel CNN structure called CGS-CNN, proposed to capture glyph information from both character graphs and their neighboring graphs; (2) a method with a sliding window and Slice-Attention to extract interactive information between the BERT representation and the glyph representation. Experiments are conducted on four NER datasets, showing that FGN with an LSTM-CRF tagger achieves new state-of-the-art performance for Chinese NER. Further experiments investigate the influences of various components and settings in FGN.
15. A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation [PDF] Back to Contents
Jian Guan, Fei Huang, Zhihao Zhao, Xiaoyan Zhu, Minlie Huang
Abstract: Story generation, namely generating a reasonable story from a leading context, is an important but challenging task. In spite of the success in modeling fluency and local coherence, existing neural language generation models (e.g., GPT-2) still suffer from repetition, logic conflicts, and lack of long-range coherence in generated stories. We conjecture that this is because of the difficulty of associating relevant commonsense knowledge, understanding the causal relationships, and planning entities and events with proper temporal order. In this paper, we devise a knowledge-enhanced pretraining model for commonsense story generation. We propose to utilize commonsense knowledge from external knowledge bases to generate reasonable stories. To further capture the causal and temporal dependencies between the sentences in a reasonable story, we employ multi-task learning which combines a discriminative objective to distinguish true and fake stories during fine-tuning. Automatic and manual evaluation shows that our model can generate more reasonable stories than state-of-the-art baselines, particularly in terms of logic and global coherence.
16. Parallel Machine Translation with Disentangled Context Transformer [PDF] Back to Contents
Jungo Kasai, James Cross, Marjan Ghazvininejad, Jiatao Gu
Abstract: State-of-the-art neural machine translation models generate a translation from left to right and every step is conditioned on the previously generated tokens. The sequential nature of this generation process causes fundamental latency in inference since we cannot generate multiple tokens in each sentence in parallel. We propose an attention-masking based model, called Disentangled Context (DisCo) transformer, that simultaneously generates all tokens given different contexts. The DisCo transformer is trained to predict every output token given an arbitrary subset of the other reference tokens. We also develop the parallel easy-first inference algorithm, which iteratively refines every token in parallel and reduces the number of required iterations. Our extensive experiments on 7 directions with varying data sizes demonstrate that our model achieves competitive, if not better, performance compared to the state of the art in non-autoregressive machine translation while significantly reducing decoding time on average.
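Below is a schematic of the parallel, iterative inference style described above, where every target position is predicted at once and then refined; the model interface, masking token, and convergence test are assumptions, and the actual easy-first algorithm additionally conditions each position on the more confident tokens from the previous iteration.

```python
import torch

def parallel_refine(model, src, tgt_len: int, mask_id: int, n_iters: int = 10):
    """Predict all target tokens at once, then iteratively refine.

    `model(src, tgt)` is a hypothetical non-autoregressive decoder
    returning (tgt_len, vocab) logits; real DisCo inference also
    conditions each position on the more confident ("easier") tokens
    and typically needs only a few iterations.
    """
    tgt = torch.full((tgt_len,), mask_id, dtype=torch.long)
    for _ in range(n_iters):
        preds = model(src, tgt).argmax(dim=-1)  # refine every position in parallel
        if torch.equal(preds, tgt):             # stop once predictions converge
            break
        tgt = preds
    return tgt
```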
17. Robust Speaker Recognition Using Speech Enhancement And Attention Model [PDF] Back to Contents
Yanpei Shi, Qiang Huang, Thomas Hain
Abstract: In this paper, a novel architecture for speaker recognition is proposed by cascading speech enhancement and speaker processing. Its aim is to improve speaker recognition performance when speech signals are corrupted by noise. Instead of individually processing speech enhancement and speaker recognition, the two modules are integrated into one framework by joint optimisation using deep neural networks. Furthermore, to increase robustness against noise, a multi-stage attention mechanism is employed to highlight speaker-related features learned from context information in the time and frequency domains. To evaluate speaker identification and verification performance of the proposed approach, we test it on the VoxCeleb1 dataset, one of the most widely used benchmark datasets. Moreover, the robustness of our proposed approach is also tested on VoxCeleb1 data corrupted by three types of interference (general noise, music, and babble) at different signal-to-noise ratio (SNR) levels. The obtained results show that the proposed approach using speech enhancement and multi-stage attention models outperforms two strong baselines that do not use them in most acoustic conditions in our experiments.
18. A Tree Adjoining Grammar Representation for Models Of Stochastic Dynamical Systems [PDF] Back to Contents
Dhruv Khandelwal, Maarten Schoukens, Roland Tóth
Abstract: Model structure and complexity selection remains a challenging problem in system identification, especially for parametric non-linear models. Many Evolutionary Algorithm (EA) based methods have been proposed in the literature for estimating model structure and complexity. In most cases, the proposed methods are devised for estimating structure and complexity within a specified model class and hence these methods do not extend to other model structures without significant changes. In this paper, we propose a Tree Adjoining Grammar (TAG) for stochastic parametric models. TAGs can be used to generate models in an EA framework while imposing desirable structural constraints and incorporating prior knowledge. In this paper, we propose a TAG that can systematically generate models ranging from FIRs to polynomial NARMAX models. Furthermore, we demonstrate that TAGs can be easily extended to more general model classes, such as the non-linear Box-Jenkins model class, enabling the realization of flexible and automatic model structure and complexity selection via EA.
摘要:模型的结构和复杂的选择仍然在系统识别一个具有挑战性的问题,尤其是对于参数非线性模型。许多进化算法(EA)为基础的方法在文献中已经提出了用于估计模型结构和复杂性。在大多数情况下,所提出的方法被设计为在指定模型类内估计结构和复杂性,并因此这些方法不延伸到其他模型结构而不显著变化。在本文中,我们提出了一个树连接语法(TAG)为随机参数模型。标签可以用于在EA框架来生成模型,同时施加理想的结构约束和结合先验知识。在本文中,我们提出了一个标记,可以系统地生成模型,从FIR的多项式NARMAX模型。此外,我们证明,标签可以容易地扩展到更一般的模型类,诸如非线性箱Jenkins模型类,可实现灵活和自动模型结构和复杂选择经由EA实现。
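As a rough illustration of grammar-driven model-structure generation, and a deliberately simplified stand-in for the paper's TAG rather than a reproduction of it, this sketch randomly grows sets of polynomial NARX regressors of the kind an EA population could contain; all names and parameter values are our assumptions.

import random

random.seed(0)

def random_term(max_lag: int = 3, max_degree: int = 2) -> tuple:
    """One polynomial regressor: a product of lagged input/output signals."""
    degree = random.randint(1, max_degree)
    return tuple(sorted(
        (random.choice("uy"), random.randint(1, max_lag))  # (signal, lag)
        for _ in range(degree)
    ))

def random_model(n_terms: int = 4) -> set:
    """A candidate model structure: a set of distinct regressors."""
    return {random_term() for _ in range(n_terms)}

def pretty(model: set) -> str:
    return " + ".join(
        "*".join(f"{s}(k-{lag})" for s, lag in term) for term in sorted(model)
    )

# One candidate in a hypothetical EA population, e.g. "u(k-1)*y(k-2) + y(k-3)".
print(pretty(random_model()))

A real TAG constrains such generation with production rules so that every derivable structure stays inside a well-defined model class, which is what enables the systematic sweep from FIR to polynomial NARMAX described above.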
19. Auto Completion of User Interface Layout Design Using Transformer-Based Tree Decoders [PDF]
Yang Li, Julien Amelot, Xin Zhou, Samy Bengio, Si Si
Abstract: There has been increasing interest in the field in developing automatic machinery to facilitate the design process. In this paper, we focus on assisting graphical user interface (UI) layout design, a crucial task in app development. Given a partial layout that a designer has entered, our model learns to complete the layout by predicting the remaining UI elements, with correct positions and dimensions, as well as the hierarchical structure. Such automation will significantly ease the effort of UI designers and developers. While we focus on interface layout prediction, our model is generally applicable to other layout prediction problems that involve tree structures and 2-dimensional placements. In particular, we design two versions of Transformer-based tree decoders, Pointer and Recursive Transformer, and experiment with these models on a public dataset. We also propose several metrics for measuring the accuracy of tree prediction and ground these metrics in the domain of user experience. This work contributes a new task and new methods to deep learning research.
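One plausible, hypothetical way to represent such layouts for a sequence or tree decoder (not the paper's actual encoding) is a typed tree whose depth-first serialization carries element types, geometry, and hierarchy; the element names and coordinates below are made up.

from dataclasses import dataclass, field
from typing import List

@dataclass
class UINode:
    element: str    # e.g. "TOOLBAR", "BUTTON"
    x: int          # position and dimension
    y: int
    w: int
    h: int
    children: List["UINode"] = field(default_factory=list)

def serialize(node: UINode) -> List[str]:
    """Depth-first flattening; bracket tokens make the hierarchy explicit."""
    tokens = [node.element, f"<{node.x},{node.y},{node.w},{node.h}>", "("]
    for child in node.children:
        tokens += serialize(child)
    tokens.append(")")
    return tokens

# Example partial layout: a toolbar containing one button.
root = UINode("TOOLBAR", 0, 0, 360, 48,
              children=[UINode("BUTTON", 8, 8, 64, 32)])
print(" ".join(serialize(root)))
# -> TOOLBAR <0,0,360,48> ( BUTTON <8,8,64,32> ( ) )

Under such an encoding, layout completion reduces to continuing the token sequence from the prefix corresponding to the designer's partial tree.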
20. Teddy: A System for Interactive Review Analysis [PDF]
Xiong Zhang, Jonathan Engel, Sara Evensen, Yuliang Li, Çağatay Demiralp, Wang-Chiew Tan
Abstract: Reviews are integral to e-commerce services and products. They contain a wealth of information about the opinions and experiences of users, which can help better understand consumer decisions and improve user experience with products and services. Today, data scientists analyze reviews by developing rules and models to extract, aggregate, and understand information embedded in the review text. However, working with thousands of reviews, which are typically noisy, incomplete text, can be daunting without proper tools. Here we first contribute results from an interview study that we conducted with fifteen data scientists who work with review text, providing insights into their practices and challenges. The results suggest that data scientists need interactive systems for many review analysis tasks. In response, we introduce Teddy, an interactive system that enables data scientists to quickly obtain insights from reviews and improve their extraction and modeling pipelines.
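As a toy illustration of the kind of rule-based extract-and-aggregate pass such a system helps data scientists iterate on (this is not Teddy's own pipeline; the keyword lists and reviews are invented), consider keyword-based aspect extraction over a handful of reviews:

from collections import Counter

ASPECTS = {
    "battery": ["battery", "charge"],
    "screen":  ["screen", "display"],
    "price":   ["price", "expensive", "cheap"],
}

def extract_aspects(review: str) -> set:
    """Return the set of aspects a review mentions."""
    text = review.lower()
    return {a for a, kws in ASPECTS.items() if any(k in text for k in kws)}

reviews = [
    "Battery life is great but the screen scratches easily.",
    "Way too expensive for what you get.",
    "Screen is bright; battery drains fast.",
]

# Aggregate: how often each aspect is discussed across the corpus.
counts = Counter(a for r in reviews for a in extract_aspects(r))
print(counts)  # Counter({'battery': 2, 'screen': 2, 'price': 1})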
21. Modeling Product Search Relevance in e-Commerce [PDF]
Rahul Radhakrishnan Iyer, Rohan Kohli, Shrimai Prabhumoye
Abstract: With the rapid growth of e-Commerce, online product search has emerged as a popular and effective paradigm for customers to find desired products and engage in online shopping. However, there is still a big gap between the products that customers really desire to purchase and the relevance of the products suggested in response to a customer's query. In this paper, we propose a robust way of predicting relevance scores given a search query and a product, using techniques involving machine learning, natural language processing, and information retrieval. We compare conventional information retrieval models such as BM25 and Indri with deep learning models such as word2vec, sentence2vec, and paragraph2vec. We share some of our insights and findings from our experiments.
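For concreteness, here is a self-contained sketch of the classical BM25 scoring used as a baseline in such comparisons; the toy corpus and the parameter values (k1 = 1.5, b = 0.75, Lucene-style smoothed idf) are our assumptions. The embedding-side models would instead rank products by, for example, cosine similarity of averaged word vectors.

import math
from collections import Counter

docs = [["red", "running", "shoes"],
        ["blue", "running", "shorts"],
        ["red", "dress", "shoes"]]
query = ["red", "shoes"]

def bm25_scores(query, docs, k1=1.5, b=0.75):
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))  # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

print(bm25_scores(query, docs))  # docs 0 and 2 tie for the top score; doc 1 gets 0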