
[arXiv Papers] Computation and Language 2020-08-27

Contents

1. Discrete Word Embedding for Logical Natural Language Understanding [PDF] Abstract
2. Language Models and Word Sense Disambiguation: An Overview and Analysis [PDF] Abstract
3. Inno at SemEval-2020 Task 11: Leveraging Pure Transformer for Multi-Class Propaganda Detection [PDF] Abstract
4. Machine learning approach of Japanese composition scoring and writing aided system's design [PDF] Abstract
5. Decision Tree J48 at SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text (Hinglish) [PDF] Abstract
6. Concept Extraction Using Pointer-Generator Networks [PDF] Abstract
7. Generating (Factual?) Narrative Summaries of RCTs: Experiments with Neural Multi-Document Summarization [PDF] Abstract
8. Extractive Summarizer for Scholarly Articles [PDF] Abstract
9. The Impact of Indirect Machine Translation on Sentiment Classification [PDF] Abstract
10. A Multitask Deep Learning Approach for User Depression Detection on Sina Weibo [PDF] Abstract
11. Multi-Label Sentiment Analysis on 100 Languages with Dynamic Weighting for Label Imbalance [PDF] Abstract
12. Item Tagging for Information Retrieval: A Tripartite Graph Neural Network based Approach [PDF] Abstract

Abstracts

1. Discrete Word Embedding for Logical Natural Language Understanding [PDF] Back to Contents
  Masataro Asai, Zilu Tang
Abstract: In this paper, we propose an unsupervised neural model for learning a discrete embedding of words. While discrete, our embedding supports vector arithmetic operations similar to those of continuous embeddings by interpreting each word as a set of propositional statements describing a rule. The formulation of our vector arithmetic closely reflects the logical structure originating from the symbolic sequential decision-making formalism (classical/STRIPS planning). Contrary to the conventional wisdom that discrete representations cannot perform well because they lack the ability to capture uncertainty, our representation is competitive with continuous representations on several downstream tasks. We demonstrate that our embedding is directly compatible with symbolic, classical planning solvers by performing a "paraphrasing" task. Because classical algorithms make discrete/logical decisions with deterministic (non-probabilistic) completeness, and because the system requires no additional training on the paraphrasing dataset, it can answer a paraphrasing query in the negative (i.e., that no solution exists) or report that only approximate solutions exist -- a feature missing from recent, huge, purely neural language models such as GPT-3.
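As an illustration of the core idea, here is a minimal sketch (not the authors' implementation) of vector arithmetic over discrete embeddings: each word is a binary vector of propositional statements, and the analogy operation mirrors STRIPS-style add/delete effects. The vectors below are made up for illustration.

```python
import numpy as np

# Hypothetical binary embeddings: each dimension is a propositional
# statement that is either true (1) or false (0) for the word.
emb = {
    "king":  np.array([1, 0, 1, 0, 1], dtype=np.int8),
    "man":   np.array([0, 0, 1, 0, 1], dtype=np.int8),
    "woman": np.array([0, 1, 1, 0, 1], dtype=np.int8),
}

def discrete_analogy(a, b, c):
    """king - man + woman: delete propositions of b, add those of c.

    Mirrors STRIPS-style add/delete effects; clipping keeps the result
    binary even when a proposition is deleted that was never true.
    """
    return np.clip(a - b + c, 0, 1)

result = discrete_analogy(emb["king"], emb["man"], emb["woman"])
print(result)  # a binary vector; decode by Hamming distance to known words
```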

2. Language Models and Word Sense Disambiguation: An Overview and Analysis [PDF] Back to Contents
  Daniel Loureiro, Kiamehr Rezaee, Mohammad Taher Pilehvar, Jose Camacho-Collados
Abstract: Transformer-based language models have taken many fields in NLP by storm. BERT and its derivatives dominate most of the existing evaluation benchmarks, including those for Word Sense Disambiguation (WSD), thanks to their ability to capture context-sensitive semantic nuances. However, there is still little knowledge about their capabilities and potential limitations in encoding and recovering word senses. In this article, we provide an in-depth quantitative and qualitative analysis of the celebrated BERT model with respect to lexical ambiguity. One of the main conclusions of our analysis is that BERT does a decent job of capturing high-level sense distinctions, even when a limited number of examples is available for each word sense. Our analysis also reveals that in some cases language models come close to solving coarse-grained noun disambiguation under ideal conditions in terms of availability of training data and computing resources. However, this scenario rarely occurs in real-world settings and, hence, many practical challenges remain even in the coarse-grained setting. We also perform an in-depth comparison of the two main language-model-based WSD strategies, i.e., fine-tuning and feature extraction, finding that the latter approach is more robust with respect to sense bias and can better exploit limited available training data.
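The feature-extraction strategy the authors find more robust is commonly implemented as nearest-neighbor matching against precomputed sense embeddings. A rough sketch of that pipeline with Hugging Face transformers follows; the model choice, pooling, and the sense-centroid data are assumptions for illustration, not the paper's specification.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed model; the paper analyzes BERT but does not prescribe this one.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def target_embedding(sentence: str, target: str) -> torch.Tensor:
    """Contextual embedding of `target`: mean of its subword hidden states.

    Naive id-sequence match; a word that tokenizes differently in context
    (e.g. as continuation pieces) would need real alignment handling.
    """
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]          # (seq_len, 768)
    target_ids = tokenizer(target, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    for i in range(len(ids) - len(target_ids) + 1):
        if ids[i:i + len(target_ids)] == target_ids:
            return hidden[i:i + len(target_ids)].mean(dim=0)
    raise ValueError("target not found in tokenized sentence")

# With hypothetical per-sense centroids (averages of annotated examples,
# e.g. from SemCor), the predicted sense is the most cosine-similar one:
# best = max(centroids, key=lambda s: torch.cosine_similarity(
#     target_embedding("I sat by the bank of the river", "bank"),
#     centroids[s], dim=0))
```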

3. Inno at SemEval-2020 Task 11: Leveraging Pure Transformer for Multi-Class Propaganda Detection [PDF] Back to Contents
  Dmitry Grigorev, Vladimir Ivanov
Abstract: The paper presents the solution of team "Inno" to a SEMEVAL 2020 task 11 "Detection of propaganda techniques in news articles". The goal of the second subtask is to classify textual segments that correspond to one of the 18 given propaganda techniques in news articles dataset. We tested a pure Transformer-based model with an optimized learning scheme on the ability to distinguish propaganda techniques between each other. Our model showed 0.6 and 0.58 overall F1 score on validation set and test set accordingly and non-zero F1 score on each class on both sets.
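The per-class claim (non-zero F1 on all 18 techniques) is exactly what a per-class F1 computation makes visible. A small sketch using scikit-learn on toy labels; the paper's own tooling is not stated, so this is only one way to run the check.

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy gold/predicted technique ids for a handful of text segments.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]

per_class = f1_score(y_true, y_pred, average=None)   # one F1 per class
overall = f1_score(y_true, y_pred, average="micro")  # one aggregate score
assert np.all(per_class > 0), "some class has zero F1"
print(per_class, overall)
```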

4. Machine learning approach of Japanese composition scoring and writing aided system's design [PDF] Back to Contents
  Wanhong Huang
Abstract: Automatic scoring systems are extremely complex for any language, because natural language itself is a complex model. When we evaluate articles written in natural language, we need to view them along many dimensions, such as word features, grammatical features, semantic features, text structure, and so on. Even human beings sometimes cannot grade a composition consistently, because different people have different opinions about the same article. However, a composition scoring system can greatly assist language learners: it lets them improve themselves in the process of producing output. Although it is still difficult for machines to evaluate a composition directly at the semantic and pragmatic levels, especially for Japanese, Chinese, and other languages of high-context cultures, a machine can evaluate a passage at the word and grammar levels, which can assist composition raters or language learners. For foreign-language learners in particular, lexical and syntactic content is usually what they are most concerned about. In our experiments, we did the following work: 1) We use word segmentation tools and dictionaries to segment an article, extract word features, and generate a word-complexity feature for the article; bag-of-words (BoW) techniques are used to extract theme features. 2) We designed a Turing-complete automaton model and created 300+ automata for the grammar patterns that appear in the JLPT examination, and we extract grammar features using these automata. 3) We propose a statistical approach to scoring compositions on a specified theme, where the final score depends on all the writings submitted to the system. 4) We designed a grammar hint function for language learners, so that they can know which grammar patterns they can currently use.
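As a sketch of point 2, a single grammar pattern can be matched by a small finite-state automaton over segmented, tagged tokens. The pattern (te-form verb + いる, progressive aspect), the tag names, and the tokenization below are hypothetical; the authors' automaton model and JLPT grammar inventory are far larger.

```python
# Input is assumed pre-segmented into (surface, tag) pairs, e.g. by MeCab;
# the tag names here are invented for the sketch.

def match_te_iru(tokens):
    """Finite-state match: state 0 = start, state 1 = saw a te-form verb."""
    state = 0
    for surface, tag in tokens:
        if tag == "verb_te_form":
            state = 1
        elif state == 1 and surface == "いる":
            return True          # accepting state reached
        else:
            state = 0            # any other token resets the automaton
    return False

print(match_te_iru([("走って", "verb_te_form"), ("いる", "verb")]))  # True
print(match_te_iru([("走った", "verb_past"), ("いる", "verb")]))      # False
```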

5. Decision Tree J48 at SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text (Hinglish) [PDF] Back to Contents
  Gaurav Singh
Abstract: This paper discusses the design of the system used to provide a solution to the problem given at SemEval-2020 Task 9, where sentiment analysis of the code-mixed languages Hindi and English needed to be performed. The system uses Weka as the tool providing the classifier for the classification of tweets, and Python is used to load the data from the provided files and clean it. Only part of the training data was provided to the system for classifying the tweets in the test data set, on which the evaluation of the system was performed. System performance was assessed using the official competition evaluation metric, the F1 score. The classifier was trained on two sets of training data, which resulted in F1 scores of 0.4972 and 0.5316.
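The abstract does not list the cleaning rules, so the Python sketch below only illustrates typical preprocessing of code-mixed tweets before handing the data to Weka's J48 (a C4.5 decision-tree implementation); every rule here is an assumption.

```python
import re

def clean_tweet(text: str) -> str:
    """Hypothetical cleaning steps for code-mixed (Hinglish) tweets."""
    text = re.sub(r"https?://\S+", "", text)   # drop URLs
    text = re.sub(r"[@#]\w+", "", text)        # drop mentions and hashtags
    text = re.sub(r"[^\w\s]", " ", text)       # strip punctuation/emoji
    return re.sub(r"\s+", " ", text).strip().lower()

print(clean_tweet("Yaar @friend this movie was mast! http://t.co/xyz"))
# -> "yaar this movie was mast"
```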

6. Concept Extraction Using Pointer-Generator Networks [PDF] Back to Contents
  Alexander Shvets, Leo Wanner
Abstract: Concept extraction is crucial for a number of downstream applications. However, surprisingly enough, straightforward single token/nominal chunk-concept alignment or dictionary lookup techniques such as DBpedia Spotlight still prevail. We propose a generic open-domain OOV-oriented extractive model that is based on distant supervision of a pointer-generator network leveraging bidirectional LSTMs and a copy mechanism. The model has been trained on a large annotated corpus compiled specifically for this task from 250K Wikipedia pages, and tested on regular pages, where the pointers to other pages are considered as ground truth concepts. The outcome of the experiments shows that our model significantly outperforms standard techniques and, when used on top of DBpedia Spotlight, further improves its performance. The experiments furthermore show that the model can be readily ported to other datasets on which it equally achieves a state-of-the-art performance.
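The copy mechanism at the core of a pointer-generator network mixes the decoder's vocabulary distribution with the attention distribution over source tokens. A minimal PyTorch sketch of that final distribution; the shapes and random tensors are toy stand-ins, not the authors' code.

```python
import torch

def final_distribution(p_gen, vocab_dist, attn, src_ids):
    """P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum_i attn_i * [src_i == w]."""
    gen = p_gen * vocab_dist                    # (batch, vocab_size)
    copy = (1.0 - p_gen) * attn                 # (batch, src_len)
    return gen.scatter_add(1, src_ids, copy)    # scatter copy mass onto vocab

batch, src_len, vocab_size = 2, 5, 100
vocab_dist = torch.softmax(torch.randn(batch, vocab_size), dim=1)
attn = torch.softmax(torch.randn(batch, src_len), dim=1)
src_ids = torch.randint(0, vocab_size, (batch, src_len))
p_gen = torch.rand(batch, 1)   # generation probability from the decoder state

out = final_distribution(p_gen, vocab_dist, attn, src_ids)
print(out.sum(dim=1))  # each row sums to 1.0
```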

7. Generating (Factual?) Narrative Summaries of RCTs: Experiments with Neural Multi-Document Summarization [PDF] Back to Contents
  Byron C. Wallace, Sayantan Saha, Frank Soboczenski, Iain J. Marshall
Abstract: We consider the problem of automatically generating a narrative biomedical evidence summary from multiple trial reports. We evaluate modern neural models for abstractive summarization of relevant article abstracts from systematic reviews previously conducted by members of the Cochrane collaboration, using the authors' conclusions section of the review abstract as our target. We enlist medical professionals to evaluate generated summaries, and we find that modern summarization systems yield consistently fluent and relevant synopses, but that they are not always factual. We propose new approaches that capitalize on domain-specific models to inform summarization, e.g., by explicitly demarcating snippets of inputs that convey key findings, and by emphasizing the reports of large and high-quality trials. We find that these strategies modestly improve the factual accuracy of generated summaries. Finally, we propose a new method for automatically evaluating the factuality of generated narrative evidence syntheses using models that infer the directionality of reported findings.
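One of the proposed strategies, explicitly demarcating input snippets that convey key findings, can be as simple as wrapping flagged sentences in special markers before the concatenated input reaches the summarizer. A sketch with hypothetical marker tokens; the abstract does not specify the actual demarcation scheme.

```python
def mark_findings(sentences, is_finding):
    """Wrap sentences flagged as key findings in (hypothetical) markers."""
    return " ".join(
        f"<finding> {s} </finding>" if flag else s
        for s, flag in zip(sentences, is_finding)
    )

print(mark_findings(
    ["The trial enrolled 120 patients.", "Drug X reduced mortality by 12%."],
    [False, True],
))
```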

8. Extractive Summarizer for Scholarly Articles [PDF] Back to Contents
  Athar Sefid, Clyde Lee Giles, Prasenjit Mitra
Abstract: We introduce an extractive method for summarizing long scientific papers. Our model uses presentation slides provided by the authors of the papers as gold-standard summaries for labeling the sentences. The sentences are ranked based on their novelty and their importance as estimated by deep neural networks. Our window-based extractive labeling of sentences yields an improvement of at least 4 ROUGE-1 recall points.
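Since the result is reported in ROUGE-1 recall points, here is a self-contained sketch of that metric: clipped unigram overlap divided by reference length. Real evaluations normally use a standard ROUGE package with stemming and stopword options.

```python
from collections import Counter

def rouge1_recall(candidate: str, reference: str) -> float:
    """Clipped unigram overlap over the reference token count."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / max(sum(ref.values()), 1)

print(rouge1_recall(
    "the model ranks sentences by novelty and importance",
    "sentences are ranked by novelty and importance",
))  # 5 of 7 reference unigrams recovered -> ~0.714
```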

9. The Impact of Indirect Machine Translation on Sentiment Classification [PDF] Back to Contents
  Alberto Poncelas, Pintu Lohar, Andy Way, James Hadley
Abstract: Sentiment classification has been crucial for many natural language processing (NLP) applications, such as the analysis of movie reviews, tweets, or customer feedback. A sufficiently large amount of data is required to build a robust sentiment classification system. However, such resources are not always available for all domains or for all languages. In this work, we propose employing a machine translation (MT) system to translate customer feedback into another language to investigate in which cases translated sentences can have a positive or negative impact on an automatic sentiment classifier. Furthermore, as performing a direct translation is not always possible, we explore the performance of automatic classifiers on sentences that have been translated using a pivot MT system. We conduct several experiments using the above approaches to analyse the performance of our proposed sentiment classification system and discuss the advantages and drawbacks of classifying translated sentences.
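Indirect (pivot) translation chains two MT steps before classification. A sketch of the pipeline shape only; the MT call is left as a placeholder since the abstract does not name a specific system, and the language codes are illustrative.

```python
def translate(text: str, src: str, tgt: str) -> str:
    """Placeholder for an MT system call (a trained model or an API);
    the concrete system is an assumption, not specified by the abstract."""
    raise NotImplementedError

def pivot_translate(text: str, src: str, pivot: str, tgt: str) -> str:
    """Indirect (pivot) translation: src -> pivot -> tgt."""
    return translate(translate(text, src, pivot), pivot, tgt)

# Sentiment is then predicted on the pivot-translated text, e.g.:
# label = sentiment_classifier(pivot_translate(review, "de", "en", "fr"))
```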

10. A Multitask Deep Learning Approach for User Depression Detection on Sina Weibo [PDF] Back to Contents
  Yiding Wang, Zhenyi Wang, Chenghao Li, Yilin Zhang, Haizhou Wang
Abstract: In recent years, due to the mental burden of depression, the number of people who endanger their own lives has been increasing rapidly. Online social networks (OSNs) provide researchers with another perspective for detecting individuals suffering from depression. However, existing studies of depression detection based on machine learning still show relatively low classification performance, suggesting significant potential for improvement in their feature engineering. In this paper, we manually build a large dataset on Sina Weibo (a leading OSN with the largest number of active users in the Chinese community), namely the Weibo User Depression Detection Dataset (WU3D). It includes more than 20,000 normal users and more than 10,000 depressed users, both of which are manually labeled and rechecked by professionals. By analyzing users' text, social behavior, and posted pictures, ten statistical features are identified and proposed. In the meantime, text-based word features are extracted using the popular pretrained model XLNet. Moreover, a novel deep neural network classification model, FusionNet (FN), is proposed and trained simultaneously on the above-extracted features, which are treated as multiple classification tasks. The experimental results show that FusionNet achieves the highest F1-score of 0.9772 on the test dataset. Compared to existing studies, our proposed method has better classification performance and robustness for unbalanced training samples. Our work also provides a new way to detect depression on other OSN platforms.
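A rough sketch of the multitask idea: text features (e.g. from XLNet) and the ten statistical features each feed their own classification head, and the heads are trained jointly. The dimensions and two-head architecture below are illustrative assumptions, not the actual FusionNet design.

```python
import torch
import torch.nn as nn

class FusionSketch(nn.Module):
    """Illustrative multitask model: one head per feature group,
    trained with a joint loss. Not the real FusionNet architecture."""
    def __init__(self, text_dim=768, stat_dim=10, hidden=128):
        super().__init__()
        self.text_head = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, 2))
        self.stat_head = nn.Sequential(nn.Linear(stat_dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, 2))

    def forward(self, text_feat, stat_feat):
        return self.text_head(text_feat), self.stat_head(stat_feat)

model = FusionSketch()
loss_fn = nn.CrossEntropyLoss()
text_feat, stat_feat = torch.randn(4, 768), torch.randn(4, 10)  # toy batch
labels = torch.randint(0, 2, (4,))                # 0 = normal, 1 = depressed
logit_t, logit_s = model(text_feat, stat_feat)
loss = loss_fn(logit_t, labels) + loss_fn(logit_s, labels)  # joint objective
loss.backward()
```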

11. Multi-Label Sentiment Analysis on 100 Languages with Dynamic Weighting for Label Imbalance [PDF] Back to Contents
  Selim F. Yilmaz, E. Batuhan Kaynak, Aykut Koç, Hamdi Dibeklioğlu, Suleyman S. Kozat
Abstract: We investigate cross-lingual sentiment analysis, which has attracted significant attention due to its applications in various areas including market research, politics, and social sciences. In particular, we introduce a sentiment analysis framework in a multi-label setting that follows Plutchik's wheel of emotions. We introduce a novel dynamic weighting method that balances the contribution of each class during training, unlike previous static weighting methods that assign non-changing weights based on class frequency. Moreover, we adapt the focal loss, which favors harder instances, from the single-label object recognition literature to our multi-label setting. Furthermore, we derive a method to choose optimal class-specific thresholds that maximize the macro-F1 score in linear time complexity. Through an extensive set of experiments, we show that our method obtains state-of-the-art performance on 7 of 9 metrics across 3 different languages using a single model, compared to the common baselines and the best-performing methods in the SemEval competition. We publicly share the code for our model, which can perform sentiment analysis in 100 languages, to facilitate further research.
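A focal loss adapted to the multi-label setting treats each label as an independent binary problem. A PyTorch sketch of that adaptation; the paper's dynamic class weighting is not detailed in the abstract, so this version simply omits the weighting term it would replace.

```python
import torch

def multilabel_focal_loss(logits, targets, gamma=2.0):
    """FL = -(1 - p_t)^gamma * log(p_t), per label, averaged.

    Each label is an independent binary problem. The paper's dynamic
    class weighting (not specified in the abstract) would take the
    place of a static per-class alpha factor here.
    """
    p = torch.sigmoid(logits)
    p_t = torch.where(targets == 1, p, 1 - p)   # prob of the true outcome
    return (-(1 - p_t) ** gamma * torch.log(p_t.clamp(min=1e-8))).mean()

logits = torch.randn(4, 8, requires_grad=True)  # 4 samples, 8 emotion labels
targets = torch.randint(0, 2, (4, 8)).float()
multilabel_focal_loss(logits, targets).backward()
```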

12. Item Tagging for Information Retrieval: A Tripartite Graph Neural Network based Approach [PDF] Back to Contents
  Kelong Mao, Xi Xiao, Jieming Zhu, Biao Lu, Ruiming Tang, Xiuqiang He
Abstract: Tagging has been recognized as a successful practice to boost relevance matching for information retrieval (IR), especially when items lack rich textual descriptions. A lot of research has been done on multi-label text categorization and image annotation. However, there is a lack of published work that targets item tagging specifically for IR. Directly applying a traditional multi-label classification model to item tagging is sub-optimal, because it ignores characteristics unique to IR. In this work, we propose to formulate item tagging as a link prediction problem between item nodes and tag nodes. To enrich the representation of items, we leverage the query logs available in IR tasks and construct a query-item-tag tripartite graph. This formulation results in a TagGNN model that utilizes heterogeneous graph neural networks with multiple types of nodes and edges. Different from previous research, we also optimize both the full tag prediction and partial tag completion cases in a unified framework via a primary-dual loss mechanism. Experimental results on both open and industrial datasets show that our TagGNN approach outperforms state-of-the-art multi-label classification approaches.
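Link prediction between item and tag nodes is commonly scored as a similarity between node embeddings that a GNN produces. A toy sketch with random stand-in embeddings; TagGNN itself learns these representations from the query-item-tag tripartite graph, which this sketch does not model.

```python
import torch

# Random stand-ins for the node embeddings a trained GNN would output.
n_items, n_tags, dim = 5, 7, 16
item_emb = torch.randn(n_items, dim)
tag_emb = torch.randn(n_tags, dim)

# Score every (item, tag) edge as a sigmoid-squashed dot product.
scores = torch.sigmoid(item_emb @ tag_emb.T)   # (n_items, n_tags)

# Full tag prediction for item 0: take its top-3 highest-scoring tags.
topk = scores[0].topk(3).indices
print(topk)
```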

Note: the cover image is a word cloud of the paper titles.