
[arXiv Papers] Computation and Language 2020-12-23

Contents

1. COVID-19 Emotion Monitoring as a Tool to Increase Preparedness for Disease Outbreaks in Developing Regions [PDF] Abstract
2. Applying wav2vec2.0 to Speech Recognition in various low-resource languages [PDF] Abstract
3. Domain Adaptation of NMT models for English-Hindi Machine Translation Task at AdapMT ICON 2020 [PDF] Abstract
4. Uncertainty and Surprisal Jointly Deliver the Punchline: Exploiting Incongruity-Based Features for Humor Recognition [PDF] Abstract
5. Pre-Training a Language Model Without Human Language [PDF] Abstract
6. Graph-Evolving Meta-Learning for Low-Resource Medical Dialogue Generation [PDF] Abstract
7. g2tmn at Constraint@AAAI2021: Exploiting CT-BERT and Ensembling Learning for COVID-19 Fake News Detection [PDF] Abstract
8. A Hierarchical Reasoning Graph Neural Network for The Automatic Scoring of Answer Transcriptions in Video Job Interviews [PDF] Abstract
9. Learning to Retrieve Entity-Aware Knowledge and Generate Responses with Copy Mechanism for Task-Oriented Dialogue Systems [PDF] Abstract
10. Few-Shot Text Generation with Pattern-Exploiting Training [PDF] Abstract
11. Adversarial Meta Sampling for Multilingual Low-Resource Speech Recognition [PDF] Abstract
12. Undivided Attention: Are Intermediate Layers Necessary for BERT? [PDF] Abstract
13. Recognizing Emotion Cause in Conversations [PDF] Abstract
14. Improved Biomedical Word Embeddings in the Transformer Era [PDF] Abstract
15. Semi-Supervised Disentangled Framework for Transferable Named Entity Recognition [PDF] Abstract
16. Acronym Identification and Disambiguation shared tasks for Scientific Document Understanding [PDF] Abstract
17. SChuBERT: Scholarly Document Chunks with BERT-encoding boost Citation Count Prediction [PDF] Abstract
18. Subword Sampling for Low Resource Word Alignment [PDF] Abstract
19. A Distributional Approach to Controlled Text Generation [PDF] Abstract
20. BERTChem-DDI : Improved Drug-Drug Interaction Prediction from text using Chemical Structure Information [PDF] Abstract
21. Event-Driven Query Expansion [PDF] Abstract
22. Neural Methods for Effective, Efficient, and Exposure-Aware Information Retrieval [PDF] Abstract

Abstracts

1. COVID-19 Emotion Monitoring as a Tool to Increase Preparedness for Disease Outbreaks in Developing Regions [PDF] Back to contents
  Santiago Cortes, Juan Muñoz, David Betancur, Mauricio Toro
Abstract: The COVID-19 pandemic brought many challenges, from hospital-occupation management to lockdown mental-health repercussions such as anxiety or depression. In this work, we present a solution for the latter problem by developing a Twitter emotion-monitor system based on a state-of-the-art natural-language-processing model. The system monitors six different emotions on city-level accounts as well as the Twitter accounts of politicians and health authorities. With anonymous use of the emotion monitor, health authorities and private health-insurance companies can develop strategies to tackle problems such as suicide and clinical depression. The model chosen for this task is a Bidirectional Encoder Representations from Transformers (BERT) model pre-trained on a Spanish corpus (BETO). The model performed well on a validation dataset. The system is deployed online as part of a web application for simulation and data analysis of COVID-19 in Colombia, available at this https URL.
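As a rough illustration of the classifier described above, the sketch below pairs the real BETO checkpoint (dccuchile/bert-base-spanish-wwm-cased) with a six-way classification head using HuggingFace Transformers. The emotion label set and the fine-tuned weights are assumptions made here for illustration; only the base checkpoint name is taken from the public model hub.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Assumed label set; the abstract only says "six different emotions".
EMOTIONS = ["anger", "fear", "joy", "sadness", "surprise", "disgust"]

tok = AutoTokenizer.from_pretrained("dccuchile/bert-base-spanish-wwm-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "dccuchile/bert-base-spanish-wwm-cased", num_labels=len(EMOTIONS)
)  # the head is randomly initialized here; in practice it is fine-tuned on labeled tweets

inputs = tok("Estoy muy preocupado por la cuarentena", return_tensors="pt")
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)
print(dict(zip(EMOTIONS, probs.squeeze().tolist())))
```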

2. Applying wav2vec2.0 to Speech Recognition in various low-resource languages [PDF] Back to contents
  Cheng Yi, Jianzhong Wang, Ning Cheng, Shiyu Zhou, Bo Xu
Abstract: Several domains have corresponding widely used feature extractors, such as ResNet, BERT, and GPT-x. These models are pre-trained on large amounts of unlabelled data by self-supervision and can be effectively applied to downstream tasks. In the speech domain, wav2vec2.0 has begun to show its powerful representation ability and the feasibility of ultra-low-resource speech recognition on the Librispeech corpus. However, this model has not been tested on real spoken scenarios and languages other than English. To verify its universality across languages, we apply the released pre-trained models to solve low-resource speech recognition tasks in various spoken languages. We achieve more than 20% relative improvement in six languages compared with previous work. Among these languages, English improves by up to 52.4%. Moreover, using coarse-grained modeling units, such as subwords and characters, achieves better results than letters.
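To make the "apply the released pre-trained models" step concrete, here is a minimal inference sketch with a publicly released wav2vec2.0 checkpoint (the English CTC model facebook/wav2vec2-base-960h). Adapting to a new low-resource language, as in the paper, would mean replacing the CTC vocabulary with that language's modeling units (letters, characters, or subwords) and fine-tuning on its labeled audio.

```python
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h").eval()

waveform = torch.zeros(16000)  # placeholder: one second of 16 kHz audio
inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits  # (batch, time, vocab)
ids = torch.argmax(logits, dim=-1)              # greedy CTC decoding
print(processor.batch_decode(ids))
```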

3. Domain Adaptation of NMT models for English-Hindi Machine Translation Task at AdapMT ICON 2020 [PDF] Back to contents
  Ramchandra Joshi, Rushabh Karnavat, Kaustubh Jirapure, Raviraj Joshi
Abstract: Recent advancements in Neural Machine Translation (NMT) models have been shown to produce state-of-the-art results on machine translation for low-resource Indian languages. This paper describes the neural machine translation systems for the English-Hindi language pair presented at the AdapMT Shared Task, ICON 2020. The shared task aims to build a translation system for Indian languages in specific domains, such as Artificial Intelligence (AI) and Chemistry, using a small in-domain parallel corpus. We evaluated the effectiveness of two popular NMT architectures, i.e., LSTM and Transformer, for the English-Hindi machine translation task based on BLEU scores. We train these models primarily on out-of-domain data and employ simple domain adaptation techniques based on the characteristics of the in-domain dataset. The fine-tuning and mixed-domain data approaches are used for domain adaptation. Our team ranked first in the chemistry and general domain En-Hi translation task and second in the AI domain En-Hi translation task.

4. Uncertainty and Surprisal Jointly Deliver the Punchline: Exploiting Incongruity-Based Features for Humor Recognition [PDF] Back to contents
  Yubo Xie, Junze Li, Pearl Pu
Abstract: Humor recognition has been widely studied as a text classification problem using data-driven approaches. However, most existing work does not examine the actual joke mechanism to understand humor. We break down any joke into two distinct components: the set-up and the punchline, and further explore the special relationship between them. Inspired by the incongruity theory of humor, we model the set-up as the part that develops semantic uncertainty, and the punchline as the part that disrupts audience expectations. Using increasingly powerful language models, we feed the set-up along with the punchline into the GPT-2 language model and calculate the uncertainty and surprisal values of the jokes. Through experiments on the SemEval 2021 Task 7 dataset, we found that these two features are better at distinguishing jokes from non-jokes, compared with existing baselines.
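The surprisal feature lends itself to a short, self-contained sketch: feed the set-up plus punchline through GPT-2 and average the negative log-probability of the punchline tokens. The joke text is a placeholder, and the paper's exact feature definitions may differ from this simple mean.

```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

setup = "I told my wife she was drawing her eyebrows too high."
punchline = " She looked surprised."

n_setup = tok(setup, return_tensors="pt").input_ids.size(1)
ids = tok(setup + punchline, return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits
# position t of logits predicts token t+1, so shift by one
log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
targets = ids[0, 1:]
nll = -log_probs[torch.arange(len(targets)), targets]
# punchline tokens start at index n_setup in ids, i.e. n_setup-1 in targets
print(f"mean punchline surprisal: {nll[n_setup - 1:].mean().item():.2f} nats")
```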

5. Pre-Training a Language Model Without Human Language [PDF] Back to contents
  Cheng-Han Chiang, Hung-yi Lee
Abstract: In this paper, we study how the intrinsic nature of pre-training data contributes to fine-tuned downstream performance. To this end, we pre-train different transformer-based masked language models on several corpora with certain features, and we fine-tune those language models on GLUE benchmarks. We find that models pre-trained on unstructured data beat those trained directly from scratch on downstream tasks. Our results also show that pre-training on structured data does not always make the model acquire abilities that transfer to natural language downstream tasks. To our great astonishment, we uncover that pre-training on certain non-human language data gives GLUE performance close to that of models pre-trained on another non-English language.

6. Graph-Evolving Meta-Learning for Low-Resource Medical Dialogue Generation [PDF] Back to contents
  Shuai Lin, Pan Zhou, Xiaodan Liang, Jianheng Tang, Ruihui Zhao, Ziliang Chen, Liang Lin
Abstract: Human doctors with well-structured medical knowledge can diagnose a disease merely via a few conversations with patients about symptoms. In contrast, existing knowledge-grounded dialogue systems often require a large number of dialogue instances to learn, as they fail to capture the correlations between different diseases and neglect the diagnostic experience shared among them. To address this issue, we propose a more natural and practical paradigm, i.e., low-resource medical dialogue generation, which can transfer the diagnostic experience from source diseases to target ones with a handful of data for adaptation. It capitalizes on a commonsense knowledge graph to characterize the prior disease-symptom relations. Besides, we develop a Graph-Evolving Meta-Learning (GEML) framework that learns to evolve the commonsense graph for reasoning about disease-symptom correlations in a new disease, which effectively alleviates the need for a large number of dialogues. More importantly, by dynamically evolving disease-symptom graphs, GEML also addresses the real-world challenge that the disease-symptom correlations of each disease may vary or evolve along with more diagnostic cases. Extensive experimental results on the CMDD dataset and our newly collected Chunyu dataset testify to the superiority of our approach over state-of-the-art approaches. Besides, our GEML can generate an enriched dialogue-sensitive knowledge graph in an online manner, which could benefit other tasks grounded on knowledge graphs.

7. g2tmn at Constraint@AAAI2021: Exploiting CT-BERT and Ensembling Learning for COVID-19 Fake News Detection [PDF] Back to contents
  Anna Glazkova, Maksim Glazkov, Timofey Trifonov
Abstract: The COVID-19 pandemic has had a huge impact on various areas of human life. Hence, the coronavirus pandemic and its consequences are being actively discussed on social media. However, not all social media posts are truthful. Many of them spread fake news that cause panic among readers, misinform people and thus exacerbate the effect of the pandemic. In this paper, we present our results at the Constraint@AAAI2021 Shared Task: COVID-19 Fake News Detection in English. In particular, we propose our approach using the transformer-based ensemble based on COVID-Twitter-BERT (CT-BERT). We describe the models used, the ways of text preprocessing and adding extra data. As a result, our best model achieved the weighted F1-score of 98.69 on the test set (the first place in the leaderboard) of this shared task that attracted 166 submitted teams in total.
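A hedged sketch of the ensembling step: average the softmax outputs of several fine-tuned CT-BERT classifiers (soft voting). The base checkpoint name is real; loading it twice is only a stand-in for the differently fine-tuned ensemble members, and the label-index convention is assumed.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

BASE = "digitalepidemiologylab/covid-twitter-bert-v2"
tok = AutoTokenizer.from_pretrained(BASE)

def predict_proba(model, text):
    inputs = tok(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return torch.softmax(model(**inputs).logits, dim=-1)

# In practice each member would be fine-tuned on a different split or seed.
members = [AutoModelForSequenceClassification.from_pretrained(BASE, num_labels=2)
           for _ in range(2)]
text = "Drinking hot water cures COVID-19."
avg = torch.stack([predict_proba(m, text) for m in members]).mean(dim=0)
print("fake" if avg[0, 1] > 0.5 else "real")  # assumes index 1 = "fake"
```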

8. A Hierarchical Reasoning Graph Neural Network for The Automatic Scoring of Answer Transcriptions in Video Job Interviews [PDF] Back to contents
  Kai Chen, Meng Niu, Qingcai Chen
Abstract: We address the task of automatically scoring the competency of candidates based on textual features, from the automatic speech recognition (ASR) transcriptions in the asynchronous video job interview (AVI). The key challenge is how to construct the dependency relation between questions and answers, and conduct semantic-level interaction for each question-answer (QA) pair. However, most recent studies in AVI focus on how to represent questions and answers better, but ignore the dependency information and interaction between them, which is critical for QA evaluation. In this work, we propose a Hierarchical Reasoning Graph Neural Network (HRGNN) for the automatic assessment of question-answer pairs. Specifically, we construct a sentence-level relational graph neural network to capture the dependency information of sentences within and between the question and the answer. Based on these graphs, we employ a semantic-level reasoning graph attention network to model the interaction states of the current QA session. Finally, we propose a gated recurrent unit encoder to represent the temporal question-answer pairs for the final prediction. Empirical results on CHNAT (a real-world dataset) validate that our proposed model significantly outperforms text-matching-based benchmark models. Ablation studies and experimental results with 10 random seeds also show the effectiveness and stability of our models.

9. Learning to Retrieve Entity-Aware Knowledge and Generate Responses with Copy Mechanism for Task-Oriented Dialogue Systems [PDF] Back to contents
  Chao-Hong Tan, Xiaoyu Yang, Zi'ou Zheng, Tianda Li, Yufei Feng, Jia-Chen Gu, Quan Liu, Dan Liu, Zhen-Hua Ling, Xiaodan Zhu
Abstract: Task-oriented conversational modeling with unstructured knowledge access, as track 1 of the 9th Dialogue System Technology Challenges (DSTC 9), requires building a system to generate responses given dialogue history and knowledge access. This challenge can be separated into three subtasks: (1) knowledge-seeking turn detection, (2) knowledge selection, and (3) knowledge-grounded response generation. We use pre-trained language models, ELECTRA and RoBERTa, as our base encoders for the different subtasks. For subtasks 1 and 2, coarse-grained information such as domain and entity is used to enhance knowledge usage. For subtask 3, we use a latent variable to better encode dialogue history and selected knowledge, and generate responses combined with a copy mechanism. Meanwhile, some useful post-processing strategies are performed on the model's final output to make further use of knowledge in the generation task. As shown in the released evaluation results, our proposed system ranks second under objective metrics and fourth under human metrics.

10. Few-Shot Text Generation with Pattern-Exploiting Training [PDF] Back to contents
  Timo Schick, Hinrich Schütze
Abstract: Providing pretrained language models with simple task descriptions or prompts in natural language yields impressive few-shot results for a wide range of text classification tasks when combined with gradient-based learning from examples. In this paper, we show that the underlying idea can also be applied to text generation tasks: We adapt Pattern-Exploiting Training (PET), a recently proposed few-shot approach, for finetuning generative language models on text generation tasks. On several text summarization and headline generation datasets, our proposed variant of PET gives consistent improvements over a strong baseline in few-shot settings.

11. Adversarial Meta Sampling for Multilingual Low-Resource Speech Recognition [PDF] Back to contents
  Yubei Xiao, Ke Gong, Pan Zhou, Guolin Zheng, Xiaodan Liang, Liang Lin
Abstract: Low-resource automatic speech recognition (ASR) is challenging, as the low-resource target-language data cannot well train an ASR model. To solve this issue, meta-learning formulates the ASR for each source language into many small ASR tasks and meta-learns a model initialization on all tasks from different source languages to enable fast adaptation to unseen target languages. However, for different source languages, the quantity and difficulty vary greatly because of their different data scales and diverse phonological systems, which leads to task-quantity and task-difficulty imbalance issues and thus a failure of multilingual meta-learning ASR (MML-ASR). In this work, we solve this problem by developing a novel adversarial meta sampling (AMS) approach to improve MML-ASR. When sampling tasks in MML-ASR, AMS adaptively determines the task sampling probability for each source language. Specifically, for each source language, a large query loss means that its tasks are not well sampled, in terms of quantity and difficulty, to train the ASR model, and thus should be sampled more frequently for extra learning. Inspired by this fact, we feed the historical task query losses of all source-language domains into a network to learn a task sampling policy that adversarially increases the current query loss of MML-ASR. Thus, the learnt task sampling policy can master the learning situation of each language and predict a good task sampling probability for each language for more effective learning. Finally, experimental results on two multilingual datasets show significant performance improvement when applying our AMS to MML-ASR, and also demonstrate the applicability of AMS to other low-resource speech tasks and transfer-learning ASR approaches. Our codes are available at: this https URL.
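The sampling policy admits a toy sketch: a small network maps each source language's recent query loss to a task-sampling probability, and a task is drawn from the resulting distribution. The adversarial update (training the policy to increase the current query loss) is only indicated in a comment; every dimension here is a placeholder.

```python
import torch
import torch.nn as nn

n_langs = 6  # number of source languages (placeholder)
policy = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))

loss_history = torch.rand(n_langs, 1)      # one recent query loss per language
logits = policy(loss_history).squeeze(-1)  # (n_langs,)
probs = torch.softmax(logits, dim=0)       # task-sampling distribution
lang = torch.multinomial(probs, 1).item()  # language to sample the next tasks from
print(f"sample next meta-tasks from language {lang} (p={probs[lang]:.2f})")
# Per the paper, the policy itself would then be trained with a signal that
# rewards increasing the meta-learner's current query loss.
```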

12. Undivided Attention: Are Intermediate Layers Necessary for BERT? [PDF] Back to contents
  Sharath Nittur Sridhar, Anthony Sarah
Abstract: In recent times, BERT-based models have been extremely successful in solving a variety of natural language processing (NLP) tasks such as reading comprehension, natural language inference, sentiment analysis, etc. All BERT-based architectures have a self-attention block followed by a block of intermediate layers as the basic building component. However, a strong justification for the inclusion of these intermediate layers remains missing in the literature. In this work we investigate the importance of intermediate layers for the overall network performance on downstream tasks. We show that reducing the number of intermediate layers and modifying the architecture of BERT-Base results in minimal loss in fine-tuning accuracy for downstream tasks while decreasing the number of parameters and the training time of the model. Additionally, we use the centered kernel alignment (CKA) similarity metric and probing classifiers to demonstrate that removing intermediate layers has little impact on the learned self-attention representations.

13. Recognizing Emotion Cause in Conversations [PDF] Back to contents
  Soujanya Poria, Navonil Majumder, Devamanyu Hazarika, Deepanway Ghosal, Rishabh Bhardwaj, Samson Yu Bai Jian, Romila Ghosh, Niyati Chhaya, Alexander Gelbukh, Rada Mihalcea
Abstract: Recognizing the cause behind emotions in text is a fundamental yet under-explored area of research in NLP. Advances in this area hold the potential to improve interpretability and performance in affect-based models. Identifying emotion causes at the utterance level in conversations is particularly challenging due to the intermingling dynamic among the interlocutors. To this end, we introduce the task of recognizing emotion cause in conversations with an accompanying dataset named RECCON. Furthermore, we define different cause types based on the source of the causes and establish strong transformer-based baselines to address two different sub-tasks of RECCON: 1) Causal Span Extraction and 2) Causal Emotion Entailment. The dataset is available at this https URL.

14. Improved Biomedical Word Embeddings in the Transformer Era [PDF] Back to contents
  Jiho Noh, Ramakanth Kavuluru
Abstract: In this paper, we jointly learn word and concept embeddings by first using the skip-gram method and further fine-tuning them with correlational information manifesting in co-occurring Medical Subject Heading (MeSH) concepts in biomedical citations. This fine-tuning is accomplished with the BERT transformer architecture in the two-sentence input mode with a classification objective that captures MeSH pair co-occurrence. In essence, we repurpose a transformer architecture (typically used to generate dynamic embeddings) to improve static embeddings using concept correlations. We conduct evaluations of these tuned static embeddings using multiple datasets for word and concept relatedness developed by previous efforts. Without selectively culling concepts and terms (as was pursued by previous efforts), we believe we offer the most exhaustive evaluation of static embeddings to date with clear performance improvements across the board. We provide our embeddings for public use for any downstream application or research endeavors: this https URL
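The first (skip-gram) stage has a natural sketch with gensim, where MeSH concept identifiers appear as pseudo-tokens alongside words. Whether the paper injects concepts exactly this way is an assumption, and the corpus lines and descriptor IDs below are illustrative toys.

```python
from gensim.models import Word2Vec

# Toy corpus: words plus MeSH concept pseudo-tokens (IDs illustrative only).
corpus = [
    ["aspirin", "reduces", "fever", "MeSH_D000001"],
    ["ibuprofen", "treats", "inflammation", "MeSH_D000002"],
]
model = Word2Vec(sentences=corpus, vector_size=100, window=5,
                 sg=1,  # sg=1 selects the skip-gram objective
                 min_count=1, epochs=50)
print(model.wv.most_similar("aspirin", topn=3))
```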

15. Semi-Supervised Disentangled Framework for Transferable Named Entity Recognition [PDF] Back to contents
  Zhifeng Hao, Di Lv, Zijian Li, Ruichu Cai, Wen Wen, Boyan Xu
Abstract: Named entity recognition (NER) for identifying proper nouns in unstructured text is one of the most important and fundamental tasks in natural language processing. However, despite the widespread use of NER models, they still require a large-scale labeled data set, which incurs a heavy burden due to manual annotation. Domain adaptation is one of the most promising solutions to this problem, where rich labeled data from the relevant source domain are utilized to strengthen the generalizability of a model based on the target domain. However, the mainstream cross-domain NER models are still affected by the following two challenges (1) Extracting domain-invariant information such as syntactic information for cross-domain transfer. (2) Integrating domain-specific information such as semantic information into the model to improve the performance of NER. In this study, we present a semi-supervised framework for transferable NER, which disentangles the domain-invariant latent variables and domain-specific latent variables. In the proposed framework, the domain-specific information is integrated with the domain-specific latent variables by using a domain predictor. The domain-specific and domain-invariant latent variables are disentangled using three mutual information regularization terms, i.e., maximizing the mutual information between the domain-specific latent variables and the original embedding, maximizing the mutual information between the domain-invariant latent variables and the original embedding, and minimizing the mutual information between the domain-specific and domain-invariant latent variables. Extensive experiments demonstrated that our model can obtain state-of-the-art performance with cross-domain and cross-lingual NER benchmark data sets.
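The three regularizers can be written compactly; the notation below is assumed for illustration (z_s: domain-specific latents, z_i: domain-invariant latents, x: the original embedding, lambda: trade-off weights):

```latex
\mathcal{L}_{\mathrm{MI}}
  \;=\; -\,\lambda_1\, I(z_s;\, x)
  \;-\; \lambda_2\, I(z_i;\, x)
  \;+\; \lambda_3\, I(z_s;\, z_i)
```

Minimizing this loss jointly with the NER objective maximizes the first two mutual-information terms and minimizes the third, which is the disentanglement behavior the abstract describes.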

16. Acronym Identification and Disambiguation shared tasks for Scientific Document Understanding [PDF] Back to contents
  Amir Pouran Ben Veyseh, Franck Dernoncourt, Thien Huu Nguyen, Walter Chang, Leo Anthony Celi
Abstract: Acronyms are the short forms of longer phrases and they are frequently used in writing, especially scholarly writing, to save space and facilitate the communication of information. As such, every text understanding tool should be capable of recognizing acronyms in text (i.e., acronym identification) and also finding their correct meaning (i.e., acronym disambiguation). As most of the prior work on these tasks is restricted to the biomedical domain and uses unsupervised methods or models trained on limited datasets, it fails to perform well for scientific document understanding. To push forward research in this direction, we have organized two shared tasks for acronym identification and acronym disambiguation in scientific documents, named AI@SDU and AD@SDU, respectively. The two shared tasks attracted 52 and 43 participants, respectively. While the submitted systems make substantial improvements over the existing baselines, they are still far from human-level performance. This paper reviews the two shared tasks and the prominent participating systems for each of them.
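For intuition, a naive rule-based baseline for the identification subtask fits in a few lines: find parenthesized short forms and match their letters against the initials of the preceding words. Real submissions to AI@SDU are learned sequence labelers; this is only a toy contrast.

```python
import re

def find_acronyms(text):
    """Return (short form, long form) pairs via an initials-matching heuristic."""
    pairs = []
    for m in re.finditer(r"\(([A-Z][A-Za-z]{1,9})\)", text):
        short = m.group(1)
        words = text[: m.start()].split()
        window = words[-len(short) * 2:]  # look back a bounded number of words
        for i in range(len(window)):
            candidate = window[i:]
            if "".join(w[0].upper() for w in candidate) == short.upper():
                pairs.append((short, " ".join(candidate)))
                break
    return pairs

print(find_acronyms("We use a convolutional neural network (CNN) for tagging."))
# -> [('CNN', 'convolutional neural network')]
```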

17. SChuBERT: Scholarly Document Chunks with BERT-encoding boost Citation Count Prediction [PDF] Back to contents
  Thomas van Dongen, Gideon Maillette de Buy Wenniger, Lambert Schomaker
Abstract: Predicting the number of citations of scholarly documents is an emerging task in scholarly document processing. Besides the intrinsic merit of this information, it also has wider use as an imperfect proxy for quality, with the advantage of being cheaply available for large volumes of scholarly documents. Previous work has dealt with citation count prediction using relatively small training data sets, or larger datasets with short, incomplete input text. In this work we leverage the open-access ACL Anthology collection in combination with the Semantic Scholar bibliometric database to create a large corpus of scholarly documents with associated citation information, and we propose a new citation prediction model called SChuBERT. In our experiments we compare SChuBERT with several state-of-the-art citation prediction models and show that it outperforms previous methods by a large margin. We also show the merit of using more training data and longer input for citation count prediction.
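The chunking step suggested by the model's name can be sketched directly: split a long document into BERT-sized pieces, encode each, and hand the sequence of chunk vectors to a downstream citation regressor (not shown). The base checkpoint and chunk length are assumptions, not the paper's exact configuration.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased").eval()

def chunk_embeddings(text, chunk_len=512):
    ids = tok(text, add_special_tokens=False).input_ids
    step = chunk_len - 2  # leave room for [CLS] and [SEP]
    vecs = []
    for i in range(0, len(ids), step):
        chunk = [tok.cls_token_id] + ids[i:i + step] + [tok.sep_token_id]
        with torch.no_grad():
            out = bert(torch.tensor([chunk]))
        vecs.append(out.last_hidden_state[0, 0])  # [CLS] vector per chunk
    return torch.stack(vecs)  # (num_chunks, hidden): input to a citation regressor

print(chunk_embeddings("A long scholarly document body ... " * 300).shape)
```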

18. Subword Sampling for Low Resource Word Alignment [PDF] Back to contents
  Ehsaneddin Asgari, Masoud Jalili Sabet, Philipp Dufter, Christopher Ringlstetter, Hinrich Schütze
Abstract: Annotation projection is an important area in NLP that can greatly contribute to creating language resources for low-resource languages. Word alignment plays a key role in this setting. However, most of the existing word alignment methods are designed for the high-resource setting in machine translation, where millions of parallel sentences are available. This amount reduces to a few thousand sentences when dealing with low-resource languages, where the established IBM models fail. In this paper, we propose subword-sampling-based alignment of text units. This method's hypothesis is that aggregating different granularities of text for certain language pairs can help word-level alignment. For certain languages for which gold-standard alignments exist, we propose an iterative Bayesian optimization framework to optimize the selection of possible subwords from the space of possible subword representations of the source and target sentences. We show that the subword sampling method consistently outperforms word-level alignment on six language pairs: English-German, English-French, English-Romanian, English-Persian, English-Hindi, and English-Inuktitut. In addition, we show that the hyperparameters learned for certain language pairs can be applied to other languages with no supervision and consistently improve the alignment results. We observe that using 5K parallel sentences together with our proposed subword sampling approach, we obtain F1 scores similar to those of existing word-level fast-align/eflomal alignment methods using 100K parallel sentences.
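Subword sampling itself is easy to demonstrate with SentencePiece: with sampling enabled, the same sentence is segmented differently on each call, which is the multiple-granularities signal the method aggregates. Training a throwaway model inline keeps the sketch self-contained; the paper's actual sampling setup may differ.

```python
import io
import sentencepiece as spm

data = io.StringIO("the quick brown fox jumps over the lazy dog\n" * 100)
model = io.BytesIO()
spm.SentencePieceTrainer.train(sentence_iterator=data, model_writer=model,
                               vocab_size=50, model_type="unigram")
sp = spm.SentencePieceProcessor(model_proto=model.getvalue())

for _ in range(3):
    # enable_sampling draws one segmentation from the unigram lattice per call
    print(sp.encode("the quick brown fox", out_type=str,
                    enable_sampling=True, alpha=0.1, nbest_size=-1))
```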

19. A Distributional Approach to Controlled Text Generation [PDF] Back to contents
  Muhammad Khalifa, Hady Elsahar, Marc Dymetman
Abstract: We propose a Distributional Approach to address Controlled Text Generation from pre-trained Language Models (LMs). This view makes it possible to define, in a single formal framework, "pointwise" and "distributional" constraints over the target LM -- to our knowledge, this is the first approach with such generality -- while minimizing KL divergence from the initial LM distribution. The optimal target distribution is then uniquely determined as an explicit EBM (Energy-Based Model) representation. From that optimal representation, we then train the target controlled autoregressive LM through an adaptive distributional variant of Policy Gradient. We conduct a first set of experiments over pointwise constraints, showing the advantages of our approach over a set of baselines in obtaining a controlled LM that balances constraint satisfaction with divergence from the initial LM (GPT-2). We then perform experiments over distributional constraints, a unique feature of our approach, demonstrating its potential as a remedy to the problem of bias in language models. Through an ablation study, we show the effectiveness of our adaptive technique for obtaining faster convergence.
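The "explicit EBM representation" the abstract refers to has a standard closed form: the information projection of the initial LM a(x) onto the moment constraints is an exponential-family reweighting. In assumed notation (phi: constraint features, mu-bar: target moments, lambda: learned multipliers):

```latex
p \;=\; \arg\min_{q:\ \mathbb{E}_{q}[\phi(x)] = \bar{\mu}} D_{\mathrm{KL}}(q \,\|\, a),
\qquad
p(x) \;=\; \frac{1}{Z}\, a(x)\, e^{\langle \lambda,\, \phi(x) \rangle},
\qquad
Z \;=\; \sum_{x} a(x)\, e^{\langle \lambda,\, \phi(x) \rangle}.
```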

20. BERTChem-DDI : Improved Drug-Drug Interaction Prediction from text using Chemical Structure Information [PDF] Back to contents
  Ishani Mondal
Abstract: Traditional biomedical embeddings obtained from pre-trained language models have recently shown state-of-the-art results for relation extraction (RE) tasks in the medical domain. In this paper, we explore how to incorporate domain knowledge, available in the form of the molecular structure of drugs, for predicting drug-drug interactions from a textual corpus. We propose a method, BERTChem-DDI, to efficiently combine drug embeddings obtained from the rich chemical structure of drugs with an off-the-shelf domain-specific BioBERT embedding-based RE architecture. Experiments conducted on the DDIExtraction 2013 corpus clearly indicate that this strategy outperforms other strong baseline architectures by a 3.4% macro F1-score.
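The combination step reads as straightforward feature fusion; a minimal sketch under assumed dimensions (768 for the BioBERT sentence vector, 128 per drug-structure embedding, and five output classes, taking DDIExtraction 2013's four interaction types plus "none"):

```python
import torch
import torch.nn as nn

text_vec = torch.randn(1, 768)  # e.g., BioBERT [CLS] vector for the sentence
chem_a   = torch.randn(1, 128)  # structure embedding of drug 1 (assumed encoder)
chem_b   = torch.randn(1, 128)  # structure embedding of drug 2 (assumed encoder)

classifier = nn.Linear(768 + 128 + 128, 5)
logits = classifier(torch.cat([text_vec, chem_a, chem_b], dim=-1))
print(logits.shape)  # torch.Size([1, 5])
```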

21. Event-Driven Query Expansion [PDF] Back to contents
  Guy D. Rosin, Ido Guy, Kira Radinsky
Abstract: A significant number of event-related queries are issued in Web search. In this paper, we seek to improve retrieval performance by leveraging events and specifically target the classic task of query expansion. We propose a method to expand an event-related query by first detecting the events related to it. Then, we derive the candidates for expansion as terms semantically related to both the query and the events. To identify the candidates, we utilize a novel mechanism to simultaneously embed words and events in the same vector space. We show that our proposed method of leveraging events improves query expansion performance significantly compared with state-of-the-art methods on various newswire TREC datasets.
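One simple way to realize "semantically related to both the query and the events" in a shared embedding space is to score each candidate by its weaker similarity to either side. The sketch below uses random stand-in vectors in place of the jointly trained word/event embeddings, and min() is an illustrative choice rather than the paper's scoring function.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["tremor", "aftershock", "magnitude", "banana"]
word_vecs = {w: rng.normal(size=50) for w in vocab}  # stand-in embeddings
query_vec = rng.normal(size=50)                      # query embedding
event_vec = rng.normal(size=50)                      # detected-event embedding

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# require relatedness to BOTH the query and the event
scores = {w: min(cos(v, query_vec), cos(v, event_vec)) for w, v in word_vecs.items()}
print(sorted(scores, key=scores.get, reverse=True)[:2])  # top expansion candidates
```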

22. Neural Methods for Effective, Efficient, and Exposure-Aware Information Retrieval [PDF] Back to contents
  Bhaskar Mitra
Abstract: Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from these other application areas. A common form of IR involves ranking documents -- or short passages -- in response to keyword-based queries. Effective IR systems must deal with the query-document vocabulary mismatch problem by modeling the relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms -- such as a person's name or a product model number -- not seen during training, and should avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, the retrieval involves extremely large collections -- such as the document index of a commercial Web search engine -- containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as the inverted index, to retrieve efficiently from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed, and where it should be positioned, among other results. Exposure-aware IR systems may optimize for additional objectives besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks.
