
[arXiv Papers] Computation and Language 2020-09-03

Contents

1. Garain at SemEval-2020 Task 12: Sequence based Deep Learning for Categorizing Offensive Language in Social Media [PDF] Abstract
2. An exploratory study of L1-specific non-words [PDF] Abstract
3. Too good to be true? Predicting author profiles for abusive language [PDF] Abstract
4. MALCOM: Generating Malicious Comments to Attack Neural Fake News Detection Models [PDF] Abstract
5. Sentimental LIAR: Extended Corpus and Deep Learning Models for Fake Claim Classification [PDF] Abstract
6. Generalisation of Cyberbullying Detection [PDF] Abstract
7. ASTRAL: Adversarial Trained LSTM-CNN for Named Entity Recognition [PDF] Abstract
8. Defeating Author Gender Identification with Text Style Transfer [PDF] Abstract
9. Cross-Utterance Language Models with Acoustic Error Sampling [PDF] Abstract
10. FAT ALBERT: Finding Answers in Large Texts using Semantic Similarity Attention Layer based on BERT [PDF] Abstract
11. Variational Inference-Based Dropout in Recurrent Neural Networks for Slot Filling in Spoken Language Understanding [PDF] Abstract
12. Revisiting the Open-Domain Question Answering Pipeline [PDF] Abstract
13. A Practical Chinese Dependency Parser Based on A Large-scale Dataset [PDF] Abstract
14. Automated Storytelling via Causal, Commonsense Plot Ordering [PDF] Abstract
15. Text Modular Networks: Learning to Decompose Tasks in the Language of Existing Models [PDF] Abstract
16. Automatic Assignment of Radiology Examination Protocols Using Pre-trained Language Models with Knowledge Distillation [PDF] Abstract
17. Document Similarity from Vector Space Densities [PDF] Abstract
18. A Stance Data Set on Polarized Conversations on Twitter about the Efficacy of Hydroxychloroquine as a Treatment for COVID-19 [PDF] Abstract
19. DAVE: Deriving Automatically Verilog from English [PDF] Abstract
20. Identifying Documents In-Scope of a Collection from Web Archives [PDF] Abstract

Abstracts

1. Garain at SemEval-2020 Task 12: Sequence based Deep Learning for Categorizing Offensive Language in Social Media [PDF] Back to Contents
  Avishek Garain
Abstract: SemEval-2020 Task 12 was OffenseEval: Multilingual Offensive Language Identification in Social Media (Zampieri et al., 2020). The task was subdivided into multiple languages and datasets were provided for each one. The task was further divided into three sub-tasks: offensive language identification, automatic categorization of offense types, and offense target identification. I participated in sub-task C, offense target identification. The proposed system uses deep learning networks such as LSTMs, built with frameworks like Keras, combining a bag-of-words model with automatically generated sequence-based features and manually extracted features from the given dataset. Trained on 25% of the whole dataset, my system achieves a macro-averaged F1 score of 47.763%.
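
As a rough illustration of the hybrid architecture this abstract describes, below is a minimal Keras sketch that merges an LSTM branch over token sequences with bag-of-words and manually extracted feature inputs; all layer sizes, feature dimensions, and the class count are illustrative assumptions rather than the author's settings.

```python
# Minimal sketch (assumed sizes) of an LSTM sequence branch merged with
# bag-of-words and manually extracted feature inputs before classification.
from tensorflow.keras import layers, models

MAX_LEN, VOCAB, BOW_DIM, MANUAL_DIM, N_CLASSES = 100, 20000, 5000, 12, 3

seq_in = layers.Input(shape=(MAX_LEN,), name="token_ids")
x = layers.Embedding(VOCAB, 128)(seq_in)              # learned token embeddings
x = layers.LSTM(64)(x)                                # sequence-based features

bow_in = layers.Input(shape=(BOW_DIM,), name="bag_of_words")
manual_in = layers.Input(shape=(MANUAL_DIM,), name="manual_features")

merged = layers.concatenate([x, bow_in, manual_in])   # fuse all three views
out = layers.Dense(N_CLASSES, activation="softmax")(merged)

model = models.Model([seq_in, bow_in, manual_in], out)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```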

2. An exploratory study of L1-specific non-words [PDF] Back to Contents
  David Alfter
Abstract: In this paper, we explore L1-specific non-words, i.e. non-words in a target language (in this case Swedish) that are re-ranked by a different-language language model. We surmise that speakers of a certain L1 will react differently to L1-specific non-words than to general non-words. We present the results from two small case studies exploring whether re-ranking non-words with different language models leads to a perceived difference in `Swedishness' (pilot study 1) and whether German and English native speakers have longer reaction times in a lexical decision task when presented with their respective L1-specific non-words (pilot study 2). Tentative results seem to indicate that L1-specific non-words are processed second-slowest, after purely Swedish-looking non-words.

3. Too good to be true? Predicting author profiles for abusive language [PDF] Back to Contents
  Isabelle van der Vegt, Bennett Kleinberg, Paul Gill
Abstract: The problem of online threats and abuse could potentially be mitigated with a computational approach, where sources of abuse are better understood or identified through author profiling. However, abusive language constitutes a specific domain of language for which it has not yet been tested whether differences emerge based on a text author's personality, age, or gender. This study examines statistical relationships between author demographics and abusive vs normal language, and performs prediction experiments for personality, age, and gender. Although some statistical relationships were established between author characteristics and language use, these patterns did not translate to high prediction performance. Personality traits were predicted within 15% of their actual value, age was predicted with an error margin of 10 years, and gender was classified correctly in 70% of the cases. These results are poor when compared to previous research on author profiling, therefore we urge caution in applying this within the context of abusive language and threat assessment.

4. MALCOM: Generating Malicious Comments to Attack Neural Fake News Detection Models [PDF] Back to Contents
  Thai Le, Suhang Wang, Dongwon Lee
Abstract: In recent years, the proliferation of so-called "fake news" has caused much disruption in society and weakened the news ecosystem. Therefore, to mitigate such problems, researchers have developed state-of-the-art models to auto-detect fake news on social media using sophisticated data science and machine learning techniques. In this work, then, we ask "what if adversaries attempt to attack such detection models?" and investigate related issues by (i) proposing a novel threat model against fake news detectors, in which adversaries can post malicious comments toward news articles to mislead fake news detectors, and (ii) developing MALCOM, an end-to-end adversarial comment generation framework to achieve such an attack. Through a comprehensive evaluation, we demonstrate that about 94% and 93.5% of the time on average MALCOM can successfully mislead five of the latest neural detection models to always output targeted real and fake news labels. Furthermore, MALCOM can also fool black box fake news detectors to always output real news labels 90% of the time on average. We also compare our attack model with four baselines across two real-world datasets, not only on attack performance but also on generated quality, coherency, transferability, and robustness.

5. Sentimental LIAR: Extended Corpus and Deep Learning Models for Fake Claim Classification [PDF] Back to Contents
  Bibek Upadhayay, Vahid Behzadan
Abstract: The rampant integration of social media in our every day lives and culture has given rise to fast and easier access to the flow of information than ever in human history. However, the inherently unsupervised nature of social media platforms has also made it easier to spread false information and fake news. Furthermore, the high volume and velocity of information flow in such platforms make manual supervision and control of information propagation infeasible. This paper aims to address this issue by proposing a novel deep learning approach for automated detection of false short-text claims on social media. We first introduce Sentimental LIAR, which extends the LIAR dataset of short claims by adding features based on sentiment and emotion analysis of claims. Furthermore, we propose a novel deep learning architecture based on the DistilBERT language model for classification of claims as genuine or fake. Our results demonstrate that the proposed architecture trained on Sentimental LIAR can achieve an accuracy of 70%, which is an improvement of ~30% over previously reported results for the LIAR benchmark.

6. Generalisation of Cyberbullying Detection [PDF] Back to Contents
  Khoury Richard, Larochelle Marc-André
Abstract: Cyberbullying is a problem in today's ubiquitous online communities. Filtering it out of online conversations has proven a challenge, and efforts have led to the creation of many different datasets, all offered as resources to train classifiers. Through these datasets, we will explore the variety of definitions of cyberbullying behaviors and the impact of these differences on the portability of one classifier to another community. By analyzing the similarities between datasets, we also gain insight on the generalization power of the classifiers trained from them. A study of ensemble models combining these classifiers will help us understand how they interact with each other.

7. ASTRAL: Adversarial Trained LSTM-CNN for Named Entity Recognition [PDF] Back to Contents
  Jiuniu Wang, Wenjia Xu, Xingyu Fu, Guangluan Xu, Yirong Wu
Abstract: Named Entity Recognition (NER) is a challenging task that extracts named entities from unstructured text data, including news, articles, social comments, etc. The NER system has been studied for decades. Recently, the development of Deep Neural Networks and the progress of pre-trained word embedding have become a driving force for NER. Under such circumstances, how to make full use of the information extracted by word embedding requires more in-depth research. In this paper, we propose an Adversarial Trained LSTM-CNN (ASTRAL) system to improve the current NER method from both the model structure and the training process. In order to make use of the spatial information between adjacent words, Gated-CNN is introduced to fuse the information of adjacent words. Besides, a specific Adversarial training method is proposed to deal with the overfitting problem in NER. We add perturbation to variables in the network during the training process, making the variables more diverse, improving the generalization and robustness of the model. Our model is evaluated on three benchmarks, CoNLL-03, OntoNotes 5.0, and WNUT-17, achieving state-of-the-art results. Ablation study and case study also show that our system can converge faster and is less prone to overfitting.

8. Defeating Author Gender Identification with Text Style Transfer [PDF] Back to Contents
  Reza Khan Mohammadi, Seyed Abolghasem Mirroshandel
Abstract: Text style transfer is one of the most important natural language processing tasks, and several approaches and methods have been experimented with for this purpose. In this work, we introduce PGST, a novel polyglot text style transfer approach in the gender domain composed of different building blocks. Given the required elements, our method can be applied in multiple languages. We use a pre-trained word embedding for token replacement, a character-based token classifier for gender exchange, and the beam search algorithm for extracting the most fluent combination among all suggestions. Since different approaches are introduced in our research, we determine a trade-off value for evaluating how well different models fool our gender identification model with transferred text. To demonstrate our method's multilingual applicability, we applied it to both English and Persian corpora, ultimately defeating our proposed gender identification model by 45.6% and 39.2%, respectively, and obtaining evaluation results highly competitive with English state-of-the-art methods.

9. Cross-Utterance Language Models with Acoustic Error Sampling [PDF] Back to Contents
  G. Sun, C. Zhang, P. C. Woodland
Abstract: The effective exploitation of richer contextual information in language models (LMs) is a long-standing research problem for automatic speech recognition (ASR). A cross-utterance LM (CULM) is proposed in this paper, which augments the input to a standard long short-term memory (LSTM) LM with a context vector derived from past and future utterances using an extraction network. The extraction network uses another LSTM to encode surrounding utterances into vectors which are integrated into a context vector using either a projection of LSTM final hidden states, or a multi-head self-attentive layer. In addition, an acoustic error sampling technique is proposed to reduce the mismatch between training and test-time. This is achieved by considering possible ASR errors into the model training procedure, and can therefore improve the word error rate (WER). Experiments performed on both AMI and Switchboard datasets show that CULMs outperform the LSTM LM baseline WER. In particular, the CULM with a self-attentive layer-based extraction network and acoustic error sampling achieves 0.6% absolute WER reduction on AMI, 0.3% WER reduction on the Switchboard part and 0.9% WER reduction on the Callhome part of Eval2000 test set over the respective baselines.
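
A minimal PyTorch sketch of the cross-utterance idea, under assumed dimensions: an extraction LSTM encodes the surrounding utterances, its final hidden state serves as the context vector (one of the two variants the abstract mentions), and that vector is concatenated onto each input embedding of the main LSTM LM.

```python
# Hedged sketch of a cross-utterance LM: an extraction LSTM turns context
# utterances into a vector that augments the main LM input. Sizes are assumed.
import torch
import torch.nn as nn

class CrossUtteranceLM(nn.Module):
    def __init__(self, vocab=10000, emb=256, hid=512, ctx=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.extractor = nn.LSTM(emb, ctx, batch_first=True)  # encodes context
        self.lm = nn.LSTM(emb + ctx, hid, batch_first=True)   # main LM
        self.out = nn.Linear(hid, vocab)

    def forward(self, tokens, context_tokens):
        # Context vector from the extractor's final hidden state.
        _, (h_ctx, _) = self.extractor(self.embed(context_tokens))
        ctx = h_ctx[-1].unsqueeze(1).expand(-1, tokens.size(1), -1)
        x = torch.cat([self.embed(tokens), ctx], dim=-1)      # augment LM input
        h, _ = self.lm(x)
        return self.out(h)
```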

10. FAT ALBERT: Finding Answers in Large Texts using Semantic Similarity Attention Layer based on BERT [PDF] Back to Contents
  Omar Mossad, Amgad Ahmed, Anandharaju Raju, Hari Karthikeyan, Zayed Ahmed
Abstract: Machine based text comprehension has always been a significant research field in natural language processing. Once a full understanding of the text context and semantics is achieved, a deep learning model can be trained to solve a large subset of tasks, e.g. text summarization, classification and question answering. In this paper we focus on the question answering problem, specifically the multiple choice type of questions. We develop a model based on BERT, a state-of-the-art transformer network. Moreover, we extend the ability of BERT to handle large text corpora by extracting the highest-influence sentences through a semantic similarity model. Evaluations of our proposed model demonstrate that it outperforms the leading models in the MovieQA challenge and we are currently ranked first in the leaderboard with test accuracy of 87.79%. Finally, we discuss the model shortcomings and suggest possible improvements to overcome these limitations.

11. Variational Inference-Based Dropout in Recurrent Neural Networks for Slot Filling in Spoken Language Understanding [PDF] Back to Contents
  Jun Qi, Xu Liu, Javier Tejedor
Abstract: This paper proposes to generalize the variational recurrent neural network (RNN) with variational inference (VI)-based dropout regularization employed for the long short-term memory (LSTM) cells to more advanced RNN architectures like gated recurrent unit (GRU) and bi-directional LSTM/GRU. The new variational RNNs are employed for slot filling, which is an intriguing but challenging task in spoken language understanding. The experiments on the ATIS dataset suggest that the variational RNNs with the VI-based dropout regularization can significantly improve the naive dropout regularization RNNs-based baseline systems in terms of F-measure. Particularly, the variational RNN with bi-directional LSTM/GRU obtains the best F-measure score.
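
For reference, VI-based (variational) dropout differs from naive dropout in that a single mask is sampled per sequence and reused at every time step. A minimal PyTorch sketch, with the dropout rate as an assumed hyperparameter:

```python
# Sketch of variational (locked) dropout: one mask per sequence, reused at
# every time step, rather than resampled per step as in naive dropout.
import torch
import torch.nn as nn

class VariationalDropout(nn.Module):
    def __init__(self, p=0.5):
        super().__init__()
        self.p = p

    def forward(self, x):                      # x: (batch, time, features)
        if not self.training or self.p == 0.0:
            return x
        mask = x.new_empty(x.size(0), 1, x.size(2)).bernoulli_(1 - self.p)
        return x * mask / (1 - self.p)         # same mask across all time steps
```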

12. Revisiting the Open-Domain Question Answering Pipeline [PDF] Back to Contents
  Sina J. Semnani, Manish Pandey
Abstract: Open-domain question answering (QA) is the task of identifying answers to natural questions from a large corpus of documents. The typical open-domain QA system starts with information retrieval to select a subset of documents from the corpus, which are then processed by a machine reader to select the answer spans. This paper describes Mindstone, an open-domain QA system that consists of a new multi-stage pipeline that employs a traditional BM25-based information retriever, RM3-based neural relevance feedback, neural ranker, and a machine reading comprehension stage. This paper establishes a new baseline for end-to-end performance on question answering for the Wikipedia/SQuAD dataset (EM=58.1, F1=65.8), with substantial gains over the previous state of the art (Yang et al., 2019b). We also show how the new pipeline enables the use of low-resolution labels, and can be easily tuned to meet various timing requirements.

13. A Practical Chinese Dependency Parser Based on A Large-scale Dataset [PDF] Back to Contents
  Shuai Zhang, Lijie Wang, Ke Sun, Xinyan Xiao
Abstract: Dependency parsing is a longstanding natural language processing task, with its outputs crucial to various downstream tasks. Recently, neural network based (NN-based) dependency parsing has achieved significant progress and obtained state-of-the-art results. As we all know, NN-based approaches require massive amounts of labeled training data, which is very expensive because it requires human annotation by experts. Thus few industrial-oriented dependency parser tools are publicly available. In this report, we present Baidu Dependency Parser (DDParser), a new Chinese dependency parser trained on a large-scale manually labeled dataset called Baidu Chinese Treebank (DuCTB). DuCTB consists of about one million annotated sentences from multiple sources including search logs, Chinese newswire, various forum discourses, and conversation programs. DDParser extends the graph-based biaffine parser to accommodate the characteristics of the Chinese dataset. We conduct experiments on two test sets: a standard test set with the same distribution as the training set, and a random test set sampled from other sources; their labeled attachment scores (LAS) are 92.9% and 86.9%, respectively. DDParser achieves state-of-the-art results and is released at this https URL.
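
The graph-based biaffine parser that DDParser builds on scores every head-dependent pair with a bilinear term plus a bias term. A hedged PyTorch sketch of that scorer, with assumed dimensions and simplified initialization:

```python
# Sketch of a biaffine arc scorer as used in graph-based dependency parsing:
# score(i, j) = head_i^T U dep_j + b^T dep_j. Dimensions are assumptions, and
# parameters would need proper initialization in practice.
import torch
import torch.nn as nn

class BiaffineScorer(nn.Module):
    def __init__(self, dim=400):
        super().__init__()
        self.U = nn.Parameter(torch.randn(dim, dim) * 0.01)
        self.b = nn.Parameter(torch.zeros(dim))

    def forward(self, head, dep):              # both: (batch, seq_len, dim)
        # arc[b, i, j]: score of token i being the head of token j
        bilinear = torch.einsum("bid,de,bje->bij", head, self.U, dep)
        bias = torch.einsum("d,bjd->bj", self.b, dep).unsqueeze(1)
        return bilinear + bias
```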

14. Automated Storytelling via Causal, Commonsense Plot Ordering [PDF] Back to Contents
  Prithviraj Ammanabrolu, Wesley Cheung, William Broniec, Mark O. Riedl
Abstract: Automated story plot generation is the task of generating a coherent sequence of plot events. Causal relations between plot events are believed to increase the perception of story and plot coherence. In this work, we introduce the concept of soft causal relations as causal relations inferred from commonsense reasoning. We demonstrate C2PO, an approach to narrative generation that operationalizes this concept through Causal, Commonsense Plot Ordering. Using human-participant protocols, we evaluate our system against baseline systems with different commonsense reasoning and inductive biases to determine the role of soft causal relations in perceived story quality. Through these studies we also probe how changes in commonsense norms across storytelling genres affect perceptions of story quality.

15. Text Modular Networks: Learning to Decompose Tasks in the Language of Existing Models [PDF] Back to Contents
  Tushar Khot, Daniel Khashabi, Kyle Richardson, Peter Clark, Ashish Sabharwal
Abstract: A common approach to solve complex tasks is by breaking them down into simple sub-problems that can then be solved by simpler modules. However, these approaches often need to be designed and trained specifically for each complex task. We propose a general approach, Text Modular Networks(TMNs), where the system learns to decompose any complex task into the language of existing models. Specifically, we focus on Question Answering (QA) and learn to decompose complex questions into sub-questions answerable by existing QA models. TMNs treat these models as blackboxes and learn their textual input-output behavior (i.e., their language) through their task datasets. Our next-question generator then learns to sequentially produce sub-questions that help answer a given complex question. These sub-questions are posed to different existing QA models and, together with their answers, provide a natural language explanation of the exact reasoning used by the model. We present the first system, incorporating a neural factoid QA model and a symbolic calculator, that uses decomposition for the DROP dataset, while also generalizing to the multi-hop HotpotQA dataset. Our system, ModularQA, outperforms a cross-task baseline by 10-60 F1 points and performs comparable to task-specific systems, while also providing an easy-to-read explanation of its reasoning.

16. Automatic Assignment of Radiology Examination Protocols Using Pre-trained Language Models with Knowledge Distillation [PDF] Back to Contents
  Wilson Lau, Laura Aaltonen, Martin Gunn, Meliha Yetisgen
Abstract: Selecting radiology examination protocol is a repetitive, error-prone, and time-consuming process. In this paper, we present a deep learning approach to automatically assign protocols to computer tomography examinations, by pre-training a domain-specific BERT model ($BERT_{rad}$). To handle the high data imbalance across exam protocols, we used a knowledge distillation approach that up-sampled the minority classes through data augmentation. We compared classification performance of the described approach with the statistical n-gram models using Support Vector Machine (SVM) and Random Forest (RF) classifiers, as well as Google's $BERT_{base}$ model. SVM and RF achieved macro-averaged F1 scores of 0.45 and 0.6 while $BERT_{base}$ and $BERT_{rad}$ achieved 0.61 and 0.63. Knowledge distillation improved overall performance on the minority classes, achieving an F1 score of 0.66. Additionally, by choosing the optimal threshold, the BERT models could classify over 50% of test samples within 5% error rate and potentially alleviate half of radiologist protocoling workload.
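
For context, a hedged sketch of a standard knowledge-distillation objective of the kind the abstract refers to; the temperature and mixing weight are illustrative assumptions, not the paper's values:

```python
# Standard distillation objective: KL divergence between temperature-softened
# teacher and student distributions, mixed with the usual cross-entropy loss.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                # rescale soft-target gradients
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```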

17. Document Similarity from Vector Space Densities [PDF] Back to Contents
  Ilia Rushkin
Abstract: We propose a computationally light method for estimating similarities between text documents, which we call the density similarity (DS) method. The method is based on a word embedding in a high-dimensional Euclidean space and on kernel regression, and takes into account semantic relations among words. We find that the accuracy of this method is virtually the same as that of a state-of-the-art method, while the gain in speed is very substantial. Additionally, we introduce generalized versions of the top-k accuracy metric and of the Jaccard metric of agreement between similarity models.
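
One plausible reading of the DS method, sketched here under assumptions rather than as the paper's exact formulation: fit a kernel density estimate to one document's word vectors and take the mean log-density of the other document's word vectors as the similarity score.

```python
# Illustrative sketch (assumed formulation): score how densely document B's
# word vectors fall within a kernel density fitted to document A's vectors.
import numpy as np
from sklearn.neighbors import KernelDensity

def density_similarity(vecs_a: np.ndarray, vecs_b: np.ndarray, bandwidth=0.5):
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(vecs_a)
    return kde.score_samples(vecs_b).mean()    # mean log-density of B under A
```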

18. A Stance Data Set on Polarized Conversations on Twitter about the Efficacy of Hydroxychloroquine as a Treatment for COVID-19 [PDF] Back to Contents
  Ece Çiğdem Mutlu, Toktam A. Oghaz, Jasser Jasser, Ege Tütüncüler, Amirarsalan Rajabi, Aida Tayebi, Ozlem Ozmen, Ivan Garibay
Abstract: At the time of this study, the SARS-CoV-2 virus that caused the COVID-19 pandemic has spread significantly across the world. Considering the uncertainty about policies, health risks, financial difficulties, etc. the online media, specially the Twitter platform, is experiencing a high volume of activity related to this pandemic. Among the hot topics, the polarized debates about unconfirmed medicines for the treatment and prevention of the disease have attracted significant attention from online media users. In this work, we present a stance data set, COVID-CQ, of user-generated content on Twitter in the context of COVID-19. We investigated more than 14 thousand tweets and manually annotated the opinions of the tweet initiators regarding the use of "chloroquine" and "hydroxychloroquine" for the treatment or prevention of COVID-19. To the best of our knowledge, COVID-CQ is the first data set of Twitter users' stances in the context of the COVID-19 pandemic, and the largest Twitter data set on users' stances towards a claim, in any domain. We have made this data set available to the research community via GitHub. We expect this data set to be useful for many research purposes, including stance detection, evolution and dynamics of opinions regarding this outbreak, and changes in opinions in response to the exogenous shocks such as policy decisions and events.

19. DAVE: Deriving Automatically Verilog from English [PDF] Back to Contents
  Hammond Pearce, Benjamin Tan, Ramesh Karri
Abstract: While specifications for digital systems are provided in natural language, engineers undertake significant efforts to translate them into the programming languages understood by compilers for digital systems. Automating this process allows designers to work with the language in which they are most comfortable -- the original natural language -- and focus instead on other downstream design challenges. We explore the use of state-of-the-art machine learning (ML) to automatically derive Verilog snippets from English via fine-tuning GPT-2, a natural language ML system. We describe our approach for producing a suitable dataset of novice-level digital design tasks and provide a detailed exploration of GPT-2, finding encouraging translation performance across our task sets (94.8% correct), with the ability to handle both simple and abstract design tasks.
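
The fine-tuning recipe itself is standard causal language modeling; a hedged sketch using Hugging Face transformers, where the comment-plus-code pairing format is an assumption for illustration rather than DAVE's actual template:

```python
# Sketch of fine-tuning GPT-2 on an (English spec, Verilog) pair; the pairing
# format below is assumed, not DAVE's actual prompt template.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

text = ("// Spec: on each rising clock edge, register q takes the value of d\n"
        "always @(posedge clk) q <= d;")
ids = tok(text, return_tensors="pt").input_ids
loss = model(input_ids=ids, labels=ids).loss   # standard causal-LM objective
loss.backward()                                # one gradient step (optimizer omitted)
```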

20. Identifying Documents In-Scope of a Collection from Web Archives [PDF] Back to Contents
  Krutarth Patel, Cornelia Caragea, Mark Phillips, Nathaniel Fox
Abstract: Web archive data usually contains high-quality documents that are very useful for creating specialized collections of documents, e.g., scientific digital libraries and repositories of technical reports. In doing so, there is a substantial need for automatic approaches that can distinguish the documents of interest for a collection out of the huge number of documents collected by web archiving institutions. In this paper, we explore different learning models and feature representations to determine the best performing ones for identifying the documents of interest from the web archived data. Specifically, we study both machine learning and deep learning models and "bag of words" (BoW) features extracted from the entire document or from specific portions of the document, as well as structural features that capture the structure of documents. We focus our evaluation on three datasets that we created from three different Web archives. Our experimental results show that the BoW classifiers that focus only on specific portions of the documents (rather than the full text) outperform all compared methods on all three datasets.
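
To make the portion-based comparison concrete, a minimal scikit-learn sketch of a bag-of-words classifier restricted to a specific document portion; the choice of the first 500 tokens and the toy data are assumptions:

```python
# Sketch of a BoW classifier trained on a specific portion of each document
# (assumed here: the first 500 tokens) instead of the full text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def first_portion(doc: str, n_tokens: int = 500) -> str:
    return " ".join(doc.split()[:n_tokens])    # keep only the leading portion

docs = ["technical report on regional water resources ...",
        "unrelated crawled web page ..."]      # placeholder examples
labels = [1, 0]                                # 1 = in scope of the collection

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit([first_portion(d) for d in docs], labels)
```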
