
[arXiv Papers] Computation and Language 2020-11-02

Contents

1. Artificial Intelligence (AI) in Action: Addressing the COVID-19 Pandemic with Natural Language Processing (NLP) [PDF] Abstract
2. Mere account mein kitna balance hai? -- On building voice enabled Banking Services for Multilingual Communities [PDF] Abstract
3. Semi-supervised Relation Extraction via Incremental Meta Self-Training [PDF] Abstract
4. Sentiment Analysis for Roman Urdu Text over Social Media, a Comparative Study [PDF] Abstract
5. TopicBERT for Energy Efficient Document Classification [PDF] Abstract
6. Phoneme Based Neural Transducer for Large Vocabulary Speech Recognition [PDF] Abstract
7. Domain-Specific Lexical Grounding in Noisy Visual-Textual Documents [PDF] Abstract
8. A Cross-lingual Natural Language Processing Framework for Infodemic Management [PDF] Abstract
9. Topic-Preserving Synthetic News Generation: An Adversarial Deep Reinforcement Learning Approach [PDF] Abstract
10. A Critical Assessment of State-of-the-Art in Entity Alignment [PDF] Abstract
11. Towards Accurate and Consistent Evaluation: A Dataset for Distantly-Supervised Relation Extraction [PDF] Abstract
12. SLM: Learning a Discourse Language Representation with Sentence Unshuffling [PDF] Abstract
13. "Thy algorithm shalt not bear false witness": An Evaluation of Multiclass Debiasing Methods on Word Embeddings [PDF] Abstract
14. Biomedical Concept Relatedness -- A large EHR-based benchmark [PDF] Abstract
15. HyperText: Endowing FastText with Hyperbolic Geometry [PDF] Abstract
16. Target Word Masking for Location Metonymy Resolution [PDF] Abstract
17. Cross-Domain Sentiment Classification With Contrastive Learning and Mutual Information Maximization [PDF] Abstract
18. Logic-guided Semantic Representation Learning for Zero-Shot Relation Classification [PDF] Abstract
19. Bridging Text and Knowledge with Multi-Prototype Embedding for Few-Shot Relational Triple Extraction [PDF] Abstract
20. Generating Radiology Reports via Memory-driven Transformer [PDF] Abstract
21. VECO: Variable Encoder-decoder Pre-training for Cross-lingual Understanding and Generation [PDF] Abstract
22. CliniQG4QA: Generating Diverse Questions for Domain Adaptation of Clinical Question Answering [PDF] Abstract
23. AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts [PDF] Abstract
24. RuREBus: a Case Study of Joint Named Entity Recognition and Relation Extraction from e-Government Domain [PDF] Abstract
25. RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark [PDF] Abstract
26. How Many Pages? Paper Length Prediction from the Metadata [PDF] Abstract
27. Learning as Abduction: Trainable Natural Logic Theorem Prover for Natural Language Inference [PDF] Abstract
28. Less is More: Data-Efficient Complex Question Answering over Knowledge Bases [PDF] Abstract
29. Few-Shot Complex Knowledge Base Question Answering via Meta Reinforcement Learning [PDF] Abstract
30. Leveraging Extracted Model Adversaries for Improved Black Box Attacks [PDF] Abstract
31. Comparison of Speaker Role Recognition and Speaker Enrollment Protocol for conversational Clinical Interviews [PDF] Abstract
32. T-vectors: Weakly Supervised Speaker Identification Using Hierarchical Transformer Model [PDF] Abstract
33. Systolic Computing on GPUs for Productive Performance [PDF] Abstract
34. Retrieve, Program, Repeat: Complex Knowledge Base Question Answering via Alternate Meta-learning [PDF] Abstract

Abstracts

1. Artificial Intelligence (AI) in Action: Addressing the COVID-19 Pandemic with Natural Language Processing (NLP) [PDF] Back to Contents
  Qingyu Chen, Robert Leaman, Alexis Allot, Ling Luo, Chih-Hsuan Wei, Shankai Yan, Zhiyong Lu
Abstract: The COVID-19 pandemic has had a significant impact on society, both because of the serious health effects of COVID-19 and because of public health measures implemented to slow its spread. Many of these difficulties are fundamentally information needs; attempts to address these needs have caused an information overload for both researchers and the public. Natural language processing (NLP) - the branch of artificial intelligence that interprets human language - can be applied to address many of the information needs made urgent by the COVID-19 pandemic. This review surveys approximately 150 NLP studies and more than 50 systems and datasets addressing the COVID-19 pandemic. We detail work on four core NLP tasks: information retrieval, named entity recognition, literature-based discovery, and question answering. We also describe work that directly addresses aspects of the pandemic through four additional tasks: topic modeling, sentiment and emotion analysis, case load forecasting, and misinformation detection. We conclude by discussing observable trends and remaining challenges.

2. Mere account mein kitna balance hai? -- On building voice enabled Banking Services for Multilingual Communities [PDF] Back to Contents
  Akshat Gupta, Sai Krishna Rallabandi, Alan W Black
Abstract: Tremendous progress in speech and language processing has brought language technologies closer to daily human life. Voice technology has the potential to act as a horizontal enabling layer across all aspects of digitization. It is especially beneficial to rural communities in scenarios like a pandemic. In this work we present our initial exploratory work towards one such direction - building voice enabled banking services for multilingual societies. Speech interaction for typical banking transactions in multilingual communities involves the presence of filled pauses and is characterized by Code Mixing. Code Mixing is a phenomenon where lexical items from one language are embedded in the utterance of another. Therefore speech systems deployed for banking applications should be able to process such content. In our work we investigate various training strategies for building speech based intent recognition systems. We present our results using a Naive Bayes classifier on approximate acoustic phone units obtained with the Allosaurus library.
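
To make the final sentence concrete, here is a minimal sketch of such an intent classifier, assuming phone sequences are already available as space-separated token strings. The phone strings and intent labels are invented placeholders; in practice they would come from transcribing audio with Allosaurus (roughly `read_recognizer().recognize("utterance.wav")`), and the paper's actual training strategies may differ.

```python
# Minimal sketch: Naive Bayes intent recognition over approximate phone units.
# The phone sequences below are hypothetical stand-ins for Allosaurus output.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_phones = [
    "m e r a b a l a n s",  # hypothetical phones for a balance query
    "p a i s e b h e j o",  # hypothetical phones for a transfer request
]
train_intents = ["check_balance", "transfer_money"]

# Phone unigrams and bigrams as bag-of-features for Multinomial Naive Bayes.
# token_pattern keeps single-character phone tokens, which sklearn drops by default.
clf = make_pipeline(
    CountVectorizer(token_pattern=r"\S+", ngram_range=(1, 2)),
    MultinomialNB(),
)
clf.fit(train_phones, train_intents)
print(clf.predict(["m e r a b a l a n s"]))  # -> ['check_balance']
```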

3. Semi-supervised Relation Extraction via Incremental Meta Self-Training [PDF] Back to Contents
  Xuming Hu, Fukun Ma, Chenyao Liu, Chenwei Zhang, Lijie Wen, Philip S. Yu
Abstract: To reduce the human effort of obtaining large-scale annotations, Semi-Supervised Relation Extraction methods aim to leverage unlabeled data in addition to learning from limited samples. Existing self-training methods suffer from the gradual drift problem, where noisy pseudo labels on unlabeled data are incorporated during training. To alleviate the noise in pseudo labels, we propose a method called MetaSRE, where a Relation Label Generation Network generates accurate quality assessments of pseudo labels by (meta) learning from the successful and failed attempts on Relation Classification as an additional meta-objective. To reduce the influence of noisy pseudo labels, MetaSRE adopts a pseudo label selection and exploitation scheme which assesses pseudo label quality on unlabeled samples and only exploits high-quality pseudo labels in a self-training fashion to incrementally augment labeled samples for both robustness and accuracy. Experimental results on two public datasets demonstrate the effectiveness of the proposed approach.

4. Sentiment Analysis for Roman Urdu Text over Social Media, a Comparative Study [PDF] Back to Contents
  Irfan Qutab, Khawar Iqbal Malik, Hira Arooj
Abstract: In the present century, data volume is increasing enormously. The data could be in the form of image, text, voice, and video. One factor in this huge growth of data is the usage of social media, where everyone posts data on a daily basis while chatting, exchanging information, and uploading their personal and official credentials. Sentiment research seeks to uncover abstract knowledge in published texts in which users communicate their emotions and thoughts about shared content, including blogs, news, and social networks. Roman Urdu is one of the most dominant languages on social networks in Pakistan and India. Roman Urdu is among the varieties of Urdu, the world's third-largest language, yet sufficient work has not been done on it. In this article we address the prior concepts and strategies used to examine the sentiment of Roman Urdu text and report their results as well.

5. TopicBERT for Energy Efficient Document Classification [PDF] Back to Contents
  Yatin Chaudhary, Pankaj Gupta, Khushbu Saxena, Vivek Kulkarni, Thomas Runkler, Hinrich Schütze
Abstract: Prior research notes that BERT's computational cost grows quadratically with sequence length, thus leading to longer training times, higher GPU memory requirements, and greater carbon emissions. While recent work seeks to address these scalability issues at pre-training, these issues are also prominent in fine-tuning, especially for long sequence tasks like document classification. Our work thus focuses on optimizing the computational cost of fine-tuning for document classification. We achieve this by complementary learning of both topic and language models in a unified framework, named TopicBERT. This significantly reduces the number of self-attention operations - a main performance bottleneck. Consequently, our model achieves a 1.4x ($\sim40\%$) speedup with $\sim40\%$ reduction in $CO_2$ emission while retaining $99.9\%$ performance over 5 datasets.
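
One plausible reading of this complementary learning, sketched below under our own assumptions, is a classifier over the concatenation of BERT's [CLS] vector with a document-topic distribution: the topic view summarizes the full document cheaply, so fewer self-attention passes are needed. All module names and dimensions are illustrative, not the authors' exact architecture.

```python
# Hedged sketch: fuse a document-topic distribution with a [CLS] vector for
# classification. An illustrative reading of TopicBERT, not the exact model.
import torch
import torch.nn as nn

class TopicAugmentedClassifier(nn.Module):
    def __init__(self, bert_dim: int = 768, n_topics: int = 50, n_classes: int = 4):
        super().__init__()
        self.proj = nn.Linear(bert_dim + n_topics, n_classes)

    def forward(self, cls_vec: torch.Tensor, topic_dist: torch.Tensor) -> torch.Tensor:
        # Concatenate the contextual and topical views of the document.
        return self.proj(torch.cat([cls_vec, topic_dist], dim=-1))

clf = TopicAugmentedClassifier()
logits = clf(torch.randn(2, 768), torch.rand(2, 50))  # random stand-in inputs
print(logits.shape)  # torch.Size([2, 4])
```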

6. Phoneme Based Neural Transducer for Large Vocabulary Speech Recognition [PDF] Back to Contents
  Wei Zhou, Simon Berger, Ralf Schlüter, Hermann Ney
Abstract: To combine the advantages of classical and end-to-end approaches for speech recognition, we present a simple, novel and competitive approach for phoneme-based neural transducer modeling. Different alignment label topologies are compared, and word-end-based phoneme label augmentation is proposed to improve performance. Utilizing the local dependency of phonemes, we adopt a simplified neural network structure and a straightforward integration with the external word-level language model to preserve the consistency of seq-to-seq modeling. We also present a simple, stable and efficient training procedure using frame-wise cross-entropy loss. A phonetic context size of one is shown to be sufficient for the best performance. A simplified scheduled sampling approach is applied for further improvement. We also briefly compare different decoding approaches. The overall performance of our best model is comparable to state-of-the-art results for the TED-LIUM Release 2 and Switchboard corpora.

7. Domain-Specific Lexical Grounding in Noisy Visual-Textual Documents [PDF] Back to Contents
  Gregory Yauney, Jack Hessel, David Mimno
Abstract: Images can give us insights into the contextual meanings of words, but current image-text grounding approaches require detailed annotations. Such granular annotation is rare, expensive, and unavailable in most domain-specific contexts. In contrast, unlabeled multi-image, multi-sentence documents are abundant. Can lexical grounding be learned from such documents, even though they have significant lexical and visual overlap? Working with a case study dataset of real estate listings, we demonstrate the challenge of distinguishing highly correlated grounded terms, such as "kitchen" and "bedroom", and introduce metrics to assess this document similarity. We present a simple unsupervised clustering-based method that increases precision and recall beyond object detection and image tagging baselines when evaluated on labeled subsets of the dataset. The proposed method is particularly effective for local contextual meanings of a word, for example associating "granite" with countertops in the real estate dataset and with rocky landscapes in a Wikipedia dataset.

8. A Cross-lingual Natural Language Processing Framework for Infodemic Management [PDF] Back to Contents
  Ridam Pal, Rohan Pandey, Vaibhav Gautam, Kanav Bhagat, Tavpritesh Sethi
Abstract: The COVID-19 pandemic has put immense pressure on health systems, which are further strained due to the misinformation surrounding it. Under such a situation, providing the right information at the right time is crucial. There is a growing demand for the management of information spread using Artificial Intelligence. Hence, we have exploited the potential of Natural Language Processing for identifying relevant information that needs to be disseminated amongst the masses. In this work, we present a novel Cross-lingual Natural Language Processing framework to provide relevant information by matching daily news with trusted guidelines from the World Health Organization. The proposed pipeline deploys various techniques of NLP such as summarizers, word embeddings, and similarity metrics to provide users with news articles along with a corresponding healthcare guideline. A total of 36 models were evaluated, and a combination of a LexRank-based summarizer with Word2Vec embeddings and the Word Mover's Distance metric outperformed all other models. This novel open-source approach can be used as a template for proactive dissemination of relevant healthcare information in the midst of misinformation spread associated with epidemics.
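
The winning combination can be approximated in a few lines with gensim, as sketched below: rank candidate guidelines for an (already summarized) news item by Word Mover's Distance over Word2Vec vectors. The texts are toy examples, the summarization step is omitted, and gensim's `wmdistance` additionally requires the POT package at runtime.

```python
# Sketch of the matching step: choose the guideline closest to a summarized
# news item under Word Mover's Distance on pretrained Word2Vec embeddings.
import gensim.downloader as api

wv = api.load("word2vec-google-news-300")  # large one-time download

news_summary = "new variant spreads faster among unvaccinated people".split()
guidelines = [
    "get vaccinated to reduce transmission".split(),
    "wash hands regularly with soap and water".split(),
]
# Lower WMD means the two token lists are semantically closer.
best = min(guidelines, key=lambda g: wv.wmdistance(news_summary, g))
print(" ".join(best))
```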

9. Topic-Preserving Synthetic News Generation: An Adversarial Deep Reinforcement Learning Approach [PDF] Back to Contents
  Ahmadreza Mosallanezhad, Kai Shu, Huan Liu
Abstract: Nowadays, there exist powerful language models such as OpenAI's GPT-2 that can generate readable text and can be fine-tuned to generate text for a specific domain. However, GPT-2 cannot directly generate synthetic news on a given topic, and the output of the language model cannot be explicitly controlled. In this paper, we study the novel problem of topic-preserving synthetic news generation. We propose a novel deep reinforcement learning-based method to control the output of GPT-2 with respect to a given news topic. When generating text using GPT-2, by default, the most probable word is selected from the vocabulary. Instead of selecting the best word each time from GPT-2's output, an RL agent tries to select words that optimize the matching of a given topic. In addition, using a fake news detector as an adversary, we investigate generating realistic news using our proposed method. In this paper, we consider realistic news to be news that cannot be easily detected by a fake news classifier. Experimental results demonstrate the effectiveness of the proposed framework at generating topic-preserving news content compared to state-of-the-art baselines.
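
A crude, non-RL stand-in for the word-selection step is sketched below: among GPT-2's top-k next-token candidates, pick the one whose embedding is most similar to a fixed topic vector. The real method trains an RL agent with an adversarial fake-news-detector reward; this greedy re-ranking, over random placeholder tensors, only illustrates the action space.

```python
# Greedy stand-in for the agent's action: re-rank top-k candidates by cosine
# similarity to a topic vector. All tensors are random placeholders.
import torch
import torch.nn.functional as F

def topic_rerank(logits: torch.Tensor, token_emb: torch.Tensor,
                 topic_vec: torch.Tensor, k: int = 10) -> torch.Tensor:
    topk = torch.topk(logits, k).indices                          # candidate ids
    sims = F.cosine_similarity(token_emb[topk], topic_vec.unsqueeze(0), dim=-1)
    return topk[sims.argmax()]                                    # most on-topic

vocab, dim = 100, 32
next_id = topic_rerank(torch.randn(vocab), torch.randn(vocab, dim), torch.randn(dim))
print(int(next_id))
```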

10. A Critical Assessment of State-of-the-Art in Entity Alignment [PDF] Back to Contents
  Max Berrendorf, Ludwig Wacker, Evgeniy Faerman
Abstract: In this work, we perform an extensive investigation of two state-of-the-art (SotA) methods for the task of Entity Alignment in Knowledge Graphs. Therefore, we first carefully examine the benchmarking process and identify several shortcomings, which make the results reported in the original works not always comparable. Furthermore, we suspect that it is a common practice in the community to perform hyperparameter optimization directly on the test set, reducing the informative value of reported performance. Thus, we select a representative sample of benchmarking datasets and describe their properties. We also examine different initializations for entity representations since they are a decisive factor for model performance. Furthermore, we use a shared train/validation/test split for a fair evaluation setting in which we evaluate all methods on all datasets. In our evaluation, we make several interesting findings. While we observe that most of the time SotA approaches perform better than baselines, they have difficulties when the dataset contains noise, which is the case in most real-life applications. Moreover, we find in our ablation study that different features of SotA methods than previously assumed are often crucial for good performance. The code is available at this https URL.

11. Towards Accurate and Consistent Evaluation: A Dataset for Distantly-Supervised Relation Extraction [PDF] Back to Contents
  Tong Zhu, Haitao Wang, Junjie Yu, Xiabing Zhou, Wenliang Chen, Wei Zhang, Min Zhang
Abstract: In recent years, distantly-supervised relation extraction has achieved a certain success by using deep neural networks. Distant Supervision (DS) can automatically generate large-scale annotated data by aligning entity pairs from Knowledge Bases (KB) to sentences. However, these DS-generated datasets inevitably have wrong labels that result in incorrect evaluation scores during testing, which may mislead the researchers. To solve this problem, we build a new dataset NYT-H, where we use the DS-generated data as training data and hire annotators to label test data. Compared with the previous datasets, NYT-H has a much larger test set and then we can perform more accurate and consistent evaluation. Finally, we present the experimental results of several widely used systems on NYT-H. The experimental results show that the ranking lists of the comparison systems on the DS-labelled test data and human-annotated test data are different. This indicates that our human-annotated data is necessary for the evaluation of distantly-supervised relation extraction.

12. SLM: Learning a Discourse Language Representation with Sentence Unshuffling [PDF] Back to Contents
  Haejun Lee, Drew A. Hudson, Kangwook Lee, Christopher D. Manning
Abstract: We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation in a fully self-supervised manner. Recent pre-training methods in NLP focus on learning either bottom or top-level language representations: contextualized word representations derived from language model objectives at one extreme and a whole sequence representation learned by order classification of two given textual segments at the other. However, these models are not directly encouraged to capture representations of intermediate-size structures that exist in natural languages such as sentences and the relationships among them. To that end, we propose a new approach to encourage learning of a contextualized sentence-level representation by shuffling the sequence of input sentences and training a hierarchical transformer model to reconstruct the original ordering. Through experiments on downstream tasks such as GLUE, SQuAD, and DiscoEval, we show that this feature of our model improves the performance of the original BERT by large margins.
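
The data construction implied by this objective is simple enough to sketch: shuffle a document's sentences and keep the permutation as the reconstruction target. The hierarchical transformer that predicts the ordering is omitted here; the function and document below are illustrative.

```python
# Sketch of building one sentence-unshuffling example: the model is trained to
# recover `original_positions` from the shuffled sentence sequence.
import random

def make_unshuffling_example(sentences, seed=None):
    rng = random.Random(seed)
    order = list(range(len(sentences)))
    rng.shuffle(order)
    shuffled = [sentences[i] for i in order]
    # order[j] is the original position of the sentence now at position j,
    # so `order` itself is the prediction target.
    return shuffled, order

doc = ["We introduce SLM.", "It unshuffles sentences.", "This captures discourse."]
shuffled, original_positions = make_unshuffling_example(doc, seed=0)
print(shuffled, original_positions)
```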

13. "Thy algorithm shalt not bear false witness": An Evaluation of Multiclass Debiasing Methods on Word Embeddings [PDF] 返回目录
  Thalea Schlender, Gerasimos Spanakis
Abstract: With the vast development and employment of artificial intelligence applications, research into the fairness of these algorithms has increased. Specifically, in the natural language processing domain, it has been shown that social biases persist in word embeddings and are thus in danger of amplifying these biases when used. As an example of social bias, religious biases are shown to persist in word embeddings and the need for their removal is highlighted. This paper investigates the state-of-the-art multiclass debiasing techniques: Hard debiasing, SoftWEAT debiasing and Conceptor debiasing. It evaluates their performance when removing religious bias on a common basis by quantifying bias removal via the Word Embedding Association Test (WEAT), Mean Average Cosine Similarity (MAC) and the Relative Negative Sentiment Bias (RNSB). By investigating religious bias removal on three widely used word embeddings, namely: Word2Vec, GloVe, and ConceptNet, it is shown that the preferred method is Conceptor debiasing. Specifically, this technique manages to decrease the measured religious bias on average by 82.42%, 96.78% and 54.76% for the three word embedding sets respectively.
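
Of the three metrics, WEAT is compact enough to reproduce. Following Caliskan et al. (2017), the association of word w with attribute sets A and B is s(w, A, B) = mean_a cos(w, a) - mean_b cos(w, b), and the effect size normalizes the mean association difference between two target sets X and Y by the pooled standard deviation. The random vectors below merely stand in for real word embeddings.

```python
# WEAT effect size on toy vectors standing in for embeddings of two target
# sets (X, Y) and two attribute sets (A, B).
import numpy as np

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)

rng = np.random.default_rng(0)
X, Y, A, B = (list(rng.normal(size=(4, 50))) for _ in range(4))
print(weat_effect_size(X, Y, A, B))
```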

14. Biomedical Concept Relatedness -- A large EHR-based benchmark [PDF] Back to Contents
  Claudia Schulz, Josh Levy-Kramer, Camille Van Assel, Miklos Kepes, Nils Hammerla
Abstract: A promising application of AI to healthcare is the retrieval of information from electronic health records (EHRs), e.g. to aid clinicians in finding relevant information for a consultation or to recruit suitable patients for a study. This requires search capabilities far beyond simple string matching, including the retrieval of concepts (diagnoses, symptoms, medications, etc.) related to the one in question. The suitability of AI methods for such applications is tested by predicting the relatedness of concepts with known relatedness scores. However, all existing biomedical concept relatedness datasets are notoriously small and consist of hand-picked concept pairs. We open-source a novel concept relatedness benchmark overcoming these issues: it is six times larger than existing datasets and concept pairs are chosen based on co-occurrence in EHRs, ensuring their relevance for the application of interest. We present an in-depth analysis of our new dataset and compare it to existing ones, highlighting that it is not only larger but also complements existing datasets in terms of the types of concepts included. Initial experiments with state-of-the-art embedding methods show that our dataset is a challenging new benchmark for testing concept relatedness models.

15. HyperText: Endowing FastText with Hyperbolic Geometry [PDF] Back to Contents
  Yudong Zhu, Di Zhou, Jinghui Xiao, Xin Jiang, Xiao Chen, Qun Liu
Abstract: Natural language data exhibit tree-like hierarchical structures such as the hypernym-hyponym relations in WordNet. FastText, as the state-of-the-art text classifier based on a shallow neural network in Euclidean space, may not model such hierarchies precisely with its limited representation capacity. Considering that hyperbolic space is naturally suitable for modeling tree-like hierarchical data, we propose a new model named HyperText for efficient text classification by endowing FastText with hyperbolic geometry. Empirically, we show that HyperText outperforms FastText on a range of text classification tasks with substantially fewer parameters.
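
The geometric ingredient here is the Poincaré-ball distance, d(u, v) = arcosh(1 + 2||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2))), which stretches distances near the boundary of the ball and thus accommodates tree-like branching in few dimensions. The snippet below implements this textbook formula, without claiming to match the paper's exact parameterization.

```python
# Poincare-ball distance, the hyperbolic metric underlying such embeddings.
import numpy as np

def poincare_distance(u: np.ndarray, v: np.ndarray, eps: float = 1e-9) -> float:
    sq_diff = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return float(np.arccosh(1.0 + 2.0 * sq_diff / max(denom, eps)))

u = np.array([0.1, 0.2])
v = np.array([0.5, -0.3])
print(poincare_distance(u, v))  # exceeds the Euclidean distance of ~0.64
```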

16. Target Word Masking for Location Metonymy Resolution [PDF] Back to Contents
  Haonan Li, Maria Vasardani, Martin Tomko, Timothy Baldwin
Abstract: Existing metonymy resolution approaches rely on features extracted from external resources like dictionaries and hand-crafted lexical resources. In this paper, we propose an end-to-end word-level classification approach based only on BERT, without dependencies on taggers, parsers, curated dictionaries of place names, or other external resources. We show that our approach achieves the state-of-the-art on 5 datasets, surpassing conventional BERT models and benchmarks by a large margin. We also show that our approach generalises well to unseen data.
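
The core preprocessing step is easy to illustrate: replace the potentially metonymic target word with the mask token before classification, so the model must judge from context alone. The snippet below uses standard Hugging Face tokenizer conventions; the classification head on top of BERT is omitted.

```python
# Illustration of target word masking: hide the target so the classifier must
# decide from context whether the location is used metonymically.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def mask_target(sentence: str, target: str) -> str:
    # Replace only the first occurrence of the target word.
    return sentence.replace(target, tokenizer.mask_token, 1)

print(mask_target("England won the World Cup in 1966.", "England"))
# -> "[MASK] won the World Cup in 1966."
```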

17. Cross-Domain Sentiment Classification With Contrastive Learning and Mutual Information Maximization [PDF] Back to Contents
  Tian Li, Xiang Chen, Shanghang Zhang, Zhen Dong, Kurt Keutzer
Abstract: Contrastive learning (CL) has been successful as a powerful representation learning method. In this work we propose CLIM: Contrastive Learning with mutual Information Maximization, to explore the potential of CL on cross-domain sentiment classification. To the best of our knowledge, CLIM is the first to adopt contrastive learning for natural language processing (NLP) tasks across domains. Due to scarcity of labels on the target domain, we introduce mutual information maximization (MIM) apart from CL to exploit the features that best support the final prediction. Furthermore, MIM is able to maintain a relatively balanced distribution of the model's prediction, and enlarges the margin between classes on the target domain. The larger margin increases our model's robustness and enables the same classifier to be optimal across domains. Consequently, we achieve new state-of-the-art results on the Amazon-review dataset as well as the airlines dataset, showing the efficacy of our proposed method CLIM.

18. Logic-guided Semantic Representation Learning for Zero-Shot Relation Classification [PDF] Back to Contents
  Juan Li, Ruoxu Wang, Ningyu Zhang, Wen Zhang, Fan Yang, Huajun Chen
Abstract: Relation classification aims to extract semantic relations between entity pairs from the sentences. However, most existing methods can only identify seen relation classes that occurred during training. To recognize unseen relations at test time, we explore the problem of zero-shot relation classification. Previous work regards the problem as reading comprehension or textual entailment, which have to rely on artificial descriptive information to improve the understandability of relation types. Thus, rich semantic knowledge of the relation labels is ignored. In this paper, we propose a novel logic-guided semantic representation learning model for zero-shot relation classification. Our approach builds connections between seen and unseen relations via implicit and explicit semantic representations with knowledge graph embeddings and logic rules. Extensive experimental results demonstrate that our method can generalize to unseen relation types and achieve promising improvements.

19. Bridging Text and Knowledge with Multi-Prototype Embedding for Few-Shot Relational Triple Extraction [PDF] Back to Contents
  Haiyang Yu, Ningyu Zhang, Shumin Deng, Hongbin Ye, Wei Zhang, Huajun Chen
Abstract: Current supervised relational triple extraction approaches require huge amounts of labeled data and thus suffer from poor performance in few-shot settings. However, people can grasp new knowledge by learning a few instances. To this end, we take the first step to study the few-shot relational triple extraction, which has not been well understood. Unlike previous single-task few-shot problems, relational triple extraction is more challenging as the entities and relations have implicit correlations. In this paper, we propose a novel multi-prototype embedding network model to jointly extract the composition of relational triples, namely, entity pairs and corresponding relations. To be specific, we design a hybrid prototypical learning mechanism that bridges text and knowledge concerning both entities and relations. Thus, implicit correlations between entities and relations are injected. Additionally, we propose a prototype-aware regularization to learn more representative prototypes. Experimental results demonstrate that the proposed method can improve the performance of few-shot triple extraction.

20. Generating Radiology Reports via Memory-driven Transformer [PDF] Back to Contents
  Zhihong Chen, Yan Song, Tsung-Hui Chang, Xiang Wan
Abstract: Medical imaging is frequently used in clinical practice and trials for diagnosis and treatment. Writing imaging reports is time-consuming and can be error-prone for inexperienced radiologists. Therefore, automatically generating radiology reports is highly desired to lighten the workload of radiologists and accordingly promote clinical automation, which is an essential task to apply artificial intelligence to the medical domain. In this paper, we propose to generate radiology reports with memory-driven Transformer, where a relational memory is designed to record key information of the generation process and a memory-driven conditional layer normalization is applied to incorporating the memory into the decoder of Transformer. Experimental results on two prevailing radiology report datasets, IU X-Ray and MIMIC-CXR, show that our proposed approach outperforms previous models with respect to both language generation metrics and clinical evaluations. Particularly, this is the first work reporting the generation results on MIMIC-CXR to the best of our knowledge. Further analyses also demonstrate that our approach is able to generate long reports with necessary medical terms as well as meaningful image-text attention mappings.
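
The conditional normalization component admits a compact sketch: a LayerNorm whose scale and shift are predicted from the memory state, which is one way of injecting the memory into every decoder layer. This is our reading of the mechanism; the relational memory itself is not implemented here and all dimensions are placeholders.

```python
# Hedged sketch of memory-driven conditional layer normalization: the memory
# state modulates LayerNorm's scale and shift. Dimensions are illustrative.
import torch
import torch.nn as nn

class MemoryConditionalLayerNorm(nn.Module):
    def __init__(self, d_model: int = 512, d_mem: int = 512):
        super().__init__()
        self.ln = nn.LayerNorm(d_model, elementwise_affine=False)
        self.to_gamma = nn.Linear(d_mem, d_model)
        self.to_beta = nn.Linear(d_mem, d_model)

    def forward(self, x: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); memory: (batch, d_mem)
        gamma = self.to_gamma(memory).unsqueeze(1)  # broadcast over positions
        beta = self.to_beta(memory).unsqueeze(1)
        return gamma * self.ln(x) + beta

mcln = MemoryConditionalLayerNorm()
print(mcln(torch.randn(2, 7, 512), torch.randn(2, 512)).shape)  # (2, 7, 512)
```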

21. VECO: Variable Encoder-decoder Pre-training for Cross-lingual Understanding and Generation [PDF] Back to Contents
  Fuli Luo, Wei Wang, Jiahao Liu, Yijia Liu, Bin Bi, Songfang Huang, Fei Huang, Luo Si
Abstract: Recent studies about learning multilingual representations have achieved significant performance gains across a wide range of downstream cross-lingual tasks. They train either an encoder-only Transformer mainly for understanding tasks, or an encoder-decoder Transformer specifically for generation tasks, ignoring the correlation between the two tasks and frameworks. In contrast, this paper presents a variable encoder-decoder (VECO) pre-training approach to unify the two mainstreams in both model architectures and pre-training tasks. VECO splits the standard Transformer block into several sub-modules trained with both inner-sequence and cross-sequence masked language modeling, and correspondingly reorganizes certain sub-modules for understanding and generation tasks during inference. Such a workflow not only ensures to train the most streamlined parameters necessary for two kinds of tasks, but also enables them to boost each other via sharing common sub-modules. As a result, VECO delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark covering text classification, sequence labeling, question answering, and sentence retrieval. For generation tasks, VECO also outperforms all existing cross-lingual models and state-of-the-art Transformer variants on WMT14 English-to-German and English-to-French translation datasets, with gains of up to 1$\sim$2 BLEU.

22. CliniQG4QA: Generating Diverse Questions for Domain Adaptation of Clinical Question Answering [PDF] Back to Contents
  Xiang Yue, Xinliang Zhang, Ziyu Yao, Simon Lin, Huan Sun
Abstract: Clinical question answering (QA) aims to automatically answer questions from medical professionals based on clinical texts. Studies show that neural QA models trained on one corpus may not generalize well to new clinical texts from a different institute or a different patient group, where large-scale QA pairs are not readily available for retraining. To address this challenge, we propose a simple yet effective framework, CliniQG4QA, which leverages question generation (QG) to synthesize QA pairs on new clinical contexts and boosts QA models without requiring manual annotations. In order to generate diverse types of questions that are essential for training QA models, we further introduce a seq2seq-based question phrase prediction (QPP) module that can be used together with most existing QG models to diversify their generation. Our comprehensive experiment results show that the QA corpus generated by our framework is helpful to improve QA models on the new contexts (up to 8% absolute gain in terms of Exact Match), and that the QPP module plays a crucial role in achieving the gain.

23. AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts [PDF] Back to Contents
  Taylor Shin, Yasaman Razeghi, Robert L. Logan IV, Eric Wallace, Sameer Singh
Abstract: The remarkable success of pretrained language models has motivated the study of what kinds of knowledge these models learn during pretraining. Reformulating tasks as fill-in-the-blanks problems (e.g., cloze tests) is a natural approach for gauging such knowledge; however, its usage is limited by the manual effort and guesswork required to write suitable prompts. To address this, we develop AutoPrompt, an automated method to create prompts for a diverse set of tasks, based on a gradient-guided search. Using AutoPrompt, we show that masked language models (MLMs) have an inherent capability to perform sentiment analysis and natural language inference without additional parameters or finetuning, sometimes achieving performance on par with recent state-of-the-art supervised models. We also show that our prompts elicit more accurate factual knowledge from MLMs than the manually created prompts on the LAMA benchmark, and that MLMs can be used as relation extractors more effectively than supervised relation extraction models. These results demonstrate that automatically generated prompts are a viable parameter-free alternative to existing probing methods, and as pretrained LMs become more sophisticated and capable, potentially a replacement for finetuning.
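
The gradient-guided search condenses to one scoring step in the style of HotFlip: a first-order Taylor expansion says that swapping a trigger token for token w changes the loss by roughly grad . (e_w - e_trigger), so candidates can be ranked by grad . e_w alone. The sketch below applies that ranking to placeholder tensors; in the full method, the shortlisted candidates would then be re-evaluated on actual batches before a swap is accepted.

```python
# Condensed HotFlip-style candidate scoring for gradient-guided prompt search.
# `emb` and `grad` are random placeholders for a masked LM's embedding matrix
# and the loss gradient at the current trigger position.
import torch

def top_swap_candidates(emb: torch.Tensor, grad: torch.Tensor, k: int = 10):
    # delta_loss(w) ~= grad . (e_w - e_trigger); the e_trigger term is the same
    # for every candidate, so rank tokens by grad . e_w (smallest = best).
    scores = emb @ grad
    return torch.topk(-scores, k).indices

vocab_size, dim = 1000, 64
emb, grad = torch.randn(vocab_size, dim), torch.randn(dim)
print(top_swap_candidates(emb, grad, k=5))
```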

24. RuREBus: a Case Study of Joint Named Entity Recognition and Relation Extraction from e-Government Domain [PDF] Back to Contents
  Vitaly Ivanin, Ekaterina Artemova, Tatiana Batura, Vladimir Ivanov, Veronika Sarkisyan, Elena Tutubalina, Ivan Smurov
Abstract: We showcase an application of information extraction methods, such as named entity recognition (NER) and relation extraction (RE), to a novel corpus consisting of documents issued by a state agency. The main challenges of this corpus are: 1) the annotation scheme differs greatly from the one used for general domain corpora, and 2) the documents are written in a language other than English. Contrary to expectations, the state-of-the-art transformer-based models show modest performance for both tasks, whether approached sequentially or in an end-to-end fashion. Our experiments have demonstrated that fine-tuning on large unlabeled corpora does not automatically yield significant improvement, and thus we may conclude that more sophisticated strategies of leveraging unlabelled texts are demanded. In this paper, we describe the whole developed pipeline, starting from text annotation, baseline development, and designing a shared task in hopes of improving the baseline. Eventually, we realize that the current NER and RE technologies are far from mature and have not yet overcome challenges like ours.

25. RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark [PDF] Back to Contents
  Tatiana Shavrina, Alena Fenogenova, Anton Emelyanov, Denis Shevelev, Ekaterina Artemova, Valentin Malykh, Vladislav Mikhailov, Maria Tikhonova, Andrey Chertok, Andrey Evlampiev
Abstract: In this paper, we introduce an advanced Russian general language understanding evaluation benchmark -- RussianGLUE. Recent advances in the field of universal language models and transformers require the development of a methodology for their broad diagnostics and testing for general intellectual skills - detection of natural language inference, commonsense reasoning, ability to perform simple logical operations regardless of text subject or lexicon. For the first time, a benchmark of nine tasks, collected and organized analogically to the SuperGLUE methodology, was developed from scratch for the Russian language. We provide baselines, human level evaluation, an open-source framework for evaluating models (this https URL), and an overall leaderboard of transformer models for the Russian language. Besides, we present the first results of comparing multilingual models in the adapted diagnostic test set and offer the first steps to further expanding or assessing state-of-the-art models independently of language.

26. How Many Pages? Paper Length Prediction from the Metadata [PDF] Back to Contents
  Erion Çano, Ondřej Bojar
Abstract: Being able to predict the length of a scientific paper may be helpful in numerous situations. This work defines the paper length prediction task as a regression problem and reports several experimental results using popular machine learning models. We also create a huge dataset of publication metadata and the respective lengths in number of pages. The dataset will be freely available and is intended to foster research in this domain. As future work, we would like to explore more advanced regressors based on neural networks and big pretrained language models.
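
Framed as regression, the task fits in a few lines. The sketch below uses invented metadata features (author count, title length, abstract length) and fabricated page counts, purely to make the problem formulation concrete; the paper's own features and models may differ.

```python
# Toy formulation of paper length prediction as regression over metadata.
# All features and targets below are fabricated for illustration.
from sklearn.ensemble import GradientBoostingRegressor

# [n_authors, title_length_chars, abstract_length_words]
X = [[2, 60, 150], [5, 90, 220], [1, 40, 100], [3, 75, 180], [6, 110, 260]]
y = [8, 12, 5, 10, 14]  # page counts

reg = GradientBoostingRegressor(random_state=0).fit(X, y)
print(reg.predict([[4, 80, 200]]))
```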

27. Learning as Abduction: Trainable Natural Logic Theorem Prover for Natural Language Inference [PDF] Back to Contents
  Lasha Abzianidze
Abstract: Tackling Natural Language Inference with a logic-based method is becoming less and less common. While this might have been counterintuitive several decades ago, nowadays it seems pretty obvious. The main reasons for such a conception are that (a) logic-based methods are usually brittle when it comes to processing wide-coverage texts, and (b) instead of automatically learning from data, they require much manual effort for development. We make a step towards overcoming such shortcomings by modeling learning from data as abduction: reversing a theorem-proving procedure to abduce semantic relations that serve as the best explanation for the gold label of an inference problem. In other words, instead of proving sentence-level inference relations with the help of lexical relations, the lexical relations are proved taking into account the sentence-level inference relations. We implement the learning method in a tableau theorem prover for natural language and show that it improves the performance of the theorem prover on the SICK dataset by 1.4% while still maintaining high precision (>94%). The obtained results are competitive with the state of the art among logic-based systems.

28. Less is More: Data-Efficient Complex Question Answering over Knowledge Bases [PDF] Back to Contents
  Yuncheng Hua, Yuan-Fang Li, Guilin Qi, Wei Wu, Jingyao Zhang, Daiqing Qi
Abstract: Question answering is an effective method for obtaining information from knowledge bases (KB). In this paper, we propose the Neural-Symbolic Complex Question Answering (NS-CQA) model, a data-efficient reinforcement learning framework for complex question answering by using only a modest number of training samples. Our framework consists of a neural generator and a symbolic executor that, respectively, transforms a natural-language question into a sequence of primitive actions, and executes them over the knowledge base to compute the answer. We carefully formulate a set of primitive symbolic actions that allows us to not only simplify our neural network design but also accelerate model convergence. To reduce search space, we employ the copy and masking mechanisms in our encoder-decoder architecture to drastically reduce the decoder output vocabulary and improve model generalizability. We equip our model with a memory buffer that stores high-reward promising programs. Besides, we propose an adaptive reward function. By comparing the generated trial with the trials stored in the memory buffer, we derive the curriculum-guided reward bonus, i.e., the proximity and the novelty. To mitigate the sparse reward problem, we combine the adaptive reward and the reward bonus, reshaping the sparse reward into dense feedback. Also, we encourage the model to generate new trials to avoid imitating the spurious trials while making the model remember the past high-reward trials to improve data efficiency. Our NS-CQA model is evaluated on two datasets: CQA, a recent large-scale complex question answering dataset, and WebQuestionsSP, a multi-hop question answering dataset. On both datasets, our model outperforms the state-of-the-art models. Notably, on CQA, NS-CQA performs well on questions with higher complexity, while only using approximately 1% of the total training samples.

29. Few-Shot Complex Knowledge Base Question Answering via Meta Reinforcement Learning [PDF] Back to Contents
  Yuncheng Hua, Yuan-Fang Li, Gholamreza Haffari, Guilin Qi, Tongtong Wu
Abstract: Complex question-answering (CQA) involves answering complex natural-language questions on a knowledge base (KB). However, the conventional neural program induction (NPI) approach exhibits uneven performance when the questions have different types, harboring inherently different characteristics, e.g., difficulty level. This paper proposes a meta-reinforcement learning approach to program induction in CQA to tackle the potential distributional bias in questions. Our method quickly and effectively adapts the meta-learned programmer to new questions based on the most similar questions retrieved from the training data. The meta-learned policy is then used to learn a good programming policy, utilizing the trial trajectories and their rewards for similar questions in the support set. Our method achieves state-of-the-art performance on the CQA dataset (Saha et al., 2018) while using only five trial trajectories for the top-5 retrieved questions in each support set, and meta-training on tasks constructed from only 1% of the training set. We have released our code at this https URL.

30. Leveraging Extracted Model Adversaries for Improved Black Box Attacks [PDF] Back to Contents
  Naveen Jafer Nizar, Ari Kobren
Abstract: We present a method for adversarial input generation against black box models for reading comprehension based question answering. Our approach is composed of two steps. First, we approximate a victim black box model via model extraction (Krishna et al., 2020). Second, we use our own white box method to generate input perturbations that cause the approximate model to fail. These perturbed inputs are used against the victim. In experiments we find that our method improves the efficacy of AddAny, a white box attack, performed on the approximate model by 25% F1, and of AddSent, a black box attack, by 11% F1 (Jia and Liang, 2017).

31. Comparison of Speaker Role Recognition and Speaker Enrollment Protocol for conversational Clinical Interviews [PDF] Back to Contents
  Rachid Riad, Hadrien Titeux, Laurie Lemoine, Justine Montillot, Agnes Sliwinski, Jennifer Hamet Bagnou, Xuan Nga Cao, Emmanuel Dupoux, Anne-Catherine Bachoud-Lévi
Abstract: Conversations between a clinician and a patient, in natural conditions, are valuable sources of information for medical follow-up. The automatic analysis of these dialogues could help extract new language markers and speed up the clinicians' reports. Yet, it is not clear which speech processing pipeline performs best at detecting and identifying speaker turns, especially for individuals with speech and language disorders. Here, we propose a split of the data that allows a comparative evaluation of speaker role recognition and speaker enrollment methods for this task. We trained end-to-end neural network architectures to adapt to each task and evaluated each approach under the same metric. Experimental results are reported on naturalistic clinical conversations between neuropsychologists and interviewees at different stages of Huntington's disease. We found that our Speaker Role Recognition model gave the best performances. In addition, our study underlined the importance of retraining models with in-domain data. Finally, we observed that results do not depend on the demographics of the interviewee, highlighting the clinical relevance of our methods.

32. T-vectors: Weakly Supervised Speaker Identification Using Hierarchical Transformer Model [PDF] Back to Contents
  Yanpei Shi, Mingjie Chen, Qiang Huang, Thomas Hain
Abstract: Identifying multiple speakers without knowing where a speaker's voice is in a recording is a challenging task. This paper proposes a hierarchical network with transformer encoders and a memory mechanism to address this problem. The proposed model contains a frame-level encoder and a segment-level encoder, both of which make use of the transformer encoder block. The multi-head attention mechanism in the transformer structure could better capture different speaker properties when the input utterance contains multiple speakers. The memory mechanism used in the frame-level encoders can build a recurrent connection that better captures long-term speaker features. The experiments are conducted on artificial datasets based on the Switchboard Cellular part1 (SWBC) and Voxceleb1 datasets. In different data construction scenarios (Concat and Overlap), the proposed model shows better performance compared with four strong baselines, reaching 13.3% and 10.5% relative improvement over H-vectors and S-vectors respectively. Using the memory mechanism yields 10.6% and 7.7% relative improvement compared with not using it.

33. Systolic Computing on GPUs for Productive Performance [PDF] Back to Contents
  Hongbo Rong, Xiaochen Hao, Yun Liang, Lidong Xu, Hong H Jiang, Pradeep Dubey
Abstract: We propose a language and compiler to productively build high-performance software systolic arrays that run on GPUs. Based on a rigorous mathematical foundation (uniform recurrence equations and space-time transform), our language has a high abstraction level and covers a wide range of applications. A programmer specifies a projection of a dataflow compute onto a linear systolic array, while leaving the detailed implementation of the projection to a compiler; the compiler implements the specified projection and maps the linear systolic array to the SIMD execution units and vector registers of GPUs. In this way, both productivity and performance are achieved at the same time. This approach neatly combines loop transformations, data shuffling, and vector register allocation into a single framework. Meanwhile, many other optimizations can be applied as well; the compiler composes the optimizations together to generate efficient code. We implemented the approach on Intel GPUs. This is the first system that allows productive construction of systolic arrays on GPUs. We allow multiple projections, arbitrary projection directions and linear schedules, which can express most, if not all, systolic arrays in practice. Experiments with 1- and 2-D convolution on an Intel GEN9.5 GPU have demonstrated the generality of the approach, and its productivity in expressing various systolic designs for finding the best candidate. Although our systolic arrays are purely software running on generic SIMD hardware, compared with the GPU's specialized hardware samplers that perform the same convolutions, some of our best designs are up to 59% faster. Overall, this approach holds promise for productive high-performance computing on GPUs.

34. Retrieve, Program, Repeat: Complex Knowledge Base Question Answering via Alternate Meta-learning [PDF] Back to Contents
  Yuncheng Hua, Yuan-Fang Li, Gholamreza Haffari, Guilin Qi, Wei Wu
Abstract: A compelling approach to complex question answering is to convert the question to a sequence of actions, which can then be executed on the knowledge base to yield the answer, aka the programmer-interpreter approach. Using training questions similar to the test question, meta-learning enables the programmer to adapt to unseen questions and tackle potential distributional biases quickly. However, this comes at the cost of manually labeling similar questions to learn a retrieval model, which is tedious and expensive. In this paper, we present a novel method that automatically learns a retrieval model alternately with the programmer from weak supervision, i.e., the system's performance with respect to the produced answers. To the best of our knowledge, this is the first attempt to train the retrieval model jointly with the programmer. Our system leads to state-of-the-art performance on a large-scale task for complex question answering over knowledge bases. We have released our code at this https URL.
