
[arXiv Papers] Computation and Language 2021-01-06

Contents

1. Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing [PDF]
2. On the interaction of automatic evaluation and task framing in headline style transfer [PDF]
3. Local Translation Services for Neglected Languages [PDF]
4. PhoNLP: A joint multi-task learning model for Vietnamese part-of-speech tagging, named entity recognition and dependency parsing [PDF]
5. Political Depolarization of News Articles Using Attribute-aware Word Embeddings [PDF]
6. Reinforcement Learning based Collective Entity Alignment with Adaptive Features [PDF]
7. Integration of Domain Knowledge using Medical Knowledge Graph Deep Learning for Cancer Phenotyping [PDF]
8. Evaluating Empathetic Chatbots in Customer Service Settings [PDF]
9. I-BERT: Integer-only BERT Quantization [PDF]
10. Reddit Entity Linking Dataset [PDF]
11. Transformers and Transfer Learning for Improving Portuguese Semantic Role Labeling [PDF]
12. End-to-End Video Question-Answer Generation with Generator-Pretester Network [PDF]

Abstracts

1. Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing [PDF]
  Binyuan Hui, Ruiying Geng, Qiyu Ren, Binhua Li, Yongbin Li, Jian Sun, Fei Huang, Luo Si, Pengfei Zhu, Xiaodan Zhu
Abstract: Semantic parsing has long been a fundamental problem in natural language processing. Recently, cross-domain context-dependent semantic parsing has become a new focus of research. Central to the problem is the challenge of leveraging contextual information of both natural language utterance and database schemas in the interaction history. In this paper, we present a dynamic graph framework that is capable of effectively modelling contextual utterances, tokens, database schemas, and their complicated interaction as the conversation proceeds. The framework employs a dynamic memory decay mechanism that incorporates inductive bias to integrate enriched contextual relation representation, which is further enhanced with a powerful reranking model. At the time of writing, we demonstrate that the proposed framework outperforms all existing models by large margins, achieving new state-of-the-art performance on two large-scale benchmarks, the SParC and CoSQL datasets. Specifically, the model attains a 55.8% question-match and 30.8% interaction-match accuracy on SParC, and a 46.8% question-match and 17.0% interaction-match accuracy on CoSQL.
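
The abstract does not spell out the dynamic memory decay mechanism. Below is a minimal sketch of one plausible reading, where older dialogue turns are exponentially down-weighted before fusion; the decay rate and the weighted-sum fusion are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def decayed_turn_weights(num_turns: int, decay: float = 0.5) -> np.ndarray:
    """One weight per past turn, shrinking exponentially with distance
    from the current turn (assumed form, not the paper's)."""
    distances = np.arange(num_turns - 1, -1, -1)  # oldest turn is farthest
    weights = np.exp(-decay * distances)
    return weights / weights.sum()

def fuse_context(turn_vectors: np.ndarray, decay: float = 0.5) -> np.ndarray:
    """Blend per-turn utterance encodings into a single context vector."""
    w = decayed_turn_weights(len(turn_vectors), decay)
    return (w[:, None] * turn_vectors).sum(axis=0)

# Example: a 4-turn interaction encoded as 8-dimensional turn vectors.
context = fuse_context(np.random.randn(4, 8))
```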

2. On the interaction of automatic evaluation and task framing in headline style transfer [PDF]
  Lorenzo De Mattei, Michele Cafagna, Huiyuan Lai, Felice Dell'Orletta, Malvina Nissim, Albert Gatt
Abstract: An ongoing debate in the NLG community concerns the best way to evaluate systems, with human evaluation often being considered the most reliable method, compared to corpus-based metrics. However, tasks involving subtle textual differences, such as style transfer, tend to be hard for humans to perform. In this paper, we propose an evaluation method for this task based on purposely-trained classifiers, showing that it better reflects system differences than traditional metrics such as BLEU and ROUGE.
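
The idea is to score a system by how often a purposely-trained style classifier judges its outputs to be in the target style. A minimal sketch with placeholder data; the TF-IDF + logistic-regression classifier is an assumption, not the classifier used in the paper:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Fit the classifier on headlines labeled with their style (placeholder data).
train_texts = ["shock twist stuns fans ...", "government publishes annual report ..."]
train_styles = [1, 0]  # 1 = target style, 0 = source style

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(train_texts, train_styles)

# Evaluate a style-transfer system by its target-style classification rate.
system_outputs = ["a transferred headline ...", "another transferred headline ..."]
transfer_rate = clf.predict(system_outputs).mean()
```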

3. Local Translation Services for Neglected Languages [PDF]
  David Noever, Josh Kalin, Matt Ciolino, Dom Hambrick, Gerry Dozier
Abstract: The availability of computationally lightweight but high-quality translators prompts consideration of new applications that address neglected languages. Locally run translators for less popular languages may assist data projects involving protected or personal data that would require specific compliance checks before posting to a public translation API, but which an army of local, small-scale pair translators could serve with reasonable, cost-effective solutions. Like handling a specialist's dialect, this research illustrates translating two historically interesting but obfuscated languages: 1) hacker-speak ("l33t") and 2) reverse (or "mirror") writing as practiced by Leonardo da Vinci. The work generalizes a deep learning architecture to translatable variants of hacker-speak with lite, medium, and hard vocabularies. The original contribution highlights a fluent translator of hacker-speak in under 50 megabytes and demonstrates a generator for augmenting future datasets with more than a million bilingual sentence pairs. The long short-term memory recurrent neural network (LSTM-RNN) extends previous work demonstrating an English-to-foreign translation service built from as few as 10,000 bilingual sentence pairs. This work further solves the equivalent translation problem in twenty-six additional (non-obfuscated) languages and rank-orders those models and their proficiency quantitatively, with Italian the most successful and Mandarin Chinese the most challenging. For neglected languages, the method prototypes novel services for smaller niche translations such as Kabyle (an Algerian dialect with 5-7 million speakers) that most enterprise translators have yet to support. One anticipates the extension of this approach to other important dialects, such as translating technical (medical or legal) jargon and processing health records.
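
Because hacker-speak is largely a character-level substitution over English, bilingual pairs can be synthesized at scale, which is how a million-pair training set is within reach. A toy generator under an assumed substitution table (the paper's lite/medium/hard vocabularies are not reproduced here):

```python
# Illustrative character substitutions; the real vocabularies differ.
L33T = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "t": "7", "s": "5"})

def to_l33t(sentence: str) -> str:
    """Encode an English sentence as toy hacker-speak."""
    return sentence.lower().translate(L33T)

def make_pairs(sentences):
    """Produce (English, l33t) bilingual sentence pairs for training."""
    return [(s, to_l33t(s)) for s in sentences]

pairs = make_pairs(["the quick brown fox", "translation services"])
# [('the quick brown fox', '7h3 qu1ck br0wn f0x'), ...]
```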

4. PhoNLP: A joint multi-task learning model for Vietnamese part-of-speech tagging, named entity recognition and dependency parsing [PDF]
  Linh The Nguyen, Dat Quoc Nguyen
Abstract: We present the first multi-task learning model -- named PhoNLP -- for joint Vietnamese part-of-speech tagging, named entity recognition and dependency parsing. Experiments on Vietnamese benchmark datasets show that PhoNLP produces state-of-the-art results, outperforming a single-task learning approach that fine-tunes the pre-trained Vietnamese language model PhoBERT (Nguyen and Nguyen, 2020) for each task independently. We publicly release PhoNLP as an open-source toolkit under the MIT License. We hope that PhoNLP can serve as a strong baseline and useful toolkit for future research and applications in Vietnamese NLP. Our PhoNLP is available at this https URL
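
Joint multi-task learning of this kind typically shares one encoder across per-task heads and sums the per-task losses. A minimal PyTorch sketch with an LSTM standing in for the PhoBERT encoder and the dependency-parsing head omitted; all dimensions and head sizes are illustrative:

```python
import torch
import torch.nn as nn

class JointTagger(nn.Module):
    """Shared encoder with one classification head per task."""
    def __init__(self, emb_dim=300, hidden=768, n_pos=20, n_ner=9):
        super().__init__()
        self.encoder = nn.LSTM(emb_dim, hidden // 2,
                               bidirectional=True, batch_first=True)
        self.pos_head = nn.Linear(hidden, n_pos)  # POS tagging
        self.ner_head = nn.Linear(hidden, n_ner)  # named entity recognition

    def forward(self, embeddings):
        states, _ = self.encoder(embeddings)
        return self.pos_head(states), self.ner_head(states)

model = JointTagger()
pos_logits, ner_logits = model(torch.randn(2, 12, 300))  # 2 sentences, 12 tokens
# Joint training sums the cross-entropy losses of all heads over the shared states.
```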

5. Political Depolarization of News Articles Using Attribute-aware Word Embeddings [PDF]
  Ruibo Liu, Lili Wang, Chenyan Jia, Soroush Vosoughi
Abstract: Political polarization in the US is on the rise. This polarization negatively affects the public sphere by contributing to the creation of ideological echo chambers. In this paper, we focus on addressing one of the factors that contributes to this polarization: polarized media. We introduce a framework for depolarizing news articles. Given an article on a certain topic with a particular ideological slant (e.g., liberal or conservative), the framework first detects polar language in the article and then generates a new article with the polar language replaced with neutral expressions. To detect polar words, we train a multi-attribute-aware word embedding model that is aware of ideology and topics on 360k full-length media articles. Then, for text generation, we propose a new algorithm called Text Annealing Depolarization Algorithm (TADA). TADA retrieves neutral expressions from the word embedding model that not only decrease ideological polarity but also preserve the original argument of the text, while maintaining grammatical correctness. We evaluate our framework by comparing the depolarized output of our model in two modes, fully-automatic and semi-automatic, on 99 stories spanning 11 topics. Based on feedback from 161 human testers, our framework successfully depolarized 90.1% of paragraphs in semi-automatic mode and 78.3% of paragraphs in fully-automatic mode. Furthermore, 81.2% of the testers agree that the non-polar content information is well-preserved and 79% agree that depolarization does not harm semantic correctness when they compare the original text and the depolarized text. Our work shows that data-driven methods can help to locate political polarity and aid in the depolarization of articles.
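
TADA's retrieval step can be pictured as a constrained nearest-neighbor search: stay close to the polar word in embedding space while minimizing a polarity score. A simplified stand-in, assuming precomputed embedding and polarity dictionaries (the words and scores below are hypothetical):

```python
import numpy as np

def depolarized_substitute(word, embeddings, polarity, top_k=10):
    """Among the top_k nearest neighbors of `word`, return the least polar one."""
    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    v = embeddings[word]
    neighbors = sorted((w for w in embeddings if w != word),
                       key=lambda w: cosine(embeddings[w], v),
                       reverse=True)[:top_k]
    return min(neighbors, key=lambda w: polarity[w])

vecs = {w: np.random.randn(50) for w in ["regime", "government", "administration"]}
pol = {"regime": 0.9, "government": 0.2, "administration": 0.1}
print(depolarized_substitute("regime", vecs, pol))
```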

6. Reinforcement Learning based Collective Entity Alignment with Adaptive Features [PDF]
  Weixin Zeng, Xiang Zhao, Jiuyang Tang, Xuemin Lin, Paul Groth
Abstract: Entity alignment (EA) is the task of identifying the entities that refer to the same real-world object but are located in different knowledge graphs (KGs). For entities to be aligned, existing EA solutions treat them separately and generate alignment results as ranked lists of entities on the other side. Nevertheless, this decision-making paradigm fails to take into account the interdependence among entities. Although some recent efforts mitigate this issue by imposing the 1-to-1 constraint on the alignment process, they still cannot adequately model the underlying interdependence and the results tend to be sub-optimal. To fill in this gap, in this work, we delve into the dynamics of the decision-making process, and offer a reinforcement learning (RL) based model to align entities collectively. Under the RL framework, we devise the coherence and exclusiveness constraints to characterize the interdependence and restrict collective alignment. Additionally, to generate more precise inputs to the RL framework, we employ representative features to capture different aspects of the similarity between entities in heterogeneous KGs, which are integrated by an adaptive feature fusion strategy. Our proposal is evaluated on both cross-lingual and mono-lingual EA benchmarks and compared against state-of-the-art solutions. The empirical results verify its effectiveness and superiority.
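
The exclusiveness constraint amounts to enforcing a 1-to-1 matching during collective alignment. The greedy sketch below illustrates only that constraint; the paper instead learns the decision sequence with reinforcement learning:

```python
import numpy as np

def exclusive_alignment(sim: np.ndarray):
    """Greedy 1-to-1 matching over a source-by-target similarity matrix."""
    pairs, used_src, used_tgt = [], set(), set()
    # Visit candidate pairs from most to least similar.
    for idx in np.argsort(sim, axis=None)[::-1]:
        i, j = divmod(int(idx), sim.shape[1])
        if i not in used_src and j not in used_tgt:
            pairs.append((i, j))
            used_src.add(i)
            used_tgt.add(j)
    return pairs

print(exclusive_alignment(np.array([[0.9, 0.2], [0.8, 0.1]])))  # [(0, 0), (1, 1)]
```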

7. Integration of Domain Knowledge using Medical Knowledge Graph Deep Learning for Cancer Phenotyping [PDF]
  Mohammed Alawad, Shang Gao, Mayanka Chandra Shekar, S.M.Shamimul Hasan, J. Blair Christian, Xiao-Cheng Wu, Eric B. Durbin, Jennifer Doherty, Antoinette Stroup, Linda Coyle, Lynne Penberthy, Georgia Tourassi
Abstract: A key component of deep learning (DL) for natural language processing (NLP) is word embeddings. Word embeddings that effectively capture the meaning and context of the word that they represent can significantly improve the performance of downstream DL models for various NLP tasks. Many existing word embedding techniques capture the context of words based on word co-occurrence in documents and text; however, they often cannot capture broader domain-specific relationships between concepts that may be crucial for the NLP task at hand. In this paper, we propose a method to integrate external knowledge from medical terminology ontologies into the context captured by word embeddings. Specifically, we use a medical knowledge graph, such as the Unified Medical Language System (UMLS), to find connections between clinical terms in cancer pathology reports. This approach aims to minimize the distance between connected clinical concepts. We evaluate the proposed approach using a Multitask Convolutional Neural Network (MT-CNN) to extract six cancer characteristics -- site, subsite, laterality, behavior, histology, and grade -- from a dataset of ~900K cancer pathology reports. The results show that the MT-CNN model which uses our domain-informed embeddings outperforms the same MT-CNN using standard word2vec embeddings across all tasks, improving the overall micro- and macro-F1 scores by 4.97% and 22.5%, respectively.
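
One established way to pull connected clinical concepts closer in embedding space is retrofitting (Faruqui et al., 2015), shown below as a stand-in for the paper's UMLS-informed approach, not its exact method. `edges` maps each term to its KG-connected terms:

```python
import numpy as np

def retrofit(vectors, edges, iterations=10, alpha=1.0, beta=1.0):
    """Nudge each vector toward its knowledge-graph neighbors."""
    new = {w: v.copy() for w, v in vectors.items()}
    for _ in range(iterations):
        for word, nbrs in edges.items():
            nbrs = [n for n in nbrs if n in new]
            if not nbrs:
                continue
            # Closed-form update: stay near the original vector while
            # moving toward the mean of connected concepts.
            nbr_sum = sum(new[n] for n in nbrs)
            new[word] = (alpha * vectors[word] + beta * nbr_sum) \
                        / (alpha + beta * len(nbrs))
    return new

vecs = {"carcinoma": np.random.randn(50), "neoplasm": np.random.randn(50)}
vecs = retrofit(vecs, {"carcinoma": ["neoplasm"], "neoplasm": ["carcinoma"]})
```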

8. Evaluating Empathetic Chatbots in Customer Service Settings [PDF]
  Akshay Agarwal, Shashank Maiya, Sonu Aggarwal
Abstract: Customer service is a setting that calls for empathy in live human agent responses. Recent advances have demonstrated how open-domain chatbots can be trained to demonstrate empathy when responding to live human utterances. We show that a blended-skills chatbot model that responds to customer queries is more likely to resemble an actual human agent's response if it is trained to recognize emotion and exhibit appropriate empathy than a model without such training. For our analysis, we leverage a Twitter customer service dataset containing several million customer<->agent dialog examples in customer service contexts from 20 well-known brands.

9. I-BERT: Integer-only BERT Quantization [PDF]
  Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer
Abstract: Transformer-based models, like BERT and RoBERTa, have achieved state-of-the-art results in many Natural Language Processing tasks. However, their memory footprint, inference latency, and power consumption are prohibitive for many edge processors, and it has been a challenge to deploy these models for edge applications and devices that have resource constraints. While quantization can be a viable solution to this, previous work on quantizing Transformer-based models uses floating-point arithmetic during inference, thus limiting model deployment on many edge processors. In this work, we propose a novel integer-only quantization scheme for Transformer-based models that quantizes the entire inference process. In particular, we demonstrate how to approximate nonlinear operations in Transformer architectures, e.g., GELU, Softmax, and Layer Normalization, with lightweight integer computations. We use those approximations in our method, I-BERT, with an end-to-end integer-only inference, and without any floating point calculation. We test our approach on GLUE downstream tasks using RoBERTa-Base and RoBERTa-Large. For both cases, with an 8-bit integer-only quantization scheme, I-BERT achieves similar accuracy as compared to the full-precision baseline.
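
The building block of integer-only inference is linear quantization with integer accumulation, where floating point survives only in the scale factors. A minimal int8 sketch; I-BERT's distinctive contribution, integer polynomial approximations of GELU, Softmax, and LayerNorm, is omitted here:

```python
import numpy as np

def quantize(x: np.ndarray, num_bits: int = 8):
    """Symmetric linear quantization: x ~= q * scale with int8 q."""
    scale = np.abs(x).max() / (2 ** (num_bits - 1) - 1)
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def int_matmul(qa, sa, qb, sb):
    """Integer-only matmul: accumulate in int32, rescale once at the end."""
    acc = qa.astype(np.int32) @ qb.astype(np.int32)
    return acc, sa * sb

a, b = np.random.randn(4, 8), np.random.randn(8, 4)
(qa, sa), (qb, sb) = quantize(a), quantize(b)
acc, scale = int_matmul(qa, sa, qb, sb)
approx = acc * scale  # close to a @ b
```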

10. Reddit Entity Linking Dataset [PDF]
  Nicholas Botzer, Yifan Ding, Tim Weninger
Abstract: We introduce and make publicly available an entity linking dataset from Reddit that contains 17,316 linked entities, each annotated by three human annotators and then grouped into Gold, Silver, and Bronze to indicate inter-annotator agreement. We analyze the different errors and disagreements made by annotators and suggest three types of corrections to the raw data. Finally, we tested existing entity linking models that are trained and tuned on text from non-social media datasets. We find that, although these existing entity linking models perform very well on their original datasets, they perform poorly on this social media dataset. We also show that the majority of these errors can be attributed to poor performance on the mention detection subtask. These results indicate the need for better entity linking models that can be applied to the enormous amount of social media text.
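
With three annotators per entity, a natural reading of the Gold/Silver/Bronze grouping is by agreement count; the exact thresholds below are an assumption, not taken from the paper:

```python
from collections import Counter

def agreement_tier(labels):
    """Tier an annotation by how many of the three annotators agree."""
    top = Counter(labels).most_common(1)[0][1]
    return {3: "Gold", 2: "Silver"}.get(top, "Bronze")

print(agreement_tier(["Q123", "Q123", "Q123"]))  # Gold: unanimous
print(agreement_tier(["Q123", "Q123", "Q456"]))  # Silver: majority
print(agreement_tier(["Q123", "Q456", "Q789"]))  # Bronze: no agreement
```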

11. Transformers and Transfer Learning for Improving Portuguese Semantic Role Labeling [PDF]
  Sofia Oliveira, Daniel Loureiro, Alípio Jorge
Abstract: Semantic Role Labeling (SRL) is a core Natural Language Processing task. For English, recent methods based on Transformer models have allowed for major improvements over the previous state of the art. However, for low resource languages, and in particular for Portuguese, currently available SRL models are hindered by scarce training data. In this paper, we explore a model architecture with only a pre-trained BERT-based model, a linear layer, softmax and Viterbi decoding. We substantially improve the state-of-the-art performance in Portuguese by over 15 $F_1$ points. Additionally, we improve SRL results in Portuguese corpora by exploiting cross-lingual transfer learning using multilingual pre-trained models (XLM-R), and transfer learning from dependency parsing in Portuguese. We evaluate the various proposed approaches empirically and, as a result, present a heuristic that supports the choice of the most appropriate model considering the available resources.
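
Viterbi decoding on top of the per-token label scores enforces consistent label transitions at inference time. A self-contained sketch, with random scores standing in for the BERT-plus-linear-layer outputs and an assumed learned transition matrix:

```python
import numpy as np

def viterbi(emissions: np.ndarray, transitions: np.ndarray):
    """Best label sequence given per-token log-scores (seq_len, n_labels)
    and label-transition log-scores (n_labels, n_labels)."""
    seq_len, _ = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros(emissions.shape, dtype=int)
    for t in range(1, seq_len):
        total = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    # Follow back-pointers from the best final label.
    path = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):
        path.append(int(backptr[t][path[-1]]))
    return path[::-1]

labels = viterbi(np.random.randn(6, 5), np.random.randn(5, 5))
```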

12. End-to-End Video Question-Answer Generation with Generator-Pretester Network [PDF]
  Hung-Ting Su, Chen-Hsi Chang, Po-Wei Shen, Yu-Siang Wang, Ya-Liang Chang, Yu-Cheng Chang, Pu-Jen Cheng, Winston H. Hsu
Abstract: We study a novel task, Video Question-Answer Generation (VQAG), for the challenging Video Question Answering (Video QA) task in multimedia. Due to expensive data annotation costs, many widely used, large-scale Video QA datasets such as Video-QA, MSVD-QA and MSRVTT-QA are automatically annotated using Caption Question Generation (CapQG), which takes captions as input instead of the video itself. As captions neither fully represent a video nor are they always practically available, it is crucial to generate question-answer pairs from a video via Video Question-Answer Generation (VQAG). Existing video-to-text (V2T) approaches, despite taking a video as the input, generate only a question. In this work, we propose a novel model, the Generator-Pretester Network, which focuses on two components: (1) the Joint Question-Answer Generator (JQAG), which generates a question together with its corresponding answer to allow Video Question "Answering" training; and (2) the Pretester (PT), which verifies a generated question by trying to answer it and checks the pretested answer against both the model's proposed answer and the ground truth answer. We evaluate our system on the only two available large-scale human-annotated Video QA datasets and achieve state-of-the-art question generation performance. Furthermore, using our generated QA pairs alone on the Video QA task, we surpass some supervised baselines. As a pre-training strategy, we outperform both CapQG and transfer learning approaches when employing semi-supervised (20%) or fully supervised learning with annotated data. These experimental results suggest novel perspectives for Video QA training.
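
The Pretester can be read as a round-trip filter: a generated question-answer pair survives only if answering the generated question checks out. A schematic sketch, where `answerer` is a hypothetical callable mapping (video, question) to a predicted answer; the actual network checks against both its proposed answer and the ground truth:

```python
def pretest(qa_pairs, answerer):
    """Keep generated (video, question, answer) triples that a QA model
    answers consistently with the generated answer."""
    return [(v, q, a) for v, q, a in qa_pairs if answerer(v, q) == a]
```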
