Table of Contents
1. Ranking Transfer Languages with Pragmatically-Motivated Features for Multilingual Sentiment Analysis [PDF] Abstract
3. Modeling Graph Structure via Relative Position for Better Text Generation from Knowledge Graphs [PDF] Abstract
4. Weakly-supervised Domain Adaption for Aspect Extraction via Multi-level Interaction Transfer [PDF] Abstract
6. A Hybrid Natural Language Generation System Integrating Rules and Deep Learning Algorithms [PDF] Abstract
8. CUHK at SemEval-2020 Task 4: CommonSense Explanation, Reasoning and Prediction with Multi-task Learning [PDF] Abstract
9. How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation [PDF] Abstract
10. PERL: Pivot-based Domain Adaptation for Pre-trained Deep Contextualized Embedding Models [PDF] Abstract
17. Automatic Validation of Textual Attribute Values in E-commerce Catalog by Learning with Limited Labeled Data [PDF] Abstract
19. To Pretrain or Not to Pretrain: Examining the Benefits of Pretraining on Resource Rich Tasks [PDF] Abstract
25. Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering [PDF] Abstract
Abstracts
1. Ranking Transfer Languages with Pragmatically-Motivated Features for Multilingual Sentiment Analysis [PDF] Back to contents
Jimin Sun, Hwijeen Ahn, Chan Young Park, Yulia Tsvetkov, David R. Mortensen
Abstract: Cross-lingual transfer learning studies how datasets, annotations, and models can be transferred from resource-rich languages to improve language technologies in resource-poor settings. Recent works have shown that we can further benefit from the selection of the best transfer language. In this paper, we propose three pragmatically-motivated features that can help guide the optimal transfer language selection problem for cross-lingual transfer. Specifically, the proposed features operationalize cross-cultural similarities that manifest in various linguistic patterns: language context-level, sharing multi-word expressions, and the use of emotion concepts. Our experimental results show that these features significantly improve the prediction of optimal transfer languages over baselines in sentiment analysis, but are less useful for dependency parsing. Further analyses show that the proposed features indeed capture the intended cross-cultural similarities and align well with existing work in sociolinguistics and linguistic anthropology.
2. Communicative need modulates competition in language change [PDF] Back to contents
Andres Karjus, Richard A. Blythe, Simon Kirby, Kenny Smith
Abstract: All living languages change over time. The causes for this are many, one being the emergence and borrowing of new linguistic elements. Competition between the new elements and older ones with a similar semantic or grammatical function may lead to speakers preferring one of them, and leaving the other to go out of use. We introduce a general method for quantifying competition between linguistic elements in diachronic corpora which does not require language-specific resources other than a sufficiently large corpus. This approach is readily applicable to a wide range of languages and linguistic subsystems. Here, we apply it to lexical data in five corpora differing in language, type, genre, and time span. We find that changes in communicative need are consistently predictive of lexical competition dynamics. Near-synonymous words are more likely to directly compete if they belong to a topic of conversation whose importance to language users is constant over time, possibly leading to the extinction of one of the competing words. By contrast, in topics which are increasing in importance for language users, near-synonymous words tend not to compete directly and can coexist. This suggests that, in addition to direct competition between words, language change can be driven by competition between topics or semantic subspaces.
3. Modeling Graph Structure via Relative Position for Better Text Generation from Knowledge Graphs [PDF] Back to contents
Martin Schmitt, Leonardo F. R. Ribeiro, Philipp Dufter, Iryna Gurevych, Hinrich Schütze
Abstract: We present a novel encoder-decoder architecture for graph-to-text generation based on Transformer, called the Graformer. With our novel graph self-attention, every node in the input graph is taken into account for the encoding of every other node - not only direct neighbors, facilitating the detection of global patterns. For this, the relation between any two nodes is characterized by the length of the shortest path between them, including the special case when there is no such path. The Graformer learns to weigh these node-node relations differently for different attention heads, thus virtually learning differently connected views of the input graph. We evaluate the Graformer on two graph-to-text generation benchmarks, the AGENDA dataset and the WebNLG challenge dataset, where it achieves strong performance while using significantly fewer parameters than other approaches.
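The node-node relation described in this abstract (shortest path length, with a special case for unconnected nodes) can be illustrated with a small sketch. This is not the authors' implementation: the function name `shortest_path_lengths`, the `NO_PATH` sentinel, and the choice to treat edges as undirected are all assumptions made for illustration, and the paper may encode direction or missing paths differently.

```python
from collections import deque

NO_PATH = -1  # hypothetical sentinel for "no path between the two nodes"


def shortest_path_lengths(num_nodes, edges):
    """Return a matrix d where d[i][j] is the number of edges on the
    shortest path from node i to node j, or NO_PATH if j is unreachable."""
    adjacency = [[] for _ in range(num_nodes)]
    for src, dst in edges:
        adjacency[src].append(dst)
        adjacency[dst].append(src)  # assumption: edges treated as undirected

    lengths = [[NO_PATH] * num_nodes for _ in range(num_nodes)]
    for start in range(num_nodes):
        # Breadth-first search from each node gives unweighted distances.
        lengths[start][start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if lengths[start][neighbor] == NO_PATH:
                    lengths[start][neighbor] = lengths[start][node] + 1
                    queue.append(neighbor)
    return lengths


# A chain 0 - 1 - 2 plus an isolated node 3.
example = shortest_path_lengths(4, [(0, 1), (1, 2)])
```

In a model of this kind, each distinct value in such a matrix would typically index a learned bias or embedding per attention head; that step is omitted here because the abstract does not specify it.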
4. Weakly-supervised Domain Adaption for Aspect Extraction via Multi-level Interaction Transfer [PDF] Back to contents
Tao Liang, Wenya Wang, Fengmao Lv
Abstract: Fine-grained aspect extraction is an essential sub-task in aspect-based opinion analysis. It aims to identify the aspect terms (a.k.a. opinion targets) of a product or service in each sentence. However, an expensive annotation process is usually needed to acquire sufficient token-level labels for each domain. To address this limitation, some previous works propose domain adaptation strategies to transfer knowledge from a sufficiently labeled source domain to unlabeled target domains. But due to both the difficulty of fine-grained prediction problems and the large gap between domains, the performance remains unsatisfactory. This work conducts a pioneering study on leveraging sentence-level aspect category labels, which are usually available in commercial services like review sites, to promote token-level transfer for the extraction purpose. Specifically, the aspect category information is used to construct pivot knowledge for transfer, under the assumption that the interactions between sentence-level aspect categories and token-level aspect terms are invariant across domains. To this end, we propose a novel multi-level reconstruction mechanism that aligns both the fine-grained and coarse-grained information at multiple levels of abstraction. Comprehensive experiments demonstrate that our approach can fully utilize sentence-level aspect category labels to improve cross-domain aspect extraction with a large performance gain.
5. FFR v1.1: Fon-French Neural Machine Translation [PDF] Back to contents
Bonaventure F. P. Dossou, Chris C. Emezue
Abstract: All over the world and especially in Africa, researchers are putting efforts into building Neural Machine Translation (NMT) systems to help tackle the language barriers in Africa, a continent of over 2000 different languages. However, the low-resourceness, diacritical, and tonal complexities of African languages are major issues being faced. The FFR project is a major step towards creating a robust translation model from Fon, a very low-resource and tonal language, to French, for research and public use. In this paper, we introduce FFR Dataset, a corpus of Fon-to-French translations, describe the diacritical encoding process, and introduce our FFR v1.1 model, trained on the dataset. The dataset and model are made publicly available at this https URL bonaventuredossou/ffr-v1, to promote collaboration and reproducibility.
6. A Hybrid Natural Language Generation System Integrating Rules and Deep Learning Algorithms [PDF] Back to contents
Wei Wei, Bei Zhou, Georgios Leontidis
Abstract: This paper proposes an enhanced natural language generation system combining the merits of both rule-based approaches and modern deep learning algorithms, boosting its performance to the extent where the generated textual content is capable of exhibiting agile human-writing styles and the content logic of which is highly controllable. We also come up with a novel approach called HMCU to measure the performance of the natural language processing comprehensively and precisely.
7. Results of the seventh edition of the BioASQ Challenge [PDF] Back to contents
Anastasios Nentidis, Konstantinos Bougiatiotis, Anastasia Krithara, Georgios Paliouras
Abstract: The results of the seventh edition of the BioASQ challenge are presented in this paper. The aim of the BioASQ challenge is the promotion of systems and methodologies through the organization of a challenge on the tasks of large-scale biomedical semantic indexing and question answering. In total, 30 teams with more than 100 systems participated in the challenge this year. As in previous years, the best systems were able to outperform the strong baselines. This suggests that state-of-the-art systems are continuously improving, pushing the frontier of research.
8. CUHK at SemEval-2020 Task 4: CommonSense Explanation, Reasoning and Prediction with Multi-task Learning [PDF] Back to contents
Hongru Wang, Xiangru Tang, Sunny Lai, Kwong Sak Leung
Abstract: This paper describes our system submitted to Task 4 of SemEval 2020: Commonsense Validation and Explanation (ComVE), which consists of three sub-tasks. The task is to directly validate whether or not a given sentence makes sense and to require the model to explain it. Based on the BERT architecture with a multi-task setting, we propose an effective and interpretable "Explain, Reason and Predict" (ERP) system to solve the three sub-tasks about commonsense: (a) Validation, (b) Reasoning, and (c) Explanation. Inspired by cognitive studies of common sense, our system first generates a reason or understanding of the sentences and then chooses which statement makes sense, which is achieved by multi-task learning. During the post-evaluation, our system reached 92.9% accuracy in subtask A (rank 11), 89.7% accuracy in subtask B (rank 9), and a BLEU score of 12.9 in subtask C (rank 8).
9. How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation [PDF] Back to contents
Steffen Eger, Johannes Daxenberger, Iryna Gurevych
Abstract: Sentence encoders map sentences to real valued vectors for use in downstream applications. To peek into these representations - e.g., to increase interpretability of their results - probing tasks have been designed which query them for linguistic knowledge. However, designing probing tasks for lesser-resourced languages is tricky, because these often lack large-scale annotated data or (high-quality) dependency parsers as a prerequisite of probing task design in English. To investigate how to probe sentence embeddings in such cases, we investigate sensitivity of probing task results to structural design choices, conducting the first such large scale study. We show that design choices like size of the annotated probing dataset and type of classifier used for evaluation do (sometimes substantially) influence probing outcomes. We then probe embeddings in a multilingual setup with design choices that lie in a 'stable region', as we identify for English, and find that results on English do not transfer to other languages. Fairer and more comprehensive sentence-level probing evaluation should thus be carried out on multiple languages in the future.
10. PERL: Pivot-based Domain Adaptation for Pre-trained Deep Contextualized Embedding Models [PDF] Back to contents
Eyal Ben-David, Carmel Rabinovitz, Roi Reichart
Abstract: Pivot-based neural representation models have led to significant progress in domain adaptation for NLP. However, previous works that follow this approach utilize only labeled data from the source domain and unlabeled data from the source and target domains, but neglect to incorporate massive unlabeled corpora that are not necessarily drawn from these domains. To alleviate this, we propose PERL: A representation learning model that extends contextualized word embedding models such as BERT with pivot-based fine-tuning. PERL outperforms strong baselines across 22 sentiment classification domain adaptation setups, improves in-domain model performance, yields effective reduced-size models and increases model stability.
11. The SPPD System for Schema Guided Dialogue State Tracking Challenge [PDF] Back to contents
Miao Li, Haoqi Xiong, Yunbo Cao
Abstract: This paper introduces one of our group's works on the Dialog System Technology Challenges 8 (DSTC8), the SPPD system for the Schema Guided dialogue state tracking challenge. This challenge, named Track 4 in DSTC8, provides a brand new and challenging dataset for developing scalable multi-domain dialogue state tracking algorithms for real world dialogue systems. We propose a zero-shot dialogue state tracking system for this task. The key components of the system are a number of BERT-based zero-shot NLU models that can effectively capture semantic relations between natural language descriptions of services' schemas and utterances from dialogue turns. We also propose some strategies to help the system better exploit information from longer dialogue history and to overcome the slot carryover problem for multi-domain dialogues. The experimental results show that the proposed system achieves a significant improvement compared with the baseline system.
12. Manipulating emotions for ground truth emotion analysis [PDF] Back to contents
Bennett Kleinberg
Abstract: Text data are being used as a lens through which human cognition can be studied at a large scale. Methods like emotion analysis are now in the standard toolkit of computational social scientists but typically rely on third-person annotation with unknown validity. As an alternative, this paper introduces online emotion induction techniques from experimental behavioural research as a method for text-based emotion analysis. Text data were collected from participants who were randomly allocated to a happy, neutral or sad condition. The findings support the mood induction procedure. We then examined how well lexicon approaches can retrieve the induced emotion. All approaches resulted in statistical differences between the true emotion conditions. Overall, only up to one-third of the variance in emotion was captured by text-based measurements. Pretrained classifiers performed poorly on detecting true emotions. The paper concludes with limitations and suggestions for future research.
13. Causal Knowledge Extraction from Scholarly Papers in Social Sciences [PDF] Back to contents
Victor Zitian Chen, Felipe Montano-Campos, Wlodek Zadrozny
Abstract: The scale and scope of scholarly articles today are overwhelming human researchers who seek to digest and synthesize knowledge in a timely manner. In this paper, we seek to develop natural language processing (NLP) models to accelerate the extraction of relationships from scholarly papers in social sciences, identify hypotheses from these papers, and extract the cause-and-effect entities. Specifically, we develop models to 1) classify sentences in scholarly documents in business and management as hypotheses (hypothesis classification), 2) classify these hypotheses as causal relationships or not (causality classification), and, if they are causal, 3) extract the cause and effect entities from these hypotheses (entity extraction). We have achieved high performance for all three tasks using different modeling techniques. Our approach may be generalizable to scholarly documents in a wide range of social sciences, as well as other types of textual materials.
14. Scalable Cross Lingual Pivots to Model Pronoun Gender for Translation [PDF] Back to contents
Kellie Webster, Emily Pitler
Abstract: Machine translation systems with inadequate document understanding can make errors when translating dropped or neutral pronouns into languages with gendered pronouns (e.g., English). Predicting the underlying gender of these pronouns is difficult since it is not marked textually and must instead be inferred from coreferent mentions in the context. We propose a novel cross-lingual pivoting technique for automatically producing high-quality gender labels, and show that this data can be used to fine-tune a BERT classifier with 92% F1 for Spanish dropped feminine pronouns, compared with 30-51% for neural machine translation models and 54-71% for a non-fine-tuned BERT model. We augment a neural machine translation model with labels from our classifier to improve pronoun translation, while still having parallelizable translation models that translate a sentence at a time.
15. End-to-End Code Switching Language Models for Automatic Speech Recognition [PDF] Back to contents
Ahan M. R., Shreyas Sunil Kulkarni
Abstract: In this paper, we work on code-switched text, one of the most common occurrences in bilingual communities across the world. Due to the discrepancies in the extraction of code-switched text from an Automated Speech Recognition (ASR) module, and thereby in extracting the monolingual text from the code-switched text, we propose an approach for extracting monolingual text using deep bi-directional Language Models (LM) such as BERT and other Machine Translation models, and also explore different ways of extracting code-switched text from the ASR model. We also explain the robustness of the model by comparing the results of perplexity and other metrics, such as WER, to the standard bilingual text output without any external information.
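Note: the abstract names WER (word error rate) as one of its metrics without defining it. As a reminder of the standard definition, here is a minimal sketch that computes word-level edit distance divided by the reference length. The function name and the whitespace tokenization are assumptions made for illustration; the paper's own evaluation setup is not described in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.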
16. On the use of human reference data for evaluating automatic image descriptions [PDF] Back to contents
Emiel van Miltenburg
Abstract: Automatic image description systems are commonly trained and evaluated using crowdsourced, human-generated image descriptions. The best-performing system is then determined using some measure of similarity to the reference data (BLEU, Meteor, CIDER, etc.). Thus, both the quality of the systems and the quality of the evaluation depend on the quality of the descriptions. As Section 2 will show, the quality of current image description datasets is insufficient. I argue that there is a need for more detailed guidelines that take into account the needs of visually impaired users, but also the feasibility of generating suitable descriptions. With high-quality data, evaluation of image description systems could use reference descriptions, but we should also look for alternatives.
17. Automatic Validation of Textual Attribute Values in E-commerce Catalog by Learning with Limited Labeled Data [PDF] Back to contents
Yaqing Wang, Yifan Ethan Xu, Xian Li, Xin Luna Dong, Jing Gao
Abstract: Product catalogs are valuable resources for eCommerce websites. In the catalog, a product is associated with multiple attributes whose values are short texts, such as product name, brand, functionality and flavor. Usually individual retailers self-report these key values, and thus the catalog information unavoidably contains noisy facts. Although existing deep neural network models have shown success in conducting cross-checking between two pieces of text, their success has to be dependent upon a large set of quality labeled data, which are hard to obtain in this validation task: products span a variety of categories. To address the aforementioned challenges, we propose a novel meta-learning latent variable approach, called MetaBridge, which can learn transferable knowledge from a subset of categories with limited labeled data and capture the uncertainty of never-seen categories with unlabeled data. More specifically, we make the following contributions. (1) We formalize the problem of validating the textual attribute values of products from a variety of categories as a natural language inference task in the few-shot learning setting, and propose a meta-learning latent variable model to jointly process the signals obtained from product profiles and textual attribute values. (2) We propose to integrate meta learning and latent variables in a unified model to effectively capture the uncertainty of various categories. (3) We propose a novel objective function based on a latent variable model in the few-shot learning setting, which ensures distribution consistency between unlabeled and labeled data and prevents overfitting by sampling from the learned distribution. Extensive experiments on real eCommerce datasets from hundreds of categories demonstrate the effectiveness of MetaBridge on textual attribute validation and its outstanding performance compared with state-of-the-art approaches.
18. DynE: Dynamic Ensemble Decoding for Multi-Document Summarization [PDF] Back to contents
Chris Hokamp, Demian Gholipour Ghalandari, Nghia The Pham, John Glover
Abstract: Sequence-to-sequence (s2s) models are the basis for extensive work in natural language processing. However, some applications, such as multi-document summarization, multi-modal machine translation, and the automatic post-editing of machine translation, require mapping a set of multiple distinct inputs into a single output sequence. Recent work has introduced bespoke architectures for these multi-input settings, and developed models which can handle increasingly longer inputs; however, the performance of special model architectures is limited by the available in-domain training data. In this work we propose a simple decoding methodology which ensembles the output of multiple instances of the same model on different inputs. Our proposed approach allows models trained for vanilla s2s tasks to be directly used in multi-input settings. This works particularly well when each of the inputs has significant overlap with the others, as when compressing a cluster of news articles about the same event into a single coherent summary, and we obtain state-of-the-art results on several multi-document summarization datasets.
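Note: the idea of ensembling the outputs of several instances of one model, each reading a different input, can be illustrated with a toy decoder. The sketch below is one plausible reading of the abstract, not the authors' implementation. The combination rule (arithmetic mean of next-token probabilities), the greedy search, and the names `ensemble_greedy_decode`, `toy_step` and `StepFn` are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence

Distribution = Dict[str, float]
# Hypothetical model interface: (one input document, shared output prefix) -> next-token probabilities
StepFn = Callable[[str, Sequence[str]], Distribution]


def ensemble_greedy_decode(step_fn: StepFn, inputs: Sequence[str],
                           max_len: int = 20, eos: str = "</s>") -> List[str]:
    """Greedy decoding where every step averages the predictions made for each input."""
    if not inputs:
        raise ValueError("need at least one input")
    prefix: List[str] = []
    for _ in range(max_len):
        # One distribution per input; all instances condition on the same output prefix.
        dists = [step_fn(doc, prefix) for doc in inputs]
        vocab = sorted(set().union(*dists))  # sorted so ties break deterministically
        # Assumed combination rule: arithmetic mean of the per-input probabilities.
        best = max(vocab, key=lambda tok: sum(d.get(tok, 0.0) for d in dists) / len(dists))
        if best == eos:
            break
        prefix.append(best)
    return prefix


def toy_step(doc: str, prefix: Sequence[str]) -> Distribution:
    """Stand-in 'model' that tries to copy its input word by word."""
    words = doc.split()
    target = words[len(prefix)] if len(prefix) < len(words) else "</s>"
    return {target: 0.9, "<unk>": 0.1}


summary = ensemble_greedy_decode(
    toy_step, ["storm hits coast", "storm hits town", "storm hits coast"])
```

In this toy run the three inputs agree on the first two tokens, and the third token is decided by majority because two of the three distributions favor "coast".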
19. To Pretrain or Not to Pretrain: Examining the Benefits of Pretraining on Resource Rich Tasks [PDF] Back to contents
Sinong Wang, Madian Khabsa, Hao Ma
Abstract: Pretraining NLP models with variants of Masked Language Model (MLM) objectives has recently led to significant improvements on many tasks. This paper examines the benefits of pretrained models as a function of the number of training samples used in the downstream task. On several text classification tasks, we show that as the number of training examples grows into the millions, the accuracy gap between finetuning a BERT-based model and training a vanilla LSTM from scratch narrows to within 1%. Our findings indicate that MLM-based models might reach a diminishing return point as the supervised data size increases significantly.
20. "Notic My Speech" -- Blending Speech Patterns With Multimedia [PDF] Back to contents
Dhruva Sahrawat, Yaman Kumar, Shashwat Aggarwal, Yifang Yin, Rajiv Ratn Shah, Roger Zimmermann
Abstract: Speech as a natural signal is composed of three parts - visemes (visual part of speech), phonemes (spoken part of speech), and language (the imposed structure). However, video as a medium for the delivery of speech and a multimedia construct has mostly ignored the cognitive aspects of speech delivery. For example, video applications like transcoding and compression have so far ignored how speech is delivered and heard. To close the gap between speech understanding and multimedia video applications, in this paper, we show the initial experiments by modelling the perception of visual speech and showing its use case on video compression. On the other hand, in the visual speech recognition domain, existing studies have mostly modeled it as a classification problem, while ignoring the correlations between views, phonemes, visemes, and speech perception. This results in solutions which are further away from how human perception works. To bridge this gap, we propose a view-temporal attention mechanism to model both the view dependence and the visemic importance in speech recognition and understanding. We conduct experiments on three public visual speech recognition datasets. The experimental results show that our proposed method outperformed the existing work by 4.99% in terms of the viseme error rate. Moreover, we show that there is a strong correlation between our model's understanding of multi-view speech and the human perception. This characteristic benefits downstream applications such as video compression and streaming where a significant number of less important frames can be compressed or eliminated while being able to maximally preserve human speech understanding with good user experience.
21. On the Computational Power of Transformers and Its Implications in Sequence Modeling [PDF] Back to contents
Satwik Bhattamishra, Arkil Patel, Navin Goyal
Abstract: Transformers are being used extensively across several sequence modeling tasks. Significant research effort has been devoted to experimentally probing the inner workings of Transformers. However, our conceptual and theoretical understanding of their power and inherent limitations is still nascent. In particular, the roles of various components in Transformers such as positional encodings, attention heads, residual connections, and feedforward networks, are not clear. In this paper, we take a step towards answering these questions. We analyze the computational power as captured by Turing-completeness. We first provide an alternate proof to show that vanilla Transformers are Turing-complete and then we prove that Transformers with positional masking and without any positional encoding are also Turing-complete. We further analyze the necessity of each component for the Turing-completeness of the network; interestingly, we find that a particular type of residual connection is necessary. We demonstrate the practical implications of our results via experiments on machine translation and synthetic tasks.
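Note: to make "positional masking and without any positional encoding" concrete, the sketch below shows single-head scaled dot-product self-attention in which position i may only attend to positions 0..i, with no position information added to the inputs. Reading "positional masking" as this causal mask is an assumption, and the function name, the pure-Python list representation and the omission of learned projections are simplifications for illustration, not the construction used in the paper's proofs.

```python
import math
from typing import List

Vector = List[float]


def causal_self_attention(queries: List[Vector], keys: List[Vector],
                          values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention where position i sees only positions 0..i."""
    dim = len(queries[0])
    outputs: List[Vector] = []
    for i, query in enumerate(queries):
        # The mask is implemented by simply never scoring positions j > i.
        scores = [sum(q * k for q, k in zip(query, keys[j])) / math.sqrt(dim)
                  for j in range(i + 1)]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * values[j][c] for j, w in enumerate(weights)) / total
            for c in range(len(values[0]))
        ])
    return outputs
```

With all-zero queries the attention weights are uniform, so output i is the running mean of the first i + 1 value vectors. This shows that the mask alone makes the output depend on position even though no positional encoding is used.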
22. Modelling High-Level Mathematical Reasoning in Mechanised Declarative Proofs [PDF] Back to contents
Wenda Li, Lei Yu, Yuhuai Wu, Lawrence C. Paulson
Abstract: Mathematical proofs can be mechanised using proof assistants to eliminate gaps and errors. However, mechanisation still requires intensive labour. To promote automation, it is essential to capture high-level human mathematical reasoning, which we address as the problem of generating suitable propositions. We build a non-synthetic dataset from the largest repository of mechanised proofs and propose a task on causal reasoning, where a model is required to fill in a missing intermediate proposition given a causal context. Our experiments (using various neural sequence-to-sequence models) reveal that while the task is challenging, neural models can indeed capture non-trivial mathematical reasoning. We further propose a hierarchical transformer model that outperforms the transformer baseline.
23. Towards Automated Assessment of Stuttering and Stuttering Therapy [PDF] Back to contents
Sebastian P. Bayerl, Florian Hönig, Joelle Reister, Korbinian Riedhammer
Abstract: Stuttering is a complex speech disorder that can be identified by repetitions, prolongations of sounds, syllables or words, and blocks while speaking. Severity assessment is usually done by a speech therapist. While attempts at automated assessment have been made, it is rarely used in therapy. Common methods for the assessment of stuttering severity include percent stuttered syllables (% SS), the average of the three longest stuttering symptoms during a speech task, or the recently introduced Speech Efficiency Score (SES). This paper introduces the Speech Control Index (SCI), a new method to evaluate the severity of stuttering. Unlike SES, it can also be used to assess therapy success for fluency shaping. We evaluate both SES and SCI on a new comprehensively labeled dataset containing stuttered German speech of clients prior to, during, and after undergoing stuttering therapy. Phone alignments of an automatic speech recognition system are statistically evaluated in relation to their relative position to labeled stuttering events. The results indicate that phone length distributions differ with respect to their position in and around labeled stuttering events.
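Note: two of the established severity measures listed in the abstract are simple enough to write down. The sketch below assumes that % SS is the share of stuttered syllables among all spoken syllables, and that symptom lengths are given as durations in seconds. Both function names and the handling of fewer than three symptoms are assumptions. SES and SCI are not sketched because the abstract does not define them.

```python
from typing import Sequence


def percent_stuttered_syllables(stuttered: int, total: int) -> float:
    """% SS: stuttered syllables as a percentage of all syllables spoken."""
    if total <= 0:
        raise ValueError("total syllable count must be positive")
    if not 0 <= stuttered <= total:
        raise ValueError("stuttered count must lie between 0 and total")
    return 100.0 * stuttered / total


def mean_three_longest(durations: Sequence[float]) -> float:
    """Average duration of the three longest stuttering symptoms.

    Assumption: if fewer than three symptoms were recorded, average those available.
    """
    if not durations:
        raise ValueError("need at least one symptom duration")
    longest = sorted(durations, reverse=True)[:3]
    return sum(longest) / len(longest)
```

For example, 6 stuttered syllables in a 200-syllable sample give a % SS of 3.0.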
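The two conventional severity measures named in the abstract above can be sketched in a few lines. This is a minimal illustration, assuming the usual reading of %SS as 100 times stuttered syllables over total syllables; the function names and sample counts are hypothetical and are not taken from the paper. It does not implement SES or SCI, whose definitions the abstract does not give.

```python
from statistics import mean
from typing import Sequence


def percent_stuttered_syllables(stuttered: int, total: int) -> float:
    """Return %SS = 100 * stuttered syllables / total syllables."""
    if total <= 0:
        raise ValueError("total syllable count must be positive")
    if not 0 <= stuttered <= total:
        raise ValueError("stuttered count must lie between 0 and total")
    return 100.0 * stuttered / total


def mean_of_three_longest(durations_s: Sequence[float]) -> float:
    """Average duration (seconds) of the three longest stuttering events.

    Assumption: with fewer than three events, average the ones available.
    """
    if not durations_s:
        raise ValueError("need at least one event duration")
    longest = sorted(durations_s, reverse=True)[:3]
    return mean(longest)


# Hypothetical counts and durations for one speech sample.
print(percent_stuttered_syllables(12, 300))         # 4.0
print(mean_of_three_longest([0.5, 2.0, 1.0, 3.0]))  # 2.0
```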
24. AVLnet: Learning Audio-Visual Language Representations from Instructional Videos [PDF]
Andrew Rouditchenko, Angie Boggust, David Harwath, Dhiraj Joshi, Samuel Thomas, Kartik Audhkhasi, Rogerio Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba, James Glass
Abstract: Current methods for learning visually grounded language from videos often rely on time-consuming and expensive data collection, such as human-annotated textual summaries or machine-generated automatic speech recognition transcripts. In this work, we introduce Audio-Video Language Network (AVLnet), a self-supervised network that learns a shared audio-visual embedding space directly from raw video inputs. We circumvent the need for annotation and instead learn audio-visual language representations directly from randomly segmented video clips and their raw audio waveforms. We train AVLnet on publicly available instructional videos and evaluate our model on video clip and language retrieval tasks on three video datasets. Our proposed model outperforms several state-of-the-art text-video baselines by up to 11.8% in a video clip retrieval task, despite operating on the raw audio instead of manually annotated text captions. Further, we show AVLnet is capable of integrating textual information, increasing its modularity and improving performance by up to 20.3% on the video clip retrieval task. Finally, we analyze AVLnet's learned representations, showing our model has learned to relate visual objects to salient words and natural sounds.
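Retrieval in a shared embedding space, as described in the abstract above, is commonly scored by ranking candidates by similarity to a query. The sketch below is a generic illustration, not AVLnet's code: the toy 2-D vectors, the cosine similarity, and the recall@k metric are all assumptions, since the abstract does not name its similarity function or metric.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def recall_at_k(queries: List[Sequence[float]],
                candidates: List[Sequence[float]],
                k: int) -> float:
    """Fraction of queries whose paired candidate (same index) ranks in the top k."""
    hits = 0
    for i, query in enumerate(queries):
        ranked = sorted(range(len(candidates)),
                        key=lambda j: cosine(query, candidates[j]),
                        reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(queries)


# Hypothetical embeddings: audio_embs[i] is paired with video_embs[i].
audio_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
video_embs = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.1]]

for k in (1, 2, 3):
    print(k, recall_at_k(audio_embs, video_embs, k))
```

In this toy setup only the second query retrieves its paired clip at rank 1, so recall rises from 1/3 at k=1 to 1.0 at k=3.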
25. Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering [PDF]
Zihao Zhu, Jing Yu, Yujing Wang, Yajing Sun, Yue Hu, Qi Wu
Abstract: Fact-based Visual Question Answering (FVQA) requires external knowledge beyond visible content to answer questions about an image, which is challenging but indispensable for achieving general VQA. One limitation of existing FVQA solutions is that they jointly embed all kinds of information without fine-grained selection, which introduces unexpected noise into reasoning about the final answer. How to capture the question-oriented and information-complementary evidence remains a key challenge in solving the problem. In this paper, we depict an image as a multi-modal heterogeneous graph, which contains multiple layers of information corresponding to the visual, semantic and factual features. On top of the multi-layer graph representations, we propose a modality-aware heterogeneous graph convolutional network to capture evidence from different layers that is most relevant to the given question. Specifically, the intra-modal graph convolution selects evidence from each modality and the cross-modal graph convolution aggregates relevant information across different modalities. By stacking this process multiple times, our model performs iterative reasoning and predicts the optimal answer by analyzing all question-oriented evidence. We achieve a new state-of-the-art performance on the FVQA task and demonstrate the effectiveness and interpretability of our model with extensive experiments. The code is available at this https URL.
26. Generative Semantic Hashing Enhanced via Boltzmann Machines [PDF]
Lin Zheng, Qinliang Su, Dinghan Shen, Changyou Chen
Abstract: Generative semantic hashing is a promising technique for large-scale information retrieval thanks to its fast retrieval speed and small memory footprint. For the tractability of training, existing generative-hashing methods mostly assume a factorized form for the posterior distribution, enforcing independence among the bits of hash codes. From the perspectives of both model representation and code space size, independence is not always the best assumption. In this paper, to introduce correlations among the bits of hash codes, we propose to employ the distribution of a Boltzmann machine as the variational posterior. To address the intractability issue of training, we first develop an approximate method to reparameterize the distribution of a Boltzmann machine by augmenting it as a hierarchical concatenation of a Gaussian-like distribution and a Bernoulli distribution. Based on that, an asymptotically-exact lower bound is further derived for the evidence lower bound (ELBO). With these novel techniques, the entire model can be optimized efficiently. Extensive experimental results demonstrate that by effectively modeling correlations among different bits within a hash code, our model can achieve significant performance gains.
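The retrieval speed and memory benefits mentioned in the abstract above come from comparing short binary codes by Hamming distance. The sketch below illustrates only that lookup step, with hypothetical 8-bit codes and document names; it does not reproduce the paper's Boltzmann-machine posterior or its training procedure.

```python
from typing import Dict, List


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two codes stored as integers."""
    return bin(a ^ b).count("1")


def nearest(query: int, codes: Dict[str, int], k: int) -> List[str]:
    """Return the k document names whose codes are closest to the query."""
    return sorted(codes, key=lambda name: hamming(query, codes[name]))[:k]


# Hypothetical 8-bit hash codes for three documents.
codes = {
    "doc_a": 0b10110010,
    "doc_b": 0b10110011,
    "doc_c": 0b01001101,
}
query_code = 0b10110110

print(nearest(query_code, codes, 2))  # ['doc_a', 'doc_b']
```

Each comparison is one XOR plus a bit count, which is why retrieval over binary codes stays cheap even for large collections.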
Note: The Chinese text is a machine translation!