
【NLP】 Multi-Source Papers, 2014-2020: A Curated List

Contents

1. Multi-Source Domain Adaptation for Visual Sentiment Classification, AAAI 2020 [PDF] Abstract
2. Adversarial Training Based Multi-Source Unsupervised Domain Adaptation for Sentiment Analysis, AAAI 2020 [PDF] Abstract
3. Multi-Source Domain Adaptation for Text Classification via DistanceNet-Bandits, AAAI 2020 [PDF] Abstract
4. Multi-Source Distilling Domain Adaptation, AAAI 2020 [PDF] Abstract
5. Learning to Contextually Aggregate Multi-Source Supervision for Sequence Labeling, ACL 2020 [PDF] Abstract
6. Single-/Multi-Source Cross-Lingual NER via Teacher-Student Learning on Unlabeled Data in Target Language, ACL 2020 [PDF] Abstract
7. Multi-source Meta Transfer for Low Resource Multiple-Choice Question Answering, ACL 2020 [PDF] Abstract
8. Transformer Based Multi-Source Domain Adaptation, EMNLP 2020 [PDF] Abstract
9. Denoising Multi-Source Weak Supervision for Neural Text Classification, EMNLP 2020 [PDF] Abstract
10. MULTIPOLAR: Multi-Source Policy Aggregation for Transfer Reinforcement Learning between Diverse Environmental Dynamics, IJCAI 2020 [PDF] Abstract
11. An Iterative Multi-Source Mutual Knowledge Transfer Framework for Machine Reading Comprehension, IJCAI 2020 [PDF] Abstract
12. Multi-Source Neural Variational Inference, AAAI 2019 [PDF] Abstract
13. Multi-Source Cross-Lingual Model Transfer: Learning What to Share, ACL 2019 [PDF] Abstract
14. DoubleTransfer at MEDIQA 2019: Multi-Source Transfer Learning for Natural Language Understanding in the Medical Domain, ACL 2019 [PDF] Abstract
15. Multi-Source Transformer for Kazakh-Russian-English Neural Machine Translation, ACL 2019 [PDF] Abstract
16. Transformer-based Automatic Post-Editing Model with Joint Encoder and Multi-source Attention of Decoder, ACL 2019 [PDF] Abstract
17. Generating a Common Question from Multiple Documents using Multi-source Encoder-Decoder Models, EMNLP 2019 [PDF] Abstract
18. Medical Concept Representation Learning from Multi-source Data, IJCAI 2019 [PDF] Abstract
19. Learning New Tricks From Old Dogs: Multi-Source Transfer Learning From Pre-Trained Networks, NeurIPS 2019 [PDF] Abstract
20. Multi-source Domain Adaptation for Semantic Segmentation, NeurIPS 2019 [PDF] Abstract
21. Multi-Source Neural Machine Translation with Missing Data, ACL 2018 [PDF] Abstract
22. Multi-Source Multi-Class Fake News Detection, COLING 2018 [PDF] Abstract
23. Automatic Curation and Visualization of Crime Related Information from Incrementally Crawled Multi-source News Reports, COLING 2018 [PDF] Abstract
24. Multi-Source Syntactic Neural Machine Translation, EMNLP 2018 [PDF] Abstract
25. Multi-Source Domain Adaptation with Mixture of Experts, EMNLP 2018 [PDF] Abstract
26. Multi-source synthetic treebank creation for improved cross-lingual dependency parsing, EMNLP 2018 [PDF] Abstract
27. Low-resource named entity recognition via multi-source projection: Not quite there yet?, EMNLP 2018 [PDF] Abstract
28. Input Combination Strategies for Multi-Source Transformer Decoder, EMNLP 2018 [PDF] Abstract
29. Neural Machine Translation with the Transformer and Multi-Source Romance Languages for the Biomedical WMT 2018 task, EMNLP 2018 [PDF] Abstract
30. A Transformer-Based Multi-Source Automatic Post-Editing System, EMNLP 2018 [PDF] Abstract
31. Multi-source transformer with combined losses for automatic post editing, EMNLP 2018 [PDF] Abstract
32. Recommendation with Multi-Source Heterogeneous Information, IJCAI 2018 [PDF] Abstract
33. From Shared Subspaces to Shared Landmarks: A Robust Multi-Source Classification Approach, AAAI 2017 [PDF] Abstract
34. Attention Strategies for Multi-Source Sequence-to-Sequence Learning, ACL 2017 [PDF] Abstract
35. Source-Target Similarity Modelings for Multi-Source Transfer Gaussian Process Regression, ICML 2017 [PDF] Abstract
36. A Representation Learning Framework for Multi-Source Transfer Parsing, AAAI 2016 [PDF] Abstract
37. Ensemble Learning for Multi-Source Neural Machine Translation, COLING 2016 [PDF] Abstract
38. Multi-Source Iterative Adaptation for Cross-Domain Classification, IJCAI 2016 [PDF] Abstract
39. Multi-Source Neural Translation, NAACL 2016 [PDF] Abstract
40. Multi-Source Domain Adaptation: A Causal View, AAAI 2015 [PDF] Abstract
41. Interest Inference via Structure-Constrained Multi-Source Multi-Task Learning, IJCAI 2015 [PDF] Abstract

Abstracts

1. Multi-Source Domain Adaptation for Visual Sentiment Classification [PDF] Back to Contents
  AAAI 2020. AAAI Technical Track: Humans and AI
  Chuang Lin, Sicheng Zhao, Lei Meng, Tat-Seng Chua
Existing domain adaptation methods for visual sentiment classification are typically investigated under the single-source scenario, where knowledge learned from a source domain with sufficient labeled data is transferred to a target domain with loosely labeled or unlabeled data. In practice, however, data from a single source domain usually have limited volume and can hardly cover the characteristics of the target domain. In this paper, we propose a novel multi-source domain adaptation (MDA) method, termed Multi-source Sentiment Generative Adversarial Network (MSGAN), for visual sentiment classification. To handle data from multiple source domains, MSGAN learns to find a unified sentiment latent space in which data from both the source and target domains share a similar distribution. This is achieved via cycle-consistent adversarial learning in an end-to-end manner. Extensive experiments on four benchmark datasets demonstrate that MSGAN significantly outperforms state-of-the-art MDA approaches for visual sentiment classification.

2. Adversarial Training Based Multi-Source Unsupervised Domain Adaptation for Sentiment Analysis [PDF] Back to Contents
  AAAI 2020. AAAI Technical Track: Natural Language Processing
  Yong Dai, Jian Liu, Xiancong Ren, Zenglin Xu
Multi-source unsupervised domain adaptation (MS-UDA) for sentiment analysis (SA) aims to leverage useful information in multiple source domains to help perform SA in an unlabeled target domain that has no supervised information. Existing MS-UDA algorithms either exploit only the shared features, i.e., the domain-invariant information, or rely on weak assumptions in NLP, e.g., the smoothness assumption. To avoid these problems, we propose two transfer learning frameworks based on the multi-source domain adaptation methodology for SA, which combine the source hypotheses to derive a good target hypothesis. The key feature of the first framework is a novel Weighting Scheme based Unsupervised Domain Adaptation framework (WS-UDA), which combines the source classifiers to acquire pseudo labels for target instances directly. The second framework is a Two-Stage Training based Unsupervised Domain Adaptation framework (2ST-UDA), which further exploits these pseudo labels to train a target private extractor. Importantly, the weights assigned to each source classifier are based on the relations between target instances and source domains, which are measured by a discriminator through adversarial training. Furthermore, through the same discriminator, we also achieve the separation of shared features and private features. Experimental results on two SA datasets demonstrate the promising performance of our frameworks, which outperform unsupervised state-of-the-art competitors.
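
As a rough illustration of the weighting-scheme idea in WS-UDA (a minimal sketch under assumptions, not the authors' code): discriminator scores relating a target instance to each source domain are softmax-normalized into weights, and the source classifiers' probability outputs are combined accordingly into a pseudo label. All names and shapes below are illustrative.

```python
import numpy as np

def pseudo_label(x_feat, source_clfs, domain_scores):
    """Weighted vote of source classifiers for one target instance.

    x_feat        : target-instance features
    source_clfs   : list of callables mapping features -> class probabilities
    domain_scores : per-source discriminator scores (higher = target instance
                    looks more like that source domain)
    """
    w = np.exp(domain_scores - domain_scores.max())          # stable softmax
    w = w / w.sum()
    probs = np.stack([clf(x_feat) for clf in source_clfs])   # (S, C)
    combined = (w[:, None] * probs).sum(axis=0)              # (C,)
    return int(combined.argmax()), combined

# Toy usage with two "classifiers" over three classes.
clf_a = lambda x: np.array([0.7, 0.2, 0.1])
clf_b = lambda x: np.array([0.1, 0.6, 0.3])
label, dist = pseudo_label(np.zeros(5), [clf_a, clf_b], np.array([2.0, 0.5]))
```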

3. Multi-Source Domain Adaptation for Text Classification via DistanceNet-Bandits [PDF] Back to Contents
  AAAI 2020. AAAI Technical Track: Natural Language Processing
  Han Guo, Ramakanth Pasunuru, Mohit Bansal
Domain adaptation performance of a learning algorithm on a target domain is a function of its source-domain error and a divergence measure between the data distributions of the two domains. We present a study of various distance-based measures in the context of NLP tasks that characterize the dissimilarity between domains based on sample estimates. We first conduct analysis experiments to show which of these distance measures can best differentiate samples from same versus different domains, and which are correlated with empirical results. Next, we develop a DistanceNet model that uses these distance measures, or a mixture of them, as an additional loss function to be minimized jointly with the task's loss function, so as to achieve better unsupervised domain adaptation. Finally, we extend this model to a novel DistanceNet-Bandit model, which employs a multi-armed bandit controller to dynamically switch between multiple source domains and allows the model to learn an optimal trajectory and mixture of domains for transfer to the low-resource target domain. We conduct experiments on popular sentiment analysis datasets with several diverse domains and show that our DistanceNet model, as well as its dynamic bandit variant, can outperform competitive baselines in the context of unsupervised domain adaptation.
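
A minimal sketch of the kind of joint objective DistanceNet describes, assuming a feature `encoder` and a task classifier `clf` as given modules; the distance term here is a simple mean-embedding (linear MMD-style) estimate, whereas the paper studies a whole family of distance measures:

```python
import torch
import torch.nn.functional as F

def linear_mmd(fs, ft):
    # Squared distance between mean embeddings: one simple, differentiable
    # domain-divergence estimate among the many the paper compares.
    return (fs.mean(dim=0) - ft.mean(dim=0)).pow(2).sum()

def joint_loss(encoder, clf, x_src, y_src, x_tgt, lam=0.1):
    fs, ft = encoder(x_src), encoder(x_tgt)
    task = F.cross_entropy(clf(fs), y_src)   # supervised loss on the source
    return task + lam * linear_mmd(fs, ft)   # plus the domain-distance term
```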

4. Multi-Source Distilling Domain Adaptation [PDF] Back to Contents
  AAAI 2020. AAAI Technical Track: Vision
  Sicheng Zhao, Guangzhi Wang, Shanghang Zhang, Yang Gu, Yaxian Li, Zhichao Song, Pengfei Xu, Runbo Hu, Hua Chai, Kurt Keutzer
Deep neural networks suffer from performance decay when there is domain shift between the labeled source domain and the unlabeled target domain, which motivates research on domain adaptation (DA). Conventional DA methods usually assume that the labeled data is sampled from a single source distribution. In practice, however, labeled data may be collected from multiple sources, and naive application of single-source DA algorithms may lead to suboptimal solutions. In this paper, we propose a novel multi-source distilling domain adaptation (MDDA) network, which not only considers the different distances between the multiple sources and the target, but also investigates the different similarities of the source samples to the target ones. Specifically, the proposed MDDA includes four stages: (1) pre-train the source classifiers separately using the training data from each source; (2) adversarially map the target into the feature space of each source respectively by minimizing the empirical Wasserstein distance between source and target; (3) select the source training samples that are closer to the target to fine-tune the source classifiers; and (4) classify each encoded target feature with the corresponding source classifier, and aggregate the different predictions using the respective domain weights, which correspond to the discrepancy between each source and the target. Extensive experiments are conducted on public DA benchmarks, and the results demonstrate that the proposed MDDA significantly outperforms the state-of-the-art approaches. Our source code is released at: https://github.com/daoyuan98/MDDA.
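
The final aggregation stage (4) might look roughly like the sketch below; the exponential-of-negative-discrepancy weighting is an assumption chosen for illustration, not the paper's exact weight definition:

```python
import numpy as np

def aggregate_predictions(per_source_probs, discrepancies):
    """per_source_probs: (S, C) class probabilities, one row per source
    classifier applied to its own encoding of the target sample.
    discrepancies: (S,) estimated source-target Wasserstein distances."""
    w = np.exp(-np.asarray(discrepancies))   # closer source -> larger weight
    w = w / w.sum()
    return (w[:, None] * per_source_probs).sum(axis=0)
```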

5. Learning to Contextually Aggregate Multi-Source Supervision for Sequence Labeling [PDF] Back to Contents
  ACL 2020.
  Ouyu Lan, Xiao Huang, Bill Yuchen Lin, He Jiang, Liyuan Liu, Xiang Ren
Sequence labeling is a fundamental task for a range of natural language processing problems. When used in practice, its performance is largely influenced by annotation quality and quantity, and meanwhile, obtaining ground-truth labels is often costly. In many cases, ground-truth labels do not exist, but noisy annotations or annotations from different domains are accessible. In this paper, we propose a novel framework, Consensus Network (ConNet), that can be trained on annotations from multiple sources (e.g., crowd annotation, cross-domain data). It learns an individual representation for every source and dynamically aggregates source-specific knowledge with a context-aware attention module. Finally, it yields a model reflecting the agreement (consensus) among the multiple sources. We evaluate the proposed framework in two practical settings of multi-source learning: learning with crowd annotations and unsupervised cross-domain model adaptation. Extensive experimental results show that our model achieves significant improvements over existing methods in both settings. We also demonstrate that the method can be applied to various tasks and cope with different encoders.
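
A minimal sketch of context-aware aggregation in the spirit of ConNet, assuming one hidden representation per supervision source; the module names, sizes, and the way the context vector is produced are all illustrative:

```python
import torch
import torch.nn as nn

class SourceAttention(nn.Module):
    """Aggregates per-source representations with context-aware attention."""
    def __init__(self, hidden, n_sources):
        super().__init__()
        self.score = nn.Linear(hidden, n_sources)  # context -> source scores

    def forward(self, context, source_reps):
        # context: (B, H), e.g. a sentence encoding; source_reps: (B, S, H)
        alpha = torch.softmax(self.score(context), dim=-1)      # (B, S)
        return (alpha.unsqueeze(-1) * source_reps).sum(dim=1)   # (B, H)
```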

6. Single-/Multi-Source Cross-Lingual NER via Teacher-Student Learning on Unlabeled Data in Target Language [PDF] Back to Contents
  ACL 2020.
  Qianhui Wu, Zijia Lin, Börje Karlsson, Jian-Guang Lou, Biqing Huang
To better tackle the named entity recognition (NER) problem on languages with little/no labeled data, cross-lingual NER must effectively leverage knowledge learned from source languages with rich labeled data. Previous works on cross-lingual NER are mostly based on label projection with pairwise texts or direct model transfer. However, such methods either are not applicable if the labeled data in the source languages is unavailable, or do not leverage information contained in unlabeled data in the target language. In this paper, we propose a teacher-student learning method to address such limitations, where NER models in the source languages are used as teachers to train a student model on unlabeled data in the target language. The proposed method works for both single-source and multi-source cross-lingual NER. For the latter, we further propose a similarity measuring method to better weight the supervision from different teacher models. Extensive experiments for 3 target languages on benchmark datasets well demonstrate that our method outperforms existing state-of-the-art methods for both single-source and multi-source cross-lingual NER.
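
A hedged sketch of the multi-teacher distillation step on unlabeled target-language data: the student matches a similarity-weighted average of the teachers' soft predictions. The paper's similarity measure and NER-specific modeling are omitted here:

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits_list, sim_weights):
    """student_logits: (B, C); teacher_logits_list: list of (B, C) tensors;
    sim_weights: per-teacher similarity weights that sum to one."""
    with torch.no_grad():  # teachers are fixed during student training
        soft = sum(w * F.softmax(t, dim=-1)
                   for w, t in zip(sim_weights, teacher_logits_list))
    return F.kl_div(F.log_softmax(student_logits, dim=-1), soft,
                    reduction="batchmean")
```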

7. Multi-source Meta Transfer for Low Resource Multiple-Choice Question Answering [PDF] Back to Contents
  ACL 2020.
  Ming Yan, Hao Zhang, Di Jin, Joey Tianyi Zhou
Multiple-choice question answering (MCQA) is one of the most challenging tasks in machine reading comprehension, since it requires advanced reading comprehension skills such as logical reasoning, summarization, and arithmetic operations. Unfortunately, most existing MCQA datasets are small in size, which increases the difficulty of model learning and generalization. To address this challenge, we propose multi-source meta transfer (MMT) for low-resource MCQA. In this framework, we first extend meta learning by incorporating multiple training sources to learn a generalized feature representation across domains. To bridge the distribution gap between the training sources and the target, we further introduce meta transfer, which can be integrated into the multi-source meta training. More importantly, the proposed MMT is independent of the backbone language model. Extensive experiments demonstrate the superiority of MMT over the state of the art, and continuous improvements can be achieved with different backbone networks in both supervised and unsupervised domain adaptation settings.

8. Transformer Based Multi-Source Domain Adaptation [PDF] Back to Contents
  EMNLP 2020. Long Paper
  Dustin Wright, Isabelle Augenstein
In practical machine learning settings, the data on which a model must make predictions often come from a different distribution than the data it was trained on. Here, we investigate the problem of unsupervised multi-source domain adaptation, where a model is trained on labelled data from multiple source domains and must make predictions on a domain for which no labelled data has been seen. Prior work with CNNs and RNNs has demonstrated the benefit of mixture of experts, where the predictions of multiple domain-expert classifiers are combined, as well as domain adversarial training, which induces a domain-agnostic representation space. Inspired by this, we investigate how such methods can be effectively applied to large pretrained transformer models. We find that domain adversarial training affects the learned representations of these models while having little effect on their performance, suggesting that large transformer-based models are already relatively robust across domains. Additionally, we show that mixture of experts leads to significant performance improvements by comparing several variants of mixing functions, including one novel metric based on attention. Finally, we demonstrate that the predictions of large pretrained transformer-based domain experts are highly homogeneous, making it challenging to learn effective metrics for mixing their predictions.

9. Denoising Multi-Source Weak Supervision for Neural Text Classification [PDF] Back to Contents
  EMNLP 2020. Findings Short Paper
  Wendi Ren, Yinghao Li, Hanting Su, David Kartchner, Cassie Mitchell, Chao Zhang
We study the problem of learning neural text classifiers without using any labeled data, using only easy-to-provide rules as multiple weak supervision sources. This problem is challenging because rule-induced weak labels are often noisy and incomplete. To address these two challenges, we design a label denoiser, which estimates source reliability using a conditional soft attention mechanism and then reduces label noise by aggregating rule-annotated weak labels. The denoised pseudo labels then supervise a neural classifier to predict soft labels for unmatched samples, which addresses the rule coverage issue. We evaluate our model on five benchmarks for sentiment, topic, and relation classification. The results show that our model consistently outperforms state-of-the-art weakly supervised and semi-supervised methods, and achieves comparable performance with fully supervised methods even without any labeled data. Our code can be found at https://github.com/weakrules/Denoise-multi-weak-sources.
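
A simplified sketch of the aggregation idea, assuming instance-independent reliability weights (the paper's conditional soft attention makes them depend on the instance):

```python
import numpy as np

def aggregate_weak_labels(votes, reliability, n_classes):
    """votes: (S,) label index per rule source, -1 where the rule abstains.
    reliability: (S,) nonnegative per-source reliability weights."""
    scores = np.zeros(n_classes)
    for v, r in zip(votes, reliability):
        if v >= 0:
            scores[v] += r
    if scores.sum() == 0:
        return None               # unmatched sample: left to the classifier
    return scores / scores.sum()  # denoised soft pseudo label
```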

10. MULTIPOLAR: Multi-Source Policy Aggregation for Transfer Reinforcement Learning between Diverse Environmental Dynamics [PDF] Back to Contents
  IJCAI 2020.
  Mohammadamin Barekatain, Ryo Yonetani, Masashi Hamaya
Transfer reinforcement learning (RL) aims at improving the learning efficiency of an agent by exploiting knowledge from other source agents trained on relevant tasks. However, it remains challenging to transfer knowledge between different environmental dynamics without having access to the source environments. In this work, we explore a new challenge in transfer RL, where only a set of source policies collected under diverse unknown dynamics is available for learning a target task efficiently. To address this problem, the proposed approach, MULTI-source POLicy AggRegation (MULTIPOLAR), comprises two key techniques. We learn to aggregate the actions provided by the source policies adaptively to maximize the target task performance. Meanwhile, we learn an auxiliary network that predicts residuals around the aggregated actions, which ensures the target policy's expressiveness even when some of the source policies perform poorly. We demonstrated the effectiveness of MULTIPOLAR through an extensive experimental evaluation across six simulated environments ranging from classic control problems to challenging robotics simulations, under both continuous and discrete action spaces. The demo videos and code are available on the project webpage: https://omron-sinicx.github.io/multipolar/.
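
A sketch of a MULTIPOLAR-style target policy for continuous actions, under the assumptions of frozen deterministic source policies, a state-independent elementwise aggregation, and a small residual network; details differ from the paper:

```python
import torch
import torch.nn as nn

class Multipolar(nn.Module):
    def __init__(self, state_dim, act_dim, source_policies):
        super().__init__()
        self.sources = source_policies  # frozen callables: state -> action
        # Trainable elementwise aggregation weights, one row per source.
        self.agg_w = nn.Parameter(torch.ones(len(source_policies), act_dim))
        # Trainable auxiliary network predicting a residual action.
        self.residual = nn.Sequential(
            nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))

    def forward(self, s):
        with torch.no_grad():  # source policies stay fixed
            acts = torch.stack([p(s) for p in self.sources], dim=1)  # (B,S,A)
        agg = (self.agg_w.unsqueeze(0) * acts).sum(dim=1)            # (B,A)
        return agg + self.residual(s)
```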

11. An Iterative Multi-Source Mutual Knowledge Transfer Framework for Machine Reading Comprehension [PDF] Back to Contents
  IJCAI 2020.
  Xin Liu, Kai Liu, Xiang Li, Jinsong Su, Yubin Ge, Bin Wang, Jiebo Luo
The lack of sufficient training data in many domains poses a major challenge to the construction of domain-specific machine reading comprehension (MRC) models with satisfying performance. In this paper, we propose a novel iterative multi-source mutual knowledge transfer framework for MRC. As an extension of conventional knowledge transfer with one-to-one correspondence, our framework focuses on many-to-many mutual transfer, which involves synchronous executions of multiple many-to-one transfers in an iterative manner. Specifically, to update a target-domain MRC model, we first consider the other domain-specific MRC models as individual teachers and employ knowledge distillation to train a multi-domain MRC model, which is differentially required to fit the training data and match the outputs of these individual models according to their domain-level similarities to the target domain. After being initialized by the multi-domain MRC model, the target-domain MRC model is fine-tuned to match both its training data and the output of its previous best model simultaneously via knowledge distillation. Compared with previous approaches, our framework can continuously enhance all domain-specific MRC models by enabling each model to iteratively and differentially absorb the domain-shared knowledge from others. Experimental results and in-depth analyses on several benchmark datasets demonstrate the effectiveness of our framework.

12. Multi-Source Neural Variational Inference [PDF] Back to Contents
  AAAI 2019. AAAI Technical Track: Machine Learning
  Richard Kurle, Stephan Günnemann, Patrick van der Smagt
Learning from multiple sources of information is an important problem in machine-learning research. The key challenges are learning representations and formulating inference methods that take into account the complementarity and redundancy of various information sources. In this paper we formulate a variational autoencoder based multi-source learning framework in which each encoder is conditioned on a different information source. This allows us to relate the sources via the shared latent variables by computing divergence measures between individual source’s posterior approximations. We explore a variety of options to learn these encoders and to integrate the beliefs they compute into a consistent posterior approximation. We visualise learned beliefs on a toy dataset and evaluate our methods for learning shared representations and structured output prediction, showing trade-offs of learning separate encoders for each information source. Furthermore, we demonstrate how conflict detection and redundancy can increase robustness of inference in a multi-source setting.
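
One concrete way to integrate per-source Gaussian posterior beliefs into a single consistent approximation is precision-weighted fusion (a product of Gaussian experts); the paper explores several integration options, of which this is only one:

```python
import numpy as np

def fuse_gaussians(mus, sigmas):
    """mus, sigmas: (S, D) per-source posterior means and std deviations.
    Returns the mean and std of the normalized product of the Gaussians."""
    prec = 1.0 / np.square(sigmas)             # per-source precisions
    prec_sum = prec.sum(axis=0)
    mu = (prec * mus).sum(axis=0) / prec_sum   # precision-weighted mean
    return mu, np.sqrt(1.0 / prec_sum)
```

A confident source (small sigma) dominates the fused belief, which is one way redundancy and complementarity across sources can be traded off.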

13. Multi-Source Cross-Lingual Model Transfer: Learning What to Share [PDF] Back to Contents
  ACL 2019.
  Xilun Chen, Ahmed Hassan Awadallah, Hany Hassan, Wei Wang, Claire Cardie
Modern NLP applications have enjoyed a great boost utilizing neural network models. Such deep neural models, however, are not applicable to most human languages due to the lack of annotated training data for various NLP tasks. Cross-lingual transfer learning (CLTL) is a viable method for building NLP models for a low-resource target language by leveraging labeled data from other (source) languages. In this work, we focus on the multilingual transfer setting, where training data in multiple source languages is leveraged to further boost target-language performance. Unlike most existing methods that rely only on language-invariant features for CLTL, our approach coherently utilizes both language-invariant and language-specific features at the instance level. Our model leverages adversarial networks to learn language-invariant features, and mixture-of-experts models to dynamically exploit the similarity between the target language and each individual source language. This enables our model to learn effectively what to share between various languages in the multilingual setup. Moreover, when coupled with unsupervised multilingual embeddings, our model can operate in a zero-resource setting where neither target-language training data nor cross-lingual resources are available. Our model achieves significant performance gains over prior art, as shown in an extensive set of experiments over multiple text classification and sequence tagging tasks, including a large-scale industry dataset.

14. DoubleTransfer at MEDIQA 2019: Multi-Source Transfer Learning for Natural Language Understanding in the Medical Domain [PDF] Back to Contents
  ACL 2019. the 18th BioNLP Workshop and Shared Task
  Yichong Xu, Xiaodong Liu, Chunyuan Li, Hoifung Poon, Jianfeng Gao
This paper describes the system we entered in the MEDIQA 2019 competition. We use a multi-source transfer learning approach to transfer knowledge from MT-DNN and SciBERT to natural language understanding tasks in the medical domain. During fine-tuning, we use multi-task learning on NLI, RQE, and QA tasks in the general and medical domains to improve performance. The proposed methods prove effective for natural language understanding in the medical domain, and our system ranked first on the QA task.

15. Multi-Source Transformer for Kazakh-Russian-English Neural Machine Translation [PDF] Back to Contents
  ACL 2019. the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)
  Patrick Littell, Chi-kiu Lo, Samuel Larkin, Darlene Stewart
We describe the neural machine translation (NMT) system developed at the National Research Council of Canada (NRC) for the Kazakh-English news translation task of the Fourth Conference on Machine Translation (WMT19). Our submission is a multi-source NMT taking both the original Kazakh sentence and its Russian translation as input for translating into English.

16. Transformer-based Automatic Post-Editing Model with Joint Encoder and Multi-source Attention of Decoder [PDF] Back to Contents
  ACL 2019. the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)
  WonKee Lee, Jaehun Shin, Jong-Hyeok Lee
This paper describes POSTECH's submission to the WMT 2019 shared task on Automatic Post-Editing (APE). In this paper, we propose a new multi-source APE model by extending Transformer. The main contributions of our study are that we 1) reconstruct the encoder to generate a joint representation of the translation (mt) and its src context, in addition to the conventional src encoding, and 2) suggest two types of multi-source attention layers to compute attention between the two outputs of the encoder and the decoder state in the decoder. Furthermore, we train our model by applying various teacher-forcing ratios to alleviate exposure bias. Finally, we adopt the ensemble technique across variations of our model. Experiments on the WMT19 English-German APE dataset show improvements in terms of both TER and BLEU scores over the baseline. Our primary submission achieves -0.73 in TER and +1.49 in BLEU compared to the baseline.

17. Generating a Common Question from Multiple Documents using Multi-source Encoder-Decoder Models [PDF] Back to Contents
  EMNLP 2019. the 3rd Workshop on Neural Generation and Translation
  Woon Sang Cho, Yizhe Zhang, Sudha Rao, Chris Brockett, Sungjin Lee
Ambiguous user queries in search engines result in the retrieval of documents that often span multiple topics. One potential solution is for the search engine to generate multiple refined queries, each of which relates to a subset of the documents spanning the same topic. A preliminary step towards this goal is to generate a question that captures common concepts of multiple documents. We propose the new task of generating a common question from multiple documents and present a simple variant of an existing multi-source encoder-decoder framework, called the Multi-Source Question Generator (MSQG). We first train an RNN-based single encoder-decoder generator from (single document, question) pairs. At test time, given multiple documents, the Distribute step of our MSQG model predicts target word distributions for each document using the trained model. The Aggregate step aggregates these distributions to generate a common question. This simple yet effective strategy significantly outperforms several existing baseline models applied to the new task when evaluated using automated metrics and human judgments on the MS-MARCO-QA dataset.
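
A toy sketch of the Distribute/Aggregate decoding loop: each document's decoder proposes a next-word distribution, the distributions are averaged, and the chosen token is fed back to every decoder. Greedy decoding and the `step_fn` interface are assumptions made for illustration:

```python
import numpy as np

def msqg_decode(doc_states, step_fn, bos_id, eos_id, max_len=30):
    """doc_states: one decoder state per document; step_fn(state, token)
    returns (next-word distribution over the vocab, updated state)."""
    tokens, prev = [], bos_id
    for _ in range(max_len):
        dists, states = zip(*[step_fn(s, prev) for s in doc_states])
        doc_states = list(states)
        common = np.mean(np.stack(dists), axis=0)  # Aggregate step
        prev = int(common.argmax())
        if prev == eos_id:
            break
        tokens.append(prev)
    return tokens
```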

18. Medical Concept Representation Learning from Multi-source Data [PDF] Back to Contents
  IJCAI 2019.
  Tian Bai, Brian L. Egleston, Richard Bleicher, Slobodan Vucetic
Representing words as low-dimensional vectors is very useful in many natural language processing tasks. This idea has been extended to the medical domain, where medical codes listed in medical claims are represented as vectors to facilitate exploratory analysis and predictive modeling. However, depending on the type of medical provider, medical claims can use medical codes from different ontologies or from a combination of ontologies, which complicates learning of the representations. To properly utilize such multi-source medical claim data, we propose an approach that represents medical codes from different ontologies in the same vector space. We first modify the Pointwise Mutual Information (PMI) measure of similarity between the codes. We then develop a new negative sampling method for the word2vec model that implicitly factorizes the modified PMI matrix. The new approach was evaluated on the code cross-reference problem, which aims at identifying similar codes across different ontologies. In our experiments, we evaluated cross-referencing between the ICD-9 and CPT medical code ontologies. Our results indicate that vector representations of codes learned by the proposed approach provide superior cross-referencing when compared to several existing approaches.
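
For background, a plain PMI matrix over code co-occurrence counts looks like the sketch below; the paper's contribution is a modified PMI plus a negative-sampling scheme for word2vec that implicitly factorizes it:

```python
import numpy as np

def pmi(cooc, eps=1e-12):
    """cooc: (V, V) symmetric co-occurrence counts between medical codes."""
    p_xy = cooc / cooc.sum()                # joint probabilities
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginals, shape (V, 1)
    return np.log((p_xy + eps) / (p_x @ p_x.T + eps))
```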

19. Learning New Tricks From Old Dogs: Multi-Source Transfer Learning From Pre-Trained Networks [PDF] Back to Contents
  NeurIPS 2019.
  Joshua Lee, Prasanna Sattigeri, Gregory Wornell
The advent of deep learning algorithms for mobile devices and sensors has led to a dramatic expansion in the availability and number of systems trained on a wide range of machine learning tasks, creating a host of opportunities and challenges in the realm of transfer learning. Currently, most transfer learning methods require some kind of control over the systems learned, either by enforcing constraints during the source training, or through the use of a joint optimization objective between tasks that requires all data be co-located for training. However, for practical, privacy, or other reasons, in a variety of applications we may have no control over the individual source task training, nor access to source training samples. Instead we only have access to features pre-trained on such data as the output of "black boxes." For such scenarios, we consider the multi-source learning problem of training a classifier using an ensemble of pre-trained neural networks for a set of classes that have not been observed by any of the source networks, and for which we have very few training samples. We show that by using these distributed networks as feature extractors, we can train an effective classifier in a computationally-efficient manner using tools from (nonlinear) maximal correlation analysis. In particular, we develop a method we refer to as maximal correlation weighting (MCW) to build the required target classifier from an appropriate weighting of the feature functions from the source networks. We illustrate the effectiveness of the resulting classifier on datasets derived from the CIFAR-100, Stanford Dogs, and Tiny ImageNet datasets, and, in addition, use the methodology to characterize the relative value of different source tasks in learning a target task.

20. Multi-source Domain Adaptation for Semantic Segmentation [PDF] Back to Contents
  NeurIPS 2019.
  Sicheng Zhao, Bo Li, Xiangyu Yue, Yang Gu, Pengfei Xu, Runbo Hu, Hua Chai, Kurt Keutzer
Simulation-to-real domain adaptation for semantic segmentation has been actively studied for various applications such as autonomous driving. Existing methods mainly focus on a single-source setting, which cannot easily handle a more practical scenario of multiple sources with different distributions. In this paper, we propose to investigate multi-source domain adaptation for semantic segmentation. Specifically, we design a novel framework, termed Multi-source Adversarial Domain Aggregation Network (MADAN), which can be trained in an end-to-end manner. First, we generate an adapted domain for each source with dynamic semantic consistency while aligning at the pixel-level cycle-consistently towards the target. Second, we propose sub-domain aggregation discriminator and cross-domain cycle discriminator to make different adapted domains more closely aggregated. Finally, feature-level alignment is performed between the aggregated domain and target domain while training the segmentation network. Extensive experiments from synthetic GTA and SYNTHIA to real Cityscapes and BDDS datasets demonstrate that the proposed MADAN model outperforms state-of-the-art approaches. Our source code is released at: https://github.com/Luodian/MADAN.

21. Multi-Source Neural Machine Translation with Missing Data [PDF] Back to Contents
  ACL 2018. the 2nd Workshop on Neural Machine Translation and Generation
  Yuta Nishimura, Katsuhito Sudoh, Graham Neubig, Satoshi Nakamura
Multi-source translation is an approach that exploits multiple inputs (e.g., in two different languages) to increase translation accuracy. In this paper, we examine approaches for multi-source neural machine translation (NMT) using an incomplete multilingual corpus in which some translations are missing. In practice, many multilingual corpora are not complete due to the difficulty of providing translations in all of the relevant languages (for example, in TED Talks, most English talks only have subtitles for a small portion of the languages that TED supports). Existing studies on multi-source translation did not explicitly handle such situations. This study focuses on the use of incomplete multilingual corpora in multi-encoder NMT and mixture of NMT experts, and examines a very simple implementation where missing source translations are replaced by a special symbol <NULL>. These methods allow us to use incomplete corpora both at training time and test time. In experiments with real incomplete multilingual corpora of TED Talks, the multi-source NMT with the <NULL> tokens achieved higher translation accuracies, measured by BLEU, than any one-to-one NMT system.
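
The data-side trick is simple enough to sketch directly; `complete_example` and the language list are illustrative names, not the authors' code:

```python
NULL = "<NULL>"  # placeholder symbol for a missing source translation

def complete_example(sources, languages):
    """sources: {lang: sentence} for the languages present in this example;
    languages: the full list of source languages the model expects."""
    return [sources.get(lang, NULL) for lang in languages]

# complete_example({"en": "Hello .", "fr": "Bonjour ."}, ["en", "fr", "de"])
# -> ["Hello .", "Bonjour .", "<NULL>"]
```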

22. Multi-Source Multi-Class Fake News Detection [PDF] Back to Contents
  COLING 2018.
  Hamid Karimi, Proteek Roy, Sari Saba-Sadiya, Jiliang Tang
Fake news spreading through media outlets poses a real threat to the trustworthiness of information, and detecting fake news has attracted increasing attention in recent years. Fake news is typically written intentionally to mislead readers, which makes fake news detection based solely on news content tremendously challenging. Meanwhile, fake news can contain true evidence to mock true news and can present different degrees of fakeness, which further exacerbates the detection difficulty. On the other hand, the spread of fake news produces various types of data from different perspectives. These multiple sources provide rich contextual information about fake news and offer unprecedented opportunities for advanced fake news detection. In this paper, we study fake news detection with different degrees of fakeness by integrating multiple sources. In particular, we introduce approaches to combine information from multiple sources and to discriminate between different degrees of fakeness, and propose a Multi-source Multi-class Fake news Detection framework, MMFD, which combines automated feature extraction, multi-source fusion, and automated degrees-of-fakeness detection into a coherent and interpretable model. Experimental results on real-world data demonstrate the effectiveness of the proposed framework, and extensive experiments are further conducted to understand its workings.

23. Automatic Curation and Visualization of Crime Related Information from Incrementally Crawled Multi-source News Reports [PDF] Back to Contents
  COLING 2018. System Demonstrations
  Tirthankar Dasgupta, Lipika Dey, Rupsa Saha, Abir Naskar
In this paper, we demonstrate a system for the automatic extraction and curation of crime-related information from multi-source digitally published news articles collected over a period of five years. We leverage a deep convolutional recurrent neural network model to analyze crime articles and extract different crime-related entities and events. The proposed methods are not restricted to detecting known crimes only but contribute actively towards maintaining an updated crime ontology. We have experimented with a collection of 5,000 crime-reporting news articles spanning time and multiple sources. The end product of our experiments is a crime register that contains details of crimes committed across geographies and time. This register can be further utilized for analytical and reporting purposes.

24. Multi-Source Syntactic Neural Machine Translation [PDF] Back to Contents
  EMNLP 2018.
  Anna Currey, Kenneth Heafield
We introduce a novel multi-source technique for incorporating source syntax into neural machine translation using linearized parses. This is achieved by employing separate encoders for the sequential and parsed versions of the same source sentence; the resulting representations are then combined using a hierarchical attention mechanism. The proposed model improves over both seq2seq and parsed baselines by over 1 BLEU on the WMT17 English-German task. Further analysis shows that our multi-source syntactic model is able to translate successfully without any parsed input, unlike standard parsed methods. In addition, performance does not deteriorate as much on long sentences as for the baselines.

25. Multi-Source Domain Adaptation with Mixture of Experts [PDF] Back to Contents
  EMNLP 2018.
  Jiang Guo, Darsh Shah, Regina Barzilay
We propose a mixture-of-experts approach for unsupervised domain adaptation from multiple sources. The key idea is to explicitly capture the relationship between a target example and different source domains. This relationship, expressed by a point-to-set metric, determines how to combine predictors trained on various domains. The metric is learned in an unsupervised fashion using meta-training. Experimental results on sentiment analysis and part-of-speech tagging demonstrate that our approach consistently outperforms multiple baselines and can robustly handle negative transfer.

26. Multi-source synthetic treebank creation for improved cross-lingual dependency parsing [PDF] Back to Contents
  EMNLP 2018. the Second Workshop on Universal Dependencies (UDW 2018)
  Francis Tyers, Mariya Sheyanova, Aleksandra Martynova, Pavel Stepachev, Konstantin Vinogorodskiy
This paper describes a method of creating synthetic treebanks for cross-lingual dependency parsing using a combination of machine translation (including pivot translation), annotation projection and the spanning tree algorithm. Sentences are first automatically translated from a lesser-resourced language to a number of related highly-resourced languages, parsed and then the annotations are projected back to the lesser-resourced language, leading to multiple trees for each sentence from the lesser-resourced language. The final treebank is created by merging the possible trees into a graph and running the spanning tree algorithm to vote for the best tree for each sentence. We present experiments aimed at parsing Faroese using a combination of Danish, Swedish and Norwegian. In a similar experimental setup to the CoNLL 2018 shared task on dependency parsing we report state-of-the-art results on dependency parsing for Faroese using an off-the-shelf parser.

27. Low-resource named entity recognition via multi-source projection: Not quite there yet? [PDF] Back to Contents
  EMNLP 2018. the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text
  Jan Vium Enghoff, Søren Harrison, Željko Agić
Projecting linguistic annotations through word alignments is one of the most prevalent approaches to cross-lingual transfer learning. Conventional wisdom suggests that annotation projection “just works” regardless of the task at hand. We carefully consider multi-source projection for named entity recognition. Our experiment with 17 languages shows that to detect named entities in true low-resource languages, annotation projection may not be the right way to move forward. On a more positive note, we also uncover the conditions that do favor named entity projection from multiple sources. We argue these are infeasible under noisy low-resource constraints.

28. Input Combination Strategies for Multi-Source Transformer Decoder [PDF] Back to Contents
  EMNLP 2018. the Third Conference on Machine Translation: Research Papers
  Jindřich Libovický, Jindřich Helcl, David Mareček
In multi-source sequence-to-sequence tasks, the attention mechanism can be modeled in several ways. This topic has been thoroughly studied on recurrent architectures. In this paper, we extend the previous work to the encoder-decoder attention in the Transformer architecture. We propose four different input combination strategies for the encoder-decoder attention: serial, parallel, flat, and hierarchical. We evaluate our methods on tasks of multimodal translation and translation with multiple source languages. The experiments show that the models are able to use multiple sources and improve over single source baselines.
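
A sketch of the hierarchical strategy, assuming a single shared inner attention for brevity (separate per-source attention parameters would be the more faithful reading) and using PyTorch's built-in multi-head attention:

```python
import torch
import torch.nn as nn

class HierarchicalCombiner(nn.Module):
    """Attend over each source encoder, then attend over the contexts."""
    def __init__(self, dim):
        super().__init__()
        self.inner = nn.MultiheadAttention(dim, 1, batch_first=True)
        self.outer = nn.Linear(dim, 1)  # scores the per-source contexts

    def forward(self, query, encoder_states):
        # query: (B, 1, D) decoder state; encoder_states: list of (B, T_i, D)
        ctxs = [self.inner(query, h, h)[0] for h in encoder_states]  # (B,1,D)
        c = torch.cat(ctxs, dim=1)                                   # (B,S,D)
        beta = torch.softmax(self.outer(c), dim=1)                   # (B,S,1)
        return (beta * c).sum(dim=1)                                 # (B,D)
```

The serial, parallel, and flat variants from the paper differ mainly in whether the sources are attended in sequence, independently, or as one concatenated sequence.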

29. Neural Machine Translation with the Transformer and Multi-Source Romance Languages for the Biomedical WMT 2018 task [PDF] Back to Contents
  EMNLP 2018. the Third Conference on Machine Translation: Shared Task Papers
  Brian Tubay, Marta R. Costa-jussà
The Transformer architecture has become the state-of-the-art in Machine Translation. This model, which relies on attention-based mechanisms, has outperformed previous neural machine translation architectures in several tasks. In this system description paper, we report details of training neural machine translation with multi-source Romance languages with the Transformer model and in the evaluation frame of the biomedical WMT 2018 task. Using multi-source languages from the same family allows improvements of over 6 BLEU points.

30. A Transformer-Based Multi-Source Automatic Post-Editing System [PDF] Back to Contents
  EMNLP 2018. the Third Conference on Machine Translation: Shared Task Papers
  Santanu Pal, Nico Herbig, Antonio Krüger, Josef van Genabith
This paper presents our English-German Automatic Post-Editing (APE) system submitted to the APE Task organized at WMT 2018 (Chatterjee et al., 2018). The proposed model is an extension of the transformer architecture: two separate self-attention-based encoders encode the machine translation output (mt) and the source (src), followed by a joint encoder that attends over a combination of these two encoded sequences (encsrc and encmt) for generating the post-edited sentence. We compare this multi-source architecture (i.e., {src, mt} → pe) to a monolingual transformer (i.e., mt → pe) model and an ensemble combining the multi-source {src, mt} → pe and single-source mt → pe models. For both the PBSMT and the NMT task, the ensemble yields the best results, followed by the multi-source model, with the single-source approach last. Our best model, the ensemble, achieves a BLEU score of 66.16 and 74.22 for the PBSMT and NMT task, respectively.

31. Multi-source transformer with combined losses for automatic post editing [PDF] Back to Contents
  EMNLP 2018. the Third Conference on Machine Translation: Shared Task Papers
  Amirhossein Tebbifakhr, Ruchit Agrawal, Matteo Negri, Marco Turchi
Recent approaches to the Automatic Post-editing (APE) of Machine Translation (MT) have shown that the best results are obtained by neural multi-source models that correct the raw MT output by also considering information from the corresponding source sentence. To this aim, we present for the first time a neural multi-source APE model based on the Transformer architecture. Moreover, we employ sequence-level loss functions in order to avoid exposure bias during training and to be consistent with the automatic evaluation metrics used for the task. These are the main features of our submissions to the WMT 2018 APE shared task, where we participated both in the PBSMT subtask (i.e., the correction of MT outputs from a phrase-based system) and in the NMT subtask (i.e., the correction of neural outputs). In the first subtask, our system improves over the baseline by up to -5.3 TER and +8.23 BLEU points, ranking second out of 11 submitted runs. In the second one, characterized by the higher quality of the initial translations, we report smaller but statistically significant gains (up to -0.38 TER and +0.8 BLEU), ranking first out of 10 submissions.

32. Recommendation with Multi-Source Heterogeneous Information [PDF] Back to Contents
  IJCAI 2018.
  Li Gao, Hong Yang, Jia Wu, Chuan Zhou, Weixue Lu, Yue Hu
Network embedding has been recently used in social network recommendations by embedding low-dimensional representations of network items for recommendation. However, existing item recommendation models in social networks suffer from two limitations. First, these models partially use item information and mostly ignore important contextual information in social networks such as textual content and social tag information. Second, network embedding and item recommendations are learned in two independent steps without any interaction. To this end, we in this paper consider item recommendations based on heterogeneous information sources. Specifically, we combine item structure, textual content and tag information for recommendation. To model the multi-source heterogeneous information, we use two coupled neural networks to capture the deep network representations of items, based on which a new recommendation model Collaborative multi-source Deep Network Embedding (CDNE for short) is proposed to learn different latent representations. Experimental results on two real-world data sets demonstrate that CDNE can use network representation learning to boost the recommendation performance.

33. From Shared Subspaces to Shared Landmarks: A Robust Multi-Source Classification Approach [PDF] Back to Contents
  AAAI 2017. Machine Learning Methods
  Sarah M. Erfani, Mahsa Baktashmotlagh, Masud Moshtaghi, Vinh Nguyen, Christopher Leckie, James Bailey, Kotagiri Ramamohanarao
Training machine learning algorithms on augmented data from different related sources is a challenging task. This problem arises in several applications, such as the Internet of Things (IoT), where data may be collected from devices with different settings. The learned model on such datasets can generalize poorly due to distribution bias. In this paper we consider the problem of classifying unseen datasets, given several labeled training samples drawn from similar distributions. We exploit the intrinsic structure of samples in a latent subspace and identify landmarks, a subset of training instances from different sources that should be similar. Incorporating subspace learning and landmark selection enhances generalization by alleviating the impact of noise and outliers, as well as improving efficiency by reducing the size of the data. However, since addressing the two issues simultaneously results in an intractable problem, we relax the objective function by leveraging the theory of nonlinear projection and solve a tractable convex optimisation. Through comprehensive analysis, we show that our proposed approach outperforms state-of-the-art results on several benchmark datasets, while keeping the computational complexity low.

34. Attention Strategies for Multi-Source Sequence-to-Sequence Learning [PDF] Back to Contents
  ACL 2017. Short Papers
  Jindřich Libovický, Jindřich Helcl
Modeling attention in neural multi-source sequence-to-sequence learning remains a relatively unexplored area, despite its usefulness in tasks that incorporate multiple source languages or modalities. We propose two novel approaches to combine the outputs of attention mechanisms over each source sequence, flat and hierarchical. We compare the proposed methods with existing techniques and present results of systematic evaluation of those methods on the WMT16 Multimodal Translation and Automatic Post-editing tasks. We show that the proposed methods achieve competitive results on both tasks.

35. Source-Target Similarity Modelings for Multi-Source Transfer Gaussian Process Regression [PDF] Back to Contents
  ICML 2017.
  Pengfei Wei, Ramón Sagarna, Yiping Ke, Yew-Soon Ong, Chi-Keong Goh
A key challenge in multi-source transfer learning is to capture the diverse inter-domain similarities. In this paper, we study different approaches based on Gaussian process models to solve the multi-source transfer regression problem. Precisely, we first investigate the feasibility and performance of a family of transfer covariance functions that represent the pairwise similarity of each source and the target domain. We theoretically show that using such a transfer covariance function for general Gaussian process modelling can only capture the same similarity coefficient for all the sources, and thus may result in unsatisfactory transfer performance. This leads us to propose TC$_{MS}$Stack, an integrated strategy incorporating the benefits of the transfer covariance function and stacking. Extensive experiments on one synthetic and two real-world datasets, with learning settings of up to 11 sources for the latter, demonstrate the effectiveness of our proposed TC$_{MS}$Stack.

36. A Representation Learning Framework for Multi-Source Transfer Parsing [PDF] Back to Contents
  AAAI 2016. Technical Papers: NLP and Machine Learning
  Jiang Guo, Wanxiang Che, David Yarowsky, Haifeng Wang, Ting Liu
Cross-lingual model transfer has been a promising approach for inducing dependency parsers for low-resource languages where annotated treebanks are not available. The major obstacles for the model transfer approach are two-fold: 1. lexical features are not directly transferable across languages; 2. target language-specific syntactic structures are difficult to recover. To address these two challenges, we present a novel representation learning framework for multi-source transfer parsing. Our framework allows multi-source transfer parsing using full lexical features straightforwardly. By evaluating on the Google universal dependency treebanks (v2.0), our best models yield an absolute improvement of 6.53% in averaged labeled attachment score, as compared with delexicalized multi-source transfer models. We also significantly outperform the most recently proposed state-of-the-art transfer system.

37. Ensemble Learning for Multi-Source Neural Machine Translation [PDF] Back to Contents
  COLING 2016.
  Ekaterina Garmash, Christof Monz
In this paper we describe and evaluate methods to perform ensemble prediction in neural machine translation (NMT). We compare two methods of ensemble set induction: sampling parameter initializations for an NMT system, which is a relatively established method in NMT (Sutskever et al., 2014), and NMT systems translating from different source languages into the same target language, i.e., multi-source ensembles, a method recently introduced by Firat et al. (2016). We are motivated by the observation that for different language pairs systems make different types of mistakes. We propose several methods with different degrees of parameterization to combine individual predictions of NMT systems so that they mutually compensate for each other’s mistakes and improve overall performance. We find that the biggest improvements can be obtained from a context-dependent weighting scheme for multi-source ensembles. This result offers stronger support for the linguistic motivation of using multi-source ensembles than previous approaches. Evaluation is carried out for German and French into English translation. The best multi-source ensemble method achieves an improvement of up to 2.2 BLEU points over the strongest single-source ensemble baseline, and a 2 BLEU improvement over a multi-source ensemble baseline.
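
A sketch of the decoding-time combination, with the context-dependent weighting left as a pluggable function since the paper's parameterization is not reproduced here:

```python
import numpy as np

def ensemble_step(dists, context, weight_fn=None):
    """dists: (K, V) next-word distributions from K NMT systems; weight_fn
    maps the decoding context to (K,) weights for the context-dependent
    scheme, defaulting to a uniform ensemble."""
    if weight_fn is None:
        w = np.full(len(dists), 1.0 / len(dists))  # uniform baseline
    else:
        w = weight_fn(context)                     # context-dependent weights
    mixed = (w[:, None] * dists).sum(axis=0)
    return mixed / mixed.sum()
```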

38. Multi-Source Iterative Adaptation for Cross-Domain Classification [PDF] Back to Contents
  IJCAI 2016.
  Himanshu S. Bhatt, Arun Rajkumar, Shourya Roy
Owing to the tremendous increase in the volume and variety of user generated content, train-once-apply-forever models are insufficient for supervised learning tasks. Thus, developing algorithms that adapt across domains by leveraging data from multiple domains is critical. However, existing adaptation algorithms often fail to identify the right sources to use for adaptation. In this work, we present a novel multi-source iterative domain adaptation algorithm (MSIDA) that leverages knowledge from selective sources to improve the performance in a target domain. The algorithm first chooses the best K sources from possibly numerous existing domains taking into account both similarity and complementarity properties of the domains. Then it learns target specific features in an iterative manner building on the common shared representations from the source domains. We give theoretical justifications for our source selection procedure and also give mistake bounds for the MSIDA algorithm. Experimental results justify the theory as MSIDA significantly outperforms existing cross-domain classification approaches on the real world and benchmark datasets.

39. Multi-Source Neural Translation [PDF] Back to Contents
  NAACL 2016.
  Barret Zoph, Kevin Knight


40. Multi-Source Domain Adaptation: A Causal View [PDF] Back to Contents
  AAAI 2015. Novel Machine Learning Algorithms
  Kun Zhang, Mingming Gong, Bernhard Schölkopf
This paper is concerned with the problem of domain adaptation with multiple sources from a causal point of view. In particular, we use causal models to represent the relationship between the features X and the class label Y, and consider possible situations where different modules of the causal model change with the domain. In each situation, we investigate what knowledge is appropriate to transfer and find the optimal target-domain hypothesis. This gives an intuitive interpretation of the assumptions underlying certain previous methods and motivates new ones. We finally focus on the case where Y is the cause for X with changing P(Y) and P(X|Y), that is, P(Y) and P(X|Y) change independently across domains. Under appropriate assumptions, the availability of multiple source domains allows a natural way to reconstruct the conditional distribution on the target domain; we propose to model P(X|Y) (the process generating effect X from cause Y) on the target domain as a linear mixture of those on the source domains, and estimate all involved parameters by matching the target-domain feature distribution. Experimental results on both synthetic and real-world data verify our theoretical results.
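
As a toy illustration of the mixture idea (matching only first moments, which is far weaker than the distribution matching in the paper), one could fit the source-mixture weights and the target label prior like this; every name and the optimizer choice are assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def fit_weights(source_means, target_mean):
    """source_means: (K, C, D) class-conditional feature means per source.
    target_mean: (D,) observed mean of target-domain features."""
    K, C, _ = source_means.shape

    def unpack(z):
        w = np.exp(z[:K]); w /= w.sum()     # mixture weights over sources
        py = np.exp(z[K:]); py /= py.sum()  # target class prior P(Y)
        return w, py

    def loss(z):
        w, py = unpack(z)
        # Feature mean implied by the mixture model of P(X|Y) and P(Y).
        implied = np.einsum("k,c,kcd->d", w, py, source_means)
        return np.sum((implied - target_mean) ** 2)

    res = minimize(loss, np.zeros(K + C), method="Nelder-Mead")
    return unpack(res.x)
```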

41. Interest Inference via Structure-Constrained Multi-Source Multi-Task Learning [PDF] Back to Contents
  IJCAI 2015.
  Xuemeng Song, Liqiang Nie, Luming Zhang, Maofu Liu, Tat-Seng Chua
User interest inference from social networks is a fundamental problem for many applications. It usually exhibits dual heterogeneities: a user's interests are complementarily and comprehensively reflected by multiple social networks, and interests are inter-correlated in a non-uniform way rather than being independent of each other. Although great success has been achieved by previous approaches, few of them consider these dual heterogeneities simultaneously. In this work, we propose a structure-constrained multi-source multi-task learning scheme to co-regularize source consistency and tree-guided task relatedness. Meanwhile, it is able to jointly learn task-sharing and task-specific features. Comprehensive experiments on a real-world dataset validate our scheme. In addition, we have released our dataset to facilitate the research communities.

Note: this paper list was compiled with the AC paper search engine!