Research notes for Transfer Learning

Introduction

  • Transfer learning (a.k.a. cross-domain learning or domain adaptation) is an emerging research topic in fields such as computer vision and recommender systems. 
  • It is well-known that the feature distributions of examples from different domains (e.g., web domain, consumer domain) may differ tremendously in terms of statistical properties (e.g., mean, intra- or inter-class variance). 
  • Transfer learning is developed to cope with such considerable variation in feature distributions between different domains.
  • Formalization: 
    • The ability of a system to recognize and apply knowledge and skills learned in previous tasks to novel tasks or new domains, which share some commonality
    • Given a target task, how to identify the commonality between the task and previous (source) tasks, and transfer knowledge from the source tasks to the target one?
  • Fields: 
    • Transfer learning for reinforcement learning
    • Transfer learning for classification and regression problems (the focus). 
  • Usage: 
    • It is often used when a target domain (or data set) of interest contains very few or even no labeled examples, while an existing (auxiliary) source domain provides a large number of labeled examples. We want to leverage both data sets to obtain a model that performs well on the target domain. This is the fully supervised case. 
    • Another case is the semi-supervised one, where instead of a small set of labeled target examples we have a large set of unlabeled ones.
  • Settings (Sinno Jialin Pan, Transfer Learning). Note that the tasks can be classification or regression, and the settings further differ in whether the source and target share the same label space.
    • Supervised domain transfer
    • Semi-supervised domain transfer
    • Unsupervised domain transfer
  • Some baseline approaches (Hal Daumé III, 2007); a minimal code sketch follows this list:
    • SRC ignores the target data and trains a single model only on the source data.
    • TGT trains a single model only on the target data.
    • ALL simply trains a model on the union of the two data sets. One problem is that, since the source data set (of size N) is much larger than the target one (of size M), the source data may "wash out" any effect that the target data might have. 
    • WEIGHTED addresses this problem of ALL by introducing weights for the source examples. For example, if N = 10M, we may weight each source example by 0.1. 
    • PRED is based on the idea of using the output of the source classifier as a feature in the target classifier. Specifically, we first train an SRC model, then run it on the target data (training and testing). We use the obtained predictions as additional features and train a second model on the target data. 
    • LININT linearly interpolates the predictions of the SRC and TGT models. The interpolation parameter is tuned on target validation data. 
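
A minimal sketch of these baselines, assuming binary labels, scikit-learn's LogisticRegression as the base learner, and hypothetical arrays Xs/ys (source), Xt/yt (target), and X_test; alpha in LININT is a placeholder to be tuned on target validation data:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def src(Xs, ys):
        # SRC: train only on the source data.
        return LogisticRegression().fit(Xs, ys)

    def tgt(Xt, yt):
        # TGT: train only on the target data.
        return LogisticRegression().fit(Xt, yt)

    def all_union(Xs, ys, Xt, yt):
        # ALL: train on the union; the larger source set may "wash out" the target.
        return LogisticRegression().fit(np.vstack([Xs, Xt]),
                                        np.concatenate([ys, yt]))

    def weighted(Xs, ys, Xt, yt):
        # WEIGHTED: down-weight each of the N source examples by M/N.
        w = np.concatenate([np.full(len(ys), len(yt) / len(ys)),
                            np.ones(len(yt))])
        return LogisticRegression().fit(np.vstack([Xs, Xt]),
                                        np.concatenate([ys, yt]),
                                        sample_weight=w)

    def pred(Xs, ys, Xt, yt):
        # PRED: append the source model's predicted probability as an extra
        # target feature; test features must be augmented the same way.
        s = src(Xs, ys)
        Xt_aug = np.hstack([Xt, s.predict_proba(Xt)[:, [1]]])
        return s, LogisticRegression().fit(Xt_aug, yt)

    def linint(Xs, ys, Xt, yt, X_test, alpha=0.5):
        # LININT: interpolate SRC and TGT probabilities; alpha is tuned on
        # held-out target validation data (0.5 here is only a placeholder).
        ps = src(Xs, ys).predict_proba(X_test)[:, 1]
        pt = tgt(Xt, yt).predict_proba(X_test)[:, 1]
        return alpha * ps + (1 - alpha) * pt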

Transfer Learning Theory

  • Sample selection bias assumes:
    • The source sample is a biased (non-uniform) draw from the underlying population 
    • Same predictive distribution P(y|x) 
  • Covariate shift assumes (an importance-weighting sketch follows this list): 
    • The marginal distribution P(x) changes between domains 
    • Same predictive distribution P(y|x) 
    • Shared support in the target and source domains
    • References: (Gretton et al., 2008) and (Bickel et al., 2009)
  • Distance measures
  • Predictive distribution matching
  • Generalization bounds
  • Multiple sources:
    • Target domain has no or very few labeled data
    • Source domains might have different class distribution from the target one. 
    • Each source domain may train its own classifier or regressor, so how do we choose a classifier for the target domain?
      • Responses that lead to positive outcomes should be retained. => positive transfer
      • Responses that lead to negative outcomes should be removed. => negative transfer
      • What if the target labels are unknown? How to define positive and negative transfer?
        • Use the source classifier to cluster the unlabeled target data: if the clusters are well separated, the classifier is regarded as good; otherwise, as poor. 
        • Select a "good" source classifier that has minimal bias towards the source domain (Seah et al., 2011). A small margin of separation indicates negative transfer, while a large margin indicates positive transfer. 
    • References: (Crammer et al., 2008) and (Schweikert et al., 2009).
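
To make the covariate-shift assumptions above concrete, a standard remedy is importance weighting: estimate the density ratio p_target(x)/p_source(x) and use it to re-weight source examples before training. The sketch below uses a discriminative density-ratio estimator in the spirit of Bickel et al. (2009), i.e., a probabilistic classifier that separates the two domains, rather than kernel mean matching; the array names Xs, Xt, ys are hypothetical:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def importance_weights(Xs, Xt):
        # Train a domain classifier: label 0 = source, label 1 = target.
        X = np.vstack([Xs, Xt])
        d = np.concatenate([np.zeros(len(Xs)), np.ones(len(Xt))])
        clf = LogisticRegression().fit(X, d)
        # For each source point, P(domain = target | x).
        p = clf.predict_proba(Xs)[:, 1]
        # By Bayes' rule, p/(1-p) * (n_s/n_t) estimates p_target(x)/p_source(x).
        # This is only meaningful where the two domains share support.
        return (p / np.clip(1.0 - p, 1e-12, None)) * (len(Xs) / len(Xt))

    # Usage: train the target model on re-weighted source data.
    # w = importance_weights(Xs, Xt)
    # model = LogisticRegression().fit(Xs, ys, sample_weight=w)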

Transfer Learning Algorithms

  • Instance re-weighting: re-weight the source examples so that the weighted source distribution better matches the target one (cf. the importance-weighting sketch above). 
  • Feature mapping, e.g., Transfer Component Analysis (TCA): find a learned mapping that projects the source and target domain data onto a latent space spanned by factors (transfer components) which reduce the domain difference while preserving the original data structure (Pan et al., IJCAI 2009, TNN 2011). With K the kernel matrix over the combined data, it aims to optimize (a numerical sketch follows this list): 

        min_W  tr(Wᵀ K L K W) + μ · tr(Wᵀ W)   subject to  Wᵀ K H K W = I

    where L encodes the maximum mean discrepancy (MMD) between the two domains, H is the centering matrix, and μ trades off domain-distance reduction against model complexity.
  • When statistics of the data change: cross-domain methods that adapt features.
    • domain transform
    • asymmetric transform
    • manifold walks
  • When labels are expensive: cross-knowledge methods that share features.
    • Sharing features across tasks
    • Visual taxonomies
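
A compact numerical sketch of the TCA optimization above, using numpy only and assuming the two domains share a feature space; the linear kernel, the trade-off mu, and the number of components m are illustrative assumptions rather than the paper's settings:

    import numpy as np

    def tca(Xs, Xt, m=2, mu=1.0):
        # Combined data and a (linear) kernel matrix K over source + target points.
        X = np.vstack([Xs, Xt])
        n_s, n_t = len(Xs), len(Xt)
        n = n_s + n_t
        K = X @ X.T

        # L encodes the MMD between domains: entries are 1/n_s^2 (source-source),
        # 1/n_t^2 (target-target), and -1/(n_s*n_t) (cross terms).
        e = np.concatenate([np.full(n_s, 1.0 / n_s), np.full(n_t, -1.0 / n_t)])
        L = np.outer(e, e)

        # H is the centering matrix; the constraint W'KHKW = I preserves variance.
        H = np.eye(n) - np.ones((n, n)) / n

        # The trace problem reduces to the leading eigenvectors of
        # (K L K + mu I)^{-1} K H K.
        A = np.linalg.solve(K @ L @ K + mu * np.eye(n), K @ H @ K)
        vals, vecs = np.linalg.eig(A)
        W = np.real(vecs[:, np.argsort(-np.real(vals))[:m]])

        Z = K @ W  # embedded points in the shared latent space
        return Z[:n_s], Z[n_s:]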

Challenges and Future Directions

  • How to avoid negative transfer: given a target domain/task, how do we find source domains/tasks that ensure positive transfer? 
  • Transfer learning meets active learning. 
  • Given a specific application, which kind of transfer learning methods should be used?
  • Speeding up transfer learning on large-scale problems.

Application in RecSys

  • TBA. 

References

  1. Crammer et al., 2008, Learning from Multiple Sources, JMLR. 
  2. Hal Daumé III, 2007, Frustratingly Easy Domain Adaptation, ACL. 
  3. Ivor Tsang, Domain Transfer Learning: Basics and Algorithms. 
  4. Schweikert et al., 2009, An Empirical Analysis of Domain Adaptation Algorithms for Genomic Sequence Analysis, NIPS. 
  5. Sinno Jialin Pan, Transfer Learning (tutorial slides). 
  6. Sinno Jialin Pan and Qiang Yang, 2010, A Survey on Transfer Learning, IEEE TKDE. (A tutorial will be given at IJCAI 2013.)
  7. Gretton et al., 2008, Covariate Shift by Kernel Mean Matching.
  8. Bickel et al., 2009, Discriminative Learning Under Covariate Shift, JMLR.
  9. A list of papers is available here.