Research notes for Transfer Learning

Introduction

  • Transfer learning (a.k.a. cross-domain learning or domain adaptation) is an emerging research topic in fields such as computer vision and recommender systems. 
  • It is well-known that the feature distributions of examples from different domains (e.g., web domain, consumer domain) may differ tremendously in terms of statistical properties (e.g., mean, intra- or inter-class variance). 
  • Transfer learning is developed to cope with such considerable variation in feature distributions between different domains.
  • Formalization: 
    • The ability of a system to recognize and apply knowledge and skills learned in previous tasks to novel tasks or new domains, which share some commonality
    • Given a target task, how to identify the commonality between the task and previous (source) tasks, and transfer knowledge from the source tasks to the target one?
  • Fields: 
    • Transfer learning for reinforcement learning
    • Transfer learning for classification and regression problems (the focus). 
  • Usage: 
    • It is often used when a target domain (or data set) of interest contains very few or even no labeled examples, while an existing (auxiliary) source domain provides a large number of labeled examples. We want to leverage both data sets to obtain a model that performs well on the target domain. This is the fully supervised case. 
    • Another case is the semi-supervised one, where instead of a small set of labeled target examples we have a large set of unlabeled ones.
  • Settings (Sinno Jialin Pan, Transfer Learning). Note that the tasks can be classification or regression, and the settings further differ in whether the source and target share the same label space.
    • Supervised domain transfer
    • Semi-supervised domain transfer
    • Unsupervised domain transfer
  • Some baseline approaches (Hal Daumé III, 2007); a minimal code sketch follows this list:
    • SRC ignores the target data and trains a single model only on the source data.
    • TGT trains a single model only on the target data.
    • ALL simply trains a model on the union of the two data sets. One problem is that, since the source data set (of size N) is much larger than the target one (of size M), the source data may "wash out" any effect that the target data might have. 
    • WEIGHTED addresses this problem of ALL by introducing weights for the source examples. For example, if N = 10M, we may weight each source example by 0.1. 
    • PRED is based on the idea of using the output of the source classifier as a feature in the target classifier. Specifically, we first train an SRC model, then run it on the target data (training and testing). We use the obtained predictions as additional features and train a second model on the target data. 
    • LININT linearly interpolates the predictions of the SRC and TGT models. The interpolation parameter is tuned on target validation data. 
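
A minimal sketch of these baselines, assuming binary labels, scikit-learn's LogisticRegression as the base learner, and hypothetical arrays Xs/ys (source), Xt/yt (target), and X_test; alpha in LININT is a placeholder to be tuned on target validation data:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def src(Xs, ys):
        # SRC: train only on the source data.
        return LogisticRegression().fit(Xs, ys)

    def tgt(Xt, yt):
        # TGT: train only on the target data.
        return LogisticRegression().fit(Xt, yt)

    def all_union(Xs, ys, Xt, yt):
        # ALL: train on the union; the larger source set may "wash out" the target.
        return LogisticRegression().fit(np.vstack([Xs, Xt]),
                                        np.concatenate([ys, yt]))

    def weighted(Xs, ys, Xt, yt):
        # WEIGHTED: down-weight each of the N source examples by M/N.
        w = np.concatenate([np.full(len(ys), len(yt) / len(ys)),
                            np.ones(len(yt))])
        return LogisticRegression().fit(np.vstack([Xs, Xt]),
                                        np.concatenate([ys, yt]),
                                        sample_weight=w)

    def pred(Xs, ys, Xt, yt):
        # PRED: append the source model's predicted probability as an extra
        # target feature; test features must be augmented the same way.
        s = src(Xs, ys)
        Xt_aug = np.hstack([Xt, s.predict_proba(Xt)[:, [1]]])
        return s, LogisticRegression().fit(Xt_aug, yt)

    def linint(Xs, ys, Xt, yt, X_test, alpha=0.5):
        # LININT: interpolate SRC and TGT probabilities; alpha is tuned on
        # held-out target validation data (0.5 here is only a placeholder).
        ps = src(Xs, ys).predict_proba(X_test)[:, 1]
        pt = tgt(Xt, yt).predict_proba(X_test)[:, 1]
        return alpha * ps + (1 - alpha) * pt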

Transfer Learning Theory

  • Sample selection bias assumes:
    • The source sample is a biased (non-uniform) draw from the underlying population 
    • Same predictive distribution P(y|x) 
  • Covariate shift assumes (an importance-weighting sketch follows this list): 
    • The marginal distribution P(x) changes between domains 
    • Same predictive distribution P(y|x) 
    • Shared support in the target and source domains
    • References: (Gretton et al., 2008) and (Bickel et al., 2009)
  • Distance measures
  • Predictive distribution matching
  • Generalization bounds
  • Multiple sources:
    • Target domain has no or very few labeled data
    • Source domains might have different class distribution from the target one. 
    • Each source domain may train its own classifier or regressor, so how do we choose a classifier for the target domain?
      • Responses that lead to positive outcomes should be retained. => positive transfer
      • Responses that lead to negative outcomes should be removed. => negative transfer
      • What if the target labels are unknown? How to define positive and negative transfer?
        • Use the source classifier to cluster the unlabeled target data: if the clusters are well separated, the classifier is regarded as good; otherwise, as poor. 
        • Select a "good" source classifier that has minimal bias towards the source domain (Seah et al., 2011). A small margin of separation indicates negative transfer, while a large margin indicates positive transfer. 
    • References: (Crammer et al., 2008) and (Schweikert et al., 2009).
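
To make the covariate-shift assumptions above concrete, a standard remedy is importance weighting: estimate the density ratio p_target(x)/p_source(x) and use it to re-weight source examples before training. The sketch below uses a discriminative density-ratio estimator in the spirit of Bickel et al. (2009), i.e., a probabilistic classifier that separates the two domains, rather than kernel mean matching; the array names Xs, Xt, ys are hypothetical:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def importance_weights(Xs, Xt):
        # Train a domain classifier: label 0 = source, label 1 = target.
        X = np.vstack([Xs, Xt])
        d = np.concatenate([np.zeros(len(Xs)), np.ones(len(Xt))])
        clf = LogisticRegression().fit(X, d)
        # For each source point, P(domain = target | x).
        p = clf.predict_proba(Xs)[:, 1]
        # By Bayes' rule, p/(1-p) * (n_s/n_t) estimates p_target(x)/p_source(x).
        # This is only meaningful where the two domains share support.
        return (p / np.clip(1.0 - p, 1e-12, None)) * (len(Xs) / len(Xt))

    # Usage: train the target model on re-weighted source data.
    # w = importance_weights(Xs, Xt)
    # model = LogisticRegression().fit(Xs, ys, sample_weight=w)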

Transfer Learning Algorithms

  • Instance re-weighting: re-weight the source examples so that the weighted source distribution better matches the target one (cf. the importance-weighting sketch above). 
  • Feature mapping, e.g., Transfer Component Analysis (TCA): find a learned mapping that projects the source and target domain data onto a latent space spanned by factors (transfer components) which reduce the domain difference while preserving the original data structure (Pan et al., IJCAI 2009, TNN 2011). With K the kernel matrix over the combined data, it aims to optimize (a numerical sketch follows this list): 

        min_W  tr(Wᵀ K L K W) + μ · tr(Wᵀ W)   subject to  Wᵀ K H K W = I

    where L encodes the maximum mean discrepancy (MMD) between the two domains, H is the centering matrix, and μ trades off domain-distance reduction against model complexity.
  • When statistics of the data change: cross-domain methods that adapt features.
    • domain transform
    • asymmetric transform
    • manifold walks
  • When labels are expensive: cross-knowledge methods that share features.
    • Sharing features across tasks
    • Visual taxonomies
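
A compact numerical sketch of the TCA optimization above, using numpy only and assuming the two domains share a feature space; the linear kernel, the trade-off mu, and the number of components m are illustrative assumptions rather than the paper's settings:

    import numpy as np

    def tca(Xs, Xt, m=2, mu=1.0):
        # Combined data and a (linear) kernel matrix K over source + target points.
        X = np.vstack([Xs, Xt])
        n_s, n_t = len(Xs), len(Xt)
        n = n_s + n_t
        K = X @ X.T

        # L encodes the MMD between domains: entries are 1/n_s^2 (source-source),
        # 1/n_t^2 (target-target), and -1/(n_s*n_t) (cross terms).
        e = np.concatenate([np.full(n_s, 1.0 / n_s), np.full(n_t, -1.0 / n_t)])
        L = np.outer(e, e)

        # H is the centering matrix; the constraint W'KHKW = I preserves variance.
        H = np.eye(n) - np.ones((n, n)) / n

        # The trace problem reduces to the leading eigenvectors of
        # (K L K + mu I)^{-1} K H K.
        A = np.linalg.solve(K @ L @ K + mu * np.eye(n), K @ H @ K)
        vals, vecs = np.linalg.eig(A)
        W = np.real(vecs[:, np.argsort(-np.real(vals))[:m]])

        Z = K @ W  # embedded points in the shared latent space
        return Z[:n_s], Z[n_s:]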

Challenges and Future Directions

  • How to avoid negative transfer: given a target domain/task, how do we find source domains/tasks that ensure positive transfer? 
  • Transfer learning meets active learning. 
  • Given a specific application, which kind of transfer learning methods should be used?
  • Speeding up transfer learning on large-scale problems.

Application in RecSys

  • TBA. 

References

  1. Crammer et al., 2008, Learning from Multiple Sources, JMLR. 
  2. Hal Daumé III, 2007, Frustratingly Easy Domain Adaptation, ACL. 
  3. Ivor Tsang, Domain Transfer Learning: Basics and Algorithms. 
  4. Schweikert et al., 2009, An Empirical Analysis of Domain Adaptation Algorithms for Genomic Sequence Analysis, NIPS. 
  5. Sinno Jialin Pan, Transfer Learning (tutorial slides). 
  6. Sinno Jialin Pan and Qiang Yang, 2010, A Survey on Transfer Learning, IEEE TKDE. (A tutorial will be given at IJCAI 2013.)
  7. Gretton et al., 2008, Covariate Shift by Kernel Mean Matching.
  8. Bickel et al., 2009, Discriminative Learning Under Covariate Shift, JMLR.
  9. A list of papers is available here.