Domain Adaptation via Transfer Component Analysis: Personal Notes

Abstract:

        This paper proposes a new learning method, transfer component analysis (TCA), to find a good feature representation across domains for domain adaptation. TCA learns transfer components shared by all domains (i.e., components that do not cause cross-domain distribution change and that preserve the intrinsic structure of the original data), so that the distribution difference between domains is reduced in the projected subspace.

I. INTRODUCTION:

Content:

        Our main contribution is on proposing a novel dimensionality reduction method to reduce the distance between domains via projecting data onto a learned transfer subspace.

        TCA and its semisupervised extension SSTCA are much more efficient than MMDE and can handle the out-of-sample extension problem.

Excerpts:

        This is an important learning problem because labeled data are often difficult to come by, making it desirable to make the best use of any related data available. For example.....

        A major computational problem in domain adaptation is how to reduce the difference between the distributions of the source and target domain data. Intuitively, discovering a good feature representation across domains is crucial...

        In this paper, we propose a new feature extraction approach, called transfer component analysis (TCA), for domain adaptation. (The authors list prior work and its shortcomings before proposing their own method.)

        More specifically, if two domains are related to each other, there may exist several common components (or latent variables) underlying them.

II. PREVIOUS WORKS AND PRELIMINARIES

        Content: In Section II, we first introduce the domain adaptation problem and traditional dimensionality reduction methods, and describe the Hilbert space embedding for distances and dependence measures between distributions.

A. Domain Adaptation

        The main difference between these methods and our proposed method is that we aim to match data distributions between domains in a latent space, where data properties can be preserved, instead of matching them in the original feature space.

        Excerpt: The key assumption in most domain adaptation methods is that P ≠ Q, but P(Ys|Xs) = P(Yt|Xt).

B. Hilbert Space Embedding of Distributions

        In 2006, Borgwardt et al. [1] proposed a distribution distance criterion based on the reproducing kernel Hilbert space (RKHS): maximum mean discrepancy (MMD).

        The MMD between the source samples Xs = {xs1, ..., xsn1} and the target samples Xt = {xt1, ..., xtn2} is

        Dist(Xs, Xt) = || (1/n1) Σ_{i=1..n1} φ(xsi) − (1/n2) Σ_{j=1..n2} φ(xtj) ||_H

        where ||·||_H is the RKHS norm and φ is the kernel feature map. Given samples from the source and target domains, the mapping that minimizes this expression is the one we want.
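To make the criterion concrete, here is a minimal NumPy sketch of the empirical (biased) MMD estimate, assuming a Gaussian kernel; the helper names rbf_kernel and mmd2 are my own, not from the paper:

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    # Pairwise squared Euclidean distances, then the Gaussian kernel.
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-sq / (2.0 * sigma**2))

def mmd2(Xs, Xt, sigma=1.0):
    """Biased estimate of squared MMD between samples Xs and Xt,
    via the kernel trick: MMD^2 = mean(Kss) - 2*mean(Kst) + mean(Ktt)."""
    Kss = rbf_kernel(Xs, Xs, sigma)
    Kst = rbf_kernel(Xs, Xt, sigma)
    Ktt = rbf_kernel(Xt, Xt, sigma)
    return Kss.mean() - 2.0 * Kst.mean() + Ktt.mean()
```

The kernel trick is what makes this computable: the squared RKHS norm expands into expectations of kernel evaluations, so φ never has to be written down explicitly.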

C. Embedding Using HSIC

III. TCA

Excerpt:

        As mentioned in Section II-A, most domain adaptation methods assume that P ≠ Q, but P(Ys|Xs) = P(Yt|Xt). However, in many real-world applications, the conditional probability P(Y|X) may also change across domains due to noisy or dynamic factors underlying the observed data.

A. Minimizing Distance Between P(φ(Xs)) and P(φ(Xt))

        Instead of finding the nonlinear transformation φ explicitly, we first revisit a dimensionality reduction-based domain adaptation method called MMDE.

The input is the two feature matrices (source and target). First compute the matrices L and H; then choose a common kernel (e.g., a linear or Gaussian kernel) to compute K; finally take the m leading eigenvectors of ((KLK + μI)^-1)·KHK. These eigenvectors form the solution W, and KW gives the dimension-reduced source and target data, on which traditional machine learning methods can then be applied.
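The recipe above translates almost line by line into code. Below is a minimal NumPy sketch, assuming a Gaussian kernel; the function name tca and the defaults for m, mu, and sigma are illustrative choices, not from the paper:

```python
import numpy as np

def tca(Xs, Xt, m=2, mu=1.0, sigma=1.0):
    """Project source Xs and target Xt into a shared m-dimensional space."""
    n1, n2 = Xs.shape[0], Xt.shape[0]
    n = n1 + n2
    X = np.vstack([Xs, Xt])

    # Step 1: kernel matrix K over all source + target points (Gaussian kernel).
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2.0 * X @ X.T
    K = np.exp(-sq / (2.0 * sigma**2))

    # Step 2: MMD coefficient matrix L = e e^T, with e_i = 1/n1 (source)
    # or -1/n2 (target), so that tr(KL) equals the squared MMD.
    e = np.vstack([np.ones((n1, 1)) / n1, -np.ones((n2, 1)) / n2])
    L = e @ e.T

    # Step 3: centering matrix H = I - (1/n) 1 1^T.
    H = np.eye(n) - np.ones((n, n)) / n

    # Step 4: the m leading eigenvectors of (K L K + mu*I)^{-1} K H K give W.
    M = np.linalg.solve(K @ L @ K + mu * np.eye(n), K @ H @ K)
    vals, vecs = np.linalg.eig(M)
    idx = np.argsort(-vals.real)[:m]
    W = vecs[:, idx].real

    # Step 5: embedded data KW; first n1 rows are source, the rest target.
    Z = K @ W
    return Z[:n1], Z[n1:]
```

Because the matrix M is not symmetric, np.linalg.eig is used and the (numerically) real parts of the leading eigenvectors are kept.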

Summary:

The final optimization objective of TCA is: min_W tr(W^T·KLK·W) + μ·tr(W^T·W), subject to W^T·KHK·W = I.

H is the centering matrix, H = I − (1/(n1+n2))·1·1^T, where 1 is the all-ones column vector.

W is the matrix of transfer components being solved for.
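One step is worth filling in: how this constrained trace objective leads to the eigen-decomposition used above. The following Lagrangian sketch is my addition (the multiplier matrix Z is my notation, not the paper's):

```latex
\mathcal{L}(W,Z)=\operatorname{tr}\!\big(W^{\top}(KLK+\mu I)W\big)
                -\operatorname{tr}\!\big((W^{\top}KHKW-I)Z\big),
\qquad
\frac{\partial\mathcal{L}}{\partial W}=0
\;\Longrightarrow\;
(KLK+\mu I)\,W = KHK\,W Z .
```

Hence the columns of the optimal W are the m leading eigenvectors of ((KLK + μI)^-1)·KHK, exactly the matrix whose eigen-decomposition appears in the recipe of Section III-A.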

Once the transfer component matrix W has been obtained from the TCA computation, it can be used to project data from both the source and target domains into a common latent space. Standard machine learning methods can then be applied to the transformed data for tasks such as classification or regression: train a classifier or regressor on the transformed source-domain data and apply it to the transformed target-domain data.
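As a hypothetical end-to-end example (building on the tca() sketch above, with synthetic toy data and LogisticRegression standing in for any downstream model):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data standing in for a real adaptation task: the target domain
# is a covariate-shifted version of the source domain.
rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(100, 5))
ys = (Xs[:, 0] > 0).astype(int)
Xt = rng.normal(0.5, 1.0, size=(80, 5))

Zs, Zt = tca(Xs, Xt, m=2, mu=1.0, sigma=1.0)  # project both domains
clf = LogisticRegression().fit(Zs, ys)         # train on projected source
yt_pred = clf.predict(Zt)                      # apply to projected target
```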
