[Paper Notes] Multi-source Domain Adaptation (continuously updated)



Single-source DA vs Multi-source DA

SUDA

  • Labeled data come from a single source domain
  • Common solution: map the data from the source and target domains into a common feature space and learn domain-invariant representations by minimizing the domain distribution discrepancy (e.g., with MMD); a minimal MMD sketch follows this list
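As a concrete illustration of the discrepancy-minimization idea above, here is a minimal, hedged PyTorch sketch of a single-bandwidth RBF-kernel MMD loss. Real methods typically use multi-kernel variants; the function names and the `sigma` bandwidth are illustrative choices, not from any of the papers below.

```python
import torch

def rbf_kernel(a, b, sigma=1.0):
    # pairwise squared Euclidean distances between rows of a and b
    dist_sq = torch.cdist(a, b) ** 2
    return torch.exp(-dist_sq / (2 * sigma ** 2))

def mmd_loss(source_feat, target_feat, sigma=1.0):
    # simple (biased) MMD^2 estimate: E[k(s,s)] + E[k(t,t)] - 2 E[k(s,t)]
    k_ss = rbf_kernel(source_feat, source_feat, sigma).mean()
    k_tt = rbf_kernel(target_feat, target_feat, sigma).mean()
    k_st = rbf_kernel(source_feat, target_feat, sigma).mean()
    return k_ss + k_tt - 2 * k_st
```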

MUDA

  • Distribution shift also exists between the multiple source domains themselves, which makes alignment harder
  • Sometimes the class sets covered by different domains are not even the same
  • Target samples near a domain-specific decision boundary may be assigned different labels by different classifiers
  • Common solution: two-stage alignment
    – Stage I: map each pair of source and target domains into a separate feature space --> align the domain-specific distributions to learn multiple domain-invariant representations --> train multiple domain-specific classifiers on these domain-invariant representations
    – Stage II: align the domain-specific classifiers
    [Figure: SUDA vs. MUDA]

Paper I: Aligning Domain-specific Distribution and Classifier for Cross-domain Classification from Multiple Sources (AAAI’19)


Problem Formulation

Given

  • $N$ different underlying source distributions $\{p_{sj}(x,y)\}_{j=1}^{N}$ and labeled source domain data $\{(X_{sj}, Y_{sj})\}_{j=1}^{N}$ drawn from these distributions
  • a target distribution $p_t(x,y)$, from which the target domain data $X_t$ are sampled, but without label observations $Y_t$

Objective

To learn a model from the labeled multi-source data $\{(X_{sj}, Y_{sj})\}_{j=1}^{N}$ and the unlabeled target data $X_t$ that correctly predicts the target labels $Y_t$.


Methodology

[Figure: framework overview]

Two-stage alignment Framework
Common feature extractor

A common subnetwork $f(\cdot)$ is used to extract common representations for all domains; it maps images from the original input space into a common feature space.

Domain-specific feature extractor

Given:

  • $x^{sj}$ is a sample from source domain $(X_{sj}, Y_{sj})$, and $x^{t}$ is a sample from the target domain $X_t$
  • There are $N$ unshared domain-specific subnetworks $h_j(\cdot)$, one for each source domain $(X_{sj}, Y_{sj})$, which map each pair of source and target domains into a specific feature space
  • Each domain-specific feature extractor receives the common features $f(x^{sj})$ and $f(x^{t})$ from the common feature extractor $f(\cdot)$
  • An MMD, adversarial, or CORAL loss is used to reduce the distribution discrepancy between each source-target pair
Domain-specific classifier
  • $C$ is a multi-output network composed of $N$ domain-specific predictors $\{C_j\}_{j=1}^N$.
  • For each classifier, a cross-entropy classification loss is added (written out after this list):
    [Equation: cross-entropy classification loss]
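Written out schematically in the notation above (my reconstruction of the loss the original figure showed, with $K$ denoting the number of classes), the classification term is the cross-entropy summed over the $N$ source domains:

$$
\mathcal{L}_{cls} \;=\; -\,\frac{1}{N}\sum_{j=1}^{N}\; \mathbb{E}_{(x^{sj},\, y^{sj}) \sim (X_{sj},\, Y_{sj})} \sum_{k=1}^{K} \mathbb{1}\!\left[y^{sj}=k\right]\, \log\!\big[C_j\big(h_j(f(x^{sj}))\big)\big]_k
$$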
Two Alignments
  • Domain-specific Distribution Alignment: MMD
  • Domain-specific Classifier Alignment:
    • Intuition: different domain-specific classifiers should produce the same prediction for the same target sample
    • Use the absolute values of the differences between all pairs of classifiers' probabilistic outputs on target domain data as the discrepancy loss (a sketch follows this list)
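A hedged PyTorch sketch of this classifier-alignment loss (variable names are illustrative, not the authors' code): it averages the absolute difference between the softmax outputs of every pair of domain-specific classifiers on the same target batch.

```python
import torch
import torch.nn.functional as F

def classifier_discrepancy(target_logits_list):
    """target_logits_list: one [batch, num_classes] logits tensor per domain-specific classifier C_j."""
    probs = [F.softmax(logits, dim=1) for logits in target_logits_list]
    loss, n_pairs = 0.0, 0
    for i in range(len(probs)):
        for j in range(i + 1, len(probs)):
            # L1 gap between the two classifiers' predictions on the target batch
            loss = loss + torch.mean(torch.abs(probs[i] - probs[j]))
            n_pairs += 1
    return loss / max(n_pairs, 1)  # average over all classifier pairs
```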
Overall Multiple Feature Spaces Adaptation Network (MFSAN)

[Figure: overall MFSAN architecture]
Note: the domain-specific classifier alignment loss is computed on target samples.
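Schematically, MFSAN trains everything jointly by combining the per-source classification losses with the two alignment terms. This is an illustrative formulation in the notation above; the trade-off weights $\lambda$ and $\gamma$ are symbols introduced here, not necessarily the paper's exact notation:

$$
\min_{f,\,\{h_j\},\,\{C_j\}} \; \mathcal{L}_{cls} \;+\; \lambda \, \frac{1}{N}\sum_{j=1}^{N} \mathrm{MMD}\big(h_j(f(X_{sj})),\, h_j(f(X_t))\big) \;+\; \gamma \, \mathcal{L}_{disc}
$$

where $\mathcal{L}_{cls}$ is the cross-entropy term above and $\mathcal{L}_{disc}$ is the classifier discrepancy computed on target samples.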


Paper II: Multi-source Distilling Domain Adaptation (AAAI'20)


Motivation

Limitations of the state-of-the-art MDA methods:

  1. They sacrifice the discriminative power of the extracted features for the desired task learner in order to learn domain-invariant features.
  2. They treat the multiple sources equally and fail to consider the different discrepancies between each source and the target, as illustrated in Figure 1. Such treatment may lead to suboptimal performance when some sources are very different from the target (Zhao et al. 2018a).
  3. They treat different samples from each source equally, without distilling the source data, even though different samples from the same source domain can have different similarities to the target.
  4. Adversarial-learning-based methods suffer from the vanishing-gradient problem when the domain classifier network can perfectly distinguish target representations from source ones.
[Figure 1: varying discrepancies between the source domains and the target domain]

Problem Formulation

Given

  • $M$ different labeled source domains $S_1, S_2, \dots, S_M$ and a fully unlabeled target domain $T$
  • Homogeneity: data from different domains are observed in the same feature space but exhibit different distributions
  • Closed set: all domains share the same class label space
Objective

To learn an adaptation model that can correctly predict samples from the target domain based on $\{(X_i, Y_i)\}_{i=1}^M$ and $X_T$.


[Figure: MDDA framework overview]

MDDA Framework
Source Classifier Pre-training (Step 1)
  • Pre-train a feature extractor $F_i$ and a classifier $C_i$ for each labeled source domain $S_i$, with weights unshared between domains. $F_i$ and $C_i$ are optimized by minimizing a cross-entropy loss on the source labels.
  • Compared with a shared feature extractor, the unshared extractors yield more discriminative feature representations and more accurate classifiers for each source domain (see the sketch after this list).
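A hedged PyTorch-style sketch of this pre-training step. The model definitions and the data loader are placeholders, not the authors' code; the point is that each source domain trains its own $F_i$ and $C_i$ with standard cross-entropy.

```python
import torch
import torch.nn as nn

def pretrain_source(F_i, C_i, source_loader, epochs=10, lr=1e-3):
    # Each source domain i gets its own (unshared) extractor F_i and classifier C_i.
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(list(F_i.parameters()) + list(C_i.parameters()), lr=lr)
    for _ in range(epochs):
        for x, y in source_loader:
            logits = C_i(F_i(x))      # source-specific features -> class logits
            loss = criterion(logits, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return F_i, C_i
```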
Adversarial Discriminative Adaptation (Step 2)
  • Fix the pre-trained source feature extractor $F_i$.
  • Learn a separate target encoder $F_i^T$ that maps target samples into the same feature space as source $S_i$.
  • A discriminator $D_i$ is trained adversarially to estimate (maximize) the Wasserstein distance between the encoded target features from $F_i^T$ and the source features from the pre-trained $F_i$, while $F_i^T$ tries to minimize this estimated distance (still a two-player minimax game); a sketch of both updates follows this list.
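A rough PyTorch-style sketch of the two alternating updates for one source domain $i$. The function names, and the note about weight clipping / gradient penalty, are assumptions made here for illustration, not the authors' code.

```python
import torch

def critic_step(D_i, F_src, F_tgt, x_s, x_t, opt_D):
    # The critic maximizes E[D(f_s)] - E[D(f_t)], an estimate of the Wasserstein
    # distance between the frozen source features and the encoded target features.
    with torch.no_grad():
        f_s = F_src(x_s)   # frozen pre-trained source extractor F_i
        f_t = F_tgt(x_t)   # target encoder output (detached for the critic update)
    loss_D = -(D_i(f_s).mean() - D_i(f_t).mean())
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()
    # NOTE: weight clipping or a gradient penalty is needed to keep D_i Lipschitz.
    return -loss_D.item()  # current distance estimate

def encoder_step(D_i, F_tgt, x_t, opt_F):
    # The target encoder F_i^T minimizes the same distance estimate, i.e. it tries
    # to make encoded target features indistinguishable from source features.
    loss_F = -D_i(F_tgt(x_t)).mean()
    opt_F.zero_grad()
    loss_F.backward()
    opt_F.step()
```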
Source Distilling (Step 3)
  • Dig into each source domain and select the source training samples that are closer to the target, based on the estimated Wasserstein distance, to fine-tune the source classifiers (in this paper, $\frac{N_i}{2}$ of the source samples are selected); see the sketch below.
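A hedged sketch of the selection step. The convention that a lower critic score means "closer to the target" is an assumption made here for illustration; the paper ranks samples by their estimated Wasserstein distance to the target.

```python
import torch

def select_closest_half(D_i, F_src, x_source):
    # Score every source sample with the trained critic and keep the half whose
    # features look closest to the target distribution (assumed: lower = closer).
    with torch.no_grad():
        scores = D_i(F_src(x_source)).squeeze(-1)  # one critic score per sample
    k = x_source.shape[0] // 2                     # keep N_i / 2 samples
    _, idx = torch.topk(-scores, k)                # indices of the smallest scores
    return x_source[idx]
```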
Aggregated Target Prediction (Step 4)
  • Extract the features $F_i^T(x_T)$ of a target image with the target encoder learned in Step 2;
  • Obtain the source-specific prediction $C_i'(F_i^T(x_T))$ using the distilled source classifier;
  • Combine the predictions from the different source classifiers to obtain the final prediction;
  • The weighting strategy is based on the discrepancy between each source and the target, emphasizing more relevant sources and suppressing irrelevant ones (see the aggregation sketch below).
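A minimal sketch of this weighted aggregation. The softmax over negative source-target distances is one plausible weighting rule used here for illustration, not necessarily the paper's exact formula.

```python
import torch
import torch.nn.functional as F

def aggregate_predictions(preds, source_target_distances):
    """preds: list of [batch, num_classes] probability tensors, one per source.
    source_target_distances: tensor of shape [num_sources] (e.g. Wasserstein estimates)."""
    weights = F.softmax(-source_target_distances, dim=0)  # closer source -> larger weight
    stacked = torch.stack(preds, dim=0)                   # [num_sources, batch, num_classes]
    return (weights[:, None, None] * stacked).sum(dim=0)  # weighted final prediction
```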
