[Paper Summary] Multi-source Domain Adaptation (continuously updated)

_孤鸿寄语_

Last modified 2022-11-10 03:18:41

First published 2022-10-27 02:51:26

Article link: https://blog.csdn.net/weixin_44563093/article/details/127525160

Single-source DA vs Multi-source DA

SUDA

Labeled data come from a single source domain.
Common solution: map the data from the source and target domains into a common feature space and learn domain-invariant representations by minimizing the domain distribution discrepancy (e.g., MMD).
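A minimal NumPy sketch of the discrepancy term mentioned above, assuming a Gaussian kernel and the biased estimator of the squared MMD. The function names `rbf_kernel` and `mmd2`, the bandwidth `gamma`, and the synthetic feature batches are illustrative assumptions, not taken from any of the papers.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2)."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def mmd2(source_feats, target_feats, gamma=1.0):
    """Biased estimate of the squared MMD between two feature batches."""
    k_ss = rbf_kernel(source_feats, source_feats, gamma)
    k_tt = rbf_kernel(target_feats, target_feats, gamma)
    k_st = rbf_kernel(source_feats, target_feats, gamma)
    return k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean()

# Synthetic features (assumption): one target close to the source, one shifted.
rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(200, 8))
tgt_near = rng.normal(0.0, 1.0, size=(200, 8))
tgt_far = rng.normal(2.0, 1.0, size=(200, 8))

mmd_near = mmd2(src, tgt_near, gamma=0.1)
mmd_far = mmd2(src, tgt_far, gamma=0.1)
```

In a real SUDA model this value would be computed on the network's feature outputs and added to the classification loss, so that minimizing it pulls the two feature distributions together.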

MUDA

Distribution shift also exists between the multiple source domains, which makes them hard to align.
Sometimes the domains do not even share the same set of classes.
Target samples near domain-specific decision boundaries may be assigned different labels by different classifiers.
Common solution: two-stage alignment
– Stage I: map each pair of source and target domains into its own feature space --> align the domain-specific distributions to learn multiple domain-invariant representations --> train multiple domain-specific classifiers on these representations
– Stage II: align the domain-specific classifiers

Paper I: Aligning Domain-specific Distribution and Classifier for Cross-domain Classification from Multiple Sources (AAAI’19)

Problem Formulation

Given

$N$ different underlying source distributions $\{p_{sj}(x,y)\}_{j=1}^{N}$ and labeled source domain data $\{(X_{sj}, Y_{sj})\}_{j=1}^{N}$ drawn from these distributions
target distribution $p_{t}(x,y)$, from which target domain data $X_t$ are sampled yet without label observation $Y_t$.

Objective

Methodology

Two-stage alignment Framework

Common feature extractor

A common subnetwork $f(\cdot)$ extracts common representations for all domains, mapping the images from the original feature space into a common feature space.

Domain-specific feature extractor

Given:

$x^{sj}$ from source domain $(X_{sj}, Y_{sj})$, $x^{t}$ from target domain $X_t$
$N$ unshared domain-specific subnetworks $h_j(\cdot)$, one for each source domain $(X_{sj}, Y_{sj})$, which map each pair of source and target domains into a specific feature space
These domain-specific feature extractors receive the common features $f(x^{sj})$ and $f(x^{t})$ from the common feature extractor $f(\cdot)$
An MMD, adversarial, or CORAL loss can be used to reduce the distribution discrepancy between domains.
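The data flow described above can be sketched as follows. This is only an illustration under stated assumptions: the extractors are single random linear layers with ReLU rather than trained networks, the discrepancy is a linear-kernel MMD (squared distance between feature means), and all names (`W_f`, `W_h`, `linear_mmd2`) and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N_SOURCES, IN_DIM, COMMON_DIM, SPECIFIC_DIM = 3, 16, 12, 6

# Shared weights for the common extractor f(.)
W_f = rng.normal(scale=0.1, size=(IN_DIM, COMMON_DIM))
# One unshared weight matrix per source domain for h_j(.)
W_h = [rng.normal(scale=0.1, size=(COMMON_DIM, SPECIFIC_DIM))
       for _ in range(N_SOURCES)]

def f(x):
    """Common feature extractor shared by all domains."""
    return np.maximum(x @ W_f, 0.0)

def h(j, common):
    """Domain-specific extractor for the j-th source/target pair."""
    return np.maximum(common @ W_h[j], 0.0)

def linear_mmd2(a, b):
    """Linear-kernel MMD: squared distance between the batch means."""
    diff = a.mean(axis=0) - b.mean(axis=0)
    return float(diff @ diff)

# Synthetic batches (assumption): each source has its own input shift.
source_batches = [rng.normal(loc=j, size=(32, IN_DIM)) for j in range(N_SOURCES)]
target_batch = rng.normal(loc=0.5, size=(32, IN_DIM))

common_t = f(target_batch)  # computed once, reused by every branch
mmd_losses = []
for j, x_s in enumerate(source_batches):
    z_s = h(j, f(x_s))      # source j in its own feature space
    z_t = h(j, common_t)    # target mapped into the same space
    mmd_losses.append(linear_mmd2(z_s, z_t))

mmd_loss = sum(mmd_losses) / N_SOURCES
```

The point of the sketch is that the target batch passes through every branch, so each branch aligns the target with exactly one source.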

Domain-specific classifier

$C$ is a multi-output net composed of $N$ domain-specific predictors $\{C_j\}_{j=1}^N$.
For each classifier, a cross-entropy classification loss is added.
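A small sketch of the classification term, assuming that classifier $C_j$ only sees the labeled batch of its own source domain and that the per-classifier losses are simply summed. The helper names and the toy logits are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels under softmax(logits)."""
    probs = softmax(logits)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked + 1e-12).mean())

def classification_loss(logits_per_source, labels_per_source):
    """Sum of cross-entropy losses, one per domain-specific classifier C_j."""
    return sum(
        cross_entropy(logits, labels)
        for logits, labels in zip(logits_per_source, labels_per_source)
    )

# Toy outputs of two classifiers on their own source batches (assumption).
logits = [np.array([[2.0, 0.0], [0.0, 2.0]]), np.array([[0.0, 0.0]])]
labels = [np.array([0, 1]), np.array([1])]
total = classification_loss(logits, labels)
```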

Two Alignments

Domain-specific Distribution Alignment: MMD
Domain-specific Classifier Alignment:
- Intuition: the same target sample predicted by different classifiers should get the same prediction
- Use the absolute difference between the probabilistic outputs of every pair of classifiers on target-domain data as the discrepancy loss
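The classifier alignment term can be sketched as below. Averaging over pairs, samples, and classes is a normalization chosen here for illustration; the notes do not specify it. `classifier_discrepancy` and the toy probabilities are hypothetical.

```python
import numpy as np
from itertools import combinations

def classifier_discrepancy(target_probs):
    """Mean absolute difference between all pairs of classifier outputs.

    target_probs: list of (batch, classes) arrays, one per classifier,
    all computed on the same target batch.
    """
    pairs = list(combinations(range(len(target_probs)), 2))
    total = 0.0
    for i, j in pairs:
        total += np.abs(target_probs[i] - target_probs[j]).mean()
    return float(total / len(pairs))

# Softmax outputs of three classifiers on two target samples (toy values).
p1 = np.array([[0.9, 0.1], [0.2, 0.8]])
p2 = np.array([[0.7, 0.3], [0.2, 0.8]])
p3 = np.array([[0.9, 0.1], [0.4, 0.6]])
disc = classifier_discrepancy([p1, p2, p3])
```

The loss is zero only when all classifiers agree on every target sample, which matches the intuition stated above.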

Overall Multiple Feature Spaces Adaptation Network (MFSAN)

Note: the domain-specific classifier alignment loss is computed on target samples.

Paper II: Multi-source Distilling Domain Adaptation (AAAI’20)

Motivation

Limitations of the state-of-the-art MDA methods:

They sacrifice the discriminative property of the extracted features for the desired task learner in order to learn domain-invariant features.
They treat the multiple sources equally and fail to account for the differing discrepancies between each source and the target, as illustrated in Figure 1. Such treatment may lead to suboptimal performance when some sources are very different from the target (Zhao et al. 2018a).
They treat different samples from each source equally, without distilling the source data based on the fact that different samples from the same source domain may differ in their similarity to the target.
Adversarial learning based methods suffer from the vanishing gradient problem when the domain classifier network can perfectly distinguish target representations from source ones.

Problem Formulation

Given

$M$ different labeled source domains $S_1, S_2, \ldots, S_M$ and a fully unlabeled target domain $T$
Homogeneity: data from different domains are observed in the same feature space but exhibit different distributions
Closed set: all domains share the same categories in the class label space

Objective

To learn an adaptation model that can correctly predict a sample from the target domain based on $\{(X_i, Y_i)\}_{i=1}^M$ and $X_T$

MDDA Framework

Source Classifier Pre-training (Step 1)

Pre-train a feature extractor $F_i$ and classifier $C_i$ for each labeled source domain $S_i$, with unshared weights between different domains. $F_i$ and $C_i$ are optimized by minimizing a cross-entropy loss on the labeled source data.
Compared with a shared feature extractor network, the unshared feature extractor networks can obtain discriminative feature representations and accurate classifiers for each source domain.

Adversarial Discriminative Adaptation (Step 2)

Fix feature extractor $F_i$ .
Learn a separate target encoder $F_i^T$ to map the target features into the same feature space as source $S_i$.
A discriminator $D_i$ is trained adversarially to distinguish the encoded target features from $F_i^T$ from the source features from the pre-trained $F_i$, i.e. to maximize the estimated Wasserstein distance between the two feature distributions, while $F_i^T$ tries to make $D_i$ fail, i.e. to minimize that distance. This is still a two-player minimax game.
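The two sides of this minimax game can be illustrated with a critic-based Wasserstein estimate. This sketch makes several assumptions that are not in the notes: the critic is a single linear map, weight clipping stands in for whatever Lipschitz constraint is actually used, and the features are synthetic. Only the critic update is shown; the encoder update is described in a comment.

```python
import numpy as np

def critic(feats, w):
    """Hypothetical linear critic D_i; a real one would be a small network."""
    return feats @ w

def wasserstein_estimate(source_feats, target_feats, w):
    """Critic-based estimate: E[D(source)] - E[D(target)]."""
    return float(critic(source_feats, w).mean() - critic(target_feats, w).mean())

rng = np.random.default_rng(0)
source_feats = rng.normal(1.0, 1.0, size=(256, 4))  # stands in for F_i(x_i)
target_feats = rng.normal(0.0, 1.0, size=(256, 4))  # stands in for F_i^T(x_T)
w = np.zeros(4)

# Critic step: gradient ascent on the estimate. For a linear critic the
# gradient w.r.t. w is the difference of the feature means. Clipping keeps
# the weights bounded (a crude Lipschitz constraint, assumed here).
for _ in range(50):
    grad_w = source_feats.mean(axis=0) - target_feats.mean(axis=0)
    w = np.clip(w + 0.05 * grad_w, -1.0, 1.0)

distance = wasserstein_estimate(source_feats, target_feats, w)
# Encoder step (not shown): F_i^T would be updated to decrease `distance`,
# moving the target features toward the fixed source features.
```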

Source Distilling (Step 3)

Within each source domain, select the source training samples that are closest to the target according to the estimated Wasserstein distance, and use them to fine-tune the source classifier (in this paper, $\frac{N_i}{2}$ of the source data is selected).
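One way to picture the selection step, assuming each source sample is ranked by how far its critic score is from the mean critic score of the target. This ranking rule is an interpretation made for the sketch, and `distill_source` with its `keep_ratio` parameter is a hypothetical helper.

```python
import numpy as np

def distill_source(source_scores, target_scores, keep_ratio=0.5):
    """Indices of the source samples whose critic score is closest to the
    mean critic score of the target (assumed selection rule)."""
    gap = np.abs(source_scores - target_scores.mean())
    n_keep = int(len(source_scores) * keep_ratio)
    # Take the n_keep smallest gaps, then return the indices in sorted order.
    return np.sort(np.argsort(gap)[:n_keep])

# Toy critic outputs (assumption): D_i on source and on encoded target features.
source_scores = np.array([0.1, 2.0, 0.4, 1.5, -0.2, 3.0])
target_scores = np.array([0.0, 0.2, 0.1])
selected = distill_source(source_scores, target_scores)
```

The source classifier would then be fine-tuned only on the samples indexed by `selected`.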

Aggregated Target Prediction (Step 4)

Extract the features $F_i^T(x_T)$ of the target image using the target encoder learned in Step 2;
Obtain the source-specific prediction $C_i'(F_i^T(x_T))$ using the distilled source classifier;
Combine the predictions from all source classifiers to obtain the final prediction;
The weighting strategy is based on the discrepancy between each source and the target, emphasizing more relevant sources and suppressing irrelevant ones.
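A sketch of discrepancy-based aggregation. The notes do not give the weighting formula, so the mapping $w_i \propto \exp(-d_i^2/2)$ used here is an assumption chosen only because it decreases with the discrepancy $d_i$. The function names and toy values are hypothetical.

```python
import numpy as np

def source_weights(distances):
    """Map per-source discrepancies to normalized weights
    (smaller distance -> larger weight). The exp(-d^2/2) form is assumed."""
    raw = np.exp(-np.asarray(distances, dtype=float) ** 2 / 2.0)
    return raw / raw.sum()

def aggregate(predictions, distances):
    """Weighted average of per-source class probabilities.

    predictions: (M, classes) array, row i is C_i'(F_i^T(x_T)) for one sample.
    """
    w = source_weights(distances)
    return w @ np.asarray(predictions)

# Three sources: predictions for one target sample and their discrepancies.
preds = np.array([[0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
dists = [0.5, 3.0, 1.0]
weights = source_weights(dists)
final = aggregate(preds, dists)
```

Here the second source disagrees with the other two but sits far from the target, so its vote is almost entirely suppressed.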