[Paper Reading] Universal Domain Adaptation

Universal Domain Adaptation

SUMMARY@2020/3/27


Motivation

This paper focuses on the special setting of universal domain adaptation, where

  • no prior information about the target label set is provided;
  • only the source domain comes with labeled data.

The following figure illustrates the motivation of this setting.

The following figure summarizes existing adaptation settings and how universal domain adaptation relates to them:

Related Work

This work partly builds on earlier work on partial domain adaptation from Mingsheng Long's group, such as:

  • SAN (Partial Transfer Learning with Selective Adversarial Networks)
    • utilizes multiple domain discriminators with a class-level and instance-level weighting mechanism to achieve per-class adversarial distribution matching.
  • PADA (Partial adversarial domain adaptation)
    • uses only one adversarial network and jointly applies class-level weighting on the source classifier
    • haven’t yet read

and some related work from other groups:

  • IWAN (Importance weighted adversarial nets for partial domain adaptation)
    • constructs an auxiliary domain discriminator to quantify the probability of a source sample being similar to the target domain.
    • haven’t yet read

All of these works build in part on the idea of adversarial networks (GAN) and their domain adaptation counterpart:

  • GAN (Generative Adversarial Nets)
  • DANN (Domain-Adversarial Training of Neural Networks)
    • an adversarial, deep-learning-based domain adaptation method

Challenges / Aims / Contribution

Under the universal domain adaptation setting, the goal is to match the common categories between the source and target domains. The main challenges of this universal problem are:

  • how to deal with the $\bar{C_s}$ part of the source domain (the source-private classes unrelated to the target) so as to circumvent negative transfer to the target domain

  • how to achieve effective domain adaptation between the related part of the source domain and the target domain

  • how to learn a model (feature extractor & classifier) that minimizes the target risk on the common label set $C$

Method Proposed

UAN (Universal Adaptation Network) consists of four parts in the training phase, as the following figure shows.

Feature extractor $F$
  • learns features that align the source and target domains
  • provides good features for the classifier
Label classifier $G$
  • computes the predicted label $\hat y = G(F(\mathrm x)) \in C_s$ (the source domain label set)

  • the classification loss is minimized over the parameters of $F$ and $G$ (a minimal sketch follows this block):
    $E_G = \mathbb E_{(\mathrm{x,y})\sim p}\, L(\mathrm{y}, G(F(\mathrm x)))$
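A minimal sketch of $E_G$ in PyTorch (an assumed framework; `F_net` and `G_net` are hypothetical module names standing in for $F$ and $G$):

```python
import torch.nn.functional as TF

def classification_loss(F_net, G_net, x_s, y_s):
    """E_G: cross-entropy on a labeled source mini-batch (x_s, y_s)."""
    logits = G_net(F_net(x_s))            # \hat y = G(F(x)) over the source label set C_s
    return TF.cross_entropy(logits, y_s)  # E_G = E_{(x,y)~p} L(y, G(F(x)))
```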

Non-adversarial domain discriminator $D'$
  • computes the similarity of each sample $\mathrm x$ to the source domain

    • $\hat d' = D'(\mathrm z) \in [0, 1]$
    • $\hat d' \rightarrow 1$ if $\mathrm x$ is more similar to the source domain
  • the domain classification loss is minimized so that $\hat d'$ becomes a reliable similarity score for every sample from both domains (see the sketch after this block):
    $E_{D'} = -\mathbb E_{\mathrm x\sim p}\log(D'(F(\mathrm x))) - \mathbb E_{\mathrm x\sim q}\log(1 - D'(F(\mathrm x)))$

  • hypothesis: the expected similarity differs across the label-set partitions, and this ordering will be used to weight the adversarial domain discriminator $D$:
    $\mathbb E_{\mathrm x\sim p_{\bar{C_s}}} \hat d' > \mathbb E_{\mathrm x\sim p_{C}} \hat d' > \mathbb E_{\mathrm x\sim q_{C}} \hat d' > \mathbb E_{\mathrm x\sim q_{\bar{C_t}}} \hat d'$

  • $D'$ is not trained adversarially: an adversarial $D'$ would be the same as in DANN, which matches the exact same source and target label spaces and may cause negative transfer in the universal setting.
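A minimal sketch of $E_{D'}$, assuming `D_prime` is a small network ending in a sigmoid so its output $\hat d'$ lies in $[0, 1]$ (all names are hypothetical):

```python
import torch

def non_adversarial_domain_loss(F_net, D_prime, x_s, x_t, eps=1e-6):
    """E_{D'}: D' is trained to output 1 on source features and 0 on target features."""
    d_s = D_prime(F_net(x_s)).clamp(eps, 1 - eps)  # \hat d' for source samples
    d_t = D_prime(F_net(x_t)).clamp(eps, 1 - eps)  # \hat d' for target samples
    return -(torch.log(d_s).mean() + torch.log(1 - d_t).mean())
```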

Adversarial domain discriminator $D$
  • aims to discriminate source from target within the common label set $C$

  • the weighted domain discrimination loss: minimized over $D$ for a good discriminator, and maximized over $F$ so that the feature extractor learns domain-invariant representations:
    $E_{D} = -\mathbb E_{\mathrm x\sim p}\, w^s(\mathrm x)\log(D(F(\mathrm x))) - \mathbb E_{\mathrm x\sim q}\, w^t(\mathrm x)\log(1 - D(F(\mathrm x)))$

  • large weights are assigned to samples from the common label set in both domains, so that the source and target distributions are matched mainly within the common label set.

  • the weights (the "sample-level transferability criterion") should satisfy:
    $\mathbb E_{\mathrm x\sim p_{C}} w^s(\mathrm x) > \mathbb E_{\mathrm x\sim p_{\bar{C_s}}} w^s(\mathrm x)$
    $\mathbb E_{\mathrm x\sim q_{C}} w^t(\mathrm x) > \mathbb E_{\mathrm x\sim q_{\bar{C_t}}} w^t(\mathrm x)$

  • the entropy of the predicted label vector measures the uncertainty of the prediction:
    $\mathbb E_{\mathrm x\sim q_{\bar{C_t}}} H(\hat{\mathrm y}) > \mathbb E_{\mathrm x\sim q_{C}} H(\hat{\mathrm y}) > \mathbb E_{\mathrm x\sim p_{C}} H(\hat{\mathrm y}) > \mathbb E_{\mathrm x\sim p_{\bar{C_s}}} H(\hat{\mathrm y})$

  • combining the domain similarity and the prediction uncertainty of each sample yields a weighting mechanism that discovers the label set shared by both domains and promotes common-class adaptation (a code sketch follows this block):
    $w^s(\mathrm x) = \dfrac{H(\hat{\mathrm y})}{\log|C_s|} - \hat d'(\mathrm x)$
    $w^t(\mathrm x) = \hat d'(\mathrm x) - \dfrac{H(\hat{\mathrm y})}{\log|C_s|}$

    • $H(\hat{\mathrm y})$ is normalized by its maximum value $\log|C_s|$
    • the weights are normalized within each mini-batch during training
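A minimal sketch of the sample-level transferability criterion, assuming `probs` is the softmax output of $G$ and `d_prime` is the $\hat d'$ output of $D'$ for the same batch (helper names are hypothetical):

```python
import math
import torch

def normalized_entropy(probs, eps=1e-8):
    """H(y_hat) / log|C_s|: prediction uncertainty normalized to [0, 1]."""
    H = -(probs * torch.log(probs + eps)).sum(dim=1)
    return H / math.log(probs.size(1))

def source_weight(probs_s, d_prime_s):
    """w^s(x) = H(y_hat)/log|C_s| - d'(x): larger for shared-class source samples."""
    return normalized_entropy(probs_s) - d_prime_s.view(-1)

def target_weight(probs_t, d_prime_t):
    """w^t(x) = d'(x) - H(y_hat)/log|C_s|: larger for shared-class target samples."""
    return d_prime_t.view(-1) - normalized_entropy(probs_t)
```

In practice the weights would also be normalized within each mini-batch before entering $E_D$, as noted above.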

Training

  • the objective can be written as a GAN-style two-player (minimax) problem, but it is implemented end-to-end in the network using the gradient reversal layer from DANN:

    $\max_{D}\min_{F,G}\; E_G - \lambda E_D, \qquad \min_{D'} E_{D'}$
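A hedged end-to-end training sketch, assuming PyTorch, sigmoid outputs for both discriminators, a DANN-style gradient reversal layer, and the loss/weight helpers from the sketches above (all module and function names are hypothetical):

```python
import torch
import torch.nn.functional as TF

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; gradient scaled by -lam in the backward pass.
    This turns the max over D / min over F, G into one end-to-end minimization."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

def train_step(F_net, G_net, D_net, D_prime, opt, x_s, y_s, x_t, lam=1.0, eps=1e-6):
    z_s, z_t = F_net(x_s), F_net(x_t)
    logits_s, logits_t = G_net(z_s), G_net(z_t)

    # E_G: source classification loss
    loss_cls = TF.cross_entropy(logits_s, y_s)

    # E_{D'}: non-adversarial similarity; features are detached so D' trains on its own
    d_s, d_t = D_prime(z_s.detach()), D_prime(z_t.detach())
    loss_dp = TF.binary_cross_entropy(d_s, torch.ones_like(d_s)) + \
              TF.binary_cross_entropy(d_t, torch.zeros_like(d_t))

    # sample-level transferability weights, treated as constants (no gradient through them)
    w_s = source_weight(TF.softmax(logits_s, dim=1).detach(), d_s.detach())
    w_t = target_weight(TF.softmax(logits_t, dim=1).detach(), d_t.detach())

    # weighted adversarial loss E_D: D minimizes it, while the reversal layer makes
    # F maximize it with strength lam (i.e. max_D min_{F,G} E_G - lam * E_D)
    dd_s = D_net(GradReverse.apply(z_s, lam)).view(-1)
    dd_t = D_net(GradReverse.apply(z_t, lam)).view(-1)
    loss_adv = -(w_s * torch.log(dd_s + eps)).mean() \
               - (w_t * torch.log(1 - dd_t + eps)).mean()

    opt.zero_grad()
    (loss_cls + loss_dp + loss_adv).backward()
    opt.step()
```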

Testing

see the figure below:

  • the adversarial discriminator $D$ is no longer used
  • calculate the weight $w^t(\mathrm x)$ for each target sample $\mathrm x$
  • compare $w^t(\mathrm x)$ with a validated threshold to decide whether $\mathrm x$ belongs to the common label set; samples below the threshold are rejected as "unknown" (a minimal sketch follows this list)
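A minimal inference sketch under the same assumptions (`w0` stands for the validated threshold, `target_weight` comes from the earlier sketch, and `-1` is just a placeholder index for the "unknown" class):

```python
import torch

@torch.no_grad()
def predict(F_net, G_net, D_prime, x, w0):
    """Assign a source class to each target sample, or mark it as 'unknown'."""
    z = F_net(x)
    probs = torch.softmax(G_net(z), dim=1)
    w_t = target_weight(probs, D_prime(z))  # sample-level transferability weight
    labels = probs.argmax(dim=1)
    labels[w_t < w0] = -1                   # below threshold -> rejected as "unknown"
    return labels
```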

Experiment

  • $F$ is a pretrained ResNet-50
  • all target samples outside the source label set are grouped into one big "unknown" class
  • UAN performs better than methods designed for the earlier, more restrictive settings