[Paper Reading] Universal Domain Adaptation

SUMMARY@2020/3/27


Motivation

This paper focuses on the setting of universal domain adaptation, where

  • no prior information about the target label set is provided;
  • only the source domain comes with labeled data.

(The figures in the paper illustrate the motivation of this setting and how it relates to existing domain adaptation settings.)

Related Work

This work partly builds on earlier partial domain adaptation work from Mingsheng Long's group, such as:

  • SAN (Partial Transfer Learning with Selective Adversarial Networks)
    • utilizes multiple domain discriminators with class-level and instance-level weighting mechanisms to achieve per-class adversarial distribution matching.
  • PADA (Partial Adversarial Domain Adaptation)
    • uses a single adversarial network and jointly applies class-level weighting on the source classifier
    • haven't read it yet

and on related work from other groups:

  • IWAN (Importance Weighted Adversarial Nets for Partial Domain Adaptation)
    • constructs an auxiliary domain discriminator to quantify the probability of a source sample being similar to the target domain.
    • haven't read it yet

All of these works partly apply the ideas of generative adversarial networks and their domain adaptation counterpart:

  • GAN (Generative Adversarial Nets)
  • DANN (Domain-Adversarial Training of Neural Networks)
    • adversarial-based deep domain adaptation method

Challenges / Aims / Contribution

Under the universal domain adaptation setting, the goal is to match the common categories shared by the source and target domains. The main challenges of this universal problem are:

  • how to handle $\bar{C}_s$, the part of the source label set unrelated to the target, so as to circumvent negative transfer to the target domain

  • how to perform effective domain adaptation between the related part of the source domain and the target domain

  • how to learn a model (feature extractor & classifier) that minimizes the target risk on the common label set $C$

Method Proposed

UAN (Universal Adaptation Network) consists of four parts in the training phase, as the architecture figure in the paper shows.

Feature extractor $F$
  • learns features that match the source and target domains
  • provides good features for the classifier to use
Label classifier $G$
  • computes the predicted label $\hat y = G(F(\mathrm{x})) \in C_s$ (the source label set)

  • the classification loss is minimized over the parameters of $F$ and $G$ (a minimal sketch follows this block):
    $E_G = \mathbb{E}_{(\mathrm{x}, y)\sim p}\, L(y, G(F(\mathrm{x})))$
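A minimal PyTorch sketch of $E_G$, with toy stand-ins for `F` and `G` and hypothetical input shapes (nothing here is the paper's actual architecture):

```python
import torch
import torch.nn as nn

# Toy stand-ins for the feature extractor F and the |C_s|-way classifier G.
F = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU())
G = nn.Linear(256, 10)                    # assume |C_s| = 10 source classes

x_s = torch.randn(8, 3, 32, 32)           # mini-batch of labeled source images
y_s = torch.randint(0, 10, (8,))          # ground-truth source labels

logits = G(F(x_s))                        # \hat y = G(F(x)), defined over C_s
E_G = nn.functional.cross_entropy(logits, y_s)  # E_G = E_{(x,y)~p} L(y, G(F(x)))
E_G.backward()                            # gradients reach both F and G
```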

Non-adversarial domain discriminator $D^\prime$
  • computes the similarity of each $\mathrm{x}$ to the source domain

    • $\hat d^\prime = D^\prime(\mathrm{z}) \in [0, 1]$, where $\mathrm{z} = F(\mathrm{x})$
    • $\hat d^\prime \rightarrow 1$ if $\mathrm{x}$ is more similar to the source domain
  • its domain classification loss is minimized so that $\hat d^\prime$ is reliable for every sample from both domains (a sketch follows this block):
    $E_{D^\prime} = -\mathbb{E}_{\mathrm{x}\sim p} \log D^\prime(F(\mathrm{x})) - \mathbb{E}_{\mathrm{x}\sim q} \log(1 - D^\prime(F(\mathrm{x})))$

  • hypothesis: the expected similarity value differs across the label-set regions; this ordering is later used to weight the adversarial domain discriminator $D$:
    $\mathbb{E}_{\mathrm{x}\sim p_{\bar{C}_s}} \hat d^\prime > \mathbb{E}_{\mathrm{x}\sim p_{C}} \hat d^\prime > \mathbb{E}_{\mathrm{x}\sim q_{C}} \hat d^\prime > \mathbb{E}_{\mathrm{x}\sim q_{\bar{C}_t}} \hat d^\prime$

  • $D^\prime$ is not trained adversarially: used adversarially it would reduce to DANN, which matches the source and target label spaces exactly and may cause negative transfer in the universal setting.
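A matching sketch of $E_{D^\prime}$ under the same toy assumptions; since only $D^\prime$ minimizes $E_{D^\prime}$, the features are detached here so that $F$ receives no gradient from this loss (my reading of the $\min_{D^\prime} E_{D^\prime}$ objective):

```python
import torch
import torch.nn as nn

# Toy non-adversarial domain discriminator D' on 256-d features (assumed size).
D_prime = nn.Sequential(nn.Linear(256, 64), nn.ReLU(),
                        nn.Linear(64, 1), nn.Sigmoid())

z_s = torch.randn(8, 256)                 # placeholder for F(x), source batch
z_t = torch.randn(8, 256)                 # placeholder for F(x), target batch

d_s = D_prime(z_s.detach())               # \hat d' should approach 1 on source
d_t = D_prime(z_t.detach())               # \hat d' should approach 0 on target
E_D_prime = -(torch.log(d_s + 1e-8).mean()
              + torch.log(1 - d_t + 1e-8).mean())
```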

Adversarial domain discriminator $D$
  • aims to discriminate source from target within the common label set $C$

  • the weighted domain discrimination loss is minimized w.r.t. $D$ (a good discriminator) and maximized w.r.t. $F$ (domain-confusing features):
    $E_D = -\mathbb{E}_{\mathrm{x}\sim p}\, w^s(\mathrm{x}) \log D(F(\mathrm{x})) - \mathbb{E}_{\mathrm{x}\sim q}\, w^t(\mathrm{x}) \log(1 - D(F(\mathrm{x})))$

  • larger weights are assigned to samples from the common label set in both domains, so that the source and target domains are maximally matched within $C$.

  • the weights (the "sample-level transferability criterion") should satisfy:
    $\mathbb{E}_{\mathrm{x}\sim p_{C}} w^s(\mathrm{x}) > \mathbb{E}_{\mathrm{x}\sim p_{\bar{C}_s}} w^s(\mathrm{x}) \qquad \mathbb{E}_{\mathrm{x}\sim q_{C}} w^t(\mathrm{x}) > \mathbb{E}_{\mathrm{x}\sim q_{\bar{C}_t}} w^t(\mathrm{x})$

  • the entropy of the predicted label vector measures prediction uncertainty and is expected to be ordered as:
    $\mathbb{E}_{\mathrm{x}\sim q_{\bar{C}_t}} H(\hat{\mathrm{y}}) > \mathbb{E}_{\mathrm{x}\sim q_{C}} H(\hat{\mathrm{y}}) > \mathbb{E}_{\mathrm{x}\sim p_{C}} H(\hat{\mathrm{y}}) > \mathbb{E}_{\mathrm{x}\sim p_{\bar{C}_s}} H(\hat{\mathrm{y}})$

  • combining the domain similarity and the prediction uncertainty of each sample yields a weighting mechanism that discovers the label set shared by both domains and promotes common-class adaptation (see the sketch after this list):
    $w^s(\mathrm{x}) = \dfrac{H(\hat{\mathrm{y}})}{\log|C_s|} - \hat d^\prime(\mathrm{x}) \qquad w^t(\mathrm{x}) = \hat d^\prime(\mathrm{x}) - \dfrac{H(\hat{\mathrm{y}})}{\log|C_s|}$

    • $H$ is normalized by its maximum value $\log|C_s|$
    • both weights are normalized together during training
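Putting the two signals together, a sketch of the weight computation; reading "normalized together" as per-mini-batch min-max normalization is my assumption:

```python
import torch

def transferability_weights(logits, d_prime, source: bool):
    """Sample-level transferability criterion as written above:
    w^s = H(y_hat)/log|C_s| - d',  w^t = d' - H(y_hat)/log|C_s|."""
    p = torch.softmax(logits, dim=1)
    entropy = -(p * torch.log(p + 1e-8)).sum(dim=1)                     # H(y_hat)
    entropy = entropy / torch.log(torch.tensor(float(logits.size(1))))  # / log|C_s|
    w = entropy - d_prime.squeeze(1) if source else d_prime.squeeze(1) - entropy
    # "normalized together during training": one plausible scheme (an assumption)
    # is min-max normalization within each mini-batch
    return (w - w.min()) / (w.max() - w.min() + 1e-8)
```

These weights then multiply the per-sample terms of $E_D$ above.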

Training

  • written as a GAN-style two-player objective; in the network it is implemented end-to-end using the gradient reversal layer from DANN (a minimal sketch follows):

    $\max\limits_{D} \min\limits_{F, G}\; E_G - \lambda E_D \qquad \min\limits_{D^\prime} E_{D^\prime}$
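The gradient reversal layer is the standard DANN trick: identity in the forward pass, negated (and scaled) gradient in the backward pass, so the $\max_D \min_{F,G}$ game reduces to one ordinary minimization. A minimal sketch:

```python
import torch

class GradReverse(torch.autograd.Function):
    """DANN-style gradient reversal: identity forward, -lam * grad backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None   # no gradient w.r.t. lam

def grl(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Feeding D with grl(F(x)) lets a single backward pass train D to discriminate
# domains while F receives the reversed gradient and learns to confuse D.
```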

Testing

(see the testing-phase figure in the paper)

  • the adversarial discriminator $D$ is no longer used
  • the weight $w^t(\mathrm{x})$ is computed for each target sample $\mathrm{x}$
  • a threshold $w_0$, chosen by validation, decides whether $\mathrm{x}$ comes from the common label set; samples below it are rejected as "unknown" (a sketch follows)
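A sketch of this decision rule; the threshold value below is a hypothetical placeholder, since the paper selects $w_0$ on validation data:

```python
import torch

def predict_with_unknown(logits, d_prime, w0=0.5):
    """Classify a target sample, or reject it as "unknown" when its weight
    w^t(x) falls below the threshold w0 (w0 = 0.5 is a made-up placeholder)."""
    p = torch.softmax(logits, dim=1)
    entropy = -(p * torch.log(p + 1e-8)).sum(dim=1)
    entropy = entropy / torch.log(torch.tensor(float(logits.size(1))))
    w_t = d_prime.squeeze(1) - entropy       # w^t(x) = d'(x) - H(y_hat)/log|C_s|
    preds = logits.argmax(dim=1)
    preds[w_t < w0] = -1                     # -1 encodes the "unknown" class
    return preds
```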

Experiment

  • $F$ is a ResNet-50 pretrained on ImageNet
  • all target-private classes are evaluated as a single "unknown" class
  • UAN outperforms methods designed for the previous domain adaptation settings