cGANs with Projection Discriminator

Introduction

  • We propose a novel, projection-based way to incorporate the conditional information into the discriminator of GANs that respects the role of the conditional information in the underlying probabilistic model (i.e., a function that measures the information-theoretic distance between the generative distribution and the target distribution).
    • By construction, any assumption about the form of the distribution acts as a regularization on the choice of the discriminator. In this paper, we propose a specific form of the discriminator, motivated by a probabilistic model in which the distribution of the conditional variable $\boldsymbol y$ given $\boldsymbol x$ is discrete or unimodal continuous.
    • As we will explain in the next section, adhering to this assumption gives rise to a discriminator structure that takes an inner product between the embedded condition vector $\boldsymbol y$ and the feature vector (Figure 1d).

The Architecture of the cGAN Discriminator under a Probabilistic Model Assumption


Notation

  • $\boldsymbol x$: input vector
  • $\boldsymbol y$: conditional information (when $\boldsymbol y$ is discrete label information, we assume it is encoded as a one-hot vector)
  • $D(\boldsymbol x, \boldsymbol y; \theta) := \mathcal A(f(\boldsymbol x, \boldsymbol y; \theta))$: the cGAN discriminator, where $\mathcal A$ is an activation function
  • $q$: the true distribution
  • $p$: the generated distribution

$f^*(\boldsymbol x, \boldsymbol y)$

  • The standard adversarial loss for the discriminator is given by
    $\mathcal L(D) = -\mathbb E_{q(\boldsymbol y)}\left[\mathbb E_{q(\boldsymbol x\mid \boldsymbol y)}\left[\log D(\boldsymbol x, \boldsymbol y)\right]\right] - \mathbb E_{p(\boldsymbol y)}\left[\mathbb E_{p(\boldsymbol x\mid \boldsymbol y)}\left[\log(1-D(\boldsymbol x, \boldsymbol y))\right]\right]$
    with $\mathcal A$ in $D$ representing the sigmoid function.
  • Following the derivation for the standard GAN, and assuming $D$ can represent an arbitrary function, the optimal discriminator $D^*(\boldsymbol x,\boldsymbol y)$ is
    $D^*(\boldsymbol x,\boldsymbol y)=\dfrac{q(\boldsymbol x,\boldsymbol y)}{q(\boldsymbol x,\boldsymbol y)+p(\boldsymbol x,\boldsymbol y)}$
    Since the activation function is assumed to be the sigmoid,
    $\mathcal A(f(\boldsymbol x,\boldsymbol y;\theta))=\dfrac{1}{1+\exp(-f^*(\boldsymbol x,\boldsymbol y))}=D^*(\boldsymbol x,\boldsymbol y)=\dfrac{q(\boldsymbol x,\boldsymbol y)}{q(\boldsymbol x,\boldsymbol y)+p(\boldsymbol x,\boldsymbol y)}$
    and therefore
    $f^*(\boldsymbol x,\boldsymbol y)=\log\dfrac{q(\boldsymbol x,\boldsymbol y)}{p(\boldsymbol x,\boldsymbol y)}=\log\dfrac{q(\boldsymbol x\mid\boldsymbol y)\,q(\boldsymbol y)}{p(\boldsymbol x\mid\boldsymbol y)\,p(\boldsymbol y)}=\log\dfrac{q(\boldsymbol y\mid\boldsymbol x)}{p(\boldsymbol y\mid\boldsymbol x)}+\log\dfrac{q(\boldsymbol x)}{p(\boldsymbol x)}:=r(\boldsymbol y\mid\boldsymbol x)+r(\boldsymbol x)$
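The identities above are easy to check numerically. The sketch below uses small random discrete joints $q(x,y)$ and $p(x,y)$ (toy stand-ins, not from the paper) to verify that $\sigma(f^*)=D^*$ and that $f^*$ decomposes as $r(y\mid x)+r(x)$:

```python
import numpy as np

# Toy check of the optimal-discriminator identities, using small discrete
# joint distributions q(x, y) and p(x, y) over 3 x-values and 2 y-values.
rng = np.random.default_rng(0)
q = rng.random((3, 2)); q /= q.sum()   # "true" joint distribution
p = rng.random((3, 2)); p /= p.sum()   # "generated" joint distribution

f_star = np.log(q / p)                  # f*(x, y) = log q(x,y)/p(x,y)
d_star = 1.0 / (1.0 + np.exp(-f_star))  # sigmoid(f*) should equal q/(q+p)
assert np.allclose(d_star, q / (q + p))

# Decomposition f* = r(y|x) + r(x):
q_x, p_x = q.sum(axis=1, keepdims=True), p.sum(axis=1, keepdims=True)
r_y_given_x = np.log((q / q_x) / (p / p_x))  # log q(y|x)/p(y|x)
r_x = np.log(q_x / p_x)                      # log q(x)/p(x)
assert np.allclose(f_star, r_y_given_x + r_x)
```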

Motivation behind the Projection Discriminator

Log-linear model

  • The log-linear model is the most popular model for $p(y\mid x)$. Assume that $y$ is a categorical variable taking a value in $\{1,\dots,C\}$.
  • If we use a softmax to compute the probability that $x$ belongs to each class, then
    $p(y=c\mid x)=\dfrac{\exp(o_c)}{\sum_{j=1}^C\exp(o_j)}$
    where $o_j$ is an output of the network's final fully connected layer. We can decompose this layer as the product of its weight matrix $V^{pT}$ (size $C\times d^L$; the superscript $p$ indicates that these weights parameterize the distribution $p$) and the input vector $\phi(x)$ (size $d^L\times 1$; the feature representation extracted from $x$). The logit vector $o$ is then $o=V^{pT}\phi(x)$, with components
    $o_j=v_j^{pT}\phi(x)$
    Therefore,
    $\begin{aligned}\log p(y=c\mid x)&=\log \dfrac{\exp(v_c^{pT}\phi(x))}{\sum_{j=1}^C\exp(v_j^{pT}\phi(x))} \\&=v_c^{pT}\phi(x)-\log\Big(\sum_{j=1}^C\exp(v_j^{pT}\phi(x))\Big) \end{aligned}$
    Writing $Z^p(\phi(x)):=\sum_{j=1}^C\exp(v_j^{pT}\phi(x))$, this becomes
    $\log p(y=c\mid x)=v_c^{pT}\phi(x)-\log Z^p(\phi(x))$
  • Assuming $\log q(y=c\mid x)$ can also be written in this form, with the same $\phi$, the log-likelihood ratio becomes
    $\begin{aligned}\log\dfrac{q(y=c\mid x)}{p(y=c\mid x)}&=v_c^{qT}\phi(x)-\log Z^q(\phi(x))-v_c^{pT}\phi(x)+\log Z^p(\phi(x)) \\&=(v_c^{q}-v_c^{p})^T\phi(x)-\big(\log Z^q(\phi(x))-\log Z^p(\phi(x))\big) \end{aligned}$
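A quick numerical sketch of the log-linear algebra above, with random stand-ins for the trained weights $V^p$, $V^q$ and the feature vector $\phi(x)$ (all values here are hypothetical, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
C, dL = 4, 8
V_p = rng.normal(size=(dL, C))     # columns v_c^p; V^{pT} is C x dL
phi_x = rng.normal(size=dL)        # feature vector phi(x)

o = V_p.T @ phi_x                  # logits o = V^{pT} phi(x)
log_Z = np.log(np.exp(o).sum())    # log Z^p(phi(x))
log_softmax = o - log_Z            # log p(y=c|x) for each c

# Agrees with computing the softmax probabilities directly:
assert np.allclose(np.exp(log_softmax), np.exp(o) / np.exp(o).sum())

# The likelihood ratio between two log-linear models sharing phi reduces to
# (v_c^q - v_c^p)^T phi(x) minus a constant that does not depend on c:
V_q = rng.normal(size=(dL, C))
log_ratio = (V_q.T @ phi_x - np.log(np.exp(V_q.T @ phi_x).sum())) - log_softmax
diff = (V_q - V_p).T @ phi_x
assert np.allclose(log_ratio - diff, (log_ratio - diff)[0])  # constant in c
```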

  • Substituting the log-linear model into $f^*(x,y)$ gives
    $\begin{aligned}f^*(x,y=c)&=\log \dfrac{q(y=c \mid x)}{p(y=c \mid x)}+\log \dfrac{q(x)}{p(x)} \\&=(v_c^{q}-v_c^{p})^T\phi(x)-\big(\log Z^q(\phi(x))-\log Z^p(\phi(x))\big)+\log \dfrac{q(x)}{p(x)} \end{aligned}$
  • Putting $v_c=v_c^{q}-v_c^{p}$ and $\psi(\phi(x))=-\big(\log Z^q(\phi(x))-\log Z^p(\phi(x))\big)+\log \dfrac{q(x)}{p(x)}$, we get
    $f^*(x,y=c)=v_c^T\phi(x)+\psi(\phi(x))$
  • Let $V$ be the matrix whose rows are the vectors $v_c^T$. Since $y$ is a one-hot vector,
    $f^*(x,y)=y^TV\phi(x)+\psi(\phi(x))=(V^Ty)\cdot \phi(x)+\psi(\phi(x))$
    This yields the architecture in Figure 1d: the $\psi$ branch can be seen as judging whether $x$ is real, while the inner-product branch judges whether $x$ belongs to class $y$.
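A minimal numpy sketch of the final formula $f(x,y)=y^TV\phi(x)+\psi(\phi(x))$. Here `phi` (a `tanh` placeholder) and the linear head `w_psi` are hypothetical stand-ins for the learned feature extractor and the unconditional real/fake head, not the paper's actual networks:

```python
import numpy as np

rng = np.random.default_rng(2)
C, dL = 10, 16
V = rng.normal(size=(C, dL))       # class embedding matrix, rows v_c^T
w_psi = rng.normal(size=dL)        # linear psi head (hypothetical stand-in)

def phi(x):
    return np.tanh(x)              # placeholder feature extractor

def f(x, y_onehot):
    feat = phi(x)
    proj = y_onehot @ V @ feat     # y^T V phi(x): "does x match class y?"
    psi = w_psi @ feat             # psi(phi(x)): "is x real?"
    return proj + psi

x = rng.normal(size=dL)
y = np.eye(C)[3]                   # one-hot label for class 3
# Because y is one-hot, y^T V phi(x) just selects the inner product v_3 . phi(x):
assert np.isclose(f(x, y), V[3] @ phi(x) + w_psi @ phi(x))
```

In practice the projection term is implemented as an embedding lookup of $y$ followed by an inner product with the feature vector, which is exactly what the one-hot selection above reduces to.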

We refer to this model of the discriminator as projection for short.
