GAN [NIPS 2014]

This post examines two ways of measuring the difference between probability distributions, KL divergence and JS divergence, covering their mathematical definitions and their role in machine learning. By analyzing the value function $V(D, G)$, it explains the relationship between the optimal discriminator $D^*$ and the optimal generator $G^*$ in GANs (Generative Adversarial Networks), showing how the generative model approaches the true data distribution by minimizing a divergence (here shown to be the JS divergence). It also discusses the Jensen-Shannon divergence, which is non-negative and zero only when the two distributions are identical, and its use in assessing distribution similarity.

KL divergence: the larger the KL value, the greater the difference between the two distributions; the smaller the KL value, the smaller the difference. Note that KL divergence is asymmetric in general ($D_{KL}(P\|Q) \neq D_{KL}(Q\|P)$), which is one motivation for the symmetric JS divergence defined below.

KL divergence:

$$D_{KL}(P \| Q) = \sum_{i=1}^{N} P(x_i) \log \frac{P(x_i)}{Q(x_i)}$$

JS divergence:

$$JSD(P \| Q) = \frac{1}{2} D_{KL}(P \| M) + \frac{1}{2} D_{KL}(Q \| M), \qquad M = \frac{1}{2}(P + Q)$$
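As a quick numerical check of these definitions, here is a minimal sketch in plain NumPy (the distributions `p` and `q` are made-up examples):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) for discrete distributions given as probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # eps keeps log() finite when q_i -> 0; terms with p_i = 0 contribute 0.
    return np.sum(np.where(p > 0, p * np.log((p + eps) / (q + eps)), 0.0))

def js_divergence(p, q):
    """JSD(P || Q) = 0.5 * D_KL(P || M) + 0.5 * D_KL(Q || M), M = (P + Q) / 2."""
    m = 0.5 * (np.asarray(p, dtype=float) + np.asarray(q, dtype=float))
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

p = [0.4, 0.4, 0.2]
q = [0.1, 0.5, 0.4]
print(kl_divergence(p, q))  # asymmetric: differs from kl_divergence(q, p)
print(js_divergence(p, q))  # symmetric, bounded above by log 2
```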

$$G^* = \arg\min_G \mathrm{Div}(P_G, P_{data})$$

where $x = G(z)$ and $P_G(x)$ is the distribution of the generated samples. Since $P_G$ is defined only implicitly through sampling, this divergence cannot be evaluated in closed form; the discriminator is introduced to estimate it.


$$\begin{aligned} D^* &= \arg\max_D V(D, G) \\ &= \frac{P_{data}(x)}{P_{data}(x) + P_G(x)} \end{aligned}$$
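This maximizer comes from a pointwise argument (Proposition 1 in the paper); a short sketch:

$$V(G, D) = \int_x \Big[ P_{data}(x) \log D(x) + P_G(x) \log\big(1 - D(x)\big) \Big] \, dx$$

For each fixed $x$, the integrand has the form $f(y) = a \log y + b \log(1-y)$ with $a = P_{data}(x)$ and $b = P_G(x)$; solving $f'(y) = \frac{a}{y} - \frac{b}{1-y} = 0$ gives the unique maximum at $y = \frac{a}{a+b}$, which is exactly $D^*(x)$ above.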
When $G$ is fixed:

$$\begin{aligned} \max_D V(G, D) &= V(G, D^*) \\ &= \mathbb{E}_{x \sim P_{data}} \log D^*(x) + \mathbb{E}_{x \sim P_G} \log\big(1 - D^*(x)\big) \\ &= -2 \log 2 + 2\, JSD(P_{data} \| P_G) \end{aligned}$$
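The last line follows by substituting $D^*$ and rewriting in terms of $M = \frac{1}{2}(P_{data} + P_G)$:

$$\begin{aligned} V(G, D^*) &= \mathbb{E}_{x \sim P_{data}} \log \frac{P_{data}(x)}{P_{data}(x) + P_G(x)} + \mathbb{E}_{x \sim P_G} \log \frac{P_G(x)}{P_{data}(x) + P_G(x)} \\ &= \mathbb{E}_{x \sim P_{data}} \log \frac{P_{data}(x)}{2 M(x)} + \mathbb{E}_{x \sim P_G} \log \frac{P_G(x)}{2 M(x)} \\ &= -2 \log 2 + D_{KL}(P_{data} \| M) + D_{KL}(P_G \| M) \\ &= -2 \log 2 + 2\, JSD(P_{data} \| P_G) \end{aligned}$$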

  • Since the Jensen–Shannon divergence between two distributions is always non-negative and zero only when they are equal, we have shown that $C^* = -\log 4$ is the global minimum of $C(G)$ and that the only solution is $p_g = p_{data}$, i.e., the generative model perfectly replicating the data generating process.
  • This shows that $V(D, G)$ and $\mathrm{Div}(P_{data}, P_G)$ are related: maximizing $V$ over $D$ yields, up to a constant, the JS divergence between the two distributions, as verified in the quick check below.
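As a sanity check of the global optimum: if $P_G = P_{data}$, then

$$D^*(x) = \frac{P_{data}(x)}{2 P_{data}(x)} = \frac{1}{2}, \qquad V(G, D^*) = \mathbb{E} \log \tfrac{1}{2} + \mathbb{E} \log \tfrac{1}{2} = -2 \log 2 = -\log 4,$$

which matches $C^* = -\log 4$ above.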

$$\begin{aligned} G^* &= \arg\min_G \mathrm{Div}(P_G, P_{data}) \\ &= \arg\min_G \max_D V(G, D) \end{aligned}$$
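In practice this minimax is solved by alternating gradient steps on $D$ and $G$. Below is a minimal PyTorch-style sketch; the network shapes, learning rates, and toy 2-D Gaussian "real" data are illustrative placeholders, not from the paper:

```python
import torch
import torch.nn as nn

# Illustrative placeholder networks; any G/D pair with matching shapes works.
G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 2))
D = nn.Sequential(nn.Linear(2, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
eps = 1e-8  # keeps log() away from 0

def train_step(x_real):
    z = torch.randn(x_real.size(0), 64)

    # Inner max over D: ascend E[log D(x)] + E[log(1 - D(G(z)))].
    x_fake = G(z).detach()  # detach: this step updates D only
    loss_D = -(torch.log(D(x_real) + eps).mean()
               + torch.log(1 - D(x_fake) + eps).mean())
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Outer min over G: descend E[log(1 - D(G(z)))].
    # (The paper also suggests the non-saturating alternative:
    # maximize E[log D(G(z))] for stronger early gradients.)
    loss_G = torch.log(1 - D(G(z)) + eps).mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()

# Toy usage: "real" samples from a shifted 2-D Gaussian.
for step in range(1000):
    x_real = 0.5 * torch.randn(32, 2) + torch.tensor([2.0, 2.0])
    train_step(x_real)
```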


Theoretical Results

[Screenshots of the paper's theoretical results]