Learning in Implicit Generative Models

Implicit generative models, such as the generator in a GAN, define the generative process directly and have no likelihood function, so they cannot be learned by maximising a likelihood the way a VAE is. Learning instead starts from the hypothesis that the true data distribution equals the model distribution, $p^{\star}(\mathbf x)=q_\theta(\mathbf x)$, and proceeds in two steps: comparison and estimation. In the comparison step, a comparator $r(\mathbf x)$, either the density difference $r(\mathbf x)=p^{\star}(\mathbf x)-q_\theta(\mathbf x)$ or the density ratio $r(\mathbf x)=p^{\star}(\mathbf x)/q_\theta(\mathbf x)$, measures how far the model's samples are from the real data. In the estimation step, the information provided by the comparator is used to update the parameters $\theta$ of the implicit model.
There are four approaches to learning implicit models, as shown in the figure.

[Figure: the four approaches to learning in implicit generative models]
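As a toy illustration (my own example, not from the original text), the two comparators can be evaluated in closed form for a pair of 1-D Gaussians standing in for $p^{\star}$ and $q_\theta$:

```python
import numpy as np

# Toy comparison step: evaluate the density difference and density ratio
# for a known "true" distribution p* = N(0, 1) and model q_theta = N(0.5, 1).
# (These Gaussians are illustrative choices, not from the original text.)
def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

x = np.linspace(-3.0, 3.0, 7)
p_star = gauss_pdf(x, 0.0, 1.0)    # true density p*(x)
q_theta = gauss_pdf(x, 0.5, 1.0)   # model density q_theta(x)

diff = p_star - q_theta            # density difference r(x) = p*(x) - q(x)
ratio = p_star / q_theta           # density ratio      r(x) = p*(x) / q(x)

# If the model matched the truth exactly, diff would be 0 and ratio 1 everywhere.
print(diff)
print(ratio)
```

In practice neither density is available in closed form; the point of the four methods below is to estimate $r(\mathbf x)$ from samples alone.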

Class Probability Estimation

Let the data space be $\mathcal X \subset \mathbb R^d$. Draw $n$ samples $\mathcal X_p=\{\mathbf x_1^{(p)},\dots,\mathbf x_n^{(p)}\}$ from the true distribution and, likewise, $n'$ samples $\mathcal X_q=\{\mathbf x_1^{(q)},\dots,\mathbf x_{n'}^{(q)}\}$ from the model distribution. Assign the label $y=1$ to samples from the true distribution and $y=0$ to samples from the model. We can then write $p^{\star}(\mathbf x)=p(\mathbf x\mid y=1)$ and $q_\theta(\mathbf x)=p(\mathbf x\mid y=0)$, so that
$$\begin{aligned} \frac{p^{\star}(\mathbf x)}{q_\theta(\mathbf x)}&=\frac{p(\mathbf x\mid y=1)}{p(\mathbf x\mid y=0)}=\frac{p(y=1\mid\mathbf x)\,p(\mathbf x)}{p(y=1)}\Big/\frac{p(y=0\mid\mathbf x)\,p(\mathbf x)}{p(y=0)}\\ &=\frac{p(y=1\mid\mathbf x)}{p(y=0\mid\mathbf x)}\cdot\frac{1-\pi}{\pi} \end{aligned}$$
Ratio estimation is therefore really class-probability estimation. Here $p(y=1)=\pi$ is the marginal class probability, usually set by hand: typically $\pi=1/2$, or for imbalanced data $\frac{1-\pi}{\pi}\approx n'/n$.
The task now becomes specifying a scoring function, or discriminator, $\mathcal D(\mathbf x;\boldsymbol\phi)=p(y=1\mid\mathbf x)\in[0,1]$. The density ratio and the discriminator output are related by $\mathcal D=r/(r+1)$ and $r=\mathcal D/(1-\mathcal D)$. Common scoring functions are shown below.

[Figure: common scoring functions for class probability estimation]

A common choice is the Bernoulli loss
$$\begin{aligned}\mathcal L(\boldsymbol\phi,\boldsymbol\theta)&=\mathbb E_{p(\mathbf x\mid y)p(y)}\left[-y\log\mathcal D(\mathbf x;\boldsymbol\phi)-(1-y)\log(1-\mathcal D(\mathbf x;\boldsymbol\phi))\right]\\&=\pi\,\mathbb E_{p^{*}(\mathbf x)}\left[-\log\mathcal D(\mathbf x;\boldsymbol\phi)\right]+(1-\pi)\,\mathbb E_{q_\theta(\mathbf x)}\left[-\log(1-\mathcal D(\mathbf x;\boldsymbol\phi))\right]\end{aligned}$$
Since $q_\theta(\mathbf x)$ is defined through a generator,
$$\mathcal L(\boldsymbol\phi,\boldsymbol\theta)=\pi\,\mathbb E_{p^{*}(\mathbf x)}\left[-\log\mathcal D(\mathbf x;\boldsymbol\phi)\right]+(1-\pi)\,\mathbb E_{q(\mathbf z)}\left[-\log(1-\mathcal D(\mathcal G(\mathbf z;\boldsymbol\theta);\boldsymbol\phi))\right]$$
This is exactly the objective used by GANs. It can be optimised with bi-level optimisation:
$$\begin{aligned}\text{Ratio loss: }&\min_{\boldsymbol\phi}\ \pi\,\mathbb E_{p^{*}(\mathbf x)}\left[-\log\mathcal D(\mathbf x;\boldsymbol\phi)\right]+(1-\pi)\,\mathbb E_{q_\theta(\mathbf x)}\left[-\log(1-\mathcal D(\mathbf x;\boldsymbol\phi))\right]\\ \text{Generative loss: }&\min_{\boldsymbol\theta}\ \mathbb E_{q(\mathbf z)}\left[\log(1-\mathcal D(\mathcal G(\mathbf z;\boldsymbol\theta)))\right]\end{aligned}$$
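As a sketch (assuming a toy 1-D setup of my own, not anything from the text), the $\mathcal D \leftrightarrow r$ conversions and a Monte Carlo estimate of the Bernoulli loss with $\pi=1/2$ look like this; the discriminator here is the closed-form Bayes-optimal one for the two Gaussians chosen, not a trained network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Conversions between the discriminator output D and the density ratio r.
def ratio_to_D(r):
    return r / (r + 1.0)

def D_to_ratio(D):
    return D / (1.0 - D)

# Toy data: p* = N(1, 1), q_theta = N(-1, 1). For these two Gaussians the
# Bayes-optimal discriminator is exactly sigmoid(2x), since log r*(x) = 2x.
def discriminator(x):
    return 1.0 / (1.0 + np.exp(-2.0 * x))

x_real = rng.normal(1.0, 1.0, size=10_000)    # samples from p*
x_fake = rng.normal(-1.0, 1.0, size=10_000)   # samples from q_theta

# Monte Carlo estimate of the Bernoulli (GAN) loss with pi = 1/2.
pi = 0.5
loss = (pi * np.mean(-np.log(discriminator(x_real)))
        + (1.0 - pi) * np.mean(-np.log(1.0 - discriminator(x_fake))))
print(f"Bernoulli loss estimate: {loss:.3f}")
```

In a real GAN the discriminator is a network trained on the ratio loss while the generator is updated on the generative loss, alternating the two.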

Divergence Minimisation

The second approach computes a divergence between $p^{\star}$ and $q_\theta$, which brings in the f-divergences:
$$\begin{aligned}D_f\left[p^{*}(\mathbf x)\,\|\,q_\theta(\mathbf x)\right]&=\int q_\theta(\mathbf x)\,f\!\left(\frac{p^{*}(\mathbf x)}{q_\theta(\mathbf x)}\right)d\mathbf x=\mathbb E_{q_\theta(\mathbf x)}\left[f(r(\mathbf x))\right]\\&\ge\sup_t\ \mathbb E_{p^{*}(\mathbf x)}\left[t(\mathbf x)\right]-\mathbb E_{q_\theta(\mathbf x)}\left[f^{\dagger}(t(\mathbf x))\right]\end{aligned}$$
Here $f$ is a convex, lower-semicontinuous function and $f^{\dagger}$ is its Fenchel conjugate,
$$f^{\dagger}(t)=\sup_{u\in\mathrm{dom}_f}\{ut-f(u)\}$$
The conjugate inherits these properties of $f$ (convex, lower-semicontinuous) and has a Fenchel conjugate of its own, with $f^{\dagger\dagger}=f$. Hence
$$\begin{aligned}D_f(P\,\|\,Q)&=\int_{\mathcal X}q(x)\sup_{t\in\mathrm{dom}_{f^{\dagger}}}\left\{t\,\frac{p(x)}{q(x)}-f^{\dagger}(t)\right\}dx\\&\ge\sup_{t\in\mathcal T}\left(\int_{\mathcal X}p(x)\,t(x)\,dx-\int_{\mathcal X}q(x)\,f^{\dagger}(t(x))\,dx\right)\\&=\sup_{t\in\mathcal T}\left(\mathbb E_{x\sim P}\left[t(x)\right]-\mathbb E_{x\sim Q}\left[f^{\dagger}(t(x))\right]\right)\end{aligned}$$
which enables min-max training. Equality in the bound holds at $t^{*}(\mathbf x)=f'(r(\mathbf x))$; substituting this back gives
$$\mathcal L=\mathbb E_{p^{*}(\mathbf x)}\left[-f'(r_\phi(\mathbf x))\right]+\mathbb E_{q_\theta(\mathbf x)}\left[f^{\dagger}(f'(r_\phi(\mathbf x)))\right]$$
where at the optimum $r_\phi=r^{*}=p^{*}/q_\theta$. The optimisation objectives are
$$\begin{aligned}\text{Ratio loss: }&\min_{\boldsymbol\phi}\ \mathbb E_{p^{*}(\mathbf x)}\left[-f'(r_\phi(\mathbf x))\right]+\mathbb E_{q_\theta(\mathbf x)}\left[f^{\dagger}(f'(r_\phi(\mathbf x)))\right]\\ \text{Generative loss: }&\min_{\boldsymbol\theta}\ \mathbb E_{q(\mathbf z)}\left[-f^{\dagger}(f'(r(\mathcal G(\mathbf z;\boldsymbol\theta))))\right]\end{aligned}$$
The density ratio also suggests the approximation $p^{*}(\mathbf x)\approx\tilde p(\mathbf x)=r_\phi(\mathbf x)\,q_\theta(\mathbf x)$, so
$$D_{KL}\left[p^{*}(\mathbf x)\,\|\,\tilde p(\mathbf x)\right]=\int p^{*}(\mathbf x)\log\frac{p^{*}(\mathbf x)}{r_\phi(\mathbf x)\,q_\theta(\mathbf x)}\,d\mathbf x+\int\left(r_\phi(\mathbf x)\,q_\theta(\mathbf x)-p^{*}(\mathbf x)\right)d\mathbf x$$
which is the KL divergence for unnormalised distributions. Expanding,
$$\mathcal L=\mathbb E_{p^{*}(\mathbf x)}\left[-\log r_\phi(\mathbf x)\right]+\mathbb E_{q_\theta(\mathbf x)}\left[r_\phi(\mathbf x)-1\right]-\mathbb E_{p^{*}(\mathbf x)}\left[\log q_\theta(\mathbf x)\right]+\mathbb E_{p^{*}(\mathbf x)}\left[\log p^{*}(\mathbf x)\right]$$
The ratio loss follows directly as the terms involving $\phi$. A generative loss cannot be obtained this way, however: the third term requires $\log q_\theta(\mathbf x)$, which is unavailable for an implicit model.
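To make the Fenchel machinery concrete, here is a small numerical check (my own example on discrete distributions, not from the text) for the KL case $f(u)=u\log u$, where $f'(u)=\log u+1$ and $f^{\dagger}(t)=e^{t-1}$: the variational objective equals $D_f$ at $t^{*}=f'(r)$ and lower-bounds it for every other $t$.

```python
import numpy as np

# KL divergence corresponds to f(u) = u*log(u), with
# f'(u) = log(u) + 1 and Fenchel conjugate f†(t) = exp(t - 1).
p = np.array([0.5, 0.3, 0.2])   # "true" distribution p*
q = np.array([0.4, 0.4, 0.2])   # model distribution q_theta
r = p / q                        # exact density ratio r*(x)

kl = np.sum(q * r * np.log(r))   # D_f[p || q] = E_q[f(r)]

# Variational objective: E_p[t] - E_q[f†(t)], maximised at t* = f'(r).
def objective(t):
    return np.sum(p * t) - np.sum(q * np.exp(t - 1.0))

t_star = np.log(r) + 1.0
print(kl, objective(t_star))    # equal at t* (Fenchel-Young with equality)
print(objective(t_star - 0.3))  # any other t gives a strictly smaller value
```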

Ratio Matching

Directly match the true density ratio $r^{*}(\mathbf x)=p^{*}(\mathbf x)/q_\theta(\mathbf x)$ and the estimated ratio $r_\phi(\mathbf x)$ under a squared loss:
$$\begin{aligned}\mathcal L&=\frac12\int q_\theta(\mathbf x)\left(r_\phi(\mathbf x)-r^{*}(\mathbf x)\right)^2 d\mathbf x\\&=\frac12\,\mathbb E_{q_\theta(\mathbf x)}\left[r_\phi(\mathbf x)^2\right]-\mathbb E_{p^{*}(\mathbf x)}\left[r_\phi(\mathbf x)\right]+\frac12\,\mathbb E_{p^{*}(\mathbf x)}\left[r^{*}(\mathbf x)\right]\\&=\frac12\,\mathbb E_{q_\theta(\mathbf x)}\left[r_\phi(\mathbf x)^2\right]-\mathbb E_{p^{*}(\mathbf x)}\left[r_\phi(\mathbf x)\right]\quad\text{s.t. } r_\phi(\mathbf x)\ge 0\end{aligned}$$
where the last line drops the term that does not depend on $\phi$. Both the ratio loss and the generative loss follow directly from this objective. Beyond squared error, one can use a Bregman divergence, of which the squared error above is a special case:
$$\begin{aligned}B_f\left(r^{*}(\mathbf x)\,\|\,r_\phi(\mathbf x)\right)&=\mathbb E_{q_\theta(\mathbf x)}\left[f(r^{*}(\mathbf x))-f(r_\phi(\mathbf x))-f'(r_\phi(\mathbf x))\left(r^{*}(\mathbf x)-r_\phi(\mathbf x)\right)\right]\\&=\mathbb E_{q_\theta(\mathbf x)}\left[r_\phi(\mathbf x)\,f'(r_\phi(\mathbf x))-f(r_\phi(\mathbf x))\right]-\mathbb E_{p^{*}}\left[f'(r_\phi(\mathbf x))\right]+D_f\left[p^{*}(\mathbf x)\,\|\,q_\theta(\mathbf x)\right]\\&=\mathcal L_B\left(r_\phi(\mathbf x)\right)+D_f\left[p^{*}(\mathbf x)\,\|\,q_\theta(\mathbf x)\right]\end{aligned}$$
The ratio loss is $\mathcal L_B(r_\phi(\mathbf x))$:
$$\begin{aligned}\mathcal L_B\left(r_\phi(\mathbf x)\right)&=\mathbb E_{p^{*}}\left[-f'(r_\phi(\mathbf x))\right]+\mathbb E_{q_\theta(\mathbf x)}\left[r_\phi(\mathbf x)\,f'(r_\phi(\mathbf x))-f(r_\phi(\mathbf x))\right]\\&=\mathbb E_{p^{*}}\left[-f'(r_\phi(\mathbf x))\right]+\mathbb E_{q_\theta(\mathbf x)}\left[f^{\dagger}(f'(r_\phi(\mathbf x)))\right]\end{aligned}$$
Using $f^{\dagger}(f'(x))=\max_r\left\{r\,f'(x)-f(r)\right\}$, this is exactly the objective obtained in the previous section. Proceeding further and collecting the terms that depend on $\theta$ gives the generative loss
$$\mathcal L(q_\theta)=\mathbb E_{q_\theta(\mathbf x)}\left[r_\phi(\mathbf x)\,f'(r_\phi(\mathbf x))\right]-\mathbb E_{q_\theta(\mathbf x)}\left[f(r_\phi(\mathbf x))\right]+D_f\left[p^{*}(\mathbf x)\,\|\,q_\theta(\mathbf x)\right]$$
which still contains $q_\theta(\mathbf x)$ inside the divergence and cannot be evaluated directly. Substituting the approximation $p^{*}\approx r_\phi q_\theta$,
$$D_f\left[p^{*}(\mathbf x)\,\|\,q_\theta(\mathbf x)\right]=\mathbb E_{q_\theta(\mathbf x)}\left[f\!\left(\frac{p^{*}(\mathbf x)}{q_\theta(\mathbf x)}\right)\right]\approx\mathbb E_{q_\theta(\mathbf x)}\left[f\!\left(\frac{r_\phi(\mathbf x)\,q_\theta(\mathbf x)}{q_\theta(\mathbf x)}\right)\right]=\mathbb E_{q_\theta(\mathbf x)}\left[f(r_\phi(\mathbf x))\right]$$
which cancels the second term above, leaving
$$\begin{aligned}\text{Ratio loss: }&\min_{\boldsymbol\phi}\ \mathbb E_{q_\theta(\mathbf x)}\left[r_\phi(\mathbf x)\,f'(r_\phi(\mathbf x))-f(r_\phi(\mathbf x))\right]-\mathbb E_{p^{*}}\left[f'(r_\phi(\mathbf x))\right]\\ \text{Generative loss: }&\min_{\boldsymbol\theta}\ \mathbb E_{q_\theta(\mathbf x)}\left[r_\phi(\mathbf x)\,f'(r_\phi(\mathbf x))\right]\end{aligned}$$
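A minimal sketch of least-squares ratio matching, under assumptions of my own: $p^{*}=\mathcal N(0.5,1)$, $q_\theta=\mathcal N(0,1)$, and an exponential-linear ratio model $r_\phi(x)=e^{ax+b}\ge 0$, fitted by a crude grid search rather than gradient descent. For these Gaussians the true log-ratio is linear, $\log r^{*}(x)=0.5x-0.125$, so the model family contains the truth:

```python
import numpy as np

rng = np.random.default_rng(1)

# Samples from p* = N(0.5, 1) and q_theta = N(0, 1).
x_p = rng.normal(0.5, 1.0, size=50_000)
x_q = rng.normal(0.0, 1.0, size=50_000)

# Least-squares ratio-matching objective:
#   0.5 * E_q[r_phi(x)^2] - E_p[r_phi(x)],  with r_phi(x) = exp(a*x + b) >= 0.
def ratio_loss(a, b):
    return 0.5 * np.mean(np.exp(a * x_q + b) ** 2) - np.mean(np.exp(a * x_p + b))

# Grid search over phi = (a, b) in place of gradient descent.
grid = np.linspace(-1.0, 1.0, 41)
a_hat, b_hat = min(((a, b) for a in grid for b in grid),
                   key=lambda ab: ratio_loss(*ab))
print(a_hat, b_hat)   # analytically the minimiser is a = 0.5, b = -0.125
```

Note the constraint $r_\phi\ge 0$ is built into the parameterisation by exponentiating, a common trick for ratio models.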

Moment Matching

The last approach tests whether the moments of $p^{\star}$ and $q_\theta$ agree:
$$\begin{aligned}\mathcal L(\boldsymbol\phi,\boldsymbol\theta)&=\left(\mathbb E_{p^{*}(\mathbf x)}\left[s(\mathbf x)\right]-\mathbb E_{q_\theta(\mathbf x)}\left[s(\mathbf x)\right]\right)^2\\&=\left(\mathbb E_{p^{*}(\mathbf x)}\left[s(\mathbf x)\right]-\mathbb E_{q(\mathbf z)}\left[s(\mathcal G(\mathbf z;\boldsymbol\theta))\right]\right)^2\end{aligned}$$
where $s(\mathbf x)$ is some test statistic. Its choice is critical; ideally one would want all moments to match.
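A quick sketch of moment matching (a toy example of my own): with $s(\mathbf x)=(x,x^2)$ and a linear generator $\mathcal G(z;\boldsymbol\theta)=\theta_0+\theta_1 z$, the loss vanishes, up to Monte Carlo noise, exactly when the generator reproduces the first two moments of $p^{*}$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Test statistic s(x) = (x, x^2): the first two moments.
def s(x):
    return np.stack([x, x ** 2], axis=-1)

x_real = rng.normal(1.0, 2.0, size=100_000)   # samples from p* = N(1, 4)
z = rng.normal(0.0, 1.0, size=100_000)        # latent noise, q(z) = N(0, 1)

# Squared moment-matching loss for the generator G(z; theta) = theta0 + theta1*z.
def moment_loss(theta0, theta1):
    g = theta0 + theta1 * z
    return float(np.sum((s(x_real).mean(axis=0) - s(g).mean(axis=0)) ** 2))

# theta = (1, 2) makes G(z) ~ N(1, 4), matching both moments of p*.
print(moment_loss(1.0, 2.0))   # near zero, up to Monte Carlo noise
print(moment_loss(0.0, 1.0))   # large: moments of N(0, 1) vs N(1, 4)
```

With only two moments many distributions are indistinguishable; richer statistics (e.g. kernel features, as in MMD-based methods) move the loss closer to the "all moments match" ideal.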
