[Deep Learning] From VAE to DDPM: Deriving the Loss from the MLE Perspective, 02

Deriving the DDPM Loss

(Figure: the MHVAE graphical model, relabeled for DDPM.)

The figure uses the MHVAE annotation, with the substitutions $x \rightarrow x_0$ and $z_i \rightarrow x_i$.

The forward noising process $q(x_t|x_{t-1})$ is fixed by hand (see [[001 DDPM-v2]] for the exact formula), so it carries no $\phi$ parameters.
The reverse denoising process $p_{\theta}(x_{t-1}|x_t)$ must be learned, so it is parameterized by a neural network with parameters $\theta$.
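
For concreteness, here is a minimal sketch of a single noising step, assuming the standard DDPM transition $q(x_t|x_{t-1})=\mathcal{N}(\sqrt{1-\beta_t}\,x_{t-1},\,\beta_t I)$ from [[001 DDPM-v2]] (the function name is illustrative):

```python
import torch

def q_step(x_prev: torch.Tensor, beta_t: float) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)."""
    noise = torch.randn_like(x_prev)
    return (1.0 - beta_t) ** 0.5 * x_prev + beta_t ** 0.5 * noise

# q is fixed by hand (no phi parameters); only the reverse model
# p_theta(x_{t-1} | x_t) is parameterized by a network and learned.
```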

The DDPM ELBO

Following the MLE derivation above, the ELBO is

$$
\begin{aligned}
\log p_{\theta}(x) &= \log\int p_{\theta}(x_{0:T})\,dx_{1:T}\\
&= \log\int \frac{p_{\theta}(x_{0:T})\,q(x_{1:T}|x_0)}{q(x_{1:T}|x_0)}\,dx_{1:T} \\
&= \log\mathbb{E}_{q(x_{1:T}|x_0)}\left[\frac{p_{\theta}(x_{0:T})}{q(x_{1:T}|x_0)}\right] \\
&\geq \mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\frac{p_{\theta}(x_{0:T})}{q(x_{1:T}|x_0)}\right] = \mathrm{ELBO}
\end{aligned}
$$
where

$$
\begin{aligned}
p_{\theta}(x_{0:T}) &= p(x_T)\,p(x_{0:T-1}|x_T)\\
&= p(x_T)\,p(x_{T-1}|x_{T})\,p(x_{0:T-2}|x_{T-1},x_T)\\
&= p(x_T)\,p(x_{T-1}|x_{T})\,p(x_{0:T-2}|x_{T-1}) \\
&= \cdots \\
&= p(x_T)\,p(x_{T-1}|x_{T})\cdots p(x_0|x_1) \\
&= p(x_T)\prod_{t=1}^{T} p(x_{t-1}|x_t)
\end{aligned}
$$
$$
\begin{aligned}
q(x_{1:T}|x_0) &= q(x_{2:T}|x_1)\,q(x_1|x_0) \\
&= q(x_{3:T}|x_2,x_1)\,q(x_2|x_1)\,q(x_1|x_0) \\
&= q(x_{3:T}|x_2)\,q(x_2|x_1)\,q(x_1|x_0) \\
&= \cdots \\
&= q(x_{T}|x_{T-1})\cdots q(x_2|x_1)\,q(x_1|x_0)\\
&= \prod_{t=1}^{T} q(x_t|x_{t-1})
\end{aligned}
$$
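
As a sanity check on this factorization, the log-density of a sampled chain is just a sum of per-step log-densities; a sketch under the same Gaussian-transition assumption (the helper name is hypothetical):

```python
import torch

def log_q_chain(xs: list, betas: list) -> torch.Tensor:
    """log q(x_{1:T} | x_0) = sum_{t=1}^{T} log q(x_t | x_{t-1}), Gaussian transitions."""
    total = torch.zeros(())
    for t in range(1, len(xs)):  # xs[0] is x_0, xs[t] is x_t
        mean = (1.0 - betas[t - 1]) ** 0.5 * xs[t - 1]
        scale = betas[t - 1] ** 0.5
        total = total + torch.distributions.Normal(mean, scale).log_prob(xs[t]).sum()
    return total
```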

Substituting these two factorizations in gives

$$
\begin{aligned}
\log p_{\theta}(x) &\geq \mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\frac{p_{\theta}(x_{0:T})}{q(x_{1:T}|x_0)}\right] \\
&= \mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\frac{p_{\theta}(x_T)\prod_{t=1}^{T}p_{\theta}(x_{t-1}|x_t)}{\prod_{t=1}^{T}q(x_t|x_{t-1})}\right] \\
&= \mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\frac{p_{\theta}(x_T)\,p_{\theta}(x_0|x_1)\prod_{t=2}^{T}p_{\theta}(x_{t-1}|x_t)}{\prod_{t=1}^{T}q(x_t|x_{t-1})}\right] \\
&= \mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\frac{p_{\theta}(x_T)\,p_{\theta}(x_0|x_1)\prod_{t=1}^{T-1}p_{\theta}(x_{t}|x_{t+1})}{q(x_T|x_{T-1})\prod_{t=1}^{T-1}q(x_t|x_{t-1})}\right] \\
&= \mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\frac{p_{\theta}(x_T)\,p_{\theta}(x_0|x_1)}{q(x_T|x_{T-1})}\right]+\mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\prod_{t=1}^{T-1}\frac{p_{\theta}(x_{t}|x_{t+1})}{q(x_t|x_{t-1})}\right] \\
&= \mathbb{E}_{q(x_{1:T}|x_0)}\left[\log p_{\theta}(x_0|x_1)\right]+\mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\frac{p_{\theta}(x_T)}{q(x_T|x_{T-1})}\right]+\mathbb{E}_{q(x_{1:T}|x_0)}\left[\sum_{t=1}^{T-1}\log\frac{p_{\theta}(x_{t}|x_{t+1})}{q(x_t|x_{t-1})}\right] \\
&= \mathbb{E}_{q(x_{1:T}|x_0)}\left[\log p_{\theta}(x_0|x_1)\right]+\mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\frac{p_{\theta}(x_T)}{q(x_T|x_{T-1})}\right]+\sum_{t=1}^{T-1}\mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\frac{p_{\theta}(x_{t}|x_{t+1})}{q(x_t|x_{t-1})}\right] \\
&= \mathbb{E}_{q(x_{1}|x_0)}\left[\log p_{\theta}(x_0|x_1)\right]+\mathbb{E}_{q(x_{T},x_{T-1}|x_0)}\left[\log\frac{p_{\theta}(x_T)}{q(x_T|x_{T-1})}\right]+\sum_{t=1}^{T-1}\mathbb{E}_{q(x_{t-1},x_t,x_{t+1}|x_0)}\left[\log\frac{p_{\theta}(x_{t}|x_{t+1})}{q(x_t|x_{t-1})}\right]
\end{aligned}
$$

[!NOTE]

  1. $\prod_{t=2}^T p_{\theta}(x_{t-1}|x_t)$ can be rewritten as $\prod_{t=1}^{T-1}p_{\theta}(x_{t}|x_{t+1})$ by a change of index;
  2. a sum of expectations equals the expectation of the sum;
  3. in the last line, each expectation keeps only the variables that actually appear in its integrand; the unused ones are marginalized away, so sampling only involves the relevant variables.
Eliminating Variables

$$
\begin{aligned}
\mathbb{E}_{q(x_{T},x_{T-1}|x_0)}\left[\log\frac{p_{\theta}(x_T)}{q(x_T|x_{T-1})}\right] &= \iint q(x_T,x_{T-1}|x_0)\log\frac{p_{\theta}(x_T)}{q(x_T|x_{T-1})}\,dx_{T-1}\,dx_T \\
&= \iint \log\frac{p_{\theta}(x_T)}{q(x_T|x_{T-1})}\, q(x_T|x_{T-1},x_0)\,q(x_{T-1}|x_0)\,dx_{T-1}\,dx_T \\
&= \iint \log\frac{p_{\theta}(x_T)}{q(x_T|x_{T-1})}\, q(x_T|x_{T-1})\,q(x_{T-1}|x_0)\,dx_{T-1}\,dx_T \\
&= \int \left[\int \log\frac{p_{\theta}(x_T)}{q(x_T|x_{T-1})}\,q(x_T|x_{T-1})\,dx_T\right] q(x_{T-1}|x_0)\,dx_{T-1} \\
&= \int q(x_{T-1}|x_0)\left[-D_{KL}(q(x_T|x_{T-1})\,\|\,p_{\theta}(x_T))\right]dx_{T-1} \\
&= \mathbb{E}_{q(x_{T-1}|x_0)}\left[-D_{KL}(q(x_T|x_{T-1})\,\|\,p_{\theta}(x_T))\right]
\end{aligned}
$$

A point that deserves special emphasis here: the order of integration is critical.

The pitfall I fell into:

$$
\begin{aligned}
\mathbb{E}_{q(x_{T},x_{T-1}|x_0)}\left[\log\frac{p_{\theta}(x_T)}{q(x_T|x_{T-1})}\right] &= \iint q(x_T,x_{T-1}|x_0)\log\frac{p_{\theta}(x_T)}{q(x_T|x_{T-1})}\,dx_{T-1}\,dx_T \\
&= \iint \log\frac{p_{\theta}(x_T)}{q(x_T|x_{T-1})}\, q(x_T|x_{T-1},x_0)\,q(x_{T-1}|x_0)\,dx_{T-1}\,dx_T \\
&= \iint \log\frac{p_{\theta}(x_T)}{q(x_T|x_{T-1})}\, q(x_T|x_{T-1})\,q(x_{T-1}|x_0)\,dx_{T-1}\,dx_T \\
&= \int \left[\int q(x_{T-1}|x_0)\,dx_{T-1}\right]\log\frac{p_{\theta}(x_T)}{q(x_T|x_{T-1})}\,q(x_T|x_{T-1})\,dx_T \\
&= \int 1\times \log\frac{p_{\theta}(x_T)}{q(x_T|x_{T-1})}\,q(x_T|x_{T-1})\,dx_T \\
&= -D_{KL}(q(x_T|x_{T-1})\,\|\,p_{\theta}(x_T))
\end{aligned}
$$

[!important]

  1. $\int p(x_1|x_2)\,dx_1=1$: note carefully that this is an integral in which $x_1$ is the integration variable; once the integral is done, $x_1$ is gone;
  2. substituting into the expression above: if $x_{T-1}$ were integrated out first, the remaining integrand would still contain $x_{T-1}$ as a conditioning variable, so that integral cannot actually be completed;
  3. therefore $x_T$ must be integrated out first, because what remains in $x_{T-1}$ does not condition on $x_T$; a numeric sanity check follows below.
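
To make this concrete, a small Monte-Carlo sanity check with toy 1-D Gaussians (illustrative stand-ins, not DDPM's actual distributions): sampling the joint on the left agrees with the nested form $\mathbb{E}_{q(x_{T-1}|x_0)}[-D_{KL}(q(x_T|x_{T-1})\,\|\,p(x_T))]$.

```python
import torch

torch.manual_seed(0)
N = 200_000
a, s = 0.9, 0.5                        # q(x_T | x_{T-1}) = N(a * x_{T-1}, s^2)

x_prev = torch.randn(N)                # x_{T-1} ~ q(x_{T-1} | x_0) = N(0, 1)
x_T = a * x_prev + s * torch.randn(N)  # x_T ~ q(x_T | x_{T-1})

log_p = torch.distributions.Normal(0.0, 1.0).log_prob(x_T)  # p(x_T) = N(0, 1)
log_q = torch.distributions.Normal(a * x_prev, s).log_prob(x_T)
mc = (log_p - log_q).mean()            # E_{q(x_T, x_{T-1} | x_0)}[log p / q]

# Closed-form KL(N(mu, s^2) || N(0, 1)) with mu = a * x_prev, averaged over x_{T-1}:
kl = (s ** 2 + (a * x_prev) ** 2) / 2 - torch.log(torch.tensor(s)) - 0.5
analytic = (-kl).mean()
print(mc.item(), analytic.item())      # the two estimates should agree closely
```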

Similarly,

$$
\begin{aligned}
\mathbb{E}_{q(x_{t-1},x_t,x_{t+1}|x_0)}\left[\log\frac{p_{\theta}(x_{t}|x_{t+1})}{q(x_t|x_{t-1})}\right] &= \iiint q(x_{t-1},x_t,x_{t+1}|x_0)\log\frac{p_{\theta}(x_{t}|x_{t+1})}{q(x_t|x_{t-1})}\,dx_{t-1}\,dx_t\,dx_{t+1} \\
&= \iiint q(x_{t+1},x_{t-1}|x_0)\,q(x_t|x_{t-1})\log\frac{p_{\theta}(x_{t}|x_{t+1})}{q(x_t|x_{t-1})}\,dx_{t-1}\,dx_t\,dx_{t+1} \\
&= \iint \left[\int \log\frac{p_{\theta}(x_{t}|x_{t+1})}{q(x_t|x_{t-1})}\,q(x_t|x_{t-1})\,dx_t\right] q(x_{t+1},x_{t-1}|x_0)\,dx_{t-1}\,dx_{t+1} \\
&= \iint q(x_{t+1},x_{t-1}|x_0)\left[-D_{KL}(q(x_t|x_{t-1})\,\|\,p_{\theta}(x_{t}|x_{t+1}))\right]dx_{t-1}\,dx_{t+1} \\
&= \mathbb{E}_{q(x_{t-1},x_{t+1}|x_0)}\left[-D_{KL}(q(x_t|x_{t-1})\,\|\,p_{\theta}(x_{t}|x_{t+1}))\right]
\end{aligned}
$$
At this point we have

$$
\begin{aligned}
\log p_{\theta}(x) &\geq \mathbb{E}_{q(x_{1}|x_0)}\left[\log p_{\theta}(x_0|x_1)\right]+\mathbb{E}_{q(x_{T-1}|x_0)}\left[-D_{KL}(q(x_T|x_{T-1})\,\|\,p_{\theta}(x_T))\right]+\sum_{t=1}^{T-1}\mathbb{E}_{q(x_{t-1},x_{t+1}|x_0)}\left[-D_{KL}(q(x_t|x_{t-1})\,\|\,p_{\theta}(x_{t}|x_{t+1}))\right]
\end{aligned}
$$


A problem appears here: expectations over several variables at once have high variance when estimated by sampling. Can some of these variables be eliminated?

Markov Property and Bayes' Rule

$$
\begin{aligned}
\log p_{\theta}(x) &\geq \mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\frac{p_{\theta}(x_{0:T})}{q(x_{1:T}|x_0)}\right] \\
&= \mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\frac{p_{\theta}(x_T)\prod_{t=1}^{T}p_{\theta}(x_{t-1}|x_t)}{\prod_{t=1}^{T}q(x_t|x_{t-1})}\right] \\
&= \mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\frac{p_{\theta}(x_T)\,p_{\theta}(x_0|x_1)\prod_{t=2}^{T}p_{\theta}(x_{t-1}|x_t)}{\prod_{t=1}^{T}q(x_t|x_{t-1})}\right] \\
&= \mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\frac{p_{\theta}(x_T)\,p_{\theta}(x_0|x_1)\prod_{t=2}^{T}p_{\theta}(x_{t-1}|x_{t})}{q(x_1|x_0)\prod_{t=2}^{T}q(x_t|x_{t-1})}\right] \\
&= \mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\frac{p_{\theta}(x_T)\,p_{\theta}(x_0|x_1)}{q(x_1|x_{0})}\right]+\mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\prod_{t=2}^{T}\frac{p_{\theta}(x_{t-1}|x_{t})}{q(x_t|x_{t-1})}\right] \\
&= \mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\frac{p_{\theta}(x_T)\,p_{\theta}(x_0|x_1)}{q(x_1|x_{0})}\right]+\mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\prod_{t=2}^{T}\frac{p_{\theta}(x_{t-1}|x_{t})}{q(x_t|x_{t-1},x_0)}\right] \\
&= \mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\frac{p_{\theta}(x_T)\,p_{\theta}(x_0|x_1)}{q(x_1|x_{0})}\right]+\mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\prod_{t=2}^{T}\frac{p_{\theta}(x_{t-1}|x_{t})}{\frac{q(x_{t-1}|x_{t},x_0)\,q(x_t|x_0)}{q(x_{t-1}|x_0)}}\right]
\end{aligned}
$$

[!NOTE]

  1. The Markov property says that $x_t$ depends only on $x_{t-1}$, hence $q(x_t|x_{t-1},x_0)=q(x_t|x_{t-1})$ (spelled out with Bayes' rule below);
  2. the reverse direction, however, is not Markov in this sense, i.e. $q(x_{t-1}|x_{t},x_0) \neq q(x_{t-1}|x_{t})$, which is why the $x_0$ in $q(x_{t-1}|x_{t},x_0)$ is never dropped in what follows;
  3. notably, when deriving DDPM forward from first principles, one directly adds an $x_0$ condition to $p(x_{t}|x_{t-1})$ in the non-Markov reverse setting and then removes it by predicting $\hat{x}_0=f(x_t,t)$. That route feels like a leap and is hard to motivate. In the MLE/ELBO derivation here, exploiting the Markov identity $q(x_t|x_{t-1})=q(x_t|x_{t-1},x_0)$ is far more natural; I strongly suspect the forward derivation's add-$x_0$ trick was reverse-engineered from this ELBO derivation.
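
Spelled out, the pivotal rewrite in the denominator combines both observations: the Markov property adds the $x_0$ condition for free, and Bayes' rule then flips the direction:

$$
q(x_t|x_{t-1}) = q(x_t|x_{t-1},x_0) = \frac{q(x_{t-1}|x_{t},x_0)\,q(x_t|x_0)}{q(x_{t-1}|x_0)}
$$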

For the second expectation above, the product telescopes:

$$
\begin{aligned}
\mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\prod_{t=2}^{T}\frac{q(x_{t-1}|x_{t},x_0)\,q(x_t|x_0)}{q(x_{t-1}|x_0)}\right] &= \mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\prod_{t=2}^{T}q(x_{t-1}|x_{t},x_0)\right]+\mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\frac{\cancel{q(x_2|x_0)}}{q(x_1|x_0)}+\log\frac{\cancel{q(x_3|x_0)}}{\cancel{q(x_2|x_0)}}+\cdots+\log\frac{q(x_T|x_0)}{\cancel{q(x_{T-1}|x_0)}}\right] \\
&= \mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\prod_{t=2}^{T}q(x_{t-1}|x_{t},x_0)\right]+\mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\frac{q(x_T|x_0)}{q(x_{1}|x_0)}\right]
\end{aligned}
$$

Substituting back into the original expression gives

$$
\begin{aligned}
\log p_{\theta}(x) &\geq \mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\frac{p_{\theta}(x_T)\,p_{\theta}(x_0|x_1)}{q(x_1|x_{0})}\right]+\mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\prod_{t=2}^{T}\frac{p_{\theta}(x_{t-1}|x_{t})}{\frac{q(x_{t-1}|x_{t},x_0)\,q(x_t|x_0)}{q(x_{t-1}|x_0)}}\right] \\
&= \mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\frac{p_{\theta}(x_T)\,p_{\theta}(x_0|x_1)}{\cancel{q(x_1|x_{0})}}\right]+\mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\prod_{t=2}^{T}\frac{p_{\theta}(x_{t-1}|x_{t})}{q(x_{t-1}|x_{t},x_0)}\right]+\mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\frac{\cancel{q(x_1|x_0)}}{q(x_{T}|x_0)}\right]\\
&= \mathbb{E}_{q(x_{1:T}|x_0)}\left[\log\frac{p_{\theta}(x_T)\,p_{\theta}(x_0|x_1)}{q(x_{T}|x_0)}\right]+\mathbb{E}_{q(x_{1:T}|x_0)}\left[\sum_{t=2}^{T}\log\frac{p_{\theta}(x_{t-1}|x_{t})}{q(x_{t-1}|x_{t},x_0)}\right]\\
&= \mathbb{E}_{q(x_{1}|x_0)}\left[\log p_{\theta}(x_0|x_1)\right]+\mathbb{E}_{q(x_{T}|x_0)}\left[\log\frac{p_{\theta}(x_T)}{q(x_{T}|x_0)}\right]+\mathbb{E}_{q(x_{t-1},x_t|x_0)}\left[\sum_{t=2}^{T}\log\frac{p_{\theta}(x_{t-1}|x_{t})}{q(x_{t-1}|x_{t},x_0)}\right] \\
&= \mathbb{E}_{q(x_{1}|x_0)}\left[\log p_{\theta}(x_0|x_1)\right]+\mathbb{E}_{q(x_{T}|x_0)}\left[\log\frac{p_{\theta}(x_T)}{q(x_{T}|x_0)}\right]+\sum_{t=2}^{T}\mathbb{E}_{q(x_{t-1},x_t|x_0)}\left[\log\frac{p_{\theta}(x_{t-1}|x_{t})}{q(x_{t-1}|x_{t},x_0)}\right] \\
&= \mathbb{E}_{q(x_{1}|x_0)}\left[\log p_{\theta}(x_0|x_1)\right]-D_{KL}(q(x_{T}|x_0)\,\|\,p_{\theta}(x_T))+\sum_{t=2}^{T}\mathbb{E}_{q(x_{t-1},x_t|x_0)}\left[\log\frac{p_{\theta}(x_{t-1}|x_{t})}{q(x_{t-1}|x_{t},x_0)}\right]
\end{aligned}
$$

where

$$
\begin{aligned}
\mathbb{E}_{q(x_{t-1},x_t|x_0)}\left[\log\frac{p_{\theta}(x_{t-1}|x_{t})}{q(x_{t-1}|x_{t},x_0)}\right] &= \iint q(x_{t-1},x_t|x_0)\log\frac{p_{\theta}(x_{t-1}|x_{t})}{q(x_{t-1}|x_{t},x_0)}\,dx_{t-1}\,dx_t \\
&= \iint q(x_{t-1}|x_{t},x_0)\,q(x_{t}|x_{0})\log\frac{p_{\theta}(x_{t-1}|x_{t})}{q(x_{t-1}|x_{t},x_0)}\,dx_{t-1}\,dx_t \\
&= \int \left[\int \log\frac{p_{\theta}(x_{t-1}|x_{t})}{q(x_{t-1}|x_{t},x_0)}\,q(x_{t-1}|x_{t},x_0)\,dx_{t-1}\right] q(x_{t}|x_0)\,dx_{t} \\
&= \mathbb{E}_{q(x_{t}|x_0)}\left[-D_{KL}(q(x_{t-1}|x_{t},x_0)\,\|\,p_{\theta}(x_{t-1}|x_{t}))\right]
\end{aligned}
$$

[!NOTE]

Note that $q(x_{t-1},x_t|x_0)$ must not be factored here as $q(x_t|x_{t-1})\,q(x_{t-1}|x_0)$: with that factorization, whichever of $x_{t-1}$ or $x_t$ is integrated first still appears as a conditioning variable in the remaining integrand, so the inner integral cannot be completed.

At this point, the Markov property has let us eliminate all the extra variables:

$$
\begin{aligned}
\log p_{\theta}(x) &\geq \underbrace{\mathbb{E}_{q(x_{1}|x_0)}[\log p_{\theta}(x_0|x_1)]}_{\text{reconstruction term}}-\underbrace{D_{KL}(q(x_{T}|x_0)\,\|\,p_{\theta}(x_T))}_{\text{regularization term}}+\underbrace{\sum_{t=2}^{T}\mathbb{E}_{q(x_{t}|x_0)}\left[-D_{KL}(q(x_{t-1}|x_{t},x_0)\,\|\,p_{\theta}(x_{t-1}|x_{t}))\right]}_{\text{denoising matching term}}
\end{aligned}
$$

Note also that the first two terms have the same form as in the VAE.

When $T=1$ there is a single latent variable $x_1=z$, and the bound reduces to exactly the VAE ELBO, as shown below.
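
Concretely, for $T=1$ the denoising matching sum over $t=2,\dots,T$ is empty, and the bound collapses to

$$
\log p_{\theta}(x) \geq \mathbb{E}_{q(x_1|x_0)}[\log p_{\theta}(x_0|x_1)] - D_{KL}(q(x_{1}|x_0)\,\|\,p_{\theta}(x_1)),
$$

which is the VAE ELBO with $z=x_1$.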

Analyzing the ELBO

The denoising matching term $\sum_{t=2}^{T}\mathbb{E}_{q(x_{t}|x_0)}[-D_{KL}(q(x_{t-1}|x_{t},x_0)\,\|\,p_{\theta}(x_{t-1}|x_{t}))]$ accounts for the bulk of the ELBO, so we examine it first.

Here $p_{\theta}(x_{t-1}|x_{t})$ is the model's parameterization, and $q(x_{t-1}|x_{t},x_0)$ is the ground truth it must approach.

The [[001 DDPM-v2#后向生成过程|ground-truth derivation]] is not repeated here; the final result is

$$
q(x_{t-1}|x_{t},x_0) = \mathcal{N}\!\left(\frac{1}{\sqrt{\alpha_{t}}}\left[x_{t}-\frac{\beta_{t}}{\sqrt{\bar{\beta}_{t}}}\bar{\epsilon}_{t}\right],\ \frac{\beta_{t}\bar{\beta}_{t-1}}{\bar{\beta}_{t}}I\right)
$$
Since the parameterized $p_{\theta}(x_{t-1}|x_{t})$ only needs to approach $q(x_{t-1}|x_{t},x_0)$, we may as well:

  1. use the variance of $q(x_{t-1}|x_{t},x_0)$ directly: $\frac{\beta_{t}\bar{\beta}_{t-1}}{\bar{\beta}_{t}}$;
  2. mirror the form of the mean of $q(x_{t-1}|x_{t},x_0)$ when choosing what to predict: $\frac{1}{\sqrt{\alpha_{t}}}[x_{t}-\frac{\beta_{t}}{\sqrt{\bar{\beta}_{t}}}\bar{\epsilon}_{\theta}]$ (see the sketch below).
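
A minimal sketch of this parameterization, assuming $\bar{\beta}_t$ denotes $1-\bar{\alpha}_t$ (my reading of the notation in [[001 DDPM-v2]]); all names here are illustrative:

```python
import torch

def p_theta_stats(x_t, eps_pred, alpha_t, beta_t, bbar_t, bbar_prev):
    """Mean and variance of p_theta(x_{t-1} | x_t): the variance is copied from
    q(x_{t-1} | x_t, x_0); the mean mirrors its form, with the network's noise
    prediction eps_pred standing in for the true eps_bar_t."""
    mean = (x_t - beta_t / bbar_t ** 0.5 * eps_pred) / alpha_t ** 0.5
    var = beta_t * bbar_prev / bbar_t
    return mean, var
```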

Substituting these choices, expand $D_{KL}(q(x_{t-1}|x_{t},x_0)\,\|\,p_{\theta}(x_{t-1}|x_{t}))$:

$$
\begin{aligned}
D_{KL}(q(x_{t-1}|x_{t},x_0)\,\|\,p_{\theta}(x_{t-1}|x_{t})) &= D_{KL}\!\left(\mathcal{N}\!\left(\frac{1}{\sqrt{\alpha_{t}}}\left[x_{t}-\frac{\beta_{t}}{\sqrt{\bar{\beta}_{t}}}\bar{\epsilon}_{t}\right],\ \frac{\beta_{t}\bar{\beta}_{t-1}}{\bar{\beta}_{t}}I\right)\,\Big\|\,\mathcal{N}\!\left(\frac{1}{\sqrt{\alpha_{t}}}\left[x_{t}-\frac{\beta_{t}}{\sqrt{\bar{\beta}_{t}}}\bar{\epsilon}_{\theta}\right],\ \frac{\beta_{t}\bar{\beta}_{t-1}}{\bar{\beta}_{t}}I\right)\right)
\end{aligned}
$$

Using the closed form for the KL divergence between two Gaussians,

$$
D_{KL}(\mathcal{N}(\mu_1,\sigma_1^2 I)\,\|\,\mathcal{N}(\mu_2,\sigma_2^2 I)) = \log\frac{\sigma_2}{\sigma_1}+\frac{\sigma_1^2+(\mu_1-\mu_2)^2}{2\sigma_2^2}-\frac{1}{2}
$$
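
The same closed form as a helper (elementwise, for diagonal Gaussians):

```python
import torch

def kl_normal(mu1, sigma1, mu2, sigma2):
    """KL( N(mu1, sigma1^2) || N(mu2, sigma2^2) ), per dimension."""
    return torch.log(sigma2 / sigma1) + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2 * sigma2 ** 2) - 0.5
```

When $\sigma_1=\sigma_2$, the log term vanishes and the variance ratio contributes exactly $\frac{1}{2}$, cancelling the $-\frac{1}{2}$, so only the mean gap $\frac{(\mu_1-\mu_2)^2}{2\sigma_2^2}$ survives; this is exactly what happens next.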

applying it to the two Gaussians above gives the final value

$$
\begin{aligned}
D_{KL}(q(x_{t-1}|x_{t},x_0)\,\|\,p_{\theta}(x_{t-1}|x_{t})) &= \log\frac{\sqrt{\frac{\beta_{t}\bar{\beta}_{t-1}}{\bar{\beta}_{t}}}}{\sqrt{\frac{\beta_{t}\bar{\beta}_{t-1}}{\bar{\beta}_{t}}}}+\frac{\frac{\beta_{t}\bar{\beta}_{t-1}}{\bar{\beta}_{t}}+\left(\frac{1}{\sqrt{\alpha_{t}}}\left[x_{t}-\frac{\beta_{t}}{\sqrt{\bar{\beta}_{t}}}\bar{\epsilon}_{t}\right]-\frac{1}{\sqrt{\alpha_{t}}}\left[x_{t}-\frac{\beta_{t}}{\sqrt{\bar{\beta}_{t}}}\bar{\epsilon}_{\theta}\right]\right)^2}{2\frac{\beta_{t}\bar{\beta}_{t-1}}{\bar{\beta}_{t}}}-\frac{1}{2} \\
&= \frac{\beta_t}{2\alpha_t\bar{\beta}_{t-1}}\left\Vert \bar{\epsilon}_{\theta}-\bar{\epsilon}_{t} \right\Vert^2
\end{aligned}
$$
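
A quick numeric check of this identity (the schedule values below are arbitrary placeholders; the algebra holds for any positive values):

```python
import torch

torch.manual_seed(0)
alpha_t, beta_t = torch.tensor(0.98), torch.tensor(0.02)
bbar_t, bbar_prev = torch.tensor(0.50), torch.tensor(0.48)  # placeholder schedule values

x_t = torch.randn(10)
eps_t, eps_theta = torch.randn(10), torch.randn(10)

mu_q = (x_t - beta_t / bbar_t.sqrt() * eps_t) / alpha_t.sqrt()
mu_p = (x_t - beta_t / bbar_t.sqrt() * eps_theta) / alpha_t.sqrt()
var = beta_t * bbar_prev / bbar_t

kl = ((mu_q - mu_p) ** 2).sum() / (2 * var)   # equal variances: only the mean gap survives
closed = beta_t / (2 * alpha_t * bbar_prev) * ((eps_theta - eps_t) ** 2).sum()
print(kl.item(), closed.item())               # the two values should match
```

Minimizing the denoising matching term is therefore a weighted MSE between the predicted and true noise; DDPM's simplified training objective further drops the $\frac{\beta_t}{2\alpha_t\bar{\beta}_{t-1}}$ weight and regresses $\Vert\bar{\epsilon}_{\theta}-\bar{\epsilon}_{t}\Vert^2$ directly.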
