【Diffusion Models】(2) DDPM

This post introduces the forward process as a Markov chain, expressed with the reparameterization trick. It then discusses the reverse process, also defined as a Markov chain, which generates samples by computing the relevant probability distributions and predicting noise. Finally, a loss function is derived from the KL divergence and used to train the model parameters.


Forward Process

Defined as a Markov chain:

$$q\left({\bold x}_{1:T}\middle\vert{\bold x}_0\right)=\prod_{t=1}^T q\left({\bold x}_t\middle\vert{\bold x}_{t-1},{\bold x}_{t-2},\cdots,{\bold x}_0\right)=\prod_{t=1}^T q\left({\bold x}_t\middle\vert{\bold x}_{t-1}\right)$$

where

$$q\left({\bold x}_t\middle\vert{\bold x}_{t-1}\right)={\cal N}\left({\bold x}_t;\sqrt{1-\beta_t}\cdot{\bold x}_{t-1},\beta_t{\bold I}\right)$$

Reparameterization Trick

$${\bold x}_t=\sqrt{1-\beta_t}\cdot{\bold x}_{t-1}+\sqrt{\beta_t}\cdot{\boldsymbol\epsilon}_t$$

where ${\boldsymbol\epsilon}_t\sim{\cal N}\left({\bold 0},{\bold I}\right)$.
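As a small illustration, the reparameterization above maps directly to one line of code. A minimal PyTorch sketch (the function name is illustrative; `beta_t` is assumed to come from a pre-computed noise schedule):

```python
import torch

def forward_step(x_prev, beta_t):
    """One forward diffusion step: x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps_t."""
    eps = torch.randn_like(x_prev)                        # eps_t ~ N(0, I)
    return (1.0 - beta_t) ** 0.5 * x_prev + beta_t ** 0.5 * eps
```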

Why $\mu^2+\sigma^2=1$

$$\begin{aligned} {\bold x}_t &=\sqrt{1-\beta_t}\left(\sqrt{1-\beta_{t-1}}\cdot{\bold x}_{t-2}+\sqrt{\beta_{t-1}}\cdot{\boldsymbol\epsilon}_{t-1}\right)+\sqrt{\beta_t}\cdot{\boldsymbol\epsilon}_t \\ &=\sqrt{(1-\beta_t)(1-\beta_{t-1})}\cdot{\bold x}_{t-2}+\sqrt{1-(1-\beta_t)(1-\beta_{t-1})}\cdot{\boldsymbol\epsilon}' \\ &=\cdots \\ &=\sqrt{\prod_{i=1}^{t}\left(1-\beta_i\right)}\cdot{\bold x}_0+\sqrt{1-\prod_{i=1}^{t}\left(1-\beta_i\right)}\cdot{\boldsymbol\epsilon}'' \end{aligned}$$

where ${\boldsymbol\epsilon}',{\boldsymbol\epsilon}''\sim{\cal N}\left({\bold 0},{\bold I}\right)$: because the squared coefficients sum to $1$ at every step, merging the two independent Gaussians keeps the same form. Let $\alpha_t=1-\beta_t$ and $\bar\alpha_t=\prod_{s=1}^{t}\alpha_s$; then

$$q\left({\bold x}_t\middle\vert{\bold x}_0\right)={\cal N}\left({\bold x}_t;\sqrt{\bar\alpha_t}\cdot{\bold x}_0,(1-\bar\alpha_t){\bold I}\right)$$
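The closed form $q({\bold x}_t\mid{\bold x}_0)$ means ${\bold x}_t$ can be drawn in one shot from ${\bold x}_0$. A minimal sketch, assuming a linear $\beta$ schedule (the schedule values and helper names are illustrative choices, not prescribed by the derivation):

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # beta_1 ... beta_T
alphas = 1.0 - betas                            # alpha_t = 1 - beta_t
alpha_bars = torch.cumprod(alphas, dim=0)       # alpha_bar_t = prod_{s<=t} alpha_s

def q_sample(x0, t, noise=None):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I).

    t: 0-indexed timestep tensor of shape (batch,), where t=0 corresponds to beta_1.
    """
    if noise is None:
        noise = torch.randn_like(x0)
    a_bar = alpha_bars[t].view(-1, *([1] * (x0.dim() - 1)))   # broadcast over batch dims
    return torch.sqrt(a_bar) * x0 + torch.sqrt(1.0 - a_bar) * noise
```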

Reverse Process

Defined as a Markov chain as well:

$$p_\theta({\bold x}_{0:T})=p({\bold x}_T)\prod_{t=1}^T p_\theta\left({\bold x}_{t-1}\middle\vert{\bold x}_{t}\right)$$

where $p({\bold x}_T)={\cal N}\left({\bold x}_T;{\bold 0},{\bold I}\right)$ is a fixed prior and

$$p_\theta\left({\bold x}_{t-1}\middle\vert{\bold x}_t\right)={\cal N}\left({\bold x}_{t-1};{\boldsymbol\mu}_\theta\left({\bold x}_t,t\right),{\boldsymbol\Sigma}_\theta\left({\bold x}_t,t\right)\right)$$
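Generation is then ancestral sampling along this chain: draw ${\bold x}_T$ from the prior and repeatedly sample ${\bold x}_{t-1}\sim p_\theta({\bold x}_{t-1}\mid{\bold x}_t)$. A sketch, assuming hypothetical `mu_theta` and `sigma` callables that return the Gaussian parameters (how $\boldsymbol\mu_\theta$ is computed is derived in the following sections):

```python
import torch

@torch.no_grad()
def reverse_sample(mu_theta, sigma, shape, T=1000):
    """Ancestral sampling: x_T ~ N(0, I), then x_{t-1} ~ N(mu_theta(x_t, t), sigma(t)^2 I)."""
    x = torch.randn(shape)                                  # x_T from the prior
    for t in range(T, 0, -1):
        mean = mu_theta(x, t)
        noise = torch.randn_like(x) if t > 1 else torch.zeros_like(x)  # no noise at the last step
        x = mean + sigma(t) * noise
    return x                                                # x_0
```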

From Forward Process

The true reverse conditional $q\left({\bold x}_{t-1}\middle\vert{\bold x}_t\right)$ is intractable, but conditioning on ${\bold x}_0$ gives a tractable posterior via Bayes' rule:

$$q\left({\bold x}_{t-1}\middle\vert{\bold x}_t,{\bold x}_0\right)=q\left({\bold x}_t\middle\vert{\bold x}_{t-1},{\bold x}_0\right)\cdot\frac{q\left({\bold x}_{t-1}\middle\vert{\bold x}_0\right)}{q\left({\bold x}_t\middle\vert{\bold x}_0\right)}$$

With Gaussian kernels,

$$\begin{aligned} \log q\left({\bold x}_{t-1}\middle\vert{\bold x}_t,{\bold x}_0\right) &=-\frac12\left[\frac{\left({\bold x}_t-\sqrt{\alpha_t}\cdot{\bold x}_{t-1}\right)^2}{\beta_t}+\frac{\left({\bold x}_{t-1}-\sqrt{\bar\alpha_{t-1}}\cdot{\bold x}_0\right)^2}{1-\bar\alpha_{t-1}}-\frac{\left({\bold x}_t-\sqrt{\bar\alpha_t}\cdot{\bold x}_0\right)^2}{1-\bar\alpha_t}\right] \\ &=-\frac12\left[\left(\frac{\alpha_t}{\beta_t}+\frac{1}{1-\bar\alpha_{t-1}}\right){\bold x}_{t-1}^2-\left(\frac{2\sqrt{\alpha_t}}{\beta_t}\cdot{\bold x}_t+\frac{2\sqrt{\bar\alpha_{t-1}}}{1-\bar\alpha_{t-1}}\cdot{\bold x}_0\right){\bold x}_{t-1}+C\right] \end{aligned}$$

Matching this against the exponent of a Gaussian in ${\bold x}_{t-1}$ gives the variance

$$\frac{1}{\sigma^2}=\frac{\alpha_t}{\beta_t}+\frac{1}{1-\bar\alpha_{t-1}}=\frac{\alpha_t-\bar\alpha_t+\beta_t}{\beta_t\left(1-\bar\alpha_{t-1}\right)}=\frac{1-\bar\alpha_t}{1-\bar\alpha_{t-1}}\cdot\frac{1}{\beta_t} \Longrightarrow \sigma^2=\frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\cdot\beta_t\xlongequal[]{\Delta}\tilde\beta_t$$

and the mean

$$\mu=\frac{\sigma^2}{2}\left(\frac{2\sqrt{\alpha_t}}{\beta_t}\cdot{\bold x}_t+\frac{2\sqrt{\bar\alpha_{t-1}}}{1-\bar\alpha_{t-1}}\cdot{\bold x}_0\right)=\frac{\sqrt{\alpha_t}\left(1-\bar\alpha_{t-1}\right)}{1-\bar\alpha_t}\cdot{\bold x}_t+\frac{\beta_t\sqrt{\bar\alpha_{t-1}}}{1-\bar\alpha_t}\cdot{\bold x}_0\xlongequal[]{\Delta}\tilde{\boldsymbol\mu}_t({\bold x}_t,{\bold x}_0)$$

Finally,

$$q\left({\bold x}_{t-1}\middle\vert{\bold x}_t,{\bold x}_0\right)={\cal N}\left({\bold x}_{t-1};\tilde{\boldsymbol\mu}_t({\bold x}_t,{\bold x}_0),\tilde\beta_t{\bold I}\right)$$
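For reference, $\tilde{\boldsymbol\mu}_t$ and $\tilde\beta_t$ translate directly into code. A sketch reusing the `betas`/`alphas`/`alpha_bars` arrays from the earlier snippet (all names are illustrative):

```python
import torch

def q_posterior(x0, xt, t, betas, alphas, alpha_bars):
    """Mean and variance of q(x_{t-1} | x_t, x_0). t is 1-based; the arrays are 0-indexed."""
    a_bar_t = alpha_bars[t - 1]
    a_bar_prev = alpha_bars[t - 2] if t > 1 else torch.tensor(1.0)   # alpha_bar_0 = 1
    coef_xt = torch.sqrt(alphas[t - 1]) * (1.0 - a_bar_prev) / (1.0 - a_bar_t)
    coef_x0 = betas[t - 1] * torch.sqrt(a_bar_prev) / (1.0 - a_bar_t)
    mean = coef_xt * xt + coef_x0 * x0                               # tilde mu_t(x_t, x_0)
    var = (1.0 - a_bar_prev) / (1.0 - a_bar_t) * betas[t - 1]        # tilde beta_t
    return mean, var
```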

Noise Prediction

For a given noise ${\boldsymbol\epsilon}\sim{\cal N}({\bold 0},{\bold I})$ we have

$${\bold x}_t({\bold x}_0,{\boldsymbol\epsilon})=\sqrt{\bar\alpha_t}\cdot{\bold x}_0+\sqrt{1-\bar\alpha_t}\cdot{\boldsymbol\epsilon} \Longrightarrow {\bold x}_0=\frac{{\bold x}_t({\bold x}_0,{\boldsymbol\epsilon})-\sqrt{1-\bar\alpha_t}\cdot{\boldsymbol\epsilon}}{\sqrt{\bar\alpha_t}}$$

Thus, rewriting $\tilde{\boldsymbol\mu}_t$ in terms of ${\bold x}_t$ and ${\boldsymbol\epsilon}$,

$$\begin{aligned} \tilde{\boldsymbol\mu}_t\left({\bold x}_t,\frac{{\bold x}_t-\sqrt{1-\bar\alpha_t}\cdot{\boldsymbol\epsilon}}{\sqrt{\bar\alpha_t}}\right) &=\frac{\sqrt{\alpha_t}\left(1-\bar\alpha_{t-1}\right)}{1-\bar\alpha_t}\cdot{\bold x}_t+\frac{\beta_t\sqrt{\bar\alpha_{t-1}}}{1-\bar\alpha_t}\cdot\frac{{\bold x}_t-\sqrt{1-\bar\alpha_t}\cdot{\boldsymbol\epsilon}}{\sqrt{\bar\alpha_t}} \\ &=\frac{\alpha_t-\bar\alpha_t+\beta_t}{\left(1-\bar\alpha_t\right)\sqrt{\alpha_t}}\cdot{\bold x}_t-\frac{\beta_t}{\sqrt{1-\bar\alpha_t}\sqrt{\alpha_t}}\cdot{\boldsymbol\epsilon} \\ &=\frac{1}{\sqrt{\alpha_t}}\left({\bold x}_t-\frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\cdot{\boldsymbol\epsilon}\right) \end{aligned}$$

Parameterizing the noise with a neural network ${\boldsymbol\epsilon}_\theta({\bold x}_t,t)$, we finally obtain

$${\boldsymbol\mu}_\theta\left({\bold x}_t,t\right)=\frac{1}{\sqrt{\alpha_t}}\left({\bold x}_t-\frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\cdot{\boldsymbol\epsilon}_\theta({\bold x}_t,t)\right)$$
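With this parameterization, one reverse sampling step only needs the predicted noise. A sketch (the `model(xt, t)` interface and the schedule arrays are assumptions carried over from the snippets above; here $\sigma_t^2=\beta_t$):

```python
import torch

@torch.no_grad()
def p_sample(model, xt, t, betas, alphas, alpha_bars):
    """One reverse step x_{t-1} ~ p_theta(x_{t-1} | x_t) with sigma_t^2 = beta_t. t is 1-based."""
    eps_theta = model(xt, t)                                   # predicted noise epsilon_theta(x_t, t)
    a_t, a_bar_t, b_t = alphas[t - 1], alpha_bars[t - 1], betas[t - 1]
    mean = (xt - b_t / torch.sqrt(1.0 - a_bar_t) * eps_theta) / torch.sqrt(a_t)   # mu_theta(x_t, t)
    if t == 1:
        return mean                                            # no noise added at the final step
    return mean + torch.sqrt(b_t) * torch.randn_like(xt)
```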

Loss Function

Recall that

$$p_\theta\left({\bold x}_{t-1}\middle\vert{\bold x}_t\right)={\cal N}\left({\bold x}_{t-1};{\boldsymbol\mu}_\theta\left({\bold x}_t,t\right),\sigma_t^2{\bold I}\right)$$

where $\sigma_t^2=\beta_t$ or $\tilde\beta_t$. Using the KL divergence between the forward-process posterior and $p_\theta$ (dropping terms independent of $\theta$),

$$\begin{aligned} {\cal L}_{t-1} &=\mathop{\rm KL}\left(q\left({\bold x}_{t-1}\middle\vert{\bold x}_t,{\bold x}_0\right)\middle\Vert p_\theta\left({\bold x}_{t-1}\middle\vert{\bold x}_t\right)\right) \\ &={\bf E}_q\left[\frac{1}{2\sigma_t^2}\left\Vert\tilde{\boldsymbol\mu}_t({\bold x}_t,{\bold x}_0)-{\boldsymbol\mu}_\theta\left({\bold x}_t,t\right)\right\Vert^2\right] \\ &={\bf E}_{{\bold x}_0,{\boldsymbol\epsilon}}\left[\frac{1}{2\sigma_t^2}\left\Vert\frac{1}{\sqrt{\alpha_t}}\left({\bold x}_t-\frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\cdot{\boldsymbol\epsilon}\right)-\frac{1}{\sqrt{\alpha_t}}\left({\bold x}_t-\frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\cdot{\boldsymbol\epsilon}_\theta({\bold x}_t,t)\right)\right\Vert^2\right] \\ &={\bf E}_{{\bold x}_0,{\boldsymbol\epsilon}}\left[\frac{\beta_t^2}{2\sigma_t^2\alpha_t(1-\bar\alpha_t)}\left\Vert{\boldsymbol\epsilon}-{\boldsymbol\epsilon}_\theta\left(\sqrt{\bar\alpha_t}\cdot{\bold x}_0+\sqrt{1-\bar\alpha_t}\cdot{\boldsymbol\epsilon},t\right)\right\Vert^2\right] \end{aligned}$$

A simplified version (with the time-dependent coefficient dropped) is

$${\cal L}_{\rm simp}={\bf E}_{{\bold x}_0,{\boldsymbol\epsilon}}\left[\left\Vert{\boldsymbol\epsilon}-{\boldsymbol\epsilon}_\theta\left(\sqrt{\bar\alpha_t}\cdot{\bold x}_0+\sqrt{1-\bar\alpha_t}\cdot{\boldsymbol\epsilon},t\right)\right\Vert^2\right]$$
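The simplified objective corresponds to a very short training step: sample $t$, sample $\boldsymbol\epsilon$, form ${\bold x}_t$ with the closed-form expression, and regress the noise. A minimal sketch (the `model` interface is a placeholder; `alpha_bars` is the array from the earlier snippets):

```python
import torch
import torch.nn.functional as F

def loss_simple(model, x0, alpha_bars):
    """L_simp = E[ || eps - eps_theta(sqrt(a_bar_t) x0 + sqrt(1 - a_bar_t) eps, t) ||^2 ]."""
    B = x0.shape[0]
    t = torch.randint(1, len(alpha_bars) + 1, (B,), device=x0.device)   # t ~ Uniform{1, ..., T}
    eps = torch.randn_like(x0)
    a_bar = alpha_bars[t - 1].view(-1, *([1] * (x0.dim() - 1)))
    xt = torch.sqrt(a_bar) * x0 + torch.sqrt(1.0 - a_bar) * eps
    return F.mse_loss(model(xt, t), eps)                                # mean squared noise error
```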
