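Deep Learning (Generative Models): Score-Based Generative Modeling Through Stochastic Differential Equations

Preface

In "Score-Based Generative Modeling Through Stochastic Differential Equations", Dr. Yang Song proposes describing the forward process of a diffusion model with an SDE (stochastic differential equation), and uses SDEs to unify the forward and reverse processes of the score-based model (NCSN) and DDPM. In addition, one SDE corresponds to many forward processes: there are multiple ways of adding noise that take an image to a noise sample, and among them is a forward process in ODE (ordinary differential equation) form, i.e. a deterministic forward process with no random variable.

This post summarizes the relationship between SDEs and DDPM and gives the corresponding derivations.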

What is an SDE

The mathematical form of an SDE is:

$$
dx=f(x,t)\,dt+g(t)\,dw\tag{1.0}
$$

Here $f(x,t)$ describes the deterministic change of the variable $x$ over time $t$ (it is called the drift coefficient), $g(t)$ is a function of time $t$ (it is called the diffusion coefficient), and $dw$ is the increment of a Brownian motion, a random term that can be thought of as noise.
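To make Eq. 1.0 concrete, here is a minimal Euler-Maruyama sketch that simulates an SDE of this form by drawing each Brownian increment as $\sqrt{\Delta t}$ times a standard normal sample. The function name `euler_maruyama` and the example coefficients ($f(x,t)=-\frac{1}{2}x$, $g(t)=1$) are illustrative assumptions, not something taken from the paper.

```python
import numpy as np

def euler_maruyama(f, g, x0, t0, t1, n_steps, rng):
    """Simulate dx = f(x, t) dt + g(t) dw from t0 to t1 with a fixed step."""
    dt = (t1 - t0) / n_steps
    x = np.array(x0, dtype=float)
    t = t0
    for _ in range(n_steps):
        # Brownian increment over a step of length dt: sqrt(dt) * N(0, 1)
        dw = np.sqrt(dt) * rng.standard_normal(x.shape)
        x = x + f(x, t) * dt + g(t) * dw
        t += dt
    return x

# Hypothetical example: an Ornstein-Uhlenbeck process with f = -x/2 and g = 1,
# whose stationary variance is 1. All 20000 paths start at x = 0.
rng = np.random.default_rng(0)
x_end = euler_maruyama(lambda x, t: -0.5 * x, lambda t: 1.0,
                       np.zeros(20000), 0.0, 5.0, 500, rng)
em_variance = float(x_end.var())
print(em_variance)  # should be close to 1
```

The drift term pulls samples toward zero while the diffusion term keeps injecting noise, which is the same interplay the forward diffusion process relies on.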

Relationship between the SDE and the DDPM forward process

Expanding the differential $dx$ in Eq. 1.0 over a small step $\Delta t$ and rearranging gives

$$
\begin{aligned}
x_{t+\Delta t}-x_t &= f(x,t)\,dt+g(t)\,dw\\
x_{t+\Delta t} &= x_t+f(x,t)\,dt+g(t)\,dw
\end{aligned}\tag{2.0}
$$

If we regard $x_t$ as the image at time $t$ of the forward process, then the image $x_{t+\Delta t}$ at the next time $t+\Delta t$ is obtained by adding noise according to Eq. 2.0.

Next, we briefly derive the relationship between Eq. 2.0 and the DDPM forward process. The DDPM forward process is

$$
x_{t+\Delta t}=\sqrt{1-\beta_{t+\Delta t}}\,x_{t}+\sqrt{\beta_{t+\Delta t}}\,\epsilon_{t}\tag{2.1}
$$

Let $\overline\beta_{t+\Delta t}=T\beta_{t+\Delta t}$ and $\Delta t=\frac{1}{T}$, so that $\beta_{t+\Delta t}=\overline\beta_{t+\Delta t}\Delta t$. Eq. 2.1 then becomes

$$
\begin{aligned}
x_{t+\Delta t}&=\sqrt{1-\beta_{t+\Delta t}}\,x_{t}+\sqrt{\beta_{t+\Delta t}}\,\epsilon_{t}\\
&=\sqrt{1-\frac{\overline\beta_{t+\Delta t}}{T}}\,x_{t}+\sqrt{\frac{\overline\beta_{t+\Delta t}}{T}}\,\epsilon_{t}\\
&=\sqrt{1-\overline\beta_{t+\Delta t}\Delta t}\,x_{t}+\sqrt{\overline\beta_{t+\Delta t}\Delta t}\,\epsilon_{t}
\end{aligned}\tag{2.2}
$$

As $\Delta t$ tends to 0, the first-order approximation $\sqrt{1-u}\approx 1-\frac{1}{2}u$ applied to Eq. 2.2 gives

$$
\begin{aligned}
x_{t+\Delta t}&=\sqrt{1-\overline\beta_{t+\Delta t}\Delta t}\,x_{t}+\sqrt{\overline\beta_{t+\Delta t}\Delta t}\,\epsilon_{t}\\
&\approx\left(1-\frac{1}{2}\overline\beta_{t+\Delta t}\Delta t\right)x_t+\sqrt{\overline\beta_{t+\Delta t}\Delta t}\,\epsilon_{t}\\
&=x_t-\frac{1}{2}\overline\beta_{t+\Delta t}\,x_t\,\Delta t+\sqrt{\overline\beta_{t+\Delta t}}\,\sqrt{\Delta t}\,\epsilon_{t}
\end{aligned}\tag{2.3}
$$

Comparing Eq. 2.3 with Eq. 2.0 (with $dt=\Delta t$), we get

$$
\begin{aligned}
f(x,t)&=-\frac{1}{2}\overline\beta_{t+\Delta t}\,x_t\\
g(t)&=\sqrt{\overline\beta_{t+\Delta t}}\\
dw&=\sqrt{\Delta t}\,\epsilon_{t}
\end{aligned}
$$
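As a quick numerical sanity check of the approximation used above, the sketch below compares the exact DDPM factor $\sqrt{1-\beta}$ with its first-order form $1-\frac{1}{2}\overline\beta\Delta t$, and then runs the discrete forward chain of Eq. 2.1 to confirm that the samples end up close to a standard normal. The linear schedule from `1e-4` to `0.02` with `T = 1000` is an assumption chosen for illustration.

```python
import numpy as np

T = 1000                              # number of DDPM steps (assumed)
dt = 1.0 / T
betas = np.linspace(1e-4, 0.02, T)    # assumed linear beta schedule
beta_bar = T * betas                  # rescaled schedule, beta_bar = T * beta

# Drift factor: exact DDPM coefficient vs. its first-order approximation.
exact = np.sqrt(1.0 - betas)
approx = 1.0 - 0.5 * beta_bar * dt
max_gap = float(np.max(np.abs(exact - approx)))

# Run the discrete forward chain of Eq. 2.1 from a fixed starting value.
rng = np.random.default_rng(0)
x = np.full(20000, 2.0)
for beta in betas:
    x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * rng.standard_normal(x.shape)

final_mean, final_var = float(x.mean()), float(x.var())
print(max_gap, final_mean, final_var)
```

The gap between the two factors is of order $\beta^2$, which is why the approximation becomes exact in the limit of small steps.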

SDE of the reverse process

We have seen that the forward process of a diffusion model can be described by an SDE. This section derives the SDE form of the reverse process.

Let $dw=\sqrt{\Delta t}\,\epsilon$. From Eq. 2.0 we get

$$
q(x_{t+\Delta t}|x_t)=\mathcal N\left(x_{t+\Delta t};\,x_t+f(x_t,t)\Delta t,\;g^2(t)\Delta t\right)\tag{3.0}
$$

By Bayes' rule, the reverse process is

$$
\begin{aligned}
q(x_{t}|x_{t+\Delta t})&=\frac{q(x_{t+\Delta t}|x_{t})\,p(x_{t})}{p(x_{t+\Delta t})}\\
&=q(x_{t+\Delta t}|x_{t})\exp\{\log p(x_t)-\log p(x_{t+\Delta t})\}
\end{aligned}\tag{3.1}
$$

A first-order Taylor expansion gives

$$
\log p(x_{t+\Delta t}) \approx \log p(x_t)+(x_{t+\Delta t}-x_t)\nabla_{x}\log p(x_t)\tag{3.2}
$$

Substituting into Eq. 3.1 and using Eq. 3.0, we get

$$
\begin{aligned}
q(x_{t}|x_{t+\Delta t})&\approx q(x_{t+\Delta t}|x_{t})\exp \{-(x_{t+\Delta t}-x_t)\nabla_{x}\log p(x_t)\}\\
&\propto\exp\left\{-\frac{(x_{t+\Delta t}-x_t-f(x_t,t)\Delta t)^2+2g^2(t)\Delta t\,(x_{t+\Delta t}-x_t)\nabla_{x}\log p(x_t)}{2g^2(t)\Delta t }\right\}
\end{aligned}\tag{3.3}
$$

To simplify the notation, let

$$
\begin{aligned}
a&=f(x_t,t)\Delta t\\
b&=g^2(t)\Delta t
\end{aligned}
$$

Completing the square in the numerator then gives

$$
\begin{aligned}
&(x_{t+\Delta t}-x_t-f(x_t,t)\Delta t)^2+2g^2(t)\Delta t\,(x_{t+\Delta t}-x_t)\nabla_{x}\log p(x_t)\\
&=(x_{t+\Delta t}-x_t-a)^2+2b(x_{t+\Delta t}-x_t)\nabla_{x}\log p(x_t)\\
&=(x_{t+\Delta t}-x_t)^2-2a(x_{t+\Delta t}-x_t)+a^2+2b(x_{t+\Delta t}-x_t)\nabla_{x}\log p(x_t)\\
&=(x_{t+\Delta t}-x_t)^2-2(a-b\nabla_{x}\log p(x_t))(x_{t+\Delta t}-x_t)+(a-b\nabla_{x}\log p(x_t))^2+a^2-(a-b\nabla_{x}\log p(x_t))^2\\
&=(x_{t+\Delta t}-x_t-(a-b\nabla_{x}\log p(x_t)))^2+a^2-(a-b\nabla_{x}\log p(x_t))^2
\end{aligned}
$$

As $\Delta t$ tends to 0, the two leftover terms vanish once divided by $2b$:

$$
\begin{aligned}
\frac{a^2}{2b}&=\frac{f(x_t,t)^2\,\Delta t}{2g^2(t)} \rightarrow 0\\
\frac{(a-b\nabla_{x}\log p(x_t))^2}{2b}&=\frac{(f(x_t,t)-g^2(t)\nabla_{x}\log p(x_t))^2\,\Delta t}{2g^2(t)} \rightarrow 0
\end{aligned}
$$

So as $\Delta t$ tends to 0, Eq. 3.3 becomes

$$
\begin{aligned}
q(x_{t}|x_{t+\Delta t})&\approx q(x_{t+\Delta t}|x_{t})\exp \{-(x_{t+\Delta t}-x_t)\nabla_{x}\log p(x_t)\}\\
&\propto\exp\left\{-\frac{(x_{t+\Delta t}-x_t-f(x_t,t)\Delta t)^2+2g^2(t)\Delta t\,(x_{t+\Delta t}-x_t)\nabla_{x}\log p(x_t)}{2g^2(t)\Delta t }\right\}\\
&\approx\exp\left\{-\frac{(x_{t+\Delta t}-x_t-(f(x_t,t)\Delta t-g^2(t)\Delta t\,\nabla_{x}\log p(x_t)))^2}{2g^2(t)\Delta t}\right\}
\end{aligned}\tag{3.4}
$$

The exponent in Eq. 3.4 is that of a Gaussian in $x_t$ with variance $g^2(t)\Delta t$, so

$$
q(x_t|x_{t+\Delta t})=\mathcal N\left(x_t;\,x_{t+\Delta t}-f(x_t,t)\Delta t+g^2(t)\Delta t\,\nabla_{x}\log p(x_t),\;g^2(t)\Delta t\right)\tag{3.5}
$$

Let the noise $z$ follow a standard normal distribution. Writing Eq. 3.5 in SDE form, with $dx=x_{t+\Delta t}-x_t$ and $d\overline{w}=-\sqrt{\Delta t}\,z$ (which has the same distribution as $\sqrt{\Delta t}\,z$), gives

$$
\begin{aligned}
x_t&=x_{t+\Delta t}-f(x_t,t)\Delta t+g^2(t)\Delta t\,\nabla_{x}\log p(x_t)+g(t)\sqrt{\Delta t}\,z\\
dx&=(f(x_t,t)-g^2(t)\nabla_{x}\log p(x_t))\,dt-g(t)\sqrt{\Delta t}\,z\\
&=(f(x_t,t)-g^2(t)\nabla_{x}\log p(x_t))\,dt+g(t)\,d\overline{w}
\end{aligned}
$$
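The reverse SDE $dx=(f-g^2\nabla_x\log p)\,dt+g\,d\overline{w}$ can be tried out on a toy problem where the score is known in closed form. The sketch below assumes one-dimensional Gaussian data $x_0\sim\mathcal N(0, 0.25)$ and a constant $\overline\beta=2$, so that $f=-\frac{1}{2}\overline\beta x$, $g^2=\overline\beta$ and $p(x_t)$ is a zero-mean Gaussian with variance $1+(0.25-1)e^{-\overline\beta t}$. These choices, and the names `marginal_var`, `score` and `reverse_sde_sample`, are assumptions made for illustration. The drift and score are evaluated at the current sample $x_{t+\Delta t}$, which agrees with the derivation to first order in $\Delta t$.

```python
import numpy as np

beta = 2.0      # assumed constant beta_bar: f = -0.5 * beta * x, g^2 = beta
s0_sq = 0.25    # variance of the toy data distribution N(0, s0_sq)

def marginal_var(t):
    # Variance of p(x_t) for this toy forward SDE.
    return 1.0 + (s0_sq - 1.0) * np.exp(-beta * t)

def score(x, t):
    # Score of a zero-mean Gaussian: grad_x log p(x_t) = -x / var_t.
    return -x / marginal_var(t)

def reverse_sde_sample(n_samples, n_steps, rng):
    dt = 1.0 / n_steps
    # Start from the exact marginal at t = 1.
    x = np.sqrt(marginal_var(1.0)) * rng.standard_normal(n_samples)
    for k in range(n_steps, 0, -1):
        t = k * dt
        f = -0.5 * beta * x
        g = np.sqrt(beta)
        z = rng.standard_normal(n_samples)
        # One step backwards in time: x_t = x_{t+dt} - (f - g^2 * score) dt + g sqrt(dt) z
        x = x - (f - g**2 * score(x, t)) * dt + g * np.sqrt(dt) * z
    return x

rng = np.random.default_rng(0)
samples = reverse_sde_sample(20000, 1000, rng)
reverse_var = float(samples.var())
print(reverse_var)  # should be close to s0_sq = 0.25
```

In a real model the closed-form `score` would be replaced by a trained network; everything else in the loop stays the same.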

Relationship between $\nabla_{x_t}\log p(x_t)$ and the noise $\epsilon$ predicted by DDPM

A score-based model generally uses a neural network to fit $\nabla_{x_t}\log p(x_t)$. DDPM is in fact a special kind of score-based model. The DDPM forward process is

$$
x_t=\sqrt{\bar \alpha_t}\,x_0+\sqrt{1-\bar\alpha_t}\,\epsilon_t\tag{4.0}
$$

By Tweedie's formula (with $x_0$ understood as its posterior mean given $x_t$), we have

$$
\sqrt{\bar \alpha_t}\,x_0=x_t+(1-\bar\alpha_t)\nabla_{x}\log p(x_t)\tag{4.1}
$$

and therefore

$$
x_t=\sqrt{\bar \alpha_t}\,x_0-(1-\bar\alpha_t)\nabla_{x}\log p(x_t)\tag{4.2}
$$

Combining Eq. 4.0 and Eq. 4.2 gives

$$
\nabla_{x_t}\log p(x_t)=-\frac{1}{\sqrt{1-\bar\alpha_t}}\,\epsilon_t\tag{4.3}
$$
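The relation in Eq. 4.3 is easy to check numerically for the conditional density $p(x_t|x_0)=\mathcal N(\sqrt{\bar\alpha_t}x_0,(1-\bar\alpha_t)I)$ implied by Eq. 4.0. The sketch below differentiates its log-density by central finite differences and compares the result with $-\epsilon_t/\sqrt{1-\bar\alpha_t}$. The value `alpha_bar = 0.7` and the helper `log_p_cond` are hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha_bar = 0.7                      # hypothetical value of \bar{alpha}_t
x0 = rng.standard_normal(5)
eps = rng.standard_normal(5)
x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps   # Eq. 4.0

def log_p_cond(x):
    # Per-coordinate log N(x; sqrt(alpha_bar) * x0, 1 - alpha_bar), up to a constant.
    return -(x - np.sqrt(alpha_bar) * x0) ** 2 / (2.0 * (1.0 - alpha_bar))

h = 1e-5
# Central finite difference of the log-density, coordinate by coordinate.
score_fd = (log_p_cond(x_t + h) - log_p_cond(x_t - h)) / (2.0 * h)
# Score predicted by Eq. 4.3 from the noise that was actually added.
score_from_eps = -eps / np.sqrt(1.0 - alpha_bar)

print(np.max(np.abs(score_fd - score_from_eps)))
```

This is why a network trained to predict $\epsilon$ can be read as a score estimator after rescaling by $-1/\sqrt{1-\bar\alpha_t}$.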

Relationship between the reverse SDE and the DDPM reverse process

Before the main derivation, we make a small change to how Eq. 3.1 is expanded. Taylor-expanding around $x_{t+\Delta t}$ instead of $x_t$ gives

$$
\log p(x_{t}) \approx \log p(x_{t+\Delta t})+(x_t-x_{t+\Delta t})\nabla_{x}\log p(x_{t+\Delta t})\tag{5.0}
$$

Substituting into Eq. 3.1 and using Eq. 3.0, we get

$$
\begin{aligned}
q(x_{t}|x_{t+\Delta t})&=q(x_{t+\Delta t}|x_{t})\exp\{\log p(x_t)-\log p(x_{t+\Delta t})\}\\
&\approx q(x_{t+\Delta t}|x_{t})\exp \{-(x_{t+\Delta t}-x_t)\nabla_{x}\log p(x_{t+\Delta t})\}\\
&\propto\exp\left\{-\frac{(x_{t+\Delta t}-x_t-f(x_t,t)\Delta t)^2+2g^2(t)\Delta t\,(x_{t+\Delta t}-x_t)\nabla_{x}\log p(x_{t+\Delta t})}{2g^2(t)\Delta t }\right\}
\end{aligned}\tag{5.1}
$$

Repeating the completion of the square, we can therefore rewrite Eq. 3.5 as

$$
q(x_t|x_{t+\Delta t})=\mathcal N\left(x_t;\,x_{t+\Delta t}-f(x_t,t)\Delta t+g^2(t)\Delta t\,\nabla_{x}\log p(x_{t+\Delta t}),\;g^2(t)\Delta t\right)\tag{5.2}
$$

Recall that when the DDPM forward process is written as an SDE,

$$
\begin{aligned}
f(x,t)&=-\frac{1}{2}\overline \beta_{t+\Delta t}\,x_t\\
g(t)&=\sqrt{\overline\beta_{t+\Delta t}}
\end{aligned}
$$

where

$$
\overline \beta_{t+\Delta t}=T\beta_{t+\Delta t}=\frac{\beta_{t+\Delta t}}{\Delta t},\qquad T=\frac{1}{\Delta t}
$$

so that $f(x_t,t)\Delta t=-\frac{1}{2}\beta_{t+\Delta t}\,x_t$ and $g^2(t)\Delta t=\beta_{t+\Delta t}$. Let $z$ follow a standard normal distribution. Substituting into Eq. 5.2 and using Eq. 4.3, as $\Delta t$ tends to 0 we have

$$
\begin{aligned}
x_t&=x_{t+\Delta t}-f(x_t,t)\Delta t+g^2(t)\Delta t\,\nabla_{x}\log p(x_{t+\Delta t})+g(t)\sqrt{\Delta t}\,z\\
x_t&=x_{t+\Delta t}+\frac{1}{2}\beta_{t+\Delta t}\,x_t+\beta_{t+\Delta t}\nabla_{x}\log p(x_{t+\Delta t})+\sqrt{\beta_{t+\Delta t}}\,z\\
\left(1-\frac{1}{2}\beta_{t+\Delta t}\right)x_t&=x_{t+\Delta t}+\beta_{t+\Delta t}\nabla_{x}\log p(x_{t+\Delta t})+\sqrt{\beta_{t+\Delta t}}\,z\\
\sqrt{1- \beta_{t+\Delta t}}\,x_t&\approx x_{t+\Delta t}+\beta_{t+\Delta t}\nabla_{x}\log p(x_{t+\Delta t})+\sqrt{\beta_{t+\Delta t}}\,z\\
x_t&\approx \frac{1}{\sqrt{1- \beta_{t+\Delta t}}}\left(x_{t+\Delta t}+\beta_{t+\Delta t}\nabla_{x}\log p(x_{t+\Delta t})\right)+\sqrt{\frac{\beta_{t+\Delta t}}{1-\beta_{t+\Delta t}}}\,z\\
x_t &\approx \frac{1}{\sqrt{1- \beta_{t+\Delta t}}}\left(x_{t+\Delta t} -\frac{\beta_{t+\Delta t}}{\sqrt{1-\bar\alpha_{t+\Delta t}}}\,\epsilon_{t+\Delta t}\right)+\sqrt{\frac{\beta_{t+\Delta t}}{1-\beta_{t+\Delta t}}}\,z
\end{aligned}
$$

Taking one DDPM step as the unit of time, i.e. writing $t+\Delta t$ as $t+1$, gives

$$
x_t\approx \frac{1}{\sqrt{1- \beta_{t+1}}}\left(x_{t+1} -\frac{\beta_{t+1}}{\sqrt{1-\bar\alpha_{t+1}}}\,\epsilon_{t+1}\right)+\sqrt{\frac{\beta_{t+1}}{1-\beta_{t+1}}}\,z
$$

which has the same form as the DDPM reverse step.

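That said, all those approximately-equal signs are rather unsatisfying: the two processes cannot be treated as exactly equivalent.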
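For comparison, the ancestral sampling step from the DDPM paper is $x_{t-1}=\frac{1}{\sqrt{1-\beta_t}}\left(x_t-\frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\epsilon_\theta(x_t,t)\right)+\sigma_t z$, with $\sigma_t^2=\beta_t$ as one of the choices made there. The sketch below runs that step on a toy problem, assuming Gaussian data $x_0\sim\mathcal N(0,0.25)$ so that the ideal noise prediction is available in closed form through Eq. 4.3. The linear schedule and the helper `eps_optimal` are illustrative assumptions standing in for a trained network.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # assumed linear schedule
alpha_bars = np.cumprod(1.0 - betas)
s0_sq = 0.25                             # toy data distribution: x_0 ~ N(0, s0_sq)

def eps_optimal(x, abar):
    # For Gaussian data, p(x_t) = N(0, abar * s0_sq + 1 - abar), so the score is
    # -x / var_t and Eq. 4.3 gives eps = -sqrt(1 - abar) * score.
    var_t = abar * s0_sq + 1.0 - abar
    return np.sqrt(1.0 - abar) * x / var_t

rng = np.random.default_rng(0)
x = rng.standard_normal(20000)           # start from x_T ~ N(0, 1)
for i in range(T - 1, -1, -1):
    beta, abar = betas[i], alpha_bars[i]
    # No noise is added on the final step.
    z = rng.standard_normal(x.shape) if i > 0 else 0.0
    mean = (x - beta / np.sqrt(1.0 - abar) * eps_optimal(x, abar)) / np.sqrt(1.0 - beta)
    x = mean + np.sqrt(beta) * z

ddpm_var = float(x.var())
print(ddpm_var)  # should be close to s0_sq = 0.25
```

The mean of each step has the same structure as the expression derived from the reverse SDE, and the two noise scales differ only at higher order in $\beta$.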

Probability Flow (PF) ODE

There are many SDEs that can turn an image into a noise sample, and they include an ODE (obtained by removing the Brownian motion increment from the SDE).

For the forward process

$$
dx=f(x,t)\,dt+g(t)\,dw
$$

the Fokker-Planck equation gives

$$
\begin{aligned}
\frac{\partial p(x,t)}{\partial t} &= -\sum_{i} \frac{\partial}{\partial x_i}[f_i(x,t)p(x,t)] + \frac{1}{2} \sum_{i,j}\frac{\partial^2}{\partial x_i \partial x_j}\left\{[g^2(t)I]_{ij}\, p(x,t)\right\} \\
&= -\sum_{i} \frac{\partial}{\partial x_i}[f_i(x,t)p(x,t)] + \frac{1}{2} \sum_{i}\frac{\partial^2}{\partial x_i^2}[g^2(t) p(x,t)]
\end{aligned}
$$

Rewriting this equation in an equivalent form, for any $\sigma(t)$ we have

$$
\begin{aligned}
\frac{\partial p(x,t)}{\partial t} &= -\sum_{i} \frac{\partial}{\partial x_i}[f_i(x,t)p(x,t)] + \frac{1}{2} \sum_{i}\frac{\partial^2}{\partial x_i^2}[g^2(t) p(x,t)] \\
&= -\sum_{i} \frac{\partial}{\partial x_i}[f_i(x,t)p(x,t)] + \frac{1}{2} \sum_{i}\frac{\partial^2}{\partial x_i^2}[(g^2(t) - \sigma^2(t)) p(x,t)] + \frac{1}{2} \sum_{i}\frac{\partial^2}{\partial x_i^2}[\sigma^2(t) p(x,t)] \\
&= -\sum_{i} \frac{\partial}{\partial x_i}[f_i(x,t)p(x,t)] + \frac{1}{2} \sum_{i}\frac{\partial}{\partial x_i} \left[(g^2(t) - \sigma^2(t)) \frac{\partial}{\partial x_i} p(x,t)\right] + \frac{1}{2} \sum_{i}\frac{\partial^2}{\partial x_i^2}[\sigma^2(t) p(x,t)] \\
&= -\sum_{i} \frac{\partial}{\partial x_i}[f_i(x,t)p(x,t)] + \frac{1}{2} \sum_{i}\frac{\partial}{\partial x_i}\left[(g^2(t) - \sigma^2(t))\, p(x,t) \frac{\partial}{\partial x_i} \log p(x,t)\right] + \frac{1}{2} \sum_{i}\frac{\partial^2}{\partial x_i^2}[\sigma^2(t) p(x,t)] \\
&= -\sum_{i} \frac{\partial}{\partial x_i} \left[ \left(f_i(x,t) - \frac{1}{2}(g^2(t) - \sigma^2(t)) \frac{\partial}{\partial x_i} \log p(x,t) \right)p(x,t) \right] + \frac{1}{2} \sum_{i}\frac{\partial^2}{\partial x_i^2}[\sigma^2(t) p(x,t)]
\end{aligned}
$$

Reading this off as the Fokker-Planck equation of a new SDE, we get

$$
dx = \left(f(x,t) - \frac{1}{2}(g^2(t) - \sigma^2(t)) \nabla_{x} \log p(x,t) \right) dt + \sigma(t)\,dw
$$

In particular, when $\sigma(t)=0$ we have

$$
dx = \left(f(x,t) - \frac{1}{2}g^2(t) \nabla_{x} \log p(x,t) \right) dt
$$

This equation is called the Probability Flow (PF) ODE.
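The claim that every choice of $\sigma(t)$ yields the same marginals $p(x,t)$ can be checked on a toy problem. The sketch below assumes one-dimensional Gaussian data $x_0\sim\mathcal N(0,0.25)$ and a constant $\overline\beta=2$ (so $f=-\frac{1}{2}\overline\beta x$ and $g^2=\overline\beta$), for which the score is known in closed form. It integrates the generalized SDE forward with Euler-Maruyama for $\sigma^2=g^2$ (the original SDE), $\sigma^2=\frac{1}{2}g^2$, and $\sigma^2=0$ (the PF ODE). The names `marginal_var` and `simulate` are assumptions made for this illustration.

```python
import numpy as np

beta = 2.0      # assumed constant beta_bar: f = -0.5 * beta * x, g^2 = beta
s0_sq = 0.25    # toy data distribution: x_0 ~ N(0, s0_sq)

def marginal_var(t):
    # Variance of p(x, t) under the original forward SDE.
    return 1.0 + (s0_sq - 1.0) * np.exp(-beta * t)

def simulate(sigma_sq, n_samples=20000, n_steps=1000, seed=0):
    """Integrate dx = (f - 0.5 (g^2 - sigma^2) score) dt + sigma dw from t=0 to t=1."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    x = np.sqrt(s0_sq) * rng.standard_normal(n_samples)
    for k in range(n_steps):
        t = k * dt
        score = -x / marginal_var(t)          # closed-form score of N(0, var_t)
        drift = -0.5 * beta * x - 0.5 * (beta - sigma_sq) * score
        noise = np.sqrt(sigma_sq * dt) * rng.standard_normal(n_samples)
        x = x + drift * dt + noise            # noise is identically zero for the PF ODE
    return float(x.var())

variances = {s: simulate(s) for s in (beta, 0.5 * beta, 0.0)}
target = marginal_var(1.0)
print(target, variances)  # all three variances should be close to target
```

With $\sigma^2=0$ each sample follows a deterministic trajectory, yet the population still spreads out to the same variance as under the noisy SDE.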
