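Preface
In "Score-Based Generative Modeling Through Stochastic Differential Equations", Dr. Yang Song proposed using an SDE (stochastic differential equation) to describe the forward process of a diffusion model, and used the SDE framework to unify the forward and reverse processes of score-based models (NCSN) and DDPM. Moreover, one SDE corresponds to several forward processes: there are many ways of adding noise that carry an image to a noise sample, and among them is one in ODE (ordinary differential equation) form, a deterministic forward process that contains no random variable.
This post summarizes the relationship between SDEs and DDPM and gives the corresponding derivations.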
What is an SDE
The SDE takes the following mathematical form:
$$
dx=f(x,t)dt+g(t)dw\tag{1.0}
$$
Here $f(x,t)$ describes the deterministic change of the variable $x$ over time $t$ (the drift coefficient), $g(t)$ is a function of time $t$ (the diffusion coefficient), and $dw$ is the increment of a Brownian motion, a random term that can be thought of as noise.
The relationship between the SDE and the DDPM forward process
Expanding the differential $dx$ over a small time step and rearranging gives
$$
\begin{aligned}
x_{t+\Delta t}-x_t & = f(x,t)dt+g(t)dw\\
x_{t+\Delta t}& =x_t+f(x,t)dt+g(t)dw\tag{2.0}
\end{aligned}
$$
If we regard $x_t$ as the image at time $t$ of the forward process, then the image $x_{t+\Delta t}$ at the next time $t+\Delta t$ is obtained by adding noise according to Eq. 2.0.
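The update in Eq. 2.0 is one step of the Euler-Maruyama scheme. The following is a minimal sketch in plain Python, assuming a scalar state and a Brownian increment $dw=\sqrt{\Delta t}\,\epsilon$ with $\epsilon\sim\mathcal N(0,1)$. The names `euler_maruyama`, `drift`, `diffusion` and the constant `BETA_BAR` are illustrative choices, not taken from the paper.

```python
import math
import random


def euler_maruyama(x0, f, g, t0, t1, n_steps, rng):
    """Integrate dx = f(x, t) dt + g(t) dw from t0 to t1 in n_steps steps."""
    dt = (t1 - t0) / n_steps
    x, t = x0, t0
    for _ in range(n_steps):
        # Brownian increment over a step of length dt: sqrt(dt) * N(0, 1)
        dw = math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Eq. 2.0: x_{t+dt} = x_t + f(x, t) dt + g(t) dw
        x = x + f(x, t) * dt + g(t) * dw
        t += dt
    return x


BETA_BAR = 2.0  # hypothetical constant noise rate


def drift(x, t):
    # Example drift that pulls x toward zero
    return -0.5 * BETA_BAR * x


def diffusion(t):
    # Example time-independent diffusion coefficient
    return math.sqrt(BETA_BAR)


rng = random.Random(0)
samples = [euler_maruyama(3.0, drift, diffusion, 0.0, 1.0, 100, rng) for _ in range(2000)]
sample_mean = sum(samples) / len(samples)
print(f"sample mean at t=1: {sample_mean:.3f}")
print(f"continuous-time mean 3*exp(-1): {3.0 * math.exp(-1.0):.3f}")
```

With this example drift the continuous-time mean decays as $3e^{-\overline\beta t/2}$, so the sample mean should land close to $3e^{-1}$ up to Monte Carlo and discretization error.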
Next, we briefly derive the relationship between Eq. 2.0 and the DDPM forward process. Recall that the DDPM forward process is
$$
x_{t+\Delta t}=\sqrt{1-\beta_{t+\Delta t}} x_{t}+\sqrt{\beta_{t+\Delta t}} \epsilon_{t} \tag{2.1}
$$
Let $\overline \beta_{t+\Delta t}=T\beta_{t+\Delta t}$ and $\Delta t=\frac{1}{T}$. Then Eq. 2.1 becomes
$$
\begin{aligned}
x_{t+\Delta t}=&\sqrt{1-\beta_{t+\Delta t}} x_{t}+\sqrt{\beta_{t+\Delta t}} \epsilon_{t}\\
=&\sqrt{1-\frac{\overline \beta_{t+\Delta t}}{T}} x_{t}+\sqrt{\beta_{t+\Delta t}} \epsilon_{t}\\
=&\sqrt{1-\overline \beta_{t+\Delta t}\Delta t} x_{t}+\sqrt{\beta_{t+\Delta t}} \epsilon_{t}\tag{2.2}
\end{aligned}
$$
As $\Delta t$ approaches 0, replacing $\sqrt{1-u}$ by its equivalent infinitesimal expansion $1-\frac{u}{2}$ turns Eq. 2.2 into
$$
\begin{aligned}
x_{t+\Delta t}=&\sqrt{1-\beta_{t+\Delta t}} x_{t}+\sqrt{\beta_{t+\Delta t}} \epsilon_{t}\\
=&\sqrt{1-\overline \beta_{t+\Delta t}\Delta t} x_{t}+\sqrt{\beta_{t+\Delta t}} \epsilon_{t}\\
\approx&(1-\frac{1}{2}\overline \beta_{t+\Delta t}\Delta t)x_t+\sqrt{\beta_{t+\Delta t}} \epsilon_{t}\\
=&x_t-\frac{1}{2}\overline \beta_{t+\Delta t}x_t dt+\sqrt{\beta_{t+\Delta t}} \epsilon_{t}\tag{2.3}
\end{aligned}
$$
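Comparing Eq. 2.3 with Eq. 2.0, we identify
$$
\begin{aligned}
f(x,t)&=-\frac{1}{2}\overline \beta_{t+\Delta t}x_t\\
g(t)&=\sqrt{\beta_{t+\Delta t}}\\
dw&=\epsilon_{t}
\end{aligned}
$$
Since $\beta_{t+\Delta t}=\overline\beta_{t+\Delta t}\Delta t$, the noise term can equivalently be written as $\sqrt{\overline\beta_{t+\Delta t}}\sqrt{\Delta t}\,\epsilon_t$, that is, $g(t)=\sqrt{\overline\beta_{t+\Delta t}}$ with $dw=\sqrt{\Delta t}\,\epsilon_t$.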
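To get a feel for how good the linearization $\sqrt{1-\overline\beta\Delta t}\approx 1-\frac{1}{2}\overline\beta\Delta t$ is, the sketch below compares the exact DDPM coefficient of $x_t$ with its first-order version for a growing number of steps $T$. The value of `beta_bar` and the helper names are hypothetical and chosen only for illustration.

```python
import math


def ddpm_coefficients(beta):
    """Coefficients of x_t and epsilon_t in one DDPM forward step (Eq. 2.1)."""
    return math.sqrt(1.0 - beta), math.sqrt(beta)


def linearized_coefficients(beta_bar, dt):
    """First-order coefficients: 1 - beta_bar*dt/2 and sqrt(beta_bar*dt)."""
    return 1.0 - 0.5 * beta_bar * dt, math.sqrt(beta_bar * dt)


beta_bar = 5.0  # hypothetical rescaled noise level, beta_bar = T * beta
gaps = []
for T in (10, 100, 1000, 10000):
    dt = 1.0 / T
    beta = beta_bar * dt
    exact, noise_exact = ddpm_coefficients(beta)
    approx, noise_approx = linearized_coefficients(beta_bar, dt)
    gaps.append(abs(exact - approx))
    print(f"T={T:6d}  sqrt(1-beta)={exact:.8f}  1-beta_bar*dt/2={approx:.8f}  gap={gaps[-1]:.2e}")
```

The gap shrinks roughly like $\beta^2/8$, which is why the identification only holds in the limit of many small steps.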
The SDE of the reverse process
The previous sections showed that the forward process of a diffusion model can be described by an SDE. This section derives the SDE form of the reverse process.
Let $dw=\sqrt{\Delta t}\epsilon$ with $\epsilon$ standard normal. From Eq. 2.0 we obtain
$$
q(x_{t+\Delta t}|x_t)=\mathcal N(x_{t+\Delta t};x_t+f(x_t,t)\Delta t,g^2(t)\Delta t)\tag{3.0}
$$
By Bayes' rule, the reverse process is
$$
\begin{aligned}
q(x_{t}|x_{t+\Delta t})&=\frac{q(x_{t+\Delta t}|x_{t})p(x_{t})}{p(x_{t+\Delta t})}\\
&=q(x_{t+\Delta t}|x_{t})\exp\{\log p(x_t)-\log p(x_{t+\Delta t})\}\tag{3.1}
\end{aligned}
$$
A first-order Taylor expansion gives
$$
\log p(x_{t+\Delta t}) \approx \log p(x_t)+(x_{t+\Delta t}-x_t)\nabla_{x}\log p(x_t)\tag{3.2}
$$
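As a quick numerical illustration of this first-order expansion, the sketch below uses a standard normal density as a stand-in for $p$ (an assumption made only so that the log-density and score have closed forms) and checks that the approximation error shrinks quadratically with the step.

```python
import math


def log_p(x):
    """Log-density of a standard normal, a stand-in for log p(x_t)."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


def score(x):
    """Score d/dx log p(x) of the standard normal."""
    return -x


def first_order(x, x_next):
    """Right-hand side of the first-order Taylor expansion around x."""
    return log_p(x) + (x_next - x) * score(x)


x = 0.7  # hypothetical current state
steps = (0.1, 0.01, 0.001)
errors = [abs(log_p(x + h) - first_order(x, x + h)) for h in steps]
for h, err in zip(steps, errors):
    print(f"step={h:<6}  error={err:.3e}")
```

For this particular density the error is exactly $h^2/2$, so shrinking the step by a factor of 10 shrinks the error by a factor of 100.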
Substituting this into Eq. 3.1 and using Eq. 3.0 (dropping the normalizing constant of the Gaussian), we obtain
$$
\begin{aligned}
q(x_{t}|x_{t+\Delta t})&=q(x_{t+\Delta t}|x_{t})\exp \{-(x_{t+\Delta t}-x_t)\nabla_{x}\log p(x_t)\}\\
&\approx\exp\{-\frac{(x_{t+\Delta t}-x_t-f(x_t,t)\Delta t)^2+2g^2(t)\Delta t (x_{t+\Delta t}-x_t)\nabla_{x}\log p(x_t)}{2g^2(t)\Delta t }\}\tag{3.3}
\end{aligned}
$$
To keep the following expressions short, let
$$
\begin{aligned}
a&=f(x_t,t)\Delta t\\
b&=g^2(t)\Delta t
\end{aligned}
$$
Completing the square in $x_{t+\Delta t}-x_t$ then gives
$$
\begin{aligned}
&(x_{t+\Delta t}-x_t-f(x_t,t)\Delta t)^2+2g^2(t)\Delta t (x_{t+\Delta t}-x_t)\nabla_{x}\log p(x_t)\\
&=(x_{t+\Delta t}-x_t-a)^2+2b(x_{t+\Delta t}-x_t)\nabla_{x}\log p(x_t)\\
&=(x_{t+\Delta t}-x_t)^2-2a(x_{t+\Delta t}-x_t)+a^2+2b(x_{t+\Delta t}-x_t)\nabla_{x}\log p(x_t)\\
&=(x_{t+\Delta t}-x_t)^2-2(a-b\nabla_{x}\log p(x_t))(x_{t+\Delta t}-x_t)+(a-b\nabla_{x}\log p(x_t))^2+a^2-(a-b\nabla_{x}\log p(x_t))^2\\
&=(x_{t+\Delta t}-x_t-(a-b\nabla_{x}\log p(x_t)))^2+a^2-(a-b\nabla_{x}\log p(x_t))^2
\end{aligned}
$$
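The completing-the-square identity is easy to get wrong by a sign, so here is a small numerical check in the scalar case. The variable `d` stands for $x_{t+\Delta t}-x_t$ and `s` for the score $\nabla_x\log p(x_t)$; both names, and the random test values, are introduced here only for illustration.

```python
import random


def lhs(d, a, b, s):
    """(d - a)^2 + 2*b*d*s, the expression before completing the square."""
    return (d - a) ** 2 + 2.0 * b * d * s


def rhs(d, a, b, s):
    """(d - (a - b*s))^2 + a^2 - (a - b*s)^2, the completed square."""
    c = a - b * s
    return (d - c) ** 2 + a ** 2 - c ** 2


rng = random.Random(0)
max_gap = 0.0
for _ in range(1000):
    # Draw arbitrary values for d, a, b and s and compare both sides
    d, a, b, s = (rng.uniform(-2.0, 2.0) for _ in range(4))
    max_gap = max(max_gap, abs(lhs(d, a, b, s) - rhs(d, a, b, s)))
print(f"largest difference over 1000 random draws: {max_gap:.2e}")
```

The two sides agree up to floating-point rounding, which confirms that the shifted mean is $a-b\nabla_x\log p(x_t)$.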
As $\Delta t$ approaches 0, the two constant terms vanish after division by $2b$:
$$
\begin{aligned}
\frac{a^2}{2b}&=\frac{f(x_t,t)^2\Delta t}{2g^2(t)} \rightarrow 0\\
\frac{(a-b\nabla_{x}\log p(x_t))^2}{2b}&=\frac{(f(x_t,t)-g^2(t)\nabla_{x}\log p(x_t))^2\Delta t}{2g^2(t)} \rightarrow 0
\end{aligned}
$$
Hence, as $\Delta t$ approaches 0, Eq. 3.3 becomes
$$
\begin{aligned}
q(x_{t}|x_{t+\Delta t})&=q(x_{t+\Delta t}|x_{t})\exp \{-(x_{t+\Delta t}-x_t)\nabla_{x}\log p(x_t)\}\\
&\approx\exp\{-\frac{(x_{t+\Delta t}-x_t-f(x_t,t)\Delta t)^2+2g^2(t)\Delta t (x_{t+\Delta t}-x_t)\nabla_{x}\log p(x_t)}{2g^2(t)\Delta t }\}\\
&=\exp\{-\frac{(x_{t+\Delta t}-x_t-(f(x_t,t)\Delta t-g^2(t)\Delta t\nabla_{x}\log p(x_t)))^2}{2g^2(t)\Delta t}\}\tag{3.4}
\end{aligned}
$$
Reading off the mean and variance from the exponent of Eq. 3.4 (a Gaussian exponent with denominator $2g^2(t)\Delta t$ has variance $g^2(t)\Delta t$), we get
$$
q(x_t|x_{t+\Delta t})=\mathcal N(x_t|x_{t+\Delta t}-f(x_t,t)\Delta t+g^2(t)\Delta t\nabla_{x}\log p(x_t),g^2(t)\Delta t)\tag{3.5}
$$
Let the noise $z$ follow a standard normal distribution. Written in SDE form, Eq. 3.5 is
$$
\begin{aligned}
x_t&=x_{t+\Delta t}-f(x_t,t)\Delta t+g^2(t)\Delta t\nabla_{x}\log p(x_t)+g(t)\sqrt{\Delta t}z\\
dx&=(f(x_t,t)-g^2(t)\nabla_{x}\log p(x_t))dt-g(t)\sqrt{\Delta t}z\\
&=(f(x_t,t)-g^2(t)\nabla_{x}\log p(x_t))dt+g(t)d\overline{w}
\end{aligned}
$$
where $dx=x_{t+\Delta t}-x_t$ and $d\overline{w}=-\sqrt{\Delta t}z$, which has the same distribution as $\sqrt{\Delta t}z$ because $z$ is symmetric.
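To see the reverse-time update in action, the sketch below runs it on a toy problem where the score is known in closed form. It assumes a constant-rate drift $f=-\frac{1}{2}\overline\beta x$ with $g^2=\overline\beta$, Gaussian data $\mathcal N(M_0,S_0^2)$, the standard Euler-Maruyama noise scale $g\sqrt{\Delta t}$, and it evaluates the drift and score at the current (later) state. All constants and function names are hypothetical; a real model would replace `score` with a trained network.

```python
import math
import random

BETA_BAR = 4.0      # hypothetical constant noise rate, g(t)^2 = BETA_BAR
M0, S0 = 2.0, 0.5   # hypothetical data distribution N(M0, S0^2)
T_END = 1.0


def marginal(t):
    """Mean and variance of p_t for the forward SDE started from N(M0, S0^2)."""
    decay = math.exp(-BETA_BAR * t)
    return M0 * math.sqrt(decay), S0 ** 2 * decay + 1.0 - decay


def score(x, t):
    """Exact score of the Gaussian marginal p_t."""
    mean, var = marginal(t)
    return -(x - mean) / var


def reverse_sample(n_steps, rng):
    """Walk from t = T_END back to t = 0 with the reverse-time update."""
    dt = T_END / n_steps
    mean_T, var_T = marginal(T_END)
    x = rng.gauss(mean_T, math.sqrt(var_T))
    for k in range(n_steps, 0, -1):
        t = k * dt
        f = -0.5 * BETA_BAR * x
        z = rng.gauss(0.0, 1.0)
        # x_{t-dt} = x_t - (f - g^2 * score) dt + g sqrt(dt) z
        x = x - (f - BETA_BAR * score(x, t)) * dt + math.sqrt(BETA_BAR * dt) * z
    return x


rng = random.Random(0)
xs = [reverse_sample(200, rng) for _ in range(1000)]
mean_hat = sum(xs) / len(xs)
std_hat = math.sqrt(sum((x - mean_hat) ** 2 for x in xs) / len(xs))
print(f"recovered mean={mean_hat:.3f} (target {M0}), std={std_hat:.3f} (target {S0})")
```

Starting from the noisy marginal at `T_END`, the samples should end up distributed approximately like the original data, up to Monte Carlo and discretization error.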
The relationship between $\nabla_{x_t}\log p(x_t)$ and the noise $\epsilon$ predicted by DDPM
Score-based models generally use a neural network to fit $\nabla_{x_t}\log p(x_t)$. DDPM is in fact a special kind of score-based model. Recall that the DDPM forward process is
$$
x_t=\sqrt{\bar \alpha_t}x_0+\sqrt{1-\bar\alpha_t}\epsilon_t\tag{4.0}
$$
By Tweedie's formula, we have
$$
\sqrt{\bar \alpha_t}x_0=x_t+(1-\bar\alpha_t)\nabla_{x}\log p(x_t)\tag{4.1}
$$
and therefore
$$
x_t=\sqrt{\bar \alpha_t}x_0-(1-\bar\alpha_t)\nabla_{x}\log p(x_t)\tag{4.2}
$$
Combining Eqs. 4.0 and 4.2 gives
$$
\nabla_{x_t}\log p(x_t)=-\frac{1}{\sqrt{1-\bar\alpha_t}}\epsilon_t\tag{4.3}
$$
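The relation between the score and the noise can be checked numerically for a single data point. The sketch below assumes a fixed scalar $x_0$, so that the relevant density is the Gaussian $\mathcal N(\sqrt{\bar\alpha_t}x_0,\,1-\bar\alpha_t)$ from Eq. 4.0; the function names and the chosen values of `x0` and `alpha_bar` are hypothetical.

```python
import math
import random


def conditional_score(x_t, x0, alpha_bar):
    """Score of N(sqrt(alpha_bar) * x0, 1 - alpha_bar) evaluated at x_t."""
    return -(x_t - math.sqrt(alpha_bar) * x0) / (1.0 - alpha_bar)


def score_from_eps(eps, alpha_bar):
    """The score expressed through the noise: -eps / sqrt(1 - alpha_bar)."""
    return -eps / math.sqrt(1.0 - alpha_bar)


rng = random.Random(0)
x0, alpha_bar = 1.5, 0.3           # hypothetical values
eps = rng.gauss(0.0, 1.0)
# Forward noising step, Eq. 4.0
x_t = math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * eps
gap = abs(conditional_score(x_t, x0, alpha_bar) - score_from_eps(eps, alpha_bar))
print(f"difference between the two expressions: {gap:.2e}")
```

This is why a network trained to predict $\epsilon$ can be read as a score estimator after rescaling by $-1/\sqrt{1-\bar\alpha_t}$.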
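The relationship between the reverse-process SDE and the DDPM reverse process
Before the formal derivation, we make a small change to Eq. 3.1. Taking the Taylor expansion around $x_{t+\Delta t}$ instead of $x_t$, we have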
$$
\log p(x_{t}) \approx \log p(x_{t+\Delta t})+(x_t-x_{t+\Delta t})\nabla_{x}\log p(x_{t+\Delta t})\tag{5.0}
$$
Substituting this into Eq. 3.1 and using Eq. 3.0, we obtain
$$
\begin{aligned}
q(x_{t}|x_{t+\Delta t})&=q(x_{t+\Delta t}|x_{t})\exp\{\log p(x_t)-\log p(x_{t+\Delta t})\}\\
&=q(x_{t+\Delta t}|x_{t})\exp \{-(x_{t+\Delta t}-x_t)\nabla_{x}\log p(x_{t+\Delta t})\}\\
&\approx\exp\{-\frac{(x_{t+\Delta t}-x_t-f(x_t,t)\Delta t)^2+2g^2(t)\Delta t (x_{t+\Delta t}-x_t)\nabla_{x}\log p(x_{t+\Delta t})}{2g^2(t)\Delta t }\}\tag{5.1}
\end{aligned}
$$
We can therefore rewrite Eq. 3.5 as follows, where the variance $g^2(t)\Delta t$ is read off from the denominator $2g^2(t)\Delta t$ of the exponent in Eq. 5.1:
$$
q(x_t|x_{t+\Delta t})=\mathcal N(x_t|x_{t+\Delta t}-f(x_t,t)\Delta t+g^2(t)\Delta t\nabla_{x}\log p(x_{t+\Delta t}),g^2(t)\Delta t)\tag{5.2}
$$
Recall that when the DDPM forward process is written as an SDE, we have
$$
\begin{aligned}
f(x,t)&=-\frac{1}{2}\overline \beta_{t+\Delta t}x_t\\
g(t)&=\sqrt{\beta_{t+\Delta t}}
\end{aligned}
$$
where
$$
\overline \beta_{t+\Delta t}=T\beta_{t+\Delta t}=\frac{\beta_{t+{\Delta t}}}{\Delta t}
$$
and $T=\frac{1}{\Delta t}$. Let $z$ follow a standard normal distribution. Substituting into Eq. 5.2 and using Eq. 4.3, as $\Delta t$ approaches 0 we have
$$
\begin{aligned}
x_t&=x_{t+\Delta t}-f(x_t,t)\Delta t+g^2(t)\Delta t\nabla_{x}\log p(x_{t+\Delta t})+g(t)\sqrt{\Delta t}z\\
x_t&=x_{t+\Delta t}+\frac{1}{2}\overline \beta_{t+\Delta t}x_t\Delta t+\beta_{t+\Delta t}\Delta t \nabla_{x}\log p(x_{t+\Delta t})+\sqrt{\beta_{t+\Delta t}\Delta t}z\\
(1-\frac{1}{2}\overline \beta_{t+\Delta t}\Delta t)x_t&=x_{t+\Delta t}+\beta_{t+\Delta t}\Delta t \nabla_{x}\log p(x_{t+\Delta t})+\sqrt{\beta_{t+\Delta t}\Delta t}z\\
\sqrt{1- \beta_{t+\Delta t}}x_t&\approx x_{t+\Delta t}+\beta_{t+\Delta t}\Delta t \nabla_{x}\log p(x_{t+\Delta t})+\sqrt{\beta_{t+\Delta t}\Delta t}z\\
x_t&\approx \frac{1}{\sqrt{1- \beta_{t+\Delta t}}}(x_{t+\Delta t}+\beta_{t+\Delta t}\Delta t \nabla_{x}\log p(x_{t+\Delta t}))+\sqrt{\frac{\beta_{t+\Delta t}\Delta t}{1-\beta_{t+\Delta t}}}z\\
x_t &\approx \frac{1}{\sqrt{1- \beta_{t+\Delta t}}}(x_{t+\Delta t} -\frac{\beta_{t+\Delta t}\Delta t}{\sqrt{1-\bar\alpha_{t+\Delta t}}}\epsilon_{t+\Delta t})+\sqrt{\frac{\beta_{t+\Delta t}\Delta t}{1-\beta_{t+\Delta t}}}z
\end{aligned}
$$
When $\Delta t=1$, this becomes
$$
x_t\approx \frac{1}{\sqrt{1- \beta_{t+1}}}(x_{t+1} -\frac{\beta_{t+1}}{\sqrt{1-\bar\alpha_{t+1}}}\epsilon_{t+1})+\sqrt{\frac{\beta_{t+1}}{1-\beta_{t+1}}}z
$$
That said, these approximate equalities are rather unsatisfying: the two sides cannot be treated as exactly equivalent.
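The discrete reverse update has the familiar shape of a DDPM sampling step. The sketch below writes it as a function, under a few labeled assumptions: the state is a scalar, `eps_pred` stands for the noise predicted at step $t+1$, the cumulative product `alpha_bar_next` is indexed at step $t+1$ as in the standard DDPM sampler, and the noise scale `sigma` is left to the caller rather than fixed by the code. The function name and the numbers are hypothetical.

```python
import math


def reverse_step(x_next, eps_pred, beta_next, alpha_bar_next, sigma, z):
    """One discrete reverse step from x_{t+1} to x_t.

    mean = (x_{t+1} - beta_{t+1} / sqrt(1 - alpha_bar_{t+1}) * eps) / sqrt(1 - beta_{t+1})
    The noise scale sigma multiplying z is supplied by the caller.
    """
    coeff = beta_next / math.sqrt(1.0 - alpha_bar_next)
    mean = (x_next - coeff * eps_pred) / math.sqrt(1.0 - beta_next)
    return mean + sigma * z


# Hypothetical numbers: with sigma = 0 the step returns the mean only.
x_prev = reverse_step(x_next=1.0, eps_pred=0.5, beta_next=0.19,
                      alpha_bar_next=0.75, sigma=0.0, z=0.0)
print(f"x_t mean: {x_prev:.6f}")
```

In a sampler this function would be called once per step, from the last step down to the first, with `z` redrawn from a standard normal each time.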
Probability Flow (PF) ODE
Many different SDEs can turn an image into a noise sample, and among them is an ODE (obtained by removing the Brownian-motion increment from the SDE).
For the forward process
$$
dx=f(x,t)dt+g(t)dw
$$
the Fokker-Planck equation gives
$$
\begin{aligned}
\frac{\partial p(x,t)}{\partial t} &= -\sum_{i} \frac{\partial}{\partial x_i}[f_i(x,t)p(x,t)] + \frac{1}{2} \sum_{i,j}\frac{\partial^2}{\partial x_i \partial x_j}\left\{[g^2(t)I]_{ij} p(x,t)\right\} \\
&= -\sum_{i} \frac{\partial}{\partial x_i}[f_i(x,t)p(x,t)] + \frac{1}{2} \sum_{i}\frac{\partial^2}{\partial x_i^2}[g^2(t) p(x,t)]
\end{aligned}
$$
Applying an equivalent transformation to this expression, where $\sigma(t)$ is an arbitrary function of time, we have
$$
\begin{aligned}
\frac{\partial p(x,t)}{\partial t} &= -\sum_{i} \frac{\partial}{\partial x_i}[f_i(x,t)p(x,t)] + \frac{1}{2} \sum_{i}\frac{\partial^2}{\partial x_i^2}[g^2(t) p(x,t)] \\
&= -\sum_{i} \frac{\partial}{\partial x_i}[f_i(x,t)p(x,t)] + \frac{1}{2} \sum_{i}\frac{\partial^2}{\partial x_i^2}[(g^2(t) - \sigma^2(t)) p(x,t)] + \frac{1}{2} \sum_{i}\frac{\partial^2}{\partial x_i^2}[\sigma^2(t) p(x,t)] \\
&= -\sum_{i} \frac{\partial}{\partial x_i}[f_i(x,t)p(x,t)] + \frac{1}{2} \sum_{i}\frac{\partial}{\partial x_i} \left[(g^2(t) - \sigma^2(t)) \frac{\partial}{\partial x_i} p(x,t)\right] + \frac{1}{2} \sum_{i}\frac{\partial^2}{\partial x_i^2}[\sigma^2(t) p(x,t)] \\
&= -\sum_{i} \frac{\partial}{\partial x_i}[f_i(x,t)p(x,t)] + \frac{1}{2} \sum_{i}\frac{\partial}{\partial x_i}\left[(g^2(t) - \sigma^2(t)) p(x,t) \frac{\partial}{\partial x_i} \log p(x,t)\right] + \frac{1}{2} \sum_{i}\frac{\partial^2}{\partial x_i^2}[\sigma^2(t) p(x,t)] \\
&= -\sum_{i} \frac{\partial}{\partial x_i} \left[ \left(f_i(x,t) - \frac{1}{2}(g^2(t) - \sigma^2(t)) \frac{\partial}{\partial x_i} \log p(x,t) \right)p(x,t) \right] + \frac{1}{2} \sum_{i}\frac{\partial^2}{\partial x_i^2}[\sigma^2(t) p(x,t)]
\end{aligned}
$$
By the correspondence between a Fokker-Planck equation and its SDE, this is the Fokker-Planck equation of
$$
dx = \left(f(x,t) - \frac{1}{2}(g^2(t) - \sigma^2(t)) \nabla_{x} \log p(x,t) \right) dt + \sigma(t)dw
$$
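The claim behind this family of SDEs is that every choice of $\sigma(t)$ produces the same marginals $p(x,t)$. The sketch below illustrates this on a toy problem with a closed-form score, assuming a constant-rate drift $f=-\frac{1}{2}\overline\beta x$ with $g^2=\overline\beta$, Gaussian data, and a constant $\sigma$. All names and constants are hypothetical, and the agreement is only up to Monte Carlo and discretization error.

```python
import math
import random

BETA_BAR = 2.0       # hypothetical constant noise rate, g(t)^2 = BETA_BAR
M0, S0 = 1.0, 0.5    # hypothetical data distribution N(M0, S0^2)
T_END = 1.0


def marginal(t):
    """Mean and variance of p(x, t) for Gaussian data under the forward SDE."""
    decay = math.exp(-BETA_BAR * t)
    return M0 * math.sqrt(decay), S0 ** 2 * decay + 1.0 - decay


def score(x, t):
    """Exact score of the Gaussian marginal p(x, t)."""
    mean, var = marginal(t)
    return -(x - mean) / var


def simulate(sigma, n_steps, rng):
    """Forward Euler-Maruyama run of the generalized SDE with constant sigma."""
    dt = T_END / n_steps
    x = rng.gauss(M0, S0)
    for k in range(n_steps):
        t = k * dt
        # drift = f - 0.5 * (g^2 - sigma^2) * score
        drift = -0.5 * BETA_BAR * x - 0.5 * (BETA_BAR - sigma ** 2) * score(x, t)
        x = x + drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x


rng = random.Random(0)
target_mean, target_var = marginal(T_END)
results = {}
for sigma in (0.0, 1.0, math.sqrt(BETA_BAR)):
    xs = [simulate(sigma, 100, rng) for _ in range(1000)]
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    results[sigma] = (mean, var)
    print(f"sigma={sigma:.3f}  mean={mean:.3f}  var={var:.3f}")
print(f"target       mean={target_mean:.3f}  var={target_var:.3f}")
```

Setting `sigma` to $g$ recovers the original forward SDE, while `sigma = 0` removes the noise term entirely and leaves a deterministic flow.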
In particular, when $\sigma(t)=0$, we have
$$
dx = \left(f(x,t) - \frac{1}{2}g^2(t) \nabla_{x} \log p(x,t) \right) dt
$$
This equation is known as the Probability Flow (PF) ODE.
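Because the PF ODE is deterministic, it can be integrated with any ODE solver. The sketch below uses plain Euler steps to carry a point from $t=1$ back to $t=0$ on a toy problem, assuming $f=-\frac{1}{2}\overline\beta x$, $g^2=\overline\beta$ and zero-mean Gaussian data, for which the exact solution keeps $x/\sqrt{\mathrm{Var}(t)}$ constant along the trajectory. The constants and function names are hypothetical.

```python
import math

BETA_BAR = 2.0   # hypothetical constant noise rate, g(t)^2 = BETA_BAR
S0 = 0.5         # hypothetical data distribution N(0, S0^2)


def variance(t):
    """Variance of p(x, t) for zero-mean Gaussian data under the forward SDE."""
    decay = math.exp(-BETA_BAR * t)
    return S0 ** 2 * decay + 1.0 - decay


def pf_ode_drift(x, t):
    """f(x, t) - 0.5 * g(t)^2 * score(x, t) for the toy Gaussian problem."""
    score = -x / variance(t)
    return -0.5 * BETA_BAR * x - 0.5 * BETA_BAR * score


def integrate_backward(x_end, t_end, n_steps):
    """Euler integration of the PF ODE from t = t_end down to t = 0."""
    dt = t_end / n_steps
    x = x_end
    for k in range(n_steps, 0, -1):
        x = x - pf_ode_drift(x, k * dt) * dt
    return x


x_T = 1.2  # hypothetical starting point at t = 1
x_0 = integrate_backward(x_T, 1.0, 2000)
x_0_exact = x_T * math.sqrt(variance(0.0) / variance(1.0))
print(f"Euler: {x_0:.5f}  closed form: {x_0_exact:.5f}")
```

Running the same starting point twice gives the same result, which is the practical difference from the reverse SDE: the map between noise and data is a fixed, invertible function.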