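I have recently been studying diffusion models, which led me to the time-reversal formula, reverse-time processes, and the stochastic-process and stochastic-analysis theory behind them. It left my head spinning, so I wrote up these notes.
The SDE of interest is
$$dX_t=f(X_t,t)\,dt+g(X_t,t)\,dW_t\tag{1}$$
Here $X_t$ is vector-valued: $f^i$ denotes the $i$-th component of the drift $f$, and $g^{ik}$ the $(i,k)$ entry of the diffusion coefficient $g$.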
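To make SDE (1) concrete, here is a minimal Euler-Maruyama sketch in NumPy. The function name `euler_maruyama`, the batch layout, and the array shapes are assumptions made for illustration; they are not part of the original notes.

```python
import numpy as np

def euler_maruyama(f, g, x0, t0, t1, n_steps, rng):
    """Simulate dX = f(X,t) dt + g(X,t) dW on [t0, t1] for a batch of paths.

    Assumed shapes: x0 is (n_paths, d), f returns (n_paths, d),
    g returns (n_paths, d, m) where m is the Brownian dimension.
    """
    x = np.array(x0, dtype=float)
    dt = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        gx = g(x, t)
        # Brownian increments have variance dt in every component.
        dw = rng.normal(scale=np.sqrt(dt), size=(x.shape[0], gx.shape[2]))
        # x^i += f^i dt + sum_k g^{ik} dW^k
        x = x + f(x, t) * dt + np.einsum("nik,nk->ni", gx, dw)
        t += dt
    return x
```

For example, passing `f = lambda x, t: -x` and `g = lambda x, t: np.ones((x.shape[0], 1, 1))` simulates a one-dimensional Ornstein-Uhlenbeck process, whose mean and variance are known in closed form and can be compared against the sample statistics.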
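We introduce the Forward Kolmogorov equation and the Backward Kolmogorov equation in turn. (I keep the English names here instead of the Chinese terms 正向 and 倒向, so that the intuition those words carry does not clash with the reverse-time notion that appears later.)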
The Forward Kolmogorov equation reads: for $t\geqslant s$,
$$-\dfrac{\partial p(x_t,t|x_s,s)}{\partial t}=\sum_i\dfrac{\partial}{\partial x_t^i}\left[p(x_t,t|x_s,s)f^i(x_t,t)\right]-\dfrac{1}{2}\sum_{i,j,k}\dfrac{\partial^2\left[g^{ik}(x_t,t)g^{jk}(x_t,t)p(x_t,t|x_s,s)\right]}{\partial x_t^i\partial x_t^j}$$
Since the partial derivatives act only on the state at time $t$, we can integrate out the condition $(x_s,s)$ directly: multiply both sides by $p(x_s,s)$, integrate over $dx_s$, and exchange differentiation with integration. This gives the Fokker-Planck equation:
$$-\dfrac{\partial p(x_t,t)}{\partial t}=\sum_i\dfrac{\partial}{\partial x_t^i}\left[p(x_t,t)f^i(x_t,t)\right]-\dfrac{1}{2}\sum_{i,j,k}\dfrac{\partial^2\left[g^{ik}(x_t,t)g^{jk}(x_t,t)p(x_t,t)\right]}{\partial x_t^i\partial x_t^j}$$
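The Fokker-Planck equation can be checked numerically on a case with a closed-form density. The sketch below assumes a one-dimensional Ornstein-Uhlenbeck process, $f(x,t)=-\theta x$ and $g=\sigma$, with a Gaussian initial law; the parameter values and function names are hypothetical choices for illustration. It evaluates the residual of the equation with central finite differences.

```python
import numpy as np

THETA, SIGMA = 1.0, 0.8   # hypothetical OU parameters: f(x,t) = -THETA*x, g = SIGMA
M0, V0 = 1.5, 0.25        # hypothetical Gaussian initial law N(M0, V0) at t = 0

def marginal(x, t):
    """Closed-form Gaussian marginal p(x, t) of the OU process."""
    decay = np.exp(-2 * THETA * t)
    m = M0 * np.exp(-THETA * t)
    v = V0 * decay + SIGMA**2 / (2 * THETA) * (1 - decay)
    return np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

def fokker_planck_residual(x, t, h=1e-3):
    """-dp/dt - d/dx[p f] + 0.5 d2/dx2[g^2 p], which should vanish up to O(h^2)."""
    f = lambda y: -THETA * y
    dp_dt = (marginal(x, t + h) - marginal(x, t - h)) / (2 * h)
    d_pf = (marginal(x + h, t) * f(x + h) - marginal(x - h, t) * f(x - h)) / (2 * h)
    d2_g2p = SIGMA**2 * (marginal(x + h, t) - 2 * marginal(x, t) + marginal(x - h, t)) / h**2
    return -dp_dt - d_pf + 0.5 * d2_g2p
```

Evaluating `fokker_planck_residual` on a grid of `x` values at a fixed `t` should return values close to zero, limited only by the finite-difference error.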
The corresponding Backward Kolmogorov equation reads: for $s\geqslant t$,
$$-\dfrac{\partial p(x_s,s|x_t,t)}{\partial t}=\sum_i f^i(x_t,t)\dfrac{\partial p(x_s,s|x_t,t)}{\partial x_t^i}+\dfrac{1}{2}\sum_{i,j,k}g^{ik}(x_t,t)g^{jk}(x_t,t)\dfrac{\partial^2 p(x_s,s|x_t,t)}{\partial x_t^i\partial x_t^j}$$
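The Backward Kolmogorov equation can be checked the same way. The sketch below again assumes a one-dimensional Ornstein-Uhlenbeck process with hypothetical parameters, uses its closed-form Gaussian transition density, and differentiates with respect to the initial pair $(x_t,t)$ while the terminal pair $(x_s,s)$ stays fixed.

```python
import numpy as np

THETA, SIGMA = 1.0, 0.8  # hypothetical OU parameters: f(x,t) = -THETA*x, g = SIGMA

def transition(x_s, s, x_t, t):
    """Closed-form OU transition density p(x_s, s | x_t, t), valid for s > t."""
    tau = s - t
    mean = x_t * np.exp(-THETA * tau)
    var = SIGMA**2 / (2 * THETA) * (1 - np.exp(-2 * THETA * tau))
    return np.exp(-(x_s - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def backward_residual(x_s, s, x_t, t, h=1e-3):
    """dp/dt + f dp/dx_t + 0.5 g^2 d2p/dx_t^2, with all derivatives in the initial (x_t, t)."""
    p = lambda x, u: transition(x_s, s, x, u)
    dp_dt = (p(x_t, t + h) - p(x_t, t - h)) / (2 * h)
    dp_dx = (p(x_t + h, t) - p(x_t - h, t)) / (2 * h)
    d2p_dx2 = (p(x_t + h, t) - 2 * p(x_t, t) + p(x_t - h, t)) / h**2
    return dp_dt + (-THETA * x_t) * dp_dx + 0.5 * SIGMA**2 * d2p_dx2
```

Note that only the starting point is perturbed here; the density is still that of moving from the earlier time $t$ to the later time $s$.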
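Note that the transition probability here still describes the transition from $X_t$ to $X_s$, so it is a forward-in-time transition probability. The equation, however, differentiates with respect to $t$ and $x_t$, so it describes how a change in the initial state affects the transition probability.
A first summary: both Kolmogorov equations concern how the forward-in-time transition probability changes. The Forward Kolmogorov equation describes the change at its terminal end, and the Backward Kolmogorov equation describes the change at its initial end. (These are not standard terms, just names I use to tell the two apart.)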
We now go one step further and study the backward-in-time transition probability. In what follows, keep track of the time ordering at every step. The quantity we want to compute is
$$\dfrac{\partial p(x_t,t,x_s,s)}{\partial t},\quad s\geqslant t$$
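To stress the point again: for the joint distribution above, the ordering of $t$ and $s$ may not look important, but what matters is that we study the change at the initial end, $\partial t$. If we wanted the change at the terminal end instead, multiplying both sides of the Forward Kolmogorov equation by $p(x_s,s)$ would give it directly (there $x_s$ is the earlier state). The change at the initial end takes a little more work. We compute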
$$\begin{aligned}
&\quad \dfrac{\partial p(x_t,t,x_s,s)}{\partial t}\\
&=\dfrac{\partial \left(p(x_s,s|x_t,t)p(x_t,t)\right)}{\partial t}\\
&=\dfrac{\partial p(x_s,s|x_t,t)}{\partial t}p(x_t,t)+p(x_s,s|x_t,t)\dfrac{\partial p(x_t,t)}{\partial t}\\
&=-p(x_t,t)\left[\sum_if^i(x_t,t)\dfrac{\partial p(x_s,s|x_t,t)}{\partial x_t^i}+\dfrac{1}{2}\sum_{i,j,k}g^{ik}(x_t,t)g^{jk}(x_t,t)\dfrac{\partial^2 p(x_s,s|x_t,t)}{\partial x_t^i\partial x_t^j}\right]\\
&\quad-p(x_s,s|x_t,t)\left[\sum_i\dfrac{\partial}{\partial x_t^i}\left[p(x_t,t)f^i(x_t,t)\right]-\dfrac{1}{2}\sum_{i,j,k}\dfrac{\partial^2\left[g^{ik}(x_t,t)g^{jk}(x_t,t)p(x_t,t)\right]}{\partial x_t^i\partial x_t^j}\right]\\
&=-\sum_i\dfrac{\partial}{\partial x_t^i}\left[f^i(x_t,t)p(x_t,t)p(x_s,s|x_t,t)\right]\\
&\quad -\dfrac{1}{2}\bigg[p(x_t,t)\sum_{i,j,k}g^{ik}(x_t,t)g^{jk}(x_t,t)\dfrac{\partial^2 p(x_s,s|x_t,t)}{\partial x_t^i\partial x_t^j}+2\sum_{i,j,k}\dfrac{\partial\left[p(x_t,t)g^{ik}(x_t,t)g^{jk}(x_t,t)\right]}{\partial x_t^i}\dfrac{\partial p(x_s,s|x_t,t)}{\partial x_t^j}\\
&\qquad\quad+p(x_s,s|x_t,t)\sum_{i,j,k}\dfrac{\partial^2\left[g^{ik}(x_t,t)g^{jk}(x_t,t)p(x_t,t)\right]}{\partial x_t^i\partial x_t^j}\bigg]\\
&\quad +p(x_s,s|x_t,t)\sum_{i,j,k}\dfrac{\partial^2\left[g^{ik}(x_t,t)g^{jk}(x_t,t)p(x_t,t)\right]}{\partial x_t^i\partial x_t^j}+\sum_{i,j,k}\dfrac{\partial\left[p(x_t,t)g^{ik}(x_t,t)g^{jk}(x_t,t)\right]}{\partial x_t^i}\dfrac{\partial p(x_s,s|x_t,t)}{\partial x_t^j}\\
&=-\sum_i\dfrac{\partial}{\partial x_t^i}\left[f^i(x_t,t)p(x_t,t,x_s,s)\right]\\
&\quad -\dfrac{1}{2}\sum_{i,j,k}\dfrac{\partial^2\left[p(x_t,t,x_s,s)g^{ik}(x_t,t)g^{jk}(x_t,t)\right]}{\partial x_t^i\partial x_t^j}\\
&\quad +\sum_{i,j,k}\dfrac{\partial}{\partial x_t^i}\left[p(x_s,s|x_t,t)\dfrac{\partial\left[p(x_t,t)g^{ik}(x_t,t)g^{jk}(x_t,t)\right]}{\partial x_t^j}\right]\\
&=-\sum_i\dfrac{\partial}{\partial x_t^i}\left[p(x_t,t,x_s,s)\cdot\left(f^i(x_t,t)-\dfrac{1}{p(x_t,t)}\sum_{j,k}\dfrac{\partial\left[p(x_t,t)g^{ik}(x_t,t)g^{jk}(x_t,t)\right]}{\partial x_t^j}\right)\right]\\
&\quad -\dfrac{1}{2}\sum_{i,j,k}\dfrac{\partial^2\left[p(x_t,t,x_s,s)g^{ik}(x_t,t)g^{jk}(x_t,t)\right]}{\partial x_t^i\partial x_t^j}
\end{aligned}$$
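The regrouping in the middle of this derivation relies on two elementary identities that are easy to miss. The shorthand below is introduced only for this note and is not part of the original notation: $D^{ij}=\sum_k g^{ik}g^{jk}$ (symmetric in $i,j$), $p=p(x_t,t)$, $q=p(x_s,s|x_t,t)$, and $\partial_i=\partial/\partial x_t^i$. The first identity is the product rule for the second-order term, and the second is an index swap permitted by the symmetry of $D^{ij}$:

$$\begin{aligned}
\sum_{i,j}\partial_i\partial_j\left[D^{ij}pq\right]&=p\sum_{i,j}D^{ij}\,\partial_i\partial_j q+2\sum_{i,j}\partial_i\left[pD^{ij}\right]\partial_j q+q\sum_{i,j}\partial_i\partial_j\left[D^{ij}p\right],\\
\sum_{i,j}\partial_i\left[q\,\partial_j\left[pD^{ij}\right]\right]&=q\sum_{i,j}\partial_i\partial_j\left[D^{ij}p\right]+\sum_{i,j}\partial_i\left[pD^{ij}\right]\partial_j q.
\end{aligned}$$

The first is what lets the second-order terms be collected into a single $-\tfrac{1}{2}\sum\partial_i\partial_j[\,\cdot\,]$ acting on the joint density, and the second folds the two leftover terms into one divergence, which is then absorbed into the drift.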
Collecting the coefficients and introducing the notation
$$\bar f^i(x_t,t)=f^i(x_t,t)-\dfrac{1}{p(x_t,t)}\sum_{j,k}\dfrac{\partial}{\partial x_t^j}\left[p(x_t,t)g^{ik}(x_t,t)g^{jk}(x_t,t)\right]$$
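To get a feel for this definition, consider a special case that is assumed here purely for illustration and is common in diffusion models: the diffusion coefficient is a scalar function of time, $g^{ik}(x_t,t)=g(t)\delta^{ik}$. The sum over $j,k$ then collapses, and the correction term becomes the gradient of the log-density (the score):

$$\bar f^i(x_t,t)=f^i(x_t,t)-\dfrac{g(t)^2}{p(x_t,t)}\dfrac{\partial p(x_t,t)}{\partial x_t^i}=f^i(x_t,t)-g(t)^2\dfrac{\partial \log p(x_t,t)}{\partial x_t^i}$$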
we obtain, for $t\leqslant s$,
$$-\dfrac{\partial p(x_t,t,x_s,s)}{\partial t}=\sum_i\dfrac{\partial}{\partial x_t^i}\left[\bar f^i(x_t,t)p(x_t,t,x_s,s)\right]+\dfrac{1}{2}\sum_{i,j,k}\dfrac{\partial^2\left[p(x_t,t,x_s,s)g^{ik}(x_t,t)g^{jk}(x_t,t)\right]}{\partial x_t^i\partial x_t^j}$$
Correspondingly, dividing both sides by $p(x_s,s)$ (which depends on neither $t$ nor $x_t$) gives an equation for the conditional transition probability, and integrating the joint equation over $dx_s$ gives one for the marginal density, in Fokker-Planck form. The two equations are
$$-\dfrac{\partial p(x_t,t|x_s,s)}{\partial t}=\sum_i\dfrac{\partial}{\partial x_t^i}\left[\bar f^i(x_t,t)p(x_t,t|x_s,s)\right]+\dfrac{1}{2}\sum_{i,j,k}\dfrac{\partial^2\left[p(x_t,t|x_s,s)g^{ik}(x_t,t)g^{jk}(x_t,t)\right]}{\partial x_t^i\partial x_t^j}$$
and
$$-\dfrac{\partial p(x_t,t)}{\partial t}=\sum_i\dfrac{\partial}{\partial x_t^i}\left[\bar f^i(x_t,t)p(x_t,t)\right]+\dfrac{1}{2}\sum_{i,j,k}\dfrac{\partial^2\left[p(x_t,t)g^{ik}(x_t,t)g^{jk}(x_t,t)\right]}{\partial x_t^i\partial x_t^j}$$
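This marginal equation should agree with the Fokker-Planck equation obtained earlier, since both describe the same density $p(x_t,t)$. As a short consistency check, substitute the definition of $\bar f^i$ into the first-order term (arguments $(x_t,t)$ are dropped for brevity):

$$\begin{aligned}
\sum_i\dfrac{\partial}{\partial x_t^i}\left[\bar f^i p\right]
&=\sum_i\dfrac{\partial}{\partial x_t^i}\left[f^i p\right]-\sum_{i,j,k}\dfrac{\partial^2\left[p\,g^{ik}g^{jk}\right]}{\partial x_t^i\partial x_t^j},\\
\sum_i\dfrac{\partial}{\partial x_t^i}\left[\bar f^i p\right]+\dfrac{1}{2}\sum_{i,j,k}\dfrac{\partial^2\left[p\,g^{ik}g^{jk}\right]}{\partial x_t^i\partial x_t^j}
&=\sum_i\dfrac{\partial}{\partial x_t^i}\left[f^i p\right]-\dfrac{1}{2}\sum_{i,j,k}\dfrac{\partial^2\left[p\,g^{ik}g^{jk}\right]}{\partial x_t^i\partial x_t^j}.
\end{aligned}$$

The right-hand side of the last line is exactly the right-hand side of the Fokker-Planck equation, so the two forms are the same equation written with different drifts and opposite signs on the second-order term.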
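A second summary: (1) We have still only dealt with the conditional transition probabilities and marginal densities of process (1). The "backward" notion involved refers only to the time ordering within that process; so far the stochastic process has a single time direction. (2) By analogy with the Kolmogorov equations, we derived a partial differential equation for the backward-in-time transition probability $p(x_t,t|x_s,s),\ t\leqslant s$. Note, however, that it only describes how this probability changes with respect to $t$; its change with respect to $s$ is not discussed here.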
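I will stop here for this post. The next one introduces reverse time, at which point there will be two time directions. I hope to explain the part that diffusion models rely on in a way that causes as little confusion as possible.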