Articles in this series
GFlowNet Foundation Notes (1)
GFlowNet Foundation Notes (2)
Conditional Flows and Free Energies
Def 24. The free energy $\mathcal{F}(s)$ of a state $s$ is defined by
$$e^{-\mathcal{F}(s)} = \sum_{s': s' \ge s} R(s') = \sum_{s': s' \ge s} e^{-\mathcal{E}(s')}$$
where $s'$ ranges over terminating states and $\mathcal{E}(s')$ is the energy of $s'$, so that $R(s') = e^{-\mathcal{E}(s')}$. Note that $e^{-\mathcal{F}(s)}$ is in general not the same as the flow $F(s)$.
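To make the definition concrete, here is a minimal sketch that evaluates $e^{-\mathcal{F}(s)}$ by brute force on a tiny DAG. The graph `CHILDREN`, the reward table `REWARD`, and the helper names are all hypothetical and chosen only for illustration; every state is assumed to be allowed to terminate.

```python
import math

# Hypothetical toy DAG: state -> list of child states (not from the notes).
CHILDREN = {
    "s0": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
    "c": [],
    "d": [],
}
# Assumed terminal rewards R(s') = exp(-E(s')); every state may terminate here.
REWARD = {"s0": 0.5, "a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}


def descendants(state):
    """Return all s' with s' >= state: the state itself plus everything reachable from it."""
    seen = {state}
    stack = [state]
    while stack:
        current = stack.pop()
        for child in CHILDREN[current]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def free_energy(state):
    """Free energy F(s) = -log sum_{s' >= s} R(s')."""
    return -math.log(sum(REWARD[s] for s in descendants(state)))


# exp(-F(s0)) is the sum of all rewards, i.e. the partition function.
partition = math.exp(-free_energy("s0"))
```

In this toy graph, `descendants("b")` is `{"b", "c", "d"}`, so $e^{-\mathcal{F}(b)}$ sums the rewards of those three states only.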
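Conditional GFlowNets
Besides the flow $F(s)$, there is another quantity that can be estimated. For $s \le s'$, if all the flow through the terminating edges $s' \rightarrow s_f$ were rerouted through $s$, the total flow through $s$ would represent the free energy of $s$, as shown in the figure below.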
Def 25. Let $s_0 \mid x$ and $s_f \mid x$ denote the source state and the sink state under the conditioning variable $x$, and let $R(s \mid x)$ denote the terminal reward function under condition $x$.
Prop 17. Once the GFlowNet has been trained to completion,
$$F(s_0 \mid x) = \sum_{s \mid x} R(s \mid x) = Z(x)$$
Proof.
$$\begin{aligned} \sum_{s \mid x} R(s \mid x) &= \sum_{s \mid x} F(s \rightarrow s_f \mid x) \\ &= \sum_{s \mid x} \sum_{\tau:\, s \rightarrow s_f \in \tau} F(\tau) \\ &= \sum_{\tau \mid x} F(\tau) = F(s_0 \mid x) \end{aligned}$$
The third equality holds because every complete trajectory contains exactly one terminating edge, and the last because every complete trajectory starts at $s_0$.
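The regrouping used in the proof can be checked numerically. The sketch below is only an illustration under stated assumptions: the DAG `CHILDREN`, the conditional reward `reward(state, x)`, and the choice to split each reward evenly across the paths reaching a state are all hypothetical. Any other split that satisfies $F(s \rightarrow s_f \mid x) = R(s \mid x)$ would give the same total.

```python
# Hypothetical toy DAG and conditional reward; none of these names come from the notes.
CHILDREN = {"s0": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}


def reward(state, x):
    """Assumed conditional terminal reward R(s | x)."""
    base = {"s0": 1.0, "a": 2.0, "b": 3.0, "c": 4.0}
    return base[state] * x


def paths_to(target, start="s0"):
    """Enumerate all paths from start to target in the DAG."""
    if start == target:
        return [[start]]
    found = []
    for child in CHILDREN[start]:
        for tail in paths_to(target, child):
            found.append([start] + tail)
    return found


def trajectory_flows(x):
    """Assign a flow F(tau | x) to each complete trajectory s0 -> ... -> s -> sf.

    The reward of each terminating state is split evenly over the paths that
    reach it, so that the terminating-edge flow F(s -> sf | x) equals R(s | x).
    """
    flows = {}
    for state in CHILDREN:
        paths = paths_to(state)
        for path in paths:
            flows[tuple(path) + ("sf",)] = reward(state, x) / len(paths)
    return flows


x = 0.5
flows = trajectory_flows(x)
# Every complete trajectory starts at s0, so F(s0 | x) is the sum over all of them.
initial_flow = sum(flows.values())
# Z(x) is the sum of the conditional rewards.
partition = sum(reward(s, x) for s in CHILDREN)
```

Here state `c` is reached by two paths, so its reward is carried by two trajectories, yet the total over trajectories still matches the total over terminating states.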
Estimating Free Energies
Def 30. A state-conditional GFlowNet is a special case of a conditional GFlowNet in which the conditioning set $\mathcal{X}$ is the set of states; that is, a later state $s'$ is conditioned on an earlier state $s$, written $s' \mid s$, for all $s \in \mathcal{X}$ with $s < s'$. If $s' \rightarrow s'' \in \mathbb{A}$, then by the Markov property
$$P_F(s'' \mid s', s) = P(s' \rightarrow s'' \mid s', s) = P(s' \rightarrow s'' \mid s') = P_F(s'' \mid s')$$
The training objective of a state-conditional GFlowNet is
$$\mathcal{L} = E_{(s_0, s_1, \dots, s_n, s_f) \sim \pi_T}\Big[ \sum_{t=0}^{n} E_{0 \le t' \le t}\big[L(s_t, s_{t+1} \mid s_{t'})\big] \Big]$$
$$L(s', s'' \mid s) = \Big( \log \frac{\delta + \hat{F}(s' \mid s)\, \hat{P}_F(s'' \mid s')}{\delta + \hat{F}(s'' \mid s)\, \hat{P}_B(s' \mid s'', s)} \Big)^2$$
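The per-transition term of this objective is easy to evaluate once the four estimated quantities are available. The following is a minimal sketch, not a training loop: the function names, the default value of `delta`, and the uniform average over conditioning indices $t'$ are assumptions made for illustration, and the inputs are plain floats rather than network outputs.

```python
import math


def transition_loss(flow_src, p_forward, flow_dst, p_backward, delta=1e-8):
    """Squared log-ratio L(s', s'' | s) for one transition s' -> s''.

    flow_src   : estimated F(s'  | s)
    p_forward  : estimated P_F(s'' | s')
    flow_dst   : estimated F(s'' | s)
    p_backward : estimated P_B(s' | s'', s)
    delta      : constant added to numerator and denominator (default value assumed)
    """
    numerator = delta + flow_src * p_forward
    denominator = delta + flow_dst * p_backward
    return math.log(numerator / denominator) ** 2


def trajectory_loss(step_losses):
    """Sum over t of the average over conditioning indices 0 <= t' <= t.

    step_losses[t][t_prime] holds L(s_t, s_{t+1} | s_{t'}) for t' = 0..t.
    The inner expectation is assumed to be a uniform average.
    """
    return sum(sum(row) / len(row) for row in step_losses)


# The loss vanishes when forward and backward edge flows agree: 2.0 * 0.5 == 4.0 * 0.25.
balanced = transition_loss(2.0, 0.5, 4.0, 0.25)
```

The term is zero exactly when $\hat{F}(s' \mid s)\hat{P}_F(s'' \mid s') = \hat{F}(s'' \mid s)\hat{P}_B(s' \mid s'', s)$, which is the conditional analogue of matching the edge flow computed from both ends.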
Def 31. $F(s \mid s)$ is called the state-conditional self-flow. It is the flow through $s$ when only trajectories passing through $s$ are allowed, and it produces the required terminal flow $R(s')$ for every $s' \ge s$.
Prop 19. Once the GFlowNet has been trained to completion,
$$\begin{aligned} e^{-\mathcal{F}(s)} &= F(s \mid s) \\ &= \sum_{s' \ge s} R(s') \\ &= \sum_{s' \ge s} F(s' \rightarrow s_f) \end{aligned}$$
Def 32. The conditional terminating probability distribution is defined as
$$\begin{aligned} P_T(s \mid A) &= \frac{P(s \rightarrow s_f, A)}{P(A)} \\ &= \frac{\mathbf{1}_{s \rightarrow s_f \in A}\, P(s \rightarrow s_f)}{\sum_{s' \rightarrow s_f \in A} P(s' \rightarrow s_f)} \\ &= \frac{\mathbf{1}_{s \in A}\, P(s \rightarrow s_f)}{\sum_{s' \in A} P(s' \rightarrow s_f)} \\ &= \frac{\sum_{\tau \in A,\, s \rightarrow s_f \in \tau} P(\tau)}{\sum_{\tau \in A} P(\tau)} \end{aligned}$$
where $A$ is an arbitrary set of trajectories. In particular, when $A$ is the set $\mathcal{T}$ of all complete trajectories,
$$P_T(s \mid \mathcal{T}) = P_T(s) = P(s \rightarrow s_f) = \frac{R(s)}{F(s_0)} = e^{-\mathcal{E}(s) + \mathcal{F}(s_0)}$$
Prop 20. For $s \le s'$,
$$\begin{aligned} P_T(s' \mid s) &= \frac{F(s' \rightarrow s_f)}{\sum_{s'' \ge s} F(s'' \rightarrow s_f)} \\ &= \frac{F(s' \rightarrow s_f)}{F(s \mid s)} \\ &= \frac{R(s')}{\sum_{s'' \ge s} R(s'')} \\ &= e^{-\mathcal{E}(s') + \mathcal{F}(s)} \end{aligned}$$
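As a small worked example of Prop 20, the sketch below computes $P_T(s' \mid s)$ directly from rewards on a toy DAG. The graph, the rewards, and the function names are hypothetical; in a trained GFlowNet the denominator would be the learned self-flow $F(s \mid s)$ rather than an explicit sum.

```python
# Hypothetical toy DAG and terminal rewards; every state is assumed to be terminating.
CHILDREN = {"s0": ["a", "b"], "a": ["c"], "b": ["c", "d"], "c": [], "d": []}
REWARD = {"s0": 0.5, "a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}


def reachable(state):
    """All states s' with s' >= state (the state itself and its descendants)."""
    seen = {state}
    stack = [state]
    while stack:
        current = stack.pop()
        for child in CHILDREN[current]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def conditional_terminating_probs(state):
    """P_T(s' | s) = R(s') / sum_{s'' >= s} R(s'') for every s' >= s."""
    later = reachable(state)
    self_flow = sum(REWARD[s] for s in later)  # plays the role of F(s | s)
    return {s: REWARD[s] / self_flow for s in sorted(later)}


probs_from_b = conditional_terminating_probs("b")
```

Conditioning on `b` removes `s0` and `a` from the support, and the remaining rewards 2, 3, 4 are renormalised by their sum 9.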
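Training Energy-Based Models with GFlowNets
Define the model $P_{\theta}(s) = e^{-\mathcal{E}_{\theta}(s)} / Z$, where $s$ is a terminating state. Samples drawn from a GFlowNet according to $\hat{P}_T$ can be used to obtain a stochastic gradient estimator of the negative log-likelihood of observed data $x$ under this model: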
$$\begin{aligned} \frac{\partial \big(-\log P_{\theta}(x)\big)}{\partial \theta} &= \frac{\partial \mathcal{E}_{\theta}(x)}{\partial \theta} + \frac{\partial \log Z}{\partial \theta} \\ &= \frac{\partial \mathcal{E}_{\theta}(x)}{\partial \theta} + \frac{\partial \log \sum_s e^{-\mathcal{E}_{\theta}(s)}}{\partial \theta} \\ &= \frac{\partial \mathcal{E}_{\theta}(x)}{\partial \theta} + \frac{1}{\sum_s e^{-\mathcal{E}_{\theta}(s)}} \sum_s e^{-\mathcal{E}_{\theta}(s)} \Big(-\frac{\partial \mathcal{E}_{\theta}(s)}{\partial \theta}\Big) \\ &= \frac{\partial \mathcal{E}_{\theta}(x)}{\partial \theta} - \sum_s P_{\theta}(s) \frac{\partial \mathcal{E}_{\theta}(s)}{\partial \theta} \end{aligned}$$
where the sum weighted by $P_{\theta}(s)$ is estimated with samples $s \sim \hat{P}_T(s)$.
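The gradient identity can be sanity-checked on a tabular model. The sketch below assumes the simplest possible parameterisation, $\mathcal{E}_{\theta}(s) = \theta_s$, so that $\partial \mathcal{E}_{\theta}(s) / \partial \theta_k$ is 1 when $k = s$ and 0 otherwise; this parameterisation and all function names are illustrative assumptions. `sampled_gradient` shows how the sum over $P_{\theta}(s)$ would be replaced by an average over samples, which in practice would come from the GFlowNet.

```python
import math


def model_probs(theta):
    """P_theta(s) = exp(-theta[s]) / Z for a tabular energy E_theta(s) = theta[s]."""
    weights = [math.exp(-energy) for energy in theta]
    z = sum(weights)
    return [w / z for w in weights]


def nll(theta, x):
    """Negative log-likelihood -log P_theta(x)."""
    return -math.log(model_probs(theta)[x])


def nll_gradient(theta, x):
    """Exact gradient: dE(x)/dtheta - sum_s P_theta(s) dE(s)/dtheta."""
    probs = model_probs(theta)
    return [(1.0 if k == x else 0.0) - probs[k] for k in range(len(theta))]


def sampled_gradient(num_states, x, samples):
    """Same gradient with the sum over P_theta replaced by an average over samples."""
    counts = [0] * num_states
    for s in samples:
        counts[s] += 1
    return [(1.0 if k == x else 0.0) - counts[k] / len(samples)
            for k in range(num_states)]


def finite_difference(theta, x, eps=1e-6):
    """Central finite differences of the NLL, used only as a reference."""
    grads = []
    for k in range(len(theta)):
        up = list(theta)
        up[k] += eps
        down = list(theta)
        down[k] -= eps
        grads.append((nll(up, x) - nll(down, x)) / (2 * eps))
    return grads


theta = [0.3, -1.2, 0.8]
exact = nll_gradient(theta, x=1)
approx = finite_difference(theta, x=1)
```

The first term pushes the energy of the observed $x$ down, while the second raises the energy of states the model currently favours; the sampled version is unbiased only to the extent that the samples follow $P_{\theta}$.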
After introducing a latent variable $h$, the model becomes $P_{\theta}(x, h) = e^{-\mathcal{E}_{\theta}(x, h)} / \sum_{x', h'} e^{-\mathcal{E}_{\theta}(x', h')}$, and the gradient of the marginal negative log-likelihood becomes
$$\begin{aligned} \frac{\partial \big(-\log P_{\theta}(x)\big)}{\partial \theta} &= \frac{\partial \big(-\log \sum_h P_{\theta}(x, h)\big)}{\partial \theta} \\ &= -\frac{1}{\sum_h P_{\theta}(x, h)} \sum_h \frac{\partial P_{\theta}(x, h)}{\partial \theta} \\ &= -\frac{1}{P_{\theta}(x)} \sum_h \frac{\partial}{\partial \theta}\Big(\frac{e^{-\mathcal{E}_{\theta}(x, h)}}{\sum_{x', h'} e^{-\mathcal{E}_{\theta}(x', h')}}\Big) \\ &= -\frac{1}{P_{\theta}(x)} \sum_h \Big( -P_{\theta}(x, h) \frac{\partial \mathcal{E}_{\theta}(x, h)}{\partial \theta} + P_{\theta}(x, h) \sum_{s, h'} P_{\theta}(s, h') \frac{\partial \mathcal{E}_{\theta}(s, h')}{\partial \theta} \Big) \\ &= \sum_h P_{\theta}(h \mid x) \Big( \frac{\partial \mathcal{E}_{\theta}(x, h)}{\partial \theta} - \sum_{s, h'} P_{\theta}(s, h') \frac{\partial \mathcal{E}_{\theta}(s, h')}{\partial \theta} \Big) \end{aligned}$$
Here $(x', h')$ and $(s, h')$ are summation variables, primed to keep them distinct from the observed $x$ and the outer sum over $h$.
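The latent-variable gradient can be verified the same way on a tabular joint model. This sketch assumes $\mathcal{E}_{\theta}(x, h) = \theta_{x,h}$ stored as a nested list `theta[x][h]`; the parameterisation and function names are assumptions for illustration. Under it, the final line of the derivation reduces to $P_{\theta}(h{=}b \mid x)\,\mathbf{1}[a = x] - P_{\theta}(a, b)$ for the entry $\theta_{a,b}$.

```python
import math


def joint_probs(theta):
    """P_theta(x, h) for a tabular energy E_theta(x, h) = theta[x][h]."""
    weights = [[math.exp(-energy) for energy in row] for row in theta]
    z = sum(sum(row) for row in weights)
    return [[w / z for w in row] for row in weights]


def marginal_nll(theta, x):
    """-log P_theta(x), with the latent h summed out."""
    return -math.log(sum(joint_probs(theta)[x]))


def marginal_nll_gradient(theta, x):
    """Entry (a, b): P(h=b | x) * [a == x] - P(a, b)."""
    joint = joint_probs(theta)
    p_x = sum(joint[x])
    grad = []
    for a, row in enumerate(joint):
        grad.append([
            (joint[x][b] / p_x if a == x else 0.0) - joint[a][b]
            for b in range(len(row))
        ])
    return grad


def finite_difference(theta, x, eps=1e-6):
    """Central finite differences of the marginal NLL, used only as a reference."""
    grad = []
    for a in range(len(theta)):
        grad_row = []
        for b in range(len(theta[a])):
            up = [list(r) for r in theta]
            up[a][b] += eps
            down = [list(r) for r in theta]
            down[a][b] -= eps
            grad_row.append((marginal_nll(up, x) - marginal_nll(down, x)) / (2 * eps))
        grad.append(grad_row)
    return grad


theta = [[0.2, 1.0], [-0.5, 0.3], [0.7, -0.1]]
exact = marginal_nll_gradient(theta, x=1)
approx = finite_difference(theta, x=1)
max_error = max(abs(e - a)
                for exact_row, approx_row in zip(exact, approx)
                for e, a in zip(exact_row, approx_row))
```

The positive phase now averages over the posterior $P_{\theta}(h \mid x)$, while the negative phase still runs over the full joint.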
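Active Learning with GFlowNets
Training alternates between two loops. The outer loop learns the true energy function (the reward function); the inner loop trains the GFlowNet, using the learned energy function as its target.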
Estimating Entropy, Conditional Entropy, and Mutual Information
Def 33. The entropic reward function $R'$ is defined as
$$R'(s) = -R(s) \log R(s)$$
A second GFlowNet is trained alongside the original one: the original targets $R$, and the new one targets $R'$.
Prop 21. The entropy $H[S]$ of the terminating-state random variable $S$ is
$$\begin{aligned} H[S] &= -\sum_s P_T(s) \log P_T(s) \\ &= -\sum_s \frac{R(s)}{F(s_0)} \Big( \log R(s) - \log F(s_0) \Big) \\ &= \frac{-\sum_s R(s) \log R(s) + \log F(s_0) \sum_s R(s)}{F(s_0)} \\ &= \frac{\sum_s R'(s)}{F(s_0)} + \log F(s_0) \\ &= \frac{F'(s_0)}{F(s_0)} + \log F(s_0) \end{aligned}$$
where $F'$ is the flow of the newly trained GFlowNet.
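A quick numerical check of Prop 21 is shown below. It is a sketch under stated assumptions: the reward values are hypothetical and chosen in $(0, 1)$ so that $R'(s) \ge 0$, and the exact sums $\sum_s R(s)$ and $\sum_s R'(s)$ stand in for the initial flows $F(s_0)$ and $F'(s_0)$ that two trained GFlowNets would provide.

```python
import math

# Hypothetical rewards over four terminating states, all in (0, 1).
REWARD = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.2}


def entropic_reward(r):
    """R'(s) = -R(s) log R(s)."""
    return -r * math.log(r)


def entropy_direct(rewards):
    """H[S] computed from the normalised distribution P_T(s) = R(s) / Z."""
    z = sum(rewards.values())
    return -sum((r / z) * math.log(r / z) for r in rewards.values())


def entropy_from_flows(rewards):
    """H[S] = F'(s0) / F(s0) + log F(s0), with exact sums standing in for the flows."""
    flow = sum(rewards.values())                                        # F(s0)
    entropic_flow = sum(entropic_reward(r) for r in rewards.values())  # F'(s0)
    return entropic_flow / flow + math.log(flow)


h_direct = entropy_direct(REWARD)
h_flows = entropy_from_flows(REWARD)
```

The two values agree up to floating-point error, which is all the proposition claims; the practical benefit is that neither flow requires enumerating the terminating states once the GFlowNets are trained.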
Prop 22. The conditional entropy $H[S \mid x]$ is
$$H[S \mid x] = \frac{F'(s_0 \mid x)}{F(s_0 \mid x)} + \log F(s_0 \mid x)$$
When $x$ is an event along a trajectory, only the trajectories passing through that event are considered. For $x = s$, only the trajectories passing through $s$ are considered:
$$H[S \mid s] = \frac{F'(s_0 \mid s)}{F(s_0 \mid s)} + \log F(s_0 \mid s) = \frac{F'(s \mid s)}{F(s \mid s)} + \log F(s \mid s)$$
Corollary 4. The mutual information between the terminating-state random variable $S$ and the conditioning random variable $X$ is
$$MI(S; X) = H[S] - E_X\big[H[S \mid X]\big] = \frac{F'(s_0)}{F(s_0)} + \log F(s_0) - E_X\Big[\frac{F'(s_0 \mid X)}{F(s_0 \mid X)} + \log F(s_0 \mid X)\Big]$$
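The corollary can be illustrated on a small discrete example. Everything below is an assumption made for the sketch: the distribution `P_X`, the conditional rewards `COND_REWARD`, and in particular the choice of unconditional reward, which is taken proportional to the marginal $\sum_x P(x) P_T(s \mid x)$ so that $H[S]$ refers to the same joint distribution as the conditional entropies. Exact sums again stand in for the learned flows.

```python
import math

# Hypothetical distribution over the conditioning variable X.
P_X = {0: 0.25, 1: 0.75}
# Hypothetical conditional rewards R(s | x), all in (0, 1).
COND_REWARD = {
    0: {"a": 0.6, "b": 0.2, "c": 0.1},
    1: {"a": 0.1, "b": 0.3, "c": 0.5},
}


def entropy_from_rewards(rewards):
    """F'(s0) / F(s0) + log F(s0), with exact sums standing in for the flows."""
    flow = sum(rewards.values())
    entropic_flow = sum(-r * math.log(r) for r in rewards.values())
    return entropic_flow / flow + math.log(flow)


def marginal_rewards(scale=0.5):
    """Assumed unconditional reward: scale * sum_x P(x) R(s | x) / Z(x)."""
    out = {}
    for s in COND_REWARD[0]:
        out[s] = scale * sum(
            P_X[x] * COND_REWARD[x][s] / sum(COND_REWARD[x].values()) for x in P_X
        )
    return out


def mutual_information_from_flows():
    """MI(S; X) = H[S] - E_X[H[S | X]] using the flow-based entropy formula."""
    h_s = entropy_from_rewards(marginal_rewards())
    h_s_given_x = sum(P_X[x] * entropy_from_rewards(COND_REWARD[x]) for x in P_X)
    return h_s - h_s_given_x


def mutual_information_direct():
    """Reference value: sum_{x,s} P(x) P(s|x) log(P(s|x) / P(s))."""
    cond = {
        x: {s: r / sum(COND_REWARD[x].values()) for s, r in COND_REWARD[x].items()}
        for x in P_X
    }
    marg = {s: sum(P_X[x] * cond[x][s] for x in P_X) for s in COND_REWARD[0]}
    return sum(
        P_X[x] * cond[x][s] * math.log(cond[x][s] / marg[s])
        for x in P_X for s in marg
    )


mi_flows = mutual_information_from_flows()
mi_direct = mutual_information_direct()
```

The `scale` argument only demonstrates that the overall normalisation of $R$ does not matter: the $\log F(s_0)$ term compensates for it.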