GFlowNet Foundation Notes (3)

Series articles
GFlowNet Foundation Notes (1)
GFlowNet Foundation Notes (2)

Conditional Flows and Free Energy

Def 24. The free energy $\mathcal{F}(s)$ of a state $s$ is defined by
$$e^{-\mathcal{F}(s)} = \sum_{s': s' \ge s} R(s') = \sum_{s': s' \ge s} e^{-\mathcal{E}(s')}$$

where $s'$ ranges over terminating states. Note that $e^{-\mathcal{F}(s)}$ is in general not equal to the flow $F(s)$.
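
As a rough illustration of Def 24, the sketch below computes $\mathcal{F}(s)$ by brute force on a tiny DAG. The state space (subsets of two elements), the energy table `ENERGY`, and the rule that every state may terminate are all assumptions made for this example; they are not part of the original notes.

```python
import math

# Hypothetical toy DAG: states are frozensets built by adding one element at a time.
# Every state is treated as terminating (it has an edge to the sink s_f).
ELEMENTS = ("a", "b")

def children(s):
    """States reachable from s by adding one missing element."""
    return [s | {e} for e in ELEMENTS if e not in s]

def descendants(s):
    """All states s' with s' >= s, i.e. reachable from s (s itself included)."""
    seen, stack = {s}, [s]
    while stack:
        cur = stack.pop()
        for nxt in children(cur):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Hypothetical energies E(s); the reward is R(s) = exp(-E(s)).
ENERGY = {
    frozenset(): 2.0,
    frozenset("a"): 1.0,
    frozenset("b"): 0.5,
    frozenset("ab"): 0.0,
}

def reward(s):
    return math.exp(-ENERGY[s])

def free_energy(s):
    """F(s) = -log sum_{s' >= s} R(s')."""
    return -math.log(sum(reward(t) for t in descendants(s)))

s0 = frozenset()
Z = sum(reward(s) for s in ENERGY)  # total reward over all terminating states
```

At the root, every terminating state is a descendant, so `math.exp(-free_energy(s0))` should match `Z`; at a leaf the free energy reduces to the energy of that leaf.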

Conditional GFlowNets

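Besides the flow $F(s)$, there is another quantity that can be estimated. For $s \le s'$, if all the flow through the terminating edges $s' \rightarrow s_f$ were rerouted through $s$, the total flow through $s$ would equal $e^{-\mathcal{F}(s)}$ and therefore encode the free energy, as shown in the figure below.
[Figure: the flow through the terminating edges $s' \rightarrow s_f$ rerouted through $s$]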
Def 25. $s_0 \mid x$ and $s_f \mid x$ denote the source state and the sink state under the conditioning variable $x$. $R(s \mid x)$ denotes the terminal reward function under condition $x$.

Prop 17. When the GFlowNet is fully trained,
$$F(s_0 \mid x) = \sum_{s \mid x} R(s \mid x) = Z(x)$$

Proof. Each complete trajectory ends with exactly one terminating edge $s \rightarrow s_f$, so summing over terminating states and then over the trajectories through each terminating edge counts every trajectory once:
$$
\begin{aligned}
\sum_{s \mid x} R(s \mid x) &= \sum_{s \mid x} F(s \rightarrow s_f \mid x) \\
&= \sum_{s \mid x} \; \sum_{\tau:\, s \rightarrow s_f \in \tau} F(\tau) \\
&= \sum_{\tau \mid x} F(\tau) = F(s_0 \mid x)
\end{aligned}
$$
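
The counting argument in the proof can be checked numerically on a small example. The sketch below is only an illustration: the DAG `EDGES`, the energies, and the conditional reward $R(s \mid x) = e^{-x\,\mathcal{E}(s)}$ are hypothetical, and the trajectory flow is one arbitrary valid choice (the reward of each terminating state split evenly over the trajectories that end there).

```python
import math

# Hypothetical DAG: s0 -> a -> ab, s0 -> b -> ab; every state may terminate.
EDGES = {"s0": ["a", "b"], "a": ["ab"], "b": ["ab"], "ab": []}
ENERGY = {"s0": 2.0, "a": 1.0, "b": 0.5, "ab": 0.0}

def reward(s, x):
    """Hypothetical conditional reward R(s | x) = exp(-x * E(s))."""
    return math.exp(-x * ENERGY[s])

def paths_to(target, start="s0"):
    """All paths start -> ... -> target in the DAG."""
    if start == target:
        return [[start]]
    return [[start] + rest for nxt in EDGES[start] for rest in paths_to(target, nxt)]

def trajectory_flows(x):
    """One valid flow: split R(s | x) evenly over the complete trajectories ending in s -> s_f."""
    flows = {}
    for s in EDGES:
        paths = paths_to(s)
        for p in paths:
            flows[tuple(p) + ("s_f",)] = reward(s, x) / len(paths)
    return flows

def initial_flow(x):
    """F(s0 | x): every complete trajectory starts at s0, so sum all trajectory flows."""
    return sum(trajectory_flows(x).values())

def partition_function(x):
    """Z(x) = sum over terminating states of R(s | x)."""
    return sum(reward(s, x) for s in EDGES)
```

For any value of `x`, `initial_flow(x)` and `partition_function(x)` should agree up to floating-point error, which is the statement of Prop 17 for this toy flow.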

Estimating Free Energy

Def 30. A state-conditional GFlowNet is a special case of a conditional GFlowNet in which the set of conditions $\mathcal{X}$ is the set of states; that is, it models $s' \mid s$ for all $s \in \mathcal{X}$ with $s < s'$. If $s' \rightarrow s'' \in \mathbb{A}$, the Markov property gives
$$P_F(s'' \mid s', s) = P(s' \rightarrow s'' \mid s', s) = P(s' \rightarrow s'' \mid s') = P_F(s'' \mid s')$$

The training objective of a state-conditional GFlowNet is
$$\mathcal{L} = E_{(s_0, s_1, \ldots, s_n, s_f) \sim \pi_T}\Big[ \sum_{t=0}^{n} E_{0 \le t' \le t}\big[L(s_t, s_{t+1} \mid s_{t'})\big] \Big]$$
$$L(s', s'' \mid s) = \left( \log \frac{\delta + \hat{F}(s' \mid s)\, \hat{P}_F(s'' \mid s')}{\delta + \hat{F}(s'' \mid s)\, \hat{P}_B(s' \mid s'', s)} \right)^2$$
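
The per-transition term $L(s', s'' \mid s)$ is a squared log-ratio of two estimated edge flows, so it is easy to write down once the four estimates are available. The function below is a minimal sketch with plain floats; the name `state_conditional_loss`, its argument names, and the default value of `delta` are assumptions for illustration, and in practice the inputs would come from learned estimators.

```python
import math

def state_conditional_loss(flow_s1, flow_s2, p_forward, p_backward, delta=1e-8):
    """Squared log-ratio L(s', s'' | s) for one transition s' -> s''.

    flow_s1    : estimate of F(s'  | s)
    flow_s2    : estimate of F(s'' | s)
    p_forward  : estimate of P_F(s'' | s')
    p_backward : estimate of P_B(s' | s'', s)
    delta      : small positive constant keeping both logarithms finite
    """
    forward_edge_flow = delta + flow_s1 * p_forward
    backward_edge_flow = delta + flow_s2 * p_backward
    return math.log(forward_edge_flow / backward_edge_flow) ** 2

# Consistent estimates: 2.0 * 0.5 == 4.0 * 0.25, so the loss vanishes.
matched = state_conditional_loss(2.0, 4.0, 0.5, 0.25)
# Inconsistent estimates: the forward edge flow is 1 and the backward one is e.
mismatched = state_conditional_loss(2.0, math.e, 0.5, 1.0)
```

The loss is zero exactly when the forward and backward estimates of the edge flow agree, and it grows with the squared log of their ratio otherwise.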

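Def 31. $F(s \mid s)$ is the conditional state self-flow. It denotes the flow through $s$ when only trajectories passing through $s$ are allowed, with every terminating state $s' \ge s$ receiving its required flow $R(s')$.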

Prop 19. When the GFlowNet is fully trained,
$$
\begin{aligned}
e^{-\mathcal{F}(s)} &= F(s \mid s) \\
&= \sum_{s' \ge s} R(s') \\
&= \sum_{s' \ge s} F(s' \rightarrow s_f)
\end{aligned}
$$

Def 32. The conditional terminating probability distribution is defined as
$$
\begin{aligned}
P_T(s \mid A) &= \frac{P(s \rightarrow s_f, A)}{P(A)} \\
&= \frac{\mathbf{1}_{s \rightarrow s_f \in A}\, P(s \rightarrow s_f)}{\sum_{s' \rightarrow s_f \in A} P(s' \rightarrow s_f)} \\
&= \frac{\mathbf{1}_{s \in A}\, P(s \rightarrow s_f)}{\sum_{s' \in A} P(s' \rightarrow s_f)} \\
&= \frac{\sum_{\tau \in A,\; s \rightarrow s_f \in \tau} P(\tau)}{\sum_{\tau \in A} P(\tau)}
\end{aligned}
$$

where $A$ is an arbitrary set of trajectories. In particular, for $A = \mathcal{T}$, the set of all complete trajectories,
$$P_T(s \mid \mathcal{T}) = P_T(s) = P(s \rightarrow s_f) = \frac{R(s)}{F(s_0)} = e^{-\mathcal{E}(s) + \mathcal{F}(s_0)}$$

Prop 20. For $s \le s'$,
$$
\begin{aligned}
P_T(s' \mid s) &= \frac{F(s' \rightarrow s_f)}{\sum_{s'' \ge s} F(s'' \rightarrow s_f)} \\
&= \frac{F(s' \rightarrow s_f)}{F(s \mid s)} \\
&= \frac{R(s')}{\sum_{s'' \ge s} R(s'')} \\
&= e^{-\mathcal{E}(s') + \mathcal{F}(s)}
\end{aligned}
$$
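
Prop 19 and Prop 20 can be combined into a small brute-force check. The sketch below assumes a hypothetical four-state DAG in which every state may terminate, with made-up energies; it computes the self-flow, the free energy, and $P_T(s' \mid s)$ directly from the rewards.

```python
import math

# Hypothetical DAG: s0 -> a -> ab, s0 -> b -> ab; every state may terminate.
EDGES = {"s0": ["a", "b"], "a": ["ab"], "b": ["ab"], "ab": []}
ENERGY = {"s0": 2.0, "a": 1.0, "b": 0.5, "ab": 0.0}

def descendants(s):
    """All s' with s' >= s (reachable from s, including s itself)."""
    out = {s}
    for nxt in EDGES[s]:
        out |= descendants(nxt)
    return out

def reward(s):
    return math.exp(-ENERGY[s])

def self_flow(s):
    """F(s | s) = sum_{s' >= s} R(s'), as in Prop 19."""
    return sum(reward(t) for t in descendants(s))

def free_energy(s):
    """F(s) = -log F(s | s)."""
    return -math.log(self_flow(s))

def p_terminate(s_prime, s):
    """P_T(s' | s) for s <= s'; zero if s' is not reachable from s."""
    if s_prime not in descendants(s):
        return 0.0
    return reward(s_prime) / self_flow(s)
```

Under these assumptions, `p_terminate(t, "a")` sums to one over the descendants of `"a"`, and each value equals `math.exp(-ENERGY[t] + free_energy("a"))`, which is the last line of Prop 20.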

Training Energy-Based Models with a GFlowNet

Define the model $P_{\theta}(s) = e^{-\mathcal{E}_{\theta}(s)} / Z$, where $s$ is a terminating state. GFlowNet samples drawn from $\hat{P}_T$ can be used to obtain a stochastic estimator of the gradient of the model's negative log-likelihood on observed data $x$:
$$
\begin{aligned}
\frac{\partial \left(-\log P_{\theta}(x)\right)}{\partial \theta} &= \frac{\partial \mathcal{E}_{\theta}(x)}{\partial \theta} + \frac{\partial \log Z}{\partial \theta} \\
&= \frac{\partial \mathcal{E}_{\theta}(x)}{\partial \theta} + \frac{\partial \log \sum_s e^{-\mathcal{E}_{\theta}(s)}}{\partial \theta} \\
&= \frac{\partial \mathcal{E}_{\theta}(x)}{\partial \theta} + \frac{1}{\sum_s e^{-\mathcal{E}_{\theta}(s)}} \sum_s e^{-\mathcal{E}_{\theta}(s)} \left(-\frac{\partial \mathcal{E}_{\theta}(s)}{\partial \theta}\right) \\
&= \frac{\partial \mathcal{E}_{\theta}(x)}{\partial \theta} - \sum_s P_{\theta}(s) \frac{\partial \mathcal{E}_{\theta}(s)}{\partial \theta}
\end{aligned}
$$

where the sum over $s$ in the last line is estimated with samples $s \sim \hat{P}_T(s)$.
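
To make the estimator concrete, here is a minimal sketch for a tabular energy $\mathcal{E}_{\theta}(s) = \theta_s$, for which $\partial \mathcal{E}_{\theta}(s) / \partial \theta$ is a one-hot vector. This parameterisation, the function names, and the numbers are assumptions for illustration. In particular, the samples are drawn from the exact $P_{\theta}$ as a stand-in for a trained GFlowNet sampler $\hat{P}_T$.

```python
import math
import random

def model_probs(theta):
    """P_theta(s) = exp(-theta[s]) / Z for a tabular energy E_theta(s) = theta[s]."""
    m = min(theta)  # shift energies so the largest weight is exp(0)
    weights = [math.exp(-(t - m)) for t in theta]
    z = sum(weights)
    return [w / z for w in weights]

def nll(theta, x):
    """-log P_theta(x) = E_theta(x) + log Z."""
    return theta[x] + math.log(sum(math.exp(-t) for t in theta))

def exact_nll_grad(theta, x):
    """dE(x)/dtheta - sum_s P_theta(s) dE(s)/dtheta, with dE(s)/dtheta = onehot(s)."""
    grad = [-p for p in model_probs(theta)]
    grad[x] += 1.0
    return grad

def sampled_nll_grad(theta, x, samples):
    """Same gradient with the sum over s replaced by an average over samples."""
    grad = [0.0] * len(theta)
    for s in samples:
        grad[s] -= 1.0 / len(samples)
    grad[x] += 1.0
    return grad

rng = random.Random(0)
theta = [0.3, -1.2, 0.8, 0.0]
x = 2
# Stand-in for s ~ P_T-hat: sample from the exact model distribution.
samples = rng.choices(range(len(theta)), weights=model_probs(theta), k=20000)
```

With enough samples, `sampled_nll_grad(theta, x, samples)` should stay close to `exact_nll_grad(theta, x)`; the quality of the estimate in a real setting depends on how well $\hat{P}_T$ matches $P_{\theta}$.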
After introducing a latent variable $h$, the model becomes $P_{\theta}(x, h) = e^{-\mathcal{E}_{\theta}(x, h)} / \sum_{x', h'} e^{-\mathcal{E}_{\theta}(x', h')}$, and the gradient of the marginal negative log-likelihood becomes
$$
\begin{aligned}
\frac{\partial \left(-\log P_{\theta}(x)\right)}{\partial \theta} &= \frac{\partial \left(-\log \sum_h P_{\theta}(x, h)\right)}{\partial \theta} \\
&= -\frac{1}{\sum_h P_{\theta}(x, h)} \sum_h \frac{\partial P_{\theta}(x, h)}{\partial \theta} \\
&= -\frac{1}{P_{\theta}(x)} \sum_h \frac{\partial}{\partial \theta}\left(\frac{e^{-\mathcal{E}_{\theta}(x, h)}}{\sum_{x', h'} e^{-\mathcal{E}_{\theta}(x', h')}}\right) \\
&= -\frac{1}{P_{\theta}(x)} \sum_h \Big( -P_{\theta}(x, h) \frac{\partial \mathcal{E}_{\theta}(x, h)}{\partial \theta} + P_{\theta}(x, h) \sum_{s, h'} P_{\theta}(s, h') \frac{\partial \mathcal{E}_{\theta}(s, h')}{\partial \theta} \Big) \\
&= \sum_h P_{\theta}(h \mid x) \Big( \frac{\partial \mathcal{E}_{\theta}(x, h)}{\partial \theta} - \sum_{s, h'} P_{\theta}(s, h') \frac{\partial \mathcal{E}_{\theta}(s, h')}{\partial \theta} \Big)
\end{aligned}
$$
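
It may help to read the last line of this derivation as a difference of two expectations. The inner sum over joint configurations does not depend on the outer $h$, and $\sum_h P_{\theta}(h \mid x) = 1$, so (writing $h'$ for the inner latent variable to avoid clashing with the outer one) the expression can be rewritten as:

$$
\frac{\partial \left(-\log P_{\theta}(x)\right)}{\partial \theta}
= E_{h \sim P_{\theta}(h \mid x)}\left[\frac{\partial \mathcal{E}_{\theta}(x, h)}{\partial \theta}\right]
- E_{(s, h') \sim P_{\theta}(s, h')}\left[\frac{\partial \mathcal{E}_{\theta}(s, h')}{\partial \theta}\right]
$$

The first term uses latent variables sampled conditionally on the observed $x$, the second uses joint samples from the model, which has the same structure as the sampled term in the fully observed case.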

Active Learning with a GFlowNet

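Training is split into two loops. The outer loop learns the true energy function (the reward function); the inner loop trains the GFlowNet, using the learned energy function as its target.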
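
The two-loop structure can be sketched schematically as follows. Everything in this block is hypothetical: the design space, the oracle, and the memorising proxy are toy stand-ins, and the inner loop does not train a GFlowNet at all but directly returns the distribution proportional to $e^{-\text{proxy}(s)}$ that a trained GFlowNet would be targeting.

```python
import math
import random

CANDIDATES = list(range(8))  # hypothetical discrete design space

def oracle_energy(s):
    """Hypothetical expensive ground-truth energy (lower is better)."""
    return (s - 5) ** 2 / 4.0

def fit_proxy(dataset):
    """Outer-loop step: a toy 'learned' energy that memorises queried points
    and falls back to the mean observed energy elsewhere."""
    mean = sum(dataset.values()) / len(dataset)
    return lambda s: dataset.get(s, mean)

def train_sampler(proxy):
    """Inner-loop stand-in: instead of training a GFlowNet, return the exact
    distribution P(s) proportional to exp(-proxy(s)) that it would target."""
    weights = [math.exp(-proxy(s)) for s in CANDIDATES]
    total = sum(weights)
    return [w / total for w in weights]

def active_learning(rounds=5, batch=2, seed=0):
    rng = random.Random(seed)
    dataset = {0: oracle_energy(0)}        # one initial labelled point
    for _ in range(rounds):
        proxy = fit_proxy(dataset)         # outer loop: update the energy model
        probs = train_sampler(proxy)       # inner loop: fit the sampler to it
        for s in rng.choices(CANDIDATES, weights=probs, k=batch):
            dataset[s] = oracle_energy(s)  # query the oracle on sampled candidates
    return dataset

labelled = active_learning()
```

The only point of the sketch is the alternation: each round refits the energy model on all labelled data, rebuilds the sampler against that model, and uses the sampler to pick the next candidates to label.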

Estimating Entropy, Conditional Entropy, and Mutual Information

Def 33. The entropic reward function $R'$ is defined as
$$R'(s) = -R(s) \log R(s)$$

A second GFlowNet is then trained: the original one has target reward $R$, and the new one has target reward $R'$.

Prop 21. The entropy $H[S]$ of the terminating-state random variable $S$ is
$$
\begin{aligned}
H[S] &= -\sum_s P_T(s) \log P_T(s) \\
&= -\sum_s \frac{R(s)}{F(s_0)} \Big( \log R(s) - \log F(s_0) \Big) \\
&= \frac{-\sum_s R(s) \log R(s) + \log F(s_0) \sum_s R(s)}{F(s_0)} \\
&= \frac{\sum_s R'(s)}{F(s_0)} + \log F(s_0) \\
&= \frac{F'(s_0)}{F(s_0)} + \log F(s_0)
\end{aligned}
$$

where $F'$ is the flow of the newly trained GFlowNet.
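
The identity in Prop 21 is easy to verify numerically when the initial flows are known exactly. In the sketch below the reward table is hypothetical, with all values in $(0, 1)$ so that $R'(s) = -R(s) \log R(s)$ is non-negative, and the two initial flows are computed as exact sums, $F(s_0) = \sum_s R(s)$ and $F'(s_0) = \sum_s R'(s)$, rather than estimated by trained GFlowNets.

```python
import math

# Hypothetical terminal rewards with 0 < R(s) < 1, so that R'(s) >= 0.
REWARD = {"a": 0.2, "b": 0.5, "c": 0.1, "d": 0.4}

def entropic_reward(r):
    """R'(s) = -R(s) log R(s)."""
    return -r * math.log(r)

def entropy_from_flows(reward):
    """H[S] = F'(s0) / F(s0) + log F(s0), using exact initial flows."""
    f0 = sum(reward.values())
    f0_prime = sum(entropic_reward(r) for r in reward.values())
    return f0_prime / f0 + math.log(f0)

def entropy_direct(reward):
    """H[S] = -sum_s P_T(s) log P_T(s) with P_T(s) = R(s) / sum R."""
    z = sum(reward.values())
    return -sum((r / z) * math.log(r / z) for r in reward.values())
```

Both functions should return the same value for any such reward table; for a uniform reward over four states the result is $\log 4$.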

Prop 22. The conditional entropy $H[S \mid x]$ is
$$H[S \mid x] = \frac{F'(s_0 \mid x)}{F(s_0 \mid x)} + \log F(s_0 \mid x)$$

When $x$ is an event on a trajectory, only the trajectories passing through that event are considered. For $x = s$, only the trajectories through $s$ are considered, so
$$H[S \mid s] = \frac{F'(s_0 \mid s)}{F(s_0 \mid s)} + \log F(s_0 \mid s) = \frac{F'(s \mid s)}{F(s \mid s)} + \log F(s \mid s)$$

Corollary 4. The mutual information between the terminating-state random variable $S$ and the conditioning random variable $X$ is
$$MI(S; X) = H[S] - E_X\big[H(S \mid X)\big] = \frac{F'(s_0)}{F(s_0)} + \log F(s_0) - E_X\left[\frac{F'(s_0 \mid X)}{F(s_0 \mid X)} + \log F(s_0 \mid X)\right]$$
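
The corollary can be checked on a small discrete example. The sketch below assumes hypothetical conditional rewards `COND_REWARD` and a prior `PRIOR` over conditions, and it further assumes that the unconditional reward is the marginal $R(s) = \sum_x P(x)\, R(s \mid x) / Z(x)$, so that the unconditional terminating distribution is the marginal of $S$. All initial flows are exact sums rather than GFlowNet estimates.

```python
import math

# Hypothetical conditional rewards R(s | x) and a prior P(x) over conditions.
COND_REWARD = {
    "x1": {"a": 0.6, "b": 0.2, "c": 0.2},
    "x2": {"a": 0.1, "b": 0.1, "c": 0.7},
}
PRIOR = {"x1": 0.5, "x2": 0.5}

def entropy_from_flows(reward):
    """F'(s0) / F(s0) + log F(s0) with exact initial flows."""
    f0 = sum(reward.values())
    f0_prime = sum(-r * math.log(r) for r in reward.values())
    return f0_prime / f0 + math.log(f0)

def marginal_reward():
    """Assumed unconditional reward: R(s) = sum_x P(x) R(s | x) / Z(x)."""
    out = {}
    for x, rewards in COND_REWARD.items():
        z = sum(rewards.values())
        for s, r in rewards.items():
            out[s] = out.get(s, 0.0) + PRIOR[x] * r / z
    return out

def mutual_information():
    """MI(S; X) = H[S] - E_X[H(S | X)], each entropy computed from flows."""
    h_s = entropy_from_flows(marginal_reward())
    h_s_given_x = sum(PRIOR[x] * entropy_from_flows(r) for x, r in COND_REWARD.items())
    return h_s - h_s_given_x

def mutual_information_direct():
    """Reference value: sum_{x,s} P(x,s) log(P(x,s) / (P(x) P(s)))."""
    marg = marginal_reward()
    total = 0.0
    for x, rewards in COND_REWARD.items():
        z = sum(rewards.values())
        for s, r in rewards.items():
            p_joint = PRIOR[x] * r / z
            total += p_joint * math.log(p_joint / (PRIOR[x] * marg[s]))
    return total
```

Under these assumptions, `mutual_information()` and `mutual_information_direct()` should agree up to floating-point error.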
