Coordinate Ascent Variational Inference (CAVI)

Variational inference aims to approximate $P(Z \mid X)$, the posterior distribution over the latent variables $Z$ given the observations $X$.
$$
\begin{aligned}
\log P(X) &= \log P(X, Z) - \log P(Z \mid X) \\
&= \log \frac{P(X, Z)}{q(Z)} - \log \frac{P(Z \mid X)}{q(Z)}
\end{aligned}
$$
Taking the expectation of both sides with respect to $q(Z)$:
$$
\begin{aligned}
E_{q(Z)}[\log P(X)] &= \int q(Z) \log P(X) \, dZ \\
&= \log P(X) \int q(Z) \, dZ \\
&= \log P(X)
\end{aligned}
$$
Therefore,
$$
\begin{aligned}
\log P(X) &= \int q(Z) \log \frac{P(X, Z)}{q(Z)} \, dZ - \int q(Z) \log \frac{P(Z \mid X)}{q(Z)} \, dZ \\
&= \mathcal{L}(q) + KL(q \| p)
\end{aligned}
$$
If you are not familiar with the KL divergence, you can refer to the article 如何理解K-L散度(相对熵) (How to understand the K-L divergence / relative entropy).
To make $q(Z)$ match the posterior $P(Z \mid X)$, we should minimize $KL(q \| p)$. Since $X$ is observed, $\log P(X)$ is a constant, so minimizing $KL(q \| p)$ is equivalent to maximizing $\mathcal{L}(q)$.
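As a quick sanity check, this decomposition can be verified numerically on a tiny discrete model. The sketch below is purely illustrative (the joint table and the choice of $q$ are made up), and it confirms that $\log P(X) = \mathcal{L}(q) + KL(q \| p)$ holds for an arbitrary $q(Z)$:

```python
import numpy as np

# Made-up joint P(X = x0, Z = k) over a 4-valued latent variable Z (illustrative only)
p_xz = np.array([0.10, 0.25, 0.05, 0.20])
p_x = p_xz.sum()                 # evidence P(X = x0)
post = p_xz / p_x                # exact posterior P(Z | X = x0)

# An arbitrary variational distribution q(Z)
q = np.array([0.4, 0.3, 0.2, 0.1])

# ELBO: L(q) = E_q[log P(X, Z) - log q(Z)]
elbo = np.sum(q * (np.log(p_xz) - np.log(q)))
# KL(q || p) = E_q[log q(Z) - log P(Z | X)]
kl = np.sum(q * (np.log(q) - np.log(post)))

print(np.log(p_x), elbo + kl)    # the two numbers agree
```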
Assume the mean-field factorization $q(Z) = \prod_i q_i(Z_i)$. Then
$$
\begin{aligned}
\mathcal{L}(q) &= \int q(Z) \left[ \log P(X, Z) - \log q(Z) \right] dZ \\
&= \int \prod_i q_i(Z_i) \log P(X, Z) \, dZ - \int \prod_i q_i(Z_i) \log \prod_i q_i(Z_i) \, dZ \\
&= \int q_j(Z_j) \left[ \int \prod_{i \ne j} q_i(Z_i) \log P(X, Z) \, dZ_i \right] dZ_j - \int \sum_k q_k(Z_k) \log q_k(Z_k) \left[ \prod_{i \ne k} q_i(Z_i) \, dZ_i \right] dZ_k \\
&= \int q_j(Z_j) \left[ \int \prod_{i \ne j} q_i(Z_i) \log P(X, Z) \, dZ_i \right] dZ_j - \sum_i \int q_i(Z_i) \log q_i(Z_i) \, dZ_i \\
&= \int q_j(Z_j) \, E_{q_i(Z_i),\, i \ne j}[\log P(X, Z)] \, dZ_j - \int q_j(Z_j) \log q_j(Z_j) \, dZ_j - \sum_{i \ne j} \int q_i(Z_i) \log q_i(Z_i) \, dZ_i \\
&= \int q_j(Z_j) \log \widetilde{P}(X, Z_j) \, dZ_j - \int q_j(Z_j) \log q_j(Z_j) \, dZ_j - \sum_{i \ne j} \int q_i(Z_i) \log q_i(Z_i) \, dZ_i \\
&= -KL(q_j(Z_j) \,\|\, \widetilde{P}(X, Z_j)) - \sum_{i \ne j} \int q_i(Z_i) \log q_i(Z_i) \, dZ_i
\end{aligned}
$$
Here we define $\log \widetilde{P}(X, Z_j) = E_{q_i(Z_i),\, i \ne j}[\log P(X, Z)]$.
The idea of CAVI is to update one factor $q_j(Z_j)$ at a time while holding the other factors $q_i(Z_i),\ i \ne j$ fixed, so
$$
\mathcal{L}(q) = -KL(q_j(Z_j) \,\|\, \widetilde{P}(X, Z_j)) + \text{const}
$$
Since the KL divergence is nonnegative and vanishes only when the two distributions coincide, maximizing $\mathcal{L}(q)$ with respect to $q_j$ amounts to setting $q_j(Z_j) = \widetilde{P}(X, Z_j)$, i.e.
$$
q_j^*(Z_j) = \exp\left(E_{q_i(Z_i),\, i \ne j}[\log P(X, Z)]\right)
$$
To make $q_j^*(Z_j)$ a proper distribution, i.e. $\int q_j^*(Z_j) \, dZ_j = 1$, we normalize the expression above and obtain
$$
q_j^*(Z_j) = \frac{\exp\left(E_{q_i(Z_i),\, i \ne j}[\log P(X, Z)]\right)}{\int \exp\left(E_{q_i(Z_i),\, i \ne j}[\log P(X, Z)]\right) dZ_j}
$$
Cycling through the factors and iterating these updates until convergence yields the desired approximation to the posterior over the latent variables.
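As a minimal illustration of the update rule, the sketch below runs CAVI on a toy model with two discrete latent variables $Z_1, Z_2$; the joint table, variable names, and initialization are assumptions made up for this example. Each update computes $E_{q_i,\, i \ne j}[\log P(X, Z)]$, exponentiates, and normalizes, exactly as in the formula above:

```python
import numpy as np

# Made-up joint P(X, Z1, Z2) for a fixed observed X, i.e. an unnormalized table over (Z1, Z2)
joint = np.array([[0.30, 0.05],
                  [0.10, 0.55]])
log_joint = np.log(joint)

rng = np.random.default_rng(0)
q1 = rng.dirichlet(np.ones(2))   # initial factor q1(Z1)
q2 = rng.dirichlet(np.ones(2))   # initial factor q2(Z2)

def normalize_exp(log_unnorm):
    """Exponentiate and normalize, as in the closed-form CAVI update."""
    w = np.exp(log_unnorm - log_unnorm.max())   # subtract max for numerical stability
    return w / w.sum()

for _ in range(50):
    # q1*(Z1) ∝ exp(E_{q2}[log P(X, Z1, Z2)])
    q1 = normalize_exp(log_joint @ q2)
    # q2*(Z2) ∝ exp(E_{q1}[log P(X, Z1, Z2)])
    q2 = normalize_exp(log_joint.T @ q1)

exact = joint / joint.sum()      # exact posterior P(Z1, Z2 | X)
print("q1:", q1, "exact marginal of Z1:", exact.sum(axis=1))
print("q2:", q2, "exact marginal of Z2:", exact.sum(axis=0))
```

Because the factorized $q_1(Z_1)\,q_2(Z_2)$ cannot represent the correlation between the two latent variables, the resulting marginals only approximate the exact ones; this is the usual trade-off of the mean-field assumption.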

