Variational Bayes

Variational Bayesian inference

Log-likelihood and Evidence Lower Bound(ELOB)

The following identity always holds, because $p(X,Z) = p(Z\mid X)\,p(X)$:
$$
\ln(p(X)) = \ln(p(X,Z)) - \ln(p(Z\mid X))
$$
Adding and subtracting $\ln(q(Z))$ for an arbitrary distribution $q(Z)$ leaves the identity unchanged:
$$
\ln(p(X)) = \left[\ln(p(X,Z))-\ln(q(Z))\right] - \left[\ln(p(Z\mid X))-\ln(q(Z))\right]
$$
Combining the logarithms, we now have:
$$
\ln(p(X)) = \ln\left(\frac{p(X,Z)}{q(Z)}\right) - \ln\left(\frac{p(Z\mid X)}{q(Z)}\right)
$$
Take the expectation of both sides with respect to $q(Z)$. The left-hand side does not depend on $Z$ and $\int q(Z)\,\mathrm{d}Z = 1$, so it is unchanged:
$$
\begin{aligned}
\ln (p(X)) &=\int q(Z) \ln \left(\frac{p(X, Z)}{q(Z)}\right) \mathrm{d} Z-\int q(Z) \ln \left(\frac{p(Z \mid X)}{q(Z)}\right) \mathrm{d} Z \\
&=\underbrace{\int q(Z) \ln (p(X, Z)) \,\mathrm{d} Z-\int q(Z) \ln (q(Z)) \,\mathrm{d} Z}_{\mathcal{L}(q)}+\underbrace{\left(-\int q(Z) \ln \left(\frac{p(Z \mid X)}{q(Z)}\right) \mathrm{d} Z\right)}_{\mathbb{KL}(q \| p)} \\
&=\mathcal{L}(q)+\mathbb{KL}(q \| p)
\end{aligned}
$$
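The decomposition can be checked numerically on a small discrete model. The sketch below is only an illustration: the three-state latent variable and the numbers in `joint` and `q` are made-up assumptions, and sums replace the integrals.

```python
import math

# Hypothetical toy model: Z takes 3 values and X is one fixed observation.
# joint[z] stands for p(X = x_obs, Z = z); the numbers are arbitrary.
joint = [0.10, 0.25, 0.05]

evidence = sum(joint)                      # p(X) = sum_z p(X, Z = z)
posterior = [j / evidence for j in joint]  # p(Z | X)

# Any distribution q(Z) with full support will do.
q = [0.5, 0.3, 0.2]

# L(q) = sum_z q(z) ln p(X, z) - sum_z q(z) ln q(z)
elbo = sum(qz * (math.log(jz) - math.log(qz)) for qz, jz in zip(q, joint))

# KL(q || p) = -sum_z q(z) ln( p(z | X) / q(z) )
kl = -sum(qz * math.log(pz / qz) for qz, pz in zip(q, posterior))

# The two printed numbers should agree up to floating-point error.
print(math.log(evidence), elbo + kl)
```

Changing `q` changes `elbo` and `kl` individually, but their sum should stay equal to `math.log(evidence)`.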
The KL divergence is commonly used to measure how far one probability distribution is from another. It is not a true distance, because it is not symmetric in its two arguments. It is defined as:
$$
\mathbb{KL}[p(X)\,\|\,q(X)] = \sum_{x\in X}\left[p(x)\log\frac{p(x)}{q(x)}\right] = \mathbb{E}_{x\sim p(x)}\left[\log\frac{p(x)}{q(x)}\right]
$$
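For discrete distributions the definition above is a one-line sum. The following is a minimal sketch; the helper name `kl_divergence` and the two example distributions are assumptions chosen for illustration. It also evaluates both argument orders, which in general give different values.

```python
import math

def kl_divergence(p, q):
    """KL[p || q] = sum_x p(x) * ln(p(x) / q(x)) for discrete distributions.

    Terms with p(x) == 0 contribute 0 by convention; q(x) must be positive
    wherever p(x) is positive.
    """
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

# Two arbitrary distributions over the same three outcomes.
p = [0.7, 0.2, 0.1]
q = [0.4, 0.4, 0.2]

kl_pq = kl_divergence(p, q)   # KL[p || q]
kl_qp = kl_divergence(q, p)   # KL[q || p]
print(kl_pq, kl_qp)
```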
Our goal is to find the simple distribution $q(Z)$ that is closest to the posterior $p(Z\mid X)$.

Alternative Evidence Lower Bound(ELOB)

Here is another way to derive the same bound:
$$
\begin{aligned}
\ln (p(X)) &=\ln \int_{Z} p(X, Z) \,\mathrm{d} Z \\
&=\ln \int_{Z} p(X, Z) \frac{q(Z)}{q(Z)} \,\mathrm{d} Z \\
&=\ln \left(\mathbb{E}_{q}\left[\frac{p(X, Z)}{q(Z)}\right]\right) \\
& \geq \mathbb{E}_{q}\left[\ln \left(\frac{p(X, Z)}{q(Z)}\right)\right] \quad \text{(using Jensen's inequality)} \\
&=\mathbb{E}_{q}[\ln (p(X, Z))]-\mathbb{E}_{q}[\ln (q(Z))] \\
& \triangleq \mathcal{L}(q)
\end{aligned}
$$
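The inequality can be probed numerically by drawing many candidate distributions $q(Z)$ and comparing $\mathcal{L}(q)$ with $\ln p(X)$. This is a sketch under assumptions: the discrete `joint` table is invented, and the random sampling of `q` is just a convenient way to try many candidates, not part of the derivation.

```python
import math
import random

# Hypothetical p(X = x_obs, Z = z) for a three-state latent variable.
joint = [0.10, 0.25, 0.05]
log_evidence = math.log(sum(joint))

def elbo(q, joint):
    """L(q) = E_q[ln p(X, Z)] - E_q[ln q(Z)] for a discrete Z."""
    return sum(qz * (math.log(jz) - math.log(qz)) for qz, jz in zip(q, joint))

random.seed(0)
gaps = []
for _ in range(1000):
    # Strictly positive random weights, normalized into a distribution.
    w = [random.random() + 1e-9 for _ in joint]
    total = sum(w)
    q = [wi / total for wi in w]
    gaps.append(log_evidence - elbo(q, joint))

# Every gap should be non-negative, up to floating-point error.
print(min(gaps))
```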

Maximize Evidence Lower Bound(ELOB)

We give each part a name:
$$
\begin{array}{ll}
\text{Evidence Lower Bound (ELOB):} & \mathcal{L}(q)=\int q(Z) \ln (p(X, Z)) \,\mathrm{d} Z-\int q(Z) \ln (q(Z)) \,\mathrm{d} Z \\
\text{KL divergence:} & \mathbb{KL}(q \| p)=-\int q(Z) \ln \left(\frac{p(Z \mid X)}{q(Z)}\right) \mathrm{d} Z
\end{array}
$$

  • Note that $p(X)$ is fixed with respect to the choice of $q(Z)$. We want to choose $q(Z)$ to minimize the KL divergence, so that $q(Z)$ moves closer and closer to $p(Z\mid X)$. It is easy to verify that the KL divergence is $0$ when $q(Z)=p(Z\mid X)$.
  • We know that $\ln p(X) = \mathcal{L}(q)+\mathbb{KL}(q\| p)$. Because $\ln p(X)$ is fixed, minimizing $\mathbb{KL}(q\| p)$ is equivalent to maximizing $\mathcal{L}(q)$.
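The trade-off between the two terms can be illustrated by sliding $q(Z)$ along a straight line from an arbitrary starting point to the exact posterior. The sketch below assumes a small discrete model with made-up numbers; the interpolation is only an illustrative device, not an inference algorithm.

```python
import math

# Hypothetical p(X = x_obs, Z = z) for a three-state latent variable.
joint = [0.10, 0.25, 0.05]
evidence = sum(joint)
posterior = [j / evidence for j in joint]   # p(Z | X)
q0 = [0.5, 0.3, 0.2]                        # arbitrary starting q(Z)

def elbo(q):
    """L(q) = sum_z q(z) * (ln p(X, z) - ln q(z))."""
    return sum(qz * (math.log(jz) - math.log(qz)) for qz, jz in zip(q, joint))

def kl_to_posterior(q):
    """KL(q || p(Z | X)) = -sum_z q(z) * ln(p(z | X) / q(z))."""
    return -sum(qz * math.log(pz / qz) for qz, pz in zip(q, posterior))

rows = []
for step in range(6):
    t = step / 5   # t = 0 gives q0, t = 1 gives the exact posterior
    q = [(1 - t) * a + t * b for a, b in zip(q0, posterior)]
    rows.append((t, elbo(q), kl_to_posterior(q)))

for t, lower_bound, kl in rows:
    print(f"t={t:.1f}  L(q)={lower_bound:.6f}  KL={kl:.6f}")
```

As `t` approaches 1 the KL column should shrink toward 0 while the `L(q)` column rises toward `math.log(evidence)`.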

We can choose $q(Z)$ so that it factorizes as
$$
q(Z) = \prod_{i=1}^{M} q_i(Z_i)
$$
where $M$ is the dimension of $Z$. In other words, the components of $Z$ are independent under $q$. This is called mean-field variational Bayes.
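A factorized $q(Z)$ is easy to build and inspect in the discrete case. The sketch below assumes two latent components with invented factor values; `mean_field_joint` is a hypothetical helper name introduced only for this example.

```python
import math
from itertools import product

# Hypothetical factors for a two-component latent variable Z = (Z1, Z2).
q1 = [0.6, 0.4]          # q_1(Z_1)
q2 = [0.2, 0.5, 0.3]     # q_2(Z_2)
factors = [q1, q2]

def mean_field_joint(factors):
    """Return {z: prod_i q_i(z_i)} over every joint configuration z."""
    joint = {}
    for z in product(*(range(len(f)) for f in factors)):
        joint[z] = math.prod(f[zi] for f, zi in zip(factors, z))
    return joint

q = mean_field_joint(factors)

# Marginalizing the factorized joint recovers each factor.
marg1 = [sum(q[(a, b)] for b in range(len(q2))) for a in range(len(q1))]
marg2 = [sum(q[(a, b)] for a in range(len(q1))) for b in range(len(q2))]
print(sum(q.values()), marg1, marg2)
```

Storing the two factors takes 2 + 3 numbers, whereas a general joint over the same space needs 2 × 3; this saving is what makes the factorized family attractive as $M$ grows.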

Note that $q(Z)$ may be a good approximation to the joint posterior density $p(Z\mid X)$, while an individual factor $q_i(Z_i)$ is not necessarily a good approximation to the marginal $p(Z_i\mid X)$.

Substituting this factorization into $\mathcal{L}(q)$:
$$
\begin{aligned}
\mathcal{L}(q) &=\int q(Z) \ln (p(X, Z)) \,\mathrm{d} Z-\int q(Z) \ln (q(Z)) \,\mathrm{d} Z \\
&=\underbrace{\int \prod_{i=1}^{M} q_{i}\left(Z_{i}\right) \ln (p(X, Z)) \,\mathrm{d} Z}_{\text{Part 1}}-\underbrace{\int \prod_{i=1}^{M} q_{i}\left(Z_{i}\right) \sum_{i=1}^{M} \ln \left(q_{i}\left(Z_{i}\right)\right) \mathrm{d} Z}_{\text{Part 2}}
\end{aligned}
$$
Consider Part 1 first. Suppose we are interested only in $Z_j$. Pulling it out of the integral gives:
$$
(\operatorname{Part} 1)=\int_{Z_{j}} q_{j}\left(Z_{j}\right)\left(\int_{Z_{i \neq j}} \cdots \int \prod_{i \neq j}^{M} q_{i}\left(Z_{i}\right) \ln (p(X, Z)) \prod_{i \neq j}^{M} \mathrm{d} Z_{i}\right) \mathrm{d} Z_{j}
$$
Or, written more compactly:
$$
(\operatorname{Part} 1)=\int_{Z_{j}} q_{j}\left(Z_{j}\right)\left(\int_{Z_{i \neq j}} \cdots \int \ln (p(X, Z)) \prod_{i \neq j}^{M} q_{i}\left(Z_{i}\right) \mathrm{d} Z_{i}\right) \mathrm{d} Z_{j}
$$
Or, to make it more meaningful, the inner integral can be written as an expectation:
$$
(\operatorname{Part} 1)=\int_{Z_{j}} q_{j}\left(Z_{j}\right)\left[\mathbb{E}_{i \neq j}[\ln (p(X, Z))]\right] \mathrm{d} Z_{j}
$$
Here $\mathbb{E}_{i \neq j}$ denotes the expectation under all factors $q_i$ with $i \neq j$, so the result is still a function of $Z_j$.
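The rewriting of Part 1 as a nested expectation can be verified on a small discrete example with two latent components. The table `joint` and the factors `q1`, `q2` below are made-up assumptions, and the sketch fixes $j = 1$.

```python
import math

# Hypothetical joint p(X = x_obs, Z1 = a, Z2 = b) as a 2 x 3 table.
joint = [[0.10, 0.05, 0.15],
         [0.20, 0.05, 0.05]]
q1 = [0.6, 0.4]          # q_1(Z_1)
q2 = [0.2, 0.5, 0.3]     # q_2(Z_2)

# Direct evaluation of Part 1: sum over every (Z1, Z2) pair.
part1_direct = sum(q1[a] * q2[b] * math.log(joint[a][b])
                   for a in range(2) for b in range(3))

# Nested evaluation with j = 1: first average ln p(X, Z) over q_2,
# which leaves a function of Z1 alone ...
inner = [sum(q2[b] * math.log(joint[a][b]) for b in range(3))
         for a in range(2)]
# ... then average that function over q_1.
part1_nested = sum(q1[a] * inner[a] for a in range(2))

print(part1_direct, part1_nested)
```

The list `inner` plays the role of $\mathbb{E}_{i \neq j}[\ln(p(X, Z))]$: one number per value of $Z_1$.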
Now consider Part 2:
$$
(\operatorname{Part} 2)=\int \prod_{i=1}^{M} q_{i}\left(Z_{i}\right) \sum_{i=1}^{M} \ln \left(q_{i}\left(Z_{i}\right)\right) \mathrm{d} Z
$$
Simplifying:
$$
\begin{aligned}
(\operatorname{Part} 2) &= \int q(Z)\sum_{i=1}^{M}\ln(q_i(Z_i))\,\mathrm{d}Z\\
&=\sum_{i=1}^{M}\int_{Z}q(Z_1,\cdots,Z_M)\ln(q_i(Z_i))\,\mathrm{d}Z\\
&=\sum_{i=1}^{M}\int_{Z_i}q_i(Z_i)\ln(q_i(Z_i))\,\mathrm{d}Z_i
\end{aligned}
$$
The last step holds because, in the $i$-th term, every other factor $q_k(Z_k)$ with $k \neq i$ integrates to $1$.
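The simplification of Part 2 can also be checked numerically: summing over the full joint space should give the same value as adding up one term per factor. The two factors below are invented for illustration, and `sum_q_ln_q` is a hypothetical helper name.

```python
import math

# Hypothetical mean-field factors for Z = (Z1, Z2).
q1 = [0.6, 0.4]
q2 = [0.2, 0.5, 0.3]

# First line of the derivation: sum q(Z) * sum_i ln q_i(Z_i) over all Z.
part2_joint = sum(q1[a] * q2[b] * (math.log(q1[a]) + math.log(q2[b]))
                  for a in range(2) for b in range(3))

def sum_q_ln_q(q):
    """sum_z q(z) * ln q(z), i.e. the negative entropy of q."""
    return sum(v * math.log(v) for v in q)

# Last line of the derivation: one independent term per factor.
part2_per_factor = sum_q_ln_q(q1) + sum_q_ln_q(q2)

print(part2_joint, part2_per_factor)
```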
