Variational inference is a form of approximate inference.
Let $X$ denote the observed variables, $Z$ the latent variables, and $\theta$ the parameters.
By Bayes' rule,

$$p(x)=\frac{p(x,z)}{p(z|x)}$$
Conditioning on the parameters $\theta$:

$$p(x|\theta)=\frac{p(x,z|\theta)}{p(z|x,\theta)}$$
Taking the logarithm of both sides gives

$$\log p(x|\theta) =\log p(x,z|\theta)-\log p(z|x,\theta)$$
The original distribution (the posterior $p(z|x,\theta)$) is hard to compute, so we introduce a known distribution $q(z)$. Adding and subtracting $\log q(z)$ gives:

$$\begin{aligned}
\log p(x|\theta) &=\log p(x,z|\theta)-\log q(z) -\log p(z|x,\theta)+\log q(z)\\
&=\log \frac{p(x,z|\theta)}{q(z)}-\log \frac{p(z|x,\theta)}{q(z)}
\end{aligned}$$
Multiply both sides by $q(z)$ and integrate over $z$, that is, take the expectation under $q(z)$.

Left-hand side (here $\log p(x|\theta)$ does not depend on $z$, and $q(z)$ integrates to 1):

$$\begin{aligned}
\int_z q(z)\log p(x|\theta)dz &=\log p(x|\theta)\int_z q(z)dz\\
&=\log p(x|\theta)
\end{aligned}$$
Right-hand side:

$$\begin{aligned}
&\int_zq(z)\log \frac{p(x,z|\theta)}{q(z)}dz-\int_z q(z)\log \frac{p(z|x,\theta)}{q(z)}dz\\
=&\int_zq(z)\log \frac{p(x,z|\theta)}{q(z)}dz+\int_z q(z)\log \frac{q(z)}{p(z|x,\theta)}dz
\end{aligned}$$
Here

$$\int_zq(z)\log \frac{p(x,z|\theta)}{q(z)}dz$$

is called the ELBO (evidence lower bound), and

$$\int_z q(z)\log \frac{q(z)}{p(z|x,\theta)}dz$$

is the KL divergence $KL(q(z)||p(z|x,\theta))$.
$KL(q(z)||p(z|x,\theta))$ is non-negative, and the closer $q(z)$ is to the original distribution $p(z|x,\theta)$, the closer its value is to 0. Since $\log p(x|\theta)$ does not depend on $q$, the problem can be recast as maximizing the ELBO over $q$. This process of maximizing the ELBO is called variational inference.
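The decomposition $\log p(x|\theta) = \text{ELBO} + KL$ can be checked numerically on a small discrete model. The following is a minimal sketch using only the Python standard library; the three-state latent variable and all numbers in `joint` and `q` are made up for illustration and are not from the derivation above.

```python
import math

# Hypothetical toy model: one fixed observed x and a latent z with 3 states.
# joint[k] stands for p(x, z=k | theta) at the observed x (made-up numbers).
joint = [0.10, 0.25, 0.05]

evidence = sum(joint)                        # p(x | theta) = sum_z p(x, z | theta)
posterior = [j / evidence for j in joint]    # p(z | x, theta)

def elbo(q):
    """ELBO(q) = sum_z q(z) * log(p(x, z | theta) / q(z))."""
    return sum(qk * math.log(jk / qk) for qk, jk in zip(q, joint) if qk > 0)

def kl(q, p):
    """KL(q || p) = sum_z q(z) * log(q(z) / p(z))."""
    return sum(qk * math.log(qk / pk) for qk, pk in zip(q, p) if qk > 0)

q = [0.5, 0.3, 0.2]                          # an arbitrary variational distribution
log_evidence = math.log(evidence)

# ELBO + KL reproduces log p(x | theta) up to floating-point rounding.
print(log_evidence, elbo(q) + kl(q, posterior))
# The ELBO never exceeds log p(x | theta).
print(elbo(q) <= log_evidence)
# The bound is tight when q equals the posterior.
print(abs(elbo(posterior) - log_evidence) < 1e-12)
```

Because the KL term vanishes only when $q$ equals the posterior, raising the ELBO and shrinking the KL term are the same thing in this toy setting.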
Expanding the ELBO:

$$\int_zq(z)\log \frac{p(x,z|\theta)}{q(z)}dz =\int_zq(z)\log{p(x,z|\theta)}dz -\int_zq(z)\log{q(z)}dz$$
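Here we assume that the distribution factorizes as $q(Z)=\prod_i q_i(z_i)$. In other words, $q(Z)$ is assumed to follow mean-field theory and equals a product of several independent distributions. This restriction is where the approximation comes from.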
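To see why a factorized $q(Z)$ is only an approximation, consider a target with correlated latent variables. The sketch below is an illustrative assumption, not part of the original derivation: the 2x2 table `p` is made up, and the factors are simply taken to be its marginals.

```python
import math

# Hypothetical correlated target over two binary latents (z1, z2); made-up numbers.
p = [[0.4, 0.1],
     [0.1, 0.4]]

# Marginals of p, used here only as one convenient choice of factors q_1, q_2.
q1 = [sum(row) for row in p]
q2 = [sum(p[a][b] for a in range(2)) for b in range(2)]

# Mean-field form: q(z1, z2) = q_1(z1) * q_2(z2).
q = [[q1[a] * q2[b] for b in range(2)] for a in range(2)]

total = sum(sum(row) for row in q)
kl_qp = sum(q[a][b] * math.log(q[a][b] / p[a][b])
            for a in range(2) for b in range(2))

print(total)   # sums to 1 up to rounding, so q is a valid distribution
print(kl_qp)   # strictly positive for this q: the product misses the correlation
```

A product of independent factors cannot equal a distribution whose variables are dependent, so some gap always remains for a target like this one.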
Substituting this into the expanded ELBO gives:

$$\int_z\prod_i q_i(z_i)\log{p(x,z|\theta)}dz -\int_z\prod_i q_i(z_i)\log{q(z)}dz$$
The term to the left of the minus sign is:

$$\int_z\prod_i q_i(z_i)\log{p(x,z|\theta)}dz$$
Pulling out the factor $q_j(z_j)$ gives:

$$\begin{aligned}
&=\int_{z_j}q_j(z_j) \left(\int \prod_{i\neq j}^m q_i(z_i)\log{p(x,z|\theta)}\prod_{i\neq j}^m dz_i \right)dz_j\\
&=\int_{z_j}q_j(z_j) E[\log{p(x,z|\theta)}]dz_j
\end{aligned}$$

where $E$ is the expectation with respect to $\prod_{i\neq j}^m q_i(z_i)$.
The term to the right of the minus sign is:

$$\begin{aligned}
&\int_z\prod_i q_i(z_i)\log{\prod_i q_i(z_i)}dz\\
=&\int_z\prod_i q_i(z_i)\sum_i \log{q_i(z_i)}dz\\
=&\int_z\prod_i q_i(z_i)(\log q_1(z_1)+\log q_2(z_2)+...+ \log q_m(z_m))dz\\
=&\sum_i \int_{z_i}q_i(z_i)\log q_i(z_i)dz_i
\end{aligned}$$

The last step holds because, within each term of the sum, every factor other than $q_i(z_i)$ integrates to 1.
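The collapse of the joint integral into a sum of per-factor terms can be verified by brute force on discrete factors. This is a small sketch with made-up factors `q_1, q_2, q_3`; sums replace the integrals.

```python
import math
from itertools import product

# Hypothetical factors q_1, q_2, q_3 over small discrete state spaces (made-up numbers).
factors = [[0.2, 0.8], [0.5, 0.3, 0.2], [0.6, 0.4]]

# Left-hand side: sum over all joint states z of prod_i q_i(z_i) * sum_i log q_i(z_i).
lhs = 0.0
for states in product(*(range(len(f)) for f in factors)):
    probs = [f[s] for f, s in zip(factors, states)]
    lhs += math.prod(probs) * sum(math.log(pk) for pk in probs)

# Right-hand side: sum_i sum_{z_i} q_i(z_i) * log q_i(z_i).
rhs = sum(sum(pk * math.log(pk) for pk in f) for f in factors)

print(lhs, rhs)   # the two values agree up to floating-point rounding
```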
Since we only care about a single factor $q_j(z_j)$ here, the remaining terms can be treated as a constant $C$:

$$= \int_{z_j}q_j(z_j)\log q_j(z_j)dz_j +C$$
Subtracting the right-hand term from the left-hand term gives:

$$\int_{z_j}q_j(z_j) E[\log{p(x,z|\theta)}]dz_j -\int_{z_j}q_j(z_j)\log q_j(z_j)dz_j -C$$
Let $E[\log{p(x,z|\theta)}]=\log \hat p(x,z_j)$; after taking the expectation, only $z_j$ remains free. Then:

$$\begin{aligned}
&\int_{z_j}q_j(z_j)\log \hat p(x,z_j)dz_j -\int_{z_j}q_j(z_j)\log q_j(z_j)dz_j -C\\
=&\int_{z_j}q_j(z_j)\log \frac{\hat p(x,z_j)}{q_j(z_j)}dz_j -C\\
=&-\int_{z_j}q_j(z_j)\log \frac{q_j(z_j)}{\hat p(x,z_j)}dz_j -C
\end{aligned}$$

Treating $\hat p(x,z_j)$ as normalized over $z_j$ (its normalizing constant can be absorbed into $C$), the remaining integral is $KL(q_j(z_j)||\hat p(x,z_j))\geq 0$, so the first term satisfies

$$-\int_{z_j}q_j(z_j)\log \frac{q_j(z_j)}{\hat p(x,z_j)}dz_j \leq 0$$
So the expression attains its maximum when $q_j(z_j) = \hat p(x,z_j)$, that is, when $\log q_j(z_j)=E[\log p(x,z|\theta)]$ up to an additive normalizing constant.
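Applying this optimal-factor update to each $q_j$ in turn gives a coordinate ascent scheme. The sketch below runs it on a hypothetical 2x2 joint table with made-up numbers; the discrete setting, the starting factors, and the iteration count are all assumptions chosen for illustration.

```python
import math

# Hypothetical joint p(x, z1, z2 | theta) at a fixed observed x, for binary z1, z2.
# The entries are made-up positive numbers; they need not sum to 1.
joint = [[0.30, 0.05],
         [0.10, 0.15]]
log_joint = [[math.log(v) for v in row] for row in joint]

def normalize_exp(log_weights):
    """Return exp(log_weights) rescaled to sum to 1."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    s = sum(w)
    return [v / s for v in w]

def elbo(q1, q2):
    """ELBO = sum_z q(z) * (log p(x, z) - log q(z)) with q(z) = q1(z1) * q2(z2)."""
    return sum(q1[a] * q2[b] * (log_joint[a][b] - math.log(q1[a] * q2[b]))
               for a in range(2) for b in range(2))

q1, q2 = [0.5, 0.5], [0.9, 0.1]   # arbitrary starting factors
history = [elbo(q1, q2)]
for _ in range(20):
    # log q1(z1) = E_{q2}[log p(x, z1, z2)] + const
    q1 = normalize_exp([sum(q2[b] * log_joint[a][b] for b in range(2))
                        for a in range(2)])
    # log q2(z2) = E_{q1}[log p(x, z1, z2)] + const
    q2 = normalize_exp([sum(q1[a] * log_joint[a][b] for a in range(2))
                        for b in range(2)])
    history.append(elbo(q1, q2))

log_evidence = math.log(sum(sum(row) for row in joint))
# Starting ELBO, final ELBO, and the upper limit log p(x | theta).
print(history[0], history[-1], log_evidence)
```

Each update maximizes the ELBO with respect to one factor while the other is held fixed, so the recorded ELBO values never decrease and stay below $\log p(x|\theta)$.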