Two Ways to Derive the EM Algorithm
1 Introduce the latent variable z
$$\log p(x|\theta)=\log\int p(z,x|\theta)\,dz \quad (1)$$
2 Use Bayes' rule
$$\log p(x)=\log\frac{p(x,z)}{q(z)}-\log\frac{p(z|x)}{q(z)}$$
Method 1: introduce the latent variable z
$$\log p(x|\theta)=\log\int p(z,x|\theta)\,dz \quad (1)$$
For any distribution $q(z)$ that is positive wherever $p(z,x|\theta)$ is, multiply and divide the integrand by $q(z)$:
$$\log p(x|\theta)=\log\int q(z)\frac{p(z,x|\theta)}{q(z)}\,dz \quad (2)$$
Because $\log$ is a concave function, Jensen's inequality gives
$$\log p(x|\theta)\ge\int q(z)\log\frac{p(z,x|\theta)}{q(z)}\,dz \quad (3)$$
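The inequality (3) can be checked numerically on a small discrete example, where the integral over z becomes a sum. The sketch below is only an illustration: the three joint values and the candidate distributions q are made-up numbers, and the helper names `log_evidence` and `lower_bound` are hypothetical.

```python
import math

# Hypothetical toy model: z takes values 0, 1, 2 and x is one fixed observation.
# joint[k] stands for p(z=k, x | theta); the numbers are made up for illustration.
joint = [0.10, 0.25, 0.05]


def log_evidence(joint):
    """Left side of (3): log p(x | theta) = log sum_z p(z, x | theta)."""
    return math.log(sum(joint))


def lower_bound(q, joint):
    """Right side of (3): sum_z q(z) * log(p(z, x | theta) / q(z))."""
    return sum(qk * math.log(pk / qk) for qk, pk in zip(q, joint) if qk > 0)


log_px = log_evidence(joint)
candidates = ([1 / 3, 1 / 3, 1 / 3], [0.7, 0.2, 0.1], [0.05, 0.05, 0.9])
bounds = [lower_bound(q, joint) for q in candidates]

for q, b in zip(candidates, bounds):
    # Every candidate q gives a value that does not exceed log p(x | theta).
    print(q, b, "<=", log_px)
```

None of these arbitrary choices of q reaches the left-hand side exactly; the gap depends on how far q is from the posterior.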
Equality holds when the ratio inside the logarithm is a constant $c$ that does not depend on $z$:
$$\frac{p(z,x|\theta)}{q(z)}=c \quad (4)$$
$$p(z,x|\theta)=c\,q(z) \quad (5)$$
Integrate both sides over $z$:
$$\int p(z,x|\theta)\,dz=\int c\,q(z)\,dz \quad (6)$$
The left side is the marginal $p(x|\theta)$, and $\int q(z)\,dz=1$ on the right, so
$$p(x|\theta)=c$$
Substituting this value of $c$ back into (4):
$$\frac{p(z,x|\theta)}{q(z)}=p(x|\theta)$$
$$q(z)=\frac{p(z,x|\theta)}{p(x|\theta)}=p(z|x,\theta)$$
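To see the equality condition at work, the following sketch sets q(z) to the posterior for a made-up discrete joint and checks that the ratio in (4) is the same for every z and that the bound in (3) becomes an equality. The joint values are invented for illustration only.

```python
import math

# Made-up values of p(z=k, x | theta) for z in {0, 1, 2} at one fixed x.
joint = [0.10, 0.25, 0.05]

# c = p(x | theta), obtained by summing the joint over z as in (6).
p_x = sum(joint)

# q(z) = p(z | x, theta) = p(z, x | theta) / p(x | theta).
posterior = [pk / p_x for pk in joint]

# The ratio p(z, x | theta) / q(z) from (4), one entry per value of z.
ratios = [pk / qk for pk, qk in zip(joint, posterior)]

# The lower bound (3) evaluated at q = posterior.
bound = sum(qk * math.log(pk / qk) for qk, pk in zip(posterior, joint))

print("ratios:", ratios)
print("bound:", bound, "log p(x):", math.log(p_x))
```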
Now substitute $q(z)=p(z|x,\theta^t)$, the posterior under the current parameters $\theta^t$, into (3). The bound holds for every $\theta$ and is tight at $\theta=\theta^t$:
$$\log p(x|\theta)\ge\int p(z|x,\theta^t)\log\frac{p(z,x|\theta)}{p(z|x,\theta^t)}\,dz$$
The next parameters $\theta^{t+1}$ are chosen to maximize this bound, which guarantees $\log p(x|\theta^{t+1})\ge\log p(x|\theta^t)$:
$$\theta^{t+1}=\arg\max_{\theta}\int p(z|x,\theta^t)\log\frac{p(z,x|\theta)}{p(z|x,\theta^t)}\,dz$$
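As a concrete illustration of alternating between the posterior under $\theta^t$ and the maximization over $\theta$, here is a minimal sketch for a two-component 1-D Gaussian mixture. Everything specific in it is an assumption made for the example and does not come from the derivation: the unit variances, the synthetic data, the seed, the starting values, and the function names.

```python
import math
import random


def normal_pdf(x, mu):
    """Density of N(mu, 1) at x (unit variance is an assumption of this sketch)."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)


def log_likelihood(xs, w, mu0, mu1):
    """sum_i log p(x_i | theta), with theta = (w, mu0, mu1) and w = p(z = 1)."""
    return sum(math.log((1 - w) * normal_pdf(x, mu0) + w * normal_pdf(x, mu1))
               for x in xs)


def em_step(xs, w, mu0, mu1):
    # E-step: r[i] = p(z_i = 1 | x_i, theta^t), i.e. q(z) set to the posterior.
    r = []
    for x in xs:
        a = (1 - w) * normal_pdf(x, mu0)
        b = w * normal_pdf(x, mu1)
        r.append(b / (a + b))
    # M-step: closed-form maximizer of the bound for this model.
    n1 = sum(r)
    n0 = len(xs) - n1
    new_w = n1 / len(xs)
    new_mu0 = sum((1 - ri) * x for ri, x in zip(r, xs)) / n0
    new_mu1 = sum(ri * x for ri, x in zip(r, xs)) / n1
    return new_w, new_mu0, new_mu1


random.seed(0)
# Synthetic data: 150 points around -2 and 50 points around 3.
xs = ([random.gauss(-2.0, 1.0) for _ in range(150)]
      + [random.gauss(3.0, 1.0) for _ in range(50)])

theta = (0.5, -1.0, 1.0)  # arbitrary starting point (w, mu0, mu1)
history = [log_likelihood(xs, *theta)]
for _ in range(30):
    theta = em_step(xs, *theta)
    history.append(log_likelihood(xs, *theta))

print("final theta:", theta)
print("first and last log-likelihood:", history[0], history[-1])
```

The list `history` records the log-likelihood after each iteration; by the argument above it should never decrease, up to floating-point rounding.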
Method 2: start from log p(x) = log p(x,z) - log p(z|x)
By the product rule,
$$p(x)=\frac{p(x,z)}{p(z|x)} \quad (1)$$
$$\log p(x)=\log p(x,z)-\log p(z|x) \quad (2)$$
Subtracting $\log q(z)$ from both terms on the right leaves the difference unchanged:
$$\log p(x)=\log\frac{p(x,z)}{q(z)}-\log\frac{p(z|x)}{q(z)} \quad (3)$$
Take the expectation of both sides under $q(z)$, that is, multiply by $q(z)$ and integrate over $z$.
$$\text{LHS}=\int q(z)\log p(x)\,dz \quad (4)$$
Since $\log p(x)$ does not depend on $z$ and $\int q(z)\,dz=1$,
$$\text{LHS}=\log p(x) \quad (5)$$
$$\text{RHS}=\int q(z)\left(\log\frac{p(x,z)}{q(z)}-\log\frac{p(z|x)}{q(z)}\right)dz \quad (6)$$
The right side splits into two terms:
$$\int q(z)\log\frac{p(x,z)}{q(z)}\,dz-\int q(z)\log\frac{p(z|x)}{q(z)}\,dz \quad (7)$$
The second term, together with its minus sign, is the KL divergence between $q(z)$ and the posterior, which is never negative:
$$-\int q(z)\log\frac{p(z|x)}{q(z)}\,dz=\mathrm{KL}\big(q(z)\,\|\,p(z|x)\big)\ge 0 \quad (8)$$
The first term is the evidence lower bound (ELBO):
$$\mathrm{ELBO}=\int q(z)\log\frac{p(x,z)}{q(z)}\,dz \quad (9)$$
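The decomposition of $\log p(x)$ into the ELBO plus a KL divergence can be verified numerically for a discrete latent variable. The sketch below uses made-up joint values and an arbitrary q; the function names `elbo` and `kl` are hypothetical.

```python
import math

# Made-up values of p(x, z=k) for z in {0, 1, 2} at one fixed x.
joint = [0.10, 0.25, 0.05]
p_x = sum(joint)
posterior = [pk / p_x for pk in joint]  # p(z | x)


def elbo(q):
    """sum_z q(z) * log(p(x, z) / q(z)), the first term."""
    return sum(qk * math.log(pk / qk) for qk, pk in zip(q, joint) if qk > 0)


def kl(q, p):
    """KL(q || p) = sum_z q(z) * log(q(z) / p(z)), the second term with its sign."""
    return sum(qk * math.log(qk / pk) for qk, pk in zip(q, p) if qk > 0)


q = [0.7, 0.2, 0.1]  # an arbitrary distribution over z
gap = math.log(p_x) - (elbo(q) + kl(q, posterior))

print("ELBO:", elbo(q))
print("KL:", kl(q, posterior))
print("log p(x) - (ELBO + KL):", gap)
```

The printed gap should be zero up to rounding, while the KL term alone is strictly positive because this q differs from the posterior.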
Because $\log p(x)$ does not depend on $q$, the ELBO is largest, and equal to $\log p(x)$, exactly when the KL divergence is 0. That requires the ratio $\frac{p(z|x)}{q(z)}$ to be constant in $z$, and since both are distributions over $z$ the constant is 1. Therefore
$$q(z)=p(z|x)$$
Substituting $q(z)=p(z|x)$ into the first term gives
$$\int p(z|x)\log\frac{p(x,z)}{p(z|x)}\,dz$$
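In EM every distribution is conditioned on $\theta$, and $q(z)$ is held fixed at the posterior $p(z|x,\theta^t)$ under the current parameters. When maximizing over $\theta$, the term $\log p(z|x,\theta^t)$ does not depend on $\theta$ and can be dropped:
$$\theta^{t+1}=\arg\max_{\theta}\int p(z|x,\theta^t)\log p(x,z|\theta)\,dz \quad (10)$$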
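The claim that the dropped term does not change the maximizer can be illustrated with a grid search. The sketch below assumes a hypothetical two-coin model that is not part of the derivation: each trial picks coin 0 (known bias 0.5) or coin 1 (unknown bias theta) with probability 0.5 and tosses it 10 times. The head counts and the current value 0.6 are made-up numbers.

```python
import math

N_TOSSES = 10
heads = [5, 9, 8, 4, 7]  # made-up head counts, one per trial


def log_joint(x, z, theta):
    """log p(x, z | theta) with p(z) = 0.5; coin 0 has bias 0.5, coin 1 has bias theta."""
    bias = theta if z == 1 else 0.5
    log_binom = (math.log(math.comb(N_TOSSES, x))
                 + x * math.log(bias) + (N_TOSSES - x) * math.log(1 - bias))
    return math.log(0.5) + log_binom


def responsibilities(theta_t):
    """r[i] = p(z_i = 1 | x_i, theta_t), the fixed posterior weights."""
    r = []
    for x in heads:
        a = math.exp(log_joint(x, 0, theta_t))
        b = math.exp(log_joint(x, 1, theta_t))
        r.append(b / (a + b))
    return r


def q_function(theta, r):
    """sum_i sum_z p(z | x_i, theta_t) * log p(x_i, z | theta), the objective in (10)."""
    return sum((1 - ri) * log_joint(x, 0, theta) + ri * log_joint(x, 1, theta)
               for x, ri in zip(heads, r))


r = responsibilities(0.6)
# Entropy of the fixed posterior: the part of the ELBO that is constant in theta.
entropy = -sum(ri * math.log(ri) + (1 - ri) * math.log(1 - ri) for ri in r)

grid = [k / 1000 for k in range(1, 1000)]
best_q = max(grid, key=lambda t: q_function(t, r))
best_elbo = max(grid, key=lambda t: q_function(t, r) + entropy)
# Closed-form maximizer for this particular model: weighted fraction of heads.
closed_form = sum(ri * x for ri, x in zip(r, heads)) / (N_TOSSES * sum(r))

print("argmax of (10):", best_q)
print("argmax of full ELBO:", best_elbo)
print("closed form:", closed_form)
```

Both grid searches pick the same theta, and it agrees with the closed-form value to within the grid spacing.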