1. Background of the EM Problem
Among the random variables, the observed variables are $X=(x_1,x_2,\dots,x_n)^T$ and the latent (hidden) variables are $Z=(z_1,z_2,\dots,z_n)^T$.
If the distribution of $X$ has model parameters $\Theta=(\theta_1,\theta_2,\dots,\theta_k)$, then the probability of generating the observations $X$ under the model with parameters $\Theta$, for $m$ independent samples $x^{(i)}$, is

$$P(X|\Theta)=\prod_{i=1}^m P(x^{(i)}|\Theta)$$
The log-likelihood function is then

$$\begin{aligned} LL(\Theta)&=\sum_{i=1}^m \log P(x^{(i)}|\Theta)\\ &=\sum_{i=1}^m \log\sum_{z^{(i)}}P(x^{(i)},z^{(i)}|\Theta) \end{aligned}$$
The goal is to find the value of $\Theta$ that makes the log-likelihood as large as possible.
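To make the log-likelihood concrete, here is a minimal sketch for one assumed model: a two-component 1-D Gaussian mixture with unit variance, where the latent variable is the component label. The model choice, the data, and the names `gaussian_pdf` and `log_likelihood` are illustrative assumptions, not part of the derivation above.

```python
import math

def gaussian_pdf(x, mu, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def log_likelihood(xs, weights, mus):
    """LL(Theta) = sum_i log sum_z P(x_i, z | Theta) for the assumed mixture."""
    total = 0.0
    for x in xs:
        # Inner sum: marginalize the latent component label z.
        marginal = sum(w * gaussian_pdf(x, mu) for w, mu in zip(weights, mus))
        total += math.log(marginal)
    return total

xs = [-2.1, -1.9, 0.1, 1.8, 2.2]  # made-up observations
ll_near = log_likelihood(xs, [0.5, 0.5], [-2.0, 2.0])    # means close to the data
ll_far = log_likelihood(xs, [0.5, 0.5], [-10.0, 10.0])   # means far from the data
print(ll_near, ll_far)
```

Parameters that place the component means near the data give a larger log-likelihood, which is exactly the quantity EM tries to increase.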
2. Jensen's Inequality
If $f(x)$ is a convex function, for example $f(x)=x^2$, then $E[f(x)]\geq f(E[x])$, with equality when $x$ is a constant. For a concave function such as $f(x)=\log x$, the inequality is reversed: $E[f(x)]\leq f(E[x])$.
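The inequality can be checked numerically on a small discrete distribution. The values and probabilities below are arbitrary, and `expectation` is a hypothetical helper written for this sketch.

```python
import math

def expectation(f, values, probs):
    """E[f(x)] for a discrete random variable x."""
    return sum(p * f(v) for v, p in zip(values, probs))

values = [1.0, 2.0, 3.0, 6.0]
probs = [0.1, 0.2, 0.3, 0.4]
mean = expectation(lambda v: v, values, probs)

# Convex f(x) = x^2: E[f(x)] - f(E[x]) should be non-negative.
convex_gap = expectation(lambda v: v * v, values, probs) - mean ** 2
# Concave f(x) = log x: f(E[x]) - E[f(x)] should be non-negative.
concave_gap = math.log(mean) - expectation(math.log, values, probs)

# When x is constant, both gaps vanish.
const_values, const_probs = [2.0, 2.0], [0.5, 0.5]
const_mean = expectation(lambda v: v, const_values, const_probs)
const_gap = expectation(lambda v: v * v, const_values, const_probs) - const_mean ** 2

print(convex_gap, concave_gap, const_gap)
```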
3. Derivation of the EM Algorithm
step1 Initialization:
Initialize $\Theta=(\theta_1,\theta_2,\dots,\theta_k)$.
step2 E-step:
Multiply and divide each term inside the sum of the log-likelihood above by a probability distribution $Q_i(z^{(i)})$ satisfying $\sum_{z^{(i)}}Q_i(z^{(i)})=1$, so that the inner sum becomes an expectation:

$$\begin{aligned} LL(\Theta)&=\sum_{i=1}^m \log P(x^{(i)}|\Theta)\\ &=\sum_{i=1}^m \log\sum_{z^{(i)}}P(x^{(i)},z^{(i)}|\Theta)\\ &=\sum_{i=1}^m \log\sum_{z^{(i)}}\left[Q_i(z^{(i)})\frac{P(x^{(i)},z^{(i)}|\Theta)}{Q_i(z^{(i)})}\right]\\ &\geq\sum_{i=1}^m\sum_{z^{(i)}}\left[Q_i(z^{(i)})\log\frac{P(x^{(i)},z^{(i)}|\Theta)}{Q_i(z^{(i)})}\right] \end{aligned}$$

In the third line, the sum over $z^{(i)}$ is the expectation of $\frac{P(x^{(i)},z^{(i)}|\Theta)}{Q_i(z^{(i)})}$ under $Q_i$. The last step follows from Jensen's inequality: $f(x)=\log x$ is concave, so $E[f(x)]\leq f(E[x])$. Call the final expression $g(\Theta)$.
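The bound $g(\Theta)\leq LL(\Theta)$ holds for any valid choice of $Q_i$, which a quick numerical check can illustrate. This sketch assumes a two-component unit-variance Gaussian mixture as the model; the helper names (`joint`, `lower_bound`) and the data are made up for illustration.

```python
import math

def gaussian_pdf(x, mu, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def joint(x, weights, mus):
    """P(x, z | Theta) for each component label z of the assumed mixture."""
    return [w * gaussian_pdf(x, mu) for w, mu in zip(weights, mus)]

def log_likelihood(xs, weights, mus):
    return sum(math.log(sum(joint(x, weights, mus))) for x in xs)

def lower_bound(xs, qs, weights, mus):
    """g(Theta) = sum_i sum_z Q_i(z) * log(P(x_i, z | Theta) / Q_i(z))."""
    total = 0.0
    for x, q in zip(xs, qs):
        for p_xz, q_z in zip(joint(x, weights, mus), q):
            if q_z > 0:  # terms with Q_i(z) = 0 contribute nothing
                total += q_z * math.log(p_xz / q_z)
    return total

xs = [-2.1, -1.9, 0.1, 1.8, 2.2]
weights, mus = [0.5, 0.5], [-2.0, 2.0]
ll = log_likelihood(xs, weights, mus)
g_uniform = lower_bound(xs, [[0.5, 0.5] for _ in xs], weights, mus)
g_skewed = lower_bound(xs, [[0.9, 0.1] for _ in xs], weights, mus)
print(ll, g_uniform, g_skewed)
```

Both arbitrary choices of $Q_i$ give values below the log-likelihood; how far below depends on how well $Q_i$ matches the model.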
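Because $g(\Theta)\leq LL(\Theta)$, $g$ is a lower bound on $LL$. At a point where $g(\Theta)$ and $LL(\Theta)$ touch, any increase in $g(\Theta)$ forces $LL(\Theta)$ to increase as well, since otherwise the inequality would be violated.
So we choose $Q_i$ such that $g(\Theta)=LL(\Theta)$ at the current $\Theta$. Maximizing $g$ from that point yields a locally better solution, and iterating this process leads to a local optimum.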
The inequality holds with equality when

$$\frac{P(x^{(i)},z^{(i)}|\Theta)}{Q_i(z^{(i)})}=c$$

where $c$ is a constant that does not depend on $z^{(i)}$.
It follows that

$$\begin{aligned} P(x^{(i)},z^{(i)}|\Theta)&=c\,Q_i(z^{(i)})\\ \sum_{z^{(i)}}P(x^{(i)},z^{(i)}|\Theta)&=\sum_{z^{(i)}}c\,Q_i(z^{(i)})\\ \sum_{z^{(i)}}P(x^{(i)},z^{(i)}|\Theta)&=c \end{aligned}$$

The second line sums both sides over $z^{(i)}$, and the third uses $\sum_{z^{(i)}}Q_i(z^{(i)})=1$.
Therefore

$$\begin{aligned} Q_i(z^{(i)})&=\frac{P(x^{(i)},z^{(i)}|\Theta)}{c}\\ &=\frac{P(x^{(i)},z^{(i)}|\Theta)}{\sum_{z^{(i)}}P(x^{(i)},z^{(i)}|\Theta)}\\ &=P(z^{(i)}|x^{(i)},\Theta) \end{aligned}$$

That is, $Q_i$ is the posterior distribution of the latent variable given the observation and the current parameters.
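A minimal sketch of this E-step, again assuming a two-component unit-variance Gaussian mixture (an illustrative model with hypothetical helper names): normalizing the joint probabilities gives $Q_i$, and with that choice the lower bound equals the log-likelihood up to floating-point error.

```python
import math

def gaussian_pdf(x, mu, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def joint(x, weights, mus):
    """P(x, z | Theta) for each component label z of the assumed mixture."""
    return [w * gaussian_pdf(x, mu) for w, mu in zip(weights, mus)]

def e_step(xs, weights, mus):
    """Q_i(z) = P(x_i, z | Theta) / sum_z P(x_i, z | Theta)."""
    qs = []
    for x in xs:
        p = joint(x, weights, mus)
        norm = sum(p)  # this is the constant c for sample i
        qs.append([p_z / norm for p_z in p])
    return qs

def lower_bound(xs, qs, weights, mus):
    """g(Theta) = sum_i sum_z Q_i(z) * log(P(x_i, z | Theta) / Q_i(z))."""
    return sum(q_z * math.log(p_z / q_z)
               for x, q in zip(xs, qs)
               for p_z, q_z in zip(joint(x, weights, mus), q)
               if q_z > 0)

xs = [-2.1, -1.9, 0.1, 1.8, 2.2]
weights, mus = [0.5, 0.5], [-1.0, 1.0]
qs = e_step(xs, weights, mus)
ll = sum(math.log(sum(joint(x, weights, mus))) for x in xs)
g = lower_bound(xs, qs, weights, mus)
print(ll, g)
```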
Substituting this into $g(\Theta)$ gives

$$g(\Theta)=\sum_{i=1}^m\sum_{z^{(i)}}\left[Q_i(z^{(i)})\log\frac{P(x^{(i)},z^{(i)}|\Theta)}{Q_i(z^{(i)})}\right]$$

which is a function of $\Theta$. Here $Q_i(z^{(i)})$ is computed from $\Theta^{(k)}$ in the $k$-th iteration and then held fixed while we look for a $\Theta^{(k+1)}$ that makes $g(\Theta)$ larger. For that reason, when we later take partial derivatives with respect to $\Theta$, we do not differentiate the $\Theta$ that appears inside $Q_i(z^{(i)})$.
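One way to see why holding $Q_i$ fixed is harmless is to split the logarithm of the ratio, as a short sketch:

$$g(\Theta)=\sum_{i=1}^m\sum_{z^{(i)}}Q_i(z^{(i)})\log P(x^{(i)},z^{(i)}|\Theta)-\sum_{i=1}^m\sum_{z^{(i)}}Q_i(z^{(i)})\log Q_i(z^{(i)})$$

With $Q_i$ fixed, the second sum does not depend on $\Theta$, so maximizing $g(\Theta)$ is the same as maximizing the first sum, the expected complete-data log-likelihood under $Q_i$.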
step3 M-step:
Take the partial derivative of $g(\Theta)$ with respect to each $\theta$ and set it to zero to find the maximum of $g(\Theta)$.
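As a worked illustration under an assumed model (the derivation here does not fix one), suppose $z^{(i)}\in\{1,\dots,K\}$ and $P(x^{(i)},z^{(i)}=j|\Theta)=\pi_j\,\mathcal{N}(x^{(i)}|\mu_j,1)$ with $\sum_j\pi_j=1$, and write $w_{ij}=Q_i(z^{(i)}=j)$ for the fixed E-step weights. Setting the derivative with respect to $\mu_j$ to zero gives

$$\frac{\partial g}{\partial\mu_j}=\sum_{i=1}^m w_{ij}\,(x^{(i)}-\mu_j)=0\quad\Rightarrow\quad\mu_j=\frac{\sum_{i=1}^m w_{ij}\,x^{(i)}}{\sum_{i=1}^m w_{ij}}$$

For $\pi_j$, the constraint $\sum_j\pi_j=1$ is handled with a Lagrange multiplier $\lambda$:

$$\frac{\sum_{i=1}^m w_{ij}}{\pi_j}-\lambda=0,\qquad\lambda=\sum_{j=1}^K\sum_{i=1}^m w_{ij}=m\quad\Rightarrow\quad\pi_j=\frac{1}{m}\sum_{i=1}^m w_{ij}$$

In this assumed model both updates are weighted averages, with the weights supplied by the E-step.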
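step4 Repeat
Repeat step2 and step3 until convergence, that is, until the value of $\Theta$ no longer changes (in practice, until the change falls below a small tolerance).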
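The four steps can be put together in a short sketch. It assumes a two-component unit-variance Gaussian mixture, for which maximizing $g(\Theta)$ gives weighted-average updates for the mixing weights and means. The model, the data, the tolerance, and all function names are assumptions made for this example.

```python
import math

def gaussian_pdf(x, mu, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def log_likelihood(xs, weights, mus):
    return sum(math.log(sum(w * gaussian_pdf(x, mu) for w, mu in zip(weights, mus)))
               for x in xs)

def e_step(xs, weights, mus):
    """step2: Q_i(z) = P(x_i, z | Theta) / sum_z P(x_i, z | Theta)."""
    qs = []
    for x in xs:
        p = [w * gaussian_pdf(x, mu) for w, mu in zip(weights, mus)]
        norm = sum(p)
        qs.append([p_z / norm for p_z in p])
    return qs

def m_step(xs, qs):
    """step3: closed-form maximizer of g(Theta) for the assumed mixture."""
    k = len(qs[0])
    counts = [sum(q[j] for q in qs) for j in range(k)]
    weights = [c / len(xs) for c in counts]
    mus = [sum(q[j] * x for q, x in zip(qs, xs)) / counts[j] for j in range(k)]
    return weights, mus

def run_em(xs, weights, mus, tol=1e-8, max_iter=200):
    history = [log_likelihood(xs, weights, mus)]
    for _ in range(max_iter):
        qs = e_step(xs, weights, mus)
        new_weights, new_mus = m_step(xs, qs)
        change = max(abs(a - b)
                     for a, b in zip(weights + mus, new_weights + new_mus))
        weights, mus = new_weights, new_mus
        history.append(log_likelihood(xs, weights, mus))
        if change < tol:  # step4: Theta has effectively stopped changing
            break
    return weights, mus, history

xs = [-2.3, -1.9, -2.1, -1.7, 1.8, 2.2, 2.0, 2.4, 1.6]  # made-up data
weights, mus, history = run_em(xs, [0.5, 0.5], [-1.0, 1.0])  # step1: initial Theta
print(weights, mus, history[-1])
```

The recorded log-likelihood values should never decrease from one iteration to the next, which matches the lower-bound argument in the E-step.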