The EM (Expectation-Maximization) Algorithm
1. The problem to solve
In some settings, the observed data come together with latent data that we never observe. Both the latent data and the model parameters are then unknown, so we cannot obtain the parameters of the model distribution by directly maximizing the log-likelihood. What can we do? This is where the EM algorithm comes in.
Algorithm outline
Suppose the unobserved latent data is $z$ and the model parameter is $\theta$.
The observed input data are $x = (x^{(1)}, x^{(2)}, \dots, x^{(m)})$, the joint distribution is $p(x, z \mid \theta)$, and the conditional distribution is $p(z \mid x, \theta)$.
- Randomly initialize the model parameter $\theta$ to an initial value $\theta_0$.
- For $j$ from $1$ to $J$, run the following EM iteration:
a. E-step:
$$Q_i(z^{(i)}) = P(z^{(i)} \mid x^{(i)}, \theta_j) \tag{1}$$
((1) needs proof: why does this equality hold?)
$$L(\theta, \theta_j) = \sum_{i=1}^{m} \sum_{z^{(i)}} Q_i(z^{(i)}) \log P(x^{(i)}, z^{(i)} \mid \theta) \tag{2}$$
((2) needs proof: why does this equality hold?)
b. M-step: maximize $L(\theta, \theta_j)$ to obtain $\theta_{j+1}$:
$$\theta_{j+1} = \arg\max_{\theta} L(\theta, \theta_j)$$
c. If $\theta_{j+1}$ has converged, stop; otherwise return to step a and iterate again.
((3) needs proof: why does the iteration converge?)

Mathematical proof
The proof proceeds in three steps: first we show why (2) is the maximum-likelihood objective, then we show why equality (1) holds, and finally we show convergence.
For the $m$ observed samples $x = (x^{(1)}, x^{(2)}, \dots, x^{(m)})$ and the $m$ unobserved latent variables $z = (z^{(1)}, z^{(2)}, \dots, z^{(m)})$, maximizing the log-likelihood of the model distribution means:
$$\theta = \arg\max_{\theta} \sum_{i=1}^{m} \log P(x^{(i)} \mid \theta) = \arg\max_{\theta} \sum_{i=1}^{m} \log \sum_{z^{(i)}} P(x^{(i)}, z^{(i)} \mid \theta)$$
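To make the marginalization over $z^{(i)}$ concrete, here is a minimal numeric sketch of one term of this sum, using a hypothetical two-coin setup (the same kind of model as the code at the end of this post): the latent $z^{(i)}$ names which coin produced a sequence of flips, and the uniform 0.5 prior over the coins and the parameter values are assumptions for illustration.

```python
import math
from scipy.stats import binom

# Hypothetical two-coin example: the latent z picks coin A or B,
# each with an assumed prior probability of 0.5.
theta_A, theta_B = 0.8, 0.2      # current parameter guesses (illustrative)
num_heads, num_flips = 7, 10     # one observed sequence x^(i)

# Joint P(x, z | theta) for each value of the latent coin label z
joint_A = 0.5 * binom.pmf(num_heads, num_flips, theta_A)
joint_B = 0.5 * binom.pmf(num_heads, num_flips, theta_B)

# log P(x | theta) = log sum_z P(x, z | theta): the latent label is summed out
log_lik = math.log(joint_A + joint_B)
```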
Next we bound this expression using Jensen's inequality: since $\log$ is concave,
$$\log \sum_j \lambda_j y_j \ge \sum_j \lambda_j \log y_j, \qquad \lambda_j \ge 0, \ \sum_j \lambda_j = 1$$
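As a quick sanity check, the inequality can be verified numerically (the weights and values below are arbitrarily chosen, not from the text):

```python
import math

# lambda_j are nonnegative weights summing to 1; y_j > 0
lam = [0.2, 0.5, 0.3]
y = [1.0, 4.0, 9.0]

lhs = math.log(sum(l * v for l, v in zip(lam, y)))   # log sum_j lambda_j y_j
rhs = sum(l * math.log(v) for l, v in zip(lam, y))   # sum_j lambda_j log y_j
assert lhs >= rhs  # Jensen's inequality for the concave log
```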
Applying this to the log-likelihood above gives:
$$\sum_{i=1}^{m} \log \sum_{z^{(i)}} P(x^{(i)}, z^{(i)} \mid \theta) = \sum_{i=1}^{m} \log \sum_{z^{(i)}} Q_i(z^{(i)}) \frac{P(x^{(i)}, z^{(i)} \mid \theta)}{Q_i(z^{(i)})} \ge \sum_{i=1}^{m} \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{P(x^{(i)}, z^{(i)} \mid \theta)}{Q_i(z^{(i)})}$$
Since $\sum_{z} Q_i(z^{(i)}) = 1$, equality holds when $\frac{P(x^{(i)}, z^{(i)} \mid \theta)}{Q_i(z^{(i)})} = c$ for some constant $c$.
From these two equalities we obtain:
$$Q_i(z^{(i)}) = \frac{P(x^{(i)}, z^{(i)} \mid \theta)}{c} = \frac{P(x^{(i)}, z^{(i)} \mid \theta)}{c \cdot \sum_{z} Q_i(z^{(i)})} = \frac{P(x^{(i)}, z^{(i)} \mid \theta)}{\sum_{z} P(x^{(i)}, z^{(i)} \mid \theta)} = \frac{P(x^{(i)}, z^{(i)} \mid \theta)}{P(x^{(i)} \mid \theta)} = P(z^{(i)} \mid x^{(i)}, \theta)$$
This proves (1).
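Equality (1) can also be checked numerically: in a two-coin setting like the code at the end of this post, normalizing the joint probabilities over the latent coin label yields exactly the posterior weights. The 0.5 prior and the parameter values below are assumptions for illustration.

```python
from scipy.stats import binom

theta_A, theta_B = 0.6, 0.4    # hypothetical current estimates theta_j
num_heads, num_flips = 8, 10   # one observed sequence x^(i)

# Joint P(x^(i), z^(i) | theta_j), with an assumed uniform prior over the coin
joint = {"A": 0.5 * binom.pmf(num_heads, num_flips, theta_A),
         "B": 0.5 * binom.pmf(num_heads, num_flips, theta_B)}

# Q_i(z) = joint / sum_z joint -- exactly the posterior P(z | x, theta_j)
total = sum(joint.values())
Q = {z: p / total for z, p in joint.items()}
```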
So when equality holds, $\sum_{i=1}^{m} \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{P(x^{(i)}, z^{(i)} \mid \theta)}{Q_i(z^{(i)})}$ is a tight lower bound of the log-likelihood, and maximizing this lower bound helps maximize $\sum_{i=1}^{m} \log \sum_{z^{(i)}} P(x^{(i)}, z^{(i)} \mid \theta)$. Therefore:
$$\arg\max_{\theta} \sum_{i=1}^{m} \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{P(x^{(i)}, z^{(i)} \mid \theta)}{Q_i(z^{(i)})} = \arg\max_{\theta} \sum_{i=1}^{m} \sum_{z^{(i)}} Q_i(z^{(i)}) \left( \log P(x^{(i)}, z^{(i)} \mid \theta) - \log Q_i(z^{(i)}) \right) = \arg\max_{\theta} \sum_{i=1}^{m} \sum_{z^{(i)}} Q_i(z^{(i)}) \log P(x^{(i)}, z^{(i)} \mid \theta)$$
(the last step drops $\log Q_i(z^{(i)})$ because it does not depend on $\theta$).
This proves (2).
For (3), convergence of the EM algorithm, we need to show that the log-likelihood keeps increasing over the iterations, i.e. it suffices to prove $\sum_{i=1}^{m} \log P(x^{(i)} \mid \theta_{j+1}) \ge \sum_{i=1}^{m} \log P(x^{(i)} \mid \theta_j)$. We omit the proof here; for details see the post "EM算法总结".
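Although the proof is omitted, the monotonicity claim in (3) can at least be checked numerically. The sketch below reuses the two-coin model from the code at the end of this post (with an assumed uniform prior over the latent coin; `log_likelihood` and `em_step` are helper names introduced here) and asserts that each EM step does not decrease the incomplete-data log-likelihood:

```python
import numpy as np
from scipy.stats import binom

def log_likelihood(theta, observations):
    """Incomplete-data log-likelihood with an assumed uniform prior over coins."""
    ll = 0.0
    for obs in observations:
        h, n = sum(obs), len(obs)
        ll += np.log(0.5 * binom.pmf(h, n, theta[0]) +
                     0.5 * binom.pmf(h, n, theta[1]))
    return ll

def em_step(theta, observations):
    """One E-step + M-step of the two-coin EM update."""
    counts = np.zeros((2, 2))  # rows: coin A/B, cols: expected heads/tails
    for obs in observations:
        h, n = sum(obs), len(obs)
        pa = binom.pmf(h, n, theta[0])
        pb = binom.pmf(h, n, theta[1])
        wa, wb = pa / (pa + pb), pb / (pa + pb)  # posterior weights Q_i(z)
        counts[0] += wa * np.array([h, n - h])
        counts[1] += wb * np.array([h, n - h])
    # M-step: new theta is each coin's expected fraction of heads
    return (counts[0, 0] / counts[0].sum(), counts[1, 0] / counts[1].sum())

observations = [[1, 0, 0, 0, 1, 1, 0, 1, 0, 1],
                [1, 1, 1, 1, 0, 1, 1, 1, 0, 1]]
theta = (0.8, 0.2)
prev = log_likelihood(theta, observations)
for _ in range(5):
    theta = em_step(theta, observations)
    cur = log_likelihood(theta, observations)
    assert cur >= prev - 1e-12  # the log-likelihood never decreases
    prev = cur
```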
This completes the EM algorithm and its mathematical justification.
Code
# -*- coding: utf-8 -*-
import numpy as np
from scipy.stats import binom


def em_single(priors, observations):
    """Run one EM iteration for the two-coin problem and return updated thetas."""
    # Expected head/tail counts attributed to each coin (E-step accumulators)
    counts = {"A": {"H": 0, "T": 0}, "B": {"H": 0, "T": 0}}
    theta_A = priors[0]
    theta_B = priors[1]
    # E-step: weight each observed sequence by the posterior of each coin
    for observation in observations:
        len_observation = len(observation)
        num_head = sum(observation)
        num_tail = len_observation - num_head
        # Likelihood of this sequence under each coin's bias
        contribution_A = binom.pmf(num_head, len_observation, theta_A)
        contribution_B = binom.pmf(num_head, len_observation, theta_B)
        # Posterior weights Q_i(z) = P(z | x, theta_j), cf. equation (1)
        weight_A = contribution_A / (contribution_A + contribution_B)
        weight_B = contribution_B / (contribution_A + contribution_B)
        counts["A"]["H"] += weight_A * num_head
        counts["A"]["T"] += weight_A * num_tail
        counts["B"]["H"] += weight_B * num_head
        counts["B"]["T"] += weight_B * num_tail
    # M-step: the new theta is each coin's expected fraction of heads
    new_theta_A = counts["A"]["H"] / (counts["A"]["H"] + counts["A"]["T"])
    new_theta_B = counts["B"]["H"] / (counts["B"]["H"] + counts["B"]["T"])
    return (new_theta_A, new_theta_B)


def em(observations, prior, tol=1e-6, iterations=10000):
    """Iterate EM until theta_A changes by less than tol or iterations run out."""
    iteration = 0
    while iteration < iterations:
        new_prior = em_single(prior, observations)
        delta_change = np.abs(prior[0] - new_prior[0])
        if delta_change < tol:
            break
        prior = new_prior
        iteration += 1
    return [new_prior, iteration]


def main(argv=None):
    # Each row is one sequence of 10 coin flips (1 = heads, 0 = tails)
    observations = np.array([[1, 0, 0, 0, 1, 1, 0, 1, 0, 1],
                             [1, 1, 1, 1, 0, 1, 1, 1, 0, 1],
                             [1, 0, 1, 1, 1, 1, 1, 0, 1, 1],
                             [1, 0, 1, 0, 0, 0, 1, 1, 0, 0],
                             [0, 1, 1, 1, 0, 1, 1, 1, 0, 1]])
    print(em(observations, [0.8, 0.2]))


if __name__ == "__main__":
    main()