0. Preface
The EM (Expectation-Maximization) algorithm, proposed by Dempster et al. in 1977, is an iterative algorithm. Each iteration consists of two steps: the E-step, which computes an expectation, and the M-step, which maximizes it.
The EM algorithm applies when the model contains hidden or latent variables.
1. The EM Algorithm
Input: observed data Y, latent data Z, joint distribution $P(Y,Z|\theta)$, conditional distribution $P(Z|Y,\theta)$;
Output: model parameters $\theta$.
(1) Choose an initial value $\theta^{(0)}$ for the parameters and start iterating;
(2) E-step: compute the Q function
$$Q(\theta, \theta^{(i)}) = E_Z\left[\log P(Y,Z|\theta) \mid Y, \theta^{(i)}\right] = \sum_{Z} P(Z|Y,\theta^{(i)}) \log P(Y,Z|\theta)$$
where $\theta^{(i)}$ is the estimate of $\theta$ at the $i$-th iteration, and $P(Z|Y,\theta^{(i)})$ is the conditional distribution of the latent variable Z given the observed data Y and the current parameter estimate $\theta^{(i)}$. (The main goal of this step is to obtain $P(Z|Y,\theta^{(i)})$ under the current parameters; the Q function can then be written down.)
(3) M-step: find the $\theta$ that maximizes $Q(\theta, \theta^{(i)})$, which becomes the parameter estimate for the $(i+1)$-th iteration:
$$\theta^{(i+1)} = \arg\max_{\theta} Q(\theta, \theta^{(i)})$$
(4) Repeat steps (2) and (3) until convergence.
2. A Classic Example of EM: the Three-Coin Model
Suppose there are three coins A, B, and C, whose probabilities of landing heads are $\pi$, $p$, and $q$ respectively. Run the following experiment: first toss coin A; if A lands heads, choose coin B, otherwise choose coin C; then toss the chosen coin, recording 1 for heads and 0 for tails. Repeat the experiment independently n times. With n = 10, the observed results are:
$$1, 1, 0, 1, 0, 0, 1, 0, 1, 1$$
Solution: let y be the observed variable (the recorded result) and z the unobserved result of tossing coin A; $\theta = \{\pi, p, q\}$ are the model parameters. The three-coin model then gives the probability of observing result y under parameters $\theta$:
$$P(y|\theta) = \sum_{z} P(y,z|\theta) = \sum_{z} P(z|\theta)\, P(y|z,\theta) = \pi\, p^{y}(1-p)^{1-y} + (1-\pi)\, q^{y}(1-q)^{1-y}$$
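As a quick sanity check, the single-observation likelihood above can be evaluated directly; for any valid $\pi, p, q$, the probabilities of $y=0$ and $y=1$ must sum to 1. A minimal sketch in Python (the function name `likelihood` and the parameter values are illustrative choices, not fitted values):

```python
def likelihood(y, pi, p, q):
    """P(y | theta) for the three-coin model: marginalize out the coin-A toss z."""
    return pi * p**y * (1 - p)**(1 - y) + (1 - pi) * q**y * (1 - q)**(1 - y)

# Arbitrary example parameters, chosen only for illustration.
pi, p, q = 0.4, 0.6, 0.7

print(likelihood(1, pi, p, q))                           # P(heads) = 0.4*0.6 + 0.6*0.7 = 0.66
print(likelihood(0, pi, p, q) + likelihood(1, pi, p, q)) # probabilities sum to 1.0
```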
E-step: compute the conditional distribution $P(Z|Y,\theta^{(i)})$ of the latent variable Z given the observed data Y and the current parameters, then write out the Q function. Suppose the parameters at the i-th iteration are $\theta^{(i)} = \{\pi^{(i)}, p^{(i)}, q^{(i)}\}$. Under the current parameters, the probability that observation $y_j$ came from coin B (i.e. that A landed heads) is:
$$\mu_j^{(i+1)} = P(z=B \mid y_j, \theta^{(i)}) = \frac{\pi^{(i)} \left(p^{(i)}\right)^{y_j} \left(1-p^{(i)}\right)^{1-y_j}}{\pi^{(i)} \left(p^{(i)}\right)^{y_j} \left(1-p^{(i)}\right)^{1-y_j} + \left(1-\pi^{(i)}\right) \left(q^{(i)}\right)^{y_j} \left(1-q^{(i)}\right)^{1-y_j}}$$
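The E-step formula above can be computed directly for the observed data; a minimal sketch in Python (the function name `e_step` is my own). With the symmetric start $\pi^{(0)} = p^{(0)} = q^{(0)} = 0.5$, coins B and C are indistinguishable, so every responsibility $\mu_j$ comes out to 0.5:

```python
def e_step(y, pi, p, q):
    """mu_j = P(z = B | y_j, theta): responsibility of coin B for observation y_j."""
    num = pi * p**y * (1 - p)**(1 - y)
    den = num + (1 - pi) * q**y * (1 - q)**(1 - y)
    return num / den

data = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]  # observed results from the text
mus = [e_step(y, 0.5, 0.5, 0.5) for y in data]
print(mus)  # every mu_j equals 0.5 under the symmetric start
```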
The Q function can then be written as:
$$\begin{aligned}
Q(\theta, \theta^{(i)}) &= E_Z\left[\log P(Y,Z|\theta) \mid Y, \theta^{(i)}\right] \\
&= \sum_{Z} P(Z|Y,\theta^{(i)}) \log P(Y,Z|\theta) \\
&= \sum_{j=1}^{n} \sum_{z} P(z|y_j,\theta^{(i)}) \log P(y_j,z|\theta) \\
&= \sum_{j=1}^{n} P(z=B|y_j,\theta^{(i)}) \log P(y_j,z=B|\theta) + P(z=C|y_j,\theta^{(i)}) \log P(y_j,z=C|\theta) \\
&= \sum_{j=1}^{n} \mu_j^{(i+1)} \log\left[\pi\, p^{y_j}(1-p)^{1-y_j}\right] + \left(1-\mu_j^{(i+1)}\right) \log\left[(1-\pi)\, q^{y_j}(1-q)^{1-y_j}\right]
\end{aligned}$$
Note that inside the logarithms the parameters $\pi, p, q$ are the free variables to be maximized over, while the fixed current estimates $\theta^{(i)}$ enter only through the responsibilities $\mu_j^{(i+1)}$.
M-step: take the partial derivatives of $Q(\theta, \theta^{(i)})$ with respect to $\pi$, $p$, and $q$ and set them to zero. For $\pi$:
$$\frac{\partial Q}{\partial \pi} = \sum_{j=1}^{n} \frac{\mu_j^{(i+1)}}{\pi} - \frac{1-\mu_j^{(i+1)}}{1-\pi} = 0$$
which gives:
$$\pi^{(i+1)} = \frac{1}{n} \sum_{j=1}^{n} \mu_j^{(i+1)}$$
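The closed-form update for $\pi$ can also be checked numerically: for fixed responsibilities $\mu_j$, the part of Q that depends on $\pi$ is $\sum_j \mu_j \log\pi + (1-\mu_j)\log(1-\pi)$, and a grid search confirms its maximizer is the mean of the $\mu_j$. The responsibilities below are made up for illustration:

```python
import math

mus = [0.2, 0.7, 0.5, 0.9, 0.3]  # made-up responsibilities, mean = 0.52

def q_pi(pi, mus):
    """Terms of Q that involve pi, for fixed responsibilities mu_j."""
    return sum(m * math.log(pi) + (1 - m) * math.log(1 - pi) for m in mus)

# Grid search over (0, 1); the maximizer should match mean(mus).
grid = [k / 10000 for k in range(1, 10000)]
best = max(grid, key=lambda pi: q_pi(pi, mus))
print(best, sum(mus) / len(mus))  # both 0.52
```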
Similarly, one can derive:
$$p^{(i+1)} = \frac{\sum_{j=1}^{n} \mu_j^{(i+1)} y_j}{\sum_{j=1}^{n} \mu_j^{(i+1)}}, \qquad q^{(i+1)} = \frac{\sum_{j=1}^{n} \left(1-\mu_j^{(i+1)}\right) y_j}{\sum_{j=1}^{n} \left(1-\mu_j^{(i+1)}\right)}$$
Repeating the E-step and M-step with these update formulas until convergence yields the parameter estimates.
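Putting everything together, the full iteration can be sketched as below (the function name `em_three_coins` is my own). Starting from $\pi^{(0)} = p^{(0)} = q^{(0)} = 0.5$ on the observed data, all responsibilities are 0.5, so $\pi$ stays at 0.5 while $p$ and $q$ both move to the sample mean 0.6, and the algorithm converges after one step:

```python
def em_three_coins(data, pi, p, q, iters=100, tol=1e-10):
    """EM for the three-coin model, following the update formulas in the text."""
    for _ in range(iters):
        # E-step: responsibility of coin B for each observation y_j.
        mus = []
        for y in data:
            num = pi * p**y * (1 - p)**(1 - y)
            den = num + (1 - pi) * q**y * (1 - q)**(1 - y)
            mus.append(num / den)
        # M-step: closed-form parameter updates.
        n = len(data)
        new_pi = sum(mus) / n
        new_p = sum(m * y for m, y in zip(mus, data)) / sum(mus)
        new_q = sum((1 - m) * y for m, y in zip(mus, data)) / sum(1 - m for m in mus)
        if max(abs(new_pi - pi), abs(new_p - p), abs(new_q - q)) < tol:
            return new_pi, new_p, new_q
        pi, p, q = new_pi, new_p, new_q
    return pi, p, q

data = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
print(em_three_coins(data, 0.5, 0.5, 0.5))  # -> (0.5, 0.6, 0.6)
```

Note that EM only guarantees a local maximum of the likelihood: a different initialization of $\pi, p, q$ can converge to different estimates, which is why the initial values matter in this example.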