Solving the Three-Coin Problem with the EM Algorithm
1. Problem Description
Suppose we have three coins, labeled A, B, and C, whose probabilities of landing heads are $\pi$, $p$, and $q$ respectively. Run the following experiment: first toss coin A and use its result to choose between coins B and C — heads selects coin B, tails selects coin C. Then toss the selected coin and record the outcome: heads as 1, tails as 0. Repeat this experiment independently $n$ times.
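To make the setup concrete, here is a minimal Python sketch of the generative process (the function name and the parameter values are illustrative choices, not from the book):

```python
import random

def simulate_three_coins(pi, p, q, n, seed=0):
    """Simulate n rounds: toss coin A (heads prob. pi); heads -> toss
    coin B (heads prob. p), tails -> toss coin C (heads prob. q).
    Only the final outcome y (1 = heads, 0 = tails) is recorded;
    the result of coin A stays hidden."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        z = 1 if rng.random() < pi else 0   # hidden toss of coin A
        head_prob = p if z == 1 else q      # which coin is tossed next
        ys.append(1 if rng.random() < head_prob else 0)
    return ys

observations = simulate_three_coins(pi=0.4, p=0.6, q=0.5, n=10)
```

Only `observations` would be available to a learner; $\pi$, $p$, $q$ and the hidden tosses are exactly what EM must recover.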
2. A Brief Overview of the EM Algorithm
Input: the observed data $Y$, the hidden-variable data $Z$, the joint distribution $P(Y,Z|\theta)$, and the conditional distribution $P(Z|Y,\theta)$. Output: the model parameters $\theta$.
(1) Choose an initial value $\theta^{(0)}$ for the parameters and begin iterating.
(2) E-step: let $\theta^{(i)}$ denote the estimate of $\theta$ from the $i$-th iteration. In the E-step of iteration $i+1$, compute

$$\begin{aligned} Q(\theta,\theta^{(i)}) &= E_Z[\log P(Y,Z|\theta)\mid Y,\theta^{(i)}]\\ &= \sum_Z \log P(Y,Z|\theta)\,P(Z|Y,\theta^{(i)}) \end{aligned}$$
Here, $P(Z|Y,\theta^{(i)})$ is the conditional distribution of the hidden data $Z$ given the observed data $Y$ and the current parameter estimate $\theta^{(i)}$.
(3) M-step: find the $\theta$ that maximizes $Q(\theta,\theta^{(i)})$; it becomes the parameter estimate for iteration $i+1$:

$$\theta^{(i+1)} = \mathop{\arg\max}\limits_{\theta}Q(\theta,\theta^{(i)})$$
(4) Repeat (2) and (3) until convergence.
3. Solving the Three-Coin Problem with EM
(1) First choose initial parameter values, written $\theta^{(0)} = (\pi^{(0)},p^{(0)},q^{(0)})$.
(2) E-step: under the model parameters $\pi^{(i)}, p^{(i)}, q^{(i)}$, compute the probability that observation $y_j$ came from tossing coin B:

$$\mu_j^{(i+1)} = \frac{\pi^{(i)}(p^{(i)})^{y_j}(1-p^{(i)})^{1-y_j}}{\pi^{(i)}(p^{(i)})^{y_j}(1-p^{(i)})^{1-y_j} + (1-\pi^{(i)})(q^{(i)})^{y_j}(1-q^{(i)})^{1-y_j}}$$
(3) M-step: compute the new parameter estimates

$$\pi^{(i+1)} = \frac{1}{n}\sum_{j=1}^{n}\mu_j^{(i+1)}$$

$$p^{(i+1)} = \frac{\sum\limits_{j=1}^n\mu_j^{(i+1)}y_j}{\sum\limits_{j=1}^n\mu_j^{(i+1)}}$$
$$q^{(i+1)} = \frac{\sum\limits_{j=1}^n(1-\mu_j^{(i+1)})y_j}{\sum\limits_{j=1}^n(1-\mu_j^{(i+1)})}$$
(4) Repeat steps (2) and (3) until convergence.
All of the above follows Li Hang's book *Statistical Learning Methods*.
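The iteration in steps (1)–(4) can be sketched directly in Python. This is a minimal implementation of the closed-form updates above; the function and variable names are my own:

```python
def em_three_coins(ys, pi0, p0, q0, max_iter=100, tol=1e-8):
    """EM for the three-coin model.

    ys: observed 0/1 outcomes; (pi0, p0, q0) is the initial guess theta^(0).
    Returns the converged (pi, p, q)."""
    pi, p, q = pi0, p0, q0
    n = len(ys)
    for _ in range(max_iter):
        # E-step: mu_j = P(z_j = 1 | y_j, theta^(i))
        mus = []
        for y in ys:
            b = pi * p**y * (1 - p)**(1 - y)          # coin-B branch
            c = (1 - pi) * q**y * (1 - q)**(1 - y)    # coin-C branch
            mus.append(b / (b + c))
        # M-step: the closed-form maximizers of Q
        s = sum(mus)
        pi_new = s / n
        p_new = sum(m * y for m, y in zip(mus, ys)) / s
        q_new = sum((1 - m) * y for m, y in zip(mus, ys)) / (n - s)
        done = max(abs(pi_new - pi), abs(p_new - p), abs(q_new - q)) < tol
        pi, p, q = pi_new, p_new, q_new
        if done:
            break
    return pi, p, q

# The book's example data with the symmetric start pi = p = q = 0.5
ys = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
print(em_three_coins(ys, 0.5, 0.5, 0.5))  # converges to (0.5, 0.6, 0.6)
```

With this symmetric start every $\mu_j$ equals $0.5$, so the fixed point $(0.5, 0.6, 0.6)$ is reached immediately — the same answer the book reports for this data, and an illustration that EM's result depends on the initial values.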
4. Derivation
*Statistical Learning Methods* omits the derivation for the three-coin problem. Moreover, what its E-step and M-step compute differs somewhat in form from the E-step and M-step of the general EM algorithm, so a few points deserve a clearer explanation.
What is the hidden variable in this problem? I take the hidden variable $Z$ to represent whether coin A lands heads or tails: heads is 1, tails is 0.
To simplify the discussion below, write $\theta = (\pi, p, q)$.
First, the E-step. To understand it, start from the $Q$ function. Note that

$$Q(\theta,\theta^{(i)}) = \sum_Z \log P(Y,Z|\theta)\,P(Z|Y,\theta^{(i)})$$
Looking at the E-step formula of the EM algorithm, we first need the joint distribution $P(Y,Z|\theta)$ of the observed data and the hidden variable given $\theta$. How do we obtain it? From the identity
$$P(Y,Z|\theta) = P(Z|\theta)P(Y|Z,\theta)$$
Assume the $n$ samples are drawn independently. Each toss of coin A is clearly independent, so the $z_j$ are mutually independent, and each observation $y_j$ depends only on $z_j$. Hence

$$P(Z|\theta) = \prod_{j=1}^n P(z_j|\theta) = \prod_{j=1}^n \pi^{z_j}(1-\pi)^{1-z_j}$$
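As a quick sanity check (my own, not from the book), the product form of $P(Z|\theta)$ should sum to 1 over all $2^n$ hidden configurations:

```python
from itertools import product

def prior_z(zs, pi):
    """P(Z | theta) = prod_j pi^{z_j} * (1 - pi)^{1 - z_j}."""
    prob = 1.0
    for z in zs:
        prob *= pi**z * (1 - pi)**(1 - z)
    return prob

pi = 0.3
total = sum(prior_z(zs, pi) for zs in product((0, 1), repeat=4))
assert abs(total - 1.0) < 1e-12   # a valid distribution over Z
```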
where $P(z_j|\theta)=\pi^{z_j}(1-\pi)^{1-z_j}$, because $z_j$ follows a Bernoulli distribution: $P(z_j=0|\theta)=1-\pi$ and $P(z_j=1|\theta)=\pi$.
How do we obtain $P(Y|Z,\theta)$? Once $z_j$ is known, we know which coin is tossed next, so for each value of $z_j$ we can write: $P(y_j=1|z_j=1,\theta)=p$, $P(y_j=1|z_j=0,\theta)=q$, $P(y_j=0|z_j=1,\theta)=1-p$, and $P(y_j=0|z_j=0,\theta)=1-q$. These are the only four cases, and a single formula covers them all:
$$P(y_j|z_j,\theta)=(p^{z_j}q^{1-z_j})^{y_j}\big((1-p)^{z_j}(1-q)^{1-z_j}\big)^{1-y_j}$$
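The single formula can be checked against the four enumerated cases numerically (a small self-check of my own, not from the book):

```python
def cond_prob(y, z, p, q):
    """P(y | z, theta) written as the single closed-form expression."""
    return (p**z * q**(1 - z))**y * ((1 - p)**z * (1 - q)**(1 - z))**(1 - y)

p, q = 0.7, 0.3
assert abs(cond_prob(1, 1, p, q) - p) < 1e-12        # tossed B, heads
assert abs(cond_prob(1, 0, p, q) - q) < 1e-12        # tossed C, heads
assert abs(cond_prob(0, 1, p, q) - (1 - p)) < 1e-12  # tossed B, tails
assert abs(cond_prob(0, 0, p, q) - (1 - q)) < 1e-12  # tossed C, tails
```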
Therefore, by the independence assumption,

$$P(Y|Z,\theta)=\prod_{j=1}^n(p^{z_j}q^{1-z_j})^{y_j}\big((1-p)^{z_j}(1-q)^{1-z_j}\big)^{1-y_j}$$
So, using $P(Y,Z|\theta) = P(Z|\theta)P(Y|Z,\theta)$, multiply $P(Z|\theta)$ by $P(Y|Z,\theta)$ and simplify to obtain
$$P(Y,Z|\theta) = \prod_{j=1}^n\big[\pi^{z_j}p^{z_jy_j}(1-p)^{z_j(1-y_j)}\big]\big[(1-\pi)^{1-z_j}q^{(1-z_j)y_j}(1-q)^{(1-z_j)(1-y_j)}\big]$$
Summing this formula over $Z$, that is, adding up the $z_j=0$ and $z_j=1$ cases within each factor of the product, we get
$$P(Y|\theta) = \sum_{Z}P(Y,Z|\theta) = \prod_{j=1}^n\big[\pi p^{y_j}(1-p)^{1-y_j}+(1-\pi)q^{y_j}(1-q)^{1-y_j}\big]$$
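This marginalization can be verified by brute force for a small $n$: summing the factored joint over all $2^n$ values of $Z$ should reproduce the product formula (again a self-check with helper names of my own choosing):

```python
from itertools import product

def joint(ys, zs, pi, p, q):
    """P(Y, Z | theta) from the factored formula in the text."""
    prob = 1.0
    for y, z in zip(ys, zs):
        prob *= (pi**z * p**(z * y) * (1 - p)**(z * (1 - y)))
        prob *= ((1 - pi)**(1 - z) * q**((1 - z) * y)
                 * (1 - q)**((1 - z) * (1 - y)))
    return prob

def marginal(ys, pi, p, q):
    """P(Y | theta) as the product over j of the two-branch mixture."""
    prob = 1.0
    for y in ys:
        prob *= pi * p**y * (1 - p)**(1 - y) + (1 - pi) * q**y * (1 - q)**(1 - y)
    return prob

ys = [1, 0, 1]
pi, p, q = 0.4, 0.6, 0.5
brute = sum(joint(ys, zs, pi, p, q) for zs in product((0, 1), repeat=len(ys)))
assert abs(brute - marginal(ys, pi, p, q)) < 1e-12
```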
With both $P(Y,Z|\theta)$ and $P(Y|\theta)$ in hand, we can derive
$$P(Z|Y,\theta) = \frac{P(Y,Z|\theta)}{P(Y|\theta)} =\prod_{j=1}^n\frac{\big[\pi^{z_j}p^{z_jy_j}(1-p)^{z_j(1-y_j)}\big]\big[(1-\pi)^{1-z_j}q^{(1-z_j)y_j}(1-q)^{(1-z_j)(1-y_j)}\big]}{\pi p^{y_j}(1-p)^{1-y_j}+(1-\pi)q^{y_j}(1-q)^{1-y_j}}$$
Substituting $\theta^{(i)}$ as the EM algorithm requires:
$$P(Z|Y,\theta^{(i)}) =\prod_{j=1}^n\frac{\big[(\pi^{(i)})^{z_j}(p^{(i)})^{z_jy_j}(1-p^{(i)})^{z_j(1-y_j)}\big]\big[(1-\pi^{(i)})^{1-z_j}(q^{(i)})^{(1-z_j)y_j}(1-q^{(i)})^{(1-z_j)(1-y_j)}\big]}{\pi^{(i)}(p^{(i)})^{y_j}(1-p^{(i)})^{1-y_j}+(1-\pi^{(i)})(q^{(i)})^{y_j}(1-q^{(i)})^{1-y_j}}$$
At this point, every component the $Q$ function needs has been computed, and the E-step is complete.
What is the $\mu_j^{(i+1)}$ in Li Hang's book? Inspecting the formulas above, we find

$$\mu_j^{(i+1)}=P(z_j=1|y_j,\theta^{(i)})$$

$$1-\mu_j^{(i+1)}=P(z_j=0|y_j,\theta^{(i)})$$
Using the symbol $\mu_j^{(i+1)}$ keeps the formulas compact, so I will also use it below.
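In code, the identity above is just Bayes' rule: the E-step ratio equals $P(y_j,z_j=1|\theta^{(i)})/P(y_j|\theta^{(i)})$. A small illustration with assumed parameter values:

```python
def mu(y, pi, p, q):
    """E-step ratio: the coin-B branch over the total."""
    b = pi * p**y * (1 - p)**(1 - y)        # P(y, z=1 | theta)
    c = (1 - pi) * q**y * (1 - q)**(1 - y)  # P(y, z=0 | theta)
    return b / (b + c)

pi, p, q = 0.4, 0.6, 0.5
for y in (0, 1):
    joint_z1 = pi * p**y * (1 - p)**(1 - y)                    # P(y, z=1)
    marg = joint_z1 + (1 - pi) * q**y * (1 - q)**(1 - y)       # P(y)
    assert abs(mu(y, pi, p, q) - joint_z1 / marg) < 1e-12      # Bayes' rule
```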
Next, the M-step.
$$\begin{aligned} \theta^{(i+1)} &=\mathop{\arg\max}\limits_{\theta}Q(\theta,\theta^{(i)})\\ &=\mathop{\arg\max}\limits_{\theta}\sum_Z \log P(Y,Z|\theta)\,P(Z|Y,\theta^{(i)})\\ &=\mathop{\arg\max}\limits_{\theta}\sum_Z P(Z|Y,\theta^{(i)})\sum_{j=1}^n \log P(y_j,z_j|\theta)\\ &=\mathop{\arg\max}\limits_{\theta}\sum_Z P(Z|Y,\theta^{(i)})\Big(\log P(y_1,z_1|\theta)+\sum_{j=2}^n \log P(y_j,z_j|\theta)\Big)\\ &=\mathop{\arg\max}\limits_{\theta}\sum_Z P(Z|Y,\theta^{(i)})\log P(y_1,z_1|\theta)+\sum_Z P(Z|Y,\theta^{(i)})\sum_{j=2}^n \log P(y_j,z_j|\theta)\\ &=\mathop{\arg\max}\limits_{\theta}\Big(P(z_1=0|Y,\theta^{(i)})\log P(y_1,z_1=0|\theta)+P(z_1=1|Y,\theta^{(i)})\log P(y_1,z_1=1|\theta)\Big)\sum_{Z'}P(Z'|Y',\theta^{(i)})+\sum_Z P(Z|Y,\theta^{(i)})\sum_{j=2}^n \log P(y_j,z_j|\theta)\\ &=\mathop{\arg\max}\limits_{\theta}\Big(P(z_1=0|Y,\theta^{(i)})\log P(y_1,z_1=0|\theta)+P(z_1=1|Y,\theta^{(i)})\log P(y_1,z_1=1|\theta)\Big)+\sum_Z P(Z|Y,\theta^{(i)})\sum_{j=2}^n \log P(y_j,z_j|\theta)\\ &=\mathop{\arg\max}\limits_{\theta}\sum_{z_1}P(z_1|Y,\theta^{(i)})\log P(y_1,z_1|\theta)+\sum_Z P(Z|Y,\theta^{(i)})\sum_{j=2}^n \log P(y_j,z_j|\theta)\\ &=\mathop{\arg\max}\limits_{\theta}\sum_{z_1}P(z_1|Y,\theta^{(i)})\log P(y_1,z_1|\theta)+\sum_{z_2}P(z_2|Y,\theta^{(i)})\log P(y_2,z_2|\theta)+\sum_Z P(Z|Y,\theta^{(i)})\sum_{j=3}^n \log P(y_j,z_j|\theta)\\ &=\dots\\ &=\mathop{\arg\max}\limits_{\theta}\sum_{j=1}^n\sum_{z_j}P(z_j|y_j,\theta^{(i)})\log P(y_j,z_j|\theta) \end{aligned}$$
where $Z' = (z_2,z_3,\dots,z_n)$ and $Y'=(y_2,y_3,\dots,y_n)$. Clearly $\sum\limits_{Z'}P(Z'|Y',\theta^{(i)})=1$ (the conditional distribution of $Z'$ must sum to 1 over $Z'$).
(Note: this step follows https://blog.csdn.net/zsdust/article/details/100042491)
Substituting the formulas obtained above, we get

$$\begin{aligned} \theta^{(i+1)} &= \mathop{\arg\max}\limits_{\theta}\sum_{j=1}^n\sum_{z_j}P(z_j|y_j,\theta^{(i)})\log P(y_j,z_j|\theta)\\ &=\mathop{\arg\max}\limits_{\theta}\sum_{j=1}^n\Big\{\big[\log \pi p^{y_j}(1-p)^{1-y_j}\big]\mu_j^{(i+1)} + \big[\log(1-\pi)q^{y_j}(1-q)^{1-y_j}\big](1-\mu_j^{(i+1)})\Big\} \end{aligned}$$
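Before differentiating, one can check numerically that the closed-form updates really maximize this objective: fix any posteriors $\mu_j$, form the closed-form $(\pi, p, q)$, and verify that perturbing any coordinate lowers the value (toy data and helper names of my own choosing):

```python
import math

def q_value(ys, mus, pi, p, q):
    """The M-step objective, with the mus held fixed from the E-step."""
    total = 0.0
    for y, m in zip(ys, mus):
        total += m * math.log(pi * p**y * (1 - p)**(1 - y))
        total += (1 - m) * math.log((1 - pi) * q**y * (1 - q)**(1 - y))
    return total

ys = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]            # toy observations
mus = [0.45 if y == 1 else 0.55 for y in ys]   # any posteriors in (0, 1)
n, s = len(ys), sum(mus)

# Closed-form updates from the text
pi_star = s / n
p_star = sum(m * y for m, y in zip(mus, ys)) / s
q_star = sum((1 - m) * y for m, y in zip(mus, ys)) / (n - s)

best = q_value(ys, mus, pi_star, p_star, q_star)
for d in (-0.01, 0.01):                        # any perturbation loses
    assert q_value(ys, mus, pi_star + d, p_star, q_star) < best
    assert q_value(ys, mus, pi_star, p_star + d, q_star) < best
    assert q_value(ys, mus, pi_star, p_star, q_star + d) < best
```

This works because the objective is strictly concave in each of $\pi$, $p$, $q$ separately, so the stationary point found below is the unique maximizer.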
Taking the derivative of this expression with respect to each of $\pi$, $p$, and $q$, and setting each derivative to zero, yields the update formulas above.
Differentiating with respect to $\pi$:

$$\begin{aligned} \frac{1}{\pi^{(i+1)}}\sum_{j=1}^n\mu_j^{(i+1)} - \frac{1}{1-\pi^{(i+1)}}\sum_{j=1}^n(1-\mu_j^{(i+1)}) &= 0\\ (1-\pi^{(i+1)})\sum_{j=1}^n\mu_j^{(i+1)} - \pi^{(i+1)}\sum_{j=1}^n(1-\mu_j^{(i+1)})&=0\\ \pi^{(i+1)} &= \frac{1}{n}\sum_{j=1}^{n}\mu_j^{(i+1)} \end{aligned}$$
Differentiating with respect to $p$:

$$\begin{aligned} \sum_{j=1}^n\big[y_j(1-p^{(i+1)}) + (y_j-1)p^{(i+1)}\big]\mu_j^{(i+1)} &= 0\\ \sum_{j=1}^n\big[y_j - p^{(i+1)}\big]\mu_j^{(i+1)} &= 0\\ \sum_{j=1}^n y_j\mu_j^{(i+1)} - p^{(i+1)}\sum_{j=1}^n\mu_j^{(i+1)}&=0\\ p^{(i+1)} &= \frac{\sum\limits_{j=1}^n\mu_j^{(i+1)}y_j}{\sum\limits_{j=1}^n\mu_j^{(i+1)}} \end{aligned}$$
Differentiating with respect to $q$:

$$\begin{aligned} \sum_{j=1}^n\big[y_j(1-q^{(i+1)}) + (y_j-1)q^{(i+1)}\big](1-\mu_j^{(i+1)}) &= 0\\ \sum_{j=1}^n y_j(1-\mu_j^{(i+1)})-q^{(i+1)}\sum_{j=1}^n(1-\mu_j^{(i+1)})&=0\\ q^{(i+1)} &= \frac{\sum\limits_{j=1}^n(1-\mu_j^{(i+1)})y_j}{\sum\limits_{j=1}^n(1-\mu_j^{(i+1)})} \end{aligned}$$