EM是由于样本中存在隐变量,样本分布的参数又未知,则需要用EM算法来进行计算。
三硬币问题
例:(三硬币模型)三枚硬币,分别记作A,B,C.这些硬币出现正面的概率分别为 π , p , q \pi,p,q π,p,q,进行如下试验:先抛A硬币,A为正面选B,A为反面选C;然后抛选出的B或C硬币,正面记作1,反面记作0;独立重复n次试验(n=10),观测结果为 1 , 1 , 0 , 1 , 0 , 0 , 1 , 0 , 1 , 1 1,1,0,1,0,0,1,0,1,1 1,1,0,1,0,0,1,0,1,1,试求出 π , p , q \pi,p,q π,p,q.
符号定义
观测值
Y
=
(
Y
1
,
Y
2
,
.
.
.
,
Y
n
)
T
=
(
1
,
1
,
0
,
1
,
0
,
0
,
1
,
0
,
1
,
1
)
T
Y=(Y_1,Y_2,...,Y_n)^T=(1,1,0,1,0,0,1,0,1,1)^T
Y=(Y1,Y2,...,Yn)T=(1,1,0,1,0,0,1,0,1,1)T,
未观测值
Z
=
(
Z
1
,
Z
2
,
.
.
.
,
Z
n
)
T
Z=(Z_1,Z_2,...,Z_n)^T
Z=(Z1,Z2,...,Zn)T表示硬币A的正反面,取值为0或1,Y和Z为随机变量。
θ
=
(
π
,
p
,
q
)
\theta=(\pi,p,q)
θ=(π,p,q)为模型参数
则观测数据的似然函数为
L
(
θ
)
L(\theta)
L(θ),是产生这组观测值的概率,
L
(
θ
)
=
P
(
Y
∣
θ
)
=
∑
Z
P
(
Z
∣
θ
)
P
(
Y
∣
Z
,
θ
)
=
Π
j
=
1
n
[
π
p
y
j
(
1
−
p
)
1
−
y
j
+
(
1
−
π
)
q
y
j
(
1
−
q
)
1
−
y
j
]
L(\theta)=P(Y|\theta)=\sum_ZP(Z|\theta)P(Y|Z,\theta)=\Pi^n_{j=1}[\pi p^{y_j}(1-p)^{1-y_j}+(1-\pi )q^{y_j}(1-q)^{1-y_j}]
L(θ)=P(Y∣θ)=Z∑P(Z∣θ)P(Y∣Z,θ)=Πj=1n[πpyj(1−p)1−yj+(1−π)qyj(1−q)1−yj]
其对数似然为
L
L
(
θ
)
=
l
o
g
∑
j
=
1
n
[
π
p
y
j
(
1
−
p
)
1
−
y
j
+
(
1
−
π
)
q
y
j
(
1
−
q
)
1
−
y
j
]
LL(\theta)=log\sum^n_{j=1}[\pi p^{y_j}(1-p)^{1-y_j}+(1-\pi )q^{y_j}(1-q)^{1-y_j}]
LL(θ)=log∑j=1n[πpyj(1−p)1−yj+(1−π)qyj(1−q)1−yj]
需要求使
L
(
θ
)
L(\theta)
L(θ)最大的
θ
\theta
θ,即
θ
^
=
a
r
g
max
θ
l
o
g
L
(
θ
)
\hat{\theta}=arg \displaystyle\max_{\theta}logL(\theta)
θ^=argθmaxlogL(θ)
但是这式子无法求解,所以需要EM算法来迭代求解:
EM算法
E步:
计算完全数据的对数似然函数对于Z的期望,即在模型
θ
\theta
θ的结果下,产生Y观测值,隐变量是Z的概率的均值???
Q
(
θ
,
θ
(
i
)
)
=
E
Z
∣
Y
,
θ
(
i
)
[
l
o
g
(
P
(
Y
,
Z
∣
θ
)
)
]
=
∑
j
=
1
n
∑
z
j
[
p
(
z
j
∣
y
j
,
θ
(
i
)
)
l
o
g
(
p
(
y
j
,
z
j
∣
θ
)
)
]
=
∑
j
=
1
n
[
p
(
z
j
=
1
∣
y
j
,
θ
(
i
)
)
l
o
g
(
p
(
y
j
,
z
j
=
1
∣
θ
)
)
]
+
[
p
(
z
j
=
0
∣
y
j
,
θ
(
i
)
)
l
o
g
(
p
(
y
j
,
z
j
=
0
∣
θ
)
)
]
\begin{aligned} Q(\theta,\theta^{(i)})&=E_{Z|Y,\theta^{(i)}}[log(P(Y,Z|\theta))]\\ &=\sum^n_{j=1}\sum_{z_j}[p(z_j|y_j,\theta^{(i)})log(p(y_j,z_j|\theta))]\\ &=\sum^n_{j=1}[p(z_j=1|y_j,\theta^{(i)})log(p(y_j,z_j=1|\theta))]+[p(z_j=0|y_j,\theta^{(i)})log(p(y_j,z_j=0|\theta))] \end{aligned}
Q(θ,θ(i))=EZ∣Y,θ(i)[log(P(Y,Z∣θ))]=j=1∑nzj∑[p(zj∣yj,θ(i))log(p(yj,zj∣θ))]=j=1∑n[p(zj=1∣yj,θ(i))log(p(yj,zj=1∣θ))]+[p(zj=0∣yj,θ(i))log(p(yj,zj=0∣θ))]
l
o
g
(
P
(
Y
,
Z
∣
θ
)
)
=
∑
j
=
1
n
l
o
g
(
p
(
y
j
,
z
j
∣
θ
)
)
\bm{log(P(Y,Z|\theta))}=\sum^n_{j=1}log(p(y_j,z_j|\theta))
log(P(Y,Z∣θ))=∑j=1nlog(p(yj,zj∣θ))是一个联合概率,为完全数据的对数似然函数,表示在参数
θ
\theta
θ的条件下,产生这组
Y
,
Z
Y,Z
Y,Z的观测值的概率
p
(
y
j
,
z
j
=
1
∣
θ
)
=
p
(
y
j
∣
z
j
=
1
,
θ
)
p
(
z
j
=
1
∣
θ
)
=
π
p
y
j
(
1
−
p
)
1
−
y
j
p(y_j,z_j=1|\theta)=p(y_j|z_j=1,\theta)p(z_j=1|\theta)=\pi p^{y_j}(1-p)^{1-y_j}
p(yj,zj=1∣θ)=p(yj∣zj=1,θ)p(zj=1∣θ)=πpyj(1−p)1−yj
p
(
y
j
,
z
j
=
0
∣
θ
)
=
p
(
y
j
∣
z
j
=
0
,
θ
)
p
(
z
j
=
0
∣
θ
)
=
(
1
−
π
)
q
y
j
(
1
−
q
)
1
−
y
j
p(y_j,z_j=0|\theta)=p(y_j|z_j=0,\theta)p(z_j=0|\theta)=(1-\pi) q^{y_j}(1-q)^{1-y_j}
p(yj,zj=0∣θ)=p(yj∣zj=0,θ)p(zj=0∣θ)=(1−π)qyj(1−q)1−yj
P
(
z
j
∣
y
j
,
θ
(
i
)
)
\bm{P(z_j|y_j,\theta^{(i)})}
P(zj∣yj,θ(i))是一个后验概率,表示在观测数据
y
j
y_j
yj和当前参数
θ
(
i
)
\theta^{(i)}
θ(i)下隐变量
z
j
z_j
zj的条件概率分布,
μ
(
i
+
1
)
\mu ^{(i+1)}
μ(i+1)是一个定值,相当于常数
如:计算在模型参数
π
(
i
)
,
p
(
i
)
,
q
(
i
)
\pi^{(i)},p^{(i)},q^{(i)}
π(i),p(i),q(i)下观测数据
y
j
y_j
yj是来自B硬币的条件概率,然后进行加权求和就为选中硬币B的期望
P
(
z
j
=
1
∣
y
j
,
θ
(
i
)
)
=
μ
(
i
+
1
)
=
p
(
y
j
,
z
j
=
1
)
p
(
y
j
)
=
p
(
y
j
∣
z
j
=
1
)
p
(
z
j
=
1
)
∑
z
j
p
(
y
j
,
z
j
)
=
p
(
y
j
∣
z
j
=
1
)
p
(
z
j
=
1
)
p
(
y
j
∣
z
j
=
1
)
+
p
(
y
j
∣
z
j
=
0
)
=
π
(
i
)
(
p
(
i
)
)
y
j
(
1
−
p
(
i
)
)
1
−
y
j
π
(
i
)
(
p
(
i
)
)
y
j
(
1
−
p
(
i
)
)
1
−
y
j
+
(
1
−
π
(
i
)
)
(
q
(
i
)
)
y
j
(
1
−
q
(
i
)
)
1
−
y
j
P
(
z
j
=
0
∣
y
j
,
θ
(
i
)
)
=
1
−
μ
(
i
+
1
)
\begin{aligned} P(z_j=1|y_j,\theta^{(i)})&=\mu^{(i+1)}\\ &={{p(y_j,z_j=1)}\over{p(y_j)}}\\ &={{p(y_j|z_j=1)p(z_j=1)}\over{\sum_{z_j}p(y_j,z_j)}}\\ &={{p(y_j|z_j=1)p(z_j=1)}\over{p(y_j|z_j=1)+p(y_j|z_j=0)}}\\ &={{\pi^{(i)}(p^{(i)})^{y_j}(1-p^{(i)})^{1-y_j}}\over{ \pi^{(i)}(p^{(i)})^{y_j}(1-p^{(i)})^{1-y_j} +(1-\pi^{(i)})(q^{(i)}) ^{y_j}(1-q^{(i)})^{1-y_j} }}\\ \\ P(z_j=0|y_j,\theta^{(i)})&=1-\mu^{(i+1)} \end{aligned}
P(zj=1∣yj,θ(i))P(zj=0∣yj,θ(i))=μ(i+1)=p(yj)p(yj,zj=1)=∑zjp(yj,zj)p(yj∣zj=1)p(zj=1)=p(yj∣zj=1)+p(yj∣zj=0)p(yj∣zj=1)p(zj=1)=π(i)(p(i))yj(1−p(i))1−yj+(1−π(i))(q(i))yj(1−q(i))1−yjπ(i)(p(i))yj(1−p(i))1−yj=1−μ(i+1)
代入
Q
Q
Q有:
Q ( θ , θ ( i ) ) = E Z ∣ Y , θ ( i ) [ l o g ( P ( Y , Z ∣ θ ) ) ] = ∑ j = 1 n ∑ z j [ p ( z j ∣ y j , θ ( i ) ) l o g ( p ( y j , z j ∣ θ ) ) ] = ∑ j = 1 n [ p ( z j = 1 ∣ y j , θ ( i ) ) l o g ( p ( y j , z j = 1 ∣ θ ) ) ] + [ p ( z j = 0 ∣ y j , θ ( i ) ) l o g ( p ( y j , z j = 0 ∣ θ ) ) ] = ∑ j = 1 n μ j ( i + 1 ) l o g ( π p y j ( 1 − p ) 1 − y j ) + ( 1 − μ j ( i + 1 ) ) l o g ( ( 1 − π ) q y j ( 1 − q ) 1 − y j ) \begin{aligned} Q(\theta,\theta^{(i)})&=E_{Z|Y,\theta^{(i)}}[log(P(Y,Z|\theta))]\\ &=\sum^n_{j=1}\sum_{z_j}[p(z_j|y_j,\theta^{(i)})log(p(y_j,z_j|\theta))]\\ &=\sum^n_{j=1}[p(z_j=1|y_j,\theta^{(i)})log(p(y_j,z_j=1|\theta))]+[p(z_j=0|y_j,\theta^{(i)})log(p(y_j,z_j=0|\theta))]\\ &=\sum^n_{j=1}\mu^{(i+1)}_jlog(\pi p^{y_j}(1-p)^{1-y_j})+(1-\mu^{(i+1)}_j)log((1-\pi) q^{y_j}(1-q)^{1-y_j}) \end{aligned} Q(θ,θ(i))=EZ∣Y,θ(i)[log(P(Y,Z∣θ))]=j=1∑nzj∑[p(zj∣yj,θ(i))log(p(yj,zj∣θ))]=j=1∑n[p(zj=1∣yj,θ(i))log(p(yj,zj=1∣θ))]+[p(zj=0∣yj,θ(i))log(p(yj,zj=0∣θ))]=j=1∑nμj(i+1)log(πpyj(1−p)1−yj)+(1−μj(i+1))log((1−π)qyj(1−q)1−yj)
M步:
求使
Q
(
θ
)
Q(\theta)
Q(θ)极大化的
θ
\theta
θ
用
Q
Q
Q函数对参数求偏导,
∂
Q
∂
π
,
∂
Q
∂
p
,
∂
Q
∂
q
{{\partial Q}\over{\partial \pi}},{{\partial Q}\over{\partial p}},{{\partial Q}\over{\partial q}}
∂π∂Q,∂p∂Q,∂q∂Q,再令其为0即可求得参数(
μ
(
i
+
1
)
\mu ^{(i+1)}
μ(i+1)是一个定值,相当于常数)
(1)对
π
\pi
π求偏导和估计
∂
Q
∂
π
=
∑
j
=
1
n
{
μ
j
(
i
+
1
)
∗
1
p
y
j
(
1
−
p
)
1
−
y
j
π
∗
p
y
j
(
1
−
p
)
1
−
y
j
−
(
1
−
μ
j
(
i
+
1
)
)
∗
1
q
y
j
(
1
−
q
)
1
−
y
j
(
1
−
π
)
∗
q
y
j
(
1
−
q
)
1
−
y
j
}
=
∑
j
=
1
n
{
μ
j
(
i
+
1
)
∗
1
π
−
(
1
−
μ
j
(
i
+
1
)
)
∗
1
1
−
π
}
=
∑
j
=
1
n
μ
j
(
j
+
1
)
−
π
π
(
1
−
π
)
=
0
则
π
(
i
+
1
)
=
1
n
∑
j
=
1
n
μ
j
(
i
+
1
)
\begin{aligned} {{\partial Q}\over{\partial \pi}}&=\sum^n_{j=1} \{\mu ^{(i+1)}_j*{1\over {p^{y_j}(1-p)^{1-y_j}\pi}}*p^{y_j}(1-p)^{1-y_j}-(1-\mu^{(i+1)}_j)*{1\over{q^{y_j}(1-q)^{1-y_j}(1-\pi)}}*q^{y_j}(1-q)^{1-y_j}\}\\ &=\sum^n_{j=1}\{\mu ^{(i+1)}_j*{1\over \pi}-(1-\mu^{(i+1)}_j)*{1\over {1-\pi}}\}\\ &=\sum^n_{j=1}{{\mu^{(j+1)}_j}-\pi\over{\pi(1-\pi)}}\\ &=0 \\ 则\pi^{(i+1)}&={1\over n}\sum^n_{j=1}\mu^{(i+1)}_j \end{aligned}
∂π∂Q则π(i+1)=j=1∑n{μj(i+1)∗pyj(1−p)1−yjπ1∗pyj(1−p)1−yj−(1−μj(i+1))∗qyj(1−q)1−yj(1−π)1∗qyj(1−q)1−yj}=j=1∑n{μj(i+1)∗π1−(1−μj(i+1))∗1−π1}=j=1∑nπ(1−π)μj(j+1)−π=0=n1j=1∑nμj(i+1)
(2)对
p
p
p求偏导和估计
∂ Q ∂ p = ∑ j = 1 n μ j ( i + 1 ) ∗ 1 π p y j ( 1 − p ) 1 − y j ∗ [ π y j ( 1 − p ) 1 − y j p 1 − y j − π ( 1 − y j ) p y j ( 1 − p ) y j ] = ∑ j = 1 n μ j ( i + 1 ) ∗ 1 π p y j ( 1 − p ) 1 − y j ∗ π ∗ y j ∗ ( 1 − p ) 1 − y j ∗ ( 1 − p ) y j − ( 1 − y j ) ∗ p y j ∗ p 1 − y j p 1 − y j ∗ ( 1 − p ) y j = ∑ j = 1 n μ j ( i + 1 ) ∗ y j ( 1 − p ) − ( 1 − y j ) p p ∗ ( 1 − p ) = ∑ j = 1 n μ j ( i + 1 ) ∗ y j − p p ∗ ( 1 − p ) = 0 则 p ( i + 1 ) = ∑ j = 1 n μ j ( i + 1 ) y j ∑ j = 1 n μ j ( i + 1 ) \begin{aligned} {{\partial Q}\over{\partial p}}&=\sum^n_{j=1}\mu^{(i+1)}_j*{1\over{\pi p^{y_j} (1-p)^{1-y_j}}}*[\pi y_j{{(1-p)}^{1-y_j}\over{p^{1-y_j}}}-\pi (1-y_j){{p^{y_j}}\over{(1-p)^{y_j}}}]\\ &=\sum^n_{j=1}\mu^{(i+1)}_j*{1\over{\pi p^{y_j} (1-p)^{1-y_j}}}*\pi*{y_j*(1-p)^{1-y_j}*(1-p)^{y_j}-(1-y_j)*p^{y_j}*{p^{1-y_j}}\over{p^{1-y_j}*(1-p)^{y_j}}}\\ &=\sum^n_{j=1}\mu^{(i+1)}_j*{{y_j(1-p)-(1-y_j)p}\over{p*(1-p)}} \\ &=\sum^n_{j=1}\mu^{(i+1)}_j*{{y_j-p}\over{p*(1-p)}} \\ &=0\\ 则p^{(i+1)}&={{\sum^n_{j=1}\mu^{(i+1)}_jy_j}\over{\sum^n_{j=1}\mu^{(i+1)}_j}} \end{aligned} ∂p∂Q则p(i+1)=j=1∑nμj(i+1)∗πpyj(1−p)1−yj1∗[πyjp1−yj(1−p)1−yj−π(1−yj)(1−p)yjpyj]=j=1∑nμj(i+1)∗πpyj(1−p)1−yj1∗π∗p1−yj∗(1−p)yjyj∗(1−p)1−yj∗(1−p)yj−(1−yj)∗pyj∗p1−yj=j=1∑nμj(i+1)∗p∗(1−p)yj(1−p)−(1−yj)p=j=1∑nμj(i+1)∗p∗(1−p)yj−p=0=∑j=1nμj(i+1)∑j=1nμj(i+1)yj
(3)对
q
q
q求偏导和估计
∂
Q
∂
q
=
∑
j
=
1
n
(
1
−
μ
j
(
i
+
1
)
)
1
(
1
−
π
)
q
y
j
(
1
−
q
)
1
−
y
j
∗
[
(
1
−
π
)
y
j
(
1
−
q
)
1
−
y
j
q
1
−
y
j
−
(
1
−
π
)
(
1
−
y
j
)
q
y
j
(
1
−
q
)
y
j
]
=
∑
j
=
1
n
(
1
−
μ
j
(
i
+
1
)
)
1
(
1
−
π
)
q
y
j
(
1
−
q
)
1
−
y
j
∗
(
1
−
π
)
∗
y
j
(
1
−
q
)
1
−
y
j
∗
(
1
−
q
)
y
j
−
(
1
−
y
j
)
q
1
−
y
j
∗
q
y
j
(
1
−
q
)
y
j
∗
q
1
−
y
j
=
∑
j
=
1
n
(
1
−
μ
j
(
i
+
1
)
)
y
j
(
1
−
q
)
−
(
1
−
y
j
)
q
(
1
−
q
)
∗
q
=
∑
j
=
1
n
(
1
−
μ
(
i
+
1
)
)
y
j
−
q
q
(
1
−
q
)
=
0
则
q
(
i
+
1
)
=
∑
j
=
1
n
(
1
−
μ
j
(
i
+
1
)
)
y
j
∑
j
=
1
n
(
1
−
μ
j
(
i
+
1
)
)
\begin{aligned} {{\partial Q}\over{\partial q}}&=\sum^n_{j=1}(1-\mu^{(i+1)}_j){1\over{(1-\pi) q^{y_j}(1-q)^{1-y_j}}}*[(1-\pi)y_j{{(1-q)^{1-y_j}}\over{q^{1-y_j}}}-(1-\pi)(1-y_j){{q^{y_j}}\over{(1-q)^{y_j}}}]\\ &=\sum^n_{j=1}(1-\mu^{(i+1)}_j){1\over{(1-\pi) q^{y_j}(1-q)^{1-y_j}}}*(1-\pi)*{{y_j(1-q)^{1-y_j}*(1-q)^{y_j}-(1-y_j)q^{1-y_j}*q^{y_j}}\over{(1-q)^{y_j}*q^{1-y_j}}}\\ &=\sum^n_{j=1}(1-\mu^{(i+1)}_j){{ y_j(1-q)-(1-y_j)q }\over{(1-q)*q}}\\ &=\sum^n_{j=1}(1-\mu^{(i+1)}){{y_j-q}\over{q(1-q)}}\\ &=0\\ 则q^{(i+1)}&={{\sum^n_{j=1}(1-\mu^{(i+1)}_j)y_j}\over{\sum^n_{j=1}(1-\mu^{(i+1)}_j)}} \end{aligned}
∂q∂Q则q(i+1)=j=1∑n(1−μj(i+1))(1−π)qyj(1−q)1−yj1∗[(1−π)yjq1−yj(1−q)1−yj−(1−π)(1−yj)(1−q)yjqyj]=j=1∑n(1−μj(i+1))(1−π)qyj(1−q)1−yj1∗(1−π)∗(1−q)yj∗q1−yjyj(1−q)1−yj∗(1−q)yj−(1−yj)q1−yj∗qyj=j=1∑n(1−μj(i+1))(1−q)∗qyj(1−q)−(1−yj)q=j=1∑n(1−μ(i+1))q(1−q)yj−q=0=∑j=1n(1−μj(i+1))∑j=1n(1−μj(i+1))yj
进行数字计算,假定模型参数 π ( 0 ) = 0.5 , p ( 0 ) = 0.5 , q ( 0 ) = 0.5 \pi^{(0)}=0.5,p^{(0)}=0.5,q^{(0)}=0.5 π(0)=0.5,p(0)=0.5,q(0)=0.5
则对于 y j = 1 y_j=1 yj=1和 y j = 0 y_j=0 yj=0, E E E步中都有 μ j ( 1 ) = π ( i ) ( p ( i ) ) y j ( 1 − p ( i ) ) 1 − y j π ( i ) ( p ( i ) ) y j ( 1 − p ( i ) ) 1 − y j + ( 1 − π ( i ) ) ( q ( i ) ) y j ( 1 − q ( i ) ) 1 − y j = 0.5 \mu^{(1)}_j={{\pi^{(i)}(p^{(i)})^{y_j}(1-p^{(i)})^{1-y_j}}\over{ \pi^{(i)}(p^{(i)})^{y_j}(1-p^{(i)})^{1-y_j} +(1-\pi^{(i)})(q^{(i)}) ^{y_j}(1-q^{(i)})^{1-y_j} }}=0.5 μj(1)=π(i)(p(i))yj(1−p(i))1−yj+(1−π(i))(q(i))yj(1−q(i))1−yjπ(i)(p(i))yj(1−p(i))1−yj=0.5
则根据 M M M步可以得到 π ( 1 ) = 0.5 , p ( 1 ) = 0.6 , q ( 1 ) = 0.6 \pi^{(1)}=0.5,p^{(1)}=0.6,q^{(1)}=0.6 π(1)=0.5,p(1)=0.6,q(1)=0.6已经收敛
不懂就问
?为何会收敛
?期望为什么是条件概率与联合概率的求乘积
?
μ
\mu
μ为何是定值