How do we prove that the EM algorithm converges? The convergence proof has two parts:

- $P(Y|\theta)$ is bounded above, so $L(\theta^t)=\log P(Y|\theta^t)$ converges to some value $L^{*}$.
- The sequence $\theta^{t}$ produced by the EM algorithm converges to a $\theta^{*}$ that is a stationary point of $L(\theta)$.

Here we prove only the first part; the second is left for a later post.
Theorem 1

$$P(Y|\theta^{t+1})\geqslant P(Y|\theta^t)$$
The proof goes as follows.

First, since $P(Y|\theta)=\frac{P(Y,Z|\theta)}{P(Z|Y,\theta)}$, taking logarithms gives

$$\log P(Y|\theta)=\log P(Y,Z|\theta)-\log P(Z|Y,\theta)$$
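As a quick sanity check of this identity, the snippet below builds a small made-up joint table standing in for $P(Y,Z|\theta)$ at a fixed $\theta$ (the table itself is an assumption for illustration) and verifies the decomposition for every value of $Z$.

```python
import numpy as np

# Toy joint table standing in for P(Y, Z | theta): 2 values of Y, 3 of Z.
rng = np.random.default_rng(0)
joint = rng.random((2, 3))
joint /= joint.sum()                     # normalize into a distribution

y = 0                                    # an arbitrary observed value of Y
p_y = joint[y].sum()                     # P(Y | theta), marginalizing out Z
for z in range(joint.shape[1]):
    p_z_given_y = joint[y, z] / p_y      # P(Z | Y, theta)
    # log P(Y|theta) == log P(Y,Z|theta) - log P(Z|Y,theta) for every z
    assert np.isclose(np.log(p_y),
                      np.log(joint[y, z]) - np.log(p_z_given_y))
```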
Now bring in $\theta^t$, the parameter estimate from the previous iteration, and define

$$\begin{aligned} Q(\theta,\theta^t)&=\sum_i P(Z_i|Y,\theta^t)\log P(Z_i,Y|\theta)\\ H(\theta,\theta^t)&=\sum_{i}P(Z_i|Y,\theta^t)\log P(Z_i|Y,\theta) \end{aligned}$$
Subtracting the second from the first gives

$$\begin{aligned} Q(\theta,\theta^t)-H(\theta,\theta^t)&=\sum_{i}P(Z_i|Y,\theta^t)\log \frac{P(Z_i,Y|\theta)}{P(Z_i|Y,\theta)}\\ &=\sum_{i}P(Z_i|Y,\theta^t)\log P(Y|\theta)\\ &=\log P(Y|\theta) \end{aligned}$$

where the last step uses $\sum_i P(Z_i|Y,\theta^t)=1$.
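The same kind of toy-table check, again with made-up distributions for the two parameter values (both tables are assumptions for illustration), confirms that $Q-H$ equals the log-likelihood:

```python
import numpy as np

# Toy joint tables standing in for P(Y, Z | theta^t) and P(Y, Z | theta).
rng = np.random.default_rng(1)
joint_t = rng.random((2, 3)); joint_t /= joint_t.sum()
joint   = rng.random((2, 3)); joint   /= joint.sum()

y = 0
post_t = joint_t[y] / joint_t[y].sum()   # P(Z | Y, theta^t)
post   = joint[y]   / joint[y].sum()     # P(Z | Y, theta)

Q = np.sum(post_t * np.log(joint[y]))    # sum_i P(Z_i|Y,theta^t) log P(Z_i,Y|theta)
H = np.sum(post_t * np.log(post))        # sum_i P(Z_i|Y,theta^t) log P(Z_i|Y,theta)
assert np.isclose(Q - H, np.log(joint[y].sum()))   # = log P(Y | theta)
```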
Now introduce the parameter $\theta^{t+1}$:

$$\begin{aligned} \log P(Y|\theta^{t+1})-\log P(Y|\theta^{t})&=Q(\theta^{t+1},\theta^t)-H(\theta^{t+1},\theta^t)-Q(\theta^t,\theta^t)+H(\theta^t,\theta^t)\\ &=Q(\theta^{t+1},\theta^t)-Q(\theta^t,\theta^t)-\left[H(\theta^{t+1},\theta^t)-H(\theta^{t},\theta^{t})\right] \end{aligned}$$
Next we need to show

$$Q(\theta^{t+1},\theta^t)-Q(\theta^t,\theta^t)\geqslant 0$$

This is immediate, because the M-step chooses $\theta^{t+1}=\arg\max_{\theta} Q(\theta,\theta^t)$. It then remains to show
$$\begin{aligned} H(\theta^{t+1},\theta^t)-H(\theta^{t},\theta^t)&=\sum_{i}P(Z_i|Y,\theta^t)\log \frac{P(Z_i|Y,\theta^{t+1})}{P(Z_i|Y,\theta^t)}\\ &\leqslant \log \sum_{i}P(Z_i|Y,\theta^t)\,\frac{P(Z_i|Y,\theta^{t+1})}{P(Z_i|Y,\theta^t)}\\ &=\log \sum_{i}P(Z_i|Y,\theta^{t+1})=\log 1=0 \end{aligned}$$

where the inequality is Jensen's inequality applied to the concave function $\log$.
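This Jensen step can be spot-checked numerically. In the sketch below, random distributions stand in for the two posteriors (they are assumptions, not anything from the derivation); the left-hand side equals $-\mathrm{KL}$ between them, so it never comes out positive.

```python
import numpy as np

# Random p, q stand in for P(Z|Y, theta^t) and P(Z|Y, theta^{t+1}).
rng = np.random.default_rng(2)
for _ in range(1000):
    p = rng.random(5); p /= p.sum()
    q = rng.random(5); q /= q.sum()
    # H(theta^{t+1},theta^t) - H(theta^t,theta^t) = sum_i p_i log(q_i/p_i),
    # which is -KL(p || q) and hence never positive.
    lhs = np.sum(p * np.log(q / p))
    assert lhs <= 1e-12                  # <= 0 up to float error
```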
Combining the two bounds, we conclude

$$P(Y|\theta^{t+1})\geqslant P(Y|\theta^t)$$

which proves Theorem 1, and with it the first part: $L(\theta^t)=\log P(Y|\theta^t)$ is monotonically non-decreasing and bounded above, so it converges. The sketch below checks this monotonicity on a toy problem.
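To watch Theorem 1 in action, here is a minimal EM sketch for a two-component 1-D Gaussian mixture; the model, the synthetic data, and all names are illustrative assumptions, not anything from the original post. Each iteration asserts that the observed-data log-likelihood has not decreased.

```python
import numpy as np

# Synthetic 1-D data from two Gaussian clusters (an assumed toy dataset).
rng = np.random.default_rng(3)
data = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 300)])

def log_lik(x, w, mu, sig):
    # L(theta) = sum_n log sum_k w_k N(x_n | mu_k, sig_k^2)
    comp = w * np.exp(-0.5 * ((x[:, None] - mu) / sig) ** 2) \
             / (sig * np.sqrt(2 * np.pi))
    return np.log(comp.sum(axis=1)).sum()

w, mu, sig = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
prev = -np.inf
for t in range(50):
    # E-step: responsibilities, i.e. P(Z | Y, theta^t).
    comp = w * np.exp(-0.5 * ((data[:, None] - mu) / sig) ** 2) \
             / (sig * np.sqrt(2 * np.pi))
    resp = comp / comp.sum(axis=1, keepdims=True)
    # M-step: theta^{t+1} = argmax_theta Q(theta, theta^t).
    nk = resp.sum(axis=0)
    w = nk / len(data)
    mu = (resp * data[:, None]).sum(axis=0) / nk
    sig = np.sqrt((resp * (data[:, None] - mu) ** 2).sum(axis=0) / nk)
    cur = log_lik(data, w, mu, sig)
    assert cur >= prev - 1e-9            # Theorem 1: L never decreases
    prev = cur
print("final log-likelihood:", prev, "means:", mu)
```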
Theorem 2
As for proving that the limit $\theta^{*}$ is a stationary point of $L(\theta)$, see Wu, C. F. J. (1983). On the convergence properties of the EM algorithm. Annals of Statistics, 11(1), 95–103.