Statistical Learning Methods — The EM Algorithm and Its Extensions
The EM Algorithm and Its Extensions (Part 2)
Applying the EM Algorithm to Gaussian Mixture Models
Gaussian Mixture Models
A Gaussian mixture model is a probability distribution of the form

$$P(y \mid \theta) = \sum_{k=1}^{K} \alpha_k \, \phi(y \mid \theta_k)$$
where the $\alpha_k > 0$ are mixing coefficients satisfying $\sum_{k=1}^{K} \alpha_k = 1$, and $\phi(y \mid \theta_k)$ is the Gaussian density with parameters $\theta_k = (\mu_k, \sigma_k^2)$:

$$\phi(y \mid \theta_k) = \frac{1}{\sqrt{2\pi}\,\sigma_k} \exp\left(-\frac{(y - \mu_k)^2}{2\sigma_k^2}\right)$$

called the $k$-th component model.
In a general mixture model, the Gaussian density can be replaced by an arbitrary probability density.
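As a concrete illustration, the mixture density above can be evaluated directly. A minimal Python sketch; the parameter values are illustrative, not from the text:

```python
import math

def gaussian_pdf(y, mu, sigma):
    """Component density phi(y | theta_k) of N(mu, sigma^2)."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def gmm_pdf(y, alphas, mus, sigmas):
    """Mixture density P(y | theta) = sum_k alpha_k * phi(y | theta_k)."""
    return sum(a * gaussian_pdf(y, m, s) for a, m, s in zip(alphas, mus, sigmas))

# Illustrative two-component mixture.
alphas = [0.3, 0.7]   # mixing coefficients, must sum to 1
mus    = [0.0, 5.0]
sigmas = [1.0, 2.0]

p = gmm_pdf(0.0, alphas, mus, sigmas)
```

Because the coefficients sum to 1 and each $\phi$ integrates to 1, the mixture is itself a valid density.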
The EM Algorithm for Estimating the Parameters of a Gaussian Mixture Model
- Theory
Suppose the observations $y_1, y_2, \cdots, y_N$ are generated by a Gaussian mixture model

$$P(y \mid \theta) = \sum_{k=1}^{K} \alpha_k \, \phi(y \mid \theta_k)$$

where $\theta = (\alpha_1, \alpha_2, \cdots, \alpha_K; \theta_1, \theta_2, \cdots, \theta_K)$. We estimate the parameters $\theta$ of the Gaussian mixture model with the EM algorithm.
- Specify the latent variables and write out the complete-data log-likelihood
Imagine each observation $y_j$, $j = 1, 2, \cdots, N$, is generated as follows: first, the $k$-th component model $\phi(y \mid \theta_k)$ is selected with probability $\alpha_k$; then the observation $y_j$ is drawn from that component's distribution $\phi(y \mid \theta_k)$. The observation $y_j$ is known, but which component it came from is not, for $k = 1, 2, \cdots, K$. We represent this with latent variables $\gamma_{jk}$, defined by

$$\gamma_{jk} = \begin{cases} 1, & \text{observation } j \text{ comes from component } k \\ 0, & \text{otherwise} \end{cases}$$

for $j = 1, 2, \cdots, N$ and $k = 1, 2, \cdots, K$.
With the observed data $y_j$ and the unobserved data $\gamma_{jk}$, the complete data are

$$(y_j, \gamma_{j1}, \gamma_{j2}, \cdots, \gamma_{jK}), \quad j = 1, 2, \cdots, N$$

The complete-data likelihood is therefore

$$P(y, \gamma \mid \theta) = \prod_{j=1}^{N} P(y_j, \gamma_{j1}, \gamma_{j2}, \cdots, \gamma_{jK} \mid \theta) = \prod_{k=1}^{K} \alpha_k^{n_k} \prod_{j=1}^{N} \left[ \frac{1}{\sqrt{2\pi}\,\sigma_k} \exp\left(-\frac{(y_j - \mu_k)^2}{2\sigma_k^2}\right) \right]^{\gamma_{jk}}$$

where $n_k = \sum_{j=1}^{N} \gamma_{jk}$ and $N = \sum_{k=1}^{K} n_k$.
The complete-data log-likelihood is

$$\log P(y, \gamma \mid \theta) = \sum_{k=1}^{K} \left\{ n_k \log \alpha_k + \sum_{j=1}^{N} \gamma_{jk} \left[ \log \frac{1}{\sqrt{2\pi}} - \log \sigma_k - \frac{(y_j - \mu_k)^2}{2\sigma_k^2} \right] \right\}$$

- E-step: determine the $Q$ function
$$Q(\theta, \theta^{(i)}) = E\left[\log P(y, \gamma \mid \theta) \,\middle|\, y, \theta^{(i)}\right] = \sum_{k=1}^{K} \left\{ \sum_{j=1}^{N} (E\gamma_{jk}) \log \alpha_k + \sum_{j=1}^{N} (E\gamma_{jk}) \left[ \log \frac{1}{\sqrt{2\pi}} - \log \sigma_k - \frac{(y_j - \mu_k)^2}{2\sigma_k^2} \right] \right\}$$

This requires computing $E(\gamma_{jk} \mid y, \theta)$, denoted $\hat\gamma_{jk}$:

$$\hat\gamma_{jk} = E(\gamma_{jk} \mid y, \theta) = \frac{\alpha_k \phi(y_j \mid \theta_k)}{\sum_{k=1}^{K} \alpha_k \phi(y_j \mid \theta_k)}, \quad j = 1, 2, \cdots, N; \; k = 1, 2, \cdots, K$$
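The responsibility formula can be evaluated directly for a single observation. A small Python sketch (parameter values are illustrative):

```python
import math

def gaussian_pdf(y, mu, sigma):
    """Density phi(y | theta_k) of N(mu, sigma^2)."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def responsibilities(y, alphas, mus, sigmas):
    """Return [gamma_hat_{j1}, ..., gamma_hat_{jK}] for one observation y."""
    weighted = [a * gaussian_pdf(y, m, s) for a, m, s in zip(alphas, mus, sigmas)]
    total = sum(weighted)  # the denominator sum_k alpha_k * phi(y | theta_k)
    return [w / total for w in weighted]

# An observation near the first component gets almost all the responsibility.
gamma = responsibilities(0.2, [0.5, 0.5], [0.0, 4.0], [1.0, 1.0])
```

By construction the responsibilities for each observation are nonnegative and sum to 1.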
Substituting $\hat\gamma_{jk}$ and $n_k = \sum_{j=1}^{N} \hat\gamma_{jk}$ back in gives

$$Q(\theta, \theta^{(i)}) = \sum_{k=1}^{K} \left\{ n_k \log \alpha_k + \sum_{j=1}^{N} \hat\gamma_{jk} \left[ \log \frac{1}{\sqrt{2\pi}} - \log \sigma_k - \frac{(y_j - \mu_k)^2}{2\sigma_k^2} \right] \right\}$$

- Determine the M-step
The M-step of the iteration maximizes $Q(\theta, \theta^{(i)})$ over $\theta$, yielding the parameters for the next iteration:

$$\theta^{(i+1)} = \arg\max_{\theta} Q(\theta, \theta^{(i)})$$

Writing the components of $\theta^{(i+1)}$ as $\hat\mu_k, \hat\sigma_k^2, \hat\alpha_k$, $k = 1, 2, \cdots, K$:
$$\hat\mu_k = \frac{\sum_{j=1}^{N} \hat\gamma_{jk} \, y_j}{\sum_{j=1}^{N} \hat\gamma_{jk}}, \quad k = 1, 2, \cdots, K$$

$$\hat\sigma_k^2 = \frac{\sum_{j=1}^{N} \hat\gamma_{jk} \, (y_j - \hat\mu_k)^2}{\sum_{j=1}^{N} \hat\gamma_{jk}}, \quad k = 1, 2, \cdots, K$$

$$\hat\alpha_k = \frac{n_k}{N} = \frac{\sum_{j=1}^{N} \hat\gamma_{jk}}{N}, \quad k = 1, 2, \cdots, K$$
- Algorithm
- Input: observations $y_1, y_2, \cdots, y_N$; the Gaussian mixture model
- Output: the Gaussian mixture model parameters
- Procedure
- Initialize the parameters and begin iterating
- E-step: using the current model parameters, compute the responsibility of component $k$ for observation $y_j$:

$$\hat\gamma_{jk} = E(\gamma_{jk} \mid y, \theta) = \frac{\alpha_k \phi(y_j \mid \theta_k)}{\sum_{k=1}^{K} \alpha_k \phi(y_j \mid \theta_k)}, \quad j = 1, 2, \cdots, N; \; k = 1, 2, \cdots, K$$

- M-step: compute the model parameters for the next iteration
$$\hat\mu_k = \frac{\sum_{j=1}^{N} \hat\gamma_{jk} \, y_j}{\sum_{j=1}^{N} \hat\gamma_{jk}}, \quad \hat\sigma_k^2 = \frac{\sum_{j=1}^{N} \hat\gamma_{jk} \, (y_j - \hat\mu_k)^2}{\sum_{j=1}^{N} \hat\gamma_{jk}}, \quad \hat\alpha_k = \frac{n_k}{N} = \frac{\sum_{j=1}^{N} \hat\gamma_{jk}}{N}, \quad k = 1, 2, \cdots, K$$

- Repeat the two steps above until convergence.
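Putting the E-step and M-step together, the whole procedure can be sketched in NumPy. The synthetic data, initial values, and iteration count below are illustrative, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1-D data from two known components, so the fit is checkable.
y = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(5.0, 1.0, 700)])
N, K = len(y), 2

# Initialization: equal weights, spread-out means, unit variances.
alpha = np.full(K, 1.0 / K)
mu = np.array([y.min(), y.max()])
sigma2 = np.ones(K)

for _ in range(200):
    # E-step: responsibilities gamma[j, k] of component k for y_j.
    dens = np.exp(-(y[:, None] - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    gamma = alpha * dens
    gamma /= gamma.sum(axis=1, keepdims=True)
    # M-step: closed-form updates from the formulas above.
    nk = gamma.sum(axis=0)
    mu = (gamma * y[:, None]).sum(axis=0) / nk
    sigma2 = (gamma * (y[:, None] - mu) ** 2).sum(axis=0) / nk
    alpha = nk / N
```

On this well-separated data the estimated means land near the true values 0 and 5, and the weights near 0.3 and 0.7; in practice EM is sensitive to initialization and a convergence test on the log-likelihood would replace the fixed iteration count.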
Extensions of the EM Algorithm
The EM algorithm can also be interpreted as a maximization-maximization algorithm for the $F$ function; several variants and extensions are based on this interpretation.
The Maximization-Maximization Algorithm for the F Function
- The F function
Let $\tilde P(Z)$ be a probability distribution over the latent data $Z$. Define the following function of the distribution $\tilde P$ and the parameter $\theta$:

$$F(\tilde P, \theta) = E_{\tilde P}\left[\log P(Y, Z \mid \theta)\right] + H(\tilde P)$$

called the $F$ function, where $H(\tilde P) = -E_{\tilde P} \log \tilde P(Z)$ is the entropy of the distribution $\tilde P(Z)$.
- Lemmas
- For fixed $\theta$, there is a unique distribution $\tilde P_\theta$ maximizing $F(\tilde P, \theta)$, given by

$$\tilde P_\theta(Z) = P(Z \mid Y, \theta)$$
Moreover, $\tilde P_\theta$ varies continuously with $\theta$.
- If $\tilde P_\theta(Z) = P(Z \mid Y, \theta)$, then

$$F(\tilde P_\theta, \theta) = \log P(Y \mid \theta)$$
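This identity can be checked directly by substituting $\tilde P_\theta(Z) = P(Z \mid Y, \theta)$ into the definition of $F$:

```latex
\begin{aligned}
F(\tilde P_\theta, \theta)
&= E_{\tilde P_\theta}\left[\log P(Y, Z \mid \theta)\right] - E_{\tilde P_\theta}\left[\log \tilde P_\theta(Z)\right] \\
&= E_{\tilde P_\theta}\left[\log \frac{P(Y, Z \mid \theta)}{P(Z \mid Y, \theta)}\right]
 = E_{\tilde P_\theta}\left[\log P(Y \mid \theta)\right]
 = \log P(Y \mid \theta)
\end{aligned}
```

using $P(Y, Z \mid \theta) = P(Z \mid Y, \theta)\,P(Y \mid \theta)$ and the fact that $\log P(Y \mid \theta)$ does not depend on $Z$.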
- Theorem
- Let $L(\theta) = \log P(Y \mid \theta)$ be the log-likelihood of the observed data, let $\theta^{(i)}$, $i = 1, 2, \cdots$ be the sequence of parameter estimates produced by the EM algorithm, and let $F(\tilde P, \theta)$ be as defined above. If $F(\tilde P, \theta)$ has a local maximum at $\tilde P^*$ and $\theta^*$, then $L(\theta)$ also has a local maximum at $\theta^*$. Similarly, if $F(\tilde P, \theta)$ attains its global maximum at $\tilde P^*$ and $\theta^*$, then $L(\theta)$ also attains its global maximum at $\theta^*$.
- One iteration of the EM algorithm can be realized by the maximization-maximization algorithm for the $F$ function
Let $\theta^{(i)}$ be the estimate of the parameter $\theta$ at the $i$-th iteration and $\tilde P^{(i)}$ the estimate of the distribution $\tilde P$ at the $i$-th iteration. The $(i+1)$-th iteration consists of two steps:
- For fixed $\theta^{(i)}$, find $\tilde P^{(i+1)}$ maximizing $F(\tilde P, \theta^{(i)})$;
- For fixed $\tilde P^{(i+1)}$, find $\theta^{(i+1)}$ maximizing $F(\tilde P^{(i+1)}, \theta)$.
GEM Algorithms
Algorithm 1
- Input: observed data; the $F$ function
- Output: model parameters
- Procedure:
- Initialize the parameter $\theta^{(0)}$ and begin iterating
- At the $(i+1)$-th iteration:
- With $\theta^{(i)}$ the current estimate of $\theta$ and $\tilde P^{(i)}$ the current estimate of $\tilde P$, find $\tilde P^{(i+1)}$ maximizing $F(\tilde P, \theta^{(i)})$ over $\tilde P$
- Find $\theta^{(i+1)}$ maximizing $F(\tilde P^{(i+1)}, \theta)$
- Repeat the previous step until convergence
Maximizing $Q(\theta, \theta^{(i)})$ exactly is sometimes difficult.
Algorithm 2
- Input: observed data; the $Q$ function
- Output: model parameters
- Procedure:
- Initialize the parameter $\theta^{(0)}$ and begin iterating
- At the $(i+1)$-th iteration:
- With $\theta^{(i)}$ the current estimate of the parameter $\theta$, compute

$$Q(\theta, \theta^{(i)}) = E_Z\left[\log P(Y, Z \mid \theta) \,\middle|\, Y, \theta^{(i)}\right] = \sum_Z P(Z \mid Y, \theta^{(i)}) \log P(Y, Z \mid \theta)$$

- Find $\theta^{(i+1)}$ such that

$$Q(\theta^{(i+1)}, \theta^{(i)}) > Q(\theta^{(i)}, \theta^{(i)})$$
- Repeat the previous step until convergence.
When the dimension of the parameter $\theta$ is $d$ ($d \ge 2$), a special GEM algorithm can be used.
Algorithm 3
- Input: observed data; the $Q$ function
- Output: model parameters
- Procedure:
- Initialize the parameter $\theta^{(0)} = (\theta_1^{(0)}, \theta_2^{(0)}, \cdots, \theta_d^{(0)})$ and begin iterating
- At the $(i+1)$-th iteration:
- With $\theta^{(i)} = (\theta_1^{(i)}, \theta_2^{(i)}, \cdots, \theta_d^{(i)})$ the current estimate of the parameter $\theta = (\theta_1, \theta_2, \cdots, \theta_d)$, compute

$$Q(\theta, \theta^{(i)}) = E_Z\left[\log P(Y, Z \mid \theta) \,\middle|\, Y, \theta^{(i)}\right] = \sum_Z P(Z \mid Y, \theta^{(i)}) \log P(Y, Z \mid \theta)$$

- Perform $d$ conditional maximizations:
- Holding $\theta_2^{(i)}, \cdots, \theta_d^{(i)}$ fixed, find $\theta_1^{(i+1)}$ maximizing $Q(\theta, \theta^{(i)})$
- With $\theta_1 = \theta_1^{(i+1)}$ and $\theta_j = \theta_j^{(i)}$, $j = 3, 4, \cdots, d$, find $\theta_2^{(i+1)}$ maximizing $Q(\theta, \theta^{(i)})$
- After $d$ such conditional maximizations, obtain $\theta^{(i+1)}$ satisfying

$$Q(\theta^{(i+1)}, \theta^{(i)}) > Q(\theta^{(i)}, \theta^{(i)})$$
- Repeat the previous step until convergence.
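The inner loop of Algorithm 3 is coordinate ascent on $Q(\theta, \theta^{(i)})$. A self-contained sketch: $Q$ here is an illustrative concave quadratic with cross terms (not a mixture-model $Q$), so one pass of $d$ conditional maximizations strictly increases $Q$ without reaching the joint maximum:

```python
import numpy as np

# Illustrative concave quadratic Q(theta) = b'theta - 0.5 theta'A theta.
A = np.array([[2.0, 0.8], [0.8, 1.0]])   # positive definite
b = np.array([1.0, -1.0])

def Q(theta):
    return b @ theta - 0.5 * theta @ A @ theta

theta = np.zeros(2)   # plays the role of theta^(i)
q_old = Q(theta)

# d conditional maximizations, one coordinate at a time.
new = theta.copy()
d = len(theta)
for j in range(d):
    # Maximize over coordinate j with the others fixed:
    # dQ/dtheta_j = b_j - sum_l A_jl theta_l = 0.
    rest = A[j] @ new - A[j, j] * new[j]
    new[j] = (b[j] - rest) / A[j, j]

q_new = Q(new)        # theta^(i+1): Q has strictly increased
```

Each coordinate update is a one-dimensional maximization in closed form; after the pass, $Q(\theta^{(i+1)}, \theta^{(i)}) > Q(\theta^{(i)}, \theta^{(i)})$ even though $\theta^{(i+1)}$ is not the joint maximizer, which is exactly the GEM condition.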
References
Li Hang, Statistical Learning Methods (《统计学习方法》)