Statistical Learning Methods: The EM Algorithm and Its Extensions (Part 2)


Applying the EM Algorithm to Gaussian Mixture Model Learning

The Gaussian Mixture Model

A Gaussian mixture model is a probability distribution model of the following form:
$$P(y \mid \theta) = \sum_{k=1}^{K} \alpha_k \, \phi(y \mid \theta_k)$$
where the $\alpha_k > 0$ are mixing coefficients with $\sum_{k=1}^{K} \alpha_k = 1$, and $\phi(y \mid \theta_k)$ is the Gaussian density with parameters $\theta_k = (\mu_k, \sigma_k^2)$:
$$\phi(y \mid \theta_k) = \frac{1}{\sqrt{2\pi}\,\sigma_k} \exp\left( -\frac{(y - \mu_k)^2}{2\sigma_k^2} \right)$$
This density is called the $k$-th component model.
In a general mixture model, the Gaussian density can be replaced by an arbitrary probability density.
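As a quick check of these formulas, here is a minimal NumPy sketch that evaluates the mixture density; the two-component parameters in the example are made-up values chosen only for illustration:

```python
import numpy as np

def gaussian_pdf(y, mu, sigma2):
    """phi(y | theta_k): univariate Gaussian density with mean mu and variance sigma2."""
    return np.exp(-(y - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def gmm_pdf(y, alpha, mu, sigma2):
    """P(y | theta) = sum_k alpha_k * phi(y | theta_k)."""
    return sum(a * gaussian_pdf(y, m, s2) for a, m, s2 in zip(alpha, mu, sigma2))

# Example: a two-component mixture evaluated at one point.
print(gmm_pdf(0.5, alpha=[0.3, 0.7], mu=[0.0, 2.0], sigma2=[1.0, 0.5]))
```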

The EM Algorithm for Estimating Gaussian Mixture Model Parameters
  • Theory
    Suppose the observed data $y_1, y_2, \cdots, y_N$ are generated by a Gaussian mixture model:
    $$P(y \mid \theta) = \sum_{k=1}^{K} \alpha_k \, \phi(y \mid \theta_k)$$
    where $\theta = (\alpha_1, \alpha_2, \cdots, \alpha_K; \theta_1, \theta_2, \cdots, \theta_K)$. We estimate the parameters $\theta$ of the Gaussian mixture model with the EM algorithm.
    • Identify the latent variables and write the complete-data log-likelihood
      Imagine that each observation $y_j$, $j = 1, 2, \cdots, N$, is generated as follows: first the $k$-th component model $\phi(y \mid \theta_k)$ is selected with probability $\alpha_k$; then $y_j$ is drawn from that component's distribution. The observation $y_j$ itself is known, but which of the $K$ components it came from is not. We represent this unobserved information with a latent variable $\gamma_{jk}$, defined as:
      $$\gamma_{jk} = \begin{cases} 1, & \text{if the } j\text{-th observation comes from the } k\text{-th component} \\ 0, & \text{otherwise} \end{cases}$$
      for $j = 1, 2, \cdots, N$ and $k = 1, 2, \cdots, K$.
      With the observed data $y_j$ and the unobserved data $\gamma_{jk}$, the complete data are:
      $$(y_j, \gamma_{j1}, \gamma_{j2}, \cdots, \gamma_{jK}), \quad j = 1, 2, \cdots, N$$
      We can then write the complete-data likelihood:
      $$P(y, \gamma \mid \theta) = \prod_{j=1}^{N} P(y_j, \gamma_{j1}, \gamma_{j2}, \cdots, \gamma_{jK} \mid \theta) = \prod_{k=1}^{K} \alpha_k^{n_k} \prod_{j=1}^{N} \left[ \frac{1}{\sqrt{2\pi}\,\sigma_k} \exp\left( -\frac{(y_j - \mu_k)^2}{2\sigma_k^2} \right) \right]^{\gamma_{jk}}$$
      where $n_k = \sum_{j=1}^{N} \gamma_{jk}$ and $N = \sum_{k=1}^{K} n_k$.
      The complete-data log-likelihood is:
      $$\log P(y, \gamma \mid \theta) = \sum_{k=1}^{K} \left\{ n_k \log \alpha_k + \sum_{j=1}^{N} \gamma_{jk} \left[ \log \frac{1}{\sqrt{2\pi}} - \log \sigma_k - \frac{(y_j - \mu_k)^2}{2\sigma_k^2} \right] \right\}$$
    • E-step: determine the $Q$ function
      $$Q(\theta, \theta^{(i)}) = E\left[ \log P(y, \gamma \mid \theta) \mid y, \theta^{(i)} \right] = \sum_{k=1}^{K} \left\{ \sum_{j=1}^{N} (E\gamma_{jk}) \log \alpha_k + \sum_{j=1}^{N} (E\gamma_{jk}) \left[ \log \frac{1}{\sqrt{2\pi}} - \log \sigma_k - \frac{(y_j - \mu_k)^2}{2\sigma_k^2} \right] \right\}$$
      This requires computing $E(\gamma_{jk} \mid y, \theta)$, which we denote by $\hat{\gamma}_{jk}$:
      $$\hat{\gamma}_{jk} = E(\gamma_{jk} \mid y, \theta) = \frac{\alpha_k \, \phi(y_j \mid \theta_k)}{\sum_{l=1}^{K} \alpha_l \, \phi(y_j \mid \theta_l)}, \quad j = 1, 2, \cdots, N; \; k = 1, 2, \cdots, K$$
      Substituting this back, we obtain:
      $$Q(\theta, \theta^{(i)}) = \sum_{k=1}^{K} \left\{ n_k \log \alpha_k + \sum_{j=1}^{N} \hat{\gamma}_{jk} \left[ \log \frac{1}{\sqrt{2\pi}} - \log \sigma_k - \frac{(y_j - \mu_k)^2}{2\sigma_k^2} \right] \right\}$$
      where now $n_k = \sum_{j=1}^{N} \hat{\gamma}_{jk}$.
    • M-step
      The M-step of the iteration maximizes $Q(\theta, \theta^{(i)})$ with respect to $\theta$, which yields the model parameters for the next iteration:
      $$\theta^{(i+1)} = \arg\max_{\theta} Q(\theta, \theta^{(i)})$$
      Let $\hat{\mu}_k, \hat{\sigma}_k^2, \hat{\alpha}_k$, $k = 1, 2, \cdots, K$, denote the components of $\theta^{(i+1)}$:
      $$\hat{\mu}_k = \frac{\sum_{j=1}^{N} \hat{\gamma}_{jk} \, y_j}{\sum_{j=1}^{N} \hat{\gamma}_{jk}}, \quad k = 1, 2, \cdots, K$$
      $$\hat{\sigma}_k^2 = \frac{\sum_{j=1}^{N} \hat{\gamma}_{jk} \left( y_j - \hat{\mu}_k \right)^2}{\sum_{j=1}^{N} \hat{\gamma}_{jk}}, \quad k = 1, 2, \cdots, K$$
      $$\hat{\alpha}_k = \frac{n_k}{N} = \frac{\sum_{j=1}^{N} \hat{\gamma}_{jk}}{N}, \quad k = 1, 2, \cdots, K$$
  • Algorithm
    • Input: observed data $y_1, y_2, \cdots, y_N$; the Gaussian mixture model
    • Output: the Gaussian mixture model parameters
    • Procedure
      • Choose initial values for the parameters and begin iterating
      • E-step: using the current model parameters, compute the responsibility of component $k$ for observation $y_j$:
        $$\hat{\gamma}_{jk} = E(\gamma_{jk} \mid y, \theta) = \frac{\alpha_k \, \phi(y_j \mid \theta_k)}{\sum_{l=1}^{K} \alpha_l \, \phi(y_j \mid \theta_l)}, \quad j = 1, 2, \cdots, N; \; k = 1, 2, \cdots, K$$
      • M-step: compute the model parameters for the next iteration:
        $$\hat{\mu}_k = \frac{\sum_{j=1}^{N} \hat{\gamma}_{jk} \, y_j}{\sum_{j=1}^{N} \hat{\gamma}_{jk}}, \quad k = 1, 2, \cdots, K$$
        $$\hat{\sigma}_k^2 = \frac{\sum_{j=1}^{N} \hat{\gamma}_{jk} \left( y_j - \hat{\mu}_k \right)^2}{\sum_{j=1}^{N} \hat{\gamma}_{jk}}, \quad k = 1, 2, \cdots, K$$
        $$\hat{\alpha}_k = \frac{n_k}{N} = \frac{\sum_{j=1}^{N} \hat{\gamma}_{jk}}{N}, \quad k = 1, 2, \cdots, K$$
      • Repeat the previous two steps until convergence. (A NumPy sketch of the complete loop follows this list.)
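Putting the E-step and M-step together, below is a minimal, self-contained NumPy sketch of this algorithm for univariate data. The initialization strategy, the fixed iteration count, and all names are my own choices rather than part of the book's algorithm; a production version would also add a convergence test on the log-likelihood and guard against collapsing variances. The E-step is computed in log space to avoid underflow for points far from every component.

```python
import numpy as np

def em_gmm(y, K, n_iter=100, seed=0):
    """EM for a univariate K-component Gaussian mixture (a sketch of the algorithm above)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    N = len(y)
    # Initial parameter values: uniform weights, means sampled from the data,
    # and a shared variance equal to the overall data variance.
    alpha = np.full(K, 1.0 / K)
    mu = rng.choice(y, size=K, replace=False)
    sigma2 = np.full(K, y.var())
    for _ in range(n_iter):
        # E-step: responsibilities gamma[j, k], computed in log space for stability.
        log_phi = -0.5 * np.log(2 * np.pi * sigma2) - (y[:, None] - mu) ** 2 / (2 * sigma2)
        log_w = np.log(alpha) + log_phi
        log_w -= log_w.max(axis=1, keepdims=True)
        gamma = np.exp(log_w)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: the closed-form updates for mu_k, sigma_k^2, alpha_k.
        nk = gamma.sum(axis=0)
        mu = gamma.T @ y / nk
        sigma2 = (gamma * (y[:, None] - mu) ** 2).sum(axis=0) / nk
        alpha = nk / N
    return alpha, mu, sigma2

# Usage on synthetic data drawn from two known Gaussians.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(5.0, 0.5, 700)])
print(em_gmm(data, K=2))
```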

Extensions of the EM Algorithm

The EM algorithm can also be interpreted as a maximization-maximization algorithm on the $F$ function, and several variants and extensions follow from this interpretation.

The Maximization-Maximization Algorithm of the F Function
  • The F function
    Suppose the latent data $Z$ have probability distribution $\tilde{P}(Z)$. Define the following function $F(\tilde{P}, \theta)$ of the distribution $\tilde{P}$ and the parameter $\theta$:
    $$F(\tilde{P}, \theta) = E_{\tilde{P}}\left[ \log P(Y, Z \mid \theta) \right] + H(\tilde{P})$$
    This is called the $F$ function, where $H(\tilde{P}) = -E_{\tilde{P}} \log \tilde{P}(Z)$ is the entropy of the distribution $\tilde{P}(Z)$.
  • Lemmas
    • For fixed $\theta$, there is a unique distribution $\tilde{P}_{\theta}$ that maximizes $F(\tilde{P}, \theta)$, given by:
      $$\tilde{P}_{\theta}(Z) = P(Z \mid Y, \theta)$$
      Moreover, $\tilde{P}_{\theta}$ varies continuously with $\theta$.
    • If $\tilde{P}_{\theta}(Z) = P(Z \mid Y, \theta)$, then
      $$F(\tilde{P}_{\theta}, \theta) = \log P(Y \mid \theta)$$
      (both lemmas are checked numerically in the sketch after this section).
  • Theorems
    • Let $L(\theta) = \log P(Y \mid \theta)$ be the log-likelihood of the observed data, let $\theta^{(i)}, i = 1, 2, \cdots$, be the sequence of parameter estimates produced by the EM algorithm, and let $F(\tilde{P}, \theta)$ be defined as above. If $F(\tilde{P}, \theta)$ has a local maximum at $\tilde{P}^*$ and $\theta^*$, then $L(\theta)$ also has a local maximum at $\theta^*$. Similarly, if $F(\tilde{P}, \theta)$ attains its global maximum at $\tilde{P}^*$ and $\theta^*$, then $L(\theta)$ attains its global maximum at $\theta^*$.
    • One iteration of the EM algorithm can be realized as one maximization-maximization step on the $F$ function:
      Let $\theta^{(i)}$ be the estimate of the parameter $\theta$ at the $i$-th iteration and $\tilde{P}^{(i)}$ the estimate of the distribution $\tilde{P}$ at the $i$-th iteration. The $(i+1)$-th iteration then consists of two steps:
      • for fixed $\theta^{(i)}$, find $\tilde{P}^{(i+1)}$ that maximizes $F(\tilde{P}, \theta^{(i)})$;
      • for fixed $\tilde{P}^{(i+1)}$, find $\theta^{(i+1)}$ that maximizes $F(\tilde{P}^{(i+1)}, \theta)$.
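Both lemmas are easy to check numerically on a toy model. In the sketch below, $Z$ is binary, $Y$ is fixed, and the two joint probabilities are made-up numbers chosen only for illustration:

```python
import numpy as np

# Toy model: Z in {0, 1}, Y fixed; the joint P(Y, Z=z | theta) is just two numbers.
p_joint = np.array([0.12, 0.28])      # assumed values for illustration
log_lik = np.log(p_joint.sum())       # log P(Y | theta)

def F(p_tilde):
    """F(P~, theta) = E_{P~}[log P(Y, Z | theta)] + H(P~)."""
    p_tilde = np.asarray(p_tilde, dtype=float)
    return np.sum(p_tilde * np.log(p_joint)) - np.sum(p_tilde * np.log(p_tilde))

posterior = p_joint / p_joint.sum()   # P(Z | Y, theta), the maximizer from the lemma
print(np.isclose(F(posterior), log_lik))  # True: F(P~_theta, theta) = log P(Y | theta)
print(F([0.5, 0.5]) < F(posterior))       # True: any other P~ yields a smaller F
```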
GEM Algorithms
Algorithm 1
  • Input: observed data; the $F$ function
  • Output: model parameters
  • Procedure:
    • Initialize the parameter $\theta^{(0)}$ and begin iterating
    • At the $(i+1)$-th iteration:
      • with $\theta^{(i)}$ the current estimate of the parameter $\theta$ and $\tilde{P}^{(i)}$ the current estimate of the distribution $\tilde{P}$, find $\tilde{P}^{(i+1)}$ that maximizes $F(\tilde{P}, \theta^{(i)})$ over $\tilde{P}$
      • find $\theta^{(i+1)}$ that maximizes $F(\tilde{P}^{(i+1)}, \theta)$
    • Repeat the previous step until convergence

When maximizing $Q(\theta, \theta^{(i)})$ directly is difficult, the following algorithm can be used: it only requires a $\theta^{(i+1)}$ that increases $Q$.

Algorithm 2
  • Input: observed data; the $Q$ function
  • Output: model parameters
  • Procedure:
    • Initialize the parameter $\theta^{(0)}$ and begin iterating
    • At the $(i+1)$-th iteration:
      • with $\theta^{(i)}$ the current estimate of the parameter $\theta$, compute:
        $$Q(\theta, \theta^{(i)}) = E_Z\left[ \log P(Y, Z \mid \theta) \mid Y, \theta^{(i)} \right] = \sum_{Z} P(Z \mid Y, \theta^{(i)}) \log P(Y, Z \mid \theta)$$
      • find $\theta^{(i+1)}$ such that
        $$Q(\theta^{(i+1)}, \theta^{(i)}) > Q(\theta^{(i)}, \theta^{(i)})$$
    • Repeat the previous step until convergence. (One concrete way to realize the increase step is sketched below.)
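One generic way to obtain such a $\theta^{(i+1)}$ without solving the maximization is a single gradient-ascent step on $Q(\cdot, \theta^{(i)})$, sketched here for a two-component mixture whose weights and variances are held fixed. The fixed values, the learning rate, and the function names are assumptions of this illustration, not part of the algorithm statement:

```python
import numpy as np

# Fixed mixture weights and variances (assumed for this illustration);
# only the component means are updated.
alpha = np.array([0.5, 0.5])
sigma2 = np.array([1.0, 1.0])

def gem_step(y, mu, lr=0.1):
    """One GEM iteration: E-step, then a single gradient step that increases Q."""
    # E-step: responsibilities gamma[j, k] under the current means.
    phi = np.exp(-(y[:, None] - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    gamma = alpha * phi
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Generalized M-step: dQ/dmu_k = sum_j gamma[j, k] * (y_j - mu_k) / sigma_k^2.
    grad = (gamma * (y[:, None] - mu)).sum(axis=0) / sigma2
    # A small enough step gives Q(theta_new, theta_old) > Q(theta_old, theta_old).
    return mu + lr * grad

# Usage: one update of the means from a rough starting point.
y = np.array([-1.2, 0.3, 4.8, 5.1])
print(gem_step(y, mu=np.array([0.0, 4.0])))
```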

When the parameter $\theta$ has dimension $d$ ($d \ge 2$), a special GEM algorithm can be used that replaces the single maximization of the M-step with a sequence of conditional maximizations.

Algorithm 3
  • Input: observed data; the $Q$ function
  • Output: model parameters
  • Procedure:
    • Initialize the parameter $\theta^{(0)} = (\theta_1^{(0)}, \theta_2^{(0)}, \cdots, \theta_d^{(0)})$ and begin iterating
    • At the $(i+1)$-th iteration:
      • with $\theta^{(i)} = (\theta_1^{(i)}, \theta_2^{(i)}, \cdots, \theta_d^{(i)})$ the current estimate of the parameter $\theta = (\theta_1, \theta_2, \cdots, \theta_d)$, compute
        $$Q(\theta, \theta^{(i)}) = E_Z\left[ \log P(Y, Z \mid \theta) \mid Y, \theta^{(i)} \right] = \sum_{Z} P(Z \mid Y, \theta^{(i)}) \log P(Y, Z \mid \theta)$$
      • Perform $d$ conditional maximizations:
        • with $\theta_2^{(i)}, \cdots, \theta_d^{(i)}$ held fixed, find the $\theta_1^{(i+1)}$ that maximizes $Q(\theta, \theta^{(i)})$
        • with $\theta_1 = \theta_1^{(i+1)}$ and $\theta_j = \theta_j^{(i)}, j = 3, 4, \cdots, d$, held fixed, find the $\theta_2^{(i+1)}$ that maximizes $Q(\theta, \theta^{(i)})$
        • continuing in this way, after $d$ conditional maximizations we obtain $\theta^{(i+1)}$ such that
          $$Q(\theta^{(i+1)}, \theta^{(i)}) > Q(\theta^{(i)}, \theta^{(i)})$$
    • Repeat the previous step until convergence. (A sketch of the conditional maximizations for the Gaussian mixture model follows.)
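For the Gaussian mixture model above, a natural split of $\theta$ is into the block of means, the block of variances, and the block of weights; this split is my own choice of example, not prescribed by the algorithm. For this model each conditional maximizer has a closed form, which makes the scheme easy to verify:

```python
import numpy as np

def conditional_m_step(y, gamma):
    """Algorithm 3's update for a univariate GMM, as successive conditional maximizations.

    gamma is the (N, K) matrix of responsibilities from the E-step.
    """
    nk = gamma.sum(axis=0)
    # 1st conditional maximization: maximize Q over the means, variances held fixed.
    mu = gamma.T @ y / nk
    # 2nd conditional maximization: maximize Q over the variances, means held at mu.
    sigma2 = (gamma * (y[:, None] - mu) ** 2).sum(axis=0) / nk
    # 3rd conditional maximization: maximize Q over the weights (they must sum to 1).
    alpha = nk / len(y)
    return mu, sigma2, alpha
```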
References

Li Hang, 《统计学习方法》 (Statistical Learning Methods).
