电信保温杯笔记: "Statistical Learning Methods (2nd Edition)" by Li Hang, Chapter 9: The EM Algorithm and Its Extensions

Papers

EM algorithm: "Maximum Likelihood from Incomplete Data via the EM Algorithm" (Dempster, Laird & Rubin, 1977)
GEM algorithm: "A View of the EM Algorithm That Justifies Incremental, Sparse, and Other Variants" (Neal & Hinton, 1998)

Introduction

电信保温杯笔记: "Statistical Learning Methods (2nd Edition)" by Li Hang (series index)
This series is a close reading of the book. It includes many screenshots of the original text, along with detailed explanations and rewrites of the places where the book is terse.

(screenshots from the book)

The EM Algorithm

Core idea: define a function $Q(\theta, \theta^{(i)})$. Find the $\theta$ that maximizes $Q(\theta, \theta^{(i)})$ and take it as $\theta^{(i+1)}$; then substitute $\theta^{(i+1)}$ into $Q$ and again maximize $Q(\theta, \theta^{(i+1)})$ over $\theta$. Iterating this way climbs toward a (local) maximum of the observed-data likelihood $P(Y \mid \theta)$.

(screenshot from the book)

The math-basics section of this series' Chapter 4 notes (naive Bayes) explains maximum a posteriori estimation and maximum likelihood estimation.

In the example below, the outcome of coin A (i.e., whether coin B or coin C gets tossed) is the hidden variable, and the recorded heads/tails are the observations. Each observation is produced by coin B or coin C alone, and which of the two is used depends only on A; $\pi, p, q$ are the model parameters.

Example

(screenshots of the book's three-coin example)
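The three-coin example can be run end to end in a few lines. Below is a minimal sketch: the observations and initial values follow Example 9.1 in the book, while the variable names are my own.

```python
import numpy as np

# Three-coin model EM (Example 9.1 in the book).
# y: observed heads (1) / tails (0); the hidden variable is which of B/C was tossed.
y = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 1], dtype=float)

pi, p, q = 0.5, 0.5, 0.5          # initial guesses theta^(0)
for _ in range(100):
    # E-step: posterior probability mu_j that observation y_j came from coin B
    b = pi * p**y * (1 - p)**(1 - y)
    c = (1 - pi) * q**y * (1 - q)**(1 - y)
    mu = b / (b + c)
    # M-step: closed-form maximizers of Q(theta, theta^(i))
    pi = mu.mean()
    p = (mu * y).sum() / mu.sum()
    q = ((1 - mu) * y).sum() / (1 - mu).sum()

print(pi, p, q)   # pi=0.5, p=0.6, q=0.6, matching the book
```

Starting from $\pi = p = q = 0.5$, the algorithm reaches a fixed point after a single step, giving $\pi = 0.5$, $p = 0.6$, $q = 0.6$ as in the book; a different initialization can converge to a different local maximum, which illustrates EM's sensitivity to the starting point.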

Steps

(screenshot from the book)

$$
\begin{aligned}
Q(\theta,\theta^{(i)}) &= E_Z[\log P(Y,Z\mid\theta)\mid Y,\theta^{(i)}] \\
&= \sum_Z P(Z\mid Y,\theta^{(i)})\,\log P(Y,Z\mid\theta) \qquad (9.9)
\end{aligned}
$$
(screenshot from the book)

Derivation

(screenshot from the book)

Applying Jensen's inequality $\log \sum_j \lambda_j y_j \ge \sum_j \lambda_j \log y_j$ (valid since $\lambda_j \ge 0$ and $\sum_j \lambda_j = 1$) with

$$
\lambda_j = P(Z\mid Y,\theta^{(i)}), \qquad y_j = \frac{P(Y\mid Z,\theta)\,P(Z\mid\theta)}{P(Z\mid Y,\theta^{(i)})}
$$

gives

$$
\begin{aligned}
L(\theta) - L(\theta^{(i)})
&= \log\left(\sum_Z P(Z\mid Y,\theta^{(i)})\,\frac{P(Y\mid Z,\theta)\,P(Z\mid\theta)}{P(Z\mid Y,\theta^{(i)})}\right) - \log P(Y\mid\theta^{(i)}) \\
&\ge \sum_Z P(Z\mid Y,\theta^{(i)})\log\frac{P(Y\mid Z,\theta)\,P(Z\mid\theta)}{P(Z\mid Y,\theta^{(i)})} - \log P(Y\mid\theta^{(i)}) \\
&= \sum_Z P(Z\mid Y,\theta^{(i)})\log\frac{P(Y\mid Z,\theta)\,P(Z\mid\theta)}{P(Z\mid Y,\theta^{(i)})} - \sum_Z P(Z\mid Y,\theta^{(i)})\log P(Y\mid\theta^{(i)}) \\
&= \sum_Z P(Z\mid Y,\theta^{(i)})\log\frac{P(Y\mid Z,\theta)\,P(Z\mid\theta)}{P(Z\mid Y,\theta^{(i)})\,P(Y\mid\theta^{(i)})}
\end{aligned}
$$

The third line uses $\sum_Z P(Z\mid Y,\theta^{(i)}) = 1$.
(screenshots from the book)

$$
\begin{aligned}
Q(\theta,\theta^{(i)}) &= E_Z[\log P(Y,Z\mid\theta)\mid Y,\theta^{(i)}] \\
&= \sum_Z P(Z\mid Y,\theta^{(i)})\,\log P(Y,Z\mid\theta) \qquad (9.9)
\end{aligned}
$$
(screenshots from the book)
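The derivation above is what guarantees that each EM step can only increase the observed-data log-likelihood $L(\theta)$. A quick numerical sanity check on the three-coin model (a sketch; the starting point here is arbitrary):

```python
import numpy as np

# Three-coin observations and an arbitrary (asymmetric) starting point.
y = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 1], dtype=float)
pi, p, q = 0.4, 0.6, 0.7

def loglik(pi, p, q):
    """Incomplete-data log-likelihood log P(Y | theta) for the three-coin model."""
    return np.log(pi * p**y * (1 - p)**(1 - y)
                  + (1 - pi) * q**y * (1 - q)**(1 - y)).sum()

lls = [loglik(pi, p, q)]
for _ in range(20):
    # E-step: posterior that each observation came from coin B
    b = pi * p**y * (1 - p)**(1 - y)
    c = (1 - pi) * q**y * (1 - q)**(1 - y)
    mu = b / (b + c)
    # M-step
    pi = mu.mean()
    p = (mu * y).sum() / mu.sum()
    q = ((1 - mu) * y).sum() / (1 - mu).sum()
    lls.append(loglik(pi, p, q))

# L(theta) never decreases from one iteration to the next (up to float noise).
print(all(d >= -1e-12 for d in np.diff(lls)))   # True
```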

Convergence

This section can arguably be skipped; feel free to jump straight to the GEM algorithm.

(screenshots from the book)

The GEM Algorithm

(screenshot from the book)

Gaussian Mixture Models

(screenshot from the book)

$\alpha_k$ and $\sigma_k$ control each Gaussian component's shape: how tall/thin or short/wide it is.
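As a quick illustration (a sketch of my own, not from the book): evaluating the mixture density $P(y\mid\theta)=\sum_k \alpha_k\,\phi(y\mid\theta_k)$ and checking that it still integrates to 1:

```python
import numpy as np

def gmm_pdf(y, alphas, mus, sigmas):
    """Mixture density P(y | theta) = sum_k alpha_k * N(y | mu_k, sigma_k^2)."""
    y = np.asarray(y, dtype=float)[..., None]        # broadcast over the K components
    comp = np.exp(-(y - mus) ** 2 / (2 * sigmas ** 2)) / (np.sqrt(2 * np.pi) * sigmas)
    return (alphas * comp).sum(axis=-1)

# A tall narrow component plus a short wide one.
alphas = np.array([0.3, 0.7])
mus = np.array([-2.0, 3.0])
sigmas = np.array([0.5, 2.0])

# Any valid mixture density integrates to 1; check with a Riemann sum.
grid = np.linspace(-15.0, 15.0, 20001)
area = gmm_pdf(grid, alphas, mus, sigmas).sum() * (grid[1] - grid[0])
print(round(area, 3))   # 1.0
```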

Estimating the Parameters of a Gaussian Mixture Model with EM

(screenshot from the book)

1. Identify the hidden variables and write the complete-data log-likelihood

(screenshot from the book)

$$
\begin{aligned}
P(y,\gamma\mid\theta) &= \prod_{j=1}^N P(y_j,\gamma_{j1},\gamma_{j2},\cdots,\gamma_{jK}\mid\theta) \\
&= \prod_{j=1}^N\prod_{k=1}^K [\alpha_k\,\phi_k(y_j\mid\theta_k)]^{\gamma_{jk}} \\
&= \prod_{k=1}^K\prod_{j=1}^N \alpha_k^{\gamma_{jk}}\,[\phi_k(y_j\mid\theta_k)]^{\gamma_{jk}} \\
&= \prod_{k=1}^K \alpha_k^{\sum_{j=1}^N\gamma_{jk}} \prod_{j=1}^N [\phi_k(y_j\mid\theta_k)]^{\gamma_{jk}}
\end{aligned}
$$
The complete-data log-likelihood is therefore

$$
\begin{aligned}
\log P(y,\gamma\mid\theta)
&= \log\left[\prod_{k=1}^K \alpha_k^{\sum_{j=1}^N\gamma_{jk}} \prod_{j=1}^N [\phi_k(y_j\mid\theta_k)]^{\gamma_{jk}}\right] \\
&= \sum_{k=1}^K\left\{\sum_{j=1}^N\gamma_{jk}\log\alpha_k + \sum_{j=1}^N\gamma_{jk}\left[\log\frac{1}{\sqrt{2\pi}} - \log\sigma_k - \frac{1}{2\sigma_k^2}(y_j-\mu_k)^2\right]\right\}
\end{aligned}
$$
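The last equality can be sanity-checked numerically: for any one-hot $\gamma$, the expanded sum should equal $\log\prod_{j,k}[\alpha_k\,\phi_k(y_j\mid\theta_k)]^{\gamma_{jk}}$ computed directly. A sketch with made-up parameters:

```python
import numpy as np

# Hypothetical parameters and a random one-hot gamma, just to check the algebra.
rng = np.random.default_rng(1)
N, K = 5, 2
y = rng.normal(size=N)
gamma = np.eye(K)[rng.integers(0, K, size=N)]    # gamma_jk in {0, 1}, one 1 per row
alphas = np.array([0.4, 0.6])
mus = np.array([-1.0, 2.0])
sigmas = np.array([0.8, 1.5])

phi = np.exp(-(y[:, None] - mus) ** 2 / (2 * sigmas ** 2)) / (np.sqrt(2 * np.pi) * sigmas)

# Direct form: log prod_{j,k} [alpha_k phi_k(y_j)]^{gamma_jk}
direct = np.log(((alphas * phi) ** gamma).prod())
# Expanded form from the equation above
expanded = ((gamma.sum(axis=0) * np.log(alphas)).sum()
            + (gamma * (np.log(1 / np.sqrt(2 * np.pi))
                        - np.log(sigmas)
                        - (y[:, None] - mus) ** 2 / (2 * sigmas ** 2))).sum())
print(np.isclose(direct, expanded))   # True
```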

2. The E-step of the EM algorithm: determine the Q function

Recall (9.9):

$$
\begin{aligned}
Q(\theta,\theta^{(i)}) &= E_Z[\log P(Y,Z\mid\theta)\mid Y,\theta^{(i)}] \\
&= \sum_Z P(Z\mid Y,\theta^{(i)})\,\log P(Y,Z\mid\theta) \qquad (9.9)
\end{aligned}
$$

Here $Z = \gamma$ and $Y = y$, so

$$
\begin{aligned}
Q(\theta,\theta^{(i)}) &= E[\log P(y,\gamma\mid\theta)\mid y,\theta^{(i)}] \\
&= E\left\{\sum_{k=1}^K\left\{\sum_{j=1}^N\gamma_{jk}\log\alpha_k + \sum_{j=1}^N\gamma_{jk}\left[\log\frac{1}{\sqrt{2\pi}} - \log\sigma_k - \frac{1}{2\sigma_k^2}(y_j-\mu_k)^2\right]\right\}\right\} \\
&= \sum_{k=1}^K\left\{\log\alpha_k\sum_{j=1}^N E\gamma_{jk} + \sum_{j=1}^N E\gamma_{jk}\left[\log\frac{1}{\sqrt{2\pi}} - \log\sigma_k - \frac{1}{2\sigma_k^2}(y_j-\mu_k)^2\right]\right\} \qquad (9.28)
\end{aligned}
$$

Here we need $E(\gamma_{jk}\mid y,\theta^{(i)})$, written $\hat\gamma_{jk}$:

$$
\begin{aligned}
\hat\gamma_{jk} &= E(\gamma_{jk}\mid y,\theta^{(i)}) = P(\gamma_{jk}=1\mid y,\theta^{(i)})\cdot 1 + P(\gamma_{jk}=0\mid y,\theta^{(i)})\cdot 0 \\
&= P(\gamma_{jk}=1\mid y,\theta^{(i)}) \\
&= \frac{P(\gamma_{jk}=1,\,y_j\mid\theta^{(i)})}{P(y_j\mid\theta^{(i)})} \\
&= \frac{P(\gamma_{jk}=1,\,y_j\mid\theta^{(i)})}{\sum_{k=1}^K P(\gamma_{jk}=1,\,y_j\mid\theta^{(i)})} \quad \text{(the events in the denominator are mutually exclusive)} \\
&= \frac{P(\gamma_{jk}=1\mid\theta^{(i)})\,P(y_j\mid\gamma_{jk}=1,\theta^{(i)})}{\sum_{k=1}^K P(\gamma_{jk}=1\mid\theta^{(i)})\,P(y_j\mid\gamma_{jk}=1,\theta^{(i)})} \\
&= \frac{\alpha_k^{(i)}\,\phi(y_j\mid\theta_k^{(i)})}{\sum_{k=1}^K \alpha_k^{(i)}\,\phi(y_j\mid\theta_k^{(i)})}, \qquad j=1,2,\cdots,N;\; k=1,2,\cdots,K
\end{aligned}
$$
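In code, $\hat\gamma_{jk}$ is one vectorized expression. A sketch (the function and variable names are mine):

```python
import numpy as np

def e_step(y, alphas, mus, sigmas):
    """Responsibilities: gamma_hat[j, k] = alpha_k phi(y_j|theta_k) / sum_k' alpha_k' phi(y_j|theta_k')."""
    y = np.asarray(y, dtype=float)[:, None]          # shape (N, 1) to broadcast over K
    phi = np.exp(-(y - mus) ** 2 / (2 * sigmas ** 2)) / (np.sqrt(2 * np.pi) * sigmas)
    weighted = alphas * phi                          # numerator alpha_k phi(y_j | theta_k)
    return weighted / weighted.sum(axis=1, keepdims=True)

gamma = e_step([-2.1, -1.9, 3.2, 2.8],
               alphas=np.array([0.5, 0.5]),
               mus=np.array([-2.0, 3.0]),
               sigmas=np.array([1.0, 1.0]))
print(gamma.sum(axis=1))   # each row sums to 1
```

Each row of `gamma` is a probability distribution over the $K$ components, so samples near $\mu_1 = -2$ get almost all of their weight on component 1 and vice versa.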

$$
n_k = \sum_{j=1}^N \hat\gamma_{jk} = \sum_{j=1}^N E\gamma_{jk}
$$
(screenshot from the book)

3. The M-step of the EM algorithm

The parameters to update are $\theta_k = (\alpha_k, \mu_k, \sigma_k)$.

(screenshot from the book)

$$
\begin{aligned}
\frac{\partial Q(\theta,\theta^{(i)})}{\partial\mu_k}
&= \frac{\partial}{\partial\mu_k}\sum_{k=1}^K\left\{ n_k\log\alpha_k + \sum_{j=1}^N\hat\gamma_{jk}\left[\log\frac{1}{\sqrt{2\pi}} - \log\sigma_k - \frac{1}{2\sigma_k^2}(y_j-\mu_k)^2\right]\right\} \\
&= \sum_{j=1}^N\hat\gamma_{jk}\left(-\frac{1}{2\sigma_k^2}\cdot 2(\mu_k-y_j)\right) \\
&= \frac{1}{\sigma_k^2}\sum_{j=1}^N\hat\gamma_{jk}(y_j-\mu_k) \\
&= \frac{1}{\sigma_k^2}\left(\sum_{j=1}^N\hat\gamma_{jk}\,y_j - \mu_k\sum_{j=1}^N\hat\gamma_{jk}\right) = 0
\end{aligned}
$$
$$
\hat\mu_k = \mu_k^{(i+1)} = \frac{\sum_{j=1}^N\hat\gamma_{jk}\,y_j}{\sum_{j=1}^N\hat\gamma_{jk}}, \qquad k=1,2,\cdots,K
$$
For $\sigma_k^2$, write $-\log\sigma_k = -\tfrac{1}{2}\log\sigma_k^2$ and differentiate with respect to $\sigma_k^2$:
$$
\begin{aligned}
\frac{\partial Q(\theta,\theta^{(i)})}{\partial\sigma_k^2}
&= \frac{\partial}{\partial\sigma_k^2}\sum_{k=1}^K\left\{ n_k\log\alpha_k + \sum_{j=1}^N\hat\gamma_{jk}\left[\log\frac{1}{\sqrt{2\pi}} - \frac{1}{2}\log\sigma_k^2 - \frac{1}{2\sigma_k^2}(y_j-\mu_k)^2\right]\right\} \\
&= \sum_{j=1}^N\hat\gamma_{jk}\left(-\frac{1}{2\sigma_k^2} + \frac{1}{2\sigma_k^4}(y_j-\mu_k)^2\right) \\
&= -\frac{1}{2\sigma_k^4}\sum_{j=1}^N\hat\gamma_{jk}\left(\sigma_k^2 - (y_j-\mu_k)^2\right) \\
&= -\frac{1}{2\sigma_k^4}\left(\sigma_k^2\sum_{j=1}^N\hat\gamma_{jk} - \sum_{j=1}^N\hat\gamma_{jk}(y_j-\mu_k)^2\right) = 0
\end{aligned}
$$
$$
\hat\sigma_k^2 = \sigma_k^{2\,(i+1)} = \frac{\sum_{j=1}^N\hat\gamma_{jk}(y_j-\mu_k)^2}{\sum_{j=1}^N\hat\gamma_{jk}}, \qquad k=1,2,\cdots,K
$$
(screenshot from the book)

To update $\alpha_k$ under the constraint $\sum_{k=1}^K\alpha_k = 1$, use a Lagrangian:

$$
\begin{aligned}
\frac{\partial}{\partial\alpha_k}\left\{Q(\theta,\theta^{(i)}) + \lambda\Big(1-\sum_{k=1}^K\alpha_k\Big)\right\} &= \frac{n_k}{\alpha_k} - \lambda = 0 \\
\frac{\partial}{\partial\lambda}\left\{Q(\theta,\theta^{(i)}) + \lambda\Big(1-\sum_{k=1}^K\alpha_k\Big)\right\} &= 1 - \sum_{k=1}^K\alpha_k = 0
\end{aligned}
$$

$$
\hat\alpha_k = \alpha_k^{(i+1)} = \frac{n_k}{\lambda} = \frac{n_k}{\sum_{k=1}^K n_k} = \frac{n_k}{N} = \frac{\sum_{j=1}^N\hat\gamma_{jk}}{N}, \qquad k=1,2,\cdots,K
$$
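The three closed-form updates above make the M-step one short function. A sketch (names are mine):

```python
import numpy as np

def m_step(y, gamma):
    """One M-step: the closed-form maximizers derived above.
    y: (N,) samples; gamma: (N, K) responsibilities gamma_hat_jk."""
    nk = gamma.sum(axis=0)                                        # n_k = sum_j gamma_hat_jk
    mus = (gamma * y[:, None]).sum(axis=0) / nk                   # mu_k: weighted mean
    sigmas2 = (gamma * (y[:, None] - mus) ** 2).sum(axis=0) / nk  # sigma_k^2: weighted variance
    alphas = nk / len(y)                                          # alpha_k = n_k / N
    return mus, sigmas2, alphas

# With hard (0/1) responsibilities the updates reduce to per-cluster
# sample means, variances, and proportions:
y = np.array([0.0, 2.0, 10.0, 12.0])
gamma = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
mus, sigmas2, alphas = m_step(y, gamma)
print(mus, sigmas2, alphas)  # means [1, 11], variances [1, 1], weights [0.5, 0.5]
```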
(screenshot from the book)

Steps

(screenshots of the book's EM algorithm for Gaussian mixtures)
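Putting the E-step and M-step together gives the full EM loop for a one-dimensional two-component mixture. A sketch on synthetic data (the true parameters $\alpha = (0.3, 0.7)$, $\mu = (-2, 3)$ are my own choices):

```python
import numpy as np

# Synthetic 1-D data from a known two-component mixture:
# 30% from N(-2, 0.7^2), 70% from N(3, 1^2).
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(-2.0, 0.7, 300), rng.normal(3.0, 1.0, 700)])

# Deliberately rough initial parameters.
alphas = np.array([0.5, 0.5])
mus = np.array([-1.0, 1.0])
sigmas = np.array([1.0, 1.0])

for _ in range(200):
    # E-step: responsibilities gamma_hat[j, k]
    phi = np.exp(-(y[:, None] - mus) ** 2 / (2 * sigmas ** 2)) / (np.sqrt(2 * np.pi) * sigmas)
    gamma = alphas * phi
    gamma /= gamma.sum(axis=1, keepdims=True)
    # M-step: closed-form updates derived above
    nk = gamma.sum(axis=0)
    mus = (gamma * y[:, None]).sum(axis=0) / nk
    sigmas = np.sqrt((gamma * (y[:, None] - mus) ** 2).sum(axis=0) / nk)
    alphas = nk / len(y)

print(np.sort(mus), np.sort(alphas))  # should land near (-2, 3) and (0.3, 0.7)
```

The recovered parameters come back sorted because a mixture is only identifiable up to a permutation of its components; EM may converge with the component labels swapped.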

Extensions of the EM Algorithm

(screenshot from the book)

The Maximization-Maximization Algorithm for the F Function

(screenshots from the book)

GEM Algorithms

GEM algorithm 1

(screenshot from the book)

GEM algorithm 2

(screenshots from the book)

GEM algorithm 3

(screenshots from the book)

Chapter Summary

(screenshot from the book)

Remarks

I have not yet worked through the extensions of the EM algorithm in detail; I will come back to that section when I need it.

Related videos

Related notes

hktxt/Learn-Statistical-Learning-Method

Related code

Dod-o/Statistical-Learning-Method_Code
On `def loadData(mu0, sigma0, mu1, sigma1, alpha0, alpha1):`
It generates each sample $y_j$ from the $k$-th Gaussian with probability $\alpha_k$; unlike the earlier chapters' models, it does not use an image dataset.
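The generation scheme described above might look like the following. This is a hypothetical reconstruction: the signature mirrors the repo's `loadData`, but the body is my own sketch, not the repo's actual code.

```python
import numpy as np

def load_data(mu0, sigma0, mu1, sigma1, alpha0, alpha1, n=1000, seed=0):
    """Hypothetical sketch: draw each y_j from Gaussian 0 with probability
    alpha0, otherwise from Gaussian 1 (assumes alpha0 + alpha1 = 1)."""
    rng = np.random.default_rng(seed)
    use0 = rng.random(n) < alpha0                    # latent component choice per sample
    return np.where(use0,
                    rng.normal(mu0, sigma0, n),
                    rng.normal(mu1, sigma1, n))

data = load_data(-2.0, 0.5, 3.0, 1.0, 0.3, 0.7)
print(len(data))   # 1000
```

The sample mean should be near $\alpha_0\mu_0 + \alpha_1\mu_1 = 0.3\cdot(-2) + 0.7\cdot 3 = 1.5$, which is a cheap check that the mixing proportions are being respected.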

On `def calcGauss(dataSetArr, mu, sigmod):`
It computes the Gaussian density $\phi(y_j \mid \theta_k)$ for every sample.

