A Detailed Explanation of the EM Algorithm

The EM Algorithm

The EM algorithm is a maximum likelihood estimation method for the parameters of probabilistic models that contain latent variables.

Let $Y$ denote the observed data, $Z$ the latent data, and $\theta$ the parameters to be estimated. $Y$ and $Z$ together are called the complete data, while the observed data $Y$ alone are called the incomplete data. If the distribution of $Y$ is $P(Y\mid\theta)$, then the log-likelihood of the incomplete data is $\log P(Y\mid\theta)$; if the joint distribution of $Y$ and $Z$ is $P(Y,Z\mid\theta)$, then the log-likelihood of the complete data is $\log P(Y,Z\mid\theta)$.

For a probabilistic model with latent variables, the goal is to maximize the log-likelihood of the observed data $Y$ with respect to the model parameters $\theta$, i.e., to maximize
$$L(\theta)=\log P(Y\mid\theta)=\log\Big(\sum_Z P(Y\mid Z,\theta)P(Z\mid\theta)\Big)$$
The difficulty with this expression is that it involves unobserved data and the logarithm of a sum. The EM algorithm maximizes $L(\theta)$ by successive approximation. Suppose the estimate of $\theta$ after the $i$-th iteration is $\theta^{(i)}$. We want the new estimate $\theta$ to increase $L(\theta)$, i.e., $L(\theta)>L(\theta^{(i)})$, and to approach the maximum step by step. To this end, consider the difference
$$L(\theta)-L(\theta^{(i)})=\log\Big(\sum_Z P(Y\mid Z,\theta)P(Z\mid\theta)\Big)-\log P(Y\mid\theta^{(i)})$$
Using Jensen's inequality,
$$\begin{aligned} L(\theta)-L(\theta^{(i)}) &=\log\Big(\sum_Z P(Z\mid Y,\theta^{(i)})\frac{P(Y\mid Z,\theta)P(Z\mid\theta)}{P(Z\mid Y,\theta^{(i)})}\Big)-\log P(Y\mid\theta^{(i)})\\ &\geq\sum_Z P(Z\mid Y,\theta^{(i)})\log\frac{P(Y\mid Z,\theta)P(Z\mid\theta)}{P(Z\mid Y,\theta^{(i)})}-\log P(Y\mid\theta^{(i)})\\ &=\sum_Z P(Z\mid Y,\theta^{(i)})\log\frac{P(Y\mid Z,\theta)P(Z\mid\theta)}{P(Z\mid Y,\theta^{(i)})}-\sum_Z P(Z\mid Y,\theta^{(i)})\log P(Y\mid\theta^{(i)})\\ &=\sum_Z P(Z\mid Y,\theta^{(i)})\log\frac{P(Y\mid Z,\theta)P(Z\mid\theta)}{P(Z\mid Y,\theta^{(i)})P(Y\mid\theta^{(i)})} \end{aligned}$$
(the third line uses $\sum_Z P(Z\mid Y,\theta^{(i)})=1$).

Define
$$B(\theta,\theta^{(i)})=L(\theta^{(i)})+\sum_Z P(Z\mid Y,\theta^{(i)})\log\frac{P(Y\mid Z,\theta)P(Z\mid\theta)}{P(Z\mid Y,\theta^{(i)})P(Y\mid\theta^{(i)})}$$
Then
$$L(\theta)\geq B(\theta,\theta^{(i)})$$
and $L(\theta^{(i)})=B(\theta^{(i)},\theta^{(i)})$. Therefore any $\theta$ that increases $B(\theta,\theta^{(i)})$ must also increase $L(\theta)$. To make the increase in $L(\theta)$ as large as possible, choose $\theta^{(i+1)}$ to maximize $B(\theta,\theta^{(i)})$:
$$\begin{aligned} \theta^{(i+1)} &=\arg\max\limits_{\theta}B(\theta,\theta^{(i)})\\ &=\arg\max\limits_{\theta}\Big(L(\theta^{(i)})+\sum_Z P(Z\mid Y,\theta^{(i)})\log\frac{P(Y\mid Z,\theta)P(Z\mid\theta)}{P(Z\mid Y,\theta^{(i)})P(Y\mid\theta^{(i)})}\Big)\\ &=\arg\max\limits_{\theta}\sum_Z P(Z\mid Y,\theta^{(i)})\log\big(P(Y\mid Z,\theta)P(Z\mid\theta)\big)\\ &=\arg\max\limits_{\theta}\sum_Z P(Z\mid Y,\theta^{(i)})\log P(Y,Z\mid\theta)\\ &=\arg\max\limits_{\theta}E_Z[\log P(Y,Z\mid\theta)\mid Y,\theta^{(i)}] \end{aligned}$$
(terms that do not depend on $\theta$ can be dropped inside the $\arg\max$).

Define the $Q$ function
$$Q(\theta,\theta^{(i)})=E_Z[\log P(Y,Z\mid\theta)\mid Y,\theta^{(i)}]$$
i.e., the expectation of the complete-data log-likelihood $\log P(Y,Z\mid\theta)$ with respect to the unobserved data $Z$, given the observed data $Y$ and the current parameters $\theta^{(i)}$. Therefore
$$\theta^{(i+1)}=\arg\max\limits_{\theta}Q(\theta,\theta^{(i)})$$
The EM algorithm procedure

Input: observed data $Y$, latent data $Z$, joint distribution $P(Y,Z\mid\theta)$, and conditional distribution $P(Z\mid Y,\theta)$

Output: model parameters $\theta$

(1) Choose initial parameter values $\theta^{(0)}$ and start iterating

(2) E-step: compute $Q(\theta,\theta^{(i)})=\sum_Z P(Z\mid Y,\theta^{(i)})\log P(Y,Z\mid\theta)$

(3) M-step: compute $\theta^{(i+1)}=\arg\max\limits_{\theta}Q(\theta,\theta^{(i)})$

(4) Repeat steps (2) and (3) until convergence. A sketch of this loop is given below.
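The following is a minimal sketch of this iteration in Python. The function names `e_step` and `m_step` and the parameter-change stopping rule are illustrative assumptions, not part of the algorithm's specification; any concrete model instantiates them with its own posterior computation and maximizer.

```python
import numpy as np

def em(y, theta0, e_step, m_step, tol=1e-6, max_iter=1000):
    """Generic EM loop (a sketch): e_step returns the posterior over the
    latent data under the current parameters, m_step maximizes Q."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        posterior = e_step(y, theta)        # E-step: compute P(Z | Y, theta^(i))
        theta_new = m_step(y, posterior)    # M-step: argmax_theta Q(theta, theta^(i))
        # Stop when the parameters barely change (one possible criterion;
        # monitoring log P(Y | theta) is another common choice).
        if np.max(np.abs(theta_new - theta)) < tol:
            return theta_new
        theta = theta_new
    return theta
```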

Gaussian Mixture Models

A Gaussian mixture distribution is a probability distribution of the form
$$P(y\mid\theta)=\sum_{k=1}^K\alpha_k\phi(y\mid\theta_k)$$
where the $\alpha_k$ are mixing coefficients satisfying $\sum_{k=1}^K\alpha_k=1$ and $\alpha_k\geq 0$, and $\phi(y\mid\theta_k)$ is the Gaussian density with parameters $\theta_k=(\mu_k,\sigma_k^2)$:
$$\phi(y\mid\theta_k)=\frac{1}{\sqrt{2\pi}\,\sigma_k}\exp\Big(-\frac{(y-\mu_k)^2}{2\sigma_k^2}\Big)$$
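As a quick sketch, the component density and the mixture density can be written directly from these two formulas (the function names are my own; `alpha`, `mu`, `sigma2` are length-$K$ arrays):

```python
import numpy as np

def gaussian_pdf(y, mu, sigma2):
    """Univariate Gaussian density phi(y | mu, sigma^2)."""
    return np.exp(-(y - mu) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

def mixture_pdf(y, alpha, mu, sigma2):
    """Mixture density P(y | theta) = sum_k alpha_k * phi(y | theta_k)."""
    y = np.atleast_1d(y)
    # Shape (N, K): density of each sample under each component.
    comp = gaussian_pdf(y[:, None], mu[None, :], sigma2[None, :])
    return comp @ alpha
```

For example, `mixture_pdf(0.0, np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0]))` evaluates an equal-weight two-component mixture at $y=0$.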
Assume samples are generated by the Gaussian mixture distribution: first select one Gaussian component according to $\alpha_1,\alpha_2,\dots,\alpha_K$, then generate the observation from the selected component. Here the observations are known, but which Gaussian each observation came from is unknown; this is represented by the latent variable $\gamma_{jk}$, defined as
$$\gamma_{jk}=\begin{cases}1, & \text{observation } j \text{ comes from component } k\\ 0, & \text{otherwise}\end{cases}\qquad j=1,2,\dots,N;\ k=1,2,\dots,K$$
The complete data are then
$$(y_j,\gamma_{j1},\gamma_{j2},\dots,\gamma_{jK}),\qquad j=1,2,\dots,N$$
so the complete-data likelihood is
$$\begin{aligned} P(\gamma,y\mid\theta) &=\prod_{j=1}^{N}P(y_j,\gamma_{j1},\gamma_{j2},\dots,\gamma_{jK}\mid\theta)\\ &=\prod_{j=1}^{N}\prod_{k=1}^{K}[\alpha_k\phi(y_j\mid\theta_k)]^{\gamma_{jk}}\\ &=\prod_{k=1}^{K}\alpha_k^{n_k}\prod_{j=1}^{N}[\phi(y_j\mid\theta_k)]^{\gamma_{jk}}\\ &=\prod_{k=1}^{K}\alpha_k^{n_k}\prod_{j=1}^{N}\Big[\frac{1}{\sqrt{2\pi}\,\sigma_k}\exp\Big(-\frac{(y_j-\mu_k)^2}{2\sigma_k^2}\Big)\Big]^{\gamma_{jk}} \end{aligned}$$
where $n_k=\sum_{j=1}^{N}\gamma_{jk}$ and $\sum_{k=1}^{K}n_k=N$. The complete-data log-likelihood is then
$$\log P(\gamma,y\mid\theta)=\sum_{k=1}^{K}n_k\log\alpha_k+\sum_{k=1}^{K}\sum_{j=1}^{N}\gamma_{jk}\Big(\log\frac{1}{\sqrt{2\pi}}-\log\sigma_k-\frac{(y_j-\mu_k)^2}{2\sigma_k^2}\Big)$$
The $Q$ function to be maximized is
$$\begin{aligned} Q(\theta,\theta^{(i)}) &=E_\gamma[\log P(\gamma,y\mid\theta)\mid y,\theta^{(i)}]\\ &=E_\gamma\Big\{\sum_{k=1}^{K}n_k\log\alpha_k+\sum_{k=1}^{K}\sum_{j=1}^{N}\gamma_{jk}\Big(\log\frac{1}{\sqrt{2\pi}}-\log\sigma_k-\frac{(y_j-\mu_k)^2}{2\sigma_k^2}\Big)\Big\}\\ &=\sum_{k=1}^{K}\Big\{\sum_{j=1}^{N}(E\gamma_{jk})\log\alpha_k+\sum_{j=1}^{N}(E\gamma_{jk})\Big(\log\frac{1}{\sqrt{2\pi}}-\log\sigma_k-\frac{(y_j-\mu_k)^2}{2\sigma_k^2}\Big)\Big\} \end{aligned}$$
This requires computing $E\gamma_{jk}=E(\gamma_{jk}\mid y,\theta)$:
$$\begin{aligned} E(\gamma_{jk}\mid y,\theta) &=P(\gamma_{jk}=1\mid y,\theta)\\ &=\frac{P(\gamma_{jk}=1,y_j\mid\theta)}{P(y_j\mid\theta)}\\ &=\frac{P(\gamma_{jk}=1,y_j\mid\theta)}{\sum_{k=1}^{K}P(\gamma_{jk}=1,y_j\mid\theta)}\\ &=\frac{P(y_j\mid\gamma_{jk}=1,\theta)P(\gamma_{jk}=1\mid\theta)}{\sum_{k=1}^{K}P(y_j\mid\gamma_{jk}=1,\theta)P(\gamma_{jk}=1\mid\theta)}\\ &=\frac{\alpha_k\phi(y_j\mid\theta_k)}{\sum_{k=1}^{K}\alpha_k\phi(y_j\mid\theta_k)} \end{aligned}$$
$E(\gamma_{jk}\mid y,\theta)$ is the probability, under the current parameters, that the $j$-th observation comes from the $k$-th mixture component; denote it by $\hat{\gamma}_{jk}$. Putting everything together,
$$Q(\theta,\theta^{(i)})=\sum_{k=1}^{K}\Big\{n_k\log\alpha_k+\sum_{j=1}^{N}\hat{\gamma}_{jk}\Big(\log\frac{1}{\sqrt{2\pi}}-\log\sigma_k-\frac{(y_j-\mu_k)^2}{2\sigma_k^2}\Big)\Big\}$$
where $n_k=\sum_{j=1}^{N}\hat{\gamma}_{jk}$. To maximize $Q(\theta,\theta^{(i)})$, we take its partial derivative with respect to each parameter; a sketch of the E-step computation of $\hat{\gamma}_{jk}$ comes first.
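Here is a minimal sketch of the E-step, computing the full $N\times K$ matrix of responsibilities $\hat{\gamma}_{jk}$; it reuses the hypothetical `gaussian_pdf` helper from the earlier sketch:

```python
def e_step(y, alpha, mu, sigma2):
    """Responsibilities: gamma_hat[j, k] = alpha_k * phi(y_j | theta_k),
    normalized over k -- exactly the formula derived above."""
    comp = gaussian_pdf(y[:, None], mu[None, :], sigma2[None, :])  # (N, K)
    weighted = alpha[None, :] * comp          # alpha_k * phi(y_j | theta_k)
    return weighted / weighted.sum(axis=1, keepdims=True)
```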

Partial derivative of $Q(\theta,\theta^{(i)})$ with respect to $\mu_k$:
$$\frac{\partial Q(\theta,\theta^{(i)})}{\partial\mu_k}=\sum_{j=1}^{N}\frac{\hat{\gamma}_{jk}(y_j-\mu_k)}{\sigma_k^2}$$
Setting it to zero gives
$$\hat{\mu}_k=\frac{\sum_{j=1}^{N}\hat{\gamma}_{jk}y_j}{\sum_{j=1}^{N}\hat{\gamma}_{jk}}$$
Partial derivative of $Q(\theta,\theta^{(i)})$ with respect to $\sigma_k^2$:
$$\frac{\partial Q(\theta,\theta^{(i)})}{\partial\sigma_k^2}=-\frac{1}{2}\sum_{j=1}^{N}\hat{\gamma}_{jk}\Big(\frac{1}{\sigma_k^2}-\frac{(y_j-\mu_k)^2}{\sigma_k^4}\Big)$$
Setting it to zero (and plugging in the updated mean $\hat{\mu}_k$) gives
$$\hat{\sigma}_k^2=\frac{\sum_{j=1}^{N}\hat{\gamma}_{jk}(y_j-\hat{\mu}_k)^2}{\sum_{j=1}^{N}\hat{\gamma}_{jk}}$$
Partial derivative of $Q(\theta,\theta^{(i)})$ with respect to $\alpha_k$: because of the constraint $\sum_{k=1}^{K}\alpha_k=1$, consider the Lagrangian of $Q(\theta,\theta^{(i)})$
$$L(\theta,\theta^{(i)})=Q(\theta,\theta^{(i)})+\lambda\Big(\sum_{k=1}^K\alpha_k-1\Big)$$
Taking the partial derivative,
$$\frac{\partial L(\theta,\theta^{(i)})}{\partial\alpha_k}=\frac{\sum_{j=1}^N\hat{\gamma}_{jk}}{\alpha_k}+\lambda$$
Setting it to zero and multiplying through by $\alpha_k$,
$$\sum_{j=1}^N\hat{\gamma}_{jk}+\lambda\alpha_k=0$$
To solve for $\lambda$, sum over all components:
$$\sum_{k=1}^K\sum_{j=1}^N\hat{\gamma}_{jk}+\lambda\sum_{k=1}^K\alpha_k=0$$
Since $\sum_{k=1}^K\hat{\gamma}_{jk}=1$ for each $j$, the first term equals $N$, and $\sum_{k=1}^K\alpha_k=1$, so $\lambda=-N$ and
$$\hat{\alpha}_k=\frac{\sum_{j=1}^{N}\hat{\gamma}_{jk}}{N}$$
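The three closed-form updates above constitute the entire M-step. A minimal sketch, with array shapes as in the E-step sketch (`gamma_hat` is the $N\times K$ responsibility matrix):

```python
def m_step(y, gamma_hat):
    """Closed-form M-step updates derived above."""
    n_k = gamma_hat.sum(axis=0)                       # n_k = sum_j gamma_hat[j, k]
    alpha = n_k / len(y)                              # alpha_hat_k = n_k / N
    mu = (gamma_hat * y[:, None]).sum(axis=0) / n_k   # weighted sample means
    # Variances use the freshly updated means mu_hat.
    sigma2 = (gamma_hat * (y[:, None] - mu[None, :]) ** 2).sum(axis=0) / n_k
    return alpha, mu, sigma2
```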

The multivariate Gaussian case is analogous: the update formulas are almost identical in form, but the computation is more involved, requiring matrix derivatives.

The fitted Gaussian mixture model also yields a clustering method: use the values of $\hat{\gamma}$ to decide which class each sample belongs to. Sample $y_j$ is assigned to class
$$\arg\max\limits_{k}\hat{\gamma}_{jk}$$
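Putting the sketches together on synthetic data (everything here, including the crude initialization and the fixed iteration budget, is an illustrative assumption; in practice one might initialize with k-means and stop once the log-likelihood plateaus):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data drawn from two known Gaussians, so the answer can be checked.
y = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 700)])

# Crude initialization: equal weights, extreme means, overall variance.
alpha = np.array([0.5, 0.5])
mu = np.array([y.min(), y.max()])
sigma2 = np.array([y.var(), y.var()])

for _ in range(100):                          # fixed iteration budget for brevity
    gamma_hat = e_step(y, alpha, mu, sigma2)  # E-step sketch from above
    alpha, mu, sigma2 = m_step(y, gamma_hat)  # M-step sketch from above

labels = gamma_hat.argmax(axis=1)             # cluster assignment argmax_k gamma_hat[j, k]
print(alpha, mu, sigma2)  # should approach (0.3, 0.7), (-2, 3), (0.25, 1)
```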
