Gaussian Mixture Model (GMM)


Definition

(Gaussian mixture model) A Gaussian mixture model is a probability distribution of the form
$$P(y \mid \theta) = \sum_{k=1}^K \alpha_k \,\phi(y \mid \theta_k)$$
where the $\alpha_k$ are mixing coefficients, with $\alpha_k \ge 0$ and $\sum_{k=1}^K \alpha_k = 1$, and $\phi(y \mid \theta_k)$ is the density of the $k$-th Gaussian component, with $\theta_k = (\mu_k, \sigma_k^2)$:
$$\phi(y \mid \theta_k) = \frac{1}{\sqrt{2\pi}\,\sigma_k}\exp\left(-\frac{(y-\mu_k)^2}{2\sigma_k^2}\right)$$
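As a quick illustration, the two formulas above translate directly into code. A minimal sketch (function names and the example parameters are my own, not from the text):

```python
import math

def gaussian_pdf(y, mu, sigma2):
    """phi(y | theta_k): density of N(mu, sigma^2) evaluated at y."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def gmm_pdf(y, alphas, mus, sigma2s):
    """P(y | theta) = sum_k alpha_k * phi(y | theta_k)."""
    return sum(a * gaussian_pdf(y, m, s2)
               for a, m, s2 in zip(alphas, mus, sigma2s))

# A two-component mixture with illustrative (made-up) parameters.
p = gmm_pdf(0.5, alphas=[0.4, 0.6], mus=[0.0, 3.0], sigma2s=[1.0, 2.0])
```

Because the weights are non-negative and sum to one, the mixture density is a convex combination of the component densities.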


Parameter Estimation via EM

Suppose the observations $y_1, y_2, \cdots, y_n$ are generated by a Gaussian mixture model $P(y \mid \theta) = \sum_{k=1}^K \alpha_k \phi(y \mid \theta_k)$ with parameters $\theta = (\alpha_1,\cdots,\alpha_K;\theta_1,\cdots,\theta_K)$.

1. Identify the latent variables and write the complete-data log-likelihood
A mixture model has several components, and we do not observe which component generated each observation. We therefore introduce latent variables $\gamma_{jk}$ defined by
$$\gamma_{jk} = \begin{cases} 1, & \text{the } j\text{-th observation was generated by the } k\text{-th component}\\ 0, & \text{otherwise} \end{cases}$$
The complete data are then $(y_j, \gamma_{j1},\cdots,\gamma_{jK})$, and the complete-data likelihood is:
$$\begin{aligned} P(y,\gamma \mid \theta) &= \prod_{j=1}^n P(y_j, \gamma_{j1},\cdots,\gamma_{jK}\mid \theta)\\ &= \prod_{j=1}^n \prod_{k=1}^K \left(\alpha_k \phi(y_j \mid \theta_k)\right)^{\gamma_{jk}}\\ &= \prod_{j=1}^n \prod_{k=1}^K \left(\alpha_k \frac{1}{\sqrt{2\pi}\,\sigma_k}\exp\left(-\frac{(y_j-\mu_k)^2}{2\sigma_k^2}\right)\right)^{\gamma_{jk}} \end{aligned}$$
Here the exponent $\gamma_{jk}$ selects exactly one factor per observation: for each $j$, only the component with $\gamma_{jk}=1$ contributes, and the others reduce to $1$.
The complete-data log-likelihood is therefore:
$$\begin{aligned} \log P(y,\gamma \mid \theta) &= \sum_{k=1}^K \sum_{j=1}^n \gamma_{jk}\log\left(\alpha_k \frac{1}{\sqrt{2\pi}\,\sigma_k}\exp\left(-\frac{(y_j-\mu_k)^2}{2\sigma_k^2}\right)\right)\\ &= \sum_{k=1}^K \log\alpha_k \sum_{j=1}^n \gamma_{jk} + \sum_{k=1}^K \sum_{j=1}^n \gamma_{jk}\log\left(\frac{1}{\sqrt{2\pi}\,\sigma_k}\exp\left(-\frac{(y_j-\mu_k)^2}{2\sigma_k^2}\right)\right)\\ &= \sum_{k=1}^K \log\alpha_k \sum_{j=1}^n \gamma_{jk} + \sum_{k=1}^K \sum_{j=1}^n \gamma_{jk}\left[\log\frac{1}{\sqrt{2\pi}} - \log\sigma_k - \frac{(y_j-\mu_k)^2}{2\sigma_k^2}\right] \end{aligned}$$

2. E-step: determine the Q function
$$\begin{aligned} Q(\theta,\theta^{(i)}) &= E\left[\log P(y,\gamma \mid \theta)\mid y,\theta^{(i)}\right]\\ &= E\left\{\sum_{k=1}^K \log\alpha_k \sum_{j=1}^n \gamma_{jk} + \sum_{k=1}^K \sum_{j=1}^n \gamma_{jk}\left[\log\frac{1}{\sqrt{2\pi}} - \log\sigma_k - \frac{(y_j-\mu_k)^2}{2\sigma_k^2}\right]\right\}\\ &= \sum_{k=1}^K \left\{\log\alpha_k \sum_{j=1}^n E[\gamma_{jk}] + \sum_{j=1}^n E[\gamma_{jk}]\left[\log\frac{1}{\sqrt{2\pi}} - \log\sigma_k - \frac{(y_j-\mu_k)^2}{2\sigma_k^2}\right]\right\} \end{aligned}$$
Let $\hat\gamma_{jk} = E(\gamma_{jk}\mid y,\theta)$. Then:
$$\begin{aligned} \hat\gamma_{jk} &= 1\cdot P(\gamma_{jk}=1 \mid y,\theta) + 0\cdot P(\gamma_{jk}=0 \mid y,\theta)\\ &= \frac{P(\gamma_{jk}=1, y_j \mid \theta)}{P(y_j \mid \theta)}\\ &= \frac{P(\gamma_{jk}=1, y_j \mid \theta)}{\sum_{k=1}^K P(\gamma_{jk}=1, y_j \mid \theta)}\\ &= \frac{P(y_j \mid \gamma_{jk}=1, \theta)\,P(\gamma_{jk}=1 \mid \theta)}{\sum_{k=1}^K P(y_j \mid \gamma_{jk}=1, \theta)\,P(\gamma_{jk}=1 \mid \theta)}\\ &= \frac{\alpha_k \phi(y_j \mid \theta_k)}{\sum_{k=1}^K \alpha_k \phi(y_j \mid \theta_k)} \end{aligned}$$
$\hat\gamma_{jk}$ is the probability, under the current model, that the $j$-th observation came from the $k$-th component; it is called the responsibility of component $k$ for observation $y_j$.
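The responsibility formula above can be sketched directly in code (a minimal version; function and variable names are my own):

```python
import math

def responsibilities(ys, alphas, mus, sigma2s):
    """E-step: gamma_hat[j][k] = alpha_k*phi(y_j|theta_k) / sum_k' alpha_k'*phi(y_j|theta_k')."""
    def phi(y, mu, s2):
        # Univariate Gaussian density.
        return math.exp(-(y - mu) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)

    gamma = []
    for y in ys:
        w = [a * phi(y, m, s2) for a, m, s2 in zip(alphas, mus, sigma2s)]
        z = sum(w)                     # normalizer: the mixture density at y
        gamma.append([wk / z for wk in w])
    return gamma
```

By construction each row of responsibilities sums to 1, and an observation close to one component's mean gets most of its weight from that component.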

3. M-step
The M-step of each iteration maximizes $Q(\theta, \theta^{(i)})$ with respect to $\theta$:
$$\theta^{(i+1)} = \arg\max_{\theta}\; Q(\theta,\theta^{(i)})$$
Taking the partial derivative of $Q$ with respect to each parameter ($\mu_k, \sigma_k^2, \alpha_k$) and setting it to zero (for $\alpha_k$, under the constraint $\sum_{k=1}^K \alpha_k = 1$, handled with a Lagrange multiplier) gives the updated parameters:
$$\begin{aligned} \hat\mu_k &= \frac{\sum_{j=1}^n \hat\gamma_{jk}\, y_j}{\sum_{j=1}^n \hat\gamma_{jk}}\\ \hat\sigma_k^2 &= \frac{\sum_{j=1}^n \hat\gamma_{jk}\,(y_j-\hat\mu_k)^2}{\sum_{j=1}^n \hat\gamma_{jk}}\\ \hat\alpha_k &= \frac{\sum_{j=1}^n \hat\gamma_{jk}}{n} \end{aligned}$$
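The three closed-form updates can be sketched as follows, assuming the responsibilities `gamma[j][k]` from the E-step are given (names are my own):

```python
def m_step(ys, gamma):
    """M-step: closed-form updates for mu_k, sigma_k^2, alpha_k from responsibilities."""
    n = len(ys)
    K = len(gamma[0])
    # Effective number of points assigned to each component: sum_j gamma[j][k].
    nk = [sum(gamma[j][k] for j in range(n)) for k in range(K)]
    # Weighted means.
    mus = [sum(gamma[j][k] * ys[j] for j in range(n)) / nk[k] for k in range(K)]
    # Weighted variances around the updated means.
    sigma2s = [sum(gamma[j][k] * (ys[j] - mus[k]) ** 2 for j in range(n)) / nk[k]
               for k in range(K)]
    # Mixing weights.
    alphas = [nk[k] / n for k in range(K)]
    return alphas, mus, sigma2s
```

With hard (0/1) responsibilities, these updates reduce to the per-cluster sample mean, sample variance, and cluster proportion.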

4. Iterate until the parameters converge.

Algorithm
Input: observations $Y$ and a Gaussian mixture model;
Output: the Gaussian mixture model parameters.
(1) Initialize the parameters;
(2) E-step: compute the responsibility of each component for each observation;
(3) M-step: compute the next round of parameter estimates;
(4) Repeat (2) and (3) until the parameters converge.


Extension

The EM algorithm can be interpreted as a maximization-maximization procedure on the F function, and it generalizes to the generalized expectation maximization (GEM) algorithm.


Reference:

[1] 李航 (Li Hang), 《统计学习方法》 (Statistical Learning Methods)
