GMM
Definition
A Gaussian mixture model (GMM) has a probability distribution of the form:
$$
P(y\mid \theta) = \sum_{k=1}^K \alpha_k \,\phi(y \mid \theta_k)
$$
where the $\alpha_k$ are mixing coefficients satisfying $\alpha_k \ge 0$ and $\sum_{k=1}^{K} \alpha_k = 1$;
$\phi(y \mid \theta_k)$ is the density of the $k$-th Gaussian component, with $\theta_k = (\mu_k, \sigma_k^2)$:
$$
\phi(y \mid \theta_k) = \frac{1}{\sqrt{2 \pi}\,\sigma_k}\exp\left(-\frac{(y-\mu_k)^2}{2\sigma_k^2}\right)
$$
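The mixture density above can be evaluated directly. A minimal NumPy sketch (the function names and the two-component parameter values below are illustrative, not from the text):

```python
import numpy as np

def gaussian_pdf(y, mu, sigma2):
    """Univariate Gaussian density phi(y | mu, sigma^2)."""
    return np.exp(-(y - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def gmm_pdf(y, alphas, mus, sigma2s):
    """Mixture density P(y | theta) = sum_k alpha_k * phi(y | theta_k)."""
    return sum(a * gaussian_pdf(y, m, s2)
               for a, m, s2 in zip(alphas, mus, sigma2s))

# Illustrative two-component mixture: weights are nonnegative and sum to 1.
alphas, mus, sigma2s = [0.3, 0.7], [0.0, 5.0], [1.0, 2.0]
print(gmm_pdf(0.0, alphas, mus, sigma2s))
```

Because the weights sum to 1 and each component integrates to 1, the mixture is itself a valid density.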
Parameter estimation via EM
Assume the observations $y_1, y_2, \cdots, y_n$ are generated by a Gaussian mixture model $P(y\mid \theta) = \sum_{k=1}^K \alpha_k \phi(y \mid \theta_k)$, with parameters $\theta = (\alpha_1,\cdots,\alpha_K;\ \theta_1,\cdots,\theta_K)$.
1. Identify the latent variables and write the complete-data log-likelihood
A mixture model has several components, and we do not observe which component generated each observation. We therefore introduce latent variables $\gamma_{jk}$ defined by:
$$
\gamma_{jk}=\begin{cases} 1, & \text{if observation } j \text{ was generated by component } k\\ 0, & \text{otherwise} \end{cases}
$$
The complete data are thus $(y_j, \gamma_{j1}, \cdots, \gamma_{jK})$ for $j = 1, \cdots, n$, and the complete-data likelihood is:
$$
\begin{aligned}
P(y,\gamma \mid \theta) &= \prod_{j=1}^n P(y_j,\gamma_{j1},\cdots,\gamma_{jK}\mid \theta)\\
&=\prod_{j=1}^n \prod_{k=1}^K \left(\alpha_k \phi(y_j \mid \theta_k)\right)^{\gamma_{jk}}\\
&=\prod_{j=1}^n \prod_{k=1}^K \left(\alpha_k \frac{1}{\sqrt{2 \pi}\,\sigma_k}\exp\left(-\frac{(y_j-\mu_k)^2}{2\sigma_k^2}\right)\right)^{\gamma_{jk}}
\end{aligned}
$$
The complete-data log-likelihood is then:
$$
\begin{aligned}
\log P(y,\gamma \mid \theta) &= \sum_{k=1}^K \sum_{j=1}^n \gamma_{jk}\log \left(\alpha_k \frac{1}{\sqrt{2 \pi}\,\sigma_k}\exp\left(-\frac{(y_j-\mu_k)^2}{2\sigma_k^2}\right)\right)\\
&=\sum_{k=1}^K\sum_{j=1}^n \gamma_{jk}\log \alpha_k + \sum_{k=1}^K\sum_{j=1}^n \gamma_{jk}\log \left(\frac{1}{\sqrt{2 \pi}\,\sigma_k}\exp\left(-\frac{(y_j-\mu_k)^2}{2\sigma_k^2}\right)\right)\\
&=\sum_{k=1}^K \log \alpha_k \sum_{j=1}^n \gamma_{jk} + \sum_{k=1}^K\sum_{j=1}^n \gamma_{jk}\left[\log\frac{1}{\sqrt{2 \pi}} - \log\sigma_k - \frac{(y_j-\mu_k)^2}{2\sigma_k^2}\right]
\end{aligned}
$$
2. E-step: construct the Q function
$$
\begin{aligned}
Q(\theta,\theta^{(i)}) &= E\left[\log P(y,\gamma \mid \theta)\mid y,\theta^{(i)}\right]\\
&= E\left\{\sum_{k=1}^K \log \alpha_k \sum_{j=1}^n \gamma_{jk} + \sum_{k=1}^K\sum_{j=1}^n \gamma_{jk}\left[\log\frac{1}{\sqrt{2 \pi}} - \log\sigma_k - \frac{(y_j-\mu_k)^2}{2\sigma_k^2}\right]\right\}\\
&= \sum_{k=1}^K \left\{\log \alpha_k \sum_{j=1}^n E[\gamma_{jk}] + \sum_{j=1}^n E[\gamma_{jk}]\left[\log\frac{1}{\sqrt{2 \pi}} - \log\sigma_k - \frac{(y_j-\mu_k)^2}{2\sigma_k^2}\right]\right\}
\end{aligned}
$$
Writing $\hat \gamma_{jk} = E(\gamma_{jk}\mid y,\theta)$ for the conditional expectation, we have:
$$
\begin{aligned}
\hat \gamma_{jk} &= 1\cdot P(\gamma_{jk} = 1 \mid y,\theta)+ 0\cdot P(\gamma_{jk} = 0 \mid y,\theta)\\
&=\frac {P(\gamma_{jk} = 1, y_j\mid \theta)}{P(y_j \mid \theta)}\\
&=\frac {P(\gamma_{jk} = 1, y_j\mid \theta)}{\sum_{k=1}^K P(\gamma_{jk} = 1, y_j\mid \theta)}\\
&=\frac {P(y_j\mid \gamma_{jk} = 1, \theta)\,P(\gamma_{jk} = 1\mid \theta)}{\sum_{k=1}^K P(y_j\mid \gamma_{jk} = 1, \theta)\,P(\gamma_{jk} = 1\mid \theta)}\\
&=\frac{\alpha_k \,\phi(y_j\mid \theta_{k})}{\sum_{k=1}^K\alpha_k \,\phi(y_j\mid \theta_{k})}
\end{aligned}
$$
$\hat \gamma_{jk}$ is the probability, under the current model, that the $j$-th observation comes from the $k$-th component; it is called the responsibility of component $k$ for observation $y_j$.
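The responsibility formula translates into a few lines of NumPy. A sketch (the helper `gaussian_pdf` and the function names are our own, not from the text):

```python
import numpy as np

def gaussian_pdf(y, mu, sigma2):
    """Univariate Gaussian density phi(y | mu, sigma^2)."""
    return np.exp(-(y - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def e_step(y, alphas, mus, sigma2s):
    """Return the n x K responsibility matrix gamma_hat[j, k]."""
    y = np.asarray(y, dtype=float)
    # weighted[j, k] = alpha_k * phi(y_j | theta_k)
    weighted = np.stack(
        [a * gaussian_pdf(y, m, s2) for a, m, s2 in zip(alphas, mus, sigma2s)],
        axis=1,
    )
    # Normalize each row so responsibilities for y_j sum to 1 over k.
    return weighted / weighted.sum(axis=1, keepdims=True)
```

Each row of the returned matrix is a probability distribution over components for one observation.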
3. M-step
The M-step of each iteration maximizes $Q(\theta, \theta^{(i)})$ over $\theta$:
$$
\theta^{(i+1)} =\arg \underset{\theta}{\max} \; Q(\theta,\theta^{(i)})
$$
Setting the partial derivatives of $Q$ with respect to each parameter ($\mu_k$, $\sigma_k^2$, $\alpha_k$) to zero (for the weights, subject to the constraint $\sum_k \alpha_k = 1$) yields the updated parameters:
$$
\begin{aligned}
\hat \mu_k &= \frac{\sum_{j=1}^n \hat \gamma_{jk}\,y_j}{\sum_{j=1}^n\hat \gamma_{jk}}\\
\hat \sigma_k^2 &= \frac{\sum_{j=1}^n \hat \gamma_{jk}\,(y_j-\hat\mu_k)^2}{\sum_{j=1}^n\hat \gamma_{jk}}\\
\hat \alpha_k &= \frac{\sum_{j=1}^n \hat \gamma_{jk}}{n}
\end{aligned}
$$
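The three updates above vectorize naturally. A minimal sketch, assuming `gamma_hat` is the $n \times K$ responsibility matrix computed in the E-step:

```python
import numpy as np

def m_step(y, gamma_hat):
    """Closed-form updates for (mu_k, sigma2_k, alpha_k) from responsibilities."""
    y = np.asarray(y, dtype=float)
    gamma_hat = np.asarray(gamma_hat, dtype=float)
    nk = gamma_hat.sum(axis=0)              # effective count per component
    mus = gamma_hat.T @ y / nk              # responsibility-weighted means
    sigma2s = (gamma_hat * (y[:, None] - mus) ** 2).sum(axis=0) / nk
    alphas = nk / len(y)                    # fraction of mass per component
    return mus, sigma2s, alphas
```

With hard (0/1) responsibilities these reduce to the per-group sample mean, variance, and proportion, which is a useful sanity check.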
4. Repeat the E and M steps until the parameters converge.
Algorithm
Input: observations $Y$ and a Gaussian mixture model;
Output: the Gaussian mixture model parameters.
(1) Initialize the parameters;
(2) E-step: compute each component's responsibility for each observation;
(3) M-step: compute the model parameters for the next iteration;
(4) Repeat steps (2) and (3) until the model parameters converge.
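The four steps above can be combined into a complete EM loop. A minimal sketch; the quantile-based initialization in step (1) and the fixed iteration count are simple illustrative choices, not prescribed by the text:

```python
import numpy as np

def gmm_em(y, K, n_iter=200):
    """Fit a univariate K-component GMM to data y by EM."""
    y = np.asarray(y, dtype=float)
    # (1) Initialize: uniform weights, quantile-spread means, pooled variance.
    alphas = np.full(K, 1.0 / K)
    mus = np.quantile(y, (np.arange(K) + 0.5) / K)
    sigma2s = np.full(K, y.var())
    for _ in range(n_iter):
        # (2) E-step: responsibilities gamma[j, k] of component k for y_j.
        weighted = alphas / np.sqrt(2 * np.pi * sigma2s) * \
            np.exp(-(y[:, None] - mus) ** 2 / (2 * sigma2s))
        gamma = weighted / weighted.sum(axis=1, keepdims=True)
        # (3) M-step: closed-form updates derived above.
        nk = gamma.sum(axis=0)
        mus = gamma.T @ y / nk
        sigma2s = (gamma * (y[:, None] - mus) ** 2).sum(axis=0) / nk
        alphas = nk / len(y)
    # (4) In practice one would stop when the parameter change is small.
    return alphas, mus, sigma2s
```

On well-separated synthetic data this recovers the generating means, variances, and weights; like all EM runs, it converges only to a local maximum, so initialization matters.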
Extension
The EM algorithm can be interpreted as a maximization-maximization algorithm on the F function, and it generalizes to the generalized expectation maximization (GEM) algorithm.
Reference:
[1] Li Hang, Statistical Learning Methods (《统计学习方法》).