GMM
- One Gaussian distribution per class
- $N(\mu_k,\Sigma_k)$
| | Supervised | Unsupervised | Semi-supervised |
|---|---|---|---|
| Objective | $L=\log p(X_l,Y_l\mid\theta)=\sum_{i=1}^l\log p(y_i\mid\theta)p(x_i\mid y_i,\theta)=\sum_{i=1}^l\log\alpha_{y_i}N(x_i\mid\theta_{y_i})$ | $p(x;\theta)=\prod_{i=1}^N\sum_{k=1}^K\pi_k N(x_i\mid\mu_k,\Sigma_k)$ | $L=\sum_{i=1}^l\log\alpha_{y_i}N(x_i\mid\theta_{y_i})+\sum_{i=l+1}^m\log\sum_{k=1}^N\alpha_k N(x_i\mid\theta_k)$ |
| E | Solved directly by differentiation | $\gamma(z_{ik})=\frac{\pi_k N(x_i\mid\mu_k,\Sigma_k)}{\sum_{k=1}^K\pi_k N(x_i\mid\mu_k,\Sigma_k)}$ | $\gamma_{ik}=p(y_i=k\mid x_i)=\frac{\alpha_k N(x_i\mid\theta_k)}{\sum_{k=1}^N\alpha_k N(x_i\mid\theta_k)}$ |
| M | $\mu_k=\frac{1}{l_k}\sum_{i\in D_l,y_i=k}x_i$, $\Sigma_k=\frac{1}{l_k}\sum_{i\in D_l,y_i=k}(x_i-\mu_k)(x_i-\mu_k)^T$, $\alpha_k=\frac{l_k}{m}$ | $\mu_k=\frac{\sum_i\gamma(z_{ik})x_i}{\sum_i\gamma(z_{ik})}$, $\pi_k=\frac{\sum_i\gamma(z_{ik})}{N}$, $\Sigma_k=\frac{\sum_i\gamma(z_{ik})(x_i-\mu_k)(x_i-\mu_k)^T}{\sum_i\gamma(z_{ik})}$ | $\mu_k=\frac{\sum_{i\in D_l,y_i=k}x_i+\sum_{i=l+1}^m\gamma_{ik}x_i}{l_k+\sum_{i=l+1}^m\gamma_{ik}}$, $\Sigma_k=\frac{\sum_{i\in D_l,y_i=k}(x_i-\mu_k)(x_i-\mu_k)^T+\sum_{i=l+1}^m\gamma_{ik}(x_i-\mu_k)(x_i-\mu_k)^T}{l_k+\sum_{i=l+1}^m\gamma_{ik}}$, $\alpha_k=\frac{l_k+\sum_{i=l+1}^m\gamma_{ik}}{m}$ |

Semi-supervised = supervised + unsupervised
Supervised
- Objective: $L=\log p(X_l,Y_l\mid\theta)=\sum_{i=1}^l\log p(y_i\mid\theta)p(x_i\mid y_i,\theta)$, where $\theta_k=\{\alpha_k,\mu_k,\Sigma_k\}$
- $=\sum_{i=1}^l\log\alpha_{y_i}N(x_i\mid\theta_{y_i})=\sum_{i=1}^l\big(\log\alpha_{y_i}-\frac{n}{2}\log(2\pi)-\frac{1}{2}\log|\Sigma_{y_i}|-\frac{1}{2}(x_i-\mu_{y_i})^T\Sigma_{y_i}^{-1}(x_i-\mu_{y_i})\big)$
- Differentiate directly to obtain the closed-form estimates
- $\mu_k=\frac{1}{l_k}\sum_{i\in D_l,y_i=k}x_i$, $\Sigma_k=\frac{1}{l_k}\sum_{i\in D_l,y_i=k}(x_i-\mu_k)(x_i-\mu_k)^T$, $\alpha_k=\frac{l_k}{m}$
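The closed-form estimates above are just per-class counts, means, and scatter matrices. A minimal numpy sketch (`supervised_mle` is a hypothetical helper name; labels are assumed to be integers in `{0..K-1}`):

```python
import numpy as np

def supervised_mle(X, y, K):
    """Closed-form MLE for class priors, means, and covariances."""
    m, d = X.shape
    alpha = np.zeros(K)          # alpha_k = l_k / m
    mu = np.zeros((K, d))        # mu_k = mean of class-k points
    Sigma = np.zeros((K, d, d))  # Sigma_k = class-k scatter / l_k
    for k in range(K):
        Xk = X[y == k]
        lk = len(Xk)
        alpha[k] = lk / m
        mu[k] = Xk.mean(axis=0)
        diff = Xk - mu[k]
        Sigma[k] = diff.T @ diff / lk
    return alpha, mu, Sigma
```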
Unsupervised
5.2 GMM (Gaussian Mixture Model) and EM
- Probabilistic interpretation: assume there are K clusters, each following a Gaussian distribution; pick a cluster k with probability $\pi_k$, then sample a point from its distribution; repeating this yields the observed data
- Likelihood of the N sample points $x$:
- $p(x;\theta)=\prod_{i=1}^N\sum_{k=1}^K\pi_k N(x_i\mid\mu_k,\Sigma_k)$, where $\sum_k\pi_k=1$, $0\leq\pi_k\leq 1$
- Introduce a latent variable $z$ indicating cluster membership, as a K-dimensional one-hot vector
- $p(z_k=1)=\pi_k$
- $p(x_i\mid z)=\prod_{k=1}^K N(x_i\mid\mu_k,\Sigma_k)^{z_k}$
- $p(x_i\mid z_k=1)=N(x_i\mid\mu_k,\Sigma_k)$
- $p(x_i)=\sum_z p(x_i\mid z)p(z)=\sum_{k=1}^K\pi_k N(x_i\mid\mu_k,\Sigma_k)$
- Responsibility (can be read as the degree to which $x_i$ belongs to cluster k):
- $\gamma(z_{ik})=p(z_{ik}=1\mid x_i)=\frac{p(z_{ik}=1)p(x_i\mid z_k=1)}{\sum_{k=1}^K p(z_{ik}=1)p(x_i\mid z_k=1)}=\frac{\pi_k N(x_i\mid\mu_k,\Sigma_k)}{\sum_{k=1}^K\pi_k N(x_i\mid\mu_k,\Sigma_k)}$
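The responsibility formula can be sketched directly with `scipy.stats.multivariate_normal` (the function name `responsibilities` and the parameter layout are assumptions for illustration):

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, pi, mu, Sigma):
    """gamma(z_ik) = pi_k N(x_i|mu_k,Sigma_k) / sum_k pi_k N(x_i|mu_k,Sigma_k)."""
    # unnormalized posterior: pi_k * N(x_i | mu_k, Sigma_k), shape (N, K)
    dens = np.stack([pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                     for k in range(len(pi))], axis=1)
    return dens / dens.sum(axis=1, keepdims=True)  # rows sum to 1
```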
Parameter learning: maximum likelihood via EM
- Maximum likelihood estimation
- Hard: the log contains a sum, so all the parameters are coupled
- Conditions satisfied when the likelihood is maximized: differentiate $\log P(x\mid\theta)$ with respect to $\mu_k$
- $0=-\sum_{i=1}^N\frac{\pi_k N(x_i\mid\mu_k,\Sigma_k)}{\sum_{k=1}^K\pi_k N(x_i\mid\mu_k,\Sigma_k)}\Sigma_k^{-1}(x_i-\mu_k)$
- $\mu_k=\frac{\sum_i\gamma(z_{ik})x_i}{\sum_i\gamma(z_{ik})}$
- $\pi_k=\frac{\sum_i\gamma(z_{ik})}{N}$
- $\Sigma_k=\frac{\sum_i\gamma(z_{ik})(x_i-\mu_k)(x_i-\mu_k)^T}{\sum_i\gamma(z_{ik})}$
- These are not closed-form solutions ($\gamma$ itself depends on the parameters) → EM
- E: given the current parameter estimates, compute the posterior $\gamma(z_{ik})=E(z_{ik})$
- M: given the posterior $\gamma(z_{ik})$, re-estimate the parameters $\mu_k, \pi_k, \Sigma_k$
- Iterate until convergence to a local optimum
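The E/M alternation above can be sketched end-to-end in numpy; this is a bare-bones illustration (function names are made up, no regularization or convergence test), with the log-likelihood tracked to show it never decreases:

```python
import numpy as np

def log_gauss(X, mu, Sigma):
    """Log density of N(mu, Sigma) at each row of X, via Cholesky."""
    d = X.shape[1]
    diff = X - mu
    L = np.linalg.cholesky(Sigma)
    sol = np.linalg.solve(L, diff.T)          # L^{-1} (x - mu)^T
    maha = (sol ** 2).sum(axis=0)             # Mahalanobis distances
    logdet = 2 * np.log(np.diag(L)).sum()
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def em_gmm(X, pi, mu, Sigma, iters=20):
    lls = []
    for _ in range(iters):
        # E-step: responsibilities gamma(z_ik), computed in log space
        logp = np.stack([np.log(pi[k]) + log_gauss(X, mu[k], Sigma[k])
                         for k in range(len(pi))], axis=1)
        mx = logp.max(axis=1, keepdims=True)
        lse = mx[:, 0] + np.log(np.exp(logp - mx).sum(axis=1))
        lls.append(lse.sum())                 # current log-likelihood
        gamma = np.exp(logp - lse[:, None])
        # M-step: re-estimate pi_k, mu_k, Sigma_k from gamma
        Nk = gamma.sum(axis=0)
        pi = Nk / len(X)
        mu = (gamma.T @ X) / Nk[:, None]
        Sigma = np.stack(
            [((gamma[:, k, None] * (X - mu[k])).T @ (X - mu[k])) / Nk[k]
             for k in range(len(pi))])
    return pi, mu, Sigma, lls
```

Iterating converges to a local optimum, so the result depends on the initialization of `mu` and `Sigma`.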
EM
- Generic EM
- Objective: maximize the log-likelihood $\log P(X\mid\theta)=\log\sum_z P(x,z\mid\theta)$
- Used for: log-likelihoods of incomplete data
- $z$ is unobserved; we only know its posterior distribution $P(z\mid x,\theta^{old})$
- Consider the expectation $Q(\theta,\theta^{old})=E_{p(z\mid x,\theta^{old})}(\log P(x,z\mid\theta))$
- Maximize the expectation: $\theta^{new}=\arg\max_\theta Q(\theta,\theta^{old})$
- E: compute $P(z\mid x,\theta^{old})$
- M: $\theta^{new}=\arg\max_\theta Q(\theta,\theta^{old})$
- Why does this seemingly heuristic procedure still maximize the likelihood $p(x;\theta)$? The convergence guarantee below shows that increasing $Q(\theta,\theta^{old})$ increases $\log p(x;\theta)$
- Comparison of complete and incomplete data
- Incomplete data:
- $\log p(x)=\sum_i\log\sum_z p(x_i\mid z)p(z)=\sum_i\log\sum_{k=1}^K\pi_k N(x_i\mid\mu_k,\Sigma_k)$
- With incomplete data the parameters are coupled (the sum sits inside the log), so no closed-form solution exists
- Complete data:
- $\log p(x,z\mid\theta)=\log p(z\mid\theta)p(x\mid z,\theta)=\sum_i\sum_k z_{ik}(\log\pi_k+\log N(x_i\mid\mu_k,\Sigma_k))$
- $E_z(\log p(x,z\mid\theta))=\sum_i\sum_k E(z_{ik})(\log\pi_k+\log N(x_i\mid\mu_k,\Sigma_k))=\sum_i\sum_k\gamma(z_{ik})(\log\pi_k+\log N(x_i\mid\mu_k,\Sigma_k))$
EM convergence guarantee
- Goal: maximize $P(x\mid\theta)=\sum_z p(x,z\mid\theta)$
- Optimizing $P(x\mid\theta)$ directly is hard, but optimizing the complete-data $p(x,z\mid\theta)$ is easy
- Proof
- Decomposition
- For any distribution $q(z)$, the following decomposition holds:
- $\ln p(x\mid\theta)=L(q,\theta)+KL(q\|p)$, where $L(q,\theta)=\sum_z q(z)\ln\frac{p(x,z\mid\theta)}{q(z)}$ and $KL(q\|p)=-\sum_z q(z)\ln\frac{p(z\mid x,\theta)}{q(z)}$; since $KL(q\|p)\geq 0$, $L(q,\theta)$ is a lower bound on $\ln p(x\mid\theta)$
- E: maximize $L(q,\theta)$ over $q$, which gives $q(z)=P(z\mid x,\theta^{old})$ (the KL term vanishes)
- M: the resulting lower bound $L(q,\theta)=\sum_z P(z\mid x,\theta^{old})\ln\frac{p(x,z\mid\theta)}{q(z)}=Q(\theta,\theta^{old})+\text{const}$, which is exactly the expectation above
- The lower bound increases at every step
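The decomposition can be verified numerically on a toy discrete model (the probabilities below are made up for illustration; $z\in\{0,1\}$, one observed $x$):

```python
import numpy as np

pz = np.array([0.3, 0.7])          # prior p(z)
px_given_z = np.array([0.2, 0.9])  # likelihood p(x | z) for the observed x
q = np.array([0.6, 0.4])           # an arbitrary distribution q(z)

pxz = pz * px_given_z              # joint p(x, z)
px = pxz.sum()                     # marginal p(x)
post = pxz / px                    # posterior p(z | x)

L = (q * np.log(pxz / q)).sum()        # lower bound L(q, theta)
KL = -(q * np.log(post / q)).sum()     # KL(q || p(z|x)) >= 0
assert np.isclose(np.log(px), L + KL)  # ln p(x) = L(q, theta) + KL(q || p)
assert KL >= 0
```

Setting `q = post` drives `KL` to zero, which is precisely the E-step choice $q(z)=P(z\mid x,\theta^{old})$.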
Semi-supervised
- Objective: $L=\log p(X_l,Y_l,X_u\mid\theta)=\sum_{i=1}^l\log p(y_i\mid\theta)p(x_i\mid y_i,\theta)+\sum_{i=l+1}^m\log\sum_{k=1}^N p(y_i=k\mid\theta)p(x_i\mid y_i=k,\theta)$, where $\theta_k=\{\alpha_k,\mu_k,\Sigma_k\}$
- $=\sum_{i=1}^l\log\alpha_{y_i}N(x_i\mid\theta_{y_i})+\sum_{i=l+1}^m\log\sum_{k=1}^N\alpha_k N(x_i\mid\theta_k)=\sum_{i=1}^l\big(\log\alpha_{y_i}-\frac{n}{2}\log(2\pi)-\frac{1}{2}\log|\Sigma_{y_i}|-\frac{1}{2}(x_i-\mu_{y_i})^T\Sigma_{y_i}^{-1}(x_i-\mu_{y_i})\big)+\sum_{i=l+1}^m\log\sum_{k=1}^N\alpha_k\frac{1}{(2\pi)^{n/2}|\Sigma_k|^{1/2}}\exp\{-\frac{1}{2}(x_i-\mu_k)^T\Sigma_k^{-1}(x_i-\mu_k)\}$
- E: compute $\gamma_{ik}=p(y_i=k\mid x_i)=\frac{\alpha_k N(x_i\mid\theta_k)}{\sum_{k=1}^N\alpha_k N(x_i\mid\theta_k)}$ for the unlabeled points
- M: $\mu_k=\frac{1}{l_k+\sum_{i=l+1}^m\gamma_{ik}}\big(\sum_{i\in D_l,y_i=k}x_i+\sum_{i=l+1}^m\gamma_{ik}x_i\big)$, $\Sigma_k=\frac{1}{l_k+\sum_{i=l+1}^m\gamma_{ik}}\big(\sum_{i\in D_l,y_i=k}(x_i-\mu_k)(x_i-\mu_k)^T+\sum_{i=l+1}^m\gamma_{ik}(x_i-\mu_k)(x_i-\mu_k)^T\big)$, $\alpha_k=\frac{l_k+\sum_{i=l+1}^m\gamma_{ik}}{m}$
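The semi-supervised M-step combines hard labeled counts with soft responsibilities on the unlabeled points. A sketch under assumed inputs (`semi_m_step` is a hypothetical name; `gamma` holds the E-step responsibilities for the unlabeled rows only):

```python
import numpy as np

def semi_m_step(Xl, yl, Xu, gamma, K):
    """M-step mixing labeled counts (hard) and responsibilities (soft)."""
    m = len(Xl) + len(Xu)
    d = Xl.shape[1]
    alpha = np.zeros(K)
    mu = np.zeros((K, d))
    Sigma = np.zeros((K, d, d))
    for k in range(K):
        Xk = Xl[yl == k]                       # labeled points of class k
        wk = len(Xk) + gamma[:, k].sum()       # l_k + sum_i gamma_ik
        alpha[k] = wk / m
        mu[k] = (Xk.sum(axis=0) + gamma[:, k] @ Xu) / wk
        dl = Xk - mu[k]
        du = Xu - mu[k]
        Sigma[k] = (dl.T @ dl + (gamma[:, k, None] * du).T @ du) / wk
    return alpha, mu, Sigma
```

With `gamma` all zeros this reduces to the supervised closed form; with no labeled data it reduces to the unsupervised GMM M-step.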