0 Preface
The Gaussian Mixture Model (GMM) is a widely used clustering algorithm. It models the data as a mixture of Gaussian distributions and fits the parameters with the Expectation-Maximization (EM) algorithm.
1 Single Gaussian Model
First, consider a one-dimensional random variable X. Its Gaussian probability density function is:
$$P(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \tag{1.1}$$
where $\mu$ is the mean and $\sigma^2$ is the variance. When the random variable $X$ is $D$-dimensional, the density generalizes to the multivariate Gaussian:
$$P(x;\mu,\Sigma) = \frac{1}{(2\pi)^{\frac{D}{2}} |\Sigma|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu)\right) \tag{1.2}$$
where $\mu$ is the mean vector, $\Sigma$ is the covariance matrix, and $D$ is the dimension of the sample $X$.
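As a quick sanity check of Eq. (1.2), here is a minimal numpy sketch (the function name `gaussian_pdf` is ours, not a library API):

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Multivariate Gaussian density N(x; mu, Sigma), following Eq. (1.2)."""
    D = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** D * np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / norm

# At the mean of a standard 2-D Gaussian the density equals 1/(2*pi)
val = gaussian_pdf(np.zeros(2), np.zeros(2), np.eye(2))
```

For the standard 2-D Gaussian, the value at the mean is $1/(2\pi) \approx 0.159$, which the sketch reproduces.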
For a single Gaussian model, if we know the training samples all come from one Gaussian, we can estimate its parameters by maximum likelihood. Suppose we have a sample set $D=\{x_1,x_2,\ldots,x_m\}$ with $x_i \in \mathbb{R}^d$, where the samples are drawn independently. From the density function, the likelihood of the sample set is:
$$L(\mu,\Sigma) = \prod_{i=1}^m P(x_i) = \prod_{i=1}^m \frac{1}{(2\pi)^{\frac{D}{2}} |\Sigma|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}(x_i-\mu)^T \Sigma^{-1}(x_i-\mu)\right) \tag{1.3}$$
Because each individual density value is small, the product quickly becomes vanishingly tiny, which is bad for both computation and inspection, so we work with the maximum log-likelihood instead:
$$l(\mu,\Sigma) = \log L(\mu,\Sigma) = \sum_{i=1}^m \left[-\frac{D}{2}\log 2\pi - \frac{1}{2}\log|\Sigma| - \frac{1}{2}(x_i-\mu)^T \Sigma^{-1}(x_i-\mu)\right] \tag{1.4}$$
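The underflow problem that motivates the log is easy to demonstrate; a small sketch with 2000 standard-normal samples (the sample size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=2000)

# Per-sample densities under N(0, 1); each value is at most ~0.399
dens = np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)

raw_product = np.prod(dens)      # underflows to exactly 0.0 for m = 2000
log_lik = np.sum(np.log(dens))   # stays finite and easy to compare
```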
Taking the partial derivative of the log-likelihood with respect to each parameter and setting the result to zero yields the estimates.
$$\frac{\partial l(\mu,\Sigma)}{\partial \mu} = \sum_{i=1}^m \Sigma^{-1}(x_i-\mu) = 0 \qquad (\Sigma \text{ is positive definite}) \\ m\mu = \sum_{i=1}^m x_i \;\Rightarrow\; \mu = \frac{1}{m}\sum_{i=1}^m x_i \tag{1.5}$$
Before differentiating with respect to $\Sigma$, recall the following matrix calculus identities:
- $tr[ABC] = tr[CAB] = tr[BCA]$
- $x^TAx = tr(x^TAx) = tr(xx^TA)$
- $\frac{\partial\, tr(AB)}{\partial A} = B^T$
- $\frac{\partial \log|A|}{\partial A} = A^{-T}$
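These identities can be spot-checked numerically; a sketch using random matrices and a one-entry finite difference (the step size `eps` is chosen ad hoc):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))
C = rng.normal(size=(3, 3))

# Cyclic property of the trace: tr[ABC] = tr[CAB]
t1 = np.trace(A @ B @ C)
t2 = np.trace(C @ A @ B)

# d tr(AB)/dA = B^T, checked on the (0,1) entry: the gradient there is B[1,0]
eps = 1e-6
A2 = A.copy()
A2[0, 1] += eps
num_grad = (np.trace(A2 @ B) - np.trace(A @ B)) / eps
```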
$$\begin{aligned} l(\mu,\Sigma) &= \sum_{i=1}^m -\frac{D}{2}\log 2\pi - \frac{1}{2}\log|\Sigma| - \frac{1}{2}(x_i-\mu)^T \Sigma^{-1}(x_i-\mu) \\ &= \sum_{i=1}^m -\frac{D}{2}\log 2\pi + \frac{1}{2}\log|\Sigma^{-1}| - \frac{1}{2}\, tr\left[(x_i-\mu)(x_i-\mu)^T \Sigma^{-1}\right] \end{aligned} \tag{1.6}$$
$$\frac{\partial l(\mu,\Sigma)}{\partial \Sigma^{-1}} = \frac{m}{2}\Sigma - \frac{1}{2}\sum_{i=1}^m (x_i-\mu)(x_i-\mu)^T = 0 \\ \Sigma = \frac{1}{m}\sum_{i=1}^m (x_i-\mu)(x_i-\mu)^T \tag{1.7}$$
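Equations (1.5) and (1.7) say the maximum likelihood estimates are just the sample mean and the (biased) sample covariance. A quick numpy check on synthetic data (the true parameters below are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(2)
true_mu = [1.0, -2.0]
true_Sigma = [[2.0, 0.5], [0.5, 1.0]]
X = rng.multivariate_normal(true_mu, true_Sigma, size=5000)

# Closed-form MLE from Eqs. (1.5) and (1.7)
mu_hat = X.mean(axis=0)
Sigma_hat = (X - mu_hat).T @ (X - mu_hat) / len(X)
```

With 5000 samples the estimates land close to the true parameters, and `Sigma_hat` agrees with `np.cov(X.T, bias=True)`.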
2 Gaussian Mixture Model
Suppose the random variable X is generated by a mixture of K Gaussian components (i.e., the data contain K clusters). The GMM density is:
$$P(x) = \sum_{k=1}^K p(k)\,p(x|k) = \sum_{k=1}^K \pi_k N(x;\mu_k,\Sigma_k) \tag{2.1}$$
where $p(x|k) = N(x;\mu_k,\Sigma_k)$ is the density of the $k$-th Gaussian component and $\pi_k$ is its weight, interpreted as the prior probability of selecting component $k$, with $\sum_{k=1}^K \pi_k = 1$.
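A small one-dimensional sketch of Eq. (2.1), verifying that the mixture is itself a valid density (the weights and component parameters are chosen arbitrarily):

```python
import numpy as np

def norm_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

pi_k = [0.3, 0.7]   # mixture weights, must sum to 1
xs = np.linspace(-15.0, 15.0, 20001)
mix = pi_k[0] * norm_pdf(xs, -2.0, 1.0) + pi_k[1] * norm_pdf(xs, 3.0, 2.0)

# Riemann sum of P(x) over a wide interval: should be ~1
area = np.sum(mix) * (xs[1] - xs[0])
```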
The joint probability of the m sample points is:
$$L(\mu,\Sigma,\pi) = \prod_{i=1}^m p(x_i) = \prod_{i=1}^m \left[\sum_{j=1}^K \pi_j N(x_i;\mu_j,\Sigma_j)\right] \tag{2.2}$$
The log-likelihood is:
$$l(\theta) = \log L(\mu,\Sigma,\pi) = \sum_{i=1}^m \log \sum_{j=1}^K \pi_j N(x_i;\mu_j,\Sigma_j) \tag{2.3}$$
Setting the derivative of this likelihood to zero does not yield a closed-form solution for the parameters, so we turn to the EM algorithm to fit the mixture.
Before the derivation, recall two facts. Let X be a discrete random variable with probability mass function p(x), and let g(x) be some function of X.
- Expectation of a discrete variable: $E[g(X)] = \sum_i g(x_i)\,p(x_i)$
- Jensen's inequality: if $f$ is convex, then $E[f(X)] \geq f(E[X])$; the inequality reverses for a concave $f$.
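The concave-log version of Jensen's inequality, $\log(E[Y]) \geq E[\log Y]$, can be checked numerically (the sample distribution below is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.uniform(0.5, 2.0, size=10000)   # a positive random variable

lhs = np.log(np.mean(y))   # log(E[Y])
rhs = np.mean(np.log(y))   # E[log Y]
# For the concave log: log(E[Y]) >= E[log Y]
```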
- E-step
Introduce latent variables $z_{ij}$ and write each sample point as
$$(x_i, z_1, z_2, \ldots, z_K), \qquad i = 1,2,\ldots,m$$
where $z_j \in \{0,1\}$, $z_j = 1$ indicates that sample $x_i$ comes from the $j$-th Gaussian, and $\sum_{j=1}^K z_j = 1$.
The likelihood of the sample set is:
$$L(\theta) = \prod_{i=1}^m p(x_i;\theta) = \prod_{i=1}^m \sum_{j=1}^K p(x_i,z_j;\theta) \tag{2.4}$$
The log-likelihood is:
$$\begin{aligned} l(\theta) &= \log L(\theta) = \sum_{i=1}^m \log\left(\sum_{j=1}^K p(x_i,z_j;\theta)\right) \\ &= \sum_{i=1}^m \log \sum_{j=1}^K Q_i(z_j)\,\frac{p(x_i,z_j;\theta)}{Q_i(z_j)} \end{aligned} \tag{2.5}$$
where $\sum_{j=1}^K Q_i(z_j) = 1$ and $Q_i(z) \geq 0$. Let:
$$p(z_{ij}) = Q_i(z_j) \qquad f(z_{ij}) = \frac{p(x_i,z_j;\theta)}{Q_i(z_j)}$$
By the expectation formula $E[f(X)] = \sum_i f(x_i)\,p(x_i)$, we see that
$$\sum_{j=1}^K Q_i(z_j)\,\frac{p(x_i,z_j;\theta)}{Q_i(z_j)} \text{ is the expectation of } \frac{p(x_i,z_j;\theta)}{Q_i(z_j)} \tag{2.6}$$
Since $f(x) = \log x$ is concave, Jensen's inequality gives:
$$l(\theta) = \sum_{i=1}^m \log\left[E\left(\frac{p(x_i,z_j;\theta)}{Q_i(z_j)}\right)\right] \geq \sum_{i=1}^m E\left[\log\frac{p(x_i,z_j;\theta)}{Q_i(z_j)}\right] \tag{2.7}$$
Expanding the right-hand side by the definition of expectation:
$$\sum_{i=1}^m E\left[\log\frac{p(x_i,z_j;\theta)}{Q_i(z_j)}\right] = \sum_{i=1}^m \sum_{j=1}^K Q_i(z_j)\log\frac{p(x_i,z_j;\theta)}{Q_i(z_j)} \tag{2.8}$$
This gives a lower bound on the likelihood as a function of $\theta$:
$$l(\theta) \geq \sum_{i=1}^m \sum_{j=1}^K Q_i(z_j)\log\frac{p(x_i,z_j;\theta)}{Q_i(z_j)} = J(\theta) \tag{2.9}$$
Suppose $\theta$ is given; then the value of $l(\theta)$ depends on $Q_i(z_j)$ and $p(x_i,z_j;\theta)$. We can raise the lower bound $J(\theta)$ by adjusting these two probabilities so that it approaches the true value of $l(\theta)$. When the bound $J(\theta)$ equals $l(\theta)$, we fix $Q_i(z_j)$ and maximize $J(\theta)$ over $\theta$ by maximum likelihood, obtaining a new $\theta^{t+1}$; then we fix $\theta$ and adjust $Q_i(z_j)$ again, and so on, until we converge to a maximizer $\theta^*$ of the likelihood $l(\theta)$.
Next we look for the condition under which the bound is tight, which is simply the equality condition of Jensen's inequality:
$$\sum_{i=1}^m \log\left[E\left(\frac{p(x_i,z_j;\theta)}{Q_i(z_j)}\right)\right] \geq \sum_{i=1}^m E\left[\log\frac{p(x_i,z_j;\theta)}{Q_i(z_j)}\right] \tag{2.10}$$
Equality holds in the inequality above if and only if:
$$\begin{aligned} & \frac{p(x_i,z_j;\theta)}{Q_i(z_j)} = C \\ & \Rightarrow p(x_i,z_j;\theta) = C\,Q_i(z_j) \\ & \Rightarrow \sum_{j=1}^K p(x_i,z_j;\theta) = C\sum_{j=1}^K Q_i(z_j) \qquad \because \sum_{j=1}^K Q_i(z_j) = 1 \\ & \Rightarrow \sum_{j=1}^K p(x_i,z_j;\theta) = p(x_i;\theta) = C \\ & \therefore Q_i(z_j) = \frac{p(x_i,z_j;\theta)}{\sum_{j=1}^K p(x_i,z_j;\theta)} = \frac{p(x_i,z_j;\theta)}{p(x_i;\theta)} = p(z_j|x_i;\theta) \end{aligned} \tag{2.11}$$
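The E-step result (2.11) says $Q_i(z_j)$ is exactly the posterior $p(z_j|x_i;\theta)$, i.e. the responsibility of component $j$ for sample $i$. A minimal numpy sketch (the helper `responsibilities` is ours, with arbitrary test parameters):

```python
import numpy as np

def responsibilities(X, pis, mus, covs):
    """Q_i(z_j) = pi_j N(x_i; mu_j, Sigma_j) / sum_p pi_p N(x_i; mu_p, Sigma_p)."""
    m, K = len(X), len(pis)
    dens = np.empty((m, K))
    for j in range(K):
        diff = X - mus[j]
        inv = np.linalg.inv(covs[j])
        norm = np.sqrt((2 * np.pi) ** X.shape[1] * np.linalg.det(covs[j]))
        dens[:, j] = pis[j] * np.exp(-0.5 * np.sum(diff @ inv * diff, axis=1)) / norm
    # normalize each row so responsibilities sum to 1 per sample
    return dens / dens.sum(axis=1, keepdims=True)

rng = np.random.default_rng(4)
X = rng.normal(size=(6, 2))
Q = responsibilities(X, [0.5, 0.5], [np.zeros(2), np.ones(2)], [np.eye(2)] * 2)
```

Each row of `Q` sums to 1, as required of a posterior over components.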
- M-step
The E-step gives, for each sample $i$, the distribution $Q_i(z_j)$ of $z_j$. Holding $Q_i(z_j)$ fixed, we maximize the lower bound $J(\theta)$:
$$J(\theta) = \sum_{i=1}^m \sum_{j=1}^K Q_i(z_j)\log\frac{p(x_i,z_j;\theta)}{Q_i(z_j)} \tag{2.12}$$
Expand $J(\theta)$ before taking its partial derivatives with respect to $\theta$:
$$\begin{aligned} J(\theta) &= \sum_{i=1}^m \sum_{j=1}^K Q_i(z_j)\log\frac{p(x_i,z_j;\theta)}{Q_i(z_j)} \\ &= \sum_{i=1}^m \sum_{j=1}^K Q_i(z_j)\log p(x_i,z_j;\theta) - Q_i(z_j)\log Q_i(z_j) \\ &= \sum_{i=1}^m \sum_{j=1}^K Q_i(z_j)\log\left(\pi_j N(x_i;\mu_j,\Sigma_j)\right) - Q_i(z_j)\log Q_i(z_j) \\ &= \sum_{i=1}^m \sum_{j=1}^K Q_i(z_j)\log\left[\pi_j \frac{1}{(2\pi)^{\frac{D}{2}}|\Sigma_j|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}(x_i-\mu_j)^T\Sigma_j^{-1}(x_i-\mu_j)\right)\right] - Q_i(z_j)\log Q_i(z_j) \\ &= \sum_{i=1}^m \sum_{j=1}^K Q_i(z_j)\left[\log\pi_j - \frac{D}{2}\log 2\pi - \frac{1}{2}\log|\Sigma_j| - \frac{1}{2}(x_i-\mu_j)^T\Sigma_j^{-1}(x_i-\mu_j)\right] - Q_i(z_j)\log Q_i(z_j) \end{aligned} \tag{2.13}$$
For $\pi_p$, the constraint $\sum_{j=1}^K \pi_j = 1$ requires a Lagrange multiplier:
$$\begin{aligned} \frac{\partial J(\theta)}{\partial \pi_p} &= \partial\left[\sum_{i=1}^m \sum_{j=1}^K Q_i(z_j)\log\pi_j + \lambda\left(1-\sum_{j=1}^K \pi_j\right)\right] \Big/ \partial \pi_p \\ &= \frac{\sum_{i=1}^m Q_i(z_p)}{\pi_p} - \lambda = 0 \\ & \Rightarrow \sum_{i=1}^m Q_i(z_p) = \lambda\pi_p \\ & \Rightarrow \sum_{i=1}^m \sum_{j=1}^K Q_i(z_j) = \lambda\sum_{j=1}^K \pi_j \\ & \because \sum_{j=1}^K Q_i(z_j) = 1, \qquad \sum_{j=1}^K \pi_j = 1 \\ & \therefore \lambda = m \qquad \Rightarrow \pi_p = \frac{\sum_{i=1}^m Q_i(z_p)}{m} \end{aligned} \tag{2.14}$$
For $\mu_p$, keep only the terms involving $\mu_p$:
$$\begin{aligned} \frac{\partial J(\theta)}{\partial \mu_p} &= \partial\left[\sum_{i=1}^m \sum_{j=1}^K Q_i(z_j)\left(-\frac{1}{2}(x_i-\mu_j)^T\Sigma_j^{-1}(x_i-\mu_j)\right)\right] \Big/ \partial \mu_p \\ &= \sum_{i=1}^m -\frac{1}{2}Q_i(z_p)\,\frac{\partial (x_i-\mu_p)^T\Sigma_p^{-1}(x_i-\mu_p)}{\partial \mu_p} \qquad \because \frac{\partial \bm{u}^T\bm{v}}{\partial \bm{x}} = \frac{\partial \bm{u}}{\partial \bm{x}}\bm{v} + \frac{\partial \bm{v}}{\partial \bm{x}}\bm{u} \\ &= \sum_{i=1}^m -\frac{1}{2}Q_i(z_p)\left[\frac{\partial (x_i-\mu_p)}{\partial \mu_p}\Sigma_p^{-1}(x_i-\mu_p) + \frac{\partial (\Sigma_p^{-1}x_i - \Sigma_p^{-1}\mu_p)}{\partial \mu_p}(x_i-\mu_p)\right] \\ &= \sum_{i=1}^m -\frac{1}{2}Q_i(z_p)\left[-\Sigma_p^{-1}(x_i-\mu_p) - (\Sigma_p^{-1})^T(x_i-\mu_p)\right] \\ &= \sum_{i=1}^m -\frac{1}{2}Q_i(z_p)\left[-\Sigma_p^{-1}(x_i-\mu_p) - \Sigma_p^{-1}(x_i-\mu_p)\right] \\ &= \sum_{i=1}^m Q_i(z_p)\left[\Sigma_p^{-1}(x_i-\mu_p)\right] = 0 \\ &\Rightarrow \mu_p = \frac{\sum_{i=1}^m Q_i(z_p)\,x_i}{\sum_{i=1}^m Q_i(z_p)} \end{aligned} \tag{2.15}$$
For $\Sigma_p$, keep only the terms involving $\Sigma_p$:
$$\begin{aligned} \frac{\partial J(\theta)}{\partial \Sigma_p} &= \partial\left[\sum_{i=1}^m \sum_{j=1}^K Q_i(z_j)\left(-\frac{1}{2}\log|\Sigma_j| - \frac{1}{2}(x_i-\mu_j)^T\Sigma_j^{-1}(x_i-\mu_j)\right)\right] \Big/ \partial \Sigma_p \\ &= \partial\left[\sum_{i=1}^m -\frac{1}{2}Q_i(z_p)\left(\log|\Sigma_p| + (x_i-\mu_p)^T\Sigma_p^{-1}(x_i-\mu_p)\right)\right] \Big/ \partial \Sigma_p \\ &\because \frac{\partial \log|\Sigma_p|}{\partial \Sigma_p} = \frac{1}{|\Sigma_p|}\frac{\partial |\Sigma_p|}{\partial \Sigma_p} = \frac{1}{|\Sigma_p|}|\Sigma_p|(\Sigma_p^{-1})^T = \Sigma_p^{-1} \\ &\because \frac{\partial\left[(x_i-\mu_p)^T\Sigma_p^{-1}(x_i-\mu_p)\right]}{\partial \Sigma_p} = \frac{\partial\, tr\left[(x_i-\mu_p)^T\Sigma_p^{-1}(x_i-\mu_p)\right]}{\partial \Sigma_p} = \frac{\partial\, tr\left[\Sigma_p^{-1}(x_i-\mu_p)(x_i-\mu_p)^T\right]}{\partial \Sigma_p} \\ &\quad \frac{\partial\, tr\left[\Sigma_p^{-1}(x_i-\mu_p)(x_i-\mu_p)^T\right]}{\partial \Sigma_p} = -(\Sigma_p^{-1})^T\left[(x_i-\mu_p)(x_i-\mu_p)^T\right]^T(\Sigma_p^{-1})^T = -\Sigma_p^{-1}(x_i-\mu_p)(x_i-\mu_p)^T\Sigma_p^{-1} \\ \frac{\partial J(\theta)}{\partial \Sigma_p} &= \sum_{i=1}^m -\frac{1}{2}Q_i(z_p)\left(\Sigma_p^{-1} - \Sigma_p^{-1}(x_i-\mu_p)(x_i-\mu_p)^T\Sigma_p^{-1}\right) = 0 \\ &\Rightarrow \sum_{i=1}^m Q_i(z_p)\,\Sigma_p^{-1} = \sum_{i=1}^m Q_i(z_p)\,\Sigma_p^{-1}(x_i-\mu_p)(x_i-\mu_p)^T\Sigma_p^{-1} \quad (\text{multiply by } \Sigma_p \text{ on the right}) \\ &\Rightarrow \sum_{i=1}^m Q_i(z_p)\,I = \sum_{i=1}^m Q_i(z_p)\,\Sigma_p^{-1}(x_i-\mu_p)(x_i-\mu_p)^T \quad (\text{multiply by } \Sigma_p \text{ on the left}) \\ &\Rightarrow \sum_{i=1}^m Q_i(z_p)\,\Sigma_p = \sum_{i=1}^m Q_i(z_p)\,(x_i-\mu_p)(x_i-\mu_p)^T \\ &\Rightarrow \Sigma_p = \frac{\sum_{i=1}^m Q_i(z_p)\,(x_i-\mu_p)(x_i-\mu_p)^T}{\sum_{i=1}^m Q_i(z_p)} \end{aligned} \tag{2.16}$$
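The three M-step updates (2.14)-(2.16) can be written compactly with numpy broadcasting; a sketch assuming a responsibility matrix `Q` of shape (m, K) (the helper `m_step` is ours):

```python
import numpy as np

def m_step(X, Q):
    """Update {pi_j, mu_j, Sigma_j} from responsibilities Q (m x K), Eqs. (2.14)-(2.16)."""
    m, K = Q.shape
    Nk = Q.sum(axis=0)                 # effective sample count per component
    pis = Nk / m                       # Eq. (2.14)
    mus = (Q.T @ X) / Nk[:, None]      # Eq. (2.15): responsibility-weighted means
    covs = []
    for j in range(K):
        diff = X - mus[j]
        covs.append((Q[:, j, None] * diff).T @ diff / Nk[j])   # Eq. (2.16)
    return pis, mus, covs

rng = np.random.default_rng(5)
X = rng.normal(size=(8, 2))
Q = np.full((8, 2), 0.5)   # uniform responsibilities as a degenerate test case
pis, mus, covs = m_step(X, Q)
```

With uniform responsibilities, both components collapse to the overall sample mean and the biased sample covariance, which makes the updates easy to verify.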
An alternative is to differentiate with respect to $\Sigma_p^{-1}$ instead:
$$\begin{aligned} \frac{\partial J(\theta)}{\partial \Sigma_p^{-1}} &= \partial\left[\sum_{i=1}^m \sum_{j=1}^K Q_i(z_j)\left(-\frac{1}{2}\log|\Sigma_j| - \frac{1}{2}(x_i-\mu_j)^T\Sigma_j^{-1}(x_i-\mu_j)\right)\right] \Big/ \partial \Sigma_p^{-1} \\ &= \partial\left[\sum_{i=1}^m \sum_{j=1}^K Q_i(z_j)\left(\frac{1}{2}\log|\Sigma_j^{-1}| - \frac{1}{2}(x_i-\mu_j)^T\Sigma_j^{-1}(x_i-\mu_j)\right)\right] \Big/ \partial \Sigma_p^{-1} \\ &= \partial\left[\sum_{i=1}^m \frac{1}{2}Q_i(z_p)\left(\log|\Sigma_p^{-1}| - (x_i-\mu_p)^T\Sigma_p^{-1}(x_i-\mu_p)\right)\right] \Big/ \partial \Sigma_p^{-1} \\ &= \sum_{i=1}^m \frac{1}{2}Q_i(z_p)\left[\left((\Sigma_p^{-1})^{-1}\right)^T - \frac{\partial\, tr\left[(x_i-\mu_p)^T\Sigma_p^{-1}(x_i-\mu_p)\right]}{\partial \Sigma_p^{-1}}\right] \\ &= \sum_{i=1}^m \frac{1}{2}Q_i(z_p)\left[\Sigma_p - \frac{\partial\, tr\left[(x_i-\mu_p)(x_i-\mu_p)^T\Sigma_p^{-1}\right]}{\partial \Sigma_p^{-1}}\right] \\ &= \sum_{i=1}^m \frac{1}{2}Q_i(z_p)\left[\Sigma_p - \left[(x_i-\mu_p)(x_i-\mu_p)^T\right]^T\right] \\ &= \sum_{i=1}^m \frac{1}{2}Q_i(z_p)\left[\Sigma_p - (x_i-\mu_p)(x_i-\mu_p)^T\right] = 0 \\ &\Rightarrow \Sigma_p = \frac{\sum_{i=1}^m Q_i(z_p)\,(x_i-\mu_p)(x_i-\mu_p)^T}{\sum_{i=1}^m Q_i(z_p)} \end{aligned} \tag{2.17}$$
3 The EM Algorithm
Summarizing the derivation above, the EM algorithm for a Gaussian mixture proceeds as follows:
- Initialize the parameters $K$ and $\theta = \{\pi_j, \mu_j, \Sigma_j\}, \; j = 1,2,\ldots,K$
- E-step: using the current parameters $\pi_j, \mu_j, \Sigma_j$, compute the posterior probabilities $Q_i(z_j)$:
$$Q_i(z_j) = \frac{\pi_j N(x_i;\mu_j,\Sigma_j)}{\sum_{p=1}^K \pi_p N(x_i;\mu_p,\Sigma_p)} \tag{3.1}$$
- M-step: using the $Q_i(z_j)$ computed in the E-step, recompute the parameters $\theta = \{\pi_j, \mu_j, \Sigma_j\}, \; j = 1,2,\ldots,K$
- Compute the log-likelihood and check whether it has converged; if so, stop, otherwise return to the E-step.
Note: EM applied to a Gaussian mixture is not guaranteed to find the global optimum; it may converge to a local optimum.
4 Implementation
```python
import numpy as np
import matplotlib.pyplot as plt

imgnum = 0  # counter for saved figure files


# Gaussian probability density function
def gaussian(x, Mean, Cov_matrix):
    dim = np.shape(Cov_matrix)[0]  # dimension
    # A small multiple of the identity guards against a singular covariance
    covdet = np.linalg.det(Cov_matrix + np.eye(dim) * 0.01)  # determinant of the covariance
    covinv = np.linalg.inv(Cov_matrix + np.eye(dim) * 0.01)  # inverse of the covariance
    xdiff = x - Mean
    # density value
    prob = 1.0 / np.power(2 * np.pi, 1.0 * dim / 2) / np.sqrt(np.abs(covdet)) * np.exp(
        -0.5 * xdiff.T @ covinv @ xdiff)
    return prob


def GMM(X, K, iter_num=10):
    global imgnum
    m, n = X.shape
    # initialize parameters: uniform weights, K random samples as means, identity covariances
    oldpi = pi = np.full(K, 1.0 / K)
    oldmu = mu = [X[i] for i in np.random.choice(m, K, replace=False)]
    oldsigma = sigma = [np.eye(n) for _ in range(K)]
    Q = np.zeros((m, K))
    for itn in range(iter_num):
        plt.close()
        plt.plot(X[:, 0], X[:, 1], '.')
        # E-step: posterior responsibilities, Eq. (3.1)
        for i in range(m):
            pxz = [pi[j] * gaussian(X[i], mu[j], sigma[j]) for j in range(K)]
            Q[i] = np.array(pxz) / np.sum(pxz)
        # M-step: update pi, mu, sigma, Eqs. (2.14)-(2.16)
        Qcol = np.sum(Q, axis=0)
        pi = Qcol / m
        mu = [np.sum(Q[:, j:j + 1] * X, axis=0) / Qcol[j] for j in range(K)]
        for j in range(K):
            xdif = X - mu[j]
            sigma[j] = 1.0 / Qcol[j] * np.sum(
                [Q[i][j] * (xdif[i:i + 1].T @ xdif[i:i + 1]) for i in range(m)], axis=0)
        # plot the mean of each Gaussian
        for j in range(K):
            plt.plot(mu[j][0], mu[j][1], marker='o', c='r')
        plt.savefig(str(imgnum) + ".png")
        imgnum += 1
        plt.show()
        # squared change of the parameters since the previous iteration
        curerror = np.sum((oldpi - pi) ** 2) + np.sum((np.ravel(oldmu) - np.ravel(mu)) ** 2) + np.sum(
            (np.ravel(oldsigma) - np.ravel(sigma)) ** 2)
        print("iteration {:d}, squared parameter change {:6f}".format(itn, curerror))
        if curerror < 0.0001:
            break
        oldpi, oldmu, oldsigma = pi, mu, sigma
    return pi, mu, sigma


if __name__ == '__main__':
    # generate three Gaussian clusters as test data
    mean = [2, 2]
    cov = [[1, 0], [0, 1]]
    s1 = np.random.multivariate_normal(mean, cov, 80, "raise")
    mean = [10, 8]
    cov = [[1, 0], [0, 4]]
    s2 = np.random.multivariate_normal(mean, cov, 100, "raise")
    mean = [4, 13]
    cov = [[2, 1], [1, 3]]
    s3 = np.random.multivariate_normal(mean, cov, 120, "raise")
    sall = np.vstack((s1, s2, s3))
    np.random.shuffle(sall)
    # number of Gaussian components
    K = 3
    # maximum number of iterations
    iter_numbers = 60
    pi, mu, sigma = GMM(sall, K, iter_numbers)
    # plot the fitted density contours over the data
    pointnum = 100
    xrange = np.linspace(sall[:, 0].min(), sall[:, 0].max(), pointnum)
    yrange = np.linspace(sall[:, 1].min(), sall[:, 1].max(), pointnum)
    XX, YY = np.meshgrid(xrange, yrange)
    plt.close()
    plt.scatter(s1[:, 0], s1[:, 1], marker='.', c='r')
    plt.scatter(s2[:, 0], s2[:, 1], marker='o', c='b')
    plt.scatter(s3[:, 0], s3[:, 1], marker='x', c='g')
    plt_data = np.dstack((XX, YY))
    for j in range(K):
        Z = [[gaussian(plt_data[q][p], mu[j], sigma[j]) for p in range(pointnum)] for q in range(pointnum)]
        cs = plt.contour(XX, YY, Z)
        plt.clabel(cs)
    plt.savefig(str(imgnum) + ".png")
    plt.show()
```