EM Algorithm Notes (2): Gaussian Mixture Models

Gaussian Mixture Models (GMM)

The three-coin problem described in Notes (1) is in fact a probabilistic model with 2 component distributions. The better-known case is the Gaussian mixture model with $K$ components, and the treatment follows the three-coin problem almost exactly.

[1] The derivation of the EM formulas for the Gaussian mixture is nearly identical to that for the three-coin problem: simply replace the three-coin likelihood $P(y|z_{k}=1,\theta)=\alpha_{k}^{y}(1-\alpha_{k})^{1-y},\ k\in\{1,2\}$ with $P(\boldsymbol{x}|z_{k}=1,\theta)=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k}),\ k\in\{1,\cdots,K\}$.

[2] The only real difference: the maximum likelihood estimates for the Gaussian mixture are not the same as those of the three-coin problem.
The Gaussian mixture model

$$p(\boldsymbol{x}|\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\Sigma})=\sum_{k=1}^{K}\pi_{k}\,\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\ ,\qquad \sum_{k=1}^{K}\pi_{k}=1,\ \boldsymbol{x}\in \mathbb{R}^{D}$$

where $\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu},\boldsymbol{\Sigma})= \dfrac{1}{(2\pi)^{\frac{D}{2}}\,|\boldsymbol\Sigma|^{\frac{1}{2}}}\exp\left[-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^{T}\boldsymbol\Sigma^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right]$
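As a quick numerical check of the mixture density defined above, here is a minimal sketch; the weights, means and covariances are illustrative values, not taken from the text.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_pdf(x, pi, mu, cov):
    """Evaluate p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(p * multivariate_normal.pdf(x, m, c)
               for p, m, c in zip(pi, mu, cov))

pi = [0.3, 0.5, 0.2]                       # mixing coefficients, sum to 1
mu = [[0, 0], [3, 3], [-2, 4]]             # component means (illustrative)
cov = [np.eye(2), 2 * np.eye(2), np.eye(2)]

x = np.array([1.0, 1.0])
print(gmm_pdf(x, pi, mu, cov))             # a small positive density value
```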
The figure below shows 500 sample points drawn from a mixture of $K=3$ Gaussians. In panel (a) the samples are drawn from the joint distribution $p(\boldsymbol x,\mathbf z)=p(\mathbf z)p(\boldsymbol x|\mathbf z)$ (they carry the latent-variable information $\mathbf z$; the 3 states of $\mathbf z$ correspond to the red, green and blue samples, i.e. to 3 different Gaussians). Panel (b) is drawn from the marginal distribution $p(\boldsymbol x)=\sum_{\mathbf z}p(\boldsymbol x,\mathbf z)$ (the latent information $\mathbf z$ is missing, so only the $\boldsymbol x$ values are visible and there is no way to tell which component each $\boldsymbol x$ came from).

From "Pattern Recognition and Machine Learning", Figure 9.5

$p(\boldsymbol x)=\sum_{\mathbf z}p(\mathbf z)p(\boldsymbol x|\mathbf z)$: an equivalent formulation of the Gaussian mixture involving an explicit latent variable.

Ancestral sampling can be used to generate random samples from the Gaussian mixture distribution (panel (a)):

(1) Draw the value of $z\in\{1,2,3\}$ from $p(z)$ (e.g. $z=3$, meaning $z_{3}=1$, is produced with probability $\pi_{3}$).

(2) Then draw the value of $\boldsymbol x$ from the conditional distribution $p(\boldsymbol x|z_{3}=1)=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{3},\boldsymbol{\Sigma}_{3})$.

If the $z$ information in panel (a) is discarded, we obtain the situation of panel (b).
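Steps (1) and (2) can be sketched directly in NumPy; the mixture parameters and the seed below are illustrative, not the ones used for the figure.

```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.2, 0.3, 0.5])                       # p(z): mixing weights
mu = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])  # component means
cov = np.stack([np.eye(2)] * 3)                      # component covariances

def sample_gmm(n):
    """Ancestral sampling: first z ~ p(z), then x ~ N(mu_z, Sigma_z)."""
    z = rng.choice(3, size=n, p=pi)                  # step (1)
    x = np.array([rng.multivariate_normal(mu[k], cov[k]) for k in z])  # step (2)
    return x, z

x, z = sample_gmm(500)   # keeping z gives panel (a); dropping z gives panel (b)
```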
Panel (b) shows $N=500$ samples $\{\boldsymbol x_{1},\boldsymbol x_{2},\cdots,\boldsymbol x_{N}\}$ drawn from $p(\boldsymbol{x}|\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\Sigma})$. If we introduce a latent variable $z$ to indicate which component of the GMM generated each sample, these samples may in fact be the pairs shown in panel (a), $\{(x_{1},z_{2}),(x_{2},z_{3}),(x_{3},z_{1}),\cdots,(x_{N},z_{1})\}$, where the 1-of-K representation is used: $z_{k}$ stands for $z_{k}=1$, indicating that the sample point was drawn from the $k$-th Gaussian.
1. The 1-of-K representation of the latent variable

In the 1-of-K representation, the $K$ latent variables are organised into a vector $\mathbf z =[ z_{1},z_{2},\cdots,z_{K} ]^{T}$ in which exactly one component equals 1 and all others are 0.

Taking the $K=3$ case of the figure above as an example:
1) Latent variable $z_{1}=1$, i.e. latent vector $\mathbf z=[1,0,0]^{T}$, denotes the event "the 1st (red) component is selected"; its probability is $P(z_{1}=1|\theta)=\pi_{1}$.

2) Latent variable $z_{2}=1$, i.e. latent vector $\mathbf z=[0,1,0]^{T}$, denotes the event "the 2nd (green) component is selected"; its probability is $P(z_{2}=1|\theta)=\pi_{2}$.

3) Latent variable $z_{3}=1$, i.e. latent vector $\mathbf z=[0,0,1]^{T}$, denotes the event "the 3rd (blue) component is selected"; its probability is $P(z_{3}=1|\theta)=\pi_{3}$.

The (prior) probability of selecting the $k$-th component can be written uniformly as:

$$P(z_{k}=1|\theta)=\pi_{k}\ ,\qquad k\in\{1,2,3\}$$
Expressed with the latent vector $\mathbf z=[z_{1},z_{2},z_{3}]^{T}$, this is:

$$P(\mathbf z|\theta) = \prod_{k=1}^{3} \pi_{k}^{z_{k}} =\pi_{1}^{z_{1}}\cdot\pi_{2}^{z_{2}}\cdot\pi_{3}^{z_{3}}\ ,\qquad \mathbf z= \begin{bmatrix} z_{1}\\z_{2}\\z_{3} \end{bmatrix} \in \left\{ \begin{bmatrix} 1\\0\\0 \end{bmatrix},\begin{bmatrix} 0\\1\\0 \end{bmatrix},\begin{bmatrix} 0\\0\\1 \end{bmatrix}\right\}$$

For a GMM with $K$ components and latent vector $\mathbf z=[z_{1},z_{2},\cdots,z_{K}]^{T}$, it becomes:

$$P(\mathbf z|\theta) = \prod_{k=1}^{K} \pi_{k}^{z_{k}} =\pi_{1}^{z_{1}}\cdot\pi_{2}^{z_{2}}\cdots\pi_{K}^{z_{K}}\ ,\qquad \mathbf z= \begin{bmatrix} z_{1}\\z_{2}\\z_{3}\\ \vdots \\z_{K} \end{bmatrix} \in \left\{ \begin{bmatrix} 1\\0\\0\\ \vdots \\0 \end{bmatrix},\begin{bmatrix} 0\\1\\0\\ \vdots \\0 \end{bmatrix},\cdots,\begin{bmatrix} 0\\0\\ \vdots \\0\\1 \end{bmatrix}\right\}$$
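The factorised prior $P(\mathbf z|\theta)=\prod_{k}\pi_{k}^{z_{k}}$ over one-hot vectors can be checked in a couple of lines; the weights below are illustrative.

```python
import numpy as np

pi = np.array([0.2, 0.3, 0.5])   # illustrative mixing weights, K = 3

def prior(z):
    """P(z | theta) = prod_k pi_k ** z_k for a 1-of-K vector z."""
    return float(np.prod(pi ** z))

# A one-hot z simply picks out the corresponding pi_k:
print(prior(np.array([0, 1, 0])))   # selects component 2 -> pi_2
```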
2. Expressing the probability of an observation $\boldsymbol x$ with the latent vector $\mathbf z$

From panel (a) it is evident that:

1) The red observed samples $\boldsymbol x$ are all generated by the 1st (red) component, with probability $P(\boldsymbol x|z_{1}=1,\theta)=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{1},\boldsymbol{\Sigma}_{1})$.

2) The green observed samples $\boldsymbol x$ are all generated by the 2nd (green) component, with probability $P(\boldsymbol x|z_{2}=1,\theta)=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{2},\boldsymbol{\Sigma}_{2})$.
3) The blue observed samples $\boldsymbol x$ are all generated by the 3rd (blue) component, with probability $P(\boldsymbol x|z_{3}=1,\theta)=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{3},\boldsymbol{\Sigma}_{3})$.

Hence the component distributions indicated by the differently coloured samples in panel (a) (here $K=3$) can be written uniformly as:

$$P(\boldsymbol x|z_{k}=1,\theta)=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\ ,\qquad k=1,\cdots,K$$
Expressed with the latent vector $\mathbf z= \begin{bmatrix} z_{1}\\z_{2}\\z_{3}\\ \vdots \\z_{K} \end{bmatrix} \in \left\{ \begin{bmatrix} 1\\0\\0\\ \vdots \\0 \end{bmatrix},\begin{bmatrix} 0\\1\\0\\ \vdots \\0 \end{bmatrix},\cdots,\begin{bmatrix} 0\\0\\ \vdots \\0\\1 \end{bmatrix}\right\}$, this is:

$$\begin{aligned} P(\boldsymbol{x}|\mathbf z,\theta) &= \prod_{k=1}^{K} P(\boldsymbol{x}|z_{k}=1,\theta)^{z_{k}} \\ &=P(\boldsymbol{x}|z_{1}=1,\theta)^{z_{1}}\cdot P(\boldsymbol{x}|z_{2}=1,\theta)^{z_{2}} \cdots P(\boldsymbol{x}|z_{K}=1,\theta)^{z_{K}}\\ &=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{1},\boldsymbol{\Sigma}_{1})^{z_{1}}\cdot\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{2},\boldsymbol{\Sigma}_{2})^{z_{2}} \cdots\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{K},\boldsymbol{\Sigma}_{K})^{z_{K}} \\ &= \prod_{k=1}^{K} \left[\ \mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\ \right]^{z_{k}} \end{aligned}$$
Together with $P(\mathbf z|\theta) = \prod_{k=1}^{K} \pi_{k}^{z_{k}}$, we obtain:

$$p(\boldsymbol x)=\sum_{\mathbf z}p(\mathbf z)p(\boldsymbol x|\mathbf z)=\sum_{k=1}^{K}\pi_{k}\,\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})$$

For the full sample set $\mathbf X=\{ \boldsymbol x_{1},\boldsymbol x_{2},\cdots,\boldsymbol x_{N}\}$ with corresponding latent vectors $\mathbf Z=\{ \mathbf z_{1},\mathbf z_{2},\cdots,\mathbf z_{N}\}$, we have:

$$p(\mathbf X|\mathbf Z,\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi}) = \prod_{n=1}^{N}\prod_{k=1}^{K}\left\{\ \mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\ \right\}^{z_{nk}}$$

and therefore:

$$p(\mathbf X,\mathbf Z|\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi}) = \prod_{n=1}^{N}\prod_{k=1}^{K} \pi_{k}^{z_{nk}}\left\{\ \mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\ \right\}^{z_{nk}}$$

where $z_{nk}$ denotes the $k$-th component (0 or 1) of the latent vector $\mathbf z_{n}$ associated with sample $\boldsymbol x_{n}$.
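Taking logs, the complete-data joint $p(\mathbf X,\mathbf Z|\cdot)$ can be evaluated numerically; a minimal sketch, where the two-component parameters and the two labelled points are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def complete_log_likelihood(x, z_onehot, pi, mu, cov):
    """sum_n sum_k z_nk { ln pi_k + ln N(x_n | mu_k, Sigma_k) }."""
    K = len(pi)
    logdens = np.column_stack(
        [multivariate_normal.logpdf(x, mu[k], cov[k]) for k in range(K)])
    return float(np.sum(z_onehot * (np.log(pi) + logdens)))

pi = np.array([0.5, 0.5])
mu = np.array([[0.0, 0.0], [3.0, 3.0]])
cov = np.stack([np.eye(2)] * 2)
x = np.array([[0.0, 0.0], [3.0, 3.0]])   # one point at each component mean
z = np.array([[1, 0], [0, 1]])           # 1-of-K labels for the two points
val = complete_log_likelihood(x, z, pi, mu, cov)
```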
3. The log-likelihood function

If the sample points are in the form of panel (a), i.e. the latent set $\mathbf Z$ is known (equivalently, the $z_{nk}$ are known), the unknown parameters $(\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi})$ can be obtained as the maximum likelihood solution of $\ln p(\mathbf X,\mathbf Z|\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi})$:

$$\ln p(\mathbf X,\mathbf Z|\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi}) = \sum_{n=1}^{N}\sum_{k=1}^{K} z_{nk}\left\{\ \ln\pi_{k}+ \ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\ \right\}$$

With the $z_{nk}$ known, the samples $\mathbf X=\{ \boldsymbol x_{1},\boldsymbol x_{2},\cdots,\boldsymbol x_{N}\}$ can be split into $K$ groups; all sample points $\boldsymbol x_{n}$ in the $k$-th group $C_{k}=\{n\ |\ z_{nk}=1,\ z_{nj}=0\ \forall j \neq k\}$ are drawn from $\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})$, and the log-likelihood can be written as:

$$\begin{aligned}\ln p(\mathbf X,\mathbf Z|\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi}) &= \sum_{n\in C_{1}}\left\{\ \ln\pi_{1}+ \ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{1},\boldsymbol{\Sigma}_{1})\ \right\} \\ &+\cdots \\ &+\sum_{n\in C_{k}}\left\{\ \ln\pi_{k}+ \ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\ \right\}\\ &+\cdots \\ &+\sum_{n\in C_{K}}\left\{\ \ln\pi_{K}+ \ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{K},\boldsymbol{\Sigma}_{K})\ \right\} \end{aligned}$$

Since the $K$ groups are mutually independent, it suffices to maximise each group's contribution separately. For the $k$-th group (with $N_{k}$ samples):

$$\begin{aligned}\ln p(\mathbf X_{k},\mathbf Z_{k}|\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi}) &= \sum_{n\in C_{k}}\left\{\ \ln\pi_{k}+ \ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\ \right\} \\ &= N_{k}\ln\pi_{k}-\dfrac{N_{k}D}{2}\ln(2\pi)-\dfrac{N_{k}}{2}\ln|\boldsymbol{\Sigma}_{k}|-\dfrac{1}{2}\sum_{n\in C_{k}}(\boldsymbol{x}_{n}-\boldsymbol{\mu}_{k})^{T}\boldsymbol{\Sigma}_{k}^{-1}(\boldsymbol{x}_{n}-\boldsymbol{\mu}_{k}) \end{aligned}$$

Maximising separately with respect to $\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k}$ gives:

$$\hat{\boldsymbol{\mu}}_{k}=\dfrac{1}{N_{k}}\sum_{n\in C_{k}}\boldsymbol{x}_{n}=\dfrac{1}{N_{k}}\sum_{n=1}^{N}z_{nk}\boldsymbol{x}_{n}$$

$$\hat{\boldsymbol{\Sigma}}_{k}=\dfrac{1}{N_{k}}\sum_{n\in C_{k}}(\boldsymbol{x}_{n}-\hat{\boldsymbol{\mu}}_{k})(\boldsymbol{x}_{n}-\hat{\boldsymbol{\mu}}_{k})^{T}=\dfrac{1}{N_{k}}\sum_{n=1}^{N}z_{nk}(\boldsymbol{x}_{n}-\hat{\boldsymbol{\mu}}_{k})(\boldsymbol{x}_{n}-\hat{\boldsymbol{\mu}}_{k})^{T}$$

where $N_{k}=\sum_{n=1}^{N}z_{nk}$.
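With the labels known, the estimates above are just the per-group sample mean and covariance. A sketch on synthetic labelled data; the two component means, the group sizes and the seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two labelled groups of 200 points each, centred at (0,0) and (5,5).
x = np.vstack([rng.normal([0, 0], 1.0, (200, 2)),
               rng.normal([5, 5], 1.0, (200, 2))])
z = np.repeat([0, 1], 200)                 # known component labels z_n

def mle_per_component(x, z, K):
    mu, sigma, nk = [], [], []
    for k in range(K):
        xk = x[z == k]                     # the group C_k
        nk.append(len(xk))                 # N_k
        mu.append(xk.mean(axis=0))         # mu_k = (1/N_k) sum_{n in C_k} x_n
        d = xk - xk.mean(axis=0)
        sigma.append(d.T @ d / len(xk))    # Sigma_k = (1/N_k) sum d d^T
    return np.array(mu), np.array(sigma), np.array(nk)

mu_hat, sigma_hat, nk = mle_per_component(x, z, 2)
```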
4. Computing the expectation of the log-likelihood

Since all the latent variables $\mathbf Z=\{ \mathbf z_{1},\mathbf z_{2},\cdots,\mathbf z_{N}\}$ are unknown, the EM algorithm "guesses" them by taking the expectation of the log-likelihood $\ln p(\mathbf X,\mathbf Z|\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi})$:

$$\begin{aligned}Q(\theta,\theta^{(i)})&=E_{p(\mathbf Z|\mathbf X,\theta)}[\ln p(\mathbf X,\mathbf Z|\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi})] \\ &= E_{p(\mathbf Z|\mathbf X,\theta)}\left\{\sum_{n=1}^{N}\sum_{k=1}^{K} z_{nk}\left\{\ \ln\pi_{k}+ \ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\ \right\}\right\} \\ &= \sum_{n=1}^{N}\sum_{k=1}^{K} E_{p(\mathbf Z|\mathbf X,\theta)}[z_{nk}] \left\{\ \ln\pi_{k}+\ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\ \right\} \end{aligned}$$
Assume again that the observed data $\mathbf X=\{ \boldsymbol x_{1},\boldsymbol x_{2},\cdots,\boldsymbol x_{N}\}$ are generated by the following process:

1) First, with probability $P(z_{k}=1|\theta)=\pi_{k},\ k\in\{1,\cdots,K\}$, set $z_{k}=1,\ z_{j}=0\ (\forall j \neq k)$, i.e. select the $k$-th component $\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})$.

2) Then draw a sample $\boldsymbol x$ from the $k$-th component $\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})$.

If every observed sample $\boldsymbol x_{n}$ is generated independently, then the generation of $\boldsymbol x_{n}$ depends only on its own latent vector $\mathbf z_{n}$. We therefore need to compute:

$$E_{P(\mathbf Z|\mathbf X,\theta)}[z_{nk}]=\sum_{z_{nk}\in\{0,1\}} z_{nk}\, p(\mathbf z_{n}|\boldsymbol x_{n},\theta^{(i)})$$

Moreover, for the latent vector $\mathbf z_{n}$ with $z_{nk}=1$ (where $\mathbf z_{n}$ ranges over the $K$ one-hot vectors):

$$\begin{aligned}p(\mathbf z_{n}|\boldsymbol x_{n},\theta) &=\frac{p(\mathbf z_{n},\boldsymbol x_{n}|\theta)}{p(\boldsymbol x_{n}|\theta)} \\ &=\frac{p(\boldsymbol x_{n}|\mathbf z_{n},\theta)\,p(\mathbf z_{n}|\theta)}{\sum\limits_{\mathbf z_{n}} p(\boldsymbol x_{n}|\mathbf z_{n},\theta)\,p(\mathbf z_{n}|\theta)} \\ &=\frac{p(\boldsymbol x_{n}|z_{nk}=1,\theta)\,p(z_{nk}=1|\theta)}{\sum\limits_{j=1}^{K} p(\boldsymbol x_{n}|z_{nj}=1,\theta)\,p(z_{nj}=1|\theta)} \\ &=\frac{\pi_{k}\,p(\boldsymbol x_{n}|z_{nk}=1,\theta)}{\sum\limits_{j=1}^{K} \pi_{j}\,p(\boldsymbol x_{n}|z_{nj}=1,\theta)} \end{aligned}$$

Therefore:

$$\begin{aligned}E_{P(\mathbf Z|\mathbf X,\theta)}[z_{nk}] &=\sum_{z_{nk}\in\{0,1\}} z_{nk}\, P(\mathbf z_{n}|\boldsymbol x_{n},\theta^{(i)}) \\ &=\frac{1\cdot p(\boldsymbol x_{n}|z_{nk}=1,\theta^{(i)})\,p(z_{nk}=1|\theta^{(i)})+0\cdot p(\boldsymbol x_{n}|z_{nk}=0,\theta^{(i)})\,p(z_{nk}=0|\theta^{(i)})}{\sum\limits_{j=1}^{K} p(\boldsymbol x_{n}|z_{nj}=1,\theta^{(i)})\,p(z_{nj}=1|\theta^{(i)})} \\ &=\frac{\pi_{k}\,p(\boldsymbol x_{n}|z_{nk}=1,\theta^{(i)})}{\sum\limits_{j=1}^{K} \pi_{j}\,p(\boldsymbol x_{n}|z_{nj}=1,\theta^{(i)})} \\ &=\frac{\pi_{k}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})}{\sum\limits_{j=1}^{K} \pi_{j}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{j},\boldsymbol{\Sigma}_{j})}\ ,\qquad k\in\{1,\cdots,K\} \end{aligned}$$
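The final expression is the responsibility of component $k$ for sample $n$, and it vectorises naturally; a sketch with illustrative two-component parameters.

```python
import numpy as np
from scipy.stats import multivariate_normal

pi = np.array([0.4, 0.6])
mu = np.array([[0.0, 0.0], [4.0, 4.0]])
cov = np.stack([np.eye(2), np.eye(2)])

def responsibilities(x, pi, mu, cov):
    """gamma[n,k] = pi_k N(x_n|mu_k,Sigma_k) / sum_j pi_j N(x_n|mu_j,Sigma_j)."""
    dens = np.column_stack([multivariate_normal.pdf(x, mu[k], cov[k])
                            for k in range(len(pi))])
    num = pi * dens                                # numerators pi_k N(...)
    return num / num.sum(axis=1, keepdims=True)    # normalise over k

# First point sits on component 1's mean, second on component 2's mean,
# the third is equidistant from both.
x = np.array([[0.0, 0.0], [4.0, 4.0], [2.0, 2.0]])
gamma = responsibilities(x, pi, mu, cov)
```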
5. The E-step formula

Since in the data of panel (b) the latent vectors $\mathbf z_{n}$ are unknown, we computed above, for given model parameters $(\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi})$, the expectation of $\mathbf z_{n}$:

$$\begin{aligned}E_{P(\mathbf Z|\mathbf X,\theta)}[z_{nk}] &=\sum_{z_{nk}\in\{0,1\}} z_{nk}\, P(\mathbf z_{n}|\boldsymbol x_{n},\theta^{(i)}) \\ &=\frac{\pi_{k}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})}{\sum\limits_{j=1}^{K} \pi_{j}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{j},\boldsymbol{\Sigma}_{j})}\ ,\qquad k\in\{1,\cdots,K\} \end{aligned}$$

This expectation replaces $z_{nk}$ when evaluating the expected log-likelihood $Q(\theta,\theta^{(i)})$. Writing

$$\gamma(z_{nk})=\dfrac{\pi_{k}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})}{\sum\limits_{j=1}^{K} \pi_{j}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{j},\boldsymbol{\Sigma}_{j})}\ ,\qquad k\in\{1,\cdots,K\}$$

the expected log-likelihood becomes:

$$Q(\theta,\theta^{(i)})= \sum_{n=1}^{N}\sum_{k=1}^{K} \gamma(z_{nk}) \left\{\ \ln\pi_{k}+\ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\ \right\}$$
6. The M-step formulas

Maximising $Q(\theta,\theta^{(i)})$ separately with respect to $\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k}$ gives:

$$\hat{\boldsymbol{\mu}}_{k}=\dfrac{1}{N_{k}}\sum_{n=1}^{N}\gamma(z_{nk})\,\boldsymbol{x}_{n}$$

$$\hat{\boldsymbol{\Sigma}}_{k}=\dfrac{1}{N_{k}}\sum_{n=1}^{N}\gamma(z_{nk})\,(\boldsymbol{x}_{n}-\hat{\boldsymbol{\mu}}_{k})(\boldsymbol{x}_{n}-\hat{\boldsymbol{\mu}}_{k})^{T}$$

where $N_{k}=\sum_{n=1}^{N}\gamma(z_{nk})$.

Solving for $\pi_{k}$: to maximise $Q(\theta,\theta^{(i)})$ subject to the constraint $\sum_{k}\pi_{k}=1$, use a Lagrange multiplier:

$$\max\ \left\{ \sum_{n=1}^{N}\sum_{k=1}^{K} \gamma(z_{nk}) \left\{\ \ln\pi_{k}+\ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\ \right\} +\lambda\Big(\sum_{k}\pi_{k}-1\Big) \right\}$$

Taking the partial derivative with respect to $\pi_{k}$:

$$\sum_{n=1}^{N}\frac{\gamma(z_{nk})}{\pi_{k}} +\lambda=0$$

Multiplying both sides by $\pi_{k}$:

$$\sum_{n=1}^{N}\gamma(z_{nk}) +\pi_{k}\lambda=0$$

Summing over $k$:

$$\sum_{k=1}^{K}\sum_{n=1}^{N}\gamma(z_{nk}) +\sum_{k=1}^{K}\pi_{k}\lambda=0$$

Since

$$\begin{aligned}\sum_{k=1}^{K}\sum_{n=1}^{N}\gamma(z_{nk}) &=\sum_{k=1}^{K}\sum_{n=1}^{N}\dfrac{\pi_{k}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})}{\sum\limits_{j=1}^{K} \pi_{j}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{j},\boldsymbol{\Sigma}_{j})} \\ &=\sum_{n=1}^{N}\dfrac{ \sum\limits_{k=1}^{K} \pi_{k}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})}{\sum\limits_{j=1}^{K} \pi_{j}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{j},\boldsymbol{\Sigma}_{j})} \\ &= N \end{aligned}$$

we obtain $\lambda=-N$, and therefore:

$$\pi_{k}=\dfrac{1}{N}\sum_{n=1}^{N}\gamma(z_{nk}) =\dfrac{N_{k}}{N}$$
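The three M-step updates can be collected into one function. This is a sketch only: the toy data and the Dirichlet-sampled matrix standing in for the responsibilities $\gamma(z_{nk})$ are illustrative, not the output of an actual E step.

```python
import numpy as np

def m_step(x, gamma):
    """Given responsibilities gamma[n,k], return updated pi, mu, Sigma."""
    n, d = x.shape
    nk = gamma.sum(axis=0)                       # N_k = sum_n gamma(z_nk)
    pi = nk / n                                  # pi_k = N_k / N
    mu = (gamma.T @ x) / nk[:, None]             # mu_k = (1/N_k) sum gamma x_n
    cov = np.empty((len(nk), d, d))
    for k in range(len(nk)):
        diff = x - mu[k]
        cov[k] = (gamma[:, k, None] * diff).T @ diff / nk[k]  # Sigma_k
    return pi, mu, cov

rng = np.random.default_rng(2)
x = rng.normal(size=(100, 2))
gamma = rng.dirichlet([1.0, 1.0], size=100)      # rows sum to 1, like gamma
pi, mu, cov = m_step(x, gamma)
```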
Code implementation
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal


def gen_gaussian(num):
    # Draw `num` points from each of the 3 ground-truth Gaussians and
    # attach the component label (0, 1, 2) as a third column.
    mean1 = [1, 2]
    cov1 = [[5, 3], [3, 10]]
    data1 = np.random.multivariate_normal(mean1, cov1, num)
    data11 = np.concatenate([data1, np.zeros((num, 1))], axis=1)

    mean2 = [9, 7]
    cov2 = [[10, -3], [-3, 5]]
    data2 = np.random.multivariate_normal(mean2, cov2, num)
    data22 = np.concatenate([data2, np.ones((num, 1))], axis=1)

    mean3 = [8, 0]
    cov3 = [[9, 2], [2, 5]]
    data3 = np.random.multivariate_normal(mean3, cov3, num)
    data33 = np.concatenate([data3, 2 * np.ones((num, 1))], axis=1)

    data = np.concatenate([data11, data22, data33], axis=0)
    return np.round(data, 4), data1, data2, data3, data11, data22, data33


def show_all_scatter(data):
    # Panel (b): samples without the latent information, p(x).
    x, y = data[:, 0:2].T
    plt.scatter(x, y, c='k', s=3)
    plt.title("p(x)")
    plt.xlabel("x")
    plt.ylabel("y")


def show_scatter(data1, data2, data3):
    # Panel (a): samples coloured by their latent component, p(x,z).
    x1, y1 = data1.T
    x2, y2 = data2.T
    x3, y3 = data3.T
    plt.scatter(x1, y1, c='b', s=3)
    plt.scatter(x2, y2, c='r', s=3)
    plt.scatter(x3, y3, c='g', s=3)
    plt.title("p(x,z)")
    plt.xlabel("x")
    plt.ylabel("y")


def gmm(indata, pi, mu, cov, K):
    pi0, mu0, cov0 = pi, mu, cov
    n_iter = 0
    while True:
        # E step: gamma[n,k] = pi_k N(x_n|mu_k,Sigma_k) / sum_j pi_j N(...).
        dens = np.zeros((indata.shape[0], K))
        for k in range(K):
            dens[:, k] = multivariate_normal.pdf(indata, mu0[k], cov0[k])
        t = dens * pi0
        gamma = t / np.sum(t, axis=1, keepdims=True)

        # M step: update pi_k, mu_k, Sigma_k from the responsibilities.
        mu_prev = mu0
        nk = np.sum(gamma, axis=0)                 # N_k
        pi0 = nk / indata.shape[0]                 # pi_k = N_k / N
        mu0 = (gamma.T @ indata) / nk[:, None]     # mu_k
        for k in range(K):
            diff = indata - mu0[k]
            cov0[k] = (gamma[:, k, None] * diff).T @ diff / nk[k]  # Sigma_k

        n_iter += 1
        if np.sum(np.abs(mu0 - mu_prev)) < 1e-4:   # converged on the means
            break

    print('pi:\n', pi0)
    print('mu:\n', mu0)
    print('cov:\n', cov0)


if __name__ == '__main__':
    data, data1, data2, data3, data11, data22, data33 = gen_gaussian(1000)
    plt.figure(1)
    show_scatter(data1, data2, data3)
    plt.figure(2)
    show_all_scatter(data)
    plt.show()

    indata = data[:, 0:2]
    d_min = np.ceil(np.min(indata))
    d_max = np.floor(np.max(indata))
    K = 3
    mean = np.zeros((3, 2))
    mean[0, :] = [d_min + 1, (d_min + d_max) / 2.0]
    mean[1, :] = [d_max - 1, d_min + 1]
    mean[2, :] = [d_max - 1, d_max - 1]
    cov = np.zeros((3, 2, 2))
    cov[0] = np.eye(2)
    cov[1] = 3 * np.eye(2)
    cov[2] = 5 * np.eye(2)
    pi = np.random.rand(3)
    pi = pi / np.sum(pi)
    gmm(indata, pi, mean, cov, K)
The program assumes 3 Gaussian components (1000 points per component, i.e. $\pi_{k}=0.3333$):

$$\mu_{1} = \begin{bmatrix}1\\2\end{bmatrix},\ \Sigma_{1}=\begin{bmatrix}5&3\\3&10\end{bmatrix}\qquad \mu_{2} = \begin{bmatrix}9\\7\end{bmatrix},\ \Sigma_{2}=\begin{bmatrix}10&-3\\-3&5\end{bmatrix}\qquad \mu_{3} = \begin{bmatrix}8\\0\end{bmatrix},\ \Sigma_{3}=\begin{bmatrix}9&2\\2&5\end{bmatrix}$$
Output from one run of the program:

pi:
 [0.2724863  0.39993072 0.32758298]
mu:
 [[ 0.38057473  2.20601868]
  [ 7.69370185 -0.01075563]
  [ 8.44929411  7.27392284]]
cov:
 [[[ 3.45907872  2.43860616]
   [ 2.43860616  8.10405857]]
  [[11.99347758  3.83693019]
   [ 3.83693019  5.45452493]]
  [[ 9.14835616 -0.6767731 ]
   [-0.6767731   3.86528346]]]