EM Algorithm Notes (II): Gaussian Mixture Models

The Gaussian mixture model (GMM)

The three-coin problem described in Notes (I) is in fact a probability model with 2 component distributions. A better-known case is the Gaussian mixture model with $K$ components, and the whole derivation is essentially the same as for the three-coin problem.

【1】 The derivation of the EM formulas for the Gaussian mixture is almost identical to that for the "three-coin problem":
   we only need to replace $P(y|z_{k}=1,\theta)=\alpha_{k}^{y}(1-\alpha_{k})^{1-y},\ k\in\{1,2\}$ of the three-coin problem with $P(\boldsymbol{x}|z_{k}=1,\theta)=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k}),\ k\in\{1,\cdots,K\}$.
 
【2】 The only difference is that the maximum-likelihood estimates for the Gaussian mixture differ from those of the "three-coin problem".

The Gaussian mixture model is

$$p(\boldsymbol x|\boldsymbol\pi,\boldsymbol\mu,\boldsymbol\Sigma)=\sum_{k=1}^{K}\pi_{k}\,\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k}),\qquad \sum_{k=1}^{K}\pi_{k}=1,\quad \boldsymbol{x}\in\mathbb{R}^{D}$$

where

$$\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu},\boldsymbol{\Sigma})=\frac{1}{(2\pi)^{D/2}\,|\boldsymbol{\Sigma}|^{1/2}}\exp\!\left[-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right]$$
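As a quick illustration, here is a minimal sketch (not part of the original post) that evaluates the mixture density $p(\boldsymbol x|\boldsymbol\pi,\boldsymbol\mu,\boldsymbol\Sigma)$ with `scipy.stats.multivariate_normal`; the function name `gmm_pdf` and the parameter values are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_pdf(X, pi, mus, covs):
    """p(x) = sum_k pi_k * N(x | mu_k, Sigma_k), evaluated for each row of X (shape (N, D))."""
    return sum(pi_k * multivariate_normal.pdf(X, mean=mu_k, cov=cov_k)
               for pi_k, mu_k, cov_k in zip(pi, mus, covs))

# Example: K = 2 components in D = 2 dimensions (arbitrary parameters).
pi   = [0.4, 0.6]
mus  = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), 2.0 * np.eye(2)]
X    = np.array([[0.0, 0.0], [3.0, 3.0]])
print(gmm_pdf(X, pi, mus, covs))   # mixture density at the two query points
```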

The figure below shows 500 samples drawn from a Gaussian mixture with $K=3$. In panel (a) the samples are drawn from the joint distribution $p(\boldsymbol x,\mathbf z)=p(\mathbf z)\,p(\boldsymbol x|\mathbf z)$ (the latent variable $\mathbf z$ is included; its 3 states are indicated by the sample colors, red, green, and blue, corresponding to 3 different Gaussian components). In panel (b) the samples are drawn from the marginal distribution $p(\boldsymbol x)=\sum\limits_{\mathbf z}p(\boldsymbol x,\mathbf z)$ (the latent variable $\mathbf z$ is missing, so we only see $\boldsymbol x$ and cannot tell which component each $\boldsymbol x$ came from).
[Figure: 500 samples from a three-component Gaussian mixture; panel (a) shows the complete data $(\boldsymbol x,\mathbf z)$, panel (b) the incomplete data $\boldsymbol x$. From *Pattern Recognition and Machine Learning*, Figure 9.5]
 
$p(\boldsymbol x)=\displaystyle\sum_{\mathbf z}p(\mathbf z)\,p(\boldsymbol x|\mathbf z)$: an equivalent formulation of the Gaussian mixture involving an explicit latent variable.
  
Ancestral sampling can be used to generate random samples from the mixture distribution (panel (a)):
(1) Draw a value of $z\in\{1,2,3\}$ from $p(\mathbf z)$ (for example, $z=3$, i.e. $z_{3}=1$, is chosen with probability $\pi_{3}$).
(2) Then draw a value of $\boldsymbol x$ from the conditional distribution $p(\boldsymbol x|z_{3}=1)=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{3},\boldsymbol{\Sigma}_{3})$.
 
If the information about $\mathbf z$ in panel (a) is discarded, we obtain the situation in panel (b). A small sampling sketch follows.
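Below is a minimal sketch of ancestral sampling (an illustration written for this note, not the post's code); the name `sample_joint`, the seed, and the parameter values are assumptions.

```python
import numpy as np

rng  = np.random.default_rng(0)
pi   = np.array([0.3, 0.5, 0.2])                        # mixing weights, sum to 1
mus  = [np.zeros(2), np.array([4.0, 0.0]), np.array([0.0, 4.0])]
covs = [np.eye(2)] * 3

def sample_joint(n):
    """Return n pairs (x, k): k ~ Cat(pi), then x ~ N(mu_k, Sigma_k)."""
    ks = rng.choice(len(pi), size=n, p=pi)              # step (1): pick a component
    xs = np.stack([rng.multivariate_normal(mus[k], covs[k]) for k in ks])   # step (2)
    return xs, ks

X, Z = sample_joint(500)   # keeping Z gives panel (a); dropping Z gives panel (b)
```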

Panel (b) shows $N=500$ samples $\{\boldsymbol x_{1},\boldsymbol x_{2},\cdots,\boldsymbol x_{N}\}$ drawn from $p(\boldsymbol x|\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\Sigma})$. If a latent variable $\mathbf z$ is introduced to indicate which component of the GMM each sample came from, these samples might in fact be the pairs shown in panel (a), $\{(\boldsymbol x_{1},z_{2}),(\boldsymbol x_{2},z_{3}),(\boldsymbol x_{3},z_{1}),\cdots,(\boldsymbol x_{N},z_{1})\}$. Here the 1-of-K representation is used: $z_{k}$ is shorthand for $z_{k}=1$, meaning that the sample was drawn from the $k$-th Gaussian component.

1. The 1-of-K representation of the latent variable

With the 1-of-K representation, the $K$ latent indicators are organized into a vector $\mathbf z=[z_{1},z_{2},\cdots,z_{K}]^{T}$ in which exactly one component equals 1 and all the others equal 0.

Taking the $K=3$ case of the figure above as an example:

1) The latent variable $z_{1}=1$, i.e. the latent vector $\mathbf z=[1,0,0]^{T}$, denotes the event "the 1st (red) component is selected"; the probability of this event is $P(z_{1}=1|\theta)=\pi_{1}$

2) The latent variable $z_{2}=1$, i.e. the latent vector $\mathbf z=[0,1,0]^{T}$, denotes the event "the 2nd (green) component is selected"; the probability of this event is $P(z_{2}=1|\theta)=\pi_{2}$

3) The latent variable $z_{3}=1$, i.e. the latent vector $\mathbf z=[0,0,1]^{T}$, denotes the event "the 3rd (blue) component is selected"; the probability of this event is $P(z_{3}=1|\theta)=\pi_{3}$

The (prior) probability of selecting the $k$-th component can thus be written uniformly as $P(z_{k}=1|\theta)=\pi_{k},\ \ k\in\{1,2,3\}$.
Expressed in terms of the latent vector $\mathbf z=[z_{1},z_{2},z_{3}]^{T}$, this becomes:

$$P(\mathbf z|\theta)=\prod_{k=1}^{3}\pi_{k}^{z_{k}}=\pi_{1}^{z_{1}}\cdot\pi_{2}^{z_{2}}\cdot\pi_{3}^{z_{3}},\qquad \mathbf z=\begin{bmatrix}z_{1}\\z_{2}\\z_{3}\end{bmatrix}\in\left\{\begin{bmatrix}1\\0\\0\end{bmatrix},\begin{bmatrix}0\\1\\0\end{bmatrix},\begin{bmatrix}0\\0\\1\end{bmatrix}\right\}$$

For a GMM with $K$ components, with latent vector $\mathbf z=[z_{1},z_{2},\cdots,z_{K}]^{T}$, this is:

$$P(\mathbf z|\theta)=\prod_{k=1}^{K}\pi_{k}^{z_{k}}=\pi_{1}^{z_{1}}\cdot\pi_{2}^{z_{2}}\cdots\pi_{K}^{z_{K}},\qquad \mathbf z=\begin{bmatrix}z_{1}\\z_{2}\\z_{3}\\\vdots\\z_{K}\end{bmatrix}\in\left\{\begin{bmatrix}1\\0\\0\\\vdots\\0\end{bmatrix},\begin{bmatrix}0\\1\\0\\\vdots\\0\end{bmatrix},\cdots,\begin{bmatrix}0\\0\\\vdots\\0\\1\end{bmatrix}\right\}$$
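The following tiny sketch (an illustration, not the post's code; the names and values are assumptions) draws a one-hot latent vector and evaluates $P(\mathbf z|\theta)=\prod_k\pi_k^{z_k}$, which simply picks out the selected $\pi_k$.

```python
import numpy as np

rng = np.random.default_rng(0)
pi  = np.array([0.2, 0.5, 0.3])        # mixing weights for K = 3
k   = rng.choice(len(pi), p=pi)        # index of the selected component
z   = np.eye(len(pi), dtype=int)[k]    # one-hot latent vector, e.g. [0, 1, 0]
p_z = np.prod(pi ** z)                 # equals pi[k]
print(z, p_z)
```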


2. Expressing the probability of the observation $\boldsymbol x$ via the latent vector $\mathbf z$

From panel (a) it is clear that:

1) The red observations $\boldsymbol x$ are all generated by the 1st (red) component, with probability $P(\boldsymbol x|z_{1}=1,\theta)=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{1},\boldsymbol{\Sigma}_{1})$

2) The green observations $\boldsymbol x$ are all generated by the 2nd (green) component, with probability $P(\boldsymbol x|z_{2}=1,\theta)=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{2},\boldsymbol{\Sigma}_{2})$

3) The blue observations $\boldsymbol x$ are all generated by the 3rd (blue) component, with probability $P(\boldsymbol x|z_{3}=1,\theta)=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{3},\boldsymbol{\Sigma}_{3})$
Therefore, the component distributions indicated by the differently colored samples in panel (a) (here $K=3$) can be written uniformly as:

$$P(\boldsymbol x|z_{k}=1,\theta)=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k}),\qquad k=1,\cdots,K$$
Expressed in terms of the latent vector $\mathbf z=[z_{1},z_{2},\cdots,z_{K}]^{T}$ (1-of-K, as above), this is:
$$\begin{aligned} P(\boldsymbol{x}|\mathbf z,\theta) &= \prod_{k=1}^{K} P(\boldsymbol{x}|z_{k}=1,\theta)^{z_{k}} \\ &=P(\boldsymbol{x}|z_{1}=1,\theta)^{z_{1}}\cdot P(\boldsymbol{x}|z_{2}=1,\theta)^{z_{2}} \cdots P(\boldsymbol{x}|z_{K}=1,\theta)^{z_{K}}\\ &=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{1},\boldsymbol{\Sigma}_{1})^{z_{1}}\cdot\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{2},\boldsymbol{\Sigma}_{2})^{z_{2}} \cdots\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{K},\boldsymbol{\Sigma}_{K})^{z_{K}} \\ &= \prod_{k=1}^{K} \left[\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right]^{z_{k}} \end{aligned}$$
Combining this with $P(\mathbf z|\theta)=\displaystyle\prod_{k=1}^{K}\pi_{k}^{z_{k}}$,

we obtain: $p(\boldsymbol x)=\displaystyle\sum_{\mathbf z}p(\mathbf z)\,p(\boldsymbol x|\mathbf z)=\displaystyle\sum_{k=1}^{K}\pi_{k}\,\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})$

For the full sample set $\mathbf X=\{\boldsymbol x_{1},\boldsymbol x_{2},\cdots,\boldsymbol x_{N}\}$ with corresponding latent vectors $\mathbf Z=\{\mathbf z_{1},\mathbf z_{2},\cdots,\mathbf z_{N}\}$, we have:

$$p(\mathbf X|\mathbf Z,\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi})=\prod_{n=1}^{N}\prod_{k=1}^{K}\left[\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right]^{z_{nk}}$$

and consequently:

$$p(\mathbf X,\mathbf Z|\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi})=\prod_{n=1}^{N}\prod_{k=1}^{K}\pi_{k}^{z_{nk}}\left[\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right]^{z_{nk}}$$

where $z_{nk}$ denotes the $k$-th component (0 or 1) of the latent vector $\mathbf z_{n}$ associated with sample $\boldsymbol x_{n}$.


3. The log-likelihood function

If the samples are given as in panel (a), i.e. the latent variables $\mathbf Z$ (equivalently, all $z_{nk}$) are known, the unknown parameters $(\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi})$ can be obtained as the maximum-likelihood solution of $\ln p(\mathbf X,\mathbf Z|\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi})$.

$$\ln p(\mathbf X,\mathbf Z|\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi})=\sum_{n=1}^{N}\sum_{k=1}^{K}z_{nk}\left\{\ln\pi_{k}+\ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right\}$$

When all $z_{nk}$ are known, the samples $\mathbf X=\{\boldsymbol x_{1},\boldsymbol x_{2},\cdots,\boldsymbol x_{N}\}$ can be split into $K$ groups: every sample $\boldsymbol x_{n}$ in the $k$-th group $C_{k}=\{n\mid z_{nk}=1,\ z_{nj}=0\ \ \forall j\neq k\}$ is drawn from $\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})$, and the log-likelihood can be written as:

$$\begin{aligned}\ln p(\mathbf X,\mathbf Z|\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi}) &= \sum_{n\in C_{1}}\left\{\ln\pi_{1}+ \ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{1},\boldsymbol{\Sigma}_{1})\right\} \\ &\quad+\cdots \\ &\quad+\sum_{n\in C_{k}}\left\{\ln\pi_{k}+ \ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right\}\\ &\quad+\cdots \\ &\quad+\sum_{n\in C_{K}}\left\{\ln\pi_{K}+ \ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{K},\boldsymbol{\Sigma}_{K})\right\} \end{aligned}$$

Since the $K$ groups are mutually independent, it suffices to maximize each group's contribution to the log-likelihood separately. For the $k$-th group (assumed to contain $N_{k}$ samples):

$$\begin{aligned}\ln p(\mathbf X_{k},\mathbf Z_{k}|\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi}) &= \sum_{n\in C_{k}}\left\{\ln\pi_{k}+ \ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right\} \\ &= N_{k}\ln\pi_{k}+ \sum_{n\in C_{k}}\ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k}) \\ &= N_{k}\ln\pi_{k}-\frac{N_{k}D}{2}\ln(2\pi)-\frac{N_{k}}{2}\ln|\boldsymbol{\Sigma}_{k}|-\frac{1}{2}\sum_{n\in C_{k}}(\boldsymbol{x}_{n}-\boldsymbol{\mu}_{k})^{T}\boldsymbol{\Sigma}_{k}^{-1}(\boldsymbol{x}_{n}-\boldsymbol{\mu}_{k}) \end{aligned}$$

Maximizing with respect to $\boldsymbol{\mu}_{k}$ and $\boldsymbol{\Sigma}_{k}$ gives:

$$\hat{\boldsymbol{\mu}}_{k}=\frac{1}{N_{k}}\sum_{n\in C_{k}}\boldsymbol{x}_{n}=\frac{1}{N_{k}}\sum_{n=1}^{N}z_{nk}\,\boldsymbol{x}_{n}$$

$$\hat{\boldsymbol{\Sigma}}_{k}=\frac{1}{N_{k}}\sum_{n\in C_{k}}(\boldsymbol{x}_{n}-\hat{\boldsymbol{\mu}}_{k})(\boldsymbol{x}_{n}-\hat{\boldsymbol{\mu}}_{k})^{T}=\frac{1}{N_{k}}\sum_{n=1}^{N}z_{nk}(\boldsymbol{x}_{n}-\hat{\boldsymbol{\mu}}_{k})(\boldsymbol{x}_{n}-\hat{\boldsymbol{\mu}}_{k})^{T}$$

where $N_{k}=\displaystyle\sum_{n=1}^{N}z_{nk}$.
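A minimal sketch of this complete-data MLE (written for this note; the function name `complete_data_mle` is an assumption): when the one-hot labels $z_{nk}$ are known, each component is fit from its own group of samples.

```python
import numpy as np

def complete_data_mle(X, Z):
    """X: (N, D) samples; Z: (N, K) one-hot labels. Returns (pi_hat, mu_hat, Sigma_hat)."""
    N, D = X.shape
    Nk = Z.sum(axis=0)                                   # N_k = sum_n z_nk
    pi_hat = Nk / N                                      # pi_k = N_k / N
    mu_hat = (Z.T @ X) / Nk[:, None]                     # mu_k = (1/N_k) sum_n z_nk x_n
    Sigma_hat = np.empty((Z.shape[1], D, D))
    for k in range(Z.shape[1]):
        diff = X - mu_hat[k]                             # rows are (x_n - mu_k)
        Sigma_hat[k] = (Z[:, k, None] * diff).T @ diff / Nk[k]
    return pi_hat, mu_hat, Sigma_hat
```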


4. The expectation of the log-likelihood

Since all latent variables $\mathbf Z=\{\mathbf z_{1},\mathbf z_{2},\cdots,\mathbf z_{N}\}$ are unknown, the EM algorithm "guesses" them by taking the expectation of the complete-data log-likelihood $\ln p(\mathbf X,\mathbf Z|\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi})$.

The expected log-likelihood is:

$$\begin{aligned}Q(\theta,\theta^{(i)})&=E_{p(\mathbf Z|\mathbf X,\theta^{(i)})}[\ln p(\mathbf X,\mathbf Z|\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi})] \\ &= E_{p(\mathbf Z|\mathbf X,\theta^{(i)})}\left\{\sum_{n=1}^{N}\sum_{k=1}^{K} z_{nk}\left\{\ln\pi_{k}+ \ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right\}\right\} \\ &= \sum_{n=1}^{N}\sum_{k=1}^{K} E_{p(\mathbf Z|\mathbf X,\theta^{(i)})}[z_{nk}] \left\{\ln\pi_{k}+\ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right\} \end{aligned}$$
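Once the expectations $E[z_{nk}]$ (the responsibilities computed below) are available, $Q(\theta,\theta^{(i)})$ is just a weighted sum. A minimal sketch (the name `q_value` and the argument layout are assumptions):

```python
import numpy as np
from scipy.stats import multivariate_normal

def q_value(X, gamma, pi, mus, covs):
    """Q = sum_n sum_k gamma[n,k] * (log pi_k + log N(x_n | mu_k, Sigma_k))."""
    K = len(pi)
    log_terms = np.column_stack([
        np.log(pi[k]) + multivariate_normal.logpdf(X, mean=mus[k], cov=covs[k])
        for k in range(K)])
    return np.sum(gamma * log_terms)
```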

Assume, as before, that each observation in $\mathbf X=\{\boldsymbol x_{1},\boldsymbol x_{2},\cdots,\boldsymbol x_{N}\}$ is generated as follows:
1) First, with probability $P(z_{k}=1|\theta)=\pi_{k},\ k\in\{1,\cdots,K\}$, set $z_{k}=1,\ z_{j}=0\ (\forall j\neq k)$, i.e. select the $k$-th component $\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})$;
2) Then draw the sample $\boldsymbol x$ from the selected component $\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})$.

If every observation $\boldsymbol x_{n}$ is generated independently in this way, then $\boldsymbol x_{n}$ depends only on its own latent vector $\mathbf z_{n}$.

So what we need to compute is:

$$E_{p(\mathbf Z|\mathbf X,\theta^{(i)})}[z_{nk}]=\sum_{z_{nk}\in\{0,1\}} z_{nk}\, p(\mathbf z_{n}|\boldsymbol x_{n},\theta^{(i)})$$

By Bayes' rule,

$$\begin{aligned}p(\mathbf z_{n}|\boldsymbol x_{n},\theta) &=\frac{p(\mathbf z_{n},\boldsymbol x_{n}|\theta)}{p(\boldsymbol x_{n}|\theta)} \\ &=\frac{p(\boldsymbol x_{n}|\mathbf z_{n},\theta)\,p(\mathbf z_{n}|\theta)}{\sum\limits_{\mathbf z_{n}} p(\boldsymbol x_{n}|\mathbf z_{n},\theta)\,p(\mathbf z_{n}|\theta)},\qquad \mathbf z_{n}=\begin{bmatrix}z_{n1}\\z_{n2}\\z_{n3}\\\vdots\\z_{nK}\end{bmatrix}\in\left\{\begin{bmatrix}1\\0\\0\\\vdots\\0\end{bmatrix},\begin{bmatrix}0\\1\\0\\\vdots\\0\end{bmatrix},\cdots,\begin{bmatrix}0\\0\\\vdots\\0\\1\end{bmatrix}\right\} \\ &=\frac{p(\boldsymbol x_{n}|z_{nk}=1,\theta)\,p(z_{nk}=1|\theta)}{\sum\limits_{j=1}^{K} p(\boldsymbol x_{n}|z_{nj}=1,\theta)\,p(z_{nj}=1|\theta)} \\ &=\frac{\pi_{k}\,p(\boldsymbol x_{n}|z_{nk}=1,\theta)}{\sum\limits_{j=1}^{K} \pi_{j}\,p(\boldsymbol x_{n}|z_{nj}=1,\theta)} \end{aligned}$$

it follows that:

$$\begin{aligned}E_{p(\mathbf Z|\mathbf X,\theta^{(i)})}[z_{nk}] &=\sum_{z_{nk}\in\{0,1\}} z_{nk}\, p(\mathbf z_{n}|\boldsymbol x_{n},\theta^{(i)}) \\ &=\frac{1\cdot p(\boldsymbol x_{n}|z_{nk}=1,\theta^{(i)})\,p(z_{nk}=1|\theta^{(i)})+0\cdot p(\boldsymbol x_{n}|z_{nk}=0,\theta^{(i)})\,p(z_{nk}=0|\theta^{(i)})}{\sum\limits_{j=1}^{K} p(\boldsymbol x_{n}|z_{nj}=1,\theta^{(i)})\,p(z_{nj}=1|\theta^{(i)})} \\ &=\frac{\pi_{k}\,p(\boldsymbol x_{n}|z_{nk}=1,\theta^{(i)})}{\sum\limits_{j=1}^{K} \pi_{j}\,p(\boldsymbol x_{n}|z_{nj}=1,\theta^{(i)})} \\ &=\frac{\pi_{k}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})}{\sum\limits_{j=1}^{K}\pi_{j}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{j},\boldsymbol{\Sigma}_{j})},\qquad k\in\{1,\cdots,K\} \end{aligned}$$

5. The E-step

For the data drawn as in panel (b) we do not know the values of the latent vectors $\mathbf z_{n}$, so, given the current model parameters $(\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi})$, we compute the expectation of $z_{nk}$:

$$E_{p(\mathbf Z|\mathbf X,\theta^{(i)})}[z_{nk}]=\sum_{z_{nk}\in\{0,1\}} z_{nk}\, p(\mathbf z_{n}|\boldsymbol x_{n},\theta^{(i)})=\frac{\pi_{k}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})}{\sum\limits_{j=1}^{K}\pi_{j}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{j},\boldsymbol{\Sigma}_{j})},\qquad k\in\{1,\cdots,K\}$$

and substitute it for $z_{nk}$ when evaluating $Q(\theta,\theta^{(i)})$. Writing

$$\gamma(z_{nk})=\frac{\pi_{k}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})}{\sum\limits_{j=1}^{K}\pi_{j}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{j},\boldsymbol{\Sigma}_{j})},\qquad k\in\{1,\cdots,K\}$$

the expected log-likelihood becomes:

$$Q(\theta,\theta^{(i)})=\sum_{n=1}^{N}\sum_{k=1}^{K}\gamma(z_{nk})\left\{\ln\pi_{k}+\ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right\}$$
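A minimal E-step sketch (illustrative only; `e_step` and the argument layout are assumptions): `gamma[n, k]` is the responsibility $\gamma(z_{nk})$ of component $k$ for sample $\boldsymbol x_{n}$ under the current parameters.

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, pi, mus, covs):
    """Return the (N, K) responsibility matrix gamma(z_nk)."""
    K = len(pi)
    weighted = np.column_stack([
        pi[k] * multivariate_normal.pdf(X, mean=mus[k], cov=covs[k])   # pi_k * N(x_n | mu_k, Sigma_k)
        for k in range(K)])
    return weighted / weighted.sum(axis=1, keepdims=True)              # normalize over k
```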


6. The M-step

Maximizing $Q(\theta,\theta^{(i)})$ with respect to $\boldsymbol{\mu}_{k}$ and $\boldsymbol{\Sigma}_{k}$ gives:

$$\hat{\boldsymbol{\mu}}_{k}=\frac{1}{N_{k}}\sum_{n=1}^{N}\gamma(z_{nk})\,\boldsymbol{x}_{n}$$

$$\hat{\boldsymbol{\Sigma}}_{k}=\frac{1}{N_{k}}\sum_{n=1}^{N}\gamma(z_{nk})(\boldsymbol{x}_{n}-\hat{\boldsymbol{\mu}}_{k})(\boldsymbol{x}_{n}-\hat{\boldsymbol{\mu}}_{k})^{T}$$

where $N_{k}=\displaystyle\sum_{n=1}^{N}\gamma(z_{nk})$.

Solving for $\pi_{k}$: to maximize $Q(\theta,\theta^{(i)})$ while satisfying the constraint $\sum\limits_{k}\pi_{k}=1$, we use a Lagrange multiplier:

$$\max\ \left\{\sum_{n=1}^{N}\sum_{k=1}^{K}\gamma(z_{nk})\left\{\ln\pi_{k}+\ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right\}+\lambda\Big(\sum_{k}\pi_{k}-1\Big)\right\}$$

\qquad\qquad π k \pi_{k} πk 求偏导:

$$\sum_{n=1}^{N}\frac{\gamma(z_{nk})}{\pi_{k}}+\lambda=0$$

Multiplying both sides by $\pi_{k}$:

$$\sum_{n=1}^{N}\gamma(z_{nk})+\pi_{k}\lambda=0$$

\qquad\qquad π k \pi_{k} πk 求和:

$$\sum_{k=1}^{K}\sum_{n=1}^{N}\gamma(z_{nk})+\sum_{k=1}^{K}\pi_{k}\lambda=0$$

Since

$$\begin{aligned}\sum_{k=1}^{K}\sum_{n=1}^{N}\gamma(z_{nk}) &=\sum_{k=1}^{K}\sum_{n=1}^{N}\frac{\pi_{k}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})}{\sum\limits_{j=1}^{K}\pi_{j}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{j},\boldsymbol{\Sigma}_{j})} \\ &=\sum_{n=1}^{N}\frac{\sum\limits_{k=1}^{K}\pi_{k}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})}{\sum\limits_{j=1}^{K}\pi_{j}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{j},\boldsymbol{\Sigma}_{j})} \\ &= N \end{aligned}$$

we obtain $\lambda=-N$,

and therefore: $\pi_{k}=\dfrac{1}{N}\displaystyle\sum_{n=1}^{N}\gamma(z_{nk})=\dfrac{N_{k}}{N}$
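Putting the updates together, here is a minimal M-step sketch (illustrative; `m_step` pairs with the `e_step` sketch above, and the full program below implements the same updates in a more explicit loop style):

```python
import numpy as np

def m_step(X, gamma):
    """X: (N, D) samples; gamma: (N, K) responsibilities. Returns updated (pi, mus, covs)."""
    N, D = X.shape
    Nk = gamma.sum(axis=0)                               # N_k = sum_n gamma(z_nk); note Nk.sum() == N
    pi = Nk / N                                          # pi_k = N_k / N
    mus = (gamma.T @ X) / Nk[:, None]                    # mu_k = (1/N_k) sum_n gamma(z_nk) x_n
    covs = np.empty((gamma.shape[1], D, D))
    for k in range(gamma.shape[1]):
        diff = X - mus[k]
        covs[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
    return pi, mus, covs
```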


Code implementation

```python

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

def gen_gausssian(num):
    # Draw `num` points from each of three 2-D Gaussian components and attach an integer label.
    mean1 = [1,2]
    cov1 = [[5,3],[3,10]]
    data1 = np.random.multivariate_normal(mean1,cov1,num)
    label1 = np.zeros((1,num)).T
    data11 = np.concatenate([data1,label1],axis=1)
    
    mean2 = [9,7]
    cov2 = [[10,-3],[-3,5]]
    data2 = np.random.multivariate_normal(mean2,cov2,num)
    label2 = np.ones((1,num)).T
    data22 = np.concatenate([data2,label2],axis=1)    
    
    mean3 = [8,0]
    cov3 = [[9,2],[2,5]]
    data3 = np.random.multivariate_normal(mean3,cov3,num)
    label3 = 2*np.ones((1,num)).T
    data33 = np.concatenate([data3,label3],axis=1)
    
    data = np.concatenate([data11,data22,data33],axis=0)
    
    return np.round(data,4),data1,data2,data3,data11,data22,data33

def show_all_scatter(data):    
    data0 = data[:,0:2]
    x,y = data0.T
    plt.scatter(x,y,c='k',s=3)    
    
    plt.axis()
    plt.title("p(x)")
    plt.xlabel("x")
    plt.ylabel("y")
    
def show_scatter(data1,data2,data3):
    x1,y1 = data1.T
    x2,y2 = data2.T
    x3,y3 = data3.T
    plt.scatter(x1,y1,c='b',s=3)
    plt.scatter(x2,y2,c='r',s=3)
    plt.scatter(x3,y3,c='g',s=3)    
   
    plt.axis()
    plt.title("p(x,z)")
    plt.xlabel("x")
    plt.ylabel("y")
    
def gmm(indata,pi,mu,cov,K):
    # EM for a K-component GMM; pi, mu, cov hold the initial parameter guesses.
    pi0 = pi
    mu0 = mu
    cov0 = cov
    flag = 1
    iter = 0
    while flag:
        # E step: t1[n,k] = N(x_n | mu_k, Sigma_k), t2[n,k] = pi_k
        t1 = np.zeros((indata.shape[0],K))
        t2 = np.tile(pi0,(indata.shape[0],1))
        for k in range(K):
            t1[:,k] = multivariate_normal.pdf(indata, mu0[k], cov0[k])
            
        mu1 = mu0                                        # previous means, kept for the convergence test
        cov1 = cov0
        
        t3 = t1*t2                                       # pi_k * N(x_n | mu_k, Sigma_k)
        gamma = np.transpose(t3.T/np.sum(t3,axis=1))     # responsibilities gamma(z_nk)
        
        # M step
        nk = np.sum(gamma,axis=0)                        # N_k = sum_n gamma(z_nk)
        pi0 = nk/indata.shape[0]                         # pi_k = N_k / N
        mu0 = np.transpose(np.dot(indata.T,gamma)/nk)    # mu_k = (1/N_k) sum_n gamma(z_nk) x_n
        for k in range(K):
            t = np.zeros((2,2))
            for n in range(indata.shape[0]):
                t = t + np.outer(indata[n,:] - mu0[k,:], indata[n,:] - mu0[k,:])*gamma[n,k]
            cov0[k] = t/nk[k]
            
        # Stop once the means have essentially stopped moving.
        if np.sum(np.abs(mu0-mu1)) < 1e-3:
            flag = 0
            
        iter = iter + 1
        
    print('pi:\n',pi0)
    print('mu:\n',mu0)
    print('cov:\n',cov0)
    
if __name__ == '__main__':
    
    data,data1,data2,data3,data11,data22,data33 = gen_gausssian(1000)
    plt.figure(1)
    show_scatter(data1,data2,data3)
    plt.figure(2)
    show_all_scatter(data)    
    plt.show()
    
    indata = data[:,0:2]
    # Initialization: spread the starting means within the data range, use diagonal
    # covariances of different scales, and random (normalized) mixing weights.
    d_min = np.ceil(np.min(indata))
    d_max = np.floor(np.max(indata))
    K = 3
    mean = np.zeros((3,2))
    mean[0,:] = np.array([d_min+1, (d_min+d_max)/2.0])
    mean[1,:] = np.array([d_max-1, d_min+1])
    mean[2,:] = np.array([d_max-1, d_max-1])
    cov = np.zeros((3,2,2))
    cov[0,:,:] = np.eye(2)
    cov[1,:,:] = 3*np.eye(2)
    cov[2,:,:] = 5*np.eye(2) 
    pi = np.random.rand(3)
    pi = pi/np.sum(pi)
    
    gmm(indata,pi,mean,cov,K)

```

[Figure: scatter plots of the generated samples, with panel titles "p(x,z)" (colored by component) and "p(x)"]

The program assumes three Gaussian components (1000 points drawn from each, i.e. $\pi_{k}=0.3333$):

$$\boldsymbol{\mu}_{1}=\begin{bmatrix}1\\2\end{bmatrix},\qquad \boldsymbol{\Sigma}_{1}=\begin{bmatrix}5&3\\3&10\end{bmatrix}$$

$$\boldsymbol{\mu}_{2}=\begin{bmatrix}9\\7\end{bmatrix},\qquad \boldsymbol{\Sigma}_{2}=\begin{bmatrix}10&-3\\-3&5\end{bmatrix}$$

$$\boldsymbol{\mu}_{3}=\begin{bmatrix}8\\0\end{bmatrix},\qquad \boldsymbol{\Sigma}_{3}=\begin{bmatrix}9&2\\2&5\end{bmatrix}$$

The output of one run of the program:

```
pi:
 [0.2724863  0.39993072 0.32758298]
mu:
 [[ 0.38057473  2.20601868]
  [ 7.69370185 -0.01075563]
  [ 8.44929411  7.27392284]]
cov:
 [[[ 3.45907872  2.43860616]
   [ 2.43860616  8.10405857]]

  [[11.99347758  3.83693019]
   [ 3.83693019  5.45452493]]

  [[ 9.14835616 -0.6767731 ]
   [-0.6767731   3.86528346]]]
```
