Detailed Derivation of the Gaussian Mixture Model (3)

1. Review of the Multivariate Gaussian Distribution

The previous post introduced the formulas of the Gaussian mixture model, but how are the per-iteration update formulas actually derived? With that question in mind I wrote this post, hoping to record the derivation. The density of the multivariate Gaussian distribution is:
$$
N(\vec{X}\mid\vec{\mu},\Sigma)=\frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}}\exp\!\left(-\frac{(\vec{X}-\vec{\mu})^{T}\Sigma^{-1}(\vec{X}-\vec{\mu})}{2}\right)
$$
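To make the density concrete, here is a minimal NumPy sketch of this formula (the function name `gaussian_pdf` is mine, not from the post; it inverts $\Sigma$ directly, which is fine for illustration but not the numerically preferred route for large $D$):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at the point x, following the formula above."""
    d = x.shape[0]
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.linalg.det(sigma) ** 0.5
    expo = -0.5 * diff @ np.linalg.inv(sigma) @ diff   # -(x-mu)^T Sigma^{-1} (x-mu) / 2
    return np.exp(expo) / norm

# At the mean of a standard 2-D Gaussian the density is 1/(2*pi).
print(gaussian_pdf(np.zeros(2), np.zeros(2), np.eye(2)))
```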

2. Maximum Likelihood Estimation

Solving a Gaussian mixture model means estimating all of its parameters: we iterate on the three groups of parameters $\pi_k$, $\mu_k$, $\Sigma_k$ until they stabilize, and the stabilized values define the fitted mixture.
For parameter estimation of a mixture model, the first idea that comes to mind is maximum likelihood estimation, so we start there. The likelihood is derived as follows:
$$
P=\prod_{i=1}^{N}p(\vec{x_i}\mid\pi,\vec{\mu},\Sigma)
=\prod_{i=1}^{N}\sum_{k=1}^{K}\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)
=\prod_{i=1}^{N}\sum_{k=1}^{K}\pi_k\frac{1}{(2\pi)^{D/2}|\Sigma_k|^{1/2}}\exp\!\left(-\frac{(\vec{x_i}-\vec{\mu_k})^{T}\Sigma_k^{-1}(\vec{x_i}-\vec{\mu_k})}{2}\right)\\
=\sum_{k=1}^{K}\pi_k N(\vec{x_1}\mid\vec{\mu_k},\Sigma_k)\cdot\sum_{k=1}^{K}\pi_k N(\vec{x_2}\mid\vec{\mu_k},\Sigma_k)\cdots\sum_{k=1}^{K}\pi_k N(\vec{x_N}\mid\vec{\mu_k},\Sigma_k)
$$

3. The E-Step of the EM Algorithm

Ideally, each sample would be generated by exactly one mixture component, and that component corresponds to the cluster the sample is assigned to. If sample $x_i$ comes only from the $k$-th component, then $p(z_i=k\mid x_i)=1$ and $p(z_i=j\mid x_i)=0$ for every $j\neq k$. But since we do not know this ideal assignment in advance, we can only use the observed data to estimate the probability that each sample was generated by each component. That probability is the value expressed by the following formula:
$$
E(h_{ik}\mid x_i)=0\cdot p(h_{ik}=0\mid x_i)+1\cdot p(h_{ik}=1\mid x_i)
=p(h_{ik}=1\mid x_i)
=p(z_i=k\mid x_i)
=\frac{\pi_k\,p(x_i\mid\mu_k,\Sigma_k)}{\sum_{k=1}^{K}\pi_k\,p(x_i\mid\mu_k,\Sigma_k)}
$$
A few notes on this formula:

1. The random variable $h_{ik}$ indicates whether sample $x_i$ was generated by the $k$-th mixture component.
2. $h_{ik}$ is the latent (hidden) variable of the EM algorithm.

If sample $x_i$ is generated by the $k$-th mixture component, $h_{ik}$ is recorded as 1; otherwise it is 0. By this definition, among $h_{i1},h_{i2},h_{i3},\dots,h_{iK}$ exactly one is 1 (the sample is generated by a single component) and all the rest are 0. Its probability distribution is listed below:

| $h_{ik}$ | 0 | 1 |
| --- | --- | --- |
| probability | $p(h_{ik}=0\mid x_i)$ | $p(h_{ik}=1\mid x_i)$ |

The formula above is simply the expectation of the latent variable computed from this distribution.
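The whole E-step is just evaluating $E(h_{ik}\mid x_i)$ for every sample and component. A minimal sketch, assuming the data sit in an $N\times D$ array and parameters are given as lists (the function name `e_step` is mine):

```python
import numpy as np

def e_step(X, pis, mus, sigmas):
    """Responsibilities h[i, k] = pi_k N(x_i|mu_k,Sigma_k) / sum_k pi_k N(x_i|mu_k,Sigma_k)."""
    N, D = X.shape
    K = len(pis)
    h = np.zeros((N, K))
    for k in range(K):
        diff = X - mus[k]                                                  # (N, D)
        quad = np.einsum('nd,de,ne->n', diff, np.linalg.inv(sigmas[k]), diff)
        norm = (2 * np.pi) ** (D / 2) * np.linalg.det(sigmas[k]) ** 0.5
        h[:, k] = pis[k] * np.exp(-0.5 * quad) / norm                      # numerator
    return h / h.sum(axis=1, keepdims=True)                                # normalize over k

X = np.array([[0.0, 0.0], [5.0, 5.0]])
h = e_step(X, [0.5, 0.5], [np.zeros(2), np.array([5.0, 5.0])], [np.eye(2), np.eye(2)])
```

Each row of `h` sums to 1, and a sample near a component's mean gets a responsibility near 1 for that component.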

4. The M-Step of the EM Algorithm

In the M-step we first take the logarithm of the likelihood and then maximize it. Expanding the likelihood and substituting the formula for $N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)$ gives:
$$
\ln P=\ln\!\left(\sum_{k=1}^{K}\pi_k N(\vec{x_1}\mid\vec{\mu_k},\Sigma_k)\cdot\sum_{k=1}^{K}\pi_k N(\vec{x_2}\mid\vec{\mu_k},\Sigma_k)\cdots\sum_{k=1}^{K}\pi_k N(\vec{x_N}\mid\vec{\mu_k},\Sigma_k)\right)\\
=\ln\!\left(\sum_{k=1}^{K}\pi_k N(\vec{x_1}\mid\vec{\mu_k},\Sigma_k)\right)+\ln\!\left(\sum_{k=1}^{K}\pi_k N(\vec{x_2}\mid\vec{\mu_k},\Sigma_k)\right)+\cdots+\ln\!\left(\sum_{k=1}^{K}\pi_k N(\vec{x_N}\mid\vec{\mu_k},\Sigma_k)\right)\\
=\sum_{i=1}^{N}\ln\!\left(\sum_{k=1}^{K}\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)\right)
$$
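The last line is the quantity one actually computes when monitoring EM convergence. A direct sketch (function name is mine; in production code one would work in log space with a log-sum-exp to avoid underflow, which I skip here for clarity):

```python
import numpy as np

def log_likelihood(X, pis, mus, sigmas):
    """ln P = sum_i ln( sum_k pi_k N(x_i | mu_k, Sigma_k) ), the last line above."""
    N, D = X.shape
    mix = np.zeros(N)                               # mixture density at each sample
    for pi_k, mu_k, sig_k in zip(pis, mus, sigmas):
        diff = X - mu_k
        quad = np.einsum('nd,de,ne->n', diff, np.linalg.inv(sig_k), diff)
        norm = (2 * np.pi) ** (D / 2) * np.linalg.det(sig_k) ** 0.5
        mix += pi_k * np.exp(-0.5 * quad) / norm
    return float(np.log(mix).sum())
```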
Since the parameters we want are $\pi_k$, $\mu_k$, $\Sigma_k$, the usual approach is to take the partial derivative with respect to each, set it to zero, and solve for the value at the maximum. Below I demonstrate the derivation for $\mu_k$.
Before starting, we need to recall two matrix-calculus identities:

1. If $A$ is an $n\times n$ matrix and $x$ is an $n$-dimensional column vector, then:
$$\frac{\partial(x^{T}Ax)}{\partial x}=(A+A^{T})x$$
2. In particular, when $A$ is an $n\times n$ symmetric matrix, $A=A^{T}$, and the identity simplifies to:
$$\frac{\partial(x^{T}Ax)}{\partial x}=(A+A^{T})x=2Ax$$
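These identities are easy to sanity-check numerically with central finite differences (the check below is mine, not part of the original post):

```python
import numpy as np

# Verify d(x^T A x)/dx = (A + A^T) x on a random 4x4 matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
x = rng.standard_normal(4)

f = lambda v: v @ A @ v
eps = 1e-6
# Perturb each coordinate in turn to approximate the gradient.
numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(4)])
analytic = (A + A.T) @ x
print(np.max(np.abs(numeric - analytic)))   # should be tiny
```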

With these identities in hand, we take the partial derivative of $\ln P$ with respect to $\mu_k$:
$$
\frac{\partial\ln P}{\partial\mu_k}
=\frac{\partial}{\partial\mu_k}\sum_{i=1}^{N}\ln\!\left(\sum_{k=1}^{K}\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)\right)\\
=\sum_{i=1}^{N}\left[\frac{1}{\sum_{k=1}^{K}\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}\cdot\frac{\partial}{\partial\mu_k}\sum_{k=1}^{K}\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)\right]\\
=\sum_{i=1}^{N}\left[\frac{1}{\sum_{k=1}^{K}\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}\cdot\frac{\partial\big(\pi_1 N(\vec{x_i}\mid\vec{\mu_1},\Sigma_1)+\pi_2 N(\vec{x_i}\mid\vec{\mu_2},\Sigma_2)+\cdots+\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)+\cdots+\pi_K N(\vec{x_i}\mid\vec{\mu_K},\Sigma_K)\big)}{\partial\mu_k}\right]\\
=\sum_{i=1}^{N}\left[\frac{1}{\sum_{k=1}^{K}\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}\cdot\frac{\partial\big(\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)\big)}{\partial\mu_k}\right]\\
=\sum_{i=1}^{N}\left[\frac{\pi_k}{\sum_{k=1}^{K}\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}\cdot\frac{\partial N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}{\partial\mu_k}\right]
$$
At this point we substitute the explicit formula for $N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)$; the result and the computation are as follows:
$$
=\sum_{i=1}^{N}\left[\frac{\pi_k}{\sum_{k=1}^{K}\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}\cdot\frac{\partial}{\partial\mu_k}\left(\frac{1}{(2\pi)^{D/2}|\Sigma_k|^{1/2}}e^{-\frac{(\vec{x_i}-\vec{\mu_k})^{T}\Sigma_k^{-1}(\vec{x_i}-\vec{\mu_k})}{2}}\right)\right]\\
=\sum_{i=1}^{N}\left[\frac{\pi_k\cdot\frac{1}{(2\pi)^{D/2}|\Sigma_k|^{1/2}}}{\sum_{k=1}^{K}\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}\cdot\frac{\partial}{\partial\mu_k}\,e^{-\frac{(\vec{x_i}-\vec{\mu_k})^{T}\Sigma_k^{-1}(\vec{x_i}-\vec{\mu_k})}{2}}\right]\\
=\sum_{i=1}^{N}\left[\frac{\pi_k\cdot\frac{1}{(2\pi)^{D/2}|\Sigma_k|^{1/2}}}{\sum_{k=1}^{K}\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}\cdot e^{-\frac{(\vec{x_i}-\vec{\mu_k})^{T}\Sigma_k^{-1}(\vec{x_i}-\vec{\mu_k})}{2}}\cdot\frac{\partial}{\partial\mu_k}\left(-\frac{(\vec{x_i}-\vec{\mu_k})^{T}\Sigma_k^{-1}(\vec{x_i}-\vec{\mu_k})}{2}\right)\right]
$$
The next step uses the vector-derivative identities introduced at the beginning of this section:
$$
=\sum_{i=1}^{N}\left[\frac{\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}{\sum_{k=1}^{K}\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}\cdot\left(-\frac{1}{2}\right)\cdot 2\,\Sigma_k^{-1}(\vec{x_i}-\vec{\mu_k})\cdot\frac{\partial(\vec{x_i}-\vec{\mu_k})}{\partial\mu_k}\right]\\
=\sum_{i=1}^{N}\left[\frac{\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}{\sum_{k=1}^{K}\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}\cdot\left(-\frac{1}{2}\right)\cdot 2\,\Sigma_k^{-1}(\vec{x_i}-\vec{\mu_k})\cdot(-1)\right]\\
=\sum_{i=1}^{N}\left[\frac{\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}{\sum_{k=1}^{K}\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}\cdot\Sigma_k^{-1}(\vec{x_i}-\vec{\mu_k})\right]\\
=\Sigma_k^{-1}\sum_{i=1}^{N}\left[\frac{\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}{\sum_{k=1}^{K}\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}\cdot(\vec{x_i}-\vec{\mu_k})\right]
$$
We now set this expression equal to 0.
To solve this equation, note that $\Sigma_k^{-1}$, the inverse of the $k$-th component's covariance matrix, is necessarily nonsingular. From linear algebra, if an $n\times n$ matrix $A$ is invertible and $x$ is an $n$-dimensional column vector, then $Ax=0$ has only the zero solution $x=0$.
Applied to our computation, this gives the following chain of equations:
$$
\Sigma_k^{-1}\sum_{i=1}^{N}\left[\frac{\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}{\sum_{k=1}^{K}\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}\cdot(\vec{x_i}-\vec{\mu_k})\right]=0
$$
Letting
$$
h_{ik}=\frac{\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}{\sum_{k=1}^{K}\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}
$$
this simplifies to
$$
\Sigma_k^{-1}\sum_{i=1}^{N}h_{ik}(\vec{x_i}-\vec{\mu_k})=0
$$
Solving for $\vec{\mu_k}$:
$$
\vec{\mu_k}=\frac{\sum_{i=1}^{N}h_{ik}\,\vec{x_i}}{\sum_{i=1}^{N}h_{ik}}
$$
Substituting $h_{ik}$ back in:
$$
\vec{\mu_k}=\frac{\sum_{i=1}^{N}\frac{\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}{\sum_{k=1}^{K}\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}\,\vec{x_i}}{\sum_{i=1}^{N}\frac{\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}{\sum_{k=1}^{K}\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}}
$$
This is the result we set out to prove:
$$
\vec{\mu_k}=\frac{\sum_{i=1}^{N}h_{ik}\,\vec{x_i}}{\sum_{i=1}^{N}h_{ik}}
=\frac{\sum_{i=1}^{N}\frac{\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}{\sum_{k=1}^{K}\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}\,\vec{x_i}}{\sum_{i=1}^{N}\frac{\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}{\sum_{k=1}^{K}\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}}
$$
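In code, this update is a single weighted average per component, assuming `h` is the $N\times K$ responsibility matrix from the E-step (the function name `update_means` is mine):

```python
import numpy as np

def update_means(X, h):
    """mu_k = sum_i h[i,k] * x_i / sum_i h[i,k], computed for every k at once."""
    return (h.T @ X) / h.sum(axis=0)[:, None]   # result has shape (K, D)

# With uniform responsibilities, every component mean equals the sample mean.
X = np.arange(12.0).reshape(6, 2)
mus = update_means(X, np.full((6, 3), 1 / 3))
```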
Here I have only shown the derivation for $\vec{\mu_k}$ and have not expanded the one for $\Sigma_k$, but the calculation follows the same principle, and its final result is:
$$
\Sigma_k=\frac{\sum_{i=1}^{N}h_{ik}(\vec{x_i}-\vec{\mu_k})(\vec{x_i}-\vec{\mu_k})^{T}}{\sum_{i=1}^{N}h_{ik}}
=\frac{\sum_{i=1}^{N}\frac{\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}{\sum_{k=1}^{K}\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}(\vec{x_i}-\vec{\mu_k})(\vec{x_i}-\vec{\mu_k})^{T}}{\sum_{i=1}^{N}\frac{\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}{\sum_{k=1}^{K}\pi_k N(\vec{x_i}\mid\vec{\mu_k},\Sigma_k)}}
$$
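The same pattern gives the covariance update in code. The sketch below also includes the standard mixing-weight update $\pi_k=\frac{1}{N}\sum_i h_{ik}$, which this post does not derive but which completes the M-step (function name and structure are mine):

```python
import numpy as np

def m_step(X, h):
    """Closed-form M-step updates given the N x K responsibility matrix h."""
    N, D = X.shape
    Nk = h.sum(axis=0)                       # effective sample count per component
    pis = Nk / N                             # mixing weights (standard update, not derived above)
    mus = (h.T @ X) / Nk[:, None]            # the mu_k formula derived above
    sigmas = np.empty((h.shape[1], D, D))
    for k in range(h.shape[1]):
        diff = X - mus[k]                    # (N, D)
        sigmas[k] = (h[:, k, None] * diff).T @ diff / Nk[k]   # the Sigma_k formula
    return pis, mus, sigmas
```

With `K = 1` and all responsibilities equal to 1, the updates reduce to the sample mean and the (biased) sample covariance, which is a quick correctness check.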

5. Reflections

The derivation is not easy, but working through it yourself really helps understanding. Keep going!
