GMM (Gaussian Mixture Model): Derivation of the Principles (Part 2)

Preface

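In the previous post, GMM (Gaussian Mixture Model): Derivation of the Principles (Part 1), the product was turned into a sum, so all that remains is to work out the probabilities involving $P(z,x)$.
Math background: [Probability and Mathematical Statistics Review - Bilibili]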

Derivation

Why write $P(z,x)$ rather than $P(z,x\mid\theta)$? Because $\theta$ is a parameter, not a random variable.
$$
P(z_i=C_k,x_i)=P(x_i\mid z_i=C_k)\,P(z_i=C_k)=p_k\,N(x_i\mid\mu_k,\Sigma_k)
$$
We do not need to expand $P(z\mid x)$ yet. Its parameters are fixed at $\theta^t$, whereas what we need are free variables, because we will differentiate and look for an extremum. $P(z\mid x,\theta^t)$ therefore behaves like a constant.
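As a small illustration of the joint probability $p_k\,N(x_i\mid\mu_k,\Sigma_k)$, here is a minimal sketch restricted to one dimension, where $\Sigma_k$ reduces to a scalar variance. The function names and the numbers are hypothetical and chosen only for this example.

```python
import math

def gaussian_pdf(x, mu, var):
    """Density of a 1-D normal distribution N(x | mu, var)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def joint(x, k, weights, means, variances):
    """P(z = C_k, x) = p_k * N(x | mu_k, var_k)."""
    return weights[k] * gaussian_pdf(x, means[k], variances[k])

# Illustrative parameters (assumed, not taken from the post).
weights = [0.3, 0.7]      # p_k, must sum to 1
means = [0.0, 4.0]        # mu_k
variances = [1.0, 2.0]    # 1-D stand-in for Sigma_k

x = 1.0
joints = [joint(x, k, weights, means, variances) for k in range(len(weights))]
marginal = sum(joints)    # P(x) = sum over k of P(z = C_k, x)
print(joints, marginal)
```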

Therefore
$$
\begin{aligned}
E_{P(Z\mid X,\theta^{t})}\left[\log P(Z,X\mid\theta)\right]=&\sum_{k=1}^K\sum_{i=1}^{n}\log\left[p_k\,N(x_i\mid\mu_k,\Sigma_k)\right]P(z_i=C_k\mid x_i,\theta^t)
\\=&\sum_{k=1}^K\sum_{i=1}^{n}\left[\log p_k+\log N(x_i\mid\mu_k,\Sigma_k)\right]P(z_i=C_k\mid x_i,\theta^t)
\end{aligned}
$$
We solve for $p_k$ first. It is subject to the constraint $\sum\limits_{k=1}^Kp_k=1$, so we construct the Lagrangian
$$
L(\theta,\lambda)=\sum_{k=1}^K\sum_{i=1}^{n}\left[\log p_k+\log N(x_i\mid\mu_k,\Sigma_k)\right]P(z_i=C_k\mid x_i,\theta^t)+\lambda\left[\sum_{k=1}^Kp_k-1\right]
$$
Differentiating with respect to $p_k$ and setting the derivative to zero:
$$
\frac{\partial L(\theta,\lambda)}{\partial p_k}=\sum_{i=1}^n\frac{1}{p_k}P(z_i=C_k\mid x_i,\theta^t)+\lambda=0
$$
Multiplying both sides by $p_k$:
$$
\sum_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)+\lambda p_k=0
$$
This holds for every $k=1,2,\cdots,K$:
$$
\begin{aligned}
&\sum_{i=1}^nP(z_i=C_1\mid x_i,\theta^t)+\lambda p_1=0
\\&\sum_{i=1}^nP(z_i=C_2\mid x_i,\theta^t)+\lambda p_2=0
\\&\quad\vdots
\end{aligned}
$$
Adding the $K$ equations together:
$$
\sum_{i=1}^nP(z_i=C_1\mid x_i,\theta^t)+\lambda p_1+\sum_{i=1}^nP(z_i=C_2\mid x_i,\theta^t)+\lambda p_2+\cdots+\sum_{i=1}^nP(z_i=C_K\mid x_i,\theta^t)+\lambda p_K=0
$$

$$
\begin{aligned}
&\sum_{k=1}^K\sum_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)+\sum_{k=1}^K\lambda p_k
\\=&\sum_{i=1}^n\sum_{k=1}^KP(z_i=C_k\mid x_i,\theta^t)+\lambda\sum_{k=1}^Kp_k
\\=&0
\end{aligned}
$$
Here we use $\sum\limits_{k=1}^Kp_k=1$ and $\sum\limits_{k=1}^KP(z_i=C_k\mid x_i,\theta^t)=1$.

So the sum finally reduces to
$$
\sum_{i=1}^n1+\lambda=0 \;\rightarrow\; n+\lambda=0 \;\rightarrow\; \lambda=-n
$$
Substituting $\lambda=-n$ into the earlier equation $\sum\limits_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)+\lambda p_k=0$ gives
$$
p_k=\frac{1}{n}\sum_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)
$$
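The update for $p_k$ is simply the column average of the posterior probabilities. A minimal sketch, assuming the posteriors $P(z_i=C_k\mid x_i,\theta^t)$ are already available as a nested list `resp` (a hypothetical name, with made-up values):

```python
def update_weights(resp):
    """resp[i][k] = P(z_i = C_k | x_i, theta^t); returns p_k = (1/n) * sum_i resp[i][k]."""
    n = len(resp)
    K = len(resp[0])
    return [sum(row[k] for row in resp) / n for k in range(K)]

# Illustrative posteriors for n = 4 samples and K = 2 components (each row sums to 1).
resp = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.6, 0.4],
    [0.5, 0.5],
]
new_weights = update_weights(resp)
print(new_weights)
```

Because every row of `resp` sums to 1, the total over all entries is $n$, which is exactly why $\lambda=-n$ makes the new weights sum to 1.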
With $p$ in hand, the next step is to solve for $\mu$ and $\Sigma$.

The probability density function of the (multivariate) normal distribution is
$$
f(x)=\frac{1}{(2\pi)^{\frac{d}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left\{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right\}
$$
where $d$ is the dimension of $x$.

To find the mean and covariance, first write the normal distribution inside the Lagrangian in its density form:
$$
\begin{aligned}
L(\theta,\lambda)=&\sum_{k=1}^K\sum_{i=1}^{n}\left[\log p_k+\log N(x_i\mid\mu_k,\Sigma_k)\right]P(z_i=C_k\mid x_i,\theta^t)+\lambda\left[\sum_{k=1}^Kp_k-1\right]
\\=&\sum_{k=1}^K\sum_{i=1}^{n}\left[\log p_k+\log\left[\frac{1}{(2\pi)^{\frac{d}{2}}|\Sigma_k|^{\frac{1}{2}}}\exp\left\{-\frac{1}{2}(x_i-\mu_k)^T\Sigma_k^{-1}(x_i-\mu_k)\right\}\right]\right]P(z_i=C_k\mid x_i,\theta^t)+\lambda\left[\sum_{k=1}^Kp_k-1\right]
\\=&\sum_{k=1}^K\sum_{i=1}^n\left[\log p_k+\log\frac{1}{(2\pi)^{\frac{d}{2}}|\Sigma_k|^{\frac{1}{2}}}+\log\left[\exp\left\{-\frac{1}{2}(x_i-\mu_k)^T\Sigma_k^{-1}(x_i-\mu_k)\right\}\right]\right]P(z_i=C_k\mid x_i,\theta^t)+\lambda\left[\sum_{k=1}^Kp_k-1\right]
\\=&\sum_{k=1}^K\sum_{i=1}^n\left[\log p_k+\log\frac{1}{(2\pi)^{\frac{d}{2}}|\Sigma_k|^{\frac{1}{2}}}-\frac{1}{2}(x_i-\mu_k)^T\Sigma_k^{-1}(x_i-\mu_k)\right]P(z_i=C_k\mid x_i,\theta^t)+\lambda\left[\sum_{k=1}^Kp_k-1\right]
\\=&\sum_{k=1}^K\sum_{i=1}^n\left[\log p_k-\frac{d}{2}\log(2\pi)-\frac{1}{2}\log|\Sigma_k|-\frac{1}{2}(x_i-\mu_k)^T\Sigma_k^{-1}(x_i-\mu_k)\right]P(z_i=C_k\mid x_i,\theta^t)+\lambda\left[\sum_{k=1}^Kp_k-1\right]
\end{aligned}
$$
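The expansion of the log density into $-\frac{d}{2}\log(2\pi)-\frac{1}{2}\log|\Sigma_k|-\frac{1}{2}(x_i-\mu_k)^T\Sigma_k^{-1}(x_i-\mu_k)$ can be sanity-checked numerically. The sketch below does this for $d=2$ with a hand-written $2\times2$ inverse; the helper names and the test point are assumptions made for the illustration.

```python
import math

def det2(S):
    """Determinant of a 2x2 matrix given as nested lists."""
    return S[0][0] * S[1][1] - S[0][1] * S[1][0]

def quad_form_inv(x, mu, S):
    """(x - mu)^T S^{-1} (x - mu) for a 2x2 matrix S."""
    det = det2(S)
    inv = [[S[1][1] / det, -S[0][1] / det], [-S[1][0] / det, S[0][0] / det]]
    diff = [x[0] - mu[0], x[1] - mu[1]]
    return sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))

x, mu = [1.0, -1.0], [0.0, 1.0]
S = [[2.0, 0.3], [0.3, 1.0]]
d = 2

# Direct route: evaluate the density, then take its logarithm.
pdf = math.exp(-0.5 * quad_form_inv(x, mu, S)) / ((2.0 * math.pi) ** (d / 2) * math.sqrt(det2(S)))
direct = math.log(pdf)

# Expanded route: the three terms of the last line of the derivation.
expanded = -0.5 * d * math.log(2.0 * math.pi) - 0.5 * math.log(det2(S)) - 0.5 * quad_form_inv(x, mu, S)

print(direct, expanded)
```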
Now differentiate the Lagrangian with respect to $\mu_k$. The matrix derivative formula we need is stated without proof:
$$
\frac{\partial(x^TAx)}{\partial x}=2Ax \quad (\text{assuming } A \text{ is symmetric})
$$
Matrix derivatives still obey the chain rule, so we can treat $(x_i-\mu_k)$ in $(x_i-\mu_k)^T\Sigma_k^{-1}(x_i-\mu_k)$ as $x$, differentiate the outer function, and then multiply by the derivative of the inner one. The factor $-\frac{1}{2}$, the factor $2$ from the formula, and the inner derivative $-1$ combine to $+1$:
$$
\begin{aligned}
\frac{\partial L(\theta,\lambda)}{\partial\mu_k}=&\sum_{i=1}^n\Sigma_k^{-1}(x_i-\mu_k)P(z_i=C_k\mid x_i,\theta^t)
\\=&\Sigma_k^{-1}\sum_{i=1}^n(x_i-\mu_k)P(z_i=C_k\mid x_i,\theta^t)
\\=&0
\end{aligned}
$$
Since $\Sigma_k^{-1}$ is invertible, this means
$$
\begin{aligned}
\sum_{i=1}^n(x_i-\mu_k)P(z_i=C_k\mid x_i,\theta^t)=&\sum_{i=1}^nx_iP(z_i=C_k\mid x_i,\theta^t)-\sum_{i=1}^n\mu_kP(z_i=C_k\mid x_i,\theta^t)
\\=&\sum_{i=1}^nx_iP(z_i=C_k\mid x_i,\theta^t)-\mu_k\sum_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)
\\=&0
\end{aligned}
$$
Rearranging gives
$$
\mu_k=\frac{\sum\limits_{i=1}^nx_iP(z_i=C_k\mid x_i,\theta^t)}{\sum\limits_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)}
$$
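The mean update is a posterior-weighted average of the samples. A minimal sketch for one component, with points stored as plain lists; `resp_k` (the posteriors of that component for each sample) and the data are hypothetical values for illustration:

```python
def update_mean(xs, resp_k):
    """Weighted mean: sum_i r_i * x_i / sum_i r_i, for d-dimensional points xs."""
    total = sum(resp_k)
    d = len(xs[0])
    return [sum(r * x[j] for r, x in zip(resp_k, xs)) / total for j in range(d)]

xs = [[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]]   # three 2-D samples
resp_k = [0.5, 0.25, 0.25]                  # P(z_i = C_k | x_i, theta^t) for one fixed k
mu_k = update_mean(xs, resp_k)

# First-order condition from the derivation: sum_i r_i * (x_i - mu_k) should vanish.
residual = [sum(r * (x[j] - mu_k[j]) for r, x in zip(resp_k, xs)) for j in range(2)]
print(mu_k, residual)
```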
Next, differentiate with respect to $\Sigma_k$.

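$\Sigma_k$ is a matrix. The derivative of a scalar with respect to a matrix can be found either by differentiating with respect to each entry or directly with the trace trick.

This post covers both approaches; use whichever you prefer.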

Method ①: entry-wise differentiation

First, two commonly used derivative formulas ($A$ is a matrix). They are not derived here; look them up online or in a textbook if you are interested.
$$
(\ln|A|)'=(A^{-1})^{T}; \\ (A^{-1})'=-A^{-1}A'A^{-1}
$$
Differentiating with respect to $\Sigma_k$:
$$
\begin{aligned}
\frac{\partial L(\theta,\lambda)}{\partial\Sigma_k}=&\sum_{i=1}^n\left[-\frac{1}{2}(\Sigma_k^{-1})^{T}+\frac{1}{2}(x_i-\mu_k)^T\Sigma_k^{-1}\Sigma_k'\Sigma_k^{-1}(x_i-\mu_k)\right]P(z_i=C_k\mid x_i,\theta^t)
\\=&\sum_{i=1}^n\left[-\frac{1}{2}\Sigma_k^{-1}+\frac{1}{2}(x_i-\mu_k)^T\Sigma_k^{-1}\Sigma_k'\Sigma_k^{-1}(x_i-\mu_k)\right]P(z_i=C_k\mid x_i,\theta^t)
\end{aligned}
\tag{1}
$$
Here $\Sigma_k'$ is informal shorthand for the derivative of $\Sigma_k$ with respect to its own entries, and the second line uses the symmetry $(\Sigma_k^{-1})^T=\Sigma_k^{-1}$.

Consider the term $(x_i-\mu_k)^T\Sigma_k^{-1}\Sigma_k'\Sigma_k^{-1}(x_i-\mu_k)$. $\Sigma_k$ is a covariance matrix, and we differentiate with respect to each of its entries. Let $A=\Sigma_k^{-1}(x_i-\mu_k)$, so that $(x_i-\mu_k)^T\Sigma_k^{-1}\Sigma_k'\Sigma_k^{-1}(x_i-\mu_k)=A^T\Sigma_k'A$. Treating $A$ as a constant, and with $i,j$ here indexing matrix entries rather than samples,
$$
\frac{\partial A^T\Sigma_kA}{\partial\Sigma_{ij}}=A_iA_j=(AA^T)_{ij}
$$
Why does this hold? Let's see. **(In what follows $\Sigma_k^{-1}$ is dropped, which does not affect the final result; it is put back in the later computation. The aim here is only to show that the formula above is reasonable.)** Superscripts denote vector components.
$$
A^T\Sigma_kA=
\begin{pmatrix} (x^1-\mu^1) & (x^2-\mu^2) \end{pmatrix}
\begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}
\begin{pmatrix} (x^1-\mu^1) \\ (x^2-\mu^2) \end{pmatrix}
$$
Now consider
$$
AA^{T}=
\begin{pmatrix} (x^1-\mu^1) \\ (x^2-\mu^2) \end{pmatrix}
\begin{pmatrix} (x^1-\mu^1) & (x^2-\mu^2) \end{pmatrix}
=\begin{pmatrix} (x^1-\mu^1)(x^1-\mu^1) & (x^1-\mu^1)(x^2-\mu^2) \\ (x^2-\mu^2)(x^1-\mu^1) & (x^2-\mu^2)(x^2-\mu^2) \end{pmatrix}
$$
Differentiating with respect to $\Sigma_{ij}$ amounts to differentiating every entry of the matrix with respect to $\Sigma_{ij}$, so only the entry at that position becomes 1 and all the others become 0. The other entries are treated as constants, and at the matching position we have a scalar-by-scalar derivative, which equals 1. For example, differentiating with respect to $\Sigma_{11}$ gives
$$
\begin{aligned}
\frac{\partial A^T\Sigma_kA}{\partial\Sigma_{11}}=&
\begin{pmatrix} (x^1-\mu^1) & (x^2-\mu^2) \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} (x^1-\mu^1) \\ (x^2-\mu^2) \end{pmatrix}
\\=&(x^1-\mu^1)(x^1-\mu^1)
\\=&(AA^T)_{11}
\end{aligned}
$$
Applying the same argument to every entry,
$$
\frac{\partial A^T\Sigma_kA}{\partial\Sigma_k}=AA^T
$$
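The identity $\partial(A^T\Sigma_kA)/\partial\Sigma_k=AA^T$ can be checked with finite differences. The sketch below perturbs each entry of a $2\times2$ matrix independently (ignoring the symmetry constraint, as the derivation does) and compares against the outer product; the vector `a`, the matrix `S` and the step size are arbitrary assumptions.

```python
def quad_form(a, S):
    """a^T S a for a vector a and a square matrix S given as nested lists."""
    n = len(a)
    return sum(a[i] * S[i][j] * a[j] for i in range(n) for j in range(n))

def numeric_grad(a, S, eps=1e-6):
    """Central finite-difference estimate of d(a^T S a) / dS_ij for every entry."""
    n = len(a)
    grad = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            plus = [row[:] for row in S]
            minus = [row[:] for row in S]
            plus[i][j] += eps
            minus[i][j] -= eps
            grad[i][j] = (quad_form(a, plus) - quad_form(a, minus)) / (2 * eps)
    return grad

a = [1.5, -2.0]                      # plays the role of A in the text
S = [[2.0, 0.3], [0.3, 1.0]]         # plays the role of Sigma_k
grad = numeric_grad(a, S)
outer = [[a[i] * a[j] for j in range(2)] for i in range(2)]   # A A^T
print(grad, outer)
```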
So equation (1) becomes
$$
\begin{aligned}
\frac{\partial L(\theta,\lambda)}{\partial\Sigma_k}=&\sum_{i=1}^n\left[-\frac{1}{2}\Sigma_k^{-1}+\frac{1}{2}AA^T\right]P(z_i=C_k\mid x_i,\theta^t)
\\=&\sum_{i=1}^n\left[-\frac{1}{2}\Sigma_k^{-1}+\frac{1}{2}\Sigma_k^{-1}(x_i-\mu_k)(x_i-\mu_k)^T\Sigma_k^{-1}\right]P(z_i=C_k\mid x_i,\theta^t)
\\=&-\sum_{i=1}^n\frac{1}{2}\Sigma_k^{-1}P(z_i=C_k\mid x_i,\theta^t)+\sum_{i=1}^n\frac{1}{2}\Sigma_k^{-1}(x_i-\mu_k)(x_i-\mu_k)^T\Sigma_k^{-1}P(z_i=C_k\mid x_i,\theta^t)
\\=&0
\end{aligned}
$$
Rearranging:
$$
\sum_{i=1}^n\frac{1}{2}\Sigma_k^{-1}P(z_i=C_k\mid x_i,\theta^t)=\sum_{i=1}^n\frac{1}{2}\Sigma_k^{-1}(x_i-\mu_k)(x_i-\mu_k)^T\Sigma_k^{-1}P(z_i=C_k\mid x_i,\theta^t)
$$
that is,
$$
\sum_{i=1}^n\Sigma_k^{-1}P(z_i=C_k\mid x_i,\theta^t)=\sum_{i=1}^n\Sigma_k^{-1}(x_i-\mu_k)(x_i-\mu_k)^T\Sigma_k^{-1}P(z_i=C_k\mid x_i,\theta^t)
$$
Left-multiplying both sides by $\Sigma_k$ (the left-hand side is then a multiple of the identity matrix):
$$
\sum_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)=\sum_{i=1}^n(x_i-\mu_k)(x_i-\mu_k)^T\Sigma_k^{-1}P(z_i=C_k\mid x_i,\theta^t)
$$
Right-multiplying both sides by $\Sigma_k$:
$$
\Sigma_k\sum_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)=\sum_{i=1}^n(x_i-\mu_k)(x_i-\mu_k)^TP(z_i=C_k\mid x_i,\theta^t)
$$
Therefore
$$
\Sigma_k=\frac{\sum\limits_{i=1}^n(x_i-\mu_k)(x_i-\mu_k)^TP(z_i=C_k\mid x_i,\theta^t)}{\sum\limits_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)}
$$
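The covariance update is a posterior-weighted average of outer products. A minimal sketch for one component using plain lists; the data, the posteriors `resp_k` and the mean `mu_k` are hypothetical values chosen so the result is easy to verify by hand.

```python
def update_cov(xs, resp_k, mu_k):
    """Sigma_k = sum_i r_i (x_i - mu_k)(x_i - mu_k)^T / sum_i r_i."""
    total = sum(resp_k)
    d = len(mu_k)
    cov = [[0.0] * d for _ in range(d)]
    for r, x in zip(resp_k, xs):
        diff = [x[j] - mu_k[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += r * diff[a] * diff[b]   # weighted outer product
    return [[c / total for c in row] for row in cov]

xs = [[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]]   # three 2-D samples
resp_k = [0.5, 0.25, 0.25]                  # posteriors of component k
mu_k = [0.5, 1.0]                           # weighted mean of xs under resp_k
sigma_k = update_cov(xs, resp_k, mu_k)
print(sigma_k)
```

The result is symmetric by construction, since each term $(x_i-\mu_k)(x_i-\mu_k)^T$ is symmetric.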
Method ②: the trace trick

For the trace trick, first note two differential formulas ($A$ is an invertible matrix). They are not derived here; look them up online or in a textbook if you are interested.
$$
d|A|=|A|\,tr(A^{-1}dA); \\ d(A^{-1})=-A^{-1}(dA)A^{-1}
$$
With the trace trick we take the differential of the objective. Only two of its terms involve $\Sigma_k$.

First term, $\log|\Sigma_k|$:
$$
d(\log|\Sigma_k|)=\frac{1}{|\Sigma_k|}d|\Sigma_k|=\frac{1}{|\Sigma_k|}|\Sigma_k|\,tr(\Sigma_k^{-1}d\Sigma_k)=tr(\Sigma_k^{-1}d\Sigma_k)
$$
Second term, $(x_i-\mu_k)^T\Sigma_k^{-1}(x_i-\mu_k)$:
$$
d\left((x_i-\mu_k)^T\Sigma_k^{-1}(x_i-\mu_k)\right)=-(x_i-\mu_k)^T\Sigma_k^{-1}(d\Sigma_k)\Sigma_k^{-1}(x_i-\mu_k)
$$
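Both differentials are first-order statements, so they can be checked by applying a small perturbation $d\Sigma_k$ and comparing the actual change with the predicted one. A rough sketch for the $2\times2$ case follows; the matrix `S`, the perturbation `dS`, the vector `v` (standing in for $x_i-\mu_k$) and the helper names are all assumptions for the illustration.

```python
import math

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def inv2(M):
    d = det2(M)
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def trace_of_product(A, B):
    """tr(A B) for 2x2 matrices."""
    return sum(A[i][j] * B[j][i] for i in range(2) for j in range(2))

S = [[2.0, 0.3], [0.3, 1.0]]
dS = [[1e-6, -2e-6], [-2e-6, 3e-6]]          # small symmetric perturbation
S_new = [[S[i][j] + dS[i][j] for j in range(2)] for i in range(2)]
v = [1.0, -2.0]                              # stands in for x_i - mu_k

# First term: d(log|S|) = tr(S^{-1} dS)
logdet_actual = math.log(det2(S_new)) - math.log(det2(S))
logdet_predicted = trace_of_product(inv2(S), dS)

# Second term: d(v^T S^{-1} v) = -v^T S^{-1} (dS) S^{-1} v
quad_actual = dot(v, matvec(inv2(S_new), v)) - dot(v, matvec(inv2(S), v))
w = matvec(inv2(S), v)                       # S^{-1} v; S^{-1} is symmetric
quad_predicted = -dot(w, matvec(dS, w))

print(logdet_actual, logdet_predicted)
print(quad_actual, quad_predicted)
```

The two pairs agree up to terms of second order in the perturbation, which is what a differential identity promises.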
So the differential of the objective is
$$
dL(\theta,\lambda)=\sum_{i=1}^n\left[-\frac{1}{2}tr(\Sigma_k^{-1}d\Sigma_k)+\frac{1}{2}(x_i-\mu_k)^T\Sigma_k^{-1}(d\Sigma_k)\Sigma_k^{-1}(x_i-\mu_k)\right]P(z_i=C_k\mid x_i,\theta^t)
$$
Take the trace of both sides (a scalar equals its own trace) and use the cyclic property $tr(ABC)=tr(CAB)$:
$$
\begin{aligned}
tr(dL(\theta,\lambda))=&tr\left(\sum_{i=1}^n\left[-\frac{1}{2}tr(\Sigma_k^{-1}d\Sigma_k)+\frac{1}{2}(x_i-\mu_k)^T\Sigma_k^{-1}(d\Sigma_k)\Sigma_k^{-1}(x_i-\mu_k)\right]P(z_i=C_k\mid x_i,\theta^t)\right)
\\=&\sum_{i=1}^n\left[-\frac{1}{2}tr(\Sigma_k^{-1}d\Sigma_k)+\frac{1}{2}tr\left((x_i-\mu_k)^T\Sigma_k^{-1}(d\Sigma_k)\Sigma_k^{-1}(x_i-\mu_k)\right)\right]P(z_i=C_k\mid x_i,\theta^t)
\\=&\sum_{i=1}^n\left[-\frac{1}{2}tr(\Sigma_k^{-1}d\Sigma_k)+\frac{1}{2}tr\left(\Sigma_k^{-1}(x_i-\mu_k)(x_i-\mu_k)^T\Sigma_k^{-1}d\Sigma_k\right)\right]P(z_i=C_k\mid x_i,\theta^t)
\\=&\sum_{i=1}^n\left[\frac{1}{2}tr\left(-\Sigma_k^{-1}d\Sigma_k+\Sigma_k^{-1}(x_i-\mu_k)(x_i-\mu_k)^T\Sigma_k^{-1}d\Sigma_k\right)\right]P(z_i=C_k\mid x_i,\theta^t)
\\=&\sum_{i=1}^n\left[\frac{1}{2}tr\left(\left(-\Sigma_k^{-1}+\Sigma_k^{-1}(x_i-\mu_k)(x_i-\mu_k)^T\Sigma_k^{-1}\right)d\Sigma_k\right)\right]P(z_i=C_k\mid x_i,\theta^t)
\\=&tr\left(\sum_{i=1}^n\frac{1}{2}P(z_i=C_k\mid x_i,\theta^t)\left(-\Sigma_k^{-1}+\Sigma_k^{-1}(x_i-\mu_k)(x_i-\mu_k)^T\Sigma_k^{-1}\right)d\Sigma_k\right)
\end{aligned}
$$
Dropping the trace means comparing this expression with the general form $dL=tr\left(\left(\frac{\partial L}{\partial\Sigma_k}\right)^Td\Sigma_k\right)$. The matrix multiplying $d\Sigma_k$ is symmetric, so we can read off the derivative directly and set it to zero:
$$
\frac{\partial L(\theta,\lambda)}{\partial\Sigma_k}=\sum_{i=1}^n\frac{1}{2}P(z_i=C_k\mid x_i,\theta^t)\left(-\Sigma_k^{-1}+\Sigma_k^{-1}(x_i-\mu_k)(x_i-\mu_k)^T\Sigma_k^{-1}\right)=0
$$
Rearranging:
$$
\sum_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)\Sigma_k^{-1}=\sum_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)\Sigma_k^{-1}(x_i-\mu_k)(x_i-\mu_k)^T\Sigma_k^{-1}
$$
As in the first method, $P(z_i=C_k\mid x_i,\theta^t)$ is a scalar, so we left-multiply both sides by $\Sigma_k$ and then right-multiply both sides by $\Sigma_k$:
$$
\Sigma_k\sum_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)=\sum_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)(x_i-\mu_k)(x_i-\mu_k)^T
$$
Finally,
$$
\Sigma_k=\frac{\sum\limits_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)(x_i-\mu_k)(x_i-\mu_k)^T}{\sum\limits_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)}
$$

Results

$$
\begin{aligned}
p_k=&\frac{1}{n}\sum_{i=1}^nP(z_i=C_k\mid x_i,\theta^t);
\\\mu_k=&\frac{\sum\limits_{i=1}^nx_iP(z_i=C_k\mid x_i,\theta^t)}{\sum\limits_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)};
\\\Sigma_k=&\frac{\sum\limits_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)(x_i-\mu_k)(x_i-\mu_k)^T}{\sum\limits_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)}
\end{aligned}
$$
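One way to gain confidence in these three closed-form updates is to check that, for fixed posteriors, they give a larger value of the expected log-likelihood than nearby parameter values. The sketch below does this in one dimension (so $\Sigma_k$ is a scalar variance); the data, the posteriors and the function names are invented for the illustration.

```python
import math

def log_pdf(x, mu, var):
    """log N(x | mu, var) for a 1-D normal distribution."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mu) ** 2 / var

def q_value(xs, resp, weights, means, variances):
    """sum_k sum_i [log p_k + log N(x_i | mu_k, var_k)] * resp[i][k]."""
    return sum(
        resp[i][k] * (math.log(weights[k]) + log_pdf(xs[i], means[k], variances[k]))
        for i in range(len(xs)) for k in range(len(weights))
    )

def m_step(xs, resp):
    """Closed-form updates for p_k, mu_k and the 1-D variance."""
    n, K = len(xs), len(resp[0])
    nk = [sum(r[k] for r in resp) for k in range(K)]
    weights = [nk[k] / n for k in range(K)]
    means = [sum(resp[i][k] * xs[i] for i in range(n)) / nk[k] for k in range(K)]
    variances = [sum(resp[i][k] * (xs[i] - means[k]) ** 2 for i in range(n)) / nk[k]
                 for k in range(K)]
    return weights, means, variances

xs = [0.0, 1.0, 2.0, 7.0, 8.0]
resp = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9]]

w, m, v = m_step(xs, resp)
best = q_value(xs, resp, w, m, v)
worse_mean = q_value(xs, resp, w, [m[0] + 0.1, m[1]], v)
worse_var = q_value(xs, resp, w, m, [v[0] * 1.2, v[1]])
worse_weight = q_value(xs, resp, [w[0] + 0.05, w[1] - 0.05], m, v)
print(best, worse_mean, worse_var, worse_weight)
```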

All that remains is to compute $P(z_i=C_k\mid x_i,\theta^t)$. Since $\theta$ is a parameter, it is omitted below.
$$
P(z_i=C_k\mid x_i)=\frac{P(z_i=C_k,x_i)}{P(x_i)}
$$
For $P(x_i)$:
$$
P(x_i)=\sum_{z_i}P(x_i,z_i)=\sum_{k=1}^KP(x_i,z_i=C_k)
$$
Earlier we found that $P(x_i,z_i=C_k)=p_k\,N(x_i\mid\mu_k,\Sigma_k)$.

Therefore
$$
P(z_i=C_k\mid x_i)=\frac{p_k\,N(x_i\mid\mu_k,\Sigma_k)}{\sum\limits_{j=1}^Kp_j\,N(x_i\mid\mu_j,\Sigma_j)}
$$
Note carefully that the $k$ in the numerator comes from $C_k$ on the left-hand side, whereas the index in the denominator is the summation variable. It is written as $j$ here so that the two cannot be confused.
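The posterior formula translates directly into code: compute the joint $p_k\,N(x_i\mid\mu_k,\Sigma_k)$ for every component, then divide by their sum. A minimal one-dimensional sketch with made-up parameters and hypothetical function names:

```python
import math

def gaussian_pdf(x, mu, var):
    """Density of a 1-D normal distribution N(x | mu, var)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def responsibilities(xs, weights, means, variances):
    """resp[i][k] = P(z_i = C_k | x_i) under the current parameters."""
    resp = []
    for x in xs:
        # Numerator for each k: p_k * N(x | mu_k, var_k)
        joints = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(joints)                 # denominator: P(x_i), summed over all components
        resp.append([j / total for j in joints])
    return resp

xs = [-0.2, 0.1, 3.9, 4.3]
resp = responsibilities(xs, [0.5, 0.5], [0.0, 4.0], [1.0, 1.0])
for row in resp:
    print(row)
```

For data far from every component, `total` can underflow to zero; practical implementations usually work with log densities to avoid this, which this sketch does not attempt.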

Algorithm

① Randomly initialize the model parameters $p^t,\mu^t,\Sigma^t$.

② Compute $P(z_i=C_k\mid x_i,\theta^t)$.

③ Use the formulas above to compute $p^{t+1},\mu^{t+1},\Sigma^{t+1}$.

④ Compute the difference between $p^{t+1},\mu^{t+1},\Sigma^{t+1}$ and $p^t,\mu^t,\Sigma^t$. If it is smaller than $\epsilon$ (a threshold you choose), the parameters have barely changed, so the algorithm has converged and stops. Otherwise, repeat steps ② and ③.
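The four steps can be put together as a short loop. The following is a minimal one-dimensional sketch, not the implementation from the linked code post. It makes several assumptions: $\Sigma_k$ is a scalar variance, the initial parameters are passed in explicitly instead of being drawn at random (so the run is reproducible), the change is measured as the largest absolute parameter difference, and `max_iter` is an extra safety cap.

```python
import math

def gaussian_pdf(x, mu, var):
    """Density of a 1-D normal distribution N(x | mu, var)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(xs, weights, means, variances, eps=1e-8, max_iter=500):
    """EM for a 1-D Gaussian mixture, starting from the given parameters (step 1)."""
    n, K = len(xs), len(weights)
    for _ in range(max_iter):
        # Step 2: posteriors P(z_i = C_k | x_i, theta^t)
        resp = []
        for x in xs:
            joints = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(K)]
            total = sum(joints)
            resp.append([j / total for j in joints])
        # Step 3: closed-form updates for p, mu and the variance
        nk = [sum(resp[i][k] for i in range(n)) for k in range(K)]
        new_w = [nk[k] / n for k in range(K)]
        new_m = [sum(resp[i][k] * xs[i] for i in range(n)) / nk[k] for k in range(K)]
        new_v = [sum(resp[i][k] * (xs[i] - new_m[k]) ** 2 for i in range(n)) / nk[k]
                 for k in range(K)]
        # Step 4: stop once the largest parameter change falls below eps
        change = max(abs(a - b) for a, b in
                     zip(new_w + new_m + new_v, weights + means + variances))
        weights, means, variances = new_w, new_m, new_v
        if change < eps:
            break
    return weights, means, variances

# Two well-separated groups of points, centred at 0 and 10.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0, 9.0, 9.5, 10.0, 10.5, 11.0]
weights, means, variances = em_gmm_1d(xs, [0.5, 0.5], [1.0, 8.0], [4.0, 4.0])
print(weights, means, variances)
```

On this toy data the loop should settle near weights of 0.5 each, means near 0 and 10, and variances near 0.5. With real data a variance can collapse toward zero, so a small floor on the variance is a common safeguard that this sketch omits.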

Code implementation

GMM (Gaussian Mixture Model): Code Implementation

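Conclusion

That completes both the derivation and the code. Many steps of the derivation are not rigorous; if you spot a problem, please point it out. Arigato.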
