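Preface
In the previous post, GMM (Gaussian Mixture Model) Derivation (Part 1), the product had already been turned into a sum, so all that remains is to work out the probabilities involving $P(z,x)$.
Math background: 【概率论与数理统计知识复习-哔哩哔哩】 (a review of probability and mathematical statistics on Bilibili)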
Derivation
Why write $P(z,x)$ rather than $P(z,x\mid\theta)$? Because $\theta$ is a parameter, not a random variable, so it can be dropped from the notation; both forms denote the same quantity.
$$P(z_i=C_k,x_i)=P(x_i\mid z_i=C_k)\,P(z_i=C_k)=p_k\,N(x_i\mid\mu_k,\Sigma_k)$$
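As for $P(z\mid x)$, there is no need to expand it yet: it is evaluated at the fixed parameter $\theta^t$, whereas what we need are the free variables in $\theta$, because we will differentiate with respect to them to find the maximum. For that purpose $P(z\mid x,\theta^t)$ is effectively a constant.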
Therefore
$$
\begin{aligned}
E_{P(Z\mid X,\theta^{t})}\left[\log P(Z,X\mid\theta)\right]
=&\sum_{k=1}^K\sum_{i=1}^{n}\log\left[p_k\,N(x_i\mid\mu_k,\Sigma_k)\right]P(z_i=C_k\mid x_i,\theta^t) \\
=&\sum_{k=1}^K\sum_{i=1}^{n}\left[\log p_k+\log N(x_i\mid\mu_k,\Sigma_k)\right]P(z_i=C_k\mid x_i,\theta^t)
\end{aligned}
$$
Start with $p_k$. It is subject to the constraint $\sum\limits_{k=1}^Kp_k=1$, so we build the Lagrangian
$$L(\theta,\lambda)=\sum_{k=1}^K\sum_{i=1}^{n}\left[\log p_k+\log N(x_i\mid\mu_k,\Sigma_k)\right]P(z_i=C_k\mid x_i,\theta^t)+\lambda\left[\sum_{k=1}^Kp_k-1\right]$$
Differentiate it with respect to $p_k$ and set the derivative to zero:
$$\frac{\partial L(\theta,\lambda)}{\partial p_k}=\sum_{i=1}^n\frac{1}{p_k}P(z_i=C_k\mid x_i,\theta^t)+\lambda=0$$
Multiplying both sides by $p_k$ gives
$$\sum_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)+\lambda p_k=0$$
So, for $k=1,2,\cdots,K$:
$$
\begin{aligned}
&\sum_{i=1}^nP(z_i=C_1\mid x_i,\theta^t)+\lambda p_1=0 \\
&\sum_{i=1}^nP(z_i=C_2\mid x_i,\theta^t)+\lambda p_2=0 \\
&\qquad\vdots
\end{aligned}
$$
Adding up all $K$ equations:
$$\sum_{i=1}^nP(z_i=C_1\mid x_i,\theta^t)+\lambda p_1+\sum_{i=1}^nP(z_i=C_2\mid x_i,\theta^t)+\lambda p_2+\cdots+\sum_{i=1}^nP(z_i=C_K\mid x_i,\theta^t)+\lambda p_K=0$$
That is,
$$
\begin{aligned}
&\sum_{k=1}^K\sum_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)+\sum_{k=1}^K\lambda p_k \\
=&\sum_{i=1}^n\sum_{k=1}^KP(z_i=C_k\mid x_i,\theta^t)+\lambda\sum_{k=1}^Kp_k \\
=&0
\end{aligned}
$$
Because $\sum\limits_{k=1}^Kp_k=1$ and $\sum\limits_{k=1}^KP(z_i=C_k\mid x_i,\theta^t)=1$, this reduces to
$$\sum_{i=1}^n1+\lambda=0 \;\rightarrow\; n+\lambda=0 \;\rightarrow\; \lambda=-n$$
Substituting $\lambda=-n$ back into the earlier equation $\sum\limits_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)+\lambda p_k=0$ gives
$$p_k=\frac{1}{n}\sum_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)$$
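As a quick numeric illustration of this update, the sketch below applies it to a small made-up responsibility matrix. The name `gamma` (standing for $P(z_i=C_k\mid x_i,\theta^t)$) and the numbers are assumptions introduced only for this example.

```python
import numpy as np

# Hypothetical responsibilities gamma[i, k] = P(z_i = C_k | x_i, theta^t)
# for n = 4 points and K = 2 components; each row sums to 1.
gamma = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.3, 0.7],
    [0.2, 0.8],
])

n = gamma.shape[0]
# p_k = (1/n) * sum_i gamma[i, k]
p_new = gamma.sum(axis=0) / n
print(p_new)
```

Because every row of `gamma` sums to 1, the updated weights automatically sum to 1, which is exactly what the multiplier $\lambda=-n$ enforces.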
With $p$ in hand, the next step is to solve for $\mu$ and $\Sigma$.
The probability density function of the multivariate normal distribution is
$$f(x)=\frac{1}{(2\pi)^{\frac{d}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left\{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right\}$$
where $d$ is the dimension of $x$.
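The density above translates almost line by line into NumPy. This is a minimal sketch for a single point; the function name `gaussian_pdf` is a name chosen here for illustration, and it assumes `sigma` is symmetric positive definite.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density N(x | mu, sigma) for one d-dimensional point x."""
    d = mu.shape[0]
    diff = x - mu
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), via a linear solve.
    quad = diff @ np.linalg.solve(sigma, diff)
    # Normalising constant (2*pi)^(d/2) * |Sigma|^(1/2).
    norm = (2.0 * np.pi) ** (d / 2.0) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * quad) / norm

# Standard 2-D normal evaluated at its mean: the value is 1 / (2*pi).
value = gaussian_pdf(np.zeros(2), np.zeros(2), np.eye(2))
print(value)
```

Using `np.linalg.solve` instead of forming the inverse explicitly is a common numerical choice; the result is the same quadratic form.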
To obtain the mean and covariance, first write the normal distribution inside the Lagrangian in its density form:
$$
\begin{aligned}
L(\theta,\lambda)=&\sum_{k=1}^K\sum_{i=1}^{n}\left[\log p_k+\log N(x_i\mid\mu_k,\Sigma_k)\right]P(z_i=C_k\mid x_i,\theta^t)+\lambda\left[\sum_{k=1}^Kp_k-1\right] \\
=&\sum_{k=1}^K\sum_{i=1}^{n}\left[\log p_k+\log\left[\frac{1}{(2\pi)^{\frac{d}{2}}|\Sigma_k|^{\frac{1}{2}}}\exp\left\{-\frac{1}{2}(x_i-\mu_k)^T\Sigma_k^{-1}(x_i-\mu_k)\right\}\right]\right]P(z_i=C_k\mid x_i,\theta^t)+\lambda\left[\sum_{k=1}^Kp_k-1\right] \\
=&\sum_{k=1}^K\sum_{i=1}^n\left[\log p_k+\log\frac{1}{(2\pi)^{\frac{d}{2}}|\Sigma_k|^{\frac{1}{2}}}+\log\left[\exp\left\{-\frac{1}{2}(x_i-\mu_k)^T\Sigma_k^{-1}(x_i-\mu_k)\right\}\right]\right]P(z_i=C_k\mid x_i,\theta^t)+\lambda\left[\sum_{k=1}^Kp_k-1\right] \\
=&\sum_{k=1}^K\sum_{i=1}^n\left[\log p_k+\log\frac{1}{(2\pi)^{\frac{d}{2}}|\Sigma_k|^{\frac{1}{2}}}-\frac{1}{2}(x_i-\mu_k)^T\Sigma_k^{-1}(x_i-\mu_k)\right]P(z_i=C_k\mid x_i,\theta^t)+\lambda\left[\sum_{k=1}^Kp_k-1\right] \\
=&\sum_{k=1}^K\sum_{i=1}^n\left[\log p_k-\frac{d}{2}\log(2\pi)-\frac{1}{2}\log|\Sigma_k|-\frac{1}{2}(x_i-\mu_k)^T\Sigma_k^{-1}(x_i-\mu_k)\right]P(z_i=C_k\mid x_i,\theta^t)+\lambda\left[\sum_{k=1}^Kp_k-1\right]
\end{aligned}
$$
Now differentiate the Lagrangian with respect to $\mu_k$. The matrix-derivative formula needed is stated directly:
$$\frac{\partial(x^TAx)}{\partial x}=2Ax \quad (\text{assuming } A \text{ is symmetric})$$
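The formula can be sanity-checked numerically with central finite differences. The matrix `A`, the vector `x`, and the step `eps` below are arbitrary values picked for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(3, 3))
A = B + B.T                      # symmetric by construction
x = rng.normal(size=3)

def f(v):
    """Quadratic form v^T A v."""
    return v @ A @ v

eps = 1e-6
# Central difference along each coordinate direction e.
numeric = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
analytic = 2 * A @ x             # the closed-form gradient 2Ax
max_err = np.max(np.abs(numeric - analytic))
print(max_err)
```

The two gradients should agree up to floating-point rounding, since a central difference is exact for quadratic functions.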
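Matrix derivatives still obey the chain rule. So in $(x_i-\mu_k)^T\Sigma_k^{-1}(x_i-\mu_k)$ we can treat $(x_i-\mu_k)$ as $x$, take the outer derivative first, and then multiply by the derivative of the inner term (which is $-1$ with respect to $\mu_k$, cancelling the minus sign of the $-\frac{1}{2}$ factor).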
$$
\begin{aligned}
\frac{\partial L(\theta,\lambda)}{\partial\mu_k}=&\sum_{i=1}^n\Sigma_k^{-1}(x_i-\mu_k)P(z_i=C_k\mid x_i,\theta^t) \\
=&\Sigma_k^{-1}\sum_{i=1}^n(x_i-\mu_k)P(z_i=C_k\mid x_i,\theta^t) \\
=&0
\end{aligned}
$$
Since $\Sigma_k^{-1}$ is invertible, this means
$$
\begin{aligned}
\sum_{i=1}^n(x_i-\mu_k)P(z_i=C_k\mid x_i,\theta^t)
=&\sum_{i=1}^nx_iP(z_i=C_k\mid x_i,\theta^t)-\sum_{i=1}^n\mu_kP(z_i=C_k\mid x_i,\theta^t) \\
=&\sum_{i=1}^nx_iP(z_i=C_k\mid x_i,\theta^t)-\mu_k\sum_{i=1}^nP(z_i=C_k\mid x_i,\theta^t) \\
=&0
\end{aligned}
$$
Rearranging gives
$$\mu_k=\frac{\sum\limits_{i=1}^nx_iP(z_i=C_k\mid x_i,\theta^t)}{\sum\limits_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)}$$
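The mean update is a responsibility-weighted average of the data. The sketch below uses made-up data `X` and hard (0/1) responsibilities `gamma` so the result is easy to verify by hand; both arrays are assumptions for illustration.

```python
import numpy as np

# Four 2-D points; the first two belong fully to component 0,
# the last two fully to component 1 (hypothetical hard assignments).
X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]])
gamma = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])

# Denominator: sum_i gamma[i, k], one value per component.
Nk = gamma.sum(axis=0)
# Numerator: sum_i gamma[i, k] * x_i, computed for all k at once.
mu_new = (gamma.T @ X) / Nk[:, None]
print(mu_new)
```

With hard assignments the update reduces to the ordinary per-cluster mean, here $(1, 0)$ and $(11, 10)$; with soft responsibilities each point contributes in proportion to its weight.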
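Next, differentiate with respect to $\Sigma_k$.
$\Sigma_k$ is a matrix. The derivative of a scalar with respect to a matrix can be obtained either entry by entry or directly with the trace trick.
This post covers both; use whichever you prefer.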
Method ①: entry-wise differentiation
First, two commonly used derivative formulas ($A$ is a matrix). They are stated without proof; interested readers can look them up online or in a textbook.
$$(\ln|A|)'=(A^{-1})^{T}; \qquad (A^{-1})'=-A^{-1}A'A^{-1}$$
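The first formula can be checked numerically by perturbing one entry of $A$ at a time while treating all entries as independent. The test matrix and step size below are arbitrary choices made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(3, 3))
A = B @ B.T + 3.0 * np.eye(3)    # positive definite, so |A| > 0

def logdet(M):
    return np.log(np.linalg.det(M))

eps = 1e-6
numeric = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        E = np.zeros((3, 3))
        E[i, j] = eps            # perturb only entry (i, j)
        numeric[i, j] = (logdet(A + E) - logdet(A - E)) / (2 * eps)

analytic = np.linalg.inv(A).T    # the claimed derivative (A^{-1})^T
max_err = np.max(np.abs(numeric - analytic))
print(max_err)
```

For a symmetric matrix such as a covariance, $(A^{-1})^T=A^{-1}$, which is the simplification used in the derivation that follows.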
Differentiating with respect to $\Sigma_k$ (here $\Sigma_k'$ denotes the derivative of $\Sigma_k$ with respect to one of its entries, and the second line uses the symmetry of $\Sigma_k$):
$$
\begin{aligned}
\frac{\partial L(\theta,\lambda)}{\partial\Sigma_k}=&\sum_{i=1}^n\left[-\frac{1}{2}(\Sigma_k^{-1})^{T}+\frac{1}{2}(x_i-\mu_k)^T\Sigma_k^{-1}\Sigma_k'\Sigma_k^{-1}(x_i-\mu_k)\right]P(z_i=C_k\mid x_i,\theta^t) \\
=&\sum_{i=1}^n\left[-\frac{1}{2}\Sigma_k^{-1}+\frac{1}{2}(x_i-\mu_k)^T\Sigma_k^{-1}\Sigma_k'\Sigma_k^{-1}(x_i-\mu_k)\right]P(z_i=C_k\mid x_i,\theta^t)
\end{aligned}
\tag{1}
$$
Consider the inner term $(x_i-\mu_k)^T\Sigma_k^{-1}\Sigma_k'\Sigma_k^{-1}(x_i-\mu_k)$. $\Sigma_k$ is the covariance matrix, and we differentiate with respect to each of its entries in turn. Let $A=\Sigma_k^{-1}(x_i-\mu_k)$; then $(x_i-\mu_k)^T\Sigma_k^{-1}\Sigma_k'\Sigma_k^{-1}(x_i-\mu_k)=A^T\Sigma_k'A$, so, with $A$ held fixed,
$$\frac{\partial(A^T\Sigma_kA)}{\partial\Sigma_{ij}}=A_iA_j=(AA^T)_{ij}$$
Why does it equal this? Take the $2\times2$ case, where superscripts index the components of a vector. (Below, $\Sigma_k^{-1}$ is left out of $A$; this does not affect the final result and it can be put back afterwards. The point here is only to justify the expression above.)
$$
A^T\Sigma_kA=
\begin{pmatrix} (x^1-\mu^1) & (x^2-\mu^2) \end{pmatrix}
\begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}
\begin{pmatrix} (x^1-\mu^1) \\ (x^2-\mu^2) \end{pmatrix}
$$
Now look at
$$
AA^{T}=
\begin{pmatrix} (x^1-\mu^1) \\ (x^2-\mu^2) \end{pmatrix}
\begin{pmatrix} (x^1-\mu^1) & (x^2-\mu^2) \end{pmatrix}
=\begin{pmatrix}
(x^1-\mu^1)(x^1-\mu^1) & (x^1-\mu^1)(x^2-\mu^2) \\
(x^2-\mu^2)(x^1-\mu^1) & (x^2-\mu^2)(x^2-\mu^2)
\end{pmatrix}
$$
Differentiating with respect to $\Sigma_{ij}$ means differentiating every entry of the matrix with respect to $\Sigma_{ij}$, so only the entry at that position becomes 1 and all the others become 0: the other entries are treated as constants, while the matching entry is a scalar differentiated with respect to itself. For example, differentiating with respect to $\Sigma_{11}$ gives
$$
\begin{aligned}
\frac{\partial(A^T\Sigma_kA)}{\partial\Sigma_{11}}=&
\begin{pmatrix} (x^1-\mu^1) & (x^2-\mu^2) \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} (x^1-\mu^1) \\ (x^2-\mu^2) \end{pmatrix} \\
=&(x^1-\mu^1)(x^1-\mu^1) \\
=&(AA^T)_{11}
\end{aligned}
$$
By the same reasoning for every entry,
$$\frac{\partial(A^T\Sigma_kA)}{\partial\Sigma_k}=AA^T$$
So equation (1) becomes
$$
\begin{aligned}
\frac{\partial L(\theta,\lambda)}{\partial\Sigma_k}=&\sum_{i=1}^n\left[-\frac{1}{2}\Sigma_k^{-1}+\frac{1}{2}AA^T\right]P(z_i=C_k\mid x_i,\theta^t) \\
=&\sum_{i=1}^n\left[-\frac{1}{2}\Sigma_k^{-1}+\frac{1}{2}\Sigma_k^{-1}(x_i-\mu_k)(x_i-\mu_k)^T\Sigma_k^{-1}\right]P(z_i=C_k\mid x_i,\theta^t) \\
=&-\sum_{i=1}^n\frac{1}{2}\Sigma_k^{-1}P(z_i=C_k\mid x_i,\theta^t)+\sum_{i=1}^n\frac{1}{2}\Sigma_k^{-1}(x_i-\mu_k)(x_i-\mu_k)^T\Sigma_k^{-1}P(z_i=C_k\mid x_i,\theta^t) \\
=&0
\end{aligned}
$$
Rearranging:
$$\sum_{i=1}^n\frac{1}{2}\Sigma_k^{-1}P(z_i=C_k\mid x_i,\theta^t)=\sum_{i=1}^n\frac{1}{2}\Sigma_k^{-1}(x_i-\mu_k)(x_i-\mu_k)^T\Sigma_k^{-1}P(z_i=C_k\mid x_i,\theta^t)$$
that is,
$$\sum_{i=1}^n\Sigma_k^{-1}P(z_i=C_k\mid x_i,\theta^t)=\sum_{i=1}^n\Sigma_k^{-1}(x_i-\mu_k)(x_i-\mu_k)^T\Sigma_k^{-1}P(z_i=C_k\mid x_i,\theta^t)$$
Left-multiplying both sides by $\Sigma_k$ gives (with $I$ the identity matrix)
$$I\sum_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)=\sum_{i=1}^n(x_i-\mu_k)(x_i-\mu_k)^T\Sigma_k^{-1}P(z_i=C_k\mid x_i,\theta^t)$$
Right-multiplying both sides by $\Sigma_k$ gives
$$\Sigma_k\sum_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)=\sum_{i=1}^n(x_i-\mu_k)(x_i-\mu_k)^TP(z_i=C_k\mid x_i,\theta^t)$$
Therefore
$$\Sigma_k=\frac{\sum\limits_{i=1}^n(x_i-\mu_k)(x_i-\mu_k)^TP(z_i=C_k\mid x_i,\theta^t)}{\sum\limits_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)}$$
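The covariance update is a responsibility-weighted average of outer products. The sketch below reuses a small made-up data set with hard assignments (both arrays are assumptions for illustration) so the result can be checked by hand.

```python
import numpy as np

X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]])
gamma = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])

Nk = gamma.sum(axis=0)
mu = (gamma.T @ X) / Nk[:, None]     # weighted means, shape (K, d)

K, d = mu.shape
sigma_new = np.zeros((K, d, d))
for k in range(K):
    diff = X - mu[k]                 # rows are (x_i - mu_k)
    # sum_i gamma[i, k] (x_i - mu_k)(x_i - mu_k)^T / sum_i gamma[i, k]
    sigma_new[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
print(sigma_new)
```

In this toy example each cluster varies only along the first axis, so both estimated covariances are singular. Practical implementations often add a small multiple of the identity to each $\Sigma_k$ to keep it invertible; that regularisation is not part of the derivation above.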
Method ②: the trace trick
The trace trick starts from two differential formulas ($A$ is an invertible matrix). Again they are stated without proof; interested readers can look them up online or in a textbook.
$$d|A|=|A|\,\mathrm{tr}(A^{-1}dA); \qquad d(A^{-1})=-A^{-1}(dA)A^{-1}$$
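Both differential formulas are first-order approximations, so they can be checked by applying a small perturbation `dA` and comparing the actual change with the predicted one. The matrices below are arbitrary test values assumed for this sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.normal(size=(3, 3))
A = B @ B.T + 3.0 * np.eye(3)          # invertible by construction
dA = 1e-6 * rng.normal(size=(3, 3))    # small perturbation

A_inv = np.linalg.inv(A)

# d|A| is approximately |A| tr(A^{-1} dA)
det_change = np.linalg.det(A + dA) - np.linalg.det(A)
det_pred = np.linalg.det(A) * np.trace(A_inv @ dA)
det_err = abs(det_change - det_pred)

# d(A^{-1}) is approximately -A^{-1} (dA) A^{-1}
inv_change = np.linalg.inv(A + dA) - A_inv
inv_pred = -A_inv @ dA @ A_inv
inv_err = np.max(np.abs(inv_change - inv_pred))
print(det_err, inv_err)
```

The remaining errors are second order in the size of `dA`, which is why they are far smaller than the changes themselves.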
With the trace trick we take the differential of the objective. Only two of its terms involve $\Sigma_k$.
First term, $\log|\Sigma_k|$:
$$d(\log|\Sigma_k|)=\frac{1}{|\Sigma_k|}d|\Sigma_k|=\frac{1}{|\Sigma_k|}|\Sigma_k|\,\mathrm{tr}(\Sigma_k^{-1}d\Sigma_k)=\mathrm{tr}(\Sigma_k^{-1}d\Sigma_k)$$
Second term, $(x_i-\mu_k)^T\Sigma_k^{-1}(x_i-\mu_k)$:
$$d\left((x_i-\mu_k)^T\Sigma_k^{-1}(x_i-\mu_k)\right)=-(x_i-\mu_k)^T\Sigma_k^{-1}(d\Sigma_k)\Sigma_k^{-1}(x_i-\mu_k)$$
So the differential of the objective is
$$dL(\theta,\lambda)=\sum_{i=1}^n\left[-\frac{1}{2}\mathrm{tr}(\Sigma_k^{-1}d\Sigma_k)+\frac{1}{2}(x_i-\mu_k)^T\Sigma_k^{-1}(d\Sigma_k)\Sigma_k^{-1}(x_i-\mu_k)\right]P(z_i=C_k\mid x_i,\theta^t)$$
Take the trace of both sides. Since $dL$ is a scalar, $\mathrm{tr}(dL)=dL$; the trace is linear, and it is unchanged under cyclic permutation of a product:
$$
\begin{aligned}
\mathrm{tr}(dL(\theta,\lambda))=&\mathrm{tr}\left(\sum_{i=1}^n\left[-\frac{1}{2}\mathrm{tr}(\Sigma_k^{-1}d\Sigma_k)+\frac{1}{2}(x_i-\mu_k)^T\Sigma_k^{-1}(d\Sigma_k)\Sigma_k^{-1}(x_i-\mu_k)\right]P(z_i=C_k\mid x_i,\theta^t)\right) \\
=&\sum_{i=1}^n\left[-\frac{1}{2}\mathrm{tr}(\Sigma_k^{-1}d\Sigma_k)+\frac{1}{2}\mathrm{tr}\left((x_i-\mu_k)^T\Sigma_k^{-1}(d\Sigma_k)\Sigma_k^{-1}(x_i-\mu_k)\right)\right]P(z_i=C_k\mid x_i,\theta^t) \\
=&\sum_{i=1}^n\left[-\frac{1}{2}\mathrm{tr}(\Sigma_k^{-1}d\Sigma_k)+\frac{1}{2}\mathrm{tr}\left(\Sigma_k^{-1}(x_i-\mu_k)(x_i-\mu_k)^T\Sigma_k^{-1}d\Sigma_k\right)\right]P(z_i=C_k\mid x_i,\theta^t) \\
=&\sum_{i=1}^n\left[\frac{1}{2}\mathrm{tr}\left(-\Sigma_k^{-1}d\Sigma_k+\Sigma_k^{-1}(x_i-\mu_k)(x_i-\mu_k)^T\Sigma_k^{-1}d\Sigma_k\right)\right]P(z_i=C_k\mid x_i,\theta^t) \\
=&\sum_{i=1}^n\left[\frac{1}{2}\mathrm{tr}\left(\left(-\Sigma_k^{-1}+\Sigma_k^{-1}(x_i-\mu_k)(x_i-\mu_k)^T\Sigma_k^{-1}\right)d\Sigma_k\right)\right]P(z_i=C_k\mid x_i,\theta^t) \\
=&\mathrm{tr}\left(\sum_{i=1}^n\frac{1}{2}P(z_i=C_k\mid x_i,\theta^t)\left(-\Sigma_k^{-1}+\Sigma_k^{-1}(x_i-\mu_k)(x_i-\mu_k)^T\Sigma_k^{-1}\right)d\Sigma_k\right)
\end{aligned}
$$
Dropping the trace (formally, $dL=\mathrm{tr}(G\,d\Sigma_k)$ implies $\frac{\partial L}{\partial\Sigma_k}=G^T$, and $G$ is symmetric here) gives
$$dL(\theta,\lambda)=\sum_{i=1}^n\frac{1}{2}P(z_i=C_k\mid x_i,\theta^t)\left(-\Sigma_k^{-1}+\Sigma_k^{-1}(x_i-\mu_k)(x_i-\mu_k)^T\Sigma_k^{-1}\right)d\Sigma_k$$
that is,
$$\frac{dL(\theta,\lambda)}{d\Sigma_k}=\sum_{i=1}^n\frac{1}{2}P(z_i=C_k\mid x_i,\theta^t)\left(-\Sigma_k^{-1}+\Sigma_k^{-1}(x_i-\mu_k)(x_i-\mu_k)^T\Sigma_k^{-1}\right)=0$$
Rearranging:
$$\sum_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)\Sigma_k^{-1}=\sum_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)\Sigma_k^{-1}(x_i-\mu_k)(x_i-\mu_k)^T\Sigma_k^{-1}$$
As in the first method, $P(z_i=C_k\mid x_i,\theta^t)$ is a scalar, so we left-multiply both sides by $\Sigma_k$ and then right-multiply both sides by $\Sigma_k$ to get
$$\Sigma_k\sum_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)=\sum_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)(x_i-\mu_k)(x_i-\mu_k)^T$$
Finally,
$$\Sigma_k=\frac{\sum\limits_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)(x_i-\mu_k)(x_i-\mu_k)^T}{\sum\limits_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)}$$
Results
$$
\begin{aligned}
p_k=&\frac{1}{n}\sum_{i=1}^nP(z_i=C_k\mid x_i,\theta^t); \\
\mu_k=&\frac{\sum\limits_{i=1}^nx_iP(z_i=C_k\mid x_i,\theta^t)}{\sum\limits_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)}; \\
\Sigma_k=&\frac{\sum\limits_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)(x_i-\mu_k)(x_i-\mu_k)^T}{\sum\limits_{i=1}^nP(z_i=C_k\mid x_i,\theta^t)}
\end{aligned}
$$
All that remains is to compute $P(z_i=C_k\mid x_i,\theta^t)$. Since $\theta$ is a parameter, it is omitted from the notation below:
$$P(z_i=C_k\mid x_i)=\frac{P(z_i=C_k,x_i)}{P(x_i)}$$
For $P(x_i)$, marginalise over $z_i$:
$$P(x_i)=\sum_{z_i}P(x_i,z_i)=\sum_{k=1}^KP(x_i,z_i=C_k)$$
Earlier we found
$$P(x_i,z_i=C_k)=p_k\,N(x_i\mid\mu_k,\Sigma_k)$$
Hence
$$P(z_i=C_k\mid x_i)=\frac{p_k\,N(x_i\mid\mu_k,\Sigma_k)}{\sum\limits_{j=1}^Kp_j\,N(x_i\mid\mu_j,\Sigma_j)}$$
Take care with the indices: the $k$ in the numerator comes from $C_k$ on the left-hand side, while the denominator has its own summation index (written $j$ here so the two are not confused).
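The posterior above can be computed for all points and components at once. This is a minimal sketch; the names `gaussian_pdf`, `responsibilities`, and `gamma`, as well as the toy parameters, are assumptions made for illustration.

```python
import numpy as np

def gaussian_pdf(X, mu, sigma):
    """N(x_i | mu, sigma) for every row x_i of X (shape (n, d))."""
    d = mu.shape[0]
    diff = X - mu
    # Row-wise quadratic form (x_i - mu)^T Sigma^{-1} (x_i - mu).
    quad = np.sum(diff @ np.linalg.inv(sigma) * diff, axis=1)
    norm = (2.0 * np.pi) ** (d / 2.0) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * quad) / norm

def responsibilities(X, p, mus, sigmas):
    """gamma[i, k] = p_k N(x_i|mu_k,S_k) / sum_j p_j N(x_i|mu_j,S_j)."""
    K = p.shape[0]
    # joint[i, k] = P(x_i, z_i = C_k)
    joint = np.stack(
        [p[k] * gaussian_pdf(X, mus[k], sigmas[k]) for k in range(K)], axis=1
    )
    # Dividing by the row sum is the division by P(x_i).
    return joint / joint.sum(axis=1, keepdims=True)

X = np.array([[0.0, 0.0], [4.0, 4.0], [2.0, 2.0]])
p = np.array([0.5, 0.5])
mus = np.array([[0.0, 0.0], [4.0, 4.0]])
sigmas = np.array([np.eye(2), np.eye(2)])
gamma = responsibilities(X, p, mus, sigmas)
print(gamma)
```

The third point lies exactly halfway between the two means, so with equal weights and equal covariances it is split evenly between the components. For data far from every component the densities can underflow; working with log-densities is a common remedy that this sketch leaves out.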
Algorithm
① Randomly initialise the model parameters $p^t,\mu^t,\Sigma^t$.
② Compute $P(z_i=C_k\mid x_i,\theta^t)$.
③ Use the formulas above to compute $p^{t+1},\mu^{t+1},\Sigma^{t+1}$.
④ Compute the difference between $p^{t+1},\mu^{t+1},\Sigma^{t+1}$ and $p^t,\mu^t,\Sigma^t$. If it is smaller than a chosen threshold $\epsilon$, the parameters have barely changed, so the algorithm has converged and stops. Otherwise repeat steps ② and ③.
Code implementation
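The four steps of the algorithm can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not a reference implementation: the names `fit_gmm`, `gaussian_pdf`, `reg`, and `max_iter`, the choice to draw initial means from the data, and the small diagonal term added to each covariance are all choices made here for the example.

```python
import numpy as np

def gaussian_pdf(X, mu, sigma):
    """N(x_i | mu, sigma) for every row x_i of X."""
    d = mu.shape[0]
    diff = X - mu
    quad = np.sum(diff @ np.linalg.inv(sigma) * diff, axis=1)
    norm = (2.0 * np.pi) ** (d / 2.0) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * quad) / norm

def fit_gmm(X, K, max_iter=200, eps=1e-6, reg=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Step 1: initialise p, mu, Sigma (means drawn from the data points).
    p = np.full(K, 1.0 / K)
    mu = X[rng.choice(n, size=K, replace=False)].copy()
    sigma = np.array([np.cov(X.T) + reg * np.eye(d) for _ in range(K)])

    for _ in range(max_iter):
        # Step 2 (E-step): gamma[i, k] = P(z_i = C_k | x_i, theta^t).
        joint = np.stack(
            [p[k] * gaussian_pdf(X, mu[k], sigma[k]) for k in range(K)], axis=1
        )
        gamma = joint / joint.sum(axis=1, keepdims=True)

        # Step 3 (M-step): the closed-form updates for p, mu, Sigma.
        Nk = gamma.sum(axis=0)
        p_new = Nk / n
        mu_new = (gamma.T @ X) / Nk[:, None]
        sigma_new = np.empty_like(sigma)
        for k in range(K):
            diff = X - mu_new[k]
            sigma_new[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
            # Assumption: a small ridge keeps Sigma_k invertible.
            sigma_new[k] += reg * np.eye(d)

        # Step 4: stop once the largest parameter change is below eps.
        change = max(
            np.abs(p_new - p).max(),
            np.abs(mu_new - mu).max(),
            np.abs(sigma_new - sigma).max(),
        )
        p, mu, sigma = p_new, mu_new, sigma_new
        if change < eps:
            break
    return p, mu, sigma

# Synthetic data: two well-separated 2-D clusters (hypothetical example).
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(200, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(200, 2)),
])
p, mu, sigma = fit_gmm(X, K=2)
print(p)
print(mu)
```

EM only finds a local optimum, so the result depends on the initialisation; running from several seeds and keeping the fit with the highest likelihood is a common safeguard.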
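Closing
That completes both the derivation and the code. The derivation is not rigorous in many places, so if you spot a problem, please point it out. Arigato!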