Definition
A Gaussian mixture model is defined by the density
$$P(\boldsymbol{x})=\sum_{i=1}^{k} \alpha_{i} \cdot \phi\left(\boldsymbol{x} \mid \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right)$$
The model consists of $k$ mixture components, each corresponding to one Gaussian distribution. Here $\boldsymbol{x} \in \mathbb{R}^{n}$; the $\alpha_i$ are the mixing coefficients, with $\alpha_{i} \geq 0$ and $\sum_{i=1}^{k} \alpha_{i}=1$; and $\phi\left(\boldsymbol{x} \mid \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right)$ is the probability density function of the multivariate Gaussian distribution (replaced by the univariate Gaussian when $\boldsymbol{x}$ is a scalar):
$$\phi\left(\boldsymbol{x} \mid \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right)=\frac{1}{(2 \pi)^{\frac{n}{2}}\left|\boldsymbol{\Sigma}_{i}\right|^{\frac{1}{2}}} \exp \left(-\frac{1}{2}\left(\boldsymbol{x}-\boldsymbol{\mu}_{i}\right)^{\mathrm{T}} \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}-\boldsymbol{\mu}_{i}\right)\right)$$
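The two density formulas above can be evaluated numerically. The following is a minimal sketch using NumPy; the function names `gaussian_pdf` and `mixture_pdf` and the parameter values are assumptions chosen for illustration, not part of the original notes.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density phi(x | mu, Sigma) of an n-dimensional Gaussian at one point x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    sigma = np.atleast_2d(np.asarray(sigma, dtype=float))
    n = x.shape[0]
    diff = x - mu
    # Mahalanobis term (x - mu)^T Sigma^{-1} (x - mu), via a linear solve
    # rather than an explicit matrix inverse.
    maha = diff @ np.linalg.solve(sigma, diff)
    norm = (2.0 * np.pi) ** (n / 2.0) * np.sqrt(np.linalg.det(sigma))
    return float(np.exp(-0.5 * maha) / norm)

def mixture_pdf(x, alphas, mus, sigmas):
    """Mixture density P(x) = sum_i alpha_i * phi(x | mu_i, Sigma_i)."""
    return sum(a * gaussian_pdf(x, m, s) for a, m, s in zip(alphas, mus, sigmas))

# Illustrative (made-up) parameters for a two-component mixture in R^2.
alphas = [0.3, 0.7]
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
sigmas = [np.eye(2), 2.0 * np.eye(2)]
p = mixture_pdf(np.array([0.0, 0.0]), alphas, mus, sigmas)
```

Passing scalars for `x`, `mu`, and `sigma` gives the univariate case mentioned in the text. For high-dimensional or ill-conditioned covariances a log-density is usually preferable to avoid underflow.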
Data are generated as follows: first, the $i$-th Gaussian mixture component is selected with probability $\alpha_i$; then a sample is drawn from that component's distribution $\phi\left(\boldsymbol{x} \mid \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right)$.
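The two-stage generative process just described can be sketched in a few lines. This is an illustrative sketch only: the function name `sample_gmm`, the seed, and the parameter values are assumptions, and components are indexed from 0 rather than 1.

```python
import numpy as np

def sample_gmm(m, alphas, mus, sigmas, seed=0):
    """Draw m samples from a Gaussian mixture; returns (X, z)."""
    rng = np.random.default_rng(seed)
    alphas = np.asarray(alphas, dtype=float)
    k = len(alphas)
    # Stage 1: choose the component index z_j with probability alpha_i.
    z = rng.choice(k, size=m, p=alphas)
    # Stage 2: draw x_j from the Gaussian of the chosen component.
    X = np.stack([rng.multivariate_normal(mus[i], sigmas[i]) for i in z])
    return X, z

# Illustrative (made-up) parameters.
X, z = sample_gmm(
    500,
    alphas=[0.3, 0.7],
    mus=[np.zeros(2), np.array([3.0, 3.0])],
    sigmas=[np.eye(2), np.eye(2)],
)
```

In a real estimation problem only `X` is observed; the labels `z` are exactly the hidden variables that the EM derivation treats as latent.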
Parameter estimation
EM algorithm
Suppose the samples in the dataset $D=\left\{\boldsymbol{x}_{1}, \boldsymbol{x}_{2}, \ldots, \boldsymbol{x}_{m}\right\}$ were all generated by some Gaussian mixture model, but which mixture component generated each sample $\boldsymbol{x}_{j}$ is unknown. That assignment is a latent variable: let $z_{j} \in\{1,2, \ldots, k\}$ denote the mixture component that generated $\boldsymbol{x}_{j}$. From the way a Gaussian mixture model generates data, the distribution of $z_j$ is $P\left(z_{j}=i\right)=\alpha_{i}$. We now apply the EM algorithm.
E-step: determine the Q function and substitute the sample sequence and the latent-variable sequence into it.
$$\begin{aligned} Q\left(\theta \mid \theta^{(i)}\right) &=\sum_{Z} P\left(Z \mid X, \theta^{(i)}\right) \ln P(X, Z \mid \theta) \\ &=\sum_{z_{1}, z_{2}, \ldots, z_{m}}\left\{\prod_{j=1}^{m} P\left(z_{j} \mid \boldsymbol{x}_{j}, \theta^{(i)}\right) \ln \left[\prod_{j=1}^{m} P\left(\boldsymbol{x}_{j}, z_{j} \mid \theta\right)\right]\right\} \\ &=\sum_{j=1}^{m}\left[\sum_{z_{j}} P\left(z_{j} \mid \boldsymbol{x}_{j}, \theta^{(i)}\right) \ln P\left(\boldsymbol{x}_{j}, z_{j} \mid \theta\right)\right] \\ &=\sum_{j=1}^{m}\left[\sum_{i=1}^{k} P\left(z_{j}=i \mid \boldsymbol{x}_{j}, \theta^{(i)}\right) \ln P\left(\boldsymbol{x}_{j}, z_{j}=i \mid \theta\right)\right] \end{aligned}$$
The step from the second equality to the third follows from the result derived in the EM algorithm notes.
For $P\left(z_{j}=i \mid \boldsymbol{x}_{j}, \theta^{(i)}\right)$, if we set $\theta^{(i)}$ aside for the moment, Bayes' rule gives
$$\begin{aligned} P\left(z_{j}=i \mid \boldsymbol{x}_{j}\right) &=\frac{P\left(z_{j}=i\right) \cdot P\left(\boldsymbol{x}_{j} \mid z_{j}=i\right)}{P\left(\boldsymbol{x}_{j}\right)} \\ &=\frac{\alpha_{i} \cdot \phi\left(\boldsymbol{x}_{j} \mid \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right)}{\sum_{l=1}^{k} \alpha_{l} \cdot \phi\left(\boldsymbol{x}_{j} \mid \boldsymbol{\mu}_{l}, \boldsymbol{\Sigma}_{l}\right)} \end{aligned}$$
This is Eq. 9.30 in the Watermelon Book (西瓜书). Taking $\theta^{(i)}$ into account, we have
$$P\left(z_{j}=i \mid \boldsymbol{x}_{j}, \theta^{(i)}\right)=\frac{\alpha_{i}^{(i)} \cdot \phi\left(\boldsymbol{x}_{j} \mid \boldsymbol{\mu}_{i}^{(i)}, \boldsymbol{\Sigma}_{i}^{(i)}\right)}{\sum_{l=1}^{k} \alpha_{l}^{(i)} \cdot \phi\left(\boldsymbol{x}_{j} \mid \boldsymbol{\mu}_{l}^{(i)}, \boldsymbol{\Sigma}_{l}^{(i)}\right)}$$
Here the superscript $(i)$ marks the $i$-th iteration (it is unrelated to the component subscript $i$) and signals that these parameters are already known. Every quantity on the right-hand side is therefore known, so this probability is a known number, which we abbreviate as $\gamma_{j i}$.
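As a concrete illustration of the posterior $\gamma_{ji}$, here is a minimal NumPy sketch of the E-step. The names `log_gaussian` and `e_step` are assumptions for illustration; the computation works in log space and subtracts the row maximum before exponentiating, a standard trick against underflow that the derivation itself does not require. Components are indexed from 0.

```python
import numpy as np

def log_gaussian(X, mu, sigma):
    """Row-wise ln phi(x_j | mu, Sigma) for X of shape (m, n)."""
    n = X.shape[1]
    diff = X - mu
    # (x_j - mu)^T Sigma^{-1} (x_j - mu) for every row j.
    maha = np.sum(diff * np.linalg.solve(sigma, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + maha)

def e_step(X, alphas, mus, sigmas):
    """Return gamma of shape (m, k) with gamma[j, i] = P(z_j = i | x_j, theta)."""
    k = len(alphas)
    # log of the numerator alpha_i * phi(x_j | mu_i, Sigma_i)
    log_w = np.stack(
        [np.log(alphas[i]) + log_gaussian(X, mus[i], sigmas[i]) for i in range(k)],
        axis=1,
    )
    log_w -= log_w.max(axis=1, keepdims=True)  # stabilise before exp
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)    # divide by the sum over l

# Illustrative (made-up) data and current parameters.
X = np.array([[0.0, 0.0], [5.0, 5.0]])
gamma = e_step(
    X,
    alphas=[0.5, 0.5],
    mus=[np.zeros(2), np.array([5.0, 5.0])],
    sigmas=[np.eye(2), np.eye(2)],
)
```

Each row of `gamma` sums to 1, which is the property used later when solving for the Lagrange multiplier.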
For $P\left(\boldsymbol{x}_{j}, z_{j}=i \mid \theta\right)$, the product rule $P(A, B)=P(A \mid B) \cdot P(B)$ gives
$$\begin{aligned} P\left(\boldsymbol{x}_{j}, z_{j}=i \mid \theta\right) &=P\left(\boldsymbol{x}_{j} \mid z_{j}=i, \theta\right) \cdot P\left(z_{j}=i \mid \theta\right) \\ &=\phi\left(\boldsymbol{x}_{j} \mid \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right) \cdot \alpha_{i} \end{aligned}$$
At this point all three parameters in this expression, $\alpha_{i}$, $\boldsymbol{\mu}_{i}$ and $\boldsymbol{\Sigma}_{i}$, are unknown. Substituting the two expressions above back into the Q function gives
$$\begin{aligned} Q\left(\theta \mid \theta^{(i)}\right) &=\sum_{j=1}^{m}\left[\sum_{i=1}^{k} P\left(z_{j}=i \mid \boldsymbol{x}_{j}, \theta^{(i)}\right) \ln P\left(\boldsymbol{x}_{j}, z_{j}=i \mid \theta\right)\right] \\ &=\sum_{j=1}^{m} \sum_{i=1}^{k} \gamma_{j i} \ln \left[\alpha_{i} \cdot \phi\left(\boldsymbol{x}_{j} \mid \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right)\right]\\ &=\sum_{j=1}^{m} \sum_{i=1}^{k} \gamma_{j i}\left[\ln \alpha_{i}+\ln \phi\left(\boldsymbol{x}_{j} \mid \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right)\right] \\ &=\sum_{j=1}^{m} \sum_{i=1}^{k}\left[\gamma_{j i} \ln \alpha_{i}+\gamma_{j i} \ln \phi\left(\boldsymbol{x}_{j} \mid \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right)\right] \\ &=\sum_{j=1}^{m} \sum_{i=1}^{k}\left\{\gamma_{j i} \ln \alpha_{i}+\gamma_{j i} \ln \left[\frac{1}{(2 \pi)^{\frac{n}{2}}\left|\boldsymbol{\Sigma}_{i}\right|^{\frac{1}{2}}} \exp \left(-\frac{1}{2}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\mathrm{T}} \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\right)\right]\right\} \\ &=\sum_{j=1}^{m} \sum_{i=1}^{k}\left\{\gamma_{j i} \ln \alpha_{i}+\gamma_{j i}\left[\ln \frac{1}{(2 \pi)^{\frac{n}{2}}}-\frac{1}{2} \ln \left|\boldsymbol{\Sigma}_{i}\right|-\frac{1}{2}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\mathrm{T}} \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\right]\right\} \\ &=\sum_{j=1}^{m} \sum_{i=1}^{k}\left\{\gamma_{j i} \ln \alpha_{i}+\gamma_{j i} \ln \frac{1}{(2 \pi)^{\frac{n}{2}}}-\frac{1}{2} \gamma_{j i} \ln \left|\boldsymbol{\Sigma}_{i}\right|-\frac{1}{2} \gamma_{j i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\mathrm{T}} \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\right\} \end{aligned}$$
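Next we maximize the Q function. Compare the fourth line of the expansion above with the log-likelihood of $m$ samples drawn from a single multivariate normal distribution, $\sum_{j=1}^{m}\ln\phi(\boldsymbol{x}_{j}\mid \boldsymbol{\mu}, \boldsymbol{\Sigma})$: the only difference is the extra weights $\gamma_{ji}$ and the terms $\ln\alpha_i$, all of which are constants with respect to $(\boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i})$. Note also that, although the sum $\sum_{i=1}^{k}$ is present, we maximize over one particular pair $(\boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i})$ at a time. The parameters $(\boldsymbol{\mu}_{l}, \boldsymbol{\Sigma}_{l})$ with $l \neq i$ can be treated as constants, so the sum over components can be ignored during the maximization. As for the mixing coefficients, $\ln \alpha_i$ is concave and the constraint $\sum_{i=1}^{k} \alpha_{i}=1$ (together with $\alpha_{i} \geq 0$) is linear, so the stationary point found by the method of Lagrange multipliers is guaranteed to be the maximum of the objective. We therefore use Lagrange multipliers to solve for $\alpha_i$.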
M-step: find the $\theta^{(i+1)}$ that maximizes the Q function.
To find $\boldsymbol{\mu}_{i}^{(i+1)}$, take the partial derivative of the Q function with respect to $\boldsymbol{\mu}_{i}$. Only the terms of component $i$ depend on $\boldsymbol{\mu}_{i}$, so the sum over components drops out:
$$\begin{aligned} \frac{\partial Q\left(\theta \mid \theta^{(i)}\right)}{\partial \boldsymbol{\mu}_{i}} &=\sum_{j=1}^{m}\left\{0+0-0-\frac{1}{2} \gamma_{j i} \frac{\partial\left(\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\mathrm{T}} \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\right)}{\partial \boldsymbol{\mu}_{i}}\right\} \\ &=-\sum_{j=1}^{m} \frac{1}{2} \gamma_{j i} \frac{\partial\left(\boldsymbol{x}_{j}^{\mathrm{T}} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{x}_{j}-\boldsymbol{x}_{j}^{\mathrm{T}} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{\mu}_{i}-\boldsymbol{\mu}_{i}^{\mathrm{T}} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{x}_{j}+\boldsymbol{\mu}_{i}^{\mathrm{T}} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{\mu}_{i}\right)}{\partial \boldsymbol{\mu}_{i}} \\ &=-\sum_{j=1}^{m} \frac{1}{2} \gamma_{j i} \frac{\partial\left(-\boldsymbol{x}_{j}^{\mathrm{T}} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{\mu}_{i}-\boldsymbol{\mu}_{i}^{\mathrm{T}} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{x}_{j}+\boldsymbol{\mu}_{i}^{\mathrm{T}} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{\mu}_{i}\right)}{\partial \boldsymbol{\mu}_{i}} \end{aligned}$$
Both $\boldsymbol{x}_{j}^{\mathrm{T}} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{\mu}_{i}$ and $\boldsymbol{\mu}_{i}^{\mathrm{T}} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{x}_{j}$ are scalars, a scalar equals its own transpose, and $\boldsymbol{\Sigma}_{i}$ is a symmetric matrix. Hence
$$\boldsymbol{x}_{j}^{\mathrm{T}} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{\mu}_{i}=\left(\boldsymbol{x}_{j}^{\mathrm{T}} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{\mu}_{i}\right)^{\mathrm{T}}=\boldsymbol{\mu}_{i}^{\mathrm{T}}\left(\boldsymbol{\Sigma}_{i}^{-1}\right)^{\mathrm{T}} \boldsymbol{x}_{j}=\boldsymbol{\mu}_{i}^{\mathrm{T}}\left(\boldsymbol{\Sigma}_{i}^{\mathrm{T}}\right)^{-1} \boldsymbol{x}_{j}=\boldsymbol{\mu}_{i}^{\mathrm{T}} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{x}_{j}$$
Substituting this into the derivative gives
$$\begin{aligned} \frac{\partial Q\left(\theta \mid \theta^{(i)}\right)}{\partial \boldsymbol{\mu}_{i}}&=-\sum_{j=1}^{m} \frac{1}{2} \gamma_{j i} \frac{\partial\left(-\boldsymbol{x}_{j}^{\mathrm{T}} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{\mu}_{i}-\boldsymbol{\mu}_{i}^{\mathrm{T}} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{x}_{j}+\boldsymbol{\mu}_{i}^{\mathrm{T}} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{\mu}_{i}\right)}{\partial \boldsymbol{\mu}_{i}}\\ &=-\sum_{j=1}^{m} \frac{1}{2} \gamma_{j i} \frac{\partial\left(-2 \boldsymbol{\mu}_{i}^{\mathrm{T}} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{x}_{j}+\boldsymbol{\mu}_{i}^{\mathrm{T}} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{\mu}_{i}\right)}{\partial \boldsymbol{\mu}_{i}} \end{aligned}$$
Then, by the matrix differentiation formulas $\dfrac{\partial \boldsymbol{x}^{\mathrm{T}} \boldsymbol{a}}{\partial \boldsymbol{x}}=\boldsymbol{a}$ and $\dfrac{\partial \boldsymbol{x}^{\mathrm{T}} \mathbf{B} \boldsymbol{x}}{\partial \boldsymbol{x}}=\left(\mathbf{B}+\mathbf{B}^{\mathrm{T}}\right) \boldsymbol{x}$, we obtain
$$\frac{\partial Q\left(\theta \mid \theta^{(i)}\right)}{\partial \boldsymbol{\mu}_{i}}=\sum_{j=1}^{m} \frac{1}{2} \gamma_{j i}\left(2 \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{x}_{j}-2 \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{\mu}_{i}\right)=\sum_{j=1}^{m} \gamma_{j i}\left( \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{x}_{j}- \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{\mu}_{i}\right)$$
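A gradient formula like this one is easy to sanity-check numerically. The sketch below compares the closed form $\sum_{j} \gamma_{ji} \boldsymbol{\Sigma}_{i}^{-1}(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i})$ against central finite differences of the $\boldsymbol{\mu}_{i}$-dependent part of Q. All data here are random placeholders, and names such as `q_mu` are assumptions introduced only for this check.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 3
X = rng.normal(size=(m, n))               # placeholder samples x_j
gamma = rng.uniform(0.1, 1.0, size=m)     # placeholder weights gamma_ji for one fixed i
A = rng.normal(size=(n, n))
sigma = A @ A.T + n * np.eye(n)           # a symmetric positive definite Sigma_i
sigma_inv = np.linalg.inv(sigma)
mu = rng.normal(size=n)

def q_mu(mu_vec):
    """The part of Q that depends on mu_i: -1/2 sum_j gamma_j (x_j-mu)^T Sigma^{-1} (x_j-mu)."""
    diff = X - mu_vec
    quad = np.einsum("ja,ab,jb->j", diff, sigma_inv, diff)
    return -0.5 * np.sum(gamma * quad)

# Closed-form gradient from the derivation.
analytic = sigma_inv @ (gamma[:, None] * (X - mu)).sum(axis=0)

# Central finite differences, one coordinate direction at a time.
eps = 1e-6
numeric = np.array(
    [(q_mu(mu + eps * e) - q_mu(mu - eps * e)) / (2.0 * eps) for e in np.eye(n)]
)
max_err = np.max(np.abs(analytic - numeric))
```

Because the objective is quadratic in $\boldsymbol{\mu}_{i}$, central differences are exact up to rounding, so `max_err` should be tiny.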
Setting this derivative to zero gives
$$\begin{gathered} \sum_{j=1}^{m} \gamma_{j i}\left(\boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{x}_{j}-\boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{\mu}_{i}\right)=0 \\ \boldsymbol{\Sigma}_{i}^{-1} \cdot \sum_{j=1}^{m} \gamma_{j i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)=0 \\ \sum_{j=1}^{m} \gamma_{j i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)=0 \end{gathered}$$
where the last line follows by left-multiplying by $\boldsymbol{\Sigma}_{i}$. Solving for $\boldsymbol{\mu}_{i}$:
$$\boldsymbol{\mu}_{i}=\frac{\sum_{j=1}^{m} \gamma_{j i} \boldsymbol{x}_{j}}{\sum_{j=1}^{m} \gamma_{j i}} \Rightarrow \boldsymbol{\mu}_{i}^{(i+1)}=\frac{\sum_{j=1}^{m} \gamma_{j i} \boldsymbol{x}_{j}}{\sum_{j=1}^{m} \gamma_{j i}}$$
This is Eq. 9.34 in the Watermelon Book.
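The mean update is a responsibility-weighted average of the samples, which vectorizes to a single matrix product. A minimal sketch, assuming `gamma` is stored as an $m \times k$ array with `gamma[j, i]` holding $\gamma_{ji}$; the function name `update_means` and the toy data are illustrative assumptions.

```python
import numpy as np

def update_means(X, gamma):
    """X: (m, n) samples; gamma: (m, k) responsibilities. Returns (k, n) means."""
    Nk = gamma.sum(axis=0)              # sum_j gamma_ji for each component i
    return (gamma.T @ X) / Nk[:, None]  # sum_j gamma_ji x_j / sum_j gamma_ji

# Toy check with hard (one-hot) responsibilities: the update reduces to
# the ordinary mean of the points assigned to each component.
X = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0]])
gamma = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
mus = update_means(X, gamma)
```

With soft responsibilities the same code applies unchanged; each point simply contributes fractionally to every component.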
To find $\boldsymbol{\Sigma}_{i}^{(i+1)}$, take the partial derivative of the Q function with respect to $\boldsymbol{\Sigma}_{i}$:
$$\begin{aligned} \frac{\partial Q\left(\theta \mid \theta^{(i)}\right)}{\partial \boldsymbol{\Sigma}_{i}} &=\sum_{j=1}^{m}\left\{0+0-\frac{\partial}{\partial \boldsymbol{\Sigma}_{i}}\left(\frac{1}{2} \gamma_{j i} \ln \left|\boldsymbol{\Sigma}_{i}\right|\right)-\frac{\partial}{\partial \boldsymbol{\Sigma}_{i}}\left[\frac{1}{2} \gamma_{j i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\mathrm{T}} \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\right]\right\} \\ &=\sum_{j=1}^{m}\left\{-\frac{1}{2} \gamma_{j i} \frac{\partial\left(\ln \left|\boldsymbol{\Sigma}_{i}\right|\right)}{\partial \boldsymbol{\Sigma}_{i}}-\frac{1}{2} \gamma_{j i} \frac{\partial\left[\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\mathrm{T}} \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\right]}{\partial \boldsymbol{\Sigma}_{i}}\right\} \end{aligned}$$
By the matrix differentiation formulas $\dfrac{\partial|\mathbf{X}|}{\partial \mathbf{X}}=|\mathbf{X}| \cdot\left(\mathbf{X}^{-1}\right)^{\mathrm{T}}$ (combined with the chain rule for $\ln|\mathbf{X}|$) and $\dfrac{\partial \boldsymbol{a}^{\mathrm{T}} \mathbf{X}^{-1} \boldsymbol{b}}{\partial \mathbf{X}}=-\mathbf{X}^{-\mathrm{T}} \boldsymbol{a} \boldsymbol{b}^{\mathrm{T}} \mathbf{X}^{-\mathrm{T}}$, and because $\boldsymbol{\Sigma}_{i}$ is symmetric (so inverting and then transposing is the same as inverting), we obtain
$$\begin{aligned} \frac{\partial Q\left(\theta \mid \theta^{(i)}\right)}{\partial \boldsymbol{\Sigma}_{i}} & =\sum_{j=1}^{m}\left\{-\frac{1}{2} \gamma_{j i} \cdot \frac{1}{\left|\boldsymbol{\Sigma}_{i}\right|} \cdot\left|\boldsymbol{\Sigma}_{i}\right| \cdot\left(\boldsymbol{\Sigma}_{i}^{-1}\right)^{\mathrm{T}}-\frac{1}{2} \gamma_{j i} \cdot\left(-\boldsymbol{\Sigma}_{i}^{-\mathrm{T}}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\mathrm{T}} \boldsymbol{\Sigma}_{i}^{-\mathrm{T}}\right\}\\ &=\sum_{j=1}^{m}\left\{-\frac{1}{2} \gamma_{j i} \boldsymbol{\Sigma}_{i}^{-1}+\frac{1}{2} \gamma_{j i} \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\mathrm{T}} \boldsymbol{\Sigma}_{i}^{-1}\right\} \end{aligned}$$
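The two matrix identities used here can be checked numerically for a single sample. The sketch below differentiates $f(\boldsymbol{\Sigma}) = -\frac{1}{2}\ln|\boldsymbol{\Sigma}| - \frac{1}{2}\boldsymbol{a}^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}\boldsymbol{a}$ entry by entry with central differences, treating all matrix entries as independent (as the formulas do), and compares the result with $-\frac{1}{2}\boldsymbol{\Sigma}^{-1} + \frac{1}{2}\boldsymbol{\Sigma}^{-1}\boldsymbol{a}\boldsymbol{a}^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}$. The vector `a` stands in for $\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}$; all values are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.normal(size=(n, n))
sigma = A @ A.T + n * np.eye(n)   # symmetric positive definite test matrix
a = rng.normal(size=n)            # placeholder for x_j - mu_i

def f(S):
    """-1/2 ln|S| - 1/2 a^T S^{-1} a, for a (possibly non-symmetric) matrix S."""
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * logdet - 0.5 * a @ np.linalg.solve(S, a)

# Closed form at a symmetric Sigma, where Sigma^{-T} = Sigma^{-1}.
S_inv = np.linalg.inv(sigma)
analytic = -0.5 * S_inv + 0.5 * S_inv @ np.outer(a, a) @ S_inv

# Entry-wise central finite differences.
eps = 1e-6
numeric = np.zeros((n, n))
for r in range(n):
    for c in range(n):
        E = np.zeros((n, n))
        E[r, c] = eps
        numeric[r, c] = (f(sigma + E) - f(sigma - E)) / (2.0 * eps)

max_err = np.max(np.abs(analytic - numeric))
```

Multiplying by $\gamma_{ji}$ and summing over $j$ recovers the full derivative; the check only confirms the per-sample building block.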
Setting this derivative to zero and right-multiplying by $\boldsymbol{\Sigma}_{i}$ gives (with $\mathbf{I}$ the $n \times n$ identity matrix)
$$\begin{aligned} &\sum_{j=1}^{m}\left\{-\frac{1}{2} \gamma_{j i} \boldsymbol{\Sigma}_{i}^{-1}+\frac{1}{2} \gamma_{j i} \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\mathrm{T}} \boldsymbol{\Sigma}_{i}^{-1}\right\}=0\\ &\sum_{j=1}^{m}\left\{-\frac{1}{2} \gamma_{j i} \mathbf{I}+\frac{1}{2} \gamma_{j i} \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\mathrm{T}}\right\}=0\\ &\frac{1}{2} \sum_{j=1}^{m} \gamma_{j i} \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\mathrm{T}}=\frac{1}{2} \sum_{j=1}^{m} \gamma_{j i} \mathbf{I}\\ &\boldsymbol{\Sigma}_{i}^{-1} \sum_{j=1}^{m} \gamma_{j i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\mathrm{T}}=\sum_{j=1}^{m} \gamma_{j i} \mathbf{I} \end{aligned}$$
Left-multiplying by $\boldsymbol{\Sigma}_{i}$ and dividing by the scalar $\sum_{j=1}^{m} \gamma_{j i}$ yields
$$\boldsymbol{\Sigma}_{i}=\frac{\sum_{j=1}^{m} \gamma_{j i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\mathrm{T}}}{\sum_{j=1}^{m} \gamma_{j i}} \Rightarrow \boldsymbol{\Sigma}_{i}^{(i+1)}=\frac{\sum_{j=1}^{m} \gamma_{j i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\mathrm{T}}}{\sum_{j=1}^{m} \gamma_{j i}}$$
Because the stationarity conditions for $\boldsymbol{\mu}_{i}$ and $\boldsymbol{\Sigma}_{i}$ hold jointly at the maximum, the $\boldsymbol{\mu}_{i}$ on the right-hand side is the updated mean $\boldsymbol{\mu}_{i}^{(i+1)}$.
This is Eq. 9.35 in the Watermelon Book.
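The covariance update is a responsibility-weighted scatter matrix. A minimal sketch follows, assuming `gamma` has shape $m \times k$ and `mus` holds the already-updated means; the function name `update_covariances` and the toy data are illustrative assumptions.

```python
import numpy as np

def update_covariances(X, gamma, mus):
    """X: (m, n); gamma: (m, k); mus: (k, n). Returns (k, n, n) covariances."""
    m, n = X.shape
    k = gamma.shape[1]
    sigmas = np.empty((k, n, n))
    for i in range(k):
        diff = X - mus[i]                       # rows are x_j - mu_i
        weighted = gamma[:, i, None] * diff     # rows are gamma_ji (x_j - mu_i)
        # sum_j gamma_ji (x_j - mu_i)(x_j - mu_i)^T / sum_j gamma_ji
        sigmas[i] = weighted.T @ diff / gamma[:, i].sum()
    return sigmas

# Toy check: with one component and all responsibilities equal to 1, the
# update reduces to the ordinary maximum-likelihood (biased) covariance.
X = np.array([[0.0, 1.0], [2.0, 0.0], [4.0, 5.0], [1.0, 3.0]])
gamma = np.ones((4, 1))
mus = X.mean(axis=0, keepdims=True)
S = update_covariances(X, gamma, mus)
```

In practice a small multiple of the identity is often added to each covariance to keep it invertible when a component collapses onto very few points; that regularization is outside the derivation here.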
To find $\alpha_{i}^{(i+1)}$, note that $\alpha_i$ is subject to the constraint $\sum_{i=1}^{k} \alpha_{i}=1$, so we use the method of Lagrange multipliers. The Lagrangian is
$$\begin{aligned} L(\boldsymbol{\alpha}, \lambda) &=Q\left(\theta \mid \theta^{(i)}\right)+\lambda\left(\sum_{i=1}^{k} \alpha_{i}-1\right) \\ &=\sum_{j=1}^{m} \sum_{i=1}^{k}\left\{\gamma_{j i} \ln \alpha_{i}+\gamma_{j i} \ln \frac{1}{(2 \pi)^{\frac{n}{2}}}-\frac{1}{2} \gamma_{j i} \ln \left|\boldsymbol{\Sigma}_{i}\right|-\frac{1}{2} \gamma_{j i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\mathrm{T}} \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\right\}+\lambda\left(\sum_{i=1}^{k} \alpha_{i}-1\right) \end{aligned}$$
Taking the partial derivative of the Lagrangian with respect to $\alpha_i$:
$$\frac{\partial L(\boldsymbol{\alpha}, \lambda)}{\partial \alpha_{i}}=\sum_{j=1}^{m}\left\{\frac{\partial\left(\gamma_{j i} \ln \alpha_{i}\right)}{\partial \alpha_{i}}+0-0-0\right\}+\lambda \frac{\partial\left(\sum_{i=1}^{k} \alpha_{i}-1\right)}{\partial \alpha_{i}}=\sum_{j=1}^{m} \frac{\gamma_{j i}}{\alpha_{i}}+\lambda$$
Setting this derivative to zero gives
$$\begin{aligned} &\sum_{j=1}^{m} \frac{\gamma_{j i}}{\alpha_{i}}+\lambda=0 \\ &\frac{1}{\alpha_{i}} \sum_{j=1}^{m} \gamma_{j i}=-\lambda \\ &\alpha_{i}=-\frac{1}{\lambda} \sum_{j=1}^{m} \gamma_{j i} \end{aligned}$$
Since $\sum_{i=1}^{k} \alpha_{i}=1$, summing both sides of the last equation over $i$ gives
$$\begin{gathered} \sum_{i=1}^{k} \alpha_{i}=-\frac{1}{\lambda} \sum_{i=1}^{k} \sum_{j=1}^{m} \gamma_{j i} \\ 1=-\frac{1}{\lambda} \sum_{i=1}^{k} \sum_{j=1}^{m} \gamma_{j i} \\ \lambda=-\sum_{i=1}^{k} \sum_{j=1}^{m} \gamma_{j i} \end{gathered}$$
To evaluate $\lambda$, note that the posterior probabilities of each sample sum to 1 over the components:
$$\sum_{i=1}^{k} \sum_{j=1}^{m} \gamma_{j i}=\sum_{j=1}^{m} \sum_{i=1}^{k} \gamma_{j i}=\sum_{j=1}^{m} \sum_{i=1}^{k} P\left(z_{j}=i \mid \boldsymbol{x}_{j}, \theta^{(i)}\right)=\sum_{j=1}^{m} 1=m$$
Therefore
$$\begin{gathered} \lambda=-\sum_{i=1}^{k} \sum_{j=1}^{m} \gamma_{j i}=-m \\ \alpha_{i}=-\frac{1}{\lambda} \sum_{j=1}^{m} \gamma_{j i}=\frac{1}{m} \sum_{j=1}^{m} \gamma_{j i} \end{gathered}$$
Since $0 \leq \gamma_{j i}=P\left(z_{j}=i \mid \boldsymbol{x}_{j}, \theta^{(i)}\right) \leq 1$, we have
$$0 \leq \sum_{j=1}^{m} \gamma_{j i} \leq m \Rightarrow 0 \leq \frac{1}{m} \sum_{j=1}^{m} \gamma_{j i} \leq 1$$
The $\alpha_{i}$ obtained this way also satisfies the non-negativity constraint, so it is a valid solution and can serve as the parameter value for the next iteration:
$$\alpha_{i}=\frac{1}{m} \sum_{j=1}^{m} \gamma_{j i} \Rightarrow \alpha_{i}^{(i+1)}=\frac{1}{m} \sum_{j=1}^{m} \gamma_{j i}$$
This is Eq. 9.38 in the Watermelon Book.
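Putting the three updates together with the E-step gives one full EM iteration. The sketch below is a minimal illustration under stated assumptions, not a reference implementation: the names `log_gaussian` and `em_step`, the synthetic data, the initial parameters, and the fixed count of 20 iterations are all made up for the example, components are indexed from 0, and no covariance regularization or convergence test is included.

```python
import numpy as np

def log_gaussian(X, mu, sigma):
    """Row-wise ln phi(x_j | mu, Sigma) for X of shape (m, n)."""
    n = X.shape[1]
    diff = X - mu
    maha = np.sum(diff * np.linalg.solve(sigma, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + maha)

def em_step(X, alphas, mus, sigmas):
    """One EM iteration; also returns the log-likelihood of the input parameters."""
    m, n = X.shape
    k = len(alphas)
    # E-step: gamma[j, i] = P(z_j = i | x_j, current parameters), in log space.
    log_w = np.stack(
        [np.log(alphas[i]) + log_gaussian(X, mus[i], sigmas[i]) for i in range(k)],
        axis=1,
    )
    shift = log_w.max(axis=1, keepdims=True)
    w = np.exp(log_w - shift)
    row_sum = w.sum(axis=1, keepdims=True)
    gamma = w / row_sum
    loglik = float(np.sum(shift + np.log(row_sum)))
    # M-step: closed-form updates for alpha, mu, and Sigma.
    Nk = gamma.sum(axis=0)
    new_alphas = Nk / m
    new_mus = (gamma.T @ X) / Nk[:, None]
    new_sigmas = np.empty((k, n, n))
    for i in range(k):
        diff = X - new_mus[i]               # uses the updated mean
        new_sigmas[i] = (gamma[:, i, None] * diff).T @ diff / Nk[i]
    return new_alphas, new_mus, new_sigmas, loglik

# Synthetic two-cluster data and a rough initial guess (both made up).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(150, 2)),
    rng.normal(5.0, 1.0, size=(150, 2)),
])
alphas = np.array([0.5, 0.5])
mus = np.array([[1.0, 1.0], [4.0, 4.0]])
sigmas = np.array([np.eye(2), np.eye(2)])

history = []
for _ in range(20):
    alphas, mus, sigmas, ll = em_step(X, alphas, mus, sigmas)
    history.append(ll)
```

The recorded log-likelihoods in `history` should never decrease from one iteration to the next, which is a convenient check when experimenting with the updates; a practical implementation would stop once the increase falls below a tolerance.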