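The previous post presented the EM algorithm for the Gaussian mixture model; here we derive its formulas in detail.
Derivation of the EM Algorithm for Gaussian Mixture Models
In 7.10 (Use Cases and Steps of the EM Algorithm) we gave the general steps of the EM algorithm. In a concrete application, the key is to construct the elements the method requires and then apply it directly. The present problem matches the case noted there in which no attribute is missing (no latent variable is given), and it can be handled in two ways, (I) and (II).
The Watermelon Book takes approach (I): while maximizing the (log-)likelihood, it "assembles" a recurrence and thereby turns the procedure into an EM algorithm.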
(1) Parameter $\boldsymbol{\mu}$
From [Watermelon Book, Eq. (9.28)] we have
\begin{align}
\frac{\partial p(\boldsymbol{x})}{\partial \boldsymbol{\mu}} & = p(\boldsymbol{x})\,\frac{\partial \left(-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right)}{\partial \boldsymbol{\mu}}\notag \\
& = p(\boldsymbol{x})\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\qquad \text{(by [Watermelon Book, Eq. (A.32)])} \tag{9.3}
\end{align}
Applying Eq. (9.3) to $p(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)$ gives
\begin{align}
\frac{\partial p(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)}{\partial \boldsymbol{\mu}_i} = p(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)\boldsymbol{\Sigma}_i^{-1}(\boldsymbol{x}_j-\boldsymbol{\mu}_i) \tag{9.4}
\end{align}
From [Watermelon Book, Eq. (9.32)] we have
\begin{align}
\frac{\partial \mathrm{LL}(D)}{\partial \boldsymbol{\mu}_i} & = \sum_{j=1}^m\frac{\partial \ln\left(\sum_{l=1}^k\alpha_l\,p(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_l,\boldsymbol{\Sigma}_l)\right)}{\partial \boldsymbol{\mu}_i}\notag \\
& = \sum_{j=1}^m\frac{\alpha_i}{\sum_{l=1}^k\alpha_l\,p(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_l,\boldsymbol{\Sigma}_l)}\,\frac{\partial p(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)}{\partial \boldsymbol{\mu}_i}\notag \\
& = \sum_{j=1}^m\frac{\alpha_i\,p(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)}{\sum_{l=1}^k\alpha_l\,p(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_l,\boldsymbol{\Sigma}_l)}\,\boldsymbol{\Sigma}_i^{-1}(\boldsymbol{x}_j-\boldsymbol{\mu}_i)\quad \text{(by Eq. (9.4))}\notag \\
& = \boldsymbol{\Sigma}_i^{-1}\sum_{j=1}^m\gamma_{ji}(\boldsymbol{x}_j-\boldsymbol{\mu}_i) \tag{9.5}
\end{align}
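As a sanity check on this step, the following sketch compares a central finite difference of $\mathrm{LL}(D)$ with the closed-form gradient $\boldsymbol{\Sigma}_i^{-1}\sum_j\gamma_{ji}(\boldsymbol{x}_j-\boldsymbol{\mu}_i)$ in the scalar case, where $\boldsymbol{\Sigma}_i$ reduces to a variance. The toy data, the parameter values, and all function names are assumptions made for illustration; they do not come from the Watermelon Book. Components are indexed from 0 in the code.

```python
import math

def gauss_pdf(x, mu, var):
    """Density of a 1-D Gaussian N(mu, var) evaluated at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(xs, alphas, mus, variances):
    """LL(D) = sum_j ln( sum_l alpha_l * p(x_j | mu_l, var_l) )."""
    return sum(
        math.log(sum(a * gauss_pdf(x, m, v) for a, m, v in zip(alphas, mus, variances)))
        for x in xs
    )

def grad_mu_closed_form(xs, alphas, mus, variances, i):
    """Closed form (1 / var_i) * sum_j gamma_ji * (x_j - mu_i)."""
    total = 0.0
    for x in xs:
        mix = sum(a * gauss_pdf(x, m, v) for a, m, v in zip(alphas, mus, variances))
        gamma = alphas[i] * gauss_pdf(x, mus[i], variances[i]) / mix
        total += gamma * (x - mus[i])
    return total / variances[i]

# Hypothetical toy data and parameters, chosen only for illustration.
xs = [-1.2, -0.4, 0.3, 2.1, 2.8]
alphas, mus, variances = [0.4, 0.6], [0.0, 2.0], [1.0, 0.5]

i, h = 0, 1e-6  # component to differentiate, finite-difference step
mus_plus = [m + h if c == i else m for c, m in enumerate(mus)]
mus_minus = [m - h if c == i else m for c, m in enumerate(mus)]
numeric = (log_likelihood(xs, alphas, mus_plus, variances)
           - log_likelihood(xs, alphas, mus_minus, variances)) / (2 * h)
analytic = grad_mu_closed_form(xs, alphas, mus, variances, i)
```

If the derivation is right, `numeric` and `analytic` should agree up to finite-difference error.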
where
\begin{align}
\gamma_{ji} = \frac{\alpha_i\,p(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)}{\sum_{l=1}^k\alpha_l\,p(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_l,\boldsymbol{\Sigma}_l)} \tag{9.6}
\end{align}
Note: both $\boldsymbol{\Sigma}$ and $\sum$ appear here, so take care to distinguish them: $\boldsymbol{\Sigma}_i$ is a covariance matrix, not a summation sign.
Setting $\frac{\partial \mathrm{LL}(D)}{\partial \boldsymbol{\mu}_i}=\boldsymbol{0}$ and left-multiplying both sides by $\boldsymbol{\Sigma}_i$, Eq. (9.5) gives
\begin{align}
\sum_{j=1}^m\gamma_{ji}(\boldsymbol{x}_j-\boldsymbol{\mu}_i) & = \boldsymbol{0}\notag \\
\sum_{j=1}^m\gamma_{ji}\boldsymbol{x}_j & = \boldsymbol{\mu}_i\sum_{j=1}^m\gamma_{ji} \tag{9.7}
\end{align}
With the current parameter values (at step $t$) $(\alpha_i^{\,t},\boldsymbol{\mu}_i^{\,t},\boldsymbol{\Sigma}_i^{\,t})$, [Watermelon Book, Eq. (9.28)] lets us evaluate Eq. (9.6) at this step:
\begin{align}
\gamma_{ji}^{\,t} = \frac{\alpha_i^{\,t}\,p(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i^{\,t},\boldsymbol{\Sigma}_i^{\,t})}{\sum_{l=1}^k\alpha_l^{\,t}\,p(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_l^{\,t},\boldsymbol{\Sigma}_l^{\,t})} \tag{9.8}
\end{align}
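The computation of $\gamma_{ji}^{\,t}$ can be sketched in a few lines. This is a minimal scalar (1-D) illustration, not the book's implementation; `gauss_pdf`, `responsibilities`, and the toy values are hypothetical names and data, and components are indexed from 0.

```python
import math

def gauss_pdf(x, mu, var):
    """Density of a 1-D Gaussian N(mu, var) evaluated at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def responsibilities(xs, alphas, mus, variances):
    """Return gammas[j][i] = alpha_i p(x_j | i) / sum_l alpha_l p(x_j | l)."""
    gammas = []
    for x in xs:
        weighted = [a * gauss_pdf(x, m, v) for a, m, v in zip(alphas, mus, variances)]
        total = sum(weighted)  # the mixture density at x_j
        gammas.append([w / total for w in weighted])
    return gammas

# Hypothetical current parameters (step t) and toy samples.
xs = [-1.2, -0.4, 0.3, 2.1, 2.8]
gammas = responsibilities(xs, [0.4, 0.6], [0.0, 2.0], [1.0, 0.5])
```

Each row of `gammas` is a probability distribution over the components, so it sums to 1; this fact is used again when the multiplier $\lambda$ is determined.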
Since $\gamma_{ji}^{\,t}$ in Eq. (9.8) is known, we substitute it into Eq. (9.7) and treat the $\boldsymbol{\mu}_i$ on the right-hand side as the value to be solved for at the next step. This "assembles" a recurrence (the equality becomes an update rule):
\begin{align}
\sum_{j=1}^m\gamma_{ji}^{\,t}\boldsymbol{x}_j = \boldsymbol{\mu}_i^{\,t+1}\sum_{j=1}^m\gamma_{ji}^{\,t} \tag{9.9}
\end{align}
that is,
\begin{align}
\boldsymbol{\mu}_i^{\,t+1} = \frac{\sum_{j=1}^m\gamma_{ji}^{\,t}\boldsymbol{x}_j}{\sum_{j=1}^m\gamma_{ji}^{\,t}} \tag{9.10}
\end{align}
which is [Watermelon Book, Eq. (9.34)].
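The mean update is a responsibility-weighted average of the samples. The sketch below works it through on three scalar samples; the responsibilities are made-up numbers rather than the output of an actual E-step, and `update_means` is a hypothetical helper name.

```python
def update_means(xs, gammas):
    """mu_i^{t+1} = sum_j gamma_ji x_j / sum_j gamma_ji for every component i."""
    k = len(gammas[0])
    new_mus = []
    for i in range(k):
        weight = sum(g[i] for g in gammas)  # effective number of samples in component i
        new_mus.append(sum(g[i] * x for g, x in zip(gammas, xs)) / weight)
    return new_mus

# Hypothetical samples and responsibilities (each row sums to 1).
xs = [0.0, 1.0, 4.0]
gammas = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
new_mus = update_means(xs, gammas)  # [1/3, 3.0]
```

Component 0 has total weight 1.5 and weighted sum 0.5, so its new mean is 1/3; component 1 has total weight 1.5 and weighted sum 4.5, so its new mean is 3.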
(2) Parameter $\boldsymbol{\Sigma}$
\begin{align}
\frac{\partial |\boldsymbol{\Sigma}|^{-\frac{1}{2}}}{\partial \boldsymbol{\Sigma}} & = -\frac{1}{2}|\boldsymbol{\Sigma}|^{-\frac{1}{2}}\boldsymbol{\Sigma}^{-\mathrm{T}}\quad \text{(by Eq. (A86))}\notag \\
& = -\frac{1}{2}|\boldsymbol{\Sigma}|^{-\frac{1}{2}}\boldsymbol{\Sigma}^{-1}\quad \text{(by the symmetry of $\boldsymbol{\Sigma}$)} \tag{9.11}
\end{align}
This uses the following formula (see "5. Partial Derivatives Involving Matrices"):
\begin{align}
\frac{\partial |\mathbf{A}|^{-\frac{1}{2}}}{\partial \mathbf{A}} = -\frac{1}{2}|\mathbf{A}|^{-\frac{1}{2}}\mathbf{A}^{-\mathrm{T}} \tag{A86}
\end{align}
\begin{align}
& \quad \frac{\partial (\boldsymbol{x}-\boldsymbol{\mu})^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})}{\partial \boldsymbol{\Sigma}}\notag \\
& = \frac{\partial\, \mathrm{tr}\left((\boldsymbol{x}-\boldsymbol{\mu})^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right)}{\partial \boldsymbol{\Sigma}}\notag \\
& = \frac{\partial\, \mathrm{tr}\left((\boldsymbol{x}-\boldsymbol{\mu})(\boldsymbol{x}-\boldsymbol{\mu})^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}\right)}{\partial \boldsymbol{\Sigma}}\notag \\
& = -\boldsymbol{\Sigma}^{-\mathrm{T}}\left((\boldsymbol{x}-\boldsymbol{\mu})(\boldsymbol{x}-\boldsymbol{\mu})^{\mathrm{T}}\right)^{\mathrm{T}}\boldsymbol{\Sigma}^{-\mathrm{T}}\quad \text{(by Eq. (A80))}\notag \\
& = -\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})(\boldsymbol{x}-\boldsymbol{\mu})^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}\quad \text{(by the symmetry of $\boldsymbol{\Sigma}$)} \tag{9.12}
\end{align}
This uses the following formula (see "5. Partial Derivatives Involving Matrices"):
\begin{align}
\frac{\partial\, \mathrm{tr}(\mathbf{B}\mathbf{A}^{-1})}{\partial \mathbf{A}} = -(\mathbf{A}^{-1}\mathbf{B}\mathbf{A}^{-1})^{\mathrm{T}} \tag{A80}
\end{align}
Using Eqs. (9.11) and (9.12), the partial derivative of [Watermelon Book, Eq. (9.28)] with respect to the matrix $\boldsymbol{\Sigma}$ is
\begin{align}
& \quad \frac{\partial p(\boldsymbol{x})}{\partial \boldsymbol{\Sigma}}\notag \\
& = \frac{\partial \left[(2\pi)^{-\frac{n}{2}}|\boldsymbol{\Sigma}|^{-\frac{1}{2}}\exp\left(-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right)\right]}{\partial \boldsymbol{\Sigma}}\notag \\
& = \frac{\partial \left[(2\pi)^{-\frac{n}{2}}|\boldsymbol{\Sigma}|^{-\frac{1}{2}}\right]}{\partial \boldsymbol{\Sigma}}\exp\left(-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right)\notag \\
& \quad + (2\pi)^{-\frac{n}{2}}|\boldsymbol{\Sigma}|^{-\frac{1}{2}}\,\frac{\partial \exp\left(-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right)}{\partial \boldsymbol{\Sigma}}\notag \\
& = (2\pi)^{-\frac{n}{2}}\left(-\frac{1}{2}|\boldsymbol{\Sigma}|^{-\frac{1}{2}}\boldsymbol{\Sigma}^{-1}\right)\exp\left(-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right)\notag \\
& \quad + (2\pi)^{-\frac{n}{2}}|\boldsymbol{\Sigma}|^{-\frac{1}{2}}\exp\left(-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right)\frac{\partial \left(-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right)}{\partial \boldsymbol{\Sigma}}\notag \\
& = -\frac{1}{2}p(\boldsymbol{x})\boldsymbol{\Sigma}^{-1} + p(\boldsymbol{x})\left(-\frac{1}{2}\left(-\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})(\boldsymbol{x}-\boldsymbol{\mu})^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}\right)\right)\notag \\
& = \frac{1}{2}p(\boldsymbol{x})\boldsymbol{\Sigma}^{-1}\left(-\boldsymbol{\Sigma}+(\boldsymbol{x}-\boldsymbol{\mu})(\boldsymbol{x}-\boldsymbol{\mu})^{\mathrm{T}}\right)\boldsymbol{\Sigma}^{-1} \tag{9.13}
\end{align}
Applying Eq. (9.13) to $p(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)$ gives
\begin{align}
\frac{\partial p(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)}{\partial \boldsymbol{\Sigma}_i} = \frac{1}{2}p(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)\,\boldsymbol{\Sigma}_i^{-1}\left[(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}}-\boldsymbol{\Sigma}_i\right]\boldsymbol{\Sigma}_i^{-1} \tag{9.14}
\end{align}
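In the scalar case the covariance matrix reduces to a variance $\sigma^2$, and Eq. (9.14) reads $\partial p/\partial\sigma^2=\tfrac{1}{2}\,p\,\big((x-\mu)^2-\sigma^2\big)/\sigma^4$. The sketch below checks this special case against a central finite difference. It is only a one-dimensional spot check with arbitrarily chosen values and hypothetical function names, not a verification of the full matrix identity.

```python
import math

def gauss_pdf(x, mu, var):
    """Density of a 1-D Gaussian N(mu, var) evaluated at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def dpdf_dvar_closed_form(x, mu, var):
    """Scalar case of Eq. (9.14): 0.5 * p * var^-1 * ((x - mu)^2 - var) * var^-1."""
    return 0.5 * gauss_pdf(x, mu, var) * ((x - mu) ** 2 - var) / var ** 2

# Arbitrary illustrative point and finite-difference step.
x, mu, var, h = 1.3, 0.5, 0.8, 1e-6
numeric = (gauss_pdf(x, mu, var + h) - gauss_pdf(x, mu, var - h)) / (2 * h)
analytic = dpdf_dvar_closed_form(x, mu, var)
```

Here $(x-\mu)^2=0.64<\sigma^2=0.8$, so the derivative is negative: enlarging the variance lowers the density at a point that lies within one standard deviation of the mean.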
Using Eq. (9.14), the partial derivative of [Watermelon Book, Eq. (9.32)] with respect to the matrix $\boldsymbol{\Sigma}_i$ is
\begin{align}
\frac{\partial \mathrm{LL}(D)}{\partial \boldsymbol{\Sigma}_i} & = \sum_{j=1}^m\frac{\partial \ln\left(\sum_{l=1}^k\alpha_l\,p(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_l,\boldsymbol{\Sigma}_l)\right)}{\partial \boldsymbol{\Sigma}_i}\notag \\
& = \sum_{j=1}^m\frac{\alpha_i}{\sum_{l=1}^k\alpha_l\,p(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_l,\boldsymbol{\Sigma}_l)}\,\frac{\partial p(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)}{\partial \boldsymbol{\Sigma}_i}\qquad \text{(next line by Eq. (9.14))}\notag \\
& = \frac{1}{2}\sum_{j=1}^m\frac{\alpha_i\,p(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)}{\sum_{l=1}^k\alpha_l\,p(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_l,\boldsymbol{\Sigma}_l)}\,\boldsymbol{\Sigma}_i^{-1}\left[(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}}-\boldsymbol{\Sigma}_i\right]\boldsymbol{\Sigma}_i^{-1}\notag \\
& = \frac{1}{2}\sum_{j=1}^m\gamma_{ji}\left[\boldsymbol{\Sigma}_i^{-1}(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}}\boldsymbol{\Sigma}_i^{-1}-\boldsymbol{\Sigma}_i^{-1}\right]\qquad \text{(by Eq. (9.6))}\notag \\
& = \frac{1}{2}\boldsymbol{\Sigma}_i^{-1}\left(\sum_{j=1}^m\gamma_{ji}\left[(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}}-\boldsymbol{\Sigma}_i\right]\right)\boldsymbol{\Sigma}_i^{-1} \tag{9.15}
\end{align}
Setting $\frac{\partial \mathrm{LL}(D)}{\partial \boldsymbol{\Sigma}_i}=\mathbf{0}$ and multiplying by $\boldsymbol{\Sigma}_i$ on both the left and the right gives
\begin{align}
\sum_{j=1}^m\gamma_{ji}(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}} = \boldsymbol{\Sigma}_i\sum_{j=1}^m\gamma_{ji} \tag{9.16}
\end{align}
As with Eq. (9.9), we "assemble" a recurrence from this equality:
\begin{align}
\boldsymbol{\Sigma}_i^{\,t+1} = \frac{\sum_{j=1}^m\gamma_{ji}^{\,t}(\boldsymbol{x}_j-\boldsymbol{\mu}_i^{\,t})(\boldsymbol{x}_j-\boldsymbol{\mu}_i^{\,t})^{\mathrm{T}}}{\sum_{j=1}^m\gamma_{ji}^{\,t}} \tag{9.17}
\end{align}
which is [Watermelon Book, Eq. (9.35)]. Note that this is a matrix equation.
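The covariance update is a responsibility-weighted average of outer products. The sketch below implements it for one component with plain nested lists; the function name, the four 2-D samples, and the uniform responsibilities are assumptions chosen so the result is easy to verify by hand. The mean passed in is whatever value the caller supplies: Eq. (9.17) as written uses $\boldsymbol{\mu}_i^{\,t}$, whereas the joint maximizer of the M-step objective uses the freshly updated mean.

```python
def update_covariance(xs, gammas_i, mu_i):
    """Weighted average of outer products (x_j - mu_i)(x_j - mu_i)^T for one component."""
    d = len(mu_i)
    weight = sum(gammas_i)
    cov = [[0.0] * d for _ in range(d)]
    for x, g in zip(xs, gammas_i):
        diff = [x[a] - mu_i[a] for a in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += g * diff[a] * diff[b]
    return [[c / weight for c in row] for row in cov]

# Hypothetical data: the four corners of a square centred at (1, 1), equal weights.
xs = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
gammas_i = [1.0, 1.0, 1.0, 1.0]
mu_i = [1.0, 1.0]
cov = update_covariance(xs, gammas_i, mu_i)  # the 2x2 identity matrix
```

The off-diagonal entries cancel by symmetry and each diagonal entry averages to 1, so the result is the identity matrix.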
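(3) Parameter $\alpha$
If the mixing coefficients $\alpha_i$ in $\mathrm{LL}(D)$ are treated as variables, they are subject to the constraints $\alpha_i>0,\ \sum_{i=1}^k\alpha_i=1$, so the method of Lagrange multipliers is needed. This yields the Lagrangian [Watermelon Book, Eq. (9.36)]; setting its derivative with respect to $\alpha_i$ to zero yields [Watermelon Book, Eq. (9.37)].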
Comparing [Watermelon Book, Eq. (9.37)] with Eq. (9.6) shows that the numerator of its fraction lacks the factor $\alpha_i$. Multiplying both sides by $\alpha_i$ completes it to Eq. (9.6), so that
\begin{align}
\sum_{j=1}^m\gamma_{ji}+\lambda\alpha_i = 0 \tag{9.18}
\end{align}
Summing over $i$ gives
\begin{align}
0 & = \sum_{i=1}^k\left(\sum_{j=1}^m\gamma_{ji}+\lambda\alpha_i\right)\notag \\
& = \sum_{i=1}^k\sum_{j=1}^m\gamma_{ji}+\lambda\sum_{i=1}^k\alpha_i\notag \\
& = \sum_{j=1}^m\sum_{i=1}^k\gamma_{ji}+\lambda\qquad \text{(since $\sum_{i=1}^k\alpha_i=1$)}\notag \\
& = \sum_{j=1}^m 1+\lambda\qquad \text{(since $\sum_{i=1}^k\gamma_{ji}=1$)}\notag \\
& = m+\lambda\notag \\
\therefore \qquad \quad \lambda & = -m \tag{9.19}
\end{align}
From Eqs. (9.18) and (9.19) we "assemble" the recurrence
\begin{align}
\alpha_i^{\,t+1} = \frac{1}{m}\sum_{j=1}^m\gamma_{ji}^{\,t} \tag{9.20}
\end{align}
which is [Watermelon Book, Eq. (9.38)].
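The mixing-coefficient update simply averages each column of the responsibility table. The sketch below uses made-up responsibilities for four samples and two components, and also checks the stationarity condition $\sum_j\gamma_{ji}+\lambda\alpha_i=0$ with $\lambda=-m$; the function and variable names are hypothetical.

```python
def update_mixing_coefficients(gammas):
    """alpha_i^{t+1} = (1 / m) * sum_j gamma_ji."""
    m = len(gammas)
    k = len(gammas[0])
    return [sum(g[i] for g in gammas) / m for i in range(k)]

# Hypothetical responsibilities; each row sums to 1.
gammas = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.25, 0.75]]
new_alphas = update_mixing_coefficients(gammas)  # [0.4375, 0.5625]

# Residual of sum_j gamma_ji + lambda * alpha_i with lambda = -m, per component.
lam = -len(gammas)
residuals = [sum(g[i] for g in gammas) + lam * new_alphas[i] for i in range(2)]
```

Because every row of `gammas` sums to 1, the updated coefficients automatically sum to 1 as well.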
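With these recurrences in hand, the EM algorithm can be applied:
- E-step: using the parameters at the current step ($t$), compute $\gamma_{ji}^{\,t}$ from Eq. (9.8).
- M-step: using $\gamma_{ji}^{\,t}$ and the recurrences (9.10), (9.17), and (9.20), update the parameters for the next step ($t+1$).
Written as pseudocode, this is the Gaussian mixture clustering algorithm shown in [Watermelon Book, Fig. 9.6].
Note: the pseudocode does not carry the step symbol $t$ explicitly, because the execution of the program tracks the step implicitly. Making it explicit would introduce additional variables, which take up storage and require addressing and assignment, and would be less convenient.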
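The E-step and M-step above can be put together as a short loop. The following is a minimal sketch for a scalar (1-D) mixture, assuming well-separated toy data and hand-picked initial values; `em_step` and all numbers are illustrative and are not the book's pseudocode. The variance update follows Eq. (9.17) as written and therefore uses the step-$t$ mean; many implementations use the freshly updated mean instead.

```python
import math

def gauss_pdf(x, mu, var):
    """Density of a 1-D Gaussian N(mu, var) evaluated at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_step(xs, alphas, mus, variances):
    """One EM iteration for a 1-D Gaussian mixture."""
    k, m = len(alphas), len(xs)
    # E-step, Eq. (9.8): responsibilities under the current parameters.
    gammas = []
    for x in xs:
        weighted = [alphas[i] * gauss_pdf(x, mus[i], variances[i]) for i in range(k)]
        total = sum(weighted)
        gammas.append([w / total for w in weighted])
    # M-step, Eqs. (9.10), (9.17), (9.20).
    new_alphas, new_mus, new_vars = [], [], []
    for i in range(k):
        weight = sum(g[i] for g in gammas)
        new_mus.append(sum(g[i] * x for g, x in zip(gammas, xs)) / weight)
        # Eq. (9.17) as written: deviations are taken from the step-t mean mus[i].
        new_vars.append(sum(g[i] * (x - mus[i]) ** 2 for g, x in zip(gammas, xs)) / weight)
        new_alphas.append(weight / m)
    return new_alphas, new_mus, new_vars

# Hypothetical data: five points near 0 and five points near 5.
xs = [-0.5, -0.2, 0.0, 0.1, 0.6, 4.4, 4.9, 5.0, 5.3, 5.4]
alphas, mus, variances = [0.5, 0.5], [1.0, 4.0], [1.0, 1.0]
for _ in range(30):  # no loop variable t is stored, as the note above observes
    alphas, mus, variances = em_step(xs, alphas, mus, variances)
```

On this data the two groups have sample means 0 and 5, so the estimated means should settle close to those values with mixing coefficients near 0.5 each.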
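This completes the derivation of the Gaussian mixture clustering algorithm. For readers who want more, we now derive it again using approach (II) of EM: treat each sample's component label as a latent variable and apply the EM algorithm.
First, we set up the required elements and prepare the relevant formulas.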
(1) Parameters: the parameters of a Gaussian mixture distribution with $k$ components are $\Theta=(\{\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i,\alpha_i\}_{i=1}^k)$. In the sequence (7.33) (see 7.10, Use Cases and Steps of the EM Algorithm),
\begin{align}
\Theta^0,\Theta^1,\Theta^2,\cdots,\Theta^{\,t},\Theta^{\,t+1},\cdots \tag{7.33}
\end{align}
we have $\Theta^{\,t}=(\{\boldsymbol{\mu}_i^{\,t},\boldsymbol{\Sigma}_i^{\,t},\alpha_i^{\,t}\}_{i=1}^k)$.
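(2) Latent variable: take the component (cluster) to which a sample $\boldsymbol{x}$ belongs as its latent variable $z$. From the definition of a mixture distribution we can write down the probabilities of the associated "events".
{a sample $\boldsymbol{x}$ generated by the mixture belongs to the $i$-th component} = {the $i$-th component is selected} $\bigcap$ {the sample $\boldsymbol{x}$ is generated from that component}, so the probability of this event is
\begin{align}
P(\boldsymbol{x},z=i\,|\,\Theta) = \alpha_i P(\boldsymbol{x}\,|\,\Theta_i) \tag{9.21}
\end{align}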
Note: here $\boldsymbol{x}$ depends only on $\Theta_i$, the parameters of component $i$ within $\Theta$.
{the mixture generates the sample $\boldsymbol{x}$} = $\bigcup_{i=1}^k$ {the sample $\boldsymbol{x}$ is generated in the $i$-th component}, so the probability of this event is
\begin{align}
P(\boldsymbol{x}\,|\,\Theta) = \sum_{i=1}^k\alpha_i P(\boldsymbol{x}\,|\,\Theta_i) \tag{9.22}
\end{align}
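The two-stage event structure behind Eqs. (9.21) and (9.22) corresponds to a simple generative procedure: first select a component with probability $\alpha_i$, then draw the sample from that component. The sketch below simulates it for a scalar Gaussian mixture; the parameter values, the seed, and the name `sample_mixture` are assumptions for illustration, and components are indexed from 0.

```python
import math
import random

def sample_mixture(n, alphas, mus, variances, seed=0):
    """Draw n pairs (x, z): z is the selected component, x is drawn from N(mu_z, var_z)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        z = rng.choices(range(len(alphas)), weights=alphas)[0]  # select a component
        x = rng.gauss(mus[z], math.sqrt(variances[z]))          # sample within it
        samples.append((x, z))
    return samples

# Hypothetical mixture parameters.
alphas, mus, variances = [0.3, 0.7], [0.0, 2.0], [1.0, 0.5]
samples = sample_mixture(5000, alphas, mus, variances)

frac_first = sum(1 for _, z in samples if z == 0) / len(samples)
second = [x for x, z in samples if z == 1]
mean_second = sum(second) / len(second)
```

In clustering the label `z` is not observed; only the `x` values are, which is why `z` plays the role of the latent variable.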
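Eqs. (9.21) and (9.22) hold for any mixture distribution. For a Gaussian mixture, $P(\boldsymbol{x}\,|\,\Theta_i)$ in them is defined by [Watermelon Book, Eq. (9.28)]. In that case $P$ is really $p$ (a probability density). Because the book mainly discusses discrete random variables, it usually writes an uppercase $P$ and does not strictly distinguish the two cases; read it according to context, as a probability for a discrete variable and as a probability density for a continuous one.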
From Eq. (9.21) we obtain a piecewise function of $z$:
\begin{align}
P(\boldsymbol{x},z\,|\,\Theta) = \alpha_i P(\boldsymbol{x}\,|\,\Theta_i),\quad \text{if } z=i,\quad i=1,2,\cdots,k \tag{9.23}
\end{align}
Applying the indicator function to this piecewise function gives
\begin{align}
P(\boldsymbol{x},z\,|\,\Theta) = \prod_{i=1}^k\left[\alpha_i P(\boldsymbol{x}\,|\,\Theta_i)\right]^{\mathbb{I}(z=i)}\qquad \text{(by Eq. (B8))} \tag{9.24}
\end{align}
This uses the following formula (see "6. Indicator Functions and Their Applications", a technique for writing a piecewise function as a single expression):
\begin{align}
f(\boldsymbol{x}) = \prod_{i=1}^n a_i(\boldsymbol{x})^{\mathbb{I}[A_i(\boldsymbol{x})]} \tag{B8}
\end{align}
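The indicator trick can be checked directly: raising a factor to the power 0 turns it into 1, so only the factor of the active case survives in the product. The sketch below compares the piecewise form of the joint probability with the product form for a scalar Gaussian mixture; the helper names and parameter values are hypothetical, and `z` is 0-based in the code.

```python
import math

def gauss_pdf(x, mu, var):
    """Density of a 1-D Gaussian N(mu, var) evaluated at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def joint_piecewise(x, z, alphas, mus, variances):
    """Piecewise form: alpha_i * P(x | Theta_i) for the case z = i."""
    return alphas[z] * gauss_pdf(x, mus[z], variances[z])

def joint_product(x, z, alphas, mus, variances):
    """Product form: prod_i [alpha_i * P(x | Theta_i)] ** I(z = i)."""
    result = 1.0
    for i in range(len(alphas)):
        indicator = 1 if z == i else 0
        result *= (alphas[i] * gauss_pdf(x, mus[i], variances[i])) ** indicator
    return result

# Hypothetical parameters for a three-component mixture.
alphas, mus, variances = [0.2, 0.3, 0.5], [-1.0, 0.0, 2.0], [1.0, 0.5, 2.0]
x = 0.7
pairs = [(joint_piecewise(x, z, alphas, mus, variances),
          joint_product(x, z, alphas, mus, variances)) for z in range(3)]
```

The product form is what makes the logarithm convenient later: the log of the product becomes a sum of indicator-weighted log terms.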
(3) Distribution of the latent variable $z$: from Eqs. (9.21) and (9.22) and Bayes' formula,
\begin{align}
P(z=i\,|\,\boldsymbol{x},\Theta) & = \frac{P(\boldsymbol{x},z=i\,|\,\Theta)}{P(\boldsymbol{x}\,|\,\Theta)}\notag \\
& = \frac{\alpha_i P(\boldsymbol{x}\,|\,\Theta_i)}{\sum_{l=1}^k\alpha_l P(\boldsymbol{x}\,|\,\Theta_l)} \tag{9.25}
\end{align}
Given $\boldsymbol{x}_j$, the posterior distribution of the corresponding latent variable $z_j$ at step $t$ is
\begin{align}
P(z_j=i\,|\,\boldsymbol{x}_j,\Theta^{\,t}) = \frac{\alpha_i^{\,t}P(\boldsymbol{x}_j\,|\,\Theta_i^{\,t})}{\sum_{l=1}^k\alpha_l^{\,t}P(\boldsymbol{x}_j\,|\,\Theta_l^{\,t})} = \gamma_{ji}^{\,t} \tag{9.26}
\end{align}
where $\gamma_{ji}^{\,t}$ is exactly Eq. (9.8); this is [Watermelon Book, Eq. (9.30)].
(4) Let the sample set be $D'=\{\boldsymbol{x}_j,z_j\}_{j=1}^m$, where $z_j$ is the component variable (latent variable) of sample $\boldsymbol{x}_j$, and write $\mathbf{X}=\{\boldsymbol{x}_j\}_{j=1}^m,\ \mathbf{Z}=\{z_j\}_{j=1}^m$.
Making $Q(\Theta\,|\,\Theta^{\,t})$ concrete:
\begin{align}
Q(\Theta\,|\,\Theta^{\,t}) & = \mathbb{E}_{\mathbf{Z}\,|\,\mathbf{X},\Theta^{\,t}}\mathrm{LL}(\Theta\,|\,\mathbf{X},\mathbf{Z})\qquad \text{(by [Watermelon Book, Eq. (7.36)])}\notag \\
& = \mathbb{E}_{\mathbf{Z}\,|\,\mathbf{X},\Theta^{\,t}}\ln P(\mathbf{X},\mathbf{Z}\,|\,\Theta)\notag \\
& = \mathbb{E}_{\mathbf{Z}\,|\,\mathbf{X},\Theta^{\,t}}\ln \prod_{j=1}^m P(\boldsymbol{x}_j,z_j\,|\,\Theta)\notag \\
& = \mathbb{E}_{\mathbf{Z}\,|\,\mathbf{X},\Theta^{\,t}}\sum_{j=1}^m \ln P(\boldsymbol{x}_j,z_j\,|\,\Theta)\notag \\
& = \mathbb{E}_{\mathbf{Z}\,|\,\mathbf{X},\Theta^{\,t}}\sum_{j=1}^m \ln \prod_{i=1}^k\left(\alpha_i P(\boldsymbol{x}_j\,|\,\Theta_i)\right)^{\mathbb{I}(z_j=i)}\qquad \text{(by Eq. (9.24))}\notag \\
& = \sum_{j=1}^m \mathbb{E}_{\mathbf{Z}\,|\,\mathbf{X},\Theta^{\,t}}\left(\sum_{i=1}^k\mathbb{I}(z_j=i)\ln\left(\alpha_i P(\boldsymbol{x}_j\,|\,\Theta_i)\right)\right)\notag \\
& = \sum_{j=1}^m\sum_{i=1}^k\left(\mathbb{E}_{\mathbf{Z}\,|\,\mathbf{X},\Theta^{\,t}}\mathbb{I}(z_j=i)\right)\ln\left(\alpha_i P(\boldsymbol{x}_j\,|\,\Theta_i)\right)\notag \\
& = \sum_{j=1}^m\sum_{i=1}^k\left(\mathbb{E}_{z_j\,|\,\boldsymbol{x}_j,\Theta^{\,t}}\mathbb{I}(z_j=i)\right)\ln\left(\alpha_i P(\boldsymbol{x}_j\,|\,\Theta_i)\right)\quad \text{(the expectation involves only $z_j$)}\notag \\
& = \sum_{j=1}^m\sum_{i=1}^k P(z_j=i\,|\,\boldsymbol{x}_j,\Theta^{\,t})\ln\left(\alpha_i P(\boldsymbol{x}_j\,|\,\Theta_i)\right)\qquad \text{(by Eq. (B11))}\notag \\
& = \sum_{j=1}^m\sum_{i=1}^k\gamma_{ji}^{\,t}\ln\left(\alpha_i P(\boldsymbol{x}_j\,|\,\Theta_i)\right)\qquad \text{(by Eq. (9.26))}\notag \\
& = \sum_{j=1}^m\sum_{i=1}^k\gamma_{ji}^{\,t}\ln\alpha_i+\sum_{j=1}^m\sum_{i=1}^k\gamma_{ji}^{\,t}\ln P(\boldsymbol{x}_j\,|\,\Theta_i)\notag
\end{align}
This uses the following formula (see "6. Indicator Functions and Their Applications", a technique for writing a piecewise function as a single expression):
\begin{align}
\mathop{\mathbb{E}}\limits_{x\in D}\mathbb{I}_A(x) = P(x\in A) \tag{B11}
\end{align}
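For a discrete variable, this identity follows by writing the expectation as a probability-weighted sum: every term whose indicator is 0 drops out, leaving only the probability of the event. A tiny sketch with a made-up posterior over three components (the name `expected_indicator` and the numbers are hypothetical; components are 0-based):

```python
def expected_indicator(posterior, i):
    """E[I(z = i)] = sum_z P(z) * I(z = i) for a discrete distribution over z."""
    return sum(p * (1 if z == i else 0) for z, p in enumerate(posterior))

# Hypothetical posterior P(z_j = . | x_j, Theta^t) for one sample.
posterior = [0.2, 0.5, 0.3]
expectations = [expected_indicator(posterior, i) for i in range(3)]
```

Each expectation equals the corresponding posterior probability, which is the step that replaces the expectation of $\mathbb{I}(z_j=i)$ by $P(z_j=i\,|\,\boldsymbol{x}_j,\Theta^{\,t})$.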
That is,
\begin{align}
Q(\Theta\,|\,\Theta^{\,t}) = \sum_{j=1}^m\sum_{i=1}^k\gamma_{ji}^{\,t}\ln\alpha_i+\sum_{j=1}^m\sum_{i=1}^k\gamma_{ji}^{\,t}\ln P(\boldsymbol{x}_j\,|\,\Theta_i) \tag{9.27}
\end{align}
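The sketch below evaluates $Q(\Theta\,|\,\Theta^{\,t})$ for a scalar Gaussian mixture. As a sanity check it uses a standard identity that holds when $\Theta=\Theta^{\,t}$: the log-likelihood equals $Q(\Theta^{\,t}\,|\,\Theta^{\,t})$ plus the entropy of the responsibilities, $-\sum_j\sum_i\gamma_{ji}^{\,t}\ln\gamma_{ji}^{\,t}$. The data, parameters, and function names are assumptions for illustration.

```python
import math

def gauss_pdf(x, mu, var):
    """Density of a 1-D Gaussian N(mu, var) evaluated at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def responsibilities(xs, alphas, mus, variances):
    """gammas[j][i] = alpha_i p(x_j | i) / sum_l alpha_l p(x_j | l)."""
    gammas = []
    for x in xs:
        weighted = [a * gauss_pdf(x, m, v) for a, m, v in zip(alphas, mus, variances)]
        total = sum(weighted)
        gammas.append([w / total for w in weighted])
    return gammas

def q_value(xs, gammas, alphas, mus, variances):
    """Q = sum_j sum_i gamma_ji * (ln alpha_i + ln P(x_j | Theta_i))."""
    return sum(
        g[i] * (math.log(alphas[i]) + math.log(gauss_pdf(x, mus[i], variances[i])))
        for x, g in zip(xs, gammas)
        for i in range(len(alphas))
    )

# Hypothetical data and current parameters Theta^t.
xs = [-1.2, -0.4, 0.3, 2.1, 2.8]
alphas, mus, variances = [0.4, 0.6], [0.0, 2.0], [1.0, 0.5]
gammas = responsibilities(xs, alphas, mus, variances)

q_at_current = q_value(xs, gammas, alphas, mus, variances)
entropy = -sum(g * math.log(g) for row in gammas for g in row)
log_lik = sum(
    math.log(sum(a * gauss_pdf(x, m, v) for a, m, v in zip(alphas, mus, variances)))
    for x in xs
)
```

Since the entropy is non-negative, `q_at_current` never exceeds `log_lik`.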
With these preparations, the EM algorithm can be applied:
E-step:
(1) Infer the distribution of the latent variables, $P(\mathbf{Z}\,|\,\mathbf{X},\Theta^{\,t})$. Here this is equivalent to $P(z_j\,|\,\boldsymbol{x}_j,\Theta^{\,t}),\ j=1,2,\cdots,m$, i.e., Eq. (9.26). The parameters in it are the current parameters and $\boldsymbol{x}_j$ is known, so $\gamma_{ji}^{\,t}$ can be computed from [Watermelon Book, Eq. (9.28)] and Eq. (9.26).
(2) Write out the expression for $Q$, namely Eq. (9.27).
M-step: find the maximizer of $Q$:
From Eq. (9.27) we have
\begin{align}
\frac{\partial Q(\Theta\,|\,\Theta^{\,t})}{\partial \boldsymbol{\mu}_i} & = 0+\sum_{j=1}^m\left(\gamma_{ji}^{\,t}\frac{\partial \ln P(\boldsymbol{x}_j\,|\,\Theta_i)}{\partial \boldsymbol{\mu}_i}+\sum_{l\neq i}\gamma_{jl}^{\,t}\frac{\partial \ln P(\boldsymbol{x}_j\,|\,\Theta_l)}{\partial \boldsymbol{\mu}_i}\right)\notag \\
& = \sum_{j=1}^m\left(\gamma_{ji}^{\,t}\frac{1}{P(\boldsymbol{x}_j\,|\,\Theta_i)}\frac{\partial P(\boldsymbol{x}_j\,|\,\Theta_i)}{\partial \boldsymbol{\mu}_i}+0\right)\notag \\
& = \sum_{j=1}^m\gamma_{ji}^{\,t}\frac{1}{P(\boldsymbol{x}_j\,|\,\Theta_i)}\,p(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)\boldsymbol{\Sigma}_i^{-1}(\boldsymbol{x}_j-\boldsymbol{\mu}_i)\quad \text{(differentiating the Gaussian density, cf. Eq. (9.4))}\notag \\
& = \boldsymbol{\Sigma}_i^{-1}\sum_{j=1}^m\gamma_{ji}^{\,t}(\boldsymbol{x}_j-\boldsymbol{\mu}_i) \tag{9.28}
\end{align}
Setting $\frac{\partial Q(\Theta\,|\,\Theta^{\,t})}{\partial \boldsymbol{\mu}_i}=\boldsymbol{0}$ yields the same result as Eq. (9.10):
\begin{align}
\boldsymbol{\mu}_i^{\,t+1} = \frac{\sum_{j=1}^m\gamma_{ji}^{\,t}\boldsymbol{x}_j}{\sum_{j=1}^m\gamma_{ji}^{\,t}} \tag{9.29}
\end{align}
which is [Watermelon Book, Eq. (9.34)].
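To see that the weighted mean is indeed a maximizer and not just a stationary point, the sketch below evaluates the part of $Q$ that depends on $\mu_i$ in the scalar case and compares its value at the updated mean with values at nearby points. The responsibilities, variance, and names are made-up illustrative choices.

```python
import math

def q_mu_part(xs, gammas_i, mu, var):
    """Terms of Q that depend on mu_i: sum_j gamma_ji * ln N(x_j | mu, var)."""
    return sum(
        g * (-0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mu) ** 2 / var)
        for x, g in zip(xs, gammas_i)
    )

# Hypothetical samples, responsibilities for component i, and its variance.
xs = [0.0, 1.0, 4.0]
gammas_i = [1.0, 0.5, 0.1]
var_i = 2.0

# Weighted-mean update: (0.5 + 0.4) / 1.6 = 0.5625.
mu_star = sum(g * x for g, x in zip(gammas_i, xs)) / sum(gammas_i)
# Scalar gradient of Q with respect to mu_i, evaluated at the update.
grad_at_star = sum(g * (x - mu_star) for g, x in zip(gammas_i, xs)) / var_i

q_star = q_mu_part(xs, gammas_i, mu_star, var_i)
q_left = q_mu_part(xs, gammas_i, mu_star - 0.1, var_i)
q_right = q_mu_part(xs, gammas_i, mu_star + 0.1, var_i)
```

Because this part of $Q$ is a concave quadratic in $\mu_i$, the stationary point is its unique maximum.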
Similarly, starting from Eq. (9.27), the other recurrences can be derived; this is left to the reader as an exercise.