(《机器学习》完整版系列)第9章 聚类——9.4 高斯混合模型EM算法详细推导

上篇博我们给出了高斯混合模型EM算法,这里我们对它的公式进行详细推导.

高斯混合模型EM算法推导

7.10 EM算法的使用场景及步骤中,我们给出了一般的EM算法步聚,在具体应用时,关键是要构造出该方法所需的要素,然后直接套用它即可。 这里符合指出的没有缺失属性(隐变量)的情况,有有(I)和(II)两种办法处理。

【西瓜书】就是按(I)处理:即在极大(对数)似然过程中“凑出”递推式,转化为EM算法。

(1)参数 μ \boldsymbol{\mu } μ

由【西瓜书式(9.28)】有
∂ p ( x ) ∂ μ = p ( x ) − 1 2 ∂ ( x − μ ) T Σ − 1 ( x − μ ) ∂ μ = − p ( x ) Σ − 1 ( x − μ ) (由【西瓜书式(A.32】)) \begin{align} \frac{\partial p(\boldsymbol{x })}{\partial \boldsymbol{\mu }} & =p(\boldsymbol{x }) \frac{-\frac{1}{2}\partial (\boldsymbol{x }-\boldsymbol{\mu })^\mathrm{T}{\boldsymbol{\Sigma } }^{-1}(\boldsymbol{x }-\boldsymbol{\mu })}{\partial \boldsymbol{\mu }}\notag \\ & =-p(\boldsymbol{x }){\boldsymbol{\Sigma } }^{-1}(\boldsymbol{x }-\boldsymbol{\mu })\qquad \text{(由【西瓜书式(A.32】))} \tag{9.3} \end{align} μp(x)=p(x)μ21(xμ)TΣ1(xμ)=p(x)Σ1(xμ)(由【西瓜书式(A.32】))(9.3)

将式(9.3)用于 p ( x j   ∣   μ i , Σ i ) p(\boldsymbol{x }_j\,|\,\boldsymbol{\mu }_i,{\boldsymbol{\Sigma } }_i) p(xjμi,Σi)

,即
∂ p ( x j   ∣   μ i , Σ i ) ∂ μ i = − p ( x j   ∣   μ i , Σ i ) Σ i − 1 ( x j − μ i ) \begin{align} \frac{\partial p(\boldsymbol{x }_j\,|\,\boldsymbol{\mu }_i,{\boldsymbol{\Sigma } }_i)}{\partial \boldsymbol{\mu }_i}=-p(\boldsymbol{x }_j\,|\,\boldsymbol{\mu }_i,{\boldsymbol{\Sigma } }_i){\boldsymbol{\Sigma } }_i^{-1}(\boldsymbol{x }_j-\boldsymbol{\mu }_i) \tag{9.4} \end{align} μip(xjμi,Σi)=p(xjμi,Σi)Σi1(xjμi)(9.4)

由【西瓜书式(9.32)】有
∂ L L ( D ) ∂ μ i = ∑ j = 1 m ∂ ln ⁡   ( ∑ i = 1 k α i p ( x j   ∣   μ i , Σ i ) ) ∂ μ i = ∑ j = 1 m α i ∑ i = 1 k α i p ( x j   ∣   μ i , Σ i ) ∂ p ( x j   ∣   μ i , Σ i ) ∂ μ i = − ∑ j = 1 m α i p ( x j   ∣   μ i , Σ i ) ∑ i = 1 k α i p ( x j   ∣   μ i , Σ i ) Σ i − 1 ( x j − μ i ) (由式(9.4)) = − Σ i − 1 ∑ j = 1 m γ j i ( x j − μ i ) \begin{align} \frac{\partial \mathrm{LL}(D)}{\partial \boldsymbol{\mu }_i} & =\sum_{j=1}^m\frac{\partial\ln\, (\sum_{i=1}^k{\alpha}_ip(\boldsymbol{x }_j\,|\,\boldsymbol{\mu }_i,{\boldsymbol{\Sigma } }_i))}{\partial \boldsymbol{\mu }_i}\notag \\ & =\sum_{j=1}^m\frac{{\alpha}_i }{\sum_{i=1}^k{\alpha}_ip(\boldsymbol{x }_j\,|\,\boldsymbol{\mu }_i,{\boldsymbol{\Sigma } }_i)}\frac{\partial p(\boldsymbol{x }_j\,|\,\boldsymbol{\mu }_i,{\boldsymbol{\Sigma } }_i)}{\partial \boldsymbol{\mu }_i}\notag \\ & =-\sum_{j=1}^m\frac{{\alpha}_i p(\boldsymbol{x }_j\,|\,\boldsymbol{\mu }_i,{\boldsymbol{\Sigma } }_i)}{\sum_{i=1}^k{\alpha}_ip(\boldsymbol{x }_j\,|\,\boldsymbol{\mu }_i,{\boldsymbol{\Sigma } }_i)}{\boldsymbol{\Sigma } }_i^{-1}(\boldsymbol{x }_j-\boldsymbol{\mu }_i)\quad \text{(由式(9.4))}\notag \\ & =-{\boldsymbol{\Sigma } }_i^{-1}\sum_{j=1}^m{\gamma}_{ji}(\boldsymbol{x }_j-\boldsymbol{\mu }_i) \tag{9.5} \end{align} μiLL(D)=j=1mμiln(i=1kαip(xjμi,Σi))=j=1mi=1kαip(xjμi,Σi)αiμip(xjμi,Σi)=j=1mi=1kαip(xjμi,Σi)αip(xjμi,Σi)Σi1(xjμi)(由式(9.4)=Σi1j=1mγji(xjμi)(9.5)
其中
γ j i = α i p ( x j   ∣   μ i , Σ i ) ∑ i = 1 k α i p ( x j   ∣   μ i , Σ i ) \begin{align} {\gamma}_{ji}=\frac{{\alpha}_i p(\boldsymbol{x }_j\,|\,\boldsymbol{\mu }_i,{\boldsymbol{\Sigma } }_i)}{\sum_{i=1}^k{\alpha}_ip(\boldsymbol{x }_j\,|\,\boldsymbol{\mu }_i,{\boldsymbol{\Sigma } }_i)} \tag{9.6} \end{align} γji=i=1kαip(xjμi,Σi)αip(xjμi,Σi)(9.6)
式中 Σ i {\boldsymbol{\Sigma } }_i Σi不是求和符号,而是协方差(矩阵)。

注:这里既有 Σ {\boldsymbol{\Sigma } } Σ又有 ∑ {\sum } ,请注意区别: Σ {\boldsymbol{\Sigma } } Σ不是求和符号,而是协方差矩阵。

∂ L L ( D ) ∂ μ i = 0 \frac{\partial \mathrm{LL}(D)}{\partial \boldsymbol{\mu }_i}=\boldsymbol{0} μiLL(D)=0,两边乘以 Σ {\boldsymbol{\Sigma } } Σ,则由式(9.5)得
∑ j = 1 m γ j i ( x j − μ i ) = 0 ∑ j = 1 m γ j i x j = μ i ∑ j = 1 m γ j i \begin{align} \sum_{j=1}^m{\gamma}_{ji}(\boldsymbol{x }_j-\boldsymbol{\mu }_i)=0\notag \\ \sum_{j=1}^m{\gamma}_{ji}\boldsymbol{x }_j=\boldsymbol{\mu }_i\sum_{j=1}^m{\gamma}_{ji} \tag{9.7} \end{align} j=1mγji(xjμi)=0j=1mγjixj=μij=1mγji(9.7)

以当前( t t t时)的参数值 ( α i   t , μ i   t , Σ i   t ) ({\alpha}_i^{\,t},\boldsymbol{\mu }_i^{\,t},{\boldsymbol{\Sigma } }_i^{\,t}) (αit,μit,Σit),根据【西瓜书式(9.28)】即可计算出此时的式(9.6)的值
γ j i   t = α i   t p ( x j   ∣   μ i   t , Σ i   t ) ∑ i = 1 k α i   t p ( x j   ∣   μ i   t , Σ i   t ) \begin{align} {\gamma}_{ji}^{\,t}=\frac{{\alpha}_i^{\,t} p(\boldsymbol{x }_j\,|\,\boldsymbol{\mu }_i^{\,t},{\boldsymbol{\Sigma } }_i^{\,t})}{\sum_{i=1}^k{\alpha}_i^{\,t}p(\boldsymbol{x }_j\,|\,\boldsymbol{\mu }_i^{\,t},{\boldsymbol{\Sigma } }_i^{\,t})} \tag{9.8} \end{align} γjit=i=1kαitp(xjμit,Σit)αitp(xjμit,Σit)(9.8)

由式(9.8)的已知,代入等式(9.7),得下一时刻需求解的 μ \boldsymbol{\mu } μ,这样就“凑”成了递推式(将等式变为递推式)
∑ j = 1 m γ j i   t x j = μ i   t + 1 ∑ j = 1 m γ j i   t \begin{align} \sum_{j=1}^m{\gamma}_{ji}^{\,t}\boldsymbol{x }_j=\boldsymbol{\mu }_i^{\,t+1}\sum_{j=1}^m{\gamma}_{ji}^{\,t} \tag{9.9} \end{align} j=1mγjitxj=μit+1j=1mγjit(9.9)

μ i   t + 1 = ∑ j = 1 m γ j i   t x j ∑ j = 1 m γ j i   t \begin{align} \boldsymbol{\mu }_i^{\,t+1}=\frac{\sum_{j=1}^m{\gamma}_{ji}^{\,t}\boldsymbol{x }_j}{\sum_{j=1}^m{\gamma}_{ji}^{\,t}} \tag{9.10} \end{align} μit+1=j=1mγjitj=1mγjitxj(9.10)
即【西瓜书式(9.34)】。

(2)参数 Σ {\boldsymbol{\Sigma } } Σ
∂ ∣ Σ ∣ − 1 2 ∂ Σ = − 1 2 ∣ Σ ∣ − 1 2 Σ − T (由式(A86)) = − 1 2 ∣ Σ ∣ − 1 2 Σ − 1 (由 Σ 的对称性) \begin{align} \frac{\partial |{\boldsymbol{\Sigma } }|^{-\frac{1}{2}}}{\partial {\boldsymbol{\Sigma } }} & ={-\frac{1}{2}}|{\boldsymbol{\Sigma } }|^{-\frac{1}{2}}{\boldsymbol{\Sigma } }^{-\mathrm{T}}\quad \text{(由式(A86))}\notag \\ & ={-\frac{1}{2}}|{\boldsymbol{\Sigma } }|^{-\frac{1}{2}}{\boldsymbol{\Sigma } }^{-1}\quad \text{(由${\boldsymbol{\Sigma } }$的对称性)} \tag{9.11} \end{align} ΣΣ21=21Σ21ΣT(由式(A86)=21Σ21Σ1(由Σ的对称性)(9.11)
其中用到公式(参见5、含矩阵的偏导数):
∂ ∣ A ∣ − 1 2 ∂ A = − 1 2 ∣ A ∣ − 1 2 A − T \begin{align} \frac{\partial |\mathbf{A}|^{-\frac{1}{2}}}{\partial \mathbf{A}} ={-\frac{1}{2}}|\mathbf{A}|^{-\frac{1}{2}}\mathbf{A}^{-\mathrm{T}} \tag{A86} \end{align} AA21=21A21AT(A86)
∂ ( x − μ ) T Σ − 1 ( x − μ ) ∂ Σ = ∂ t r ( ( x − μ ) T Σ − 1 ( x − μ ) ) ∂ Σ = ∂ t r ( ( x − μ ) ( x − μ ) T Σ − 1 ) ∂ Σ = − Σ − T ( ( x − μ ) ( x − μ ) T ) T Σ − T (由式(A80)) = − Σ − 1 ( x − μ ) ( x − μ ) T Σ − 1 (由 Σ 的对称性) \begin{align} & \quad \frac{\partial (\boldsymbol{x }-\boldsymbol{\mu})^{\mathrm{T}}{\boldsymbol{\Sigma } }^{-1}(\boldsymbol{x }-\boldsymbol{\mu})}{\partial {\boldsymbol{\Sigma } }}\notag \\ & =\frac{\partial \mathrm{tr}((\boldsymbol{x }-\boldsymbol{\mu})^{\mathrm{T}}{\boldsymbol{\Sigma } }^{-1}(\boldsymbol{x }-\boldsymbol{\mu}))}{\partial {\boldsymbol{\Sigma } }}\notag \\ & =\frac{\partial \mathrm{tr}((\boldsymbol{x }-\boldsymbol{\mu})(\boldsymbol{x }-\boldsymbol{\mu})^{\mathrm{T}}{\boldsymbol{\Sigma } }^{-1})}{\partial {\boldsymbol{\Sigma } }}\notag \\ & =-{\boldsymbol{\Sigma } }^{-\mathrm{T}}((\boldsymbol{x }-\boldsymbol{\mu})(\boldsymbol{x }-\boldsymbol{\mu})^{\mathrm{T}})^{\mathrm{T}}{\boldsymbol{\Sigma } }^{-\mathrm{T}}\quad \text{(由式(A80))}\notag \\ & =-{\boldsymbol{\Sigma } }^{-1}(\boldsymbol{x }-\boldsymbol{\mu})(\boldsymbol{x }-\boldsymbol{\mu})^{\mathrm{T}}{\boldsymbol{\Sigma } }^{-1}\quad \text{(由${\boldsymbol{\Sigma } }$的对称性)} \tag{9.12} \end{align} Σ(xμ)TΣ1(xμ)=Σtr((xμ)TΣ1(xμ))=Σtr((xμ)(xμ)TΣ1)=ΣT((xμ)(xμ)T)TΣT(由式(A80)=Σ1(xμ)(xμ)TΣ1(由Σ的对称性)(9.12)
其中用到公式(参见5、含矩阵的偏导数):
∂ t r ( B A − 1 ) ∂ A = − ( A − 1 B A − 1 ) T \begin{align} \frac{\partial \mathrm{tr}(\mathbf{B}\mathbf{A}^{-1})}{\partial \mathbf{A}}=-(\mathbf{A}^{-1}\mathbf{B}\mathbf{A}^{-1})^{\mathrm{T}} \tag{A80} \end{align} Atr(BA1)=(A1BA1)T(A80)
利用式(9.11)、式(9.12),求【西瓜书式(9.28)】关于矩阵 Σ {\boldsymbol{\Sigma } } Σ的偏导数,有
∂ P ( x ) ∂ Σ = ∂ ( 2 π ) − n 2 ∣ Σ ∣ − 1 2 exp ⁡ ( − 1 2 ( x − μ ) T Σ − 1 ( x − μ ) ) ∂ Σ = ∂ ( 2 π ) − n 2 ∣ Σ ∣ − 1 2 ∂ Σ exp ⁡ ( − 1 2 ( x − μ ) T Σ − 1 ( x − μ ) )   + ( 2 π ) − n 2 ∣ Σ ∣ − 1 2 ∂ exp ⁡ ( − 1 2 ( x − μ ) T Σ − 1 ( x − μ ) ) ∂ Σ = ( 2 π ) − n 2 ( − 1 2 ∣ Σ ∣ − 1 2 Σ − 1 ) exp ⁡ ( − 1 2 ( x − μ ) T Σ − 1 ( x − μ ) ) + ( 2 π ) − n 2 ∣ Σ ∣ − 1 2 exp ⁡ ( − 1 2 ( x − μ ) T Σ − 1 ( x − μ ) ) ∂ ( − 1 2 ( x − μ ) T Σ − 1 ( x − μ ) ) ∂ Σ = − 1 2 P ( x ) ( Σ − 1 ) + P ( x ) ( − 1 2 ( − Σ − 1 ( x − μ ) ( x − μ ) T Σ − 1 ) ) = 1 2 P ( x ) Σ − 1 ( − Σ + ( x − μ ) ( x − μ ) T ) Σ − 1 \begin{align} & \quad \frac{\partial P(\boldsymbol{x })}{\partial {\boldsymbol{\Sigma } }}\notag \\ & =\frac{\partial {(2\mathrm{\pi} )^{-\frac{n}{2}}|{\boldsymbol{\Sigma } }|^{-\frac{1}{2}}}\exp\left(-\frac{1}{2}(\boldsymbol{x }-\boldsymbol{\mu})^{\mathrm{T}}{\boldsymbol{\Sigma } }^{-1}(\boldsymbol{x }-\boldsymbol{\mu})\right)}{\partial {\boldsymbol{\Sigma } }}\notag \\ & =\frac{\partial {(2\mathrm{\pi} )^{-\frac{n}{2}}}|{\boldsymbol{\Sigma } }|^{-\frac{1}{2}}}{\partial {\boldsymbol{\Sigma } }}\exp\left(-\frac{1}{2}(\boldsymbol{x }-\boldsymbol{\mu})^{\mathrm{T}}{\boldsymbol{\Sigma } }^{-1}(\boldsymbol{x }-\boldsymbol{\mu})\right)\notag \\ & \ +{(2\mathrm{\pi} )^{-\frac{n}{2}}|{\boldsymbol{\Sigma } }|^{-\frac{1}{2}}}\frac{\partial\exp\left(-\frac{1}{2}(\boldsymbol{x }-\boldsymbol{\mu})^{\mathrm{T}}{\boldsymbol{\Sigma } }^{-1}(\boldsymbol{x }-\boldsymbol{\mu})\right)}{\partial {\boldsymbol{\Sigma } }}\notag \\ & = {(2\mathrm{\pi} )^{-\frac{n}{2}}}({-\frac{1}{2}}|{\boldsymbol{\Sigma } }|^{-\frac{1}{2}}{\boldsymbol{\Sigma } }^{-1})\exp\left(-\frac{1}{2}(\boldsymbol{x }-\boldsymbol{\mu})^{\mathrm{T}}{\boldsymbol{\Sigma } }^{-1}(\boldsymbol{x }-\boldsymbol{\mu})\right)\notag \\ & \quad +{(2\mathrm{\pi} )^{-\frac{n}{2}}|{\boldsymbol{\Sigma } }|^{-\frac{1}{2}}}\exp\left(-\frac{1}{2}(\boldsymbol{x }-\boldsymbol{\mu})^{\mathrm{T}}{\boldsymbol{\Sigma } }^{-1}(\boldsymbol{x }-\boldsymbol{\mu})\right)\frac{\partial(-\frac{1}{2}(\boldsymbol{x }-\boldsymbol{\mu})^{\mathrm{T}}{\boldsymbol{\Sigma } }^{-1}(\boldsymbol{x }-\boldsymbol{\mu}))}{\partial {\boldsymbol{\Sigma } }}\notag \\ & ={-\frac{1}{2}}P(\boldsymbol{x })({\boldsymbol{\Sigma } }^{-1})+P(\boldsymbol{x })\left(-\frac{1}{2}(-{\boldsymbol{\Sigma } }^{-1}(\boldsymbol{x }-\boldsymbol{\mu})(\boldsymbol{x }-\boldsymbol{\mu})^{\mathrm{T}}{\boldsymbol{\Sigma } }^{-1})\right)\notag \\ & ={\frac{1}{2}}P(\boldsymbol{x }){\boldsymbol{\Sigma } }^{-1}\left(-{\boldsymbol{\Sigma } }+(\boldsymbol{x }-\boldsymbol{\mu})(\boldsymbol{x }-\boldsymbol{\mu})^{\mathrm{T}}\right){\boldsymbol{\Sigma } }^{-1} \tag{9.13} \end{align} ΣP(x)=Σ(2π)2nΣ21exp(21(xμ)TΣ1(xμ))=Σ(2π)2nΣ21exp(21(xμ)TΣ1(xμ)) +(2π)2nΣ21Σexp(21(xμ)TΣ1(xμ))=(2π)2n(21Σ21Σ1)exp(21(xμ)TΣ1(xμ))+(2π)2nΣ21exp(21(xμ)TΣ1(xμ))Σ(21(xμ)TΣ1(xμ))=21P(x)(Σ1)+P(x)(21(Σ1(xμ)(xμ)TΣ1))=21P(x)Σ1(Σ+(xμ)(xμ)T)Σ1(9.13)
将式(9.13)应用于 P ( x j   ∣   μ i , Σ i ) P(\boldsymbol{x }_j\,|\,\boldsymbol{\mu}_i,{\boldsymbol{\Sigma } }_i) P(xjμi,Σi),有
∂ P ( x j   ∣   μ i , Σ i ) ∂ Σ i = 1 2 P ( x j   ∣   μ i , Σ i ) Σ i − 1 [ ( x j − μ i ) ( x j − μ i ) T − Σ i ] Σ i − 1 \begin{align} \frac{\partial P(\boldsymbol{x }_j\,|\,\boldsymbol{\mu}_i, {\boldsymbol{\Sigma } }_i)}{\partial {\boldsymbol{\Sigma } }_i} & =\frac{1}{2}P(\boldsymbol{x }_j\,|\,\boldsymbol{\mu}_i, {\boldsymbol{\Sigma } }_i){\boldsymbol{\Sigma } }_i^{-1}[(\boldsymbol{x }_j-\boldsymbol{\mu}_i)(\boldsymbol{x }_j-\boldsymbol{\mu}_i)^{\mathrm{T}}-{\boldsymbol{\Sigma } }_i] {\boldsymbol{\Sigma } }_i^{-1}\tag{9.14} \end{align} ΣiP(xjμi,Σi)=21P(xjμi,Σi)Σi1[(xjμi)(xjμi)TΣi]Σi1(9.14)

利用式(9.14),再求【西瓜书式(9.32)】关于矩阵 Σ i {\boldsymbol{\Sigma } }_i Σi的偏导数,有
∂ L L ( D ) ∂ Σ i = ∑ j = 1 m ∂ ln ⁡   ( ∑ i = 1 k α i p ( x j   ∣   μ i , Σ i ) ) ∂ Σ i = ∑ j = 1 m α i ( ∑ i = 1 k α i p ( x j   ∣   μ i , Σ i ) ) ∂ P ( x j   ∣   μ i , Σ i ) ∂ Σ i (下式由式(9.14)) = 1 2 ∑ j = 1 m α i P ( x j   ∣   μ i , Σ i ) ( ∑ i = 1 k α i p ( x j   ∣   μ i , Σ i ) ) Σ i − 1 [ ( x j − μ i ) ( x j − μ i ) T − Σ i ] Σ i − 1 = 1 2 ∑ j = 1 m γ j i [ Σ i − 1 ( x j − μ i ) ( x j − μ i ) T Σ i − 1 − Σ i − 1 ] (由式(9.6)) = 1 2 Σ i − 1 ( ∑ j = 1 m γ j i [ ( x j − μ i ) ( x j − μ i ) T − Σ i ] ) Σ i − 1 \begin{align} \frac{\partial \mathrm{LL}(D)}{\partial {\boldsymbol{\Sigma } }_i} & =\sum_{j=1}^m\frac{\partial \ln\, ({\sum_{i=1}^k{\alpha}_ip(\boldsymbol{x }_j\,|\,\boldsymbol{\mu }_i,{\boldsymbol{\Sigma } }_i)})}{\partial {\boldsymbol{\Sigma } }_i}\notag \\ & =\sum_{j=1}^m\frac{{\alpha}_i }{({\sum_{i=1}^k{\alpha}_ip(\boldsymbol{x }_j\,|\,\boldsymbol{\mu }_i,{\boldsymbol{\Sigma } }_i)})}\frac{\partial P(\boldsymbol{x }_j\,|\,\boldsymbol{\mu}_i, {\boldsymbol{\Sigma } }_i)}{\partial {\boldsymbol{\Sigma } }_i}\qquad \text{(下式由式(9.14))}\notag \\ & =\frac{1}{2}\sum_{j=1}^m\frac{{\alpha}_i P(\boldsymbol{x }_j\,|\,\boldsymbol{\mu}_i, {\boldsymbol{\Sigma } }_i)}{({\sum_{i=1}^k{\alpha}_ip(\boldsymbol{x }_j\,|\,\boldsymbol{\mu }_i,{\boldsymbol{\Sigma } }_i)})}{\boldsymbol{\Sigma } }_i^{-1}[(\boldsymbol{x }_j-\boldsymbol{\mu}_i)(\boldsymbol{x }_j-\boldsymbol{\mu}_i)^{\mathrm{T}}-{\boldsymbol{\Sigma } }_i] {\boldsymbol{\Sigma } }_i^{-1}\notag \\ & =\frac{1}{2}\sum_{j=1}^m{\gamma}_{ji} [{\boldsymbol{\Sigma } }_i^{-1}(\boldsymbol{x }_j-\boldsymbol{\mu}_i)(\boldsymbol{x }_j-\boldsymbol{\mu}_i)^{\mathrm{T}}{\boldsymbol{\Sigma } }_i^{-1}-{\boldsymbol{\Sigma } }_i^{-1}]\qquad \text{(由式(9.6))}\notag \\ & =\frac{1}{2}{\boldsymbol{\Sigma } }_i^{-1}\left(\sum_{j=1}^m{\gamma}_{ji} [(\boldsymbol{x }_j-\boldsymbol{\mu}_i)(\boldsymbol{x }_j-\boldsymbol{\mu}_i)^{\mathrm{T}}-{\boldsymbol{\Sigma } }_i]\right){\boldsymbol{\Sigma } }_i^{-1} \tag{9.15} \end{align} ΣiLL(D)=j=1mΣiln(i=1kαip(xjμi,Σi))=j=1m(i=1kαip(xjμi,Σi))αiΣiP(xjμi,Σi)(下式由式(9.14)=21j=1m(i=1kαip(xjμi,Σi))αiP(xjμi,Σi)Σi1[(xjμi)(xjμi)TΣi]Σi1=21j=1mγji[Σi1(xjμi)(xjμi)TΣi1Σi1](由式(9.6)=21Σi1(j=1mγji[(xjμi)(xjμi)TΣi])Σi1(9.15)

∂ L L ( D ) ∂ Σ i = 0 \frac{\partial \mathrm{LL}(D)}{\partial {\boldsymbol{\Sigma } }_i}=\mathbf{0} ΣiLL(D)=0,得
∑ j = 1 m γ j i ( x j − μ i ) ( x j − μ i ) T = Σ i ∑ j = 1 m γ j i \begin{align} \sum_{j=1}^m{\gamma}_{ji}(\boldsymbol{x }_j-\boldsymbol{\mu}_i)(\boldsymbol{x }_j-\boldsymbol{\mu}_i)^{\mathrm{T}}={\boldsymbol{\Sigma } }_i\sum_{j=1}^m{\gamma}_{ji} \tag{9.16} \end{align} j=1mγji(xjμi)(xjμi)T=Σij=1mγji(9.16)
同式(9.9)的方法,由等式“凑”出递推式
Σ i   t + 1 = ∑ j = 1 m γ j i   t ( x j − μ i   t ) ( x j − μ i   t ) T ∑ j = 1 m γ j i   t \begin{align} {\boldsymbol{\Sigma } }_i^{\,t+1}=\frac{\sum_{j=1}^m{\gamma}_{ji}^{\,t}(\boldsymbol{x }_j-\boldsymbol{\mu}_i^{\,t})(\boldsymbol{x }_j-\boldsymbol{\mu}_i^{\,t})^{\mathrm{T}}}{\sum_{j=1}^m{\gamma}_{ji}^{\,t}} \tag{9.17} \end{align} Σit+1=j=1mγjitj=1mγjit(xjμit)(xjμit)T(9.17)
即【西瓜书式(9.35)】,该式为矩阵。

(3)参数 α {\alpha} α

L L ( D ) \mathrm{LL}(D) LL(D)中若将混合系数 α i {\alpha}_i αi视为变量,则它有约束: α i > 0 ,   ∑ i = 1 k α i = 1 {\alpha}_i> 0,\, \sum_{i=1}^k{\alpha}_i=1 αi>0,i=1kαi=1,故需要拉格朗日乘数法,由此得到拉格朗日函数【西瓜书式(9.36)】,令其对 α {\alpha} α的导数为0,则得到【西瓜书式(9.37)】。

观察【西瓜书式(9.37)】发现:与式(9.6)比较,它的分数项分子缺 α i {\alpha}_i αi,两边乘以 α i {\alpha}_i αi即可配成式(9.6),则有
∑ j = 1 m γ j i + λ α i = 0 \begin{align} \sum_{j=1}^m{\gamma}_{ji} +\lambda {\alpha }_i=0 \tag{9.18} \end{align} j=1mγji+λαi=0(9.18)
i i i求和,有
0 = ∑ i = 1 k ( ∑ j = 1 m γ j i + λ α i ) = ∑ i = 1 k ∑ j = 1 m γ j i + λ ∑ i = 1 k α i = ∑ j = 1 m ∑ i = 1 k γ j i + λ (由于 ∑ i = 1 k α i = 1 ) = ∑ j = 1 m 1 + λ (由于 ∑ i = 1 k γ j i = 1 ) = m + λ ∴ λ = − m \begin{align} 0 & =\sum_{i=1}^k(\sum_{j=1}^m{\gamma}_{ji} +\lambda {\alpha }_i)\notag \\ & =\sum_{i=1}^k\sum_{j=1}^m{\gamma}_{ji} +\lambda \sum_{i=1}^k{\alpha }_i\notag \\ & =\sum_{j=1}^m\sum_{i=1}^k{\gamma}_{ji}+\lambda \qquad \text{(由于$\sum_{i=1}^k{\alpha }_i=1$)}\notag \\ & =\sum_{j=1}^m1+\lambda \qquad \text{(由于$\sum_{i=1}^k{\gamma}_{ji}=1$)}\notag \\ & =m+\lambda\notag \\ \therefore \qquad \quad \lambda & =-m \tag{9.19} \end{align} 0λ=i=1k(j=1mγji+λαi)=i=1kj=1mγji+λi=1kαi=j=1mi=1kγji+λ(由于i=1kαi=1=j=1m1+λ(由于i=1kγji=1=m+λ=m(9.19)

由式(9.18)、式(9.19)“凑”出递推式
α i   t + 1 = 1 m ∑ j = 1 m γ j i   t \begin{align} {\alpha }_i^{\,t+1}=\frac{1}{m}\sum_{j=1}^m{\gamma}_{ji}^{\,t} \tag{9.20} \end{align} αit+1=m1j=1mγjit(9.20)
即【西瓜书式(9.38)】。

有了上述递推式,则可套用EM算法:

  • E步:根据当前时刻( t t t时)的参数,由式(9.8)计算当前时刻的 γ j i   t {\gamma}_{ji}^{\,t} γjit
  • M步:由 γ j i   t {\gamma}_{ji}^{\,t} γjit及递推式(9.10)、式(9.17)、式(9.20)更新下一时刻( t + 1 t+1 t+1)的参数。

整理成伪代码即为【西瓜书图9.6】所示的高斯混合聚类算法。

注:代码中并没有体现出关于时刻的符号 t t t,这是由于程序运行的过程隐含地体现了时刻,若显示地体现,则会引入较多的变量,占用较多的存储空间,需要寻址、赋值等,反而不方便。

至此,我们完成了高斯混合聚类算法的推导,看官如果还不过瘾的话,我们再来推导一次——按EM的(II)的方法:将样本的成分标识视为隐变量,使用EM算法。

首先,我们设定要素及准备有关公式。

(1)参数: k k k个成分的高斯混合分布的参数为 Θ = ( { μ i , Σ i , α i } i = 1 k ) {\Theta}=(\{{\mu }_i,{\boldsymbol{\Sigma } }_i,{\alpha }_i\}_{i=1}^k) Θ=({μi,Σi,αi}i=1k),在序列(7.33)中(7.10 EM算法的使用场景及步骤),
Θ 0 , Θ 1 , Θ 2 , ⋯   , Θ   t , Θ   t + 1 , ⋯ \begin{align} {\Theta}^0,{\Theta}^1,{\Theta}^2,\cdots,{\Theta}^{\,t},{\Theta}^{\,t+1},\cdots \tag{7.33} \end{align} Θ0,Θ1,Θ2,,Θt,Θt+1,(7.33)
Θ   t = ( { μ i t , Σ i t , α i t } i = 1 k ) {\Theta}^{\,t}=(\{{\mu }_i^t,{\boldsymbol{\Sigma } }_i^t,{\alpha }_i^t\}_{i=1}^k) Θt=({μit,Σit,αit}i=1k)

(2)隐变量:将样本 x \boldsymbol{x } x所属的成分(簇)作为其隐变量 z z z,根据混合成分分布的定义我们给出关联的“事件”概率。

{混合成分中产生的样本 x \boldsymbol{x} x属于第 i i i个成分}={选取第 i i i个成分} ⋂ \bigcap {在该成分中产生样本 x \boldsymbol{x} x}
,则该事件发生的概率表达式有
P ( x , z = i   ∣   Θ ) = α i P ( x   ∣   Θ i ) \begin{align} P(\boldsymbol{x },z=i\,|\,\Theta) & ={\alpha}_i P(\boldsymbol{x }\,|\,{\Theta}_i) \tag{9.21} \end{align} P(x,z=iΘ)=αiP(xΘi)(9.21)
注:这时, x \boldsymbol{x } x只与 Θ \Theta Θ中的成分 i i i的参数 Θ i {\Theta}_i Θi有关。

{混合成分中产生样本 x \boldsymbol{x} x}= ⋃ i = 1 k {\bigcup}_{i=1}^k i=1k{在第 i i i个成分中产生样本 x \boldsymbol{x} x},则该事件发生的概率表达式有
P ( x   ∣   Θ ) = ∑ i = 1 k α i P ( x   ∣   Θ i ) \begin{align} P(\boldsymbol{x }\,|\,\Theta) & =\sum_{i=1}^k{\alpha}_i P(\boldsymbol{x }\,|\,{\Theta}_i) \tag{9.22} \end{align} P(xΘ)=i=1kαiP(xΘi)(9.22)

式(9.21)、式(9.22)对于混合成分分布都成立,而对于高斯混合,式中的 P ( x   ∣   Θ i ) P(\boldsymbol{x }\,|\,{\Theta}_i) P(xΘi)由【西瓜书式(9.28)】所定义。这时, P P P实际上是 p p p(概率分布密度),由于本书主要讨论离散型随机变量,常用大写的 P P P(即并不去严格区分它的大小写),根据上下文理解:针对离散型则为概率,针对连续型则为概率分布密度。

由式(9.21)得到关于 z z z的分段函数
P ( x , z   ∣   Θ ) = α i P ( x   ∣   Θ i ) ,   ( 若  z = i ) , i = 1 , 2 , ⋯   , k \begin{align} P(\boldsymbol{x },z\,|\,\Theta) & ={\alpha}_i P(\boldsymbol{x }\,|\,{\Theta}_i),\ (\text{若}\ z=i),i=1,2,\cdots,k \tag{9.23} \end{align} P(x,zΘ)=αiP(xΘi), ( z=i),i=1,2,,k(9.23)

将指示函数应用于该分段函数,有
P ( x , z   ∣   Θ ) = ∏ i = 1 k [ α i P ( x   ∣   Θ i ) ] I ( z = i ) (由式(B8)) \begin{align} P(\boldsymbol{x },z\,|\,\Theta) & =\prod_{i=1}^k[{\alpha}_i P(\boldsymbol{x }\,|\,{\Theta}_i)]^{\mathbb{I}(z=i)}\qquad \text{(由式(B8))} \tag{9.24} \end{align} P(x,zΘ)=i=1k[αiP(xΘi)]I(z=i)(由式(B8)(9.24)
其中用到公式(参见6、指示函数及应用(将分段函数表达成一个式子的技术)):
f ( x ) = ∏ i = 1 n a i ( x ) I [ A i ( x ) ] \begin{align} f(\boldsymbol{x})=\mathop{\prod}\limits_{i=1}^na_i(\boldsymbol{x})^{\mathbb{I}[A_i(\boldsymbol{x})]} \tag{B8} \end{align} f(x)=i=1nai(x)I[Ai(x)](B8)
(3)隐变量 z z z的分布:由式(9.21)、式(9.22)及贝叶斯公式,有
P ( z = i   ∣   x , Θ ) = P ( x , z = i   ∣   Θ ) P ( x   ∣   Θ ) = α i P ( x   ∣   Θ i ) ∑ i = 1 k α i P ( x   ∣   Θ i ) \begin{align} P(z=i\,|\,\boldsymbol{x },\Theta) & =\frac{ P(\boldsymbol{x },z=i\,|\,\Theta)}{P(\boldsymbol{x }\,|\,\Theta)}\notag \\ & =\frac{ {\alpha}_i P(\boldsymbol{x }\,|\,{\Theta}_i)}{\sum_{i=1}^k{\alpha}_i P(\boldsymbol{x }\,|\,{\Theta}_i)} \tag{9.25} \end{align} P(z=ix,Θ)=P(xΘ)P(x,z=iΘ)=i=1kαiP(xΘi)αiP(xΘi)(9.25)
当已知 x j \boldsymbol{x }_j xj时,对应的隐变量 z z z在时刻 t t t的后验分布为
P ( z j = i   ∣   x j , Θ   t ) = α i   t P ( x j   ∣   Θ i   t ) ∑ i = 1 k α i   t P ( x j   ∣   Θ i   t ) = γ j i   t \begin{align} P(z_j=i\,|\,\boldsymbol{x }_j,{\Theta}^{\,t}) =\frac{ {\alpha}_i^{\,t} P(\boldsymbol{x }_j\,|\,{\Theta}_i^{\,t})}{\sum_{i=1}^k{\alpha}_i^{\,t} P(\boldsymbol{x }_j\,|\,{\Theta}_i^{\,t})}={\gamma}_{ji}^{\,t} \tag{9.26} \end{align} P(zj=ixj,Θt)=i=1kαitP(xjΘit)αitP(xjΘit)=γjit(9.26)
其中, γ j i   t {\gamma}_{ji}^{\,t} γjit即为式(9.8),这即为【西瓜书式(9.30)】。

(4)设样本集为: D ′ = { x j , z j } j = 1 m D'=\{\boldsymbol{x }_j,z_j\}_{j=1}^m D={xj,zj}j=1m,其中, z j z_j zj为样本所属的成分变量(隐变量),记 X = { x j } j = 1 m ,   Z = { z j } j = 1 m \mathbf{X}=\{\boldsymbol{x }_j\}_{j=1}^m,\ \mathbf{Z}=\{z_j\}_{j=1}^m X={xj}j=1m, Z={zj}j=1m
具体化 Q ( Θ   ∣   Θ   t ) Q(\Theta\,|\,{\Theta}^{\,t}) Q(ΘΘt)
Q ( Θ   ∣   Θ   t ) = E Z   ∣   X , Θ   t L L ( Θ   ∣   X , Z ) (由【西瓜书式(7.36)】) = E Z   ∣   X , Θ   t ln ⁡   P ( X , Z   ∣   Θ ) = E Z   ∣   X , Θ   t ln ⁡   ∏ j = 1 m P ( x j , z j   ∣   Θ ) = E Z   ∣   X , Θ   t ∑ j = 1 m ln ⁡   P ( x j , z j   ∣   Θ ) = E Z   ∣   X , Θ   t ∑ j = 1 m ln ⁡   ∏ i = 1 k ( α i P ( x j   ∣   Θ i ) ) I ( z j = i ) (由式(9.24)) = ∑ j = 1 m E Z   ∣   X , Θ   t ( ∑ i = 1 k I ( z j = i ) ln ⁡ ( α i P ( x j   ∣   Θ i ) ) ) = ∑ j = 1 m ∑ i = 1 k ( E Z   ∣   X , Θ   t I ( z j = i ) ) ln ⁡ ( α i P ( x j   ∣   Θ i ) ) = ∑ j = 1 m ∑ i = 1 k ( E z j   ∣   x j , Θ   t I ( z j = i ) ) ln ⁡ ( α i P ( x j   ∣   Θ i ) )   ( E 作用域只含 z j ) = ∑ j = 1 m ∑ i = 1 k P ( z j = i   ∣   x j , Θ   t ) ln ⁡ ( α i P ( x j   ∣   Θ i ) ) (由式(B11)) = ∑ j = 1 m ∑ i = 1 k γ j i   t ln ⁡ ( α i P ( x j   ∣   Θ i ) ) (由式(9.26)) = ∑ j = 1 m ∑ i = 1 k γ j i   t ln ⁡ α i + ∑ j = 1 m ∑ i = 1 k γ j i   t ln ⁡   P ( x j   ∣   Θ i ) \begin{align} Q(\Theta\,|\,{\Theta}^{\,t}) & = {\mathbb{E} }_{\mathbf{Z}\,|\,\mathbf{X},{\Theta}^{\,t}}\mathrm{LL}(\Theta\,|\,\mathbf{X},\mathbf{Z})\qquad \text{(由【西瓜书式(7.36)】)}\notag \\ & = {\mathbb{E} }_{\mathbf{Z}\,|\,\mathbf{X},{\Theta}^{\,t}}\ln\, P(\mathbf{X},\mathbf{Z}\,|\,\Theta)\notag \\ & = {\mathbb{E} }_{\mathbf{Z}\,|\,\mathbf{X},{\Theta}^{\,t}}\ln\, \prod_{j=1}^m P(\boldsymbol{x }_j,z_j\,|\,\Theta)\notag \\ & = {\mathbb{E} }_{\mathbf{Z}\,|\,\mathbf{X},{\Theta}^{\,t}} \sum_{j=1}^m \ln\, P(\boldsymbol{x }_j,z_j\,|\,\Theta)\notag \\ & = {\mathbb{E} }_{\mathbf{Z}\,|\,\mathbf{X},{\Theta}^{\,t}} \sum_{j=1}^m \ln\, \prod_{i=1}^k( {\alpha}_i P(\boldsymbol{x }_j\,|\,{\Theta}_i))^{\mathbb{I}(z_j=i)} \qquad \text{(由式(9.24))}\notag \\ & = \sum_{j=1}^m {\mathbb{E} }_{\mathbf{Z}\,|\,\mathbf{X},{\Theta}^{\,t}} \left (\sum_{i=1}^k{\mathbb{I}(z_j=i)}\ln({\alpha}_i P(\boldsymbol{x }_j\,|\,{\Theta}_i))\right)\notag \\ & = \sum_{j=1}^m \sum_{i=1}^k\left ({\mathbb{E} }_{\mathbf{Z}\,|\,\mathbf{X},{\Theta}^{\,t}} {\mathbb{I}(z_j=i)}\right)\ln({\alpha}_i P(\boldsymbol{x }_j\,|\,{\Theta}_i))\notag \\ & = \sum_{j=1}^m \sum_{i=1}^k\left ({\mathbb{E} }_{z_j\,|\,\boldsymbol{x}_j,{\Theta}^{\,t}} {\mathbb{I}(z_j=i)}\right)\ln({\alpha}_i P(\boldsymbol{x }_j\,|\,{\Theta}_i))\ \text{(${\mathbb{E} }$作用域只含$z_j$)}\notag \\ & = \sum_{j=1}^m \sum_{i=1}^k P(z_j=i\,|\,\boldsymbol{x}_j,{\Theta}^{\,t})\ln({\alpha}_i P(\boldsymbol{x }_j\,|\,{\Theta}_i))\qquad \text{(由式(B11))}\notag \\ & = \sum_{j=1}^m \sum_{i=1}^k{\gamma}_{ji}^{\,t} \ln({\alpha}_i P(\boldsymbol{x }_j\,|\,{\Theta}_i))\qquad \text{(由式(9.26))}\notag \\ & = \sum_{j=1}^m \sum_{i=1}^k{\gamma}_{ji}^{\,t} \ln{\alpha}_i+\sum_{j=1}^m \sum_{i=1}^k{\gamma}_{ji}^{\,t}\ln\, P(\boldsymbol{x }_j\,|\,{\Theta}_i) \end{align} Q(ΘΘt)=EZX,ΘtLL(ΘX,Z)(由【西瓜书式(7.36)】)=EZX,ΘtlnP(X,ZΘ)=EZX,Θtlnj=1mP(xj,zjΘ)=EZX,Θtj=1mlnP(xj,zjΘ)=EZX,Θtj=1mlni=1k(αiP(xjΘi))I(zj=i)(由式(9.24)=j=1mEZX,Θt(i=1kI(zj=i)ln(αiP(xjΘi)))=j=1mi=1k(EZX,ΘtI(zj=i))ln(αiP(xjΘi))=j=1mi=1k(Ezjxj,ΘtI(zj=i))ln(αiP(xjΘi)) E作用域只含zj=j=1mi=1kP(zj=ixj,Θt)ln(αiP(xjΘi))(由式(B11)=j=1mi=1kγjitln(αiP(xjΘi))(由式(9.26)=j=1mi=1kγjitlnαi+j=1mi=1kγjitlnP(xjΘi)
其中用到公式(参见6、指示函数及应用(将分段函数表达成一个式子的技术)):
E x ∈ D I A ( x ) = P ( x ∈ A ) \begin{align} \mathop{\mathbb{E} }\limits_{x\in D}\mathbb{I}_A (x) & =P(x\in A) \tag{B11} \end{align} xDEIA(x)=P(xA)(B11)
即有
Q ( Θ   ∣   Θ   t ) = ∑ j = 1 m ∑ i = 1 k γ j i   t ln ⁡ α i + ∑ j = 1 m ∑ i = 1 k γ j i   t ln ⁡   P ( x j   ∣   Θ i ) \begin{align} Q(\Theta\,|\,{\Theta}^{\,t})= \sum_{j=1}^m \sum_{i=1}^k{\gamma}_{ji}^{\,t} \ln{\alpha}_i+\sum_{j=1}^m \sum_{i=1}^k{\gamma}_{ji}^{\,t}\ln\, P(\boldsymbol{x }_j\,|\,{\Theta}_i) \tag{9.27} \end{align} Q(ΘΘt)=j=1mi=1kγjitlnαi+j=1mi=1kγjitlnP(xjΘi)(9.27)
有了上述准备后,即可使用EM算法:

E步:

(1)推断隐变量分布: P ( Z   ∣   X , Θ   t ) P(\mathbf{Z}\,|\,\mathbf{X},{\Theta}^{\,t}) P(ZX,Θt),这时它等价为 P ( z j   ∣   x j , Θ   t ) ,   j = 1 , 2 , ⋯   , k P(z_j\,|\,\boldsymbol{x}_j,{\Theta}^{\,t}),\, j=1,2,\cdots,k P(zjxj,Θt),j=1,2,,k,即式(9.26),其中参数即为当前参数,而 x j \boldsymbol{x}_j xj也已知,由【西瓜书式(9.28)】和式(9.26)即可计算 γ j i   t {\gamma}_{ji}^{\,t} γjit

(2)列出 Q Q Q的表达式:即式(9.27)。

M步:求 Q Q Q的最大值点:

由式(9.27)有
∂ Q ( Θ   ∣   Θ   t ) ∂ μ i = 0 + ∑ j = 1 m ( γ j i   t ∂ ln ⁡   P ( x j   ∣   Θ i ) ∂ μ i + ∑ l ≠ i γ j l   t ∂ ln ⁡   P ( x j   ∣   Θ l ) ∂ μ i ) = ∑ j = 1 m ( γ j i   t 1 P ( x j   ∣   Θ i ) ∂ P ( x j   ∣   Θ i ) ∂ μ i + 0 ) = − ∑ j = 1 m γ j i   t 1 P ( x j   ∣   Θ i ) p ( x j   ∣   μ i , Σ i ) Σ i − 1 ( x j − μ i ) (由式(9.4)) = − Σ i − 1 ∑ j = 1 m γ j i   t ( x j − μ i ) \begin{align} \frac{\partial Q(\Theta\,|\,{\Theta}^{\,t})}{\partial \boldsymbol{\mu }_i } & = 0+\sum_{j=1}^m \left( {\gamma}_{ji}^{\,t}\frac{\partial\ln\, P(\boldsymbol{x }_j\,|\,{\Theta}_i)}{\partial \boldsymbol{\mu }_i }+\sum_{l\neq i} {\gamma}_{jl}^{\,t}\frac{\partial\ln\, P(\boldsymbol{x }_j\,|\,{\Theta}_l)}{\partial \boldsymbol{\mu }_i }\right)\notag \\ & = \sum_{j=1}^m \left( {\gamma}_{ji}^{\,t}\frac{1}{P(\boldsymbol{x }_j\,|\,{\Theta}_i)}\frac{\partial P(\boldsymbol{x }_j\,|\,{\Theta}_i)}{\partial \boldsymbol{\mu }_i }+0\right)\notag \\ & = -\sum_{j=1}^m {\gamma}_{ji}^{\,t}\frac{1}{P(\boldsymbol{x }_j\,|\,{\Theta}_i)}p(\boldsymbol{x }_j\,|\,\boldsymbol{\mu }_i,{\boldsymbol{\Sigma } }_i){\boldsymbol{\Sigma } }_i^{-1}(\boldsymbol{x }_j-\boldsymbol{\mu }_i)\quad \text{(由式(9.4))}\notag \\ & = -{\boldsymbol{\Sigma } }_i^{-1}\sum_{j=1}^m {\gamma}_{ji}^{\,t}(\boldsymbol{x }_j-\boldsymbol{\mu }_i) \tag{9.28} \end{align} μiQ(ΘΘt)=0+j=1m γjitμilnP(xjΘi)+l=iγjltμilnP(xjΘl) =j=1m(γjitP(xjΘi)1μiP(xjΘi)+0)=j=1mγjitP(xjΘi)1p(xjμi,Σi)Σi1(xjμi)(由式(9.4)=Σi1j=1mγjit(xjμi)(9.28)
∂ Q ( Θ   ∣   Θ   t ) ∂ μ i = 0 \frac{\partial Q(\Theta\,|\,{\Theta}^{\,t})}{\partial \boldsymbol{\mu }_i }=\boldsymbol{0} μiQ(ΘΘt)=0,则得与式(9.10)一致的结果:
μ i   t + 1 = ∑ j = 1 m γ j i   t x j ∑ j = 1 m γ j i   t \begin{align} \boldsymbol{\mu }_i^{\,t+1}=\frac{\sum_{j=1}^m{\gamma}_{ji}^{\,t}\boldsymbol{x }_j}{\sum_{j=1}^m{\gamma}_{ji}^{\,t}} \tag{9.29} \end{align} μit+1=j=1mγjitj=1mγjitxj(9.29)
即【西瓜书式(9.34)】。

类似地,从式(9.27)出发,可推导出其他递推式,留给大家练习。

本文为原创,您可以:

  • 点赞(支持博主)
  • 收藏(待以后看)
  • 转发(他考研或学习,正需要)
  • 评论(或讨论)
  • 引用(支持原创)
  • 不侵权

上一篇:9.3 高斯混合聚类算法(男生和女生依比例形成男女混合成绩模型)
下一篇:9.5 密度聚类与层次聚类(DBSCAN算法、AGNES算法)

  • 1
    点赞
  • 4
    收藏
    觉得还不错? 一键收藏
  • 3
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 3
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值