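(Machine Learning, Complete Edition series) Chapter 13 Semi-Supervised Learning: 13.1 Generative Methods in Detail (all sample data are "generated" by the same underlying model)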

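In earlier chapters we studied (1) classification guided by the labels of the samples in the sample set, and (2) clustering constrained by the dense regions of the sample distribution. The former is supervised learning and the latter is unsupervised learning. Unsupervised learning still relies on a guide: density guides the clustering (regions of high density should not be split apart). In many situations we have a few labeled samples together with a large number of unlabeled samples; machine learning that makes full use of both is semi-supervised learning.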

Let the labeled sample set be $D_l$ and the unlabeled sample set be $D_u$:
$$\begin{align}
\begin{cases}
D_l=\{(\boldsymbol{x}_1,y_1),(\boldsymbol{x}_2,y_2),\cdots,(\boldsymbol{x}_l,y_l)\} \\
D_u=\{\boldsymbol{x}_{l+1},\boldsymbol{x}_{l+2},\cdots,\boldsymbol{x}_{l+u}\}
\end{cases} \tag{13.1}
\end{align}$$
Training a classifier from $D_l\cup D_u$ is semi-supervised learning.

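The generative method is discussed in the following steps:
(1) With respect to $\boldsymbol{\mu}_i$
(2) With respect to $\boldsymbol{\Sigma}_i$
(3) With respect to $\alpha_i$
(4) Model parameters and their simplification
(5) Applying the EM algorithm to solve for the parameters: the E-step and the M-step are iterated until convergence, which yields the model parameters $(\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i,\alpha_i)_{i=1}^N$; the model can then be used for prediction.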

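Using an unlabeled sample set requires a prior assumption; clustering, for example, assumes that samples close to each other are alike. The prior assumption here is that all sample data, labeled or not, are "generated" by the same underlying model. Machine learning methods based on this idea are called generative methods.
Take the underlying model to be a Gaussian mixture model. Its parameters can be estimated with the EM algorithm; this article gives the detailed discussion and mathematical derivation.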

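Generative methods

This section continues the discussion of the Gaussian mixture distribution [Watermelon Book Eq. (9.29)] from the earlier posts (9.3 Gaussian mixture clustering algorithm (boys and girls form a mixed score model in proportion); 9.4 Detailed derivation of the EM algorithm for Gaussian mixture models). Assuming the samples are generated by a Gaussian mixture model gives [Watermelon Book Eq. (13.1)] (a rewriting of [Watermelon Book Eq. (9.29)]), where the Gaussian density $p(\boldsymbol{x}\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)$ is defined by [Watermelon Book Eq. (9.28)]. The posterior probability $p(\Theta=i\,|\,\boldsymbol{x})$ is given by [Watermelon Book Eq. (13.3)] or [Watermelon Book Eq. (9.30)]. On this basis we discuss the semi-supervised case.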

(0) Preliminaries

Let $\Theta$ denote the mixture component that $\boldsymbol{x}$ belongs to, and write the parameters and mixture components of the sample space as
$$\boldsymbol{\theta}=(\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\alpha})=(\{\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i,\alpha_i\}_{i=1}^N)\Longrightarrow (\{\Theta=i\}_{i=1}^N)$$

Returning to the log-likelihood [Watermelon Book Eq. (7.10)], we have
$$\begin{align}
\mathrm{LL}(\boldsymbol{\theta}) & =\ln P(D_l\cup D_u\,|\,\boldsymbol{\theta})\notag \\
& =\ln \big(P(D_l\,|\,\boldsymbol{\theta})P(D_u\,|\,\boldsymbol{\theta})\big)\quad \text{(by the i.i.d. assumption)}\notag \\
& =\ln \left[\prod_{j=1}^{l}P(\boldsymbol{x}_j,y_j\,|\,\boldsymbol{\theta})\times \prod_{j=l+1}^{l+u}P(\boldsymbol{x}_j\,|\,\boldsymbol{\theta})\right]\notag \\
& =\sum_{j=1}^{l}\ln P(\boldsymbol{x}_j,y_j\,|\,\boldsymbol{\theta})+\sum_{j=l+1}^{l+u}\ln P(\boldsymbol{x}_j\,|\,\boldsymbol{\theta}) \tag{13.2}
\end{align}$$

$$\begin{align}
P(\boldsymbol{x},y\,|\,\boldsymbol{\theta}) & =P(\boldsymbol{x},y)\quad \text{(omitting the common condition $\boldsymbol{\theta}$, likewise below)}\notag \\
& =\sum_{i=1}^N P(\Theta=i,\boldsymbol{x},y)\notag \\
& =\sum_{i=1}^N P(\boldsymbol{x})P(\Theta=i\,|\,\boldsymbol{x})P(y\,|\,\Theta=i,\boldsymbol{x})\notag \\
& =\sum_{i=1}^N \text{[product of Watermelon Book Eqs. (13.1) and (13.3)]}\,P(y\,|\,\Theta=i,\boldsymbol{x})\notag \\
& =\sum_{i=1}^N \alpha_i P(\boldsymbol{x}\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)P(y\,|\,\Theta=i,\boldsymbol{x}) \tag{13.3}
\end{align}$$
Applying Eq. (13.3) to $(\boldsymbol{x}_j,y_j)$ gives $P(\boldsymbol{x}_j,y_j\,|\,\boldsymbol{\theta})$; likewise, applying [Watermelon Book Eq. (13.1)] to $\boldsymbol{x}_j$ gives $P(\boldsymbol{x}_j\,|\,\boldsymbol{\theta})$. Write:
$$\begin{align}
\begin{cases}
A_j\overset{\mathrm{def}}{=}P(\boldsymbol{x}_j,y_j\,|\,\boldsymbol{\theta})=\sum_{i=1}^N\alpha_i P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i) \\
B_j\overset{\mathrm{def}}{=}P(\boldsymbol{x}_j\,|\,\boldsymbol{\theta})=\sum_{i=1}^N\alpha_i P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)
\end{cases} \tag{13.4}
\end{align}$$

Introduce one more piece of notation (the posterior probability of the component a sample belongs to):
$$\begin{align}
\gamma_{ji}\overset{\mathrm{def}}{=}P(\Theta_j=i\,|\,\boldsymbol{x}_j) \tag{13.5}
\end{align}$$
Then, by Bayes' formula [Watermelon Book Eq. (7.8)],
$$\begin{align}
\gamma_{ji}=\frac{\alpha_i P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)}{B_j} \tag{13.6}
\end{align}$$
which is [Watermelon Book Eq. (13.5)].
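To make Eq. (13.6) concrete, here is a minimal sketch that computes the posteriors $\gamma_{ji}$ for a single sample. It assumes a one-dimensional mixture, so each $\boldsymbol{\Sigma}_i$ reduces to a scalar variance; the function names `gauss_pdf` and `responsibilities` are illustrative choices, not taken from the text.

```python
import math

def gauss_pdf(x, mu, var):
    """Density of the 1-D Gaussian N(mu, var) at x (1-D is an assumption of this sketch)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def responsibilities(x, alphas, mus, variances):
    """gamma_ji of Eq. (13.6) for one sample x: alpha_i * p(x | i) / B_j."""
    weighted = [a * gauss_pdf(x, m, v) for a, m, v in zip(alphas, mus, variances)]
    b_j = sum(weighted)  # B_j of Eq. (13.4), the mixture density at x
    return [w / b_j for w in weighted]

# Two equally weighted components centred at 0 and 4 with unit variance.
gamma = responsibilities(1.0, [0.5, 0.5], [0.0, 4.0], [1.0, 1.0])
print(gamma)
```

Because the numerators sum to $B_j$, the values $\gamma_{ji}$ for a fixed sample always add up to 1, which is used later when the mixing coefficients are simplified.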

Assumption I: each mixture component corresponds to one class

Suppose that "if $\boldsymbol{x}_j$ belongs to component $\Theta_j=i$, then $\boldsymbol{x}_j$ belongs to class $i$". In probabilistic terms,
$$\begin{align}
P(y_j=i\,|\,\Theta_j=i)=1 \tag{13.7}
\end{align}$$
Let $D_i=D_l\cap \{(\boldsymbol{x}_j,y_j):y_j=i\}$. Then,
when $(\boldsymbol{x}_j,y_j)\in D_i$,
$$\begin{align}
\begin{cases}
P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)=1 \\
A_j=B_j
\end{cases} \tag{13.8}
\end{align}$$
and when $(\boldsymbol{x}_j,y_j)\in D_l\setminus D_i$,
$$\begin{align}
P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)=0 \tag{13.9}
\end{align}$$
Introduce the notation $C_j[f]$. By Eqs. (13.6), (13.8) and (13.9),
$$\begin{align}
C_j[f] & \overset{\mathrm{def}}{=}\sum_{(\boldsymbol{x}_j,y_j)\in D_l}\frac{\alpha_i P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)}{A_j}f(\boldsymbol{x}_j)\notag \\
& \qquad +\sum_{\boldsymbol{x}_j\in D_u}\frac{\alpha_i P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)}{B_j}f(\boldsymbol{x}_j) \tag{13.10} \\
& =\sum_{(\boldsymbol{x}_j,y_j)\in D_l}\gamma_{ji}\frac{B_j}{A_j}P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)f(\boldsymbol{x}_j)+\sum_{\boldsymbol{x}_j\in D_u}\gamma_{ji}f(\boldsymbol{x}_j)\notag \\
& =\sum_{(\boldsymbol{x}_j,y_j)\in D_i}\left[\gamma_{ji}\frac{B_j}{A_j}P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)f(\boldsymbol{x}_j)\right]\notag \\
& \qquad +\sum_{(\boldsymbol{x}_j,y_j)\in D_l\setminus D_i}\left[\gamma_{ji}\frac{B_j}{A_j}P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)f(\boldsymbol{x}_j)\right]+\sum_{\boldsymbol{x}_j\in D_u}\gamma_{ji}f(\boldsymbol{x}_j)\notag \\
& =\sum_{(\boldsymbol{x}_j,y_j)\in D_i}[\gamma_{ji}f(\boldsymbol{x}_j)]+\sum_{(\boldsymbol{x}_j,y_j)\in D_l\setminus D_i}[0]+\sum_{\boldsymbol{x}_j\in D_u}\gamma_{ji}f(\boldsymbol{x}_j)\notag \\
& =\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}f(\boldsymbol{x}_j) \tag{13.11}
\end{align}$$
Substituting Eq. (13.4) into Eq. (13.2) gives
$$\begin{align}
\mathrm{LL}(\boldsymbol{\theta}) & =\text{[Watermelon Book Eq. (13.4)]}\notag \\
& =\sum_{(\boldsymbol{x}_j,y_j)\in D_l}\ln A_j+\sum_{\boldsymbol{x}_j\in D_u}\ln B_j\quad \text{(in shorthand)} \tag{13.12}
\end{align}$$
where $A_j$ and $B_j$ are given by Eq. (13.4).
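As a rough numerical companion to Eqs. (13.4) and (13.12), the sketch below evaluates $A_j$, $B_j$ and the log-likelihood in the general form of Eq. (13.4). It assumes a one-dimensional mixture (each $\boldsymbol{\Sigma}_i$ becomes a scalar variance). The function names and the table `label_given_comp`, which stands in for $P(y\,|\,\Theta=i,\boldsymbol{x})$ and is taken here to be independent of $\boldsymbol{x}$, are hypothetical and introduced only for illustration.

```python
import math

def gauss_pdf(x, mu, var):
    """Density of the 1-D Gaussian N(mu, var) at x (1-D is an assumption of this sketch)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture_density(x, alphas, mus, variances):
    """B_j of Eq. (13.4): sum over i of alpha_i * p(x | mu_i, var_i)."""
    return sum(a * gauss_pdf(x, m, v) for a, m, v in zip(alphas, mus, variances))

def joint_density(x, y, alphas, mus, variances, label_given_comp):
    """A_j of Eq. (13.4); label_given_comp[i][y] plays the role of P(y | Theta=i, x)."""
    return sum(a * label_given_comp[i][y] * gauss_pdf(x, m, v)
               for i, (a, m, v) in enumerate(zip(alphas, mus, variances)))

def log_likelihood(labeled, unlabeled, alphas, mus, variances, label_given_comp):
    """LL of Eq. (13.12): sum of ln A_j over D_l plus sum of ln B_j over D_u."""
    ll = sum(math.log(joint_density(x, y, alphas, mus, variances, label_given_comp))
             for x, y in labeled)
    ll += sum(math.log(mixture_density(x, alphas, mus, variances)) for x in unlabeled)
    return ll

# Toy data: labeled pairs (x, y) and unlabeled values x.
alphas, mus, variances = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]
table = [[0.9, 0.1], [0.2, 0.8]]  # made-up values of P(y | Theta=i)
print(log_likelihood([(0.1, 0), (3.9, 1)], [0.3, 4.2], alphas, mus, variances, table))
```

Tracking this value across EM iterations is one common way to monitor convergence.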

Combining this with the constraints $\alpha_i\geqslant 0$, $\sum_{i=1}^N\alpha_i=1$, form the Lagrangian
$$\begin{align}
L=\mathrm{LL}(\boldsymbol{\theta})+\lambda \left(\sum_{i=1}^N\alpha_i-1\right) \tag{13.13}
\end{align}$$

(1) With respect to $\boldsymbol{\mu}_i$

$$\begin{align}
\frac{\partial A_j}{\partial \boldsymbol{\mu}_i} & =\frac{\partial }{\partial \boldsymbol{\mu}_i}\left[\big(\alpha_i P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)\big)+\sum_{k\neq i}\text{(terms independent of $\boldsymbol{\mu}_i$)}\right]\notag \\
& =\alpha_i P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)\frac{\partial }{\partial \boldsymbol{\mu}_i}P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)\notag \\
& =-\alpha_i P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)\boldsymbol{\Sigma}_i^{-1}(\boldsymbol{x}_j-\boldsymbol{\mu}_i) \tag{13.14}
\end{align}$$
where the last step uses Eq. (9.4) of post 9.4 (Detailed derivation of the EM algorithm for Gaussian mixture models).

Similarly,
$$\begin{align}
\frac{\partial B_j}{\partial \boldsymbol{\mu}_i} & =-\alpha_i P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)\boldsymbol{\Sigma}_i^{-1}(\boldsymbol{x}_j-\boldsymbol{\mu}_i) \tag{13.15}
\end{align}$$

By Eqs. (13.12), (13.13), (13.14) and (13.15),
$$\begin{align}
\frac{\partial L}{\partial \boldsymbol{\mu}_i} & =\frac{\partial \mathrm{LL}(\boldsymbol{\theta})}{\partial \boldsymbol{\mu}_i}\notag \\
& =\sum_{(\boldsymbol{x}_j,y_j)\in D_l}\frac{\partial }{\partial \boldsymbol{\mu}_i}\ln A_j+\sum_{\boldsymbol{x}_j\in D_u}\frac{\partial }{\partial \boldsymbol{\mu}_i}\ln B_j\quad \text{(by Eq. (13.12))}\notag \\
& =\sum_{(\boldsymbol{x}_j,y_j)\in D_l}\frac{1}{A_j}\frac{\partial A_j}{\partial \boldsymbol{\mu}_i}+\sum_{\boldsymbol{x}_j\in D_u}\frac{1}{B_j}\frac{\partial B_j}{\partial \boldsymbol{\mu}_i}\notag \\
& =-\sum_{(\boldsymbol{x}_j,y_j)\in D_l}\frac{\alpha_i P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)}{A_j}\boldsymbol{\Sigma}_i^{-1}(\boldsymbol{x}_j-\boldsymbol{\mu}_i)\notag \\
& \qquad -\sum_{\boldsymbol{x}_j\in D_u}\frac{\alpha_i P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)}{B_j}\boldsymbol{\Sigma}_i^{-1}(\boldsymbol{x}_j-\boldsymbol{\mu}_i) \tag{13.16}
\end{align}$$

Let $f(\boldsymbol{x}_j)=-\boldsymbol{\Sigma}_i^{-1}(\boldsymbol{x}_j-\boldsymbol{\mu}_i)$. Eq. (13.16) then becomes
$$\begin{align}
\frac{\partial L}{\partial \boldsymbol{\mu}_i} & =C_j[f]\Big|_{f(\boldsymbol{x}_j)=-\boldsymbol{\Sigma}_i^{-1}(\boldsymbol{x}_j-\boldsymbol{\mu}_i)}\quad \text{(by Eq. (13.10))}\notag \\
& =\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}f(\boldsymbol{x}_j)\Big|_{f(\boldsymbol{x}_j)=-\boldsymbol{\Sigma}_i^{-1}(\boldsymbol{x}_j-\boldsymbol{\mu}_i)}\quad \text{(by Eq. (13.11))}\notag \\
& =-\boldsymbol{\Sigma}_i^{-1}\left(\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}(\boldsymbol{x}_j-\boldsymbol{\mu}_i)\right) \tag{13.17}
\end{align}$$
Setting $\frac{\partial L}{\partial \boldsymbol{\mu}_i}=\mathbf{0}$ gives
$$\begin{align}
\boldsymbol{\mu}_i=\frac{\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}\boldsymbol{x}_j}{\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}} \tag{13.18}
\end{align}$$

(2) With respect to $\boldsymbol{\Sigma}_i$

$$\begin{align}
\frac{\partial A_j}{\partial \boldsymbol{\Sigma}_i} & =\frac{\partial }{\partial \boldsymbol{\Sigma}_i}\left[\big(\alpha_i P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)\big)+\sum_{k\neq i}\text{(terms independent of $\boldsymbol{\Sigma}_i$)}\right]\notag \\
& =\alpha_i P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)\frac{\partial }{\partial \boldsymbol{\Sigma}_i}P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)\notag \\
& =\frac{1}{2}\alpha_i P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)\boldsymbol{\Sigma}_i^{-1}[(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}}-\boldsymbol{\Sigma}_i]\boldsymbol{\Sigma}_i^{-1}\notag \\
& =\frac{1}{2}\alpha_i P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)f(\boldsymbol{x}_j) \tag{13.19}
\end{align}$$
where the third line uses Eq. (9.14) of post 9.4 (Detailed derivation of the EM algorithm for Gaussian mixture models), and $f(\boldsymbol{x}_j)=\boldsymbol{\Sigma}_i^{-1}[(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}}-\boldsymbol{\Sigma}_i]\boldsymbol{\Sigma}_i^{-1}$.

Similarly,
$$\begin{align}
\frac{\partial B_j}{\partial \boldsymbol{\Sigma}_i} & =\frac{1}{2}\alpha_i P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)f(\boldsymbol{x}_j) \tag{13.20}
\end{align}$$
with the same $f(\boldsymbol{x}_j)=\boldsymbol{\Sigma}_i^{-1}[(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}}-\boldsymbol{\Sigma}_i]\boldsymbol{\Sigma}_i^{-1}$.

By Eqs. (13.13), (13.19) and (13.20),
$$\begin{align}
\frac{\partial L}{\partial \boldsymbol{\Sigma}_i} & =\frac{\partial \mathrm{LL}(\boldsymbol{\theta})}{\partial \boldsymbol{\Sigma}_i}\notag \\
& =\sum_{(\boldsymbol{x}_j,y_j)\in D_l}\frac{\partial }{\partial \boldsymbol{\Sigma}_i}\ln A_j+\sum_{\boldsymbol{x}_j\in D_u}\frac{\partial }{\partial \boldsymbol{\Sigma}_i}\ln B_j\quad \text{(by Eq. (13.12))}\notag \\
& =\sum_{(\boldsymbol{x}_j,y_j)\in D_l}\frac{1}{A_j}\frac{\partial A_j}{\partial \boldsymbol{\Sigma}_i}+\sum_{\boldsymbol{x}_j\in D_u}\frac{1}{B_j}\frac{\partial B_j}{\partial \boldsymbol{\Sigma}_i}\notag \\
& =\frac{1}{2}\sum_{(\boldsymbol{x}_j,y_j)\in D_l}\frac{1}{A_j}\alpha_i P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)f(\boldsymbol{x}_j)\notag \\
& \qquad +\frac{1}{2}\sum_{\boldsymbol{x}_j\in D_u}\frac{1}{B_j}\alpha_i P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)f(\boldsymbol{x}_j)\notag \\
& =\frac{1}{2}C_j[f]\Big|_{f(\boldsymbol{x}_j)=\boldsymbol{\Sigma}_i^{-1}[(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}}-\boldsymbol{\Sigma}_i]\boldsymbol{\Sigma}_i^{-1}}\quad \text{(by Eq. (13.10))}\notag \\
& =\frac{1}{2}\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}f(\boldsymbol{x}_j)\Big|_{f(\boldsymbol{x}_j)=\boldsymbol{\Sigma}_i^{-1}[(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}}-\boldsymbol{\Sigma}_i]\boldsymbol{\Sigma}_i^{-1}}\quad \text{(by Eq. (13.11))}\notag \\
& =\frac{1}{2}\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}\boldsymbol{\Sigma}_i^{-1}[(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}}-\boldsymbol{\Sigma}_i]\boldsymbol{\Sigma}_i^{-1}\notag \\
& =\frac{1}{2}\boldsymbol{\Sigma}_i^{-1}\left(\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}[(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}}-\boldsymbol{\Sigma}_i]\right)\boldsymbol{\Sigma}_i^{-1} \tag{13.21}
\end{align}$$
Setting $\frac{\partial L}{\partial \boldsymbol{\Sigma}_i}=\mathbf{0}$ gives
$$\begin{align}
& \sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}[(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}}-\boldsymbol{\Sigma}_i]=\mathbf{0}\notag \\
& \sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}}-\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}\boldsymbol{\Sigma}_i=\mathbf{0}\notag \\
& \boldsymbol{\Sigma}_i=\frac{\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}}}{\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}} \tag{13.22}
\end{align}$$

(3) With respect to $\alpha_i$

$$\begin{align}
\frac{\partial A_j}{\partial \alpha_i} & =P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i) \tag{13.23}
\end{align}$$

Similarly,
$$\begin{align}
\frac{\partial B_j}{\partial \alpha_i} & =P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i) \tag{13.24}
\end{align}$$

By Eqs. (13.23) and (13.24),
$$\begin{align}
\frac{\partial \mathrm{LL}(\boldsymbol{\theta})}{\partial \alpha_i} & =\sum_{(\boldsymbol{x}_j,y_j)\in D_l}\frac{1}{A_j}\frac{\partial A_j}{\partial \alpha_i}+\sum_{\boldsymbol{x}_j\in D_u}\frac{1}{B_j}\frac{\partial B_j}{\partial \alpha_i}\notag \\
& =\sum_{(\boldsymbol{x}_j,y_j)\in D_l}\frac{1}{A_j}P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)+\sum_{\boldsymbol{x}_j\in D_u}\frac{1}{B_j}P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)\notag \\
& =\alpha_i^{-1}\sum_{(\boldsymbol{x}_j,y_j)\in D_l}\frac{1}{A_j}\alpha_i P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)\notag \\
& \qquad +\alpha_i^{-1}\sum_{\boldsymbol{x}_j\in D_u}\frac{1}{B_j}\alpha_i P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)\notag \\
& =\alpha_i^{-1}C_j[f]\Big|_{f(\boldsymbol{x}_j)=1}\quad \text{(by Eq. (13.10))}\notag \\
& =\alpha_i^{-1}\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}f(\boldsymbol{x}_j)\Big|_{f(\boldsymbol{x}_j)=1}\quad \text{(by Eq. (13.11))}\notag \\
& =\alpha_i^{-1}\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji} \tag{13.25}
\end{align}$$
By Eq. (13.13),
$$\begin{align}
\frac{\partial L}{\partial \alpha_i} & =\frac{\partial \mathrm{LL}(\boldsymbol{\theta})}{\partial \alpha_i}+\lambda \frac{\partial }{\partial \alpha_i}\left(\sum_{k=1}^N\alpha_k-1\right)\notag \\
& =\alpha_i^{-1}\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}+\lambda \tag{13.26}
\end{align}$$
Setting this to $0$ gives
$$\begin{align}
\alpha_i & =-\lambda^{-1}\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji} \tag{13.27} \\
\lambda\alpha_i & =-\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}\notag \\
\sum_{i=1}^N\lambda\alpha_i & =-\sum_{i=1}^N\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}\notag \\
\lambda\sum_{i=1}^N\alpha_i & =-\sum_{i=1}^N\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}\notag \\
\lambda & =-\sum_{i=1}^N\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}\quad \text{(since $\textstyle\sum_{i=1}^N\alpha_i=1$)} \tag{13.28}
\end{align}$$

By Eqs. (13.27) and (13.28),
$$\begin{align}
\alpha_i=\frac{\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}}{\sum_{i=1}^N\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}} \tag{13.29}
\end{align}$$

(4) Model parameters and their simplification

Collecting Eqs. (13.18), (13.22) and (13.29) gives the model parameters
$$\begin{align}
\begin{cases}
\boldsymbol{\mu}_i=\dfrac{\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}\boldsymbol{x}_j}{\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}} \\[2ex]
\boldsymbol{\Sigma}_i=\dfrac{\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}}}{\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}} \\[2ex]
\alpha_i=\dfrac{\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}}{\sum_{i=1}^N\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}}
\end{cases} \tag{13.30}
\end{align}$$
Here, for component $i$, $\boldsymbol{\mu}_i$ is a vector (the center), $\boldsymbol{\Sigma}_i$ is a matrix (the covariance matrix of the sample set), and $\alpha_i$ is a scalar (the proportion of the component).
Note that the supervised labels $y_j$ do not appear explicitly in Eq. (13.30). Where do they act? Their role is to pick out $D_i$.

Under Assumption I, the model parameters are given by Eq. (13.30).
To simplify the computation, we add Assumption II on top of Assumption I, which leads to the model parameters of [Watermelon Book Eqs. (13.6), (13.7), (13.8)].

Assumption II: each class corresponds to one mixture component
Suppose that "if $\boldsymbol{x}_j$ belongs to class $i$, then $\boldsymbol{x}_j$ belongs to component $\Theta_j=i$". In probabilistic terms,
$$\begin{align}
P(\Theta_j=i\,|\,y_j=i)=1 \tag{13.31}
\end{align}$$
that is,
$$\begin{align}
P(\Theta_j=i\,|\,\boldsymbol{x}_j\in D_i)=1 \tag{13.32}
\end{align}$$
Substituting into Eq. (13.5) gives
$$\begin{align}
\gamma_{ji}=1,\quad (\text{if}\ \boldsymbol{x}_j\in D_i) \tag{13.33}
\end{align}$$
From Eq. (13.33), $\sum_{\boldsymbol{x}_j\in D_i}\gamma_{ji}=|D_i|$, denoted $l_i$, which is the number of samples in $D_l$ labeled as class $i$.

With this, the model parameters in Eq. (13.30) simplify to
$$\begin{align}
\boldsymbol{\mu}_i & =\frac{\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}\boldsymbol{x}_j}{\sum_{\boldsymbol{x}_j\in D_i\cup D_u}\gamma_{ji}}\notag \\
& =\frac{\sum_{\boldsymbol{x}_j\in D_i}\gamma_{ji}\boldsymbol{x}_j+\sum_{\boldsymbol{x}_j\in D_u}\gamma_{ji}\boldsymbol{x}_j}{\sum_{\boldsymbol{x}_j\in D_i}\gamma_{ji}+\sum_{\boldsymbol{x}_j\in D_u}\gamma_{ji}}\notag \\
& =\frac{\sum_{\boldsymbol{x}_j\in D_i}\boldsymbol{x}_j+\sum_{\boldsymbol{x}_j\in D_u}\gamma_{ji}\boldsymbol{x}_j}{l_i+\sum_{\boldsymbol{x}_j\in D_u}\gamma_{ji}} \tag{13.34}
\end{align}$$

where $l_i=|D_i|$.

$$\begin{align}
\boldsymbol{\Sigma}_i & =\frac{\sum_{\boldsymbol{x}_j\in D_i}(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}}+\sum_{\boldsymbol{x}_j\in D_u}\gamma_{ji}(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}}}{l_i+\sum_{\boldsymbol{x}_j\in D_u}\gamma_{ji}} \tag{13.35} \\
\alpha_i & =\frac{l_i+\sum_{\boldsymbol{x}_j\in D_u}\gamma_{ji}}{\sum_{i=1}^N\left(l_i+\sum_{\boldsymbol{x}_j\in D_u}\gamma_{ji}\right)}\notag \\
& =\frac{l_i+\sum_{\boldsymbol{x}_j\in D_u}\gamma_{ji}}{l+\sum_{\boldsymbol{x}_j\in D_u}\sum_{i=1}^N\gamma_{ji}}\quad \text{(since $\textstyle\sum_{i=1}^N l_i=l$)}\notag \\
& =\frac{l_i+\sum_{\boldsymbol{x}_j\in D_u}\gamma_{ji}}{l+\sum_{\boldsymbol{x}_j\in D_u}1}\quad \text{(since $\textstyle\sum_{i=1}^N\gamma_{ji}=1$ by Eqs. (13.4) and (13.6))}\notag \\
& =\frac{l_i+\sum_{\boldsymbol{x}_j\in D_u}\gamma_{ji}}{l+u} \tag{13.36}
\end{align}$$
Eqs. (13.34), (13.35) and (13.36) are exactly [Watermelon Book Eqs. (13.6), (13.7), (13.8)]. When $D_l=\varnothing$ they reduce to [Watermelon Book Eqs. (9.34), (9.35), (9.38)]; when $D_u=\varnothing$ they reduce to supervised learning.
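The updates (13.34) to (13.36) can be sketched in a few lines. The version below is a minimal illustration under the assumption of one-dimensional samples, so $\boldsymbol{\Sigma}_i$ becomes a scalar variance; the function name `m_step` and its argument layout are hypothetical. The usage lines check the special case $D_u=\varnothing$ mentioned above, where the estimates become plain per-class statistics.

```python
def m_step(labeled, unlabeled, gammas, n_components):
    """One parameter update following Eqs. (13.34)-(13.36), 1-D case (an assumption).

    labeled   : list of (x, y) pairs with y in range(n_components)
    unlabeled : list of x values
    gammas    : gammas[j][i] is gamma_ji for the j-th unlabeled sample
    """
    total = len(labeled) + len(unlabeled)                # l + u
    alphas, mus, variances = [], [], []
    for i in range(n_components):
        xs_i = [x for x, y in labeled if y == i]         # samples of D_i
        weight = len(xs_i) + sum(g[i] for g in gammas)   # l_i + sum of gamma_ji over D_u
        mu = (sum(xs_i)
              + sum(g[i] * x for g, x in zip(gammas, unlabeled))) / weight
        var = (sum((x - mu) ** 2 for x in xs_i)
               + sum(g[i] * (x - mu) ** 2 for g, x in zip(gammas, unlabeled))) / weight
        alphas.append(weight / total)
        mus.append(mu)
        variances.append(var)
    return alphas, mus, variances

# Special case D_u empty: only the labeled samples contribute.
alphas, mus, variances = m_step([(0.0, 0), (2.0, 0), (5.0, 1)], [], [], 2)
print(alphas, mus, variances)
```

With no unlabeled data, each mean is the class average and each $\alpha_i$ is the class frequency $l_i/l$, which matches the supervised limit noted in the text.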

(5) Applying the EM algorithm to solve for the parameters

E-step: given the current parameters $(\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i,\alpha_i)_{i=1}^N$ and the sample set $\{\boldsymbol{x}_j\}_{j=1}^m$ (here $m=l+u$), compute $\gamma_{ji}$ from [Watermelon Book Eq. (13.5)], i.e. Eq. (13.6) above.

M-step: based on $\gamma_{ji}$, update the parameters $(\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i,\alpha_i)_{i=1}^N$ with [Watermelon Book Eqs. (13.6), (13.7), (13.8)], i.e. Eqs. (13.34), (13.35) and (13.36) above.

The E-step and the M-step are iterated until convergence, which yields the model parameters $(\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i,\alpha_i)_{i=1}^N$. The model can then be used for prediction.
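The whole procedure can be put together as a short loop. What follows is only a sketch under several assumptions that are not in the original text: samples are one-dimensional, the parameters are initialised from the labeled data (so every class needs at least one labeled sample), iteration stops when the means move by less than `tol`, a small `var_floor` keeps variances positive, and prediction picks the component with the largest posterior, relying on the one-component-per-class assumptions. All function names are hypothetical.

```python
import math

def gauss_pdf(x, mu, var):
    """Density of the 1-D Gaussian N(mu, var) at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def weighted_pdfs(x, alphas, mus, variances):
    """alpha_i * p(x | mu_i, var_i) for every component i."""
    return [a * gauss_pdf(x, m, v) for a, m, v in zip(alphas, mus, variances)]

def e_step(unlabeled, alphas, mus, variances):
    """gamma_ji of Eq. (13.6) for every unlabeled sample."""
    gammas = []
    for x in unlabeled:
        w = weighted_pdfs(x, alphas, mus, variances)
        b_j = sum(w)  # B_j; this sketch does not guard against underflow
        gammas.append([wi / b_j for wi in w])
    return gammas

def m_step(labeled, unlabeled, gammas, n, var_floor=1e-6):
    """Updates (13.34)-(13.36); var_floor is an added safeguard, not in the text."""
    total = len(labeled) + len(unlabeled)
    alphas, mus, variances = [], [], []
    for i in range(n):
        xs_i = [x for x, y in labeled if y == i]         # D_i
        weight = len(xs_i) + sum(g[i] for g in gammas)   # l_i + sum of gamma_ji
        mu = (sum(xs_i) + sum(g[i] * x for g, x in zip(gammas, unlabeled))) / weight
        var = (sum((x - mu) ** 2 for x in xs_i)
               + sum(g[i] * (x - mu) ** 2 for g, x in zip(gammas, unlabeled))) / weight
        alphas.append(weight / total)
        mus.append(mu)
        variances.append(max(var, var_floor))
    return alphas, mus, variances

def fit(labeled, unlabeled, n, max_iter=200, tol=1e-8):
    """Alternate E and M steps until the means stop moving (a simple stopping rule)."""
    xs_all = [x for x, _ in labeled] + list(unlabeled)
    mean_all = sum(xs_all) / len(xs_all)
    var_all = sum((x - mean_all) ** 2 for x in xs_all) / len(xs_all)
    # Initialisation from the labeled data (assumes each class has a labeled sample).
    counts = [sum(1 for _, y in labeled if y == i) for i in range(n)]
    mus = [sum(x for x, y in labeled if y == i) / counts[i] for i in range(n)]
    alphas = [c / len(labeled) for c in counts]
    variances = [var_all] * n
    for _ in range(max_iter):
        gammas = e_step(unlabeled, alphas, mus, variances)
        new_alphas, new_mus, new_vars = m_step(labeled, unlabeled, gammas, n)
        shift = max(abs(a - b) for a, b in zip(new_mus, mus))
        alphas, mus, variances = new_alphas, new_mus, new_vars
        if shift < tol:
            break
    return alphas, mus, variances

def predict(x, alphas, mus, variances):
    """Class index = component with the largest posterior gamma."""
    w = weighted_pdfs(x, alphas, mus, variances)
    return max(range(len(w)), key=lambda i: w[i])

labeled = [(0.0, 0), (0.5, 0), (4.0, 1), (4.5, 1)]
unlabeled = [-0.3, 0.2, 0.4, 0.1, 3.8, 4.2, 4.4, 4.1]
alphas, mus, variances = fit(labeled, unlabeled, n=2)
print(predict(0.2, alphas, mus, variances), predict(4.2, alphas, mus, variances))
```

Only the unlabeled samples pass through the E-step here, because under Assumption II the labeled samples have $\gamma_{ji}=1$ for their own class, as in Eq. (13.33).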
