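In earlier chapters we studied (1) classification, guided by the labels of the samples in the sample set, and (2) clustering, constrained by how densely the samples are distributed. The former is called supervised learning and the latter unsupervised learning. Unsupervised learning still relies on a guide of its own: clustering is guided by density (regions of high density should not be split apart). In many situations we have a few labeled samples together with a large number of unlabeled ones; machine learning that makes full use of both is called semi-supervised learning.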
Let $D_l$ denote the labeled sample set and $D_u$ the unlabeled sample set:
\begin{align}
\begin{cases}
D_l=\{(\boldsymbol{x}_1,y_1),(\boldsymbol{x}_2,y_2),\cdots ,(\boldsymbol{x}_l,y_l)\} \\
D_u=\{\boldsymbol{x}_{l+1},\boldsymbol{x}_{l+2},\cdots ,\boldsymbol{x}_{l+u}\}
\end{cases} \tag{13.1}
\end{align}
Training a classifier on $D_l\cup D_u$ is semi-supervised learning.
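As a small illustration of the notation in Eq. (13.1), the sketch below stores $D_l$ as a list of (feature, label) pairs and $D_u$ as a list of bare features. The one-dimensional features, the toy numbers, and the names `D_l`, `D_u` and `split_by_label` are assumptions made purely for illustration.

```python
# Toy data (assumed): D_l holds (x, y) pairs, D_u holds features only.
D_l = [(1.0, 0), (1.2, 0), (4.8, 1)]
D_u = [0.9, 1.1, 5.0, 5.2, 4.9]


def split_by_label(labeled, n_classes):
    """Group labeled features by class: result[i] lists every x with y == i."""
    groups = [[] for _ in range(n_classes)]
    for x, y in labeled:
        groups[y].append(x)
    return groups


l, u = len(D_l), len(D_u)          # sizes of the labeled / unlabeled sets
groups = split_by_label(D_l, n_classes=2)
print(l, u, groups)
```

Class labels are 0-based here only because Python lists are; the text numbers classes from 1.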
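The generative method is discussed in the following steps:
(1) With respect to $\boldsymbol{\mu}_i$
(2) With respect to $\boldsymbol{\Sigma}_i$
(3) With respect to $\alpha_i$
(4) Model parameters and their simplification
(5) Applying the EM algorithm to solve for the parameters: the E-step and the M-step are iterated until convergence, which yields the model parameters $(\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i,\alpha_i)_{i=1}^N$; the model can then be used for prediction.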
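Using an unlabeled sample set always requires a prior assumption; in clustering, for example, we assume that samples lying close together belong together. Here the prior assumption is that all samples, labeled or not, are "generated" by the same underlying model. Machine learning methods built on this idea are called generative methods.
Take the underlying model to be a Gaussian mixture model. Its parameters can be estimated with the EM algorithm, and this article gives the detailed discussion and mathematical derivation.
Generative methods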
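This section continues the material on the Gaussian mixture distribution [Watermelon Book Eq. (9.29)] from the earlier articles "9.3 Gaussian mixture clustering algorithm (boys and girls combined in proportion into a mixed score model)" and "9.4 Detailed derivation of the EM algorithm for Gaussian mixture models". Assume the samples are generated by a Gaussian mixture model; then [Watermelon Book Eq. (13.1)] holds (it is a restatement of [Watermelon Book Eq. (9.29)]), where the Gaussian density $p(\boldsymbol{x}\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)$ is defined by [Watermelon Book Eq. (9.28)]. The posterior probability $p(\Theta=i\,|\,\boldsymbol{x})$ is given by [Watermelon Book Eq. (13.3)] or [Watermelon Book Eq. (9.30)]. On this basis we discuss semi-supervised learning.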
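To make the mixture density concrete, here is a minimal sketch that evaluates $p(\boldsymbol{x})=\sum_i \alpha_i\,p(\boldsymbol{x}\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)$. It assumes one-dimensional features, so each $\boldsymbol{\Sigma}_i$ collapses to a scalar variance; the function names and the parameter values are illustrative choices, not part of the original derivation.

```python
import math


def gaussian_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, alphas, mus, variances):
    """Mixture density: sum_i alpha_i * N(x | mu_i, var_i)."""
    return sum(a * gaussian_pdf(x, m, v)
               for a, m, v in zip(alphas, mus, variances))


# Assumed toy parameters: two equally weighted unit-variance components.
alphas, mus, variances = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
p0 = mixture_pdf(0.0, alphas, mus, variances)

# A Riemann sum over [-10, 10] should be close to 1 for a valid density.
area = sum(mixture_pdf(-10.0 + 0.01 * k, alphas, mus, variances)
           for k in range(2001)) * 0.01
print(p0, area)
```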
(0) Preliminaries
Let $\Theta$ denote the mixture component to which $\boldsymbol{x}$ belongs, and write the parameters of the sample space, together with the mixture components, as
\begin{align}
\boldsymbol{\theta}=(\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\alpha})=(\{\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i,\alpha_i\}_{i=1}^N)\Longrightarrow (\{\Theta=i\}_{i=1}^N) \notag
\end{align}
Returning to the log-likelihood method [Watermelon Book Eq. (7.10)], we have
\begin{align}
\mathrm{LL}(\boldsymbol{\theta}) & =\ln P(D_l\cup D_u\,|\,\boldsymbol{\theta})\notag \\
& =\ln \left(P(D_l\,|\,\boldsymbol{\theta})P(D_u\,|\,\boldsymbol{\theta})\right)\quad \text{(by the i.i.d. assumption)}\notag \\
& =\ln \left[\prod_{j=1}^l P(\boldsymbol{x}_j,y_j\,|\,\boldsymbol{\theta})\times \prod_{j=l+1}^{l+u}P(\boldsymbol{x}_j\,|\,\boldsymbol{\theta})\right]\notag \\
& =\sum_{j=1}^l\ln P(\boldsymbol{x}_j,y_j\,|\,\boldsymbol{\theta})+\sum_{j=l+1}^{l+u}\ln P(\boldsymbol{x}_j\,|\,\boldsymbol{\theta}) \tag{13.2}
\end{align}
Then
\begin{align}
P(\boldsymbol{x},y\,|\,\boldsymbol{\theta}) & =P(\boldsymbol{x},y)\quad \text{(the common condition $\boldsymbol{\theta}$ is omitted here and below)}\notag \\
& =\sum_{i=1}^N P(\Theta=i,\boldsymbol{x},y)\notag \\
& =\sum_{i=1}^N P(\boldsymbol{x})P(\Theta=i\,|\,\boldsymbol{x})P(y\,|\,\Theta=i,\boldsymbol{x})\notag \\
& =\sum_{i=1}^N \alpha_i P(\boldsymbol{x}\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)P(y\,|\,\Theta=i,\boldsymbol{x}) \tag{13.3}
\end{align}
The last step multiplies [Watermelon Book Eq. (13.1)] by [Watermelon Book Eq. (13.3)], which gives $P(\boldsymbol{x})P(\Theta=i\,|\,\boldsymbol{x})=\alpha_i P(\boldsymbol{x}\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)$.
Applying Eq. (13.3) to $(\boldsymbol{x}_j,y_j)$ gives $P(\boldsymbol{x}_j,y_j\,|\,\boldsymbol{\theta})$; likewise, applying [Watermelon Book Eq. (13.1)] to $\boldsymbol{x}_j$ gives $P(\boldsymbol{x}_j\,|\,\boldsymbol{\theta})$. Write:
\begin{align}
\begin{cases}
\ A_j\mathop{=}\limits^{\mathrm{def}} P(\boldsymbol{x}_j,y_j\,|\,\boldsymbol{\theta})=\sum_{i=1}^N\alpha_i P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i) \\
\ B_j\mathop{=}\limits^{\mathrm{def}} P(\boldsymbol{x}_j\,|\,\boldsymbol{\theta})=\sum_{i=1}^N\alpha_i P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)
\end{cases} \tag{13.4}
\end{align}
Next introduce the notation for the posterior probability that a sample belongs to a given component:
\begin{align}
\gamma_{ji}\mathop{=}\limits^{\mathrm{def}} P(\Theta_j=i\,|\,\boldsymbol{x}_j) \tag{13.5}
\end{align}
Then by Bayes' formula [Watermelon Book Eq. (7.8)],
\begin{align}
\gamma_{ji}=\frac{\alpha_i P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)}{B_j} \tag{13.6}
\end{align}
which is [Watermelon Book Eq. (13.5)].
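The posterior $\gamma_{ji}$ of Eq. (13.6) can be computed sample by sample. The sketch below does so under the simplifying assumption of one-dimensional features (scalar variances instead of covariance matrices); `responsibilities` and the toy parameters are hypothetical names and values chosen for illustration.

```python
import math


def gaussian_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def responsibilities(xs, alphas, mus, variances):
    """Return gamma[j][i] = alpha_i * N(x_j | mu_i, var_i) / B_j, as in Eq. (13.6)."""
    gamma = []
    for x in xs:
        joint = [a * gaussian_pdf(x, m, v)
                 for a, m, v in zip(alphas, mus, variances)]
        b_j = sum(joint)                      # B_j of Eq. (13.4)
        gamma.append([w / b_j for w in joint])
    return gamma


# Assumed toy setup: two equally weighted unit-variance components at -1 and +1.
gamma = responsibilities([0.0, 2.0], [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
print(gamma)
```

Each row of `gamma` sums to 1, because the $\gamma_{ji}$ for a fixed sample form a probability distribution over the components.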
Assumption I: each mixture component corresponds to one class
Suppose "if $\boldsymbol{x}_j$ belongs to component $\Theta_j=i$, then $\boldsymbol{x}_j$ belongs to class $i$". In probabilistic terms this reads
\begin{align}
P(y_j=i\,|\,\Theta_j=i)=1 \tag{13.7}
\end{align}
Let $D_i=D_l\bigcap \{(\boldsymbol{x}_j,y_j):y_j=i\}$. Then,
when $(\boldsymbol{x}_j,y_j)\in D_i$, we have
\begin{align}
\begin{cases}
\ P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)=1 \\
\ A_j=B_j
\end{cases} \tag{13.8}
\end{align}
and when $(\boldsymbol{x}_j,y_j)\in D_l\setminus D_i$, we have
\begin{align}
P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)=0 \tag{13.9}
\end{align}
Introduce the notation $C_j[f]$, where $f$ is an arbitrary function of the sample, $i$ is the fixed component index, and $j$ runs over the samples. By Eqs. (13.6), (13.8) and (13.9),
\begin{align}
C_j[f] & \mathop{=}\limits^{\mathrm{def}} \sum_{(\boldsymbol{x}_j,y_j)\in D_l}\frac{\alpha_i P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)}{A_j}f(\boldsymbol{x}_j)\notag \\
& \qquad +\sum_{\boldsymbol{x}_j\in D_u}\frac{\alpha_i P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)}{B_j}f(\boldsymbol{x}_j) \tag{13.10} \\
& =\sum_{(\boldsymbol{x}_j,y_j)\in D_l}\gamma_{ji}\frac{B_j}{A_j}P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)f(\boldsymbol{x}_j)+\sum_{\boldsymbol{x}_j\in D_u}\gamma_{ji}f(\boldsymbol{x}_j)\notag \\
& =\sum_{(\boldsymbol{x}_j,y_j)\in D_i}\left[\gamma_{ji}\frac{B_j}{A_j}P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)f(\boldsymbol{x}_j)\right]\notag \\
& \qquad +\sum_{(\boldsymbol{x}_j,y_j)\in D_l\setminus D_i}\left[\gamma_{ji}\frac{B_j}{A_j}P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)f(\boldsymbol{x}_j)\right]+\sum_{\boldsymbol{x}_j\in D_u}\gamma_{ji}f(\boldsymbol{x}_j)\notag \\
& =\sum_{(\boldsymbol{x}_j,y_j)\in D_i}\left[\gamma_{ji}f(\boldsymbol{x}_j)\right]+\sum_{(\boldsymbol{x}_j,y_j)\in D_l\setminus D_i}[0]+\sum_{\boldsymbol{x}_j\in D_u}\gamma_{ji}f(\boldsymbol{x}_j)\notag \\
& =\sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji}f(\boldsymbol{x}_j) \tag{13.11}
\end{align}
Substituting Eq. (13.4) into Eq. (13.2) gives
\begin{align}
\mathrm{LL}(\boldsymbol{\theta}) & =\text{[Watermelon Book Eq. (13.4)]}\notag \\
& =\sum_{(\boldsymbol{x}_j,y_j)\in D_l}\ln A_j+\sum_{\boldsymbol{x}_j\in D_u}\ln B_j\quad \text{(in shorthand)} \tag{13.12}
\end{align}
where $A_j$ and $B_j$ are given by Eq. (13.4).
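The shorthand form of the log-likelihood is straightforward to evaluate numerically. The following sketch assumes one-dimensional features and treats $P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)$ as an indicator that is 1 when $i=y_j$ and 0 otherwise, in the spirit of Eqs. (13.8) and (13.9); the function name `log_likelihood`, the 0-based class labels, and the data are assumptions for illustration.

```python
import math


def gaussian_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(labeled, unlabeled, alphas, mus, variances):
    """LL = sum over D_l of ln A_j  +  sum over D_u of ln B_j."""
    params = list(zip(alphas, mus, variances))
    ll = 0.0
    for x, y in labeled:
        # A_j: only the component matching the label contributes (indicator assumption).
        a_j = sum(a * gaussian_pdf(x, m, v) * (1.0 if i == y else 0.0)
                  for i, (a, m, v) in enumerate(params))
        ll += math.log(a_j)
    for x in unlabeled:
        # B_j: the ordinary mixture density of an unlabeled sample.
        b_j = sum(a * gaussian_pdf(x, m, v) for a, m, v in params)
        ll += math.log(b_j)
    return ll


labeled = [(0.0, 0), (5.0, 1)]
unlabeled = [0.1, 4.9]
ll_good = log_likelihood(labeled, unlabeled, [0.5, 0.5], [0.0, 5.0], [1.0, 1.0])
ll_swapped = log_likelihood(labeled, unlabeled, [0.5, 0.5], [5.0, 0.0], [1.0, 1.0])
print(ll_good, ll_swapped)
```

Swapping the two means leaves the unlabeled term unchanged but sharply lowers the labeled term, which shows how the labels tie each component to a class.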
Combining this with the constraints $\alpha_i\geqslant 0,\ \sum_{i=1}^N\alpha_i=1$, form the Lagrangian
\begin{align}
L=\mathrm{LL}(\boldsymbol{\theta})+\lambda \left(\sum_{i=1}^N\alpha_i-1\right) \tag{13.13}
\end{align}
(1) With respect to $\boldsymbol{\mu}_i$
\begin{align}
\frac{\partial A_j}{\partial \boldsymbol{\mu}_i} & =\frac{\partial}{\partial \boldsymbol{\mu}_i}\left[\alpha_i P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)+\sum_{k\neq i}\text{(terms independent of $\boldsymbol{\mu}_i$)}\right]\notag \\
& =\alpha_i P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)\frac{\partial}{\partial \boldsymbol{\mu}_i}P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)\notag \\
& =\alpha_i P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)\boldsymbol{\Sigma}_i^{-1}(\boldsymbol{x}_j-\boldsymbol{\mu}_i) \tag{13.14}
\end{align}
The last step uses the gradient of the Gaussian density, $\frac{\partial}{\partial \boldsymbol{\mu}_i}P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)=P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)\boldsymbol{\Sigma}_i^{-1}(\boldsymbol{x}_j-\boldsymbol{\mu}_i)$ (see Eq. (9.4) of "9.4 Detailed derivation of the EM algorithm for Gaussian mixture models").
Similarly,
\begin{align}
\frac{\partial B_j}{\partial \boldsymbol{\mu}_i} & =\alpha_i P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)\boldsymbol{\Sigma}_i^{-1}(\boldsymbol{x}_j-\boldsymbol{\mu}_i) \tag{13.15}
\end{align}
By Eqs. (13.12), (13.13), (13.14) and (13.15),
\begin{align}
\frac{\partial L}{\partial \boldsymbol{\mu}_i} & =\frac{\partial \mathrm{LL}(\boldsymbol{\theta})}{\partial \boldsymbol{\mu}_i}\notag \\
& =\sum_{(\boldsymbol{x}_j,y_j)\in D_l}\frac{\partial}{\partial \boldsymbol{\mu}_i}\ln A_j+\sum_{\boldsymbol{x}_j\in D_u}\frac{\partial}{\partial \boldsymbol{\mu}_i}\ln B_j\quad \text{(by Eq. (13.12))}\notag \\
& =\sum_{(\boldsymbol{x}_j,y_j)\in D_l}\frac{1}{A_j}\frac{\partial A_j}{\partial \boldsymbol{\mu}_i}+\sum_{\boldsymbol{x}_j\in D_u}\frac{1}{B_j}\frac{\partial B_j}{\partial \boldsymbol{\mu}_i}\notag \\
& =\sum_{(\boldsymbol{x}_j,y_j)\in D_l}\frac{\alpha_i P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)}{A_j}\boldsymbol{\Sigma}_i^{-1}(\boldsymbol{x}_j-\boldsymbol{\mu}_i)\notag \\
& \qquad +\sum_{\boldsymbol{x}_j\in D_u}\frac{\alpha_i P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)}{B_j}\boldsymbol{\Sigma}_i^{-1}(\boldsymbol{x}_j-\boldsymbol{\mu}_i) \tag{13.16}
\end{align}
Let $f(\boldsymbol{x}_j)=\boldsymbol{\Sigma}_i^{-1}(\boldsymbol{x}_j-\boldsymbol{\mu}_i)$. Eq. (13.16) then becomes
\begin{align}
\frac{\partial L}{\partial \boldsymbol{\mu}_i} & =C_j[f]\,\big|_{f(\boldsymbol{x}_j)=\boldsymbol{\Sigma}_i^{-1}(\boldsymbol{x}_j-\boldsymbol{\mu}_i)}\quad \text{(by Eq. (13.10))}\notag \\
& =\sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji}f(\boldsymbol{x}_j)\,\big|_{f(\boldsymbol{x}_j)=\boldsymbol{\Sigma}_i^{-1}(\boldsymbol{x}_j-\boldsymbol{\mu}_i)}\quad \text{(by Eq. (13.11))}\notag \\
& =\boldsymbol{\Sigma}_i^{-1}\left(\sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji}(\boldsymbol{x}_j-\boldsymbol{\mu}_i)\right) \tag{13.17}
\end{align}
Setting $\frac{\partial L}{\partial \boldsymbol{\mu}_i}=\mathbf{0}$ gives
\begin{align}
\boldsymbol{\mu}_i=\frac{\sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji}\boldsymbol{x}_j}{\sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji}} \tag{13.18}
\end{align}
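Eq. (13.18) is a $\gamma$-weighted mean. The sketch below computes it for scalar samples and checks the stationarity condition $\sum_j \gamma_{ji}(\boldsymbol{x}_j-\boldsymbol{\mu}_i)=0$ that the update was derived from. The samples and the weights are made-up values, and `update_mean` is a hypothetical helper name.

```python
def update_mean(xs, weights):
    """Weighted mean: sum_j gamma_ji * x_j / sum_j gamma_ji, as in Eq. (13.18)."""
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)


# Samples in D_i ∪ D_u with assumed responsibilities gamma_ji for one component i.
xs = [1.0, 2.0, 4.0]
gammas = [0.5, 1.0, 0.5]

mu_i = update_mean(xs, gammas)
# At the updated mean the weighted residuals cancel.
residual = sum(g * (x - mu_i) for g, x in zip(gammas, xs))
print(mu_i, residual)
```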
(2) With respect to $\boldsymbol{\Sigma}_i$
\begin{align}
\frac{\partial A_j}{\partial \boldsymbol{\Sigma}_i} & =\frac{\partial}{\partial \boldsymbol{\Sigma}_i}\left[\alpha_i P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)+\sum_{k\neq i}\text{(terms independent of $\boldsymbol{\Sigma}_i$)}\right]\notag \\
& =\alpha_i P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)\frac{\partial}{\partial \boldsymbol{\Sigma}_i}P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)\notag \\
& =\frac{1}{2}\alpha_i P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)\boldsymbol{\Sigma}_i^{-1}\left[(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}}-\boldsymbol{\Sigma}_i\right]\boldsymbol{\Sigma}_i^{-1}\notag \\
& =\frac{1}{2}\alpha_i P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)f(\boldsymbol{x}_j) \tag{13.19}
\end{align}
The third line follows from Eq. (9.14) of "9.4 Detailed derivation of the EM algorithm for Gaussian mixture models", and $f(\boldsymbol{x}_j)=\boldsymbol{\Sigma}_i^{-1}\left[(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}}-\boldsymbol{\Sigma}_i\right]\boldsymbol{\Sigma}_i^{-1}$.
Similarly, with the same $f$,
\begin{align}
\frac{\partial B_j}{\partial \boldsymbol{\Sigma}_i} & =\frac{1}{2}\alpha_i P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)f(\boldsymbol{x}_j) \tag{13.20}
\end{align}
By Eqs. (13.13), (13.19) and (13.20), with $f(\boldsymbol{x}_j)=\boldsymbol{\Sigma}_i^{-1}\left[(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}}-\boldsymbol{\Sigma}_i\right]\boldsymbol{\Sigma}_i^{-1}$,
\begin{align}
\frac{\partial L}{\partial \boldsymbol{\Sigma}_i} & =\frac{\partial \mathrm{LL}(\boldsymbol{\theta})}{\partial \boldsymbol{\Sigma}_i}\notag \\
& =\sum_{(\boldsymbol{x}_j,y_j)\in D_l}\frac{\partial}{\partial \boldsymbol{\Sigma}_i}\ln A_j+\sum_{\boldsymbol{x}_j\in D_u}\frac{\partial}{\partial \boldsymbol{\Sigma}_i}\ln B_j\quad \text{(by Eq. (13.12))}\notag \\
& =\sum_{(\boldsymbol{x}_j,y_j)\in D_l}\frac{1}{A_j}\frac{\partial A_j}{\partial \boldsymbol{\Sigma}_i}+\sum_{\boldsymbol{x}_j\in D_u}\frac{1}{B_j}\frac{\partial B_j}{\partial \boldsymbol{\Sigma}_i}\notag \\
& =\frac{1}{2}\sum_{(\boldsymbol{x}_j,y_j)\in D_l}\frac{1}{A_j}\alpha_i P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)f(\boldsymbol{x}_j)\notag \\
& \qquad +\frac{1}{2}\sum_{\boldsymbol{x}_j\in D_u}\frac{1}{B_j}\alpha_i P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)f(\boldsymbol{x}_j)\notag \\
& =\frac{1}{2}C_j[f]\quad \text{(by Eq. (13.10))}\notag \\
& =\frac{1}{2}\sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji}f(\boldsymbol{x}_j)\quad \text{(by Eq. (13.11))}\notag \\
& =\frac{1}{2}\sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji}\boldsymbol{\Sigma}_i^{-1}\left[(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}}-\boldsymbol{\Sigma}_i\right]\boldsymbol{\Sigma}_i^{-1}\notag \\
& =\frac{1}{2}\boldsymbol{\Sigma}_i^{-1}\left(\sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji}\left[(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}}-\boldsymbol{\Sigma}_i\right]\right)\boldsymbol{\Sigma}_i^{-1} \tag{13.21}
\end{align}
Setting $\frac{\partial L}{\partial \boldsymbol{\Sigma}_i}=\mathbf{0}$ gives
\begin{align}
& \sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji}\left[(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}}-\boldsymbol{\Sigma}_i\right]=\mathbf{0}\notag \\
& \sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji}(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}}-\sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji}\boldsymbol{\Sigma}_i=\mathbf{0}\notag \\
& \boldsymbol{\Sigma}_i=\frac{\sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji}(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}}}{\sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji}} \tag{13.22}
\end{align}
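Eq. (13.22) is a $\gamma$-weighted average of outer products. A minimal dependency-free sketch follows, with vectors as Python lists; the helper name `update_covariance` and the four toy points are assumptions made for illustration.

```python
def update_covariance(xs, weights, mu):
    """Weighted covariance: sum_j g_j (x_j - mu)(x_j - mu)^T / sum_j g_j, as in Eq. (13.22)."""
    d = len(mu)
    total = sum(weights)
    cov = [[0.0] * d for _ in range(d)]
    for w, x in zip(weights, xs):
        diff = [x[a] - mu[a] for a in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += w * diff[a] * diff[b]   # accumulate the outer product
    return [[c / total for c in row] for row in cov]


# Four corner points of a square, all with weight 1, centred at (1, 1).
xs = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
gammas = [1.0, 1.0, 1.0, 1.0]
sigma_i = update_covariance(xs, gammas, mu=[1.0, 1.0])
print(sigma_i)
```

For this symmetric configuration the cross terms cancel, so the result is the identity matrix; with unequal weights the matrix stays symmetric but is generally not diagonal.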
(3) With respect to $\alpha_i$
\begin{align}
\frac{\partial A_j}{\partial \alpha_i} & =P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i) \tag{13.23}
\end{align}
Similarly,
\begin{align}
\frac{\partial B_j}{\partial \alpha_i} & =P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i) \tag{13.24}
\end{align}
By Eqs. (13.23) and (13.24),
\begin{align}
\frac{\partial \mathrm{LL}(\boldsymbol{\theta})}{\partial \alpha_i} & =\sum_{(\boldsymbol{x}_j,y_j)\in D_l}\frac{1}{A_j}\frac{\partial A_j}{\partial \alpha_i}+\sum_{\boldsymbol{x}_j\in D_u}\frac{1}{B_j}\frac{\partial B_j}{\partial \alpha_i}\notag \\
& =\sum_{(\boldsymbol{x}_j,y_j)\in D_l}\frac{1}{A_j}P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)+\sum_{\boldsymbol{x}_j\in D_u}\frac{1}{B_j}P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)\notag \\
& =\alpha_i^{-1}\sum_{(\boldsymbol{x}_j,y_j)\in D_l}\frac{1}{A_j}\alpha_i P(y_j\,|\,\Theta_j=i,\boldsymbol{x}_j)P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)\notag \\
& \qquad +\alpha_i^{-1}\sum_{\boldsymbol{x}_j\in D_u}\frac{1}{B_j}\alpha_i P(\boldsymbol{x}_j\,|\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)\notag \\
& =\alpha_i^{-1}C_j[f]\,\big|_{f(\boldsymbol{x}_j)=1}\quad \text{(by Eq. (13.10))}\notag \\
& =\alpha_i^{-1}\sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji}f(\boldsymbol{x}_j)\,\big|_{f(\boldsymbol{x}_j)=1}\quad \text{(by Eq. (13.11))}\notag \\
& =\alpha_i^{-1}\sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji} \tag{13.25}
\end{align}
By Eq. (13.13),
\begin{align}
\frac{\partial L}{\partial \alpha_i} & =\frac{\partial \mathrm{LL}(\boldsymbol{\theta})}{\partial \alpha_i}+\lambda \frac{\partial}{\partial \alpha_i}\left(\sum_{k=1}^N\alpha_k-1\right)\notag \\
& =\alpha_i^{-1}\sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji}+\lambda \tag{13.26}
\end{align}
Setting this to $0$ gives Eq. (13.27); multiplying by $\lambda$, summing over $i$ and using $\sum_{i=1}^N\alpha_i=1$ then gives Eq. (13.28):
\begin{align}
\alpha_i & =-\lambda^{-1}\sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji} \tag{13.27} \\
\lambda \alpha_i & =-\sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji}\notag \\
\sum_{i=1}^N\lambda \alpha_i & =-\sum_{i=1}^N\sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji}\notag \\
\lambda \sum_{i=1}^N\alpha_i & =-\sum_{i=1}^N\sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji}\notag \\
\lambda & =-\sum_{i=1}^N\sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji} \tag{13.28}
\end{align}
By Eqs. (13.27) and (13.28),
\begin{align}
\alpha_i=\frac{\sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji}}{\sum_{k=1}^N\sum_{\boldsymbol{x}_j\in D_k\bigcup D_u}\gamma_{jk}} \tag{13.29}
\end{align}
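In code, the mixing-coefficient update amounts to normalizing the per-component sums of responsibilities, and the Lagrange multiplier is the negative of the normalizing constant. A minimal sketch, in which `gamma_sums[i]` stands for $\sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji}$ and the numbers are invented for illustration:

```python
def update_alphas(gamma_sums):
    """Normalize per-component responsibility sums into mixing coefficients (Eq. 13.29)."""
    total = sum(gamma_sums)
    return [s / total for s in gamma_sums]


# Assumed per-component sums over D_i ∪ D_u for a two-component model.
gamma_sums = [3.0, 1.0]
alphas = update_alphas(gamma_sums)
lam = -sum(gamma_sums)          # the multiplier lambda of Eq. (13.28)
print(alphas, lam)
```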
(4) Model parameters and their simplification
Collecting Eqs. (13.18), (13.22) and (13.29) gives the model parameters
\begin{align}
\begin{cases}
\boldsymbol{\mu}_i=\dfrac{\sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji}\boldsymbol{x}_j}{\sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji}} \\
\boldsymbol{\Sigma}_i=\dfrac{\sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji}(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}}}{\sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji}} \\
\alpha_i=\dfrac{\sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji}}{\sum_{k=1}^N\sum_{\boldsymbol{x}_j\in D_k\bigcup D_u}\gamma_{jk}}
\end{cases} \tag{13.30}
\end{align}
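Here the parameters of component $i$ are: $\boldsymbol{\mu}_i$, a vector (the center); $\boldsymbol{\Sigma}_i$, a matrix (the covariance matrix of the sample set); and $\alpha_i$, a scalar (the proportion of the component).
Note also that the supervising labels $y_j$ do not appear explicitly in Eq. (13.30). Where, then, do they act? Their role is to determine the subsets $D_i$.
Under Assumption I, the model parameters are given by Eq. (13.30).
To simplify the computation, we add Assumption II on top of Assumption I; the resulting model parameters are [Watermelon Book Eqs. (13.6), (13.7) and (13.8)].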
Assumption II: each class corresponds to one mixture component
Suppose "if $\boldsymbol{x}_j$ belongs to class $i$, then $\boldsymbol{x}_j$ belongs to component $\Theta_j=i$". In probabilistic terms this reads
\begin{align}
P(\Theta_j=i\,|\,y_j=i)=1 \tag{13.31}
\end{align}
that is,
\begin{align}
P(\Theta_j=i\,|\,\boldsymbol{x}_j\in D_i)=1 \tag{13.32}
\end{align}
Substituting into Eq. (13.5) gives
\begin{align}
\gamma_{ji}=1,\quad (\text{if}\ \boldsymbol{x}_j\in D_i) \tag{13.33}
\end{align}
From Eq. (13.33), $\sum_{\boldsymbol{x}_j\in D_i}\gamma_{ji}=|D_i|$. Denote this by $l_i$; it is the number of samples in $D_l$ labeled as class $i$.
With this, the model parameters in Eq. (13.30) simplify as follows:
\begin{align}
\boldsymbol{\mu}_i & =\frac{\sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji}\boldsymbol{x}_j}{\sum_{\boldsymbol{x}_j\in D_i\bigcup D_u}\gamma_{ji}}\notag \\
& =\frac{\sum_{\boldsymbol{x}_j\in D_i}\gamma_{ji}\boldsymbol{x}_j+\sum_{\boldsymbol{x}_j\in D_u}\gamma_{ji}\boldsymbol{x}_j}{\sum_{\boldsymbol{x}_j\in D_i}\gamma_{ji}+\sum_{\boldsymbol{x}_j\in D_u}\gamma_{ji}}\notag \\
& =\frac{\sum_{\boldsymbol{x}_j\in D_i}\boldsymbol{x}_j+\sum_{\boldsymbol{x}_j\in D_u}\gamma_{ji}\boldsymbol{x}_j}{l_i+\sum_{\boldsymbol{x}_j\in D_u}\gamma_{ji}} \tag{13.34}
\end{align}
where $l_i=|D_i|$. In the same way,
\begin{align}
\boldsymbol{\Sigma}_i & =\frac{\sum_{\boldsymbol{x}_j\in D_i}(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}}+\sum_{\boldsymbol{x}_j\in D_u}\gamma_{ji}(\boldsymbol{x}_j-\boldsymbol{\mu}_i)(\boldsymbol{x}_j-\boldsymbol{\mu}_i)^{\mathrm{T}}}{l_i+\sum_{\boldsymbol{x}_j\in D_u}\gamma_{ji}} \tag{13.35} \\
\alpha_i & =\frac{l_i+\sum_{\boldsymbol{x}_j\in D_u}\gamma_{ji}}{\sum_{k=1}^N\left(l_k+\sum_{\boldsymbol{x}_j\in D_u}\gamma_{jk}\right)}\notag \\
& =\frac{l_i+\sum_{\boldsymbol{x}_j\in D_u}\gamma_{ji}}{l+\sum_{\boldsymbol{x}_j\in D_u}\sum_{k=1}^N\gamma_{jk}}\notag \\
& =\frac{l_i+\sum_{\boldsymbol{x}_j\in D_u}\gamma_{ji}}{l+\sum_{\boldsymbol{x}_j\in D_u}1}\notag \\
& =\frac{l_i+\sum_{\boldsymbol{x}_j\in D_u}\gamma_{ji}}{l+u} \tag{13.36}
\end{align}
Eqs. (13.34), (13.35) and (13.36) are exactly [Watermelon Book Eqs. (13.6), (13.7) and (13.8)]. When $D_l=\varnothing$ they reduce to [Watermelon Book Eqs. (9.34), (9.35) and (9.38)]; when $D_u=\varnothing$ they reduce to supervised learning.
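As a quick numerical check of the simplified updates, consider a hypothetical one-dimensional example (so $\boldsymbol{\Sigma}_1$ is a scalar) with $N=2$, $D_1=\{0,2\}$, $D_2=\{10\}$ and $D_u=\{1,9\}$, and suppose, as an idealization, that the E-step returned $(\gamma_{j1},\gamma_{j2})=(1,0)$ for the sample $1$ and $(0,1)$ for the sample $9$. Then $l=3$, $u=2$, $l_1=2$, and for component $1$:
\begin{align}
\mu_1 & =\frac{(0+2)+(1\cdot 1+0\cdot 9)}{2+(1+0)}=1\notag \\
\Sigma_1 & =\frac{(0-1)^2+(2-1)^2+1\cdot (1-1)^2+0\cdot (9-1)^2}{2+(1+0)}=\frac{2}{3}\notag \\
\alpha_1 & =\frac{2+(1+0)}{3+2}=\frac{3}{5}\notag
\end{align}
The same computation for component $2$ gives $\alpha_2=2/5$, so the mixing coefficients sum to $1$ as required.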
(5) Applying the EM algorithm to solve for the parameters
E-step: given the current parameters $(\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i,\alpha_i)_{i=1}^N$ and the sample set $\{\boldsymbol{x}_j\}_{j=1}^m$, compute $\gamma_{ji}$ by [Watermelon Book Eq. (13.5)].
M-step: based on $\gamma_{ji}$, update the parameters $(\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i,\alpha_i)_{i=1}^N$ by [Watermelon Book Eqs. (13.6), (13.7) and (13.8)].
The E-step and the M-step are iterated until convergence, which yields the model parameters $(\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i,\alpha_i)_{i=1}^N$; the model can then be used for prediction.
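The E-step/M-step loop can be sketched end to end in a few dozen lines. The version below is a minimal illustration rather than a reference implementation: it assumes one-dimensional features (scalar variances), 0-based class labels, initialization from the labeled class means with unit variances, a small variance floor `min_var`, and a stopping rule based on the change in the means. All of these choices, and every function name, are assumptions not taken from the text.

```python
import math


def gaussian_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def e_step(unlabeled, alphas, mus, variances):
    """gamma[j][i] for every unlabeled sample (posterior over components)."""
    gamma = []
    for x in unlabeled:
        joint = [a * gaussian_pdf(x, m, v) for a, m, v in zip(alphas, mus, variances)]
        total = sum(joint)
        gamma.append([w / total for w in joint])
    return gamma


def m_step(groups, unlabeled, gamma, min_var=1e-6):
    """Simplified updates: labeled samples count with weight 1, unlabeled with gamma."""
    l, u = sum(len(g) for g in groups), len(unlabeled)
    alphas, mus, variances = [], [], []
    for i, d_i in enumerate(groups):
        denom = len(d_i) + sum(gamma[j][i] for j in range(u))
        mu = (sum(d_i) + sum(gamma[j][i] * unlabeled[j] for j in range(u))) / denom
        var = (sum((x - mu) ** 2 for x in d_i)
               + sum(gamma[j][i] * (unlabeled[j] - mu) ** 2 for j in range(u))) / denom
        alphas.append(denom / (l + u))
        mus.append(mu)
        variances.append(max(var, min_var))   # floor avoids a degenerate component
    return alphas, mus, variances


def fit(groups, unlabeled, max_iter=100, tol=1e-8):
    """Alternate E- and M-steps until the means stop moving."""
    l = sum(len(g) for g in groups)
    alphas = [len(g) / l for g in groups]
    mus = [sum(g) / len(g) for g in groups]
    variances = [1.0] * len(groups)
    for _ in range(max_iter):
        gamma = e_step(unlabeled, alphas, mus, variances)
        new_alphas, new_mus, new_vars = m_step(groups, unlabeled, gamma)
        shift = max(abs(a - b) for a, b in zip(new_mus, mus))
        alphas, mus, variances = new_alphas, new_mus, new_vars
        if shift < tol:
            break
    return alphas, mus, variances


def predict(x, alphas, mus, variances):
    """Assign x to the component (= class) with the largest posterior."""
    joint = [a * gaussian_pdf(x, m, v) for a, m, v in zip(alphas, mus, variances)]
    return max(range(len(joint)), key=joint.__getitem__)


# groups[i] holds the labeled features of class i; the data are made up.
groups = [[0.0, 0.4], [5.0, 5.4]]
unlabeled = [-0.3, 0.1, 0.5, 4.7, 5.1, 5.5]
alphas, mus, variances = fit(groups, unlabeled)
print(alphas, mus, variances, predict(0.2, alphas, mus, variances))
```

Because each component is tied to a class, prediction reduces to picking the component with the largest posterior. A multivariate version would replace the scalar variance with a covariance matrix and the squared deviation with an outer product.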