【Watermelon Book Notes】10. Gaussian Mixture Models

Definition

A Gaussian mixture model defines the density

$$P(\boldsymbol{x})=\sum_{i=1}^{k} \alpha_{i} \cdot \phi\left(\boldsymbol{x} \mid \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right)$$
The model consists of k mixture components, each corresponding to one Gaussian distribution. Here $\boldsymbol{x} \in \mathbb{R}^{n}$, $\alpha_i$ is the mixing coefficient, with $\alpha_{i} \geq 0$ and $\sum_{i=1}^{k} \alpha_{i}=1$, and $\phi\left(\boldsymbol{x} \mid \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right)$ is the probability density function of the multivariate Gaussian distribution (replaced by the univariate Gaussian when $\boldsymbol{x}$ is a scalar):

$$\phi\left(\boldsymbol{x} \mid \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right)=\frac{1}{(2 \pi)^{\frac{n}{2}}\left|\boldsymbol{\Sigma}_{i}\right|^{\frac{1}{2}}} \exp \left(-\frac{1}{2}\left(\boldsymbol{x}-\boldsymbol{\mu}_{i}\right)^{T} \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}-\boldsymbol{\mu}_{i}\right)\right)$$
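To make the two formulas concrete, here is a minimal NumPy sketch that evaluates $\phi$ and the mixture density $P(\boldsymbol{x})$. The helper names `gaussian_pdf` and `gmm_pdf` are hypothetical, and the parameters are assumed to be stored as arrays: `alphas` with shape (k,), `mus` with shape (k, n), `sigmas` with shape (k, n, n).

```python
import numpy as np


def gaussian_pdf(X, mu, sigma):
    """Multivariate normal density phi(x | mu, Sigma), evaluated at every row of X."""
    X = np.atleast_2d(X)                        # shape (m, n)
    n = mu.shape[0]
    diff = X - mu                               # rows are x_j - mu
    sol = np.linalg.solve(sigma, diff.T).T      # rows are Sigma^{-1} (x_j - mu), no explicit inverse
    maha = np.sum(diff * sol, axis=1)           # (x_j - mu)^T Sigma^{-1} (x_j - mu)
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * maha) / norm


def gmm_pdf(X, alphas, mus, sigmas):
    """Mixture density P(x) = sum_i alpha_i * phi(x | mu_i, Sigma_i) at every row of X."""
    return sum(a * gaussian_pdf(X, mu, s) for a, mu, s in zip(alphas, mus, sigmas))


# Example: a two-component mixture in R^2 (parameters chosen only for illustration)
alphas = np.array([0.3, 0.7])
mus = np.array([[0.0, 0.0], [3.0, 3.0]])
sigmas = np.array([np.eye(2), 2.0 * np.eye(2)])
print(gmm_pdf(np.array([[0.0, 0.0], [3.0, 3.0]]), alphas, mus, sigmas))
```

For the scalar case mentioned above, pass `X` with shape (m, 1) and 1×1 covariance matrices.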
Data are generated as follows: first, choose the i-th Gaussian component with probability $\alpha_i$; then draw a sample from that component's distribution $\phi\left(\boldsymbol{x} \mid \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right)$.
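This two-step generative process (often called ancestral sampling) can be sketched with NumPy's `Generator` API, using `choice` for the component and `multivariate_normal` for the sample. The helper name `sample_gmm` is hypothetical, and component indices are 0-based in the code, whereas the text uses 1 to k.

```python
import numpy as np


def sample_gmm(m, alphas, mus, sigmas, rng):
    """Draw m samples: z_j ~ Categorical(alpha), then x_j ~ N(mu_{z_j}, Sigma_{z_j})."""
    k, n = mus.shape
    z = rng.choice(k, size=m, p=alphas)          # step 1: pick a component (0-based) for each sample
    X = np.empty((m, n))
    for i in range(k):
        idx = np.flatnonzero(z == i)             # samples assigned to component i
        X[idx] = rng.multivariate_normal(mus[i], sigmas[i], size=idx.size)  # step 2
    return X, z


rng = np.random.default_rng(0)
alphas = np.array([0.3, 0.7])
mus = np.array([[0.0, 0.0], [3.0, 3.0]])
sigmas = np.array([np.eye(2), 2.0 * np.eye(2)])
X, z = sample_gmm(20000, alphas, mus, sigmas, rng)
print(X.shape, np.bincount(z) / len(z))   # component frequencies should be close to alphas
```

In parameter estimation only `X` is observed; `z` is exactly the hidden variable introduced below.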

Parameter estimation

The EM algorithm

Suppose every sample in the data set $D=\left\{\boldsymbol{x}_{1}, \boldsymbol{x}_{2}, \ldots, \boldsymbol{x}_{m}\right\}$ was generated by some Gaussian mixture model, but which component generated each sample $\boldsymbol{x}_{j}$ is unknown. This is a latent variable, which we denote $z_{j} \in\{1,2, \ldots, k\}$: the index of the Gaussian component that generated $\boldsymbol{x}_{j}$. From the generative process above, the distribution of $z_j$ is $P\left(z_{j}=i\right)=\alpha_{i}$. We now apply the EM algorithm.

E step: write out the Q function and substitute the sample sequence and the latent-variable sequence into it

$$
\begin{aligned}
Q\left(\theta \mid \theta^{(i)}\right) &=\sum_{Z} P\left(Z \mid X, \theta^{(i)}\right) \ln P(X, Z \mid \theta) \\
&=\sum_{z_{1}, z_{2}, \ldots, z_{m}}\left\{\prod_{j=1}^{m} P\left(z_{j} \mid \boldsymbol{x}_{j}, \theta^{(i)}\right) \ln \left[\prod_{j=1}^{m} P\left(\boldsymbol{x}_{j}, z_{j} \mid \theta\right)\right]\right\} \\
&=\sum_{j=1}^{m}\left[\sum_{z_{j}} P\left(z_{j} \mid \boldsymbol{x}_{j}, \theta^{(i)}\right) \ln P\left(\boldsymbol{x}_{j}, z_{j} \mid \theta\right)\right] \\
&=\sum_{j=1}^{m}\left[\sum_{i=1}^{k} P\left(z_{j}=i \mid \boldsymbol{x}_{j}, \theta^{(i)}\right) \ln P\left(\boldsymbol{x}_{j}, z_{j}=i \mid \theta\right)\right]
\end{aligned}
$$

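The step from the second to the third equality uses the result derived in the EM algorithm notes: because the samples are independent, the logarithm of the product becomes a sum over j, and for each term the sum over the latent variables of all the other samples equals 1.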
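For tiny m and k the second-to-third step can be spot-checked by brute force, enumerating all $k^m$ configurations of $Z$. In this sanity-check sketch the table `post[j, i]` stands for $P(z_j=i \mid \boldsymbol{x}_j, \theta^{(i)})$ and `log_joint[j, i]` for $\ln P(\boldsymbol{x}_j, z_j=i \mid \theta)$; both are assumed to be filled with random numbers, since the identity holds for any rows of `post` that sum to 1. The function names are hypothetical.

```python
import itertools

import numpy as np

rng = np.random.default_rng(1)
m, k = 3, 2
post = rng.random((m, k))
post /= post.sum(axis=1, keepdims=True)      # P(z_j = i | x_j, theta^(i)); each row sums to 1
log_joint = np.log(rng.random((m, k)))       # ln P(x_j, z_j = i | theta); arbitrary values


def q_brute_force(post, log_joint):
    """Second line: explicit sum over every configuration Z = (z_1, ..., z_m)."""
    m, k = post.shape
    total = 0.0
    for z in itertools.product(range(k), repeat=m):
        weight = np.prod([post[j, z[j]] for j in range(m)])          # prod_j P(z_j | x_j, theta^(i))
        total += weight * sum(log_joint[j, z[j]] for j in range(m))  # ln of a product = sum of lns
    return total


def q_decomposed(post, log_joint):
    """Fourth line: sum_j sum_i P(z_j = i | x_j, theta^(i)) ln P(x_j, z_j = i | theta)."""
    return float(np.sum(post * log_joint))


print(q_brute_force(post, log_joint), q_decomposed(post, log_joint))
```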

For $P\left(z_{j}=i \mid \boldsymbol{x}_{j}, \theta^{(i)}\right)$, if we ignore $\theta^{(i)}$ for the moment, Bayes' rule gives

$$
\begin{aligned}
P\left(z_{j}=i \mid \boldsymbol{x}_{j}\right) &=\frac{P\left(z_{j}=i\right) \cdot P\left(\boldsymbol{x}_{j} \mid z_{j}=i\right)}{P\left(\boldsymbol{x}_{j}\right)} \\
&=\frac{\alpha_{i} \cdot \phi\left(\boldsymbol{x}_{j} \mid \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right)}{\sum_{l=1}^{k} \alpha_{l} \cdot \phi\left(\boldsymbol{x}_{j} \mid \boldsymbol{\mu}_{l}, \boldsymbol{\Sigma}_{l}\right)}
\end{aligned}
$$
This is Eq. 9.30 in the Watermelon Book. Taking $\theta^{(i)}$ into account,

$$P\left(z_{j}=i \mid \boldsymbol{x}_{j}, \theta^{(i)}\right)=\frac{\alpha_{i}^{(i)} \cdot \phi\left(\boldsymbol{x}_{j} \mid \boldsymbol{\mu}_{i}^{(i)}, \boldsymbol{\Sigma}_{i}^{(i)}\right)}{\sum_{l=1}^{k} \alpha_{l}^{(i)} \cdot \phi\left(\boldsymbol{x}_{j} \mid \boldsymbol{\mu}_{l}^{(i)}, \boldsymbol{\Sigma}_{l}^{(i)}\right)}$$
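Here the superscript $(i)$ denotes the i-th iteration (not the component index i in the subscripts), and it also signals that these parameters are already known. Every quantity in the expression is therefore known, so the probability itself is a known constant, which we abbreviate as $\gamma_{ji}$.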
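Computing $\gamma_{ji}$ for every sample and every component is the whole E step. A minimal NumPy sketch, assuming the parameters are stored as arrays (`alphas` (k,), `mus` (k, n), `sigmas` (k, n, n)); `gaussian_pdf` and `e_step` are hypothetical helper names, and column i of `gamma` uses a 0-based component index.

```python
import numpy as np


def gaussian_pdf(X, mu, sigma):
    """phi(x | mu, Sigma) evaluated at every row of X (shape (m, n))."""
    n = mu.shape[0]
    diff = X - mu
    maha = np.sum(diff * np.linalg.solve(sigma, diff.T).T, axis=1)
    return np.exp(-0.5 * maha) / np.sqrt((2 * np.pi) ** n * np.linalg.det(sigma))


def e_step(X, alphas, mus, sigmas):
    """gamma[j, i] = alpha_i phi(x_j | mu_i, Sigma_i) / sum_l alpha_l phi(x_j | mu_l, Sigma_l)."""
    weighted = np.column_stack(
        [a * gaussian_pdf(X, mu, s) for a, mu, s in zip(alphas, mus, sigmas)]
    )                                                  # shape (m, k): the numerators
    return weighted / weighted.sum(axis=1, keepdims=True)


X = np.array([[0.0, 0.0], [2.5, 2.5], [5.0, 5.0]])
gamma = e_step(X, np.array([0.5, 0.5]), np.array([[0.0, 0.0], [5.0, 5.0]]),
               np.array([np.eye(2), np.eye(2)]))
print(gamma.round(4))   # the midpoint [2.5, 2.5] is split evenly between the two components
```

For points far from every component the numerators can underflow to zero; computing them in log space and normalizing with log-sum-exp is the usual remedy.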

For $P\left(\boldsymbol{x}_{j}, z_{j}=i \mid \theta\right)$, the product rule $P(A, B)=P(A \mid B) \cdot P(B)$ gives

$$
\begin{aligned}
P\left(\boldsymbol{x}_{j}, z_{j}=i \mid \theta\right) &=P\left(\boldsymbol{x}_{j} \mid z_{j}=i, \theta\right) \cdot P\left(z_{j}=i \mid \theta\right) \\
&=\phi\left(\boldsymbol{x}_{j} \mid \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right) \cdot \alpha_{i}
\end{aligned}
$$

In this expression all three parameters $\alpha_i$, $\boldsymbol{\mu}_i$, $\boldsymbol{\Sigma}_i$ are the unknowns to be optimized. Substituting the two expressions above back into the Q function gives

$$
\begin{aligned}
Q\left(\theta \mid \theta^{(i)}\right) &=\sum_{j=1}^{m}\left[\sum_{i=1}^{k} P\left(z_{j}=i \mid \boldsymbol{x}_{j}, \theta^{(i)}\right) \ln P\left(\boldsymbol{x}_{j}, z_{j}=i \mid \theta\right)\right] \\
&=\sum_{j=1}^{m} \sum_{i=1}^{k} \gamma_{j i} \ln \left[\alpha_{i} \cdot \phi\left(\boldsymbol{x}_{j} \mid \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right)\right] \\
&=\sum_{j=1}^{m} \sum_{i=1}^{k} \gamma_{j i}\left[\ln \alpha_{i}+\ln \phi\left(\boldsymbol{x}_{j} \mid \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right)\right] \\
&=\sum_{j=1}^{m} \sum_{i=1}^{k}\left[\gamma_{j i} \ln \alpha_{i}+\gamma_{j i} \ln \phi\left(\boldsymbol{x}_{j} \mid \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right)\right] \\
&=\sum_{j=1}^{m} \sum_{i=1}^{k}\left\{\gamma_{j i} \ln \alpha_{i}+\gamma_{j i} \ln \left[\frac{1}{(2 \pi)^{\frac{n}{2}}\left|\boldsymbol{\Sigma}_{i}\right|^{\frac{1}{2}}} \exp \left(-\frac{1}{2}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{T} \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\right)\right]\right\} \\
&=\sum_{j=1}^{m} \sum_{i=1}^{k}\left\{\gamma_{j i} \ln \alpha_{i}+\gamma_{j i}\left[\ln \frac{1}{(2 \pi)^{\frac{n}{2}}}-\frac{1}{2} \ln \left|\boldsymbol{\Sigma}_{i}\right|-\frac{1}{2}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{T} \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\right]\right\} \\
&=\sum_{j=1}^{m} \sum_{i=1}^{k}\left\{\gamma_{j i} \ln \alpha_{i}+\gamma_{j i} \ln \frac{1}{(2 \pi)^{\frac{n}{2}}}-\frac{1}{2} \gamma_{j i} \ln \left|\boldsymbol{\Sigma}_{i}\right|-\frac{1}{2} \gamma_{j i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{T} \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\right\}
\end{aligned}
$$
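To confirm that no term was lost in the expansion, the second and last lines can be compared numerically. This NumPy sketch assumes random data, an arbitrary `gamma` whose rows sum to 1 standing in for $\gamma_{ji}$, and hypothetical function names; it uses `np.linalg.slogdet` for $\ln|\boldsymbol{\Sigma}_i|$.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, k = 8, 2, 3
X = rng.normal(size=(m, n))
gamma = rng.random((m, k))
gamma /= gamma.sum(axis=1, keepdims=True)        # stands in for the known gamma_ji
alphas = np.array([0.2, 0.3, 0.5])
mus = rng.normal(size=(k, n))
sigmas = np.array([np.eye(n) * (i + 1) for i in range(k)])


def quad_form(X, mu, sigma):
    """(x_j - mu)^T Sigma^{-1} (x_j - mu) for every row x_j of X."""
    diff = X - mu
    return np.sum(diff * np.linalg.solve(sigma, diff.T).T, axis=1)


def q_compact(X, gamma, alphas, mus, sigmas):
    """Second line: sum_j sum_i gamma_ji ln[alpha_i phi(x_j | mu_i, Sigma_i)]."""
    n = X.shape[1]
    total = 0.0
    for i, (a, mu, s) in enumerate(zip(alphas, mus, sigmas)):
        phi = np.exp(-0.5 * quad_form(X, mu, s)) / np.sqrt((2 * np.pi) ** n * np.linalg.det(s))
        total += np.sum(gamma[:, i] * np.log(a * phi))
    return total


def q_expanded(X, gamma, alphas, mus, sigmas):
    """Last line: gamma ln alpha + gamma ln (2pi)^(-n/2) - 1/2 gamma ln|Sigma| - 1/2 gamma * quad."""
    n = X.shape[1]
    total = 0.0
    for i, (a, mu, s) in enumerate(zip(alphas, mus, sigmas)):
        g = gamma[:, i]
        logdet = np.linalg.slogdet(s)[1]
        total += np.sum(g * np.log(a) + g * np.log(1 / (2 * np.pi) ** (n / 2))
                        - 0.5 * g * logdet - 0.5 * g * quad_form(X, mu, s))
    return total


print(q_compact(X, gamma, alphas, mus, sigmas), q_expanded(X, gamma, alphas, mus, sigmas))
```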
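Next we maximize the Q function. Compare the fourth equality above with the log-likelihood of m samples drawn from a single multivariate normal distribution, $\sum_{j=1}^{m}\ln\phi(\boldsymbol{x}_{j}\mid \boldsymbol{\mu}, \boldsymbol{\Sigma})$: the only differences are the extra constant weights $\gamma_{ji}$ and the $\ln\alpha_i$ terms. Note also that although there is a sum $\sum_{i=1}^{k}$, when we maximize with respect to a particular $(\boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i})$, the parameters $(\boldsymbol{\mu}_{l}, \boldsymbol{\Sigma}_{l})$ with $l \neq i$ are constants, so the sum over components can be ignored during the maximization. As for $\alpha_i$: the part of Q that depends on it, $\sum_{j}\sum_{i}\gamma_{ji}\ln \alpha_i$, is concave, and the equality constraint $\sum_{i=1}^{k} \alpha_{i}=1$ is linear, so a stationary point found with the method of Lagrange multipliers is guaranteed to be the maximum (the remaining constraint $\alpha_{i} \geq 0$ is checked afterwards). We therefore use Lagrange multipliers to solve for $\alpha_i$.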
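The concavity argument can be spot-checked numerically. For fixed responsibilities, the α-dependent part of Q is $\sum_i N_i \ln \alpha_i$ with $N_i=\sum_j \gamma_{ji}$, and the closed form the M step below arrives at is $\alpha_i = N_i/m$. This sketch assumes random responsibilities and compares that point against random points of the simplex drawn with NumPy's `Generator.dirichlet`; the names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
m, k = 50, 4
gamma = rng.random((m, k))
gamma /= gamma.sum(axis=1, keepdims=True)     # stands in for the known gamma_ji
N = gamma.sum(axis=0)                          # N_i = sum_j gamma_ji


def alpha_part_of_q(alpha):
    """The only alpha-dependent part of Q: sum_j sum_i gamma_ji ln alpha_i = sum_i N_i ln alpha_i."""
    return float(np.sum(N * np.log(alpha)))


alpha_star = N / m                                   # stationary point of the Lagrangian
candidates = rng.dirichlet(np.ones(k), size=1000)    # random points with alpha_i >= 0, sum = 1
best_random = max(alpha_part_of_q(a) for a in candidates)
print(alpha_part_of_q(alpha_star), best_random)      # the stationary point should be larger
```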

M step: find the $\theta^{(i+1)}$ that maximizes the Q function

Solving for $\boldsymbol{\mu}_{i}^{(i+1)}$: take the partial derivative of the Q function with respect to $\boldsymbol{\mu}_{i}$. The first three terms of Q do not depend on $\boldsymbol{\mu}_{i}$, and in the last step the term $\boldsymbol{x}_{j}^{T} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{x}_{j}$, which does not depend on it either, is dropped:

$$
\begin{aligned}
\frac{\partial Q\left(\theta, \theta^{(i)}\right)}{\partial \boldsymbol{\mu}_{i}} &=\sum_{j=1}^{m}\left\{0+0-0-\frac{1}{2} \gamma_{j i} \frac{\partial\left(\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{T} \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\right)}{\partial \boldsymbol{\mu}_{i}}\right\} \\
&=-\sum_{j=1}^{m} \frac{1}{2} \gamma_{j i} \frac{\partial\left(\boldsymbol{x}_{j}^{T} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{x}_{j}-\boldsymbol{x}_{j}^{T} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{\mu}_{i}-\boldsymbol{\mu}_{i}^{T} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{x}_{j}+\boldsymbol{\mu}_{i}^{T} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{\mu}_{i}\right)}{\partial \boldsymbol{\mu}_{i}} \\
&=-\sum_{j=1}^{m} \frac{1}{2} \gamma_{j i} \frac{\partial\left(-\boldsymbol{x}_{j}^{T} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{\mu}_{i}-\boldsymbol{\mu}_{i}^{T} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{x}_{j}+\boldsymbol{\mu}_{i}^{T} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{\mu}_{i}\right)}{\partial \boldsymbol{\mu}_{i}}
\end{aligned}
$$
Since $\boldsymbol{x}_{j}^{T} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{\mu}_{i}$ and $\boldsymbol{\mu}_{i}^{T} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{x}_{j}$ are both scalars, a scalar equals its own transpose, and $\boldsymbol{\Sigma}_{i}$ is symmetric, we have

$$\boldsymbol{x}_{j}^{T} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{\mu}_{i}=\left(\boldsymbol{x}_{j}^{T} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{\mu}_{i}\right)^{T}=\boldsymbol{\mu}_{i}^{T}\left(\boldsymbol{\Sigma}_{i}^{-1}\right)^{T} \boldsymbol{x}_{j}=\boldsymbol{\mu}_{i}^{T}\left(\boldsymbol{\Sigma}_{i}^{T}\right)^{-1} \boldsymbol{x}_{j}=\boldsymbol{\mu}_{i}^{T} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{x}_{j}$$
Substituting this into the expression above gives

$$
\begin{aligned}
\frac{\partial Q\left(\theta, \theta^{(i)}\right)}{\partial \boldsymbol{\mu}_{i}} &=-\sum_{j=1}^{m} \frac{1}{2} \gamma_{j i} \frac{\partial\left(-\boldsymbol{x}_{j}^{T} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{\mu}_{i}-\boldsymbol{\mu}_{i}^{T} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{x}_{j}+\boldsymbol{\mu}_{i}^{T} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{\mu}_{i}\right)}{\partial \boldsymbol{\mu}_{i}} \\
&=-\sum_{j=1}^{m} \frac{1}{2} \gamma_{j i} \frac{\partial\left(-2 \boldsymbol{\mu}_{i}^{T} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{x}_{j}+\boldsymbol{\mu}_{i}^{T} \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{\mu}_{i}\right)}{\partial \boldsymbol{\mu}_{i}}
\end{aligned}
$$
Then, using the matrix derivative identities $\dfrac{\partial \boldsymbol{x}^{T} \boldsymbol{a}}{\partial \boldsymbol{x}}=\boldsymbol{a}$ and $\dfrac{\partial \boldsymbol{x}^{T} \mathbf{B} \boldsymbol{x}}{\partial \boldsymbol{x}}=\left(\mathbf{B}+\mathbf{B}^{T}\right) \boldsymbol{x}$, together with the symmetry of $\boldsymbol{\Sigma}_{i}^{-1}$ (so that $\mathbf{B}+\mathbf{B}^{T}=2\boldsymbol{\Sigma}_{i}^{-1}$), we get

$$\frac{\partial Q\left(\theta, \theta^{(i)}\right)}{\partial \boldsymbol{\mu}_{i}}=\sum_{j=1}^{m} \frac{1}{2} \gamma_{j i}\left(2 \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{x}_{j}-2 \boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{\mu}_{i}\right)=\sum_{j=1}^{m} \gamma_{j i}\left(\boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{x}_{j}-\boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{\mu}_{i}\right)$$
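The gradient formula can be verified against central finite differences. This sketch assumes random data, a single fixed component i whose weights `g` stand in for $\gamma_{ji}$, and a randomly generated symmetric positive-definite $\boldsymbol{\Sigma}_i$; all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 3
X = rng.normal(size=(m, n))
g = rng.random(m)                         # gamma_ji for one fixed component i
A = rng.normal(size=(n, n))
sigma = A @ A.T + n * np.eye(n)           # symmetric positive-definite Sigma_i
mu = rng.normal(size=n)


def q_mu(mu):
    """mu_i-dependent part of Q: -1/2 sum_j gamma_ji (x_j - mu)^T Sigma_i^{-1} (x_j - mu)."""
    diff = X - mu
    quad = np.sum(diff * np.linalg.solve(sigma, diff.T).T, axis=1)
    return float(-0.5 * np.sum(g * quad))


def grad_analytic(mu):
    """The derived gradient: sum_j gamma_ji (Sigma_i^{-1} x_j - Sigma_i^{-1} mu)."""
    sigma_inv = np.linalg.inv(sigma)
    return sum(gj * (sigma_inv @ xj - sigma_inv @ mu) for gj, xj in zip(g, X))


def grad_numeric(f, v, h=1e-6):
    """Central-difference approximation of the gradient of f at v."""
    grad = np.zeros_like(v)
    for t in range(v.size):
        e = np.zeros_like(v)
        e[t] = h
        grad[t] = (f(v + e) - f(v - e)) / (2 * h)
    return grad


print(grad_analytic(mu), grad_numeric(q_mu, mu))
```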

Setting this to zero (and, in the last step, left-multiplying by the invertible matrix $\boldsymbol{\Sigma}_{i}$) gives

$$
\begin{gathered}
\sum_{j=1}^{m} \gamma_{j i}\left(\boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{x}_{j}-\boldsymbol{\Sigma}_{i}^{-1} \boldsymbol{\mu}_{i}\right)=0 \\
\boldsymbol{\Sigma}_{i}^{-1} \cdot \sum_{j=1}^{m} \gamma_{j i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)=0 \\
\sum_{j=1}^{m} \gamma_{j i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)=0
\end{gathered}
$$

$$\boldsymbol{\mu}_{i}=\frac{\sum_{j=1}^{m} \gamma_{j i} \boldsymbol{x}_{j}}{\sum_{j=1}^{m} \gamma_{j i}} \Rightarrow \boldsymbol{\mu}_{i}^{(i+1)}=\frac{\sum_{j=1}^{m} \gamma_{j i} \boldsymbol{x}_{j}}{\sum_{j=1}^{m} \gamma_{j i}}$$

This is Eq. 9.34 in the Watermelon Book.
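The mean update vectorizes over all components at once. In this sketch (hypothetical name `update_mu`, random data assumed), `gamma` has shape (m, k) and `X` has shape (m, n), and the stationarity condition $\sum_j \gamma_{ji}(\boldsymbol{x}_j - \boldsymbol{\mu}_i) = 0$ is checked directly.

```python
import numpy as np


def update_mu(X, gamma):
    """mu_i^(i+1) = sum_j gamma_ji x_j / sum_j gamma_ji, for all k components at once."""
    return (gamma.T @ X) / gamma.sum(axis=0)[:, None]    # (k, n) divided row-wise by (k, 1)


rng = np.random.default_rng(5)
X = rng.normal(size=(10, 2))
gamma = rng.random((10, 3))
gamma /= gamma.sum(axis=1, keepdims=True)
mus_new = update_mu(X, gamma)
# stationarity check: sum_j gamma_ji (x_j - mu_i) should vanish for every component i
residual = np.array([(gamma[:, i, None] * (X - mus_new[i])).sum(axis=0) for i in range(3)])
print(np.abs(residual).max())
```

With hard 0/1 responsibilities the update reduces to ordinary per-cluster means, which is one way to see the connection between GMM and k-means.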

Solving for $\boldsymbol{\Sigma}_{i}^{(i+1)}$: take the partial derivative of the Q function with respect to $\boldsymbol{\Sigma}_{i}$

$$
\begin{aligned}
\frac{\partial Q\left(\theta, \theta^{(i)}\right)}{\partial \boldsymbol{\Sigma}_{i}} &=\sum_{j=1}^{m}\left\{0+0-\frac{\partial}{\partial \boldsymbol{\Sigma}_{i}}\left(\frac{1}{2} \gamma_{j i} \ln \left|\boldsymbol{\Sigma}_{i}\right|\right)-\frac{\partial}{\partial \boldsymbol{\Sigma}_{i}}\left[\frac{1}{2} \gamma_{j i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{T} \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\right]\right\} \\
&=\sum_{j=1}^{m}\left\{-\frac{1}{2} \gamma_{j i} \frac{\partial\left(\ln \left|\boldsymbol{\Sigma}_{i}\right|\right)}{\partial \boldsymbol{\Sigma}_{i}}-\frac{1}{2} \gamma_{j i} \frac{\partial\left[\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{T} \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\right]}{\partial \boldsymbol{\Sigma}_{i}}\right\}
\end{aligned}
$$
Using the matrix derivative identities $\dfrac{\partial|\mathbf{X}|}{\partial \mathbf{X}}=|\mathbf{X}| \cdot\left(\mathbf{X}^{-1}\right)^{T}$ and $\dfrac{\partial \boldsymbol{a}^{T} \mathbf{X}^{-1} \boldsymbol{b}}{\partial \mathbf{X}}=-\mathbf{X}^{-T} \boldsymbol{a} \boldsymbol{b}^{T} \mathbf{X}^{-T}$, the chain rule for $\ln|\boldsymbol{\Sigma}_{i}|$, and the fact that $\boldsymbol{\Sigma}_{i}$ is symmetric (so transposing its inverse changes nothing), we get

$$
\begin{aligned}
\frac{\partial Q\left(\theta, \theta^{(i)}\right)}{\partial \boldsymbol{\Sigma}_{i}} &=\sum_{j=1}^{m}\left\{-\frac{1}{2} \gamma_{j i} \cdot \frac{1}{\left|\boldsymbol{\Sigma}_{i}\right|} \cdot\left|\boldsymbol{\Sigma}_{i}\right| \cdot\left(\boldsymbol{\Sigma}_{i}^{-1}\right)^{T}-\frac{1}{2} \gamma_{j i} \cdot\left(-\boldsymbol{\Sigma}_{i}^{-T}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{T} \boldsymbol{\Sigma}_{i}^{-T}\right)\right\} \\
&=\sum_{j=1}^{m}\left\{-\frac{1}{2} \gamma_{j i} \boldsymbol{\Sigma}_{i}^{-1}+\frac{1}{2} \gamma_{j i} \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{T} \boldsymbol{\Sigma}_{i}^{-1}\right\}
\end{aligned}
$$
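This gradient can also be checked with central finite differences. Like the derivation, the sketch treats every entry of $\boldsymbol{\Sigma}_i$ as an independent variable (ignoring the symmetry constraint); at a symmetric $\boldsymbol{\Sigma}_i$ the derived gradient is itself symmetric, so both viewpoints give the same stationary point. Random data, a fixed component with weights `g`, and the function names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 6, 3
X = rng.normal(size=(m, n))
g = rng.random(m)                     # gamma_ji for one fixed component i
mu = rng.normal(size=n)
A = rng.normal(size=(n, n))
sigma = A @ A.T + n * np.eye(n)       # symmetric positive-definite Sigma_i


def q_sigma(S):
    """Sigma-dependent part of Q: sum_j [-1/2 g_j ln|S| - 1/2 g_j (x_j - mu)^T S^{-1} (x_j - mu)]."""
    diff = X - mu
    logdet = np.linalg.slogdet(S)[1]
    quad = np.sum(diff * np.linalg.solve(S, diff.T).T, axis=1)
    return float(np.sum(-0.5 * g * logdet - 0.5 * g * quad))


def grad_analytic(S):
    """The derived gradient: sum_j {-1/2 g_j S^{-1} + 1/2 g_j S^{-1} (x_j - mu)(x_j - mu)^T S^{-1}}."""
    S_inv = np.linalg.inv(S)
    G = np.zeros_like(S)
    for gj, xj in zip(g, X):
        d = (xj - mu)[:, None]        # column vector x_j - mu
        G += -0.5 * gj * S_inv + 0.5 * gj * S_inv @ d @ d.T @ S_inv
    return G


def grad_numeric(f, S, h=1e-6):
    """Central differences, perturbing each entry of S independently."""
    G = np.zeros_like(S)
    for r in range(S.shape[0]):
        for c in range(S.shape[1]):
            E = np.zeros_like(S)
            E[r, c] = h
            G[r, c] = (f(S + E) - f(S - E)) / (2 * h)
    return G


print(np.abs(grad_analytic(sigma) - grad_numeric(q_sigma, sigma)).max())
```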
Setting this to zero and right-multiplying by $\boldsymbol{\Sigma}_{i}$ (from the first line to the second) gives

$$
\begin{aligned}
&\sum_{j=1}^{m}\left\{-\frac{1}{2} \gamma_{j i} \boldsymbol{\Sigma}_{i}^{-1}+\frac{1}{2} \gamma_{j i} \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{T} \boldsymbol{\Sigma}_{i}^{-1}\right\}=\mathbf{0} \\
&\sum_{j=1}^{m}\left\{-\frac{1}{2} \gamma_{j i} \mathbf{I}+\frac{1}{2} \gamma_{j i} \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{T}\right\}=\mathbf{0} \\
&\frac{1}{2} \sum_{j=1}^{m} \gamma_{j i} \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{T}=\frac{1}{2}\left(\sum_{j=1}^{m} \gamma_{j i}\right) \mathbf{I} \\
&\boldsymbol{\Sigma}_{i}^{-1} \sum_{j=1}^{m} \gamma_{j i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{T}=\left(\sum_{j=1}^{m} \gamma_{j i}\right) \mathbf{I}
\end{aligned}
$$

Left-multiplying both sides by $\boldsymbol{\Sigma}_{i}$ gives

$$
\begin{gathered}
\sum_{j=1}^{m} \gamma_{j i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{T}=\left(\sum_{j=1}^{m} \gamma_{j i}\right) \boldsymbol{\Sigma}_{i} \\
\boldsymbol{\Sigma}_{i}=\frac{\sum_{j=1}^{m} \gamma_{j i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{T}}{\sum_{j=1}^{m} \gamma_{j i}} \Rightarrow \boldsymbol{\Sigma}_{i}^{(i+1)}=\frac{\sum_{j=1}^{m} \gamma_{j i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}^{(i+1)}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}^{(i+1)}\right)^{T}}{\sum_{j=1}^{m} \gamma_{j i}}
\end{gathered}
$$

Here the mean is the updated value $\boldsymbol{\mu}_{i}^{(i+1)}$ obtained above, because both stationarity conditions must hold at the maximizer.

This is Eq. 9.35 in the Watermelon Book.
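A minimal NumPy sketch of the covariance update, assuming `gamma` of shape (m, k) and the already-updated means `mus` of shape (k, n); `update_sigma` is a hypothetical name and the data are random for illustration. With a single component and all responsibilities equal to 1, the update reduces to the maximum-likelihood (biased) sample covariance.

```python
import numpy as np


def update_sigma(X, gamma, mus):
    """Sigma_i^(i+1) = sum_j gamma_ji (x_j - mu_i)(x_j - mu_i)^T / sum_j gamma_ji, for each i."""
    sigmas = []
    for i in range(gamma.shape[1]):
        diff = X - mus[i]                          # rows are x_j - mu_i
        weighted = gamma[:, i, None] * diff        # rows are gamma_ji (x_j - mu_i)
        sigmas.append(weighted.T @ diff / gamma[:, i].sum())
    return np.array(sigmas)


rng = np.random.default_rng(7)
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.0], [0.5, 1.0]])
gamma = rng.random((200, 2))
gamma /= gamma.sum(axis=1, keepdims=True)
mus_new = (gamma.T @ X) / gamma.sum(axis=0)[:, None]     # Eq. 9.34 first
sigmas_new = update_sigma(X, gamma, mus_new)             # then Eq. 9.35 with the new means
print(sigmas_new)
```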

Solving for $\alpha_{i}^{(i+1)}$: because $\alpha_i$ is subject to the constraint $\sum_{i=1}^{k} \alpha_{i}=1$, we use the method of Lagrange multipliers. The Lagrangian is

$$
\begin{aligned}
L(\boldsymbol{\alpha}, \lambda) &=Q\left(\theta, \theta^{(i)}\right)+\lambda\left(\sum_{i=1}^{k} \alpha_{i}-1\right) \\
&=\sum_{j=1}^{m} \sum_{i=1}^{k}\left\{\gamma_{j i} \ln \alpha_{i}+\gamma_{j i} \ln \frac{1}{(2 \pi)^{\frac{n}{2}}}-\frac{1}{2} \gamma_{j i} \ln \left|\boldsymbol{\Sigma}_{i}\right|-\frac{1}{2} \gamma_{j i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{T} \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\right\}+\lambda\left(\sum_{i=1}^{k} \alpha_{i}-1\right)
\end{aligned}
$$

Take the partial derivative of the Lagrangian with respect to $\alpha_i$:

$$\frac{\partial L(\boldsymbol{\alpha}, \lambda)}{\partial \alpha_{i}}=\sum_{j=1}^{m}\left\{\frac{\partial\left(\gamma_{j i} \ln \alpha_{i}\right)}{\partial \alpha_{i}}+0-0-0\right\}+\lambda \frac{\partial\left(\sum_{i=1}^{k} \alpha_{i}-1\right)}{\partial \alpha_{i}}=\sum_{j=1}^{m} \frac{\gamma_{j i}}{\alpha_{i}}+\lambda$$

Setting this to zero gives

$$
\begin{aligned}
&\sum_{j=1}^{m} \frac{\gamma_{j i}}{\alpha_{i}}+\lambda=0 \\
&\frac{1}{\alpha_{i}} \sum_{j=1}^{m} \gamma_{j i}=-\lambda \\
&\alpha_{i}=-\frac{1}{\lambda} \sum_{j=1}^{m} \gamma_{j i}
\end{aligned}
$$
Since $\sum_{i=1}^{k} \alpha_{i}=1$, summing both sides over i gives

$$
\begin{gathered}
\sum_{i=1}^{k} \alpha_{i}=-\frac{1}{\lambda} \sum_{i=1}^{k} \sum_{j=1}^{m} \gamma_{j i} \\
1=-\frac{1}{\lambda} \sum_{i=1}^{k} \sum_{j=1}^{m} \gamma_{j i} \\
\lambda=-\sum_{i=1}^{k} \sum_{j=1}^{m} \gamma_{j i}
\end{gathered}
$$
We still need the value of $\lambda$. Since

$$\sum_{i=1}^{k} \sum_{j=1}^{m} \gamma_{j i}=\sum_{j=1}^{m} \sum_{i=1}^{k} \gamma_{j i}=\sum_{j=1}^{m} \sum_{i=1}^{k} P\left(z_{j}=i \mid \boldsymbol{x}_{j}, \theta^{(i)}\right)=\sum_{j=1}^{m} 1=m$$
it follows that

$$
\begin{gathered}
\lambda=-\sum_{i=1}^{k} \sum_{j=1}^{m} \gamma_{j i}=-m \\
\alpha_{i}=-\frac{1}{\lambda} \sum_{j=1}^{m} \gamma_{j i}=\frac{1}{m} \sum_{j=1}^{m} \gamma_{j i}
\end{gathered}
$$
Since $0 \leq \gamma_{j i}=P\left(z_{j}=i \mid \boldsymbol{x}_{j}, \theta^{(i)}\right) \leq 1$, we have

$$0 \leq \sum_{j=1}^{m} \gamma_{j i} \leq m \Rightarrow 0 \leq \frac{1}{m} \sum_{j=1}^{m} \gamma_{j i} \leq 1$$
so the $\alpha_{i}$ obtained here also satisfies $\alpha_i \geq 0$; it is a valid solution and can serve as the starting parameter for the next iteration, that is,

$$\alpha_{i}=\frac{1}{m} \sum_{j=1}^{m} \gamma_{j i} \Rightarrow \alpha_{i}^{(i+1)}=\frac{1}{m} \sum_{j=1}^{m} \gamma_{j i}$$
This is Eq. 9.38 in the Watermelon Book.
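Putting the E step (Eq. 9.30) and the three M-step updates (Eqs. 9.34, 9.35, 9.38) together gives the full iteration. The sketch below is a minimal NumPy illustration under stated assumptions: a fixed number of iterations instead of a convergence test, hand-picked initial parameters, and synthetic data from two well-separated clusters; `em_gmm` and `gaussian_pdf` are hypothetical names. It records the log-likelihood $\sum_j \ln P(\boldsymbol{x}_j)$ before each M step, which EM should never decrease.

```python
import numpy as np


def gaussian_pdf(X, mu, sigma):
    """phi(x | mu, Sigma) evaluated at every row of X."""
    n = mu.shape[0]
    diff = X - mu
    maha = np.sum(diff * np.linalg.solve(sigma, diff.T).T, axis=1)
    return np.exp(-0.5 * maha) / np.sqrt((2 * np.pi) ** n * np.linalg.det(sigma))


def em_gmm(X, alphas, mus, sigmas, n_iter=50):
    """Alternate the E step (Eq. 9.30) and the M step (Eqs. 9.34, 9.35, 9.38)."""
    m = X.shape[0]
    log_likelihoods = []
    for _ in range(n_iter):
        # E step: numerators alpha_i * phi(x_j | mu_i, Sigma_i), normalized to gamma_ji
        weighted = np.column_stack(
            [a * gaussian_pdf(X, mu, s) for a, mu, s in zip(alphas, mus, sigmas)]
        )
        log_likelihoods.append(float(np.sum(np.log(weighted.sum(axis=1)))))
        gamma = weighted / weighted.sum(axis=1, keepdims=True)
        # M step
        n_i = gamma.sum(axis=0)                                   # sum_j gamma_ji
        mus = (gamma.T @ X) / n_i[:, None]                        # Eq. 9.34
        sigmas = np.array([(gamma[:, i, None] * (X - mus[i])).T @ (X - mus[i]) / n_i[i]
                           for i in range(len(n_i))])             # Eq. 9.35, with the new means
        alphas = n_i / m                                          # Eq. 9.38
    return alphas, mus, sigmas, log_likelihoods


rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(200, 2)),
               rng.normal([6.0, 6.0], 1.0, size=(300, 2))])
alphas, mus, sigmas, ll = em_gmm(X, np.array([0.5, 0.5]),
                                 np.array([[1.0, 0.0], [5.0, 5.0]]),
                                 np.array([np.eye(2), np.eye(2)]))
print(alphas.round(3), mus.round(2), ll[0], ll[-1])
```

In practice one would usually stop when the log-likelihood change falls below a tolerance and try several initializations (for example from k-means), since EM only converges to a local maximum.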
