[EM Algorithm] Intuitive Explanation + Mathematical Derivation

Expectation Maximization Algorithm

Fitting Data with a Single Gaussian Model

When the data are distributed as shown in the figure below, a single Gaussian model can be used to fit them:

Likelihood function:
$$\begin{aligned}\mathcal{L}(\theta|\bar{X})&=\log[P(\bar{X}|\theta)]\\&=\sum_{i=1}^N\log P(x_i|\theta)\\&=\sum_{i=1}^N\log\mathcal{N}(x_i|\mu,\sigma)\end{aligned}$$
We then solve $\arg\underset{\theta}{\max}\,\mathcal{L}(\theta|\bar{X})$.

First, obtain $\mu_{MLE}$ by setting
$$\frac{\partial\mathcal{L}(\mu,\sigma|\bar{X})}{\partial\mu}=0,$$
then obtain $\sigma^2_{MLE}$ by setting
$$\frac{\partial\mathcal{L}(\mu_{MLE},\sigma|\bar{X})}{\partial\sigma}=0.$$

Solving gives:
$$\begin{aligned}\mu_{MLE}&=\frac{1}{N}\sum_{i=1}^N x_i\\\sigma^2_{MLE}&=\frac{1}{N}\sum_{i=1}^N(x_i-\mu_{MLE})^2\end{aligned}$$
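As a quick numerical check of these closed-form estimates, here is a minimal NumPy sketch; the sample `data` below is made up for illustration:

```python
import numpy as np

# Minimal sketch: maximum-likelihood estimates for a single Gaussian.
# `data` is a made-up 1-D sample; any array of observations works the same way.
data = np.random.default_rng(0).normal(loc=2.0, scale=1.5, size=1000)

mu_mle = data.mean()                        # mu_MLE = (1/N) * sum_i x_i
sigma2_mle = ((data - mu_mle) ** 2).mean()  # sigma^2_MLE = (1/N) * sum_i (x_i - mu_MLE)^2

print(mu_mle, sigma2_mle)  # should be close to 2.0 and 1.5**2 = 2.25
```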
In practice, however, the data may be distributed like this:

In that case a single Gaussian cannot fit the data; a mixture of two or more Gaussians is needed.

Gaussian Mixture Model

Definition of the Gaussian mixture model:

$$p(X|\theta)=\sum_{l=1}^k\alpha_l\,\mathcal{N}(X|\mu_l,\sigma_l)\qquad\text{s.t.}\quad\sum_{l=1}^k\alpha_l=1$$

where $\alpha_l$ is the normalized weight of the $l$-th Gaussian component, $\mathcal{N}(X|\mu_l,\sigma_l)$ is the Gaussian density, and
$$\theta=\{\theta_1=[\alpha_1,\mu_1,\sigma_1],\ \dots,\ \theta_k=[\alpha_k,\mu_k,\sigma_k]\}$$
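For concreteness, here is a minimal sketch of evaluating this mixture density with NumPy/SciPy; the weights, means, and standard deviations below are made-up illustrative values:

```python
import numpy as np
from scipy.stats import norm

# Minimal sketch of p(x | theta) = sum_l alpha_l * N(x | mu_l, sigma_l) for a 1-D mixture.
alphas = np.array([0.3, 0.7])   # mixture weights alpha_l, must sum to 1
mus    = np.array([-2.0, 3.0])  # component means mu_l
sigmas = np.array([1.0, 1.5])   # component standard deviations sigma_l

def gmm_pdf(x):
    """Evaluate the mixture density at x (scalar or array)."""
    x = np.atleast_1d(x)[:, None]              # shape (n, 1)
    comp = norm.pdf(x, loc=mus, scale=sigmas)  # shape (n, k): N(x_i | mu_l, sigma_l)
    return comp @ alphas                       # sum_l alpha_l * N(x_i | mu_l, sigma_l)

print(gmm_pdf([0.0, 3.0]))
```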
Next, perform maximum likelihood estimation of the parameter $\theta$ of the Gaussian mixture model:

$$\begin{aligned}\theta_{MLE}&=\arg\underset{\theta}{\max}\,\mathcal{L}(\theta|X)\\&=\arg\underset{\theta}{\max}\,\sum_{i=1}^n\log\sum_{l=1}^k\alpha_l\,\mathcal{N}(x_i|\mu_l,\sigma_l)\end{aligned}$$

Here, taking partial derivatives with respect to $\mu_1,\dots,\mu_k$ and $\sigma_1,\dots,\sigma_k$ and solving directly is very difficult, because the logarithm acts on a sum over components; instead, the EM algorithm is used to solve for the parameters iteratively.

EM Algorithm Iteration Formula

The parameter update of the EM algorithm is defined as:

$$\theta^{(g+1)}=\arg\underset{\theta}{\max}\int_Z\log[p(X,Z|\theta)]\cdot p(Z|X,\theta^{(g)})\,\mathrm{d}Z\tag{1}$$

where $Z$ is a latent (auxiliary) variable.

Requirements for Introducing the Latent Variable

1. It should simplify the solution;

2. It must not change the marginal distribution of the data (we now prove this second point).

Showing that the marginal distribution is preserved amounts to showing that Eq. (2) holds after the latent variable $Z$ is introduced:

$$p(x_i)=\int_{z_i}p_\theta(x_i|z_i)\cdot p_\theta(z_i)\,\mathrm{d}z_i,\qquad z_i\in\{1,2,\dots,k\}\tag{2}$$
How should the latent variable $Z$ in a Gaussian mixture model be understood?

In fact, $Z$ indicates which Gaussian component a data point belongs to, as shown in the figure below:

Conditioned on $Z$, the Gaussian mixture model essentially reduces to a single Gaussian model, which greatly simplifies the solution.
But before any data are observed, does $z_i$ belong to $\theta_1$ or to $\theta_2$? In other words, what is $p(z_i)$?

In fact, the probability $p(z_i)$ comes directly from the weight parameters of the mixture model:

$$p(z_i)=\alpha_{z_i}$$
Meanwhile,

$$p_\theta(x_i|z_i)=\mathcal{N}(x_i|\mu_{z_i},\sigma_{z_i})$$
Substituting $p(z_i)$ and $p_\theta(x_i|z_i)$ into Eq. (2) gives:

$$\begin{aligned}p(x_i)&=\int_{z_i}p_\theta(x_i|z_i)\cdot p_\theta(z_i)\,\mathrm{d}z_i\\&=\sum_{z_i=1}^k\alpha_{z_i}\,\mathcal{N}(x_i|\mu_{z_i},\sigma_{z_i})\end{aligned}$$
This is exactly the mixture density, so the marginal distribution of $x_i$ is unchanged and condition 2 is proved.

Convergence Proof (Local Convergence)

For the EM algorithm to converge, the likelihood must keep increasing (or at least not decrease) as the iterations proceed, i.e. at every update step the likelihood after the update is at least as large as the likelihood before it. We therefore need to show:

$$\log P(X|\theta^{(g+1)})=\mathcal{L}(\theta^{(g+1)})\geqslant\mathcal{L}(\theta^{(g)})=\log P(X|\theta^{(g)})\tag{3}$$

Proof

By the definition of conditional probability,

$$P(X)=\frac{P(X,Z)}{P(Z|X)}$$

so, conditioning on $\theta$ and taking logarithms,

$$\log P(X|\theta)=\log P(X,Z|\theta)-\log P(Z|X,\theta)\tag{4}$$

Taking the expectation of both sides with respect to $p(Z|X,\theta^{(g)})$ gives:

$$\underset{p(Z|X,\theta^{(g)})}{\mathbb{E}}[\log P(X|\theta)]=\underset{p(Z|X,\theta^{(g)})}{\mathbb{E}}[\log P(X,Z|\theta)-\log P(Z|X,\theta)]$$

Left-hand side:

$$\begin{aligned}\underset{p(Z|X,\theta^{(g)})}{\mathbb{E}}[\log P(X|\theta)]&=\int_Z\log[P(X|\theta)]\cdot P(Z|X,\theta^{(g)})\,\mathrm{d}Z\\&=\log P(X|\theta)\end{aligned}$$

Right-hand side:

$$\begin{aligned}&\underset{p(Z|X,\theta^{(g)})}{\mathbb{E}}[\log P(X,Z|\theta)-\log P(Z|X,\theta)]\\&=\int_Z\log[P(X,Z|\theta)]\cdot P(Z|X,\theta^{(g)})\,\mathrm{d}Z-\int_Z\log[P(Z|X,\theta)]\cdot P(Z|X,\theta^{(g)})\,\mathrm{d}Z\end{aligned}$$

Define:

$$\begin{aligned}Q(\theta,\theta^{(g)})&=\int_Z\log[P(X,Z|\theta)]\cdot P(Z|X,\theta^{(g)})\,\mathrm{d}Z\\H(\theta,\theta^{(g)})&=\int_Z\log[P(Z|X,\theta)]\cdot P(Z|X,\theta^{(g)})\,\mathrm{d}Z\end{aligned}$$

Then Eq. (4) can be written as
$$\log P(X|\theta)=Q(\theta,\theta^{(g)})-H(\theta,\theta^{(g)})\tag{4}$$
Here $\theta^{(g)}$ is a constant and $\theta$ is the variable. Moreover, it is easy to see that $Q(\theta,\theta^{(g)})$ is exactly the objective maximized in the EM update formula (1).

Setting $\theta$ to $\theta^{(g)}$ and to $\theta^{(g+1)}$ respectively and subtracting, we get
$$\begin{aligned}&\log P(X|\theta^{(g+1)})-\log P(X|\theta^{(g)})\\&=[Q(\theta^{(g+1)},\theta^{(g)})-Q(\theta^{(g)},\theta^{(g)})]-[H(\theta^{(g+1)},\theta^{(g)})-H(\theta^{(g)},\theta^{(g)})]\end{aligned}\tag{5}$$
Because the EM update chooses $\theta^{(g+1)}$ to maximize $Q(\cdot,\theta^{(g)})$, when iterating from $\theta^{(g)}$ to $\theta^{(g+1)}$ we have:

$$Q(\theta^{(g+1)},\theta^{(g)})-Q(\theta^{(g)},\theta^{(g)})\ge 0$$

so the first bracketed term on the right-hand side of Eq. (5) is greater than or equal to 0.

Because $f(x)=\log x$ is concave, Jensen's inequality gives:
$$\begin{aligned}H(\theta^{(g+1)},\theta^{(g)})-H(\theta^{(g)},\theta^{(g)})&=\int_Z\Big(\log[P(Z|X,\theta^{(g+1)})]-\log[P(Z|X,\theta^{(g)})]\Big)\cdot P(Z|X,\theta^{(g)})\,\mathrm{d}Z\\&=\int_Z\log\!\left[\frac{P(Z|X,\theta^{(g+1)})}{P(Z|X,\theta^{(g)})}\right]\cdot P(Z|X,\theta^{(g)})\,\mathrm{d}Z\\&\le\log\!\left[\int_Z\frac{P(Z|X,\theta^{(g+1)})}{P(Z|X,\theta^{(g)})}\cdot P(Z|X,\theta^{(g)})\,\mathrm{d}Z\right]\\&=\log\!\left[\int_Z P(Z|X,\theta^{(g+1)})\,\mathrm{d}Z\right]\\&=\log 1\\&=0\end{aligned}\tag{6}$$

From Eq. (5) and Eq. (6) we obtain Eq. (7):
$$\log P(X|\theta^{(g+1)})-\log P(X|\theta^{(g)})\ge0\tag{7}$$
which is exactly Eq. (3).

This proves the convergence (monotonic improvement) property of the EM algorithm.
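The inequality step in Eq. (6) is simply the statement that a (negated) KL divergence is non-positive. A minimal NumPy sketch, with two arbitrary made-up discrete distributions standing in for $P(Z|X,\theta^{(g)})$ and $P(Z|X,\theta^{(g+1)})$, verifies this numerically:

```python
import numpy as np

# Numerical check of the Jensen-inequality step in Eq. (6):
# sum_z q_old(z) * log(q_new(z) / q_old(z)) <= 0 for any two distributions q_old, q_new.
rng = np.random.default_rng(0)
q_old = rng.random(5); q_old /= q_old.sum()  # stands in for P(Z|X, theta^(g))
q_new = rng.random(5); q_new /= q_new.sum()  # stands in for P(Z|X, theta^(g+1))

lhs = np.sum(q_old * np.log(q_new / q_old))  # equals -KL(q_old || q_new)
print(lhs)  # always <= 0, with equality only when q_new == q_old
```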

Applying the EM Algorithm to the Gaussian Mixture Model

Gaussian mixture model:

$$P(X|\theta)=\sum_{l=1}^k\alpha_l\,\mathcal{N}(X|\mu_l,\sigma_l)\qquad\text{s.t.}\quad\sum_{l=1}^k\alpha_l=1$$

For the data set $X=\{x_1,x_2,\dots,x_n\}$, introduce latent variables $Z=\{z_1,z_2,\dots,z_n\}$, where each $z_i$ indicates which Gaussian component the data point $x_i$ belongs to.

EM update:
$$\begin{aligned}\theta^{(g+1)}&=\arg\underset{\theta}{\max}\int_Z\log[p(X,Z|\theta)]\cdot p(Z|X,\theta^{(g)})\,\mathrm{d}Z\\&=\arg\underset{\theta}{\max}\,Q(\theta,\theta^{(g)})\end{aligned}\tag{1}$$

E-step: compute $Q(\theta,\theta^{(g)})$

This requires $p(X,Z|\theta)$ and $p(Z|X,\theta)$.

Computing $p(X,Z|\theta)$:
$$\begin{aligned}p(X,Z|\theta)&=\prod_{i=1}^n p(x_i,z_i|\theta)\\&=\prod_{i=1}^n p(x_i|z_i,\theta)\,p(z_i|\theta)\end{aligned}$$
Here $p(z_i|\theta)$ is the prior probability, before seeing any data $X$, of belonging to the $z_i$-th Gaussian, i.e. the mixture weight $\alpha_{z_i}$; and $p(x_i|z_i,\theta)$ is the density of $x_i$ under the $z_i$-th Gaussian, i.e. $\mathcal{N}(x_i|\mu_{z_i},\sigma_{z_i})$. Therefore:
$$p(X,Z|\theta)=\prod_{i=1}^n\alpha_{z_i}\,\mathcal{N}(x_i|\mu_{z_i},\sigma_{z_i})\tag{8}$$
Computing $p(Z|X,\theta)$:
$$p(Z|X,\theta)=\prod_{i=1}^n p(z_i|x_i,\theta)$$
The intuitive meaning of $p(z_i|x_i,\theta)$ is illustrated in the figure below:

For the red data point in the figure, with $a$ and $b$ denoting the two components' weighted density values at $x_i$, its $p(z_i|x_i,\theta)$ is:
$$p(z_i=\theta_1|x_i,\theta)=\frac{a}{a+b},\qquad p(z_i=\theta_2|x_i,\theta)=\frac{b}{a+b}$$
In general,
$$\begin{aligned}p(z_i|x_i,\theta)&=\frac{p(x_i,z_i|\theta)}{p(x_i|\theta)}\\&=\frac{\alpha_{z_i}\,\mathcal{N}(x_i|\mu_{z_i},\sigma_{z_i})}{\sum_{l=1}^k\alpha_l\,\mathcal{N}(x_i|\mu_l,\sigma_l)}\end{aligned}$$

and hence
$$\begin{aligned}p(Z|X,\theta)&=\prod_{i=1}^n p(z_i|x_i,\theta)\\&=\prod_{i=1}^n\frac{p(x_i,z_i|\theta)}{p(x_i|\theta)}\\&=\prod_{i=1}^n\frac{\alpha_{z_i}\,\mathcal{N}(x_i|\mu_{z_i},\sigma_{z_i})}{\sum_{l=1}^k\alpha_l\,\mathcal{N}(x_i|\mu_l,\sigma_l)}\end{aligned}\tag{9}$$
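Concretely, the per-point posteriors $p(z_i=l\,|\,x_i,\theta)$ (often called responsibilities) can be computed as in this minimal NumPy/SciPy sketch; all data and parameter values are made up for illustration:

```python
import numpy as np
from scipy.stats import norm

def responsibilities(x, alphas, mus, sigmas):
    """resp[i, l] = alpha_l * N(x_i|mu_l, sigma_l) / sum_m alpha_m * N(x_i|mu_m, sigma_m)."""
    weighted = alphas * norm.pdf(x[:, None], loc=mus, scale=sigmas)  # shape (n, k)
    return weighted / weighted.sum(axis=1, keepdims=True)

# Made-up data and parameters for illustration only.
x = np.array([-2.1, 0.3, 3.0])
resp = responsibilities(x, np.array([0.3, 0.7]), np.array([-2.0, 3.0]), np.array([1.0, 1.5]))
print(resp)  # each row sums to 1
```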
Substituting Eq. (8) and Eq. (9) gives:
$$\begin{aligned}Q(\theta,\theta^{(g)})&=\int_Z\ln[p(X,Z|\theta)]\cdot p(Z|X,\theta^{(g)})\,\mathrm{d}Z\\&=\int_{z_1}\!\dots\!\int_{z_n}\bigg(\sum_{i=1}^n\ln p(z_i,x_i|\theta)\bigg)\prod_{i=1}^n p(z_i|x_i,\theta^{(g)})\,\mathrm{d}z_1\dots\mathrm{d}z_n\end{aligned}$$

Derivation of the identity used to simplify $Q(\theta,\theta^{(g)})$

We will use the following identity:
$$\int_{y_1}\!\dots\!\int_{y_n}\bigg(\sum_{i=1}^n f_i(y_i)\bigg)P(Y)\,\mathrm{d}Y=\sum_{i=1}^n\bigg(\int_{y_i}f_i(y_i)\,P_i(y_i)\,\mathrm{d}y_i\bigg)$$

where $P(Y)$ is the joint distribution $P(y_1,\dots,y_n)$ of $y_1,\dots,y_n$ and $P_i(y_i)$ is the marginal distribution of $y_i$.

Derivation of this identity:

$$F(Y)=f_1(y_1)+\dots+f_n(y_n)=\sum_{i=1}^n f_i(y_i)$$
$$\int_Y F(Y)\,P(Y)\,\mathrm{d}Y=\int_{y_1}\!\dots\!\int_{y_n}\bigg(\sum_{i=1}^n f_i(y_i)\bigg)P(Y)\,\mathrm{d}y_1\dots\mathrm{d}y_n$$

Expanding $\sum_{i=1}^n f_i(y_i)$ in the expression above gives:
$$\begin{aligned}\int_{y_1}\!\dots\!\int_{y_n}&\bigg(\sum_{i=1}^n f_i(y_i)\bigg)P(Y)\,\mathrm{d}y_1\dots\mathrm{d}y_n\\&=\int_{y_1}\!\dots\!\int_{y_n}[f_1(y_1)+f_2(y_2)+\dots+f_n(y_n)]\,P(y_1,\dots,y_n)\,\mathrm{d}y_1\dots\mathrm{d}y_n\\&=\int_{y_1}\!\dots\!\int_{y_n}f_1(y_1)\,P(y_1,\dots,y_n)\,\mathrm{d}y_1\dots\mathrm{d}y_n\\&\quad+\int_{y_1}\!\dots\!\int_{y_n}f_2(y_2)\,P(y_1,\dots,y_n)\,\mathrm{d}y_1\dots\mathrm{d}y_n\\&\quad+\dots+\int_{y_1}\!\dots\!\int_{y_n}f_n(y_n)\,P(y_1,\dots,y_n)\,\mathrm{d}y_1\dots\mathrm{d}y_n\end{aligned}$$

Focus first on the first term:
$$\int_{y_1}\!\dots\!\int_{y_n}f_1(y_1)\,P(y_1,\dots,y_n)\,\mathrm{d}y_1\dots\mathrm{d}y_n$$

Since $f_1(y_1)$ does not depend on $y_2,\dots,y_n$, it can be treated as a constant with respect to those variables and pulled outside the integrals over them:
$$\int_{y_1}\!\dots\!\int_{y_n}f_1(y_1)\,P(y_1,\dots,y_n)\,\mathrm{d}y_1\dots\mathrm{d}y_n=\int_{y_1}f_1(y_1)\bigg(\int_{y_2}\!\dots\!\int_{y_n}P(y_1,\dots,y_n)\,\mathrm{d}y_2\dots\mathrm{d}y_n\bigg)\mathrm{d}y_1$$

By the marginalization formula
$$P(x)=\int_y P(x,y)\,\mathrm{d}y$$

we have:
$$\begin{aligned}P(y_1,y_2,\dots,y_{n-1})&=\int_{y_n}P(y_1,y_2,\dots,y_{n-1},y_n)\,\mathrm{d}y_n\\P(y_1,y_2,\dots,y_{n-2})&=\int_{y_{n-1}}P(y_1,y_2,\dots,y_{n-2},y_{n-1})\,\mathrm{d}y_{n-1}\\&\;\;\vdots\\P(y_1)&=\int_{y_2}P(y_1,y_2)\,\mathrm{d}y_2\end{aligned}$$

Each application of the marginalization formula removes one layer of integration, so the first term finally becomes:
$$\int_{y_1}f_1(y_1)\bigg(\int_{y_2}\!\dots\!\int_{y_n}P(y_1,\dots,y_n)\,\mathrm{d}y_2\dots\mathrm{d}y_n\bigg)\mathrm{d}y_1=\int_{y_1}f_1(y_1)\,P(y_1)\,\mathrm{d}y_1$$

Applying the same argument to every term, the whole identity becomes:
$$\begin{aligned}\int_{y_1}\!\dots\!\int_{y_n}&\bigg(\sum_{i=1}^n f_i(y_i)\bigg)P(Y)\,\mathrm{d}y_1\dots\mathrm{d}y_n\\&=\int_{y_1}\!\dots\!\int_{y_n}f_1(y_1)\,P(y_1,\dots,y_n)\,\mathrm{d}y_1\dots\mathrm{d}y_n\\&\quad+\int_{y_1}\!\dots\!\int_{y_n}f_2(y_2)\,P(y_1,\dots,y_n)\,\mathrm{d}y_1\dots\mathrm{d}y_n\\&\quad+\dots+\int_{y_1}\!\dots\!\int_{y_n}f_n(y_n)\,P(y_1,\dots,y_n)\,\mathrm{d}y_1\dots\mathrm{d}y_n\\&=\int_{y_1}f_1(y_1)\,P(y_1)\,\mathrm{d}y_1+\int_{y_2}f_2(y_2)\,P(y_2)\,\mathrm{d}y_2+\dots+\int_{y_n}f_n(y_n)\,P(y_n)\,\mathrm{d}y_n\\&=\sum_{i=1}^n\bigg(\int_{y_i}f_i(y_i)\,P(y_i)\,\mathrm{d}y_i\bigg)\end{aligned}$$
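Since the $z_i$ used later are discrete, the same identity holds with sums in place of integrals. A minimal NumPy sketch with a made-up two-variable joint distribution checks it numerically:

```python
import numpy as np

# Check: E[f1(y1) + f2(y2)] under a joint P(y1, y2) equals E[f1(y1)] + E[f2(y2)]
# under the marginals P(y1), P(y2). All values are made up for illustration.
rng = np.random.default_rng(0)
P = rng.random((3, 4)); P /= P.sum()  # joint distribution over a 3 x 4 grid
f1 = rng.random(3)                    # f1(y1) as a lookup table
f2 = rng.random(4)                    # f2(y2) as a lookup table

lhs = sum(P[i, j] * (f1[i] + f2[j]) for i in range(3) for j in range(4))
rhs = f1 @ P.sum(axis=1) + f2 @ P.sum(axis=0)  # marginals: P(y1) = sum_j P, P(y2) = sum_i P
print(np.isclose(lhs, rhs))  # True
```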

Simplifying $Q(\theta,\theta^{(g)})$

Taking $f_i(y_i)$ to be $\ln p(z_i,x_i|\theta)$ and $P_i(y_i)$ to be $p(z_i|x_i,\theta^{(g)})$, we obtain:
$$\begin{aligned}Q(\theta,\theta^{(g)})&=\int_{z_1}\!\dots\!\int_{z_n}\bigg(\sum_{i=1}^n\ln p(z_i,x_i|\theta)\bigg)\prod_{i=1}^n p(z_i|x_i,\theta^{(g)})\,\mathrm{d}z_1\dots\mathrm{d}z_n\\&=\sum_{i=1}^n\bigg(\int_{z_i}\ln p(z_i,x_i|\theta)\,p(z_i|x_i,\theta^{(g)})\,\mathrm{d}z_i\bigg)\end{aligned}$$
Since $z_i$ is a discrete random variable with $z_i\in\{1,\dots,k\}$, the integral should be written as a sum:
$$\begin{aligned}Q(\theta,\theta^{(g)})&=\sum_{i=1}^n\bigg(\int_{z_i}\ln p(z_i,x_i|\theta)\,p(z_i|x_i,\theta^{(g)})\,\mathrm{d}z_i\bigg)\\&=\sum_{i=1}^n\bigg(\sum_{z_i=1}^k\ln p(z_i,x_i|\theta)\,p(z_i|x_i,\theta^{(g)})\bigg)\end{aligned}$$
l l l替换 z i z_i zi,最终可得:
Q ( θ , θ ( g ) ) = ∑ l = 1 k ∑ i = 1 n ln ⁡ p ( l , x i ∣ θ ) p ( l ∣ x i , θ ( g ) ) = ∑ l = 1 k ∑ i = 1 n ln ⁡ [ α l N ( x i ∣ μ l , σ l ) ] p ( l ∣ x i , θ ( g ) ) = ∑ l = 1 k ∑ i = 1 n ln ⁡ ( α l ) p ( l ∣ x i , θ ( g ) ) + ∑ l = 1 k ∑ i = 1 n ln ⁡ [ N ( x i ∣ μ l , σ l ) ] p ( l ∣ x i , θ ( g ) ) \begin{aligned} Q(\theta,\theta^{(g)}) &=\sum_{l=1}^k\sum_{i=1}^n\ln p(l,x_i|\theta)p(l|x_i,\theta^{(g)})\\ &=\sum_{l=1}^k\sum_{i=1}^n\ln[\alpha_l\mathcal{N}(x_i|\mu_l,\sigma_l)]p(l|x_i,\theta^{(g)})\\&=\sum_{l=1}^k\sum_{i=1}^n\ln(\alpha_l)p(l|x_i,\theta^{(g)})\\&\quad+\sum_{l=1}^k\sum_{i=1}^n\ln[\mathcal{N}(x_i|\mu_l,\sigma_l)]p(l|x_i,\theta^{(g)}) \end{aligned}\nonumber Q(θ,θ(g))=l=1ki=1nlnp(l,xiθ)p(lxi,θ(g))=l=1ki=1nln[αlN(xiμl,σl)]p(lxi,θ(g))=l=1ki=1nln(αl)p(lxi,θ(g))+l=1ki=1nln[N(xiμl,σl)]p(lxi,θ(g))
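As a sanity check of this expression, the following minimal NumPy/SciPy sketch evaluates $Q(\theta,\theta^{(g)})$ for a 1-D mixture; the data, the parameters defining the responsibilities, and the evaluation point are all made up for illustration:

```python
import numpy as np
from scipy.stats import norm

# Evaluate Q(theta, theta^(g)) = sum_l sum_i ln[alpha_l * N(x_i|mu_l, sigma_l)] * p(l|x_i, theta^(g)).
x = np.array([-2.1, 0.3, 3.0])
alphas_g, mus_g, sigmas_g = np.array([0.3, 0.7]), np.array([-2.0, 3.0]), np.array([1.0, 1.5])

w = alphas_g * norm.pdf(x[:, None], loc=mus_g, scale=sigmas_g)
resp = w / w.sum(axis=1, keepdims=True)  # p(l | x_i, theta^(g)), shape (n, k)

def Q(alphas, mus, sigmas):
    log_joint = np.log(alphas) + norm.logpdf(x[:, None], loc=mus, scale=sigmas)
    return np.sum(resp * log_joint)

print(Q(alphas_g, mus_g, sigmas_g))  # Q evaluated at theta = theta^(g)
```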

M-step: maximize $Q(\theta,\theta^{(g)})$

Each update finds the $\{\alpha_1,\dots,\alpha_k,\mu_1,\dots,\mu_k,\sigma_1,\dots,\sigma_k\}$ that maximize $Q(\theta,\theta^{(g)})$. Since the first term above involves only $\alpha$ while the second involves only $\mu$ and $\sigma$, the two terms can be maximized separately.

Maximizing over $\alpha$

Resulting formula:
$$\alpha_l=\frac{1}{n}\sum_{i=1}^n p(l|x_i,\theta^{(g)})$$
Derivation:

Optimization objective:
$$\frac{\partial\sum_{l=1}^k\sum_{i=1}^n\ln(\alpha_l)\,p(l|x_i,\theta^{(g)})}{\partial\alpha_1,\dots,\partial\alpha_k}=[0\dots0]\qquad\text{s.t.}\ \sum_{l=1}^k\alpha_l=1$$
Introduce a Lagrange multiplier $\lambda$ for the constraint and form the Lagrangian $\mathbb{LM}=\sum_{l=1}^k\sum_{i=1}^n\ln(\alpha_l)\,p(l|x_i,\theta^{(g)})+\lambda\big(\sum_{l=1}^k\alpha_l-1\big)$. Since $\sum_{i=1}^n p(l|x_i,\theta^{(g)})$ does not involve $\alpha$, differentiating with respect to $\alpha_l$ gives:
$$\frac{\partial\mathbb{LM}}{\partial\alpha_l}=\frac{1}{\alpha_l}\bigg(\sum_{i=1}^n p(l|x_i,\theta^{(g)})\bigg)+\lambda=0$$
so:
$$\alpha_l=-\frac{1}{\lambda}\bigg(\sum_{i=1}^n p(l|x_i,\theta^{(g)})\bigg)$$
Since:
$$\sum_{l=1}^k\alpha_l=1$$

it follows that:
$$-\sum_{l=1}^k\frac{1}{\lambda}\bigg(\sum_{i=1}^n p(l|x_i,\theta^{(g)})\bigg)=1$$
and hence:
$$\begin{aligned}\lambda&=-\sum_{l=1}^k\bigg(\sum_{i=1}^n p(l|x_i,\theta^{(g)})\bigg)\\&=-\sum_{i=1}^n\bigg(\sum_{l=1}^k p(l|x_i,\theta^{(g)})\bigg)\\&=-\sum_{i=1}^n 1\\&=-n\end{aligned}$$
Therefore:
$$\alpha_l=\frac{1}{n}\sum_{i=1}^n p(l|x_i,\theta^{(g)})$$

Maximizing over $\mu$

Resulting formula for $\mu_l$:
$$\mu_l=\frac{\sum_{i=1}^n x_i\,p(l|x_i,\theta^{(g)})}{\sum_{i=1}^n p(l|x_i,\theta^{(g)})}$$
Derivation:

Optimization objective:
$$\frac{\partial\sum_{l=1}^k\sum_{i=1}^n\ln[\mathcal{N}(x_i|\mu_l,\sigma_l)]\,p(l|x_i,\theta^{(g)})}{\partial\mu_1,\dots,\partial\mu_k,\partial\sigma_1,\dots,\partial\sigma_k}=[0\dots0]$$
Expanding the Gaussian log-density (here $\sigma_l$ denotes the covariance matrix and $d$ the data dimension):
$$\begin{aligned}\sum_{l=1}^k&\sum_{i=1}^n\ln[\mathcal{N}(x_i|\mu_l,\sigma_l)]\,p(l|x_i,\theta^{(g)})\\&=\sum_{l=1}^k\sum_{i=1}^n\ln\bigg(\frac{1}{\sqrt{(2\pi)^d|\sigma_l|}}\,e^{-\frac{1}{2}(x_i-\mu_l)^\top\sigma_l^{-1}(x_i-\mu_l)}\bigg)\,p(l|x_i,\theta^{(g)})\\&=\sum_{l=1}^k\sum_{i=1}^n\bigg(-\frac{1}{2}\ln\big((2\pi)^d|\sigma_l|\big)-\frac{1}{2}(x_i-\mu_l)^\top\sigma_l^{-1}(x_i-\mu_l)\bigg)\,p(l|x_i,\theta^{(g)})\end{aligned}$$
Differentiating this with respect to $\mu_l$ and setting the result to 0 gives:
$$\sum_{i=1}^n\sigma_l^{-1}(x_i-\mu_l)\,p(l|x_i,\theta^{(g)})=0$$

so:
$$\mu_l=\frac{\sum_{i=1}^n x_i\,p(l|x_i,\theta^{(g)})}{\sum_{i=1}^n p(l|x_i,\theta^{(g)})}$$

Maximizing over $\sigma$

Resulting formula for $\sigma_l$ (obtained by differentiating the same objective with respect to $\sigma_l$ and setting it to 0):
$$\sigma_l=\frac{\sum_{i=1}^n(x_i-\mu_l)(x_i-\mu_l)^\top p(l|x_i,\theta^{(g)})}{\sum_{i=1}^n p(l|x_i,\theta^{(g)})}$$

Updating the parameters $\theta^{(g)}\rightarrow\theta^{(g+1)}$

$$\begin{aligned}\alpha_l^{(g+1)}&=\frac{1}{n}\sum_{i=1}^n p(l|x_i,\theta^{(g)})\\\mu_l^{(g+1)}&=\frac{\sum_{i=1}^n x_i\,p(l|x_i,\theta^{(g)})}{\sum_{i=1}^n p(l|x_i,\theta^{(g)})}\\\sigma_l^{(g+1)}&=\frac{\sum_{i=1}^n[x_i-\mu_l^{(g+1)}][x_i-\mu_l^{(g+1)}]^\top p(l|x_i,\theta^{(g)})}{\sum_{i=1}^n p(l|x_i,\theta^{(g)})}\end{aligned}$$
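Putting the E-step and these three M-step formulas together, here is a minimal 1-D EM sketch in NumPy/SciPy; the synthetic data and initial values are made up for illustration, and in 1-D the covariance is represented by a standard deviation:

```python
import numpy as np
from scipy.stats import norm

# Minimal sketch of EM for a 1-D Gaussian mixture, iterating the updates above.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.5, 700)])

alphas = np.array([0.5, 0.5])   # alpha_l^(0)
mus    = np.array([-1.0, 1.0])  # mu_l^(0)
sigmas = np.array([1.0, 1.0])   # sigma_l^(0) (standard deviations)

for _ in range(50):
    # E-step: responsibilities p(l | x_i, theta^(g)), shape (n, k)
    w = alphas * norm.pdf(x[:, None], loc=mus, scale=sigmas)
    resp = w / w.sum(axis=1, keepdims=True)
    nk = resp.sum(axis=0)  # effective number of points assigned to each component

    # M-step: the three closed-form updates derived above
    alphas = nk / len(x)
    mus = (resp * x[:, None]).sum(axis=0) / nk
    sigmas = np.sqrt((resp * (x[:, None] - mus) ** 2).sum(axis=0) / nk)

print(alphas, mus, sigmas)  # should approach roughly (0.3, 0.7), (-2, 3), (1, 1.5)
```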
