The EM Algorithm (2)

1. The EM Algorithm

   Suppose we are given a training set

$$\left\{x^{(1)}, x^{(2)}, \cdots, x^{(m)}\right\}$$

   of $m$ independent samples, and we want to find the parameters of a model $p(x, z)$ for this data.

   The log-likelihood is

$$\ell(\theta)=\sum_{i=1}^{m} \log p\left(x^{(i)} ; \theta\right)=\sum_{i=1}^{m} \log \sum_{z^{(i)}} p\left(x^{(i)}, z^{(i)} ; \theta\right)$$

   Here $z$ is a latent random variable, so it is not convenient to find the parameter estimates directly. Instead, use the following strategy: construct a lower bound on $\ell(\theta)$ and maximize that lower bound; repeat this process until convergence to a local maximum.
   Let $Q_i$ be some distribution over $z$, with $Q_i \geq 0$ and $\sum_{z^{(i)}} Q_{i}\left(z^{(i)}\right)=1$. Then:

$$\begin{aligned} \ell(\theta) &=\sum_{i=1}^{m} \log \sum_{z^{(i)}} p\left(x^{(i)}, z^{(i)} ; \theta\right) \\ &=\sum_{i=1}^{m} \log \sum_{z^{(i)}} Q_{i}\left(z^{(i)}\right) \frac{p\left(x^{(i)}, z^{(i)} ; \theta\right)}{Q_{i}\left(z^{(i)}\right)} \\ &\geq \sum_{i=1}^{m} \sum_{z^{(i)}} Q_{i}\left(z^{(i)}\right) \log \frac{p\left(x^{(i)}, z^{(i)} ; \theta\right)}{Q_{i}\left(z^{(i)}\right)} \end{aligned}$$

   where the last step is Jensen's inequality applied to the concave $\log$. To make this lower bound as tight as possible, we can require the ratio inside the log to be a constant:

$$\frac{p\left(x^{(i)}, z^{(i)} ; \theta\right)}{Q_{i}\left(z^{(i)}\right)}=c$$

   Analyzing further:

$$\begin{aligned} Q_{i}\left(z^{(i)}\right) &\propto p\left(x^{(i)}, z^{(i)} ; \theta\right), \quad \sum_{z} Q_{i}\left(z^{(i)}\right)=1 \\ \Rightarrow \quad Q_{i}\left(z^{(i)}\right) &=\frac{p\left(x^{(i)}, z^{(i)} ; \theta\right)}{\sum_{z} p\left(x^{(i)}, z ; \theta\right)} \\ &=\frac{p\left(x^{(i)}, z^{(i)} ; \theta\right)}{p\left(x^{(i)} ; \theta\right)} \\ &=p\left(z^{(i)} \mid x^{(i)} ; \theta\right) \end{aligned}$$

   That is, the optimal $Q_i$ is the posterior of $z^{(i)}$ given $x^{(i)}$ under the current parameters.
   The overall EM framework is then: repeat until convergence —

   E-step: for each $i$, set $Q_{i}\left(z^{(i)}\right):=p\left(z^{(i)} \mid x^{(i)} ; \theta\right)$.

   M-step: set $\theta:=\arg \max _{\theta} \sum_{i=1}^{m} \sum_{z^{(i)}} Q_{i}\left(z^{(i)}\right) \log \frac{p\left(x^{(i)}, z^{(i)} ; \theta\right)}{Q_{i}\left(z^{(i)}\right)}$.
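The two-step framework can be sketched as a generic loop. Here `e_step` and `m_step` are hypothetical callbacks supplied by the caller (they are not defined by the derivation itself); any latent-variable model that provides them, such as the GMM derived below, plugs into the same loop:

```python
import numpy as np

def em(X, theta0, e_step, m_step, tol=1e-6, max_iter=200):
    """Generic EM loop: alternate computing Q_i (the posterior over z
    under the current theta) with maximizing the lower bound over theta,
    until the parameters stop changing."""
    theta = theta0
    for _ in range(max_iter):
        Q = e_step(X, theta)      # E-step: tight lower bound at current theta
        theta_new = m_step(X, Q)  # M-step: maximize the bound over theta
        if np.max(np.abs(theta_new - theta)) < tol:
            return theta_new
        theta = theta_new
    return theta
```

Each iteration can only increase the lower bound, which is why the loop converges to a local maximum of the likelihood.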
2. Deriving the GMM from the Theoretical Formulation
   Suppose the random variable $X$ is a mixture of $K$ Gaussian distributions, where the $i$-th Gaussian is selected with probability $\varphi_{i}$ and has mean $\mu_i$ and covariance $\Sigma_i$. Given a sequence of observed samples $x_{1}, x_{2}, \ldots, x_{n}$ of $X$, estimate the parameters $\varphi, \mu, \Sigma$.
   E-step

$$w_{j}^{(i)}=Q_{i}\left(z^{(i)}=j\right)=P\left(z^{(i)}=j \mid x^{(i)} ; \phi, \mu, \Sigma\right)$$
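A minimal NumPy/SciPy sketch of this E-step, assuming `phi`, `mus`, and `sigmas` hold the current multinomial weights, component means, and covariances (the names are illustrative, not from the derivation):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, phi, mus, sigmas):
    """w[i, j] = P(z=j | x_i) ∝ phi_j * N(x_i; mu_j, Sigma_j)."""
    m, k = X.shape[0], len(phi)
    W = np.empty((m, k))
    for j in range(k):
        W[:, j] = phi[j] * multivariate_normal.pdf(X, mean=mus[j], cov=sigmas[j])
    return W / W.sum(axis=1, keepdims=True)  # normalize each row to sum to 1
```

Each row of the returned matrix is the posterior distribution of $z^{(i)}$, so rows sum to one by construction.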
   M-step

   Substituting the multinomial and Gaussian densities:

$$\begin{aligned} &\sum_{i=1}^{m} \sum_{z^{(i)}} Q_{i}\left(z^{(i)}\right) \log \frac{p\left(x^{(i)}, z^{(i)} ; \phi, \mu, \Sigma\right)}{Q_{i}\left(z^{(i)}\right)} \\ &\quad=\sum_{i=1}^{m} \sum_{j=1}^{k} Q_{i}\left(z^{(i)}=j\right) \log \frac{p\left(x^{(i)} \mid z^{(i)}=j ; \mu, \Sigma\right) p\left(z^{(i)}=j ; \phi\right)}{Q_{i}\left(z^{(i)}=j\right)} \\ &\quad=\sum_{i=1}^{m} \sum_{j=1}^{k} w_{j}^{(i)} \log \frac{\frac{1}{(2 \pi)^{n / 2}\left|\Sigma_{j}\right|^{1 / 2}} \exp \left(-\frac{1}{2}\left(x^{(i)}-\mu_{j}\right)^{T} \Sigma_{j}^{-1}\left(x^{(i)}-\mu_{j}\right)\right) \cdot \phi_{j}}{w_{j}^{(i)}} \end{aligned}$$
   Taking the gradient with respect to the mean $\mu_l$:

$$\begin{aligned} &\nabla_{\mu_{l}} \sum_{i=1}^{m} \sum_{j=1}^{k} w_{j}^{(i)} \log \frac{\frac{1}{(2 \pi)^{n / 2}\left|\Sigma_{j}\right|^{1 / 2}} \exp \left(-\frac{1}{2}\left(x^{(i)}-\mu_{j}\right)^{T} \Sigma_{j}^{-1}\left(x^{(i)}-\mu_{j}\right)\right) \cdot \phi_{j}}{w_{j}^{(i)}} \\ &\quad=-\nabla_{\mu_{l}} \sum_{i=1}^{m} \sum_{j=1}^{k} w_{j}^{(i)} \frac{1}{2}\left(x^{(i)}-\mu_{j}\right)^{T} \Sigma_{j}^{-1}\left(x^{(i)}-\mu_{j}\right) \\ &\quad=\frac{1}{2} \sum_{i=1}^{m} w_{l}^{(i)} \nabla_{\mu_{l}}\left(2 \mu_{l}^{T} \Sigma_{l}^{-1} x^{(i)}-\mu_{l}^{T} \Sigma_{l}^{-1} \mu_{l}\right) \\ &\quad=\sum_{i=1}^{m} w_{l}^{(i)}\left(\Sigma_{l}^{-1} x^{(i)}-\Sigma_{l}^{-1} \mu_{l}\right) \end{aligned}$$
   Setting this to zero and solving gives the mean:

$$\mu_{l}:=\frac{\sum_{i=1}^{m} w_{l}^{(i)} x^{(i)}}{\sum_{i=1}^{m} w_{l}^{(i)}}$$
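In code, this update is simply a responsibility-weighted average of the data. A sketch, assuming `W` is the $m \times k$ matrix of responsibilities $w_{j}^{(i)}$ from the E-step:

```python
import numpy as np

def update_means(X, W):
    """mu_l = sum_i w_l^(i) x^(i) / sum_i w_l^(i):
    responsibility-weighted average of the data points.
    X: (m, n) data, W: (m, k) responsibilities -> (k, n) means."""
    return (W.T @ X) / W.sum(axis=0)[:, None]
```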
   Taking the gradient with respect to the covariance and setting it to zero yields:

$$\Sigma_{j}=\frac{\sum_{i=1}^{m} w_{j}^{(i)}\left(x^{(i)}-\mu_{j}\right)\left(x^{(i)}-\mu_{j}\right)^{T}}{\sum_{i=1}^{m} w_{j}^{(i)}}$$
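The covariance update can be sketched the same way, as a responsibility-weighted sum of outer products (again `W` and `mus` are assumed to hold the current responsibilities and the updated means):

```python
import numpy as np

def update_covariances(X, W, mus):
    """Sigma_j = sum_i w_j^(i) (x^(i)-mu_j)(x^(i)-mu_j)^T / sum_i w_j^(i)."""
    m, n = X.shape
    k = W.shape[1]
    sigmas = np.empty((k, n, n))
    for j in range(k):
        D = X - mus[j]  # (m, n) deviations from component j's mean
        sigmas[j] = (W[:, j, None] * D).T @ D / W[:, j].sum()
    return sigmas
```

With hard (0/1) responsibilities this reduces to the ordinary sample covariance of the points assigned to each component.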
   For the multinomial parameter $\phi$, consider the M-step objective and drop the terms that do not depend on $\phi$:

$$\sum_{i=1}^{m} \sum_{j=1}^{k} w_{j}^{(i)} \log \frac{\frac{1}{(2 \pi)^{n / 2}\left|\Sigma_{j}\right|^{1 / 2}} \exp \left(-\frac{1}{2}\left(x^{(i)}-\mu_{j}\right)^{T} \Sigma_{j}^{-1}\left(x^{(i)}-\mu_{j}\right)\right) \cdot \phi_{j}}{w_{j}^{(i)}}$$

   which leaves

$$\sum_{i=1}^{m} \sum_{j=1}^{k} w_{j}^{(i)} \log \phi_{j}$$
   Lagrange multipliers

   Since the multinomial probabilities must sum to 1, form the Lagrangian

$$\mathcal{L}(\phi)=\sum_{i=1}^{m} \sum_{j=1}^{k} w_{j}^{(i)} \log \phi_{j}+\beta\left(\sum_{j=1}^{k} \phi_{j}-1\right)$$

   Setting the partial derivatives to zero:

$$\begin{gathered} \frac{\partial}{\partial \phi_{j}} \mathcal{L}(\phi)=\sum_{i=1}^{m} \frac{w_{j}^{(i)}}{\phi_{j}}+\beta=0 \\ -\beta=\sum_{j=1}^{k} \sum_{i=1}^{m} w_{j}^{(i)}=\sum_{i=1}^{m} 1=m \\ \phi_{j}:=\frac{1}{m} \sum_{i=1}^{m} w_{j}^{(i)} \end{gathered}$$
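This final update is just the average responsibility of each component across the data set; as a one-line sketch (with `W` again assumed to be the $m \times k$ responsibility matrix):

```python
import numpy as np

def update_weights(W):
    """phi_j = (1/m) sum_i w_j^(i): component j's average responsibility."""
    return W.mean(axis=0)
```

Because each row of `W` sums to 1, the resulting weights automatically sum to 1 as well.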
   In summary: for all the data points, component $k$ can be viewed as having generated the (fractionally weighted) points $\left\{\gamma(i, k) x_{i} \mid i=1,2, \cdots, N\right\}$. Since component $k$ is a single Gaussian distribution, the results above give:

$$\left\{\begin{array}{l} \mu_{k}=\frac{1}{N_{k}} \sum_{i=1}^{N} \gamma(i, k) x_{i} \\ \Sigma_{k}=\frac{1}{N_{k}} \sum_{i=1}^{N} \gamma(i, k)\left(x_{i}-\mu_{k}\right)\left(x_{i}-\mu_{k}\right)^{T} \\ \pi_{k}=\frac{1}{N} \sum_{i=1}^{N} \gamma(i, k) \\ N_{k}=N \cdot \pi_{k} \end{array}\right.$$
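Putting the E-step and the three M-step updates together gives a complete, if minimal, EM fit for a GMM. This is a sketch under simplifying assumptions (random data points as initial means, a small ridge added to each covariance for numerical stability), not a production implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_em(X, k, n_iter=100, seed=0):
    """Fit a k-component GMM by EM: alternate the responsibilities
    gamma(i, k) (E-step) with the closed-form updates for mu_k, Sigma_k,
    pi_k derived above (M-step)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    mus = X[rng.choice(m, k, replace=False)]          # arbitrary init choice
    sigmas = np.stack([np.cov(X.T) + 1e-6 * np.eye(n)] * k)
    pis = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: gamma[i, j] proportional to pi_j * N(x_i; mu_j, Sigma_j)
        G = np.stack([pis[j] * multivariate_normal.pdf(X, mean=mus[j], cov=sigmas[j])
                      for j in range(k)], axis=1)
        G /= G.sum(axis=1, keepdims=True)
        # M-step: effective counts N_k, then the summary updates
        Nk = G.sum(axis=0)
        mus = (G.T @ X) / Nk[:, None]
        sigmas = np.stack([
            ((G[:, j, None] * (X - mus[j])).T @ (X - mus[j])) / Nk[j]
            + 1e-6 * np.eye(n)                        # ridge for stability
            for j in range(k)])
        pis = Nk / m
    return pis, mus, sigmas
```

One useful sanity check: after any M-step, the mixture mean $\sum_k \pi_k \mu_k$ equals the sample mean of the data exactly, since the responsibilities for each point sum to one.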
