Machine Learning: Gaussian Mixture Models

To overcome the unimodality of a single Gaussian model, we fit multimodal data with a weighted combination of several Gaussians:
$$p(x)=\sum_{k=1}^K\alpha_k\mathcal{N}(\mu_k,\Sigma_k)$$
Introduce a latent variable $z$ indicating which Gaussian component a sample $x$ belongs to; $z$ is a discrete random variable with:
$$p(z=i)=p_i,\qquad \sum_{i=1}^K p(z=i)=1$$
As a generative model, the GMM generates samples through the distribution of the latent variable $z$. As a probabilistic graphical model:

[Figure: graphical model of the GMM, with latent node $z$ and observed node $x$]
Here node $z$ carries the probabilities above, and $x$ is the sample drawn from the corresponding Gaussian. Then for $p(x)$:
$$p(x)=\sum_z p(x,z)=\sum_{k=1}^K p(x,z=k)=\sum_{k=1}^K p(z=k)\,p(x|z=k)$$
Therefore:
$$p(x)=\sum_{k=1}^K p_k\,\mathcal{N}(x|\mu_k,\Sigma_k)$$
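As a quick numeric sketch of this density (1-D case, NumPy only; the component parameters below are made up for illustration), the mixture $p(x)=\sum_k p_k\,\mathcal{N}(x|\mu_k,\Sigma_k)$ can be evaluated and checked to integrate to 1:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma2):
    # 1-D Gaussian density N(x | mu, sigma^2)
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def gmm_pdf(x, weights, mus, sigma2s):
    # p(x) = sum_k p_k * N(x | mu_k, sigma^2_k)
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, mus, sigma2s))

# illustrative two-component mixture (parameters chosen arbitrarily)
weights, mus, sigma2s = [0.4, 0.6], [-2.0, 3.0], [1.0, 2.0]

xs = np.linspace(-12.0, 14.0, 4001)
density = gmm_pdf(xs, weights, mus, sigma2s)
total = density.sum() * (xs[1] - xs[0])  # Riemann-sum check: should be ~1
```

The density is a convex combination of normalized components, so it is nonnegative and integrates to 1 whenever the weights do.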

Maximum Likelihood Estimation

The samples are $X=(x_1,x_2,\dots,x_N)$, $(X,Z)$ is the complete data, and the parameters are $\theta=\{p_1,\dots,p_K,\mu_1,\dots,\mu_K,\Sigma_1,\dots,\Sigma_K\}$. We estimate $\theta$ by maximum likelihood:
$$\theta_{MLE}=\mathop{argmax}\limits_{\theta}\log p(X)=\mathop{argmax}\limits_{\theta}\sum_{i=1}^N\log p(x_i)=\mathop{argmax}\limits_{\theta}\sum_{i=1}^N\log\sum_{k=1}^K p_k\,\mathcal{N}(x_i|\mu_k,\Sigma_k)$$
Because of the sum inside the logarithm, setting the derivative of this expression to zero yields no closed-form solution, so we turn to the EM algorithm.
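Note that the log-likelihood itself is still easy to evaluate; only its maximizer lacks a closed form. A minimal 1-D sketch (hypothetical parameters and synthetic data, using log-sum-exp for numerical stability):

```python
import numpy as np

def gmm_log_likelihood(X, weights, mus, sigma2s):
    # log p(X) = sum_i log sum_k p_k N(x_i | mu_k, sigma^2_k), 1-D case
    X = np.asarray(X)[:, None]                        # shape (N, 1)
    mus = np.asarray(mus)[None, :]                    # shape (1, K)
    sigma2s = np.asarray(sigma2s)[None, :]
    log_comp = (np.log(weights)
                - 0.5 * np.log(2 * np.pi * sigma2s)
                - (X - mus) ** 2 / (2 * sigma2s))     # log[p_k N(x_i|...)], (N, K)
    m = log_comp.max(axis=1, keepdims=True)           # log-sum-exp over components
    return float((m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))).sum())

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2.0, 1.0, 50), rng.normal(3.0, 1.5, 50)])
good = gmm_log_likelihood(X, [0.5, 0.5], [-2.0, 3.0], [1.0, 2.25])
bad = gmm_log_likelihood(X, [0.5, 0.5], [0.0, 0.0], [1.0, 1.0])
```

Parameters near the data-generating ones give a higher log-likelihood than a deliberately bad setting, which is exactly the objective EM will climb.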

Solving the GMM with EM

The basic EM update is $\theta^{t+1}=\mathop{argmax}\limits_{\theta}\mathbb{E}_{z|x,\theta^t}[\log p(x,z|\theta)]$. Applied to the GMM over the whole dataset:
$$Q(\theta,\theta^t)=\sum\limits_z\Big[\log\prod\limits_{i=1}^N p(x_i,z_i|\theta)\Big]\prod\limits_{i=1}^N p(z_i|x_i,\theta^t)=\sum\limits_z\Big[\sum\limits_{i=1}^N\log p(x_i,z_i|\theta)\Big]\prod\limits_{i=1}^N p(z_i|x_i,\theta^t)$$
Expanding the inner sum, the first term is:
$$\sum\limits_z\log p(x_1,z_1|\theta)\prod\limits_{i=1}^N p(z_i|x_i,\theta^t)=\sum\limits_{z_1}\log p(x_1,z_1|\theta)\,p(z_1|x_1,\theta^t)\sum\limits_{z_2,\cdots,z_N}\prod\limits_{i=2}^N p(z_i|x_i,\theta^t)\\ =\sum\limits_{z_1}\log p(x_1,z_1|\theta)\,p(z_1|x_1,\theta^t)$$
The trailing sum equals 1 because each $p(z_i|x_i,\theta^t)$ is a normalized distribution. Treating the other terms the same way, $Q$ can be written as:
$$Q(\theta,\theta^t)=\sum\limits_{i=1}^N\sum\limits_{z_i}\log p(x_i,z_i|\theta)\,p(z_i|x_i,\theta^t)$$
For $p(x,z|\theta)$:
$$p(x,z|\theta)=p(z|\theta)\,p(x|z,\theta)=p_z\,\mathcal{N}(x|\mu_z,\Sigma_z)$$
Substituting into $Q$:
$$Q=\sum\limits_{i=1}^N\sum\limits_{z_i}\log\big[p_{z_i}\mathcal{N}(x_i|\mu_{z_i},\Sigma_{z_i})\big]\frac{p_{z_i}^t\mathcal{N}(x_i|\mu_{z_i}^t,\Sigma_{z_i}^t)}{\sum\limits_k p_k^t\mathcal{N}(x_i|\mu_k^t,\Sigma_k^t)}$$
Writing the sum over $z_i$ as a sum over components $k$, and noting that the fraction is just the posterior $p(z_i=k|x_i,\theta^t)$, the quantity to maximize is:
$$Q=\sum\limits_{k=1}^K\sum\limits_{i=1}^N\big[\log p_k+\log\mathcal{N}(x_i|\mu_k,\Sigma_k)\big]p(z_i=k|x_i,\theta^t)$$

  1. For $p_k^{t+1}$:
    $$p_k^{t+1}=\mathop{argmax}\limits_{p_k}\sum\limits_{k=1}^K\sum\limits_{i=1}^N\big[\log p_k+\log\mathcal{N}(x_i|\mu_k,\Sigma_k)\big]p(z_i=k|x_i,\theta^t)\ s.t.\ \sum\limits_{k=1}^K p_k=1$$
    Since the Gaussian term does not depend on $p_k$, this is equivalent to:
    $$p_k^{t+1}=\mathop{argmax}\limits_{p_k}\sum\limits_{k=1}^K\sum\limits_{i=1}^N\log p_k\,p(z_i=k|x_i,\theta^t)\ s.t.\ \sum\limits_{k=1}^K p_k=1$$
    Introducing a Lagrange multiplier:
    $$L(p_k,\lambda)=\sum\limits_{k=1}^K\sum\limits_{i=1}^N\log p_k\,p(z_i=k|x_i,\theta^t)-\lambda\Big(1-\sum\limits_{k=1}^K p_k\Big)$$
    So:
    $$\frac{\partial}{\partial p_k}L=\sum\limits_{i=1}^N\frac{1}{p_k}p(z_i=k|x_i,\theta^t)+\lambda=0\\ \Rightarrow\sum\limits_k\sum\limits_{i=1}^N p(z_i=k|x_i,\theta^t)+\lambda\sum\limits_k p_k=0\\ \Rightarrow\lambda=-N$$
    (multiplying through by $p_k$, then summing over $k$, and using $\sum_k p(z_i=k|x_i,\theta^t)=1$).
    Hence:
    $$p_k^{t+1}=\frac{1}{N}\sum\limits_{i=1}^N p(z_i=k|x_i,\theta^t)$$
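Combined with the responsibilities $p(z_i=k|x_i,\theta^t)$ computed in the E-step, this update is a one-liner. A 1-D NumPy sketch (the component parameters and data are placeholders for illustration):

```python
import numpy as np

def responsibilities(X, weights, mus, sigma2s):
    # E-step: gamma[i, k] = p(z_i = k | x_i, theta^t), 1-D case
    X = np.asarray(X)[:, None]
    mus = np.asarray(mus)[None, :]
    sigma2s = np.asarray(sigma2s)[None, :]
    dens = (np.asarray(weights)
            * np.exp(-(X - mus) ** 2 / (2 * sigma2s))
            / np.sqrt(2 * np.pi * sigma2s))           # p_k N(x_i | mu_k, sigma^2_k)
    return dens / dens.sum(axis=1, keepdims=True)     # normalize over components k

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2.0, 1.0, 50), rng.normal(3.0, 1.5, 50)])
gamma = responsibilities(X, [0.5, 0.5], [-2.0, 3.0], [1.0, 2.25])
p_new = gamma.mean(axis=0)  # p_k^{t+1} = (1/N) sum_i gamma[i, k]
```

Each row of `gamma` sums to 1, so the new weights automatically satisfy the constraint $\sum_k p_k=1$.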
  2. For $\mu_k,\Sigma_k$: these parameters are unconstrained, so we differentiate $Q$ directly and set the derivatives to zero.
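For completeness, setting those derivatives to zero gives the standard M-step updates (writing $\gamma_{ik}=p(z_i=k|x_i,\theta^t)$ for brevity):

$$\mu_k^{t+1}=\frac{\sum\limits_{i=1}^N\gamma_{ik}\,x_i}{\sum\limits_{i=1}^N\gamma_{ik}},\qquad \Sigma_k^{t+1}=\frac{\sum\limits_{i=1}^N\gamma_{ik}\,(x_i-\mu_k^{t+1})(x_i-\mu_k^{t+1})^T}{\sum\limits_{i=1}^N\gamma_{ik}}$$

Each component's new mean and covariance are weighted sample statistics, with the responsibilities as weights.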