GMM

GMM

  • One Gaussian distribution per class
  • $N(\mu_k,\Sigma_k)$
Supervised / unsupervised / semi-supervised
  • Objective functions:
    • Supervised: $L=\log p(X_l,Y_l\mid\theta)=\sum_{i=1}^{l}\log p(y_i\mid\theta)\,p(x_i\mid y_i,\theta)=\sum_{i=1}^{l}\log \alpha_{y_i}N(x_i\mid\theta_{y_i})$
    • Unsupervised: $p(x;\theta)=\prod_{i=1}^{N}\sum_{k=1}^{K}\pi_k N(x_i\mid\mu_k,\Sigma_k)$
    • Semi-supervised: $L=\sum_{i=1}^{l}\log \alpha_{y_i}N(x_i\mid\theta_{y_i})+\sum_{i=l+1}^{m}\log\sum_{k=1}^{K}\alpha_k N(x_i\mid\theta_k)$
  • E-step: compute $\gamma_{ik}=p(y_i=k\mid x_i)=\dfrac{\alpha_k N(x_i\mid\theta_k)}{\sum_{j=1}^{K}\alpha_j N(x_i\mid\theta_j)}$
  • M-step: closed-form weighted updates for $\mu_k$, $\Sigma_k$, $\alpha_k$ (labeled points counted with weight 1, unlabeled points with weight $\gamma_{ik}$)
Semi-supervised = supervised + unsupervised

Supervised

  • Objective: $L=\log p(X_l,Y_l\mid\theta)=\sum_{i=1}^{l}\log p(y_i\mid\theta)\,p(x_i\mid y_i,\theta)$, where $\theta_k=\{\alpha_k,\mu_k,\Sigma_k\}$
  • $=\sum_{i=1}^{l}\log \alpha_{y_i}N(x_i\mid\theta_{y_i})=\sum_{i=1}^{l}\left(\log\alpha_{y_i}-\frac{n}{2}\log(2\pi)-\frac{1}{2}\log|\Sigma_{y_i}|-\frac{1}{2}(x_i-\mu_{y_i})^T\Sigma_{y_i}^{-1}(x_i-\mu_{y_i})\right)$
  • Differentiating directly yields the closed-form solution:
  • $\mu_k=\frac{1}{l_k}\sum_{i\in D_l,\,y_i=k}x_i,\quad \Sigma_k=\frac{1}{l_k}\sum_{i\in D_l,\,y_i=k}(x_i-\mu_k)(x_i-\mu_k)^T,\quad \alpha_k=\frac{l_k}{l}$
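These closed-form supervised updates are easy to verify in code. A minimal NumPy sketch (the function name `fit_supervised_gmm` is illustrative; labels are assumed to be integers $0,\dots,K-1$):

```python
import numpy as np

def fit_supervised_gmm(X, y, K):
    """Closed-form MLE for class-conditional Gaussians.

    alpha_k = l_k / l,  mu_k = mean of class k,
    Sigma_k = MLE covariance of class k (divide by l_k, not l_k - 1).
    """
    l, n = X.shape
    alpha = np.zeros(K)
    mu = np.zeros((K, n))
    Sigma = np.zeros((K, n, n))
    for k in range(K):
        Xk = X[y == k]                 # labeled points of class k
        lk = len(Xk)
        alpha[k] = lk / l              # class prior
        mu[k] = Xk.mean(axis=0)        # class mean
        diff = Xk - mu[k]
        Sigma[k] = diff.T @ diff / lk  # class covariance (MLE)
    return alpha, mu, Sigma
```

No iteration is needed here: with labels observed, every term of the log-likelihood involves only one class's parameters, so each class is fit independently.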

Unsupervised

5.2 GMM (Gaussian Mixture Model) and EM


  • Probabilistic interpretation: assume there are $K$ clusters, each following a Gaussian distribution. With probability $\pi_k$, pick a cluster $k$ at random, then draw a sample from its distribution; repeating this generates the observed data.
  • Likelihood of the $N$ sample points $x$:
    • $p(x;\theta)=\prod_{i=1}^{N}\sum_{k=1}^{K}\pi_k N(x_i\mid\mu_k,\Sigma_k)$, where $\sum_k\pi_k=1$, $0\le\pi_k\le1$
    • Introduce a latent variable $z$ indicating cluster membership ($K$-dimensional one-hot):
      • $p(z_k=1)=\pi_k$
      • $p(x_i\mid z)=\prod_{k=1}^{K}N(x_i\mid\mu_k,\Sigma_k)^{z_k}$
        • $p(x_i\mid z_k=1)=N(x_i\mid\mu_k,\Sigma_k)$
      • $p(x_i)=\sum_z p(x_i\mid z)p(z)=\sum_{k=1}^{K}\pi_k N(x_i\mid\mu_k,\Sigma_k)$
  • Responsibility (interpretable as how much cluster $k$ explains $x_i$):
    • $\gamma(z_{ik})=p(z_{ik}=1\mid x_i)=\dfrac{p(z_{ik}=1)\,p(x_i\mid z_{ik}=1)}{\sum_{j=1}^{K}p(z_{ij}=1)\,p(x_i\mid z_{ij}=1)}=\dfrac{\pi_k N(x_i\mid\mu_k,\Sigma_k)}{\sum_{j=1}^{K}\pi_j N(x_i\mid\mu_j,\Sigma_j)}$
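The responsibility formula translates directly into code. A NumPy sketch (function names are illustrative; the log-space computation is an implementation choice to avoid underflow for distant points, not part of the formula itself):

```python
import numpy as np

def log_gaussian(X, mu, Sigma):
    """Row-wise log N(x | mu, Sigma)."""
    n = X.shape[1]
    diff = X - mu
    quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

def responsibilities(X, pi, mu, Sigma):
    """gamma[i,k] = pi_k N(x_i|mu_k,Sigma_k) / sum_j pi_j N(x_i|mu_j,Sigma_j)."""
    K = len(pi)
    logp = np.stack([np.log(pi[k]) + log_gaussian(X, mu[k], Sigma[k])
                     for k in range(K)], axis=1)
    logp -= logp.max(axis=1, keepdims=True)   # subtract max before exp (stability)
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)
```

Each row of the returned matrix sums to 1: it is the posterior distribution over clusters for one sample.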

Parameter learning: maximum likelihood estimation via EM

  • Maximum likelihood estimation
    • Difficulty: the sum sits inside the log, so all parameters are coupled
    • Conditions at the maximum: differentiate $\log P(x\mid\theta)$ with respect to $\mu_k$
      • $0=-\sum_{i=1}^{N}\dfrac{\pi_k N(x_i\mid\mu_k,\Sigma_k)}{\sum_{j=1}^{K}\pi_j N(x_i\mid\mu_j,\Sigma_j)}\,\Sigma_k^{-1}(x_i-\mu_k)$
        • $\mu_k=\dfrac{\sum_i\gamma(z_{ik})x_i}{\sum_i\gamma(z_{ik})}$
        • $\pi_k=\dfrac{\sum_i\gamma(z_{ik})}{N}$
        • $\Sigma_k=\dfrac{\sum_i\gamma(z_{ik})(x_i-\mu_k)(x_i-\mu_k)^T}{\sum_i\gamma(z_{ik})}$
      • Not a closed-form solution, since $\gamma(z_{ik})$ itself depends on the parameters → EM
        • E-step: given the current parameter estimates, compute the posterior $\gamma(z_{ik})=E(z_{ik})$
        • M-step: using the posteriors $\gamma(z_{ik})$, re-estimate the parameters $\mu_k$, $\pi_k$, $\Sigma_k$
        • Iterate; converges to a local maximum of the likelihood
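Putting the E- and M-steps together gives the standard EM loop for a GMM. A minimal NumPy sketch (the small ridge added to each covariance and the data-point initialization are implementation assumptions for numerical stability, not part of the derivation):

```python
import numpy as np

def gauss_pdf(X, mu, Sigma):
    """Row-wise density N(x | mu, Sigma)."""
    n = X.shape[1]
    diff = X - mu
    quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

def em_gmm(X, K, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    N, n = X.shape
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(N, K, replace=False)]   # init means from random data points
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(n) for _ in range(K)])
    for _ in range(iters):
        # E-step: responsibilities gamma[i,k]
        dens = np.stack([pi[k] * gauss_pdf(X, mu[k], Sigma[k])
                         for k in range(K)], axis=1)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted MLE updates from the formulas above
        Nk = gamma.sum(axis=0)
        pi = Nk / N
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = ((gamma[:, k, None] * diff).T @ diff) / Nk[k] \
                       + 1e-6 * np.eye(n)
    return pi, mu, Sigma
```

Because each iteration is an EM step, the total log-likelihood is non-decreasing (up to the tiny ridge term), which is exactly the convergence guarantee proved later in these notes.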
EM
  • General EM
    • Objective: the log-likelihood $\log P(X\mid\theta)=\log\sum_z P(x,z\mid\theta)$
    • Used for: the log-likelihood of incomplete data
      • $Z$ is unobserved; only its posterior $P(z\mid x,\theta^{old})$ is available
      • Consider the expectation $Q(\theta,\theta^{old})=E_{p(z\mid x,\theta^{old})}\left[\log P(x,z\mid\theta)\right]$
      • Maximize it: $\theta^{new}=\arg\max_\theta Q(\theta,\theta^{old})$
    • E-step: compute $P(z\mid x,\theta^{old})$
    • M-step: $\theta^{new}=\arg\max_\theta Q(\theta,\theta^{old})$
      • Why does this heuristic-looking procedure still optimize the likelihood?
        • Maximizing $Q(\theta,\theta^{old})=E_{p(z\mid x,\theta^{old})}\left[\log P(x,z\mid\theta)\right]$ never decreases $\log p(x;\theta)$; the convergence proof makes this precise.
    • Comparing complete and incomplete data
    • Incomplete data: $\log p(x)=\sum_i\log\sum_z p(x_i\mid z)p(z)=\sum_i\log\sum_{k=1}^{K}\pi_k N(x_i\mid\mu_k,\Sigma_k)$
      • With incomplete data the parameters are coupled and no closed-form solution exists
    • Complete data
      • $\log p(x,z\mid\theta)=\log p(z\mid\theta)p(x\mid z,\theta)=\sum_i\sum_k z_{ik}\left(\log\pi_k+\log N(x_i\mid\mu_k,\Sigma_k)\right)$
      • $E_z\left[\log p(x,z\mid\theta)\right]=\sum_i\sum_k E[z_{ik}]\left(\log\pi_k+\log N(x_i\mid\mu_k,\Sigma_k)\right)=\sum_i\sum_k\gamma(z_{ik})\left(\log\pi_k+\log N(x_i\mid\mu_k,\Sigma_k)\right)$
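The expected complete-data log-likelihood above is just a weighted sum, which makes it easy to evaluate. A short sketch (NumPy, illustrative names):

```python
import numpy as np

def log_gauss(X, mu, Sigma):
    """Row-wise log N(x | mu, Sigma)."""
    n = X.shape[1]
    diff = X - mu
    quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
    return -0.5 * (n * np.log(2 * np.pi) + np.linalg.slogdet(Sigma)[1] + quad)

def expected_complete_ll(X, gamma, pi, mu, Sigma):
    """Q = sum_i sum_k gamma_ik (log pi_k + log N(x_i | mu_k, Sigma_k))."""
    K = len(pi)
    terms = np.stack([np.log(pi[k]) + log_gauss(X, mu[k], Sigma[k])
                      for k in range(K)], axis=1)
    return float((gamma * terms).sum())
```

Unlike the incomplete-data log-likelihood, the log acts on each Gaussian term separately here, so maximizing this quantity over $\pi_k,\mu_k,\Sigma_k$ decouples into per-cluster problems.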
EM convergence guarantee
  • Goal: maximize $P(x\mid\theta)=\sum_z p(x,z\mid\theta)$
    • Directly optimizing $P(x\mid\theta)$ is hard, but optimizing the complete-data $p(x,z\mid\theta)$ is easy
  • Proof
    • Decomposition
    • For any distribution $q(z)$, the following decomposition holds:
      • $\ln p(x\mid\theta)=L(q,\theta)+KL(q\,\|\,p)$, where $L(q,\theta)=\sum_z q(z)\ln\dfrac{p(x,z\mid\theta)}{q(z)}$ and $KL(q\,\|\,p)=-\sum_z q(z)\ln\dfrac{p(z\mid x,\theta)}{q(z)}$. Since $KL(q\,\|\,p)\ge 0$, $L(q,\theta)$ is a lower bound on $\ln p(x\mid\theta)$.
    • E-step: maximize $L(q,\theta)$ over $q$ by setting $q(z)=P(z\mid x,\theta^{old})$, which drives the KL term to zero
    • M-step: with $q$ fixed, the lower bound is $L(q,\theta)=\sum_z P(z\mid x,\theta^{old})\ln\dfrac{p(x,z\mid\theta)}{q(z)}=Q(\theta,\theta^{old})+\text{const}$, which is exactly the expectation above
    • Each iteration raises the lower bound, so the likelihood never decreases
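The decomposition can be checked numerically on a toy discrete model (the probability tables below are made-up values purely for illustration):

```python
import numpy as np

# Toy model: latent z in {0, 1}, one fixed observation x.
p_z = np.array([0.3, 0.7])           # p(z | theta)
p_x_given_z = np.array([0.2, 0.9])   # p(x | z, theta)
q = np.array([0.5, 0.5])             # an arbitrary distribution q(z)

p_xz = p_z * p_x_given_z             # joint p(x, z | theta)
p_x = p_xz.sum()                     # marginal p(x | theta)
post = p_xz / p_x                    # posterior p(z | x, theta)

L_bound = np.sum(q * np.log(p_xz / q))   # L(q, theta)
kl = -np.sum(q * np.log(post / q))       # KL(q || p(z|x,theta)) >= 0

# ln p(x|theta) equals L(q, theta) + KL(q || p) for any choice of q
print(np.log(p_x), L_bound + kl)
```

Changing `q` to any other valid distribution leaves the sum unchanged; only the split between the bound and the KL gap moves, which is precisely what the E-step exploits.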

Semi-supervised

  • Objective: $L=\log p(X_l,Y_l,X_u\mid\theta)=\sum_{i=1}^{l}\log p(y_i\mid\theta)\,p(x_i\mid y_i,\theta)+\sum_{i=l+1}^{m}\log\left(\sum_{k=1}^{K}p(y_i=k\mid\theta)\,p(x_i\mid y_i=k,\theta)\right)$, where $\theta_k=\{\alpha_k,\mu_k,\Sigma_k\}$
  • $=\sum_{i=1}^{l}\log \alpha_{y_i}N(x_i\mid\theta_{y_i})+\sum_{i=l+1}^{m}\log\sum_{k=1}^{K}\alpha_k N(x_i\mid\theta_k)=\sum_{i=1}^{l}\left(\log\alpha_{y_i}-\frac{n}{2}\log(2\pi)-\frac{1}{2}\log|\Sigma_{y_i}|-\frac{1}{2}(x_i-\mu_{y_i})^T\Sigma_{y_i}^{-1}(x_i-\mu_{y_i})\right)+\sum_{i=l+1}^{m}\log\sum_{k=1}^{K}\alpha_k\frac{1}{(2\pi)^{n/2}|\Sigma_k|^{1/2}}\exp\left\{-\frac{1}{2}(x_i-\mu_k)^T\Sigma_k^{-1}(x_i-\mu_k)\right\}$
  • E-step: for the unlabeled points, compute $\gamma_{ik}=p(y_i=k\mid x_i)=\dfrac{\alpha_k N(x_i\mid\theta_k)}{\sum_{j=1}^{K}\alpha_j N(x_i\mid\theta_j)}$
  • M-step: $\mu_k=\dfrac{1}{l_k+\sum_{i=l+1}^{m}\gamma_{ik}}\left(\sum_{i\in D_l,\,y_i=k}x_i+\sum_{i=l+1}^{m}\gamma_{ik}x_i\right)$, $\Sigma_k=\dfrac{1}{l_k+\sum_{i=l+1}^{m}\gamma_{ik}}\left(\sum_{i\in D_l,\,y_i=k}(x_i-\mu_k)(x_i-\mu_k)^T+\sum_{i=l+1}^{m}\gamma_{ik}(x_i-\mu_k)(x_i-\mu_k)^T\right)$, $\alpha_k=\dfrac{l_k+\sum_{i=l+1}^{m}\gamma_{ik}}{m}$
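The semi-supervised E/M updates above can be sketched as follows (NumPy; the function name, labeled-data initialization, and covariance ridge are illustrative assumptions, not prescribed by the derivation):

```python
import numpy as np

def gauss_pdf(X, mu, Sigma):
    """Row-wise density N(x | mu, Sigma)."""
    n = X.shape[1]
    diff = X - mu
    quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
    return np.exp(-0.5 * quad) / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma)))

def semi_supervised_gmm(Xl, yl, Xu, K, iters=50):
    """EM where labeled points keep hard labels and unlabeled points get soft ones."""
    m = len(Xl) + len(Xu)
    n = Xl.shape[1]
    # Initialize from the labeled data alone
    mu = np.stack([Xl[yl == k].mean(axis=0) for k in range(K)])
    Sigma = np.stack([np.cov(Xl[yl == k].T) + 1e-6 * np.eye(n) for k in range(K)])
    alpha = np.array([(yl == k).mean() for k in range(K)])
    for _ in range(iters):
        # E-step: responsibilities on the unlabeled points only
        dens = np.stack([alpha[k] * gauss_pdf(Xu, mu[k], Sigma[k])
                         for k in range(K)], axis=1)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: hard counts from labeled data + soft counts from unlabeled
        for k in range(K):
            Xk = Xl[yl == k]
            wk = len(Xk) + gamma[:, k].sum()          # l_k + sum_i gamma_ik
            mu[k] = (Xk.sum(axis=0) + gamma[:, k] @ Xu) / wk
            dl, du = Xk - mu[k], Xu - mu[k]
            Sigma[k] = (dl.T @ dl + (gamma[:, k, None] * du).T @ du) / wk \
                       + 1e-6 * np.eye(n)
            alpha[k] = wk / m
    return alpha, mu, Sigma
```

This makes the "semi-supervised = supervised + unsupervised" slogan concrete: each sufficient statistic is the labeled (weight-1) sum plus the responsibility-weighted unlabeled sum.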