The Expectation-Maximization (EM) Algorithm

The Expectation-Maximization algorithm, closely related to the MM algorithm (Minorize-Maximization, of which EM can be viewed as a special case), finds maximum-likelihood or maximum a posteriori (MAP) estimates of parameters in probabilistic models that depend on unobserved latent variables.
EM alternates between an E-step and an M-step until a convergence criterion is met, so it is an iterative algorithm.
Typical use cases for EM: incomplete data, i.e. data with missing values. Many machine-learning models are also fitted with EM, for example GMM (Gaussian mixture models) and HMM (hidden Markov models).

1. General Steps of the EM Algorithm

E-step: use the available data to estimate (guess) the values of the latent variables;
M-step: use the estimates produced in the E-step to update the parameters on the completed data.
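This alternation can be sketched concretely. Below is a minimal EM fit of a 1-D two-component Gaussian mixture in pure Python; the function names, the crude initialization scheme, and the synthetic data are all illustrative, not from the original post:

```python
import math
import random

def normal_pdf(x, mu, sigma):
    # Gaussian density N(x | mu, sigma^2)
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def em_gmm(xs, iters=50):
    # crude initialization: weight 0.5, means at the extremes, unit stds
    pi, mu1, mu2, s1, s2 = 0.5, min(xs), max(xs), 1.0, 1.0
    for _ in range(iters):
        # E-step: posterior responsibility of component 1 for each point
        r = []
        for x in xs:
            p1 = pi * normal_pdf(x, mu1, s1)
            p2 = (1 - pi) * normal_pdf(x, mu2, s2)
            r.append(p1 / (p1 + p2))
        # M-step: re-estimate parameters from the responsibilities
        n1 = sum(r)
        n2 = len(xs) - n1
        mu1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        mu2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / n2
        s1 = math.sqrt(sum(ri * (x - mu1) ** 2 for ri, x in zip(r, xs)) / n1) or 1e-6
        s2 = math.sqrt(sum((1 - ri) * (x - mu2) ** 2 for ri, x in zip(r, xs)) / n2) or 1e-6
        pi = n1 / len(xs)
    return pi, mu1, mu2, s1, s2

random.seed(0)
data = [random.gauss(-2, 0.5) for _ in range(200)] + [random.gauss(3, 0.8) for _ in range(200)]
pi, mu1, mu2, s1, s2 = em_gmm(data)
print(round(mu1, 1), round(mu2, 1))  # cluster means recovered near -2 and 3
```

Which point belongs to which component is never observed; the E-step fills that gap with posterior responsibilities, and the M-step re-fits the parameters as if the data were complete.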

2. The EM Formula

Here we take maximum likelihood estimation as the criterion:

$$\hat\theta_{MLE} = \arg\max_\theta \log P(X|\theta)$$

The EM update formula:

$$\theta^{(t+1)} = \arg\max_\theta \int_Z \log P(X,Z|\theta)\,P(Z|X,\theta^{(t)})\,dZ$$

The integral $\int_Z \log P(X,Z|\theta)\,P(Z|X,\theta^{(t)})\,dZ$ can also be written as $E_{Z|X,\theta^{(t)}}[\log P(X,Z|\theta)]$, or, for discrete $Z$, as $\sum_Z \log P(X,Z|\theta)\,P(Z|X,\theta^{(t)})$.

3. Proof of Convergence

The proof here is not fully rigorous. To show convergence, we need to show that as $\theta^{(t)} \to \theta^{(t+1)}$, the likelihood does not decrease: $P(X|\theta^{(t)}) \le P(X|\theta^{(t+1)})$. The proof goes as follows.

Proof:
$$\log P(X|\theta) = \log P(X,Z|\theta) - \log P(Z|X,\theta)$$
Take the expectation of both sides with respect to the distribution $Z|X,\theta^{(t)}$.

Left-hand side:
$$\begin{aligned} E_{Z|X,\theta^{(t)}}[\log P(X|\theta)] &= \int_Z \log P(X|\theta)\,P(Z|X,\theta^{(t)})\,dZ \\ &= \log P(X|\theta)\int_Z P(Z|X,\theta^{(t)})\,dZ \\ &= \log P(X|\theta) \end{aligned}$$

Right-hand side:
$$E_{Z|X,\theta^{(t)}}\!\left[\log P(X,Z|\theta) - \log P(Z|X,\theta)\right] = \int_Z \log P(X,Z|\theta)\,P(Z|X,\theta^{(t)})\,dZ - \int_Z \log P(Z|X,\theta)\,P(Z|X,\theta^{(t)})\,dZ$$
Define $Q(\theta,\theta^{(t)}) = \int_Z \log P(X,Z|\theta)\,P(Z|X,\theta^{(t)})\,dZ$ and $H(\theta,\theta^{(t)}) = \int_Z \log P(Z|X,\theta)\,P(Z|X,\theta^{(t)})\,dZ$.

Then
$$E_{Z|X,\theta^{(t)}}\!\left[\log P(X,Z|\theta) - \log P(Z|X,\theta)\right] = Q(\theta,\theta^{(t)}) - H(\theta,\theta^{(t)})$$

From the EM update formula (since $\theta^{(t+1)}$ is precisely the $\theta$ that maximizes $Q(\theta,\theta^{(t)})$):
$$Q(\theta^{(t+1)},\theta^{(t)}) \ge Q(\theta,\theta^{(t)})$$
In particular, taking $\theta = \theta^{(t)}$:
$$Q(\theta^{(t+1)},\theta^{(t)}) \ge Q(\theta^{(t)},\theta^{(t)})$$

As for $H(\theta,\theta^{(t)})$:
$$\begin{aligned} H(\theta^{(t+1)},\theta^{(t)}) - H(\theta^{(t)},\theta^{(t)}) &= \int_Z \log P(Z|X,\theta^{(t+1)})\,P(Z|X,\theta^{(t)})\,dZ - \int_Z \log P(Z|X,\theta^{(t)})\,P(Z|X,\theta^{(t)})\,dZ \\ &= \int_Z P(Z|X,\theta^{(t)})\log\frac{P(Z|X,\theta^{(t+1)})}{P(Z|X,\theta^{(t)})}\,dZ \\ &= -KL\!\left(P(Z|X,\theta^{(t)})\,\|\,P(Z|X,\theta^{(t+1)})\right) \\ &\le 0 \end{aligned}$$
That is, $H(\theta^{(t+1)},\theta^{(t)}) \le H(\theta^{(t)},\theta^{(t)})$.
Combining the two inequalities, $\log P(X|\theta^{(t+1)}) = Q(\theta^{(t+1)},\theta^{(t)}) - H(\theta^{(t+1)},\theta^{(t)}) \ge Q(\theta^{(t)},\theta^{(t)}) - H(\theta^{(t)},\theta^{(t)}) = \log P(X|\theta^{(t)})$, so $P(X|\theta^{(t)}) \le P(X|\theta^{(t+1)})$. QED.
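This monotonicity can be checked numerically. The sketch below runs EM on a classic two-coin mixture (each trial reports the number of heads out of 10 tosses; which coin was used is latent) and records the marginal log-likelihood at every iteration. The data and initial parameters are made up for illustration:

```python
import math

def loglik(data, pi, pa, pb, n=10):
    # marginal log-likelihood log P(X | theta) of the two-coin mixture
    ll = 0.0
    for x in data:
        fa = math.comb(n, x) * pa**x * (1 - pa)**(n - x)
        fb = math.comb(n, x) * pb**x * (1 - pb)**(n - x)
        ll += math.log(pi * fa + (1 - pi) * fb)
    return ll

# heads out of 10 tosses per trial; which coin produced each trial is latent
data = [5, 9, 8, 4, 7, 3, 8, 6, 9, 2]
pi, pa, pb = 0.5, 0.6, 0.5
lls = []
for _ in range(30):
    lls.append(loglik(data, pi, pa, pb))
    # E-step: posterior probability that each trial came from coin A
    # (the binomial coefficient cancels in the ratio)
    r = []
    for x in data:
        fa = pa**x * (1 - pa)**(10 - x)
        fb = pb**x * (1 - pb)**(10 - x)
        r.append(pi * fa / (pi * fa + (1 - pi) * fb))
    # M-step: closed-form maximizers of Q(theta, theta_t)
    pi = sum(r) / len(data)
    pa = sum(ri * x for ri, x in zip(r, data)) / sum(10 * ri for ri in r)
    pb = sum((1 - ri) * x for ri, x in zip(r, data)) / sum(10 * (1 - ri) for ri in r)

# the likelihood never decreases across iterations, matching the proof above
assert all(b >= a - 1e-9 for a, b in zip(lls, lls[1:]))
```

Every EM iteration here provably moves the log-likelihood up (or leaves it fixed), exactly the inequality $P(X|\theta^{(t)}) \le P(X|\theta^{(t+1)})$ just derived.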

4. Derivation of the Formula, Method 1

Notation:
- $X$: observed data, $X = \{x_1, x_2, \cdots, x_N\}$
- $Z$: unobserved (latent) data, $Z = \{z_i\}_{i=1}^K$
- $(X, Z)$: complete data
- $\theta$: parameter

4.1 The E-M step formulas

E-step: compute the posterior $P(Z|X,\theta^{(t)})$ and form the expectation
$$E_{Z|X,\theta^{(t)}}[\log P(X,Z|\theta)]$$
M-step:
$$\theta^{(t+1)} = \arg\max_\theta E_{Z|X,\theta^{(t)}}[\log P(X,Z|\theta)]$$

4.2 Derivation

$$\log P(X|\theta) = \log P(X,Z|\theta) - \log P(Z|X,\theta)$$
Equivalently, introducing a distribution $q(Z)$:
$$\log P(X|\theta) = \log\frac{P(X,Z|\theta)}{q(Z)} - \log\frac{P(Z|X,\theta)}{q(Z)}, \quad q(Z) \ne 0$$

Take the expectation of both sides with respect to the distribution $q(Z)$.

For the left-hand side:
$$E_{q(Z)}[\log P(X|\theta)] = \int_Z \log P(X|\theta)\,q(Z)\,dZ = \log P(X|\theta)\int_Z q(Z)\,dZ = \log P(X|\theta)$$

For the right-hand side:
$$\begin{aligned} E_{q(Z)}\!\left[\log\frac{P(X,Z|\theta)}{q(Z)} - \log\frac{P(Z|X,\theta)}{q(Z)}\right] &= \int_Z \log\frac{P(X,Z|\theta)}{q(Z)}\,q(Z)\,dZ - \int_Z \log\frac{P(Z|X,\theta)}{q(Z)}\,q(Z)\,dZ \\ &= ELBO + KL\!\left(q(Z)\,\|\,P(Z|X,\theta)\right) \end{aligned}$$
where $P(Z|X,\theta)$ is the posterior and ELBO stands for the evidence lower bound.

$$\begin{aligned} &\therefore\ \log P(X|\theta) = ELBO + KL(q\,\|\,P) \\ &\because\ KL(q\,\|\,P) \ge 0 \\ &\therefore\ \log P(X|\theta) \ge ELBO \end{aligned}$$
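This decomposition can be verified on a tiny discrete example; the joint-probability table and the choice of $q$ below are arbitrary illustrations:

```python
import math

# tiny discrete model: one fixed observation x, latent Z in {0, 1}
# illustrative joint probabilities P(x, Z = z | theta)
joint = {0: 0.12, 1: 0.28}
evidence = sum(joint.values())                   # P(x | theta) = 0.40
post = {z: joint[z] / evidence for z in joint}   # posterior P(z | x, theta)

q = {0: 0.7, 1: 0.3}                             # an arbitrary distribution over Z

elbo = sum(q[z] * math.log(joint[z] / q[z]) for z in q)   # E_q[log(joint/q)]
kl = sum(q[z] * math.log(q[z] / post[z]) for z in q)      # KL(q || posterior)

# decomposition: log P(x) = ELBO + KL(q || posterior), hence ELBO <= log P(x)
assert abs(math.log(evidence) - (elbo + kl)) < 1e-12
assert elbo <= math.log(evidence)

# with q equal to the posterior, KL = 0 and the bound is tight
elbo_tight = sum(post[z] * math.log(joint[z] / post[z]) for z in post)
assert abs(elbo_tight - math.log(evidence)) < 1e-12
```

The identity holds for any choice of $q$; only the split between ELBO and KL changes.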

Maximizing $\log P(X|\theta)$ is then pursued by maximizing the ELBO:
$$\hat\theta^{(t+1)} = \arg\max_\theta ELBO = \arg\max_\theta \int_Z \log\frac{P(X,Z|\theta)}{q(Z)}\,q(Z)\,dZ$$

Since $KL = 0$ when $q = P$, equality in $\log P(X|\theta) = ELBO$ holds exactly when $q$ is the posterior.

So take $q(Z) = P(Z|X,\theta^{(t)})$, the posterior under the previous iterate. Therefore:
$$\begin{aligned} \hat\theta^{(t+1)} &= \arg\max_\theta \int_Z \log\frac{P(X,Z|\theta)}{P(Z|X,\theta^{(t)})}\,P(Z|X,\theta^{(t)})\,dZ \\ &= \arg\max_\theta \int_Z \left[\log P(X,Z|\theta) - \log P(Z|X,\theta^{(t)})\right] P(Z|X,\theta^{(t)})\,dZ \end{aligned}$$

Here $\theta^{(t)}$ is the previous iterate, so it can be treated as a constant; the subtracted term does not involve the optimization variable $\theta$ and can be dropped.

Then:
$$\hat\theta^{(t+1)} = \arg\max_\theta \int_Z \log P(X,Z|\theta)\,P(Z|X,\theta^{(t)})\,dZ$$

QED.

5. Derivation of the Formula, Method 2 (via Jensen's Inequality)

5.1 Jensen's inequality

For a concave function $f$ and $t \in [0,1]$, Jensen's inequality states $f(t\,a + (1-t)\,b) \ge t\,f(a) + (1-t)\,f(b)$. In particular, taking $t = \frac{1}{2}$ gives $f\!\left(\frac{a+b}{2}\right) \ge \frac{f(a)+f(b)}{2}$. In expectation form: $f(E[\cdot]) \ge E[f(\cdot)]$, i.e. the function of the expectation dominates the expectation of the function.
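A quick numeric sanity check of Jensen's inequality for the concave function $\log$; the sample values below are illustrative:

```python
import math
import random

random.seed(1)
ys = [random.uniform(0.5, 4.0) for _ in range(10000)]

mean_then_log = math.log(sum(ys) / len(ys))             # f(E[Y])
log_then_mean = sum(math.log(y) for y in ys) / len(ys)  # E[f(Y)]

# log is concave, so f(E[Y]) >= E[f(Y)]
assert mean_then_log >= log_then_mean

# two-point form: f(t*a + (1-t)*b) >= t*f(a) + (1-t)*f(b)
a, b, t = 1.0, 9.0, 0.3
assert math.log(t * a + (1 - t) * b) >= t * math.log(a) + (1 - t) * math.log(b)
```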

5.2 Understanding the E-M steps

In the E-step, $\theta^{(t)}$ is the parameter from the previous iteration and is treated as a constant; we take the expectation of the complete-data log-likelihood $\log P(X,Z|\theta)$ with respect to the posterior $Z|X,\theta^{(t)}$, which yields a function of $\theta$.
In the M-step, we maximize the expected function produced by the E-step over $\theta$ and take the maximizer as the new parameter $\theta^{(t+1)}$.

5.3 Derivation

$$\log P(X|\theta) = \log \int_Z P(X,Z|\theta)\,dZ$$

(integrating the joint distribution over the latent variable gives the marginal distribution)

Introduce a distribution $q(Z)$:
$$\log P(X|\theta) = \log \int_Z \frac{P(X,Z|\theta)}{q(Z)}\cdot q(Z)\,dZ = \log\left[E_{q(Z)}\!\left(\frac{P(X,Z|\theta)}{q(Z)}\right)\right]$$

By Jensen's inequality ($\log$ is concave):
$$\log P(X|\theta) \ge E_{q(Z)}\!\left[\log\frac{P(X,Z|\theta)}{q(Z)}\right] = ELBO$$

Equality holds when $\frac{P(X,Z|\theta)}{q(Z)} = c$ for some constant $c$.

Then $q(Z) = \frac{1}{c}P(X,Z|\theta)$; integrating both sides over $Z$:
$$\int_Z q(Z)\,dZ = \int_Z \frac{1}{c}P(X,Z|\theta)\,dZ \quad\Rightarrow\quad 1 = \frac{1}{c}P(X|\theta)$$
$$\therefore\ q(Z) = \frac{P(X,Z|\theta)}{P(X|\theta)} = P(Z|X,\theta)$$

The remaining steps are the same as in Method 1.

6. The Generalized EM Algorithm

① Narrow-sense EM is a special case of the generalized EM algorithm.
② In generative models, if $Z$ is too complex, the posterior $P(Z|X,\theta)$ becomes intractable. For models such as GMM and HMM, however, $Z$ is structured and relatively simple, so narrow-sense EM can be used for optimization.

$$\begin{aligned} \log P(X|\theta) &= ELBO + KL(q\,\|\,P) \\ &= E_{q(Z)}\!\left[\log\frac{P(X,Z|\theta)}{q(Z)}\right] - E_{q(Z)}\!\left[\log\frac{P(Z|X,\theta)}{q(Z)}\right] \end{aligned}$$

Generalized EM steps:

$$\left\{ \begin{aligned} &1.\ \text{fix } \theta:\ \hat q = \arg\min_q KL(q\,\|\,P) = \arg\max_q ELBO(q,\theta) \\ &2.\ \text{fix } \hat q:\ \theta = \arg\max_\theta ELBO(\hat q,\theta) \end{aligned} \right.$$

Correspondingly:

$$\left\{ \begin{aligned} &\text{E-step}:\ q^{(t+1)} = \arg\max_q ELBO(q,\theta^{(t)}) \\ &\text{M-step}:\ \theta^{(t+1)} = \arg\max_\theta ELBO(q^{(t+1)},\theta) \end{aligned} \right.$$
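The E-step of generalized EM, maximizing the ELBO over $q$ with $\theta$ fixed, can be illustrated on a small discrete model, where the posterior beats every other candidate $q$ and attains the evidence. The probability table below is an arbitrary illustration:

```python
import math

# illustrative joint P(x, z | theta) for one fixed observation x and fixed theta
joint = {0: 0.12, 1: 0.28}
evidence = sum(joint.values())                   # P(x | theta)
post = {z: joint[z] / evidence for z in joint}   # posterior P(z | x, theta)

def elbo(q):
    # ELBO(q, theta) = E_q[log(P(x, Z | theta) / q(Z))]
    return sum(q[z] * math.log(joint[z] / q[z]) for z in q)

# E-step of generalized EM: among candidate q's, the posterior maximizes the ELBO
for q in ({0: 0.5, 1: 0.5}, {0: 0.9, 1: 0.1}, {0: 0.2, 1: 0.8}):
    assert elbo(q) <= elbo(post) + 1e-12

# and at q = posterior the bound is tight: ELBO equals log P(x | theta)
assert abs(elbo(post) - math.log(evidence)) < 1e-12
```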


$$ELBO(q,\theta) = E_{q(Z)}\log P(X,Z|\theta) - E_{q(Z)}\log q(Z)$$

The term $-E_{q(Z)}\log q(Z)$ is the entropy $H[q(Z)]$, so
$$ELBO(q,\theta) = E_{q(Z)}\log P(X,Z|\theta) + H[q(Z)]$$
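A small numeric check that the two forms of the ELBO agree; the joint table and $q$ are illustrative:

```python
import math

joint = {0: 0.15, 1: 0.25}   # illustrative P(x, z | theta) for a fixed observation x
q = {0: 0.4, 1: 0.6}         # any distribution over Z

# form 1: E_q[log(P(X,Z|theta) / q(Z))]
elbo1 = sum(q[z] * math.log(joint[z] / q[z]) for z in q)

# form 2: E_q[log P(X,Z|theta)] + entropy H[q]
entropy = -sum(q[z] * math.log(q[z]) for z in q)
elbo2 = sum(q[z] * math.log(joint[z]) for z in q) + entropy

assert abs(elbo1 - elbo2) < 1e-12
```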
Generalized EM fixes one argument while optimizing the other, so it can be viewed from the perspective of coordinate ascent.

7. Improvements on the EM Algorithm

① Variational Bayesian EM: VBEM/VIEM/VEM — three abbreviations for essentially the same method;
② Monte Carlo EM: MCEM.

The above is summarized from a Bilibili uploader's whiteboard-derivation series, combined with Zhou Zhihua's "watermelon book" (Machine Learning). The series (机器学习白板推导系列) is highly recommended: it is accessible and reasonably up to date.
This is my first post; criticism and discussion are very welcome.
