# Machine Learning Notes (17): Derivation of the EM Algorithm

## 1. Jensen's Inequality

The derivation of the EM algorithm relies on Jensen's inequality, so we introduce it first. For a convex function $\phi$,

$$\phi\left(\sum_{i=1}^{n}\lambda_i\, g(x_i)\right) \le \sum_{i=1}^{n}\lambda_i\, \phi\big(g(x_i)\big), \tag{17-1}$$

where $\lambda_1 + \lambda_2 + \cdots + \lambda_n = 1$ and $\lambda_i \ge 0$.

If $\phi$ is concave, the inequality is simply reversed. (Mainly based on reference [1].)
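Inequality (17-1) can be checked numerically; this is a small sketch, and the particular weights and values below are arbitrary illustrative choices:

```python
import math

lam = [0.2, 0.3, 0.5]              # weights: nonnegative, summing to 1
g   = [1.0, 4.0, 9.0]              # values g(x_i)

# Convex phi (here t^2): phi of the weighted average <= weighted average of phi
convex = lambda t: t * t
lhs = convex(sum(l * v for l, v in zip(lam, g)))
rhs = sum(l * convex(v) for l, v in zip(lam, g))
assert lhs <= rhs

# Concave phi (here log): the inequality reverses, as stated above
concave = math.log
lhs = concave(sum(l * v for l, v in zip(lam, g)))
rhs = sum(l * concave(v) for l, v in zip(lam, g))
assert lhs >= rhs
```

The concave case with $\phi = \log$ is exactly the form used in the EM derivation below.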

## 2. Derivation of the EM Algorithm

For a probabilistic model with latent variables, the goal is to maximize the log-likelihood of the observed data $Y$ with respect to the parameter $\theta$, that is, to maximize:

$$L(\theta) = \log P(Y;\theta) = \log \sum_{Z} P(Y,Z;\theta) = \log \sum_{Z} P(Y \mid Z;\theta)\, P(Z;\theta)$$

In fact, the EM algorithm maximizes $L(\theta)$ step by step through iteration. Suppose the estimate of $\theta$ after the $i$-th iteration is $\theta^{(i)}$. We want the new estimate $\theta$ to increase $L(\theta)$, i.e. $L(\theta) > L(\theta^{(i)})$, gradually approaching a maximum. To this end, consider the difference between the two:
$$\begin{aligned} L(\theta)-L(\theta^{(i)}) &= \log\left(\sum_{Z} P(Y\mid Z;\theta)\,P(Z;\theta)\right) - \log P(Y;\theta^{(i)}) \\ &= \log\left(\sum_{Z} P(Z\mid Y;\theta^{(i)})\, \frac{P(Y\mid Z;\theta)\,P(Z;\theta)}{P(Z\mid Y;\theta^{(i)})}\right) - \log P(Y;\theta^{(i)}) \\ &\ge \sum_{Z} P(Z\mid Y;\theta^{(i)}) \log \frac{P(Y\mid Z;\theta)\,P(Z;\theta)}{P(Z\mid Y;\theta^{(i)})} - \log P(Y;\theta^{(i)}) \\ &= \sum_{Z} P(Z\mid Y;\theta^{(i)}) \log \frac{P(Y\mid Z;\theta)\,P(Z;\theta)}{P(Z\mid Y;\theta^{(i)})\, P(Y;\theta^{(i)})} \end{aligned}$$

Here the inequality is Jensen's inequality (17-1) with the concave $\phi = \log$, using the weights $P(Z\mid Y;\theta^{(i)})$, which are nonnegative and sum to 1.
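This lower bound on the difference can be verified numerically on a toy model. The two-component Bernoulli mixture, the parameter values, and the function names below are illustrative assumptions, not part of the original derivation:

```python
import math

def joint(y, z, theta):
    """P(Y=y, Z=z; theta) for a 2-component Bernoulli mixture.
    theta = (pi, p0, p1): P(Z=1) = pi, P(Y=1 | Z=z) = p_z."""
    pi, p0, p1 = theta
    pz = pi if z == 1 else 1 - pi
    p = p1 if z == 1 else p0
    return pz * (p if y == 1 else 1 - p)

def marginal(y, theta):
    """P(Y=y; theta) = sum_Z P(Y=y, Z; theta)."""
    return sum(joint(y, z, theta) for z in (0, 1))

def lower_bound_gap(y, theta, theta_i):
    """The final sum in the derivation: a lower bound on L(theta) - L(theta^(i))."""
    m_i = marginal(y, theta_i)
    total = 0.0
    for z in (0, 1):
        post = joint(y, z, theta_i) / m_i          # P(Z=z | Y=y; theta^(i))
        total += post * math.log(joint(y, z, theta) / (post * m_i))
    return total

y = 1
theta_i = (0.4, 0.3, 0.8)   # current estimate theta^(i)
theta   = (0.5, 0.2, 0.9)   # candidate new theta

diff = math.log(marginal(y, theta)) - math.log(marginal(y, theta_i))
gap = lower_bound_gap(y, theta, theta_i)
assert gap <= diff + 1e-12                                # bound holds
assert abs(lower_bound_gap(y, theta_i, theta_i)) < 1e-12  # equality at theta = theta^(i)
```

The second assertion anticipates the key property used next: at $\theta = \theta^{(i)}$ every log term is $\log 1 = 0$, so the bound is tight there.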

Define

$$B(\theta,\theta^{(i)}) = L(\theta^{(i)}) + \sum_{Z} P(Z\mid Y;\theta^{(i)}) \log \frac{P(Y\mid Z;\theta)\,P(Z;\theta)}{P(Z\mid Y;\theta^{(i)})\, P(Y;\theta^{(i)})} \tag{17-2}$$

Then $B(\theta,\theta^{(i)})$ is a lower bound of $L(\theta)$:

$$L(\theta) \ge B(\theta,\theta^{(i)})$$

with equality at $\theta = \theta^{(i)}$:

$$L(\theta^{(i)}) = B(\theta^{(i)},\theta^{(i)})$$

Therefore any $\theta$ that makes $B(\theta,\theta^{(i)})$ larger than $B(\theta^{(i)},\theta^{(i)})$ also makes $L(\theta)$ larger than $L(\theta^{(i)})$, so we choose the maximizer of the lower bound:

$$\theta^{(i+1)} = \arg\max_{\theta} B(\theta,\theta^{(i)})$$

Substituting (17-2) and dropping the terms that do not depend on $\theta$:

$$\begin{aligned} \theta^{(i+1)} &= \arg\max_{\theta}\left( L(\theta^{(i)}) + \sum_{Z} P(Z\mid Y;\theta^{(i)}) \log \frac{P(Y\mid Z;\theta)\,P(Z;\theta)}{P(Z\mid Y;\theta^{(i)})\, P(Y;\theta^{(i)})} \right) \\ &= \arg\max_{\theta} \sum_{Z} P(Z\mid Y;\theta^{(i)}) \log \big(P(Y\mid Z;\theta)\,P(Z;\theta)\big) \\ &= \arg\max_{\theta} \sum_{Z} P(Z\mid Y;\theta^{(i)}) \log P(Y,Z;\theta) \\ &= \arg\max_{\theta} Q(\theta,\theta^{(i)}) \end{aligned}$$
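The two steps of each iteration — computing the posteriors $P(Z\mid Y;\theta^{(i)})$ (E-step) and maximizing $Q(\theta,\theta^{(i)})$ (M-step) — can be sketched for the same toy two-component Bernoulli mixture. The model, data, and function names are illustrative assumptions; the assertion checks the property just derived, that each iteration cannot decrease $L(\theta)$:

```python
import math

def log_lik(data, pi, p0, p1):
    """Observed-data log-likelihood L(theta) = sum_j log P(y_j; theta)."""
    return sum(math.log((1 - pi) * (p0 if y else 1 - p0) +
                        pi * (p1 if y else 1 - p1))
               for y in data)

def em_step(data, pi, p0, p1):
    """One EM iteration for a 2-component Bernoulli mixture (a sketch)."""
    # E-step: posterior responsibility P(Z=1 | y; theta^(i)) per observation
    resp = []
    for y in data:
        a0 = (1 - pi) * (p0 if y else 1 - p0)   # P(y, Z=0; theta^(i))
        a1 = pi * (p1 if y else 1 - p1)          # P(y, Z=1; theta^(i))
        resp.append(a1 / (a0 + a1))
    # M-step: closed-form argmax of Q(theta, theta^(i)) for this model
    n1 = sum(resp)
    n0 = len(data) - n1
    new_pi = n1 / len(data)
    new_p1 = sum(r * y for r, y in zip(resp, data)) / n1
    new_p0 = sum((1 - r) * y for r, y in zip(resp, data)) / n0
    return new_pi, new_p0, new_p1

data = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
theta = (0.5, 0.4, 0.7)                  # initial guess theta^(0)
prev = log_lik(data, *theta)
for _ in range(20):
    theta = em_step(data, *theta)
    cur = log_lik(data, *theta)
    assert cur >= prev - 1e-10           # L(theta) never decreases across iterations
    prev = cur
```

The monotone increase follows directly from the lower-bound argument above; it does not, however, say anything about *which* stationary point the iterates converge to.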

The EM algorithm does not guarantee a global optimum; an intuitive illustration is shown in the figure.