The Expectation-Maximization (EM) Algorithm

xholes

Article link: https://blog.csdn.net/xholes/article/details/78343966

EM

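EM is an iterative algorithm, or more precisely a template for a family of algorithms. A probabilistic model may contain not only observed variables but also hidden (latent) variables. If every variable in the model is observed, the parameters can be estimated directly by maximum likelihood or Bayesian estimation; when the model contains latent variables, EM can be used to estimate the parameters instead. Each iteration consists of two steps: the E-step, which computes an expectation, and the M-step, which maximizes it.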

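In general, let $Y$ denote the data of the observed random variables and $Z$ the data of the latent random variables. $Y$ and $Z$ together are called the complete data, while the observed data $Y$ alone is called the incomplete data. Suppose the observed data $Y$ has distribution $P(Y\mid \theta)$, where $\theta$ is the model parameter to be estimated, and that the joint distribution of $Y$ and $Z$ is $P(Y,Z\mid \theta)$. EM iteratively seeks the maximum likelihood estimate of $\theta$, that is, a maximizer of the observed-data log-likelihood $L(\theta) = \log P(Y\mid \theta) = \log \sum_Z P(Y,Z\mid \theta)$.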

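Algorithm
Input: observed data $Y$, latent data $Z$, joint distribution $P(Y,Z\mid \theta)$, conditional distribution $P(Z\mid Y,\theta)$
Output: model parameters $\theta$
(1) Choose an initial value $\theta^{(0)}$ and start iterating;
(2) E-step: let $\theta^{(i)}$ denote the estimate of $\theta$ after the $i$-th iteration. In the E-step of iteration $i+1$, compute: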

$Q(\theta,\theta^{(i)}) = E_Z[\log P(Y,Z\mid \theta) \mid Y,\theta^{(i)}] = \sum_Z \log P(Y,Z\mid \theta)\,P(Z\mid Y,\theta^{(i)})$
(3) M-step: find the $\theta$ that maximizes $Q(\theta,\theta^{(i)})$ and take it as the parameter estimate for iteration $i+1$:

$\theta^{(i+1)} = \arg\max_{\theta}\; Q(\theta,\theta^{(i)})$
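(4) Repeat the E-step and M-step until the algorithm converges, for example until the change in $\theta$ or in $Q$ between successive iterations falls below a small threshold.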
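
The four steps above can be sketched on a small concrete model. The example below is only an illustration under assumptions that are not part of the original text: each observation is the number of heads in `n` flips of one of two coins, the identity of the coin is the latent variable $Z$, each coin is chosen with a fixed probability of 0.5, and the head counts are toy data. The names `binom_pmf` and `em_two_coins` are hypothetical helpers.

```python
import math

def binom_pmf(h, n, p):
    """Probability of h heads in n flips of a coin with head probability p."""
    return math.comb(n, h) * p**h * (1 - p)**(n - h)

def em_two_coins(heads, n, theta_a, theta_b, iters=20):
    """EM for an assumed 50/50 mixture of two coins.

    Returns the final estimates and the observed-data log-likelihood
    evaluated at the parameters used in each iteration.
    """
    trace = []
    for _ in range(iters):
        heads_a = flips_a = heads_b = flips_b = 0.0
        loglik = 0.0
        for h in heads:
            # Joint P(Y=h, Z=coin | theta) with a fixed prior of 0.5 per coin.
            joint_a = 0.5 * binom_pmf(h, n, theta_a)
            joint_b = 0.5 * binom_pmf(h, n, theta_b)
            loglik += math.log(joint_a + joint_b)
            # E-step: posterior P(Z | Y=h, theta^(i)) for each coin.
            resp_a = joint_a / (joint_a + joint_b)
            resp_b = 1.0 - resp_a
            # Accumulate the expected head and flip counts per coin.
            heads_a += resp_a * h
            flips_a += resp_a * n
            heads_b += resp_b * h
            flips_b += resp_b * n
        trace.append(loglik)
        # M-step: for this model Q is maximized in closed form by
        # expected heads divided by expected flips.
        theta_a = heads_a / flips_a
        theta_b = heads_b / flips_b
    return theta_a, theta_b, trace

# Toy data: heads observed in five runs of 10 flips each.
heads = [5, 9, 8, 4, 7]
theta_a, theta_b, trace = em_two_coins(heads, n=10, theta_a=0.6, theta_b=0.5)
print(round(theta_a, 2), round(theta_b, 2))
```

In this sketch the E-step never builds $Q$ explicitly; it only accumulates the posterior-weighted counts that the closed-form M-step needs. For models without a closed-form maximizer, the M-step would need a numerical optimizer instead.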

$\color{red}{\text{EM and coordinate ascent are both alternating optimization procedures; is there an underlying connection between them?}}$
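
One commonly used way to make this connection concrete is to read EM as coordinate ascent on a function of two arguments. The outline below introduces a distribution $q(Z)$ over the latent variables and a function $F$, neither of which appears in the original text; it is a sketch rather than a full proof.

$\begin{aligned} F(q,\theta) &= \sum_Z q(Z)\log\frac{P(Y,Z\mid \theta)}{q(Z)} = \log P(Y\mid \theta) - \mathrm{KL}\left(q(Z)\,\|\,P(Z\mid Y,\theta)\right)\\ \text{E-step:}\quad q^{(i+1)} &= \arg\max_{q}\; F(q,\theta^{(i)}) = P(Z\mid Y,\theta^{(i)})\\ \text{M-step:}\quad \theta^{(i+1)} &= \arg\max_{\theta}\; F(q^{(i+1)},\theta) = \arg\max_{\theta}\; \sum_Z P(Z\mid Y,\theta^{(i)})\log P(Y,Z\mid \theta) \end{aligned}$

Under this reading, each step maximizes $F$ over one argument while the other is held fixed, which is the pattern of coordinate ascent; the M-step objective is the same $Q(\theta,\theta^{(i)})$ used in the algorithm.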

Derivation

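For a probabilistic model with latent variables, the goal is to maximize the log-likelihood of the observed data $Y$ with respect to the parameter $\theta$, $L(\theta) = \log P(Y\mid \theta)$. We want every iteration to increase this likelihood, so consider the difference between $L(\theta)$ and its value at the current estimate $\theta^{(i)}$: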

$\begin{aligned} L(\theta) - L(\theta^{(i)}) &= \log \left( \sum_Z P(Y\mid Z,\theta)P(Z\mid \theta)\right) - \log P(Y\mid \theta^{(i)})\\ &=\log \left( \sum_Z P(Z\mid Y,\theta^{(i)}) \frac {P(Y\mid Z,\theta)P(Z\mid \theta)}{P(Z\mid Y,\theta^{(i)})}\right)- \log P(Y\mid \theta^{(i)})\\ &\ge \sum_Z P(Z\mid Y,\theta^{(i)}) \log\frac {P(Y\mid Z,\theta)P(Z\mid \theta)}{P(Z\mid Y,\theta^{(i)})}- \log P(Y\mid \theta^{(i)})\\ &= \sum_Z P(Z\mid Y,\theta^{(i)}) \log\frac {P(Y\mid Z,\theta)P(Z\mid \theta)}{P(Z\mid Y,\theta^{(i)})}- \log P(Y\mid \theta^{(i)})\sum_Z P(Z\mid Y,\theta^{(i)})\\ &= \sum_Z P(Z\mid Y,\theta^{(i)}) \left(\log\frac {P(Y\mid Z,\theta)P(Z\mid \theta)}{P(Z\mid Y,\theta^{(i)})}- \log P(Y\mid \theta^{(i)})\right)\\ &= \sum_Z P(Z\mid Y,\theta^{(i)}) \log\frac {P(Y\mid Z,\theta)P(Z\mid \theta)}{P(Z\mid Y,\theta^{(i)})P(Y\mid \theta^{(i)})}\\ &= \sum_Z P(Z\mid Y,\theta^{(i)}) \log\frac {P(Y,Z\mid \theta)}{P(Y,Z\mid \theta^{(i)})} \end{aligned}$
The inequality is Jensen's inequality applied to the concave function $\log$, with weights $P(Z\mid Y,\theta^{(i)})$ that sum to 1 over $Z$; the same fact justifies multiplying $\log P(Y\mid \theta^{(i)})$ by $\sum_Z P(Z\mid Y,\theta^{(i)})$. The last step uses $P(Y\mid Z,\theta)P(Z\mid \theta) = P(Y,Z\mid \theta)$ and $P(Z\mid Y,\theta^{(i)})P(Y\mid \theta^{(i)}) = P(Y,Z\mid \theta^{(i)})$. This yields a lower bound on $L(\theta)$:

$L(\theta) \ge B(\theta,\theta^{(i)}) = L(\theta^{(i)}) + \sum_Z P(Z\mid Y,\theta^{(i)}) \log\frac {P(Y,Z\mid \theta)}{P(Y,Z\mid \theta^{(i)})}$
Maximizing this bound over $\theta$, and dropping the terms that do not depend on $\theta$, gives:

$\begin{aligned} \theta^{(i+1)}&=\arg\max_{\theta} \; B(\theta,\theta^{(i)})\\ &= \arg\max_{\theta} \; \sum_Z P(Z\mid Y,\theta^{(i)}) \log\frac {P(Y,Z\mid \theta)}{P(Y,Z\mid \theta^{(i)})}\\ &= \arg\max_{\theta} \; \sum_Z P(Z\mid Y,\theta^{(i)}) \log{P(Y,Z\mid \theta)}\\ &= \arg\max_{\theta} \; Q(\theta,\theta^{(i)}) \end{aligned}$
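As the derivation shows, EM approaches a maximum of the log-likelihood by repeatedly maximizing a lower bound. Because $B(\theta^{(i)},\theta^{(i)}) = L(\theta^{(i)})$, we have $L(\theta^{(i+1)}) \ge B(\theta^{(i+1)},\theta^{(i)}) \ge B(\theta^{(i)},\theta^{(i)}) = L(\theta^{(i)})$, so the log-likelihood never decreases from one iteration to the next. The algorithm is not guaranteed to reach the global optimum, however, and the result can depend on the initial value $\theta^{(0)}$.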
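
The lower-bound property can also be checked numerically. The sketch below is an illustration under assumptions that are not in the original text: a single observation of `y` heads in `n` flips, a latent coin identity $Z \in \{0, 1\}$ with a fixed prior of 0.5, and $\theta$ given as a pair of head probabilities. The helper names `joint`, `log_lik` and `lower_bound` are hypothetical. It evaluates $L(\theta) - B(\theta,\theta^{(i)})$ on a grid of $\theta$ values, with the posterior $P(Z\mid Y,\theta^{(i)})$ as the weights.

```python
import math

def joint(y, n, z, theta):
    """P(Y=y, Z=z | theta) with an assumed prior of 0.5 on each coin."""
    p = theta[z]
    return 0.5 * math.comb(n, y) * p**y * (1 - p)**(n - y)

def log_lik(y, n, theta):
    """Observed-data log-likelihood L(theta) = log sum_z P(y, z | theta)."""
    return math.log(sum(joint(y, n, z, theta) for z in (0, 1)))

def lower_bound(y, n, theta, theta_old):
    """B(theta, theta_old) = L(theta_old) + sum_z P(z | y, theta_old) * log ratio."""
    marginal_old = sum(joint(y, n, z, theta_old) for z in (0, 1))
    bound = math.log(marginal_old)
    for z in (0, 1):
        posterior = joint(y, n, z, theta_old) / marginal_old
        bound += posterior * math.log(joint(y, n, z, theta) / joint(y, n, z, theta_old))
    return bound

y, n = 7, 10                     # toy observation
theta_old = (0.6, 0.5)           # plays the role of theta^(i)
grid = [k / 20 for k in range(1, 20)]

# Gap L(theta) - B(theta, theta_old) over a grid of candidate theta values.
gaps = [log_lik(y, n, (a, b)) - lower_bound(y, n, (a, b), theta_old)
        for a in grid for b in grid]
min_gap = min(gaps)
gap_at_old = log_lik(y, n, theta_old) - lower_bound(y, n, theta_old, theta_old)
print(min_gap >= -1e-9, abs(gap_at_old) < 1e-9)
```

The gap equals the KL divergence between the posteriors $P(Z\mid Y,\theta^{(i)})$ and $P(Z\mid Y,\theta)$, which is why it is non-negative and vanishes at $\theta = \theta^{(i)}$.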