EM Algorithm Study Notes

A-Egoist

Article link: https://blog.csdn.net/CesareBorgia/article/details/117197321

Expectation Maximization Algorithm

1 Maximum Likelihood Estimation (MLE)

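The idea of MLE: suppose that, depending on the value of an unknown parameter, an observed event A could have occurred with any of the probabilities $P_1,P_2,\dots,P_n$. MLE chooses the parameter value that gives A the largest of these probabilities, i.e. the value under which the observed data are most probable.

The usual procedure for solving an MLE problem: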

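① Construct the likelihood function for $n$ observations $x_1,\dots,x_n$ and $k$ parameters
$L(\theta_1,\theta_2,\dots,\theta_k)=\begin{cases} \prod_{i=1}^{n}P(x_i;\theta_1,\theta_2,\dots,\theta_k), & \text{discrete case}\\ \prod_{i=1}^{n}f(x_i;\theta_1,\theta_2,\dots,\theta_k), & \text{continuous case} \end{cases}$
② Take the logarithm of the likelihood function

③ Differentiate with respect to each parameter

④ Set the derivatives equal to 0 and solve the resulting likelihood equations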
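As a small illustration of steps ① to ④, the sketch below assumes a Bernoulli model (each $x_i\in\{0,1\}$ with $P(x_i=1)=\theta$) and a hypothetical sample. Here the likelihood equation $\frac{k}{\theta}-\frac{n-k}{1-\theta}=0$ (with $k=\sum_i x_i$) has the closed-form solution $\hat\theta=k/n$, and a brute-force grid search over the log-likelihood should land on the same value.

```python
import math

def bernoulli_log_likelihood(theta, xs):
    """Step ②: ln L(theta) = sum_i [x_i ln(theta) + (1 - x_i) ln(1 - theta)]."""
    return sum(x * math.log(theta) + (1 - x) * math.log(1 - theta) for x in xs)

def bernoulli_mle(xs):
    """Steps ③ and ④: k/theta - (n - k)/(1 - theta) = 0 gives theta = k/n."""
    return sum(xs) / len(xs)

xs = [1, 0, 1, 1, 0, 1, 1, 1]  # hypothetical sample: 6 ones out of 8
grid = [i / 1000 for i in range(1, 1000)]  # candidate theta values in (0, 1)
theta_grid = max(grid, key=lambda t: bernoulli_log_likelihood(t, xs))
print(bernoulli_mle(xs), theta_grid)
```

Both approaches should give 0.75 for this sample; the grid search is only a check, since the closed form is exact.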

2 Expectation-Maximization (EM) Algorithm

2.1 Basic Idea

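Suppose there are two data sets $A$ and $B$ with $A\sim{N(\mu_1,\sigma_1^2)}$ and $B\sim{N(\mu_2,\sigma_2^2)}$. Applying MLE to each set separately, we can quickly obtain $\hat{\mu}_1,\hat{\sigma}_1^2,\hat{\mu}_2,\hat{\sigma}_2^2$.

If $A$ and $B$ are mixed into a single data set, so that we no longer know which set each sample came from, how can we estimate $\hat{\mu}_1,\hat{\sigma}_1^2,\hat{\mu}_2,\hat{\sigma}_2^2$?

This is the kind of problem the EM algorithm solves: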

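① First, assign initial values to $\mu_1,\sigma_1^2,\mu_2,\sigma_2^2$, i.e. $\mu_1^{(0)},{\sigma_1^2}^{(0)},\mu_2^{(0)},{\sigma_2^2}^{(0)}$

② Using the current estimates $\mu_1^{(i)},{\sigma_1^2}^{(i)},\mu_2^{(i)},{\sigma_2^2}^{(i)}$, decide for each sample whether it belongs to data set $A$ or data set $B$, and relabel the data accordingly (this is a hard assignment; standard EM instead gives each sample a probability of belonging to each set)

③ Apply MLE to the relabelled data to obtain new values of $\mu_1,\sigma_1^2,\mu_2,\sigma_2^2$

④ Repeat ② and ③ until $\mu_1,\sigma_1^2,\mu_2,\sigma_2^2$ converge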
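The sketch below follows steps ① to ④ for two one-dimensional Gaussians with one deliberate change: as in standard EM, step ② gives each sample a probability ("responsibility") of belonging to $A$ instead of a hard 0/1 label, and a mixing weight $w$ (the expected fraction of samples from $A$) is estimated as well. The function name, initial values, tolerance and simulated data are illustrative assumptions, not part of the original notes.

```python
import math
import random

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(xs, mu1, var1, mu2, var2, w=0.5, tol=1e-8, max_iter=500):
    """Soft EM for the 1-D mixture w*N(mu1, var1) + (1 - w)*N(mu2, var2)."""
    n = len(xs)
    for _ in range(max_iter):
        # Step ②, soft version: r_i = P(sample i came from A | current parameters)
        r = []
        for x in xs:
            a = w * normal_pdf(x, mu1, var1)
            b = (1 - w) * normal_pdf(x, mu2, var2)
            r.append(a / (a + b))
        # Step ③: weighted MLE, sample i counts with weight r_i for A and 1 - r_i for B
        n1 = sum(r)
        n2 = n - n1
        new_mu1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        new_mu2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / n2
        new_var1 = sum(ri * (x - new_mu1) ** 2 for ri, x in zip(r, xs)) / n1
        new_var2 = sum((1 - ri) * (x - new_mu2) ** 2 for ri, x in zip(r, xs)) / n2
        # Step ④: stop once the estimates stop moving
        change = max(abs(new_mu1 - mu1), abs(new_mu2 - mu2),
                     abs(new_var1 - var1), abs(new_var2 - var2))
        mu1, var1, mu2, var2, w = new_mu1, new_var1, new_mu2, new_var2, n1 / n
        if change < tol:
            break
    return mu1, var1, mu2, var2, w

# Hypothetical data: 200 samples from N(0, 1) mixed with 200 from N(5, 4) (std 2)
random.seed(0)
xs = [random.gauss(0, 1) for _ in range(200)] + [random.gauss(5, 2) for _ in range(200)]
random.shuffle(xs)
mu1, var1, mu2, var2, w = em_two_gaussians(xs, mu1=-1.0, var1=1.0, mu2=6.0, var2=1.0)
print(mu1, var1, mu2, var2, w)
```

The estimates should land near the generating values (means about 0 and 5, variances about 1 and 4, $w$ about 0.5). Soft weights avoid the all-or-nothing relabelling of step ②, which can stall when the two sets overlap.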

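2.2 The EM Algorithm

Input: observed-variable data $Y$, latent-variable data $Z$, the joint distribution $P(Y,Z\mid \theta)$ and the conditional distribution $P(Z\mid Y,\theta)$;

Output: model parameters $\theta$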

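(1) Choose an initial value $\theta^{(0)}$ for the parameters and start iterating;

(2) E-step: let $\theta^{(i)}$ denote the estimate of $\theta$ after the $i$-th iteration. In the E-step of iteration $i+1$, compute
$\begin{aligned} Q(\theta,\theta^{(i)})&=E_Z[\log P(Y,Z\mid \theta)\mid Y,\theta^{(i)}] \\&=\sum_Z\log P(Y,Z\mid \theta)P(Z\mid Y,\theta^{(i)}) \end{aligned}$
(3) M-step: find the $\theta$ that maximizes $Q(\theta,\theta^{(i)})$; it becomes the estimate $\theta^{(i+1)}$ of iteration $i+1$:
$\theta^{(i+1)}=\arg \max_{\theta} Q(\theta,\theta^{(i)})$
(4) Repeat steps (2) and (3) until convergence, e.g. until $\|\theta^{(i+1)}-\theta^{(i)}\|$ falls below a small threshold.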
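To make the definition of $Q$ concrete, the sketch below computes $Q(\theta,\theta^{(i)})$ literally, as a posterior-weighted sum of complete-data log-probabilities over every value of the latent variable, and performs the M-step by brute-force grid search instead of calculus. It assumes the three-coin model and observations of section 2.4 as the example ($z=1$ means coin B was tossed); the helper names and the grid resolution (steps of 1/20) are illustrative choices.

```python
import itertools
import math

data = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]  # observations from the three-coin example in 2.4

def joint(y, z, theta):
    """Complete-data probability P(y, z | theta); z = 1 means coin B was tossed."""
    pi, p, q = theta
    if z == 1:
        return pi * p ** y * (1 - p) ** (1 - y)
    return (1 - pi) * q ** y * (1 - q) ** (1 - y)

def e_step(theta_old):
    """Posterior weights P(z | y_j, theta_old) for every trial j and every z in {0, 1}."""
    weights = []
    for y in data:
        total = joint(y, 0, theta_old) + joint(y, 1, theta_old)
        for z in (0, 1):
            weights.append((y, z, joint(y, z, theta_old) / total))
    return weights

def Q(theta, weights):
    """Q(theta, theta_old) = sum_j sum_z P(z | y_j, theta_old) * log P(y_j, z | theta)."""
    return sum(w * math.log(joint(y, z, theta)) for y, z, w in weights)

def m_step_grid(weights, n=20):
    """M-step by brute force: argmax of Q over the grid {1/n, ..., (n-1)/n} per parameter."""
    values = [k / n for k in range(1, n)]
    return max(itertools.product(values, repeat=3), key=lambda t: Q(t, weights))

theta = (0.5, 0.5, 0.5)  # initial value theta^(0)
for _ in range(20):
    new_theta = m_step_grid(e_step(theta))
    if new_theta == theta:  # stop when the argmax no longer moves
        break
    theta = new_theta
print(theta)
```

Starting from $\theta^{(0)}=(0.5,0.5,0.5)$ every posterior weight is 0.5, the grid argmax is $(0.5, 0.6, 0.6)$, and the next E-step reproduces the same weights, so the loop stops there. A grid search scales badly with the number of parameters, which is why a closed-form M-step is preferred whenever one exists.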

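2.3 Mathematical Derivation

Suppose the objective function is $L(\theta)=\prod_{i=1}^{N}P_\theta(y_i)$, where $\theta$ is the parameter to be estimated.

Without latent (unobserved) variables, the usual steps for maximizing $L(\theta)$ are:
$\begin{aligned} &①:\ \ln L(\theta)=\sum_{i=1}^{N}\ln P_\theta(y_i)\\&②:\ \frac{\mathrm{d} \ln L(\theta)}{\mathrm{d} \theta} =\sum_{i=1}^{N}\frac{\mathrm{d}\ln P_\theta(y_i)}{\mathrm{d} \theta}=0 \\&③:\ \text{if }\theta_0\text{ solves the equation in ② and is the global maximizer, then }\max L(\theta)=L(\theta_0) \end{aligned}$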
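When the model contains a latent variable, the law of total probability $P(B)=\sum_i P(A_i)P(B\mid A_i)$ gives

$P_{\theta}(y_i)=\sum_{z}P_{\theta}(z)P_{\theta}(y_i\mid z)$, where $z$ is the latent variable. Therefore
$\therefore L(\theta)=\prod_{i=1}^{N} \sum_{z}P_\theta(z)P_\theta(y_i\mid z)$
Taking the logarithm of both sides:
$\therefore \ln L(\theta)=\sum_{i=1}^{N} \ln \sum_{z} P_{\theta}(z) P_{\theta}(y_i \mid z)$
Because the logarithm now contains a sum, it no longer splits into simple terms, and setting the partial derivatives to zero generally gives equations with no closed-form solution. The maximizer is therefore approached iteratively instead.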

Let $\theta^{(n)}$ be the estimate from the $n$-th iteration and $\theta^{(n+1)}$ the estimate from the $(n+1)$-th iteration. Then
$\therefore \ln L(\theta^{(n+1)})-\ln L(\theta^{(n)})=\sum_{i=1}^{N} \ln \sum_z P_{\theta^{(n+1)}}(z)P_{\theta^{(n+1)}}(y_i\mid z)-\sum_{i=1}^{N} \ln \sum_z P_{\theta^{(n)}}(z)P_{\theta^{(n)}}(y_i\mid z)$

Since $\theta^{(n)}$ has already been computed, the second term is a known constant, so the latent variable need not be shown explicitly: $\sum_z P_{\theta^{(n)}}(z)P_{\theta^{(n)}}(y_i\mid z)=P_{\theta^{(n)}}(y_i)$.
$\therefore \ln L(\theta^{(n+1)})-\ln L(\theta^{(n)})=\sum_{i=1}^{N}{[\ln \sum_z P_{\theta^{(n+1)}}(z)P_{\theta^{(n+1)}}(y_i\mid z)-\ln P_{\theta^{(n)}}(y_i) ]}$

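Write $P_{\theta^{(n+1)}}(z)P_{\theta^{(n+1)}}(y_i\mid z)=P_{\theta^{(n+1)}}(y_i,z)$ and multiply and divide each inner term by $P_{\theta^{(n)}}(z\mid y_i)$, so that the inner sum becomes an expectation under the posterior $P_{\theta^{(n)}}(z\mid y_i)$. Since $\ln$ is concave, Jensen's inequality $\ln\sum_z\lambda_z x_z\ge\sum_z\lambda_z\ln x_z$ (with $\lambda_z\ge 0$, $\sum_z\lambda_z=1$) applies, and $P_{\theta^{(n)}}(z\mid y_i)P_{\theta^{(n)}}(y_i)=P_{\theta^{(n)}}(y_i,z)$:
$\begin{aligned}\ln L(\theta^{(n+1)})-\ln L(\theta^{(n)})&=\sum_{i=1}^{N}\ln\sum_z P_{\theta^{(n)}}(z\mid y_i)\frac{P_{\theta^{(n+1)}}(y_i,z)}{P_{\theta^{(n)}}(z\mid y_i)P_{\theta^{(n)}}(y_i)}\\&\ge\sum_{i=1}^{N}\sum_z P_{\theta^{(n)}}(z\mid y_i)\ln\frac{P_{\theta^{(n+1)}}(y_i,z)}{P_{\theta^{(n)}}(y_i,z)}\end{aligned}$

Hence
$\ln L(\theta^{(n+1)}) \ge \sum_{i=1}^{N} \sum_{z}P_{\theta^{(n)}}(z\mid y_i)\ln \frac{P_{\theta^{(n+1)}}(y_i,z)}{P_{\theta^{(n)}}(y_i,z)}+\ln L(\theta^{(n)})$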
Denote the right-hand side by $Q(\theta^{(n+1)}\mid \theta^{(n)})$; this function is called the lower-bound function.
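The inequality can be checked numerically. The sketch below assumes the three-coin model of section 2.4 as the latent-variable model ($z\in\{0,1\}$) and an arbitrary illustrative $\theta^{(n)}=(0.4,0.6,0.7)$; it evaluates both sides for 1000 random candidates $\theta^{(n+1)}$ and checks that the bound never exceeds $\ln L(\theta^{(n+1)})$ and is tight at $\theta^{(n+1)}=\theta^{(n)}$.

```python
import math
import random

data = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]  # three-coin observations from section 2.4

def joint(y, z, theta):
    """P_theta(y, z) for the three-coin model; z = 1 means coin B was tossed."""
    pi, p, q = theta
    if z == 1:
        return pi * p ** y * (1 - p) ** (1 - y)
    return (1 - pi) * q ** y * (1 - q) ** (1 - y)

def log_likelihood(theta):
    """ln L(theta) = sum_i ln sum_z P_theta(y_i, z)."""
    return sum(math.log(joint(y, 0, theta) + joint(y, 1, theta)) for y in data)

def lower_bound(theta_new, theta_old):
    """sum_i sum_z P_old(z | y_i) ln[P_new(y_i, z) / P_old(y_i, z)] + ln L(theta_old)."""
    total = log_likelihood(theta_old)
    for y in data:
        p_y_old = joint(y, 0, theta_old) + joint(y, 1, theta_old)
        for z in (0, 1):
            post = joint(y, z, theta_old) / p_y_old  # P_old(z | y_i)
            total += post * math.log(joint(y, z, theta_new) / joint(y, z, theta_old))
    return total

random.seed(1)
theta_old = (0.4, 0.6, 0.7)  # illustrative theta^(n)
gaps = []  # ln L(theta_new) minus the bound; Jensen says these are all >= 0
for _ in range(1000):
    theta_new = tuple(random.uniform(0.01, 0.99) for _ in range(3))
    gaps.append(log_likelihood(theta_new) - lower_bound(theta_new, theta_old))
print(min(gaps), log_likelihood(theta_old) - lower_bound(theta_old, theta_old))
```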

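The goal of EM is to maximize the objective function. At $\theta^{(n+1)}=\theta^{(n)}$ the logarithm inside the bound is $\ln 1=0$, so the bound equals $\ln L(\theta^{(n)})$; hence any $\theta^{(n+1)}$ that raises the bound to at least this value also satisfies $\ln L(\theta^{(n+1)})\ge\ln L(\theta^{(n)})$. Repeatedly maximizing the lower-bound function therefore raises (never lowers) the objective until convergence.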
$\begin{aligned} \therefore Q(\theta^{(n+1)}\mid \theta^{(n)})&=\sum_{i=1}^{N} \sum_{z}P_{\theta^{(n)}}(z\mid y_i)\ln \frac{P_{\theta^{(n+1)}}(y_i,z)}{P_{\theta^{(n)}}(y_i,z)}+\ln L(\theta^{(n)})\\ &=\sum_{i=1}^{N} \sum_{z}P_{\theta^{(n)}}(z\mid y_i)\ln P_{\theta^{(n+1)}}(y_i,z)-\sum_{i=1}^{N} \sum_{z}P_{\theta^{(n)}}(z\mid y_i)\ln P_{\theta^{(n)}}(y_i,z) + \ln L(\theta^{(n)}) \end{aligned}$
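Only the first term on the right contains the unknown parameter $\theta^{(n+1)}$; for independent samples it is exactly the $Q$ function of 2.2. So we only need to set $\frac{\partial Q(\theta^{(n+1)}\mid \theta^{(n)})}{\partial \theta^{(n+1)}}=0$ (assuming the maximum is an interior stationary point) to obtain $\theta^{(n+1)}=\arg\max_{\theta}Q(\theta\mid \theta^{(n)})$. Substituting this $\theta^{(n+1)}$ into $Q(\theta^{(n+2)}\mid \theta^{(n+1)})$ gives $\theta^{(n+2)}$, and so on; iterating until $\theta$ converges yields the estimate. In general the limit is a local maximum (more precisely, a stationary point) of the likelihood and can depend on the initial value.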
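A direct consequence is that $\ln L(\theta^{(n)})$ never decreases from one iteration to the next. The sketch below assumes the three-coin model of section 2.4 with the closed-form $\mu_j$, $\pi$, $p$, $q$ updates derived there, runs 20 iterations from three arbitrary starting points, and asserts monotonicity (up to floating-point rounding).

```python
import math

data = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]  # observations from section 2.4

def log_likelihood(pi, p, q):
    """ln L(theta) = sum_j ln[pi p^y (1-p)^(1-y) + (1-pi) q^y (1-q)^(1-y)]."""
    return sum(math.log(pi * p ** y * (1 - p) ** (1 - y) + (1 - pi) * q ** y * (1 - q) ** (1 - y))
               for y in data)

def em_step(pi, p, q):
    """One EM iteration: E-step (mu_j) followed by the closed-form M-step."""
    mu = []
    for y in data:
        b = pi * p ** y * (1 - p) ** (1 - y)
        c = (1 - pi) * q ** y * (1 - q) ** (1 - y)
        mu.append(b / (b + c))
    new_pi = sum(mu) / len(data)
    new_p = sum(m * y for m, y in zip(mu, data)) / sum(mu)
    new_q = sum((1 - m) * y for m, y in zip(mu, data)) / sum(1 - m for m in mu)
    return new_pi, new_p, new_q

histories = []
for theta in [(0.4, 0.6, 0.7), (0.2, 0.9, 0.1), (0.7, 0.3, 0.8)]:  # arbitrary starts
    lls = [log_likelihood(*theta)]
    for _ in range(20):
        theta = em_step(*theta)
        lls.append(log_likelihood(*theta))
    histories.append(lls)
    # every step should leave ln L unchanged or larger (up to rounding)
    assert all(b >= a - 1e-12 for a, b in zip(lls, lls[1:]))
    print(theta, lls[-1])
```

For this particular model the check is almost trivial: after one update $\pi p+(1-\pi)q$ equals the observed frequency of heads (0.6), which already maximizes the likelihood, so $\ln L$ jumps to its maximum and stays there. The property is more telling on models such as the Gaussian mixture of 2.1.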

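2.4 Python Implementation

Three-coin model

Problem:

Suppose there are three coins, denoted A, B and C, whose probabilities of landing heads are $\pi$, $p$ and $q$ respectively. The experiment is: toss coin A first and use its outcome to choose coin B or coin C (heads selects B, tails selects C); then toss the selected coin and record heads as 1 and tails as 0. The experiment is repeated independently $n$ times (here $n=10$), with the following observations:
$1, 1, 0, 1, 0, 0, 1, 0, 1, 1$
Assume that only the final outcomes can be observed, not the tossing process. How can we estimate the probability of heads for each of the three coins, i.e. the parameters of the three-coin model?

Solution: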

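Since $P(\text{coin A heads})=\pi$, $P(\text{coin B heads})=p$ and $P(\text{coin C heads})=q$, let $y$ be the observation from a single trial and $\theta=(\pi,p,q)$ the model parameters: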
$\begin{aligned} &\therefore P(y=1\mid\theta)=\pi p+(1-\pi)q\\ &\therefore P(y=0\mid\theta)=\pi(1-p)+(1-\pi)(1-q)\\ &\therefore P(y\mid\theta)=\pi p^y(1-p)^{1-y}+(1-\pi)q^y(1-q)^{1-y} \end{aligned}$
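Let $Y=(y_1,y_2,\dots,y_n)^T$ and $Z=(z_1,z_2,\dots,z_n)^T$, where $z_j$ is the unobserved outcome of coin A in trial $j$ ($z_j=1$: heads, coin B was tossed; $z_j=0$: tails, coin C was tossed).
$\because P(y\mid \theta)=\sum_z P(y,z\mid\theta)=\sum_z\frac{P(y,z,\theta)}{P(\theta)}=\sum_z P(z\mid\theta)P(y\mid z,\theta)$

and the trials are independent, the likelihood of all observations factorizes:
$\begin{aligned} P(Y\mid\theta)=&\sum_Z P(Z\mid\theta)P(Y\mid Z,\theta)\\=&\prod_{j=1}^{n}[\pi p^{y_j}(1-p)^{1-y_j}+(1-\pi)q^{y_j}(1-q)^{1-y_j}] \end{aligned}$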

Log-likelihood:
$l(\theta)=\log P(Y\mid \theta)=\sum_{j=1}^{n}\log [\pi p^{y_j}(1-p)^{1-y_j}+(1-\pi)q^{y_j}(1-q)^{1-y_j}]$
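The marginal $P(y\mid\theta)$ and $l(\theta)$ translate directly into code. Written this way, the formula also shows that $l(\theta)$ depends on $\theta$ only through $m=\pi p+(1-\pi)q=P(y=1\mid\theta)$, so different parameter vectors can have exactly the same likelihood; this is why the two initial values in the code at the end of this section end at different $\theta$. The two parameter vectors compared below are hypothetical, chosen so that $m=0.6$.

```python
import math

data = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]  # observations Y from the problem statement

def prob_y(y, pi, p, q):
    """P(y | theta) = pi p^y (1-p)^(1-y) + (1-pi) q^y (1-q)^(1-y)."""
    return pi * p ** y * (1 - p) ** (1 - y) + (1 - pi) * q ** y * (1 - q) ** (1 - y)

def log_likelihood(pi, p, q):
    """l(theta) = sum_j log P(y_j | theta)."""
    return sum(math.log(prob_y(y, pi, p, q)) for y in data)

# Sanity check: P(y=1 | theta) + P(y=0 | theta) = 1 for any theta
theta = (0.4, 0.6, 0.7)
print(prob_y(1, *theta) + prob_y(0, *theta))

# Two different parameter vectors with pi*p + (1-pi)*q = 0.6 give the same l(theta)
print(log_likelihood(0.5, 0.8, 0.4), log_likelihood(0.25, 0.9, 0.5))
```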
E-step: since the trials are independent, $\log P(Y,Z\mid\theta)=\sum_{j=1}^{n}\log P(y_j,z_j\mid\theta)$ and each term depends only on its own $z_j$, so the expectation splits into a sum over single trials (below, $z$ denotes the latent variable of trial $j$):
$\begin{aligned} Q(\theta\mid\theta^{(i)})=&E_Z[\log P(Y,Z\mid\theta)\mid Y,\theta^{(i)}]\\=&\sum_Z\log P(Y,Z\mid\theta)\,P(Z\mid Y,\theta^{(i)})\\=&\sum_{j=1}^n[P(z=1\mid y_j,\theta^{(i)})\log P(y_j,z=1\mid\theta)+P(z=0\mid y_j,\theta^{(i)})\log P(y_j,z=0\mid\theta)] \end{aligned}$
Let $\mu_j^{(i+1)}$ be the probability that observation $y_j$ came from coin B under the current parameters $\theta^{(i)}$:
$\therefore \mu_j^{(i+1)}=P(z=1\mid y_j,\theta^{(i)})=\frac{\pi^{(i)} (p^{(i)})^{y_j}(1-p^{(i)})^{1-y_j}}{\pi^{(i)} (p^{(i)})^{y_j}(1-p^{(i)})^{1-y_j}+(1-\pi^{(i)})(q^{(i)})^{y_j}(1-q^{(i)})^{1-y_j}}$

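$\because P(y_j,z=1\mid\theta)=\pi p^{y_j}(1-p)^{1-y_j},\quad P(y_j,z=0\mid\theta)=(1-\pi)q^{y_j}(1-q)^{1-y_j},\quad P(z=0\mid y_j,\theta^{(i)})=1-\mu_j^{(i+1)}$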

$\therefore Q(\theta\mid\theta^{(i)})=\sum_{j=1}^n\{\mu_j^{(i+1)}\log [\pi p^{y_j}(1-p)^{(1-y_j)}]+(1-\mu_j^{(i+1)})\log[(1-\pi)q^{y_j}(1-q)^{(1-y_j)}] \}$
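The E-step formula for $\mu_j^{(i+1)}$ can be computed directly. This sketch assumes $\theta^{(i)}=(0.4,0.6,0.7)$, one of the initial values used in the code at the end of this section; the function name is hypothetical.

```python
data = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]  # observations Y

def e_step(pi, p, q):
    """mu_j = P(z=1 | y_j, theta^(i)): posterior probability that y_j came from coin B."""
    mu = []
    for y in data:
        from_b = pi * p ** y * (1 - p) ** (1 - y)        # P(y_j, z=1 | theta^(i))
        from_c = (1 - pi) * q ** y * (1 - q) ** (1 - y)  # P(y_j, z=0 | theta^(i))
        mu.append(from_b / (from_b + from_c))
    return mu

mu = e_step(0.4, 0.6, 0.7)
print([round(m, 4) for m in mu])
```

Because $\mu_j$ depends only on the value of $y_j$, there are just two distinct values here: every heads observation gets $\mu_j=0.24/0.66\approx 0.364$ and every tails observation gets $\mu_j=0.16/0.34\approx 0.471$.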

M-step: compute the new parameter estimate $\theta^{(i+1)}$ by setting each partial derivative of $Q$ to zero.

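Solving for $\pi$:
$\frac{\partial Q(\theta\mid\theta^{(i)})}{\partial\pi}=\sum_{j=1}^n\left[\frac{\mu_j^{(i+1)}}{\pi}-\frac{1-\mu_j^{(i+1)}}{1-\pi}\right]=0\ \Rightarrow\ \pi^{(i+1)}=\frac{1}{n}\sum_{j=1}^n\mu_j^{(i+1)}$
Solving for $p$:
$\frac{\partial Q(\theta\mid\theta^{(i)})}{\partial p}=\sum_{j=1}^n\mu_j^{(i+1)}\left[\frac{y_j}{p}-\frac{1-y_j}{1-p}\right]=0\ \Rightarrow\ p^{(i+1)}=\frac{\sum_{j=1}^n\mu_j^{(i+1)}y_j}{\sum_{j=1}^n\mu_j^{(i+1)}}$

Solving for $q$:
$\frac{\partial Q(\theta\mid\theta^{(i)})}{\partial q}=\sum_{j=1}^n(1-\mu_j^{(i+1)})\left[\frac{y_j}{q}-\frac{1-y_j}{1-q}\right]=0\ \Rightarrow\ q^{(i+1)}=\frac{\sum_{j=1}^n(1-\mu_j^{(i+1)})y_j}{\sum_{j=1}^n(1-\mu_j^{(i+1)})}$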
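A quick numerical check that the closed-form updates really maximize $Q$ for fixed $\mu_j^{(i+1)}$: the sketch below assumes $\theta^{(i)}=(0.4,0.6,0.7)$, applies the three formulas, then nudges each parameter by $\pm 10^{-3}$ (an arbitrary step) and asserts that $Q$ never increases. The helper names are hypothetical.

```python
import itertools
import math

data = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]  # observations Y

def posteriors(pi, p, q):
    """E-step: mu_j = P(z=1 | y_j, theta^(i))."""
    out = []
    for y in data:
        b = pi * p ** y * (1 - p) ** (1 - y)
        c = (1 - pi) * q ** y * (1 - q) ** (1 - y)
        out.append(b / (b + c))
    return out

def Q(pi, p, q, mu):
    """Q(theta | theta^(i)) with the mu_j held fixed."""
    return sum(m * math.log(pi * p ** y * (1 - p) ** (1 - y))
               + (1 - m) * math.log((1 - pi) * q ** y * (1 - q) ** (1 - y))
               for m, y in zip(mu, data))

def m_step(mu):
    """Closed-form updates from setting the partial derivatives of Q to zero."""
    pi = sum(mu) / len(mu)
    p = sum(m * y for m, y in zip(mu, data)) / sum(mu)
    q = sum((1 - m) * y for m, y in zip(mu, data)) / sum(1 - m for m in mu)
    return pi, p, q

mu = posteriors(0.4, 0.6, 0.7)
best = m_step(mu)
print([round(v, 3) for v in best])

# Q should not increase when the parameters are nudged away from the closed-form solution
eps = 1e-3
for deltas in itertools.product((-eps, 0.0, eps), repeat=3):
    nudged = [v + d for v, d in zip(best, deltas)]
    assert Q(*nudged, mu) <= Q(*best, mu) + 1e-12
```

With $\theta^{(0)}=(0.4,0.6,0.7)$ the update gives approximately $(0.406, 0.537, 0.643)$.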

import math

class EM:
    """EM for the three-coin model: prob_A = pi, prob_B = p, prob_C = q."""

    def __init__(self, prob):
        self.prob_A, self.prob_B, self.prob_C = prob

    # E-step
    def expectation(self, y):
        '''
        Probability mu_j that observation y (= y_j) came from coin B, under the current parameters
        '''
        prob_1 = self.prob_A * math.pow(self.prob_B, y) * math.pow(1 - self.prob_B, 1 - y)  # coin B, i.e. z=1
        prob_0 = (1 - self.prob_A) * math.pow(self.prob_C, y) * math.pow(1 - self.prob_C, 1 - y)  # coin C, i.e. z=0
        return prob_1 / (prob_1 + prob_0)  # mu_j

    # One EM iteration: E-step for every y_j, then the closed-form M-step
    def maximization(self, data):
        n = len(data)
        mu = [self.expectation(y) for y in data]  # mu vector, computed with the old parameters
        prob_A = sum(mu) / n  # new pi
        prob_B = sum(m * y for m, y in zip(mu, data)) / sum(mu)  # new p
        prob_C = sum((1 - m) * y for m, y in zip(mu, data)) / sum(1 - m for m in mu)  # new q
        return prob_A, prob_B, prob_C

    def fit(self, data, max_iter=100, tol=1e-6):
        print('init prob:{},{},{}'.format(self.prob_A, self.prob_B, self.prob_C))
        for it in range(max_iter):
            new = self.maximization(data)
            change = max(abs(a - b) for a, b in zip(new, (self.prob_A, self.prob_B, self.prob_C)))
            self.prob_A, self.prob_B, self.prob_C = new  # update all three parameters together
            print('iter {} prob_A:{:.3f}, prob_B:{:.3f}, prob_C:{:.3f}'.format(it + 1, *new))
            if change < tol:  # stop once the parameters no longer change
                break
        return self.prob_A, self.prob_B, self.prob_C

data = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]  # observed data Y
EM(prob=[0.5, 0.5, 0.5]).fit(data)
EM(prob=[0.4, 0.6, 0.7]).fit(data)