1.16 The EM Algorithm (Expectation-Maximization)

1. Jensen's inequality

If $f$ is a convex function and $X$ is a random variable, then $E[f(X)]\geq f(E[X])$.

If $f$ is twice differentiable, then $f$ is convex if and only if $f''\geq 0$ (in the case of $f$ taking vector-valued inputs, the condition is that the Hessian matrix $H$ is positive semi-definite).

In the following, the function we will use is $f(x)=\log x$. It is a concave function, since $f''=-\frac{1}{x^2}<0$. For concave functions Jensen's inequality reverses, so we get $\log(E[X])\geq E[\log X]$.
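As a quick sanity check (not part of the derivation), here is a small Python snippet that estimates both sides of $\log(E[X])\geq E[\log X]$ by Monte Carlo; the exponential distribution and sample size are arbitrary choices for illustration.

```python
import numpy as np

# Numerical check of Jensen's inequality for the concave function log:
# log(E[X]) >= E[log(X)] for a positive random variable X.
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)  # positive samples (arbitrary choice)

lhs = np.log(np.mean(x))     # Monte Carlo estimate of log(E[X])
rhs = np.mean(np.log(x))     # Monte Carlo estimate of E[log(X)]
print(lhs, rhs, lhs >= rhs)  # expect lhs >= rhs, i.e. True
```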

2. EM algorithm

Suppose the probability density function of $x$ is $p(x)$, parameterized by $\theta$, i.e. $p(x;\theta)$. We want to estimate the value of $\theta$.

The log-likelihood function is $l(\theta)=\sum_{i=1}^{m}\log p(x^i;\theta)=\sum_{i=1}^{m}\log\left(\sum_{z} p(x^i,z;\theta)\right)$, where $z$ is a latent random variable.
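As a concrete (hypothetical) example of such a likelihood, take a two-component 1-D Gaussian mixture, where the latent $z\in\{0,1\}$ indexes the component and $p(x,z;\theta)=\pi_z\,\mathcal{N}(x;\mu_z,\sigma_z)$; the sketch below just evaluates $l(\theta)$ for some toy data and parameters.

```python
import numpy as np
from scipy.stats import norm

# Marginal log-likelihood l(theta) = sum_i log( sum_z pi_z * N(x_i; mu_z, sigma_z) )
# for a hypothetical two-component 1-D Gaussian mixture.
def mixture_log_likelihood(x, pi, mu, sigma):
    per_component = pi * norm.pdf(x[:, None], loc=mu, scale=sigma)  # p(x_i, z; theta), shape (m, K)
    return np.sum(np.log(per_component.sum(axis=1)))                # marginalize over z, then log

x = np.array([-1.2, 0.3, 2.5, 3.1])  # toy observations
print(mixture_log_likelihood(x,
                             pi=np.array([0.4, 0.6]),
                             mu=np.array([0.0, 3.0]),
                             sigma=np.array([1.0, 1.0])))
```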

Because $z$ is not observed, explicitly finding the maximum likelihood estimate of the parameters $\theta$ is very hard, so we use the EM algorithm. The derivation is as follows:

Let $Q_i$ be some distribution over the latent variable $z^i$, so that $\sum_{z}Q_i(z)=1$ and $Q_i(z)\geq 0$.

$\sum_i \log p(x^i;\theta)=\sum_i \log \sum_{z^i} p(x^i,z^i;\theta)=\sum_i \log \sum_{z^i} Q_i(z^i)\frac{p(x^i,z^i;\theta)}{Q_i(z^i)}$

$=\sum_i \log\left(E_{z^i\sim Q_i}\!\left[\frac{p(x^i,z^i;\theta)}{Q_i(z^i)}\right]\right) \geq \sum_i E_{z^i\sim Q_i}\!\left[\log\frac{p(x^i,z^i;\theta)}{Q_i(z^i)}\right] = \sum_i \sum_{z^i} Q_i(z^i)\log\frac{p(x^i,z^i;\theta)}{Q_i(z^i)}$,

where the inequality follows from Jensen's inequality applied to the concave function $\log$.

So far, the aim of this derivation has been to construct a lower bound on $l(\theta)$, the function we want to maximize, that holds for any choice of the distributions $Q_i$.

To make the bound tight (hold with equality), the Jensen's-inequality step must become an equality, which happens when $\frac{p(x^i,z^i;\theta)}{Q_i(z^i)}$ is a constant-valued random variable, i.e. $\frac{p(x^i,z^i;\theta)}{Q_i(z^i)}=c$ for some constant $c$ that does not depend on $z^i$.

This gives $Q_i(z^i)\propto p(x^i,z^i;\theta)$. Further, since $\sum_{z}Q_i(z)=1$, we get $Q_i(z^i)=\frac{p(x^i,z^i;\theta)}{\sum_{z}p(x^i,z;\theta)}=\frac{p(x^i,z^i;\theta)}{p(x^i;\theta)}=p(z^i|x^i;\theta)$, i.e. the posterior distribution of $z^i$ given $x^i$.
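For the same hypothetical Gaussian mixture as above, this choice of $Q_i$ is simply the normalized joint, i.e. the posterior "responsibility" of each component for each data point; a minimal sketch:

```python
import numpy as np
from scipy.stats import norm

# E-step for the toy two-component Gaussian mixture:
# Q_i(z) = p(z | x_i; theta) = p(x_i, z; theta) / sum_z' p(x_i, z'; theta)
def e_step(x, pi, mu, sigma):
    joint = pi * norm.pdf(x[:, None], loc=mu, scale=sigma)  # p(x_i, z; theta), shape (m, K)
    return joint / joint.sum(axis=1, keepdims=True)         # normalize so each row sums to 1

x = np.array([-1.2, 0.3, 2.5, 3.1])
Q = e_step(x, pi=np.array([0.4, 0.6]), mu=np.array([0.0, 3.0]), sigma=np.array([1.0, 1.0]))
print(Q)  # Q[i, z] = posterior probability that x_i came from component z
```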

3. The process of EM

Repeat until convergence {

E-step: For each $i$, set $Q_i(z^i)=p(z^i|x^i;\theta)$.

M-step: Set $\theta := \arg\max_{\theta}\sum_i \sum_{z^i} Q_i(z^i)\log\frac{p(x^i,z^i;\theta)}{Q_i(z^i)}$.

}
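Putting the two steps together, here is a minimal, illustrative EM loop for the same hypothetical two-component 1-D Gaussian mixture. The closed-form M-step updates (mixing weights, weighted means, and weighted standard deviations) are specific to that model, not part of the general derivation above, and this sketch omits convergence checks and numerical safeguards.

```python
import numpy as np
from scipy.stats import norm

def em_gaussian_mixture(x, pi, mu, sigma, n_iter=100):
    """Illustrative EM for a K-component 1-D Gaussian mixture (no convergence check)."""
    for _ in range(n_iter):
        # E-step: Q[i, z] = p(z | x_i; theta)
        joint = pi * norm.pdf(x[:, None], loc=mu, scale=sigma)
        Q = joint / joint.sum(axis=1, keepdims=True)

        # M-step: closed-form maximizers of the lower bound for this particular model
        Nk = Q.sum(axis=0)                                              # effective counts per component
        pi = Nk / len(x)                                                # mixing weights
        mu = (Q * x[:, None]).sum(axis=0) / Nk                          # weighted means
        sigma = np.sqrt((Q * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)  # weighted std devs
    return pi, mu, sigma

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(4.0, 0.5, 300)])
print(em_gaussian_mixture(x, pi=np.array([0.5, 0.5]),
                          mu=np.array([-1.0, 1.0]),
                          sigma=np.array([1.0, 1.0])))
```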

 

PS: I have finally pieced together the key points of this article. I still need to study the Gaussian mixture model and the Bayes mixture model to understand this algorithm more deeply.

As for this particular post, though, the writing is a mess. This is my first attempt at writing something in English, and I have never really produced English before, so I feel keenly how weak my English is. But a lot of first-hand, cutting-edge material is in English, and I will need to publish papers eventually, so I have to keep practicing my English. Keep going!

 
