The EM Algorithm
Problem Statement
The EM algorithm estimates a generative model from data; that is, its goal is to find the probability model that the data follow.
Let the observed variables be $Y=\{y_1,y_2,\dots,y_N\}$, the hidden variables be $Z$, and the parameters to be estimated be $\theta$.
Following the idea of maximum likelihood estimation, the natural approach is to find the parameter $\theta$ that maximizes the probability of the observed values:
$$
\arg \max_\theta P(Y|\theta)\tag{1}
$$
The samples are usually assumed to be independent and identically distributed, so
$$
P(Y|\theta)=\prod_{j=1}^N P(y_j|\theta)\tag{2}
$$
For computational convenience, $(1)$ is usually rewritten in the following form. Because the logarithm is monotonically increasing, the maximizer is unchanged:
$$
\arg \max_\theta \log P(Y|\theta)\tag{3}
$$
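One practical reason for working with the logarithm can be sketched numerically: the product in $(2)$ underflows in floating point for even moderately large $N$, while the sum of logs stays well behaved. The helper names `likelihood` and `log_likelihood` and the per-sample probabilities below are hypothetical, chosen only for illustration.

```python
import math

def likelihood(probs):
    """Direct product of per-sample probabilities, as in equation (2)."""
    result = 1.0
    for p in probs:
        result *= p
    return result

def log_likelihood(probs):
    """Sum of per-sample log-probabilities, the objective in equation (3)."""
    return sum(math.log(p) for p in probs)

# Hypothetical data: 2000 samples, each with probability 0.5 under the model.
probs = [0.5] * 2000
print(likelihood(probs))      # underflows to 0.0 in double precision
print(log_likelihood(probs))  # equals 2000 * log(0.5) up to rounding
```

Both quantities have the same maximizer over $\theta$, but only the second remains usable numerically as $N$ grows.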
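Because of the hidden variables, this optimization problem generally cannot be solved directly, so the EM algorithm is used to solve it iteratively.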
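To see where the difficulty comes from, it may help to write the likelihood as a marginal over the hidden variables. The expansion below assumes $Z$ is discrete (for continuous $Z$ the sum becomes an integral):

$$
\log P(Y|\theta)=\log \sum_Z P(Y,Z|\theta)=\log \sum_Z P(Y|Z,\theta)P(Z|\theta)
$$

The logarithm now acts on a sum rather than a product, so it no longer splits into simple per-term contributions, and setting the gradient to zero typically has no closed-form solution.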
Derivation of the Algorithm
First define $L(\theta)$:
$$
L(\theta) \triangleq \log P(Y|\theta) \tag{4}
$$
During the iteration we want the successive estimates $\theta^{(1)},\theta^{(2)},\cdots,\theta^{(i)},\cdots$ to make $L(\theta)$ increase monotonically. To this end, consider $L(\theta)-L(\theta^{(i)})$:
$$
\begin{aligned}
L(\theta)-L(\theta^{(i)}) &= \log P(Y|\theta)-\log P(Y|\theta^{(i)})\\
&=\log\left(\sum_Z P(Y|Z,\theta)P(Z|\theta)\right)-\log P(Y|\theta^{(i)})\\
&=\log\left(\sum_Z P(Z|Y,\theta^{(i)})\frac{P(Y|Z,\theta)P(Z|\theta)}{P(Z|Y,\theta^{(i)})}\right)-\log P(Y|\theta^{(i)})\\
&=\log\left(E_{P(Z|Y,\theta^{(i)})}\left[\frac{P(Y|Z,\theta)P(Z|\theta)}{P(Z|Y,\theta^{(i)})}\right]\right)-\log P(Y|\theta^{(i)})\\
&\geqslant E_{P(Z|Y,\theta^{(i)})}\left[\log\left(\frac{P(Y|Z,\theta)P(Z|\theta)}{P(Z|Y,\theta^{(i)})}\right)\right]-E_{P(Z|Y,\theta^{(i)})}\left[\log P(Y|\theta^{(i)})\right]\\
&=E_{P(Z|Y,\theta^{(i)})}\left[\log\left(\frac{P(Y|Z,\theta)P(Z|\theta)}{P(Z|Y,\theta^{(i)})P(Y|\theta^{(i)})}\right)\right]
\end{aligned}
$$
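The inequality above can be checked numerically on a toy model. The sketch below assumes a hypothetical two-coin mixture that is not part of the original text: each flip $y_j$ has its own hidden label $z_j\in\{0,1\}$, and $\theta=(\pi,p,q)$ with $P(z=1|\theta)=\pi$, $P(y=1|z=1,\theta)=p$, $P(y=1|z=0,\theta)=q$. The function names and the data are illustrative only.

```python
import math

def joint(y, z, theta):
    """P(y, z | theta) = P(y | z, theta) * P(z | theta) for one coin flip."""
    pi, p, q = theta
    prior = pi if z == 1 else 1.0 - pi
    heads = p if z == 1 else q
    return prior * (heads if y == 1 else 1.0 - heads)

def log_lik(ys, theta):
    """L(theta): sum over samples of log sum_z P(y, z | theta)."""
    return sum(math.log(joint(y, 0, theta) + joint(y, 1, theta)) for y in ys)

def lower_bound(ys, theta, theta_i):
    """Final expectation in the derivation, summed over i.i.d. samples."""
    total = 0.0
    for y in ys:
        evidence = joint(y, 0, theta_i) + joint(y, 1, theta_i)  # P(y | theta_i)
        for z in (0, 1):
            post = joint(y, z, theta_i) / evidence  # P(z | y, theta_i)
            total += post * math.log(joint(y, z, theta) / (post * evidence))
    return total

# Hypothetical observations and current estimate theta_i.
ys = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
theta_i = (0.5, 0.6, 0.4)
for theta in [(0.5, 0.6, 0.4), (0.4, 0.8, 0.5), (0.7, 0.9, 0.2)]:
    gap = log_lik(ys, theta) - log_lik(ys, theta_i)
    # The gap should never fall below the bound; both vanish at theta == theta_i.
    print(theta, gap, lower_bound(ys, theta, theta_i))
```

For every candidate $\theta$ the printed gap $L(\theta)-L(\theta^{(i)})$ should be at least as large as the bound, with equality at $\theta=\theta^{(i)}$, which is what makes maximizing the bound a safe way to increase $L$.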