Hidden Markov Model (HMM) Study Notes 2

The EM Algorithm

Organized from Li Hang's book (*Statistical Learning Methods*).
The EM algorithm is an iterative algorithm for maximum likelihood estimation, or maximum a posteriori estimation, of the parameters of probabilistic models with latent variables. Each iteration consists of two steps: an E-step, which computes an expectation, and an M-step, which performs a maximization. Only maximum likelihood estimation is discussed below.
We introduce the EM algorithm with an example.
Suppose there are three coins, labeled A, B, and C, whose probabilities of landing heads are $\pi$, $p$, and $q$ respectively. Run the following experiment: first toss coin A; if it lands heads, toss coin B, otherwise toss coin C. Record the result of the second toss, writing 1 for heads and 0 for tails. Repeat the experiment independently $n$ times (here $n = 10$), giving the results $1,1,0,1,0,0,1,0,1,1$. Assuming we can observe only the outcome of each trial, not the tossing process, estimate $\pi$, $p$, and $q$.
The three-coin model can be written as
$$P(y\mid\theta) = \sum_z P(y,z\mid\theta) = \sum_z P(z\mid\theta)\,P(y\mid z,\theta) = \pi\,p^{y}(1-p)^{1-y} + (1-\pi)\,q^{y}(1-q)^{1-y}.$$
Here the random variable $y$ is the observed variable, indicating whether the result of a trial is 1 or 0; the random variable $z$ is the latent variable, representing the unobserved outcome of tossing coin A; and $\theta = (\pi, p, q)$ is the model parameter.
We then seek the maximum likelihood estimate of $\theta = (\pi, p, q)$, i.e. $\hat\theta = \arg\max_\theta \log P(Y\mid\theta)$. This problem has no analytic solution and can only be solved iteratively. The EM algorithm is one such iterative algorithm for problems of this kind.
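For concreteness, here is a minimal sketch of the observation model's log-likelihood, assuming NumPy is available (`log_likelihood` and `y_obs` are illustrative names, not from the book):

```python
import numpy as np

def log_likelihood(y, pi, p, q):
    """log P(Y|theta): each trial mixes the coin-B path and the coin-C path."""
    y = np.asarray(y)
    per_trial = pi * p**y * (1 - p)**(1 - y) + (1 - pi) * q**y * (1 - q)**(1 - y)
    return np.log(per_trial).sum()

y_obs = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]       # the observed sequence above
print(log_likelihood(y_obs, 0.5, 0.5, 0.5))  # -6.93... = 10 * log(0.5)
print(log_likelihood(y_obs, 0.5, 0.6, 0.6))  # about -6.73: a better fit
```

Because $\pi$, $p$, and $q$ enter the likelihood only through a sum inside the logarithm, setting the gradient to zero yields no closed form, which is why an iterative method is needed.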

Steps of the EM algorithm:
Input: observed-variable data $Y$, latent-variable data $Z$, the joint distribution $P(Y,Z\mid\theta)$, and the conditional distribution $P(Z\mid Y,\theta)$.
Output: the model parameter $\theta$.
1) Choose an initial value $\theta^{(0)}$ for the parameters and start iterating. The initial value may be chosen arbitrarily, but the EM algorithm is sensitive to initialization.
2) E-step: let $\theta^{(i)}$ denote the estimate of $\theta$ at the $i$-th iteration. In the E-step of iteration $i+1$, compute
$$Q\big(\theta, \theta^{(i)}\big) = E_Z\big[\log P(Y,Z\mid\theta)\,\big|\,Y, \theta^{(i)}\big] = \sum_Z \log P(Y,Z\mid\theta)\, P\big(Z\mid Y, \theta^{(i)}\big).$$
The function $Q(\theta, \theta^{(i)})$ is the core of the EM algorithm and is called the Q-function.
3) M-step: find the $\theta$ that maximizes $Q(\theta, \theta^{(i)})$; this determines the parameter estimate for iteration $i+1$:
$$\theta^{(i+1)} = \arg\max_\theta Q\big(\theta, \theta^{(i)}\big).$$
4) Repeat steps 2 and 3 until convergence.
Definition of the Q-function: the expectation of the complete-data log-likelihood $\log P(Y,Z\mid\theta)$ with respect to $P(Z\mid Y,\theta^{(i)})$, the conditional distribution of the unobserved data $Z$ given the observed data $Y$ and the current parameters $\theta^{(i)}$, is called the Q-function.
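In implementation terms, the E-step usually amounts to computing the posterior $P(Z\mid Y,\theta^{(i)})$, which the M-step then uses to maximize $Q$. Below is a minimal, model-agnostic sketch of the loop, assuming the caller supplies `e_step` and `m_step` functions and that stopping when the parameters barely change is an acceptable convergence test (both names and the stopping rule are illustrative, not from the book):

```python
def em(theta0, e_step, m_step, tol=1e-8, max_iter=1000):
    """Generic EM loop: alternate E- and M-steps until the parameters settle."""
    theta = theta0
    for _ in range(max_iter):
        posterior = e_step(theta)        # E-step: compute P(Z | Y, theta^(i))
        theta_new = m_step(posterior)    # M-step: argmax_theta Q(theta, theta^(i))
        if max(abs(a - b) for a, b in zip(theta_new, theta)) < tol:
            return theta_new
        theta = theta_new
    return theta
```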
Solving the three-coin problem above:
1) Choose the initial parameter values $\theta^{(0)} = (0.5, 0.5, 0.5)$.
2) E-step (here the superscript $(i+1)$ marks the parameters being solved for at this iteration, while $(i)$ marks the current estimates, held fixed):
$$Q\big(\theta,\theta^{(i)}\big) = \sum_Z \log P(Y,Z\mid\theta)\,P\big(Z\mid Y,\theta^{(i)}\big) = \sum_{j=1}^{10}\sum_{k=1}^{2}\log P\big(y_j,z_k\mid\theta\big)\,P\big(z_k\mid y_j,\theta^{(i)}\big)$$
$$= \sum_{j=1}^{10}\left[\log\!\Big(\pi^{(i+1)}\big(p^{(i+1)}\big)^{y_j}\big(1-p^{(i+1)}\big)^{1-y_j}\Big)\frac{\pi^{(i)}\big(p^{(i)}\big)^{y_j}\big(1-p^{(i)}\big)^{1-y_j}}{\pi^{(i)}\big(p^{(i)}\big)^{y_j}\big(1-p^{(i)}\big)^{1-y_j}+\big(1-\pi^{(i)}\big)\big(q^{(i)}\big)^{y_j}\big(1-q^{(i)}\big)^{1-y_j}}\right.$$
$$\left.+\;\log\!\Big(\big(1-\pi^{(i+1)}\big)\big(q^{(i+1)}\big)^{y_j}\big(1-q^{(i+1)}\big)^{1-y_j}\Big)\frac{\big(1-\pi^{(i)}\big)\big(q^{(i)}\big)^{y_j}\big(1-q^{(i)}\big)^{1-y_j}}{\pi^{(i)}\big(p^{(i)}\big)^{y_j}\big(1-p^{(i)}\big)^{1-y_j}+\big(1-\pi^{(i)}\big)\big(q^{(i)}\big)^{y_j}\big(1-q^{(i)}\big)^{1-y_j}}\right]$$
3) M-step: find the $\theta$ that maximizes $Q(\theta,\theta^{(i)})$. Let
$$\mu_j^{(i+1)} = \frac{\pi^{(i)}\big(p^{(i)}\big)^{y_j}\big(1-p^{(i)}\big)^{1-y_j}}{\pi^{(i)}\big(p^{(i)}\big)^{y_j}\big(1-p^{(i)}\big)^{1-y_j}+\big(1-\pi^{(i)}\big)\big(q^{(i)}\big)^{y_j}\big(1-q^{(i)}\big)^{1-y_j}},$$
the posterior probability that coin A landed heads on trial $j$. Setting the partial derivatives of $Q$ to zero:
$$\frac{\partial Q}{\partial \pi^{(i+1)}} = \sum_{j=1}^{10}\left[\frac{\mu_j^{(i+1)}}{\pi^{(i+1)}}-\frac{1-\mu_j^{(i+1)}}{1-\pi^{(i+1)}}\right]=0$$
$$\Rightarrow \sum_{j=1}^{10}\left[\mu_j^{(i+1)}\big(1-\pi^{(i+1)}\big)-\big(1-\mu_j^{(i+1)}\big)\pi^{(i+1)}\right]=0 \;\Rightarrow\; \pi^{(i+1)}=\frac{1}{10}\sum_{j=1}^{10}\mu_j^{(i+1)}$$
$$\frac{\partial Q}{\partial p^{(i+1)}} = \sum_{j=1}^{10}\frac{\mu_j^{(i+1)}\Big(y_j\big(p^{(i+1)}\big)^{y_j-1}\big(1-p^{(i+1)}\big)^{1-y_j}-\big(1-y_j\big)\big(p^{(i+1)}\big)^{y_j}\big(1-p^{(i+1)}\big)^{-y_j}\Big)}{\big(p^{(i+1)}\big)^{y_j}\big(1-p^{(i+1)}\big)^{1-y_j}}=0$$
$$\Rightarrow \sum_{j=1}^{10}\mu_j^{(i+1)}\left[y_j\big(1-p^{(i+1)}\big)-\big(1-y_j\big)p^{(i+1)}\right]=0$$
$$\Rightarrow \sum_{j=1}^{10}\mu_j^{(i+1)}y_j = p^{(i+1)}\sum_{j=1}^{10}\mu_j^{(i+1)} \;\Rightarrow\; p^{(i+1)}=\frac{\sum_{j=1}^{10}\mu_j^{(i+1)}y_j}{\sum_{j=1}^{10}\mu_j^{(i+1)}}$$
Similarly, from $\frac{\partial Q}{\partial q^{(i+1)}}=0$ we obtain
$$q^{(i+1)}=\frac{\sum_{j=1}^{10}\big(1-\mu_j^{(i+1)}\big)y_j}{\sum_{j=1}^{10}\big(1-\mu_j^{(i+1)}\big)}.$$
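Putting the three updates together gives a runnable sketch of the whole iteration, assuming NumPy is available (the function name `em_three_coins` is an illustrative choice, not from the book):

```python
import numpy as np

def em_three_coins(y, pi, p, q, n_iter=20):
    """EM for the three-coin model; variables mirror the formulas above."""
    y = np.asarray(y, dtype=float)
    for _ in range(n_iter):
        # E-step: mu_j = posterior probability that coin A was heads on trial j
        b = pi * p**y * (1 - p)**(1 - y)          # path through coin B
        c = (1 - pi) * q**y * (1 - q)**(1 - y)    # path through coin C
        mu = b / (b + c)
        # M-step: closed-form updates from dQ/dpi = dQ/dp = dQ/dq = 0
        pi = mu.mean()
        p = (mu * y).sum() / mu.sum()
        q = ((1 - mu) * y).sum() / (1 - mu).sum()
    return pi, p, q

y_obs = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
print(em_three_coins(y_obs, 0.5, 0.5, 0.5))   # (0.5, 0.6, 0.6)
```

With the symmetric start $\theta^{(0)}=(0.5,0.5,0.5)$, every $\mu_j$ equals $0.5$, so the iteration lands at $(\pi,p,q)=(0.5,0.6,0.6)$ after one step and stays there; an asymmetric start such as $(0.4,0.6,0.7)$ settles at a different fixed point, which is exactly the initialization sensitivity noted in step 1 of the algorithm.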

Convergence of the EM Algorithm

Theorem 1: Let $P(Y\mid\theta)$ be the likelihood function of the observed data, $\theta^{(i)}\ (i=1,2,\cdots)$ the sequence of parameter estimates produced by the EM algorithm, and $P(Y\mid\theta^{(i)})\ (i=1,2,\cdots)$ the corresponding sequence of likelihood values. Then $P(Y\mid\theta^{(i)})$ is monotonically non-decreasing, i.e. $P(Y\mid\theta^{(i+1)}) \geqslant P(Y\mid\theta^{(i)})$.
Theorem 2: Let $L(\theta) = \log P(Y\mid\theta)$ be the log-likelihood of the observed data, $\theta^{(i)}\ (i=1,2,\cdots)$ the sequence of parameter estimates produced by the EM algorithm, and $L(\theta^{(i)})\ (i=1,2,\cdots)$ the corresponding sequence of log-likelihood values.
1) If $\log P(Y\mid\theta)$ is bounded above, then $L(\theta^{(i)}) = \log P(Y\mid\theta^{(i)})$ converges to some value $L^*$.
2) Under certain conditions on $Q(\theta, \theta')$ and $L(\theta)$, the limit $\theta^*$ of the parameter estimate sequence $\theta^{(i)}$ is a stationary point of $L(\theta)$.
The convergence of the EM algorithm involves two distinct claims: convergence of the log-likelihood sequence $L(\theta^{(i)})$ and convergence of the parameter estimate sequence $\theta^{(i)}$; the former does not imply the latter. The theorems only guarantee that the parameter estimate sequence converges to a stationary point of the log-likelihood function, not necessarily to a maximum point.
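Theorem 1 is easy to check numerically on the three-coin example. The snippet below is a sketch under the same assumptions as the earlier code (NumPy; the start $(0.4, 0.6, 0.7)$ is an arbitrary asymmetric choice so the iterates actually move), asserting that the observed-data log-likelihood never decreases:

```python
import numpy as np

y = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 1], dtype=float)
pi, p, q = 0.4, 0.6, 0.7
prev = -np.inf
for _ in range(50):
    b = pi * p**y * (1 - p)**(1 - y)          # coin-B path
    c = (1 - pi) * q**y * (1 - q)**(1 - y)    # coin-C path
    ll = np.log(b + c).sum()                  # log P(Y | theta^(i))
    assert ll >= prev - 1e-12                 # Theorem 1: monotone non-decreasing
    prev = ll
    mu = b / (b + c)                          # E-step
    pi = mu.mean()                            # M-step updates
    p = (mu * y).sum() / mu.sum()
    q = ((1 - mu) * y).sum() / (1 - mu).sum()
```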
Next post: "Training the Hidden Markov Model".
