Hidden Markov Model (HMM) Study Notes 3

This post organizes the material on hidden Markov models from Li Hang's book. For training, it covers the Baum-Welch algorithm, which implements parameter learning with EM: the E step derives the Q function and the M step maximizes its three terms. For prediction, it covers the Viterbi algorithm, which uses dynamic programming to find the state path of maximum probability.


Training the Hidden Markov Model

Baum-Welch Algorithm

A hidden Markov model is a probabilistic model with latent variables,
$$P(\mathbf{x}\mid\boldsymbol{\lambda}) = \sum_{\mathbf{y}} P(\mathbf{x}\mid\mathbf{y},\boldsymbol{\lambda})\,P(\mathbf{y}\mid\boldsymbol{\lambda}),$$
so its parameters can be learned with the EM algorithm.
1) Write all observed data as $\mathbf{x} = (x_1, x_2, \cdots, x_T)$ and all hidden data as $\mathbf{y} = (y_1, y_2, \cdots, y_T)$; the complete data is $(\mathbf{x}, \mathbf{y}) = (x_1, x_2, \cdots, x_T, y_1, y_2, \cdots, y_T)$. The complete-data log-likelihood is $\log P(\mathbf{x}, \mathbf{y} \mid \boldsymbol{\lambda})$.
2) E step of EM: compute the Q function $Q(\boldsymbol{\lambda}, \overline{\boldsymbol{\lambda}})$:
$$Q(\boldsymbol{\lambda}, \overline{\boldsymbol{\lambda}}) = \sum_{\mathbf{y}} \log P(\mathbf{x}, \mathbf{y} \mid \boldsymbol{\lambda})\, P(\mathbf{x}, \mathbf{y} \mid \overline{\boldsymbol{\lambda}}).$$
Note: by the definition of the Q function,
$$Q(\boldsymbol{\lambda}, \overline{\boldsymbol{\lambda}}) = E_{\mathbf{y}}\!\left[\log P(\mathbf{x}, \mathbf{y} \mid \boldsymbol{\lambda}) \,\middle|\, \mathbf{x}, \overline{\boldsymbol{\lambda}}\right] = \sum_{\mathbf{y}} \log P(\mathbf{x}, \mathbf{y} \mid \boldsymbol{\lambda})\, P(\mathbf{y} \mid \mathbf{x}, \overline{\boldsymbol{\lambda}}).$$
The expression above drops the factor $1/P(\mathbf{x} \mid \overline{\boldsymbol{\lambda}})$, which is constant with respect to $\boldsymbol{\lambda}$ (since $P(\mathbf{y} \mid \mathbf{x}, \overline{\boldsymbol{\lambda}}) = P(\mathbf{x}, \mathbf{y} \mid \overline{\boldsymbol{\lambda}})/P(\mathbf{x} \mid \overline{\boldsymbol{\lambda}})$).
Here $\overline{\boldsymbol{\lambda}}$ is the current estimate of the HMM parameters and $\boldsymbol{\lambda}$ is the HMM parameter to be maximized. Since
$$P(\mathbf{x}, \mathbf{y} \mid \boldsymbol{\lambda}) = \pi_{y_1} b_{y_1 x_1} a_{y_1 y_2} b_{y_2 x_2} \cdots a_{y_{T-1} y_T} b_{y_T x_T},$$
the Q function becomes
$$Q(\boldsymbol{\lambda}, \overline{\boldsymbol{\lambda}}) = \sum_{\mathbf{y}} \log \pi_{y_1}\, P(\mathbf{x}, \mathbf{y} \mid \overline{\boldsymbol{\lambda}}) + \sum_{\mathbf{y}} \left( \sum_{t=1}^{T-1} \log a_{y_t y_{t+1}} \right) P(\mathbf{x}, \mathbf{y} \mid \overline{\boldsymbol{\lambda}}) + \sum_{\mathbf{y}} \left( \sum_{t=1}^{T} \log b_{y_t x_t} \right) P(\mathbf{x}, \mathbf{y} \mid \overline{\boldsymbol{\lambda}}).$$
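The M-step updates below are expressed through the posterior quantities $\gamma_t(i) = P(y_t = s_i \mid \mathbf{x}, \overline{\boldsymbol{\lambda}})$ and $\xi_t(i, j) = P(y_t = s_i, y_{t+1} = s_j \mid \mathbf{x}, \overline{\boldsymbol{\lambda}})$, which the forward-backward pass provides. Here is a minimal NumPy sketch of that E step; the function name `forward_backward`, the array names `pi`, `A`, `B`, `obs`, and the 0-based observation indexing are my own illustration, not notation from the book.

```python
import numpy as np

def forward_backward(pi, A, B, obs):
    """E step of Baum-Welch: forward/backward variables and posteriors.

    pi : (N,)   initial state distribution
    A  : (N, N) transition matrix, A[i, j] = a_ij
    B  : (N, M) emission matrix,   B[j, k] = b_j(o_k)
    obs: (T,)   observation indices x_1..x_T (0-based)
    """
    T, N = len(obs), len(pi)

    # Forward pass: alpha[t, i] = P(x_1..x_t, y_t = s_i | lambda_bar)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

    # Backward pass: beta[t, i] = P(x_{t+1}..x_T | y_t = s_i, lambda_bar)
    beta = np.zeros((T, N))
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

    likelihood = alpha[T - 1].sum()          # P(x | lambda_bar)

    # gamma[t, i] = P(y_t = s_i | x, lambda_bar)
    gamma = alpha * beta / likelihood

    # xi[t, i, j] = P(y_t = s_i, y_{t+1} = s_j | x, lambda_bar)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = (alpha[t][:, None] * A *
                 (B[:, obs[t + 1]] * beta[t + 1])[None, :]) / likelihood

    return gamma, xi, likelihood
```

In practice the forward and backward recursions are scaled (or computed in log space) to avoid underflow on long sequences; the plain version above keeps the sketch close to the formulas.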
3) M step of EM: maximize each of the three terms above separately.

First term:
$$\sum_{\mathbf{y}} \log \pi_{y_1}\, P(\mathbf{x}, \mathbf{y} \mid \overline{\boldsymbol{\lambda}}) = \sum_{i=1}^{N} \log \pi_{s_i}\, P(\mathbf{x}, y_1 = s_i \mid \overline{\boldsymbol{\lambda}}).$$
Noting the constraint $\sum_{i=1}^{N} \pi_{s_i} = 1$, apply the method of Lagrange multipliers and write the Lagrangian
$$\sum_{i=1}^{N} \log \pi_{s_i}\, P(\mathbf{x}, y_1 = s_i \mid \overline{\boldsymbol{\lambda}}) + \gamma \left( \sum_{i=1}^{N} \pi_{s_i} - 1 \right).$$
Setting the partial derivative with respect to $\pi_{s_i}$ to zero gives
$$\frac{P(\mathbf{x}, y_1 = s_i \mid \overline{\boldsymbol{\lambda}})}{\pi_{s_i}} + \gamma = 0 \;\Rightarrow\; P(\mathbf{x}, y_1 = s_i \mid \overline{\boldsymbol{\lambda}}) + \pi_{s_i}\gamma = 0. \qquad (1)$$
Summing (1) over $i$ gives
$$-\gamma = \sum_{i=1}^{N} P(\mathbf{x}, y_1 = s_i \mid \overline{\boldsymbol{\lambda}}) = P(\mathbf{x} \mid \overline{\boldsymbol{\lambda}}),$$
and substituting back into (1) yields
$$\pi_{s_i} = \frac{P(\mathbf{x}, y_1 = s_i \mid \overline{\boldsymbol{\lambda}})}{P(\mathbf{x} \mid \overline{\boldsymbol{\lambda}})} = \gamma_1(i).$$

Second term:
$$\sum_{\mathbf{y}} \left( \sum_{t=1}^{T-1} \log a_{y_t y_{t+1}} \right) P(\mathbf{x}, \mathbf{y} \mid \overline{\boldsymbol{\lambda}}) = \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{t=1}^{T-1} \log a_{ij}\, P(\mathbf{x}, y_t = s_i, y_{t+1} = s_j \mid \overline{\boldsymbol{\lambda}}).$$
By analogy with the first term, the constraint is $\sum_{j=1}^{N} a_{ij} = 1$, so write the Lagrangian
$$\sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{t=1}^{T-1} \log a_{ij}\, P(\mathbf{x}, y_t = s_i, y_{t+1} = s_j \mid \overline{\boldsymbol{\lambda}}) + \gamma \left( \sum_{j=1}^{N} a_{ij} - 1 \right).$$
Setting the partial derivative with respect to $a_{ij}$ to zero gives
$$\frac{\sum_{t=1}^{T-1} P(\mathbf{x}, y_t = s_i, y_{t+1} = s_j \mid \overline{\boldsymbol{\lambda}})}{a_{ij}} + \gamma = 0. \qquad (2)$$
Summing (2) over $j$ and using $\sum_{j=1}^{N} a_{ij} = 1$ gives
$$-\gamma = \sum_{j=1}^{N} \sum_{t=1}^{T-1} P(\mathbf{x}, y_t = s_i, y_{t+1} = s_j \mid \overline{\boldsymbol{\lambda}}) = \sum_{t=1}^{T-1} P(\mathbf{x}, y_t = s_i \mid \overline{\boldsymbol{\lambda}}),$$
and substituting back into (2) yields
$$a_{ij} = \frac{\sum_{t=1}^{T-1} P(\mathbf{x}, y_t = s_i, y_{t+1} = s_j \mid \overline{\boldsymbol{\lambda}})}{\sum_{t=1}^{T-1} P(\mathbf{x}, y_t = s_i \mid \overline{\boldsymbol{\lambda}})} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}.$$
This matches the intuitive meaning of $a_{ij}$: the probability that $y_{t+1} = s_j$ given $y_t = s_i$.
Third term:
$$\sum_{\mathbf{y}} \left( \sum_{t=1}^{T} \log b_{y_t x_t} \right) P(\mathbf{x}, \mathbf{y} \mid \overline{\boldsymbol{\lambda}}) = \sum_{j=1}^{N} \sum_{t=1}^{T} \log b_{j x_t}\, P(\mathbf{x}, y_t = s_j \mid \overline{\boldsymbol{\lambda}}).$$
Similarly, the constraint is $\sum_{k=1}^{M} b_{j o_k} = 1$, so write the Lagrangian
$$\sum_{j=1}^{N} \sum_{t=1}^{T} \log b_{j x_t}\, P(\mathbf{x}, y_t = s_j \mid \overline{\boldsymbol{\lambda}}) + \gamma \left( \sum_{k=1}^{M} b_{j o_k} - 1 \right).$$
Setting the partial derivative with respect to $b_{j o_k}$ to zero (only the terms with $x_t = o_k$ involve $b_{j o_k}$) gives
$$\frac{\sum_{t=1}^{T} P(\mathbf{x}, y_t = s_j \mid \overline{\boldsymbol{\lambda}})\, I(x_t = o_k)}{b_{j o_k}} + \gamma = 0, \qquad (3)$$
where $I(\text{true}) = 1$ and $I(\text{false}) = 0$. Summing (3) over $k$ and using $\sum_{k=1}^{M} b_{j o_k} = 1$ gives
$$-\gamma = \sum_{k=1}^{M} \sum_{t=1}^{T} P(\mathbf{x}, y_t = s_j \mid \overline{\boldsymbol{\lambda}})\, I(x_t = o_k) = \sum_{t=1}^{T} P(\mathbf{x}, y_t = s_j \mid \overline{\boldsymbol{\lambda}}),$$
and substituting back into (3) yields
$$b_{j o_k} = \frac{\sum_{t=1}^{T} P(\mathbf{x}, y_t = s_j \mid \overline{\boldsymbol{\lambda}})\, I(x_t = o_k)}{\sum_{t=1}^{T} P(\mathbf{x}, y_t = s_j \mid \overline{\boldsymbol{\lambda}})} = \frac{\sum_{t=1,\, x_t = o_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}.$$
This matches the intuitive meaning of $b_{j o_k}$: the probability that $x_t = o_k$ given $y_t = s_j$.
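Putting the three updates together, one Baum-Welch iteration recomputes $\pi$, $A$, and $B$ from the posteriors produced by the E step. A minimal sketch, reusing the hypothetical `forward_backward` helper above (the function name and array shapes are my own assumptions):

```python
def baum_welch_step(pi, A, B, obs):
    """One Baum-Welch (EM) iteration for a single observation sequence."""
    N, M = B.shape
    obs = np.asarray(obs)
    gamma, xi, _ = forward_backward(pi, A, B, obs)   # E step

    # M step, first term: pi_i = gamma_1(i)
    new_pi = gamma[0]

    # M step, second term: a_ij = sum_t xi_t(i,j) / sum_t gamma_t(i), t = 1..T-1
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]

    # M step, third term: b_j(o_k) = sum_{t: x_t = o_k} gamma_t(j) / sum_t gamma_t(j)
    new_B = np.zeros((N, M))
    for k in range(M):
        new_B[:, k] = gamma[obs == k].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]

    return new_pi, new_A, new_B
```

With several training sequences, the numerators and denominators of each update would be accumulated over all sequences before dividing; the single-sequence version above mirrors the derivation in the text.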

Prediction Algorithm

Viterbi Algorithm

The Viterbi algorithm solves the HMM prediction problem with dynamic programming: it finds the path of maximum probability (the optimal path). The key property of the optimal path: if it passes through node $i_t^*$ at time $t$, then the partial path from $i_t^*$ to the end node $i_T^*$ must itself be optimal among all possible partial paths from $i_t^*$ to $i_T^*$. Based on this property, we start from time $t = 1$ and recursively compute, for each state $i$, the maximum probability over partial paths that end in state $i$ at time $t$, continuing until $t = T$. The largest of these probabilities at $t = T$ is the probability $P^*$ of the optimal path, and its end node $i_T^*$ is obtained at the same time. Then, starting from the end node $i_T^*$, we trace backward to recover the nodes $i_{T-1}^*, \ldots, i_1^*$, which gives the optimal path $I^* = (i_1^*, i_2^*, \ldots, i_T^*)$.
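A minimal NumPy sketch of this dynamic program, using the same `pi`, `A`, `B`, `obs` conventions as above (the log-space formulation and variable names are my own choices, not from the book):

```python
def viterbi(pi, A, B, obs):
    """Most probable state path via dynamic programming (in log space)."""
    T, N = len(obs), len(pi)
    log_A, log_B = np.log(A), np.log(B)

    # delta[t, i]: max log-probability over paths ending in state i at time t
    # psi[t, i]:   argmax predecessor state, used for backtracking
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = np.log(pi) + log_B[:, obs[0]]

    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A       # scores[i, j]: i -> j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]

    # Termination and backtracking: recover i*_T, ..., i*_1
    path = np.zeros(T, dtype=int)
    path[T - 1] = delta[T - 1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]

    best_log_prob = delta[T - 1].max()               # log P*
    return path, best_log_prob
```

Working in log space avoids the underflow that products of many small probabilities would otherwise cause on long sequences.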
