Hidden Markov Model (HMM) Learning Notes 3

Training the Hidden Markov Model

The Baum-Welch Algorithm

The content below is organized from Li Hang's book.
A hidden Markov model is a probabilistic model with latent variables,
$$P(\mathbf{x}\mid\boldsymbol{\lambda}) = \sum_{\mathbf{y}} P(\mathbf{x}\mid\mathbf{y},\boldsymbol{\lambda})\,P(\mathbf{y}\mid\boldsymbol{\lambda}),$$
so its parameters can be learned with the EM algorithm.
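As a quick illustration of the marginal likelihood above, here is a minimal sketch (not from the original notes) that evaluates $P(\mathbf{x}\mid\boldsymbol{\lambda})$ by brute-force enumeration over all hidden sequences for a toy two-state HMM; the parameters `pi`, `A`, `B` and the observation sequence `obs` are hypothetical.

```python
# Brute-force evaluation of P(x | lambda) = sum_y P(x | y, lambda) P(y | lambda)
# for a hypothetical two-state, two-symbol HMM.
import itertools
import numpy as np

pi = np.array([0.6, 0.4])                 # initial state distribution pi_i
A  = np.array([[0.7, 0.3],                # A[i, j] = a_ij = P(y_{t+1}=s_j | y_t=s_i)
               [0.4, 0.6]])
B  = np.array([[0.5, 0.5],                # B[i, k] = P(x_t=o_k | y_t=s_i)
               [0.1, 0.9]])
obs = [0, 1, 1]                           # observed symbols x_1..x_T as indices

T, N = len(obs), len(pi)
p_x = 0.0
for y in itertools.product(range(N), repeat=T):   # enumerate every hidden path y
    p = pi[y[0]] * B[y[0], obs[0]]                # pi_{y1} * b_{y1 x1}
    for t in range(1, T):
        p *= A[y[t-1], y[t]] * B[y[t], obs[t]]    # a_{y_{t-1} y_t} * b_{y_t x_t}
    p_x += p                                      # sum over all hidden paths
print(p_x)  # P(x | lambda); enumeration is exponential in T, which is why training uses forward-backward
```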
1) Write all observed data as $\mathbf{x} = (x_1, x_2, \cdots, x_T)$ and all hidden data as $\mathbf{y} = (y_1, y_2, \cdots, y_T)$; the complete data are $(\mathbf{x},\mathbf{y}) = (x_1, x_2, \cdots, x_T, y_1, y_2, \cdots, y_T)$. The complete-data log-likelihood is $\log P(\mathbf{x},\mathbf{y}\mid\boldsymbol{\lambda})$.
2) E-step of the EM algorithm: compute the Q function $Q(\boldsymbol{\lambda},\overline{\boldsymbol{\lambda}})$:
$$Q(\boldsymbol{\lambda},\overline{\boldsymbol{\lambda}}) = \sum_{\mathbf{y}} \log P(\mathbf{x},\mathbf{y}\mid\boldsymbol{\lambda})\,P(\mathbf{x},\mathbf{y}\mid\overline{\boldsymbol{\lambda}}).$$
Note: by the definition of the Q function,
$$Q(\boldsymbol{\lambda},\overline{\boldsymbol{\lambda}}) = E_{\mathbf{y}}\big[\log P(\mathbf{x},\mathbf{y}\mid\boldsymbol{\lambda}) \mid \mathbf{x},\overline{\boldsymbol{\lambda}}\big] = \sum_{\mathbf{y}} \log P(\mathbf{x},\mathbf{y}\mid\boldsymbol{\lambda})\,P(\mathbf{y}\mid\mathbf{x},\overline{\boldsymbol{\lambda}}).$$
The formula above omits the factor $1/P(\mathbf{x}\mid\overline{\boldsymbol{\lambda}})$, which is constant with respect to $\boldsymbol{\lambda}$ (since $P(\mathbf{y}\mid\mathbf{x},\overline{\boldsymbol{\lambda}}) = P(\mathbf{x},\mathbf{y}\mid\overline{\boldsymbol{\lambda}})/P(\mathbf{x}\mid\overline{\boldsymbol{\lambda}})$).
Here $\overline{\boldsymbol{\lambda}}$ is the current estimate of the HMM parameters and $\boldsymbol{\lambda}$ is the HMM parameter to be maximized. Since
$$P(\mathbf{x},\mathbf{y}\mid\boldsymbol{\lambda}) = \pi_{y_1} b_{y_1 x_1} a_{y_1 y_2} b_{y_2 x_2} \cdots a_{y_{T-1} y_T} b_{y_T x_T},$$
the Q function can be written as
$$Q(\boldsymbol{\lambda},\overline{\boldsymbol{\lambda}}) = \sum_{\mathbf{y}} \log \pi_{y_1}\,P(\mathbf{x},\mathbf{y}\mid\overline{\boldsymbol{\lambda}}) + \sum_{\mathbf{y}} \Big(\sum_{t=1}^{T-1} \log a_{y_t y_{t+1}}\Big) P(\mathbf{x},\mathbf{y}\mid\overline{\boldsymbol{\lambda}}) + \sum_{\mathbf{y}} \Big(\sum_{t=1}^{T} \log b_{y_t x_t}\Big) P(\mathbf{x},\mathbf{y}\mid\overline{\boldsymbol{\lambda}}).$$
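To make this decomposition concrete, here is a minimal sketch (same hypothetical toy parameters) that evaluates $\log P(\mathbf{x},\mathbf{y}\mid\boldsymbol{\lambda})$ for one hidden path and splits it into the same three terms that the Q function separates.

```python
# Complete-data log-likelihood log P(x, y | lambda) for one hypothetical hidden path,
# split into initial, transition, and emission terms.
import numpy as np

pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])
B  = np.array([[0.5, 0.5],
               [0.1, 0.9]])
obs    = [0, 1, 1]        # observed x_1..x_T
hidden = [0, 0, 1]        # one hypothetical hidden path y_1..y_T

init_term  = np.log(pi[hidden[0]])                                                  # log pi_{y1}
trans_term = sum(np.log(A[hidden[t], hidden[t+1]]) for t in range(len(obs) - 1))    # sum_t log a_{y_t y_{t+1}}
emit_term  = sum(np.log(B[hidden[t], obs[t]]) for t in range(len(obs)))             # sum_t log b_{y_t x_t}

print(init_term + trans_term + emit_term)   # log P(x, y | lambda)
```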
3) M-step of the EM algorithm: maximize each of the three terms above separately.

First term:
$$\sum_{\mathbf{y}} \log \pi_{y_1}\,P(\mathbf{x},\mathbf{y}\mid\overline{\boldsymbol{\lambda}}) = \sum_{i=1}^{N} \log \pi_{s_i}\,P(\mathbf{x}, y_1 = s_i\mid\overline{\boldsymbol{\lambda}}).$$
Noting the constraint $\sum_{i=1}^{N} \pi_{s_i} = 1$, apply the method of Lagrange multipliers and write the Lagrangian
$$\sum_{i=1}^{N} \log \pi_{s_i}\,P(\mathbf{x}, y_1 = s_i\mid\overline{\boldsymbol{\lambda}}) + \gamma\Big(\sum_{i=1}^{N} \pi_{s_i} - 1\Big).$$
Setting its partial derivative with respect to $\pi_{s_i}$ to zero gives
$$\frac{P(\mathbf{x}, y_1 = s_i\mid\overline{\boldsymbol{\lambda}})}{\pi_{s_i}} + \gamma = 0 \;\Rightarrow\; P(\mathbf{x}, y_1 = s_i\mid\overline{\boldsymbol{\lambda}}) + \pi_{s_i}\gamma = 0. \qquad (1)$$
Summing (1) over $i$ and using $\sum_{i=1}^{N} \pi_{s_i} = 1$,
$$-\gamma = \sum_{i=1}^{N} P(\mathbf{x}, y_1 = s_i\mid\overline{\boldsymbol{\lambda}}) = P(\mathbf{x}\mid\overline{\boldsymbol{\lambda}}).$$
Substituting back into (1),
$$\pi_{s_i} = \frac{P(\mathbf{x}, y_1 = s_i\mid\overline{\boldsymbol{\lambda}})}{P(\mathbf{x}\mid\overline{\boldsymbol{\lambda}})} = \gamma_1(i).$$

Second term:
$$\sum_{\mathbf{y}} \Big(\sum_{t=1}^{T-1} \log a_{y_t y_{t+1}}\Big) P(\mathbf{x},\mathbf{y}\mid\overline{\boldsymbol{\lambda}}) = \sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{t=1}^{T-1} \log a_{ij}\,P(\mathbf{x}, y_t = s_i, y_{t+1} = s_j\mid\overline{\boldsymbol{\lambda}}).$$
By analogy with the first term, using the constraint $\sum_{j=1}^{N} a_{ij} = 1$, write the Lagrangian
$$\sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{t=1}^{T-1} \log a_{ij}\,P(\mathbf{x}, y_t = s_i, y_{t+1} = s_j\mid\overline{\boldsymbol{\lambda}}) + \gamma\Big(\sum_{j=1}^{N} a_{ij} - 1\Big).$$
Setting its partial derivative with respect to $a_{ij}$ to zero gives
$$\frac{\sum_{t=1}^{T-1} P(\mathbf{x}, y_t = s_i, y_{t+1} = s_j\mid\overline{\boldsymbol{\lambda}})}{a_{ij}} + \gamma = 0. \qquad (2)$$
Multiplying (2) by $a_{ij}$, summing over $j$, and using $\sum_{j=1}^{N} a_{ij} = 1$,
$$-\gamma = \sum_{j=1}^{N}\sum_{t=1}^{T-1} P(\mathbf{x}, y_t = s_i, y_{t+1} = s_j\mid\overline{\boldsymbol{\lambda}}) = \sum_{t=1}^{T-1} P(\mathbf{x}, y_t = s_i\mid\overline{\boldsymbol{\lambda}}).$$
Substituting back into (2),
$$a_{ij} = \frac{\sum_{t=1}^{T-1} P(\mathbf{x}, y_t = s_i, y_{t+1} = s_j\mid\overline{\boldsymbol{\lambda}})}{\sum_{t=1}^{T-1} P(\mathbf{x}, y_t = s_i\mid\overline{\boldsymbol{\lambda}})} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}.$$
This agrees with the meaning of $a_{ij}$: the probability that $y_{t+1} = s_j$ given $y_t = s_i$.
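In practice the re-estimates $\pi_{s_i} = \gamma_1(i)$ and $a_{ij} = \sum_t \xi_t(i,j) / \sum_t \gamma_t(i)$ are computed from the forward and backward variables. Below is a minimal sketch under the same hypothetical toy setup: it computes $\alpha$, $\beta$, $\gamma_t(i)$ and $\xi_t(i,j)$, then applies the two update formulas.

```python
# One E-step (gamma, xi via forward-backward) plus the pi and A updates,
# for a hypothetical two-state, two-symbol HMM.
import numpy as np

pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.5, 0.5], [0.1, 0.9]])
obs = [0, 1, 1]
T, N = len(obs), len(pi)

# forward: alpha[t, i] = P(x_1..x_t, y_t = s_i | lambda)
alpha = np.zeros((T, N))
alpha[0] = pi * B[:, obs[0]]
for t in range(1, T):
    alpha[t] = (alpha[t-1] @ A) * B[:, obs[t]]

# backward: beta[t, i] = P(x_{t+1}..x_T | y_t = s_i, lambda)
beta = np.ones((T, N))
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t+1]] * beta[t+1])

p_x = alpha[-1].sum()                      # P(x | lambda)
gamma = alpha * beta / p_x                 # gamma[t, i] = P(y_t = s_i | x, lambda)
xi = np.zeros((T - 1, N, N))               # xi[t, i, j] = P(y_t = s_i, y_{t+1} = s_j | x, lambda)
for t in range(T - 1):
    xi[t] = alpha[t][:, None] * A * B[:, obs[t+1]] * beta[t+1] / p_x

pi_new = gamma[0]                                            # pi_i = gamma_1(i)
A_new  = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]    # a_ij = sum_t xi_t(i,j) / sum_t gamma_t(i)
print(pi_new, A_new, sep="\n")
```

A full Baum-Welch run would repeat this E-step together with the M-step updates until the likelihood stops improving, usually working in log space or with scaling to avoid underflow on long sequences.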
Third term:
$$\sum_{\mathbf{y}} \Big(\sum_{t=1}^{T} \log b_{y_t x_t}\Big) P(\mathbf{x},\mathbf{y}\mid\overline{\boldsymbol{\lambda}}) = \sum_{j=1}^{N}\sum_{t=1}^{T} \log b_{j x_t}\,P(\mathbf{x}, y_t = s_j\mid\overline{\boldsymbol{\lambda}}).$$
Similarly, the constraint is $\sum_{k=1}^{M} b_{j o_k} = 1$, where $o_1, \ldots, o_M$ are the observation symbols, so the Lagrangian is
$$\sum_{j=1}^{N}\sum_{t=1}^{T} \log b_{j x_t}\,P(\mathbf{x}, y_t = s_j\mid\overline{\boldsymbol{\lambda}}) + \gamma\Big(\sum_{k=1}^{M} b_{j o_k} - 1\Big).$$
Setting its partial derivative with respect to $b_{j o_k}$ to zero (only the terms with $x_t = o_k$ involve $b_{j o_k}$, hence the indicator) gives
$$\frac{\sum_{t=1}^{T} P(\mathbf{x}, y_t = s_j\mid\overline{\boldsymbol{\lambda}})\,I(x_t = o_k)}{b_{j o_k}} + \gamma = 0, \qquad (3)$$
where $I(\text{true}) = 1$ and $I(\text{false}) = 0$. Multiplying (3) by $b_{j o_k}$, summing over $k$, and using $\sum_{k=1}^{M} b_{j o_k} = 1$,
$$-\gamma = \sum_{k=1}^{M}\sum_{t=1}^{T} P(\mathbf{x}, y_t = s_j\mid\overline{\boldsymbol{\lambda}})\,I(x_t = o_k) = \sum_{t=1}^{T} P(\mathbf{x}, y_t = s_j\mid\overline{\boldsymbol{\lambda}}).$$
Substituting back into (3),
$$b_{j o_k} = \frac{\sum_{t=1}^{T} P(\mathbf{x}, y_t = s_j\mid\overline{\boldsymbol{\lambda}})\,I(x_t = o_k)}{\sum_{t=1}^{T} P(\mathbf{x}, y_t = s_j\mid\overline{\boldsymbol{\lambda}})} = \frac{\sum_{t=1,\,x_t = o_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}.$$
This agrees with the meaning of $b_{j o_k}$: the probability that $x_t = o_k$ given $y_t = s_j$.
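And a minimal sketch of the emission update $b_{j o_k} = \sum_{t:\,x_t = o_k} \gamma_t(j) \,/\, \sum_t \gamma_t(j)$; the `gamma` matrix and observation sequence below are hypothetical stand-ins for the quantities computed in the previous sketch.

```python
# Emission re-estimation b_{j o_k} from gamma and the observation sequence.
import numpy as np

def reestimate_B(gamma, obs, M):
    """b_{j o_k} = sum_{t: x_t = o_k} gamma_t(j) / sum_t gamma_t(j)."""
    T, N = gamma.shape
    B_new = np.zeros((N, M))
    for k in range(M):
        mask = np.array(obs) == k                  # indicator I(x_t = o_k)
        B_new[:, k] = gamma[mask].sum(axis=0)      # numerator: sum over t with x_t = o_k
    return B_new / gamma.sum(axis=0)[:, None]      # denominator: sum over all t

# hypothetical posterior gamma (each row sums to 1) and observations, for illustration only
gamma = np.array([[0.8, 0.2], [0.3, 0.7], [0.4, 0.6]])
obs = [0, 1, 1]
print(reestimate_B(gamma, obs, M=2))               # each row of B_new sums to 1
```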

Prediction Algorithm

The Viterbi Algorithm

The Viterbi algorithm uses dynamic programming to solve the HMM prediction problem, i.e. it finds the maximum-probability path (the optimal path) by dynamic programming. The key property of the optimal path: if the optimal path passes through node $i_t^*$ at time $t$, then its partial path from $i_t^*$ to the terminal node $i_T^*$ must be optimal among all possible partial paths from $i_t^*$ to $i_T^*$. Based on this property, we start at time $t = 1$ and recursively compute, for each state $i$ at time $t$, the maximum probability over the partial paths ending in that state, until at $t = T$ we obtain the maximum probability of the paths ending in each state $i$. The maximum probability at time $t = T$ is the probability $P^*$ of the optimal path, and the terminal node $i_T^*$ of the optimal path is obtained at the same time. Then, starting from the terminal node $i_T^*$, we trace backwards to recover the nodes $i_{T-1}^*, \ldots, i_1^*$, obtaining the optimal path $I^* = (i_1^*, i_2^*, \ldots, i_T^*)$.
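A minimal sketch of the recursion just described (the toy parameters are again hypothetical): `delta[t, i]` holds the maximum probability over partial paths ending in state $i$ at time $t$, and `psi` records the argmax used for the backward trace.

```python
# Viterbi decoding for a hypothetical two-state, two-symbol HMM.
import numpy as np

pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.5, 0.5], [0.1, 0.9]])
obs = [0, 1, 1]
T, N = len(obs), len(pi)

delta = np.zeros((T, N))
psi   = np.zeros((T, N), dtype=int)
delta[0] = pi * B[:, obs[0]]                       # initialization: pi_i * b_{i x_1}
for t in range(1, T):
    trans = delta[t-1][:, None] * A                # trans[i, j] = delta_{t-1}(i) * a_ij
    psi[t]   = trans.argmax(axis=0)                # best predecessor of state j at time t
    delta[t] = trans.max(axis=0) * B[:, obs[t]]    # recursion: max_i(delta_{t-1}(i) a_ij) * b_{j x_t}

P_star = delta[-1].max()                           # probability P* of the optimal path
path = [int(delta[-1].argmax())]                   # terminal node i*_T
for t in range(T - 1, 0, -1):                      # trace back i*_{T-1}, ..., i*_1
    path.append(int(psi[t, path[-1]]))
path.reverse()
print(P_star, path)                                # P* and the optimal state sequence
```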
