[Statistical Learning Methods] Chapter 10: Hidden Markov Models

The hidden Markov model (HMM) is a statistical learning model that can be used for tagging problems. It describes the process by which a hidden Markov chain randomly generates a sequence of observations, and it belongs to the class of generative models.

1. Basic Concepts of the Hidden Markov Model

Definition of the Hidden Markov Model

A hidden Markov model is a probabilistic model of sequences. It describes the process by which a hidden Markov chain randomly generates an unobservable sequence of states, and each state then generates an observation, producing a random sequence of observations. The sequence of states generated by the hidden Markov chain is called the state sequence; the sequence of observations, one generated by each state, is called the observation sequence. Each position in the sequences can be regarded as a time step.

State set: $Q = \left\{ q_1, q_2, \ldots, q_N \right\}$, with $\left| Q \right| = N$.

Observation set: $V = \left\{ v_1, v_2, \ldots, v_M \right\}$, with $\left| V \right| = M$.

State sequence: $I = \left( i_1, i_2, \ldots, i_t, \ldots, i_T \right)$, where $i_t \in Q$ $(t = 1, 2, \ldots, T)$.

Observation sequence: $O = \left( o_1, o_2, \ldots, o_t, \ldots, o_T \right)$, where $o_t \in V$ $(t = 1, 2, \ldots, T)$.

State transition probability matrix: $A = \left[ a_{ij} \right]_{N \times N}$

Here $a_{ij} = P\left( i_{t+1} = q_j \mid i_t = q_i \right)$ is the probability of moving to state $q_j$ at time $t+1$, given state $q_i$ at time $t$ $(i = 1, 2, \ldots, N; \; j = 1, 2, \ldots, N)$.

Observation probability matrix: $B = \left[ b_j\left( k \right) \right]_{N \times M}$

Here $b_j\left( k \right) = P\left( o_t = v_k \mid i_t = q_j \right)$ is the probability of generating observation $v_k$ at time $t$, given state $q_j$ at time $t$ $(k = 1, 2, \ldots, M; \; j = 1, 2, \ldots, N)$.

Initial state probability vector: $\pi = \left( \pi_i \right)$

Here $\pi_i = P\left( i_1 = q_i \right)$ is the probability of being in state $q_i$ at time $t = 1$ $(i = 1, 2, \ldots, N)$.

The hidden Markov model is then written $\lambda = \left( A, B, \pi \right)$.

Basic assumptions of the hidden Markov model:

  1. Homogeneous Markov assumption: the state at any time $t$ depends only on the state at time $t-1$: $P\left( i_t \mid i_{t-1}, o_{t-1}, \ldots, i_1, o_1 \right) = P\left( i_t \mid i_{t-1} \right), \quad t = 1, 2, \ldots, T$
  2. Observation independence assumption: the observation at any time $t$ depends only on the state at time $t$: $P\left( o_t \mid i_T, o_T, i_{T-1}, o_{T-1}, \ldots, i_{t+1}, o_{t+1}, i_t, i_{t-1}, o_{t-1}, \ldots, i_1, o_1 \right) = P\left( o_t \mid i_t \right), \quad t = 1, 2, \ldots, T$
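Together, the two assumptions make the joint probability of a state sequence $I$ and an observation sequence $O$ factor into initial, transition, and emission terms (this factorization is used again in the Baum-Welch derivation below):

$$P\left( O, I \mid \lambda \right) = \pi_{i_1} b_{i_1}\left( o_1 \right) \, a_{i_1 i_2} b_{i_2}\left( o_2 \right) \cdots a_{i_{T-1} i_T} b_{i_T}\left( o_T \right)$$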

Generation Process of an Observation Sequence

Algorithm for generating an observation sequence:

  • Input: hidden Markov model $\lambda = \left( A, B, \pi \right)$; observation sequence length $T$
  • Output: observation sequence $O = \left( o_1, o_2, \ldots, o_t, \ldots, o_T \right)$
  1. Draw the initial state $i_1$ according to the initial state distribution $\pi$.
  2. Set $t = 1$.
  3. Generate $o_t$ according to the observation probability distribution $b_{i_t}\left( k \right)$ of state $i_t$.
  4. Generate the next state $i_{t+1}$ according to the state transition probability distribution $\left\{ a_{i_t i_{t+1}} \right\}$ of state $i_t$ $(i_{t+1} = 1, 2, \ldots, N)$.
  5. Set $t = t + 1$; if $t < T$, return to step 3; otherwise, terminate.
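This process is straightforward to simulate. The sketch below samples a sequence with numpy; the two-state toy parameters and the helper name `generate` are hypothetical, chosen only for illustration (the loop emits $o_t$ before transitioning, so all $T$ observations are produced):

```python
import numpy as np

def generate(A, B, pi, T, rng=None):
    """Sample (state sequence, observation sequence) of length T from an HMM.

    A:  (N, N) state transition matrix, rows sum to 1
    B:  (N, M) observation probability matrix, rows sum to 1
    pi: (N,)   initial state distribution
    """
    rng = rng or np.random.default_rng(0)
    states, obs = [], []
    i = rng.choice(len(pi), p=pi)                   # step 1: draw i_1 from pi
    for _ in range(T):
        obs.append(rng.choice(B.shape[1], p=B[i]))  # step 3: o_t ~ b_{i_t}(.)
        states.append(i)
        i = rng.choice(A.shape[0], p=A[i])          # step 4: i_{t+1} ~ a_{i_t, .}
    return np.array(states), np.array(obs)

# Hypothetical 2-state, 2-symbol model
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
I, O = generate(A, B, pi, T=5)
print("states:", I, "observations:", O)
```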

The Three Basic Problems of Hidden Markov Models

The three basic problems of hidden Markov models are:

  1. Probability computation: given $\lambda = \left( A, B, \pi \right)$ and $O = \left( o_1, o_2, \ldots, o_T \right)$, compute $P\left( O \mid \lambda \right)$.
  2. Learning: given $O = \left( o_1, o_2, \ldots, o_T \right)$, estimate $\lambda^{*} = \arg \max_{\lambda} P\left( O \mid \lambda \right)$.
  3. Prediction (decoding): given $\lambda = \left( A, B, \pi \right)$ and $O = \left( o_1, o_2, \ldots, o_T \right)$, find $I^{*} = \arg \max_{I} P\left( I \mid O, \lambda \right)$.

2. Probability Computation Algorithms

Forward Algorithm

Forward probability: $\alpha_t\left( i \right) = P\left( o_1, o_2, \ldots, o_t, i_t = q_i \mid \lambda \right)$, the probability that, given the model $\lambda$, the partial observation sequence up to time $t$ is $o_1, o_2, \ldots, o_t$ and the state at time $t$ is $q_i$.

The forward probability satisfies a recursion (writing $o_1^t$ for $o_1, \ldots, o_t$ and leaving the conditioning on $\lambda$ implicit):

$$\begin{aligned} \alpha_t\left( i \right) &= P\left( o_1, o_2, \ldots, o_t, i_t = q_i \mid \lambda \right) = P\left( i_t = q_i, o_1^t \right) \\ &= \sum_{j=1}^{N} P\left( i_{t-1} = q_j, i_t = q_i, o_1^{t-1}, o_t \right) \\ &= \sum_{j=1}^{N} P\left( i_t = q_i, o_t \mid i_{t-1} = q_j, o_1^{t-1} \right) \cdot P\left( i_{t-1} = q_j, o_1^{t-1} \right) \\ &= \sum_{j=1}^{N} P\left( i_t = q_i, o_t \mid i_{t-1} = q_j \right) \cdot \alpha_{t-1}\left( j \right) \\ &= \sum_{j=1}^{N} P\left( o_t \mid i_t = q_i, i_{t-1} = q_j \right) \cdot P\left( i_t = q_i \mid i_{t-1} = q_j \right) \cdot \alpha_{t-1}\left( j \right) \\ &= \sum_{j=1}^{N} b_i\left( o_t \right) \cdot a_{ji} \cdot \alpha_{t-1}\left( j \right) \end{aligned}$$

Probability computation: summing out the final state gives

$$P\left( O \mid \lambda \right) = P\left( o_1^T \mid \lambda \right) = \sum_{i=1}^{N} P\left( o_1^T, i_T = q_i \right) = \sum_{i=1}^{N} \alpha_T\left( i \right)$$

The forward algorithm for computing the observation sequence probability:

  • Input: hidden Markov model $\lambda$; observation sequence $O$
  • Output: observation sequence probability $P\left( O \mid \lambda \right)$
  1. Initialization: $\alpha_1\left( i \right) = \pi_i b_i\left( o_1 \right) \quad (i = 1, 2, \ldots, N)$
  2. Recursion: for $t = 1, 2, \ldots, T-1$, $\alpha_{t+1}\left( i \right) = \left[ \sum_{j=1}^{N} \alpha_t\left( j \right) a_{ji} \right] b_i\left( o_{t+1} \right) \quad (i = 1, 2, \ldots, N)$
  3. Termination: $P\left( O \mid \lambda \right) = \sum_{i=1}^{N} \alpha_T\left( i \right)$
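These three steps map directly onto a few lines of numpy. The following is a sketch of one possible implementation (toy parameter values hypothetical, as before):

```python
import numpy as np

def forward(A, B, pi, obs):
    """Forward algorithm: returns (alpha, P(O | lambda)).

    alpha[t, i] = P(o_1, ..., o_{t+1}, i_{t+1} = q_i | lambda)  (0-based t)
    obs is a sequence of observation indices into the columns of B.
    """
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # 1. initialization
    for t in range(T - 1):                            # 2. recursion
        alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]
    return alpha, alpha[-1].sum()                     # 3. termination

# Hypothetical toy model
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
alpha, prob = forward(A, B, pi, obs=[0, 1, 0])
print(prob)
```

Each update costs $O(N^2)$, so the whole pass is $O(N^2 T)$, versus $O(T N^T)$ for direct enumeration of all state sequences.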

Backward Algorithm

Backward probability: $\beta_t\left( i \right) = P\left( o_{t+1}, o_{t+2}, \ldots, o_T \mid i_t = q_i, \lambda \right)$, the probability, given the model $\lambda$ and state $q_i$ at time $t$, of the partial observation sequence $o_{t+1}, o_{t+2}, \ldots, o_T$ from time $t+1$ through $T$.

The backward probability satisfies a recursion (conditioning on $\lambda$ implicit):

$$\begin{aligned} \beta_t\left( i \right) &= P\left( o_{t+1}, o_{t+2}, \ldots, o_T \mid i_t = q_i, \lambda \right) = P\left( o_{t+1}^T \mid i_t = q_i \right) \\ &= \dfrac{P\left( o_{t+1}^T, i_t = q_i \right)}{P\left( i_t = q_i \right)} \\ &= \dfrac{\sum_{j=1}^{N} P\left( o_{t+1}^T, i_t = q_i, i_{t+1} = q_j \right)}{P\left( i_t = q_i \right)} \\ &= \sum_{j=1}^{N} \dfrac{P\left( o_{t+1}^T \mid i_t = q_i, i_{t+1} = q_j \right) \cdot P\left( i_t = q_i, i_{t+1} = q_j \right)}{P\left( i_t = q_i \right)} \\ &= \sum_{j=1}^{N} P\left( o_{t+1}^T \mid i_{t+1} = q_j \right) \cdot \dfrac{P\left( i_{t+1} = q_j \mid i_t = q_i \right) \cdot P\left( i_t = q_i \right)}{P\left( i_t = q_i \right)} \\ &= \sum_{j=1}^{N} P\left( o_{t+2}^T, o_{t+1} \mid i_{t+1} = q_j \right) \cdot a_{ij} \\ &= \sum_{j=1}^{N} P\left( o_{t+2}^T \mid i_{t+1} = q_j \right) \cdot P\left( o_{t+1} \mid i_{t+1} = q_j \right) \cdot a_{ij} \\ &= \sum_{j=1}^{N} \beta_{t+1}\left( j \right) \cdot b_j\left( o_{t+1} \right) \cdot a_{ij} \end{aligned}$$

Probability computation:

$$P\left( O \mid \lambda \right) = P\left( o_1^T \mid \lambda \right) = \sum_{i=1}^{N} P\left( o_1^T, i_1 = q_i \right) = \sum_{i=1}^{N} P\left( i_1 = q_i \right) \cdot P\left( o_1 \mid i_1 = q_i \right) \cdot P\left( o_2^T \mid i_1 = q_i \right) = \sum_{i=1}^{N} \pi_i \, b_i\left( o_1 \right) \beta_1\left( i \right)$$

The backward algorithm for computing the observation sequence probability:

  • Input: hidden Markov model $\lambda$; observation sequence $O$
  • Output: observation sequence probability $P\left( O \mid \lambda \right)$
  1. Initialization: $\beta_T\left( i \right) = 1 \quad (i = 1, 2, \ldots, N)$
  2. Recursion: for $t = T-1, T-2, \ldots, 1$, $\beta_t\left( i \right) = \sum_{j=1}^{N} a_{ij} \, b_j\left( o_{t+1} \right) \beta_{t+1}\left( j \right) \quad (i = 1, 2, \ldots, N)$
  3. Termination: $P\left( O \mid \lambda \right) = \sum_{i=1}^{N} \pi_i \, b_i\left( o_1 \right) \beta_1\left( i \right)$
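The backward pass is symmetric; below is a sketch under the same hypothetical toy model:

```python
import numpy as np

def backward(A, B, pi, obs):
    """Backward algorithm: returns (beta, P(O | lambda)).

    beta[t, i] = P(o_{t+2}, ..., o_T | i_{t+1} = q_i, lambda)  (0-based t)
    """
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))                            # 1. beta_T(i) = 1
    for t in range(T - 2, -1, -1):                    # 2. recursion, t = T-1, ..., 1
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    prob = np.sum(pi * B[:, obs[0]] * beta[0])        # 3. termination
    return beta, prob

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
beta, prob = backward(A, B, pi, obs=[0, 1, 0])
print(prob)  # equals the forward algorithm's result
```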

Representation of $P\left( O \mid \lambda \right)$ in terms of forward and backward probabilities:
$$\begin{aligned} P\left( O \mid \lambda \right) &= P\left( o_1^T \right) = \sum_{i=1}^{N} \sum_{j=1}^{N} P\left( o_1^t, o_{t+1}^T, i_t = q_i, i_{t+1} = q_j \right) \\ &= \sum_{i=1}^{N} \sum_{j=1}^{N} P\left( o_1^t, i_t = q_i, i_{t+1} = q_j \right) P\left( o_{t+1}^T \mid i_{t+1} = q_j \right) \\ &= \sum_{i=1}^{N} \sum_{j=1}^{N} P\left( o_1^t, i_t = q_i \right) P\left( i_{t+1} = q_j \mid i_t = q_i \right) P\left( o_{t+1}^T \mid i_{t+1} = q_j \right) \\ &= \sum_{i=1}^{N} \sum_{j=1}^{N} P\left( o_1^t, i_t = q_i \right) P\left( i_{t+1} = q_j \mid i_t = q_i \right) P\left( o_{t+1} \mid i_{t+1} = q_j \right) P\left( o_{t+2}^T \mid i_{t+1} = q_j \right) \\ &= \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t\left( i \right) a_{ij} \, b_j\left( o_{t+1} \right) \beta_{t+1}\left( j \right), \qquad t = 1, 2, \cdots, T-1 \end{aligned}$$
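A quick numeric sanity check of this identity, reusing the forward and backward recursions inline (hypothetical toy model; the printed value is the same for every $t$ and equals $\sum_i \alpha_T(i)$):

```python
import numpy as np

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
obs = [0, 1, 0, 1]
T, N = len(obs), A.shape[0]

alpha = np.zeros((T, N)); alpha[0] = pi * B[:, obs[0]]
for t in range(T - 1):
    alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]
beta = np.ones((T, N))
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

# Double sum over (i, j) for each t = 1, ..., T-1
for t in range(T - 1):
    p = np.sum(alpha[t][:, None] * A * B[:, obs[t + 1]][None, :] * beta[t + 1][None, :])
    print(t + 1, p)  # constant in t, equal to alpha[-1].sum()
```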

Computation of Some Probabilities and Expectations

Given the model $\lambda$ and observations $O$, the probability of being in state $q_i$ at time $t$:

$$\begin{aligned} \gamma_t\left( i \right) &= P\left( i_t = q_i \mid O, \lambda \right) \\ &= \dfrac{P\left( i_t = q_i, O \mid \lambda \right)}{P\left( O \mid \lambda \right)} \\ &= \dfrac{P\left( i_t = q_i, O \mid \lambda \right)}{\sum_{j=1}^{N} P\left( i_t = q_j, O \mid \lambda \right)} \\ &= \dfrac{P\left( o_1^t, i_t = q_i \right) P\left( o_{t+1}^T \mid i_t = q_i \right)}{\sum_{j=1}^{N} P\left( o_1^t, i_t = q_j \right) P\left( o_{t+1}^T \mid i_t = q_j \right)} \\ &= \dfrac{\alpha_t\left( i \right) \beta_t\left( i \right)}{\sum_{j=1}^{N} \alpha_t\left( j \right) \beta_t\left( j \right)} \end{aligned}$$

Given the model $\lambda$ and observations $O$, the probability of being in state $q_i$ at time $t$ and in state $q_j$ at time $t+1$:

$$\begin{aligned} \xi_t\left( i, j \right) &= P\left( i_t = q_i, i_{t+1} = q_j \mid O, \lambda \right) \\ &= \dfrac{P\left( i_t = q_i, i_{t+1} = q_j, O \mid \lambda \right)}{P\left( O \mid \lambda \right)} \\ &= \dfrac{P\left( i_t = q_i, i_{t+1} = q_j, O \mid \lambda \right)}{\sum_{i=1}^{N} \sum_{j=1}^{N} P\left( i_t = q_i, i_{t+1} = q_j, O \mid \lambda \right)} \\ &= \dfrac{\alpha_t\left( i \right) a_{ij} \, b_j\left( o_{t+1} \right) \beta_{t+1}\left( j \right)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t\left( i \right) a_{ij} \, b_j\left( o_{t+1} \right) \beta_{t+1}\left( j \right)} \end{aligned}$$
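Once `alpha` and `beta` are available (e.g., from the sketches above), both quantities reduce to a few array operations. The helper below is a sketch; it divides by $P(O \mid \lambda) = \sum_i \alpha_T(i)$, using the fact that $\sum_j \alpha_t(j)\, \beta_t(j) = P(O \mid \lambda)$ for every $t$:

```python
import numpy as np

def gamma_xi(A, B, alpha, beta, obs):
    """E-step quantities.

    gamma[t, i] = P(i_{t+1} = q_i | O, lambda)                (0-based t)
    xi[t, i, j] = P(i_{t+1} = q_i, i_{t+2} = q_j | O, lambda)
    """
    prob = alpha[-1].sum()                    # P(O | lambda)
    gamma = alpha * beta / prob               # alpha_t(i) beta_t(i) / P(O | lambda)
    T = len(obs)
    xi = np.zeros((T - 1,) + A.shape)
    for t in range(T - 1):
        # alpha_t(i) a_{ij} b_j(o_{t+1}) beta_{t+1}(j)
        xi[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
    return gamma, xi / prob
```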

Expected number of times state $i$ is visited, given observation $O$: $\sum_{t=1}^{T} \gamma_t\left( i \right) = \sum_{t=1}^{T} P\left( i_t = q_i \mid O, \lambda \right)$

Expected number of transitions out of state $i$, given observation $O$: $\sum_{t=1}^{T-1} \gamma_t\left( i \right) = \sum_{t=1}^{T-1} P\left( i_t = q_i \mid O, \lambda \right)$

Expected number of transitions from state $i$ to state $j$, given observation $O$: $\sum_{t=1}^{T-1} \xi_t\left( i, j \right) = \sum_{t=1}^{T-1} P\left( i_t = q_i, i_{t+1} = q_j \mid O, \lambda \right)$

3. Learning Algorithm

Baum-Welch Algorithm

Treat the observation sequence as observed data $O$ and the state sequence as hidden data $I$; the hidden Markov model is then a probabilistic model with a latent variable:

$$P\left( O \mid \lambda \right) = \sum_{I} P\left( O \mid I, \lambda \right) P\left( I \mid \lambda \right)$$

The complete data are $\left( O, I \right) = \left( o_1, o_2, \cdots, o_T, i_1, i_2, \cdots, i_T \right)$.

The log-likelihood of the complete data is $\log P\left( O, I \mid \lambda \right)$.

The $Q\left( \lambda, \overline{\lambda} \right)$ function:

$$\begin{aligned} Q\left( \lambda, \overline{\lambda} \right) &= E_I\left[ \log P\left( O, I \mid \lambda \right) \mid O, \overline{\lambda} \right] \\ &= \sum_{I} \log P\left( O, I \mid \lambda \right) P\left( I \mid O, \overline{\lambda} \right) \\ &= \sum_{I} \log P\left( O, I \mid \lambda \right) \dfrac{P\left( O, I \mid \overline{\lambda} \right)}{P\left( O \mid \overline{\lambda} \right)} \end{aligned}$$

where $\overline{\lambda}$ is the current estimate of the hidden Markov model parameters and $\lambda$ is the parameter vector to be maximized.

Since $P\left( O \mid \overline{\lambda} \right)$ is a constant factor when maximizing $Q\left( \lambda, \overline{\lambda} \right)$, and

$$P\left( O, I \mid \lambda \right) = \pi_{i_1} b_{i_1}\left( o_1 \right) a_{i_1 i_2} b_{i_2}\left( o_2 \right) \cdots a_{i_{T-1} i_T} b_{i_T}\left( o_T \right),$$

maximizing $Q\left( \lambda, \overline{\lambda} \right)$ over $\lambda$ is equivalent to

$$\lambda = \arg \max_{\lambda} \sum_{I} \log P\left( O, I \mid \lambda \right) P\left( O, I \mid \overline{\lambda} \right)$$

where the objective expands into three terms:

$$\sum_{I} \log \pi_{i_1} P\left( O, I \mid \overline{\lambda} \right) + \sum_{I} \left( \sum_{t=1}^{T-1} \log a_{i_t i_{t+1}} \right) P\left( O, I \mid \overline{\lambda} \right) + \sum_{I} \left( \sum_{t=1}^{T} \log b_{i_t}\left( o_t \right) \right) P\left( O, I \mid \overline{\lambda} \right)$$

Maximize each of the three terms separately:

For the first term:

$$\max \sum_{I} \log \pi_{i_1} P\left( O, I \mid \overline{\lambda} \right) = \max \sum_{i=1}^{N} \log \pi_i \, P\left( O, i_1 = i \mid \overline{\lambda} \right) \quad \text{s.t.} \; \sum_{i=1}^{N} \pi_i = 1$$

  1. Form the Lagrangian, take its partial derivative with respect to $\pi_i$, and set the result to zero:
    $$\dfrac{\partial}{\partial \pi_i} \left[ \sum_{i=1}^{N} \log \pi_i \, P\left( O, i_1 = i \mid \overline{\lambda} \right) + \gamma \left( \sum_{i=1}^{N} \pi_i - 1 \right) \right] = 0$$
    which gives
    $$\begin{aligned} & P\left( O, i_1 = i \mid \overline{\lambda} \right) + \gamma \pi_i = 0 \\ & \sum_{i=1}^{N} \left[ P\left( O, i_1 = i \mid \overline{\lambda} \right) + \gamma \pi_i \right] = 0 \\ & \sum_{i=1}^{N} P\left( O, i_1 = i \mid \overline{\lambda} \right) + \gamma \sum_{i=1}^{N} \pi_i = 0 \\ & P\left( O \mid \overline{\lambda} \right) + \gamma = 0 \\ & \gamma = - P\left( O \mid \overline{\lambda} \right) \end{aligned}$$
    Substituting back into $P\left( O, i_1 = i \mid \overline{\lambda} \right) + \gamma \pi_i = 0$ yields
    $$\pi_i = \dfrac{P\left( O, i_1 = i \mid \overline{\lambda} \right)}{P\left( O \mid \overline{\lambda} \right)} = \gamma_1\left( i \right)$$
  2. The second term, with its own normalization constraint,
    $$\max \sum_{I} \left( \sum_{t=1}^{T-1} \log a_{i_t i_{t+1}} \right) P\left( O, I \mid \overline{\lambda} \right) = \max \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{t=1}^{T-1} \log a_{ij} \, P\left( O, i_t = i, i_{t+1} = j \mid \overline{\lambda} \right) \quad \text{s.t.} \; \sum_{j=1}^{N} a_{ij} = 1,$$
    is maximized by the same Lagrangian argument, giving
    $$a_{ij} = \dfrac{\sum_{t=1}^{T-1} P\left( O, i_t = i, i_{t+1} = j \mid \overline{\lambda} \right)}{\sum_{t=1}^{T-1} P\left( O, i_t = i \mid \overline{\lambda} \right)} = \dfrac{\sum_{t=1}^{T-1} \xi_t\left( i, j \right)}{\sum_{t=1}^{T-1} \gamma_t\left( i \right)}$$
  3. Likewise for the third term,
    $$\max \sum_{I} \left( \sum_{t=1}^{T} \log b_{i_t}\left( o_t \right) \right) P\left( O, I \mid \overline{\lambda} \right) = \max \sum_{j=1}^{N} \sum_{t=1}^{T} \log b_j\left( o_t \right) P\left( O, i_t = j \mid \overline{\lambda} \right) \quad \text{s.t.} \; \sum_{k=1}^{M} b_j\left( k \right) = 1,$$
    giving (with the indicator $I\left( o_t = v_k \right)$ selecting the times at which $v_k$ is observed)
    $$b_j\left( k \right) = \dfrac{\sum_{t=1}^{T} P\left( O, i_t = j \mid \overline{\lambda} \right) I\left( o_t = v_k \right)}{\sum_{t=1}^{T} P\left( O, i_t = j \mid \overline{\lambda} \right)} = \dfrac{\sum_{t=1, o_t = v_k}^{T} \gamma_t\left( j \right)}{\sum_{t=1}^{T} \gamma_t\left( j \right)}$$

The Baum-Welch algorithm:

  • Input: observed data $O = \left( o_1, o_2, \cdots, o_T \right)$
  • Output: hidden Markov model parameters
  1. Initialization
    For $n = 0$, choose initial values $a_{ij}^{\left( 0 \right)}, b_j\left( k \right)^{\left( 0 \right)}, \pi_i^{\left( 0 \right)}$, giving the model $\lambda^{\left( 0 \right)} = \left( a_{ij}^{\left( 0 \right)}, b_j\left( k \right)^{\left( 0 \right)}, \pi_i^{\left( 0 \right)} \right)$.
  2. Recursion
    For $n = 1, 2, \cdots$:
    $$\begin{aligned} & a_{ij}^{\left( n+1 \right)} = \dfrac{\sum_{t=1}^{T-1} \xi_t\left( i, j \right)}{\sum_{t=1}^{T-1} \gamma_t\left( i \right)} \\ & b_j\left( k \right)^{\left( n+1 \right)} = \dfrac{\sum_{t=1, o_t = v_k}^{T} \gamma_t\left( j \right)}{\sum_{t=1}^{T} \gamma_t\left( j \right)} \\ & \pi_i^{\left( n+1 \right)} = \dfrac{P\left( O, i_1 = i \mid \overline{\lambda} \right)}{P\left( O \mid \overline{\lambda} \right)} \end{aligned}$$
    where the quantities on the right-hand side are computed from the observed data $O = \left( o_1, o_2, \cdots, o_T \right)$ and the current model $\lambda^{\left( n \right)} = \left( A^{\left( n \right)}, B^{\left( n \right)}, \pi^{\left( n \right)} \right)$.
  3. Termination
    Return the model $\lambda^{\left( n+1 \right)} = \left( A^{\left( n+1 \right)}, B^{\left( n+1 \right)}, \pi^{\left( n+1 \right)} \right)$.
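Putting the E-step ($\gamma_t(i)$, $\xi_t(i, j)$) and the M-step (the three re-estimation formulas) together gives a compact sketch of one possible implementation. The random initialization, iteration count, and toy sequence are assumptions for illustration; a practical version would rescale $\alpha$ and $\beta$ (or work in log space) to avoid underflow on long sequences:

```python
import numpy as np

def forward_backward(A, B, pi, obs):
    """Forward and backward passes, as in the algorithms of Section 2."""
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N)); alpha[0] = pi * B[:, obs[0]]
    for t in range(T - 1):
        alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return alpha, beta

def baum_welch(obs, N, M, n_iter=100, seed=0):
    """EM re-estimation of (A, B, pi) from a single observation sequence."""
    rng = np.random.default_rng(seed)
    A = rng.dirichlet(np.ones(N), size=N)     # random row-stochastic initialization
    B = rng.dirichlet(np.ones(M), size=N)
    pi = rng.dirichlet(np.ones(N))
    obs = np.asarray(obs)
    T = len(obs)
    for _ in range(n_iter):
        alpha, beta = forward_backward(A, B, pi, obs)
        prob = alpha[-1].sum()
        # E-step: gamma_t(i) and xi_t(i, j)
        gamma = alpha * beta / prob
        xi = np.array([alpha[t][:, None] * A
                       * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
                       for t in range(T - 1)]) / prob
        # M-step: the three re-estimation formulas
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        B = np.stack([gamma[obs == k].sum(axis=0) for k in range(M)],
                     axis=1) / gamma.sum(axis=0)[:, None]
    return A, B, pi

# Hypothetical toy run on a short synthetic sequence
A, B, pi = baum_welch([0, 1, 0, 0, 1, 1, 0, 1, 0, 0], N=2, M=2)
print(A, B, pi, sep="\n")
```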
4. Prediction Algorithm

Viterbi Algorithm

The Viterbi algorithm solves the prediction problem by dynamic programming. Define $\delta_t\left( i \right)$ as the maximum probability over all single paths $\left( i_1, i_2, \cdots, i_{t-1}, i \right)$ that end in state $i$ at time $t$:

$$\delta_t\left( i \right) = \max_{i_1, i_2, \cdots, i_{t-1}} P\left( i_t = i, i_{t-1}, \cdots, i_1, o_t, \cdots, o_1 \mid \lambda \right), \qquad i = 1, 2, \cdots, N$$

From the definition, the recurrence follows:

$$\begin{aligned} \delta_{t+1}\left( i \right) &= \max_{i_1, i_2, \cdots, i_t} P\left( i_{t+1} = i, i_t, \cdots, i_1, o_{t+1}, \cdots, o_1 \mid \lambda \right) \\ &= \max_{1 \leq j \leq N} \left[ \max_{i_1, i_2, \cdots, i_{t-1}} P\left( i_{t+1} = i, i_t = j, i_{t-1}, \cdots, i_1, o_{t+1}, o_t, \cdots, o_1 \mid \lambda \right) \right] \\ &= \max_{1 \leq j \leq N} \left[ \max_{i_1, i_2, \cdots, i_{t-1}} P\left( i_{t+1} = i, i_t = j, i_{t-1}, \cdots, i_1, o_t, o_{t-1}, \cdots, o_1 \mid \lambda \right) P\left( o_{t+1} \mid i_{t+1} = i, \lambda \right) \right] \\ &= \max_{1 \leq j \leq N} \left[ \max_{i_1, i_2, \cdots, i_{t-1}} P\left( i_t = j, i_{t-1}, \cdots, i_1, o_t, o_{t-1}, \cdots, o_1 \mid \lambda \right) P\left( i_{t+1} = i \mid i_t = j, \lambda \right) P\left( o_{t+1} \mid i_{t+1} = i, \lambda \right) \right] \\ &= \max_{1 \leq j \leq N} \left[ \delta_t\left( j \right) a_{ji} \right] b_i\left( o_{t+1} \right), \qquad i = 1, 2, \cdots, N \end{aligned}$$

Define $\psi_t\left( i \right)$ as the $(t-1)$-th node of the maximum-probability single path $\left( i_1, i_2, \cdots, i_t \right)$ ending in state $i$ at time $t$:

$$\psi_t\left( i \right) = \arg \max_{1 \leq j \leq N} \left[ \delta_{t-1}\left( j \right) a_{ji} \right], \qquad i = 1, 2, \cdots, N$$

The Viterbi algorithm:

  • Input: model $\lambda = \left( A, B, \pi \right)$ and observed data $O = \left( o_1, o_2, \cdots, o_T \right)$
  • Output: optimal path $I^{*} = \left( i_1^{*}, i_2^{*}, \cdots, i_T^{*} \right)$
  1. Initialization
    $$\delta_1\left( i \right) = \pi_i b_i\left( o_1 \right), \quad \psi_1\left( i \right) = 0, \qquad i = 1, 2, \cdots, N$$
  2. Recursion
    For $t = 2, 3, \cdots, T$:
    $$\begin{aligned} & \delta_t\left( i \right) = \max_{1 \leq j \leq N} \left[ \delta_{t-1}\left( j \right) a_{ji} \right] b_i\left( o_t \right), \qquad i = 1, 2, \cdots, N \\ & \psi_t\left( i \right) = \arg \max_{1 \leq j \leq N} \left[ \delta_{t-1}\left( j \right) a_{ji} \right], \qquad i = 1, 2, \cdots, N \end{aligned}$$
  3. Termination
    $$P^{*} = \max_{1 \leq i \leq N} \delta_T\left( i \right), \qquad i_T^{*} = \arg \max_{1 \leq i \leq N} \delta_T\left( i \right)$$
  4. Optimal path backtracking
    For $t = T-1, T-2, \cdots, 1$: $i_t^{*} = \psi_{t+1}\left( i_{t+1}^{*} \right)$.
    This yields the optimal path $I^{*} = \left( i_1^{*}, i_2^{*}, \cdots, i_T^{*} \right)$.
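A numpy sketch of the four steps (same hypothetical toy model as in the earlier snippets):

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Viterbi algorithm: most probable state path and its probability P*."""
    T, N = len(obs), A.shape[0]
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]                   # 1. initialization
    for t in range(1, T):                          # 2. recursion
        trans = delta[t - 1][:, None] * A          # delta_{t-1}(j) * a_{ji}
        psi[t] = trans.argmax(axis=0)
        delta[t] = trans.max(axis=0) * B[:, obs[t]]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()                  # 3. termination
    for t in range(T - 2, -1, -1):                 # 4. backtracking
        path[t] = psi[t + 1][path[t + 1]]
    return path, delta[-1].max()

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
path, p_star = viterbi(A, B, pi, obs=[0, 1, 0])
print("optimal path:", path, " P* =", p_star)
```

In practice the recursion is usually run on $\log \delta$ so that long sequences do not underflow.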

5. Summary

1. A hidden Markov model is a probabilistic model of sequences. It describes the process by which a hidden Markov chain randomly generates an unobservable sequence of states, and each state in turn randomly generates an observation, producing the observation sequence.

A hidden Markov model is determined by its initial state probability vector $\pi$, state transition probability matrix $A$, and observation probability matrix $B$. It can therefore be written as $\lambda = (A, B, \pi)$.

The hidden Markov model is a generative model: it represents the joint distribution of the state sequence and the observation sequence, but the state sequence is hidden and unobservable.

Hidden Markov models can be used for tagging, in which case the states correspond to tags. The tagging problem is to predict the tag sequence corresponding to a given observation sequence.

2. The probability computation problem. Given a model $\lambda = (A, B, \pi)$ and an observation sequence $O = (o_1, o_2, \ldots, o_T)$, compute the probability $P(O \mid \lambda)$ of the observation sequence under the model. The forward-backward algorithm computes forward and backward probabilities recursively and thereby evaluates this probability efficiently.

3. The learning problem. Given an observation sequence $O = (o_1, o_2, \ldots, o_T)$, estimate the parameters of the model $\lambda = (A, B, \pi)$ so as to maximize the observation sequence probability $P(O \mid \lambda)$, i.e., estimate the parameters by maximum likelihood. The Baum-Welch algorithm, an instance of the EM algorithm, trains hidden Markov models efficiently; it is an unsupervised learning algorithm.

4. The prediction problem. Given a model $\lambda = (A, B, \pi)$ and an observation sequence $O = (o_1, o_2, \ldots, o_T)$, find the state sequence $I = (i_1, i_2, \ldots, i_T)$ that maximizes the conditional probability $P(I \mid O)$. The Viterbi algorithm applies dynamic programming to find this optimal path, i.e., the state sequence of maximum probability, efficiently.
