Hidden Markov Models (HMM): Computing the Probability of an Observation Sequence

Introduction

Given a model $\lambda = (A, B, \pi)$, computing the probability $P(O \mid \lambda)$ of an observation sequence $O = (o_1, o_2, \cdots, o_T)$ is one of the basic problems that hidden Markov models (HMMs) can solve.

1. Direct Computation

The most intuitive way to solve this problem is to compute $P(O \mid \lambda)$ directly from the probability formulas: first enumerate every possible state sequence $S = (s_1, s_2, \cdots, s_T)$, then compute the joint probability $P(O, S \mid \lambda)$ of each state sequence occurring together with the observation sequence, and finally sum over all of them.

For a sequence of length $T$, there are $\prod_{1}^{T} C_{N}^{1} = N^T$ possible state sequences. By the observation-independence assumption, the probability that a given state sequence produces the observation sequence is $\prod_{t=1}^{T} P(o_t \mid s_t)$; multiplying in the probability of the state sequence itself, $\pi_{s_1} \prod_{t=2}^{T} a_{s_{t-1} s_t}$, gives $P(O, S \mid \lambda)$, which costs $O(T)$ multiplications per sequence. The total time complexity is therefore $O(TN^T)$, which is unacceptable in engineering practice.
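As a sanity check, the brute-force enumeration can be sketched in a few lines of Python. The 2-state, 2-symbol model below is a hypothetical example chosen for illustration, not taken from the text:

```python
import itertools
import numpy as np

# Hypothetical example model (assumed values for illustration)
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])   # A[i, j] = P(s_{t+1} = q_j | s_t = q_i)
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # B[i, k] = P(o_t = k | s_t = q_i)
pi = np.array([0.5, 0.5])    # initial state distribution

O = [0, 1, 0]                # observation sequence
N, T = len(pi), len(O)

# Sum P(O, S | lambda) over all N**T possible state sequences
p_direct = 0.0
for S in itertools.product(range(N), repeat=T):
    p = pi[S[0]] * B[S[0], O[0]]
    for t in range(1, T):
        p *= A[S[t - 1], S[t]] * B[S[t], O[t]]
    p_direct += p
```

The loop visits only $N^T = 8$ sequences here, but at $T = 100$ with $N = 4$ it would already need $4^{100}$ iterations, which is why the dynamic-programming algorithms below are used instead.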

This motivated two improved schemes based on dynamic programming: the forward algorithm and the backward algorithm, both of which run in $O(N^2 T)$ time.

2. Forward Algorithm

Given an HMM $\lambda$, define the forward probability as the probability of being in state $q_i$ at time $t$ and having observed the partial sequence $(o_1, \cdots, o_t)$ up to time $t$, written:

$$\alpha_t(q_i) = P(s_t = q_i, o_1, \cdots, o_t \mid \lambda) \tag{2.1}$$
By the definition of conditional probability, $P(s_t, o_1, \cdots, o_t \mid \lambda) = P(s_t, o_1, \cdots, o_t, \lambda)/P(\lambda)$. Since the model parameters are given and fixed, $\lambda$ can be treated as a certain event with $P(\lambda) = 1$, so Eq. (2.1) may be abbreviated as:

$$\alpha_t(q_i) = P(s_t = q_i, o_1, \cdots, o_t) \tag{2.2}$$
The core of dynamic programming is the state-transition (recurrence) equation. The recurrence of the forward algorithm must take the form "the probability at time $t-1$, multiplied by some factor, equals the probability at time $t$", which we may write as:

$$P(s_t, o_1, \cdots, o_t) = \mathrm{Fun}(\cdot)\, P(s_{t-1}, o_1, \cdots, o_{t-1}) \tag{2.3}$$

From Eq. (2.3), the probabilities at consecutive times differ only by one state variable $s_t$ and one observed (hence constant) term $o_t$. Using the marginalization property, we can rewrite the probability at time $t$, $P(s_t, o_1, \cdots, o_t)$, as:

$$P(s_t, o_1, \cdots, o_t) = \sum_{j=1}^{N} P(s_{t-1} = q_j, s_t, o_1, \cdots, o_{t-1}, o_t) \tag{2.4}$$

By the chain rule of probability (a generalization of the conditional-probability formula), a joint probability can be factored into a product of conditional probabilities, so the joint probability inside Eq. (2.4) decomposes as:
$$P(s_{t-1} = q_j, s_t, o_1, \cdots, o_{t-1}, o_t) = P(s_{t-1} = q_j, o_1, \cdots, o_{t-1})\, P(s_t \mid s_{t-1} = q_j, o_1, \cdots, o_{t-1})\, P(o_t \mid s_{t-1} = q_j, s_t, o_1, \cdots, o_{t-1})$$
By the homogeneous Markov assumption of the HMM, $P(s_t \mid s_{t-1} = q_j, o_1, \cdots, o_{t-1}) = P(s_t \mid s_{t-1} = q_j)$; and by the observation-independence assumption, $P(o_t \mid s_{t-1} = q_j, s_t, o_1, \cdots, o_{t-1}) = P(o_t \mid s_t)$. Hence Eq. (2.4) simplifies to:

$$P(s_t, o_1, \cdots, o_t) = \sum_{j=1}^{N} P(s_{t-1} = q_j, o_1, \cdots, o_{t-1})\, P(s_t \mid s_{t-1} = q_j)\, P(o_t \mid s_t) \tag{2.5}$$

Substituting the model parameters, the forward probability recurrence becomes:

$$\alpha_t(q_i) = \left[ \sum_{j=1}^{N} \alpha_{t-1}(q_j)\, a_{ji} \right] b_i(o_t), \qquad t = 2, 3, \cdots, T \tag{2.6}$$

Therefore, given an HMM $\lambda$, the probability of the observation sequence is:

$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(q_i), \qquad \alpha_1(q_i) = \pi_i b_i(o_1) \tag{2.7}$$
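Equations (2.6) and (2.7) translate directly into code. A minimal sketch follows, again using a hypothetical 2-state model; the function name `forward` and all parameter values are assumptions for illustration:

```python
import numpy as np

def forward(O, A, B, pi):
    """Return P(O|lambda) and the alpha table, per Eqs. (2.6)-(2.7)."""
    N, T = len(pi), len(O)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]               # alpha_1(q_i) = pi_i b_i(o_1)
    for t in range(1, T):
        # alpha_t(q_i) = [sum_j alpha_{t-1}(q_j) a_{ji}] * b_i(o_t)
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    return alpha[-1].sum(), alpha            # P(O|lambda) = sum_i alpha_T(q_i)

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
prob, alpha = forward([0, 1, 0], A, B, pi)
```

Each time step costs $O(N^2)$ operations, giving $O(N^2 T)$ overall instead of $O(TN^T)$.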

3. Backward Algorithm

Given an HMM $\lambda$, define the backward probability as the probability of the partial observation sequence $(o_{t+1}, \cdots, o_T)$ from time $t+1$ to $T$, conditioned on being in state $q_i$ at time $t$, written:

$$\beta_t(q_i) = P(o_{t+1}, \cdots, o_T \mid s_t = q_i, \lambda) \tag{3.1}$$

By the same reasoning as in the forward algorithm (dropping the fixed $\lambda$, positing the recurrence, marginalizing over $s_{t+1}$, and applying the two HMM assumptions), the derivation of the backward recurrence is:

$$\beta_t(q_i) = P(o_{t+1}, \cdots, o_T \mid s_t = q_i) \tag{3.2}$$

$$P(o_{t+1}, \cdots, o_T \mid s_t = q_i) = \mathrm{Fun}(\cdot)\, P(o_{t+2}, \cdots, o_T \mid s_{t+1} = q_j) \tag{3.3}$$

$$P(o_{t+1}, \cdots, o_T \mid s_t = q_i) = \sum_{j=1}^{N} P(s_{t+1} = q_j, o_{t+1}, o_{t+2}, \cdots, o_T \mid s_t = q_i) \tag{3.4}$$

$$\begin{cases} P(s_{t+1}, o_{t+1}, o_{t+2}, \cdots, o_T \mid s_t) = P(o_{t+2}, \cdots, o_T \mid s_t, s_{t+1}, o_{t+1})\, P(o_{t+1} \mid s_t, s_{t+1})\, P(s_{t+1} \mid s_t) \\ P(o_{t+2}, \cdots, o_T \mid s_t, s_{t+1}, o_{t+1}) = P(o_{t+2}, \cdots, o_T \mid s_{t+1}) \\ P(o_{t+1} \mid s_t, s_{t+1}) = P(o_{t+1} \mid s_{t+1}) \end{cases} \tag{3.5}$$

$$\beta_t(q_i) = \sum_{j=1}^{N} P(o_{t+2}, \cdots, o_T \mid s_{t+1} = q_j)\, P(o_{t+1} \mid s_{t+1} = q_j)\, P(s_{t+1} = q_j \mid s_t = q_i) \tag{3.6}$$
$$\beta_t(q_i) = \sum_{j=1}^{N} \beta_{t+1}(q_j)\, b_j(o_{t+1})\, a_{ij} \tag{3.7}$$
$$P(O \mid \lambda) = \sum_{i=1}^{N} \pi_i b_i(o_1) \beta_1(q_i), \qquad \beta_T(q_i) = 1 \tag{3.8}$$
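A matching sketch of the backward recursion (3.7)-(3.8), under the same hypothetical example model; by the theory it must return the same $P(O \mid \lambda)$ as the forward pass:

```python
import numpy as np

def backward(O, A, B, pi):
    """Return P(O|lambda) and the beta table, per Eqs. (3.7)-(3.8)."""
    N, T = len(pi), len(O)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                            # beta_T(q_i) = 1
    for t in range(T - 2, -1, -1):
        # beta_t(q_i) = sum_j a_{ij} b_j(o_{t+1}) beta_{t+1}(q_j)
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    prob = (pi * B[:, O[0]] * beta[0]).sum()  # Eq. (3.8)
    return prob, beta

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
prob, beta = backward([0, 1, 0], A, B, pi)
```

The recursion runs from $t = T-1$ down to $t = 1$, so, like the forward pass, it costs $O(N^2 T)$ in total.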

4. State Probabilities at Intermediate Times

From the forward and backward algorithms, we can derive the probability $\gamma_t(q_i)$ of being in state $q_i$ at some intermediate time $t$, and the probability $\xi_t(q_i, q_j)$ of being in states $q_i$ and $q_j$ at two adjacent times. Together these computations are known as the forward-backward algorithm.

4.1 Probability of the State at a Single Time

Define $\gamma_t(q_i)$ as the probability of being in state $q_i$ at time $t$ given the full observation sequence:

$$\gamma_t(q_i) = P(s_t = q_i \mid O, \lambda) \tag{4.1.1}$$

$$\gamma_t(q_i) = \frac{P(s_t = q_i, O \mid \lambda)}{P(O \mid \lambda)} \tag{4.1.2}$$

$$\gamma_t(q_i) = \frac{P(o_{t+1}, \cdots, o_T \mid s_t = q_i, o_1, \cdots, o_t)\, P(s_t = q_i, o_1, \cdots, o_t)}{\sum_{i=1}^{N} P(s_t = q_i, O)} \tag{4.1.3}$$

$$\because\ P(o_{t+1}, \cdots, o_T \mid s_t = q_i, o_1, \cdots, o_t) = P(o_{t+1}, \cdots, o_T \mid s_t = q_i) \tag{4.1.4}$$

$$\therefore\ \gamma_t(q_i) = \frac{P(o_{t+1}, \cdots, o_T \mid s_t = q_i)\, P(s_t = q_i, o_1, \cdots, o_t)}{\sum_{i=1}^{N} P(s_t = q_i, O)} \tag{4.1.5}$$

$$\gamma_t(q_i) = \frac{P(o_{t+1}, \cdots, o_T \mid s_t = q_i)\, P(s_t = q_i, o_1, \cdots, o_t)}{\sum_{i=1}^{N} P(o_{t+1}, \cdots, o_T \mid s_t = q_i)\, P(s_t = q_i, o_1, \cdots, o_t)} \tag{4.1.6}$$

$$\gamma_t(q_i) = \frac{\alpha_t(q_i)\, \beta_t(q_i)}{\sum_{i=1}^{N} \alpha_t(q_i)\, \beta_t(q_i)} \tag{4.1.7}$$
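With the $\alpha$ and $\beta$ tables in hand, Eq. (4.1.7) is a single vectorized expression. A sketch under the same hypothetical model (the tables are rebuilt inline so the snippet is self-contained):

```python
import numpy as np

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
O = [0, 1, 0]
N, T = len(pi), len(O)

# Forward table (Eq. 2.6) and backward table (Eq. 3.7)
alpha = np.zeros((T, N))
alpha[0] = pi * B[:, O[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
beta = np.ones((T, N))
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])

# Eq. (4.1.7): gamma_t(q_i) = alpha_t(q_i) beta_t(q_i) / sum_i alpha_t(q_i) beta_t(q_i)
gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)
```

Each row of `gamma` is a distribution over the $N$ states at that time, so every row sums to 1; note that each denominator $\sum_i \alpha_t(q_i)\beta_t(q_i)$ equals $P(O \mid \lambda)$ for every $t$.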

4.2 Probability of the States at Two Adjacent Times

Define $\xi_t(q_i, q_j)$ as the probability of being in state $q_i$ at time $t$ and state $q_j$ at time $t+1$ given the full observation sequence:

$$\xi_t(q_i, q_j) = P(s_t = q_i, s_{t+1} = q_j \mid O, \lambda) \tag{4.2.1}$$

$$\xi_t(q_i, q_j) = \frac{P(s_t = q_i, s_{t+1} = q_j, O \mid \lambda)}{P(O \mid \lambda)} \tag{4.2.2}$$

$$\xi_t(q_i, q_j) = \frac{P(s_t = q_i, s_{t+1} = q_j, O)}{\sum_{i=1}^{N} \sum_{j=1}^{N} P(s_t = q_i, s_{t+1} = q_j, O)} \tag{4.2.3}$$

$$\because\ \begin{cases} P(s_t = q_i, s_{t+1} = q_j, O) = P(s_t = q_i, o_1, \cdots, o_t)\, P(s_{t+1} = q_j \mid s_t = q_i)\, P(o_{t+1} \mid s_{t+1} = q_j)\, P(o_{t+2}, \cdots, o_T \mid s_{t+1} = q_j) \\ P(s_t = q_i, o_1, \cdots, o_t) = \alpha_t(q_i) \\ P(s_{t+1} = q_j \mid s_t = q_i) = a_{ij} \\ P(o_{t+1} \mid s_{t+1} = q_j) = b_j(o_{t+1}) \\ P(o_{t+2}, \cdots, o_T \mid s_{t+1} = q_j) = \beta_{t+1}(q_j) \end{cases} \tag{4.2.4}$$

$$\therefore\ \xi_t(q_i, q_j) = \frac{\alpha_t(q_i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(q_j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(q_i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(q_j)} \tag{4.2.5}$$
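Eq. (4.2.5) in code, reusing the same hypothetical model (the $\alpha$/$\beta$ tables are rebuilt here so the sketch is self-contained):

```python
import numpy as np

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
O = [0, 1, 0]
N, T = len(pi), len(O)

# Forward and backward tables (Eqs. 2.6 and 3.7)
alpha = np.zeros((T, N))
alpha[0] = pi * B[:, O[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
beta = np.ones((T, N))
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])

# Eq. (4.2.5): xi_t(q_i, q_j) proportional to alpha_t(q_i) a_ij b_j(o_{t+1}) beta_{t+1}(q_j)
xi = np.zeros((T - 1, N, N))
for t in range(T - 1):
    num = alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])[None, :]
    xi[t] = num / num.sum()
```

Summing $\xi_t$ over $q_j$ recovers $\gamma_t$, which is a useful consistency check; these two quantities are also the basis of the Baum-Welch re-estimation formulas.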
