The Probability Computation Problem of HMMs

The probability computation problem of an HMM is: given the model parameters $\lambda = (A, B, \pi)$ and an observation sequence $O = (o_1, o_2, \dots, o_T)$, compute the probability of observing $O$ under the model $\lambda$, i.e. $P(O \mid \lambda)$.


Direct Computation

Computing directly from the probability formulas, marginalizing over the hidden state sequence $I$ by the law of total probability and the product rule:

$$P(O \mid \lambda) = \sum_{I} P(O, I \mid \lambda) = \sum_{I} P(O \mid I, \lambda) P(I \mid \lambda)$$

  • Here $P(O \mid I, \lambda)$ accounts for the emissions $i_t \to o_t$ and is read off the emission probability matrix $[b_j(k)]_{N \times M}$:

$$P(O \mid I, \lambda) = P(o_1 \mid i_1) \cdots P(o_t \mid i_t) \cdots P(o_T \mid i_T) = b_{i_1}(o_1) \cdots b_{i_t}(o_t) \cdots b_{i_T}(o_T)$$

with $T$ factors in total.

  • $P(I \mid \lambda)$ accounts for the transitions $i_{t-1} \to i_t$ and is read off the transition probability matrix $[a_{ij}]_{N \times N}$ and the initial state probability vector $\pi$:

$$P(I \mid \lambda) = \pi_{i_1} P(i_2 \mid i_1) \cdots P(i_t \mid i_{t-1}) \cdots P(i_T \mid i_{T-1}) = \pi_{i_1} a_{i_1 i_2} \cdots a_{i_{t-1} i_t} \cdots a_{i_{T-1} i_T}$$

with $T$ factors in total.

Substituting the two expressions:

$$P(O \mid \lambda) = \sum_{I} P(O, I \mid \lambda)$$

$$= \sum_{I} P(O \mid I, \lambda) P(I \mid \lambda)$$

$$= \sum_{I} \left[ b_{i_1}(o_1) \cdots b_{i_t}(o_t) \cdots b_{i_T}(o_T) \right] \times \left[ \pi_{i_1} a_{i_1 i_2} \cdots a_{i_{t-1} i_t} \cdots a_{i_{T-1} i_T} \right]$$

$$= \sum_{I} \pi_{i_1} \prod_{t=1}^{T} b_{i_t}(o_t) \prod_{t=1}^{T-1} a_{i_t i_{t+1}}$$

Since $\sum_{I} = \sum_{i_1} \cdots \sum_{i_t} \cdots \sum_{i_T}$ and each $i_t$ can take $N$ values, the sum $\sum_{I}$ has $N^T$ terms, so computing $P(O \mid \lambda)$ directly from this formula is prohibitively expensive.
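The direct computation can be sketched by enumerating all $N^T$ hidden-state paths. The toy parameters below ($N = 2$ states, $M = 2$ symbols; the matrices `A`, `B`, the vector `pi`, and the sequence `O`) are hypothetical values chosen only for illustration:

```python
import itertools

# Hypothetical toy HMM: N = 2 hidden states, M = 2 observation symbols.
A  = [[0.7, 0.3], [0.4, 0.6]]   # transition matrix [a_ij]
B  = [[0.9, 0.1], [0.2, 0.8]]   # emission matrix [b_j(k)]
pi = [0.5, 0.5]                 # initial state distribution
O  = [0, 1, 0]                  # observation sequence (symbol indices)

def brute_force_prob(O, A, B, pi):
    """P(O | lambda) by summing the joint probability over all N^T state paths."""
    N, T = len(pi), len(O)
    total = 0.0
    for path in itertools.product(range(N), repeat=T):       # N^T paths
        p = pi[path[0]] * B[path[0]][O[0]]                   # pi_{i_1} * b_{i_1}(o_1)
        for t in range(1, T):
            p *= A[path[t - 1]][path[t]] * B[path[t]][O[t]]  # a * b factors
        total += p
    return total

print(brute_force_prob(O, A, B, pi))
```

With $T = 3$ this loop visits only $2^3 = 8$ paths, but the path count grows exponentially in $T$, which is exactly why the forward and backward recursions below matter.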


Forward Algorithm

Find the recursion satisfied by the forward probabilities, moving through time $1 \to \dots \to t \to \dots \to T$:

Forward probability

At the observation times $1, \dots, t, \dots, T$, the observed values are $o_1, \dots, o_t, \dots, o_T$ and the corresponding hidden states are $i_1, \dots, i_t, \dots, i_T$:

$$i_1 \to \dots \to i_t \to \dots \to i_T \qquad o_1 \to \dots \to o_t \to \dots \to o_T$$

Define the forward probability $\alpha_t(i) = P(o_1, \dots, o_t, i_t = q_i \mid \lambda)$.

It is the probability that, up to time $t$, the observations are $o_1, o_2, \dots, o_t$ and the state at time $t$ is $q_i$.

Deriving the recursion

From the definition, write out the forward probabilities at $t = 1$ and $t = 2$:

  • $\alpha_1(i) = P(o_1, i_1 = q_i \mid \lambda) = P(o_1 \mid i_1 = q_i, \lambda) P(i_1 = q_i \mid \lambda) = b_i(o_1) \, \pi_i$

  • $\alpha_2(j) = P(o_1, o_2, i_2 = q_j \mid \lambda)$
    $= \sum_{i=1}^N P(o_1, o_2, i_1 = q_i, i_2 = q_j \mid \lambda)$
    $= \sum_{i=1}^N P(o_2 \mid i_2 = q_j, \lambda) P(i_2 = q_j \mid i_1 = q_i, \lambda) P(o_1 \mid i_1 = q_i, \lambda) P(i_1 = q_i \mid \lambda)$
    $= \sum_{i=1}^N b_j(o_2) \, a_{ij} \, \alpha_1(i)$
    $= b_j(o_2) \sum_{i=1}^N a_{ij} \, \alpha_1(i)$

$\dots$

Continuing the recursion gives the relation between $\alpha_{t+1}(j)$ and $\alpha_t(i)$:

$$\alpha_{t+1}(j) = b_j(o_{t+1}) \sum_{i=1}^N a_{ij} \, \alpha_t(i)$$

where $j \in \{1, 2, \dots, N\}$.
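The $t = 2$ step of this derivation can be checked numerically: a minimal sketch (hypothetical toy parameters, not from the text) computes $\alpha_2(j)$ both from the recursion and directly from the joint-probability definition, and the two agree:

```python
# Hypothetical toy HMM used only to check the recursion at t = 2.
A  = [[0.7, 0.3], [0.4, 0.6]]   # [a_ij]
B  = [[0.9, 0.1], [0.2, 0.8]]   # [b_j(k)]
pi = [0.5, 0.5]
O  = [0, 1, 0]
N  = len(pi)

# Base case: alpha_1(i) = b_i(o_1) * pi_i
alpha1 = [B[i][O[0]] * pi[i] for i in range(N)]

# Recursion: alpha_2(j) = b_j(o_2) * sum_i a_ij alpha_1(i)
alpha2 = [B[j][O[1]] * sum(A[i][j] * alpha1[i] for i in range(N)) for j in range(N)]

# Definition: alpha_2(j) = sum over i_1 of P(o_1, o_2, i_1 = q_i, i_2 = q_j | lambda)
alpha2_direct = [sum(pi[i] * B[i][O[0]] * A[i][j] * B[j][O[1]] for i in range(N))
                 for j in range(N)]

print(alpha2)          # matches alpha2_direct entry by entry
```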

Intuition for the recursion

Take the two times $t = 1$ and $t = 2$; the observations and hidden states involved are $o_1$, $o_2$, $i_1$, $i_2$:

$$i_1 \to i_2$$

$$o_1 \to o_2$$

Once $\alpha_1(i) = P(o_1, i_1 = q_i \mid \lambda)$, $i \in \{1, 2, \dots, N\}$, has been computed, we hold the following information: at time $t = 1$, the probability $\alpha_1(1)$ that the hidden state is $q_1$ and the observation is $o_1$; ...; the probability $\alpha_1(N)$ that the hidden state is $q_N$ and the observation is $o_1$.

Computing $\alpha_2(j) = P(o_1, o_2, i_2 = q_j \mid \lambda)$, $j \in \{1, 2, \dots, N\}$, means finding: at time $t = 2$, the probability $\alpha_2(1)$ that the hidden state is $q_1$ and the two observations so far are $o_1$ and $o_2$; ...; the probability $\alpha_2(N)$ that the hidden state is $q_N$ and the two observations so far are $o_1$ and $o_2$.

How do we use $\alpha_1(i)$ to compute $\alpha_2(j)$?

Comparing the information we have with the information we need, the missing piece is the observation $o_2$. Now $o_2$ is generated by $i_2$ (the factor $b_{i_2}(o_2)$), and $i_2$ is in turn determined by $i_1$ (the factor $a_{i_1 i_2}$). Therefore, multiplying each $\alpha_1(i)$ by these two probabilities $b_{i_2}(o_2)$ and $a_{i_1 i_2}$ and summing yields $\alpha_2(j)$:

$$\alpha_2(j) = \sum_{i_1 = 1}^N \alpha_1(i) \, b_{i_2}(o_2) \, a_{i_1 i_2}$$

Rewriting with $i_1 = q_i$ and $i_2 = q_j$:

$$\alpha_2(j) = \sum_{i=1}^N \alpha_1(i) \, b_j(o_2) \, a_{ij} = b_j(o_2) \sum_{i=1}^N \alpha_1(i) \, a_{ij}$$

Significance

Why compute forward probabilities?

  • First, forward probabilities give us the target probability $P(O \mid \lambda)$. By definition, the forward probability at $t = T$ is:

$$\alpha_T(i) = P(o_1, \dots, o_T, i_T = q_i \mid \lambda)$$

Therefore $P(O \mid \lambda) = \sum_{i=1}^N \alpha_T(i)$.

  • Second, thanks to the recursion, computing forward probabilities costs far less than direct computation. Since $i \in \{1, 2, \dots, N\}$, computing all $\alpha_1(i)$ takes $N$ operations; each subsequent step computes $N$ values $\alpha_{t+1}(j)$, each a sum over $N$ terms, i.e. on the order of $N^2$ operations per step. The total is therefore on the order of $N^2 T$ operations, far fewer than the $N^T$ terms of the direct formula.
    The savings arise because each step directly reuses the results of the previous time step, avoiding repeated computation.
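The full forward pass can be sketched as follows; the model parameters are the same hypothetical toy values as before, and `forward` returns the whole table $\alpha_t(i)$ so that $P(O \mid \lambda)$ is the sum of its last row:

```python
# Hypothetical toy HMM, chosen only for illustration.
A  = [[0.7, 0.3], [0.4, 0.6]]   # [a_ij]
B  = [[0.9, 0.1], [0.2, 0.8]]   # [b_j(k)]
pi = [0.5, 0.5]
O  = [0, 1, 0]

def forward(O, A, B, pi):
    """Return the table of forward probabilities alpha[t][i]."""
    N, T = len(pi), len(O)
    alpha = [[0.0] * N for _ in range(T)]
    for i in range(N):                       # base case: alpha_1(i) = pi_i * b_i(o_1)
        alpha[0][i] = pi[i] * B[i][O[0]]
    for t in range(1, T):                    # recursion over time
        for j in range(N):                   # alpha_{t+1}(j) = b_j(o_{t+1}) sum_i a_ij alpha_t(i)
            alpha[t][j] = B[j][O[t]] * sum(alpha[t - 1][i] * A[i][j] for i in range(N))
    return alpha

alpha = forward(O, A, B, pi)
prob = sum(alpha[-1])                        # P(O | lambda) = sum_i alpha_T(i)
print(prob)
```

The two nested loops over states inside the time loop are the $N^2 T$ cost discussed above. (In practice, long sequences need scaling or log-space arithmetic to avoid underflow, which this sketch omits.)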

Backward Algorithm

Find the recursion satisfied by the backward probabilities, moving through time $T \to \dots \to t \to \dots \to 1$:

Backward probability

At the observation times $1, \dots, t, \dots, T$, the observed values are $o_1, \dots, o_t, \dots, o_T$ and the corresponding hidden states are $i_1, \dots, i_t, \dots, i_T$:

$$i_1 \to \dots \to i_t \to \dots \to i_T \qquad o_1 \to \dots \to o_t \to \dots \to o_T$$

Define the backward probability $\beta_t(i) = P(o_{t+1}, \dots, o_T \mid i_t = q_i, \lambda)$.

It is the probability that, conditioned on the state at time $t$ being $q_i$, the observations at all times after $t$ are $o_{t+1}, o_{t+2}, \dots, o_T$.

Deriving the recursion

From the definition, write out the backward probabilities at $t = T$, $t = T-1$, and $t = T-2$:

  • $\beta_T(i) = 1$

Note: the initial value is $1$ because the backward probability concerns the observations strictly after time $t$, and the observation sequence ends at time $T$; beyond $T$ the observations and states are unconstrained, every continuation is possible, so the value is defined to be $1$.

  • $\beta_{T-1}(i) = P(o_T \mid i_{T-1} = q_i, \lambda)$
    $= \sum_{k=1}^N P(o_T, i_T = q_k \mid i_{T-1} = q_i, \lambda)$
    $= \sum_{k=1}^N P(o_T \mid i_T = q_k, \lambda) P(i_T = q_k \mid i_{T-1} = q_i, \lambda)$
    $= \sum_{k=1}^N b_k(o_T) \, a_{ik}$

  • $\beta_{T-2}(j) = P(o_{T-1}, o_T \mid i_{T-2} = q_j, \lambda)$
    $= \sum_{i=1}^N \sum_{k=1}^N P(o_{T-1}, o_T, i_{T-1} = q_i, i_T = q_k \mid i_{T-2} = q_j, \lambda)$
    $= \sum_{i=1}^N \sum_{k=1}^N P(o_T \mid i_T = q_k, \lambda) P(i_T = q_k \mid i_{T-1} = q_i, \lambda) P(o_{T-1} \mid i_{T-1} = q_i, \lambda) P(i_{T-1} = q_i \mid i_{T-2} = q_j, \lambda)$
    $= \sum_{i=1}^N \beta_{T-1}(i) \, b_i(o_{T-1}) \, a_{ji}$

$\dots$

Continuing the recursion gives the relation between $\beta_t(j)$ and $\beta_{t+1}(i)$:

$$\beta_t(j) = \sum_{i=1}^N \beta_{t+1}(i) \, b_i(o_{t+1}) \, a_{ji}$$

where $j \in \{1, 2, \dots, N\}$.

Intuition for the recursion

Take the two times $t = T-1$ and $t = T-2$; the observations and hidden states involved are $o_{T-2}$, $o_{T-1}$, $o_T$, $i_{T-2}$, $i_{T-1}$, $i_T$:

$$i_{T-2} \to i_{T-1} \to i_T$$

$$o_{T-2} \to o_{T-1} \to o_T$$

Once $\beta_{T-1}(i) = P(o_T \mid i_{T-1} = q_i, \lambda)$, $i \in \{1, 2, \dots, N\}$, has been computed, we hold the following information: at time $t = T-1$, the probability $\beta_{T-1}(1)$ that the remaining observation is $o_T$ given that the hidden state is $q_1$; ...; the probability $\beta_{T-1}(N)$ that the remaining observation is $o_T$ given that the hidden state is $q_N$.

Computing $\beta_{T-2}(j) = P(o_{T-1}, o_T \mid i_{T-2} = q_j, \lambda)$, $j \in \{1, 2, \dots, N\}$, means finding: at time $t = T-2$, the probability $\beta_{T-2}(1)$ that the remaining observations are $o_{T-1}$ and $o_T$ given that the hidden state is $q_1$; ...; the probability $\beta_{T-2}(N)$ that the remaining observations are $o_{T-1}$ and $o_T$ given that the hidden state is $q_N$.

How do we use $\beta_{T-1}(i)$ to compute $\beta_{T-2}(j)$?

Comparing the information we have with the information we need, the missing piece is the observation $o_{T-1}$. Now $o_{T-1}$ is generated by $i_{T-1}$ (the factor $b_{i_{T-1}}(o_{T-1})$), and $i_{T-1}$ is in turn determined by $i_{T-2}$ (the factor $a_{i_{T-2} i_{T-1}}$). Therefore, multiplying each $\beta_{T-1}(i)$ by these two probabilities $b_{i_{T-1}}(o_{T-1})$ and $a_{i_{T-2} i_{T-1}}$ and summing yields $\beta_{T-2}(j)$:

$$\beta_{T-2}(j) = \sum_{i_{T-1} = 1}^N \beta_{T-1}(i) \, b_{i_{T-1}}(o_{T-1}) \, a_{i_{T-2} i_{T-1}}$$

Rewriting with $t = T-2$, $t+1 = T-1$, $i_{T-1} = q_i$, and $i_{T-2} = q_j$:

$$\beta_t(j) = \sum_{i=1}^N \beta_{t+1}(i) \, b_i(o_{t+1}) \, a_{ji}$$

Significance

Why compute backward probabilities?

  • First, backward probabilities also give us the target probability $P(O \mid \lambda)$. By definition, the backward probability at $t = 1$ is:

$$\beta_1(i) = P(o_2, \dots, o_T \mid i_1 = q_i, \lambda)$$

Compared with the target probability $P(O \mid \lambda)$, $\beta_1(i)$ is still missing the observation $o_1$. Since the observations are conditionally independent given the hidden states, the probability of observing $o_1$ given state $q_i$ at time $t = 1$ is $P(o_1 \mid i_1 = q_i, \lambda) = b_i(o_1)$.

Multiplying the two gives the probability of the full observation sequence $O = (o_1, \dots, o_T)$ conditioned on the state at time $t = 1$ being $q_i$: $P(o_1, \dots, o_T \mid i_1 = q_i, \lambda) = \beta_1(i) \, b_i(o_1)$.

Therefore the target probability is $P(O \mid \lambda) = \sum_{i=1}^N P(o_1, \dots, o_T \mid i_1 = q_i, \lambda) P(i_1 = q_i \mid \lambda) = \sum_{i=1}^N \beta_1(i) \, b_i(o_1) \, \pi_i$.

  • Second, the backward algorithm costs the same as the forward algorithm: on the order of $N^2 T$ operations in total, far fewer than the $N^T$ terms of direct computation.
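The backward pass can be sketched in the same way, again with hypothetical toy parameters; the termination $P(O \mid \lambda) = \sum_i \pi_i \, b_i(o_1) \, \beta_1(i)$ recovers the same target probability:

```python
# Hypothetical toy HMM, chosen only for illustration.
A  = [[0.7, 0.3], [0.4, 0.6]]   # [a_ij]
B  = [[0.9, 0.1], [0.2, 0.8]]   # [b_j(k)]
pi = [0.5, 0.5]
O  = [0, 1, 0]

def backward(O, A, B):
    """Return the table of backward probabilities beta[t][i]."""
    N, T = len(A), len(O)
    beta = [[0.0] * N for _ in range(T)]
    beta[T - 1] = [1.0] * N                  # base case: beta_T(i) = 1
    for t in range(T - 2, -1, -1):           # recursion backwards in time
        for i in range(N):                   # beta_t(i) = sum_j a_ij b_j(o_{t+1}) beta_{t+1}(j)
            beta[t][i] = sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] for j in range(N))
    return beta

beta = backward(O, A, B)
# Termination: P(O | lambda) = sum_i pi_i * b_i(o_1) * beta_1(i)
prob = sum(pi[i] * B[i][O[0]] * beta[0][i] for i in range(len(pi)))
print(prob)
```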

Forward-Backward Algorithm

The forward algorithm uses forward probabilities to compute $P(O \mid \lambda) = \sum_{i=1}^N \alpha_T(i)$ in the direction $1 \to T$.

The backward algorithm uses backward probabilities to compute $P(O \mid \lambda) = \sum_{i=1}^N \beta_1(i) \, b_i(o_1) \, \pi_i$ in the direction $T \to 1$.

We can also compute $P(O \mid \lambda)$ using forward and backward probabilities together, splitting the sequence at any time $t$:

$$P(O \mid \lambda) = \sum_{i=1}^N P(O, i_t = q_i \mid \lambda)$$

$$= \sum_{i=1}^N P(O \mid i_t = q_i, \lambda) P(i_t = q_i \mid \lambda)$$

$$= \sum_{i=1}^N P(o_1, \dots, o_t \mid i_t = q_i, \lambda) P(o_{t+1}, \dots, o_T \mid i_t = q_i, \lambda) P(i_t = q_i \mid \lambda)$$

$$= \sum_{i=1}^N P(o_1, \dots, o_t, i_t = q_i \mid \lambda) P(o_{t+1}, \dots, o_T \mid i_t = q_i, \lambda)$$

$$= \sum_{i=1}^N \alpha_t(i) \, \beta_t(i)$$

Substituting the backward recursion $\beta_t(i) = \sum_{j=1}^N \beta_{t+1}(j) \, b_j(o_{t+1}) \, a_{ij}$ gives:

$$P(O \mid \lambda) = \sum_{i=1}^N \sum_{j=1}^N \alpha_t(i) \, \beta_{t+1}(j) \, b_j(o_{t+1}) \, a_{ij}$$
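The identity $P(O \mid \lambda) = \sum_i \alpha_t(i) \beta_t(i)$ holds for every $t$, which makes a useful numerical check. A self-contained sketch (same hypothetical toy parameters) builds both tables and verifies that the sum is constant across $t$:

```python
# Hypothetical toy HMM, chosen only for illustration.
A  = [[0.7, 0.3], [0.4, 0.6]]   # [a_ij]
B  = [[0.9, 0.1], [0.2, 0.8]]   # [b_j(k)]
pi = [0.5, 0.5]
O  = [0, 1, 0]
N, T = len(pi), len(O)

# Forward table: alpha[t][j] = b_j(o_{t+1}) * sum_i a_ij * alpha[t-1][i]
alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
for t in range(1, T):
    alpha.append([B[j][O[t]] * sum(alpha[t - 1][i] * A[i][j] for i in range(N))
                  for j in range(N)])

# Backward table: beta[t][i] = sum_j a_ij * b_j(o_{t+1}) * beta[t+1][j]
beta = [[1.0] * N for _ in range(T)]
for t in range(T - 2, -1, -1):
    beta[t] = [sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] for j in range(N))
               for i in range(N)]

# sum_i alpha_t(i) * beta_t(i) equals P(O | lambda) at every t
probs = [sum(alpha[t][i] * beta[t][i] for i in range(N)) for t in range(T)]
print(probs)   # all entries agree up to floating-point error
```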


Other Probability Computations

Forward and backward probabilities support further computations:

  • Given the model $\lambda$: the probability that the observation sequence is $O = (o_1, \dots, o_T)$ and the hidden state at time $t$ is $q_i$:

$$P(O, i_t = q_i \mid \lambda) = \alpha_t(i) \, \beta_t(i)$$

  • Given the model $\lambda$ and the observation sequence $O = (o_1, \dots, o_T)$: the probability that the hidden state at time $t$ is $q_i$ (a single state):

$$P(i_t = q_i \mid O, \lambda) = \frac{P(O, i_t = q_i \mid \lambda)}{P(O \mid \lambda)} = \frac{\alpha_t(i) \, \beta_t(i)}{\sum_{j=1}^N \alpha_t(j) \, \beta_t(j)}$$

  • Given the model $\lambda$ and the observation sequence $O = (o_1, \dots, o_T)$: the probability that the hidden state at time $t$ is $q_i$ and the hidden state at time $t+1$ is $q_j$ (two states):

$$P(i_t = q_i, i_{t+1} = q_j \mid O, \lambda) = \frac{P(O, i_t = q_i, i_{t+1} = q_j \mid \lambda)}{P(O \mid \lambda)} = \frac{\alpha_t(i) \, \beta_{t+1}(j) \, b_j(o_{t+1}) \, a_{ij}}{\sum_{i=1}^N \sum_{j=1}^N \alpha_t(i) \, \beta_{t+1}(j) \, b_j(o_{t+1}) \, a_{ij}}$$
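These two posteriors (often written $\gamma_t(i)$ and $\xi_t(i,j)$ in the HMM literature, e.g. as inputs to Baum-Welch training) can be sketched directly from the forward and backward tables; the model parameters are again hypothetical toy values:

```python
# Hypothetical toy HMM, chosen only for illustration.
A  = [[0.7, 0.3], [0.4, 0.6]]   # [a_ij]
B  = [[0.9, 0.1], [0.2, 0.8]]   # [b_j(k)]
pi = [0.5, 0.5]
O  = [0, 1, 0]
N, T = len(pi), len(O)

# Forward and backward tables (same recursions as in the text).
alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
for t in range(1, T):
    alpha.append([B[j][O[t]] * sum(alpha[t - 1][i] * A[i][j] for i in range(N))
                  for j in range(N)])
beta = [[1.0] * N for _ in range(T)]
for t in range(T - 2, -1, -1):
    beta[t] = [sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] for j in range(N))
               for i in range(N)]

prob = sum(alpha[-1])                        # P(O | lambda)

# Single-state posterior: gamma_t(i) = alpha_t(i) * beta_t(i) / P(O | lambda)
gamma = [[alpha[t][i] * beta[t][i] / prob for i in range(N)] for t in range(T)]

# Two-state posterior: xi_t(i,j) = alpha_t(i) * a_ij * b_j(o_{t+1}) * beta_{t+1}(j) / P(O | lambda)
xi = [[[alpha[t][i] * A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] / prob
        for j in range(N)] for i in range(N)] for t in range(T - 1)]

print(gamma[1])   # posterior over hidden states at t = 2; entries sum to 1
```

Each row of `gamma` sums to $1$, as does each full `xi[t]` slice, since both are proper posterior distributions.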
