The Probability Computation Problem for HMMs
The probability computation problem for HMMs asks: given the model parameters $\lambda = (A,B,\pi)$ and an observation sequence $O = (o_1,o_2,...,o_T)$, compute the probability of observing $O$ under the model $\lambda$: $P(O \mid \lambda)$.
Direct Computation
Computing directly from the probability formula, by the law of total probability over all hidden state sequences $I$:
$$P(O \mid \lambda) = \sum_{I} P(O,I \mid \lambda) = \sum_{I} P(O \mid I,\lambda)\,P(I \mid \lambda)$$
- Here $P(O \mid I,\lambda)$ captures the emissions $i_t \to o_t$ and is obtained from the emission probability matrix $[b_j(k)]_{N \times M}$:
$P(O \mid I,\lambda) = P(o_1 \mid i_1)\cdots P(o_t \mid i_t)\cdots P(o_T \mid i_T) = b_{i_1}(o_1)\cdots b_{i_t}(o_t)\cdots b_{i_T}(o_T)$, a product of $T$ factors.
- $P(I \mid \lambda)$ captures the transitions $i_{t-1} \to i_t$ and is obtained from the transition probability matrix $[a_{ij}]_{N \times N}$ and the initial state distribution $\pi$:
$P(I \mid \lambda) = \pi_{i_1} P(i_2 \mid i_1)\cdots P(i_t \mid i_{t-1})\cdots P(i_T \mid i_{T-1}) = \pi_{i_1} a_{i_1 i_2}\cdots a_{i_{t-1} i_t}\cdots a_{i_{T-1} i_T}$, also a product of $T$ factors.
Substituting the two expressions:
- $P(O \mid \lambda) = \sum_{I} P(O,I \mid \lambda)$
$= \sum_{I} P(O \mid I,\lambda)\,P(I \mid \lambda)$
$= \sum_{I} [b_{i_1}(o_1)\cdots b_{i_t}(o_t)\cdots b_{i_T}(o_T)] \times [\pi_{i_1} a_{i_1 i_2}\cdots a_{i_{t-1} i_t}\cdots a_{i_{T-1} i_T}]$
$= \sum_{I} \pi_{i_1} \prod_{t=1}^T b_{i_t}(o_t) \prod_{t=1}^{T-1} a_{i_t i_{t+1}}$
Since $\sum_{I} = \sum_{i_1}\cdots\sum_{i_t}\cdots\sum_{i_T}$ and each $i_t$ can take $N$ values, the sum $\sum_{I}$ contains $N^T$ terms, each a product of about $2T$ factors. Computing $P(O \mid \lambda)$ directly from the formula therefore costs on the order of $T N^T$ operations, which quickly becomes infeasible.
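To make the cost concrete, here is a minimal sketch of the direct computation. The HMM below (`A`, `B`, `pi`, `O`) is a small hypothetical example with illustrative numbers, not values from the text; it enumerates every one of the $N^T$ hidden-state paths.

```python
import itertools
import numpy as np

# Hypothetical 3-state, 2-symbol HMM; all numbers are illustrative.
A = np.array([[0.5, 0.2, 0.3],      # transition matrix [a_ij], N x N
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5],           # emission matrix [b_j(k)], N x M
              [0.4, 0.6],
              [0.7, 0.3]])
pi = np.array([0.2, 0.4, 0.4])      # initial state distribution
O = [0, 1, 0]                       # observation sequence (symbol indices)

def direct_likelihood(A, B, pi, O):
    """P(O|lambda) by summing pi_{i1} * prod_t b_{i_t}(o_t) * prod_t a_{i_t i_{t+1}}
    over all N^T hidden-state paths I."""
    N, T = A.shape[0], len(O)
    total = 0.0
    for I in itertools.product(range(N), repeat=T):   # N^T paths
        p = pi[I[0]] * B[I[0], O[0]]                  # pi_{i1} * b_{i1}(o1)
        for t in range(1, T):
            p *= A[I[t - 1], I[t]] * B[I[t], O[t]]    # a_{i_{t-1} i_t} * b_{i_t}(o_t)
        total += p
    return total

print(direct_likelihood(A, B, pi, O))  # feasible here only because N=3, T=3
```

With $N=3$ and $T=3$ this is only 27 paths, but at $T=100$ the same loop would visit $3^{100}$ paths, which is why the recursions below matter.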
Forward Algorithm
The idea is to find a recursion for the forward probability along the direction $1 \to ... \to t \to ... \to T$:
The Forward Probability
At the observation times $1,...,t,...,T$, the observed values are $o_1,...,o_t,...,o_T$ and the hidden states are $i_1,...,i_t,...,i_T$.
$i_1 \to ... \to i_t \to ... \to i_T$
$o_1 \to ... \to o_t \to ... \to o_T$
Define the forward probability:
$$\alpha_t(i) = P(o_1,...,o_t,\,i_t = q_i \mid \lambda)$$
It is the probability that, up to time $t$, the observations are $o_1,o_2,...,o_t$ and the state at time $t$ is $q_i$.
Derivation of the Recursion
From the definition, write out the forward probabilities for $t=1$ and $t=2$:
- $\alpha_1(i) = P(o_1,i_1 = q_i \mid \lambda) = P(o_1 \mid i_1 = q_i, \lambda)\,P(i_1 = q_i \mid \lambda) = b_{i}(o_1)\,\pi_i$
- $\alpha_2(j) = P(o_1,o_2,i_2 = q_j \mid \lambda)$
$= \sum_{i=1}^N P(o_1,o_2,i_1 = q_i,i_2 = q_j \mid \lambda)$
$= \sum_{i=1}^N P(o_2 \mid i_2 = q_j,\lambda)\,P(i_2 = q_j \mid i_1 = q_i,\lambda)\,P(o_1 \mid i_1 = q_i,\lambda)\,P(i_1 = q_i \mid \lambda)$
$= \sum_{i=1}^N b_j(o_2)\,a_{ij}\,\alpha_1(i)$
$= b_j(o_2) \sum_{i=1}^N a_{ij}\,\alpha_1(i)$
$\cdots$
Continuing the recursion yields the relation between $\alpha_{t+1}(j)$ and $\alpha_t(i)$:
$$\alpha_{t+1}(j) = b_j(o_{t+1}) \sum_{i=1}^N a_{ij}\,\alpha_t(i)$$
where $j \in \{1,2,...,N\}$.
Intuition for the Recursion
Take the two times $t=1$ and $t=2$ as an example; the observations and hidden states involved are $o_1$, $o_2$, $i_1$, $i_2$:
$i_1 \to i_2$
$o_1 \to o_2$
Once we have computed $\alpha_1(i) = P(o_1,i_1 = q_i \mid \lambda)$ for $i \in \{1,2,...,N\}$, the information in hand is: at time $t=1$, the probability $\alpha_1(1)$ that the hidden state is $q_1$ and the observation is $o_1$, ..., and the probability $\alpha_1(N)$ that the hidden state is $q_N$ and the observation is $o_1$.
Computing $\alpha_2(j) = P(o_1,o_2,i_2 = q_j \mid \lambda)$ for $j \in \{1,2,...,N\}$ means finding: at time $t=2$, the probability $\alpha_2(1)$ that the hidden state is $q_1$ and the two observations so far are $o_1$, $o_2$, ..., and the probability $\alpha_2(N)$ that the hidden state is $q_N$ and the two observations so far are $o_1$, $o_2$.
How can we use $\alpha_1(i)$ to compute $\alpha_2(j)$?
Comparing what we have with what we need, the new piece is the observation $o_2$. Now $o_2$ is determined by $i_2$ (via $b_{i_2}(o_2)$), and $i_2$ in turn is determined by $i_1$ (via $a_{i_1 i_2}$). So starting from each $\alpha_1(i)$ and multiplying in the two probabilities $b_{i_2}(o_2)$ and $a_{i_1 i_2}$, we obtain $\alpha_2(j)$:
$$\alpha_2(j) = \sum_{i_1 = 1}^N \alpha_1(i)\, b_{i_2}(o_2)\, a_{i_1 i_2}$$
Relabeling with $i_1 = q_i$, $i_2 = q_j$ gives:
$$\alpha_2(j) = \sum_{i = 1}^N \alpha_1(i)\, b_{j}(o_2)\, a_{ij} = b_j(o_2) \sum_{i=1}^N \alpha_1(i)\, a_{ij}$$
Significance
Why compute forward probabilities?
- First, the forward probabilities give us the target probability $P(O \mid \lambda)$. By definition, the forward probability at $t=T$ is:
$$\alpha_T(i) = P(o_1,...,o_T,\,i_T = q_i \mid \lambda)$$
Therefore $P(O \mid \lambda) = \sum_{i=1}^N \alpha_T(i)$.
- Second, thanks to the recursion, computing the forward probabilities takes far less work than direct computation. Note that $i \in \{1,2,...,N\}$, so each $\alpha_{t+1}(j)$ is a sum of $N$ terms, and at every time step there are $N$ such values to compute. Over $T$ steps this is on the order of $N^2 T$ operations, far less than the $N^T$ terms of direct computation.
The saving comes from the fact that each step directly reuses the results of the previous time step, avoiding repeated computation.
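The recursion translates directly into code. Below is a minimal sketch of the forward pass; the HMM numbers are illustrative, not from the text.

```python
import numpy as np

# Hypothetical 3-state, 2-symbol HMM; all numbers are illustrative.
A = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5],
              [0.4, 0.6],
              [0.7, 0.3]])
pi = np.array([0.2, 0.4, 0.4])
O = [0, 1, 0]

def forward(A, B, pi, O):
    """Return the T x N matrix of forward probabilities alpha_t(i)."""
    T, N = len(O), A.shape[0]
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                    # alpha_1(i) = b_i(o_1) * pi_i
    for t in range(T - 1):
        # alpha_{t+1}(j) = b_j(o_{t+1}) * sum_i alpha_t(i) * a_ij
        alpha[t + 1] = B[:, O[t + 1]] * (alpha[t] @ A)
    return alpha

alpha = forward(A, B, pi, O)
likelihood = alpha[-1].sum()                      # P(O|lambda) = sum_i alpha_T(i)
print(likelihood)
```

Each loop iteration is one matrix-vector product, which is exactly the $N^2$ work per time step discussed above.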
Backward Algorithm
The idea is to find a recursion for the backward probability along the direction $T \to ... \to t \to ... \to 1$:
The Backward Probability
At the observation times $1,...,t,...,T$, the observed values are $o_1,...,o_t,...,o_T$ and the hidden states are $i_1,...,i_t,...,i_T$.
$i_1 \to ... \to i_t \to ... \to i_T$
$o_1 \to ... \to o_t \to ... \to o_T$
Define the backward probability:
$$\beta_t(i) = P(o_{t+1},...,o_T \mid i_t = q_i, \lambda)$$
It is the probability that, conditioned on the state at time $t$ being $q_i$, the observations at all times after $t$ are $o_{t+1},o_{t+2},...,o_T$.
Derivation of the Recursion
From the definition, write out the backward probabilities for $t=T$, $t=T-1$, and $t=T-2$:
- $\beta_T(i) = 1$
Note: the initial value is $1$ because the backward probability concerns the observations strictly after time $t$. The observation sequence only runs to time $T$; nothing is observed after $T$, so every continuation is possible and the probability is defined to be $1$.
- $\beta_{T-1}(i) = P(o_T \mid i_{T-1} = q_i, \lambda)$
$= \sum_{k=1}^N P(o_T,i_T = q_k \mid i_{T-1} = q_i, \lambda)$
$= \sum_{k=1}^N P(o_T \mid i_T = q_k,\lambda)\,P(i_T = q_k \mid i_{T-1} = q_i, \lambda)$
$= \sum_{k=1}^N b_k(o_T)\,a_{ik}$
- $\beta_{T-2}(j) = P(o_T,o_{T-1} \mid i_{T-2} = q_j, \lambda)$
$= \sum_{i=1}^N \sum_{k=1}^N P(o_T,o_{T-1},i_T=q_k,i_{T-1}=q_i \mid i_{T-2} = q_j, \lambda)$
$= \sum_{i=1}^N \sum_{k=1}^N P(o_T \mid i_T=q_k, \lambda)\,P(i_T=q_k \mid i_{T-1}=q_i, \lambda)\,P(o_{T-1} \mid i_{T-1}=q_i, \lambda)\,P(i_{T-1}=q_i \mid i_{T-2}=q_j, \lambda)$
$= \sum_{i=1}^N \beta_{T-1}(i)\,b_i(o_{T-1})\,a_{ji}$
$\cdots$
Continuing the recursion yields the relation between $\beta_t(j)$ and $\beta_{t+1}(i)$:
$$\beta_t(j) = \sum_{i=1}^N \beta_{t+1}(i)\, b_i(o_{t+1})\, a_{ji}$$
where $j \in \{1,2,...,N\}$.
Intuition for the Recursion
Take the two times $t = T-1$ and $t = T-2$ as an example; the observations and hidden states involved are $o_{T-2}$, $o_{T-1}$, $o_T$, $i_{T-2}$, $i_{T-1}$, $i_T$:
$i_{T-2} \to i_{T-1} \to i_T$
$o_{T-2} \to o_{T-1} \to o_T$
Once we have computed $\beta_{T-1}(i) = P(o_T \mid i_{T-1} = q_i, \lambda)$ for $i \in \{1,2,...,N\}$, the information in hand is: at time $t = T-1$, the probability $\beta_{T-1}(1)$ that the subsequent observation is $o_T$ given that the hidden state is $q_1$, ..., and the probability $\beta_{T-1}(N)$ that the subsequent observation is $o_T$ given that the hidden state is $q_N$.
Computing $\beta_{T-2}(j) = P(o_T,o_{T-1} \mid i_{T-2} = q_j, \lambda)$ for $j \in \{1,2,...,N\}$ means finding: at time $t = T-2$, the probability $\beta_{T-2}(1)$ that the subsequent observations are $o_{T-1}$, $o_T$ given that the hidden state is $q_1$, ..., and the probability $\beta_{T-2}(N)$ that the subsequent observations are $o_{T-1}$, $o_T$ given that the hidden state is $q_N$.
How can we use $\beta_{T-1}(i)$ to compute $\beta_{T-2}(j)$?
Comparing what we have with what we need, the new piece is the observation $o_{T-1}$. Now $o_{T-1}$ is determined by $i_{T-1}$ (via $b_{i_{T-1}}(o_{T-1})$), and $i_{T-1}$ in turn is determined by $i_{T-2}$ (via $a_{i_{T-2} i_{T-1}}$). So starting from each $\beta_{T-1}(i)$ and multiplying in the two probabilities $b_{i_{T-1}}(o_{T-1})$ and $a_{i_{T-2} i_{T-1}}$, we obtain $\beta_{T-2}(j)$:
$$\beta_{T-2}(j) = \sum_{i_{T-1} = 1}^N \beta_{T-1}(i)\, b_{i_{T-1}}(o_{T-1})\, a_{i_{T-2} i_{T-1}}$$
Relabeling with $t = T-2$, $t+1 = T-1$, $i_{T-1} = q_i$, $i_{T-2} = q_j$ gives:
$$\beta_{t}(j) = \sum_{i = 1}^N \beta_{t+1}(i)\, b_{i}(o_{t+1})\, a_{ji}$$
Significance
Why compute backward probabilities?
- First, the backward probabilities also give us the target probability $P(O \mid \lambda)$. By definition, the backward probability at $t=1$ is:
$$\beta_1(i) = P(o_2,...,o_T \mid i_1 = q_i, \lambda)$$
Compared with the target $P(O \mid \lambda)$, $\beta_1(i)$ is still missing the observation $o_1$. Since the observations are conditionally independent given the states, the probability of observing $o_1$ given state $q_i$ at time $t=1$ is $P(o_1 \mid i_1 = q_i, \lambda) = b_i(o_1)$.
Multiplying the two gives the probability of the full observation sequence $O = (o_1,...,o_T)$ conditioned on the state at $t=1$ being $q_i$: $P(o_1,...,o_T \mid i_1 = q_i, \lambda) = \beta_1(i)\, b_i(o_1)$.
Therefore, the target probability is
$$P(O \mid \lambda) = \sum_{i=1}^N P(o_1,...,o_T \mid i_1 = q_i, \lambda)\, P(i_1 = q_i \mid \lambda) = \sum_{i=1}^N \beta_1(i)\, b_i(o_1)\, \pi_i$$
- Second, the backward pass costs the same as the forward pass: on the order of $N^2 T$ operations, far less than the $N^T$ terms of direct computation.
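As a sketch, the backward recursion and the termination step $P(O \mid \lambda) = \sum_i \beta_1(i)\, b_i(o_1)\, \pi_i$ can be coded as follows (again with illustrative HMM numbers):

```python
import numpy as np

# Hypothetical 3-state, 2-symbol HMM; all numbers are illustrative.
A = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5],
              [0.4, 0.6],
              [0.7, 0.3]])
pi = np.array([0.2, 0.4, 0.4])
O = [0, 1, 0]

def backward(A, B, O):
    """Return the T x N matrix of backward probabilities beta_t(i)."""
    T, N = len(O), A.shape[0]
    beta = np.ones((T, N))                        # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        # beta_t(i) = sum_j a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    return beta

beta = backward(A, B, O)
likelihood = (pi * B[:, O[0]] * beta[0]).sum()    # sum_i pi_i b_i(o_1) beta_1(i)
print(likelihood)
```

Note that the recursion loop runs backwards over $t$, and the comment uses $i$ for the current state and $j$ for the next state, which is the recursion above with the indices relabeled.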
Forward-Backward Algorithm
The forward algorithm uses the forward probabilities to compute $P(O \mid \lambda) = \sum_{i=1}^N \alpha_T(i)$ in the direction $1 \to T$.
The backward algorithm uses the backward probabilities to compute $P(O \mid \lambda) = \sum_{i=1}^N \beta_1(i)\, b_i(o_1)\, \pi_i$ in the direction $T \to 1$.
We can also compute $P(O \mid \lambda)$ using the forward and backward probabilities together:
$P(O \mid \lambda) = \sum_{i=1}^N P(O,i_t = q_i \mid \lambda)$
$= \sum_{i=1}^N P(O \mid i_t = q_i,\lambda)\, P(i_t = q_i \mid \lambda)$
$= \sum_{i=1}^N P(o_1,...,o_t \mid i_t = q_i,\lambda)\, P(o_{t+1},...,o_T \mid i_t = q_i,\lambda)\, P(i_t = q_i \mid \lambda)$
$= \sum_{i=1}^N P(o_1,...,o_t,i_t = q_i \mid \lambda)\, P(o_{t+1},...,o_T \mid i_t = q_i,\lambda)$
$= \sum_{i=1}^N \alpha_t(i)\, \beta_t(i)$
Substituting the backward recursion $\beta_{t}(i) = \sum_{j = 1}^N \beta_{t+1}(j)\, b_{j}(o_{t+1})\, a_{ij}$, we also have:
$$P(O \mid \lambda) = \sum_{i=1}^N \sum_{j=1}^N \alpha_t(i)\, \beta_{t+1}(j)\, b_j(o_{t+1})\, a_{ij}$$
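The identity $P(O \mid \lambda) = \sum_i \alpha_t(i)\,\beta_t(i)$ holds for every $t$, which makes a handy consistency check for an implementation. A sketch with illustrative numbers:

```python
import numpy as np

# Hypothetical 3-state, 2-symbol HMM; all numbers are illustrative.
A = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5],
              [0.4, 0.6],
              [0.7, 0.3]])
pi = np.array([0.2, 0.4, 0.4])
O = [0, 1, 0]
T, N = len(O), A.shape[0]

# Forward pass: alpha_{t+1}(j) = b_j(o_{t+1}) * sum_i alpha_t(i) a_ij
alpha = np.zeros((T, N))
alpha[0] = pi * B[:, O[0]]
for t in range(T - 1):
    alpha[t + 1] = B[:, O[t + 1]] * (alpha[t] @ A)

# Backward pass: beta_t(i) = sum_j a_ij b_j(o_{t+1}) beta_{t+1}(j)
beta = np.ones((T, N))
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])

# sum_i alpha_t(i) beta_t(i) should give the same number at every t
per_t = (alpha * beta).sum(axis=1)
print(per_t)   # T values, all equal to P(O|lambda) up to rounding
```

If the values in `per_t` disagree, one of the two passes has a bug.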
Computing Other Probabilities
The forward and backward probabilities also let us compute other quantities:
- Given the model $\lambda$, the probability that the observation sequence is $O=(o_1,...,o_T)$ and the hidden state at time $t$ is $q_i$:
$$P(O,i_t = q_i \mid \lambda) = \alpha_t(i)\, \beta_t(i)$$
- Given the model $\lambda$ and the observation sequence $O=(o_1,...,o_T)$, the probability that the hidden state at time $t$ is $q_i$ (a single state):
$$P(i_t = q_i \mid O,\lambda) = \frac{P(O,i_t = q_i \mid \lambda)}{P(O \mid \lambda)} = \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{j=1}^N \alpha_t(j)\, \beta_t(j)}$$
- Given the model $\lambda$ and the observation sequence $O=(o_1,...,o_T)$, the probability that the hidden state at time $t$ is $q_i$ and the hidden state at time $t+1$ is $q_j$ (two states):
$$P(i_t = q_i,i_{t+1} = q_j \mid O,\lambda) = \frac{P(O,i_t = q_i,i_{t+1} = q_j \mid \lambda)}{P(O \mid \lambda)} = \frac{\alpha_t(i)\, b_j(o_{t+1})\, a_{ij}\, \beta_{t+1}(j)}{\sum_{i=1}^N \sum_{j=1}^N \alpha_t(i)\, b_j(o_{t+1})\, a_{ij}\, \beta_{t+1}(j)}$$
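Both posteriors fall out of a single forward and backward pass. A sketch, with an illustrative HMM; here `gamma[t, i]` stands for the single-state posterior $P(i_t=q_i \mid O,\lambda)$ and `xi[t, i, j]` for the two-state posterior $P(i_t=q_i, i_{t+1}=q_j \mid O,\lambda)$:

```python
import numpy as np

# Hypothetical 3-state, 2-symbol HMM; all numbers are illustrative.
A = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5],
              [0.4, 0.6],
              [0.7, 0.3]])
pi = np.array([0.2, 0.4, 0.4])
O = [0, 1, 0]
T, N = len(O), A.shape[0]

alpha = np.zeros((T, N))                          # forward pass
alpha[0] = pi * B[:, O[0]]
for t in range(T - 1):
    alpha[t + 1] = B[:, O[t + 1]] * (alpha[t] @ A)

beta = np.ones((T, N))                            # backward pass
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])

likelihood = alpha[-1].sum()                      # P(O|lambda)

# Single-state posterior: gamma_t(i) = alpha_t(i) beta_t(i) / P(O|lambda)
gamma = alpha * beta / likelihood

# Two-state posterior:
# xi_t(i,j) = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O|lambda)
xi = np.zeros((T - 1, N, N))
for t in range(T - 1):
    xi[t] = alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])[None, :]
xi /= likelihood

print(gamma.sum(axis=1))        # each row sums to 1
print(xi.sum(axis=(1, 2)))      # each t-slice sums to 1
```

A useful sanity check is that marginalizing $j$ out of the two-state posterior recovers the single-state posterior: $\sum_j \xi_t(i,j) = \gamma_t(i)$ for $t < T$. These are the quantities the Baum-Welch learning algorithm reuses.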