Definition
Definition: A hidden Markov model (HMM) is a probabilistic model for sequential data. It describes a process in which a hidden Markov chain randomly generates an unobservable sequence of states, and each state in turn generates an observation, producing a random sequence of observations. The sequence of states generated by the hidden Markov chain is called the state sequence; the sequence of observations produced from those states is called the observation sequence. Each position in the sequences can be regarded as a time step.
Let $Q$ be the set of all $N$ possible states, $Q=\{q_1, q_2, \ldots, q_N\}$, and let $V$ be the set of all $M$ possible observations, $V=\{v_1, v_2, \ldots, v_M\}$. Let $I$ be a state sequence of length $T$ and $O$ the corresponding observation sequence: $I=(i_1, i_2, \ldots, i_T)$, $O=(o_1, o_2, \ldots, o_T)$.
Let $A$ be the state transition probability matrix, $A=[a_{ij}]_{N \times N}$, where $a_{ij}=P(i_{t+1}=q_j \mid i_t=q_i)$, $i=1,2,\ldots,N$; $j=1,2,\ldots,N$.
Let $B$ be the observation probability matrix, $B=[b_{jk}]_{N \times M}$, where $b_{jk}=P(o_t=v_k \mid i_t=q_j)$, $j=1,2,\ldots,N$; $k=1,2,\ldots,M$.
Let $\pi$ be the initial state probability vector, $\pi=(\pi_1, \pi_2, \ldots, \pi_N)$, where $\pi_i=P(i_1=q_i)$, $i=1,2,\ldots,N$. A hidden Markov model is determined by the initial state probability vector $\pi$, the state transition probability matrix $A$, and the observation probability matrix $B$. $\pi$ and $A$ determine the state sequence, while $B$ determines the observation sequence. A hidden Markov model $\lambda$ can therefore be written as the triple $\lambda=(A, B, \pi)$.
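To make the notation concrete, here is a minimal sketch (Python/NumPy) of a hypothetical model with $N=2$ states and $M=3$ observations; the specific numbers are invented purely for illustration and are not part of the original text:

```python
import numpy as np

# Hypothetical 2-state / 3-observation HMM lambda = (A, B, pi).
# States Q = {q1, q2}, observations V = {v1, v2, v3}.

# Initial state probabilities pi_i = P(i_1 = q_i)
pi = np.array([0.6, 0.4])

# State transition matrix a_ij = P(i_{t+1} = q_j | i_t = q_i), shape N x N
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# Observation matrix b_jk = P(o_t = v_k | i_t = q_j), shape N x M
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])

# Each row of A and B is a probability distribution, so rows sum to 1.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)
```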
Two Basic Assumptions
Assumption 1 (homogeneous Markov assumption): the state of the hidden Markov chain at any time $t$ depends only on the state at the previous time step; it is independent of the states and observations at all other times, and also independent of the time $t$ itself:
$$P\left(i_{t} \mid i_{t-1}, o_{t-1}, \ldots, i_{1}, o_{1}\right)=P\left(i_{t} \mid i_{t-1}\right)$$
Assumption 2 (observation independence assumption): the observation at any time depends only on the state of the Markov chain at that time and is independent of all other observations and states:
$$P\left(o_{t} \mid i_{T}, o_{T}, i_{T-1}, o_{T-1}, \ldots, i_{t}, i_{t-1}, o_{t-1}, \ldots, i_{1}, o_{1}\right)=P\left(o_{t} \mid i_{t}\right)$$
Three Basic Problems
Problem 1 (probability computation): given the model $\lambda=(A, B, \pi)$ and an observation sequence $O=(o_1, o_2, \ldots, o_T)$, compute the probability $P(O \mid \lambda)$ of observing $O$ under the model $\lambda$.
Problem 2 (learning): given an observation sequence $O=(o_1, o_2, \ldots, o_T)$, estimate the parameters of the model $\lambda=(A, B, \pi)$ so that the probability $P(O \mid \lambda)$ of the observation sequence under the model is maximized, i.e., estimate the parameters by maximum likelihood.
Problem 3 (prediction, also called decoding): given the model $\lambda=(A, B, \pi)$ and an observation sequence $O=(o_1, o_2, \ldots, o_T)$, find the state sequence $I=(i_1, i_2, \ldots, i_T)$ that maximizes the conditional probability $P(I \mid O)$; that is, given the observation sequence, find the most likely corresponding state sequence.
Probability Computation
Direct computation
The most direct way to compute $P(O \mid \lambda)$ is to expand it by the law of total probability over all state sequences:
$$\begin{aligned} P(O \mid \lambda) &=\sum_{I} P(O, I \mid \lambda) \\ &=\sum_{I} P(O \mid I, \lambda) P(I \mid \lambda) \end{aligned}$$
Here $P(I \mid \lambda)$ is the probability of generating the state sequence $I=(i_1, i_2, \ldots, i_T)$ given the model parameters (recall $\pi_i = P(i_1 = q_i)$):
$$P(I \mid \lambda)=\pi_{i_{1}} a_{i_{1} i_{2}} a_{i_{2} i_{3}} \cdots a_{i_{T-1} i_{T}}$$
$P(O \mid I, \lambda)$ is the probability of generating the observation sequence $O=(o_1, o_2, \ldots, o_T)$ given the model parameters and the state sequence $I=(i_1, i_2, \ldots, i_T)$:
$$P(O \mid I, \lambda)=b_{i_{1} o_{1}} b_{i_{2} o_{2}} \cdots b_{i_{T} o_{T}}$$
Therefore
$$\begin{aligned} P(O \mid \lambda) &=\sum_{I} P(O \mid I, \lambda) P(I \mid \lambda) \\ &=\sum_{i_{1}, i_{2}, \ldots, i_{T}} \pi_{i_{1}} b_{i_{1} o_{1}} a_{i_{1} i_{2}} b_{i_{2} o_{2}} \cdots a_{i_{T-1} i_{T}} b_{i_{T} o_{T}} \end{aligned}$$
However, the sum $\sum_{i_{1}, i_{2}, \ldots, i_{T}}$ ranges over $N^{T}$ possible state sequences, and evaluating each term $\pi_{i_{1}} b_{i_{1} o_{1}} a_{i_{1} i_{2}} b_{i_{2} o_{2}} \cdots a_{i_{T-1} i_{T}} b_{i_{T} o_{T}}$ takes $O(T)$ time, so the overall time complexity is $O\left(T N^{T}\right)$. This is clearly far too expensive, and the direct method is infeasible in practice.
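Below is a minimal sketch of this brute-force enumeration, reusing the hypothetical `pi`, `A`, `B` arrays sketched earlier; it only works for tiny $N$ and $T$ precisely because of the $O(TN^T)$ cost:

```python
from itertools import product

def direct_prob(pi, A, B, obs):
    """Brute-force P(O | lambda): enumerate all N**T state sequences."""
    N = len(pi)
    T = len(obs)
    total = 0.0
    for states in product(range(N), repeat=T):    # N**T state sequences
        p = pi[states[0]] * B[states[0], obs[0]]  # pi_{i1} * b_{i1 o1}
        for t in range(1, T):
            p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]
        total += p
    return total

# Example: an observation sequence given as indices into V.
obs = [0, 1, 2]
print(direct_prob(pi, A, B, obs))
```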
Forward algorithm
First define the forward probability: given a hidden Markov model $\lambda$, the forward probability is the probability that the partial observation sequence up to time $t$ is $o_1, o_2, \ldots, o_t$ and that the state at time $t$ is $q_i$, written
$$\alpha_{t}(i)=P\left(o_{1}, o_{2}, \ldots, o_{t}, i_{t}=q_{i} \mid \lambda\right)$$
From the definition of the forward probability it follows that
$$P(O \mid \lambda)=P\left(o_{1}, o_{2}, \ldots, o_{T} \mid \lambda\right)=\sum_{i=1}^{N} P\left(o_{1}, o_{2}, \ldots, o_{T}, i_{T}=q_{i} \mid \lambda\right)=\sum_{i=1}^{N} \alpha_{T}(i)$$
So computing $P(O \mid \lambda)$ reduces to computing the forward probabilities $\alpha_{t}(i)$. From the definition of the forward probability:
$$\begin{aligned} &\alpha_{1}(i)=P\left(o_{1}, i_{1}=q_{i} \mid \lambda\right)=\pi_{i} b_{i o_{1}} \\ &\alpha_{2}(i)=P\left(o_{1}, o_{2}, i_{2}=q_{i} \mid \lambda\right)=\left[\sum_{j=1}^{N} \alpha_{1}(j) a_{j i}\right] \times b_{i o_{2}} \\ &\alpha_{3}(i)=P\left(o_{1}, o_{2}, o_{3}, i_{3}=q_{i} \mid \lambda\right)=\left[\sum_{j=1}^{N} \alpha_{2}(j) a_{j i}\right] \times b_{i o_{3}} \end{aligned}$$
Proceeding in the same way gives the recursion
$$\alpha_{t+1}(i)=\left[\sum_{j=1}^{N} \alpha_{t}(j) a_{j i}\right] \times b_{i o_{t+1}}$$
and therefore
$$\alpha_{T}(i)=\left[\sum_{j=1}^{N} \alpha_{T-1}(j) a_{j i}\right] \times b_{i o_{T}}$$
Substituting this result back into
$$P(O \mid \lambda)=P\left(o_{1}, o_{2}, \ldots, o_{T} \mid \lambda\right)=\sum_{i=1}^{N} P\left(o_{1}, o_{2}, \ldots, o_{T}, i_{T}=q_{i} \mid \lambda\right)=\sum_{i=1}^{N} \alpha_{T}(i)$$
yields $P(O \mid \lambda)$.
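A minimal NumPy sketch of the forward recursion, again using the hypothetical `pi`, `A`, `B` from above (no scaling or log-space tricks, so it is only suitable for short sequences):

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward algorithm: returns (alpha, P(O|lambda)).

    alpha[t, i] = P(o_1, ..., o_{t+1}, i_{t+1} = q_i | lambda)  (0-based t).
    """
    N = len(pi)
    T = len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                 # alpha_1(i) = pi_i * b_{i o_1}
    for t in range(1, T):
        # alpha_{t+1}(i) = [sum_j alpha_t(j) a_{ji}] * b_{i o_{t+1}}
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha, alpha[-1].sum()                # P(O|lambda) = sum_i alpha_T(i)

alpha, prob = forward(pi, A, B, [0, 1, 2])
print(prob)   # should match the brute-force result above
```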
Backward algorithm
As with the forward algorithm, first define the backward probability: given a hidden Markov model $\lambda$, the backward probability is the probability of the partial observation sequence from $t+1$ to $T$, i.e. $o_{t+1}, o_{t+2}, \ldots, o_{T}$, conditioned on the state at time $t$ being $q_i$, written
$$\beta_{t}(i)=P\left(o_{t+1}, o_{t+2}, \ldots, o_{T} \mid i_{t}=q_{i}, \lambda\right)$$
From the definition of the backward probability ($\beta_{T}(i)$ is set to 1 by convention, since there is nothing left to observe after time $T$):
$$\begin{aligned} \beta_{T}(i) &=1 \\ \beta_{T-1}(i) &=P\left(o_{T} \mid i_{T-1}=q_{i}, \lambda\right)=\sum_{j=1}^{N} a_{i j} b_{j o_{T}} \beta_{T}(j) \\ \beta_{T-2}(i) &=P\left(o_{T-1}, o_{T} \mid i_{T-2}=q_{i}, \lambda\right)=\sum_{j=1}^{N} a_{i j} b_{j o_{T-1}} \beta_{T-1}(j) \end{aligned}$$
Proceeding in the same way gives the recursion
$$\beta_{t}(i)=\sum_{j=1}^{N} a_{i j} b_{j o_{t+1}} \beta_{t+1}(j)$$
Using this recursion we can compute $\beta_{1}(i)$, and then
$$P(O \mid \lambda)=P\left(o_{1}, o_{2}, \ldots, o_{T} \mid \lambda\right)=\sum_{i=1}^{N} P\left(o_{1}, i_{1}=q_{i} \mid \lambda\right) P\left(o_{2}, o_{3}, \ldots, o_{T} \mid i_{1}=q_{i}, \lambda\right)=\sum_{i=1}^{N} \pi_{i} b_{i o_{1}} \beta_{1}(i)$$
so $P(O \mid \lambda)$ can also be obtained this way.
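A matching sketch of the backward recursion; the last line checks that the forward and backward computations of $P(O \mid \lambda)$ agree (same hypothetical parameters and `forward` helper as before):

```python
import numpy as np

def backward(pi, A, B, obs):
    """Backward algorithm: returns (beta, P(O|lambda)).

    beta[t, i] = P(o_{t+2}, ..., o_T | i_{t+1} = q_i, lambda)  (0-based t).
    """
    N = len(pi)
    T = len(obs)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                 # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        # beta_t(i) = sum_j a_{ij} b_{j o_{t+1}} beta_{t+1}(j)
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    prob = np.sum(pi * B[:, obs[0]] * beta[0])     # sum_i pi_i b_{i o_1} beta_1(i)
    return beta, prob

beta, prob_b = backward(pi, A, B, [0, 1, 2])
print(np.isclose(prob_b, forward(pi, A, B, [0, 1, 2])[1]))   # True
```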
In summary, both the forward and the backward algorithm first compute local probabilities and then recurse along the sequence to the global probability, with each time step reusing the results computed at the neighboring step. The overall time complexity is about $O(TN^2)$: for $\beta_{T-1}(i)$, for example, each $i$ requires a sum over $j$ of $N$ terms, and there are $N$ values of $i$, giving $N^2$ operations, repeated for $T$ values of $\beta$ in total. This is vastly cheaper than the $O\left(T N^{T}\right)$ of direct computation.
Using the forward and backward probabilities, we can derive formulas for some probabilities involving a single state or a pair of states:
Formula 1
Given the model parameters $\lambda$ and the observations $O$, the probability of being in state $q_i$ at time $t$ is written $\gamma_{t}(i)=P\left(i_{t}=q_{i} \mid O, \lambda\right)$.
It can be computed from the forward and backward probabilities as follows:
$$\gamma_{t}(i)=P\left(i_{t}=q_{i} \mid O, \lambda\right)=\frac{P\left(i_{t}=q_{i}, O \mid \lambda\right)}{P(O \mid \lambda)}=\frac{P\left(i_{t}=q_{i}, O \mid \lambda\right)}{\sum_{j=1}^{N} P\left(i_{t}=q_{j}, O \mid \lambda\right)}$$
By the definitions of the forward and backward probabilities,
$$\alpha_{t}(i) \beta_{t}(i)=P\left(o_{1}, o_{2}, \ldots, o_{t}, i_{t}=q_{i} \mid \lambda\right) P\left(o_{t+1}, o_{t+2}, \ldots, o_{T} \mid i_{t}=q_{i}, \lambda\right)=P\left(i_{t}=q_{i}, O \mid \lambda\right)$$
so
$$\gamma_{t}(i)=\frac{P\left(i_{t}=q_{i}, O \mid \lambda\right)}{\sum_{j=1}^{N} P\left(i_{t}=q_{j}, O \mid \lambda\right)}=\frac{\alpha_{t}(i) \beta_{t}(i)}{\sum_{j=1}^{N} \alpha_{t}(j) \beta_{t}(j)}$$
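A small sketch turning this formula into code, reusing the `forward` and `backward` helpers sketched above:

```python
import numpy as np

def gammas(pi, A, B, obs):
    """gamma[t, i] = P(i_{t+1} = q_i | O, lambda), from alpha and beta (0-based t)."""
    alpha, _ = forward(pi, A, B, obs)
    beta, _ = backward(pi, A, B, obs)
    g = alpha * beta                           # numerator alpha_t(i) * beta_t(i)
    return g / g.sum(axis=1, keepdims=True)    # normalize over states at each t

gamma = gammas(pi, A, B, [0, 1, 2])
print(gamma.sum(axis=1))   # each row sums to 1
```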
Formula 2
Given the model parameters $\lambda$ and the observations $O$, the probability of being in state $q_i$ at time $t$ and in state $q_j$ at time $t+1$ is written $\xi_{t}(i, j)=P\left(i_{t}=q_{i}, i_{t+1}=q_{j} \mid O, \lambda\right)$.
It can also be computed from the forward and backward probabilities:
$$\xi_{t}(i, j)=\frac{P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, O \mid \lambda\right)}{P(O \mid \lambda)}=\frac{P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, O \mid \lambda\right)}{\sum_{i=1}^{N} \sum_{j=1}^{N} P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, O \mid \lambda\right)}$$
Moreover,
$$\begin{aligned} P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, O \mid \lambda\right) &=P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, o_{1}, o_{2}, \ldots, o_{T} \mid \lambda\right) \\ &=P\left(o_{1}, o_{2}, \ldots, o_{t}, i_{t}=q_{i} \mid \lambda\right) P\left(o_{t+1}, i_{t+1}=q_{j} \mid i_{t}=q_{i}, \lambda\right) P\left(o_{t+2}, o_{t+3}, \ldots, o_{T} \mid i_{t+1}=q_{j}, \lambda\right) \\ &=\alpha_{t}(i) a_{i j} b_{j o_{t+1}} \beta_{t+1}(j) \end{aligned}$$
so
$$\xi_{t}(i, j)=\frac{\alpha_{t}(i) a_{i j} b_{j o_{t+1}} \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{t}(i) a_{i j} b_{j o_{t+1}} \beta_{t+1}(j)}$$
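And a corresponding sketch for $\xi_t(i, j)$, again built on the hypothetical helpers above:

```python
import numpy as np

def xis(pi, A, B, obs):
    """xi[t, i, j] = P(i_{t+1} = q_i, i_{t+2} = q_j | O, lambda), 0-based t."""
    alpha, _ = forward(pi, A, B, obs)
    beta, _ = backward(pi, A, B, obs)
    T = len(obs)
    xi = np.zeros((T - 1, len(pi), len(pi)))
    for t in range(T - 1):
        # numerator alpha_t(i) * a_ij * b_{j o_{t+1}} * beta_{t+1}(j)
        num = alpha[t][:, None] * A * B[:, obs[t + 1]][None, :] * beta[t + 1][None, :]
        xi[t] = num / num.sum()        # normalize over all (i, j) pairs
    return xi

xi = xis(pi, A, B, [0, 1, 2])
print(xi.sum(axis=(1, 2)))   # each time slice sums to 1
```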