10.4 Prediction Algorithms (Decoding: $\hat{S} = \arg\max_{S} P(S \mid O, \lambda)$)
This section introduces two algorithms for HMM prediction: the approximation algorithm and the Viterbi algorithm.
10.4.1 The Approximation Algorithm
The idea of the approximation algorithm: at each time $t$, select the state $s_t^*$ that is individually most likely at that time, and take the resulting state sequence $S = (s_{1}^*, s_{2}^*, \cdots, s_{T}^*)$ as the prediction. The algorithm is as follows:
Given a hidden Markov model $\lambda$ and an observation sequence $O$, the probability of being in state $q_i$ at time $t$ is

$$\gamma_{t}(i) = \frac{\alpha_{t}(i)\beta_{t}(i)}{P(O \mid \lambda)} = \frac{\alpha_{t}(i)\beta_{t}(i)}{\sum_{j=1}^{N}\alpha_{t}(j)\beta_{t}(j)} \tag{10.42}$$
The most likely state $s_t^*$ at each time $t$ is

$$s_{t}^* = \arg\max_{1 \leqslant i \leqslant N} \left[ \gamma_{t}(i) \right], \quad t = 1,2,\cdots,T \tag{10.43}$$
This yields the state sequence $S = (s_{1}^*, s_{2}^*, \cdots, s_{T}^*)$.
The advantage of the approximation algorithm is that it is computationally simple. Its drawback is that it cannot guarantee that the predicted state sequence is the most likely sequence as a whole: the predicted sequence may contain parts that cannot actually occur, i.e. adjacent states $i^*$ and $j^*$ whose transition probability is $a_{i^*j^*} = 0$. Despite this, the approximation algorithm is still useful.
Think this through carefully: if the transition probability between state 3 and state 4 is 0, but state 3 and state 4 are the individually most likely states at times $t$ and $t+1$ respectively, then the sequence produced by the approximation algorithm is simply not a valid path.
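Equations (10.42) and (10.43) translate directly into code. Below is a minimal NumPy sketch (an illustrative implementation, not a polished one); the forward and backward recursions are computed from their standard definitions, and the usage example uses the three-state / two-observation parameters of Example 10.3 in the referenced book:

```python
import numpy as np

def approx_decode(A, B, pi, obs):
    """Approximation algorithm: at each time t, pick the individually most
    likely state via gamma_t(i) (Eqs. 10.42-10.43). States are 0-indexed."""
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                  # forward initialization
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0                                # backward initialization
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    gamma = alpha * beta                          # numerator of (10.42)
    gamma /= gamma.sum(axis=1, keepdims=True)     # divide by P(O|lambda)
    return gamma.argmax(axis=1)                   # (10.43)

# Example 10.3 parameters: states 1..3, observations red=0 / white=1
A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
pi = np.array([0.2, 0.4, 0.4])
states = approx_decode(A, B, pi, [0, 1, 0])   # O = (red, white, red)
# states -> [2 1 2], i.e. the sequence (3, 2, 3) in 1-indexed notation
```

Notably, on this example the approximation algorithm returns $(3, 2, 3)$, while the Viterbi algorithm of the next subsection returns $(3, 3, 3)$: the "best state at each time" and the "best path overall" criteria genuinely differ.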
10.4.2 The Viterbi Algorithm
The Viterbi algorithm solves the HMM prediction problem with dynamic programming: it uses dynamic programming to find the maximum-probability path, where each path corresponds to a state sequence. The algorithm is as follows:
First define the maximum path probability $\delta$: $\delta_t(i)$ is the maximum probability over all single paths that are in state $q_i$ at time $t$:

$$\delta_{t}(i) = \max_{s_{1},s_{2},\cdots,s_{t-1}} P(s_{t} = q_{i}, s_{t-1},\cdots,s_{1}, o_{t},\cdots,o_{1} \mid \lambda), \quad i = 1,2,\cdots,N \tag{10.44}$$
The definition above is easy to picture: suppose the process is in state $q_i$ at time $t$; then among all the paths over times $1$ through $t-1$ that can reach state $q_i$ at time $t$, $\delta_t(i)$ is the probability of the best one.

Having defined $\delta_t(i)$, and since we ultimately need to recover the path itself, the natural idea is a recursion relating $\delta_{t+1}(i)$ to $\delta_t(j)$. The relation works out to:
$$\delta_{t+1}(i) = \max_{1 \leqslant j \leqslant N} \left[\delta_{t}(j)\, a_{ji}\right] b_{o_{t+1}}(i), \quad i = 1,2,\cdots,N; \; t = 1,2,\cdots,T-1 \tag{10.45}$$
A detailed derivation of (10.45):
$$
\begin{aligned}
\delta_{t+1}(i) &= \max_{s_{1},\cdots,s_{t}} P(s_{t+1}=q_{i}, s_{t},\cdots,s_{1}, o_{t+1},\cdots,o_{1} \mid \lambda) \\
&= \max_{s_{1},\cdots,s_{t-1}} P(s_{t+1}=q_{i}, s_{t}=q_{j},\cdots,s_{1}, o_{t+1},\cdots,o_{1} \mid \lambda) \\
&= \max_{s_{1},\cdots,s_{t-1}} P(s_{t+1}=q_{i}, s_{t}=q_{j},\cdots,s_{1}, o_{t},\cdots,o_{1} \mid \lambda)\, P(o_{t+1} \mid s_{t+1}=q_{i}, s_{t}=q_{j},\cdots,s_{1}, o_{t},\cdots,o_{1}, \lambda) \\
&= \max_{s_{1},\cdots,s_{t-1}} P(s_{t+1}=q_{i}, s_{t}=q_{j},\cdots,s_{1}, o_{t},\cdots,o_{1} \mid \lambda)\, P(o_{t+1} \mid s_{t+1}=q_{i}, \lambda) \\
&= \max_{s_{1},\cdots,s_{t-1}} P(s_{t}=q_{j},\cdots,s_{1}, o_{t},\cdots,o_{1} \mid \lambda)\, P(s_{t+1}=q_{i} \mid s_{t}=q_{j},\cdots,s_{1}, o_{t},\cdots,o_{1}, \lambda)\, P(o_{t+1} \mid s_{t+1}=q_{i}, \lambda) \\
&= \max_{s_{1},\cdots,s_{t-1}} P(s_{t}=q_{j},\cdots,s_{1}, o_{t},\cdots,o_{1} \mid \lambda)\, P(s_{t+1}=q_{i} \mid s_{t}=q_{j}, \lambda)\, P(o_{t+1} \mid s_{t+1}=q_{i}, \lambda) \\
&= \max_{s_{t}=q_{j}} \delta_{t}(j)\, a_{ji}\, b_{o_{t+1}}(i) \\
&= \max_{1 \leqslant j \leqslant N} \left[\delta_{t}(j)\, a_{ji}\right] b_{o_{t+1}}(i)
\end{aligned}
$$
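One recursion step of (10.45) can also be checked numerically. The sketch below uses the transition matrix of Example 10.3 in the referenced book and the $\delta_1$ values from its first time step ($\delta_1 = \pi_i\, b_{o_1}(i)$ with $o_1 = \text{red}$, $o_2 = \text{white}$); the vectorized `scores` trick is one possible implementation choice, not the only one:

```python
import numpy as np

# One step of the Viterbi recursion (Eq. 10.45).
# delta_t[j] = best path probability in state j at time t,
# A[j, i]    = a_{ji},
# b_next[i]  = b_{o_{t+1}}(i), emission prob of the next observation.
delta_t = np.array([0.10, 0.16, 0.28])        # delta_1 from Example 10.3
A = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
b_next = np.array([0.5, 0.6, 0.3])            # o_2 = white

scores = delta_t[:, None] * A                 # scores[j, i] = delta_t(j) * a_{ji}
delta_next = scores.max(axis=0) * b_next      # delta_{t+1}(i)
psi_next = scores.argmax(axis=0)              # best predecessor j for each i
# delta_next ≈ [0.028, 0.0504, 0.042]; psi_next = [2, 2, 2]
```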
But notice a problem: $\delta_{t+1}(i)$ is only a probability value; it does not store the maximum-probability path itself. So we define another variable $\psi$ to record that path:
$$\psi_{t}(i) = \arg\max_{1 \leqslant j \leqslant N} \left[\delta_{t-1}(j)\, a_{ji}\right]$$
Here, because the emission term $b$ is a fixed value with respect to $j$ at any given time, it can be dropped from the $\arg\max$; the expression records the predecessor state $j$ on the maximum-probability path into state $i$ and stores it in $\psi_t(i)$.
Algorithm 10.5 (The Viterbi Algorithm)
Input: model $\lambda = (A, B, \pi)$ and observations $O = (o_{1}, o_{2}, \cdots, o_{T})$;
Output: the optimal path $S^* = (s_{1}^*, s_{2}^*, \cdots, s_{T}^*)$.
- Initialization:
  $$\delta_{1}(i) = \pi_{i}\, b_{o_{1}}(i), \quad i = 1,2,\cdots,N$$
  $$\psi_{1}(i) = 0, \quad i = 1,2,\cdots,N$$
- Recursion. For $t = 2,3,\cdots,T$:
  $$\delta_{t}(i) = \max_{1 \leqslant j \leqslant N} \left[\delta_{t-1}(j)\, a_{ji}\right] b_{o_{t}}(i), \quad i = 1,2,\cdots,N$$
  $$\psi_{t}(i) = \arg\max_{1 \leqslant j \leqslant N} \left[\delta_{t-1}(j)\, a_{ji}\right], \quad i = 1,2,\cdots,N$$
- Termination:
  $$P^* = \max_{1 \leqslant i \leqslant N} \delta_{T}(i)$$
  $$s_{T}^* = \arg\max_{1 \leqslant i \leqslant N} \left[\delta_{T}(i)\right]$$
- Optimal path backtracking. For $t = T-1, T-2, \cdots, 1$:
  $$s_{t}^* = \psi_{t+1}(s_{t+1}^*)$$

Finally we obtain the optimal path $S^* = (s_{1}^*, s_{2}^*, \cdots, s_{T}^*)$.
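Algorithm 10.5 translates almost line for line into code. The following is a minimal NumPy sketch (illustrative, not library-grade); the usage example runs it on the parameters of Example 10.3 in the referenced book and reproduces its answer $P^* = 0.0147$ with optimal path $(3, 3, 3)$:

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Algorithm 10.5: return the optimal state path (0-indexed) and P*."""
    N, T = A.shape[0], len(obs)
    delta = np.zeros((T, N))           # delta[t, i]: max path prob in state i at t
    psi = np.zeros((T, N), dtype=int)  # psi[t, i]: best predecessor of state i

    delta[0] = pi * B[:, obs[0]]       # initialization
    for t in range(1, T):              # recursion
        scores = delta[t - 1][:, None] * A   # scores[j, i] = delta_{t-1}(j) a_{ji}
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]

    path = np.zeros(T, dtype=int)      # termination + backtracking
    path[-1] = delta[-1].argmax()
    p_star = delta[-1, path[-1]]
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path, p_star

# Example 10.3: three states, observations red=0 / white=1, O = (red, white, red)
A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
pi = np.array([0.2, 0.4, 0.4])
path, p_star = viterbi(A, B, pi, [0, 1, 0])
# path + 1 -> [3 3 3]; p_star -> 0.0147
```

Storing $\psi$ as a full $T \times N$ table mirrors the algorithm statement exactly; the backtracking loop is literally $s_t^* = \psi_{t+1}(s_{t+1}^*)$.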
That was another pile of theory; work through Example 10.3 in the book, which will help you digest all of it quickly.
References
The references for this HMM series of articles:
- Li Hang, *Statistical Learning Methods* (《统计学习方法》)
- YouTube: shuhuai008's HMM video course
- YouTube: Xu Yida's machine learning lectures on HMM and EM
- https://www.huaxiaozhuan.com/%E7%BB%9F%E8%AE%A1%E5%AD%A6%E4%B9%A0/chapters/15_HMM.html : Hidden Markov Models
- https://sm1les.com/2019/04/10/hidden-markov-model/ : The Hidden Markov Model (HMM) and its three basic problems
- https://www.cnblogs.com/skyme/p/4651331.html : Understanding HMMs in one article
- https://www.zhihu.com/question/55974064 : the answer by Nanping Wanzhong

Thanks to the above authors for their contributions to this article; if anything infringes, the relevant content will be removed upon request.