Chapter 10: Hidden Markov Models
The hidden Markov model (HMM) is a statistical model applicable to sequence labeling problems. It describes the process in which a hidden Markov chain randomly generates an observation sequence, and it belongs to the family of generative models.
10.1 Basic Concepts
10.1.1 Definition of the Hidden Markov Model
The hidden Markov model (Hidden Markov Model, HMM) is a probabilistic model of time series. It describes a process in which a hidden Markov chain randomly generates an unobservable random sequence of states, and each state in turn generates an observation, producing a random sequence of observations. The sequence of states generated by the hidden Markov chain is called the state sequence; the random sequence of observations produced from those states is called the observation sequence. Each position in the sequence can be regarded as a point in time. The formal definition is as follows:
![](https://i-blog.csdnimg.cn/blog_migrate/08ac9fc6509aa69bef9a8a1185052ecd.png)
Q is the set of all possible states, and V is the set of all possible observations:

$$Q = \{q_{1}, q_{2}, \cdots, q_{N}\}, \quad V = \{v_{1}, v_{2}, \cdots, v_{M}\}$$
where N is the number of possible states and M is the number of possible observations.
S is a state sequence of length T, and O is the corresponding observation sequence:

$$S = (s_{1}, s_{2}, \cdots, s_{T}), \quad O = (o_{1}, o_{2}, \cdots, o_{T})$$
A is the state transition probability matrix:

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NN} \end{bmatrix} \tag{10.1 supplementary}$$
That is:

$$A = \begin{bmatrix} a_{ij} \end{bmatrix}_{N \times N} \tag{10.1}$$
where

$$a_{ij} = P(s_{t+1} = q_{j} \mid s_{t} = q_{i}), \quad i = 1, 2, \cdots, N; \; j = 1, 2, \cdots, N \tag{10.2}$$

is the probability of transitioning to state $q_{j}$ at time t+1, given that the chain is in state $q_{i}$ at time t.
B is the observation probability matrix:

$$B = \begin{bmatrix} b_{1}(1) & b_{1}(2) & \cdots & b_{1}(M) \\ b_{2}(1) & b_{2}(2) & \cdots & b_{2}(M) \\ \vdots & \vdots & \ddots & \vdots \\ b_{N}(1) & b_{N}(2) & \cdots & b_{N}(M) \end{bmatrix} \tag{10.3 supplementary}$$
That is:

$$B = \begin{bmatrix} b_{j}(k) \end{bmatrix}_{N \times M} \tag{10.3}$$
where

$$b_{j}(k) = P(o_{t} = v_{k} \mid s_{t} = q_{j}), \quad k = 1, 2, \cdots, M; \; j = 1, 2, \cdots, N \tag{10.4}$$

is the probability of generating observation $v_{k}$ at time t, given that the chain is in state $q_{j}$.
π is the initial state probability vector:

$$\pi = (\pi_{1}, \pi_{2}, \cdots, \pi_{N}) \tag{10.5 supplementary}$$
That is:

$$\pi = (\pi_{i}) \tag{10.5}$$
where

$$\pi_{i} = P(s_{1} = q_{i}), \quad i = 1, 2, \cdots, N \tag{10.6}$$

is the probability of being in state $q_{i}$ at time t = 1.
A hidden Markov model is determined by the initial state probability vector π, the state transition probability matrix A, and the observation probability matrix B. π and A determine the state sequence, while B determines the observation sequence. A hidden Markov model λ can therefore be represented by the triple:

$$\lambda = (A, B, \pi) \tag{10.7}$$
A, B, and π are called the three elements of the hidden Markov model.
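To make the three elements concrete, here is a small hypothetical HMM written out in NumPy. The numbers are invented purely for illustration (they are not the ones from the book's Example 10.1), and the code checks the stochasticity constraints that A, B, and π must satisfy:

```python
import numpy as np

# A hypothetical 3-state, 2-observation HMM (illustrative numbers only).
A = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])   # A[i, j] = P(s_{t+1} = q_j | s_t = q_i)
B = np.array([[0.5, 0.5],
              [0.4, 0.6],
              [0.7, 0.3]])        # B[j, k] = b_j(k) = P(o_t = v_k | s_t = q_j)
pi = np.array([0.2, 0.4, 0.4])    # pi[i] = P(s_1 = q_i)

# Every row of A and of B, and pi itself, is a probability distribution,
# so each must sum to 1.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)
```

Any triple (A, B, π) satisfying these row-sum constraints defines a valid HMM.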
From the definition, the hidden Markov model makes two basic assumptions:
(1) Homogeneous Markov assumption: the state of the hidden Markov chain at any time t depends only on the state at time t-1, independently of the states and observations at all other times and of t itself:

$$P(s_{t} \mid s_{t-1}, o_{t-1}, \cdots, s_{1}, o_{1}) = P(s_{t} \mid s_{t-1}), \quad t = 1, 2, \cdots, T \tag{10.8}$$
(2) Observation independence assumption: the observation at any time depends only on the state of the Markov chain at that time, independently of all other observations and states:

$$P(o_{t} \mid s_{T}, o_{T}, \cdots, s_{t+1}, o_{t+1}, s_{t}, s_{t-1}, o_{t-1}, \cdots, s_{1}, o_{1}) = P(o_{t} \mid s_{t}), \quad t = 1, 2, \cdots, T \tag{10.9}$$
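Taken together, assumptions (10.8) and (10.9) imply that the joint probability of a state sequence S and its observation sequence O factorizes entirely into the three model elements; this standard consequence is worth stating explicitly:

```latex
P(O, S \mid \lambda)
  = \pi_{s_{1}} \, b_{s_{1}}(o_{1})
    \prod_{t=2}^{T} a_{s_{t-1} s_{t}} \, b_{s_{t}}(o_{t})
```

Every term on the right is an entry of π, A, or B, which is why λ = (A, B, π) fully specifies the model.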
Reading a pile of definitions without an example rarely produces a deep understanding; it is worth working through Example 10.1 in the book (the box-and-ball model), after which these basic concepts become much easier to grasp.
10.1.2 Generating an Observation Sequence
By the definition of the hidden Markov model, the generation of an observation sequence $O = (o_{1}, o_{2}, \cdots, o_{T})$ of length T can be described as follows.
Algorithm 10.1 (generation of an observation sequence)
Input: hidden Markov model $\lambda = (A, B, \pi)$ and observation sequence length T;
Output: observation sequence $O = (o_{1}, o_{2}, \cdots, o_{T})$.
(1) Draw an initial state $s_{1}$ according to the initial state distribution $\pi$;
(2) Set t = 1;
(3) Generate $o_{t}$ according to the observation probability distribution $b_{s_{t}}(k)$ of state $s_{t}$;
(4) Draw the next state $s_{t+1}$ according to the state transition probability distribution $\{a_{s_{t} s_{t+1}}\}$ of state $s_{t}$, where $s_{t+1} \in \{1, 2, \cdots, N\}$;
(5) Set t = t + 1; if t < T, return to step (3); otherwise, terminate.
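The steps above translate almost line for line into code. Below is a minimal NumPy sketch; it assumes 0-based state and observation indices, and the function name `hmm_generate` is mine, not from the book:

```python
import numpy as np

def hmm_generate(A, B, pi, T, seed=None):
    """Sample a (state sequence, observation sequence) pair of length T.

    A: (N, N) transition matrix; B: (N, M) observation matrix;
    pi: (N,) initial state distribution. Indices are 0-based.
    """
    rng = np.random.default_rng(seed)
    N, M = B.shape
    states, obs = [], []
    s = rng.choice(N, p=pi)                # step (1): draw s_1 from pi
    for _ in range(T):                     # step (2): t = 1, then loop
        states.append(s)
        obs.append(rng.choice(M, p=B[s]))  # step (3): draw o_t from b_{s_t}(.)
        s = rng.choice(N, p=A[s])          # step (4): draw s_{t+1} from row s_t of A
    return states, obs                     # step (5) is the loop's increment/termination
```

For example, `hmm_generate(A, B, pi, T=10)` with the matrices of any valid HMM returns two index lists of length 10.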
10.1.3 The Three Basic Problems of HMMs
(1) Evaluation: given λ, compute P(O|λ). This is where the forward and backward algorithms come in.
(2) Learning: $\lambda_{MLE} = \arg\max_{\lambda} P(O \mid \lambda)$, solved by the Baum-Welch (EM) algorithm.
(3) Decoding: $\hat{S} = \arg\max_{S} P(S \mid O, \lambda)$, solved by the Viterbi algorithm.
Conventionally, (1) is also called the probability computation problem (Section 10.2 of the book), which mainly uses the forward and backward algorithms; (2) is also called the learning problem (Section 10.3), which mainly uses the Baum-Welch algorithm (this algorithm is essentially the EM algorithm; I may post a derivation of EM later); and (3) is also called the prediction problem (Section 10.4), for which the book presents both an approximation algorithm and the Viterbi algorithm.
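As a preview of the Evaluation problem, the forward algorithm computes P(O|λ) in O(N²T) time instead of enumerating all N^T state sequences. Here is a minimal vectorized sketch; the function name `forward_prob` and the NumPy formulation are mine, while the α recursion is the standard one from Section 10.2:

```python
import numpy as np

def forward_prob(A, B, pi, O):
    """Evaluation: compute P(O | lambda) with the forward algorithm.

    alpha_1(i)     = pi_i * b_i(o_1)
    alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(o_{t+1})
    P(O | lambda)  = sum_i alpha_T(i)
    """
    alpha = pi * B[:, O[0]]            # initialization: alpha_1
    for o in O[1:]:
        alpha = (alpha @ A) * B[:, o]  # recursion, vectorized over j
    return alpha.sum()                 # termination
```

For short sequences, the result can be checked against brute-force enumeration of all state sequences, which is a good sanity test when implementing this for the first time.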
References

The following are the references for this series of posts on the HMM:

- 李航, 《统计学习方法》 (Statistical Learning Methods)
- YouTube: shuhuai008's video course on the HMM
- YouTube: 徐亦达's machine learning lectures on the HMM and EM
- [https://www.huaxiaozhuan.com/%E7%BB%9F%E8%AE%A1%E5%AD%A6%E4%B9%A0/chapters/15_HMM.html]: Hidden Markov Model
- [https://sm1les.com/2019/04/10/hidden-markov-model/]: The Hidden Markov Model (HMM) and Its Three Basic Problems
- For a worked example, see [https://www.cnblogs.com/skyme/p/4651331.html]: Understanding the HMM in One Article
- [https://www.zhihu.com/question/55974064]: 南屏晚钟's answer

Thanks to the above authors for their contributions to this post; any infringing content will be removed upon request.