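These notes are based on the posts 《隐马尔科夫模型HMM(一)HMM模型》, 《隐马尔科夫模型HMM(二)前向后向算法评估观察序列概率》, and 《隐马尔科夫模型HMM(四)维特比算法解码隐藏状态序列》. These are parts 1, 2, and 4 of a hidden Markov model (HMM) series: the model itself, the forward-backward algorithm for evaluating the probability of an observation sequence, and the Viterbi algorithm for decoding the hidden state sequence.
I. Background
- Hidden Markov models apply to problems that (a) are sequence-based and (b) have two kinds of values in the sequence: hidden states and observed states.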
- The figure below shows a hidden Markov model of length $T$, with state sequence $I=(i_{1},i_{2},\dots,i_{T})$ and observation sequence $O=(O_{1},O_{2},\dots,O_{T})$. Every hidden state satisfies $i_{t}\in Q=\{q_{1},q_{2},\dots,q_{N}\}$, so there are $N$ possible hidden states. Every observation satisfies $O_{t}\in V=\{v_{1},v_{2},\dots,v_{M}\}$, so there are $M$ possible observed states.
- A hidden Markov model is described by three matrices:
1) The initial probability matrix $\Pi$: the probabilities that $i_{1}$ takes each value in $\{q_{1},q_{2},\dots,q_{N}\}$.
2) The state transition matrix $A$. Each element $a_{i,j}$ is the probability of moving from hidden state $q_{i}$ at time $t$ to hidden state $q_{j}$ at time $t+1$, i.e. $a_{i,j}=P(i_{t+1}=q_{j}|i_{t}=q_{i})$. This is the homogeneous Markov chain assumption: the hidden state at any time depends only on the hidden state immediately before it.
3) The observation matrix $B$. Each element $b_{j}(O_{t})$ is the probability that hidden state $q_{j}$ at time $t$ produces the observation $O_{t}$, i.e. $b_{j}(O_{t})=P(O_{t}|i_{t}=q_{j})$. This is the observation independence assumption: the observation at any time depends only on the hidden state at that same time.
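To make the three matrices concrete, here is a minimal sketch of how $\lambda=(A,B,\Pi)$ might be stored as plain Python lists. The numbers (3 hidden states, 2 observation symbols) are illustrative assumptions, not values from these notes, and `is_distribution` / `is_valid_hmm` are hypothetical helper names.

```python
# Toy HMM with N = 3 hidden states and M = 2 observation symbols.
# All numbers are illustrative assumptions, not values from the notes.
PI = [0.2, 0.4, 0.4]      # PI[i]   = P(i_1 = q_i)
A = [[0.5, 0.2, 0.3],     # A[i][j] = P(i_{t+1} = q_j | i_t = q_i)
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]
B = [[0.5, 0.5],          # B[j][k] = P(O_t = v_k | i_t = q_j)
     [0.4, 0.6],
     [0.7, 0.3]]


def is_distribution(row, tol=1e-9):
    """True if every entry is non-negative and the entries sum to 1."""
    return all(p >= 0 for p in row) and abs(sum(row) - 1.0) < tol


def is_valid_hmm(pi, a, b):
    """Check shapes and that PI and every row of A and B is a distribution."""
    n = len(pi)
    return (is_distribution(pi)
            and len(a) == n
            and all(len(row) == n and is_distribution(row) for row in a)
            and len(b) == n
            and all(is_distribution(row) for row in b))


print(is_valid_hmm(PI, A, B))
```

Each row of $A$ and $B$ is a conditional distribution, which is why the check is row by row.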
II. Generating an observation sequence from a hidden Markov model
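Under the two assumptions from the background section, one way to sketch the generation process is: draw $i_{1}$ from $\Pi$, then at every step draw $O_{t}$ from row $i_{t}$ of $B$ and the next state from row $i_{t}$ of $A$. The function names, the seed parameter, and the toy matrices in the usage note are assumptions made for illustration.

```python
import random


def sample_index(probs, rng):
    """Draw index k with probability probs[k] (inverse-CDF sampling)."""
    u, acc = rng.random(), 0.0
    for k, p in enumerate(probs):
        acc += p
        if u < acc:
            return k
    return len(probs) - 1  # guard against floating-point round-off


def generate(pi, a, b, length, seed=0):
    """Sample (states, observations) of the given length from an HMM."""
    rng = random.Random(seed)
    states, obs = [], []
    state = sample_index(pi, rng)                 # i_1 ~ PI
    for _ in range(length):
        states.append(state)
        obs.append(sample_index(b[state], rng))   # O_t depends only on i_t
        state = sample_index(a[state], rng)       # i_{t+1} depends only on i_t
    return states, obs
```

For example, `generate([0.2, 0.4, 0.4], A, B, 5)` with a 3-by-3 `A` and a 3-by-2 `B` returns two lists of five indices each; the hidden list is what an observer would not get to see.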
III. The three basic problems of hidden Markov models
- Evaluating the probability of an observation sequence. Given the model parameters $\lambda=(A,B,\Pi)$ and an observation sequence $O=(O_{1},O_{2},\dots,O_{T})$, compute $P(O|\lambda)$. This problem is solved with the forward-backward algorithm.
- Learning the model parameters. Given an observation sequence $O=(O_{1},O_{2},\dots,O_{T})$, estimate the parameters $\lambda=(A,B,\Pi)$ that maximize $P(O|\lambda)$. This problem is solved with the Baum-Welch algorithm, an instance of the EM algorithm.
- Prediction, also called decoding. Given the model parameters $\lambda=(A,B,\Pi)$ and an observation sequence $O=(O_{1},O_{2},\dots,O_{T})$, find the most probable state sequence conditioned on the observations. This problem is solved with the Viterbi algorithm, which is based on dynamic programming.
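The decoding problem in the last bullet is not derived in these notes, so the following is only a rough sketch of the standard Viterbi recursion, which keeps the best path probability ending in each state and a back-pointer to the predecessor that achieved it. The function name and the 0-based state and symbol indices are assumptions.

```python
def viterbi(pi, a, b, obs):
    """Return (most probable state path, its joint probability with obs)."""
    n = len(pi)
    # delta[i]: best probability of any path ending in state i at time 1
    delta = [pi[i] * b[i][obs[0]] for i in range(n)]
    backptr = []
    for o in obs[1:]:
        prev, delta, psi = delta, [], []
        for j in range(n):
            # best predecessor of state j at this step
            best_i = max(range(n), key=lambda i: prev[i] * a[i][j])
            psi.append(best_i)
            delta.append(prev[best_i] * a[best_i][j] * b[j][o])
        backptr.append(psi)
    # pick the best final state, then follow back-pointers to time 1
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for psi in reversed(backptr):
        path.append(psi[path[-1]])
    path.reverse()
    return path, delta[last]
```

The structure is the same as the forward recursion in section IV, with the sum over predecessors replaced by a maximum.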
IV. The forward-backward algorithm
The task is to evaluate the probability of an observation sequence: given the model parameters $\lambda=(A,B,\Pi)$ and an observation sequence $O=(O_{1},O_{2},\dots,O_{T})$, compute $P(O|\lambda)$.
- Brute-force solution
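As a hedged illustration of what brute force means here: $P(O|\lambda)$ can be obtained by summing the joint probability $P(O,I|\lambda)$ over all $N^{T}$ possible hidden state sequences $I$. The sketch below does exactly that; the function name and 0-based indices are assumptions, and the exponential cost makes it usable only for tiny examples.

```python
from itertools import product


def brute_force_likelihood(pi, a, b, obs):
    """P(O | lambda) by enumerating every hidden state sequence."""
    n, total = len(pi), 0.0
    for path in product(range(n), repeat=len(obs)):
        # joint probability of this state path and the observations
        p = pi[path[0]] * b[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= a[path[t - 1]][path[t]] * b[path[t]][obs[t]]
        total += p
    return total
```

The loop visits $N^{T}$ paths and does $O(T)$ work per path, which is what motivates the dynamic-programming approach that follows.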
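- Forward algorithm
The forward algorithm is essentially dynamic programming: we look for a recursion over local states and use it to extend the solutions of subproblems step by step to the solution of the whole problem.
Define the forward probability $\alpha_{t}(j)=P(O_{1},O_{2},\dots,O_{t},i_{t}=q_{j}|\lambda)$. Then $\alpha_{t}(j)a_{ji}$ is the probability that the hidden state is $q_{j}$ at time $t$, the hidden state is $q_{i}$ at time $t+1$, and the observations so far are $O_{1},O_{2},\dots,O_{t}$.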
$$
\begin{aligned}
\alpha_{t}(j)a_{ji} &= P(O_{1},O_{2},\dots,O_{t},i_{t}=q_{j}|\lambda)\cdot P(i_{t+1}=q_{i}|i_{t}=q_{j},\lambda) \quad (1)\\
&= P(O_{1},O_{2},\dots,O_{t}|i_{t}=q_{j},\lambda)\cdot P(i_{t}=q_{j}|\lambda)\cdot P(i_{t+1}=q_{i}|i_{t}=q_{j},\lambda) \quad (2)\\
&= P(O_{1},O_{2},\dots,O_{t},i_{t+1}=q_{i}|i_{t}=q_{j},\lambda)\cdot P(i_{t}=q_{j}|\lambda) \quad (3)\\
&= P(O_{1},O_{2},\dots,O_{t},i_{t}=q_{j},i_{t+1}=q_{i}|\lambda)
\end{aligned}
$$
Step (3) uses the fact that, given $i_{t}=q_{j}$, the observations $O_{1},\dots,O_{t}$ and the next state $i_{t+1}$ are conditionally independent.
$$
\sum_{j=1}^{N}\alpha_{t}(j)a_{ji}=P(O_{1},O_{2},\dots,O_{t},i_{t+1}=q_{i}|\lambda)
$$
By the same kind of manipulation as in steps (1) to (3), the following expression simplifies to
$$
\begin{aligned}
\Big[\sum_{j=1}^{N}\alpha_{t}(j)a_{ji}\Big]b_{i}(O_{t+1}) &= P(O_{1},O_{2},\dots,O_{t},i_{t+1}=q_{i}|\lambda)\cdot P(O_{t+1}|i_{t+1}=q_{i},\lambda)\\
&= P(O_{1},O_{2},\dots,O_{t},O_{t+1},i_{t+1}=q_{i}|\lambda)\\
&= \alpha_{t+1}(i)
\end{aligned}
$$
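This gives the recursion from $\alpha_{t}$ to $\alpha_{t+1}$. It starts from $\alpha_{1}(i)=\pi_{i}b_{i}(O_{1})$, where $\pi_{i}=P(i_{1}=q_{i})$, and ends with $P(O|\lambda)=\sum_{i=1}^{N}\alpha_{T}(i)$.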
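The recursion above can be sketched in a few lines of Python. This is a minimal version under assumptions: states and observation symbols are 0-based indices, `b[i][k]` stores $b_{i}(v_{k})$, and no scaling is applied, so long sequences would underflow.

```python
def forward(pi, a, b, obs):
    """Return alpha, where alpha[t][i] = P(O_1..O_{t+1}, state i | lambda).

    The list index t is 0-based, so alpha[0] holds the time-1 values.
    """
    n = len(pi)
    # initial values: alpha_1(i) = pi_i * b_i(O_1)
    alpha = [[pi[i] * b[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        # alpha_{t+1}(i) = [sum_j alpha_t(j) * a_ji] * b_i(O_{t+1})
        alpha.append([sum(prev[j] * a[j][i] for j in range(n)) * b[i][o]
                      for i in range(n)])
    return alpha


def likelihood(pi, a, b, obs):
    """P(O | lambda) = sum_i alpha_T(i)."""
    return sum(forward(pi, a, b, obs)[-1])
```

Each step costs $O(N^{2})$, so the whole pass is $O(TN^{2})$ instead of the exponential cost of enumerating state sequences.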
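- Backward algorithm
The backward algorithm is analogous to the forward algorithm; see the forward-backward post cited at the top of these notes.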
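Since the notes defer the details, here is a hedged sketch using the standard definition $\beta_{t}(i)=P(O_{t+1},\dots,O_{T}|i_{t}=q_{i},\lambda)$, with $\beta_{T}(i)=1$ and the recursion $\beta_{t}(i)=\sum_{j=1}^{N}a_{ij}b_{j}(O_{t+1})\beta_{t+1}(j)$. The function names and 0-based indices are assumptions.

```python
def backward(a, b, obs):
    """Return beta, where beta[t][i] = P(O_{t+2}..O_T | state i at step t+1).

    The list index t is 0-based, so beta[-1] holds the time-T values.
    """
    n = len(a)
    rev = [[1.0] * n]                      # beta_T(i) = 1
    for o in reversed(obs[1:]):            # o plays the role of O_{t+1}
        nxt = rev[-1]
        # beta_t(i) = sum_j a_ij * b_j(O_{t+1}) * beta_{t+1}(j)
        rev.append([sum(a[i][j] * b[j][o] * nxt[j] for j in range(n))
                    for i in range(n)])
    rev.reverse()                          # put time 1 first
    return rev


def likelihood_backward(pi, a, b, obs):
    """P(O | lambda) = sum_i pi_i * b_i(O_1) * beta_1(i)."""
    beta1 = backward(a, b, obs)[0]
    return sum(pi[i] * b[i][obs[0]] * beta1[i] for i in range(len(pi)))
```

The backward pass should give the same $P(O|\lambda)$ as the forward pass, which makes a convenient consistency check.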
V. Some useful probabilities
- Given a time $t$, the parameters $\lambda$, and the observation sequence, find $P(i_{t}=q_{i}|O,\lambda)$. Write this quantity as $\gamma_{t}(i)$, and let $\beta_{t}(i)=P(O_{t+1},\dots,O_{T}|i_{t}=q_{i},\lambda)$ be the backward probability. Then
$$
\begin{aligned}
\gamma_{t}(i) &= P(i_{t}=q_{i}|O,\lambda)\\
&= \frac{P(i_{t}=q_{i},O|\lambda)}{P(O|\lambda)}\\
&= \frac{P(i_{t}=q_{i},O_{1},O_{2},\dots,O_{t},O_{t+1},\dots,O_{T}|\lambda)}{P(O|\lambda)}\\
&= \frac{P(O_{1},O_{2},\dots,O_{t},O_{t+1},\dots,O_{T}|i_{t}=q_{i},\lambda)\cdot P(i_{t}=q_{i}|\lambda)}{P(O|\lambda)}\\
&= \frac{P(O_{1},O_{2},\dots,O_{t}|i_{t}=q_{i},\lambda)\cdot P(i_{t}=q_{i}|\lambda)\cdot P(O_{t+1},\dots,O_{T}|i_{t}=q_{i},\lambda)}{P(O|\lambda)}\\
&= \frac{P(O_{1},O_{2},\dots,O_{t},i_{t}=q_{i}|\lambda)\cdot P(O_{t+1},\dots,O_{T}|i_{t}=q_{i},\lambda)}{P(O|\lambda)}\\
&= \frac{\alpha_{t}(i)\beta_{t}(i)}{P(O|\lambda)}\\
&= \frac{\alpha_{t}(i)\beta_{t}(i)}{\sum_{j=1}^{N}\alpha_{t}(j)\beta_{t}(j)}
\end{aligned}
$$
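The last line of the derivation translates directly into code: multiply $\alpha_{t}$ and $\beta_{t}$ elementwise and normalize. The sketch below is self-contained, so it repeats compact forward and backward passes; all function names and the 0-based indexing are assumptions.

```python
def forward(pi, a, b, obs):
    """alpha[t][i], with alpha_1(i) = pi_i * b_i(O_1)."""
    n = len(pi)
    alpha = [[pi[i] * b[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[j] * a[j][i] for j in range(n)) * b[i][o]
                      for i in range(n)])
    return alpha


def backward(a, b, obs):
    """beta[t][i], with beta_T(i) = 1."""
    n = len(a)
    rev = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = rev[-1]
        rev.append([sum(a[i][j] * b[j][o] * nxt[j] for j in range(n))
                    for i in range(n)])
    rev.reverse()
    return rev


def gamma(pi, a, b, obs):
    """gamma[t][i] = alpha_t(i) * beta_t(i) / sum_j alpha_t(j) * beta_t(j)."""
    out = []
    for al, be in zip(forward(pi, a, b, obs), backward(a, b, obs)):
        joint = [x * y for x, y in zip(al, be)]
        norm = sum(joint)  # equals P(O | lambda) at every time step
        out.append([v / norm for v in joint])
    return out
```

Each row of the result is a distribution over hidden states at one time step, so it sums to 1.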
- Given the model parameters $\lambda$ and the observation sequence $O$, the probability of being in state $q_{i}$ at time $t$ and in state $q_{j}$ at time $t+1$ is written as
$$
\xi_{t}(i,j)=P(i_{t}=q_{i},i_{t+1}=q_{j}|O,\lambda)
$$
$$
\begin{aligned}
\xi_{t}(i,j) &= P(i_{t}=q_{i},i_{t+1}=q_{j}|O,\lambda)\\
&= \frac{P(i_{t}=q_{i},i_{t+1}=q_{j},O|\lambda)}{P(O|\lambda)} \quad \text{(1) conditional probability: move } O \text{ out of the condition}\\
&= \frac{P(i_{t+1}=q_{j},O|i_{t}=q_{i},\lambda)\cdot P(i_{t}=q_{i}|\lambda)}{P(O|\lambda)} \quad \text{(2) factor out } P(i_{t}=q_{i}|\lambda)\\
&= \frac{P(i_{t+1}=q_{j},O_{1},\dots,O_{t},O_{t+1},\dots,O_{T}|i_{t}=q_{i},\lambda)\cdot P(i_{t}=q_{i}|\lambda)}{P(O|\lambda)} \quad \text{(3) expand } O\\
&= \frac{P(O_{1},\dots,O_{t}|i_{t}=q_{i},\lambda)\cdot P(i_{t+1}=q_{j},O_{t+1},\dots,O_{T}|i_{t}=q_{i},\lambda)\cdot P(i_{t}=q_{i}|\lambda)}{P(O|\lambda)} \quad \text{(4) split } O \text{ into } O_{1},\dots,O_{t} \text{ and } O_{t+1},\dots,O_{T}\\
&= \alpha_{t}(i)\cdot\frac{P(i_{t+1}=q_{j},O_{t+1},\dots,O_{T}|i_{t}=q_{i},\lambda)}{P(O|\lambda)} \quad \text{(5) collect } \alpha_{t}(i)\\
&= \alpha_{t}(i)\cdot\frac{P(i_{t}=q_{i},i_{t+1}=q_{j},O_{t+1},\dots,O_{T}|\lambda)}{P(i_{t}=q_{i}|\lambda)\cdot P(O|\lambda)} \quad \text{(6) remove the condition } i_{t}=q_{i}\\
&= \alpha_{t}(i)\cdot\frac{P(i_{t}=q_{i},O_{t+1},\dots,O_{T}|i_{t+1}=q_{j},\lambda)\cdot P(i_{t+1}=q_{j}|\lambda)}{P(i_{t}=q_{i}|\lambda)\cdot P(O|\lambda)} \quad \text{(7) condition on } i_{t+1}=q_{j}\\
&= \alpha_{t}(i)\cdot\frac{P(i_{t}=q_{i}|i_{t+1}=q_{j},\lambda)\cdot P(O_{t+1},\dots,O_{T}|i_{t+1}=q_{j},\lambda)\cdot P(i_{t+1}=q_{j}|\lambda)}{P(i_{t}=q_{i}|\lambda)\cdot P(O|\lambda)} \quad \text{(8) given } i_{t+1}=q_{j}\text{, } i_{t} \text{ is independent of } O_{t+1},\dots,O_{T}\\
&= \alpha_{t}(i)\cdot P(O_{t+1}|i_{t+1}=q_{j},\lambda)\cdot P(O_{t+2},\dots,O_{T}|i_{t+1}=q_{j},\lambda)\cdot\frac{P(i_{t}=q_{i}|i_{t+1}=q_{j},\lambda)\cdot P(i_{t+1}=q_{j}|\lambda)}{P(i_{t}=q_{i}|\lambda)\cdot P(O|\lambda)} \quad \text{(9) given } i_{t+1}=q_{j}\text{, } O_{t+1} \text{ is independent of } O_{t+2},\dots,O_{T}\\
&= \alpha_{t}(i)\cdot b_{j}(O_{t+1})\cdot\beta_{t+1}(j)\cdot\frac{P(i_{t}=q_{i}|i_{t+1}=q_{j},\lambda)\cdot P(i_{t+1}=q_{j}|\lambda)}{P(i_{t}=q_{i}|\lambda)\cdot P(O|\lambda)} \quad \text{(10) substitute the definitions of } b \text{ and } \beta\\
&= \alpha_{t}(i)\cdot b_{j}(O_{t+1})\cdot\beta_{t+1}(j)\cdot\frac{P(i_{t+1}=q_{j}|i_{t}=q_{i},\lambda)}{P(O|\lambda)} \quad \text{(11) Bayes' rule}\\
&= \frac{\alpha_{t}(i)\,a_{ij}\,b_{j}(O_{t+1})\,\beta_{t+1}(j)}{P(O|\lambda)} \quad \text{(12) substitute } a_{ij}
\end{aligned}
$$
$$
P(O|\lambda)=\sum_{r=1}^{N}\sum_{s=1}^{N}\alpha_{t}(r)\,a_{rs}\,b_{s}(O_{t+1})\,\beta_{t+1}(s)
$$
$$
\xi_{t}(i,j)=\frac{\alpha_{t}(i)\,a_{ij}\,b_{j}(O_{t+1})\,\beta_{t+1}(j)}{\sum_{r=1}^{N}\sum_{s=1}^{N}\alpha_{t}(r)\,a_{rs}\,b_{s}(O_{t+1})\,\beta_{t+1}(s)}
$$
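The final expression for the pairwise state probability can be sketched in the same style. The block is self-contained and repeats compact forward and backward passes; the function names, 0-based indices, and the nested-list layout `xi[t][i][j]` are assumptions made for illustration.

```python
def forward(pi, a, b, obs):
    """alpha[t][i], with alpha_1(i) = pi_i * b_i(O_1)."""
    n = len(pi)
    alpha = [[pi[i] * b[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[j] * a[j][i] for j in range(n)) * b[i][o]
                      for i in range(n)])
    return alpha


def backward(a, b, obs):
    """beta[t][i], with beta_T(i) = 1."""
    n = len(a)
    rev = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = rev[-1]
        rev.append([sum(a[i][j] * b[j][o] * nxt[j] for j in range(n))
                    for i in range(n)])
    rev.reverse()
    return rev


def xi(pi, a, b, obs):
    """xi[t][i][j] = P(state i at step t, state j at step t+1 | O, lambda)."""
    alpha, beta = forward(pi, a, b, obs), backward(a, b, obs)
    n, out = len(pi), []
    for t in range(len(obs) - 1):
        # numerator: alpha_t(i) * a_ij * b_j(O_{t+1}) * beta_{t+1}(j)
        joint = [[alpha[t][i] * a[i][j] * b[j][obs[t + 1]] * beta[t + 1][j]
                  for j in range(n)] for i in range(n)]
        norm = sum(map(sum, joint))  # double sum over r and s, i.e. P(O | lambda)
        out.append([[v / norm for v in row] for row in joint])
    return out
```

There are $T-1$ matrices in the result, one per adjacent pair of time steps, and each sums to 1 over all $(i,j)$ pairs.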