Preface
If you are not yet familiar with hidden Markov models, you may want to read my previous post first: 隐马尔可夫模型(HMM)初级篇 (Hidden Markov Model (HMM): an introduction).
Basics
Definitions:
- Observation sequence $O$, with set of observation symbols $V = \{v_1, v_2, ..., v_M\}$
- State sequence $I$, with set of state values $Q = \{q_1, q_2, ..., q_N\}$
- Initial state probability vector $π$
- State transition probability matrix $A$, with $a_{ij} = P(i_{t+1} = q_j \mid i_t = q_i)$
- Observation probability matrix $B$, with $b_j(k) = P(o_t = v_k \mid i_t = q_j)$
$λ = (π, A, B)$ are the three elements of an HMM.
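To make the three elements concrete, here is a small toy model written as plain Python lists. The specific numbers are hypothetical, chosen only so that each row is a valid probability distribution:

```python
# A toy HMM λ = (π, A, B): N = 3 hidden states, M = 2 observation symbols.
pi = [0.2, 0.4, 0.4]           # initial state distribution π, length N
A = [[0.5, 0.2, 0.3],          # A[i][j] = a_ij = P(i_{t+1} = q_j | i_t = q_i)
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]
B = [[0.5, 0.5],               # B[j][k] = b_j(k) = P(o_t = v_k | i_t = q_j)
     [0.4, 0.6],
     [0.7, 0.3]]

# π and every row of A and B must each sum to 1 (they are distributions).
assert abs(sum(pi) - 1.0) < 1e-12
assert all(abs(sum(row) - 1.0) < 1e-12 for row in A + B)
```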
Two assumptions:
- Homogeneous Markov assumption
$$P(i_t \mid i_{t-1}, o_{t-1}, ..., i_1, o_1) = P(i_t \mid i_{t-1}), \quad t = 1, 2, ..., T$$
- Observation independence assumption
$$P(o_t \mid i_T, o_T, i_{T-1}, o_{T-1}, ..., i_{t+1}, o_{t+1}, i_t, i_{t-1}, o_{t-1}, ..., i_1, o_1) = P(o_t \mid i_t)$$
The three HMM problems
- Evaluation: compute $P(O \mid λ)$, solved by the forward and backward algorithms
- Learning: how to estimate $λ$? By the EM algorithm ($λ = \arg\max P(O \mid λ)$)
- Decoding: find the most probable state sequence, i.e. $I = \arg\max P(I \mid O)$
Two related problems follow from decoding: the prediction problem, $P(i_{t+1} \mid o_1, o_2, ..., o_t)$, which predicts the next hidden state; and the filtering problem, $P(i_t \mid o_1, o_2, ..., o_t)$.
Evaluation
Problem statement
Given $λ$, compute $P(O \mid λ)$.
$$P(O \mid λ) = \sum_I P(O, I \mid λ) = \sum_I P(O \mid I, λ)\cdot P(I \mid λ) \ \ \ \ \ ①$$
$$P(I \mid λ) = P(i_1, i_2, ..., i_T \mid λ) = P(i_T \mid i_1, i_2, ..., i_{T-1}, λ)\cdot P(i_1, i_2, ..., i_{T-1} \mid λ) \ \ \ \ \ ②$$
By the homogeneous Markov assumption and the state transition matrix,
$$P(i_T \mid i_1, i_2, ..., i_{T-1}, λ) = P(i_T \mid i_{T-1}) = a_{i_{T-1}, i_T} \ \ \ \ \ ③$$
Similarly, applying the chain rule of conditional probability repeatedly,
$$\begin{aligned} P(i_1, i_2, ..., i_{T-1} \mid λ) &= P(i_{T-1} \mid i_1, ..., i_{T-2}, λ)\cdot P(i_{T-2} \mid i_1, ..., i_{T-3}, λ) \cdot ... \cdot P(i_2 \mid i_1, λ)\cdot P(i_1 \mid λ) \\ &= a_{i_{T-2}, i_{T-1}}\cdot a_{i_{T-3}, i_{T-2}} \cdot ... \cdot a_{i_1, i_2}\cdot π(i_1) \ \ \ \ \ ④ \end{aligned}$$
where $P(i_1 \mid λ)$ is exactly the initial probability $π(i_1)$.
Therefore,
$$\begin{aligned} P(I \mid λ) &= P(i_1, i_2, ..., i_T \mid λ) \\ &= P(i_T \mid i_1, i_2, ..., i_{T-1}, λ)\cdot P(i_1, i_2, ..., i_{T-1} \mid λ) \\ &= a_{i_{T-1}, i_T}\cdot a_{i_{T-2}, i_{T-1}} \cdot ... \cdot a_{i_1, i_2}\cdot π(i_1) \\ &= π(i_1) \cdot \prod_{t=2}^{T} a_{i_{t-1}, i_t} \ \ \ \ \ ⑤ \end{aligned}$$
As the HMM graphical model shows, each observation depends only on the state at the same time step, so
$$P(O \mid I, λ) = \prod_{t=1}^{T} b_{i_t}(o_t) \ \ \ \ \ ⑥$$
Substituting ⑤ and ⑥ into ① gives
$$\begin{aligned} P(O \mid λ) &= \sum_I \left[ π(i_1) \cdot \prod_{t=2}^{T} a_{i_{t-1}, i_t} \cdot \prod_{t=1}^{T} b_{i_t}(o_t) \right] \\ &= \sum_{i_1}\sum_{i_2} ... \sum_{i_T} \left[ π(i_1) \cdot \prod_{t=2}^{T} a_{i_{t-1}, i_t} \cdot \prod_{t=1}^{T} b_{i_t}(o_t) \right] \ \ \ \ \ ⑦ \end{aligned}$$
Each $i_t$ ranges over $N$ possible states, so evaluating ⑦ directly costs $O(TN^T)$: the amount of work grows exponentially with the sequence length $T$.
This brute-force computation is clearly intractable, so a more efficient method is needed.
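To see the cost of ⑦ concretely, a direct implementation simply enumerates all $N^T$ state paths. The model parameters and observation sequence below are hypothetical toy values, used only for illustration:

```python
from itertools import product

# Hypothetical toy λ = (π, A, B): N = 3 states, M = 2 observation symbols.
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
O = [0, 1, 0]                                      # observed sequence, T = 3

def brute_force_evaluate(pi, A, B, O):
    """Direct evaluation of ⑦: sum over all N**T state paths, O(T·N^T)."""
    N, T = len(pi), len(O)
    total = 0.0
    for path in product(range(N), repeat=T):       # all N**T paths i_1..i_T
        p = pi[path[0]] * B[path[0]][O[0]]          # π(i_1) · b_{i_1}(o_1)
        for t in range(1, T):
            p *= A[path[t-1]][path[t]] * B[path[t]][O[t]]  # a_{i_{t-1},i_t} · b_{i_t}(o_t)
        total += p
    return total

print(brute_force_evaluate(pi, A, B, O))           # ≈ 0.130218
```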
Forward algorithm
Define $α_t(i) = P(o_1, o_2, ..., o_t, i_t = q_i \mid λ)$.
Then $α_T(i) = P(O, i_T = q_i \mid λ)$, where $O$ denotes $o_1, o_2, ..., o_T$.
It follows that
$$P(O \mid λ) = \sum_{i=1}^{N} P(O, i_T = q_i \mid λ) = \sum_{i=1}^{N} α_T(i)$$
$$\begin{aligned} α_{t+1}(j) &= P(o_1, o_2, ..., o_{t+1}, i_{t+1} = q_j \mid λ) \\ &= \sum_{i=1}^{N} P(o_1, ..., o_t, o_{t+1}, i_t = q_i, i_{t+1} = q_j \mid λ) \\ &= \sum_{i=1}^{N} P(o_{t+1} \mid o_1, ..., o_t, i_t = q_i, i_{t+1} = q_j, λ)\cdot P(o_1, ..., o_t, i_t = q_i, i_{t+1} = q_j \mid λ) \\ &\overset{\text{observation independence}}{=} \sum_{i=1}^{N} P(o_{t+1} \mid i_{t+1} = q_j)\cdot P(o_1, ..., o_t, i_t = q_i, i_{t+1} = q_j \mid λ) \\ &= \sum_{i=1}^{N} P(o_{t+1} \mid i_{t+1} = q_j)\cdot P(i_{t+1} = q_j \mid o_1, ..., o_t, i_t = q_i, λ)\cdot P(o_1, ..., o_t, i_t = q_i \mid λ) \\ &\overset{\text{homogeneous Markov assumption}}{=} \sum_{i=1}^{N} P(o_{t+1} \mid i_{t+1} = q_j)\cdot P(i_{t+1} = q_j \mid i_t = q_i, λ)\cdot P(o_1, ..., o_t, i_t = q_i \mid λ) \\ &= \sum_{i=1}^{N} P(o_{t+1} \mid i_{t+1} = q_j)\cdot P(i_{t+1} = q_j \mid i_t = q_i, λ)\cdot α_t(i) \\ &= \sum_{i=1}^{N} b_j(o_{t+1})\cdot α_t(i)\cdot a_{ij} \end{aligned}$$
With the initialization $α_1(i) = π(i)\, b_i(o_1)$, this recursion computes $α_T(i)$ step by step, and hence $P(O \mid λ)$. The complexity is $O(T \times N^2)$.
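The recursion can be sketched in a few lines of Python. This is a minimal illustration with hypothetical toy parameters, not production code:

```python
def forward(pi, A, B, O):
    """Forward algorithm for P(O|λ): O(T·N²) instead of O(T·N^T)."""
    N, T = len(pi), len(O)
    alpha = [pi[i] * B[i][O[0]] for i in range(N)]        # α_1(i) = π(i)·b_i(o_1)
    for t in range(1, T):
        # α_{t+1}(j) = b_j(o_{t+1}) · Σ_i α_t(i)·a_ij
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][O[t]]
                 for j in range(N)]
    return sum(alpha)                                      # P(O|λ) = Σ_i α_T(i)

# Hypothetical toy parameters, for illustration only.
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
print(forward(pi, A, B, [0, 1, 0]))                        # ≈ 0.130218
```

On these inputs the result agrees with the brute-force enumeration of ⑦, as it must.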
Backward algorithm
Define $β_t(k) = P(o_{t+1}, ..., o_T \mid i_t = q_k, λ)$.
Then $β_1(k) = P(o_2, ..., o_T \mid i_1 = q_k, λ)$.
$$\begin{aligned} P(O \mid λ) &= P(o_1, o_2, ..., o_T \mid λ) \\ &= \sum_{k=1}^{N} P(o_1, o_2, ..., o_T, i_1 = q_k) \qquad \text{(λ is given, so it is omitted from here on)} \\ &= \sum_{k=1}^{N} P(o_1, o_2, ..., o_T \mid i_1 = q_k)\cdot P(i_1 = q_k) \\ &= \sum_{k=1}^{N} P(o_1, o_2, ..., o_T \mid i_1 = q_k)\cdot π_k \qquad (P(i_1 = q_k)\text{ is the initial probability } π_k) \\ &= \sum_{k=1}^{N} P(o_1 \mid o_2, ..., o_T, i_1 = q_k)\cdot P(o_2, ..., o_T \mid i_1 = q_k)\cdot π_k \\ &\overset{\text{observation independence}}{=} \sum_{k=1}^{N} P(o_1 \mid i_1 = q_k)\cdot P(o_2, ..., o_T \mid i_1 = q_k)\cdot π_k \\ &= \sum_{k=1}^{N} P(o_1 \mid i_1 = q_k)\cdot β_1(k)\cdot π_k \\ &= \sum_{k=1}^{N} b_k(o_1)\cdot β_1(k)\cdot π_k \end{aligned}$$
$$\begin{aligned} β_t(k) &= P(o_{t+1}, ..., o_T \mid i_t = q_k) \qquad \text{(λ omitted as before)} \\ &= \sum_{j=1}^{N} P(o_{t+1}, ..., o_T, i_{t+1} = q_j \mid i_t = q_k) \\ &= \sum_{j=1}^{N} P(o_{t+1}, ..., o_T \mid i_{t+1} = q_j, i_t = q_k)\cdot P(i_{t+1} = q_j \mid i_t = q_k) \\ &= \sum_{j=1}^{N} P(o_{t+1}, ..., o_T \mid i_{t+1} = q_j, i_t = q_k)\cdot a_{kj} \\ &= \sum_{j=1}^{N} P(o_{t+1}, ..., o_T \mid i_{t+1} = q_j)\cdot a_{kj} \qquad \text{(this step is explained below)} \\ &= \sum_{j=1}^{N} P(o_{t+1} \mid o_{t+2}, ..., o_T, i_{t+1} = q_j)\cdot P(o_{t+2}, ..., o_T \mid i_{t+1} = q_j)\cdot a_{kj} \\ &= \sum_{j=1}^{N} P(o_{t+1} \mid o_{t+2}, ..., o_T, i_{t+1} = q_j)\cdot β_{t+1}(j)\cdot a_{kj} \\ &\overset{\text{observation independence}}{=} \sum_{j=1}^{N} P(o_{t+1} \mid i_{t+1} = q_j)\cdot β_{t+1}(j)\cdot a_{kj} \\ &= \sum_{j=1}^{N} b_j(o_{t+1})\cdot a_{kj}\cdot β_{t+1}(j) \end{aligned}$$
Notes:
- Using this recursion, starting from $β_T(k) = 1$ (there are no observations after time $T$), we can compute $β_{T-1}(k), ..., β_1(k)$ in turn and thus obtain $P(O \mid λ)$. The complexity is $O(T \times N^2)$.
In a Bayesian network, $a$ and $c$ are in general dependent, but once $b$ is given the path between them is blocked and $a$, $c$ become conditionally independent: $P(c \mid a, b) = P(c \mid b)$. Applying the same idea to $i_t, i_{t+1}, o_{t+1}$ gives $P(o_{t+1} \mid i_t, i_{t+1}) = P(o_{t+1} \mid i_{t+1})$, which justifies dropping $i_t = q_k$ in the step marked "explained below".
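The backward recursion can likewise be sketched in a few lines; again the parameters are hypothetical toy values. On the same inputs it must agree with the forward algorithm, which is a useful sanity check:

```python
def backward_evaluate(pi, A, B, O):
    """Backward algorithm for P(O|λ), also O(T·N²)."""
    N, T = len(pi), len(O)
    beta = [1.0] * N                                   # β_T(k) = 1 by convention
    for t in range(T - 2, -1, -1):
        # β_t(k) = Σ_j b_j(o_{t+1}) · a_kj · β_{t+1}(j)
        beta = [sum(B[j][O[t + 1]] * A[k][j] * beta[j] for j in range(N))
                for k in range(N)]
    # P(O|λ) = Σ_k b_k(o_1) · β_1(k) · π_k
    return sum(B[k][O[0]] * beta[k] * pi[k] for k in range(N))

# Hypothetical toy parameters, for illustration only.
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
print(backward_evaluate(pi, A, B, [0, 1, 0]))          # ≈ 0.130218
```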