10.1 Hidden Markov Models
In the graphical model, blue circles represent state variables and green circles represent observation variables.
Model parameters and notation:
State set: $Q=\{q_1,\ldots,q_N\}$
Observation set: $V=\{v_1,\ldots,v_M\}$
State sequence: $I=(i_1,\ldots,i_T)$, with $i_t \in Q$
Observation sequence: $O=(o_1,\ldots,o_T)$, with $o_t \in V$
We can then construct the state-transition matrix:
$$\begin{array}{|c|c|c|c|c|c|} \hline & i_{2}=q_{1} & i_{2}=q_{2} & \ldots & i_{2}=q_{N} & \\ \hline i_{1}=q_{1} & a_{11} & a_{12} & \ldots & a_{1N} & a_{1j}=P\left(i_{2}=q_{j} \mid i_{1}=q_{1}\right) \\ \hline i_{1}=q_{2} & a_{21} & a_{22} & \ldots & a_{2N} & a_{2j}=P\left(i_{2}=q_{j} \mid i_{1}=q_{2}\right) \\ \hline \ldots & \ldots & \ldots & \ldots & \ldots & \ldots \\ \hline i_{1}=q_{N} & a_{N1} & a_{N2} & \ldots & a_{NN} & a_{Nj}=P\left(i_{2}=q_{j} \mid i_{1}=q_{N}\right) \\ \hline \end{array}$$
That is,
$$A_{N \times N}=\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NN} \end{bmatrix}$$
Next, we construct the observation probability matrix:
$$\begin{array}{|c|c|c|c|c|} \hline & o_{1}=v_{1} & o_{1}=v_{2} & \ldots & o_{1}=v_{M} \\ \hline i_{1}=q_{1} & b_{1}(1) & b_{1}(2) & \ldots & b_{1}(M) \\ \hline i_{1}=q_{2} & b_{2}(1) & b_{2}(2) & \ldots & b_{2}(M) \\ \hline \ldots & \ldots & \ldots & \ldots & \ldots \\ \hline i_{1}=q_{N} & b_{N}(1) & b_{N}(2) & \ldots & b_{N}(M) \\ \hline \end{array}$$
$$B_{N \times M}=\begin{bmatrix} b_{1}(1) & b_{1}(2) & \cdots & b_{1}(M) \\ b_{2}(1) & b_{2}(2) & \cdots & b_{2}(M) \\ \vdots & \vdots & \ddots & \vdots \\ b_{N}(1) & b_{N}(2) & \cdots & b_{N}(M) \end{bmatrix}$$
We define the initial state probability vector as
$$\pi=\begin{bmatrix} \pi_{1} \\ \pi_{2} \\ \vdots \\ \pi_{N} \end{bmatrix}=\begin{bmatrix} P\left(i_{1}=q_{1}\right) \\ P\left(i_{1}=q_{2}\right) \\ \vdots \\ P\left(i_{1}=q_{N}\right) \end{bmatrix}$$
The model parameters are
$$\lambda=\left(\pi_{N \times 1}, A_{N \times N}, B_{N \times M}\right)$$
The total number of parameters is $N + N \times N + N \times M$. Because $\pi$ and each row of $A$ and $B$ must sum to one, the number of free parameters is $(N-1) + (N \times N - N) + (N \times M - N)$.
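As a quick check of these counts, here is a minimal sketch (assuming a hypothetical 2-state, 2-symbol model; all numbers are illustrative) that stores $\lambda=(\pi, A, B)$ as arrays and recomputes the parameter counts:

```python
import numpy as np

# A hypothetical 2-state, 2-symbol HMM; the numbers are made up for illustration.
N, M = 2, 2
pi = np.array([0.6, 0.4])              # pi[i] = P(i_1 = q_i)
A = np.array([[0.7, 0.3],              # A[i, j] = P(i_{t+1} = q_j | i_t = q_i)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],              # B[i, k] = b_i(k) = P(o_t = v_k | i_t = q_i)
              [0.2, 0.8]])

# Every row of A and B, and pi itself, must be a probability distribution.
assert np.isclose(pi.sum(), 1.0)
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)

total_params = N + N * N + N * M                   # entries stored
free_params = (N - 1) + N * (N - 1) + N * (M - 1)  # after the sum-to-one constraints
print(total_params, free_params)
```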
The hidden Markov model rests on two basic assumptions:
- Homogeneous Markov property: $P\left(i_{t} \mid i_{t-1}, \cdots, i_{1}\right)=P\left(i_{t} \mid i_{t-1}\right)$
- Observation independence: each observation depends only on the current state, i.e. $P\left(o_{t} \mid i_{T}, o_{T}, \cdots, i_{t}, \cdots, i_{1}, o_{1}\right)=P\left(o_{t} \mid i_{t}\right)$
The model also poses three basic problems:
- Probability computation: evaluate $P(O \mid \lambda)$
- Learning: $\arg \max_{\lambda} P(O \mid \lambda)$
- Prediction (decoding): $\arg \max_{I} P(I \mid O, \lambda)$
Probability computation: evaluating $P(O \mid \lambda)$
1. Direct computation
$$P(O \mid \lambda)=\sum_{I} P(O \mid I, \lambda) P(I \mid \lambda)=\sum_{i_{1}, i_{2}, \ldots, i_{T}} \pi_{i_{1}} b_{i_{1}}\left(o_{1}\right) a_{i_{1} i_{2}} b_{i_{2}}\left(o_{2}\right) \cdots a_{i_{T-1} i_{T}} b_{i_{T}}\left(o_{T}\right)$$
However, this sum has computational complexity $O(TN^{T})$.
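The brute-force sum can be coded verbatim as an enumeration over all $N^T$ state paths. The toy parameters below are illustrative assumptions, not from the text:

```python
import itertools
import numpy as np

# Illustrative toy parameters (assumed, not from the text).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
O = [0, 1, 0]                       # observation indices o_1, ..., o_T
N, T = len(pi), len(O)

# P(O|lambda): enumerate all N^T state paths -- O(T N^T) work.
prob = 0.0
for path in itertools.product(range(N), repeat=T):
    p = pi[path[0]] * B[path[0], O[0]]
    for t in range(1, T):
        p *= A[path[t - 1], path[t]] * B[path[t], O[t]]
    prob += p
print(prob)
```

Already for modest $N$ and $T$ this enumeration is infeasible, which motivates the forward and backward algorithms below.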
2. Forward algorithm
Computational complexity: $O(TN^{2})$
We introduce a new variable: $\alpha_{t}(i)=P\left(o_{1}, \cdots, o_{t}, i_{t}=q_{i} \mid \lambda\right),\ i=1, \ldots, N$
Input: hidden Markov model $\lambda$, observation sequence $O$
Output: observation sequence probability $P(O \mid \lambda)$
Steps:
(1) Initialization:
$$\alpha_{1}(i)=\pi_{i} b_{i}\left(o_{1}\right), \quad i=1,2, \ldots, N$$
(2) Recursion: for $t=1,2, \ldots, T-1$,
$$\begin{aligned} \alpha_{t+1}(i)&=P\left(o_{1}, \ldots, o_{t}, o_{t+1}, i_{t+1}=q_{i}\right) \\ &=\sum_{j=1}^{N} P\left(o_{1}, \ldots, o_{t}, i_{t}=q_{j}\right) P\left(o_{t+1} \mid i_{t+1}=q_{i}\right) P\left(i_{t+1}=q_{i} \mid i_{t}=q_{j}\right) \\ &=\left[\sum_{j=1}^{N} \alpha_{t}(j) a_{ji}\right] b_{i}\left(o_{t+1}\right) \end{aligned}$$
(3) Termination:
$$P(O \mid \lambda)=P\left(o_{1}, \ldots, o_{T} \mid \lambda\right)=\sum_{i=1}^{N} P\left(o_{1}, \ldots, o_{T}, i_{T}=q_{i} \mid \lambda\right)=\sum_{i=1}^{N} \alpha_{T}(i)$$
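The three steps above admit a compact vectorized sketch; the toy parameters are illustrative assumptions:

```python
import numpy as np

def forward(pi, A, B, O):
    """Forward algorithm, O(T N^2): alpha[t, i] = P(o_1..o_{t+1}, i_{t+1} = q_i | lambda)."""
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                      # (1) alpha_1(i) = pi_i b_i(o_1)
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]  # (2) [sum_j alpha_t(j) a_ji] b_i(o_{t+1})
    return alpha

# Toy parameters (illustrative assumptions).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
O = [0, 1, 0]
alpha = forward(pi, A, B, O)
prob = alpha[-1].sum()                              # (3) P(O|lambda) = sum_i alpha_T(i)
print(prob)
```

The matrix product `alpha[t-1] @ A` performs the inner sum over $j$ for all $i$ at once, which is where the $O(TN^2)$ cost comes from.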
3. Backward algorithm
Computational complexity: $O(TN^2)$
We introduce a new variable: $\beta_{t}(i)=P\left(o_{t+1}, \cdots, o_{T} \mid i_{t}=q_{i}, \lambda\right),\ i=1, \ldots, N$
Input: hidden Markov model $\lambda$, observation sequence $O$
Output: observation sequence probability $P(O\mid \lambda)$
Steps:
(1) Initialization:
$$\beta_{T}(i)=1, \quad i=1,2, \ldots, N$$
(2) Recursion: for $t=T-1, T-2, \ldots, 1$,
$$\beta_{t}(i)=\sum_{j=1}^{N} a_{ij} b_{j}\left(o_{t+1}\right) \beta_{t+1}(j), \quad i=1,2, \ldots, N$$
(3) Termination:
$$P(O\mid\lambda)=\sum_{i=1}^{N} \pi_{i} b_{i}\left(o_{1}\right) \beta_{1}(i)$$
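A matching sketch of the backward pass, under the same illustrative toy parameters as before; it should reproduce the forward algorithm's $P(O\mid\lambda)$:

```python
import numpy as np

def backward(pi, A, B, O):
    """Backward algorithm, O(T N^2): beta[t, i] = P(o_{t+2}..o_T | i_{t+1} = q_i, lambda)."""
    T, N = len(O), len(pi)
    beta = np.ones((T, N))                            # (1) beta_T(i) = 1
    for t in range(T - 2, -1, -1):                    # (2) t = T-1, ..., 1
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])  # sum_j a_ij b_j(o_{t+1}) beta_{t+1}(j)
    return beta

# Toy parameters (illustrative assumptions).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
O = [0, 1, 0]
beta = backward(pi, A, B, O)
prob = float(np.sum(pi * B[:, O[0]] * beta[0]))       # (3) sum_i pi_i b_i(o_1) beta_1(i)
print(prob)
```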
Learning algorithms: estimating the parameters $\lambda=(\pi, A, B)$
1. Supervised learning
Suppose the training data consist of $S$ observation sequences of equal length together with their state sequences: $\left\{\left(O_{1}, I_{1}\right),\left(O_{2}, I_{2}\right), \ldots,\left(O_{S}, I_{S}\right)\right\}$.
(1) Estimate of the transition probability $a_{ij}$, where $A_{ij}$ counts the transitions from state $q_i$ to state $q_j$ in the samples:
$$\hat{a}_{ij}=\frac{A_{ij}}{\sum_{j=1}^{N} A_{ij}}, \quad i=1,2, \ldots, N ; \; j=1,2, \ldots, N$$
(2) Estimate of the observation probability $b_{j}(k)$, where $B_{jk}$ counts the times state $q_j$ emits symbol $v_k$ in the samples:
$$\hat{b}_{j}(k)=\frac{B_{jk}}{\sum_{k=1}^{M} B_{jk}}, \quad j=1,2, \ldots, N ; \; k=1,2, \ldots, M$$
(3) Estimate of the initial state probability $\pi_{i}$: $\hat{\pi}_{i}$ is the frequency of initial state $q_{i}$ among the $S$ samples.
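These counting estimates can be sketched as follows; the labeled sequences are made-up assumptions purely for illustration:

```python
import numpy as np

# Hypothetical labeled pairs (state sequence, observation sequence),
# with N = 2 states and M = 2 observation symbols; the data is made up.
data = [([0, 0, 1], [0, 1, 1]),
        ([1, 1, 0], [1, 1, 0]),
        ([0, 1, 1], [0, 0, 1])]
N, M = 2, 2

A_cnt = np.zeros((N, N))   # A_cnt[i, j]: transitions q_i -> q_j
B_cnt = np.zeros((N, M))   # B_cnt[j, k]: state q_j emitting v_k
pi_cnt = np.zeros(N)       # initial-state counts
for states, obs in data:
    pi_cnt[states[0]] += 1
    for s, o in zip(states, obs):
        B_cnt[s, o] += 1
    for s, s_next in zip(states, states[1:]):
        A_cnt[s, s_next] += 1

# Normalize each count row into a probability distribution.
A_hat = A_cnt / A_cnt.sum(axis=1, keepdims=True)
B_hat = B_cnt / B_cnt.sum(axis=1, keepdims=True)
pi_hat = pi_cnt / pi_cnt.sum()
print(pi_hat, A_hat, B_hat)
```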
2. Baum-Welch algorithm (the EM algorithm)
Input: observation data $O=\left(o_{1}, o_{2}, \ldots, o_{T}\right)$
Output: hidden Markov model parameters
(1) Initialization: for $n=0$, choose $a_{ij}^{(0)}, b_{j}(k)^{(0)}, \pi_{i}^{(0)}$, giving the model $\lambda^{(0)}=\left(A^{(0)}, B^{(0)}, \pi^{(0)}\right)$
(2) Recursion: for $n=1,2, \cdots$,
$$\begin{gathered} a_{ij}^{(n+1)}=\frac{\sum_{t=1}^{T-1} \xi_{t}(i, j)}{\sum_{t=1}^{T-1} \gamma_{t}(i)} \\ b_{j}(k)^{(n+1)}=\frac{\sum_{t=1, o_{t}=v_{k}}^{T} \gamma_{t}(j)}{\sum_{t=1}^{T} \gamma_{t}(j)} \\ \pi_{i}^{(n+1)}=\gamma_{1}(i) \end{gathered}$$
where
$$\begin{gathered} \gamma_{t}(i)=\frac{\alpha_{t}(i) \beta_{t}(i)}{P(O \mid \lambda)}=\frac{\alpha_{t}(i) \beta_{t}(i)}{\sum_{j=1}^{N} \alpha_{t}(j) \beta_{t}(j)} \\ \xi_{t}(i, j)=\frac{\alpha_{t}(i) a_{ij} b_{j}\left(o_{t+1}\right) \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{t}(i) a_{ij} b_{j}\left(o_{t+1}\right) \beta_{t+1}(j)} \end{gathered}$$
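One full E-step (computing $\gamma$ and $\xi$) followed by the M-step update can be sketched as below, assuming a single observation sequence and made-up initial parameters; by the EM guarantee, one step should not decrease $P(O\mid\lambda)$:

```python
import numpy as np

def baum_welch_step(pi, A, B, O):
    """One Baum-Welch (EM) update for a single observation sequence (a sketch)."""
    T, N = len(O), len(pi)
    O = np.asarray(O)
    # E-step: forward and backward passes.
    alpha = np.zeros((T, N)); beta = np.ones((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    prob = alpha[-1].sum()                       # P(O|lambda)
    gamma = alpha * beta / prob                  # gamma[t, i]
    # xi[t, i, j] = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O|lambda)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, O[1:]].T * beta[1:])[:, None, :]) / prob
    # M-step: the update formulas above.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.stack([gamma[O == k].sum(axis=0) for k in range(B.shape[1])], axis=1)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B, prob

# Toy initial model (illustrative assumptions).
pi = np.array([0.5, 0.5])
A = np.array([[0.6, 0.4], [0.3, 0.7]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
O = [0, 1, 0, 0, 1]
pi2, A2, B2, p_old = baum_welch_step(pi, A, B, O)
_, _, _, p_new = baum_welch_step(pi2, A2, B2, O)
print(p_old, p_new)
```

In practice the step is iterated until the likelihood converges; with multiple sequences, the counts in the numerators and denominators are summed across sequences.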
Prediction algorithms
Goal: compute $\arg \max_{I} P(I \mid O, \lambda)$
1. Approximate algorithm
$$i_{t}^{*}=\arg \max _{1 \leq i \leq N}\left[\gamma_{t}(i)\right], \quad t=1,2, \ldots, T$$
where $\gamma_{t}(i)=P\left(i_{t}=q_{i} \mid O, \lambda\right)$.
For $t=1$:
$$i_{1}^{*}=\arg \max _{1 \leq j \leq N} P\left(i_{1}=q_{j} \mid O, \lambda\right)=\arg \max \left\{P\left(i_{1}=q_{1} \mid O, \lambda\right), \ldots, P\left(i_{1}=q_{N} \mid O, \lambda\right)\right\}$$
For $t=2$:
$$i_{2}^{*}=\arg \max _{1 \leq j \leq N} P\left(i_{2}=q_{j} \mid O, \lambda\right)$$
and so on; finally
$$I^{*}=\left(i_{1}^{*}, \ldots, i_{T}^{*}\right)$$
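A sketch of this per-step decoding via the forward-backward $\gamma$ values (toy parameters assumed for illustration):

```python
import numpy as np

# Toy parameters (illustrative assumptions).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
O = [0, 1, 0]
T, N = len(O), len(pi)

# Forward-backward to get gamma_t(i) = alpha_t(i) beta_t(i) / P(O|lambda).
alpha = np.zeros((T, N)); beta = np.ones((T, N))
alpha[0] = pi * B[:, O[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
gamma = alpha * beta / alpha[-1].sum()

# i_t^* = argmax_i gamma_t(i), chosen independently at each time step.
I_star = gamma.argmax(axis=1)
print(I_star)
```

Because each $i_t^*$ is chosen independently, the resulting sequence can contain transitions with $a_{ij}=0$, i.e., a path the model cannot actually generate; the Viterbi algorithm below avoids this.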
2. Viterbi algorithm
Input: model $\lambda=(A, B, \pi)$ and observation data $O=(o_1, o_2, \cdots, o_T)$
Output: optimal path $I^{*}=\left(i_{1}^{*}, \ldots, i_{T}^{*}\right)$
(1) Initialization:
$$\begin{gathered} \delta_{1}(i)=\pi_{i} b_{i}\left(o_{1}\right), \quad i=1,2, \ldots, N \\ \Psi_{1}(i)=0, \quad i=1,2, \ldots, N \end{gathered}$$
(2) Recursion: for $t=2,3, \cdots, T$,
$$\begin{gathered} \delta_{t}(i)=\max _{1 \leq j \leq N}\left[\delta_{t-1}(j) a_{ji}\right] b_{i}\left(o_{t}\right), \quad i=1,2, \ldots, N \\ \Psi_{t}(i)=\arg \max _{1 \leq j \leq N}\left[\delta_{t-1}(j) a_{ji}\right], \quad i=1,2, \ldots, N \end{gathered}$$
(3) Termination:
$$\begin{gathered} P^{*}=\max _{1 \leq i \leq N} \delta_{T}(i) \\ i_{T}^{*}=\arg \max _{1 \leq i \leq N}\left[\delta_{T}(i)\right] \end{gathered}$$
(4) Backtracking the optimal path: for $t=T-1, T-2, \cdots, 1$,
$$i_{t}^{*}=\Psi_{t+1}\left(i_{t+1}^{*}\right)$$
This yields the optimal path
$$I^{*}=\left(i_{1}^{*}, i_{2}^{*}, \ldots, i_{T}^{*}\right)$$
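Steps (1)-(4) can be sketched as below (toy parameters assumed for illustration); note the recursion maximizes $\delta_{t-1}(j)\,a_{ji}$, i.e., over transitions *into* state $q_i$:

```python
import numpy as np

def viterbi(pi, A, B, O):
    """Viterbi decoding: the most probable state path and its probability."""
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, O[0]]                     # (1) delta_1(i) = pi_i b_i(o_1)
    for t in range(1, T):                          # (2) recursion
        trans = delta[t - 1][:, None] * A          # trans[j, i] = delta_{t-1}(j) a_ji
        psi[t] = trans.argmax(axis=0)              # Psi_t(i)
        delta[t] = trans.max(axis=0) * B[:, O[t]]  # delta_t(i)
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()                  # (3) i_T^* = argmax_i delta_T(i)
    for t in range(T - 2, -1, -1):                 # (4) backtracking
        path[t] = psi[t + 1][path[t + 1]]
    return path, delta[-1].max()

# Toy parameters (illustrative assumptions).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
O = [0, 1, 0]
path, p_star = viterbi(pi, A, B, O)
print(path, p_star)
```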
Summary
- State chain, observation chain, state-transition matrix, observation probability matrix
- Two assumptions of the hidden Markov model: homogeneous Markov property, observation independence
- Probability computation: direct computation, forward algorithm, backward algorithm
- Learning algorithm: EM (Baum-Welch) algorithm
- Prediction algorithms: approximate algorithm, Viterbi algorithm
10.2 The Viterbi Algorithm
As shown in the previous section, the algorithm introduces a new variable
$$\delta_{t}(j)=\max _{i_{1}, i_{2}, \ldots, i_{t-1}} P\left(i_{1}, \ldots, i_{t-1}, i_{t}=j, o_{t}, \ldots, o_{1} \mid \lambda\right), \quad j=1,2, \ldots, N$$
Derivation:
$$\begin{aligned} \delta_{t+1}(i)&=\max _{i_{1}, i_{2}, \ldots, i_{t}} P\left(i_{1}, \ldots, i_{t}, i_{t+1}=i, o_{t+1}, \ldots, o_{1} \mid \lambda\right) \\ &=\max _{1 \leq j \leq N} \delta_{t}(j) P\left(o_{t+1} \mid i_{t+1}=q_{i}\right) P\left(i_{t+1}=q_{i} \mid i_{t}=q_{j}\right) \\ &=\max _{1 \leq j \leq N}\left[\delta_{t}(j) a_{ji}\right] b_{i}\left(o_{t+1}\right), \quad i=1,2, \ldots, N ; \; t=1,2, \ldots, T-1 \end{aligned}$$
In essence, the Viterbi algorithm solves the prediction problem of the hidden Markov model by dynamic programming.