HMM prediction can be done in two different ways: the approximation method and the Viterbi algorithm. The prediction problem of an HMM is to find the state sequence given the model $\lambda=(A,B,\pi)$ and the observation sequence $O=(o_1,o_2,\cdots,o_T)$.
Approximation method
The probability of being in state $i_t=q_i$ at time $t$, given $\lambda=(A,B,\pi)$ and $O=\left( o_1, o_2, \cdots, o_T \right)$, is denoted $\gamma_t(i)$:
$$\gamma_t(i)=\frac{\alpha_t(i)\beta_t(i)}{P(O|\lambda)}=\frac{\alpha_t(i)\beta_t(i)}{\sum_{j=1}^{N}\alpha_t(j)\beta_t(j)}$$
Now, the most likely state $i^{*}_t$ at time $t$ is:

$$i^{*}_t=\arg\max_{1\leqslant i\leqslant N} [\gamma_t(i)], \quad t=1,2,\cdots, T$$
Finally, we get a state sequence $I^{*}=\left( i_1^{*}, i_2^{*}, \cdots, i_T^{*} \right)$. One drawback of this approximation algorithm is that the generated state sequence might be impossible: since each state is chosen independently, the sequence may use a transition with $a_{ij}=0$ for some $i,j$.
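The approximation method can be sketched in a few lines of pure Python, assuming the three-state model from the example at the end of this note (observations encoded as red = 0, white = 1); `alpha` and `beta` are the standard forward and backward variables:

```python
# lambda = (A, B, pi): the example model from this note.
A = [[0.5, 0.2, 0.3],
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]
B = [[0.5, 0.5],
     [0.4, 0.6],
     [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
O = [0, 1, 0]  # (red, white, red)
N, T = len(pi), len(O)

# Forward pass: alpha_t(i) = P(o_1..o_t, i_t = q_i | lambda)
alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
for t in range(1, T):
    alpha.append([sum(alpha[t - 1][j] * A[j][i] for j in range(N)) * B[i][O[t]]
                  for i in range(N)])

# Backward pass: beta_t(i) = P(o_{t+1}..o_T | i_t = q_i, lambda)
beta = [[1.0] * N for _ in range(T)]
for t in range(T - 2, -1, -1):
    beta[t] = [sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] for j in range(N))
               for i in range(N)]

# gamma_t(i) = alpha_t(i) beta_t(i) / sum_j alpha_t(j) beta_t(j)
gamma = []
for t in range(T):
    num = [alpha[t][i] * beta[t][i] for i in range(N)]
    z = sum(num)
    gamma.append([x / z for x in num])

# Pick the individually most likely state at each step (reported 1-indexed).
path = [max(range(N), key=lambda i: gamma[t][i]) + 1 for t in range(T)]
print(path)  # → [3, 2, 3]
```

Note that on this model the per-step argmax path differs from the Viterbi path computed later, which illustrates that the approximation optimizes each state in isolation rather than the sequence as a whole.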
Viterbi
The Viterbi algorithm uses a dynamic programming approach to solve the HMM prediction problem. Let's first introduce two variables, $\delta$ and $\Psi$:

$$\delta_t(i)=\max_{i_1,i_2,\cdots,i_{t-1}}P(i_t=i,i_{t-1},\cdots,i_1,o_t,\cdots, o_1|\lambda), \quad i=1,2,\cdots, N$$
Now, the recursion of $\delta$ is:

$$\begin{aligned} \delta_{t+1}(i)&=\max_{i_1,i_2,\cdots,i_{t}}P(i_{t+1}=i,i_{t},\cdots,i_1,o_{t+1},\cdots, o_1|\lambda), \quad i=1,2,\cdots, N\\ &=\max_{1\leqslant j\leqslant N}[\delta_t(j)a_{ji}]b_i(o_{t+1}), \quad i=1,2,\cdots, N;\ t=1,2,\cdots,T-1 \end{aligned}$$
The definition of $\Psi$ is:

$$\Psi_{t}(i)=\arg \max_{1\leqslant j\leqslant N}[\delta_{t-1}(j)a_{ji}], \quad i=1,2,\cdots, N$$
At last, we can develop the Viterbi algorithm as below:
- Input: $\lambda=(A,B,\pi)$ and the observation sequence $O=\left( o_1, o_2, \cdots, o_T \right)$;
- Output: the optimal path $I^{*}=\left( i^{*}_1, i^{*}_2, \cdots, i^{*}_T \right)$.
Algorithm:
- Initialization:
$$\delta_1(i)=\pi_ib_i(o_1), \quad i=1,2, \cdots, N$$
$$\Psi_1(i)=0, \quad i=1,2, \cdots, N$$
- Recursion, for $t=2,3, \cdots, T$:
$$\delta_{t}(i)=\max_{1 \leqslant j \leqslant N}\left[\delta_{t-1}(j) a_{j i}\right] b_{i}\left(o_{t}\right), \quad i=1,2, \cdots, N$$
$$\Psi_{t}(i)=\arg \max_{1 \leqslant j \leqslant N}\left[\delta_{t-1}(j) a_{j i}\right], \quad i=1,2, \cdots, N$$
- Stop:
$$P^{*}=\max_{1 \leqslant i \leqslant N} \delta_{T}(i)$$
$$i_{T}^{*}=\arg \max_{1 \leqslant i \leqslant N}\left[\delta_{T}(i)\right]$$
- Backtracking, for $t=T-1, T-2, \cdots, 1$:
$$i_{t}^{*}=\Psi_{t+1}\left(i_{t+1}^{*}\right)$$
Finally, we will get the best path $I^{*}=\left(i_{1}^{*}, i_{2}^{*}, \cdots, i_{T}^{*}\right)$.
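The steps above can be sketched in pure Python; `A`, `B`, `pi` follow the notation $\lambda=(A,B,\pi)$, states and observations are 0-indexed internally, and the returned path is 1-indexed to match the text:

```python
def viterbi(A, B, pi, O):
    """Most likely state sequence and its probability for observations O."""
    N, T = len(pi), len(O)
    # Initialization: delta_1(i) = pi_i * b_i(o_1), Psi_1(i) = 0
    delta = [pi[i] * B[i][O[0]] for i in range(N)]
    psi = [[0] * N]
    # Recursion for t = 2, ..., T
    for t in range(1, T):
        new_delta, new_psi = [], []
        for i in range(N):
            best_j = max(range(N), key=lambda j: delta[j] * A[j][i])
            new_delta.append(delta[best_j] * A[best_j][i] * B[i][O[t]])
            new_psi.append(best_j)
        delta = new_delta
        psi.append(new_psi)
    # Stop: P* = max_i delta_T(i), i_T* = argmax_i delta_T(i)
    best = max(range(N), key=lambda i: delta[i])
    p_star = delta[best]
    # Backtracking: i_t* = Psi_{t+1}(i_{t+1}*)
    path = [best]
    for t in range(T - 1, 0, -1):
        path.append(psi[t][path[-1]])
    path.reverse()
    return p_star, [i + 1 for i in path]  # states reported 1-indexed
```

Running it on the example model below (with red = 0, white = 1) reproduces the hand computation that follows.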
Example
For example, given $\lambda=(A,B,\pi)$ with

$$A=\left[\begin{array}{ccc} 0.5 & 0.2 & 0.3 \\ 0.3 & 0.5 & 0.2 \\ 0.2 & 0.3 & 0.5 \end{array}\right], \quad B=\left[\begin{array}{cc} 0.5 & 0.5 \\ 0.4 & 0.6 \\ 0.7 & 0.3 \end{array}\right], \quad \pi=\left[\begin{array}{c} 0.2 \\ 0.4 \\ 0.4 \end{array}\right]$$
and the observation $O=(\text{red}, \text{white}, \text{red})$, what is the best path $I^{*}=\left(i_{1}^{*}, i_{2}^{*}, i_{3}^{*}\right)$?
- Initialization:
$$\delta_{1}(1)=0.2\times 0.5=0.10, \quad \delta_{1}(2)=0.4\times 0.4=0.16, \quad \delta_{1}(3)=0.4\times 0.7=0.28$$
$$\Psi_{1}(i)=0, \quad i=1,2,3$$
- $t=2$:
$$\begin{aligned} \delta_{2}(1) &=\max_{1 \leqslant j \leqslant 3}\left[\delta_{1}(j) a_{j 1}\right] b_{1}\left(o_{2}\right) \\ &=\max_{j}\{0.10 \times 0.5,\ 0.16 \times 0.3,\ 0.28 \times 0.2\} \times 0.5 \\ &=0.028 \\ \Psi_{2}(1) &=3 \\ \delta_{2}(2) &=0.0504 \\ \Psi_{2}(2) &=3 \\ \delta_{2}(3) &=0.042 \\ \Psi_{2}(3) &=3 \end{aligned}$$
- $t=3$:
$$\begin{array}{l} \delta_{3}(i)=\max_{1 \leqslant j \leqslant 3}\left[\delta_{2}(j) a_{j i}\right] b_{i}\left(o_{3}\right) \\ \Psi_{3}(i)=\arg \max_{1 \leqslant j \leqslant 3}\left[\delta_{2}(j) a_{j i}\right] \\ \delta_{3}(1)=0.00756, \quad \Psi_{3}(1)=2 \\ \delta_{3}(2)=0.01008, \quad \Psi_{3}(2)=2 \\ \delta_{3}(3)=0.0147, \quad \Psi_{3}(3)=3 \end{array}$$
- Stop:
$$P^{*}=\max_{1 \leqslant i \leqslant 3} \delta_{3}(i)=0.0147$$
$$i_{3}^{*}=\arg \max_{i}\left[\delta_{3}(i)\right]=3$$
- Backtracking:
$$\begin{array}{r} i_{2}^{*}=\Psi_{3}\left(i_{3}^{*}\right)=\Psi_{3}(3)=3 \\ i_{1}^{*}=\Psi_{2}\left(i_{2}^{*}\right)=\Psi_{2}(3)=3 \end{array}$$
At last, we get the best path $I^{*}=\left(i_{1}^{*}, i_{2}^{*}, i_{3}^{*}\right)=(3,3,3)$.
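As a sanity check, the worked numbers above can be reproduced in a few lines of pure Python (states 0-indexed internally; observations red = 0, white = 1):

```python
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
O = [0, 1, 0]  # (red, white, red)

# Initialization: delta_1(i) = pi_i * b_i(o_1)
delta = [pi[i] * B[i][O[0]] for i in range(3)]  # [0.10, 0.16, 0.28]
psi = [None]
# Recursion for t = 2, 3
for t in (1, 2):
    scores = [[delta[j] * A[j][i] for j in range(3)] for i in range(3)]
    psi.append([max(range(3), key=lambda j: scores[i][j]) for i in range(3)])
    delta = [max(scores[i]) * B[i][O[t]] for i in range(3)]

# Stop and backtracking
i3 = max(range(3), key=lambda i: delta[i])
p_star = delta[i3]
i2 = psi[2][i3]
i1 = psi[1][i2]
print(round(p_star, 4), [i1 + 1, i2 + 1, i3 + 1])  # → 0.0147 [3, 3, 3]
```

This matches the hand computation: $P^{*}=0.0147$ and $I^{*}=(3,3,3)$.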