电信保温杯笔记: Statistical Learning Methods (2nd Edition) by Li Hang, Chapter 10: Hidden Markov Models
Papers
HMM algorithm: "An introduction to hidden Markov models"; "A tutorial on hidden Markov models and selected applications in speech recognition"
Baum-Welch algorithm: "A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains"
Introduction
电信保温杯笔记: Statistical Learning Methods (2nd Edition) by Li Hang
This article is a close reading of the book, with many screenshots of the original text, along with detailed explanations and rewrites of the parts the book does not cover in depth.
Hidden Markov Models
Definition of the Hidden Markov Model
For the NLP tagging problem, i.e. labeling each word in a sentence with its part of speech (such as n, adj, v), the state set Q is {n, adj, v, …} and the observation set V is the set of words.
Basic Assumptions of the Hidden Markov Model
Example
The boxes are the states; the colors of the balls are the observations.
Uses of the Hidden Markov Model
Generating an Observation Sequence
For example, automatic text generation.
Predicting the State Sequence
For the NLP tagging problem, i.e. labeling each word in a sentence with its part of speech (such as n, adj, v), the state set Q is {n, adj, v, …}, the observation set V is the set of words, and the observation sequence is the sentence itself.
The Three Basic Problems of the Hidden Markov Model and Their Solutions
The Three Basic Problems
Basic problem 1: computing the probability of an observation sequence
Basic problem 2: estimating the model parameters
Basic problem 3: finding the most probable state sequence
The following sections present the solution to each basic problem in turn.
Solving Basic Problem 1
Direct Computation
There are $N^T$ possible state sequences in total, and formula (10.12) multiplies $2T$ factors for each of them, so direct computation takes $O(TN^T)$ time.
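As a sketch, direct computation is a brute-force enumeration over all $N^T$ state sequences. The parameters below are those of the book's three-box ball-drawing example ($N = 3$ states, observations encoded as red = 0, white = 1); `direct_prob` is a hypothetical helper name.

```python
from itertools import product

# Three-box ball-drawing model from the book; O = (red, white, red).
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
O = [0, 1, 0]

def direct_prob(pi, A, B, O):
    """Enumerate all N^T state sequences; each term is a product of
    2T factors (initial/transition probabilities and emission probabilities)."""
    N, T = len(pi), len(O)
    total = 0.0
    for I in product(range(N), repeat=T):  # N^T state sequences
        p = pi[I[0]] * B[I[0]][O[0]]
        for t in range(1, T):
            p *= A[I[t - 1]][I[t]] * B[I[t]][O[t]]
        total += p
    return total

print(round(direct_prob(pi, A, B, O), 5))  # 0.13022
```

This is only feasible for tiny $T$; the forward algorithm below reduces the cost to $O(TN^2)$.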
Forward Algorithm
$$
\begin{aligned}
\alpha_{t+1}(i) &= P(o_1, o_2, \cdots, o_t, o_{t+1}, i_{t+1} = q_i \mid \lambda) \\
&= \sum_{j=1}^N P(o_1, o_2, \cdots, o_t, o_{t+1}, i_t = q_j, i_{t+1} = q_i \mid \lambda) \\
&= \sum_{j=1}^N P(o_1, o_2, \cdots, o_t, i_t = q_j, i_{t+1} = q_i \mid \lambda) \, P(o_{t+1} \mid o_1, o_2, \cdots, o_t, i_t = q_j, i_{t+1} = q_i, \lambda) \\
&= \sum_{j=1}^N P(o_1, o_2, \cdots, o_t, i_t = q_j, i_{t+1} = q_i \mid \lambda) \, P(o_{t+1} \mid i_{t+1} = q_i, \lambda) \quad \text{(by assumption 2)} \\
&= \sum_{j=1}^N P(o_1, o_2, \cdots, o_t, i_t = q_j, i_{t+1} = q_i \mid \lambda) \, b_i(o_{t+1}) \\
&= \sum_{j=1}^N P(o_1, o_2, \cdots, o_t, i_t = q_j \mid \lambda) \, P(i_{t+1} = q_i \mid o_1, o_2, \cdots, o_t, i_t = q_j, \lambda) \, b_i(o_{t+1}) \\
&= \sum_{j=1}^N P(o_1, o_2, \cdots, o_t, i_t = q_j \mid \lambda) \, P(i_{t+1} = q_i \mid i_t = q_j, \lambda) \, b_i(o_{t+1}) \quad \text{(by assumption 1)} \\
&= \sum_{j=1}^N \alpha_t(j) \, a_{ji} \, b_i(o_{t+1}), \quad i = 1, 2, \cdots, N \quad (10.16)
\end{aligned}
$$
(3) Termination
$$
\begin{aligned}
P(O \mid \lambda) &= P(o_1, o_2, \cdots, o_T \mid \lambda) \\
&= \sum_{i=1}^N P(o_1, o_2, \cdots, o_T, i_T = q_i \mid \lambda) \\
&= \sum_{i=1}^N \alpha_T(i) \quad (10.17)
\end{aligned}
$$
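The forward algorithm's initialization, recursion (10.16), and termination (10.17) steps can be sketched in Python; `forward` is a hypothetical helper name, and the parameters reuse the book's three-box example.

```python
def forward(pi, A, B, O):
    """Forward algorithm: O(T * N^2) instead of O(T * N^T)."""
    N, T = len(pi), len(O)
    # (1) Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][O[0]] for i in range(N)]
    # (2) Recursion: alpha_{t+1}(i) = (sum_j alpha_t(j) * a_ji) * b_i(o_{t+1})
    for t in range(1, T):
        alpha = [sum(alpha[j] * A[j][i] for j in range(N)) * B[i][O[t]]
                 for i in range(N)]
    # (3) Termination: P(O | lambda) = sum_i alpha_T(i)
    return sum(alpha)

A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
print(round(forward(pi, A, B, [0, 1, 0]), 5))  # 0.13022
```

Note that only the previous column of $\alpha$ values is kept, since the recursion never looks further back.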
Example
Backward Algorithm
(1) Initialization. There are no observations after time $T$, and the probability of the empty observation sequence is defined to be 1:

$$
\beta_T(i) = 1, \quad i = 1, 2, \cdots, N \quad (10.19)
$$
(2) For $t = T-1, T-2, \cdots, 1$:
$$
\begin{aligned}
\beta_t(i) &= P(o_{t+1}, o_{t+2}, \cdots, o_T \mid i_t = q_i, \lambda) \\
&= \sum_{j=1}^N P(i_{t+1} = q_j, o_{t+1}, o_{t+2}, \cdots, o_T \mid i_t = q_i, \lambda) \\
&= \sum_{j=1}^N P(o_{t+1}, o_{t+2}, \cdots, o_T \mid i_t = q_i, i_{t+1} = q_j, \lambda) \, P(i_{t+1} = q_j \mid i_t = q_i, \lambda) \\
&= \sum_{j=1}^N P(o_{t+1}, o_{t+2}, \cdots, o_T \mid i_t = q_i, i_{t+1} = q_j, \lambda) \, a_{ij} \\
&= \sum_{j=1}^N P(o_{t+1}, o_{t+2}, \cdots, o_T \mid i_{t+1} = q_j, \lambda) \, a_{ij} \quad \text{(by assumption 2)} \\
&= \sum_{j=1}^N P(o_{t+1} \mid i_{t+1} = q_j, \lambda) \, P(o_{t+2}, \cdots, o_T \mid i_{t+1} = q_j, \lambda) \, a_{ij} \\
&= \sum_{j=1}^N a_{ij} \, b_j(o_{t+1}) \, \beta_{t+1}(j), \quad i = 1, 2, \cdots, N \quad (10.20)
\end{aligned}
$$
(3)
$$
\begin{aligned}
P(O \mid \lambda) &= P(o_1, o_2, \cdots, o_T \mid \lambda) \\
&= \sum_{i=1}^N P(o_1, o_2, \cdots, o_T, i_1 = q_i \mid \lambda) \\
&= \sum_{i=1}^N P(o_1, o_2, \cdots, o_T \mid i_1 = q_i, \lambda) \, P(i_1 = q_i \mid \lambda) \\
&= \sum_{i=1}^N P(o_1, o_2, \cdots, o_T \mid i_1 = q_i, \lambda) \, \pi_i \\
&= \sum_{i=1}^N P(o_1 \mid i_1 = q_i, \lambda) \, P(o_2, \cdots, o_T \mid i_1 = q_i, \lambda) \, \pi_i \\
&= \sum_{i=1}^N \pi_i \, b_i(o_1) \, \beta_1(i) \quad (10.21)
\end{aligned}
$$
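The backward recursion (10.19)-(10.21) can be sketched the same way; as a sanity check, it should produce the same $P(O \mid \lambda)$ as the forward algorithm. `backward` is a hypothetical helper name, and the parameters are the same three-box example.

```python
def backward(pi, A, B, O):
    """Backward algorithm: compute P(O | lambda) via beta variables."""
    N, T = len(pi), len(O)
    beta = [1.0] * N                       # (10.19): beta_T(i) = 1
    for t in range(T - 2, -1, -1):         # (10.20) recursion
        beta = [sum(A[i][j] * B[j][O[t + 1]] * beta[j] for j in range(N))
                for i in range(N)]
    # (10.21) termination
    return sum(pi[i] * B[i][O[0]] * beta[i] for i in range(N))

A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
print(round(backward(pi, A, B, [0, 1, 0]), 5))  # 0.13022
```

The list comprehension builds each new column of $\beta$ values from the old one before reassigning, so only one column needs to be kept in memory.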
More generally, $P(O \mid \lambda)$ can be written with the forward and backward variables at any intermediate time $t$:

$$
\begin{aligned}
P(O \mid \lambda) &= P(o_1, o_2, \cdots, o_t, o_{t+1}, o_{t+2}, \cdots, o_T \mid \lambda) \\
&= \sum_{i=1}^N \sum_{j=1}^N P(o_1, o_2, \cdots, o_t, i_t = q_i, i_{t+1} = q_j, o_{t+1}, o_{t+2}, \cdots, o_T \mid \lambda) \\
&= \sum_{i=1}^N \sum_{j=1}^N P(o_1, o_2, \cdots, o_t, i_t = q_i \mid \lambda) \, P(i_{t+1} = q_j, o_{t+1}, o_{t+2}, \cdots, o_T \mid o_1, o_2, \cdots, o_t, i_t = q_i, \lambda) \\
&= \sum_{i=1}^N \sum_{j=1}^N \alpha_t(i) \, P(i_{t+1} = q_j, o_{t+1}, o_{t+2}, \cdots, o_T \mid o_1, o_2, \cdots, o_t, i_t = q_i, \lambda) \\
&= \sum_{i=1}^N \sum_{j=1}^N \alpha_t(i) \, P(o_{t+1}, o_{t+2}, \cdots, o_T \mid o_1, o_2, \cdots, o_t, i_t = q_i, i_{t+1} = q_j, \lambda) \, P(i_{t+1} = q_j \mid o_1, o_2, \cdots, o_t, i_t = q_i, \lambda) \\
&= \sum_{i=1}^N \sum_{j=1}^N \alpha_t(i) \, P(o_{t+1}, o_{t+2}, \cdots, o_T \mid i_{t+1} = q_j, \lambda) \, P(i_{t+1} = q_j \mid i_t = q_i, \lambda) \\
&= \sum_{i=1}^N \sum_{j=1}^N \alpha_t(i) \, P(o_{t+1}, o_{t+2}, \cdots, o_T \mid i_{t+1} = q_j, \lambda) \, a_{ij} \\
&= \sum_{i=1}^N \sum_{j=1}^N \alpha_t(i) \, a_{ij} \, P(o_{t+1} \mid i_{t+1} = q_j, \lambda) \, P(o_{t+2}, \cdots, o_T \mid i_{t+1} = q_j, \lambda) \\
&= \sum_{i=1}^N \sum_{j=1}^N \alpha_t(i) \, a_{ij} \, b_j(o_{t+1}) \, \beta_{t+1}(j) \quad (10.22)
\end{aligned}
$$
Computing Some Probabilities and Expectation Values
$$
\begin{aligned}
P(i_t = q_i, O \mid \lambda) &= P(o_1, o_2, \cdots, o_t, i_t = q_i, o_{t+1}, o_{t+2}, \cdots, o_T \mid \lambda) \\
&= P(o_1, o_2, \cdots, o_t, i_t = q_i \mid \lambda) \, P(o_{t+1}, o_{t+2}, \cdots, o_T \mid o_1, o_2, \cdots, o_t, i_t = q_i, \lambda) \\
&= P(o_1, o_2, \cdots, o_t, i_t = q_i \mid \lambda) \, P(o_{t+1}, o_{t+2}, \cdots, o_T \mid i_t = q_i, \lambda) \\
&= \alpha_t(i) \, \beta_t(i)
\end{aligned}
$$
$$
\begin{aligned}
P(i_t = q_i, i_{t+1} = q_j, O \mid \lambda) &= P(o_1, o_2, \cdots, o_t, i_t = q_i, i_{t+1} = q_j, o_{t+1}, o_{t+2}, \cdots, o_T \mid \lambda) \\
&= P(o_1, o_2, \cdots, o_t, i_t = q_i \mid \lambda) \, P(i_{t+1} = q_j, o_{t+1}, o_{t+2}, \cdots, o_T \mid o_1, o_2, \cdots, o_t, i_t = q_i, \lambda) \\
&= \alpha_t(i) \, P(i_{t+1} = q_j, o_{t+1}, o_{t+2}, \cdots, o_T \mid o_1, o_2, \cdots, o_t, i_t = q_i, \lambda) \\
&= \alpha_t(i) \, P(o_{t+1}, o_{t+2}, \cdots, o_T \mid o_1, o_2, \cdots, o_t, i_t = q_i, i_{t+1} = q_j, \lambda) \, P(i_{t+1} = q_j \mid o_1, o_2, \cdots, o_t, i_t = q_i, \lambda) \\
&= \alpha_t(i) \, P(o_{t+1}, o_{t+2}, \cdots, o_T \mid i_{t+1} = q_j, \lambda) \, P(i_{t+1} = q_j \mid i_t = q_i, \lambda) \\
&= \alpha_t(i) \, P(o_{t+1}, o_{t+2}, \cdots, o_T \mid i_{t+1} = q_j, \lambda) \, a_{ij} \\
&= \alpha_t(i) \, P(o_{t+1} \mid i_{t+1} = q_j, \lambda) \, P(o_{t+2}, \cdots, o_T \mid i_{t+1} = q_j, \lambda) \, a_{ij} \\
&= \alpha_t(i) \, a_{ij} \, b_j(o_{t+1}) \, \beta_{t+1}(j)
\end{aligned}
$$
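The two joint probabilities above become $\gamma_t(i)$ and $\xi_t(i, j)$ once normalized by $P(O \mid \lambda)$. A sketch that computes both at every time step (`forward_backward` is a hypothetical helper name; parameters are the three-box example):

```python
def forward_backward(pi, A, B, O):
    """Compute gamma_t(i) = alpha_t(i) beta_t(i) / P(O|lambda) and
    xi_t(i,j) = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O|lambda)."""
    N, T = len(pi), len(O)
    alpha = [[0.0] * N for _ in range(T)]
    alpha[0] = [pi[i] * B[i][O[0]] for i in range(N)]
    for t in range(1, T):
        alpha[t] = [sum(alpha[t - 1][j] * A[j][i] for j in range(N)) * B[i][O[t]]
                    for i in range(N)]
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    P = sum(alpha[T - 1])
    gamma = [[alpha[t][i] * beta[t][i] / P for i in range(N)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] / P
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    return gamma, xi

A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
gamma, xi = forward_backward(pi, A, B, [0, 1, 0])
```

As a quick check, each row of $\gamma$ sums to 1 (the state at time $t$ must be one of the $q_i$), and each $\xi_t$ matrix sums to 1 over all $(i, j)$ pairs.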
Solving Basic Problem 2
Supervised Learning Algorithm
Baum-Welch Algorithm
$I$ is the hidden variable. For background on the EM algorithm, see 电信保温杯笔记: Statistical Learning Methods (2nd Edition) by Li Hang, Chapter 9: The EM Algorithm and Its Extensions. If you are already familiar with EM, you can safely skip the derivation below.
Steps
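A single Baum-Welch re-estimation step can be sketched from the $\gamma$ and $\xi$ quantities; `baum_welch_step` is a hypothetical name, the update formulas are the book's re-estimation equations, and for simplicity the sketch assumes a single observation sequence:

```python
def baum_welch_step(pi, A, B, O):
    """One EM re-estimation step: E-step computes gamma/xi via
    forward-backward; M-step renormalizes the expected counts."""
    N, M, T = len(pi), len(B[0]), len(O)
    # E-step: forward and backward variables
    alpha = [[0.0] * N for _ in range(T)]
    alpha[0] = [pi[i] * B[i][O[0]] for i in range(N)]
    for t in range(1, T):
        alpha[t] = [sum(alpha[t - 1][j] * A[j][i] for j in range(N)) * B[i][O[t]]
                    for i in range(N)]
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    P = sum(alpha[T - 1])
    gamma = [[alpha[t][i] * beta[t][i] / P for i in range(N)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] / P
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    # M-step: pi_i = gamma_1(i); a_ij, b_j(k) are ratios of expected counts
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1)) for j in range(N)]
             for i in range(N)]
    new_B = [[sum(gamma[t][j] for t in range(T) if O[t] == k) /
              sum(gamma[t][j] for t in range(T)) for k in range(M)]
             for j in range(N)]
    return new_pi, new_A, new_B

A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
new_pi, new_A, new_B = baum_welch_step(pi, A, B, [0, 1, 0])
```

The re-estimated parameters remain valid distributions: each row of the new $A$ and $B$ and the new $\pi$ sums to 1, which follows from $\sum_j \xi_t(i,j) = \gamma_t(i)$ and $\sum_i \gamma_t(i) = 1$.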
Solving Basic Problem 3
Approximation Algorithm
Viterbi Algorithm
You may want to watch this video first: 机器学习-白板推导系列(十四)-隐马尔可夫模型HMM(Hidden Markov Model).
The Viterbi algorithm is essentially the forward algorithm with the sum replaced by a max; the recursion has exactly the same shape, as the formulas below show.
$\delta$ records probabilities, while $\Psi$ records states (the predecessor that attains the maximum at each step).
Compare with the forward algorithm's recursion:
$$
\begin{aligned}
\alpha_{t+1}(i) &= P(o_1, o_2, \cdots, o_t, o_{t+1}, i_{t+1} = q_i \mid \lambda) \\
&= \sum_{j=1}^N \alpha_t(j) \, a_{ji} \, b_i(o_{t+1}), \quad i = 1, 2, \cdots, N \quad (10.16)
\end{aligned}
$$
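A minimal Viterbi sketch: the forward recursion's sum over $j$ becomes a max, and the argmax is stored in $\Psi$ for backtracking. `viterbi` is a hypothetical helper name, and the parameters reuse the three-box example (states printed 1-based to match the book's notation).

```python
def viterbi(pi, A, B, O):
    """Viterbi decoding: forward recursion with sum replaced by max,
    plus backtracking through the stored argmax predecessors."""
    N, T = len(pi), len(O)
    delta = [pi[i] * B[i][O[0]] for i in range(N)]   # delta_1(i)
    psi = []                                          # Psi: argmax predecessors
    for t in range(1, T):
        new_delta, back = [], []
        for i in range(N):
            best_j = max(range(N), key=lambda j: delta[j] * A[j][i])
            back.append(best_j)
            new_delta.append(delta[best_j] * A[best_j][i] * B[i][O[t]])
        delta = new_delta
        psi.append(back)
    # Backtrack from the best final state
    best = max(range(N), key=lambda j: delta[j])
    path = [best]
    for back in reversed(psi):
        path.append(back[path[-1]])
    path.reverse()
    return max(delta), path

A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
p_star, path = viterbi(pi, A, B, [0, 1, 0])
print(round(p_star, 4), [i + 1 for i in path])  # 0.0147 [3, 3, 3]
```

Because max distributes over products of non-negative factors the same way sum does, the recursion structure carries over unchanged; only the reduction operator differs.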
Steps
Example
Chapter Summary
Related Videos
机器学习-白板推导系列(十四)-隐马尔可夫模型HMM(Hidden Markov Model)
Related Notes
hktxt/Learn-Statistical-Learning-Method
Related Code
Dod-o/Statistical-Learning-Method_Code
This tagging differs from ordinary part-of-speech tagging: it labels which part of a word each character belongs to:
B: the beginning of a word
M: a middle character of a word
E: the end of a word
S: a single-character word
These four labels form the state set $Q$.
def trainParameter(fileName):
Counts $\pi$, A, and B directly from the training data, following their definitions. The probabilities are stored as $\log P$, a standard trick to prevent underflow when many probabilities are multiplied together.
def loadArticle(fileName):
Reads in the test-set article.
def participle(artical, PI, A, B):
Runs the Viterbi algorithm to predict the hidden state of every character; whenever the state is E or S, a word boundary is placed after that character, which yields the segmentation.
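That final cutting step can be sketched as follows; `segment` is a hypothetical helper name, and it assumes the Viterbi pass has already produced one of B/M/E/S per character:

```python
def segment(sentence, states):
    """Cut the sentence after every character tagged E (word end)
    or S (single-character word)."""
    words, cur = [], ""
    for ch, tag in zip(sentence, states):
        cur += ch
        if tag in ("E", "S"):
            words.append(cur)
            cur = ""
    if cur:                 # trailing characters with no closing E/S tag
        words.append(cur)
    return words

print(segment("我爱北京", ["S", "S", "B", "E"]))  # ['我', '爱', '北京']
```

The trailing-buffer check guards against a predicted state sequence that ends mid-word (e.g. ending on B or M), so no characters are silently dropped.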