Hidden Markov Model (HMM)
1. Introduction
The hidden Markov model is a directed graphical probability model for sequential data, applicable to problems such as daily-high-temperature prediction and word segmentation. First, consider what characterizes this class of problems. Let $o_t$ denote the observation at time t and write the observation sequence as $O=\{o_1,o_2,\dots,o_n\}$; the observation at time t is related to the observations at the preceding n time steps. Common sense suggests, for example, that a day's high temperature is related to the highs of the previous two days, which makes this a time-series modeling problem. The simplest approach is polynomial fitting: take the previous n observations as input and predict the observation at time t. Such a model is simple, but it performs poorly, because reality is far more complex. The hidden Markov model handles this class of problems better.
Rather than using the observations directly as input, the hidden Markov model assumes that the observation sequence O is generated by a corresponding state sequence S, and the model's final prediction is also a state sequence, so the latent variables can be recovered. The state S is the latent variable; it satisfies the Markov assumption, i.e. the state $s_t$ at any time t is related to its previous n states, and the observation sequence O is generated by the state sequence S. For simplicity, two assumptions are made:
- Homogeneous Markov assumption: the state $s_t$ at any time t depends only on the previous state $s_{t-1}$.
- Observation independence assumption: the observation $o_t$ at any time t depends only on the state $s_t$ at that time, and the observations are independent of one another.
Note that the states S must be discrete, while the observations O may be either discrete or continuous; for ease of exposition, the discrete case is discussed here.
Let $Q=\{q_1,q_2,\dots,q_M\}$ denote the set of all possible states and $V=\{v_1,v_2,\dots,v_N\}$ the set of all possible observation values. The state transition probability matrix is $A$, where $a_{ij}$ is the probability of transitioning from state $q_i$ to state $q_j$, i.e.

$$a_{ij}=P(s_t = q_j \mid s_{t-1}=q_i)$$

$$A = \left[ \begin{matrix} a_{11} & a_{12} & \dots & a_{1M}\\ a_{21} & a_{22} & \dots & a_{2M}\\ \vdots & \vdots & \ddots & \vdots\\ a_{M1} & a_{M2} & \dots & a_{MM} \end{matrix} \right]_{M \times M}$$
The observation probability matrix is $B$, where $b_{ij}$ is the probability that state $q_i$ generates observation $v_j$, i.e.

$$b_{ij}=P(o_t = v_j \mid s_t=q_i)$$

$$B = \left[ \begin{matrix} b_{11} & b_{12} & \dots & b_{1N}\\ b_{21} & b_{22} & \dots & b_{2N}\\ \vdots & \vdots & \ddots & \vdots\\ b_{M1} & b_{M2} & \dots & b_{MN} \end{matrix} \right]_{M \times N}$$
The initial state probability vector is $\Pi$, where $\pi_i$ is the probability of being in state $q_i$ at time 1, i.e.

$$\pi_i = P(s_1 = q_i)$$

$$\Pi = [\pi_1, \pi_2,\dots,\pi_M]_{1 \times M}$$
The hidden Markov model can thus be written as $\lambda = (A,B,\Pi)$, where $A$, $B$, and $\Pi$ are called the three elements of the hidden Markov model.
Training a hidden Markov model requires solving three basic problems:
- Probability computation: given an observation sequence O, how to compute the likelihood $P(O|\lambda)$.
- Learning: given an observation sequence O, how to estimate the parameters $\lambda$ by maximizing the likelihood.
- Prediction: once the model has been learned, how to infer the optimal state sequence.
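Concretely, the three elements can be stored as plain arrays. The sketch below builds a toy model $\lambda=(A,B,\Pi)$ with M = 2 states and N = 3 observation values (all numbers are invented for illustration) and checks that each row is a valid distribution:

```python
import numpy as np

# Toy parameters: M = 2 hidden states, N = 3 observation values (illustrative only)
A = np.array([[0.7, 0.3],          # A[i, j] = P(s_t = q_j | s_{t-1} = q_i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],     # B[i, j] = P(o_t = v_j | s_t = q_i)
              [0.1, 0.3, 0.6]])
Pi = np.array([0.6, 0.4])          # Pi[i] = P(s_1 = q_i)

# Every row of A and B, and Pi itself, must sum to 1
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(Pi.sum(), 1.0)
```

This toy model is reused in the sketches for the forward, backward, Baum-Welch, and Viterbi algorithms below.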
2. Probability Computation Problem
Given an observation sequence $O=\{o_1,o_2,\dots,o_T\}$ and model parameters $\lambda=(A,B,\Pi)$,

$$\begin{aligned} P(O|\lambda) &= \sum_{S \in Q^T} P(O,S|\lambda)\\ &= \sum_{s_1,s_2,\dots,s_T} P(o_1,o_2,\dots,o_T,s_1,s_2,\dots,s_T|\lambda)\\ &= \sum_{s_1,s_2,\dots,s_T} \pi_{s_1}b_{s_1o_1} a_{s_1s_2}b_{s_2o_2} \dots a_{s_{T-1}s_T}b_{s_To_T} \end{aligned}$$

where $s_1,s_2,\dots,s_T$ ranges over all state sequences of length T, and the observations are conditionally independent. Computing this sum directly is clearly very expensive (there are $M^T$ state sequences), so the forward and backward algorithms are introduced next to evaluate this probability efficiently.
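For tiny T the sum can be evaluated literally by enumerating all $M^T$ state sequences, which makes the $O(TM^T)$ cost tangible and provides a reference value for testing faster algorithms. A minimal sketch, with invented toy parameters:

```python
import itertools
import numpy as np

def hmm_prob_brute_force(obs, A, B, Pi):
    """P(O|lambda) = sum over all state paths of pi_{s1} b_{s1,o1} a_{s1,s2} b_{s2,o2} ..."""
    M, T = len(Pi), len(obs)
    total = 0.0
    for states in itertools.product(range(M), repeat=T):  # all M^T state sequences
        p = Pi[states[0]] * B[states[0], obs[0]]
        for t in range(1, T):
            p *= A[states[t-1], states[t]] * B[states[t], obs[t]]
        total += p
    return total

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
Pi = np.array([0.6, 0.4])
obs = [0, 2, 1]   # indices into the observation alphabet V
p = hmm_prob_brute_force(obs, A, B, Pi)
```

As a sanity check, summing this quantity over every possible observation sequence of a fixed length must give 1.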
2.1 Forward Algorithm
The forward algorithm makes the following definition:

$$\alpha_t(i) = P(o_1,o_2,\dots,o_t,s_t=q_i|\lambda),\quad t=1,2,\dots,T$$
The initial value is

$$\alpha_1(i) = P(o_1,s_1=q_i|\lambda)=\pi_i b_{io_1}$$

and the recursion is

$$\begin{aligned} \alpha_{t+1}(i) &= P(o_1,o_2,\dots,o_t,o_{t+1}, s_{t+1}=q_i|\lambda)\\ &= b_{io_{t+1}} \cdot \sum_{j=1}^M \alpha_t(j) \cdot a_{ji} \end{aligned}$$
Iterating the recursion yields $\alpha_T(i),\ i=1,2,\dots,M$, from which

$$P(O|\lambda)=\sum_{i=1}^M P(o_1,o_2,\dots,o_T,s_T=q_i|\lambda)=\sum_{i=1}^M\alpha_T(i)$$
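A direct translation of the recursion into code might look like the following sketch (toy parameters; vectorized so that `alpha[t-1] @ A` computes $\sum_j \alpha_t(j)a_{ji}$ for every i at once):

```python
import numpy as np

def forward(obs, A, B, Pi):
    """alpha[t, i] = P(o_1..o_t, s_t = q_i | lambda); returns (alpha, P(O|lambda))."""
    T, M = len(obs), len(Pi)
    alpha = np.zeros((T, M))
    alpha[0] = Pi * B[:, obs[0]]                    # alpha_1(i) = pi_i * b_{i,o_1}
    for t in range(1, T):
        alpha[t] = B[:, obs[t]] * (alpha[t-1] @ A)  # b_{i,o_{t+1}} * sum_j alpha_t(j) a_{ji}
    return alpha, alpha[-1].sum()                   # P(O|lambda) = sum_i alpha_T(i)

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
Pi = np.array([0.6, 0.4])
alpha, p = forward([0, 2, 1], A, B, Pi)
```

The cost is $O(TM^2)$ instead of $O(TM^T)$.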
2.2 Backward Algorithm
The backward algorithm makes the following definition:

$$\beta_t(i) = P(o_{t+1},\dots,o_T|s_t=q_i,\lambda),\quad t = T,T-1,\dots,1$$
The initial value is $\beta_T(i)=1$, and the recursion is

$$\begin{aligned} \beta_{t-1}(i)&=P(o_t,o_{t+1},\dots,o_T|s_{t-1}=q_i,\lambda)\\ &= \sum_{j=1}^M a_{ij} \cdot b_{jo_t} \cdot \beta_t(j) \end{aligned}$$
Iterating the recursion yields $\beta_1(i),\ i=1,2,\dots,M$, from which

$$P(O|\lambda)=\sum_{i=1}^M P(O,s_1=q_i|\lambda)=\sum_{i=1}^M \pi_{i} \cdot b_{io_1} \cdot \beta_1(i)$$
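The backward recursion can be sketched the same way; on the same toy model it must give exactly the same $P(O|\lambda)$ as the forward algorithm, which is a handy consistency check:

```python
import numpy as np

def backward(obs, A, B, Pi):
    """beta[t, i] = P(o_{t+1}..o_T | s_t = q_i, lambda); returns (beta, P(O|lambda))."""
    T, M = len(obs), len(Pi)
    beta = np.ones((T, M))                           # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t+1]] * beta[t+1])   # sum_j a_{ij} b_{j,o_{t+1}} beta_{t+1}(j)
    return beta, np.sum(Pi * B[:, obs[0]] * beta[0]) # sum_i pi_i b_{i,o_1} beta_1(i)

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
Pi = np.array([0.6, 0.4])
beta, p = backward([0, 2, 1], A, B, Pi)
```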
3. Learning Problem
Because the hidden Markov model contains the latent state sequence S, the EM algorithm can be used for parameter estimation; the resulting procedure is known as the Baum-Welch algorithm. The derivation is as follows.
As before, given the observation sequence O and model parameters $\lambda$, for any state sequence S,

$$P(O,S|\lambda) = P(o_1,o_2,\dots,o_T,s_1,s_2,\dots,s_T|\lambda)=\pi_{s_1}b_{s_1o_1} a_{s_1s_2}b_{s_2o_2} \dots a_{s_{T-1}s_T}b_{s_To_T}$$
1. E-step: compute the Q function

$$\begin{aligned} Q(\lambda,\lambda_i) &= E_S[\log P(O,S|\lambda)\mid O,\lambda_i]\\ &= \sum_S \log [P(O,S|\lambda)]\, P(S|O,\lambda_i) \end{aligned}$$
where $\lambda_i$ is the current estimate of the model parameters and $\lambda$ is the parameter value over which the likelihood is maximized. Since $P(O|\lambda_i)$ depends on neither $\lambda$ nor $S$, multiplying by it does not affect the maximization of the Q function, which can therefore also be written as

$$\begin{aligned} Q(\lambda,\lambda_i) &= P(O|\lambda_i) \sum_S \log [P(O,S|\lambda)]\, P(S|O,\lambda_i)\\ &= \sum_S \log [P(O,S|\lambda)]\, P(O,S|\lambda_i) \end{aligned}$$
Substituting the expression for $P(O,S|\lambda)$ into the above gives

$$\begin{aligned} Q(\lambda,\lambda_i) &= \sum_S \log [\pi_{s_1}b_{s_1o_1} a_{s_1s_2}b_{s_2o_2} \dots a_{s_{T-1}s_T}b_{s_To_T}] \cdot P(O,S|\lambda_i)\\ &= \sum_S\log\pi_{s_1}P(O,S|\lambda_i) + \sum_S \sum_{t=1}^{T-1} \log a_{s_ts_{t+1}} P(O,S|\lambda_i) + \sum_S \sum_{t=1}^T \log b_{s_to_t} P(O,S|\lambda_i)\\ &= \sum_{j=1}^M\log\pi_{j}P(O,s_1 = q_j|\lambda_i) + \sum_{j=1}^M \sum_{k=1}^M \sum_{t=1}^{T-1} \log a_{jk} P(O,s_t=q_j,s_{t+1}=q_k|\lambda_i)\\ &\quad + \sum_{j=1}^M \sum_{t=1}^T \log b_{jo_t} P(O,s_t = q_j|\lambda_i) \end{aligned}$$
2. M-step: maximize the Q function
Next, each of the three terms of the Q function is maximized separately as a constrained optimization problem.
(1) Solving for $\pi$
Since the vector $\pi$ represents the distribution over all possible states at time 1, $\sum_{j=1}^M \pi_j=1$. We therefore maximize $\sum_{j=1}^M\log\pi_{j}P(O,s_1 = q_j|\lambda_i)$ subject to $\sum_{j=1}^M \pi_j=1$; the Lagrangian is

$$L_1(\mu_1) = \sum_{j=1}^M\log\pi_{j}\,P(O,s_1 = q_j|\lambda_i) + \mu_1\Big(\sum_{j=1}^M \pi_j-1\Big)$$
Taking partial derivatives gives

$$\frac{\partial L_1}{\partial \pi_j} = \frac{P(O,s_1 = q_j|\lambda_i)}{\pi_j} + \mu_1$$

$$\frac{\partial L_1}{\partial \mu_1} = \sum_{j=1}^M \pi_j-1$$
Setting both expressions to zero yields

$$\hat{\pi}_j = \frac{P(O,s_1=q_j|\lambda_i)}{\sum_{k=1}^M P(O,s_1=q_k|\lambda_i)}$$
From the probability computation methods above,

$$P(O,s_t = q_j|\lambda_i) = \alpha_t(j) \cdot \beta_t(j)$$
so

$$\hat{\pi}_j = \frac{\alpha_1(j)\beta_1(j)}{\sum_{k=1}^M \alpha_1(k)\beta_1(k)}$$
(2) Solving for A
Since A is the state transition probability matrix, each of its rows satisfies $\sum_{k=1}^M a_{jk}=1$. We therefore maximize $\sum_{j=1}^M \sum_{k=1}^M \sum_{t=1}^{T-1} \log a_{jk}\, P(O,s_t=q_j,s_{t+1}=q_k|\lambda_i)$ subject to $\sum_{k=1}^M a_{jk}=1$; the Lagrangian is

$$L_2(\mu_2) = \sum_{j=1}^M \sum_{k=1}^M \sum_{t=1}^{T-1} \log a_{jk}\, P(O,s_t=q_j,s_{t+1}=q_k|\lambda_i) + \mu_2\Big(\sum_{k=1}^M a_{jk}-1\Big)$$
Taking partial derivatives gives

$$\frac{\partial L_2(\mu_2)}{\partial a_{jk}} = \sum_{t=1}^{T-1} \frac{P(O,s_t=q_j,s_{t+1}=q_k|\lambda_i)}{a_{jk}} + \mu_2$$

$$\frac{\partial L_2(\mu_2)}{\partial \mu_2} = \sum_{k=1}^M a_{jk}-1$$
Setting the partial derivatives to zero yields

$$\begin{aligned} \hat a_{jk} &= \frac{\sum_{t=1}^{T-1}P(O,s_t=q_j,s_{t+1}=q_k|\lambda_i)}{\sum_{k=1}^M \sum_{t=1}^{T-1}P(O,s_t=q_j,s_{t+1}=q_k|\lambda_i)}\\ &= \frac{\sum_{t=1}^{T-1}P(O,s_t=q_j,s_{t+1}=q_k|\lambda_i)}{\sum_{t=1}^{T-1}P(O,s_t=q_j|\lambda_i)} \end{aligned}$$
Again from the probability computation methods above,

$$P(O,s_t=q_j,s_{t+1}=q_k|\lambda_i) = \alpha_t(j)\, a_{jk}\, b_{ko_{t+1}}\, \beta_{t+1}(k)$$

$$P(O,s_t = q_j|\lambda_i) = \alpha_t(j) \cdot \beta_t(j)$$
Hence

$$\hat a_{jk} = \frac{\sum_{t=1}^{T-1} \alpha_t(j)\, a_{jk}\, b_{ko_{t+1}}\, \beta_{t+1}(k)}{\sum_{t=1}^{T-1} \alpha_t(j) \cdot \beta_t(j)}$$
(3) Solving for B
Since B is the observation probability matrix, each of its rows satisfies $\sum_{k=1}^N b_{jk}=1$. We therefore maximize $\sum_{j=1}^M \sum_{t=1}^T \log b_{jo_t}\, P(O,s_t = q_j|\lambda_i)$ subject to $\sum_{k=1}^N b_{jk}=1$; the Lagrangian is

$$L_3(\mu_3) = \sum_{j=1}^M \sum_{t=1}^T \log b_{jo_t}\, P(O,s_t = q_j|\lambda_i) + \mu_3\Big(\sum_{k=1}^N b_{jk}-1\Big)$$
Taking partial derivatives (note that $b_{jk}$ appears in the objective only at those times t for which $o_t = v_k$) gives

$$\frac{\partial L_3(\mu_3)}{\partial b_{jk}} = \sum_{t=1}^{T} \frac{P(O,s_t = q_j|\lambda_i)\,I(o_t=v_k)}{b_{jk}} + \mu_3$$

$$\frac{\partial L_3(\mu_3)}{\partial \mu_3} = \sum_{k=1}^N b_{jk}-1$$
Setting the partial derivatives to zero yields

$$\hat b_{jk} = \frac{\sum_{t=1}^T P(O,s_t = q_j|\lambda_i)\,I(o_t=v_k)}{\sum_{t=1}^T P(O,s_t = q_j|\lambda_i)}$$

where k indexes the possible observation values, ranging from 1 to N, and $I(\cdot)$ is the 0-1 indicator function, which selects the time steps whose observation equals $v_k$.
From the probability computation formulas above,

$$P(O,s_t = q_j|\lambda_i) = \alpha_t(j) \cdot \beta_t(j)$$
Hence

$$\hat b_{jk} = \frac{\sum_{t=1}^T \alpha_t(j)\beta_t(j)\, I(o_t=v_k)}{\sum_{t=1}^T \alpha_t(j)\beta_t(j)}$$
In summary, the learning problem is solved by iterating the update formulas derived above until the parameters converge.
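Putting the three update formulas together with the forward and backward recursions gives one EM iteration. The sketch below (toy parameters, single observation sequence) uses $\gamma_t(j) \propto \alpha_t(j)\beta_t(j)$ and $\xi_t(j,k) \propto \alpha_t(j)a_{jk}b_{ko_{t+1}}\beta_{t+1}(k)$; since this is plain EM, the likelihood never decreases across iterations:

```python
import numpy as np

def forward_backward(obs, A, B, Pi):
    """Standard alpha/beta recursions from sections 2.1 and 2.2."""
    T, M = len(obs), len(Pi)
    alpha = np.zeros((T, M))
    beta = np.ones((T, M))
    alpha[0] = Pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = B[:, obs[t]] * (alpha[t-1] @ A)
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t+1]] * beta[t+1])
    return alpha, beta

def baum_welch_step(obs, A, B, Pi):
    """One EM update of (A, B, Pi) from a single observation sequence."""
    obs = np.asarray(obs)
    T, M, N = len(obs), len(Pi), B.shape[1]
    alpha, beta = forward_backward(obs, A, B, Pi)
    gamma = alpha * beta                 # gamma[t, j] ∝ P(O, s_t = q_j | lambda_i)
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = np.zeros((T - 1, M, M))         # xi[t, j, k] ∝ P(O, s_t=q_j, s_{t+1}=q_k | lambda_i)
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, obs[t+1]] * beta[t+1])[None, :]
        xi[t] /= xi[t].sum()
    Pi_new = gamma[0]                                            # hat pi_j
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]     # hat a_jk
    B_new = np.zeros((M, N))
    for k in range(N):                   # hat b_jk, with the indicator I(o_t = v_k)
        B_new[:, k] = gamma[obs == k].sum(axis=0) / gamma.sum(axis=0)
    return A_new, B_new, Pi_new

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
Pi = np.array([0.6, 0.4])
obs = [0, 2, 1, 1, 0, 2, 2, 0]
liks = []
for _ in range(5):
    alpha, _ = forward_backward(obs, A, B, Pi)
    liks.append(alpha[-1].sum())         # P(O | current parameters)
    A, B, Pi = baum_welch_step(obs, A, B, Pi)
```

In practice one would work in log space or rescale alpha/beta per time step to avoid underflow on long sequences; that detail is omitted here for clarity.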
4. Prediction Problem
Given the hidden Markov model parameters $\lambda=(A,B,\Pi)$, the possible evolutions of the state sequence form a tree-shaped search space, and the goal is to find the optimal path through it, which suggests dynamic programming. The Viterbi algorithm is designed exactly this way: it relies on the property of optimal paths that any sub-path of an optimal path is itself an optimal path for the corresponding sub-problem.
Accordingly, define

$$\delta_t(i) = \max_{s_1,s_2,\dots,s_{t-1}} P(s_t=q_i,s_{t-1},\dots,s_1,o_t,o_{t-1},\dots,o_1|\lambda),\quad i=1,2,\dots,M$$

the maximum probability over all state sequences whose state at time t is $q_i$. This gives the recursion
$$\delta_{t+1}(i)= \max_{1 \le j \le M} [\delta_t(j)a_{ji}]\, b_{io_{t+1}}$$
Define the (t-1)-th node on the optimal path among all paths that are in state $q_i$ at time t as

$$\psi_t(i) = \arg\max_{1 \le j \le M} [\delta_{t-1}(j)a_{ji}],\quad i=1,2,\dots,M$$
In summary, iterating yields $\delta_t(i),\ i=1,2,\dots,M$; recording the chosen state $\psi_t(i)$ at each step and backtracking from the best final state gives the optimal state sequence.
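The δ/ψ recursions plus backtracking can be sketched as follows (same toy parameters as used elsewhere; `trans[j, i]` holds $\delta_{t-1}(j)a_{ji}$, so its column-wise max and argmax give $\delta_t$ and $\psi_t$):

```python
import numpy as np

def viterbi(obs, A, B, Pi):
    """Most likely state path via the delta/psi recursions, plus its probability."""
    T, M = len(obs), len(Pi)
    delta = np.zeros((T, M))
    psi = np.zeros((T, M), dtype=int)
    delta[0] = Pi * B[:, obs[0]]                 # delta_1(i) = pi_i * b_{i,o_1}
    for t in range(1, T):
        trans = delta[t-1][:, None] * A          # trans[j, i] = delta_{t-1}(j) * a_{ji}
        psi[t] = trans.argmax(axis=0)            # psi_t(i) = argmax_j delta_{t-1}(j) a_{ji}
        delta[t] = trans.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]             # best final state
    for t in range(T - 1, 0, -1):                # backtrack through psi
        path.append(int(psi[t][path[-1]]))
    return path[::-1], delta[-1].max()

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
Pi = np.array([0.6, 0.4])
path, p = viterbi([0, 2, 1], A, B, Pi)
```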
5. References
- Li Hang, Statistical Learning Methods (《统计学习方法》)