Communication Model
When a sender (a person or a machine) transmits information, the information travels as a signal through a medium (air or a wire); this is encoding in the broad sense. The receiver recovers the sender's information from the signal according to agreed rules; this is decoding in the broad sense.
Speech recognition is the process by which the receiver recovers the sender's information from the received signal. Given the observed signals $o_1,o_2,\cdots$, how do we infer the source information $s_1,s_2,\cdots$? From a probabilistic point of view, we look for the source information that is most likely to have produced the observed signals.
By Bayes' theorem,
$$P(s_1,s_2,\cdots|o_1,o_2,\cdots)=\frac{P(o_1,o_2,\cdots|s_1,s_2,\cdots)P(s_1,s_2,\cdots)}{P(o_1,o_2,\cdots)}$$
Once the observations $o_1,o_2,\cdots$ have been produced they do not change, so $P(o_1,o_2,\cdots)$ is a constant. The most likely source information is therefore
$$s_1,s_2,\cdots =\arg\max_{s_1,s_2,\cdots}P(s_1,s_2,\cdots|o_1,o_2,\cdots)= \arg\max_{s_1,s_2,\cdots}P(o_1,o_2,\cdots|s_1,s_2,\cdots)P(s_1,s_2,\cdots)$$
This formula can be solved with the hidden Markov model.
Markov Assumption and Markov Processes
Suppose the sequence $s_1,s_2,\cdots,s_t,\cdots$ is the sequence of daily maximum temperatures, where $s_t$ is the temperature random variable. Assume that in this random process the distribution of state $s_t$ depends only on the previous state (today's maximum temperature depends only on yesterday's), i.e.
$$P(s_t|s_1,s_2,\cdots,s_{t-1})=P(s_t|s_{t-1})$$
This assumption is called the Markov assumption, and a random process that satisfies it is called a Markov process (a directed graph, i.e. a Bayesian network).
Randomly choose a state as the initial state and then generate subsequent states according to the transition rule; after $T$ steps this yields the state sequence $s_1,\cdots,s_T$. If the run is long enough, the transition probability from $m_i$ to $m_j$ can be estimated as $\#(m_i,m_j)/\#(m_i)$.
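The count-based estimate can be checked with a short simulation. A minimal sketch, assuming a made-up 3-state transition matrix (the values below are illustrative, not from the text):

```python
import numpy as np

# Assumed toy transition matrix for a 3-state Markov chain (illustrative values).
P = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
rng = np.random.default_rng(0)

# Simulate the chain: random initial state, then sample each next state from P.
T = 100_000
states = [rng.integers(3)]
for _ in range(T - 1):
    states.append(rng.choice(3, p=P[states[-1]]))

# Estimate transition probabilities by counting: #(m_i, m_j) / #(m_i).
counts = np.zeros((3, 3))
for i, j in zip(states[:-1], states[1:]):
    counts[i, j] += 1
P_hat = counts / counts.sum(axis=1, keepdims=True)
print(np.round(P_hat, 3))   # close to P when T is large
```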
Hidden Markov Models and the Communication Model
A hidden Markov model describes a process in which a Markov chain generates an unobservable state sequence, and that state sequence in turn generates an observation sequence. The hidden state sequence $s_1,s_2,\cdots$ is an ordinary Markov chain, which is why the model is called a "hidden" Markov model.
The two assumptions of the hidden Markov model:
- Output independence assumption: at each time $t$ the HMM emits an observation $o_t$ that depends only on the hidden state $s_t$:
  $$P(o_t|s_1,\cdots,s_{t},o_1,\cdots,o_{t-1})=P(o_t|s_{t})$$
- Markov assumption: at each time $t$ the hidden state $s_t$ depends only on the previous hidden state $s_{t-1}$:
  $$P(s_t|s_1,\cdots,s_{t-1},o_1,\cdots,o_{t-1})=P(s_t|s_{t-1})$$
By the Markov assumption and the output independence assumption, the joint probability of the state sequence and the observation sequence (a generative model) is
$$P(s_1,s_2,\cdots,o_1,o_2,\cdots)=\prod_tP(s_t|s_{t-1})\cdot P(o_t|s_t)$$
where $P(s_1|s_0)$ is understood as the initial-state probability.
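As a sanity check of this factorization, the following minimal sketch evaluates the joint probability of one concrete (state sequence, observation sequence) pair; the parameter values are those of the three-box example introduced later in this article, and the two sequences are arbitrary.

```python
import numpy as np

# Parameters of the three-box example used later in this article.
PI = np.array([0.2, 0.4, 0.4])
A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])

S = [2, 2, 1]   # an arbitrary hidden state sequence (0-indexed)
O = [0, 1, 0]   # an arbitrary observation sequence (0 = red, 1 = white)

# P(S, O | lambda) = pi_{s_1} b_{s_1}(o_1) * prod_t a_{s_{t-1} s_t} b_{s_t}(o_t)
p = PI[S[0]] * B[S[0], O[0]]
for t in range(1, len(O)):
    p *= A[S[t - 1], S[t]] * B[S[t], O[t]]
print(p)
```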
The decoding problem of the communication model can be solved with an HMM: the Viterbi algorithm finds the maximum of the probability above and hence the most likely hidden states.
HMM Model Representation
Let the hidden state set be $M = \{m_1,\cdots, m_N\}$, the observation symbol set $V = \{v_1, \cdots, v_M\}$, the hidden state sequence $S = (s_1, \cdots, s_T)$, and the observation sequence $O = (o_1, \cdots, o_T)$.
I. State transition matrix
If the hidden state at time $t$ is $m_i$ and the hidden state at time $t+1$ is $m_j$, then the state transition probability from time $t$ to time $t+1$ is
$$a_{ij} = P(s_{t+1} = m_j | s_t = m_i), \quad i,j = 1, 2, \cdots, N$$
The state transition matrix is $A = [a_{ij}]_{N \times N}$.
II. Observation probability matrix
If the hidden state at time $t$ is $m_j$, then the probability of emitting observation symbol $v_k$ from hidden state $m_j$ is
$$b_j(k) = P(o_t = v_k | s_t = m_j), \quad k = 1,2,\cdots, M; \, j = 1, 2, \cdots, N$$
The observation probability matrix is $B = [b_j(k)]_{N\times M}$.
III. Initial state probability vector
The probability of being in state $m_i$ at the initial time $t=1$ is
$$\pi_i = P(s_1 = m_i), \quad i = 1, 2, \cdots, N$$
The initial state probability vector is $\Pi = (\pi_i)$.
In summary, $\Pi$ and $A$ determine the state sequence and $B$ determines the observation sequence; the HMM is represented by the triple
$$\lambda=(A,B,\Pi)$$
Example: suppose there are 3 boxes, each containing red and white balls, as follows.
| Box $X$ | 1 | 2 | 3 |
|---|---|---|---|
| Red balls | 5 | 4 | 7 |
| White balls | 5 | 6 | 3 |
According to the initial probabilities, pick one box at random, draw one ball from it and put it back, then move to the next box; for example, the transition probabilities out of box 1 are
$$P(X=1|X=1)=0.5,\quad P(X=2|X=1)=0.2,\quad P(X=3|X=1)=0.3$$
Repeating this 5 times yields an observation sequence of ball colours
$$O = \{\text{red}, \text{red}, \text{white}, \text{white}, \text{red}\}$$
In this example the box sequence is the hidden state sequence, and the colour sequence is the known observation sequence. The three HMM components are
$$A = \left[\begin{matrix} 0.5 &0.2 &0.3 \\ 0.3 &0.5 &0.2 \\ 0.2 &0.3 &0.5 \end{matrix}\right] ,\quad B = \left[\begin{matrix} 0.5 &0.5 \\ 0.4 &0.6 \\ 0.7 &0.3 \end{matrix}\right] ,\quad \Pi=(0.2, 0.4, 0.4)^T$$
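A minimal sketch of the generative procedure just described, using the parameters above (the random seed is arbitrary):

```python
import numpy as np

# Three-box example parameters from above; colour 0 = red, colour 1 = white.
PI = np.array([0.2, 0.4, 0.4])
A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])

rng = np.random.default_rng(1)
T = 5
state = rng.choice(3, p=PI)                         # pick the initial box according to PI
boxes, colours = [], []
for _ in range(T):
    boxes.append(int(state))
    colours.append(int(rng.choice(2, p=B[state])))  # draw a ball, then put it back
    state = rng.choice(3, p=A[state])               # move to the next box
print("hidden boxes :", boxes)
print("ball colours :", colours)
```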
HMM Probability Computation
Problem: given the model $\lambda=(A,B,\Pi)$ and the observation sequence $O = (o_1, o_2, \cdots, o_T)$, compute the probability of $O$ under $\lambda$, i.e. $P(O|\lambda)$.
Can we compute the probability of the observation sequence by enumeration? Enumerate every state sequence $S = (s_1, s_2, \cdots, s_T)$, compute the joint probability $P(O, S|\lambda)$ of $S$ and the observation sequence $O = (o_1, o_2, \cdots, o_T)$, and sum:
$$P(O|\lambda) = \sum_S P(O, S|\lambda) = \sum_{S}P(O|S, \lambda)P(S|\lambda)$$
There are $N^T$ possible hidden state sequences, so the direct method has complexity $O(TN^T)$, which is impractical when the number of hidden states is large.
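For small $N$ and $T$ the direct method can still be written down. A minimal brute-force sketch (using the three-box parameters above) enumerates all $N^T$ state sequences; it should agree with the forward algorithm of the next section.

```python
import itertools
import numpy as np

def brute_force_prob(O, PI, A, B):
    """Direct evaluation of P(O | lambda): enumerate all N**T hidden state
    sequences and sum their joint probabilities. Only feasible for tiny N, T."""
    PI, A, B = np.asarray(PI), np.asarray(A), np.asarray(B)
    N, T = A.shape[0], len(O)
    total = 0.0
    for S in itertools.product(range(N), repeat=T):
        p = PI[S[0]] * B[S[0], O[0]]
        for t in range(1, T):
            p *= A[S[t - 1], S[t]] * B[S[t], O[t]]
        total += p
    return total

if __name__ == '__main__':
    PI = [0.2, 0.4, 0.4]
    A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
    B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
    print(brute_force_prob([0, 1, 0], PI, A, B))   # same value as forward_HMM below
```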
Forward Recursion
The forward algorithm is a dynamic programming (DP) algorithm: by defining a local forward probability we obtain a recursion that extends the solutions of subproblems to the full problem. Given the model $\lambda$, the forward probability is the probability of observing $o_1, \cdots, o_t$ up to time $t$ and being in hidden state $s_t=q_i$:
$$\alpha_t(i) = P(o_1,\cdots,o_t,s_t=q_i|\lambda),\quad P(O|\lambda) = \sum_{i}\alpha_T(i),\quad \alpha_1(i) = \pi_i b_i(o_1)$$
By the homogeneous Markov assumption and the observation independence assumption, the forward probability satisfies the recursion
$$\begin{aligned} \alpha_{t+1}(i) &=P(o_1,\cdots,o_t,o_{t+1},s_{t+1}=q_i|\lambda)\\[1ex] &=\sum_jP(o_1,\cdots,o_t,o_{t+1},s_t=q_j,s_{t+1}=q_i|\lambda)\\ &=\sum_jP(s_{t+1}=q_i,o_{t+1}|o_1,\cdots,o_t,s_t=q_j,\lambda)P(o_1,\cdots,o_t,s_t=q_j|\lambda)\\ &=\sum_jP(s_{t+1}=q_i,o_{t+1}|s_t=q_j,\lambda)\alpha_t(j)\\ &=\sum_jP(o_{t+1}|s_t=q_j,s_{t+1}=q_i,\lambda)P(s_{t+1}=q_i|s_t=q_j,\lambda)\alpha_t(j)\\ &=\left[\sum_{j} \alpha_t(j) a_{ji}\right] b_i(o_{t+1}) \end{aligned}$$
The recursion computes $P(O|\lambda)$ over the lattice of state-sequence paths; by storing the solutions of subproblems it avoids repeated computation and thus speeds up the evaluation.
In matrix form,
$$\boldsymbol\alpha_1=\boldsymbol\pi\odot\boldsymbol B_{o_1},\qquad \boldsymbol\alpha_{t+1}=(\boldsymbol\alpha_t^TA)\odot\boldsymbol B_{o_{t+1}}$$
Iterating to the last step yields $\alpha_T(i)$, and therefore $P(O|\lambda)=\sum_{i}\alpha_T(i)$.
If the model $\lambda$ has $N$ hidden states and the observation sequence $O$ has length $T$, the time complexity of computing $P(O|\lambda)$ is $O(N^2T)$.
Python implementation
```python
import numpy as np

def forward_HMM(O, PI, A, B):
    """
    Forward algorithm: given the model, compute the probability of the observation sequence.
    :param O: 1D, observation sequence (integer symbol indices)
    :param PI: 1D, initial state probability vector
    :param A: 2D, state transition matrix
    :param B: 2D, observation probability matrix
    :return: float, probability of O
    """
    PI = np.asarray(PI).ravel()
    A = np.asarray(A)
    B = np.asarray(B)
    # forward probabilities at step 1
    alphas = B[:, O[0]] * PI
    # forward probabilities at steps 2..T
    for index in O[1:]:
        alphas = np.dot(alphas, A) * B[:, index]
    # sum the forward probabilities of all hidden states at the last step
    return alphas.sum()

if __name__ == '__main__':
    # initial state probability vector
    PI = [0.2, 0.4, 0.4]
    # state transition matrix, N x N for N hidden states
    A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
    # observation probability matrix, N x M for N hidden states and M observation symbols
    B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
    # observation sequence
    O = [0, 1, 0]
    print(forward_HMM(O, PI, A, B))
```
Backward Recursion
Given the model $\lambda$, the backward probability is the probability, given that the hidden state at time $t$ is $q_i$, of observing $o_{t+1}, \cdots, o_T$ from time $t+1$ to the end:
$$\beta_t(i) = P(o_{t+1},o_{t+2},\cdots,o_T|s_t = q_i, \lambda),\quad P(O|\lambda) = \sum_{i}\pi_i b_i(o_1) \beta_1(i),\quad \beta_{T}(i) = 1$$
By the homogeneous Markov assumption and the observation independence assumption, the backward probability satisfies the recursion
$$\begin{aligned}\beta_t(i) & = \sum_{j}P(o_{t+1},\cdots,o_T,s_{t+1}=q_j|s_t = q_i, \lambda) \\ & = \sum_{j}P(o_{t+1},\cdots,o_T|s_t = q_i,s_{t+1}=q_j, \lambda)\cdot P(s_{t+1}=q_j|s_t =q_i, \lambda) \\ & = \sum_{j}a_{ij}\cdot P(o_{t+1},\cdots,o_T| s_{t+1}=q_j, \lambda)\\ & = \sum_{j}a_{ij}\cdot P(o_{t+1}|o_{t+2},\cdots,o_T,s_{t+1}=q_j,\lambda)\cdot P(o_{t+2}, \cdots, o_T|s_{t+1}=q_j, \lambda)\\ & = \sum_{j}a_{ij}\cdot P(o_{t+1}|s_{t+1}=q_j,\lambda)\cdot P(o_{t+2},\cdots, o_T|s_{t+1}=q_j, \lambda) \\ & = \sum_{j}a_{ij}\cdot b_j(o_{t+1})\cdot \beta_{t+1}(j) \end{aligned}$$
Relation between the forward and backward algorithms
$$\begin{aligned} P(O|\lambda) & = \sum_{i}P(o_1, \cdots, o_t, s_t=q_i, o_{t+1}, \cdots, o_T|\lambda)\\ & = \sum_{i}P(o_{t+1}, \cdots, o_T | o_1, \cdots, o_t , s_t= q_i, \lambda)\cdot P(o_1, \cdots,o_t,s_t=q_i |\lambda) \\ & = \sum_{i}P(o_{t+1}, \cdots, o_T|s_t=q_i, \lambda)\cdot P(o_1, \cdots, o_t, s_t=q_i | \lambda) \\ & = \sum_{i}\alpha_t(i)\beta_t(i)=\sum_iP(s_t=q_i, O|\lambda) \end{aligned}$$
Taking $t=T$ (where $\beta_T(i)=1$) recovers the forward formula $P(O|\lambda)=\sum_i\alpha_T(i)$, and taking $t=1$ recovers the backward formula $P(O|\lambda)=\sum_i\pi_ib_i(o_1)\beta_1(i)$.
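A minimal sketch of the backward algorithm, mirroring the `forward_HMM` implementation above; it also verifies that $\sum_i \alpha_t(i)\beta_t(i)$ gives the same $P(O|\lambda)$ at every $t$.

```python
import numpy as np

def backward_HMM(O, PI, A, B):
    """Backward algorithm: return the full (T x N) beta matrix and P(O|lambda)."""
    PI, A, B = np.asarray(PI), np.asarray(A), np.asarray(B)
    N, T = A.shape[0], len(O)
    betas = np.ones((T, N))                      # beta_T(i) = 1
    for t in range(T - 2, -1, -1):               # beta_t(i) = sum_j a_ij b_j(o_{t+1}) beta_{t+1}(j)
        betas[t] = A @ (B[:, O[t + 1]] * betas[t + 1])
    prob = np.sum(PI * B[:, O[0]] * betas[0])    # P(O|lambda) = sum_i pi_i b_i(o_1) beta_1(i)
    return betas, prob

def forward_all(O, PI, A, B):
    """Forward algorithm keeping every alpha_t, for the check below."""
    PI, A, B = np.asarray(PI), np.asarray(A), np.asarray(B)
    N, T = A.shape[0], len(O)
    alphas = np.zeros((T, N))
    alphas[0] = PI * B[:, O[0]]
    for t in range(1, T):
        alphas[t] = (alphas[t - 1] @ A) * B[:, O[t]]
    return alphas

if __name__ == '__main__':
    PI = [0.2, 0.4, 0.4]
    A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
    B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
    O = [0, 1, 0]
    betas, prob = backward_HMM(O, PI, A, B)
    alphas = forward_all(O, PI, A, B)
    print(prob)
    print((alphas * betas).sum(axis=1))   # constant: equals P(O|lambda) for every t
```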
Some useful probability formulas
Given the model $\lambda$ and the observation sequence $O$, the probability of being in state $q_i$ at time $t$ is denoted
$$\gamma_t(i) = P(s_t =q_i | O, \lambda) = \frac{P(s_t=q_i,O | \lambda)}{P(O|\lambda)}=\frac{\alpha_t(i)\beta_t(i)}{\displaystyle\sum_{j}\alpha_t(j)\beta_t(j)}$$
Given the model $\lambda$ and the observation sequence $O$, the probability of being in state $q_i$ at time $t$ and in state $q_j$ at time $t+1$ is denoted
$$\xi_t(i, j) = P(s_t=q_i, s_{t+1}=q_j|O, \lambda) = \frac{P(s_t=q_i, s_{t+1}=q_j,O| \lambda)}{\displaystyle\sum_i\sum_jP(s_t=q_i, s_{t+1}=q_j, O|\lambda)}$$
where $P(s_t=q_i, s_{t+1}=q_j, O|\lambda)=\alpha_t(i)a_{ij}b_j(o_{t+1})\beta_{t+1}(j)$.
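These two quantities are what the Baum-Welch algorithm below needs. A minimal sketch computing $\gamma$ and $\xi$ from the forward and backward matrices, reusing the three-box parameters as toy inputs:

```python
import numpy as np

def gamma_xi(alphas, betas, A, B, O):
    """Compute gamma_t(i) and xi_t(i, j) from precomputed (T x N) alpha/beta matrices."""
    A, B = np.asarray(A), np.asarray(B)
    T, N = alphas.shape
    prob = alphas[-1].sum()                    # P(O|lambda) = sum_i alpha_T(i)
    gamma = alphas * betas / prob              # gamma_t(i) = alpha_t(i) beta_t(i) / P(O|lambda)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        # xi_t(i, j) = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O|lambda)
        xi[t] = alphas[t][:, None] * A * (B[:, O[t + 1]] * betas[t + 1])[None, :] / prob
    return gamma, xi

if __name__ == '__main__':
    PI = np.array([0.2, 0.4, 0.4])
    A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
    B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
    O = [0, 1, 0]
    T, N = len(O), A.shape[0]
    alphas, betas = np.zeros((T, N)), np.ones((T, N))
    alphas[0] = PI * B[:, O[0]]
    for t in range(1, T):
        alphas[t] = (alphas[t - 1] @ A) * B[:, O[t]]
    for t in range(T - 2, -1, -1):
        betas[t] = A @ (B[:, O[t + 1]] * betas[t + 1])
    gamma, xi = gamma_xi(alphas, betas, A, B, O)
    print(gamma.sum(axis=1))      # each row sums to 1
    print(xi.sum(axis=(1, 2)))    # each time slice sums to 1
```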
HMM Learning
Problem: given the observation sequence $O = (o_1, o_2, \cdots, o_T)$, find the most likely HMM parameters $\lambda=(A,B,\Pi)$.
Supervised learning
With enough labelled data, i.e. knowing the number of occurrences $\#(m_j)$ of each hidden state $m_j$ and the number of times $\#(v_k,m_j)$ it emits observation symbol $v_k$, the parameters can be estimated by counting:
$$a_{ij}\approx\frac{\#(m_i,m_j)}{\#(m_i)},\quad b_j(k)\approx\frac{\#(v_k,m_j)}{\#(m_j)},\quad \pi_i\approx\frac{\#(m_i)}{\displaystyle\sum_k \#(m_k)}$$
Many applications cannot provide such labels. For example, when training the acoustic model for speech recognition, a human cannot determine the state sequence that produced a given utterance.
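When labelled state sequences are available, the estimates above reduce to simple counting. A minimal sketch with made-up labelled data (2 hidden states, 2 observation symbols; the sequences below are illustrative only):

```python
import numpy as np

# Made-up fully labelled data: hidden states S and their observations O.
S = [0, 0, 1, 1, 0, 1, 1, 0, 0, 1]
O = [0, 1, 1, 0, 0, 1, 1, 1, 0, 0]
N, M = 2, 2

A_hat = np.zeros((N, N))
B_hat = np.zeros((N, M))
for i, j in zip(S[:-1], S[1:]):
    A_hat[i, j] += 1                            # #(m_i, m_j)
for j, k in zip(S, O):
    B_hat[j, k] += 1                            # #(v_k, m_j)
A_hat /= A_hat.sum(axis=1, keepdims=True)       # a_ij   ~ #(m_i, m_j) / #(m_i)
B_hat /= B_hat.sum(axis=1, keepdims=True)       # b_j(k) ~ #(v_k, m_j) / #(m_j)
pi_hat = np.bincount(S, minlength=N) / len(S)   # pi_i   ~ #(m_i) / sum_k #(m_k)
print(A_hat, B_hat, pi_hat, sep="\n")
```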
Expectation-Maximization Algorithm
The HMM probability model is
$$P(O|\lambda)=\sum_SP(O|S, \lambda)P(S|\lambda)$$
The Q function of the EM algorithm is
$$Q(\lambda, \lambda')=\sum_SP(S|O,\lambda')\ln P(O,S|\lambda)\propto\sum_S P(O,S|\lambda')\ln P(O,S|\lambda)$$
Using the joint distribution of the state sequence and the observation sequence (the subscript $i_t$ denotes an arbitrary hidden state index),
$$P(O,S|\lambda)=\pi_{i_1}b_{i_1}(o_1)a_{i_1i_2}b_{i_2}(o_2)\cdots a_{i_{T-1}i_T}b_{i_T}(o_T)$$
we obtain
$$Q(\lambda, \lambda')=\sum_SP(O,S|\lambda')\ln\pi_{i_1}+ \sum_SP(O,S|\lambda')\sum_{t=1}^{T-1}\ln a_{i_{t}i_{t+1}}+ \sum_SP(O,S|\lambda')\sum_{t=1}^T\ln b_{i_t}(o_t)$$
In this expression,
$$\begin{aligned} & \sum_SP(O,S|\lambda')\ln \pi_{i_1}=\sum_iP(O,s_1=q_i|\lambda')\ln\pi_{i},\quad\sum_i\pi_i=1\\ &\sum_SP(O,S|\lambda')\sum_{t=1}^{T-1}\ln a_{i_{t}i_{t+1}}=\sum_i\sum_j\sum_{t=1}^{T-1}P(O,s_t=q_i,s_{t+1}=q_j|\lambda')\ln a_{ij}\\ & \sum_SP(O,S|\lambda')\sum_{t=1}^T\ln b_{i_t}(o_t)=\sum_j\sum_{t=1}^TP(O,s_t=q_j|\lambda')\ln b_j(o_t) \end{aligned}$$
Setting the partial derivatives with respect to $\pi_i$, $a_{ij}$ and $b_j(k)$ to zero (with Lagrange multipliers for the normalization constraints, and using the probability formulas of the previous section) gives
$$\pi_i = \frac{P(O, s_1=q_i|\lambda')}{P(O|\lambda')}=\gamma_1(i),\quad a_{ij}=\frac{\sum_{t=1}^{T-1}\xi_t(i,j)}{\sum_{t=1}^{T-1}\gamma_t(i)},\quad b_j(k)=\frac{\sum_{t=1,o_t=v_k}^T\gamma_t(j)}{\sum_{t=1}^T\gamma_t(j)}$$
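Putting the E step (computing $\gamma$ and $\xi$) and the M step (the re-estimation formulas above) together gives one Baum-Welch iteration. A minimal single-sequence sketch, initialized with the three-box parameters and an arbitrary observation sequence; $P(O|\lambda)$ should be non-decreasing over iterations.

```python
import numpy as np

def baum_welch_step(O, PI, A, B):
    """One Baum-Welch (EM) re-estimation step, a sketch of the updates above:
    pi_i = gamma_1(i),  a_ij = sum_t xi_t(i,j) / sum_t gamma_t(i),
    b_j(k) = sum_{t: o_t = v_k} gamma_t(j) / sum_t gamma_t(j)."""
    PI, A, B = np.asarray(PI, float), np.asarray(A, float), np.asarray(B, float)
    N, T = A.shape[0], len(O)
    # E step: forward/backward matrices, then gamma and xi
    alphas, betas = np.zeros((T, N)), np.ones((T, N))
    alphas[0] = PI * B[:, O[0]]
    for t in range(1, T):
        alphas[t] = (alphas[t - 1] @ A) * B[:, O[t]]
    for t in range(T - 2, -1, -1):
        betas[t] = A @ (B[:, O[t + 1]] * betas[t + 1])
    prob = alphas[-1].sum()                      # P(O|lambda) under the current parameters
    gamma = alphas * betas / prob
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alphas[t][:, None] * A * (B[:, O[t + 1]] * betas[t + 1])[None, :] / prob
    # M step: re-estimate the parameters
    PI_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        B_new[:, k] = gamma[np.array(O) == k].sum(axis=0) / gamma.sum(axis=0)
    return PI_new, A_new, B_new, prob

if __name__ == '__main__':
    PI = [0.2, 0.4, 0.4]
    A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
    B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
    O = [0, 1, 0, 0, 1]                          # an arbitrary toy observation sequence
    for _ in range(10):
        PI, A, B, prob = baum_welch_step(O, PI, A, B)
        print(prob)                              # non-decreasing over iterations
```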
HMM Prediction / Decoding
Given the model $\lambda=(A,B,\Pi)$ and the observation sequence $O = (o_1, o_2, \cdots, o_T)$, find the most likely hidden state sequence $S$, i.e. the one maximizing $P(S|O, \lambda)$.
Greedy approximate algorithm
Given $\lambda$ and the observation sequence $O$, the probability of being in state $q_i$ at time $t$ is
$$\gamma_t(i)=P(s_t=q_i | O, \lambda) = \frac{\alpha_t(i)\beta_t(i)}{\sum_{j}\alpha_t(j)\beta_t(j)}$$
At every time $t$ choose the individually most probable state $s_t^*$, which yields the state sequence $S^*$:
$$S^*=(s_1^*,s_2^*,\cdots),\quad s_t^*=q_{k^*},\ k^* = \arg\max_k\gamma_t(k)$$
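A minimal sketch of this greedy rule using the three-box parameters; note that choosing each state independently ignores the transition probabilities between the chosen states, which motivates the Viterbi algorithm below.

```python
import numpy as np

def posterior_decode(O, PI, A, B):
    """Greedy approximate decoding: pick s_t* = argmax_i gamma_t(i) at each time step."""
    PI, A, B = np.asarray(PI), np.asarray(A), np.asarray(B)
    N, T = A.shape[0], len(O)
    alphas, betas = np.zeros((T, N)), np.ones((T, N))
    alphas[0] = PI * B[:, O[0]]
    for t in range(1, T):
        alphas[t] = (alphas[t - 1] @ A) * B[:, O[t]]
    for t in range(T - 2, -1, -1):
        betas[t] = A @ (B[:, O[t + 1]] * betas[t + 1])
    gamma = alphas * betas / alphas[-1].sum()
    return gamma.argmax(axis=1)          # most probable state index at each t

if __name__ == '__main__':
    PI = [0.2, 0.4, 0.4]
    A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
    B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
    print(posterior_decode([0, 1, 0], PI, A, B))   # 0-indexed state per time step
```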
Viterbi Algorithm
The dynamic programming idea: any sub-path of an optimal path must itself be optimal. For observations $o_1,\cdots,o_t$, let $\delta_t(i)$ be the maximum probability over all paths that end in state $s_t=q_i$:
$$\delta_t(i) = \max_{s_1,\cdots,s_{t-1}}P(s_t=q_i, s_{t-1}, \cdots, s_1, o_t, \cdots, o_1|\lambda)$$
The recursion is
$$\delta_{t+1}(i)=\max_j\big[\delta_t(j)a_{ji}\big]\,b_i(o_{t+1}),\quad \delta_1(i) = \pi_ib_i(o_1)$$
Define the $t$-th node of the maximum-probability path that is in state $q_i$ at time $t+1$ as
$$i_{t} = \psi_{t+1}(i) = \arg\max_{j}\delta_{t}(j)a_{ji},\quad i_T=\arg\max_{i}\delta_T(i)$$
The probability of the optimal path is then $P^*=\max_{i}\delta_T(i)$, and maximizing it is equivalent to maximizing $P(S|O,\lambda)$.
For example (as illustrated in the original figure),
$$\delta_3(1)=\max\{\delta_2(1)a_{11}b_{1}(o_3), \,\,\delta_2(2)a_{21}b_1(o_3),\,\, \delta_2(3)a_{31}b_1(o_3)\}$$
Example: using the model $\lambda = (A, B, \Pi)$ from the box-and-ball example above and the observation sequence $O=(\text{red}, \text{white}, \text{red})$, find the optimal state sequence.
I. Initialization
At time $t=1$, the probability of each hidden state $q_i$ emitting red is
$$\delta_1(1)=0.2\times 0.5=0.1, \quad \delta_1(2)=0.4\times 0.4=0.16, \quad \delta_1(3)=0.4\times 0.7=0.28, \quad \psi_1(i)=0$$
II. Recursion
At time $t=2$, the maximum probability of being in state $q_1$ and observing white is
$$\delta_2(1)=\max_{1\leq j \leq 3}[\delta_1(j)a_{j1}]\,b_1(o_2) = \max\{0.1\times 0.5,\ 0.16\times 0.3,\ 0.28\times 0.2\}\times 0.5 = 0.028, \quad \psi_2(1)=3$$
Similarly,
$$\delta_2(2)=0.0504,\ \psi_2(2)=3; \quad \delta_2(3)=0.042,\ \psi_2(3)=3.$$
At time $t=3$, the maximum probabilities of being in each state $q_i$ and observing red are
$$\delta_3(1)=0.00756,\ \psi_3(1)=2,\quad \delta_3(2)=0.01008,\ \psi_3(2)=2,\quad \delta_3(3)=0.0147,\ \psi_3(3)=3.$$
III. Optimal path
$$P^* = \max_{1\leq i \leq 3} \delta_3(i)=0.0147$$
Therefore $i_3 = 3$, $i_2 = \psi_3(i_3)=3$, $i_1 = \psi_2(i_2)=3$, and the optimal state sequence is $I=(i_1, i_2, i_3)=(3,3,3)$.
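A minimal Viterbi sketch that reproduces the worked example above (states are 0-indexed, so index 2 corresponds to $q_3$):

```python
import numpy as np

def viterbi(O, PI, A, B):
    """Viterbi decoding: delta holds the best path probability ending in each state,
    psi stores the back-pointers used to recover the optimal path."""
    PI, A, B = np.asarray(PI), np.asarray(A), np.asarray(B)
    N, T = A.shape[0], len(O)
    delta = PI * B[:, O[0]]                      # delta_1(i) = pi_i b_i(o_1)
    psi = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        trans = delta[:, None] * A               # entry (j, i) = delta_t(j) * a_ji
        psi[t] = trans.argmax(axis=0)            # psi_{t+1}(i): best previous state j
        delta = trans.max(axis=0) * B[:, O[t]]   # delta_{t+1}(i)
    best_prob = float(delta.max())
    path = [int(delta.argmax())]                 # i_T = argmax_i delta_T(i)
    for t in range(T - 1, 0, -1):                # backtrack: i_t = psi_{t+1}(i_{t+1})
        path.append(int(psi[t, path[-1]]))
    return best_prob, path[::-1]

if __name__ == '__main__':
    PI = [0.2, 0.4, 0.4]
    A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
    B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
    O = [0, 1, 0]                                # red, white, red
    print(viterbi(O, PI, A, B))                  # approx. 0.0147 and path [2, 2, 2], i.e. boxes (3, 3, 3)
```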
In the following, let the hidden state sequence be $\boldsymbol s=(s_1, \cdots, s_n)$ and the observation sequence $\boldsymbol o=(o_1, \cdots, o_n)$.
Limitations of HMM
An HMM models the joint probability distribution $P(S, O)$ (parameterized by $\lambda$); the decoding/prediction problem is to find the state sequence $\boldsymbol s$ that maximizes $P(\boldsymbol s|\boldsymbol o, \lambda)$.
In an HMM, $s_i$ depends only on $s_{i-1}$ and $o_i$ depends only on $s_i$. When the observations are described by many features, as in an NER task where the tag $s_i$ depends not only on $o_i$ but also on the surrounding observations $o_j\ (j\neq i)$, e.g. the capitalization and part of speech of nearby words, the HMM cannot handle such tasks.