HMM
Let $Z$ denote the hidden state sequence and $Q$ the set of values a state can take. Each $z_t$ is a discrete random variable with $m$ possible values:
$$Z = z_1, z_2, \dots, z_T, \qquad Q = \{q_1, q_2, \dots, q_m\}$$
Let $X$ denote the observation sequence and $V$ the set of values an observation can take, with $n$ possible values ($n$ need not equal the sequence length $T$):
$$X = x_1, x_2, \dots, x_T, \qquad V = \{v_1, v_2, \dots, v_n\}$$
Model representation:
$$\theta = (A, B, \pi)$$
- $A$ is the state transition matrix, of dimension $(m \times m)$:
$$A = [a_{ij}], \qquad a_{ij} = P(z_{t+1} = q_j \mid z_t = q_i)$$
$a_{ij}$ is the probability that the hidden state $z_t = q_i$ at time $t$ transitions to the next hidden state $z_{t+1} = q_j$.
- $B$ is the emission matrix:
$$B = [b_j(k)], \qquad b_j(k) = P(x_t = v_k \mid z_t = q_j)$$
$b_j(k)$ is the probability that, at time $t$, the observation $x_t = v_k$ is emitted by the current hidden state $z_t = q_j$.
- $\pi$ is the initial probability distribution of $Z$: $\pi_i = P(z_1 = q_i)$ is the probability that the first hidden state $z_1$ takes the value $q_i$. Its dimension is $(1 \times m)$:
$$\pi = [\pi_1, \pi_2, \pi_3, \dots, \pi_m]$$
where $\pi_1 + \pi_2 + \pi_3 + \dots + \pi_m = 1$.
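To make the three parameters concrete, here is a minimal sketch of a toy model in plain Python. The numbers, the choice of 2 hidden states and 3 observation symbols, and the helper names `is_distribution` and `is_valid_hmm` are all illustrative assumptions, not part of the text above. The check simply confirms that $\pi$ and every row of $A$ and $B$ sum to 1.

```python
import math

# Toy model (all numbers are illustrative assumptions):
# m = 2 hidden states, 3 observation symbols.
pi = [0.6, 0.4]                 # pi[i]   = P(z_1 = q_i)
A = [[0.7, 0.3],                # A[i][j] = P(z_{t+1} = q_j | z_t = q_i)
     [0.4, 0.6]]
B = [[0.5, 0.4, 0.1],           # B[j][k] = P(x_t = v_k | z_t = q_j)
     [0.1, 0.3, 0.6]]


def is_distribution(p, tol=1e-9):
    """True if p is non-negative and sums to 1 (within tol)."""
    return all(v >= 0 for v in p) and math.isclose(sum(p), 1.0, abs_tol=tol)


def is_valid_hmm(pi, A, B):
    """Check shapes, and that pi and every row of A and B are distributions."""
    m = len(pi)
    shapes_ok = len(A) == m and len(B) == m and all(len(row) == m for row in A)
    return shapes_ok and is_distribution(pi) and all(
        is_distribution(row) for row in A + B)


print(is_valid_hmm(pi, A, B))
```

Each row of $A$ and $B$ is a conditional distribution given one hidden state, which is why the rows, not the columns, must sum to 1.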
Two assumptions
- Homogeneous Markov assumption
$$P(z_{t+1} \mid z_1, z_2, \dots, z_t, x_1, x_2, \dots, x_t) = P(z_{t+1} \mid z_t)$$
The hidden state at time $t+1$ depends only on the hidden state at time $t$.
- Observation independence assumption
$$P(x_t \mid z_1, z_2, \dots, z_t, x_1, x_2, \dots, x_{t-1}) = P(x_t \mid z_t)$$
The observation at time $t$ depends only on the hidden state at time $t$.
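The two assumptions also describe how an HMM generates data: each new hidden state is drawn using only the previous one, and each observation is drawn using only the current hidden state. The sketch below samples a sequence that way. The toy parameters, the function name `sample_hmm`, and the fixed seed are assumptions made for illustration.

```python
import random

# Toy parameters (illustrative assumptions): 2 hidden states, 3 symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]


def sample_hmm(pi, A, B, T, seed=0):
    """Draw hidden states Z and observations X of length T (as indices)."""
    rng = random.Random(seed)
    states = list(range(len(pi)))
    symbols = list(range(len(B[0])))
    Z, X = [], []
    for t in range(T):
        # Markov assumption: z_t depends only on z_{t-1} (or on pi at t = 0).
        weights = pi if t == 0 else A[Z[-1]]
        z = rng.choices(states, weights=weights)[0]
        # Observation independence: x_t depends only on z_t.
        x = rng.choices(symbols, weights=B[z])[0]
        Z.append(z)
        X.append(x)
    return Z, X


Z, X = sample_hmm(pi, A, B, T=5)
print(Z, X)
```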
Three problems
Here $\lambda$, $O$ and $I$ denote the model parameters, the observation sequence and the hidden state sequence, i.e. $\theta$, $X$ and $Z$ above.
Evaluation: given $\lambda$, compute $P(O \mid \lambda)$ using the forward-backward algorithm.
Learning: $\lambda_{MLE} = \arg\max_{\lambda} P(O \mid \lambda)$, solved with the EM algorithm.
Decoding: $\hat{I} = \arg\max_{I} P(I \mid O, \lambda)$.
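For the evaluation problem, the forward half of the forward-backward algorithm is already enough to obtain $P(O \mid \lambda)$. The following is a rough sketch of that forward pass only, on assumed toy parameters. The name `forward_probability` is hypothetical, and no scaling is applied, so it would underflow on long sequences.

```python
# Toy parameters (illustrative assumptions): 2 hidden states, 3 symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]


def forward_probability(pi, A, B, X):
    """Return P(X | theta) by summing over all hidden paths step by step."""
    m = len(pi)
    # alpha[i] = P(x_1, ..., x_t, z_t = q_i); start with t = 1.
    alpha = [pi[i] * B[i][X[0]] for i in range(m)]
    for x in X[1:]:
        # Sum over the previous state i, then emit x from the new state j.
        alpha = [sum(alpha[i] * A[i][j] for i in range(m)) * B[j][x]
                 for j in range(m)]
    return sum(alpha)


X = [0, 1, 2]  # observation indices, chosen arbitrarily for the example
print(forward_probability(pi, A, B, X))
```

Each step costs $O(m^2)$, so the whole pass is $O(T m^2)$ rather than a sum over all $m^T$ hidden paths.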
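Problem ①: given $(\pi, A, B)$, find $Z$ with the Viterbi algorithm
Given the observation sequence $X$ and $\theta$, find the hidden state sequence $Z$ that maximizes the target probability $P(Z \mid X, \theta)$.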
Because $P(X \mid \theta)$ does not depend on $Z$, maximizing $P(Z \mid X, \theta)$ is equivalent to maximizing the joint probability, which factorizes under the two assumptions as
$$P(Z \mid X, \theta) \propto P(X, Z \mid \theta) = P(z_1)\, P(x_1 \mid z_1) \prod_{t=2}^{T} P(z_t \mid z_{t-1})\, P(x_t \mid z_t)$$
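As a small illustration, the score of one candidate path can be computed directly, assuming the standard HMM factorization: one initial-state term, then one transition term and one emission term per time step. The function name `log_joint` and the toy parameters are assumptions. Logs are used so that the product becomes a sum, which is the form the dynamic programming recursion works with.

```python
import math

# Toy parameters (illustrative assumptions): 2 hidden states, 3 symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]


def log_joint(pi, A, B, Z, X):
    """log P(X, Z | theta) for state indices Z and observation indices X.

    Assumes every probability that is looked up is strictly positive.
    """
    lp = math.log(pi[Z[0]]) + math.log(B[Z[0]][X[0]])
    for t in range(1, len(Z)):
        lp += math.log(A[Z[t - 1]][Z[t]])   # transition z_{t-1} -> z_t
        lp += math.log(B[Z[t]][X[t]])       # emission of x_t from z_t
    return lp


print(log_joint(pi, A, B, Z=[0, 0, 1], X=[0, 1, 2]))
```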
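To find the hidden state sequence $Z$ by enumeration, note that there are $T$ hidden states and each takes one of $m$ values, so there are $m^T$ candidate sequences. The time complexity $O(m^T)$ makes this approach intractable for all but very short sequences.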
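For a tiny example the enumeration is still feasible, and it makes a useful reference against which a faster method can be checked. The sketch below tries every state sequence with `itertools.product`. The toy parameters and the name `brute_force_decode` are assumptions for illustration.

```python
import itertools

# Toy parameters (illustrative assumptions): 2 hidden states, 3 symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]


def brute_force_decode(pi, A, B, X):
    """Score all m**T hidden paths and return (best_path, best_probability)."""
    m, T = len(pi), len(X)
    best_path, best_prob = None, -1.0
    for Z in itertools.product(range(m), repeat=T):
        prob = pi[Z[0]] * B[Z[0]][X[0]]
        for t in range(1, T):
            prob *= A[Z[t - 1]][Z[t]] * B[Z[t]][X[t]]
        if prob > best_prob:
            best_path, best_prob = list(Z), prob
    return best_path, best_prob


path, prob = brute_force_decode(pi, A, B, X=[0, 1, 2])
print(path)  # best path is [0, 0, 1]
```

With $m = 2$ and $T = 3$ this loop visits only 8 paths, but at $T = 100$ it would already need $2^{100}$ iterations.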
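Dynamic programming
We can lay out $Z$ together with all its possible values as a lattice. The best assignment of $Z$ is then the highest-scoring path through this lattice. Let $\delta_k(i)$ be the score (log probability) of the best path from $z_1$ to $z_k$ that ends with $z_k = q_i$.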
Then $\delta_{k+1}(j)$ can be expressed as
$$\delta_{k+1}(j) = \max \begin{cases} \delta_k(1) + \log P(z_{k+1} = q_j \mid z_k = q_1) + \log P(x_{k+1} \mid z_{k+1} = q_j) \\ \delta_k(2) + \log P(z_{k+1} = q_j \mid z_k = q_2) + \log P(x_{k+1} \mid z_{k+1} = q_j) \\ \dots \\ \delta_k(m) + \log P(z_{k+1} = q_j \mid z_k = q_m) + \log P(x_{k+1} \mid z_{k+1} = q_j) \end{cases}$$
Finally, we obtain
$$\delta_{k+1}(j) = \max_{i} \Big[ \delta_k(i) + \log P(z_{k+1} = q_j \mid z_k = q_i) + \log P(x_{k+1} \mid z_{k+1} = q_j) \Big]$$
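The recursion can be sketched in a few lines of Python. The initialization $\delta_1(j) = \log \pi_j + \log P(x_1 \mid z_1 = q_j)$ and the backpointer bookkeeping used to recover the path are the usual way of completing the algorithm; they are assumptions here, since the text has not covered them yet. The toy parameters and the function name `viterbi` are also illustrative, and all probabilities are assumed strictly positive so that the logs are defined.

```python
import math

# Toy parameters (illustrative assumptions): 2 hidden states, 3 symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]


def viterbi(pi, A, B, X):
    """Return (best_path, best_log_score) for observation indices X."""
    m, T = len(pi), len(X)
    # delta[j]: best log score of any path ending in state j at the current step.
    delta = [math.log(pi[j]) + math.log(B[j][X[0]]) for j in range(m)]
    backptr = []  # backptr[t - 1][j]: best predecessor of state j at step t
    for t in range(1, T):
        new_delta, ptr = [], []
        for j in range(m):
            # max over i of delta_k(i) + log a_ij; the emission term is
            # the same for every i, so it is added after the max.
            best_i = max(range(m), key=lambda i: delta[i] + math.log(A[i][j]))
            new_delta.append(delta[best_i] + math.log(A[best_i][j])
                             + math.log(B[j][X[t]]))
            ptr.append(best_i)
        delta = new_delta
        backptr.append(ptr)
    # Backtrack from the best final state.
    last = max(range(m), key=lambda j: delta[j])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]


path, score = viterbi(pi, A, B, X=[0, 1, 2])
print(path)  # best path is [0, 0, 1]
```

Each step takes the maximum over $m$ predecessors for each of $m$ states, so the total cost is $O(T m^2)$ instead of $O(m^T)$.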
To be continued...