The learning problem for an HMM is: given an observation sequence, estimate the model parameters $\lambda = [\mathbf{A}, \mathbf{B}, \boldsymbol{\pi}]$ so that $P(O|\lambda)$ is maximized. When both the observation sequence and the hidden state sequence are known, the parameters are relatively easy to estimate. The focus of this section is whether the parameters can still be estimated when only the observation sequence is known; the main approach is the Baum-Welch algorithm.
Principles of the Baum-Welch Algorithm
Given an observation sequence, we want to estimate the model parameters. Denote the unobservable state sequence (the hidden data) by $I$ and the observable data sequence by $O$. The HMM is then a probabilistic model with hidden data, and the Baum-Welch parameter updates can be derived with the EM algorithm. EM alternates between an E-step, which computes an expectation, and an M-step, which maximizes it. Before the first E-step the parameters must be initialized; note that different initializations lead to different parameter estimates, so there is no closed-form solution — the algorithm converges to a local optimum.
We know:

$$P(O|\lambda ) = \sum\limits_I P(O|I,\lambda )\, P(I|\lambda )$$
In the E-step, compute the $Q$ function:

$$Q(\lambda ,\bar \lambda ) = \sum\limits_I \log P(O,I|\lambda )\, P(O,I|\bar \lambda )$$
In the formula above, $\lambda$ is the model parameter to be maximized over, and $\bar \lambda$ is the current estimate of the model parameters.
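For a small model, $Q(\lambda, \bar\lambda)$ can also be evaluated by direct enumeration of $I$, using the factorization $P(O,I|\lambda) = \pi_{i_1} b_{i_1}(o_1) a_{i_1 i_2} b_{i_2}(o_2) \cdots$. A sketch with toy (hypothetical) parameters:

```python
import itertools
import math
import numpy as np

def joint(A, B, pi, O, I):
    """P(O, I | lambda) for a discrete HMM."""
    p = pi[I[0]] * B[I[0], O[0]]
    for t in range(1, len(O)):
        p *= A[I[t - 1], I[t]] * B[I[t], O[t]]
    return p

def Q(lam, lam_bar, O):
    """Q(lambda, lambda_bar) = sum_I log P(O,I|lambda) * P(O,I|lambda_bar)."""
    N, T = len(lam[2]), len(O)
    return sum(math.log(joint(*lam, O, I)) * joint(*lam_bar, O, I)
               for I in itertools.product(range(N), repeat=T))

# Toy parameters (hypothetical); lam and lam_bar coincide here for illustration.
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
lam = (A, B, pi)
O = [0, 1, 0]
print(Q(lam, lam, O))
```

In the M-step, maximizing this quantity over the first argument while holding `lam_bar` fixed yields the update formulas derived below.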
In the M-step, maximize the function $Q(\lambda ,\bar \lambda )$ to obtain the model parameters $\lambda = [\mathbf{A},\mathbf{B},\boldsymbol{\pi}]$:

$$\bar \lambda = \arg \max_\lambda \sum\limits_I \log P(O,I|\lambda )\, P(O,I|\bar \lambda )$$
Derivation of the Baum-Welch Algorithm
$Q(\lambda ,\bar \lambda )$ can be expanded as follows. Substituting $P(O,I|\lambda) = \pi_{i_1} b_{i_1}(o_1)\, a_{i_1 i_2} b_{i_2}(o_2) \cdots a_{i_{T-1} i_T} b_{i_T}(o_T)$ and taking the logarithm gives:

$$\begin{aligned} Q\left( \lambda ,\bar \lambda \right) =\ & \sum\limits_I \log \pi_{i_1}\, P(O,I|\bar \lambda ) \\ & + \sum\limits_I \left( \sum\limits_{t = 1}^{T - 1} \log a_{i_t i_{t+1}} \right) P(O,I|\bar \lambda ) + \sum\limits_I \left( \sum\limits_{t = 1}^{T} \log b_{i_t}(o_t) \right) P(O,I|\bar \lambda ) \end{aligned}$$
To maximize this expression, note that it is a sum of three terms, each involving a disjoint set of parameters ($\boldsymbol{\pi}$, $\mathbf{A}$, and $\mathbf{B}$ respectively); maximizing each term separately therefore maximizes the whole.
The first term can be simplified as follows:

$$\sum\limits_I \log \pi_{i_1}\, P(O,I|\bar \lambda ) = \sum\limits_{i = 1}^N \log \pi_i\, P(O,i_1 = i|\bar \lambda )$$
Since $\pi_i$ satisfies the constraint $\sum\limits_{i = 1}^N \pi_i = 1$, we can conveniently take derivatives using the method of Lagrange multipliers. The Lagrangian is:
$$\sum\limits_{i = 1}^N \log \pi_i\, P(O,i_1 = i|\bar \lambda ) + \gamma \left( \sum\limits_{i = 1}^N \pi_i - 1 \right)$$
Setting the partial derivative of this expression with respect to $\pi_i$ to zero gives:
$$\frac{\partial }{\partial \pi_i}\left[ \sum\limits_{i = 1}^N \log \pi_i\, P(O,i_1 = i|\bar \lambda ) + \gamma \left( \sum\limits_{i = 1}^N \pi_i - 1 \right) \right] = 0$$
which simplifies to:

$$P(O,i_1 = i|\bar \lambda ) + \gamma \pi_i = 0$$
Summing this equation over $i$ and using $\sum_i \pi_i = 1$ yields $\gamma$:

$$\gamma = - P(O|\bar \lambda )$$
Substituting back, we finally obtain:

$$\pi_i = \frac{P(O,i_1 = i|\bar \lambda )}{P(O|\bar \lambda )}$$
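The Lagrangian calculation above can be checked symbolically for a small $N$. The sketch below takes $N = 3$, with symbolic constants $c_i$ standing in for $P(O, i_1 = i|\bar\lambda)$, and verifies that the claimed solution $\pi_i = c_i / \sum_j c_j$, $\gamma = -\sum_j c_j$ makes every stationarity condition of the Lagrangian vanish:

```python
import sympy as sp

# c_i stands in for the constant P(O, i_1 = i | lambda_bar) in the derivation.
c1, c2, c3 = sp.symbols('c1 c2 c3', positive=True)
pi1, pi2, pi3 = sp.symbols('pi1 pi2 pi3', positive=True)
gamma = sp.symbols('gamma')

# Lagrangian: sum_i log(pi_i) * c_i + gamma * (sum_i pi_i - 1)
L = (sp.log(pi1)*c1 + sp.log(pi2)*c2 + sp.log(pi3)*c3
     + gamma*(pi1 + pi2 + pi3 - 1))

# Claimed optimum: pi_i = c_i / sum(c), gamma = -sum(c).
total = c1 + c2 + c3
subs = {pi1: c1/total, pi2: c2/total, pi3: c3/total, gamma: -total}

# All partial derivatives (w.r.t. pi_i and gamma) vanish at the claimed optimum.
for var in (pi1, pi2, pi3, gamma):
    assert sp.simplify(sp.diff(L, var).subs(subs)) == 0
print("stationarity verified")
```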
The second term can be simplified as follows:

$$\sum\limits_I \sum\limits_{t = 1}^{T - 1} \log a_{i_t i_{t+1}}\, P(O,I|\bar \lambda ) = \sum\limits_{i = 1}^N \sum\limits_{j = 1}^N \sum\limits_{t = 1}^{T - 1} \log a_{ij}\, P(O,i_t = i,i_{t + 1} = j|\bar \lambda )$$
As with the first term, $a_{ij}$ satisfies the constraint $\sum\limits_{j = 1}^N a_{ij} = 1$. Proceeding as in the solution for $\pi_i$, we apply the Lagrange multiplier method, differentiate with respect to $a_{ij}$, and set the result to zero, which yields the iterative update expression for $a_{ij}$:

$$a_{ij} = \frac{\sum\limits_{t = 1}^{T - 1} P(O,i_t = i,i_{t + 1} = j|\bar \lambda )}{\sum\limits_{t = 1}^{T - 1} P(O,i_t = i|\bar \lambda )}$$
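The update formulas above can be turned into a working iteration: in practice the joint probabilities $P(O, i_t = i|\bar\lambda)$ and $P(O, i_t = i, i_{t+1} = j|\bar\lambda)$ are computed with the forward-backward algorithm rather than by enumerating $I$. A minimal sketch follows (toy data; the initial parameter values are hypothetical; the update for $b_i(o)$, derived the same way as the one for $a_{ij}$, is included for completeness):

```python
import numpy as np

def baum_welch_step(A, B, pi, O):
    """One EM iteration of Baum-Welch for a discrete HMM."""
    N, T = A.shape[0], len(O)

    # Forward pass: alpha[t, i] = P(o_1..o_t, i_t = i | lambda_bar)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]

    # Backward pass: beta[t, i] = P(o_{t+1}..o_T | i_t = i, lambda_bar)
    beta = np.zeros((T, N))
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])

    # gamma[t, i] = P(O, i_t = i | lambda_bar)
    # xi[t, i, j] = P(O, i_t = i, i_{t+1} = j | lambda_bar)
    gamma = alpha * beta
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, O[1:]].T * beta[1:])[:, None, :])

    # M-step updates; the common factor P(O|lambda_bar) cancels in each ratio.
    new_pi = gamma[0] / gamma[0].sum()
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[np.array(O) == k].sum(axis=0) / gamma.sum(axis=0)
    return new_A, new_B, new_pi

# Toy run with hypothetical initial parameters.
A = np.array([[0.6, 0.4], [0.5, 0.5]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
pi = np.array([0.5, 0.5])
O = [0, 0, 1, 0, 1, 1, 0]
for _ in range(20):
    A, B, pi = baum_welch_step(A, B, pi, O)
print(np.round(A, 3), np.round(pi, 3))
```

Because EM only guarantees a local optimum, the result depends on the initial `A`, `B`, `pi`, which is exactly the initialization sensitivity noted earlier.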