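Preface
This article derives the solutions to several problems of the hidden Markov model (HMM). It does not explain what a hidden Markov model or a Markov chain is.
Mathematical background: 【Review of Probability Theory and Mathematical Statistics - Bilibili】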
Derivation
Before deriving anything, we first define our variables.
Observation sequence $X$ and hidden sequence $Z$:
$$
X=(x_1,x_2,\cdots,x_T);\qquad Z=(z_1,z_2,\cdots,z_T)
$$
The subscript $T$ means the observation sequence has $T$ elements. Each $x_t$ is a separate random variable, and the same holds for the hidden sequence.
$$
x_t\in V=\{v_1,v_2,\cdots,v_m\};\qquad z_t\in Q=\{q_1,q_2,\cdots,q_n\}
$$
That is, each $x_t$ takes values in a set of $m$ observation states, and each $z_t$ takes values in a set of $n$ hidden states (we assume $z$ is discrete).
Two assumptions
The hidden Markov model makes two assumptions.
① Homogeneous Markov assumption: the current hidden state depends only on the previous hidden state. "Homogeneous" means the transition probabilities do not depend on $t$. In formula form:
$$
P(z_t\mid z_1,z_2,\cdots,z_{t-1},x_1,\cdots,x_{t-1})=P(z_t\mid z_{t-1})
$$
② Observation independence assumption: the current observation depends only on the current hidden state. In formula form:
$$
P(x_t\mid x_1,x_2,\cdots,x_{t-1},z_1,\cdots,z_t)=P(x_t\mid z_t)
$$
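Learning:
Almost every model has to learn its parameters, and learning is a prerequisite for prediction. So we start with parameter learning. First we define the model parameters: the initial probability distribution $\pi$, the transition matrix $A$, and the emission matrix $B$.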
$$
\pi=\begin{pmatrix}\pi_1&\pi_2&\cdots&\pi_n\end{pmatrix},\qquad \pi_i=P(z_1=q_i)
$$
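The transition matrix $A$ is then an $n\times n$ matrix, $A=[a_{ij}]$. Here $a_{ij}=P(z_{t+1}=q_j\mid z_t=q_i)$ is the probability of moving from state $q_i$ to state $q_j$.

The emission matrix $B$ is an $n\times m$ matrix, $B=[b_{ij}]$. Here $b_{ij}=P(x_t=v_j\mid z_t=q_i)$ is the probability that hidden state $q_i$ emits observation $v_j$.

Below, $a_{(z_t,z_{t+1})}$ and $b_{(z_t,x_t)}$ denote the entries of $A$ and $B$ selected by the actual states and observations. Likewise, $a_{(z=q_i,z=q_j)}=a_{ij}$ and $b_{(z=q_i,x=v_j)}=b_{ij}$.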
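As an illustration only, the parameters can be stored as plain nested lists. The sketch below makes three choices that are not in the text:

- It uses hypothetical toy values for $n=2$ hidden states and $m=3$ observation symbols.
- It encodes $q_i$ and $v_j$ as 0-based indices, so `A[i][j]` corresponds to $a_{(i+1)(j+1)}$.
- The names `is_valid` and `sample` are made up for this example.

The sketch checks that $\pi$ and every row of $A$ and $B$ are probability distributions. It then samples a sequence in which each draw respects the two assumptions above.

```python
import random

# Hypothetical toy parameters: n = 2 hidden states, m = 3 observation symbols.
# q_1, q_2 -> 0, 1 and v_1, v_2, v_3 -> 0, 1, 2 (0-based indices).
pi = [0.6, 0.4]                  # pi[i]   = P(z_1 = q_i)
A = [[0.7, 0.3],                 # A[i][j] = P(z_{t+1} = q_j | z_t = q_i), n x n
     [0.4, 0.6]]
B = [[0.5, 0.4, 0.1],            # B[i][k] = P(x_t = v_k | z_t = q_i), n x m
     [0.1, 0.3, 0.6]]


def is_valid(pi, A, B, tol=1e-9):
    """True if pi and every row of A and B are probability distributions."""
    rows = [pi] + A + B
    return all(min(r) >= 0 and abs(sum(r) - 1.0) < tol for r in rows)


def sample(pi, A, B, T, rng):
    """Draw a hidden path Z and observations X of length T."""
    states = range(len(pi))
    Z, X = [], []
    for t in range(T):
        if t == 0:
            z = rng.choices(states, weights=pi)[0]        # initial distribution
        else:
            z = rng.choices(states, weights=A[z])[0]      # depends only on z_{t-1}
        x = rng.choices(range(len(B[z])), weights=B[z])[0]  # depends only on z_t
        Z.append(z)
        X.append(x)
    return Z, X


Z, X = sample(pi, A, B, T=5, rng=random.Random(0))
print(is_valid(pi, A, B), Z, X)
```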
We now denote the parameters by $\theta=(\pi,A,B)$.
The observation sequence $X$ is our given training data, so the most straightforward way to find these parameters is maximum likelihood estimation:
$$
\hat\theta=\arg\max_{\theta}P(X\mid\theta)
$$
Note that $\theta$ in $P(X\mid\theta)$ is a parameter, not a random variable.
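Solving with the EM algorithm
If both $X$ and $Z$ are given, the HMM parameters can be obtained directly by maximum likelihood estimation; this is usually called supervised learning. If only $X$ is given and $Z$ is not, it is called unsupervised learning.
In the unsupervised case, HMM parameters are learned with the EM algorithm; for HMMs this is known as the Baum-Welch algorithm.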
The EM algorithm has two steps:
① E-step: given $P(Z\mid X,\theta^{t})$, compute $E_{P(Z\mid X,\theta^{t})}\left[\log P(Z,X\mid\theta)\right]$.
② M-step: compute $\theta^{t+1}=\arg\max_{\theta}E_{P(Z\mid X,\theta^{t})}\left[\log P(Z,X\mid\theta)\right]$.
Here the superscript $t$ in $\theta^t$ is the EM iteration index.
So the main task is to compute
$$
E_{P(Z\mid X,\theta^{t})}\left[\log P(Z,X\mid\theta)\right]=\sum_{Z}\log P(Z,X\mid\theta)\,P(Z\mid X,\theta^t)
$$
First we find $P(Z,X\mid\theta)$.
The observation sequence depends on the hidden sequence, so we introduce the hidden variables into the likelihood:
$$
\begin{aligned}
P(X\mid\theta)&=\sum_{Z}P(X,Z\mid\theta)\\
&=\sum_{Z}P(X\mid Z,\theta)P(Z\mid\theta)
\end{aligned}
$$
For $P(Z\mid\theta)$:
$$
\begin{aligned}
P(Z\mid\theta)&=P(z_1,z_2,\cdots,z_T\mid\theta)\\
&=P(z_T\mid z_1,z_2,\cdots,z_{T-1},\theta)P(z_1,z_2,\cdots,z_{T-1}\mid\theta)\\
&=P(z_T\mid z_{T-1},\theta)P(z_1,z_2,\cdots,z_{T-1}\mid\theta)\\
&=a_{(z_{T-1},z_T)}P(z_1,z_2,\cdots,z_{T-1}\mid\theta)
\end{aligned}
$$
The third line uses the homogeneous Markov assumption. Here $a_{(z_{T-1},z_T)}$ is the probability of moving from hidden state $z_{T-1}$ to hidden state $z_T$.

Note that $P(z_1,z_2,\cdots,z_{T-1}\mid\theta)$ differs from $P(z_1,z_2,\cdots,z_T\mid\theta)$ by only one variable. We can therefore apply the same step to $P(z_1,z_2,\cdots,z_{T-1}\mid\theta)$ recursively, and we finally obtain
$$
P(Z\mid\theta)=\pi_{z_1}\prod_{i=1}^{T-1}a_{(z_i,z_{i+1})}
$$
Here $\pi_{z_1}=P(z_1\mid\theta)$ is the initial probability. The recursion ends at $z_1$, which has no predecessor $z_0$ to condition on.
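A quick numerical check of this recursion, using hypothetical toy values for $\pi$ and $A$ with 0-based state indices. The function names are made up for this sketch. Peeling off the last state recursively gives the same number as the closed-form product. In addition, the prior probabilities of all $n^T$ hidden paths add up to 1.

```python
import itertools
import math

# Hypothetical toy values (n = 2 hidden states, 0-based indices).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]


def prior_recursive(Z):
    """P(z_1..z_T | theta), peeling off the last state as in the derivation."""
    if len(Z) == 1:
        return pi[Z[0]]                            # recursion ends at pi_{z_1}
    return A[Z[-2]][Z[-1]] * prior_recursive(Z[:-1])


def prior_product(Z):
    """Closed form: pi_{z_1} * prod_{i=1}^{T-1} a_{(z_i, z_{i+1})}."""
    return pi[Z[0]] * math.prod(A[Z[i]][Z[i + 1]] for i in range(len(Z) - 1))


T = 4
paths = list(itertools.product(range(len(pi)), repeat=T))
agree = all(math.isclose(prior_recursive(Z), prior_product(Z)) for Z in paths)
total = sum(prior_product(Z) for Z in paths)       # should be 1 up to rounding
print(agree, total)
```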
For $P(X\mid Z,\theta)$:
$$
\begin{aligned}
P(X\mid Z,\theta)&=P(x_1,x_2,\cdots,x_T\mid Z,\theta)\\
&=P(x_T\mid x_1,x_2,\cdots,x_{T-1},Z,\theta)P(x_1,x_2,\cdots,x_{T-1}\mid Z,\theta)\\
&=P(x_T\mid z_T)P(x_1,x_2,\cdots,x_{T-1}\mid Z,\theta)\\
&=b_{(z_T,x_T)}P(x_1,x_2,\cdots,x_{T-1}\mid Z,\theta)
\end{aligned}
$$
Here $b_{(z_T,x_T)}$ is the probability that hidden state $z_T$ emits observation $x_T$; the third line uses the observation independence assumption. As before, the remaining factor can be expanded recursively in the same way, and we finally obtain
$$
P(X\mid Z,\theta)=\prod_{j=1}^{T}b_{(z_j,x_j)}
$$
So, in the end,
$$
P(X\mid\theta)=\sum_{Z}\pi_{z_1}\prod_{i=1}^{T-1}a_{(z_i,z_{i+1})}\prod_{j=1}^{T}b_{(z_j,x_j)}
$$
while, for a single hidden path $Z$,
$$
P(Z,X\mid\theta)=\pi_{z_1}\prod_{i=1}^{T-1}a_{(z_i,z_{i+1})}\prod_{j=1}^{T}b_{(z_j,x_j)}
$$
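For a tiny model, the formula for $P(X\mid\theta)$ can be evaluated literally by enumerating every hidden path. The sketch below does that with hypothetical toy parameters (0-based indices; `X` is a made-up observation sequence). It is only meant to make the formula concrete. The sum has $n^T$ terms, so this brute force is impractical beyond very short sequences.

```python
import itertools
import math

# Hypothetical toy parameters (0-based indices) and observation sequence.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
X = [0, 1, 2]                          # x_1..x_T as symbol indices


def joint(Z, X):
    """P(Z, X | theta) = pi_{z_1} * prod a_{(z_i, z_{i+1})} * prod b_{(z_j, x_j)}."""
    p = pi[Z[0]]
    for i in range(len(Z) - 1):
        p *= A[Z[i]][Z[i + 1]]
    for j in range(len(Z)):
        p *= B[Z[j]][X[j]]
    return p


def likelihood_brute_force(X):
    """P(X | theta): sum the joint over all n**T hidden paths."""
    n = len(pi)
    return sum(joint(Z, X) for Z in itertools.product(range(n), repeat=len(X)))


print(likelihood_brute_force(X))
```

Because each term is a product of $2T$ factors, the cost is $O(T\,n^T)$. This is why the right-hand-side probabilities need a more efficient method in practice.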
Therefore the expectation required by the EM algorithm is
$$
\begin{aligned}
E_{P(Z\mid X,\theta^{t})}\left[\log P(Z,X\mid\theta)\right]&=\sum_{Z}\log\left[\pi_{z_1}\prod_{i=1}^{T-1}a_{(z_i,z_{i+1})}\prod_{j=1}^{T}b_{(z_j,x_j)}\right]P(Z\mid X,\theta^t)\\
&=\sum_{Z}\left[\log\pi_{z_1}+\log\prod_{i=1}^{T-1}a_{(z_i,z_{i+1})}+\log\prod_{j=1}^{T}b_{(z_j,x_j)}\right]P(Z\mid X,\theta^t)
\end{aligned}
$$
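The sketch below shows why the three parts can later be maximized separately. It computes the expectation above by brute-force enumeration for hypothetical toy parameters:

- `old` plays the role of $\theta^t$ and defines the posterior $P(Z\mid X,\theta^t)$.
- `new` is an arbitrary candidate $\theta$.

The full expectation equals the sum of a $\pi$-only term, an $A$-only term, and a $B$-only term.

```python
import itertools
import math

# Hypothetical values: old = theta^t (defines the posterior), new = candidate theta.
old = dict(pi=[0.6, 0.4], A=[[0.7, 0.3], [0.4, 0.6]], B=[[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
new = dict(pi=[0.5, 0.5], A=[[0.6, 0.4], [0.3, 0.7]], B=[[0.4, 0.4, 0.2], [0.2, 0.3, 0.5]])
X = [0, 1, 2]
n, T = 2, len(X)


def joint(th, Z):
    """P(Z, X | th) for parameters th and one hidden path Z."""
    p = th["pi"][Z[0]]
    for i in range(T - 1):
        p *= th["A"][Z[i]][Z[i + 1]]
    for j in range(T):
        p *= th["B"][Z[j]][X[j]]
    return p


paths = list(itertools.product(range(n), repeat=T))
evidence = sum(joint(old, Z) for Z in paths)             # P(X | theta^t)
post = {Z: joint(old, Z) / evidence for Z in paths}       # P(Z | X, theta^t)

# E_{P(Z|X,theta^t)}[log P(Z, X | theta)]
Q = sum(math.log(joint(new, Z)) * post[Z] for Z in paths)

# The same quantity split into the three bracketed terms.
q_pi = sum(math.log(new["pi"][Z[0]]) * post[Z] for Z in paths)
q_A = sum(sum(math.log(new["A"][Z[i]][Z[i + 1]]) for i in range(T - 1)) * post[Z] for Z in paths)
q_B = sum(sum(math.log(new["B"][Z[j]][X[j]]) for j in range(T)) * post[Z] for Z in paths)
print(Q, q_pi + q_A + q_B)
```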
We need $\theta^{t+1}=\arg\max_{\theta}E_{P(Z\mid X,\theta^{t})}\left[\log P(Z,X\mid\theta)\right]$. Since $\theta=(\pi,A,B)$ and each of the three bracketed terms involves only one of $\pi$, $A$, $B$, we can maximize with respect to each of them separately.
For $\pi$:
$$
\begin{aligned}
\pi^{t+1}&=\arg\max_{\pi}\sum_{Z}\left[\log\pi_{z_1}+\log\prod_{i=1}^{T-1}a_{(z_i,z_{i+1})}+\log\prod_{j=1}^{T}b_{(z_j,x_j)}\right]P(Z\mid X,\theta^t)\\
&=\arg\max_{\pi}\sum_{Z}\log\pi_{z_1}\,P(Z\mid X,\theta^t)\\
&=\arg\max_{\pi}\sum_{z_1,z_2,\cdots,z_T}\log\pi_{z_1}\,P(z_1,z_2,\cdots,z_T\mid X,\theta^t)\\
&=\arg\max_{\pi}\sum_{z_1}\log\pi_{z_1}\sum_{z_2,\cdots,z_T}P(z_1,z_2,\cdots,z_T\mid X,\theta^t)\\
&=\arg\max_{\pi}\sum_{z_1}\log\pi_{z_1}\,P(z_1\mid X,\theta^t)\\
&=\arg\max_{\pi}\sum_{i=1}^{n}\log\pi_i\,P(z_1=q_i\mid X,\theta^t)\\
&=\arg\max_{\pi}\sum_{i=1}^{n}\log\pi_i\,\frac{P(z_1=q_i,X\mid\theta^t)}{P(X\mid\theta^t)}\\
&=\arg\max_{\pi}\sum_{i=1}^{n}\log\pi_i\,P(z_1=q_i,X\mid\theta^t)
\end{aligned}
$$
The second line drops the $A$ and $B$ terms because they do not depend on $\pi$. The last line drops $P(X\mid\theta^t)$ because it is a positive constant with respect to $\pi$.
Because $\pi$ is the initial probability distribution, $\sum_{i=1}^{n}\pi_i=1$, so this becomes a constrained optimization problem.
We construct the Lagrangian
$$
L(\pi,\lambda)=\sum_{i=1}^{n}\log\pi_i\,P(z_1=q_i,X\mid\theta^t)+\lambda\left[\sum_{i=1}^{n}\pi_i-1\right]
$$
Taking the derivative with respect to $\pi_i$:
$$
\frac{\partial L(\pi,\lambda)}{\partial\pi_i}=\frac{1}{\pi_i}P(z_1=q_i,X\mid\theta^t)+\lambda=0
$$
Multiplying both sides by $\pi_i$:
$$
P(z_1=q_i,X\mid\theta^t)+\lambda\pi_i=0
$$
Summing over $i=1,\cdots,n$:
$$
\begin{aligned}
\sum_{i=1}^{n}\left[P(z_1=q_i,X\mid\theta^t)+\pi_i\lambda\right]&=\sum_{i=1}^{n}P(z_1=q_i,X\mid\theta^t)+\lambda\sum_{i=1}^{n}\pi_i\\
&=\sum_{i=1}^{n}P(z_1=q_i,X\mid\theta^t)+\lambda\\
&=P(X\mid\theta^t)+\lambda\\
&=0
\end{aligned}
$$
This uses $\sum_i\pi_i=1$ and the fact that summing the joint probability over all values of $z_1$ gives $P(X\mid\theta^t)$.
Hence
$$
\lambda=-P(X\mid\theta^t)
$$
Substituting this back into $P(z_1=q_i,X\mid\theta^t)+\lambda\pi_i=0$ gives
$$
\pi_i=\frac{P(z_1=q_i,X\mid\theta^t)}{P(X\mid\theta^t)}
$$
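A small numerical sanity check of this result, using hypothetical toy parameters in place of $\theta^t$ (0-based indices):

- `w[i]` is $P(z_1=q_i,X\mid\theta^t)$. It is obtained by summing the joint probability over $z_2,\cdots,z_T$, which is brute force and only feasible for tiny $T$.
- `pi_new` is the closed-form solution above.

The check confirms that no randomly drawn distribution on the simplex achieves a larger value of $\sum_i w_i\log\pi_i$.

```python
import itertools
import math
import random

# Hypothetical toy parameters standing in for theta^t (0-based indices).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
X = [0, 1, 2]
n, T = len(pi), len(X)


def joint(Z):
    """P(Z, X | theta^t) for one hidden path Z."""
    p = pi[Z[0]]
    for t in range(T - 1):
        p *= A[Z[t]][Z[t + 1]]
    for t in range(T):
        p *= B[Z[t]][X[t]]
    return p


# w[i] = P(z_1 = q_i, X | theta^t): sum the joint over z_2..z_T.
w = [0.0] * n
for Z in itertools.product(range(n), repeat=T):
    w[Z[0]] += joint(Z)
evidence = sum(w)                          # = P(X | theta^t)
pi_new = [wi / evidence for wi in w]       # closed form from the Lagrangian


def objective(p):
    """sum_i w_i * log p_i, the part of the EM objective that depends on pi."""
    return sum(wi * math.log(p_i) for wi, p_i in zip(w, p))


# No random distribution on the simplex should beat the closed form.
rng = random.Random(0)
for _ in range(1000):
    g = [rng.random() + 1e-12 for _ in range(n)]
    s = sum(g)
    cand = [gi / s for gi in g]
    assert objective(cand) <= objective(pi_new) + 1e-12
print(pi_new)
```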
For the state transition matrix $A$, the part of the EM objective that depends on $A$ is
$$
\begin{aligned}
L(A)&=\sum_{Z}\log\prod_{i=1}^{T-1}a_{(z_i,z_{i+1})}P(Z\mid X,\theta^t)\\
&=\sum_{Z}\sum_{i=1}^{T-1}\log a_{(z_i,z_{i+1})}P(Z\mid X,\theta^t)\\
&=\sum_{Z}\left[\log a_{(z_1,z_2)}+\log a_{(z_2,z_3)}+\cdots+\log a_{(z_{T-1},z_T)}\right]P(Z\mid X,\theta^t)\\
&=\sum_{Z}\log a_{(z_1,z_2)}P(Z\mid X,\theta^t)+\sum_{Z}\log a_{(z_2,z_3)}P(Z\mid X,\theta^t)+\cdots+\sum_{Z}\log a_{(z_{T-1},z_T)}P(Z\mid X,\theta^t)
\end{aligned}
$$
Take the terms one at a time:
$$
\begin{aligned}
&\sum_{Z}\log a_{(z_1,z_2)}P(Z\mid X,\theta^t)\\
=&\sum_{z_1,z_2,\cdots,z_T}\log a_{(z_1,z_2)}P(Z\mid X,\theta^t)\\
=&\sum_{z_1}\sum_{z_2}\log a_{(z_1,z_2)}\sum_{z_3,\cdots,z_T}P(z_1,z_2,\cdots,z_T\mid X,\theta^t)\\
=&\sum_{z_1}\sum_{z_2}\log a_{(z_1,z_2)}P(z_1,z_2\mid X,\theta^t)\\
=&\sum_{i=1}^{n}\sum_{j=1}^{n}\log a_{(z_1=q_i,z_2=q_j)}P(z_1=q_i,z_2=q_j\mid X,\theta^t)
\end{aligned}
$$
The remaining terms follow from the first one in the same way, and adding them up gives
$$
L(A)=\sum_{t=1}^{T-1}\sum_{i=1}^{n}\sum_{j=1}^{n}\log a_{(z_t=q_i,z_{t+1}=q_j)}P(z_t=q_i,z_{t+1}=q_j\mid X,\theta^t)
$$
Here the subscript $t$ in $z_t$ is a time step, while the superscript in $\theta^t$ still denotes the EM iteration.
As in the derivation for $\pi$, expand $P(z_t=q_i,z_{t+1}=q_j\mid X,\theta^t)$ with Bayes' rule as $P(z_t=q_i,z_{t+1}=q_j,X\mid\theta^t)/P(X\mid\theta^t)$. Since $P(X\mid\theta^t)$ is a positive constant that does not depend on $A$, it can be dropped without changing the maximizer, so
$$
L(A)=\sum_{t=1}^{T-1}\sum_{i=1}^{n}\sum_{j=1}^{n}\log a_{(z_t=q_i,z_{t+1}=q_j)}P(z_t=q_i,z_{t+1}=q_j,X\mid\theta^t)
$$
In addition, each row of the transition matrix satisfies $\sum_{j=1}^{n}a_{(z=q_i,z=q_j)}=1$. Each row has its own constraint, so we attach one multiplier $\lambda_i$ per row. The Lagrangian for $A$ is
$$
L(A,\lambda)=\sum_{t=1}^{T-1}\sum_{i=1}^{n}\sum_{j=1}^{n}\log a_{(z_t=q_i,z_{t+1}=q_j)}P(z_t=q_i,z_{t+1}=q_j,X\mid\theta^t)+\sum_{i=1}^{n}\lambda_i\left[\sum_{j=1}^{n}a_{(z=q_i,z=q_j)}-1\right]
$$
Fix a row $i$ and write $\lambda$ for $\lambda_i$. Differentiating with respect to $a_{(z=q_i,z=q_j)}$:
$$
\frac{\partial L(A,\lambda)}{\partial a_{(z=q_i,z=q_j)}}=\sum_{t=1}^{T-1}\frac{1}{a_{(z_t=q_i,z_{t+1}=q_j)}}P(z_t=q_i,z_{t+1}=q_j,X\mid\theta^t)+\lambda=0
$$
Multiplying both sides by $a_{(z=q_i,z=q_j)}$ gives
$$
\sum_{t=1}^{T-1}P(z_t=q_i,z_{t+1}=q_j,X\mid\theta^t)+\lambda a_{(z=q_i,z=q_j)}=0
$$
Summing this over all $q_j$, $j=1,\cdots,n$:
$$
\begin{aligned}
&\sum_{j=1}^{n}\left[\sum_{t=1}^{T-1}P(z_t=q_i,z_{t+1}=q_j,X\mid\theta^t)+\lambda a_{(z=q_i,z=q_j)}\right]\\
=&\sum_{j=1}^{n}\sum_{t=1}^{T-1}P(z_t=q_i,z_{t+1}=q_j,X\mid\theta^t)+\lambda\sum_{j=1}^{n}a_{(z=q_i,z=q_j)}\\
=&\sum_{t=1}^{T-1}P(z_t=q_i,X\mid\theta^t)+\lambda\\
=&0
\end{aligned}
$$
Here the sum over $j$ marginalizes out $z_{t+1}$, and the row constraint gives $\sum_j a_{(z=q_i,z=q_j)}=1$.
Substituting the resulting $\lambda$ back into $\sum_{t=1}^{T-1}P(z_t=q_i,z_{t+1}=q_j,X\mid\theta^t)+\lambda a_{(z=q_i,z=q_j)}=0$ gives
$$
a_{(z=q_i,z=q_j)}=\frac{\sum_{t=1}^{T-1}P(z_t=q_i,z_{t+1}=q_j,X\mid\theta^t)}{\sum_{t=1}^{T-1}P(z_t=q_i,X\mid\theta^t)}
$$
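The update for $A$ can likewise be sketched by brute force for a tiny hypothetical model standing in for $\theta^t$ (0-based indices, made-up observations). Both numerator and denominator are sums of joint probabilities $P(Z,X\mid\theta^t)$ over paths, so the common factor $P(X\mid\theta^t)$ never has to be computed. Each row of the new matrix automatically sums to 1, because summing the numerator over $j$ reproduces the denominator.

```python
import itertools

# Hypothetical current parameters theta^t (0-based indices) and observations.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
X = [0, 1, 2, 1]
n, T = len(pi), len(X)


def joint(Z):
    """P(Z, X | theta^t) for one hidden path Z."""
    p = pi[Z[0]]
    for t in range(T - 1):
        p *= A[Z[t]][Z[t + 1]]
    for t in range(T):
        p *= B[Z[t]][X[t]]
    return p


num = [[0.0] * n for _ in range(n)]   # sum_{t=1}^{T-1} P(z_t=q_i, z_{t+1}=q_j, X)
den = [0.0] * n                       # sum_{t=1}^{T-1} P(z_t=q_i, X)
for Z in itertools.product(range(n), repeat=T):
    p = joint(Z)
    for t in range(T - 1):            # transitions exist only for t = 1..T-1
        num[Z[t]][Z[t + 1]] += p
        den[Z[t]] += p

A_new = [[num[i][j] / den[i] for j in range(n)] for i in range(n)]
print(A_new)
```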
For the emission matrix $B$:
$$
\begin{aligned}
L(B)&=\sum_{Z}\log\prod_{j=1}^{T}b_{(z_j,x_j)}P(Z\mid X,\theta^t)\\
&=\sum_{Z}\sum_{j=1}^{T}\log b_{(z_j,x_j)}P(Z\mid X,\theta^t)\\
&=\sum_{Z}\left[\log b_{(z_1,x_1)}+\log b_{(z_2,x_2)}+\cdots+\log b_{(z_T,x_T)}\right]P(Z\mid X,\theta^t)\\
&=\sum_{Z}\log b_{(z_1,x_1)}P(Z\mid X,\theta^t)+\sum_{Z}\log b_{(z_2,x_2)}P(Z\mid X,\theta^t)+\cdots+\sum_{Z}\log b_{(z_T,x_T)}P(Z\mid X,\theta^t)
\end{aligned}
$$
Handle the terms one at a time. For $\sum_{Z}\log b_{(z_1,x_1)}P(Z\mid X,\theta^t)$:
$$
\begin{aligned}
\sum_{Z}\log b_{(z_1,x_1)}P(Z\mid X,\theta^t)&=\sum_{z_1,\cdots,z_T}\log b_{(z_1,x_1)}P(Z\mid X,\theta^t)\\
&=\sum_{z_1}\log b_{(z_1,x_1)}\sum_{z_2,\cdots,z_T}P(z_1,\cdots,z_T\mid X,\theta^t)\\
&=\sum_{z_1}\log b_{(z_1,x_1)}P(z_1\mid X,\theta^t)\\
&=\sum_{i=1}^{n}\log b_{(z_1=q_i,x_1)}P(z_1=q_i\mid X,\theta^t)
\end{aligned}
$$
Treating the remaining terms the same way and adding everything up gives
$$
L(B)=\sum_{t=1}^{T}\sum_{i=1}^{n}\log b_{(z_t=q_i,x_t)}P(z_t=q_i\mid X,\theta^t)
$$
In addition, each row of the emission matrix $B$ must satisfy $\sum_{j=1}^{m}b_{(z=q_i,x=v_j)}=1$. As above, expand $P(z_t=q_i\mid X,\theta^t)$ with Bayes' rule and drop the constant $P(X\mid\theta^t)$. With one multiplier $\lambda_i$ per row, the Lagrangian is
$$
L(B,\lambda)=\sum_{t=1}^{T}\sum_{i=1}^{n}\log b_{(z_t=q_i,x_t)}P(z_t=q_i,X\mid\theta^t)+\sum_{i=1}^{n}\lambda_i\left[\sum_{j=1}^{m}b_{(z=q_i,x=v_j)}-1\right]
$$
Fix a row $i$, write $\lambda$ for $\lambda_i$, and differentiate with respect to $b_{(z=q_i,x=v_j)}$. The $x_t$ in the Lagrangian are fixed by the given data, so a term contributes to this derivative only when $x_t=v_j$; all other terms give 0. We therefore introduce the indicator function
$$
I(x_t=v_j)=\begin{cases}1,&x_t=v_j\\0,&x_t\ne v_j\end{cases}
$$
so that
$$
\frac{\partial L(B,\lambda)}{\partial b_{(z=q_i,x=v_j)}}=\sum_{t=1}^{T}\frac{1}{b_{(z_t=q_i,x_t)}}P(z_t=q_i,X\mid\theta^t)I(x_t=v_j)+\lambda=0
$$
Multiplying both sides by $b_{(z=q_i,x=v_j)}$:
$$
\sum_{t=1}^{T}P(z_t=q_i,X\mid\theta^t)I(x_t=v_j)+\lambda b_{(z=q_i,x=v_j)}=0
$$
Summing over $j=1,\cdots,m$:
$$
\begin{aligned}
&\sum_{j=1}^{m}\left[\sum_{t=1}^{T}P(z_t=q_i,X\mid\theta^t)I(x_t=v_j)+\lambda b_{(z=q_i,x=v_j)}\right]\\
=&\sum_{j=1}^{m}\sum_{t=1}^{T}P(z_t=q_i,X\mid\theta^t)I(x_t=v_j)+\lambda\sum_{j=1}^{m}b_{(z=q_i,x=v_j)}\\
=&\sum_{t=1}^{T}\sum_{j=1}^{m}P(z_t=q_i,X\mid\theta^t)I(x_t=v_j)+\lambda\\
=&0
\end{aligned}
$$
In $\sum_{j=1}^{m}P(z_t=q_i,X\mid\theta^t)I(x_t=v_j)$, exactly one $v_j$ equals $x_t$, so this inner sum equals $P(z_t=q_i,X\mid\theta^t)$. Hence
$$
\sum_{t=1}^{T}P(z_t=q_i,X\mid\theta^t)+\lambda=0
$$
Substituting this back into the result of the derivative gives
$$
b_{(z=q_i,x=v_j)}=\frac{\sum_{t=1}^{T}P(z_t=q_i,X\mid\theta^t)I(x_t=v_j)}{\sum_{t=1}^{T}P(z_t=q_i,X\mid\theta^t)}
$$
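The same brute-force approach gives the update for $B$, again for hypothetical toy parameters standing in for $\theta^t$ (0-based indices, made-up observations). The indicator $I(x_t=v_j)$ simply means that time step $t$ adds its weight only to column $x_t$. In this example symbol $v_2$ (index 1) never occurs in `X`, so its emission probability becomes 0 in every row, exactly as the formula says.

```python
import itertools

# Hypothetical current parameters theta^t (0-based indices) and observations.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
X = [0, 2, 2, 0]                      # symbol index 1 never occurs here
n, m, T = len(pi), len(B[0]), len(X)


def joint(Z):
    """P(Z, X | theta^t) for one hidden path Z."""
    p = pi[Z[0]]
    for t in range(T - 1):
        p *= A[Z[t]][Z[t + 1]]
    for t in range(T):
        p *= B[Z[t]][X[t]]
    return p


num = [[0.0] * m for _ in range(n)]   # sum_t P(z_t=q_i, X) * I(x_t = v_j)
den = [0.0] * n                       # sum_{t=1}^{T} P(z_t=q_i, X)
for Z in itertools.product(range(n), repeat=T):
    p = joint(Z)
    for t in range(T):
        num[Z[t]][X[t]] += p          # the indicator keeps only column j = x_t
        den[Z[t]] += p

B_new = [[num[i][j] / den[i] for j in range(m)] for i in range(n)]
print(B_new)
```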
So the final update formulas are
$$
\begin{aligned}
\pi_i&=\frac{P(z_1=q_i,X\mid\theta^t)}{P(X\mid\theta^t)};\\
a_{(z=q_i,z=q_j)}&=\frac{\sum_{t=1}^{T-1}P(z_t=q_i,z_{t+1}=q_j,X\mid\theta^t)}{\sum_{t=1}^{T-1}P(z_t=q_i,X\mid\theta^t)};\\
b_{(z=q_i,x=v_j)}&=\frac{\sum_{t=1}^{T}P(z_t=q_i,X\mid\theta^t)I(x_t=v_j)}{\sum_{t=1}^{T}P(z_t=q_i,X\mid\theta^t)}
\end{aligned}
$$
The new values together form $\theta^{t+1}$.
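Putting the three formulas together, one EM iteration can be sketched as below. This is an illustrative brute-force version: the function name `em_step` is hypothetical, and the starting values and observations are made up. It enumerates all $n^T$ hidden paths to get the probabilities on the right-hand side, which is only feasible for tiny $T$. Over several iterations, the log-likelihood $\log P(X\mid\theta^t)$ should never decrease, which is the standard guarantee of EM.

```python
import itertools
import math


def em_step(pi, A, B, X):
    """One EM update with a brute-force E-step; returns new (pi, A, B) and P(X | old theta)."""
    n, m, T = len(pi), len(B[0]), len(X)
    pi_num = [0.0] * n                      # P(z_1 = q_i, X)
    xi = [[0.0] * n for _ in range(n)]      # sum_{t<T} P(z_t = q_i, z_{t+1} = q_j, X)
    g_trans = [0.0] * n                     # sum_{t<T} P(z_t = q_i, X)
    g_emit = [[0.0] * m for _ in range(n)]  # sum_t P(z_t = q_i, X) I(x_t = v_j)
    g_all = [0.0] * n                       # sum_t P(z_t = q_i, X)
    evidence = 0.0                          # P(X | theta^t)
    for Z in itertools.product(range(n), repeat=T):
        p = pi[Z[0]]
        for t in range(T - 1):
            p *= A[Z[t]][Z[t + 1]]
        for t in range(T):
            p *= B[Z[t]][X[t]]
        evidence += p
        pi_num[Z[0]] += p
        for t in range(T):
            g_all[Z[t]] += p
            g_emit[Z[t]][X[t]] += p
            if t < T - 1:
                xi[Z[t]][Z[t + 1]] += p
                g_trans[Z[t]] += p
    new_pi = [v / evidence for v in pi_num]
    new_A = [[xi[i][j] / g_trans[i] for j in range(n)] for i in range(n)]
    new_B = [[g_emit[i][j] / g_all[i] for j in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, evidence


# Hypothetical starting values and observations (0-based indices).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
X = [0, 1, 2, 2, 1, 0]

history = []                                # log P(X | theta^k) for k = 0, 1, ...
for _ in range(20):
    pi, A, B, evidence = em_step(pi, A, B, X)
    history.append(math.log(evidence))
print(history[0], history[-1])
```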
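One problem remains, however: how do we compute the probabilities on the right-hand side? This is introduced in the next article, on evaluation: HMM隐马尔可夫模型的数学推导(二) (Mathematical Derivation of the Hidden Markov Model, Part II).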