Supervised learning method
Suppose the training data consist of $S$ observation sequences of equal length together with their corresponding state sequences, $\left\{\left(O_{1}, I_{1}\right),\left(O_{2}, I_{2}\right), \ldots,\left(O_{S}, I_{S}\right)\right\}$. The parameters of the hidden Markov model can then be estimated by maximum likelihood, as follows.
Estimate of the transition probability $a_{ij}$:

$$
a_{i j}=\frac{A_{i j}}{\sum_{j=1}^{N} A_{i j}}
$$

where $A_{ij}$ is the number of times in the sample that the chain is in state $q_{i}$ at time $t$ and moves to state $q_{j}$ at time $t+1$.
Estimate of the observation probability $b_{jk}$:

$$
b_{j k}=\frac{B_{j k}}{\sum_{k=1}^{M} B_{j k}}
$$

where $B_{jk}$ is the number of times in the sample that the state is $q_j$ and the corresponding observation is $v_{k}$.

The estimate of the initial state probability $\pi_{i}$ is the fraction of the $S$ samples whose initial state is $q_{i}$.
As an example, suppose the state set is $\{1,2,3\}$, the observation set is $\{a, b\}$, and there are two samples: $O_1=(a, a, b)$, $I_{1}=(2, 1, 1)$, $O_{2}=(a, b, a)$, $I_{2}=(1, 3, 2)$. For the transition probabilities out of state 1 we then have:
$$
\begin{aligned}
a_{11}&=\frac{A_{11}}{A_{11}+A_{12}+A_{13}}=\frac{1}{1+0+1}=\frac{1}{2}\\
a_{12}&=0\\
a_{13}&=\frac{1}{2}
\end{aligned}
$$
For the observation probabilities of state 1 (which emits $a$ twice and $b$ once, so $B_{1a}=2$ and $B_{1b}=1$):

$$
\begin{aligned}
b_{1a}&=\frac{B_{1a}}{B_{1a}+B_{1b}}=\frac{2}{2+1}=\frac{2}{3}\\
b_{1b}&=\frac{B_{1b}}{B_{1a}+B_{1b}}=\frac{1}{2+1}=\frac{1}{3}
\end{aligned}
$$
The initial probabilities are $\pi_{1}=\dfrac{1}{2}$, $\pi_{2}=\dfrac{1}{2}$, $\pi_{3}=0$.
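The counting estimates above can be reproduced with a short script. This is a minimal sketch that uses the two samples from the example; the helper name `normalize` and the use of `Fraction` for exact arithmetic are choices made here for illustration, not part of the original text.

```python
from collections import Counter
from fractions import Fraction

states = [1, 2, 3]
symbols = ["a", "b"]
samples = [
    (["a", "a", "b"], [2, 1, 1]),  # (O_1, I_1)
    (["a", "b", "a"], [1, 3, 2]),  # (O_2, I_2)
]

trans = Counter()  # A_ij: number of transitions q_i -> q_j
emit = Counter()   # B_jk: number of times state q_j emits v_k
init = Counter()   # number of samples starting in q_i

for obs, path in samples:
    init[path[0]] += 1
    for s, o in zip(path, obs):
        emit[(s, o)] += 1
    for s, s_next in zip(path, path[1:]):
        trans[(s, s_next)] += 1


def normalize(counts, row, columns):
    """Turn the counts of one row into relative frequencies (hypothetical helper)."""
    total = sum(counts[(row, c)] for c in columns)
    if total == 0:
        # The estimate is undefined if the row was never observed.
        return {c: None for c in columns}
    return {c: Fraction(counts[(row, c)], total) for c in columns}


a = {i: normalize(trans, i, states) for i in states}
b = {j: normalize(emit, j, symbols) for j in states}
pi = {i: Fraction(init[i], len(samples)) for i in states}

print(a[1])  # transition estimates out of state 1
print(b[1])  # observation estimates for state 1
print(pi)    # initial state estimates
```

The row for state 1 matches the hand computation: $a_{11}=a_{13}=1/2$, $a_{12}=0$, $b_{1a}=2/3$, $b_{1b}=1/3$.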
Baum-Welch algorithm
This algorithm is the EM algorithm applied to hidden Markov models. If only the observation sequence $O=\left(o_{1}, o_{2}, \ldots, o_{T}\right)$ is available and the state sequence $I=\left(i_{1}, i_{2}, \ldots, i_{T}\right)$ is not, the hidden Markov model is a probabilistic model with latent variables (compare the general form $P(Y\mid \theta)=\sum_{Z} P(Y\mid Z, \theta)P(Z\mid \theta)$, with $Y\rightarrow O$ and $Z\rightarrow I$):

$$
P(O \mid \lambda)=\sum_{I} P(O \mid I, \lambda) P(I \mid \lambda)
$$
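To make the sum over hidden state sequences concrete, the following sketch evaluates $P(O \mid \lambda)$ by brute-force enumeration of all $N^T$ sequences $I$. The model parameters `pi`, `A`, `B` and the observation sequence `O` are hypothetical toy values chosen for illustration; they do not come from the text above.

```python
from itertools import product

# Hypothetical toy model (an assumption): 3 states, 2 observation symbols.
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3],
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]
B = [[0.5, 0.5],
     [0.4, 0.6],
     [0.7, 0.3]]
O = [0, 1, 0]  # indices of the observed symbols o_1, o_2, o_3


def prob_states(I):
    """P(I | lambda) = pi_{i_1} * prod_t a_{i_t i_{t+1}}."""
    p = pi[I[0]]
    for s, s_next in zip(I, I[1:]):
        p *= A[s][s_next]
    return p


def prob_obs_given_states(I):
    """P(O | I, lambda) = prod_t b_{i_t o_t}."""
    p = 1.0
    for s, o in zip(I, O):
        p *= B[s][o]
    return p


# Sum P(O | I, lambda) * P(I | lambda) over every possible state sequence I.
likelihood = sum(
    prob_obs_given_states(I) * prob_states(I)
    for I in product(range(len(pi)), repeat=len(O))
)
print(likelihood)
```

Enumeration costs on the order of $N^T$ terms, so it is only practical for tiny examples like this one.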
Its parameters can be estimated with the EM algorithm. We first need the complete-data log-likelihood. The observed data are $O=\left(o_{1}, o_{2}, \ldots, o_{T}\right)$ and the unobserved data are $I=\left(i_{1}, i_{2}, \ldots, i_{T}\right)$, so the complete data are $(O, I)=\left(o_{1}, o_{2}, \ldots, o_{T}, i_{1}, i_{2}, \ldots, i_{T}\right)$ and the complete-data log-likelihood is:

$$
\ln P(O, I \mid \lambda)
$$
Here $P(O, I \mid \lambda)=\pi_{i_{1}} b_{i_{1} o_{1}} a_{i_{1} i_{2}} b_{i_{2} o_{2}} \cdots a_{i_{T-1} i_{T}} b_{i_{T} o_{T}}$, so it follows that

$$
\begin{aligned}
\ln P(O, I \mid \lambda) &=\ln \left(\pi_{i_{1}} b_{i_{1} o_{1}} a_{i_{1} i_{2}} b_{i_{2} o_{2}} \cdots a_{i_{T-1} i_{T}} b_{i_{T} o_{T}}\right) \\
&=\ln \pi_{i_{1}}+\sum_{t=1}^{T-1} \ln a_{i_{t} i_{t+1}}+\sum_{t=1}^{T} \ln b_{i_{t} o_{t}}
\end{aligned}
$$
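The decomposition of the complete-data log-likelihood into three sums can be checked numerically. The sketch below uses hypothetical toy parameters and a hypothetical pair $(O, I)$, none of which come from the text, and compares the three-term sum with the logarithm of the product form.

```python
import math

# Hypothetical toy model (an assumption): 3 states, 2 observation symbols.
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3],
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]
B = [[0.5, 0.5],
     [0.4, 0.6],
     [0.7, 0.3]]


def complete_log_likelihood(O, I):
    """ln pi_{i_1} + sum_t ln a_{i_t i_{t+1}} + sum_t ln b_{i_t o_t}."""
    log_p = math.log(pi[I[0]])
    log_p += sum(math.log(A[I[t]][I[t + 1]]) for t in range(len(I) - 1))
    log_p += sum(math.log(B[I[t]][O[t]]) for t in range(len(I)))
    return log_p


def joint_probability(O, I):
    """Product form pi_{i_1} b_{i_1 o_1} a_{i_1 i_2} b_{i_2 o_2} ..."""
    p = pi[I[0]] * B[I[0]][O[0]]
    for t in range(1, len(O)):
        p *= A[I[t - 1]][I[t]] * B[I[t]][O[t]]
    return p


O = [0, 1, 0]  # observed symbol indices
I = [2, 1, 0]  # one particular hidden state sequence (0-based state indices)
log_p = complete_log_likelihood(O, I)
print(log_p, math.log(joint_probability(O, I)))
```

Both printed values agree up to floating-point rounding.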
E-step of the EM algorithm:
Compute the Q function $Q(\lambda, \bar{\lambda})$:

$$
Q(\lambda, \bar{\lambda})=\sum_{I} P(I \mid O, \bar{\lambda}) \ln P(O, I \mid \lambda)
$$
Here $\bar{\lambda}$ is the current estimate of the hidden Markov model parameters and $\lambda$ is the parameter set to be maximized over. To simplify the calculations that follow, the Q function can be rewritten using the identity:

$$
\begin{aligned}
Q(\lambda, \bar{\lambda}) &=\sum_{I} P(I \mid O, \bar{\lambda}) \ln P(O, I \mid \lambda) \\
&=\sum_{I} \frac{P(O, I \mid \bar{\lambda})}{P(O \mid \bar{\lambda})} \ln P(O, I \mid \lambda)
\end{aligned}
$$
This uses $P(A\mid B)=\dfrac{P(A, B)}{P(B)}$. Since the maximization that follows is over $\lambda$ only, $P(O \mid \bar{\lambda})$ is a constant factor and can be dropped without changing the maximizer. Keeping the name $Q$ for the rescaled function, it simplifies further to:

$$
\begin{aligned}
Q(\lambda, \bar{\lambda}) &=\sum_{I} P(O, I \mid \bar{\lambda}) \ln P(O, I \mid \lambda) \\
&=\sum_{I} P(O, I \mid \bar{\lambda})\left(\ln \pi_{i_{1}}+\sum_{t=1}^{T-1} \ln a_{i_{t} i_{t+1}}+\sum_{t=1}^{T} \ln b_{i_{t} o_{t}}\right) \\
&=\sum_{I} P(O, I \mid \bar{\lambda}) \ln \pi_{i_{1}}+\sum_{I} P(O, I \mid \bar{\lambda})\left(\sum_{t=1}^{T-1} \ln a_{i_{t} i_{t+1}}\right)+\sum_{I} P(O, I \mid \bar{\lambda})\left(\sum_{t=1}^{T} \ln b_{i_{t} o_{t}}\right)
\end{aligned}
$$
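M-step of the EM algorithm:
Maximize the Q function. The parameters to be maximized appear separately in the three terms of the expression above, so each term can be maximized on its own.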
Solving for $\pi_i$: the first term of the Q function can be written as:

$$
\begin{aligned}
\sum_{I} P(O, I \mid \bar{\lambda}) \ln \pi_{i_{1}} &=\sum_{i_{1}, i_{2}, \ldots, i_{T}} P\left(O, i_{1}, i_{2}, \ldots, i_{T} \mid \bar{\lambda}\right) \ln \pi_{i_{1}} \\
&=\sum_{i=1}^{N}\left(\sum_{i_{2}, i_{3}, \ldots, i_{T}} P\left(O, i_{1}=q_{i}, i_{2}, i_{3}, \ldots, i_{T} \mid \bar{\lambda}\right) \ln \pi_{i}\right) \\
&=\sum_{i=1}^{N}\left\{\ln \pi_{i} \cdot\left(\sum_{i_{2}, i_{3}, \ldots, i_{T}} P\left(O, i_{1}=q_{i}, i_{2}, i_{3}, \ldots, i_{T} \mid \bar{\lambda}\right)\right)\right\} \\
&=\sum_{i=1}^{N} \ln \pi_{i} P\left(O, i_{1}=q_{i} \mid \bar{\lambda}\right)
\end{aligned}
$$
Since $\pi$ satisfies the constraint $\sum_{i=1}^{N} \pi_{i}=1$, we use the method of Lagrange multipliers and write the Lagrangian:

$$
\sum_{i=1}^{N} \ln \pi_{i} P\left(O, i_{1}=q_{i} \mid \bar{\lambda}\right)+\eta\left(\sum_{i=1}^{N} \pi_{i}-1\right)
$$
Take the partial derivative of the Lagrangian with respect to $\pi_i$ and set it to 0:

$$
\begin{gathered}
\frac{\partial}{\partial \pi_{i}}\left[\sum_{i=1}^{N} \ln \pi_{i} P\left(O, i_{1}=q_{i} \mid \bar{\lambda}\right)+\eta\left(\sum_{i=1}^{N} \pi_{i}-1\right)\right]=0 \\
\frac{1}{\pi_{i}} \cdot P\left(O, i_{1}=q_{i} \mid \bar{\lambda}\right)+\eta=0 \\
P\left(O, i_{1}=q_{i} \mid \bar{\lambda}\right)+\eta \pi_{i}=0
\end{gathered}
$$
Using $\sum_{i=1}^{N} \pi_{i}=1$ and summing both sides of the last equation over $i$ (the joint probabilities marginalize to $P(O \mid \bar{\lambda})$) gives:

$$
\begin{gathered}
\sum_{i=1}^{N}\left[P\left(O, i_{1}=q_{i} \mid \bar{\lambda}\right)+\eta \pi_{i}\right]=0 \\
\sum_{i=1}^{N} P\left(O, i_{1}=q_{i} \mid \bar{\lambda}\right)+\sum_{i=1}^{N} \eta \pi_{i}=0 \\
P(O \mid \bar{\lambda})+\eta \cdot 1=0 \\
\eta=-P(O \mid \bar{\lambda})
\end{gathered}
$$
Substituting this back into $P\left(O, i_{1}=q_{i} \mid \bar{\lambda}\right)+\eta \pi_{i}=0$ gives:

$$
\begin{gathered}
P\left(O, i_{1}=q_{i} \mid \bar{\lambda}\right)-P(O \mid \bar{\lambda}) \cdot \pi_{i}=0 \\
\pi_{i}=\frac{P\left(O, i_{1}=q_{i} \mid \bar{\lambda}\right)}{P(O \mid \bar{\lambda})}=P\left(i_{1}=q_{i} \mid O, \bar{\lambda}\right)=\gamma_{1}(i)=\frac{\alpha_{1}(i) \beta_{1}(i)}{\sum_{j=1}^{N} \alpha_{1}(j) \beta_{1}(j)}
\end{gathered}
$$
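The Lagrange-multiplier result says that a weighted sum $\sum_i w_i \ln \pi_i$ with $\sum_i \pi_i = 1$ is maximized by normalizing the weights. The following sketch checks this numerically against random points on the simplex. The weights stand in for $P(O, i_1=q_i \mid \bar{\lambda})$ and are hypothetical values chosen for illustration.

```python
import math
import random


def objective(weights, pi):
    """sum_i w_i * ln(pi_i), the pi-dependent term of the Q function."""
    return sum(w * math.log(p) for w, p in zip(weights, pi))


# Hypothetical stand-ins for P(O, i_1 = q_i | lambda_bar), i = 1..3.
weights = [0.03, 0.05, 0.02]

# Closed-form maximizer from the derivation: pi_i = w_i / sum_j w_j.
total = sum(weights)
pi_star = [w / total for w in weights]
best = objective(weights, pi_star)

# Compare against random probability vectors; none should score higher.
random.seed(0)
worst_gap = -math.inf
for _ in range(1000):
    raw = [random.random() + 1e-12 for _ in weights]  # keep entries positive
    s = sum(raw)
    candidate = [r / s for r in raw]
    worst_gap = max(worst_gap, objective(weights, candidate) - best)

print(pi_star, worst_gap)  # worst_gap is never positive
```

A random search like this is only a sanity check, not a proof; the derivation above is what establishes the maximizer.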
Here $\gamma_{t}(i)=\dfrac{\alpha_{t}(i) \beta_{t}(i)}{\sum_{j=1}^{N} \alpha_{t}(j) \beta_{t}(j)}$ is the probability of being in state $q_i$ at time $t$, given the model parameters $\lambda$ (in this update, the current estimate $\bar{\lambda}$) and the observations $O$.
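The quantity $\gamma_t(i)$ can be computed with the forward and backward recursions. The sketch below assumes that $\alpha_t(i)$ and $\beta_t(i)$ are the usual forward and backward probabilities, which this section uses without defining them; the model parameters and observation sequence are hypothetical toy values.

```python
def forward(O, pi, A, B):
    """alpha[t][i] = P(o_1, ..., o_t, i_t = q_i | lambda)."""
    N = len(pi)
    alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
    for t in range(1, len(O)):
        alpha.append([
            sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
            for j in range(N)
        ])
    return alpha


def backward(O, A, B):
    """beta[t][i] = P(o_{t+1}, ..., o_T | i_t = q_i, lambda)."""
    N, T = len(A), len(O)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [
            sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] for j in range(N))
            for i in range(N)
        ]
    return beta


def posterior_gamma(alpha, beta):
    """gamma[t][i] = alpha_t(i) beta_t(i) / sum_j alpha_t(j) beta_t(j)."""
    gamma = []
    for alpha_t, beta_t in zip(alpha, beta):
        weights = [a * b for a, b in zip(alpha_t, beta_t)]
        total = sum(weights)
        gamma.append([w / total for w in weights])
    return gamma


# Hypothetical toy model (an assumption): 3 states, 2 observation symbols.
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
O = [0, 1, 0]

alpha = forward(O, pi, A, B)
beta = backward(O, A, B)
gamma = posterior_gamma(alpha, beta)
new_pi = gamma[0]  # the re-estimated initial distribution, pi_i = gamma_1(i)
print(new_pi)
```

Each row of `gamma` sums to 1, and the normalizing constant $\sum_j \alpha_t(j)\beta_t(j)$ equals $P(O \mid \lambda)$ for every $t$.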
Solving for $a_{ij}$: the second term of the Q function can be written as:

$$
\begin{aligned}
\sum_{I} P(O, I \mid \bar{\lambda})\left(\sum_{t=1}^{T-1} \ln a_{i_{t} i_{t+1}}\right) &=\sum_{t=1}^{T-1}\left(\sum_{i_{1}, i_{2}, \ldots, i_{T}} P\left(O, i_{1}, i_{2}, \ldots, i_{T} \mid \bar{\lambda}\right) \ln a_{i_{t} i_{t+1}}\right) \\
&=\sum_{t=1}^{T-1}\left\{\sum_{i=1}^{N} \sum_{j=1}^{N}\left(\sum_{i_{1}, i_{2}, \ldots, i_{t-1}, i_{t+2}, \ldots, i_{T}} P\left(O, i_{1}, i_{2}, \ldots, i_{t}=q_{i}, i_{t+1}=q_{j}, \ldots, i_{T} \mid \bar{\lambda}\right) \ln a_{i j}\right)\right\} \\
&=\sum_{t=1}^{T-1}\left\{\sum_{i=1}^{N} \sum_{j=1}^{N}\left[\ln a_{i j} \cdot\left(\sum_{i_{1}, i_{2}, \ldots, i_{t-1}, i_{t+2}, \ldots, i_{T}} P\left(O, i_{1}, i_{2}, \ldots, i_{t}=q_{i}, i_{t+1}=q_{j}, \ldots, i_{T} \mid \bar{\lambda}\right)\right)\right]\right\} \\
&=\sum_{t=1}^{T-1} \sum_{i=1}^{N} \sum_{j=1}^{N} \ln a_{i j} P\left(O, i_{t}=q_{i}, i_{t+1}=q_{j} \mid \bar{\lambda}\right)
\end{aligned}
$$
Since $a_{ij}$ must satisfy the constraint $\sum_{j=1}^{N} a_{i j}=1$ for each row $i$, we again use the method of Lagrange multipliers. For a fixed $i$, with multiplier $\eta$, the Lagrangian is:

$$
\sum_{t=1}^{T-1} \sum_{i=1}^{N} \sum_{j=1}^{N} \ln a_{i j} P\left(O, i_{t}=q_{i}, i_{t+1}=q_{j} \mid \bar{\lambda}\right)+\eta\left(\sum_{j=1}^{N} a_{i j}-1\right)
$$
Take the partial derivative of the Lagrangian with respect to $a_{ij}$ and set it to 0:

$$
\begin{gathered}
\frac{\partial}{\partial a_{i j}}\left[\sum_{t=1}^{T-1} \sum_{i=1}^{N} \sum_{j=1}^{N} \ln a_{i j} P\left(O, i_{t}=q_{i}, i_{t+1}=q_{j} \mid \bar{\lambda}\right)+\eta\left(\sum_{j=1}^{N} a_{i j}-1\right)\right]=0 \\
\frac{1}{a_{i j}} \cdot \sum_{t=1}^{T-1} P\left(O, i_{t}=q_{i}, i_{t+1}=q_{j} \mid \bar{\lambda}\right)+\eta=0 \\
\sum_{t=1}^{T-1} P\left(O, i_{t}=q_{i}, i_{t+1}=q_{j} \mid \bar{\lambda}\right)+\eta a_{i j}=0
\end{gathered}
$$
Using $\sum_{j=1}^{N} a_{i j}=1$ and summing both sides of the last equation over $j$ (which marginalizes out $i_{t+1}$) gives:

$$
\begin{gathered}
\sum_{j=1}^{N} \sum_{t=1}^{T-1} P\left(O, i_{t}=q_{i}, i_{t+1}=q_{j} \mid \bar{\lambda}\right)+\sum_{j=1}^{N} \eta a_{i j}=0 \\
\sum_{t=1}^{T-1} P\left(O, i_{t}=q_{i} \mid \bar{\lambda}\right)+\eta \cdot 1=0 \\
\eta=-\sum_{t=1}^{T-1} P\left(O, i_{t}=q_{i} \mid \bar{\lambda}\right)
\end{gathered}
$$
Substituting this back into $\sum_{t=1}^{T-1} P\left(O, i_{t}=q_{i}, i_{t+1}=q_{j} \mid \bar{\lambda}\right)+\eta a_{i j}=0$ gives:

$$
\begin{gathered}
\sum_{t=1}^{T-1} P\left(O, i_{t}=q_{i}, i_{t+1}=q_{j} \mid \bar{\lambda}\right)-\sum_{t=1}^{T-1} P\left(O, i_{t}=q_{i} \mid \bar{\lambda}\right) \cdot a_{i j}=0 \\
a_{i j}=\frac{\sum_{t=1}^{T-1} P\left(O, i_{t}=q_{i}, i_{t+1}=q_{j} \mid \bar{\lambda}\right)}{\sum_{t=1}^{T-1} P\left(O, i_{t}=q_{i} \mid \bar{\lambda}\right)}
\end{gathered}
$$
Dividing both the numerator and the denominator by $P(O \mid \bar{\lambda})$:

$$
a_{i j}=\frac{\dfrac{\sum_{t=1}^{T-1} P\left(O, i_{t}=q_{i}, i_{t+1}=q_{j} \mid \bar{\lambda}\right)}{P(O \mid \bar{\lambda})}}{\dfrac{\sum_{t=1}^{T-1} P\left(O, i_{t}=q_{i} \mid \bar{\lambda}\right)}{P(O \mid \bar{\lambda})}}=\frac{\sum_{t=1}^{T-1} P\left(i_{t}=q_{i}, i_{t+1}=q_{j} \mid O, \bar{\lambda}\right)}{\sum_{t=1}^{T-1} P\left(i_{t}=q_{i} \mid O, \bar{\lambda}\right)}=\frac{\sum_{t=1}^{T-1} \xi_{t}(i, j)}{\sum_{t=1}^{T-1} \gamma_{t}(i)}
$$
Here $\xi_{t}(i, j)=\dfrac{\alpha_{t}(i) a_{i j} b_{j o_{t+1}} \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{t}(i) a_{i j} b_{j o_{t+1}} \beta_{t+1}(j)}$ is the probability of being in state $q_i$ at time $t$ and in state $q_j$ at time $t+1$, given $\lambda$ and $O$. As before, $\gamma_{t}(i)=\dfrac{\alpha_{t}(i) \beta_{t}(i)}{\sum_{j=1}^{N} \alpha_{t}(j) \beta_{t}(j)}$ is the probability of being in state $q_i$ at time $t$, given the model parameters $\lambda$ and the observations $O$. In this update both are evaluated at the current estimate $\bar{\lambda}$.
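The transition update can be sketched in code as follows. The forward and backward recursions are assumed to be the standard ones (they are not defined in this section), and the model parameters are hypothetical toy values. The sketch also checks the marginalization step used in the derivation, $\sum_j \xi_t(i,j)=\gamma_t(i)$.

```python
def forward(O, pi, A, B):
    """alpha[t][i] = P(o_1, ..., o_t, i_t = q_i | lambda)."""
    N = len(pi)
    alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
    for t in range(1, len(O)):
        alpha.append([
            sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
            for j in range(N)
        ])
    return alpha


def backward(O, A, B):
    """beta[t][i] = P(o_{t+1}, ..., o_T | i_t = q_i, lambda)."""
    N, T = len(A), len(O)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [
            sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] for j in range(N))
            for i in range(N)
        ]
    return beta


def xi_and_gamma(O, pi, A, B):
    """Return xi[t][i][j] for t < T-1 and gamma[t][i] for all t."""
    N, T = len(pi), len(O)
    alpha, beta = forward(O, pi, A, B), backward(O, A, B)
    p_O = sum(alpha[-1])  # P(O | lambda); equals both normalizing sums
    gamma = [[alpha[t][i] * beta[t][i] / p_O for i in range(N)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] / p_O
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    return xi, gamma


def update_transitions(xi, gamma):
    """a_ij = sum_{t<T} xi_t(i, j) / sum_{t<T} gamma_t(i)."""
    N, T = len(gamma[0]), len(gamma)
    return [[sum(xi[t][i][j] for t in range(T - 1))
             / sum(gamma[t][i] for t in range(T - 1))
             for j in range(N)] for i in range(N)]


# Hypothetical toy model (an assumption): 3 states, 2 observation symbols.
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
O = [0, 1, 0, 0, 1]

xi, gamma = xi_and_gamma(O, pi, A, B)
new_A = update_transitions(xi, gamma)
print(new_A)
```

Each row of `new_A` sums to 1 by construction, which is exactly what the constraint $\sum_j a_{ij}=1$ requires.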
Solving for $b_{jk}$: the third term of the Q function can be written as:

$$
\begin{aligned}
\sum_{I} P(O, I \mid \bar{\lambda})\left(\sum_{t=1}^{T} \ln b_{i_{t} o_{t}}\right) &=\sum_{t=1}^{T}\left(\sum_{i_{1}, i_{2}, \ldots, i_{T}} P\left(O, i_{1}, i_{2}, \ldots, i_{T} \mid \bar{\lambda}\right) \ln b_{i_{t} o_{t}}\right) \\
&=\sum_{t=1}^{T}\left\{\sum_{j=1}^{N}\left(\sum_{i_{1}, i_{2}, \ldots, i_{t-1}, i_{t+1}, \ldots, i_{T}} P\left(O, i_{1}, i_{2}, \ldots, i_{t}=q_{j}, \ldots, i_{T} \mid \bar{\lambda}\right) \ln b_{j o_{t}}\right)\right\} \\
&=\sum_{t=1}^{T}\left\{\sum_{j=1}^{N}\left[\ln b_{j o_{t}} \cdot\left(\sum_{i_{1}, i_{2}, \ldots, i_{t-1}, i_{t+1}, \ldots, i_{T}} P\left(O, i_{1}, i_{2}, \ldots, i_{t}=q_{j}, \ldots, i_{T} \mid \bar{\lambda}\right)\right)\right]\right\} \\
&=\sum_{t=1}^{T} \sum_{j=1}^{N} \ln b_{j o_{t}} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right)
\end{aligned}
$$
Since $b_{jk}$ must satisfy the constraint $\sum_{k=1}^{M} b_{j k}=1$ for each state $j$, we again use the method of Lagrange multipliers. For a fixed $j$, with multiplier $\eta$, the Lagrangian is:

$$
\sum_{t=1}^{T} \sum_{j=1}^{N} \ln b_{j o_{t}} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right)+\eta\left(\sum_{k=1}^{M} b_{j k}-1\right)
$$
Take the partial derivative of the Lagrangian with respect to $b_{jk}$ and set it to 0:

$$
\begin{aligned}
&\frac{\partial}{\partial b_{j k}}\left[\sum_{t=1}^{T} \sum_{j=1}^{N} \ln b_{j o_{t}} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right)+\eta\left(\sum_{k=1}^{M} b_{j k}-1\right)\right]=0\\
&\frac{1}{b_{j k}} \cdot \sum_{t=1}^{T} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right) \mathbb{I}\left(o_{t}=v_{k}\right)+\eta=0 \\
&\sum_{t=1}^{T} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right) \mathbb{I}\left(o_{t}=v_{k}\right)+\eta b_{j k}=0
\end{aligned}
$$
Here $\mathbb{I}\left(o_{t}=v_{k}\right)$ is the indicator function. First, differentiating with respect to $b_{jk}$ leaves only the term for that particular $j$, so the sum $\sum_{j=1}^{N}$ drops out. Second, the term $\ln b_{jo_{t}}$ depends on $o_t$, which runs from $o_1$ to $o_{T}$ through the sum $\sum_{t=1}^{T}$. For a given $t$, if $o_t=v_k$ then $\ln b_{jo_{t}}=\ln b_{jk}$ and its derivative with respect to $b_{jk}$ is $1/b_{jk}$; if $o_t\neq v_k$, the derivative is 0. As $t$ runs from 1 to $T$, the observation sequence may contain more than one position with $o_{t}=v_k$. Since we do not know in advance which positions those are, we introduce the indicator function to select them.
Using $\sum_{k=1}^{M} b_{j k}=1$ and summing both sides of the last equation over $k$ gives:

$$
\begin{gathered}
\sum_{k=1}^{M} \sum_{t=1}^{T} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right) \mathbb{I}\left(o_{t}=v_{k}\right)+\sum_{k=1}^{M} \eta b_{j k}=0 \\
\sum_{t=1}^{T} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right)+\eta \cdot 1=0 \\
\eta=-\sum_{t=1}^{T} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right)
\end{gathered}
$$
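In the sum from $k=1$ to $k=M$, for each $t$ the indicator function equals 1 only for the single $k$ with $o_t=v_k$ and is 0 for every other $k$, so $\sum_{k=1}^{M} \mathbb{I}\left(o_{t}=v_{k}\right)=1$. The indicator function and the sum over $k$ therefore disappear while the sum over $t$ remains, and we never need to know which $k$ it was.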
Substituting this back into $\sum_{t=1}^{T} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right) \mathbb{I}\left(o_{t}=v_{k}\right)+\eta b_{j k}=0$ gives:

$$
\begin{gathered}
\sum_{t=1}^{T} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right) \mathbb{I}\left(o_{t}=v_{k}\right)-\sum_{t=1}^{T} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right) \cdot b_{j k}=0 \\
b_{j k}=\frac{\sum_{t=1}^{T} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right) \mathbb{I}\left(o_{t}=v_{k}\right)}{\sum_{t=1}^{T} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right)}
\end{gathered}
$$
Dividing both the numerator and the denominator by $P(O \mid \bar{\lambda})$:

$$
b_{j k}=\frac{\dfrac{\sum_{t=1}^{T} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right) \mathbb{I}\left(o_{t}=v_{k}\right)}{P(O \mid \bar{\lambda})}}{\dfrac{\sum_{t=1}^{T} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right)}{P(O \mid \bar{\lambda})}}=\frac{\sum_{t=1}^{T} P\left(i_{t}=q_{j} \mid O, \bar{\lambda}\right) \mathbb{I}\left(o_{t}=v_{k}\right)}{\sum_{t=1}^{T} P\left(i_{t}=q_{j} \mid O, \bar{\lambda}\right)}=\frac{\sum_{t=1, o_{t}=v_{k}}^{T} \gamma_{t}(j)}{\sum_{t=1}^{T} \gamma_{t}(j)}
$$
Here, as before, $\gamma_{t}(i)=\dfrac{\alpha_{t}(i) \beta_{t}(i)}{\sum_{j=1}^{N} \alpha_{t}(j) \beta_{t}(j)}$ is the probability of being in state $q_i$ at time $t$, given the model parameters $\lambda$ (in this update, the current estimate $\bar{\lambda}$) and the observations $O$.
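Putting the three update formulas together gives one Baum-Welch iteration. The following is a minimal sketch for a single observation sequence, assuming the standard forward and backward recursions for $\alpha$ and $\beta$ (not defined in this section) and hypothetical toy starting parameters. It uses no scaling, so it is only suitable for short sequences.

```python
import math


def forward(O, pi, A, B):
    """alpha[t][i] = P(o_1, ..., o_t, i_t = q_i | lambda)."""
    N = len(pi)
    alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
    for t in range(1, len(O)):
        alpha.append([
            sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
            for j in range(N)
        ])
    return alpha


def backward(O, A, B):
    """beta[t][i] = P(o_{t+1}, ..., o_T | i_t = q_i, lambda)."""
    N, T = len(A), len(O)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [
            sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] for j in range(N))
            for i in range(N)
        ]
    return beta


def baum_welch_step(O, pi, A, B):
    """One EM iteration; returns updated (pi, A, B) and ln P(O | old params)."""
    N, M, T = len(pi), len(B[0]), len(O)
    alpha, beta = forward(O, pi, A, B), backward(O, A, B)
    p_O = sum(alpha[-1])  # P(O | lambda_bar), the common normalizer
    # E-step: posterior state and transition probabilities.
    gamma = [[alpha[t][i] * beta[t][i] / p_O for i in range(N)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] / p_O
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    # M-step: the closed-form updates derived above.
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1))
              / sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    # The condition O[t] == k plays the role of the indicator I(o_t = v_k).
    new_B = [[sum(gamma[t][j] for t in range(T) if O[t] == k)
              / sum(gamma[t][j] for t in range(T))
              for k in range(M)] for j in range(N)]
    return new_pi, new_A, new_B, math.log(p_O)


# Hypothetical starting point and observations (assumptions, not from the text).
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
O = [0, 1, 0, 0, 1, 1, 0, 1]

log_likelihoods = []
for _ in range(10):
    pi, A, B, ll = baum_welch_step(O, pi, A, B)
    log_likelihoods.append(ll)
print(log_likelihoods)
```

The recorded log-likelihoods should never decrease from one iteration to the next, which is the general guarantee of EM. For long sequences the unscaled $\alpha$ and $\beta$ underflow, so a practical implementation would rescale them at each time step or work in log space.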