10.2 Probability Computation Algorithms
10.2.1 Direct Computation
$$
P(I \mid \lambda)=P\left(i_{1}, i_{2}, \ldots, i_{T} \mid \lambda\right)=P\left(i_{T} \mid i_{1}, i_{2}, \ldots, i_{T-1}, \lambda\right) P\left(i_{1}, i_{2}, \ldots, i_{T-1} \mid \lambda\right)
$$
By the homogeneous first-order Markov assumption:
$$
P\left(i_{T} \mid i_{1}, i_{2}, \ldots, i_{T-1}, \lambda\right)=P\left(i_{T} \mid i_{T-1}, \lambda\right)=a_{i_{T-1}, i_{T}}
$$
Therefore,
$$
\begin{aligned}
P(I \mid \lambda)&=a_{i_{T-1}, i_{T}}P(i_1,i_2,\cdots ,i_{T-1} \mid \lambda)\\
&=a_{i_{T-1}, i_{T}}a_{i_{T-2}, i_{T-1}}P(i_1,i_2,\cdots ,i_{T-2} \mid \lambda)\\
&={\pi}_{i_1}\prod_{t=2}^Ta_{i_{t-1},i_t}
\end{aligned}
$$
In addition, by the product rule:
$$
\begin{aligned}
P(O \mid I, \lambda)&=P\left(o_{1}, o_{2}, \ldots, o_{T} \mid i_{1}, i_{2}, \ldots, i_{T}, \lambda\right)\\
&=P\left(o_{T} \mid o_{1}, o_{2}, \ldots, o_{T-1}, i_{1}, i_{2}, \ldots, i_{T}, \lambda\right) P\left(o_{1}, o_{2}, \ldots, o_{T-1} \mid i_{1}, i_{2}, \ldots, i_{T}, \lambda\right)
\end{aligned}
$$
By the observation independence assumption, each observation depends only on the state at the same time step, so the factors peel off one at a time starting from $o_T$:
$$
\begin{aligned}
P(O \mid I,\lambda)&=P(o_T \mid i_T,\lambda)P(o_1,o_2, \cdots ,o_{T-1} \mid i_1,i_2,\cdots ,i_T,\lambda)\\
&=b_{i_T}(o_T)P(o_1,o_2, \cdots ,o_{T-1} \mid i_1,i_2,\cdots ,i_{T},\lambda)\\
&=b_{i_T}(o_T)b_{i_{T-1}}(o_{T-1})P(o_1,o_2, \cdots ,o_{T-2} \mid i_1,i_2,\cdots ,i_{T},\lambda)\\
&=\prod_{t=1}^Tb_{i_t}(o_t)
\end{aligned}
$$
Therefore, the joint probability of $O$ and $I$ occurring together is:
$$
\begin{aligned}
P(O, I \mid \lambda) &=P(O \mid I, \lambda) P(I \mid \lambda) \\
&=\pi_{i_{1}} b_{i_{1}}\left(o_{1}\right) a_{i_{1} i_{2}} b_{i_{2}}\left(o_{2}\right) \cdots a_{i_{T-1} i_{T}} b_{i_{T}}\left(o_{T}\right)
\end{aligned}
$$
Then, summing over all possible state sequences $I$ gives the probability $P(O \mid \lambda)$ of the observation sequence $O$, that is,
$$
\begin{aligned}
P(O \mid \lambda) &=\sum_{I} P(O \mid I, \lambda) P(I \mid \lambda) \\
&=\sum_{i_{1}, i_{2}, \cdots, i_{T}} \pi_{i_{1}} b_{i_{1}}\left(o_{1}\right) a_{i_{1} i_{2}} b_{i_{2}}\left(o_{2}\right) \cdots a_{i_{T-1} i_{T}} b_{i_{T}}\left(o_{T}\right)
\end{aligned}
$$
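The sum above can be evaluated literally by enumerating every state sequence. The sketch below does that for a small toy model; the parameter values, the function names `joint_prob` and `direct_prob`, and the encoding of observations as integer indices are all assumptions made for illustration and do not come from the text. Because the sum has $N^T$ terms, each a product of about $2T$ factors, this approach is only practical for very short sequences.

```python
from itertools import product


def joint_prob(pi, A, B, obs, states):
    """P(O, I | lambda) = pi_{i1} b_{i1}(o1) * prod_{t>=2} a_{i(t-1) i(t)} b_{i(t)}(o_t)."""
    p = pi[states[0]] * B[states[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[states[t - 1]][states[t]] * B[states[t]][obs[t]]
    return p


def direct_prob(pi, A, B, obs):
    """P(O | lambda), summing the joint probability over all N**T state sequences."""
    n = len(pi)
    return sum(
        joint_prob(pi, A, B, obs, path)
        for path in product(range(n), repeat=len(obs))
    )


# Illustrative toy model (assumed values): 3 hidden states, 2 observation symbols.
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
obs = [0, 1, 0]

print(direct_prob(pi, A, B, obs))  # approximately 0.130218
```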
10.2.2 Forward Algorithm
First, recall the product rule of probability (the identity behind Bayes' formula) and marginalization, both of which still hold when everything is conditioned on $\lambda$:
$$
\begin{aligned}
&P(A,B,C)=P(A)P(B \mid A)P(C \mid A,B)\\
&P(A,B,C \mid \lambda)=P(A \mid \lambda)P(B \mid A,\lambda)P(C \mid A,B,\lambda)\\
&P(A \mid \lambda)=\sum_BP(A,B \mid \lambda)
\end{aligned}
$$
Applying these identities to the forward probability, and then the Markov and observation independence assumptions, gives the recursion:
$$
\begin{aligned}
\alpha_{t+1}(i) &=P(o_1,\cdots,o_{t+1},i_{t+1}=q_i \mid \lambda)\\
&=\sum_{j=1}^{N} P\left(o_{1}, \ldots, o_{t+1}, i_{t}=q_{j}, i_{t+1}=q_{i} \mid \lambda\right) \\
&=\sum_{j=1}^{N} P\left(o_{1}, \ldots, o_{t}, i_{t}=q_{j} \mid \lambda\right) P\left(i_{t+1}=q_{i} \mid o_{1}, \ldots, o_{t}, i_{t}=q_{j}, \lambda\right) P\left(o_{t+1} \mid o_{1}, \ldots, o_{t}, i_{t}=q_{j}, i_{t+1}=q_{i}, \lambda\right) \\
&=\sum_{j=1}^{N} P\left(o_{1}, \ldots, o_{t}, i_{t}=q_{j} \mid \lambda\right) P\left(i_{t+1}=q_{i} \mid i_{t}=q_{j}, \lambda\right) P\left(o_{t+1} \mid i_{t+1}=q_{i}, \lambda\right)\\
&=\sum_{j=1}^N\alpha_t(j)a_{ji}b_i(o_{t+1}),\quad i=1,2,\cdots,N
\end{aligned}
$$
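The recursion can be turned into a short routine. This is a minimal sketch, not a reference implementation: the function name `forward`, the toy parameters, and the integer encoding of observations are assumptions. The initial value $\alpha_1(i)=\pi_i b_i(o_1)$ and the final sum $P(O \mid \lambda)=\sum_i \alpha_T(i)$ are not derived above; they follow directly from the definition $\alpha_t(i)=P(o_1,\ldots,o_t,i_t=q_i \mid \lambda)$.

```python
def forward(pi, A, B, obs):
    """Return alpha as a list of rows, where alpha[t][i] is alpha_{t+1}(i) in 1-based notation."""
    n = len(pi)
    # Initial value: alpha_1(i) = pi_i * b_i(o_1).
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        # Recursion: alpha_{t+1}(i) = (sum_j alpha_t(j) * a_ji) * b_i(o_{t+1}).
        alpha.append([
            sum(prev[j] * A[j][i] for j in range(n)) * B[i][obs[t]]
            for i in range(n)
        ])
    return alpha


# Illustrative toy model (assumed values): 3 hidden states, 2 observation symbols.
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
obs = [0, 1, 0]

alpha = forward(pi, A, B, obs)
print(sum(alpha[-1]))  # P(O | lambda), approximately 0.130218
```

Each step costs on the order of $N^2$ multiplications, so the whole pass is on the order of $N^2 T$, compared with the $N^T$ terms of the direct sum.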
10.2.3 Backward Algorithm
$$
\begin{aligned}
\beta_t(i)&=P(o_{t+1},o_{t+2},\cdots,o_T \mid i_t=q_i,\lambda)\\
&=\sum_{j=1}^NP(o_{t+1},o_{t+2},\cdots,o_T,i_{t+1}=q_j \mid i_t=q_i,\lambda)\\
&=\sum_{j=1}^NP(i_{t+1}=q_j \mid i_t=q_i,\lambda)P(o_{t+2},\cdots,o_T \mid i_{t+1}=q_j,i_t=q_i,\lambda)P(o_{t+1} \mid o_{t+2},\cdots,o_T,i_{t+1}=q_j,i_t=q_i,\lambda)\\
&=\sum_{j=1}^NP(i_{t+1}=q_j \mid i_t=q_i,\lambda)P(o_{t+2},\cdots,o_T \mid i_{t+1}=q_j,\lambda)P(o_{t+1} \mid i_{t+1}=q_j,\lambda)\\
&=\sum_{j=1}^Na_{ij}\beta_{t+1}(j)b_j(o_{t+1})\\
&=\sum_{j=1}^Na_{ij}b_j(o_{t+1})\beta_{t+1}(j)
\end{aligned}
$$
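A matching sketch for the backward recursion follows. The function name `backward` and the toy parameters are assumptions. The starting value $\beta_T(i)=1$ is the usual convention (there are no observations after time $T$), and the closing identity $P(O \mid \lambda)=\sum_i \pi_i b_i(o_1)\beta_1(i)$ follows from the definition of $\beta_1(i)$; neither is derived in the text above.

```python
def backward(A, B, obs):
    """Return beta as a list of rows, where beta[t][i] is beta_{t+1}(i) in 1-based notation."""
    n = len(A)
    T = len(obs)
    # Convention: beta_T(i) = 1 for every state i.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            # Recursion: beta_t(i) = sum_j a_ij * b_j(o_{t+1}) * beta_{t+1}(j).
            beta[t][i] = sum(
                A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n)
            )
    return beta


# Illustrative toy model (assumed values): 3 hidden states, 2 observation symbols.
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
obs = [0, 1, 0]

beta = backward(A, B, obs)
prob = sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(len(pi)))
print(prob)  # P(O | lambda), approximately 0.130218
```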
Combining the forward and backward probabilities, for $t=1,2,\cdots,T-1$:
$$
\begin{aligned}
P(O \mid \lambda) &=\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{t}(i) a_{i j} b_{j}\left(o_{t+1}\right) \beta_{t+1}(j) \\
&=\sum_{i=1}^{N} \alpha_{t}(i) \sum_{j=1}^{N} a_{i j} b_{j}\left(o_{t+1}\right) \beta_{t+1}(j) \\
&=\sum_{i=1}^{N} \alpha_{t}(i) \beta_{t}(i)
\end{aligned}
$$
Moreover, since the observations after time $t$ are conditionally independent of $o_1,\ldots,o_t$ given $i_t$:
$$
\begin{aligned}
\alpha_{t}(i) \beta_{t}(i) &=P\left(o_{1}, o_{2}, \ldots, o_{t}, i_{t}=q_{i} \mid \lambda\right) P\left(o_{t+1}, \ldots, o_{T} \mid i_{t}=q_{i}, \lambda\right) \\
&=P\left(o_{1}, o_{2}, \ldots, o_{t}, i_{t}=q_{i} \mid \lambda\right) P\left(o_{t+1}, \ldots, o_{T} \mid o_{1}, o_{2}, \ldots, o_{t}, i_{t}=q_{i}, \lambda\right) \\
&=P\left(o_{1}, o_{2}, \ldots, o_{T}, i_{t}=q_{i} \mid \lambda\right) \\
&=P\left(O, i_{t}=q_{i} \mid \lambda\right)
\end{aligned}
$$
Therefore,
$$
\sum_{i=1}^{N} \alpha_{t}(i) \beta_{t}(i)=\sum_{i=1}^NP(O,i_t=q_i \mid \lambda)=P(O \mid \lambda)
$$
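This identity is easy to check numerically: the sum $\sum_i \alpha_t(i)\beta_t(i)$ should give the same value at every $t$. The sketch below recomputes both tables for an assumed toy model; the helper names, the parameter values, and the boundary values $\alpha_1(i)=\pi_i b_i(o_1)$ and $\beta_T(i)=1$ are assumptions not spelled out in the text.

```python
def forward(pi, A, B, obs):
    """alpha[t][i] = P(o_1..o_{t+1}, state i at step t+1 | lambda), 0-based t."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([
            sum(prev[j] * A[j][i] for j in range(n)) * B[i][obs[t]]
            for i in range(n)
        ])
    return alpha


def backward(A, B, obs):
    """beta[t][i] = P(remaining observations | state i at step t+1, lambda), 0-based t."""
    n, T = len(A), len(obs)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(
                A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n)
            )
    return beta


# Illustrative toy model (assumed values).
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
obs = [0, 1, 0]

alpha = forward(pi, A, B, obs)
beta = backward(A, B, obs)
# One value of sum_i alpha_t(i) * beta_t(i) per time step t.
totals = [sum(a * b for a, b in zip(alpha[t], beta[t])) for t in range(len(obs))]
print(totals)  # every entry is approximately 0.130218
```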
10.3 Learning Algorithm
By the definition of the Q function:
$$
\begin{aligned}
Q(\lambda,\overline{\lambda})&=E_I[\log P(O,I \mid \lambda) \mid O,\overline{\lambda}]\\
&=\sum_IP(I \mid O,\overline{\lambda})\log P(O,I \mid \lambda)\\
&=\sum_I\frac{P(O,I \mid \overline{\lambda})}{P(O \mid \overline{\lambda})}\log P(O,I \mid \lambda)
\end{aligned}
$$
Dropping the factor $\frac{1}{P(O \mid \overline{\lambda})}$, which is constant with respect to $\lambda$, gives Eq. (10.33):
$$
Q(\lambda,\overline{\lambda})=\sum_IP(O,I \mid \overline{\lambda})\log P(O,I \mid \lambda)
$$
Taking the partial derivative in Eq. (10.35) with respect to $\pi_i$ and setting it to zero gives:
$$
\frac{P(O,i_1=i \mid \overline{\lambda})}{\pi_i}+\gamma=0
$$
Multiplying both sides by $\pi_i$ then gives the result in the book:
$$
P(O,i_1=i \mid \overline{\lambda})+\gamma\pi_i=0
$$
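The step from this equation to a closed form for $\pi_i$ can be filled in as follows. This is a sketch that assumes $\gamma$ is the Lagrange multiplier for the constraint $\sum_{i=1}^{N}\pi_i=1$: summing the equation over $i$ determines $\gamma$, and substituting it back gives $\pi_i$.

```latex
\begin{aligned}
\sum_{i=1}^{N} P(O, i_1 = i \mid \overline{\lambda}) + \gamma \sum_{i=1}^{N} \pi_i &= 0 \\
P(O \mid \overline{\lambda}) + \gamma &= 0
\quad\Rightarrow\quad \gamma = -P(O \mid \overline{\lambda}) \\
\pi_i &= \frac{P(O, i_1 = i \mid \overline{\lambda})}{P(O \mid \overline{\lambda})}
\end{aligned}
```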
Eq. (10.37) is obtained by the same method.
Note that $a_{ij}$ satisfies the constraint $\sum_{j=1}^{N}a_{ij}=1$. Using a Lagrange multiplier $\beta$ for the row-$i$ constraint (this $\beta$ is unrelated to the backward probability $\beta_t(i)$), the Lagrangian is:
$$
\sum_{i=1}^N\sum_{j=1}^N\sum_{t=1}^{T-1}\log a_{ij}\,P(O,i_t=i,i_{t+1}=j \mid \overline{\lambda})+\beta\left(\sum_{j=1}^Na_{ij}-1\right)
$$
Taking the partial derivative with respect to $a_{ij}$ and setting it to zero gives
$$
\begin{aligned}
&\sum_{t=1}^{T-1}\frac{1}{a_{ij}}P(O,i_t=i,i_{t+1}=j \mid \overline{\lambda})+\beta=0\\
&\sum_{t=1}^{T-1}P(O,i_t=i,i_{t+1}=j \mid \overline{\lambda})+\beta a_{ij}=0\\
&a_{ij}=-\frac{1}{\beta}\sum_{t=1}^{T-1}P(O,i_t=i,i_{t+1}=j \mid \overline{\lambda})\\
&\sum_{j=1}^Na_{ij}=\sum_{j=1}^N-\frac{1}{\beta}\sum_{t=1}^{T-1}P(O,i_t=i,i_{t+1}=j \mid \overline{\lambda})=1\\
&\beta=-\sum_{t=1}^{T-1}P(O,i_t=i \mid \overline{\lambda})
\end{aligned}
$$
The last line uses $\sum_{j=1}^N P(O,i_t=i,i_{t+1}=j \mid \overline{\lambda})=P(O,i_t=i \mid \overline{\lambda})$.
Substituting this $\beta$ back into the expression for $a_{ij}$ gives Eq. (10.37):
$$
a_{i j}=\frac{\sum_{t=1}^{T-1} P\left(O, i_{t}=i, i_{t+1}=j \mid \bar{\lambda}\right)}{\sum_{t=1}^{T-1} P\left(O, i_{t}=i \mid \bar{\lambda}\right)}
$$
Eq. (10.38) is handled the same way, again with a Lagrange multiplier; the constraint is $\sum_{k=1}^{M} b_{j}(k)=1$. Note that the partial derivative of $b_{j}\left(o_{t}\right)$ with respect to $b_{j}(k)$ is nonzero only when $o_{t}=v_{k}$, which is expressed by the indicator $I\left(o_{t}=v_{k}\right)$. The Lagrangian is:
$$
\sum_{j=1}^N\sum_{t=1}^T\log b_j(o_t)\,P(O,i_t=j \mid \overline{\lambda})+\eta\left(\sum_{k=1}^Mb_j(k)-1\right)
$$
Taking the partial derivative with respect to $b_j(k)$ and setting it to zero gives
$$
\begin{aligned}
&\sum_{t=1}^T\frac{P(O,i_t=j \mid \overline{\lambda})I(o_t=v_k)}{b_j(k)}+\eta=0\\
&b_j(k)=-\frac{1}{\eta}\sum_{t=1}^TP(O,i_t=j \mid \overline{\lambda})I(o_t=v_k)\\
&\eta=-\sum_{t=1}^T\sum_{k=1}^MP(O,i_t=j \mid \overline{\lambda})I(o_t=v_k)\\
&\eta=-\sum_{t=1}^TP(O,i_t=j \mid \overline{\lambda})
\end{aligned}
$$
The third line follows from summing the second over $k$ and applying the constraint $\sum_{k=1}^{M} b_{j}(k)=1$; the fourth uses the fact that exactly one $k$ satisfies $o_t=v_k$ for each $t$.
Substituting $\eta$ back into the expression for $b_j(k)$ gives Eq. (10.38):
$$
b_{j}(k)=\frac{\sum_{t=1}^{T} P\left(O, i_{t}=j \mid \bar{\lambda}\right) I\left(o_{t}=v_{k}\right)}{\sum_{t=1}^{T} P\left(O, i_{t}=j \mid \bar{\lambda}\right)}
$$
Next we derive Eqs. (10.39) to (10.41). Dividing the numerator and denominator by $P(O \mid \overline{\lambda})$, and writing $\gamma_t(i)=P(i_t=q_i \mid O,\overline{\lambda})$ and $\xi_t(i,j)=P(i_t=q_i,i_{t+1}=q_j \mid O,\overline{\lambda})$:
$$
\begin{aligned}
a_{i j}&=\frac{\sum_{t=1}^{T-1} P\left(O, i_{t}=i, i_{t+1}=j \mid \bar{\lambda}\right)}{\sum_{t=1}^{T-1} P\left(O, i_{t}=i \mid \bar{\lambda}\right)}\\
&=\frac{\sum_{t=1}^{T-1}P(O,i_t=i,i_{t+1}=j \mid \overline{\lambda})\frac{1}{P(O \mid \overline{\lambda})}}{\sum_{t=1}^{T-1}P(O,i_t=i \mid \overline{\lambda})\frac{1}{P(O \mid \overline{\lambda})}}\\
&=\frac{\sum_{t=1}^{T-1}\xi_t(i,j)}{\sum_{t=1}^{T-1}\gamma_t(i)}
\end{aligned}
$$
$$
\begin{aligned}
b_j(k)&=\frac{\sum_{t=1}^TP(O,i_t=j \mid \overline{\lambda})I(o_t=v_k)}{\sum_{t=1}^TP(O,i_t=j \mid \overline{\lambda})}\\
&=\frac{\sum_{t=1,o_t=v_k}^TP(O,i_t=j \mid \overline{\lambda})\frac{1}{P(O \mid \overline{\lambda})}}{\sum_{t=1}^TP(O,i_t=j \mid \overline{\lambda})\frac{1}{P(O \mid \overline{\lambda})}}\\
&=\frac{\sum_{t=1,o_t=v_k}^T\gamma_t(j)}{\sum_{t=1}^T\gamma_t(j)}
\end{aligned}
$$
$$
\pi_i=\frac{P(O,i_1=i \mid \overline{\lambda})}{P(O \mid \overline{\lambda})}=\gamma_1(i)
$$
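The three update formulas can be combined into a single re-estimation step. The sketch below is illustrative only: the function names, the toy parameters, and the boundary values $\alpha_1(i)=\pi_i b_i(o_1)$ and $\beta_T(i)=1$ are assumptions. It computes $P(O,i_t=i \mid \overline{\lambda})=\alpha_t(i)\beta_t(i)$ and $P(O,i_t=i,i_{t+1}=j \mid \overline{\lambda})=\alpha_t(i)a_{ij}b_j(o_{t+1})\beta_{t+1}(j)$, matching the summands in the combined forward-backward formula of Section 10.2.3, and assumes every state has nonzero posterior mass so that no denominator is zero.

```python
def forward(pi, A, B, obs):
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([
            sum(prev[j] * A[j][i] for j in range(n)) * B[i][obs[t]]
            for i in range(n)
        ])
    return alpha


def backward(A, B, obs):
    n, T = len(A), len(obs)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(
                A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n)
            )
    return beta


def reestimate(pi, A, B, obs):
    """One re-estimation step for a single observation sequence."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
    prob = sum(alpha[-1])  # P(O | current parameters)
    # gamma[t][i]: posterior probability of state i at step t given O.
    gamma = [[alpha[t][i] * beta[t][i] / prob for i in range(n)] for t in range(T)]
    # pair[t][i][j]: posterior probability of the transition i -> j at step t given O.
    pair = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / prob
              for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(pair[t][i][j] for t in range(T - 1))
              / sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k)
              / sum(gamma[t][j] for t in range(T))
              for k in range(m)] for j in range(n)]
    return new_pi, new_A, new_B


# Illustrative toy model (assumed values).
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
obs = [0, 1, 0]

old_prob = sum(forward(pi, A, B, obs)[-1])
new_pi, new_A, new_B = reestimate(pi, A, B, obs)
new_prob = sum(forward(new_pi, new_A, new_B, obs)[-1])
print(old_prob, new_prob)  # the likelihood should not decrease after the update
```

Repeating `reestimate` until the likelihood stops improving gives the usual iterative procedure; for long sequences a practical version would also need scaling or log-space arithmetic to avoid underflow, which this sketch omits.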