[Watermelon Book Notes] 12. Hidden Markov Models (2)

Supervised learning method

Suppose the training data consist of $S$ observation sequences of equal length together with their corresponding state sequences, $\left\{\left(O_{1}, I_{1}\right),\left(O_{2}, I_{2}\right), \ldots,\left(O_{S}, I_{S}\right)\right\}$. The parameters of the hidden Markov model can then be estimated by maximum likelihood, as follows.

Estimating the transition probabilities $a_{ij}$:
$$
a_{ij}=\frac{A_{ij}}{\sum_{j=1}^{N} A_{ij}}
$$
where $A_{ij}$ is the number of times in the samples that the chain is in state $q_i$ at time $t$ and moves to state $q_j$ at time $t+1$.

Estimating the observation probabilities $b_{jk}$:
$$
b_{jk}=\frac{B_{jk}}{\sum_{k=1}^{M} B_{jk}}
$$
where $B_{jk}$ is the number of times in the samples that the state is $q_j$ and the corresponding observation is $v_k$. The initial state probability $\pi_i$ is estimated as the fraction of the $S$ samples whose initial state is $q_i$.

As an example, suppose the state set is $\{1,2,3\}$, the observation set is $\{a, b\}$, and there are two samples: $O_1=(a, a, b)$, $I_1=(2, 1, 1)$, $O_2=(a, b, a)$, $I_2=(1, 3, 2)$. The observed transitions are $2 \to 1$, $1 \to 1$, $1 \to 3$ and $3 \to 2$, so for the transition probabilities out of state 1 we have:
$$
\begin{gathered}
a_{11}=\frac{A_{11}}{A_{11}+A_{12}+A_{13}}=\frac{1}{1+0+1}=\frac{1}{2} \\
a_{12}=0 \\
a_{13}=\frac{1}{2}
\end{gathered}
$$
State 1 emits $a$ twice and $b$ once, so for its observation probabilities we have:
$$
\begin{gathered}
b_{1a}=\frac{B_{1a}}{B_{1a}+B_{1b}}=\frac{2}{2+1}=\frac{2}{3} \\
b_{1b}=\frac{B_{1b}}{B_{1a}+B_{1b}}=\frac{1}{2+1}=\frac{1}{3}
\end{gathered}
$$
The initial probabilities are $\pi_1=\dfrac{1}{2}$, $\pi_2=\dfrac{1}{2}$, $\pi_3=0$.
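The counting estimates above can be checked with a short Python sketch. The function name `estimate_hmm` and its dictionary layout are assumptions made for this illustration, not something from the book; exact fractions are used so that the output can be compared with the hand calculation.

```python
from fractions import Fraction

def estimate_hmm(samples, states, symbols):
    """Relative-frequency (maximum likelihood) estimates from labelled sequences.

    samples: list of (observation_sequence, state_sequence) pairs.
    """
    trans = {i: {j: 0 for j in states} for i in states}   # A_ij counts
    emit = {j: {k: 0 for k in symbols} for j in states}   # B_jk counts
    init = {i: 0 for i in states}                         # initial-state counts

    for obs, path in samples:
        init[path[0]] += 1
        for state, symbol in zip(path, obs):
            emit[state][symbol] += 1
        for state, next_state in zip(path, path[1:]):
            trans[state][next_state] += 1

    def normalise(row):
        total = sum(row.values())
        if total == 0:
            return None  # no data for this row, so the estimate is undefined
        return {key: Fraction(count, total) for key, count in row.items()}

    pi = normalise(init)
    a = {i: normalise(trans[i]) for i in states}
    b = {j: normalise(emit[j]) for j in states}
    return pi, a, b

# The two samples from the example above.
samples = [
    (("a", "a", "b"), (2, 1, 1)),
    (("a", "b", "a"), (1, 3, 2)),
]
pi, a, b = estimate_hmm(samples, states=(1, 2, 3), symbols=("a", "b"))
print(a[1])  # transition row of state 1
print(b[1])  # observation row of state 1
print(pi)    # initial distribution
```

With only two short samples many counts are zero, so in practice some form of smoothing would usually be added; that is outside what the notes cover.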

Baum-Welch algorithm

This algorithm is the EM algorithm applied to hidden Markov models. If only the observation sequence $O=\left(o_{1}, o_{2}, \ldots, o_{T}\right)$ is available, without the state sequence $I=\left(i_{1}, i_{2}, \ldots, i_{T}\right)$, then the hidden Markov model is a probabilistic model with latent variables (compare the general form $P(Y \mid \theta)=\sum_{Z} P(Y \mid Z, \theta) P(Z \mid \theta)$ with $Y \rightarrow O$, $Z \rightarrow I$):
$$
P(O \mid \lambda)=\sum_{I} P(O \mid I, \lambda) P(I \mid \lambda)
$$
Its parameters can be estimated with the EM algorithm. We first determine the complete-data log-likelihood. The observed data are $O=\left(o_{1}, o_{2}, \ldots, o_{T}\right)$ and the unobserved data are $I=\left(i_{1}, i_{2}, \ldots, i_{T}\right)$, so the complete data are $(O, I)=\left(o_{1}, o_{2}, \ldots, o_{T}, i_{1}, i_{2}, \ldots, i_{T}\right)$, and the complete-data log-likelihood is:
$$
\ln P(O, I \mid \lambda)
$$
Here $P(O, I \mid \lambda)=\pi_{i_{1}} b_{i_{1} o_{1}} a_{i_{1} i_{2}} b_{i_{2} o_{2}} \cdots a_{i_{T-1} i_{T}} b_{i_{T} o_{T}}$, so it follows that
$$
\begin{aligned}
\ln P(O, I \mid \lambda) &=\ln \left(\pi_{i_{1}} b_{i_{1} o_{1}} a_{i_{1} i_{2}} b_{i_{2} o_{2}} \cdots a_{i_{T-1} i_{T}} b_{i_{T} o_{T}}\right) \\
&=\ln \pi_{i_{1}}+\sum_{t=1}^{T-1} \ln a_{i_{t} i_{t+1}}+\sum_{t=1}^{T} \ln b_{i_{t} o_{t}}
\end{aligned}
$$
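The decomposition of $\ln P(O, I \mid \lambda)$ into three sums can be checked numerically. The sketch below assumes a hypothetical 2-state, 2-symbol model with states and symbols encoded as the integers 0 and 1; the parameter values are made up for illustration only.

```python
import math

def complete_log_likelihood(obs, path, pi, a, b):
    """ln P(O, I | lambda) = ln pi_{i1} + sum_t ln a_{it,it+1} + sum_t ln b_{it,ot}."""
    log_p = math.log(pi[path[0]])
    for t in range(len(path) - 1):          # T-1 transition terms
        log_p += math.log(a[path[t]][path[t + 1]])
    for state, symbol in zip(path, obs):    # T observation terms
        log_p += math.log(b[state][symbol])
    return log_p

# Hypothetical model parameters (assumptions, not from the notes).
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 1]
path = [0, 1, 1]

log_p = complete_log_likelihood(obs, path, pi, a, b)

# The same quantity from the product form pi_{i1} b_{i1 o1} a_{i1 i2} b_{i2 o2} a_{i2 i3} b_{i3 o3}.
direct_log = math.log(pi[0] * b[0][0] * a[0][1] * b[1][1] * a[1][1] * b[1][1])
print(log_p, direct_log)
```

Both printed values should agree up to floating-point rounding, since the sum of logarithms is just the logarithm of the product.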

E-step of the EM algorithm:

Compute the Q function $Q(\lambda, \bar{\lambda})$:
$$
Q(\lambda, \bar{\lambda})=\sum_{I} P(I \mid O, \bar{\lambda}) \ln P(O, I \mid \lambda)
$$
where $\bar{\lambda}$ is the current estimate of the hidden Markov model parameters and $\lambda$ is the parameter set to be maximized over. To simplify the later calculation, the Q function can be rewritten by the following identity:
$$
\begin{aligned}
Q(\lambda, \bar{\lambda}) &=\sum_{I} P(I \mid O, \bar{\lambda}) \ln P(O, I \mid \lambda) \\
&=\sum_{I} \frac{P(O, I \mid \bar{\lambda})}{P(O \mid \bar{\lambda})} \ln P(O, I \mid \lambda)
\end{aligned}
$$
This uses $P(A \mid B)=\dfrac{P(A, B)}{P(B)}$. Since the maximization that follows is over $\lambda$ only, $P(O \mid \bar{\lambda})$ is a positive constant factor that does not change the maximizer and can be dropped, so the Q function simplifies further to:
$$
\begin{aligned}
Q(\lambda, \bar{\lambda}) &=\sum_{I} P(O, I \mid \bar{\lambda}) \ln P(O, I \mid \lambda) \\
&=\sum_{I} P(O, I \mid \bar{\lambda})\left(\ln \pi_{i_{1}}+\sum_{t=1}^{T-1} \ln a_{i_{t} i_{t+1}}+\sum_{t=1}^{T} \ln b_{i_{t} o_{t}}\right) \\
&=\sum_{I} P(O, I \mid \bar{\lambda}) \ln \pi_{i_{1}}+\sum_{I} P(O, I \mid \bar{\lambda})\left(\sum_{t=1}^{T-1} \ln a_{i_{t} i_{t+1}}\right)+\sum_{I} P(O, I \mid \bar{\lambda})\left(\sum_{t=1}^{T} \ln b_{i_{t} o_{t}}\right)
\end{aligned}
$$

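M-step of the EM algorithm:

Maximize the Q function. Since the parameters to be maximized appear separately in the three terms of the expression above, each term can be maximized on its own.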

$\pi_i$: the first term of the Q function can be written as:
$$
\begin{aligned}
\sum_{I} P(O, I \mid \bar{\lambda}) \ln \pi_{i_{1}} &=\sum_{i_{1}, i_{2}, \ldots, i_{T}} P\left(O, i_{1}, i_{2}, \ldots, i_{T} \mid \bar{\lambda}\right) \ln \pi_{i_{1}} \\
&=\sum_{i=1}^{N}\left(\sum_{i_{2}, i_{3}, \ldots, i_{T}} P\left(O, i_{1}=q_{i}, i_{2}, i_{3}, \ldots, i_{T} \mid \bar{\lambda}\right) \ln \pi_{i}\right) \\
&=\sum_{i=1}^{N}\left\{\ln \pi_{i} \cdot\left(\sum_{i_{2}, i_{3}, \ldots, i_{T}} P\left(O, i_{1}=q_{i}, i_{2}, i_{3}, \ldots, i_{T} \mid \bar{\lambda}\right)\right)\right\} \\
&=\sum_{i=1}^{N} \ln \pi_{i} P\left(O, i_{1}=q_{i} \mid \bar{\lambda}\right)
\end{aligned}
$$
Since $\pi$ satisfies the constraint $\sum_{i=1}^{N} \pi_{i}=1$, we use the method of Lagrange multipliers and write the Lagrangian:
$$
\sum_{i=1}^{N} \ln \pi_{i} P\left(O, i_{1}=q_{i} \mid \bar{\lambda}\right)+\eta\left(\sum_{i=1}^{N} \pi_{i}-1\right)
$$

Take the partial derivative of the Lagrangian with respect to $\pi_i$ and set it to 0:
$$
\begin{gathered}
\frac{\partial}{\partial \pi_{i}}\left[\sum_{i=1}^{N} \ln \pi_{i} P\left(O, i_{1}=q_{i} \mid \bar{\lambda}\right)+\eta\left(\sum_{i=1}^{N} \pi_{i}-1\right)\right]=0 \\
\frac{1}{\pi_{i}} \cdot P\left(O, i_{1}=q_{i} \mid \bar{\lambda}\right)+\eta=0 \\
P\left(O, i_{1}=q_{i} \mid \bar{\lambda}\right)+\eta \pi_{i}=0
\end{gathered}
$$
Using $\sum_{i=1}^{N} \pi_{i}=1$ and summing both sides of the last equation over $i$ gives:
$$
\begin{gathered}
\sum_{i=1}^{N}\left[P\left(O, i_{1}=q_{i} \mid \bar{\lambda}\right)+\eta \pi_{i}\right]=0 \\
\sum_{i=1}^{N} P\left(O, i_{1}=q_{i} \mid \bar{\lambda}\right)+\sum_{i=1}^{N} \eta \pi_{i}=0 \\
P(O \mid \bar{\lambda})+\eta \cdot 1=0 \\
\eta=-P(O \mid \bar{\lambda})
\end{gathered}
$$
Substituting this back into $P\left(O, i_{1}=q_{i} \mid \bar{\lambda}\right)+\eta \pi_{i}=0$ gives:
$$
\begin{gathered}
P\left(O, i_{1}=q_{i} \mid \bar{\lambda}\right)-P(O \mid \bar{\lambda}) \cdot \pi_{i}=0 \\
\pi_{i}=\frac{P\left(O, i_{1}=q_{i} \mid \bar{\lambda}\right)}{P(O \mid \bar{\lambda})}=P\left(i_{1}=q_{i} \mid O, \bar{\lambda}\right)=\gamma_{1}(i)=\frac{\alpha_{1}(i) \beta_{1}(i)}{\sum_{j=1}^{N} \alpha_{1}(j) \beta_{1}(j)}
\end{gathered}
$$
where $\gamma_{t}(i)=\dfrac{\alpha_{t}(i) \beta_{t}(i)}{\sum_{j=1}^{N} \alpha_{t}(j) \beta_{t}(j)}$ is the probability of being in state $q_i$ at time $t$, given the observations $O$ and the model parameters (here the current estimate $\bar{\lambda}$); $\alpha_t$ and $\beta_t$ are the forward and backward probabilities.
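The update $\pi_i = \gamma_1(i)$ can be sketched in Python with the forward and backward recursions. The helper names (`forward`, `backward`, `posterior_gamma`, `joint`) and the 2-state model values are assumptions for illustration; the brute-force sum over all state paths is included only to check $\gamma_1(i) = P(O, i_1 = q_i \mid \bar{\lambda}) / P(O \mid \bar{\lambda})$ on a tiny example.

```python
from itertools import product

def forward(obs, pi, a, b):
    """alpha[t][i] = P(o_1, ..., o_t, i_t = q_i | lambda)."""
    n = len(pi)
    alpha = [[pi[i] * b[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * a[i][j] for i in range(n)) * b[j][o]
                      for j in range(n)])
    return alpha

def backward(obs, a, b):
    """beta[t][i] = P(o_{t+1}, ..., o_T | i_t = q_i, lambda)."""
    n = len(a)
    beta = [[1.0] * n for _ in obs]
    for t in range(len(obs) - 2, -1, -1):
        beta[t] = [sum(a[i][j] * b[j][obs[t + 1]] * beta[t + 1][j]
                       for j in range(n)) for i in range(n)]
    return beta

def posterior_gamma(alpha, beta):
    """gamma[t][i] = alpha_t(i) beta_t(i) / sum_j alpha_t(j) beta_t(j)."""
    out = []
    for a_t, b_t in zip(alpha, beta):
        norm = sum(x * y for x, y in zip(a_t, b_t))
        out.append([x * y / norm for x, y in zip(a_t, b_t)])
    return out

def joint(obs, path, pi, a, b):
    """P(O, I | lambda) for a single state path I."""
    p = pi[path[0]] * b[path[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= a[path[t - 1]][path[t]] * b[path[t]][obs[t]]
    return p

# Hypothetical current estimate lambda_bar and observation sequence.
pi_bar = [0.6, 0.4]
a_bar = [[0.7, 0.3], [0.4, 0.6]]
b_bar = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 1]

g = posterior_gamma(forward(obs, pi_bar, a_bar, b_bar), backward(obs, a_bar, b_bar))
new_pi = g[0]  # re-estimated initial distribution, pi_i = gamma_1(i)

# Brute-force check: enumerate all N**T paths (only feasible for tiny T).
paths = list(product(range(len(pi_bar)), repeat=len(obs)))
p_obs = sum(joint(obs, p, pi_bar, a_bar, b_bar) for p in paths)
brute_pi = [sum(joint(obs, p, pi_bar, a_bar, b_bar) for p in paths if p[0] == i) / p_obs
            for i in range(len(pi_bar))]
print(new_pi, brute_pi)
```

The two lists should match up to rounding, which illustrates why the forward-backward quantities can replace the exponential sum over paths.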

$a_{ij}$: the second term of the Q function can be written as:
$$
\begin{aligned}
\sum_{I} P(O, I \mid \bar{\lambda})\left(\sum_{t=1}^{T-1} \ln a_{i_{t} i_{t+1}}\right) &=\sum_{t=1}^{T-1}\left(\sum_{i_{1}, i_{2}, \ldots, i_{T}} P\left(O, i_{1}, i_{2}, \ldots, i_{T} \mid \bar{\lambda}\right) \ln a_{i_{t} i_{t+1}}\right) \\
&=\sum_{t=1}^{T-1}\left\{\sum_{i=1}^{N} \sum_{j=1}^{N}\left(\sum_{i_{1}, i_{2}, \ldots, i_{t-1}, i_{t+2}, \ldots, i_{T}} P\left(O, i_{1}, i_{2}, \ldots, i_{t}=q_{i}, i_{t+1}=q_{j}, \ldots, i_{T} \mid \bar{\lambda}\right) \ln a_{ij}\right)\right\} \\
&=\sum_{t=1}^{T-1}\left\{\sum_{i=1}^{N} \sum_{j=1}^{N}\left[\ln a_{ij} \cdot\left(\sum_{i_{1}, i_{2}, \ldots, i_{t-1}, i_{t+2}, \ldots, i_{T}} P\left(O, i_{1}, i_{2}, \ldots, i_{t}=q_{i}, i_{t+1}=q_{j}, \ldots, i_{T} \mid \bar{\lambda}\right)\right)\right]\right\} \\
&=\sum_{t=1}^{T-1} \sum_{i=1}^{N} \sum_{j=1}^{N} \ln a_{ij} P\left(O, i_{t}=q_{i}, i_{t+1}=q_{j} \mid \bar{\lambda}\right)
\end{aligned}
$$
Since $a_{ij}$ must satisfy the constraint $\sum_{j=1}^{N} a_{ij}=1$ for each row $i$, we again use the method of Lagrange multipliers. Strictly speaking each row needs its own multiplier; below, $\eta$ denotes the multiplier for the fixed row $i$ under consideration. The Lagrangian is:
$$
\sum_{t=1}^{T-1} \sum_{i=1}^{N} \sum_{j=1}^{N} \ln a_{ij} P\left(O, i_{t}=q_{i}, i_{t+1}=q_{j} \mid \bar{\lambda}\right)+\eta\left(\sum_{j=1}^{N} a_{ij}-1\right)
$$
Take the partial derivative of the Lagrangian with respect to $a_{ij}$ and set it to 0:
$$
\begin{gathered}
\frac{\partial}{\partial a_{ij}}\left[\sum_{t=1}^{T-1} \sum_{i=1}^{N} \sum_{j=1}^{N} \ln a_{ij} P\left(O, i_{t}=q_{i}, i_{t+1}=q_{j} \mid \bar{\lambda}\right)+\eta\left(\sum_{j=1}^{N} a_{ij}-1\right)\right]=0 \\
\frac{1}{a_{ij}} \cdot \sum_{t=1}^{T-1} P\left(O, i_{t}=q_{i}, i_{t+1}=q_{j} \mid \bar{\lambda}\right)+\eta=0 \\
\sum_{t=1}^{T-1} P\left(O, i_{t}=q_{i}, i_{t+1}=q_{j} \mid \bar{\lambda}\right)+\eta a_{ij}=0
\end{gathered}
$$
Using $\sum_{j=1}^{N} a_{ij}=1$ and summing both sides of the last equation over $j$ gives:
$$
\begin{gathered}
\sum_{j=1}^{N} \sum_{t=1}^{T-1} P\left(O, i_{t}=q_{i}, i_{t+1}=q_{j} \mid \bar{\lambda}\right)+\sum_{j=1}^{N} \eta a_{ij}=0 \\
\sum_{t=1}^{T-1} P\left(O, i_{t}=q_{i} \mid \bar{\lambda}\right)+\eta \cdot 1=0 \\
\eta=-\sum_{t=1}^{T-1} P\left(O, i_{t}=q_{i} \mid \bar{\lambda}\right)
\end{gathered}
$$
Substituting this back into $\sum_{t=1}^{T-1} P\left(O, i_{t}=q_{i}, i_{t+1}=q_{j} \mid \bar{\lambda}\right)+\eta a_{ij}=0$ gives:
$$
\begin{gathered}
\sum_{t=1}^{T-1} P\left(O, i_{t}=q_{i}, i_{t+1}=q_{j} \mid \bar{\lambda}\right)-\sum_{t=1}^{T-1} P\left(O, i_{t}=q_{i} \mid \bar{\lambda}\right) \cdot a_{ij}=0 \\
a_{ij}=\frac{\sum_{t=1}^{T-1} P\left(O, i_{t}=q_{i}, i_{t+1}=q_{j} \mid \bar{\lambda}\right)}{\sum_{t=1}^{T-1} P\left(O, i_{t}=q_{i} \mid \bar{\lambda}\right)}
\end{gathered}
$$
Dividing numerator and denominator by $P(O \mid \bar{\lambda})$:
$$
a_{ij}=\frac{\dfrac{\sum_{t=1}^{T-1} P\left(O, i_{t}=q_{i}, i_{t+1}=q_{j} \mid \bar{\lambda}\right)}{P(O \mid \bar{\lambda})}}{\dfrac{\sum_{t=1}^{T-1} P\left(O, i_{t}=q_{i} \mid \bar{\lambda}\right)}{P(O \mid \bar{\lambda})}}=\frac{\sum_{t=1}^{T-1} P\left(i_{t}=q_{i}, i_{t+1}=q_{j} \mid O, \bar{\lambda}\right)}{\sum_{t=1}^{T-1} P\left(i_{t}=q_{i} \mid O, \bar{\lambda}\right)}=\frac{\sum_{t=1}^{T-1} \xi_{t}(i, j)}{\sum_{t=1}^{T-1} \gamma_{t}(i)}
$$
where $\xi_{t}(i, j)=\dfrac{\alpha_{t}(i) a_{ij} b_{j o_{t+1}} \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{t}(i) a_{ij} b_{j o_{t+1}} \beta_{t+1}(j)}$ is the probability of being in state $q_i$ at time $t$ and in state $q_j$ at time $t+1$, given the observations $O$ and the model parameters (here the current estimate $\bar{\lambda}$, whose $a_{ij}$ and $b_{jk}$ appear on the right-hand side). As before, $\gamma_{t}(i)=\dfrac{\alpha_{t}(i) \beta_{t}(i)}{\sum_{j=1}^{N} \alpha_{t}(j) \beta_{t}(j)}$ is the probability of being in state $q_i$ at time $t$ under the same conditions.
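A minimal Python sketch of the transition update $a_{ij}=\sum_{t=1}^{T-1}\xi_t(i,j) / \sum_{t=1}^{T-1}\gamma_t(i)$ follows. The function names and the 2-state model values are assumptions for illustration; the code uses the fact that both normalizing sums in the definitions of $\xi_t$ and $\gamma_t$ equal $P(O \mid \bar{\lambda})$.

```python
def forward(obs, pi, a, b):
    """alpha[t][i] = P(o_1, ..., o_t, i_t = q_i | lambda)."""
    n = len(pi)
    alpha = [[pi[i] * b[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * a[i][j] for i in range(n)) * b[j][o]
                      for j in range(n)])
    return alpha

def backward(obs, a, b):
    """beta[t][i] = P(o_{t+1}, ..., o_T | i_t = q_i, lambda)."""
    n = len(a)
    beta = [[1.0] * n for _ in obs]
    for t in range(len(obs) - 2, -1, -1):
        beta[t] = [sum(a[i][j] * b[j][obs[t + 1]] * beta[t + 1][j]
                       for j in range(n)) for i in range(n)]
    return beta

def update_transitions(obs, pi, a, b):
    """Re-estimate a_ij from the current parameters (pi, a, b)."""
    n, T = len(pi), len(obs)
    alpha, beta = forward(obs, pi, a, b), backward(obs, a, b)
    p_obs = sum(alpha[-1])                    # P(O | lambda_bar)
    xi_sum = [[0.0] * n for _ in range(n)]    # sum over t of xi_t(i, j)
    gamma_sum = [0.0] * n                     # sum over t of gamma_t(i)
    for t in range(T - 1):                    # t = 1, ..., T-1 in the notes
        for i in range(n):
            gamma_sum[i] += alpha[t][i] * beta[t][i] / p_obs
            for j in range(n):
                xi_sum[i][j] += (alpha[t][i] * a[i][j] * b[j][obs[t + 1]]
                                 * beta[t + 1][j] / p_obs)
    return [[xi_sum[i][j] / gamma_sum[i] for j in range(n)] for i in range(n)]

# Hypothetical current estimate lambda_bar and observation sequence.
pi_bar = [0.6, 0.4]
a_bar = [[0.7, 0.3], [0.4, 0.6]]
b_bar = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 1, 0]

new_a = update_transitions(obs, pi_bar, a_bar, b_bar)
for row in new_a:
    print(row, sum(row))
```

Each re-estimated row should sum to 1 up to rounding, because $\sum_j \xi_t(i,j)=\gamma_t(i)$. For long sequences the unscaled `alpha` and `beta` values underflow, so a practical implementation would add scaling or work in log space.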

$b_{jk}$: the third term of the Q function can be written as:
$$
\begin{aligned}
\sum_{I} P(O, I \mid \bar{\lambda})\left(\sum_{t=1}^{T} \ln b_{i_{t} o_{t}}\right) &=\sum_{t=1}^{T}\left(\sum_{i_{1}, i_{2}, \ldots, i_{T}} P\left(O, i_{1}, i_{2}, \ldots, i_{T} \mid \bar{\lambda}\right) \ln b_{i_{t} o_{t}}\right) \\
&=\sum_{t=1}^{T}\left\{\sum_{j=1}^{N}\left(\sum_{i_{1}, i_{2}, \ldots, i_{t-1}, i_{t+1}, \ldots, i_{T}} P\left(O, i_{1}, i_{2}, \ldots, i_{t}=q_{j}, \ldots, i_{T} \mid \bar{\lambda}\right) \ln b_{j o_{t}}\right)\right\} \\
&=\sum_{t=1}^{T}\left\{\sum_{j=1}^{N}\left[\ln b_{j o_{t}} \cdot\left(\sum_{i_{1}, i_{2}, \ldots, i_{t-1}, i_{t+1}, \ldots, i_{T}} P\left(O, i_{1}, i_{2}, \ldots, i_{t}=q_{j}, \ldots, i_{T} \mid \bar{\lambda}\right)\right)\right]\right\} \\
&=\sum_{t=1}^{T} \sum_{j=1}^{N} \ln b_{j o_{t}} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right)
\end{aligned}
$$
Since $b_{jk}$ must satisfy the constraint $\sum_{k=1}^{M} b_{jk}=1$ for each state $j$, we again use the method of Lagrange multipliers (with $\eta$ denoting the multiplier for the fixed $j$ under consideration) and write the Lagrangian:
$$
\sum_{t=1}^{T} \sum_{j=1}^{N} \ln b_{j o_{t}} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right)+\eta\left(\sum_{k=1}^{M} b_{jk}-1\right)
$$
Take the partial derivative of the Lagrangian with respect to $b_{jk}$ and set it to 0:
$$
\begin{aligned}
&\frac{\partial}{\partial b_{jk}}\left[\sum_{t=1}^{T} \sum_{j=1}^{N} \ln b_{j o_{t}} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right)+\eta\left(\sum_{k=1}^{M} b_{jk}-1\right)\right]=0 \\
&\frac{1}{b_{jk}} \cdot \sum_{t=1}^{T} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right) \mathbb{I}\left(o_{t}=v_{k}\right)+\eta=0 \\
&\sum_{t=1}^{T} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right) \mathbb{I}\left(o_{t}=v_{k}\right)+\eta b_{jk}=0
\end{aligned}
$$
where $\mathbb{I}\left(o_{t}=v_{k}\right)$ is the indicator function. First, only the term of $\sum_{j=1}^{N}$ with the given $j$ depends on $b_{jk}$, so the sum over $j$ drops out when differentiating. Second, the factor $\ln b_{j o_{t}}$ appears for every $t$, with $o_t$ running from $o_1$ to $o_T$, which is the sum $\sum_{t=1}^{T}$. For a given $t$, if $o_t=v_k$ then $\ln b_{j o_{t}}=\ln b_{jk}$ and its derivative is $1 / b_{jk}$; if $o_t \neq v_k$ the derivative is 0. As $t$ runs from 1 to $T$, more than one position in the observation sequence may satisfy $o_t=v_k$. Which positions these are depends on the observed sequence, so the indicator function is introduced to select them.

Using $\sum_{k=1}^{M} b_{jk}=1$ and summing both sides of the last equation over $k$ gives:
$$
\begin{gathered}
\sum_{k=1}^{M} \sum_{t=1}^{T} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right) \mathbb{I}\left(o_{t}=v_{k}\right)+\sum_{k=1}^{M} \eta b_{jk}=0 \\
\sum_{t=1}^{T} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right)+\eta \cdot 1=0 \\
\eta=-\sum_{t=1}^{T} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right)
\end{gathered}
$$
Here, for each fixed $t$, summing over $k$ from $1$ to $M$ leaves exactly one term, the one with $o_t=v_k$, for which the indicator is 1; all other terms are 0. In other words $\sum_{k=1}^{M} \mathbb{I}\left(o_{t}=v_{k}\right)=1$, so the indicator and the sum over $k$ disappear while the sum over $t$ remains, and we do not need to know which $k$ it is.
Substituting this back into $\sum_{t=1}^{T} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right) \mathbb{I}\left(o_{t}=v_{k}\right)+\eta b_{jk}=0$ gives:
$$
\begin{gathered}
\sum_{t=1}^{T} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right) \mathbb{I}\left(o_{t}=v_{k}\right)-\sum_{t=1}^{T} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right) \cdot b_{jk}=0 \\
b_{jk}=\frac{\sum_{t=1}^{T} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right) \mathbb{I}\left(o_{t}=v_{k}\right)}{\sum_{t=1}^{T} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right)}
\end{gathered}
$$
Dividing numerator and denominator by $P(O \mid \bar{\lambda})$:
$$
b_{jk}=\frac{\dfrac{\sum_{t=1}^{T} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right) \mathbb{I}\left(o_{t}=v_{k}\right)}{P(O \mid \bar{\lambda})}}{\dfrac{\sum_{t=1}^{T} P\left(O, i_{t}=q_{j} \mid \bar{\lambda}\right)}{P(O \mid \bar{\lambda})}}=\frac{\sum_{t=1}^{T} P\left(i_{t}=q_{j} \mid O, \bar{\lambda}\right) \mathbb{I}\left(o_{t}=v_{k}\right)}{\sum_{t=1}^{T} P\left(i_{t}=q_{j} \mid O, \bar{\lambda}\right)}=\frac{\sum_{t=1, o_{t}=v_{k}}^{T} \gamma_{t}(j)}{\sum_{t=1}^{T} \gamma_{t}(j)}
$$
where $\gamma_{t}(i)=\dfrac{\alpha_{t}(i) \beta_{t}(i)}{\sum_{j=1}^{N} \alpha_{t}(j) \beta_{t}(j)}$ is the probability of being in state $q_i$ at time $t$, given the observations $O$ and the model parameters (here the current estimate $\bar{\lambda}$).
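To see the three updates working together, here is a sketch of one full Baum-Welch step. It is not the forward-backward implementation: it evaluates the pre-simplification formulas directly by enumerating every state path, which is only feasible for very small $N$ and $T$. The names `joint` and `em_step` and the model values are assumptions for illustration.

```python
from itertools import product

def joint(obs, path, pi, a, b):
    """P(O, I | lambda) for a single state path I."""
    p = pi[path[0]] * b[path[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= a[path[t - 1]][path[t]] * b[path[t]][obs[t]]
    return p

def em_step(obs, pi, a, b):
    """One EM update computed by enumerating all N**T state paths."""
    n, m, T = len(pi), len(b[0]), len(obs)
    paths = list(product(range(n), repeat=T))
    weights = [joint(obs, p, pi, a, b) for p in paths]
    p_obs = sum(weights)                       # P(O | lambda_bar)
    new_pi = [0.0] * n
    num_a = [[0.0] * n for _ in range(n)]      # sum_t P(i_t=q_i, i_{t+1}=q_j | O)
    num_b = [[0.0] * m for _ in range(n)]      # sum_t P(i_t=q_j | O) I(o_t=v_k)
    for path, w in zip(paths, weights):
        post = w / p_obs                       # P(I | O, lambda_bar)
        new_pi[path[0]] += post
        for t in range(T - 1):
            num_a[path[t]][path[t + 1]] += post
        for t in range(T):
            # The indicator I(o_t = v_k) selects the column k = obs[t].
            num_b[path[t]][obs[t]] += post
    # Row sums are the denominators sum_t gamma_t(i) and sum_t gamma_t(j).
    new_a = [[x / sum(row) for x in row] for row in num_a]
    new_b = [[x / sum(row) for x in row] for row in num_b]
    return new_pi, new_a, new_b, p_obs

# Hypothetical current estimate lambda_bar and observation sequence.
pi_bar = [0.6, 0.4]
a_bar = [[0.7, 0.3], [0.4, 0.6]]
b_bar = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 1, 0]

new_pi, new_a, new_b, old_likelihood = em_step(obs, pi_bar, a_bar, b_bar)
new_likelihood = em_step(obs, new_pi, new_a, new_b)[3]  # P(O | updated lambda)
print(old_likelihood, new_likelihood)
```

Because each EM iteration cannot decrease the observed-data likelihood, the second printed value should be at least as large as the first. In practice the step is repeated until the likelihood stops improving, with $\gamma_t$ and $\xi_t$ computed by the forward and backward recursions instead of enumeration.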
