The HMM Parameter Learning Problem

The HMM parameter learning problem comes in two forms:

  1. Supervised learning: given an observation sequence $O = (o_1,...,o_T)$ and the corresponding state sequence $I = (i_1,...,i_T)$, estimate the parameters $\lambda = (A,B,\pi)$.

  2. Unsupervised learning: given only the observation sequence $O = (o_1,...,o_T)$, estimate the parameters $\lambda = (A,B,\pi)$.


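Supervised learning (direct maximum likelihood estimation)

In supervised learning, the training data supplies both the observation sequences and the corresponding hidden states. The parameters are then estimated from relative frequencies: $a_{ij}$ is the number of transitions from state $q_i$ to state $q_j$ divided by the number of transitions out of $q_i$, $b_i(v_k)$ is the number of times state $q_i$ emits symbol $v_k$ divided by the number of times $q_i$ occurs, and $\pi_i$ is the fraction of training sequences that start in $q_i$.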
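
The counting procedure can be sketched in a few lines of NumPy. This is only an illustration under assumptions not stated in the post: states and symbols are encoded as integers starting from 0, the function name `estimate_supervised` is hypothetical, and every state is assumed to appear with at least one outgoing transition (otherwise a row of counts is all zeros and some smoothing would be needed).

```python
import numpy as np

def estimate_supervised(obs_seqs, state_seqs, n_states, n_symbols):
    """Frequency-count estimates of (A, B, pi) from labeled sequences.

    Assumption: states and symbols are integer-coded from 0.
    """
    A = np.zeros((n_states, n_states))
    B = np.zeros((n_states, n_symbols))
    pi = np.zeros(n_states)
    for obs, states in zip(obs_seqs, state_seqs):
        pi[states[0]] += 1                    # count initial states
        for t, (s, o) in enumerate(zip(states, obs)):
            B[s, o] += 1                      # count emissions
            if t + 1 < len(states):
                A[s, states[t + 1]] += 1      # count transitions
    # Turn counts into probabilities so that each row sums to 1.
    A /= A.sum(axis=1, keepdims=True)
    B /= B.sum(axis=1, keepdims=True)
    pi /= pi.sum()
    return A, B, pi
```

For example, `estimate_supervised([[0, 1, 1], [1, 0, 0]], [[0, 0, 1], [1, 1, 0]], 2, 2)` returns row-normalized matrices built from two short labeled sequences.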


Unsupervised learning (iterative estimation with the Baum-Welch algorithm)

The Baum-Welch algorithm is essentially the EM algorithm, an iterative procedure for learning the parameters of models with hidden variables. Recall that the core of EM is the update rule relating $\Theta^{(g+1)}$ to $\Theta^{(g)}$:

$$\Theta^{(g+1)} = \mathop{\arg\max}_{\Theta} \{ Q(\Theta, \Theta^{(g)}) \} = \mathop{\arg\max}_{\Theta} \int_z P(Z | X, \Theta^{(g)}) \log P(X,Z | \Theta) \, dz$$

The parameters are updated repeatedly, and each update is guaranteed not to decrease the log-likelihood.

In the unsupervised setting we only have the observation sequence $O = (o_1,...,o_T)$, and the state sequence $I$ is treated as an unobserved hidden variable. The HMM is then a probabilistic model with hidden variables:

$$P(O | \lambda) = \sum_I P(O | I, \lambda) P(I | \lambda)$$

Its parameters can therefore be estimated with the EM algorithm. The update rule for $\lambda$ is:

$$\lambda^{(g+1)} = \mathop{\arg\max}_{\lambda} \{ Q(\lambda, \lambda^{(g)}) \} = \mathop{\arg\max}_{\lambda} \int_I P(I | O, \lambda^{(g)}) \log P(O,I | \lambda) \, dI$$

Here $\lambda^{(g)}$ is the parameter estimate from the previous iteration and $\lambda^{(g+1)}$ is the updated estimate.

E-step

As above, the expectation to compute for an HMM is the following, where the integral becomes a sum because the state sequence $I$ is discrete:

$$Q(\lambda, \lambda^{(g)}) = \int_I P(I | O, \lambda^{(g)}) \log P(O,I | \lambda) \, dI = \sum_I P(I | O, \lambda^{(g)}) \log P(O,I | \lambda)$$

Since $P(I | O, \lambda^{(g)}) = \frac{P(O,I | \lambda^{(g)})}{P(O | \lambda^{(g)})}$ and $\lambda^{(g)}$ is fixed, $\frac{1}{P(O | \lambda^{(g)})}$ is a constant factor with respect to $\lambda$ and has no effect on the result of the $\arg\max$. Dropping this factor, the $Q$ function can be redefined as:

$$Q(\lambda, \lambda^{(g)}) = \sum_I P(O,I | \lambda^{(g)}) \log P(O,I | \lambda)$$

The direct-computation section of the HMM probability computation problem already derived:

$$P(O,I | \lambda) = \pi_{i_1} \prod_{t=1}^T b_{i_t}(o_t) \prod_{t=1}^{T-1} a_{i_t i_{t+1}}$$
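
To make the joint probability and the sum over hidden state sequences concrete, here is a minimal sketch. The function names `joint_prob` and `likelihood_brute_force` are hypothetical, and states and symbols are assumed to be integer-coded from 0. The enumeration visits all $N^T$ state sequences, so it is only usable for very short sequences; it is meant as an illustration of the formulas, not as a practical method.

```python
import itertools
import numpy as np

def joint_prob(obs, states, A, B, pi):
    """P(O, I | lambda) = pi_{i_1} * prod_t b_{i_t}(o_t) * prod_t a_{i_t i_{t+1}}."""
    p = pi[states[0]] * B[states[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]
    return p

def likelihood_brute_force(obs, A, B, pi):
    """P(O | lambda) as the sum of P(O, I | lambda) over every state sequence I."""
    n_states = len(pi)
    return sum(
        joint_prob(obs, states, A, B, pi)
        for states in itertools.product(range(n_states), repeat=len(obs))
    )
```

A quick sanity check is that the likelihoods of all possible observation sequences of a fixed length add up to 1.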

Substituting this into the $Q$ function and expanding gives Equation 1:

$$\begin{aligned}
Q(\lambda, \lambda^{(g)}) &= \sum_I P(O,I | \lambda^{(g)}) \log\Big[\pi_{i_1} \prod_{t=1}^T b_{i_t}(o_t) \prod_{t=1}^{T-1} a_{i_t i_{t+1}}\Big] \\
&= \sum_I P(O,I | \lambda^{(g)}) \log\pi_{i_1} + \sum_I P(O,I | \lambda^{(g)}) \sum_{t=1}^T \log b_{i_t}(o_t) + \sum_I P(O,I | \lambda^{(g)}) \sum_{t=1}^{T-1} \log a_{i_t i_{t+1}}
\end{aligned}$$

M-step

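Equation 1 expands into three terms. They contain, respectively, the initial state probabilities $\pi_{i_1}$, the elements $b_{i_t}(o_t)$ of the observation probability matrix, and the elements $a_{i_t i_{t+1}}$ of the state transition probability matrix, so they can be used to estimate $\pi$, $B_{N \times M}$, and $A_{N \times N}$ separately. We now maximize each term in turn to obtain the parameters for the next iteration.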

  • $\pi_{i_1}$

$$\begin{aligned}
\sum_I P(O,I | \lambda^{(g)}) \log\pi_{i_1}
&= \sum_{i_1} \cdots \sum_{i_T} \big[P(O,I | \lambda^{(g)}) \log\pi_{i_1}\big] \\
&= \sum_{i_1} \log\pi_{i_1} \Big[\sum_{i_2} \cdots \sum_{i_T} P(O,i_1,i_2,...,i_T | \lambda^{(g)})\Big] \\
&= \sum_{i_1} \log\pi_{i_1} \, P(O,i_1 | \lambda^{(g)}) \\
&= \sum_{i=1}^N \log\pi_i \, P(O,i_1 = q_i | \lambda^{(g)})
\end{aligned}$$

Because the initial state probabilities must satisfy $\sum_{i=1}^N \pi_i = 1$, we form the Lagrangian:

$$L(\pi, \gamma) = \sum_{i=1}^N \log\pi_i \, P(O,i_1 = q_i | \lambda^{(g)}) - \gamma \Big(\sum_{i=1}^N \pi_i - 1\Big)$$

Take the partial derivatives with respect to $\pi_i$ and $\gamma$ and set them to zero:

$$\frac{\partial L}{\partial \pi_i} = \frac{P(O,i_1 = q_i | \lambda^{(g)})}{\pi_i} - \gamma = 0$$

$$\frac{\partial L}{\partial \gamma} = -\Big(\sum_{i=1}^N \pi_i - 1\Big) = 0$$

The first equation gives $\pi_i = P(O,i_1 = q_i | \lambda^{(g)}) / \gamma$. Summing over $i$ and applying the constraint gives $\gamma = \sum_{i=1}^N P(O,i_1 = q_i | \lambda^{(g)}) = P(O | \lambda^{(g)})$. Solving the two together:

$$\pi_i^{(g+1)} = \frac{P(O,i_1 = q_i | \lambda^{(g)})}{\sum_{i=1}^N P(O,i_1 = q_i | \lambda^{(g)})} = \frac{P(O,i_1 = q_i | \lambda^{(g)})}{P(O | \lambda^{(g)})}$$

  • $b_{i_t}(o_t)$

$$\begin{aligned}
\sum_I P(O,I | \lambda^{(g)}) \sum_{t=1}^T \log b_{i_t}(o_t)
&= \sum_I \big[P(O,I | \lambda^{(g)}) \log b_{i_1}(o_1) + \cdots + P(O,I | \lambda^{(g)}) \log b_{i_T}(o_T)\big] \\
&= \sum_I P(O,I | \lambda^{(g)}) \log b_{i_1}(o_1) + \cdots + \sum_I P(O,I | \lambda^{(g)}) \log b_{i_T}(o_T) \\
&= \sum_{i=1}^N P(O,i_1 = q_i | \lambda^{(g)}) \log b_i(o_1) + \cdots + \sum_{i=1}^N P(O,i_T = q_i | \lambda^{(g)}) \log b_i(o_T) \\
&= \sum_{i=1}^N \sum_{t=1}^T P(O,i_t = q_i | \lambda^{(g)}) \log b_i(o_t)
\end{aligned}$$

Because every row of the observation probability matrix sums to $1$, there are $N$ constraints: $\sum_{k=1}^M b_i(v_k) = 1,\ i \in \{1,2,...,N\}$. We form the Lagrangian:

$$L(B, \gamma) = \sum_{i=1}^N \sum_{t=1}^T P(O,i_t = q_i | \lambda^{(g)}) \log b_i(o_t) - \sum_{i=1}^N \gamma_i \Big(\sum_{k=1}^M b_i(v_k) - 1\Big)$$

Take the partial derivatives with respect to $b_i(v_k)$ and $\gamma_i$ and set them to zero.

[Note]: The derivative of $\log b_i(o_t)$ with respect to $b_i(v_k)$ is nonzero only when $o_t = v_k$. This is written with the indicator $I(o_t = v_k)$, which equals 1 when $o_t = v_k$ and 0 otherwise (not to be confused with the state sequence $I$). Only the multiplier $\gamma_i$ of row $i$ survives the differentiation.

$$\frac{\partial L}{\partial b_i(v_k)} = \frac{\sum_{t=1}^T P(O,i_t = q_i | \lambda^{(g)}) \, I(o_t = v_k)}{b_i(v_k)} - \gamma_i = 0$$

$$\frac{\partial L}{\partial \gamma_i} = -\Big(\sum_{k=1}^M b_i(v_k) - 1\Big) = 0$$

Solving the two together, and using $\sum_{k=1}^M I(o_t = v_k) = 1$ to simplify the denominator:

$$b_i(v_k)^{(g+1)} = \frac{\sum_{t=1}^{T} P(O,i_t = q_i | \lambda^{(g)}) \, I(o_t = v_k)}{\sum_{k=1}^M \sum_{t=1}^{T} P(O,i_t = q_i | \lambda^{(g)}) \, I(o_t = v_k)} = \frac{\sum_{t=1}^{T} P(O,i_t = q_i | \lambda^{(g)}) \, I(o_t = v_k)}{\sum_{t=1}^{T} P(O,i_t = q_i | \lambda^{(g)})}$$

  • $a_{i_t i_{t+1}}$

$$\begin{aligned}
\sum_I P(O,I | \lambda^{(g)}) \sum_{t=1}^{T-1} \log a_{i_t i_{t+1}}
&= \sum_I \big[P(O,I | \lambda^{(g)}) \log a_{i_1 i_2} + \cdots + P(O,I | \lambda^{(g)}) \log a_{i_{T-1} i_T}\big] \\
&= \sum_I P(O,I | \lambda^{(g)}) \log a_{i_1 i_2} + \cdots + \sum_I P(O,I | \lambda^{(g)}) \log a_{i_{T-1} i_T} \\
&= \sum_{i=1}^N \sum_{j=1}^N P(O,i_1 = q_i,i_2 = q_j | \lambda^{(g)}) \log a_{ij} + \cdots + \sum_{i=1}^N \sum_{j=1}^N P(O,i_{T-1} = q_i,i_T = q_j | \lambda^{(g)}) \log a_{ij} \\
&= \sum_{i=1}^N \sum_{j=1}^N \sum_{t=1}^{T-1} P(O,i_t = q_i,i_{t+1} = q_j | \lambda^{(g)}) \log a_{ij}
\end{aligned}$$

Because every row of the state transition probability matrix sums to $1$, there are $N$ constraints: $\sum_{j=1}^N a_{ij} = 1,\ i \in \{1,2,...,N\}$. We form the Lagrangian:

$$L(A, \gamma) = \sum_{i=1}^N \sum_{j=1}^N \sum_{t=1}^{T-1} P(O,i_t = q_i,i_{t+1} = q_j | \lambda^{(g)}) \log a_{ij} - \sum_{i=1}^N \gamma_i \Big(\sum_{j=1}^N a_{ij} - 1\Big)$$

Take the partial derivatives with respect to $a_{ij}$ and $\gamma_i$ and set them to zero. Only the multiplier $\gamma_i$ of row $i$ survives the differentiation:

$$\frac{\partial L}{\partial a_{ij}} = \frac{\sum_{t=1}^{T-1} P(O,i_t = q_i,i_{t+1} = q_j | \lambda^{(g)})}{a_{ij}} - \gamma_i = 0$$

$$\frac{\partial L}{\partial \gamma_i} = -\Big(\sum_{j=1}^N a_{ij} - 1\Big) = 0$$

Solving the two together, and using $\sum_{j=1}^N P(O,i_t = q_i,i_{t+1} = q_j | \lambda^{(g)}) = P(O,i_t = q_i | \lambda^{(g)})$ to simplify the denominator:

$$a_{ij}^{(g+1)} = \frac{\sum_{t=1}^{T-1} P(O,i_t = q_i,i_{t+1} = q_j | \lambda^{(g)})}{\sum_{j=1}^N \sum_{t=1}^{T-1} P(O,i_t = q_i,i_{t+1} = q_j | \lambda^{(g)})} = \frac{\sum_{t=1}^{T-1} P(O,i_t = q_i,i_{t+1} = q_j | \lambda^{(g)})}{\sum_{t=1}^{T-1} P(O,i_t = q_i | \lambda^{(g)})}$$
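
The three update formulas can be combined into a single EM iteration. The sketch below is one possible implementation, not the post's own code. It relies on an assumption the section does not spell out: the joint probabilities are computed with the standard forward and backward variables, using $P(O, i_t = q_i | \lambda) = \alpha_t(i)\beta_t(i)$ and $P(O, i_t = q_i, i_{t+1} = q_j | \lambda) = \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$. The names `forward_backward` and `baum_welch_step` are hypothetical, symbols and states are integer-coded from 0, and no scaling is applied, so long sequences would underflow.

```python
import numpy as np

def forward_backward(obs, A, B, pi):
    """Unscaled forward (alpha) and backward (beta) variables."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return alpha, beta

def baum_welch_step(obs, A, B, pi):
    """One EM update; returns the new (A, B, pi) and P(O | old parameters)."""
    obs = np.asarray(obs)
    A, B, pi = (np.asarray(x, dtype=float) for x in (A, B, pi))
    alpha, beta = forward_backward(obs, A, B, pi)
    likelihood = alpha[-1].sum()                      # P(O | lambda^(g))
    # joint1[t, i] = P(O, i_t = q_i | lambda^(g))
    joint1 = alpha * beta
    # joint2[t, i, j] = P(O, i_t = q_i, i_{t+1} = q_j | lambda^(g))
    joint2 = (alpha[:-1, :, None] * A[None, :, :]
              * (B[:, obs[1:]].T * beta[1:])[:, None, :])
    new_pi = joint1[0] / likelihood
    new_A = joint2.sum(axis=0) / joint1[:-1].sum(axis=0)[:, None]
    new_B = np.zeros(B.shape)
    for k in range(B.shape[1]):
        # The indicator I(o_t = v_k) becomes a boolean mask over time steps.
        new_B[:, k] = joint1[obs == k].sum(axis=0)
    new_B /= joint1.sum(axis=0)[:, None]
    return new_A, new_B, new_pi, likelihood
```

Calling `baum_welch_step` repeatedly, feeding each result back in, should produce a likelihood that never decreases from one iteration to the next, which is the EM guarantee mentioned earlier.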
