HMM (Unsupervised Learning for Hidden Markov Models)


Introduction

The main topic of this article is the Baum-Welch algorithm. We will estimate $\lambda=(A,B,\pi)$ from observation sequences $\{O_1,O_2,\cdots,O_S\}$, where each $O_i$ has length $T$. The objective of our probability model is

$$P(O|\lambda)=\sum_{I}P(I|\lambda)P(O|I,\lambda)$$
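For a small model, this sum over all hidden state paths can be evaluated directly, which makes the definition concrete. A minimal sketch in Python, assuming a hypothetical two-state, two-symbol HMM (none of the numbers come from the derivation below):

```python
# Brute-force evaluation of P(O|lambda) = sum_I P(I|lambda) P(O|I, lambda).
# The parameters below are illustrative assumptions, not fixed by the article.
import itertools
import numpy as np

A = np.array([[0.7, 0.3],   # transition probabilities a_ij
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],   # emission probabilities b_j(k)
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])   # initial state distribution pi_i

O = [0, 1, 0]               # observed symbol indices o_1..o_T

total = 0.0
for I in itertools.product(range(2), repeat=len(O)):  # every state path
    p = pi[I[0]] * B[I[0], O[0]]
    for t in range(1, len(O)):
        p *= A[I[t - 1], I[t]] * B[I[t], O[t]]
    total += p                                        # P(I|lambda) P(O|I,lambda)

print(total)
```

The same quantity is computed in $O(N^2T)$ time by the forward algorithm; enumeration is only feasible here because $N^T$ is tiny.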

The parameters in the formula above can be estimated with the EM algorithm.

  • E step:

    $$Q(\lambda,\overline{\lambda})=\sum_{I}P(O,I|\overline{\lambda})\log P(O,I|\lambda)$$

    $$P(O,I|\lambda)=\pi_{i_1}b_{i_1}(o_1)a_{i_1,i_2}b_{i_2}(o_2)\cdots a_{i_{T-1},i_T}b_{i_T}(o_T)$$

    The function $Q(\lambda,\overline{\lambda})$ then becomes:

    $$Q(\lambda,\overline{\lambda})=\sum_{I}\log\pi_{i_1}P(O,I|\overline{\lambda})+\sum_{I}\left(\sum_{t=1}^{T-1}\log a_{i_t,i_{t+1}}\right)P(O,I|\overline{\lambda})+\sum_{I}\left(\sum_{t=1}^{T}\log b_{i_t}(o_t)\right)P(O,I|\overline{\lambda})$$
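Because $P(O,I|\lambda)$ factorizes this way, its logarithm splits into exactly the three groups of terms in the last line. A small sketch evaluating the complete-data log-likelihood for one concrete state path (model and path are illustrative assumptions):

```python
# log P(O, I|lambda) = log pi_{i1} + sum_t log a_{i_t,i_{t+1}} + sum_t log b_{i_t}(o_t)
# for one concrete hidden path. Parameters are illustrative, not from the text.
import numpy as np

A = np.array([[0.7, 0.3], [0.4, 0.6]])   # a_ij
B = np.array([[0.9, 0.1], [0.2, 0.8]])   # b_j(k)
pi = np.array([0.5, 0.5])                # pi_i

O = [0, 1, 0]   # observations o_1..o_T
I = [0, 1, 1]   # one hidden path i_1..i_T

log_p = np.log(pi[I[0]]) + np.log(B[I[0], O[0]])   # log pi_{i1} + log b_{i1}(o_1)
for t in range(1, len(O)):
    log_p += np.log(A[I[t - 1], I[t]])             # log a_{i_t,i_{t+1}} terms
    log_p += np.log(B[I[t], O[t]])                 # log b_{i_t}(o_t) terms

print(np.exp(log_p))   # equals the product form of P(O, I|lambda)
```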

  • M step:

    1. $\pi_i$

      $$\sum_{I}\log\pi_{i_1}P(O,I|\overline{\lambda})=\sum_{i=1}^{N}\log\pi_i\,P(O,i_1=i|\overline{\lambda})$$

      There is a constraint $\sum_{i=1}^{N}\pi_i=1$, so a Lagrange multiplier gives the Lagrangian:

      $$f(\pi,\gamma)=\sum_{i=1}^{N}\log\pi_i\,P(O,i_1=i|\overline{\lambda})+\gamma\left(\sum_{i=1}^{N}\pi_i-1\right)$$

      Setting the partial derivative of $f(\pi,\gamma)$ with respect to $\pi_i$ to zero,

      $$\frac{\partial}{\partial\pi_i}\left[\sum_{i=1}^{N}\log\pi_i\,P(O,i_1=i|\overline{\lambda})+\gamma\left(\sum_{i=1}^{N}\pi_i-1\right)\right]=0,$$

      simplifies (after multiplying through by $\pi_i$) to:

      $$P(O,i_1=i|\overline{\lambda})+\gamma\pi_i=0$$

      Summing over $i$ in $\sum_{i=1}^{N}\left[P(O,i_1=i|\overline{\lambda})+\gamma\pi_i\right]=0$, we get $\gamma=-P(O|\overline{\lambda})$.

      Substituting $\gamma$ back, we obtain:

      $$\pi_i=\frac{P(O,i_1=i|\overline{\lambda})}{P(O|\overline{\lambda})}$$

    2. $a_{ij}$

      $$\sum_{I}\left(\sum_{t=1}^{T-1}\log a_{i_t,i_{t+1}}\right)P(O,I|\overline{\lambda})=\sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{t=1}^{T-1}\log a_{ij}\,P(O,i_t=i,i_{t+1}=j|\overline{\lambda})$$

      The constraints here are $\sum_{j=1}^{N}a_{ij}=1$ for each $i=1,\cdots,N$, so the objective function becomes:

      $$f(A,\gamma)=\sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{t=1}^{T-1}\log a_{ij}\,P(O,i_t=i,i_{t+1}=j|\overline{\lambda})+\sum_{i=1}^{N}\gamma_i\left(\sum_{j=1}^{N}a_{ij}-1\right)$$

      $$\frac{\partial f(A,\gamma)}{\partial a_{ij}}=\sum_{t=1}^{T-1}P(O,i_t=i,i_{t+1}=j|\overline{\lambda})+a_{ij}\gamma_i=0$$

      Summing this over $j$,

      $$\sum_{j=1}^{N}\left(\sum_{t=1}^{T-1}P(O,i_t=i,i_{t+1}=j|\overline{\lambda})+a_{ij}\gamma_i\right)=0\;\Rightarrow\;\gamma_i=-\sum_{t=1}^{T-1}P(O,i_t=i|\overline{\lambda})$$

      Substituting $\gamma_i$ back, we get $a_{ij}$:

      $$a_{ij}=\frac{\sum_{t=1}^{T-1}P(O,i_t=i,i_{t+1}=j|\overline{\lambda})}{\sum_{t=1}^{T-1}P(O,i_t=i|\overline{\lambda})}$$

    3. $b_j(k)$

      $$\sum_{I}\left(\sum_{t=1}^{T}\log b_{i_t}(o_t)\right)P(O,I|\overline{\lambda})=\sum_{j=1}^{N}\sum_{t=1}^{T}\log b_j(o_t)\,P(O,i_t=j|\overline{\lambda})$$
      The constraints here are $\sum_{k=1}^{M}b_j(k)=1$ for each $j=1,\cdots,N$, so the objective function becomes:

      $$f(B,\gamma)=\sum_{j=1}^{N}\sum_{t=1}^{T}\log b_j(o_t)\,P(O,i_t=j|\overline{\lambda})+\sum_{j=1}^{N}\gamma_j\left(\sum_{k=1}^{M}b_j(k)-1\right)$$
      Note that $b_j(o_t)$ depends on $b_j(k)$ only when $o_t=v_k$, which the indicator $I(o_t=v_k)$ captures:

      $$\frac{\partial f(B,\gamma)}{\partial b_j(k)}=\sum_{t=1}^{T}P(O,i_t=j|\overline{\lambda})\,I(o_t=v_k)+b_j(k)\gamma_j=0$$

      Summing over $k$,

      $$\sum_{k=1}^{M}\left[\sum_{t=1}^{T}P(O,i_t=j|\overline{\lambda})\,I(o_t=v_k)+b_j(k)\gamma_j\right]=0$$

      For each $t$ exactly one $k$ satisfies $I(o_t=v_k)=1$, which means

      $$\sum_{t=1}^{T}P(O,i_t=j|\overline{\lambda})+\gamma_j=0\;\Rightarrow\;\gamma_j=-\sum_{t=1}^{T}P(O,i_t=j|\overline{\lambda})$$

      Substituting this back into the original derivative gives the final result:

      $$b_j(k)=\frac{\sum_{t=1}^{T}P(O,i_t=j|\overline{\lambda})\,I(o_t=v_k)}{\sum_{t=1}^{T}P(O,i_t=j|\overline{\lambda})}$$

Conclusions

  With the posteriors $\gamma_t(i)=P(i_t=i\mid O,\overline{\lambda})$ and $\xi_t(i,j)=P(i_t=i,i_{t+1}=j\mid O,\overline{\lambda})$, the three updates can be written as:

  1. $a_{ij}=\dfrac{\sum_{t=1}^{T-1}\xi_t(i,j)}{\sum_{t=1}^{T-1}\gamma_t(i)}$

  2. $b_j(k)=\dfrac{\sum_{t=1,\,o_t=v_k}^{T}\gamma_t(j)}{\sum_{t=1}^{T}\gamma_t(j)}$

  3. $\pi_i=\gamma_1(i)$
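These posteriors are exactly what the forward-backward algorithm produces: $\gamma_t(i)=\alpha_t(i)\beta_t(i)/P(O|\overline{\lambda})$ and $\xi_t(i,j)=\alpha_t(i)\,a_{ij}\,b_j(o_{t+1})\,\beta_{t+1}(j)/P(O|\overline{\lambda})$. A minimal sketch computing them for a toy model (all parameters are illustrative assumptions; no log-space scaling, which real implementations need for long sequences):

```python
# gamma_t(i) and xi_t(i,j) via the forward-backward algorithm for a toy HMM.
# Parameters are illustrative; fine without scaling for a short sequence.
import numpy as np

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
O = [0, 1, 0]
N, T = A.shape[0], len(O)

# forward pass: alpha[t, i] = P(o_1..o_t, i_t = i | lambda)
alpha = np.zeros((T, N))
alpha[0] = pi * B[:, O[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]

# backward pass: beta[t, i] = P(o_{t+1}..o_T | i_t = i, lambda)
beta = np.ones((T, N))
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])

PO = alpha[-1].sum()           # P(O | lambda)
gamma = alpha * beta / PO      # gamma[t, i] = P(i_t = i | O, lambda)
xi = np.array([alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])
               for t in range(T - 1)]) / PO   # xi[t, i, j]
```

Each row of `gamma` and each $N\times N$ slice of `xi` sums to one, which is a quick sanity check.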

Baum-Welch Algorithm

  • Input: an observed sequence $O=(o_1,o_2,\cdots,o_T)$ of length $T$.

  • Output: the model parameters $\lambda=(A,B,\pi)$.

Algorithm:

  1. Initialization: for $n=0$, randomly choose $a_{ij}^{(0)},b_j(k)^{(0)},\pi_i^{(0)}$ to form the initial model $\lambda^{(0)}=(A^{(0)},B^{(0)},\pi^{(0)})$.

  2. Recursion: for $n=0,1,2,\cdots$, compute $\gamma_t$ and $\xi_t$ under the current model $\lambda^{(n)}$ and update

    $$a_{ij}^{(n+1)}=\frac{\sum_{t=1}^{T-1}\xi_t(i,j)}{\sum_{t=1}^{T-1}\gamma_t(i)}$$

    $$b_j(k)^{(n+1)}=\frac{\sum_{t=1,\,o_t=v_k}^{T}\gamma_t(j)}{\sum_{t=1}^{T}\gamma_t(j)}$$

    $$\pi_i^{(n+1)}=\gamma_1(i)$$

  3. Stop: once a stopping criterion is met (for example, the increase in $P(O|\lambda)$ falls below a threshold), we stop and take the last result $\lambda^{(n+1)}=(A^{(n+1)},B^{(n+1)},\pi^{(n+1)})$ as our HMM parameters.
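The three steps above can be sketched end-to-end for a small discrete HMM. This is a minimal illustration under assumed toy parameters, with no log-space scaling and a single short sequence, not a production implementation:

```python
# One Baum-Welch re-estimation pass (steps 1-3 above) for a discrete HMM.
# Toy parameters and data are illustrative; no scaling, single sequence.
import numpy as np

def baum_welch_step(A, B, pi, O):
    """Return updated (A, B, pi) plus P(O|lambda) under the *input* model."""
    N, T = A.shape[0], len(O)
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):                       # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    for t in range(T - 2, -1, -1):              # backward pass
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    PO = alpha[-1].sum()
    gamma = alpha * beta / PO
    xi = np.array([alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])
                   for t in range(T - 1)]) / PO
    # M-step updates from the conclusion formulas
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    obs = np.array(O)
    for k in range(B.shape[1]):
        B_new[:, k] = gamma[obs == k].sum(axis=0) / gamma.sum(axis=0)
    pi_new = gamma[0]
    return A_new, B_new, pi_new, PO

A = np.array([[0.7, 0.3], [0.4, 0.6]])          # lambda^{(0)}: step 1 would
B = np.array([[0.9, 0.1], [0.2, 0.8]])          # normally randomize these
pi = np.array([0.5, 0.5])
O = [0, 1, 0, 0, 1]

likelihoods = []
for n in range(3):                              # step 2: recursion
    A, B, pi, PO = baum_welch_step(A, B, pi, O)
    likelihoods.append(PO)                      # non-decreasing, by EM theory
```

A practical stopping criterion for step 3 is to quit once consecutive entries of `likelihoods` differ by less than a small tolerance.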
