Hidden Markov Models (HMM), Advanced — Evaluation: the Forward & Backward Algorithms

Preface

If you are not yet familiar with hidden Markov models, see my previous post first: 隐马尔可夫模型(HMM)初级篇 (Hidden Markov Models, Introduction).

Background

[Figure: graphical model of an HMM — hidden state chain $i_1 \to i_2 \to \cdots \to i_T$ with an observation $o_t$ emitted from each $i_t$]

Definitions:

  • State sequence $I$, taking values in the state set $Q = \{q_1, q_2, \ldots, q_N\}$
  • Observation sequence $O$, taking values in the observation set $V = \{v_1, v_2, \ldots, v_M\}$
  • Initial state probability vector $\pi$
  • State transition probability matrix $A$, with $a_{ij} = P(i_{t+1} = q_j \mid i_t = q_i)$
  • Observation (emission) probability matrix $B$, with $b_j(k) = P(o_t = v_k \mid i_t = q_j)$

$\lambda = (\pi, A, B)$ are the three elements of an HMM.
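As a concrete sketch (the numbers below are made up for illustration and are not part of the post), a toy $\lambda = (\pi, A, B)$ with $N = 2$ states and $M = 2$ observation values can be stored as row-stochastic NumPy arrays:

```python
import numpy as np

# lambda = (pi, A, B) for a toy HMM with N = 2 states, M = 2 observation values
pi = np.array([0.6, 0.4])        # initial state distribution pi
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])       # a_ij = P(i_{t+1} = q_j | i_t = q_i)
B = np.array([[0.5, 0.5],
              [0.1, 0.9]])       # b_j(k) = P(o_t = v_k | i_t = q_j)

# each row of A and B is a probability distribution, so rows must sum to 1
assert np.allclose(A.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)
```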

Two assumptions:

  1. Homogeneous Markov assumption
     $$P(i_t \mid i_{t-1}, o_{t-1}, \ldots, i_1, o_1) = P(i_t \mid i_{t-1}), \qquad t = 1, 2, \ldots, T$$
  2. Observation independence assumption
     $$P(o_t \mid i_T, o_T, i_{T-1}, o_{T-1}, \ldots, i_{t+1}, o_{t+1}, i_t, i_{t-1}, o_{t-1}, \ldots, i_1, o_1) = P(o_t \mid i_t)$$

The Three Problems of an HMM

  1. Evaluation
     Compute $P(O \mid \lambda)$, solved by the forward and backward algorithms.
  2. Learning
     How to estimate $\lambda$? Via the EM algorithm ($\lambda = \arg\max_\lambda P(O \mid \lambda)$).
  3. Decoding
     Find the most likely state sequence, i.e. $I = \arg\max_I P(I \mid O)$.
     Two related problems follow from this: prediction, $P(i_{t+1} \mid o_1, o_2, \ldots, o_t)$, i.e. predicting the next hidden state; and filtering, $P(i_t \mid o_1, o_2, \ldots, o_t)$.

Evaluation

Problem statement

Given $\lambda$, compute $P(O \mid \lambda)$.

$$P(O \mid \lambda) = \sum_I P(O, I \mid \lambda) = \sum_I P(O \mid I, \lambda) \cdot P(I \mid \lambda) \qquad ①$$

$$P(I \mid \lambda) = P(i_1, i_2, \ldots, i_T \mid \lambda) = P(i_T \mid i_1, i_2, \ldots, i_{T-1}, \lambda) \cdot P(i_1, i_2, \ldots, i_{T-1} \mid \lambda) \qquad ②$$

By the homogeneous Markov assumption and the state transition matrix,

$$P(i_T \mid i_1, i_2, \ldots, i_{T-1}, \lambda) = P(i_T \mid i_{T-1}) = a_{i_{T-1}, i_T} \qquad ③$$

Similarly, applying the chain rule of conditional probability,

$$\begin{aligned} P(i_1, i_2, \ldots, i_{T-1} \mid \lambda) &= P(i_{T-1} \mid i_1, \ldots, i_{T-2}, \lambda) \cdot P(i_{T-2} \mid i_1, \ldots, i_{T-3}, \lambda) \cdots P(i_2 \mid i_1, \lambda) \cdot P(i_1 \mid \lambda) \\ &= a_{i_{T-2}, i_{T-1}} \cdot a_{i_{T-3}, i_{T-2}} \cdots a_{i_1, i_2} \cdot \pi(i_1) \qquad ④ \end{aligned}$$

where $P(i_1 \mid \lambda)$ is simply $\pi(i_1)$.

Therefore,

$$\begin{aligned} P(I \mid \lambda) &= P(i_1, i_2, \ldots, i_T \mid \lambda) \\ &= P(i_T \mid i_1, i_2, \ldots, i_{T-1}, \lambda) \cdot P(i_1, i_2, \ldots, i_{T-1} \mid \lambda) \\ &= a_{i_{T-1}, i_T} \cdot a_{i_{T-2}, i_{T-1}} \cdots a_{i_1, i_2} \cdot \pi(i_1) \\ &= \pi(i_1) \cdot \prod_{t=2}^{T} a_{i_{t-1}, i_t} \qquad ⑤ \end{aligned}$$

From the HMM diagram above,

$$P(O \mid I, \lambda) = \prod_{t=1}^{T} b_{i_t}(o_t) \qquad ⑥$$
Substituting ⑤ and ⑥ into ①:

$$\begin{aligned} P(O \mid \lambda) &= \sum_I \left[ \pi(i_1) \cdot \prod_{t=2}^{T} a_{i_{t-1}, i_t} \cdot \prod_{t=1}^{T} b_{i_t}(o_t) \right] \\ &= \sum_{i_1} \sum_{i_2} \cdots \sum_{i_T} \left[ \pi(i_1) \cdot \prod_{t=2}^{T} a_{i_{t-1}, i_t} \cdot \prod_{t=1}^{T} b_{i_t}(o_t) \right] \qquad ⑦ \end{aligned}$$

Each $i_t$ can take $N$ possible values, so evaluating ⑦ costs $O(N^T)$, which grows exponentially with the sequence length $T$.

Clearly this brute-force computation is intractable, so we need a more efficient method.
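To make the cost concrete, the naive sum ⑦ can be sketched as a literal enumeration over all $N^T$ state paths (the function and variable names here are my own, assuming states and observations are encoded as integer indices):

```python
import itertools
import numpy as np

def evaluate_brute_force(pi, A, B, obs):
    """Naive evaluation of equation 7: sum P(O, I | lambda) over all N**T paths."""
    N = len(pi)
    T = len(obs)
    total = 0.0
    for path in itertools.product(range(N), repeat=T):  # all N**T state sequences
        p = pi[path[0]] * B[path[0], obs[0]]            # pi(i_1) * b_{i_1}(o_1)
        for t in range(1, T):
            p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]  # a_{i_{t-1},i_t} * b_{i_t}(o_t)
        total += p
    return total
```

A sanity check on this sketch: summing the result over every possible observation sequence of a fixed length must give 1, since the joint distribution is normalized.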

Forward Algorithm

Define $\alpha_t(i) = P(o_1, o_2, \ldots, o_t, i_t = q_i \mid \lambda)$.

In particular, $\alpha_T(i) = P(O, i_T = q_i \mid \lambda)$, where $O$ stands for $o_1, o_2, \ldots, o_T$.

Hence we have

$$P(O \mid \lambda) = \sum_{i=1}^{N} P(O, i_T = q_i \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$$

$$\begin{aligned} \alpha_{t+1}(j) &= P(o_1, o_2, \ldots, o_{t+1}, i_{t+1} = q_j \mid \lambda) \\ &= \sum_{i=1}^{N} P(o_1, \ldots, o_t, o_{t+1}, i_t = q_i, i_{t+1} = q_j \mid \lambda) \\ &= \sum_{i=1}^{N} P(o_{t+1} \mid o_1, \ldots, o_t, i_t = q_i, i_{t+1} = q_j, \lambda) \cdot P(o_1, \ldots, o_t, i_t = q_i, i_{t+1} = q_j \mid \lambda) \\ &\overset{\text{observation independence}}{=} \sum_{i=1}^{N} P(o_{t+1} \mid i_{t+1} = q_j) \cdot P(o_1, \ldots, o_t, i_t = q_i, i_{t+1} = q_j \mid \lambda) \\ &= \sum_{i=1}^{N} P(o_{t+1} \mid i_{t+1} = q_j) \cdot P(i_{t+1} = q_j \mid o_1, \ldots, o_t, i_t = q_i, \lambda) \cdot P(o_1, \ldots, o_t, i_t = q_i \mid \lambda) \\ &\overset{\text{homogeneous Markov}}{=} \sum_{i=1}^{N} P(o_{t+1} \mid i_{t+1} = q_j) \cdot P(i_{t+1} = q_j \mid i_t = q_i, \lambda) \cdot P(o_1, \ldots, o_t, i_t = q_i \mid \lambda) \\ &= \sum_{i=1}^{N} P(o_{t+1} \mid i_{t+1} = q_j) \cdot P(i_{t+1} = q_j \mid i_t = q_i, \lambda) \cdot \alpha_t(i) \\ &= \sum_{i=1}^{N} b_j(o_{t+1}) \cdot \alpha_t(i) \cdot a_{ij} \end{aligned}$$

This gives a recursion: starting from the base case $\alpha_1(i) = \pi_i \, b_i(o_1)$, we can compute $\alpha_T(i)$ step by step, and hence $P(O \mid \lambda)$. The complexity is $O(T \times N^2)$.
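The recursion translates directly into code. Below is a minimal sketch (my own naming; it assumes $\pi$, $A$, $B$ are NumPy arrays as defined earlier and observations are integer indices into the columns of $B$):

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward algorithm: returns P(O | lambda) in O(T * N^2) time."""
    T = len(obs)
    N = len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                  # base case: alpha_1(i) = pi_i * b_i(o_1)
    for t in range(1, T):
        # alpha_{t+1}(j) = b_j(o_{t+1}) * sum_i alpha_t(i) * a_ij
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha[-1].sum()                        # P(O | lambda) = sum_i alpha_T(i)
```

The matrix-vector product `alpha[t-1] @ A` computes all $N$ inner sums $\sum_i \alpha_t(i)\,a_{ij}$ at once, which is where the $N^2$ factor per time step comes from.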

Backward Algorithm

Define $\beta_t(k) = P(o_{t+1}, \ldots, o_T \mid i_t = q_k, \lambda)$, with the convention $\beta_T(k) = 1$.

In particular, $\beta_1(k) = P(o_2, \ldots, o_T \mid i_1 = q_k, \lambda)$.

$$\begin{aligned} P(O \mid \lambda) &= P(o_1, o_2, \ldots, o_T \mid \lambda) \\ &= \sum_{k=1}^{N} P(o_1, o_2, \ldots, o_T, i_1 = q_k) \qquad \text{($\lambda$ is given, so we omit it from here on)} \\ &= \sum_{k=1}^{N} P(o_1, o_2, \ldots, o_T \mid i_1 = q_k) \cdot P(i_1 = q_k) \\ &= \sum_{k=1}^{N} P(o_1, o_2, \ldots, o_T \mid i_1 = q_k) \cdot \pi_k \qquad \text{($P(i_1 = q_k)$ is just the initial state probability $\pi_k$)} \\ &= \sum_{k=1}^{N} P(o_1 \mid o_2, \ldots, o_T, i_1 = q_k) \cdot P(o_2, \ldots, o_T \mid i_1 = q_k) \cdot \pi_k \\ &\overset{\text{observation independence}}{=} \sum_{k=1}^{N} P(o_1 \mid i_1 = q_k) \cdot P(o_2, \ldots, o_T \mid i_1 = q_k) \cdot \pi_k \\ &= \sum_{k=1}^{N} P(o_1 \mid i_1 = q_k) \cdot \beta_1(k) \cdot \pi_k \\ &= \sum_{k=1}^{N} b_k(o_1) \cdot \beta_1(k) \cdot \pi_k \end{aligned}$$

$$\begin{aligned} \beta_t(k) &= P(o_{t+1}, \ldots, o_T \mid i_t = q_k) \qquad \text{($\lambda$ again omitted)} \\ &= \sum_{j=1}^{N} P(o_{t+1}, \ldots, o_T, i_{t+1} = q_j \mid i_t = q_k) \\ &= \sum_{j=1}^{N} P(o_{t+1}, \ldots, o_T \mid i_{t+1} = q_j, i_t = q_k) \cdot P(i_{t+1} = q_j \mid i_t = q_k) \\ &= \sum_{j=1}^{N} P(o_{t+1}, \ldots, o_T \mid i_{t+1} = q_j, i_t = q_k) \cdot a_{kj} \\ &= \sum_{j=1}^{N} P(o_{t+1}, \ldots, o_T \mid i_{t+1} = q_j) \cdot a_{kj} \qquad \text{(see note 2 below for this step)} \\ &= \sum_{j=1}^{N} P(o_{t+1} \mid o_{t+2}, \ldots, o_T, i_{t+1} = q_j) \cdot P(o_{t+2}, \ldots, o_T \mid i_{t+1} = q_j) \cdot a_{kj} \\ &= \sum_{j=1}^{N} P(o_{t+1} \mid o_{t+2}, \ldots, o_T, i_{t+1} = q_j) \cdot \beta_{t+1}(j) \cdot a_{kj} \\ &\overset{\text{observation independence}}{=} \sum_{j=1}^{N} P(o_{t+1} \mid i_{t+1} = q_j) \cdot \beta_{t+1}(j) \cdot a_{kj} \\ &= \sum_{j=1}^{N} b_j(o_{t+1}) \cdot a_{kj} \cdot \beta_{t+1}(j) \end{aligned}$$
Notes:

  1. With this recursion we can start from $\beta_T(k) = 1$ and compute $\beta_{T-1}(k), \ldots, \beta_1(k)$ in turn, which yields $P(O \mid \lambda)$. The complexity is again $O(T \times N^2)$.
  2. [Figure: Bayesian network $a \to b \to c$]
     In a Bayesian network, $a$ and $c$ are in general dependent, but once $b$ is given the path is blocked and $a$ and $c$ become conditionally independent: $P(c \mid a, b) = P(c \mid b)$. Applying this to $i_t, i_{t+1}, o_{t+1}$ gives $P(o_{t+1} \mid i_t, i_{t+1}) = P(o_{t+1} \mid i_{t+1})$.
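The backward recursion can be sketched the same way (again my own naming, same array conventions as before); by construction it must return the same $P(O \mid \lambda)$ as the forward pass:

```python
import numpy as np

def backward(pi, A, B, obs):
    """Backward algorithm: returns P(O | lambda) in O(T * N^2) time."""
    T = len(obs)
    N = len(pi)
    beta = np.ones(N)                             # base case: beta_T(k) = 1
    for t in range(T - 2, -1, -1):
        # beta_t(k) = sum_j b_j(o_{t+1}) * a_kj * beta_{t+1}(j)
        beta = A @ (B[:, obs[t + 1]] * beta)
    # P(O | lambda) = sum_k b_k(o_1) * beta_1(k) * pi_k
    return (pi * B[:, obs[0]] * beta).sum()
```

Note the direction of the matrix product: here each row $k$ of $A$ is dotted with the vector $b_j(o_{t+1})\,\beta_{t+1}(j)$, mirroring the sum over the *next* state $j$ rather than the previous one.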