Hidden Markov Model (HMM)

The hidden Markov model is a probabilistic model of sequences. It describes a process in which a hidden Markov chain randomly generates an unobservable sequence of states, and each state then emits an observation, producing the observed sequence. The HMM is a generative model.

1 The Three Components of an HMM

An HMM can be specified by the triple:
$$\lambda=(A,B,\pi)$$

  1. $A$ is the state transition probability matrix
  2. $B$ is the observation (emission) probability matrix
  3. $\pi$ is the initial state probability vector

Denote the observation variable at time $t$ by $o_t$, taking values in $V=\{v_1,\dots,v_M\}$.
Denote the state variable at time $t$ by $i_t$, taking values in $Q=\{q_1,\dots,q_N\}$.
$$A=[a_{ij}],\quad a_{ij}=p(i_{t+1}=q_j \mid i_t=q_i)$$
$$B=[b_j(k)],\quad b_j(k)=p(o_t=v_k \mid i_t=q_j)$$
$$\pi=(\pi_1,\dots,\pi_N),\quad \sum_{i=1}^{N}\pi_i=1,\quad \pi_i=p(i_1=q_i)$$
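To make the triple concrete, here is a minimal NumPy sketch; the numbers are the three-box, two-color (red/white) ball parameters of Example 10.2 in 《统计学习方法》, and the later code sketches reuse these arrays:

```python
import numpy as np

# Transition matrix A: a_ij = p(i_{t+1}=q_j | i_t=q_i); each row sums to 1.
A = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])

# Emission matrix B: b_j(k) = p(o_t=v_k | i_t=q_j); columns: v_1=red, v_2=white.
B = np.array([[0.5, 0.5],
              [0.4, 0.6],
              [0.7, 0.3]])

# Initial distribution pi: pi_i = p(i_1=q_i).
pi = np.array([0.2, 0.4, 0.4])
```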

2 Two Assumptions

  1. Homogeneous first-order Markov assumption: the state at any time depends only on the state at the previous time:
    $p(i_{t+1} \mid i_t,i_{t-1},\dots,i_1,o_t,o_{t-1},\dots,o_1)=p(i_{t+1} \mid i_t)$
  2. Observation independence assumption: the observation at the current time depends only on the state at the current time:
    $p(o_t \mid i_t,i_{t-1},\dots,i_1,o_{t-1},\dots,o_1)=p(o_t \mid i_t)$
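Chained together, the two assumptions yield the factorization of the joint probability of a state sequence $I=(i_1,\dots,i_T)$ and an observation sequence $O=(o_1,\dots,o_T)$ that the derivations in Section 3 rely on:

$$p(O,I \mid \lambda)=p(I \mid \lambda)\,p(O \mid I,\lambda)=\pi_{i_1}\prod_{t=2}^{T}a_{i_{t-1},i_t}\prod_{t=1}^{T}b_{i_t}(o_t)$$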

3 The Three Problems an HMM Must Solve

  1. Evaluation ⇒ given $\lambda$ and $O$, compute $p(O \mid \lambda)$ ⇒ forward / backward algorithms
  2. Learning ⇒ given the observation sequence $O$, estimate $\lambda_{MLE}=\arg\max\limits_{\lambda}p(O \mid \lambda)$ ⇒ EM
  3. Decoding ⇒ given $\lambda$ and the observation sequence $O$, find $\hat I=\arg\max\limits_{I}p(I \mid O,\lambda)$ ⇒ Viterbi

3.1 Evaluation

$$p(O \mid \lambda)=\sum_I p(I,O \mid \lambda)=\sum_I p(O \mid I,\lambda)\cdot p(I \mid \lambda)$$
$$p(I \mid \lambda)=p(i_1,\dots,i_T \mid \lambda)=p(i_T \mid i_1,\dots,i_{T-1},\lambda)\cdot p(i_1,\dots,i_{T-1} \mid \lambda)\\
=p(i_T \mid i_{T-1})\cdot p(i_1,\dots,i_{T-1} \mid \lambda)\\
=p(i_T \mid i_{T-1})\cdot p(i_{T-1} \mid i_{T-2})\cdot p(i_1,\dots,i_{T-2} \mid \lambda)\\
=\cdots\\
=\pi_{i_1}\cdot\prod_{t=2}^{T}a_{i_{t-1},i_t}$$
Similarly:
$$p(O \mid I,\lambda)=\prod_{t=1}^{T}b_{i_t}(o_t)$$
Combining the two:
$$p(O \mid \lambda)=\sum_I\pi_{i_1}\prod_{t=2}^{T}a_{i_{t-1},i_t}\prod_{t=1}^{T}b_{i_t}(o_t)=\sum_{i_1}\cdots\sum_{i_T}\pi_{i_1}\prod_{t=2}^{T}a_{i_{t-1},i_t}\prod_{t=1}^{T}b_{i_t}(o_t)$$
Evaluating this sum directly costs $O(TN^{T})$, which is prohibitively expensive; the forward algorithm below avoids it.
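For intuition only, here is a brute-force sketch that evaluates this sum by enumerating all $N^T$ state sequences (reusing the `A`, `B`, `pi` arrays defined in Section 1):

```python
from itertools import product

def evaluate_brute_force(A, B, pi, obs):
    """Compute p(O|lambda) by summing over every possible state sequence.
    obs is a list of observation indices; complexity O(T * N^T)."""
    N, T = len(pi), len(obs)
    total = 0.0
    for states in product(range(N), repeat=T):  # all N^T sequences I
        p = pi[states[0]] * B[states[0], obs[0]]
        for t in range(1, T):
            p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]
        total += p
    return total

print(evaluate_brute_force(A, B, pi, [0, 1, 0]))  # red, white, red -> ~0.13022
```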

3.1.1 Forward algorithm

Define:
$$\alpha_t(i)=p(o_1,\dots,o_t,i_t=q_i \mid \lambda)$$
so that at $t=T$:
$$\alpha_T(i)=p(O,i_T=q_i \mid \lambda)$$
$$p(O \mid \lambda)=\sum_{i=1}^{N}p(O,i_T=q_i \mid \lambda)=\sum_{i=1}^{N}\alpha_T(i)$$
Deriving the recurrence:
$$\alpha_{t+1}(j)=p(o_1,\dots,o_{t+1},i_{t+1}=q_j \mid \lambda)\\
=\sum_{i=1}^{N}p(o_1,\dots,o_{t+1},i_{t+1}=q_j,i_t=q_i \mid \lambda)\\
=\sum_{i=1}^{N}p(o_{t+1} \mid o_1,\dots,o_t,i_t=q_i,i_{t+1}=q_j,\lambda)\cdot p(o_1,\dots,o_t,i_t=q_i,i_{t+1}=q_j \mid \lambda)\\
=\sum_{i=1}^{N}p(o_{t+1} \mid i_{t+1}=q_j)\cdot p(o_1,\dots,o_t,i_t=q_i,i_{t+1}=q_j \mid \lambda)\\
=\sum_{i=1}^{N}p(o_{t+1} \mid i_{t+1}=q_j)\cdot p(i_{t+1}=q_j \mid o_1,\dots,o_t,i_t=q_i,\lambda)\cdot p(o_1,\dots,o_t,i_t=q_i \mid \lambda)\\
=\sum_{i=1}^{N}p(o_{t+1} \mid i_{t+1}=q_j)\cdot p(i_{t+1}=q_j \mid i_t=q_i)\cdot\alpha_t(i)\\
=\sum_{i=1}^{N}b_j(o_{t+1})\,a_{ij}\,\alpha_t(i)$$
The complexity of the algorithm is $O(N^{2}T)$.
For a worked example, see Example 10.2 of 《统计学习方法》.
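A minimal NumPy sketch of the recursion (observation indices 0 = red, 1 = white; reusing `A`, `B`, `pi` from Section 1):

```python
import numpy as np

def forward(A, B, pi, obs):
    """Forward algorithm, O(N^2 T). alpha[t, i] corresponds to
    alpha_{t+1}(q_{i+1}) in the 1-based notation above."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # alpha_1(i) = pi_i b_i(o_1)
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]  # the recurrence above
    return alpha

alpha = forward(A, B, pi, [0, 1, 0])   # observations: red, white, red
print(alpha[-1].sum())                 # p(O|lambda) ~= 0.13022
```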

3.1.2 Backward algorithm

To be added.

3.2 Learning

The solution is the Baum-Welch algorithm, which is an instance of the EM algorithm.
The EM update:
$$\theta^{t+1}=\arg\max\limits_{\theta}\int_z \log p(x,z \mid \theta)\cdot p(z \mid x,\theta^{t})\,dz$$
With $x$: observations ⇒ $O$, $z$: hidden variables ⇒ $I$, $\theta$: parameters ⇒ $\lambda$, the formula in HMM notation becomes:
$$\lambda^{t+1}=\arg\max\limits_{\lambda}\sum_I\log p(O,I \mid \lambda)\cdot p(I \mid O,\lambda^{t})=\arg\max\limits_{\lambda}\sum_I\log p(O,I \mid \lambda)\cdot p(O,I \mid \lambda^{t})$$
Note: $p(I \mid O,\lambda^{t})=\frac{p(O,I \mid \lambda^{t})}{p(O \mid \lambda^{t})}$; since $p(O \mid \lambda^{t})$ is a constant in the maximization over $\lambda$, it can be dropped.
Here $\lambda^{t}=(\pi^{t},A^{t},B^{t})$.
Define:
$$Q(\lambda,\lambda^{t})=\sum_I\log p(O,I \mid \lambda)\cdot p(O,I \mid \lambda^{t})=\sum_I\left[\left(\log\pi_{i_1}+\sum_{t=2}^{T}\log a_{i_{t-1},i_t}+\sum_{t=1}^{T}\log b_{i_t}(o_t)\right)\cdot p(O,I \mid \lambda^{t})\right]$$
Taking the solution of $\pi^{t+1}$ as an example:
$$\pi^{t+1}=\arg\max\limits_{\pi}Q(\lambda,\lambda^{t})\\
=\arg\max\limits_{\pi}\sum_I\left[\log\pi_{i_1}\cdot p(O,I \mid \lambda^{t})\right]\\
=\arg\max\limits_{\pi}\sum_{i_1}\cdots\sum_{i_T}\left[\log\pi_{i_1}\cdot p(O,i_1,\dots,i_T \mid \lambda^{t})\right]\\
=\arg\max\limits_{\pi}\sum_{i_1}\left[\log\pi_{i_1}\,p(O,i_1 \mid \lambda^{t})\right]\\
=\arg\max\limits_{\pi}\sum_{i=1}^{N}\left[\log\pi_i\,p(O,i_1=q_i \mid \lambda^{t})\right]\\
\text{s.t.}\quad\sum_{i=1}^{N}\pi_i=1$$
Since this is a constrained optimization, define the Lagrangian:
$$\zeta(\pi,\eta)=\sum_{i=1}^{N}\left[\log\pi_i\,p(O,i_1=q_i \mid \lambda^{t})\right]+\eta\left(\sum_{i=1}^{N}\pi_i-1\right)$$
$$\frac{\partial\zeta}{\partial\pi_i}=\frac{1}{\pi_i}\,p(O,i_1=q_i \mid \lambda^{t})+\eta=0\\
p(O,i_1=q_i \mid \lambda^{t})+\pi_i\eta=0\\
\sum_{i=1}^{N}\left[p(O,i_1=q_i \mid \lambda^{t})+\pi_i\eta\right]=0\\
p(O \mid \lambda^{t})+\eta=0\\
\eta=-p(O \mid \lambda^{t})$$
Substituting back:
$$\pi_i^{t+1}=\frac{p(O,i_1=q_i \mid \lambda^{t})}{p(O \mid \lambda^{t})}$$
Hence $\pi^{t+1}=(\pi_1^{t+1},\dots,\pi_N^{t+1})$.
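The numerator can be computed with the forward and backward variables, since $p(O,i_1=q_i \mid \lambda^{t})=\alpha_1(i)\,\beta_1(i)$ where $\beta_t(i)=p(o_{t+1},\dots,o_T \mid i_t=q_i,\lambda)$. Below is a minimal sketch of this one update; the backward pass is the standard recursion, included here only because Section 3.1.2 is still a stub, and the code reuses `forward` and the toy parameters from above:

```python
def backward(A, B, obs):
    """Backward algorithm: beta[t, i] corresponds to beta_{t+1}(q_{i+1})."""
    T, N = len(obs), A.shape[0]
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                     # beta_T(i) = 1 by convention
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def update_pi(A, B, pi, obs):
    """One Baum-Welch update of pi: pi_i <- p(O, i_1=q_i) / p(O)."""
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
    joint = alpha[0] * beta[0]        # p(O, i_1=q_i | lambda^t)
    return joint / joint.sum()        # joint.sum() = p(O | lambda^t)

print(update_pi(A, B, pi, [0, 1, 0]))
```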

3.3 Decoding

Using the Viterbi algorithm, define:
$$\delta_t(i)=\max\limits_{i_1,\dots,i_{t-1}}p(o_1,\dots,o_t,i_1,\dots,i_{t-1},i_t=q_i)$$
$$\delta_{t+1}(j)=\max\limits_{i_1,\dots,i_t}p(o_1,\dots,o_{t+1},i_1,\dots,i_t,i_{t+1}=q_j)=\max\limits_{1\le i\le N}\delta_t(i)\,a_{ij}\,b_j(o_{t+1})$$
$$\varphi_{t+1}(j)=\arg\max\limits_{1\le i\le N}\delta_t(i)\,a_{ij}$$
The Viterbi algorithm applies dynamic programming: it decomposes the large problem into a chain of small ones and solves them in order. When computing the optimal states at time $t+1$, it has already stored, for every state, the best path reaching that state at the previous time step ($\delta_t$), which avoids recomputing path prefixes; $\varphi$ records the backpointers needed to recover $\hat I$.
For a worked example, see Example 10.3 of 《统计学习方法》.
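A minimal sketch of the procedure (reusing the toy `A`, `B`, `pi` from Section 1; `psi` plays the role of $\varphi$):

```python
def viterbi(A, B, pi, obs):
    """Viterbi decoding: returns the most probable state sequence (0-based)."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))                 # delta[t, j] = best path probability
    psi = np.zeros((T, N), dtype=int)        # backpointers phi
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        cand = delta[t - 1][:, None] * A     # cand[i, j] = delta_t(i) * a_ij
        psi[t] = cand.argmax(axis=0)
        delta[t] = cand.max(axis=0) * B[:, obs[t]]
    # Backtrack from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

print(viterbi(A, B, pi, [0, 1, 0]))  # [2, 2, 2], i.e. states (3, 3, 3)
```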

4 HMM vs MEMM

(Figure: comparison of HMM and MEMM.)

5 References

https://www.bilibili.com/video/av32471608/?p=1
