Principles of the HMM Algorithm

Background of the HMM Algorithm

The hidden Markov model (HMM) is a statistical learning model for tagging problems. It describes the process by which a hidden Markov chain randomly generates an observation sequence, and it is a generative model. HMMs are widely used in speech recognition, bioinformatics, pattern recognition, natural language processing, and other fields.

Basic Concepts

Definition of the Model

A hidden Markov model is a probabilistic model of time series: a hidden Markov chain randomly generates a sequence of states, and each state in turn generates an observation.
The sequence generated by the hidden Markov chain is called the state sequence and is written $I=(i_{1},i_{2},\cdots,i_{T})$. Each state produces one observation, so the resulting observation sequence is written $O=(o_{1},o_{2},\cdots,o_{T})$.
We also write the set of all possible states as $Q=\{q_{1},q_{2},\cdots,q_{N}\}$ and the set of all possible observations as $V=\{v_{1},v_{2},\cdots,v_{M}\}$,
where $T$ is the length of the sequence, $N$ is the number of states, and $M$ is the number of observation symbols.

We also define the state-transition matrix

$$A=[a_{ij}]_{N\times N},\qquad a_{ij}=P(i_{t+1}=q_{j}\mid i_{t}=q_{i}),\quad i,j\in\{1,2,\cdots,N\}$$

where $a_{ij}$ is the probability of moving from state $q_{i}$ at time $t$ to state $q_{j}$ at time $t+1$.

Next we define the emission matrix

$$B=[b_{j}(k)]_{N\times M},\qquad b_{j}(k)=P(o_{t}=v_{k}\mid i_{t}=q_{j}),\quad k\in[1,M],\ j\in[1,N]$$

where $b_{j}(k)$ is the probability of emitting observation $v_{k}$ given that the state at time $t$ is $q_{j}$.

Finally, the initial state probability vector is

$$\pi=(\pi_{i}),\qquad \pi_{i}=P(i_{1}=q_{i}),\quad i\in\{1,2,\cdots,N\}$$

where $\pi_{i}$ is the probability that the state at time $t=1$ is $q_{i}$.

A hidden Markov model is therefore determined by the initial state probability vector $\pi$, the state-transition matrix $A$, and the emission matrix $B$, so the model $\lambda$ can be written as $\lambda=(\pi,A,B)$.
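
As a concrete illustration, here is a minimal NumPy sketch of how $\lambda=(\pi,A,B)$ can be stored; the variable names and toy numbers are assumptions made for this example, not values from the text.

```python
import numpy as np

# Toy model with N = 3 states and M = 2 observation symbols (made-up numbers).
pi = np.array([0.5, 0.3, 0.2])          # pi[i]   = P(i_1 = q_i)
A  = np.array([[0.6, 0.3, 0.1],         # A[i, j] = P(i_{t+1} = q_j | i_t = q_i)
               [0.2, 0.5, 0.3],
               [0.1, 0.4, 0.5]])
B  = np.array([[0.7, 0.3],              # B[j, k] = P(o_t = v_k | i_t = q_j)
               [0.4, 0.6],
               [0.1, 0.9]])

# pi and every row of A and B must be probability distributions.
assert np.isclose(pi.sum(), 1.0)
assert np.allclose(A.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)
```

The later sketches in this post reuse these toy arrays.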

The Two Assumptions of the Hidden Markov Model

The hidden Markov model makes two assumptions: the homogeneous Markov assumption and the observation independence assumption.

The homogeneous Markov assumption states that the state $i_{t}$ at time $t$ depends only on the state at the previous time step and is independent of the states at all other times and of the observations, i.e.

$$P(i_{t}\mid o_{1},o_{2},\cdots,o_{t-1},i_{1},i_{2},\cdots,i_{t-1})=P(i_{t}\mid i_{t-1})$$

The observation independence assumption states that the observation at any time depends only on the state of the Markov chain at that time, and is independent of the other observations and states, i.e.

$$P(o_{t}\mid i_{1},i_{2},\cdots,i_{t},o_{1},o_{2},\cdots,o_{t-1})=P(o_{t}\mid i_{t})$$

The Three Problems of the HMM

There are three main problems to solve for a hidden Markov model: the probability computation problem, the learning problem, and the decoding problem. We discuss and solve each in turn.

The Probability Computation Problem: Forward and Backward Algorithms

The probability computation problem asks: given the parameters $\lambda$, how do we compute $P(O\mid\lambda)$?
First,

$$P(O\mid\lambda)=\sum_{I}P(I,O\mid\lambda)=\sum_{I}P(O\mid I,\lambda)P(I\mid\lambda)$$

where $P(I\mid\lambda)$ can be decomposed as

$$P(I\mid\lambda)=P(i_{1},i_{2},\cdots,i_{T}\mid\lambda)=P(i_{T}\mid i_{1},i_{2},\cdots,i_{T-1},\lambda)\cdot P(i_{T-1},\cdots,i_{1}\mid\lambda)=P(i_{T}\mid i_{T-1})\cdot P(i_{T-1},\cdots,i_{1}\mid\lambda)=a_{i_{T-1}i_{T}}\cdot P(i_{T-1},\cdots,i_{1}\mid\lambda)$$

Here $P(i_{T}\mid i_{1},i_{2},\cdots,i_{T-1},\lambda)=P(i_{T}\mid i_{T-1})$ follows from the homogeneous Markov assumption.
Notice that $P(i_{T-1},\cdots,i_{1}\mid\lambda)$ can be split into further factors $a_{ij}$ by applying the chain rule of conditional probability repeatedly, so

$$P(I\mid\lambda) = P(i_{1}\mid\lambda)\prod_{t=2}^{T}a_{i_{t-1},i_{t}}=\pi_{i_{1}}\prod_{t=2}^{T}a_{i_{t-1},i_{t}}$$

Similarly, using the observation independence assumption, $P(O\mid I,\lambda)$ can be factored into terms of the form $P(o_{t}\mid i_{t})$:

$$\begin{aligned} P(O\mid I,\lambda)&=P(o_{1},o_{2},\cdots,o_{T}\mid i_{1},i_{2},\cdots,i_{T},\lambda)\\ &=P(o_{1}\mid o_{2},\cdots,o_{T},i_{1},i_{2},\cdots,i_{T},\lambda)\,P(o_{2},\cdots,o_{T}\mid i_{1},i_{2},\cdots,i_{T},\lambda)\\ &=P(o_{1}\mid i_{1})\,P(o_{2},\cdots,o_{T}\mid i_{1},i_{2},\cdots,i_{T},\lambda)\\ &=b_{i_{1}}(o_{1})\cdot P(o_{2},\cdots,o_{T}\mid i_{1},i_{2},\cdots,i_{T},\lambda)\\ &=\prod_{t=1}^{T}b_{i_{t}}(o_{t}) \end{aligned}$$

Therefore

$$\begin{aligned}P(O\mid\lambda)&=\sum_{I}\pi_{i_{1}}\cdot \prod_{t=2}^{T}a_{i_{t-1},i_{t}}\cdot\prod_{t=1}^{T}b_{i_{t}}(o_{t})\\ &=\sum_{i_{1}}\sum_{i_{2}}\cdots\sum_{i_{T}}\pi_{i_{1}}\cdot \prod_{t=2}^{T}a_{i_{t-1},i_{t}}\cdot\prod_{t=1}^{T}b_{i_{t}}(o_{t})\end{aligned}$$

Here $\sum_{i_{t}}$ sums over all possible values of the state $i_{t}$ at time $t$, so the formula above has complexity $O(TN^{T})$, which is prohibitively high. The forward and backward algorithms are therefore used to reduce the complexity.
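
For intuition, here is a brute-force sketch of the sum above that enumerates all $N^{T}$ state paths; the observation sequence `O` and the toy parameters are assumptions reused from the earlier sketch.

```python
from itertools import product
import numpy as np

pi = np.array([0.5, 0.3, 0.2])
A  = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.4, 0.5]])
B  = np.array([[0.7, 0.3], [0.4, 0.6], [0.1, 0.9]])
O  = [0, 1, 1, 0]                     # made-up observation indices o_1..o_T
N, T = A.shape[0], len(O)

prob = 0.0
for path in product(range(N), repeat=T):            # every state sequence I, N^T of them
    p = pi[path[0]] * B[path[0], O[0]]              # pi_{i_1} * b_{i_1}(o_1)
    for t in range(1, T):
        p *= A[path[t - 1], path[t]] * B[path[t], O[t]]   # a_{i_{t-1} i_t} * b_{i_t}(o_t)
    prob += p

print(prob)    # P(O | lambda), at O(T * N^T) cost
```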

Forward Algorithm


In the forward algorithm we introduce the notation $\alpha$, with $\alpha_{t}(i)=P(o_{1},\cdots,o_{t},i_{t}=q_{i}\mid\lambda)$: given the parameters $\lambda$, this is the probability that the state at time $t$ is $q_{i}$ and the observations up to time $t$ are $o_{1},\cdots,o_{t}$. In particular $\alpha_{T}(i)=P(O,i_{T}=q_{i}\mid\lambda)$, so $$P(O\mid\lambda)=\sum_{i=1}^{N}P(O,i_{T}=q_{i}\mid\lambda)=\sum_{i=1}^{N}\alpha_{T}(i)$$
This establishes the relation between $P(O\mid\lambda)$ and $\alpha$. All that remains is to find a recurrence between $\alpha_{t}(i)$ and $\alpha_{t+1}(j)$, and the forward algorithm follows.
$$\alpha_{t+1}(j)=P(o_{1},o_{2},\cdots,o_{t},o_{t+1},i_{t+1}=q_{j}\mid\lambda)$$
To make $\alpha_{t}(i)$ appear we introduce the variable $i_{t}$ and sum over it, giving $$\alpha_{t+1}(j)=\sum_{i=1}^{N}P(o_{1},\cdots,o_{t+1},i_{t}=q_{i},i_{t+1}=q_{j}\mid\lambda)$$
Now we can simplify using the chain rule of conditional probability:
$$\begin{aligned} \alpha_{t+1}(j)&=\sum_{i=1}^{N}P(o_{1},\cdots,o_{t+1},i_{t}=q_{i},i_{t+1}=q_{j}\mid\lambda)\\ &=\sum_{i=1}^{N}P(o_{t+1}\mid o_{1},\cdots,o_{t},i_{t}=q_{i},i_{t+1}=q_{j},\lambda)\cdot P(o_{1},\cdots,o_{t},i_{t}=q_{i},i_{t+1}=q_{j}\mid\lambda)\\ &=\sum_{i=1}^{N}P(o_{t+1}\mid i_{t+1}=q_{j})\,P(i_{t+1}=q_{j}\mid o_{1},\cdots,o_{t},i_{t}=q_{i},\lambda)\,P(o_{1},\cdots,o_{t},i_{t}=q_{i}\mid\lambda)\\ &=\sum_{i=1}^{N}b_{j}(o_{t+1})\cdot P(i_{t+1}=q_{j}\mid i_{t}=q_{i})\cdot \alpha_{t}(i)\\ &= \sum_{i=1}^{N}b_{j}(o_{t+1})\cdot a_{ij}\cdot \alpha_{t}(i) \end{aligned}$$

So we have $$\alpha_{t+1}(j)= \sum_{i=1}^{N}b_{j}(o_{t+1})\cdot a_{ij}\cdot \alpha_{t}(i)$$

From this recursion we can see that moving from time $t$ to time $t+1$ takes $N^{2}$ operations, and there are $T-1$ such transitions, so the overall complexity is $O(TN^{2})$, a substantial improvement.
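
Here is a sketch of the forward recursion in NumPy. The initialisation $\alpha_{1}(i)=\pi_{i}b_{i}(o_{1})$ follows directly from the definition of $\alpha$; the toy parameters and observation sequence are the same assumptions as before.

```python
import numpy as np

def forward(pi, A, B, O):
    """Forward algorithm. Row t (0-based) holds alpha_{t+1}(i) for i = 1..N."""
    T, N = len(O), A.shape[0]
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                    # alpha_1(i) = pi_i * b_i(o_1)
    for t in range(1, T):
        # alpha_{t+1}(j) = [ sum_i alpha_t(i) * a_ij ] * b_j(o_{t+1})
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    return alpha

pi = np.array([0.5, 0.3, 0.2])
A  = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.4, 0.5]])
B  = np.array([[0.7, 0.3], [0.4, 0.6], [0.1, 0.9]])
O  = [0, 1, 1, 0]

alpha = forward(pi, A, B, O)
print(alpha[-1].sum())     # P(O | lambda) = sum_i alpha_T(i); matches the brute-force value
```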

Backward Algorithm


In the backward algorithm we introduce the notation $\beta$, with $\beta_{t}(i)=P(o_{t+1},o_{t+2},\cdots,o_{T}\mid i_{t}=q_{i},\lambda)$: given that the state at time $t$ is $q_{i}$, this is the probability that the remaining observations are $o_{t+1},\cdots,o_{T}$. In particular $\beta_{1}(i)=P(o_{2},\cdots,o_{T}\mid i_{1}=q_{i},\lambda)$.
$$\begin{aligned} P(O\mid\lambda)&= P(o_{1},\cdots,o_{T}\mid\lambda)\\ &=\sum_{i=1}^{N}P(o_{1},\cdots,o_{T},i_{1}=q_{i}\mid\lambda)\\ &=\sum_{i=1}^{N}P(o_{1},\cdots,o_{T}\mid i_{1}=q_{i},\lambda)P(i_{1}=q_{i}\mid\lambda)\\ &=\sum_{i=1}^{N}P(o_{2},\cdots,o_{T}\mid i_{1}=q_{i},\lambda)P(o_{1}\mid o_{2},\cdots,o_{T},i_{1}=q_{i},\lambda)\,\pi_{i}\\ &=\sum_{i=1}^{N}\beta_{1}(i)\cdot P(o_{1}\mid i_{1}=q_{i})\cdot\pi_{i}\\ &=\sum_{i=1}^{N}\beta_{1}(i)\cdot b_{i}(o_{1}) \cdot\pi_{i} \end{aligned}$$

Next we find the recurrence:
$$\begin{aligned} \beta_{t}(i) &= P(o_{t+1},o_{t+2},\cdots,o_{T}\mid i_{t}=q_{i},\lambda)\\ &=\sum_{j=1}^{N}P(o_{t+1},o_{t+2},\cdots,o_{T},i_{t+1}=q_{j}\mid i_{t}=q_{i},\lambda)\\ &=\sum_{j=1}^{N}P(i_{t+1}=q_{j}\mid i_{t}=q_{i},\lambda)\,P(o_{t+1},o_{t+2},\cdots,o_{T}\mid i_{t}=q_{i},i_{t+1}=q_{j},\lambda)\\ &=\sum_{j=1}^{N}a_{ij}\,P(o_{t+1},o_{t+2},\cdots,o_{T}\mid i_{t+1}=q_{j},\lambda)\\ &=\sum_{j=1}^{N}a_{ij}\,P(o_{t+2},\cdots,o_{T}\mid i_{t+1}=q_{j},\lambda)\,P(o_{t+1}\mid o_{t+2},\cdots,o_{T},i_{t+1}=q_{j},\lambda)\\ &=\sum_{j=1}^{N}a_{ij}\,P(o_{t+2},\cdots,o_{T}\mid i_{t+1}=q_{j},\lambda)\,P(o_{t+1}\mid i_{t+1}=q_{j})\\ &=\sum_{j=1}^{N}a_{ij}\cdot \beta_{t+1}(j)\cdot b_{j}(o_{t+1}) \end{aligned}$$

This gives the recurrence $$\beta_{t}(i) =\sum_{j=1}^{N}a_{ij}\cdot \beta_{t+1}(j)\cdot b_{j}(o_{t+1})$$
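
A matching sketch of the backward recursion. The boundary condition $\beta_{T}(i)=1$ is the usual convention (it is not derived above); the toy parameters are again assumed.

```python
import numpy as np

def backward(A, B, O):
    """Backward algorithm. Row t (0-based) holds beta_{t+1}(i) for i = 1..N."""
    T, N = len(O), A.shape[0]
    beta = np.zeros((T, N))
    beta[T - 1] = 1.0                              # boundary condition: beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        # recurrence: beta_t(i) = sum_j a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    return beta

pi = np.array([0.5, 0.3, 0.2])
A  = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.4, 0.5]])
B  = np.array([[0.7, 0.3], [0.4, 0.6], [0.1, 0.9]])
O  = [0, 1, 1, 0]

beta = backward(A, B, O)
print((pi * B[:, O[0]] * beta[0]).sum())   # P(O | lambda) = sum_i pi_i * b_i(o_1) * beta_1(i)
```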

The Learning Problem: The EM Algorithm

The HMM is defined by $\lambda=(\pi,A,B)$; the goal of learning is the iteration $\lambda^{t}=(\pi^{t},A^{t},B^{t})\longrightarrow \lambda^{t+1}=(\pi^{t+1},A^{t+1},B^{t+1})$.

In the EM algorithm, $$\theta^{t+1}=\arg\max_{\theta}\int_{z}P(z\mid X,\theta^{t})\log P(X,z\mid\theta)\,dz$$

Here the observed data $X$ corresponds to the observation sequence $O$, the latent variable $z$ corresponds to the state sequence $I$, and the parameter $\theta$ corresponds to $\lambda$.

So for the HMM the update is $$\lambda^{t+1}=\arg\max_{\lambda}\sum_{I}\log P(O,I\mid\lambda)\,P(I\mid O,\lambda^{t})$$

We can simplify: since $O$ and $\lambda^{t}$ are given, $P(O\mid\lambda^{t})$ is a constant, so $$\lambda^{t+1}=\arg\max_{\lambda}\sum_{I}\log P(O,I\mid\lambda)\,\frac{P(O,I\mid\lambda^{t})}{P(O\mid\lambda^{t})}=\arg\max_{\lambda}\sum_{I}\log P(O,I\mid\lambda)\cdot P(O,I\mid\lambda^{t})$$

So we define $$Q(\lambda,\lambda^{t})=\sum_{I}\log P(O,I\mid\lambda)\cdot P(O,I\mid\lambda^{t})$$

Then substitute $$P(O,I\mid\lambda)=\pi_{i_{1}}\cdot \prod_{t=2}^{T}a_{i_{t-1},i_{t}}\cdot\prod_{t=1}^{T}b_{i_{t}}(o_{t})$$

which gives $$Q(\lambda,\lambda^{t})=\sum_{I}\Big[\log \pi_{i_{1}}+\sum^{T}_{t=2}\log a_{i_{t-1},i_{t}}+\sum_{t=1}^{T}\log b_{i_{t}}(o_{t})\Big]\cdot P(O,I\mid\lambda^{t})$$

Since the three terms depend on $\pi$, $A$, and $B$ separately, we can maximize them one at a time. Holding $A$ and $B$ fixed, we first solve for $\pi^{t+1}$: $$\pi^{t+1}=\arg\max_{\pi}\sum_{I}\Big[\log \pi_{i_{1}}+\sum^{T}_{t=2}\log a_{i_{t-1},i_{t}}+\sum_{t=1}^{T}\log b_{i_{t}}(o_{t})\Big]\cdot P(O,I\mid\lambda^{t})=\arg\max_{\pi}\sum_{I}\log \pi_{i_{1}}\cdot P(O,I\mid\lambda^{t})$$

This can be simplified further: $$\arg\max_{\pi}\sum_{I}\log \pi_{i_{1}}\cdot P(O,I\mid\lambda^{t})=\arg\max_{\pi}\sum_{i_{1}}\sum_{i_{2}}\cdots\sum_{i_{T}}\log \pi_{i_{1}}\cdot P(O,I\mid\lambda^{t})=\arg\max_{\pi}\sum_{i_{1}}\log\pi_{i_{1}}\,P(O,i_{1}\mid\lambda^{t})$$ since summing $P(O,I\mid\lambda^{t})$ over $i_{2},\cdots,i_{T}$ marginalizes those states out.

So the objective is $$\arg\max_{\pi}\sum_{i_{1}}\log\pi_{i_{1}}\,P(O,i_{1}\mid\lambda^{t})\qquad \text{s.t.}\quad\sum_{i}\pi_{i}=1$$

This constrained problem can be solved with the Lagrange multiplier method: $$\gamma(\pi,\eta)=\sum_{i_{1}}\log\pi_{i_{1}}\,P(O,i_{1}\mid\lambda^{t})+\eta\Big(\sum_{i}\pi_{i}-1\Big)$$

Taking the derivative with respect to $\pi_{i}$ picks out the term with $i_{1}=q_{i}$: $$\frac{\partial\gamma(\pi,\eta)}{\partial\pi_{i}}=\frac{1}{\pi_{i}}P(O,i_{1}=q_{i}\mid\lambda^{t})+\eta=0$$

Multiplying through by $\pi_{i}$ gives $$P(O,i_{1}=q_{i}\mid\lambda^{t})+\pi_{i}\eta=0$$

Summing this over $i$ and using $\sum_{i}\pi_{i}=1$ gives $$\sum_{i=1}^{N}\big[P(O,i_{1}=q_{i}\mid\lambda^{t})+\pi_{i}\eta\big]=P(O\mid\lambda^{t})+\eta=0$$

so $\eta=-P(O\mid\lambda^{t})$. Substituting back, $$P(O,i_{1}=q_{i}\mid\lambda^{t})+\pi_{i}\eta=P(O,i_{1}=q_{i}\mid\lambda^{t})-P(O\mid\lambda^{t})\,\pi_{i}=0$$

Therefore $$\pi^{t+1}_{i}=\frac{P(O,i_{1}=q_{i}\mid\lambda^{t})}{P(O\mid\lambda^{t})}$$ Both quantities on the right can be computed under $\lambda^{t}$, so $\pi^{t+1}$ is obtained; $A^{t+1}$ and $B^{t+1}$ can be derived in the same way.
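
As a sketch of what this update looks like in code: $P(O,i_{1}=q_{i}\mid\lambda^{t})=\alpha_{1}(i)\,\beta_{1}(i)$ (the $t=1$ case of the forward-backward decomposition used in the Smoothing section below), so the $\pi$ update can be written using the `forward()` and `backward()` sketches above. This is only the $\pi$ part of one iteration, under the same toy assumptions.

```python
import numpy as np

def update_pi(alpha, beta):
    """pi_i^{t+1} = P(O, i_1 = q_i | lambda^t) / P(O | lambda^t)
                  = alpha_1(i) * beta_1(i) / sum_i alpha_1(i) * beta_1(i)."""
    joint = alpha[0] * beta[0]     # P(O, i_1 = q_i | lambda^t) for each state i
    return joint / joint.sum()     # normalise by P(O | lambda^t)

# Usage (alpha, beta computed under the current parameters lambda^t):
# alpha = forward(pi, A, B, O); beta = backward(A, B, O)
# pi_next = update_pi(alpha, beta)
```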

The Decoding Problem: The Viterbi Algorithm

The decoding problem is to find the state path with the highest probability. We introduce the symbol $\omega_{t}(i)$: the maximum, over all partial paths $i_{1},i_{2},\cdots,i_{t-1}$, of the joint probability of those states, the state $i_{t}=q_{i}$, and the observations up to time $t$: $$\omega_{t}(i)=\max_{i_{1},i_{2},\cdots,i_{t-1}}P(o_{1},o_{2},\cdots,o_{t},i_{1},i_{2},\cdots,i_{t-1},i_{t}=q_{i})$$ with the recurrence $$\omega_{t+1}(j) = \max_{1\leq i\leq N}\omega_{t}(i)\,a_{ij}\,b_{j}(o_{t+1})$$
We also use $\zeta_{t+1}(j)$ to record the best value of $i_{t}$ when $i_{t+1}=q_{j}$, i.e. $$\zeta_{t+1}(j)=\arg\max_{1\leq i\leq N}\omega_{t}(i)\cdot a_{ij}\cdot b_{j}(o_{t+1})=\arg\max_{1\leq i\leq N}\omega_{t}(i) \cdot a_{ij}$$ where $b_{j}(o_{t+1})$ does not depend on $i$ and can therefore be dropped from the $\arg\max$.
At the final step, $$i^{*}_{T} = \arg\max_{1\leq j\leq N} \omega_{T}(j)$$
and we then backtrack with $\zeta$ to recover the best path: $$i^{*}_{t} = \zeta_{t+1}(i^{*}_{t+1})$$

Backtracking in this way yields the optimal path $i^{*} =(i_{1}^{*},i_{2}^{*},\cdots,i_{T}^{*})$.

This procedure finds the maximum-probability path.
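
A sketch of the Viterbi recursion and backtracking; the initialisation $\omega_{1}(i)=\pi_{i}b_{i}(o_{1})$ is the standard convention, and the toy parameters are assumptions as before.

```python
import numpy as np

def viterbi(pi, A, B, O):
    """Return the most probable state path (as 0-based state indices) for O."""
    T, N = len(O), A.shape[0]
    omega = np.zeros((T, N))                 # omega[t, j]: best path probability ending in j at time t
    zeta = np.zeros((T, N), dtype=int)       # zeta[t, j]: best previous state for j at time t
    omega[0] = pi * B[:, O[0]]               # omega_1(i) = pi_i * b_i(o_1)
    for t in range(1, T):
        cand = omega[t - 1][:, None] * A     # cand[i, j] = omega_t(i) * a_ij
        zeta[t] = cand.argmax(axis=0)        # zeta_{t+1}(j)
        omega[t] = cand.max(axis=0) * B[:, O[t]]   # omega_{t+1}(j)
    # Backtracking: i_T^* = argmax_j omega_T(j), then i_t^* = zeta_{t+1}(i_{t+1}^*)
    path = [int(omega[T - 1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(zeta[t, path[-1]]))
    return path[::-1]

pi = np.array([0.5, 0.3, 0.2])
A  = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.4, 0.5]])
B  = np.array([[0.7, 0.3], [0.4, 0.6], [0.1, 0.9]])
print(viterbi(pi, A, B, [0, 1, 1, 0]))       # indices of the most probable state path
```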

Application Scenarios of the HMM

HMM tasks fall into two broad groups. In the first, $\lambda$ is unknown and is inferred with the learning algorithm. In the second, $\lambda$ is known and we reason with it, for example decoding, probability computation, filtering, smoothing, and prediction. The learning, decoding, and probability computation problems have already been treated in detail above.

Next we explain the remaining tasks.

The Filtering Task

The filtering problem is to compute $P(z_{t}\mid x_{1},x_{2},\cdots,x_{t})$. (In the notation used for these tasks, $x_{1},\cdots,x_{t}$ denote the observations and $z_{t}$ denotes the hidden state at time $t$.)

For $P(z_{t}\mid x_{1},x_{2},\cdots,x_{t})$ we can write
$$P(z_{t}\mid x_{1},x_{2},\cdots,x_{t})=\frac{P(x_{1},\cdots,x_{t},z_{t})}{P(x_{1},\cdots,x_{t})}=\frac{P(x_{1},\cdots,x_{t},z_{t})}{\sum_{z_{t}}P(x_{1},\cdots,x_{t},z_{t})}$$

Since $\alpha_{t}(z_{t})=P(x_{1},\cdots,x_{t},z_{t})$ and the denominator $P(x_{1},\cdots,x_{t})$ does not depend on $z_{t}$, we have $$P(z_{t}\mid x_{1},x_{2},\cdots,x_{t})\propto \alpha_{t}(z_{t})$$
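
In code this is just a row-wise normalisation of the forward variables; a short sketch assuming the `alpha` array returned by the `forward()` sketch above.

```python
import numpy as np

def filtering(alpha):
    """P(z_t | x_1..x_t) for every t: normalise alpha_t over the states."""
    return alpha / alpha.sum(axis=1, keepdims=True)
```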

The Smoothing Task

The smoothing problem is: given the whole observation sequence $x_{1},x_{2},\cdots,x_{T}$, find the distribution of $z_{t}$, i.e. $P(z_{t}\mid x_{1},\cdots,x_{T})$.
As with the filtering task, we can write $$P(z_{t}\mid x_{1},\cdots,x_{T})=\frac{P(x_{1},\cdots,x_{T},z_{t})}{\sum_{z_{t}}P(x_{1},\cdots,x_{T},z_{t})}$$

We can manipulate $P(x_{1},\cdots,x_{T},z_{t})$ as follows: $$\begin{aligned}P(x_{1},\cdots,x_{T},z_{t})&=P(x_{t+1},\cdots,x_{T}\mid x_{1},\cdots,x_{t},z_{t})\,P(x_{1},\cdots,x_{t},z_{t})\\ &=P(x_{t+1},\cdots,x_{T}\mid x_{1},\cdots,x_{t},z_{t})\,\alpha_{t}(z_{t})\\ &=P(x_{t+1},\cdots,x_{T}\mid z_{t})\cdot\alpha_{t}(z_{t})\\ &=\beta_{t}(z_{t})\,\alpha_{t}(z_{t}) \end{aligned}$$
Here $P(x_{t+1},\cdots,x_{T}\mid x_{1},\cdots,x_{t},z_{t})=P(x_{t+1},\cdots,x_{T}\mid z_{t})$ can be justified with d-separation in the Bayesian-network view of the HMM: every path from $x_{1:t}$ to $x_{t+1:T}$ passes through $z_{t}$, so conditioning on $z_{t}$ blocks those paths, and $x_{1:t}$ and $x_{t+1:T}$ are conditionally independent given $z_{t}$.

So $$P(z_{t}\mid x_{1},\cdots,x_{T})\propto P(x_{1},\cdots,x_{T},z_{t})=\alpha_{t}(z_{t})\,\beta_{t}(z_{t})$$ This combination of the two recursions is also known as the forward-backward algorithm.
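
A sketch of smoothing from the `alpha` and `beta` arrays produced by the earlier `forward()` and `backward()` sketches.

```python
import numpy as np

def smoothing(alpha, beta):
    """P(z_t | x_1..x_T) for every t: proportional to alpha_t(z_t) * beta_t(z_t)."""
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)
```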

The Prediction Task

The prediction task is to predict $x_{t+1}$ or $z_{t+1}$ from the known observations $x_{1},\cdots,x_{t}$, i.e. to compute $P(z_{t+1}\mid x_{1},\cdots,x_{t})$ and $P(x_{t+1}\mid x_{1},\cdots,x_{t})$.

For $P(z_{t+1}\mid x_{1},\cdots,x_{t})$ we introduce $z_{t}$ and then transform: $$\begin{aligned}P(z_{t+1}\mid x_{1:t})&=\sum_{z_{t}}P(z_{t+1},z_{t}\mid x_{1:t})\\ &=\sum_{z_{t}}P(z_{t}\mid x_{1:t})\,P(z_{t+1}\mid z_{t},x_{1:t})\\ &=\sum_{z_{t}}P(z_{t}\mid x_{1:t})\cdot P(z_{t+1}\mid z_{t})\\ &=\sum_{z_{t}}P(z_{t}\mid x_{1:t})\,a_{z_{t},z_{t+1}} \end{aligned}$$ where $P(z_{t}\mid x_{1:t})$ is the filtering distribution obtained above (proportional to $\alpha_{t}(z_{t})$).

For $P(x_{t+1}\mid x_{1},\cdots,x_{t})$ we introduce $z_{t+1}$, which connects it to $P(z_{t+1}\mid x_{1},\cdots,x_{t})$:
$$\begin{aligned} P(x_{t+1}\mid x_{1:t})&=\sum_{z_{t+1}}P(x_{t+1},z_{t+1}\mid x_{1:t})\\ &=\sum_{z_{t+1}}P(z_{t+1}\mid x_{1:t})\,P(x_{t+1}\mid z_{t+1},x_{1:t})\\ &=\sum_{z_{t+1}}P(z_{t+1}\mid x_{1:t})\,P(x_{t+1}\mid z_{t+1})\\ &=\sum_{z_{t+1}}P(z_{t+1}\mid x_{1:t})\,b_{z_{t+1}}(x_{t+1})\\ &=\sum_{z_{t+1}}\sum_{z_{t}}P(z_{t}\mid x_{1:t})\,a_{z_{t},z_{t+1}}\,b_{z_{t+1}}(x_{t+1}) \end{aligned}$$
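
A sketch of one-step prediction built on the filtering distribution; it assumes the `alpha` array, `A`, and `B` from the earlier sketches.

```python
import numpy as np

def predict_next(alpha, A, B):
    """One-step prediction from the last observed time t."""
    filt = alpha[-1] / alpha[-1].sum()   # P(z_t | x_1..x_t), the filtering distribution
    z_next = filt @ A                    # P(z_{t+1} | x_1..x_t) = sum_{z_t} P(z_t|x_1:t) * a_{z_t, z_{t+1}}
    x_next = z_next @ B                  # P(x_{t+1} | x_1..x_t) = sum_{z_{t+1}} P(z_{t+1}|x_1:t) * b_{z_{t+1}}(x_{t+1})
    return z_next, x_next
```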

The content above is a personal study summary; please point out any mistakes.
