Hidden Markov Model (HMM): Algorithm Derivations


1. Introduction

The hidden Markov model is a directed probabilistic graphical model for time series, applicable to sequential-data problems such as predicting daily maximum temperatures or word segmentation. First, consider what characterizes such problems. Let $o_t$ denote the observation at time t and write the observation sequence as $O=\{o_1,o_2,\dots,o_n\}$; the observation at a given time t is related to the observations at the preceding n time steps. Intuitively, today's maximum temperature is related to the maximum temperatures of the previous two days, so this is a time-series modeling problem. The simplest approach is polynomial fitting: take the previous n observations as input and the observation at time t as output. Although such a model is simple, it does not perform well, because reality is far more complex. The hidden Markov model handles this class of problems better.

The hidden Markov model does not use the observations directly as input. Instead, it assumes that the observation sequence O is generated by a corresponding state sequence S, and the model's final prediction is also a state sequence, so the latent variables can be inferred. The states S are the latent variables; they satisfy the Markov assumption, i.e. the state $s_t$ at any time t is related to its preceding n states, and the observation sequence O is generated by the state sequence S. For simplicity, two assumptions are made:

  • Homogeneous Markov assumption: the state $s_t$ at any time t depends only on the previous state $s_{t-1}$.
  • Observation independence assumption: the observation $o_t$ at any time t depends only on the state $s_t$ at that time, and the observations are conditionally independent of one another.

Note in particular that the states S must be discrete, while the observations O may be either discrete or continuous; for ease of exposition, the discrete case is used throughout.

Let $Q=\{q_1,q_2,\dots,q_M\}$ denote the set of all possible states and $V=\{v_1,v_2,\dots,v_N\}$ the set of all possible observations. In the state transition probability matrix A, the entry $a_{ij}$ is the probability of moving from state $q_i$ to state $q_j$, i.e. $a_{ij}=P(s_t = q_j \mid s_{t-1}=q_i)$:

$$A = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1M}\\ a_{21} & a_{22} & \dots & a_{2M}\\ \vdots & \vdots & \ddots & \vdots\\ a_{M1} & a_{M2} & \dots & a_{MM} \end{bmatrix}_{M \times M}$$

In the observation probability matrix B, the entry $b_{ij}$ is the probability that state $q_i$ emits observation $v_j$, i.e. $b_{ij}=P(o_t = v_j \mid s_t=q_i)$:

$$B = \begin{bmatrix} b_{11} & b_{12} & \dots & b_{1N}\\ b_{21} & b_{22} & \dots & b_{2N}\\ \vdots & \vdots & \ddots & \vdots\\ b_{M1} & b_{M2} & \dots & b_{MN} \end{bmatrix}_{M \times N}$$

In the initial state probability vector $\Pi$, the entry $\pi_i$ is the probability that state $q_i$ occurs at time 1, i.e. $\pi_i = P(s_1 = q_i)$:

$$\Pi = [\pi_1, \pi_2, \dots, \pi_M]_{1 \times M}$$

A hidden Markov model can thus be written as $\lambda = (A,B,\Pi)$, where $A$, $B$, and $\Pi$ are called the three elements of the model.
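To make the notation concrete, the following is a minimal NumPy sketch of the triple $\lambda = (A,B,\Pi)$ for a hypothetical model with $M=2$ states and $N=3$ observation symbols; all numbers are made up for illustration, and this toy setup is reused by the code sketches below.

```python
import numpy as np

# Toy HMM, lambda = (A, B, Pi): M = 2 hidden states, N = 3 observation symbols.
A = np.array([[0.7, 0.3],          # A[i, j] = P(s_t = q_j | s_{t-1} = q_i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],     # B[i, k] = P(o_t = v_k | s_t = q_i)
              [0.1, 0.3, 0.6]])
Pi = np.array([0.6, 0.4])          # Pi[i] = P(s_1 = q_i)

# An observation sequence, encoded as 0-based indices into V = {v_1, ..., v_N}.
O = np.array([0, 1, 2, 1])
```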

Training and applying a hidden Markov model requires solving three basic problems:

  • Probability computation: given an observation sequence O, how to compute the likelihood $P(O \mid \lambda)$.
  • Learning: given an observation sequence O, how to estimate the parameters $\lambda$ by maximizing the likelihood.
  • Prediction: once the model has been learned, how to find the optimal state sequence.

2. The Probability Computation Problem

Given an observation sequence $O=\{o_1,o_2,\dots,o_T\}$ and model parameters $\lambda=(A,B,\Pi)$,

$$\begin{aligned} P(O|\lambda) &= \sum_{S \in Q^T} P(O,S|\lambda)\\ &= \sum_{s_1,s_2,\dots,s_T} P(o_1,o_2,\dots,o_T,s_1,s_2,\dots,s_T|\lambda)\\ &= \sum_{s_1,s_2,\dots,s_T} \pi_{s_1}b_{s_1o_1}\, a_{s_1s_2}b_{s_2o_2} \dots a_{s_{T-1}s_T}b_{s_To_T} \end{aligned}$$

where $s_1,s_2,\dots,s_T$ ranges over all state sequences of length T and the observations are conditionally independent. Computing this sum directly is clearly very expensive, since it enumerates all $M^T$ state sequences. The forward and backward algorithms introduced next compute the same probability efficiently.
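To make the cost concrete, a brute-force sketch (assuming the toy arrays defined earlier) that literally sums $P(O,S \mid \lambda)$ over all $M^T$ state sequences might look as follows; it is exponential in T and shown only for illustration and sanity-checking.

```python
from itertools import product

def brute_force_likelihood(O, A, B, Pi):
    """P(O|lambda) by summing P(O,S|lambda) over all M^T state sequences.
    Exponential in T -- only usable for tiny examples."""
    M, T = A.shape[0], len(O)
    total = 0.0
    for S in product(range(M), repeat=T):    # every state sequence in Q^T
        p = Pi[S[0]] * B[S[0], O[0]]         # pi_{s_1} * b_{s_1, o_1}
        for t in range(1, T):
            p *= A[S[t-1], S[t]] * B[S[t], O[t]]
        total += p
    return total
```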

2.1 The Forward Algorithm

The forward algorithm defines

$$\alpha_t(i) = P(o_1,o_2,\dots,o_t,\; s_t=q_i \mid \lambda),\quad t=1,2,\dots,T$$

The initial value is then $\alpha_1(i) = P(o_1, s_1=q_i \mid \lambda) = \pi_i b_{io_1}$, and the recursion is

$$\begin{aligned} \alpha_{t+1}(i) &= P(o_1,o_2,\dots,o_t,o_{t+1},\; s_{t+1}=q_i \mid \lambda)\\ &= b_{io_{t+1}} \cdot \sum_{j=1}^M \alpha_t(j) \cdot a_{ji} \end{aligned}$$

Iterating yields $\alpha_T(i),\ i=1,2,\dots,M$, and hence

$$P(O|\lambda)=\sum_{i=1}^M P(o_1,o_2,\dots,o_T,\; s_T=q_i \mid \lambda)=\sum_{i=1}^M \alpha_T(i)$$
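A minimal sketch of the forward recursion, assuming the NumPy setup above (0-based indexing, so row `alpha[t-1]` stores $\alpha_t$):

```python
def forward(O, A, B, Pi):
    """Forward algorithm: row alpha[t] holds alpha_{t+1}(i) in 0-based indexing.
    Returns the alpha table and P(O|lambda) = sum_i alpha_T(i)."""
    M, T = A.shape[0], len(O)
    alpha = np.zeros((T, M))
    alpha[0] = Pi * B[:, O[0]]               # alpha_1(i) = pi_i * b_{i, o_1}
    for t in range(1, T):
        # alpha_{t+1}(i) = b_{i, o_{t+1}} * sum_j alpha_t(j) * a_{ji}
        alpha[t] = B[:, O[t]] * (alpha[t-1] @ A)
    return alpha, alpha[-1].sum()
```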

2.2 The Backward Algorithm

The backward algorithm defines

$$\beta_t(i) = P(o_{t+1},\dots,o_T \mid s_t=q_i, \lambda),\quad t=T,T-1,\dots,1$$

The initial value is then $\beta_T(i)=1$, and the recursion is

$$\begin{aligned} \beta_{t-1}(i) &= P(o_t,o_{t+1},\dots,o_T \mid s_{t-1}=q_i, \lambda)\\ &= \sum_{j=1}^M a_{ij} \cdot b_{jo_t} \cdot \beta_t(j) \end{aligned}$$

Iterating yields $\beta_1(i),\ i=1,2,\dots,M$, and hence

$$P(O|\lambda)=\sum_{i=1}^M \pi_i\, P(o_1,o_2,\dots,o_T \mid s_1=q_i,\lambda)=\sum_{i=1}^M \pi_i \cdot b_{io_1} \cdot \beta_1(i)$$
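A matching sketch of the backward recursion under the same assumptions; on the toy model, `forward`, `backward`, and `brute_force_likelihood` should all return the same value of $P(O \mid \lambda)$:

```python
def backward(O, A, B, Pi):
    """Backward algorithm: row beta[t] holds beta_{t+1}(i) in 0-based indexing.
    Returns the beta table and P(O|lambda) = sum_i pi_i b_{i,o_1} beta_1(i)."""
    M, T = A.shape[0], len(O)
    beta = np.ones((T, M))                   # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        # beta_t(i) = sum_j a_{ij} * b_{j, o_{t+1}} * beta_{t+1}(j)
        beta[t] = A @ (B[:, O[t+1]] * beta[t+1])
    return beta, (Pi * B[:, O[0]] * beta[0]).sum()
```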

3. The Learning Problem

Since the hidden Markov model contains the latent state sequence S, it can be trained with the EM algorithm; the resulting procedure is known as the Baum-Welch algorithm. The derivation is as follows.

As stated earlier, given the observation sequence O and model parameters $\lambda$, for any state sequence S,

$$P(O,S|\lambda) = P(o_1,o_2,\dots,o_T,s_1,s_2,\dots,s_T|\lambda)=\pi_{s_1}b_{s_1o_1}\, a_{s_1s_2}b_{s_2o_2} \dots a_{s_{T-1}s_T}b_{s_To_T}$$

1. E-step: derive the Q function

$$\begin{aligned} Q(\lambda,\lambda_i) &= E_S[\log P(O,S|\lambda) \mid O,\lambda_i]\\ &= \sum_S \log [P(O,S|\lambda)]\, P(S|O,\lambda_i) \end{aligned}$$

where $\lambda_i$ is the current parameter estimate and $\lambda$ is the parameter value over which the likelihood is maximized. Since $P(O|\lambda_i)$ depends on neither $\lambda$ nor S, multiplying by it does not affect the maximization of the Q function, so the Q function can equivalently be written as:

$$\begin{aligned} Q(\lambda,\lambda_i) &= P(O|\lambda_i) \sum_S \log [P(O,S|\lambda)]\, P(S|O,\lambda_i)\\ &= \sum_S \log [P(O,S|\lambda)]\, P(O,S|\lambda_i) \end{aligned}$$

Substituting the expression for $P(O,S|\lambda)$ into the above gives

$$\begin{aligned} Q(\lambda,\lambda_i) &= \sum_S \log [\pi_{s_1}b_{s_1o_1} a_{s_1s_2}b_{s_2o_2} \dots a_{s_{T-1}s_T}b_{s_To_T}] \cdot P(O,S|\lambda_i)\\ &= \sum_S \log\pi_{s_1}\, P(O,S|\lambda_i) + \sum_S \sum_{t=1}^{T-1} \log a_{s_ts_{t+1}}\, P(O,S|\lambda_i) + \sum_S \sum_{t=1}^T \log b_{s_to_t}\, P(O,S|\lambda_i)\\ &= \sum_{j=1}^M \log\pi_j\, P(O,s_1=q_j|\lambda_i) + \sum_{j=1}^M \sum_{k=1}^M \sum_{t=1}^{T-1} \log a_{jk}\, P(O,s_t=q_j,s_{t+1}=q_k|\lambda_i)\\ &\quad + \sum_{j=1}^M \sum_{t=1}^T \log b_{jo_t}\, P(O,s_t=q_j|\lambda_i) \end{aligned}$$
2. M-step: maximize the Q function

Next, each of the three parts of the Q function is maximized separately under its constraint.

(1) Solving for $\pi$

Since the vector $\pi$ describes the distribution over all possible states at time 1, $\sum_{j=1}^M \pi_j=1$. We therefore maximize $\sum_{j=1}^M \log\pi_j\, P(O,s_1=q_j|\lambda_i)$ subject to the constraint $\sum_{j=1}^M \pi_j=1$; the Lagrangian is

$$L_1(\mu_1) = \sum_{j=1}^M \log\pi_j\, P(O,s_1=q_j|\lambda_i) + \mu_1\Big(\sum_{j=1}^M \pi_j-1\Big)$$

Taking partial derivatives,

$$\frac{\partial L_1}{\partial \pi_j} = \frac{P(O,s_1=q_j|\lambda_i)}{\pi_j} + \mu_1$$

$$\frac{\partial L_1}{\partial \mu_1} = \sum_{j=1}^M \pi_j-1$$

Setting both expressions to zero gives

$$\hat{\pi}_j = \frac{P(O,s_1=q_j|\lambda_i)}{\sum_{k=1}^M P(O,s_1=q_k|\lambda_i)}$$
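Spelling out the intermediate step: the first condition gives $\pi_j = -P(O,s_1=q_j|\lambda_i)/\mu_1$, and summing over j under the constraint $\sum_{j=1}^M \pi_j = 1$ pins down the multiplier,

$$\mu_1 = -\sum_{k=1}^M P(O,s_1=q_k|\lambda_i)$$

which, substituted back, yields the normalized form above.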
Moreover, by the probability computations described earlier,

$$P(O,s_t = q_j|\lambda_i) = \alpha_t(j) \cdot \beta_t(j)$$

so

$$\hat{\pi}_j = \frac{\alpha_1(j)\beta_1(j)}{\sum_{k=1}^M \alpha_1(k)\beta_1(k)}$$
(2) Solving for A

Since A is a state transition probability matrix, each row satisfies $\sum_{k=1}^M a_{jk}=1$. We therefore maximize $\sum_{j=1}^M \sum_{k=1}^M \sum_{t=1}^{T-1} \log a_{jk}\, P(O,s_t=q_j,s_{t+1}=q_k|\lambda_i)$ under this constraint, one row j at a time; for a fixed j the Lagrangian is

$$L_2(\mu_2) = \sum_{k=1}^M \sum_{t=1}^{T-1} \log a_{jk}\, P(O,s_t=q_j,s_{t+1}=q_k|\lambda_i) + \mu_2\Big(\sum_{k=1}^M a_{jk}-1\Big)$$
Taking partial derivatives,

$$\frac{\partial L_2(\mu_2)}{\partial a_{jk}} = \sum_{t=1}^{T-1} \frac{P(O,s_t=q_j,s_{t+1}=q_k|\lambda_i)}{a_{jk}} + \mu_2$$

$$\frac{\partial L_2(\mu_2)}{\partial \mu_2} = \sum_{k=1}^M a_{jk}-1$$

Setting the derivatives to zero gives

$$\begin{aligned} \hat a_{jk} &= \frac{\sum_{t=1}^{T-1}P(O,s_t=q_j,s_{t+1}=q_k|\lambda_i)}{\sum_{k'=1}^M \sum_{t=1}^{T-1}P(O,s_t=q_j,s_{t+1}=q_{k'}|\lambda_i)}\\ &= \frac{\sum_{t=1}^{T-1}P(O,s_t=q_j,s_{t+1}=q_k|\lambda_i)}{\sum_{t=1}^{T-1}P(O,s_t=q_j|\lambda_i)} \end{aligned}$$
Again by the probability computations described earlier,

$$P(O,s_t=q_j,s_{t+1}=q_k|\lambda_i) = \alpha_t(j)\, a_{jk}\, b_{ko_{t+1}}\, \beta_{t+1}(k)$$

$$P(O,s_t = q_j|\lambda_i) = \alpha_t(j) \cdot \beta_t(j)$$

hence

$$\hat a_{jk} = \frac{\sum_{t=1}^{T-1} \alpha_t(j)\, a_{jk}\, b_{ko_{t+1}}\, \beta_{t+1}(k)}{\sum_{t=1}^{T-1} \alpha_t(j) \cdot \beta_t(j)}$$
(3) Solving for B

Since B is the observation probability matrix, each row satisfies $\sum_{k=1}^N b_{jk}=1$. We therefore maximize $\sum_{j=1}^M \sum_{t=1}^T \log b_{jo_t}\, P(O,s_t = q_j|\lambda_i)$ under this constraint, one row j at a time; for a fixed j the Lagrangian is

$$L_3(\mu_3) = \sum_{t=1}^T \log b_{jo_t}\, P(O,s_t = q_j|\lambda_i) + \mu_3\Big(\sum_{k=1}^N b_{jk}-1\Big)$$

Taking partial derivatives (note that $\log b_{jo_t}$ involves $b_{jk}$ only at those times t with $o_t = v_k$),

$$\frac{\partial L_3(\mu_3)}{\partial b_{jk}} = \sum_{t=1}^T \frac{P(O,s_t = q_j|\lambda_i)\, I(o_t=v_k)}{b_{jk}} + \mu_3$$

$$\frac{\partial L_3(\mu_3)}{\partial \mu_3} = \sum_{k=1}^N b_{jk}-1$$

Setting the derivatives to zero gives

$$\hat b_{jk} = \frac{\sum_{t=1}^T P(O,s_t = q_j|\lambda_i)\, I(o_t=v_k)}{\sum_{t=1}^T P(O,s_t = q_j|\lambda_i)}$$

where k indexes the possible observations, ranging from 1 to N, and $I(\cdot)$ is the 0-1 indicator function that selects the time steps whose observation equals $v_k$.

By the probability computation formulas above,

$$P(O,s_t = q_j|\lambda_i) = \alpha_t(j) \cdot \beta_t(j)$$

hence

$$\hat b_{jk} = \frac{\sum_{t=1}^T \alpha_t(j)\beta_t(j)\, I(o_t=v_k)}{\sum_{t=1}^T \alpha_t(j)\beta_t(j)}$$

In summary, the learning problem is solved by iterating the re-estimation formulas derived above until the parameters converge.
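A sketch of one Baum-Welch update, built on the `forward` and `backward` helpers above; it assumes a single observation sequence and omits the numerical scaling that a production implementation would need to avoid underflow.

```python
def baum_welch_step(O, A, B, Pi):
    """One EM (Baum-Welch) update using the re-estimation formulas above."""
    M, N = B.shape
    alpha, _ = forward(O, A, B, Pi)
    beta, _ = backward(O, A, B, Pi)

    # gamma[t, j] = P(O, s_t = q_j | lambda_i) = alpha_t(j) * beta_t(j)
    gamma = alpha * beta
    # xi[t, j, k] = P(O, s_t = q_j, s_{t+1} = q_k | lambda_i)
    #             = alpha_t(j) * a_{jk} * b_{k, o_{t+1}} * beta_{t+1}(k)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * B[:, O[1:]].T[:, None, :] * beta[1:, None, :])

    Pi_new = gamma[0] / gamma[0].sum()                        # pi-hat_j
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]  # a-hat_{jk}
    B_new = np.zeros_like(B)
    for k in range(N):                                        # b-hat_{jk}
        B_new[:, k] = gamma[O == k].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    return A_new, B_new, Pi_new
```

Repeatedly calling `baum_welch_step` and feeding the updated parameters back in implements the iteration until convergence.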

4. The Prediction Problem

Given the hidden Markov model parameters $\lambda=(A,B,\Pi)$, the possible evolutions of the state sequence form a tree-shaped search space, and the goal is to find the optimal path through it, which suggests a dynamic programming algorithm. The Viterbi algorithm is designed in exactly this way; it rests on the property of optimal paths that any sub-path of an optimal path is itself an optimal path for the corresponding sub-problem.

Accordingly, let $\delta_t(i) = \max_{s_1,s_2,\dots,s_{t-1}} P(s_t=q_i,s_{t-1},\dots,s_1,o_t,o_{t-1},\dots,o_1|\lambda),\ i=1,2,\dots,M$ denote the maximum probability over all state sequences that are in state $q_i$ at time t. This yields the recursion

$$\delta_{t+1}(i) = \max_{1 \le j \le M} \big[\delta_t(j)\, a_{ji}\big]\, b_{io_{t+1}}$$

Define the node at time t-1 on the optimal path among all paths reaching state $q_i$ at time t as $\psi_t(i) = \arg\max_{1 \le j \le M} [\delta_{t-1}(j)\, a_{ji}],\ i=1,2,\dots,M$.

In summary, $\delta_t(i),\ i=1,2,\dots,M$ can be computed by iteration while recording each chosen predecessor in $\psi$; backtracking through $\psi$ from the best final state then recovers the optimal state sequence, as in the sketch below.
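A minimal sketch of the Viterbi recursion and the backtracking step, under the same toy NumPy setup as before:

```python
def viterbi(O, A, B, Pi):
    """Most probable state path via dynamic programming over delta and psi."""
    M, T = A.shape[0], len(O)
    delta = np.zeros((T, M))
    psi = np.zeros((T, M), dtype=int)
    delta[0] = Pi * B[:, O[0]]                # delta_1(i) = pi_i * b_{i, o_1}
    for t in range(1, T):
        trans = delta[t-1][:, None] * A       # trans[j, i] = delta_{t-1}(j) * a_{ji}
        psi[t] = trans.argmax(axis=0)         # psi_t(i): best predecessor j
        delta[t] = trans.max(axis=0) * B[:, O[t]]
    # Backtrack from the best final state to recover the optimal sequence.
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t+1, path[t+1]]
    return path, delta[-1].max()
```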

5. References

  • Li Hang, 统计学习方法 (Statistical Learning Methods).