《统计学习方法》 (Statistical Learning Methods), Chapter 10: Hidden Markov Models

Basic Concepts of the Hidden Markov Model

Definition of the Hidden Markov Model

  A hidden Markov model (HMM) is a model for time series. It describes a process in which a hidden Markov chain randomly generates an unobservable sequence of states, and each state in turn generates an observation, producing an observable random sequence. The sequence of states generated by the hidden Markov chain is called the state sequence; the sequence of observations, one per state, is called the observation sequence. Each position in the sequence can be regarded as a time step.
$$Q=\{q_1,q_2,\dots,q_N\},\qquad V=\{v_1,v_2,\dots,v_M\}$$
where $Q$ is the set of all possible states and $V$ is the set of all possible observations.
$$I=(i_1,i_2,\dots,i_T),\qquad O=(o_1,o_2,\dots,o_T)$$
$I$ is a state sequence of length $T$, and $O$ is the corresponding observation sequence.
The state transition probability matrix:
$$A=[a_{ij}]_{N\times N},\qquad a_{ij}=P(i_{t+1}=q_j\mid i_t=q_i)$$
The observation probability matrix:
$$B=[b_j(k)]_{N\times M},\qquad b_j(k)=P(o_t=v_k\mid i_t=q_j)$$
The initial state probability vector:
$$\pi=(\pi_i),\qquad \pi_i=P(i_1=q_i)$$
A hidden Markov model can therefore be represented as
$$\lambda=(A,B,\pi)$$
From the definition, the model makes two assumptions: the homogeneous Markov assumption and the observation independence assumption,
$$P(i_t\mid i_{t-1},o_{t-1},\dots,i_1,o_1)=P(i_t\mid i_{t-1})$$
$$P(o_t\mid i_T,o_T,\dots,i_1,o_1)=P(o_t\mid i_t)$$

Generating an Observation Sequence

Input: hidden Markov model $\lambda=(A,B,\pi)$; observation sequence length $T$
Output: observation sequence $O=(o_1,o_2,\dots,o_T)$
$(1)$ Draw the initial state $i_1$ according to the initial state distribution $\pi$
$(2)$ Set $t=1$
$(3)$ Generate $o_t$ according to the observation probability distribution $b_{i_t}(k)$ of state $i_t$
$(4)$ Generate $i_{t+1}$ according to the state transition probability distribution of state $i_t$
$(5)$ Set $t=t+1$; if $t\le T$, return to step $(3)$; otherwise, terminate
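The generation procedure above can be sketched in pure Python (a minimal illustration; the function name `simulate_hmm` and the inner `draw` helper are my own, and states/observations are 0-based indices):

```python
import random

def simulate_hmm(A, B, pi, T, seed=0):
    """Generate an observation sequence of length T from lambda = (A, B, pi).

    A[i][j], B[i][k], pi[i] follow the definitions above, with 0-based indices.
    Returns (observations, states), both of length T.
    """
    rng = random.Random(seed)

    def draw(dist):
        # Sample an index from a discrete probability distribution.
        r, acc = rng.random(), 0.0
        for idx, p in enumerate(dist):
            acc += p
            if r < acc:
                return idx
        return len(dist) - 1

    i = draw(pi)                  # step (1): i_1 ~ pi
    states, obs = [i], []
    for _ in range(T):            # steps (3)-(5)
        obs.append(draw(B[i]))    # o_t ~ b_{i_t}(.)
        i = draw(A[i])            # i_{t+1} ~ row a_{i_t, .}
        states.append(i)
    return obs, states[:T]        # drop the unused final state draw
```

Running it with any row-stochastic $A$, $B$ and a distribution $\pi$ yields one sampled $(O, I)$ pair.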

The Three Basic Problems of Hidden Markov Models

  • Probability calculation (evaluation)
    Given $\lambda=(A,B,\pi)$ and $O=(o_1,o_2,\dots,o_T)$, compute $P(O\mid\lambda)$
  • Learning
    Given $O=(o_1,o_2,\dots,o_T)$, estimate $\lambda$ so that $P(O\mid\lambda)$ is maximized
  • Prediction (decoding)
    Given $\lambda$ and $O=(o_1,o_2,\dots,o_T)$, find the state sequence $I$ that maximizes $P(I\mid O)$

Probability Calculation Algorithms

Direct Computation

$$P(I\mid\lambda)=\pi_{i_1}a_{i_1i_2}a_{i_2i_3}\cdots a_{i_{T-1}i_T}$$
$$P(O\mid I,\lambda)=b_{i_1}(o_1)b_{i_2}(o_2)\cdots b_{i_T}(o_T)$$
$$P(O,I\mid\lambda)=P(O\mid I,\lambda)P(I\mid\lambda)=\pi_{i_1}b_{i_1}(o_1)a_{i_1i_2}b_{i_2}(o_2)\cdots a_{i_{T-1}i_T}b_{i_T}(o_T)$$
$$P(O\mid\lambda)=\sum_{I}P(O\mid I,\lambda)P(I\mid\lambda)$$
However, the sum runs over all $N^T$ state sequences, giving $O(TN^T)$ time, which is computationally infeasible.

The Forward Algorithm

Define the forward variable
$$\alpha_t(i)=P(o_1,o_2,\dots,o_t,i_t=q_i\mid\lambda)$$
Algorithm
Input: hidden Markov model $\lambda$; observation sequence $O$
Output: the observation sequence probability $P(O\mid\lambda)$
$(1)$ Initialization
$$\alpha_1(i)=\pi_ib_i(o_1)$$
$(2)$ Recursion, for $t=1,2,\dots,T-1$
$$\alpha_{t+1}(i)=\Big[\sum\limits_{j=1}^N\alpha_t(j)a_{ji}\Big]b_i(o_{t+1})$$
$(3)$ Termination
$$P(O\mid\lambda)=\sum\limits_{i=1}^N\alpha_T(i)$$
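The three steps translate directly into Python (a sketch with 0-based indices; the name `forward` is my own). On the three-box ball-drawing example commonly used with this chapter, with A = [[0.5,0.2,0.3],[0.3,0.5,0.2],[0.2,0.3,0.5]], B = [[0.5,0.5],[0.4,0.6],[0.7,0.3]], pi = [0.2,0.4,0.4] and O = (red, white, red) encoded as [0,1,0], it gives P(O|λ) ≈ 0.13022:

```python
def forward(A, B, pi, obs):
    """Forward algorithm: returns (P(O|lambda), alpha), where alpha[t][i]
    is the forward variable alpha_{t+1}(q_{i+1}) in 1-based notation."""
    N = len(A)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]   # (1) initialization
    for t in range(1, len(obs)):                          # (2) recursion
        prev = alpha[-1]
        alpha.append([sum(prev[j] * A[j][i] for j in range(N)) * B[i][obs[t]]
                      for i in range(N)])
    return sum(alpha[-1]), alpha                          # (3) termination
```

Each time step costs $O(N^2)$, so the total cost is $O(TN^2)$ instead of $O(TN^T)$.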

The Backward Algorithm

Define the backward variable
$$\beta_t(i)=P(o_{t+1},o_{t+2},\dots,o_T\mid i_t=q_i,\lambda)$$
Algorithm
Input: hidden Markov model $\lambda$; observation sequence $O$
Output: the observation sequence probability $P(O\mid\lambda)$
$(1)$ Initialization
$$\beta_T(i)=1$$
$(2)$ Recursion, for $t=T-1,T-2,\dots,1$
$$\beta_t(i)=\sum\limits_{j=1}^Na_{ij}b_j(o_{t+1})\beta_{t+1}(j)$$
$(3)$ Termination
$$P(O\mid\lambda)=\sum\limits_{i=1}^N\pi_ib_i(o_1)\beta_1(i)$$
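A matching sketch of the backward recursion (same 0-based conventions; `backward` is my own name). Run on the same model and observation sequence, it must return the same $P(O\mid\lambda)$ as the forward pass:

```python
def backward(A, B, pi, obs):
    """Backward algorithm: returns (P(O|lambda), beta), where beta[t][i]
    is the backward variable beta_{t+1}(q_{i+1}) in 1-based notation."""
    N, T = len(A), len(obs)
    beta = [[1.0] * N]                                    # (1) beta_T(i) = 1
    for t in range(T - 2, -1, -1):                        # (2) recursion, t = T-1..1
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * nxt[j] for j in range(N))
                        for i in range(N)])
    # (3) termination
    prob = sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(N))
    return prob, beta
```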

Computing Some Probabilities and Expectations

1. Given the model $\lambda$ and observation sequence $O$, the probability of being in state $q_i$ at time $t$:
$$\gamma_t(i)=P(i_t=q_i\mid O,\lambda)=\frac{P(i_t=q_i,O\mid\lambda)}{P(O\mid\lambda)}$$
Since
$$\alpha_t(i)\beta_t(i)=P(i_t=q_i,O\mid\lambda)$$
we obtain
$$\gamma_t(i)=\frac{\alpha_t(i)\beta_t(i)}{\sum\limits_{j=1}^N\alpha_t(j)\beta_t(j)}$$
2. Given $\lambda$ and $O$, the probability of being in state $q_i$ at time $t$ and in state $q_j$ at time $t+1$:
$$\xi_t(i,j)=P(i_t=q_i,i_{t+1}=q_j\mid O,\lambda)=\frac{P(i_t=q_i,i_{t+1}=q_j,O\mid\lambda)}{\sum\limits_{i=1}^N\sum\limits_{j=1}^NP(i_t=q_i,i_{t+1}=q_j,O\mid\lambda)}$$
$$=\frac{\alpha_t(i)a_{ij}b_j(o_{t+1})\beta_{t+1}(j)}{\sum\limits_{i=1}^N\sum\limits_{j=1}^N\alpha_t(i)a_{ij}b_j(o_{t+1})\beta_{t+1}(j)}$$
3. From these we derive:

  • the expected number of times state $i$ appears, given $O$:
    $$\sum\limits_{t=1}^T\gamma_t(i)$$
  • the expected number of transitions out of state $i$, given $O$:
    $$\sum\limits_{t=1}^{T-1}\gamma_t(i)$$
  • the expected number of transitions from state $i$ to state $j$, given $O$:
    $$\sum\limits_{t=1}^{T-1}\xi_t(i,j)$$
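Given precomputed forward and backward tables `alpha[t][i]` and `beta[t][i]` (0-based, as produced by any forward/backward implementation), $\gamma$ and $\xi$ can be tabulated in a few lines (a sketch; the function name is my own). By construction each $\gamma_t$ row and each $\xi_t$ table sums to 1:

```python
def gamma_xi(A, B, alpha, beta, obs):
    """Compute gamma[t][i] = gamma_{t+1}(q_{i+1}) and
    xi[t][i][j] = xi_{t+1}(q_{i+1}, q_{j+1}) from forward/backward tables."""
    N, T = len(A), len(obs)
    denom = sum(alpha[-1])                                # P(O | lambda)
    gamma = [[alpha[t][i] * beta[t][i] / denom for i in range(N)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / denom
            for j in range(N)] for i in range(N)]
          for t in range(T - 1)]
    return gamma, xi
```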

Learning Algorithms

Supervised Learning

Suppose we are given labeled training data $\{(O_1,I_1),(O_2,I_2),\dots,(O_S,I_S)\}$; the parameters can then be obtained by maximum likelihood estimation.
1. Transition probability estimate
$$\hat{a}_{ij}=\frac{A_{ij}}{\sum\limits_{j=1}^NA_{ij}}$$
where $A_{ij}$ is the number of transitions from state $i$ to state $j$ in the sample.
2. Observation probability estimate
$$\hat{b}_j(k)=\frac{B_{jk}}{\sum\limits_{k=1}^MB_{jk}}$$
where $B_{jk}$ is the number of times observation $v_k$ is emitted from state $q_j$.
3. The initial state probability $\hat{\pi}_i$ is estimated as the relative frequency of $q_i$ being the initial state among the $S$ samples.
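With labeled $(O, I)$ pairs, all three estimates reduce to counting and normalizing (a minimal sketch under my own naming; the uniform fallback for a state that never appears is my own assumption, since its row of counts is all zeros):

```python
def supervised_estimate(pairs, N, M):
    """MLE of (A, B, pi) from labeled pairs [(obs, states), ...] of
    equal-length 0-based index sequences, with N states and M symbols."""
    A = [[0] * N for _ in range(N)]   # transition counts A_ij
    B = [[0] * M for _ in range(N)]   # emission counts B_jk
    pi = [0] * N                      # initial-state counts
    for obs, states in pairs:
        pi[states[0]] += 1
        for t in range(len(states) - 1):
            A[states[t]][states[t + 1]] += 1
        for o, s in zip(obs, states):
            B[s][o] += 1

    def norm(row):
        # Normalize counts to probabilities; unseen rows fall back to uniform.
        s = sum(row)
        return [c / s for c in row] if s else [1 / len(row)] * len(row)

    return [norm(r) for r in A], [norm(r) for r in B], norm(pi)
```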

The Baum-Welch Algorithm

1. Determine the log-likelihood of the complete data:
$$\log P(O,I\mid\lambda)$$
2. The $Q$ function:
$$Q(\lambda,\hat{\lambda})=\sum\limits_I\log P(O,I\mid\lambda)\,P(O,I\mid\hat{\lambda})$$
Since
$$P(O,I\mid\lambda)=\pi_{i_1}b_{i_1}(o_1)a_{i_1i_2}b_{i_2}(o_2)\cdots a_{i_{T-1}i_T}b_{i_T}(o_T)$$
the $Q$ function splits into three terms:
$$Q(\lambda,\hat{\lambda})=\sum\limits_I\log\pi_{i_1}\,P(O,I\mid\hat{\lambda})+\sum\limits_I\Big(\sum\limits_{t=1}^{T-1}\log a_{i_ti_{t+1}}\Big)P(O,I\mid\hat{\lambda})+\sum\limits_I\Big(\sum\limits_{t=1}^T\log b_{i_t}(o_t)\Big)P(O,I\mid\hat{\lambda})$$
3. Maximization
$(1)$ The first term:
$$\sum\limits_I\log\pi_{i_1}P(O,I\mid\hat{\lambda})=\sum\limits_{i=1}^N\log\pi_i\,P(O,i_1=i\mid\hat{\lambda})$$
Under the constraint $\sum\limits_{i=1}^N\pi_i=1$, the Lagrangian is
$$\sum\limits_{i=1}^N\log\pi_i\,P(O,i_1=i\mid\hat{\lambda})+\gamma\Big(\sum\limits_{i=1}^N\pi_i-1\Big)$$
Setting the derivative with respect to $\pi_i$ to zero gives
$$P(O,i_1=i\mid\hat{\lambda})+\gamma\pi_i=0$$
Summing over $i$ yields $\gamma=-P(O\mid\hat{\lambda})$, so
$$\pi_i=\frac{P(O,i_1=i\mid\hat{\lambda})}{P(O\mid\hat{\lambda})}$$
$(2)$ The second term:
$$\sum\limits_I\Big(\sum\limits_{t=1}^{T-1}\log a_{i_ti_{t+1}}\Big)P(O,I\mid\hat{\lambda})=\sum\limits_{i=1}^N\sum\limits_{j=1}^N\sum\limits_{t=1}^{T-1}\log a_{ij}\,P(O,i_t=i,i_{t+1}=j\mid\hat{\lambda})$$
With the constraint $\sum\limits_{j=1}^Na_{ij}=1$, the same Lagrange-multiplier argument gives
$$a_{ij}=\frac{\sum\limits_{t=1}^{T-1}P(O,i_t=i,i_{t+1}=j\mid\hat{\lambda})}{\sum\limits_{t=1}^{T-1}P(O,i_t=i\mid\hat{\lambda})}$$
$(3)$ The third term:
$$\sum\limits_I\Big(\sum\limits_{t=1}^T\log b_{i_t}(o_t)\Big)P(O,I\mid\hat{\lambda})=\sum\limits_{j=1}^N\sum\limits_{t=1}^T\log b_j(o_t)\,P(O,i_t=j\mid\hat{\lambda})$$
Similarly, under the constraint $\sum\limits_{k=1}^Mb_j(k)=1$,
$$b_j(k)=\frac{\sum\limits_{t=1}^TP(O,i_t=j\mid\hat{\lambda})\,I(o_t=v_k)}{\sum\limits_{t=1}^TP(O,i_t=j\mid\hat{\lambda})}$$

Baum-Welch Parameter Estimation Formulas

$$a_{ij}=\frac{\sum\limits_{t=1}^{T-1}\xi_t(i,j)}{\sum\limits_{t=1}^{T-1}\gamma_t(i)}$$
$$b_j(k)=\frac{\sum\limits_{t=1,\,o_t=v_k}^{T}\gamma_t(j)}{\sum\limits_{t=1}^{T}\gamma_t(j)}$$
$$\pi_i=\gamma_1(i)$$
Algorithm
Input: observation data $O=(o_1,o_2,\dots,o_T)$
Output: hidden Markov model parameters $\lambda=(A,B,\pi)$
$(1)$ Initialization: for $n=0$, choose starting values $\lambda^{(0)}=(A^{(0)},B^{(0)},\pi^{(0)})$
$(2)$ Recursion: compute $\gamma_t(i)$ and $\xi_t(i,j)$ under $\lambda^{(n)}$, then update
$$a_{ij}^{(n+1)}=\frac{\sum\limits_{t=1}^{T-1}\xi_t(i,j)}{\sum\limits_{t=1}^{T-1}\gamma_t(i)}$$
$$b_j(k)^{(n+1)}=\frac{\sum\limits_{t=1,\,o_t=v_k}^{T}\gamma_t(j)}{\sum\limits_{t=1}^{T}\gamma_t(j)}$$
$$\pi_i^{(n+1)}=\gamma_1(i)$$
$(3)$ Terminate when a convergence criterion is met, and output $\lambda^{(n+1)}$
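The whole loop, a forward/backward E-step followed by the three update formulas, fits in one short function (a self-contained sketch under my own naming: a single observation sequence, a fixed iteration count in place of a convergence test, and no probability scaling, so it will underflow for long sequences):

```python
def baum_welch(obs, N, M, A, B, pi, n_iter=50):
    """EM loop of Baum-Welch for one 0-based observation sequence.

    A, B, pi are the initial guesses lambda^(0); returns the re-estimated
    parameters after n_iter iterations.
    """
    T = len(obs)
    for _ in range(n_iter):
        # E-step: forward and backward variables under the current lambda.
        alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
        for t in range(1, T):
            alpha.append([sum(alpha[-1][j] * A[j][i] for j in range(N)) * B[i][obs[t]]
                          for i in range(N)])
        beta = [[1.0] * N]
        for t in range(T - 2, -1, -1):
            beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * beta[0][j] for j in range(N))
                            for i in range(N)])
        P = sum(alpha[-1])
        gamma = [[alpha[t][i] * beta[t][i] / P for i in range(N)] for t in range(T)]
        xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / P
                for j in range(N)] for i in range(N)] for t in range(T - 1)]
        # M-step: the three re-estimation formulas above.
        A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1)) for j in range(N)]
             for i in range(N)]
        B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T)) for k in range(M)]
             for j in range(N)]
        pi = gamma[0][:]
    return A, B, pi
```

Each iteration is guaranteed not to decrease $P(O\mid\lambda)$, which is the EM monotonicity property underlying the derivation above.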

Prediction Algorithms

The Approximation Algorithm

At each time $t$, choose the individually most likely state:
$$i_t^*=\argmax\limits_{1\le i\le N}[\gamma_t(i)]$$
This is simple, but it ignores dependencies between successive states: the resulting sequence is not guaranteed to be the most likely path overall, and may even contain transitions of probability zero.

The Viterbi Algorithm

Input: model $\lambda$ and observation sequence $O$
Output: the optimal path $I^*=(i_1^*,i_2^*,\dots,i_T^*)$
$(1)$ Initialization
$$\delta_1(i)=\pi_ib_i(o_1)$$
$$\psi_1(i)=0$$
$(2)$ Recursion, for $t=2,3,\dots,T$
$$\delta_t(i)=\max\limits_{1\le j\le N}[\delta_{t-1}(j)a_{ji}]\,b_i(o_t)$$
$$\psi_t(i)=\argmax\limits_{1\le j\le N}[\delta_{t-1}(j)a_{ji}]$$
$(3)$ Termination
$$P^*=\max\limits_{1\le i\le N}\delta_T(i)$$
$$i_T^*=\argmax\limits_{1\le i\le N}[\delta_T(i)]$$
$(4)$ Backtrack the optimal path: for $t=T-1,T-2,\dots,1$,
$$i_t^*=\psi_{t+1}(i_{t+1}^*)$$
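The four steps can be sketched compactly in Python (0-based indices; names are my own). On the standard three-box example, A = [[0.5,0.2,0.3],[0.3,0.5,0.2],[0.2,0.3,0.5]], B = [[0.5,0.5],[0.4,0.6],[0.7,0.3]], pi = [0.2,0.4,0.4] and O = [0,1,0], it returns P* = 0.0147 with path [2, 2, 2], i.e. state 3 at every step:

```python
def viterbi(A, B, pi, obs):
    """Viterbi algorithm: returns (P*, optimal 0-based state path)."""
    N, T = len(A), len(obs)
    delta = [pi[i] * B[i][obs[0]] for i in range(N)]      # (1) initialization
    psi = []                                              # best predecessors per step
    for t in range(1, T):                                 # (2) recursion
        d, back = [], []
        for i in range(N):
            cand = [delta[j] * A[j][i] for j in range(N)]
            j_best = max(range(N), key=cand.__getitem__)
            back.append(j_best)
            d.append(cand[j_best] * B[i][obs[t]])
        delta = d
        psi.append(back)
    best = max(range(N), key=delta.__getitem__)           # (3) termination
    path = [best]
    for back in reversed(psi):                            # (4) backtracking
        path.insert(0, back[path[0]])
    return delta[best], path
```

Unlike the approximation algorithm, this dynamic program maximizes over whole paths, so the returned sequence is globally optimal.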
