Hidden Markov Models (model learning, probability computation, and decoding)

The Communication Model

When a sender (a person or machine) transmits information, the signal must travel through a medium (air or wire); in a broad sense this is encoding. The receiver reconstructs the sender's information from the signal according to agreed rules; in a broad sense this is decoding.

Speech recognition is the process by which the receiver recovers the sender's information from the received signal. How do we infer the source information $s_1,s_2,\cdots$ from the observed signals $o_1,o_2,\cdots$? From a probabilistic viewpoint, we look among all candidate source messages for the one most likely to have produced the observed signals.

By Bayes' theorem,
$$P(s_1,s_2,\cdots|o_1,o_2,\cdots)=\frac{P(o_1,o_2,\cdots|s_1,s_2,\cdots)\,P(s_1,s_2,\cdots)}{P(o_1,o_2,\cdots)}$$

Once the observations $o_1,o_2,\cdots$ have been produced they do not change, so $P(o_1,o_2,\cdots)$ is a constant, and the most likely source information is
$$s_1,s_2,\cdots =\arg\max_{s_1,s_2,\cdots}P(s_1,s_2,\cdots|o_1,o_2,\cdots)= \arg\max_{s_1,s_2,\cdots}P(o_1,o_2,\cdots|s_1,s_2,\cdots)\,P(s_1,s_2,\cdots)$$

This formula can be solved with a hidden Markov model.

The Markov Assumption and Markov Processes

Suppose the observed sequence $s_1,s_2,\cdots,s_t,\cdots$ is the sequence of daily maximum temperatures, with $s_t$ the temperature random variable. Assume the distribution of state $s_t$ depends only on the previous state (today's maximum temperature depends only on yesterday's):
$$P(s_t|s_1,s_2,\cdots,s_{t-1})=P(s_t|s_{t-1})$$

This is the Markov assumption, and a stochastic process satisfying it is a Markov process (a directed graph, i.e. a Bayesian network).

[Figure: state transition diagram over states $m_1, m_2, m_3, m_4$; the edge probabilities shown include 1.0, 0.6, 0.3, 0.4, 0.7.]

Pick an initial state at random, then generate subsequent states according to the transition rules; after time $T$ this yields a state sequence $s_1,\cdots,s_T$. If the chain runs long enough, the transition probability from $m_i$ to $m_j$ can be estimated as $\#(m_i,m_j)/\#(m_i)$, the count of transitions $m_i \to m_j$ divided by the count of visits to $m_i$.
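As a minimal sketch of this counting estimate (the 4-state transition matrix below is made up for illustration, since the diagram's exact probabilities are not recoverable), we can simulate a chain and recover the transition matrix from pair counts:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4-state transition matrix (rows sum to 1)
P = np.array([[0.10, 0.60, 0.30, 0.00],
              [0.20, 0.20, 0.20, 0.40],
              [0.50, 0.10, 0.10, 0.30],
              [0.25, 0.25, 0.25, 0.25]])

T = 50_000
states = np.empty(T, dtype=int)
states[0] = 0                      # arbitrary initial state
for t in range(1, T):
    states[t] = rng.choice(4, p=P[states[t - 1]])

# Estimate a_ij by counts: #(m_i, m_j) / #(m_i)
counts = np.zeros((4, 4))
for i, j in zip(states[:-1], states[1:]):
    counts[i, j] += 1
P_hat = counts / counts.sum(axis=1, keepdims=True)

print(np.abs(P_hat - P).max())     # shrinks as T grows
```

The estimate converges to the true matrix as the chain length grows, which is the law-of-large-numbers intuition behind the counting formula.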

Hidden Markov Models and the Communication Model

A hidden Markov model describes a process in which a Markov chain generates an unobservable state sequence, which in turn generates an observation sequence. The hidden state sequence $s_1,s_2,\cdots$ is a standard Markov chain; because it is hidden, the model is called a "hidden" Markov model.

The two assumptions of an HMM:

  • Independent output assumption: the observation $o_t$ emitted at time $t$ depends only on the hidden state $s_t$:
    $$P(o_t|s_1,\cdots,s_{t},o_1,\cdots,o_{t-1})=P(o_t|s_{t})$$

  • Markov assumption: the hidden state $s_t$ at time $t$ depends only on the previous hidden state $s_{t-1}$:
    $$P(s_t|s_1,\cdots,s_{t-1},o_1,\cdots,o_{t-1})=P(s_t|s_{t-1})$$

By the Markov assumption and the independent output assumption, the joint probability of the state sequence and the observation sequence (a generative model) is
$$P(s_1,s_2,\cdots,o_1,o_2,\cdots)=\prod_t P(s_t|s_{t-1})\cdot P(o_t|s_t)$$

The communication decoding problem can therefore be solved with an HMM: the Viterbi algorithm finds the maximum of the probability above, and with it the most likely hidden states.


HMM Model Representation

Let the hidden state set be $M = \{m_1,\cdots, m_N\}$, the observation set $V = \{v_1, \cdots, v_M\}$, the hidden state sequence $S = (s_1, \cdots, s_T)$, and the observation sequence $O = (o_1, \cdots, o_T)$.

I. State transition matrix
If the hidden state at time $t$ is $m_i$ and at time $t+1$ is $m_j$, the transition probability from time $t$ to time $t+1$ is
$$a_{ij} = P(s_{t+1} = m_j | s_t = m_i), \quad i,j = 1, 2, \cdots, N$$

The state transition matrix is $A = [a_{ij}]_{N \times N}$.

II. Observation probability matrix
If the hidden state at time $t$ is $m_j$, the probability of emitting observation $v_k$ from $m_j$ is
$$b_j(k) = P(o_t = v_k | s_t = m_j), \quad k = 1,2,\cdots, M;\ j = 1, 2, \cdots, N$$

The observation probability matrix is $B = [b_j(k)]_{N\times M}$.

III. Initial state probability vector
The probability of starting in state $m_i$ at time $t=1$ is
$$\pi_i = P(s_1 = m_i), \quad i = 1, 2, \cdots, N$$

The initial state probability vector is $\Pi = (\pi_i)$.

In summary, $\Pi$ and $A$ determine the state sequence and $B$ determines the observation sequence, so an HMM is represented by the triple
$$\lambda=(A,B,\Pi)$$


Example: suppose there are 3 boxes, each containing red and white balls:

Box          1    2    3
Red balls    5    4    7
White balls  5    6    3

Following the initial distribution, pick a box at random and draw one ball from it (with replacement); then move to the next box according to the transition probabilities. For box 1, for example,
$$P(X=1|X=1)=0.5,\quad P(X=2|X=1)=0.2,\quad P(X=3|X=1)=0.3$$

Repeating this 5 times yields an observation sequence of ball colors
$$O = (\text{red}, \text{red}, \text{white}, \text{white}, \text{red})$$

In this example the sequence of boxes is the hidden state sequence, and the sequence of ball colors is the known observation sequence. The three HMM components are
$$A = \begin{bmatrix} 0.5 & 0.2 & 0.3 \\ 0.3 & 0.5 & 0.2 \\ 0.2 & 0.3 & 0.5 \end{bmatrix},\quad B = \begin{bmatrix} 0.5 & 0.5 \\ 0.4 & 0.6 \\ 0.7 & 0.3 \end{bmatrix},\quad \Pi=(0.2, 0.4, 0.4)^T$$


HMM Probability Computation

Problem: given the model $\lambda=(A,B,\Pi)$ and an observation sequence $O = (o_1, o_2, \cdots, o_T)$, compute the probability of $O$ under $\lambda$, i.e. $P(O|\lambda)$.

Can we compute this by enumeration? Enumerate every state sequence $S = (s_1, s_2, \cdots, s_T)$, compute the joint probability $P(O, S|\lambda)$ of $S$ and the observation sequence $O$, and sum:
$$P(O|\lambda) = \sum_S P(O, S|\lambda) = \sum_{S}P(O|S, \lambda)\,P(S|\lambda)$$

There are $N^T$ possible hidden state sequences, so direct computation costs $O(TN^T)$, which is infeasible for models with more than a few hidden states.
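Direct enumeration is still a useful cross-check on toy instances. A minimal sketch, using the parameters of the box-and-ball example above and enumerating all $N^T = 27$ state sequences:

```python
import itertools
import numpy as np

# Parameters of the box-and-ball example (0 = red, 1 = white)
PI = np.array([0.2, 0.4, 0.4])
A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
O = [0, 1, 0]

# Sum P(O, S | lambda) over every hidden state sequence S: O(T * N**T)
total = 0.0
for S in itertools.product(range(3), repeat=len(O)):
    p = PI[S[0]] * B[S[0], O[0]]                 # pi_{s1} * b_{s1}(o1)
    for t in range(1, len(O)):
        p *= A[S[t - 1], S[t]] * B[S[t], O[t]]   # a_{s_{t-1} s_t} * b_{s_t}(o_t)
    total += p

print(total)  # ≈ 0.130218
```

The result agrees with the forward algorithm below, but the running time explodes as $T$ or $N$ grows.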

Forward Recursion

The forward algorithm is a dynamic programming algorithm: it defines a local quantity, the forward probability, derives a recursion for it, and extends solutions of subproblems to the full problem. Given the model $\lambda$, the forward probability is the probability of observing $o_1, \cdots, o_t$ and being in hidden state $s_t=q_i$ at time $t$:
$$\alpha_t(i) = P(o_1,\cdots,o_t,s_t=q_i|\lambda),\quad P(O|\lambda) = \sum_{i}\alpha_T(i),\quad \alpha_1(i) = \pi_i\, b_i(o_1)$$

By the homogeneous Markov and observation independence assumptions, the forward recursion is
$$\begin{aligned} \alpha_{t+1}(i) &=P(o_1,\cdots,o_t,o_{t+1},s_{t+1}=q_i|\lambda)\\ &=\sum_j P(o_1,\cdots,o_t,o_{t+1},s_t=q_j,s_{t+1}=q_i|\lambda)\\ &=\sum_j P(s_{t+1}=q_i,o_{t+1}|o_1,\cdots,o_t,s_t=q_j,\lambda)\,P(o_1,\cdots,o_t,s_t=q_j|\lambda)\\ &=\sum_j P(s_{t+1}=q_i,o_{t+1}|s_t=q_j,\lambda)\,\alpha_t(j)\\ &=\sum_j P(o_{t+1}|s_t=q_j,s_{t+1}=q_i,\lambda)\,P(s_{t+1}=q_i|s_t=q_j,\lambda)\,\alpha_t(j)\\ &=\Big[\sum_{j} \alpha_t(j)\, a_{ji}\Big]\, b_i(o_{t+1}) \end{aligned}$$

Computing $P(O|\lambda)$ recursively along the path structure of the state sequences saves the solutions of subproblems, avoiding repeated work and speeding up the computation.

In matrix form, $\boldsymbol\alpha_1=\boldsymbol\pi\odot\boldsymbol B_{o_1}$ and $\boldsymbol\alpha_{t+1}=(\boldsymbol\alpha_t^T A)\odot\boldsymbol B_{o_{t+1}}$, where $\boldsymbol B_{o}$ is the column of $B$ for observation $o$. Iterating to $\alpha_T(i)$ gives
$$P(O|\lambda)=\sum_{i}\alpha_T(i)$$

With $N$ hidden states and an observation sequence of length $T$, computing $P(O|\lambda)$ this way takes $O(N^2T)$ time.


Python Implementation

import numpy as np


def forward_HMM(O, PI, A, B):
    """
    Forward algorithm: probability of an observation sequence under a known model.

    :param O: 1D, observation sequence (integer symbol indices)
    :param PI: 1D, initial state probability vector
    :param A: 2D, state transition matrix
    :param B: 2D, observation (emission) probability matrix
    :return: float, probability of O
    """
    PI = np.asarray(PI).ravel()
    A = np.asarray(A)
    B = np.asarray(B)

    # Forward probabilities at step 1: alpha_1(i) = pi_i * b_i(o_1)
    alphas = B[:, O[0]] * PI

    # Forward probabilities for steps 2..T
    for index in O[1:]:
        alphas = np.dot(alphas, A) * B[:, index]

    # Sum the forward probabilities over all final hidden states
    return alphas.sum()

if __name__ == '__main__':
    # Initial state probability vector
    PI = [0.2, 0.4, 0.4]
    # State transition matrix, N x N for N hidden states
    A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
    # Observation probability matrix, N x M for M observation symbols
    B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
    # Observation sequence (0 = red, 1 = white)
    O = [0, 1, 0]

    print(forward_HMM(O, PI, A, B))  # ≈ 0.130218

Backward Recursion

Given the model $\lambda$, the backward probability is the probability of observing $o_{t+1}, \cdots, o_T$ after time $t$, given that the hidden state at time $t$ is $q_i$:
$$\beta_t(i) = P(o_{t+1},o_{t+2},\cdots,o_T|s_t = q_i, \lambda),\quad P(O|\lambda) = \sum_{i}\pi_i\, b_i(o_1)\, \beta_1(i),\quad \beta_{T}(i) = 1$$

By the homogeneous Markov and observation independence assumptions, the backward recursion is
$$\begin{aligned}\beta_t(i) & = \sum_{j}P(o_{t+1},\cdots,o_T,s_{t+1}=q_j|s_t = q_i, \lambda) \\ & = \sum_{j}P(o_{t+1},\cdots,o_T|s_t = q_i,s_{t+1}=q_j, \lambda)\cdot P(s_{t+1}=q_j|s_t =q_i, \lambda) \\ & = \sum_{j}a_{ij}\cdot P(o_{t+1},\cdots,o_T| s_{t+1}=q_j, \lambda)\\ & = \sum_{j}a_{ij}\cdot P(o_{t+1}|o_{t+2},\cdots,o_T,s_{t+1}=q_j,\lambda)\cdot P(o_{t+2}, \cdots, o_T|s_{t+1}=q_j, \lambda)\\ & = \sum_{j}a_{ij}\cdot P(o_{t+1}|s_{t+1}=q_j,\lambda)\cdot P(o_{t+2},\cdots, o_T|s_{t+1}=q_j, \lambda) \\ & = \sum_{j}a_{ij}\cdot b_j(o_{t+1})\cdot \beta_{t+1}(j) \end{aligned}$$
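The backward recursion can be sketched in the same style as forward_HMM above; it should return the same P(O|λ) via the sum over initial states:

```python
import numpy as np

def backward_HMM(O, PI, A, B):
    """Backward algorithm: P(O | lambda) via beta_t(i)."""
    PI, A, B = np.asarray(PI).ravel(), np.asarray(A), np.asarray(B)
    betas = np.ones(A.shape[0])                # beta_T(i) = 1
    for index in reversed(O[1:]):              # t = T-1, ..., 1
        # beta_t(i) = sum_j a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
        betas = np.dot(A, B[:, index] * betas)
    # P(O | lambda) = sum_i pi_i * b_i(o_1) * beta_1(i)
    return (PI * B[:, O[0]] * betas).sum()

PI = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
print(backward_HMM([0, 1, 0], PI, A, B))  # ≈ 0.130218, same as the forward pass
```

Agreement between the forward and backward results is a standard sanity check when implementing both recursions.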


Relation Between the Forward and Backward Algorithms

$$\begin{aligned} P(O|\lambda) & = \sum_{i}P(o_1, \cdots, o_t, s_t=q_i, o_{t+1}, \cdots, o_T|\lambda)\\ & = \sum_{i}P(o_{t+1}, \cdots, o_T | o_1, \cdots, o_t , s_t= q_i, \lambda)\cdot P(o_1, \cdots,o_t,s_t=q_i |\lambda) \\ & = \sum_{i}P(o_{t+1}, \cdots, o_T|s_t=q_i, \lambda)\cdot P(o_1, \cdots, o_t, s_t=q_i | \lambda) \\ & = \sum_{i}\alpha_t(i)\,\beta_t(i)=\sum_i P(s_t=q_i, O|\lambda) \end{aligned}$$

At $t=T$ and $t=1$, this identity reduces to the forward and backward formulas for $P(O|\lambda)$, respectively.

Some Probability Quantities

Given the model $\lambda$ and observation sequence $O$, the probability of being in state $q_i$ at time $t$ is
$$\gamma_t(i) = P(s_t =q_i | O, \lambda) = \frac{P(s_t=q_i,O | \lambda)}{P(O|\lambda)}=\frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j}\alpha_t(j)\,\beta_t(j)}$$

Given the model $\lambda$ and observation sequence $O$, the probability of being in state $q_i$ at time $t$ and in state $q_j$ at time $t+1$ is
$$\xi_t(i, j) = P(s_t=q_i, s_{t+1}=q_j|O, \lambda) = \frac{P(s_t=q_i, s_{t+1}=q_j,O| \lambda)}{\sum_i\sum_j P(s_t=q_i, s_{t+1}=q_j, O|\lambda)}$$

where $P(s_t=q_i, s_{t+1}=q_j, O|\lambda)=\alpha_t(i)\,a_{ij}\,b_j(o_{t+1})\,\beta_{t+1}(j)$.
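The quantities γ and ξ can be computed directly from full tables of forward and backward probabilities. A sketch with our own (hypothetical) helper names:

```python
import numpy as np

def forward_backward_tables(O, PI, A, B):
    """Full tables alpha[t, i] and beta[t, i] for t = 0..T-1."""
    PI, A, B = np.asarray(PI).ravel(), np.asarray(A), np.asarray(B)
    T, N = len(O), A.shape[0]
    alpha, beta = np.zeros((T, N)), np.zeros((T, N))
    alpha[0] = PI * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = alpha[t - 1] @ A * B[:, O[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    return alpha, beta

def gamma_xi(O, PI, A, B):
    A, B = np.asarray(A), np.asarray(B)
    alpha, beta = forward_backward_tables(O, PI, A, B)
    prob = alpha[-1].sum()                       # P(O | lambda)
    gamma = alpha * beta / prob                  # gamma_t(i)
    # xi_t(i, j) = alpha_t(i) * a_ij * b_j(o_{t+1}) * beta_{t+1}(j) / P(O|lambda)
    xi = (alpha[:-1, :, None] * A[None] *
          (B[:, O[1:]].T * beta[1:])[:, None, :]) / prob
    return gamma, xi

PI = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
gamma, xi = gamma_xi([0, 1, 0], PI, A, B)
```

Useful invariants: each row of γ sums to 1, each ξ_t sums to 1 over (i, j), and summing ξ_t over j recovers γ_t.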


HMM Model Learning

Problem: given an observation sequence $O = (o_1, o_2, \cdots, o_T)$, find the most likely HMM parameters $\lambda=(A,B,\Pi)$.

Supervised learning

If enough labeled data is available, i.e. we can count how often hidden state $m_j$ occurs, $\#(m_j)$, and how often it emits observation $v_k$, $\#(v_k, m_j)$, then the parameters can be estimated as
$$a_{ij}\approx\frac{\#(m_i,m_j)}{\#(m_i)},\quad b_j(k)\approx\frac{\#(v_k,m_j)}{\#(m_j)},\quad \pi_i\approx\frac{\#(m_i)}{\sum_k \#(m_k)}$$

Many applications cannot provide such labels. In training the acoustic model for speech recognition, for example, no human can specify the state sequence that produced a given utterance.
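The counting estimates above can be sketched directly; the labeled state/observation sequence below is made up purely for illustration:

```python
import numpy as np

# Hypothetical fully labeled data: aligned hidden states and observations
S = [0, 2, 1, 1, 2, 0, 0, 1]   # hidden state labels
O = [1, 0, 0, 1, 0, 1, 0, 0]   # observation symbols
N, M = 3, 2

A_hat = np.zeros((N, N))
B_hat = np.zeros((N, M))
for i, j in zip(S[:-1], S[1:]):
    A_hat[i, j] += 1                           # #(m_i, m_j)
for s, o in zip(S, O):
    B_hat[s, o] += 1                           # #(v_k, m_j)
A_hat /= A_hat.sum(axis=1, keepdims=True)      # a_ij ≈ #(m_i,m_j)/#(m_i)
B_hat /= B_hat.sum(axis=1, keepdims=True)      # b_j(k) ≈ #(v_k,m_j)/#(m_j)

print(A_hat)
print(B_hat)
```

With real data one would also smooth the counts so that unseen transitions do not get probability exactly zero.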

Expectation-Maximization

The HMM probability model is
$$P(O|\lambda)=\sum_S P(O|S, \lambda)\,P(S|\lambda)$$

The Q function of the EM algorithm is
$$Q(\lambda, \lambda')=\sum_S P(S|O,\lambda')\ln P(O,S|\lambda)\propto\sum_S P(O,S|\lambda')\ln P(O,S|\lambda)$$

By the joint distribution of the state and observation sequences (the subscript $i_t$ denotes the hidden state index at time $t$),
$$P(O,S|\lambda)=\pi_{i_1}b_{i_1}(o_1)\,a_{i_1 i_2}b_{i_2}(o_2)\cdots a_{i_{T-1}i_T}b_{i_T}(o_T)$$

so
$$Q(\lambda, \lambda')=\sum_S P(O,S|\lambda')\ln\pi_{i_1}+ \sum_S P(O,S|\lambda')\sum_{t=1}^{T-1}\ln a_{i_t i_{t+1}}+ \sum_S P(O,S|\lambda')\sum_{t=1}^T\ln b_{i_t}(o_t)$$

where
$$\begin{aligned} & \sum_S P(O,S|\lambda')\ln \pi_{i_1}=\sum_i P(O,s_1=q_i|\lambda')\ln\pi_{i},\quad \sum_i\pi_i=1\\ &\sum_S P(O,S|\lambda')\sum_{t=1}^{T-1}\ln a_{i_t i_{t+1}}=\sum_i\sum_j\sum_{t=1}^{T-1}P(O,s_t=q_i,s_{t+1}=q_j|\lambda')\ln a_{ij}\\ & \sum_S P(O,S|\lambda')\sum_{t=1}^{T}\ln b_{i_t}(o_t)=\sum_j\sum_{t=1}^{T}P(O,s_t=q_j|\lambda')\ln b_j(o_t) \end{aligned}$$

Setting the partial derivatives with respect to $\pi_i$, $a_{ij}$, and $b_j(k)$ to zero (under the normalization constraints) and using the probability quantities of the previous section gives
$$\pi_i = \frac{P(O, s_1=q_i|\lambda')}{P(O|\lambda')}=\gamma_1(i),\quad a_{ij}=\frac{\sum_{t=1}^{T-1}\xi_t(i,j)}{\sum_{t=1}^{T-1}\gamma_t(i)},\quad b_j(k)=\frac{\sum_{t=1,\,o_t=v_k}^{T}\gamma_t(j)}{\sum_{t=1}^{T}\gamma_t(j)}$$
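The re-estimation formulas above amount to one EM (Baum-Welch) iteration: an E-step computing γ and ξ, then an M-step applying the closed-form updates. A sketch under our own variable names; repeating the step never decreases P(O|λ):

```python
import numpy as np

def baum_welch_step(O, PI, A, B):
    """One Baum-Welch (EM) update of (PI, A, B) for a single observation sequence."""
    PI, A, B = np.asarray(PI, float).ravel(), np.asarray(A, float), np.asarray(B, float)
    T, N, M = len(O), A.shape[0], B.shape[1]
    # E-step: forward/backward tables
    alpha, beta = np.zeros((T, N)), np.zeros((T, N))
    alpha[0] = PI * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = alpha[t - 1] @ A * B[:, O[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    prob = alpha[-1].sum()
    gamma = alpha * beta / prob
    xi = (alpha[:-1, :, None] * A[None] *
          (B[:, O[1:]].T * beta[1:])[:, None, :]) / prob
    # M-step: closed-form re-estimates
    PI_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros((N, M))
    for k in range(M):
        B_new[:, k] = gamma[np.array(O) == k].sum(axis=0) / gamma.sum(axis=0)
    return PI_new, A_new, B_new

PI = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
O = [0, 1, 0, 0, 1]
PI2, A2, B2 = baum_welch_step(O, PI, A, B)
```

In practice, long sequences require scaled or log-space forward/backward probabilities to avoid underflow; this sketch omits that.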


HMM Prediction / Decoding

Given the model $\lambda=(A,B,\Pi)$ and observation sequence $O = (o_1, o_2, \cdots, o_T)$, find the most likely hidden state sequence $S$, i.e. maximize $P(S|O, \lambda)$.

Greedy approximation

Given $\lambda$ and the observation sequence $O$, the probability of state $q_i$ at time $t$ is
$$\gamma_t(i)=P(s_t=q_i | O, \lambda) = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j}\alpha_t(j)\,\beta_t(j)}$$
Pick the individually most likely state $s_t^*$ at each time $t$ to obtain the state sequence $S^*$:
$$S^*=(s_1^*,s_2^*,\cdots),\quad s_t^*=q_{k^*},\quad k^* = \arg\max_k\gamma_t(k)$$
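A sketch of this greedy (posterior-marginal) decoder, computing forward/backward tables inline:

```python
import numpy as np

def greedy_decode(O, PI, A, B):
    """Pick argmax_i gamma_t(i) at each t independently."""
    PI, A, B = np.asarray(PI).ravel(), np.asarray(A), np.asarray(B)
    T, N = len(O), A.shape[0]
    alpha, beta = np.zeros((T, N)), np.zeros((T, N))
    alpha[0] = PI * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = alpha[t - 1] @ A * B[:, O[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    # gamma_t is proportional to alpha_t * beta_t, so argmax needs no normalization
    return (alpha * beta).argmax(axis=1)

PI = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
print(greedy_decode([0, 1, 0], PI, A, B) + 1)  # per-step most likely boxes
```

Because each step is chosen independently, the resulting sequence can differ from the globally optimal Viterbi path and may even contain transitions of probability zero.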


Viterbi Algorithm

The dynamic programming principle: any sub-path of an optimal path is itself optimal. Define $\delta_t(i)$ as the maximum probability over all state paths that end in $s_t=q_i$ with observations $o_1,\cdots,o_t$:
$$\delta_t(i) = \max_{s_1,\cdots,s_{t-1}}P(s_t=q_i, s_{t-1}, \cdots, s_1, o_t, \cdots, o_1|\lambda)$$

The recursion is
$$\delta_{t+1}(i)=\Big[\max_j\delta_t(j)\,a_{ji}\Big]\, b_i(o_{t+1}),\quad \delta_1(i) = \pi_i\, b_i(o_1)$$

Define $\psi_{t+1}(i)$ as the $t$-th node on the maximum-probability path that is in state $q_i$ at time $t+1$:
$$i_{t} = \psi_{t+1}(i) = \arg\max_{j}\delta_{t}(j)\,a_{ji},\quad i_T=\arg\max_{i}\delta_T(i)$$

The probability of the optimal path is $P^*=\max_{i}\delta_T(i)=\max_S P(S,O|\lambda)$; since $P(O|\lambda)$ is fixed, maximizing it also maximizes $P(S|O,\lambda)$.

For example, $\delta_3(1)=\max\{\delta_2(1)\,a_{11},\ \delta_2(2)\,a_{21},\ \delta_2(3)\,a_{31}\}\, b_1(o_3)$.

Example: using the model $\lambda = (A, B, \Pi)$ from the box-and-ball example above and the observation sequence $O=(\text{red}, \text{white}, \text{red})$, find the optimal state sequence.

I. Initialization
At time $t=1$, for each hidden state $q_i$, the probability of observing red:
$$\delta_1(1)=0.2\times 0.5=0.1, \quad \delta_1(2)=0.4\times 0.4=0.16, \quad \delta_1(3)=0.4\times 0.7=0.28, \quad \psi_1(i)=0$$


II. Iteration
At time $t=2$, the maximum probability of being in state $q_1$ and observing white:
$$\delta_2(1)=\max_{1\leq j \leq 3}[\delta_1(j)\,a_{j1}]\, b_1(o_2) = \max\{0.1\times 0.5,\ 0.16\times 0.3,\ 0.28\times 0.2\}\times 0.5 = 0.028, \quad \psi_2(1)=3$$
Similarly $\delta_2(2)=0.0504,\ \psi_2(2)=3;\ \delta_2(3)=0.042,\ \psi_2(3)=3$.

At time $t=3$, the maximum probability of being in each state $q_j$ and observing red:
$$\delta_3(1)=0.00756,\ \psi_3(1)=2,\quad \delta_3(2)=0.01008,\ \psi_3(2)=2,\quad \delta_3(3)=0.0147,\ \psi_3(3)=3$$


III. Optimal path
$$P^* = \max_{1\leq i \leq 3} \delta_3(i)=0.0147$$
Hence $i_3 = 3$, $i_2 = \psi_3(i_3)=3$, $i_1 = \psi_2(i_2)=3$, and the optimal state sequence is $I=(i_1, i_2, i_3)=(3,3,3)$.
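The recursion and backtracking can be sketched as follows; on the worked example it recovers P* = 0.0147 and the path (3, 3, 3):

```python
import numpy as np

def viterbi(O, PI, A, B):
    """Viterbi decoding: returns (P*, best state path as 0-based indices)."""
    PI, A, B = np.asarray(PI).ravel(), np.asarray(A), np.asarray(B)
    T, N = len(O), A.shape[0]
    delta = PI * B[:, O[0]]               # delta_1(i) = pi_i * b_i(o_1)
    psi = np.zeros((T, N), dtype=int)     # backpointers psi_t(i)
    for t in range(1, T):
        trans = delta[:, None] * A        # trans[j, i] = delta_{t-1}(j) * a_ji
        psi[t] = trans.argmax(axis=0)
        delta = trans.max(axis=0) * B[:, O[t]]
    # Backtrack the optimal path from i_T = argmax_i delta_T(i)
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return delta.max(), path[::-1]

PI = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
p_star, path = viterbi([0, 1, 0], PI, A, B)
print(p_star, [i + 1 for i in path])  # 0.0147 and boxes [3, 3, 3]
```

For longer sequences one works with log δ to avoid underflow, turning the products into sums.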

In what follows, write the hidden state sequence as $\boldsymbol s=(s_1, \cdots, s_n)$ and the observation sequence as $\boldsymbol o=(o_1, \cdots, o_n)$.

Limitations of HMMs

An HMM $\lambda$ models the joint distribution $P(S, O)$; decoding/prediction finds the state sequence $\boldsymbol s$ that maximizes $P(\boldsymbol s|\boldsymbol o, \lambda)$.

In an HMM, $s_i$ depends only on $s_{i-1}$, and $o_i$ depends only on $s_i$. When the observations are described by many features, this is too restrictive: in NER, for example, the tag $s_i$ depends not only on $o_i$ but also on the surrounding observations $o_j\ (j\neq i)$, through features such as the capitalization and part of speech of nearby words. An HMM cannot model such tasks.
