Hidden Markov Model Study Notes (Part 2: Learning Algorithms)

Learning a hidden Markov model can be done by supervised or unsupervised learning, depending on whether the training data contain both observation sequences and state sequences, or observation sequences only. Supervised learning requires labeled training data, and manual annotation is often expensive, so in practice the unsupervised approach is frequently used instead: the Baum-Welch algorithm, which is an instance of the EM algorithm. Before introducing the learning algorithm, we first go over some probability and expectation computations; these form the basis of the Baum-Welch update formulas.

Some Probability and Expectation Computations

Using the forward and backward probabilities, we can derive formulas for probabilities involving a single state and a pair of states.

1. Given the model $\lambda$ and the observation sequence $O$, the probability of being in state $q_i$ at time $t$, denoted

   $$\gamma_t(i) = P(i_t = q_i \mid O, \lambda)$$

   First rewrite it as a fraction:

   $$\gamma_t(i) = \frac{P(i_t = q_i, O \mid \lambda)}{P(O \mid \lambda)} \tag{1}$$

   The forward probability is defined as

   $$\alpha_t(i) = P(o_1, o_2, \ldots, o_t, i_t = q_i \mid \lambda)$$

   and the backward probability as

   $$\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid i_t = q_i, \lambda)$$

   Multiplying the two, and noting that given $i_t = q_i$ the observations after time $t$ are conditionally independent of those up to time $t$, we obtain

$$\alpha_t(i)\,\beta_t(i) = P(i_t = q_i, O \mid \lambda) \tag{2}$$

This result is also easy to see directly from the two definitions. Summing over $i = 1, 2, \ldots, N$:

$$\sum_{i=1}^N P(i_t = q_i, O \mid \lambda) = P(O \mid \lambda) \tag{3}$$

Substituting (2) and (3) into (1) gives

$$\gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^N \alpha_t(j)\,\beta_t(j)} \tag{4}$$
2. Given the model $\lambda$ and the observation sequence $O$, the probability of being in state $q_i$ at time $t$ and in state $q_j$ at time $t+1$, denoted

   $$\xi_t(i,j) = P(i_t = q_i, i_{t+1} = q_j \mid O, \lambda)$$

   It can be computed from the forward and backward probabilities:

   $$\xi_t(i,j) = \frac{P(i_t = q_i, i_{t+1} = q_j, O \mid \lambda)}{P(O \mid \lambda)} = \frac{P(i_t = q_i, i_{t+1} = q_j, O \mid \lambda)}{\sum_{i=1}^N \sum_{j=1}^N P(i_t = q_i, i_{t+1} = q_j, O \mid \lambda)}$$

   The numerator can be expressed with the forward and backward probabilities:

   $$P(i_t = q_i, i_{t+1} = q_j, O \mid \lambda) = \alpha_t(i)\,a_{ij}\,b_j(o_{t+1})\,\beta_{t+1}(j)$$

   so $\xi_t(i,j)$ can be written as

   $$\xi_t(i,j) = \frac{\alpha_t(i)\,a_{ij}\,b_j(o_{t+1})\,\beta_{t+1}(j)}{\sum_{i=1}^N \sum_{j=1}^N \alpha_t(i)\,a_{ij}\,b_j(o_{t+1})\,\beta_{t+1}(j)}$$
3. Summing $\gamma_t(i)$ and $\xi_t(i,j)$ over time yields some useful expected values.
   (1) The expected number of times state $i$ occurs under observation $O$:
   $$\sum_{t=1}^{T} \gamma_t(i)$$
   i.e. the probabilities of being in state $i$ at each time step, added up.
   (2) The expected number of transitions out of state $i$ under observation $O$:
   $$\sum_{t=1}^{T-1} \gamma_t(i)$$
   A transition out of state $i$ can only occur at times $1, 2, \ldots, T-1$, so this sum has one fewer term (time $T$) than the previous one.
   (3) The expected number of transitions from state $i$ to state $j$ under observation $O$:
   $$\sum_{t=1}^{T-1} \xi_t(i,j)$$
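The quantities above translate directly into NumPy. Below is a minimal sketch (the function name `compute_gamma_xi` and the array layout, states along rows and time along columns, are my own assumptions, not from the original post):

```python
import numpy as np

def compute_gamma_xi(alpha, beta, A, B, observations):
    """Compute gamma_t(i) and xi_t(i, j) from forward/backward probabilities.

    alpha, beta: (N, T) forward/backward matrices
    A: (N, N) transition matrix, B: (N, M) emission matrix
    observations: length-T integer array of observed symbols
    """
    N, T = alpha.shape
    # Equation (4): gamma_t(i) = alpha_t(i) beta_t(i) / sum_j alpha_t(j) beta_t(j)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=0, keepdims=True)
    # xi_t(i, j) proportional to alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j)
    xi = np.zeros((N, N, T - 1))
    for t in range(T - 1):
        numer = alpha[:, t, None] * A * B[:, observations[t + 1]] * beta[:, t + 1]
        xi[:, :, t] = numer / numer.sum()
    return gamma, xi
```

Each column of `gamma` and each slice `xi[:, :, t]` is normalized to sum to 1, matching the definitions of $\gamma_t(i)$ and $\xi_t(i,j)$ as conditional probabilities.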

The Baum-Welch Algorithm

Parameter Estimation Formulas

I do not yet fully understand the derivation, in particular the Lagrange multiplier step, so here I give the training procedure, formulas, and code directly. The Baum-Welch algorithm is the concrete realization of the EM algorithm for hidden Markov model learning, proposed by Baum and Welch.

(1) Initialization
For $n = 0$, choose initial values $a_{ij}^{(0)}, b_j(k)^{(0)}, \pi_i^{(0)}$, giving the model $\lambda^{(0)} = (a_{ij}^{(0)}, b_j(k)^{(0)}, \pi_i^{(0)})$.

(2) Recursion. For $n = 1, 2, \ldots$

$$a_{ij}^{(n+1)} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$

$$b_j(k)^{(n+1)} = \frac{\sum_{t=1,\, o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$$

$$\pi_i^{(n+1)} = \gamma_1(i)$$

The right-hand sides are evaluated with the observations $O = (o_1, o_2, \ldots, o_T)$ and the current model $\lambda^{(n)} = (a_{ij}^{(n)}, b_j(k)^{(n)}, \pi_i^{(n)})$.
These update formulas are easy to interpret. $a_{ij}^{(n+1)}$ is a transition probability: the denominator is the expected number of transitions out of state $i$ under observation $O$, and the numerator is the expected number of transitions from state $i$ to state $j$; dividing the two gives $a_{ij}^{(n+1)}$.
(3) Termination: return the model $\lambda^{(n+1)} = (a_{ij}^{(n+1)}, b_j(k)^{(n+1)}, \pi_i^{(n+1)})$.
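The three update formulas can also be written compactly in NumPy. A sketch, assuming `gamma` has shape (N, T) and `xi` has shape (N, N, T-1) in the notation above (the function name `reestimate` is my own):

```python
import numpy as np

def reestimate(gamma, xi, observations, n_symbols):
    """One Baum-Welch re-estimation step.

    gamma: (N, T) array, gamma[i, t] = gamma_t(i)
    xi: (N, N, T-1) array, xi[i, j, t] = xi_t(i, j)
    observations: length-T integer array; n_symbols: number of observation symbols M
    """
    # pi_i = gamma_1(i)
    new_pi = gamma[:, 0]
    # a_ij = expected transitions i -> j  /  expected transitions out of i
    new_A = xi.sum(axis=2) / gamma[:, :-1].sum(axis=1, keepdims=True)
    # b_j(k) = expected time in j while observing v_k  /  expected time in j
    N = gamma.shape[0]
    new_B = np.zeros((N, n_symbols))
    for k in range(n_symbols):
        mask = observations == k
        new_B[:, k] = gamma[:, mask].sum(axis=1) / gamma.sum(axis=1)
    return new_pi, new_A, new_B
```

Each row of `new_A` and `new_B` sums to 1 by construction, so the re-estimated parameters remain valid probability distributions.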

### A Python Implementation of the Baum-Welch Algorithm

```python
# Method of an HMM class with attributes self.A (N x N transition matrix),
# self.B (N x M emission matrix), self.pi (N,) initial distribution,
# and helper methods self._forward / self._backward; requires `import numpy as np`.
def baum_welch_train(self, observations, criterion=0.05):
    n_states = self.A.shape[0]
    n_samples = len(observations)

    done = False
    while not done:
        # alpha_t(i) = P(O_1 O_2 ... O_t, q_t = S_i | hmm)
        alpha = self._forward(observations)

        # beta_t(i) = P(O_t+1 O_t+2 ... O_T | q_t = S_i, hmm)
        beta = self._backward(observations)

        # xi_t(i, j) for t = 0 .. T-2
        xi = np.zeros((n_states, n_states, n_samples - 1))
        for t in range(n_samples - 1):
            denom = np.dot(np.dot(alpha[:, t].T, self.A) * self.B[:, observations[t + 1]].T,
                           beta[:, t + 1])
            for i in range(n_states):
                numer = alpha[i, t] * self.A[i, :] * self.B[:, observations[t + 1]].T * beta[:, t + 1].T
                xi[i, :, t] = numer / denom

        # gamma_t(i) = P(q_t = S_i | O, hmm): sum xi over j, giving t = 0 .. T-2
        gamma = np.sum(xi, axis=1)
        # Append the final column (t = T-1), which is needed for the new B
        prod = (alpha[:, n_samples - 1] * beta[:, n_samples - 1]).reshape((-1, 1))
        gamma = np.hstack((gamma, prod / np.sum(prod)))

        newpi = gamma[:, 0]
        newA = np.sum(xi, 2) / np.sum(gamma[:, :-1], axis=1).reshape((-1, 1))
        newB = np.copy(self.B)

        num_levels = self.B.shape[1]
        sumgamma = np.sum(gamma, axis=1)
        for lev in range(num_levels):
            mask = observations == lev
            newB[:, lev] = np.sum(gamma[:, mask], axis=1) / sumgamma

        # Stop once every parameter change falls below the threshold
        if np.max(abs(self.pi - newpi)) < criterion and \
                np.max(abs(self.A - newA)) < criterion and \
                np.max(abs(self.B - newB)) < criterion:
            done = True

        self.A[:], self.B[:], self.pi[:] = newA, newB, newpi
```
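The helpers `_forward` and `_backward` are not shown in the post. A minimal version of the class they might belong to is sketched below (this skeleton is my own assumption, not the author's original code):

```python
import numpy as np

class HMM:
    """Minimal discrete HMM holding the parameters used by baum_welch_train."""
    def __init__(self, A, B, pi):
        self.A = np.asarray(A, dtype=float)    # (N, N) transition probabilities
        self.B = np.asarray(B, dtype=float)    # (N, M) emission probabilities
        self.pi = np.asarray(pi, dtype=float)  # (N,) initial state distribution

    def _forward(self, obs):
        # alpha[i, t] = P(o_1 .. o_t, i_t = q_i | lambda)
        N, T = self.A.shape[0], len(obs)
        alpha = np.zeros((N, T))
        alpha[:, 0] = self.pi * self.B[:, obs[0]]
        for t in range(1, T):
            alpha[:, t] = (alpha[:, t - 1] @ self.A) * self.B[:, obs[t]]
        return alpha

    def _backward(self, obs):
        # beta[i, t] = P(o_{t+1} .. o_T | i_t = q_i, lambda)
        N, T = self.A.shape[0], len(obs)
        beta = np.zeros((N, T))
        beta[:, T - 1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[:, t] = self.A @ (self.B[:, obs[t + 1]] * beta[:, t + 1])
        return beta
```

With these in place, `baum_welch_train` can be attached to the class and iterated on an observation sequence. A useful sanity check follows from equations (2) and (3): $\sum_i \alpha_t(i)\,\beta_t(i) = P(O \mid \lambda)$ for every $t$.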