Mathematical Derivation of the Hidden Markov Model (HMM), Part 1

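Preface

This article derives the solutions to several standard problems for the hidden Markov model (HMM). It does not cover background such as what a hidden Markov model is or what a Markov chain is.
Math prerequisites: [概率论与数理统计知识复习 - 哔哩哔哩] (a review of probability and mathematical statistics on Bilibili)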

Derivation

Before starting the derivation, let us define the variables.

The observation sequence is $X$ and the hidden sequence is $Z$:
$$
X=\begin{pmatrix} x_1,x_2,\cdots,x_T \end{pmatrix}; \\
Z=\begin{pmatrix} z_1,z_2,\cdots,z_T \end{pmatrix};
$$
The subscript $T$ in $x_T$ means that the observation sequence has $T$ elements. Each element is a distinct random variable, and the same holds for the corresponding hidden sequence.
$$
x_i\in\{v_1,v_2,\cdots,v_m\};\quad z_i\in\{q_1,q_2,\cdots,q_n\}
$$
That is, each $x_i$ takes one of $m$ possible observation values and each $z_i$ takes one of $n$ possible hidden states (we assume $z$ is discrete).

Two assumptions

The hidden Markov model relies on two assumptions.

The homogeneous Markov assumption: the current hidden state depends only on the previous hidden state. As a formula,
$$
P(z_t|z_1,z_2,\cdots,z_{t-1},x_1,\cdots,x_{t-1})=P(z_t|z_{t-1})
$$
The observation independence assumption: the current observation depends only on the current hidden state. As a formula,
$$
P(x_t|x_1,x_2,\cdots,x_{t-1},z_1,\cdots,z_{t})=P(x_t|z_{t})
$$

Learning:

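Learning the parameters is something almost every model has to go through, and it is a prerequisite for prediction. We therefore start with parameter learning. Before doing so, let us define the model parameters.

Initial probability distribution $\pi$, transition matrix $A$, emission matrix $B$:
$$
\pi=\begin{pmatrix} \pi_1&\pi_2& \cdots & \pi_n \end{pmatrix}
$$
The transition matrix $A=[a_{ij}]$ is an $(n,n)$ matrix, where $a_{ij}$ is the probability of moving from hidden state $i$ to hidden state $j$. The emission matrix $B=[b_{ij}]$ is an $(n,m)$ matrix, where $b_{ij}$ is the probability that hidden state $i$ emits observation value $j$.

From now on, we write $\theta=(\pi,A,B)$ for the parameters.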
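To make the roles of $\pi$, $A$ and $B$ concrete, here is a minimal sketch that samples a hidden sequence and an observation sequence from an HMM. The parameter values, the 0-based state and symbol indices, and the function name `sample_hmm` are assumptions made for illustration only; they do not come from the derivation.

```python
import random

# Toy parameters (made-up values): n = 2 hidden states, m = 3 observation values.
pi = [0.6, 0.4]                 # pi[i]   = P(z_1 = q_i)
A = [[0.7, 0.3], [0.4, 0.6]]    # A[i][j] = P(z_{t+1} = q_j | z_t = q_i)
B = [[0.5, 0.4, 0.1],           # B[i][j] = P(x_t = v_j | z_t = q_i)
     [0.1, 0.3, 0.6]]

def sample_hmm(pi, A, B, T, seed=0):
    """Draw hidden states Z and observations X of length T (0-based indices)."""
    rng = random.Random(seed)
    states = range(len(pi))
    symbols = range(len(B[0]))
    Z = [rng.choices(states, weights=pi)[0]]          # first state comes from pi
    X = [rng.choices(symbols, weights=B[Z[0]])[0]]
    for _ in range(T - 1):
        # Homogeneous Markov assumption: the next state uses only the previous state.
        Z.append(rng.choices(states, weights=A[Z[-1]])[0])
        # Observation independence: the observation uses only the current state.
        X.append(rng.choices(symbols, weights=B[Z[-1]])[0])
    return Z, X

Z, X = sample_hmm(pi, A, B, T=5)
print("hidden:  ", Z)
print("observed:", X)
```

In the learning problem discussed next, only `X` would be available and `Z` would stay hidden.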

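We want to estimate these parameters, and the observation sequence $X$ is the training data we are given. The most straightforward idea is maximum likelihood estimation:
$$
\hat \theta=\arg\max\limits_{\theta}{P(X|\theta)}
$$
Note that $\theta$ in $P(X|\theta)$ denotes the parameters rather than a random variable.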

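Solving with the EM algorithm

For a hidden Markov model, if both $X$ and $Z$ are given, the parameters can be obtained directly by maximum likelihood estimation; this is usually called supervised learning. The other case, where only $X$ is given and $Z$ is not, is called unsupervised learning.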
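The supervised case is not derived in this article. As a hedged illustration, when $Z$ is observed the maximum likelihood estimates reduce to normalized counts. The sketch below assumes 0-based integer labels, a list of `(X, Z)` training pairs, and a uniform fallback for rows that were never observed; these choices and the names `normalize` and `supervised_mle` are illustrative assumptions.

```python
def normalize(counts):
    """Turn a row of counts into a probability row."""
    total = sum(counts)
    if total == 0:
        # Assumption: a row that was never observed falls back to uniform.
        return [1.0 / len(counts)] * len(counts)
    return [c / total for c in counts]

def supervised_mle(pairs, n, m):
    """Estimate (pi, A, B) from fully observed (X, Z) pairs by counting."""
    pi_counts = [0] * n
    A_counts = [[0] * n for _ in range(n)]
    B_counts = [[0] * m for _ in range(n)]
    for X, Z in pairs:
        pi_counts[Z[0]] += 1                      # first hidden state
        for t in range(len(Z)):
            B_counts[Z[t]][X[t]] += 1             # state Z[t] emits X[t]
            if t + 1 < len(Z):
                A_counts[Z[t]][Z[t + 1]] += 1     # transition Z[t] -> Z[t+1]
    return (normalize(pi_counts),
            [normalize(row) for row in A_counts],
            [normalize(row) for row in B_counts])

# Two made-up training pairs (observations, hidden states).
pairs = [([0, 1, 2], [0, 0, 1]), ([2, 2, 0], [1, 1, 0])]
pi_hat, A_hat, B_hat = supervised_mle(pairs, n=2, m=3)
print(pi_hat, A_hat, B_hat)
```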

For unsupervised learning, the parameters of a hidden Markov model are learned with the EM algorithm.

The EM algorithm consists of two steps:

E-step: given $P(Z|X,\theta^{t})$, where $\theta^{t}$ denotes the parameters at the current iteration, form the expectation
$$
P(Z|X,\theta^{t})\rightarrow{E_{P(Z|X,\theta^{t})}\left[\log P(Z,X|\theta)\right]}
$$

M-step: maximize this expectation over $\theta$
$$
{\theta^{t+1}}=\arg\max\limits_{\theta}{E_{P(Z|X,\theta^{t})}\left[\log P(Z,X|\theta)\right]}
$$

So the key quantity to compute is
$$
E_{P(Z|X,\theta^{t})}\left[\log P(Z,X|\theta)\right]=\sum\limits_{Z}\log P(Z,X|\theta)P(Z|X,\theta^t)
$$
We first work out $P(Z,X|\theta)$.

The observation sequence and the hidden sequence are related, so we introduce the hidden variables into the probability:
$$
\begin{aligned}
P(X|\theta)=&\sum\limits_{Z}P(X,Z|\theta) \\
=&\sum\limits_{Z}P(X|Z,\theta)P(Z|\theta)
\end{aligned}
$$
For $P(Z|\theta)$:
$$
\begin{aligned}
P(Z|\theta)=&P(z_1,z_2,\cdots,z_T|\theta) \\
=&P(z_T|z_1,z_2,\cdots,z_{T-1},\theta)P(z_1,z_2,\cdots,z_{T-1}|\theta) \\
=&P(z_T|z_{T-1},\theta)P(z_1,z_2,\cdots,z_{T-1}|\theta) \\
=&a_{(z_{T-1},z_{T})}P(z_1,z_2,\cdots,z_{T-1}|\theta)
\end{aligned}
$$
This uses the homogeneous Markov assumption. Here $a_{(z_{T-1},z_{T})}$ is the probability of moving from the $(T-1)$-th hidden state to the $T$-th hidden state. Notice that $P(z_1,z_2,\cdots,z_{T-1}|\theta)$ and $P(z_1,z_2,\cdots,z_T|\theta)$ differ by a single variable, so we apply the same step recursively to $P(z_1,z_2,\cdots,z_{T-1}|\theta)$ and finally obtain
$$
P(Z|\theta)=\pi_{z_1}\prod\limits_{i=1}^{T-1}a_{(z_{i},z_{i+1})}
$$

Here $\pi_{z_1}=P(z_1|\theta)$ is the initial probability of the first hidden state. It plays the role of $P(z_1|z_0)$, since there is no state before $z_1$.

For $P(X|Z,\theta)$:
$$
\begin{aligned}
P(X|Z,\theta)=&P(x_1,x_2,\cdots,x_T|Z,\theta) \\
=&P(x_T|x_1,x_2,\cdots,x_{T-1},Z,\theta)P(x_1,x_2,\cdots,x_{T-1}|Z,\theta) \\
=&P(x_T|z_{T})P(x_1,x_2,\cdots,x_{T-1}|Z,\theta) \\
=&b_{(z_T,x_T)}P(x_1,x_2,\cdots,x_{T-1}|Z,\theta)
\end{aligned}
$$
Here $b_{(z_{T},x_{T})}$ is the probability that the $T$-th hidden state emits the $T$-th observation. This uses the observation independence assumption. The same recursion as above applies, and we finally obtain
$$
P(X|Z,\theta)=\prod\limits_{j=1}^{T}b_{(z_j,x_j)}
$$
Therefore
$$
P(X|\theta)=\sum\limits_{Z}\pi_{z_1}\prod\limits_{i=1}^{T-1}a_{(z_i,z_{i+1})}\prod\limits_{j=1}^{T}b_{(z_j,x_j)}
$$

$$
P(Z,X|\theta)=\pi_{z_1}\prod\limits_{i=1}^{T-1}a_{(z_i,z_{i+1})}\prod\limits_{j=1}^{T}b_{(z_j,x_j)}
$$
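The two formulas above can be checked numerically on a tiny model. The sketch below is only an illustration: the parameter values, the 0-based indices and the function names are assumptions. It computes $P(Z,X|\theta)$ as the product of initial, transition and emission factors, and $P(X|\theta)$ by summing over every hidden sequence, which costs $n^T$ terms and is only feasible for very small $T$.

```python
from itertools import product

# Toy parameters (made-up values): 2 hidden states, 3 observation values.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]

def joint_prob(X, Z, pi, A, B):
    """P(Z, X | theta) = pi[z_1] * prod_t a(z_t, z_{t+1}) * prod_t b(z_t, x_t)."""
    p = pi[Z[0]] * B[Z[0]][X[0]]
    for t in range(1, len(X)):
        p *= A[Z[t - 1]][Z[t]] * B[Z[t]][X[t]]
    return p

def likelihood(X, pi, A, B):
    """P(X | theta), summing the joint probability over all n**T hidden sequences."""
    n = len(pi)
    return sum(joint_prob(X, Z, pi, A, B)
               for Z in product(range(n), repeat=len(X)))

X = [0, 2, 1]
print(joint_prob(X, [0, 1, 1], pi, A, B))
print(likelihood(X, pi, A, B))
```

A useful sanity check is that the likelihoods of all possible observation sequences of a fixed length add up to 1.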
So the quantity required by the EM algorithm is
$$
\begin{aligned}
E_{P(Z|X,\theta^{t})}\left[\log P(Z,X|\theta)\right]=&\sum\limits_{Z}\log\left[\pi_{z_1}\prod\limits_{i=1}^{T-1}a_{(z_i,z_{i+1})}\prod\limits_{j=1}^{T}b_{(z_j,x_j)}\right]P(Z|X,\theta^t) \\
=&\sum\limits_{Z}\left[\log\pi_{z_1}+\log\prod\limits_{i=1}^{T-1}a_{(z_i,z_{i+1})}+\log\prod\limits_{j=1}^Tb_{(z_j,x_j)}\right]P(Z|X,\theta^t)
\end{aligned}
$$
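As a rough numerical illustration of this expectation, the sketch below evaluates $\sum_Z \log P(Z,X|\theta)\,P(Z|X,\theta^t)$ by enumerating every hidden sequence. The two parameter sets, the 0-based indices and the names `joint_prob` and `q_function` are assumptions for the example; all probabilities are kept strictly positive so that the logarithm is defined.

```python
import math
from itertools import product

def joint_prob(X, Z, pi, A, B):
    """P(Z, X | theta) as the product of initial, transition and emission terms."""
    p = pi[Z[0]] * B[Z[0]][X[0]]
    for t in range(1, len(X)):
        p *= A[Z[t - 1]][Z[t]] * B[Z[t]][X[t]]
    return p

def q_function(X, theta, theta_old):
    """E over P(Z | X, theta_old) of log P(Z, X | theta), by brute force."""
    n = len(theta_old[0])
    paths = list(product(range(n), repeat=len(X)))
    weights = [joint_prob(X, Z, *theta_old) for Z in paths]
    px_old = sum(weights)                      # P(X | theta_old)
    return sum((w / px_old) * math.log(joint_prob(X, Z, *theta))
               for Z, w in zip(paths, weights))

# Made-up current parameters theta_old and a candidate theta_new.
theta_old = ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]],
             [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
theta_new = ([0.5, 0.5], [[0.6, 0.4], [0.3, 0.7]],
             [[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]])
X = [0, 2, 1]
print(q_function(X, theta_old, theta_old))
print(q_function(X, theta_new, theta_old))
```

The M-step looks for the `theta` that makes this value as large as possible while `theta_old` stays fixed.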

We need ${\theta^{t+1}}=\arg\max\limits_{\theta}{E_{P(Z|X,\theta^{t})}\left[\log P(Z,X|\theta)\right]}$. Since $\theta=(\pi,A,B)$, we maximize with respect to each component in turn.

For $\pi$:
$$
\begin{aligned}
\pi^{t+1}=&\arg\max\limits_{\pi}\sum\limits_{Z}\left[\log\pi_{z_1}+\log\prod\limits_{i=1}^{T-1}a_{(z_i,z_{i+1})}+\log\prod\limits_{j=1}^Tb_{(z_j,x_j)}\right]P(Z|X,\theta^t) \\
=&\arg\max\limits_{\pi}\sum\limits_{Z}\log(\pi_{z_1}){P(Z|X,\theta^t)} \\
=&\arg\max\limits_{\pi}\sum\limits_{z_1,z_2,\cdots,z_T}\log(\pi_{z_1}){P(z_1,z_2,\cdots,z_T|X,\theta^t)} \\
=&\arg\max\limits_{\pi}\sum\limits_{z_1}\log(\pi_{z_1})\sum\limits_{z_2,\cdots,z_T}P(z_1,z_2,\cdots,z_T|X,\theta^t) \\
=&\arg\max\limits_{\pi}\sum\limits_{z_1}\log(\pi_{z_1})P(z_1|X,\theta^t) \\
=&\arg\max\limits_{\pi}\sum\limits_{i=1}^n\log(\pi_{i})P(z_1=q_i|X,\theta^t) \\
=&\arg\max\limits_{\pi}\sum\limits_{i=1}^n\log(\pi_{i})\frac{P(z_1=q_i,X|\theta^t)}{P(X|\theta^t)} \\
=&\arg\max\limits_{\pi}\sum\limits_{i=1}^n\log(\pi_{i})P(z_1=q_i,X|\theta^t)
\end{aligned}
$$
The second line drops the terms that do not depend on $\pi$. The last line drops the positive constant factor $1/P(X|\theta^t)$, which does not change the maximizer.

Since $\pi$ is the initial probability distribution, $\sum\limits_{i=1}^n\pi_i=1$, so this becomes a constrained optimization problem.

Construct the Lagrangian
$$
L(\pi,\lambda)=\sum\limits_{i=1}^n\log(\pi_{i})P(z_1=q_i,X|\theta^t)+\lambda\left[\sum\limits_{i=1}^n\pi_i-1\right]
$$
Differentiate with respect to $\pi_i$:
$$
\begin{aligned}
&\frac{\partial{L(\pi,\lambda)}}{\partial{\pi_i}} \\
=&\frac{1}{\pi_i}P(z_1=q_i,X|\theta^t)+\lambda \\
=&0 \\
&\text{Multiplying both sides by }\pi_i\text{:} \\
&P(z_1=q_i,X|\theta^t)+\lambda\pi_i=0
\end{aligned}
$$
Summing over $i$:
$$
\begin{aligned}
&\sum\limits_{i=1}^n\left[P(z_1=q_i,X|\theta^t)+\pi_i\lambda\right]=0 \\
&\text{that is,} \\
&\sum\limits_{i=1}^nP(z_1=q_i,X|\theta^t)+\lambda\sum\limits_{i=1}^n\pi_i \\
&=\sum\limits_{i=1}^nP(z_1=q_i,X|\theta^t)+\lambda \\
&=P(X|\theta^t)+\lambda \\
&=0
\end{aligned}
$$
Finally,
$$
\lambda=-P(X|\theta^t)
$$
Substituting this back into $P(z_1=q_i,X|\theta^t)+\lambda\pi_i=0$ gives
$$
\pi_i=\frac{P(z_1=q_i,X|\theta^t)}{P(X|\theta^t)}
$$
which is the posterior probability $P(z_1=q_i|X,\theta^t)$.
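The update for $\pi_i$ can be evaluated directly on a toy model by enumerating all hidden sequences. This is a brute-force sketch with made-up parameter values, 0-based indices and hypothetical function names; it is not an efficient way to compute these probabilities.

```python
from itertools import product

# Toy parameters (made-up values): 2 hidden states, 3 observation values.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]

def joint_prob(X, Z, pi, A, B):
    """P(Z, X | theta) as the product of initial, transition and emission terms."""
    p = pi[Z[0]] * B[Z[0]][X[0]]
    for t in range(1, len(X)):
        p *= A[Z[t - 1]][Z[t]] * B[Z[t]][X[t]]
    return p

def update_pi(X, pi, A, B):
    """New pi_i = P(z_1 = q_i, X | theta^t) / P(X | theta^t)."""
    n = len(pi)
    numer = [0.0] * n                     # numer[i] = P(z_1 = q_i, X | theta^t)
    for Z in product(range(n), repeat=len(X)):
        numer[Z[0]] += joint_prob(X, Z, pi, A, B)
    px = sum(numer)                       # P(X | theta^t)
    return [v / px for v in numer]

new_pi = update_pi([0, 2, 1], pi, A, B)
print(new_pi, sum(new_pi))
```

The result sums to 1 by construction, which matches the constraint enforced by the Lagrange multiplier.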

For the state transition matrix $A$:
$$
\begin{aligned}
L(A)=&\sum\limits_{Z}\log\prod\limits_{i=1}^{T-1}a_{(z_i,z_{i+1})}P(Z|X,\theta^t) \\
=&\sum\limits_{Z}\sum\limits_{i=1}^{T-1}\log[a_{(z_i,z_{i+1})}]P(Z|X,\theta^t) \\
=&\sum\limits_{Z}\left[\log a_{(z_1,z_{2})}+\log a_{(z_2,z_{3})}+\cdots+\log a_{(z_{T-1},z_{T})}\right]P(Z|X,\theta^t) \\
=&\sum\limits_{Z}\log a_{(z_1,z_{2})}P(Z|X,\theta^t)+\sum\limits_{Z}\log a_{(z_2,z_{3})}P(Z|X,\theta^t)+\cdots+\sum\limits_{Z}\log a_{(z_{T-1},z_{T})}P(Z|X,\theta^t)
\end{aligned}
$$
Consider the terms one at a time:
$$
\begin{aligned}
&\sum\limits_{Z}\log a_{(z_1,z_{2})}P(Z|X,\theta^t) \\
=&\sum\limits_{z_1,z_2,\cdots,z_T}\log a_{(z_1,z_{2})}P(Z|X,\theta^t) \\
=&\sum\limits_{z_1}\sum\limits_{z_2}\log a_{(z_1,z_{2})}\sum\limits_{z_3,\cdots,z_T}P(z_1,z_2,\cdots,z_T|X,\theta^t) \\
=&\sum\limits_{z_1}\sum\limits_{z_2}\log a_{(z_1,z_{2})}P(z_1,z_2|X,\theta^t) \\
=&\sum\limits_{i=1}^n\sum\limits_{j=1}^n\log a_{(z_1=q_i,z_{2}=q_j)}P(z_1=q_i,z_2=q_j|X,\theta^t)
\end{aligned}
$$
The remaining terms have the same form as the first one. Adding them up gives
$$
L(A)=\sum\limits_{t=1}^{T-1}\sum\limits_{i=1}^n\sum\limits_{j=1}^n\log a_{(z_t=q_i,z_{t+1}=q_j)}P(z_t=q_i,z_{t+1}=q_j|X,\theta^t)
$$
Here the summation index $t$ is a time step, while the superscript in $\theta^t$ still denotes the current EM iterate. As in the derivation for $\pi$, expand $P(z_t=q_i,z_{t+1}=q_j|X,\theta^t)$ with Bayes' rule. Since $P(X|\theta^t)$ does not depend on the quantity we are solving for, we can drop it and maximize
$$
L(A)=\sum\limits_{t=1}^{T-1}\sum\limits_{i=1}^n\sum\limits_{j=1}^n\log a_{(z_t=q_i,z_{t+1}=q_j)}P(z_t=q_i,z_{t+1}=q_j,X|\theta^t)
$$
Moreover, each row of the state transition matrix satisfies $\sum\limits_{j=1}^na_{(z=q_i,z=q_j)}=1$. With one multiplier $\lambda_i$ per row, the Lagrangian for $A$ is
$$
P(A,\lambda)=\sum\limits_{t=1}^{T-1}\sum\limits_{i=1}^n\sum\limits_{j=1}^n\log a_{(z_t=q_i,z_{t+1}=q_j)}P(z_t=q_i,z_{t+1}=q_j,X|\theta^t)+\sum\limits_{i=1}^n\lambda_i\left[\sum\limits_{j=1}^na_{(z=q_i,z=q_j)}-1\right]
$$
Differentiate with respect to $a_{(z=q_i,z=q_j)}$. Because the chain is homogeneous, $a_{(z_t=q_i,z_{t+1}=q_j)}$ is the same entry $a_{(z=q_i,z=q_j)}$ for every $t$:
$$
\begin{aligned}
\frac{\partial P(A,\lambda)}{\partial a_{(z=q_i,z=q_j)}}=&\sum\limits_{t=1}^{T-1}\frac{1}{a_{(z=q_i,z=q_j)}} P(z_t=q_i,z_{t+1}=q_j,X|\theta^t)+\lambda_i \\
=&0 \\
&\text{Multiplying both sides by }a_{(z=q_i,z=q_j)}\text{:} \\
&\sum\limits_{t=1}^{T-1}P(z_t=q_i,z_{t+1}=q_j,X|\theta^t)+\lambda_i a_{(z=q_i,z=q_j)}=0
\end{aligned}
$$
Summing over the different $q_j$:
$$
\begin{aligned}
&\sum\limits_{j=1}^n\left[\sum\limits_{t=1}^{T-1}P(z_t=q_i,z_{t+1}=q_j,X|\theta^t)+\lambda_i a_{(z=q_i,z=q_j)}\right]=0 \\
&\text{that is,} \\
&\sum\limits_{j=1}^n\sum\limits_{t=1}^{T-1}P(z_t=q_i,z_{t+1}=q_j,X|\theta^t)+\sum\limits_{j=1}^n\lambda_i a_{(z=q_i,z=q_j)}=0 \\
&\text{that is,} \\
&\sum\limits_{t=1}^{T-1}P(z_t=q_i,X|\theta^t)+\lambda_i =0
\end{aligned}
$$
Substituting the resulting $\lambda_i$ back into $\sum\limits_{t=1}^{T-1}P(z_t=q_i,z_{t+1}=q_j,X|\theta^t)+\lambda_i a_{(z=q_i,z=q_j)}=0$ gives
$$
a_{(z=q_i,z=q_j)}=\frac{\sum\limits_{t=1}^{T-1}P(z_t=q_i,z_{t+1}=q_j,X|\theta^t)}{\sum\limits_{t=1}^{T-1}P(z_t=q_i,X|\theta^t)}
$$
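The transition update can be sketched in the same brute-force style: accumulate $P(z_t=q_i,z_{t+1}=q_j,X|\theta^t)$ over all time steps and hidden sequences, then normalize each row. Parameter values, 0-based indices and function names are assumptions for illustration, and the enumeration is only practical for very short sequences.

```python
from itertools import product

# Toy parameters (made-up values): 2 hidden states, 3 observation values.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]

def joint_prob(X, Z, pi, A, B):
    """P(Z, X | theta) as the product of initial, transition and emission terms."""
    p = pi[Z[0]] * B[Z[0]][X[0]]
    for t in range(1, len(X)):
        p *= A[Z[t - 1]][Z[t]] * B[Z[t]][X[t]]
    return p

def update_A(X, pi, A, B):
    """New a_ij = sum_t P(z_t=q_i, z_{t+1}=q_j, X) / sum_t P(z_t=q_i, X), t < T."""
    n = len(pi)
    pair = [[0.0] * n for _ in range(n)]   # pair[i][j] = sum_t P(z_t=q_i, z_{t+1}=q_j, X)
    for Z in product(range(n), repeat=len(X)):
        p = joint_prob(X, Z, pi, A, B)
        for t in range(len(X) - 1):
            pair[Z[t]][Z[t + 1]] += p
    # Summing pair[i][j] over j gives the denominator sum_t P(z_t=q_i, X).
    return [[v / sum(row) for v in row] for row in pair]

new_A = update_A([0, 2, 1], pi, A, B)
for row in new_A:
    print(row, sum(row))
```

Each row of the result sums to 1, as required of a transition matrix.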
For the emission matrix $B$:
$$
\begin{aligned}
L(B)=&\sum\limits_{Z}\log\prod_{j=1}^Tb_{(z_j,x_j)}P(Z|X,\theta^t) \\
=&\sum\limits_{Z}\sum\limits_{j=1}^T\log\left[b_{(z_j,x_j)}\right]P(Z|X,\theta^t) \\
=&\sum\limits_{Z}\left[\log b_{(z_1,x_1)}+\log b_{(z_2,x_2)}+\cdots+\log b_{(z_T,x_T)}\right]P(Z|X,\theta^t) \\
=&\sum\limits_{Z}\log[b_{(z_1,x_1)}]P(Z|X,\theta^t)+\sum\limits_{Z}\log[b_{(z_2,x_2)}]P(Z|X,\theta^t)+\cdots+\sum\limits_{Z}\log[b_{(z_T,x_T)}]P(Z|X,\theta^t)
\end{aligned}
$$
Handle the terms one at a time. For $\sum\limits_{Z}\log[b_{(z_1,x_1)}]P(Z|X,\theta^t)$:
$$
\begin{aligned}
\sum\limits_{Z}\log[b_{(z_1,x_1)}]P(Z|X,\theta^t)=&\sum\limits_{z_1,\cdots,z_T}\log[b_{(z_1,x_1)}]P(Z|X,\theta^t) \\
=&\sum\limits_{z_1}\log[b_{(z_1,x_1)}]\sum\limits_{z_2,\cdots,z_T}P(z_1,\cdots,z_T|X,\theta^t) \\
=&\sum\limits_{z_1}\log[b_{(z_1,x_1)}]P(z_1|X,\theta^t) \\
=&\sum\limits_{i=1}^n\log[b_{(z_1=q_i,x_1)}]P(z_1=q_i|X,\theta^t)
\end{aligned}
$$
The other terms are handled in the same way. Adding them all up gives
$$
L(B)=\sum\limits_{t=1}^T\sum\limits_{i=1}^n\log[b_{(z_t=q_i,x_t)}]P(z_t=q_i|X,\theta^t)
$$
In addition, each row of the emission matrix $B$ satisfies $\sum\limits_{j=1}^m{b_{(z=q_i,x=v_j)}}=1$. As before, expand $P(z_t=q_i|X,\theta^t)$ with Bayes' rule and drop the constant factor $1/P(X|\theta^t)$. With one multiplier $\lambda_i$ per row, the Lagrangian is
$$
L(B,\lambda)=\sum\limits_{t=1}^T\sum\limits_{i=1}^n\log[b_{(z_t=q_i,x_t)}]P(z_t=q_i,X|\theta^t)+\sum\limits_{i=1}^n\lambda_i\left[\sum\limits_{j=1}^m{b_{(z=q_i,x=v_j)}}-1\right]
$$

Differentiate with respect to $b_{(z=q_i,x=v_j)}$. The $x_t$ in the Lagrangian are fixed by the given data, so only the time steps with $x_t=v_j$ contribute to the derivative; all other terms are 0. We therefore introduce the indicator function
$$
I(x_t=v_j)=\left\{ \begin{matrix} 1,&x_t=v_j\\ 0,&x_t\ne{v_j} \end{matrix} \right.
$$
On the time steps where the indicator is 1, $b_{(z_t=q_i,x_t)}$ is exactly the entry $b_{(z=q_i,x=v_j)}$, so
$$
\begin{aligned}
\frac{\partial{L(B,\lambda)}}{\partial{b_{(z=q_i,x=v_j)}}}=&\sum\limits_{t=1}^T\frac{1}{b_{(z=q_i,x=v_j)}}P(z_t=q_i,X|\theta^t)I(x_t=v_j)+\lambda_i \\
=&0 \\
&\text{Multiplying both sides by }b_{(z=q_i,x=v_j)}\text{:} \\
&\sum\limits_{t=1}^TP(z_t=q_i,X|\theta^t)I(x_t=v_j)+\lambda_i{b_{(z=q_i,x=v_j)}}=0
\end{aligned}
$$
Summing over $j$:
$$
\begin{aligned}
&\sum\limits_{j=1}^m\left[\sum\limits_{t=1}^TP(z_t=q_i,X|\theta^t)I(x_t=v_j)+\lambda_i{b_{(z=q_i,x=v_j)}}\right] \\
=&\sum\limits_{j=1}^m\sum\limits_{t=1}^TP(z_t=q_i,X|\theta^t)I(x_t=v_j)+\sum\limits_{j=1}^m\lambda_i{b_{(z=q_i,x=v_j)}} \\
=&\sum\limits_{j=1}^m\sum\limits_{t=1}^TP(z_t=q_i,X|\theta^t)I(x_t=v_j)+\lambda_i \\
=&\sum\limits_{t=1}^T\sum\limits_{j=1}^mP(z_t=q_i,X|\theta^t)I(x_t=v_j)+\lambda_i \\
=&0
\end{aligned}
$$
For $\sum\limits_{j=1}^mP(z_t=q_i,X|\theta^t)I(x_t=v_j)$, exactly one $v_j$ equals $x_t$, so the inner sum reduces to $P(z_t=q_i,X|\theta^t)$ and
$$
\sum\limits_{t=1}^TP(z_t=q_i,X|\theta^t)+\lambda_i=0
$$
Substituting this back into the equation obtained from the derivative gives
$$
b_{(z=q_i,x=v_j)}=\frac{\sum\limits_{t=1}^TP(z_t=q_i,X|\theta^t)I(x_t=v_j)}{\sum\limits_{t=1}^TP(z_t=q_i,X|\theta^t)}
$$
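The emission update follows the same pattern. In the sketch below the indicator $I(x_t=v_j)$ shows up as indexing the accumulator by the observed symbol `X[t]`. As before, the toy parameters, 0-based indices and function names are illustrative assumptions, and the enumeration over hidden sequences is brute force.

```python
from itertools import product

# Toy parameters (made-up values): 2 hidden states, 3 observation values.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]

def joint_prob(X, Z, pi, A, B):
    """P(Z, X | theta) as the product of initial, transition and emission terms."""
    p = pi[Z[0]] * B[Z[0]][X[0]]
    for t in range(1, len(X)):
        p *= A[Z[t - 1]][Z[t]] * B[Z[t]][X[t]]
    return p

def update_B(X, pi, A, B):
    """New b_ij = sum_t P(z_t=q_i, X) I(x_t=v_j) / sum_t P(z_t=q_i, X)."""
    n, m = len(pi), len(B[0])
    numer = [[0.0] * m for _ in range(n)]
    for Z in product(range(n), repeat=len(X)):
        p = joint_prob(X, Z, pi, A, B)
        for t in range(len(X)):
            # Only the column of the symbol actually observed at time t receives mass.
            numer[Z[t]][X[t]] += p
    # Summing a row over j gives the denominator sum_t P(z_t=q_i, X).
    return [[v / sum(row) for v in row] for row in numer]

new_B = update_B([0, 2, 0], pi, A, B)
for row in new_B:
    print(row, sum(row))
```

A symbol that never appears in `X` (here symbol 1) gets probability 0 after the update, which follows directly from the indicator in the numerator.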

So the final iterative update formulas are
$$
\pi_i=\frac{P(z_1=q_i,X|\theta^t)}{P(X|\theta^t)}; \\
a_{(z=q_i,z=q_j)}=\frac{\sum\limits_{t=1}^{T-1}P(z_t=q_i,z_{t+1}=q_j,X|\theta^t)}{\sum\limits_{t=1}^{T-1}P(z_t=q_i,X|\theta^t)}; \\
b_{(z=q_i,x=v_j)}=\frac{\sum\limits_{t=1}^TP(z_t=q_i,X|\theta^t)I(x_t=v_j)}{\sum\limits_{t=1}^TP(z_t=q_i,X|\theta^t)}
$$
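One question remains, however: how do we compute the probabilities on the right-hand sides? This is introduced in the next article, on the evaluation problem: HMM隐马尔可夫模型的数学推导(二) (Mathematical Derivation of the Hidden Markov Model, Part 2).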
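Until an efficient method is available, the right-hand sides can still be evaluated on toy problems by summing over every hidden sequence. The sketch below combines the three updates into one EM step and runs a few iterations. It is a brute-force illustration under assumed toy parameters, 0-based indices and a hypothetical function name `em_step`, with a cost that grows as $n^T$.

```python
from itertools import product

def joint_prob(X, Z, pi, A, B):
    """P(Z, X | theta) as the product of initial, transition and emission terms."""
    p = pi[Z[0]] * B[Z[0]][X[0]]
    for t in range(1, len(X)):
        p *= A[Z[t - 1]][Z[t]] * B[Z[t]][X[t]]
    return p

def em_step(X, pi, A, B):
    """One EM update of (pi, A, B); also returns P(X | theta^t) for the old parameters."""
    n, m, T = len(pi), len(B[0]), len(X)
    first = [0.0] * n                       # P(z_1 = q_i, X)
    pair = [[0.0] * n for _ in range(n)]    # sum_t P(z_t = q_i, z_{t+1} = q_j, X)
    emit = [[0.0] * m for _ in range(n)]    # sum_t P(z_t = q_i, X) I(x_t = v_j)
    for Z in product(range(n), repeat=T):
        p = joint_prob(X, Z, pi, A, B)
        first[Z[0]] += p
        for t in range(T):
            emit[Z[t]][X[t]] += p
            if t + 1 < T:
                pair[Z[t]][Z[t + 1]] += p
    px = sum(first)                         # P(X | theta^t)
    new_pi = [v / px for v in first]
    new_A = [[v / sum(row) for v in row] for row in pair]
    new_B = [[v / sum(row) for v in row] for row in emit]
    return new_pi, new_A, new_B, px

# Made-up starting point and observation sequence.
theta = ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]],
         [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
X = [0, 1, 2, 2, 1, 0]
history = []
for _ in range(5):
    *theta, px = em_step(X, *theta)
    history.append(px)
print(history)
```

The printed likelihoods should not decrease from one iteration to the next, which is the standard behavior of EM.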
