First, we need to be clear about what the objective function of HMM learning actually is. An HMM is a directed probabilistic graphical model; in the supervised setting, maximum likelihood estimation maximizes the joint probability of observations and states to find the optimal parameters:

$$L(\theta) = p(x, z \mid \theta)$$
Here $x$ is the observation sequence and $z$ is the state sequence. The joint probability factorizes as:

$$p(x, z \mid \theta) = \prod_{t = 0}^{T - 1} p(x_{t + 1} \mid z_{t + 1}, \theta)\, p(z_{t + 1} \mid z_t, \theta)$$

where the $t = 0$ transition factor $p(z_1 \mid z_0, \theta)$ plays the role of the initial-state distribution.
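As a concrete sketch of this factorization (toy sizes and probability values are all hypothetical; following the text's naming, $A$ is the emission matrix and $B$ the transition matrix, and `pi` stands in for the initial-state factor $p(z_1 \mid z_0)$):

```python
import numpy as np

# Toy supervised HMM pair. Naming follows the text: A = emission, B = transition.
A = np.array([[0.9, 0.1],   # emission probs: rows = hidden state h, cols = symbol o
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],   # transition probs: rows = state j, cols = state k
              [0.4, 0.6]])
pi = np.array([0.5, 0.5])   # assumed initial-state distribution, i.e. p(z_1 | z_0)

x = [0, 1, 1]  # observation sequence (0-indexed symbols; the text is 1-indexed)
z = [0, 1, 1]  # state sequence (0-indexed states)

# p(x, z) = pi[z_1] * A[z_1, x_1] * prod_{t>1} B[z_{t-1}, z_t] * A[z_t, x_t]
p = pi[z[0]] * A[z[0], x[0]]
for t in range(1, len(x)):
    p *= B[z[t - 1], z[t]] * A[z[t], x[t]]
print(p)  # → 0.05184
```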
Taking the logarithm:

$$\log p(x, z \mid \theta) = \sum_{t = 1}^{T} \log p(x_t \mid z_t, \theta) + \sum_{t = 1}^{T - 1} \log p(z_{t + 1} \mid z_t, \theta)$$

The initial-state term $\log p(z_1 \mid z_0, \theta)$ is dropped here, since the initial distribution can be estimated separately and does not affect the estimates of the other parameters.
Suppose $x_t \in \{1, 2, \dots, O\}$ and $z_t \in \{1, 2, \dots, H\}$. Denote the emission probability matrix by $A$ and the transition probability matrix by $B$, let $e_{h,o}$ be the count in the data of the event behind $A_{h,o}$ (state $h$ emitting symbol $o$), and let $f_{j,k}$ be the count behind $B_{j,k}$ (state $j$ transitioning to state $k$). The log-likelihood can then be rewritten as:

$$\log p(x, z \mid \theta) = \sum_{h = 1}^{H}\sum_{o = 1}^{O} e_{h,o}\log A_{h,o} + \sum_{j = 1}^{H}\sum_{k = 1}^{H} f_{j,k}\log B_{j,k}$$

(Note that $k$ ranges over the $H$ states, since $B$ is the transition matrix.)
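Collecting the counts $e_{h,o}$ and $f_{j,k}$ from a labeled corpus can be sketched as follows (the toy corpus and sizes are made up for illustration):

```python
import numpy as np

H, O = 2, 2  # numbers of hidden states and observation symbols (toy sizes)

# A labeled corpus: list of (observations, states) pairs, 0-indexed.
data = [([0, 1, 1], [0, 1, 1]),
        ([1, 0], [1, 0])]

e = np.zeros((H, O))  # e[h, o]: times state h emits symbol o
f = np.zeros((H, H))  # f[j, k]: times state j transitions to state k

for x, z in data:
    for t in range(len(x)):
        e[z[t], x[t]] += 1
        if t + 1 < len(x):
            f[z[t], z[t + 1]] += 1

print(e)  # → [[2. 0.] [0. 3.]]
print(f)  # → [[0. 1.] [1. 1.]]
```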
The final optimization problem is therefore:

$$\begin{array}{rl} \max & \displaystyle \sum_{h = 1}^{H}\sum_{o = 1}^{O} e_{h,o}\log A_{h,o} + \sum_{j = 1}^{H}\sum_{k = 1}^{H} f_{j,k}\log B_{j,k} \\ \text{s.t.} & \displaystyle \sum_{o = 1}^{O} A_{h,o} = 1, \quad \sum_{k = 1}^{H} B_{j,k} = 1 \end{array}$$
Using the method of Lagrange multipliers:

$$\begin{aligned} L(A, B) ={} & \sum_{h = 1}^{H}\sum_{o = 1}^{O} e_{h,o}\log A_{h,o} + \sum_{j = 1}^{H}\sum_{k = 1}^{H} f_{j,k}\log B_{j,k} \\ & - \sum_{h = 1}^{H}\alpha_h \bigg( \sum_{o = 1}^{O} A_{h,o} - 1 \bigg) - \sum_{j = 1}^{H}\beta_j \bigg( \sum_{k = 1}^{H} B_{j,k} - 1 \bigg) \end{aligned}$$
Taking partial derivatives with respect to $A$, $B$, and the multipliers, and setting them to zero:

$$\begin{aligned} \frac{\partial L}{\partial A_{h,o}} &= \frac{e_{h,o}}{A_{h,o}} - \alpha_h = 0 \\ \frac{\partial L}{\partial B_{j,k}} &= \frac{f_{j,k}}{B_{j,k}} - \beta_j = 0 \\ \frac{\partial L}{\partial \alpha_h} &= \sum_{o = 1}^{O} A_{h,o} - 1 = 0 \\ \frac{\partial L}{\partial \beta_j} &= \sum_{k = 1}^{H} B_{j,k} - 1 = 0 \end{aligned}$$
The first two equations give $A_{h,o} = e_{h,o} / \alpha_h$ and $B_{j,k} = f_{j,k} / \beta_j$; substituting these into the constraints yields:

$$\alpha_h = \sum_{o = 1}^{O} e_{h,o}, \qquad \beta_j = \sum_{k = 1}^{H} f_{j,k}$$
Substituting the multipliers back into the augmented function (with primed indices to keep the inner sums distinct):

$$\begin{aligned} L(A, B) ={} & \sum_{h = 1}^{H}\sum_{o = 1}^{O} e_{h,o}\log A_{h,o} + \sum_{j = 1}^{H}\sum_{k = 1}^{H} f_{j,k}\log B_{j,k} \\ & - \sum_{h = 1}^{H} \bigg( \sum_{o' = 1}^{O} e_{h,o'} \bigg) \bigg( \sum_{o = 1}^{O} A_{h,o} - 1 \bigg) - \sum_{j = 1}^{H} \bigg( \sum_{k' = 1}^{H} f_{j,k'} \bigg) \bigg( \sum_{k = 1}^{H} B_{j,k} - 1 \bigg) \end{aligned}$$
Differentiating again with respect to $A$ and $B$:

$$\frac{\partial L}{\partial A_{h,o}} = \frac{e_{h,o}}{A_{h,o}} - \sum_{o' = 1}^{O} e_{h,o'} = 0, \qquad \frac{\partial L}{\partial B_{j,k}} = \frac{f_{j,k}}{B_{j,k}} - \sum_{k' = 1}^{H} f_{j,k'} = 0$$
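As a numerical sanity check of this stationarity condition, one can verify for a single row that the normalized counts maximize $\sum_o e_{h,o}\log A_{h,o}$ over the probability simplex (the counts here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
e_row = np.array([3.0, 1.0, 2.0])   # hypothetical counts e_{h,o} for one fixed state h
A_star = e_row / e_row.sum()        # candidate maximizer from the stationarity condition

def objective(a):
    """Per-row log-likelihood term: sum_o e_{h,o} * log(A_{h,o})."""
    return float(np.sum(e_row * np.log(a)))

# No random point on the simplex should beat the normalized counts.
best = objective(A_star)
for _ in range(1000):
    a = rng.dirichlet(np.ones(3))   # random point on the probability simplex
    assert objective(a) <= best + 1e-9
print(A_star, best)
```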
The final result agrees with the intuitive frequency estimate (though intuition agreeing with the derivation here is a happy coincidence, not a guarantee):

$$A_{h,o} = \frac{e_{h,o}}{\displaystyle\sum_{o' = 1}^{O} e_{h,o'}}, \qquad B_{j,k} = \frac{f_{j,k}}{\displaystyle\sum_{k' = 1}^{H} f_{j,k'}}$$
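The closed-form solution is just row normalization of the count matrices; a minimal sketch with hypothetical counts:

```python
import numpy as np

# Hypothetical count matrices: e = emission counts, f = transition counts.
e = np.array([[2.0, 0.0],
              [0.0, 3.0]])
f = np.array([[0.0, 1.0],
              [1.0, 1.0]])

# Closed-form MLE: divide each row by its total count.
A = e / e.sum(axis=1, keepdims=True)
B = f / f.sum(axis=1, keepdims=True)

print(A)  # → [[1.  0. ] [0.  1. ]]
print(B)  # → [[0.  1. ] [0.5 0.5]]
```

Each row of the resulting `A` and `B` sums to 1 by construction, satisfying the constraints from the optimization problem.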