Sequence Alignment (Part 15): Derivation of the EM Algorithm and the Baum-Welch Algorithm

Original post by hxj7

This article presents the derivations of the EM algorithm and the Baum-Welch algorithm.

In general, the parameters of a probabilistic model can be estimated by maximum likelihood, i.e., by finding

$$
\hat{\theta} = \mathrm{argmax}_\theta \, P(x|\theta)
$$

Sometimes, to simplify the computation, the maximum log-likelihood is used in place of the maximum likelihood:

$$
\hat{\theta} = \mathrm{argmax}_\theta \, \log P(x|\theta)
$$

In many cases the formula above has no analytical solution, or the analytical solution is too costly to compute. In such cases an iterative method can be used: a stopping criterion is set in advance, and the iteration terminates once that criterion is met.
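To make "iterate until a preset stopping criterion is met" concrete, here is a minimal sketch (my own illustration, not from the original text): it repeatedly improves the parameter and stops once the gain in log-likelihood drops below a tolerance. The names `log_likelihood`, `improve`, and `tol` are hypothetical placeholders.

```python
# Minimal sketch of iterative likelihood maximization with a preset
# stopping criterion. `improve` stands in for any update rule that does
# not decrease the likelihood (e.g., one EM step); all names here are
# illustrative placeholders, not from the article.
def maximize_iteratively(log_likelihood, improve, theta, tol=1e-8, max_iter=1000):
    ll = log_likelihood(theta)
    for _ in range(max_iter):
        theta = improve(theta)          # next parameter estimate
        new_ll = log_likelihood(theta)
        if new_ll - ll < tol:           # stopping criterion: negligible gain
            break
        ll = new_ll
    return theta
```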

The EM Algorithm

The EM algorithm is exactly such an iterative algorithm: it estimates the parameters (or parameter sets; hereafter simply "parameters") of a probabilistic model in the presence of missing data. Roughly, each iteration consists of:
E-step: compute the Q function.
M-step: maximize $Q(\theta|\theta^t)$ with respect to $\theta$.
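As a concrete toy instance of these two steps, here is a hedged sketch of EM for a mixture of two biased coins, where the missing data $y$ is which coin produced each session of flips. This example is mine, not the article's; the names (`p` for the two head probabilities, `w` for the mixture weights) are assumptions of the sketch.

```python
import numpy as np

# EM for a two-coin mixture: each row of `counts` is (heads, tails) from
# one session; the missing data y is which coin was used. A toy sketch,
# not code from the article.
def em_two_coins(counts, p=(0.6, 0.5), w=(0.5, 0.5), n_iter=50):
    counts = np.asarray(counts, dtype=float)
    p, w = np.array(p), np.array(w)
    for _ in range(n_iter):
        # E-step: posterior P(y = k | x, theta_t) for each session
        log_lik = counts[:, :1] * np.log(p) + counts[:, 1:] * np.log(1 - p)
        log_post = np.log(w) + log_lik
        post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
        # M-step: maximize Q(theta | theta_t), i.e. posterior-weighted MLE
        p = (post.T @ counts[:, 0]) / (post.T @ counts.sum(axis=1))
        w = post.mean(axis=0)
    return p, w

# Usage: five sessions of 10 flips each
print(em_two_coins([(8, 2), (9, 1), (4, 6), (5, 5), (7, 3)]))
```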

The detailed derivation is as follows.
The ultimate goal is to find the parameter that maximizes the log-likelihood:

$$
\hat{\theta} = \mathrm{argmax}_\theta \, \log P(x|\theta) \tag{1}
$$

Let $y$ denote the missing data. We first have:

$$
\log P(x|\theta) = \log P(x,y|\theta) - \log P(y|x,\theta) \tag{2}
$$

Equation (2) is easy to derive: since

$$
P(x,y|\theta) = P(x|\theta)\,P(y|x,\theta) \tag{2.1}
$$

we have

$$
P(x|\theta) = \frac{P(x,y|\theta)}{P(y|x,\theta)} \tag{2.2}
$$

and taking the logarithm of both sides yields equation (2).
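As a quick sanity check of identity (2), the following snippet (a toy example of my own) evaluates both sides on an arbitrary 2×2 joint distribution:

```python
import math

# Verify log P(x|theta) = log P(x,y|theta) - log P(y|x,theta) numerically
# on a made-up joint table P(x, y | theta).
P_xy = {("x0", "y0"): 0.1, ("x0", "y1"): 0.3,
        ("x1", "y0"): 0.2, ("x1", "y1"): 0.4}

x, y = "x0", "y1"
P_x = sum(p for (xi, _), p in P_xy.items() if xi == x)   # P(x | theta)
P_y_given_x = P_xy[(x, y)] / P_x                         # P(y | x, theta)

lhs = math.log(P_x)
rhs = math.log(P_xy[(x, y)]) - math.log(P_y_given_x)
assert abs(lhs - rhs) < 1e-12                            # equation (2) holds
```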

In the iterative maximization of the log-likelihood, suppose step $t$ produced the parameter $\theta^t$, with corresponding log-likelihood $\log P(x|\theta^t)$. Then the log-likelihood at step $t+1$ should be no smaller than $\log P(x|\theta^t)$, i.e.,

$$
\log P(x|\theta^{t+1}) - \log P(x|\theta^t) \geq 0 \tag{3}
$$

To obtain $\theta^{t+1}$, we first rewrite the log-likelihood as:

$$
\log P(x|\theta) = \sum_y P(y|x,\theta^t)\log P(x,y|\theta) - \sum_y P(y|x,\theta^t)\log P(y|x,\theta) \tag{4}
$$

Equation (4) can be derived as follows. Multiplying both sides of equation (2) by $P(y|x,\theta^t)$ and summing over all $y$ gives:

$$
\sum_y P(y|x,\theta^t)\log P(x|\theta) = \sum_y P(y|x,\theta^t)\log P(x,y|\theta) - \sum_y P(y|x,\theta^t)\log P(y|x,\theta) \tag{4.1}
$$

The left-hand side simplifies immediately:

$$
\sum_y P(y|x,\theta^t)\log P(x|\theta) = \log P(x|\theta)\sum_y P(y|x,\theta^t) = \log P(x|\theta) \tag{4.2}
$$

since the posterior probabilities $P(y|x,\theta^t)$ sum to 1. Equations (4.1) and (4.2) together give equation (4).

If we define

$$
Q(\theta|\theta^t) = \sum_y P(y|x,\theta^t)\log P(x,y|\theta) \tag{5}
$$

then we obtain:

$$
\log P(x|\theta) - \log P(x|\theta^t) = Q(\theta|\theta^t) - Q(\theta^t|\theta^t) + \sum_y P(y|x,\theta^t)\log \frac{P(y|x,\theta^t)}{P(y|x,\theta)} \tag{6}
$$

Equation (6) is also easy to derive. Applying equation (4) at $\theta$ and at $\theta^t$ and subtracting:

$$
\begin{aligned}
&\log P(x|\theta) - \log P(x|\theta^t)\\
&=\bigg[\sum_y P(y|x,\theta^t)\log P(x,y|\theta) - \sum_y P(y|x,\theta^t)\log P(y|x,\theta) \bigg]\\
&\quad - \bigg[\sum_y P(y|x,\theta^t)\log P(x,y|\theta^t) - \sum_y P(y|x,\theta^t)\log P(y|x,\theta^t) \bigg]\\
&=\bigg[Q(\theta|\theta^t) - \sum_y P(y|x,\theta^t)\log P(y|x,\theta)\bigg] - \bigg[Q(\theta^t|\theta^t) - \sum_y P(y|x,\theta^t)\log P(y|x,\theta^t)\bigg] \\
&=\Big[Q(\theta|\theta^t) - Q(\theta^t|\theta^t)\Big] + \bigg[\sum_y P(y|x,\theta^t)\log P(y|x,\theta^t) - \sum_y P(y|x,\theta^t)\log P(y|x,\theta)\bigg]\\
&=Q(\theta|\theta^t) - Q(\theta^t|\theta^t) + \sum_y P(y|x,\theta^t)\log \frac{P(y|x,\theta^t)}{P(y|x,\theta)}
\end{aligned} \tag{6.1}
$$

Here the term $\sum_y P(y|x,\theta^t)\log \frac{P(y|x,\theta^t)}{P(y|x,\theta)}$ is the relative entropy of $P(y|x,\theta^t)$ with respect to $P(y|x,\theta)$ and is therefore non-negative. So as long as $Q(\theta|\theta^t)$ is no smaller than $Q(\theta^t|\theta^t)$, equation (3) is guaranteed to hold. We can therefore take

$$
\theta^{t+1}=\mathrm{argmax}_\theta \, Q(\theta|\theta^t) \tag{7}
$$

We define the relative entropy of a probability distribution $P(x)$ with respect to a probability distribution $Q(x)$ as:

$$
H(P||Q) = \sum_i P(x_i) \log \frac{P(x_i)}{Q(x_i)} \tag{7.1}
$$

Then $H(P||Q) \geq 0$, proved as follows. We know that

$$
\log x \leq x - 1, \quad \text{when } x > 0 \tag{7.2}
$$

so

$$
-\log x \geq 1 - x, \quad \text{when } x > 0 \tag{7.3}
$$

Hence:

$$
\log \frac{P(x_i)}{Q(x_i)} = -\log \frac{Q(x_i)}{P(x_i)} \geq 1 - \frac{Q(x_i)}{P(x_i)} \tag{7.4}
$$

Therefore:

$$
\begin{aligned}
H(P||Q) &= \sum_i P(x_i) \log \frac{P(x_i)}{Q(x_i)}\\
&\geq \sum_i P(x_i) \bigg(1 - \frac{Q(x_i)}{P(x_i)}\bigg)\\
&=\sum_i P(x_i)-\sum_i Q(x_i)\\
&=1-1=0
\end{aligned} \tag{7.5}
$$

Equality holds if and only if $P(x_i)=Q(x_i)$ for all $i$.
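The non-negativity in (7.5) is easy to check numerically; the snippet below (my own toy check, not from the article) draws random distribution pairs and verifies $H(P||Q) \geq 0$, with equality at $P = Q$:

```python
import numpy as np

# Sanity check of H(P||Q) >= 0 on random distribution pairs.
rng = np.random.default_rng(0)
for _ in range(1000):
    P = rng.random(5); P /= P.sum()
    Q = rng.random(5); Q /= Q.sum()
    assert np.sum(P * np.log(P / Q)) >= 0        # relative entropy H(P||Q)
assert abs(np.sum(P * np.log(P / P))) < 1e-12    # equality when P == Q
```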

The Baum-Welch Algorithm

The Baum-Welch algorithm is a special case of the EM algorithm that estimates the probability parameters of an HMM. Here the missing data is the state path $\pi$, so the quantity Baum-Welch works with is:

$$
Q(\theta|\theta^t) = \sum_\pi P(\pi|x,\theta^t)\log P(x,\pi|\theta) \tag{8}
$$

In an HMM, for a given path $\pi$, each model parameter (the transition probabilities $a_{kl}$, the emission probabilities $e_k(b)$, etc.) can appear multiple times in the expression for $\log P(x,\pi|\theta)$. Let $A_{kl}(\pi)$ denote the number of times $a_{kl}$ appears and $E_k(b,\pi)$ the number of times $e_k(b)$ appears. Then for path $\pi$:

$$
P(x,\pi|\theta) = \prod_{k=0}^M \prod_{l=1}^M a_{kl}^{A_{kl}(\pi)} \prod_{k=1}^M \prod_{b}\big[e_k(b)\big]^{E_k(b,\pi)} \tag{9}
$$

Therefore:

$$
\log P(x,\pi|\theta) = \sum_{k=0}^M \sum_{l=1}^M A_{kl}(\pi) \log a_{kl} + \sum_{k=1}^M \sum_{b} E_k(b,\pi) \log e_k(b) \tag{10}
$$
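Equations (9) and (10) simply re-express the joint probability through parameter-usage counts. The snippet below (a toy model of my own, with state 0 as the silent begin state as in Durbin et al.) verifies that the count-weighted sum of log-parameters matches the directly computed $\log P(x,\pi|\theta)$ for one fixed path:

```python
import math
from collections import Counter

# Toy 2-state HMM; state 0 is the silent begin state.
a = {(0, 1): 0.5, (0, 2): 0.5,          # transition probabilities a_kl
     (1, 1): 0.9, (1, 2): 0.1,
     (2, 1): 0.2, (2, 2): 0.8}
e = {(1, "A"): 0.7, (1, "B"): 0.3,      # emission probabilities e_k(b)
     (2, "A"): 0.1, (2, "B"): 0.9}

x  = ["A", "B", "B"]                    # observed sequence
pi = [1, 2, 2]                          # one fixed state path

# Direct computation of log P(x, pi | theta)
log_joint = math.log(a[(0, pi[0])])
for i in range(len(x)):
    log_joint += math.log(e[(pi[i], x[i])])
    if i + 1 < len(x):
        log_joint += math.log(a[(pi[i], pi[i + 1])])

# Count-based computation, equation (10)
A = Counter([(0, pi[0])] + list(zip(pi, pi[1:])))   # A_kl(pi)
E = Counter(zip(pi, x))                             # E_k(b, pi)
log_counts = (sum(n * math.log(a[kl]) for kl, n in A.items())
              + sum(n * math.log(e[kb]) for kb, n in E.items()))
assert abs(log_joint - log_counts) < 1e-12
```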

Equation (8) thus becomes:

$$
Q(\theta|\theta^t) = \sum_\pi P(\pi|x,\theta^t) \bigg[\sum_{k=0}^M \sum_{l=1}^M A_{kl}(\pi) \log a_{kl} + \sum_{k=1}^M \sum_{b} E_k(b,\pi) \log e_k(b)\bigg] \tag{11}
$$

Changing the order of summation in equation (11), we obtain:

$$
Q(\theta|\theta^t) = \sum_{k=0}^M \sum_{l=1}^M A_{kl} \log a_{kl} + \sum_{k=1}^M \sum_{b} E_k(b) \log e_k(b) \tag{12}
$$

where:

$$
A_{kl} = \sum_\pi P(\pi|x,\theta^t) A_{kl}(\pi), \qquad E_k(b) = \sum_\pi P(\pi|x,\theta^t) E_k(b,\pi) \tag{12.1}
$$

Equation (12) can be derived as follows. Carrying out the sum over $\pi$ in equation (11) first, we get:

$$
\begin{aligned}
Q(\theta|\theta^t) &= \sum_{k=0}^M \sum_{l=1}^M \sum_\pi P(\pi|x,\theta^t) A_{kl}(\pi) \log a_{kl} + \sum_{k=1}^M \sum_{b} \sum_\pi P(\pi|x,\theta^t) E_k(b,\pi) \log e_k(b)\\
&= \sum_{k=0}^M \sum_{l=1}^M \log a_{kl} \sum_\pi P(\pi|x,\theta^t) A_{kl}(\pi) + \sum_{k=1}^M \sum_{b} \log e_k(b) \sum_\pi P(\pi|x,\theta^t) E_k(b,\pi)\\
&= \sum_{k=0}^M \sum_{l=1}^M A_{kl} \log a_{kl} + \sum_{k=1}^M \sum_{b} E_k(b) \log e_k(b)
\end{aligned} \tag{12.2}
$$

The key question now is how to choose the values of $a_{kl}$ and $e_k(b)$ so as to maximize $Q(\theta|\theta^t)$. Let

$$
a_{kl}^0 = \frac{A_{kl}}{\sum_{l'} A_{kl'}}, \qquad e_k^0(b) = \frac{E_k(b)}{\sum_{b'} E_{k}(b')} \tag{13}
$$

Then this choice attains the maximum of $Q(\theta|\theta^t)$.
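In practice the expected counts $A_{kl}$ and $E_k(b)$ in (12.1) are computed with the forward-backward algorithm rather than by enumerating all paths. Below is a hedged sketch of one full Baum-Welch iteration for a toy HMM, assuming states $1..M$ with an initial-probability row $a_{0l}$, no end state, and unscaled forward/backward values (so it only suits short sequences). This is my own minimal illustration of equations (12.1) and (13), not the article's code.

```python
import numpy as np

def baum_welch_step(x, a0, a, e):
    """x: observation indices; a0: (M,) initial row a_{0l};
    a: (M, M) transitions; e: (M, B) emissions."""
    T, M = len(x), len(a0)
    # Forward: f[i, k] = P(x_0..x_i, pi_i = k)
    f = np.zeros((T, M))
    f[0] = a0 * e[:, x[0]]
    for i in range(1, T):
        f[i] = (f[i - 1] @ a) * e[:, x[i]]
    # Backward: b[i, k] = P(x_{i+1}..x_{T-1} | pi_i = k)
    b = np.ones((T, M))
    for i in range(T - 2, -1, -1):
        b[i] = a @ (e[:, x[i + 1]] * b[i + 1])
    px = f[-1].sum()                          # P(x | theta_t)
    # Expected counts A_kl and E_k(b), equation (12.1)
    A = np.zeros_like(a)
    for i in range(T - 1):
        A += np.outer(f[i], e[:, x[i + 1]] * b[i + 1]) * a / px
    E = np.zeros_like(e)
    for i in range(T):
        E[:, x[i]] += f[i] * b[i] / px
    # M-step, equation (13): normalize the expected counts
    new_a0 = f[0] * b[0] / px                 # expected A_{0l}; already sums to 1
    return new_a0, A / A.sum(1, keepdims=True), E / E.sum(1, keepdims=True)

# Usage on a toy 2-state, 2-symbol model
a0 = np.array([0.5, 0.5])
a  = np.array([[0.9, 0.1], [0.2, 0.8]])
e  = np.array([[0.7, 0.3], [0.1, 0.9]])
print(baum_welch_step([0, 1, 1, 0, 0], a0, a, e))
```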

The claim in equation (13) is proved as follows. Let $Q^0(\theta|\theta^t)$ denote the value of $Q(\theta|\theta^t)$ under $a_{kl}^0$ and $e_k^0(b)$, and write $Q(\theta|\theta^t)$ for the value under any other choice of $a_{kl}$ and $e_k(b)$. Then:

$$
\begin{aligned}
& Q^0(\theta|\theta^t) - Q(\theta|\theta^t) \\
&= \bigg[\sum_{k=0}^M \sum_{l=1}^M A_{kl} \log a_{kl}^0 + \sum_{k=1}^M \sum_{b} E_k(b) \log e_k^0(b)\bigg] - \bigg[\sum_{k=0}^M \sum_{l=1}^M A_{kl} \log a_{kl} + \sum_{k=1}^M \sum_{b} E_k(b) \log e_k(b)\bigg]\\
&= \sum_{k=0}^M \sum_{l=1}^M A_{kl} \log \frac{a_{kl}^0}{a_{kl}} + \sum_{k=1}^M \sum_{b} E_k(b) \log \frac{e_k^0(b)}{e_k(b)}\\
&= \sum_{k=0}^M \bigg[\sum_{l'}A_{kl'}\bigg] \sum_{l=1}^M \frac{A_{kl}}{\sum_{l'}A_{kl'}} \log \frac{a_{kl}^0}{a_{kl}} + \sum_{k=1}^M \bigg[\sum_{b'}E_{k}(b')\bigg] \sum_{b} \frac{E_k(b)}{\sum_{b'}E_{k}(b')} \log \frac{e_k^0(b)}{e_k(b)}\\
&= \sum_{k=0}^M \bigg[\sum_{l'}A_{kl'}\bigg] \sum_{l=1}^M a_{kl}^0 \log \frac{a_{kl}^0}{a_{kl}} + \sum_{k=1}^M \bigg[\sum_{b'}E_{k}(b')\bigg] \sum_{b} e_k^0(b) \log \frac{e_k^0(b)}{e_k(b)}\\
&\geq 0
\end{aligned} \tag{13.1}
$$

In the last inequality, $\sum_{l=1}^M a_{kl}^0 \log \frac{a_{kl}^0}{a_{kl}}$ can be viewed as the relative entropy of $a_{kl}^0$ with respect to $a_{kl}$, and $\sum_{b} e_k^0(b) \log \frac{e_k^0(b)}{e_k(b)}$ as the relative entropy of $e_k^0(b)$ with respect to $e_k(b)$, so both are non-negative.

Summary

The derivations in this article follow Chapter 11 of Biological Sequence Analysis (Durbin et al.); my contribution is simply to fill in the details of the book's brief derivations.

(WeChat official account: 生信了)
