Induction
This article covers the Baum-Welch algorithm. We will estimate the parameters $\lambda=(A,B,\pi)$ from the observation sequences $\{O_1,O_2,\cdots,O_S\}$, where each $O_i$ has length $T$. The objective of our probability model is
$$P(O|\lambda)=\sum_{I}P(I|\lambda)P(O|I,\lambda)$$
The parameters in the formula above can be estimated with the EM algorithm.
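To make the sum over hidden state paths concrete, the sketch below evaluates $P(O|\lambda)$ by enumerating every path $I$ and adding up $P(O,I|\lambda)=P(I|\lambda)P(O|I,\lambda)$. The function names and the toy values of `A`, `B`, and `pi` are illustrative assumptions, not part of the article, and the enumeration costs $N^T$ terms, so it is only practical for very short sequences.

```python
import itertools
import numpy as np

def joint_prob(obs, path, A, B, pi):
    """P(O, I | lambda) for one state path I = (i_1, ..., i_T)."""
    p = pi[path[0]] * B[path[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]
    return p

def likelihood_brute_force(obs, A, B, pi):
    """P(O | lambda): sum of P(O, I | lambda) over all N**T state paths."""
    n_states = A.shape[0]
    return sum(
        joint_prob(obs, path, A, B, pi)
        for path in itertools.product(range(n_states), repeat=len(obs))
    )

# Toy model: 2 hidden states, 2 observation symbols (assumed values).
A = np.array([[0.7, 0.3], [0.4, 0.6]])   # A[i, j] = a_ij
B = np.array([[0.5, 0.5], [0.1, 0.9]])   # B[j, k] = b_j(k)
pi = np.array([0.6, 0.4])
obs = [0, 1, 1]                          # symbol indices of o_1, o_2, o_3

p_obs = likelihood_brute_force(obs, A, B, pi)
```

A useful sanity check on such a sketch is that the likelihoods of all possible observation sequences of a fixed length add up to 1.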
-
E step:
$$Q(\lambda,\overline{\lambda})=\sum_{I}P(O,I|\overline{\lambda})\log P(O,I|\lambda)$$
$$P(O,I|\lambda)=\pi_{i_1}b_{i_1}(o_1)a_{i_1,i_2}b_{i_2}(o_2)\cdots a_{i_{T-1},i_{T}}b_{i_T}(o_T)$$
The function $Q(\lambda,\overline{\lambda})$ becomes:
$$\begin{aligned} Q(\lambda,\overline{\lambda})=&\sum_{I}\log\pi_{i_1}P(O,I|\overline{\lambda})+\\ &\sum_I\left(\sum_{t=1}^{T-1}\log a_{i_{t},i_{t+1}}\right)P(O,I|\overline{\lambda})+\\ &\sum_I\left(\sum_{t=1}^{T}\log b_{i_t}(o_t)\right)P(O,I|\overline{\lambda}) \end{aligned}$$
-
M step:
-
$\pi_i$
$$\sum_I\log\pi_{i_1} P(O,I|\overline{\lambda})=\sum_{i=1}^{N}\log\pi_iP(O,i_1=i|\overline{\lambda})$$
Under the constraint $\sum_{i=1}^{N}\pi_i=1$, we introduce a Lagrange multiplier $\gamma$ and form the Lagrange function as follows:
$$f(\pi, \gamma)=\sum_{i=1}^{N}\log\pi_iP(O,i_1=i|\overline{\lambda})+\gamma\left(\sum_{i=1}^{N}\pi_i-1\right)$$
Setting the partial derivative of $f(\pi,\gamma)$ with respect to $\pi_i$ to zero:
$$\frac{\partial}{\partial \pi_{i}}\left[\sum_{i=1}^{N} \log \pi_{i} P\left(O, i_{1}=i \mid \bar{\lambda}\right)+\gamma\left(\sum_{i=1}^{N} \pi_{i}-1\right)\right]=0$$
Multiplying through by $\pi_i$ simplifies this to:
$$P\left(O, i_{1}=i \mid \bar{\lambda}\right)+\gamma \pi_{i}=0$$
Summing over $i$,
$$\sum_{i=1}^{N}\left[P\left(O, i_{1}=i \mid \bar{\lambda}\right)+\gamma \pi_{i}\right]=0,$$
we get $\gamma=-P(O|\overline{\lambda})$. Finally, we obtain:
$$\pi_i=\frac{P(O,i_1=i|\overline{\lambda})}{P(O|\overline{\lambda})}$$
-
$a_{ij}$
$$\sum_I\left(\sum_{t=1}^{T-1}\log a_{i_{t},i_{t+1}}\right)P(O,I|\overline{\lambda})=\sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{t=1}^{T-1} \log a_{i j} P\left(O, i_{t}=i, i_{t+1}=j \mid \bar{\lambda}\right)$$
The constraints are $\sum_{j=1}^{N}a_{1j}=1$, $\sum_{j=1}^{N}a_{2j}=1$, $\cdots$, $\sum_{j=1}^{N}a_{Nj}=1$, one per row of $A$, so the objective function with Lagrange multipliers $\gamma_i$ is:
$$f(A, \mathbf{\gamma})=\sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{t=1}^{T-1} \log a_{i j} P\left(O, i_{t}=i, i_{t+1}=j \mid \bar{\lambda}\right)+\sum_{i=1}^{N}\gamma_i\left(\sum_{j=1}^{N}a_{ij}-1\right)$$
Setting the partial derivative with respect to $a_{ij}$ to zero and multiplying through by $a_{ij}$:
$$\frac{\partial f(A, \mathbf{\gamma})}{\partial a_{ij}}=0 \;\Rightarrow\; \sum_{t=1}^{T-1}P(O,i_t=i,i_{t+1}=j|\overline{\lambda})+a_{ij}\gamma_i=0$$
Summing over $j$ and using $\sum_{j=1}^{N}a_{ij}=1$:
$$\sum_{j=1}^{N}\left(\sum_{t=1}^{T-1}P(O,i_t=i,i_{t+1}=j|\overline{\lambda})+a_{ij}\gamma_i\right)=0 \;\Rightarrow\; \gamma_i=-\sum_{t=1}^{T-1}P(O,i_t=i|\overline{\lambda})$$
Finally, we get $a_{ij}$:
$$a_{ij}=\frac{\sum_{t=1}^{T-1} P\left(O, i_{t}=i, i_{t+1}=j \mid \bar{\lambda}\right)}{\sum_{t=1}^{T-1} P\left(O, i_{t}=i \mid \bar{\lambda}\right)}$$
-
$b_{j}(k)$
$$\sum_{I}\left(\sum_{t=1}^{T} \log b_{i_{t}}\left(o_{t}\right)\right) P(O, I \mid \bar{\lambda})=\sum_{j=1}^{N} \sum_{t=1}^{T} \log b_{j}\left(o_{t}\right) P\left(O, i_{t}=j \mid \bar{\lambda}\right)$$
The constraints are $\sum_{k=1}^{M}b_1(k)=1$, $\sum_{k=1}^{M}b_2(k)=1$, $\cdots$, $\sum_{k=1}^{M}b_N(k)=1$, and the objective function with Lagrange multipliers $\gamma_j$ is:
$$f(B, \mathbf{\gamma})=\sum_{j=1}^{N} \sum_{t=1}^{T} \log b_{j}\left(o_{t}\right) P\left(O, i_{t}=j \mid \bar{\lambda}\right)+\sum_{j=1}^{N}\gamma_j\left(\sum_{k=1}^{M}b_j(k)-1\right)$$
The key point is that the derivative of $\log b_j(o_t)$ with respect to $b_j(k)$ is nonzero only when $o_t=v_k$; we express this with the indicator $I(o_t=v_k)$. Setting the partial derivative to zero and multiplying through by $b_j(k)$:
$$\frac{\partial f(B, \mathbf{\gamma})}{\partial b_{j}(k)}=0 \;\Rightarrow\; \sum_{t=1}^{T} P\left(O, i_{t}=j \mid \bar{\lambda}\right) I\left(o_{t}=v_{k}\right) + b_j(k)\gamma_j=0$$
Summing over $k$:
$$\sum_{k=1}^{M}\left[\sum_{t=1}^{T} P\left(O, i_{t}=j \mid \bar{\lambda}\right) I\left(o_{t}=v_{k}\right) + b_j(k)\gamma_j\right]=0$$
For each $t$, exactly one $k$ satisfies $I(o_{t}=v_{k})=1$, and $\sum_{k=1}^{M}b_j(k)=1$, which means
$$\sum_{t=1}^{T} P\left(O, i_{t}=j \mid \bar{\lambda}\right)+\gamma_j=0 \;\Rightarrow\; \gamma_j=-\sum_{t=1}^{T} P\left(O, i_{t}=j \mid \bar{\lambda}\right)$$
Substituting this $\gamma_j$ back into the derivative condition gives the final result:
$$b_{j}(k)=\frac{\sum_{t=1}^{T} P\left(O, i_{t}=j \mid \bar{\lambda}\right) I\left(o_{t}=v_{k}\right)}{\sum_{t=1}^{T} P\left(O, i_{t}=j \mid \bar{\lambda}\right)}$$
-
Conclusions
Define the posteriors $\gamma_t(i)=\frac{P(O,i_t=i|\bar{\lambda})}{P(O|\bar{\lambda})}$ and $\xi_t(i,j)=\frac{P(O,i_t=i,i_{t+1}=j|\bar{\lambda})}{P(O|\bar{\lambda})}$ (this $\gamma_t$ is a posterior probability, not the Lagrange multiplier used above). Dividing the numerators and denominators of the M-step results by $P(O|\bar{\lambda})$ gives:
-
$$a_{i j}=\frac{\sum_{t=1}^{T-1} \xi_{t}(i, j)}{\sum_{t=1}^{T-1} \gamma_{t}(i)}$$
-
$$b_{j}(k)=\frac{\sum_{t=1, o_{t}=v_{k}}^{T} \gamma_{t}(j)}{\sum_{t=1}^{T} \gamma_{t}(j)}$$
-
$$\pi_{i}=\gamma_{1}(i)$$
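The three M-step results can be checked numerically on a tiny model. The sketch below accumulates the joint probabilities $P(O,i_1=i|\bar{\lambda})$, $P(O,i_t=i,i_{t+1}=j|\bar{\lambda})$, and $P(O,i_t=j|\bar{\lambda})I(o_t=v_k)$ by enumerating every state path, then normalizes them exactly as in the derivation. The function names and toy parameter values are assumptions made for illustration, and the enumeration is exponential in $T$, so this is a verification aid rather than a practical implementation.

```python
import itertools
import numpy as np

def joint_prob(obs, path, A, B, pi):
    """P(O, I | lambda_bar) for one state path I."""
    p = pi[path[0]] * B[path[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]
    return p

def m_step_by_enumeration(obs, A, B, pi):
    """One M step computed directly from the joint probabilities."""
    N, M = B.shape
    T = len(obs)
    init = np.zeros(N)        # P(O, i_1 = i | lambda_bar)
    pair = np.zeros((N, N))   # sum_{t<T} P(O, i_t = i, i_{t+1} = j | lambda_bar)
    emit = np.zeros((N, M))   # sum_t P(O, i_t = j | lambda_bar) * I(o_t = v_k)
    for path in itertools.product(range(N), repeat=T):
        p = joint_prob(obs, path, A, B, pi)
        init[path[0]] += p
        for t in range(T - 1):
            pair[path[t], path[t + 1]] += p
        for t in range(T):
            emit[path[t], obs[t]] += p
    new_pi = init / init.sum()                      # init.sum() = P(O | lambda_bar)
    new_A = pair / pair.sum(axis=1, keepdims=True)  # row sum = sum_{t<T} P(O, i_t = i)
    new_B = emit / emit.sum(axis=1, keepdims=True)  # row sum = sum_t P(O, i_t = j)
    return new_A, new_B, new_pi

# Toy model with assumed values: 2 states, 2 symbols.
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.1, 0.9]])
pi = np.array([0.6, 0.4])
obs = [0, 1, 1, 0]

new_A, new_B, new_pi = m_step_by_enumeration(obs, A, B, pi)
```

Each row of `new_A` and `new_B` and the vector `new_pi` sum to 1, which is what the Lagrange constraints enforce.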
Baum-Welch Algorithm
-
Input: $O=(o_1,o_2,\cdots, o_T)$, the observed sequence of length $T$.
-
Output: $\lambda=(A,B,\pi)$
Alg:
-
Initialization: For $n=0$, we randomly choose $a_{i j}^{(0)}, b_{j}(k)^{(0)}, \pi_{i}^{(0)}$ to form our initial model $\lambda^{(0)}=\left(A^{(0)}, B^{(0)}, \pi^{(0)}\right)$
-
Recursion: for $n=0,1,2,\cdots$, compute $\xi_t(i,j)$ and $\gamma_t(i)$ under the current model $\lambda^{(n)}$, then update:
$$a_{i j}^{(n+1)}=\frac{\sum_{t=1}^{T-1} \xi_{t}(i, j)}{\sum_{t=1}^{T-1} \gamma_{t}(i)}$$
$$b_{j}(k)^{(n+1)}=\frac{\sum_{t=1, o_{t}=v_{k}}^{T} \gamma_{t}(j)}{\sum_{t=1}^{T} \gamma_{t}(j)}$$
$$\pi_{i}^{(n+1)}=\gamma_{1}(i)$$
-
Stop: once a chosen stopping criterion is met, we stop the algorithm and take the last result $\lambda^{(n+1)}=\left(A^{(n+1)}, B^{(n+1)}, \pi^{(n+1)}\right)$ as our HMM model parameters.
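The steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not a reference implementation. It computes $\gamma_t(i)$ and $\xi_t(i,j)$ with the standard forward-backward recursions (not derived in this article), handles a single observation sequence, and stops when the log-likelihood gain falls below a tolerance `tol`, which is one possible choice of stopping criterion. The function names, the initial values, and the toy data are hypothetical. The recursions are unscaled, so they would underflow on long sequences.

```python
import numpy as np

def forward_backward(obs, A, B, pi):
    """Unscaled forward/backward variables and P(O | lambda)."""
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return alpha, beta, alpha[-1].sum()

def baum_welch(obs, A, B, pi, max_iter=100, tol=1e-8):
    """Iterate the re-estimation formulas; returns (A, B, pi, log-likelihoods)."""
    obs = np.asarray(obs)
    M = B.shape[1]
    log_liks = []
    for _ in range(max_iter):
        alpha, beta, p_obs = forward_backward(obs, A, B, pi)
        log_liks.append(np.log(p_obs))
        # Assumed stopping criterion: negligible log-likelihood improvement.
        if len(log_liks) > 1 and log_liks[-1] - log_liks[-2] < tol:
            break
        gamma = alpha * beta / p_obs                     # gamma[t, i]
        emis_next = (B[:, obs[1:]].T * beta[1:])[:, None, :]
        xi = alpha[:-1, :, None] * A[None] * emis_next / p_obs  # xi[t, i, j]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        B = np.stack([gamma[obs == k].sum(axis=0) for k in range(M)], axis=1)
        B = B / gamma.sum(axis=0)[:, None]
        pi = gamma[0]
    return A, B, pi, log_liks

# Toy run: 2 hidden states, 3 symbols (all values are illustrative assumptions).
A0 = np.array([[0.6, 0.4], [0.3, 0.7]])
B0 = np.array([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])
pi0 = np.array([0.5, 0.5])
obs = [0, 0, 1, 2, 2, 1, 0, 2, 2, 1, 0, 0]

A_hat, B_hat, pi_hat, log_liks = baum_welch(obs, A0, B0, pi0)
```

Because each iteration is an EM step, the values in `log_liks` should never decrease, which makes that list a convenient diagnostic when experimenting with different initializations.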