Statistical Learning Methods - Hidden Markov Model - Formula Derivations
State sequence probability P(I|\lambda)
The probability P(I|\lambda) of a state sequence I = (i_1, i_2, \cdots, i_T) of length T can be expressed as
P(I|\lambda) = \pi_{i_1} a_{i_1 i_2} a_{i_2 i_3} \cdots a_{i_{T-1} i_T} \tag{1}
The derivation is as follows:
step1:
P(I|\lambda) = P(i_1, i_2, \cdots, i_T | \lambda)
From this equation, first note that the probability P(I|\lambda) is in essence the joint probability of the states I = (i_1, i_2, \cdots, i_T), so it can be written in the form on the right-hand side.
step2:
By the relation between joint and conditional probabilities
P(AB) = P(A|B)P(B) \tag{2}
the probability P(i_1, i_2, \cdots, i_T | \lambda) can be expanded as
P(i_1, i_2, \cdots, i_T | \lambda) = P(i_T | i_{T-1}, \cdots, i_1, \lambda) P(i_{T-1}, i_{T-2}, \cdots, i_1 | \lambda) \\
P(i_{T-1}, i_{T-2}, \cdots, i_1 | \lambda) = P(i_{T-1} | i_{T-2}, \cdots, i_1, \lambda) P(i_{T-2}, i_{T-3}, \cdots, i_1 | \lambda) \\
\cdots \\
P(i_2, i_1 | \lambda) = P(i_2 | i_1, \lambda) P(i_1 | \lambda) \tag{3}
Applying equation (2) repeatedly in this way, the probability P(i_1, i_2, \cdots, i_T | \lambda) expands to
P(i_1, i_2, \cdots, i_T | \lambda) = P(i_T | i_{T-1}, \cdots, i_1, \lambda) P(i_{T-1} | i_{T-2}, \cdots, i_1, \lambda) \cdots P(i_2 | i_1, \lambda) P(i_1 | \lambda) \tag{4}
step3:
By its definition, the hidden Markov model rests on two basic assumptions.
One of them is the homogeneous Markov assumption, which can be stated as follows:
Homogeneous Markov assumption: the state of the hidden Markov chain at any time t depends only on the state at the previous time; it is independent of the states and observations at all other times, and of the time t itself.
P(i_t | i_{t-1}, o_{t-1}, \cdots, i_1, o_1) = P(i_t | i_{t-1}), \quad t = 1, 2, \cdots, T \tag{5}
The above follows Statistical Learning Methods (Li Hang).
By the homogeneity of the hidden Markov chain:
P(i_T | i_{T-1}, \cdots, i_1, \lambda) = P(i_T | i_{T-1}, \lambda) \\
P(i_{T-1} | i_{T-2}, \cdots, i_1, \lambda) = P(i_{T-1} | i_{T-2}, \lambda) \\
\cdots \\
P(i_3 | i_2, i_1, \lambda) = P(i_3 | i_2, \lambda) \tag{6}
step4:
Substituting equation (6) into equation (4) gives
P(I|\lambda) = P(i_1, i_2, \cdots, i_T | \lambda) = P(i_T | i_{T-1}, \lambda) P(i_{T-1} | i_{T-2}, \lambda) \cdots P(i_2 | i_1, \lambda) P(i_1 | \lambda) \tag{7}
Therefore, since P(i_1 | \lambda) = \pi_{i_1} and P(i_t | i_{t-1}, \lambda) = a_{i_{t-1} i_t}, equation (7) reduces to
P(I|\lambda) = \pi_{i_1} a_{i_1 i_2} a_{i_2 i_3} \cdots a_{i_{T-1} i_T} \tag{8}
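The factorization in equation (8) is easy to check numerically. A minimal sketch in Python; the initial distribution pi and transition matrix A below are hypothetical parameters invented for illustration, not values from the text:

```python
import numpy as np

# Hypothetical 3-state HMM parameters (illustrative only, not from the text)
pi = np.array([0.2, 0.4, 0.4])           # initial state distribution pi_i
A = np.array([[0.5, 0.2, 0.3],           # transition probabilities a_{ij}
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])

def state_sequence_prob(I, pi, A):
    """P(I | lambda) = pi_{i_1} * a_{i_1 i_2} * ... * a_{i_{T-1} i_T}, equation (8)."""
    p = pi[I[0]]
    for prev, cur in zip(I[:-1], I[1:]):
        p *= A[prev, cur]
    return p

print(state_sequence_prob([0, 1, 2], pi, A))  # pi_0 * a_{01} * a_{12}
```

As a sanity check of the factorization, summing this quantity over all N^T state sequences of length T yields 1.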
Observation sequence probability given the state sequence P(O|I,\lambda)
For a fixed state sequence I = (i_1, i_2, \cdots, i_T), the probability P(O|I,\lambda) of the corresponding observation sequence O = (o_1, o_2, \cdots, o_T) is
P(O|I,\lambda) = b_{i_1}(o_1) b_{i_2}(o_2) \cdots b_{i_T}(o_T) \tag{2.1}
The derivation is as follows:
step1:
P(O|I,\lambda) = P(o_1, o_2, \cdots, o_T | I, \lambda) \tag{2.2}
From this equation, first note that the probability P(O|I,\lambda) is in essence the joint probability of the observations O = (o_1, o_2, \cdots, o_T) given the state sequence, so it can be written in the form on the right-hand side.
step2:
By the relation between joint and conditional probabilities
P(AB) = P(A|B)P(B)
the probability P(o_1, o_2, \cdots, o_T | I, \lambda) can be expanded as
P(o_1, o_2, \cdots, o_T | I, \lambda) = P(o_T | o_{T-1}, o_{T-2}, \cdots, o_1, I, \lambda) P(o_{T-1}, o_{T-2}, \cdots, o_1 | I, \lambda) \\
P(o_{T-1}, o_{T-2}, \cdots, o_1 | I, \lambda) = P(o_{T-1} | o_{T-2}, o_{T-3}, \cdots, o_1, I, \lambda) P(o_{T-2}, o_{T-3}, \cdots, o_1 | I, \lambda) \\
\cdots \\
P(o_2, o_1 | I, \lambda) = P(o_2 | o_1, I, \lambda) P(o_1 | I, \lambda) \tag{2.3}
Substituting the above into equation (2.2) expands the probability P(o_1, o_2, \cdots, o_T | I, \lambda) to
P(o_1, o_2, \cdots, o_T | I, \lambda) = P(o_T | o_{T-1}, o_{T-2}, \cdots, o_1, I, \lambda) P(o_{T-1} | o_{T-2}, o_{T-3}, \cdots, o_1, I, \lambda) \cdots P(o_2 | o_1, I, \lambda) P(o_1 | I, \lambda) \tag{2.4}
step3:
By the definition of the hidden Markov model, its other basic assumption is the observation independence assumption, which can be stated as follows:
Observation independence assumption: the observation at any time depends only on the state of the Markov chain at that time; it is independent of the other states and observations.
P(o_t | i_T, o_T, \cdots, i_{t+1}, o_{t+1}, i_t, i_{t-1}, o_{t-1}, \cdots, i_1, o_1) = P(o_t | i_t), \quad t = 1, 2, \cdots, T \tag{2.5}
The above follows Statistical Learning Methods (Li Hang).
By the observation independence of the hidden Markov model:
P(o_T | o_{T-1}, o_{T-2}, \cdots, o_1, I, \lambda) = P(o_T | i_T, \lambda) \\
P(o_{T-1} | o_{T-2}, o_{T-3}, \cdots, o_1, I, \lambda) = P(o_{T-1} | i_{T-1}, \lambda) \\
\cdots \\
P(o_2 | o_1, I, \lambda) = P(o_2 | i_2, \lambda) \tag{2.6}
step4:
Substituting equation (2.6) into equation (2.4) gives
P(O|I,\lambda) = P(o_1, o_2, \cdots, o_T | I, \lambda) = P(o_T | i_T, \lambda) P(o_{T-1} | i_{T-1}, \lambda) \cdots P(o_2 | i_2, \lambda) P(o_1 | i_1, \lambda) \tag{2.7}
Therefore, since P(o_t | i_t, \lambda) = b_{i_t}(o_t), equation (2.7) reduces to
P(O|I,\lambda) = b_{i_1}(o_1) b_{i_2}(o_2) \cdots b_{i_T}(o_T) \tag{2.8}
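Equation (2.8) is just a product of emission probabilities. A minimal sketch; the emission matrix B below, with B[j, k] playing the role of b_j(k), is a hypothetical parameter invented for illustration:

```python
import numpy as np

# Hypothetical emission matrix for 3 states and 2 observation symbols
# (illustrative only, not from the text); B[j, k] stands for b_j(k).
B = np.array([[0.5, 0.5],
              [0.4, 0.6],
              [0.7, 0.3]])

def obs_given_states_prob(O, I, B):
    """P(O | I, lambda) = b_{i_1}(o_1) * ... * b_{i_T}(o_T), equation (2.8)."""
    p = 1.0
    for state, obs in zip(I, O):
        p *= B[state, obs]
    return p

print(obs_given_states_prob([0, 1, 0], [0, 1, 2], B))  # b_0(0) * b_1(1) * b_2(0)
```

For a fixed I, summing this over all possible observation sequences of length T yields 1, mirroring the check for equation (8).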
Complete-data probability P(O, I|\lambda)
The complete data are defined as \{I, O\}, where I is the state sequence and O is the observation sequence.
The probability P(O, I|\lambda) of the complete data is
P(O, I|\lambda) = P(O|I, \lambda) P(I|\lambda) = \pi_{i_1} b_{i_1}(o_1) a_{i_1 i_2} \cdots a_{i_{T-1} i_T} b_{i_T}(o_T)
The derivation is as follows:
The result follows directly from equations (1) and (2.1).
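Multiplying the two factorizations term by term gives the complete-data probability. A minimal sketch with hypothetical parameters pi, A, B (invented for illustration, not from the text):

```python
import numpy as np

# Hypothetical HMM parameters (illustrative only, not from the text)
pi = np.array([0.2, 0.4, 0.4])
A = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5],
              [0.4, 0.6],
              [0.7, 0.3]])

def complete_data_prob(O, I, pi, A, B):
    """P(O, I | lambda) = pi_{i_1} b_{i_1}(o_1) a_{i_1 i_2} b_{i_2}(o_2) ... a_{i_{T-1} i_T} b_{i_T}(o_T)."""
    p = pi[I[0]] * B[I[0], O[0]]
    for t in range(1, len(I)):
        p *= A[I[t - 1], I[t]] * B[I[t], O[t]]
    return p
```

For example, complete_data_prob([0, 1, 0], [0, 1, 2], pi, A, B) computes pi_0 b_0(0) a_{01} b_1(1) a_{12} b_2(0), interleaving transition and emission factors exactly as in the formula above.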
Observation sequence probability with unknown state sequence P(O|\lambda)
The observation sequence probability P(O|\lambda) can be expanded as
P(O|\lambda) = \sum\limits_{I} P(O|I, \lambda) P(I|\lambda) = \sum\limits_{i_1, i_2, \cdots, i_T} \pi_{i_1} b_{i_1}(o_1) a_{i_1 i_2} \cdots a_{i_{T-1} i_T} b_{i_T}(o_T)
The derivation is as follows:
step1:
By the law of total probability, the observation sequence probability P(O|\lambda) expands as
P(O|\lambda) = \sum\limits_{I} P(O, I|\lambda) = \sum\limits_{I} P(O|I, \lambda) P(I|\lambda)
step2:
Substituting equations (1) and (2.1) then gives the result.
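For small models the sum over all state sequences can be evaluated by direct enumeration. A brute-force sketch with hypothetical parameters (invented for illustration, not from the text); note that this direct sum costs on the order of T N^T operations, which is why the forward algorithm is preferred in practice:

```python
import itertools
import numpy as np

# Hypothetical HMM parameters (illustrative only, not from the text)
pi = np.array([0.2, 0.4, 0.4])
A = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5],
              [0.4, 0.6],
              [0.7, 0.3]])

def obs_prob_brute_force(O, pi, A, B):
    """P(O | lambda) = sum over all state sequences I of P(O, I | lambda)."""
    N, T = len(pi), len(O)
    total = 0.0
    for I in itertools.product(range(N), repeat=T):
        p = pi[I[0]] * B[I[0], O[0]]           # pi_{i_1} b_{i_1}(o_1)
        for t in range(1, T):
            p *= A[I[t - 1], I[t]] * B[I[t], O[t]]  # a_{i_{t-1} i_t} b_{i_t}(o_t)
        total += p
    return total
```

As a cross-check, the same value is produced by the forward recursion alpha_{t+1} = (alpha_t A) elementwise-times b(o_{t+1}), which needs only O(N^2 T) operations.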