Code: https://gitee.com/liangcd/speech_learning/tree/master/HMM
For HMM basics, probability computation, and the prediction (decoding) algorithm, see the previous article. Thanks for reading!
Learning algorithms
Given an observation sequence $O=(o_1, o_2, \ldots, o_T)$ with emission densities $b_j(o_t)=\sum_{m=1}^{M} c_{jm} \mathcal{N}(o_t; \mu_{jm}, \Sigma_{jm})$, estimate the GMM-HMM parameters $\lambda$ that maximize $P(O \mid \lambda)$. The parameters $\lambda$ consist of:
- the initial state probability vector $\pi=(\pi_i)$
- the transition probability matrix $A=[a_{ij}]_{N\times N}$
- the GMM parameters $(c_{jm},\mu_{jm},\Sigma_{jm})$ of each state $j$, where $j=1,2,\ldots,N$ and $m=1,\ldots,M$ indexes the GMM components.
Viterbi learning algorithm
Suppose the state-observation alignment is known, i.e. each observation $o_t$ is assigned to one concrete state. Such an alignment can be produced by the Viterbi decoding algorithm or by manual annotation. Knowing the state of every observation:

$\pi_i$ is obtained by maximum likelihood estimation:
- let $C(i)$ be the number of sequences whose initial state is $i$
- $\hat{\pi}_{i}=\frac{C(i)}{\sum_{k} C(k)}$

$a_{ij}$ is likewise obtained by maximum likelihood estimation:
- let $C(i \rightarrow j)$ be the number of transitions from state $i$ to state $j$
- $\hat{a}_{ij}=\frac{C(i \rightarrow j)}{\sum_{k} C(i \rightarrow k)}$

The alignment also yields, for each state $j$, the set of observations $Z_j=(y_1, y_2, \ldots)$ assigned to it; since each state owns one GMM, this is also the observation set of that GMM.
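These counting estimates can be sketched in a few lines; the alignments below are made-up toy data, not from the repository:

```python
import numpy as np

# Hypothetical state alignments for three utterances, states numbered 0..N-1
alignments = [[0, 0, 1, 2], [0, 1, 1, 2], [1, 1, 2, 2]]
N = 3

init_counts = np.zeros(N)
trans_counts = np.zeros((N, N))
for seq in alignments:
    init_counts[seq[0]] += 1                 # C(i): sequence starts in state i
    for s, s_next in zip(seq[:-1], seq[1:]):
        trans_counts[s, s_next] += 1         # C(i -> j)

pi_hat = init_counts / init_counts.sum()     # here [2/3, 1/3, 0]
a_hat = trans_counts / trans_counts.sum(axis=1, keepdims=True)
```

Each row of `a_hat` is a probability distribution over next states (a state that never occurs would need smoothing to avoid division by zero).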
Question: the Viterbi algorithm needs model parameters $\lambda$ to produce an alignment, so where does the initial $\lambda$ come from?
When the GMM has a single component, $b_{j}(o_{t})=\mathcal{N}(o_{t}; \mu_{j}, \Sigma_{j})$. Let $|Z_j|$ denote the number of elements of $Z_j$; then:
$$\hat{\mu}_{j}=\frac{\sum_{o_{t} \in Z_{j}} o_{t}}{\left|Z_{j}\right|}$$
$$\hat{\Sigma}_{j}=\frac{\sum_{o_{t} \in Z_{j}}\left(o_{t}-\hat{\mu}_{j}\right)\left(o_{t}-\hat{\mu}_{j}\right)^{T}}{\left|Z_{j}\right|}$$
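For a single-Gaussian state these are just the sample mean and the (biased, ML) covariance of the aligned frames; a minimal sketch with made-up 2-dimensional features:

```python
import numpy as np

# Hypothetical observations aligned to state j (one row per frame)
Z_j = np.array([[1.0, 2.0],
                [3.0, 2.0],
                [2.0, 5.0]])

mu_hat = Z_j.mean(axis=0)             # [2., 3.]
diff = Z_j - mu_hat
sigma_hat = diff.T @ diff / len(Z_j)  # ML estimate: divide by |Z_j|, not |Z_j| - 1
```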
When the GMM has multiple components, $b_{j}(o_{t})=\sum_{m=1}^{M} c_{jm} \mathcal{N}(o_{t}; \mu_{jm}, \Sigma_{jm})$, and the parameters can be updated with the EM algorithm (see the previous article).
Viterbi learning algorithm
- Initialize the GMM-HMM parameters $\lambda=(\pi_i, a_{ij}, \text{GMM parameters})$, where the GMM parameters of state $j$ are $(c_{jm},\mu_{jm},\Sigma_{jm})$
- Run the Viterbi algorithm with the current $\lambda$ to obtain the state-observation alignment, i.e. the hidden state of every observation
- Update the parameters $\lambda$:
  - $\hat{\pi}_{i}=\frac{C(i)}{\sum_{k} C(k)}$, where $C(i)$ is the number of sequences with initial state $i$
  - $\hat{a}_{ij}=\frac{C(i \rightarrow j)}{\sum_{k} C(i \rightarrow k)}$, where $C(i \rightarrow j)$ is the number of transitions from state $i$ to state $j$
  - update the GMM parameters with the EM algorithm described in the previous article
- Repeat steps 2-3 until convergence
Baum-Welch learning algorithm
First consider the single-component GMM case.
Viterbi learning is an approximation: it only considers the single best alignment path. In reality, at each time $t$ every state $j$ is occupied with some probability, rather than by a hard alignment (one state per time step).
State occupancy probability: given the model $\lambda$ and observations $O$, the probability of being in state $q_i$ at time $t$ is $\gamma_t(i)$:
$$\gamma_{t}(i)=P\left(i_{t}=q_{i} \mid O, \lambda\right)=\frac{\alpha_{t}(i) \beta_{t}(i)}{\sum_{i=1}^{N} \alpha_{T}(i)}$$
This probability can be used in an EM algorithm to learn the parameters $\lambda$:
- E-step: estimate the state occupancy probabilities
- M-step: re-estimate (maximize over) the parameters $\lambda$ based on the estimated occupancy probabilities
Question: prove that $\gamma_{t}(i)=P\left(i_{t}=q_{i} \mid O, \lambda\right)=\frac{\alpha_{t}(i) \beta_{t}(i)}{\sum_{i=1}^{N} \alpha_{T}(i)}$.
Proof: omitting $\lambda$ for brevity, write $o_1^t = o_1, o_2, \ldots, o_t$ and $O = o_1^T$.
$$\begin{aligned} P\left(i_{t}=q_{i}, O \mid \lambda\right) &=P\left(i_{t}=q_{i}, o_{1}^{t}, o_{t+1}^{T}\right) \\ &=P\left(i_{t}=q_{i}, o_{1}^{t}\right) P\left(o_{t+1}^{T} \mid i_{t}=q_{i}, o_{1}^{t}\right) \\ &=P\left(i_{t}=q_{i}, o_{1}^{t}\right) P\left(o_{t+1}^{T} \mid i_{t}=q_{i}\right) \\ &=\alpha_{t}(i) \beta_{t}(i) \\ P(O \mid \lambda) &=\sum_{j=1}^{N} \alpha_{T}(j) \\ P\left(i_{t}=q_{i} \mid O, \lambda\right) &=\frac{P\left(i_{t}=q_{i}, O \mid \lambda\right)}{P(O \mid \lambda)}=\frac{\alpha_{t}(i) \beta_{t}(i)}{\sum_{j=1}^{N} \alpha_{T}(j)} \end{aligned}$$
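The identity can be checked numerically on a small discrete-emission HMM (the toy numbers below are illustrative): since $\sum_i \alpha_t(i)\beta_t(i) = P(O\mid\lambda)$ at every $t$, each $\gamma_t(\cdot)$ must sum to 1.

```python
import numpy as np

# Toy discrete HMM: B[i, o] = b_i(o)
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
O = [0, 1, 0]
T, N = len(O), len(pi)

alpha = np.zeros((T, N))
beta = np.zeros((T, N))
alpha[0] = pi * B[:, O[0]]                   # forward recursion
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
beta[T - 1] = 1.0                            # backward recursion
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])

gamma = alpha * beta / alpha[T - 1].sum()    # gamma[t, i] = gamma_t(i)
print(np.allclose(gamma.sum(axis=1), 1.0))   # True: a distribution at each t
```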
For a given state, summing its occupancy probabilities over all time steps gives a soft count, which replaces the hard count when re-estimating the HMM parameters:
$$\hat{\mu}_{j}=\frac{\sum_{t=1}^{T} \gamma_{t}(j)\, o_{t}}{\sum_{t=1}^{T} \gamma_{t}(j)} \qquad \hat{\Sigma}_{j}=\frac{\sum_{t=1}^{T} \gamma_{t}(j)\left(o_{t}-\hat{\mu}_{j}\right)\left(o_{t}-\hat{\mu}_{j}\right)^{T}}{\sum_{t=1}^{T} \gamma_{t}(j)}$$
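In code, the soft-count update is just a $\gamma$-weighted average; a one-dimensional sketch with made-up occupancy probabilities:

```python
import numpy as np

# Hypothetical 1-D observations and occupancy probs gamma_t(j) for one state j
o = np.array([0.0, 1.0, 2.0, 3.0])
gamma_j = np.array([0.9, 0.5, 0.1, 0.5])

# Weighted mean and (scalar) variance, normalized by the total soft count
mu_hat = np.sum(gamma_j * o) / np.sum(gamma_j)                   # 1.1
var_hat = np.sum(gamma_j * (o - mu_hat) ** 2) / np.sum(gamma_j)  # 1.49
```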
Compare with the Viterbi learning estimates (hard counts):
$$\hat{\mu}_{j}=\frac{\sum_{o_{t} \in Z_{j}} o_{t}}{\left|Z_{j}\right|} \qquad \hat{\Sigma}_{j}=\frac{\sum_{o_{t} \in Z_{j}}\left(o_{t}-\hat{\mu}_{j}\right)\left(o_{t}-\hat{\mu}_{j}\right)^{T}}{\left|Z_{j}\right|}$$
When the GMM has multiple components
When $b_{j}(o_{t})=\sum_{m=1}^{M} c_{jm} \mathcal{N}(o_{t}; \mu_{jm}, \Sigma_{jm})$, analogously define $\zeta_{t}(j, k)$, the probability of being in state $q_j$ and in GMM component $k$ at time $t$, given the model $\lambda$ and observations $O$:
$$\begin{aligned} \zeta_{t}(j, k) &=P\left(i_{t}=q_{j}, m_{t}=k \mid O, \lambda\right) \\ &=\frac{P\left(i_{t}=q_{j}, m_{t}=k, O \mid \lambda\right)}{P(O \mid \lambda)} \\ &=\frac{\sum_{i} \alpha_{t-1}(i) a_{i j} c_{j k} b_{j k}\left(o_{t}\right) \beta_{t}(j)}{\sum_{i=1}^{N} \alpha_{T}(i)} \end{aligned}$$
Then:
$$\begin{aligned} \hat{\mu}_{j k}&=\frac{\sum_{t=1}^{T} \zeta_{t}(j, k)\, o_{t}}{\sum_{t=1}^{T} \zeta_{t}(j, k)} \\ \hat{\Sigma}_{j k}&=\frac{\sum_{t=1}^{T} \zeta_{t}(j, k)\left(o_{t}-\hat{\mu}_{j k}\right)\left(o_{t}-\hat{\mu}_{j k}\right)^{T}}{\sum_{t=1}^{T} \zeta_{t}(j, k)} \\ \hat{c}_{j k}&=\frac{\sum_{t=1}^{T} \zeta_{t}(j, k)}{\sum_{t=1}^{T} \sum_{k} \zeta_{t}(j, k)} \end{aligned}$$
Similarly to the state occupancy probability, define $\xi_{t}(i, j)$, the probability of being in state $q_i$ at time $t$ and in state $q_j$ at time $t+1$, given the model $\lambda$ and observations $O$:
$$\begin{aligned} \xi_{t}(i, j) &=P\left(i_{t}=q_{i}, i_{t+1}=q_{j} \mid O, \lambda\right) \\ &=\frac{P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, O \mid \lambda\right)}{P(O \mid \lambda)} \\ &=\frac{\alpha_{t}(i) a_{i j} b_{j}\left(o_{t+1}\right) \beta_{t+1}(j)}{\sum_{i=1}^{N} \alpha_{T}(i)} \end{aligned}$$
Furthermore,
$$\gamma_{t}(i)=\sum_{k=1}^{N} \xi_{t}(i, k)$$
From these probabilities we obtain the transition and initial probabilities:
$$\begin{aligned} \hat{a}_{i j} &=\frac{\sum_{t=1}^{T-1} \xi_{t}(i, j)}{\sum_{t=1}^{T-1} \sum_{k=1}^{N} \xi_{t}(i, k)}=\frac{\sum_{t=1}^{T-1} \xi_{t}(i, j)}{\sum_{t=1}^{T-1} \gamma_{t}(i)} \\ \hat{\pi}_{i} &=\gamma_{1}(i) \end{aligned}$$
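These quantities can be checked numerically on a toy discrete-emission HMM (illustrative numbers): $\sum_k \xi_t(i,k)$ should recover $\gamma_t(i)$, and each row of $\hat{a}$ should sum to 1.

```python
import numpy as np

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])   # B[i, o] = b_i(o)
O = [0, 1, 0, 1]
T, N = len(O), len(pi)

alpha = np.zeros((T, N))
beta = np.zeros((T, N))
alpha[0] = pi * B[:, O[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
beta[T - 1] = 1.0
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
P_O = alpha[T - 1].sum()

gamma = alpha * beta / P_O
xi = np.zeros((T - 1, N, N))             # xi[t, i, j] = xi_t(i, j)
for t in range(T - 1):
    xi[t] = alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])[None, :]
xi /= P_O

a_hat = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
pi_hat = gamma[0]
print(np.allclose(xi.sum(axis=2), gamma[:-1]))  # True: sum_k xi_t(i,k) = gamma_t(i)
```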
Question: prove that
$$\xi_{t}(i, j)=\frac{\alpha_{t}(i) a_{i j} b_{j}\left(o_{t+1}\right) \beta_{t+1}(j)}{\sum_{i=1}^{N} \alpha_{T}(i)}$$
Proof: from the proof of the forward recursion,
$$\alpha_{t}(i) a_{i j} b_{j}\left(o_{t+1}\right)=P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, o_{1}^{t+1} \mid \lambda\right)$$
and from the definition of the backward probability,
$$\beta_{t+1}(j)=P\left(o_{t+2}^{T} \mid i_{t+1}=q_{j}, \lambda\right)$$
$$\begin{aligned} P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, O \mid \lambda\right) &=P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, o_{1}^{t+1}, o_{t+2}^{T} \mid \lambda\right) \\ &=P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, o_{1}^{t+1} \mid \lambda\right) P\left(o_{t+2}^{T} \mid i_{t}=q_{i}, i_{t+1}=q_{j}, o_{1}^{t+1}, \lambda\right) \\ &=P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, o_{1}^{t+1} \mid \lambda\right) P\left(o_{t+2}^{T} \mid i_{t+1}=q_{j}, \lambda\right) \\ &=\alpha_{t}(i) a_{i j} b_{j}\left(o_{t+1}\right) \beta_{t+1}(j) \end{aligned}$$
Dividing by $P(O \mid \lambda)=\sum_{i=1}^{N} \alpha_{T}(i)$ gives the claimed expression.
Summary of the Baum-Welch learning algorithm
- Initialize the GMM-HMM parameters $\lambda=\left(\pi_{i}, a_{i j},\left(c_{j m}, \mu_{j m}, \Sigma_{j m}\right)\right)$
- E-step: for all times $t$ and states $i$,
  - recursively compute the forward probabilities $\alpha_{t}(i)$ and backward probabilities $\beta_{t}(i)$
  - compute $\zeta_{t}(j, k)=\frac{\sum_{i} \alpha_{t-1}(i) a_{i j} c_{j k} b_{j k}\left(o_{t}\right) \beta_{t}(j)}{\sum_{i=1}^{N} \alpha_{T}(i)}$, $\xi_{t}(i, j)=\frac{\alpha_{t}(i) a_{i j} b_{j}\left(o_{t+1}\right) \beta_{t+1}(j)}{\sum_{i=1}^{N} \alpha_{T}(i)}$, and $\gamma_{t}(i)=\sum_{k=1}^{N} \xi_{t}(i, k)$
- M-step: update the parameters
$$\begin{aligned} \hat{\mu}_{j k} &=\frac{\sum_{t=1}^{T} \zeta_{t}(j, k)\, o_{t}}{\sum_{t=1}^{T} \zeta_{t}(j, k)} \\ \hat{\Sigma}_{j k} &=\frac{\sum_{t=1}^{T} \zeta_{t}(j, k)\left(o_{t}-\hat{\mu}_{j k}\right)\left(o_{t}-\hat{\mu}_{j k}\right)^{T}}{\sum_{t=1}^{T} \zeta_{t}(j, k)} \\ \hat{c}_{j k} &=\frac{\sum_{t=1}^{T} \zeta_{t}(j, k)}{\sum_{t=1}^{T} \sum_{k} \zeta_{t}(j, k)} \\ \hat{a}_{i j} &=\frac{\sum_{t=1}^{T-1} \xi_{t}(i, j)}{\sum_{t=1}^{T-1} \gamma_{t}(i)} \\ \hat{\pi}_{i} &=\gamma_{1}(i) \end{aligned}$$
- Repeat the E and M steps until convergence
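To keep a runnable illustration short, here is one Baum-Welch (E+M) iteration for the simpler discrete-emission HMM (observation symbols instead of GMM emissions; the GMM case adds the per-component statistics $\zeta_t(j,k)$ on top of this skeleton). The function name and toy usage are illustrative:

```python
import numpy as np

def baum_welch_step(O, pi, A, B):
    """One EM iteration for a discrete-emission HMM; returns updated
    parameters and the likelihood P(O | lambda) under the *input* model."""
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                        # forward pass
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    beta[T - 1] = 1.0                                 # backward pass
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    P_O = alpha[T - 1].sum()

    # E-step: occupancy and transition posteriors
    gamma = alpha * beta / P_O
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])[None, :]
    xi /= P_O

    # M-step: soft-count re-estimation
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    obs = np.asarray(O)
    for o in range(B.shape[1]):
        new_B[:, o] = gamma[obs == o].sum(axis=0) / gamma.sum(axis=0)
    return new_pi, new_A, new_B, P_O

# EM never decreases the likelihood:
pi = np.array([0.2, 0.4, 0.4])
A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
O = [0, 1, 0, 0, 1]
liks = []
for _ in range(5):
    pi, A, B, lik = baum_welch_step(O, pi, A, B)
    liks.append(lik)
```

With multiple training sequences, the soft counts are accumulated across utterances before the M-step.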
Python implementation
Consider the box-and-ball model $\lambda=(A, B, \pi)$ with state set $Q=\{1,2,3\}$ and observation set $V=\{\text{red}, \text{white}\}$:
$$A=\left[\begin{array}{lll}0.5 & 0.2 & 0.3 \\ 0.3 & 0.5 & 0.2 \\ 0.2 & 0.3 & 0.5\end{array}\right], \quad B=\left[\begin{array}{cc}0.5 & 0.5 \\ 0.4 & 0.6 \\ 0.7 & 0.3\end{array}\right], \quad \pi=(0.2,0.4,0.4)^{T}$$
Let $T=3$ and $O=(\text{red}, \text{white}, \text{red})$.
- Implement the forward and backward algorithms and compute $P(O \mid \lambda)$ with each
- Implement the Viterbi algorithm and find the optimal state sequence (optimal path)
Forward algorithm:
- Initialization: $\alpha_1(i) = \pi_i b_i(o_1),\ i=1,2,\ldots,N$
- Recursion: for $t=1,2,\ldots,T-1$,
$$\alpha_{t+1}(i)=\left[\sum_{j=1}^{N} \alpha_{t}(j) a_{j i}\right] b_{i}\left(o_{t+1}\right)$$
- Termination: $P(O \mid \lambda)=\sum_{i=1}^{N} \alpha_{T}(i)$
import numpy as np

def forward_algorithm(O, HMM_model):
    """HMM Forward Algorithm.
    Args:
        O: (o1, o2, ..., oT), observations
        HMM_model: (pi, A, B), (init state prob, transition prob, emitting prob)
    Return:
        prob: the probability of HMM_model generating O.
    """
    pi, A, B = map(np.asarray, HMM_model)
    T = len(O)
    N = len(pi)
    alphas = np.zeros((N, T))
    alphas[:, 0] = pi * B[:, O[0]]                 # initialization
    for t in range(1, T):
        for i in range(N):
            # alpha_{t+1}(i) = [sum_j alpha_t(j) * a_ji] * b_i(o_{t+1})
            alphas[i, t] = np.dot(alphas[:, t - 1], A[:, i]) * B[i, O[t]]
    prob = np.sum(alphas[:, T - 1])                # termination
    return prob
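As an independent sanity check, the same forward recursion written directly in matrix form for the box-and-ball example gives $P(O \mid \lambda) \approx 0.130218$:

```python
import numpy as np

A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
pi = np.array([0.2, 0.4, 0.4])
O = [0, 1, 0]  # red, white, red

alpha = pi * B[:, O[0]]
for o in O[1:]:
    alpha = (alpha @ A) * B[:, o]
prob = alpha.sum()  # P(O | lambda) ≈ 0.130218
```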
Backward algorithm:
- Initialization: $\beta_{T}(i) = 1,\ i=1,2,\ldots,N$
- Recursion: for $t=T-1, T-2, \ldots, 1$,
$$\beta_{t}(i)=\sum_{j=1}^{N} a_{i j} b_{j}\left(o_{t+1}\right) \beta_{t+1}(j), \quad i=1,2, \cdots, N$$
- Termination: $P(O \mid \lambda)=\sum_{i=1}^{N} \pi_{i} b_{i}\left(o_{1}\right) \beta_{1}(i)$
import numpy as np

def backward_algorithm(O, HMM_model):
    """HMM Backward Algorithm.
    Args:
        O: (o1, o2, ..., oT), observations
        HMM_model: (pi, A, B), (init state prob, transition prob, emitting prob)
    Return:
        prob: the probability of HMM_model generating O.
    """
    pi, A, B = map(np.asarray, HMM_model)
    T = len(O)
    N = len(pi)
    betas = np.zeros((N, T))
    betas[:, T - 1] = 1.0                          # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):                 # recursion, t = T-1, ..., 1
        for i in range(N):
            betas[i, t] = np.sum(A[i, :] * B[:, O[t + 1]] * betas[:, t + 1])
    prob = np.sum(pi * B[:, O[0]] * betas[:, 0])   # termination
    return prob
Viterbi algorithm:
Recursion:
$$\begin{aligned} \delta_{t+1}(i) &=\max _{i_{1}, i_{2}, \cdots, i_{t}} P\left(i_{t+1}=i, i_{t}, \cdots, i_{1}, o_{t+1}, \cdots, o_{1} \mid \lambda\right) \\ &=\max _{1 \leqslant j \leqslant N}\left[\delta_{t}(j) a_{j i}\right] b_{i}\left(o_{t+1}\right), \quad i=1,2, \cdots, N ;\ t=1,2, \cdots, T-1 \end{aligned}$$
$$\psi_{t}(i)=\arg \max _{1 \leqslant j \leqslant N}\left[\delta_{t-1}(j) a_{j i}\right], \quad i=1,2, \cdots, N$$
import numpy as np

def Viterbi_algorithm(O, HMM_model):
    """Viterbi decoding.
    Args:
        O: (o1, o2, ..., oT), observations
        HMM_model: (pi, A, B), (init state prob, transition prob, emitting prob)
    Returns:
        best_prob: the probability of the best state sequence
        best_path: the best state sequence
    """
    pi, A, B = map(np.asarray, HMM_model)
    T = len(O)
    N = len(pi)
    deltas = np.zeros((T, N))
    nodes = np.zeros((T, N), dtype=int)            # back-pointers psi_t(i)
    deltas[0] = pi * B[:, O[0]]                    # initialization
    for t in range(1, T):
        for i in range(N):
            tmp = deltas[t - 1] * A[:, i]
            nodes[t, i] = np.argmax(tmp)
            deltas[t, i] = tmp[nodes[t, i]] * B[i, O[t]]
    best_path = np.zeros(T, dtype=int)
    best_path[T - 1] = np.argmax(deltas[T - 1])
    best_prob = deltas[T - 1, best_path[T - 1]]
    for t in range(T - 2, -1, -1):                 # backtracking
        best_path[t] = nodes[t + 1, best_path[t + 1]]
    # states are numbered from 1, Python indices from 0
    best_path = [int(s) + 1 for s in best_path]
    return best_prob, best_path
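A compact matrix-form check of the decoder on the box-and-ball example (it should agree with the function above): the optimal path is $(3, 3, 3)$ with probability $0.0147$.

```python
import numpy as np

A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
pi = np.array([0.2, 0.4, 0.4])
O = [0, 1, 0]  # red, white, red

delta = pi * B[:, O[0]]
psi = []
for o in O[1:]:
    scores = delta[:, None] * A          # scores[j, i] = delta_t(j) * a_ji
    psi.append(scores.argmax(axis=0))    # psi_{t+1}(i)
    delta = scores.max(axis=0) * B[:, o]

path = [int(delta.argmax())]             # backtrack from the best final state
for back in reversed(psi):
    path.insert(0, int(back[path[0]]))
best_states = [s + 1 for s in path]      # 1-based states: [3, 3, 3]
best_prob = delta.max()                  # 0.0147
```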