- markov chains
Also called the observable Markov model. Like the HMM, it is a weighted finite-state automaton in which the probabilities on all outgoing arcs of each state sum to 1; it represents a kind of probabilistic graphical model. Given an initial state, transition probabilities, and a finite state set, it generates a sequence.
- hmm
Ingredients: a finite state set, transition probabilities, an observation sequence, observation/emission probabilities (the likelihood of an observation given a state), initial/final states, and initial state probabilities.
The three basic problems of HMMs:
Problem 1 (computing likelihood): given an HMM $\lambda = (A, B)$ and an observation sequence $O$, determine the likelihood $P(O \mid \lambda)$.
Problem 2 (decoding): given an observation sequence $O$ and an HMM $\lambda = (A, B)$, determine the best hidden state sequence $Q$.
Problem 3 (learning): given an observation sequence $O$ and the finite set of hidden states of the HMM, learn the HMM parameters $A$ and $B$.
- computing likelihood: the HMM forward algorithm
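The forward algorithm solves Problem 1 in $O(N^2 T)$ time by summing path probabilities incrementally instead of enumerating all $N^T$ state sequences. A minimal sketch in plain Python; the 2-state HMM at the bottom is invented purely for illustration:

```python
# Forward algorithm (Problem 1): compute P(O | lambda) in O(N^2 * T)
# by carrying the vector alpha_t forward one observation at a time.

def forward(obs, pi, A, B):
    """P(obs | lambda) for an N-state HMM.

    pi[i]   : initial probability of state i (stands in for a_{0i})
    A[i][j] : transition probability from state i to state j
    B[j][o] : probability of emitting observation o in state j
    """
    N = len(pi)
    # Initialization: alpha_1(j) = pi_j * b_j(o_1)
    alpha = [pi[j] * B[j][obs[0]] for j in range(N)]
    # Recursion: alpha_t(j) = sum_i alpha_{t-1}(i) * a_ij * b_j(o_t)
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
    # Termination: sum over the final alphas
    return sum(alpha)

# Invented 2-state toy HMM: states 0/1, observation symbols 0/1/2.
pi = [0.8, 0.2]
A = [[0.6, 0.4], [0.5, 0.5]]
B = [[0.2, 0.4, 0.4], [0.5, 0.4, 0.1]]
likelihood = forward([2, 1, 2], pi, A, B)
```

Because every row of `pi`, `A`, and `B` sums to 1, the likelihoods of all observation sequences of a fixed length sum to 1, which makes a handy sanity check.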
- decode: the Viterbi algorithm
Decoding finds the best hidden state sequence for a given observation sequence.
Naive approach: for each possible hidden state sequence, use the forward algorithm to compute the likelihood of the observation sequence given that state sequence, then pick the state sequence with the maximum likelihood. Its time complexity is exponential, so it is impractical.
The most commonly used HMM decoding algorithm is the Viterbi algorithm, a variant of dynamic programming.
$v_t(j)$ denotes the probability that, under an HMM with parameters $\lambda$, the first $t$ observations are $o_1, o_2, \dots, o_t$, the most likely hidden state sequence so far is $q_0, q_1, \dots, q_{t-1}$, and the state at time $t$ is $j$:
$v_t(j) = \max\limits_{q_0, q_1, \dots, q_{t-1}} P(q_0, q_1, \dots, q_{t-1}, o_1, o_2, \dots, o_t, q_t = j \mid \lambda)$
$v_t(j) = \max\limits_{i=1}^{N} v_{t-1}(i)\, a_{ij}\, b_j(o_t)$
$v_{t-1}(i)$ is the Viterbi path probability at the previous time step;
$a_{ij}$ is the probability of transitioning from the previous state $i$ to the current state $j$;
$b_j(o_t)$ is the probability of the current state $j$ emitting the observation $o_t$, i.e. the state observation likelihood.
The Viterbi algorithm is similar to the forward algorithm, except that it takes the maximum over the previous path probabilities where the forward algorithm takes their sum.
1. Initialization:
$v_1(j) = a_{0j}\, b_j(o_1), \quad 1 \le j \le N$
$bt_1(j) = 0$
2. Recursion (the initial and final states are non-emitting):
$v_t(j) = \max\limits_{i=1}^{N} v_{t-1}(i)\, a_{ij}\, b_j(o_t), \quad 1 \le j \le N,\ 1 < t \le T$
$bt_t(j) = \operatorname*{arg\,max}\limits_{i=1}^{N} v_{t-1}(i)\, a_{ij}\, b_j(o_t), \quad 1 \le j \le N,\ 1 < t \le T$
3. Termination:
best score:
$P^* = v_T(q_F) = \max\limits_{i=1}^{N} v_T(i)\, a_{iF}$
The start of the backtrace:
$q_T^* = bt_T(q_F) = \operatorname*{arg\,max}\limits_{i=1}^{N} v_T(i)\, a_{iF}$
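The three steps above can be sketched directly in Python. This version replaces the non-emitting start state with an initial distribution `pi` and omits the special final state, so termination simply maximizes $v_T(i)$; the toy parameters are invented:

```python
# Viterbi algorithm (Problem 2): the max-product analogue of forward,
# with backpointers bt_t(j) for recovering the best state sequence.

def viterbi(obs, pi, A, B):
    """Return (P*, best hidden state sequence) for an N-state HMM."""
    N = len(pi)
    # Initialization: v_1(j) = pi_j * b_j(o_1); no backpointers yet
    v = [pi[j] * B[j][obs[0]] for j in range(N)]
    backptrs = []
    # Recursion: v_t(j) = max_i v_{t-1}(i) * a_ij * b_j(o_t)
    for o in obs[1:]:
        bt = [max(range(N), key=lambda i: v[i] * A[i][j]) for j in range(N)]
        v = [v[bt[j]] * A[bt[j]][j] * B[j][o] for j in range(N)]
        backptrs.append(bt)
    # Termination: best final score, then follow backpointers in reverse
    best = max(range(N), key=lambda i: v[i])
    path = [best]
    for bt in reversed(backptrs):
        path.append(bt[path[-1]])
    return v[best], path[::-1]

# Invented toy parameters (same shape as the forward example).
pi = [0.8, 0.2]
A = [[0.6, 0.4], [0.5, 0.5]]
B = [[0.2, 0.4, 0.4], [0.5, 0.4, 0.1]]
score, states = viterbi([2, 1, 2], pi, A, B)
```

The returned `score` is always at most the forward likelihood of the same observation sequence, since it keeps only the single best path rather than summing all of them.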
- training HMM: the forward–backward algorithm, also known as the Baum–Welch algorithm, a special case of EM. It trains the transition probabilities $A$ and the observation probabilities $B$.
The input to the training algorithm is an unlabeled observation sequence $O$ and the vocabulary of potential hidden states $Q$.
The backward algorithm:
The backward probability $\beta_t(i)$ is the probability of seeing the observations from $t+1$ to $T$, given that the hidden state at time $t$ is $i$:
$\beta_t(i) = P(o_{t+1}, o_{t+2}, \dots, o_T \mid q_t = i, \lambda)$
The derivation of the backward algorithm mirrors that of the forward algorithm.
1. Initialization:
$\beta_T(i) = a_{i,F}, \quad 1 \le i \le N$
2. Recursion (the initial and final states are non-emitting):
$\beta_t(i) = \sum\limits_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad 1 \le i \le N,\ 1 \le t < T$
3. Termination:
$P(O \mid \lambda) = \alpha_T(q_F) = \beta_1(0) = \sum\limits_{j=1}^{N} a_{0j}\, b_j(o_1)\, \beta_1(j)$
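The recursion above can be sketched in a few lines. Here I drop the explicit non-emitting final state and set $\beta_T(i) = 1$, a common equivalent formulation when there is no $a_{i,F}$; the toy HMM is invented, and forward and backward must yield the same $P(O \mid \lambda)$:

```python
# Backward algorithm: same likelihood P(O | lambda) as forward, but
# computed right-to-left over the observation sequence.

def backward(obs, pi, A, B):
    """P(obs | lambda) via backward probabilities beta_t(i)."""
    N = len(pi)
    # Initialization: beta_T(i) = 1 (stands in for a_{i,F})
    beta = [1.0] * N
    # Recursion, t = T-1 down to 1:
    # beta_t(i) = sum_j a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
    for o in reversed(obs[1:]):
        beta = [sum(A[i][j] * B[j][o] * beta[j] for j in range(N))
                for i in range(N)]
    # Termination: P(O | lambda) = sum_j pi_j * b_j(o_1) * beta_1(j)
    return sum(pi[j] * B[j][obs[0]] * beta[j] for j in range(N))

# Same invented toy HMM as before.
pi = [0.8, 0.2]
A = [[0.6, 0.4], [0.5, 0.5]]
B = [[0.2, 0.4, 0.4], [0.5, 0.4, 0.1]]
p = backward([2, 1, 2], pi, A, B)
```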
Derivation of the re-estimation formulas. Intuitively, the transition estimate is
$\hat a_{ij} = \dfrac{\text{expected number of transitions from state } i \text{ to state } j}{\text{expected number of transitions from state } i}$
$\xi_t(i,j)$ is defined as the probability of being in state $i$ at time $t$ and in state $j$ at time $t+1$:
$\xi_t(i,j) = P(q_t = i, q_{t+1} = j \mid O, \lambda)$
$\text{not-quite-}\xi_t(i,j) = P(q_t = i, q_{t+1} = j, O \mid \lambda)$
$\text{not-quite-}\xi_t(i,j) = \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$
Using the identity $P(X \mid Y, Z) = \dfrac{P(X, Y \mid Z)}{P(Y \mid Z)}$, we get
$\xi_t(i,j) = P(q_t = i, q_{t+1} = j \mid O, \lambda) = \dfrac{P(q_t = i, q_{t+1} = j, O \mid \lambda)}{P(O \mid \lambda)}$
$P(O \mid \lambda) = \alpha_T(q_F) = \beta_1(0) = \sum\limits_{j=1}^{N} \alpha_t(j)\, \beta_t(j)$ (for any $t$)
$\xi_t(i,j) = \dfrac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}$
$\hat a_{ij} = \dfrac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \sum_{k=1}^{N} \xi_t(i,k)}$
The observation probability:
$\hat b_j(v_k) = \dfrac{\text{expected number of times in state } j \text{ and observing symbol } v_k}{\text{expected number of times in state } j}$
The probability of being in state $j$ at time $t$, given the observation sequence $O$:
$\gamma_t(j) = P(q_t = j \mid O, \lambda)$
$\gamma_t(j) = \dfrac{P(q_t = j, O \mid \lambda)}{P(O \mid \lambda)}$
$\gamma_t(j) = \dfrac{\alpha_t(j)\, \beta_t(j)}{P(O \mid \lambda)}$
$\hat b_j(v_k) = \dfrac{\sum_{t=1 \text{ s.t. } O_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$
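Putting $\xi$, $\gamma$, and the two re-estimation formulas together, one EM iteration can be sketched as follows. As in the earlier sketches, the non-emitting start/final states are dropped ($\pi$ replaces $a_{0j}$ and $\beta_T(i) = 1$), and the parameters are toy values:

```python
# One Baum-Welch (EM) iteration: the E-step computes alpha, beta,
# gamma, and xi; the M-step applies the re-estimation formulas.

def baum_welch_step(obs, pi, A, B):
    """Return re-estimated (A_hat, B_hat) from one observation sequence."""
    N, T = len(pi), len(obs)
    # Forward pass: alpha[t][j]
    alpha = [[pi[j] * B[j][obs[0]] for j in range(N)]]
    for t in range(1, T):
        alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(N))
                      * B[j][obs[t]] for j in range(N)])
    # Backward pass: beta[t][i], with beta_T(i) = 1
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                       for j in range(N)) for i in range(N)]
    PO = sum(alpha[T - 1])  # P(O | lambda)
    # gamma_t(j) = alpha_t(j) * beta_t(j) / P(O | lambda)
    gamma = [[alpha[t][j] * beta[t][j] / PO for j in range(N)]
             for t in range(T)]
    # xi_t(i,j) = alpha_t(i) * a_ij * b_j(o_{t+1}) * beta_{t+1}(j) / P(O|lambda)
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / PO
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    # M-step; note sum_k xi_t(i,k) = gamma_t(i), so gamma is the denominator
    A_hat = [[sum(xi[t][i][j] for t in range(T - 1))
              / sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    M = len(B[0])  # observation vocabulary size
    B_hat = [[sum(gamma[t][j] for t in range(T) if obs[t] == k)
              / sum(gamma[t][j] for t in range(T))
              for k in range(M)] for j in range(N)]
    return A_hat, B_hat

# Invented toy HMM; in practice this step is repeated until convergence.
pi = [0.8, 0.2]
A = [[0.6, 0.4], [0.5, 0.5]]
B = [[0.2, 0.4, 0.4], [0.5, 0.4, 0.1]]
A, B = baum_welch_step([2, 1, 0, 2, 2, 1], pi, A, B)
```

Each iteration is guaranteed not to decrease $P(O \mid \lambda)$; real implementations iterate until the likelihood change falls below a threshold and work in log space (or with scaling) to avoid underflow on long sequences.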
- maximum entropy: the maximum entropy training criterion
Speech and Language Processing notes — 6: HMM & maximum entropy