Hidden Markov Model (HMM): Detailed Derivation and Analysis

This post covers the fundamentals of the Hidden Markov Model (HMM), including conditional independence, the HMM setup, the forward algorithm, the backward algorithm, and their use in solving the evaluation, learning, and inference problems. Through the analysis of classical cases, it explains how HMM computations work and provides a clear path to understanding and applying HMMs.


Before reading this post, make sure you are familiar with the EM algorithm and have a decent amount of knowledge of convex optimization. If not, please check out my previous post.

Let’s get started!

Conditional independence

$A$ and $B$ are conditionally independent given $C$ if and only if, given knowledge that $C$ occurs, knowledge of whether $A$ occurs provides no information on the likelihood of $B$ occurring, and knowledge of whether $B$ occurs provides no information on the likelihood of $A$ occurring.

Formally, if we denote the conditional independence of $A$ and $B$ given $C$ by $(A \perp\!\!\!\perp B) \mid C$, then by definition, we have

$$(A \perp\!\!\!\perp B) \mid C \quad \iff \quad P(A, B \mid C) = P(A \mid C) \cdot P(B \mid C)$$

Given the knowledge that $C$ occurs, to show that knowledge of whether $B$ occurs provides no information on the likelihood of $A$ occurring, we have

$$\begin{aligned} P(A \mid B, C) &= \frac{P(A, B, C)}{P(B, C)} \\ &= \frac{P(A, B \mid C) \cdot P(C)}{P(B, C)} \\ &= \frac{P(A \mid C) \cdot P(B \mid C) \cdot P(C)}{P(B \mid C) \cdot P(C)} \\ &= P(A \mid C) \end{aligned}$$
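
To make this concrete, here is a minimal numerical check (the probability tables below are made up for illustration): we build a joint distribution that satisfies the factorization $P(A, B \mid C) = P(A \mid C) \cdot P(B \mid C)$ and verify that $P(A \mid B, C) = P(A \mid C)$ for every value of $B$.

```python
import numpy as np

# Made-up conditional tables that satisfy P(A, B | C) = P(A | C) * P(B | C).
p_C = np.array([0.3, 0.7])             # P(C)
p_A_given_C = np.array([[0.2, 0.8],    # P(A | C): rows index C, columns index A
                        [0.6, 0.4]])
p_B_given_C = np.array([[0.5, 0.5],    # P(B | C): rows index C, columns index B
                        [0.1, 0.9]])

# Joint P(A, B, C) under the conditional-independence factorization.
joint = np.einsum('c,ca,cb->abc', p_C, p_A_given_C, p_B_given_C)

# P(A | B, C) computed directly from the joint distribution.
p_BC = joint.sum(axis=0)               # P(B, C), indexed [b, c]
p_A_given_BC = joint / p_BC            # broadcast over the A axis, indexed [a, b, c]

# P(A | B, C) should equal P(A | C) for every value of B.
for b in range(2):
    assert np.allclose(p_A_given_BC[:, b, :], p_A_given_C.T)
print("P(A | B, C) == P(A | C) for all B: check passed")
```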

Two classical cases where $X$ and $Z$ are conditionally independent

Case 1:

Consider the chain $X \rightarrow Y \rightarrow Z$. From this directed graph, we have $P(X, Y, Z) = P(X) \cdot P(Y \mid X) \cdot P(Z \mid Y)$. Hence we have

$$\begin{aligned} P(Z \mid X, Y) &= \frac{P(X, Y, Z)}{P(X, Y)} \\ &= \frac{P(X) \cdot P(Y \mid X) \cdot P(Z \mid Y)}{P(X) \cdot P(Y \mid X)} \\ &= P(Z \mid Y) \end{aligned}$$

Therefore, $X$ and $Z$ are conditionally independent given $Y$.

Case 2:

Consider the common-parent structure $X \leftarrow Y \rightarrow Z$. From this directed graph, we have $P(X, Y, Z) = P(Y) \cdot P(X \mid Y) \cdot P(Z \mid Y)$. Hence we have

$$\begin{aligned} P(Z \mid X, Y) &= \frac{P(X, Y, Z)}{P(X, Y)} \\ &= \frac{P(Y) \cdot P(X \mid Y) \cdot P(Z \mid Y)}{P(Y) \cdot P(X \mid Y)} \\ &= P(Z \mid Y) \end{aligned}$$

Therefore, $X$ and $Z$ are conditionally independent given $Y$.

Settings of the Hidden Markov Model (HMM)

The HMM is based on augmenting the Markov chain. A Markov chain is a model that tells us something about the probabilities of sequences of random variables, states, each of which can take on values from some set. A Markov chain makes a very strong assumption that if we want to predict the future in the sequence, all that matters is the current state.

To put it formally, suppose we have a sequence of state variables $z_1, z_2, \ldots, z_n$. Then the Markov assumption is

$$p(z_n \mid z_1 z_2 \ldots z_{n-1}) = p(z_n \mid z_{n-1})$$
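
As a quick illustration, here is a minimal sketch (with made-up numbers) of sampling a state sequence under the Markov assumption: at each step the next state is drawn using only the current state, never the earlier history.

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[0.9, 0.1],   # A[i, j] = p(z_{t+1} = j | z_t = i)
              [0.3, 0.7]])
pi = np.array([0.5, 0.5])   # distribution of the first state

def sample_chain(n):
    """Sample a length-n state sequence from the Markov chain (pi, A)."""
    z = [rng.choice(2, p=pi)]
    for _ in range(n - 1):
        z.append(rng.choice(2, p=A[z[-1]]))   # depends only on the last state
    return z

print(sample_chain(10))
```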

A Markov chain is useful when we need to compute a probability for a sequence of observable events. However, in many cases the events we are interested in are hidden. For example, we don't normally observe part-of-speech (POS) tags in a text. Rather, we see words and must infer the tags from the word sequence. We call the tags hidden because they are not observed.

A hidden Markov model (HMM) allows us to talk about both observed events (like words that we see in the input) and hidden events (like part-of-speech tags) that we think of as causal factors in our probabilistic model. An HMM is specified by the following components:

  • A sequence of hidden states $z$, where $z_k$ takes values from the set of all possible hidden states $Z = \{1, 2, \ldots, m\}$.

  • A sequence of observations $x = (x_1, x_2, \ldots, x_n)$, where each observation is drawn from a vocabulary $V$.

  • A transition probability matrix $A$, where $A$ is an $m \times m$ matrix. $A_{ij}$ represents the probability of moving from state $i$ to state $j$: $A_{ij} = p(z_{t+1} = j \mid z_t = i)$, and $\sum_{j=1}^{m} A_{ij} = 1$ for all $i$.

  • An emission probability matrix $B$, where $B$ is an $m \times |V|$ matrix. $B_{ij}$ represents the probability of the observation $V_j$ being generated from state $i$: $B_{ij} = P(x_t = V_j \mid z_t = i)$.

  • An initial probability distribution $\pi = (\pi_1, \pi_2, \ldots, \pi_m)$ over states, where $\pi_i$ is the probability that the Markov chain starts in state $i$, and $\sum_{i=1}^{m} \pi_i = 1$.

Given a sequence $x$ and the corresponding hidden states $z$, we have

$$P(x, z \mid \theta) = p(z_1) \cdot \left[ p(z_2 \mid z_1) \cdot p(z_3 \mid z_2) \cdots p(z_n \mid z_{n-1}) \right] \cdot \left[ p(x_1 \mid z_1) \cdot p(x_2 \mid z_2) \cdots p(x_n \mid z_n) \right] \tag{0}$$

We get $p(z_1)$ from $\pi$, $p(z_{k+1} \mid z_k)$ from $A$, and $p(x_k \mid z_k)$ from $B$.
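
Here is a minimal sketch of equation $(0)$ with made-up parameters ($m = 2$ hidden states, $|V| = 3$ observation symbols); it multiplies the initial, transition, and emission probabilities along a given pair of sequences $x$ and $z$.

```python
import numpy as np

pi = np.array([0.6, 0.4])            # initial distribution over hidden states
A = np.array([[0.7, 0.3],            # A[i, j] = p(z_{t+1} = j | z_t = i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],       # B[i, j] = p(x_t = V_j | z_t = i)
              [0.1, 0.3, 0.6]])

def joint_prob(x, z):
    """P(x, z | theta) as in equation (0); x and z are index sequences."""
    p = pi[z[0]] * B[z[0], x[0]]                  # p(z_1) * p(x_1 | z_1)
    for t in range(1, len(x)):
        p *= A[z[t - 1], z[t]] * B[z[t], x[t]]    # p(z_t | z_{t-1}) * p(x_t | z_t)
    return p

x = [0, 2, 1]        # observed sequence (indices into V)
z = [0, 1, 1]        # one hypothetical hidden-state sequence
print(joint_prob(x, z))
```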

Useful probabilities $p(z_k \mid x)$ and $p(z_{k+1}, z_k \mid x)$

$p(z_k \mid x)$ and $p(z_{k+1}, z_k \mid x)$ are useful probabilities, and we are going to use them later.

Intuition: Once we have a sequence $x$, we might be interested in finding the probability of any hidden state $z_k$, i.e., finding the probabilities $p(z_k = 1 \mid x), p(z_k = 2 \mid x), \ldots, p(z_k = m \mid x)$. We have the following:

$$\begin{aligned} p(z_k \mid x) &= \frac{p(z_k, x)}{p(x)} && (1) \\ &\propto p(z_k, x) && (2) \end{aligned}$$

Note that from $(1)$ to $(2)$, since $p(x)$ doesn't change across values of $z_k$, $p(z_k \mid x)$ is proportional to $p(z_k, x)$.

$$\begin{aligned} p(z_k = i, x) &= p(z_k = i, x_{1:k}, x_{k+1:n}) \\ &= p(z_k = i, x_{1:k}) \cdot p(x_{k+1:n} \mid z_k = i, x_{1:k}) && (3) \\ &= p(z_k = i, x_{1:k}) \cdot p(x_{k+1:n} \mid z_k = i) && (4.1) \\ &= \alpha_k(z_k = i) \cdot \beta_k(z_k = i) && (4.11) \end{aligned}$$

Looking at the HMM graph, the second factor in $(3)$ matches the second classical case: conditioned on $z_k$, the future observations $x_{k+1:n}$ and the past observations $x_{1:k}$ are conditionally independent. This is why we can go from $(3)$ to $(4.1)$. We are going to use the Forward Algorithm to compute $p(z_k, x_{1:k})$ and the Backward Algorithm to compute $p(x_{k+1:n} \mid z_k)$ later.

As in $(4.11)$, we denote $p(z_k, x_{1:k})$ by $\alpha_k(z_k)$ and $p(x_{k+1:n} \mid z_k)$ by $\beta_k(z_k)$.
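
Before introducing the forward and backward algorithms, here is a brute-force sketch of $p(z_k = i, x)$ and $p(z_k \mid x)$ that simply sums $P(x, z \mid \theta)$ over all hidden sequences $z$ with $z_k = i$. It is exponential in $n$, so it is only for building intuition; the parameters are the same made-up ones as in the previous sketch.

```python
from itertools import product
import numpy as np

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])

def joint_prob(x, z):
    """P(x, z | theta) as in equation (0)."""
    p = pi[z[0]] * B[z[0], x[0]]
    for t in range(1, len(x)):
        p *= A[z[t - 1], z[t]] * B[z[t], x[t]]
    return p

def posterior_state(x, k, m=2):
    """p(z_k = i | x) for each i, by summing p(x, z) over all z with z_k = i."""
    p_zk_x = np.zeros(m)
    for z in product(range(m), repeat=len(x)):
        p_zk_x[z[k]] += joint_prob(x, z)   # accumulate p(z_k = i, x)
    return p_zk_x / p_zk_x.sum()           # divide by p(x) = sum_i p(z_k = i, x)

x = [0, 2, 1]
print(posterior_state(x, k=1))   # p(z_2 = i | x), with 0-based index k
```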
