Links to previous posts

Table of contents

- Links to previous posts
- Conditional independence
- Settings of the Hidden Markov Model (HMM)
- Useful probabilities $p(z_k \mid x)$ and $p(z_{k+1}, z_k \mid x)$
- Three fundamental problems of HMM
  - Problem 1 (Likelihood)
  - Problem 2 (Learning)
  - Problem 3 (Inference)
- Links to previous posts

Links to previous posts
Before reading this post, make sure you are familiar with the EM algorithm and have a decent amount of knowledge of convex optimization. If not, please check out my previous posts.
Let’s get started!
Conditional independence
$A$ and $B$ are conditionally independent given $C$ if and only if, given the knowledge that $C$ occurs, knowing whether $A$ occurs provides no information on the likelihood of $B$ occurring, and knowing whether $B$ occurs provides no information on the likelihood of $A$ occurring.

Formally, if we denote the conditional independence of $A$ and $B$ given $C$ by $(A \perp\!\!\!\perp B) \mid C$, then by definition we have

$$(A \perp\!\!\!\perp B) \mid C \quad \iff \quad P(A, B \mid C) = P(A \mid C) \cdot P(B \mid C)$$

Given the knowledge that $C$ occurs, to show that knowing whether $B$ occurs provides no information on the likelihood of $A$ occurring, we have

$$\begin{aligned} P(A \mid B, C) &= \frac{P(A, B, C)}{P(B, C)} \\ &= \frac{P(A, B \mid C) \cdot P(C)}{P(B, C)} \\ &= \frac{P(A \mid C) \cdot P(B \mid C) \cdot P(C)}{P(B \mid C) \cdot P(C)} \\ &= P(A \mid C) \end{aligned}$$
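The derivation above can be checked numerically. Below is a minimal sketch on a toy joint distribution whose values are hypothetical, constructed so that $A$ and $B$ are conditionally independent given $C$:

```python
# Toy conditional distributions (hypothetical numbers).
p_c = {0: 0.4, 1: 0.6}
p_a_given_c = {0: {0: 0.2, 1: 0.8}, 1: {0: 0.7, 1: 0.3}}  # p_a_given_c[c][a]
p_b_given_c = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}}  # p_b_given_c[c][b]

# Joint P(A, B, C) built to satisfy the conditional-independence factorization.
joint = {(a, b, c): p_c[c] * p_a_given_c[c][a] * p_b_given_c[c][b]
         for a in (0, 1) for b in (0, 1) for c in (0, 1)}

def p_a_given_bc(a, b, c):
    # P(A | B, C) = P(A, B, C) / P(B, C)
    p_bc = sum(joint[(a2, b, c)] for a2 in (0, 1))
    return joint[(a, b, c)] / p_bc

# The derivation predicts P(A | B, C) = P(A | C) for every a, b, c.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            assert abs(p_a_given_bc(a, b, c) - p_a_given_c[c][a]) < 1e-12
```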
Two classical cases where $X$ and $Z$ are conditionally independent

Case 1:
![](https://i-blog.csdnimg.cn/blog_migrate/7cdca8f1e7df4488deb0ca2fa9930f23.png)
From the above directed graph, we have $P(X, Y, Z) = P(X) \cdot P(Y \mid X) \cdot P(Z \mid Y)$. Hence we have

$$\begin{aligned} P(Z \mid X, Y) &= \frac{P(X, Y, Z)}{P(X, Y)} \\ &= \frac{P(X) \cdot P(Y \mid X) \cdot P(Z \mid Y)}{P(X) \cdot P(Y \mid X)} \\ &= P(Z \mid Y) \end{aligned}$$

Therefore, $X$ and $Z$ are conditionally independent given $Y$.
Case 2:
![](https://i-blog.csdnimg.cn/blog_migrate/52d1a7d0bdbe6032cf4a88661ee277dd.png)
From the above directed graph, we have $P(X, Y, Z) = P(Y) \cdot P(X \mid Y) \cdot P(Z \mid Y)$. Hence we have

$$\begin{aligned} P(Z \mid X, Y) &= \frac{P(X, Y, Z)}{P(X, Y)} \\ &= \frac{P(Y) \cdot P(X \mid Y) \cdot P(Z \mid Y)}{P(Y) \cdot P(X \mid Y)} \\ &= P(Z \mid Y) \end{aligned}$$

Therefore, $X$ and $Z$ are conditionally independent given $Y$.
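The chain case can also be checked numerically. This sketch builds a joint for the Case-1 graph $X \to Y \to Z$ from hypothetical conditional probability tables and confirms that $P(Z \mid X, Y)$ collapses to $P(Z \mid Y)$:

```python
# Hypothetical CPTs for the chain X -> Y -> Z.
p_x = {0: 0.3, 1: 0.7}
p_y_given_x = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.2, 1: 0.8}}  # p_y_given_x[x][y]
p_z_given_y = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}  # p_z_given_y[y][z]

# Joint P(X, Y, Z) = P(X) * P(Y|X) * P(Z|Y), as read off the graph.
joint = {(x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]
         for x in (0, 1) for y in (0, 1) for z in (0, 1)}

def p_z_given_xy(x, y, z):
    # P(Z | X, Y) = P(X, Y, Z) / P(X, Y)
    p_xy = sum(joint[(x, y, z2)] for z2 in (0, 1))
    return joint[(x, y, z)] / p_xy

# As derived, P(Z | X, Y) = P(Z | Y) regardless of x.
for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            assert abs(p_z_given_xy(x, y, z) - p_z_given_y[y][z]) < 1e-12
```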
Settings of the Hidden Markov Model (HMM)
The HMM is based on augmenting the Markov chain. A Markov chain is a model that tells us about the probabilities of sequences of random variables, or states, each of which can take on a value from some set. A Markov chain makes a very strong assumption: to predict the future of the sequence, all that matters is the current state.
To put it formally, suppose we have a sequence of state variables $z_1, z_2, \ldots, z_n$. Then the Markov assumption is

$$p(z_n \mid z_1 z_2 \ldots z_{n-1}) = p(z_n \mid z_{n-1})$$
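The Markov assumption shows up directly when sampling from a chain: drawing $z_n$ only requires $z_{n-1}$, so one row of the transition matrix suffices at each step. A minimal sketch with a hypothetical two-state transition matrix:

```python
import random

A = [[0.9, 0.1],   # A[i][j] = p(z_{t+1} = j | z_t = i), hypothetical values
     [0.3, 0.7]]
pi = [0.5, 0.5]    # initial distribution

def sample_chain(n, rng):
    """Sample n states from the two-state Markov chain (A, pi)."""
    z = rng.choices([0, 1], weights=pi)[0]
    states = [z]
    for _ in range(n - 1):
        # Only the current state matters -- this is the Markov assumption.
        z = rng.choices([0, 1], weights=A[z])[0]
        states.append(z)
    return states

states = sample_chain(10, random.Random(0))
assert len(states) == 10 and all(s in (0, 1) for s in states)
```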
A Markov chain is useful when we need to compute a probability for a sequence of observable events. However, in many cases the events we are interested in are hidden. For example we don’t normally observe part-of-speech (POS) tags in a text. Rather, we see words, and must infer the tags from the word sequence. We call the tags hidden because they are not observed.
![](https://i-blog.csdnimg.cn/blog_migrate/d41721784ba7b61c7ab64e8aa1e8a54e.png)
A hidden Markov model (HMM) allows us to talk about both observed events (like words that we see in the input) and hidden events (like part-of-speech tags) that we think of as causal factors in our probabilistic model. An HMM is specified by the following components:
- A sequence of hidden states $z$, where $z_k$ takes values from the set of all possible hidden states $Z = \{1, 2, \ldots, m\}$.
- A sequence of observations $x = (x_1, x_2, \ldots, x_n)$, each drawn from a vocabulary $V$.
- A transition probability matrix $A$, an $m \times m$ matrix where $A_{ij}$ represents the probability of moving from state $i$ to state $j$: $A_{ij} = p(z_{t+1} = j \mid z_t = i)$, with $\sum_{j=1}^{m} A_{ij} = 1$ for all $i$.
- An emission probability matrix $B$, an $m \times |V|$ matrix where $B_{ij}$ represents the probability of observation $V_j$ being generated from state $i$: $B_{ij} = P(x_t = V_j \mid z_t = i)$.
- An initial probability distribution $\pi = (\pi_1, \pi_2, \ldots, \pi_m)$ over states, where $\pi_i$ is the probability that the Markov chain starts in state $i$, with $\sum_{i=1}^{m} \pi_i = 1$.
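The components above can be written down as plain data structures. This sketch uses hypothetical numbers, with $m = 2$ hidden states and a three-symbol vocabulary, and checks the row-sum constraints on $A$, $B$, and $\pi$:

```python
m = 2                      # number of hidden states
V = ["a", "b", "c"]        # vocabulary

A = [[0.7, 0.3],           # transition matrix, A[i][j] = p(z_{t+1}=j | z_t=i)
     [0.4, 0.6]]
B = [[0.5, 0.4, 0.1],      # emission matrix, B[i][j] = p(x_t = V[j] | z_t=i)
     [0.1, 0.3, 0.6]]
pi = [0.6, 0.4]            # initial state distribution

# Each row of A, each row of B, and pi must be a probability distribution.
assert all(abs(sum(row) - 1.0) < 1e-12 for row in A)
assert all(abs(sum(row) - 1.0) < 1e-12 for row in B)
assert abs(sum(pi) - 1.0) < 1e-12
```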
Given a sequence $x$ and the corresponding hidden states $z$ (like the one in the picture above), we have

$$P(x, z \mid \theta) = p(z_1) \cdot \left[ p(z_2 \mid z_1) \cdot p(z_3 \mid z_2) \cdots p(z_n \mid z_{n-1}) \right] \cdot \left[ p(x_1 \mid z_1) \cdot p(x_2 \mid z_2) \cdots p(x_n \mid z_n) \right] \tag{0}$$

We get $p(z_1)$ from $\pi$, $p(z_{k+1} \mid z_k)$ from $A$, and $p(x_k \mid z_k)$ from $B$.
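Equation $(0)$ is just a product of one $\pi$ entry, $n-1$ entries of $A$, and $n$ entries of $B$. A minimal sketch, using the same kind of hypothetical parameters as above:

```python
V = ["a", "b", "c"]                     # hypothetical vocabulary
A = [[0.7, 0.3], [0.4, 0.6]]            # A[i][j] = p(z_{t+1}=j | z_t=i)
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]  # B[i][j] = p(x_t = V[j] | z_t=i)
pi = [0.6, 0.4]

def joint_prob(x, z):
    """P(x, z | theta) per equation (0): p(z_1) from pi, transitions from A,
    emissions from B."""
    p = pi[z[0]] * B[z[0]][V.index(x[0])]
    for k in range(1, len(x)):
        p *= A[z[k - 1]][z[k]] * B[z[k]][V.index(x[k])]
    return p

p = joint_prob(["a", "b"], [0, 1])
# = pi[0] * B[0][a] * A[0][1] * B[1][b] = 0.6 * 0.5 * 0.3 * 0.3
assert abs(p - 0.027) < 1e-12
```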
Useful probabilities $p(z_k \mid x)$ and $p(z_{k+1}, z_k \mid x)$
$p(z_k \mid x)$ and $p(z_{k+1}, z_k \mid x)$ are useful probabilities that we are going to use later.

Intuition: once we have a sequence $x$, we might be interested in finding the probability of any hidden state $z_k$, i.e., the probabilities $p(z_k = 1 \mid x), p(z_k = 2 \mid x), \ldots, p(z_k = m \mid x)$. We have the following:

$$\begin{aligned} p(z_k \mid x) &= \frac{p(z_k, x)}{p(x)} && (1) \\ &\propto p(z_k, x) && (2) \end{aligned}$$

Note that from $(1)$ to $(2)$: since $p(x)$ does not change over the values of $z_k$, $p(z_k \mid x)$ is proportional to $p(z_k, x)$.
$$\begin{aligned} p(z_k = i, x) &= p(z_k = i, x_{1:k}, x_{k+1:n}) \\ &= p(z_k = i, x_{1:k}) \cdot p(x_{k+1:n} \mid z_k = i, x_{1:k}) && (3) \\ &= p(z_k = i, x_{1:k}) \cdot p(x_{k+1:n} \mid z_k = i) && (4.1) \\ &= \alpha_k(z_k = i) \cdot \beta_k(z_k = i) && (4.11) \end{aligned}$$
![](https://i-blog.csdnimg.cn/blog_migrate/431dde733ce39883a738159b30c2c31a.png)
From the above graph, we see that the second factor in $(3)$ matches the second classical case, so $x_{k+1:n}$ and $x_{1:k}$ are conditionally independent given $z_k$. This is why we can go from $(3)$ to $(4.1)$. We are going to use the Forward Algorithm to compute $p(z_k, x_{1:k})$, and the Backward Algorithm to compute $p(x_{k+1:n} \mid z_k)$, later.
We denote $p(z_k, x_{1:k})$ by $\alpha_k(z_k)$ and $p(x_{k+1:n} \mid z_k)$ by $\beta_k(z_k)$.
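The split in $(4.1)$ can be verified by brute force on a tiny HMM: sum equation $(0)$ over all hidden sequences with $z_k = i$, and compare against $\alpha_k(i) \cdot \beta_k(i)$, each computed by direct marginalization. All parameters below are hypothetical:

```python
from itertools import product

V = ["a", "b", "c"]
A = [[0.7, 0.3], [0.4, 0.6]]            # A[i][j] = p(z_{t+1}=j | z_t=i)
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]  # B[i][j] = p(x_t = V[j] | z_t=i)
pi = [0.6, 0.4]

def joint_prob(x, z):
    """P(x, z | theta) per equation (0)."""
    p = pi[z[0]] * B[z[0]][V.index(x[0])]
    for t in range(1, len(x)):
        p *= A[z[t - 1]][z[t]] * B[z[t]][V.index(x[t])]
    return p

x = ["a", "c", "b"]
k, i = 2, 1  # condition on z_k = i with k = 2 (1-based)

# Left-hand side: p(z_k=i, x), marginalizing over every z with z_k = i.
lhs = sum(joint_prob(x, z) for z in product((0, 1), repeat=len(x))
          if z[k - 1] == i)

# alpha_k(i) = p(z_k=i, x_{1:k}), marginalizing the length-k prefix.
alpha = sum(joint_prob(x[:k], z) for z in product((0, 1), repeat=k)
            if z[k - 1] == i)

# beta_k(i) = p(x_{k+1:n} | z_k=i): sum transition/emission products
# over all suffix state sequences, starting from z_k = i.
beta = 0.0
for z in product((0, 1), repeat=len(x) - k):
    p, prev = 1.0, i
    for t, zt in enumerate(z):
        p *= A[prev][zt] * B[zt][V.index(x[k + t])]
        prev = zt
    beta += p

# The conditional-independence argument says these agree exactly.
assert abs(lhs - alpha * beta) < 1e-12
```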