Hidden Markov Models

The previous post did not introduce this model systematically; this post describes it in detail.

1. Definition:

The Hidden Markov Model is a finite set of states, each of which is associated with a (generally multidimensional) probability distribution. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state an outcome or observation can be generated, according to the associated probability distribution. Only the outcome, not the state, is visible to an external observer; the states are therefore "hidden" from the outside, hence the name Hidden Markov Model.

A hidden Markov model consists of a finite number of states. The process moves between states according to a table of transition probabilities, and each state can emit an observation according to its own probability distribution. Only the emitted observations are visible; the underlying Markov chain of states is not, which is why the model is called "hidden". The model is characterized by the following elements:

  • The number of states of the model, N.
  • The number of observation symbols in the alphabet, M. If the observations are continuous then M is infinite.
  • A set of state transition probabilities $A = \{a_{ij}\}$,

    $$a_{ij} = P(q_{t+1} = j \mid q_t = i), \qquad 1 \le i, j \le N,$$

    where $q_t$ denotes the current state.
    Transition probabilities should satisfy the normal stochastic constraints,

    $$a_{ij} \ge 0, \qquad 1 \le i, j \le N,$$

    and

    $$\sum_{j=1}^{N} a_{ij} = 1, \qquad 1 \le i \le N.$$

  • A probability distribution in each of the states, $B = \{b_j(k)\}$,

    $$b_j(k) = P(o_t = v_k \mid q_t = j), \qquad 1 \le j \le N, \ 1 \le k \le M,$$

    where $v_k$ denotes the $k$-th observation symbol in the alphabet, and $o_t$ the current parameter vector.
    The following stochastic constraints must be satisfied,

    $$b_j(k) \ge 0, \qquad 1 \le j \le N, \ 1 \le k \le M,$$

    and

    $$\sum_{k=1}^{M} b_j(k) = 1, \qquad 1 \le j \le N.$$

    If the observations are continuous then we have to use a continuous probability density function instead of a set of discrete probabilities. In this case we specify the parameters of the probability density function. Usually the probability density is approximated by a weighted sum of M Gaussian distributions,

    $$b_j(o_t) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(o_t;\, \mu_{jm}, \Sigma_{jm}),$$

    where $c_{jm}$ are the weighting coefficients, $\mu_{jm}$ the mean vectors, and $\Sigma_{jm}$ the covariance matrices.
    The coefficients $c_{jm}$ should satisfy the stochastic constraints,

    $$c_{jm} \ge 0, \qquad 1 \le j \le N, \ 1 \le m \le M,$$

    and

    $$\sum_{m=1}^{M} c_{jm} = 1, \qquad 1 \le j \le N.$$

  • The initial state distribution, $\pi = \{\pi_i\}$,
    where

    $$\pi_i = P(q_1 = i), \qquad 1 \le i \le N.$$

Therefore we can use the compact notation

$$\lambda = (A, B, \pi)$$

to denote an HMM with discrete probability distributions, while

$$\lambda = (A, c_{jm}, \mu_{jm}, \Sigma_{jm}, \pi)$$

denotes one with continuous densities.

Thus, a hidden Markov model can be written compactly as

$$\lambda = (A, B, \pi).$$
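
To make the notation concrete, here is a minimal sketch of such a discrete HMM in Python/NumPy; the two-state, three-symbol model and all of its numbers are made up purely for illustration, and the later sketches in this post reuse these A, B and pi.

```python
import numpy as np

# A toy discrete HMM lambda = (A, B, pi) with N = 2 hidden states and
# M = 3 observation symbols.  All numbers are illustrative only.
N, M = 2, 3

# A[i, j] = P(q_{t+1} = j | q_t = i); each row sums to 1.
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# B[j, k] = P(o_t = v_k | q_t = j); each row sums to 1.
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])

# pi[i] = P(q_1 = i).
pi = np.array([0.6, 0.4])

# Sanity-check the stochastic constraints from the definition above.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)
```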

2. Some assumptions:

For the sake of mathematical and computational tractability, the following assumptions are made in the theory of HMMs.

(1) The Markov assumption: the current state depends only on the immediately preceding state.
As given in the definition of HMMs, transition probabilities are defined as,

$$a_{ij} = P(q_{t+1} = j \mid q_t = i).$$

In other words it is assumed that the next state depends only upon the current state. This is called the Markov assumption and the resulting model is actually a first-order HMM.
However, in general the next state may depend on the past k states, and it is possible to obtain such a model, called a k-th order HMM, by defining the transition probabilities as follows.

$$a_{i_1 i_2 \cdots i_k j} = P(q_{t+1} = j \mid q_t = i_1, q_{t-1} = i_2, \ldots, q_{t-k+1} = i_k).$$

A higher-order HMM, however, has correspondingly higher complexity. Even though first-order HMMs are the most common, some attempts have been made to use higher-order HMMs as well.

(2) The stationarity assumption: state transition probabilities do not depend on time.
Here it is assumed that the state transition probabilities are independent of the actual time at which the transitions take place. Mathematically,

$$P(q_{t_1+1} = j \mid q_{t_1} = i) = P(q_{t_2+1} = j \mid q_{t_2} = i)$$

for any $t_1$ and $t_2$.

(3) The output independence assumption: the probability of the current observation given the current state does not depend on the observations already emitted, so the observation sequence can be factored into independent per-step terms.
This is the assumption that the current output (observation) is statistically independent of the previous outputs (observations). We can formulate this assumption mathematically by considering a sequence of observations

$$O = o_1, o_2, \ldots, o_T.$$

Then, according to the assumption, for an HMM $\lambda$,

$$P(O \mid q_1, q_2, \ldots, q_T, \lambda) = \prod_{t=1}^{T} P(o_t \mid q_t, \lambda).$$

However, unlike the other two, this assumption has very limited validity. In some cases it may not hold well and therefore becomes a severe weakness of HMMs.
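
Taken together, the Markov and output-independence assumptions let the joint probability of an observation sequence O and a fixed state path Q factorize as $P(O, Q \mid \lambda) = \pi_{q_1} b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t} b_{q_t}(o_t)$. A minimal sketch of this factorization, reusing the toy A, B, pi defined above (the observation and state indices are arbitrary examples):

```python
def joint_prob(A, B, pi, obs, path):
    """P(O, Q | lambda) for a fixed state path Q, using the factorization
    implied by the Markov and output-independence assumptions."""
    p = pi[path[0]] * B[path[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]
    return p

# Example: observe symbols v_0, v_2, v_1 along the state path 0 -> 0 -> 1.
print(joint_prob(A, B, pi, obs=[0, 2, 1], path=[0, 0, 1]))
```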

3. The three problems to be solved:

Once we have an HMM, there are three problems of interest.

(1) The evaluation problem: compute the probability of a given observation sequence under the model.
Given an HMM $\lambda$ and a sequence of observations $O = o_1, o_2, \ldots, o_T$, what is the probability that the observations are generated by the model, $P(O \mid \lambda)$?
(2) The decoding problem: given the observed sequence, find the hidden state sequence that most likely produced it.
Given a model $\lambda$ and a sequence of observations $O = o_1, o_2, \ldots, o_T$, what is the most likely state sequence in the model that produced the observations?
(3) The learning problem: adjust the model so that the probability of the observed sequence is maximized.
Given a model $\lambda$ and a sequence of observations $O = o_1, o_2, \ldots, o_T$, how should we adjust the model parameters $(A, B, \pi)$ in order to maximize $P(O \mid \lambda)$?

The evaluation problem arises in isolated (word) recognition. The decoding problem is related to continuous recognition as well as to segmentation. The learning problem must be solved if we want to train an HMM for subsequent use in recognition tasks.

4. Estimating the probability of an observation sequence:

We have a model $\lambda = (A, B, \pi)$ and a sequence of observations $O = o_1, o_2, \ldots, o_T$, and $P(O \mid \lambda)$ must be found. We can calculate this quantity using simple probabilistic arguments, but the direct calculation involves a number of operations on the order of $N^T$. This is very large even if the length of the sequence, $T$, is moderate. Therefore we have to look for another method for this calculation. Fortunately there exists one which has considerably lower complexity and makes use of an auxiliary variable $\alpha_t(i)$, called the forward variable.

The forward variable is defined as the probability of the partial observation sequence $o_1, o_2, \ldots, o_t$, when it terminates at state $i$. Mathematically,

$$\alpha_t(i) = P(o_1, o_2, \ldots, o_t, \, q_t = i \mid \lambda). \qquad (1.1)$$

Forward variable: the probability of having observed $o_1, o_2, \ldots, o_t$ and being in state $i$ at time $t$. It is computed by moving forward in $t$; when $t = T$ the whole observation sequence has been accounted for, so summing the values of the forward variable at time $T$ over all states gives the probability of the observation sequence.

Then it is easy to see that the following recursive relationship holds.

$$\alpha_{t+1}(j) = b_j(o_{t+1}) \sum_{i=1}^{N} \alpha_t(i)\, a_{ij}, \qquad 1 \le j \le N, \ 1 \le t \le T-1, \qquad (1.2)$$

where

$$\alpha_1(j) = \pi_j\, b_j(o_1), \qquad 1 \le j \le N.$$

Using this recursion we can calculate

$$\alpha_T(i), \qquad 1 \le i \le N,$$

and then the required probability is given by

$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i). \qquad (1.3)$$

The complexity of this method, known as the forward algorithm, is proportional to $N^2 T$, which is linear with respect to $T$, whereas the direct calculation mentioned earlier has exponential complexity.
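
A minimal sketch of the forward algorithm as described above, again using the toy model from the definition section; because Python arrays are 0-indexed, alpha[t - 1, i] holds the quantity written $\alpha_t(i)$ in the equations:

```python
def forward(A, B, pi, obs):
    """Forward algorithm.  alpha[t - 1, i] = P(o_1 .. o_t, q_t = i | lambda)."""
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]              # alpha_1(j) = pi_j * b_j(o_1)
    for t in range(1, T):
        # alpha_{t+1}(j) = b_j(o_{t+1}) * sum_i alpha_t(i) * a_ij
        alpha[t] = B[:, obs[t]] * (alpha[t - 1] @ A)
    return alpha, alpha[-1].sum()             # P(O | lambda) = sum_i alpha_T(i)

obs = [0, 2, 1]                               # an arbitrary example observation sequence
alpha, prob = forward(A, B, pi, obs)
print(prob)
```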

In a similar way we can define the backward variable $\beta_t(i)$ as the probability of the partial observation sequence $o_{t+1}, o_{t+2}, \ldots, o_T$, given that the current state is $i$. Mathematically,

$$\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_t = i, \lambda). \qquad (1.4)$$

The backward variable is the probability of the observation sequence produced after time $t$, while the forward variable covers the observations up to and including time $t$; by the last of the three assumptions above, the probability of the whole sequence can therefore be obtained from the product of the forward and backward variables (summed over the states).

As in the case of $\alpha_t(i)$, there is a recursive relationship which can be used to calculate $\beta_t(i)$ efficiently.

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \qquad 1 \le i \le N, \ 1 \le t \le T-1, \qquad (1.5)$$

where

$$\beta_T(i) = 1, \qquad 1 \le i \le N.$$

Further we can see that

$$\alpha_t(i)\, \beta_t(i) = P(O, \, q_t = i \mid \lambda), \qquad 1 \le i \le N, \ 1 \le t \le T. \qquad (1.6)$$

Therefore this gives another way to calculate $P(O \mid \lambda)$, by using both the forward and backward variables, as given in eqn. 1.7,

$$P(O \mid \lambda) = \sum_{i=1}^{N} P(O, \, q_t = i \mid \lambda) = \sum_{i=1}^{N} \alpha_t(i)\, \beta_t(i). \qquad (1.7)$$

Eqn. 1.7 is very useful, especially in deriving the formulas required for gradient-based training.
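
A matching sketch of the backward recursion, reusing forward() and the toy model from the earlier sketches; the final loop checks eqn. 1.7 numerically, i.e. that $\sum_i \alpha_t(i)\,\beta_t(i)$ gives the same $P(O \mid \lambda)$ for every $t$:

```python
def backward(A, B, obs):
    """Backward algorithm.  beta[t - 1, i] = P(o_{t+1} .. o_T | q_t = i, lambda)."""
    T, N = len(obs), A.shape[0]
    beta = np.zeros((T, N))
    beta[-1] = 1.0                            # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        # beta_t(i) = sum_j a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

beta = backward(A, B, obs)
# Eqn. 1.7: P(O | lambda) = sum_i alpha_t(i) * beta_t(i), for any t.
for t in range(len(obs)):
    assert np.isclose((alpha[t] * beta[t]).sum(), prob)
```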

5. The decoding problem:

In this case we want to find the most likely state sequence for a given sequence of observations $O = o_1, o_2, \ldots, o_T$ and a model $\lambda = (A, B, \pi)$.

The solution to this problem depends upon the way the "most likely state sequence" is defined. One approach is to find the most likely state $q_t$ at each time $t$ and to concatenate all such $q_t$'s. But sometimes this method does not give a physically meaningful state sequence (the concatenation may, for example, contain transitions to which the model assigns zero probability). Therefore we would go for another method which has no such problems.
In this method, commonly known as the Viterbi algorithm, the whole state sequence with the maximum likelihood is found. In order to facilitate the computation we define an auxiliary variable,

$$\delta_t(i) = \max_{q_1, q_2, \ldots, q_{t-1}} P(q_1, q_2, \ldots, q_{t-1}, \, q_t = i, \, o_1, o_2, \ldots, o_t \mid \lambda),$$

which gives the highest probability that the partial observation sequence and state sequence up to time $t$ can have, when the current state is $i$.

The formula above defines the maximum probability of ending in state $i$ at time $t$ while having observed $o_1, \ldots, o_t$; the decoding problem therefore reduces to finding the most probable final state at time $T$ and back-tracking from it.
It is easy to observe that the following recursive relationship holds.

$$\delta_{t+1}(j) = b_j(o_{t+1}) \max_{1 \le i \le N} \big[\delta_t(i)\, a_{ij}\big], \qquad 1 \le j \le N, \ 1 \le t \le T-1, \qquad (1.8)$$

where

$$\delta_1(j) = \pi_j\, b_j(o_1), \qquad 1 \le j \le N.$$

This recursion can be explained as follows:

By the third assumption above, the step from time $t$ to time $t+1$ does not depend on the observations already emitted, so it suffices to keep, for each state, the maximum probability reached so far; multiplying it by the probability of moving from that state to the next state (together with the emission probability of the next observation) gives the probability of the extended path at time $t+1$, and we then take the largest of the $N$ resulting values.

So the procedure to find the most likely state sequence starts from the calculation of $\delta_T(j), \ 1 \le j \le N$, using the recursion in 1.8, while always keeping a pointer to the "winning state" in the maximization. Finally the state $q_T^{*}$ is found, where

$$q_T^{*} = \arg\max_{1 \le j \le N} \delta_T(j),$$

and starting from this state, the sequence of states is back-tracked as the pointer stored in each state indicates. This gives the required state sequence.
This whole algorithm can be interpreted as a search in a graph whose nodes are formed by the states of the HMM at each time instant $t, \ 1 \le t \le T$.
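
A minimal sketch of the Viterbi algorithm, again on the toy model from the earlier sketches; psi stores the pointer to the "winning state" at each step, and the path is recovered by back-tracking from the most probable final state:

```python
def viterbi(A, B, pi, obs):
    """Viterbi algorithm: most likely state sequence and its probability."""
    T, N = len(obs), A.shape[0]
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)         # back-pointers to the "winning state"
    delta[0] = pi * B[:, obs[0]]              # delta_1(j) = pi_j * b_j(o_1)
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A    # scores[i, j] = delta_t(i) * a_ij
        psi[t] = scores.argmax(axis=0)
        # delta_{t+1}(j) = b_j(o_{t+1}) * max_i delta_t(i) * a_ij
        delta[t] = B[:, obs[t]] * scores.max(axis=0)
    # Back-track from the most probable final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    path.reverse()
    return path, delta[-1].max()

print(viterbi(A, B, pi, obs=[0, 2, 1]))
```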

 

From: http://blog.csdn.net/tianqio/archive/2009/06/17/4275895.aspx (THX)
