HMM(Hidden Markov Models)

In speech recognition, how does a hidden Markov model (HMM) support warping of features along the time axis?
HMM(Hidden Markov Models)
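An HMM has two parts:
  1. States and state transitions: the skeleton of the HMM, i.e. how many states it has and which states can transition to which.
  2. The probability distributions of each state, which split further into two parts: ① the Markov chain with probabilities, i.e. the probability of moving from one state to the next; ② each state's data probability distribution (observation probability), which in speech recognition is usually modeled with a Gaussian Mixture Model (GMM).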
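To make the two parts concrete, here is a minimal sketch of how the parameters might be held in code. It is not taken from any toolkit: the class names `DiagGMM` and `HMM` are hypothetical, and it assumes diagonal covariances, a common but not the only choice for speech GMMs. `log_prob` returns the log of a state's data probability, combining mixture components with log-sum-exp so that very small likelihoods do not underflow.

```python
import math
from dataclasses import dataclass


@dataclass
class DiagGMM:
    """Data probability distribution of one state: a diagonal-covariance GMM."""
    weights: list[float]          # mixture weights, assumed to sum to 1
    means: list[list[float]]      # one mean vector per mixture component
    variances: list[list[float]]  # one diagonal variance vector per component

    def log_prob(self, x: list[float]) -> float:
        """log b(x) = log sum_k w_k * N(x; mean_k, diag(var_k))."""
        comp_logs = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, m, v in zip(x, mu, var):
                ll -= 0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            comp_logs.append(ll)
        # log-sum-exp keeps tiny per-component likelihoods from underflowing
        top = max(comp_logs)
        return top + math.log(sum(math.exp(c - top) for c in comp_logs))


@dataclass
class HMM:
    # Part 1 and the Markov chain: non-zero entries give the skeleton,
    # their values are the transition probabilities trans[i][j] = P(j | i)
    trans: list[list[float]]
    # Part 2: emissions[j] scores feature vectors against state j
    emissions: list[DiagGMM]
```

In a recognizer, `emissions[j].log_prob(frame)` is the per-frame, per-state score that the search described below consumes.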

Bakis topology
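The Bakis topology is the HMM structure used for speech. Its main difference from the general HMM above is that each state has only three possible transitions: ① back to itself; ② to the next state; ③ to the state after next.
First, the figure below gives an intuitive picture of what the states represent: they are in effect a segmentation of a sequence of speech feature vectors (the feature vectors of a whole word, or of a phoneme):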

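Second, the reasons a state has only these three possible transitions are:
  1. Speech runs from left to right, so the model never jumps back to an earlier state; it can only move forward.
  2. An HMM is an abstraction of the input speech, and each state is trained from many similar feature vectors. When an HMM recognizes the word it models, each of its states (such as 's' in the figure above) necessarily corresponds to several feature vectors of the input. Hence the self-loop: the current input feature vector is assigned to 's', and the next feature vector is assigned to 's' as well.
  3. Some syllables may be omitted in pronunciation, so a state may jump directly to the state after next.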
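The three allowed moves can be encoded as a transition matrix. A minimal sketch follows; the function name `bakis_transitions` is hypothetical and the default probabilities (0.6 self-loop, 0.3 next, 0.1 skip) are made-up illustrative values, not trained ones. Every entry below the diagonal (reason 1) or more than two steps above it stays zero, the diagonal is the self-loop (reason 2), and the second superdiagonal is the skip (reason 3).

```python
def bakis_transitions(n_states: int, p_self: float = 0.6,
                      p_next: float = 0.3, p_skip: float = 0.1) -> list[list[float]]:
    """Left-to-right (Bakis) transition matrix.

    From state i only i (self-loop), i + 1 (next) and i + 2 (skip) are
    reachable. The default probabilities are hypothetical placeholders.
    Near the last state, moves that would leave the model are dropped and
    the remaining ones are renormalised so every row still sums to 1.
    """
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        moves = {i: p_self}
        if i + 1 < n_states:
            moves[i + 1] = p_next
        if i + 2 < n_states:
            moves[i + 2] = p_skip
        total = sum(moves.values())
        for j, p in moves.items():
            A[i][j] = p / total
    return A
```

For example, `bakis_transitions(4)` gives a 4 × 4 upper-banded matrix whose last row is `[0.0, 0.0, 0.0, 1.0]`: the final state can only loop on itself.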

Viterbi Search
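Viterbi search is DTW applied to a Bakis-topology HMM. For every feature vector of the input speech it computes the observation probability in each state of the given HMM, combines these with the transition probabilities between states, and finds the optimal path of the input speech through the HMM's states. In the figure below, the horizontal axis is the input speech, a horizontal rightward arrow is a state's self-loop, an arrow pointing up and to the right at 22.5° is a jump to the next state, and one at 45° is a jump to the state after next:
The horizontal axis is time and the vertical axis is the set of all states; that is how the figure should be read (compare the slides "HMM-Based Isolated Word Speech Recognition").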
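The path search above can be written as a recurrence over that time × state grid. A sketch, assuming N states numbered 1..N, transition probabilities a_{ij}, observation probabilities b_j(o_t) for input frame o_t, and the common convention (an assumption here, not stated in the slides) that the path starts in state 1 and ends in state N:

$$\delta_1(1) = \log b_1(\mathbf{o}_1), \qquad \delta_1(j) = -\infty \quad (j > 1)$$

$$\delta_t(j) = \log b_j(\mathbf{o}_t) + \max_{i \in \{j,\,j-1,\,j-2\},\; i \ge 1} \bigl[\delta_{t-1}(i) + \log a_{ij}\bigr], \qquad t = 2, \dots, T$$

The best score is δ_T(N), and the state path is recovered by backtracking through the maximising i at each step. The three candidates i = j, j−1, j−2 are the self-loop, the jump to the next state and the jump to the state after next, i.e. the three arrow directions in the figure.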

So Viterbi search on an HMM is essentially the DTW algorithm.
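A minimal Python sketch of this search over a Bakis HMM, working in log probabilities so that products become sums. It assumes the observation log probabilities have already been computed into a frames × states table (for instance with a per-state GMM) and that the path starts in the first state and ends in the last; `viterbi_bakis` and `log_matrix` are hypothetical names for illustration.

```python
import math

NEG_INF = float("-inf")


def log_matrix(probs):
    """Element-wise log, mapping zero probability to -inf (a forbidden move)."""
    return [[math.log(p) if p > 0 else NEG_INF for p in row] for row in probs]


def viterbi_bakis(log_b, log_a):
    """Viterbi search over a left-to-right (Bakis) HMM.

    log_b[t][j]: log observation probability of input frame t in state j
    log_a[i][j]: log transition probability from state i to state j
    Assumes the path starts in state 0 at t = 0 and ends in the last state.
    Returns (best log score, state index assigned to every frame).
    """
    T, N = len(log_b), len(log_a)
    delta = [[NEG_INF] * N for _ in range(T)]  # best log score ending in j at t
    back = [[-1] * N for _ in range(T)]        # predecessor state on that path
    delta[0][0] = log_b[0][0]
    for t in range(1, T):
        for j in range(N):
            # Only three predecessors: self-loop (j), next (j - 1), skip (j - 2)
            for i in (j, j - 1, j - 2):
                if i < 0:
                    continue
                score = delta[t - 1][i] + log_a[i][j]
                if score > delta[t][j]:
                    delta[t][j], back[t][j] = score, i
            delta[t][j] += log_b[t][j]
    best = delta[T - 1][N - 1]
    if best == NEG_INF:
        return NEG_INF, []  # input too short (or blocked) to reach the last state
    path = [N - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best, path
```

Because each state only looks back at itself and the two states before it, the inner loop has at most three candidates, the same shape as a DTW local path constraint; the per-frame score plays the role of DTW's frame distance.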

How an HMM supports warping of features along the time axis
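In an HMM, the self-loops of the states are what "support warping of features along the time axis". When speech is slow, each state takes more self-loops, and the Viterbi search diagram has more rightward arrows (I take this to mean horizontal arrows along the same row). When speech is fast, each state takes fewer self-loops and the diagram has fewer rightward arrows. When the speaking rate varies, the state of a syllable spoken quickly takes fewer self-loops and the state of a syllable spoken slowly takes more, and so on.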
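Given a state path such as the one Viterbi search returns, the self-loops per state can be counted directly, which makes the warping visible. The two alignments below are hand-written, hypothetical paths for the same three-state word at two speaking rates, not output of a real recognizer; `self_loops_per_state` is an illustrative name.

```python
from collections import Counter


def self_loops_per_state(path):
    """Number of self-loops each visited state took in a left-to-right path.

    In a Bakis path a state's frames are contiguous, so the first frame is
    the entry into the state and every further frame is one self-loop.
    A skipped state never appears in the result.
    """
    return {state: n - 1 for state, n in Counter(path).items()}


# Hypothetical alignments of the same 3-state word at two speaking rates
slow = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
fast = [0, 1, 1, 2]
print(self_loops_per_state(slow))  # {0: 2, 1: 3, 2: 2}
print(self_loops_per_state(fast))  # {0: 0, 1: 1, 2: 0}
```

The model parameters are identical in both cases; only the number of self-loops taken in each state changes with the speaking rate.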

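About DTW (Dynamic Time Warping)
I suspect the asker's understanding of DTW may be a bit narrow, so a few more words on it.
Comparing the feature vectors of the input speech against those of a template looks like this: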

Comparing the feature vectors of the input speech against an HMM looks like this:

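When an HMM performs Viterbi search it also uses the DTW algorithm; the difference is that computing the distance between a template feature vector and an input feature vector becomes computing the observation probability of an input feature vector in an HMM state.
(HMM vs. DTW)
The benefits are clear: it saves the space needed to store the feature vectors of many templates (only the HMM parameters need to be stored), finding the best alignment is faster, and because the HMM is trained on many templates (training data), recognition accuracy is also good.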
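The point that template DTW and HMM Viterbi share one algorithm and differ in the local score can be sketched with a single alignment function that takes the local cost as a parameter. This is an illustration under simplifying assumptions: both cases use the Bakis moves (stay, +1, +2), HMM transition probabilities are left out so that only the local score differs, and each HMM state is a hypothetical 1-D unit-variance Gaussian rather than a GMM. Minimising the negative log probability is equivalent to maximising the probability.

```python
import math
from typing import Callable, Sequence


def align(frames: Sequence, refs: Sequence, local_cost: Callable) -> float:
    """Minimum accumulated cost of aligning every input frame to refs.

    refs are template feature vectors (template DTW) or HMM states; only
    local_cost differs. Moves along refs are stay, +1 or +2 (Bakis-style),
    the path starts at refs[0] and ends at refs[-1], and HMM transition
    probabilities are deliberately left out to isolate the local score.
    """
    T, N = len(frames), len(refs)
    D = [[math.inf] * N for _ in range(T)]
    D[0][0] = local_cost(frames[0], refs[0])
    for t in range(1, T):
        for j in range(N):
            prev = min(D[t - 1][i] for i in (j, j - 1, j - 2) if i >= 0)
            if prev < math.inf:
                D[t][j] = prev + local_cost(frames[t], refs[j])
    return D[T - 1][N - 1]


def euclidean(x, template_vec):
    """Template DTW: distance between two feature vectors."""
    return math.dist(x, template_vec)


def neg_log_gauss(x, state_mean):
    """HMM: -log observation probability for a hypothetical 1-D
    unit-variance Gaussian state (a real system would use a GMM)."""
    return 0.5 * math.log(2 * math.pi) + 0.5 * (x[0] - state_mean) ** 2


frames = [(0.0,), (0.1,), (1.0,), (2.0,)]
print(align(frames, [(0.0,), (1.0,), (2.0,)], euclidean))  # template DTW
print(align(frames, [0.0, 1.0, 2.0], neg_log_gauss))       # HMM-style scoring
```

With the template, every stored vector must be kept; with the HMM, each state needs only its distribution parameters (here just a mean), which is the storage saving described above.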

All figures are taken from the slides of the JIE J1799d course: http://jie.sysu.edu.cn/~mli/j1799d.htm