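This post explains how the HMM transducer Ha.fst in Kaldi is used, that is, how it maps transition-ids to phones. Transition-ids are what tie this graph to the feature vectors: each transition-id belongs to a pdf, and the acoustic model scores that pdf against every frame.
Only the usage is covered here, not how the HMM topology or the model are generated.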
In the yesno/s5/exp/mono0a/graph_tgpr directory, print the contents of Ha.fst:
boystray@boystray-All-Series:~/kaldi/egs/yesno/s5/exp/mono0a/graph_tgpr$ fstprint Ha.fst
0 1 0 1
0 7 20 2
0 10 26 3
0 13 31 4
0
1 2 2 0 3.26078463
1 3 3 0 0.771307588
1 4 4 0 0.694674611
2 3 6 0 1.60320568
2 4 7 0 0.514737606
2 5 8 0 1.60320568
3 2 9 0 2.19029689
3 4 11 0 0.253189087
3 5 12 0 2.19029689
4 2 13 0 2.36310244
4 3 14 0 2.36310244
4 5 16 0 0.208477736
5 6 18 0
6 0 0 0
7 8 22 0
8 9 24 0
9 0 0 0
10 11 28 0
11 12 30 0 -2.38418579e-07
12 0 0 0
13 0 0 0
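Column 1 is the source state and column 2 the destination state. Column 3 is the input label, a transition-id (0 means epsilon, i.e. no input is consumed). Column 4 is the output label; in this monophone graph it can be read as a phone ID (strictly speaking, H's output labels index the ilabels table written by fstcomposecontext, which here appears to line up with phones.txt). Column 5, when present, is the arc weight as a negative log probability; when it is missing the weight is 0. The line containing only 0 marks state 0 as the final state.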
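A minimal sketch for reading this listing, assuming the fstprint text format shown above (src dst ilabel olabel [weight], plus 1- or 2-field lines for final states). It also checks the reading of column 5: if the weights really are negative log probabilities, then exp(-w) summed over the arcs leaving each HMM state should come out close to 1. `Arc` and `parse_fstprint` are illustrative names, not part of OpenFst or Kaldi.

```python
import math
from collections import defaultdict
from typing import NamedTuple

# Ha.fst exactly as printed by `fstprint Ha.fst` above.
HA_FST = """\
0 1 0 1
0 7 20 2
0 10 26 3
0 13 31 4
0
1 2 2 0 3.26078463
1 3 3 0 0.771307588
1 4 4 0 0.694674611
2 3 6 0 1.60320568
2 4 7 0 0.514737606
2 5 8 0 1.60320568
3 2 9 0 2.19029689
3 4 11 0 0.253189087
3 5 12 0 2.19029689
4 2 13 0 2.36310244
4 3 14 0 2.36310244
4 5 16 0 0.208477736
5 6 18 0
6 0 0 0
7 8 22 0
8 9 24 0
9 0 0 0
10 11 28 0
11 12 30 0 -2.38418579e-07
12 0 0 0
13 0 0 0
"""


class Arc(NamedTuple):
    src: int
    dst: int
    tid: int       # input label: transition-id, 0 = epsilon
    olabel: int    # output label
    weight: float  # 0.0 when fstprint omits the column


def parse_fstprint(text):
    """Split fstprint text into arcs and final states {state: final_weight}."""
    arcs, finals = [], {}
    for line in text.splitlines():
        f = line.split()
        if len(f) in (1, 2):    # final state, optional final weight
            finals[int(f[0])] = float(f[1]) if len(f) == 2 else 0.0
        elif len(f) in (4, 5):  # arc, optional weight
            weight = float(f[4]) if len(f) == 5 else 0.0
            arcs.append(Arc(int(f[0]), int(f[1]), int(f[2]), int(f[3]), weight))
    return arcs, finals


arcs, finals = parse_fstprint(HA_FST)

# Assumption: weights are -log(probability), so exp(-w) recovers the probability.
prob_sum = defaultdict(float)
for arc in arcs:
    prob_sum[arc.src] += math.exp(-arc.weight)

print("arcs:", len(arcs), "final states:", finals)
for state in sorted(prob_sum):
    print(f"state {state}: outgoing probability mass {prob_sum[state]:.3f}")
```

Every state except 0 sums to about 1. State 0 sums to 4 because it is not an HMM state: it is the point where one of four branches (SIL, Y, N or #0) is entered, and all four arcs carry weight 0.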
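The meaning of each transition-id can be listed with show-transitions. The command below reads 0.mdl, the initial model, so the probabilities it prints are initial values, not the trained ones from final.mdl (the model mkgraph.sh builds the graph from) that the weights in Ha.fst reflect; the transition-ids themselves are the same.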
boystray@boystray-All-Series:~/kaldi/egs/yesno/s5/exp/mono0a$ ~/kaldi/src/bin/show-transitions phones.txt 0.mdl
/home/boystray/kaldi/src/bin/show-transitions phones.txt 0.mdl
Transition-state 1: phone = SIL hmm-state = 0 pdf = 0
Transition-id = 1 p = 0.25 [self-loop]
Transition-id = 2 p = 0.25 [0 -> 1]
Transition-id = 3 p = 0.25 [0 -> 2]
Transition-id = 4 p = 0.25 [0 -> 3]
Transition-state 2: phone = SIL hmm-state = 1 pdf = 1
Transition-id = 5 p = 0.25 [self-loop]
Transition-id = 6 p = 0.25 [1 -> 2]
Transition-id = 7 p = 0.25 [1 -> 3]
Transition-id = 8 p = 0.25 [1 -> 4]
Transition-state 3: phone = SIL hmm-state = 2 pdf = 2
Transition-id = 9 p = 0.25 [2 -> 1]
Transition-id = 10 p = 0.25 [self-loop]
Transition-id = 11 p = 0.25 [2 -> 3]
Transition-id = 12 p = 0.25 [2 -> 4]
Transition-state 4: phone = SIL hmm-state = 3 pdf = 3
Transition-id = 13 p = 0.25 [3 -> 1]
Transition-id = 14 p = 0.25 [3 -> 2]
Transition-id = 15 p = 0.25 [self-loop]
Transition-id = 16 p = 0.25 [3 -> 4]
Transition-state 5: phone = SIL hmm-state = 4 pdf = 4
Transition-id = 17 p = 0.75 [self-loop]
Transition-id = 18 p = 0.25 [4 -> 5]
Transition-state 6: phone = Y hmm-state = 0 pdf = 5
Transition-id = 19 p = 0.75 [self-loop]
Transition-id = 20 p = 0.25 [0 -> 1]
Transition-state 7: phone = Y hmm-state = 1 pdf = 6
Transition-id = 21 p = 0.75 [self-loop]
Transition-id = 22 p = 0.25 [1 -> 2]
Transition-state 8: phone = Y hmm-state = 2 pdf = 7
Transition-id = 23 p = 0.75 [self-loop]
Transition-id = 24 p = 0.25 [2 -> 3]
Transition-state 9: phone = N hmm-state = 0 pdf = 8
Transition-id = 25 p = 0.75 [self-loop]
Transition-id = 26 p = 0.25 [0 -> 1]
Transition-state 10: phone = N hmm-state = 1 pdf = 9
Transition-id = 27 p = 0.75 [self-loop]
Transition-id = 28 p = 0.25 [1 -> 2]
Transition-state 11: phone = N hmm-state = 2 pdf = 10
Transition-id = 29 p = 0.75 [self-loop]
Transition-id = 30 p = 0.25 [2 -> 3]
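To look up a transition-id programmatically, a small parser for the text above can build a table from transition-id to (phone, HMM state, pdf, arc, p). This is a sketch that assumes the exact line format printed here; the regular expressions and `parse_show_transitions` are illustrative, not a Kaldi API (in C++, Kaldi's `TransitionModel` class offers lookups such as `TransitionIdToPhone` and `IsSelfLoop`).

```python
import re

# Output of `show-transitions phones.txt 0.mdl` above (yesno monophone model).
SHOW_TRANSITIONS = """\
Transition-state 1: phone = SIL hmm-state = 0 pdf = 0
Transition-id = 1 p = 0.25 [self-loop]
Transition-id = 2 p = 0.25 [0 -> 1]
Transition-id = 3 p = 0.25 [0 -> 2]
Transition-id = 4 p = 0.25 [0 -> 3]
Transition-state 2: phone = SIL hmm-state = 1 pdf = 1
Transition-id = 5 p = 0.25 [self-loop]
Transition-id = 6 p = 0.25 [1 -> 2]
Transition-id = 7 p = 0.25 [1 -> 3]
Transition-id = 8 p = 0.25 [1 -> 4]
Transition-state 3: phone = SIL hmm-state = 2 pdf = 2
Transition-id = 9 p = 0.25 [2 -> 1]
Transition-id = 10 p = 0.25 [self-loop]
Transition-id = 11 p = 0.25 [2 -> 3]
Transition-id = 12 p = 0.25 [2 -> 4]
Transition-state 4: phone = SIL hmm-state = 3 pdf = 3
Transition-id = 13 p = 0.25 [3 -> 1]
Transition-id = 14 p = 0.25 [3 -> 2]
Transition-id = 15 p = 0.25 [self-loop]
Transition-id = 16 p = 0.25 [3 -> 4]
Transition-state 5: phone = SIL hmm-state = 4 pdf = 4
Transition-id = 17 p = 0.75 [self-loop]
Transition-id = 18 p = 0.25 [4 -> 5]
Transition-state 6: phone = Y hmm-state = 0 pdf = 5
Transition-id = 19 p = 0.75 [self-loop]
Transition-id = 20 p = 0.25 [0 -> 1]
Transition-state 7: phone = Y hmm-state = 1 pdf = 6
Transition-id = 21 p = 0.75 [self-loop]
Transition-id = 22 p = 0.25 [1 -> 2]
Transition-state 8: phone = Y hmm-state = 2 pdf = 7
Transition-id = 23 p = 0.75 [self-loop]
Transition-id = 24 p = 0.25 [2 -> 3]
Transition-state 9: phone = N hmm-state = 0 pdf = 8
Transition-id = 25 p = 0.75 [self-loop]
Transition-id = 26 p = 0.25 [0 -> 1]
Transition-state 10: phone = N hmm-state = 1 pdf = 9
Transition-id = 27 p = 0.75 [self-loop]
Transition-id = 28 p = 0.25 [1 -> 2]
Transition-state 11: phone = N hmm-state = 2 pdf = 10
Transition-id = 29 p = 0.75 [self-loop]
Transition-id = 30 p = 0.25 [2 -> 3]
"""

STATE_RE = re.compile(r"Transition-state (\d+): phone = (\S+) hmm-state = (\d+) pdf = (\d+)")
TID_RE = re.compile(r"Transition-id = (\d+) p = (\S+) \[(.+)\]")


def parse_show_transitions(text):
    """Map transition-id -> (phone, hmm_state, pdf, arc, p)."""
    table, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        m = STATE_RE.match(line)
        if m:  # a new transition-state header: remember its phone/state/pdf
            current = (m.group(2), int(m.group(3)), int(m.group(4)))
            continue
        m = TID_RE.match(line)
        if m and current is not None:
            table[int(m.group(1))] = (*current, m.group(3), float(m.group(2)))
    return table


table = parse_show_transitions(SHOW_TRANSITIONS)
self_loops = sorted(t for t, row in table.items() if row[3] == "self-loop")
print("largest transition-id:", max(table))
print("self-loop transition-ids:", self_loops)
```

For example, `table[20]` is `('Y', 0, 5, '0 -> 1', 0.25)`: the transition leaving Y's first HMM state, whose frames are scored with pdf 5.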
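Comparing this with Ha.fst shows two things. None of the self-loop transition-ids (1, 5, 10, 15, 17, 19, 21, 23, 25, 27, 29) appear in Ha.fst: it is H without self-loops, which mkgraph.sh adds later with add-self-loops. Transition-id 31, on the other hand, is not listed here at all: it is a transition-id-level disambiguation symbol, which make-h-transducer numbers after the last real transition-id (30).
The phone IDs in column 4 come from phones.txt, which looks like this: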
<eps> 0
SIL 1
Y 2
N 3
#0 4
#1 5
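phones.txt is a plain symbol table with one "symbol id" pair per line. By Kaldi convention, symbols starting with # are disambiguation symbols rather than sounds, which matters when reading output label 4 (#0) in Ha.fst. A minimal sketch of loading it; `load_symbols` is an illustrative helper, not a Kaldi function.

```python
# Contents of phones.txt shown above.
PHONES_TXT = """\
<eps> 0
SIL 1
Y 2
N 3
#0 4
#1 5
"""


def load_symbols(text):
    """Return {id: symbol} from a symbol table with one 'symbol id' per line."""
    table = {}
    for line in text.splitlines():
        if line.strip():
            sym, idx = line.split()
            table[int(idx)] = sym
    return table


phones = load_symbols(PHONES_TXT)
# id 0 is epsilon; '#' symbols are disambiguation symbols, everything else is a real phone.
disambig = sorted(i for i, s in phones.items() if s.startswith("#"))
real = sorted(i for i, s in phones.items() if i != 0 and not s.startswith("#"))
print("real phones:", [phones[i] for i in real])
print("disambiguation symbols:", [phones[i] for i in disambig])
```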
With this background, look again at the first lines of Ha.fst, the arcs leaving state 0:
source  dest  transition-id  output label
0       1     0 (epsilon)    1   SIL
0       7     20             2   Y
0       10    26             3   N
0       13    31             4   #0 (disambiguation symbol, not a phone)
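Each phone label is emitted on the first arc of that phone's branch out of state 0; the remaining arcs consume the phone's other transition-ids with epsilon output and then return to state 0. For Y, reading transition-ids 20, 22, 24 (states 0 -> 7 -> 8 -> 9 -> 0) outputs Y; for SIL the output label sits on an epsilon-input arc, followed by a small network of SIL transition-ids (states 1 to 6). This is how Ha.fst turns transition-ids into phones. mkgraph.sh then composes Ha.fst with CLG into HCLG, which maps transition-ids all the way to words, and the decoder searches that graph to produce the word sequence.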
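To make the branch structure concrete, the sketch below walks Ha.fst from state 0 and, for each arc leaving it, collects every transition-id reachable before the path returns to state 0. `PHONES` and `MAX_TID` are copied by hand from phones.txt and the show-transitions output; `parse_arcs` and `tids_per_output` are hypothetical helpers, not OpenFst or Kaldi functions.

```python
from collections import defaultdict

# Ha.fst exactly as printed by `fstprint Ha.fst` above.
HA_FST = """\
0 1 0 1
0 7 20 2
0 10 26 3
0 13 31 4
0
1 2 2 0 3.26078463
1 3 3 0 0.771307588
1 4 4 0 0.694674611
2 3 6 0 1.60320568
2 4 7 0 0.514737606
2 5 8 0 1.60320568
3 2 9 0 2.19029689
3 4 11 0 0.253189087
3 5 12 0 2.19029689
4 2 13 0 2.36310244
4 3 14 0 2.36310244
4 5 16 0 0.208477736
5 6 18 0
6 0 0 0
7 8 22 0
8 9 24 0
9 0 0 0
10 11 28 0
11 12 30 0 -2.38418579e-07
12 0 0 0
13 0 0 0
"""

# Copied by hand from phones.txt and from the show-transitions output.
PHONES = {0: "<eps>", 1: "SIL", 2: "Y", 3: "N", 4: "#0", 5: "#1"}
MAX_TID = 30  # largest real transition-id in this model


def parse_arcs(text):
    """src -> list of (dst, tid, olabel); weights and final-state lines are ignored."""
    arcs = defaultdict(list)
    for line in text.splitlines():
        f = line.split()
        if len(f) >= 4:
            src, dst, tid, olabel = map(int, f[:4])
            arcs[src].append((dst, tid, olabel))
    return arcs


def tids_per_output(arcs, start=0):
    """For each arc leaving `start`, collect the transition-ids reachable
    before the path returns to `start` (depth-first, each state visited once)."""
    result = {}
    for first_dst, first_tid, olabel in arcs[start]:
        tids = {first_tid} if first_tid != 0 else set()
        seen, stack = set(), [first_dst]
        while stack:
            state = stack.pop()
            if state == start or state in seen:
                continue
            seen.add(state)
            for dst, tid, _ in arcs[state]:
                if tid != 0:  # skip epsilon inputs
                    tids.add(tid)
                stack.append(dst)
        result[PHONES[olabel]] = sorted(tids)
    return result


for phone, tids in tids_per_output(parse_arcs(HA_FST)).items():
    kind = "disambiguation" if tids and min(tids) > MAX_TID else "real"
    print(f"{phone:>4}: transition-ids {tids} ({kind})")
```

On this listing it reports 20, 22, 24 for Y, 26, 28, 30 for N, only 31 for #0 (flagged as a disambiguation symbol, since it exceeds 30), and for SIL every SIL transition-id except the self-loops.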