GMM Phoneme Alignment

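Suppose a GMM model has already been trained and you want to use it for alignment. This is forced alignment, so every utterance must have a label y (its transcript).
One thing to note: Kaldi provides two GMM alignment scripts, align_fmllr.sh (the more advanced one, which estimates per-speaker fMLLR transforms) and align_si.sh (speaker-independent).
To train a GMM model on fMLLR features, see the aishell1 training script. This post uses align_fmllr.sh.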


1 Preparing the data

The steps here follow steps/align_fmllr.sh. The input data must be pairs of <audio, word-segmented transcript>; data preparation is the same as for ASR.
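
Because forced alignment needs a transcript for every utterance, it can help to check for gaps before aligning. The following is a minimal sketch, assuming the usual Kaldi data directory layout in which wav.scp and text are both keyed by utterance ID in the first field. The function name missing_transcripts is made up for illustration and is not part of Kaldi.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Kaldi script): list utterance IDs that appear in
# wav.scp but have no line in text. Assumes the first whitespace-separated
# field of every line is the utterance ID.
missing_transcripts() {
  local dir=$1
  awk -v text="$dir/text" '
    BEGIN {
      # Load every utterance ID that has a transcript.
      while ((getline line < text) > 0) {
        split(line, fields, " ")
        has_text[fields[1]] = 1
      }
    }
    # Print IDs from wav.scp that have no transcript.
    !($1 in has_text) { print $1 }
  ' "$dir/wav.scp"
}

# Example: missing_transcripts data/test
```

An empty result means every audio entry has a transcript line; any ID printed would need a transcript (or removal) before alignment.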

2 Computing features to get feats.scp

. ./path.sh
. ./cmd.sh
# Extract 13-dimensional MFCC features
mfccdir=mfcc
steps/make_mfcc.sh --cmd "$train_cmd" --nj 30 data/test exp/make_mfcc/test $mfccdir || exit 1;
steps/compute_cmvn_stats.sh data/test exp/make_mfcc/test $mfccdir || exit 1;
utils/fix_data_dir.sh data/test

[Note]

  • [1] Only the basic MFCC features are extracted, so the feature dimension is 13.

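3 Alignment

Since the model is already trained, you can align directly. The resulting ali.*.gz files are written to the output directory and can then be parsed.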

. ./path.sh
. ./cmd.sh
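# Building the decoding graph is part of the training recipe and is not needed here
# utils/mkgraph.sh data/lang_test exp/tri5a exp/tri5a/graph
# exp/tri5a_test_ali is where the alignments are saved; take care not to mix it up with the training-set alignment directory. Adjust nj as needed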
steps/align_fmllr.sh --cmd "$train_cmd" --nj 30 data/test data/lang exp/tri5a exp/tri5a_test_ali

The alignment log looks like this:

steps/align_fmllr.sh --cmd run.pl --mem 2G --nj 30 data/test data/lang exp/tri5a exp/tri5a_test_ali
steps/align_fmllr.sh: feature type is lda
steps/align_fmllr.sh: compiling training graphs
steps/align_fmllr.sh: aligning data in data/test using exp/tri5a/final.alimdl and speaker-independent features.
steps/align_fmllr.sh: computing fMLLR transforms
steps/align_fmllr.sh: doing final alignment.
steps/align_fmllr.sh: done aligning data.
steps/diagnostic/analyze_alignments.sh --cmd run.pl --mem 2G data/lang exp/tri5a_test_ali
steps/diagnostic/analyze_alignments.sh: see stats in exp/tri5a_test_ali/log/analyze_alignments.log
3 warnings in exp/tri5a_test_ali/log/fmllr.*.log
17 warnings in exp/tri5a_test_ali/log/align_pass2.*.log
14 warnings in exp/tri5a_test_ali/log/align_pass1.*.log

3.1 Converting alignments to pdf-ids and posteriors

ali-to-pdf exp/tri5a/final.mdl \
ark:"gunzip -c exp/tri5a_test_ali/ali.1.gz|" ark,t:- | head -n 1 | \
ali-to-post ark,t:- ark,t:-

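In the output, [ 70 1 ] means that pdf-id 70 has posterior probability 1 for that frame. Because the posteriors are derived from a hard alignment, each frame has exactly one pdf-id with probability 1.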

HAO0007501-000000 [ 0 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 69 1 ] [ 68 1 ] [ 26 1 ] [ 26 1 ] [ 26 1 ] [ 26 1 ] [ 26 1 ] [ 2415 1 ] [ 2415 1 ] [ 966 1 ] [ 966 1 ] [ 966 1 ] [ 966 1 ] [ 2519 1 ] [ 2632 1 ] [ 1082 1 ] [ 2728 1 ] [ 2777 1 ] [ 1723 1 ] [ 1877 1 ] [ 1780 1 ] [ 2702 1 ] [ 2702 1 ] [ 2702 1 ] [ 2702 1 ] [ 2058 1 ] [ 2105 1 ] [ 2105 1 ] [ 2105 1 ] [ 449 1 ] [ 438 1 ] [ 438 1 ] [ 2851 1 ] [ 2372 1 ] [ 2372 1 ] [ 2372 1 ] [ 2372 1 ] [ 2372 1 ] [ 1266 1 ] [ 1266 1 ] [ 1266 1 ] [ 1788 1 ] [ 1788 1 ] [ 1788 1 ] [ 1166 1 ] [ 1166 1 ] [ 1166 1 ] [ 1166 1 ] [ 1166 1 ] [ 2894 1 ] [ 2894 1 ] [ 687 1 ] [ 687 1 ] [ 2268 1 ] [ 2268 1 ] [ 2268 1 ] [ 2268 1 ] [ 1552 1 ] [ 1552 1 ] [ 1552 1 ] [ 1552 1 ] [ 1552 1 ] [ 1897 1 ] [ 1897 1 ] [ 1897 1 ] [ 1297 1 ] [ 1297 1 ] [ 1916 1 ] [ 1916 1 ] [ 1853 1 ] [ 2330 1 ] [ 2121 1 ] [ 2121 1 ] [ 2121 1 ] [ 2121 1 ] [ 2323 1 ] [ 2323 1 ] [ 2323 1 ] [ 2323 1 ] [ 2290 1 ] [ 910 1 ] [ 910 1 ] [ 2615 1 ] [ 2352 1 ] [ 2352 1 ] [ 1552 1 ] [ 1552 1 ] [ 2417 1 ] [ 2417 1 ] [ 2417 1 ] [ 2417 1 ] [ 2142 1 ] [ 2142 1 ] [ 2370 1 ] [ 2319 1 ] [ 2631 1 ] [ 758 
1 ] [ 758 1 ] [ 758 1 ] [ 1980 1 ] [ 1617 1 ] [ 1617 1 ] [ 1214 1 ] [ 976 1 ] [ 976 1 ] [ 933 1 ] [ 933 1 ] [ 933 1 ] [ 670 1 ] [ 670 1 ] [ 670 1 ] [ 670 1 ] [ 78 1 ] [ 78 1 ] [ 757 1 ] [ 757 1 ] [ 385 1 ] [ 22 1 ] [ 80 1 ] [ 80 1 ] [ 80 1 ] [ 80 1 ] [ 188 1 ] [ 188 1 ] [ 188 1 ] [ 188 1 ] [ 188 1 ] [ 258 1 ] [ 258 1 ] [ 258 1 ] [ 19 1 ] [ 19 1 ] [ 19 1 ] [ 19 1 ] [ 347 1 ] [ 347 1 ] [ 347 1 ] [ 347 1 ] [ 347 1 ] [ 347 1 ] [ 347 1 ] [ 242 1 ] [ 242 1 ] [ 242 1 ] [ 242 1 ] [ 242 1 ] [ 242 1 ] [ 242 1 ] [ 242 1 ] [ 242 1 ] [ 242 1 ] [ 242 1 ] [ 242 1 ] [ 242 1 ] [ 242 1 ] [ 242 1 ] [ 242 1 ] [ 242 1 ] [ 242 1 ] [ 242 1 ] [ 343 1 ] [ 343 1 ] [ 343 1 ] [ 343 1 ] [ 343 1 ] [ 343 1 ] [ 343 1 ] [ 343 1 ] [ 343 1 ] [ 343 1 ] [ 343 1 ] [ 343 1 ] [ 343 1 ] [ 343 1 ] [ 343 1 ] [ 0 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 70 1 ] [ 68 1 ] 
LOG (ali-to-post[5.5]:main():ali-to-post.cc:73) Converted 1 alignments
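
Each bracketed pair in this output corresponds to one frame, so counting the opening brackets gives the number of frames per utterance. A small sketch, assuming the text format shown above with one utterance per line (the function name count_post_frames is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: read ali-to-post text output on stdin and print
# "<utt-id> <num-frames>" for each utterance line.
count_post_frames() {
  awk '{
    n = 0
    # Every "[" token opens the posterior entry of one frame.
    for (i = 2; i <= NF; i++) if ($i == "[") n++
    print $1, n
  }'
}

# Example: pipe the ark,t:- output of the ali-to-post command into count_post_frames
```

The count should match the number of feature frames of the same utterance.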

3.2 Converting alignments to phones

ali-to-phones --per-frame=true exp/tri5a/final.mdl \
ark:"gunzip -c exp/tri5a_test_ali/ali.1.gz|" ark,t:-| \
int2sym.pl -f 2:10000 data/lang/phones.txt | head -n 5
HAO0007501-000000 sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil 
sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil 
sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil 
sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil 
sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil ii ii ii ii ii ii ii ii ii ii ii 
iu3 iu3 iu3 r r r en2 en2 en2 en2 en2 en2 l l l l l ai2 ai2 ai2 ai2 ai2 ai2 ai2 ai2 
q q q q q q q q q q q iang1 iang1 iang1 iang1 iang1 iang1 iang1 iang1 
j j j j j j j j j j ie2 ie2 ie2 ie2 g g g g g g g g g an3 an3 an3 an3 an3 
j j j j j j j j in3 in3 in3 d d d d d d a3 a3 a3 a3 a3 a3 
d d d d d d d d ian4 ian4 ian4 ian4 ian4 ian4 h h h h h h h h h h h h ua4 ua4 ua4 ua4 ua4 ua4 ua4 
ua4 ua4 ua4 ua4 ua4 ua4 ua4 ua4 ua4 ua4 ua4 ua4 ua4 ua4 ua4 ua4 ua4 ua4 ua4 ua4 ua4 ua4 ua4 ua4 
ua4 ua4 ua4 ua4 ua4 ua4 ua4 ua4 ua4 ua4 sil sil sil sil sil sil sil sil sil sil sil sil sil 
sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil 
sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil 
sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil sil 
sil sil sil sil sil sil sil sil sil sil sil 

The alignment for utterance HAO0007501-000000 contains 373 phone labels in total, one per frame.
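
Per-frame labels are often easier to use once collapsed into segments with a start time and a duration. The sketch below does this for per-frame phone output like the one above, assuming a 10 ms frame shift (the usual MFCC setting; pass a different value if your configuration differs). The function name phones_to_segments is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: collapse per-frame phone labels into segments.
# Input lines:  <utt-id> <phone> <phone> ...   (one label per frame)
# Output lines: <utt-id> <phone> <start-sec> <duration-sec>
# The first argument is the frame shift in seconds (assumed default: 0.01).
phones_to_segments() {
  local frame_shift=${1:-0.01}
  awk -v step="$frame_shift" '{
    start = 2                       # field index where the current run begins
    for (i = 3; i <= NF + 1; i++) {
      # Close the current run when the label changes or the line ends.
      if (i > NF || $i != $start) {
        printf "%s %s %.2f %.2f\n", $1, $start, (start - 2) * step, (i - start) * step
        start = i
      }
    }
  }'
}

# Example: printf 'utt1 sil sil a a a b\n' | phones_to_segments
```

Note that two adjacent instances of the same phone are merged into one segment by this approach. ali-to-phones also offers a --ctm-output option that reports phone start times and durations directly from the alignment, which avoids that limitation.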

3.3 Checking the features

Look up the utterance HAO0007501-000000 in feats.scp and inspect its features.

# Find the raw_mfcc archive that feats.scp points to for HAO0007501-000000 and convert it to text
copy-feats ark:mfcc/raw_mfcc_test.1.ark ark,t:raw_mfcc_test.1.txt
HAO0007501-000000  [
  29.01624 -32.6691 -6.993553 -4.887017 -9.987267 -9.018179 -0.5451918 2.109525 6.914473 -2.27588 -1.337507 -6.087133 3.309676 
  28.55501 -32.18113 -3.659678 -7.498455 -1.382144 -1.900894 -4.72083 0.9089931 2.948753 3.095551 3.979742 5.138752 1.559399 
  28.09378 -35.59689 -12.8813 -10.81606 -7.499728 -10.23333 -9.514559 -1.250608 9.109114 5.337988 -0.4852505 -11.26428 -17.41257 
  ......

The utterance HAO0007501-000000 indeed has 373 rows, i.e. 373 frames, and each frame has 13 dimensions.
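
Counting rows by hand is tedious, so here is a small sketch that reads a text-format feature file such as raw_mfcc_test.1.txt and prints the number of frames and the dimension of each utterance. It assumes the layout shown above (a header line ending in "[", one frame per row, and "]" at the end of the last row) and does not handle empty matrices; feat_shape is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Kaldi text-format feature matrices on stdin and
# print "<utt-id> <num-frames> <dim>" for each utterance.
feat_shape() {
  awk '
    # A line ending in "[" opens a new matrix: "<utt-id>  [".
    $NF == "[" { utt = $1; rows = 0; cols = 0; next }
    {
      closing = ($NF == "]")        # the last row carries the closing bracket
      n = closing ? NF - 1 : NF     # number of values on this row
      if (n > 0) { rows++; cols = n }
      if (closing) print utt, rows, cols
    }
  '
}

# Example: feat_shape < raw_mfcc_test.1.txt
```

Kaldi's own feat-to-len and feat-to-dim tools report the same information directly from feats.scp without converting to text first.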

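[Important] A close look at steps/align_fmllr.sh shows that the fMLLR GMM model (trained on top of LDA+MLLT) does not use the 13-dimensional features directly. The model directory contains final.mat, which is why the log above reports "feature type is lda". After CMVN, the script splices each frame with its neighbours using splice-feats (the context comes from the model's splice_opts; for example, a context of ±4 frames turns 13 dimensions into 13×9=117) and then projects the result to 40 dimensions with final.mat. The per-speaker fMLLR transform is then applied to these 40-dimensional features. (In my setup only MFCC is used, without pitch, i-vector, or other features.)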

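[Important] A close look at steps/align_si.sh shows that for a mono GMM model (no LDA+MLLT) no feature transform matrix is applied (the model directory has no final.mat). Only delta features are used: the first- and second-order deltas each add 13 dimensions, giving 39 dimensions in total. So the model actually uses 39-dimensional MFCC features.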

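[Important] If pitch features (3 dimensions) are used when building the mono model, the extracted features have 13+3=16 dimensions. steps/train_mono.sh applies add-deltas to append first- and second-order deltas, so the mono GMM model is trained on 16*3=48-dimensional features.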
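
The delta arithmetic in the notes above can be written as a one-line calculation: with delta order 2 (first- and second-order deltas), the output dimension is the base dimension times 3. A small sketch, where delta_dim is a hypothetical helper name and not a Kaldi tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: dimension after appending deltas up to the given order,
# i.e. base_dim * (delta_order + 1). An order of 2 is assumed by default,
# matching first- plus second-order deltas.
delta_dim() {
  local base_dim=$1 delta_order=${2:-2}
  echo $(( base_dim * (delta_order + 1) ))
}

delta_dim 13   # MFCC only: prints 39
delta_dim 16   # MFCC (13) + pitch (3): prints 48
```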

Reference

Forced Alignment
Kaldi aishell1 training script
