Training Kaldi on a small subset of the VoxForge corpus and decoding speech files online

Environment: Ubuntu 12.04, Kaldi

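While training on the TIMIT corpus I got as far as "MMI + SGMM2 Training & Decoding". This was Ubuntu in a virtual machine on modest hardware, and going on to train the DNN models would have taken a very long time, so I stopped there. I then tried to use the trained models for online decoding (http://blog.itpub.net/16582684/viewspace-1270816/) but could not get that to work (the wav files in the TIMIT training data are in SPHERE format, whereas the VoxForge wav files can be played directly), so I switched to training on VoxForge. VoxForge is open source, with none of TIMIT's copyright restrictions, and models trained on it do support online decoding, which is why this post uses that corpus.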

Steps:
1. Install the libraries that mitlm and g2p depend on
sudo apt-get install flac 
sudo apt-get install swig

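2. Change to /u01/kaldi/egs/voxforge/s5. The script local/voxforge_prepare_lm.sh contains a step that installs mitlm, but the svn checkout of the source from http://mitlm.googlecode.com/svn/trunk/ failed. I downloaded the source from https://mitlm.googlecode.com/files/mitlm-0.4.1.tar.gz instead, put it under tools, extracted it, renamed the extracted directory to mitlm-svn, and commented out the line "svn checkout -r103 http://mitlm.googlecode.com/svn/trunk/ tools/mitlm-svn" in local/voxforge_prepare_lm.sh.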
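The manual unpack-and-rename in step 2 can be sketched as a small shell helper. This is only an illustration: install_mitlm_tarball is a hypothetical name, not part of Kaldi, and it assumes the tarball has already been downloaded by hand and contains a single top-level directory.

```shell
# Sketch: unpack a manually downloaded mitlm tarball into <tools>/mitlm-svn,
# the directory name that local/voxforge_prepare_lm.sh expects.
# install_mitlm_tarball is a hypothetical helper, not part of Kaldi.
install_mitlm_tarball() (
  tarball=$1      # e.g. mitlm-0.4.1.tar.gz, downloaded by hand
  tools_dir=$2    # e.g. tools
  mkdir -p "$tools_dir" || exit 1
  # Refuse to overwrite an existing checkout.
  [ ! -e "$tools_dir/mitlm-svn" ] || exit 1
  # Top-level directory inside the tarball (assumed to be the only one).
  top=$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)
  [ -n "$top" ] || exit 1
  tar -xzf "$tarball" -C "$tools_dir" || exit 1
  # Rename to the name the recipe script looks for.
  mv "$tools_dir/$top" "$tools_dir/mitlm-svn"
)
```

From /u01/kaldi/egs/voxforge/s5 this would be called as `install_mitlm_tarball mitlm-0.4.1.tar.gz tools`. The svn checkout line in local/voxforge_prepare_lm.sh still has to be commented out by hand.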

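3. Edit getdata.sh and add the line DATA_ROOT=/u01/kaldi/egs/voxforge/s5/data, then run ./getdata.sh to download and extract the data. Because the download was slow and the machine is modest, I downloaded and extracted only about 100 MB of data.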

4. Edit run.sh and add the line DATA_ROOT=/u01/kaldi/egs/voxforge/s5/data there as well. Because the data set is so small, a few more settings have to be changed:
nspk_test=7
utils/subset_data_dir.sh data/train 15 data/train.1k  || exit 1;
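Since the subset size has to fit the amount of data actually downloaded, it can help to count the available utterances before choosing it. The following is a minimal sketch, assuming the standard Kaldi data directory layout in which utt2spk has one line per utterance; count_utts and subset_fits are hypothetical helper names.

```shell
# count_utts DATA_DIR: number of utterances (one line each in utt2spk).
count_utts() {
  wc -l < "$1/utt2spk" | tr -d ' '
}

# subset_fits DATA_DIR N: succeed only if N utterances can be drawn from DATA_DIR.
subset_fits() {
  [ "$2" -le "$(count_utts "$1")" ]
}
```

For example, `subset_fits data/train 15 || echo "subset too large"` could be run once data/train has been prepared.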

5. Run ./run.sh. The fan roared and CPU usage went straight to 100%; the run took about five hours. It got as far as "# Do MMI on top of LDA+MLLT."; the output was as follows:

=== Starting VoxForge subset selection(accent: ((American)|(British)|(Australia)|(Zealand))) ...
*** VoxForge subset selection finished!
=== Starting to map anonymous users to unique IDs ...
--- Mapping the "anonymous" speakers to unique IDs ...
ls: cannot access /u01/kaldi/egs/voxforge/s5/data/selected/anonymous-*-*: No such file or directory
*** Finished mapping anonymous users!
=== Starting initial VoxForge data preparation ...
--- Making test/train data split ...
17 data/local/tmp/speakers_all.txt
 10 data/local/tmp/speakers_train.txt
  7 data/local/tmp/speakers_test.txt
 17 total
--- Preparing test_wav.scp, test_trans.txt and test.utt2spk ...
--- Preparing test.spk2utt ...
--- Preparing train_wav.scp, train_trans.txt and train.utt2spk ...

......

steps/decode.sh --config conf/decode.config --iter 3 --nj 2 --cmd run.pl exp/tri2b/graph data/test exp/tri2b_mmi/decode_it3
decode.sh: feature type is lda
exp/tri2b_mmi/decode_it3/wer_10
%WER 97.59 [ 1657 / 1698, 29 ins, 649 del, 979 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_11
%WER 97.17 [ 1650 / 1698, 22 ins, 713 del, 915 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_12
%WER 96.76 [ 1643 / 1698, 15 ins, 787 del, 841 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_13
%WER 96.41 [ 1637 / 1698, 15 ins, 837 del, 785 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_14
%WER 96.64 [ 1641 / 1698, 11 ins, 888 del, 742 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_15
%WER 96.82 [ 1644 / 1698, 7 ins, 930 del, 707 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_16
%WER 97.06 [ 1648 / 1698, 7 ins, 967 del, 674 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_17
%WER 97.17 [ 1650 / 1698, 9 ins, 997 del, 644 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_18
%WER 97.17 [ 1650 / 1698, 9 ins, 1013 del, 628 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_19
%WER 97.41 [ 1654 / 1698, 9 ins, 1027 del, 618 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_20
%WER 97.17 [ 1650 / 1698, 9 ins, 1037 del, 604 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_9
%WER 98.00 [ 1664 / 1698, 35 ins, 582 del, 1047 sub ]
%SER 100.00 [ 180 / 180 ]
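To read a listing like the one above more quickly, the wer_* file with the lowest error rate can be picked out with a short script. The sketch below assumes each file contains a line of the form "%WER <rate> [ ... ]" as shown in the log; best_wer_file and wer_percent are hypothetical helper names, and paths are assumed not to contain spaces.

```shell
# best_wer_file DIR: print "<file> <WER>" for the wer_* file with the lowest %WER.
best_wer_file() (
  dir=$1
  for f in "$dir"/wer_*; do
    [ -f "$f" ] || continue
    # The second field of the "%WER 96.41 [ ... ]" line is the error rate.
    awk -v f="$f" '$1 == "%WER" { print f, $2 }' "$f"
  done | LC_ALL=C sort -k2,2n | head -n 1
)

# wer_percent INS DEL SUB REF: WER = 100 * (ins + del + sub) / reference words.
wer_percent() {
  LC_ALL=C awk -v i="$1" -v d="$2" -v s="$3" -v n="$4" \
    'BEGIN { printf "%.2f\n", 100 * (i + d + s) / n }'
}
```

Run against exp/tri2b_mmi/decode_it3, this should select wer_13 at 96.41 for the numbers listed above, and `wer_percent 15 837 785 1698` reproduces that figure from its error counts. Kaldi checkouts that ship utils/best_wer.sh can get a similar summary with `grep WER exp/tri2b_mmi/decode_it3/wer_* | utils/best_wer.sh`.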



6. Copy /u01/kaldi/egs/voxforge/s5/exp/tri2b/graph into the /u01/kaldi/egs/voxforge/s5/exp/tri2b_mmi directory, change into /u01/kaldi/egs/voxforge/s5/exp/tri2b_mmi, and run online decoding as follows:
/u01/kaldi/src/onlinebin/online-wav-gmm-decode-faster --rt-min=0.3 --rt-max=0.5 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 scp:../../data/test/wav_test.scp final.mdl graph/HCLG.fst graph/words.txt '1:2:3:4:5' ark,t:trans.txt ark,t:ali.txt final.mat

File: AT-20130718-lws-a0011
FROM EXPLAINED INCIDENTAL ACCIDENTAL AND FROM SHE


File: Aaron-20080318-pwn-a0265
DISGUSTED THE MANIFESTED THERE


File: Aaron-20080318-pwn-a0266
THERE WAS PASSIONATELY IT WAS THERE


File: AdrianMcNear-20091016-psv-a0573
IT IS GOING TO YOU MY WEEKS TO SUGGESTED PC THAT FOR SHUDDERED
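If the decoder output is captured to a file, the "File:" lines and the transcripts that follow them can be folded into one line per utterance. This is a minimal sketch that assumes exactly the output layout shown above (a "File: <id>" line followed by the transcript on the next non-empty line); decoder_log_to_trans is a hypothetical helper name.

```shell
# decoder_log_to_trans: read decoder output on stdin and print
# "<utterance-id> <transcript>" lines on stdout.
decoder_log_to_trans() {
  awk '
    /^File: / { id = $2; next }                    # remember the utterance ID
    id != "" && NF > 0 { print id, $0; id = "" }   # first non-empty line after it
  '
}
```

For example, `decoder_log_to_trans < decode.log > hyp.txt`, where decode.log is whatever file the decoder output was saved to.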


At this point the whole pipeline runs end to end.

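Conclusion: with only about 100 MB of speech files in total, training still took this long. The hardware is certainly part of the reason. But the full VoxForge corpus is around 20 GB, and I have no idea how long training on all of it would take. If anyone has run it to completion, please share the run time.
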
From "ITPUB博客" (ITPUB blog), link: http://blog.itpub.net/16582684/viewspace-1273286/. If you repost this, please credit the source; otherwise legal action will be pursued.
