Speech recognition with the Kaldi CSJ recipe

Language model training:

Make the lexicon and wordlist files

text2wfreq csj_futu_kata.txt csj_futu_kata.wfreq
cat csj_futu_kata.wfreq |sort -n -k 2 -r|grep -v "+ー" | grep -v "++" | grep -v "×" > csj_futu+wfreq.txt
cut -d " " -f1 csj_futu+wfreq.txt > csj_lexicon.txt
cat csj_lexicon.txt | grep -v "+ー" | grep -v "++" | grep -v "×" > lexicon.txt
sort -u lexicon.txt > lexicon_htk.txt
../../../local/csj_make_trans/vocab2dic.pl -p  ../../../local/csj_make_trans/kana2phone -e ERROR_v2d -o lexicon.txt lexicon_htk.txt
cut -d'+' -f1,3 lexicon.txt > lexicon_htk.txt
cut -f1,3- lexicon_htk.txt | perl -ape 's:\t: :g' > lexicon.txt
cat lexicon.txt | awk '{print $1}' > wordlist
  • PS: insert <sp> as the first line of wordlist and <unk> as the second line
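The manual edit in the note above can also be scripted. This is a minimal sketch, assuming wordlist is a plain one-word-per-line file; the lm_demo directory and its sample words are made up for illustration.

```shell
# Demo data (hypothetical): a tiny wordlist standing in for the real one.
mkdir -p lm_demo
printf '%s\n' 'あの+感動詞' 'これ+代名詞' 'ペン+名詞' > lm_demo/wordlist

# Put <sp> on line 1 and <unk> on line 2, followed by the original words.
{ printf '%s\n' '<sp>' '<unk>'; cat lm_demo/wordlist; } > lm_demo/wordlist.tmp
mv lm_demo/wordlist.tmp lm_demo/wordlist

head -n 3 lm_demo/wordlist
```

Writing to a temporary file first avoids truncating wordlist while it is still being read.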

Train the language model

ngram-count -text hira_hisi.txt -order 3 -limit-vocab -vocab wordlist -unk -map-unk "<unk>" -kndiscount -interpolate -lm csj.WBH.gz -prune 1.0e-8
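Since -vocab wordlist fixes the vocabulary, any token in the training text that is not in wordlist is out-of-vocabulary for the model. Before training, it can be useful to see how large that share is. The following is a rough sketch using only awk; the oov_demo directory, its file names, and its contents are illustrative and not part of the recipe.

```shell
# Demo data (hypothetical): a small vocabulary and two training sentences.
mkdir -p oov_demo
printf '%s\n' '<sp>' '<unk>' 'これ' 'は' 'ペン' > oov_demo/wordlist
printf '%s\n' 'これ は ペン' 'これ は ホン' > oov_demo/text.txt

# First file: load the vocabulary. Second file: count tokens not in it.
awk 'NR == FNR { vocab[$1] = 1; next }
     { for (i = 1; i <= NF; i++) { total++; if (!($i in vocab)) oov++ } }
     END { printf "tokens=%d oov=%d rate=%.2f%%\n", total, oov, (total ? 100 * oov / total : 0) }' \
    oov_demo/wordlist oov_demo/text.txt > oov_demo/oov_report.txt

cat oov_demo/oov_report.txt
```

On the real data the same awk command would take wordlist and hira_hisi.txt as its two arguments.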

Prepare the dict directory for building HCLG

mkdir dict_csj_CB    # holds lexicon.txt
mkdir lang_csj_CB    # holds the intermediate WFST files
mkdir ../lang_csj_CB    # holds the final WFST files; run from kaldi/egs/csj/s5/data/local
Create the dict files
cp ../csj_CB/lexicon.txt lexicon1.txt
cp ../csj_CB/lexicon.txt lexicon2.txt
Prepend the following two entries to lexicon2.txt:
<sp> sp
<unk> spn
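Instead of editing by hand, lexicon2.txt can be generated from lexicon1.txt in one step. This is a sketch under the assumption that both files start as identical copies, as in the cp commands above; the dict_demo directory and the two sample entries are hypothetical.

```shell
# Demo data (hypothetical): a two-entry lexicon standing in for lexicon1.txt.
mkdir -p dict_demo
printf '%s\n' 'あの+感動詞 a n o' 'これ+代名詞 k o r e' > dict_demo/lexicon1.txt

# Silence and unknown-word entries first, then every entry of lexicon1.txt.
(echo '<sp> sp'; echo '<unk> spn'; cat dict_demo/lexicon1.txt) > dict_demo/lexicon2.txt

head -n 2 dict_demo/lexicon2.txt
```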

Make nonsilence_phones.txt and the other phone files

cat lexicon1.txt |awk '{ for (n=2;n<=NF;n++){phones[$n] =1;}}END{for (p in phones) print p;}' |grep -v sp >nonsilence_phones.txt
(echo sp ; echo spn;) > silence_phones.txt
echo sp > optional_silence.txt
echo -n >extra_questions.txt
ln -sf lexicon2.txt lexicon.txt
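Before running prepare_lang.sh, it may be worth confirming that every phone used in lexicon.txt appears in one of the two phone lists. The following is a small sketch of such a check; the phone_demo directory and its contents are made up, and on real data the same awk command would run inside the dict directory.

```shell
# Demo data (hypothetical): a three-entry lexicon and matching phone lists.
mkdir -p phone_demo
printf '%s\n' '<sp> sp' '<unk> spn' 'あの+感動詞 a n o' > phone_demo/lexicon.txt
printf '%s\n' a n o > phone_demo/nonsilence_phones.txt
printf '%s\n' sp spn > phone_demo/silence_phones.txt

# Load both phone lists, then report any lexicon phone missing from them.
awk -v lex=phone_demo/lexicon.txt '
     FILENAME != lex { known[$1] = 1; next }
     { for (i = 2; i <= NF; i++) if (!($i in known)) missing[$i] = 1 }
     END { n = 0
           for (p in missing) { print "missing phone: " p; n++ }
           if (n == 0) print "all phones covered" }' \
    phone_demo/nonsilence_phones.txt phone_demo/silence_phones.txt phone_demo/lexicon.txt \
    > phone_demo/check.txt

cat phone_demo/check.txt
```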

Build HCLG

1. Build L.fst
  utils/prepare_lang.sh data/local/dict_csj_CB "<unk>" data/local/lang_csj_CB data/lang_csj_CB

2. Build G.fst
srilm_opts="-subset -prune-lowprobs -unk -tolower -order 3"
LM=data/local/csj_CB/csj.CB.gz
utils/format_lm_sri.sh --srilm-opts "$srilm_opts" data/lang_csj_CB $LM data/local/dict_csj_CB/lexicon.txt data/lang_csj_CB_3g

3. Compose HCLG
gmmdir=exp/tri4
graph_dir=$gmmdir/graph_csj_CB
utils/mkgraph.sh data/lang_csj_CB_3g $gmmdir $graph_dir

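Prepare the audio data for the acoustic model

1. First, record the speech to be recognized as 16-bit audio at a 16 kHz sampling rate.
2. Create the data files that describe the recordings: wav.scp, utt2spk, and spk2utt.
wav.scp maps each utterance ID to the path of its audio file
utt2spk maps each utterance ID to its speaker ID
spk2utt maps each speaker ID to the list of its utterance IDs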
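The three files can be derived from a directory of recordings. This is a minimal sketch, assuming the recordings are named <speaker>_<utterance>.wav (the same convention the "_" split in the commands that follow relies on); the wav_demo directory and the file names in it are hypothetical.

```shell
mkdir -p wav_demo/audio wav_demo/data
# Empty placeholder files; real ones would be 16-bit, 16 kHz recordings.
touch wav_demo/audio/spk1_001.wav wav_demo/audio/spk1_002.wav wav_demo/audio/spk2_001.wav

# wav.scp: "<utterance-id> <path>", sorted in the C locale as Kaldi expects.
for f in wav_demo/audio/*.wav; do
    echo "$(basename "$f" .wav) $f"
done | LC_ALL=C sort > wav_demo/data/wav.scp

# utt2spk: "<utterance-id> <speaker-id>"; the speaker is the part before the first "_".
awk '{ split($1, parts, "_"); print $1, parts[1] }' wav_demo/data/wav.scp > wav_demo/data/utt2spk

# spk2utt: "<speaker-id> <utterance-id> <utterance-id> ..."
awk '{ utts[$2] = utts[$2] " " $1 } END { for (s in utts) print s utts[s] }' \
    wav_demo/data/utt2spk | LC_ALL=C sort > wav_demo/data/spk2utt

cat wav_demo/data/spk2utt
```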

cat wav.scp | awk '{print $1}' | sed 's/\.wav$//' | awk -F "_" '{print $0,$1}' > utt2spk
cat wav.scp | awk '{print $1}' | awk -F "_" '{print $0,$1}' | awk '{if(pre!=$2){print""; printf "%s %s", $2, $1;}else{printf " %s", $1;} pre=$2;}' > hoge
cat hoge | awk 'NF' | sed 's/\.wav//g' > spk2utt
rm -f hoge

Alternatively, use the script that ships with the CSJ recipe:

utils/utt2spk_to_spk2utt.pl < data/speech/utt2spk > data/speech/spk2utt
  • Put wav.scp, utt2spk, spk2utt, and text in the same folder, then validate the whole data directory with the recipe's script:
    utils/fix_data_dir.sh data/SDPWS_CB

Extract MFCC features

MFCC extraction
mfccdir=mfcc
steps/make_mfcc.sh --nj 4 --cmd utils/run.pl data/SDPWS_CB exp/make_mfcc/SDPWS_CB $mfccdir
#Apply CMVN
steps/compute_cmvn_stats.sh data/SDPWS_CB exp/make_mfcc/SDPWS_CB $mfccdir
#Validate the data
utils/fix_data_dir.sh data/SDPWS_CB

fMLLR recognition

Speech recognition to estimate the fMLLR transforms
graph_dir=exp/tri4/graph_csj_CB
steps/decode_fmllr.sh --nj 4 --cmd utils/run.pl --config conf/decode.config $graph_dir data/SDPWS_CB exp/tri4/decode_csj_CB

Extract fMLLR features for DNN recognition

fMLLR feature extraction
gmmdir=exp/tri4
dir=data-fmllr-tri4/SDPWS_CB
steps/nnet/make_fmllr_feats.sh --nj 4 --cmd utils/run.pl --transform-dir exp/tri4/decode_csj_CB $dir data/SDPWS_CB $gmmdir $dir/log $dir/data

DNN recognition

DNN speech recognition
gmmdir=exp/tri4
dir=exp/dnn5b_pretrain-dbn_dnn_smbr_i1lats
acwt=0.0909
steps/nnet/decode.sh  --nj 4 --cmd utils/run.pl --config conf/decode_dnn.config --nnet $dir/2.nnet --acwt $acwt $gmmdir/graph_csj_CB data-fmllr-tri4/SDPWS_CB $dir/decode_SDPWS_CB

*PS: Recognition takes a long time to complete, so it is better to run it in the background:

chmod +x fmllr_Dnn.sh
nohup ./fmllr_Dnn.sh > out.log 2>error.log &
watch tail -20 out.log
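The post does not list fmllr_Dnn.sh itself. One plausible layout, assuming it simply chains the fMLLR decoding, fMLLR feature extraction, and DNN decoding commands from the sections above and is run from egs/csj/s5, is sketched below. The heredoc writes the script; the set -e line, the path.sh line, and the variable names data, fmllr_data, and dnndir are additions that do not appear in the original.

```shell
cat > fmllr_Dnn.sh <<'EOF'
#!/bin/bash
# Sketch only: chains the three decoding steps shown earlier.
set -e              # stop at the first failing step (assumption, not in the post)
. ./path.sh         # Kaldi recipe environment (assumed to exist in s5)

gmmdir=exp/tri4
graph_dir=$gmmdir/graph_csj_CB
data=data/SDPWS_CB
fmllr_data=data-fmllr-tri4/SDPWS_CB
dnndir=exp/dnn5b_pretrain-dbn_dnn_smbr_i1lats

# 1. GMM decoding to estimate the fMLLR transforms
steps/decode_fmllr.sh --nj 4 --cmd utils/run.pl --config conf/decode.config \
    $graph_dir $data $gmmdir/decode_csj_CB

# 2. Dump fMLLR features using those transforms
steps/nnet/make_fmllr_feats.sh --nj 4 --cmd utils/run.pl \
    --transform-dir $gmmdir/decode_csj_CB \
    $fmllr_data $data $gmmdir $fmllr_data/log $fmllr_data/data

# 3. DNN decoding on the fMLLR features
steps/nnet/decode.sh --nj 4 --cmd utils/run.pl --config conf/decode_dnn.config \
    --nnet $dnndir/2.nnet --acwt 0.0909 \
    $graph_dir $fmllr_data $dnndir/decode_SDPWS_CB
EOF
chmod +x fmllr_Dnn.sh

# Syntax check only; actually running it needs the trained models and data.
bash -n fmllr_Dnn.sh && echo "syntax OK"
```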