Training the language model:
Make the lexicon and wordlist files
text2wfreq < csj_futu_kata.txt > csj_futu_kata.wfreq
cat csj_futu_kata.wfreq |sort -n -k 2 -r|grep -v "+ー" | grep -v "++" | grep -v "×" > csj_futu+wfreq.txt
cut -d " " -f1 csj_futu+wfreq.txt > csj_lexicon.txt
cat csj_lexicon.txt | grep -v "+ー" | grep -v "++" | grep -v "×" > lexicon.txt
sort -u lexicon.txt > lexicon_htk.txt
../../../local/csj_make_trans/vocab2dic.pl -p ../../../local/csj_make_trans/kana2phone -e ERROR_v2d -o lexicon.txt lexicon_htk.txt
cut -d'+' -f1,3 lexicon.txt > lexicon_htk.txt
cut -f1,3- lexicon_htk.txt | perl -ape 's:\t: :g' > lexicon.txt
cat lexicon.txt | awk '{print $1}' > wordlist
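If text2wfreq (from the CMU-Cambridge SLM toolkit) is not installed, an equivalent word-frequency list can be built with standard Unix tools. A minimal sketch on a toy corpus (file names are illustrative):

```shell
# Toy corpus, one sentence per line (stand-in for csj_futu_kata.txt).
printf 'A B A\nC A B\n' > corpus_sample.txt

# One word per line -> count -> "word count", sorted by descending count.
tr ' ' '\n' < corpus_sample.txt | grep -v '^$' \
  | sort | uniq -c | sort -rn \
  | awk '{print $2, $1}' > sample.wfreq

cat sample.wfreq
# A 3
# B 2
# C 1
```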
- PS: add <sp> as the first line of wordlist and <unk> as the second line.
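The two special entries can be prepended without opening an editor; a sketch on a throwaway wordlist (file names are illustrative):

```shell
# Tiny sample wordlist (stand-in for the real file).
printf 'WORD1\nWORD2\n' > wordlist_sample

# Prepend <sp> as line 1 and <unk> as line 2.
{ printf '<sp>\n<unk>\n'; cat wordlist_sample; } > wordlist_sample.new
mv wordlist_sample.new wordlist_sample

head -2 wordlist_sample
# <sp>
# <unk>
```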
Train the language model
ngram-count -text hira_hisi.txt -order 3 -limit-vocab -vocab wordlist -unk -map-unk "<unk>" -kndiscount -interpolate -lm csj.WBH.gz -prune 1.0e-8
Prepare the dict for making HCLG
# Run these from data/local:
mkdir dict_csj_CB     # holds lexicon.txt
mkdir lang_csj_CB     # holds the intermediate WFST files
mkdir ../lang_csj_CB  # holds the final WFST files
Make the dict files
cp ../csj_CB/lexicon.txt lexicon1.txt
cp ../csj_CB/lexicon.txt lexicon2.txt
Add the following two lines at the top of lexicon2.txt:
<sp> sp
<unk> spn
make nonsilence_phones.txt and related files
cat lexicon1.txt |awk '{ for (n=2;n<=NF;n++){phones[$n] =1;}}END{for (p in phones) print p;}' |grep -v sp >nonsilence_phones.txt
(echo sp ; echo spn;) > silence_phones.txt
echo sp > optional_silence.txt
echo -n >extra_questions.txt
ln -sf lexicon2.txt lexicon.txt
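To see what the phone-set one-liner above does, here is the same awk program run on a two-word toy lexicon (entries are illustrative; the output is sorted for a stable order):

```shell
# Toy lexicon: "WORD phone phone ..." per line.
printf 'AME a m e\nKASA k a s a\n' > lexicon_sample.txt

# Collect every field from column 2 onward as a unique phone.
awk '{for (n=2; n<=NF; n++) phones[$n]=1} END {for (p in phones) print p}' \
    lexicon_sample.txt | sort > phones_sample.txt

cat phones_sample.txt
# a
# e
# k
# m
# s
```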
Start making HCLG
1. Building L.fst
utils/prepare_lang.sh data/local/dict_csj_CB "<unk>" data/local/lang_csj_CB data/lang_csj_CB
2. Building G.fst
srilm_opts="-subset -prune-lowprobs -unk -tolower -order 3"
LM=data/local/csj_CB/csj.WBH.gz
utils/format_lm_sri.sh --srilm-opts "$srilm_opts" data/lang_csj_CB $LM data/local/dict_csj_CB/lexicon.txt data/lang_csj_CB_3g
3. Composing HCLG
gmmdir=exp/tri4
graph_dir=$gmmdir/graph_csj_CB
utils/mkgraph.sh data/lang_csj_CB_3g $gmmdir $graph_dir
Prepare the audio data for recognition
1. First, record the audio you want to recognize: 16-bit, 16 kHz sampling rate.
2. Create the data files that describe the audio: wav.scp, utt2spk, spk2utt.
wav.scp  maps each recording ID to the path of its audio file
utt2spk  maps each utterance ID to its speaker ID
spk2utt  maps each speaker ID to the list of its utterance IDs
cat wav.scp | awk '{print $1}' | sed 's/\.wav//' | awk -F "_" '{print $0,$1}' > utt2spk
cat wav.scp | awk '{print $1}' | awk -F "_" '{print $0,$1}' | awk '{if(pre!=$2){print""; printf $2" "$1;}else{printf" "$1;} pre=$2;}' > hoge
cat hoge | awk 'NF' | sed 's/\.wav//g' > spk2utt
rm -f hoge
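As a sanity check, the same construction can be run on a toy wav.scp (recording IDs of the form <speaker>_<utterance>.wav are an assumption of this pipeline; file names below are illustrative):

```shell
# Toy wav.scp: "<recording-id> <path>".
printf 'spkA_utt1.wav /data/a1.wav\nspkA_utt2.wav /data/a2.wav\nspkB_utt1.wav /data/b1.wav\n' > wav_sample.scp

# utt2spk: strip ".wav", then pair each ID with its speaker prefix.
awk '{print $1}' wav_sample.scp | sed 's/\.wav//' \
  | awk -F "_" '{print $0,$1}' > utt2spk_sample

# spk2utt: group utterances of the same speaker onto one line.
awk '{print $1}' wav_sample.scp | awk -F "_" '{print $0,$1}' \
  | awk '{if(pre!=$2){print""; printf $2" "$1;}else{printf" "$1;} pre=$2;}' > hoge_sample
# The /g strips ".wav" from every utterance on the line, not only the first.
awk 'NF' hoge_sample | sed 's/\.wav//g' > spk2utt_sample
rm -f hoge_sample

cat utt2spk_sample
# spkA_utt1 spkA
# spkA_utt2 spkA
# spkB_utt1 spkB
cat spk2utt_sample
# spkA spkA_utt1 spkA_utt2
# spkB spkB_utt1
```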
You can also use the script from the csj recipe:
utils/utt2spk_to_spk2utt.pl data/speech/utt2spk > data/speech/spk2utt
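utils/utt2spk_to_spk2utt.pl simply inverts the utterance-to-speaker map; a minimal awk sketch of the same idea (file names are illustrative, output sorted for a stable order):

```shell
# Toy utt2spk: "<utterance-id> <speaker-id>" per line.
printf 'spkA_utt1 spkA\nspkA_utt2 spkA\nspkB_utt1 spkB\n' > utt2spk_demo

# Group utterance IDs under their speaker, one speaker per line.
awk '{utts[$2] = utts[$2] " " $1} END {for (s in utts) print s utts[s]}' \
    utt2spk_demo | sort > spk2utt_demo

cat spk2utt_demo
# spkA spkA_utt1 spkA_utt2
# spkB spkB_utt1
```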
- Please put wav.scp, utt2spk, spk2utt, and text in the same folder, then validate the data with the csj recipe:
utils/fix_data_dir.sh data/SDPWS_CB
Extract MFCC features
#MFCC extraction
mfccdir=mfcc
steps/make_mfcc.sh --nj 4 --cmd utils/run.pl data/SDPWS_CB exp/make_mfcc/SDPWS_CB $mfccdir
#Apply CMVN
steps/compute_cmvn_stats.sh data/SDPWS_CB exp/make_mfcc/SDPWS_CB $mfccdir
#Validate the data
utils/fix_data_dir.sh data/SDPWS_CB
fMLLR recognition
Decode with the GMM system to obtain the fMLLR speaker transforms
graph_dir=exp/tri4/graph_csj_CB
steps/decode_fmllr.sh --nj 4 --cmd utils/run.pl --config conf/decode.config $graph_dir data/SDPWS_CB exp/tri4/decode_csj_CB
Extract fMLLR features for DNN recognition
gmmdir=exp/tri4
dir=data-fmllr-tri4/SDPWS_CB
steps/nnet/make_fmllr_feats.sh --nj 4 --cmd utils/run.pl --transform-dir exp/tri4/decode_csj_CB $dir data/SDPWS_CB $gmmdir $dir/log $dir/data
DNN recognition
gmmdir=exp/tri4
dir=exp/dnn5b_pretrain-dbn_dnn_smbr_i1lats
acwt=0.0909
steps/nnet/decode.sh --nj 4 --cmd utils/run.pl --config conf/decode_dnn.config --nnet $dir/2.nnet --acwt $acwt $gmmdir/graph_csj_CB data-fmllr-tri4/SDPWS_CB $dir/decode_SDPWS_CB
*PS: Recognition takes a long time to complete, so it is better to run it in the background:
chmod +x fmllr_Dnn.sh
nohup ./fmllr_Dnn.sh > out.log 2>error.log &
watch tail -20 out.log