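Running AIShell v1 S5 in Kaldi Step by Step, Part 1: Before MONO
Acknowledgments
Thanks to AIShell for its exploration on the road to commercialization. Looking forward to the arrival of v3.
Machine Configuration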
sv@HP:~$ sudo lsb_release -a
Distributor ID: Ubuntu
Description: Ubuntu 18.04.1 LTS
Release: 18.04
Codename: bionic
sv@HP:~$ cat /proc/cpuinfo | grep model\ name
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
sv@HP:~$ cat /proc/meminfo | grep MemTotal
MemTotal: 16321360 kB
sv@HP:~$ lspci | grep 'VGA'
01:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1)
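The per-core listing above can be condensed when recording a machine configuration. The following is a minimal sketch, not part of Kaldi; `summarize_cpu` is a hypothetical helper name, and it assumes a cpuinfo-style file in which each logical CPU contributes one `model name` line.

```shell
# Hypothetical helper: print one line per distinct CPU model with its count.
# Usage: summarize_cpu [path-to-cpuinfo]   (defaults to /proc/cpuinfo)
summarize_cpu() {
  local cpuinfo="${1:-/proc/cpuinfo}"
  grep 'model name' "$cpuinfo" \
    | sed 's/^model name[[:space:]]*:[[:space:]]*//' \
    | sort | uniq -c \
    | awk '{ n = $1; $1 = ""; sub(/^ /, ""); print n " x " $0 }'
}
```

Called without arguments it reads /proc/cpuinfo. The count of logical CPUs is a reasonable upper bound to keep in mind when choosing how many parallel jobs to run in later steps.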
Detailed Output of AIShell v1 under Kaldi
Everything is captured here in one go.
Part 1: Data Preparation
sv@HP:~/lkaldi/egs/aishell/s5$ data=/home/sv/lkaldi/egs/aishell/s5/dat
sv@HP:~/lkaldi/egs/aishell/s5$ . ./cmd.sh
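The preparation commands in this part read three directories under $data: resource_aishell, data_aishell/wav and data_aishell/transcript. A minimal sketch of a pre-flight check, assuming exactly that layout; `check_corpus_dirs` is a hypothetical helper, not a Kaldi script.

```shell
# Hypothetical helper: fail early if an expected corpus directory is missing.
# $1: corpus root (the value of $data in this walkthrough)
check_corpus_dirs() {
  local root="$1" missing=0 d
  for d in resource_aishell data_aishell/wav data_aishell/transcript; do
    if [ ! -d "$root/$d" ]; then
      echo "missing: $root/$d" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A possible use is `check_corpus_dirs "$data" || exit 1;` before the first preparation script, so that a wrong path is reported once instead of surfacing as an error inside a script.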
sv@HP:~/lkaldi/egs/aishell/s5$ local/aishell_prepare_dict.sh $data/resource_aishell || exit 1;
local/aishell_prepare_dict.sh: AISHELL dict preparation succeeded
sv@HP:~/lkaldi/egs/aishell/s5$
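Once the dictionary directory exists, one of the properties that utils/prepare_lang.sh validates later, namely that the silence and non-silence phone lists share no phone, can be checked by hand. This is a sketch of that single check only, not a replacement for Kaldi's validation; `phones_disjoint` is a hypothetical helper and assumes whitespace-separated phones on each line.

```shell
# Hypothetical helper: report phones present in both lists; succeed only if none are.
# $1: silence phone list, $2: non-silence phone list
phones_disjoint() {
  awk '
    # First file: remember every silence phone.
    FILENAME == ARGV[1] { for (i = 1; i <= NF; i++) sil[$i] = 1; next }
    # Second file: flag any phone already seen as a silence phone.
    { for (i = 1; i <= NF; i++) if ($i in sil) { print "shared phone: " $i; bad = 1 } }
    END { exit bad + 0 }
  ' "$1" "$2"
}
```

For example: `phones_disjoint data/local/dict/silence_phones.txt data/local/dict/nonsilence_phones.txt && echo "disjoint"`.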
sv@HP:~/lkaldi/egs/aishell/s5$ # Data Preparation,
sv@HP:~/lkaldi/egs/aishell/s5$ local/aishell_data_prep.sh $data/data_aishell/wav $data/data_aishell/transcript || exit 1;
Preparing data/local/train transcriptions
Preparing data/local/dev transcriptions
Preparing data/local/test transcriptions
local/aishell_data_prep.sh: AISHELL data preparation succeeded
sv@HP:~/lkaldi/egs/aishell/s5$
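After data preparation it can be worth confirming that each split is self-consistent before moving on. The sketch below assumes the script wrote data/train, data/dev and data/test, each with a wav.scp and a text file keyed by utterance ID in the first column; that layout is an assumption here, and `same_utt_ids` is a hypothetical helper.

```shell
# Hypothetical helper: check that wav.scp and text in one data directory
# list exactly the same set of utterance IDs (first column of each file).
same_utt_ids() {
  local dir="$1" wav_ids text_ids
  wav_ids=$(awk '{ print $1 }' "$dir/wav.scp" | LC_ALL=C sort -u)
  text_ids=$(awk '{ print $1 }' "$dir/text" | LC_ALL=C sort -u)
  if [ "$wav_ids" = "$text_ids" ]; then
    echo "$dir: utterance IDs match"
  else
    echo "$dir: wav.scp and text disagree" >&2
    return 1
  fi
}
```

A possible use is `for x in train dev test; do same_utt_ids data/$x || exit 1; done`. Kaldi's own utils/validate_data_dir.sh performs a much fuller check; at this stage, before feature extraction, it would need its --no-feats option.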
sv@HP:~/lkaldi/egs/aishell/s5$ # Phone Sets, questions, L compilation
sv@HP:~/lkaldi/egs/aishell/s5$ utils/prepare_lang.sh --position-dependent-phones false data/local/dict \
> "<SPOKEN_NOISE>" data/local/lang data/lang || exit 1;
utils/prepare_lang.sh --position-dependent-phones false data/local/dict <SPOKEN_NOISE> data/local/lang data/lang
Checking data/local/dict/silence_phones.txt ...
--> reading data/local/dict/silence_phones.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict/silence_phones.txt is OK
Checking data/local/dict/optional_silence.txt ...
--> reading data/local/dict/optional_silence.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict/optional_silence.txt is OK
Checking data/local/dict/nonsilence_phones.txt ...
--> reading data/local/dict/nonsilence_phones.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict/nonsilence_phones.txt is OK
Checking disjoint: silence_phones.txt, nonsilence_phones.txt
--> disjoint property is OK.
Checking data/local/dict/lexicon.txt
--> reading data/local/dict/lexicon.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict/lexicon.txt is OK
Checking data/local/dict/extra_questions.txt ...
--> reading data/local/dict/extra_questions.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict/extra_questions.txt is OK
--> SUCCESS [validating dictionary directory data/local/dict]
**Creating data/local/dict/lexiconp.txt from data/local/dict/lexicon.txt
fstaddselfloops data/lang/phones/wdisambig_phones.int data/lang/phones/wdisambig_words.int
prepare_lang.sh: validating output directory
utils/validate_lang.pl data/lang
Checking data/lang/phones.txt ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/lang/phones.txt is OK
Checking words.txt: #0 ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/lang/words.txt is OK
Checking disjoint: silence.txt, nonsilence.txt, disambig.txt ...
--> silence.txt and nonsilence.txt are disjoint
--> silence.txt and disambig.txt are disjoint
--> disambig.txt and nonsilence.txt are disjoint
--> disjoint property is OK
Checking sumation: silence.txt, nonsilence.txt, disambig.txt ...
--> found no unexplainable phones in phones.txt
Checking data/lang/phones/context_indep.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII