1. Under kaldi/egs/thchs30, create a directory named thchs30-openslr and extract all of the downloaded archives into it. The resulting layout is:
thchs30-openslr
├── data_thchs30
├── resource
└── test-noise
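The extraction step can be sketched as a small shell helper. This is a minimal sketch, not part of the recipe: the function name extract_all is hypothetical, and it assumes the downloaded archives are gzip-compressed tarballs ending in .tgz that all sit in one directory.

```shell
# Hypothetical helper: extract every .tgz archive found in directory $1 into directory $2.
extract_all() {
  src=$1
  dest=$2
  mkdir -p "$dest" || return 1
  for archive in "$src"/*.tgz; do
    [ -e "$archive" ] || continue          # skip if the glob matched nothing
    tar -xzf "$archive" -C "$dest" || return 1
  done
}

# Example call (both paths are assumptions; adjust them to your setup):
# extract_all ~/Downloads kaldi/egs/thchs30/thchs30-openslr
```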
2. Open the s5 directory and edit cmd.sh so that all jobs run locally, as follows:
export train_cmd=run.pl
export decode_cmd=run.pl
export mkgraph_cmd=run.pl
export cuda_cmd=run.pl
3. Edit the run.sh script under s5. Two settings can be changed:
#n=4 #parallel jobs: change the number of parallel jobs; choose it based on the number of CPUs
n=2 #parallel jobs
#thchs=/nfs/public/materials/data/thchs30-openslr #change the data path to your own path
thchs=/home/kaldi/egs/thchs30/thchs30-openslr
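If you would rather not hard-code the job count, one option is to derive it from the machine. This is a sketch under the assumption that nproc (GNU coreutils) is available; the fallback value of 2 simply mirrors the setting above.

```shell
# Pick the number of parallel jobs from the CPU count.
# nproc is part of GNU coreutils; fall back to 2 if it is unavailable.
n=$(nproc 2>/dev/null || echo 2)
echo "n=$n"
```

Using fewer jobs than CPUs may still be preferable if memory is limited.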
Then run ./run.sh. It fails with a lexicon.txt error:
Checking data/dict/lexicon.txt
--> reading data/dict/lexicon.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> ERROR: phone "file" is not in {, non}silence.txt (line 2)
--> ERROR: phone "(standard" is not in {, non}silence.txt (line 2)
--> ERROR: phone "input)" is not in {, non}silence.txt (line 2)
--> ERROR: phone "matches" is not in {, non}silence.txt (line 2)
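This is caused by the grep command: grep treats its input as binary and writes the message "Binary file (standard input) matches" into lexicon.txt instead of the lexicon entries, so the words of that message are then read as phones. Open run.sh and find: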
grep -v '<s>' | grep -v '</s>' | sort -u > data/dict/lexicon.txt || exit 1;
Change it to the following (the -a option makes grep process the input as text):
grep -v -a '<s>' | grep -v -a '</s>' | sort -u > data/dict/lexicon.txt || exit 1;
With this change, the script runs OK.
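The effect of -a can be reproduced outside of Kaldi with a small test file. The sketch below is illustrative only: the sample entries are made up, and a NUL byte is used as the trigger because it reliably makes GNU grep classify the input as binary. The exact wording of the binary-match message, and whether it goes to standard output or standard error, depends on the grep version.

```shell
# Build a small sample lexicon containing a NUL byte, which grep treats as binary data.
sample=$(mktemp)
printf '<s> sil\nfoo f o o\nbar\000 b a r\n</s> sil\n' > "$sample"

# Without -a: GNU grep may suppress the selected lines and report a binary match instead.
grep -v '<s>' "$sample" | grep -v '</s>' || true

# With -a: the input is processed as text, so the lexicon entries pass through.
# tr strips the NUL byte only so the result can be stored in a shell variable.
filtered=$(grep -v -a '<s>' "$sample" | grep -v -a '</s>' | sort -u | tr -d '\000')
printf '%s\n' "$filtered"
rm -f "$sample"
```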