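The yesno model
Commonly used Kaldi tools: http://blog.csdn.net/zjm750617105/article/details/52548798
Complete tool list on the official Kaldi site: http://kaldi-asr.org/doc/tools.html
Kaldi scripts for yesno isolated-word recognition: http://www.cnblogs.com/welen/p/7485151.html
Everything starts from the entry-point script run.sh.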
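# Data processing stage
I. Preprocessing the training and test data
Run local/prepare_data.sh waves_yesno
1. Save the names of all files in the waves_yesno directory to waves_all.list:
    ls -1 $waves_dir > data/local/waves_all.list
2. The Perl script create_yesno_waves_test_train.pl splits the sample set in half: the file names of one half become the training list in data/local/waves.train, and those of the other half become the recognition test list in data/local/waves.test. (Of the 60 recordings, about 30 end up on each side; the test-side spk2utt listing in step 8.5 has 29 utterances.)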
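The split in step 2 can be sketched in Python. This is an illustration, not the Perl script itself: the function name `split_half` is hypothetical, and the sketch assumes that the first half of the list goes to training and the remainder to testing.

```python
def split_half(names):
    """Split a list of file names into (train, test).

    Assumption: the first half of the list is used for training and
    the rest for testing.
    """
    half = len(names) // 2
    return names[:half], names[half:]


# A four-file toy list standing in for waves_all.list.
all_names = [
    "0_0_0_0_1_1_1_1.wav",
    "0_0_0_1_0_0_0_1.wav",
    "1_0_0_0_0_0_0_0.wav",
    "1_0_0_0_0_0_0_1.wav",
]
train, test = split_half(all_names)
print("train:", train)
print("test:", test)
```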
3. The script create_yesno_wav_scp.pl turns waves.test into pairs of utterance ID and wav path and writes them to data/local/test_yesno_wav.scp. Format:
    1_0_0_0_0_0_0_0 waves_yesno/1_0_0_0_0_0_0_0.wav
    1_0_0_0_0_0_0_1 waves_yesno/1_0_0_0_0_0_0_1.wav
    ...
4. The same script turns waves.train into data/local/train_yesno_wav.scp. Format:
    0_0_0_0_1_1_1_1 waves_yesno/0_0_0_0_1_1_1_1.wav
    0_0_0_1_0_0_0_1 waves_yesno/0_0_0_1_0_0_0_1.wav
    ...
5. The script create_yesno_txt.pl writes the transcripts for waves.test to data/local/test_yesno.txt. Format:
    1_0_0_0_0_0_0_0 YES NO NO NO NO NO NO NO
    1_0_0_0_0_0_0_1 YES NO NO NO NO NO NO YES
    ...
6. The same script writes the transcripts for waves.train to data/local/train_yesno.txt. Format:
    0_0_0_0_1_1_1_1 NO NO NO NO YES YES YES YES
    0_0_0_1_0_0_0_1 NO NO NO YES NO NO NO YES
    ...
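Steps 3 to 6 derive everything from the file name: the name without its extension is the utterance ID, and each digit in it becomes YES (1) or NO (0). A minimal Python sketch of that mapping follows; the function names are hypothetical and the code only mirrors the output formats shown above, not the Perl sources.

```python
def utt_id_of(file_name):
    """Strip the .wav extension: '1_0_0_0_0_0_0_0.wav' -> '1_0_0_0_0_0_0_0'."""
    return file_name[:-4] if file_name.endswith(".wav") else file_name


def wav_scp_line(file_name, waves_dir="waves_yesno"):
    """Build one wav.scp line: '<utterance-id> <path-to-wav>'."""
    return utt_id_of(file_name) + " " + waves_dir + "/" + file_name


def text_line(file_name):
    """Build one transcript line: '<utterance-id> <YES/NO words>'."""
    utt_id = utt_id_of(file_name)
    words = ["YES" if digit == "1" else "NO" for digit in utt_id.split("_")]
    return utt_id + " " + " ".join(words)


print(wav_scp_line("1_0_0_0_0_0_0_0.wav"))
print(text_line("0_0_0_0_1_1_1_1.wav"))
```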
7. A file lm_tg.arpa is created in the data/local directory, with this content:
    \data\
    ngram 1=4
    \1-grams:
    -1 NO
    -1 YES
    -99 <s>
    -1 </s>
    \end\
8. Steps copied from the WSJ example
8.1. Create the directories data/train_yesno and data/test_yesno.
8.2. Copy data/local/test_yesno_wav.scp to data/test_yesno/wav.scp, and data/local/train_yesno_wav.scp to data/train_yesno/wav.scp.
8.3. Copy data/local/train_yesno.txt to data/train_yesno/text, and data/local/test_yesno.txt to data/test_yesno/text.
8.4. Use awk to process each text file and write data/train_yesno/utt2spk and data/test_yesno/utt2spk. utt2spk maps each utterance to its speaker; its counterpart spk2utt (next step) maps each speaker to the IDs of that speaker's utterances. Because all recordings come from a single speaker, "global" is used as the speaker ID throughout. Format:
    1_0_0_0_0_0_0_0 global
    1_0_0_0_0_0_0_1 global
    1_0_0_0_0_0_1_1 global
    ...
8.5. The script utils/utt2spk_to_spk2utt.pl converts utt2spk into spk2utt. Format:
    global 1_0_0_0_0_0_0_0 1_0_0_0_0_0_0_1 1_0_0_0_0_0_1_1 1_0_0_0_1_0_0_1 1_0_0_1_0_1_1_1 1_0_1_0_1_0_0_1 1_0_1_1_0_1_1_1 1_0_1_1_1_0_1_0 1_0_1_1_1_1_0_1 1_1_0_0_0_0_0_1 1_1_0_0_0_1_1_1 1_1_0_0_1_0_1_0 1_1_0_0_1_0_1_1 1_1_0_0_1_1_1_0 1_1_0_1_0_1_0_0 1_1_0_1_0_1_1_0 1_1_0_1_1_0_0_1 1_1_0_1_1_0_1_1 1_1_0_1_1_1_1_0 1_1_1_0_0_0_0_1 1_1_1_0_0_1_0_1 1_1_1_0_0_1_1_1 1_1_1_0_1_0_1_0 1_1_1_0_1_0_1_1 1_1_1_1_0_0_1_0 1_1_1_1_0_1_0_0 1_1_1_1_1_0_0_0 1_1_1_1_1_1_0_0 1_1_1_1_1_1_1_1
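The relationship between the text file, utt2spk and spk2utt can be sketched in Python as below. The function names are hypothetical and this is not the code of utils/utt2spk_to_spk2utt.pl; it only reproduces the two formats shown above, assuming the fixed speaker ID "global".

```python
def make_utt2spk(text_lines, speaker="global"):
    """Keep the first field (utterance ID) of each transcript line and
    pair it with a fixed speaker ID."""
    return [line.split()[0] + " " + speaker for line in text_lines if line.strip()]


def utt2spk_to_spk2utt(utt2spk_lines):
    """Invert 'utt spk' lines into one 'spk utt1 utt2 ...' line per speaker."""
    spk2utt = {}
    for line in utt2spk_lines:
        utt, spk = line.split()
        spk2utt.setdefault(spk, []).append(utt)
    return [spk + " " + " ".join(utts) for spk, utts in spk2utt.items()]


text = [
    "1_0_0_0_0_0_0_0 YES NO NO NO NO NO NO NO",
    "1_0_0_0_0_0_0_1 YES NO NO NO NO NO NO YES",
]
utt2spk = make_utt2spk(text)
print(utt2spk)
print(utt2spk_to_spk2utt(utt2spk))
```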
The directory structure at this point:
data
├───local
│   ├───waves.train
│   ├───waves.test
│   ├───test_yesno_wav.scp
│   ├───train_yesno_wav.scp
│   ├───train_yesno.txt
│   ├───test_yesno.txt
│   ├───lm_tg.arpa
│   └───waves_all.list
├───train_yesno
│   ├───text
│   ├───utt2spk
│   ├───spk2utt
│   └───wav.scp
└───test_yesno
    ├───text
    ├───utt2spk
    ├───spk2utt
    └───wav.scp
II. Preprocessing the dictionary
Run local/prepare_dict.sh
1. Create the dictionary directory data/local/dict and copy two files: input/lexicon_nosil.txt to data/local/dict/lexicon_words.txt, and input/lexicon.txt to data/local/dict/lexicon.txt.
Contents of lexicon_words.txt:
    YES Y
    NO N
Contents of lexicon.txt:
    <SIL> SIL
    YES Y
    NO N
2. Use an inverted match (grep -v) to drop the lines containing SIL and store the rest in nonsilence_phones.txt:
    cat input/phones.txt | grep -v SIL > data/local/dict/nonsilence_phones.txt
Contents of nonsilence_phones.txt:
    Y
    N
3. Contents of data/local/dict/silence_phones.txt and data/local/dict/optional_silence.txt:
    SIL
The directory structure at this point:
data
├───local
│   └───dict
│       ├───lexicon_words.txt
│       ├───lexicon.txt
│       ├───nonsilence_phones.txt
│       ├───silence_phones.txt
│       └───optional_silence.txt
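III. Run the command
    utils/prepare_lang.sh --position-dependent-phones false data/local/dict "<SIL>" data/local/lang data/lang
1. The script first sources this helper to process the options passed in:
    . utils/parse_options.sh
1.1 The option name --position-dependent-phones is converted to the variable name position_dependent_phones; later code then assigns the next argument, false, to that variable.
    name=`echo "$1" | sed s/^--// | sed s/-/_/g`
1.2 Finally the two consumed arguments are shifted off, so the argument list becomes:
    utils/prepare_lang.sh data/local/dict "<SIL>" data/local/lang data/lang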
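The option handling in steps 1.1 and 1.2 can be sketched in Python. This is a deliberately simplified model with a hypothetical function name: it only covers the "--name value" form described above, and the default value "true" used in the example is an assumption for illustration. The real parse_options.sh handles more cases than shown here.

```python
def parse_options(argv, defaults):
    """Consume leading '--name value' pairs.

    Returns (options, remaining positional arguments).
    """
    options = dict(defaults)
    args = list(argv)
    while len(args) >= 2 and args[0].startswith("--"):
        # Same effect as: sed s/^--// | sed s/-/_/g
        name = args[0][2:].replace("-", "_")
        if name not in options:
            raise ValueError("unknown option: " + args[0])
        options[name] = args[1]
        args = args[2:]  # the equivalent of shifting two arguments
    return options, args


opts, rest = parse_options(
    ["--position-dependent-phones", "false",
     "data/local/dict", "<SIL>", "data/local/lang", "data/lang"],
    {"position_dependent_phones": "true"},  # assumed default, for illustration
)
print(opts)
print(rest)
```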
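2. Four variables, named to make the code easier to read:
    srcdir=$1    # data/local/dict
    oov_word=$2  # <SIL>
    tmpdir=$3    # data/local/lang
    dir=$4       # data/lang
3. Source path.sh in the current shell (no new shell is started) to set up the environment:
    . ./path.sh
This sets the environment variables KALDI_ROOT and PATH and runs the script
    kaldi/tools/env.sh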
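4. Run this command to check that the dictionary files are well-formed:
    utils/validate_dict_dir.pl $srcdir
It checks the format of silence_phones.txt, optional_silence.txt, nonsilence_phones.txt and related files: there must be no stray \r or \n characters, the file must not be empty, phone names must not end in _B, _E, _S or _I (suffixes that are easily confused with word-position markers), and no entry may be repeated.
It checks that silence_phones.txt and nonsilence_phones.txt have no phone in common.
The check_lexicon_pair function checks that lexicon.txt and lexiconp.txt are consistent as a pair.
Because data/local/dict/extra_questions.txt does not exist, the script prints "--> data/local/dict/extra_questions.txt is empty (this is OK)\n".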
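To make the checks above concrete, here is a small Python sketch of the rules named in the text. It is not the logic of validate_dict_dir.pl itself, which does considerably more; the function names and message strings are hypothetical.

```python
RESERVED_SUFFIXES = ("_B", "_E", "_S", "_I")


def check_phone_list(phones):
    """Return a list of problems found in one phone list (empty list means OK)."""
    problems = []
    if not phones:
        problems.append("file is empty")
    seen = set()
    for phone in phones:
        if "\r" in phone or "\n" in phone:
            problems.append("stray line-ending character in: " + repr(phone))
        if phone.endswith(RESERVED_SUFFIXES):
            problems.append("phone ends in a reserved suffix: " + phone)
        if phone in seen:
            problems.append("duplicate phone: " + phone)
        seen.add(phone)
    return problems


def overlap(silence, nonsilence):
    """Phones that appear in both lists; the two lists should be disjoint."""
    return sorted(set(silence) & set(nonsilence))


print(check_phone_list(["SIL"]))
print(check_phone_list(["Y", "N"]))
print(overlap(["SIL"], ["Y", "N"]))
```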
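5. Test whether $srcdir/lexicon.txt is a regular file; only if it is not does the script run this command:
    perl -ape 's/(\S+\s+)\S+\s+(.+)/$1$2/;' < $srcdir/lexiconp.txt > $srcdir/lexicon.txt || exit 1;
The switches -ape are short for -a -p -e, and the rest is a regex substitution: $1 stands for the first parenthesized group and $2 for the second, \S+ matches one or more non-whitespace characters, \s+ one or more whitespace characters, and .+ one or more of any character. The effect is to drop the second field (the pronunciation probability) from each line of lexiconp.txt.
(Note: here lexicon.txt is a regular file, so the command above is not executed.)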
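The same substitution can be tried out with Python's re module, which uses \1 and \2 where Perl uses $1 and $2. This is only an equivalent sketch for experimenting with the pattern; the function name is hypothetical.

```python
import re

# Group 1: first field plus the whitespace after it.
# Then the second field and its trailing whitespace (dropped).
# Group 2: everything that remains on the line.
PATTERN = re.compile(r"(\S+\s+)\S+\s+(.+)")


def drop_second_field(line):
    """Turn a lexiconp.txt line into a lexicon.txt line."""
    return PATTERN.sub(r"\1\2", line, count=1)


for entry in ["<SIL> 1.0 SIL", "YES 1.0 Y", "NO 1.0 N"]:
    print(drop_second_field(entry))
```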
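6. This command copies the file:
    cp $srcdir/lexiconp.txt $tmpdir/
Contents of lexiconp.txt:
    <SIL> 1.0 SIL
    YES 1.0 Y
    NO 1.0 N
This command reads the two phone files and merges them into the file phones, one phone per line:
    cat $srcdir/silence_phones.txt $srcdir/nonsilence_phones.txt | \
      awk '{for(n=1;n<=NF;n++) print $n; }' > $tmpdir/phones
Contents of data/local/lang/phones:
    SIL
    Y
    N
This command joins two files column by column into a new file (here the same file twice, so every phone maps to itself):
    paste -d' ' $tmpdir/phones $tmpdir/phones > $tmpdir/phone_map.txt
Contents of phone_map.txt:
    SIL SIL
    Y Y
    N N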
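The awk loop and the paste call above amount to two small list operations. A Python sketch, with hypothetical function names, assuming both inputs to paste have the same number of lines (as they do here):

```python
def one_per_line(lines):
    """Like awk '{for(n=1;n<=NF;n++) print $n;}':
    emit every whitespace-separated field on its own line."""
    return [field for line in lines for field in line.split()]


def paste_with_space(left, right):
    """Like paste -d' ' left right, for inputs of equal length."""
    return [a + " " + b for a, b in zip(left, right)]


silence_phones = ["SIL"]
nonsilence_phones = ["Y", "N"]

phones = one_per_line(silence_phones + nonsilence_phones)
phone_map = paste_with_space(phones, phones)
print(phones)
print(phone_map)
```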
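Create the directory data/lang/phones, which holds a collection of phone sets:
    mkdir -p $dir/phones
From the official documentation: the phones directory contains information about many different phone sets. Each file comes in three versions, with the extensions .csl, .int and .txt, which hold the same information in three formats. These files can be created with the script "utils/prepare_lang.sh".
In the next command, apply_map.pl reads phone_map.txt, whose lines each hold two fields, and stores them as key/value pairs in a hash. It then reads the data from $srcdir/{,non}silence_phones.txt, uses each item as a key to look up its value in that hash, and the result is written to sets.txt. The .int files generated later are the integer form of these phone sets.
    cat $srcdir/{,non}silence_phones.txt | utils/apply_map.pl $tmpdir/phone_map.txt > $dir/phones/sets.txt
Different silence phones will have different GMMs. [Note: here, all "shared split" means is that we may have one GMM for all the states, or we can split on states. Because they are context-independent phones, they do not see the context.] (Source: comment in prepare_lang.sh)
Contents of sets.txt:
    SIL
    Y
    N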
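The hash lookup described for apply_map.pl can be sketched in a few lines of Python. This is a simplified model with hypothetical function names: it treats the first field of each map line as the key and the rest as the value, and it raises an error for an unmapped key, which is an assumption about how a missing entry should be handled.

```python
def load_map(map_lines):
    """Read 'key value...' lines into a dict (first field is the key)."""
    mapping = {}
    for line in map_lines:
        fields = line.split()
        if fields:
            mapping[fields[0]] = " ".join(fields[1:])
    return mapping


def apply_map(mapping, lines):
    """Replace every field of every input line by its mapped value.

    Raises KeyError for a field that is not in the map (an assumption
    made for this sketch).
    """
    return [" ".join(mapping[field] for field in line.split()) for line in lines]


phone_map = load_map(["SIL SIL", "Y Y", "N N"])
sets = apply_map(phone_map, ["SIL", "Y", "N"])
print(sets)
```

Because phone_map.txt maps each phone to itself in this setup, the output equals the input; the mapping only changes something when position-dependent phones are enabled.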
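The next command builds roots.txt by prefixing every line of sets.txt with "shared split". (Forcing all silence phones to share the same pdfs is what the script's other branch does, when --share-silence-phones is true; that branch is not taken here.)
    cat $dir/phones/sets.txt | awk '{print "shared", "split", $0;}' > $dir/phones/roots.txt
Contents of roots.txt:
    shared split SIL
    shared split Y
    shared split N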
7. In the command below, | is a pipe. utils/apply_map.pl receives $tmpdir/phone_map.txt as its argument and reads the contents of $srcdir/silence_phones.txt from standard input (<STDIN>); its output is piped to awk, which writes one phone per line to the output file. Overall, the command outputs the phones of silence_phones.txt as mapped through phone_map.txt.
    cat $srcdir/silence_phones.txt | utils/apply_map.pl $tmpdir/phone_map.txt | \
      awk '{for(n=1;n<=NF;n++) print $n;}' > $dir/phones/silence.txt
Contents of silence.txt:
    SIL
8. This command generates nonsilence.txt:
    cat $srcdir/nonsilence_phones.txt | utils/apply_map.pl $tmpdir/phone_map.txt | \
      awk '{for(n=1;n<=NF;n++) print $n;}' > $dir/phones/nonsilence.txt
Contents of nonsilence.txt:
    Y
    N
The next two commands then copy files into place:
    cp $srcdir/optional_silence.txt $dir/phones/optional_silence.txt
    cp $dir/phones/silence.txt $dir/phones/context_indep.txt
Contents of optional_silence.txt:
    SIL
Contents of context_indep.txt:
    SIL
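9. The next command generates data/lang/phones.txt. It puts <eps> first, appends the silence, nonsilence and disambiguation symbols, and numbers the lines starting from 0:
    echo "<eps>" | cat - $dir/phones/{silence,nonsilence,disambig}.txt | \
      awk '{n=NR-1; print $1, n;}' > $dir/phones.txt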
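The numbering done by the awk program can be sketched as follows. The function name is hypothetical, and the contents of disambig.txt are not shown in these notes, so the two disambiguation symbols below are placeholders chosen purely for illustration.

```python
def number_symbols(symbols):
    """Like awk '{n=NR-1; print $1, n;}':
    pair each symbol with its zero-based line index."""
    return [symbol + " " + str(index) for index, symbol in enumerate(symbols)]


silence = ["SIL"]
nonsilence = ["Y", "N"]
disambig = ["#0", "#1"]  # placeholder values, assumed for this example only

phones_txt = number_symbols(["<eps>"] + silence + nonsilence + disambig)
for line in phones_txt:
    print(line)
```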
The code below takes the first field of every line of lexiconp.txt, sorts the values and removes duplicates, adds a few extra symbols, numbers everything, and writes the result to words.txt; if it fails, the script exits.
    cat $tmpdir/lexiconp.txt | awk '{print $1}' | sort | uniq | awk '
      BEGIN {
        print "<eps> 0";
      }
      {
        if ($1 == "<s>") {
          print "<s> is in the vocabulary!" | "cat 1>&2"
          exit 1;
        }
        if ($1 == "</s>") {
          print "</s> is in the vocabulary!" | "cat 1>&2"
          exit 1;
        }
        printf("%s %d\n", $1, NR);
      }
      END {
        printf("#0 %d\n", NR+1);
        printf("<s> %d\n", NR+2);
        printf("</s> %d\n", NR+3);
      }' > $dir/words.txt || exit 1;
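For readers less used to awk, the same numbering scheme can be sketched in Python. The function name is hypothetical. One assumption to keep in mind: Python's sorted() orders strings by code point, which matches sort only under a C-style locale; other locales may order the words differently.

```python
def make_words_table(lexiconp_lines):
    """Build the symbol table: <eps> first, then the sorted unique words
    numbered from 1, then #0, <s> and </s> with the next three numbers."""
    words = sorted({line.split()[0] for line in lexiconp_lines if line.strip()})
    for reserved in ("<s>", "</s>"):
        if reserved in words:
            raise ValueError(reserved + " is in the vocabulary!")
    table = ["<eps> 0"]
    table += [word + " " + str(index) for index, word in enumerate(words, start=1)]
    count = len(words)  # plays the role of NR in the awk END block
    table += ["#0 " + str(count + 1),
              "<s> " + str(count + 2),
              "</s> " + str(count + 3)]
    return table


lexiconp = ["<SIL> 1.0 SIL", "YES 1.0 Y", "NO 1.0 N"]
for line in make_words_table(lexiconp):
    print(line)
```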
Contents of lexiconp.txt:
    <SIL> 1.0 SIL
    YES 1.0 Y
    NO 1.0 N
Contents of words.txt:
<eps> 0 <SIL> 1</ |