Getting started with Kaldi speech recognition through the yesno model

The yesno model

 

Commonly used Kaldi tools: http://blog.csdn.net/zjm750617105/article/details/52548798

Full tool list on the official Kaldi site: http://kaldi-asr.org/doc/tools.html

Kaldi scripts for yesno isolated-word recognition: http://www.cnblogs.com/welen/p/7485151.html

Run the entry-point script run.sh

 

# Data preparation stage

 

 

Training and test data preprocessing stage

Run local/prepare_data.sh waves_yesno

 

1. Save the names of all files in the waves_yesno directory to waves_all.list:

ls -1 $waves_dir > data/local/waves_all.list

 

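2. The Perl script create_yesno_waves_test_train.pl splits the sample set in half: one half (about 30 files) becomes the training file list data/local/waves.train, and the other half (about 30 files; the test spk2utt shown in step 8.5 lists 29 utterances) becomes the recognition test file list data/local/waves.test.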
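The split can be sketched in plain shell. This is only an illustration, not the Perl script itself: it assumes the first half of the list (rounded down) goes to training and the remainder to testing, and the helper name split_half and the short file names are hypothetical.

```shell
# split_half LIST TRAIN TEST
# Hypothetical helper: the first half of LIST (rounded down) goes to TRAIN,
# the remaining lines go to TEST.
split_half() {
  total=$(wc -l < "$1")
  half=$((total / 2))
  head -n "$half" "$1" > "$2"
  tail -n +"$((half + 1))" "$1" > "$3"
}

work=$(mktemp -d)
printf '%s\n' 0_0_1.wav 0_1_1.wav 1_0_0.wav 1_1_0.wav > "$work/waves_all.list"
split_half "$work/waves_all.list" "$work/waves.train" "$work/waves.test"
cat "$work/waves.train"
```

With an odd number of lines, this sketch puts the extra file in the test list.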

 

3. The create_yesno_wav_scp.pl script turns each entry of waves.test into an utterance ID followed by the wav path and stores the result in data/local/test_yesno_wav.scp. Format:

1_0_0_0_0_0_0_0 waves_yesno/1_0_0_0_0_0_0_0.wav

1_0_0_0_0_0_0_1 waves_yesno/1_0_0_0_0_0_0_1.wav

..

 

4. The create_yesno_wav_scp.pl script does the same for waves.train and stores the result in data/local/train_yesno_wav.scp. Format:

0_0_0_0_1_1_1_1 waves_yesno/0_0_0_0_1_1_1_1.wav

0_0_0_1_0_0_0_1 waves_yesno/0_0_0_1_0_0_0_1.wav
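As an illustration of the wav.scp format, the mapping from file name to "utterance-ID path" can be sketched in shell. The function name make_wav_scp is hypothetical and this is not the Perl script used by the recipe; it assumes every input line is a file name ending in .wav.

```shell
# make_wav_scp DIR
# Reads wav file names on stdin and prints "<utt-id> <DIR>/<file>",
# where the utterance ID is the file name without its .wav suffix.
make_wav_scp() {
  while IFS= read -r f; do
    printf '%s %s/%s\n' "${f%.wav}" "$1" "$f"
  done
}

printf '%s\n' 1_0_0_0_0_0_0_0.wav 1_0_0_0_0_0_0_1.wav | make_wav_scp waves_yesno
```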

 

 

5. The create_yesno_txt.pl script writes the transcript of each entry of waves.test to data/local/test_yesno.txt. Format:

1_0_0_0_0_0_0_0 YES NO NO NO NO NO NO NO

1_0_0_0_0_0_0_1 YES NO NO NO NO NO NO YES

 

6. The create_yesno_txt.pl script writes the transcript of each entry of waves.train to data/local/train_yesno.txt. Format:

0_0_0_0_1_1_1_1 NO NO NO NO YES YES YES YES

0_0_0_1_0_0_0_1 NO NO NO YES NO NO NO YES
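The transcripts follow directly from the file names: each 0 stands for NO and each 1 for YES. A minimal shell sketch of that mapping (the function name make_text is hypothetical, and this is not the Perl script itself):

```shell
# make_text
# Reads wav file names on stdin and prints "<utt-id> <words>",
# translating each 0 in the ID to NO and each 1 to YES.
make_text() {
  while IFS= read -r f; do
    id=${f%.wav}
    words=$(printf '%s\n' "$id" | sed 's/_/ /g; s/0/NO/g; s/1/YES/g')
    printf '%s %s\n' "$id" "$words"
  done
}

printf '%s\n' 1_0_0_0_0_0_0_1.wav 0_0_0_0_1_1_1_1.wav | make_text
```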

 

7. Create the file lm_tg.arpa in the data/local directory with the following content:

\data\

ngram 1=4

 

\1-grams:

-1 NO

-1 YES

-99 <s>

-1 </s>

 

\end\

 

8. Stage copied from the WSJ example

8.1. Create the directories data/train_yesno and data/test_yesno.

8.2. Copy data/local/test_yesno_wav.scp to data/test_yesno/wav.scp, and data/local/train_yesno_wav.scp to data/train_yesno/wav.scp.

8.3. Copy data/local/train_yesno.txt to data/train_yesno/text, and data/local/test_yesno.txt to data/test_yesno/text.

 

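8.4. Use awk to process the text file and write data/train_yesno/utt2spk and data/test_yesno/utt2spk. utt2spk maps each utterance to its speaker, while spk2utt (generated in the next step) maps each speaker to that speaker's utterance IDs. Because all recordings come from a single speaker, the speaker ID is simply global. Format: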

 

1_0_0_0_0_0_0_0 global

1_0_0_0_0_0_0_1 global

1_0_0_0_0_0_1_1 global

...
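A minimal sketch of this awk step, run on a two-line sample of the text file in a temporary directory. The exact awk program used by prepare_data.sh may differ; this version just takes the first field of each line and appends the speaker ID global.

```shell
work=$(mktemp -d)
cat > "$work/text" <<'EOF'
1_0_0_0_0_0_0_0 YES NO NO NO NO NO NO NO
1_0_0_0_0_0_0_1 YES NO NO NO NO NO NO YES
EOF

# Keep only the utterance ID (field 1) and attach the single speaker "global".
awk '{ printf("%s global\n", $1) }' "$work/text" > "$work/utt2spk"
cat "$work/utt2spk"
```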

 

 

8.5. Use the utils/utt2spk_to_spk2utt.pl script to convert utt2spk into spk2utt. Format:

global 1_0_0_0_0_0_0_0 1_0_0_0_0_0_0_1 1_0_0_0_0_0_1_1 1_0_0_0_1_0_0_1 1_0_0_1_0_1_1_1 1_0_1_0_1_0_0_1 1_0_1_1_0_1_1_1 1_0_1_1_1_0_1_0 1_0_1_1_1_1_0_1 1_1_0_0_0_0_0_1 1_1_0_0_0_1_1_1 1_1_0_0_1_0_1_0 1_1_0_0_1_0_1_1 1_1_0_0_1_1_1_0 1_1_0_1_0_1_0_0 1_1_0_1_0_1_1_0 1_1_0_1_1_0_0_1 1_1_0_1_1_0_1_1 1_1_0_1_1_1_1_0 1_1_1_0_0_0_0_1 1_1_1_0_0_1_0_1 1_1_1_0_0_1_1_1 1_1_1_0_1_0_1_0 1_1_1_0_1_0_1_1 1_1_1_1_0_0_1_0 1_1_1_1_0_1_0_0 1_1_1_1_1_0_0_0 1_1_1_1_1_1_0_0 1_1_1_1_1_1_1_1
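The conversion simply groups utterance IDs by speaker. The following awk-based sketch shows the idea; it is an illustrative reimplementation, not the Perl script shipped with Kaldi, and the function name and the sample IDs utt1 and utt2 are hypothetical.

```shell
# utt2spk_to_spk2utt
# Reads "<utt-id> <speaker>" lines on stdin and prints one line per speaker:
# "<speaker> <utt-id> <utt-id> ...", keeping speakers in first-seen order.
utt2spk_to_spk2utt() {
  awk '{
    if (!($2 in utts)) { order[++n] = $2; utts[$2] = $1 }
    else { utts[$2] = utts[$2] " " $1 }
  }
  END { for (i = 1; i <= n; i++) print order[i], utts[order[i]] }'
}

printf '%s\n' 'utt1 global' 'utt2 global' | utt2spk_to_spk2utt
```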

 

The directory structure at this point:

data
├───local
│   ├───waves.train
│   ├───waves.test
│   ├───test_yesno_wav.scp
│   ├───train_yesno_wav.scp
│   ├───train_yesno.txt
│   ├───test_yesno.txt
│   ├───lm_tg.arpa
│   └───waves_all.list
├───train_yesno
│   ├───text
│   ├───utt2spk
│   ├───spk2utt
│   └───wav.scp
├───test_yesno
│   ├───text
│   ├───utt2spk
│   ├───spk2utt
│   └───wav.scp

 

Dictionary preparation stage

Run local/prepare_dict.sh

1. Create the dictionary directory data/local/dict and copy two files: input/lexicon_nosil.txt to data/local/dict/lexicon_words.txt, and input/lexicon.txt to data/local/dict/lexicon.txt.

Content of lexicon_words.txt:

YES Y

NO N

 

Content of lexicon.txt:

<SIL> SIL

YES Y

NO N

 

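2. Use an inverted match (exclusion) to drop every line containing SIL from input/phones.txt and save the rest to another file:

cat input/phones.txt | grep -v SIL > data/local/dict/nonsilence_phones.txt

Content of nonsilence_phones.txt: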

Y

N
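The filter can be tried in isolation. The sketch below assumes input/phones.txt contains the three phones SIL, Y and N, one per line (the file itself is not shown in this walkthrough), and works in a temporary directory.

```shell
work=$(mktemp -d)
printf '%s\n' SIL Y N > "$work/phones.txt"   # assumed content of input/phones.txt

# -v inverts the match: keep every line that does NOT contain "SIL".
grep -v SIL "$work/phones.txt" > "$work/nonsilence_phones.txt"
cat "$work/nonsilence_phones.txt"
```

Note that grep -v SIL drops any line that contains SIL as a substring; with a larger phone set, grep -vx SIL would exclude only lines that are exactly SIL.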

 

3. Both data/local/dict/silence_phones.txt and data/local/dict/optional_silence.txt contain:

SIL

 

The directory structure at this point:

data
├───local
│   └───dict
│       ├───lexicon_words.txt
│       ├───lexicon.txt
│       ├───nonsilence_phones.txt
│       ├───silence_phones.txt
│       └───optional_silence.txt

 

Run the command

utils/prepare_lang.sh --position-dependent-phones false data/local/dict "<SIL>" data/local/lang data/lang

 

1. The script below is sourced to process the options passed on the command line:

. utils/parse_options.sh

 

1.1 The option name --position-dependent-phones is converted to the variable name position_dependent_phones; later code then assigns the second argument, false, to that variable.

name=`echo "$1" | sed s/^--// | sed s/-/_/g`

 

1.2 Finally the arguments are shifted left by two, so the argument list becomes:

utils/prepare_lang.sh data/local/dict "<SIL>" data/local/lang data/lang
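The name conversion and the shift can be sketched as follows. This is a simplified illustration of "--name value" option handling, not the real utils/parse_options.sh, which performs additional checks; the command line is simulated with set --.

```shell
position_dependent_phones=true   # default value, overridden by the option below

# Simulate the command line from the text.
set -- --position-dependent-phones false data/local/dict "<SIL>" data/local/lang data/lang

while [ $# -gt 0 ]; do
  case "$1" in
    --*)
      # "--position-dependent-phones" becomes "position_dependent_phones"
      name=$(echo "$1" | sed 's/^--//' | sed 's/-/_/g')
      eval "$name=\"\$2\""   # assign the option value to that variable
      shift 2                # drop the option and its value
      ;;
    *) break ;;              # first positional argument: stop parsing
  esac
done

srcdir=$1
echo "position_dependent_phones=$position_dependent_phones srcdir=$srcdir nargs=$#"
```

After the loop only the four positional arguments remain, which is why the script can then read them as $1 to $4.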

 

2. The four positional arguments are assigned to named variables, which makes the code easier to read:

srcdir=$1 #data/local/dict

oov_word=$2 #<SIL>

tmpdir=$3 #data/local/lang

dir=$4 #data/lang

 

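3. Source path.sh in the current shell (no new shell is started) to set environment variables:

. ./path.sh

path.sh sets the environment variables KALDI_ROOT and PATH, and runs the script:

kaldi/tools/env.sh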

 

4. Run the following command to check that the dictionary files are valid:

utils/validate_dict_dir.pl $srcdir

 

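It checks that silence_phones.txt, optional_silence.txt, nonsilence_phones.txt and related files are well formed: mainly that entries contain no \r or \n, that the files are not empty, that no phone ends in _B, _E, _S or _I (suffixes that are easily confused with other markers), and that no entry is duplicated.

(It checks that the contents of silence_phones.txt and nonsilence_phones.txt are mutually exclusive.)

(It uses the check_lexicon_pair function to check that lexicon.txt and lexiconp.txt match each other.)

Because data/local/dict/extra_questions.txt does not exist, it prints "--> data/local/dict/extra_questions.txt is empty (this is OK)\n"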

 

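5. Check whether $srcdir/lexicon.txt is a regular file; the following command is executed only if it is not:

perl -ape 's/(\S+\s+)\S+\s+(.+)/$1$2/;' < $srcdir/lexiconp.txt > $srcdir/lexicon.txt || exit 1;

perl -ape combines the switches -a, -p and -e. What follows is a regex substitution: $1 stands for the content of the first pair of parentheses and $2 for the second; \S+ matches one or more non-whitespace characters, \s+ one or more whitespace characters, and .+ one or more of any character. The net effect is to drop the second field of each line (the pronunciation probability).

(Note: in this recipe lexicon.txt is a regular file, so the command is not executed.)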
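To see what the substitution does, here is a sed rendering of the same pattern applied to sample lexiconp.txt lines. This is an illustrative equivalent rather than the command the script runs: the POSIX classes [^[:space:]] and [[:space:]] stand in for Perl's \S and \s, and the function name lexiconp_to_lexicon is hypothetical.

```shell
# lexiconp_to_lexicon
# Reads "<word> <prob> <phones...>" lines on stdin and removes the second field.
lexiconp_to_lexicon() {
  sed -E 's/^([^[:space:]]+[[:space:]]+)[^[:space:]]+[[:space:]]+(.+)/\1\2/'
}

printf '%s\n' '<SIL> 1.0 SIL' 'YES 1.0 Y' 'NO 1.0 N' | lexiconp_to_lexicon
```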

 

6. Copy lexiconp.txt to the temporary directory:

cp $srcdir/lexiconp.txt $tmpdir/

Content of lexiconp.txt:

<SIL> 1.0 SIL

YES 1.0 Y

NO 1.0 N

 

The next command reads the two phone files and merges them into the phones file, one phone per line:

cat $srcdir/silence_phones.txt $srcdir/nonsilence_phones.txt | \

awk '{for(n=1;n<=NF;n++) print $n; }' > $tmpdir/phones

Content of data/local/lang/phones:

SIL

Y

N

 

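The next command pastes the columns of two files side by side into a new file (here the same file twice, which yields an identity map):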

paste -d' ' $tmpdir/phones $tmpdir/phones > $tmpdir/phone_map.txt

Content of phone_map.txt:

SIL SIL

Y Y

N N
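The two commands above can be reproduced on the sample dictionary in a temporary directory. This sketch uses the file contents shown earlier; the awk loop prints one field per output line, so it also flattens input lines that list several phones.

```shell
work=$(mktemp -d)
printf '%s\n' SIL > "$work/silence_phones.txt"
printf '%s\n' Y N > "$work/nonsilence_phones.txt"

# Merge both phone lists, one phone per line.
cat "$work/silence_phones.txt" "$work/nonsilence_phones.txt" |
  awk '{ for (n = 1; n <= NF; n++) print $n }' > "$work/phones"

# Paste the list next to itself, separated by a space, to get the identity map.
paste -d' ' "$work/phones" "$work/phones" > "$work/phone_map.txt"
cat "$work/phone_map.txt"
```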

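Create the directory data/lang/phones, which holds a collection of phone sets:

mkdir -p $dir/phones

Official documentation: the phones directory contains information about many different phone sets. Each file comes in three forms, with the extensions .csl, .int and .txt, which hold the same information in three different formats. These files can be created with the script "utils/prepare_lang.sh".

The next command relies on the apply_map.pl script. It reads phone_map.txt and stores the two fields of each line as a key-value pair in a hash, then reads the data in $srcdir/{,non}silence_phones.txt, uses each entry as a key to look up the value in that hash, and writes the result to sets.txt. The .int files generated later are the integer form of these phone sets.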

cat $srcdir/{,non}silence_phones.txt | utils/apply_map.pl $tmpdir/phone_map.txt > $dir/phones/sets.txt
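The hash lookup can be sketched in awk. This is an illustrative stand-in for utils/apply_map.pl, not the script itself: the function name apply_map is hypothetical, and the choice to stop with an error on an unmapped key is an assumption of this sketch.

```shell
# apply_map MAPFILE
# Loads "<key> <value>" pairs from MAPFILE, then replaces every field read
# from stdin by its mapped value.
apply_map() {
  awk 'NR == FNR { map[$1] = $2; next }   # first input (MAPFILE): fill the hash
       {
         for (i = 1; i <= NF; i++) {
           if (!($i in map)) { print "undefined key " $i | "cat 1>&2"; exit 1 }
           $i = map[$i]
         }
         print
       }' "$1" -
}

work=$(mktemp -d)
printf '%s\n' 'SIL SIL' 'Y Y' 'N N' > "$work/phone_map.txt"
printf '%s\n' SIL Y N | apply_map "$work/phone_map.txt"
```

Because phone_map.txt maps every phone to itself here, the output equals the input; the mapping matters in setups where the phone names are changed.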

 

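Different silence phones have different GMMs. [Note: here "shared split" just means that we may have one GMM for all the states, or we can split on states. Because they are context-independent phones, they do not see the context.] (Source: comment in prepare_lang.sh)

Content of sets.txt: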

SIL

Y

N

 

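The command below generates roots.txt, which defines the roots of the phonetic decision tree. Phones listed on the same line share a root; here each line holds a single phone, so SIL, Y and N each get their own root. "shared" means that all HMM states of those phones share one root, and "split" means that the root may still be split by decision-tree questions.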

cat $dir/phones/sets.txt | awk '{print "shared", "split", $0;}' > $dir/phones/roots.txt

 

Content of roots.txt:

shared split SIL

shared split Y

shared split N

 

 

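7. In the command below, | is a pipe. utils/apply_map.pl receives $tmpdir/phone_map.txt as its first argument and reads the content of $srcdir/silence_phones.txt from standard input (<STDIN>); its output is then passed to awk, which writes one phone per line to the output file. The purpose of the whole command is to write the mapped silence phones to a new file: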

cat $srcdir/silence_phones.txt | utils/apply_map.pl $tmpdir/phone_map.txt | \

awk '{for(n=1;n<=NF;n++) print $n;}' > $dir/phones/silence.txt

 

Content of silence.txt:

SIL

 

8. The following command generates nonsilence.txt in the same way:

cat $srcdir/nonsilence_phones.txt | utils/apply_map.pl $tmpdir/phone_map.txt | \

awk '{for(n=1;n<=NF;n++) print $n;}' > $dir/phones/nonsilence.txt

 

Content of nonsilence.txt:

Y

N

 

The next two commands then copy files into the target directory:

 

cp $srcdir/optional_silence.txt $dir/phones/optional_silence.txt

cp $dir/phones/silence.txt $dir/phones/context_indep.txt

 

Content of optional_silence.txt:

SIL

Content of context_indep.txt:

SIL

 

9. The following command generates data/lang/phones.txt:

echo "<eps>" | cat - $dir/phones/{silence,nonsilence,disambig}.txt | \

awk '{n=NR-1; print $1, n;}' > $dir/phones.txt
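The numbering logic can be checked on sample files in a temporary directory. silence.txt and nonsilence.txt use the contents shown above; the content of disambig.txt is not shown in this walkthrough, so the two disambiguation symbols #0 and #1 below are an assumption made for illustration.

```shell
work=$(mktemp -d)
printf '%s\n' SIL > "$work/silence.txt"
printf '%s\n' Y N > "$work/nonsilence.txt"
printf '%s\n' '#0' '#1' > "$work/disambig.txt"   # assumed content

# Put <eps> first, append the three lists, and number the lines from 0.
echo "<eps>" | cat - "$work/silence.txt" "$work/nonsilence.txt" "$work/disambig.txt" |
  awk '{ n = NR - 1; print $1, n }' > "$work/phones.txt"
cat "$work/phones.txt"
```

Since <eps> is the first line and numbering starts at NR-1, the epsilon symbol always receives ID 0.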

 

 

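The code below takes the first field of every line of lexiconp.txt, sorts and deduplicates the words, adds a few extra symbols, numbers everything, and writes the result to words.txt; it exits if anything fails.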

cat $tmpdir/lexiconp.txt | awk '{print $1}' | sort | uniq | awk '
  BEGIN {
    print "<eps> 0";
  }
  {
    if ($1 == "<s>") {
      print "<s> is in the vocabulary!" | "cat 1>&2"
      exit 1;
    }
    if ($1 == "</s>") {
      print "</s> is in the vocabulary!" | "cat 1>&2"
      exit 1;
    }
    printf("%s %d\n", $1, NR);
  }
  END {
    printf("#0 %d\n", NR+1);
    printf("<s> %d\n", NR+2);
    printf("</s> %d\n", NR+3);
  }' > $dir/words.txt || exit 1;
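A reduced version of this pipeline can be run on the sample lexicon in a temporary directory. The sketch leaves out the <s> and </s> vocabulary checks and forces the C locale for sorting, which is an assumption of this example; under that ordering <SIL> sorts before NO and YES.

```shell
work=$(mktemp -d)
printf '%s\n' '<SIL> 1.0 SIL' 'YES 1.0 Y' 'NO 1.0 N' > "$work/lexiconp.txt"

# Word column -> sorted unique words -> numbered symbol table.
awk '{ print $1 }' "$work/lexiconp.txt" | LC_ALL=C sort | uniq | awk '
  BEGIN { print "<eps> 0" }
  { printf("%s %d\n", $1, NR) }
  END {
    printf("#0 %d\n", NR + 1)
    printf("<s> %d\n", NR + 2)
    printf("</s> %d\n", NR + 3)
  }' > "$work/words.txt"
cat "$work/words.txt"
```

The words are numbered from 1 by their sorted position, and the three special symbols take the next three IDs after the last word.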

 

Content of lexiconp.txt:

<SIL> 1.0 SIL

YES 1.0 Y

NO 1.0 N

 

Content of words.txt:

<eps> 0

<SIL> 1

NO 2

YES 3

#0 4

<s> 5

</s> 6
