Sphinx Martial Arts Manual (Part 1)

mirkerson

Using existing language models and acoustic models

1. Platform

Windows XP, VMware Workstation + Ubuntu 10.10

(1) Use Sound Recorder to check that audio recording works.

(2) sudo apt-get install libasound2-dev

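2. The CMUSphinx speech recognition toolkit

Pocketsphinx: a lightweight recognition library written in C

Sphinxbase: the support library required by Pocketsphinx

Sphinx3: a decoder written in C for speech recognition research

CMUclmtk: language model tools

Sphinxtrain: acoustic model training tools

Download from: http://sourceforge.net/projects/cmusphinx/files/

The versions used here are: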

pocketsphinx-0.6.1 (pocketsphinx_0.6.1-1.tar.gz)

sphinxbase-0.6.1 (sphinxbase-0.6.1.tar.gz)

sphinx3-0.8 (sphinx3-0.8.tar.bz2)

cmuclmtk (cmusphinx-cmuclmtk.tar.gz)

SphinxTrain-1.0 (SphinxTrain-1.0.tar.bz2)

3. Installing pocketsphinx

pocketsphinx depends on another library, Sphinxbase, so Sphinxbase must be installed first.

(1) Install Sphinxbase

tar xzf sphinxbase-0.6.1.tar.gz

cd sphinxbase-0.6.1

./configure

make

sudo make install

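By default everything is installed under /usr/local (executables in /usr/local/bin, libraries in /usr/local/lib); use ls to check.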
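If the cd step fails after unpacking, the archive may have extracted into a directory whose name differs from the one typed. The following is a minimal sketch for asking the archive itself; the helper name archive_topdir is hypothetical and it assumes a tar that detects compression automatically when listing.

```shell
# Print the top-level directory that an archive unpacks into.
# Assumes every entry lives under one top-level directory.
archive_topdir() {
    tar tf "$1" | head -n 1 | cut -d/ -f1
}

# Usage (archive name taken from the version list):
#   cd "$(archive_topdir sphinxbase-0.6.1.tar.gz)"
```

The same check works for the .tar.bz2 archives used later in this article.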

(2) Install pocketsphinx

export LD_LIBRARY_PATH=/usr/local/lib

export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

cd pocketsphinx-0.6.1

./configure

make

sudo make install
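The two export lines in this sequence overwrite whatever LD_LIBRARY_PATH and PKG_CONFIG_PATH already contained. If other software on the machine relies on those variables, a variant that prepends instead may be safer. This is only a sketch under that assumption, not something the original setup requires.

```shell
# Prepend the /usr/local locations while keeping any existing entries.
# ${VAR:+:$VAR} expands to ":<old value>" only when VAR is already set and non-empty.
export LD_LIBRARY_PATH="/usr/local/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export PKG_CONFIG_PATH="/usr/local/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
```

These settings last only for the current shell session; adding them to a shell startup file such as ~/.bashrc makes them persistent.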

Installation is complete. Three newly generated files can be seen under /usr/local/bin:

cd /usr/local/bin

pocketsphinx_batch

pocketsphinx_continuous

pocketsphinx_mdef_convert
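Instead of looking through the directory by eye, the presence of the three programs can be checked from the shell. A minimal sketch follows; the function name missing_tools is hypothetical, and it only tests whether each name resolves on PATH, not whether the program actually runs.

```shell
# Print the name of every given tool that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# No output means all three programs were found.
missing_tools pocketsphinx_batch pocketsphinx_continuous pocketsphinx_mdef_convert
```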

Test the installation:

pocketsphinx_continuous

If output like the following appears, the installation succeeded.

INFO: cmd_ln.c(512): Parsing command line:

pocketsphinx_continuous

Current configuration:

[NAME] [DEFLT] [VALUE]

-adcdev

-agc none none

-agcthresh 2.0 2.000000e+00

-alpha 0.97 9.700000e-01

-argfile

-ascale 20.0 2.000000e+01

-backtrace no no

-beam 1e-48 1.000000e-48

-bestpath yes yes

-bestpathlw 9.5 9.500000e+00

-bghist no no

-ceplen 13 13

-cmn current current

-cmninit 8.0 8.0

………………………………….

…………………………………

………………………………….

INFO: ngram_search_fwdtree.c(333): after: 457 root, 13300 non-root channels, 26 single-phone words

INFO: ngram_search_fwdflat.c(153): fwdflat: min_ef_width = 4, max_sf_win = 25

Warning: Could not find Mic element

INFO: continuous.c(261): pocketsphinx_continuous COMPILED ON: Feb 21 2011, AT: 22:31:47

READY....

4. Building a simple language model

(1) Create a corpus

vi corpus.txt

Enter the following:

stop

forward

backward

turn right

turn left

Save and exit.
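The same file can also be written without opening an editor, which is convenient when the steps are scripted. A small sketch using a here-document, with the same five commands as in the corpus shown here:

```shell
# Write the five command phrases to corpus.txt, one per line.
cat > corpus.txt <<'EOF'
stop
forward
backward
turn right
turn left
EOF
```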

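(2) Build the language model with the online tool LMTool

Go to: http://www.speech.cs.cmu.edu/tools/lmtool.html

Click the Browse button, select the corpus.txt created earlier, then click COMPILE KNOWLEDGE BASE.

The tool generates an archive, here TAR2916.tar.gz (the number varies from run to run). Unpack it: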

tar xzf TAR2916.tar.gz

2916.corpus 2916.lm 2916.sent.arpabo 2916.vocab

2916.dic 2916.sent 2916.token

Only the .dic (pronunciation dictionary) and .lm (language model) files are actually needed.
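Before running the recognizer it can be worth confirming that every corpus word made it into the dictionary. The sketch below is one way to do that; the function name missing_words is hypothetical, and it assumes the dictionary stores each word in upper case as the first field of a line (consistent with the upper-case results printed by the recognizer).

```shell
# Print every corpus word (upper-cased) that has no entry in the dictionary.
# $1 = corpus file, $2 = dictionary file
missing_words() {
    tr -s '[:space:]' '\n' < "$1" | tr '[:lower:]' '[:upper:]' | sort -u |
    awk -v dic="$2" '
        BEGIN {
            # Load the first field of each dictionary line as a known word.
            while ((getline line < dic) > 0) {
                n = split(line, f, /[ \t]+/)
                if (n > 0) known[f[1]] = 1
            }
        }
        NF > 0 && !($1 in known) { print $1 }
    '
}

# Usage: no output means the dictionary covers the whole corpus.
#   missing_words corpus.txt 2916.dic
```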

(3) Test the result

pocketsphinx_continuous -lm 2916.lm -dict 2916.dic

INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 1 words

INFO: ngram_search_fwdflat.c(912): 97 words recognized (2/fr)

INFO: ngram_search_fwdflat.c(914): 2342 senones evaluated (38/fr)

INFO: ngram_search_fwdflat.c(916): 1011 channels searched (16/fr)

INFO: ngram_search_fwdflat.c(918): 167 words searched (2/fr)

INFO: ngram_search_fwdflat.c(920): 47 word transitions (0/fr)

WARNING: "ngram_search.c", line 1087: </s> not found in last frame, using <sil> instead

INFO: ngram_search.c(1137): lattice start node <s>.0 end node <sil>.56

INFO: ps_lattice.c(1228): Normalizer P(O) = alpha(<sil>:56:60) = -341653

INFO: ps_lattice.c(1266): Joint P(O,S) = -341653 P(S|O) = 0

000000000: STOP (-6531224)

READY....

Listening...

Stopped listening, please wait...

INFO: cmn_prior.c(121): cmn_prior_update: from < 37.45 -1.28 -0.16 -0.71 0.19 -0.19 -0.07 0.34 0.13 -0.07 -0.03 -0.42 0.19 >

INFO: cmn_prior.c(139): cmn_prior_update: to < 42.22 -0.51 -0.35 -0.28 -0.24 -0.37 0.02 0.38 0.03 -0.05 0.10 -0.32 0.05 >

INFO: ngram_search_fwdtree.c(1513): 847 words recognized (9/fr)

INFO: ngram_search_fwdtree.c(1515): 11452 senones evaluated (123/fr)

INFO: ngram_search_fwdtree.c(1517): 4963 channels searched (53/fr), 534 1st, 3470 last

INFO: ngram_search_fwdtree.c(1521): 1094 words for which last channels evaluated (11/fr)

INFO: ngram_search_fwdtree.c(1524): 203 candidate words for entering last phone (2/fr)

INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 2 words

INFO: ngram_search_fwdflat.c(912): 225 words recognized (2/fr)

INFO: ngram_search_fwdflat.c(914): 10189 senones evaluated (110/fr)

INFO: ngram_search_fwdflat.c(916): 5206 channels searched (55/fr)

INFO: ngram_search_fwdflat.c(918): 329 words searched (3/fr)

INFO: ngram_search_fwdflat.c(920): 164 word transitions (1/fr)

WARNING: "ngram_search.c", line 1087: </s> not found in last frame, using RIGHT instead

INFO: ngram_search.c(1137): lattice start node <s>.0 end node RIGHT.48

INFO: ps_lattice.c(1228): Normalizer P(O) = alpha(RIGHT:48:91) = -647142

INFO: ps_lattice.c(1266): Joint P(O,S) = -647271 P(S|O) = -129

000000001: TURN RIGHT (-12643528)

READY....

Note: this method cannot be used to build a language model for Chinese command words.
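In the output above, the recognized text is buried among INFO lines; the actual results are the lines that look like "000000001: TURN RIGHT (-12643528)". A small filter can pull out just the recognized words. This is a sketch that assumes that exact line shape (nine-digit utterance id, text, score in parentheses); the function name extract_hypotheses is hypothetical.

```shell
# Read recognizer output on stdin and print only the recognized text.
extract_hypotheses() {
    sed -n -E 's/^[0-9]{9}: (.*) \(-?[0-9]+\)$/\1/p'
}

# Usage: merge stderr into stdout so the filter sees every line.
#   pocketsphinx_continuous -lm 2916.lm -dict 2916.dic 2>&1 | extract_hypotheses
```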

5. Using existing language and acoustic models

(1) Download the Mandarin language and acoustic models

Download from: http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/

Mandarin language model: zh_broadcastnews_64000_utf8.DMP, zh_broadcastnews_utf8.dic

Mandarin Broadcast News acoustic models: zh_broadcastnews_16k_ptm256_8000.tar.bz2

tar xjf zh_broadcastnews_16k_ptm256_8000.tar.bz2

cd zh_broadcastnews_16k_ptm256_8000

feat.params means noisedict transition_matrices

mdef mixture_weights sendump variances

These are the files that make up the acoustic model.

Put zh_broadcastnews_64000_utf8.DMP, zh_broadcastnews_utf8.dic, the zh_broadcastnews_16k_ptm256_8000 directory, and pocketsphinx_continuous in the same directory; the models are then ready to use.
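A quick check that the acoustic model directory is complete can save a confusing startup failure. The sketch below tests for the eight files listed above; the function name missing_model_files is hypothetical. Note that this article spells the model directory in two ways (with and without "16k"), so pass whichever name actually exists on disk.

```shell
# Print the name of every expected acoustic model file missing from a directory.
# $1 = acoustic model directory
missing_model_files() {
    for f in feat.params mdef means variances transition_matrices \
             mixture_weights sendump noisedict; do
        [ -f "$1/$f" ] || echo "$f"
    done
}

# Usage sketch: start the recognizer only when nothing is missing.
#   hmm=zh_broadcastnews_ptm256_8000   # use the directory name present on disk
#   [ -z "$(missing_model_files "$hmm")" ] &&
#       pocketsphinx_continuous -hmm "$hmm" \
#           -lm zh_broadcastnews_64000_utf8.DMP -dict zh_broadcastnews_utf8.dic
```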

(2) Test the result

huang@ubuntu:/usr/local/bin$ pocketsphinx_continuous -hmm zh_broadcastnews_ptm256_8000 -lm zh_broadcastnews_64000_utf8.DMP -dict zh_broadcastnews_utf8.dic

-lowerf 133.33334 \

-upperf 6855.4976 \

-nfft 512 \

-wlen 0.0256 \

-transform legacy \

-feat s2_4x \

-agc none \

-cmn current \

-varnorm no

Current configuration:

[NAME] [DEFLT] [VALUE]

-agc none none

-agcthresh 2.0 2.000000e+00

-alpha 0.97 9.700000e-01

-ceplen 13 13

-cmn current current

-cmninit 8.0 8.0

-dither no no

-doublebw no no

-feat 1s_c_d_dd s2_4x

-frate 100 100

-input_endian little little

-lda

-ldadim 0 0

-lifter 0 0

-logspec no no

-lowerf 133.33334 1.333333e+02

-ncep 13 13

-nfft 512 512

-nfilt 40 40

-remove_dc no no

-round_filters yes yes

-samprate 16000 1.600000e+04

-seed -1 -1

-smoothspec no no

-svspec

-transform legacy legacy

-unit_area yes yes

-upperf 6855.4976 6.855498e+03

-varnorm no no

-verbose no no

-warp_params

-warp_type inverse_linear inverse_linear

-wlen 0.025625 2.560000e-02

…………………………….

……………………………

INFO: ngram_search_fwdtree.c(324): after: max nonroot chan increased to 75539

INFO: ngram_search_fwdtree.c(333): after: 461 root, 75411 non-root channels, 27 single-phone words

INFO: ngram_search_fwdflat.c(153): fwdflat: min_ef_width = 4, max_sf_win = 25

Warning: Could not find Mic element

INFO: continuous.c(261): pocketsphinx_continuous COMPILED ON: Feb 21 2011, AT: 22:31:47

READY....

Listening...

Stopped listening, please wait...

INFO: cmn_prior.c(121): cmn_prior_update: from < 8.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >

INFO: cmn_prior.c(139): cmn_prior_update: to < 9.20 -0.17 -0.27 -0.29 -0.38 -0.05 -0.08 -0.15 -0.12 -0.15 0.13 -0.08 -0.07 >

INFO: ngram_search_fwdtree.c(1513): 2628 words recognized (45/fr)

INFO: ngram_search_fwdtree.c(1515): 228830 senones evaluated (3878/fr)

INFO: ngram_search_fwdtree.c(1517): 506870 channels searched (8591/fr), 25129 1st, 119738 last

INFO: ngram_search_fwdtree.c(1521): 7773 words for which last channels evaluated (131/fr)

INFO: ngram_search_fwdtree.c(1524): 146203 candidate words for entering last phone (2478/fr)

INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 137 words

INFO: ngram_search_fwdflat.c(912): 1906 words recognized (32/fr)

INFO: ngram_search_fwdflat.c(914): 71680 senones evaluated (1215/fr)

INFO: ngram_search_fwdflat.c(916): 134571 channels searched (2280/fr)

INFO: ngram_search_fwdflat.c(918): 7855 words searched (133/fr)

INFO: ngram_search_fwdflat.c(920): 6388 word transitions (108/fr)

WARNING: "ngram_search.c", line 1087: </s> not found in last frame, using 啊 instead

INFO: ngram_search.c(1137): lattice start node <s>.0 end node 啊(4).21

INFO: ps_lattice.c(1228): Normalizer P(O) = alpha(啊(4):21:57) = -296641

INFO: ps_lattice.c(1266): Joint P(O,S) = -296641 P(S|O) = 0

000000000: 啊 (-4653851)

READY....

Listening...

Stopped listening, please wait...

INFO: cmn_prior.c(121): cmn_prior_update: from < 9.37 -0.27 -0.29 -0.06 -0.23 -0.13 -0.09 -0.15 -0.08 -0.15 -0.02 -0.07 -0.10 >

INFO: cmn_prior.c(139): cmn_prior_update: to < 9.31 -0.25 -0.37 -0.08 -0.22 -0.14 -0.08 -0.11 -0.05 -0.13 -0.02 -0.10 -0.12 >

INFO: ngram_search_fwdtree.c(1513): 2368 words recognized (38/fr)

INFO: ngram_search_fwdtree.c(1515): 251689 senones evaluated (4059/fr)

INFO: ngram_search_fwdtree.c(1517): 499391 channels searched (8054/fr), 26703 1st, 127525 last

INFO: ngram_search_fwdtree.c(1521): 7782 words for which last channels evaluated (125/fr)

INFO: ngram_search_fwdtree.c(1524): 181902 candidate words for entering last phone (2933/fr)

INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 106 words

INFO: ngram_search_fwdflat.c(912): 1960 words recognized (32/fr)

INFO: ngram_search_fwdflat.c(914): 40695 senones evaluated (656/fr)

INFO: ngram_search_fwdflat.c(916): 107699 channels searched (1737/fr)

INFO: ngram_search_fwdflat.c(918): 6493 words searched (104/fr)

INFO: ngram_search_fwdflat.c(920): 5071 word transitions (81/fr)

INFO: ngram_search.c(1137): lattice start node <s>.0 end node </s>.50

INFO: ps_lattice.c(1228): Normalizer P(O) = alpha(</s>:50:60) = -190357

INFO: ps_lattice.c(1266): Joint P(O,S) = -206492 P(S|O) = -16135

000000002: 二 (-3082778)

READY....

Listening...

Stopped listening, please wait...

INFO: cmn_prior.c(121): cmn_prior_update: from < 9.31 -0.25 -0.37 -0.08 -0.22 -0.14 -0.08 -0.11 -0.05 -0.13 -0.02 -0.10 -0.12 >

INFO: cmn_prior.c(139): cmn_prior_update: to < 9.26 -0.29 -0.28 0.11 -0.18 -0.16 -0.05 -0.16 -0.04 -0.19 -0.05 -0.10 -0.14 >

INFO: ngram_search_fwdtree.c(1513): 1595 words recognized (18/fr)

INFO: ngram_search_fwdtree.c(1515): 302259 senones evaluated (3358/fr)

INFO: ngram_search_fwdtree.c(1517): 487518 channels searched (5416/fr), 37862 1st, 104395 last

INFO: ngram_search_fwdtree.c(1521): 5835 words for which last channels evaluated (64/fr)

INFO: ngram_search_fwdtree.c(1524): 197251 candidate words for entering last phone (2191/fr)

INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 61 words

INFO: ngram_search_fwdflat.c(912): 1027 words recognized (11/fr)

INFO: ngram_search_fwdflat.c(914): 24680 senones evaluated (274/fr)

INFO: ngram_search_fwdflat.c(916): 65722 channels searched (730/fr)

INFO: ngram_search_fwdflat.c(918): 4251 words searched (47/fr)

INFO: ngram_search_fwdflat.c(920): 2861 word transitions (31/fr)

INFO: ngram_search.c(1137): lattice start node <s>.0 end node </s>.82

INFO: ps_lattice.c(1228): Normalizer P(O) = alpha(</s>:82:88) = -275522

INFO: ps_lattice.c(1266): Joint P(O,S) = -277262 P(S|O) = -1740

000000003: 一年 (-4414731)

READY....

Listening...

Stopped listening, please wait...

……………………………

…………………………..

……………………………

INFO: ngram_search_fwdflat.c(920): 6841 word transitions (73/fr)

INFO: ngram_search.c(1137): lattice start node <s>.0 end node </s>.85

INFO: ps_lattice.c(1228): Normalizer P(O) = alpha(</s>:85:91) = -261132

INFO: ps_lattice.c(1266): Joint P(O,S) = -278320 P(S|O) = -17188

000000008: 留念 (-3893136)

…………………..

…………………

……………………

INFO: ngram_search.c(1137): lattice start node <s>.0 end node </s>.99

INFO: ps_lattice.c(1228): Normalizer P(O) = alpha(</s>:99:105) = -305972

INFO: ps_lattice.c(1266): Joint P(O,S) = -325764 P(S|O) = -19792

000000010: 基民 (-4532446)

…………………..

INFO: ngram_search_fwdflat.c(920): 5175 word transitions (46/fr)

INFO: ngram_search.c(1137): lattice start node <s>.0 end node </s>.102

INFO: ps_lattice.c(1228): Normalizer P(O) = alpha(</s>:102:110) = -283182

INFO: ps_lattice.c(1266): Joint P(O,S) = -283589 P(S|O) = -407

000000012: 一九八九 (-4134767)

READY....

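The results above show that Mandarin recognition accuracy is much lower than the English recognition accuracy obtained earlier. This is partly related to our accent. To get a higher Mandarin recognition rate, it is best to build your own language model and acoustic model.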
