SRILM Installation and Testing

Latest recommended article published 2024-02-19 13:23:29

prolrj2015

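Why install SRILM

SRILM is a toolkit for building and applying statistical language models. It is used mainly in speech recognition, statistical tagging and segmentation, and machine translation.
I installed it because I use CMUSphinx and was following the tutorial section "Training an ARPA model with SRILM" (https://cmusphinx.github.io/wiki/tutoriallm/). The same tutorial describes an alternative, "Training an ARPA model with CMUCLMTK". Here I use SRILM to train the ARPA model.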

Downloading SRILM

Download page: http://www.speech.sri.com/projects/srilm/download.html (I used version 1.7.2)
Installation guide: http://www.speech.sri.com/projects/srilm/docs/INSTAL

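Prerequisites: downloads and installation

Tcl can be downloaded from http://www.tcl.tk/software/tcltk/download.html, and the installation instructions are on the same page. Installing it is routine: unpack the archive and change into the unix directory, which contains the configure script.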

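cd tcl8.5.0/unix
# configure is in the current directory, so it needs the ./ prefix
./configure --prefix=/usr/local/tcl
make
make test
# writing to /usr/local usually requires root, e.g. sudo make install
make install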

Other requirements
A template-capable ANSI-C/C++ compiler, preferably gcc version 3.4.3 or higher.
GNU make: to control compilation and installation.
GNU gawk: required for many of the utility scripts.
GNU gzip: to unpack the distribution, and to allow SRILM programs to handle ".Z" and ".gz" compressed datafiles (highly recommended).
bzip2: to handle ".bz2" compressed files (optional).
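
Before building, it can help to confirm that these tools are actually on PATH. The following is only a minimal sketch and is not part of the SRILM distribution: the function name check_tools is made up for illustration, and the command names assume a typical Linux setup where the compiler is installed as gcc and g++.

```shell
#!/bin/sh
# Report whether each build prerequisite can be found on PATH.
# check_tools is a hypothetical helper, not an SRILM script.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Exit status is 1 if at least one tool was missing.
    return "$missing"
}

# Tool names are assumptions based on the requirements list above.
check_tools gcc g++ make gawk gzip bzip2 ||
    echo "install the missing tools before building SRILM"
```

On Ubuntu the missing packages can normally be installed with apt; bzip2 is optional, so a "missing" line for it does not block the build.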

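Installing SRILM

Once the prerequisites above are in place, SRILM itself can be installed. I unpacked SRILM into /home/lrj/projects/srilm-1.7.2.

Edit srilm/Makefile
1. Find

# SRILM = /home/speech/stolcke/project/srilm/devel

On a new line below it, set the SRILM installation path. $(PWD) resolves to the directory make is run from, so make has to be run from the SRILM root; an absolute path such as /home/lrj/projects/srilm-1.7.2 works as well.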

SRILM = $(PWD)

2. Find

MACHINE_TYPE := $(shell $(SRILM)/sbin/machine-type)

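Comment it out by putting # in front of it, and add a new line:

MACHINE_TYPE := i686-m64

uname -m reports that my machine architecture is x86_64, so the machine-specific file I edited is common/Makefile.machine.i686-m64. MACHINE_TYPE has to match the suffix of that file, and it also names the bin/ subdirectory that the tools are installed into (bin/i686-m64 in the PATH setting further down). I am running Ubuntu 16.04 in a virtual machine and am not certain that Makefile.machine.i686-m64 is the right choice, because there is also a Makefile.machine.i686-ubuntu.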

In that file, find:
    TCL_INCLUDE =
    TCL_LIBRARY =
Change it to:
    TCL_INCLUDE =
    TCL_LIBRARY =
    NO_TCL = X
Setting NO_TCL builds SRILM without Tcl support.
Find:
    GAWK = /usr/bin/awk
Change it to:
    GAWK = /usr/bin/gawk
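
If it is unclear which Makefile.machine.* file applies, one way to check is to run the same sbin/machine-type script that the top-level Makefile calls. This is a rough sketch under assumptions: SRILM_DIR defaults to the unpack path used in this post, and show_machine_type is a made-up helper name.

```shell
#!/bin/sh
# Print the machine type SRILM detects for this host and show the
# matching configuration file. SRILM_DIR is an assumption; point it at
# your own unpack location.
SRILM_DIR="${SRILM_DIR:-/home/lrj/projects/srilm-1.7.2}"

# Hypothetical helper: runs <dir>/sbin/machine-type if it exists.
show_machine_type() {
    dir=$1
    if [ -x "$dir/sbin/machine-type" ]; then
        # The script referenced by MACHINE_TYPE in the top-level Makefile.
        "$dir/sbin/machine-type"
    else
        echo "machine-type script not found under $dir" >&2
        return 1
    fi
}

echo "kernel architecture: $(uname -m)"
if mt=$(show_machine_type "$SRILM_DIR"); then
    echo "SRILM machine type: $mt"
    ls "$SRILM_DIR/common/Makefile.machine.$mt" ||
        echo "no machine file for $mt under $SRILM_DIR/common"
fi
```

Whatever name the script prints is the suffix to look for under common/, and the same name is used for the bin/ subdirectory after the build.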

Building SRILM
In the SRILM root directory, run

make World

and wait for the build to finish.
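
As an alternative to editing the top-level Makefile, the two variables can be passed on the make command line, since GNU make gives command-line assignments precedence over assignments inside a Makefile. The sketch below assumes the unpack path and the i686-m64 machine type used in this post; build_srilm is a made-up wrapper name, and the machine-specific file still has to be edited as described earlier.

```shell
#!/bin/sh
# Build SRILM by passing SRILM and MACHINE_TYPE on the command line
# instead of editing the top-level Makefile.
# build_srilm is a hypothetical wrapper, not an SRILM script.
build_srilm() {
    dir=$1            # absolute path of the SRILM root
    machine_type=$2   # suffix of the common/Makefile.machine.* file to use
    if [ ! -f "$dir/Makefile" ]; then
        echo "no Makefile in $dir; is this the SRILM root?" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$dir" && make SRILM="$dir" MACHINE_TYPE="$machine_type" World )
}

# Path and machine type are the ones from this post (assumptions elsewhere).
build_srilm "${SRILM_DIR:-/home/lrj/projects/srilm-1.7.2}" i686-m64 ||
    echo "build skipped or failed"
```

This keeps the distributed Makefile untouched, which makes it easier to re-unpack or upgrade later.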

Setting environment variables
In a terminal, run:

export PATH=/home/lrj/projects/srilm-1.7.2/bin/i686-m64/:/home/lrj/projects/srilm-1.7.2/bin:$PATH
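
An export typed into a terminal only lasts for that shell session. A small sketch like the one below can be appended to ~/.bashrc (assuming bash is the login shell) so the tools stay on PATH, and it also checks that ngram-count resolves. The SRILM and MACHINE_TYPE values are the ones from this post and should be adjusted for other setups.

```shell
#!/bin/sh
# Put the SRILM binaries on PATH and check that ngram-count is found.
# Both defaults below are assumptions taken from this post.
SRILM="${SRILM:-/home/lrj/projects/srilm-1.7.2}"
MACHINE_TYPE="${MACHINE_TYPE:-i686-m64}"

# Compiled tools live in bin/<machine type>; helper scripts live in bin/.
export PATH="$SRILM/bin/$MACHINE_TYPE:$SRILM/bin:$PATH"

if command -v ngram-count >/dev/null 2>&1; then
    echo "ngram-count found at $(command -v ngram-count)"
else
    echo "ngram-count not found; check that the build produced $SRILM/bin/$MACHINE_TYPE"
fi
```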

Testing SRILM

1. Run the test suite that ships with SRILM. From the installation root directory:

make test

This prints a long stream of output. If no errors are reported, the test has passed.

2. Test with a text file of your own (see https://blog.csdn.net/u010747691/article/details/44176851).
For example, source.txt:

If you do want to use SRILM or are generally interested in it, please consider joining the SRILM user mailing list.

Then run:

ngram-count -text source.txt -lm source.lm

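This builds a statistical language model from source.txt and stores it in source.lm.
To restrict the model to a given set of words, create a word list with one word per line, for example source.dict:

you
are
list

Then run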

ngram-count -text source.txt -lm source.lm -vocab source.dict
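
A quick way to confirm that the resulting model file is usable is to score some text with it. The sketch below recreates the sample file, trains a model, and asks the ngram tool for the perplexity of the same text. It assumes the SRILM tools are already on PATH and skips the SRILM steps otherwise. With such a tiny corpus ngram-count may print discounting warnings, and the perplexity figure itself carries no real meaning.

```shell
#!/bin/sh
# Recreate the one-sentence sample corpus from this post.
printf '%s\n' 'If you do want to use SRILM or are generally interested in it, please consider joining the SRILM user mailing list.' > source.txt

if command -v ngram-count >/dev/null 2>&1 && command -v ngram >/dev/null 2>&1; then
    # -order 3 is the ngram-count default (trigrams); written out for clarity.
    ngram-count -order 3 -text source.txt -lm source.lm
    # Score the training text with the model and print perplexity statistics.
    ngram -order 3 -lm source.lm -ppl source.txt
else
    echo "SRILM tools not on PATH; skipping the model build"
fi
```

In real use the text passed to -ppl would be held-out data rather than the training file.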

At this point the installed SRILM is ready for real use. I will post further notes on training models later.
