References:
SRILM installation: http://blog.csdn.net/zhoubl668/article/details/7759370
SRILM usage: http://hi.baidu.com/keyever/item/8fad8918b90b8e6b3f87ce87
Paper: SRILM - An Extensible Language Modeling Toolkit
For further reading:
SRILM source code framework analysis: http://download.csdn.net/download/yqzhao/4546985
SRILM source code reading series: http://blog.chinaunix.net/uid/20658401/cid-67529-list-1.html
SRILM discounting algorithms: http://www.speech.sri.com/projects/srilm/manpages/ngram-discount.7.html
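Two core modules
The SRILM toolkit has two core modules. The first, ngram-count, builds a language model from training data. The second, ngram, evaluates a language model by computing its perplexity on a test set.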
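As a reminder of what "perplexity on a test set" means, here is a minimal Python sketch. It is not SRILM code: the function name `perplexity` and its input (a list of per-token base-10 log probabilities) are assumptions made for illustration. Which tokens get scored, for example whether end-of-sentence markers or out-of-vocabulary words are included, is a convention of the evaluation tool; this sketch simply averages over whatever list it is given.

```python
import math


def perplexity(log10_probs):
    """Perplexity from per-token base-10 log probabilities.

    Computes 10 ** (-(1/N) * sum(log10 p_i)) over the N scored tokens.
    """
    if not log10_probs:
        raise ValueError("need at least one scored token")
    average_log_prob = sum(log10_probs) / len(log10_probs)
    return 10 ** (-average_log_prob)


# A model that assigns probability 1/4 to every token is "as confused as"
# a uniform choice among 4 words, so its perplexity is about 4.
uniform_scores = [math.log10(0.25)] * 8
print(round(perplexity(uniform_scores), 6))
```

A lower perplexity means the model assigns higher probability to the test text.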
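1. ngram-count
The ngram-count module offers many counting functions. It can write the n-gram counts of a training corpus to a count file and then build a language model by reading that count file, or it can do both steps in a single run.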
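To make the counting step concrete, the following Python sketch shows roughly what an n-gram count table contains. It is a simplified illustration, not SRILM's implementation: the function name `count_ngrams` is hypothetical, tokens are split on whitespace, and each sentence is assumed to be wrapped in `<s>` and `</s>` boundary markers before counting.

```python
from collections import Counter


def count_ngrams(lines, order=3):
    """Count all n-grams of length 1..order, one sentence per line."""
    counts = Counter()
    for line in lines:
        words = line.split()
        if not words:
            continue  # skip empty lines
        # Assumed convention: add sentence-boundary markers.
        tokens = ["<s>"] + words + ["</s>"]
        for n in range(1, order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


corpus = [
    "the light was red .",
    "Do you do alterations ?",
]
counts = count_ngrams(corpus, order=2)
# Print each n-gram with its count, similar in spirit to a count file.
for ngram, count in sorted(counts.items()):
    print(" ".join(ngram), count)
```

Counting is case-sensitive here, so "Do" and "do" are separate entries. A language model is then estimated from such a table by applying a discounting (smoothing) method to the counts.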
Suppose the corpus is named train.data, with the following contents:
it 's just down the hall . I 'll bring you some now . if there is anything else you need , just let me know .
No worry about that . I 'll take it and you need not wrap it up .
Do you do alterations ?
the light was red .
we want to have a table near the window .
it 's over there , just in front of the tourist information .
I twisted it playing tennis . it felt Okay after the game but then it started turning black - and - blue . is it serious ?
please input your pin number .