SRILM personal usage notes

(1) Download and installation

  • Download: https://download.csdn.net/download/cyinfi/10299520
  • Extract: tar -zxvf srilm-1.7.2.tar.gz
  • Enter the directory: srilm-1.7.2
  • Build: make World (note: point the build at the current directory by editing the Makefile and adding SRILM = $(PWD))
  • Test: make test

        If the test output ends like the following, the build succeeded:

        fngram-count: stdout output IDENTICAL.

        fngram-count: stderr output IDENTICAL.
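        Putting the steps above together, a minimal build sketch (the tarball name and the i686-m64 platform directory are assumptions; sbin/machine-type prints the right name for your machine):

        tar -zxvf srilm-1.7.2.tar.gz
        cd srilm-1.7.2
        # set the build root; editing the Makefile to add "SRILM = $(PWD)" works too
        make World SRILM=$PWD
        make test
        # binaries land under bin/<machine-type>/
        export PATH=$PATH:$PWD/bin/i686-m64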

(2) Option reference (for the ngram tool)

 -version:                print version information
 -order:                  max ngram order (default: 3)
 -debug:                  debugging level for lm (default: 0)
 -skipoovs:               skip n-gram contexts containing OOVs
 -df:                     use disfluency ngram model
 -tagged:                 use a tagged LM
 -factored:               use a factored LM
 -skip:                   use skip ngram model
 -hiddens:                use hidden sentence ngram model
 -hidden-vocab:           hidden ngram vocabulary
 -hidden-not:             process overt hidden events
 -classes:                class definitions
 -simple-classes:         use unique class model
 -expand-classes:         expand class-model into word-model (default: -1)
 -expand-exact:           compute expanded ngrams longer than this exactly (default: 0)
 -stop-words:             stop-word vocabulary for stop-Ngram LM
 -decipher:               use bigram model exactly as recognizer
 -unk:                    vocabulary contains unknown word tag
 -nonull:                 remove <NULL> in LM
 -map-unk:                word to map unknown words to
 -zeroprob-word:          word to back off to for zero probs
 -tolower:                map vocabulary to lowercase
 -multiwords:             split multiwords for LM evaluation
 -ppl:                    text file to compute perplexity from
 -text-has-weights:       text file contains sentence weights
 -escape:                 escape prefix to pass data through -ppl
 -counts:                 count file to compute perplexity from
 -counts-entropy:         compute entropy (not perplexity) from counts
 -count-order:            max count order used by -counts (default: 0)
 -float-counts:           use fractional -counts
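        For example, a minimal sketch of computing perplexity from a count file rather than raw text (LM and heldout.count are placeholder names):

        # perplexity from N-gram counts instead of running text
        ngram -lm LM -order 3 -counts heldout.count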

 -use-server:             port@host to use as LM server
 -cache-served-ngrams:    enable client side caching
 -server-port:            port to listen on as probability server (default: 0)
 -server-maxclients:      maximum number of simultaneous server clients (default: 0)
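        A sketch of client/server use with the options above (port 2525 and localhost are arbitrary choices):

        # serve an existing model as a probability server on port 2525
        ngram -lm LM -order 3 -server-port 2525 &
        # query it from another process, caching served n-grams on the client side
        ngram -use-server 2525@localhost -cache-served-ngrams -ppl test.txt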

 -gen:                    number of random sentences to generate (default: 0)
 -gen-prefixes:           file of prefixes to generate sentences
 -seed:                   seed for randomization (default: 1521617620)
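        For instance, to sample sentences from a trained model (LM and the seed value are placeholders):

        # generate 10 random sentences; fixing -seed makes the output reproducible
        ngram -lm LM -order 3 -gen 10 -seed 42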

 -vocab:                  vocab file
 -vocab-aliases:          vocab alias file
 -nonevents:              non-event vocabulary
 -limit-vocab:            limit LM reading to specified vocabulary
 -codebook:               codebook for quantized LM parameters
 -write-codebook:         output codebook (for validation)
 -write-with-codebook:    write ngram LM using codebook
 -quantize:               quantize ngram LM using specified number of bins (default: 0)
 -lm:                     file in ARPA LM format
 -bayes:                  context length for Bayes mixture LM (default: 4294967295)
 -bayes-scale:            log likelihood scale for -bayes (default: 1)
 -mix-lm:                 LM to mix in
 -lambda:                 mixture weight for -lm (default: 0.5)
 -mix-lm2:                second LM to mix in
 -mix-lambda2:            mixture weight for -mix-lm2 (default: 0)
 -mix-lm3:                third LM to mix in
 -mix-lambda3:            mixture weight for -mix-lm3 (default: 0)
 -mix-lm4:                fourth LM to mix in
 -mix-lambda4:            mixture weight for -mix-lm4 (default: 0)
 -mix-lm5:                fifth LM to mix in
 -mix-lambda5:            mixture weight for -mix-lm5 (default: 0)
 -mix-lm6:                sixth LM to mix in
 -mix-lambda6:            mixture weight for -mix-lm6 (default: 0)
 -mix-lm7:                seventh LM to mix in
 -mix-lambda7:            mixture weight for -mix-lm7 (default: 0)
 -mix-lm8:                eighth LM to mix in
 -mix-lambda8:            mixture weight for -mix-lm8 (default: 0)
 -mix-lm9:                ninth LM to mix in
 -mix-lambda9:            mixture weight for -mix-lm9 (default: 0)
 -context-priors:         context-dependent mixture weights file
 -loglinear-mix:          use log-linear mixture LM
 -read-mix-lms:           read mixture LMs from -lm file
 -maxent:                 read a maximum entropy model
 -mix-maxent:             mixed LMs in the interpolation scheme are maximum entropy models
 -maxent-convert-to-arpa: convert maxent model to backoff model
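        A typical use of the mixture options, as a sketch (the file names and the 0.7 weight are illustrative): interpolate an in-domain model with a general one and write the merged model back out.

        # static interpolation: 0.7 * P(in-domain) + 0.3 * P(general)
        ngram -lm in-domain.lm -mix-lm general.lm -lambda 0.7 -write-lm mixed.lm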

 -null:                   use a null language model
 -cache:                  history length for cache language model (default: 0)
 -cache-lambda:           interpolation weight for -cache (default: 0.05)
 -dynamic:                interpolate with a dynamic lm
 -hmm:                    use HMM of n-grams model
 -count-lm:               use a count-based LM
 -msweb-lm:               use Microsoft Web LM
 -adapt-mix:              use adaptive mixture of n-grams model
 -adapt-decay:            history likelihood decay factor (default: 1)
 -adapt-iters:            EM iterations for adaptive mix (default: 2)
 -adapt-marginals:        unigram marginals to adapt base LM to
 -base-marginals:         unigram marginals of the base LM
 -adapt-marginals-beta:   marginals adaptation weight (default: 0.5)
 -adapt-marginals-ratios: compute ratios between marginals-adapted and base probs
 -dynamic-lambda:         interpolation weight for -dynamic (default: 0.05)
 -reverse:                reverse words
 -no-sos:                 don't insert start-of-sentence tokens
 -no-eos:                 don't insert end-of-sentence tokens
 -rescore-ngram:          recompute probs in N-gram LM
 -write-lm:               re-write LM to file
 -write-bin-lm:           write LM to file in binary format
 -write-oldbin-lm:        write LM to file in old binary format
 -write-vocab:            write LM vocab to file
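        For example, to convert an ARPA model to the binary format, which loads faster (LM and LM.bin are placeholder names):

        # read an ARPA model and re-write it in binary format
        ngram -lm LM -order 3 -write-bin-lm LM.bin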

 -renorm:                 renormalize backoff weights
 -prune:                  prune redundant probs (default: 0)
 -minprune:               prune only ngrams at least this long (default: 2)
 -prune-lowprobs:         prune low probability N-grams
 -prune-history-lm:       LM used for history probabilities in pruning
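        A pruning sketch (the 1e-8 threshold is only an illustrative value):

        # drop n-grams whose removal changes the model by less than the threshold,
        # then write the smaller model back out
        ngram -lm LM -order 3 -prune 1e-8 -write-lm LM.pruned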

 -memuse:                 show memory usage
 -nbest:                  nbest list file to rescore
 -nbest-files:            list of N-best filenames
 -split-multiwords:       split multiwords in N-best lists
 -multi-char:             multiword component delimiter (default: "_")
 -write-nbest-dir:        output directory for N-best rescoring
 -decipher-nbest:         output Decipher n-best format
 -max-nbest:              maximum number of hyps to consider (default: 0)
 -no-reorder:             don't reorder N-best hyps after rescoring
 -rescore:                hyp stream input file to rescore
 -decipher-lm:            DECIPHER(TM) LM for nbest list generation
 -decipher-order:         ngram order for -decipher-lm (default: 2)
 -decipher-nobackoff:     disable backoff hack in recognizer LM
 -decipher-lmw:           DECIPHER(TM) LM weight (default: 8)
 -decipher-wtw:           DECIPHER(TM) word transition weight (default: 0)
 -rescore-lmw:            rescoring LM weight (default: 8)
 -rescore-wtw:            rescoring word transition weight (default: 0)
 -noise:                  noise tag to skip
 -noise-vocab:            noise vocabulary to skip
 -help:                   print this message
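        Finally, a sketch of N-best rescoring with the weights above (nbest.txt is a placeholder for an N-best list in a format SRILM accepts):

        # rescore an N-best list with the LM, LM weight 8, word transition weight 0
        ngram -lm LM -order 3 -nbest nbest.txt -rescore-lmw 8 -rescore-wtw 0 > nbest.rescored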

(3) Common commands (run from the subdirectory containing the binaries under the lm directory, or add that directory to PATH)

        Generate a count file: ngram-count -text train.txt -order 3 -write train.txt.count

        Build a language model: ngram-count -read train.txt.count -order 3 -lm LM -interpolate -kndiscount

        Compute perplexity: ngram -ppl test.txt -order 3 -lm LM > result

        Score sentences individually: ngram -ppl test.txt -order 3 -lm LM -debug 1 > result

    Reference: https://blog.csdn.net/u011500062/article/details/50780935
