(1) Download and installation
- Download: https://download.csdn.net/download/cyinfi/10299520
- Extract: tar -zxvf srilm-1.7.2.tar.gz
- Enter the directory: srilm-1.7.2
- Build: make World (note: point the build at the right directory by editing the Makefile and adding the line SRILM = $(PWD))
- Test: make test
If the output contains lines like the following, the build succeeded (a consolidated session is sketched after the output):
fngram-count: stdout output IDENTICAL.
fngram-count: stderr output IDENTICAL.
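Putting the steps together, a complete session might look like this minimal sketch; the archive name and the bin/i686-m64 machine-type directory are assumptions that depend on your download and platform:

tar -zxvf srilm-1.7.2.tar.gz
cd srilm-1.7.2
# edit the Makefile near the top, adding: SRILM = $(PWD)
make World                             # builds the libraries and command-line tools
make test                              # diffs tool output against references; "IDENTICAL" means pass
export PATH=$PATH:$PWD/bin/i686-m64    # assumed machine type; check bin/ for the actual name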
(2) Option reference (a combined usage example follows the list)
-version: print version information
-order: max ngram order
Default value: 3
-debug: debugging level for lm
Default value: 0
-skipoovs: skip n-gram contexts containing OOVs
-df: use disfluency ngram model
-tagged: use a tagged LM
-factored: use a factored LM
-skip: use skip ngram model
-hiddens: use hidden sentence ngram model
-hidden-vocab: hidden ngram vocabulary
-hidden-not: process overt hidden events
-classes: class definitions
-simple-classes: use unique class model
-expand-classes: expand class-model into word-model
Default value: -1
-expand-exact: compute expanded ngrams longer than this exactly
Default value: 0
-stop-words: stop-word vocabulary for stop-Ngram LM
-decipher: use bigram model exactly as recognizer
-unk: vocabulary contains unknown word tag
-nonull: remove <NULL> in LM
-map-unk: word to map unknown words to
-zeroprob-word: word to back off to for zero probs
-tolower: map vocabulary to lowercase
-multiwords: split multiwords for LM evaluation
-ppl: text file to compute perplexity from
-text-has-weights: text file contains sentence weights
-escape: escape prefix to pass data through -ppl
-counts: count file to compute perplexity from
-counts-entropy: compute entropy (not perplexity) from counts
-count-order: max count order used by -counts
Default value: 0
-float-counts: use fractional -counts
-use-server: port@host to use as LM server
-cache-served-ngrams: enable client side caching
-server-port: port to listen on as probability server
Default value: 0
-server-maxclients: maximum number of simultaneous server clients
Default value: 0
-gen: number of random sentences to generate
Default value: 0
-gen-prefixes: file of prefixes to generate sentences
-seed: seed for randomization
Default value: 1521617620
-vocab: vocab file
-vocab-aliases: vocab alias file
-nonevents: non-event vocabulary
-limit-vocab: limit LM reading to specified vocabulary
-codebook: codebook for quantized LM parameters
-write-codebook: output codebook (for validation)
-write-with-codebook: write ngram LM using codebook
-quantize: quantize ngram LM using specified number of bins
Default value: 0
-lm: file in ARPA LM format
-bayes: context length for Bayes mixture LM
Default value: 4294967295
-bayes-scale: log likelihood scale for -bayes
Default value: 1
-mix-lm: LM to mix in
-lambda: mixture weight for -lm
Default value: 0.5
-mix-lm2: second LM to mix in
-mix-lambda2: mixture weight for -mix-lm2
Default value: 0
-mix-lm3: third LM to mix in
-mix-lambda3: mixture weight for -mix-lm3
Default value: 0
-mix-lm4: fourth LM to mix in
-mix-lambda4: mixture weight for -mix-lm4
Default value: 0
-mix-lm5: fifth LM to mix in
-mix-lambda5: mixture weight for -mix-lm5
Default value: 0
-mix-lm6: sixth LM to mix in
-mix-lambda6: mixture weight for -mix-lm6
Default value: 0
-mix-lm7: seventh LM to mix in
-mix-lambda7: mixture weight for -mix-lm7
Default value: 0
-mix-lm8: eighth LM to mix in
-mix-lambda8: mixture weight for -mix-lm8
Default value: 0
-mix-lm9: ninth LM to mix in
-mix-lambda9: mixture weight for -mix-lm9
Default value: 0
-context-priors: context-dependent mixture weights file
-loglinear-mix: use log-linear mixture LM
-read-mix-lms: read mixture LMs from -lm file
-maxent: Read a maximum entropy model
-mix-maxent: Mixed LMs in the interpolation scheme are maximum entropy models
-maxent-convert-to-arpa: Convert maxent model to backoff model
-null: use a null language model
-cache: history length for cache language model
Default value: 0
-cache-lambda: interpolation weight for -cache
Default value: 0.05
-dynamic: interpolate with a dynamic lm
-hmm: use HMM of n-grams model
-count-lm: use a count-based LM
-msweb-lm: use Microsoft Web LM
-adapt-mix: use adaptive mixture of n-grams model
-adapt-decay: history likelihood decay factor
Default value: 1
-adapt-iters: EM iterations for adaptive mix
Default value: 2
-adapt-marginals: unigram marginals to adapt base LM to
-base-marginals: unigram marginals of base LM
-adapt-marginals-beta: marginals adaptation weight
Default value: 0.5
-adapt-marginals-ratios: compute ratios between marginals-adapted and base probs
-dynamic-lambda: interpolation weight for -dynamic
Default value: 0.05
-reverse: reverse words
-no-sos: don't insert start-of-sentence tokens
-no-eos: don't insert end-of-sentence tokens
-rescore-ngram: recompute probs in N-gram LM
-write-lm: re-write LM to file
-write-bin-lm: write LM to file in binary format
-write-oldbin-lm: write LM to file in old binary format
-write-vocab: write LM vocab to file
-renorm: renormalize backoff weights
-prune: prune redundant probs
Default value: 0
-minprune: prune only ngrams at least this long
Default value: 2
-prune-lowprobs: prune low probability N-grams
-prune-history-lm: LM used for history probabilities in pruning
-memuse: show memory usage
-nbest: nbest list file to rescore
-nbest-files: list of N-best filenames
-split-multiwords: split multiwords in N-best lists
-multi-char: multiword component delimiter
Default value: "_"
-write-nbest-dir: output directory for N-best rescoring
-decipher-nbest: output Decipher n-best format
-max-nbest: maximum number of hyps to consider
Default value: 0
-no-reorder: don't reorder N-best hyps after rescoring
-rescore: hyp stream input file to rescore
-decipher-lm: DECIPHER(TM) LM for nbest list generation
-decipher-order: ngram order for -decipher-lm
Default value: 2
-decipher-nobackoff: disable backoff hack in recognizer LM
-decipher-lmw: DECIPHER(TM) LM weight
Default value: 8
-decipher-wtw: DECIPHER(TM) word transition weight
Default value: 0
-rescore-lmw: rescoring LM weight
Default value: 8
-rescore-wtw: rescoring word transition weight
Default value: 0
-noise: noise tag to skip
-noise-vocab: noise vocabulary to skip
-help: Print this message
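As one example of combining the options above, the sketch below statically interpolates two ARPA-format models and writes out the merged result; lm1.arpa, lm2.arpa, mixed.arpa, and test.txt are placeholder file names, and -lambda is the weight given to the -lm model (0.5 by default):

ngram -order 3 -lm lm1.arpa -mix-lm lm2.arpa -lambda 0.7 -write-lm mixed.arpa
ngram -order 3 -lm mixed.arpa -ppl test.txt    # evaluate the merged model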
(3) Common commands (run them from the binary subdirectory under the lm directory, or add that directory to your PATH)
Generate the count file: ngram-count -text train.txt -order 3 -write train.txt.count
Build the language model: ngram-count -read train.txt.count -order 3 -lm LM -interpolate -kndiscount
Compute perplexity: ngram -ppl test.txt -order 3 -lm LM > result
Score each sentence individually: ngram -ppl test.txt -order 3 -lm LM -debug 1 > result (a sanity-check sketch follows)
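Once the model is built, the options from section (2) apply directly; for instance, a quick sanity check is to sample random sentences from the LM (a minimal sketch, assuming the model file is named LM as above; output goes to stdout):

ngram -lm LM -order 3 -gen 10    # print 10 sentences randomly generated from the LM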
Reference: https://blog.csdn.net/u011500062/article/details/50780935