Training an ARPA model with SRILM
Training a model with the SRI Language Modeling Toolkit (SRILM) is straightforward, which is why we recommend it. SRILM is also one of the most advanced toolkits available to date. To train a model, use the following command:
ngram-count -kndiscount -interpolate -text train-text.txt -lm your.lm
You can prune the model afterwards to reduce its size:
ngram -lm your.lm -prune 1e-8 -write-lm your-pruned.lm
After training, it is worth testing the model's perplexity on the test data:
ngram -lm your.lm -ppl test-text.txt
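The three steps above can be chained in a small script, which also makes it easy to compare perplexity before and after pruning. This is only a sketch: the variable names TRAIN, TEST, LM, and PRUNED and the function name run_srilm_pipeline are hypothetical, and it assumes the SRILM binaries are already on your PATH.

```shell
#!/bin/sh
# Sketch: train, prune, and evaluate an ARPA model with SRILM.
# TRAIN, TEST, LM, and PRUNED are illustrative names, not SRILM settings.
TRAIN=${TRAIN:-train-text.txt}
TEST=${TEST:-test-text.txt}
LM=${LM:-your.lm}
PRUNED=${PRUNED:-${LM%.lm}-pruned.lm}

run_srilm_pipeline() {
    # Stop early if the SRILM tools are not installed or not on PATH.
    if ! command -v ngram-count >/dev/null 2>&1 ||
       ! command -v ngram >/dev/null 2>&1; then
        echo "SRILM tools (ngram-count, ngram) not found on PATH" >&2
        return 1
    fi

    # Train with interpolated Kneser-Ney discounting.
    ngram-count -kndiscount -interpolate -text "$TRAIN" -lm "$LM" || return 1

    # Prune the model and write the smaller copy to a separate file.
    ngram -lm "$LM" -prune 1e-8 -write-lm "$PRUNED" || return 1

    # Report perplexity for both models so the cost of pruning is visible.
    ngram -lm "$LM" -ppl "$TEST" || return 1
    ngram -lm "$PRUNED" -ppl "$TEST"
}
```

Call run_srilm_pipeline after sourcing the script. If the pruned model's perplexity rises noticeably, a smaller pruning threshold may be a better trade-off.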
When you later use the model to generate G.fst, arpa2fst may fail with the following error:
ERROR (arpa2fst[5.2.2~1393-a57ea]:Read():arpa-file-parser.cc:129) line 1 []: \data\ section missing or empty.
Solution:
Compress the model, then use the resulting your.lm.gz to generate G.fst:
gzip your.lm
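Before compressing, it can help to confirm that the model file itself is intact. The sketch below checks for the \data\ header that the error message refers to and writes a gzipped copy alongside the original (plain gzip your.lm replaces the file instead). The function names has_data_section and compress_lm are hypothetical helpers, not part of SRILM or Kaldi. One possible explanation for the error, stated here as an assumption, is that the script building G.fst decompresses its input with gunzip -c, so an uncompressed file reaches arpa2fst as empty input.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not SRILM or Kaldi commands.

# Succeeds if the (optionally gzipped) ARPA file contains a \data\ header line.
has_data_section() {
    case "$1" in
        *.gz) gzip -cd "$1" ;;
        *)    cat "$1" ;;
    esac | grep '^\\data\\' >/dev/null
}

# Writes FILE.gz next to FILE and keeps the uncompressed original.
compress_lm() {
    gzip -c "$1" > "$1.gz"
}
```

For example, has_data_section your.lm && compress_lm your.lm produces your.lm.gz only when the header is present. If the check fails on the uncompressed file, the problem is in the model itself and retraining is the more likely fix.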