- ARPA format language model [slides, manpage, blog, htkbook]
- Note: the log probabilities in an ARPA file are base 10 (see the parsing sketch after this list)
- http://www.statmt.org/book/slides/07-language-models.pdf
- 4-6: Back-off and Interpolation, Stanford NLP, Dan Jurafsky & Chris Manning
- Lecture 7: Finite State Transducers, Language Modeling, and Speech Recognition Search
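A minimal sketch of the ARPA layout and the base-10 convention. The fragment and the `parse_arpa` helper below are illustrative only (invented n-grams and probabilities), not taken from any toolkit:

```python
# Tiny hand-written ARPA fragment; the n-grams and probabilities are invented.
ARPA = """\
\\data\\
ngram 1=3
ngram 2=2

\\1-grams:
-0.3010\tthe\t-0.4771
-0.6990\tcat\t-0.3010
-0.9542\tsat

\\2-grams:
-0.1761\tthe cat
-0.5229\tcat sat

\\end\\
"""

def parse_arpa(text):
    """Return {order: {ngram_tuple: (log10_prob, log10_backoff_or_None)}}."""
    model, order = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("\\data\\") or line.startswith("ngram "):
            continue                        # blank lines and the header block
        if line == "\\end\\":
            break
        if line.endswith("-grams:"):        # section marker such as "\2-grams:"
            order = int(line.lstrip("\\").split("-")[0])
            model[order] = {}
            continue
        fields = line.split("\t")
        logp, words = float(fields[0]), tuple(fields[1].split())
        backoff = float(fields[2]) if len(fields) > 2 else None
        model[order][words] = (logp, backoff)
    return model

lm = parse_arpa(ARPA)
log10_p, _ = lm[2][("the", "cat")]
print(10 ** log10_p)   # log probs are base 10, so invert with 10**x, not exp(x)
```

Each line in an n-gram section is `log10(prob) <tab> n-gram [<tab> log10(backoff weight)]`; the highest-order entries, and words that never start a longer n-gram, carry no backoff weight.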
Back-off model:
$$
S(w_i \mid w_{i-k+1}^{i-1}) =
\begin{cases}
\dfrac{C(w_{i-k+1}^{i})}{C(w_{i-k+1}^{i-1})} & \text{if } C(w_{i-k+1}^{i}) > 0 \\
\alpha \cdot S(w_i \mid w_{i-k+2}^{i-1}) & \text{otherwise}
\end{cases}
$$
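This is the "stupid backoff" style of recursion: use the relative frequency when the full n-gram was seen, otherwise drop the oldest context word and pay a constant penalty α. A minimal Python sketch, assuming raw counts are kept in a dict keyed by n-gram tuples and using α = 0.4 (the constant suggested by Brants et al.; the counts and helper name here are illustrative):

```python
def score(word, context, counts, alpha=0.4):
    """Back-off score S(word | context); counts maps n-gram tuples to raw frequencies."""
    if not context:
        # base case: unigram relative frequency
        total = sum(c for ngram, c in counts.items() if len(ngram) == 1)
        return counts.get((word,), 0) / total
    full = context + (word,)
    if counts.get(full, 0) > 0:
        # full n-gram observed: plain relative frequency
        return counts[full] / counts[context]
    # otherwise shorten the context and apply the penalty alpha
    return alpha * score(word, context[1:], counts, alpha)

# toy counts (invented)
counts = {
    ("the",): 5, ("cat",): 3, ("sat",): 2,
    ("the", "cat"): 2, ("cat", "sat"): 1,
}
print(score("cat", ("the",), counts))   # seen bigram: 2/5 = 0.4
print(score("sat", ("the",), counts))   # unseen bigram: 0.4 * 2/10 = 0.08
```

Unlike a normalized back-off model stored in ARPA format, these scores are not probabilities and need no per-context backoff weights; the constant α plays that role.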
Toolkits:
Libraries:
- NGram
  - Create and manipulate n-gram language models encoded as weighted FSTs.
- Thrax
  - Compile regular expressions and context-dependent rewrite grammars into weighted FSTs.