An ARPA n-gram language model file looks like this:
\data\
ngram 1=64000
ngram 2=522530
ngram 3=173445
\1-grams:
-5.24036 'cause -0.2084827
-4.675221 'em -0.221857
-4.989297 'n -0.05809768
-5.365303 'til -0.1855581
-2.111539 </s> 0.0
-99 <s> -0.7736475
-1.128404 <unk> -0.8049794
-2.271447 a -0.6163939
-5.174762 a's -0.03869072
-3.384722 a. -0.1877073
-5.789208 a.'s 0.0
-6.000091 aachen 0.0
-4.707208 aaron -0.2046838
-5.580914 aaron's -0.06230035
-5.789208 aarons -0.07077657
-5.881973 aaronson -0.2173971
For a detailed description, see:
The ARPA n-gram language model format
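As a rough illustration of how the sample above can be read: each line in an n-gram section holds a log10 probability, then the n words, then an optional log10 backoff weight. The Python sketch below parses that layout. The names `NGramEntry` and `parse_arpa` are hypothetical and used only for illustration; they are not part of any toolkit, and the sketch assumes fields are separated by whitespace.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class NGramEntry:
    # Hypothetical container for one line of an n-gram section.
    log10_prob: float                # first column: log10 probability
    words: Tuple[str, ...]           # the n words of the entry
    log10_backoff: Optional[float]   # last column; may be absent


def parse_arpa(text: str) -> Tuple[Dict[int, int], Dict[int, List[NGramEntry]]]:
    """Parse ARPA text into (declared counts, entries grouped by order)."""
    counts: Dict[int, int] = {}
    entries: Dict[int, List[NGramEntry]] = {}
    order = 0  # 0 means we are still in the \data\ header
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "\\data\\":
            continue
        if line == "\\end\\":
            break
        if order == 0 and line.startswith("ngram "):
            # Header line such as "ngram 1=64000"
            n, count = line[len("ngram "):].split("=")
            counts[int(n)] = int(count)
            continue
        if line.startswith("\\") and line.endswith("-grams:"):
            # Section marker such as "\1-grams:"
            order = int(line[1:-len("-grams:")])
            entries[order] = []
            continue
        if order == 0:
            continue  # ignore anything else before the first section
        fields = line.split()
        if len(fields) == order + 2:
            words, backoff = tuple(fields[1:-1]), float(fields[-1])
        elif len(fields) == order + 1:
            words, backoff = tuple(fields[1:]), None
        else:
            raise ValueError(f"unexpected field count in line: {line!r}")
        entries[order].append(NGramEntry(float(fields[0]), words, backoff))
    return counts, entries


# A tiny made-up model in the same layout as the sample above.
SAMPLE = "\n".join([
    "\\data\\",
    "ngram 1=3",
    "",
    "\\1-grams:",
    "-2.111539 </s> 0.0",
    "-99 <s> -0.7736475",
    "-2.271447 a -0.6163939",
    "",
    "\\end\\",
])

counts, entries = parse_arpa(SAMPLE)
for entry in entries[1]:
    print(entry.words, entry.log10_prob, entry.log10_backoff)
```

Grouping the entries by order mirrors the file layout, where each `\N-grams:` section lists all entries of one order.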
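An entire ARPA LM is made up of many n-gram entries. The data structures for the two (the n-gram entry and the ARPA LM) are described in turn below.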
1. The n-gram data structure
n-gram