Open-source language modeling toolkits:
- SRILM (http://www.speech.sri.com/projects/srilm/)
- IRSTLM (http://hlt.fbk.eu/en/irstlm)
- MITLM (http://code.google.com/p/mitlm/)
- BerkeleyLM (http://code.google.com/p/berkeleylm/)
Open n-gram datasets:
- Google Web 1T 5-gram (http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html)
- Google Books Ngrams (http://books.google.com/ngrams/)
- Chinese Web 5-gram (http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=LDC2010T06)