1. Chinese word segmentation package: Stanford Word Segmenter, based on a CRF (conditional random field) model.
Implementation paper: Huihsin Tseng, Pichuan Chang, Galen Andrew, Daniel Jurafsky and Christopher Manning. 2005. A Conditional Random Field Word Segmenter. In Fourth SIGHAN Workshop on Chinese Language Processing.
2. Part-of-speech tagging package: Stanford Log-linear Part-Of-Speech Tagger, based on a cyclic dependency network, which is similar to a CRF.
Implementation paper: Kristina Toutanova, Dan Klein, Christopher Manning, and Yoram Singer. 2003. Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network. In Proceedings of HLT-NAACL 2003, pp. 252-259.
3. Named entity recognition: Stanford Named Entity Recognizer. Older versions were based on a CMM (conditional Markov model, i.e. MEMM); newer versions are based on a CRF with Gibbs sampling.
Related paper: Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005), pp. 363-370. http://nlp.stanford.edu/~manning/papers/gibbscrf3.pdf
4. Classifier: Stanford Classifier, based on maximum entropy.
Implementation paper: Christopher Manning and Dan Klein. 2003. Optimization, Maxent Models, and Conditional Estimation without Magic. Tutorial at HLT-NAACL 2003 and ACL 2003.
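
To make the "maximum entropy" idea in item 4 more concrete, here is a minimal Python sketch of how a log-linear (maxent) model turns feature weights into class probabilities. It is not the Stanford Classifier API (that tool is written in Java); the function name maxent_probabilities, the feature names, and the weights are all hypothetical and chosen only for illustration.

```python
import math


def maxent_probabilities(features, weights, labels):
    """Return P(label | features) under a log-linear (maximum entropy) model.

    features: list of active feature names for one input
    weights:  dict mapping (feature, label) -> weight (hypothetical values)
    labels:   list of candidate labels
    """
    # Score each label as the sum of the weights of its active features.
    scores = {
        label: sum(weights.get((feat, label), 0.0) for feat in features)
        for label in labels
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores.values())
    exp_scores = {label: math.exp(s - max_score) for label, s in scores.items()}
    # Normalize so the probabilities sum to 1.
    z = sum(exp_scores.values())
    return {label: e / z for label, e in exp_scores.items()}


# Toy weights, invented for this example only.
weights = {
    ("contains_goal", "sports"): 1.5,
    ("contains_team", "sports"): 0.5,
    ("contains_vote", "politics"): 2.0,
}
labels = ["sports", "politics"]
probs = maxent_probabilities(["contains_goal", "contains_team"], weights, labels)
print(probs)
```

In a real system the weights are learned by maximizing the conditional likelihood of labeled training data, which is the topic of the tutorial cited above; the sketch only shows the prediction step.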