I never managed to find a reasonably intuitive, simple Java implementation of CRF models, so in the end I followed this blog post:
That post explains the details very clearly. To use CRF++ on Windows, you can download a precompiled build from the following address:
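It is still better to track down the original download location yourself. For getting past the firewall, I can recommend this tool: http://honx.in/i/VKEHUuz5NEy3Un5i . I have been using it all along and it works reasonably well.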
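CRF++ reads its training data in a column format: one token per line, columns separated by whitespace, the label in the last column, and an empty line between sentences. A minimal sketch of converting whitespace-segmented text into that format could look like the following. The B/M/E/S character tagging scheme and the function names `to_crf_rows` and `format_for_crfpp` are assumptions made for illustration; this post does not state which tag set was actually used.

```python
def to_crf_rows(segmented_line):
    """Turn one whitespace-segmented sentence into (character, tag) rows.

    Assumed tag set (not confirmed by the post):
    S = single-character word, B = begin, M = middle, E = end.
    """
    rows = []
    for word in segmented_line.split():
        if len(word) == 1:
            rows.append((word, "S"))
        else:
            rows.append((word[0], "B"))
            rows.extend((ch, "M") for ch in word[1:-1])
            rows.append((word[-1], "E"))
    return rows


def format_for_crfpp(lines):
    """Render sentences as CRF++ column data.

    One "char<TAB>tag" line per character, with an empty line
    after every sentence. Empty input lines are skipped.
    """
    out = []
    for line in lines:
        rows = to_crf_rows(line)
        if rows:
            out.append("\n".join(f"{ch}\t{tag}" for ch, tag in rows) + "\n\n")
    return "".join(out)


if __name__ == "__main__":
    print(format_for_crfpp(["共同 创造 美好 的 新世纪"]), end="")
```

The resulting text can be written to a UTF-8 file and passed to `crf_learn` together with a feature template file.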
Finally, here are the results of testing on the PKU dataset from the bakeoff.
Closed test (evaluated on the training set):
=== SUMMARY:
=== TOTAL INSERTIONS:987
=== TOTAL DELETIONS:1312
=== TOTAL SUBSTITUTIONS:2277
=== TOTAL NCHANGE:4576
=== TOTAL TRUE WORD COUNT:1109947
=== TOTAL TEST WORD COUNT:1109622
=== TOTAL TRUE WORDS RECALL:0.997
=== TOTAL TEST WORDS PRECISION:0.997
=== F MEASURE:0.997
=== OOV Rate:0.000
=== OOV Recall Rate:--
=== IV Recall Rate:0.997
### pku_crf_training.word.utf8 987 1312 2277 4576 1109947 1109622 0.997 0.997 0.997 0.000 -- 0.997
Open test (evaluated on gold/test):
=== SUMMARY:
=== TOTAL INSERTIONS:1492
=== TOTAL DELETIONS:3150
=== TOTAL SUBSTITUTIONS:4966
=== TOTAL NCHANGE:9608
=== TOTAL TRUE WORD COUNT:104372
=== TOTAL TEST WORD COUNT:102714
=== TOTAL TRUE WORDS RECALL:0.922
=== TOTAL TEST WORDS PRECISION:0.937
=== F MEASURE:0.930
=== OOV Rate:0.058
=== OOV Recall Rate:0.562
=== IV Recall Rate:0.944
### pku_crf_test.word.utf8 1492 3150 4966 9608 104372 102714 0.922 0.937 0.930 0.058 0.562 0.944
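As a sanity check on the two reports above, the sketch below recomputes the derived fields from the printed ones. It assumes that NCHANGE is the sum of insertions, deletions and substitutions, and that F MEASURE is the usual harmonic mean of precision and recall; the dictionary layout and the helper names are hypothetical. Because the printed precision and recall are already rounded to three decimals, the recomputed F for the test set comes out as 0.929 rather than the reported 0.930, presumably because the scoring script works from unrounded values.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F1)."""
    return 2 * precision * recall / (precision + recall)


def nchange(report):
    """Total edits, assumed to be insertions + deletions + substitutions."""
    return report["insertions"] + report["deletions"] + report["substitutions"]


# Values copied from the two score summaries above.
closed = {"insertions": 987, "deletions": 1312, "substitutions": 2277,
          "nchange": 4576, "recall": 0.997, "precision": 0.997, "f": 0.997}
opened = {"insertions": 1492, "deletions": 3150, "substitutions": 4966,
          "nchange": 9608, "recall": 0.922, "precision": 0.937, "f": 0.930}

if __name__ == "__main__":
    for name, rep in (("closed", closed), ("open", opened)):
        f = f_measure(rep["precision"], rep["recall"])
        print(name,
              "nchange ok:", nchange(rep) == rep["nchange"],
              "recomputed F:", round(f, 3),
              "reported F:", rep["f"])
```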
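Even on the open test, precision reaches 93.7% (F-measure 0.930), and that is without any tuning: everything uses the default CRF++ feature template.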