The parameters in the LingPipe test demo are set as follows:
mZipFile = new File(args[0],"icwb2-data.zip");
mCorpusName =args[1];
mOutputFile = new File(mCorpusName + ".segments");
mKnownToksFile = new File(mCorpusName + ".knownWords");
mMaxNGram = Integer.valueOf(args[2]);
mLambdaFactor = Double.valueOf(args[3]);
mNumChars = Integer.valueOf(args[4]);
mMaxNBest = Integer.valueOf(args[5]);
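All six values are read straight from args[0] through args[5], so running the demo without command-line arguments (for example from an IDE) throws ArrayIndexOutOfBoundsException. As an alternative to editing the source, here is a minimal sketch that keeps reading arguments but falls back to defaults when one is missing. The class name DemoArgs is hypothetical and not part of LingPipe; the numeric defaults are the values used later in this note, and "." as the directory holding icwb2-data.zip is an assumption.

```java
import java.io.File;

/**
 * Hypothetical helper (not part of LingPipe): reads the six demo arguments
 * and falls back to default values when an argument is missing.
 */
final class DemoArgs {
    final File zipFile;
    final String corpusName;
    final File outputFile;
    final File knownToksFile;
    final int maxNGram;
    final double lambdaFactor;
    final int numChars;
    final int maxNBest;

    DemoArgs(String[] args) {
        // args[0]: directory containing icwb2-data.zip (assumed default: current dir)
        zipFile = new File(arg(args, 0, "."), "icwb2-data.zip");
        // args[1]: corpus name such as "pku"; also used to name the output files
        corpusName = arg(args, 1, "pku");
        outputFile = new File(corpusName + ".segments");
        knownToksFile = new File(corpusName + ".knownWords");
        // args[2..5]: numeric settings, defaulting to values that ran successfully
        maxNGram = Integer.parseInt(arg(args, 2, "5"));
        lambdaFactor = Double.parseDouble(arg(args, 3, "5.0"));
        numChars = Integer.parseInt(arg(args, 4, "4000"));
        maxNBest = Integer.parseInt(arg(args, 5, "128"));
    }

    // Returns args[i] if it was supplied, otherwise the given default.
    private static String arg(String[] args, int i, String dflt) {
        return i < args.length ? args[i] : dflt;
    }

    public static void main(String[] args) {
        DemoArgs a = new DemoArgs(args);
        System.out.println("zip=" + a.zipFile + " corpus=" + a.corpusName
                + " nGram=" + a.maxNGram + " lambda=" + a.lambdaFactor
                + " chars=" + a.numChars + " nBest=" + a.maxNBest);
    }
}
```

Arguments that are supplied still win, so `java DemoArgs data msr 3` overrides only the zip directory, the corpus name and the n-gram length.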
Change the parameters as follows to get the code running:
mZipFile = new File("D:/..../...../lingpipe-4.1.0/demos/tutorial/chineseTokens/icwb2-data.zip");
mCorpusName = "pku";
mOutputFile = new File("D:/../.../lingpipe-4.1.0/demos/tutorial/chineseTokens/msr_test_output" + ".segments");
mKnownToksFile = new File(mCorpusName + ".knownWords");
mMaxNGram = 5;
mLambdaFactor = 5.0;
mNumChars = 4000;
mMaxNBest = 128;
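The zip file named by the first path does not need to be unzipped. The second parameter, the corpus name, can be changed as needed, and so can the output path. The remaining values are ones I picked myself; I don't know their valid ranges, but with these settings the program runs and produces output.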
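To check that the archive is usable as-is, its entries can be listed in place with the JDK's java.util.zip.ZipFile. This is a standalone sanity check, not how the demo itself reads the data. The class name ZipPeek is hypothetical, and since the exact entry names inside icwb2-data.zip are not given here, the filter simply matches the corpus name as a substring.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class ZipPeek {
    // Lists the names of file entries whose name contains the corpus name,
    // reading the archive's directory without extracting anything to disk.
    static List<String> corpusEntries(File zip, String corpusName) throws IOException {
        try (ZipFile zf = new ZipFile(zip)) {
            return zf.stream()
                     .filter(e -> !e.isDirectory())
                     .map(ZipEntry::getName)
                     .filter(name -> name.contains(corpusName))
                     .sorted()
                     .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to icwb2-data.zip; args[1]: corpus name such as "pku"
        for (String name : corpusEntries(new File(args[0]), args[1])) {
            System.out.println(name);
        }
    }
}
```

If an individual file's contents are needed, ZipFile.getInputStream(entry) returns a stream over that entry, again without unpacking the archive.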
======================================================
The demo also includes a ChineseTokener test program, and its parameters are changed in much the same way:
Data Directory=e:\..\...
Train Corpus Name=msr
Test Corpus Name=pku
Output File Name=E:\..
Known Tokens File Name=E:\..\..
Char Encoding=Big5_HKSCS
Max N-gram=5
Lambda factor=5.0
Num chars=3000
Max n-best=128
Continue weight=0.0
Break weight=0.0
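These settings are Name=value pairs. If you would rather keep them in a text file than edit the source, splitting each line at the first '=' is enough. java.util.Properties is not a drop-in choice here: it ends a key at the first unescaped whitespace (so "Data Directory" would become the key "Data") and treats backslashes as escape characters, which mangles Windows paths such as e:\... . The class DemoSettings below is a hypothetical sketch; nothing in this note says the demo reads such a file itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DemoSettings {
    // Parses "Name=value" lines, splitting at the first '=' only, so keys may
    // contain spaces ("Max N-gram") and values may contain backslashes.
    static Map<String, String> parse(String text) {
        Map<String, String> settings = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            int eq = line.indexOf('=');
            if (eq <= 0) continue; // skip blank lines and "=====" separators
            settings.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
        }
        return settings;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "Train Corpus Name=msr",
                "Test Corpus Name=pku",
                "Char Encoding=Big5_HKSCS",
                "Max N-gram=5",
                "Lambda factor=5.0",
                "Num chars=3000",
                "Max n-best=128",
                "Continue weight=0.0",
                "Break weight=0.0");
        Map<String, String> s = parse(text);
        // Convert the numeric settings to the types the demo fields use.
        int maxNGram = Integer.parseInt(s.get("Max N-gram"));
        double lambda = Double.parseDouble(s.get("Lambda factor"));
        System.out.println(s.get("Train Corpus Name") + " -> " + s.get("Test Corpus Name")
                + ", n-gram " + maxNGram + ", lambda " + lambda);
    }
}
```

A LinkedHashMap keeps the settings in file order, which makes it easy to print them back in the same layout shown above.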