java weka roc curves: Weka UI and API code in Java give different results

I am new to Weka.

I am trying to run WEKA through its Java API and have found that the results from the WEKA GUI do not match the ones produced by my Java code.

I am running a RandomForest classifier, providing a training set and a separate test set.

Here is the code snippet:

DataSource ds = new DataSource(trainingFile);
Instances insts = ds.getDataSet();
insts.setClassIndex(insts.numAttributes() - 1);

Classifier cl = new RandomForest();
RandomForest rf = (RandomForest)cl;
// rf.setOptions(options);
// rf.setNumExecutionSlots(1);
rf.setNumFeatures(5);
rf.setSeed(1);
rf.setNumExecutionSlots(1);

Remove remove = new Remove();
int[] attrs = WekaCustomisation.convertIntegers(attrList);
remove.setAttributeIndicesArray(attrs);
remove.setInvertSelection(true);
remove.setInputFormat(insts);
insts = weka.filters.Filter.useFilter(insts, remove);
insts.setClassIndex(insts.numAttributes() - 1);

weka.core.Instances train = new weka.core.Instances(insts, 0, insts.numInstances());
cl.buildClassifier(train);

weka.core.converters.ConverterUtils.DataSource ds2 = new weka.core.converters.ConverterUtils.DataSource(testFile);
weka.core.Instances instsTest = ds2.getDataSet();
remove.setInputFormat(instsTest);
instsTest = weka.filters.Filter.useFilter(instsTest, remove);
instsTest.setClassIndex(instsTest.numAttributes() - 1);
Instances testInstances = new Instances(instsTest);

int numCorrect = 0;
weka.classifiers.Evaluation eval = new weka.classifiers.Evaluation(train);
eval.evaluateModel(cl, testInstances);
System.out.println(eval.toSummaryString());
out.write(eval.toSummaryString());
double roc = eval.areaUnderROC(0);

The confusion matrix produced by the WEKA GUI differs from the one produced by this code. What am I missing here?

Solution

First, check whether the parameters and filters applied in the Weka GUI are the same as the ones you set in the code (take a look at the log generated by the GUI).
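One way to make that comparison less error-prone is to diff the two option strings mechanically. The sketch below uses only the Java standard library. It assumes you copy the options from the "Scheme:" line of the GUI's run information and obtain the code-side string yourself, for example with weka.core.Utils.joinOptions(rf.getOptions()). The class name OptionDiff and the two option strings in main are hypothetical and only illustrate the idea.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;

final class OptionDiff {

    // Parse a flat option string such as "-I 100 -K 0 -S 1" into flag -> value.
    // A flag that is not followed by a value (a boolean switch) maps to "".
    static Map<String, String> parse(String options) {
        Map<String, String> parsed = new LinkedHashMap<>();
        String[] tokens = options.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (!isFlag(tokens[i])) {
                continue; // stray value without a preceding flag, ignore it
            }
            boolean hasValue = i + 1 < tokens.length && !isFlag(tokens[i + 1]);
            parsed.put(tokens[i], hasValue ? tokens[++i] : "");
        }
        return parsed;
    }

    // A flag is '-' followed by a letter, so negative numbers count as values.
    static boolean isFlag(String token) {
        return token.length() > 1
                && token.charAt(0) == '-'
                && Character.isLetter(token.charAt(1));
    }

    // Return flag -> "guiValue vs codeValue" for every flag whose value differs.
    static Map<String, String> diff(String guiOptions, String codeOptions) {
        Map<String, String> gui = parse(guiOptions);
        Map<String, String> code = parse(codeOptions);
        TreeSet<String> flags = new TreeSet<>(gui.keySet());
        flags.addAll(code.keySet());
        Map<String, String> differences = new LinkedHashMap<>();
        for (String flag : flags) {
            String g = gui.getOrDefault(flag, "<absent>");
            String c = code.getOrDefault(flag, "<absent>");
            if (!g.equals(c)) {
                differences.put(flag, g + " vs " + c);
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        // Hypothetical option strings, for illustration only.
        String gui = "-I 100 -K 0 -S 1";
        String code = "-I 100 -K 5 -S 1 -num-slots 1";
        diff(gui, code).forEach((flag, values) -> System.out.println(flag + ": " + values));
    }
}
```

In the question's code, rf.setNumFeatures(5) is the kind of setting this would surface if the GUI run was left at its default number of features. A flag that is absent on one side may simply be at its default there, so treat those entries as something to check rather than as proof of a mismatch.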

A second possibility is the random component in how Random Forest models are built: each decision tree is grown using randomly selected features from the dataset. If the seed or any other setting differs between the two runs, different models are generated from the same training set, and evaluating them on the test set gives different results.
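The effect of the seed can be illustrated with a toy sketch that uses only java.util.Random. This is not Weka's implementation: the class SeedDemo and the method pickFeatures are hypothetical stand-ins for the per-tree random feature selection described above. The point is only that the same seed reproduces the same random choices, while a different seed generally does not.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class SeedDemo {

    // Pick k distinct feature indices out of numFeatures, driven entirely by the seed.
    // Toy stand-in for random feature selection; not how Weka does it internally.
    static List<Integer> pickFeatures(int numFeatures, int k, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numFeatures; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        return new ArrayList<>(indices.subList(0, k));
    }

    public static void main(String[] args) {
        System.out.println("seed 1: " + pickFeatures(20, 5, 1L));
        System.out.println("seed 1: " + pickFeatures(20, 5, 1L)); // identical to the line above
        System.out.println("seed 2: " + pickFeatures(20, 5, 2L)); // generally a different subset
    }
}
```

Since the question's code already calls rf.setSeed(1) and runs with a single execution slot, it is worth confirming that the GUI run used the same seed. If it did, the remaining differences are more likely to come from the other settings or from the filtering step.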
