java weka roc curves: Weka UI and API code in Java give different results

夏末的回忆

Published 2021-02-27 07:43:07

Tags: java weka roc curves

I am new to Weka.

I am running Weka through its Java API and have found that the results from the Weka GUI do not match those produced by my Java code.

I am running the RandomForest algorithm with a separate training set and test set.

Here is the code snippet:

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

// Load the training set; the class attribute is the last one.
DataSource ds = new DataSource(trainingFile);
Instances insts = ds.getDataSet();
insts.setClassIndex(insts.numAttributes() - 1);

// Configure the random forest.
Classifier cl = new RandomForest();
RandomForest rf = (RandomForest) cl;
// rf.setOptions(options);
rf.setNumFeatures(5);
rf.setSeed(1);
rf.setNumExecutionSlots(1);

// Keep only the attributes listed in attrList (Remove filter with inverted selection).
Remove remove = new Remove();
int[] attrs = WekaCustomisation.convertIntegers(attrList);
remove.setAttributeIndicesArray(attrs);
remove.setInvertSelection(true);
remove.setInputFormat(insts);
insts = Filter.useFilter(insts, remove);
insts.setClassIndex(insts.numAttributes() - 1);

// Train on the full filtered training set.
Instances train = new Instances(insts, 0, insts.numInstances());
cl.buildClassifier(train);

// Load the test set and apply the same attribute filter.
DataSource ds2 = new DataSource(testFile);
Instances instsTest = ds2.getDataSet();
remove.setInputFormat(instsTest);
instsTest = Filter.useFilter(instsTest, remove);
instsTest.setClassIndex(instsTest.numAttributes() - 1);
Instances testInstances = new Instances(instsTest);

// Evaluate on the test set, then report the summary and the ROC area for class 0.
Evaluation eval = new Evaluation(train);
eval.evaluateModel(cl, testInstances);
System.out.println(eval.toSummaryString());
out.write(eval.toSummaryString());
double roc = eval.areaUnderROC(0);

The confusion matrix produced by the Weka GUI differs from the one this code produces. What am I missing here?

Solution

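First, check that the parameters and filters used in the Weka GUI are the same as the ones in your code (take a look at the log generated by the GUI). For the snippet in the question, that means the GUI run must also use 5 features, seed 1, and the same attribute selection.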
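One way to make that comparison less error-prone is to diff the two option strings mechanically. The sketch below is a plain-Java helper and does not call Weka itself: it assumes you have copied the option string the GUI reports for the classifier and the one your code produces (for example from `rf.getOptions()` joined with `weka.core.Utils.joinOptions`). The class name `OptionDiff` and the option strings in `main` are hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class OptionDiff {
    // A token such as "-K" or "-num-slots" is a flag; "5", "1.0" or "-1" is a value.
    static boolean isFlag(String token) {
        return token.length() > 1 && token.charAt(0) == '-' && Character.isLetter(token.charAt(1));
    }

    // Parses a Weka-style option string into flag -> value ("" for switches without a value).
    static Map<String, String> parse(String options) {
        Map<String, String> parsed = new LinkedHashMap<>();
        String[] tokens = options.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (!isFlag(tokens[i])) {
                continue; // stray value without a preceding flag
            }
            String flag = tokens[i];
            String value = "";
            if (i + 1 < tokens.length && !isFlag(tokens[i + 1])) {
                value = tokens[++i];
            }
            parsed.put(flag, value);
        }
        return parsed;
    }

    // Lists every flag whose value differs between the two option strings.
    static List<String> diff(String guiOptions, String apiOptions) {
        Map<String, String> gui = parse(guiOptions);
        Map<String, String> api = parse(apiOptions);
        TreeSet<String> flags = new TreeSet<>(gui.keySet());
        flags.addAll(api.keySet());
        List<String> differences = new ArrayList<>();
        for (String flag : flags) {
            String g = gui.getOrDefault(flag, "<absent>");
            String a = api.getOrDefault(flag, "<absent>");
            if (!g.equals(a)) {
                differences.add(flag + ": gui=" + g + " api=" + a);
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        // Illustrative option strings only, not taken from the original post.
        String gui = "-I 100 -K 0 -S 1 -num-slots 1";
        String api = "-I 100 -K 5 -S 1 -num-slots 1";
        diff(gui, api).forEach(System.out::println);
    }
}
```

With the illustrative strings above, the only reported difference is the `-K` flag (number of features), which is the kind of mismatch that would change the trees and therefore the confusion matrix.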

A second possibility is the random component in how Random Forest models are built (random features of the dataset are selected for each decision tree). As a result, training on the same training set can produce different models, for instance when the seed differs between the GUI run and the code, and evaluating those models on the test set then gives different results.
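The effect of the seed can be illustrated with a toy sketch in plain Java. This is not Weka's implementation: it only mimics the idea described above, picking a random subset of attribute indices for each tree from a seeded `java.util.Random`. The class name `FeatureSubsetDemo` and the numbers used (20 attributes, 5 features, 3 trees) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class FeatureSubsetDemo {
    // Picks numFeatures distinct attribute indices for each of numTrees trees.
    static List<List<Integer>> pickFeatures(int numAttributes, int numFeatures,
                                            int numTrees, long seed) {
        if (numFeatures > numAttributes) {
            throw new IllegalArgumentException("numFeatures exceeds numAttributes");
        }
        Random random = new Random(seed);
        List<List<Integer>> perTree = new ArrayList<>();
        for (int t = 0; t < numTrees; t++) {
            List<Integer> indices = new ArrayList<>();
            for (int i = 0; i < numAttributes; i++) {
                indices.add(i);
            }
            // Shuffle with the shared generator and keep the first numFeatures indices.
            Collections.shuffle(indices, random);
            perTree.add(new ArrayList<>(indices.subList(0, numFeatures)));
        }
        return perTree;
    }

    public static void main(String[] args) {
        List<List<Integer>> runA = pickFeatures(20, 5, 3, 1L);
        List<List<Integer>> runB = pickFeatures(20, 5, 3, 1L);
        List<List<Integer>> runC = pickFeatures(20, 5, 3, 2L);
        System.out.println("same seed, same subsets: " + runA.equals(runB));
        System.out.println("different seed, same subsets: " + runA.equals(runC));
    }
}
```

The first line always reports identical subsets, because the same seed replays the same random sequence; with a different seed the subsets will normally differ. By the same reasoning, a GUI run and an API run are only expected to agree when the seed and every other setting match.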
