Using Weka in Java: Getting Started

Bryan__


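This article shows how to use Weka to build feature vectors, train a classifier, test the classifier, and use the classifier on new data.

Step 1: Represent the problem with features (attributes)

This step corresponds to writing the header of an ARFF file (the @relation and @attribute declarations).

First, we put the features into a weka.core.FastVector (FastVector is deprecated in Weka 3.7 and later; a java.util.ArrayList<Attribute> can be used in its place).

Each feature is represented by a weka.core.Attribute object.

Here we have two numeric features, one nominal feature (blue, gray, black), and a nominal class (positive, negative).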

import weka.core.Attribute;
 import weka.core.FastVector;

 // Declare two numeric attributes
 Attribute attribute1 = new Attribute("firstNumeric");
 Attribute attribute2 = new Attribute("secondNumeric");

 // Declare a nominal attribute along with its values
 FastVector fvNominalVal = new FastVector(3);
 fvNominalVal.addElement("blue");
 fvNominalVal.addElement("gray");
 fvNominalVal.addElement("black");
 Attribute attribute3 = new Attribute("aNominal", fvNominalVal);

 // Declare the class attribute along with its values
 FastVector fvClassVal = new FastVector(2);
 fvClassVal.addElement("positive");
 fvClassVal.addElement("negative");
 Attribute classAttribute = new Attribute("theClass", fvClassVal);

 // Declare the feature vector; the order here defines the
 // attribute indices (0 to 3) used in the following steps
 FastVector fvWekaAttributes = new FastVector(4);
 fvWekaAttributes.addElement(attribute1);
 fvWekaAttributes.addElement(attribute2);
 fvWekaAttributes.addElement(attribute3);
 fvWekaAttributes.addElement(classAttribute);
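To make the ARFF correspondence concrete, here is a small plain-Java sketch (no Weka dependency) that prints the ARFF header matching the four attributes above. The `Attr` record and the `arffHeader` method are hypothetical helpers written only for this illustration, not part of the Weka API; the relation name "Rel" is the one used in step 2, and records require Java 16 or later.

```java
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java illustration of the ARFF header for the attributes in step 1.
// Attr and arffHeader are hypothetical helpers, not Weka classes or methods.
public class ArffHeaderSketch {

    // An attribute is numeric when it has no nominal values, nominal otherwise.
    record Attr(String name, List<String> nominalValues) {
        static Attr numeric(String name) {
            return new Attr(name, List.of());
        }

        static Attr nominal(String name, String... values) {
            return new Attr(name, List.of(values));
        }

        // ARFF type: "numeric" or the value set in braces, e.g. {blue,gray,black}
        String arffType() {
            return nominalValues.isEmpty()
                    ? "numeric"
                    : "{" + String.join(",", nominalValues) + "}";
        }
    }

    // Builds the @relation line, one @attribute line per attribute, and an empty @data section.
    static String arffHeader(String relation, List<Attr> attrs) {
        String attrLines = attrs.stream()
                .map(a -> "@attribute " + a.name() + " " + a.arffType())
                .collect(Collectors.joining("\n"));
        return "@relation " + relation + "\n\n" + attrLines + "\n\n@data\n";
    }

    public static void main(String[] args) {
        List<Attr> attrs = List.of(
                Attr.numeric("firstNumeric"),
                Attr.numeric("secondNumeric"),
                Attr.nominal("aNominal", "blue", "gray", "black"),
                Attr.nominal("theClass", "positive", "negative"));
        System.out.print(arffHeader("Rel", attrs));
    }
}
```

Running `main` prints the `@relation Rel` line, four `@attribute` lines (for example `@attribute aNominal {blue,gray,black}`), and an empty `@data` section that the instances of step 2 would fill.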

Step 2: Train the classifier

We need a training set of instances and a classifier.

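First, we create an empty training set (weka.core.Instances).

We name the relation "Rel" (this corresponds to the @relation name in an ARFF file).

The attribute structure is defined by the vector from step 1.

The initial capacity of the training set is set to 10 (this is only a starting size, not a maximum).

The class attribute is the fourth element of the vector from step 1; since indices start at 0, the class index is 3.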

// Create an empty training set
 Instances isTrainingSet = new Instances("Rel", fvWekaAttributes, 10);           
 // Set class index
 isTrainingSet.setClassIndex(3);

Now we fill the training set with one instance:

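import weka.core.DenseInstance;
 import weka.core.Instance;

 // Create an instance with 4 values. In Weka 3.7 and later, Instance is an
 // interface, so DenseInstance is used (Weka 3.6 code wrote new Instance(4))
 Instance iExample = new DenseInstance(4);
 iExample.setValue((Attribute)fvWekaAttributes.elementAt(0), 1.0);
 iExample.setValue((Attribute)fvWekaAttributes.elementAt(1), 0.5);
 iExample.setValue((Attribute)fvWekaAttributes.elementAt(2), "gray");
 iExample.setValue((Attribute)fvWekaAttributes.elementAt(3), "positive");

 // Add the instance to the training set
 isTrainingSet.add(iExample);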
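Internally, Weka stores every value of an instance as a double; for a nominal attribute the stored number is the index of the value in the attribute's declared value list. So `setValue(..., "gray")` stores 1.0 (blue = 0, gray = 1, black = 2), and "positive" is stored as 0.0. The plain-Java sketch below mimics that encoding for the example instance above; `encodeNominal` is a hypothetical helper for illustration, not a Weka method.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of how the example instance is encoded: one double per
// attribute, with each nominal value replaced by its index in the declared list.
// encodeNominal is a hypothetical helper, not part of the Weka API.
public class InstanceEncodingSketch {

    // Returns the position of value in values, or fails for an undeclared value.
    static double encodeNominal(List<String> values, String value) {
        int index = values.indexOf(value);
        if (index < 0) {
            throw new IllegalArgumentException("unknown nominal value: " + value);
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> colours = List.of("blue", "gray", "black");  // aNominal values
        List<String> classes = List.of("positive", "negative");   // theClass values
        double[] row = {
            1.0,                                // firstNumeric
            0.5,                                // secondNumeric
            encodeNominal(colours, "gray"),     // aNominal
            encodeNominal(classes, "positive")  // theClass
        };
        System.out.println(Arrays.toString(row)); // prints [1.0, 0.5, 1.0, 0.0]
    }
}
```

This is also why the order in which nominal values are declared in step 1 matters: it fixes the index each value is stored as.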

Finally, we choose a classifier (weka.classifiers.Classifier) and build the model. Here we use the naive Bayes classifier (weka.classifiers.bayes.NaiveBayes).

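import weka.classifiers.Classifier;
 import weka.classifiers.bayes.NaiveBayes;

 // Create a naive Bayes classifier and train it (buildClassifier throws Exception)
 Classifier cModel = new NaiveBayes();
 cModel.buildClassifier(isTrainingSet);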
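To give an idea of what `buildClassifier` and, later, `distributionForInstance` compute for naive Bayes, here is a deliberately simplified plain-Java sketch that handles nominal attributes only. Training counts class frequencies and per-class value frequencies; prediction multiplies the add-one smoothed class prior by each attribute's smoothed conditional probability and then normalizes. This is not Weka's NaiveBayes implementation: for instance, Weka models numeric attributes such as firstNumeric with a normal distribution by default, which this sketch leaves out. The class name, the int encoding of values, and the tiny training set in `main` are made up for illustration.

```java
// Simplified naive Bayes for nominal attributes only (plain Java, no Weka).
// Attribute values and classes are int indices, like Weka's nominal encoding.
// Illustration only; this is not Weka's NaiveBayes.
public class NominalNaiveBayesSketch {

    private final int numClasses;
    private final int[] numValues;       // numValues[a] = number of values of attribute a
    private final double[] classCounts;  // classCounts[c] = training rows with class c
    private final double[][][] counts;   // counts[a][c][v] = rows with class c and value v for attribute a
    private int numRows = 0;

    public NominalNaiveBayesSketch(int numClasses, int[] numValues) {
        this.numClasses = numClasses;
        this.numValues = numValues.clone();
        this.classCounts = new double[numClasses];
        this.counts = new double[numValues.length][][];
        for (int a = 0; a < numValues.length; a++) {
            counts[a] = new double[numClasses][numValues[a]];
        }
    }

    // Training step: update the counts with one labelled row.
    public void add(int[] values, int classIndex) {
        classCounts[classIndex]++;
        for (int a = 0; a < values.length; a++) {
            counts[a][classIndex][values[a]]++;
        }
        numRows++;
    }

    // Posterior P(class | values) with add-one smoothing; the entries sum to 1.
    public double[] distribution(int[] values) {
        double[] dist = new double[numClasses];
        double total = 0;
        for (int c = 0; c < numClasses; c++) {
            double p = (classCounts[c] + 1) / (numRows + numClasses);  // smoothed prior
            for (int a = 0; a < values.length; a++) {
                p *= (counts[a][c][values[a]] + 1) / (classCounts[c] + numValues[a]);
            }
            dist[c] = p;
            total += p;
        }
        for (int c = 0; c < numClasses; c++) {
            dist[c] /= total;  // normalize so the probabilities sum to 1
        }
        return dist;
    }

    public static void main(String[] args) {
        // One nominal attribute {blue=0, gray=1, black=2}; classes {positive=0, negative=1}.
        NominalNaiveBayesSketch nb = new NominalNaiveBayesSketch(2, new int[] {3});
        nb.add(new int[] {1}, 0);  // gray  -> positive
        nb.add(new int[] {1}, 0);  // gray  -> positive
        nb.add(new int[] {2}, 1);  // black -> negative
        double[] d = nb.distribution(new int[] {1});
        System.out.println("P(positive) = " + d[0] + ", P(negative) = " + d[1]);
    }
}
```

With this toy data, a gray instance gets a prior of 3/5 times 3/5 for "positive" and 2/5 times 1/4 for "negative", so after normalization "positive" is the more probable class.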

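Step 3: Test the classifier

We have created and trained a classifier; now let's test it. We need an evaluation object (weka.classifiers.Evaluation) and a test set to run the model on. Here isTestingSet is assumed to be a second Instances object built the same way as the training set in step 2 (same attributes, class index 3); the article does not show how it is filled.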

// Test the model
 Evaluation eTest = new Evaluation(isTrainingSet);
 eTest.evaluateModel(cModel, isTestingSet);

The Evaluation object can output a range of statistics:

// Print the results in the same format as the Weka Explorer
 String strSummary = eTest.toSummaryString();
 System.out.println(strSummary);

 // Get the confusion matrix (rows: actual class, columns: predicted class)
 double[][] cmMatrix = eTest.confusionMatrix();
 System.out.println(eTest.toMatrixString());
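The matrix returned by `confusionMatrix()` is indexed as [actual class][predicted class], and with the default weight of 1 per instance, the percentage of correctly classified instances in the summary is the share of the matrix on its diagonal. The following plain-Java sketch computes both from arrays of actual and predicted class indices (0 = positive, 1 = negative, following the order declared in step 1); the method names and the five example predictions are hypothetical, chosen only to show the arithmetic.

```java
import java.util.Arrays;

// Plain-Java sketch of a confusion matrix and accuracy from class indices.
// Convention: matrix[actual][predicted], the same layout as Evaluation.confusionMatrix().
public class ConfusionMatrixSketch {

    static double[][] confusionMatrix(int[] actual, int[] predicted, int numClasses) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        double[][] m = new double[numClasses][numClasses];
        for (int i = 0; i < actual.length; i++) {
            m[actual[i]][predicted[i]]++;
        }
        return m;
    }

    // Fraction of instances on the diagonal, i.e. correctly classified.
    static double accuracy(double[][] m) {
        double correct = 0, total = 0;
        for (int r = 0; r < m.length; r++) {
            for (int c = 0; c < m[r].length; c++) {
                total += m[r][c];
                if (r == c) {
                    correct += m[r][c];
                }
            }
        }
        return total == 0 ? 0 : correct / total;
    }

    public static void main(String[] args) {
        int[] actual    = {0, 0, 1, 1, 1};  // made-up true classes
        int[] predicted = {0, 1, 1, 1, 0};  // made-up predictions
        double[][] m = confusionMatrix(actual, predicted, 2);
        System.out.println(Arrays.deepToString(m));   // prints [[1.0, 1.0], [1.0, 2.0]]
        System.out.println("accuracy = " + accuracy(m)); // prints accuracy = 0.6
    }
}
```

Reading the first row: of the two actual "positive" instances, one was classified as positive and one as negative.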

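Step 4: Use the classifier

In a real application, applying the classifier to new data is the ultimate goal. Below is the simplest possible example. iUse stands for an instance built the same way as iExample in step 2; its class value can be left missing, since that is what we want to predict.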

// Tell the instance which dataset it belongs to, so that it
 // inherits the attribute information (header) of the training set
 iUse.setDataset(isTrainingSet);

 // Get the probability of each class value
 // fDistribution[0] is the probability of being "positive"
 // fDistribution[1] is the probability of being "negative"
 double[] fDistribution = cModel.distributionForInstance(iUse);
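fDistribution holds one probability per class value, in the order the values were declared in fvClassVal. To turn it into a single predicted label, take the index with the largest probability. In Weka, `cModel.classifyInstance(iUse)` returns such an index (as a double) for a nominal class, and `isTrainingSet.classAttribute().value(index)` gives its name; the plain-Java sketch below only shows the argmax step itself, with made-up probabilities.

```java
// Plain-Java sketch: pick the most probable class from a distribution array.
// classValues must follow the declaration order of the class values (fvClassVal).
public class PredictLabelSketch {

    // Index of the largest entry; the first one wins on ties.
    static int argmax(double[] distribution) {
        int best = 0;
        for (int i = 1; i < distribution.length; i++) {
            if (distribution[i] > distribution[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] classValues = {"positive", "negative"}; // same order as fvClassVal
        double[] fDistribution = {0.8, 0.2};             // made-up example values
        System.out.println(classValues[argmax(fDistribution)]); // prints positive
    }
}
```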