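The first thing to say is that LibSVM is a library ("Lib" is plainly short for "library"), yet some people somehow take it to be an algorithm. It was developed by Chih-Chung Chang, Chih-Jen Lin, and others from Taiwan, China, and they have implemented it in several programming languages.
I am putting this post in the Weka development series because it is mainly about how LibSVM works together with Weka. It is not that Weka has no SVM of its own: Weka ships with an implementation of the SMO algorithm.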
Weka and LibSVM are two efficient software tools for building SVM classifiers. Each one of these two tools has its points of strength and weakness. Weka has a GUI and produces many useful statistics (e.g. confusion matrix, precision, recall, F-measure, and ROC scores). LibSVM runs much faster than Weka SMO and supports several SVM methods (e.g. One-class SVM, nu-SVM, and R-SVM). Weka LibSVM (WLSVM) combines the merits of the two tools. WLSVM can be viewed as an implementation of the LibSVM running under Weka environment.
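I copied the paragraph above. Note the sentence saying that LibSVM runs much faster than Weka's SMO: if you dare to train SMO on a large data set, you will find out what "until the end of time" really means. The other main reason to combine LibSVM with Weka, in my view, is that we usually start our experiments with other algorithms (or need several base classifiers anyway), for example Naïve Bayes (LibSVM is much faster than SMO, but next to Naïve Bayes it is still a snail), and only later want to try LibSVM as well. At that point the LibSVM library is needed.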
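Another point that is easy to misunderstand: newer versions of Weka do contain a LibSVM classifier, but if you try to run it directly it fails with an error telling you that the path has not been set. The reason is this: WLSVM can be viewed as an implementation of the LibSVM running under Weka environment. In other words, the class inside Weka is only a wrapper, and it still needs the LibSVM library itself on the classpath.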
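Before running the classifier, it can help to check whether the LibSVM library is actually visible to the JVM. The sketch below uses only standard Java reflection. The class name `libsvm.svm` is my assumption about what the LibSVM jar provides, and `LibSvmClasspathCheck` and `isOnClasspath` are hypothetical names made up for this illustration, not part of Weka or LibSVM.

```java
public class LibSvmClasspathCheck {

    // Assumed name of the core class shipped in the LibSVM jar (not confirmed by this post).
    public static final String LIBSVM_CORE_CLASS = "libsvm.svm";

    /** Returns true if the named class can be located on the current classpath. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize = false: only locate the class, do not run its static initializers
            Class.forName(className, false, LibSvmClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isOnClasspath(LIBSVM_CORE_CLASS)) {
            System.out.println("LibSVM library found on the classpath.");
        } else {
            System.out.println("LibSVM library not found; add the LibSVM jar to the classpath.");
        }
    }
}
```

If the second message is printed, the wrapper inside Weka has nothing to delegate to, which would match the "path not set" error described above.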
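First, download LibSVM. Last time someone actually asked me where to download it. I really do not want to answer questions like that; isn't it just a Google search away? Here is the URL anyway: http://www.csie.ntu.edu.tw/~cjlin/libsvm/. Download WLSVM; after unpacking it you will find a LibSVM.jar package in the Lib folder. Import it the same way you import Weka.jar, and from then on LibSVM is used exactly like any other classifier. While we are at it, the code below also reviews feature selection.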
package com.cizito.weka.study;
import java.util.Random;
import weka.attributeSelection.CfsSubsetEval;
import weka.attributeSelection.GreedyStepwise;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.functions.LibLINEAR;
import weka.classifiers.functions.LibSVM;
import weka.classifiers.meta.AttributeSelectedClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.attribute.AttributeSelection;
/**
* @author zhangwei
*
*/
public class LibSVMTest {
    private Instances m_instances = null;   // full data set loaded from the ARFF file
    private Instances selectedIns;          // data set after attribute selection

    public static void main( String[] args ) throws Exception {
        LibSVMTest filter = new LibSVMTest();
        filter.getFileInstances( "D:/ProgramFiles/Weka-3-6/data/soybean.arff");
        filter.selectAttUseFilter();
        filter.selectAttUseMC();
    }

    // Load the data set and use the last attribute as the class attribute
    public void getFileInstances( String fileName ) throws Exception {
        DataSource frData = new DataSource( fileName );
        m_instances = frData.getDataSet();
        m_instances.setClassIndex( m_instances.numAttributes() - 1 );
    }

    // Attribute selection applied as a filter: produces a reduced copy of the data
    public void selectAttUseFilter() throws Exception {
        // Note: this is the AttributeSelection from package weka.filters.supervised.attribute
        AttributeSelection filter = new AttributeSelection();
        CfsSubsetEval eval = new CfsSubsetEval();
        GreedyStepwise search = new GreedyStepwise();
        filter.setEvaluator( eval );
        filter.setSearch( search );
        // setInputFormat must be called after the evaluator and search are set
        filter.setInputFormat( m_instances );
        System.out.println( "number of instance attribute = " + m_instances.numAttributes() );
        selectedIns = Filter.useFilter( m_instances, filter );
        System.out.println( "number of selected instance attribute = " + selectedIns.numAttributes() );
        // Print every instance of the reduced data set
        for( int i = 0; i < selectedIns.numInstances(); i++ ) {
            System.out.println( selectedIns.instance( i ) );
        }
    }

    // Attribute selection wrapped in a meta-classifier, evaluated by cross-validation
    public void selectAttUseMC() throws Exception {
        AttributeSelectedClassifier classifier = new AttributeSelectedClassifier();
        CfsSubsetEval eval = new CfsSubsetEval();
        GreedyStepwise search = new GreedyStepwise();
        // Alternative base classifiers; pass any one of them to setClassifier() to compare
        J48 base = new J48();
        NaiveBayes nb = new NaiveBayes();
        LibSVM svm = new LibSVM();
        LibLINEAR linear = new LibLINEAR();
        classifier.setClassifier( svm );
        classifier.setEvaluator( eval );
        classifier.setSearch( search );
        // 10-fold cross-validation of the meta-classifier (not the bare svm) on the full
        // data set, so that attribute selection is repeated inside every training fold
        Evaluation evaluation = new Evaluation( m_instances );
        evaluation.crossValidateModel( classifier, m_instances, 10, new Random(1) );
        System.out.println( "Accuracy: " + (1 - evaluation.errorRate()) );
        System.out.println( evaluation.toSummaryString() );
    }
}
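For readers unfamiliar with what the 10-fold cross-validation call is doing, here is a rough, Weka-free sketch of the general idea: shuffle the instance indices with a fixed seed, deal them into k folds, and let each fold serve once as the test set while the rest is used for training. This is a simplified illustration, not Weka's actual implementation (it does no stratification, for example). `KFoldSketch`, `makeFolds`, and the data set size of 25 are hypothetical choices made only for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class KFoldSketch {

    /** Shuffles the indices 0..n-1 with the given seed and deals them into k folds. */
    public static List<List<Integer>> makeFolds(int n, int k, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));

        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        // Round-robin assignment keeps the fold sizes within one of each other
        for (int i = 0; i < n; i++) {
            folds.get(i % k).add(indices.get(i));
        }
        return folds;
    }

    public static void main(String[] args) {
        int n = 25; // hypothetical number of instances
        List<List<Integer>> folds = makeFolds(n, 10, 1L);
        for (int f = 0; f < folds.size(); f++) {
            int testSize = folds.get(f).size();
            // Each fold is the test set once; all remaining instances form the training set
            System.out.println("fold " + f + ": test=" + testSize + ", train=" + (n - testSize));
        }
    }
}
```

The reported accuracy is then computed over all test predictions pooled across the folds, which is why every instance is tested exactly once.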
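To go the extra mile, I am also uploading the LibSVM package. And one more thing worth knowing: newer versions of WEKA already include the LibSVM classifier.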