Using LIBSVM in MATLAB

yunlinzi

Article link: https://blog.csdn.net/yunlinzi/article/details/90295801

Introduction to LIBSVM

Preparation

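Download the LIBSVM package (LIBSVM download page) and unzip it, then run `make` in MATLAB from the package's matlab folder to build the MEX files. After that they can be called just like ordinary MATLAB functions. Model training and the later steps mainly use the following four files (functions); the .mexw64 extension is the one they get on 64-bit Windows:

libsvmread.mexw64: reads a text file in LIBSVM format into MATLAB as a label vector and a sparse instance matrix
libsvmwrite.mexw64: writes a label vector and a sparse instance matrix from MATLAB to a file in LIBSVM format
svmtrain.mexw64: trains a model on the training data
svmpredict.mexw64: uses the trained model to make predictions (classification or regression)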

Usage

LIBSVM's documentation recommends the following procedure for data mining with LIBSVM:

Transform data to the format of an SVM package;
Conduct simple scaling on the data (i.e., normalize it);
Consider the RBF kernel $K(x,y) = e^{-\gamma\|x-y\|^2}$;
Use cross-validation to find the best parameters $C$ and $\gamma$;
Use the best parameters $C$ and $\gamma$ to train on the whole training set;
Test.
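To make the kernel in step 3 concrete, here is a minimal plain-Python sketch of $K(x,y) = e^{-\gamma\|x-y\|^2}$ for two dense vectors. The function name `rbf_kernel` is our own illustration, not part of LIBSVM; LIBSVM evaluates this kernel internally when you pass `-t 2`, so in MATLAB you only choose $\gamma$ via `-g`.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2) for equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))  # squared Euclidean distance
    return math.exp(-gamma * sq_dist)


# LIBSVM's default gamma is 1/num_features (see the -g option of svmtrain).
x = [0.0, 1.0, -1.0]
y = [0.5, 1.0, 0.0]
print(rbf_kernel(x, y, gamma=1 / len(x)))
```

The kernel value is 1 for identical vectors and decays toward 0 as they move apart; a larger $\gamma$ makes it decay faster.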

The steps in detail:

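Data conversion: SVM requires all data, both features and labels, to be real numbers, so non-numeric attributes must first be converted to numeric ones. LIBSVM expects one sample per line in the following format, so arrange your data this way before use:
`<label> <index1>:<value1> <index2>:<value2> ...`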
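As an illustration of the conversion step, the plain-Python sketch below turns one dense row into a LIBSVM-format line, roughly what libsvmwrite does from MATLAB. The helper names `one_hot` and `to_libsvm_line` are hypothetical. Encoding a categorical attribute with one 0/1 slot per category is one common choice (the LIBSVM practical guide suggests it), not the only one.

```python
def one_hot(value, categories):
    """Encode a categorical value as a list of 0/1 floats, one slot per category."""
    return [1.0 if value == c else 0.0 for c in categories]


def to_libsvm_line(label, features):
    """Format one sample as '<label> <index>:<value> ...' with 1-based indices.

    Zero-valued features are skipped, so the written indices may have gaps,
    but they always come out in ascending order.
    """
    parts = [str(label)]
    for index, value in enumerate(features, start=1):
        if value != 0:
            parts.append(f"{index}:{value}")
    return " ".join(parts)


# Hypothetical sample: a colour attribute (one-hot) followed by one numeric feature.
row = one_hot("green", ["red", "green", "blue"]) + [3.5]
print(to_libsvm_line(-1, row))  # → -1 2:1.0 4:3.5
```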
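Scaling: scaling is very important. Its main purposes are:

1) to prevent features with very large or very small numeric ranges from having an unbalanced influence on training;

2) to ease computation: kernel evaluation involves inner products or exponentials, and unscaled data can cause numerical difficulties.

The target range is up to you; [0, 1] and [-1, 1] are the usual choices.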
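A minimal plain-Python sketch of linear min-max scaling, assuming dense rows. The key point is that the per-feature ranges are computed on the training set only and then reused unchanged for the test set (LIBSVM's svm-scale tool saves and restores such ranges with its `-s` and `-r` options). The function names are hypothetical, and mapping a constant feature to `lower` is an arbitrary choice made for this sketch.

```python
def fit_ranges(rows):
    """Per-feature (min, max), computed on the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale_rows(rows, ranges, lower=-1.0, upper=1.0):
    """Linearly map each feature from its training [min, max] to [lower, upper].

    Test values outside the training range land outside [lower, upper],
    which is expected. A constant feature (min == max) is mapped to lower
    here, an arbitrary choice for this sketch.
    """
    scaled = []
    for row in rows:
        new_row = []
        for value, (lo, hi) in zip(row, ranges):
            if hi == lo:
                new_row.append(lower)
            else:
                new_row.append(lower + (upper - lower) * (value - lo) / (hi - lo))
        scaled.append(new_row)
    return scaled


train = [[10.0, 200.0], [20.0, 400.0], [30.0, 300.0]]
test = [[25.0, 500.0]]
ranges = fit_ranges(train)         # ranges come from the training set only
print(scale_rows(train, ranges))   # → [[-1.0, -1.0], [0.0, 1.0], [1.0, 0.0]]
print(scale_rows(test, ranges))    # → [[0.5, 2.0]]
```

Pass `lower=0.0, upper=1.0` to scale to [0, 1] instead.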

The LIBSVM Data Sets page provides many datasets already in this format, for example the a1a dataset.

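Note: features whose value is 0 may be omitted, so the indices before the colons need not be consecutive; they must, however, appear in ascending order.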

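These datasets are already in the format LIBSVM requires, and most of them are already scaled, so the format conversion and scaling steps mainly apply to your own data. LIBSVM ships a simple scaling tool, svm-scale (svm-scale.exe in the windows folder; there is no MATLAB counterpart). A typical use, following the LIBSVM practical guide, scales the training set to [-1, 1] while saving the scaling parameters, then applies the same parameters to the test set (the file names here are placeholders):
`svm-scale -l -1 -u 1 -s range train > train.scale`
`svm-scale -r range test > test.scale`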

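After scaling, the data can be used for model training and prediction. Typically the train step produces a model, and the predict step then applies that model to the test set to obtain classification (or regression, etc.) results, as follows.
Training and prediction (these functions come in Windows, MATLAB and other versions; the MATLAB version is used below):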

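(1) Use LIBSVM's libsvmread function to load the labels and instances from a data file:
`[label_vector, instance_matrix] = libsvmread('filename');`
Here filename is the name of the data file. It can be one of the datasets above or any plain-text file (e.g. .txt) whose contents follow the format above; binary formats such as Excel workbooks cannot be read directly. label_vector is the resulting label vector and instance_matrix the resulting sparse instance matrix.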
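To show how the format is read back, here is a plain-Python sketch in the spirit of libsvmread. Unlike libsvmread, which returns a sparse matrix, it returns dense lists for readability. The name `parse_libsvm` and its `num_features` parameter are hypothetical; `num_features` lets a test file whose largest index is smaller than the training file's still produce rows of the same width.

```python
def parse_libsvm(text, num_features=None):
    """Parse LIBSVM-format text into (labels, dense rows).

    Omitted indices are zeros. Indices are 1-based and must be strictly
    ascending within a line, as LIBSVM requires. If num_features is given,
    rows are padded (or truncated) to that width.
    """
    labels, sparse_rows = [], []
    max_index = 0
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tokens = line.split()
        labels.append(float(tokens[0]))
        row, prev = {}, 0
        for token in tokens[1:]:
            idx_str, val_str = token.split(":")  # malformed tokens raise ValueError
            idx = int(idx_str)
            if idx <= prev:
                raise ValueError(f"line {line_no}: indices must be ascending")
            row[idx] = float(val_str)
            prev = idx
        max_index = max(max_index, prev)
        sparse_rows.append(row)
    n = num_features if num_features is not None else max_index
    dense = [[row.get(j, 0.0) for j in range(1, n + 1)] for row in sparse_rows]
    return labels, dense


sample = "+1 1:0.5 3:-1\n-1 2:1 4:3.5\n"
labels, X = parse_libsvm(sample)
print(labels)  # → [1.0, -1.0]
print(X)       # → [[0.5, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, 3.5]]
```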

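(2) Model training:
`model = svmtrain(training_label_vector, training_instance_matrix, 'libsvm_options');`
The options string is optional and accepts the flags below. With `-v n`, svmtrain performs n-fold cross-validation and returns a scalar (accuracy for classification, mean squared error for regression) instead of a model.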
```
    libsvm_options:
    -s svm_type : set type of SVM (default 0)
    	0 -- C-SVC		(multi-class classification)
    	1 -- nu-SVC		(multi-class classification)
    	2 -- one-class SVM
    	3 -- epsilon-SVR	(regression)
    	4 -- nu-SVR		(regression)
    -t kernel_type : set type of kernel function (default 2)
    	0 -- linear: u'*v
    	1 -- polynomial: (gamma*u'*v + coef0)^degree
    	2 -- radial basis function: exp(-gamma*|u-v|^2)
    	3 -- sigmoid: tanh(gamma*u'*v + coef0)
    	4 -- precomputed kernel (kernel values in training_instance_matrix)
    -d degree : set degree in kernel function (default 3)
    -g gamma : set gamma in kernel function (default 1/num_features)
    -r coef0 : set coef0 in kernel function (default 0)
    -c cost : set the parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 1)
    -n nu : set the parameter nu of nu-SVC, one-class SVM, and nu-SVR (default 0.5)
    -p epsilon : set the epsilon in loss function of epsilon-SVR (default 0.1)
    -m cachesize : set cache memory size in MB (default 100)
    -e epsilon : set tolerance of termination criterion (default 0.001)
    -h shrinking : whether to use the shrinking heuristics, 0 or 1 (default 1)
    -b probability_estimates : whether to train a SVC or SVR model for probability estimates, 0 or 1 (default 0)
    -wi weight : set the parameter C of class i to weight*C, for C-SVC (default 1)
    -v n: n-fold cross validation mode
    -q : quiet mode (no outputs)
```
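Because all of these settings travel in a single string, it can help to assemble that string programmatically. The plain-Python sketch below uses only flags from the list above (`-s`, `-t`, `-c`, `-g`, `-v`, `-q`); the function `build_options` and its keyword names are hypothetical.

```python
def build_options(svm_type=0, kernel_type=2, cost=None, gamma=None,
                  cv_folds=None, quiet=True):
    """Assemble a libsvm_options string from a few of the flags listed above."""
    parts = [f"-s {svm_type}", f"-t {kernel_type}"]
    if cost is not None:
        parts.append(f"-c {cost}")
    if gamma is not None:
        parts.append(f"-g {gamma}")
    if cv_folds is not None:
        parts.append(f"-v {cv_folds}")  # cross-validation: svmtrain returns a scalar
    if quiet:
        parts.append("-q")
    return " ".join(parts)


print(build_options(cost=2 ** 3, gamma=2 ** -5, cv_folds=5))
# → -s 0 -t 2 -c 8 -g 0.03125 -v 5 -q
```

The resulting string is what you would pass as the third argument, e.g. `svmtrain(y, X, '-s 0 -t 2 -c 8 -g 0.03125 -v 5 -q')` in MATLAB.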
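(3) Prediction:

`[predicted_label, accuracy, decision_values/prob_estimates] = svmpredict(testing_label_vector, testing_instance_matrix, model, 'libsvm_options');`
`[predicted_label] = svmpredict(testing_label_vector, testing_instance_matrix, model, 'libsvm_options');`
The third output holds decision values, or probability estimates when `-b 1` is used (the model must then also have been trained with `-b 1`).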
```
    libsvm_options:
    -b probability_estimates: whether to predict probability estimates, 0 or 1 (default 0); one-class SVM not supported yet
    -q : quiet mode (no outputs)
    Returns:
      predicted_label: SVM prediction output vector.
      accuracy: a vector with accuracy, mean squared error, squared correlation coefficient.
      prob_estimates: If selected, probability estimate vector.
```
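To clarify what the three entries of the `accuracy` output mean, the plain-Python sketch below computes them from true and predicted labels: accuracy in percent, mean squared error, and the squared Pearson correlation between predictions and true values. The function name is hypothetical. For classification only the first entry is really informative; the other two are mainly useful for regression.

```python
def prediction_metrics(true_labels, predicted):
    """Return [accuracy %, mean squared error, squared correlation coefficient]."""
    n = len(true_labels)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    pairs = list(zip(true_labels, predicted))
    accuracy = 100.0 * sum(1 for t, p in pairs if t == p) / n
    mse = sum((p - t) ** 2 for t, p in pairs) / n
    # Squared Pearson correlation, written with running sums.
    st = sum(true_labels)
    sp = sum(predicted)
    stt = sum(t * t for t in true_labels)
    spp = sum(p * p for p in predicted)
    stp = sum(t * p for t, p in pairs)
    denom = (n * stt - st * st) * (n * spp - sp * sp)
    scc = (n * stp - st * sp) ** 2 / denom if denom != 0 else float("nan")
    return [accuracy, mse, scc]


# Hypothetical labels: one of four predictions is wrong.
print(prediction_metrics([1, 1, -1, -1], [1, -1, -1, -1]))
```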
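The default parameter values usually produce a result, but to get the best results you should understand the SVM model and tune the parameters to the characteristics of your data. This is what the recommendations above about the kernel function and the parameters $C$ and $\gamma$ refer to; they will be covered later.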
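As a preview of that tuning step, here is a plain-Python sketch of a grid search over the exponential grid suggested in the LIBSVM practical guide ($C = 2^{-5}, 2^{-3}, \dots, 2^{15}$ and $\gamma = 2^{-15}, 2^{-13}, \dots, 2^{3}$). The scoring callback `cv_accuracy` is a hypothetical stand-in for a cross-validation run, such as calling svmtrain with `-c`, `-g` and `-v 5` in MATLAB. The toy scorer is made up purely to exercise the loop and says nothing about real data.

```python
import math


def coarse_grid():
    """(C, gamma) pairs: C = 2^-5, 2^-3, ..., 2^15 and gamma = 2^-15, 2^-13, ..., 2^3."""
    return [(2.0 ** lc, 2.0 ** lg)
            for lc in range(-5, 16, 2)
            for lg in range(-15, 4, 2)]


def grid_search(cv_accuracy, grid=None):
    """Return (best_C, best_gamma, best_score); ties keep the first pair found.

    cv_accuracy(C, gamma) is a hypothetical callback that should return a
    cross-validation score where higher is better.
    """
    best = None
    for c, g in (grid if grid is not None else coarse_grid()):
        score = cv_accuracy(c, g)
        if best is None or score > best[2]:
            best = (c, g, score)
    return best


# Toy scorer for demonstration only: it peaks at C = 2^3, gamma = 2^-5.
def toy_score(c, g):
    return -((math.log2(c) - 3) ** 2 + (math.log2(g) + 5) ** 2)


print(grid_search(toy_score)[:2])  # → (8.0, 0.03125)
```

The guide recommends refining the search with a finer grid around the best pair once this coarse pass has found a promising region.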