libSVM

Using libSVM: grid.py

2011-03-20 13:34:28 | Category: Machine Learning | Tags: grid  py  gnuplot  gamma  pathname


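This tool uses cross-validation to select the parameters C and gamma. Whether better parameters exist than the ones it finds remains to be verified. The roles of the two parameters are described below: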

The SVM with a Gaussian kernel function has two such training parameters: C which controls overfitting of the model, and gamma (γ) which controls the degree of nonlinearity of the model. Gamma is inversely related to sigma which is a degree for spread around a mean in statistics: the higher the value of gamma, the lower the value of sigma, thus the less spread or the more nonlinear the behavior of the kernel. The values of these training parameters C and gamma are determined by grid search and cross validation: the model with the highest estimated performance determines the selected training parameters. Then, the performance of the constructed model is estimated by using 5-fold cross validation on the training data. Finally, the constructed model is validated by predicting the validation data and comparing these predictions with the real observations by means of ROC curves.
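The inverse relation between gamma and sigma described above can be made concrete with a small sketch. It assumes the common convention gamma = 1 / (2 * sigma^2) and the kernel form exp(-gamma * ||u - v||^2); the function names and sample points are hypothetical and chosen only for illustration.

```python
import math

def rbf_kernel(u, v, gamma):
    """Gaussian (RBF) kernel value: exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def gamma_from_sigma(sigma):
    """Assumed convention: gamma = 1 / (2 * sigma^2)."""
    return 1.0 / (2.0 * sigma ** 2)

# Two hypothetical points at a fixed distance from each other.
u, v = [0.0, 0.0], [1.0, 1.0]

# A smaller sigma gives a larger gamma, so the kernel value
# for the same pair of points decays faster (a narrower RBF).
for sigma in (2.0, 1.0, 0.5):
    g = gamma_from_sigma(sigma)
    print(f"sigma={sigma} gamma={g} K(u,v)={rbf_kernel(u, v, g):.4f}")
```

With a large gamma, only points very close to each other look similar to the kernel, which is why the model behaves more nonlinearly.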

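gamma: the RBF kernel parameter. The smaller gamma is, the wider the RBF. It should not be confused with epsilon (ε), the parameter of the insensitive loss function in epsilon-SVR: the larger ε is, the fewer the support vectors; the smaller ε is, the more the support vectors.
C: the penalty coefficient. If C is too large or too small, generalization ability degrades.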
Usage:
    grid.py [-log2c begin,end,step] [-log2g begin,end,step] [-v fold]
            [-svmtrain pathname] [-gnuplot pathname] [-out pathname] [-png pathname]
            [additional parameters for svm-train] dataset

Example
=======
    > python grid.py -log2c -5,5,1 -log2g -4,0,1 -v 5 -m 300 heart_scale
    > grid.py -log2c -5,5,1 -svmtrain c:\libsvm\windows\svm-train.exe -gnuplot c:\tmp\gnuplot\bin\pgnuplot.exe -v 10 heart_scale

Output: two files
    dataset.png: the CV accuracy contour plot generated by gnuplot
    dataset.out: the CV accuracy at each (log2(C),log2(gamma))
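To see which parameter pairs a setting such as -log2c -5,5,1 -log2g -4,0,1 covers, the grid can be enumerated by hand. This is only a sketch: inclusive_range and build_grid are hypothetical helpers, not functions taken from grid.py, and the real script may visit the points in a different order and runs svm-train cross-validation at each one.

```python
def inclusive_range(begin, end, step):
    """Values from begin to end (inclusive) in increments of step.

    A negative step counts downwards; a zero step yields an empty list.
    """
    values = []
    x = begin
    while (step > 0 and x <= end) or (step < 0 and x >= end):
        values.append(x)
        x += step
    return values

def build_grid(log2c_spec, log2g_spec):
    """All (C, gamma) pairs for two (begin, end, step) specs given in log2 space."""
    return [(2.0 ** lc, 2.0 ** lg)
            for lc in inclusive_range(*log2c_spec)
            for lg in inclusive_range(*log2g_spec)]

# Same ranges as the first example command line.
grid = build_grid((-5, 5, 1), (-4, 0, 1))
print(len(grid))   # 11 values of C times 5 values of gamma = 55
print(grid[0])     # smallest C and gamma in the grid
print(grid[-1])    # largest C and gamma in the grid
```

Because the search runs in log2 space, each step doubles the parameter, so a small number of grid points spans several orders of magnitude.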
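The output looks like this:
    [local] 1.0 -3 85.58  (best c=4.0, g=0.125, rate=86.16)
    [local] -1.0 -3 81.49  (best c=4.0, g=0.125, rate=86.16)
    [local] 3.0 -3 85.84  (best c=4.0, g=0.125, rate=86.16)
    4.0 0.125 86.16
Only the last line matters: it gives the best result (C, gamma, and the cross-validation accuracy).

Two points deserve particular attention: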
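1. The default gnuplot pathname points to the tmp directory on the drive where the program resides. Set this up when you copy the files; otherwise you will run into the common "pgnuplot not found" error.
2. If you tune the parameters for two training sets at the same time on the same machine, set a delay between the runs to avoid the error above.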
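Since only the last line of the grid.py output carries the final answer, it is easy to pick it up in a script. The sketch below assumes the output has been captured as text in the format shown in this post; parse_best is a hypothetical helper name.

```python
# Output copied from the sample run shown in this post.
SAMPLE_OUTPUT = """\
[local] 1.0 -3 85.58  (best c=4.0, g=0.125, rate=86.16)
[local] -1.0 -3 81.49  (best c=4.0, g=0.125, rate=86.16)
[local] 3.0 -3 85.84  (best c=4.0, g=0.125, rate=86.16)
4.0 0.125 86.16
"""

def parse_best(output):
    """Return (c, gamma, rate) read from the last non-empty line of the output."""
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    c, g, rate = (float(tok) for tok in lines[-1].split())
    return c, g, rate

best_c, best_g, best_rate = parse_best(SAMPLE_OUTPUT)
# The selected values can then be passed to svm-train via -c and -g.
print(f"svm-train -c {best_c} -g {best_g} heart_scale")
```

The intermediate lines report log2(C) and log2(gamma), while the last line reports C and gamma themselves, which is why it can be fed to svm-train directly.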
 
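Appendix: meaning of the values printed by svm-train:
#iter: number of iterations
epsilon: the parameter epsilon in epsilon-SVR
obj: the optimal objective value of the SVM dual problem
rho: the bias term of the decision function; LIBSVM writes the function as sgn(w.x - rho), so the b in (w.x + b) equals -rho
nSV: number of support vectors
nBSV: number of bounded support vectors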
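The sign of rho is easy to get wrong, so here is a minimal sketch of a linear decision function in LIBSVM's convention, w.x - rho, which corresponds to b = -rho in the (w.x + b) form. The weight vector, rho, and the test point are hypothetical values chosen for illustration; with a nonlinear kernel w is not available explicitly and the sum runs over support vectors instead.

```python
def decision_value(w, x, rho):
    """Linear decision value in LIBSVM's convention: w.x - rho."""
    return sum(wi * xi for wi, xi in zip(w, x)) - rho

def predict(w, x, rho):
    """Class label (+1 or -1) from the sign of the decision value."""
    return 1 if decision_value(w, x, rho) > 0 else -1

# Hypothetical model and input, for illustration only.
w = [0.5, -1.0]
rho = 0.25
b = -rho          # the same bias written in the (w.x + b) form
x = [2.0, 0.5]

print(decision_value(w, x, rho))   # w.x - rho
print(predict(w, x, rho))
```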