Support Vector Machines, Part 02

SVM parameters, attributes, and interfaces

1. Exploring kernel functions

from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import numpy as np
from time import time
import datetime
 
data = load_breast_cancer()
X = data.data
y = data.target
 
X.shape
np.unique(y)
 
plt.scatter(X[:,0],X[:,1],c=y)
plt.show()
 
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X,y,test_size=0.3,random_state=420)
 
Kernel = ["linear","poly","rbf","sigmoid"]
 
for kernel in Kernel:
    time0 = time()
    clf= SVC(kernel = kernel
             , gamma="auto"
            # , degree = 1
             , cache_size=10000  # memory for the kernel cache, in MB (default is 200 MB)
            ).fit(Xtrain,Ytrain)
    print("The accuracy under kernel %s is %f" % (kernel,clf.score(Xtest,Ytest)))
    print(time()-time0)

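But this run never finishes: once the linear kernel has printed its result, the model stalls on the polynomial kernel (degree defaults to 3). So we drop poly for now.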

Kernel = ["linear","rbf","sigmoid"]
 
for kernel in Kernel:
    time0 = time()
    clf= SVC(kernel = kernel
             , gamma="auto"
            # , degree = 1
             , cache_size=5000
            ).fit(Xtrain,Ytrain)
    print("The accuracy under kernel %s is %f" % (kernel,clf.score(Xtest,Ytest)))
    print(time()-time0)
----------------------------------------------------------------------
Results (the linear kernel performs well)
The accuracy under kernel linear is 0.929825
0.795527458190918
The accuracy under kernel rbf is 0.596491
0.06104254722595215
The accuracy under kernel sigmoid is 0.596491
0.008005142211914062
----------------------------------------------------------------------
Kernel = ["linear","poly","rbf","sigmoid"]
 
for kernel in Kernel:
    time0 = time()
    clf= SVC(kernel = kernel
             , gamma="auto"
             , degree = 1
             , cache_size=5000
            ).fit(Xtrain,Ytrain)
    print("The accuracy under kernel %s is %f" % (kernel,clf.score(Xtest,Ytest)))
    print(time()-time0)
----------------------------------------------------------------------
Results (with degree = 1, the polynomial kernel's running speed and accuracy both improve)
The accuracy under kernel linear is 0.929825
0.8025338649749756
The accuracy under kernel poly is 0.923977
0.14710068702697754
The accuracy under kernel rbf is 0.596491
0.06003713607788086
The accuracy under kernel sigmoid is 0.596491
0.011008739471435547  

The features in this dataset are on very different scales, so we preprocess the data and standardize it.
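The scale mismatch mentioned above can be checked directly on the raw features. The following is a minimal sketch, not part of the original notes; the variable names `feature_std` and `scale_ratio` are introduced here only for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X = data.data

# Standard deviation of every raw (unscaled) feature
feature_std = X.std(axis=0)

# How many times wider the most spread-out feature is than the narrowest one
scale_ratio = feature_std.max() / feature_std.min()

print("largest std :", data.feature_names[np.argmax(feature_std)], feature_std.max())
print("smallest std:", data.feature_names[np.argmin(feature_std)], feature_std.min())
print("ratio       :", scale_ratio)
```

A large ratio means that distance-based kernels are dominated by a handful of wide features, which is one plausible reason the unscaled rbf and sigmoid results are poor.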

import pandas as pd
from sklearn.preprocessing import StandardScaler
X = StandardScaler().fit_transform(X)  # standardize each feature to mean 0 and variance 1
data = pd.DataFrame(X)
data.describe([0.01,0.05,0.1,0.25,0.5,0.75,0.9,0.99]).T  # the means are all close to 0 and the standard deviations are 1
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X,y,test_size=0.3,random_state=420)
 
Kernel = ["linear","poly","rbf","sigmoid"]
 
for kernel in Kernel:
    time0 = time()
    clf= SVC(kernel = kernel
             , gamma="auto"
             , degree = 1
             , cache_size=5000
            ).fit(Xtrain,Ytrain)
    print("The accuracy under kernel %s is %f" % (kernel,clf.score(Xtest,Ytest)))
    print(time()-time0)
----------------------------------------------------------------------
Results (the running time of every kernel drops sharply, and the accuracy of rbf and sigmoid rises well above 0.596)
The accuracy under kernel linear is 0.976608
0.01501321792602539
The accuracy under kernel poly is 0.964912
0.006003141403198242
The accuracy under kernel rbf is 0.970760
0.011005401611328125
The accuracy under kernel sigmoid is 0.953216
0.0060024261474609375

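Conclusions:
1. The linear kernel, and especially the polynomial kernel at high degree, is very slow to compute on unscaled data
2. Standardizing the data before running an SVM is strongly recommended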
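The code above standardizes the whole of X before splitting, so the test rows also contribute to the scaling statistics. As a hedged alternative (a sketch, not the approach used in these notes), the scaler can be placed in a `Pipeline` so that it is fitted on the training split only. The `results` dictionary is a name introduced here for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, y, test_size=0.3, random_state=420)

results = {}
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    # The scaler learns its mean and variance from Xtrain only,
    # then applies the same transform to Xtest inside score()
    model = make_pipeline(
        StandardScaler(),
        SVC(kernel=kernel, gamma="auto", degree=1),
    )
    model.fit(Xtrain, Ytrain)
    results[kernel] = model.score(Xtest, Ytest)
    print("The accuracy under kernel %s is %f" % (kernel, results[kernel]))
```

The scores may differ slightly from the ones printed above, because the scaling statistics now come from the training rows alone.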

2. Parameters related to the kernel function

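Input     Meaning                          Parameter gamma   Parameter degree   Parameter coef0
linear    Linear kernel                    No                No                 No
poly      Polynomial kernel                Yes               Yes                Yes
rbf       Gaussian radial basis function   Yes               No                 No
sigmoid   Hyperbolic tangent kernel        Yes               No                 Yes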
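The table follows from the kernel formulas themselves: gamma scales the inner product or the squared distance, degree is the exponent of the polynomial kernel, and coef0 is the additive constant in poly and sigmoid. As a small sketch (not from the original notes), the formulas can be written out by hand and compared against the helpers in `sklearn.metrics.pairwise`. The arrays `A` and `B` and the parameter values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import (
    linear_kernel,
    polynomial_kernel,
    rbf_kernel,
    sigmoid_kernel,
)

rng = np.random.RandomState(0)
A = rng.rand(4, 3)  # 4 samples, 3 features
B = rng.rand(5, 3)  # 5 samples, 3 features

gamma, degree, coef0 = 0.5, 2, 1.0

dot = A @ B.T                                                  # <x, x'>
sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)   # ||x - x'||^2

manual = {
    "linear": dot,                               # no kernel parameters
    "poly": (gamma * dot + coef0) ** degree,     # gamma, degree, coef0
    "rbf": np.exp(-gamma * sq_dist),             # gamma only
    "sigmoid": np.tanh(gamma * dot + coef0),     # gamma, coef0
}
library = {
    "linear": linear_kernel(A, B),
    "poly": polynomial_kernel(A, B, degree=degree, gamma=gamma, coef0=coef0),
    "rbf": rbf_kernel(A, B, gamma=gamma),
    "sigmoid": sigmoid_kernel(A, B, gamma=gamma, coef0=coef0),
}

for name in manual:
    print(name, np.allclose(manual[name], library[name]))
```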
score = []
gamma_range = np.logspace(-10, 1, 50)  # 50 values evenly spaced on a log scale, from 1e-10 to 10
for i in gamma_range:
    clf = SVC(kernel="rbf",gamma = i,cache_size=5000).fit(Xtrain,Ytrain)
    score.append(clf.score(Xtest,Ytest))
    
print(max(score), gamma_range[score.index(max(score))])
plt.plot(gamma_range,score)
plt.show()
----------------------------------------------------------------------
Result: from the learning curve, it is easy to find the best gamma for rbf
0.9766081871345029 0.012067926406393264
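The curve above scores each gamma on a single test split, so the best value is tied to that split. A possible variation, sketched here under my own assumptions (a coarser 12-point grid, 5-fold stratified cross-validation, and scaling inside a pipeline), uses `validation_curve` to average the score over several folds.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
gamma_range = np.logspace(-10, 1, 12)  # coarser grid than in the text, to keep the run short

# make_pipeline names the SVC step "svc", hence the parameter name "svc__gamma"
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=420)

train_scores, val_scores = validation_curve(
    model, X, y,
    param_name="svc__gamma",
    param_range=gamma_range,
    cv=cv,
)

mean_val = val_scores.mean(axis=1)           # average validation accuracy per gamma
best_gamma = gamma_range[np.argmax(mean_val)]
print(mean_val.max(), best_gamma)
```

`train_scores` is returned as well; comparing it with `val_scores` shows where a large gamma starts to overfit.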


from sklearn.model_selection import StratifiedShuffleSplit  # provides the cross-validation splits used by the grid search
from sklearn.model_selection import GridSearchCV  # grid search with cross-validation
 
time0 = time()
 
gamma_range = np.logspace(-10,1,20)
coef0_range = np.linspace(0,5,10)
 
param_grid = dict(gamma = gamma_range
                  ,coef0 = coef0_range)
cv = StratifiedShuffleSplit(n_splits=5, test_size=0.3, random_state=420)  # 5 random stratified splits, each holding out 30% of the data as the test set
grid = GridSearchCV(SVC(kernel = "poly",degree=1,cache_size=5000)
                    ,param_grid=param_grid
                    ,cv=cv)
grid.fit(X, y)
 
print("The best parameters are %s with a score of %0.5f" % (grid.best_params_, 
grid.best_score_))
print(time()-time0)
----------------------------------------------------------------------
Result: even after grid search, the best score for poly is lower than for rbf and linear
The best parameters are {'coef0': 0.0, 'gamma': 0.18329807108324375} with a score of 0.96959
13.360332727432251
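Rather than tuning each kernel in a separate run, `GridSearchCV` also accepts a list of parameter dictionaries, so each kernel is searched only over the parameters it actually uses. The grid values below are my own small illustrative choices, not the ranges used in these notes.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # same preprocessing as in the text

# One dictionary per kernel: only the parameters that kernel uses are searched
param_grid = [
    {"kernel": ["linear"]},
    {"kernel": ["rbf"], "gamma": np.logspace(-4, 0, 5)},
    {"kernel": ["poly"], "degree": [1],
     "gamma": np.logspace(-4, 0, 5), "coef0": [0.0, 1.0]},
]

cv = StratifiedShuffleSplit(n_splits=5, test_size=0.3, random_state=420)
grid = GridSearchCV(SVC(), param_grid=param_grid, cv=cv)
grid.fit(X, y)

print("The best parameters are %s with a score of %0.5f"
      % (grid.best_params_, grid.best_score_))
```

The full table of candidates and their mean scores is available in `grid.cv_results_` for a side-by-side comparison of the kernels.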
----------------------------------------------------------------------
# Tune C for the linear kernel
score = []
C_range = np.linspace(0.01,30,50)
for i in C_range:
    clf = SVC(kernel="linear",C=i,cache_size=5000).fit(Xtrain,Ytrain)
    score.append(clf.score(Xtest,Ytest))
print(max(score), C_range[score.index(max(score))])
plt.plot(C_range,score)
plt.show()
 
# Switch to rbf
score = []
C_range = np.linspace(0.01,30,50)
for i in C_range:
    clf = SVC(kernel="rbf",C=i,gamma = 0.012742749857031322,cache_size=5000).fit(Xtrain,Ytrain)
    score.append(clf.score(Xtest,Ytest))
    
print(max(score), C_range[score.index(max(score))])
plt.plot(C_range,score)
plt.show()
 
# Refine the search range further
score = []
C_range = np.linspace(5,7,50)
for i in C_range:
    clf = SVC(kernel="rbf",C=i,gamma = 0.012742749857031322,cache_size=5000).fit(Xtrain,Ytrain)
    score.append(clf.score(Xtest,Ytest))
    
print(max(score), C_range[score.index(max(score))])
plt.plot(C_range,score)
plt.show()

Summary:

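Parameter   Meaning
degree      Integer, default 3; only applies when the kernel is poly
gamma       Float, default 'auto' in older scikit-learn releases ('scale' since version 0.22)
coef0       Float, default 0
C           Float, default 1, optional; the penalty coefficient on the slack variables. A larger C pushes the decision boundary to classify the training points more accurately, at the cost of a longer training time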
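Since this part also covers SVM attributes, here is a brief sketch (not from the original notes) of how C interacts with the fitted model. It reads the `n_support_` and `support_vectors_` attributes of `SVC` for a few C values; the three values of C and the `summary` list are arbitrary illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # same preprocessing as in the text
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, y, test_size=0.3, random_state=420)

summary = []
for C in [0.01, 1, 100]:
    clf = SVC(kernel="linear", C=C).fit(Xtrain, Ytrain)
    n_sv = int(clf.n_support_.sum())   # n_support_ holds the support vector count per class
    acc = clf.score(Xtest, Ytest)
    summary.append((C, n_sv, acc))
    print("C=%g  support vectors=%d  test accuracy=%f" % (C, n_sv, acc))
```

Fewer support vectors usually indicates a narrower margin with fewer tolerated violations, which is the trade-off the C row in the table describes.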

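Reference: CDA课堂. These are my personal notes written up after a live class and are for reference only. If you see things differently, please read them critically; if you find a problem, point it out in the comments and I will correct it.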
