sklearn_SVM (Part 1)_Study notes on the 菜菜 (Caicai) video course

SVM (support vector machine): where it can be applied

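It can be used for:
Supervised learning:
linear binary and multiclass classification
nonlinear binary and multiclass classification
regression on ordinary continuous variables
regression on probabilistic continuous variables
Unsupervised learning: support vector clustering, outlier detection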

-->> Theory part

The code implementation follows.

1 Visualizing the decision process of a linear SVM

from sklearn.datasets import make_blobs
from sklearn.svm import SVC
import matplotlib.pyplot as plt
import numpy as np

1.1 Create the dataset and visualize it

X,y = make_blobs(n_samples=50, centers=2, random_state=0,cluster_std=0.6)
plt.scatter(X[:,0],X[:,1],c=y,s=50,cmap="rainbow")#"rainbow" is a rainbow-colored colormap
plt.xticks([])
plt.yticks([])
plt.show()

![output_1_0.png](https://img-blog.csdnimg.cn/53324407cabb4b8e8f30bdc01d260d5d.png)

#start with the scatter plot
plt.scatter(X[:,0],X[:,1],c=y,s=50,cmap="rainbow")
ax = plt.gca() #get the current axes; a new one is created if none exists

[Figure: output_2_0.png]

#get the minimum and maximum of the two axes of the plot
xlim = ax.get_xlim()
ylim = ax.get_ylim()
xlim
(-0.7425578984849813, 3.3721920271976598)
 
#generate 30 evenly spaced values between the minimum and the maximum
axisx = np.linspace(xlim[0],xlim[1],30)
axisy = np.linspace(ylim[0],ylim[1],30)
 
axisx
array([-0.7425579 , -0.60066997, -0.45878204, -0.31689411, -0.17500618,
       -0.03311826,  0.10876967,  0.2506576 ,  0.39254553,  0.53443346,
        0.67632139,  0.81820931,  0.96009724,  1.10198517,  1.2438731 ,
        1.38576103,  1.52764896,  1.66953689,  1.81142481,  1.95331274,
        2.09520067,  2.2370886 ,  2.37897653,  2.52086446,  2.66275238,
        2.80464031,  2.94652824,  3.08841617,  3.2303041 ,  3.37219203])
axisy,axisx = np.meshgrid(axisy,axisx)
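#the 2-D arrays built here serve as the X and Y arguments of contour
#meshgrid turns the two 1-D vectors into coordinate matrices
#in essence it broadcasts the two vectors against each other, giving the x and y coordinates of len(x) * len(y) grid points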
 
axisx.shape
(30, 30)
axisx.ravel().shape
(900,)
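
The swapped call `axisy,axisx = np.meshgrid(axisy,axisx)` is easy to misread. As a small sketch with toy vectors (the names `x`, `y`, `gx`, `gy` are illustrative, not from the notes), the same pair of grids can also be obtained with `indexing="ij"`, which keeps the x-then-y order:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 4)     # toy x-axis values (assumed for illustration)
y = np.linspace(10.0, 20.0, 3)   # toy y-axis values

# Style used in the notes: pass (y, x) and unpack as (Y, X)
Y_swapped, X_swapped = np.meshgrid(y, x)

# Equivalent call with matrix-style indexing: the first axis follows x
gx, gy = np.meshgrid(x, y, indexing="ij")

print(gx.shape)                        # (4, 3): len(x) rows, len(y) columns
print(np.array_equal(gx, X_swapped))   # True
print(np.array_equal(gy, Y_swapped))   # True
```

Either form gives grids whose first axis runs along x, which is why `axisx.shape` is `(30, 30)` with x varying down the rows.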

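1.2 Build the grid and draw the decision boundary as contours of constant distance to the separating hyperplane

xy = np.vstack([axisx.ravel(), axisy.ravel()]).T
#ravel() flattens an array to 1-D; vstack stacks several 1-D arrays of equal length as rows
#xy is the finished grid: a dense set of points covering the whole canvas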
 
plt.scatter(xy[:,0],xy[:,1],s=1,cmap="rainbow")
<matplotlib.collections.PathCollection at 0x24dd66a29a0>

[Figure: output_10_1.png]

 
#understanding what meshgrid and vstack do
a = np.array([1,2,3])
b = np.array([7,8])
#how many coordinates do we get by pairing every element of a with every element of b?
#the answer is 6: (1,7),(2,7),(3,7),(1,8),(2,8),(3,8)
 
v1,v2 = np.meshgrid(a,b)
 
v1
array([[1, 2, 3],
       [1, 2, 3]])
v2
array([[7, 7, 7],
       [8, 8, 8]])
v = np.vstack([v1.ravel(), v2.ravel()]).T
v
array([[1, 7],
       [2, 7],
       [3, 7],
       [1, 8],
       [2, 8],
       [3, 8]])
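#modeling: fit computes the separating hyperplane
clf = SVC(kernel = "linear").fit(X,y)
Z = clf.decision_function(xy).reshape(axisx.shape)
#key interface decision_function: for each input sample it returns the signed decision value, which is proportional to the sample's distance to the separating hyperplane
#the result is then reshaped to the shape of axisx, because contour requires Z to have the same shape as X and Y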
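
To make the meaning of `decision_function` concrete, the following sketch checks it against the fitted hyperplane for the linear kernel. It rebuilds the same blobs dataset so that it runs on its own; the names `w`, `b`, `manual`, `from_api` and `distance` are illustrative and not part of the original notes.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=50, centers=2, random_state=0, cluster_std=0.6)
clf = SVC(kernel="linear").fit(X, y)

w = clf.coef_[0]        # weight vector; coef_ is only available for the linear kernel
b = clf.intercept_[0]   # bias term

manual = X @ w + b                        # raw decision value w.x + b
from_api = clf.decision_function(X)       # what sklearn returns
distance = from_api / np.linalg.norm(w)   # signed geometric distance to the hyperplane

print(np.allclose(manual, from_api))
# predict() follows the sign of the decision value (class 1 when it is positive)
print(np.array_equal(clf.predict(X), (from_api > 0).astype(int)))
```

Under these assumptions, the contour levels -1, 0 and 1 used in the plots are values of `w.x + b`; dividing by the norm of `w` converts them into geometric distances.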

#start with the scatter plot
plt.scatter(X[:,0],X[:,1],c=y,s=50,cmap="rainbow")
ax = plt.gca() #get the current axes; a new one is created if none exists
#draw the decision boundary and the hyperplanes parallel to it
ax.contour(axisx,axisy,Z
           ,colors="k"
           ,levels=[-1,0,1] #three contour lines, at Z = -1, Z = 0 and Z = 1
           ,alpha=0.5#transparency
           ,linestyles=["--","-","--"])
 
ax.set_xlim(xlim)#set the x-axis limits
ax.set_ylim(ylim)
(-0.41872382476349596, 5.754870487889891)

[Figure: output_16_1.png]

#remember what Z really is? The distance from the input samples to the separating hyperplane; the levels passed to contour are values of this distance
#let's try it with a single point
plt.scatter(X[:,0],X[:,1],c=y,s=50,cmap="rainbow")
plt.scatter(X[10,0],X[10,1],c="black",s=50,cmap="rainbow")
<matplotlib.collections.PathCollection at 0x24dd647c6d0>

[Figure: output_17_1.png]

clf.decision_function(X[10].reshape(1,2))#get the distance for this point
array([-3.33917354])
plt.scatter(X[:,0],X[:,1],c=y,s=50,cmap="rainbow")
ax = plt.gca()
ax.contour(axisx,axisy,Z
            ,colors="k"
            ,levels=[-3.33917354]
            ,alpha=0.5
            ,linestyles=["--"])
<matplotlib.contour.QuadContourSet at 0x24dd6705820>

[Figure: output_19_1.png]

1.3 Wrap the plotting steps in a function

#wrap the steps above in a function:
def plot_svc_decision_function(model,ax=None):
    if ax is None:
        ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()
    
    x = np.linspace(xlim[0],xlim[1],30)
    y = np.linspace(ylim[0],ylim[1],30)
    Y,X = np.meshgrid(y,x) 
    xy = np.vstack([X.ravel(), Y.ravel()]).T
    P = model.decision_function(xy).reshape(X.shape)
    
    ax.contour(X, Y, P,colors="k",levels=[-1,0,1],alpha=0.5,linestyles=["--","-","--"]) 
    ax.set_xlim(xlim)
    ax.set_ylim(ylim)
 
#the whole plotting process then becomes:
clf = SVC(kernel = "linear").fit(X,y)
plt.scatter(X[:,0],X[:,1],c=y,s=50,cmap="rainbow")
plot_svc_decision_function(clf)

[Figure: output_20_0.png]

clf.predict(X)
#classify the samples in X using the separating hyperplane; the result has shape (n_samples,)
 
array([1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1,
       1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1,
       0, 1, 1, 0, 1, 0])
clf.score(X,y)
#mean accuracy on the given data and labels
1.0
clf.support_vectors_
#coordinates of the support vectors
array([[0.44359863, 3.11530945],
       [2.33812285, 3.43116792],
       [2.06156753, 1.96918596]])
clf.n_support_
#number of support vectors in each class
array([2, 1])
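
As a rough check of what "support vector" means here, the sketch below looks at the functional margin `y * f(x)` of every training sample. It refits the same model so it is self-contained; `signed`, `margin`, `sv_idx` and `others` are illustrative names, and the thresholds in the comments hold only up to the solver's tolerance.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=50, centers=2, random_state=0, cluster_std=0.6)
clf = SVC(kernel="linear").fit(X, y)

# support_ holds the row indices of the support vectors in the training data
sv_idx = clf.support_
print(np.array_equal(X[sv_idx], clf.support_vectors_))

# Map labels {0, 1} to {-1, +1} and compute the functional margin y * f(x)
signed = np.where(y == 1, 1.0, -1.0)
margin = signed * clf.decision_function(X)

print(margin[sv_idx])               # support vectors: values at (or below) about 1
others = np.delete(margin, sv_idx)  # all samples that are not support vectors
print(others.min())                 # remaining samples: values of about 1 or more
```

This is why the dashed contour lines at -1 and 1 pass through the support vectors in the plot above.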

1.4 Test on a nonlinear dataset

from sklearn.datasets import make_circles
X,y = make_circles(100, factor=0.1, noise=.1)#concentric circles
 
X.shape
(100, 2)

y.shape
(100,)
plt.scatter(X[:,0],X[:,1],c=y,s=50,cmap="rainbow")
plt.show()

[Figure: output_28_0.png]

clf = SVC(kernel = "linear").fit(X,y)
plt.scatter(X[:,0],X[:,1],c=y,s=50,cmap="rainbow")
plot_svc_decision_function(clf)
clf.score(X,y)
0.67

[Figure: output_29_1.png]

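1.5 Project into a higher-dimensional space to find a hyperplane that separates the nonlinear data perfectly

#define a new dimension r computed from X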
r = np.exp(-(X**2).sum(1))
 
rlim = np.linspace(min(r),max(r),100)
 
r.shape
(100,)
rlim
array([0.19660588, 0.20471622, 0.21282655, 0.22093689, 0.22904723,
       0.23715756, 0.2452679 , 0.25337824, 0.26148857, 0.26959891,
       0.27770925, 0.28581958, 0.29392992, 0.30204026, 0.31015059,
       0.31826093, 0.32637127, 0.3344816 , 0.34259194, 0.35070228,
       0.35881261, 0.36692295, 0.37503329, 0.38314362, 0.39125396,
       0.3993643 , 0.40747463, 0.41558497, 0.42369531, 0.43180564,
       0.43991598, 0.44802631, 0.45613665, 0.46424699, 0.47235732,
       0.48046766, 0.488578  , 0.49668833, 0.50479867, 0.51290901,
       0.52101934, 0.52912968, 0.53724002, 0.54535035, 0.55346069,
       0.56157103, 0.56968136, 0.5777917 , 0.58590204, 0.59401237,
       0.60212271, 0.61023305, 0.61834338, 0.62645372, 0.63456406,
       0.64267439, 0.65078473, 0.65889507, 0.6670054 , 0.67511574,
       0.68322607, 0.69133641, 0.69944675, 0.70755708, 0.71566742,
       0.72377776, 0.73188809, 0.73999843, 0.74810877, 0.7562191 ,
       0.76432944, 0.77243978, 0.78055011, 0.78866045, 0.79677079,
       0.80488112, 0.81299146, 0.8211018 , 0.82921213, 0.83732247,
       0.84543281, 0.85354314, 0.86165348, 0.86976382, 0.87787415,
       0.88598449, 0.89409482, 0.90220516, 0.9103155 , 0.91842583,
       0.92653617, 0.93464651, 0.94275684, 0.95086718, 0.95897752,
       0.96708785, 0.97519819, 0.98330853, 0.99141886, 0.9995292 ])
from mpl_toolkits import mplot3d
 
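#define a function that draws the 3-D plot
#elev: elevation angle (rotation up and down)
#azim: azimuth angle (rotation in the horizontal plane)
def plot_3D(elev=30,azim=30,X=X,y=y):
    ax = plt.subplot(projection="3d")#create a subplot for 3-D drawing
    ax.scatter3D(X[:,0],X[:,1],r,c=y,s=50,cmap='rainbow')
    ax.view_init(elev=elev,azim=azim)#set the viewing angle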
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_zlabel("r")
    plt.show()
plot_3D()

[Figure: output_33_0.png]
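
The 3-D picture suggests that a flat plane can separate the two rings once `r` is added. A minimal sketch of that idea follows: it appends `r` as a third feature and fits a linear SVC on it. The fixed `random_state=0` and the names `X3`, `score_2d` and `score_3d` are assumptions added here for reproducibility and illustration.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# random_state is fixed only to make this sketch reproducible (an assumption)
X, y = make_circles(100, factor=0.1, noise=0.1, random_state=0)

# Same extra dimension as in the notes: large near the origin, small far from it
r = np.exp(-(X ** 2).sum(axis=1))
X3 = np.column_stack([X, r])   # shape (100, 3): x, y and the new feature r

score_2d = SVC(kernel="linear").fit(X, y).score(X, y)     # linear kernel, original 2-D data
score_3d = SVC(kernel="linear").fit(X3, y).score(X3, y)   # linear kernel, data lifted to 3-D

print(score_2d, score_3d)
```

The linear kernel is unchanged between the two fits; only the feature space differs. A kernel such as rbf achieves a similar effect without building the extra feature by hand.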

#to run this in a Jupyter notebook (with an interactive widget):
from sklearn.svm import SVC
import matplotlib.pyplot as plt
import numpy as np
 
from sklearn.datasets import make_circles
X,y = make_circles(100, factor=0.1, noise=.1)
plt.scatter(X[:,0],X[:,1],c=y,s=50,cmap="rainbow")
 
def plot_svc_decision_function(model,ax=None):
    if ax is None:
        ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()
    
    x = np.linspace(xlim[0],xlim[1],30)
    y = np.linspace(ylim[0],ylim[1],30)
    Y,X = np.meshgrid(y,x) 
    xy = np.vstack([X.ravel(), Y.ravel()]).T
    P = model.decision_function(xy).reshape(X.shape)
    
    ax.contour(X, Y, P,colors="k",levels=[-1,0,1],alpha=0.5,linestyles=["--","-","--"])
    ax.set_xlim(xlim)
    ax.set_ylim(ylim)
 
clf = SVC(kernel = "linear").fit(X,y)
plt.scatter(X[:,0],X[:,1],c=y,s=50,cmap="rainbow")
plot_svc_decision_function(clf)
 
r = np.exp(-(X**2).sum(1))
 
rlim = np.linspace(min(r),max(r),100)
 
from mpl_toolkits import mplot3d
 
def plot_3D(elev=30,azim=30,X=X,y=y):
    ax = plt.subplot(projection="3d")
    ax.scatter3D(X[:,0],X[:,1],r,c=y,s=50,cmap='rainbow')
    ax.view_init(elev=elev,azim=azim)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_zlabel("r")
    plt.show()
 
from ipywidgets import interact,fixed
interact(plot_3D,elev=[0,30,60,90],azim=(-180,180),X=fixed(X),y=fixed(y))
plt.show()

[Figure: output_34_0.png]


interactive(children=(Dropdown(description='elev', index=1, options=(0, 30, 60, 90), value=30), IntSlider(valu…
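#in the example above, a kernel transform projects the data into a higher-dimensional space to find a hyperplane that separates it perfectly
#kernel: an important parameter of nonlinear SVM
clf = SVC(kernel = "rbf").fit(X,y)
plt.scatter(X[:,0],X[:,1],c=y,s=50,cmap="rainbow")
plot_svc_decision_function(clf)
#here the rbf (Gaussian radial basis function) kernel solves this nonlinear classification problem perfectly

[Figure: output_36_0.png]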
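
The scores so far are measured on the training data itself. As a hedged cross-check, the sketch below holds out 30% of a circles dataset and compares the two kernels on the unseen part. The sample size of 200, the fixed `random_state=0` and the `scores` dictionary are choices made for this illustration, not part of the notes.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Larger sample and fixed seeds are assumptions made for a stable illustration
X, y = make_circles(200, factor=0.1, noise=0.1, random_state=0)
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3, random_state=0)

scores = {}
for kernel in ["linear", "rbf"]:
    clf = SVC(kernel=kernel).fit(Xtrain, ytrain)
    scores[kernel] = clf.score(Xtest, ytest)   # accuracy on held-out samples

print(scores)
```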

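2 Exploring how different kernels behave on different datasets, and their properties

#explore how different kernels behave on different datasets, and their properties
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap#colormap built from a list of colors
from sklearn import svm#from sklearn.svm import SVC works as well
from sklearn.datasets import make_circles, make_moons, make_blobs,make_classification#make_classification: two classes split roughly in half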

2.1 Create the datasets and choose the kernels

n_samples = 100
 
datasets = [
    make_moons(n_samples=n_samples, noise=0.2, random_state=0),
    make_circles(n_samples=n_samples, noise=0.2, factor=0.5, random_state=1),
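    make_blobs(n_samples=n_samples, centers=2, random_state=5),#clustered (blob) data
    make_classification(n_samples=n_samples,n_features = 2,n_informative=2,n_redundant=0, random_state=5)
                #n_features: total number of features; n_informative: number of informative features; n_redundant: number of redundant features (linear combinations of the informative ones)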
    ]
 
Kernel = ["linear","poly","rbf","sigmoid"]
 
#what do the four datasets look like?
for X,Y in datasets:
    plt.figure(figsize=(5,4)) 
    plt.scatter(X[:,0],X[:,1],c=Y,s=50,cmap="rainbow")

[Figure: output_40_0.png]

[Figure: output_40_1.png]

[Figure: output_40_2.png]

[Figure: output_40_3.png]

2.2 Build the subplots and loop to plot the classification results

# build the subplots
nrows=len(datasets)#number of rows
ncols=len(Kernel) + 1#number of columns
 
fig, axes = plt.subplots(nrows, ncols,figsize=(20,16))#figsize=(width, height), in inches

[Figure: output_41_0.png]

[*enumerate(datasets)]
[(0,
  (array([[-1.09443462e-02,  9.89784876e-01],
          [ 1.96749886e+00, -1.10921978e-01],
          [ 9.18923151e-01, -7.87831621e-03],
          [-1.97813183e-02,  3.67422878e-02],
          [ 8.97047211e-01, -5.26043067e-01],
          [ 2.05087697e+00,  4.82966687e-01],
          [ 5.52592656e-01,  5.10008493e-01],
          [ 9.36108682e-01, -6.67176177e-01],
          [-8.57905150e-03,  3.44030710e-01],
          [ 1.79962867e+00,  3.22578165e-01],
          [-1.79739813e-01,  5.12417381e-01],
          [ 1.96928635e+00, -1.84060982e-01],
          [ 1.40757108e+00, -6.55885144e-01],
          [ 1.04103920e+00,  1.04537944e+00],
          [ 6.11861752e-01,  5.09315861e-01],
          [-3.59476500e-01,  1.05930036e+00],
          [ 2.54029695e-01,  1.15116524e+00],
          [ 2.13555501e-01,  8.82321641e-01],
          [-3.30880800e-01,  8.04221145e-01],
          [ 1.06603845e+00,  5.32174106e-01],
          [-7.06988363e-01,  5.63246401e-01],
          [ 4.02928450e-01, -1.95330382e-01],
          [ 7.83527128e-01,  5.65637444e-01],
          [ 7.29264348e-01, -4.64258931e-01],
          [-7.61131674e-01, -1.74321350e-03],
          [-8.30401440e-01, -2.33952062e-01],
          [ 7.12873757e-01,  3.33441281e-01],
          [ 2.12091446e+00,  1.51388354e-01],
          [ 1.76738365e+00, -1.38842428e-02],
          [ 1.35429861e+00, -2.35239859e-01],
          [ 3.82226943e-01, -1.29870625e-01],
          [ 1.15255238e+00, -8.36624186e-01],
          [ 1.85603425e+00, -2.25641253e-02],
          [ 4.78053620e-01, -3.54658215e-01],
          [ 4.65065876e-02,  5.22966374e-01],
          [-1.68749515e-01,  9.97161466e-01],
          [ 2.17677252e-01,  9.71890153e-01],
          [ 1.45168696e-01,  2.06362619e-01],
          [-6.04440255e-02,  4.86891449e-02],
          [ 1.00652060e+00, -5.83659180e-01],
          [ 1.34599608e+00, -8.74713518e-03],
          [ 5.07344926e-01, -3.11872588e-01],
          [-8.84426881e-01,  1.75672048e-01],
          [-1.00353955e+00,  2.54679349e-01],
          [ 1.00682339e+00,  3.36434579e-01],
          [ 8.11581056e-01,  1.19684303e+00],
          [ 6.05383054e-01,  1.34346598e+00],
          [-5.25267589e-01,  6.67755643e-01],
          [-9.36918623e-01,  3.24010896e-01],
          [ 8.32721148e-01,  2.07541427e-01],
          [ 1.56011397e+00, -1.61052076e-03],
          [-2.00343863e-01, -1.71769945e-01],
          [ 8.14368163e-01,  2.98144383e-01],
          [-5.33016793e-01,  7.25851388e-01],
          [ 1.39949996e-01,  5.16100416e-01],
          [ 1.30241869e-01,  2.73900710e-01],
          [ 6.05976627e-01,  8.71416086e-01],
          [-3.55599199e-01,  4.28344752e-01],
          [ 1.80905518e-01,  1.21324092e+00],
          [-6.86271500e-02,  4.98563121e-01],
          [ 6.91482442e-01,  7.02335678e-01],
          [-3.83113433e-01,  9.66746318e-01],
          [ 2.98656366e-01, -1.83495206e-01],
          [ 1.17897990e-01, -2.31064511e-01],
          [ 9.04410734e-01, -6.86183692e-01],
          [ 1.27108202e+00, -3.39556126e-01],
          [-2.52941845e-01,  9.36590815e-01],
          [ 1.58149755e+00, -5.26620862e-01],
          [ 7.04126938e-01,  6.45019632e-01],
          [ 2.05387806e+00, -4.99221849e-01],
          [ 2.78958975e-01,  8.79248341e-01],
          [-7.28199738e-01,  9.21967277e-01],
          [-9.21538389e-01,  4.83269613e-02],
          [ 2.01257720e+00,  2.06208601e-01],
          [ 2.09649727e+00,  4.53952338e-01],
          [ 4.55121438e-01, -5.98476065e-01],
          [ 3.25942701e-01,  1.06336046e+00],
          [ 1.80917678e+00,  3.67632943e-01],
          [ 7.05843148e-01, -4.60516884e-01],
          [ 1.24497910e+00, -4.89751662e-01],
          [-1.02501239e-01,  1.16337954e+00],
          [-6.13951804e-01,  9.35134524e-01],
          [ 1.32828610e+00, -2.76080239e-01],
          [-9.10782155e-01,  4.00675696e-01],
          [-8.86192869e-01, -1.04843093e-01],
          [ 1.59251994e-01, -3.50710293e-02],
          [ 1.74397698e-01, -4.22503039e-02],
          [ 7.28423493e-01,  4.19640376e-01],
          [ 1.70911154e+00, -1.84104334e-01],
          [ 5.76918298e-01, -2.31102480e-01],
          [ 1.68451790e+00, -3.48465285e-01],
          [-7.71915511e-01,  6.67699774e-01],
          [ 2.75312996e-01,  8.48999530e-01],
          [ 1.42394089e+00, -6.57814970e-01],
          [ 1.05961304e+00,  5.80782652e-01],
          [ 1.79348845e+00,  4.94760237e-01],
          [ 3.24902726e-01, -5.63880560e-01],
          [ 7.60779589e-01, -3.73787587e-01],
          [-1.20959690e+00,  3.24700620e-01],
          [-1.18089701e+00,  2.27800799e-01]]),
   array([0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1,
          0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0,
          0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1,
          0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0,
          1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0], dtype=int64))),
 (1,
  (array([[-0.38289117, -0.09084004],
          [-0.02096229, -0.47787416],
          [-0.39611596, -1.28942694],
          [-0.61813018, -0.06383715],
          [ 0.70347838, -0.18703837],
          [-0.45970463,  0.69477465],
          [-0.45091682, -0.71570524],
          [-0.45562004, -0.13406016],
          [-0.26513904,  0.40812871],
          [-0.15474648,  0.41406973],
          [ 0.231206  , -0.53275899],
          [ 0.15623875, -0.8678088 ],
          [ 0.51647541,  0.48940995],
          [ 0.68707007, -0.02334129],
          [ 0.54759869, -0.16482373],
          [-0.32179572, -0.80553536],
          [-1.07478639, -0.733362  ],
          [ 0.76758455, -0.43498783],
          [-0.47475234, -0.33813186],
          [ 0.15420656,  1.07306032],
          [ 0.65916696,  0.20773634],
          [-0.77795003,  0.1326555 ],
          [ 0.47025454, -0.31228748],
          [-0.04246799,  0.26555446],
          [-0.72405954,  0.48807185],
          [-0.36960005, -1.06514028],
          [ 0.17833327, -0.49718972],
          [-0.93927864, -0.41951638],
          [ 0.50914152, -0.70977467],
          [-0.05569852, -0.82162607],
          [-0.11214579,  0.72197044],
          [ 0.80463921, -0.15221296],
          [ 0.08261487, -0.11749021],
          [ 0.20349541, -0.37396789],
          [ 0.13864693, -0.23905642],
          [ 0.32785307, -1.00769037],
          [ 0.88944061, -0.39117628],
          [-0.05837947,  0.28487039],
          [-1.0673653 ,  0.2204006 ],
          [-0.60071345, -0.69545189],
          [-0.03972324, -0.40936056],
          [ 0.39742085,  0.20621162],
          [-0.36941154,  0.0129811 ],
          [ 0.03573703,  0.46666229],
          [-0.56814999, -0.41288419],
          [ 0.41047299, -0.73640868],
          [ 0.88249707, -0.69004404],
          [ 0.06579822, -0.50458395],
          [-0.75737223, -0.0724028 ],
          [ 0.18316966,  0.08722007],
          [ 0.67248314, -0.41892665],
          [ 0.25898723,  0.39688645],
          [-1.1312983 ,  0.4810614 ],
          [ 1.0592844 ,  0.64490287],
          [ 0.41019663,  0.38790198],
          [ 0.95142029, -0.04089983],
          [-0.60492988,  0.43950906],
          [ 0.23314762, -0.81785711],
          [ 0.91067331,  0.30702075],
          [-0.45026472, -0.03724104],
          [-0.81396121, -0.64733959],
          [-0.23191338,  0.50533992],
          [-0.59760983,  0.28023168],
          [ 0.73960166, -0.84270281],
          [ 0.57294659, -0.31198928],
          [ 0.24821133, -0.54784509],
          [ 0.52127802,  0.94108005],
          [ 0.33973198,  0.10609978],
          [ 1.05339036, -0.02197593],
          [ 0.01327466, -0.63379502],
          [ 0.2422589 ,  0.49032064],
          [-0.89266612,  0.6345076 ],
          [ 0.1672566 ,  0.23548462],
          [-0.05611705,  0.38834099],
          [ 0.84695486,  0.81435811],
          [ 0.29976195, -0.07943031],
          [-0.1404762 ,  0.72486032],
          [-0.05482024,  0.18417328],
          [-0.24643884, -0.43283337],
          [-0.23460645,  0.6409442 ],
          [-1.13184893, -0.61964942],
          [-0.92413821, -0.45302089],
          [ 0.2225745 ,  0.77052597],
          [-0.69453765,  0.53014147],
          [-1.0362509 ,  0.77339965],
          [ 0.51880585,  0.30152232],
          [-0.77429541,  0.02553767],
          [ 0.71468326,  0.56869015],
          [-0.33875274,  0.46826063],
          [-0.34749244,  0.13441418],
          [ 1.12980796,  0.04281936],
          [-0.38308979,  0.79116812],
          [-0.07425141,  0.2184625 ],
          [-0.44945202, -0.05722266],
          [ 0.85783288,  0.63778888],
          [-0.47486203, -0.22498112],
          [ 0.12627243,  0.86978412],
          [-0.64736458, -0.36342437],
          [ 0.47440459,  1.01101585],
          [-0.38565772, -0.81031183]]),
   array([1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1,
          1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0,
          1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1,
          0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0,
          1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0], dtype=int64))),
 (2,
  (array([[-5.66730056,  9.6747529 ],
          [-5.02967294,  8.6596218 ],
          [-6.46936898,  6.82300947],
          [-6.75290119,  7.20976961],
          [-4.14673856,  7.63590025],
          [-5.38927072,  7.95803989],
          [-4.64344634,  7.55830615],
          [-4.46345027,  8.83731915],
          [-4.68748196,  7.21252795],
          [-5.84140615,  8.13223642],
          [-5.36728768,  9.39035577],
          [-5.95944934,  8.31654712],
          [-4.4207936 ,  7.90908652],
          [-6.90411213,  8.20050911],
          [-4.81440963,  8.19155371],
          [-4.95550633,  8.99006291],
          [-4.76708326,  6.78307449],
          [-3.12936539,  7.16255399],
          [-6.43201576,  6.99213819],
          [-7.12970024,  7.7990026 ],
          [-7.61935775,  9.00251464],
          [-4.40874557,  9.27197713],
          [-4.41862884,  6.91289058],
          [-5.8659896 ,  6.93691471],
          [-6.93064951,  8.76263877],
          [-5.56001672,  8.89406765],
          [-3.47162189,  7.76156545],
          [-6.51197348,  8.920786  ],
          [-5.89647284,  7.31403178],
          [-5.04948453,  8.09078804],
          [-5.65507444,  8.71871991],
          [-6.16662202,  7.25381173],
          [-3.73877184,  6.98761474],
          [-7.4653154 ,  9.24937097],
          [-5.9958405 ,  8.38682543],
          [-5.80084772,  6.59052267],
          [-6.43266806,  8.78589697],
          [-5.64701218,  8.97617842],
          [-6.27060303,  7.19945832],
          [-7.22492511,  6.71446709],
          [-7.55929193,  8.67586662],
          [-5.92496887,  5.98552041],
          [-6.37587295,  8.88947751],
          [-7.43787746,  8.95340227],
          [-6.15353935,  8.20288407],
          [-6.54074446,  6.55779297],
          [-4.42173579,  8.81517442],
          [-5.61244473,  7.66386378],
          [-5.81966167,  7.88508551],
          [-4.90351711,  7.53945295],
          [-5.39791168,  8.47356295],
          [-6.86121319,  9.27910763],
          [-6.32089689,  6.7034829 ],
          [-7.64676819,  7.62431237],
          [-5.1184643 ,  9.83531364],
          [-5.36709327,  7.66612429],
          [-6.23402642,  6.12310003],
          [-5.06657892,  7.91513345],
          [-5.56633149,  7.31357851],
          [-5.50099233,  7.05133525],
          [-4.54171545,  8.47599756],
          [-8.20123871,  7.20493971],
          [-6.15849651,  7.17122642],
          [-5.61579958,  9.51942024],
          [-4.76771396,  7.58541058],
          [-6.00874331,  8.14802833],
          [-4.37107585,  7.3410528 ],
          [-6.13764981,  8.56685089],
          [-5.36247649,  8.7494947 ],
          [-5.45052674,  8.99712724],
          [-7.09800301,  9.09917142],
          [-4.90935504,  7.7337076 ],
          [-5.40795882, 10.61018377],
          [-4.41488335,  8.97908848],
          [-5.45939839,  7.7700846 ],
          [-4.93810069, 10.01217222],
          [-6.00556657,  6.93252594],
          [-4.99200386,  7.42740444],
          [-7.07131614,  8.05949363],
          [-5.29052417,  8.70660951],
          [-6.27313606,  7.5595934 ],
          [-4.56369675,  8.12706739],
          [-3.79200136,  9.08201602],
          [-6.71208551, 10.89302579],
          [-7.34687609,  8.35527284],
          [-4.65333349,  7.73756566],
          [-5.63928794,  6.72181978],
          [-5.37253335,  7.08477617],
          [-5.36599885,  7.22067391],
          [-8.41982454,  8.20401253],
          [-3.63234608,  8.33751606],
          [-4.12716808,  9.83742304],
          [-6.75503763,  7.07126671],
          [-5.55684774,  7.30871568],
          [-5.91896553,  8.01811773],
          [-7.14524008,  8.4087608 ],
          [-6.72483849,  6.0175721 ],
          [-6.24123774,  6.95029361],
          [-6.87090971,  6.72508089],
          [-6.27460923,  7.61738757]]),
   array([0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0,
          0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0,
          0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1,
          0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0,
          1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1]))),
 (3,
  (array([[-9.33292655e-01, -1.27122049e+00],
          [-7.66789307e-01,  1.77992147e+00],
          [ 9.03096345e-01,  1.44981703e+00],
          [-1.58918963e+00, -8.32751056e-02],
          [ 9.28016429e-01, -1.23398104e+00],
          [ 1.35985803e+00,  1.51790366e+00],
          [ 5.64146897e-01,  8.34197749e-01],
          [ 3.38467944e-01, -1.54519884e+00],
          [-1.21399342e+00,  4.89499469e-01],
          [-3.51523809e-01,  1.05604000e+00],
          [ 1.42192482e+00,  1.30870092e+00],
          [ 1.18748491e+00,  9.31400904e-01],
          [ 8.19270631e-01,  1.48514069e+00],
          [-1.59251151e+00,  4.85137234e-02],
          [ 1.85219944e-01,  1.53870339e+00],
          [ 1.02943234e+00, -1.81742875e+00],
          [ 1.65055723e+00, -1.19890744e+00],
          [-3.69716837e-01,  1.67597179e+00],
          [-1.77851943e+00,  4.23377975e-01],
          [-4.81676607e-01, -2.04460722e+00],
          [ 2.19893663e+00,  2.99254258e-01],
          [ 4.16115190e-01, -7.85493949e-01],
          [-2.74842332e+00,  2.23314001e-01],
          [ 6.72127536e-01, -1.39095875e+00],
          [ 8.39858504e-01,  1.55542390e+00],
          [ 1.12529264e+00,  2.74200610e-01],
          [-1.22108822e+00, -4.41750034e-01],
          [ 2.78739455e+00, -1.61112841e-03],
          [ 3.53203076e-01,  1.12240129e-01],
          [ 3.10603047e-01, -1.81672854e+00],
          [ 1.86080686e+00, -6.41503310e-01],
          [ 8.94746914e-01, -7.35762193e-01],
          [-1.07707343e+00, -9.91615820e-01],
          [-9.57932339e-01, -9.56397414e-01],
          [-1.96082646e-01, -7.15486741e-01],
          [ 3.23109232e+00, -6.33780279e-01],
          [-1.38817524e+00, -2.48688592e-01],
          [-1.06976424e+00,  9.66055142e-01],
          [ 2.28064080e-01, -1.01741538e+00],
          [ 5.71846228e-01,  1.17039597e+00],
          [-9.47886334e-01, -1.09494698e+00],
          [-2.28879834e+00,  9.91856748e-01],
          [ 2.79871886e-01,  7.12484148e-01],
          [-1.14485872e+00, -7.50568271e-01],
          [-8.08416962e-01, -1.35525390e+00],
          [ 1.15874630e+00,  7.35684201e-02],
          [-1.27191176e+00,  6.74231628e-01],
          [-6.68826500e-01,  1.40599404e+00],
          [-2.33048236e+00,  7.96103665e-01],
          [-1.27620109e+00,  6.92027964e-01],
          [-9.52783841e-01, -1.08481428e+00],
          [ 9.24379467e-02,  1.44550825e+00],
          [ 3.32723963e-01, -1.06577791e+00],
          [-1.02970093e+00, -8.31163579e-01],
          [ 1.30603973e+00,  4.14757047e-01],
          [-1.71750956e+00,  5.00547184e-01],
          [-1.99566063e+00,  1.00972464e+00],
          [ 7.13394018e-01,  1.88985067e+00],
          [-1.79392778e+00,  1.63824076e+00],
          [ 5.19077120e-01, -5.54142502e-01],
          [ 1.41713525e+00, -1.13359774e+00],
          [-8.48825636e-01, -1.14486125e+00],
          [ 1.81154534e+00,  4.97164230e-01],
          [-2.14823656e+00,  7.33412925e-01],
          [ 1.06731305e+00,  1.63059220e+00],
          [-1.73590855e+00,  3.20189254e-01],
          [ 1.02467738e+00, -1.25475559e+00],
          [ 2.21517183e+00, -5.96284790e-01],
          [ 1.14752935e+00,  8.57570034e-01],
          [-1.55245701e+00, -3.36171152e-02],
          [ 2.55716114e+00, -2.99789421e-01],
          [ 7.94490859e-01,  8.74280300e-01],
          [-1.16336461e+00,  2.32337605e-01],
          [-3.94536291e-01,  1.19313340e+00],
          [-1.13456945e+00, -7.49047339e-01],
          [ 1.00666441e+00, -1.55511830e+00],
          [-2.34109735e+00,  4.52143651e-01],
          [-7.90107843e-01, -1.40539861e+00],
          [-2.49600849e-02,  1.01556610e+00],
          [ 1.88879091e+00, -1.27159355e+00],
          [ 2.01763813e+00, -6.80769118e-01],
          [-7.21544129e-01, -1.56319001e+00],
          [ 3.28795018e-01,  9.88507261e-01],
          [-2.39486480e+00,  8.87560308e-01],
          [-9.37827394e-01,  8.55915239e-01],
          [ 5.93709424e-01, -8.83762427e-01],
          [ 2.68917440e-01,  1.76442408e+00],
          [-1.61918481e+00,  4.36660291e-01],
          [-2.25592991e-01, -2.72499883e+00],
          [ 2.81655475e-01,  6.83987133e-01],
          [-8.51055761e-01, -1.22979364e+00],
          [ 1.74848466e+00,  1.17389035e+00],
          [ 1.12440472e+00,  5.85181570e-01],
          [-5.00014133e-01,  1.58890117e+00],
          [ 7.13425522e-01,  1.11017445e+00],
          [ 5.70065432e-01,  1.58754104e+00],
          [-1.10825112e+00, -5.84147570e-01],
          [-1.04162242e+00, -8.80162451e-01],
          [ 7.51042801e-01, -1.50398321e+00],
          [ 8.50404935e-01, -7.86309791e-01]]),
   array([0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1,
          0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0,
          0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0,
          1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0,
          0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1])))]
[*enumerate(Kernel)]
[(0, 'linear'), (1, 'poly'), (2, 'rbf'), (3, 'sigmoid')]
[*enumerate(datasets)] == list(enumerate(datasets))
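#  lazy objects produced by enumerate, map and zip can be unpacked with [*...] or turned into a list with list()
# index,(X,Y) matches each element (index, (feature matrix X, labels Y)), so the index, the feature matrix and the label array are unpacked in one step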
True

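#outer loop: iterate over the datasets
for ds_cnt, (X,Y) in enumerate(datasets):
    
    #first column of the figure: the raw distribution of each of the 4 datasets
    ax = axes[ds_cnt, 0]#starts at row 0, column 0 of the subplot grid
    if ds_cnt == 0:
        ax.set_title("Input data")
    ax.scatter(X[:, 0], X[:, 1], c=Y, zorder=10, cmap=plt.cm.Paired,edgecolors='k')
    #zorder: drawing layer; larger values are drawn on top
    #plt.cm.Paired: colormap giving two clearly distinct colors
    #edgecolors: color of the marker edges
    ax.set_xticks(())
    ax.set_yticks(())
    
    #inner loop: iterate over the kernels
    #starting from the second column, fill in the classification results one by one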
    for est_idx, kernel in enumerate(Kernel):
        
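        #position of the subplot
        ax = axes[ds_cnt, est_idx + 1]#first iteration: row 0, column 1
        
        #fit the model (no test data; we only score on the training samples)
        clf = svm.SVC(kernel=kernel, gamma=2).fit(X, Y)#gamma=2: kernel coefficient for poly, rbf and sigmoid (ignored by linear)
        score = clf.score(X,Y)
        
        #scatter plot of the data itself
        ax.scatter(X[:, 0], X[:, 1], c=Y
                   ,zorder=10
                   ,cmap=plt.cm.Paired,edgecolors='k')
        #mark the support vectors
        ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=50,
                    facecolors='none', zorder=10, edgecolors='k')# facecolors='none': transparent fill
        #this draws an extra ring around every point that is a support vector
        
        #draw the decision boundary
        #enlarge the grid so that the data points do not fall on the edge of the image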
        x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
        y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
        
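        #np.mgrid combines what we did earlier with np.linspace and np.meshgrid
        #it builds the grid in one step from the minimum and maximum values
        #the syntax is [start:stop:step]
        #if the step is a complex number such as 200j, the integer part of its magnitude is the number of points created between start and stop, and stop is included
        XX, YY = np.mgrid[x_min:x_max:200j, y_min:y_max:200j]
        #np.c_ works like np.vstack(...).T here: the flattened x and y are joined column-wise into the coordinates of all grid points
        #decision_function computes the distance of every grid point to the separating hyperplane
        #reshape() puts the result into the shape that contour expects
        Z = clf.decision_function(np.c_[XX.ravel(), YY.ravel()]).reshape(XX.shape)
        #pcolormesh fills the regions on either side of the boundary, one color per side
        ax.pcolormesh(XX, YY, Z > 0, cmap=plt.cm.Paired)#Z is the distance: Z > 0 is one class, Z < 0 the other
        #draw the contour lines
        #three lines at distances -1, 0 and 1 from the separating hyperplane, all black, drawn dashed, solid, dashed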
        ax.contour(XX, YY, Z, colors=['k', 'k', 'k'], linestyles=['--', '-', '--'],
                    levels=[-1, 0, 1])
        
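        #hide the axis ticks
        ax.set_xticks(())
        ax.set_yticks(())
        
        #put the titles above the first row
        if ds_cnt == 0:
            ax.set_title(kernel)
            
        #add the classification score to every subplot
        #ax.text writes text onto the plot
        ax.text(0.95, 0.06, ('%.2f' % score).lstrip('0')
                #show the score with two decimals, dropping the leading "0" so only ".xx" is shown
                , size=15
                , bbox=dict(boxstyle='round', alpha=0.8, facecolor='white')
                    #bbox: a white box behind the score
                    #facecolor: fill color of the box
                , transform=ax.transAxes #coordinate system for the text position: the axes of this subplot
                , horizontalalignment='right' #horizontal alignment of the text relative to the given position
               )
 
plt.tight_layout()#pack the subplots tightly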
plt.show()
C:\Users\chen'bu'rong\AppData\Local\Temp\ipykernel_18372\4035770019.py:50: UserWarning: No contour levels were found within the data range.
  ax.contour(XX, YY, Z, colors=['k', 'k', 'k'], linestyles=['--', '-', '--'],

[Figure output_45_1.png: image not available in this copy]
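The np.mgrid comments in the plotting code can be checked on a tiny grid. This is a minimal sketch with made-up bounds (the numbers are not from the notes); it only shows that a complex step in np.mgrid gives the same grid as np.linspace plus np.meshgrid, and what np.c_ does to the flattened coordinates.

```python
import numpy as np

x_min, x_max, y_min, y_max = -1.0, 3.0, 0.0, 2.0  # illustrative bounds, not from the notes

# One call: a complex "step" of 5j / 4j means "5 (or 4) points, end value included"
XX, YY = np.mgrid[x_min:x_max:5j, y_min:y_max:4j]

# The same grid built the longer way; indexing="ij" matches mgrid's axis order
xs = np.linspace(x_min, x_max, 5)
ys = np.linspace(y_min, y_max, 4)
XX2, YY2 = np.meshgrid(xs, ys, indexing="ij")

# Flatten both coordinate arrays and pair them column-wise: one (x, y) row per grid point
grid_points = np.c_[XX.ravel(), YY.ravel()]

print(XX.shape, grid_points.shape)                  # (5, 4) (20, 2)
print(np.allclose(XX, XX2), np.allclose(YY, YY2))   # True True
```

The (20, 2) array is the shape decision_function expects, which is why the result has to be reshaped back to XX.shape before it is passed to contour.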

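# Summary
# [(0, 'linear'), (1, 'poly'), (2, 'rbf'), (3, 'sigmoid')]
# linear and poly are suited to linearly separable data
# rbf is suited to nonlinear data
# poly and rbf are widely used in image processing
# Suggested workflow:
# try rbf first to get a rough picture of the data; if rbf cannot classify it, the other kernels will most likely fail as well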
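The "try rbf first" workflow can be sketched on a toy dataset. The snippet below is only an illustration under assumed settings: make_circles and its parameters are my own choice, not the datasets used in the notes, and the scores are training accuracy for a quick look.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Toy data that no straight line can separate (parameters are illustrative)
X_demo, y_demo = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

scores = {}
for kernel in ["rbf", "linear", "poly", "sigmoid"]:
    clf = SVC(kernel=kernel, gamma="auto").fit(X_demo, y_demo)
    scores[kernel] = clf.score(X_demo, y_demo)  # training accuracy, for a quick look only
    print(kernel, round(scores[kernel], 3))
```

On ring-shaped data like this, rbf should come out clearly ahead of the linear kernel, which is the kind of signal the summary above relies on.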

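3 Exploring the strengths and weaknesses of kernel functions on the breast cancer dataset

The breast cancer dataset has 569 samples and 30 features. It is a high-dimensional dataset whose features sit on very different scales, and that strongly affects the results.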

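# Explore the strengths and weaknesses of the kernel functions
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split  # splits the data into training and test sets
import matplotlib.pyplot as plt
import numpy as np
from time import time  # used to measure run time
import datetime  # converts a timestamp into a readable date and time
# Goal: compare how fast the program runs under different kernels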
data = load_breast_cancer()
X = data.data
y = data.target
 
X.shape
(569, 30)
np.unique(y)
 
plt.scatter(X[:,0],X[:,1],c=y)
plt.show()
 

[Figure output_51_0.png: image not available in this copy]

from sklearn.decomposition import PCA
X_dr=PCA(2).fit_transform(X)
X_dr.shape
plt.scatter(X_dr[:,0],X_dr[:,1],c=y)
plt.show()

[Figure output_52_0.png: image not available in this copy]

Xtrain, Xtest, Ytrain, Ytest = train_test_split(X,y,test_size=0.3,random_state=420)
 
Kernel = ["linear","poly","rbf","sigmoid"]
 
now=time() # current timestamp
now
1664617506.168917
datetime.datetime.fromtimestamp(now).strftime("%Y-%m-%d,%H:%M:%S:%f")  # %f is microseconds
'2022-10-01,17:45:06:168917'
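The notes time each fit with time(). As an alternative sketch, time.perf_counter is a monotonic, high-resolution clock intended for measuring short intervals. The helper name timed_fit below is hypothetical and not part of the notes or of scikit-learn.

```python
from time import perf_counter

from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC


def timed_fit(model, X, y):
    """Fit `model` and return (fitted_model, elapsed_seconds). Hypothetical helper."""
    start = perf_counter()  # monotonic, high-resolution clock
    model.fit(X, y)
    return model, perf_counter() - start


X_bc, y_bc = load_breast_cancer(return_X_y=True)
clf, seconds = timed_fit(SVC(kernel="rbf", gamma="auto"), X_bc, y_bc)
print(f"rbf fit took {seconds:.4f} s")
```

The measured time depends on the machine, so it will not match the figures printed in the notes.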
'''for kernel in Kernel:
    time0 = time()
    clf= SVC(kernel = kernel
             , gamma="auto"
            # , degree = 1
             , cache_size=10000  # memory available for the kernel cache, in MB; the default is 200 MB
            ).fit(Xtrain,Ytrain)
    print("The accuracy under kernel %s is %f" % (kernel,clf.score(Xtest,Ytest)))
    print(time()-time0)
'''
The accuracy under kernel linear is 0.929825
0.4202580451965332
Kernel = ["linear","rbf","sigmoid"]
 
for kernel in Kernel:
    time0 = time()
    clf= SVC(kernel = kernel
             , gamma="auto"
            # , degree = 1  (degree of the polynomial kernel: 1 makes it linear; the default 3 is nonlinear)
             , cache_size=5000  # kernel cache size allowed, in MB
            ).fit(Xtrain,Ytrain)
    print("The accuracy under kernel %s is %f" % (kernel,clf.score(Xtest,Ytest)))
    print(time()-time0)

# On this unscaled data the linear kernel trains far more slowly than the nonlinear kernels

The accuracy under kernel linear is 0.929825
0.4206218719482422
The accuracy under kernel rbf is 0.596491
0.044010162353515625
The accuracy under kernel sigmoid is 0.596491
0.00500035285949707
Kernel = ["linear","poly","rbf","sigmoid"]
 
for kernel in Kernel:
    time0 = time()
    clf= SVC(kernel = kernel
             , gamma="auto"
             , degree = 1
             , cache_size=5000
            ).fit(Xtrain,Ytrain)
    print("The accuracy under kernel %s is %f" % (kernel,clf.score(Xtest,Ytest)))
    print(time()-time0)
    
# With degree=1 the polynomial kernel is fast here and its accuracy is high
The accuracy under kernel linear is 0.929825
0.4190938472747803
The accuracy under kernel poly is 0.923977
0.08101940155029297
The accuracy under kernel rbf is 0.596491
0.04300808906555176
The accuracy under kernel sigmoid is 0.596491
0.00500178337097168
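Why does degree=1 make the polynomial kernel behave like the linear one? With coef0=0 (the SVC default) the polynomial kernel (gamma * <x, x'> + coef0) ** degree reduces to gamma * <x, x'>, a rescaled linear kernel. The sketch below checks this on made-up data with the pairwise kernel functions from sklearn.metrics.pairwise.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))   # made-up data: 5 samples, 3 features
gamma = 1.0 / A.shape[1]      # what gamma="auto" resolves to: 1 / n_features

# Polynomial kernel: (gamma * <x, x'> + coef0) ** degree
K_poly = polynomial_kernel(A, degree=1, gamma=gamma, coef0=0.0)
K_lin = linear_kernel(A)

print(np.allclose(K_poly, gamma * K_lin))  # True
```

Scaling the kernel by gamma has the same effect as scaling C by gamma in a linear SVM, which is one plausible reason the poly and linear rows above report close but not identical accuracy and different run times.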
import pandas as pd
data = pd.DataFrame(X)
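data.describe([0.01,0.05,0.1,0.25,0.5,0.75,0.9,0.99]).T  # descriptive statistics
# The mean and std columns show that the features are on badly mismatched scales
# Compare the 1% quantile with the minimum and the 90% quantile with the maximum to judge whether a feature is roughly symmetric or skewed: a large gap means skew, toward whichever side has the larger gap
# The features with large values turn out to be skewed
# So the data needs to be standardized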


    count  mean  std  min  1%  5%  10%  25%  50%  75%  90%  99%  max
0   569.0  14.127292  3.524049  6.981000  8.458360  9.529200  10.260000  11.700000  13.370000  15.780000  19.530000  24.371600  28.11000
1   569.0  19.289649  4.301036  9.710000  10.930400  13.088000  14.078000  16.170000  18.840000  21.800000  24.992000  30.652000  39.28000
2   569.0  91.969033  24.298981  43.790000  53.827600  60.496000  65.830000  75.170000  86.240000  104.100000  129.100000  165.724000  188.50000
3   569.0  654.889104  351.914129  143.500000  215.664000  275.780000  321.600000  420.300000  551.100000  782.700000  1177.400000  1786.600000  2501.00000
4   569.0  0.096360  0.014064  0.052630  0.068654  0.075042  0.079654  0.086370  0.095870  0.105300  0.114820  0.132888  0.16340
5   569.0  0.104341  0.052813  0.019380  0.033351  0.040660  0.049700  0.064920  0.092630  0.130400  0.175460  0.277192  0.34540
6   569.0  0.088799  0.079720  0.000000  0.000000  0.004983  0.013686  0.029560  0.061540  0.130700  0.203040  0.351688  0.42680
7   569.0  0.048919  0.038803  0.000000  0.000000  0.005621  0.011158  0.020310  0.033500  0.074000  0.100420  0.164208  0.20120
8   569.0  0.181162  0.027414  0.106000  0.129508  0.141500  0.149580  0.161900  0.179200  0.195700  0.214940  0.259564  0.30400
9   569.0  0.062798  0.007060  0.049960  0.051504  0.053926  0.055338  0.057700  0.061540  0.066120  0.072266  0.085438  0.09744
10  569.0  0.405172  0.277313  0.111500  0.119740  0.160100  0.183080  0.232400  0.324200  0.478900  0.748880  1.291320  2.87300
11  569.0  1.216853  0.551648  0.360200  0.410548  0.540140  0.640400  0.833900  1.108000  1.474000  1.909400  2.915440  4.88500
12  569.0  2.866059  2.021855  0.757000  0.953248  1.132800  1.280200  1.606000  2.287000  3.357000  5.123200  9.690040  21.98000
13  569.0  40.337079  45.491006  6.802000  8.514440  11.360000  13.160000  17.850000  24.530000  45.190000  91.314000  177.684000  542.20000
14  569.0  0.007041  0.003003  0.001713  0.003058  0.003690  0.004224  0.005169  0.006380  0.008146  0.010410  0.017258  0.03113
15  569.0  0.025478  0.017908  0.002252  0.004705  0.007892  0.009169  0.013080  0.020450  0.032450  0.047602  0.089872  0.13540
16  569.0  0.031894  0.030186  0.000000  0.000000  0.003253  0.007726  0.015090  0.025890  0.042050  0.058520  0.122292  0.39600
17  569.0  0.011796  0.006170  0.000000  0.000000  0.003831  0.005493  0.007638  0.010930  0.014710  0.018688  0.031194  0.05279
18  569.0  0.020542  0.008266  0.007882  0.010547  0.011758  0.013012  0.015160  0.018730  0.023480  0.030120  0.052208  0.07895
19  569.0  0.003795  0.002646  0.000895  0.001114  0.001522  0.001710  0.002248  0.003187  0.004558  0.006185  0.012650  0.02984
20  569.0  16.269190  4.833242  7.930000  9.207600  10.534000  11.234000  13.010000  14.970000  18.790000  23.682000  30.762800  36.04000
21  569.0  25.677223  6.146258  12.020000  15.200800  16.574000  17.800000  21.080000  25.410000  29.720000  33.646000  41.802400  49.54000
22  569.0  107.261213  33.602542  50.410000  58.270400  67.856000  72.178000  84.110000  97.660000  125.400000  157.740000  208.304000  251.20000
23  569.0  880.583128  569.356993  185.200000  256.192000  331.060000  384.720000  515.300000  686.500000  1084.000000  1673.000000  2918.160000  4254.00000
24  569.0  0.132369  0.022832  0.071170  0.087910  0.095734  0.102960  0.116600  0.131300  0.146000  0.161480  0.188908  0.22260
25  569.0  0.254265  0.157336  0.027290  0.050094  0.071196  0.093676  0.147200  0.211900  0.339100  0.447840  0.778644  1.05800
26  569.0  0.272188  0.208624  0.000000  0.000000  0.018360  0.045652  0.114500  0.226700  0.382900  0.571320  0.902380  1.25200
27  569.0  0.114606  0.065732  0.000000  0.000000  0.024286  0.038460  0.064930  0.099930  0.161400  0.208940  0.269216  0.29100
28  569.0  0.290076  0.061867  0.156500  0.176028  0.212700  0.226120  0.250400  0.282200  0.317900  0.360080  0.486908  0.66380
29  569.0  0.083946  0.018061  0.055040  0.058580  0.062558  0.065792  0.071460  0.080040  0.092080  0.106320  0.140628  0.20750
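Reading skew off the quantile table works, but it can also be computed directly. The sketch below is an assumption-level add-on rather than part of the original notes: it uses pandas' DataFrame.skew() for per-feature sample skewness and compares the smallest and largest standard deviations to show the scale mismatch.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

bc = load_breast_cancer()
df = pd.DataFrame(bc.data, columns=bc.feature_names)

# Sample skewness per feature: about 0 for symmetric data, positive for a long right tail
skew = df.skew().sort_values(ascending=False)
print(skew.head())

# Smallest and largest per-feature standard deviation, a quick view of the scale mismatch
print(df.std().min(), df.std().max())
```

Features with a large positive skew are the ones whose maximum sits far above the upper quantiles in the table.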
from sklearn.preprocessing import StandardScaler
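X = StandardScaler().fit_transform(X)  # standardize each feature to mean 0 and variance 1 (this rescales the data, it does not make it Gaussian)
data = pd.DataFrame(X)
data.describe([0.01,0.05,0.1,0.25,0.5,0.75,0.9,0.99]).T  # the means are now all close to 0 and the standard deviations close to 1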
    count  mean  std  min  1%  5%  10%  25%  50%  75%  90%  99%  max
0   569.0  -3.162867e-15  1.00088  -2.029648  -1.610057  -1.305923  -1.098366  -0.689385  -0.215082  0.469393  1.534446  2.909529  3.971288
1   569.0  -6.530609e-15  1.00088  -2.229249  -1.945253  -1.443165  -1.212786  -0.725963  -0.104636  0.584176  1.326975  2.644095  4.651889
2   569.0  -7.078891e-16  1.00088  -1.984504  -1.571053  -1.296381  -1.076672  -0.691956  -0.235980  0.499677  1.529432  3.037982  3.976130
3   569.0  -8.799835e-16  1.00088  -1.454443  -1.249201  -1.078225  -0.947908  -0.667195  -0.295187  0.363507  1.486075  3.218702  5.250529
4   569.0  6.132177e-15  1.00088  -3.112085  -1.971730  -1.517125  -1.188910  -0.710963  -0.034891  0.636199  1.313694  2.599511  4.770911
5   569.0  -1.120369e-15  1.00088  -1.610136  -1.345369  -1.206849  -1.035527  -0.747086  -0.221940  0.493857  1.347811  3.275782  4.568425
6   569.0  -4.421380e-16  1.00088  -1.114873  -1.114873  -1.052316  -0.943046  -0.743748  -0.342240  0.526062  1.434288  3.300560  4.243589
7   569.0  9.732500e-16  1.00088  -1.261820  -1.261820  -1.116837  -0.974010  -0.737944  -0.397721  0.646935  1.328412  2.973759  3.927930
8   569.0  -1.971670e-15  1.00088  -2.744117  -1.885853  -1.448032  -1.153036  -0.703240  -0.071627  0.530779  1.233221  2.862418  4.484751
9   569.0  -1.453631e-15  1.00088  -1.819865  -1.600987  -1.257643  -1.057477  -0.722639  -0.178279  0.470983  1.342243  3.209454  4.910919
10  569.0  -9.076415e-16  1.00088  -1.059924  -1.030184  -0.884517  -0.801577  -0.623571  -0.292245  0.266100  1.240514  3.198294  8.906909
11  569.0  -8.853492e-16  1.00088  -1.554264  -1.462915  -1.227791  -1.045885  -0.694809  -0.197498  0.466552  1.256518  3.081820  6.655279
12  569.0  1.773674e-15  1.00088  -1.044049  -0.946900  -0.858016  -0.785049  -0.623768  -0.286652  0.243031  1.117354  3.378079  9.461986
13  569.0  -8.291551e-16  1.00088  -0.737829  -0.700152  -0.637545  -0.597942  -0.494754  -0.347783  0.106773  1.121579  3.021867  11.041842
14  569.0  -7.541809e-16  1.00088  -1.776065  -1.327593  -1.116972  -0.939031  -0.624018  -0.220335  0.368355  1.123053  3.405812  8.029999
15  569.0  -3.921877e-16  1.00088  -1.298098  -1.160988  -0.982870  -0.911510  -0.692926  -0.281020  0.389654  1.236492  3.598943  6.143482
16  569.0  7.917900e-16  1.00088  -1.057501  -1.057501  -0.949654  -0.801336  -0.557161  -0.199065  0.336752  0.882848  2.997338  12.072680
17  569.0  -2.739461e-16  1.00088  -1.913447  -1.913447  -1.292055  -1.022462  -0.674490  -0.140496  0.472657  1.117927  3.146456  6.649601
18  569.0  -3.108234e-16  1.00088  -1.532890  -1.210240  -1.063590  -0.911757  -0.651681  -0.219430  0.355692  1.159654  3.834036  7.071917
19  569.0  -3.366766e-16  1.00088  -1.096968  -1.014237  -0.859880  -0.788466  -0.585118  -0.229940  0.288642  0.904208  3.349301  9.851593
20  569.0  -2.333224e-15  1.00088  -1.726901  -1.462332  -1.187658  -1.042700  -0.674921  -0.269040  0.522016  1.535063  3.001373  4.094189
21  569.0  1.763674e-15  1.00088  -2.223994  -1.706020  -1.482403  -1.282757  -0.748629  -0.043516  0.658341  1.297666  2.625885  3.885905
22  569.0  -1.198026e-15  1.00088  -1.693361  -1.459232  -1.173717  -1.044983  -0.689578  -0.285980  0.540279  1.503553  3.009644  4.287337
23  569.0  5.049661e-16  1.00088  -1.222423  -1.097625  -0.966014  -0.871684  -0.642136  -0.341181  0.357589  1.393000  3.581882  5.930172
24  569.0  -5.213170e-15  1.00088  -2.682695  -1.948882  -1.605910  -1.289152  -0.691230  -0.046843  0.597545  1.276124  2.478455  3.955374
25  569.0  -2.174788e-15  1.00088  -1.443878  -1.298811  -1.164575  -1.021571  -0.681083  -0.269501  0.539669  1.231407  3.335783  5.112877
26  569.0  6.856456e-16  1.00088  -1.305831  -1.305831  -1.217748  -1.086814  -0.756514  -0.218232  0.531141  1.435090  3.023359  4.700669
27  569.0  -1.412656e-16  1.00088  -1.745063  -1.745063  -1.375270  -1.159448  -0.756400  -0.223469  0.712510  1.436382  2.354181  2.685877
28  569.0  -2.289567e-15  1.00088  -2.160960  -1.845039  -1.251767  -1.034661  -0.641864  -0.127409  0.450138  1.132518  3.184317  6.046041
29  569.0  2.575171e-15  1.00088  -1.601839  -1.405690  -1.185223  -1.006009  -0.691912  -0.216444  0.450762  1.239884  3.141089  6.846856
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X,y,test_size=0.3,random_state=420)
 
Kernel = ["linear","poly","rbf","sigmoid"]
 
for kernel in Kernel:
    time0 = time()
    clf= SVC(kernel = kernel
             , gamma="auto"
             , degree = 1
             , cache_size=5000
            ).fit(Xtrain,Ytrain)
    print("The accuracy under kernel %s is %f" % (kernel,clf.score(Xtest,Ytest)))
    print(time()-time0)
    
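# The polynomial kernel is extremely slow when it has to compute high-degree terms

# Removing the scale differences by standardizing the dataset markedly speeds up the kernels (rbf, polynomial)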
The accuracy under kernel linear is 0.976608
0.007682085037231445
The accuracy under kernel poly is 0.964912
0.00400090217590332
The accuracy under kernel rbf is 0.970760
0.007001638412475586
The accuracy under kernel sigmoid is 0.953216
0.00500178337097168

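Standardizing the dataset so that all features share a common scale markedly speeds up kernel SVMs (rbf, polynomial). In the runs above, the linear-kernel fit time drops from about 0.42 s to about 0.008 s, rbf accuracy rises from 0.596491 to 0.970760, and sigmoid accuracy rises from 0.596491 to 0.953216.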
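The notes standardize the full dataset before splitting it. A common alternative, sketched here as an assumption rather than as the method used in the notes, wraps the scaler and the classifier in a Pipeline so that the scaler is fitted on the training split only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_raw, y_raw = load_breast_cancer(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X_raw, y_raw, test_size=0.3, random_state=420)

# The scaler is fitted on the training split only, then reused on the test split
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="auto"))
model.fit(Xtr, ytr)
pipeline_score = model.score(Xte, yte)
print(pipeline_score)
```

The score will usually land close to the figures reported above, but it need not be identical, because the scaling statistics now come from the training rows alone.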

3.1 Choosing the kernel-related parameters: degree, gamma and coef0


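degree: integer, default 3; used only by the poly kernel
gamma: float, used by rbf, poly and sigmoid; "auto" means 1/n_features (since scikit-learn 0.22 the default is "scale", i.e. 1/(n_features * X.var()))
coef0: float, default 0.0, takes values >= 0; used by poly and sigmoid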
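To make the roles of degree, gamma and coef0 concrete, the sketch below writes out the three parameterized kernels in NumPy and compares each with the matching function in sklearn.metrics.pairwise. The data and parameter values are made up for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel, sigmoid_kernel

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))  # made-up samples
B = rng.normal(size=(2, 3))
gamma, coef0, degree = 0.5, 1.0, 3  # illustrative values

dot = A @ B.T                                                  # <x, x'> for every pair
sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)   # ||x - x'||^2 for every pair

K_poly = (gamma * dot + coef0) ** degree   # poly:    (gamma * <x, x'> + coef0) ** degree
K_rbf = np.exp(-gamma * sq_dist)           # rbf:     exp(-gamma * ||x - x'||^2)
K_sig = np.tanh(gamma * dot + coef0)       # sigmoid: tanh(gamma * <x, x'> + coef0)

print(np.allclose(K_poly, polynomial_kernel(A, B, degree=degree, gamma=gamma, coef0=coef0)))
print(np.allclose(K_rbf, rbf_kernel(A, B, gamma=gamma)))
print(np.allclose(K_sig, sigmoid_kernel(A, B, gamma=gamma, coef0=coef0)))
```

Each check should print True. The formulas also show why rbf has only gamma to tune, while poly responds to all three parameters.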
score = []
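gamma_range = np.logspace(-2, 0, 200)  # 200 values evenly spaced on a log scale, from 10**-2 to 10**0
# np.logspace(a, b, n) is the same as 10 ** np.linspace(a, b, n): the exponents are evenly spaced, not the values
# gamma must be positive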
for i in gamma_range:
    clf = SVC(kernel="rbf",gamma = i,cache_size=5000).fit(Xtrain,Ytrain)
    score.append(clf.score(Xtest,Ytest))
    
print(max(score), gamma_range[score.index(max(score))])
plt.plot(gamma_range,score)
plt.show()

0.9766081871345029 0.011489510001873092

[Figure output_62_1.png: image not available in this copy]

from sklearn.model_selection import StratifiedShuffleSplit  # provides the cross-validation splits for the grid search
from sklearn.model_selection import GridSearchCV  # grid search with cross-validation

time0 = time()

gamma_range = np.logspace(-10,1,50)
coef0_range = np.linspace(0,5,10)

param_grid = dict(gamma = gamma_range
                  ,coef0 = coef0_range)
cv = StratifiedShuffleSplit(n_splits=5, test_size=0.3, random_state=420)  # 5 random stratified splits, each holding out 30% of the data as the test set
grid = GridSearchCV(SVC(kernel = "poly",degree=1,cache_size=5000)
                        ,param_grid=param_grid  # the parameter values to enumerate
                        ,cv=cv)
grid.fit(X, y)

print("The best parameters are %s with a score of %0.5f" % (grid.best_params_, grid.best_score_))
print(time()-time0)
The best parameters are {'coef0': 0.0, 'gamma': 0.09540954763499924} with a score of 0.96959
13.33350944519043
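Beyond best_params_ and best_score_, the fitted search keeps every combination in cv_results_. The sketch below reruns a deliberately tiny grid (my own illustrative values, not the grid from the notes) and loads the results into a DataFrame for inspection.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_bc, y_bc = load_breast_cancer(return_X_y=True)
X_bc = StandardScaler().fit_transform(X_bc)  # same preprocessing as in the notes

# Deliberately small grid so the sketch runs quickly (values are illustrative)
param_grid = {"gamma": np.logspace(-3, 0, 4), "coef0": [0.0, 1.0]}
cv = StratifiedShuffleSplit(n_splits=5, test_size=0.3, random_state=420)
grid = GridSearchCV(SVC(kernel="poly", degree=1), param_grid=param_grid, cv=cv).fit(X_bc, y_bc)

# One row per parameter combination, best first
results = pd.DataFrame(grid.cv_results_)[
    ["param_gamma", "param_coef0", "mean_test_score", "std_test_score", "rank_test_score"]
].sort_values("rank_test_score")
print(results.head())
```

Looking at the whole table shows how flat or sharp the optimum is, which a single best_params_ dictionary cannot show.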

4 The key soft-margin parameter C: the penalty coefficient on the slack variables

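# Hard margin vs. soft margin: the key parameter C is the penalty coefficient on the slack variables
# The larger C is, the narrower the soft margin becomes

# When choosing the separating hyperplane, a wide margin is preferred over a small training error
# C is what balances the maximum margin against the number of misclassified samples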
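The effect of C on the margin can be measured directly for a linear kernel, where the margin width is 2 / ||w||. This is a minimal sketch on made-up overlapping blobs; the dataset and the three C values are illustrative assumptions, not taken from the notes.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping blobs, so a hard margin is impossible (parameters are illustrative)
X_demo, y_demo = make_blobs(n_samples=100, centers=2, cluster_std=1.5, random_state=0)

widths = {}
for C in [0.01, 1, 100]:
    clf = SVC(kernel="linear", C=C).fit(X_demo, y_demo)
    w = clf.coef_[0]
    widths[C] = 2 / np.linalg.norm(w)  # distance between the two margin lines
    print(f"C={C}: margin width={widths[C]:.3f}, support vectors={clf.n_support_.sum()}")
```

A small C tolerates more points inside the margin and keeps it wide; a large C penalizes those points heavily and squeezes the margin.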

# Tune C for the linear kernel
score = []
C_range = np.linspace(0.01,3,50)
for i in C_range:
    clf = SVC(kernel="linear",C=i,cache_size=5000).fit(Xtrain,Ytrain)
    score.append(clf.score(Xtest,Ytest))
print(max(score), C_range[score.index(max(score))])
plt.plot(C_range,score)
plt.show() 
0.9766081871345029 0.19306122448979593

[Figure output_64_1.png: image not available in this copy]

# Switch to the rbf kernel
score = []
C_range = np.linspace(0.01,30,50)
for i in C_range:
    clf = SVC(kernel="rbf",C=i,gamma = 0.012742749857031322,cache_size=5000).fit(Xtrain,Ytrain)
    score.append(clf.score(Xtest,Ytest))
    
print(max(score), C_range[score.index(max(score))])
plt.plot(C_range,score)
plt.show()
# The score peaks where the soft margin strikes the right balance
0.9824561403508771 6.130408163265306

[Figure output_65_1.png: image not available in this copy]

score = []
C_range = np.linspace(5,7,50)
for i in C_range:
    clf = SVC(kernel="rbf",C=i,gamma = 0.012742749857031322,cache_size=5000).fit(Xtrain,Ytrain)
    score.append(clf.score(Xtest,Ytest))
    
print(max(score), C_range[score.index(max(score))])
plt.plot(C_range,score)
plt.show()
 
0.9824561403508771 5.938775510204081

[Figure output_66_1.png: image not available in this copy]
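The notes tune gamma first and then C with gamma held fixed. Since the two parameters interact, one possible follow-up is to search them jointly. The sketch below is an assumption-level example: the coarse ranges are illustrative, and the scaler sits inside a Pipeline so that it is refitted within every cross-validation split.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_bc, y_bc = load_breast_cancer(return_X_y=True)

pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
# Step names from make_pipeline are the lowercased class names, hence the "svc__" prefix
param_grid = {
    "svc__C": np.linspace(0.01, 30, 8),      # coarse, illustrative ranges
    "svc__gamma": np.logspace(-3, 0, 6),
}
cv = StratifiedShuffleSplit(n_splits=5, test_size=0.3, random_state=420)
search = GridSearchCV(pipe, param_grid=param_grid, cv=cv).fit(X_bc, y_bc)
print(search.best_params_, round(search.best_score_, 5))
```

The cross-validated score is not directly comparable with the single test-split scores printed above, since it averages over five different splits.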

