[Machine Learning] Support Vector Machines

Y小夜

Article link: https://blog.csdn.net/shsjssnn/article/details/138543380

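🎯Goals of this article

(1) Understand the basic principles of support vector machines

(2) Be able to use sklearn.datasets to generate classification datasets

(3) Be able to use the sklearn library to train support vector machine models and make predictions

(4) Understand how the kernel function and the gamma and C parameters affect a support vector machine model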

🎯Understanding the basic principles of support vector machines

🎃Create a linearly non-separable dataset and visualize it

# Import make_blobs, which generates clustered sample data
from sklearn.datasets import make_blobs
# Import matplotlib.pyplot for plotting
import matplotlib.pyplot as plt
# Import numpy for numerical computation
import numpy as np

# Generate a dataset with 4 cluster centers and random seed 18
x, y = make_blobs(centers=4, random_state=18)

# Take y modulo 2 so that the four cluster labels collapse into classes 0 and 1
y = y % 2

# Scatter plot: x[:, 0] is the first feature (horizontal axis), x[:, 1] the second (vertical axis)
# c colors the points by class, s is the marker size, cmap the color map, edgecolor the marker outline
plt.scatter(x[:, 0], x[:, 1], c=y, s=80, cmap='autumn', edgecolor='k')

# Show the figure
plt.show()

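🎃Project the data into a higher-dimensional space and visualize it

# Import the required libraries and modules
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np

# Create a new figure
figure = plt.figure()

# Expand the original data x by appending the square of its second column as a new column
x_new = np.hstack([x, x[:, 1:] ** 2])

# Create a 3D axes on the figure and set the viewing angle
# (recent Matplotlib versions no longer attach Axes3D(figure) to the figure automatically)
ax = figure.add_subplot(projection='3d')
ax.view_init(elev=-152, azim=26)

# Boolean mask that selects the points whose label y equals 0
mask = y == 0

# Plot the points with y equal to 0 as red dots of size 80
ax.scatter(x_new[mask, 0], x_new[mask, 1], x_new[mask, 2], c='r', s=80)

# Plot the points with y not equal to 0 as blue stars of size 80
ax.scatter(x_new[~mask, 0], x_new[~mask, 1], x_new[~mask, 2], c='b', marker='*', s=80)

# Show the figure
plt.show()
This code uses matplotlib to draw a 3D scatter plot. It imports Axes3D from mpl_toolkits.mplot3d, creates a new figure, and uses numpy to append the square of the second column of x as a third column, giving x_new. It then creates a 3D axes object ax and sets the viewing angle. A boolean mask built from y selects the points with y equal to 0. The two scatter calls draw the points with y equal to 0 as red dots and the remaining points as blue stars, both with size 80, and plt.show() displays the figure.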

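🎃Fit a linear support vector machine on the expanded (non-linear) features and visualize it

from sklearn.svm import LinearSVC

# Fit a linear SVM on the three-dimensional features
linear_svm_3d = LinearSVC().fit(x_new, y)
coef, intercept = linear_svm_3d.coef_.ravel(), linear_svm_3d.intercept_

# Create a 3D axes and set the viewing angle
figure = plt.figure()
ax = figure.add_subplot(projection='3d')
ax.view_init(elev=-152, azim=16)

# Build a grid over the first two features
xx = np.linspace(x_new[:, 0].min() - 2, x_new[:, 0].max() + 2, 50)
yy = np.linspace(x_new[:, 1].min() - 2, x_new[:, 1].max() + 2, 50)
xx, yy = np.meshgrid(xx, yy)

# Solve coef[0]*x + coef[1]*y + coef[2]*z + intercept = 0 for z
zz = (coef[0] * xx + coef[1] * yy + intercept) / -coef[2]

# Draw the separating plane and the two classes
ax.plot_surface(xx, yy, zz, rstride=8, cstride=8, alpha=0.3)
ax.scatter(x_new[mask, 0], x_new[mask, 1], x_new[mask, 2], c='r', s=80)
ax.scatter(x_new[~mask, 0], x_new[~mask, 1], x_new[~mask, 2], c='b', marker='*', s=80)
plt.show()
This code creates a linear support vector machine classifier with sklearn's LinearSVC class and fits it to the expanded data with fit. The coef_ and intercept_ attributes give the model's coefficients and intercept. The code then creates a 3D axes with matplotlib, sets the viewing angle, builds a grid over the first two features, and computes the z coordinate of the decision plane at each grid point from the coefficients and intercept. Finally, plot_surface draws the plane and scatter draws the samples, with class 0 as red dots and class 1 as blue stars, and show displays the figure.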
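
As a rough numerical check of the idea above, the sketch below compares the training accuracy of a linear SVM on the original two features with one trained on the three expanded features. It regenerates the same make_blobs data so that it runs on its own. The max_iter value and the names acc_2d and acc_3d are choices made here rather than part of the original post, and the exact accuracies depend on the solver and library version.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

# Same data as above: four blobs folded into two classes
x, y = make_blobs(centers=4, random_state=18)
y = y % 2

# Expanded features: the two original columns plus the square of the second one
x_new = np.hstack([x, x[:, 1:] ** 2])

# max_iter is raised (an assumption made here) to give the solver room to converge
acc_2d = LinearSVC(max_iter=10000).fit(x, y).score(x, y)
acc_3d = LinearSVC(max_iter=10000).fit(x_new, y).score(x_new, y)

print(f"training accuracy, 2 features: {acc_2d:.2f}")
print(f"training accuracy, 3 features: {acc_3d:.2f}")
```

If the extra feature helps, the second number should come out higher than the first, because a plane in the 3D space corresponds to a curved boundary in the original 2D space.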

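🎯Understanding SVM kernel functions and the gamma and C parameters

🎃Load the iris dataset and create support vector machines with C=1.0 and linear, RBF (Gaussian), and polynomial kernels

# Import the required libraries
from sklearn import svm
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Build a grid of points covering the range of the two features
def make_meshgrid(x, y, h=.02):
    x_min, x_max = x.min() - 1, x.max() + 1
    y_min, y_max = y.min() - 1, y.max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    return xx, yy

# Draw the classifier's predictions over the grid as filled contours
def plot_contours(ax, clf, xx, yy, **params):
    z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    z = z.reshape(xx.shape)
    out = ax.contourf(xx, yy, z, **params)
    return out

# Load the iris dataset and keep only the first two features
iris = load_iris()
x = iris.data[:, :2]
y = iris.target
C = 1.0

# Create three SVM models with different kernels and fit each one
models = (svm.SVC(kernel='linear', C=C), svm.SVC(kernel='rbf', gamma=0.7, C=C), svm.SVC(kernel='poly', degree=3, C=C))
models = (clf.fit(x, y) for clf in models)
titles = ('SVC with linear kernel', 'SVC with RBF kernel', 'SVC with polynomial (degree=3) kernel')

# Create a layout of 1 row and 3 columns of subplots
fig, sub = plt.subplots(1, 3, figsize=(12, 3))
plt.subplots_adjust(wspace=0.4, hspace=0.2)

x0, x1 = x[:, 0], x[:, 1]
xx, yy = make_meshgrid(x0, x1)

# For each model, draw the decision regions and the samples
for clf, title, ax in zip(models, titles, sub.flatten()):
    plot_contours(ax, clf, xx, yy, cmap=plt.cm.autumn, alpha=0.8)
    ax.scatter(x0, x1, c=y, cmap=plt.cm.plasma, s=40, edgecolors='k')
    ax.set_xlim(xx.min(), xx.max())
    ax.set_ylim(yy.min(), yy.max())
    ax.set_xlabel('Feature 0')
    ax.set_ylabel('Feature 1')
    ax.set_xticks(())
    ax.set_yticks(())
    ax.set_title(title)

# Show the figure
plt.show()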

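🎃Create three RBF-kernel support vector machines with gamma set to 0.5, 5, and 50, then train and visualize them

# Import the required libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.datasets import make_classification

# Generate the dataset
x, y = make_classification(n_features=2, n_redundant=0, n_informative=2, random_state=1, n_clusters_per_class=1)

# Define the models, one for each gamma value
models = (svm.SVC(kernel='rbf', gamma=0.5), svm.SVC(kernel='rbf', gamma=5), svm.SVC(kernel='rbf', gamma=50))

# Train each model
models = (clf.fit(x, y) for clf in models)

# Define the subplot titles
titles = ('gamma=0.5', 'gamma=5', 'gamma=50')

# Create a layout of 1 row and 3 columns of subplots
fig, sub = plt.subplots(1, 3, figsize=(10, 3))

# Get the two features
x0, x1 = x[:, 0], x[:, 1]

# Build the grid (make_meshgrid and plot_contours are the helper functions defined earlier)
xx, yy = make_meshgrid(x0, x1)

# Loop over the models and their titles
for clf, title, ax in zip(models, titles, sub.flatten()):
    # Draw the decision regions
    plot_contours(ax, clf, xx, yy, cmap=plt.cm.autumn, alpha=0.8)
    # Draw the samples
    ax.scatter(x0, x1, c=y, cmap=plt.cm.spring, s=40, edgecolors='k')
    # Set the axis limits
    ax.set_xlim(xx.min(), xx.max())
    ax.set_ylim(yy.min(), yy.max())
    # Set the axis labels
    ax.set_xlabel('Feature 0')
    ax.set_ylabel('Feature 1')
    # Hide the axis ticks
    ax.set_xticks(())
    ax.set_yticks(())
    # Set the subplot title
    ax.set_title(title)

# Show the figure
plt.show()

This code builds support vector machine classifiers with sklearn's SVC class. It first generates a two-feature dataset with make_classification, then defines three models with different gamma values and trains each one with fit. It creates a layout of 1 row and 3 columns of subplots to show the classification result for each gamma value. For each model and its title, it draws the decision regions and the samples and sets the axis limits, labels, and ticks. Finally, it displays the figure.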

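✨How the gamma parameter affects the model

gamma is the coefficient of kernels such as the RBF kernel in a support vector machine (SVM), and it controls how far the influence of a single training sample reaches. The larger gamma is, the shorter that reach, so the model fits the training data more closely and the decision boundary becomes more complex, which may lead to overfitting. The smaller gamma is, the smoother the decision boundary becomes; the model may generalize better, but it may also underfit.

In summary:

- **Large gamma**: the model fits the training data more closely and the decision boundary is more complex, which may lead to overfitting.

- **Small gamma**: the decision boundary is smoother; the fit to the training data may be worse than with a large gamma, but generalization may improve.

When using an SVM, gamma should therefore be chosen to suit the dataset and the goal, usually by tuning it with a method such as cross-validation.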
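
To make the reach of a single sample concrete: scikit-learn's RBF kernel is K(a, b) = exp(-gamma * ||a - b||^2), so for a fixed distance the similarity falls off faster as gamma grows. The sketch below evaluates the kernel for two points one unit apart, using the three gamma values from the experiment above. The points a and b and the name similarities are illustrative choices, not part of the original post.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Two illustrative points at distance 1 from each other
a = np.array([[0.0, 0.0]])
b = np.array([[1.0, 0.0]])

similarities = {}
for gamma in (0.5, 5, 50):
    # rbf_kernel computes exp(-gamma * ||a - b||^2)
    similarities[gamma] = rbf_kernel(a, b, gamma=gamma)[0, 0]
    print(f"gamma={gamma}: K(a, b) = {similarities[gamma]:.3g}")
```

With gamma=50 the similarity at distance 1 is practically zero, so each training point only affects its immediate neighbourhood, which is why the decision boundary wraps tightly around individual points.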

🎃Create three RBF-kernel support vector machines with C set to 0.01, 1, and 100, then train and visualize them

# Import the required libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Generate the dataset
X, y = make_classification(n_samples=100, n_features=2, n_informative=2, n_redundant=0, random_state=42)

# Split the data into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define the models, one for each C value, and train them on the training set
models = [svm.SVC(kernel='rbf', C=0.01), svm.SVC(kernel='rbf', C=1), svm.SVC(kernel='rbf', C=100)]
models = [clf.fit(X_train, y_train) for clf in models]

# Define the subplot titles
titles = ['C=0.01', 'C=1', 'C=100']

# Create the subplots
fig, sub = plt.subplots(1, 3, figsize=(10, 3))

# Get the two features
x0, x1 = X[:, 0], X[:, 1]

# Build the grid (make_meshgrid and plot_contours are the helper functions defined earlier)
xx, yy = make_meshgrid(x0, x1)

# For each model, draw the decision regions and the samples
for clf, title, ax in zip(models, titles, sub.flatten()):
    plot_contours(ax, clf, xx, yy, cmap=plt.cm.autumn, alpha=0.8)
    ax.scatter(x0, x1, c=y, cmap=plt.cm.spring, s=40, edgecolors='k')
    ax.set_xlim(xx.min(), xx.max())
    ax.set_ylim(yy.min(), yy.max())
    ax.set_xlabel('Feature 0')
    ax.set_ylabel('Feature 1')
    ax.set_xticks(())
    ax.set_yticks(())
    ax.set_title(title)

# Show the figure
plt.show()

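✨How the C parameter affects the model

C is the regularization parameter of a support vector machine (SVM); it controls how heavily misclassified training samples are penalized. In scikit-learn the strength of the regularization is inversely proportional to C. The larger C is, the heavier the penalty, so the model prefers a more complex decision boundary that classifies as many training samples correctly as possible. The smaller C is, the lighter the penalty, so the model prefers a simpler, smoother decision boundary.

In summary:

- **Large C**: misclassified samples are penalized more heavily, the decision boundary is more complex, and the model is more likely to overfit.

- **Small C**: misclassified samples are penalized more lightly, the decision boundary is smoother, and the model is more likely to underfit.

In practice, C should therefore be chosen to suit the problem and the dataset, usually by tuning it with a method such as cross-validation, to get the best model performance and generalization.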
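
The cross-validated tuning mentioned above can be sketched with scikit-learn's GridSearchCV. This is a minimal example under stated assumptions: it reuses the synthetic data from the gamma experiment, searches only the C and gamma values tried in this article, and uses 5-fold cross-validation. The grid and the fold count are choices made here, not recommendations from the original post.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic two-feature data, generated the same way as in the gamma experiment
x, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
                           random_state=1, n_clusters_per_class=1)

# Candidate values: the C and gamma settings tried in this article
param_grid = {"C": [0.01, 1, 100], "gamma": [0.5, 5, 50]}

# 5-fold cross-validation over every (C, gamma) combination
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(x, y)

print("best parameters:", search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.3f}")
```

Because the score is averaged over held-out folds, a combination that merely memorizes the training data (very large gamma or C) tends to be ranked lower than one that generalizes.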

🎯Using a support vector machine to predict the age of abalone

🎃Load the abalone dataset and inspect its features and other basic information

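import pandas as pd # Import pandas for data processing and analysis

abalone = pd.read_csv('bank/abalone.csv') # Read the file 'abalone.csv' with pandas' read_csv and store it in the variable abalone

abalone.head() # Show the first 5 rows of abalone
abalone.info() # Show the column names, data types, and non-null counts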

🎃Exploratory data analysis through visualization

import matplotlib.pyplot as plt # Import matplotlib's pyplot module for plotting
import seaborn as sns # Import seaborn for data visualization

sns.countplot(x='Sex', data=abalone, palette='Set2') # Draw a count plot of the Sex column with the 'Set2' color palette
plt.show() # Show the figure
# Compute the age as the number of rings plus 1.5 and store it in a new column 'age'
abalone['age'] = abalone['Rings'] + 1.5

# Set the figure size to 15x5
plt.figure(figsize=(15, 5))

# Draw a swarm plot with seaborn: Sex on the horizontal axis, age on the vertical axis, colored by Sex
sns.swarmplot(x='Sex', y='age', data=abalone, hue='Sex')

# Draw a violin plot with seaborn on the same axes: Sex on the horizontal axis, age on the vertical axis
sns.violinplot(x='Sex', y='age', data=abalone)

# Show the figure
plt.show()

🎃Train a first support vector machine model and check how well it predicts

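# Import train_test_split, which splits a dataset into a training set and a test set
from sklearn.model_selection import train_test_split
# Import the SVR class, which creates a support vector regression model
from sklearn.svm import SVR

# One-hot encode the categorical columns of the abalone dataset with pandas' get_dummies
data = pd.get_dummies(abalone)
# Show the first 5 rows
data.head()

# Feature matrix x: every column except 'Rings' and 'age'
x = data.drop(['Rings', 'age'], axis=1)
# Target variable y: the 'age' column
y = data['age']

# Split the data into a training set and a test set, with 30% of the samples in the test set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)

# Show the shape of the training features
x_train.shape

# Create a support vector regression model with gamma set to 0.125
svr = SVR(gamma=0.125)
# Fit the model on the training features and target
svr.fit(x_train, y_train)
# Evaluate the model on the test set; for a regressor, score returns the coefficient of determination R^2, not a classification accuracy
svr.score(x_test, y_test)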
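
Note that for a regressor such as SVR, score returns the coefficient of determination R^2 rather than a classification accuracy. The sketch below checks this by comparing svr.score with sklearn.metrics.r2_score computed from the predictions. Because the abalone CSV is a local file, it uses synthetic data from make_regression as a stand-in; the dataset and its parameters are assumptions made here for illustration only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic regression data as a stand-in for the abalone table (an assumption)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

svr = SVR(gamma=0.125).fit(X_train, y_train)

# SVR.score and r2_score on the predictions should agree
score = svr.score(X_test, y_test)
r2 = r2_score(y_test, svr.predict(X_test))
print(f"score: {score:.4f}, r2_score: {r2:.4f}")
```

An R^2 of 1 means perfect predictions, 0 means the model does no better than always predicting the mean of the target, and negative values are possible for a model that does worse than that.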