Support Vector Machine (SVM)

木南曌

Article link: https://blog.csdn.net/qq_kbyd/article/details/89302686

Principles

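SVM is an algorithm that separates data with the maximum margin. The decision boundary that separates the data set is called the separating hyperplane. Intuitively, in an N-dimensional feature space the separating hyperplane is an (N-1)-dimensional object: if the data points lie in a two-dimensional plane, the separating hyperplane is a straight line; in three-dimensional space, it is a plane. SVM handles both linearly separable and linearly non-separable data (the latter through a soft margin and kernel functions).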
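To make the margin concrete, here is a minimal sketch that fits a linear SVM to six hand-picked, linearly separable 2-D points (made-up data, an assumption for illustration) and reads off the hyperplane w·x + b = 0 and the margin width 2/‖w‖. A very large C is used so that scikit-learn's soft-margin `SVC` behaves approximately like a hard-margin SVM.

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two linearly separable clusters (made-up points for illustration)
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard-margin SVM on separable data
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]        # normal vector of the separating hyperplane
b = clf.intercept_[0]   # offset: the hyperplane is w . x + b = 0
margin = 2.0 / np.linalg.norm(w)  # distance between the two margin boundaries

print("w =", w, "b =", b)
print("margin width = {:.3f}".format(margin))
print("support vectors:\n", clf.support_vectors_)
```

For these points the margin boundaries should lie close to the lines x0 + x1 = 3 and x0 + x1 = 8, so the margin width should come out near 5/√2 ≈ 3.54, and only the points on those lines act as support vectors.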

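Advantages:
1 Works with many different kinds of data sets;
2 Performs well on both high-dimensional and low-dimensional data.
Disadvantages:
1 Does not scale to large data sets: beyond about 100,000 samples, training becomes very time- and memory-intensive;
2 Very sensitive to data preprocessing and parameter tuning.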
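The second disadvantage is usually handled by scaling the features and searching over the hyperparameters with cross-validation. Below is a minimal sketch, assuming scikit-learn's `make_pipeline`, `StandardScaler` and `GridSearchCV` on the wine data; the grid values are illustrative choices, not recommendations from the original text.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling lives inside the pipeline, so each CV fold fits the scaler on its own training part
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Candidate values are illustrative assumptions, not recommended defaults
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.001, 0.01, 0.1, 1],
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("cross-validation accuracy: {:.3f}".format(search.best_score_))
print("test accuracy: {:.3f}".format(search.score(X_test, y_test)))
```

Putting the scaler inside the pipeline keeps test-fold statistics out of the scaling step during the search.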

SVM mainly comes in the following models: polynomial kernel, Gaussian (RBF) kernel, linear kernel, and LinearSVC. The following code compares these models:

import numpy as np
from sklearn import svm
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine

# Build a grid of points covering the range of the two features (step size h)
def make_meshgrid(x, y, h=.02):
    x_min, x_max = x.min() - 1, x.max() + 1
    y_min, y_max = y.min() - 1, y.max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    return xx, yy


# Plot the classifier's predicted class over the grid as filled contours
def plot_contours(ax, clf, xx, yy, **params):
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    out = ax.contourf(xx, yy, Z, **params)
    return out


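# Compare SVM models with different kernel functions
def show_differ():
    wine = load_wine()
    X = wine.data[:, :2]  # keep only the first two features so the results can be plotted in 2-D
    y = wine.target
    C = 1.0  # regularization parameter shared by all models
    models = (svm.SVC(kernel='linear', C=C),
              svm.LinearSVC(C=C),
              svm.SVC(kernel='rbf', gamma=0.7, C=C),
              svm.SVC(kernel='poly', degree=3, C=C))
    # Fit every model on the same data (a generator, so fitting happens lazily in the loop below)
    models = (clf.fit(X, y) for clf in models)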

    # Subplot titles
    titles = ('SVC with linear kernel',
              'LinearSVC',
              'SVC with RBF kernel',
              'SVC with polynomial kernel')

    # Create a 2x2 grid of subplots
    fig, sub = plt.subplots(2, 2)
    plt.subplots_adjust(wspace=0.4, hspace=0.4)
    X0, X1 = X[:, 0], X[:, 1]
    xx, yy = make_meshgrid(X0, X1)

    for clf, title, ax in zip(models, titles, sub.flatten()):
        plot_contours(ax, clf, xx, yy, alpha=0.8)
        ax.scatter(X0, X1, c=y, s=20, edgecolors='k')
        ax.set_xlim(xx.min(), xx.max())
        ax.set_ylim(yy.min(), yy.max())
        ax.set_xlabel('feature 0')
        ax.set_ylabel('feature 1')
        ax.set_xticks(())
        ax.set_yticks(())
        ax.set_title(title)
    plt.show()


if __name__ == '__main__':
    show_differ()

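Result (figure): decision regions of the four models on the first two wine features, one panel per model.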
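The panels differ in the kernel function K(x, x′) that SVC uses: linear ⟨x, x′⟩, polynomial (γ⟨x, x′⟩ + r)^d, and RBF exp(−γ‖x − x′‖²). As a minimal sketch (random points and assumed values for γ, r and d), the formulas can be written out by hand and checked against `sklearn.metrics.pairwise`. Note that SVC's polynomial kernel defaults to coef0 (r) = 0 and gamma='scale', whereas `polynomial_kernel` defaults to coef0=1, so the values are passed explicitly here. LinearSVC does not go through this kernel machinery at all: it is based on liblinear, defaults to the squared hinge loss and to one-vs-rest for multi-class problems (SVC uses one-vs-one), which is why its panel can differ from the linear-kernel SVC.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 2))   # 4 random points with 2 features (illustrative)
B = rng.normal(size=(3, 2))   # 3 more random points

gamma, coef0, degree = 0.7, 1.0, 3  # assumed kernel parameters

# Kernel formulas written out by hand
K_lin = A @ B.T
K_poly = (gamma * (A @ B.T) + coef0) ** degree
sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)  # pairwise squared distances
K_rbf = np.exp(-gamma * sq_dist)

# Compare with scikit-learn's implementations
print(np.allclose(K_lin, linear_kernel(A, B)))
print(np.allclose(K_poly, polynomial_kernel(A, B, degree=degree, gamma=gamma, coef0=coef0)))
print(np.allclose(K_rbf, rbf_kernel(A, B, gamma=gamma)))
```

Each of the three comparisons prints True.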

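In SVM there are three things to pay attention to. The first is the choice of kernel function. The second is the kernel's parameters, such as gamma for the RBF kernel: the smaller gamma is, the more the model tends to underfit, and the larger gamma is, the more it tends to overfit. The third is the regularization parameter C: the smaller C is, the more constrained the model is, meaning that each individual data point has less influence on it and the model is simpler.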
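The effect of gamma and C can be checked empirically by comparing training and test accuracy. Here is a minimal sketch, assuming the wine data with all 13 features standardized and a hypothetical helper `train_test_scores`; the parameter values are arbitrary choices for illustration, and the scores it prints are not taken from the original article.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)


def train_test_scores(C, gamma):
    """Hypothetical helper: fit a scaled RBF SVC and return (train accuracy, test accuracy)."""
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=C, gamma=gamma))
    model.fit(X_train, y_train)
    return model.score(X_train, y_train), model.score(X_test, y_test)


# Vary gamma with C fixed: larger gamma makes each support vector's influence more local
for gamma in [0.001, 0.1, 10]:
    tr, te = train_test_scores(C=1.0, gamma=gamma)
    print("gamma={:<6} train={:.3f} test={:.3f}".format(gamma, tr, te))

# Vary C with gamma fixed: smaller C means stronger regularization and a simpler model
for C in [0.01, 1, 100]:
    tr, te = train_test_scores(C=C, gamma=0.1)
    print("C={:<6} train={:.3f} test={:.3f}".format(C, tr, te))
```

A large gap between training and test accuracy points to overfitting; low scores on both point to underfitting.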

House price regression analysis

from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; this example needs an older version
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler


def example():
    boston = load_boston()
    X, y = boston.data, boston.target
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=8)
    # Standardize the data: fit the scaler on the training set only, then transform both sets
    scaler = StandardScaler()
    scaler.fit(X_train)
    X_train_scaled = scaler.transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    print('--------------')
    print(X_train.shape)
    print(X_test.shape)
    print('-----------------')
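    for kernel in ['linear', 'rbf']:
        # gamma is only used by the RBF kernel; the linear kernel ignores it
        svr = SVR(kernel=kernel, C=100, gamma=0.1)
        svr.fit(X_train_scaled, y_train)
        # score() returns the R^2 coefficient of determination
        print(kernel, 'kernel model training set score: {:.3f}'.format(svr.score(X_train_scaled, y_train)))
        print(kernel, 'kernel model test set score: {:.3f}'.format(svr.score(X_test_scaled, y_test)))


if __name__ == '__main__':
    example()

Output:

--------------
(379, 13)
(127, 13)
-----------------
linear kernel model training set score: 0.706
linear kernel model test set score: 0.699
rbf kernel model training set score: 0.966
rbf kernel model test set score: 0.894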
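Because `load_boston` was removed in scikit-learn 1.2, the example above does not run on current versions. The sketch below keeps the same SVR settings (C=100, gamma=0.1) but, as an assumption, substitutes the synthetic `make_friedman1` data set for the Boston data and wraps the scaler and the model in a pipeline; its scores will differ from the ones reported above.

```python
from sklearn.datasets import make_friedman1
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic regression data (assumption: a stand-in for the Boston house price data)
X, y = make_friedman1(n_samples=500, n_features=10, noise=1.0, random_state=8)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=8)

scores = {}
for kernel in ["linear", "rbf"]:
    # The pipeline fits the scaler on the training data and reuses it for the test data
    model = make_pipeline(StandardScaler(), SVR(kernel=kernel, C=100, gamma=0.1))
    model.fit(X_train, y_train)
    scores[kernel] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print("{} kernel: train R^2 = {:.3f}, test R^2 = {:.3f}".format(kernel, *scores[kernel]))
```

The pipeline replaces the manual `fit`/`transform` calls, so the same scaling is applied automatically whenever the model predicts.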
