Support Vector Machines

Latest recommended article published on 2020-11-09 21:51:24

u200710

Category: scikit-learn. Tags: python, machine learning, support vector machines

Original link: https://scikit-learn.org/stable/modules/svm.html

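Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression, and outlier detection.

The advantages of support vector machines are:

Effective in high-dimensional spaces.
Still effective when the number of dimensions is greater than the number of samples.
Uses only a subset of the training points (the support vectors) in the decision function.
Versatile: different kernel functions can be specified for the decision function. Common kernels are provided, but custom kernels can also be specified.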

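The disadvantages of support vector machines are:

If the number of features is much greater than the number of samples, choosing the kernel function and the regularization term carefully is crucial to avoid overfitting.
SVMs do not directly provide probability estimates; these are computed using an expensive five-fold cross-validation.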

In scikit-learn, support vector machines accept both dense (numpy.ndarray, or anything convertible to one by numpy.asarray) and sparse (scipy.sparse) sample vectors as input.
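The two input types can be compared side by side. The following is a minimal sketch, assuming a tiny hand-made dataset (the values of `X_dense` and `y` are illustrative and not from the original article); it fits one classifier on a dense array and another on the same data in CSR form.

```python
import numpy as np
from scipy import sparse
from sklearn import svm

# Illustrative toy data (an assumption, not from the article):
# four points on a line, two per class.
X_dense = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y = [0, 0, 1, 1]

# The same samples stored as a scipy.sparse CSR matrix.
X_sparse = sparse.csr_matrix(X_dense)

# Fit one model per input type.
clf_dense = svm.SVC(kernel='linear').fit(X_dense, y)
clf_sparse = svm.SVC(kernel='linear').fit(X_sparse, y)

pred_dense = clf_dense.predict(X_dense)
pred_sparse = clf_sparse.predict(X_sparse)
print(pred_dense, pred_sparse)
```

A model that will predict on sparse data should also be fit on sparse data, which is why the sketch keeps the two classifiers separate.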

Classification

SVC, NuSVC and LinearSVC are classes capable of performing multi-class classification.

# coding: utf-8
# Plot different SVM classifiers in the iris dataset

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets


def make_meshgrid(x, y, h=.02):
    """Create a mesh of points to plot in

    Parameters
    ----------
    x: data to base x-axis meshgrid on
    y: data to base y-axis meshgrid on
    h: stepsize for meshgrid, optional

    Returns
    -------
    xx, yy : ndarray
    """
    x_min, x_max = x.min() - 1, x.max() + 1
    y_min, y_max = y.min() - 1, y.max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    return xx, yy


def plot_contours(ax, clf, xx, yy, **params):
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    out = ax.contourf(xx, yy, Z, **params)
    return out


iris = datasets.load_iris()
# Use only the first two features so the decision surface can be drawn in 2D
X = iris.data[:, :2]
y = iris.target

C = 1.0  # SVM regularization parameter
models = (svm.SVC(kernel='linear', C=C),
          svm.LinearSVC(C=C, max_iter=10000),
          svm.SVC(kernel='rbf', gamma=0.7, C=C),
          svm.SVC(kernel='poly', degree=3, gamma='auto', C=C))
models = (clf.fit(X, y) for clf in models)

# title for the plots
titles = ('SVC with linear kernel',
          'LinearSVC (linear kernel)',
          'SVC with RBF kernel',
          'SVC with polynomial (degree 3) kernel')

# Set-up 2x2 grid for plotting.
fig, sub = plt.subplots(2, 2)
plt.subplots_adjust(wspace=0.4, hspace=0.4)

X0, X1 = X[:, 0], X[:, 1]
xx, yy = make_meshgrid(X0, X1)

for clf, title, ax in zip(models, titles, sub.flatten()):
    plot_contours(ax, clf, xx, yy,
                  cmap=plt.cm.coolwarm, alpha=0.8)
    ax.scatter(X0, X1, c=y, cmap=plt.cm.coolwarm, s=20, edgecolors='k')
    ax.set_xlim(xx.min(), xx.max())
    ax.set_ylim(yy.min(), yy.max())
    ax.set_xlabel('Sepal length')
    ax.set_ylabel('Sepal width')
    ax.set_xticks(())
    ax.set_yticks(())
    ax.set_title(title)

plt.show()

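SVC and NuSVC are similar methods, but they accept slightly different sets of parameters and have different mathematical formulations. LinearSVC, on the other hand, is an implementation of support vector classification for the case of a linear kernel. Note that LinearSVC does not accept the parameter kernel, since the kernel is assumed to be linear. It also lacks some attributes of SVC and NuSVC, such as support_.

The decision function of a support vector machine depends on a subset of the training data, called the support vectors. Properties of these support vectors can be found in the attributes support_vectors_, support_ and n_support_.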
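As a small illustration of these three attributes, here is a sketch on a two-sample toy problem (the data are chosen only for brevity):

```python
from sklearn import svm

# Two training points, one per class.
X = [[0, 0], [1, 1]]
y = [0, 1]

clf = svm.SVC()
clf.fit(X, y)

# Coordinates of the support vectors.
print(clf.support_vectors_)
# Indices of the support vectors in the training set.
print(clf.support_)
# Number of support vectors for each class.
print(clf.n_support_)
```

With only one point per class, both training points necessarily end up as support vectors.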

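Multi-class classification

SVC and NuSVC implement the one-against-one approach for multi-class classification. If n_classes is the number of classes, then n_classes * (n_classes - 1) / 2 classifiers are constructed, each trained on data from two classes. To provide an interface consistent with other classifiers, the decision_function_shape option allows the results of the one-against-one classifiers to be monotonically transformed into a decision function of shape (n_samples, n_classes).

LinearSVC implements the one-vs-the-rest multi-class strategy and therefore trains n_classes models.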

# coding: utf-8

from sklearn import svm

X = [[0], [1], [2], [3]]
Y = [0, 1, 2, 3]  # 4 classes

clf = svm.SVC(decision_function_shape='ovo')
clf.fit(X, Y)
dec = clf.decision_function([[1]])
print(dec.shape)  # (1, 6): one column per pairwise classifier, 4*3/2 = 6

clf.decision_function_shape = 'ovr'
dec = clf.decision_function([[1]])
print(dec.shape)  # (1, 4): one column per class

lin_clf = svm.LinearSVC()
lin_clf.fit(X, Y)

dec = lin_clf.decision_function([[1]])
print(dec.shape)  # (1, 4): one-vs-the-rest, one column per class

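Scores and probabilities

The decision_function method of SVC and NuSVC gives per-class scores for each sample (or a single score per sample in the binary case). When the constructor option probability is set to True, class membership probability estimates are enabled through the methods predict_proba and predict_log_proba. In the binary case, the probabilities are calibrated using Platt scaling: logistic regression on the SVM's scores, fit by an additional cross-validation on the training data.

The cross-validation involved in Platt scaling is an expensive operation for large datasets. In addition, the probability estimates may be inconsistent with the scores: the class with the highest score may not be the class with the highest probability.

Note that when decision_function_shape='ovr' and n_classes > 2, the predict method, unlike decision_function, does not try to break ties by default. Setting break_ties=True makes the output of predict match np.argmax(clf.decision_function(...), axis=1), at extra computational cost.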
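To make the difference between scores and probabilities concrete, the following sketch fits an SVC on the iris data with probability estimation switched on. The choice of dataset and of `random_state=0` is an assumption made for reproducibility, not something prescribed by the text above.

```python
from sklearn import datasets, svm

X, y = datasets.load_iris(return_X_y=True)

# probability=True triggers the extra internal cross-validation
# described above, so fitting is slower than for a plain SVC.
clf = svm.SVC(probability=True, random_state=0).fit(X, y)

# Per-class scores (uncalibrated) for the first three samples.
scores = clf.decision_function(X[:3])
# Calibrated class-membership probabilities for the same samples.
proba = clf.predict_proba(X[:3])

print(scores.shape, proba.shape)
print(proba.sum(axis=1))  # each row of probabilities sums to 1
```

If only a ranking of classes is needed, decision_function alone avoids the cost of calibration.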

# coding: utf-8
# SVM Tie Breaking Example

import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

X, y = make_blobs(random_state=27)

fig, sub = plt.subplots(2, 1, figsize=(5, 8))
titles = ('break_ties = False',
          'break_ties = True')

for break_ties, title, ax in zip((False, True), titles, sub.flatten()):

    svm = SVC(kernel='linear', C=1, break_ties=break_ties,
              decision_function_shape='ovr').fit(X, y)

    xlim = [X[:,0].min(), X[:,0].max()]
    ylim = [X[:,1].min(), X[:,1].max()]

    xs = np.linspace(xlim[0], xlim[1], 1000)
    ys = np.linspace(ylim[0], ylim[1], 1000)
    xx, yy = np.meshgrid(xs, ys)

    pred = svm.predict(np.c_[xx.ravel(), yy.ravel()])

    colors = [plt.cm.Accent(i) for i in [0, 4, 7]]

    points = ax.scatter(X[:, 0], X[:, 1], c=y, cmap='Accent')
    classes = [(0, 1), (0, 2), (1, 2)]
    line = np.linspace(X[:, 1].min()-5, X[:, 1].max()+5)
    ax.imshow(-pred.reshape(xx.shape), cmap='Accent', alpha=.2,
              extent=(xlim[0], xlim[1], ylim[1], ylim[0]))

    for coef, intercept, col in zip(svm.coef_, svm.intercept_, classes):
        line2 = -(line * coef[1] + intercept) / coef[0]
        ax.plot(line2, line, "-", c=colors[col[0]])
        ax.plot(line2, line, "--", c=colors[col[1]])
    ax.set_xlim(xlim)
    ax.set_ylim(ylim)
    ax.set_title(title)
    ax.set_aspect("equal")

plt.show()

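Unbalanced problems

When certain classes or certain individual samples should be given more importance, the parameters class_weight and sample_weight can be used.

SVC (but not NuSVC) implements the parameter class_weight, which is passed to the constructor and applied when fit is called. It is a dictionary of the form {class_label: value}, where value is a floating-point number greater than 0 that sets the parameter C of class class_label to C * value.

SVC, NuSVC, SVR, NuSVR, LinearSVC, LinearSVR and OneClassSVM also implement weights for individual samples in the fit method through the parameter sample_weight. Similar to class_weight, this sets the parameter C for the i-th sample to C * sample_weight[i].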
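The class_weight mechanism can be sketched on a synthetic unbalanced dataset. The blob sizes, centers and the weight of 10 below are illustrative assumptions; the fitted attribute class_weight_ exposes the resulting per-class multipliers of C.

```python
from sklearn import svm
from sklearn.datasets import make_blobs

# Illustrative unbalanced data: 1000 samples of class 0, 100 of class 1.
X, y = make_blobs(n_samples=[1000, 100],
                  centers=[[0.0, 0.0], [2.0, 2.0]],
                  cluster_std=[1.5, 0.5],
                  random_state=0, shuffle=False)

# Unweighted baseline.
clf = svm.SVC(kernel='linear', C=1.0).fit(X, y)

# Errors on class 1 are penalized ten times more heavily (C * 10).
wclf = svm.SVC(kernel='linear', C=1.0, class_weight={1: 10}).fit(X, y)

print(clf.class_weight_)   # multipliers of C per class, unweighted
print(wclf.class_weight_)  # multipliers of C per class, weighted

# Number of samples each model assigns to the minority class.
print((clf.predict(X) == 1).sum(), (wclf.predict(X) == 1).sum())
```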

# coding: utf-8
# SVM: Weighted samples

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm

def plot_decision_function(classifier, sample_weight, axis, title):

    xx, yy = np.meshgrid(np.linspace(-4, 5, 500), np.linspace(-4, 5, 500))

    Z = classifier.decision_function(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    axis.contourf(xx, yy, Z, alpha=0.75, cmap=plt.cm.bone)
    axis.scatter(X[:, 0], X[:, 1], c=y, s=100*sample_weight, alpha=0.9,
                 cmap=plt.cm.bone, edgecolors='black')

    axis.axis('off')
    axis.set_title(title)

np.random.seed(0)
X = np.r_[np.random.randn(10, 2) + [1, 1], np.random.randn(10, 2)]
y = [1] * 10 + [-1] * 10
sample_weight_last_ten = abs(np.random.randn(len(X)))
sample_weight_const = np.ones(len(X))

sample_weight_last_ten[15:] *= 5
sample_weight_last_ten[9] *= 15

clf_weights = svm.SVC(gamma=1)
clf_weights.fit(X, y, sample_weight=sample_weight_last_ten)

clf_no_weights = svm.SVC(gamma=1)
clf_no_weights.fit(X, y)

fig, axes = plt.subplots(1, 2, figsize=(14, 6))
plot_decision_function(clf_no_weights, sample_weight_const, axes[0], 'Constant Weights')
plot_decision_function(clf_weights, sample_weight_last_ten, axes[1], 'Modified Weights')

plt.show()

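Regression

The method of support vector classification can be extended to solve regression problems. This method is called support vector regression.

The model produced by support vector classification depends only on a subset of the training data, because the cost function for building the model does not care about training points that lie beyond the margin. Analogously, the model produced by support vector regression depends only on a subset of the training data, because the cost function ignores samples whose prediction is close to their target.

The classes that implement support vector regression are SVR, NuSVR and LinearSVR.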

# coding: utf-8
# Support Vector Regression (SVR) using linear and non-linear kernels

import numpy as np
from sklearn.svm import SVR
import matplotlib.pyplot as plt

X = np.sort(5 * np.random.rand(40, 1), axis=0)
y = np.sin(X).ravel()

y[::5] += 3 * (0.5 - np.random.rand(8))

svr_rbf = SVR(kernel='rbf', C=100, gamma=0.1, epsilon=.1)
svr_lin = SVR(kernel='linear', C=100, gamma='auto')
svr_poly = SVR(kernel='poly', C=100, gamma='auto', degree=3, epsilon=.1, coef0=1)

lw = 2

svrs = [svr_rbf, svr_lin, svr_poly]
kernel_label = ['RBF', 'Linear', 'Polynomial']
model_color = ['m', 'c', 'g']

fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(15, 10), sharey=True)
for ix, svr in enumerate(svrs):
    axes[ix].plot(X, svr.fit(X, y).predict(X), color=model_color[ix], lw=lw, label='{} model'.format(kernel_label[ix]))
    axes[ix].scatter(X[svr.support_], y[svr.support_], facecolor='none', edgecolor=model_color[ix], s=50,
                     label='{} support vectors'.format(kernel_label[ix]))
    axes[ix].scatter(X[np.setdiff1d(np.arange(len(X)), svr.support_)],
                     y[np.setdiff1d(np.arange(len(X)), svr.support_)],
                     facecolor='none', edgecolor='k', s=50,
                     label='other training data')
    axes[ix].legend(loc='upper center', bbox_to_anchor=(0.5, 1.1),
                    ncol=1, fancybox=True, shadow=True)

fig.text(0.5, 0.04, 'data', ha='center', va='center')
fig.text(0.06, 0.5, 'target', ha='center', va='center', rotation='vertical')
fig.suptitle('Support Vector Regression', fontsize=14)
plt.show()

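Density estimation and anomaly detection

The class OneClassSVM implements a one-class support vector machine, which can be used for outlier detection.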
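A minimal sketch of OneClassSVM in use follows. The synthetic data and the values of `nu` and `gamma` are assumptions chosen for illustration only. The model is fit on "normal" points and then labels new points as inliers (+1) or outliers (-1).

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(42)

# Illustrative training data: a tight cluster around the origin.
X_train = 0.3 * rng.randn(100, 2)
# Illustrative abnormal points spread over a much wider area.
X_outliers = rng.uniform(low=-4, high=4, size=(20, 2))

# nu bounds the fraction of training points treated as errors.
clf = OneClassSVM(kernel='rbf', nu=0.1, gamma=0.1).fit(X_train)

pred_train = clf.predict(X_train)        # +1 for inliers, -1 for outliers
pred_outliers = clf.predict(X_outliers)

print('training points flagged as outliers:', (pred_train == -1).sum())
print('abnormal points flagged as outliers:', (pred_outliers == -1).sum())
```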

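Kernel functions

The commonly used kernel functions are:

linear: $\left\langle x, x^{'} \right\rangle$
polynomial: $\left( \gamma\left\langle x, x^{'} \right\rangle + r \right)^d$, where $d$ is specified by the parameter degree and $r$ by coef0
rbf: $\exp\left(-\gamma\left\| x-x^{'} \right\|^2\right)$, where $\gamma$ is specified by the parameter gamma and must be greater than 0
sigmoid: $\tanh\left(\gamma \left\langle x, x^{'} \right\rangle + r\right)$, where $r$ is specified by the parameter coef0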
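The four formulas can be checked numerically. The sketch below evaluates each kernel directly with NumPy on a small illustrative matrix (the values of `X`, `gamma`, `r` and `d` are arbitrary assumptions) and compares the results with the corresponding helpers in sklearn.metrics.pairwise.

```python
import numpy as np
from sklearn.metrics.pairwise import (linear_kernel, polynomial_kernel,
                                      rbf_kernel, sigmoid_kernel)

# Arbitrary illustrative samples and kernel parameters.
X = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 0.5]])
gamma, r, d = 0.5, 1.0, 3

# Pairwise inner products <x, x'> and squared distances ||x - x'||^2.
inner = X @ X.T
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

K_linear = inner
K_poly = (gamma * inner + r) ** d
K_rbf = np.exp(-gamma * sq_dists)
K_sigmoid = np.tanh(gamma * inner + r)

# Compare with scikit-learn's pairwise kernel helpers.
print(np.allclose(K_linear, linear_kernel(X)))
print(np.allclose(K_poly, polynomial_kernel(X, degree=d, gamma=gamma, coef0=r)))
print(np.allclose(K_rbf, rbf_kernel(X, gamma=gamma)))
print(np.allclose(K_sigmoid, sigmoid_kernel(X, gamma=gamma, coef0=r)))
```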

Custom kernels

Using Python functions as kernels

You can define your own kernel by passing a function to the kernel parameter of the constructor.

# coding: utf-8
# SVM with custom kernel

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets

iris = datasets.load_iris()
X = iris.data[:, :2]
Y = iris.target


def my_kernel(X, Y):
    """
    We create a custom kernel:
    k(X, Y) = X M Y^T, with M = [[2, 0], [0, 1]]
    """
    M = np.array([[2, 0], [0, 1.0]])
    return np.dot(np.dot(X, M), Y.T)

h = 0.02

clf = svm.SVC(kernel=my_kernel)
clf.fit(X, Y)

x_min, x_max = X[:, 0].min()-1, X[:, 0].max()+1
y_min, y_max = X[:, 1].min()-1, X[:, 1].max()+1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

Z = Z.reshape(xx.shape)
plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)

plt.scatter(X[:, 0], X[:, 1], c=Y, cmap=plt.cm.Paired, edgecolors='k')
plt.title('3-Class classification using Support Vector Machine with custom kernel')
plt.axis('tight')
plt.show()

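Using the `Gram` matrix

Set kernel='precomputed' and pass the Gram matrix, instead of X, to the fit method. At prediction time, the kernel values between the test vectors and all training vectors must be provided. For example: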

# coding: utf-8

import numpy as np
from sklearn import svm

X = np.array([[0, 0], [1, 1]])
y = [0, 1]
clf = svm.SVC(kernel='precomputed')
gram = np.dot(X, X.T)
clf.fit(gram, y)

print(clf.predict(gram))
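The example above predicts on the training data itself. For unseen samples, the matrix passed to predict must have shape (n_test, n_train). The sketch below shows this on a synthetic dataset; the use of make_classification and train_test_split is an assumption made for illustration.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Illustrative synthetic data, split into training and test sets.
X, y = make_classification(n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = svm.SVC(kernel='precomputed')

# Linear kernel between all pairs of training samples: (n_train, n_train).
gram_train = np.dot(X_train, X_train.T)
clf.fit(gram_train, y_train)

# Kernel between each test sample and every training sample: (n_test, n_train).
gram_test = np.dot(X_test, X_train.T)
pred = clf.predict(gram_test)

print(gram_train.shape, gram_test.shape, pred.shape)
```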