Support Vector Machine (SVM) Explained

略知一二三

Tags: support vector machine, machine learning, artificial intelligence

Article link: https://blog.csdn.net/qq_43380699/article/details/127607004

What is a support vector machine?

A support vector machine (SVM) is a supervised learning model used for classification.

SVMs in detail:

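Let us start with a simple example: separating two classes of points in the plane with a straight line (a linear classifier).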

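We want to find a line that separates the red points from the blue points, that is, find $(w, b)$ such that every blue point satisfies $w \cdot x + b \ge 1$ and every red point satisfies $w \cdot x + b \le -1$. Many such lines exist, but we want the best one. By what criterion should we judge it?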
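To make the constraints concrete, here is a minimal sketch that checks whether a candidate $(w, b)$ satisfies them. The six points, their labels (+1 for blue, -1 for red) and the helper name `satisfies_constraints` are hypothetical, chosen only for illustration; the point is that several different lines pass the check, which is why a further criterion is needed.

```python
import numpy as np

# Hypothetical toy data (not from the original post):
# label +1 = "blue" point, label -1 = "red" point.
X = np.array([[2.0, 2.0], [3.0, 3.0], [3.0, 1.0],     # blue
              [0.0, 0.0], [-1.0, 0.0], [0.0, -1.0]])  # red
labels = np.array([1, 1, 1, -1, -1, -1])

def satisfies_constraints(w, b, X, labels):
    """True if every blue point has w.x + b >= 1 and every red point has w.x + b <= -1."""
    scores = X @ w + b
    blue_ok = np.all(scores[labels == 1] >= 1)
    red_ok = np.all(scores[labels == -1] <= -1)
    return bool(blue_ok and red_ok)

# Two different lines both satisfy the constraints...
print(satisfies_constraints(np.array([1.0, 1.0]), -3.0, X, labels))  # True
print(satisfies_constraints(np.array([1.0, 0.0]), -1.0, X, labels))  # True
# ...while a w that is too small cannot reach the +/-1 thresholds.
print(satisfies_constraints(np.array([0.1, 0.1]), 0.0, X, labels))   # False
```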

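The SVM answers with the following criterion. Take two lines parallel to the candidate: one that just touches the red class, $w \cdot x + b = -1$, and one that just touches the blue class, $w \cdot x + b = 1$. The distance between these two lines (the margin) measures how good the candidate is, and the line we want is the one midway between the pair of parallel lines that are as far apart as possible.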

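The points that these two lines just touch are called support vectors. In the simplest case, the 2D plane, write $w = (w_1, w_2)$. By the middle-school formula for the distance between parallel lines, the distance between $w_1 x_1 + w_2 x_2 + b = 1$ and $w_1 x_1 + w_2 x_2 + b = -1$ is $2/\sqrt{w_1^2 + w_2^2} = 2/\|w\|$. Maximizing this distance is therefore the same as minimizing $\|w\|$.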
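A quick numerical check of the $2/\|w\|$ formula, as a sketch with a hypothetical $w = (1, 1)$, $b = -3$: take a point on $w \cdot x + b = 1$ and measure its distance to $w \cdot x + b = -1$ with the point-to-line formula $|w \cdot p + c| / \|w\|$. The helper names are illustrative only.

```python
import numpy as np

def margin_width(w):
    """Distance between the parallel lines w.x + b = 1 and w.x + b = -1, i.e. 2 / ||w||."""
    return 2.0 / np.linalg.norm(w)

def distance_point_to_line(p, w, c):
    """Distance from point p to the line w.x + c = 0, i.e. |w.p + c| / ||w||."""
    return abs(np.dot(w, p) + c) / np.linalg.norm(w)

# Hypothetical line parameters, for illustration only.
w, b = np.array([1.0, 1.0]), -3.0

# (2, 2) lies on w.x + b = 1, since 2 + 2 - 3 = 1.
p = np.array([2.0, 2.0])

# The other line w.x + b = -1 is w.x + (b + 1) = 0.
print(distance_point_to_line(p, w, b + 1))  # 2/sqrt(2), about 1.4142
print(margin_width(w))                      # the same value
print(margin_width(np.array([1.0, 0.0])))   # 2.0: smaller ||w||, wider margin
```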

Now let us move the problem to $\mathbb{R}^n$. How do we find the separating hyperplane there?

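Again we want to maximize the distance between the two hyperplanes on which the support vectors lie. Suppose one support vector $x_2$ lies on $w \cdot x + b = 1$, i.e. $w \cdot x_2 + b = 1$. Because $w$ is normal to both hyperplanes, we can always find a point $x_1$ on $w \cdot x + b = -1$ such that $x_2 = x_1 + \lambda w$.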
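The step $x_2 = x_1 + \lambda w$ can be checked numerically. Substituting it into $w \cdot x_2 + b = 1$ and using $w \cdot x_1 + b = -1$ gives $\lambda = 2/\|w\|^2$. The sketch below uses a hypothetical hyperplane in $\mathbb{R}^3$ ($w = (1, 2, 2)$, so $\|w\| = 3$) and confirms that stepping from $x_1$ along $w$ by that $\lambda$ lands on the other hyperplane after covering a distance of $2/\|w\|$.

```python
import numpy as np

# Hypothetical hyperplane parameters in R^3 (illustration only).
w = np.array([1.0, 2.0, 2.0])   # ||w|| = 3
b = 0.5

# A point x1 on the plane w.x + b = -1: w.x1 + b = -1.5 + 0.5 = -1.
x1 = np.array([-1.5, 0.0, 0.0])

# Step along the normal direction w by lambda = 2 / ||w||^2.
lam = 2.0 / np.dot(w, w)
x2 = x1 + lam * w

print(np.dot(w, x1) + b)   # -1.0
print(np.dot(w, x2) + b)   # 1.0 up to floating-point rounding
# Distance travelled equals 2 / ||w|| (both about 0.6667 here).
print(np.linalg.norm(x2 - x1), 2.0 / np.linalg.norm(w))
```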

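Continuing: substituting $x_2 = x_1 + \lambda w$ into $w \cdot x_2 + b = 1$ and using $w \cdot x_1 + b = -1$ gives $-1 + \lambda \|w\|^2 = 1$, so $\lambda = 2/\|w\|^2$ and the distance between the two hyperplanes is $\|\lambda w\| = \lambda \|w\| = 2/\|w\|$. Finding the hyperplane therefore amounts to minimizing $\|w\|$. Mathematically, writing $y_i = +1$ for blue points and $y_i = -1$ for red points, this is usually stated as:

$$\min_{w,\,b} \ \tfrac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i\,(w \cdot x_i + b) \ge 1, \quad i = 1, \dots, N$$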
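To see the optimization problem solved in practice, the sketch below fits scikit-learn's `SVC` with a linear kernel on hypothetical, linearly separable toy data. Note the substitution: `SVC` solves the soft-margin problem, so a very large `C` is used here as an approximation of the hard-margin formulation above. The learned $w$ and $b$ are read from `coef_` and `intercept_`, and the support vectors from `support_vectors_`.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data; +1 = blue, -1 = red.
X = np.array([[2.0, 2.0], [3.0, 3.0], [3.0, 1.0],
              [0.0, 0.0], [-1.0, 0.0], [0.0, -1.0]])
labels = np.array([1, 1, 1, -1, -1, -1])

# Soft-margin SVC with a very large C approximates the hard-margin problem
# min 1/2 ||w||^2  s.t.  y_i (w.x_i + b) >= 1.
clf = SVC(kernel='linear', C=1e6)
clf.fit(X, labels)

w = clf.coef_[0]        # learned normal vector w
b = clf.intercept_[0]   # learned offset b
print('w =', w, 'b =', b)
print('support vectors:\n', clf.support_vectors_)
print('margin width 2/||w|| =', 2.0 / np.linalg.norm(w))

# Every point should satisfy y_i (w.x_i + b) >= 1 (up to solver tolerance);
# support vectors sit at approximately 1.
print(labels * (X @ w + b))
```

Working this toy case by hand, the closest pair across the classes is (0, 0) and (2, 2), so the maximum-margin line is $x_1 + x_2 = 2$ with $w = (0.5, 0.5)$, $b = -1$ and margin $2\sqrt{2} \approx 2.83$. The printed values should be close to these, with (0, 0), (2, 2) and (3, 1) lying on the margin.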

Sometimes, however, the data are not linearly separable, as in the figure.

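In that case we map the variables into a higher-dimensional space, where a hyperplane can separate the classes. In practice an SVM does this implicitly through a kernel function, which computes inner products in the higher-dimensional space without constructing the mapping explicitly; the code below uses a degree-3 polynomial kernel.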
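A minimal sketch of both ideas, on hypothetical data. First, 1-D points with one class at both ends and the other in the middle cannot be split by a single threshold, but after the explicit map $\phi(x) = (x, x^2)$ a horizontal line separates them. Second, the kernel trick: for 2-D inputs, $(u \cdot v)^2$ equals $\phi_2(u) \cdot \phi_2(v)$ with $\phi_2(v) = (v_1^2, \sqrt{2}\,v_1 v_2, v_2^2)$, so the kernel gives the higher-dimensional inner product directly. For reference, scikit-learn's `kernel='poly'` computes $(\gamma \langle x, x' \rangle + r)^d$, with $d$ = `degree` and $r$ = `coef0`.

```python
import numpy as np

# --- Explicit feature map (hypothetical 1-D data) ---
# Class +1 at both ends, class -1 in the middle: no single threshold works.
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
labels = np.array([1, 1, -1, -1, -1, 1, 1])

# Map each point to 2-D: phi(x) = (x, x^2).
phi = np.column_stack([x, x ** 2])

# In the mapped space the line x^2 = 2.5 separates the classes:
# w = (0, 1), b = -2.5 in w.phi(x) + b.
w, b = np.array([0.0, 1.0]), -2.5
print(np.sign(phi @ w + b) == labels)   # all True

# --- Kernel trick: (u.v)^2 equals an inner product in 3-D ---
def phi2(v):
    """Explicit feature map whose inner product equals the kernel (u.v)^2."""
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

u, v = np.array([1.0, 2.0]), np.array([3.0, -1.0])
kernel_value = np.dot(u, v) ** 2            # computed in the original 2-D space
mapped_value = np.dot(phi2(u), phi2(v))     # computed in the 3-D feature space
print(kernel_value, mapped_value)           # equal up to floating-point rounding
```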

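The corresponding code follows. It reads comma-separated samples from data_multivar.txt, trains an SVC with a degree-3 polynomial kernel, and plots the resulting decision regions: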

import numpy as np
import matplotlib.pyplot as plt

from sklearn.svm import SVC

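# Plot the classifier boundaries on input data.
# X is expected as [feature0 values, feature1 values], in the same column order used for training.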
def plot_classifier(classifier, X, y, title='Classifier boundaries', annotate=False):
    # define ranges to plot the figure 
    x_min, x_max = min(X[0]) - 1.0, max(X[0]) + 1.0
    y_min, y_max = min(X[1]) - 1.0, max(X[1]) + 1.0

    # denotes the step size that will be used in the mesh grid
    step_size = 0.01

    # define the mesh grid
    x_values, y_values = np.meshgrid(np.arange(x_min, x_max, step_size), np.arange(y_min, y_max, step_size))

    # compute the classifier output
    mesh_output = classifier.predict(np.c_[x_values.ravel(), y_values.ravel()])

    # reshape the array
    mesh_output = mesh_output.reshape(x_values.shape)

    # Plot the output using a colored plot 
    plt.figure()

    # Set the title
    plt.title(title)

    # choose a color scheme you can find all the options 
    # here: http://matplotlib.org/examples/color/colormaps_reference.html
    plt.pcolormesh(x_values, y_values, mesh_output, cmap=plt.cm.gray)

    # Overlay the training points on the plot 
    plt.scatter(X[0], X[1], c=y, s=80, edgecolors='black', linewidth=1, cmap=plt.cm.Paired)

    # specify the boundaries of the figure
    plt.xlim(x_values.min(), x_values.max())
    plt.ylim(y_values.min(), y_values.max())

    # specify the ticks on the X and Y axes
    plt.xticks(())
    plt.yticks(())

    if annotate:
        # label each point with its coordinates; new loop names avoid shadowing
        # the class labels y and index the two feature lists correctly
        for x_val, y_val in zip(X[0], X[1]):
            # Full documentation of the function available here: 
            # http://matplotlib.org/api/text_api.html#matplotlib.text.Annotation
            plt.annotate(
                f'({x_val:.1f},{y_val:.1f})',
                xy = (x_val, y_val), xytext = (-15, 15), 
                textcoords = 'offset points', 
                horizontalalignment = 'right', 
                verticalalignment = 'bottom', 
                bbox = dict(boxstyle = 'round,pad=0.6', fc = 'white', alpha = 0.8),
                arrowprops = dict(arrowstyle = '-', connectionstyle = 'arc3,rad=0'))

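def load_data(input_file):
    # each line: comma-separated feature values followed by the class label
    X = []
    y = []
    with open(input_file, 'r') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines, e.g. a trailing newline
            data = [float(i) for i in line.split(",")]
            X.append(data[:-1])
            y.append(data[-1])
    return np.array(X), np.array(y)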

input_file = 'data_multivar.txt'
X, y = load_data(input_file)

# split the samples by class label (kept from the original; not used below)
class1 = X[y == 1]
class0 = X[y == 0]

# SVM with a degree-3 polynomial kernel
params = {'kernel': 'poly', 'degree': 3}
classifier = SVC(**params)
classifier.fit(X, y)

# pass the features in the same column order used for training;
# swapping them would draw the decision regions on the wrong axes
plot_classifier(classifier, [X[:, 0], X[:, 1]], y, 'Test dataset')
plt.show()
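If data_multivar.txt is not at hand, the sketch below writes a hypothetical stand-in in the format load_data() parses (two features and a 0/1 label per line, comma-separated). The file name toy_multivar.txt and the circular class boundary are assumptions for illustration, not the original dataset; the classes it produces are not linearly separable, which suits the polynomial-kernel SVC above.

```python
import numpy as np

# Hypothetical stand-in data, NOT the original data_multivar.txt.
rng = np.random.default_rng(0)
n = 100
X = rng.uniform(-3.0, 3.0, size=(n, 2))

# Points inside a circle of radius 2 get label 1, the rest label 0,
# so no straight line can separate the two classes.
y = (np.sum(X ** 2, axis=1) < 4.0).astype(int)

# One sample per line: feature0,feature1,label
with open('toy_multivar.txt', 'w') as f:
    for (a, b), label in zip(X, y):
        f.write(f'{a:.4f},{b:.4f},{label}\n')
```

To use it, set input_file = 'toy_multivar.txt' in the script above.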