Support Vector Machine

送快递的勃仕

Published 2023-07-08 15:42:39

Tags: linear algebra, machine learning, scikit-learn, svm

Article link: https://blog.csdn.net/tew_315/article/details/131612764

SVM

Q: Why a hyperplane?
A: Datasets usually have more than two dimensions. When dim=2, a (2-1)-dimensional line separates the data; when dim=3, a (3-1)-dimensional plane does. In general, n-dimensional data is separated by an (n-1)-dimensional hyperplane.

Q: Which hyperplane is best suited for classification?

A: The one in the middle, lying midway between the two classes. It has the widest range of tolerance to perturbations in the samples, so it is the most robust and tends to generalize well.

Hyperplane

The hyperplane is defined by $\omega^Tx+b=0$, and we denote it by $(\omega,b)$.
Assuming $(\omega,b)$ classifies every training sample correctly, we get:
$if\ y_i=+1,\ \omega^Tx_i+b>0;\\ if\ y_i=-1,\ \omega^Tx_i+b<0$
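A minimal sketch of this sign rule, assuming NumPy. The values of `w`, `b`, and the sample points are illustrative and not taken from the article:

```python
import numpy as np

def predict(w, b, X):
    """Label each row of X by the side of the hyperplane w^T x + b = 0 it falls on."""
    scores = X @ w + b
    return np.where(scores > 0, 1, -1)

# Illustrative hyperplane and points (assumed values)
w = np.array([0.5, 0.5])
b = -1.0
X = np.array([[0.0, 0.0], [2.0, 2.0], [3.0, 1.0]])

labels = predict(w, b, X)
print(labels)
```

A point lying exactly on the hyperplane has score 0; this sketch arbitrarily assigns it to the negative class.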

Margin

Maximize the margin for robustness.
The points lying on the dashed margin lines are all at the same distance from the hyperplane, and the hyperplane is determined by these few points alone; that is why they are called 'support vectors'.
With $(\omega,b)$ scaled so that the support vectors satisfy $y_i(\omega^Tx_i+b)=1$, the margin width is $\frac{2}{||\omega||}$. Maximize the margin by finding the corresponding $\omega$ and $b$:
$\arg\max_{\omega,b}\frac{2}{||\omega||}$
Transform the maximization into an equivalent minimization:
$\begin{aligned} &\arg\min_{\omega,b}\frac{1}{2}||\omega||^2\\ &s.t.\ y_i(\omega^Tx_i+b){\geq}1,\ i=1,2,...,m. \end{aligned}$
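To make the constraint and the objective concrete, here is a small sketch assuming NumPy. The data and the candidate $(\omega,b)$ are made-up values chosen so the numbers are easy to check by hand:

```python
import numpy as np

# Illustrative separable data and a candidate hyperplane (assumed values)
X = np.array([[0.0, 0.0], [-1.0, 0.0], [2.0, 2.0], [3.0, 3.0]])
y = np.array([-1, -1, 1, 1])
w = np.array([0.5, 0.5])
b = -1.0

# Constraint values y_i (w^T x_i + b); each must be >= 1 for feasibility
constraint_values = y * (X @ w + b)
feasible = bool(np.all(constraint_values >= 1 - 1e-12))

# Margin width 2 / ||w|| and the minimization objective (1/2) ||w||^2
margin_width = 2.0 / np.linalg.norm(w)
objective = 0.5 * np.dot(w, w)

print(constraint_values, feasible, margin_width, objective)
```

The two points whose constraint value equals exactly 1 are the ones that sit on the margin.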

Model

Solve by introducing Lagrange multipliers $\alpha_i\geq0$:
$L(\omega,b,\alpha)=\frac{1}{2}||\omega||^2-\sum_{i=1}^{m}\alpha_i(y_i(\omega^Tx_i+b)-1)$
Setting $\frac{{\partial}L}{{\partial}\omega}=\frac{{\partial}L}{{\partial}b}=0$ gives:
$\omega=\sum_{i=1}^{m}\alpha_iy_ix_i,\ \sum^m_{i=1}\alpha_iy_i=0$
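The two stationarity conditions can be checked numerically. The sketch below assumes NumPy and uses made-up data and multipliers `alpha` (illustrative, not from the article); it builds $\omega$ from the multipliers and confirms by finite differences that both gradients of the Lagrangian vanish there:

```python
import numpy as np

# Illustrative separable data (assumed values, not from the article)
X = np.array([[0.0, 0.0], [-1.0, 0.0], [2.0, 2.0], [3.0, 3.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
# Assumed multipliers: nonzero only for the two closest points
alpha = np.array([0.25, 0.0, 0.25, 0.0])

def lagrangian(w, b):
    """L(w, b, alpha) = 1/2 ||w||^2 - sum_i alpha_i (y_i (w^T x_i + b) - 1)."""
    return 0.5 * w @ w - np.sum(alpha * (y * (X @ w + b) - 1.0))

# Stationary point predicted by the two conditions
w = (alpha * y) @ X          # sum_i alpha_i y_i x_i
balance = float(alpha @ y)   # sum_i alpha_i y_i, should be 0

# Central finite differences of L at (w, b); both gradients should vanish.
# The value of b is arbitrary here because dL/db does not depend on b.
eps = 1e-6
b = -1.0
grad_w = np.array([
    (lagrangian(w + eps * e, b) - lagrangian(w - eps * e, b)) / (2 * eps)
    for e in np.eye(2)
])
grad_b = (lagrangian(w, b + eps) - lagrangian(w, b - eps)) / (2 * eps)
print(w, balance, grad_w, grad_b)
```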

Dual Problem

$\begin{aligned} &\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^m\sum_{j=1}^m\alpha_i\alpha_jy_iy_jx_i^Tx_j-\sum_{i=1}^m\alpha_i\\ &s.t.\ \sum^m_{i=1}\alpha_iy_i=0,\ \alpha_i\geq0,\ i=1,2,...,m \end{aligned}$
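As a rough illustration of the dual objective, the sketch below (assuming NumPy, with made-up data) evaluates it at a candidate `alpha_star` and at randomly generated feasible multipliers. The candidate was worked out by hand for this toy data set; it is an assumption of the example, not something produced by a solver:

```python
import numpy as np

# Illustrative separable data (assumed values, not from the article)
X = np.array([[0.0, 0.0], [-1.0, 0.0], [2.0, 2.0], [3.0, 3.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])

# Q_ij = y_i y_j x_i^T x_j
Q = np.outer(y, y) * (X @ X.T)

def dual_objective(alpha):
    """(1/2) sum_ij alpha_i alpha_j y_i y_j x_i^T x_j - sum_i alpha_i."""
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

# Hand-derived candidate minimizer for this toy problem
alpha_star = np.array([0.25, 0.0, 0.25, 0.0])
best = dual_objective(alpha_star)

# Random feasible multipliers: alpha_i >= 0 and sum_i alpha_i y_i = 0
rng = np.random.default_rng(0)
others = []
for _ in range(200):
    neg = rng.random(2)               # multipliers of the two negative samples
    pos = rng.random(2)               # multipliers of the two positive samples
    pos *= neg.sum() / pos.sum()      # balance the two classes
    scale = rng.random()
    others.append(dual_objective(scale * np.concatenate([neg, pos])))

print(best, min(others))
```

For this data the minimum of the dual objective is the negative of the primal optimum $\frac{1}{2}||\omega||^2$, which is what the printed `best` value reflects.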

Classification model

$x$: test sample
$f(x)=\omega^Tx+b=\sum_{i=1}^m\alpha_iy_ix_i^Tx+b$
The solution satisfies the KKT (Karush-Kuhn-Tucker) conditions:
$\left\{\begin{aligned} &\alpha_i\geq0\\ &y_if(x_i)\geq1\\ &\alpha_i(y_if(x_i)-1)=0 \end{aligned}\right.$
Sparsity of the solution: by the third condition, every sample has either $\alpha_i=0$ or $y_if(x_i)=1$. Samples with $\alpha_i=0$ contribute nothing to $f(x)$, so most training samples need not be kept after training; the final model depends only on the support vectors. This is one reason an SVM tends to be less prone to overfitting.
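The KKT conditions and the sparsity claim can be checked on a toy problem. This is a sketch assuming NumPy; the data and the multipliers `alpha` are made-up values for which the solution is known by hand:

```python
import numpy as np

# Illustrative data and hand-derived multipliers (assumed values)
X = np.array([[0.0, 0.0], [-1.0, 0.0], [2.0, 2.0], [3.0, 3.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
alpha = np.array([0.25, 0.0, 0.25, 0.0])

w = (alpha * y) @ X
is_sv = alpha > 1e-12                     # support vectors have alpha_i > 0

# Recover b from any support vector: y_s (w^T x_s + b) = 1  =>  b = y_s - w^T x_s
s = np.flatnonzero(is_sv)[0]
b = y[s] - X[s] @ w

f = X @ w + b
dual_feasible = bool(np.all(alpha >= 0))
primal_feasible = bool(np.all(y * f >= 1 - 1e-12))
complementary = bool(np.allclose(alpha * (y * f - 1), 0.0))

# Sparsity: predicting with only the support vectors gives the same score
x_new = np.array([2.5, 1.0])
f_full = (alpha * y) @ (X @ x_new) + b
f_sparse = (alpha[is_sv] * y[is_sv]) @ (X[is_sv] @ x_new) + b
print(dual_feasible, primal_feasible, complementary, f_full, f_sparse)
```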

from sklearn import svm

# Three 2-D training points with binary labels
X = [[2, 0], [1, 1], [2, 3]]
y = [0, 0, 1]

# Fit an SVM with a linear kernel
clf = svm.SVC(kernel='linear')
clf.fit(X, y)
clf

clf.support_vectors_  # coordinates of the support vectors

array([[1., 1.],
       [2., 3.]])

clf.support_  # indices of the support vectors in X

array([1, 2])

clf.predict([[2,0]])

array([0])

import numpy as np
import pylab as pl
from sklearn import svm

np.random.seed(0)
X = np.r_[np.random.randn(20, 2) - [2, 2], np.random.randn(20, 2) + [2, 2]]
Y = [0] * 20 + [1] * 20
clf = svm.SVC(kernel='linear')
clf.fit(X, Y)

# Hyperplane: w0*x0 + w1*x1 + b = 0, i.e. x1 = -(w0/w1)*x0 - b/w1
w = clf.coef_[0]  # coef:w
a = -w[0] / w[1]
xx = np.linspace(-5, 5)
yy = a * xx - (clf.intercept_[0]) / w[1]  # intercept:bias,b

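# Margin lines: parallel to the hyperplane and passing through the first and
# last support vectors (here b holds a support vector, not the bias)
b = clf.support_vectors_[0]
yy_down = a * xx + (b[1] - a * b[0])
b = clf.support_vectors_[-1]
yy_up = a * xx + (b[1] - a * b[0])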

print("w: ", w)
print("a: ", a)
print("support_vectors_: ", clf.support_vectors_)
print("clf.coef_: ", clf.coef_)

pl.plot(xx, yy, 'k-')
pl.plot(xx, yy_down, 'k--')
pl.plot(xx, yy_up, 'k--')
pl.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
           s=80, facecolors='none')
pl.scatter(X[:, 0], X[:, 1], c=Y, cmap=pl.cm.Paired)

pl.axis('tight')
pl.show()

w:  [0.90230696 0.64821811]
a:  -1.391980476255765
support_vectors_:  [[-1.02126202  0.2408932 ]
 [-0.46722079 -0.53064123]
 [ 0.95144703  0.57998206]]
clf.coef_:  [[0.90230696 0.64821811]]
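The fitted linear model can be cross-checked against the formulas from the derivation. The sketch below refits the same data and assumes scikit-learn's documented attributes `dual_coef_` (which stores $\alpha_iy_i$ for the support vectors) and `decision_function`; note that `SVC` is a soft-margin solver with default `C=1.0`, so this is only an approximation of the hard-margin setting described above:

```python
import numpy as np
from sklearn import svm

# Same synthetic data as in the plotting example
np.random.seed(0)
X = np.r_[np.random.randn(20, 2) - [2, 2], np.random.randn(20, 2) + [2, 2]]
Y = [0] * 20 + [1] * 20
clf = svm.SVC(kernel='linear').fit(X, Y)

w = clf.coef_[0]
margin_width = 2.0 / np.linalg.norm(w)

# w = sum_i alpha_i y_i x_i, summed over the support vectors only
w_from_dual = (clf.dual_coef_ @ clf.support_vectors_)[0]
# sum_i alpha_i y_i should be (numerically) zero
balance = float(clf.dual_coef_.sum())
# Scores of the support vectors; those with 0 < alpha_i < C lie on the
# margin, so their score is close to +1 or -1 up to solver tolerance
sv_scores = clf.decision_function(clf.support_vectors_)

print(margin_width, w_from_dual, balance, sv_scores)
```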


Kernel

Linear Separability

Q: What if no hyperplane separates the samples into two classes?
A: Map the original space into a higher-dimensional feature space via a mapping $\phi(x)$, in which the samples become linearly separable.
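A tiny illustration of this idea in plain Python. The 1-D data, the feature map `phi`, and the separating hyperplane in feature space are all assumptions made up for this example:

```python
# 1-D samples: the inner points belong to class +1, the outer points to class -1
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [-1, 1, 1, -1]

def phi(x):
    """Assumed feature map for illustration: x -> (x, x^2)."""
    return (x, x * x)

def separable_in_1d(xs, ys):
    """Check whether a single threshold on x alone can split the two classes."""
    labels = [label for _, label in sorted(zip(xs, ys))]
    # A threshold works only if the sorted labels change at most once
    changes = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return changes <= 1

# In feature space the hyperplane w^T phi(x) + b = 0 with w = (0, -1), b = 2.5
# (the horizontal line x^2 = 2.5) separates the two classes
w, b = (0.0, -1.0), 2.5
scores = [w[0] * phi(x)[0] + w[1] * phi(x)[1] + b for x in xs]
predictions = [1 if s > 0 else -1 for s in scores]

print(separable_in_1d(xs, ys), predictions)
```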

Dual Problem

$\begin{aligned} &\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^m\sum_{j=1}^m\alpha_i\alpha_jy_iy_j\phi(x_i)^T\phi(x_j)-\sum_{i=1}^m\alpha_i\\ &s.t.\ \sum^m_{i=1}\alpha_iy_i=0,\ \alpha_i\geq0,\ i=1,2,...,m \end{aligned}$

Classification model

$f(x)=\omega^T\phi(x)+b=\sum_{i=1}^m\alpha_iy_i\phi(x_i)^T\phi(x)+b$
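In code, the inner product $\phi(x_i)^T\phi(x)$ can be passed in as a function. The sketch below assumes NumPy; the helper name `decision_function`, the data, and the multipliers are illustrative (the multipliers are not re-optimized for the polynomial kernel, they only show that the same code path is reused):

```python
import numpy as np

def decision_function(x, X_train, y_train, alpha, b, kernel):
    """f(x) = sum_i alpha_i y_i K(x_i, x) + b for a given kernel function K."""
    k = np.array([kernel(x_i, x) for x_i in X_train])
    return float(np.sum(alpha * y_train * k) + b)

def linear_kernel(u, v):
    # phi is the identity map, so K(u, v) = u^T v
    return float(np.dot(u, v))

def poly2_kernel(u, v):
    # Degree-2 polynomial kernel K(u, v) = (u^T v)^2
    return float(np.dot(u, v)) ** 2

# Illustrative training data, multipliers and bias (assumed values)
X_train = np.array([[0.0, 0.0], [-1.0, 0.0], [2.0, 2.0], [3.0, 3.0]])
y_train = np.array([-1.0, -1.0, 1.0, 1.0])
alpha = np.array([0.25, 0.0, 0.25, 0.0])
b = -1.0
x_new = np.array([2.5, 1.0])

f_linear = decision_function(x_new, X_train, y_train, alpha, b, linear_kernel)
f_poly = decision_function(x_new, X_train, y_train, alpha, b, poly2_kernel)

# With the linear kernel the result matches w^T x + b computed explicitly
w = (alpha * y_train) @ X_train
f_explicit = float(w @ x_new + b)
print(f_linear, f_explicit, f_poly)
```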

Kernel trick

For example, take $\phi(x)=(x_1^2,\sqrt{2}x_1x_2,x_2^2)$ for $x=(x_1,x_2)$:
$\begin{aligned}K(x,z)&=\phi(x){\cdot}\phi(z)\\ &=x_1^2z_1^2+2x_1x_2z_1z_2+x_2^2z_2^2\\ &=(x_1z_1+x_2z_2)^2\\ &=\left( \left[ \begin{array}{c} x_1\\ x_2 \end{array}\right] {\cdot} \left[ \begin{array}{c} z_1\\ z_2 \end{array} \right] \right)^2\\ &=(x{\cdot}z)^2 \end{aligned}$
Computing $K(x,z)$ directly is sometimes faster than applying the feature transformation and then taking the inner product.
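A quick numerical check of this identity, assuming NumPy and the degree-2 feature map $\phi(x)=(x_1^2,\sqrt{2}x_1x_2,x_2^2)$; the random test vectors are arbitrary:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D input."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def K(x, z):
    """Kernel evaluated in the original 2-D space: (x . z)^2."""
    return float(np.dot(x, z)) ** 2

rng = np.random.default_rng(0)
pairs = rng.normal(size=(5, 2, 2))   # five random (x, z) pairs of 2-D vectors

# Largest disagreement between the feature-space inner product and the kernel
max_gap = max(abs(float(np.dot(phi(x), phi(z))) - K(x, z)) for x, z in pairs)
print(max_gap)
```

The kernel needs one 2-D inner product and a square, while the explicit route builds two 3-D vectors first; the gap grows quickly with input dimension and polynomial degree.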

Mercer's theorem

Any symmetric function whose kernel matrix is positive semidefinite (for every choice of samples) can be used as a kernel function.
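The condition can be probed numerically by building a kernel matrix and inspecting its eigenvalues. This sketch assumes NumPy; the helper names and the two candidate functions are illustrative. Passing the check on one sample set does not prove a function is a valid kernel, but failing it on any sample set rules the function out:

```python
import numpy as np

def gram_matrix(kernel, X):
    """Kernel matrix G with G_ij = kernel(x_i, x_j)."""
    return np.array([[kernel(a, b) for b in X] for a in X])

def is_psd(G, tol=1e-9):
    """True if G is symmetric and its smallest eigenvalue is >= -tol."""
    if not np.allclose(G, G.T):
        return False
    return bool(np.min(np.linalg.eigvalsh(G)) >= -tol)

def rbf(a, b):
    # Gaussian kernel with sigma = 1
    return float(np.exp(-np.sum((a - b) ** 2) / 2.0))

def neg_sq_dist(a, b):
    # Symmetric, but not a valid kernel: its Gram matrix has zero trace
    return float(-np.sum((a - b) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))

rbf_ok = is_psd(gram_matrix(rbf, X))
neg_ok = is_psd(gram_matrix(neg_sq_dist, X))
print(rbf_ok, neg_ok)
```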

Kernel Function:

Linear Kernel:
$K(x_i,x_j)=x_i^Tx_j$
Polynomial kernel:
$K(x_i,x_j)=(x_i^Tx_j)^d$
Gaussian radial basis function (RBF) kernel:
$K(x_i,x_j)=e^{-||x_i-x_j||^2/(2\sigma^2)}$
Sigmoid function kernel:
$K(x_i,x_j)=tanh({\beta}x_i^Tx_j+\theta)$
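These four kernels are short enough to write out directly. The sketch below assumes NumPy and the usual Gaussian form with a negative exponent and a squared distance; the function names and default parameter values are choices made for this example:

```python
import numpy as np

def linear_kernel(xi, xj):
    return float(np.dot(xi, xj))

def polynomial_kernel(xi, xj, d=2):
    return float(np.dot(xi, xj)) ** d

def gaussian_kernel(xi, xj, sigma=1.0):
    return float(np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2)))

def sigmoid_kernel(xi, xj, beta=1.0, theta=0.0):
    return float(np.tanh(beta * np.dot(xi, xj) + theta))

# Illustrative inputs
xi = np.array([1.0, 2.0])
xj = np.array([3.0, 0.0])

values = {
    "linear": linear_kernel(xi, xj),
    "polynomial": polynomial_kernel(xi, xj),
    "gaussian": gaussian_kernel(xi, xj),
    "sigmoid": sigmoid_kernel(xi, xj),
}
print(values)
```

scikit-learn's `SVC` writes the RBF kernel as $e^{-\gamma||x_i-x_j||^2}$, so its `gamma` parameter corresponds to $\frac{1}{2\sigma^2}$ here.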

Example: SVMs with different kernels on the breast_cancer dataset

from sklearn import datasets
from sklearn.model_selection import train_test_split, ShuffleSplit
from sklearn import svm
import matplotlib.pyplot as plt
from sklearn.model_selection import learning_curve
import numpy as np

# Plotting helper: draws the learning curve of an estimator
def plot_learning_curve(estimator, title, X_data, y_target, ylim=None, cv=None, n_jobs=1, train_sizes=np.linspace(.1, 1.0, 20)):
    plt.figure()
    plt.title(title)
    if ylim is not None:
        plt.ylim(*ylim)
    plt.xlabel("Training examples")
    plt.ylabel("Score")
    train_sizes, train_scores, test_scores = learning_curve(estimator, X_data, y_target, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes)
    train_scores_mean = np.mean(train_scores, axis=1)
    train_scores_std = np.std(train_scores, axis=1)
    test_scores_mean = np.mean(test_scores, axis=1)
    test_scores_std = np.std(test_scores, axis=1)
    plt.grid()

    plt.fill_between(train_sizes, train_scores_mean - train_scores_std,
                     train_scores_mean + train_scores_std, alpha=0.1,
                     color="r")
    plt.fill_between(train_sizes, test_scores_mean - test_scores_std,
                     test_scores_mean + test_scores_std, alpha=0.1, color="g")
    plt.plot(train_sizes, train_scores_mean, 'o-', color="r",
             label="Training score")
    plt.plot(train_sizes, test_scores_mean, 'o-', color="g",
             label="Cross-validation score")

    plt.legend(loc="best")
    return plt


# Load the breast cancer dataset
data_cancer = datasets.load_breast_cancer()
X = data_cancer.data
y = data_cancer.target
print(X, y)
print("====================")
# Split into training and test samples
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
cv = ShuffleSplit(n_splits=100, test_size=0.2, random_state=0)
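# Initialize the SVM classifier (default RBF kernel)
svc_classifier = svm.SVC()
# Train the SVM classifier on the training data
svc_classifier.fit(X_train, y_train)
'''
# Predict labels for the test samples
y_predict = svc_classifier.predict(X_test)
print(y_predict)
'''
# Evaluate the model on the test samples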
svc_accuracy = svc_classifier.score(X_test, y_test)
print("svc_accuracy：%f %%" % (100 * svc_accuracy))
plot_learning_curve(svc_classifier, "SVM", X, y, (0.87, 0.95), cv=cv, n_jobs=1)
# Linear kernel (no kernel mapping); degree is only used by kernel='poly'
cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
title = 'Learning Curves (C=1.0, kernel=linear, degree=2) '
estimator = svm.SVC(C=1.0, kernel='linear', degree=2)
SVC_2=estimator
plot_learning_curve(estimator, title, X, y, (0.9, 1.0), cv=cv, n_jobs=1)
plt.show()

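cv.get_n_splits(X, y)
# Iterate over the splits; only the last train/test split survives the loop
for train_index, test_index in cv.split(X, y):
    X_train = X[train_index]
    X_test = X[test_index]
    y_train = y[train_index]
    y_test = y[test_index]
# Fit and score the linear SVM on that last split
SVC_2.fit(X_train, y_train)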
train_score = SVC_2.score(X_train, y_train)
test_score = SVC_2.score(X_test, y_test)
print('train score: {0}; test score: {1}'.format(train_score, test_score))

[[1.799e+01 1.038e+01 1.228e+02 ... 2.654e-01 4.601e-01 1.189e-01]
 [2.057e+01 1.777e+01 1.329e+02 ... 1.860e-01 2.750e-01 8.902e-02]
 [1.969e+01 2.125e+01 1.300e+02 ... 2.430e-01 3.613e-01 8.758e-02]
 ...
 [1.660e+01 2.808e+01 1.083e+02 ... 1.418e-01 2.218e-01 7.820e-02]
 [2.060e+01 2.933e+01 1.401e+02 ... 2.650e-01 4.087e-01 1.240e-01]
 [7.760e+00 2.454e+01 4.792e+01 ... 0.000e+00 2.871e-01 7.039e-02]] [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 0 0 0 0 0 0 0 0 1 0 1 1 1 1 1 0 0 1 0 0 1 1 1 1 0 1 0 0 1 1 1 1 0 1 0 0
 1 0 1 0 0 1 1 1 0 0 1 0 0 0 1 1 1 0 1 1 0 0 1 1 1 0 0 1 1 1 1 0 1 1 0 1 1
 1 1 1 1 1 1 0 0 0 1 0 0 1 1 1 0 0 1 0 1 0 0 1 0 0 1 1 0 1 1 0 1 1 1 1 0 1
 1 1 1 1 1 1 1 1 0 1 1 1 1 0 0 1 0 1 1 0 0 1 1 0 0 1 1 1 1 0 1 1 0 0 0 1 0
 1 0 1 1 1 0 1 1 0 0 1 0 0 0 0 1 0 0 0 1 0 1 0 1 1 0 1 0 0 0 0 1 1 0 0 1 1
 1 0 1 1 1 1 1 0 0 1 1 0 1 1 0 0 1 0 1 1 1 1 0 1 1 1 1 1 0 1 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 1 1 1 1 1 1 0 1 0 1 1 0 1 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1
 1 0 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 0 1 1 1 1 0 0 0 1 1
 1 1 0 1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0 0
 0 1 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1 1
 1 0 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 1 0 1 1 1 1 1 0 1 1
 0 1 0 1 1 0 1 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1
 1 1 1 1 1 1 0 1 0 1 1 0 1 1 1 1 1 0 0 1 0 1 0 1 1 1 1 1 0 1 1 0 1 0 1 0 0
 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 0 0 0 0 0 0 1]
====================
svc_accuracy：92.397661 %


train score: 0.9604395604395605; test score: 0.9649122807017544
