Support Vector Machine (SVM)
What is an SVM
Find the optimal decision boundary such that the distance from the points of each class closest to the boundary is as large as possible, i.e. maximize the margin:
margin = 2d
SVM solves the linearly separable problem. Depending on the actual data, it comes in two forms:
Hard Margin: separate the two classes perfectly. Soft Margin: allow some points to be misclassified (some data simply cannot be separated by a straight line).
Theoretical derivation of SVM
The distance from a point to the line (hyperplane) is:
d = \frac{\left | w^{T} \cdot x+b\right |}{\left \| w \right \|},\quad \left \| w \right \|=\sqrt{w_{1}^{2}+w_{2}^{2}+\cdots +w_{n}^{2}}
For the classification result, we want:
\left\{\begin{matrix} \frac{w^{T} \cdot x^{(i)}+b}{\left \| w \right \|}\geq d & \forall y^{(i)}=1 \\ \frac{w^{T} \cdot x^{(i)}+b}{\left \| w \right \|}\leq -d & \forall y^{(i)}=-1 \end{matrix}\right.
which is equivalent to
\left\{\begin{matrix} \frac{w^{T} \cdot x^{(i)}+b}{\left \| w \right \| d}\geq 1 & \forall y^{(i)}=1 \\ \frac{w^{T} \cdot x^{(i)}+b}{\left \| w \right \| d}\leq -1 & \forall y^{(i)}=-1 \end{matrix}\right.
The denominator here is a constant, so the formula above is equivalent to:
\left\{\begin{matrix} w_{d}^{T}\cdot x^{(i)}+b_{d}\geq 1 & \forall y^{(i)}=1 \\ w_{d}^{T}\cdot x^{(i)}+b_{d}\leq -1 & \forall y^{(i)}=-1 \end{matrix}\right.
That is (renaming w_d, b_d back to w, b):
\left\{\begin{matrix} w^{T}\cdot x^{(i)}+b\geq 1 & \forall y^{(i)}=1 \\ w^{T}\cdot x^{(i)}+b\leq -1 & \forall y^{(i)}=-1 \end{matrix}\right.
Multiplying y^{(i)} into both inequalities combines them into a single constraint:
y^{(i)}(w^{T}\cdot x^{(i)}+b)\geq 1
Therefore, to maximize d, we substitute the point-to-line distance formula: we need to maximize \frac{\left | w^{T} \cdot x+b\right |}{\left \| w \right \|}. Every point satisfies \left | w^{T}\cdot x^{(i)}+b \right |\geq 1, with equality for the points closest to the boundary (the support vectors), so the numerator is fixed at 1 and we only need to maximize \frac{1}{\left \| w \right \|}, which is equivalent to minimizing \left \| w \right \|.
Finally, this optimization problem can be written equivalently as (squaring the norm makes the objective differentiable, and the 1/2 simplifies the derivative):
\begin{matrix} \min\; \frac{1}{2}\left \| w \right \|^{2} \\ s.t.\; y^{(i)}(w^{T}\cdot x^{(i)}+b)\geq 1 \end{matrix}
Soft Margin SVM
Sometimes the data is not linearly separable, so we add a slack term to the optimization problem above:
\min\; \frac{1}{2}\left \| w \right \|^{2}+C\sum_{i=1}^{m}\zeta _{i}\quad s.t.\; y^{(i)}(w^{T}\cdot x^{(i)}+b)\geq 1-\zeta _{i},\; \zeta _{i}\geq 0
This slack penalty is the L1 norm version; of course, we can also use the L2 norm:
\min\; \frac{1}{2}\left \| w \right \|^{2}+C\sum_{i=1}^{m}\zeta _{i}^{2}\quad s.t.\; y^{(i)}(w^{T}\cdot x^{(i)}+b)\geq 1-\zeta _{i},\; \zeta _{i}\geq 0
The constant C controls the tolerance for errors: a large C penalizes the slack heavily (approaching Hard Margin), while a small C tolerates more violations.
Using SVM with scikit-learn
Since SVM relies on distances, we first standardize the data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data
y = iris.target
X = X[y<2,:2]  # keep only classes 0 and 1 and the first two features, so results are easy to plot
y = y[y<2]
plt.scatter(X[y==0,0], X[y==0,1], color='red')
plt.scatter(X[y==1,0], X[y==1,1], color='blue')
plt.show()
from sklearn.preprocessing import StandardScaler
standardScaler = StandardScaler()
standardScaler.fit(X)
X_standard = standardScaler.transform(X)
from sklearn.svm import LinearSVC
svc = LinearSVC(C=1e9)  # a very large C approximates Hard Margin
svc.fit(X_standard, y)
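For comparison, a much smaller C yields a softer margin that tolerates violations. A minimal sketch (C=0.01 is an arbitrary illustrative value, and svc2 is a name introduced here; it can be drawn with the same plot_decision_boundary helper defined below):
svc2 = LinearSVC(C=0.01)  # small C: soft margin, tolerates misclassified points
svc2.fit(X_standard, y)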
def plot_decision_boundary(model, axis):
    x0, x1 = np.meshgrid(
        np.linspace(axis[0], axis[1], int((axis[1]-axis[0])*100)).reshape(-1, 1),
        np.linspace(axis[2], axis[3], int((axis[3]-axis[2])*100)).reshape(-1, 1),
    )
    X_new = np.c_[x0.ravel(), x1.ravel()]
    y_predict = model.predict(X_new)
    zz = y_predict.reshape(x0.shape)
    from matplotlib.colors import ListedColormap
    custom_cmap = ListedColormap(['#EF9A9A','#FFF59D','#90CAF9'])
    plt.contourf(x0, x1, zz, cmap=custom_cmap)  # contourf does not accept linewidth, so it is dropped
plot_decision_boundary(svc, axis=[-3, 3, -3, 3])
plt.scatter(X_standard[y==0,0], X_standard[y==0,1])
plt.scatter(X_standard[y==1,0], X_standard[y==1,1])
plt.show()
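To see the margin itself, we can read the learned weights from the fitted model and draw the two lines w^T x + b = ±1 alongside the decision boundary w^T x + b = 0. A sketch for the two-feature setup above (plot_svc_decision_boundary is a helper name introduced here):
def plot_svc_decision_boundary(model, axis):
    plot_decision_boundary(model, axis)
    w = model.coef_[0]       # w = (w1, w2)
    b = model.intercept_[0]
    plot_x = np.linspace(axis[0], axis[1], 200)
    # solve w1*x1 + w2*x2 + b = +1 / -1 for x2: the two margin lines
    up_y = -w[0]/w[1] * plot_x + (1 - b)/w[1]
    down_y = -w[0]/w[1] * plot_x + (-1 - b)/w[1]
    up_index = (up_y >= axis[2]) & (up_y <= axis[3])      # clip to the plot area
    down_index = (down_y >= axis[2]) & (down_y <= axis[3])
    plt.plot(plot_x[up_index], up_y[up_index], color='black')
    plt.plot(plot_x[down_index], down_y[down_index], color='black')
plot_svc_decision_boundary(svc, axis=[-3, 3, -3, 3])
plt.scatter(X_standard[y==0,0], X_standard[y==0,1])
plt.scatter(X_standard[y==1,0], X_standard[y==1,1])
plt.show()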
Using polynomial features and kernel functions in SVM
…
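The body of this section is elided; as a minimal sketch of the two usual approaches, under the assumption that we train on a non-linearly-separable dataset such as the make_moons data used later (PolynomialSVC and PolynomialKernelSVC are helper names introduced here):
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC, SVC
# Approach 1: expand polynomial features explicitly, then fit a linear SVM
def PolynomialSVC(degree, C=1.0):
    return Pipeline([
        ("poly", PolynomialFeatures(degree=degree)),
        ("std_scaler", StandardScaler()),
        ("linearSVC", LinearSVC(C=C))
    ])
# Approach 2: keep the raw features and let the polynomial kernel do the expansion implicitly
def PolynomialKernelSVC(degree, C=1.0):
    return Pipeline([
        ("std_scaler", StandardScaler()),
        ("kernelSVC", SVC(kernel="poly", degree=degree, C=C))
    ])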
What is a kernel function
The optimization problem above can be converted into its dual form:
\begin{matrix} \max\; \sum_{i=1}^{m}a_{i}-\frac{1}{2}\sum_{i=1}^{m} \sum_{j=1}^{m}a_{i}a_{j}y_{i}y_{j}x_{i}x_{j} \\ s.t.\; 0\leqslant a_{i}\leqslant C,\; \sum_{i=1}^{m}a_{i}y_{i}=0 \end{matrix}
In the past, we used polynomial features to convert each sample x^{(i)} into x^{'(i)} and x^{(j)} into x^{'(j)}, and then computed the product x^{'(i)} x^{'(j)}. Now we want a function whose input is x^{(i)}, x^{(j)} and whose output is x^{'(i)} x^{'(j)}, computing the product directly:
K(x^{(i)},x^{(j)})=x^{'(i)}x^{'(j)}
This makes the code run faster and use less memory, since explicitly representing polynomial features is expensive. Whenever a model only needs to compute a form like x_{i}x_{j}, we can use a kernel function; the trick is not specific to SVM.
Polynomial kernel
K(x, y)=(x\cdot y+1)^{2}
\begin{aligned} K(x, y)=&(\sum_{i=1}^{n}x_{i}y_{i}+1)^{2} \\ =&\sum_{i=1}^{n}(x_{i}^{2})(y_{i}^{2})+\sum_{i=2}^{n}\sum_{j=1}^{i-1}(\sqrt{2}x_{i}x_{j})(\sqrt{2}y_{i}y_{j})+\sum_{i=1}^{n}(\sqrt{2}x_{i})(\sqrt{2}y_{i})+1 \end{aligned}
If we define
x^{'} = (x_{n}^{2},\cdots ,x_{1}^{2},\sqrt{2}x_{n}x_{n-1},\cdots ,\sqrt{2}x_{n},\cdots ,\sqrt{2}x_{1},1)
y^{'} = (y_{n}^{2},\cdots ,y_{1}^{2},\sqrt{2}y_{n}y_{n-1},\cdots ,\sqrt{2}y_{n},\cdots ,\sqrt{2}y_{1},1)
then we have
K(x,y)=x^{'}y^{'}
so we can directly calculate K(x, y)=(x\cdot y+1)^{2} instead of first computing the polynomial features x^{'} and y^{'} and taking their product.
Generally speaking, the polynomial kernel function is
K(x, y)=(x\cdot y+c)^{d}
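A quick numeric check of this identity for d=2, c=1 (poly_features is a helper introduced here implementing the mapping x' defined above; the test vectors are arbitrary):
import numpy as np
def poly_features(v):
    # explicit degree-2 feature map matching x' above (term order does not affect the inner product)
    n = len(v)
    squares = [v[i]**2 for i in range(n)]
    crosses = [np.sqrt(2) * v[i] * v[j] for i in range(1, n) for j in range(i)]
    linears = [np.sqrt(2) * v[i] for i in range(n)]
    return np.array(squares + crosses + linears + [1.0])
a = np.array([1.0, 2.0, 3.0])
b = np.array([0.5, -1.0, 2.0])
print((a @ b + 1)**2)                        # kernel value: 30.25
print(poly_features(a) @ poly_features(b))   # inner product in feature space: 30.25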
RBF kernel
The Gaussian kernel is:
K(x,y)=e^{-\gamma \left \| x-y \right \|^{2}}
It is also called the RBF (Radial Basis Function) kernel. It maps every sample into an infinite-dimensional feature space. (With every data point acting as a landmark, it maps m*n data into m*m data.)
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(-4, 5, 1)
x
array([-4, -3, -2, -1, 0, 1, 2, 3, 4])
y = np.array((x >= -2) & (x <= 2), dtype='int')
y
array([0, 0, 1, 1, 1, 1, 1, 0, 0])
plt.scatter(x[y==0], [0]*len(x[y==0]))
plt.scatter(x[y==1], [0]*len(x[y==1]))
plt.show()
def gaussian(x, l):
    gamma = 1.0
    return np.exp(-gamma * (x-l)**2)
l1, l2 = -1, 1
X_new = np.empty((len(x), 2))
for i, data in enumerate(x):
    X_new[i, 0] = gaussian(data, l1)
    X_new[i, 1] = gaussian(data, l2)
plt.scatter(X_new[y==0,0], X_new[y==0,1])
plt.scatter(X_new[y==1,0], X_new[y==1,1])
plt.show()
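Extending this, if every sample serves as a landmark, the m*n data becomes an m*m feature matrix, which is exactly the RBF kernel matrix between all pairs of samples. A minimal sketch for the 1-D example above (rbf_features is a helper name introduced here):
def rbf_features(samples, landmarks, gamma=1.0):
    # one row per sample, one column per landmark
    diff = samples.reshape(-1, 1) - landmarks.reshape(1, -1)
    return np.exp(-gamma * diff ** 2)
X_rbf = rbf_features(x, x)  # every point is a landmark: shape (9, 9)
X_rbf.shape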
Gamma in RBF kernel
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
X, y = datasets.make_moons(noise=0.15, random_state=666)
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
def RBFKernelSVC(gamma):
    return Pipeline([
        ("std_scaler", StandardScaler()),
        ("svc", SVC(kernel="rbf", gamma=gamma))
    ])
svc = RBFKernelSVC(gamma=1)
svc.fit(X, y)
plot_decision_boundary(svc, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()
svc_gamma100 = RBFKernelSVC(gamma=100)
svc_gamma100.fit(X, y)
plot_decision_boundary(svc_gamma100, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()
svc_gamma10 = RBFKernelSVC(gamma=10)
svc_gamma10.fit(X, y)
plot_decision_boundary(svc_gamma10, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()
svc_gamma05 = RBFKernelSVC(gamma=0.5)
svc_gamma05.fit(X, y)
plot_decision_boundary(svc_gamma05, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()
svc_gamma01 = RBFKernelSVC(gamma=0.1)
svc_gamma01.fit(X, y)
plot_decision_boundary(svc_gamma01, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()
Comparing these plots: gamma controls the width of the Gaussian bump around each sample, so a large gamma (100, 10) overfits, drawing tight islands around individual points, while a very small gamma (0.1) underfits toward an almost linear boundary; gamma=0.5 sits in between.
Solving regression problems with SVM
For regression the idea flips: we want the margin to contain as many points as possible, and points that fall inside an epsilon-wide tube around the fitted line contribute no loss (epsilon is the parameter used below).
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
boston = datasets.load_boston()
X = boston.data
y = boston.target
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)
from sklearn.svm import LinearSVR
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
def StandardLinearSVR(epsilon=0.1):
    return Pipeline([
        ('std_scaler', StandardScaler()),
        ('linearSVR', LinearSVR(epsilon=epsilon))
    ])
svr = StandardLinearSVR()
svr.fit(X_train, y_train)
svr.score(X_test, y_test)
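SVR was imported above but not used; a sketch of the kernelized variant following the same pipeline pattern (the hyperparameters are illustrative defaults, not tuned values):
def StandardKernelSVR(epsilon=0.1, gamma=1.0, C=1.0):
    return Pipeline([
        ('std_scaler', StandardScaler()),
        ('kernelSVR', SVR(kernel='rbf', epsilon=epsilon, gamma=gamma, C=C))
    ])
ksvr = StandardKernelSVR()
ksvr.fit(X_train, y_train)
ksvr.score(X_test, y_test)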