Support Vector Machine

Support Vector Machine (SVM)

What is an SVM

Find the optimal decision boundary such that the distance $d$ from the closest points of the two classes to the boundary is maximized (max margin), where
$$margin = 2d$$
This solves the linearly separable problem. Depending on the actual data, two forms can be used:

Hard Margin: separate the two classes completely. Soft Margin: allow some points to be misclassified (some datasets simply cannot be separated perfectly by a straight line).


Theoretical derivation of SVM

The distance from a point to the separating hyperplane is:
$$d = \frac{\left| w^{T} \cdot x + b \right|}{\left\| w \right\|}, \quad \left\| w \right\| = \sqrt{w_{1}^{2}+w_{2}^{2}+\cdots+w_{n}^{2}}$$
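
For intuition, here is a minimal numpy sketch (the values of w, b, and x are illustrative) that evaluates this distance for a single point:

import numpy as np

w = np.array([3.0, 4.0])   # illustrative hyperplane w^T·x + b = 0
b = -5.0
x = np.array([2.0, 1.0])

d = abs(w @ x + b) / np.linalg.norm(w)
print(d)   # |3*2 + 4*1 - 5| / 5 = 1.0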

For the classification result, we want:
$$\begin{cases} \frac{w^{T} \cdot x^{(i)}+b}{\left\| w \right\|} \geq d & \forall\; y^{(i)}=1 \\ \frac{w^{T} \cdot x^{(i)}+b}{\left\| w \right\|} \leq -d & \forall\; y^{(i)}=-1 \end{cases}$$
which is equivalent to
$$\begin{cases} \frac{w^{T} \cdot x^{(i)}+b}{\left\| w \right\| d} \geq 1 & \forall\; y^{(i)}=1 \\ \frac{w^{T} \cdot x^{(i)}+b}{\left\| w \right\| d} \leq -1 & \forall\; y^{(i)}=-1 \end{cases}$$
The denominator $\left\| w \right\| d$ is a constant, so the above is equivalent to:
$$\begin{cases} w_{d}^{T}\cdot x^{(i)}+b_{d} \geq 1 & \forall\; y^{(i)}=1 \\ w_{d}^{T}\cdot x^{(i)}+b_{d} \leq -1 & \forall\; y^{(i)}=-1 \end{cases}$$
i.e., renaming $w_{d}$ and $b_{d}$ back to $w$ and $b$:
$$\begin{cases} w^{T}\cdot x^{(i)}+b \geq 1 & \forall\; y^{(i)}=1 \\ w^{T}\cdot x^{(i)}+b \leq -1 & \forall\; y^{(i)}=-1 \end{cases}$$

Multiplying $y^{(i)}$ into both cases, we have:
$$y^{(i)}(w^{T}\cdot x^{(i)}+b) \geq 1$$
Therefore, to maximize $d$, we substitute the point-to-line distance formula: we need to maximize $\frac{\left| w^{T} \cdot x+b\right|}{\left\| w \right\|}$. The points closest to the boundary (the support vectors) satisfy $\left| w^{T}\cdot x^{(i)}+b \right| = 1$, so it suffices to maximize $\frac{1}{\left\| w \right\|}$, which is equivalent to minimizing $\left\| w \right\|$.

Finally, this optimization problem is equivalent to:
$$\min\; \frac{1}{2}\left\| w \right\|^{2} \quad s.t.\;\; y^{(i)}(w^{T}\cdot x^{(i)}+b) \geq 1$$
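
As a sanity check, here is a small sketch (assuming scipy is available; the toy dataset is illustrative) that solves this constrained problem directly with scipy.optimize.minimize:

import numpy as np
from scipy.optimize import minimize

# a tiny linearly separable dataset: two points per class
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1, 1, -1, -1])

# decision variables packed as v = [w1, w2, b]
def objective(v):
    w = v[:2]
    return 0.5 * np.dot(w, w)          # (1/2)||w||^2

# one inequality constraint per sample: y_i (w·x_i + b) - 1 >= 0
constraints = [
    {"type": "ineq", "fun": lambda v, xi=xi, yi=yi: yi * (np.dot(v[:2], xi) + v[2]) - 1}
    for xi, yi in zip(X, y)
]

res = minimize(objective, x0=np.zeros(3), constraints=constraints)
w, b = res.x[:2], res.x[2]
print(w, b)   # ≈ [0.25 0.25] 0.0, i.e. the max-margin hyperplane x1 + x2 = 0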

Soft Margin SVM

Sometimes the data is not linearly separable, so we add a slack term to the optimization problem above:
$$\min\; \frac{1}{2}\left\| w \right\|^{2} + C\sum_{i=1}^{m}\zeta_{i} \quad s.t.\;\; y^{(i)}(w^{T}\cdot x^{(i)}+b) \geq 1-\zeta_{i},\;\; \zeta_{i} \geq 0$$
This penalty is called the L1 norm. We can also use the L2 norm:
$$\min\; \frac{1}{2}\left\| w \right\|^{2} + C\sum_{i=1}^{m}\zeta_{i}^{2} \quad s.t.\;\; y^{(i)}(w^{T}\cdot x^{(i)}+b) \geq 1-\zeta_{i},\;\; \zeta_{i} \geq 0$$
The constant $C$ controls the tolerance for misclassification: the larger $C$ is, the less tolerant the model is of errors.
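
Note that at the optimum the slack variables are forced to $\zeta_{i} = \max\left(0,\; 1 - y^{(i)}(w^{T}\cdot x^{(i)}+b)\right)$, so the L1 soft-margin problem can equivalently be written as an unconstrained hinge-loss objective:
$$\min\; \frac{1}{2}\left\| w \right\|^{2} + C\sum_{i=1}^{m}\max\left(0,\; 1 - y^{(i)}(w^{T}\cdot x^{(i)}+b)\right)$$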

Using SVM with scikit-learn

Because SVM relies on distances, we first standardize the data before training.

import numpy as np
import matplotlib.pyplot as plt

from sklearn import datasets

iris = datasets.load_iris()

X = iris.data
y = iris.target

# keep only classes 0 and 1, and only the first two features, for 2-D visualization
X = X[y<2,:2]
y = y[y<2]

plt.scatter(X[y==0,0], X[y==0,1], color='red')
plt.scatter(X[y==1,0], X[y==1,1], color='blue')
plt.show()

from sklearn.preprocessing import StandardScaler

standardScaler = StandardScaler()
standardScaler.fit(X)
X_standard = standardScaler.transform(X)

from sklearn.svm import LinearSVC

svc = LinearSVC(C=1e9)   # a very large C approximates the hard-margin SVM
svc.fit(X_standard, y)

def plot_decision_boundary(model, axis):
    x0, x1 = np.meshgrid(
        np.linspace(axis[0], axis[1], int((axis[1]-axis[0])*100)).reshape(-1, 1),
        np.linspace(axis[2], axis[3], int((axis[3]-axis[2])*100)).reshape(-1, 1),
    )
    X_new = np.c_[x0.ravel(), x1.ravel()]

    y_predict = model.predict(X_new)
    zz = y_predict.reshape(x0.shape)

    from matplotlib.colors import ListedColormap
    custom_cmap = ListedColormap(['#EF9A9A','#FFF59D','#90CAF9'])
    
    plt.contourf(x0, x1, zz, cmap=custom_cmap)

plot_decision_boundary(svc, axis=[-3, 3, -3, 3])
plt.scatter(X_standard[y==0,0], X_standard[y==0,1])
plt.scatter(X_standard[y==1,0], X_standard[y==1,1])
plt.show()
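
For comparison, a short sketch (the value C=0.01 is illustrative) of a soft-margin classifier; with a small C the model tolerates misclassified points and the margin widens:

svc2 = LinearSVC(C=0.01)   # small C: soft margin, high tolerance for errors
svc2.fit(X_standard, y)

plot_decision_boundary(svc2, axis=[-3, 3, -3, 3])
plt.scatter(X_standard[y==0,0], X_standard[y==0,1])
plt.scatter(X_standard[y==1,0], X_standard[y==1,1])
plt.show()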

Using polynomial features and kernel functions in SVM

What is a kernel function

Actually, we can convert the optimization problem into its dual form:
$$\max\; \sum_{i=1}^{m}a_{i}-\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}a_{i}a_{j}y_{i}y_{j}x_{i}x_{j} \quad s.t.\;\; 0 \leq a_{i} \leq C,\;\; \sum_{i=1}^{m}a_{i}y_{i}=0$$
Previously, we used polynomial features to map each example $x^{(i)}$ to $x'^{(i)}$ and $x^{(j)}$ to $x'^{(j)}$, and then computed the product $x'^{(i)} x'^{(j)}$. Now we want a function that takes $x^{(i)}$ and $x^{(j)}$ as input and outputs the product $x'^{(i)} x'^{(j)}$ directly:
$$K(x^{(i)},x^{(j)}) = x'^{(i)} x'^{(j)}$$
This makes the code run faster and use less memory, since explicitly representing polynomial features is expensive. Whenever a model's formulation involves terms of the form $x_{i}x_{j}$, we can apply the kernel function; it is not specific to SVM.

Polynomial kernel

$$K(x, y) = (x\cdot y+1)^{2}$$
$$\begin{aligned} K(x, y) &= \left(\sum_{i=1}^{n}x_{i}y_{i}+1\right)^{2} \\ &= \sum_{i=1}^{n}(x_{i}^{2})(y_{i}^{2}) + \sum_{i=2}^{n}\sum_{j=1}^{i-1}(\sqrt{2}x_{i}x_{j})(\sqrt{2}y_{i}y_{j}) + \sum_{i=1}^{n}(\sqrt{2}x_{i})(\sqrt{2}y_{i}) + 1 \end{aligned}$$
If we define
$$x' = (x_{n}^{2},\cdots,x_{1}^{2},\sqrt{2}x_{n}x_{n-1},\cdots,\sqrt{2}x_{n},\cdots,\sqrt{2}x_{1},1)$$
$$y' = (y_{n}^{2},\cdots,y_{1}^{2},\sqrt{2}y_{n}y_{n-1},\cdots,\sqrt{2}y_{n},\cdots,\sqrt{2}y_{1},1)$$
then we have
$$K(x,y) = x'y'$$
so we can evaluate $K(x, y) = (x\cdot y+1)^{2}$ directly instead of computing the polynomial features and their product $x'y'$.
In general, the polynomial kernel is
$$K(x, y) = (x\cdot y + c)^{d}$$
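
A quick numeric check of the identity for n = 2 (a sketch; phi below is the explicit feature map from the derivation above):

import numpy as np

def phi(v):
    # explicit degree-2 feature map for n = 2, in the order defined above
    x1, x2 = v
    return np.array([x2**2, x1**2, np.sqrt(2)*x2*x1, np.sqrt(2)*x2, np.sqrt(2)*x1, 1.0])

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

print((x @ y + 1)**2)    # kernel value: 144.0
print(phi(x) @ phi(y))   # explicit feature product: 144.0

In scikit-learn this kernel corresponds (up to a gamma scaling factor) to SVC(kernel="poly", degree=d, coef0=c).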

RBF kernel

The Gaussian kernel is:
$$K(x,y) = e^{-\gamma \left\| x-y \right\|^{2}}$$
It is also called the RBF (Radial Basis Function) kernel. It maps each sample to an infinite-dimensional feature space. (Treating every data point as a landmark, it turns an $m \times n$ dataset into an $m \times m$ one.)
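
To make the "m×n data becomes m×m data" point concrete, here is a pure-numpy sketch (shapes illustrative) computing the full RBF kernel matrix when every sample is a landmark:

import numpy as np

def rbf_kernel_matrix(X, gamma=1.0):
    # pairwise squared distances via broadcasting, then the Gaussian kernel
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

X_demo = np.random.RandomState(0).randn(5, 3)   # m=5 samples, n=3 features
K = rbf_kernel_matrix(X_demo)
print(K.shape)   # (5, 5): the m×n data is represented as an m×m kernel matrix

The 1-D example below visualizes the same idea with just two fixed landmarks l1 and l2: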

import numpy as np
import matplotlib.pyplot as plt
x = np.arange(-4, 5, 1)
x
array([-4, -3, -2, -1,  0,  1,  2,  3,  4])
y = np.array((x >= -2) & (x <= 2), dtype='int')
y
array([0, 0, 1, 1, 1, 1, 1, 0, 0])
plt.scatter(x[y==0], [0]*len(x[y==0]))
plt.scatter(x[y==1], [0]*len(x[y==1]))
plt.show()


def gaussian(x, l):
    # map sample x relative to landmark l with the Gaussian kernel
    gamma = 1.0
    return np.exp(-gamma * (x-l)**2)
l1, l2 = -1, 1   # two landmarks

# map each 1-D sample into 2-D using the two landmarks
X_new = np.empty((len(x), 2))
for i, data in enumerate(x):
    X_new[i, 0] = gaussian(data, l1)
    X_new[i, 1] = gaussian(data, l2)
plt.scatter(X_new[y==0,0], X_new[y==0,1])
plt.scatter(X_new[y==1,0], X_new[y==1,1])
plt.show()


Gamma in RBF kernel


import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

X, y = datasets.make_moons(noise=0.15, random_state=666)

plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()


from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

def RBFKernelSVC(gamma):
    return Pipeline([
        ("std_scaler", StandardScaler()),
        ("svc", SVC(kernel="rbf", gamma=gamma))
    ])
svc = RBFKernelSVC(gamma=1)
svc.fit(X, y)

Reusing the plot_decision_boundary helper defined earlier:

plot_decision_boundary(svc, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()

svc_gamma100 = RBFKernelSVC(gamma=100)
svc_gamma100.fit(X, y)
plot_decision_boundary(svc_gamma100, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()

svc_gamma10 = RBFKernelSVC(gamma=10)
svc_gamma10.fit(X, y)
plot_decision_boundary(svc_gamma10, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()

svc_gamma05 = RBFKernelSVC(gamma=0.5)
svc_gamma05.fit(X, y)
plot_decision_boundary(svc_gamma05, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()

svc_gamma01 = RBFKernelSVC(gamma=0.1)
svc_gamma01.fit(X, y)
plot_decision_boundary(svc_gamma01, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()

Comparing these decision boundaries: gamma controls the width of the Gaussian bump placed on each sample. A very large gamma (e.g. 100) draws a tiny island around every training point and overfits, while a very small gamma (e.g. 0.1) gives an almost linear boundary and underfits. In this sense gamma regulates model complexity and should be tuned, for example by cross-validation.

Solving regression problems with SVM

For regression, we specify a margin around the fitted line and want it to contain as many points as possible; the width of this margin is controlled by the hyperparameter epsilon.

import numpy as np
import matplotlib.pyplot as plt

from sklearn import datasets

boston = datasets.load_boston()
X = boston.data
y = boston.target

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)


from sklearn.svm import LinearSVR
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

def StandardLinearSVR(epsilon=0.1):
    return Pipeline([
        ('std_scaler', StandardScaler()),
        ('linearSVR', LinearSVR(epsilon=epsilon))
    ])

svr = StandardLinearSVR()
svr.fit(X_train, y_train)

svr.score(X_test, y_test)
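
Similarly, here is a sketch (hyperparameter values illustrative) of a kernelized SVR using the SVR class imported above:

def StandardRBFSVR(epsilon=0.1, gamma=1.0, C=1.0):
    return Pipeline([
        ('std_scaler', StandardScaler()),
        ('rbfSVR', SVR(kernel='rbf', epsilon=epsilon, gamma=gamma, C=C))
    ])

rbf_svr = StandardRBFSVR(gamma=0.1)
rbf_svr.fit(X_train, y_train)
rbf_svr.score(X_test, y_test)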