Support Vector Machine (SVM)
What is an SVM
Find the optimal decision boundary such that the distance from the points of each class closest to the boundary is as large as possible, i.e. maximize the margin:
margin = 2d
SVM solves the linearly separable problem. Depending on the actual data, it comes in two forms:
Hard Margin: separate the two classes perfectly. Soft Margin: allow some points to be misclassified (some data simply cannot be separated by a straight line).
Theoretical derivation of SVM
The distance from a point to the line (hyperplane) is:
d = \frac{\left | w^{T} \cdot x+b\right |}{\left \| w \right \|},\quad \left \| w \right \|=\sqrt{w_{1}^{2}+w_{2}^{2}+\cdots +w_{n}^{2}}
For the classification result, we want:
\left\{\begin{matrix} \frac{w^{T} \cdot x^{(i)}+b}{\left \| w \right \|}\geq d & \forall y^{(i)}=1 \\ \frac{w^{T} \cdot x^{(i)}+b}{\left \| w \right \|}\leq -d & \forall y^{(i)}=-1 \end{matrix}\right.
which is equivalent to
\left\{\begin{matrix} \frac{w^{T} \cdot x^{(i)}+b}{\left \| w \right \| d}\geq 1 & \forall y^{(i)}=1 \\ \frac{w^{T} \cdot x^{(i)}+b}{\left \| w \right \| d}\leq -1 & \forall y^{(i)}=-1 \end{matrix}\right.
The denominator here is a constant, so the formula above is equivalent to:
\left\{\begin{matrix} w_{d}^{T}\cdot x^{(i)}+b_{d}\geq 1 & \forall y^{(i)}=1 \\ w_{d}^{T}\cdot x^{(i)}+b_{d}\leq -1 & \forall y^{(i)}=-1 \end{matrix}\right.
That is (renaming w_d, b_d back to w, b):
\left\{\begin{matrix} w^{T}\cdot x^{(i)}+b\geq 1 & \forall y^{(i)}=1 \\ w^{T}\cdot x^{(i)}+b\leq -1 & \forall y^{(i)}=-1 \end{matrix}\right.
Multiplying y^{(i)} into both inequalities combines them into a single constraint:
y^{(i)}(w^{T}\cdot x^{(i)}+b)\geq 1
Therefore, to maximize d, we substitute the point-to-line distance formula: we need to maximize \frac{\left | w^{T} \cdot x+b\right |}{\left \| w \right \|}. Every point satisfies \left | w^{T}\cdot x^{(i)}+b \right |\geq 1, with equality for the points closest to the boundary (the support vectors), so the numerator is fixed at 1 and we only need to maximize \frac{1}{\left \| w \right \|}, which is equivalent to minimizing \left \| w \right \|.
Finally, this optimization problem can be written equivalently as (squaring the norm makes the objective differentiable, and the 1/2 simplifies the derivative):
\begin{matrix} \min\; \frac{1}{2}\left \| w \right \|^{2} \\ s.t.\; y^{(i)}(w^{T}\cdot x^{(i)}+b)\geq 1 \end{matrix}
Soft Margin SVM
Sometimes the data is not linearly separable, so we add a slack term to the optimization problem above:
\min\; \frac{1}{2}\left \| w \right \|^{2}+C\sum_{i=1}^{m}\zeta _{i}\quad s.t.\; y^{(i)}(w^{T}\cdot x^{(i)}+b)\geq 1-\zeta _{i},\; \zeta _{i}\geq 0
This slack penalty is the L1 norm version; of course, we can also use the L2 norm:
\min\; \frac{1}{2}\left \| w \right \|^{2}+C\sum_{i=1}^{m}\zeta _{i}^{2}\quad s.t.\; y^{(i)}(w^{T}\cdot x^{(i)}+b)\geq 1-\zeta _{i},\; \zeta _{i}\geq 0
The constant C controls the tolerance for errors: a large C penalizes the slack heavily (approaching Hard Margin), while a small C tolerates more violations.
Using SVM with scikit-learn
Since SVM relies on distances, we first standardize the data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data
y = iris.target
X = X[y<2,:2]  # keep only classes 0 and 1 and the first two features, so results are easy to plot
y = y[y<2]
plt.scatter(X[y==0,0], X[y==0,1], color='red')
plt.scatter(X[y==1,0], X[y==1,1], color='blue')
plt.show()
from sklearn.preprocessing import StandardScaler
standardScaler = StandardScaler()
standardScaler.fit(X)
X_standard = standardScaler.transform(X)
from sklearn.svm import LinearSVC
svc = LinearSVC(C=1e9)  # a very large C approximates Hard Margin
svc.fit(X_standard, y)
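For comparison, a much smaller C yields a softer margin that tolerates violations. A minimal sketch (C=0.01 is an arbitrary illustrative value, and svc2 is a name introduced here; it can be drawn with the same plot_decision_boundary helper defined below):
svc2 = LinearSVC(C=0.01)  # small C: soft margin, tolerates misclassified points
svc2.fit(X_standard, y)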
def plot_decision_boundary(model, axis):
    x0, x1 = np.meshgrid(
        np.linspace(axis[0], axis[1], int((axis[1]-axis[0])*100)).reshape(-1, 1),
        np.linspace(axis[2], axis[3], int((axis[3]-axis[2])*100)).reshape(-1, 1),
    )
    X_new = np.c_[x0.ravel(), x1.ravel()]
    y_predict = model.predict(X_new)
    zz = y_predict.reshape(x0.shape)
    from matplotlib.colors import ListedColormap
    custom_cmap = ListedColormap(['#EF9A9A','#FFF59D','#90CAF9'])
    plt.contourf(x0, x1, zz, cmap=custom_cmap)  # contourf does not accept linewidth, so it is dropped
plot_decision_boundary(svc, axis=[-3, 3, -3, 3])
plt.scatter(X_standard[y==0,0], X_standard[y==0,1])
plt.scatter(X_standard[y==1,0], X_standard[y==1,1])
plt.show()
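To see the margin itself, we can read the learned weights from the fitted model and draw the two lines w^T x + b = ±1 alongside the decision boundary w^T x + b = 0. A sketch for the two-feature setup above (plot_svc_decision_boundary is a helper name introduced here):
def plot_svc_decision_boundary(model, axis):
    plot_decision_boundary(model, axis)
    w = model.coef_[0]       # w = (w1, w2)
    b = model.intercept_[0]
    plot_x = np.linspace(axis[0], axis[1], 200)
    # solve w1*x1 + w2*x2 + b = +1 / -1 for x2: the two margin lines
    up_y = -w[0]/w[1] * plot_x + (1 - b)/w[1]
    down_y = -w[0]/w[1] * plot_x + (-1 - b)/w[1]
    up_index = (up_y >= axis[2]) & (up_y <= axis[3])      # clip to the plot area
    down_index = (down_y >= axis[2]) & (down_y <= axis[3])
    plt.plot(plot_x[up_index], up_y[up_index], color='black')
    plt.plot(plot_x[down_index], down_y[down_index], color='black')
plot_svc_decision_boundary(svc, axis=[-3, 3, -3, 3])
plt.scatter(X_standard[y==0,0], X_standard[y==0,1])
plt.scatter(X_standard[y==1,0], X_standard[y==1,1])
plt.show()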
Using polynomial features and kernel functions in SVM
…
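The body of this section is elided; as a minimal sketch of the two usual approaches, under the assumption that we train on a non-linearly-separable dataset such as the make_moons data used later (PolynomialSVC and PolynomialKernelSVC are helper names introduced here):
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC, SVC
# Approach 1: expand polynomial features explicitly, then fit a linear SVM
def PolynomialSVC(degree, C=1.0):
    return Pipeline([
        ("poly", PolynomialFeatures(degree=degree)),
        ("std_scaler", StandardScaler()),
        ("linearSVC", LinearSVC(C=C))
    ])
# Approach 2: keep the raw features and let the polynomial kernel do the expansion implicitly
def PolynomialKernelSVC(degree, C=1.0):
    return Pipeline([
        ("std_scaler", StandardScaler()),
        ("kernelSVC", SVC(kernel="poly", degree=degree, C=C))
    ])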
What is a kernel function
The optimization problem above can be converted into its dual form:
\begin{matrix} \max\; \sum_{i=1}^{m}a_{i}-\frac{1}{2}\sum_{i=1}^{m} \sum_{j=1}^{m}a_{i}a_{j}y_{i}y_{j}x_{i}x_{j} \\ s.t.\; 0\leqslant a_{i}\leqslant C,\; \sum_{i=1}^{m}a_{i}y_{i}=0 \end{matrix}
In the past, we used polynomial features to convert each sample x^{(i)} into x^{'(i)} and x^{(j)} into x^{'(j)}, and then computed the product x^{'(i)} x^{'(j)}. Now we want a function whose input is x^{(i)}, x^{(j)} and whose output is x^{'(i)} x^{'(j)}, computing the product directly:
K(x^{(i)},x^{(j)})=x^{'(i)}x^{'(j)}
This makes the code run faster and use less memory, since explicitly representing polynomial features is expensive. Whenever a model only needs to compute a form like x_{i}x_{j}, we can use a kernel function; the trick is not specific to SVM.
Polynomial kernel
K(x, y)=(x\cdot y+1)^{2}
\begin{aligned} K(x, y)=&(\sum_{i=1}^{n}x_{i}y_{i}+1)^{2} \\ =&\sum_{i=1}^{n}(x_{i}^{2})(y_{i}^{2})+\sum_{i=2}^{n}\sum_{j=1}^{i-1}(\sqrt{2}x_{i}x_{j})(\sqrt{2}y_{i}y_{j})+\sum_{i=1}^{n}(\sqrt{2}x_{i})(\sqrt{2}y_{i})+1 \end{aligned}
If we define
x^{'} = (x_{n}^{2},\cdots ,x_{1}^{2},\sqrt{2}x_{n}x_{n-1},\cdots ,\sqrt{2}x_{n},\cdots ,\sqrt{2}x_{1},1)
y^{'} = (y_{n}^{2},\cdots ,y_{1}^{2},\sqrt{2}y_{n}y_{n-1},\cdots ,\sqrt{2}y_{n},\cdots ,\sqrt{2}y_{1},1)
then we have
K(x,y)=x^{'}y^{'}
so we can directly calculate K(x, y)=(x\cdot y+1)^{2} instead of first computing the polynomial features x^{'} and y^{'} and taking their product.
Generally speaking, the polynomial kernel function is
K(x, y)=(x\cdot y+c)^{d}
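A quick numeric check of this identity for d=2, c=1 (poly_features is a helper introduced here implementing the mapping x' defined above; the test vectors are arbitrary):
import numpy as np
def poly_features(v):
    # explicit degree-2 feature map matching x' above (term order does not affect the inner product)
    n = len(v)
    squares = [v[i]**2 for i in range(n)]
    crosses = [np.sqrt(2) * v[i] * v[j] for i in range(1, n) for j in range(i)]
    linears = [np.sqrt(2) * v[i] for i in range(n)]
    return np.array(squares + crosses + linears + [1.0])
a = np.array([1.0, 2.0, 3.0])
b = np.array([0.5, -1.0, 2.0])
print((a @ b + 1)**2)                        # kernel value: 30.25
print(poly_features(a) @ poly_features(b))   # inner product in feature space: 30.25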
RBF kernel
The Gaussian kernel is:
K(x,y)=e^{-\gamma \left \| x-y \right \|^{2}}
It is also called the RBF (Radial Basis Function) kernel. It maps every sample into an infinite-dimensional feature space. (With every data point acting as a landmark, it maps m*n data into m*m data.)
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(-4, 5, 1)
x
array([-4, -3, -2, -1, 0, 1, 2, 3, 4])
y = np.array((x >= -2) & (x <= 2), dtype='int')
y
array([0, 0, 1, 1, 1, 1, 1, 0, 0])
plt.scatter(x[y==0], [0]*len(x[y==0]))
plt.scatter(x[y==1], [0]*len(x[y==1]))
plt.show()
def gaussian(x, l):
    gamma = 1.0
    return np.exp(-gamma * (x-l)**2)
l1, l2 = -1, 1
X_new = np.empty((len(x), 2))
for i, data in enumerate(x):
    X_new[i, 0] = gaussian(data, l1)
    X_new[i, 1] = gaussian(data, l2)
plt.scatter(X_new[y==0,0], X_new[y==0,1])
plt.scatter(X_new[y==1,0], X_new[y==1,1])
plt.show()
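Extending this, if every sample serves as a landmark, the m*n data becomes an m*m feature matrix, which is exactly the RBF kernel matrix between all pairs of samples. A minimal sketch for the 1-D example above (rbf_features is a helper name introduced here):
def rbf_features(samples, landmarks, gamma=1.0):
    # one row per sample, one column per landmark
    diff = samples.reshape(-1, 1) - landmarks.reshape(1, -1)
    return np.exp(-gamma * diff ** 2)
X_rbf = rbf_features(x, x)  # every point is a landmark: shape (9, 9)
X_rbf.shape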
Gamma in RBF kernel
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
X, y = datasets.make_moons(noise=0.15, random_state=666)
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
def RBFKernelSVC(gamma):
    return Pipeline([
        ("std_scaler", StandardScaler()),
        ("svc", SVC(kernel="rbf", gamma=gamma))
    ])
svc = RBFKernelSVC(gamma=1)
svc.fit(X, y)
plot_decision_boundary(svc, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()
svc_gamma100 = RBFKernelSVC(gamma=100)
svc_gamma100.fit(X, y)
plot_decision_boundary(svc_gamma100, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()
svc_gamma10 = RBFKernelSVC(gamma=10)
svc_gamma10.fit(X, y)
plot_decision_boundary(svc_gamma10, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()
svc_gamma05 = RBFKernelSVC(gamma=0.5)
svc_gamma05.fit(X, y)
plot_decision_boundary(svc_gamma05, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()
svc_gamma01 = RBFKernelSVC(gamma=0.1)
svc_gamma01.fit(X, y)
plot_decision_boundary(svc_gamma01, axis=[-1.5, 2.5, -1.0, 1.5])
plt.scatter(X[y==0,0], X[y==0,1])
plt.scatter(X[y==1,0], X[y==1,1])
plt.show()
Comparing these plots: gamma controls the width of the Gaussian bump around each sample, so a large gamma (100, 10) overfits, drawing tight islands around individual points, while a very small gamma (0.1) underfits toward an almost linear boundary; gamma=0.5 sits in between.
Solving regression problems with SVM
For regression the idea flips: we want the margin to contain as many points as possible, and points that fall inside an epsilon-wide tube around the fitted line contribute no loss (epsilon is the parameter used below).
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
boston = datasets.load_boston()
X = boston.data
y = boston.target
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)
from sklearn.svm import LinearSVR
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
def StandardLinearSVR(epsilon=0.1):
    return Pipeline([
        ('std_scaler', StandardScaler()),
        ('linearSVR', LinearSVR(epsilon=epsilon))
    ])
svr = StandardLinearSVR()
svr.fit(X_train, y_train)
svr.score(X_test, y_test)
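SVR was imported above but not used; a sketch of the kernelized variant following the same pipeline pattern (the hyperparameters are illustrative defaults, not tuned values):
def StandardKernelSVR(epsilon=0.1, gamma=1.0, C=1.0):
    return Pipeline([
        ('std_scaler', StandardScaler()),
        ('kernelSVR', SVR(kernel='rbf', epsilon=epsilon, gamma=gamma, C=C))
    ])
ksvr = StandardKernelSVR()
ksvr.fit(X_train, y_train)
ksvr.score(X_test, y_test)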