I. Linearly Separable Support Vector Machine
1. Target problem: binary classification of linearly separable data.
2. Idea: find the separating hyperplane that maximizes the geometric margin. Unlike PLA, this hyperplane is unique (hard-margin maximization).
$$
\max\limits_{w,b} \quad \gamma \\
s.t. \quad y_i\left(\frac{w}{\Vert w\Vert} \cdot x_i + \frac{b}{\Vert w\Vert} \right) \ge \gamma,\quad i = 1,2,...,N \tag{1}
$$
Here $\gamma$ denotes the geometric margin. Since the functional margin $\hat{\gamma}$ and the geometric margin $\gamma$ are related by $\gamma=\frac{\hat{\gamma}}{\Vert w\Vert}$, the problem can be rewritten as:
$$
\max\limits_{w,b} \quad \frac{\hat{\gamma}}{\Vert w\Vert} \\
s.t. \quad y_i\left(w \cdot x_i + b \right) \ge \hat{\gamma},\quad i = 1,2,...,N \tag{2}
$$
Let $\hat{w}=\frac{w}{\hat{\gamma}}$ and $\hat{b}=\frac{b}{\hat{\gamma}}$, which yields the problem below. Since $sign(w\cdot x +b)=sign(\hat{\gamma} \hat{w}\cdot x+\hat{\gamma} \hat{b})=sign(\hat{w}\cdot x+\hat{b})$ (the functional margin $\hat{\gamma}$ is positive), this rescaling does not affect the prediction.
$$
\max\limits_{\hat{w},\hat{b}} \quad \frac{1}{\Vert \hat{w}\Vert} \\
s.t. \quad y_i\left(\hat{w} \cdot x_i + \hat{b} \right) \ge 1,\quad i = 1,2,...,N \tag{3}
$$
Since maximizing $\frac{1}{\Vert \hat{w}\Vert}$ is equivalent to minimizing $\frac{1}{2}\Vert \hat{w}\Vert ^2$, problem (3) is equivalent to the following, which is called the primal problem of the linearly separable SVM:
$$
\min\limits_{\hat{w},\hat{b}} \quad \frac{1}{2}\Vert \hat{w}\Vert ^2 \\
s.t. \quad y_i\left(\hat{w} \cdot x_i + \hat{b} \right) \ge 1,\quad i = 1,2,...,N \tag{4}
$$
Solving (4) yields $\hat{w^*},\hat{b^*}$, which give the separating hyperplane and the decision function:
$$
\hat{w^*}\cdot x+\hat{b^*}=0 \tag{5}
$$
$$
f(x) = sign(\hat{w^*}\cdot x+\hat{b^*}) \tag{6}
$$
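Since (4) is just a convex quadratic program, it can be handed directly to a generic QP solver. Below is a minimal sketch using cvxopt on a small hypothetical toy set (three points chosen so the answer is easy to verify by hand); stacking the variables as $z=(w,b)$ is an implementation choice, not part of the derivation above.

```python
import numpy as np
from cvxopt import matrix, solvers

# Hypothetical toy set: two positive points and one negative point
x = np.array([[3., 3.], [4., 3.], [1., 1.]])
y = np.array([1., 1., -1.])
n, d = x.shape

# Stack the variables as z = (w_1, ..., w_d, b); objective is (1/2) z^T P z
P = matrix(np.diag([1.] * d + [0.]))   # penalizes only w, not b
q = matrix(np.zeros(d + 1))
# y_i (w . x_i + b) >= 1  rewritten as  -y_i [x_i, 1] z <= -1
G = matrix(-y[:, None] * np.hstack([x, np.ones((n, 1))]))
h = matrix(-np.ones(n))

solvers.options['show_progress'] = False
sol = solvers.qp(P, q, G, h)
w = np.array(sol['x'][:d]).ravel()
b = sol['x'][d]
print(w, b)   # expect w close to (0.5, 0.5) and b close to -2
```

For this toy set the solver should recover approximately $w^*=(0.5,0.5)$ and $b^*=-2$, i.e., the separating line $\frac{1}{2}x^{(1)}+\frac{1}{2}x^{(2)}-2=0$.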
3. Support vectors: the training samples closest to the separating hyperplane, i.e., those for which the constraint in (3) holds with equality. Samples with $y_i=1$ lie on $H_1$, and samples with $y_i=-1$ lie on $H_2$:
$$
y_i (\hat{w}^* \cdot x_i + \hat{b}^* ) =1 \\
H_1: \hat{w}^* \cdot x_i + \hat{b}^* =1 \\
H_2: \hat{w}^* \cdot x_i + \hat{b}^* =-1
$$
4. Margin boundary: the distance between $H_1$ and $H_2$, which equals $\frac{2}{\Vert \hat{w}^*\Vert}$, as the short check below verifies on the toy set.
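Continuing with the same hypothetical toy set, and assuming the $w^*$, $b^*$ found by the sketch above, this identifies the support vectors and the margin width:

```python
import numpy as np

# Assuming w* = (0.5, 0.5), b* = -2 from the primal sketch above (toy set)
x = np.array([[3., 3.], [4., 3.], [1., 1.]])
y = np.array([1., 1., -1.])
w, b = np.array([0.5, 0.5]), -2.0

margins = y * (x @ w + b)            # equals exactly 1 on H1 and H2
print(x[np.isclose(margins, 1.0)])   # support vectors: (3,3) and (1,1)
print(2.0 / np.linalg.norm(w))       # margin width 2/||w*|| = 2*sqrt(2)
```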
5. The dual algorithm: for the linearly separable SVM, the primal problem (4) can be solved directly; the dual algorithm is presented here to lay the groundwork for what follows. (For convenience, $\hat{w}$ and $\hat{b}$ above are written simply as $w$ and $b$ from now on.)
① Construct the Lagrangian of the minimization problem (4):
$$
L(w,b,\alpha)=\frac{1}{2}\Vert w\Vert ^2-\sum_{i=1}^{N}\alpha_iy_i\left(w \cdot x_i + b \right)+\sum_{i=1}^{N}\alpha_i \tag{7}
$$
The minimization problem is then equivalent to:
$$
\min\limits_{w,b}\max\limits_{\alpha}L(w,b,\alpha) \tag{8}
$$
By Lagrangian duality, the dual of this problem is:
$$
\max\limits_{\alpha}\min\limits_{w,b}L(w,b,\alpha) \tag{9}
$$
When the KKT conditions are satisfied, (8) and (9) are equivalent. From the stationarity conditions among the KKT conditions:
$$
\nabla_wL(w,b,\alpha)=w-\sum_{i=1}^{N}\alpha_iy_ix_i=0 \\
\nabla_bL(w,b,\alpha)=-\sum_{i=1}^{N}\alpha_iy_i=0
$$
we obtain:
$$
w=\sum_{i=1}^{N}\alpha_iy_ix_i \\
\sum_{i=1}^{N}\alpha_iy_i=0 \tag{10}
$$
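For completeness (this substitution step is often skipped), plugging (10) back into the Lagrangian (7) eliminates $w$ and $b$; the $b$ term drops out because $\sum_{i=1}^{N}\alpha_iy_i=0$:

$$
\min\limits_{w,b}L(w,b,\alpha)=\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_jy_iy_j(x_i\cdot x_j)-\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_jy_iy_j(x_i\cdot x_j)+\sum_{i=1}^{N}\alpha_i=-\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_jy_iy_j(x_i\cdot x_j)+\sum_{i=1}^{N}\alpha_i
$$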
Considering the dual problem (9) and converting the maximization of this expression into a minimization yields the dual optimization problem, which is equivalent to the primal problem (4):
$$
\min\limits_{\alpha} \quad \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_jy_iy_j(x_i\cdot x_j)-\sum_{i=1}^{N}\alpha_i \\
s.t. \quad \sum_{i=1}^{N}\alpha_iy_i=0 \\
\alpha_i \ge 0,\quad i=1,2,...,N \tag{11}
$$
Solving this problem gives the optimal $\alpha^*$. Substituting it into the formulas below (where $j$ is any index with $\alpha_j^*>0$) gives the optimal $w^*,b^*$, from which the decision function and separating hyperplane follow.
$$
w^{*}=\sum_{i=1}^{N}\alpha_i^*y_ix_i \\
b^*=y_j-\sum_{i=1}^{N}\alpha_i^*y_i(x_i\cdot x_j) \tag{12}
$$
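To make the dual route concrete, here is a minimal cvxopt sketch (same hypothetical toy set as in the primal sketch above) that solves (11) and recovers $w^*$ and $b^*$ via (12):

```python
import numpy as np
from cvxopt import matrix, solvers

# Same hypothetical toy set as in the primal sketch above
x = np.array([[3., 3.], [4., 3.], [1., 1.]])
y = np.array([1., 1., -1.])
n = len(x)

# Dual (11): min (1/2) a^T P a - 1^T a  with  P[i, j] = y_i y_j (x_i . x_j)
P = matrix(np.outer(y, y) * (x @ x.T))
q = matrix(-np.ones(n))
G = matrix(-np.eye(n))    # encodes alpha_i >= 0
h = matrix(np.zeros(n))
A = matrix(y[None, :])    # encodes sum_i alpha_i y_i = 0
b = matrix(np.zeros(1))
solvers.options['show_progress'] = False
alpha = np.array(solvers.qp(P, q, G, h, A, b)['x']).ravel()

# Recover w*, b* via (12); j is any index with alpha_j > 0
w = (alpha * y) @ x
j = int(np.argmax(alpha))
b_star = y[j] - (alpha * y) @ (x @ x[j])
print(alpha, w, b_star)   # expect w close to (0.5, 0.5) and b* close to -2
```

On this toy set the dual solution should reproduce the same hyperplane as the primal sketch, with only the two support vectors carrying nonzero $\alpha_i^*$.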
II. Linear Support Vector Machine
1. Target problem: binary classification of data that is not strictly linearly separable.
2. Idea: allow some data points to fall inside the two margin boundaries (soft-margin maximization). For each sample $(x_i,y_i)$, introduce a slack variable $\xi_i \ge 0$; the objective pays a cost $\xi_i$ for it, where $C>0$ is a penalty parameter. This yields the following convex quadratic program (the primal problem). The solution for $w$ is unique, but the solution for $b$ may not be.
$$
\min\limits_{\hat{w},\hat{b}} \quad \frac{1}{2}\Vert \hat{w}\Vert ^2+C\sum_{i=1}^{N}\xi_i \\
s.t. \quad y_i\left(\hat{w} \cdot x_i + \hat{b} \right) \ge 1-\xi_i,\quad i = 1,2,...,N \\
\xi_i \ge 0,\quad i = 1,2,...,N \tag{13}
$$
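As a side note (a standard reformulation, not part of the derivation here): at the optimum $\xi_i=\max\left(0,\,1-y_i(\hat{w}\cdot x_i+\hat{b})\right)$, so after dividing the objective by $C$, problem (13) is equivalent to unconstrained hinge-loss minimization:

$$
\min\limits_{\hat{w},\hat{b}} \quad \sum_{i=1}^{N}\max\left(0,\,1-y_i(\hat{w}\cdot x_i+\hat{b})\right)+\frac{1}{2C}\Vert \hat{w}\Vert ^2
$$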
The dual optimization problem of (13) is:
$$
\min\limits_{\alpha} \quad \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_jy_iy_j(x_i\cdot x_j)-\sum_{i=1}^{N}\alpha_i \\
s.t. \quad \sum_{i=1}^{N}\alpha_iy_i=0 \\
0 \le \alpha_i \le C,\quad i=1,2,...,N \tag{14}
$$
Choose a suitable penalty parameter $C>0$ (typically by leave-one-out or cross-validation: sum the validation errors for each candidate and pick the $C$ with the smallest total error) and solve the problem to obtain the optimal $\alpha^*$. Substituting into (12) gives the optimal $w^*$; choosing a component of $\alpha^*$ satisfying $0<\alpha^*_j<C$ gives $b^*$, from which the decision function and separating hyperplane follow.
3. Python code: select a suitable $C$ by cross-validation, then solve the dual optimization problem for the optimal solution.
```python
# -*- coding: utf-8 -*-
"""
Created on Sun Oct 31 15:24:29 2021
@author: 22905
"""
import numpy as np
import random
import time
from cvxopt import matrix, solvers
import matplotlib.pyplot as plt

data = np.loadtxt(open('data.csv', 'r', encoding='utf8'), delimiter=",", skiprows=0)
data = data.tolist()  # nested list of rows
# random.shuffle(data)
x_data = np.array(data)[..., [0, 1]].tolist()  # for more features, list more columns, e.g. [0, 1, 2]
y_data = np.array(data)[..., [2]].tolist()

# candidate penalty parameters
C_SET = [2**i for i in range(-15, 15, 2)]

def sign(v):
    return 1 if v >= 0 else -1

def find_alpha(x_train, y_train, C):
    """Solve the dual problem (14) with cvxopt; return alpha as an (n, 1) array."""
    n = len(x_train)
    X = np.array(x_train).dot(np.transpose(np.array(x_train)))  # Gram matrix x_i . x_j
    Y = np.array(y_train).dot(np.transpose(np.array(y_train)))  # y_i * y_j
    P = matrix(X * Y)
    q = matrix(np.array([[-1.] for i in range(n)]))  # cvxopt only accepts doubles
    A = matrix(np.transpose(np.array(y_train)))      # equality constraint: sum_i alpha_i y_i = 0
    b = matrix(np.array([[0.]]))
    G1 = np.diag([-1. for i in range(n)])            # -alpha_i <= 0
    G2 = np.diag([1. for i in range(n)])             # alpha_i <= C
    G = matrix(np.vstack((G1, G2)))
    h1 = [[0.] for i in range(n)]
    h2 = [[float(C)] for i in range(n)]
    h = matrix(np.vstack((h1, h2)))
    solvers.options['show_progress'] = False
    sol = solvers.qp(P, q, G, h, A, b)               # A, b carry the equality constraint
    return np.array(sol['x'])                        # (n, 1) array

def find_weight_bias(x_train, y_train, alpha, C):
    """Recover w* and b* from alpha by (12), using a free support vector for b*."""
    weight = []
    for i in range(np.array(x_train).shape[1]):
        temp = alpha * np.array(y_train) * np.array(x_train)[..., [i]]  # [..., [i]] selects column i
        weight.append(temp.sum())
    j = 0                                            # fall back to index 0 if no free SV is found
    for i in range(alpha.shape[0]):                  # pick j with 0 < alpha_j < C
        if 1e-6 < alpha[i][0] < C - 1e-6:
            j = i
            break
    s = 0.
    for i in range(len(x_train)):
        s += y_train[i][0] * alpha[i][0] * float(np.dot(x_train[i], x_train[j]))
    bias = y_train[j][0] - s
    return weight, bias

def find_false_number(weight, bias, x_val, y_val):
    """Count misclassified points in the validation set."""
    false_number = 0
    for i in range(len(x_val)):
        y_predict = sign(np.array(weight).dot(np.array(x_val[i])) + bias)
        if y_predict != y_val[i][0]:
            false_number += 1
    return false_number

if __name__ == '__main__':
    time_start = time.time()
    false_num = []              # misclassification count for each candidate C
    interval = len(data) // 5   # fold size
    # 5-fold cross-validation
    for C in C_SET:
        num = 0
        for i in range(5):
            start, end = i * interval, (i + 1) * interval
            x_val, y_val = x_data[start:end], y_data[start:end]
            x_train = x_data[0:start] + x_data[end:len(x_data)]
            y_train = y_data[0:start] + y_data[end:len(y_data)]
            alpha = find_alpha(x_train, y_train, C)
            weight, bias = find_weight_bias(x_train, y_train, alpha, C)
            num += find_false_number(weight, bias, x_val, y_val)
        false_num.append(num)
    fine_C = C_SET[false_num.index(min(false_num))]
    print("the fine C is:", fine_C)
    fine_alpha = find_alpha(x_data, y_data, fine_C)
    fine_weight, fine_bias = find_weight_bias(x_data, y_data, fine_alpha, fine_C)
    fine_false_number = find_false_number(fine_weight, fine_bias, x_data, y_data)
    print("right rate is:", (len(x_data) - fine_false_number) / len(x_data))
    print("the fine weight and bias are:", fine_weight, fine_bias)
    time_end = time.time()
    print('running time is:', time_end - time_start)
    # plot the two classes and the separating line in 2D
    for i in range(len(x_data)):
        if y_data[i][0] == 1:
            plt.plot(x_data[i][0], x_data[i][1], 'ro')
        else:
            plt.plot(x_data[i][0], x_data[i][1], 'bo')
    X = np.arange(round(min(np.array(x_data)[..., [0]])[0]) - 2,
                  round(max(np.array(x_data)[..., [0]])[0]) + 2)
    Y = (-fine_weight[0] * X - fine_bias) / fine_weight[1]
    plt.plot(X, Y)
    plt.show()
```
Classification result: (plot of the two classes and the fitted separating line produced by the code above; figure not shown)
III. Nonlinear Support Vector Machine
For data that is not linearly separable, one can choose a positive-definite kernel function $K(x_i,x_j)$ that implicitly maps the data into another feature space in which it becomes linearly separable. Problem (14) can then be written as:
$$
\min\limits_{\alpha} \quad \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_jy_iy_jK(x_i,x_j)-\sum_{i=1}^{N}\alpha_i \\
s.t. \quad \sum_{i=1}^{N}\alpha_iy_i=0 \\
0 \le \alpha_i \le C,\quad i=1,2,...,N \tag{15}
$$
Solving this problem gives the optimal $\alpha^*$. Note that $w^*$ now lives in the implicit feature space and is not computed explicitly; instead, choose a component of $\alpha^*$ satisfying $0<\alpha^*_j<C$ and compute $b^*$ directly through the kernel:
$$
b^*=y_j-\sum_{i=1}^{N}\alpha^*_iy_iK(x_i,x_j) \tag{16}
$$
Construct the decision function:
$$
f(x)=sign\left(\sum_{i=1}^{N}\alpha^*_iy_iK(x,x_i)+b^*\right) \tag{17}
$$
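A minimal sketch of how the find_alpha routine above could be adapted to (15)-(17) with a Gaussian kernel; the names rbf_kernel, fit_kernel_svm, and predict, the gamma parameter, and the 1e-6 tolerance are illustrative choices, not from the original code. Labels y are assumed to be a 1-d array of +1/-1 floats.

```python
import numpy as np
from cvxopt import matrix, solvers

def rbf_kernel(X1, X2, gamma=0.5):
    """Gaussian kernel matrix K[i, j] = exp(-gamma * ||X1[i] - X2[j]||^2)."""
    d2 = (np.sum(X1**2, axis=1)[:, None]
          + np.sum(X2**2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-gamma * d2)

def fit_kernel_svm(x, y, C=1.0, gamma=0.5):
    """Solve the dual (15): the Gram matrix of dot products is replaced by K."""
    n = len(x)
    K = rbf_kernel(x, x, gamma)
    P = matrix(np.outer(y, y) * K)
    q = matrix(-np.ones(n))
    G = matrix(np.vstack([-np.eye(n), np.eye(n)]))   # encodes 0 <= alpha_i <= C
    h = matrix(np.hstack([np.zeros(n), C * np.ones(n)]))
    A = matrix(y[None, :].astype(float))             # sum_i alpha_i y_i = 0
    b = matrix(np.zeros(1))
    solvers.options['show_progress'] = False
    alpha = np.array(solvers.qp(P, q, G, h, A, b)['x']).ravel()
    # b* from (16), using a free support vector with 0 < alpha_j < C
    # (argmax falls back to index 0 if no free support vector exists)
    j = int(np.argmax((alpha > 1e-6) & (alpha < C - 1e-6)))
    b_star = y[j] - (alpha * y) @ K[:, j]
    return alpha, b_star

def predict(x_train, y_train, alpha, b_star, x_new, gamma=0.5):
    """Decision function (17): f(x) = sign(sum_i alpha_i y_i K(x, x_i) + b*)."""
    return np.sign(rbf_kernel(np.atleast_2d(x_new), x_train, gamma)
                   @ (alpha * y_train) + b_star)
```

Because $w^*$ lives in the implicit feature space, only $\alpha^*$ and $b^*$ are stored; prediction sums kernel evaluations against the training points as in (17).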