Support Vector Machines (SVM)

I. Linearly Separable Support Vector Machine

1. Target problem: binary classification of linearly separable data.
2. Idea: find the separating hyperplane with the largest geometric margin. Unlike the PLA, this hyperplane is unique (hard-margin maximization):
$$
\max_{w,b} \quad \gamma \\
\text{s.t.} \quad y_i\left(\frac{w}{\Vert w\Vert} \cdot x_i + \frac{b}{\Vert w\Vert} \right) \ge \gamma, \quad i = 1,2,\dots,N \tag{1}
$$
Here $\gamma$ is the geometric margin. Using the relation between the functional margin $\hat{\gamma}$ and the geometric margin $\gamma$, namely $\gamma=\frac{\hat{\gamma}}{\Vert w\Vert}$, the problem becomes:
$$
\max_{w,b} \quad \frac{\hat{\gamma}}{\Vert w\Vert} \\
\text{s.t.} \quad y_i\left(w \cdot x_i + b \right) \ge \hat{\gamma}, \quad i = 1,2,\dots,N \tag{2}
$$
w ^ = w γ ^ \hat{w}=\frac{w}{\hat{\gamma}} w^=γ^w, b ^ = b γ ^ \hat{b}=\frac{b}{\hat{\gamma}} b^=γ^b,得到下式,由于 s i g n ( w ⋅ x + b ) = s i g n ( γ ^ w ^ ⋅ x + γ ^ b ^ ) = s i g n ( w ^ ⋅ x + b ^ ) sign(w\cdot x +b)=sign(\hat{\gamma} \hat{w}\cdot x+\hat{\gamma} \hat{b})=sign(\hat{w}\cdot x+\hat{b}) sign(wx+b)=sign(γ^w^x+γ^b^)=sign(w^x+b^),所以这样变换不会影响预测结果。
$$
\max_{\hat{w},\hat{b}} \quad \frac{1}{\Vert \hat{w}\Vert} \\
\text{s.t.} \quad y_i\left(\hat{w} \cdot x_i + \hat{b} \right) \ge 1, \quad i = 1,2,\dots,N \tag{3}
$$
Since maximizing $\frac{1}{\Vert \hat{w}\Vert}$ is equivalent to minimizing $\frac{1}{2}\Vert \hat{w}\Vert^2$, (3) is equivalent to the problem below, called the primal problem of the linearly separable SVM:
$$
\min_{\hat{w},\hat{b}} \quad \frac{1}{2}\Vert \hat{w}\Vert^2 \\
\text{s.t.} \quad y_i\left(\hat{w} \cdot x_i + \hat{b} \right) \ge 1, \quad i = 1,2,\dots,N \tag{4}
$$
Solving (4) yields $\hat{w}^*,\hat{b}^*$, which give the separating hyperplane and the decision function:
$$
\hat{w}^*\cdot x+\hat{b}^*=0 \tag{5}
$$
$$
f(x) = sign(\hat{w}^*\cdot x+\hat{b}^*) \tag{6}
$$
3. Support vectors: the training samples closest to the separating hyperplane, i.e. the samples for which the constraint in (3) holds with equality. Samples with $y_i=1$ lie on $H_1$; samples with $y_i=-1$ lie on $H_2$:
$$
y_i (\hat{w}^* \cdot x_i + \hat{b}^*) = 1 \\
H_1:\ \hat{w}^* \cdot x_i + \hat{b}^* = 1 \\
H_2:\ \hat{w}^* \cdot x_i + \hat{b}^* = -1
$$
4. Margin: the distance between $H_1$ and $H_2$, equal to $\frac{2}{\Vert \hat{w}^*\Vert}$ (computed numerically in the sketch after item 5).
5. Dual algorithm: for the linearly separable SVM the primal problem (4) can be solved directly; the dual algorithm is presented here as groundwork for the later sections. (For convenience, $\hat{w}$ and $\hat{b}$ are written as $w$ and $b$ below.)
① Construct the Lagrangian of the minimization problem (4), with Lagrange multipliers $\alpha_i \ge 0$:
$$
L(w,b,\alpha)=\frac{1}{2}\Vert w\Vert^2-\sum_{i=1}^{N}\alpha_i y_i\left(w \cdot x_i + b \right)+\sum_{i=1}^{N}\alpha_i \tag{7}
$$
The minimization problem is then equivalent to:
$$
\min_{w,b}\max_{\alpha}L(w,b,\alpha) \tag{8}
$$
By Lagrangian duality, the dual of this problem is:
$$
\max_{\alpha}\min_{w,b}L(w,b,\alpha) \tag{9}
$$
When the KKT conditions are satisfied, (8) and (9) are equivalent. From the stationarity conditions among the KKT conditions:
$$
\nabla_w L(w,b,\alpha)=w-\sum_{i=1}^{N}\alpha_i y_i x_i=0 \\
\nabla_b L(w,b,\alpha)=-\sum_{i=1}^{N}\alpha_i y_i=0
$$
we obtain:
$$
w=\sum_{i=1}^{N}\alpha_i y_i x_i, \qquad \sum_{i=1}^{N}\alpha_i y_i=0 \tag{10}
$$
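Substituting (10) back into (7), the term in $b$ vanishes because $\sum_{i=1}^{N}\alpha_i y_i=0$, and the inner minimum simplifies to:

$$
\min_{w,b} L(w,b,\alpha)
= \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j (x_i\cdot x_j)
- \sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j (x_i\cdot x_j)
+ \sum_{i=1}^{N}\alpha_i
= -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j (x_i\cdot x_j) + \sum_{i=1}^{N}\alpha_i
$$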
Maximizing this expression over $\alpha$ as in the dual problem (9), and negating the objective to convert the maximization into a minimization, gives the dual optimization problem equivalent to the primal problem (4):
$$
\min_{\alpha} \quad \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j(x_i\cdot x_j)-\sum_{i=1}^{N}\alpha_i \\
\text{s.t.} \quad \sum_{i=1}^{N}\alpha_i y_i=0 \\
\alpha_i \ge 0, \quad i=1,2,\dots,N \tag{11}
$$
Solving this problem yields the optimal $\alpha^*$; substituting it into the expressions below (where $j$ indexes any component with $\alpha_j^*>0$, i.e. a support vector) gives the optimal $w^*,b^*$, and hence the decision function and separating hyperplane.
$$
w^{*}=\sum_{i=1}^{N}\alpha_i^* y_i x_i, \qquad
b^*=y_j-\sum_{i=1}^{N}\alpha_i^* y_i(x_i\cdot x_j) \tag{12}
$$
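A minimal sketch of this dual algorithm, assuming cvxopt is installed; the three-point dataset is made up for illustration. It solves (11) with cvxopt's QP solver and recovers $w^*, b^*$ via (12):

```python
import numpy as np
from cvxopt import matrix, solvers

# Tiny made-up linearly separable dataset, for illustration only
x = np.array([[3., 3.], [4., 3.], [1., 1.]])
y = np.array([1., 1., -1.])
n = len(y)

# Dual (11) in cvxopt form: min 1/2 a'Pa + q'a  s.t.  Ga <= h,  Aa = b
P = matrix(np.outer(y, y) * (x @ x.T))  # P_ij = y_i y_j (x_i . x_j)
q = matrix(-np.ones(n))
G = matrix(-np.eye(n))                  # -alpha_i <= 0, i.e. alpha_i >= 0
h = matrix(np.zeros(n))
A = matrix(y.reshape(1, -1))            # sum_i alpha_i y_i = 0
b = matrix(0.)
solvers.options['show_progress'] = False
alpha = np.array(solvers.qp(P, q, G, h, A, b)['x']).ravel()

# Recover w* and b* via (12); j indexes a support vector (alpha_j > 0)
w = (alpha * y) @ x
j = int(np.argmax(alpha))
b_star = y[j] - (alpha * y) @ (x @ x[j])
print(w, b_star)              # approx. [0.5 0.5] and -2.0
print(2 / np.linalg.norm(w))  # margin width 2/||w*||, approx. 2*sqrt(2)
```

On this toy set the support vectors are $(3,3)$ and $(1,1)$, giving $w^*=(\frac{1}{2},\frac{1}{2})$ and $b^*=-2$ by hand, which the QP solution reproduces.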

II. Linear Support Vector Machine

1. Target problem: binary classification of linearly non-separable data.
2. Idea: allow some data points to fall inside the two margin boundaries (soft-margin maximization). For each sample $(x_i,y_i)$, introduce a slack variable $\xi_i \ge 0$; the objective pays a cost $\xi_i$ for it, weighted by a penalty parameter $C>0$. This yields the convex quadratic program below (the primal problem). The solution for $w$ is unique, but the solution for $b$ may not be.
$$
\min_{\hat{w},\hat{b}} \quad \frac{1}{2}\Vert \hat{w}\Vert^2+C\sum_{i=1}^{N}\xi_i \\
\text{s.t.} \quad y_i\left(\hat{w} \cdot x_i + \hat{b} \right) \ge 1-\xi_i, \quad i = 1,2,\dots,N \\
\xi_i \ge 0, \quad i = 1,2,\dots,N \tag{13}
$$
The dual optimization problem is:
$$
\min_{\alpha} \quad \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j(x_i\cdot x_j)-\sum_{i=1}^{N}\alpha_i \\
\text{s.t.} \quad \sum_{i=1}^{N}\alpha_i y_i=0 \\
0 \le \alpha_i \le C, \quad i=1,2,\dots,N \tag{14}
$$
Choose a suitable penalty parameter $C>0$ (typically by leave-one-out or cross-validation: sum the validation errors for each candidate and pick the $C$ with the smallest total) and solve this problem to obtain the optimal $\alpha^*$. Substituting into (12) gives the optimal $w^*$; choosing a component of $\alpha^*$ with $0<\alpha^*_j<C$ gives $b^*$, and hence the decision function and separating hyperplane.
3. Python code: select a suitable $C$ by cross-validation, then solve the dual optimization problem for the optimal solution.

```python
# -*- coding: utf-8 -*-
"""
Created on Sun Oct 31 15:24:29 2021

@author: 22905
"""
import numpy as np
import random
from cvxopt import matrix, solvers
import matplotlib.pyplot as plt
import time

data = np.loadtxt(open('data.csv', 'r', encoding = 'utf8'), delimiter=",", skiprows=0)
data = data.tolist() # convert to a nested Python list
#random.shuffle(data) # optional: shuffle before splitting into folds

x_data = np.array(data)[...,[0,1]].tolist() # first two columns are the features; for more dimensions list more columns, e.g. [0,1,2]
y_data = np.array(data)[...,[2]].tolist() # third column holds the labels (+1/-1)

# hyperparameter grid for the penalty parameter C
C_SET = [2**(i) for i in range(-15, 15, 2)]
    
def sign(v):
    if v>=0: return 1
    else: return -1
    
def find_alpha(x_train, y_train, C):
    n = len(x_train)
    X = np.array(x_train).dot(np.transpose(np.array(x_train))) # Gram matrix of inner products x_i . x_j
    Y = np.array(y_train).dot(np.transpose(np.array(y_train))) # outer product y_i y_j
    P = matrix(X*Y)
    q1 = np.array([[-1.] for i in range(n)]) # cvxopt only accepts double precision
    q = matrix(q1)
    A = matrix(np.transpose(np.array(y_train))) # equality constraint sum_i alpha_i y_i = 0
    b = matrix(np.array([[0.]])) # must be a double, not an int
    G1 = np.diag([-1. for i in range(n)]) # -alpha_i <= 0, i.e. alpha_i >= 0
    G2 = np.diag([1. for i in range(n)])  # alpha_i <= C
    G = matrix(np.vstack((G1,G2)))
    h1 = [[0.] for i in range(n)]
    h2 = [[float(C)] for i in range(n)]
    h = matrix(np.vstack((h1, h2)))
    solvers.options['show_progress'] = False
    sol = solvers.qp(P, q, G, h, A, b) # A and b must be passed, or the equality constraint is ignored
    alpha = np.array(sol['x']) # (n, 1) array
    return alpha

def find_weight_bias(x_train, y_train, alpha, C):
    weight = []
    for i in range(np.array(x_train).shape[1]):
        temp = alpha*np.array(y_train)*np.array(x_train)[...,[i]] # [...,[i]] selects all rows of column i
        weight.append(temp.sum()) # w_i = sum_k alpha_k y_k x_k[i]
    # pick an index j with 0 < alpha_j < C; a small tolerance is needed because
    # the solver returns tiny positive values instead of exact zeros
    eps = 1e-6
    j = int(np.argmax(alpha)) # fallback if no component passes the tolerance test
    for i in range(alpha.shape[0]):
        if eps < alpha[i][0] < C - eps:
            j = i
            break
    sigma = 0
    alpha = alpha.tolist()

    for i in range(len(x_train)):
        sigma += y_train[i][0]*alpha[i][0]*float(np.dot(x_train[i],x_train[j]))
    bias = y_train[j][0] - sigma
    return weight, bias

def find_false_number(weight, bias, x_val, y_val):
    false_number = 0
    for i in range(len(x_val)):
        y_predict = sign(np.array(weight).dot(np.array(x_val[i])) + bias)
        if y_predict != y_val[i][0]:
            false_number += 1
    return false_number

if __name__ == '__main__':
    
    time_start = time.time()
    false_num = [] # misclassification count for each candidate C
    interval = len(data)//5 # fold size
    # 5-fold cross-validation
    for C in C_SET:
        num = 0
        for i in range(5):
            start, end = i*interval, (i+1)*interval
            x_val = x_data[start: end]
            y_val = y_data[start: end]
            x_train = x_data[0:start] + x_data[end:len(x_data)]
            y_train = y_data[0:start] + y_data[end:len(y_data)]
            alpha = find_alpha(x_train, y_train, C)
            weight, bias = find_weight_bias(x_train, y_train, alpha, C)
            num += find_false_number(weight, bias, x_val, y_val)
        false_num.append(num)
        
    fine_C = C_SET[false_num.index(min(false_num))]
    print("the fine C is:", fine_C)
    fine_alpha = find_alpha(x_data, y_data, fine_C)
    fine_weight, fine_bias = find_weight_bias(x_data, y_data, fine_alpha, fine_C)
    fine_false_number = find_false_number(fine_weight, fine_bias, x_data, y_data)
    print("right rate is:", (len(x_data)-fine_false_number)/len(x_data))
    print("the fine weight and bias are:", fine_weight, bias)
    time_end = time.time()
    print('running time is:', time_end - time_start)
    
    for i in range(len(x_data)):
        if y_data[i][0] == 1:
            plt.plot(x_data[i][0], x_data[i][1], 'ro')
        else:
            plt.plot(x_data[i][0], x_data[i][1], 'bo')
    # the separating line in the 2-D plane
    X = np.arange(round(min(np.array(x_data)[...,[0]])[0])-2, round(max(np.array(x_data)[...,[0]])[0])+2)
    Y = (-fine_weight[0]*X-fine_bias)/fine_weight[1]
    plt.plot(X, Y)
    plt.show()
```

Classification result: (figure omitted: scatter plot of the two classes with the fitted separating line)
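For comparison, a minimal sketch of the same model selection with scikit-learn, assuming scikit-learn is available and the same data.csv layout as above; `SVC` with a linear kernel solves the soft-margin dual internally:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

data = np.loadtxt('data.csv', delimiter=',')
X, y = data[:, :2], data[:, 2]

# 5-fold cross-validation over the same grid of C values as above
grid = GridSearchCV(SVC(kernel='linear'),
                    param_grid={'C': [2.0**i for i in range(-15, 15, 2)]},
                    cv=5)
grid.fit(X, y)
print('best C:', grid.best_params_['C'])
print('training accuracy:', grid.score(X, y))
print('w*, b*:', grid.best_estimator_.coef_, grid.best_estimator_.intercept_)
```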

III. Nonlinear Support Vector Machine

For data that is not linearly separable, one can choose a positive definite kernel function $K(x_i,x_j)$ that implicitly maps the data into a feature space in which it becomes linearly separable. Problem (14) can then be written as:
$$
\min_{\alpha} \quad \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j K(x_i,x_j)-\sum_{i=1}^{N}\alpha_i \\
\text{s.t.} \quad \sum_{i=1}^{N}\alpha_i y_i=0 \\
0 \le \alpha_i \le C, \quad i=1,2,\dots,N \tag{15}
$$
Solving this problem yields the optimal $\alpha^*$. In the kernel case $w^*$ lives in the implicit feature space and is not computed explicitly; instead, choose a component of $\alpha^*$ with $0<\alpha^*_j<C$ to compute $b^*$ and build the decision function directly from the kernel:
$$
b^*=y_j-\sum_{i=1}^{N}\alpha^*_i y_i K(x_i,x_j) \tag{16}
$$
Construct the decision function:
$$
f(x)=sign\left(\sum_{i=1}^{N}\alpha^*_i y_i K(x,x_i)+b^*\right) \tag{17}
$$
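A minimal sketch of the kernelized dual (15) and the decision function (17), assuming cvxopt is available; the Gaussian (RBF) kernel, the values of $\sigma$ and $C$, and the tiny XOR-like dataset are all illustrative assumptions:

```python
import numpy as np
from cvxopt import matrix, solvers

def rbf(a, b, sigma=1.0):
    # Gaussian kernel K(a, b) = exp(-||a - b||^2 / (2 sigma^2))
    return np.exp(-np.sum((a - b)**2) / (2 * sigma**2))

# Tiny made-up dataset (XOR-like, not linearly separable)
x = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])
y = np.array([1., 1., -1., -1.])
n, C = len(y), 10.0

K = np.array([[rbf(xi, xj) for xj in x] for xi in x])

# Dual (15): min 1/2 a'Pa + q'a  s.t.  0 <= alpha_i <= C,  sum_i alpha_i y_i = 0
P = matrix(np.outer(y, y) * K)
q = matrix(-np.ones(n))
G = matrix(np.vstack((-np.eye(n), np.eye(n))))
h = matrix(np.hstack((np.zeros(n), C * np.ones(n))))
A, b = matrix(y.reshape(1, -1)), matrix(0.)
solvers.options['show_progress'] = False
alpha = np.array(solvers.qp(P, q, G, h, A, b)['x']).ravel()

# b* from (16), using a j with 0 < alpha_j < C (first such index, with tolerance)
j = int(np.argmax((alpha > 1e-6) & (alpha < C - 1e-6)))
b_star = y[j] - np.sum(alpha * y * K[:, j])

# Decision function (17)
def f(z):
    return np.sign(np.sum(alpha * y * [rbf(xi, z) for xi in x]) + b_star)

print([f(xi) for xi in x])  # should reproduce y on this toy data
```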
