Support Vector Machines (SVM) and a Python Implementation


0. Introduction

A support vector machine (SVM) is a binary classification model.

  • Strategy: margin maximization, which is equivalent to minimizing a regularized hinge loss.
  • Learning algorithm: sequential minimal optimization (SMO).
  • Variants: the linearly separable SVM, the linear SVM, and the nonlinear SVM.

1. The Linearly Separable SVM

  • Characteristics: the training data are linearly separable; the strategy is hard-margin maximization; the model is a linear classifier.

Model. The classification decision function is:

$$f(x)=\operatorname{sign}\left(w^{*} \cdot x+b^{*}\right)$$

Separating hyperplane:

$$w^{*} \cdot x+b^{*} = 0$$

The functional margin of the hyperplane with respect to a sample point $(x_i, y_i)$ is defined as:

$$\hat{\gamma}_{i}=y_{i}\left(w \cdot x_{i}+b\right)$$

The geometric margin of the hyperplane with respect to a sample point $(x_i, y_i)$ is:

$$\gamma_{i}=y_{i}\left(\frac{w}{\|w\|} \cdot x_{i}+\frac{b}{\|w\|}\right)$$

The geometric margin is the true point-to-hyperplane distance. Define the minimum over all sample points:

$$\gamma=\min_{i=1, \ldots, N} \gamma_{i}$$

Margin maximization: find the hyperplane with the largest geometric margin over the training set, i.e., classify the training data with sufficiently high confidence.

This is carried out below via the maximum-margin method and the dual method:

Maximum-margin method: 1) Construct the constrained optimization problem

$$\begin{aligned} &\max_{w, b} \quad \gamma\\ &\text{s.t.} \quad y_{i}\left(\frac{w}{\|w\|} \cdot x_{i}+\frac{b}{\|w\|}\right) \geqslant \gamma, \quad i=1,2, \cdots, N \end{aligned}$$

Setting the functional margin $\hat\gamma = \|w\|\gamma = 1$, the problem above is easily seen to be equivalent to:

$$\begin{aligned} &\min_{w, b} \quad \frac{1}{2}\|w\|^{2}\\ &\text{s.t.} \quad y_{i}\left(w \cdot x_{i}+b\right)-1 \geqslant 0, \quad i=1,2, \cdots, N \end{aligned}$$

2) Solve the constrained problem to obtain the separating hyperplane

$$w^{*} \cdot x+b^{*} = 0$$

Dual method: the dual formulation is often easier to solve and naturally introduces kernel functions, which generalizes the method to nonlinear classification. 1. Define the Lagrangian

$$L(w, b, \alpha)=\frac{1}{2}\|w\|^{2}-\sum_{i=1}^{N} \alpha_{i} y_{i}\left(w \cdot x_{i}+b\right)+\sum_{i=1}^{N} \alpha_{i}$$

The optimization objective is:

$$\max_{\alpha} \min_{w, b} L(w, b, \alpha)$$

2. Compute $\min_{w, b} L(w, b, \alpha)$: take the partial derivatives of $L$ with respect to $w$ and $b$ and set them to zero.

$$\begin{aligned} &\nabla_{w} L(w, b, \alpha)=w-\sum_{i=1}^{N} \alpha_{i} y_{i} x_{i}=0\\ &\nabla_{b} L(w, b, \alpha)=\sum_{i=1}^{N} \alpha_{i} y_{i}=0 \end{aligned}$$

This gives:

$$\begin{aligned} &w=\sum_{i=1}^{N} \alpha_{i} y_{i} x_{i}\\ &\sum_{i=1}^{N} \alpha_{i} y_{i}=0 \end{aligned}$$

3. Maximize $\min_{w, b} L(w, b, \alpha)$ over $\alpha$. Substituting the results of step 2,

$$\min_{w, b} L(w, b, \alpha)=-\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(x_{i} \cdot x_{j}\right)+\sum_{i=1}^{N} \alpha_{i}$$

The dual of maximizing $\min_{w, b} L(w, b, \alpha)$ over $\alpha$ is:

$$\begin{array}{ll} \min\limits_{\alpha} & \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(x_{i} \cdot x_{j}\right)-\sum_{i=1}^{N} \alpha_{i} \\ \text{s.t.} & \sum_{i=1}^{N} \alpha_{i} y_{i}=0 \\ & \alpha_{i} \geqslant 0, \quad i=1,2, \cdots, N \end{array}$$

Solving for $\alpha^{*}$ gives:

$$w^{*}=\sum_{i} \alpha_{i}^{*} y_{i} x_{i}$$

There exists a component $\alpha_j^{*}>0$ such that

$$y_{j}\left(w^{*} \cdot x_{j}+b^{*}\right)-1=0$$

from which $b^{*}$ can be computed.

The classification decision function is therefore:

$$f(x)=\operatorname{sign}\left(\sum_{i=1}^{N} \alpha_{i}^{*} y_{i}\left(x \cdot x_{i}\right)+b^{*}\right)$$
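As a quick numerical check of the dual derivation, here is a minimal sketch (added for illustration; the toy data points and the solver choice are my own, not part of the original derivation) that solves the dual problem above with scipy's SLSQP solver and recovers $w^{*}$ and $b^{*}$ from the formulas just given:

import numpy as np
from scipy.optimize import minimize

# tiny, hand-picked linearly separable 2-D dataset (assumption for this sketch)
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])

G = (y[:, None] * X) @ (y[:, None] * X).T           # G_ij = y_i y_j (x_i . x_j)

def dual_objective(a):
    # (1/2) sum_ij a_i a_j y_i y_j (x_i . x_j) - sum_i a_i
    return 0.5 * a @ G @ a - a.sum()

cons = [{'type': 'eq', 'fun': lambda a: a @ y}]      # sum_i alpha_i y_i = 0
bounds = [(0, None)] * len(y)                        # alpha_i >= 0

res = minimize(dual_objective, np.zeros(len(y)), method='SLSQP',
               bounds=bounds, constraints=cons)
alpha = res.x
w = (alpha * y) @ X                                  # w* = sum_i alpha_i* y_i x_i
j = int(np.argmax(alpha))                            # an index with alpha_j* > 0
b = y[j] - w @ X[j]                                  # from y_j (w* . x_j + b*) = 1
print(w, b)                                          # expect roughly [0.5 0.5] -2.0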

Summary

The perceptron shares the same model form as the SVM, but the two have different learning strategies: the perceptron is driven by misclassified points and minimizes their distance to the hyperplane, while the SVM maximizes the geometric margin of all sample points to the hyperplane. As a result, the SVM is determined only by the points closest to the hyperplane, which are called support vectors.

2. The Linear SVM and Soft-Margin Maximization

  • Characteristics: the training data are approximately linearly separable; the strategy is soft-margin maximization; the model is a linear classifier.

Linear non-separability means that some sample points cannot satisfy the functional-margin constraint $\hat\gamma \geq 1$. Introducing slack variables $\xi_i$ and a penalty parameter $C$ gives

$$\begin{aligned} &\min_{w, b, \xi} \quad \frac{1}{2}\|w\|^{2}+C \sum_{i=1}^{N} \xi_{i}\\ &\text{s.t.} \quad y_{i}\left(w \cdot x_{i}+b\right) \geqslant 1-\xi_{i}, \quad i=1,2, \cdots, N\\ &\qquad \xi_{i} \geqslant 0, \quad i=1,2, \cdots, N \end{aligned}$$

The objective balances two goals: make the margin as large as possible while keeping the number of margin violations small. As $C$ tends to infinity, no misclassified points are tolerated, which leads to overfitting; as $C$ tends to 0, only the size of the margin matters, no meaningful solution is obtained and the algorithm does not converge, i.e., underfitting.
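As a rough illustration of this trade-off (a sketch added here; it uses scikit-learn's SVC rather than the SMO code later in this post, and the dataset parameters are arbitrary), the number of support vectors shrinks as C grows on a noisy, roughly separable dataset:

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# two noisy, overlapping clusters
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    print(f"C={C:<7} support vectors: {clf.n_support_.sum():>3}  "
          f"train accuracy: {clf.score(X, y):.3f}")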

Learning algorithm:

1. Construct the convex quadratic programming problem:

$$\begin{array}{ll} \min\limits_{\alpha} & \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(x_{i} \cdot x_{j}\right)-\sum_{i=1}^{N} \alpha_{i} \\ \text{s.t.} & \sum_{i=1}^{N} \alpha_{i} y_{i}=0 \\ & 0 \leqslant \alpha_{i} \leqslant C, \quad i=1,2, \cdots, N \end{array}$$

2. Compute

$$w^{*}=\sum_{i=1}^{N} \alpha_{i}^{*} y_{i} x_{i}$$

Choose a component with $0<\alpha_{j}^{*}<C$ and compute

$$b^{*}=y_{j}-\sum_{i=1}^{N} y_{i} \alpha_{i}^{*}\left(x_{i} \cdot x_{j}\right)$$

The classification decision function is:

$$f(x)=\operatorname{sign}\left(w^{*} \cdot x+b^{*}\right)$$

3. The Nonlinear SVM and Kernel Functions

  • Characteristics: the training data are not linearly separable; the strategy combines the kernel trick with soft-margin maximization; the model is a nonlinear support vector machine.

Idea: apply a nonlinear transformation that maps the input space to a feature space (a Hilbert space), converting the nonlinear problem into a linear one.

Kernel function: let $\phi$ be a mapping from the input space to the feature space such that for all $x, z$ in the input space

$$K(x, z) = \phi(x) \cdot \phi(z)$$

where $K(x,z)$ is the kernel function and $\phi(x)$ is the mapping function. The inner products $x_i \cdot x_j$ in the original objective are then replaced by $K(x_i, x_j)$.

Positive definite kernel: $K(x, z)$ is a symmetric function whose corresponding Gram matrix is positive semidefinite.

Commonly used kernel functions. Polynomial kernel:

$$K(x, z)=(x \cdot z+1)^{p}$$

Gaussian kernel:

$$K(x, z)=\exp\left(-\frac{\|x-z\|^{2}}{2\sigma^{2}}\right)$$
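The two kernels are easy to write down directly. Below is a small sketch (added for illustration; the parameter values and the random test data are my own) implementing them with NumPy and empirically checking that a resulting Gram matrix is symmetric and positive semidefinite, as the definition of a positive definite kernel requires:

import numpy as np

def poly_kernel(x, z, p=2):
    # K(x, z) = (x . z + 1)^p
    return (np.dot(x, z) + 1) ** p

def rbf_kernel(x, z, sigma=1.0):
    # K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
K = np.array([[rbf_kernel(a, b) for b in X] for a in X])    # Gram matrix

print(np.allclose(K, K.T))                    # symmetric
print(np.linalg.eigvalsh(K).min() >= -1e-10)  # eigenvalues >= 0 up to round-off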

Learning algorithm for the nonlinear support vector classifier: (1) Choose an appropriate kernel function $K(x,z)$ and penalty parameter $C$, then construct and solve the optimization problem

$$\begin{aligned} &\min_{\alpha} \quad \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} K\left(x_{i}, x_{j}\right)-\sum_{i=1}^{N} \alpha_{i}\\ &\text{s.t.} \quad \sum_{i=1}^{N} \alpha_{i} y_{i}=0\\ &\qquad 0 \leqslant \alpha_{i} \leqslant C, \quad i=1,2, \cdots, N \end{aligned}$$

obtaining the optimal solution $\alpha^{*}=\left(\alpha_{1}^{*}, \alpha_{2}^{*}, \cdots, \alpha_{N}^{*}\right)^{\mathrm{T}}$. (2) Choose a positive component $0<\alpha_{j}^{*}<C$ of $\alpha^{*}$ and compute

$$b^{*}=y_{j}-\sum_{i=1}^{N} \alpha_{i}^{*} y_{i} K\left(x_{i}, x_{j}\right)$$

(3) Construct the decision function:

$$f(x)=\operatorname{sign}\left(\sum_{i=1}^{N} \alpha_{i}^{*} y_{i} K\left(x, x_{i}\right)+b^{*}\right)$$

When $K(x,z)$ is a positive definite kernel, the problem above is a convex quadratic program, so a solution is guaranteed to exist.
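As a quick sanity check of the kernelized classifier (an added sketch using scikit-learn's SVC instead of the SMO code below; the dataset and parameters are arbitrary), an RBF-kernel SVM separates data that no linear classifier can:

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# concentric circles: not linearly separable in the input space
X, y = make_circles(n_samples=300, noise=0.1, factor=0.4, random_state=0)

linear = SVC(kernel='linear', C=1.0).fit(X, y)
rbf = SVC(kernel='rbf', C=1.0, gamma='scale').fit(X, y)

print("linear kernel accuracy:", round(linear.score(X, y), 3))
print("rbf kernel accuracy:   ", round(rbf.score(X, y), 3))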

4. Learning Algorithm: Sequential Minimal Optimization (SMO)

4.1 Principle

Sequential minimal optimization (SMO): 1. If all variables satisfy the KKT conditions, the optimal solution has been found. 2. Otherwise, pick two variables, fix all the others, and construct a quadratic programming subproblem in the two chosen variables. The algorithm therefore consists of an analytic method for solving the two-variable quadratic program and a heuristic for choosing the variables.

Optimization objective:

$$\begin{aligned} &\min_{\alpha} \quad \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} K\left(x_{i}, x_{j}\right)-\sum_{i=1}^{N} \alpha_{i}\\ &\text{s.t.} \quad \sum_{i=1}^{N} \alpha_{i} y_{i}=0\\ &\qquad 0 \leqslant \alpha_{i} \leqslant C, \quad i=1,2, \cdots, N \end{aligned}$$

The variables are the Lagrange multipliers: each $\alpha_i$ corresponds to one sample point $(x_i, y_i)$, so the total number of variables equals the sample size $N$. SMO is a heuristic algorithm. Its idea: fix all but two variables and build a quadratic programming subproblem in those two; solving a sequence of such subproblems speeds up the computation. Of the two variables, one is the one that violates the KKT conditions most severely; the other is determined automatically by the equality constraint.

Solving the two-variable quadratic programming subproblem

$$\begin{array}{rl} \min\limits_{\alpha_{1}, \alpha_{2}} & W\left(\alpha_{1}, \alpha_{2}\right)=\frac{1}{2} K_{11} \alpha_{1}^{2}+\frac{1}{2} K_{22} \alpha_{2}^{2}+y_{1} y_{2} K_{12} \alpha_{1} \alpha_{2} \\ & \quad -\left(\alpha_{1}+\alpha_{2}\right)+y_{1} \alpha_{1} \sum_{i=3}^{N} y_{i} \alpha_{i} K_{i1}+y_{2} \alpha_{2} \sum_{i=3}^{N} y_{i} \alpha_{i} K_{i2} \\ \text{s.t.} & \alpha_{1} y_{1}+\alpha_{2} y_{2}=-\sum_{i=3}^{N} y_{i} \alpha_{i}=\zeta \\ & 0 \leqslant \alpha_{i} \leqslant C, \quad i=1,2 \end{array}$$

where $K_{ij} = K(x_i, x_j)$.

Since each $y_i$ is either $1$ or $-1$, the constraint $\alpha_{1} y_{1}+\alpha_{2} y_{2}=\zeta$ confines $(\alpha_1, \alpha_2)$ to a line segment parallel to a diagonal of the box shown below.

(Figure: the box $[0, C] \times [0, C]$ with the constraint segment $\alpha_{1} y_{1}+\alpha_{2} y_{2}=\zeta$ running parallel to one of its diagonals.)

Since $0 \leqslant \alpha_{i} \leqslant C$ and $\alpha_{1} y_{1}+\alpha_{2} y_{2}$ is constant, the feasible range of $\alpha_2^{\text{new}}$ follows directly. If $y_1 \neq y_2$:

$$L=\max\left(0, \alpha_{2}^{\text{old}}-\alpha_{1}^{\text{old}}\right), \quad H=\min\left(C, C+\alpha_{2}^{\text{old}}-\alpha_{1}^{\text{old}}\right)$$

If $y_1 = y_2$:

$$L=\max\left(0, \alpha_{2}^{\text{old}}+\alpha_{1}^{\text{old}}-C\right), \quad H=\min\left(C, \alpha_{2}^{\text{old}}+\alpha_{1}^{\text{old}}\right)$$

Denote by $\alpha_2^{\text{new,unc}}$ the solution of the subproblem without the box constraint $0 \leqslant \alpha_{i} \leqslant C$. Then:

$$\alpha_{2}^{\text{new,unc}}=\alpha_{2}^{\text{old}}+\frac{y_{2}\left(E_{1}-E_{2}\right)}{\eta}$$

where $E_i$ is the difference between the prediction of $g(x)$ at input $x_i$ and the true label $y_i$:

$$E_{i}=g\left(x_{i}\right)-y_{i}=\left(\sum_{j=1}^{N} \alpha_{j} y_{j} K\left(x_{j}, x_{i}\right)+b\right)-y_{i}, \quad i=1,2$$

$$\eta=K_{11}+K_{22}-2 K_{12}=\left\|\Phi\left(x_{1}\right)-\Phi\left(x_{2}\right)\right\|^{2}$$

Clipping to $[L, H]$ then gives:

$$\alpha_{2}^{\text{new}}=\begin{cases} H, & \alpha_{2}^{\text{new,unc}}>H \\ \alpha_{2}^{\text{new,unc}}, & L \leqslant \alpha_{2}^{\text{new,unc}} \leqslant H \\ L, & \alpha_{2}^{\text{new,unc}}<L \end{cases}$$

where $H$ and $L$ are the upper and lower bounds of the feasible range of $\alpha_2^{\text{new}}$.

$$\alpha_{1}^{\text{new}}=\alpha_{1}^{\text{old}}+y_{1} y_{2}\left(\alpha_{2}^{\text{old}}-\alpha_{2}^{\text{new}}\right)$$
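Putting the formulas of this subsection together, the following sketch (added for illustration; the function name and argument layout are my own, and it assumes a precomputed kernel matrix K, labels y in {-1, +1}, the current multipliers alpha, bias b, and penalty C) performs one two-variable update:

import numpy as np

def smo_step(i1, i2, K, y, alpha, b, C):
    """One SMO update of (alpha[i1], alpha[i2]); returns the new pair or None."""
    # E_i = g(x_i) - y_i with g(x_i) = sum_j alpha_j y_j K(x_j, x_i) + b
    E1 = (alpha * y) @ K[:, i1] + b - y[i1]
    E2 = (alpha * y) @ K[:, i2] + b - y[i2]

    # feasible range [L, H] of alpha_2^new
    if y[i1] != y[i2]:
        L = max(0.0, alpha[i2] - alpha[i1])
        H = min(C, C + alpha[i2] - alpha[i1])
    else:
        L = max(0.0, alpha[i2] + alpha[i1] - C)
        H = min(C, alpha[i2] + alpha[i1])

    eta = K[i1, i1] + K[i2, i2] - 2 * K[i1, i2]
    if eta <= 0 or L == H:
        return None

    # unclipped solution, then clip to [L, H]
    a2_new = float(np.clip(alpha[i2] + y[i2] * (E1 - E2) / eta, L, H))
    # alpha_1 follows from the equality constraint
    a1_new = alpha[i1] + y[i1] * y[i2] * (alpha[i2] - a2_new)
    return a1_new, a2_new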

4.2 Algorithm Steps

In each subproblem, SMO selects two variables to optimize, at least one of which violates the KKT conditions. (1) Choosing the first variable: among the support vector points on the margin boundary ($0<\alpha<C$), pick a point that violates the KKT conditions; if there is none, search the remaining points.

The KKT conditions are:

$$\alpha_{i}=0 \Leftrightarrow y_{i} g\left(x_{i}\right) \geqslant 1, \quad 0<\alpha_{i}<C \Leftrightarrow y_{i} g\left(x_{i}\right)=1, \quad \alpha_{i}=C \Leftrightarrow y_{i} g\left(x_{i}\right) \leqslant 1$$

(2) Choosing the second variable: select the variable that maximizes $|E_1 - E_2|$.

where $E_i$ is the difference between the prediction of $g(x)$ at input $x_i$ and the true output $y_i$:

$$E_{i}=g\left(x_{i}\right)-y_{i}=\left(\sum_{j=1}^{N} \alpha_{j} y_{j} K\left(x_{j}, x_{i}\right)+b\right)-y_{i}, \quad i=1,2$$

(3) Compute the threshold $b$ and the differences $E_i$ (the update of $\alpha_1^{\text{new}}$ follows section 4.1 above).

$$b_{1}^{\text{new}}=-E_{1}-y_{1} K_{11}\left(\alpha_{1}^{\text{new}}-\alpha_{1}^{\text{old}}\right)-y_{2} K_{21}\left(\alpha_{2}^{\text{new}}-\alpha_{2}^{\text{old}}\right)+b^{\text{old}}$$

$$b_{2}^{\text{new}}=-E_{2}-y_{1} K_{12}\left(\alpha_{1}^{\text{new}}-\alpha_{1}^{\text{old}}\right)-y_{2} K_{22}\left(\alpha_{2}^{\text{new}}-\alpha_{2}^{\text{old}}\right)+b^{\text{old}}$$

If $0<\alpha_{1}^{\text{new}}<C$, take $b^{\text{new}}=b_{1}^{\text{new}}$; if $0<\alpha_{2}^{\text{new}}<C$, take $b^{\text{new}}=b_{2}^{\text{new}}$; otherwise take the midpoint of $b_{1}^{\text{new}}$ and $b_{2}^{\text{new}}$. Then update

$$E_{i}^{\text{new}}=\sum_{S} y_{j} \alpha_{j} K\left(x_{i}, x_{j}\right)+b^{\text{new}}-y_{i}$$

where $S$ is the set of indices of the support vectors.

(4) Prediction. Using the decision function

$$f(x)=\operatorname{sign}\left(\sum_{i=1}^{N} \alpha_{i}^{*} y_{i} K\left(x, x_{i}\right)+b^{*}\right)$$

we obtain the predicted label for a new sample.

5. Practice


The implementation below is not entirely faithful: it uses max_iter as the termination condition. Strictly speaking, one should find the sample that violates the KKT conditions most severely, then pick as the second sample the one among the rest that maximizes |Ei - Ej|, and keep re-selecting the second sample until the objective stops decreasing, which is O(n^2). The code shown instead runs in O(n * max_iter), which is not quite right.

from sklearn import datasets
import numpy as np
class SVM:
    def __init__(self, max_iter=100, kernel='linear'):
        self.max_iter = max_iter
        self._kernel = kernel

    def init_args(self, features, labels):
        # number of samples, feature dimension
        self.m, self.n = features.shape
        self.X = features
        self.Y = labels
        self.b = 0.0

        self.alpha = np.ones(self.m)
        # cache E_i for every sample in a list
        self.E = [self._E(i) for i in range(self.m)]
        # penalty parameter
        self.C = 1.0

    def _KKT(self, i):
        y_g = self._g(i) * self.Y[i]
        if self.alpha[i] == 0:
            return y_g >= 1
        elif 0 < self.alpha[i] < self.C:
            return y_g == 1
        else:
            return y_g <= 1

    # g(x): prediction for input x_i (X[i])
    # g(x) = sum_{j=1}^N alpha_j * y_j * K(x_j, x) + b
    def _g(self, i):
        r = self.b
        for j in range(self.m):
            r += self.alpha[j] * self.Y[j] * self.kernel(self.X[i], self.X[j])
        return r

    # kernel function
    def kernel(self, x1, x2):
        if self._kernel == 'linear':
            return sum([x1[k] * x2[k] for k in range(self.n)])
        elif self._kernel == 'poly':
            return (sum([x1[k] * x2[k] for k in range(self.n)]) + 1)**2

        return 0

    # E(x): difference between the prediction g(x) and the label y
    # E_i = g(x_i) - y_i
    def _E(self, i):
        return self._g(i) - self.Y[i]

    def _init_alpha(self):
        # outer loop: first traverse all samples with 0 < alpha < C and check KKT
        index_list = [i for i in range(self.m) if 0 < self.alpha[i] < self.C]
        # then the rest of the training set
        non_satisfy_list = [i for i in range(self.m) if i not in index_list]
        index_list.extend(non_satisfy_list)
        # pick a sample that violates KKT, preferring those with 0 < alpha_i < C
        for i in index_list:
            if self._KKT(i):
                continue
            # inner loop: maximize |E1 - E2|
            E1 = self.E[i]
            # if E1 >= 0, choose the smallest E_i as E2; otherwise the largest
            if E1 >= 0:
                j = min(range(self.m), key=lambda x: self.E[x])
            else:
                j = max(range(self.m), key=lambda x: self.E[x])
            return i, j
        # every sample satisfies the KKT conditions
        return None

    def _compare(self, _alpha, L, H):
        if _alpha > H:
            return H
        elif _alpha < L:
            return L
        else:
            return _alpha

    def fit(self, features, labels):
        self.init_args(features, labels)

        for t in range(self.max_iter):
            # each pass selects one pair of variables, O(n) per pass
            pair = self._init_alpha()
            if pair is None:
                # every sample satisfies the KKT conditions
                break
            i1, i2 = pair

            # bounds L, H of alpha_2^new
            if self.Y[i1] == self.Y[i2]:
                # L = max(0, alpha_2 + alpha_1 - C)
                # H = min(C, alpha_2 + alpha_1)
                L = max(0, self.alpha[i1] + self.alpha[i2] - self.C)
                H = min(self.C, self.alpha[i1] + self.alpha[i2])
            else:
                # L = max(0, alpha_2 - alpha_1)
                # H = min(C, C + alpha_2 - alpha_1)
                L = max(0, self.alpha[i2] - self.alpha[i1])
                H = min(self.C, self.C + self.alpha[i2] - self.alpha[i1])

            E1 = self.E[i1]
            E2 = self.E[i2]
            # eta=K11+K22-2K12= ||phi(x_1) - phi(x_2)||^2
            eta = self.kernel(self.X[i1], self.X[i1]) + self.kernel(
                self.X[i2],
                self.X[i2]) - 2 * self.kernel(self.X[i1], self.X[i2])
            if eta <= 0:
                # print('eta <= 0')
                continue
            # update along the constraint direction
            alpha2_new_unc = self.alpha[i2] + self.Y[i2] * (
                E1 - E2) / eta  # corrected per the book, pp. 130-131: this should be E1 - E2
            alpha2_new = self._compare(alpha2_new_unc, L, H)

            alpha1_new = self.alpha[i1] + self.Y[i1] * self.Y[i2] * (
                self.alpha[i2] - alpha2_new)

            b1_new = -E1 - self.Y[i1] * self.kernel(self.X[i1], self.X[i1]) * (
                alpha1_new - self.alpha[i1]) - self.Y[i2] * self.kernel(
                    self.X[i2],
                    self.X[i1]) * (alpha2_new - self.alpha[i2]) + self.b
            b2_new = -E2 - self.Y[i1] * self.kernel(self.X[i1], self.X[i2]) * (
                alpha1_new - self.alpha[i1]) - self.Y[i2] * self.kernel(
                    self.X[i2],
                    self.X[i2]) * (alpha2_new - self.alpha[i2]) + self.b

            if 0 < alpha1_new < self.C:
                b_new = b1_new
            elif 0 < alpha2_new < self.C:
                b_new = b2_new
            else:
                # take the midpoint
                b_new = (b1_new + b2_new) / 2

            # update the parameters
            self.alpha[i1] = alpha1_new
            self.alpha[i2] = alpha2_new
            self.b = b_new

            self.E[i1] = self._E(i1)
            self.E[i2] = self._E(i2)
        return 'train done!'

    def predict(self, data):
        r = self.b
        for i in range(self.m):
            r += self.alpha[i] * self.Y[i] * self.kernel(data, self.X[i])

        return 1 if r > 0 else -1

    def score(self, X_test, y_test):
        right_count = 0
        for i in range(len(X_test)):
            result = self.predict(X_test[i])
            if result == y_test[i]:
                right_count += 1
        return right_count / len(X_test)

    def _weight(self):
        # linear model
        yx = self.Y.reshape(-1, 1) * self.X
        self.w = np.dot(yx.T, self.alpha)
        return self.w
        
def normalize(x):
    return (x - np.min(x))/(np.max(x) - np.min(x))
def get_datasets():
    # breast cancer for classification(2 classes)
    X, y = datasets.load_breast_cancer(return_X_y=True)
    # normalize features
    X_norm = normalize(X)
    X_train = X_norm[:int(len(X_norm)*0.8)]
    X_test = X_norm[int(len(X_norm)*0.8):]
    y_train = y[:int(len(X_norm)*0.8)]
    y_test = y[int(len(X_norm)*0.8):]
    return X_train,y_train,X_test,y_test
if __name__ == '__main__':
    X_train,y_train,X_test,y_test = get_datasets()
    svm = SVM(max_iter=200)
    svm.fit(X_train, y_train)
    print("accuracy:{:.4f}".format(svm.score(X_test, y_test)))

Run result:

accuracy:0.6491

Q: Why is the SVM not well suited to large-scale data? Here, "large-scale" means that both the number of samples and the number of features are large.

A: My own understanding: for a linear-kernel SVM, one iteration costs roughly $O(n^{2}k)$, where $n$ is the number of samples and $k$ the number of features, because for each sample checked against the KKT conditions the remaining samples must be traversed to update the parameters until the objective stops decreasing. Logistic regression, by comparison, costs about $O(n)$ per iteration: it multiplies the sample matrix by the weight vector (matrix operations are fast, and I think the feature dimension can be ignored).

Nonlinear-kernel SVM: a nonlinear feature map sends the low-dimensional features to a high-dimensional space, and the kernel trick computes the inner products between the high-dimensional features.


Because the dataset's kernel matrix $K$ (where $K[i][j]$ is the kernel value between two samples) describes pairwise similarities between samples, the number of matrix entries grows quadratically with the data size.
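A back-of-the-envelope sketch of that quadratic growth (added here; it only counts the float64 entries of an n x n kernel matrix):

for n in (1_000, 10_000, 100_000):
    gib = n * n * 8 / 1024 ** 3   # 8 bytes per float64 entry
    print(f"n = {n:>7,}: kernel matrix ~ {gib:,.2f} GiB")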

References:

  1. Li Hang, Statistical Learning Methods.
  2. sklearn SVM.
  3. GitHub: lihang-code, SVM.
  4. Is the SVM suitable for large-scale data? (支持向量机(SVM)是否适合大规模数据?) - Zhihu.
  5. GitHub: kernel-svm/svm.py.