Support Vector Machines (SVM) and a Python Implementation

0. Introduction

A support vector machine (SVM) is a binary classification model. Strategy: margin maximization, which is equivalent to minimizing a regularized hinge loss.

Learning algorithm: Sequential Minimal Optimization (SMO).

Variants: the linearly separable SVM, the linear SVM, and the nonlinear SVM.

1. Linearly separable SVM. Characteristics: the training data are linearly separable; the strategy is hard-margin maximization; the model is a linear classifier.

Model: the classification decision function is

$$f(x)=\operatorname{sign}\left(w^{*} \cdot x+b^{*}\right)$$

The separating hyperplane is

$$w^{*} \cdot x+b^{*}=0$$

The functional margin of the hyperplane with respect to a sample point $(x_i, y_i)$ is defined as

$$\hat{\gamma}_{i}=y_{i}\left(w \cdot x_{i}+b\right)$$

The geometric margin of the hyperplane with respect to a sample point $(x_i, y_i)$ is defined as

$$\gamma_{i}=y_{i}\left(\frac{w}{\|w\|} \cdot x_{i}+\frac{b}{\|w\|}\right)$$

The geometric margin is the true distance from a point to the hyperplane. Define the minimum margin over all sample points:

$$\gamma=\min_{i=1, \ldots, N} \gamma_{i}$$

Margin maximization: find the hyperplane with the largest geometric margin over the training set, i.e., separate the training data with the largest possible confidence.
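
As a quick illustration of the two margin definitions, here is a minimal numpy sketch (not part of the original post; the three points and the hyperplane w = (0.5, 0.5), b = -2 are made up for the example):

import numpy as np

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
w, b = np.array([0.5, 0.5]), -2.0

functional = y * (X @ w + b)                    # hat{gamma}_i = y_i (w . x_i + b)
geometric = functional / np.linalg.norm(w)      # gamma_i = hat{gamma}_i / ||w||
print(functional, geometric, geometric.min())   # gamma = min_i gamma_i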

Below, the solution is derived with the maximum margin method and with the dual method.

Maximum margin method: 1) Construct the constrained optimization problem

$$\begin{aligned} &\max_{w, b} \quad \gamma \\ &\text{s.t.} \quad y_{i}\left(\frac{w}{\|w\|} \cdot x_{i}+\frac{b}{\|w\|}\right) \geqslant \gamma, \quad i=1,2, \cdots, N \end{aligned}$$

Fixing the functional margin to $\hat{\gamma}=\|w\|\gamma=1$, the problem above is easily seen to be equivalent to

$$\begin{aligned} &\min_{w, b} \quad \frac{1}{2}\|w\|^{2} \\ &\text{s.t.} \quad y_{i}\left(w \cdot x_{i}+b\right)-1 \geqslant 0, \quad i=1,2, \cdots, N \end{aligned}$$

2) Solve the constrained problem, which yields the separating hyperplane

$$w^{*} \cdot x+b^{*}=0$$
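
For reference, a minimal sketch of what this step looks like with an off-the-shelf solver (assumes scikit-learn; the tiny dataset and the very large C, used to approximate the hard margin, are choices made for this example, not part of the original text):

import numpy as np
from sklearn.svm import SVC

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])

clf = SVC(kernel='linear', C=1e6).fit(X, y)   # huge C approximates the hard margin
w, b = clf.coef_[0], clf.intercept_[0]
print(w, b)                                   # roughly w = [0.5, 0.5], b = -2
print(clf.support_vectors_)                   # the points lying on the margin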

Dual method: the dual formulation makes the problem easier to solve and naturally introduces kernel functions, which generalizes the method to nonlinear classification. 1. Define the Lagrangian

$$L(w, b, \alpha)=\frac{1}{2}\|w\|^{2}-\sum_{i=1}^{N} \alpha_{i} y_{i}\left(w \cdot x_{i}+b\right)+\sum_{i=1}^{N} \alpha_{i}$$

Optimization objective:

$$\max_{\alpha} \min_{w, b} L(w, b, \alpha)$$

2. Compute $\min_{w, b} L(w, b, \alpha)$: take the partial derivatives of $L$ with respect to $w$ and $b$ and set them to zero,

$$\begin{aligned} &\nabla_{w} L(w, b, \alpha)=w-\sum_{i=1}^{N} \alpha_{i} y_{i} x_{i}=0 \\ &\nabla_{b} L(w, b, \alpha)=\sum_{i=1}^{N} \alpha_{i} y_{i}=0 \end{aligned}$$

which gives

$$\begin{aligned} &w=\sum_{i=1}^{N} \alpha_{i} y_{i} x_{i} \\ &\sum_{i=1}^{N} \alpha_{i} y_{i}=0 \end{aligned}$$

3. Maximize $\min_{w, b} L(w, b, \alpha)$ with respect to $\alpha$. Substituting the results of step 2,

$$\min_{w, b} L(w, b, \alpha)=-\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(x_{i} \cdot x_{j}\right)+\sum_{i=1}^{N} \alpha_{i}$$

Maximizing $\min_{w, b} L(w, b, \alpha)$ over $\alpha$ is equivalent to the dual problem:

$$\begin{array}{ll} \min\limits_{\alpha} & \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(x_{i} \cdot x_{j}\right)-\sum_{i=1}^{N} \alpha_{i} \\ \text{s.t.} & \sum_{i=1}^{N} \alpha_{i} y_{i}=0 \\ & \alpha_{i} \geqslant 0, \quad i=1,2, \cdots, N \end{array}$$

Solving for $\alpha^{*}$ then yields

$$w^{*}=\sum_{i} \alpha_{i}^{*} y_{i} x_{i}$$

There exists a component $\alpha_{j}^{*}>0$ for which

$$y_{j}\left(w^{*} \cdot x_{j}+b^{*}\right)-1=0$$

so $b^{*}=y_{j}-\sum_{i=1}^{N} \alpha_{i}^{*} y_{i}\left(x_{i} \cdot x_{j}\right)$.

Therefore, the classification decision function is

$$f(x)=\operatorname{sign}\left(\sum_{i=1}^{N} \alpha_{i}^{*} y_{i}\left(x \cdot x_{i}\right)+b^{*}\right)$$
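
To make the dual route concrete, here is a minimal sketch (not from the original post) that solves the dual above numerically with scipy.optimize.minimize on the same three-point toy set and recovers w* and b*; the dataset and the solver choice are assumptions made for illustration:

import numpy as np
from scipy.optimize import minimize

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])   # toy, linearly separable
y = np.array([1.0, 1.0, -1.0])
G = (y[:, None] * X) @ (y[:, None] * X).T            # G[i, j] = y_i y_j (x_i . x_j)

def dual(a):                                         # dual objective to minimize
    return 0.5 * a @ G @ a - a.sum()

res = minimize(dual, np.zeros(len(y)), method='SLSQP',
               bounds=[(0, None)] * len(y),                         # alpha_i >= 0
               constraints={'type': 'eq', 'fun': lambda a: a @ y})  # sum_i alpha_i y_i = 0

alpha = res.x
w = (alpha * y) @ X                                  # w* = sum_i alpha_i* y_i x_i
j = int(np.argmax(alpha))                            # a component with alpha_j* > 0
b = y[j] - np.sum(alpha * y * (X @ X[j]))            # b* = y_j - sum_i alpha_i* y_i (x_i . x_j)
print(w, b)                                          # expect roughly w = [0.5, 0.5], b = -2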

Summary

The perceptron and the SVM share the same model form, but their learning strategies differ. The perceptron is driven by misclassified points and minimizes the distance of misclassified points to the hyperplane; the SVM maximizes the geometric margin of the sample points to the hyperplane, so the SVM is determined by the points closest to the hyperplane, which are called support vectors.

2. Linear SVM and soft-margin maximization. Characteristics: the training data are approximately linearly separable; the strategy is soft-margin maximization; the model is a linear classifier.

Linear non-separability means that some sample points cannot satisfy the functional-margin constraint $\hat{\gamma} \geqslant 1$. Introducing slack variables $\xi_{i}$ and a penalty parameter $C$ gives

$$\begin{aligned} &\min_{w, b, \xi} \quad \frac{1}{2}\|w\|^{2}+C \sum_{i=1}^{N} \xi_{i} \\ &\text{s.t.} \quad y_{i}\left(w \cdot x_{i}+b\right) \geqslant 1-\xi_{i}, \quad i=1,2, \cdots, N \\ &\qquad\ \ \xi_{i} \geqslant 0, \quad i=1,2, \cdots, N \end{aligned}$$

The objective balances two goals: keep the margin as large as possible and keep the number of misclassified points as small as possible. As $C$ tends to infinity, no misclassified point is tolerated, which tends toward overfitting; as $C$ tends to 0, only a large margin matters, a meaningful solution cannot be obtained and the algorithm may not converge, i.e., underfitting.
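
A minimal sketch of this trade-off (assumes scikit-learn; the dataset and the particular values of C are illustrative, not from the original post):

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    # small C: small ||w|| (wide margin, many margin violations);
    # large C: training errors penalized heavily (narrow margin)
    print(C, np.linalg.norm(clf.coef_), int(clf.n_support_.sum()), clf.score(X, y))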

Learning algorithm:

1. Construct the convex quadratic programming problem:

$$\begin{array}{ll} \min\limits_{\alpha} & \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(x_{i} \cdot x_{j}\right)-\sum_{i=1}^{N} \alpha_{i} \\ \text{s.t.} & \sum_{i=1}^{N} \alpha_{i} y_{i}=0 \\ & 0 \leqslant \alpha_{i} \leqslant C, \quad i=1,2, \cdots, N \end{array}$$

2. Solve it to obtain

$$w^{*}=\sum_{i=1}^{N} \alpha_{i}^{*} y_{i} x_{i}$$

Choose a component with $0<\alpha_{j}^{*}<C$ and compute

$$b^{*}=y_{j}-\sum_{i=1}^{N} y_{i} \alpha_{i}^{*}\left(x_{i} \cdot x_{j}\right)$$

Classification decision function:

$$f(x)=\operatorname{sign}\left(w^{*} \cdot x+b^{*}\right)$$

3. Nonlinear SVM and kernel functions. Characteristics: the training data are linearly non-separable; the learning strategy is the kernel trick plus soft-margin maximization; the model is a nonlinear SVM.

Idea: apply a nonlinear transformation that maps the input space into a feature space (a Hilbert space), turning the nonlinear problem into a linear one.

Kernel function: let $\phi$ be a mapping from the input space to the feature space such that for all $x, z$ in the input space

$$K(x, z)=\phi(x) \cdot \phi(z)$$

where $K(x, z)$ is the kernel function and $\phi(x)$ is the mapping. The inner products $x_{i} \cdot x_{j}$ in the original objective are replaced by $K(x_{i}, x_{j})$.

Positive definite kernel: $K(x, z)$ is a symmetric function whose corresponding Gram matrix is positive semi-definite.
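
A quick numerical illustration of that condition (a numpy sketch; the random data and the check are mine, not the original author's):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / (2 * 1.0 ** 2))   # Gaussian-kernel Gram matrix, sigma = 1
# symmetric, and all eigenvalues are (numerically) non-negative
print(np.allclose(K, K.T), np.linalg.eigvalsh(K).min() >= -1e-10)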

Commonly used kernels. Polynomial kernel:

$$K(x, z)=(x \cdot z+1)^{p}$$

Gaussian kernel:

$$K(x, z)=\exp\left(-\frac{\|x-z\|^{2}}{2 \sigma^{2}}\right)$$
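
Both kernels are a few lines of numpy. The explicit map `phi` below is an assumption I add only to check that, for p = 2 and 2-D inputs, the polynomial kernel really equals an inner product in a feature space; it is not part of the original text:

import numpy as np

def poly_kernel(x, z, p=2):
    return (np.dot(x, z) + 1) ** p

def gaussian_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

def phi(x):  # explicit feature map matching (x . z + 1)^2 for x = (x1, x2)
    x1, x2 = x
    return np.array([1, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 * x1, x2 * x2, np.sqrt(2) * x1 * x2])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(poly_kernel(x, z), phi(x) @ phi(z))   # the two values agree (both 4.0)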

Learning algorithm for the nonlinear support vector classifier: (1) Choose an appropriate kernel $K(x, z)$ and an appropriate parameter $C$, and construct and solve the optimization problem

$$\begin{aligned} &\min_{\alpha} \quad \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} K\left(x_{i}, x_{j}\right)-\sum_{i=1}^{N} \alpha_{i} \\ &\text{s.t.} \quad \sum_{i=1}^{N} \alpha_{i} y_{i}=0 \\ &\qquad\ \ 0 \leqslant \alpha_{i} \leqslant C, \quad i=1,2, \cdots, N \end{aligned}$$

to obtain the optimal solution $\alpha^{*}=\left(\alpha_{1}^{*}, \alpha_{2}^{*}, \cdots, \alpha_{N}^{*}\right)^{\mathrm{T}}$. (2) Choose a component of $\alpha^{*}$ with $0<\alpha_{j}^{*}<C$ and compute

$$b^{*}=y_{j}-\sum_{i=1}^{N} \alpha_{i}^{*} y_{i} K\left(x_{i}, x_{j}\right)$$

(3) Construct the decision function:

$$f(x)=\operatorname{sign}\left(\sum_{i=1}^{N} \alpha_{i}^{*} y_{i} K\left(x, x_{i}\right)+b^{*}\right)$$

When $K(x, z)$ is a positive definite kernel, the problem above is a convex quadratic program and a solution exists.
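
A minimal end-to-end sketch of this recipe with a library solver (assumes scikit-learn; the make_moons dataset and the gamma value are illustrative choices, not from the original post):

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

linear = SVC(kernel='linear', C=1.0).fit(X_tr, y_tr)
rbf = SVC(kernel='rbf', gamma=2.0, C=1.0).fit(X_tr, y_tr)   # gamma plays the role of 1 / (2 sigma^2)
# the Gaussian kernel should handle the non-linearly-separable moons better
print(linear.score(X_te, y_te), rbf.score(X_te, y_te))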

4. Learning algorithm: Sequential Minimal Optimization (SMO)

4.1 Principle

Sequential Minimal Optimization (SMO): 1. The solution is characterized by the KKT conditions. 2. If some variables violate the KKT conditions, pick two variables, fix all the others, and construct a quadratic programming subproblem in those two variables. The algorithm therefore consists of an analytic solution of the two-variable quadratic program and a heuristic for choosing the two variables.

Optimization objective:

$$\begin{aligned} &\min_{\alpha} \quad \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} K\left(x_{i}, x_{j}\right)-\sum_{i=1}^{N} \alpha_{i} \\ &\text{s.t.} \quad \sum_{i=1}^{N} \alpha_{i} y_{i}=0 \\ &\qquad\ \ 0 \leqslant \alpha_{i} \leqslant C, \quad i=1,2, \cdots, N \end{aligned}$$

The variables are the Lagrange multipliers; each variable $\alpha_{i}$ corresponds to one sample point $(x_{i}, y_{i})$, so the number of variables equals the number of training samples $N$. SMO is a heuristic algorithm. The idea: fix all but two variables, build the quadratic programming subproblem in those two, and solve that subproblem analytically, which speeds up the computation. Of the two variables, one is the multiplier that violates the KKT conditions most severely; the other is determined automatically by the equality constraint.

Solving the two-variable quadratic programming subproblem

$$\begin{array}{rl} \min\limits_{\alpha_{1}, \alpha_{2}} & W\left(\alpha_{1}, \alpha_{2}\right)=\frac{1}{2} K_{11} \alpha_{1}^{2}+\frac{1}{2} K_{22} \alpha_{2}^{2}+y_{1} y_{2} K_{12} \alpha_{1} \alpha_{2} \\ & \quad -\left(\alpha_{1}+\alpha_{2}\right)+y_{1} \alpha_{1} \sum_{i=3}^{N} y_{i} \alpha_{i} K_{i 1}+y_{2} \alpha_{2} \sum_{i=3}^{N} y_{i} \alpha_{i} K_{i 2} \\ \text{s.t.} & \alpha_{1} y_{1}+\alpha_{2} y_{2}=-\sum_{i=3}^{N} y_{i} \alpha_{i}=\zeta \\ & 0 \leqslant \alpha_{i} \leqslant C, \quad i=1,2 \end{array}$$

where $K_{ij}=K(x_{i}, x_{j})$.

Since $y_{i}$ is either $+1$ or $-1$, the constraint $\alpha_{1} y_{1}+\alpha_{2} y_{2}=\zeta$ restricts $(\alpha_{1}, \alpha_{2})$ to a line segment parallel to a diagonal of the box $[0, C] \times [0, C]$.

Since $0 \leqslant \alpha_{i} \leqslant C$ and $\alpha_{1} y_{1}+\alpha_{2} y_{2}=\text{const}$, the feasible range of $\alpha_{2}^{\text{new}}$ follows easily. If $y_{1} \neq y_{2}$:

$$L=\max\left(0, \alpha_{2}^{\text{old}}-\alpha_{1}^{\text{old}}\right), \quad H=\min\left(C, C+\alpha_{2}^{\text{old}}-\alpha_{1}^{\text{old}}\right)$$

If $y_{1}=y_{2}$:

$$L=\max\left(0, \alpha_{2}^{\text{old}}+\alpha_{1}^{\text{old}}-C\right), \quad H=\min\left(C, \alpha_{2}^{\text{old}}+\alpha_{1}^{\text{old}}\right)$$

Denote by $\alpha_{2}^{\text{new,unc}}$ the solution that ignores the box constraint $0 \leqslant \alpha_{i} \leqslant C$ on $\alpha_{2}$. Then:

$$\alpha_{2}^{\text{new,unc}}=\alpha_{2}^{\text{old}}+\frac{y_{2}\left(E_{1}-E_{2}\right)}{\eta}$$

where $E_{i}$ is the difference between the prediction of $g(x)$ at the input $x_{i}$ and the true value $y_{i}$:

$$E_{i}=g\left(x_{i}\right)-y_{i}=\left(\sum_{j=1}^{N} \alpha_{j} y_{j} K\left(x_{j}, x_{i}\right)+b\right)-y_{i}, \quad i=1,2$$

$$\eta=K_{11}+K_{22}-2 K_{12}=\left\|\Phi\left(x_{1}\right)-\Phi\left(x_{2}\right)\right\|^{2}$$

Clipping to the interval $[L, H]$ then gives

$$\alpha_{2}^{\text{new}}=\begin{cases} H, & \alpha_{2}^{\text{new,unc}}>H \\ \alpha_{2}^{\text{new,unc}}, & L \leqslant \alpha_{2}^{\text{new,unc}} \leqslant H \\ L, & \alpha_{2}^{\text{new,unc}}<L \end{cases}$$

where $H$ and $L$ are the upper and lower bounds of the feasible range of $\alpha_{2}^{\text{new}}$. Finally,

$$\alpha_{1}^{\text{new}}=\alpha_{1}^{\text{old}}+y_{1} y_{2}\left(\alpha_{2}^{\text{old}}-\alpha_{2}^{\text{new}}\right)$$
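
Put together, one two-variable update is only a few lines. This is a minimal standalone sketch of the step just described (the function name and the convention that K is a precomputed kernel matrix are assumptions for the example, not from the original code):

import numpy as np

def smo_pair_update(i1, i2, alpha, y, K, b, C):
    E1 = (alpha * y) @ K[:, i1] + b - y[i1]       # E_1 = g(x_1) - y_1
    E2 = (alpha * y) @ K[:, i2] + b - y[i2]       # E_2 = g(x_2) - y_2
    if y[i1] != y[i2]:
        L, H = max(0.0, alpha[i2] - alpha[i1]), min(C, C + alpha[i2] - alpha[i1])
    else:
        L, H = max(0.0, alpha[i2] + alpha[i1] - C), min(C, alpha[i2] + alpha[i1])
    eta = K[i1, i1] + K[i2, i2] - 2 * K[i1, i2]
    if eta <= 0 or L == H:
        return alpha[i1], alpha[i2]               # skip degenerate pairs
    a2 = np.clip(alpha[i2] + y[i2] * (E1 - E2) / eta, L, H)    # clipped alpha_2^new
    a1 = alpha[i1] + y[i1] * y[i2] * (alpha[i2] - a2)          # alpha_1^new
    return a1, a2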

4.2 Algorithm steps

In each subproblem SMO chooses two variables to optimize, at least one of which violates the KKT conditions.

(1) Choosing the first variable. Among the support vector points on the margin boundary ($0<\alpha_{i}<C$), pick a point that violates the KKT conditions; if there is none, pick one from the remaining points.

The KKT conditions, with $g(x_{i})=\sum_{j=1}^{N} \alpha_{j} y_{j} K(x_{j}, x_{i})+b$, are:

$$\begin{aligned} \alpha_{i}=0 &\;\Leftrightarrow\; y_{i} g\left(x_{i}\right) \geqslant 1 \\ 0<\alpha_{i}<C &\;\Leftrightarrow\; y_{i} g\left(x_{i}\right)=1 \\ \alpha_{i}=C &\;\Leftrightarrow\; y_{i} g\left(x_{i}\right) \leqslant 1 \end{aligned}$$

(2) Choosing the second variable. Choose the second variable so that $\left|E_{1}-E_{2}\right|$ is maximized, where $E_{i}$ is the difference between the prediction $g(x_{i})$ and the true output $y_{i}$:

$$E_{i}=g\left(x_{i}\right)-y_{i}=\left(\sum_{j=1}^{N} \alpha_{j} y_{j} K\left(x_{j}, x_{i}\right)+b\right)-y_{i}, \quad i=1,2$$

(3) Computing the threshold $b$ and the differences $E_{i}$ (the updates of $\alpha_{1}^{\text{new}}$ and $\alpha_{2}^{\text{new}}$ follow Section 4.1 above):

$$b_{1}^{\text{new}}=-E_{1}-y_{1} K_{11}\left(\alpha_{1}^{\text{new}}-\alpha_{1}^{\text{old}}\right)-y_{2} K_{21}\left(\alpha_{2}^{\text{new}}-\alpha_{2}^{\text{old}}\right)+b^{\text{old}}$$

$$b_{2}^{\text{new}}=-E_{2}-y_{1} K_{12}\left(\alpha_{1}^{\text{new}}-\alpha_{1}^{\text{old}}\right)-y_{2} K_{22}\left(\alpha_{2}^{\text{new}}-\alpha_{2}^{\text{old}}\right)+b^{\text{old}}$$

If some $\alpha_{i}^{\text{new}}$ ($i=1,2$) satisfies $0<\alpha_{i}^{\text{new}}<C$, choose $b^{\text{new}}=b_{i}^{\text{new}}$; otherwise take the midpoint of $b_{1}^{\text{new}}$ and $b_{2}^{\text{new}}$. Then

$$E_{i}^{\text{new}}=\sum_{S} y_{j} \alpha_{j} K\left(x_{i}, x_{j}\right)+b^{\text{new}}-y_{i}$$

where $S$ is the set of support vectors.

(4) Prediction. The decision function

$$f(x)=\operatorname{sign}\left(\sum_{i=1}^{N} \alpha_{i}^{*} y_{i} K\left(x, x_{i}\right)+b^{*}\right)$$

gives the predicted label of a sample.

5. Practice

The implementation below has a known limitation: it uses max_iter as the stopping criterion. A more faithful SMO would find the sample that violates the KKT conditions most, pair it with the remaining sample that maximizes |Ei - Ej|, and keep re-choosing the second sample until the objective stops decreasing, i.e., roughly O(n^2) work per pass. The code shown here is O(n * max_iter), which is not quite right.

from sklearn import datasets
import numpy as np


class SVM:
    def __init__(self, max_iter=100, kernel='linear'):
        self.max_iter = max_iter
        self._kernel = kernel

    def init_args(self, features, labels):
        # number of samples, feature dimension
        self.m, self.n = features.shape
        self.X = features
        self.Y = labels
        self.b = 0.0
        self.alpha = np.ones(self.m)
        # keep all E_i in a list
        self.E = [self._E(i) for i in range(self.m)]
        # penalty parameter
        self.C = 1.0

    def _KKT(self, i):
        y_g = self._g(i) * self.Y[i]
        if self.alpha[i] == 0:
            return y_g >= 1
        elif 0 < self.alpha[i] < self.C:
            return y_g == 1
        else:
            return y_g <= 1

    # g(x): prediction for the input x_i (X[i])
    # g(x) = \sum_{j=1}^N alpha_j * y_j * K(x_j, x) + b
    def _g(self, i):
        r = self.b
        for j in range(self.m):
            r += self.alpha[j] * self.Y[j] * self.kernel(self.X[i], self.X[j])
        return r

    # kernel function
    def kernel(self, x1, x2):
        if self._kernel == 'linear':
            return sum([x1[k] * x2[k] for k in range(self.n)])
        elif self._kernel == 'poly':
            return (sum([x1[k] * x2[k] for k in range(self.n)]) + 1)**2
        return 0

    # E(x): difference between the prediction g(x) and y
    # E_i = g(x_i) - y_i
    def _E(self, i):
        return self._g(i) - self.Y[i]

    def _init_alpha(self):
        # outer loop: first traverse all points with 0 < alpha_i < C
        index_list = [i for i in range(self.m) if 0 < self.alpha[i] < self.C]
        # then the rest of the training set
        non_satisfy_list = [i for i in range(self.m) if i not in index_list]
        index_list.extend(non_satisfy_list)

        # the outer loop picks the first point that violates the KKT conditions
        for i in index_list:
            if self._KKT(i):
                continue

            # inner loop: maximize |E1 - E2|
            E1 = self.E[i]
            # if E1 is positive, choose the smallest E_i as E2;
            # if E1 is negative, choose the largest E_i as E2
            if E1 >= 0:
                j = min(range(self.m), key=lambda x: self.E[x])
            else:
                j = max(range(self.m), key=lambda x: self.E[x])
            return i, j
        # every point satisfies the KKT conditions
        return None

    def _compare(self, _alpha, L, H):
        if _alpha > H:
            return H
        elif _alpha < L:
            return L
        else:
            return _alpha

    def fit(self, features, labels):
        self.init_args(features, labels)

        for t in range(self.max_iter):
            # each pass picks one working pair, O(n)
            pair = self._init_alpha()
            if pair is None:
                break
            i1, i2 = pair

            # clipping bounds; the threshold b and the differences E_i are updated below
            if self.Y[i1] == self.Y[i2]:
                # L = max(0, alpha_2 + alpha_1 - C)
                # H = min(C, alpha_2 + alpha_1)
                L = max(0, self.alpha[i1] + self.alpha[i2] - self.C)
                H = min(self.C, self.alpha[i1] + self.alpha[i2])
            else:
                # L = max(0, alpha_2 - alpha_1)
                # H = min(C, C + alpha_2 - alpha_1)
                L = max(0, self.alpha[i2] - self.alpha[i1])
                H = min(self.C, self.C + self.alpha[i2] - self.alpha[i1])

            E1 = self.E[i1]
            E2 = self.E[i2]
            # eta = K11 + K22 - 2*K12 = ||phi(x_1) - phi(x_2)||^2
            eta = self.kernel(self.X[i1], self.X[i1]) + self.kernel(
                self.X[i2], self.X[i2]) - 2 * self.kernel(self.X[i1], self.X[i2])
            if eta <= 0:
                # print('eta <= 0')
                continue

            # unconstrained solution along the constraint direction
            # (E1 - E2 here, following the book, pp. 130-131)
            alpha2_new_unc = self.alpha[i2] + self.Y[i2] * (E1 - E2) / eta
            alpha2_new = self._compare(alpha2_new_unc, L, H)
            alpha1_new = self.alpha[i1] + self.Y[i1] * self.Y[i2] * (
                self.alpha[i2] - alpha2_new)

            b1_new = -E1 - self.Y[i1] * self.kernel(self.X[i1], self.X[i1]) * (
                alpha1_new - self.alpha[i1]) - self.Y[i2] * self.kernel(
                    self.X[i2], self.X[i1]) * (alpha2_new - self.alpha[i2]) + self.b
            b2_new = -E2 - self.Y[i1] * self.kernel(self.X[i1], self.X[i2]) * (
                alpha1_new - self.alpha[i1]) - self.Y[i2] * self.kernel(
                    self.X[i2], self.X[i2]) * (alpha2_new - self.alpha[i2]) + self.b

            if 0 < alpha1_new < self.C:
                b_new = b1_new
            elif 0 < alpha2_new < self.C:
                b_new = b2_new
            else:
                # take the midpoint
                b_new = (b1_new + b2_new) / 2

            # update the parameters
            self.alpha[i1] = alpha1_new
            self.alpha[i2] = alpha2_new
            self.b = b_new

            self.E[i1] = self._E(i1)
            self.E[i2] = self._E(i2)
        return 'train done!'

    def predict(self, data):
        r = self.b
        for i in range(self.m):
            r += self.alpha[i] * self.Y[i] * self.kernel(data, self.X[i])
        return 1 if r > 0 else -1

    def score(self, X_test, y_test):
        right_count = 0
        for i in range(len(X_test)):
            result = self.predict(X_test[i])
            if result == y_test[i]:
                right_count += 1
        return right_count / len(X_test)

    def _weight(self):
        # linear model
        yx = self.Y.reshape(-1, 1) * self.X
        self.w = np.dot(yx.T, self.alpha)
        return self.w


def normalize(x):
    return (x - np.min(x)) / (np.max(x) - np.min(x))


def get_datasets():
    # breast cancer dataset for classification (2 classes)
    # note: the labels here are 0/1, while the derivation above assumes y in {-1, +1}
    X, y = datasets.load_breast_cancer(return_X_y=True)
    # normalization
    X_norm = normalize(X)
    X_train = X_norm[:int(len(X_norm) * 0.8)]
    X_test = X_norm[int(len(X_norm) * 0.8):]
    y_train = y[:int(len(X_norm) * 0.8)]
    y_test = y[int(len(X_norm) * 0.8):]
    return X_train, y_train, X_test, y_test


if __name__ == '__main__':
    X_train, y_train, X_test, y_test = get_datasets()
    svm = SVM(max_iter=200)
    svm.fit(X_train, y_train)
    print("accuracy:{:.4f}".format(svm.score(X_test, y_test)))

Output of a run:

accuracy:0.6491

Q: Why are SVMs unsuitable for large-scale data? Here, large-scale means both the number of samples and the number of features are large.

A: My understanding: for a linear-kernel SVM, one pass of training costs roughly $O(n^{2} k)$, where $n$ is the number of samples and $k$ the number of features, because for every sample that violates the KKT conditions the algorithm traverses the remaining samples to update the parameters until the objective stops decreasing. Logistic regression, by contrast, costs about $O(n)$: the samples are simply multiplied by the weight matrix (matrix operations are fast, so I think the feature dimension can be ignored).

Nonlinear-kernel SVM: a nonlinear feature map sends the low-dimensional features to a high-dimensional space, and the kernel trick computes the inner products between the high-dimensional features. Because the kernel matrix K of the dataset ($K[i][j]$ is the kernel value between two samples) describes pairwise similarity, the number of matrix entries grows quadratically with the size of the data.
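
A back-of-the-envelope check of that quadratic growth (illustrative numbers, assuming a dense float64 kernel matrix):

n = 100_000                      # number of samples
print(n * n * 8 / 1e9, "GB")     # ~80 GB just to store K, before any training happens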
