1. SVM Derivation: The Linearly Separable Case

Problem

Given samples $[x_1, x_2, \dots, x_N]$ with labels $[y_1, y_2, \dots, y_N]$, $y_i \in \{-1, +1\}$, use the SVM method to find the hyperplane that best separates the samples, writing out the full derivation.

1. Constructing the Optimization Problem

Solution: Suppose there exists a hyperplane $wx + b = 0$ that completely separates the samples. By rescaling $w$ and $b$, we can always find two hyperplanes $wx + b = -1$ and $wx + b = 1$ such that every sample lies on one of them or outside them, as shown in the figure below.

Figure 1: The two separating planes

That is, the samples satisfy:

$$y_i(wx_i + b) \ge 1 \tag{1}$$
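To make (1) concrete, here is a minimal Python sketch that checks the constraint on a tiny made-up 2D dataset with a hand-picked candidate plane $(w, b)$; both the data and the plane are assumptions for illustration only:

```python
import numpy as np

# Made-up toy 2D samples and +/-1 labels, for illustration only.
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# A hand-picked candidate plane w x + b = 0 (an assumption, not the SVM solution yet).
w = np.array([0.5, 0.5])
b = -1.0

# Constraint (1): every sample must lie on or outside its class's margin plane.
margins = y * (X @ w + b)
print(margins)                 # [1.  2.  1.  1.5]
print(np.all(margins >= 1.0))  # True: this (w, b) satisfies (1)
```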

However, for a linearly separable training set there are infinitely many separating hyperplanes. How, then, should we choose these two planes?

We want the two planes to separate the sample points as well as possible. What does "best" mean here? Intuitively, the larger the distance between the two planes, the more clearly the two classes are set apart and the better the classification. The objective is therefore:

$$\max d \tag{2}$$

where $d$ is the distance between the two separating planes; the pair of planes maximizing $d$ is unique.

Let $x_1$ and $x_2$ be points on $wx + b = -1$ and $wx + b = 1$ respectively, chosen so that the segment $x_1x_2$ is perpendicular to both planes, i.e. $\|x_2 - x_1\| = d$. Since $x_1x_2$ is then parallel to the normal vector $w$:

$$x_2 - x_1 = \lambda w \tag{3}$$

Substituting (3) into $wx_2 + b = 1$ gives:

$$w(x_1 + \lambda w) + b = 1 \tag{4}$$

and substituting $wx_1 + b = -1$ into (4) gives:

$$\lambda \|w\|^2 = 2 \tag{5}$$

Hence:

$$\max d = \max \|x_2 - x_1\| = \max \lambda\|w\| = \max \frac{2}{\|w\|^2}\|w\| = \max \frac{2}{\|w\|}$$

which is equivalent to $\min \frac{\|w\|^2}{2}$.
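A quick numerical sanity check of $d = 2/\|w\|$, using the same made-up plane as before: start from a point on $wx + b = -1$, step along $w$ by $\lambda = 2/\|w\|^2$ as in (3)-(5), and confirm the traveled distance equals $2/\|w\|$:

```python
import numpy as np

# Made-up plane from the sketch above.
w = np.array([0.5, 0.5])
b = -1.0

# x1 lies on w x + b = -1 (here x1 = (0, 0) gives w.x1 + b = -1).
x1 = np.array([0.0, 0.0])
assert np.isclose(w @ x1 + b, -1.0)

# Step along w by lambda = 2 / ||w||^2, as derived in (3)-(5).
lam = 2.0 / (w @ w)
x2 = x1 + lam * w
assert np.isclose(w @ x2 + b, 1.0)  # x2 lands on w x + b = 1

d = np.linalg.norm(x2 - x1)
print(d, 2.0 / np.linalg.norm(w))   # both ~2.8284, i.e. d = 2 / ||w||
```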

The original problem thus becomes the convex optimization problem:

$$\min_{w,b} \frac{\|w\|^2}{2} \tag{6}$$

$$\text{s.t.}\quad y_i(wx_i + b) \ge 1,\quad i = 1,\dots,N$$
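Problem (6) is a small quadratic program, so before turning to duality it can be sanity-checked with an off-the-shelf convex solver. A minimal sketch using cvxpy (the toy data is made up, and cvxpy is just one convenient QP solver, not part of the derivation):

```python
import cvxpy as cp
import numpy as np

# Same made-up toy data as before.
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()

# Problem (6): minimize ||w||^2 / 2  subject to  y_i (w x_i + b) >= 1.
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print(w.value, b.value)               # ~[0.5 0.5] and ~-1.0 on this toy data
print(2.0 / np.linalg.norm(w.value))  # the maximized margin 2 / ||w||
```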

2. Solving via the Lagrangian Dual

Construct the Lagrangian:

$$L(w,b,\alpha) = \frac{\|w\|^2}{2} + \sum_{i=1}^N \alpha_i \bigl(1 - y_i(wx_i + b)\bigr) \tag{7}$$

where the $\alpha_i \ge 0$ are Lagrange multipliers.

By Lagrangian duality, the dual of the original problem is the max-min problem:

$$\max_\alpha \min_{w,b} L(w,b,\alpha) \tag{8}$$

First solve $\min_{w,b} L(w,b,\alpha)$ by taking the gradients of $L$ with respect to $w$ and $b$ and setting them to zero:

$$\nabla_w L(w,b,\alpha) = w - \sum_{i=1}^N \alpha_i y_i x_i = 0$$

$$\nabla_b L(w,b,\alpha) = -\sum_{i=1}^N \alpha_i y_i = 0$$
which gives:
$$w = \sum_{i=1}^N \alpha_i y_i x_i \tag{9}$$

$$\sum_{i=1}^N \alpha_i y_i = 0 \tag{10}$$
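Conditions (9) and (10) can also be verified numerically: cvxpy exposes the optimal multipliers of each constraint through its `dual_value` attribute, so after solving the primal (6) we can read off $\alpha^*$ and check both identities. A sketch under the same toy-data assumptions as above:

```python
import cvxpy as cp
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)), constraints).solve()

alpha = constraints[0].dual_value  # the optimal multipliers alpha_i >= 0

# (9): w = sum_i alpha_i y_i x_i    (10): sum_i alpha_i y_i = 0
print(np.allclose(w.value, (alpha * y) @ X, atol=1e-4))  # True (up to solver tolerance)
print(np.isclose(alpha @ y, 0.0, atol=1e-4))             # True
```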

Substituting (9) and (10) back into (7) gives:
$$\begin{aligned} \min_{w,b} L(w,b,\alpha) &= \frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i\alpha_j y_i y_j x_i x_j + \sum_{i=1}^N \alpha_i - \sum_{i=1}^N \alpha_i y_i \Bigl(\Bigl(\sum_{j=1}^N \alpha_j y_j x_j\Bigr) x_i\Bigr) - b\sum_{i=1}^N \alpha_i y_i \\ &= -\frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i\alpha_j y_i y_j x_i x_j + \sum_{i=1}^N \alpha_i \end{aligned}$$

Then maximize $\min_{w,b} L(w,b,\alpha)$ over $\alpha$; this is the dual problem:
$$\max_\alpha\ -\frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i\alpha_j y_i y_j x_i x_j + \sum_{i=1}^N \alpha_i \tag{11}$$

$$\text{s.t.}\quad \sum_{i=1}^N \alpha_i y_i = 0$$

$$\alpha_i \ge 0,\quad i = 1,\dots,N$$
This is equivalent to:
$$\min_\alpha \frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i\alpha_j y_i y_j x_i x_j - \sum_{i=1}^N \alpha_i \tag{12}$$

$$\text{s.t.}\quad \sum_{i=1}^N \alpha_i y_i = 0$$

$$\alpha_i \ge 0,\quad i = 1,\dots,N$$
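The dual (12) is itself a small QP over $\alpha$. Noting that $\frac{1}{2}\sum_i\sum_j \alpha_i\alpha_j y_i y_j x_i x_j = \frac{1}{2}\bigl\|\sum_i \alpha_i y_i x_i\bigr\|^2$, it can be written directly in cvxpy (same made-up toy data as before):

```python
import cvxpy as cp
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

alpha = cp.Variable(len(y))

# 0.5 * sum_ij alpha_i alpha_j y_i y_j (x_i . x_j)  ==  0.5 * || X^T (alpha * y) ||^2
objective = cp.Minimize(0.5 * cp.sum_squares(X.T @ cp.multiply(alpha, y)) - cp.sum(alpha))
constraints = [alpha @ y == 0, alpha >= 0]
cp.Problem(objective, constraints).solve()

print(alpha.value)  # alpha*; the (near-)nonzero entries mark the support vectors
```

On this toy data only the samples lying exactly on the margin planes receive nonzero $\alpha_i^*$; the multipliers of all strictly-outside samples vanish by complementary slackness.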

Problem (12) is the dual of problem (6).
Finally, the SMO algorithm (Sequential Minimal Optimization) can be used to solve the dual problem for $\alpha_i^*$. Then $w^*$ follows from (9), and $b^*$ follows from any support vector $x_j$ (one with $\alpha_j^* > 0$): complementary slackness forces $y_j(w^* x_j + b^*) = 1$ there, so $b^* = y_j - \sum_{i=1}^N \alpha_i^* y_i (x_i x_j)$. This yields the optimal hyperplane $w^* x + b^* = 0$, i.e. $\sum_{i=1}^N \alpha_i^* y_i (x_i x) + b^* = 0$, and the classification decision function:
$$f(x) = \operatorname{sign}\Bigl(\sum_{i=1}^N \alpha_i^* y_i (x_i x) + b^*\Bigr) \tag{13}$$
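To close the loop, here is a sketch that recovers $w^*$, $b^*$ and the decision function (13) from a dual solution. For brevity $\alpha^*$ comes from the small QP above rather than a hand-written SMO (which is beyond a short sketch); the recovery of $b^*$ uses the support-vector identity stated above:

```python
import cvxpy as cp
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Solve the dual (12) as in the previous sketch to obtain alpha*.
a = cp.Variable(len(y))
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(X.T @ cp.multiply(a, y)) - cp.sum(a)),
           [a @ y == 0, a >= 0]).solve()
alpha = a.value

# (9): w* = sum_i alpha_i* y_i x_i
w = (alpha * y) @ X

# b*: pick a support vector j (largest alpha_j*); there y_j (w* x_j + b*) = 1.
j = int(np.argmax(alpha))
b = y[j] - (alpha * y) @ (X @ X[j])

# Decision function (13): f(x) = sign(sum_i alpha_i* y_i (x_i x) + b*).
def f(x):
    return np.sign((alpha * y) @ (X @ x) + b)

print(f(np.array([4.0, 4.0])), f(np.array([-2.0, -1.0])))  # 1.0 -1.0
```

Note that both $b^*$ and $f(x)$ depend on the training points only through inner products $x_i x$, which is what later makes the kernel trick possible.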
