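Problem
Given samples $[x_1, x_2, \ldots, x_N]$ with labels $[y_1, y_2, \ldots, y_N]$, $y_i \in \{-1, +1\}$ (labels given as 0/1 are remapped to $-1$/$+1$ so that the margin constraint below applies), use the SVM method to find the hyperplane that best separates the samples, and write out the computation.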
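1 Constructing the optimization problem
Solution: suppose there is a hyperplane $w \cdot x + b = 0$ that separates the samples perfectly. By rescaling $w$ and $b$ we can always find two hyperplanes $w \cdot x + b = -1$ and $w \cdot x + b = 1$ such that every sample lies on one of the two planes or outside them, as shown in the figure below.
That is, the samples satisfy:
$$ y_i (w \cdot x_i + b) \ge 1 \tag{1} $$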
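For a linearly separable training set, however, there are infinitely many separating hyperplanes, so how should these two planes be chosen?
We want the two planes to separate the sample points as well as possible. What does "as well as possible" mean? Intuitively, the farther apart the two planes are, the more clearly the two classes are distinguished and the better the classification. The objective is therefore:
$$ \max d \tag{2} $$
where $d$ is the distance between the two bounding planes. The separating hyperplane that attains this maximum is unique.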
Let $x_1$ and $x_2$ be two points on $w \cdot x + b = -1$ and $w \cdot x + b = 1$ respectively (points on the planes, not necessarily samples), chosen so that the segment $\overrightarrow{x_1 x_2}$ is perpendicular to both planes, i.e. $\|\overrightarrow{x_1 x_2}\| = d$.
Since $w$ is the normal vector of both planes, the segment is parallel to $w$:
$$ \overrightarrow{x_1 x_2} = x_2 - x_1 = \lambda w \tag{3} $$
Substituting (3) into $w \cdot x_2 + b = 1$ gives:
$$ w \cdot (x_1 + \lambda w) + b = 1 \tag{4} $$
Substituting $w \cdot x_1 + b = -1$ into (4) gives:
$$ \lambda \|w\|^2 = 2 \tag{5} $$
Hence:
$$ \max d = \max \|x_2 - x_1\| = \max \lambda \|w\| = \max \frac{2}{\|w\|^2}\|w\| = \max \frac{2}{\|w\|} $$
which is equivalent to $\min \frac{\|w\|^2}{2}$.
The original problem thus becomes a convex optimization problem:
$$ \min_{w,b} \frac{\|w\|^2}{2} \tag{6} $$
$$ \text{s.t.} \quad y_i (w \cdot x_i + b) \ge 1, \quad i = 1, \ldots, N $$
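The margin formula $d = 2/\|w\|$ derived above can be checked numerically. The following is a minimal sketch using only the Python standard library; the values of `w` and `b` are arbitrary assumptions chosen for illustration, and the construction of `x1` and `x2` mirrors equations (3) to (5).

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical hyperplane parameters (assumed for illustration only).
w = (3.0, 4.0)
b = -2.0
norm_w = math.sqrt(dot(w, w))

# Pick x1 on the plane w.x + b = -1 by moving along w from the origin.
t1 = (-1.0 - b) / dot(w, w)
x1 = tuple(t1 * c for c in w)

# Equation (5): lambda * ||w||^2 = 2, then equation (3): x2 = x1 + lambda * w.
lam = 2.0 / dot(w, w)
x2 = tuple(p + lam * c for p, c in zip(x1, w))

# Distance between the two points, which should equal 2 / ||w||.
d = math.sqrt(sum((p - q) ** 2 for p, q in zip(x2, x1)))
print(d, 2.0 / norm_w)
```

Both printed numbers should agree up to floating-point rounding, and `x2` should land on the plane $w \cdot x + b = 1$.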
2 Solving via the Lagrangian dual
Construct the Lagrangian:
$$ L(w, b, \alpha) = \frac{\|w\|^2}{2} + \sum_{i=1}^{N} \alpha_i \left(1 - y_i (w \cdot x_i + b)\right) \tag{7} $$
where $\alpha_i \ge 0$ are the Lagrange multipliers.
By Lagrangian duality, the dual of the original problem is a max-min problem:
$$ \max_{\alpha} \min_{w,b} L(w, b, \alpha) \tag{8} $$
First solve $\min_{w,b} L(w, b, \alpha)$ by taking the gradients with respect to $w$ and $b$ and setting them to zero:
$$ \nabla_w L(w, b, \alpha) = w - \sum_{i=1}^{N} \alpha_i y_i x_i = 0 $$
$$ \nabla_b L(w, b, \alpha) = -\sum_{i=1}^{N} \alpha_i y_i = 0 $$
This gives:
$$ w = \sum_{i=1}^{N} \alpha_i y_i x_i \tag{9} $$
$$ \sum_{i=1}^{N} \alpha_i y_i = 0 \tag{10} $$
Substituting (9) and (10) into (7) gives:
$$ \begin{aligned} \min_{w,b} L(w, b, \alpha) &= \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) + \sum_{i=1}^{N} \alpha_i - \sum_{i=1}^{N} \alpha_i y_i \left(\left(\sum_{j=1}^{N} \alpha_j y_j x_j\right) \cdot x_i\right) - b\sum_{i=1}^{N} \alpha_i y_i \\ &= -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) + \sum_{i=1}^{N} \alpha_i \end{aligned} $$
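The substitution of (9) and (10) into (7) can be sanity-checked numerically: for any $\alpha$ with $\sum_i \alpha_i y_i = 0$ and any $b$, the Lagrangian evaluated at $w = \sum_i \alpha_i y_i x_i$ should equal the simplified expression. The sketch below uses a hypothetical three-point dataset and arbitrary values of `alpha` and `b`; none of these numbers come from the derivation itself.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical toy data (assumed for illustration).
X = [(3.0, 3.0), (4.0, 3.0), (1.0, 1.0)]
y = [1, 1, -1]
alpha = [0.3, 0.2, 0.5]  # chosen so that sum(alpha_i * y_i) = 0, as in (10)
b = 7.0                  # arbitrary: the b term vanishes because of (10)
n = len(X)

# Equation (9): w = sum_i alpha_i * y_i * x_i
w = [sum(alpha[i] * y[i] * X[i][k] for i in range(n)) for k in range(2)]

# Equation (7) evaluated at this w.
lagrangian = 0.5 * dot(w, w) + sum(
    alpha[i] * (1 - y[i] * (dot(w, X[i]) + b)) for i in range(n)
)

# Simplified expression after substitution.
dual_value = -0.5 * sum(
    alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
    for i in range(n) for j in range(n)
) + sum(alpha)

print(lagrangian, dual_value)
```

The two printed values should match up to rounding error, whatever value is assigned to `b`.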
Next, maximize $\min_{w,b} L(w, b, \alpha)$ over $\alpha$, which is the dual problem:
$$ \max_{\alpha} \; -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) + \sum_{i=1}^{N} \alpha_i \tag{11} $$
$$ \text{s.t.} \quad \sum_{i=1}^{N} \alpha_i y_i = 0 $$
$$ \alpha_i \ge 0, \quad i = 1, \ldots, N $$
This is equivalent to:
$$ \min_{\alpha} \; \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^{N} \alpha_i \tag{12} $$
$$ \text{s.t.} \quad \sum_{i=1}^{N} \alpha_i y_i = 0 $$
$$ \alpha_i \ge 0, \quad i = 1, \ldots, N $$
Problem (12) is the dual of problem (6).
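As a small worked instance of (12), assume a hypothetical dataset that is not part of the derivation above: positive samples $x_1 = (3, 3)$, $x_2 = (4, 3)$ and one negative sample $x_3 = (1, 1)$. The dual then reads:
$$ \min_{\alpha} \; \frac{1}{2}\left(18\alpha_1^2 + 25\alpha_2^2 + 2\alpha_3^2 + 42\alpha_1\alpha_2 - 12\alpha_1\alpha_3 - 14\alpha_2\alpha_3\right) - \alpha_1 - \alpha_2 - \alpha_3 $$
$$ \text{s.t.} \quad \alpha_1 + \alpha_2 - \alpha_3 = 0, \quad \alpha_i \ge 0 $$
Eliminating $\alpha_3 = \alpha_1 + \alpha_2$ leaves
$$ s(\alpha_1, \alpha_2) = 4\alpha_1^2 + \frac{13}{2}\alpha_2^2 + 10\alpha_1\alpha_2 - 2\alpha_1 - 2\alpha_2 $$
Its unconstrained stationary point is $(\alpha_1, \alpha_2) = (\tfrac{3}{2}, -1)$, which violates $\alpha_2 \ge 0$, so the minimum lies on the boundary. On $\alpha_2 = 0$ the minimum is $s(\tfrac{1}{4}, 0) = -\tfrac{1}{4}$; on $\alpha_1 = 0$ it is $s(0, \tfrac{2}{13}) = -\tfrac{2}{13}$. The first is smaller, so $\alpha^* = (\tfrac{1}{4}, 0, \tfrac{1}{4})$, and therefore $w^* = \tfrac{1}{4}x_1 - \tfrac{1}{4}x_3 = (\tfrac{1}{2}, \tfrac{1}{2})$. Because $x_1$ has $\alpha_1^* > 0$, it lies on the margin, so $b^* = y_1 - w^* \cdot x_1 = -2$.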
Finally, the SMO (sequential minimal optimization) algorithm can be used to obtain the solution $\alpha^*$ of the dual problem. Equation (9) then gives $w^* = \sum_{i=1}^{N} \alpha_i^* y_i x_i$. The bias $b^*$ follows from the KKT complementary slackness condition rather than from (10): for any support vector $x_j$ with $\alpha_j^* > 0$ we have $y_j (w^* \cdot x_j + b^*) = 1$, so $b^* = y_j - \sum_{i=1}^{N} \alpha_i^* y_i (x_i \cdot x_j)$. This yields the optimal hyperplane $w^* \cdot x + b^* = 0$, i.e. $\sum_{i=1}^{N} \alpha_i^* y_i (x_i \cdot x) + b^* = 0$, and the classification decision function:
$$ f(x) = \operatorname{sign}\left(\sum_{i=1}^{N} \alpha_i^* y_i (x_i \cdot x) + b^*\right) \tag{13} $$
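The whole pipeline (solve the dual, recover $w^*$ and $b^*$, classify with (13)) can be sketched in a few lines. The code below is a simplified SMO-style pairwise update for the hard-margin dual (12), not the full SMO algorithm with its pair-selection heuristics. The function names, the sweep limit, the tolerances and the toy dataset are all assumptions made for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def train_hard_margin_svm(X, y, sweeps=100):
    """Simplified pairwise coordinate updates on the dual (12); labels must be -1/+1."""
    n = len(X)
    K = [[dot(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(sweeps):
        changed = False
        for i in range(n):
            for j in range(i + 1, n):
                eta = K[i][i] + K[j][j] - 2.0 * K[i][j]
                if eta <= 1e-12:  # identical points: no usable direction
                    continue
                g_i = sum(alpha[k] * y[k] * K[k][i] for k in range(n))
                g_j = sum(alpha[k] * y[k] * K[k][j] for k in range(n))
                # Error difference E_i - E_j; the bias b cancels out.
                diff = (g_i - y[i]) - (g_j - y[j])
                a_j = alpha[j] + y[j] * diff / eta
                # Clip so that alpha_i, alpha_j >= 0 and sum(alpha*y) stays 0.
                if y[i] != y[j]:
                    a_j = max(a_j, 0.0, alpha[j] - alpha[i])
                else:
                    a_j = min(max(a_j, 0.0), alpha[i] + alpha[j])
                if abs(a_j - alpha[j]) > 1e-12:
                    alpha[i] += y[i] * y[j] * (alpha[j] - a_j)
                    alpha[j] = a_j
                    changed = True
        if not changed:
            break
    # Equation (9): w* = sum_i alpha_i* y_i x_i
    w = [sum(alpha[k] * y[k] * X[k][d] for k in range(n))
         for d in range(len(X[0]))]
    # b* from a support vector (the sample with the largest alpha).
    s = max(range(n), key=lambda k: alpha[k])
    b = y[s] - dot(w, X[s])
    return alpha, w, b

def predict(x, w, b):
    """Decision function (13), written in terms of w* instead of the sum."""
    return 1 if dot(w, x) + b >= 0 else -1

# Hypothetical toy data: two positive samples and one negative sample.
X = [(3.0, 3.0), (4.0, 3.0), (1.0, 1.0)]
y = [1, 1, -1]
alpha, w, b = train_hard_margin_svm(X, y)
print(alpha, w, b)
```

For a linearly separable toy set like this one, the resulting `w` and `b` should satisfy constraint (1) for every sample, with equality for the samples whose `alpha` is positive. Practical solvers add a soft-margin bound on `alpha` and smarter pair selection.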