3 Linear Support Vector Machines and Soft-Margin Maximization
A dataset in a feature space:
- $T=\left\{\left(x_{1}, y_{1}\right),\left(x_{2}, y_{2}\right), \cdots,\left(x_{N}, y_{N}\right)\right\}$, where $x_{i} \in \mathcal{X}=\mathbf{R}^{n}$, $y_{i} \in \mathcal{Y}=\{+1,-1\}$, $i=1,2, \cdots, N$; $x_i$ is the $i$-th feature vector (also called an instance), and $y_i$ is the class label of $x_i$
- $(x_i, y_i)$ is a sample point; when $y_i=+1$, $x_i$ is called a positive example, and when $y_i=-1$, $x_i$ is called a negative example
- Assume the dataset is not linearly separable, i.e. the training set contains some outlier points
Linear non-separability:
- Some sample points violate the constraint $y_{i}\left(w \cdot x_{i}+b\right)-1 \geqslant 0, \quad i=1,2, \cdots, N$
- Introduce a slack variable $\xi_{i} \geqslant 0$ for each sample point; the constraint becomes $y_{i}\left(w \cdot x_{i}+b\right) \geqslant 1-\xi_{i}$
- Objective function: $\frac{1}{2}\|w\|^{2}+C \sum_{i=1}^{N} \xi_{i}$, where $C>0$ is the penalty parameter
- The primal optimization problem is a convex quadratic program:

$$
\color{red}\begin{array}{ll}
\min _{w, b, \xi} & \frac{1}{2}\|w\|^{2}+C \displaystyle \sum_{i=1}^{N} \xi_{i} \\
\text { s.t. } & y_{i}\left(w \cdot x_{i}+b\right) \geqslant 1-\xi_{i}, \quad i=1,2, \cdots, N \\
 & \xi_{i} \geqslant 0, \quad i=1,2, \cdots, N
\end{array}
$$
- It can be shown that the solution for $w$ is unique, but **the solution for $b$ may not be unique: it can lie anywhere in an interval**
The dual learning algorithm:
The Lagrangian of the primal problem is:
$$
L(w, b, \xi, \alpha, \mu) \equiv \frac{1}{2}\|w\|^{2}+C \sum_{i=1}^{N} \xi_{i}-\sum_{i=1}^{N} \alpha_{i}\left(y_{i}\left(w \cdot x_{i}+b\right)-1+\xi_{i}\right)-\sum_{i=1}^{N} \mu_{i} \xi_{i}, \qquad \alpha_{i} \geqslant 0,\ \mu_{i} \geqslant 0
$$
The dual problem is the max-min problem of the Lagrangian. Setting the derivatives of $L(w, b, \xi, \alpha, \mu)$ with respect to $w, b, \xi$ to zero gives:
$$
\begin{aligned}
\nabla_{w} L(w, b, \xi, \alpha, \mu)&=w-\sum_{i=1}^{N} \alpha_{i} y_{i} x_{i}=0 \\
\nabla_{b} L(w, b, \xi, \alpha, \mu)&=-\sum_{i=1}^{N} \alpha_{i} y_{i}=0 \\
\nabla_{\xi_{i}} L(w, b, \xi, \alpha, \mu)&=C-\alpha_{i}-\mu_{i}=0
\end{aligned}
$$
Solving these yields:
$$
w=\sum_{i=1}^{N} \alpha_{i} y_{i} x_{i}, \qquad \sum_{i=1}^{N} \alpha_{i} y_{i}=0, \qquad C-\alpha_{i}-\mu_{i}=0
$$
Substituting these back into $L(w, b, \xi, \alpha, \mu)$ gives:
$$
\min _{w, b, \xi} L(w, b, \xi, \alpha, \mu)=-\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(x_{i} \cdot x_{j}\right)+\sum_{i=1}^{N} \alpha_{i}
$$
Maximizing this expression over $\alpha$ yields the dual problem:
$$
\begin{array}{ll}
\max _{\alpha} & -\frac{1}{2} \displaystyle \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(x_{i} \cdot x_{j}\right)+\displaystyle \sum_{i=1}^{N} \alpha_{i} \\
\text { s.t. } & \displaystyle \sum_{i=1}^{N} \alpha_{i} y_{i}=0 \\
 & C-\alpha_{i}-\mu_{i}=0 \\
 & \alpha_{i} \geqslant 0 \\
 & \mu_{i} \geqslant 0, \quad i=1,2, \cdots, N
\end{array}
$$
Since $\mu_{i}=C-\alpha_{i}$ with $\mu_{i} \geqslant 0$ and $\alpha_{i} \geqslant 0$, the last three constraints combine into $0 \leqslant \alpha_{i} \leqslant C$; negating the objective turns the maximization into a minimization:
$$
\color{red} \begin{array}{ll}
\min _{\alpha} & \frac{1}{2} \displaystyle \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(x_{i} \cdot x_{j}\right)-\displaystyle \sum_{i=1}^{N} \alpha_{i} \\
\text { s.t. } & \displaystyle \sum_{i=1}^{N} \alpha_{i} y_{i}=0 \\
 & 0 \leqslant \alpha_{i} \leqslant C, \quad i=1,2, \cdots, N
\end{array}
$$
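The boxed dual problem is a quadratic program with box constraints and a single linear equality constraint, so small instances can be handed to a generic solver. Below is a minimal numerical sketch, assuming SciPy's SLSQP solver is available; the toy dataset is made up for illustration:

```python
import numpy as np
from scipy.optimize import minimize

# Toy 2-D training set: two clusters plus one outlier (the last point),
# which makes the classes linearly non-separable, so the soft margin matters.
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 1.5],
              [0.0, 0.0], [1.0, 0.5], [2.2, 2.2]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
C = 1.0
N = len(y)

# Gram-like matrix K[i, j] = y_i y_j (x_i . x_j)
K = (y[:, None] * X) @ (y[:, None] * X).T

def neg_dual(a):
    # Negative dual objective, so that maximization becomes minimization
    return 0.5 * a @ K @ a - a.sum()

res = minimize(
    neg_dual,
    x0=np.zeros(N),
    jac=lambda a: K @ a - np.ones(N),
    bounds=[(0.0, C)] * N,                               # 0 <= alpha_i <= C
    constraints={"type": "eq", "fun": lambda a: a @ y},  # sum_i alpha_i y_i = 0
    method="SLSQP",
)
alpha = res.x  # approximate dual solution alpha*
```

The solution respects both constraint families: every $\alpha_i$ lands in $[0, C]$ and $\sum_i \alpha_i y_i \approx 0$.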
Solving the dual problem:
Let $\alpha^{*}=\left(\alpha_{1}^{*}, \alpha_{2}^{*}, \cdots, \alpha_{N}^{*}\right)^{\mathrm{T}}$ be a solution of the dual problem. If $\alpha^{*}$ has a component $\alpha_{j}^{*}$ with $0<\alpha_{j}^{*}<C$, then the solution $w^{*}, b^{*}$ of the primal problem can be obtained as:
$$
\color{red} \begin{aligned}
w^{*}&=\sum_{i=1}^{N} \alpha_{i}^{*} y_{i} x_{i} \\
b^{*}&=y_{j}-\sum_{i=1}^{N} y_{i} \alpha_{i}^{*}\left(x_{i} \cdot x_{j}\right)
\end{aligned}
$$
The separating hyperplane is then:
$$
\color{red} \sum_{i=1}^{N} \alpha_{i}^{*} y_{i}\left(x \cdot x_{i}\right)+b^{*}=0
$$
The classification decision function is:
$$
\color{red} f(x)=\operatorname{sign}\left(\sum_{i=1}^{N} \alpha_{i}^{*} y_{i}\left(x \cdot x_{i}\right)+b^{*}\right)
$$
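Putting the pieces together: solve the dual, recover $w^{*}$ and $b^{*}$ from a free component $0<\alpha_j^{*}<C$, and classify with the decision function above. A self-contained sketch on a made-up linearly separable dataset, again assuming SciPy's SLSQP solver:

```python
import numpy as np
from scipy.optimize import minimize

# Made-up separable 2-D dataset
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 1.5],
              [0.0, 0.0], [1.0, 0.5], [0.5, 1.0]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
C, N = 1.0, len(y)
K = (y[:, None] * X) @ (y[:, None] * X).T

# Solve the dual: min 1/2 a'Ka - sum(a), s.t. a.y = 0, 0 <= a <= C
res = minimize(lambda a: 0.5 * a @ K @ a - a.sum(), np.zeros(N),
               jac=lambda a: K @ a - np.ones(N),
               bounds=[(0.0, C)] * N,
               constraints={"type": "eq", "fun": lambda a: a @ y},
               method="SLSQP")
alpha = res.x

# Pick a free support vector with 0 < alpha_j < C to compute b*
j = int(np.argmax((alpha > 1e-6) & (alpha < C - 1e-6)))
w_star = (alpha * y) @ X                          # w* = sum_i alpha_i* y_i x_i
b_star = y[j] - (alpha * y) @ (X @ X[j])          # b* = y_j - sum_i y_i alpha_i* (x_i . x_j)

def f(x):
    # Dual-form decision function: sign(sum_i alpha_i* y_i (x . x_i) + b*)
    return np.sign((alpha * y) @ (X @ x) + b_star)

pred = np.array([f(x) for x in X])
```

On this dataset the optimal direction is along $(1,1)$ (the two class hulls face each other across the line $x_1+x_2=2.75$), so $w^{*}\approx(0.8,0.8)$ and all training points are classified correctly.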
Support vectors:
An instance point $x_{i} \in \mathbf{R}^{n}$ of a sample $\left(x_{i}, y_{i}\right)$ with $\alpha_{i}^{*}>0$ is called a support vector; the distance from $x_{i}$ to its margin boundary is $\frac{\xi_{i}}{\|w\|}$
- If $\alpha_{i}^{*}<C$, then $\xi_{i}=0$ and the support vector $x_{i}$ lies exactly on the margin boundary
- If $\alpha_{i}^{*}=C$ and $0<\xi_{i}<1$, the point is classified correctly and $x_{i}$ lies between the margin boundary and the separating hyperplane
- If $\alpha_{i}^{*}=C$ and $\xi_{i}=1$, then $x_{i}$ lies on the separating hyperplane
- If $\alpha_{i}^{*}=C$ and $\xi_{i}>1$, the point is misclassified and $x_{i}$ lies on the wrong side of the separating hyperplane
Hinge loss function:
The primal optimization problem of the linear support vector machine
$$
\begin{array}{ll}
\min _{w, b, \xi} & \frac{1}{2}\|w\|^{2}+C \displaystyle \sum_{i=1}^{N} \xi_{i} \\
\text { s.t. } & y_{i}\left(w \cdot x_{i}+b\right) \geqslant 1-\xi_{i}, \quad i=1,2, \cdots, N \\
 & \xi_{i} \geqslant 0, \quad i=1,2, \cdots, N
\end{array}
$$
is equivalent to the optimization problem:
$$
\min _{w, b} \sum_{i=1}^{N}\left[1-y_{i}\left(w \cdot x_{i}+b\right)\right]_{+}+\lambda\|w\|^{2}
$$
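This unconstrained hinge-loss form can be minimized directly, for example by subgradient descent. A minimal sketch on made-up data; the step size, iteration count, and value of $\lambda$ are arbitrary illustrative choices:

```python
import numpy as np

# Two made-up well-separated Gaussian clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 0.4, (20, 2)), rng.normal(0.0, 0.4, (20, 2))])
y = np.array([1.0] * 20 + [-1.0] * 20)

lam, lr = 0.01, 0.1
w, b = np.zeros(2), 0.0
for _ in range(500):
    margin = y * (X @ w + b)
    active = margin < 1  # points with nonzero hinge loss [1 - y(w.x+b)]_+
    # Subgradient of sum_i [1 - y_i(w.x_i + b)]_+ + lam * ||w||^2
    gw = -(y[active, None] * X[active]).sum(axis=0) + 2 * lam * w
    gb = -y[active].sum()
    # Step scaled by 1/N (a step-size choice, not part of the objective)
    w -= lr * gw / len(y)
    b -= lr * gb / len(y)

acc = np.mean(np.sign(X @ w + b) == y)  # training accuracy
```

Points with functional margin at least 1 contribute nothing to the subgradient, which is exactly the "zero loss above the hinge" behavior of $[\,\cdot\,]_{+}$.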
where $y_{i}\left(w \cdot x_{i}+b\right)$ is the functional margin (a measure of confidence) and $\lambda\|w\|^{2}$ is an $L_2$-norm regularization term on $w$; the two problems coincide when $\lambda=\frac{1}{2C}$.
The function $L(y(w \cdot x+b))=[1-y(w \cdot x+b)]_{+}$ is called the hinge loss function, where the subscript "+" denotes the positive-part function:
$$
[z]_{+}=\left\{\begin{array}{ll}
z, & z>0 \\
0, & z \leqslant 0
\end{array}\right.
$$
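As a tiny sketch, the positive-part function and the resulting hinge loss can be written out directly (the sample margin values below are arbitrary):

```python
def plus(z):
    # [z]_+ = z if z > 0, else 0
    return z if z > 0 else 0.0

def hinge_loss(margin):
    # L = [1 - y(w.x + b)]_+ where margin = y(w.x + b)
    return plus(1.0 - margin)

# margin >= 1: confident correct classification, zero loss
# 0 <= margin < 1: correct but not confident, positive loss
# margin < 0:  misclassified, loss grows linearly
losses = [hinge_loss(m) for m in [2.0, 1.0, 0.5, 0.0, -1.0]]
# -> [0.0, 0.0, 0.5, 1.0, 2.0]
```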
The graph of the hinge loss function: (figure omitted)
The hinge loss places a stronger requirement on $y_{i}\left(w \cdot x_{i}+b\right)$: not only must a point be classified correctly, its loss is zero only when it is classified correctly with sufficiently high confidence (functional margin at least 1).