Learning a linear support vector machine has another interpretation: it minimizes the following objective function:
$$
\sum_{i=1}^N [1 - y_i(w \cdot x_i + b)]_+ + \lambda \|w\|^2
$$
The first term of the objective function is the empirical loss, also called the empirical risk. The function $L(y(w \cdot x + b)) = [1 - y(w \cdot x + b)]_+$ is called the hinge loss function. The subscript $+$ denotes the positive-part function:
$$
[z]_+ = \begin{cases} z, & z > 0 \\ 0, & z \le 0 \end{cases}
$$
The second term of the objective function is the squared $L_2$ norm of $w$ with coefficient $\lambda$; it is the regularization term.
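The two terms of the objective can be evaluated directly. The following is a minimal sketch in plain Python; the function names (`positive_part`, `hinge_objective`) and the toy data are assumptions made for illustration and do not come from the original text.

```python
def positive_part(z):
    """[z]_+ : returns z when z > 0, otherwise 0."""
    return z if z > 0 else 0.0


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def hinge_objective(w, b, X, y, lam):
    """Sum of hinge losses over the samples plus lam * ||w||^2."""
    empirical = sum(
        positive_part(1 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)
    )
    penalty = lam * dot(w, w)
    return empirical + penalty


# Hypothetical toy data: three points with labels in {+1, -1}.
X = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]]
y = [1, 1, -1]
w, b, lam = [1.0, 0.0], 0.0, 0.5

# Functional margins are 2, 0.5 and 1, so only the second point
# contributes a loss (0.5); the penalty is 0.5 * 1 = 0.5.
print(hinge_objective(w, b, X, y, lam))  # prints 1.0
```

Only points whose functional margin $y_i(w \cdot x_i + b)$ is below 1 add to the empirical term, which is why the first and third points contribute nothing here.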
Theorem and proof
The primal optimization problem of the linear support vector machine:
$$
\min_{w,b,\xi} \dfrac{1}{2} \|w\|^2 + C \sum_{i=1}^{N} \xi_i \tag{1}
$$
$$
\text{s.t.} \quad y_i(w \cdot x_i + b) \ge 1 - \xi_i, \quad i = 1, 2, \dots, N \tag{2}
$$
$$
\xi_i \ge 0, \quad i = 1, 2, \dots, N \tag{3}
$$
is equivalent to the optimization problem
$$
\min_{w,b} \sum_{i=1}^N [1 - y_i(w \cdot x_i + b)]_+ + \lambda \|w\|^2 \tag{4}
$$
Proof:
Let $[1 - y_i(w \cdot x_i + b)]_+ = \xi_i$. Then $\xi_i \ge 0$, so (3) holds.
When $1 - y_i(w \cdot x_i + b) > 0$, we have $1 - y_i(w \cdot x_i + b) = \xi_i$, that is, $y_i(w \cdot x_i + b) = 1 - \xi_i$; when $1 - y_i(w \cdot x_i + b) \le 0$, we have $\xi_i = 0$ and therefore $y_i(w \cdot x_i + b) \ge 1 = 1 - \xi_i$. In both cases $y_i(w \cdot x_i + b) \ge 1 - \xi_i$, so (2) holds.
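The case analysis can be checked numerically: setting each slack to the hinge value should always give a feasible point for constraints (2) and (3). The sketch below assumes plain Python and hypothetical helper names (`slack_from_hinge`, `satisfies_constraints`) and data chosen only for illustration.

```python
def functional_margin(w, b, x_i, y_i):
    """y_i * (w . x_i + b)."""
    return y_i * (sum(wj * xj for wj, xj in zip(w, x_i)) + b)


def slack_from_hinge(w, b, X, y):
    """xi_i = [1 - y_i (w . x_i + b)]_+ for every sample."""
    return [max(0.0, 1.0 - functional_margin(w, b, x_i, y_i))
            for x_i, y_i in zip(X, y)]


def satisfies_constraints(w, b, X, y, slacks, tol=1e-12):
    """Check constraint (2) (margin) and constraint (3) (non-negativity)."""
    for x_i, y_i, s in zip(X, y, slacks):
        if s < 0:                                            # violates (3)
            return False
        if functional_margin(w, b, x_i, y_i) < 1.0 - s - tol:  # violates (2)
            return False
    return True


# Hypothetical 1-D data: the third point is misclassified by w = 1, b = 0.
X = [[2.0], [0.5], [1.0]]
y = [1, 1, -1]
w, b = [1.0], 0.0

slacks = slack_from_hinge(w, b, X, y)
print(slacks)                                      # prints [0.0, 0.5, 2.0]
print(satisfies_constraints(w, b, X, y, slacks))   # prints True
```

The first point falls in the case $1 - y_i(w \cdot x_i + b) \le 0$ and gets zero slack; the other two fall in the case $> 0$ and meet constraint (2) with equality.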
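Thus $w, b, \xi$ satisfy constraints (2) and (3), so optimization problem (4) can be written as $\min\limits_{w,b} \sum_{i=1}^N \xi_i + \lambda \|w\|^2$. Taking $C \cdot 2\lambda = 1$, that is, $\lambda = \dfrac{1}{2C}$, this becomes $\min\limits_{w,b} \dfrac{1}{C} \left( \dfrac{1}{2} \|w\|^2 + C \sum_{i=1}^N \xi_i \right)$. Since the positive factor $\dfrac{1}{C}$ does not change the minimizer, this is the objective of problem (1). Conversely, for fixed $w, b$ the smallest $\xi_i$ allowed by (2) and (3) is $[1 - y_i(w \cdot x_i + b)]_+$, so problem (1)-(3) can likewise be written in the form (4).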
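The scaling step with $C \cdot 2\lambda = 1$ can also be verified numerically. The following is a minimal sketch in plain Python; the data, the candidate parameters, and the function names (`objective_4`, `primal_objective`) are assumptions for illustration only.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def hinge_slacks(w, b, X, y):
    """xi_i = [1 - y_i (w . x_i + b)]_+ for every sample."""
    return [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]


def objective_4(w, b, X, y, lam):
    """Objective of problem (4): hinge losses plus lam * ||w||^2."""
    return sum(hinge_slacks(w, b, X, y)) + lam * dot(w, w)


def primal_objective(w, slacks, C):
    """Objective of problem (1): 0.5 * ||w||^2 + C * sum of slacks."""
    return 0.5 * dot(w, w) + C * sum(slacks)


# Hypothetical data and candidate (w, b) pairs.
X = [[1.0, 2.0], [2.0, -1.0], [-1.5, 0.5], [0.0, -2.0]]
y = [1, 1, -1, -1]
C = 4.0
lam = 1.0 / (2.0 * C)  # chosen so that C * 2 * lam == 1

candidates = [([0.5, 0.5], 0.0), ([1.0, -0.3], 0.2), ([-0.4, 0.9], -1.0)]
checks = []
for w, b in candidates:
    lhs = objective_4(w, b, X, y, lam)
    rhs = primal_objective(w, hinge_slacks(w, b, X, y), C) / C
    checks.append(math.isclose(lhs, rhs, rel_tol=1e-12, abs_tol=1e-12))

print(checks)
```

Because the two objectives differ only by the constant factor $1/C$ at every $(w, b)$, any minimizer of one is a minimizer of the other.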