Z.F Zhou Watermelon Book SVM. This article is supplementary material for the SVM chapter of the Watermelon Book.
Watermelon Book
6.3
Suppose the training data set is linearly separable and the minimum margin is $\delta$.
$$
\begin{cases}
w^T x_i + b \ge +\delta, & y_i = +1\\
w^T x_i + b \le -\delta, & y_i = -1
\end{cases}
\quad\Rightarrow\quad
\begin{cases}
\frac{w^T}{\delta} x_i + \frac{b}{\delta} \ge +1, & y_i = +1\\
\frac{w^T}{\delta} x_i + \frac{b}{\delta} \le -1, & y_i = -1
\end{cases}
$$
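Rescaling $(w, b)$ by $1/\delta$ changes only the normalization of the constraints, not the separating hyperplane itself. A minimal sketch of this (the toy data, $w$, and $b$ below are hypothetical, chosen only for illustration):

```python
import numpy as np

# Hypothetical linearly separable toy data.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([+1, +1, -1, -1])

w = np.array([1.0, 1.0])
b = 0.0

# Minimum functional margin delta over the training set.
delta = np.min(y * (X @ w + b))

# After rescaling, the constraints read y_i (w'^T x_i + b') >= 1.
w_s, b_s = w / delta, b / delta
assert np.all(y * (X @ w_s + b_s) >= 1 - 1e-12)

# The predictions are unchanged: same hyperplane, different scale.
assert np.array_equal(np.sign(X @ w + b), np.sign(X @ w_s + b_s))
```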
6.8
Refer to the link below (see the KKT section and References) about Lagrange multipliers and the KKT conditions.
6.9
$$
L = \frac{1}{2} w^T w + \sum_{i=1}^m a_i \big(1 - y_i (w^T x_i + b)\big)
$$

Taking the derivative with respect to $w$ and setting it to zero:

$$
0 = w - \sum_{i=1}^m a_i y_i x_i
\quad\Rightarrow\quad
w = \sum_{i=1}^m a_i y_i x_i
$$
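This identity is easy to check numerically. In scikit-learn's SVC, dual_coef_ stores $a_i y_i$ for the support vectors (all other samples have $a_i = 0$ and drop out of the sum), so $w$ can be rebuilt from it. A sketch, assuming scikit-learn is installed:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated blobs, so a large C approximates the hard margin.
X, y = make_blobs(n_samples=40, centers=2, random_state=0)

clf = SVC(kernel="linear", C=1e3).fit(X, y)

# dual_coef_ holds a_i * y_i for support vectors only; since a_i = 0
# elsewhere, the sum over all samples reduces to the support vectors.
w = clf.dual_coef_ @ clf.support_vectors_

assert np.allclose(w, clf.coef_)
```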
6.11
You can compute $\frac{1}{2}\|w\|^2$ and $\sum_{i=1}^m a_i\big(1-y_i(w^Tx_i+b)\big)$ separately after substituting $w=\sum_{i=1}^m a_iy_ix_i$; the algebra is routine, so I only sketch it below.
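For completeness, substituting $w=\sum_i a_iy_ix_i$ and using the other stationarity condition $\sum_i a_iy_i=0$ gives the dual objective of (6.11):

$$
\begin{aligned}
\frac{1}{2}\|w\|^2 &= \frac{1}{2}\sum_{i=1}^m\sum_{j=1}^m a_ia_jy_iy_jx_i^Tx_j,\\
\sum_{i=1}^m a_i\big(1-y_i(w^Tx_i+b)\big) &= \sum_{i=1}^m a_i-\sum_{i=1}^m\sum_{j=1}^m a_ia_jy_iy_jx_i^Tx_j-b\underbrace{\sum_{i=1}^m a_iy_i}_{=0},\\
L &= \sum_{i=1}^m a_i-\frac{1}{2}\sum_{i=1}^m\sum_{j=1}^m a_ia_jy_iy_jx_i^Tx_j.
\end{aligned}
$$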
6.18
Because the $a_i$ produced by SMO may deviate from their theoretical values due to numerical error, (6.18) averages $b$ over all support vectors instead of computing it from a single one; a sketch is below.
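A minimal sketch of (6.18), where the arrays a_y (the products $a_iy_i$), Xs (the support vectors), and ys (their labels) are hypothetical inputs; note $1/y_s = y_s$ for $y_s \in \{-1,+1\}$:

```python
import numpy as np

def robust_bias(a_y, Xs, ys):
    """Average b over all support vectors, as in eq. (6.18).

    a_y : (n_sv,) products a_i * y_i for the support vectors
    Xs  : (n_sv, d) support vectors
    ys  : (n_sv,) labels in {-1, +1}, so 1/y_s == y_s
    """
    # For each support vector s: b_s = y_s - sum_i a_i y_i x_i^T x_s.
    b_each = ys - (Xs @ Xs.T) @ a_y
    # Averaging the b_s damps the numerical error left over from SMO.
    return b_each.mean()
```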
6.41
In the discussion below (6.41), the characterization of support vectors is based on $a_i \neq 0$.
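This can be checked with scikit-learn (the data here are generated only for illustration): the support vectors returned by SVC are exactly the samples with $a_i \neq 0$, and by complementary slackness they satisfy $y_i f(x_i) = 1-\xi_i \le 1$, i.e. they lie on or inside the margin:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping blobs, so the soft margin is active.
X, y01 = make_blobs(n_samples=100, centers=2, cluster_std=3.0, random_state=0)
y = 2 * y01 - 1  # map {0, 1} labels to {-1, +1}

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Support vectors are exactly the samples with a_i != 0.
assert np.all(clf.dual_coef_ != 0)

# Complementary slackness: y_i f(x_i) = 1 - xi_i <= 1 for support vectors.
margins = y[clf.support_] * clf.decision_function(X[clf.support_])
print(margins.max())  # <= 1, up to the solver's tolerance
```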
The following section is partly based on reference [1].
KKT
Part 1
A clear interpretation: the three KKT equations are based on
$u = 0$ or $g = 0$, where $u$ and $g$ are vectors; componentwise, $u_i = 0$ or $g_i = 0$ for each $i$ (complementary slackness: $u_i g_i = 0$).
Given these conditions, we can find an $x$ minimizing $f(x)$ by solving another formulation, e.g. $\min_x\max_u L(x,u)$; the two problems share the same optimal $x$ and $u$.
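A tiny numerical illustration of $\min_x\max_{u\ge 0}L(x,u)$ (the problem and grids below are my own, chosen for illustration): minimize $f(x)=x^2$ subject to $g(x)=1-x\le 0$, whose constrained minimum is at $x=1$. The inner maximization blows up wherever $g(x)>0$ and equals $f(x)$ on the feasible set, so the outer minimization recovers the constrained minimizer:

```python
import numpy as np

f = lambda x: x ** 2        # objective
g = lambda x: 1.0 - x       # constraint g(x) <= 0, i.e. x >= 1
L = lambda x, u: f(x) + u * g(x)

def max_over_u(x, u_grid):
    # Inner maximization over u >= 0 (a finite grid standing in for sup):
    # large when g(x) > 0, equal to f(x) when g(x) <= 0.
    return max(L(x, u) for u in u_grid)

u_grid = np.linspace(0.0, 100.0, 1001)
x_grid = np.linspace(-2.0, 3.0, 1001)

# Outer minimization over x recovers the constrained minimizer x = 1.
x_best = min(x_grid, key=lambda x: max_over_u(x, u_grid))
print(x_best)  # ~1.0, up to the grid resolution
```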
Part 2
Why is $L(\hat{x}, u)$ (with $\hat{x}$ the optimal point) equivalent to $\min_x L(x, u)$?
I think this can be shown by contradiction. Suppose instead that $L(x', u) = \min_x L(x, u)$ for some $x' \neq \hat{x}$. Then

$$
\max_u \min_x L(x, u) = \max_u L(x', u) = f(x') \neq f(\hat{x}),
$$

which contradicts $\max_u\min_x L(x,u) = f(\hat{x})$ (strong duality).
Lagrange
I think the link below [2] is very detailed.
References
[1] Lagrange and KKT conditions.
[2] Lagrange multiplier (拉格朗日乘数).