SVM
Support Vector Machine
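SVM has three key ingredients: the margin, duality, and the kernel trick.
In short, an SVM is a binary classification model. It looks for a hyperplane $w^Tx+b=0$ such that $w^Tx+b>0$ for the positive class and $w^Tx+b<0$ for the negative class. In essence it is a maximum-margin classifier.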
The training set is
$$\{(x_i,y_i)\}_{i=1}^{N},\quad x_i \in \mathbb{R}^p,\quad y_i\in \{-1,1\}$$
We first define the distance. The distance from a sample point $(x_i,y_i)$ to the hyperplane $w^Tx+b=0$ is
$$distance(w,b,x_i) = \frac{1}{\|w\|}|w^Tx_i+b|$$
The margin is then the smallest such distance over all sample points:
$$margin(w,b) = \min_{x_i} distance(w,b,x_i)$$
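The two definitions above translate directly into code. The following is a minimal sketch in plain Python; the helper names `distance` and `margin` simply mirror the formulas, and the sample points and the values of `w` and `b` are made up for illustration.

```python
import math

def distance(w, b, x):
    """Distance from point x to the hyperplane w^T x + b = 0."""
    norm_w = math.sqrt(sum(wj * wj for wj in w))
    return abs(sum(wj * xj for wj, xj in zip(w, x)) + b) / norm_w

def margin(w, b, xs):
    """Margin of (w, b): the smallest distance over all sample points."""
    return min(distance(w, b, x) for x in xs)

# Hypothetical toy points and hyperplane, chosen only for illustration.
xs = [(3.0, 4.0), (1.0, 1.0), (-2.0, 0.0)]
w, b = (3.0, 4.0), -5.0  # ||w|| = 5

print([distance(w, b, x) for x in xs])
print(margin(w, b, xs))
```

Note that the minimum is taken over the data only; `w` and `b` are fixed inputs to `margin`, and they are optimized in a separate outer step.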
Hard-margin SVM
Overall, a correct classifier must satisfy:
$$w^Tx_i+b>0 \quad \text{if } y_i=+1$$
$$w^Tx_i+b<0 \quad \text{if } y_i=-1$$
The two conditions above can be combined into one:
$$y_i(w^Tx_i+b)>0,\quad \forall i=1,\dots,N$$
and the margin is
$$margin(w,b)=\min_{x_i}\frac{1}{\|w\|}|w^Tx_i+b|$$
Maximizing the margin:
$$\max_{w,b}\min_{x_i}\frac{1}{\|w\|}|w^Tx_i+b| \quad \text{s.t. } y_i(w^Tx_i+b)>0,\ \forall i=1,\dots,N$$
Since $y_i\in\{-1,1\}$ and $y_i(w^Tx_i+b)>0$, we have $|w^Tx_i+b| = y_i(w^Tx_i+b)$, so
$$\max_{w,b}\min_{x_i}\frac{1}{\|w\|}|w^Tx_i+b| =\max_{w,b}\min_{x_i} \frac{1}{\|w\|}\, y_i(w^Tx_i+b)=\max_{w,b}\frac{1}{\|w\|}\min_{x_i}y_i(w^Tx_i+b)$$
$$\exists\, \gamma>0 \ \text{ s.t. } \min_{x_i,y_i}y_i(w^Tx_i+b)=\gamma$$
Scaling $w$ and $b$ by the same positive constant does not change the hyperplane, so we can fix $\gamma = 1$. Then:
$$\max_{w,b}\frac{1}{\|w\|}\min_{x_i}y_i(w^Tx_i+b) = \max_{w,b}\frac{1}{\|w\|}$$
Maximizing $\frac{1}{\|w\|}$ is equivalent to minimizing $\frac{1}{2}w^Tw$, so the problem becomes a convex optimization problem:
$$\min_{w,b}\frac{1}{2}w^Tw \quad \text{s.t. } y_i(w^Tx_i+b)\ge 1,\ \forall i=1,\dots,N$$
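The rescaling step that fixes $\gamma=1$ can be checked numerically. The sketch below, in plain Python, assumes a small hand-made separable data set and an arbitrary separating `(w, b)`; the helper names `functional_margin` and `geometric_margin` are introduced here for illustration only.

```python
import math

def functional_margin(w, b, xs, ys):
    """min_i y_i (w^T x_i + b), the quantity called gamma in the text."""
    return min(y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
               for x, y in zip(xs, ys))

def geometric_margin(w, b, xs, ys):
    """Functional margin divided by ||w||, i.e. the actual distance."""
    norm_w = math.sqrt(sum(wj * wj for wj in w))
    return functional_margin(w, b, xs, ys) / norm_w

# Hypothetical separable toy data and a separating hyperplane.
xs = [(2.0, 2.0), (3.0, 1.0), (-1.0, -1.0), (-2.0, 0.0)]
ys = [1, 1, -1, -1]
w, b = (2.0, 2.0), 0.0

gamma = functional_margin(w, b, xs, ys)

# Divide w and b by gamma: same hyperplane, functional margin becomes 1.
w_scaled = tuple(wj / gamma for wj in w)
b_scaled = b / gamma

print(gamma, functional_margin(w_scaled, b_scaled, xs, ys))
print(geometric_margin(w, b, xs, ys), geometric_margin(w_scaled, b_scaled, xs, ys))
```

After rescaling, the geometric margin is unchanged and equals `1 / ||w_scaled||`, which is why maximizing the margin reduces to minimizing $\frac{1}{2}w^Tw$ under the constraint $y_i(w^Tx_i+b)\ge 1$.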
Lagrange multipliers
$$\mathcal{L}(w,b,\lambda) = \frac{1}{2}w^Tw+\sum_{i=1}^{N}\lambda_i[1-y_i(w^Tx_i+b)]$$
Why this removes the constraints on $w$ and $b$ (with $\lambda_i\ge 0$):
If $1-y_i(w^Tx_i+b)>0$ for some $i$, then
$$\max_{\lambda}\mathcal{L}(w,b,\lambda)=\frac{1}{2} w^T w + \infty = \infty$$
If $1-y_i(w^Tx_i+b)\le 0$ for all $i$, then
$$\max_{\lambda}\mathcal{L}(w,b,\lambda)=\frac{1}{2} w^T w + 0 = \frac{1}{2} w^T w$$
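The two cases can be seen numerically by evaluating the Lagrangian for growing multipliers. This is a small sketch in plain Python on a made-up two-point data set; `w_ok`/`b_ok` is an assumed feasible pair and `w_bad`/`b_bad` an assumed infeasible one.

```python
def lagrangian(w, b, lam, xs, ys):
    """L(w, b, lambda) = 1/2 w^T w + sum_i lambda_i [1 - y_i (w^T x_i + b)]."""
    half_norm = 0.5 * sum(wj * wj for wj in w)
    penalty = sum(
        l * (1.0 - y * (sum(wj * xj for wj, xj in zip(w, x)) + b))
        for l, x, y in zip(lam, xs, ys)
    )
    return half_norm + penalty

# Hypothetical toy data.
xs = [(1.0, 1.0), (-1.0, -1.0)]
ys = [1, -1]

w_ok, b_ok = (1.0, 1.0), 0.0    # y_i (w^T x_i + b) = 2 >= 1: feasible
w_bad, b_bad = (0.1, 0.1), 0.0  # y_i (w^T x_i + b) = 0.2 < 1: infeasible

for scale in (0.0, 1.0, 10.0, 100.0):
    lam = [scale, scale]
    print(scale,
          lagrangian(w_ok, b_ok, lam, xs, ys),
          lagrangian(w_bad, b_bad, lam, xs, ys))
```

For the feasible pair the value is largest at $\lambda=0$, where it equals $\frac{1}{2}w^Tw$; for the infeasible pair it keeps growing with $\lambda$, so the outer minimization over $w,b$ never selects an infeasible point.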
The constrained convex problem is therefore equivalent to a problem with no constraints on $w$ and $b$:
$$\min_{w,b}\max_{\lambda}\mathcal{L}(w,b,\lambda) \quad \text{s.t. } \lambda_i\ge 0$$
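Converting to the dual problem (the objective is convex and the constraints are linear, so strong duality holds and the order of min and max can be swapped):
$$\max_{\lambda}\min_{w,b}\mathcal{L}(w,b,\lambda) \quad \text{s.t. } \lambda_i\ge 0$$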
The next step is to solve the inner problem $\min_{w,b}\mathcal{L}(w,b,\lambda)$. Setting the derivative with respect to $b$ to zero:
$$\frac{\partial \mathcal{L} }{\partial b} = -\sum_{i=1}^N \lambda_iy_i = 0$$
Substituting this result into $\mathcal{L}(w,b,\lambda)$ removes the term in $b$:
$$\mathcal{L}(w,b,\lambda) = \frac{1}{2}w^Tw+\sum_{i=1}^N\lambda_i-\sum_{i=1}^N\lambda_iy_iw^Tx_i$$
$$\frac{\partial \mathcal{L} }{\partial w} = \frac{1}{2}\cdot 2w-\sum_{i=1}^N\lambda_iy_ix_i = 0 \ \Rightarrow\ w = \sum_{i=1}^N\lambda_iy_ix_i$$
Substituting this $w$ back into $\mathcal{L}$ gives the minimum over $w,b$:
$$\min_{w,b}\mathcal{L}(w,b,\lambda) = -\frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N \lambda_i \lambda_j y_iy_jx_i^T x_j + \sum_{i=1}^N \lambda_i$$
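The substitution can be sanity-checked numerically: for any $\lambda$ with $\sum_i\lambda_iy_i=0$, plugging $w=\sum_i\lambda_iy_ix_i$ into the Lagrangian should give the same value as the double-sum expression, whatever $b$ is. The sketch below uses plain Python; the data, the multipliers `lam`, and the value of `b` are arbitrary illustrative choices.

```python
def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def lagrangian(w, b, lam, xs, ys):
    """L(w, b, lambda) = 1/2 w^T w + sum_i lambda_i [1 - y_i (w^T x_i + b)]."""
    return 0.5 * dot(w, w) + sum(
        l * (1.0 - y * (dot(w, x) + b)) for l, x, y in zip(lam, xs, ys)
    )

def dual_objective(lam, xs, ys):
    """-1/2 sum_i sum_j lam_i lam_j y_i y_j x_i^T x_j + sum_i lam_i."""
    n = len(xs)
    quad = sum(
        lam[i] * lam[j] * ys[i] * ys[j] * dot(xs[i], xs[j])
        for i in range(n) for j in range(n)
    )
    return -0.5 * quad + sum(lam)

# Hypothetical data; lam is chosen so that sum_i lam_i y_i = 0.
xs = [(1.0, 2.0), (2.0, 0.5), (-1.0, -1.0)]
ys = [1, 1, -1]
lam = [0.3, 0.2, 0.5]

# Stationarity in w: w = sum_i lam_i y_i x_i.
w = tuple(sum(l * y * x[d] for l, y, x in zip(lam, ys, xs)) for d in range(2))
b = 7.0  # arbitrary: the b term vanishes because sum_i lam_i y_i = 0

print(lagrangian(w, b, lam, xs, ys), dual_objective(lam, xs, ys))
```

The two printed values agree up to floating-point rounding, which is the content of the substitution step.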
The original problem therefore becomes
$$\max_{\lambda}\ -\frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N \lambda_i \lambda_j y_iy_jx_i^T x_j + \sum_{i=1}^N \lambda_i \quad \text{s.t. } \lambda_i\ge 0,\ \sum_{i=1}^N\lambda_iy_i=0$$
or, written as a minimization (note the sign change on both terms):
$$\min_{\lambda}\ \frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N \lambda_i \lambda_j y_iy_jx_i^T x_j - \sum_{i=1}^N \lambda_i \quad \text{s.t. } \lambda_i\ge 0,\ \sum_{i=1}^N\lambda_iy_i=0$$
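As a worked instance of the dual, consider a hypothetical two-point training set (an assumption made here for illustration): $x_1=(1,1)^T,\ y_1=+1$ and $x_2=(-1,-1)^T,\ y_2=-1$. The condition $\sum_i\lambda_iy_i=0$ forces $\lambda_1=\lambda_2=\lambda$, and with $x_1^Tx_1=x_2^Tx_2=2$, $x_1^Tx_2=-2$ the minimization form of the dual reduces to a one-dimensional problem:

$$\frac{1}{2}\sum_{i,j}\lambda_i\lambda_jy_iy_jx_i^Tx_j-\sum_i\lambda_i=\frac{1}{2}\left(2\lambda^2+2\lambda^2+4\lambda^2\right)-2\lambda=4\lambda^2-2\lambda$$

$$\frac{d}{d\lambda}\left(4\lambda^2-2\lambda\right)=8\lambda-2=0\ \Rightarrow\ \lambda=\frac{1}{4}\ge 0,\qquad w=\sum_i\lambda_iy_ix_i=\left(\tfrac{1}{2},\tfrac{1}{2}\right)^T$$

The resulting margin is $\frac{1}{\|w\|}=\sqrt{2}$, which matches the distance from either point to the line $x^{(1)}+x^{(2)}=0$.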
Because strong duality holds, the optimal primal and dual solutions satisfy the KKT conditions:
$$\frac{\partial \mathcal{L} }{\partial w}=0,\quad \frac{\partial \mathcal{L} }{\partial b}=0$$
$$\lambda_i(1-y_i(w^Tx_i+b))=0$$
$$\lambda_i\ge 0$$
$$1-y_i(w^Tx_i+b)\le 0$$
From the KKT conditions we obtain:
$$w^* = \sum_{i=1}^N\lambda_iy_ix_i$$
$$b^* = y_k-\sum_{i=1}^N\lambda_iy_ix_i^Tx_k$$
where $(x_k,y_k)$ is any support vector, that is, a point with $\lambda_k>0$, for which complementary slackness gives $y_k(w^{*T}x_k+b^*)=1$.
The separating hyperplane is:
$$w^{*T}x+b^*=0$$
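Recovering $w^*$ and $b^*$ from the multipliers can be sketched as follows. The data set and the multipliers `lam` are assumptions: they are a hand-derived dual solution for this particular toy set, not the output of a solver, and the `predict` helper is introduced here only to show how the hyperplane is used.

```python
def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

# Hypothetical toy data with an assumed (hand-derived) dual solution.
xs = [(1.0, 1.0), (-1.0, -1.0), (2.0, 2.0)]
ys = [1, -1, 1]
lam = [0.25, 0.25, 0.0]  # the third point is not a support vector

dim = len(xs[0])

# w* = sum_i lam_i y_i x_i
w_star = tuple(sum(l * y * x[d] for l, y, x in zip(lam, ys, xs))
               for d in range(dim))

# b* = y_k - w*^T x_k for any support vector k (lam_k > 0)
k = next(i for i, l in enumerate(lam) if l > 0)
b_star = ys[k] - dot(w_star, xs[k])

def predict(x):
    """Classify x by the sign of w*^T x + b*."""
    return 1 if dot(w_star, x) + b_star > 0 else -1

print("w* =", w_star, "b* =", b_star)

# Check the KKT conditions point by point.
for l, x, y in zip(lam, xs, ys):
    g = 1.0 - y * (dot(w_star, x) + b_star)  # must be <= 0
    print("lambda =", l, "constraint =", g, "product =", l * g)
```

Only points with $\lambda_i>0$ contribute to $w^*$, which is where the name "support vector" comes from.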
Soft-margin SVM
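Hard-margin SVM assumes the data is linearly separable. In practice the data is often not separable, or it contains noisy points. Soft-margin SVM handles this by adding a loss term that penalizes points violating the margin:
$$\min_{w,b} \frac{1}{2}w^Tw + loss$$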
If $y_i(w^Tx_i+b)\ge 1$, then $loss=0$.
If $y_i(w^Tx_i+b)<1$, then $loss=1-y_i(w^Tx_i+b)$.
Combining the two cases gives the hinge loss:
$$loss = \max\{ 0,\ 1-y_i(w^Tx_i+b)\}$$
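The loss above is a one-liner in code. A minimal sketch in plain Python, with a hypothetical `hinge_loss` helper and made-up inputs covering the three typical situations (outside the margin, inside the margin, misclassified):

```python
def hinge_loss(w, b, x, y):
    """max(0, 1 - y (w^T x + b)) for a single sample (x, y)."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(0.0, 1.0 - y * score)

w, b = (1.0, 0.0), 0.0  # hypothetical hyperplane x^(1) = 0

print(hinge_loss(w, b, (2.0, 0.0), 1))   # correct, outside the margin: 0.0
print(hinge_loss(w, b, (0.5, 0.0), 1))   # correct, inside the margin: 0.5
print(hinge_loss(w, b, (-1.0, 0.0), 1))  # misclassified: 2.0
```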
The objective then becomes (where $C$ is a hyperparameter that has to be tuned):
$$\min_{w,b} \frac{1}{2}w^Tw +C \sum_{i=1}^N\max\{ 0,\ 1-y_i(w^Tx_i+b)\}$$
Let $\xi_i = \max\{0,\ 1-y_i(w^Tx_i+b)\}$, so that $\xi_i\ge 0$. The problem can then be written as
$$\min_{w,b,\xi} \frac{1}{2}w^Tw +C \sum_{i=1}^N\xi_i \quad \text{s.t. } y_i(w^Tx_i+b)\ge 1-\xi_i,\ \xi_i\ge 0,\ \forall i=1,\dots,N$$
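One simple way to minimize the soft-margin objective is batch subgradient descent on the hinge-loss form. This is only a sketch of that technique, not the method used in the derivation above (which goes through the dual). The function name `train_soft_margin`, the learning rate, the epoch count, and the toy data are all assumptions made for illustration.

```python
def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def train_soft_margin(xs, ys, C=1.0, lr=0.01, epochs=2000):
    """Batch subgradient descent on 1/2 w^T w + C * sum_i max(0, 1 - y_i (w^T x_i + b))."""
    dim = len(xs[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of 1/2 w^T w
        grad_b = 0.0
        for x, y in zip(xs, ys):
            if y * (dot(w, x) + b) < 1.0:  # hinge term active (xi_i > 0)
                for d in range(dim):
                    grad_w[d] -= C * y * x[d]
                grad_b -= C * y
        w = [wd - lr * gd for wd, gd in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    return 1 if dot(w, x) + b > 0 else -1

# Hypothetical toy data, linearly separable.
xs = [(2.0, 2.0), (3.0, 1.0), (2.5, 3.0), (-1.0, -1.0), (-2.0, 0.0), (-1.5, -2.5)]
ys = [1, 1, 1, -1, -1, -1]

w, b = train_soft_margin(xs, ys)
slacks = [max(0.0, 1.0 - y * (dot(w, x) + b)) for x, y in zip(xs, ys)]

print("w =", w, "b =", b)
print("slacks =", slacks)
print("predictions =", [predict(w, b, x) for x in xs])
```

A larger `C` penalizes the slack variables $\xi_i$ more heavily and pushes the solution toward the hard-margin one; a smaller `C` tolerates more margin violations in exchange for a smaller $\|w\|$.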