Distance from a point to a hyperplane
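Let $x_0$ be any point on the hyperplane $w^Tx+b=0$, let $\theta$ be the angle between $w$ and $x-x_0$, and let $d$ be the distance from the vector $x$ to the hyperplane. Since $d=\|x-x_0\|\,|\cos\theta|$ is the length of the projection of $x-x_0$ onto the normal direction $w$,

$$|w^T(x-x_0)|=\|w\|\,\|x-x_0\|\,|\cos\theta|=\|w\|\,d$$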
Moreover, because $x_0$ lies on the hyperplane, $w^Tx_0+b=0$, i.e. $-w^Tx_0=b$, so

$$w^T(x-x_0)=w^Tx-w^Tx_0=w^Tx+b$$
Combining the two equations gives

$$\begin{aligned} |w^T(x-x_0)| &=|w^Tx+b|=\|w\|\,d \\ & \Rightarrow d=\frac{1}{\|w\|}|w^Tx+b| \end{aligned}$$
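The distance formula can be checked numerically. The following is a minimal sketch in plain Python; the function name `distance_to_hyperplane` and the sample values are illustrative assumptions, not part of the original text.

```python
import math

def distance_to_hyperplane(w, b, x):
    """Distance from point x to the hyperplane w^T x + b = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    if norm_w == 0:
        raise ValueError("w must be a non-zero vector")
    # d = |w^T x + b| / ||w||
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w

# Hypothetical example: hyperplane 3*x1 + 4*x2 - 5 = 0, so ||w|| = 5.
w, b = [3.0, 4.0], -5.0
print(distance_to_hyperplane(w, b, [3.0, 4.0]))  # (9 + 16 - 5) / 5 = 4.0
print(distance_to_hyperplane(w, b, [1.0, 0.5]))  # point on the hyperplane: 0.0
```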
Hard-margin classifier
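The hyperplane has two sides, and $w^Tx+b$ is negative for points on one of them. To remove this sign, multiply by the class label $y_i \in \{-1,1\}$, which is positive whenever $x_i$ lies on the side matching its label:

$$\gamma_i=y_i(w^Tx_i+b)$$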
Comparing with the point-to-hyperplane distance formula, dividing by $\|w\|$ is all that is needed to turn $\gamma_i$ into a distance:

$$d_i=\frac{\gamma_i}{\|w\|}=\frac{y_i(w^Tx_i+b)}{\|w\|}$$
To make every vector as far from the hyperplane as possible, it suffices to maximize the smallest $d_i$; every other vector is then at least that far away. Writing $\gamma=\min_i \gamma_i$:

$$\begin{aligned} \max_{w,b}\ & \frac{\gamma}{\|w\|} \\ \text{s.t.}\ & y_i(w^Tx_i+b) \ge \gamma,\ i=1,\dots,n \end{aligned}$$
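The $x_i$ at which $\gamma$ attains this minimum are the support vectors; the support vector machine aims to push the support vectors as far from the hyperplane as possible.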
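To illustrate the signed distances $d_i$ and the points that attain the minimum, here is a small sketch. The data set, the candidate $(w,b)$ and the helper name `geometric_margins` are all made up for the example.

```python
import math

def geometric_margins(w, b, X, y):
    """d_i = y_i (w^T x_i + b) / ||w|| for every sample."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return [yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) / norm_w
            for xi, yi in zip(X, y)]

# Hypothetical 2-D data, separable by the vertical line x1 = 0.
X = [[2.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-3.0, 0.0]]
y = [1, 1, -1, -1]
w, b = [2.0, 0.0], 0.0

d = geometric_margins(w, b, X, y)
margin = min(d)
# Indices whose distance equals the minimum: the support vectors for this (w, b).
support = [i for i, di in enumerate(d) if math.isclose(di, margin)]
print(d)        # [2.0, 1.0, 1.0, 3.0]
print(support)  # [1, 2]
```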
Because $w,b$ can be rescaled by any positive factor without changing the hyperplane, we may set $\gamma=1$, which gives

$$\begin{aligned} \max_{w,b}\ & \frac{1}{\|w\|} \\ \text{s.t.}\ & y_i(w^Tx_i+b) \ge 1,\ i=1,\dots,n \end{aligned}$$
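The rescaling argument can be made concrete: dividing $(w,b)$ by the smallest value of $y_i(w^Tx_i+b)$ leaves the hyperplane unchanged and makes that smallest value exactly 1. The sketch below assumes a separating $(w,b)$ is already given; the helper names and data are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def functional_margins(w, b, X, y):
    """gamma_i = y_i (w^T x_i + b) for every sample."""
    return [yi * (dot(w, xi) + b) for xi, yi in zip(X, y)]

def rescale_to_unit_margin(w, b, X, y):
    """Scale (w, b) so that min_i y_i (w^T x_i + b) == 1."""
    gamma = min(functional_margins(w, b, X, y))
    if gamma <= 0:
        raise ValueError("(w, b) does not separate the data")
    return [wi / gamma for wi in w], b / gamma

# Hypothetical data and a separating hyperplane with an arbitrary scale.
X = [[2.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-3.0, 0.0]]
y = [1, 1, -1, -1]
w, b = [4.0, 0.0], 0.0

w1, b1 = rescale_to_unit_margin(w, b, X, y)
print(w1, b1)                                   # [1.0, 0.0] 0.0
print(min(functional_margins(w1, b1, X, y)))    # 1.0
```

After rescaling, the geometric margin is simply $1/\|w\|$, which is the quantity being maximized.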
$\frac{1}{\|w\|}$ is largest exactly when $\|w\|$ is smallest, so the problem can be rewritten as a minimization of $\frac{1}{2}w^Tw=\frac{1}{2}\|w\|^2$ (the factor $\frac{1}{2}$ does not change the minimizer and only simplifies the derivatives that follow):

$$\begin{aligned} \min_{w,b}\ & \frac{1}{2}w^Tw \\ \text{s.t.}\ & y_i(w^Tx_i+b) \ge 1,\ i=1,\dots,n \end{aligned}$$
To solve this constrained optimization problem, use the method of Lagrange multipliers and build the Lagrangian, with one multiplier $\alpha_i$ per constraint:

$$\mathcal{L}(w,b,\alpha)=\frac{1}{2}w^Tw+\sum_{i=1}^{n}{\alpha_i \left[ 1- y_i(w^Tx_i+b) \right]}$$
The problem then becomes one with no constraints on $w,b$:

$$\begin{aligned} \min_{w,b}\ & \max_{\alpha}\ \mathcal{L}(w,b,\alpha) \\ \text{s.t.}\ & \alpha_i \ge 0 \end{aligned}$$
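If some constraint is violated, i.e. $1- y_i(w^Tx_i+b)>0$, then letting $\alpha_i \to \infty$ gives $\max_{\alpha}\mathcal{L}=+\infty$, so such a $(w,b)$ can never be the minimizer. If every constraint is satisfied, each term $\alpha_i\left[1-y_i(w^Tx_i+b)\right]$ is at most $0$, the maximum of the sum over $\alpha$ is $0$, and $\max_{\alpha}\mathcal{L}=\frac{1}{2}w^Tw$, which is the original objective.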
Next, convert this to the dual problem by swapping the order of $\min$ and $\max$:

$$\begin{aligned} \max_{\alpha}\ & \min_{w,b}\ \mathcal{L}(w,b,\alpha) \\ \text{s.t.}\ & \alpha_i \ge 0 \end{aligned}$$
Look at the inner minimization first. It is taken over $w,b$ only, with $\alpha$ held fixed and no constraints, so we can simply set the derivatives with respect to $w$ and $b$ to zero:

$$\begin{aligned} &\frac{\partial \mathcal{L}}{\partial b}=0 \Rightarrow \sum_{i=1}^{n}{\alpha_iy_i}=0\\ &\frac{\partial \mathcal{L}}{\partial w}=0 \Rightarrow w=\sum_{i=1}^{n}{\alpha_iy_ix_i} \end{aligned}$$
Substituting these back into $\mathcal{L}$ (the term in $b$ vanishes because $\sum_{i=1}^{n}\alpha_iy_i=0$):

$$\begin{aligned} \mathcal{L}&=\frac{1}{2}\left(\sum_{i=1}^{n}{\alpha_iy_ix_i}\right)^T\sum_{j=1}^{n}{\alpha_jy_jx_j}-\sum_{i=1}^{n} \alpha_iy_i \left[ \left(\sum_{j=1}^{n}{\alpha_jy_jx_j}\right)^Tx_i+b \right] +\sum_{i=1}^{n}\alpha_i\\ &=\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_jy_iy_jx_i^Tx_j-\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_jy_iy_jx_i^Tx_j+\sum_{i=1}^{n}\alpha_i\\ &=\sum_{i=1}^{n}\alpha_i-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_jy_iy_jx_i^Tx_j \end{aligned}$$
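The substitution can be sanity-checked numerically: for any $\alpha$ with $\sum_i \alpha_i y_i = 0$, evaluating $\mathcal{L}$ at $w=\sum_i \alpha_i y_i x_i$ should give the same value as the final double-sum expression, whatever $b$ is. The sketch below uses made-up data and hypothetical helper names.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def w_from_alpha(alpha, X, y):
    """w = sum_i alpha_i y_i x_i."""
    return [sum(a * yi * xi[k] for a, xi, yi in zip(alpha, X, y))
            for k in range(len(X[0]))]

def lagrangian(w, b, alpha, X, y):
    """L = 1/2 w^T w + sum_i alpha_i [1 - y_i (w^T x_i + b)]."""
    return 0.5 * dot(w, w) + sum(
        a * (1 - yi * (dot(w, xi) + b)) for a, xi, yi in zip(alpha, X, y))

def dual_objective(alpha, X, y):
    """sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j x_i^T x_j."""
    n = len(alpha)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad

# Hypothetical data; alpha is chosen so that sum_i alpha_i y_i = 0.
X = [[1.0, 2.0], [2.0, 0.0], [-1.0, -1.0]]
y = [1, 1, -1]
alpha = [0.25, 0.5, 0.75]

w = w_from_alpha(alpha, X, y)
for b in (3.0, -7.0):  # the value of b should not matter
    print(math.isclose(lagrangian(w, b, alpha, X, y),
                       dual_objective(alpha, X, y)))
```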
The original constrained problem therefore becomes

$$\begin{aligned} \max_{\alpha}\ & \sum_{i=1}^{n}\alpha_i-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_jy_iy_jx_i^Tx_j \\ \text{s.t.}\ & \sum_{i=1}^{n}{\alpha_iy_i}=0\\ & \alpha_i \ge 0,\ i=1,\dots,n \end{aligned}$$
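As a worked example (a toy data set chosen here for illustration, not taken from the original text), take two points $x_1=(1,0)^T$ with $y_1=+1$ and $x_2=(-1,0)^T$ with $y_2=-1$. The equality constraint forces $\alpha_1=\alpha_2=a$, and with $x_1^Tx_1=x_2^Tx_2=1$, $x_1^Tx_2=-1$ the dual objective reduces to a function of one variable:

$$\begin{aligned} \sum_{i}\alpha_i-\frac{1}{2}\sum_{i}\sum_{j}\alpha_i\alpha_jy_iy_jx_i^Tx_j &= 2a-\frac{1}{2}\left(a^2+a^2+2a^2\right) = 2a-2a^2 \\ \frac{d}{da}\left(2a-2a^2\right)=0 &\Rightarrow a=\frac{1}{2},\quad \text{dual value}=\frac{1}{2} \end{aligned}$$

With $w=\sum_i\alpha_iy_ix_i=(1,0)^T$, the primal objective $\frac{1}{2}w^Tw$ is also $\frac{1}{2}$, and the margin $1/\|w\|=1$ matches the distance of both points from the line $x_1=0$.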
However, solving this problem only yields $\alpha$, and through it $w$; to obtain $b$ we need the KKT conditions (the primal problem is a convex quadratic program with linear constraints, so its optimum satisfies the KKT conditions):

$$\left\{ \begin{aligned} &\frac{\partial \mathcal{L}}{\partial w}=0,\ \frac{\partial \mathcal{L}}{\partial b}=0\\ &\pmb{\alpha_i \left[ 1-y_i(w^Tx_i+b)\right]=0}\\ &1-y_i(w^Tx_i+b) \le 0\\ &\alpha_i \ge 0 \end{aligned} \right.$$
So when $x_i$ is a support vector, it satisfies $1-y_i(w^Tx_i+b)=0$. Since $y_i^2=1$, this gives $b=y_i-w^Tx_i$, and combining it with the expression for $w$:

$$\left\{ \begin{aligned} &w^*=\sum_{i=1}^{n}{\alpha_iy_ix_i}\\ &b^*=y_i-\left(\sum_{j=1}^{n}{\alpha_jy_jx_j}\right)^Tx_i \end{aligned} \right.$$
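The recovery of $w^*$ and $b^*$ from a dual solution can be sketched as follows. The data and the multipliers `alpha` are assumed for illustration (they satisfy the KKT conditions for this toy set), and averaging $b$ over all support vectors is a common numerical convenience rather than something stated in the text; any single support vector gives the same value in exact arithmetic.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def recover_w_b(alpha, X, y, tol=1e-9):
    """Recover (w*, b*) from dual multipliers alpha."""
    # w* = sum_i alpha_i y_i x_i
    w = [sum(a * yi * xi[k] for a, xi, yi in zip(alpha, X, y))
         for k in range(len(X[0]))]
    # Support vectors are the samples with alpha_i > 0.
    sv = [i for i, a in enumerate(alpha) if a > tol]
    if not sv:
        raise ValueError("no support vectors: all alpha_i are zero")
    # Each support vector gives b* = y_i - w*^T x_i; average them.
    b = sum(y[i] - dot(w, X[i]) for i in sv) / len(sv)
    return w, b, sv

# Hypothetical data; alpha is an assumed dual solution for it.
X = [[1.0, 0.0], [-1.0, 0.0], [3.0, 1.0]]
y = [1, -1, 1]
alpha = [0.5, 0.5, 0.0]

w, b, sv = recover_w_b(alpha, X, y)
print(w, b, sv)  # [1.0, 0.0] 0.0 [0, 1]
```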
Soft-margin classifier
When the data are not linearly separable, the hard-margin SVM has no feasible solution, so a loss term weighted by a constant $C>0$ is added to the original objective, which turns it into the soft-margin classifier:

$$\min_{w,b}\ \frac{1}{2}w^Tw + C\sum_{i=1}^{n}\max \left\{ 0,1-y_i(w^Tx_i+b) \right\}$$
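The added loss term is zero for samples that satisfy $y_i(w^Tx_i+b)\ge 1$ and grows linearly for the others. A minimal sketch of the per-sample term, with hypothetical data and helper names:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def hinge_losses(w, b, X, y):
    """max{0, 1 - y_i (w^T x_i + b)} for every sample."""
    return [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]

# Hypothetical data: the third sample lies on the wrong side of the hyperplane.
X = [[2.0, 0.0], [0.5, 0.0], [-0.5, 0.0], [-2.0, 0.0]]
y = [1, 1, 1, -1]
w, b = [1.0, 0.0], 0.0

losses = hinge_losses(w, b, X, y)
print(losses)       # [0.0, 0.5, 1.5, 0.0]
print(sum(losses))  # 2.0
```

The second sample is classified correctly but sits inside the margin, so it still contributes a loss of 0.5.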
The $\max$ term is usually not written out explicitly. Instead, introduce slack variables $\xi_i=\max \left\{ 0,1-y_i(w^Tx_i+b) \right\}$, so the optimization problem becomes

$$\begin{aligned} \min_{w,b,\xi}\ &\frac{1}{2}w^Tw + C\sum_{i=1}^{n} \xi_i \\ \text{s.t.}\ & y_i(w^Tx_i+b) \ge 1-\xi_i \\ & \xi_i \ge 0,\ i=1,\dots,n \end{aligned}$$
Introduce Lagrange multipliers $\alpha_i,\beta_i$ and convert the problem to its dual:

$$\begin{aligned} \max_{\alpha,\beta}\ \min_{w,b,\xi}\ & \mathcal{L}=\frac{1}{2}w^Tw+C\sum_{i=1}^{n}\xi_i-\sum_{i=1}^{n}{\alpha_i\left[ \xi_i+y_i(w^Tx_i+b)-1 \right]}-\sum_{i=1}^{n}{\beta_i\xi_i}\\ \text{s.t.}\ & \alpha_i \ge 0\\ & \beta_i \ge 0 \end{aligned}$$
The solution also satisfies the KKT conditions:

$$\left\{ \begin{aligned} &\frac{\partial \mathcal{L}}{\partial w}=0,\ \frac{\partial \mathcal{L}}{\partial b}=0,\ \frac{\partial \mathcal{L}}{\partial \xi_i}=0\\ &\alpha_i \left[ 1-\xi_i-y_i(w^Tx_i+b)\right]=0\\ &\beta_i\xi_i=0\\ &y_i(w^Tx_i+b)-1+\xi_i \ge 0\\ &\xi_i,\alpha_i,\beta_i \ge 0 \end{aligned} \right.$$
Taking the partial derivatives with respect to $w,b,\xi$ gives:

$$\begin{aligned} &\frac{\partial \mathcal{L}}{\partial b}=0 \Rightarrow \sum_{i=1}^{n}{\alpha_iy_i}=0\\ &\frac{\partial \mathcal{L}}{\partial w}=0 \Rightarrow w=\sum_{i=1}^{n}{\alpha_iy_ix_i}\\ &\frac{\partial \mathcal{L}}{\partial \xi_i}=0 \Rightarrow C-\alpha_i-\beta_i=0 \Rightarrow \beta_i=C-\alpha_i \end{aligned}$$
Simplifying $\mathcal{L}$ with $\beta_i=C-\alpha_i$, so that the $\xi_i$ terms cancel, and then substituting $w=\sum_{i}\alpha_iy_ix_i$ and $\sum_{i}\alpha_iy_i=0$:

$$\begin{aligned} \mathcal{L}&=\frac{1}{2}w^Tw-\sum_{i=1}^{n}\alpha_iy_i(w^Tx_i+b)+\sum_{i=1}^{n}\alpha_i+\sum_{i=1}^{n}(C-\alpha_i)\xi_i-\sum_{i=1}^{n}(C-\alpha_i)\xi_i\\ &=\sum_{i=1}^{n}\alpha_i-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_jy_iy_jx_i^Tx_j \end{aligned}$$
The optimization problem becomes

$$\begin{aligned} \max_{\alpha}\ & \sum_{i=1}^{n}\alpha_i-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_jy_iy_jx_i^Tx_j \\ \text{s.t.}\ & 0\le \alpha_i \le C,\ i=1,\dots,n\\ &\sum_{i=1}^{n}\alpha_iy_i=0 \end{aligned}$$

where the upper bound $\alpha_i \le C$ follows from $\alpha_i=C-\beta_i$ and $\beta_i \ge 0$.
SMO
Because the equality constraint $\sum_{i=1}^{n}\alpha_iy_i=0$ must hold, changing a single $\alpha_i$ on its own would violate it, so at least two $\alpha_i$ must be updated at a time.
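A single two-variable update might look like the sketch below. It follows the commonly used form of the SMO pair step (optimize $\alpha_j$ along the constraint line, clip it to the box $[0,C]$, then adjust $\alpha_i$ to keep $\sum_i\alpha_iy_i$ unchanged). It assumes a linear kernel, does not update $b$, and omits the heuristics for choosing the pair; the function name `smo_pair_step` and the data are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def smo_pair_step(alpha, b, X, y, C, i, j):
    """Jointly update alpha[i], alpha[j]; returns a new list of multipliers."""
    n = len(alpha)

    def f(k):  # current decision value for sample k
        return sum(alpha[l] * y[l] * dot(X[l], X[k]) for l in range(n)) + b

    E_i, E_j = f(i) - y[i], f(j) - y[j]
    eta = dot(X[i], X[i]) + dot(X[j], X[j]) - 2.0 * dot(X[i], X[j])
    if eta <= 0:
        return list(alpha)  # degenerate pair: skip it in this sketch

    # Bounds that keep both multipliers in [0, C] on the constraint line.
    if y[i] != y[j]:
        lo, hi = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        lo, hi = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])

    a_j = alpha[j] + y[j] * (E_i - E_j) / eta   # unconstrained optimum
    a_j = min(hi, max(lo, a_j))                 # clip to [lo, hi]
    a_i = alpha[i] + y[i] * y[j] * (alpha[j] - a_j)  # preserve sum alpha_i y_i

    new_alpha = list(alpha)
    new_alpha[i], new_alpha[j] = a_i, a_j
    return new_alpha

# Hypothetical data, starting from alpha = 0 and b = 0.
X = [[1.0, 0.0], [-1.0, 0.0], [3.0, 1.0]]
y = [1, -1, 1]
new_alpha = smo_pair_step([0.0, 0.0, 0.0], 0.0, X, y, C=10.0, i=0, j=1)
print(new_alpha)                                     # [0.5, 0.5, 0.0]
print(sum(a * yi for a, yi in zip(new_alpha, y)))    # 0.0
```

A full solver would repeat such steps over different pairs and recompute $b$ until the KKT conditions hold within a tolerance.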