SVM
I. The SVM problem
$$\begin{array}{l} \min_{w,b}~~\frac{1}{2}\|w\|^2\\ \text{s.t.}~~ y_i(w^Tx_i+b)\geq 1,\quad i=1,\dots,n \end{array}$$
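As a quick numerical illustration of the primal problem, the sketch below evaluates the objective and checks the margin constraints. The two-point dataset, the chosen `w`, `b`, and the helper names are hypothetical, picked only for illustration.

```python
import numpy as np

def primal_objective(w):
    """Hard-margin SVM objective (1/2)||w||^2."""
    return 0.5 * float(w @ w)

def is_feasible(w, b, X, y, tol=1e-9):
    """True if y_i (w^T x_i + b) >= 1 holds for every sample (up to tol)."""
    margins = y * (X @ w + b)
    return bool(np.all(margins >= 1.0 - tol))

# Hypothetical toy data: one point per class on the first axis.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
w, b = np.array([1.0, 0.0]), 0.0

print(primal_objective(w), is_feasible(w, b, X, y))  # 0.5 True
print(is_feasible(0.5 * w, b, X, y))                 # False: shrinking w breaks the margin
```

Shrinking `w` lowers the objective but violates the constraints, which is exactly the trade-off the constrained problem resolves.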
II. The Lagrange function
**Step 1.** Move the constraints into the objective, each weighted by a Lagrange multiplier $\alpha_i\geq0$:
$$L(w,b,\alpha)=\frac{1}{2}\|w\|^2-\sum_{i=1}^{n}\alpha_i\left(y_i(w^Tx_i+b)-1\right)$$
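A direct transcription of $L(w,b,\alpha)$ into NumPy might look like the sketch below; the function name and the toy data are assumptions for illustration.

```python
import numpy as np

def lagrangian(w, b, alpha, X, y):
    """L(w, b, alpha) = 1/2 ||w||^2 - sum_i alpha_i (y_i (w^T x_i + b) - 1)."""
    margins = y * (X @ w + b)               # y_i (w^T x_i + b) for every i
    return 0.5 * float(w @ w) - float(np.sum(alpha * (margins - 1.0)))

# Hypothetical toy data: at this (w, b) both constraints are active,
# so the penalty term vanishes and L equals (1/2)||w||^2.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
print(lagrangian(np.array([1.0, 0.0]), 0.0, np.array([0.5, 0.5]), X, y))  # 0.5
```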
Step 2. Set the partial derivatives with respect to $w$ and $b$ to zero:
$$\begin{array}{l} L_w(w,b,\alpha)=w-\sum_{i=1}^n\alpha_iy_ix_i=0;\\ L_b(w,b,\alpha)=-\sum_{i=1}^n\alpha_iy_i=0; \end{array}$$
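The two derivatives can be sanity-checked numerically. The sketch below compares them with central finite differences of $L$ on random data; the random data, the step size `eps`, and the helper names are illustrative assumptions.

```python
import numpy as np

def lagrangian(w, b, alpha, X, y):
    margins = y * (X @ w + b)
    return 0.5 * w @ w - np.sum(alpha * (margins - 1.0))

def analytic_grads(w, b, alpha, X, y):
    grad_w = w - (alpha * y) @ X   # L_w = w - sum_i alpha_i y_i x_i
    grad_b = -np.sum(alpha * y)    # L_b = -sum_i alpha_i y_i
    return grad_w, grad_b

def numeric_grads(w, b, alpha, X, y, eps=1e-6):
    """Central finite differences of L in each coordinate of w and in b."""
    grad_w = np.array([
        (lagrangian(w + eps * e, b, alpha, X, y)
         - lagrangian(w - eps * e, b, alpha, X, y)) / (2 * eps)
        for e in np.eye(w.size)
    ])
    grad_b = (lagrangian(w, b + eps, alpha, X, y)
              - lagrangian(w, b - eps, alpha, X, y)) / (2 * eps)
    return grad_w, grad_b

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = np.array([1.0, -1.0, 1.0, 1.0, -1.0])
w, b, alpha = rng.normal(size=3), 0.3, rng.uniform(size=5)

gw, gb = analytic_grads(w, b, alpha, X, y)
nw, nb = numeric_grads(w, b, alpha, X, y)
print(np.allclose(gw, nw, atol=1e-6), np.isclose(gb, nb, atol=1e-6))
```

Because $L$ is quadratic in $w$ and linear in $b$, central differences agree with the analytic gradients up to floating-point rounding.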
From these we obtain
$$\begin{array}{l} w=\sum_{i=1}^n\alpha_iy_ix_i;\\ \sum_{i=1}^n\alpha_iy_i=0. \end{array}$$
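For example, on a hypothetical two-point dataset the multipliers $\alpha=(1/2,1/2)$ satisfy $\sum_i\alpha_iy_i=0$, and the first relation gives $w$ directly (for this particular dataset these multipliers also happen to be the optimal dual solution). The function name is an assumption.

```python
import numpy as np

def weights_from_alpha(alpha, X, y):
    """Stationarity in w: w = sum_i alpha_i y_i x_i."""
    return (alpha * y) @ X

# Hypothetical toy data and hand-picked multipliers.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])

w = weights_from_alpha(alpha, X, y)
print(w, float(alpha @ y))  # w = (1, 0); sum_i alpha_i y_i = 0
```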
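Step 3. Substitute the two expressions above into $L(w,b,\alpha)$. Since $\sum_i\alpha_iy_i=0$, the term $b\sum_i\alpha_iy_i$ vanishes, and $\sum_i\alpha_iy_iw^Tx_i=w^Tw$, so we obtain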
$$L(w,b,\alpha)=-\frac{1}{2}\sum_{i=1}^n\sum_{j=1}^n\alpha_i\alpha_jy_iy_j(x_i^Tx_j)+\sum_{i=1}^n\alpha_i$$
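The substitution can be verified numerically: for any $\alpha$ with $\sum_i\alpha_iy_i=0$ and $w=\sum_i\alpha_iy_ix_i$, the Lagrangian should equal the dual expression for every choice of $b$. The random data, the rescaling trick used to enforce the equality constraint, and the function names are illustrative assumptions.

```python
import numpy as np

def lagrangian(w, b, alpha, X, y):
    """L(w, b, alpha) from Step 1."""
    margins = y * (X @ w + b)
    return 0.5 * w @ w - np.sum(alpha * (margins - 1.0))

def dual_function(alpha, X, y):
    """-1/2 sum_ij alpha_i alpha_j y_i y_j x_i^T x_j + sum_i alpha_i."""
    v = alpha * y
    return -0.5 * v @ (X @ X.T) @ v + np.sum(alpha)

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
alpha = rng.uniform(size=6)
neg = y < 0
alpha[neg] *= alpha[~neg].sum() / alpha[neg].sum()   # enforce sum_i alpha_i y_i = 0

w = (alpha * y) @ X                                  # w = sum_i alpha_i y_i x_i
for b in (-2.0, 0.0, 3.5):                           # b drops out, so every b agrees
    print(b, np.isclose(lagrangian(w, b, alpha, X, y), dual_function(alpha, X, y)))
```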
III. Solve the following problem
Step 1. The dual problem maximizes the expression above over $\alpha_i\geq0$; flipping the sign gives the equivalent minimization
$$\begin{array}{l} \min_\alpha~~ \frac{1}{2}\sum_{i=1}^n\sum_{j=1}^n\alpha_i\alpha_jy_iy_j(x_i^Tx_j)-\sum_{i=1}^n\alpha_i\\ \text{s.t.}~~\sum_{i=1}^n\alpha_iy_i=0,\\ \qquad~\alpha_i\geq0,\quad i=1,\dots,n \end{array}$$
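As a small worked example (a hypothetical two-point dataset, not from the original), take $x_1=(1,0)^T,\ y_1=+1$ and $x_2=(-1,0)^T,\ y_2=-1$. Here $y_iy_jx_i^Tx_j=1$ for all four pairs, and the equality constraint forces $\alpha_1=\alpha_2=a\geq0$, so the problem reduces to one variable:

$$\begin{array}{l} \frac{1}{2}\sum_{i=1}^2\sum_{j=1}^2\alpha_i\alpha_jy_iy_j(x_i^Tx_j)-\sum_{i=1}^2\alpha_i=\frac{1}{2}a^2(1+1+1+1)-2a=2a^2-2a\\ \frac{d}{da}(2a^2-2a)=4a-2=0\ \Rightarrow\ a=\frac{1}{2}\\ w=\sum_{i=1}^2\alpha_iy_ix_i=\frac{1}{2}(1,0)^T-\frac{1}{2}(-1,0)^T=(1,0)^T\\ y_1(w^Tx_1+b)=1\ \Rightarrow\ b=0 \end{array}$$

The resulting hyperplane $w^Tx+b=0$ is the first coordinate axis' orthogonal line through the origin, midway between the two points, as expected.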
In matrix form, and switching to the soft-margin SVM (slack variables $\xi_i$ penalized by $C\sum_i\xi_i$ with $C>0$), whose dual differs from the one above only by the upper bound $\alpha_i\leq C$, the problem becomes
$$\begin{array}{l} \min_\alpha~~ \frac{1}{2}\alpha^TQ\alpha-e^T\alpha\\ \text{s.t.}~~y^T\alpha=0,\\ \qquad~0\leq\alpha_i\leq C,\quad i=1,\dots,n \end{array}$$
where $e$ denotes the all-ones vector.
In the linear case, $Q_{ij}=y_iy_j(x_i^Tx_j)$;
in the nonlinear case a kernel function is introduced, e.g. the RBF kernel: $Q_{ij}=y_iy_jK_{RBF}(x_i,x_j)$ with $K_{RBF}(x_i,x_j)=\exp(-\gamma\|x_i-x_j\|^2)$.
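A minimal sketch of building $Q$ for both cases and evaluating the matrix-form objective $\frac{1}{2}\alpha^TQ\alpha-e^T\alpha$ follows. The function names, the `kernel` switch, and the default `gamma=0.5` are assumptions for illustration; the check confirms the matrix form matches the explicit double sum.

```python
import numpy as np

def q_matrix(X, y, kernel="linear", gamma=0.5):
    """Q_ij = y_i y_j K(x_i, x_j).
    linear: K = x_i^T x_j;  rbf: K = exp(-gamma * ||x_i - x_j||^2) (gamma is an assumed default).
    """
    if kernel == "linear":
        K = X @ X.T
    elif kernel == "rbf":
        sq = np.sum(X ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)   # pairwise squared distances
        K = np.exp(-gamma * np.maximum(d2, 0.0))          # clip tiny negative round-off
    else:
        raise ValueError(f"unknown kernel: {kernel}")
    return np.outer(y, y) * K

def dual_objective(alpha, Q):
    """(1/2) alpha^T Q alpha - e^T alpha."""
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

rng = np.random.default_rng(2)
X = rng.normal(size=(4, 3))
y = np.array([1.0, -1.0, 1.0, -1.0])
alpha = rng.uniform(size=4)

# The matrix form should match the explicit double sum for the linear kernel.
double_sum = 0.5 * sum(alpha[i] * alpha[j] * y[i] * y[j] * (X[i] @ X[j])
                       for i in range(4) for j in range(4)) - alpha.sum()
print(np.isclose(dual_objective(alpha, q_matrix(X, y)), double_sum))

# RBF Q is symmetric with unit diagonal, since y_i^2 = 1 and K(x, x) = 1.
Q_rbf = q_matrix(X, y, kernel="rbf")
print(np.allclose(Q_rbf, Q_rbf.T), np.allclose(np.diag(Q_rbf), 1.0))
```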
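Step 2. Move the constraints into the objective with multipliers $\mu$ (a single multiplier for $y^T\alpha=0$), $\delta_i\geq0$ (for $\alpha_i\geq0$) and $\beta_i\geq0$ (for $\alpha_i\leq C$):
$$h(\alpha)=\frac{1}{2}\alpha^TQ\alpha-e^T\alpha-\mu y^T\alpha-\delta^T\alpha+\beta^T(\alpha-Ce)$$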
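Differentiating $h$ gives the vector form of the stationarity condition, $\nabla h=Q\alpha-e-\mu y-\delta+\beta$, which holds because $Q$ is symmetric. The sketch checks this against finite differences; all numeric values and function names are random or hypothetical placeholders.

```python
import numpy as np

def h(alpha, Q, y, mu, delta, beta, C):
    """h = 1/2 a^T Q a - e^T a - mu y^T a - delta^T a + beta^T (a - C e)."""
    return (0.5 * alpha @ Q @ alpha - alpha.sum() - mu * (y @ alpha)
            - delta @ alpha + beta @ (alpha - C))

def grad_h(alpha, Q, y, mu, delta, beta):
    """Componentwise: (Q alpha)_i - 1 - mu y_i - delta_i + beta_i (Q symmetric)."""
    return Q @ alpha - 1.0 - mu * y - delta + beta

rng = np.random.default_rng(4)
n = 4
X = rng.normal(size=(n, 2))
y = np.array([1.0, -1.0, 1.0, -1.0])
Q = np.outer(y, y) * (X @ X.T)          # symmetric by construction
alpha = rng.uniform(size=n)
delta, beta = rng.uniform(size=n), rng.uniform(size=n)
mu, C, eps = 0.7, 1.0, 1e-6

numeric = np.array([
    (h(alpha + eps * e, Q, y, mu, delta, beta, C)
     - h(alpha - eps * e, Q, y, mu, delta, beta, C)) / (2 * eps)
    for e in np.eye(n)
])
print(np.allclose(numeric, grad_h(alpha, Q, y, mu, delta, beta), atol=1e-6))
```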
Componentwise, the KKT conditions are:
- Stationarity: the derivative of $h$ with respect to $\alpha_i$ is zero,
$h_{\alpha_i}=(Q\alpha)_i-1-\mu y_i-\delta_i+\beta_i=0$;
let $g(\alpha,\mu)=(Q\alpha)_i-1-\mu y_i$, so that in short $h_{\alpha_i}=g(\alpha,\mu)-\delta_i+\beta_i=0$.
- Nonnegative multipliers:
$\delta_i\geq0$;
$\beta_i\geq0$.
- Complementary slackness (multiplier × constraint = 0, so there are three possibilities: 1. the multiplier is 0, 2. the constraint is active, 3. both are 0):
$\delta_i\alpha_i=0$;
$\beta_i(\alpha_i-C)=0$.
Case 1. When $\alpha_i=0$:
From $\delta_i\alpha_i=0$ and $\alpha_i=0$, $\delta_i$ is not forced to zero, so only $\delta_i\geq0$ remains.
From $\beta_i(\alpha_i-C)=0$ and $\alpha_i=0$ we get $\beta_i(-C)=0$; since $C>0$, $\beta_i=0$.
Substituting into $h_{\alpha_i}=g(\alpha,\mu)-\delta_i+\beta_i=0$ gives
$h_{\alpha_i}=g(\alpha,\mu)-\delta_i+0=0$, so
$g(\alpha,\mu)=\delta_i\geq0$.
That is, when $\alpha_i=0$, $g(\alpha,\mu)\geq0$.
Case 2. When $0<\alpha_i<C$:
From $\delta_i\alpha_i=0$ and $\alpha_i\neq0$, we get $\delta_i=0$.
From $\beta_i(\alpha_i-C)=0$ and $0<\alpha_i<C$, we have $\alpha_i-C\neq0$, so $\beta_i=0$.
Substituting into $h_{\alpha_i}=g(\alpha,\mu)-\delta_i+\beta_i=0$ gives
$h_{\alpha_i}=g(\alpha,\mu)-0+0=0$, i.e.
$g(\alpha,\mu)=0$.
That is, when $0<\alpha_i<C$, $g(\alpha,\mu)=0$.
Case 3. When $\alpha_i=C$:
From $\delta_i\alpha_i=0$ and $\alpha_i=C>0$, we get $\delta_i=0$.
From $\beta_i(\alpha_i-C)=0$ and $\alpha_i=C$, the factor $\alpha_i-C$ is already $0$, so $\beta_i$ is only required to satisfy $\beta_i\geq0$.
Substituting into $h_{\alpha_i}=g(\alpha,\mu)-\delta_i+\beta_i=0$ gives
$h_{\alpha_i}=g(\alpha,\mu)-0+\beta_i=0$, so
$g(\alpha,\mu)=-\beta_i\leq0$.
That is, when $\alpha_i=C$, $g(\alpha,\mu)\leq0$.
IV. The KKT conditions of the dual problem (with Lagrangian $h(\alpha)$) are as follows, for each $i$:
$$\begin{array}{ll} \alpha_i=0, & g(\alpha,\mu)\geq0\\ 0<\alpha_i<C, & g(\alpha,\mu)=0\\ \alpha_i=C, & g(\alpha,\mu)\leq0 \end{array}$$
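The three-case summary translates directly into a check on a candidate solution, the kind of optimality test SMO-style decomposition solvers rely on when choosing which variables to update. This is a sketch only: the tolerance `tol`, the function names, and the toy data are assumptions.

```python
import numpy as np

def g_values(alpha, Q, y, mu):
    """g_i = (Q alpha)_i - 1 - mu * y_i for every i."""
    return Q @ alpha - 1.0 - mu * y

def kkt_violations(alpha, g, C, tol=1e-6):
    """Indices i that break the summary:
    alpha_i = 0 -> g_i >= 0;  0 < alpha_i < C -> g_i = 0;  alpha_i = C -> g_i <= 0.
    """
    at_lower = alpha <= tol
    at_upper = alpha >= C - tol
    free = ~at_lower & ~at_upper
    bad = ((at_lower & (g < -tol))
           | (at_upper & (g > tol))
           | (free & (np.abs(g) > tol)))
    return np.flatnonzero(bad)

# Hypothetical two-point data: x1 = (1, 0) with y = +1, x2 = (-1, 0) with y = -1.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
Q = np.outer(y, y) * (X @ X.T)      # linear kernel: every entry equals 1 here
C = 10.0

good = np.array([0.5, 0.5])
print(kkt_violations(good, g_values(good, Q, y, mu=0.0), C))   # no violations
bad = np.array([0.2, 0.2])
print(kkt_violations(bad, g_values(bad, Q, y, mu=0.0), C))     # both indices violate
```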
The function $g(\alpha,\mu)$ used above expands as
$$\begin{array}{l} g(\alpha,\mu)=(Q\alpha)_i-1-\mu y_i\\ \qquad\quad~~=\sum_{j=1}^n y_iy_j(x_i^Tx_j)\alpha_j-1-y_i\mu\\ \qquad\quad~~=y_i\left(\sum_{j=1}^n y_j(x_i^Tx_j)\alpha_j-\mu\right)-1 \end{array}$$
Some papers instead define $f(x_i)=\sum_{j=1}^n y_j(x_i^Tx_j)\alpha_j-\mu$, so that
$$g(\alpha,\mu)=y_if(x_i)-1.$$
Since $w=\sum_j\alpha_jy_jx_j$, this is $f(x_i)=w^Tx_i-\mu$; when some $0<\alpha_i<C$ exists, comparing $g(\alpha,\mu)=0$ with the active margin condition $y_i(w^Tx_i+b)=1$ shows $b=-\mu$, so $f$ is the decision function $w^Tx+b$.
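The identity $g(\alpha,\mu)=y_if(x_i)-1$ can be confirmed numerically. In the sketch below the kernel is passed in as a function (defaulting to the linear kernel), so an RBF kernel could be substituted; the random data, the value of `mu`, and the function names are assumptions.

```python
import numpy as np

def decision_values(alpha, X, y, mu, kernel=lambda A, B: A @ B.T):
    """f(x_i) = sum_j alpha_j y_j K(x_i, x_j) - mu for every training point x_i."""
    K = kernel(X, X)
    return K @ (alpha * y) - mu

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 2))
y = np.array([1.0, -1.0, 1.0, -1.0, 1.0])
alpha = rng.uniform(size=5)
mu = 0.4

Q = np.outer(y, y) * (X @ X.T)                           # linear-kernel Q
g_from_Q = Q @ alpha - 1.0 - mu * y                      # (Q alpha)_i - 1 - mu y_i
g_from_f = y * decision_values(alpha, X, y, mu) - 1.0    # y_i f(x_i) - 1
print(np.allclose(g_from_Q, g_from_f))
```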
Then we have