Support vector machine (SVM) analysis is a popular machine learning tool for classification and regression, first introduced by Vladimir Vapnik and his colleagues in 1992. Linear epsilon-insensitive SVM (ε-SVM) regression is also referred to as $L_1$-loss SVM regression. In ε-SVM regression, the training data consist of predictor variables and observed response values. The goal is to find a function $f(x)$ that deviates from $y_n$ by no more than ε for each training point $x_n$, and at the same time is as flat as possible.
Linear SVM Regression: Primal Formula
The original SVR problem:
$$
\begin{array}{l}
\min_{\omega,b}\ \dfrac{1}{2}\|\omega\|^{2}+C\sum_{k=1}^{N} L_{\epsilon}\bigl(f(x_{k})-y_{k}\bigr) \\[6pt]
L_{\epsilon}(z)=\left\{
\begin{array}{ll}
0, & \text{if } |z|<\epsilon \\
|z|-\epsilon, & \text{otherwise}
\end{array}
\right.
\end{array}
$$
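For concreteness, here is a minimal NumPy sketch of the ε-insensitive loss and the unconstrained primal objective above, assuming a linear model $f(x)=x^{T}\omega+b$; the function names are illustrative only, not from the reference.

```python
import numpy as np

def eps_insensitive_loss(z, eps):
    """L_epsilon(z): zero inside the epsilon tube, |z| - eps outside."""
    return np.maximum(0.0, np.abs(z) - eps)

def primal_objective(w, b, X, y, C, eps):
    """(1/2)||w||^2 + C * sum_k L_epsilon(f(x_k) - y_k) for f(x) = X @ w + b."""
    residuals = X @ w + b - y
    return 0.5 * w @ w + C * eps_insensitive_loss(residuals, eps).sum()
```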
Introducing slack variables $\xi_{k},\xi_{k}^{*}$, the problem can be rewritten as
$$
\begin{array}{l}
\min\ J(\omega)=\dfrac{1}{2}\omega^{T}\omega+C\sum_{k=1}^{N}(\xi_{k}+\xi_{k}^{*}) \\[6pt]
\text{s.t.}\quad
\begin{array}{l}
y_{k}-(x_{k}^{T}\omega+b)\leq \epsilon+\xi_{k} \\
(x_{k}^{T}\omega+b)-y_{k}\leq \epsilon+\xi_{k}^{*} \\
\xi_{k}\geq 0 \\
\xi_{k}^{*}\geq 0
\end{array}
\end{array}
$$
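The slack formulation is equivalent to the loss formulation: for fixed $(\omega,b)$, the smallest feasible slacks are $\xi_{k}=\max(0,\,y_{k}-f(x_{k})-\epsilon)$ and $\xi_{k}^{*}=\max(0,\,f(x_{k})-y_{k}-\epsilon)$, so $\xi_{k}+\xi_{k}^{*}=L_{\epsilon}(f(x_{k})-y_{k})$. A small NumPy check of this equivalence, with illustrative names and random data:

```python
import numpy as np

def optimal_slacks(w, b, X, y, eps):
    """Smallest feasible slacks of the constrained primal for fixed (w, b)."""
    f = X @ w + b
    xi      = np.maximum(0.0, y - f - eps)   # violation below the tube
    xi_star = np.maximum(0.0, f - y - eps)   # violation above the tube
    return xi, xi_star

rng = np.random.default_rng(0)
X, y = rng.normal(size=(5, 3)), rng.normal(size=5)
w, b, eps = rng.normal(size=3), 0.1, 0.2
xi, xi_star = optimal_slacks(w, b, X, y, eps)
loss = np.maximum(0.0, np.abs(X @ w + b - y) - eps)
assert np.allclose(xi + xi_star, loss)       # both primal forms give the same cost
```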
Linear SVM Regression: Dual Formula
$$
\begin{array}{l}
L(\omega,b,\xi,\xi^{*},\alpha,\alpha^{*},\mu,\mu^{*})=\dfrac{1}{2}\|\omega\|^{2}+C\sum_{i=1}^{N}(\xi_{i}+\xi_{i}^{*})
-\sum_{i=1}^{N}\mu_{i}\xi_{i}-\sum_{i=1}^{N}\mu_{i}^{*}\xi_{i}^{*} \\[6pt]
\qquad +\sum_{i=1}^{N}\alpha_{i}\bigl(y_{i}-f(x_{i})-\epsilon-\xi_{i}\bigr)
+\sum_{i=1}^{N}\alpha_{i}^{*}\bigl(f(x_{i})-y_{i}-\epsilon-\xi_{i}^{*}\bigr) \\[6pt]
\text{s.t.}\quad \alpha_{i}\geq 0,\ \alpha_{i}^{*}\geq 0,\quad \mu_{i}\geq 0,\ \mu_{i}^{*}\geq 0
\end{array}
\tag{3}
$$
KKT stationarity conditions (partial derivatives set to zero):
$$
\begin{array}{l}
\dfrac{\partial L}{\partial \omega}=\omega-\sum_{i=1}^{N}\alpha_{i}x_{i}+\sum_{i=1}^{N}\alpha_{i}^{*}x_{i}=0 \\[6pt]
\dfrac{\partial L}{\partial b}=-\sum_{i=1}^{N}\alpha_{i}+\sum_{i=1}^{N}\alpha_{i}^{*}=0 \\[6pt]
\dfrac{\partial L}{\partial \xi_{i}}=C-\mu_{i}-\alpha_{i}=0 \\[6pt]
\dfrac{\partial L}{\partial \xi_{i}^{*}}=C-\mu_{i}^{*}-\alpha_{i}^{*}=0
\end{array}
\tag{4}
$$
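Equation (4) pins down how the primal variables relate to the dual ones: $\omega=\sum_{i}(\alpha_{i}-\alpha_{i}^{*})x_{i}$, $\sum_{i}\alpha_{i}=\sum_{i}\alpha_{i}^{*}$, and, since $\mu_{i}=C-\alpha_{i}\geq 0$, also $0\leq\alpha_{i}\leq C$ (likewise for $\alpha_{i}^{*}$). A NumPy sketch of these consequences, with illustrative names:

```python
import numpy as np

def stationarity_consequences(alpha, alpha_star, X, C, tol=1e-8):
    """Quantities implied by Eq. (4) for given dual variables alpha, alpha_star."""
    w = X.T @ (alpha - alpha_star)                            # w = sum_i (a_i - a_i^*) x_i
    equality_ok = abs(alpha.sum() - alpha_star.sum()) < tol   # sum a_i == sum a_i^*
    box_ok = bool(np.all((alpha >= -tol) & (alpha <= C + tol)) and
                  np.all((alpha_star >= -tol) & (alpha_star <= C + tol)))
    return w, equality_ok, box_ok
```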
Substituting Eq. (4) into Eq. (3) gives
$$
\begin{array}{l}
L(\omega,b,\xi,\xi^{*},\alpha,\alpha^{*},\mu,\mu^{*})
=\dfrac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}(\alpha_{i}-\alpha_{i}^{*})(\alpha_{j}-\alpha_{j}^{*})x_{i}^{T}x_{j}
+C\sum_{i=1}^{N}(\xi_{i}+\xi_{i}^{*})
-\sum_{i=1}^{N}\mu_{i}\xi_{i}-\sum_{i=1}^{N}\mu_{i}^{*}\xi_{i}^{*} \\[6pt]
\qquad +\sum_{i=1}^{N}y_{i}(\alpha_{i}-\alpha_{i}^{*})
-\sum_{i=1}^{N}\epsilon(\alpha_{i}+\alpha_{i}^{*})
-\sum_{i=1}^{N}\alpha_{i}\xi_{i}-\sum_{i=1}^{N}\alpha_{i}^{*}\xi_{i}^{*}
-\sum_{i=1}^{N}(\alpha_{i}-\alpha_{i}^{*})x_{i}^{T}\omega
-\sum_{i=1}^{N}(\alpha_{i}-\alpha_{i}^{*})b \\[6pt]
=-\dfrac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}(\alpha_{i}-\alpha_{i}^{*})(\alpha_{j}-\alpha_{j}^{*})x_{i}^{T}x_{j}
+\sum_{i=1}^{N}y_{i}(\alpha_{i}-\alpha_{i}^{*})
-\sum_{i=1}^{N}\epsilon(\alpha_{i}+\alpha_{i}^{*})
\end{array}
$$
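The resulting expression depends only on $\alpha$ and $\alpha^{*}$; it is the dual objective to be maximized subject to $\sum_{i}\alpha_{i}=\sum_{i}\alpha_{i}^{*}$ and $0\leq\alpha_{i},\alpha_{i}^{*}\leq C$. A minimal NumPy sketch of evaluating it with a linear kernel (the function name is illustrative):

```python
import numpy as np

def dual_objective(alpha, alpha_star, X, y, eps):
    """-(1/2) d^T K d + y^T d - eps * sum(alpha + alpha_star), with d = alpha - alpha_star."""
    d = alpha - alpha_star
    K = X @ X.T                               # Gram matrix of x_i^T x_j (linear kernel)
    return -0.5 * d @ K @ d + y @ d - eps * (alpha + alpha_star).sum()
```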
The remaining KKT conditions are
$$
\begin{array}{l}
\alpha_{i}\bigl(y_{i}-f(x_{i})-\epsilon-\xi_{i}\bigr)=0 \\[2pt]
\alpha_{i}^{*}\bigl(f(x_{i})-y_{i}-\epsilon-\xi_{i}^{*}\bigr)=0 \\[2pt]
\mu_{i}\xi_{i}=0 \implies (C-\alpha_{i})\xi_{i}=0 \\[2pt]
\mu_{i}^{*}\xi_{i}^{*}=0 \implies (C-\alpha_{i}^{*})\xi_{i}^{*}=0 \\[2pt]
\alpha_{i}\geq 0,\ \alpha_{i}^{*}\geq 0 \\[2pt]
\mu_{i}\geq 0,\ \mu_{i}^{*}\geq 0
\end{array}
$$
Solving the KKT conditions yields the SVR solution
$$
f(x)=\omega^{T}x+b=\sum_{i=1}^{N}(\alpha_{i}-\alpha_{i}^{*})\,x_{i}^{T}x+b
$$
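Given solved dual variables and the intercept, prediction with the linear model is a direct transcription of this formula; a short NumPy sketch with illustrative names:

```python
import numpy as np

def predict_linear_svr(alpha, alpha_star, X_train, b, X_new):
    """f(x) = sum_i (alpha_i - alpha_i^*) x_i^T x + b."""
    w = X_train.T @ (alpha - alpha_star)      # recover the weight vector once
    return X_new @ w + b
```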
Only when a sample does not fall inside the ε-insensitive band can the corresponding $\alpha_{i}$ and $\alpha_{i}^{*}$ take nonzero values. The samples with $\alpha_{i}-\alpha_{i}^{*}\neq 0$ in the expression above are the support vectors of SVR; they lie on or outside the ε-insensitive band.
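In code, the support vectors are simply the samples whose dual coefficients do not cancel (up to numerical tolerance); a one-function sketch:

```python
import numpy as np

def support_vector_indices(alpha, alpha_star, tol=1e-8):
    """Indices i with alpha_i - alpha_i^* != 0; these samples define the SVR solution."""
    return np.flatnonzero(np.abs(alpha - alpha_star) > tol)
```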
Having obtained $\alpha_{i}$, if $0<\alpha_{i}<C$ then necessarily $\xi_{i}=0$, and therefore

$$
b=y_{i}-\epsilon-\omega^{T}x_{i}
$$

In practice, a more robust approach is used: select several (or all) samples satisfying $0<\alpha_{i}<C$, solve for $b$ from each, and take the average.
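A sketch of that averaging in NumPy, assuming the dual variables and training data are available; for samples with $0<\alpha_{i}^{*}<C$ the analogous relation $b=y_{i}+\epsilon-\omega^{T}x_{i}$ is used (names are illustrative):

```python
import numpy as np

def intercept_by_averaging(alpha, alpha_star, X, y, C, eps, tol=1e-8):
    """Average b over all free support vectors (0 < alpha_i < C or 0 < alpha_i^* < C)."""
    w = X.T @ (alpha - alpha_star)
    free      = (alpha > tol) & (alpha < C - tol)            # here xi_i = 0
    free_star = (alpha_star > tol) & (alpha_star < C - tol)  # here xi_i^* = 0
    b_vals = np.concatenate([y[free] - eps - X[free] @ w,
                             y[free_star] + eps - X[free_star] @ w])
    return b_vals.mean()                                     # assumes at least one free SV
```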
Introducing a kernel function gives

$$
f(x)=\sum_{i=1}^{N}(\alpha_{i}-\alpha_{i}^{*})\,\phi(x_{i})^{T}\phi(x)+b
$$
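A minimal sketch of kernelized prediction, here with an RBF kernel as one possible choice of $\phi(x_{i})^{T}\phi(x)=k(x_{i},x)$ (the kernel choice and names are illustrative):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def predict_kernel_svr(alpha, alpha_star, X_train, b, X_new, gamma=1.0):
    """f(x) = sum_i (alpha_i - alpha_i^*) k(x_i, x) + b."""
    K = rbf_kernel(X_train, X_new, gamma)     # shape (N_train, N_new)
    return (alpha - alpha_star) @ K + b
```

In practice the dual problem is usually solved with an off-the-shelf implementation such as scikit-learn's `sklearn.svm.SVR`, which exposes `C`, `epsilon`, and the kernel as hyperparameters.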
References
https://www.mathworks.com/help/stats/understanding-support-vector-machine-regression.html