Introduction to SVR

Support vector machine (SVM) analysis is a popular machine learning tool for classification and regression, first introduced by Vladimir Vapnik and his colleagues in 1992. Linear epsilon-insensitive SVM (ε-SVM) regression is also known as $L_1$ loss regression. In ε-SVM regression, the training set consists of predictor variables and observed response values. The goal is to find a function $f(x)$ that deviates from each response $y_n$ by no more than ε for every training point $x_n$, and that is at the same time as flat as possible.

Linear SVM Regression: Primal Formula

The primal SVR problem:
$$
\begin{aligned}
&\min_{\omega,b}\ \frac{1}{2}\|\omega\|^{2}+C\sum_{k=1}^{N} L_{\epsilon}\big(f(x_{k})-y_{k}\big) \\
&L_{\epsilon}(z)=\begin{cases} 0, & \text{if } |z|<\epsilon \\ |z|-\epsilon, & \text{otherwise} \end{cases}
\end{aligned}
$$
Introducing slack variables $\xi_{k},\xi_{k}^{*}$, the problem can be rewritten as
$$
\begin{aligned}
\min\ J(\omega)&=\frac{1}{2}\omega^{T}\omega+C\sum_{k=1}^{N}(\xi_{k}+\xi_{k}^{*}) \\
\text{s.t.}\quad & y_{k}-(x_{k}^{T}\omega+b)\leq \epsilon+\xi_{k} \\
& (x_{k}^{T}\omega+b)-y_{k}\leq \epsilon+\xi_{k}^{*} \\
& \xi_{k}\geq 0,\quad \xi_{k}^{*}\geq 0
\end{aligned}
$$
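The ε-insensitive loss $L_{\epsilon}$ that drives this objective can be sketched in a few lines of plain Python. The function name `eps_insensitive_loss` and the sample values are illustrative, not from the original text:

```python
def eps_insensitive_loss(z: float, eps: float) -> float:
    """Return 0 if |z| < eps, otherwise |z| - eps."""
    return max(0.0, abs(z) - eps)

# Residuals inside the eps-tube incur no penalty; outside it the penalty grows linearly.
print(eps_insensitive_loss(0.3, 0.5))  # inside the tube -> 0.0
print(eps_insensitive_loss(2.0, 0.5))  # outside the tube -> 1.5
```

This linear growth outside the tube is what makes the ε-insensitive loss more robust to outliers than a squared loss.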

Linear SVM Regression: Dual Formula

$$
\begin{aligned}
L(\omega,b,\xi,\xi^{*},\alpha,\alpha^{*},\mu,\mu^{*}) ={}& \frac{1}{2}\|\omega\|^{2}+C\sum_{i=1}^{N}(\xi_{i}+\xi_{i}^{*})-\sum_{i=1}^{N}\mu_{i}\xi_{i}-\sum_{i=1}^{N}\mu_{i}^{*}\xi_{i}^{*} \\
&+\sum_{i=1}^{N}\alpha_{i}\big(y_{i}-f(x_{i})-\epsilon-\xi_{i}\big)+\sum_{i=1}^{N}\alpha_{i}^{*}\big(f(x_{i})-y_{i}-\epsilon-\xi_{i}^{*}\big) \\
\text{s.t.}\quad & \alpha_{i}\geq 0,\ \alpha_{i}^{*}\geq 0,\quad \mu_{i}\geq 0,\ \mu_{i}^{*}\geq 0
\end{aligned}
\tag{3}
$$

KKT stationarity conditions (partial derivatives set to zero)

$$
\begin{aligned}
\frac{\partial L}{\partial \omega} &= \omega-\sum_{i=1}^{N}\alpha_{i}x_{i}+\sum_{i=1}^{N}\alpha_{i}^{*}x_{i}=0 \\
\frac{\partial L}{\partial b} &= -\sum_{i=1}^{N}\alpha_{i}+\sum_{i=1}^{N}\alpha_{i}^{*}=0 \\
\frac{\partial L}{\partial \xi_{i}} &= C-\mu_{i}-\alpha_{i}=0 \\
\frac{\partial L}{\partial \xi_{i}^{*}} &= C-\mu_{i}^{*}-\alpha_{i}^{*}=0
\end{aligned}
\tag{4}
$$
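The first stationarity condition gives $\omega=\sum_{i}(\alpha_{i}-\alpha_{i}^{*})x_{i}$, which is easy to sketch directly (1-D features for brevity; the name `recover_w` is illustrative):

```python
def recover_w(alpha, alpha_star, x):
    """w = sum_i (alpha_i - alpha*_i) * x_i, from dL/dw = 0 (scalar features)."""
    return sum((alpha[i] - alpha_star[i]) * x[i] for i in range(len(x)))

# Two toy samples whose contributions cancel exactly:
print(recover_w([1.0, 0.0], [0.0, 0.5], [2.0, 4.0]))  # 1*2 + (-0.5)*4 = 0.0
```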
Substituting these stationarity conditions into Eq. (3) gives

$$
\begin{aligned}
L ={}& \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}(\alpha_{i}-\alpha_{i}^{*})(\alpha_{j}-\alpha_{j}^{*})x_{i}^{T}x_{j}+C\sum_{i=1}^{N}(\xi_{i}+\xi_{i}^{*})-\sum_{i=1}^{N}\mu_{i}\xi_{i}-\sum_{i=1}^{N}\mu_{i}^{*}\xi_{i}^{*} \\
&+\sum_{i=1}^{N}y_{i}(\alpha_{i}-\alpha_{i}^{*})-\sum_{i=1}^{N}\epsilon(\alpha_{i}+\alpha_{i}^{*})-\sum_{i=1}^{N}\alpha_{i}\xi_{i}-\sum_{i=1}^{N}\alpha_{i}^{*}\xi_{i}^{*}-\sum_{i=1}^{N}(\alpha_{i}-\alpha_{i}^{*})x_{i}^{T}\omega-\sum_{i=1}^{N}(\alpha_{i}-\alpha_{i}^{*})b \\
={}& -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}(\alpha_{i}-\alpha_{i}^{*})(\alpha_{j}-\alpha_{j}^{*})x_{i}^{T}x_{j}+\sum_{i=1}^{N}y_{i}(\alpha_{i}-\alpha_{i}^{*})-\sum_{i=1}^{N}\epsilon(\alpha_{i}+\alpha_{i}^{*})
\end{aligned}
$$
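The simplified dual objective above can be evaluated numerically. A minimal pure-Python sketch with scalar features (the name `dual_objective` and the toy inputs are illustrative):

```python
def dual_objective(alpha, alpha_star, x, y, eps):
    """-1/2 sum_ij b_i b_j x_i x_j + sum_i y_i b_i - eps sum_i (a_i + a*_i),
    where b_i = alpha_i - alpha*_i (scalar features)."""
    n = len(x)
    beta = [alpha[i] - alpha_star[i] for i in range(n)]
    quad = sum(beta[i] * beta[j] * x[i] * x[j]
               for i in range(n) for j in range(n))
    linear = sum(y[i] * beta[i] for i in range(n))
    penalty = eps * sum(alpha[i] + alpha_star[i] for i in range(n))
    return -0.5 * quad + linear - penalty

# All multipliers zero -> objective is zero:
print(dual_objective([0.0, 0.0], [0.0, 0.0], [1.0, 2.0], [1.0, 2.0], 0.1))
```

Maximizing this objective over $0\leq\alpha_i,\alpha_i^*\leq C$ (with $\sum_i(\alpha_i-\alpha_i^*)=0$) is the dual problem a QP solver actually solves.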

The remaining KKT conditions:
$$
\begin{aligned}
&\alpha_{i}\big(y_{i}-f(x_{i})-\epsilon-\xi_{i}\big)=0 \\
&\alpha_{i}^{*}\big(f(x_{i})-y_{i}-\epsilon-\xi_{i}^{*}\big)=0 \\
&\mu_{i}\xi_{i}=0 \implies (C-\alpha_{i})\xi_{i}=0 \\
&\mu_{i}^{*}\xi_{i}^{*}=0 \implies (C-\alpha_{i}^{*})\xi_{i}^{*}=0 \\
&\alpha_{i}\geq 0,\ \alpha_{i}^{*}\geq 0,\quad \mu_{i}\geq 0,\ \mu_{i}^{*}\geq 0
\end{aligned}
$$
Solving the KKT conditions yields the SVR solution:
$$
f(x)=\omega^{T}x+b=\sum_{i=1}^{N}(\alpha_{i}-\alpha_{i}^{*})x_{i}^{T}x+b
$$
The coefficients $\alpha_{i}$ and $\alpha_{i}^{*}$ can take nonzero values only for samples that do not fall inside the ε-insensitive tube. The samples with $\alpha_{i}-\alpha_{i}^{*}\neq 0$ in the expression above are the support vectors of SVR; they lie outside the ε-tube.
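The decision function $f(x)=\sum_{i}(\alpha_{i}-\alpha_{i}^{*})x_{i}^{T}x+b$ can be sketched directly; only support vectors (nonzero $\alpha_{i}-\alpha_{i}^{*}$) contribute. Names and toy values are illustrative:

```python
def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def svr_predict(x, support_x, alpha, alpha_star, b):
    """f(x) = sum_i (alpha_i - alpha*_i) * x_i^T x + b."""
    return sum((alpha[i] - alpha_star[i]) * dot(support_x[i], x)
               for i in range(len(support_x))) + b

# Two support vectors with coefficients +2 and -1, bias 0.5:
print(svr_predict([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]],
                  [2.0, 0.0], [0.0, 1.0], 0.5))  # 2*1 + (-1)*1 + 0.5 = 1.5
```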

After obtaining the $\alpha_{i}$, if $0<\alpha_{i}<C$ for some sample, then necessarily $\xi_{i}=0$, and hence
$$
b=y_{i}-\epsilon-\omega^{T}x_{i}
$$
In practice, a more robust approach is to select several (or all) samples satisfying $0<\alpha_{i}<C$, solve for $b$ from each, and take the average.
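The averaging scheme just described can be sketched as follows (scalar features; the name `estimate_b` and the tolerance are illustrative):

```python
def estimate_b(alpha, x, y, w, eps, C, tol=1e-8):
    """Average b = y_i - eps - w*x_i over samples with 0 < alpha_i < C.

    For those samples xi_i = 0, so the margin equality holds exactly.
    """
    vals = [y[i] - eps - w * x[i]
            for i in range(len(x)) if tol < alpha[i] < C - tol]
    return sum(vals) / len(vals)

# Both samples strictly inside (0, C): b = ((3-1) + (5-2)) / 2 = 2.5
print(estimate_b([0.5, 0.5], [1.0, 2.0], [3.0, 5.0], w=1.0, eps=0.0, C=1.0))
```

Averaging damps numerical error from the QP solver: any single marginal sample may satisfy the equality only approximately.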

Introducing a kernel function, the solution becomes
$$
f(x)=\sum_{i=1}^{N}(\alpha_{i}-\alpha_{i}^{*})\phi(x_{i})^{T}\phi(x)+b
$$
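In the kernelized form, $\phi(x_{i})^{T}\phi(x)$ is replaced by a kernel evaluation $k(x_{i},x)$. A sketch using the Gaussian (RBF) kernel as one common choice; the kernel choice, `gamma` value, and names are illustrative, not prescribed by the text:

```python
import math

def rbf_kernel(u, v, gamma=1.0):
    """k(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

def kernel_predict(x, support_x, alpha, alpha_star, b, gamma=1.0):
    """f(x) = sum_i (alpha_i - alpha*_i) * k(x_i, x) + b."""
    return sum((alpha[i] - alpha_star[i]) * rbf_kernel(support_x[i], x, gamma)
               for i in range(len(support_x))) + b

print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))          # k(x, x) = 1.0
print(kernel_predict([0.0], [[0.0]], [1.0], [0.0], 0.0))  # 1 * k(0,0) + 0 = 1.0
```

Because $\phi$ appears only through inner products, the feature map never needs to be computed explicitly (the kernel trick).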

References

https://www.mathworks.com/help/stats/understanding-support-vector-machine-regression.html
