【Gurobi】Derivation of the SVM Dual Form, and Solving the Dual-Form SVM with Gurobi

Derivation of the SVM Dual Form

The SVM primal problem:

$$\min_{w,b}\ \frac{1}{2}\|w\|_2^2$$

$$s.t.\ \ y_i\left(w^T x_i + b\right) \ge 1$$

First rewrite the inequality in the standard $\le 0$ form:
$$1 - y_i\left(w^T x_i + b\right) \le 0$$

Form the Lagrangian min-max problem:

$$\min_{w,b}\left\{\max_{\alpha}\left\{\frac{1}{2}\|w\|_2^2 + \sum_{i=1}^{m}\alpha_i\left(1 - y_i\left(w^T x_i + b\right)\right)\right\}\right\}$$

$$s.t.\ \ \alpha_i \ge 0$$

Why does an extra max over $\alpha$ appear?
It is there to absorb the inequality constraints: each $\alpha_i$ acts like a penalty factor on $\left(1-y_i\left(w^Tx_i+b\right)\right)$. We want $\left(1-y_i\left(w^Tx_i+b\right)\right)\le 0$ while $\alpha_i \ge 0$. Taking the max of their product over $\alpha$ forces every $\left(1-y_i\left(w^Tx_i+b\right)\right)$ to be $\le 0$ at the optimum: if any of them were $> 0$, the max over $\alpha_i$ would be $+\infty$, and since the outer min will never pick a point whose value is $+\infty$, the constraints must hold. This is how Lagrangian duality folds the inequality constraints into the objective.
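A quick numerical illustration of this penalty view (a minimal sketch; `g` plays the role of a single constraint value $1-y_i(w^Tx_i+b)$, and the finite grid of `alphas` only approximates the supremum):

import numpy as np

# sup over alpha >= 0 of alpha * g:
#   g <= 0  ->  best alpha is 0, penalty is 0   (feasible point, nothing added)
#   g >  0  ->  alpha can grow without bound    (infeasible point, penalty -> +inf)
def penalty(g, alphas=np.linspace(0, 1e6, 11)):
    return np.max(alphas * g)

print(penalty(-0.5))   # 0.0      : constraint satisfied, no penalty
print(penalty(0.5))    # 500000.0 : grows with the grid, unbounded in the limit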

Denote the bracketed expression by $L(w,b,\alpha)$ (the Lagrangian), then swap the positions of min and max; this problem is a convex QP, so strong duality holds and the swap does not change the optimal value:

$$\max_{\alpha}\min_{w,b}\ L(w,b,\alpha) = \max_{\alpha}\left\{\min_{w,b}\left\{\frac{1}{2}\|w\|_2^2 + \sum_{i=1}^{m}\alpha_i\left(1 - y_i\left(w^T x_i + b\right)\right)\right\}\right\}$$

The inner min can be removed by differentiating with respect to $w$ and $b$ and setting the derivatives to 0. Differentiate the expression inside the min:
$$f = \frac{1}{2}\|w\|_2^2 + \sum_{i=1}^{m}\alpha_i\left(1 - y_i\left(w^T x_i + b\right)\right) = \frac{1}{2}w^T w + \sum_{i=1}^{m}\alpha_i\left(1 - y_i\left(w^T x_i + b\right)\right)$$

Taking derivatives:

$$\frac{\partial f}{\partial w} = w - \sum_{i=1}^{m}\alpha_i y_i x_i$$

$$\frac{\partial f}{\partial b} = -\sum_{i=1}^{m}\alpha_i y_i$$

$$\begin{cases}\dfrac{\partial f}{\partial w} = 0\\[4pt] \dfrac{\partial f}{\partial b} = 0\end{cases}\ \Rightarrow\ \begin{cases}w = \sum_{i=1}^{m}\alpha_i y_i x_i\\[4pt] 0 = \sum_{i=1}^{m}\alpha_i y_i\end{cases}$$
Substituting the solution back to eliminate $w$:
$$f = \frac{1}{2}\left(\sum_{i=1}^{m}\alpha_i y_i x_i\right)^T\left(\sum_{j=1}^{m}\alpha_j y_j x_j\right) + \sum_{i=1}^{m}\alpha_i\left(1 - y_i\left(\left(\sum_{j=1}^{m}\alpha_j y_j x_j\right)^T x_i + b\right)\right)$$

$$= \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j x_i^T x_j + \sum_{i=1}^{m}\alpha_i - \sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j x_i^T x_j - \underbrace{\sum_{i=1}^{m}\alpha_i y_i b}_{=\,0}$$

$$= \sum_{i=1}^{m}\alpha_i - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j x_i^T x_j$$

This is the SVM dual form:

$$\max_{\alpha}\left\{\sum_{i=1}^{m}\alpha_i - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j x_i^T x_j\right\}$$

$$s.t.\ \ \sum_{i=1}^{m}\alpha_i y_i = 0,\quad \alpha_i \ge 0$$
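Before handing this to a solver, the algebra can be sanity-checked numerically: for any $\alpha$ with $\sum_i \alpha_i y_i = 0$ and $w$ at the stationary point, the Lagrangian must equal the reduced dual objective. A minimal sketch with random toy data (all names here are illustrative):

import numpy as np

rng = np.random.default_rng(0)
m, d = 6, 2
X = rng.normal(size=(m, d))
y = np.array([1, 1, 1, -1, -1, -1])

# random alpha, projected so that sum_i alpha_i * y_i == 0
alpha = rng.uniform(0, 1, m)
alpha -= y * (alpha @ y) / m          # works because y_i in {-1,+1} implies y @ y == m

w = (alpha * y) @ X                   # stationary point: w = sum_i alpha_i y_i x_i
b = 0.123                             # arbitrary: its coefficient sum_i alpha_i y_i is 0

lagrangian = 0.5 * w @ w + np.sum(alpha * (1 - y * (X @ w + b)))
dual = np.sum(alpha) - 0.5 * np.sum(np.outer(alpha * y, alpha * y) * (X @ X.T))
print(np.isclose(lagrangian, dual))   # True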

So how do we solve this? It is a quadratic program, so it can be handed to optimization software such as Gurobi.
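In matrix form, with $Q_{ij} = y_i y_j x_i^T x_j$ (a positive semidefinite matrix), the dual reads
$$\max_{\alpha}\ \mathbf{1}^T\alpha - \frac{1}{2}\alpha^T Q \alpha \qquad s.t.\ \ y^T\alpha = 0,\ \alpha \ge 0,$$
which is exactly the standard QP shape that solvers like Gurobi accept.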

Note: in the Gurobi code below, the range of each $\alpha_i$ is set to $(0, 1000)$, which incidentally turns this into a soft-margin SVM:

Hard-margin SVM dual: $0 \le \alpha_i$
Soft-margin SVM dual: $0 \le \alpha_i \le C$

The range of $\alpha_i$ is exactly the only difference between the hard-margin and soft-margin dual forms (see the sketch below).
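In Gurobi terms the two variants differ only in the variable upper bound. A sketch of the two `addVar` calls, meant to replace the one inside the variable-creation loop further down (`C` is the soft-margin parameter, introduced here):

# hard margin: alpha_i has no upper bound
a[i] = model.addVar(0, GRB.INFINITY, vtype=GRB.CONTINUOUS, name=name)
# soft margin with parameter C: alpha_i is capped at C
a[i] = model.addVar(0, C, vtype=GRB.CONTINUOUS, name=name)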

Solving the SVM Dual Form with Gurobi

from gurobipy import *
import numpy as np
# import iris
from sklearn.datasets import load_iris
iris = load_iris()
X_raw = iris.data
y_raw = iris.target

Turn iris into a binary classification dataset, and keep only the first two feature dimensions of x so the result is easy to visualize:

# keep iris classes 0 and 1 as the training data, first two features only
X = X_raw[y_raw != 2][:, :2]
y = y_raw[y_raw != 2]
# convert y==0 to y==-1
y[y==0] = -1
y
array([-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
       -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
       -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1])
X.shape,y.shape
((100, 2), (100,))
# plot X, y with different colors and markers for the two classes
import matplotlib.pyplot as plt
plt.scatter(X[y==-1,0], X[y==-1,1], c='r', marker='o', label='class -1')
plt.scatter(X[y==1,0], X[y==1,1], c='b', marker='x', label='class +1')

[Figure: scatter plot of the two classes]

Build the Gurobi model:

model = Model()
m = X.shape[0]

Define the dual variables $\alpha_i,\ i = 1,\dots,m$:

a = {}
for i in range(m):
    name = 'a_' + str(i) + '_'
    # bounds 0 <= alpha_i <= 1000, i.e. soft margin with C = 1000
    a[i] = model.addVar(0, 1000, vtype=GRB.CONTINUOUS, name=name)

Set the objective (to be maximized): $$obj = \sum_{i=1}^{m}\alpha_i - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j x_i^T x_j$$

obj = LinExpr(0)
# linear part: sum_i alpha_i
for i in range(m):
    obj.addTerms(1, a[i])

# quadratic part: -1/2 * sum_ij alpha_i alpha_j y_i y_j x_i^T x_j
for i in range(m):
    for j in range(m):
        obj = obj + (-0.5 * y[i] * y[j] * (X[i] @ X[j]) * a[i] * a[j])

model.setObjective(obj,GRB.MAXIMIZE)
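Building the quadratic part by repeated `obj + ...` is slow for large m; gurobipy's `quicksum` builds the same objective more idiomatically (an equivalent alternative, assuming the wildcard import above):

obj = quicksum(a[i] for i in range(m)) \
    - 0.5 * quicksum(y[i] * y[j] * (X[i] @ X[j]) * a[i] * a[j]
                     for i in range(m) for j in range(m))
model.setObjective(obj, GRB.MAXIMIZE)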

Add the constraint $\sum_{i=1}^{m}\alpha_i y_i = 0$:

lhs = LinExpr(0)
for i in range(m):
    lhs.addTerms(y[i], a[i])   # accumulate sum_i y_i * alpha_i
model.addConstr(lhs == 0)
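Equivalently, as a `quicksum` one-liner:

model.addConstr(quicksum(y[i] * a[i] for i in range(m)) == 0)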

Add the constraints $\alpha_i \ge 0$ (strictly redundant here, since the variables were created with lower bound 0):

for i in range(m):
    lhs = LinExpr(0)
    lhs.addTerms(1, a[i])
    model.addConstr(lhs >= 0)
# set a 60-second solver time limit
model.Params.TimeLimit = 60

Solve:

model.optimize()

Gurobi Optimizer version 9.5.1 build v9.5.1rc2 (win64)
Thread count: 8 physical cores, 16 logical processors, using up to 16 threads
Optimize a model with 101 rows, 100 columns and 101 nonzeros
Model fingerprint: 0x531e14cb
Model has 5050 quadratic objective terms
Coefficient statistics:
  Matrix range     [1e+00, 1e+00]
  Objective range  [1e+00, 1e+00]
  QObjective range [3e+01, 1e+02]
  Bounds range     [1e+03, 1e+03]
  RHS range        [0e+00, 0e+00]
Presolve removed 101 rows and 1 columns
Presolve time: 0.00s
Presolved: 0 rows, 99 columns, 0 nonzeros
Presolved model has 4950 quadratic objective terms
Ordering time: 0.00s

Barrier statistics:
 Free vars  : 2
 AA' NZ     : 1.000e+00
 Factor NZ  : 3.000e+00
 Factor Ops : 5.000e+00 (less than 1 second per iteration)
 Threads    : 1

                  Objective                Residual
...

Barrier solved model in 14 iterations and 0.01 seconds (0.00 work units)
Optimal objective 3.98490889e+03
for v in model.getVars():
    print('%s %g' % (v.varName, v.x))
print('Obj: %g' % model.objVal)
a_0_ 8.66013e-11
a_1_ 192.72
a_2_ 1.11011e-10
a_3_ 1.3588e-10
a_4_ 5.60845e-11
a_5_ 4.88591e-11
a_6_ 5.23398e-11
a_7_ 1.00946e-10
a_8_ 2.46651e-10
a_9_ 7.19769e-10
a_10_ 8.36457e-11
a_11_ 6.78189e-11
a_12_ 3.88936e-09
a_13_ 9.53111e-11
a_14_ 6.77632e-11
a_15_ 3.0376e-11
a_16_ 4.88591e-11
a_17_ 8.66013e-11
a_18_ 1.14911e-10
a_19_ 4.23708e-11
a_20_ 6.43613e-10
a_21_ 5.11846e-11
a_22_ 3.9366e-11
a_23_ 2.84358e-10
a_24_ 6.78189e-11
...
a_97_ 4.33557e-11
a_98_ 7.59785e-11
a_99_ 0
Obj: 3984.91

Inspect all nonzero $\alpha$ values:

for v in model.getVars():
    if v.x > 1e-8:
        print('%s %g' % (v.varName, v.x))
print('Obj: %g' % model.objVal)
a_1_ 192.72
a_25_ 1000
a_41_ 1000
a_84_ 823.764
a_85_ 1000
Obj: 3984.91

How do we recover $b$? We only need to know which points are the support vectors 👇

https://stats.stackexchange.com/questions/362046/how-is-bias-term-calculated-in-svms

[Figure: excerpt from the linked answer on computing the bias term]

Collect all the support vectors:

# find the support vectors x_i (those whose alpha_i is nonzero), keeping their indices
sv_idx = []
support_vectors = []
for i in range(m):
    if a[i].x > 1e-8:
        sv_idx.append(i)
        support_vectors.append(X[i])
support_vectors
[array([4.9, 3. ]),
 array([5., 3.]),
 array([4.5, 2.3]),
 array([5.4, 3. ]),
 array([6. , 3.4])]

Compute $w = \sum_{i=1}^{m}\alpha_i y_i x_i$ first (the formula for $b$ below needs it):

w = np.zeros(2)
# w = sum_i alpha_i * y_i * x_i, using the optimal alpha values
for i, v in enumerate(model.getVars()):
    w += v.x * y[i] * X[i]

Then compute $b = y_s - w^T x_s$ from the support vectors, averaging over them for numerical stability:

# b = y_i - w^T x_i for each support vector, then averaged
b = 0
for i in sv_idx:
    b += y[i] - (w @ X[i])
b /= len(sv_idx)
w, b
(array([ 4.        , -6.86666667]), -1.451999999981188)
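One caveat: with a soft margin, support vectors whose $\alpha_i$ sits at the upper bound $C$ may lie inside the margin, so strictly only the free support vectors ($0 < \alpha_i < C$) satisfy $y_i(w^Tx_i+b)=1$ exactly. A more careful sketch averages $b$ over those only (here $C=1000$, matching the variable bounds above):

# average b over the free support vectors only: 0 < alpha_i < C
C = 1000
free_idx = [i for i in range(m) if 1e-8 < a[i].x < C - 1e-8]
b_free = np.mean([y[i] - w @ X[i] for i in free_idx])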

Plot:

# plot the separating line w^T x + b = 0 together with the data
xx = np.linspace(3, 7, 100)
yy = -(w[0]*xx + b) / w[1]
plt.plot(xx, yy, 'r')
# plot the dataset
plt.scatter(X[y==-1,0], X[y==-1,1], c='r', marker='o', label='class -1')
plt.scatter(X[y==1,0], X[y==1,1], c='b', marker='x', label='class +1')
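Optionally, the margin boundaries $w^Tx+b=\pm1$ and the support vectors can be added to the same plot (a small sketch reusing the variables above):

# dashed margin boundaries w^T x + b = +1 and w^T x + b = -1
plt.plot(xx, -(w[0]*xx + b - 1) / w[1], 'r--', alpha=0.5)
plt.plot(xx, -(w[0]*xx + b + 1) / w[1], 'r--', alpha=0.5)
# circle the support vectors
sv = np.array(support_vectors)
plt.scatter(sv[:, 0], sv[:, 1], s=120, facecolors='none', edgecolors='k', label='support vectors')
plt.legend()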

C=1000

[Figure: decision boundary and data, C=1000]

C=1

[Figure: decision boundary and data, C=1]

C=0.2

[Figure: decision boundary and data, C=0.2]
Observed conclusion: recall the regularized risk formula
$$\min_f\ \Omega(f) + C\sum_{i=1}^{m}\ell\left(f(x_i),\ y_i\right)$$

The left term is the structural risk (guards against overfitting); the right term is the empirical risk (fits the training set).

As $C$ decreases, the model fits the training set less and less (behaving more and more like a soft-margin SVM).
Here $C$ is adjusted through the bounds on $\alpha_i$ in the Gurobi model setup:

a = {}
for i in range(m):
    name = 'a_' + str(i) + '_'
    a[i] = model.addVar(0, 1000, vtype=GRB.CONTINUOUS, name=name)  # adjust the upper bound here (C = 1000)
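To reproduce the three plots above, it is convenient to wrap the whole build into a function of C. A minimal sketch assembling the steps from this post (`train_svm_dual` is a name introduced here, not from the original code):

def train_svm_dual(X, y, C):
    """Solve the soft-margin SVM dual with Gurobi and return (w, b)."""
    m = X.shape[0]
    model = Model()
    model.Params.OutputFlag = 0       # silence the solver log
    a = model.addVars(m, lb=0, ub=C, vtype=GRB.CONTINUOUS, name='a')
    obj = quicksum(a[i] for i in range(m)) \
        - 0.5 * quicksum(y[i] * y[j] * (X[i] @ X[j]) * a[i] * a[j]
                         for i in range(m) for j in range(m))
    model.setObjective(obj, GRB.MAXIMIZE)
    model.addConstr(quicksum(y[i] * a[i] for i in range(m)) == 0)
    model.optimize()
    alpha = np.array([a[i].x for i in range(m)])
    w = (alpha * y) @ X               # w = sum_i alpha_i y_i x_i
    sv = [i for i in range(m) if alpha[i] > 1e-8]
    b = np.mean([y[i] - w @ X[i] for i in sv])
    return w, b

w, b = train_svm_dual(X, y, C=0.2)    # e.g. for the C=0.2 plot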