Support Vector Machine -- Linear Classification with LinearSVC

This post introduces the basic theory of the linear support vector machine (SVM), including the reformulation of its optimization problem and the hinge loss function. It demonstrates binary classification with the sklearn library (model parameters, the decision function, and plotting the decision boundary), then performs multiclass classification of the iris data with the one-vs-rest method, discussing model performance and the meaning of the fitted parameters, and comparing the predicted confidence scores with the true labels.

Theory

The primal optimization problem of the linear support vector machine:

$$\min_{w,b,\xi}\ \frac{1}{2}\|w\|^{2}+C\sum_{i=1}^{N}\xi_{i}$$

$$s.t.\ \ y_{i}(w\cdot x_{i}+b)\ge 1-\xi_{i},\quad i=1,2,\cdots,N$$

$$\xi_{i}\ge 0,\quad i=1,2,\cdots,N$$

It is equivalent to the unconstrained optimization problem:

$$\min_{w,b}\ C\sum_{i=1}^{N}\max\big(0,\,1-y_{i}(w\cdot x_{i}+b)\big)+\frac{1}{2}\|w\|^{2}$$

The first term is the hinge loss; the coefficient C is the penalty parameter, which acts as the inverse of the regularization strength.
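
To make the unconstrained form concrete, here is a minimal numpy sketch that evaluates this objective for a given w and b (the function name hinge_objective is illustrative, not part of sklearn):

import numpy as np
def hinge_objective(w, b, X, y, C=1.0):
    # C * sum(max(0, 1 - y_i*(w . x_i + b))) + 0.5 * ||w||^2
    # X: (N, d) feature matrix; y: (N,) labels in {-1, +1}
    margins = y * (X @ w + b)               # functional margins y_i(w . x_i + b)
    hinge = np.maximum(0.0, 1.0 - margins)  # per-sample hinge loss
    return C * hinge.sum() + 0.5 * np.dot(w, w)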

Implementation with sklearn

Model:
sklearn.svm.LinearSVC(penalty='l2', loss='squared_hinge', *, dual=True, tol=0.0001, C=1.0, multi_class='ovr', fit_intercept=True, intercept_scaling=1, class_weight=None, verbose=0, random_state=None, max_iter=1000)
Main parameters:

  1. penalty: {'l1', 'l2'}, default='l2', the norm used for regularization
  2. loss: {'hinge', 'squared_hinge'}, default='squared_hinge', where squared_hinge is the square of the hinge loss
  3. dual: bool, default=True; prefer dual=False when n_samples > n_features (not every combination of penalty, loss and dual is supported -- see the sketch after this list)
  4. C: float, default=1.0; the strength of the regularization is inversely proportional to C
  5. max_iter: int, default=1000; the maximum number of iterations to be run
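
As a rough guide to the supported combinations (a sketch based on the constraints documented for sklearn's liblinear backend; exact support may vary across versions):

from sklearn.svm import LinearSVC
# L2 penalty works with either loss; plain hinge requires the dual formulation
m1 = LinearSVC(penalty='l2', loss='hinge', dual=True)
m2 = LinearSVC(penalty='l2', loss='squared_hinge', dual=False)  # preferred when n_samples > n_features
# L1 penalty is only supported together with squared_hinge and dual=False
m3 = LinearSVC(penalty='l1', loss='squared_hinge', dual=False)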

Main methods:

  1. decision_function(X): given input data, returns a confidence score per sample (see the sketch after this list)
  2. fit(X_train, y_train): fits the model on the training data
  3. score(X_test, y_test): returns the mean accuracy on the given test data
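
The confidence scores determine the predicted labels: in the binary case the sign of the score selects the class, and in the one-vs-rest multiclass case the class with the largest score wins. A minimal sketch of that relationship (illustrative only, using the iris data):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC
X, y = load_iris(return_X_y=True)
model = LinearSVC(max_iter=10000).fit(X, y)
scores = model.decision_function(X)                 # shape (150, 3): one score per class
manual = model.classes_[np.argmax(scores, axis=1)]  # take the highest-scoring class
print((manual == model.predict(X)).all())           # True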

Binary Classification

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
%matplotlib notebook
def create_data():  # prepare the iris data: first two features and first two classes for binary classification
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df['label'] = iris.target
    df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']
    data = np.array(df.iloc[:100, [0, 1, -1]])
    for i in range(len(data)):
        if data[i,-1] == 0:
            data[i,-1] = -1
    # print(data)
    return data[:,:2], data[:,-1]
X, y = create_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
plt.scatter(X[:50,0],X[:50,1], label='-1')
plt.scatter(X[50:,0],X[50:,1], label='1')
plt.legend()
[Figure: scatter plot of the two classes in the sepal length vs. sepal width plane]
from sklearn.svm import LinearSVC
model = LinearSVC()
model.fit(X_train,y_train)
model.score(X_test,y_test)
1.0
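
As a quick cross-check (reusing model, X_test and y_test from the cells above), score is simply the mean accuracy of predict:

import numpy as np
y_pred = model.predict(X_test)    # predicted labels in {-1, 1}
print(np.mean(y_pred == y_test))  # same value as model.score(X_test, y_test)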
# plot the decision boundary
w = model.coef_   
b = model.intercept_
plt.scatter(X[:50,0],X[:50,1], label='-1')
plt.scatter(X[50:,0],X[50:,1], label='1')
x = np.linspace(3.0,8.0,20)
y = -w[0][0]*x/w[0][1]-b/w[0][1]
plt.plot(x,y)
plt.legend()
[Figure: the two classes with the fitted separating line]
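
The plotted line is the decision boundary $w\cdot x+b=0$; solving for the second feature gives

$$x_{2}=-\frac{w_{1}}{w_{2}}x_{1}-\frac{b}{w_{2}}$$

which is exactly the expression y = -w[0][0]*x/w[0][1]-b/w[0][1] used in the cell above.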

Multiclass Classification -- the One-vs-Rest Method

from sklearn.datasets import load_iris
import pandas as pd
iris = load_iris()
X = iris.data
y = iris.target
feature_names = iris.feature_names
data = pd.DataFrame(X,columns=feature_names)
data['labels'] = y
data
       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  labels
0      5.1                3.5               1.4                0.2               0
1      4.9                3.0               1.4                0.2               0
2      4.7                3.2               1.3                0.2               0
3      4.6                3.1               1.5                0.2               0
4      5.0                3.6               1.4                0.2               0
5      5.4                3.9               1.7                0.4               0
6      4.6                3.4               1.4                0.3               0
7      5.0                3.4               1.5                0.2               0
8      4.4                2.9               1.4                0.2               0
9      4.9                3.1               1.5                0.1               0
10     5.4                3.7               1.5                0.2               0
11     4.8                3.4               1.6                0.2               0
12     4.8                3.0               1.4                0.1               0
13     4.3                3.0               1.1                0.1               0
14     5.8                4.0               1.2                0.2               0
15     5.7                4.4               1.5                0.4               0
16     5.4                3.9               1.3                0.4               0
17     5.1                3.5               1.4                0.3               0
18     5.7                3.8               1.7                0.3               0
19     5.1                3.8               1.5                0.3               0
20     5.4                3.4               1.7                0.2               0
21     5.1                3.7               1.5                0.4               0
22     4.6                3.6               1.0                0.2               0
23     5.1                3.3               1.7                0.5               0
24     4.8                3.4               1.9                0.2               0
25     5.0                3.0               1.6                0.2               0
26     5.0                3.4               1.6                0.4               0
27     5.2                3.5               1.5                0.2               0
28     5.2                3.4               1.4                0.2               0
29     4.7                3.2               1.6                0.2               0
...    ...                ...               ...                ...               ...
120    6.9                3.2               5.7                2.3               2
121    5.6                2.8               4.9                2.0               2
122    7.7                2.8               6.7                2.0               2
123    6.3                2.7               4.9                1.8               2
124    6.7                3.3               5.7                2.1               2
125    7.2                3.2               6.0                1.8               2
126    6.2                2.8               4.8                1.8               2
127    6.1                3.0               4.9                1.8               2
128    6.4                2.8               5.6                2.1               2
129    7.2                3.0               5.8                1.6               2
130    7.4                2.8               6.1                1.9               2
131    7.9                3.8               6.4                2.0               2
132    6.4                2.8               5.6                2.2               2
133    6.3                2.8               5.1                1.5               2
134    6.1                2.6               5.6                1.4               2
135    7.7                3.0               6.1                2.3               2
136    6.3                3.4               5.6                2.4               2
137    6.4                3.1               5.5                1.8               2
138    6.0                3.0               4.8                1.8               2
139    6.9                3.1               5.4                2.1               2
140    6.7                3.1               5.6                2.4               2
141    6.9                3.1               5.1                2.3               2
142    5.8                2.7               5.1                1.9               2
143    6.8                3.2               5.9                2.3               2
144    6.7                3.3               5.7                2.5               2
145    6.7                3.0               5.2                2.3               2
146    6.3                2.5               5.0                1.9               2
147    6.5                3.0               5.2                2.0               2
148    6.2                3.4               5.4                2.3               2
149    5.9                3.0               5.1                1.8               2

150 rows × 5 columns

from sklearn.svm import LinearSVC
model = LinearSVC(penalty='l2',loss='hinge',C=5.0,max_iter=1000)
model
LinearSVC(C=5.0, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='hinge', max_iter=1000, multi_class='ovr',
     penalty='l2', random_state=None, tol=0.0001, verbose=0)
# multiclass classification with the linear SVM: one-vs-rest fits three binary models
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3)
model.fit(X_train,y_train)   
model.score(X_test,y_test)
0.91111111111111109
model.coef_   # fitted weight matrix w, one row per class (3×4)
array([[ 0.09122994,  0.68795258, -0.89771321, -0.46878047],
       [ 0.63845111, -2.53659412,  0.56076395, -2.09711552],
       [-0.93766893, -1.60206571,  1.61178485,  3.44670505]])
model.intercept_  # fitted intercepts b, one per class
array([ 0.02498646,  3.58805562, -2.9032515 ])
print(model.decision_function(X_test))  # per-class confidence scores; the class with the largest score is predicted
y_test
[[  7.10150829e-01   7.26777649e-01  -7.67818099e+00]
 [ -2.63877237e+00   1.34370998e+00   1.81745975e-01]
 [  1.61727503e+00  -2.58289007e+00  -1.03215239e+01]
 [ -1.91526895e+00  -2.76549708e-01  -1.98082316e+00]
 [ -3.19493369e+00  -1.71267056e+00   1.95280407e+00]
 [ -2.01050042e+00  -3.60258900e-01  -2.19276841e+00]
 [ -2.51660439e+00  -1.27416100e+00   6.05174156e-01]
 [ -4.29851520e+00   9.64553122e-01   3.08328234e+00]
 [ -1.69251111e+00  -1.03050471e-01  -2.18095999e+00]
 [  1.50066057e+00  -1.87798823e+00  -1.00020827e+01]
 [ -1.75373537e+00   1.57268128e+00  -1.70777554e+00]
 [ -2.27403231e+00   4.99195848e-01  -1.39738111e+00]
 [ -3.69443294e+00   1.00181579e+00   2.36162361e+00]
 [  2.06559741e+00  -3.18851211e+00  -1.18432668e+01]
 [ -2.09932222e+00  -1.48025719e+00  -1.00726573e+00]
 [  1.13697130e+00  -9.16788189e-01  -9.07591398e+00]
 [ -2.09122537e+00   7.82175918e-01  -8.17325781e-01]
 [ -1.86566083e+00   3.05463735e-03  -2.13893179e+00]
 [  1.07090611e+00  -5.93235984e-01  -8.72914554e+00]
 [  1.60721089e+00  -1.98578120e+00  -1.04131929e+01]
 [ -3.70431871e+00  -1.99146410e+00   2.17055764e+00]
 [ -3.55621482e+00  -1.23301770e+00   3.12680656e+00]
 [  1.37984905e+00  -1.42238598e+00  -9.93160127e+00]
 [ -2.14042657e+00  -4.83622474e-01  -1.18963172e+00]
 [ -2.64474161e+00  -6.66677804e-01   1.60943687e+00]
 [  1.62272680e+00  -1.92798377e+00  -1.07872886e+01]
 [ -2.29899315e+00  -1.73886667e-01  -8.68246668e-01]
 [  1.06544596e+00  -7.33021571e-01  -9.10226928e+00]
 [ -1.93257380e+00  -1.06519391e+00  -1.60785352e+00]
 [  1.01489669e+00  -3.51671937e-01  -9.12959650e+00]
 [ -3.76293090e+00  -5.08358729e-01   2.82980981e+00]
 [ -3.38338703e+00  -1.89056278e+00   3.14605024e+00]
 [  2.03790650e+00  -3.93148742e+00  -1.15006942e+01]
 [ -1.72407717e+00  -5.43885397e-01  -2.14861910e+00]
 [ -3.48953042e+00  -2.76896386e+00   3.02674585e+00]
 [ -2.38813687e+00  -1.84532494e+00   3.78527906e-01]
 [  1.49247035e+00  -2.08766661e+00  -1.05617683e+01]
 [ -3.21108558e+00   4.72167391e-01   4.02415283e-01]
 [ -4.75628250e+00   9.54890080e-01   4.76006397e+00]
 [ -3.05742817e+00  -6.82355356e-01   2.10153361e+00]
 [ -1.83975047e+00  -4.99937538e-01  -1.64374203e+00]
 [ -1.65202598e+00   1.12708763e-01  -2.24530173e+00]
 [  1.36706321e+00  -1.41029061e+00  -9.37094374e+00]
 [ -1.81059256e+00   4.22444570e-01  -1.92391668e+00]
 [  1.12511823e+00  -1.05052609e+00  -9.16870896e+00]]
array([0, 1, 0, 1, 2, 1, 2, 2, 1, 0, 1, 1, 2, 0, 1, 0, 1, 1, 0, 0, 2, 2, 0,
       1, 2, 0, 1, 0, 1, 0, 2, 2, 0, 1, 2, 1, 0, 2, 2, 2, 1, 1, 0, 1, 0])
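
Because LinearSVC is a linear model, each row of coef_ together with the matching entry of intercept_ defines one one-vs-rest hyperplane, and the confidence scores above are just the corresponding linear functions. A quick sketch (reusing model and X_test from the cells above):

import numpy as np
scores = X_test @ model.coef_.T + model.intercept_            # one linear score per class
print(np.allclose(scores, model.decision_function(X_test)))   # True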