Reference: http://scikit-learn.org/stable/modules/svm.html
In real projects we actually rarely use the simple models such as LR, kNN or NB; classic as they are, they are often not that practical in engineering.
Today we focus on SVM, which is used relatively often in engineering work.
SVMs are quite versatile: Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outliers detection.
Advantages: effective in high dimensional spaces; still effective when the number of dimensions is greater than the number of samples; uses only a subset of the training points (called support vectors) in the decision function, so it is memory efficient; versatile, with different kernel functions to choose from.
Disadvantages: although it can work when the number of dimensions exceeds the number of samples, if the number of features is much greater than the number of samples the performance tends to be poor; it does not directly provide probability estimates, which are calculated using an expensive five-fold cross-validation (see Scores and probabilities, below).
(SVMs support dense and sparse sample vectors, but if you want to predict on sparse data, you must train on sparse data as well. For optimal performance, use C-ordered numpy.ndarray (dense) or scipy.sparse.csr_matrix (sparse) with dtype=float64.)
1、Classification
SVC, NuSVC and LinearSVC are three models capable of multi-class classification. The essential difference between them is that they have different mathematical formulations; see the formulas at the end of this article.
Like other classifiers, SVC, NuSVC and LinearSVC are used through the fit and predict methods.
After being fitted, the model can then be used to predict new values:
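Below is a minimal fit/predict sketch along the lines of the scikit-learn user guide; the two training points are just toy data:

>>> from sklearn import svm
>>> X = [[0, 0], [1, 1]]
>>> y = [0, 1]
>>> clf = svm.SVC().fit(X, y)
>>> clf.predict([[2., 2.]])
array([1])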
For multi-class classification:
SVC and NuSVC implement the "one-against-one" approach (training n_class * (n_class - 1) / 2 models), while LinearSVC uses the "one-vs-the-rest" strategy (training n_class models). In practice one-vs-rest is the common and usually preferable choice: the results are about the same, but it takes far less time.
>>> from sklearn import svm
>>> X = [[0], [1], [2], [3]]
>>> Y = [0, 1, 2, 3]
>>> clf = svm.SVC()
>>> clf.fit(X, Y)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3,
    gamma=0.0, kernel='rbf', max_iter=-1, probability=False, random_state=None,
    shrinking=True, tol=0.001, verbose=False)
>>> dec = clf.decision_function([[1]])
>>> dec.shape[1]  # 4 classes: 4*3/2 = 6 one-vs-one classifiers
6
>>> lin_clf = svm.LinearSVC()
>>> lin_clf.fit(X, Y)
LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='squared_hinge', max_iter=1000,
     multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
     verbose=0)
>>> dec = lin_clf.decision_function([[1]])
>>> dec.shape[1]  # one-vs-rest: one decision value per class
4
When different classes or individual samples should carry different weights, you can set the keywords class_weight and sample_weight:
Class weights: SVC (but not NuSVC) implements a keyword class_weight in the fit method. It's a dictionary of the form {class_label : value}, where value is a floating point number > 0 that sets the parameter C of class class_label to C * value.
Sample weights: SVC, NuSVC, SVR, NuSVR and OneClassSVM also implement weights for individual samples in the fit method through the keyword sample_weight. Similar to class_weight, these set the parameter C for the i-th example to C * sample_weight[i].
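A small sketch of both keywords on toy data (the weight values here are purely illustrative):

>>> from sklearn import svm
>>> X = [[0, 0], [1, 1], [1, 0], [0, 1]]
>>> y = [0, 0, 1, 1]
>>> # errors on class 1 are penalized 10 times harder: its C becomes C * 10
>>> wclf = svm.SVC(class_weight={1: 10}).fit(X, y)
>>> # per-sample weights: the last sample effectively gets C * 5
>>> sclf = svm.SVC().fit(X, y, sample_weight=[1, 1, 1, 5])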
Finally, a few examples:
- Plot different SVM classifiers in the iris dataset,
- SVM: Maximum margin separating hyperplane,
- SVM: Separating hyperplane for unbalanced classes
- SVM-Anova: SVM with univariate feature selection,
- Non-linear SVM
- SVM: Weighted samples,
2、Regression
Support Vector Regression.
See whether this sentence makes sense to you: Analogously (to support vector classification), the model produced by Support Vector Regression depends only on a subset of the training data, because the cost function for building the model ignores any training data close to the model prediction.
Again there are three models: SVR, NuSVR and LinearSVR.
Here is an example:
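This is a minimal sketch mirroring the user guide's SVR example (toy data; the predicted value simply lies between the two training targets):

>>> from sklearn import svm
>>> X = [[0, 0], [2, 2]]
>>> y = [0.5, 2.5]
>>> regr = svm.SVR().fit(X, y)
>>> regr.predict([[1, 1]])
array([ 1.5])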
3、Density estimation, novelty detection
First, here is how Wikipedia describes novelty detection: Novelty detection is the identification of new or unknown data that a machine learning system has not been trained with and was not previously aware of, with the help of either statistical or machine learning based approaches.
OneClassSVM is used for novelty detection, that is, given a set of samples, it will detect the soft boundary of that set so as to classify new points as belonging to that set or not. The procedure is unsupervised, so fit only takes X as input. A minimal sketch is shown below.
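The sketch assumes a toy cloud of "normal" points around the origin; the parameter values are illustrative only:

>>> import numpy as np
>>> from sklearn import svm
>>> np.random.seed(0)
>>> X_train = 0.3 * np.random.randn(100, 2)   # "normal" samples clustered around the origin
>>> clf = svm.OneClassSVM(nu=0.1, kernel='rbf', gamma=0.1).fit(X_train)   # unsupervised: only X
>>> clf.predict([[0, 0], [4, 4]])             # +1 = belongs to the set, -1 = novelty
array([ 1, -1])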
For more detailed applications, see the section Novelty and Outlier Detection.
Finally, two examples:
4、Complexity
The QP (quadratic programming) solver used by this libsvm-based implementation scales between $O(n_{features} \times n_{samples}^2)$ and $O(n_{features} \times n_{samples}^3)$, depending on how efficiently the libsvm cache is used in practice (dataset dependent).
5、Some practical tips
- Avoid data copy: pass C-ordered contiguous, double-precision data (see the note above) so the underlying implementation does not need to copy it.
- Kernel cache size: for larger problems, increasing cache_size above the default can noticeably reduce run time.
- Setting C: C is 1 by default and is a reasonable starting point; if the data contains many noisy observations, decrease it.
- Scaling: it is highly recommended to scale your data. For example, scale each attribute on the input vector X to [0,1] or [-1,+1], or standardize it to have mean 0 and variance 1. Note that the same scaling must be applied to the test vector to obtain meaningful results (a sketch follows this list).
- Unbalanced data: in SVC, if the data for classification are unbalanced, set class_weight='auto' and/or try different penalty parameters C.
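As a sketch of the scaling tip, a scikit-learn Pipeline can fit the scaler on the training data and apply the same transform automatically at prediction time (data and parameters here are illustrative):

>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn import svm
>>> X = [[0., 0.], [1., 2.], [10., 10.], [12., 8.]]
>>> y = [0, 0, 1, 1]
>>> model = make_pipeline(StandardScaler(), svm.SVC(C=1.0)).fit(X, y)
>>> model.predict([[11., 9.]])   # the training-set scaling is reused here
array([1])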
6、Kernel functions
A kernel is selected via svm.SVC(kernel='linear'); the common kernels are:
- linear: $\langle x, x' \rangle$.
- polynomial: $(\gamma \langle x, x' \rangle + r)^d$. $d$ is specified by keyword degree, $r$ by coef0.
- rbf: $\exp(-\gamma \|x - x'\|^2)$. $\gamma$ is specified by keyword gamma, and must be greater than 0.
- sigmoid: $\tanh(\gamma \langle x, x' \rangle + r)$, where $r$ is specified by coef0.
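Beyond the built-in kernels, scikit-learn also accepts a Python callable as the kernel. A minimal sketch of both options (the callable here just reproduces the linear kernel; names and data are illustrative):

>>> import numpy as np
>>> from sklearn import svm
>>> def my_linear_kernel(X, Y):
...     # Gram matrix between X and Y; equivalent to the built-in 'linear' kernel
...     return np.dot(X, np.transpose(Y))
>>> X = [[0, 0], [1, 1], [2, 2], [3, 3]]
>>> y = [0, 0, 1, 1]
>>> clf_rbf = svm.SVC(kernel='rbf', gamma=0.5).fit(X, y)      # built-in kernel chosen by name
>>> clf_custom = svm.SVC(kernel=my_linear_kernel).fit(X, y)   # user-defined kernel function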
7、Mathematical formulation
1、SVC:
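For reference, this is the standard C-SVC primal problem (as stated in the scikit-learn user guide and LIBSVM), for training vectors $x_i$ with labels $y_i \in \{1, -1\}$:

$$\min_{w, b, \zeta} \ \frac{1}{2} w^T w + C \sum_{i=1}^{n} \zeta_i$$

$$\text{subject to } y_i (w^T \phi(x_i) + b) \ge 1 - \zeta_i, \quad \zeta_i \ge 0, \ i = 1, \ldots, n$$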
2、SVR:
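And the standard $\varepsilon$-SVR primal problem:

$$\min_{w, b, \zeta, \zeta^*} \ \frac{1}{2} w^T w + C \sum_{i=1}^{n} (\zeta_i + \zeta_i^*)$$

$$\text{subject to } y_i - w^T \phi(x_i) - b \le \varepsilon + \zeta_i, \quad w^T \phi(x_i) + b - y_i \le \varepsilon + \zeta_i^*, \quad \zeta_i, \zeta_i^* \ge 0, \ i = 1, \ldots, n$$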