SVM
For the theory behind SVM, see the SVM series of blog posts, or refer to the lecture notes.
The main advantages of SVM are as follows:
- Effective in high dimensional spaces.
- Still effective in cases where the number of dimensions is greater than the number of samples (SVM remains usable when there are more features than observations, which gives it an edge over many econometric models).
- Uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.
- Versatile: different Kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.
The main disadvantages of SVM are as follows:
- If the number of features is much greater than the number of samples, choosing the kernel function and the regularization term carefully is crucial to avoid over-fitting.
- SVMs do not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation.
Classification
SVC, NuSVC and LinearSVC can all perform both binary and multi-class classification.
For multi-class tasks, SVC and NuSVC (the two models are similar, differing only in their parameters and mathematical formulation) use the one-versus-one approach: $\frac{N(N-1)}{2}$ classifiers are built, where $N$ is the number of classes, and each classifier performs a binary classification task.
To provide a consistent interface with other classifiers, the decision_function_shape option allows monotonically transforming the results of the ovo classifiers into an ovr decision function of shape (n_samples, n_classes).
from sklearn.svm import SVC

def svc_multi_class_demo():
    X = [[0], [1], [2], [3]]
    Y = [0, 1, 2, 3]
    clf = SVC(decision_function_shape='ovo')  # one-versus-one
    clf.fit(X, Y)
    dec = clf.decision_function([[1]])
    print(dec.shape[1])  # 6 = 4*3/2 pairwise classifiers
    clf = SVC(decision_function_shape='ovr')  # one-versus-rest shaped decision function
    clf.fit(X, Y)
    dec = clf.decision_function([[0]])
    print(dec.shape[1])  # 4 = number of classes
Scores and probabilities
The decision_function method gives a per-class score for each sample. When the probability option is set to True, class probability estimates are also available (via the predict_proba and predict_log_proba methods). The probabilities are calibrated using Platt scaling [1]: logistic regression on the SVM's scores, fit by an additional cross-validation on the training data. In the multiclass case, this is extended as per [2].
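As a minimal sketch of this interface (the iris data is used here purely for illustration): with probability=True the fitted model exposes predict_proba and predict_log_proba alongside decision_function.

from sklearn.datasets import load_iris
from sklearn.svm import SVC

def svc_probability_demo():
    X, y = load_iris(return_X_y=True)
    # probability=True triggers the extra cross-validation used for Platt scaling
    clf = SVC(probability=True, random_state=0)
    clf.fit(X, y)
    print(clf.decision_function(X[:1]))   # per-class scores
    print(clf.predict_proba(X[:1]))       # calibrated class probabilities
    print(clf.predict_log_proba(X[:1]))   # log of the probabilities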
Issues about Platt Scaling
The cross-validation involved in Platt scaling is an expensive operation for large datasets. In addition, the probability estimate may be inconsistent with the scores:
- the argmax of the scores may not be the argmax of the probabilities
- in binary classification, a sample may be labeled by predict as belonging to the positive class even if the output of predict_proba is less than 0.5.
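As a rough check of this potential inconsistency, one can compare predict with the argmax of predict_proba on the same samples; they agree most of the time but are not guaranteed to (a sketch, again on the iris data chosen only for illustration):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

def predict_vs_proba_check():
    X, y = load_iris(return_X_y=True)
    clf = SVC(probability=True, random_state=0).fit(X, y)
    pred = clf.predict(X)                                               # based on decision scores
    proba_pred = clf.classes_[np.argmax(clf.predict_proba(X), axis=1)]  # based on probabilities
    print(np.mean(pred == proba_pred))  # fraction of samples where both agree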
Unbalanced problems
To deal with an unbalanced dataset, higher weights can be given to the rare classes; the parameters class_weight and sample_weight implement this.
In SVC, class_weight is passed as a dictionary {class_label : value}, which sets the penalty parameter of class class_label to C*value.
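A minimal sketch of class_weight (the toy data and the weight value are made up for illustration): class 1 below is penalized with C*10.

from sklearn.svm import SVC

def svc_class_weight_demo():
    # imbalanced toy data: five samples of class 0, two of class 1
    X = [[0], [1], [2], [3], [4], [10], [11]]
    y = [0, 0, 0, 0, 0, 1, 1]
    # the penalty for misclassifying class 1 becomes C * 10
    clf = SVC(kernel='linear', class_weight={1: 10})
    clf.fit(X, y)
    print(clf.predict([[6]]))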
Complexity
The computational cost of SVM grows quickly with the number of training vectors. At its core, SVM solves a QP problem that separates the support vectors from the rest of the training data. The QP solver used by the libsvm-based implementation scales between $\mathcal{O}(n_{features}\times n_{samples}^2)$ and $\mathcal{O}(n_{features}\times n_{samples}^3)$.
By comparison, LinearSVC is much more efficient.
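A small sketch contrasting the two estimators on the same synthetic data (the dataset size is arbitrary; this only shows the interchangeable interface, not a benchmark):

from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

def compare_svc_linearsvc():
    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
    svc = SVC(kernel='linear').fit(X, y)  # libsvm-based solver
    lin = LinearSVC().fit(X, y)           # liblinear-based solver, scales better with n_samples
    print(svc.score(X, y), lin.score(X, y))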
Tips
Kernel cache size
The kernel cache size has a noticeable impact on run time. If enough RAM is available, cache_size can be set to a value higher than the default of 200 MB.
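For example (the 500 MB figure is only illustrative):

from sklearn.svm import SVC

# enlarge the kernel cache from the default 200 MB to 500 MB when RAM allows
clf = SVC(cache_size=500)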
Setting C
C defaults to 1. If the observations contain a lot of noise, C should be decreased; decreasing C corresponds to more regularization.
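For example (the value below is only illustrative):

from sklearn.svm import SVC

# C smaller than the default 1.0, i.e. stronger regularization for noisy data
clf = SVC(C=0.1)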
Highly recommended to scale data
The standardization/normalization step can be chained with SVC in a Pipeline, e.g. built with make_pipeline():
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def svc_pipeline():
    X = [[0, 2, 10, 15], [1, -3, 2, 7], [2, 18, -100, 1], [3, 0.25, 100, 7]]
    y = [0, 1, 1, 0]
    clf = make_pipeline(StandardScaler(), SVC())  # scale features before fitting the SVM
    clf.fit(X, y)
    ans = clf.predict([[2, 3, 11, 3]])
    print(ans)
Shrinking parameters
We found that if the number of iterations is large, then shrinking can shorten the training time. However, if we loosely solve the optimization problem (e.g. using a large stopping tolerance), the code without using shrinking may be faster.
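Shrinking is controlled by the shrinking parameter of SVC (on by default); a sketch of turning it off together with a loose tolerance, as described above:

from sklearn.svm import SVC

# with a loose stopping tolerance, disabling shrinking may be faster
clf = SVC(shrinking=False, tol=1e-1)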
L1 penalization
Using an L1 penalty yields sparse solutions in which only a subset of the features is used. Increasing C tends to produce a more complex model (more features are used). The function l1_min_c gives the lower bound of C below which the model is null (null model: all weights equal to zero).
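A minimal sketch with LinearSVC and an L1 penalty on the iris data (the dataset and the factor 10 are only illustrative); l1_min_c returns the smallest C for which the model is not null.

from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC, l1_min_c

def l1_svc_demo():
    X, y = load_iris(return_X_y=True)
    c_min = l1_min_c(X, y, loss='squared_hinge')  # below this C all weights stay at zero
    clf = LinearSVC(penalty='l1', dual=False, C=10 * c_min, max_iter=10000)
    clf.fit(X, y)
    print(c_min)
    print(clf.coef_)  # sparse: many coefficients are exactly zero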
Kernel functions
Commonly used kernel functions are listed below.
kernel function | expression |
---|---|
linear | $\langle x, x'\rangle$ |
polynomial | $(\gamma\langle x, x'\rangle+r)^d$, where $d$ is set by the parameter degree and $r$ by coef0 |
rbf | $\exp(-\gamma\lVert x-x' \rVert^2)$, where $\gamma$ is set by the parameter gamma and must be greater than 0 |
sigmoid | $\tanh(\gamma\langle x, x' \rangle+r)$ |
from sklearn.svm import SVC

def svc_kernel():
    linear_svc = SVC(kernel='linear')
    print(linear_svc.kernel)  # 'linear'
    rbf_svc = SVC(kernel='rbf')
    print(rbf_svc.kernel)  # 'rbf'
Parameters of the RBF kernel
The RBF (Radial Basis Function) kernel has two parameters to tune, C and gamma.
- C: shared by all SVM kernels; it trades off misclassification of training samples against the simplicity of the decision surface. A low C gives a coarse, smooth decision surface, while a high C tries to classify every training sample correctly.
- gamma: defines how far the influence of a single training sample reaches; the larger gamma, the closer other samples have to be in order to be affected.
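In practice, C and gamma for the RBF kernel are usually chosen by a cross-validated grid search; a minimal sketch (the grid values and the iris data are only illustrative):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def rbf_grid_search():
    X, y = load_iris(return_X_y=True)
    param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 0.1, 1]}
    search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
    search.fit(X, y)
    print(search.best_params_)  # the combination with the best cross-validated accuracy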
Interface
class sklearn.svm.SVC(*, C=1.0, kernel='rbf', degree=3, gamma='scale', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape='ovr', break_ties=False, random_state=None)
demos: multilabel classification
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_multilabel_classification
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import CCA
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

def demo_multilabel():
    # plot the separating hyperplane of a fitted linear classifier
    def hyperplane_plot(clf, x_min, x_max, ls, label):
        w = clf.coef_[0]
        k = -w[0] / w[1]
        xx = np.linspace(x_min - 3, x_max + 3)
        yy = k * xx - clf.intercept_[0] / w[1]  # w[0]*x + w[1]*y + b = 0
        plt.plot(xx, yy, ls, alpha=0.7, label=label)

    def subfigure_plot(X, y, subp, title, transform):
        # project the features onto two dimensions with PCA or CCA
        if transform == 'pca':
            X = PCA(n_components=2).fit_transform(X)
        elif transform == 'cca':
            X = CCA(n_components=2).fit(X, y).transform(X)
        else:
            raise ValueError
        x_min, x_max = np.min(X[:, 0]), np.max(X[:, 0])
        y_min, y_max = np.min(X[:, 1]), np.max(X[:, 1])
        clf = OneVsRestClassifier(SVC(kernel='linear'))  # one-vs-rest multilabel classifier
        clf.fit(X, y)
        plt.subplot(2, 2, subp)
        plt.title(title)
        class_0 = np.where(y[:, 0])
        class_1 = np.where(y[:, 1])
        plt.scatter(X[:, 0], X[:, 1], s=40, c='gray', alpha=0.6, edgecolors=(0, 0, 0))
        plt.scatter(X[class_0, 0], X[class_0, 1], s=120, edgecolors='b', facecolors='none', lw=1, label='Class 0')
        plt.scatter(X[class_1, 0], X[class_1, 1], s=60, edgecolors='orange', facecolors='none', lw=1, label='Class 1')
        hyperplane_plot(clf.estimators_[0], x_min, x_max, 'k--', 'Boundary\nfor Class 0')
        hyperplane_plot(clf.estimators_[1], x_min, x_max, 'k-.', 'Boundary\nfor Class 1')
        plt.xticks([])
        plt.yticks([])
        plt.xlim(x_min - 0.5 * x_max, x_max + 0.5 * x_max)
        plt.ylim(y_min - 0.5 * y_max, y_max + 0.5 * y_max)

    plt.figure(figsize=(8, 6))
    # samples that carry no label are allowed
    X, y = make_multilabel_classification(n_classes=2, n_labels=1, allow_unlabeled=True, random_state=729)
    subfigure_plot(X, y, 1, 'With Unlabeled obs PCA', 'pca')
    subfigure_plot(X, y, 2, 'With Unlabeled obs CCA', 'cca')
    # every sample carries at least one label
    X, y = make_multilabel_classification(n_classes=2, n_labels=1, allow_unlabeled=False, random_state=729)
    subfigure_plot(X, y, 3, 'Without Unlabeled obs PCA', 'pca')
    subfigure_plot(X, y, 4, 'Without Unlabeled obs CCA', 'cca')
    plt.subplots_adjust(.04, .02, .97, .94, .09, .2)
    plt.show()
References
SVM User Guide
sklearn.svm.SVC