【ML】SVC Interface and Examples

SVM

For the theory behind SVM, see the SVM blog series, or refer to the lecture notes.
The main advantages of SVM are:

  • Effective in high dimensional spaces.
  • Still effective in cases where the number of dimensions is greater than the number of samples (this property gives SVM an edge over many econometric models).
  • Uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.
  • Versatile: different Kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.

The main disadvantages of SVM are:

  • If the number of features is much greater than the number of samples, avoiding over-fitting through the choice of kernel function and regularization term is crucial.
  • SVMs do not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation.

Classification

SVC, NuSVC and LinearSVC can all perform both binary and multi-class classification.
For multi-class tasks, SVC and NuSVC (two similar models that differ only in their parameters and underlying mathematical formulation) use the one-versus-one approach: $\frac{N(N-1)}{2}$ binary classifiers are built, where $N$ is the number of classes.

To provide a consistent interface with other classifiers, the decision_function_shape option allows monotonically transforming the results of the ovo classifiers into an ovr decision function of shape (n_samples, n_classes).

from sklearn.svm import SVC

def svc_multi_class_demo():
    X = [[0], [1], [2], [3]]
    Y = [0, 1, 2, 3]
    clf = SVC(decision_function_shape='ovo')  # one-versus-one
    clf.fit(X, Y)
    dec = clf.decision_function([[1]])
    print(dec.shape[1])  # 6 = 4*3/2 pairwise classifiers

    clf = SVC(decision_function_shape='ovr')  # one-versus-rest
    clf.fit(X, Y)
    dec = clf.decision_function([[0]])
    print(dec.shape[1])  # 4 = one score per class

Scores and probabilities

The decision function decision_function gives per-class scores. When the option probability is set to True, class membership probability estimates become available (via the methods predict_proba and predict_log_proba). The probabilities are calibrated using Platt scaling [1]: logistic regression on the SVM's scores, fit by an additional cross-validation on the training data. In the multiclass case, this is extended as per [2].

Issues with Platt scaling
The cross-validation involved in Platt scaling is an expensive operation for large datasets. In addition, the probability estimate may be inconsistent with the scores:

  • the argmax of the scores may not be the argmax of the probabilities
  • in binary classification, a sample may be labeled by predict as belonging to the positive class even if the output of predict_proba is less than 0.5.
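
As a minimal sketch of how the calibrated probabilities relate to the raw scores (the dataset here is a hypothetical toy example):

from sklearn.datasets import make_classification
from sklearn.svm import SVC

def svc_proba_demo():
    X, y = make_classification(n_samples=100, random_state=0)  # toy binary data
    clf = SVC(probability=True, random_state=0)  # triggers Platt scaling via internal CV
    clf.fit(X, y)
    print(clf.decision_function(X[:3]))  # raw scores
    print(clf.predict_proba(X[:3]))      # calibrated probabilities, shape (3, 2)
    print(clf.predict(X[:3]))            # derived from the scores, not the probabilities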

Unbalanced problems

To deal with imbalanced datasets, under-represented classes can be given larger weights via the parameters class_weight and sample_weight.
In SVC, class_weight is passed as a dictionary {class_label : value}, which sets the penalty parameter of class class_label to C*value.

Complexity

The computational cost of SVM grows with the number of training vectors: at its core, SVM solves a quadratic programming (QP) problem that separates the support vectors from the rest of the training data. The libsvm-based QP solver scales between $\mathcal{O}(n_{features} \times n_{samples}^2)$ and $\mathcal{O}(n_{features} \times n_{samples}^3)$.
By comparison, LinearSVC is far more efficient on large sample sizes.
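
A rough timing sketch of the difference (the dataset is synthetic and the numbers will vary by machine):

import time
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
for model in (SVC(kernel='linear'), LinearSVC(max_iter=10000)):
    start = time.time()
    model.fit(X, y)
    print(type(model).__name__, round(time.time() - start, 2), 's')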

Tips

Kernel cache size

The kernel cache size has a significant impact on run time. If enough RAM is available, it is recommended to set cache_size to a value higher than the default of 200 (MB).
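
For example (assuming roughly 1 GB of RAM can be spared):

from sklearn.svm import SVC

clf = SVC(cache_size=1000)  # 1000 MB kernel cache instead of the default 200 MB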

Setting C

C defaults to 1.0. If the observations contain a lot of noise, C should be decreased:

decreasing C corresponds to more regularization.
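
For instance, on noisy data one might try:

from sklearn.svm import SVC

clf = SVC(C=0.1)  # smaller than the default C=1.0, i.e. stronger regularization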

Highly recommended to scale data

Standardization/normalization can be chained with SVC into a single estimator using make_pipeline():

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def svc_pipeline():
    X = [[0, 2, 10, 15], [1, -3, 2, 7], [2, 18, -100, 1], [3, 0.25, 100, 7]]
    y = [0, 1, 1, 0]
    # standardize the features, then fit SVC
    clf = make_pipeline(StandardScaler(), SVC())
    clf.fit(X, y)
    ans = clf.predict([[2, 3, 11, 3]])
    print(ans)

Shrinking parameters

We found that if the number of iterations is large, then shrinking can shorten the training time. However, if we loosely solve the optimization problem (e.g. using a large stopping tolerance), the code without using shrinking may be faster.
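
So, as a sketch, with a loose stopping tolerance it may be worth trying:

from sklearn.svm import SVC

clf = SVC(shrinking=False, tol=1e-1)  # loose tolerance: shrinking may not pay off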

L1 penalization

An L1 penalty yields sparse solutions: only a subset of the features receives non-zero weights. Increasing C tends to yield a more complex model (more features are used). The C value that produces the null model (all weights equal to zero) can be computed with the function l1_min_c.
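
In scikit-learn the L1 penalty is exposed through LinearSVC(penalty='l1', dual=False) rather than SVC; a minimal sketch on two classes of the iris data:

from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC, l1_min_c

X, y = load_iris(return_X_y=True)
X, y = X[y != 2], y[y != 2]  # keep two classes for a simple binary example
c_min = l1_min_c(X, y, loss='squared_hinge')  # below this C all weights are zero
clf = LinearSVC(penalty='l1', dual=False, C=10*c_min, max_iter=10000)
clf.fit(X, y)
print(clf.coef_)  # sparse: some coefficients are exactly zero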

Kernel functions

The commonly used kernel functions are:

  • linear: $\langle x, x'\rangle$
  • polynomial: $(\gamma\langle x, x'\rangle + r)^d$, where $d$ is specified by degree and $r$ by coef0
  • rbf: $\exp(-\gamma\lVert x - x'\rVert^2)$, where $\gamma$ is specified by gamma and must be greater than 0
  • sigmoid: $\tanh(\gamma\langle x, x'\rangle + r)$

from sklearn.svm import SVC

def svc_kernel():
    linear_svc = SVC(kernel='linear')
    print(linear_svc.kernel)  # 'linear'
    rbf_svc = SVC(kernel='rbf')
    print(rbf_svc.kernel)  # 'rbf'
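
Custom kernels can also be supplied as a Python callable that takes two data matrices and returns the Gram matrix; a minimal sketch:

import numpy as np
from sklearn.svm import SVC

def my_kernel(X, Y):
    # equivalent to the linear kernel <x, x'>
    return np.dot(X, Y.T)

custom_svc = SVC(kernel=my_kernel)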

Parameters of the RBF kernel

When using the RBF (Radial Basis Function) kernel, two parameters must be considered: C and gamma. A grid-search sketch for tuning them follows the list below.

  • C: common to all SVM kernels; it trades off misclassifying training examples against the simplicity of the decision surface. A low C makes the decision surface smooth, while a high C tries to classify every training example correctly.
  • gamma: defines how far the influence of a single training example reaches; the larger gamma is, the closer other examples must be in order to be affected.
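
A minimal grid-search sketch for tuning the two parameters (the grid values are arbitrary):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)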

Interface

class sklearn.svm.SVC(*, C=1.0, kernel='rbf', degree=3, gamma='scale', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape='ovr', break_ties=False, random_state=None)

Demo: multilabel classification

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cross_decomposition import CCA
from sklearn.datasets import make_multilabel_classification
from sklearn.decomposition import PCA
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

def demo_multilabel():
    # plot the separating hyperplane of a fitted linear SVC
    def hyperplane_plot(clf, x_min, x_max, ls, label):
        w = clf.coef_[0]
        k = -w[0]/w[1]
        xx = np.linspace(x_min-3, x_max+3)
        yy = k*xx - clf.intercept_[0]/w[1]  # solve w0*x + w1*y + b = 0 for y
        plt.plot(xx, yy, ls, alpha=0.7, label=label)


    def subfigure_plot(X, y, subp, title, transform):
        # project the features onto two components
        if transform == 'pca':
            X = PCA(n_components=2).fit_transform(X)
        elif transform == 'cca':
            X = CCA(n_components=2).fit(X, y).transform(X)
        else:
            raise ValueError(transform)

        x_min, x_max = np.min(X[:, 0]), np.max(X[:, 0])
        y_min, y_max = np.min(X[:, 1]), np.max(X[:, 1])

        clf = OneVsRestClassifier(SVC(kernel='linear'))  # one-vs-rest: one linear SVC per label
        clf.fit(X, y)

        plt.subplot(2, 2, subp)
        plt.title(title)

        class_0 = np.where(y[:, 0])
        class_1 = np.where(y[:, 1])        

        plt.scatter(X[:, 0], X[:, 1], s=40, c='gray', alpha=0.6, edgecolors=(0, 0, 0))
        plt.scatter(X[class_0, 0], X[class_0, 1], s=120, edgecolors='b', facecolors='none', lw=1, label='Class 0')
        plt.scatter(X[class_1, 0], X[class_1, 1], s=60, edgecolors='orange', facecolors='none', lw=1, label='Class 1')

        hyperplane_plot(clf.estimators_[0], x_min, x_max, 'k--', 'Boundary\nfor Class 0')
        hyperplane_plot(clf.estimators_[1], x_min, x_max, 'k-.', 'Boundary\nfor Class 1')

        plt.xticks([])
        plt.yticks([])
        plt.xlim(x_min-0.5*x_max, x_max+0.5*x_max)
        plt.ylim(y_min-0.5*y_max, y_max+0.5*y_max)

    plt.figure(figsize=(8, 6))

    # dataset that may contain observations with no label
    X, y = make_multilabel_classification(n_classes=2, n_labels=1, allow_unlabeled=True, random_state=729)
    subfigure_plot(X, y, 1, 'With Unlabeled obs PCA', 'pca')
    subfigure_plot(X, y, 2, 'With Unlabeled obs CCA', 'cca')

    # every observation carries at least one label
    X, y = make_multilabel_classification(n_classes=2, n_labels=1, allow_unlabeled=False, random_state=729)
    subfigure_plot(X, y, 3, 'Without Unlabeled obs PCA', 'pca')
    subfigure_plot(X, y, 4, 'Without Unlabeled obs CCA', 'cca')

    plt.subplots_adjust(.04, .02, .97, .94, .09, .2)
    plt.show()


References

SVM User Guide
sklearn.svm.SVC


  1. Probabilistic Outputs for SVM and Comparisons to Regularized Likelihood Methods

  2. Probability Estimates for Multi-class Classification by Pairwise Coupling
