Machine Learning: Python scikit-learn SVC Classification

scikit-learn SVC examples: https://scikit-learn.org/stable/modules/svm.html

SVMs can be used for classification, regression, and outlier detection. The svm module provides, among others, the SVC and LinearSVC interfaces.

1. The SVC function

    from sklearn import svm

    # x_train / y_train are assumed to be the training features and labels
    clf = svm.SVC(C=0.8, kernel='rbf', gamma=20, decision_function_shape='ovr')
    clf.fit(x_train, y_train.ravel())

C: penalty parameter

Default 1.0. The penalty parameter is introduced along with the slack variables and controls how strongly misclassification is penalized, i.e. how little training error is tolerated. A larger C allows fewer misclassifications but makes overfitting more likely; if C shrinks toward 0, the classifier stops caring whether points are classified correctly and only maximizes the margin, which makes the classification meaningless.
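A minimal sketch of this trade-off (the synthetic dataset, its label noise, and the specific C values are assumptions for illustration, not from the original):

```python
# Sketch: a large C chases every training point; a small C tolerates
# errors in exchange for a wider margin. 20% label noise (flip_y)
# guarantees that a perfect training fit must overfit.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(C=C, kernel='rbf').fit(X_train, y_train)
    print(f"C={C}: train={clf.score(X_train, y_train):.2f}, "
          f"test={clf.score(X_test, y_test):.2f}")
```

Typically the largest C scores best on the training set but not on the held-out test set.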

kernel: kernel function

Kernels are introduced to deal with linearly inseparable data: the points are implicitly mapped into a higher-dimensional space where the problem becomes linearly separable. The available kernels are:

  1. kernel='linear': linear kernel. A larger C fits the training data more tightly, but may overfit;
  2. kernel='rbf': Gaussian (RBF) kernel. A smaller gamma gives a smoother, more continuous decision surface; a larger gamma gives a more fragmented surface that fits the training data more closely but may overfit (see the sketch after this list);
  3. kernel='poly': polynomial kernel;
  4. kernel='sigmoid': sigmoid kernel;
  5. kernel='precomputed': a precomputed kernel (Gram) matrix is supplied instead of the raw features.
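A quick sketch of these effects on the iris data (the specific gamma values are arbitrary choices for illustration):

```python
# Sketch: training accuracy for several kernels on the same data.
# A very large gamma fits the training set almost perfectly (overfitting);
# a small gamma gives a smoother, less flexible decision surface.
from sklearn import datasets
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
for name, clf in [('linear', SVC(kernel='linear')),
                  ('rbf, gamma=0.1', SVC(kernel='rbf', gamma=0.1)),
                  ('rbf, gamma=100', SVC(kernel='rbf', gamma=100)),
                  ('poly, degree=3', SVC(kernel='poly', degree=3))]:
    print(name, clf.fit(X, y).score(X, y))
```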

decision_function_shape parameter

  1. decision_function_shape='ovr': one-vs-rest, i.e. each class is separated from all the remaining classes;
  2. decision_function_shape='ovo': one-vs-one, i.e. the classes are split pairwise and the multi-class result is assembled from the binary classifiers (see the sketch after this list);
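In scikit-learn this setting mainly changes the shape of decision_function's output; a minimal sketch (the 4-class synthetic dataset is an assumption for illustration):

```python
# Sketch: 'ovr' returns one score per class, 'ovo' one score per pair of
# classes. Predictions are identical either way, because SVC always
# trains one-vs-one classifiers internally.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=6, n_informative=4,
                           n_classes=4, random_state=0)

ovr = SVC(decision_function_shape='ovr').fit(X, y)
ovo = SVC(decision_function_shape='ovo').fit(X, y)
print(ovr.decision_function(X).shape)  # (200, 4): one column per class
print(ovo.decision_function(X).shape)  # (200, 6): 4*3/2 class pairs
```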

degree parameter

Degree of the polynomial ('poly') kernel. Default 3; ignored by all other kernels.

gamma parameter

Kernel coefficient for the 'rbf', 'poly' and 'sigmoid' kernels. The default described here is 'auto', which uses 1/n_features (note that newer scikit-learn versions default to 'scale' instead).

coef0 parameter

Constant (independent) term of the kernel function. Only significant for 'poly' and 'sigmoid'.

probability parameter

Whether to enable probability estimates. Default False.
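A minimal sketch (iris is used purely for illustration): predict_proba is only available when the model was constructed with probability=True, because it fits an extra calibration step during training:

```python
# Sketch: probability=True enables predict_proba at extra training cost
# (Platt scaling fitted with internal cross-validation).
from sklearn import datasets
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
clf = SVC(probability=True).fit(X, y)
print(clf.predict_proba(X[:2]))  # per-class probabilities, shape (2, 3)
```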

shrinking parameter

Whether to use the shrinking heuristic. Default True.

tol parameter

Tolerance of the stopping criterion for training. Default 1e-3.

cache_size parameter

Size of the kernel cache, in MB. Default 200.

class_weight parameter

Per-class weights, passed as a dict: for class i, the parameter C is set to class_weight[i] * C (the C of C-SVC).
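A minimal sketch (the imbalanced synthetic data and the weight value are assumptions for illustration):

```python
# Sketch: class_weight={1: 9} sets C of class 1 to 9 * C, so errors on
# the minority class are penalized nine times as heavily as on class 0.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(90, 2), rng.randn(10, 2) + 2])  # 90 vs 10 samples
y = np.array([0] * 90 + [1] * 10)

clf = SVC(kernel='rbf', class_weight={1: 9}).fit(X, y)
```

Alternatively, class_weight='balanced' derives the weights automatically from the class frequencies.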

verbose parameter

Enable verbose output.

max_iter parameter

Hard limit on the number of iterations; -1 (the default) means no limit.

random_state parameter

Integer seed used when shuffling the data.

2. Example

"""
==================================================
Plot different SVM classifiers in the iris dataset
==================================================

Comparison of different linear SVM classifiers on a 2D projection of the iris
dataset. We only consider the first 2 features of this dataset:

- Sepal length
- Sepal width

This example shows how to plot the decision surface for four SVM classifiers
with different kernels.

The linear models ``LinearSVC()`` and ``SVC(kernel='linear')`` yield slightly
different decision boundaries. This can be a consequence of the following
differences:

- ``LinearSVC`` minimizes the squared hinge loss while ``SVC`` minimizes the
  regular hinge loss.

- ``LinearSVC`` uses the One-vs-All (also known as One-vs-Rest) multiclass
  reduction while ``SVC`` uses the One-vs-One multiclass reduction.

Both linear models have linear decision boundaries (intersecting hyperplanes)
while the non-linear kernel models (polynomial or Gaussian RBF) have more
flexible non-linear decision boundaries with shapes that depend on the kind of
kernel and its parameters.

.. NOTE:: while plotting the decision function of classifiers for toy 2D
   datasets can help get an intuitive understanding of their respective
   expressive power, be aware that those intuitions don't always generalize to
   more realistic high-dimensional problems.

"""
print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets


def make_meshgrid(x, y, h=.02):
    """Create a mesh of points to plot in

    Parameters
    ----------
    x: data to base x-axis meshgrid on
    y: data to base y-axis meshgrid on
    h: stepsize for meshgrid, optional

    Returns
    -------
    xx, yy : ndarray
    """
    x_min, x_max = x.min() - 1, x.max() + 1
    y_min, y_max = y.min() - 1, y.max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    return xx, yy


def plot_contours(ax, clf, xx, yy, **params):
    """Plot the decision boundaries for a classifier.

    Parameters
    ----------
    ax: matplotlib axes object
    clf: a classifier
    xx: meshgrid ndarray
    yy: meshgrid ndarray
    params: dictionary of params to pass to contourf, optional
    """
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    out = ax.contourf(xx, yy, Z, **params)
    return out


# import some data to play with
iris = datasets.load_iris()
# Take the first two features. We could avoid this by using a two-dim dataset
X = iris.data[:, :2]
y = iris.target

# we create an instance of SVM and fit our data. We do not scale our
# data since we want to plot the support vectors
C = 1.0  # SVM regularization parameter
models = (svm.SVC(kernel='linear', C=C),
          svm.LinearSVC(C=C),
          svm.SVC(kernel='rbf', gamma=0.7, C=C),
          svm.SVC(kernel='poly', degree=3, C=C))
models = (clf.fit(X, y) for clf in models)

# title for the plots
titles = ('SVC with linear kernel',
          'LinearSVC (linear kernel)',
          'SVC with RBF kernel',
          'SVC with polynomial (degree 3) kernel')

# Set up a 2x2 grid for plotting.
fig, sub = plt.subplots(2, 2)
plt.subplots_adjust(wspace=0.4, hspace=0.4)

X0, X1 = X[:, 0], X[:, 1]
xx, yy = make_meshgrid(X0, X1)

for clf, title, ax in zip(models, titles, sub.flatten()):
    plot_contours(ax, clf, xx, yy,
                  cmap=plt.cm.coolwarm, alpha=0.8)
    ax.scatter(X0, X1, c=y, cmap=plt.cm.coolwarm, s=20, edgecolors='k')
    ax.set_xlim(xx.min(), xx.max())
    ax.set_ylim(yy.min(), yy.max())
    ax.set_xlabel('Sepal length')
    ax.set_ylabel('Sepal width')
    ax.set_xticks(())
    ax.set_yticks(())
    ax.set_title(title)

plt.show()

3. Results

[Figure: 2x2 grid of decision surfaces for the four classifiers, "SVC with linear kernel", "LinearSVC (linear kernel)", "SVC with RBF kernel", and "SVC with polynomial (degree 3) kernel", plotted over sepal length vs. sepal width]
