Machine Learning: A Code Walkthrough of SVM Classification on the Iris Dataset with sklearn


happylife_mini


Tags: machine learning, svm, support vector machine

Article link: https://blog.csdn.net/m0_46384757/article/details/119221233


import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import  train_test_split
import matplotlib.pyplot as plt

# data
def create_data():
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)    # columns are the iris feature_names
    df['label'] = iris.target    # add a 'label' column holding the class index (0, 1, 2)
    df.columns = [
        'sepal length', 'sepal width', 'petal length', 'petal width', 'label'
    ]        # rename the columns
    # Keep the first two feature columns plus the label column.
    # df is a pandas DataFrame; iloc indexes rows first, then columns.
    # The first 100 rows contain only classes 0 and 1, and [0, 1, -1] selects columns 0, 1 and the last one.
    data = np.array(df.iloc[:100, [0, 1, -1]])

    # Relabel class 0 as -1 so that the labels are -1/1.
    for i in range(len(data)):
        if data[i, -1] == 0:
            data[i, -1] = -1
    return data[:, :2], data[:, -1]   # return the two feature columns and the labels separately

X, y = create_data()     # X holds 'sepal length' and 'sepal width' for the first 100 rows; y holds the -1/1 labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
# Split the test and training samples by label with boolean masks (the results are already arrays).
X_T_1=X_test[y_test==1]      # test samples with label 1
X_T_2=X_test[y_test==-1]     # test samples with label -1
X_S_1=X_train[y_train==1]    # training samples with label 1
X_S_2=X_train[y_train==-1]   # training samples with label -1

# The legend labels state the actual class: X_T_1/X_S_1 hold label 1, X_T_2/X_S_2 hold label -1.
plt.scatter(X_T_1[:,0],X_T_1[:,1], label='Test (+1)')
plt.scatter(X_T_2[:,0],X_T_2[:,1], label='Test (-1)')
plt.scatter(X_S_1[:,0],X_S_1[:,1], label='Train (+1)')
plt.scatter(X_S_2[:,0],X_S_2[:,1], label='Train (-1)')
plt.legend()
plt.show()
print("Shape of X:",X.shape)

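# Everything above is data preparation.
# Next, classify the data with the SVM tools in sklearn and plot the training and test samples, the separating hyperplane, and the upper and lower margin boundaries.
# scikit-learn example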
from sklearn.svm import SVC
clf = SVC()
clf.fit(X_train, y_train)

'''
Parameters of SVC (default values)
SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
'''
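# The test score is not fixed: train_test_split shuffles randomly (no random_state is set), so it changes between runs
print("Test score:",clf.score(X_test, y_test))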

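# Train a linear SVM
clf=SVC(kernel='linear',C=100)
clf.fit(X_train,y_train)
# x range for the lines: the first feature (sepal length) over all samples.
# X[:, 0] selects the first column; X[:][0] would select the first row instead.
xaxis=np.linspace(X[:,0].min(),X[:,0].max(),10)
print(xaxis)
# In 2-D the separating line is w[0]*x + w[1]*y + b = 0, where clf.coef_[0] is [w[0], w[1]] and clf.intercept_[0] is b
w=clf.coef_[0]
# slope of the line, i.e. -w[0]/w[1]
a=-w[0]/w[1]
# separating hyperplane
y_sep=a*xaxis-(clf.intercept_[0])/w[1]     # y_sep = slope * xaxis + intercept
# lower margin boundary: a parallel line through the first support vector
# (clf.support_vectors_ holds the support vectors, grouped by class).
# This assumes the chosen support vector lies exactly on the margin, which a soft-margin fit does not guarantee.
b=clf.support_vectors_[0]
yy_down=a*xaxis+(b[1]-a*b[0])
# upper margin boundary: a parallel line through the last support vector, which belongs to the other class
b=clf.support_vectors_[-1]
yy_up=a*xaxis+(b[1]-a*b[0])
# draw the lines; plot is used for line plots
plt.plot(xaxis,y_sep,'k-')
plt.plot(xaxis,yy_down,'k--')
plt.plot(xaxis,yy_up,'k--')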

plt.scatter(X_T_1[:,0],X_T_1[:,1],label='Test (+1)')
plt.scatter(X_T_2[:,0],X_T_2[:,1],label='Test (-1)')
plt.scatter(X_S_1[:,0],X_S_1[:,1],label='Train (+1)')
plt.scatter(X_S_2[:,0],X_S_2[:,1],label='Train (-1)')
plt.legend()
plt.show()
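
The margin lines in the plot are drawn through the first and last support vectors. As an alternative sketch, the margins of a linear SVM can be taken directly from the model, since they are the lines where w[0]*x + w[1]*y + b equals -1 and +1. The helper name margin_lines and the numeric weights below are hypothetical and chosen only for illustration; with a fitted model one would pass clf.coef_[0] and clf.intercept_[0] instead.

```python
def margin_lines(w, b, xs):
    """Return y-values of the separating line and of both margin lines.

    w  : pair (w0, w1), e.g. clf.coef_[0] of a fitted linear SVC
    b  : intercept, e.g. clf.intercept_[0]
    xs : x-coordinates at which to evaluate the lines
    """
    w0, w1 = w
    if w1 == 0:
        raise ValueError("vertical boundary: y cannot be written as a function of x")
    xs = list(xs)

    def line(level):
        # Solve w0*x + w1*y + b = level for y.
        return [(level - b - w0 * x) / w1 for x in xs]

    # level 0 is the separating line; levels -1 and +1 are the two margins.
    return line(0.0), line(-1.0), line(1.0)


# Hypothetical weights for illustration only (not values fitted on the iris data).
w, b = (2.0, -4.0), 1.0
xs = [4.0, 5.0, 6.0]
y_sep, y_neg, y_pos = margin_lines(w, b, xs)
```

The three returned lists can be passed to plt.plot(xs, ...) in the same way as y_sep, yy_down and yy_up.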


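'''
sklearn.svm.SVC
(C=1.0, kernel='rbf', degree=3, gamma='scale', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape='ovr', break_ties=False, random_state=None)
Parameters:
C: penalty parameter of C-SVC, default 1.0.
A larger C penalizes the slack variables more heavily and pushes them toward 0, i.e. misclassification costs more, so the model tends to classify every training sample correctly; training accuracy is high but generalization tends to be weaker. A smaller C penalizes misclassification less and tolerates errors by treating those points as noise, which tends to generalize better.
kernel: kernel function, default 'rbf'; one of 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'
- linear: u'*v
- polynomial: (gamma*u'*v + coef0)^degree
- RBF: exp(-gamma*|u-v|^2)
- sigmoid: tanh(gamma*u'*v + coef0)
degree: degree of the polynomial ('poly') kernel, default 3; ignored by the other kernels.
gamma: kernel coefficient for 'rbf', 'poly' and 'sigmoid'. The default is 'scale', which uses 1/(n_features * X.var()); 'auto' uses 1/n_features.
coef0: constant term of the kernel function; only used by 'poly' and 'sigmoid'.
probability: whether to enable probability estimates, default False.
shrinking: whether to use the shrinking heuristic, default True.
tol: tolerance for the stopping criterion, default 1e-3.
cache_size: size of the kernel cache in MB, default 200.
class_weight: class weights, passed as a dict; sets the parameter C of class i to weight*C (the C of C-SVC).
verbose: whether to enable verbose output.
max_iter: maximum number of iterations; -1 means no limit.
decision_function_shape: 'ovo' or 'ovr', default 'ovr'.
random_state: seed for the data shuffling used for probability estimates, an int; ignored when probability is False.
The parameters most often tuned are C, kernel, degree, gamma and coef0.
'''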
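
To make the kernel formulas and the two gamma settings listed above concrete, here is a small pure-Python sketch. The helper names (dot, k_linear, k_poly, k_rbf, k_sigmoid, gamma_auto, gamma_scale) are hypothetical and written only for illustration; they are not sklearn functions, and sklearn computes these quantities internally.

```python
import math


def dot(u, v):
    # Inner product u'*v of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))


def k_linear(u, v):
    return dot(u, v)


def k_poly(u, v, gamma, coef0=0.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree


def k_rbf(u, v, gamma):
    # |u-v|^2 is the squared Euclidean distance.
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def k_sigmoid(u, v, gamma, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)


def gamma_auto(X):
    # gamma='auto': 1 / n_features
    return 1.0 / len(X[0])


def gamma_scale(X):
    # gamma='scale': 1 / (n_features * X.var()), where the variance is taken
    # over all entries of X (population variance, as numpy's X.var() computes).
    values = [x for row in X for x in row]
    mean = sum(values) / len(values)
    var = sum((x - mean) ** 2 for x in values) / len(values)
    return 1.0 / (len(X[0]) * var)


# Tiny made-up data set, for illustration only.
X_demo = [[1.0, 2.0], [3.0, 4.0]]
u, v = X_demo
g_auto = gamma_auto(X_demo)
g_scale = gamma_scale(X_demo)
k_values = (k_linear(u, v), k_poly(u, v, gamma=1.0, coef0=1.0, degree=2),
            k_rbf(u, v, gamma=0.5), k_sigmoid(u, v, gamma=g_auto))
```

Because 'scale' depends on the variance of the data, rescaling the features changes the effective gamma, whereas 'auto' depends only on the number of features.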
