Classification in Python

随川sui-chuan

Last modified 2023-12-22 10:44:02


Category: Vision. Tags: classification, python, data mining

First published 2023-12-19 21:15:08

Article link: https://blog.csdn.net/suichuansuichuan/article/details/135094578


Code:

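from sklearn import svm
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
# Stack two 20x2 clusters row-wise: one centered at (-2, -2), one at (2, 2)
x = np.r_[np.random.randn(20, 2) - [2, 2], np.random.randn(20, 2) + [2, 2]]
# Labels: 0 for the first cluster, 1 for the second
y = [0] * 20 + [1] * 20
# Define and train the classifier
clf = svm.SVC(kernel='linear')
clf.fit(x, y)

w = clf.coef_[0]
b = clf.intercept_[0]

a = -w[0] / w[1]    # slope of the separating line
xx = np.linspace(-5, 5)
yy = a * xx - (b / w[1])
# Inspect the fitted parameters
print("w:", w)
print("b:", b)
print("support_vectors:", clf.support_vectors_)
# Margin lines: parallel to the separating line, each passing through a support vector
v1 = clf.support_vectors_[0]     # first support vector (class 0)
v2 = clf.support_vectors_[-1]    # last support vector (class 1)
yy_down = a * xx + (v1[1] - a * v1[0])
yy_up = a * xx + (v2[1] - a * v2[0])

plt.plot(xx, yy, 'k-')
plt.plot(xx, yy_down, 'k--')
plt.plot(xx, yy_up, 'k--')
# Test the classifier
testData = np.array([[0, -1], [10, 10]])
testY = clf.predict(testData)
print("test result:", testY)

# Plot the data points and circle the support vectors
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=150, c='none',
            linewidths=1.5, edgecolors='#1f77b4')
plt.scatter(x[:, 0], x[:, 1], c=y, cmap=plt.cm.Paired)
plt.scatter(testData[:, 0], testData[:, 1], c='red')    # test points
plt.axis('tight')
plt.show()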

Explanation:

np.r_

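Concatenates arrays along the first axis (rows), that is, it stacks the two matrices one on top of the other. Both must have the same number of columns.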

import numpy as np
a = np.array([[1, 2, 3],[7,8,9]])
b=np.array([[4,5,6],[1,2,3]])

g=np.r_[a,b]
print(g)
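
As a quick check on the description above, the sketch below compares np.r_ with np.vstack and np.concatenate, and contrasts it with np.c_, which joins along the last axis instead. It reuses the arrays a and b from the example; the names g and h are just for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3], [7, 8, 9]])
b = np.array([[4, 5, 6], [1, 2, 3]])

g = np.r_[a, b]                  # stack along axis 0 (rows)
print(g.shape)                   # (4, 3)

# For 2-D inputs, np.r_ gives the same result as vstack / concatenate(axis=0)
print(np.array_equal(g, np.vstack((a, b))))
print(np.array_equal(g, np.concatenate((a, b), axis=0)))

h = np.c_[a, b]                  # join side by side along the last axis
print(h.shape)                   # (2, 6)
```

np.c_ requires equal row counts, just as np.r_ requires equal column counts.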

x=np.r_[np.random.randn(20,2)-[2,2],np.random.randn(20,2)+[2,2]]

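This creates an array x of 40 points. The first 20 points are drawn from the standard normal distribution and shifted by subtracting [2,2]; the last 20 are also drawn from the standard normal distribution and shifted by adding [2,2].

The array x can be used for a range of data analysis and machine learning tasks, such as clustering or classification.

Create the array x and visualize it: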

import numpy as np
import matplotlib.pyplot as plt

x = np.r_[np.random.randn(20,2)-[2,2],np.random.randn(20,2)+[2,2]]

plt.scatter(x[:,0], x[:,1])
plt.xlabel('x')
plt.ylabel('y')
plt.title('Random Points')
plt.show()

This produces a scatter plot in which the first 20 points lie around (-2,-2) and the last 20 lie around (2,2).
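
The claim about the two clusters can also be checked numerically, without a plot. The sketch below fixes the seed to 0 (as the main script does) and prints the mean of each half; the names mean_first and mean_second are illustrative only.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is repeatable
x = np.r_[np.random.randn(20, 2) - [2, 2], np.random.randn(20, 2) + [2, 2]]
y = np.array([0] * 20 + [1] * 20)

print(x.shape)  # (40, 2): 40 points, 2 features each

mean_first = x[y == 0].mean(axis=0)   # should be near [-2, -2]
mean_second = x[y == 1].mean(axis=0)  # should be near [2, 2]
print(mean_first)
print(mean_second)
```

With only 20 samples per cluster the means will not be exactly (-2,-2) and (2,2), but they should land close to them.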

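svm.SVC (define the classifier)

Binary and multiclass classification; best suited to datasets with fewer than about 10,000 samples, because training time grows at least quadratically with the number of samples.

Key parameters: C, kernel, degree, gamma

C: penalty parameter, default 1.0.

A larger C pushes the slack variables toward 0 and penalizes misclassification more heavily, so the model tends to classify every training point correctly.

Accuracy on the training set is high, but generalization tends to be weaker.

A smaller C treats misclassified points as noise, which tends to generalize better.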
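
To make the effect of C concrete, here is a minimal sketch that fits the same linear SVC on the two-cluster data with three values of C and records the number of support vectors and the margin width 2/||w||. The chosen values of C and the results dictionary are assumptions for illustration, not part of the original script.

```python
import numpy as np
from sklearn import svm

np.random.seed(0)
x = np.r_[np.random.randn(20, 2) - [2, 2], np.random.randn(20, 2) + [2, 2]]
y = [0] * 20 + [1] * 20

results = {}
for C in (0.01, 1.0, 100.0):
    clf = svm.SVC(kernel='linear', C=C).fit(x, y)
    results[C] = {
        'n_sv': int(clf.n_support_.sum()),            # total support vectors
        'margin': 2 / np.linalg.norm(clf.coef_[0]),   # margin width
        'train_acc': clf.score(x, y),                 # accuracy on training data
    }
    print(C, results[C])
```

A small C should give a wide margin with many support vectors, and a large C a narrow margin with few. Whether that helps or hurts on new data depends on how noisy the data is.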

kernel:

The kernel function. The default is 'rbf'; it can be 'linear', 'poly', 'rbf', 'sigmoid' or 'precomputed'.
– linear: u'v
– polynomial: (gamma*u'v + coef0)^degree
– RBF: exp(-gamma*||u-v||^2)
– sigmoid: tanh(gamma*u'v + coef0)
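
The four formulas above can be written out directly in NumPy and compared against the helpers in sklearn.metrics.pairwise. This is a sketch for checking the formulas only; the sample matrices u and v and the values of gamma, coef0 and degree are arbitrary choices.

```python
import numpy as np
from sklearn.metrics.pairwise import (linear_kernel, polynomial_kernel,
                                      rbf_kernel, sigmoid_kernel)

u = np.array([[1.0, 2.0], [0.5, -1.0]])
v = np.array([[0.0, 1.0], [2.0, 2.0], [-1.0, 0.5]])
gamma, coef0, degree = 0.5, 1.0, 3

dot = u @ v.T                                                  # u'v for every pair
sq_dist = ((u[:, None, :] - v[None, :, :]) ** 2).sum(axis=2)   # ||u-v||^2

k_linear = dot
k_poly = (gamma * dot + coef0) ** degree
k_rbf = np.exp(-gamma * sq_dist)
k_sigmoid = np.tanh(gamma * dot + coef0)

print(np.allclose(k_linear, linear_kernel(u, v)))
print(np.allclose(k_poly, polynomial_kernel(u, v, degree=degree, gamma=gamma, coef0=coef0)))
print(np.allclose(k_rbf, rbf_kernel(u, v, gamma=gamma)))
print(np.allclose(k_sigmoid, sigmoid_kernel(u, v, gamma=gamma, coef0=coef0)))
```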

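degree: degree of the polynomial ('poly') kernel. The default is 3; it is ignored by all other kernels.

gamma: kernel coefficient for 'rbf', 'poly' and 'sigmoid'. With 'auto' it is set to 1/n_features. Note that 'auto' was the default only in older scikit-learn releases; since version 0.22 the default is 'scale', which uses 1 / (n_features * X.var()).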
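
The two string settings of gamma can be checked against their numeric equivalents by comparing decision values. The sketch below assumes scikit-learn 0.22 or later (where 'scale' uses X.var()); the helper name decision is hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

np.random.seed(0)
X = np.r_[np.random.randn(20, 2) - [2, 2], np.random.randn(20, 2) + [2, 2]]
y = [0] * 20 + [1] * 20

n_features = X.shape[1]
auto_value = 1.0 / n_features                 # what gamma='auto' resolves to
scale_value = 1.0 / (n_features * X.var())    # what gamma='scale' resolves to

def decision(gamma):
    """Fit an RBF SVC with the given gamma and return its decision values."""
    return SVC(kernel='rbf', gamma=gamma).fit(X, y).decision_function(X)

same_auto = np.allclose(decision('auto'), decision(auto_value))
same_scale = np.allclose(decision('scale'), decision(scale_value))
print(auto_value, same_auto)
print(scale_value, same_scale)
```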

from sklearn.svm import SVC
import numpy as np

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])  # feature samples
y = np.array([1, 1, 2, 2])  # class labels
clf = SVC(kernel='linear')  # linear kernel
clf.fit(X, y)  # train the model

print(clf)  # the fitted estimator and its parameters
print(clf.predict([[-0.8, -1]]))  # classify a new sample
print(clf.support_vectors_)  # the support vectors
print(clf.support_)  # indices of the support vectors
print(clf.n_support_)  # number of support vectors per class

clf.fit (train the classifier)

Fits the classifier model to the training data.

clf.coef_

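The weights of the separating hyperplane. Because the samples have only two features, the weight vector also has two components. This attribute is available only with the linear kernel.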
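
To show how coef_ and intercept_ define the classifier, the sketch below recomputes the decision values as w·x + b and compares them with clf.decision_function, then derives the margin width 2/||w||. It reuses the two-cluster data from the main script; the names manual, pred and margin_width are illustrative.

```python
import numpy as np
from sklearn import svm

np.random.seed(0)
x = np.r_[np.random.randn(20, 2) - [2, 2], np.random.randn(20, 2) + [2, 2]]
y = [0] * 20 + [1] * 20
clf = svm.SVC(kernel='linear').fit(x, y)

w = clf.coef_[0]
b = clf.intercept_[0]

manual = x @ w + b                   # signed score of every sample
print(np.allclose(manual, clf.decision_function(x)))

pred = (manual > 0).astype(int)      # positive score -> class 1, else class 0
print(np.array_equal(pred, clf.predict(x)))

# For a linear kernel, w is a weighted sum of the support vectors
print(np.allclose(clf.coef_, clf.dual_coef_ @ clf.support_vectors_))

margin_width = 2 / np.linalg.norm(w)  # distance between the two margin lines
print(margin_width)
```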

np.linspace

xx=np.linspace(-5,5) returns evenly spaced numbers over the interval from -5 to 5.
np.linspace(start, stop, num,endpoint=True, retstep=False, dtype=None, axis=0)

start: the first value of the sequence
stop: the last value of the sequence
num: number of samples to generate, default 50
endpoint: if True, stop is included; if False, it is not
retstep: if True, return (samples, step), where step is the spacing between samples
dtype: type of the output array, float by default
axis: 0 (default) or -1

index = np.linspace(0, 10, 5, dtype=int)
print(index)
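
The remaining parameters are easiest to see side by side. This is a small sketch; the variable names are illustrative.

```python
import numpy as np

with_stop = np.linspace(0, 1, 5)                      # stop included
without_stop = np.linspace(0, 1, 5, endpoint=False)   # stop excluded
samples, step = np.linspace(0, 1, 5, retstep=True)    # also return the spacing
default_num = np.linspace(-5, 5)                      # num defaults to 50
index = np.linspace(0, 10, 5, dtype=int)              # values cast to integers

print(with_stop)          # [0.   0.25 0.5  0.75 1.  ]
print(without_stop)       # [0.  0.2 0.4 0.6 0.8]
print(step)               # 0.25
print(default_num.shape)  # (50,)
print(index)              # [ 0  2  5  7 10]
```

Note that dtype=int discards the fractional part, so the integer result is no longer evenly spaced.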
