Common Classification Methods in Python

Latest recommended article published 2024-07-26 16:50:17

yyx0801mm

Tags: data analysis, classification, naive Bayes, support vector machine, KNN

Article link: https://blog.csdn.net/qq_42786453/article/details/81505441

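This post introduces several common classification algorithms in Python: linear logistic classification, naive Bayes, random forests, support vector machines (SVM), and KNN. It uses concrete code examples to show how to implement them with the sklearn library, and discusses the use cases and core idea of each algorithm.

(Summary generated automatically by CSDN.)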

Classification is a common task in data processing. This post introduces several commonly used classification methods in Python.

1. Linear logistic classification

Logistic classification comes in two forms: binary classification and multiclass classification.

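Function: y = 1 / (1 + e^(-z)), where z = k1*x1 + k2*x2 + b. Here x1 and x2 are the input features, k1 and k2 are the weights, and b is the bias; y lies between 0 and 1 and is read as the probability of the positive class.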
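The logistic function above can be sketched in a few lines of NumPy. This is only an illustration: the weights `k`, the bias `b`, the helper name `class1_probability`, and the 0.5 decision threshold are assumptions made for the example, not values from the original post.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def class1_probability(x, k, b):
    """Probability of class 1 for samples x of shape (n, 2)."""
    z = x @ k + b  # z = k1*x1 + k2*x2 + b for every row of x
    return sigmoid(z)

# Hypothetical weights and bias, chosen only for illustration
k = np.array([-1.0, 1.0])
b = 0.0

x = np.array([[3.0, 1.0], [1.0, 8.0], [2.0, 2.0]])
p = class1_probability(x, k, b)

# Assumed decision rule: predict class 1 when the probability is at least 0.5
labels = (p >= 0.5).astype(int)
print(p, labels)
```

A sample with z = 0 sits exactly on the decision boundary, which is why the boundary of a logistic classifier is a straight line in the (x1, x2) plane.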

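Cross-entropy loss: J(k1, k2, b) = sigma(-y*log(y') - (1-y)*log(1-y')) / m + regularization term * regularization strength, where sigma sums over the m training samples, y is the true label (0 or 1), and y' is the predicted probability. The regularization term is there to prevent overfitting and improve the model's ability to generalize.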
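As a rough sketch of how this loss could be computed, the snippet below evaluates the cross-entropy term and adds a penalty. The post does not say which regularization function is used, so the L2 penalty (sum of squared weights) here is an assumption, as are the function names, the sample values, and the clipping constant `eps` that avoids log(0).

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy over m samples."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # keep log() finite
    return float(np.mean(-y_true * np.log(y_pred)
                         - (1 - y_true) * np.log(1 - y_pred)))

def regularized_loss(y_true, y_pred, k, strength):
    """Cross-entropy plus an assumed L2 penalty scaled by `strength`."""
    return cross_entropy(y_true, y_pred) + strength * float(np.sum(k ** 2))

# Illustrative values only
y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0.1, 0.9, 0.8, 0.3])
k = np.array([1.0, 2.0])

loss = cross_entropy(y_true, y_pred)
total = regularized_loss(y_true, y_pred, k, strength=0.1)
print(loss, total)
```

Confident, correct predictions contribute almost nothing to the loss, while the penalty grows with the size of the weights regardless of how well the model fits.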

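Python API: sklearn.linear_model.LogisticRegression(solver='liblinear', C=...). Note that the parameter is an uppercase C, and that it is the inverse of the regularization strength: a smaller C means stronger regularization.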
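To get a feel for the regularization parameter, the following sketch fits the same model with several values of `C` and compares the size of the learned weights. In scikit-learn, `C` is the inverse of the regularization strength. The data is the small two-class set used in the binary example of this post; the particular `C` values and the `norms` dictionary are choices made for this illustration.

```python
import numpy as np
import sklearn.linear_model as lm

x = np.array([[3, 1], [2, 5], [1, 8], [6, 4],
              [5, 2], [3, 5], [4, 7], [4, -1]])
y = np.array([0, 1, 1, 0, 0, 1, 1, 0])

norms = {}
for C in (0.01, 1, 100):
    model = lm.LogisticRegression(solver='liblinear', C=C)
    model.fit(x, y)
    # model.coef_ holds the learned weights (k1, k2)
    norms[C] = float(np.linalg.norm(model.coef_))

for C, norm in norms.items():
    print(f'C={C}: weight norm = {norm:.3f}')
```

With a small `C` the penalty dominates and the weights stay close to zero; with a large `C` the model is free to fit the training points more closely.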

Binary classification example:

import numpy as np
import matplotlib.pyplot as mp
import sklearn.linear_model as lm

x = np.array([
    [3, 1],
    [2, 5],
    [1, 8],
    [6, 4],
    [5, 2],
    [3, 5],
    [4, 7],
    [4, -1]
])
y = np.array([0, 1, 1, 0, 0, 1, 1, 0])
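# Train a logistic regression classifier (C is the inverse regularization strength)
model = lm.LogisticRegression(solver='liblinear', C=1)
model.fit(x, y)
# Plot range: pad the data by 1 on each side, with a grid step of 0.05
l, r, h = x[:, 0].min() - 1, x[:, 0].max() + 1, 0.05
b, t, v = x[:, 1].min() - 1, x[:, 1].max() + 1, 0.05
# Build a dense grid of points covering the plot range
grid_x = np.meshgrid(np.arange(l, r, h), np.arange(b, t, v))
# Flatten the grid into an (n_points, 2) sample array and classify every point
flat_x = np.c_[grid_x[0].ravel(), grid_x[1].ravel()]
flat_y = model.predict(flat_x)
# Reshape the predictions back to the grid shape for plotting
grid_y = flat_y.reshape(grid_x[0].shape)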

mp.figure('Logistic Classification', facecolor='lightgray')
mp.title('Logistic Classification', fontsize=12)
mp.xlabel('x', fontsize=12)
mp.ylabel('y', fontsize=12)
mp.tick_params(labelsize=10)
# Color each grid cell by its predicted class, then overlay the training points
mp.pcolormesh(grid_x[0], grid_x[1], grid_y, cmap='gray')
mp.scatter(x[:, 0], x[:, 1], c=y, cmap='brg', s=60)
mp.show()

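Result: the plot shades the plane by predicted class, so the two regions meet along a straight decision boundary, with the training points drawn on top.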

Multiclass classification example:

import numpy as np
import matplotlib.pyplot as mp
import sklearn.linear_model as lm

x = np.array([
    [4, 7],
    [3.5, 8],
    [3.1, 6.2],
    [0.5, 1],
    [1, 2],
    [1.2, 1.9],
    [6, 2],
    [5.7, 1.5],
    [5.4, 2.2]
])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
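# Three classes this time; liblinear handles multiclass problems one-vs-rest.
# A large C means weak regularization.
model = lm.LogisticRegression(solver='liblinear', C=100)
model.fit(x, y)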
l, r, h = x[:, 0].min() - 1, x[:, 0].max() + 1, 0.05
b, t, v = x[:, 1].min() - 1, x[:, 1].max() + 1, 0.05
grid_x = np.meshgrid(np.arang