2. Splitting the data into training and test sets
- import sklearn.model_selection as ms
  ms.train_test_split(
      inputs, outputs, test_size=test set fraction,
      random_state=random seed)
  -> train inputs, test inputs, train outputs, test outputs
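A minimal runnable sketch of this call (the toy data below is invented for illustration) makes the return order explicit, since mixing up the four return values is a common mistake:

```python
import numpy as np
import sklearn.model_selection as ms

# Toy dataset: 8 samples, 2 features, binary labels (values are made up)
x = np.arange(16).reshape(8, 2)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Return order: train inputs, test inputs, train outputs, test outputs
train_x, test_x, train_y, test_y = ms.train_test_split(
    x, y, test_size=0.25, random_state=7)

print(train_x.shape, test_x.shape)  # 6 samples for training, 2 for testing
```

With test_size=0.25, a quarter of the 8 samples (2 of them) go to the test set; fixing random_state makes the split reproducible across runs.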
Code: split.py

# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import numpy as np
import sklearn.model_selection as ms
import sklearn.naive_bayes as nb
import matplotlib.pyplot as mp

x, y = [], []
with open('../../data/multiple1.txt', 'r') as f:
    for line in f.readlines():
        data = [float(substr) for substr in line.split(',')]
        x.append(data[:-1])
        y.append(data[-1])
x = np.array(x)
y = np.array(y, dtype=int)
# Split into training and test sets
train_x, test_x, train_y, test_y = \
    ms.train_test_split(
        x, y, test_size=0.25, random_state=7)
# Naive Bayes classifier
model = nb.GaussianNB()
# Train the model on the training set
model.fit(train_x, train_y)
# Build a grid over the feature plane for plotting the decision boundary
l, r, h = x[:, 0].min() - 1, x[:, 0].max() + 1, 0.005
b, t, v = x[:, 1].min() - 1, x[:, 1].max() + 1, 0.005
grid_x = np.meshgrid(np.arange(l, r, h),
                     np.arange(b, t, v))
flat_x = np.c_[grid_x[0].ravel(), grid_x[1].ravel()]
flat_y = model.predict(flat_x)
grid_y = flat_y.reshape(grid_x[0].shape)
# Evaluate the model on the test set
pred_test_y = model.predict(test_x)
print((pred_test_y == test_y).sum() / pred_test_y.size)
mp.figure('Naive Bayes Classification', facecolor='lightgray')
mp.title('Naive Bayes Classification', fontsize=20)
mp.xlabel('x', fontsize=14)
mp.ylabel('y', fontsize=14)
mp.tick_params(labelsize=10)
mp.pcolormesh(grid_x[0], grid_x[1], grid_y, cmap='gray')
mp.scatter(test_x[:, 0], test_x[:, 1], c=test_y, cmap='brg', s=80)
mp.show()
3. Cross validation
- ms.cross_val_score(model, inputs, outputs, cv=number of folds,
      scoring=metric name) -> array of metric values
  Metrics:
  - accuracy: correctly classified samples / total samples
  - precision_weighted: for each class, correctly predicted samples / samples predicted as that class
  - recall_weighted: for each class, correctly predicted samples / samples actually belonging to that class
  - f1_weighted:
    2 x precision x recall / (precision + recall)
  During cross validation, the precision, recall, or f1 score is computed for every class within each fold; the per-class values are then averaged, weighted by each class's share of the true samples, to give that fold's score. The scores of all folds are returned to the caller as an array.
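To make the weighted averaging concrete, here is a sketch (with hypothetical labels standing in for one fold) that reproduces sklearn's precision_weighted by hand; recall_weighted and f1_weighted follow the same scheme:

```python
import numpy as np
from sklearn.metrics import precision_score

# Hypothetical true and predicted labels for one fold, 3 classes
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2, 2, 2, 0, 1])

# Per-class precision: correct predictions / all predictions of that class
per_class = precision_score(y_true, y_pred, average=None)

# 'weighted' averages the per-class values, weighted by each class's
# share of the true samples
weights = np.bincount(y_true) / y_true.size
manual = (per_class * weights).sum()

print(manual)                                               # 0.8
print(precision_score(y_true, y_pred, average='weighted'))  # 0.8
```

Here the per-class precisions are 2/3, 1/2, and 1, and the true-class shares are 0.3, 0.2, and 0.5, so the weighted score is 0.3 x 2/3 + 0.2 x 1/2 + 0.5 x 1 = 0.8.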
Code: cv.py

# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import numpy as np
import sklearn.model_selection as ms
import sklearn.naive_bayes as nb
import matplotlib.pyplot as mp

x, y = [], []
with open('../../data/multiple1.txt', 'r') as f:
    for line in f.readlines():
        data = [float(substr) for substr in line.split(',')]
        x.append(data[:-1])
        y.append(data[-1])
x = np.array(x)
y = np.array(y, dtype=int)
# Split into training and test sets
train_x, test_x, train_y, test_y = \
    ms.train_test_split(
        x, y, test_size=0.25, random_state=7)
# Naive Bayes classifier
model = nb.GaussianNB()
# Cross validation
# Accuracy
ac = ms.cross_val_score(
    model, train_x, train_y, cv=5,
    scoring='accuracy')
print(ac.mean())
# Precision
pw = ms.cross_val_score(
    model, train_x, train_y, cv=5,
    scoring='precision_weighted')
print(pw.mean())
# Recall
rw = ms.cross_val_score(
    model, train_x, train_y, cv=5,
    scoring='recall_weighted')
print(rw.mean())
# F1 score
fw = ms.cross_val_score(
    model, train_x, train_y, cv=5,
    scoring='f1_weighted')
print(fw.mean())
# Train the model on the training set
model.fit(train_x, train_y)
# Build a grid over the feature plane for plotting the decision boundary
l, r, h = x[:, 0].min() - 1, x[:, 0].max() + 1, 0.005
b, t, v = x[:, 1].min() - 1, x[:, 1].max() + 1, 0.005
grid_x = np.meshgrid(np.arange(l, r, h),
                     np.arange(b, t, v))
flat_x = np.c_[grid_x[0].ravel(), grid_x[1].ravel()]
flat_y = model.predict(flat_x)
grid_y = flat_y.reshape(grid_x[0].shape)