Machine Learning Day 03

2. Splitting into training and test sets

  • import sklearn.model_selection as ms
    ms.train_test_split(
    inputs, outputs, test_size=fraction of samples held out for testing,
    random_state=random seed)
    -> training inputs, test inputs, training outputs, test outputs
    Code: split.py
    # -*- coding: utf-8 -*-
    from __future__ import unicode_literals
    import numpy as np
    import sklearn.model_selection as ms
    import sklearn.naive_bayes as nb
    import matplotlib.pyplot as mp
    x, y = [], []
    with open('../../data/multiple1.txt', 'r') as f:
        for line in f.readlines():
            data = [float(substr) for substr
                    in line.split(',')]
            x.append(data[:-1])
            y.append(data[-1])
    x = np.array(x)
    y = np.array(y, dtype=int)
    # Split into training and test sets
    train_x, test_x, train_y, test_y = \
        ms.train_test_split(
            x, y, test_size=0.25, random_state=7)
    # Naive Bayes classifier
    model = nb.GaussianNB()
    # Train the model on the training set
    model.fit(train_x, train_y)
    # Build a dense grid over the data range (1 unit of margin, step 0.005)
    l, r, h = x[:, 0].min() - 1, x[:, 0].max() + 1, 0.005
    b, t, v = x[:, 1].min() - 1, x[:, 1].max() + 1, 0.005
    grid_x = np.meshgrid(np.arange(l, r, h),
                         np.arange(b, t, v))
    # Classify every grid point so the decision regions can be drawn
    flat_x = np.c_[grid_x[0].ravel(), grid_x[1].ravel()]
    flat_y = model.predict(flat_x)
    grid_y = flat_y.reshape(grid_x[0].shape)
    # Evaluate the model on the test set
    pred_test_y = model.predict(test_x)
    print((pred_test_y == test_y).sum() / pred_test_y.size)
    mp.figure('Naive Bayes Classification',
              facecolor='lightgray')
    mp.title('Naive Bayes Classification', fontsize=20)
    mp.xlabel('x', fontsize=14)
    mp.ylabel('y', fontsize=14)
    mp.tick_params(labelsize=10)
    mp.pcolormesh(grid_x[0], grid_x[1], grid_y,
                  cmap='gray')
    mp.scatter(test_x[:, 0], test_x[:, 1], c=test_y,
               cmap='brg', s=80)
    mp.show()
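
To try the split without the data file, here is a minimal sketch on synthetic data. The array sizes and the seed 0 used to generate the data are assumptions chosen for illustration; only the train_test_split call itself mirrors split.py.

```python
import numpy as np
import sklearn.model_selection as ms

# Synthetic stand-in for the data file (an assumption, not the author's data):
# 100 samples, 2 features, 4 class labels.
rng = np.random.RandomState(0)
x = rng.normal(size=(100, 2))
y = np.arange(100) % 4

# Hold out 25% of the samples for testing, as in split.py.
train_x, test_x, train_y, test_y = ms.train_test_split(
    x, y, test_size=0.25, random_state=7)
print(train_x.shape, test_x.shape)

# Repeating the call with the same random_state reproduces the same split.
_, test_x2, _, _ = ms.train_test_split(
    x, y, test_size=0.25, random_state=7)
print(np.array_equal(test_x, test_x2))
```

Fixing random_state makes the reported test accuracy reproducible from run to run; omitting it gives a different split each time.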

     

3. Cross-validation

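  • ms.cross_val_score(model, inputs, outputs, cv=number of folds,
                                         scoring=metric name) -> array of metric values
    Metrics:
  1. Accuracy (accuracy): number of correctly classified samples / total number of samples
  2. Precision (precision_weighted): for each class, the number of correctly predicted samples divided by the number of samples predicted as that class
  3. Recall (recall_weighted): for each class, the number of correctly predicted samples divided by the number of samples that actually belong to that class
  4. F1 score (f1_weighted):
      2 x precision x recall / (precision + recall)
    During cross-validation, the precision, recall, or F1 score of every class is computed for each fold. The per-class values are then averaged, weighted by the number of true samples in each class (this is what the _weighted suffix selects), to give that fold's score. The scores of all folds are returned to the caller as an array.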
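
A minimal sketch of how the _weighted metrics combine per-class values: each class's precision, recall, and F1 score is weighted by that class's share of the true labels. The helper name weighted_scores and the toy labels are made up for illustration, and this is a hand-written approximation of the idea rather than scikit-learn's own code.

```python
def weighted_scores(true_y, pred_y):
    """Return (precision, recall, f1), each averaged over the classes
    with weights proportional to the number of true samples per class."""
    n = len(true_y)
    precision = recall = f1 = 0.0
    for c in sorted(set(true_y)):
        # Samples of class c that were also predicted as class c
        tp = sum(1 for t, p in zip(true_y, pred_y) if t == c and p == c)
        predicted = sum(1 for p in pred_y if p == c)  # predicted as c
        actual = sum(1 for t in true_y if t == c)     # truly c
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / actual
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = actual / n
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return precision, recall, f1


# Toy labels (hypothetical): three classes with 4, 4, and 2 true samples
true_y = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
pred_y = [0, 0, 0, 1, 1, 1, 1, 1, 2, 0]
precision, recall, f1 = weighted_scores(true_y, pred_y)
print(precision, recall, f1)
```

With this weighting, the weighted recall always equals the accuracy, because each class's recall is multiplied by exactly the fraction of samples belonging to it.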
    Code: cv.py
    # -*- coding: utf-8 -*-
    from __future__ import unicode_literals
    import numpy as np
    import sklearn.model_selection as ms
    import sklearn.naive_bayes as nb
    import matplotlib.pyplot as mp
    x, y = [], []
    with open('../../data/multiple1.txt', 'r') as f:
        for line in f.readlines():
            data = [float(substr) for substr
                    in line.split(',')]
            x.append(data[:-1])
            y.append(data[-1])
    x = np.array(x)
    y = np.array(y, dtype=int)
    # Split into training and test sets
    train_x, test_x, train_y, test_y = \
        ms.train_test_split(
            x, y, test_size=0.25, random_state=7)
    # Naive Bayes classifier
    model = nb.GaussianNB()
    # Cross-validation on the training set (5 folds)
    # Accuracy
    ac = ms.cross_val_score(
        model, train_x, train_y, cv=5,
        scoring='accuracy')
    print(ac.mean())
    # Precision
    pw = ms.cross_val_score(
        model, train_x, train_y, cv=5,
        scoring='precision_weighted')
    print(pw.mean())
    # Recall
    rw = ms.cross_val_score(
        model, train_x, train_y, cv=5,
        scoring='recall_weighted')
    print(rw.mean())
    # F1 score
    fw = ms.cross_val_score(
        model, train_x, train_y, cv=5,
        scoring='f1_weighted')
    print(fw.mean())
    # Train the model on the training set
    model.fit(train_x, train_y)
    l, r, h = x[:, 0].min() - 1, x[:, 0].max() + 1, 0.005
    b, t, v = x[:, 1].min() - 1, x[:, 1].max() + 1, 0.005
    grid_x = np.meshgrid(np.arange(l, r, h),
                         np.arange(b, t, v))
    flat_x = np.c_[grid_x[0].ravel(), grid_x[1].ravel()]
    flat_y = model.predict(flat_x)
    grid_y = flat_y.reshape(grid_x[0].shape)
    # 
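
The cross-validation step in cv.py depends on a local data file. As a rough, self-contained variant, the sketch below runs the same four scorers on synthetic data; the two Gaussian blobs, their sizes, and the scores dictionary are assumptions for illustration and not the author's dataset.

```python
import numpy as np
import sklearn.model_selection as ms
import sklearn.naive_bayes as nb

# Synthetic stand-in data (an assumption): two well-separated
# Gaussian blobs with 60 samples each.
rng = np.random.RandomState(7)
x = np.vstack([rng.normal(loc=-2.0, size=(60, 2)),
               rng.normal(loc=2.0, size=(60, 2))])
y = np.array([0] * 60 + [1] * 60)

model = nb.GaussianNB()
scores = {}
for name in ('accuracy', 'precision_weighted',
             'recall_weighted', 'f1_weighted'):
    # One score per fold; the mean summarizes the 5 folds
    fold_scores = ms.cross_val_score(model, x, y, cv=5, scoring=name)
    scores[name] = fold_scores
    print(name, fold_scores.mean())
```

cross_val_score fits a fresh copy of the model on each fold, so the model still has to be fitted with model.fit before it can be used for prediction, as cv.py does.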