贝叶斯分类器原理和应用

最新推荐文章于 2024-05-14 02:31:33 发布

_Daibingh_

最新推荐文章于 2024-05-14 02:31:33 发布

阅读量1.2k

点赞数 4

分类专栏：机器学习文章标签：朴素贝叶斯 sklearn python

本文链接：https://blog.csdn.net/healingwounds/article/details/84448341

版权

机器学习专栏收录该内容

6 篇文章 0 订阅

订阅专栏

利用 sklearn 贝叶斯分类器对 IRIS 数据集分类

贝叶斯分类的基本思想一言以蔽之“将样本归为其后验概率最大的那个类”。

具体原理参考: http://www.cnblogs.com/leoo2sk/archive/2010/09/17/naive-bayesian-classifier.html

sklearn 工具包中对根据样本的分布特性对朴素贝叶斯分类器进行了实现，分为以下几个具体情况：

朴素贝叶斯-高斯模型
朴素贝叶斯-多项式模型
朴素贝叶斯-伯努利模型

参考官方文档：http://sklearn.lzjqsdd.com/modules/naive_bayes.html

其中，高斯模型应用最普遍，本文调用 sklearn 工具包中朴素贝叶斯-高斯模型分类器（GaussianNB）对 IRIS 进行分类。

严格来讲首先应该进行假设检验，判断样本是否符合高斯分布。在这里将这一步骤省略，以分布直方图的形式直观展现样本的分布特征。

from sklearn import datasets
from sklearn.naive_bayes import GaussianNB
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.ticker import PercentFormatter

if __name__ == '__main__':
  
    iris = datasets.load_iris() 
    print(type(iris), dir(iris))

    x = iris.get('data')
    y = iris.get('target')

    # show attributes histogram
    c = np.unique(y)
    ind = []
    ind.append(y==c[0])
    ind.append(y==c[1])
    ind.append(y==c[2])
    bin_num = 40
    fig, axes = plt.subplots(len(c),4)
    for i, ax in enumerate(axes.flat):
        ind_ = ind[i//4]
        j = i%4
        ax.hist(x[ind_,j], bins=bin_num)

    axes[0,0].set_ylabel("y = 0")
    axes[1,0].set_ylabel("y = 1")
    axes[2,0].set_ylabel("y = 2")
    axes[0,0].set_title("attribute 0")
    axes[0,1].set_title("attribute 1")
    axes[0,2].set_title("attribute 2")
    axes[0,3].set_title("attribute 3")
    plt.show()

在这里插入图片描述

从分布直方图看出，样本数据的分布呈现单峰特性，近似服从高斯分布。

下面对数据集进行划分，分类和测试。

 # 随机划分训练集和测试集
    num = x.shape[0] # 样本总数
    ratio = 7/3 # 划分比例，训练集数目:测试集数目
    num_test = int(num/(1+ratio)) # 测试集样本数目
    num_train = num -  num_test # 训练集样本数目
    index = np.arange(num) # 产生样本标号
    np.random.shuffle(index) # 洗牌
    x_test = x[index[:num_test],:] # 取出洗牌后前 num_test 作为测试集
    y_test = y[index[:num_test]]
    x_train = x[index[num_test:],:] # 剩余作为训练集
    y_train = y[index[num_test:]]

    gnb = GaussianNB()
    gnb.fit(x_train, y_train)
    y_test_pre = gnb.predict(x_test)

    # 计算分类准确率
    acc = sum(y_test_pre==y_test)/num_test
    print('The accuracy is', acc) # 显示预测准确率

分类结果显示：

The accuracy is 0.9111111111111111

_Daibingh_

关注

4
点赞
踩
2

收藏

觉得还不错? 一键收藏
1
评论
贝叶斯分类器原理和应用

利用 sklearn 贝叶斯分类器对 IRIS 数据集分类贝叶斯分类的基本思想一言以蔽之“将样本归为其后验概率最大的那个类”。具体原理参考: http://www.cnblogs.com/leoo2sk/archive/2010/09/17/naive-bayesian-classifier.htmlsklearn 工具包中对根据样本的分布特性对朴素贝叶斯分类器进行了实现，分为以下几个具体情...
复制链接

扫一扫