Naive Bayes algorithm in R: a bivariate normal sample demo

junglezax

Categories: Data Mining, Machine Learning, Pattern Recognition. Tags: Naive Bayes, R

Article link: https://blog.csdn.net/junglezax/article/details/14052159

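This post implements the naive Bayes algorithm in R to classify samples drawn from bivariate normal distributions. The author first generates two-dimensional normal samples for two classes, using the MASS library. Prior probabilities and class-conditional probabilities are then computed to build a classification function, `hfun`, which is used to predict new samples. The post also plots the decision boundary and runs a test, reporting a recognition rate of about 80%.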

Abstract generated automatically by CSDN.
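The steps described in the abstract (estimate priors, estimate per-feature class-conditional normal densities, pick the class with the larger posterior) can be sketched compactly as below. This is only an illustrative sketch, not the author's `hfun`: the names `fit_gnb` and `predict_gnb` are hypothetical, the seed is arbitrary, and `rnorm` stands in for `mvrnorm` on the assumption that both covariance matrices are diagonal, so the two coordinates can be drawn independently.

```r
# Sketch only: Gaussian naive Bayes for two classes in base R.
# fit_gnb and predict_gnb are hypothetical names, not part of the original script.
set.seed(1)  # arbitrary seed, for reproducibility of the sketch
n <- 100

# Diagonal covariances, so independent rnorm() draws replace mvrnorm() here.
x1 <- cbind(rnorm(n, 0, 1), rnorm(n, 0, 1))        # class 1: mu = (0, 0), var = (1, 1)
x2 <- cbind(rnorm(n, 2, 1), rnorm(n, 2, sqrt(2)))  # class 2: mu = (2, 2), var = (1, 2)
x <- rbind(x1, x2)
y <- rep(c(1, 2), each = n)

# Estimate the priors and the per-class, per-feature mean and standard deviation.
fit_gnb <- function(x, y) {
  classes <- sort(unique(y))
  list(
    classes = classes,
    prior = sapply(classes, function(k) mean(y == k)),
    mu = t(sapply(classes, function(k) colMeans(x[y == k, , drop = FALSE]))),
    sd = t(sapply(classes, function(k) apply(x[y == k, , drop = FALSE], 2, sd)))
  )
}

# Score each class by log prior + sum of per-feature log densities
# (the naive independence assumption), then return the best-scoring class.
predict_gnb <- function(model, newx) {
  if (is.null(dim(newx))) newx <- matrix(newx, nrow = 1)
  scores <- sapply(seq_along(model$classes), function(k) {
    ll <- sapply(seq_len(ncol(newx)), function(j) {
      dnorm(newx[, j], model$mu[k, j], model$sd[k, j], log = TRUE)
    })
    ll <- matrix(ll, nrow = nrow(newx))
    log(model$prior[k]) + rowSums(ll)
  })
  scores <- matrix(scores, nrow = nrow(newx))
  model$classes[max.col(scores, ties.method = "first")]
}

model <- fit_gnb(x, y)
acc <- mean(predict_gnb(model, x) == y)  # accuracy on the training sample
print(acc)
```

Working with log densities rather than raw products avoids numerical underflow, and comparing the two log posteriors is equivalent to comparing a likelihood ratio against the prior ratio. The accuracy printed here is measured on the training sample, so it is not directly comparable to the recognition rate quoted in the abstract.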

# naive bayes algorithm
# Xindong Wu, top ten algorithms for data mining, ch9, exercise 1
# author: nullspace(jxhchina at gmail.com)
# last updated: 10:34 2013/11/2

library(MASS)

source('readkey.R')

n <- 100

sigma1 <- matrix(c(1, 0, 0, 1), 2, 2)
sigma1
mu1 <- c(0, 0)
d1 <- mvrnorm(n, mu1, sigma1)

sigma2 <- matrix(c(1, 0, 0, 2), 2, 2)
sigma2
mu2 <- c(2, 2)
d2 <- mvrnorm(n, mu2, sigma2)

d <- rbind(cbind(d1, 1), cbind(d2, 2))
dim(d)

minxy <- apply(d[,1:2], 2, min)
maxxy <- apply(d[,1:2], 2, max)
minmaxxy <- rbind(minxy, maxxy)

plot.new()
plot(d1, col='red', xlim=minmaxxy[,1], ylim=minmaxxy[,2]) # new=FALSE|TRUE not work
points(d2, col='blue') 

df <- data.frame(d)
df[,3] <- factor(df[,3])
is.factor(df[,3])

# ratio of the prior of class 1 to the prior of class 2
r <- (sum(df[,3] == 1) / sum(df[,3] == 2))

# marginal distributions of x1, x2
# suppose both of them belong to