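1 Concepts
Naive Bayes is another simple and intuitive classification algorithm. As the name suggests, its core is Bayes' formula.
Let us first look at the most basic form of Bayes' formula: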
$$P(y \mid x) = \frac{P(x \mid y)\,P(y)}{P(x)} \tag{1}$$
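As a quick numeric check of formula (1), the sketch below applies it to a two-class problem. The class names and all probabilities are invented for illustration and do not come from any data set.

```python
# Hypothetical numbers: two classes and one observed feature value x.
prior = {"spam": 0.3, "ham": 0.7}        # P(y)
likelihood = {"spam": 0.8, "ham": 0.1}   # P(x | y) for the observed x

# P(x) by the law of total probability: sum over y of P(x | y) * P(y).
evidence = sum(likelihood[y] * prior[y] for y in prior)

# Bayes' formula (1): P(y | x) = P(x | y) * P(y) / P(x).
posterior = {y: likelihood[y] * prior[y] / evidence for y in prior}

print(posterior)
```

The posteriors sum to one because dividing by P(x) normalizes the products P(x|y)P(y) over all classes.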
The next step is to compute $P(x \mid y)$, $P(y)$, and $P(x)$.
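For a class $C_k$ and an $n$-dimensional input $x = (x^{(1)}, \dots, x^{(n)})$, the likelihood factorizes under the conditional independence assumption (the "naive" part: features are assumed independent given the class):
$$\begin{aligned}
P(x \mid y) &= P(X = x \mid Y = y) \\
&= P(X^{(1)} = x^{(1)}, X^{(2)} = x^{(2)}, \dots, X^{(n)} = x^{(n)} \mid Y = C_k) \\
&= \prod_{j=1}^{n} P(X^{(j)} = x^{(j)} \mid Y = C_k)
\end{aligned} \tag{2}$$
The prior is the class probability:
$$P(y) = P(Y = C_k) \tag{3}$$
The evidence follows from the law of total probability:
$$\begin{aligned}
P(x) &= P(X = x) \\
&= P(X^{(1)} = x^{(1)}, X^{(2)} = x^{(2)}, \dots, X^{(n)} = x^{(n)}) \\
&= \sum_{k} P(Y = C_k) \prod_{j=1}^{n} P(X^{(j)} = x^{(j)} \mid Y = C_k)
\end{aligned} \tag{4}$$
Substituting (2) to (4) into (1) gives the posterior:
$$P(Y = C_k \mid X = x) = \frac{P(Y = C_k) \prod_{j=1}^{n} P(X^{(j)} = x^{(j)} \mid Y = C_k)}{\sum_{k} P(Y = C_k) \prod_{j=1}^{n} P(X^{(j)} = x^{(j)} \mid Y = C_k)} \tag{5}$$
The denominator of (5) is the same for every class, so the classifier simply picks the class that maximizes the numerator:
$$y = f(x) = \arg\max_{C_k} P(Y = C_k) \prod_{j=1}^{n} P(X^{(j)} = x^{(j)} \mid Y = C_k) \tag{6}$$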
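The decision rule in (6) can be sketched from scratch for categorical features. The code below is a minimal illustration, not a reference implementation: the function names `fit` and `predict` and the small toy data set are assumptions introduced here, and probabilities are estimated by plain relative frequencies without smoothing.

```python
from collections import Counter, defaultdict

def fit(X, y):
    """Estimate P(Y=C_k) and P(X^(j)=a | Y=C_k) by relative frequencies."""
    n = len(y)
    class_counts = Counter(y)
    prior = {c: class_counts[c] / n for c in class_counts}
    # counts[(j, c)][a] = number of class-c samples whose j-th feature equals a
    counts = defaultdict(Counter)
    for xi, c in zip(X, y):
        for j, a in enumerate(xi):
            counts[(j, c)][a] += 1
    cond = {
        (j, c): {a: cnt / class_counts[c] for a, cnt in ctr.items()}
        for (j, c), ctr in counts.items()
    }
    return prior, cond

def predict(prior, cond, x):
    """Return the argmax class of (6) and the unnormalized score per class."""
    scores = {}
    for c, p in prior.items():
        score = p
        for j, a in enumerate(x):
            # A value never seen with class c gets probability 0 here.
            score *= cond[(j, c)].get(a, 0.0)
        scores[c] = score
    return max(scores, key=scores.get), scores

# Toy data (illustrative only): two categorical features, labels in {1, -1}.
X1 = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
X2 = ["S", "M", "M", "S", "S", "S", "M", "M", "L", "L", "L", "M", "M", "L", "L"]
Y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]

prior, cond = fit(list(zip(X1, X2)), Y)
label, scores = predict(prior, cond, (2, "S"))
print(label, scores)
```

For the query `(2, "S")` the scores are the numerators of (5): 9/15 · 3/9 · 1/9 = 1/45 for class 1 and 6/15 · 2/6 · 3/6 = 1/15 for class -1, so the sketch returns -1. With many features, summing log-probabilities instead of multiplying avoids floating-point underflow.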
2 Existing Tools
2.1 R: e1071
The e1071 package provides a naive Bayes classifier. Usage example [1]:
## Load the package that provides naiveBayes():
library(e1071)

## Categorical data only:
data(HouseVotes84, package = "mlbench")
model <- naiveBayes(Class ~ ., data = HouseVotes84)
predict(model, HouseVotes84[1:10,])
predict(model, HouseVotes84[1:10,], type = "raw")

pred <- predict(model, HouseVotes84)
table(pred, HouseVotes84$Class)

## using laplace smoothing:
model <- naiveBayes(Class ~ ., data = HouseVotes84, laplace = 3)
pred <- predict(model, HouseVotes84[,-1])
table(pred, HouseVotes84$Class)

## Example of using a contingency table:
data(Titanic)
m <- naiveBayes(Survived ~ ., data = Titanic)
m
predict(m, as.data.frame(Titanic))

## Example with metric predictors:
data(iris)
m <- naiveBayes(Species ~ ., data = iris)
## alternatively:
m <- naiveBayes(iris[,-5], iris[,5])
m
table(predict(m, iris), iris[,5])
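The `laplace` argument in the example above enables additive (Laplace) smoothing, which keeps a feature value never observed with a class from forcing that class's score to zero. The Python sketch below shows the textbook form of the estimate, (count + λ) / (class count + λ · number of distinct values). The function name and the counts are hypothetical, and this is not a transcription of how e1071 implements it internally.

```python
def smoothed_conditional(count, class_count, n_values, lam=1.0):
    """Additive-smoothing estimate of P(X^(j) = a | Y = C_k).

    count       -- samples of class C_k whose j-th feature equals a
    class_count -- total samples of class C_k
    n_values    -- number of distinct values feature j can take
    lam         -- smoothing strength (lam = 0 gives the plain frequency)
    """
    return (count + lam) / (class_count + lam * n_values)

# Hypothetical counts for one feature with three values within one class.
counts = {"S": 3, "M": 2, "L": 0}
class_count = sum(counts.values())

probs = {a: smoothed_conditional(c, class_count, len(counts)) for a, c in counts.items()}
print(probs)
```

The unseen value `"L"` now gets probability 1/8 instead of 0, and the three estimates still sum to one.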
Footnotes:
1 Package e1071, http://cran.r-project.org/web/packages/e1071/e1071.pdf