Naive Bayes Classification
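Conditional probability: P(A|B) = P(AB)/P(B)
Rearranging the conditional probability formula (Bayes' theorem):
P(A|B) = P(AB)/P(B) => P(AB) = P(A|B) * P(B)
P(B|A) = P(AB)/P(A) => P(AB) = P(B|A) * P(A)
=> P(A|B) * P(B) = P(B|A) * P(A)
=> P(A|B) = P(B|A) * P(A) / P(B)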
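As a quick numeric check of the rearrangement above, here is a minimal sketch. The counts (20 trials, 8 with A, 10 with B, 4 with both) are hypothetical values chosen only for illustration; they do not come from these notes.

```python
from fractions import Fraction

# Hypothetical counts, chosen only for illustration (not from the notes):
# out of 20 trials, A occurs in 8, B occurs in 10, and both occur in 4.
total, n_a, n_b, n_ab = 20, 8, 10, 4

p_a = Fraction(n_a, total)
p_b = Fraction(n_b, total)
p_ab = Fraction(n_ab, total)

p_a_given_b = p_ab / p_b  # P(A|B) = P(AB)/P(B)
p_b_given_a = p_ab / p_a  # P(B|A) = P(AB)/P(A)

# Both products recover the joint probability P(AB).
assert p_a_given_b * p_b == p_b_given_a * p_a == p_ab

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
assert p_a_given_b == p_b_given_a * p_a / p_b
```

Exact fractions are used instead of floats so that the equalities hold without rounding error.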
The training data is as follows:
RID  Age    Income  Student  Credit_rating  Class: buys_computer
1    <=30   High    No       Fair           No
2    <=30   High    No       Excellent      No
3    31…40  High    No       Fair           Yes
4    >40    Medium  No       Fair           Yes
5    >40    Low     Yes      Fair           Yes
6    >40    Low     Yes      Excellent      No
7    31…40  Low     Yes      Excellent      Yes
8    <=30   Medium  No       Fair           No
9    <=30   Low     Yes      Fair           Yes
10   >40    Medium  Yes      Fair           Yes
11   <=30   Medium  Yes      Excellent      Yes
12   31…40  Medium  No       Excellent      Yes
13   31…40  High    Yes      Fair           Yes
14   >40    Medium  No       Excellent      No
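Predicting a class label with naive Bayes classification: given the training data above, we want to use naive Bayes classification to predict the class label of an unknown sample. Each data sample is described by the attributes age, income, student, and credit_rating. The class label attribute buys_computer takes two distinct values, {yes, no}. Let C1 denote the class buys_computer = "yes" and C2 the class buys_computer = "no". The unknown sample we want to classify is:
X = (age = "<=30", income = "medium", student = "yes", credit_rating = "fair").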
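We need to maximize P(X|Ci)P(Ci), i = 1, 2. By Bayes' theorem, P(Ci|X) = P(X|Ci)P(Ci)/P(X), and P(X) is the same for both classes, so maximizing this product also maximizes the posterior P(Ci|X). The prior probability P(Ci) of each class can be computed from the training samples:
P(buys_computer = "yes") = 9/14 = 0.643
P(buys_computer = "no") = 5/14 = 0.357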
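The priors above are simple class frequencies. A minimal sketch of that count follows; the `labels` list is transcribed from the Class column of the training table in RID order, and the variable names are illustrative.

```python
from collections import Counter

# Class labels of the 14 training rows, in RID order (from the table).
labels = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
          "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

counts = Counter(labels)

# Prior of each class = (rows in that class) / (all rows).
priors = {cls: n / len(labels) for cls, n in counts.items()}

for cls in ("Yes", "No"):
    print(f"P(buys_computer = {cls}) = "
          f"{counts[cls]}/{len(labels)} = {priors[cls]:.3f}")
```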
To compute P(X|Ci), i = 1, 2, we first compute the following conditional probabilities from the training samples:
P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
P(age = "<=30" | buys_computer = "no") = 3/5 = 0.600
P(income = "medium" | buys_computer = "yes") = 4/9 = 0.444
P(income = "medium" | buys_computer = "no") = 2/5 = 0.400
P(student = "yes" | buys_computer = "yes") = 6/9 = 0.667
P(student = "yes" | buys_computer = "no") = 1/5 = 0.200
P(credit_rating = "fair" | buys_computer = "yes") = 6/9 = 0.667
P(credit_rating = "fair" | buys_computer = "no") = 2/5 = 0.400
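Each conditional probability above is a count within one class. The sketch below recomputes them from the training table; the tuple layout and the helper name `conditional` are illustrative choices, not part of the original notes.

```python
ATTRS = ("age", "income", "student", "credit_rating")

# Training rows: (age, income, student, credit_rating, buys_computer)
rows = [
    ("<=30", "High", "No", "Fair", "No"),
    ("<=30", "High", "No", "Excellent", "No"),
    ("31…40", "High", "No", "Fair", "Yes"),
    (">40", "Medium", "No", "Fair", "Yes"),
    (">40", "Low", "Yes", "Fair", "Yes"),
    (">40", "Low", "Yes", "Excellent", "No"),
    ("31…40", "Low", "Yes", "Excellent", "Yes"),
    ("<=30", "Medium", "No", "Fair", "No"),
    ("<=30", "Low", "Yes", "Fair", "Yes"),
    (">40", "Medium", "Yes", "Fair", "Yes"),
    ("<=30", "Medium", "Yes", "Excellent", "Yes"),
    ("31…40", "Medium", "No", "Excellent", "Yes"),
    ("31…40", "High", "Yes", "Fair", "Yes"),
    (">40", "Medium", "No", "Excellent", "No"),
]

# The unknown sample X.
x = {"age": "<=30", "income": "Medium", "student": "Yes",
     "credit_rating": "Fair"}


def conditional(attr, value, cls):
    """Return (matches, class_size) for P(attr = value | buys_computer = cls)."""
    idx = ATTRS.index(attr)
    in_class = [r for r in rows if r[-1] == cls]
    matches = sum(1 for r in in_class if r[idx] == value)
    return matches, len(in_class)


for attr, value in x.items():
    for cls in ("Yes", "No"):
        k, n = conditional(attr, value, cls)
        print(f"P({attr} = {value} | buys_computer = {cls}) = "
              f"{k}/{n} = {k / n:.3f}")
```

Returning the raw counts rather than a reduced fraction keeps the printed values in the same form as the list above (for example 6/9 instead of 2/3).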
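Using these probabilities, and assuming the attributes are conditionally independent given the class (the "naive" assumption), we obtain P(X|Ci) as a product, and then P(X|Ci)P(Ci), which plays the role of the joint probability P(AB) in the formulas above:
P(X | buys_computer = "yes") = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
P(X | buys_computer = "no") = 0.600 × 0.400 × 0.200 × 0.400 = 0.019
P(X | buys_computer = "yes") P(buys_computer = "yes") = 0.044 × 0.643 = 0.028
P(X | buys_computer = "no") P(buys_computer = "no") = 0.019 × 0.357 = 0.007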
Therefore, for sample X, the naive Bayes classifier predicts buys_computer = "yes".
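The final comparison can be reproduced with exact fractions. This is a minimal sketch that plugs in the priors and conditional probabilities stated in these notes; the dictionary names are illustrative.

```python
from fractions import Fraction
from math import prod

# Priors P(Ci), as computed from the training samples.
priors = {"yes": Fraction(9, 14), "no": Fraction(5, 14)}

# P(attribute value of X | Ci) for age, income, student, credit_rating.
likelihoods = {
    "yes": [Fraction(2, 9), Fraction(4, 9), Fraction(6, 9), Fraction(6, 9)],
    "no": [Fraction(3, 5), Fraction(2, 5), Fraction(1, 5), Fraction(2, 5)],
}

# Naive assumption: P(X|Ci) is the product of the per-attribute terms.
scores = {cls: prod(likelihoods[cls]) * priors[cls] for cls in priors}

# Predict the class with the largest P(X|Ci) P(Ci).
prediction = max(scores, key=scores.get)

for cls, score in scores.items():
    print(f"P(X | {cls}) P({cls}) = {float(score):.4f}")
print("prediction:", prediction)
```

The exact scores are about 0.0282 for "yes" and 0.0069 for "no", which agrees with the rounded values 0.028 and 0.007 in the text.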