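With nearest-neighbor methods it is hard to quantify how confident a classification is. A probability-based classifier, the Bayesian method, not only classifies an item but also reports the probability of that classification, for example that this athlete is a basketball player with probability 80%.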
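P(h) is called the prior probability of h: the probability of hypothesis h before any data is observed.
P(h | d) is called the posterior probability of h: the probability of h after the data d has been observed.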
1. Bayes' theorem
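Bayes' theorem links the two quantities defined above: P(h | d) = P(d | h) P(h) / P(d). The following is a minimal sketch of that calculation, not code from the original notes. The function name `posteriors`, the two hypotheses, and all of the numbers are assumptions made up for illustration.

```python
def posteriors(priors, likelihoods):
    """Return P(h | d) for every hypothesis h.

    priors[h] = P(h), likelihoods[h] = P(d | h).
    """
    # Unnormalized scores P(d | h) * P(h)
    joint = {h: likelihoods[h] * priors[h] for h in priors}
    # P(d), obtained by summing over all hypotheses (law of total probability)
    evidence = sum(joint.values())
    return {h: p / evidence for h, p in joint.items()}


# Hypothetical numbers, invented for illustration only
priors = {"basketball": 0.6, "gymnastics": 0.4}
likelihoods = {"basketball": 0.2, "gymnastics": 0.5}
result = posteriors(priors, likelihoods)
print(result)
```

With these made-up inputs the evidence P(d) is 0.32, so the posteriors come out to roughly 0.375 for basketball and 0.625 for gymnastics. Note that the observed data reverses the ranking given by the priors alone.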
2. Naive Bayes
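The iHealth company has two products, the i100 and the i500:
iHealth100: heart rate, GPS, WiFi
iHealth500: everything in the i100, plus blood oxygen saturation and a free 3G connection to the iHealth website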
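iHealth has hired us to build a customer-facing product recommendation system. To collect the data needed to build it, customers are asked to fill out a questionnaire when they make a purchase. Each question on the questionnaire corresponds to one attribute.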
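Question: if someone's main interest is health, their current exercise level is moderate, and their motivation is moderate, which product would the naive Bayes method recommend?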
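Under the naive (conditional independence) assumption, answering this question means computing one score per product and picking the larger. Stated generally, with h ranging over the set of products H and d_1, ..., d_n the questionnaire answers (these symbols are introduced here for illustration and are not in the original notes):

```latex
h_{\mathrm{MAP}} = \arg\max_{h \in H} \; P(h) \prod_{i=1}^{n} P(d_i \mid h)
```

For this question n = 3, so each product's score is its prior multiplied by P(health | h), P(moderate exercise level | h), and P(moderate motivation | h), all estimated from the questionnaire data.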
def classify(self, itemVector):
    """Return class we think item Vector is in"""
    results = []
    for (category, prior) in self.prior.items():
        # start from the prior P(category)
        prob = prior
        # attribute columns are numbered from 1
        col = 1
        for attrValue in itemVector:
            if attrValue not in self.conditional[category][col]:
                # we did not find any instances of this attribute value
                # occurring with this category so prob = 0
                prob = 0
            else:
                prob = prob * self.conditional[category][col][attrValue]
            col += 1
        results.append((prob, category))
    # return the category with the highest probability
    return max(results)[1]
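The method above relies on `self.prior` and `self.conditional` already being filled in. The following sketch shows one way those tables could be built by counting, assuming each training row has the form (category, attr1, attr2, ...). The class name `NaiveBayesSketch` and the five training rows are hypothetical; the rows are invented for illustration and are not the iHealth questionnaire data.

```python
from collections import Counter, defaultdict


class NaiveBayesSketch:
    """Toy classifier; rows are (category, attr1, attr2, ...)."""

    def __init__(self, rows):
        total = len(rows)
        category_counts = Counter(row[0] for row in rows)
        # prior[category] = P(category)
        self.prior = {c: n / total for c, n in category_counts.items()}
        # counts[category][col][value] = rows of that category with that value
        counts = defaultdict(lambda: defaultdict(Counter))
        for category, *attrs in rows:
            for col, value in enumerate(attrs, start=1):
                counts[category][col][value] += 1
        # conditional[category][col][value] = P(value | category)
        self.conditional = {
            c: {col: {v: n / category_counts[c] for v, n in values.items()}
                for col, values in cols.items()}
            for c, cols in counts.items()
        }

    def classify(self, item_vector):
        results = []
        for category, prior in self.prior.items():
            prob = prior
            for col, value in enumerate(item_vector, start=1):
                # a value never seen with this category contributes 0
                prob *= self.conditional[category][col].get(value, 0)
            results.append((prob, category))
        return max(results)[1]


# Hypothetical rows, invented for illustration only
rows = [
    ("i100", "health", "moderate", "moderate"),
    ("i100", "health", "sedentary", "moderate"),
    ("i100", "appearance", "moderate", "aggressive"),
    ("i500", "both", "active", "aggressive"),
    ("i500", "health", "active", "moderate"),
]
model = NaiveBayesSketch(rows)
print(model.classify(["health", "moderate", "moderate"]))  # prints i100
```

On this toy data the i500 score is 0 because no i500 row has a moderate exercise level, which shows how a single unseen attribute value can wipe out a category's score.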
3. The Congressional Voting Records data set
The data set can be downloaded from the machine learning repository at http://archive.ics.uci.edu/ml/index.html
Probability estimation