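I. Experimental Principle
Among all classifiers, the Bayesian classifier is the one with the smallest probability of classification error or, when the costs of misclassification are given in advance, the smallest average risk (assuming the class priors and class-conditional probabilities are known exactly). Its design is one of the most basic statistical classification methods. The principle is to start from the prior probability of each class, use Bayes' formula to compute the posterior probability, that is, the probability that the object belongs to a given class, and assign the object to the class with the largest posterior probability.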
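For the feature-based setting used later in this report, the rule can be written out as follows. The symbols $C_k$ (a class) and $x_1,\dots,x_n$ (the observed feature values) are notation introduced here for illustration, and the product form assumes the features are conditionally independent given the class, which is the "naive" assumption.

```latex
P(C_k \mid x_1,\dots,x_n)
  = \frac{P(C_k)\,\prod_{i=1}^{n} P(x_i \mid C_k)}{P(x_1,\dots,x_n)},
\qquad
\hat{y} = \arg\max_{k}\; P(C_k)\prod_{i=1}^{n} P(x_i \mid C_k)
```

The denominator is the same for every class, so it does not affect which class attains the maximum.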
II. Experiment Content
III. Experimental Procedure
This experiment is written in Python and consists of three Python files: bayes_classfier.py, Classfication.py, and generate_attires.py.
bayes_classfier.py
# Training data: for each fruit, the number of samples that have each feature value
datasets = {
    'banala': {'long': 400, 'not_long': 100, 'sweet': 350, 'not_sweet': 150, 'yellow': 450, 'not_yellow': 50},
    'orange': {'long': 0, 'not_long': 300, 'sweet': 150, 'not_sweet': 150, 'yellow': 300, 'not_yellow': 0},
    'other_fruit': {'long': 100, 'not_long': 100, 'sweet': 150, 'not_sweet': 50, 'yellow': 50, 'not_yellow': 150},
}
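
def count_total(data):
    '''Count the number of samples of each fruit and the overall total.
    return {'banala': 500 …}, total'''
    count = {}
    total = 0
    for fruit in data:
        # Every fruit is either sweet or not sweet, so these two counts add up to the fruit's total
        count[fruit] = data[fruit]['sweet'] + data[fruit]['not_sweet']
        total += count[fruit]
    return count, total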
###########################################################
def cal_base_rates(data):
    '''Compute the prior probability of each fruit.
    return {'banala': 0.5 …}'''
    categories, total = count_total(data)
    base_rates = {}
    for label in categories:
        priori_prob = categories[label] / total
        base_rates[label] = priori_prob
    return base_rates
############################################################
def likelihold_prob(data):
    '''Compute the probability of each feature value given the fruit (likelihood probabilities).
    return {'banala': {'long': 0.8} …}'''
    count, _ = count_total(data)
    likelihold = {}
    for fruit in data:
        # Temporary dict holding the feature-value probabilities for this fruit
        attr_prob = {}
        for attr in data[fruit]:
            # P(attr | fruit)
            attr_prob[attr] = data[fruit][attr] / count[fruit]
        likelihold[fruit] = attr_prob
    return likelihold
############################################################
def evidence_prob(data):
    '''Compute the marginal probability of each feature value (the evidence term).
    return {'long': 0.5 …}'''
    # All feature values of a fruit
    attrs = list(data['banala'].keys())
    _, total = count_total(data)
    evidence = {}
    # Compute the probability of each feature value over all fruits
    for attr in attrs:
        attr_total = 0
        for fruit in data:
            attr_total += data[fruit][attr]
        evidence[attr] = attr_total / total
    return evidence
##########################################################
# The functions above handle training, i.e. they turn the count data into probabilities
##########################################################
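class navie_bayes_classifier:
    '''Naive Bayes classifier; __init__ is called when the class is instantiated.'''
    def __init__(self, data=datasets):
        self._data = data
        self._labels = list(self._data.keys())
        self._priori_prob = cal_base_rates(self._data)
        self._likelihold_prob = likelihold_prob(self._data)
        self._evidence_prob = evidence_prob(self._data)

    # The method below can use the attributes set in __init__ directly
    def get_label(self, length, sweetness, color):
        '''Return the score of each class for one set of feature values.'''
        self._attrs = [length, sweetness, color]
        res = {}
        for label in self._labels:
            prob = self._priori_prob[label]  # prior: share of this fruit in the data
            # print("share of each fruit:", prob)
            for attr in self._attrs:
                # The loop body is missing from the original listing. It is reconstructed
                # here so that it reproduces the values reported in the results section:
                # multiply by P(attr | label) / P(attr)
                prob *= self._likelihold_prob[label][attr] / self._evidence_prob[attr]
            res[label] = prob
        return res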
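To make the arithmetic concrete, the following is a minimal, self-contained sketch that scores one sample by hand. It assumes the score is the prior multiplied by P(attr | fruit) / P(attr) for each feature value, which is the form consistent with the values reported in the results. The names `fruit_totals`, `banala_counts`, `attr_totals`, `sample` and `score` are illustrative and not part of the original files.

```python
# Counts copied from the training table; only the entries needed for this sample.
fruit_totals = {'banala': 500, 'orange': 300, 'other_fruit': 200}
banala_counts = {'long': 400, 'not_sweet': 150, 'yellow': 450}
# Feature totals over all fruits, e.g. long = 400 + 0 + 100.
attr_totals = {'long': 500, 'not_sweet': 350, 'yellow': 800}

total = sum(fruit_totals.values())        # 1000 samples in all
sample = ['long', 'not_sweet', 'yellow']

score = fruit_totals['banala'] / total    # prior P(banala) = 0.5
for attr in sample:
    likelihood = banala_counts[attr] / fruit_totals['banala']  # P(attr | banala)
    evidence = attr_totals[attr] / total                       # P(attr)
    score *= likelihood / evidence

print(round(score, 4))  # 0.7714
```

This matches the 'banala' score listed in the results for the sample ['long', 'not_sweet', 'yellow'].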
generate_attires.py
import random

def random_attr(pair):
    # Pick index 0 or 1 at random and return that element of the pair
    return pair[random.randint(0, 1)]

def gen_attrs():
    # Possible values of each feature
    sets = [('long', 'not_long'), ('sweet', 'not_sweet'), ('yellow', 'not_yellow')]
    test_datasets = []
    for i in range(20):
        # Use map to generate one set of feature values
        test_datasets.append(list(map(random_attr, sets)))
    return test_datasets
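Because the generator draws its samples at random, every run classifies a different set of 20 inputs. If repeatable runs are wanted, one option is to draw from a seeded generator. The sketch below is a hypothetical variant: the function name `gen_attrs_seeded` and its `seed` and `n` parameters are not in the original file. It relies only on `random.Random` and its `choice` method from the standard library.

```python
import random

def gen_attrs_seeded(seed=0, n=20):
    # A private generator leaves the global random state untouched
    rng = random.Random(seed)
    sets = [('long', 'not_long'), ('sweet', 'not_sweet'), ('yellow', 'not_yellow')]
    # One value per feature pair, repeated n times
    return [[rng.choice(pair) for pair in sets] for _ in range(n)]

samples = gen_attrs_seeded(seed=42)
print(len(samples))  # 20
```

Calling the function twice with the same seed returns the same list, which makes a run of the classifier easier to compare against an earlier one.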
Classfication.py
import operator
import bayes_classfier
import generate_attires

def main():
    test_datasets = generate_attires.gen_attrs()
    classfier = bayes_classfier.navie_bayes_classifier()
    for data in test_datasets:
        print("Features:", end='\t')
        print(data)
        print("Prediction:", end='\t')
        res = classfier.get_label(*data)  # unpack the list into three arguments
        print(res)  # score of each fruit for this sample
        print('Fruit class:', end='\t')
        # Sort by score and print the label with the highest one
        print(sorted(res.items(), key=operator.itemgetter(1), reverse=True)[0][0])

if __name__ == '__main__':
    main()
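Sorting the whole dictionary only to read its first entry works, but `max` with a key function expresses the same selection more directly. A small sketch, using one score dictionary copied from the results as sample input (the variable names are illustrative):

```python
# Scores for the sample ['not_long', 'sweet', 'yellow'], copied from the results
res = {
    'banala': 0.24230769230769234,
    'orange': 0.5769230769230769,
    'other_fruit': 0.07211538461538461,
}

# Pick the key whose value is largest
best = max(res, key=res.get)
print(best)  # orange
```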
IV. Experimental Results
Features: ['not_long', 'not_sweet', 'not_yellow']
Prediction: {'banala': 0.08571428571428573, 'orange': 0.0, 'other_fruit': 0.5357142857142858}
Fruit class: other_fruit
Features: ['not_long', 'sweet', 'not_yellow']
Prediction: {'banala': 0.1076923076923077, 'orange': 0.0, 'other_fruit': 0.8653846153846153}
Fruit class: other_fruit
Features: ['not_long', 'sweet', 'yellow']
Prediction: {'banala': 0.24230769230769234, 'orange': 0.5769230769230769, 'other_fruit': 0.07211538461538461}
Fruit class: orange
Features: ['not_long', 'sweet', 'yellow']
Prediction: {'banala': 0.24230769230769234, 'orange': 0.5769230769230769, 'other_fruit': 0.07211538461538461}
Fruit class: orange
Features: ['not_long', 'not_sweet', 'not_yellow']
Prediction: {'banala': 0.08571428571428573, 'orange': 0.0, 'other_fruit': 0.5357142857142858}
Fruit class: other_fruit
Features: ['long', 'not_sweet', 'not_yellow']
Prediction: {'banala': 0.3428571428571429, 'orange': 0.0, 'other_fruit': 0.5357142857142858}
Fruit class: other_fruit
Features: ['long', 'not_sweet', 'not_yellow']
Prediction: {'banala': 0.3428571428571429, 'orange': 0.0, 'other_fruit': 0.5357142857142858}
Fruit class: other_fruit
Features: ['long', 'not_sweet', 'not_yellow']
Prediction: {'banala': 0.3428571428571429, 'orange': 0.0, 'other_fruit': 0.5357142857142858}
Fruit class: other_fruit
Features: ['long', 'not_sweet', 'not_yellow']
Prediction: {'banala': 0.3428571428571429, 'orange': 0.0, 'other_fruit': 0.5357142857142858}
Fruit class: other_fruit
Features: ['long', 'not_sweet', 'yellow']
Prediction: {'banala': 0.7714285714285716, 'orange': 0.0, 'other_fruit': 0.04464285714285715}
Fruit class: banala
Features: ['not_long', 'not_sweet', 'yellow']
Prediction: {'banala': 0.1928571428571429, 'orange': 1.0714285714285714, 'other_fruit': 0.04464285714285715}
Fruit class: orange
Features: ['not_long', 'not_sweet', 'yellow']
Prediction: {'banala': 0.1928571428571429, 'orange': 1.0714285714285714, 'other_fruit': 0.04464285714285715}
Fruit class: orange
Features: ['long', 'not_sweet', 'not_yellow']
Prediction: {'banala': 0.3428571428571429, 'orange': 0.0, 'other_fruit': 0.5357142857142858}
Fruit class: other_fruit
Features: ['not_long', 'not_sweet', 'yellow']
Prediction: {'banala': 0.1928571428571429, 'orange': 1.0714285714285714, 'other_fruit': 0.04464285714285715}
Fruit class: orange
Features: ['not_long', 'sweet', 'not_yellow']
Prediction: {'banala': 0.1076923076923077, 'orange': 0.0, 'other_fruit': 0.8653846153846153}
Fruit class: other_fruit
Features: ['long', 'not_sweet', 'yellow']
Prediction: {'banala': 0.7714285714285716, 'orange': 0.0, 'other_fruit': 0.04464285714285715}
Fruit class: banala
Features: ['not_long', 'sweet', 'yellow']
Prediction: {'banala': 0.24230769230769234, 'orange': 0.5769230769230769, 'other_fruit': 0.07211538461538461}
Fruit class: orange
Features: ['long', 'not_sweet', 'not_yellow']
Prediction: {'banala': 0.3428571428571429, 'orange': 0.0, 'other_fruit': 0.5357142857142858}
Fruit class: other_fruit
Features: ['not_long', 'not_sweet', 'yellow']
Prediction: {'banala': 0.1928571428571429, 'orange': 1.0714285714285714, 'other_fruit': 0.04464285714285715}
Fruit class: orange
Features: ['long', 'not_sweet', 'yellow']
Prediction: {'banala': 0.7714285714285716, 'orange': 0.0, 'other_fruit': 0.04464285714285715}
Fruit class: banala
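Some of the scores above exceed 1 (for example 1.0714 for orange), and the scores of a sample do not sum to 1, so they cannot be read directly as probabilities. A likely reason is that the score divides by the marginal probability of each feature value separately, which treats the features as independent overall and not only within a class. The ranking of the classes is unaffected, because every class of a sample is divided by the same number. One way to obtain values that sum to 1, sketched below under that assumption, is to rescale the scores of a sample by their total. `normalize` is a hypothetical helper, not part of the original code.

```python
def normalize(scores):
    # Rescale so the values sum to 1; the ordering of the classes is unchanged
    total = sum(scores.values())
    return {label: value / total for label, value in scores.items()}

# Scores for the sample ['not_long', 'not_sweet', 'yellow'], copied from the results
res = {
    'banala': 0.1928571428571429,
    'orange': 1.0714285714285714,
    'other_fruit': 0.04464285714285715,
}

posterior = normalize(res)
print(max(posterior, key=posterior.get))  # orange
```

After rescaling, orange still ranks first for this sample, with a value of roughly 0.82.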