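Continuing with the basics~
This time I implemented the naive Bayes algorithm from Dr. Li Hang's "Statistical Learning Methods" (《统计学习方法》). I don't think the output of my code is very good: it only verifies the worked example from the book and cannot yet train or test on an arbitrary number of features. Each time I write a program like this I get a deeper understanding of Python's lists, arrays and similar structures, so it is also a continuing learning process. I hope that as I keep studying I can gradually shift towards an object-oriented way of thinking.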
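For reference, the estimates computed in the code appear to correspond to the book's Bayesian estimation with $\lambda = 1$ (Laplace smoothing). The notation here is an assumption for illustration: $N$ is the number of training samples, $K$ the number of distinct class labels, and $S_j$ the number of distinct values observed for the $j$-th feature.

```latex
P(Y = c_k) = \frac{\sum_{i=1}^{N} I(y_i = c_k) + 1}{N + K}

P(X^{(j)} = a_{jl} \mid Y = c_k) =
  \frac{\sum_{i=1}^{N} I(x_i^{(j)} = a_{jl},\, y_i = c_k) + 1}
       {\sum_{i=1}^{N} I(y_i = c_k) + S_j}

\hat{y} = \arg\max_{c_k} \; P(Y = c_k) \prod_{j} P(X^{(j)} = x^{(j)} \mid Y = c_k)
```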
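# -*- coding: utf-8 -*-
"""
Created on Fri Apr 14 12:13:37 2017
Completed on 2017-4-16
Reference: Li Hang, "Statistical Learning Methods" (《统计学习方法》); verified against the book's worked example
This program only handles training data sets with two features
@author: sgp
"""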
from collections import Counter


def naviebayes_1(y):
    '''
    Estimate the prior probability of each distinct value in y,
    using add-one (Laplace) smoothing.
    Output format: [['s', 0.5], ['y', 0.5]]
    i.e. P('s') = 0.5
    '''
    y_num = len(y)
    # most_common() returns a list of (value, count) tuples, e.g. [(1, 5), (2, 5), (3, 5)]
    y1 = Counter(y).most_common()
    # Convert each tuple to a list so the count can be overwritten in place
    y2 = [list(pair) for pair in y1]
    for i in range(len(y2)):
        # Smoothed estimate: (count + 1) / (N + number of distinct values)
        y2[i][1] = (y2[i][1] + 1) / (y_num + len(y2))
    return y2


def naviebayes_2(x, y):
    '''
    Compute conditional probabilities with add-one (Laplace) smoothing.
    Input: the training values of one feature x, and the class labels y.
    Output format (one inner list per class; values are for feature X1 of
    the book's example, rounded):
        [[[1, 1, 0.25], [2, 1, 0.3333], [3, 1, 0.4167]],
         [[1, -1, 0.4444], [2, -1, 0.3333], [3, -1, 0.2222]]]
    For example, [1, 1, 0.25] means that given y=1 the probability of x=1
    is 0.25, i.e. the conditional probability P(x=1|y=1) = 0.25.
    Each entry is ordered as [x value, y value, probability].
    '''
    x_num = len(x)
    x1 = naviebayes_1(x)  # used only to enumerate the distinct values of x
    x1_num = len(x1)
    # [label, count] pairs, in the same order as naviebayes_1(y)
    y2 = [list(pair) for pair in Counter(y).most_common()]
    z3 = []
    for i in range(len(y2)):
        z2 = []
        for j in range(x1_num):
            # Count the samples that have this x value and this class label
            n = 0
            for h in range(x_num):
                if y[h] == y2[i][0] and x[h] == x1[j][0]:
                    n = n + 1
            # Smoothed estimate: (n + 1) / (class count + number of distinct x values)
            z2.append([x1[j][0], y2[i][0], (n + 1) / (y2[i][1] + x1_num)])
        z3.append(z2)
    return z3


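def naviebayes_3(x, x1, x2, y):
    '''
    Input: the point x to classify, the two training feature columns x1 and x2,
    and the class labels y.
    Returns a tuple (max probability, class label).
    For x = [2, 'S'] on the book's example, with the add-one smoothing used
    here, this is approximately (0.0610, -1):
    0.0610 is the largest class score and -1 is the predicted class.
    '''
    z1 = naviebayes_2(x1, y)  # conditional probabilities of x1 given y
    z2 = naviebayes_2(x2, y)  # conditional probabilities of x2 given y
    y1 = naviebayes_1(y)      # prior probabilities of the class labels
    p = []       # one score per class
    labels = []  # class label matching each entry of p
    for i in range(len(y1)):
        for j in range(len(z1[i])):
            if z1[i][j][0] == x[0]:
                p1 = y1[i][1] * z1[i][j][2]
                for h in range(len(z2[i])):
                    if z2[i][h][0] == x[1]:
                        # Index z2 with h (the matched x2 value), not j
                        p.append(p1 * z2[i][h][2])
                        labels.append(z2[i][h][1])
    # Return the label that belongs to the highest score
    best = p.index(max(p))
    return p[best], labels[best]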
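
The functions above are tied to exactly two features. As a rough sketch of how the same add-one-smoothed estimates might be extended to any number of categorical features, the block below stores the counts in dictionaries keyed by (feature index, value, label). The names `train_naive_bayes`, `predict` and `lam` are hypothetical and not part of the original program, and the training data is assumed to match Table 4.1 of the book.

```python
from collections import Counter


def train_naive_bayes(X, y, lam=1.0):
    """Estimate smoothed priors and conditionals for categorical features.

    X is a list of samples (tuples of feature values), y the class labels,
    lam the smoothing constant (lam=1 gives Laplace smoothing).
    """
    n_samples = len(y)
    class_counts = Counter(y)
    n_classes = len(class_counts)
    n_features = len(X[0])

    # Distinct values observed for each feature
    feature_values = [set(row[j] for row in X) for j in range(n_features)]

    # counts[(j, value, label)] = samples with feature j == value and class == label
    counts = Counter()
    for row, label in zip(X, y):
        for j, value in enumerate(row):
            counts[(j, value, label)] += 1

    priors = {
        label: (c + lam) / (n_samples + n_classes * lam)
        for label, c in class_counts.items()
    }
    conditionals = {
        (j, value, label): (counts[(j, value, label)] + lam)
        / (class_counts[label] + len(feature_values[j]) * lam)
        for j in range(n_features)
        for value in feature_values[j]
        for label in class_counts
    }
    return priors, conditionals


def predict(x, priors, conditionals):
    """Return (best_label, scores); assumes every value in x was seen in training."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for j, value in enumerate(x):
            score *= conditionals[(j, value, label)]
        scores[label] = score
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Training data assumed to match Table 4.1 of the book
X1 = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
X2 = ["S", "M", "M", "S", "S", "S", "M", "M", "L", "L", "L", "M", "M", "L", "L"]
Y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]

priors, conditionals = train_naive_bayes(list(zip(X1, X2)), Y)
label, scores = predict((2, "S"), priors, conditionals)
print(label, scores)
```

Keeping the scores in a dictionary keyed by label avoids having to keep separate lists in the same order, and adding a third feature only means passing longer tuples in `X`.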