1. Training dataset
30 m h g yes
40 c h j no
20 m l g no
25 c mi s no
20 m h h yes
20 c h j no
28 m mi s no
22 m mi x yes
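The training set should cover every case that can occur. The set above is incomplete; if you are interested, you can build a fuller training set of your own.
2. Test set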
['29','m','l','g']
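To make the point about incomplete training data concrete, here is a minimal sketch that checks whether every value in the test sample also occurs in the same column of the training set. The helper name `unseen_values` is hypothetical and is not part of the original script. The rows are copied from section 1, with the last column treated as the class label.

```python
# Training rows copied from section 1; the last column is the class label.
train = [
    ['30', 'm', 'h', 'g', 'yes'],
    ['40', 'c', 'h', 'j', 'no'],
    ['20', 'm', 'l', 'g', 'no'],
    ['25', 'c', 'mi', 's', 'no'],
    ['20', 'm', 'h', 'h', 'yes'],
    ['20', 'c', 'h', 'j', 'no'],
    ['28', 'm', 'mi', 's', 'no'],
    ['22', 'm', 'mi', 'x', 'yes'],
]
test_sample = ['29', 'm', 'l', 'g']


def unseen_values(train_rows, sample):
    """Hypothetical helper: list (column index, value) pairs from `sample`
    whose value never appears in that column of `train_rows`."""
    missing = []
    for i, value in enumerate(sample):
        seen = {row[i] for row in train_rows}  # distinct values in column i
        if value not in seen:
            missing.append((i, value))
    return missing


print(unseen_values(train, test_sample))  # → [(0, '29')]
```

The first column of the test sample, '29', never occurs in the training rows. If that column is treated as a categorical feature, a tree built from these rows has no branch for it, which is one way the training set above is incomplete.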
# coding: utf8
from math import log
import operator
import treePlotter
import copy
# Load the data: one sample per line, fields separated by tabs
def dataload(files):
    with open(files, 'r') as f:
        lines = f.readlines()
    dataset = [line.strip().split('\t') for line in lines]
    return dataset
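# Compute the entropy ("shang" is pinyin for entropy) over the last column, the class label
def shang(dataset):
    nums = len(dataset)
    featCounts = {}
    # Count how many samples fall into each class
    for va in dataset:
        if va[-1] not in featCounts.keys():
            featCounts[va[-1]] = 0
        featCounts[va[-1]] += 1
    e = 0.0
    for k in featCounts.keys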