Implementing ID3 in Python (Information Gain)

林夕fingerstyle

Article link: https://blog.csdn.net/weixin_33940733/article/details/111988107

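Runtime environment

Python 3

treePlotter module (needed only for plotting; can be omitted if you do not draw the tree)

matplotlib (required if you use the treePlotter module)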
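Because treePlotter is only needed for drawing, one possible way to keep the script usable without it is to guard the import. This is a minimal sketch and not part of the original code; the `HAS_PLOTTER` flag is a hypothetical name introduced here.

```python
# Sketch: make the plotting dependency optional.
# HAS_PLOTTER is a hypothetical flag, not something the original script defines.
try:
    import treePlotter  # plotting helper listed in the environment section
    HAS_PLOTTER = True
except ImportError:
    # Also covers the case where treePlotter exists but matplotlib is missing.
    treePlotter = None
    HAS_PLOTTER = False

print("plotting available:", HAS_PLOTTER)
```

Any later plotting call would then be wrapped in `if HAS_PLOTTER:` so that tree building and testing still run without matplotlib.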

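Computation flow

st=>start: Start
e=>end
op1=>operation: Read the data
op2=>operation: Format the data
cond=>condition: Is the tree complete?
su=>subroutine: Build the tree recursively
op3=>operation: Choose the feature with the largest information gain as the decision node
op4=>operation: Test the decisions
op5=>operation: Split into subtrees under the decision node

st->op1->op2->cond
cond(no)->su->op5->op3->su
cond(yes)->op4->e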
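For reference, the decision step in the flow relies on the standard ID3 quantities. The symbols are notation introduced here rather than taken from the original: D is the data set at the current node, p_k is the fraction of rows in D with class k, A is a candidate feature, and D_v is the subset of D where A takes the value v.

```latex
H(D) = -\sum_{k} p_k \log_2 p_k

\mathrm{Gain}(D, A) = H(D) - \sum_{v \in \mathrm{Values}(A)} \frac{|D_v|}{|D|}\, H(D_v)
```

At each node, ID3 splits on the feature A with the largest Gain(D, A).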

Sample input

/* Dataset.txt */

Training set:

outlook   temperature  humidity  windy
---------------------------------------------------------
sunny     hot          high      false  N
sunny     hot          high      true   N
overcast  hot          high      false  Y
rain      mild         high      false  Y
rain      cool         normal    false  Y
rain      cool         normal    true   N
overcast  cool         normal    true   Y

Test set:

outlook   temperature  humidity  windy
---------------------------------------------------------
sunny     mild         high      false
sunny     cool         normal    false
rain      mild         normal    false
sunny     mild         normal    true
overcast  mild         high      true
overcast  hot          normal    false
rain      mild         high      true
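The "read the data" and "format the data" steps could look roughly like the sketch below. The article's own loading code is not shown, so `parse_section` and `TRAINING_TEXT` are hypothetical names, and the sketch assumes each section consists of a header line, a dashed separator, and whitespace-separated rows.

```python
# Sketch: turn one section of the sample file into a list of rows.
# parse_section and TRAINING_TEXT are hypothetical names used for illustration.
TRAINING_TEXT = """\
outlook temperature humidity windy
---------------------------------------------------------
sunny hot high false N
sunny hot high true N
overcast hot high false Y
rain mild high false Y
rain cool normal false Y
rain cool normal true N
overcast cool normal true Y
"""


def parse_section(text):
    """Return (feature_names, rows) for one section of the sample input."""
    lines = [line.split() for line in text.splitlines()]
    # Drop blank lines and the dashed separator line.
    lines = [tokens for tokens in lines if tokens and not tokens[0].startswith("-")]
    # The first remaining line is the header; the rest are data rows.
    feature_names, rows = lines[0], lines[1:]
    return feature_names, rows


labels, data_set = parse_section(TRAINING_TEXT)
print(labels)
print(data_set[0])
```

Each training row keeps its class label (Y or N) as the last element, which is the layout the functions in the listing expect.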

Code implementation

# -*- coding: utf-8 -*-
__author__ = 'Wsine'

from math import log
import operator
import treePlotter


def calcShannonEnt(dataSet):
    """
    Input: data set
    Output: Shannon entropy of the data set
    Description: compute the Shannon entropy of the given data set
    """
    numEntries = len(dataSet)
    labelCounts = {}
    for featVec in dataSet:
        currentLabel = featVec[-1]  # the class label is the last column
        if currentLabel not in labelCounts:
            labelCounts[currentLabel] = 0
        labelCounts[currentLabel] += 1
    shannonEnt = 0.0
    for key in labelCounts:
        prob = float(labelCounts[key]) / numEntries
        shannonEnt -= prob * log(prob, 2)
    return shannonEnt


def splitDataSet(dataSet, axis, value):
    """
    Input: data set, feature index (axis), feature value
    Output: the resulting subset
    Description: split the data set on the given feature; keep the rows
                 whose value at `axis` equals `value`, with that column removed
    """
    retDataSet = []
    for featVec in dataSet:
        if featVec[axis] == value:
            reduceFeatVec = featVec[:axis]
            reduceFeatVec.extend(featVec[axis+1:])
            retDataSet.append(reduceFeatVec)
    return retDataSet


def chooseBestFeatureToSplit(dataSet):
    """
    Input: data set
    Output: index of the best feature to split on
    Description: choose the best feature for splitting the data set
    """
    numFeatures = len(dataSet[0]) - 1  # the last column is the class label
    baseEntropy = calcShannonEnt(dataSet)
    bestInfoGain = 0.0
    bestFeature = -1
    for i in range(
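The listing breaks off inside the loop of chooseBestFeatureToSplit. As a hedged sketch only, and not the author's remaining code, the loop could be completed in the usual ID3 way: for each feature, compute the weighted entropy of the subsets produced by splitDataSet and keep the feature with the largest information gain. The sketch assumes the loop runs over `numFeatures`; `TRAINING_DATA` is the training set from the sample input typed in by hand, and the two helper functions are repeated in compact form so the block runs on its own.

```python
from math import log


def calcShannonEnt(dataSet):
    """Shannon entropy of the class labels (last column of each row)."""
    numEntries = len(dataSet)
    labelCounts = {}
    for featVec in dataSet:
        labelCounts[featVec[-1]] = labelCounts.get(featVec[-1], 0) + 1
    return -sum((n / numEntries) * log(n / numEntries, 2)
                for n in labelCounts.values())


def splitDataSet(dataSet, axis, value):
    """Rows whose column `axis` equals `value`, with that column removed."""
    return [row[:axis] + row[axis + 1:] for row in dataSet if row[axis] == value]


def chooseBestFeatureToSplit(dataSet):
    """Assumed completion: index of the feature with the largest information gain."""
    numFeatures = len(dataSet[0]) - 1
    baseEntropy = calcShannonEnt(dataSet)
    bestInfoGain = 0.0
    bestFeature = -1
    for i in range(numFeatures):  # assumption: loop over every feature column
        newEntropy = 0.0
        for value in set(row[i] for row in dataSet):
            subDataSet = splitDataSet(dataSet, i, value)
            prob = len(subDataSet) / len(dataSet)
            newEntropy += prob * calcShannonEnt(subDataSet)  # weighted entropy
        infoGain = baseEntropy - newEntropy
        if infoGain > bestInfoGain:
            bestInfoGain = infoGain
            bestFeature = i
    return bestFeature


# Training set from the sample input; the last column is the class label.
TRAINING_DATA = [
    ["sunny", "hot", "high", "false", "N"],
    ["sunny", "hot", "high", "true", "N"],
    ["overcast", "hot", "high", "false", "Y"],
    ["rain", "mild", "high", "false", "Y"],
    ["rain", "cool", "normal", "false", "Y"],
    ["rain", "cool", "normal", "true", "N"],
    ["overcast", "cool", "normal", "true", "Y"],
]

print(round(calcShannonEnt(TRAINING_DATA), 3))  # 0.985
print(chooseBestFeatureToSplit(TRAINING_DATA))  # 0, i.e. outlook
```

On this training set the gains work out to roughly 0.592 for outlook, 0.198 for temperature, 0.020 for humidity, and 0.128 for windy, so the first split is on outlook.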
