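educoder Machine Learning, Experiment 4: Implementing a Decision Tree with Information-Gain-Based Split Selection (task description: write a program that uses a decision tree algorithm to compute information gain and split nodes) - CSDN Blog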

Article link: https://blog.csdn.net/m0_64351669/article/details/127587741

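This post walks through the code and the solution process for the educoder Machine Learning exercise "Experiment 4: Implementing a Decision Tree with Information-Gain-Based Split Selection". Writing it took real effort, so likes and bookmarks are much appreciated!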

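1. Import packages and load the data

2. Compute the information entropy

3. Split the dataset

4. Compute the entropy, conditional entropy, and information gain, and keep the best attribute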
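For reference, the quantities named in step 4 follow the standard definitions below. The notation ($D$ for the training set, $p_k$ for the share of samples in class $k$, $D^v$ for the subset where attribute $a$ takes its $v$-th value $a^v$) is introduced here for illustration and is not taken from the original post.

```latex
\operatorname{Ent}(D) = -\sum_{k=1}^{|\mathcal{Y}|} p_k \log_2 p_k

\operatorname{Ent}(D \mid a) = \sum_{v=1}^{V} \frac{|D^v|}{|D|}\,\operatorname{Ent}(D^v)

\operatorname{Gain}(D, a) = \operatorname{Ent}(D) - \operatorname{Ent}(D \mid a)

a_* = \arg\max_{a} \operatorname{Gain}(D, a)
```

The attribute $a_*$ with the largest information gain is the one kept as the best split at the current node.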

1. Import packages and load the data

from math import log
import operator


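def loaddata():
    # Each row is one sample: six integer-coded discrete attributes (a1..a6)
    # followed by the class label ('yes' / 'no').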
    dataSet = [[0, 0, 0, 0, 0, 0, 'yes'],
               [1, 0, 1, 0, 0, 0, 'yes'],
               [1, 0, 0, 0, 0, 0, 'yes'],
               [0, 0, 1, 0, 0, 0, 'yes'],
               [2, 0, 0, 0, 0, 0, 'yes'],
               [0, 1, 0, 0, 1, 1, 'yes'],
               [1, 1, 0, 1, 1, 1, 'yes'],
               [1, 1, 0, 0, 1, 0, 'yes'],
               [1, 1, 1, 1, 1, 0, 'no'],
               [0, 2, 2, 0, 2, 1, 'no'],
               [2, 2, 2, 2, 2, 0, 'no'],
               [2, 0, 0, 2, 2, 1, 'no'],
               [0, 1, 0, 1, 0, 0, 'no'],
               [2, 1, 1, 1, 0, 0, 'no'],
               [1, 1, 0, 0, 1, 1, 'no'],
               [2, 0, 0, 2, 2, 0, 'no'],
               [0, 0, 1, 1, 1, 0, 'no']]
    feature_name = ['a1', 'a2', 'a3', 'a4', 'a5', 'a6']
    return dataSet, feature_name
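Before writing the entropy function, it can help to check the class balance that $\operatorname{Ent}(D)$ will be computed from, and how many distinct values each attribute takes (that is, how many branches a split on it would create). The following is a minimal sketch using `collections.Counter`; `DATA_SET`, `FEATURE_NAMES` and `summarize` are hypothetical names introduced here, and the rows are copied from `loaddata()` so the block runs on its own.

```python
from collections import Counter

# Copy of the 17-sample training set returned by loaddata(): six integer-coded
# discrete attributes (a1..a6) followed by the class label.
DATA_SET = [[0, 0, 0, 0, 0, 0, 'yes'],
            [1, 0, 1, 0, 0, 0, 'yes'],
            [1, 0, 0, 0, 0, 0, 'yes'],
            [0, 0, 1, 0, 0, 0, 'yes'],
            [2, 0, 0, 0, 0, 0, 'yes'],
            [0, 1, 0, 0, 1, 1, 'yes'],
            [1, 1, 0, 1, 1, 1, 'yes'],
            [1, 1, 0, 0, 1, 0, 'yes'],
            [1, 1, 1, 1, 1, 0, 'no'],
            [0, 2, 2, 0, 2, 1, 'no'],
            [2, 2, 2, 2, 2, 0, 'no'],
            [2, 0, 0, 2, 2, 1, 'no'],
            [0, 1, 0, 1, 0, 0, 'no'],
            [2, 1, 1, 1, 0, 0, 'no'],
            [1, 1, 0, 0, 1, 1, 'no'],
            [2, 0, 0, 2, 2, 0, 'no'],
            [0, 0, 1, 1, 1, 0, 'no']]
FEATURE_NAMES = ['a1', 'a2', 'a3', 'a4', 'a5', 'a6']


def summarize(data_set, feature_names):
    """Return the class counts and the sorted distinct values of each attribute."""
    # The class label is always the last element of a row.
    class_counts = Counter(row[-1] for row in data_set)
    # Column i of every row holds the value of attribute feature_names[i].
    value_sets = {name: sorted({row[i] for row in data_set})
                  for i, name in enumerate(feature_names)}
    return class_counts, value_sets


if __name__ == "__main__":
    counts, values = summarize(DATA_SET, FEATURE_NAMES)
    print(counts)  # Counter({'no': 9, 'yes': 8})
    print(values)
```

With 8 `yes` and 9 `no` samples the classes are nearly balanced, so $\operatorname{Ent}(D)$ should come out close to 1 bit; attributes a1 to a5 take three values and a6 takes two.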

2. Compute the information entropy
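The exercise code that follows leaves the actual summation blank, marked by the comment where the calculation is to be filled in. For reference, here is a minimal sketch of the complete base-2 Shannon entropy $\operatorname{Ent}(D) = -\sum_k p_k \log_2 p_k$, assuming (as in `loaddata()`) that the class label is the last element of each row. `entropy_sketch` is a hypothetical name chosen so it does not clash with the exercise's `entropy`, and it is not necessarily the author's solution.

```python
from math import log


def entropy_sketch(data_set):
    """Base-2 Shannon entropy of the class labels stored in the last column."""
    m = len(data_set)
    # Count how many samples fall into each class.
    label_counts = {}
    for row in data_set:
        label = row[-1]
        label_counts[label] = label_counts.get(label, 0) + 1
    e = 0.0
    for count in label_counts.values():
        p = count / m        # proportion p_k of class k (never 0 here)
        e -= p * log(p, 2)   # accumulate -p_k * log2(p_k)
    return e


if __name__ == "__main__":
    # Only the last column matters, so one-element rows stand in for the
    # 8 'yes' and 9 'no' samples of the dataset loaded in step 1.
    labels_only = [['yes']] * 8 + [['no']] * 9
    print(round(entropy_sketch(labels_only), 3))  # 0.998
```

A node containing a single class gives 0.0, and two equally frequent classes give 1.0, which are handy sanity checks for whatever goes into the exercise stub.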

def entropy(dataSet):
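    # Number of samples in the dataset
    m = len(dataSet)
    # Map each class label to the number of samples that belong to it
    labelCounts = {}
    for featVec in dataSet:
        currentLabel = featVec[-1]
        if currentLabel not in labelCounts:
            labelCounts[currentLabel] = 0
        labelCounts[currentLabel] += 1
    # Accumulates the entropy value
    e = 0.0
    # Fill in the calculation here (this part is the exercise)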