"""决策树模型:让计算机自动构建逐层的 if-else 模型

优点:
- 可解释性
- 可提取到重要特征

缺点:
- 分类简单,解决不了复杂的问题

信息熵 参考: https://blog.csdn.net/weixin_39826984/article/details/111269019
基尼系数 参考: https://blog.csdn.net/weixin_41855010/article/details/110312523
"""
import numpy as np
from collections import Counter
from icecream import ic
from functools import lru_cache
# Probability helper (the original "信息熵"/entropy label here was wrong:
# this builds a frequency-lookup closure, not an entropy value).
def pr(es):
    """Return a lookup function mapping an element to its relative frequency in *es*.

    *es* is any sized iterable (list, tuple, str).  Elements never seen in
    *es* map to 0.0, because ``Counter`` returns 0 for missing keys.

    NOTE: the former ``@lru_cache(maxsize=2**10)`` decorator was removed —
    caching keys on the argument requires it to be hashable, so every call
    with a list (as done throughout this module) raised
    ``TypeError: unhashable type: 'list'``.
    """
    counter = Counter(es)
    total = len(es)  # hoisted: computed once, not on every lookup

    def _wrap(e):
        return counter[e] / total

    return _wrap
def entropy(elements):
    """Shannon entropy (in nats) of the value distribution in *elements*.

    Computes ``-sum(p_i * ln(p_i))`` over the distinct values.  Accepts any
    sized iterable of hashable items.  Implemented directly on ``Counter``
    so that plain lists work (the previous version routed through an
    ``lru_cache``-wrapped helper that raised ``TypeError`` on list input).
    """
    total = len(elements)
    counts = Counter(elements)
    # Every count is >= 1, so p > 0 and log(p) is always defined.
    probs = np.fromiter((c / total for c in counts.values()), dtype=float)
    return -np.sum(probs * np.log(probs))
# Gini impurity
def gini(elements):
    """Gini impurity ``1 - sum(p_i ** 2)`` over distinct values in *elements*.

    0.0 means the sample is pure (one distinct value); higher means more
    mixed.  Implemented directly on ``Counter`` so plain lists work (the
    previous version went through an ``lru_cache``-wrapped helper that
    raised ``TypeError`` on list input).
    """
    total = len(elements)
    counts = Counter(elements)
    probs = np.fromiter((c / total for c in counts.values()), dtype=float)
    return 1 - np.sum(probs ** 2)
# Impurity measure used by the demo below; swap in `entropy` to compare.
pure_func = gini

if __name__ == "__main__":
    # Demo: purer samples (fewer distinct values) should score lower.
    # Guarded so importing this module no longer triggers the printouts.
    ic(pure_func([1, 1, 1, 1, 1, 0]))
    ic(pure_func([1, 1, 1, 1, 1, 1]))
    ic(pure_func([1, 2, 3, 4, 5, 8]))
    ic(pure_func([1, 2, 3, 4, 5, 9]))
    ic(pure_func(['a', 'b', 'c', 'c', 'c', 'c', 'c']))
    ic(pure_func(['a', 'b', 'c', 'c', 'c', 'c', 'd']))