Decision tree algorithms: choosing the next split feature by information gain (Python code, algorithm interview)

格雷拉-皮奇

Last modified 2022-09-01 17:22:42

First published 2022-09-01 17:09:24

Article link: https://blog.csdn.net/weixin_43897187/article/details/126647368

Python implementation: selecting the sample classification feature based on information gain
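For reference, the quantities this selection relies on can be written out as below. This is a sketch of the standard textbook definitions; the symbols $D$ (data set), $A$ (a feature), $D_v$ (the subset of $D$ where $A$ takes value $v$) and $p_k$ (the fraction of samples with label $k$) are notation assumed here, not taken from the original post. The natural logarithm is used to match `math.log` in the code.

```latex
% Empirical entropy of the labels in D
H(D) = -\sum_{k} p_k \ln p_k

% Conditional entropy of the labels given feature A
H(D \mid A) = \sum_{v} \frac{|D_v|}{|D|} \, H(D_v)

% Information gain of feature A; the feature with the largest gain is chosen
g(D, A) = H(D) - H(D \mid A)
```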


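import math
from collections import defaultdict


def ent(dataSet):
    # Empirical entropy of the class labels (last column of each row), natural log.
    N = len(dataSet)
    n = defaultdict(int)  # label -> number of samples with that label
    entropy = 0
    for i in range(N):
        n[dataSet[i][-1]] += 1
    for label, num in n.items():
        p = num / N
        entropy -= p * math.log(p)
    return entropy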

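def entCondition(dataSet, feature):
    # Conditional entropy of the labels given the feature at column index `feature`.
    n = defaultdict(list)  # feature value -> rows having that value
    N = len(dataSet)
    cond_ent = 0
    for i in range(N):
        n[dataSet[i][feature]].append(dataSet[i])
    for _, data in n.items():
        q = len(data) / N  # fraction of samples in this subset
        cond_ent += q * ent(data)
    return cond_ent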

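# Each row is [feature_0, feature_1, feature_2, label].
dataSet = [[1, 1, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]
feature_num = len(dataSet[0]) - 1
best_feature = 0
best_ent_increase = 0
last_ent = ent(dataSet)  # entropy before splitting
for i in range(feature_num):
    i_entCondition = entCondition(dataSet, i)
    ent_increase = last_ent - i_entCondition  # information gain of feature i
    print(ent_increase)
    if ent_increase > best_ent_increase:
        best_ent_increase = ent_increase
        best_feature = i
print(best_feature)  # index of the feature with the largest information gain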
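
The same selection step can also be packaged as a reusable function, which is often easier to discuss in an interview. The following is only a minimal sketch under the same assumptions as the script above (each row is a list whose last element is the label, natural-log entropy); the names `entropy`, `conditional_entropy` and `choose_best_feature` are hypothetical and do not appear in the original post.

```python
import math
from collections import Counter, defaultdict


def entropy(rows):
    """Shannon entropy (natural log) of the labels in the last column."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def conditional_entropy(rows, feature):
    """Entropy of the labels after grouping rows by one feature column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row)
    total = len(rows)
    return sum(len(g) / total * entropy(g) for g in groups.values())


def choose_best_feature(rows):
    """Return (feature index, information gain) of the highest-gain feature."""
    base = entropy(rows)
    gains = [base - conditional_entropy(rows, f) for f in range(len(rows[0]) - 1)]
    # On ties, max() keeps the first index, like the strict ">" in the script.
    best = max(range(len(gains)), key=gains.__getitem__)
    return best, gains[best]


sample = [[1, 1, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]
best, gain = choose_best_feature(sample)
print(best, round(gain, 4))  # feature 2 separates the labels exactly
```

On the sample data, the third feature (index 2) matches the label in every row, so its conditional entropy is zero and its gain equals the full entropy of the labels, ln 2.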