A Python Implementation of the ID3 Algorithm

This article shows how to implement the ID3 decision-tree algorithm in Python on the MNIST dataset. The dataset consists of binarized 28×28 images, giving 784 features and 10 classes; pre-pruning is used to simplify the model.

Dataset

  • Dataset: MNIST. Images are 28×28 with 10 classes; the raw pixels are used as features, so each sample has 28×28 = 784 features.
  • Every pixel value in each image is binarized.
  • Pruning is done with pre-pruning.
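The pre-pruning mentioned above does not appear in the code excerpt below. A common form of pre-pruning for ID3 is to stop splitting a node when the best achievable information gain falls below a small threshold; the sketch below illustrates that rule (the threshold value and function name are illustrative assumptions, not the author's exact code):

```python
# Hypothetical pre-pruning rule: stop splitting when information gain is tiny.
EPSILON = 0.1  # assumed threshold; the article does not state the actual value

def should_prune(best_info_gain, epsilon=EPSILON):
    """Return True if the node should become a leaf instead of splitting."""
    return best_info_gain < epsilon

print(should_prune(0.05))  # True: gain too small, make a leaf
print(should_prune(0.30))  # False: gain large enough, keep splitting
```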

Code

import cv2
import time
import logging
import numpy as np
import pandas as pd
from collections import Counter
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score


# Binarization
def binaryzation(img):
    for i in range(len(img)):
        cv_img = img[i].astype(np.uint8)
        # Map pixel values from 0-255 to 0/1 in place:
        # with THRESH_BINARY_INV, pixels above 50 become 0, the rest become 1
        cv2.threshold(cv_img, 50, 1, cv2.THRESH_BINARY_INV, cv_img)
        img[i] = cv_img
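The effect of that `cv2.threshold` call can be reproduced in pure Python, which makes the 0/1 mapping easier to see (a small sketch of the same rule, without the OpenCV dependency):

```python
# Pure-Python equivalent of cv2.threshold(..., 50, 1, cv2.THRESH_BINARY_INV):
# pixels strictly above the threshold become 0, all others become 1.
def binarize_row(pixels, thresh=50):
    return [0 if p > thresh else 1 for p in pixels]

print(binarize_row([0, 30, 51, 255]))  # [1, 1, 0, 0]
```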

# Tree node class
class Tree(object):
    def __init__(self, node_type, Class=None, feature=None):
        self.node_type = node_type  # 'internal' or 'leaf'
        self.Child = {}             # maps a feature value to a subtree
        self.Class = Class          # predicted class (leaf nodes)
        self.feature = feature      # index of the split feature (internal nodes)

    def add_tree(self, val, tree):
        self.Child[val] = tree

    def predict(self, features):
        if self.node_type == 'leaf':
            return self.Class
        # Internal node: descend into the child matching this sample's feature value
        tree = self.Child[features[self.feature]]
        return tree.predict(features)
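A minimal usage sketch of the `Tree` node class: build a two-level tree by hand and run `predict` on a few samples (the class is repeated here so the example runs on its own; the feature values and classes are made up for illustration):

```python
class Tree(object):
    def __init__(self, node_type, Class=None, feature=None):
        self.node_type = node_type
        self.Child = {}
        self.Class = Class
        self.feature = feature

    def add_tree(self, val, tree):
        self.Child[val] = tree

    def predict(self, features):
        if self.node_type == 'leaf':
            return self.Class
        return self.Child[features[self.feature]].predict(features)

# Internal node splitting on feature 0; each binarized value leads to a leaf.
root = Tree('internal', feature=0)
root.add_tree(0, Tree('leaf', Class=7))
root.add_tree(1, Tree('leaf', Class=3))

print(root.predict([0, 1, 1]))  # 7
print(root.predict([1, 0, 0]))  # 3
```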
ID3 is a classification algorithm that builds a decision tree by choosing, at each node, the feature with the highest information gain. A simple standalone Python implementation:

```python
import math

def calc_entropy(data):
    """Compute the entropy of a dataset (class label in the last column)."""
    size = len(data)
    classes = {}
    for item in data:
        label = item[-1]
        if label not in classes:
            classes[label] = 0
        classes[label] += 1
    entropy = 0.0
    for key in classes:
        prob = float(classes[key]) / size
        entropy -= prob * math.log(prob, 2)
    return entropy

def split_data(data, axis, value):
    """Return the subset where feature `axis` equals `value`, with that feature removed."""
    ret_data = []
    for item in data:
        if item[axis] == value:
            reduced_item = item[:axis]
            reduced_item.extend(item[axis+1:])
            ret_data.append(reduced_item)
    return ret_data

def choose_feature(data):
    """Select the feature with the highest information gain."""
    num_features = len(data[0]) - 1
    base_entropy = calc_entropy(data)
    best_info_gain = 0.0
    best_feature = -1
    for i in range(num_features):
        feat_list = [example[i] for example in data]
        unique_vals = set(feat_list)
        new_entropy = 0.0
        for value in unique_vals:
            sub_data = split_data(data, i, value)
            prob = len(sub_data) / float(len(data))
            new_entropy += prob * calc_entropy(sub_data)
        info_gain = base_entropy - new_entropy
        if info_gain > best_info_gain:
            best_info_gain = info_gain
            best_feature = i
    return best_feature

def create_tree(data, labels):
    """Recursively build the decision tree."""
    class_list = [example[-1] for example in data]
    # All samples share one class: return a leaf
    if class_list.count(class_list[0]) == len(class_list):
        return class_list[0]
    # No features left: return the majority class
    if len(data[0]) == 1:
        return max(set(class_list), key=class_list.count)
    best_feat = choose_feature(data)
    best_feat_label = labels[best_feat]
    my_tree = {best_feat_label: {}}
    del(labels[best_feat])
    feat_values = [example[best_feat] for example in data]
    unique_vals = set(feat_values)
    for value in unique_vals:
        sub_labels = labels[:]
        my_tree[best_feat_label][value] = create_tree(split_data(data, best_feat, value), sub_labels)
    return my_tree
```

In this implementation, `calc_entropy` computes the entropy of the dataset, `split_data` partitions it on a feature value, `choose_feature` selects the feature with the highest information gain, and `create_tree` builds the decision tree recursively.
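As a quick sanity check of the entropy formula behind `calc_entropy`: a dataset split evenly between two classes has entropy of exactly 1 bit. The counts below are illustrative, not taken from MNIST:

```python
import math

# Two classes with equal counts: H = -(0.5*log2(0.5) + 0.5*log2(0.5)) = 1.0
counts = {"A": 2, "B": 2}
size = sum(counts.values())
entropy = -sum((c / size) * math.log(c / size, 2) for c in counts.values())
print(entropy)  # 1.0
```

A pure dataset (one class only) gives entropy 0, which is exactly the first stopping condition in `create_tree`.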