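This article implements the C4.5 and ID3 algorithms in Python. C4.5 is the default. To use ID3 instead, change the return value at the end of the function entropy(): comment out the C4.5 line and enable the ID3 line.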
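To illustrate why a single return line is enough to switch algorithms, here is a minimal sketch of the two split criteria: ID3 ranks attributes by information gain, while C4.5 divides that gain by the split information (gain ratio). The helper names entropy_of, info_gain and gain_ratio are hypothetical and are not taken from c45.py, and the sketch assumes Python 3 for math.log2.

```python
import math
from collections import Counter


def entropy_of(items):
    """Shannon entropy (base 2) of a list of hashable items."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(items).values())


def info_gain(values, labels):
    """ID3 criterion: entropy reduction obtained by splitting on one attribute."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the class labels inside each branch.
    remainder = sum(len(g) / total * entropy_of(g) for g in groups.values())
    return entropy_of(labels) - remainder


def gain_ratio(values, labels):
    """C4.5 criterion: information gain divided by split information."""
    split_info = entropy_of(values)  # entropy of the attribute values themselves
    if split_info == 0:              # attribute has a single value: no useful split
        return 0.0
    return info_gain(values, labels) / split_info


labels = ["yes", "yes", "no", "no"]
two_valued = ["a", "a", "b", "b"]   # separates the classes with two branches
id_like = ["r1", "r2", "r3", "r4"]  # one branch per row, like a record ID
print(info_gain(two_valued, labels), gain_ratio(two_valued, labels))
print(info_gain(id_like, labels), gain_ratio(id_like, labels))
```

Both attributes have the same information gain on this toy data, but the gain ratio of the ID-like attribute is lower, which is the bias toward many-valued attributes that C4.5 corrects.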
Save the source code as a Python file named c45.py. The last command-line argument is the path to the data file and can be set freely. Run it as follows:
python c45.py data.txt
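As a rough sketch of how a script can pick up that last argument, the snippet below reads it from an argument list shaped like sys.argv. The function name data_path_from_argv and the usage message are assumptions for illustration; the actual handling in c45.py may differ.

```python
import sys


def data_path_from_argv(argv):
    """Return the last command-line argument, treated as the data file path."""
    if len(argv) < 2:
        # Only the script name was given, so there is no data path to use.
        raise SystemExit("usage: python c45.py <data file>")
    return argv[-1]


# In a script this would be called as data_path_from_argv(sys.argv).
print(data_path_from_argv(["c45.py", "data.txt"]))
```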
Special thanks:
The source code is as follows:
#!/usr/bin/python
# -*- coding: UTF-8 -*-
__author__ = 'Administrator'
######## C4.5 ID3 finished!! ######
# Timestamp: 2016-03-15 22:56:56
import re
import math
import sys
mini_size = 1  # minimum node size: a node this small is not split further, even if it still mixes classes
DataLength = 100  # length of the data items
used = [0] * DataLength  # whether each attribute has already been used
ended = [0] * DataLength  # whether each node will be split further
tp = [-1] * DataLength  # 1 - yes, 0 - no
class node:
    def __init__(self):
        self.value = ''
        self.father =