Machine Learning in Action: A First Look at the Decision Tree (ID3) Algorithm, Understanding Its Python Code (Part 2)


Article link: https://blog.csdn.net/qq_36396104/article/details/76944667


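Building a decision tree recursively in Python:

Python basics:
The count() method:
str.count() returns the number of times a substring occurs in a string; its optional arguments are the start and end positions of the search. Lists have a count() method as well: list.count(x) returns how many elements of the list equal x, and this list version is the one createTree uses below.
The del statement:
createTree also uses del to remove an element from a list. Example: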

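>>> a = [-1, 3, 'aa', 85]  # define a list
>>> a
[-1, 3, 'aa', 85]
>>> del a[0]  # delete the element at index 0
>>> a
[3, 'aa', 85]
>>> del a[2:4]  # delete indices 2 up to (not including) 4; the list has only 3 elements left, so just index 2 is removed
>>> a
[3, 'aa']
>>> del a  # delete the name a itself, not just the list's contents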
>>> a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined
>>>
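The count() method described above is not shown in the del example, so here is a minimal sketch of both the string and the list form. The sample values are made up for illustration; the last two checks mirror the stopping test in createTree, where a branch is pure when the first label accounts for every entry.

```python
# Illustrative values only; they are not taken from the article's data set.
text = "banana"
letter_count = text.count("a")          # occurrences of a substring in a string
bounded_count = text.count("a", 2, 5)   # search only within text[2:5], i.e. "nan"

class_list = ["yes", "yes", "no"]
yes_count = class_list.count("yes")     # occurrences of a value in a list

# Same purity test as in createTree: do all labels equal the first one?
pure_list = ["no", "no"]
pure_is_pure = pure_list.count(pure_list[0]) == len(pure_list)
mixed_is_pure = class_list.count(class_list[0]) == len(class_list)

print(letter_count, bounded_count, yes_count, pure_is_pure, mixed_is_pure)
```

Note that list.count() takes no start or end arguments; those are available only on the string version.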

Building the first basic decision tree:
(1) Code that produces the decision tree (represented as a nested dictionary):

import operator

# chooseBestFeatureToSplit and splitDataSet are not defined in this part;
# they are expected to come from the previous part and live in the same module.

def majorityCnt(classList):  # return the class name that occurs most often (majority vote)
    classCount = {}
    for vote in classList:
        if vote not in classCount:
            classCount[vote] = 0
        classCount[vote] += 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

def createTree(dataSet, labels):  # build the decision tree
    classList = [example[-1] for example in dataSet]  # class labels of all samples (list comprehensions: see the previous part)
    if classList.count(classList[0]) == len(classList):  # list.count(): every sample has the same class, so stop splitting
        return classList[0]
    if len(dataSet[0]) == 1:  # no features left, only the class column: fall back to a majority vote
        return majorityCnt(classList)
    bestFeat = chooseBestFeatureToSplit(dataSet)  # index of the best feature to split on
    bestFeatLabel = labels[bestFeat]
    # Nested dict keyed by the chosen feature. Each myTree[bestFeatLabel][value] holds either a subtree
    # (the dict returned by the recursive call) or a leaf (classList[0] or majorityCnt(classList) from the two ifs above).
    myTree = {bestFeatLabel: {}}
    del(labels[bestFeat])  # remove the label of the feature just used; note that this mutates the caller's list
    featValues = [example[bestFeat] for example in dataSet]
    uniqueVals = set(featValues)  # distinct values of the chosen feature
    for value in uniqueVals:
        subLabels = labels[:]  # lists are passed by reference, so each recursive call gets its own copy
        myTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet, bestFeat, value), subLabels)
    return myTree

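# Test code:
# CreateDataSet.py
def createDataSet():
    dataSet = [[1, 1, 0, 'maybe'],
               [1, 1, 0, 'yes'],
               [1, 1, 1, 'yes'],
               [1, 0, 1, 'maybe'],
               [0, 1, 0, 'no'],
               [0, 1, 0, 'no']]
    labels = ['no surfacing', 'flippers', 'maybe']  # feature names; the third feature happens to share its name with the class 'maybe'
    return dataSet, labels

# Test script (createTree above is assumed to be saved in trees.py)
import CreateDataSet
import trees

myDat, labels = CreateDataSet.createDataSet()
myTree = trees.createTree(myDat, labels)
print(myTree)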

# Result: {'no surfacing': {0: 'no', 1: {'flippers': {0: 'maybe', 1: {'maybe': {0: 'maybe', 1: 'yes'}}}}}}
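Because createTree relies on two helpers from the previous part, the listing above cannot be run on its own. The following is a self-contained sketch of the whole pipeline under stated assumptions: calc_entropy, split_data_set and choose_best_feature are stand-ins written here for illustration (plain ID3 information gain), not the author's original helpers, and all snake_case names are hypothetical. Unlike the listing above, this sketch copies the label list instead of using del, and it falls back to feature 0 when no split improves the entropy.

```python
import math
import operator

def calc_entropy(data_set):
    """Shannon entropy of the class column (last element of each row)."""
    counts = {}
    for row in data_set:
        counts[row[-1]] = counts.get(row[-1], 0) + 1
    total = len(data_set)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def split_data_set(data_set, axis, value):
    """Rows whose feature `axis` equals `value`, with that column removed."""
    return [row[:axis] + row[axis + 1:] for row in data_set if row[axis] == value]

def choose_best_feature(data_set):
    """Index of the feature with the largest information gain (assumed ID3 criterion)."""
    base_entropy = calc_entropy(data_set)
    best_gain, best_feature = 0.0, 0  # assumption: default to feature 0 if nothing helps
    for i in range(len(data_set[0]) - 1):
        new_entropy = 0.0
        for value in set(row[i] for row in data_set):
            subset = split_data_set(data_set, i, value)
            new_entropy += len(subset) / len(data_set) * calc_entropy(subset)
        gain = base_entropy - new_entropy
        if gain > best_gain:
            best_gain, best_feature = gain, i
    return best_feature

def majority_count(class_list):
    """Most frequent class; ties go to the class seen first."""
    counts = {}
    for vote in class_list:
        counts[vote] = counts.get(vote, 0) + 1
    return sorted(counts.items(), key=operator.itemgetter(1), reverse=True)[0][0]

def create_tree(data_set, labels):
    class_list = [row[-1] for row in data_set]
    if class_list.count(class_list[0]) == len(class_list):  # pure branch
        return class_list[0]
    if len(data_set[0]) == 1:  # only the class column is left
        return majority_count(class_list)
    best = choose_best_feature(data_set)
    best_label = labels[best]
    remaining = labels[:best] + labels[best + 1:]  # copy, so the caller's list is untouched
    tree = {best_label: {}}
    for value in set(row[best] for row in data_set):
        subset = split_data_set(data_set, best, value)
        tree[best_label][value] = create_tree(subset, remaining)
    return tree

data_set = [[1, 1, 0, 'maybe'], [1, 1, 0, 'yes'], [1, 1, 1, 'yes'],
            [1, 0, 1, 'maybe'], [0, 1, 0, 'no'], [0, 1, 0, 'no']]
labels = ['no surfacing', 'flippers', 'maybe']
tree = create_tree(data_set, labels)
print(tree)
```

On the article's data set this sketch reproduces the dictionary shown in the result line, and labels still holds all three feature names afterwards, which matters when the same list is later passed to classify.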

(2) Code for plotting the tree (it simply draws the dictionary above, so it is omitted here to save space):
Some problems you may run into along the way are mostly caused by differences between Python 2.x and 3.x. For example, this line fails under Python 3:
firstStr = myTree.keys()[0]
# In Python 2.x, d.keys() returned a list. In Python 3.x it returns a dict_keys view object, which behaves a lot more like a set than a list.
# As such, it can't be indexed, and the line above raises a TypeError.
# The solution is to convert it first: list(d.keys())[0] (or simply list(d)[0]).
A CSDN post (in Chinese) discusses the same line: firstStr = myTree.keys()[0]
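The Python 2 versus Python 3 difference described above can be checked with a short sketch. The tiny tree literal is illustrative only, and next(iter(...)) is offered as an alternative that is not part of the original article.

```python
tree = {'no surfacing': {0: 'no', 1: 'yes'}}  # small illustrative tree

try:
    first = tree.keys()[0]            # Python 2 idiom
    error_name = None
except TypeError as exc:              # Python 3: dict_keys cannot be indexed
    error_name = type(exc).__name__

first_via_list = list(tree.keys())[0]  # the fix described above
first_via_iter = next(iter(tree))      # same key without building a list

print(error_name, first_via_list, first_via_iter)
```

Both fixes return the root key because a tree node in this representation is a dictionary with exactly one key.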

(3) Testing the algorithm: classifying with the decision tree:

def classify(inputTree, featLabels, testVec):
    firstStr = list(inputTree.keys())[0]  # feature tested at this node, e.g. 'no surfacing'
    secondDict = inputTree[firstStr]  # children of this node, a dict, e.g. {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}
    featIndex = featLabels.index(firstStr)  # position of firstStr in featLabels, used to pick the matching value out of testVec
    for key in secondDict.keys():  # follow the branch that matches the value in testVec
        if testVec[featIndex] == key:
            if isinstance(secondDict[key], dict):  # a subtree: keep descending
                classLabel = classify(secondDict[key], featLabels, testVec)
            else:  # a leaf: this is the class label
                classLabel = secondDict[key]
    return classLabel  # raises UnboundLocalError if testVec holds a value the tree has no branch for
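A short usage sketch may help here. It applies a classify function with the same logic to the tree printed in part (1). The snake_case names and the test vectors are chosen for illustration. Since createTree deletes entries from the labels list it receives, the sketch passes a fresh, complete list of feature names.

```python
def classify(input_tree, feat_labels, test_vec):
    """Walk the nested-dict tree until a leaf (a non-dict value) is reached."""
    first_str = list(input_tree.keys())[0]      # feature tested at this node
    second_dict = input_tree[first_str]         # branches keyed by feature value
    feat_index = feat_labels.index(first_str)   # column of that feature in test_vec
    for key in second_dict:
        if test_vec[feat_index] == key:
            if isinstance(second_dict[key], dict):
                class_label = classify(second_dict[key], feat_labels, test_vec)
            else:
                class_label = second_dict[key]
    return class_label

# Tree copied from the result shown in part (1).
my_tree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'maybe',
           1: {'maybe': {0: 'maybe', 1: 'yes'}}}}}}
# Full label list; do not reuse the list that createTree has already shortened.
feat_labels = ['no surfacing', 'flippers', 'maybe']

results = [classify(my_tree, feat_labels, vec)
           for vec in ([0, 1, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1])]
print(results)
```

Each test vector lists the feature values in the same order as feat_labels, which is why the index lookup inside classify is needed.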

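(4) Storing and loading the decision tree:
The main problem here comes from pickle:
Pickle files are binary data files, so the file must be opened in 'wb' mode for writing and 'rb' mode for reading, not in text mode.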

import pickle

def storeTree(inputTree, filename):
    # Pickle files are binary data files, so always open the file in 'wb' mode when writing. Don't use a text mode here.
    with open(filename, 'wb') as fw:  # the with block closes the file even if dump fails
        pickle.dump(inputTree, fw)

def grabTree(filename):
    # Pickle files are binary data files, so always open the file in 'rb' mode when loading. Don't use a text mode here.
    with open(filename, 'rb') as fr:  # the original version never closed this file
        return pickle.load(fr)
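To confirm that a stored tree comes back unchanged, a round trip can be sketched as follows. The file name and the use of a temporary directory are assumptions made for the example, and the function names are hypothetical snake_case counterparts of storeTree and grabTree.

```python
import os
import pickle
import tempfile

def store_tree(input_tree, filename):
    with open(filename, 'wb') as fw:   # binary mode is required for pickle
        pickle.dump(input_tree, fw)

def grab_tree(filename):
    with open(filename, 'rb') as fr:   # binary mode again when loading
        return pickle.load(fr)

my_tree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'maybe',
           1: {'maybe': {0: 'maybe', 1: 'yes'}}}}}}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'tree.pkl')   # hypothetical file name
    store_tree(my_tree, path)
    restored = grab_tree(path)

print(restored == my_tree)
```

Only load pickle files that you created yourself or otherwise trust, since unpickling can execute arbitrary code.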
