Implementing the C4.5 Decision Tree Algorithm in Python (an Improvement on ID3)

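1. Overview
C4.5 is mainly an improvement on ID3. When choosing the attribute for a tree node, ID3 picks the attribute with the largest information gain. C4.5 introduces a new concept, the "information gain ratio", and picks the attribute with the largest information gain ratio as the tree node.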
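2. Information Gain
[Image: information gain formula]

The formula above computes the information gain (the split criterion used by ID3).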
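
The formula image is not available in this copy. As a sketch, the standard definitions of entropy and information gain, which match what `calcShannonEnt` and `chooseBestFeatureToSplit` compute in the code, can be written as follows. The symbols S (data set), A (attribute), S_v (rows where A takes value v) and p_k (fraction of rows in class k) are notation chosen here and are not taken from the original image.

```latex
H(S) = -\sum_{k} p_k \log_2 p_k

\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} \, H(S_v)
```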
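3. Information Gain Ratio
[Image: information gain ratio formula]
The information gain ratio is the information gain divided by the split information of the attribute (`splitInfo` in the code).
For example, the following formula computes the split information of the attribute "outlook":
[Image: split information formula for "outlook"]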
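
To make the division concrete, here is a minimal Python 3 sketch of the split information and the gain ratio. The helper names `split_info` and `gain_ratio`, the sample `outlook` column and the gain value 0.75 are all hypothetical and chosen for illustration; they are not the article's data.

```python
from collections import Counter
from math import log2


def split_info(values):
    """Split information of one attribute column: -sum(p * log2(p))."""
    total = len(values)
    counts = Counter(values)  # attribute value -> number of rows with it
    return -sum((n / total) * log2(n / total) for n in counts.values())


def gain_ratio(info_gain, values):
    """Information gain divided by split information (None if undefined)."""
    si = split_info(values)
    return info_gain / si if si > 0 else None


# Hypothetical "outlook" column with four rows (not the article's data set).
outlook = ["sunny", "sunny", "overcast", "rainy"]
print(split_info(outlook))        # 1.5
print(gain_ratio(0.75, outlook))  # 0.5
```

An attribute with many distinct values has a large split information, so its gain ratio shrinks. This is how C4.5 offsets the preference of plain information gain for many-valued attributes.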
4. Complete C4.5 Code

<code class="hljs python has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">from math import log
import operator

# Compute the Shannon entropy of a data set
def calcShannonEnt(dataSet):
    numEntries = len(dataSet)
    labelCounts = {}  # class label -> number of rows with that label
    for featVec in dataSet:
        currentLabel = featVec[-1]  # the class label is the last column
        if currentLabel not in labelCounts:  # first time this label is seen
            labelCounts[currentLabel] = 0
        labelCounts[currentLabel] += 1
    shannonEnt = 0.0
    for key in labelCounts:  # accumulate -p * log2(p) over all labels
        prob = float(labelCounts[key])/numEntries  # fraction of rows with this label
        shannonEnt -= prob * log(prob, 2)
    return shannonEnt  # entropy in bits

# Split the data set on a given feature
def splitDataSet(dataSet, axis, value):
    retDataSet = []
    for featVec in dataSet:  # keep the rows whose column `axis` equals `value`
        if featVec[axis] == value:  # copy the row without column `axis`
            reducedFeatVec = featVec[:axis]
            reducedFeatVec.extend(featVec[axis+1:])
            retDataSet.append(reducedFeatVec)
    return retDataSet  # the reduced subset

# Choose the best feature to split on (largest information gain ratio)
def chooseBestFeatureToSplit(dataSet):
    numFeatures = len(dataSet[0])-1  # number of attributes (the last column is the class)
    baseEntropy = calcShannonEnt(dataSet)
    bestInfoGain = 0.0; bestFeature = -1
    for i in range(numFeatures):  # evaluate every attribute
        featList = [example[i] for example in dataSet]
        uniqueVals = set(featList)  # distinct values of attribute i
        newEntropy = 0.0
        splitInfo = 0.0
        for value in uniqueVals:  # weight the entropy of each subset by its probability
            subDataSet = splitDataSet(dataSet, i, value)
            prob = len(subDataSet)/float(len(dataSet))  # fraction of rows with this value
            newEntropy += prob * calcShannonEnt(subDataSet)  # conditional entropy
            splitInfo -= prob * log(prob, 2)  # split information
        if splitInfo == 0:  # attribute takes a single value here; the ratio is undefined
            continue
        infoGain = (baseEntropy - newEntropy) / splitInfo  # gain ratio of attribute i
        print(infoGain)  # debug output
        if infoGain > bestInfoGain:  # remember the largest gain ratio and its column index
            bestInfoGain = infoGain
            bestFeature = i
    return bestFeature  # -1 if no attribute has a positive gain ratio

# Find the class label that occurs most often
def majorityCnt(classList):
    classCount = {}
    for vote in classList:
        if vote not in classCount: classCount[vote] = 0
        classCount[vote] += 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

# Build the tree
def createTree(dataSet, labels):
    classList = [example[-1] for example in dataSet]  # class labels of the rows, e.g. [N, N, Y, Y, Y, N, Y] at the top level
    if classList.count(classList[0]) == len(classList):  # all rows belong to one class: return that class
        return classList[0]
    if len(dataSet[0]) == 1:  # only the class column is left: return the majority class
        return majorityCnt(classList)

    bestFeat = chooseBestFeatureToSplit(dataSet)  # index of the attribute with the largest gain ratio
    if bestFeat == -1:  # no attribute has a positive gain ratio: return the majority class
        return majorityCnt(classList)
    bestFeatLabel = labels[bestFeat]  # attribute name used as the root of this subtree
    myTree = {bestFeatLabel:{}}  # empty tree rooted at bestFeatLabel
    del(labels[bestFeat])  # remove the chosen attribute (note: this mutates the caller's list)
    featValues = [example[bestFeat] for example in dataSet]  # values of the chosen attribute in every row
    uniqueVals = set(featValues)  # distinct values of the chosen attribute
    for value in uniqueVals:  # one branch per attribute value
        subLabels = labels[:]
        myTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet, bestFeat, value), subLabels)  # build each branch recursively
    return myTree  # the finished tree

# Classify a sample with the decision tree
def classify(inputTree, featLabels, testVec):
    firstStr = list(inputTree.keys())[0]  # attribute tested at this node
    secondDict = inputTree[firstStr]
    featIndex = featLabels.index(firstStr)
    classLabel = None  # stays None if the value was never seen during training
    for key in secondDict.keys():
        if testVec[featIndex] == key:
            if type(secondDict[key]).__name__ == 'dict':
                classLabel = classify(secondDict[key], featLabels, testVec)
            else: classLabel = secondDict[key]
    return classLabel

# Read the training data from the data file (as a list of lists)
def createTrainData():
    with open('../data/ID3/Dataset.txt') as f:
        lines_set = f.readlines()
    labelLine = lines_set[2]
    labels = labelLine.strip().split()
    lines_set = lines_set[4:11]
    dataSet = []
    for line in lines_set:
        data = line.split()
        dataSet.append(data)
    return dataSet, labels


# Read the test data from the data file (as a list of lists)
def createTestData():
    with open('../data/ID3/Dataset.txt') as f:
        lines_set = f.readlines()
    lines_set = lines_set[15:22]
    dataSet = []
    for line in lines_set:
        data = line.strip().split()
        dataSet.append(data)
    return dataSet

myDat, labels = createTrainData()
myTree = createTree(myDat, labels)
print(myTree)
bootList = ['outlook','temperature', 'humidity', 'windy']  # full attribute list; createTree removed entries from `labels`
testList = createTestData()
for testData in testList:
    dic = classify(myTree, bootList, testData)
    print(dic)</code>

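5. Code differences between C4.5 and ID3
[Figure: the C4.5 code lines that differ from ID3; image not available]
As the figure above shows, C4.5 differs from ID3 mainly at lines 52 and 53 of the code: ID3 computes the information gain, whereas C4.5 computes the information gain ratio.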
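The difference between the two criteria can be sketched in a few lines. This is a minimal illustration, not the code from the listing above: the names `entropy` and `gain_and_ratio` are hypothetical, and each row is assumed to be a list of feature values with the class label in the last position.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_and_ratio(rows, feature_index):
    """Return (information gain, gain ratio) for one feature.

    Assumption: each row is a list of feature values, class label last.
    """
    labels = [row[-1] for row in rows]
    total = len(rows)

    # Group the class labels by the value this feature takes.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[feature_index], []).append(row[-1])

    # Weighted entropy of the class labels after splitting on the feature.
    conditional = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - conditional  # the quantity ID3 maximizes

    # Split information: entropy of the feature's own value distribution.
    split_info = entropy([row[feature_index] for row in rows])
    if split_info == 0:
        # The feature takes a single value, so it cannot split the data.
        return info_gain, 0.0
    return info_gain, info_gain / split_info  # the quantity C4.5 maximizes
```

ID3 would pick the feature with the largest first element of the returned pair, while C4.5 would pick the one with the largest second element. Dividing by the split information penalizes features that take many distinct values.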
6. Sample training and test datasets

<code class="hljs livecodeserver has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">Training set:

    outlook    temperature    humidity    windy    class
    ---------------------------------------------------------
    sunny      hot            high        false    N
    sunny      hot            high        true     N
    overcast   hot            high        false    Y
    rain       mild           high        false    Y
    rain       cool           normal      false    Y
    rain       cool           normal      true     N
    overcast   cool           normal      true     Y

Test set:

    outlook    temperature    humidity    windy
    ---------------------------------------------------------
    sunny      mild           high        false
    sunny      cool           normal      false
    rain       mild           normal      false
    sunny      mild           normal      true
    overcast   mild           high        true
    overcast   hot            normal      false
    rain       mild           high        true</code>
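To pass the samples above to a tree builder, each row can be stored as a list of strings. The following is a minimal sketch that assumes whitespace-separated text with the class label as the last field. The names `TRAINING_TEXT`, `FEATURE_NAMES`, and `parse_rows` are illustrative and do not come from the original code.

```python
# Training samples copied from the table above; the last field is the class.
TRAINING_TEXT = """
sunny     hot   high    false  N
sunny     hot   high    true   N
overcast  hot   high    false  Y
rain      mild  high    false  Y
rain      cool  normal  false  Y
rain      cool  normal  true   N
overcast  cool  normal  true   Y
"""

# Column names for the four features, in table order.
FEATURE_NAMES = ["outlook", "temperature", "humidity", "windy"]


def parse_rows(text):
    """Split whitespace-separated text into one list of strings per row."""
    return [line.split() for line in text.strip().splitlines() if line.strip()]


train_rows = parse_rows(TRAINING_TEXT)
```

Test rows can be parsed the same way. They contain only the four feature values, since their class is what the tree has to predict.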