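1. Overview
C4.5 is essentially an improvement on ID3. When choosing the attribute for each tree node, ID3 picks the attribute with the largest information gain. C4.5 introduces a new measure, the "information gain ratio", and picks the attribute with the largest gain ratio instead.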
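The two criteria can pick different attributes. The sketch below uses made-up data (the attribute names `id` and `x` and all values are hypothetical). `id` is unique per row, so it splits the data into pure one-row subsets and gets the maximal information gain. However, its large split information pulls its gain ratio below that of the less fragmenting attribute `x`.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (base 2) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def info_gain_and_ratio(column, labels):
    """Return (information gain, gain ratio) of one attribute column."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(column)  # entropy of the attribute's own value distribution
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


# Hypothetical toy data: 6 rows, class label plus two attributes
labels = ["N", "N", "Y", "Y", "Y", "Y"]
attrs = {
    "id": ["a", "b", "c", "d", "e", "f"],  # unique per row
    "x": [0, 0, 0, 1, 1, 1],
}

scores = {name: info_gain_and_ratio(col, labels) for name, col in attrs.items()}
id3_choice = max(scores, key=lambda a: scores[a][0])  # largest information gain
c45_choice = max(scores, key=lambda a: scores[a][1])  # largest gain ratio
print(id3_choice, c45_choice)
```

Here ID3 would choose `id` (gain about 0.918 vs 0.459), while C4.5 chooses `x` (ratio about 0.459 vs 0.355).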
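2. Information Gain
Information gain is the attribute-selection criterion used by ID3. Let $D$ be a data set whose classes occur with proportions $p_1, \dots, p_m$. Let attribute $A$ split $D$ into subsets $D_1, \dots, D_v$, one for each of its $v$ values. Then:

$$\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i, \qquad \mathrm{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\,\mathrm{Info}(D_j), \qquad \mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D).$$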
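A minimal sketch of these two formulas follows. The class list is the example from the code comments further down (`[N, N, Y, Y, Y, N, Y]`). The attribute column paired with it is a hypothetical assumption chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Info(D) = -sum p_i * log2(p_i) over the class proportions p_i."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Gain(A) = Info(D) - sum_j |D_j|/|D| * Info(D_j), grouping rows by A's value."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


labels = ["N", "N", "Y", "Y", "Y", "N", "Y"]  # class list from the code comment
attr = ["sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast"]  # hypothetical values

h = entropy(labels)
gain = information_gain(attr, labels)
print(round(h, 4), round(gain, 4))
```

With 3 `N` and 4 `Y`, the entropy comes out at roughly 0.985. This hypothetical column leaves only the `rain` subset impure, so the gain is roughly 0.592.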
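3. Information Gain Ratio
The information gain ratio is the information gain divided by the split information $\mathrm{SplitInfo}_A(D)$. The split information is the entropy of attribute $A$'s own value distribution, where $D_1, \dots, D_v$ are the subsets of $D$ produced by the $v$ values of $A$. It grows with the number of distinct values, so dividing by it penalizes attributes that fragment the data into many small subsets:

$$\mathrm{SplitInfo}_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|}\log_2\frac{|D_j|}{|D|}, \qquad \mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}_A(D)}.$$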
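A small sketch of the ratio follows. The function names are hypothetical. It also shows a corner case that the formula leaves open: an attribute with a single value in $D$ has $\mathrm{SplitInfo}_A(D) = 0$, so the ratio is undefined. Here that case returns `None` rather than dividing by zero.

```python
from collections import Counter
from math import log2


def split_info(column):
    """SplitInfo_A(D): entropy of attribute A's own value distribution."""
    n = len(column)
    return -sum((c / n) * log2(c / n) for c in Counter(column).values())


def gain_ratio(gain, column):
    """GainRatio(A) = Gain(A) / SplitInfo_A(D); None when SplitInfo is 0."""
    si = split_info(column)
    if si == 0:
        return None  # single-valued attribute: it cannot split the data
    return gain / si


print(split_info(["a", "b", "c", "d"]))       # four equally likely values
print(gain_ratio(1.0, ["a", "a", "b", "b"]))  # SplitInfo = 1 bit
print(gain_ratio(0.5, ["x", "x", "x"]))       # undefined ratio
```

Four equally likely values give a split information of 2 bits, twice that of a balanced two-valued attribute. That is why many-valued attributes get divided down harder.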
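For example, for the attribute "outlook" the value to compute is:

$$\mathrm{GainRatio}(\text{outlook}) = \frac{\mathrm{Gain}(\text{outlook})}{\mathrm{SplitInfo}_{\text{outlook}}(D)}$$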
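As a worked illustration, the sketch below plugs in the per-value class counts of the classic 14-example "play tennis" weather data often used to illustrate ID3. This is an assumption: it is not necessarily the author's `Dataset.txt`, whose contents are not shown.

```python
from math import log2


def entropy(counts):
    """Entropy (base 2) of a class-count vector; zero counts contribute nothing."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


# (yes, no) class counts per outlook value in the classic 14-example weather data
outlook = {"sunny": (2, 3), "overcast": (4, 0), "rain": (3, 2)}

n = sum(sum(c) for c in outlook.values())                                # 14 examples
class_totals = [sum(c[k] for c in outlook.values()) for k in range(2)]  # [9, 5]

info_d = entropy(class_totals)                                           # Info(D)
info_outlook = sum(sum(c) / n * entropy(c) for c in outlook.values())   # Info_outlook(D)
gain = info_d - info_outlook                                             # Gain(outlook)
split = entropy([sum(c) for c in outlook.values()])                      # SplitInfo over sizes 5, 4, 5
ratio = gain / split                                                     # GainRatio(outlook)
print(f"Gain={gain:.4f} SplitInfo={split:.4f} GainRatio={ratio:.4f}")
```

On these counts, the gain is roughly 0.247, the split information roughly 1.577, and the gain ratio roughly 0.156.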
4. Complete C4.5 Code
```python
from math import log
import operator


# Compute the Shannon entropy of a data set (class label in the last column)
def calcShannonEnt(dataSet):
    numEntries = len(dataSet)
    labelCounts = {}  # class name -> number of rows with that class
    for featVec in dataSet:
        currentLabel = featVec[-1]
        if currentLabel not in labelCounts:  # class not yet in the dict
            labelCounts[currentLabel] = 0
        labelCounts[currentLabel] += 1
    shannonEnt = 0.0
    for key in labelCounts:  # add up each class's entropy contribution
        prob = float(labelCounts[key]) / numEntries  # fraction of rows in this class
        shannonEnt -= prob * log(prob, 2)
    return shannonEnt


# Split the data set on a given feature
def splitDataSet(dataSet, axis, value):
    retDataSet = []
    for featVec in dataSet:  # keep rows whose column `axis` equals `value`
        if featVec[axis] == value:  # copy the row without column `axis`
            reducedFeatVec = featVec[:axis]
            reducedFeatVec.extend(featVec[axis + 1:])
            retDataSet.append(reducedFeatVec)
    return retDataSet  # the reduced matrix


# Choose the best feature to split on (largest information gain ratio)
def chooseBestFeatureToSplit(dataSet):
    numFeatures = len(dataSet[0]) - 1  # number of attributes
    baseEntropy = calcShannonEnt(dataSet)
    bestInfoGain = 0.0
    bestFeature = -1
    for i in range(numFeatures):  # gain ratio of every attribute
        featList = [example[i] for example in dataSet]
        uniqueVals = set(featList)  # distinct values of column i
        newEntropy = 0.0
        splitInfo = 0.0
        for value in uniqueVals:  # weighted entropy of each value of column i
            subDataSet = splitDataSet(dataSet, i, value)
            prob = len(subDataSet) / float(len(dataSet))  # probability of this value in column i
            newEntropy += prob * calcShannonEnt(subDataSet)  # sum of weighted entropies
            splitInfo -= prob * log(prob, 2)
        if splitInfo == 0:  # column i has one value here: it cannot split, and the ratio would divide by zero
            continue
        infoGain = (baseEntropy - newEntropy) / splitInfo  # gain ratio of column i
        print(infoGain)
        if infoGain > bestInfoGain:  # remember the largest gain ratio and its column index i
            bestInfoGain = infoGain
            bestFeature = i
    return bestFeature


# Return the most frequent class name
def majorityCnt(classList):
    classCount = {}
    for vote in classList:
        if vote not in classCount:
            classCount[vote] = 0
        classCount[vote] += 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]


# Build the tree
def createTree(dataSet, labels):
    classList = [example[-1] for example in dataSet]  # class labels, e.g. [N, N, Y, Y, Y, N, Y] at the root
    if classList.count(classList[0]) == len(classList):  # all rows share one class: return it
        return classList[0]
    if len(dataSet[0]) == 1:  # only the class column is left: return the majority class
        return majorityCnt(classList)
    bestFeat = chooseBestFeatureToSplit(dataSet)  # index of the attribute with the largest gain ratio
    if bestFeat == -1:  # no attribute has a positive gain ratio: fall back to the majority class
        return majorityCnt(classList)
    bestFeatLabel = labels[bestFeat]  # the attribute name becomes this node
    myTree = {bestFeatLabel: {}}
    del labels[bestFeat]  # remove the chosen attribute (mutates the caller's list)
    featValues = [example[bestFeat] for example in dataSet]  # this attribute's value in every row
    uniqueVals = set(featValues)  # its distinct values
    for value in uniqueVals:  # one branch per value, built recursively
        subLabels = labels[:]
        myTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet, bestFeat, value), subLabels)
    return myTree


# Classify a sample with the decision tree
def classify(inputTree, featLabels, testVec):
    firstStr = next(iter(inputTree))  # attribute tested at this node
    secondDict = inputTree[firstStr]
    featIndex = featLabels.index(firstStr)
    classLabel = None  # stays None if the test value never appeared in training
    for key in secondDict.keys():
        if testVec[featIndex] == key:
            if isinstance(secondDict[key], dict):
                classLabel = classify(secondDict[key], featLabels, testVec)
            else:
                classLabel = secondDict[key]
    return classLabel


# Read the training data from the data file (returns a 2-D list)
def createTrainData():
    with open('../data/ID3/Dataset.txt') as f:
        lines_set = f.readlines()
    labels = lines_set[2].strip().split()  # line 3 holds the attribute names
    lines_set = lines_set[4:11]  # lines 5-11 hold the training rows
    dataSet = []
    for line in lines_set:
        dataSet.append(line.split())
    return dataSet, labels


# Read the test data from the data file (returns a 2-D list)
def createTestData():
    with open('../data/ID3/Dataset.txt') as f:
        lines_set = f.readlines()
    lines_set = lines_set[15:22]  # lines 16-22 hold the test rows
    dataSet = []
    for line in lines_set:
        dataSet.append(line.strip().split())
    return dataSet


myDat, labels = createTrainData()
myTree = createTree(myDat, labels)
print(myTree)
bootList = ['outlook', 'temperature', 'humidity', 'windy']  # fresh list: createTree deleted entries from `labels`
testList = createTestData()
for testData in testList:
    dic = classify(myTree, bootList, testData)
    print(dic)
```
5. Code differences between C4.5 and ID3
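As the code above shows, C4.5 differs from ID3 mainly in lines 52 and 53. ID3 computes the information gain there, whereas C4.5 computes the information gain ratio.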
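The following is a minimal Python sketch of that difference for illustration. It is not the author's code. The helper names `entropy`, `info_gain` and `gain_ratio` are hypothetical, and the rows are the seven training samples listed in section 6 of this article. The sketch assumes that the split information is the entropy of the attribute's own value distribution. It also gives an attribute that takes only one value a ratio of 0 so that the code never divides by zero. Real C4.5 implementations may handle that case differently.

```python
from collections import Counter
from math import log2

# The seven training rows from the sample dataset; the last column is the class label.
TRAIN = [
    ["sunny", "hot", "high", "false", "N"],
    ["sunny", "hot", "high", "true", "N"],
    ["overcast", "hot", "high", "false", "Y"],
    ["rain", "mild", "high", "false", "Y"],
    ["rain", "cool", "normal", "false", "Y"],
    ["rain", "cool", "normal", "true", "N"],
    ["overcast", "cool", "normal", "true", "Y"],
]
FEATURES = ["outlook", "temperature", "humidity", "windy"]


def entropy(values):
    """Shannon entropy (base 2) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in Counter(values).values())


def info_gain(rows, axis):
    """ID3 criterion: H(D) minus the weighted entropy after splitting on column `axis`."""
    base = entropy([r[-1] for r in rows])
    cond = 0.0
    for value in set(r[axis] for r in rows):
        subset = [r[-1] for r in rows if r[axis] == value]
        cond += len(subset) / len(rows) * entropy(subset)
    return base - cond


def gain_ratio(rows, axis):
    """C4.5 criterion (as assumed here): information gain divided by split information."""
    split_info = entropy([r[axis] for r in rows])  # entropy of the attribute's own values
    if split_info == 0.0:  # single-valued attribute: splitting on it separates nothing
        return 0.0
    return info_gain(rows, axis) / split_info


for i, name in enumerate(FEATURES):
    print(f"{name:12s} gain={info_gain(TRAIN, i):.4f} ratio={gain_ratio(TRAIN, i):.4f}")

# ID3 would take the max of info_gain; C4.5 takes the max of gain_ratio.
best = max(range(len(FEATURES)), key=lambda i: gain_ratio(TRAIN, i))
print("best split (C4.5):", FEATURES[best])
```

On this small training set, both criteria select `outlook`. The gain ratio has a larger effect when an attribute has many distinct values. Such an attribute has a large split information, so its score shrinks.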
6. Sample training and test datasets
<code class="hljs livecodeserver has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">Training set:
outlook   temperature  humidity  windy
---------------------------------------------------------
sunny     hot          high      false  N
sunny     hot          high      true   N
overcast  hot          high      false  Y
rain      mild         high      false  Y
rain      cool         normal    false  Y
rain      cool         normal    true   N
overcast  cool         normal    true   Y

Test set:
outlook   temperature  humidity  windy
---------------------------------------------------------
sunny     mild         high      false
sunny     cool         normal    false
rain      mild         normal    false
sunny     mild         normal    true
overcast  mild         high      true
overcast  hot          normal    false
rain      mild         high      true
</code>
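The following sketch shows one way to read the samples above into the list-of-rows shape that `calcShannonEnt` expects, in which the class label is the last element of each training row. The sketch assumes that the files look exactly as shown above: a header line of feature names, then a dashed separator, then whitespace-separated values. `parse_table`, `TRAIN_TEXT` and `TEST_TEXT` are hypothetical names and are not part of the original code.

```python
# Hypothetical in-memory copies of the sample files, in the format shown above.
TRAIN_TEXT = """
outlook temperature humidity windy
---------------------------------------------------------
sunny hot high false N
sunny hot high true N
overcast hot high false Y
rain mild high false Y
rain cool normal false Y
rain cool normal true N
overcast cool normal true Y
"""

TEST_TEXT = """
outlook temperature humidity windy
---------------------------------------------------------
sunny mild high false
sunny cool normal false
rain mild normal false
sunny mild normal true
overcast mild high true
overcast hot normal false
rain mild high true
"""


def parse_table(text):
    """Return (feature_names, rows) from the whitespace-separated format above."""
    header, rows = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line or set(line) == {"-"}:  # skip blank lines and the dashed separator
            continue
        fields = line.split()
        if header is None:
            header = fields  # the first non-blank line holds the feature names
        else:
            rows.append(fields)
    return header, rows


labels, train = parse_table(TRAIN_TEXT)
_, test = parse_table(TEST_TEXT)

# Training rows carry one extra field (the class label); test rows do not.
assert all(len(r) == len(labels) + 1 for r in train)
assert all(len(r) == len(labels) for r in test)
print(labels, len(train), len(test))
```

The header names only the four features. The Y/N class column in the training rows has no name, so `labels` holds four entries while each training row has five fields.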