Weka Study: J48 (C4.5)

MrRoyLee

Category: Weka Study

Article link: https://blog.csdn.net/MrRoyLee/article/details/11725287

Before writing: To improve my English, I will write my blog in English.

Section 1: J48

J48 is the class that implements the C4.5 algorithm. Look at part of the code. In the buildClassifier(...) function, there are two important classes: ModelSelection (extended by BinC45ModelSelection and C45ModelSelection) and ClassifierTree (extended by C45PruneableClassifierTree and PruneableClassifierTree). The ModelSelection is used to select the split of a node, that is, its sons, and the ClassifierTree is used together with the ModelSelection to build the tree.

  public void buildClassifier(Instances instances)
       throws Exception {

    ModelSelection modSelection;

    // Choose the split strategy: binary or multiway splits on nominal attributes.
    if (m_binarySplits)
      modSelection = new BinC45ModelSelection(m_minNumObj, instances);
    else
      modSelection = new C45ModelSelection(m_minNumObj, instances);

    // Choose the tree type: C4.5 pruning or reduced-error pruning.
    if (!m_reducedErrorPruning)
      m_root = new C45PruneableClassifierTree(modSelection, !m_unpruned, m_CF,
                                              m_subtreeRaising, !m_noCleanup);
    else
      m_root = new PruneableClassifierTree(modSelection, !m_unpruned, m_numFolds,
                                           !m_noCleanup, m_Seed);

    // Build the tree, then clean up the model selection object.
    m_root.buildClassifier(instances);
    if (m_binarySplits) {
      ((BinC45ModelSelection)modSelection).cleanup();
    } else {
      ((C45ModelSelection)modSelection).cleanup();
    }
  }
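
To make the division of labour between the two classes more concrete, here is a toy sketch that does not use Weka at all. The names SplitChooser, Node and firstVaryingAttribute are my own and hypothetical; they only stand in for the roles of ModelSelection and ClassifierTree. The chooser here is deliberately naive (it takes the first attribute that still varies), so this is not C4.5, only the "tree asks the selector, then recurses" pattern. Each row is an int array of nominal attribute values followed by the class label.

```java
import java.util.ArrayList;
import java.util.List;

class TreeBuildingSketch {

    /** Hypothetical stand-in for ModelSelection: the attribute to split on, or -1 for a leaf. */
    interface SplitChooser {
        int choose(List<int[]> rows, int numAttributes);
    }

    /** Hypothetical stand-in for ClassifierTree: asks the chooser for a split, then recurses. */
    static class Node {
        int attribute = -1; // -1 marks a leaf
        int label;          // majority class of the rows that reached this node
        Node[] children;    // one child per attribute value when this is an inner node

        Node(List<int[]> rows, int numAttributes, int numValues, int numClasses,
             SplitChooser chooser, int parentLabel) {
            label = rows.isEmpty() ? parentLabel : majorityClass(rows, numAttributes, numClasses);
            if (rows.isEmpty()) return;
            attribute = chooser.choose(rows, numAttributes);
            if (attribute < 0) return;
            children = new Node[numValues];
            for (int v = 0; v < numValues; v++) {
                List<int[]> subset = new ArrayList<>();
                for (int[] row : rows) {
                    if (row[attribute] == v) subset.add(row);
                }
                children[v] = new Node(subset, numAttributes, numValues, numClasses, chooser, label);
            }
        }

        int classify(int[] row) {
            return attribute < 0 ? label : children[row[attribute]].classify(row);
        }
    }

    /** Most frequent class label; the label is stored at index numAttributes of each row. */
    static int majorityClass(List<int[]> rows, int numAttributes, int numClasses) {
        int[] counts = new int[numClasses];
        for (int[] row : rows) counts[row[numAttributes]]++;
        int best = 0;
        for (int k = 1; k < numClasses; k++) {
            if (counts[k] > counts[best]) best = k;
        }
        return best;
    }

    /** Toy chooser: leaf if all rows share one class, otherwise the first attribute that varies. */
    static int firstVaryingAttribute(List<int[]> rows, int numAttributes) {
        boolean pure = true;
        for (int[] row : rows) {
            if (row[numAttributes] != rows.get(0)[numAttributes]) pure = false;
        }
        if (pure) return -1;
        for (int a = 0; a < numAttributes; a++) {
            for (int[] row : rows) {
                if (row[a] != rows.get(0)[a]) return a;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Two binary attributes, class = XOR of the two.
        List<int[]> rows = new ArrayList<>();
        rows.add(new int[] {0, 0, 0});
        rows.add(new int[] {0, 1, 1});
        rows.add(new int[] {1, 0, 1});
        rows.add(new int[] {1, 1, 0});
        Node root = new Node(rows, 2, 2, 2, TreeBuildingSketch::firstVaryingAttribute, 0);
        System.out.println(root.classify(new int[] {1, 0}));
    }
}
```

Swapping in a different SplitChooser changes how the tree grows without touching Node, which is roughly why J48 can pick BinC45ModelSelection or C45ModelSelection first and hand it to the tree afterwards.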

Section 2: ModelSelection

Section 2.1: Split

In this part, I will discuss the classes BinC45ModelSelection and C45ModelSelection. In general, these two classes iterate over the attributes and select the one with the highest info gain ratio. Calculating the info gain and the gain ratio of a given attribute is what the class BinC45Split or C45Split does. So first, let us take a look at BinC45Split and C45Split.
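
As a reminder of what is being calculated, here is a small standalone sketch of info gain and gain ratio for a nominal attribute. It is not the Weka code: the class GainRatioSketch and its method names are my own, the input is a plain table of class counts (counts[subset][class]), and it ignores missing values, instance weights and the minimum-instances check. The numbers in main are the class counts of the classic 14-row weather toy data.

```java
class GainRatioSketch {

    /** Entropy in bits of a vector of non-negative counts. */
    static double entropy(double[] counts) {
        double total = 0;
        for (double c : counts) total += c;
        if (total == 0) return 0;
        double h = 0;
        for (double c : counts) {
            if (c > 0) {
                double p = c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    /** Number of instances in each subset, given counts[subset][class]. */
    static double[] subsetSizes(double[][] counts) {
        double[] sizes = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            for (double c : counts[i]) sizes[i] += c;
        }
        return sizes;
    }

    /** Info gain: class entropy before the split minus the weighted entropy after it. */
    static double infoGain(double[][] counts) {
        double[] classTotals = new double[counts[0].length];
        for (double[] subset : counts) {
            for (int k = 0; k < subset.length; k++) classTotals[k] += subset[k];
        }
        double[] sizes = subsetSizes(counts);
        double total = 0;
        for (double s : sizes) total += s;
        double after = 0;
        for (int i = 0; i < counts.length; i++) {
            after += sizes[i] / total * entropy(counts[i]);
        }
        return entropy(classTotals) - after;
    }

    /** Gain ratio: info gain divided by the entropy of the subset sizes (the split info). */
    static double gainRatio(double[][] counts) {
        double splitInfo = entropy(subsetSizes(counts));
        return splitInfo == 0 ? 0 : infoGain(counts) / splitInfo;
    }

    /** Index of the attribute whose split has the highest gain ratio. */
    static int bestAttribute(double[][][] countsPerAttribute) {
        int best = 0;
        for (int a = 1; a < countsPerAttribute.length; a++) {
            if (gainRatio(countsPerAttribute[a]) > gainRatio(countsPerAttribute[best])) best = a;
        }
        return best;
    }

    public static void main(String[] args) {
        // Class counts {yes, no} per attribute value in the weather toy data.
        double[][][] weather = {
            {{2, 3}, {4, 0}, {3, 2}},  // outlook: sunny, overcast, rainy
            {{2, 2}, {4, 2}, {3, 1}},  // temperature: hot, mild, cool
            {{3, 4}, {6, 1}},          // humidity: high, normal
            {{6, 2}, {3, 3}}           // windy: false, true
        };
        for (double[][] counts : weather) {
            System.out.printf("gain=%.3f ratio=%.3f%n", infoGain(counts), gainRatio(counts));
        }
        System.out.println("best attribute index: " + bestAttribute(weather));
    }
}
```

On these counts the outlook attribute comes out on top, narrowly ahead of humidity. The loop in bestAttribute plays the part that the ModelSelection classes play in Weka, and infoGain and gainRatio play the part of the Split classes.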

BinC45Split and C45Split are so alike that the only difference is that, for a nominal attribute, the former splits it into two subsets while the latter splits it into multiple subsets. So I will only explain the former (the latter is easier).
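
Before the real code, here is a rough standalone sketch of the binary idea for a nominal attribute. It assumes, as I understand the Weka source, that the binary split puts one attribute value in the first subset and all the other values in the second, and that the value with the best gain ratio wins. The class BinaryNominalSplitSketch and its methods are my own hypothetical names, and missing values and the minimum-instances check are left out.

```java
class BinaryNominalSplitSketch {

    /** Entropy in bits of a vector of non-negative counts. */
    static double entropy(double[] counts) {
        double total = 0;
        for (double c : counts) total += c;
        if (total == 0) return 0;
        double h = 0;
        for (double c : counts) {
            if (c > 0) {
                double p = c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    /** Gain ratio of a split described by counts[subset][class]. */
    static double gainRatio(double[][] counts) {
        int numClasses = counts[0].length;
        double[] classTotals = new double[numClasses];
        double[] sizes = new double[counts.length];
        double total = 0;
        for (int i = 0; i < counts.length; i++) {
            for (int k = 0; k < numClasses; k++) {
                classTotals[k] += counts[i][k];
                sizes[i] += counts[i][k];
                total += counts[i][k];
            }
        }
        double after = 0;
        for (int i = 0; i < counts.length; i++) {
            after += sizes[i] / total * entropy(counts[i]);
        }
        double splitInfo = entropy(sizes);
        return splitInfo == 0 ? 0 : (entropy(classTotals) - after) / splitInfo;
    }

    /** Merges a multiway split into two subsets: the chosen value, and all other values. */
    static double[][] oneVsRest(double[][] counts, int value) {
        double[][] binary = new double[2][counts[0].length];
        for (int i = 0; i < counts.length; i++) {
            for (int k = 0; k < counts[i].length; k++) {
                binary[i == value ? 0 : 1][k] += counts[i][k];
            }
        }
        return binary;
    }

    /** Index of the value whose "value versus the rest" split has the highest gain ratio. */
    static int bestBinaryValue(double[][] counts) {
        int best = 0;
        for (int v = 1; v < counts.length; v++) {
            if (gainRatio(oneVsRest(counts, v)) > gainRatio(oneVsRest(counts, best))) best = v;
        }
        return best;
    }

    public static void main(String[] args) {
        // Class counts {yes, no} for outlook = sunny, overcast, rainy in the weather toy data.
        double[][] outlook = {{2, 3}, {4, 0}, {3, 2}};
        System.out.printf("multiway ratio=%.3f%n", gainRatio(outlook));
        for (int v = 0; v < outlook.length; v++) {
            System.out.printf("value %d vs rest ratio=%.3f%n", v, gainRatio(oneVsRest(outlook, v)));
        }
        System.out.println("best binary value index: " + bestBinaryValue(outlook));
    }
}
```

For the outlook counts, "overcast versus the rest" gets the best ratio of the three binary candidates, while the multiway version simply keeps all three subsets.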

 public void buildClassifier(Instances trainInstances) 
       throws Exception {

    // Initialize the remaining instance variables.
    m_numSubsets = 0;
    m_splitPoint = Double.MAX_VALUE;
    m_infoGain = 0;
    m_gainRatio = 0;

    // Different treatment for enumerated and numeric
    // attributes.
    if (trainInstances.attribute(m_attIndex).isNominal()) {
      m_complexityIndex = trainInstan
