Before I start: to improve my English, I will write this blog in English.
Section 1: J48
J48 is the Weka class that implements the C4.5 algorithm. Let us look at part of its code. In the buildClassifier(...) method there are two important classes: ModelSelection (extended by BinC45ModelSelection and C45ModelSelection) and ClassifierTree (extended by C45PruneableClassifierTree and PruneableClassifierTree). The ModelSelection object decides how a node is split into its sons, and the ClassifierTree uses the ModelSelection to build the tree. Which subclasses are used depends on the options: m_binarySplits selects BinC45ModelSelection instead of C45ModelSelection, and m_reducedErrorPruning selects PruneableClassifierTree instead of C45PruneableClassifierTree.
public void buildClassifier(Instances instances)
    throws Exception {

  ModelSelection modSelection;

  // Choose how candidate splits are generated: binary or multiway on nominal attributes.
  if (m_binarySplits)
    modSelection = new BinC45ModelSelection(m_minNumObj, instances);
  else
    modSelection = new C45ModelSelection(m_minNumObj, instances);

  // Choose the tree type: C4.5 confidence-based pruning or reduced-error pruning.
  if (!m_reducedErrorPruning)
    m_root = new C45PruneableClassifierTree(modSelection, !m_unpruned, m_CF,
        m_subtreeRaising, !m_noCleanup);
  else
    m_root = new PruneableClassifierTree(modSelection, !m_unpruned, m_numFolds,
        !m_noCleanup, m_Seed);

  // Build the tree; the tree asks modSelection how to split each node.
  m_root.buildClassifier(instances);

  // Let the model selection object drop its reference to the training data.
  if (m_binarySplits) {
    ((BinC45ModelSelection)modSelection).cleanup();
  } else {
    ((C45ModelSelection)modSelection).cleanup();
  }
}
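To make this division of labor concrete, here is a heavily simplified sketch of the same pattern. It is not Weka's code: a selector object decides how a node should be split, and a tree node asks the selector, partitions its data, and recurses on each subset. All names here (TreeBuildingSketch, Example, SplitSelector, FIRST_USEFUL, Node) are hypothetical. The toy selector just picks the first attribute that still varies, where J48 would use the gain ratio, and there is no pruning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TreeBuildingSketch {

    /** Hypothetical stand-in for a Weka Instance: nominal attribute values plus a class label. */
    static final class Example {
        final String label;
        final String[] values;
        Example(String label, String... values) { this.label = label; this.values = values; }
    }

    /** Plays the role of ModelSelection: return an attribute index to split on, or -1 for "make a leaf". */
    interface SplitSelector {
        int selectAttribute(List<Example> data);
    }

    /** Toy selector: leaf if the node is pure, otherwise the first attribute with two or more distinct values. */
    static final SplitSelector FIRST_USEFUL = data -> {
        if (data.stream().map(e -> e.label).distinct().count() <= 1) return -1;
        int numAtts = data.get(0).values.length;
        for (int a = 0; a < numAtts; a++) {
            final int att = a;
            if (data.stream().map(e -> e.values[att]).distinct().count() > 1) return a;
        }
        return -1;
    };

    /** Plays the role of ClassifierTree: asks the selector for a split, partitions the data, and recurses. */
    static final class Node {
        private final SplitSelector selector;
        private int attIndex = -1;                       // -1 means this node is a leaf
        private String majorityLabel;
        private final Map<String, Node> sons = new TreeMap<>();

        Node(SplitSelector selector) { this.selector = selector; }

        void build(List<Example> data) {
            majorityLabel = majority(data);
            int att = selector.selectAttribute(data);
            if (att < 0) return;                         // selector says: make a leaf
            Map<String, List<Example>> subsets = new TreeMap<>();
            for (Example e : data) {
                subsets.computeIfAbsent(e.values[att], k -> new ArrayList<>()).add(e);
            }
            if (subsets.size() < 2) return;              // a one-subset split is useless: stay a leaf
            attIndex = att;
            for (Map.Entry<String, List<Example>> s : subsets.entrySet()) {
                Node son = new Node(selector);
                son.build(s.getValue());
                sons.put(s.getKey(), son);
            }
        }

        String classify(String... values) {
            if (attIndex < 0) return majorityLabel;
            Node son = sons.get(values[attIndex]);
            // value never seen during training: fall back to this node's majority class
            return son == null ? majorityLabel : son.classify(values);
        }

        private static String majority(List<Example> data) {
            Map<String, Integer> counts = new TreeMap<>();
            for (Example e : data) counts.merge(e.label, 1, Integer::sum);
            String best = null;
            for (Map.Entry<String, Integer> c : counts.entrySet()) {
                if (best == null || c.getValue() > counts.get(best)) best = c.getKey();
            }
            return best;
        }
    }

    public static void main(String[] args) {
        List<Example> data = List.of(
            new Example("no",  "sunny",    "high"),
            new Example("no",  "sunny",    "high"),
            new Example("yes", "sunny",    "normal"),
            new Example("yes", "overcast", "high"),
            new Example("yes", "rainy",    "normal"));
        Node root = new Node(FIRST_USEFUL);
        root.build(data);
        System.out.println(root.classify("sunny", "normal")); // yes
        System.out.println(root.classify("sunny", "high"));   // no
    }
}
```

Swapping in a different SplitSelector changes how splits are chosen without touching Node, which mirrors how J48 can hand either BinC45ModelSelection or C45ModelSelection to the same tree classes.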
Section 2: ModelSelection
Section 2.1: Split
In this part, I will discuss the classes BinC45ModelSelection and C45ModelSelection. In general, each of these two classes iterates over the attributes and selects the split with the highest information gain ratio. Computing the information gain (and gain ratio) of a particular attribute is the job of BinC45Split or C45Split. So first, let us take a quick look at BinC45Split and C45Split.
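Since the gain ratio drives the whole selection, here is a small standalone sketch of the computation for one nominal attribute, assuming the class counts for each attribute value are already tallied in a table counts[value][class]. The names (GainRatioSketch, entropy, infoGain, gainRatio) are my own, not Weka's; Weka computes these quantities through its own helper classes and also handles missing values and the minimum-instances check (m_minNumObj), which this sketch ignores. Information gain is the entropy of the class distribution minus the weighted entropy of each subset, and the gain ratio divides that by the split info, which is the entropy of the subset sizes.

```java
import java.util.Locale;

/** Minimal sketch (not Weka's code): info gain and gain ratio from counts[value][classIndex]. */
final class GainRatioSketch {

    private GainRatioSketch() {}

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** Entropy in bits of a distribution given by raw counts; zero counts contribute nothing. */
    static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        if (total == 0) return 0.0;
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * log2(p);
            }
        }
        return h;
    }

    /** Entropy before the split minus the size-weighted entropy of each subset. */
    static double infoGain(int[][] counts) {
        int numClasses = counts[0].length;
        int[] classTotals = new int[numClasses];
        int total = 0;
        for (int[] row : counts) {
            for (int k = 0; k < numClasses; k++) {
                classTotals[k] += row[k];
                total += row[k];
            }
        }
        if (total == 0) return 0.0;
        double after = 0.0;
        for (int[] row : counts) {
            int rowTotal = 0;
            for (int c : row) rowTotal += c;
            after += (double) rowTotal / total * entropy(row);
        }
        return entropy(classTotals) - after;
    }

    /** Gain divided by split info (the entropy of the subset sizes). */
    static double gainRatio(int[][] counts) {
        int[] subsetSizes = new int[counts.length];
        for (int v = 0; v < counts.length; v++) {
            for (int c : counts[v]) subsetSizes[v] += c;
        }
        double splitInfo = entropy(subsetSizes);
        return splitInfo == 0.0 ? 0.0 : infoGain(counts) / splitInfo;
    }

    public static void main(String[] args) {
        // {yes, no} counts for each value of "outlook": sunny, overcast, rainy
        int[][] outlook = { {2, 3}, {4, 0}, {3, 2} };
        System.out.printf(Locale.ROOT, "gain = %.3f, gain ratio = %.3f%n",
                infoGain(outlook), gainRatio(outlook));
    }
}
```

With the "outlook" attribute of the classic weather data (sunny 2 yes / 3 no, overcast 4 / 0, rainy 3 / 2), main prints `gain = 0.247, gain ratio = 0.156`.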
BinC45Split and C45Split are so alike that the only difference is that, for a nominal attribute, the former splits it into two subsets while the latter splits it into multiple subsets (one per value). So I will only explain the former (the latter is simpler).
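The difference between the two splitters can be illustrated on a single nominal column. The sketch below builds a multiway partition (one subset per value, as in C45Split) and a binary partition that separates one chosen value from all the others. I am assuming this "one value versus the rest" form is what BinC45Split uses for nominal attributes, trying each value in turn and keeping the best by gain ratio. NominalSplitSketch, multiway and binary are hypothetical names, and the partitions hold instance indices rather than Weka Instances.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of multiway vs binary splits of one nominal column. */
final class NominalSplitSketch {

    /** Multiway split: one subset of instance indices per distinct value, in order of first appearance. */
    static Map<String, List<Integer>> multiway(String[] column) {
        Map<String, List<Integer>> subsets = new LinkedHashMap<>();
        for (int i = 0; i < column.length; i++) {
            subsets.computeIfAbsent(column[i], k -> new ArrayList<>()).add(i);
        }
        return subsets;
    }

    /** Binary split (assumed form): indices whose value equals `chosen` vs all the others. */
    static List<List<Integer>> binary(String[] column, String chosen) {
        List<Integer> left = new ArrayList<>();
        List<Integer> right = new ArrayList<>();
        for (int i = 0; i < column.length; i++) {
            (column[i].equals(chosen) ? left : right).add(i);
        }
        return List.of(left, right);
    }

    public static void main(String[] args) {
        String[] outlook = {"sunny", "overcast", "rainy", "sunny", "rainy"};
        System.out.println(multiway(outlook));           // {sunny=[0, 3], overcast=[1], rainy=[2, 4]}
        System.out.println(binary(outlook, "overcast")); // [[1], [0, 2, 3, 4]]
    }
}
```

A binary splitter built this way has one candidate per attribute value, so for a three-valued attribute like outlook it would score three two-way splits instead of a single three-way one.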
public void buildClassifier(Instances trainInstances)
    throws Exception {

  // Initialize the remaining instance variables.
  m_numSubsets = 0;
  m_splitPoint = Double.MAX_VALUE;
  m_infoGain = 0;
  m_gainRatio = 0;

  // Different treatment for enumerated and numeric
  // attributes.
  if (trainInstances.attribute(m_attIndex).isNominal()) {
    m_complexityIndex = trainInstan