Decision Tree

Decision tree learning uses a decision tree (as a predictive model) to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves).


Information gain
First, we need to define entropy and conditional entropy.

Assume the probability distribution of a random variable X is:

P(X = x_i) = p_i, \quad i = 1, 2, \ldots, n

Then the entropy of X (a measure of the amount of information, or uncertainty; the higher it is, the more uncertain X is) is:

H(X) = -\sum_{i=1}^{n} p_i \log p_i

The conditional entropy of Y given X is the expected entropy of Y over the values of X:

H(Y \mid X) = \sum_{i=1}^{n} p_i \, H(Y \mid X = x_i)

The information gain of an attribute A on a dataset D is the reduction in entropy obtained by conditioning on A:

g(D, A) = H(D) - H(D \mid A)

Information gain ratio

Information gain is computed on your training set, so its magnitude has no absolute meaning and it tends to favor attributes with many distinct values. We prefer the information gain ratio to correct this: it normalizes the gain by the entropy of the attribute itself,

g_R(D, A) = \frac{g(D, A)}{H_A(D)}, \quad H_A(D) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \log \frac{|D_i|}{|D|}

where D_1, ..., D_n are the subsets of D induced by the n values of A. These quantities are sketched in code below.
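To make these definitions concrete, here is a minimal Python sketch; the `outlook`/`play` toy data and the helper names are made up for illustration and are not from any particular library:

```python
import math
from collections import Counter

def entropy(labels):
    """H = -sum(p_i * log2 p_i) over the label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(attr_values, labels):
    """g(D, A) = H(D) - H(D | A) for one categorical attribute column."""
    n = len(labels)
    h_cond = 0.0
    for v in set(attr_values):
        subset = [y for x, y in zip(attr_values, labels) if x == v]
        h_cond += (len(subset) / n) * entropy(subset)   # weighted H(D | A = v)
    return entropy(labels) - h_cond

def information_gain_ratio(attr_values, labels):
    """g_R(D, A) = g(D, A) / H_A(D); H_A(D) is the entropy of the attribute itself."""
    split_info = entropy(attr_values)
    return information_gain(attr_values, labels) / split_info if split_info > 0 else 0.0

# Toy data: does the "outlook" attribute help predict "play"?
outlook = ["sunny", "sunny", "overcast", "rain", "rain", "overcast"]
play    = ["no",    "no",    "yes",      "yes",  "no",   "yes"]
print(information_gain(outlook, play))        # ~0.667 bits
print(information_gain_ratio(outlook, play))  # ~0.42
```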

Three main algorithms:

ID3

ID3 (Examples, Target_Attribute, Attributes)
    Create a root node for the tree
    If all examples are positive, Return the single-node tree Root, with label = +.
    If all examples are negative, Return the single-node tree Root, with label = -.
    If the list of predicting attributes is empty, then Return the single-node tree Root,
    with label = most common value of the target attribute in the examples.
    Otherwise Begin
        A ← The Attribute that best classifies examples (highest information gain).
        Decision Tree attribute for Root = A.
        For each possible value, vi, of A,
            Add a new tree branch below Root, corresponding to the test A = vi.
            Let Examples(vi) be the subset of examples that have the value vi for A
            If Examples(vi) is empty
                Then below this new branch add a leaf node with label = most common target value in the examples
            Else below this new branch add the subtree ID3 (Examples(vi), Target_Attribute, Attributes – {A})
    End
    Return Root
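
Here is a compact Python sketch of that recursion, assuming the `information_gain` helper and the toy `outlook`/`play` data from the earlier snippet are in scope; the nested-dict tree representation is just one possible choice for illustration:

```python
from collections import Counter

def id3(examples, target_attribute, attributes):
    """examples: list of dicts; target_attribute: key holding the class label;
    attributes: list of candidate attribute keys. Returns a label or a nested dict."""
    labels = [ex[target_attribute] for ex in examples]
    # All examples share one label: return a leaf.
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left: return the most common label.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # A <- the attribute with the highest information gain.
    best = max(attributes,
               key=lambda a: information_gain([ex[a] for ex in examples], labels))
    tree = {best: {}}
    # Branch on the values of `best` observed in the examples (so no branch is empty).
    for v in set(ex[best] for ex in examples):
        subset = [ex for ex in examples if ex[best] == v]
        tree[best][v] = id3(subset, target_attribute,
                            [a for a in attributes if a != best])
    return tree

data = [{"outlook": o, "play": p} for o, p in zip(outlook, play)]
print(id3(data, "play", ["outlook"]))  # a nested dict keyed by attribute values
```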

C4.5

C4.5 follows the same process as ID3, except that it uses the highest information gain ratio, rather than information gain, as the measure for choosing the attribute that best classifies the examples, as shown in the snippet below.
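
In terms of the ID3 sketch above, the only change is the attribute-selection step; reusing the `information_gain_ratio` helper defined earlier (the function name here is my own), it might look like:

```python
def c45_choose_attribute(examples, attributes, target_attribute):
    """C4.5-style choice: rank candidate attributes by information gain ratio."""
    labels = [ex[target_attribute] for ex in examples]
    return max(attributes,
               key=lambda a: information_gain_ratio([ex[a] for ex in examples], labels))
```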

CART

Classification and regression trees (CART) is a non-parametric decision tree learning technique that produces either classification or regression trees, depending on whether the dependent variable is categorical or numeric, respectively.


Decision trees are formed by a collection of rules based on variables in the modeling data set:



  1. Rules based on variables' values are selected to get the best split to differentiate observations based on the dependent variable
  2. Once a rule is selected (usually the one that minimizes an impurity measure such as Gini impurity) and splits a node into two, the same process is applied to each "child" node (i.e. it is a recursive procedure); a Gini-based split search is sketched after this list
  3. Splitting stops when CART detects no further gain can be made, or some pre-set stopping rules are met. (Alternatively, the data are split as much as possible and then the tree is later pruned.)
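
As a rough illustration of step 2, here is a minimal sketch of Gini impurity and an exhaustive search for the best binary split of one numeric variable; the midpoint-threshold choice and toy data are my own simplifications, not CART's exact procedure:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_numeric_split(values, labels):
    """Try the midpoint between each pair of adjacent distinct values and
    return the (threshold, weighted Gini) pair with the lowest weighted impurity."""
    n = len(labels)
    best = (None, float("inf"))
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2.0
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
        if score < best[1]:
            best = (t, score)
    return best

# Toy numeric variable and labels, made up for illustration.
x = [2.0, 3.5, 1.0, 6.0, 7.5, 8.0]
y = ["a", "a", "a", "b", "b", "b"]
print(best_numeric_split(x, y))  # -> (4.75, 0.0): a perfect split
```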

Pruning

Pruning is a technique in machine learning that reduces the size of decision trees by removing sections of the tree that provide little power to classify instances. Pruning reduces the complexity of the final classifier, and hence improves predictive accuracy by reducing overfitting.
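
For instance, scikit-learn's CART-style trees support cost-complexity ("weakest link") pruning. A minimal sketch, assuming scikit-learn is installed and using the built-in iris data as a placeholder (a proper workflow would choose alpha with cross-validation rather than the test set):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Effective alphas along the pruning path of the fully grown tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Refit one tree per alpha and keep the one that scores best on held-out data.
best_alpha, best_score = 0.0, -1.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    score = tree.score(X_test, y_test)
    if score > best_score:
        best_alpha, best_score = alpha, score

print(best_alpha, best_score)
```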





