Note:
CVP (Critical Value Pruning) is also called Chi-Square Pruning (or chi-square-test pruning) in many materials.
The test below is the standard chi-square test of independence on a contingency table[1]:
$$H_0:\frac{X_{ij}}{n}=\frac{N_iN_j}{n^2}$$
$$H_1:\frac{X_{ij}}{n}\neq\frac{N_iN_j}{n^2}$$
$$N_{ij}=X_{ij}$$
when
$$\sum_{i=1}^{r}\sum_{j=1}^{s}\frac{\left(N_{ij}-\frac{N_iN_j}{n}\right)^2}{\frac{N_iN_j}{n}}<\chi^2_{[(r-1)(s-1)],\alpha}=\text{critical value}$$
then $H_0$ is accepted and the decision tree is pruned.
Here $\alpha$ is the significance level and can be set to 0.05, for example, and $n$ is the total number of items in your dataset.
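As a concrete illustration of the criterion, here is a minimal Python sketch (the contingency table, $\alpha=0.05$, and the hard-coded critical values are made-up assumptions for the example) that computes the chi-square statistic for an $r\times s$ table and compares it with the critical value:

```python
# Made-up contingency table: rows = branches, columns = classes,
# entries = item counts N_ij.
table = [
    [20, 10],
    [15, 15],
]

r = len(table)          # number of branches (rows)
s = len(table[0])       # number of classes (columns)
n = sum(sum(row) for row in table)          # total number of items
row_sums = [sum(row) for row in table]      # N_i.
col_sums = [sum(table[i][j] for i in range(r)) for j in range(s)]  # N_.j

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected,
# where expected = N_i. * N_.j / n.
stat = 0.0
for i in range(r):
    for j in range(s):
        expected = row_sums[i] * col_sums[j] / n
        stat += (table[i][j] - expected) ** 2 / expected

df = (r - 1) * (s - 1)  # degrees of freedom

# Critical values chi^2_{df, alpha=0.05}, hard-coded from a standard table
# to avoid a SciPy dependency (scipy.stats.chi2.ppf(0.95, df) gives the same).
critical_values = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}
prune = stat < critical_values[df]
print(stat, df, prune)  # statistic ~1.714 < 3.841, so this split is pruned
```

Here the statistic (about 1.714) stays below the critical value 3.841 for one degree of freedom, so $H_0$ is accepted and this split would be pruned.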
The relationship between the contingency table and the decision tree is shown in the following table:

| Split node with attribute $f$ | class $1$ | class $2$ | $\dots$ | class $s$ |
|---|---|---|---|---|
| branch $1$ | $n_{L1}$ | $n_{L2}$ | $\dots$ | $n_{Ls}$ |
| branch $2$ | $n_{R1}$ | $n_{R2}$ | $\dots$ | $n_{Rs}$ |
| $\vdots$ | $\vdots$ | $\vdots$ | $\ddots$ | $\vdots$ |
| branch $r$ | $n_{r1}$ | $n_{r2}$ | $\dots$ | $n_{rs}$ |
Now let's use the above table to follow the lecture slides[2].
The parameters that appear in the slides are explained by the table above.
Note that CVP can be used in both the pre-pruning and the post-pruning stage.
In the growing (pre-pruning) stage, "less than the critical value" is checked before the current sub-tree is pruned; from the lecture slides we can infer that post-pruning applies the same criterion.
How to understand the above pruning criterion?
------------------------------------------
In the above table, the different branches correspond to the different value levels of the current decision node of the decision tree (also called the split node; each split node tests one attribute of the dataset).
When $H_0$ is accepted, then
$$\frac{X_{ij}}{N_{i\cdot}}\approx\frac{N_{\cdot j}}{n}$$
which means:
the probability of "item belongs to class $j$" in each branch $i$ ($i\in[1,r]$)
= the probability of "item belongs to class $j$" in the whole dataset
=> merging (pruning) these branches into one leaf will not change the probability of "item belongs to class $j$" very much, which means the accuracy will not drop much after pruning.
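To make this concrete, here is a tiny numerical check in Python (all counts are made up for illustration): every branch in this example has the same 60/40 class split as the whole dataset, which is exactly what $H_0$ asserts.

```python
# Tiny numerical check of the H_0 interpretation (all counts are made up).
# X[i][j] = number of items of class j in branch i.
X = [[18, 12], [27, 18], [9, 6]]
r, s = len(X), len(X[0])
n = sum(map(sum, X))                                        # total items
row = [sum(b) for b in X]                                   # N_i. per branch
col = [sum(X[i][j] for i in range(r)) for j in range(s)]    # N_.j per class

for i in range(r):
    for j in range(s):
        branch_prob = X[i][j] / row[i]   # P(class j | branch i)
        overall_prob = col[j] / n        # P(class j) in the whole dataset
        print(f"branch {i}, class {j}: {branch_prob:.2f} vs {overall_prob:.2f}")
```

Every printed pair is equal (0.60 vs 0.60, 0.40 vs 0.40), so collapsing the three branches into one leaf leaves the class probabilities unchanged.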
In conclusion, when the chi-square statistic does not reach the critical value, the branches (the different value levels of the split attribute of the decision tree) do not contribute much to increasing the accuracy, so these branches can be pruned.
This conclusion can be used directly when we implement our CVP (Critical Value Pruning) algorithm in Python.
We can also learn from the above analysis that CVP aims to simplify your decision tree while not losing too much accuracy.
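Following that conclusion, a CVP decision for a single split node can be sketched as follows (the function name, the example counts, and the hard-coded critical value $\chi^2_{1,\,0.05}=3.841$ are my own assumptions; a real implementation would call this inside the tree-growing or post-pruning loop):

```python
# Sketch of a CVP decision for one split node.
# branch_counts[i][j] = number of training items in branch i with class j.
# Returns True when the chi-square statistic stays below the critical
# value, i.e. the split adds little and its branches can be merged.
def should_prune(branch_counts, critical_value):
    r = len(branch_counts)
    s = len(branch_counts[0])
    n = sum(map(sum, branch_counts))
    row = [sum(b) for b in branch_counts]                                 # N_i.
    col = [sum(branch_counts[i][j] for i in range(r)) for j in range(s)]  # N_.j
    stat = sum(
        (branch_counts[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
        for i in range(r) for j in range(s)
    )
    return stat < critical_value

# A split whose branches have identical class proportions should be pruned
# (statistic is 0), while a cleanly separating split should be kept.
print(should_prune([[25, 25], [25, 25]], 3.841))  # -> True  (prune)
print(should_prune([[50, 0], [0, 50]], 3.841))    # -> False (keep)
```

Note this sketch assumes every row and column margin is non-zero; production code would guard against empty branches or classes before dividing.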
References:
[1] http://www.maths.manchester.ac.uk/~saralees/pslect8.pdf
[2] https://www.docin.com/p1-2336928230.html