All the Python implementations of 7 post-pruning algorithms are here.
Table of Decision Trees:
Name of tree | Inventor | Article | Year |
---|---|---|---|
ID3 | Ross Quinlan | 《Discovering rules by induction from large collections of examples》 | 1979 |
ID3 | Ross Quinlan | Another origin: 《Learning efficient classification procedures and their application to chess end games》 | 1983a |
CART | Leo Breiman | 《Classification and Regression Trees》 | 1984 |
C4.5 | Ross Quinlan | 《C4.5: Programs for Machine Learning》 | 1993 |
C5.0 | Ross Quinlan | Commercial edition of C4.5, no relevant papers | - |
Table of Pruning Algorithms:
Post-pruning algorithm | Article or book | Year | Inventor | Tree pruned | Remark |
---|---|---|---|---|---|
Pessimistic Pruning | 《Simplifying Decision Trees》 Section 2.3 | 1986b (some sources say 1987b; the date printed on the paper is used here) | Quinlan | C4.5 | Ross Quinlan invented "Pessimistic Pruning"; John Mingers renamed it "Pessimistic Error Pruning" in his article 《An Empirical Comparison of Pruning Methods for Decision Tree Induction》 |
Reduced Error Pruning | 《Simplifying Decision Trees》 Section 2.2 | 1986b | Quinlan | C4.5 | Requires an extra validation set for pruning |
Cost-Complexity Pruning | 《Classification and Regression Trees》 Section 3.3 | 1984 | L. Breiman | CART | For classification trees only |
Error-Complexity Pruning | 《Classification and Regression Trees》 Section 8.5.1 | 1984 | L. Breiman | CART | For regression trees only |
Critical Value Pruning | 《Expert Systems-Rule Induction with Statistical Data》; another claim is 《An Empirical Comparison of Pruning Methods for Decision Tree Induction》, but that paper's author says he cites it from a 1987 paper | 1987a | John Mingers | Not explicitly stated in the paper | There are differing claims about the origin of CVP; the origin used here follows what is mentioned on p. 212 of 《An Empirical Comparison of Pruning Methods for Ensemble Classifiers》 |
Minimum-Error Pruning | 《Learning decision rules in noisy domains》 | 1986 | Niblett and Bratko | - | Cannot be downloaded from the Internet |
Minimum-Error Pruning | 《On Estimating Probabilities in Tree Pruning》 | 1991 | Bojan Cestnik, Ivan Bratko | - | Modified MEP algorithm |
Error-Based Pruning | 《C4.5: Programs for Machine Learning》 Section 4.2 | 1993 | Quinlan | C4.5 | EBP is an evolution of PEP |
Pruning targets for classification trees (ID3, C4.5, CART-classification):
1. Simplify your classification trees without losing too much precision.
2. Improve generalization ability on validation sets.
3. Alleviate overfitting.
Pruning targets for regression trees (CART-regression):
1. Simplify your regression trees without increasing MSE too much (illustrated in the sketch below).
2. Improve generalization ability on validation sets.
3. Alleviate overfitting.
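As a rough illustration of these targets, here is a minimal sketch (assuming scikit-learn and its bundled Iris dataset; the ccp_alpha value is arbitrary) that compares node count and held-out accuracy of an unpruned tree against a pruned one:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Unpruned tree: grown to purity, many nodes, prone to overfitting.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# Pruned tree (via sklearn's cost-complexity parameter, purely for illustration).
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X_train, y_train)

for name, tree in [("unpruned", full), ("pruned", pruned)]:
    print(name, "nodes:", tree.tree_.node_count,
          "validation accuracy:", tree.score(X_val, y_val))
```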
----------------For C4.5-----------------
U25%(1,16) and U25%(1,168) in 《C4.5: Programs for Machine Learning》 (see the confidence-limit sketch at the end of this section)
Differences between the C4.5 in Weka 3.6.10 and Professor Quinlan's C4.5 algorithm
Reproducing the experimental results of Chapter 4 of 《C4.5: Programs for Machine Learning》
Ross Quinlan's handling of missing values in C4.5 Release 8
A detailed interpretation of the relationship between the latest C4.5 release (Release 8) and MDL
Some understanding of 《Inferring Decision Trees Using the Minimum Description Length Principle》
Some understanding of 《Improved Use of Continuous Attributes in C4.5》
Code architecture diagram of C4.5 Release 8
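The U25% values above are upper confidence limits of the binomial error rate that C4.5 uses for its error estimates. A minimal sketch of one way to compute such a limit, using SciPy's beta quantile (the exact binomial-tail relation; Quinlan's own C code uses an approximation, so the numbers may differ slightly):

```python
from scipy.stats import beta

def upper_confidence_limit(errors, n, cf=0.25):
    """U_CF(errors, n): upper limit of the binomial error rate, i.e. the p
    such that P(X <= errors | n, p) = cf, obtained via the beta quantile."""
    if errors >= n:
        return 1.0
    return beta.ppf(1.0 - cf, errors + 1, n - errors)

# Values computed here, in the spirit of the table in Quinlan's book:
print(upper_confidence_limit(0, 1))    # 0.75
print(upper_confidence_limit(1, 16))   # U25%(1, 16)
```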
------------REP-start--------
Detailed explanation of the REP (Reduced Error Pruning) code for ID3 + drawing Figures 4.5, 4.6 and 4.7 of the decision-tree chapter in Zhou Zhihua's 《Machine Learning》
Drawing Figures 4.4 and 4.9 of Zhou Zhihua's 《Machine Learning》 (reprint + added entropy display)
Handling of continuous values in ID3 decision trees + drawing Figures 4.8 and 4.10 of Zhou Zhihua's 《Machine Learning》
sklearn does not implement the ID3 algorithm
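For reference, REP itself is simple: traverse the tree bottom-up and replace a subtree with a leaf whenever that does not increase the error on a separate validation set. A minimal sketch, assuming a hypothetical Node class and rows represented as dicts of attribute values:

```python
class Node:
    """Hypothetical decision-tree node; a node with no children is a leaf."""
    def __init__(self, attribute=None, children=None, majority_class=None):
        self.attribute = attribute                # attribute tested at this node
        self.children = children or {}            # attribute value -> child Node
        self.majority_class = majority_class      # majority class of training rows here

    def is_leaf(self):
        return not self.children

    def predict(self, row):                       # row: dict of attribute -> value
        if self.is_leaf():
            return self.majority_class
        child = self.children.get(row.get(self.attribute))
        return child.predict(row) if child else self.majority_class


def errors(node, rows, labels):
    return sum(node.predict(r) != y for r, y in zip(rows, labels))


def reduced_error_prune(node, val_rows, val_labels):
    """Bottom-up REP: prune the children first (each on the validation rows
    that actually reach it), then try turning this node into a leaf."""
    if node.is_leaf():
        return node
    for value in list(node.children):
        reach = [(r, y) for r, y in zip(val_rows, val_labels)
                 if r.get(node.attribute) == value]
        node.children[value] = reduced_error_prune(
            node.children[value], [r for r, _ in reach], [y for _, y in reach])
    leaf = Node(majority_class=node.majority_class)
    # Replace the subtree with a leaf if that does not increase validation errors.
    if errors(leaf, val_rows, val_labels) <= errors(node, val_rows, val_labels):
        return leaf
    return node
```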
------------REP-end--------
----------------PEP-start-----------------
Earliest PEP Algorithm Principles
Pessimistic Error Pruning example of C4.5
Pessimistic error pruning illustration with C4.5 - Python implementation
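For a subtree T with L leaves and E training errors, PEP uses the pessimistic estimate E + L/2 (the half-error continuity correction) and prunes the subtree to a leaf when the leaf's corrected error is within one standard error of the subtree's. A minimal sketch of that decision rule, with hypothetical counts assumed to be already gathered from the training data:

```python
import math

def pep_should_prune(subtree_errors, n_leaves, leaf_errors, n_cases):
    """Pessimistic Error Pruning decision for one internal node.

    subtree_errors: misclassified training cases under the subtree
    n_leaves:       number of leaves in the subtree
    leaf_errors:    misclassified training cases if the node became a leaf
    n_cases:        training cases reaching the node
    """
    subtree_est = subtree_errors + 0.5 * n_leaves   # continuity-corrected subtree error
    leaf_est = leaf_errors + 0.5                    # continuity-corrected leaf error
    se = math.sqrt(subtree_est * (n_cases - subtree_est) / n_cases)
    return leaf_est <= subtree_est + se

# Hypothetical counts: prune if the single leaf is not significantly worse.
print(pep_should_prune(subtree_errors=2, n_leaves=4, leaf_errors=3, n_cases=30))
```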
----------------PEP-end----------------
-------------EBP-start--------------------
Error Based Pruning: algorithm, code implementation and examples
Some may ask why we do not simply use a Python interface to Weka's J48.
Note that Weka follows Quinlan's C implementation (in other words, J48 is a port of C4.5-Release8). On some datasets, for example the hypo dataset, the decision tree generated by Weka is very large, which is a bad outcome:
the purpose of a decision tree is to aid classification and to produce knowledge,
and a huge tree is clearly hard to use.
J48: the Java edition of C4.5-Release8
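EBP compares, at each internal node, the predicted error of keeping the subtree against the predicted error of replacing it with a leaf, both estimated with the U_CF upper limit shown earlier (the full algorithm also considers grafting the largest branch, omitted here). A minimal sketch, with counts similar to the example in Section 4.2 of the book:

```python
from scipy.stats import beta

def u_cf(errors, n, cf=0.25):
    """Upper confidence limit of the binomial error rate (as in the earlier sketch)."""
    return 1.0 if errors >= n else beta.ppf(1.0 - cf, errors + 1, n - errors)

def ebp_should_prune(n_cases, leaf_errors, subtree_leaves, cf=0.25):
    """Prune to a leaf if its predicted error count is no larger than the subtree's.
    subtree_leaves: (errors, cases) per leaf of the current subtree."""
    subtree_pred = sum(n * u_cf(e, n, cf) for e, n in subtree_leaves)
    leaf_pred = n_cases * u_cf(leaf_errors, n_cases, cf)
    return leaf_pred <= subtree_pred

# A 3-leaf subtree covering 6, 9 and 1 cases with no errors,
# versus a single leaf making 1 error on the same 16 cases.
print(ebp_should_prune(16, 1, [(0, 6), (0, 9), (0, 1)]))  # True: prune
```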
-------------EBP-end--------------------
-------------MEP-start--------------------
Two Examples of Minimum Error Pruning (reprint)
MEP (Minimum Error Pruning) principle with Python implementation
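Cestnik and Bratko's m-estimate version of MEP estimates the expected error of a leaf as (n - n_c + (1 - p_a(c)) * m) / (n + m), where n is the number of cases at the node, n_c the number in the majority class c, p_a(c) its prior probability, and m a tunable parameter; the node is pruned when this static error is no larger than the weighted (backed-up) error of its children. A minimal sketch with hypothetical counts:

```python
def mep_static_error(n, n_majority, prior_majority, m):
    """m-estimate of the expected error if the node is turned into a leaf."""
    return (n - n_majority + (1.0 - prior_majority) * m) / (n + m)

def mep_backed_up_error(children):
    """Weighted sum of the children's (already computed) expected errors.
    children: list of (n_cases_at_child, expected_error_of_child)."""
    total = sum(n for n, _ in children)
    return sum(n / total * err for n, err in children)

# Hypothetical node: 20 cases, 15 in the majority class with prior 0.5, m = 2.
static = mep_static_error(20, 15, 0.5, 2.0)
backed_up = mep_backed_up_error([(12, 0.10), (8, 0.35)])
print(static, backed_up, "prune" if static <= backed_up else "keep")
```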
-----------------MEP-end--------------------
-----------------CVP-start--------------------
Proof of the chi-square statistic used in CVP
CVP (Critical Value Pruning) illustrated in detail, with a clear explanation of the principle
CVP (Critical Value Pruning) examples with Python implementation
Error in a paper about CVP
Contingency table computation in Python and analysis of the experimental results
Chi-square distribution computation in Python
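In the chi-square variant of CVP, the node statistic is computed from the contingency table of attribute values against class labels and compared with a critical value of the chi-square distribution. A minimal sketch with SciPy (the table values are hypothetical):

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Hypothetical contingency table at one node: rows = attribute values, columns = classes.
table = np.array([[20, 5],
                  [4, 21]])

stat, p_value, dof, expected = chi2_contingency(table, correction=False)
critical = chi2.ppf(0.95, dof)   # critical value at significance level 0.05

# In CVP terms: keep (or grow) the split only if its statistic exceeds the critical value.
print(f"chi2 = {stat:.2f}, critical value = {critical:.2f}, "
      f"{'keep split' if stat > critical else 'prune'}")
```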
-----------------CVP-end--------------------
------------------------For CART-------------------------------------------------------
Notes from 《Classification and Regression Trees》
--------CCP-start---------------------
Drawing the decision tree on page 59 of 《统计学习方法》 (Statistical Learning Methods) - sklearn version
CCP (Cost Complexity Pruning) on sklearn with Python implementation
A theoretical defect in selecting the best pruned tree from CCP with cross-validation
Details of the 1-SE rule in CCP pruning of CART
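scikit-learn exposes minimal cost-complexity pruning directly: cost_complexity_pruning_path returns the effective alphas of the pruned-subtree sequence, and refitting with ccp_alpha gives the tree pruned at a chosen alpha. A minimal sketch on a bundled dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(random_state=0)
path = clf.cost_complexity_pruning_path(X_train, y_train)  # alphas of the subtree sequence

# Refit one tree per alpha (subsampled here); each is a member of the CCP sequence.
for alpha in path.ccp_alphas[::5]:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha:.4f} leaves={pruned.get_n_leaves()} "
          f"test accuracy={pruned.score(X_test, y_test):.3f}")
```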
--------CCP-end---------------------
--------ECP-start---------------------
Explaining the difference between model trees and regression trees with an example
Error Complexity Pruning for sklearn's Regression Tree with Python implementation
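For regression trees the same pruning machinery applies with mean squared error as the node cost; scikit-learn's DecisionTreeRegressor supports the same ccp_alpha path, which corresponds to error-complexity pruning. A minimal sketch:

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reg = DecisionTreeRegressor(random_state=0)
path = reg.cost_complexity_pruning_path(X_train, y_train)

# Larger alpha => smaller subtree; pick the alpha balancing size against test MSE.
for alpha in path.ccp_alphas[::10]:
    pruned = DecisionTreeRegressor(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    mse = ((pruned.predict(X_test) - y_test) ** 2).mean()
    print(f"alpha={alpha:.2f} leaves={pruned.get_n_leaves()} test MSE={mse:.1f}")
```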
--------ECP-end---------------------
Note:
Critical Value Pruning can be used both in pre-pruning and in post-pruning.
When pre-pruning, the IM (Information Measure) is replaced with the chi-square statistic.
When post-pruning with the $\chi^2$ test, you need an independent dataset.
Of course, if you grow a tree with CVP, you need not post-prune it with CVP on the same data that was used to grow the tree.
Tables about whether you need extra validation sets when pruning
In the "Reference and Quotation" column of the following table, the word "test" means a set with class labels, so it actually means a "validation dataset".
Pruning Algorithm | Needs extra validation dataset? | Reference and Quotation |
---|---|---|
REP (Reduced Error Pruning) | yes | According to 《An Empirical Comparison of Pruning Methods for Decision Tree Induction》 Section 2.2.4: "The pruned node will often make fewer errors on the test data than the sub-tree makes." |
CCP (Cost Complexity Pruning) | Pruning stage: no. Selecting the best pruned tree: ① (small datasets) cross-validation: yes; ② (large datasets) 1-SE rule: no | According to 《Simplifying Decision Trees》 part 2.1: "Finally, the procedure requires a test set distinct from the original training set" |
ECP (Error Complexity Pruning) | Pruning stage: no. Selecting the best pruned tree: ① (small datasets) cross-validation: yes; ② (large datasets) 1-SE rule: no | According to 《An Empirical Comparison of Pruning Methods for Decision Tree Induction》 part 2.2.1, page 230: "Instead, each of the pruned trees is used to classify an independent test data set" |
CVP (Critical Value Pruning) | pre-pruning: no; post-pruning: yes | 《Expert Systems-Rule Induction with Statistical Data》 (pre-pruning): "Initial runs were performed using chi-square purely as a means of choosing attributes - not of judging their significance - on the two years separately and on the data as a whole." |
MEP (Minimum Error Pruning) | It depends on how you set m | According to 《On Estimating Probabilities in Tree Pruning》 Section 1: "m can be adjusted to some essential properties of the learning domain, such as the level of noise in the learning data." Section 2: "m can be set so as to maximise the classification accuracy on an independent data set" |
PEP (Pessimistic Error Pruning) | no | According to 《Simplifying Decision Trees》 Section 2.3: "Unlike these methods, it does not require a test set separate from the cases in the training set from which the tree was constructed." |
EBP (Error Based Pruning) | no | According to 《C4.5: Programs for Machine Learning》 page 40: "The approach taken in C4.5 belongs to the second family of techniques that use only the training set from which the tree was built." |
Attention:
When your dataset is small, you need cross-validation in CCP/ECP to produce K candidate models; each candidate model is then pruned repeatedly to obtain its pruned-tree sequence (K such sequences in total), and the best pruned tree is finally chosen from among the K sequences.
When your dataset is large, you use the 1-SE rule in CCP/ECP to select the best pruned tree.
In the GitHub link above, we use method ②.
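A minimal sketch of the 1-SE rule, assuming the per-alpha cross-validated error rates and their standard errors have already been computed (the numbers below are hypothetical): among all candidates, take the largest alpha whose error is within one standard error of the minimum.

```python
import numpy as np

def one_se_rule(ccp_alphas, cv_errors, cv_error_ses):
    """Pick the largest alpha (simplest tree) whose CV error is within
    one standard error of the best CV error. Alphas are assumed sorted ascending."""
    ccp_alphas = np.asarray(ccp_alphas)
    cv_errors = np.asarray(cv_errors)
    cv_error_ses = np.asarray(cv_error_ses)
    best = np.argmin(cv_errors)
    threshold = cv_errors[best] + cv_error_ses[best]
    candidates = np.where(cv_errors <= threshold)[0]
    return ccp_alphas[candidates.max()]

# Hypothetical alphas with their cross-validated errors and standard errors.
alphas = [0.0, 0.01, 0.02, 0.05, 0.10]
errors = [0.12, 0.11, 0.115, 0.13, 0.20]
ses    = [0.02, 0.02, 0.02, 0.02, 0.03]
print(one_se_rule(alphas, errors, ses))   # 0.05: simpler tree within 1 SE of the best
```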
Pruning Style Table of Algorithms
Pruning Algorithm | Pruning Style | Reference and Quotation |
---|---|---|
REP (Reduced Error Pruning) | bottom-up | Not mentioned in the earliest article about it, 《Simplifying Decision Trees》 |
CCP (Cost Complexity Pruning) | bottom-up | 《Classification and Regression Trees》, Leo Breiman, Chapter 3.3 MINIMAL COST-COMPLEXITY PRUNING: "Thus, the pruning process produces a finite sequence of subtrees T1, T2, T3, … with progressively fewer terminal nodes" |
ECP (Error Complexity Pruning) | bottom-up | 《Classification and Regression Trees》, Leo Breiman, Chapter 8.5.1: "Now minimal error-complexity pruning is done exactly as minimal cost-complexity pruning in classification." |
CVP (Critical Value Pruning) | top-down in pre-pruning, bottom-up in post-pruning | Pre-pruning: 《Expert Systems-Rule Induction with Statistical Data》: "The ID3 algorithm, with the enhancements mentioned previously, was modified to calculate $\chi^2$ instead of IM". Post-pruning: https://www.cs.rit.edu/~rlaz/prec2010/slides/DecisionTrees.pdf |
MEP (Minimum Error Pruning) | bottom-up | 《The effects of pruning methods on the predictive accuracy of induced decision rules》: "Niblett and Bratko [26] proposed a bottom-up approach for searching a single tree that minimizes the expected error rate." |
PEP (Pessimistic Error Pruning) | top-down | 《Top-Down Induction of Decision Trees Classifiers - A Survey》 part E: "The pessimistic pruning procedure performs top-down traversing over the internal nodes." |
EBP (Error Based Pruning) | bottom-up | 《C4.5: Programs for Machine Learning》 page 39: "Start from the bottom of the tree and examine each nonleaf sub-tree." |
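The pruning style simply describes the order in which internal nodes are visited. A minimal sketch contrasting the two traversal orders on a toy tree (hypothetical Node class):

```python
class Node:
    def __init__(self, name, children=()):
        self.name, self.children = name, list(children)

def top_down(node, visit):
    """Visit a node before its subtrees (as in PEP, or CVP pre-pruning)."""
    visit(node)
    for child in node.children:
        top_down(child, visit)

def bottom_up(node, visit):
    """Visit all subtrees first, then the node itself (as in REP, CCP, ECP, MEP, EBP)."""
    for child in node.children:
        bottom_up(child, visit)
    visit(node)

root = Node("root", [Node("a", [Node("a1"), Node("a2")]), Node("b")])
top_down(root, lambda n: print("top-down:", n.name))
bottom_up(root, lambda n: print("bottom-up:", n.name))
```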
Do the above pruning methods tend to over-prune or under-prune?
According to the following two articles:
《Simplifying Decision Trees by Pruning and Grafting: New Results》
《Top-Down Induction of Decision Trees Classifiers - A Survey》
the results are summarized in the following table:
Pruning Algorithm | Pruning tendency |
---|---|
REP | over-pruning or not significant |
PEP | under-pruning |
EBP | under-pruning |
MEP | under-pruning |
CVP | under-pruning |
CCP | under-pruning (from my own experiment) |
ECP | under-pruning (from my own experiment) |
Markdown table generation tool:
https://tool.lu/tables