All the Python implementations of 7 post-pruning algorithms are here.
Table of Decision Trees:
Name of tree | Inventor | Article | Year |
---|---|---|---|
ID3 | Ross Quinlan | 《Discovering rules by induction from large collections of examples》 | 1979 |
ID3 | Ross Quinlan | Another origin: 《Learning efficient classification procedures and their application to chess end games》 | 1983a |
CART | Leo Breiman | 《Classification and Regression Trees》 | 1984 |
C4.5 | Ross Quinlan | 《C4.5: Programs for Machine Learning》 | 1993 |
C5.0 | Ross Quinlan | Commercial edition of C4.5, no relevant papers | - |
Table of Pruning Algorithms:
Post-pruning algorithm | Article or book | Year | Inventor | Tree pruned | Remark |
---|---|---|---|---|---|
Pessimistic Pruning | 《Simplifying Decision Trees》 Section 2.3 | 1986b (some sources say 1987b; the date printed on the paper is used here) | Quinlan | C4.5 | Ross Quinlan invented "Pessimistic Pruning"; John Mingers renamed it "Pessimistic Error Pruning" in his article 《An Empirical Comparison of Pruning Methods for Decision Tree Induction》 |
Reduced Error Pruning | 《Simplifying Decision Trees》 Section 2.2 | 1986b | Quinlan | C4.5 | Requires an extra validation set for pruning |
Cost-Complexity Pruning | 《Classification and Regression Trees》 Section 3.3 | 1984 | L. Breiman | CART | For classification trees only |
Error-Complexity Pruning | 《Classification and Regression Trees》 Section 8.5.1 | 1984 | L. Breiman | CART | For regression trees only |
Critical Value Pruning | 《Expert Systems-Rule Induction with Statistical Data》; another claim is 《An Empirical Comparison of Pruning Methods for Decision Tree Induction》, but that paper's author says he cites it from a 1987 paper | 1987a | John Mingers | Not explicitly stated in the paper | There are differing claims about the origin of CVP; the origin used here follows what is mentioned on p. 212 of 《An Empirical Comparison of Pruning Methods for Ensemble Classifiers》 |
Minimum-Error Pruning | 《Learning decision rules in noisy domains》 | 1986 | Niblett and Bratko | - | Cannot be downloaded from the Internet |
Minimum-Error Pruning | 《On Estimating Probabilities in Tree Pruning》 | 1991 | Bojan Cestnik, Ivan Bratko | - | Modified MEP algorithm |
Error-Based Pruning | 《C4.5: Programs for Machine Learning》 Section 4.2 | 1993 | Quinlan | C4.5 | EBP is an evolution of PEP |
Pruning targets for classification trees (ID3, C4.5, CART-classification):
1. Simplify your classification trees without losing too much precision.
2. Improve generalization ability on validation sets.
3. Alleviate overfitting.
Pruning targets for regression trees (CART-regression):
1. Simplify your regression trees without increasing MSE too much (illustrated in the sketch below).
2. Improve generalization ability on validation sets.
3. Alleviate overfitting.
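As a rough illustration of these targets, here is a minimal sketch (assuming scikit-learn and its bundled Iris dataset; the ccp_alpha value is arbitrary) that compares node count and held-out accuracy of an unpruned tree against a pruned one:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Unpruned tree: grown to purity, many nodes, prone to overfitting.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# Pruned tree (via sklearn's cost-complexity parameter, purely for illustration).
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X_train, y_train)

for name, tree in [("unpruned", full), ("pruned", pruned)]:
    print(name, "nodes:", tree.tree_.node_count,
          "validation accuracy:", tree.score(X_val, y_val))
```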
----------------For C4.5-----------------
U25%(1,16) and U25%(1,168) in 《C4.5: Programs for Machine Learning》 (see the confidence-limit sketch at the end of this section)
Differences between the C4.5 in Weka 3.6.10 and Professor Quinlan's C4.5 algorithm
Reproducing the experimental results of Chapter 4 of 《C4.5: Programs for Machine Learning》
Ross Quinlan's handling of missing values in C4.5 Release 8
A detailed interpretation of the relationship between the latest C4.5 release (Release 8) and MDL
Some understanding of 《Inferring Decision Trees Using the Minimum Description Length Principle》
Some understanding of 《Improved Use of Continuous Attributes in C4.5》
Code architecture diagram of C4.5 Release 8
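The U25% values above are upper confidence limits of the binomial error rate that C4.5 uses for its error estimates. A minimal sketch of one way to compute such a limit, using SciPy's beta quantile (the exact binomial-tail relation; Quinlan's own C code uses an approximation, so the numbers may differ slightly):

```python
from scipy.stats import beta

def upper_confidence_limit(errors, n, cf=0.25):
    """U_CF(errors, n): upper limit of the binomial error rate, i.e. the p
    such that P(X <= errors | n, p) = cf, obtained via the beta quantile."""
    if errors >= n:
        return 1.0
    return beta.ppf(1.0 - cf, errors + 1, n - errors)

# Values computed here, in the spirit of the table in Quinlan's book:
print(upper_confidence_limit(0, 1))    # 0.75
print(upper_confidence_limit(1, 16))   # U25%(1, 16)
```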
------------REP-start--------
Detailed explanation of the REP (Reduced Error Pruning) code for ID3 + drawing Figures 4.5, 4.6 and 4.7 of the decision-tree chapter in Zhou Zhihua's 《Machine Learning》
Drawing Figures 4.4 and 4.9 of Zhou Zhihua's 《Machine Learning》 (reprint + added entropy display)
Handling of continuous values in ID3 decision trees + drawing Figures 4.8 and 4.10 of Zhou Zhihua's 《Machine Learning》
sklearn does not implement the ID3 algorithm
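For reference, REP itself is simple: traverse the tree bottom-up and replace a subtree with a leaf whenever that does not increase the error on a separate validation set. A minimal sketch, assuming a hypothetical Node class and rows represented as dicts of attribute values:

```python
class Node:
    """Hypothetical decision-tree node; a node with no children is a leaf."""
    def __init__(self, attribute=None, children=None, majority_class=None):
        self.attribute = attribute                # attribute tested at this node
        self.children = children or {}            # attribute value -> child Node
        self.majority_class = majority_class      # majority class of training rows here

    def is_leaf(self):
        return not self.children

    def predict(self, row):                       # row: dict of attribute -> value
        if self.is_leaf():
            return self.majority_class
        child = self.children.get(row.get(self.attribute))
        return child.predict(row) if child else self.majority_class


def errors(node, rows, labels):
    return sum(node.predict(r) != y for r, y in zip(rows, labels))


def reduced_error_prune(node, val_rows, val_labels):
    """Bottom-up REP: prune the children first (each on the validation rows
    that actually reach it), then try turning this node into a leaf."""
    if node.is_leaf():
        return node
    for value in list(node.children):
        reach = [(r, y) for r, y in zip(val_rows, val_labels)
                 if r.get(node.attribute) == value]
        node.children[value] = reduced_error_prune(
            node.children[value], [r for r, _ in reach], [y for _, y in reach])
    leaf = Node(majority_class=node.majority_class)
    # Replace the subtree with a leaf if that does not increase validation errors.
    if errors(leaf, val_rows, val_labels) <= errors(node, val_rows, val_labels):
        return leaf
    return node
```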
------------REP-end--------
----------------PEP-start-----------------
Earliest PEP Algorithm Principles
Pessimistic Error Pruning example of C4.5
Pessimistic error pruning illustration with C4.5 - Python implementation
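For a subtree T with L leaves and E training errors, PEP uses the pessimistic estimate E + L/2 (the half-error continuity correction) and prunes the subtree to a leaf when the leaf's corrected error is within one standard error of the subtree's. A minimal sketch of that decision rule, with hypothetical counts assumed to be already gathered from the training data:

```python
import math

def pep_should_prune(subtree_errors, n_leaves, leaf_errors, n_cases):
    """Pessimistic Error Pruning decision for one internal node.

    subtree_errors: misclassified training cases under the subtree
    n_leaves:       number of leaves in the subtree
    leaf_errors:    misclassified training cases if the node became a leaf
    n_cases:        training cases reaching the node
    """
    subtree_est = subtree_errors + 0.5 * n_leaves   # continuity-corrected subtree error
    leaf_est = leaf_errors + 0.5                    # continuity-corrected leaf error
    se = math.sqrt(subtree_est * (n_cases - subtree_est) / n_cases)
    return leaf_est <= subtree_est + se

# Hypothetical counts: prune if the single leaf is not significantly worse.
print(pep_should_prune(subtree_errors=2, n_leaves=4, leaf_errors=3, n_cases=30))
```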
----------------PEP-end----------------
-------------EBP-start--------------------
Error Based Pruning: algorithm, code implementation and examples
Some may ask why we do not simply use a Python interface to Weka's J48.
Note that Weka follows Quinlan's C implementation (in other words, J48 is a port of C4.5-Release8). On some datasets, for example the hypo dataset, the decision tree generated by Weka is very large, which is a bad outcome:
the purpose of a decision tree is to aid classification and to produce knowledge,
and a huge tree is clearly hard to use.
J48: the Java edition of C4.5-Release8
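EBP compares, at each internal node, the predicted error of keeping the subtree against the predicted error of replacing it with a leaf, both estimated with the U_CF upper limit shown earlier (the full algorithm also considers grafting the largest branch, omitted here). A minimal sketch, with counts similar to the example in Section 4.2 of the book:

```python
from scipy.stats import beta

def u_cf(errors, n, cf=0.25):
    """Upper confidence limit of the binomial error rate (as in the earlier sketch)."""
    return 1.0 if errors >= n else beta.ppf(1.0 - cf, errors + 1, n - errors)

def ebp_should_prune(n_cases, leaf_errors, subtree_leaves, cf=0.25):
    """Prune to a leaf if its predicted error count is no larger than the subtree's.
    subtree_leaves: (errors, cases) per leaf of the current subtree."""
    subtree_pred = sum(n * u_cf(e, n, cf) for e, n in subtree_leaves)
    leaf_pred = n_cases * u_cf(leaf_errors, n_cases, cf)
    return leaf_pred <= subtree_pred

# A 3-leaf subtree covering 6, 9 and 1 cases with no errors,
# versus a single leaf making 1 error on the same 16 cases.
print(ebp_should_prune(16, 1, [(0, 6), (0, 9), (0, 1)]))  # True: prune
```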
-------------EBP-end--------------------
-------------MEP-start--------------------
Two Examples of Minimum Error Pruning (reprint)
MEP (Minimum Error Pruning) principle with Python implementation
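Cestnik and Bratko's m-estimate version of MEP estimates the expected error of a leaf as (n - n_c + (1 - p_a(c)) * m) / (n + m), where n is the number of cases at the node, n_c the number in the majority class c, p_a(c) its prior probability, and m a tunable parameter; the node is pruned when this static error is no larger than the weighted (backed-up) error of its children. A minimal sketch with hypothetical counts:

```python
def mep_static_error(n, n_majority, prior_majority, m):
    """m-estimate of the expected error if the node is turned into a leaf."""
    return (n - n_majority + (1.0 - prior_majority) * m) / (n + m)

def mep_backed_up_error(children):
    """Weighted sum of the children's (already computed) expected errors.
    children: list of (n_cases_at_child, expected_error_of_child)."""
    total = sum(n for n, _ in children)
    return sum(n / total * err for n, err in children)

# Hypothetical node: 20 cases, 15 in the majority class with prior 0.5, m = 2.
static = mep_static_error(20, 15, 0.5, 2.0)
backed_up = mep_backed_up_error([(12, 0.10), (8, 0.35)])
print(static, backed_up, "prune" if static <= backed_up else "keep")
```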
-----------------MEP-end--------------------
-----------------CVP-start--------------------
Proof of the chi-square statistic used in CVP
CVP (Critical Value Pruning) illustrated in detail, with a clear explanation of the principle
CVP (Critical Value Pruning) examples with Python implementation
Error in a paper about CVP
Contingency table computation in Python and analysis of the experimental results
Chi-square distribution computation in Python
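In the chi-square variant of CVP, the node statistic is computed from the contingency table of attribute values against class labels and compared with a critical value of the chi-square distribution. A minimal sketch with SciPy (the table values are hypothetical):

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Hypothetical contingency table at one node: rows = attribute values, columns = classes.
table = np.array([[20, 5],
                  [4, 21]])

stat, p_value, dof, expected = chi2_contingency(table, correction=False)
critical = chi2.ppf(0.95, dof)   # critical value at significance level 0.05

# In CVP terms: keep (or grow) the split only if its statistic exceeds the critical value.
print(f"chi2 = {stat:.2f}, critical value = {critical:.2f}, "
      f"{'keep split' if stat > critical else 'prune'}")
```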
-----------------CVP-end--------------------
------------------------For CART-------------------------------------------------------
Notes from 《Classification and Regression Trees》
--------CCP-start---------------------
Drawing the decision tree on page 59 of 《统计学习方法》 (Statistical Learning Methods) - sklearn version
CCP (Cost Complexity Pruning) on sklearn with Python implementation
A theoretical defect in selecting the best pruned tree from CCP with cross-validation
Details of the 1-SE rule in CCP pruning of CART
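scikit-learn exposes minimal cost-complexity pruning directly: cost_complexity_pruning_path returns the effective alphas of the pruned-subtree sequence, and refitting with ccp_alpha gives the tree pruned at a chosen alpha. A minimal sketch on a bundled dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(random_state=0)
path = clf.cost_complexity_pruning_path(X_train, y_train)  # alphas of the subtree sequence

# Refit one tree per alpha (subsampled here); each is a member of the CCP sequence.
for alpha in path.ccp_alphas[::5]:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha:.4f} leaves={pruned.get_n_leaves()} "
          f"test accuracy={pruned.score(X_test, y_test):.3f}")
```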
--------CCP-end---------------------
--------ECP-start---------------------
Explaining the difference between model trees and regression trees with an example
Error Complexity Pruning for sklearn's Regression Tree with Python implementation
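For regression trees the same pruning machinery applies with mean squared error as the node cost; scikit-learn's DecisionTreeRegressor supports the same ccp_alpha path, which corresponds to error-complexity pruning. A minimal sketch:

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reg = DecisionTreeRegressor(random_state=0)
path = reg.cost_complexity_pruning_path(X_train, y_train)

# Larger alpha => smaller subtree; pick the alpha balancing size against test MSE.
for alpha in path.ccp_alphas[::10]:
    pruned = DecisionTreeRegressor(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    mse = ((pruned.predict(X_test) - y_test) ** 2).mean()
    print(f"alpha={alpha:.2f} leaves={pruned.get_n_leaves()} test MSE={mse:.1f}")
```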
--------ECP-end---------------------
Note:
Critical Value Pruning can be used both in pre-pruning and in post-pruning.
When pre-pruning, the IM (Information Measure) is replaced with the chi-square statistic.
When post-pruning with the $\chi^2$ test, you need an independent dataset.
Of course, if you grow a tree with CVP, you need not post-prune it with CVP on the same data that was used to grow the tree.
Tables about whether you need extra validation sets when pruning
In the "Reference and Quotation" column of the following table, the word "test" means a set with class labels, so it actually means a "validation dataset".
Pruning Algorithm | Needs extra validation dataset? | Reference and Quotation |
---|---|---|
REP (Reduced Error Pruning) | yes | According to 《An Empirical Comparison of Pruning Methods for Decision Tree Induction》 Section 2.2.4: "The pruned node will often make fewer errors on the test data than the sub-tree makes." |
CCP (Cost Complexity Pruning) | Pruning stage: no. Selecting the best pruned tree: ① (small datasets) cross-validation: yes; ② (large datasets) 1-SE rule: no | According to 《Simplifying Decision Trees》 part 2.1: "Finally, the procedure requires a test set distinct from the original training set" |
ECP (Error Complexity Pruning) | Pruning stage: no. Selecting the best pruned tree: ① (small datasets) cross-validation: yes; ② (large datasets) 1-SE rule: no | According to 《An Empirical Comparison of Pruning Methods for Decision Tree Induction》 part 2.2.1, page 230: "Instead, each of the pruned trees is used to classify an independent test data set" |
CVP (Critical Value Pruning) | pre-pruning: no; post-pruning: yes | 《Expert Systems-Rule Induction with Statistical Data》 (pre-pruning): "Initial runs were performed using chi-square purely as a means of choosing attributes - not of judging their significance - on the two years separately and on the data as a whole." |
MEP (Minimum Error Pruning) | It depends on how you set m | According to 《On Estimating Probabilities in Tree Pruning》 Section 1: "m can be adjusted to some essential properties of the learning domain, such as the level of noise in the learning data." Section 2: "m can be set so as to maximise the classification accuracy on an independent data set" |
PEP (Pessimistic Error Pruning) | no | According to 《Simplifying Decision Trees》 Section 2.3: "Unlike these methods, it does not require a test set separate from the cases in the training set from which the tree was constructed." |
EBP (Error Based Pruning) | no | According to 《C4.5: Programs for Machine Learning》 page 40: "The approach taken in C4.5 belongs to the second family of techniques that use only the training set from which the tree was built." |
Attention:
When your dataset is small, you need cross-validation in CCP/ECP to produce K candidate models; each candidate model is then pruned repeatedly to obtain its pruned-tree sequence (K such sequences in total), and the best pruned tree is finally chosen from among the K sequences.
When your dataset is large, you use the 1-SE rule in CCP/ECP to select the best pruned tree.
In the GitHub link above, we use method ②.
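A minimal sketch of the 1-SE rule, assuming the per-alpha cross-validated error rates and their standard errors have already been computed (the numbers below are hypothetical): among all candidates, take the largest alpha whose error is within one standard error of the minimum.

```python
import numpy as np

def one_se_rule(ccp_alphas, cv_errors, cv_error_ses):
    """Pick the largest alpha (simplest tree) whose CV error is within
    one standard error of the best CV error. Alphas are assumed sorted ascending."""
    ccp_alphas = np.asarray(ccp_alphas)
    cv_errors = np.asarray(cv_errors)
    cv_error_ses = np.asarray(cv_error_ses)
    best = np.argmin(cv_errors)
    threshold = cv_errors[best] + cv_error_ses[best]
    candidates = np.where(cv_errors <= threshold)[0]
    return ccp_alphas[candidates.max()]

# Hypothetical alphas with their cross-validated errors and standard errors.
alphas = [0.0, 0.01, 0.02, 0.05, 0.10]
errors = [0.12, 0.11, 0.115, 0.13, 0.20]
ses    = [0.02, 0.02, 0.02, 0.02, 0.03]
print(one_se_rule(alphas, errors, ses))   # 0.05: simpler tree within 1 SE of the best
```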
Pruning Style Table of Algorithms
Pruning Algorithm | Pruning Style | Reference and Quotation |
---|---|---|
REP (Reduced Error Pruning) | bottom-up | Not mentioned in the earliest article about it, 《Simplifying Decision Trees》 |
CCP (Cost Complexity Pruning) | bottom-up | 《Classification and Regression Trees》, Leo Breiman, Chapter 3.3 MINIMAL COST-COMPLEXITY PRUNING: "Thus, the pruning process produces a finite sequence of subtrees T1, T2, T3, … with progressively fewer terminal nodes" |
ECP (Error Complexity Pruning) | bottom-up | 《Classification and Regression Trees》, Leo Breiman, Chapter 8.5.1: "Now minimal error-complexity pruning is done exactly as minimal cost-complexity pruning in classification." |
CVP (Critical Value Pruning) | top-down in pre-pruning, bottom-up in post-pruning | Pre-pruning: 《Expert Systems-Rule Induction with Statistical Data》: "The ID3 algorithm, with the enhancements mentioned previously, was modified to calculate $\chi^2$ instead of IM". Post-pruning: https://www.cs.rit.edu/~rlaz/prec2010/slides/DecisionTrees.pdf |
MEP (Minimum Error Pruning) | bottom-up | 《The effects of pruning methods on the predictive accuracy of induced decision rules》: "Niblett and Bratko [26] proposed a bottom-up approach for searching a single tree that minimizes the expected error rate." |
PEP (Pessimistic Error Pruning) | top-down | 《Top-Down Induction of Decision Trees Classifiers - A Survey》 part E: "The pessimistic pruning procedure performs top-down traversing over the internal nodes." |
EBP (Error Based Pruning) | bottom-up | 《C4.5: Programs for Machine Learning》 page 39: "Start from the bottom of the tree and examine each nonleaf sub-tree." |
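The pruning style simply describes the order in which internal nodes are visited. A minimal sketch contrasting the two traversal orders on a toy tree (hypothetical Node class):

```python
class Node:
    def __init__(self, name, children=()):
        self.name, self.children = name, list(children)

def top_down(node, visit):
    """Visit a node before its subtrees (as in PEP, or CVP pre-pruning)."""
    visit(node)
    for child in node.children:
        top_down(child, visit)

def bottom_up(node, visit):
    """Visit all subtrees first, then the node itself (as in REP, CCP, ECP, MEP, EBP)."""
    for child in node.children:
        bottom_up(child, visit)
    visit(node)

root = Node("root", [Node("a", [Node("a1"), Node("a2")]), Node("b")])
top_down(root, lambda n: print("top-down:", n.name))
bottom_up(root, lambda n: print("bottom-up:", n.name))
```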
Do the above pruning methods tend to over-prune or under-prune?
According to the following two articles:
《Simplifying Decision Trees by Pruning and Grafting: New Results》
《Top-Down Induction of Decision Trees Classifiers - A Survey》
the results are summarized in the following table:
Pruning Algorithm | Pruning tendency |
---|---|
REP | over-pruning or not significant |
PEP | under-pruning |
EBP | under-pruning |
MEP | under-pruning |
CVP | under-pruning |
CCP | under-pruning (from my own experiment) |
ECP | under-pruning (from my own experiment) |
Markdown table generation tool:
https://tool.lu/tables