According to "Estimating Probabilities: A Crucial Task in Machine Learning":

$$\Theta=p(C)\,\frac{p(C|V_1)}{p(C)}\,\frac{p(C|V_1V_2)}{p(C|V_1)}\,\frac{p(C|V_1V_2V_3)}{p(C|V_1V_2)}\cdots\quad ①$$

where
$$h(i)=\frac{p(C|V_i)}{p(C)}$$
$$p(C|V_i)=\frac{n(CV_i)}{n(V_i)}$$
If $n(V_i)=0$, then $h(i)=1$.
If $n(V_i)>0$ and $n(CV_i)=0$, then $h(i)=0$,
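The two degenerate cases above can be sketched in a few lines of Python (a minimal illustration; the function and argument names are mine, not from the paper):

```python
def h(n_cv, n_v, p_c):
    """Evidence factor h(i) = p(C|V_i) / p(C), computed from raw counts.

    n_cv: n(C V_i), examples with value V_i that belong to class C
    n_v:  n(V_i),   examples with value V_i
    p_c:  prior probability p(C)
    """
    if n_v == 0:       # no evidence for this value at all -> neutral factor
        return 1.0
    if n_cv == 0:      # value observed, but never together with class C
        return 0.0     # this single zero forces the whole product to zero
    return (n_cv / n_v) / p_c

# One zero factor wipes out Theta, no matter what the other factors are:
theta = 0.3 * h(5, 10, 0.3) * h(0, 4, 0.3) * h(9, 10, 0.3)   # -> 0.0
```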
which makes the estimate unreliable: once $h(i)=0$, then $\Theta=0$ no matter how the other factors in ① vary.
To solve this problem, the Beta distribution is used.
According to "On Estimating Probabilities in Tree Pruning":
There are three stages:
① tree-construction stage: m = 0
② tree-pruning stage: m > 0
③ classification phase: a new, different m
In summary, each stage needs a different value of m.
$$E_s=1-\frac{n_e+p_{ae}\cdot m}{N+m}=\frac{N-n_e+(1-p_{ae})\cdot m}{N+m}$$
$N$: total number of examples that reach the node.
$n_e$: number of examples in the class $c$ that minimises $E_s$ for the given $m$.
$p_{ae}$: a-priori probability of class $c$.
$m$: the parameter of the estimation method.
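With these definitions in place, the static error is a one-liner (a sketch; the function name is my own):

```python
def static_error(N, n_e, p_ae, m):
    """m-estimate of the static error at a node:
    E_s = 1 - (n_e + p_ae * m) / (N + m)
    """
    return 1.0 - (n_e + p_ae * m) / (N + m)

# e.g. a node with 10 examples, 7 in the majority class, prior 0.5, m = 2:
e = static_error(N=10, n_e=7, p_ae=0.5, m=2)   # 1 - 8/12 = 1/3
```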
The backed-up error is:

$$E_b=\sum_{i=1}^{\text{count of all sub-trees}}p_i\cdot E_i$$
$E_i$ refers to the $i$-th sub-tree's static error.
The criterion to prune a tree is:

$$E_b\ge E_s$$
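Putting the static error and the backed-up error together, the pruning decision can be sketched recursively. This is my own illustration, not the repo's actual code: I assume a node is a dict with hypothetical keys `counts` (per-class example counts) and `children` (a list of (weight, subtree) pairs):

```python
def mep_error(node, priors, m=2):
    """Return the node's error after MEP, pruning in place when E_b >= E_s."""
    N = sum(node["counts"].values())
    # static error E_s: take the class that minimises it
    e_s = min(1.0 - (n_c + priors[c] * m) / (N + m)
              for c, n_c in node["counts"].items())
    if not node.get("children"):          # leaf: only the static error exists
        return e_s
    # backed-up error E_b: weighted sum of the sub-trees' errors
    e_b = sum(p_i * mep_error(child, priors, m)
              for p_i, child in node["children"])
    if e_b >= e_s:                        # pruning criterion: E_b >= E_s
        node["children"] = []             # replace the sub-tree by a leaf
        return e_s
    return e_b
```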
More details from the MEP author's replies:
Datasets and a Python implementation of MEP are both included in my GitHub:
https://github.com/appleyuchi/Decision_Tree_Prune
*******************************************************
We use the following settings by default:
m = 2
$p_{ae}$ = a-priori probability of each class
*******************************************************
If you prefer Laplace's Law of Succession, mentioned on page 139 of "On Estimating Probabilities in Tree Pruning", just set:
m = number of classes in the dataset
$$p_{ae}=\frac{1}{m}$$
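Both parameter choices plug into the same m-estimate formula. A quick check (my own illustration, with a hypothetical 3-class dataset) that m = number of classes with $p_{ae}=1/m$ reproduces Laplace's $(n_c+1)/(N+k)$:

```python
def m_estimate(n_c, N, p_ae, m):
    # generic m-estimate of the class probability at a node
    return (n_c + p_ae * m) / (N + m)

k = 3  # number of classes in the dataset (assumed for this example)
# Laplace's Law of Succession as a special case of the m-estimate:
lap = m_estimate(n_c=5, N=10, p_ae=1.0 / k, m=k)
assert abs(lap - (5 + 1) / (10 + k)) < 1e-12   # (n_c + 1) / (N + k)
```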
*******************************************************
Attention:
① MEP was proposed on the basis of ID3, but I decided to implement it on C4.5.
② Although "On Estimating Probabilities in Tree Pruning" says you need to set m = 0 when building your decision model, the original C4.5 model here is created with http://www.rulequest.com/Personal/c4.5r8.tar.gz, so we do NOT need to set m.
③ Although the paper says you need to set m when testing, I did not set m in my testing, because I use the most common testing mechanism of C4.5 instead of the mechanism described in the paper.
Now let's perform our first MEP (Minimum Error Pruning) experiment with the abalone dataset.
First, to make visualization easier, I reorder the dataset by the last column, choose the first 200 items,
and save them as abalone_parts.data (you can find this file in my GitHub).
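This preprocessing step can be reproduced with a short stdlib script (a sketch; I assume the standard UCI abalone layout, where the integer ring count is the last column):

```python
import csv

def make_parts(src="abalone.data", dst="abalone_parts.data", n=200):
    """Sort rows by the last column (ring count) and keep the first n."""
    with open(src) as f:
        rows = [r for r in csv.reader(f) if r]
    rows.sort(key=lambda r: int(r[-1]))        # reorder by the last column
    with open(dst, "w", newline="") as f:
        csv.writer(f).writerows(rows[:n])
    return len(rows[:n])
```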
The C4.5 model before MEP pruning:
model= {‘Viscera’: {’<=0.0145’: {‘Shucked’: {’>0.007’: ’ 4 (66.0/31.0)’, ‘<=0.007’: {‘Shucked’: {’<=0.0045’: {‘Height’: {’<=0.025’: ’ 1 (2.0/1.0)’, ‘>0.025’: ’ 3 (2.0)’}}, ‘>0.0045’: {‘Shucked’: {’>0.005’: {‘Height’: {’<=0.02’: ’ 4 (2.0)’, ‘>0.02’: ’ 3 (4.0)’}}, ‘<=0.005’: ’ 4 (3.0)’}}}}}}, ‘>0.0145’: {‘Shell’: {’<=0.0345’: {‘Viscera’: {’<=0.0285’: ’ 5 (50.0/9.0)’, ‘>0.0285’: ’ 4 (3.0)’}}, ‘>0.0345’: {‘Sex’: {’=M’: ’ 6 (6.0/3.0)’, ‘=F’: ’ 5 (3.0)’, ‘=I’: ’ 5 (59.0/12.0)’}}}}}}
The C4.5 model after MEP pruning:
model_pruned= {‘Viscera’: {’>0.0145’: {‘Shell’: {’<=0.0345’: {‘Viscera’: {’<=0.0285’: ’ 5 (50.0/9.0)’, ‘>0.0285’: ’ 4 (3.0)’}}, ‘>0.0345’: ‘5 (68/16)’}}, ‘<=0.0145’: {‘Shucked’: {’>0.007’: ’ 4 (66.0/31.0)’, ‘<=0.007’: {‘Shucked’: {’>0.0045’: {‘Shucked’: {’>0.005’: {‘Height’: {’<=0.02’: ’ 4 (2.0)’, ‘>0.02’: ’ 3 (4.0)’}}, ‘<=0.005’: ’ 4 (3.0)’}}, ‘<=0.0045’: {‘Height’: {’<=0.025’: ’ 1 (2.0/1.0)’, ‘>0.025’: ’ 3 (2.0)’}}}}}}}}
visualization of unpruned model:
visualization of MEP_pruned model:
The accuracy of the MEP algorithm on the 200 items of the abalone dataset:
accuracy_unprune= 0.72
accuracy_prune= 0.715
The accuracy of the EBP algorithm on the 200 items of the abalone dataset:
Evaluation on training data (200 items):
Before Pruning After Pruning
---------------- ---------------------------
Size Errors Size Errors Estimate
20 56(28.0%) 17 57(28.5%) (36.1%) <<
Let's put the above results in the following table:

pruning method | unpruned accuracy | pruned accuracy
---|---|---
MEP | 72% | 71.5%
EBP | 72% | 71.5%
Now let's do the second MEP experiment with the credit-a dataset.
Credit-a is from UCI (you can also find it via the GitHub link above).
The unpruned C4.5 model generated with http://www.rulequest.com/Personal/c4.5r8.tar.gz is:
model= {‘A9’: {’=t’: {‘A15’: {’>228’: ’ + (106.0/2.0)’, ‘<=228’: {‘A11’: {’>3’: {‘A15’: {’<=4’: ’ + (25.0)’, ‘>4’: {‘A15’: {’<=5’: ’ - (2.0)’, ‘>5’: {‘A7’: {’=v’: ’ + (5.0)’, ‘=bb’: ’ + (1.0)’, ‘=ff’: ’ + (0.0)’, ‘=j’: ’ + (0.0)’, ‘=o’: ’ + (0.0)’, ‘=n’: ’ + (0.0)’, ‘=h’: ’ + (3.0)’, ‘=dd’: ’ + (0.0)’, ‘=z’: ’ - (1.0)’}}}}}}, ‘<=3’: {‘A4’: {’=u’: {‘A7’: {’=v’: {‘A14’: {’<=110’: ’ + (18.0/1.0)’, ‘>110’: {‘A15’: {’>8’: ’ + (4.0)’, ‘<=8’: {‘A6’: {’=w’: {‘A12’: {’=t’: ’ - (2.0)’, ‘=f’: ’ + (3.0)’}}, ‘=q’: {‘A12’: {’=t’: ’ + (4.0)’, ‘=f’: ’ - (2.0)’}}, ‘=ff’: ’ - (0.0)’, ‘=r’: ’ - (0.0)’, ‘=x’: ’ - (0.0)’, ‘=e’: ’ - (0.0)’, ‘=d’: ’ - (2.0)’, ‘=c’: ’ - (4.0/1.0)’, ‘=m’: {‘A13’: {’=g’: ’ + (2.0)’, ‘=p’: ’ - (0.0)’, ‘=s’: ’ - (5.0)’}}, ‘=i’: ’ - (0.0)’, ‘=k’: ’ - (2.0)’, ‘=j’: ’ - (0.0)’, ‘=aa’: {‘A2’: {’<=41’: ’ - (3.0)’, ‘>41’: ’ + (2.0)’}}, ‘=cc’: ’ + (2.0/1.0)’}}}}}}, ‘=dd’: ’ + (0.0)’, ‘=ff’: ’ - (1.0)’, ‘=j’: ’ - (1.0)’, ‘=o’: ’ + (0.0)’, ‘=n’: ’ + (0.0)’, ‘=h’: ’ + (18.0)’, ‘=bb’: {‘A14’: {’<=164’: ’ + (3.4/0.4)’, ‘>164’: ’ - (5.6)’}}, ‘=z’: ’ + (1.0)’}}, ‘=l’: ’ + (0.0)’, ‘=y’: {‘A13’: {’=g’: {‘A14’: {’<=204’: ’ - (16.0/1.0)’, ‘>204’: ’ + (5.0/1.0)’}}, ‘=p’: ’ - (0.0)’, ‘=s’: ’ + (2.0)’}}, ‘=t’: ’ + (0.0)’}}}}}}, ‘=f’: {‘A13’: {’=g’: ’ - (204.0/10.0)’, ‘=p’: {‘A2’: {’<=36’: ’ - (4.0/1.0)’, ‘>36’: ’ + (2.0)’}}, ‘=s’: {‘A4’: {’=u’: {‘A6’: {’=w’: ’ - (0.0)’, ‘=q’: ’ - (1.0)’, ‘=ff’: ’ - (2.0)’, ‘=r’: ’ - (0.0)’, ‘=x’: ’ + (1.0)’, ‘=e’: ’ - (0.0)’, ‘=d’: ’ - (2.0)’, ‘=c’: ’ - (3.0)’, ‘=m’: ’ - (3.0)’, ‘=i’: ’ - (3.0)’, ‘=k’: ’ - (4.0)’, ‘=j’: ’ - (0.0)’, ‘=aa’: ’ - (0.0)’, ‘=cc’: ’ - (1.0)’}}, ‘=l’: ’ + (1.0)’, ‘=y’: ’ - (8.0/1.0)’, ‘=t’: ’ - (0.0)’}}}}}}
After EBP (invented by Quinlan) pruning, the model is:
{‘A9’: {’=t’: {‘A15’: {’>228’: ’ + (106.0/3.8)’, ‘<=228’: {‘A11’: {’>3’: {‘A15’: {’>4’: {‘A15’: {’<=5’: ’ - (2.0/1.0)’, ‘>5’: ’ + (10.0/2.4)’}}, ‘<=4’: ’ + (25.0/1.3)’}}, ‘<=3’: {‘A4’: {’=u’: {‘A7’: {’=v’: {‘A14’: {’<=110’: ’ + (18.0/2.5)’, ‘>110’: {‘A15’: {’>8’: ’ + (4.0/1.2)’, ‘<=8’: {‘A6’: {’=aa’: {‘A2’: {’<=41’: ’ - (3.0/1.1)’, ‘>41’: ’ + (2.0/1.0)’}}, ‘=w’: {‘A12’: {’=t’: ’ - (2.0/1.0)’, ‘=f’: ’ + (3.0/1.1)’}}, ‘=q’: {‘A12’: {’=t’: ’ + (4.0/1.2)’, ‘=f’: ’ - (2.0/1.0)’}}, ‘=ff’: ’ - (0.0)’, ‘=r’: ’ - (0.0)’, ‘=i’: ’ - (0.0)’, ‘=x’: ’ - (0.0)’, ‘=e’: ’ - (0.0)’, ‘=d’: ’ - (2.0/1.0)’, ‘=c’: ’ - (4.0/2.2)’, ‘=m’: {‘A13’: {’=g’: ’ + (2.0/1.0)’, ‘=p’: ’ - (0.0)’, ‘=s’: ’ - (5.0/1.2)’}}, ‘=cc’: ’ + (2.0/1.8)’, ‘=k’: ’ - (2.0/1.0)’, ‘=j’: ’ - (0.0)’}}}}}}, ‘=z’: ’ + (1.0/0.8)’, ‘=bb’: {‘A14’: {’<=164’: ’ + (3.4/1.5)’, ‘>164’: ’ - (5.6/1.2)’}}, ‘=ff’: ’ - (1.0/0.8)’, ‘=o’: ’ + (0.0)’, ‘=n’: ’ + (0.0)’, ‘=h’: ’ + (18.0/1.3)’, ‘=dd’: ’ + (0.0)’, ‘=j’: ’ - (1.0/0.8)’}}, ‘=l’: ’ + (0.0)’, ‘=y’: {‘A13’: {’=g’: {‘A14’: {’<=204’: ’ - (16.0/2.5)’, ‘>204’: ’ + (5.0/2.3)’}}, ‘=p’: ’ - (0.0)’, ‘=s’: ’ + (2.0/1.0)’}}, ‘=t’: ’ + (0.0)’}}}}}}, ‘=f’: ’ - (239.0/19.4)’}}
After MEP pruning, the model is:
model_pruned= {‘A9’: {’=t’: {‘A15’: {’>228’: ’ + (106.0/2.0)’, ‘<=228’: {‘A11’: {’>3’: {‘A15’: {’>4’: {‘A15’: {’<=5’: ’ - (2.0)’, ‘>5’: ‘+ (10/1)’}}, ‘<=4’: ’ + (25.0)’}}, ‘<=3’: {‘A4’: {’=u’: {‘A7’: {’=v’: {‘A14’: {’<=110’: ’ + (18.0/1.0)’, ‘>110’: {‘A15’: {’>8’: ’ + (4.0)’, ‘<=8’: {‘A6’: {’=aa’: ‘+ (8/3)’, ‘=w’: {‘A12’: {’=t’: ’ - (2.0)’, ‘=f’: ’ + (3.0)’}}, ‘=q’: ‘+ (12/2)’, ‘=c’: ’ - (4.0/1.0)’, ‘=r’: ’ - (0.0)’, ‘=cc’: ’ + (2.0/1.0)’, ‘=x’: ’ - (0.0)’, ‘=e’: ’ - (0.0)’, ‘=d’: ’ - (2.0)’, ‘=ff’: ’ - (0.0)’, ‘=m’: {‘A13’: {’=g’: ’ + (2.0)’, ‘=p’: ’ - (0.0)’, ‘=s’: ’ - (5.0)’}}, ‘=i’: ’ - (0.0)’, ‘=k’: ’ - (2.0)’, ‘=j’: ’ - (0.0)’}}}}}}, ‘=z’: ’ + (1.0)’, ‘=bb’: ‘- (9/3)’, ‘=ff’: ’ - (1.0)’, ‘=o’: ’ + (0.0)’, ‘=n’: ’ + (0.0)’, ‘=h’: ’ + (18.0)’, ‘=dd’: ’ + (0.0)’, ‘=j’: ’ - (1.0)’}}, ‘=l’: ’ + (0.0)’, ‘=y’: {‘A13’: {’=g’: ‘- (21/5)’, ‘=p’: ’ - (0.0)’, ‘=s’: ’ + (2.0)’}}, ‘=t’: ’ + (0.0)’}}}}}}, ‘=f’: ‘- (239/16)’}}
Visualization of the above unpruned model:
Visualization of the above MEP-pruned model:
accuracy_unprune= 0.961224489796
accuracy_prune= 0.928571428571
The EBP-pruned result is:
Evaluation on training data (490 items):
Before Pruning After Pruning
---------------- ---------------------------
Size Errors Size Errors Estimate
90 19( 3.9%) 58 24( 4.9%) (11.9%) <<
Let's put the above results in the following table:

pruning method | unpruned accuracy | pruned accuracy | simplicity (model size after pruning)
---|---|---|---
MEP | 0.961224489796 | 0.928571428571 | 10.5 lines
EBP | 0.961 | 0.951 | 14 lines

Note: different editors use different line lengths, so the 10.5 lines and 14 lines were counted in the CSDN blog Markdown editor.
We can see that:
In terms of simplicity, MEP wins and EBP loses.
In terms of accuracy, EBP wins and MEP loses.
Summary:
MEP aims to simplify your decision trees without losing too much accuracy.