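Supervised Learning
[Decision tree] Using information gain, build a decision tree for the dataset below and describe the process. The data are used to decide which contact lenses to fit. The dataset has 4 attributes: age, astigmatism, and tear-prod-rate are the input features, and contact-lenses is the decision attribute.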
- From the table above, there are 12 training samples in total and $|\mathcal{Y}|=3$ classes: soft has 2 samples (IDs {1,5}), hard has 3 (IDs {3,6,12}), and none has 7 (IDs {2,4,7,8,9,10,11}). The information entropy of the root node is therefore:

$$H(D)=-\sum_{k=1}^{3} p_k \log_2 p_k=-\left(\frac{2}{12}\log_2\frac{2}{12}+\frac{3}{12}\log_2\frac{3}{12}+\frac{7}{12}\log_2\frac{7}{12}\right)=1.384$$
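The root entropy can be cross-checked numerically. The snippet below is a minimal sketch, not part of the original solution; the helper name `entropy` is an assumption introduced for illustration, and it takes raw class counts rather than probabilities.

```python
from math import log2

def entropy(counts):
    """Shannon entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    # p * log2(1/p) equals -p * log2(p); classes with zero samples are skipped,
    # which matches the usual convention 0 * log 0 = 0.
    return sum(c / total * log2(total / c) for c in counts if c > 0)

# Class counts at the root: soft = 2, hard = 3, none = 7.
root_entropy = entropy([2, 3, 7])
print(round(root_entropy, 3))  # 1.384
```

The same helper returns 0 for a pure node such as `entropy([4])`, which is why pure subsets drop out of the weighted sums later on.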
- The age feature splits the data into 3 subsets, $D^1$ (young), $D^2$ (pre-presbyopic), and $D^3$ (presbyopic), with $|D^1|=3$, $|D^2|=5$, $|D^3|=4$. In $D^1$, soft, none, and hard each account for $\frac{1}{3}$ of the samples; the proportions in $D^2$ and $D^3$ are obtained the same way. The information entropies of the 3 age nodes are:

$$H(D^1)=-\left(\frac{1}{3}\log_2\frac{1}{3}+\frac{1}{3}\log_2\frac{1}{3}+\frac{1}{3}\log_2\frac{1}{3}\right)=1.585$$

$$H(D^2)=-\left(\frac{1}{5}\log_2\frac{1}{5}+\frac{1}{5}\log_2\frac{1}{5}+\frac{3}{5}\log_2\frac{3}{5}\right)=1.371$$

$$H(D^3)=-\left(\frac{1}{4}\log_2\frac{1}{4}+\frac{3}{4}\log_2\frac{3}{4}\right)=0.811$$
- The information gain of the age attribute is:

$$G(D,\text{age})=H(D)-\sum_{v=1}^{3}\frac{|D^v|}{|D|}\cdot H(D^v)=1.384-\left(\frac{3}{12}\cdot 1.585+\frac{5}{12}\cdot 1.371+\frac{4}{12}\cdot 0.811\right)=1.384-1.238=0.146$$
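As a sanity check on the age computation, here is a small sketch that evaluates the gain formula directly from class counts. The function names `entropy` and `information_gain` are assumptions made for this example. Because the text rounds each intermediate entropy to three decimals, the unrounded result can differ from 0.146 in the last digit.

```python
from math import log2

def entropy(counts):
    """Shannon entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    return sum(c / total * log2(total / c) for c in counts if c > 0)

def information_gain(parent_counts, subsets):
    """H(parent) minus the size-weighted entropy of the subsets."""
    total = sum(parent_counts)
    conditional = sum(sum(s) / total * entropy(s) for s in subsets)
    return entropy(parent_counts) - conditional

# Counts are ordered (soft, hard, none), taken from the worked solution.
age_subsets = [
    (1, 1, 1),  # young
    (1, 1, 3),  # pre-presbyopic
    (0, 1, 3),  # presbyopic
]
gain_age = information_gain((2, 3, 7), age_subsets)
print(round(gain_age, 3))
```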
- Similarly, the astigmatism feature splits the data into two subsets, $D^1$ (yes) and $D^2$ (no). $D^1$ contains 7 samples (3 hard, 4 none) and $D^2$ contains 5 samples (2 soft, 3 none), so the information gain is:

$$\begin{aligned}
G(D,\text{astigmatism})&=H(D)-\sum_{v=1}^{2}\frac{|D^v|}{|D|}\cdot H(D^v)\\
&=1.384-\left(\frac{7}{12}\left(-\left(\frac{3}{7}\log_2\frac{3}{7}+\frac{4}{7}\log_2\frac{4}{7}\right)\right)+\frac{5}{12}\left(-\left(\frac{2}{5}\log_2\frac{2}{5}+\frac{3}{5}\log_2\frac{3}{5}\right)\right)\right)\\
&=1.384-0.979=0.405
\end{aligned}$$
- Similarly, for tear-production-rate:

$$G(D,\text{tear-production-rate})=1.384-1.041=0.343$$
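The value 1.041 is stated without its intermediate steps. The following derivation is a reconstruction under one assumption: the class counts per tear-production-rate value are obtained by adding the reduced and normal rows of the two per-node tables given later in this solution (reduced: 4 none; normal: 2 soft, 3 hard, 3 none).

$$\begin{aligned}
H(D^{\text{reduced}})&=0\\
H(D^{\text{normal}})&=-\left(\frac{2}{8}\log_2\frac{2}{8}+\frac{3}{8}\log_2\frac{3}{8}+\frac{3}{8}\log_2\frac{3}{8}\right)=1.561\\
\sum_{v}\frac{|D^v|}{|D|}\cdot H(D^v)&=\frac{4}{12}\cdot 0+\frac{8}{12}\cdot 1.561=1.041
\end{aligned}$$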
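- From the above, astigmatism has the largest information gain (0.405, versus 0.343 for tear-production-rate and 0.146 for age), so it is chosen as the splitting attribute. The tree after the first split: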
- Next, consider splitting the branch nodes further.
- For the yes node, the class counts per feature value are:

| age | soft | hard | none | sum |
|---|---|---|---|---|
| young | 0 | 1 | 1 | 2 |
| pre-presbyopic | 0 | 1 | 2 | 3 |
| presbyopic | 0 | 1 | 1 | 2 |

| tear-production-rate | soft | hard | none | sum |
|---|---|---|---|---|
| reduced | 0 | 0 | 2 | 2 |
| normal | 0 | 3 | 2 | 5 |

- As with the choice of the first node, compute the information gain of each remaining attribute:

$$G(D^1,\text{age})=0.985-0.965=0.02$$

$$G(D^1,\text{tear-production-rate})=0.985-0.694=0.291$$
- Therefore, at the yes node, tear-production-rate is chosen as the attribute for the second split.
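The three numbers used for the yes node (0.985, 0.965, and 0.694) follow from the two count tables for that node. The steps below are a reconstruction from those counts rather than part of the original solution:

$$\begin{aligned}
H(D^1)&=-\left(\frac{3}{7}\log_2\frac{3}{7}+\frac{4}{7}\log_2\frac{4}{7}\right)=0.985\\
\text{age:}\quad&\frac{2}{7}\cdot 1+\frac{3}{7}\cdot\left(-\left(\frac{1}{3}\log_2\frac{1}{3}+\frac{2}{3}\log_2\frac{2}{3}\right)\right)+\frac{2}{7}\cdot 1=\frac{2}{7}+\frac{3}{7}\cdot 0.918+\frac{2}{7}=0.965\\
\text{tear-production-rate:}\quad&\frac{2}{7}\cdot 0+\frac{5}{7}\cdot\left(-\left(\frac{3}{5}\log_2\frac{3}{5}+\frac{2}{5}\log_2\frac{2}{5}\right)\right)=\frac{5}{7}\cdot 0.971=0.694
\end{aligned}$$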
- Similarly, for the no node, the class counts per feature value are:

| age | soft | hard | none | sum |
|---|---|---|---|---|
| young | 1 | 0 | 0 | 1 |
| pre-presbyopic | 1 | 0 | 1 | 2 |
| presbyopic | 0 | 0 | 2 | 2 |

| tear-production-rate | soft | hard | none | sum |
|---|---|---|---|---|
| reduced | 0 | 0 | 2 | 2 |
| normal | 2 | 0 | 1 | 3 |

- Continuing the computation gives:

$$G(D^2,\text{age})=0.971-0.4=0.571$$

$$G(D^2,\text{tear-production-rate})=0.971-0.551=0.42$$
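- Therefore, at the no node, age is chosen as the attribute for the second split. Splitting the branch nodes one level further requires no selection: there are only 3 input features, so each non-leaf node on the third level has exactly one unused attribute left.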
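The second-level choices for both branches can be reproduced from the count tables alone. The sketch below is illustrative only; `entropy`, `gain`, `best_attribute`, and the dictionary layout are assumptions introduced here, with each row holding (soft, hard, none) counts copied from the tables for the yes and no nodes.

```python
from math import log2

def entropy(counts):
    """Shannon entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    return sum(c / total * log2(total / c) for c in counts if c > 0)

def gain(table):
    """Information gain of one attribute from its {value: (soft, hard, none)} table."""
    rows = list(table.values())
    parent = [sum(col) for col in zip(*rows)]  # class counts at the node itself
    n = sum(parent)
    return entropy(parent) - sum(sum(r) / n * entropy(r) for r in rows)

def best_attribute(tables):
    """Name of the attribute with the largest information gain."""
    return max(tables, key=lambda name: gain(tables[name]))

yes_node = {
    "age": {"young": (0, 1, 1), "pre-presbyopic": (0, 1, 2), "presbyopic": (0, 1, 1)},
    "tear-production-rate": {"reduced": (0, 0, 2), "normal": (0, 3, 2)},
}
no_node = {
    "age": {"young": (1, 0, 0), "pre-presbyopic": (1, 0, 1), "presbyopic": (0, 0, 2)},
    "tear-production-rate": {"reduced": (0, 0, 2), "normal": (2, 0, 1)},
}

yes_choice = best_attribute(yes_node)
no_choice = best_attribute(no_node)
print(yes_choice, no_choice)  # tear-production-rate age
```

Working from count tables keeps the check independent of the raw sample rows, but it also means this sketch cannot build the full tree: that would need the per-sample records.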
- Putting everything together, we obtain the following decision tree: