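- [Decision tree] Using information gain, build a decision tree for the dataset below and describe the process.
The dataset records a contact-lens fitting decision. It has 4 attributes: age, astigmatism, and trear-prod-rate are the input features, and contact-lenses is the decision (target) attribute.
The information gain obtained by splitting dataset $D$ on attribute $a$ is: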
$$
G(D,a)=H(D)-\sum_{v=1}^{V}\frac{|D^v|}{|D|}\cdot H(D^v)
$$
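It follows that the smaller the entropy after the split, the larger the information gain, so the attribute $a$ that maximizes the information gain on the current sample set is chosen as the current split attribute. In the calculations below, $\log$ is base 2, and $H(D)_{a}$ denotes the weighted entropy after splitting on $a$, i.e. the sum that is subtracted from $H(D)$ in the formula above.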
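The definition above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `entropy` and `information_gain` and the toy data are assumptions made for the example and are not part of the dataset used in this exercise.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(D), in bits, of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """G(D, a) = H(D) - sum_v |D^v| / |D| * H(D^v) for one attribute column."""
    n = len(labels)
    subsets = {}
    for v, y in zip(values, labels):
        subsets.setdefault(v, []).append(y)  # D^v: labels of rows where a == v
    conditional = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - conditional


# Toy data (hypothetical): one attribute separates the classes, one does not.
labels = ["yes", "yes", "no", "no"]
perfect = ["a", "a", "b", "b"]
useless = ["a", "b", "a", "b"]
print(information_gain(perfect, labels))  # gain of one full bit
print(information_gain(useless, labels))  # no gain at all
```

The attribute with the larger return value is the one the rule above would pick as the split attribute.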
- First split
  - age

| age \ contact-lenses | none | soft | hard | sum |
| --- | --- | --- | --- | --- |
| young | 1 | 1 | 1 | 3 |
| pre-presbyopic | 3 | 1 | 1 | 5 |
| presbyopic | 3 | 0 | 1 | 4 |

$D^1$ (young), $D^2$ (pre-presbyopic), $D^3$ (presbyopic)
$$
\begin{aligned}
H(D^1) &= -\left(\frac{1}{3}\log\frac{1}{3}+\frac{1}{3}\log\frac{1}{3}+\frac{1}{3}\log\frac{1}{3}\right)=1.58\\
H(D^2) &= -\left(\frac{3}{5}\log\frac{3}{5}+\frac{1}{5}\log\frac{1}{5}+\frac{1}{5}\log\frac{1}{5}\right)=1.37\\
H(D^3) &= -\left(\frac{3}{4}\log\frac{3}{4}+\frac{1}{4}\log\frac{1}{4}\right)=0.81\\
H(D)_{\text{age}} &= \frac{3}{12}H(D^1)+\frac{5}{12}H(D^2)+\frac{4}{12}H(D^3)=1.24
\end{aligned}
$$

  - astigmatism

| astigmatism \ contact-lenses | none | soft | hard | sum |
| --- | --- | --- | --- | --- |
| yes | 4 | 0 | 3 | 7 |
| no | 3 | 2 | 0 | 5 |

$D^1$ (yes), $D^2$ (no)
$$
\begin{aligned}
H(D^1) &= -\left(\frac{4}{7}\log\frac{4}{7}+\frac{3}{7}\log\frac{3}{7}\right)=0.99\\
H(D^2) &= -\left(\frac{3}{5}\log\frac{3}{5}+\frac{2}{5}\log\frac{2}{5}\right)=0.97\\
H(D)_{\text{astigmatism}} &= \frac{7}{12}H(D^1)+\frac{5}{12}H(D^2)=0.98
\end{aligned}
$$

  - trear-prod-rate

| trear-prod-rate \ contact-lenses | none | soft | hard | sum |
| --- | --- | --- | --- | --- |
| normal | 3 | 2 | 3 | 8 |
| reduced | 4 | 0 | 0 | 4 |

$D^1$ (normal), $D^2$ (reduced)
$$
\begin{aligned}
H(D^1) &= -\left(\frac{3}{8}\log\frac{3}{8}+\frac{2}{8}\log\frac{2}{8}+\frac{3}{8}\log\frac{3}{8}\right)=1.56\\
H(D^2) &= -\frac{4}{4}\log\frac{4}{4}=0\\
H(D)_{\text{trear-prod-rate}} &= \frac{8}{12}H(D^1)+\frac{4}{12}H(D^2)=1.04
\end{aligned}
$$
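Splitting on astigmatism gives the smallest post-split entropy (0.98, versus 1.04 for trear-prod-rate and 1.24 for age) and therefore the largest information gain, so astigmatism is chosen as the attribute for the first split.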
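The comparison above is made on the post-split entropies. For completeness, the gains themselves can be worked out from the class totals in the tables (7 none, 2 soft, 3 hard out of 12 samples), again with base-2 logarithms. The three-decimal figures below are my own arithmetic from those counts, not values stated in the original notes.

$$
\begin{aligned}
H(D) &= -\left(\frac{7}{12}\log\frac{7}{12}+\frac{2}{12}\log\frac{2}{12}+\frac{3}{12}\log\frac{3}{12}\right)\approx 1.384\\
G(D,\text{age}) &= H(D)-H(D)_{\text{age}}\approx 0.147\\
G(D,\text{astigmatism}) &= H(D)-H(D)_{\text{astigmatism}}\approx 0.405\\
G(D,\text{trear-prod-rate}) &= H(D)-H(D)_{\text{trear-prod-rate}}\approx 0.344
\end{aligned}
$$

Because $H(D)$ is the same in all three lines, ranking by largest gain and ranking by smallest post-split entropy give the same order.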
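Decision after the first split: the root node tests astigmatism; the yes branch leads to $D^1$ and the no branch leads to $D^2$.
- Second split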
Splitting $D^1$ (astigmatism = yes)

| ID | age | trear-prod-rate | contact-lenses |
| --- | --- | --- | --- |
| 2 | young | reduced | none |
| 3 | young | normal | hard |
| 6 | pre-presbyopic | normal | hard |
| 7 | pre-presbyopic | normal | none |
| 8 | pre-presbyopic | normal | none |
| 11 | presbyopic | reduced | none |
| 12 | presbyopic | normal | hard |

  - age

| age \ contact-lenses | none | soft | hard | sum |
| --- | --- | --- | --- | --- |
| young | 1 | 0 | 1 | 2 |
| pre-presbyopic | 2 | 0 | 1 | 3 |
| presbyopic | 1 | 0 | 1 | 2 |

$D^{11}$ (young), $D^{12}$ (pre-presbyopic), $D^{13}$ (presbyopic)
$$
\begin{aligned}
H(D^{11}) &= -\left(\frac{1}{2}\log\frac{1}{2}+\frac{1}{2}\log\frac{1}{2}\right)=1\\
H(D^{12}) &= -\left(\frac{2}{3}\log\frac{2}{3}+\frac{1}{3}\log\frac{1}{3}\right)=0.92\\
H(D^{13}) &= -\left(\frac{1}{2}\log\frac{1}{2}+\frac{1}{2}\log\frac{1}{2}\right)=1\\
H(D)_{\text{age}} &= \frac{2}{7}H(D^{11})+\frac{3}{7}H(D^{12})+\frac{2}{7}H(D^{13})=0.97
\end{aligned}
$$

  - trear-prod-rate

| trear-prod-rate \ contact-lenses | none | soft | hard | sum |
| --- | --- | --- | --- | --- |
| normal | 2 | 0 | 3 | 5 |
| reduced | 2 | 0 | 0 | 2 |

$D^{11}$ (normal), $D^{12}$ (reduced)
$$
\begin{aligned}
H(D^{11}) &= -\left(\frac{2}{5}\log\frac{2}{5}+\frac{3}{5}\log\frac{3}{5}\right)=0.97\\
H(D^{12}) &= -\frac{2}{2}\log\frac{2}{2}=0\\
H(D)_{\text{trear-prod-rate}} &= \frac{5}{7}H(D^{11})+\frac{2}{7}H(D^{12})=0.69
\end{aligned}
$$
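Splitting on trear-prod-rate gives the smallest post-split entropy (0.69, versus 0.97 for age) and therefore the largest information gain, so trear-prod-rate is chosen as the attribute for splitting $D^1$.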
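As a cross-check of the hand calculation for $D^1$, the post-split entropies can be recomputed from the seven rows listed in the table for this subset. This is a small sketch: the helper names `entropy` and `conditional_entropy` and the tuple layout are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

# Rows of D^1 (astigmatism = yes), copied from the ID table for this subset:
# (ID, age, trear-prod-rate, contact-lenses)
D1 = [
    (2, "young", "reduced", "none"),
    (3, "young", "normal", "hard"),
    (6, "pre-presbyopic", "normal", "hard"),
    (7, "pre-presbyopic", "normal", "none"),
    (8, "pre-presbyopic", "normal", "none"),
    (11, "presbyopic", "reduced", "none"),
    (12, "presbyopic", "normal", "hard"),
]


def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(rows, col):
    """Weighted entropy of the class label after splitting rows on column col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row[-1])  # last field is contact-lenses
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())


h_age = conditional_entropy(D1, 1)
h_tear = conditional_entropy(D1, 2)
print(f"age: {h_age:.3f}  trear-prod-rate: {h_tear:.3f}")
```

Computed without rounding the intermediate entropies, the values come out at roughly 0.965 for age and 0.694 for trear-prod-rate, which matches the hand-rounded 0.97 and 0.69 and leads to the same choice.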
Splitting $D^2$ (astigmatism = no)

| ID | age | trear-prod-rate | contact-lenses |
| --- | --- | --- | --- |
| 1 | young | normal | soft |
| 4 | pre-presbyopic | reduced | none |
| 5 | pre-presbyopic | normal | soft |
| 9 | presbyopic | reduced | none |
| 10 | presbyopic | normal | none |

  - age

| age \ contact-lenses | none | soft | hard | sum |
| --- | --- | --- | --- | --- |
| young | 0 | 1 | 0 | 1 |
| pre-presbyopic | 1 | 1 | 0 | 2 |
| presbyopic | 2 | 0 | 0 | 2 |

$D^{21}$ (young), $D^{22}$ (pre-presbyopic), $D^{23}$ (presbyopic)
$$
\begin{aligned}
H(D^{21}) &= -\frac{1}{1}\log\frac{1}{1}=0\\
H(D^{22}) &= -\left(\frac{1}{2}\log\frac{1}{2}+\frac{1}{2}\log\frac{1}{2}\right)=1\\
H(D^{23}) &= -\frac{2}{2}\log\frac{2}{2}=0\\
H(D)_{\text{age}} &= \frac{1}{5}H(D^{21})+\frac{2}{5}H(D^{22})+\frac{2}{5}H(D^{23})=0.4
\end{aligned}
$$

  - trear-prod-rate

| trear-prod-rate \ contact-lenses | none | soft | hard | sum |
| --- | --- | --- | --- | --- |
| normal | 1 | 2 | 0 | 3 |
| reduced | 2 | 0 | 0 | 2 |

$D^{21}$ (normal), $D^{22}$ (reduced)
$$
\begin{aligned}
H(D^{21}) &= -\left(\frac{1}{3}\log\frac{1}{3}+\frac{2}{3}\log\frac{2}{3}\right)=0.92\\
H(D^{22}) &= -\frac{2}{2}\log\frac{2}{2}=0\\
H(D)_{\text{trear-prod-rate}} &= \frac{3}{5}H(D^{21})+\frac{2}{5}H(D^{22})=0.55
\end{aligned}
$$
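Splitting on age gives the smallest post-split entropy (0.4, versus 0.55 for trear-prod-rate) and therefore the largest information gain, so age is chosen as the attribute for splitting $D^2$.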
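The whole procedure can also be sketched as a small recursive, ID3-style routine. Several things here are assumptions rather than part of the original notes: the 12-row table is reconstructed by merging the two subset tables (astigmatism = yes for the $D^1$ rows, no for the $D^2$ rows), the recursion continues past the second split, and a node with no attributes left is labeled by majority vote.

```python
import math
from collections import Counter

ATTRS = ["age", "astigmatism", "trear-prod-rate"]
# (age, astigmatism, trear-prod-rate, contact-lenses), IDs 1..12 in order.
ROWS = [
    ("young", "no", "normal", "soft"),            # 1
    ("young", "yes", "reduced", "none"),          # 2
    ("young", "yes", "normal", "hard"),           # 3
    ("pre-presbyopic", "no", "reduced", "none"),  # 4
    ("pre-presbyopic", "no", "normal", "soft"),   # 5
    ("pre-presbyopic", "yes", "normal", "hard"),  # 6
    ("pre-presbyopic", "yes", "normal", "none"),  # 7
    ("pre-presbyopic", "yes", "normal", "none"),  # 8
    ("presbyopic", "no", "reduced", "none"),      # 9
    ("presbyopic", "no", "normal", "none"),       # 10
    ("presbyopic", "yes", "reduced", "none"),     # 11
    ("presbyopic", "yes", "normal", "hard"),      # 12
]
DATA = [dict(zip(ATTRS + ["label"], r)) for r in ROWS]


def entropy(rows):
    """Entropy in bits of the class label over a list of row dicts."""
    n = len(rows)
    counts = Counter(r["label"] for r in rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def split(rows, attr):
    """Group rows by their value of attr."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r)
    return groups


def gain(rows, attr):
    """Information gain of splitting rows on attr."""
    parts = split(rows, attr).values()
    return entropy(rows) - sum(len(p) / len(rows) * entropy(p) for p in parts)


def build(rows, attrs):
    """Return a leaf label, or {attribute: {value: subtree}}."""
    labels = Counter(r["label"] for r in rows)
    if len(labels) == 1 or not attrs:
        return labels.most_common(1)[0][0]  # pure node, or majority vote
    best = max(attrs, key=lambda a: gain(rows, a))
    rest = [a for a in attrs if a != best]
    return {best: {v: build(sub, rest) for v, sub in split(rows, best).items()}}


tree = build(DATA, ATTRS)
print(tree)
```

On this data the sketch reproduces the choices made by hand: astigmatism at the root, trear-prod-rate under the yes branch, and age under the no branch. Rows 6, 7 and 8 share the same feature values but not the same label, so that leaf cannot be made pure and falls back to the majority class.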
-