E(S)=(-9/15)log2(9/15)-(6/15)log2(6/15)=0.971
Values(Income Range)={20-30K, 30-40K, 40-50K, 50-60K}
E(S(20-30K))= (-2/4)log2(2/4)- (2/4)log2(2/4)=1
E(S(30-40K))= (-4/5)log2(4/5)- (1/5)log2(1/5)=0.7219
E(S(40-50K))= (-1/4)log2(1/4)- (3/4)log2(3/4)=0.8113
E(S(50-60K))= (-2/2)log2(2/2)- (0/2)log2(0/2)=0 (taking 0·log2(0)=0 by convention)
Therefore
E(S, Income Range)=(4/15) E(S(20-30K)) +(5/15) E(S(30-40K)) +(4/15) E(S(40-50K)) +(2/15) E(S(50-60K))=0.7236
Gain(S, Income Range)=0.971-0.7236=0.2474
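The calculation above can be checked with a short Python sketch. The function names entropy and information_gain and the dictionary income_partitions are hypothetical, introduced only for illustration. The class counts per income range are read off the fractions in the formulas above, and 0·log2(0) is treated as 0.

```python
from math import log2

def entropy(counts):
    """Base-2 entropy of a list of class counts.

    Zero counts are skipped, which applies the convention 0*log2(0) = 0.
    """
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, partitions):
    """Entropy of the parent set minus the size-weighted entropy of its partitions."""
    total = sum(parent_counts)
    weighted = sum(sum(p) / total * entropy(p) for p in partitions)
    return entropy(parent_counts) - weighted

# Class counts taken from the fractions in the worked example (assumed order: [class A, class B]).
parent = [9, 6]
income_partitions = {
    "20-30K": [2, 2],
    "30-40K": [4, 1],
    "40-50K": [1, 3],
    "50-60K": [2, 0],
}

e_parent = entropy(parent)
gain_income = information_gain(parent, list(income_partitions.values()))

print(f"E(S) = {e_parent:.3f}")
print(f"Gain(S, Income Range) = {gain_income:.4f}")
```

Computed without intermediate rounding, the gain comes out near 0.2473. The hand calculation gives 0.2474 because it subtracts the already rounded values 0.971 and 0.7236.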
Similarly, the information gain for "Insurance", "Gender", and "Age" is computed as follows:
E(S)=(-9/15)log2(9/15)-(6/15)log2(6/15)=0.971
Values(Insurance)={yes, no}
E(S(yes))= (-3/3)log2 (3/3)- (0/3)log2(0/3)=0
E(S(no))= (-6/12)log2 (6/12)- (6/12)log2(6/12)=1
E(S, Insurance)=(3/15) E(S(yes)) +(12/15) E(S(no)) =0.8
Gain(S, Insurance)=0.971-0