Decision Trees, Weka, and Information Gain


summerbell



Category: Data Mining | Tags: Data Structures and Algorithms


We start with the classic play-tennis example.

Day	Outlook	Temperature	Humidity	Wind	Play Tennis
1	sunny	hot	high	weak	no
2	sunny	hot	high	strong	no
3	overcast	hot	high	weak	yes
4	rain	mild	high	weak	yes
5	rain	cool	normal	weak	yes
6	rain	cool	normal	strong	no
7	overcast	cool	normal	strong	yes
8	sunny	mild	high	weak	no
9	sunny	cool	normal	weak	yes
10	rain	mild	normal	weak	yes
11	sunny	mild	normal	strong	yes
12	overcast	mild	high	strong	yes
13	overcast	hot	normal	weak	yes
14	rain	mild	high	strong	no

The dataset contains 14 samples: 9 positive (yes) and 5 negative (no). The expected information (i.e., the entropy) of these tuples is therefore:

Info(D) = - 9/14 * log₂(9/14) - 5/14 * log₂(5/14) = 0.940
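As a minimal sketch (in Python; the function name `entropy` is our own choice, not something from the original post or from Weka), the formula can be evaluated directly from the class counts:

```python
import math


def entropy(counts):
    """Info(D) = -sum(p_i * log2(p_i)) over class counts; zero counts contribute nothing."""
    total = sum(counts)
    info = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            info -= p * math.log2(p)
    return info


# 9 "yes" and 5 "no" samples, as in the play-tennis table
print(f"{entropy([9, 5]):.3f}")  # 0.940
```

Skipping zero counts implements the usual convention 0 * log₂(0) = 0, which matters for pure partitions such as one containing only "yes" samples.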

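Next, consider the expected information requirement for each attribute. For Outlook: sunny has 2 positive and 3 negative samples; overcast has 4 positive and 0 negative samples; rain has 3 positive and 2 negative samples.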
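A small sketch of how these per-value counts can be tallied from the table. It assumes each row is reduced to an (Outlook, Play Tennis) pair; `ROWS` and `class_counts_by_value` are illustrative names, not part of the original:

```python
from collections import Counter, defaultdict

# (Outlook, Play Tennis) pairs for days 1-14 of the play-tennis table
ROWS = [
    ("sunny", "no"), ("sunny", "no"), ("overcast", "yes"), ("rain", "yes"),
    ("rain", "yes"), ("rain", "no"), ("overcast", "yes"), ("sunny", "no"),
    ("sunny", "yes"), ("rain", "yes"), ("sunny", "yes"), ("overcast", "yes"),
    ("overcast", "yes"), ("rain", "no"),
]


def class_counts_by_value(rows):
    """Map each attribute value to a Counter of the class labels seen with it."""
    counts = defaultdict(Counter)
    for value, label in rows:
        counts[value][label] += 1
    return counts


counts = class_counts_by_value(ROWS)
for value in ("sunny", "overcast", "rain"):
    # A Counter returns 0 for a missing label, e.g. "no" under overcast
    print(value, counts[value]["yes"], counts[value]["no"])
# sunny 2 3
# overcast 4 0
# rain 3 2
```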

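The expected information needed to classify a sample after partitioning the data on Outlook is:

Info_Outlook(D) = 5/14 * (-2/5 * log₂(2/5) - 3/5 * log₂(3/5)) + 4/14 * (-4/4 * log₂(4/4)) + 5/14 * (-3/5 * log₂(3/5) - 2/5 * log₂(2/5)) = 0.694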
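The weighted sum above can be sketched as follows. `expected_info` is a hypothetical helper that takes the (yes, no) counts of each partition and weights each partition's entropy by its share of the samples:

```python
import math


def entropy(counts):
    """Entropy of a class distribution given as raw counts (0 * log2(0) treated as 0)."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def expected_info(partitions):
    """Info_A(D) = sum over partitions D_j of |D_j|/|D| * Info(D_j)."""
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * entropy(p) for p in partitions)


# (yes, no) counts for Outlook = sunny, overcast, rain
outlook_partitions = [(2, 3), (4, 0), (3, 2)]
print(f"{expected_info(outlook_partitions):.3f}")  # 0.694
```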

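The information gain of Outlook is therefore:

Gain(Outlook) = 0.940 - 0.694 = 0.246

Computing the expected information of the other attributes in the same way gives:

Gain(Temperature) = 0.029

Gain(Humidity) = 0.151

Gain(Wind) = 0.048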
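A self-contained sketch that reproduces all four gains from the raw table. Encoding the table as tuples and the helper names `entropy` and `info_gain` are assumptions made for illustration; this is plain information gain as described in the text, not Weka's implementation:

```python
import math
from collections import Counter

ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Wind"]
# Play-tennis table rows: (Outlook, Temperature, Humidity, Wind, Play Tennis)
DATA = [
    ("sunny", "hot", "high", "weak", "no"),
    ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]


def entropy(rows):
    """Entropy of the class label (last column) over a list of rows."""
    total = len(rows)
    counts = Counter(row[-1] for row in rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def info_gain(rows, attr_index):
    """Gain(A) = Info(D) - sum(|D_v|/|D| * Info(D_v)) over the values v of attribute A."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attr_index], []).append(row)
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return entropy(rows) - remainder


gains = {name: info_gain(DATA, i) for i, name in enumerate(ATTRIBUTES)}
for name, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"Gain({name}) = {gain:.3f}")
```

Without intermediate rounding this prints roughly 0.247 for Outlook, 0.152 for Humidity, 0.048 for Wind and 0.029 for Temperature, so Outlook ranks first either way. The 0.246 and 0.151 in the text differ only in the third decimal place, presumably because they were computed from already-rounded intermediate values.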

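Outlook has the highest information gain, so it becomes the root of the tree. Repeating the information-gain computation on the samples in each branch eventually yields the following decision tree:

Outlook = sunny
|   Humidity = high: no
|   Humidity = normal: yes
Outlook = overcast: yes
Outlook = rain
|   Wind = strong: no
|   Wind = weak: yes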
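The "repeat on each branch" procedure can be sketched as a recursive ID3 builder. This is a minimal illustration under stated assumptions: nominal attributes only, no pruning, a majority vote when no attributes remain, and a nested-dict tree representation chosen here for convenience. Weka's J48 is a C4.5 implementation that uses gain ratio and pruning, so this sketch is not Weka's algorithm:

```python
import math
from collections import Counter

ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Wind"]
# Play-tennis table rows: (Outlook, Temperature, Humidity, Wind, Play Tennis)
DATA = [
    ("sunny", "hot", "high", "weak", "no"), ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"), ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"), ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"), ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"), ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"), ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"), ("rain", "mild", "high", "strong", "no"),
]


def entropy(rows):
    """Entropy of the class label (last column)."""
    counts = Counter(row[-1] for row in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows)) for c in counts.values())


def info_gain(rows, idx):
    """Info(D) minus the weighted entropy of the partitions on column idx."""
    parts = {}
    for row in rows:
        parts.setdefault(row[idx], []).append(row)
    remainder = sum(len(p) / len(rows) * entropy(p) for p in parts.values())
    return entropy(rows) - remainder


def id3(rows, attrs):
    """Build a tree: a leaf is a class label, an inner node is {attribute: {value: subtree}}.

    attrs maps each still-unused attribute name to its column index.
    """
    labels = Counter(row[-1] for row in rows)
    if len(labels) == 1 or not attrs:
        return labels.most_common(1)[0][0]  # pure node, or majority vote when attributes run out
    best = max(attrs, key=lambda name: info_gain(rows, attrs[name]))
    idx = attrs[best]
    rest = {name: i for name, i in attrs.items() if name != best}
    # Branch only on values that occur here, so every subset is non-empty
    return {best: {v: id3([r for r in rows if r[idx] == v], rest)
                   for v in sorted({r[idx] for r in rows})}}


def show(tree, depth=0):
    """Print the tree as indented text, one '|   ' per level."""
    attr, branches = next(iter(tree.items()))
    for value, subtree in branches.items():
        line = "|   " * depth + f"{attr} = {value}"
        if isinstance(subtree, dict):
            print(line)
            show(subtree, depth + 1)
        else:
            print(f"{line}: {subtree}")


tree = id3(DATA, {name: i for i, name in enumerate(ATTRIBUTES)})
show(tree)
```

Branches are printed in alphabetical order of value, so overcast comes first; the structure is Outlook at the root, Humidity under sunny, and Wind under rain, with Temperature never selected.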

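Taking sunny, mild, normal, FALSE as the test instance (Outlook, Temperature, Humidity, and wind, where FALSE is Weka's encoding for "not windy", i.e. weak wind), the decision tree classifies it as yes: the sunny branch tests Humidity, and normal humidity leads to yes.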
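Classifying an instance means walking from the root and, at each node, following the branch that matches the instance's value for the tested attribute. A minimal sketch, assuming the tree is stored as a nested dict with Outlook at the root, Humidity under sunny and Wind under rain (the tree ID3 builds from this table), and assuming Weka's FALSE wind value maps to Wind = weak; `TREE` and `classify` are illustrative names:

```python
# Decision tree as a nested dict: {attribute: {value: subtree or class label}}
TREE = {
    "Outlook": {
        "sunny": {"Humidity": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain": {"Wind": {"strong": "no", "weak": "yes"}},
    }
}


def classify(tree, instance):
    """Follow the branch for the instance's value of each tested attribute until a leaf."""
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches[instance[attr]]
    return tree


# The test instance from the text (FALSE assumed to mean Wind = weak)
test = {"Outlook": "sunny", "Temperature": "mild", "Humidity": "normal", "Wind": "weak"}
print(classify(TREE, test))  # yes
```

Temperature is never consulted, because no path in this tree tests it.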
