Decision Trees: Information Gain, Information Gain Ratio, and Gini

Latest recommended article published on 2024-03-11 15:14:33

lishuandao

Article link: https://blog.csdn.net/lishuandao/article/details/52541148

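This article describes the three measures that decision trees use for attribute selection: information gain, information gain ratio, and the Gini index. Information gain is used by ID3, information gain ratio by C4.5, and Gini by CART. Worked examples show how to compute these values. The article points out that information gain can be biased and that information gain ratio compensates for this bias, and it briefly covers the Gini index used in CART.

Summary generated automatically by CSDN.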

Original source: Information Gain, Information Gain Ratio, Gini

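Today's Machine Learning class was quite a blow: my major is nominally "data mining", yet I could not even work an information gain example correctly. Ever since I started reading the textbooks I had never actually worked through this part; whenever I reached decision trees, I would read about entropy and skip the gain calculations that followed. I kept meaning to come back and study it properly, and now I have been forced to. Murphy's law at work: whatever you worry might happen will happen.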

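Back to the topic. All three measures are used by decision trees to choose the attribute to split on: information gain (Info Gain) is used by ID3, Gini by CART, and information gain ratio (Info Gain Ratio) by C4.5. Information gain and information gain ratio are both computed from entropy, so entropy comes first.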

1. Entropy

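In principle, the attribute selection function of a decision tree is, for ease of computation, usually defined as a measure of impurity. Such a measure must satisfy the following three conditions:

When a node is pure, the measure should be 0.
When impurity is maximal (for example, all classes are equally likely), the measure should be maximal.
The measure should obey the multistage property, so that the decision tree can be built in stages, for example: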
$measure([2,3,4]) = measure([2,7]) + \frac{7}{9} \times measure([3,4])$

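Entropy satisfies all three properties. It was introduced by Claude Shannon, the "father of information theory"; for the history and the mathematical theory, see reference [1]. The formula for entropy is: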

$entropy(p_1, p_2, \dots, p_n) = -p_1 \log_2(p_1) - p_2 \log_2(p_2) - \dots - p_n \log_2(p_n)$

Here $p_i$ is the proportion of instances belonging to class $i$. Entropy can also be interpreted in another way:

Given a probability distribution, the info required to predict an event is the distribution’s entropy. Entropy gives the information required in bits (this can involve fractions of bits!)
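
The entropy formula and the three conditions listed earlier can be checked numerically. The following is a minimal Python sketch, not code from the original article: the helper name `entropy` and the choice to pass raw class counts (rather than proportions) are assumptions made for illustration. The counts [2,3,4] are the ones used in the multistage example above.

```python
import math

def entropy(counts):
    """Entropy in bits of a class distribution given as raw counts (hypothetical helper)."""
    total = sum(counts)
    # Classes with zero count are skipped: 0 * log2(0) is taken to be 0.
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

# Condition 1: a pure node has entropy 0.
pure = entropy([5, 0])

# Condition 2: equally likely classes give the maximum, log2(n) bits (here n = 4).
uniform = entropy([3, 3, 3, 3])

# Condition 3 (multistage): first decide "class 1 vs. the rest",
# then split the remaining 7 of 9 instances into classes 2 and 3.
direct = entropy([2, 3, 4])
staged = entropy([2, 7]) + (7 / 9) * entropy([3, 4])

print(pure, uniform)
print(direct, staged)  # the two values should agree up to floating-point rounding
```

Working with counts keeps the multistage check close to the notation of the $measure([2,3,4])$ example; a version that takes proportions directly would work equally well.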

Put simply, entropy describes the number of bits of information needed to make a prediction. Here is an example:

For the weather data in the table below, the learning goal is to predict Play or not play.

Table 1. Example weather dataset

Outlook	Temperature	Humidity	Windy	Play?
sunny	hot	high	false	no
sunny	hot	high	true	no
overcast	hot	high	false	yes
rain	mild	high	false	yes
rain	cool	normal	false	yes
rain	cool	normal	true	no
overcast	cool	normal	true	yes
sunny	mild	high	false	no
sunny	cool	normal	false	yes
rain	mild	normal	false	yes
sunny	mild	normal	true	yes
overcast	mild	high	true	yes
overcast	hot	normal	false	yes
rain	mild	high	true	no
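
As a first step with this data, the entropy of the Play? column can be computed directly from the formula in Section 1. The sketch below is illustrative only: the variable names (`play`, `counts`, `dataset_entropy`) are assumptions, and the labels are copied from Table 1 in row order.

```python
import math
from collections import Counter

# The "Play?" column of Table 1, in row order.
play = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]

counts = Counter(play)          # 9 "yes" and 5 "no"
total = sum(counts.values())    # 14 instances

# entropy = -sum(p_i * log2(p_i)) over the class proportions p_i
dataset_entropy = sum(-(n / total) * math.log2(n / total) for n in counts.values())

print(counts["yes"], counts["no"])
print(round(dataset_entropy, 3))  # roughly 0.94 bits
```

The result is a fractional number of bits, slightly below the 1 bit that an even yes/no split would need.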