Decision Tree Principles

Reference: http://www.saedsayad.com/decision_tree.htm
 

Decision Tree - Classification

Decision tree builds classification or regression models in the form of a tree structure. It breaks down a dataset into smaller and smaller subsets while at the same time an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy). A leaf node (e.g., Play) represents a classification or decision. The topmost decision node in a tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data.

 
Algorithm
The core algorithm for building decision trees is called ID3, developed by J. R. Quinlan. It employs a top-down, greedy search through the space of possible branches with no backtracking. ID3 uses Entropy and Information Gain to construct a decision tree.
 
Entropy

A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous). The ID3 algorithm uses entropy to calculate the homogeneity of a sample. If the sample is completely homogeneous the entropy is zero, and if the sample is equally divided it has an entropy of one.


My understanding:

The idea: we want a function f that expresses the uncertainty (disorder) of information, and it has to meet two requirements. 1. The higher the probability of a value, the smaller f should be; for example, the more often a value appears in the Play Golf column above, the smaller f should be for that value, and in the extreme case where the column contains only the single value 'NO' (probability 1) the column has no disorder at all, so f should be 0. 2. The uncertainty produced by two independent symbols should equal the sum of their individual uncertainties, i.e. f(P1·P2) = f(P1) + f(P2).

The function f that satisfies both conditions at once is the logarithm, i.e. f(p) = -log2(p) (the minus sign keeps f non-negative, since 0 < p ≤ 1).
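A quick check that the negative logarithm meets both requirements (my own note; P1 and P2 denote probabilities of independent events):

```latex
% (1) a sure event has no uncertainty; (2) independent uncertainties add up
\[
f(1) = -\log_2 1 = 0,
\qquad
f(P_1 P_2) = -\log_2(P_1 P_2) = -\log_2 P_1 - \log_2 P_2 = f(P_1) + f(P_2)
\]
```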

 

To build a decision tree, we need to calculate two types of entropy using frequency tables as follows:
 
a) Entropy using the frequency table of one attribute:

b) Entropy using the frequency table of two attributes:
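Written out in the usual ID3 notation (a reconstruction, since the two formulas are missing from the text above):

```latex
% a) entropy of the target S over its c classes; p_i is the fraction of rows in class i
% b) entropy of the target T after splitting on attribute X; P(c) is the fraction of rows
%    taking value c of X, and E(c) is the entropy of that branch
\[
E(S) = \sum_{i=1}^{c} -\,p_i \log_2 p_i
\qquad\qquad
E(T, X) = \sum_{c \in X} P(c)\, E(c)
\]
```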

 
Information Gain

The information gain is based on the decrease in entropy after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches).

In other words: information gain is the reduction in entropy after the dataset is split on a particular attribute, and constructing a decision tree is the process of finding the attribute that yields the largest information gain.

If that is still hard to grasp, Step 2 below should help.
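As a formula, using the two entropies defined above (the standard ID3 definition):

```latex
\[
\mathrm{Gain}(T, X) = E(T) - E(T, X)
\]
```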

 
Step 1: Calculate entropy of the target. 
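A minimal Python sketch of Step 1, assuming the usual 14-row Play Golf example (9 Yes and 5 No in the target column); the `entropy` helper name is my own:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a list of class labels: sum of -p * log2(p) over the classes."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

# Assumed target column: 9 "Yes" and 5 "No" (the classic Play Golf example).
play_golf = ["Yes"] * 9 + ["No"] * 5
print(round(entropy(play_golf), 3))  # 0.94
```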

Step 2: The dataset is then split on the different attributes. The entropy for each branch is calculated. Then it is added proportionally to get the total entropy for the split. The resulting entropy is subtracted from the entropy before the split. The result is the Information Gain, or decrease in entropy.
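Continuing the sketch for Step 2, with the Outlook branch counts assumed to be Sunny 2 Yes / 3 No, Overcast 4 Yes / 0 No, Rainy 3 Yes / 2 No; `info_gain` reuses the `entropy` helper and the `play_golf` list above:

```python
def info_gain(target_entropy, branches):
    """Information gain = entropy before the split minus the
    size-weighted entropy of each branch after the split."""
    total = sum(len(b) for b in branches)
    split_entropy = sum(len(b) / total * entropy(b) for b in branches)
    return target_entropy - split_entropy

# Branches of Play Golf when splitting on Outlook (assumed counts).
outlook_branches = [
    ["Yes"] * 2 + ["No"] * 3,  # Sunny
    ["Yes"] * 4,               # Overcast
    ["Yes"] * 3 + ["No"] * 2,  # Rainy
]
print(round(info_gain(entropy(play_golf), outlook_branches), 3))  # ~0.247
```

With the same assumed dataset, the other attributes give smaller gains (roughly 0.03 to 0.15), which is why Outlook wins in Step 3.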



Step 3: Choose attribute with the largest information gain as the decision node. 

Why choose the attribute with the largest information gain as the root node? The larger Gain(A) is, the more information attribute A provides for classification. Choosing A as the root node leaves the least remaining uncertainty about the class, so the test at each non-leaf node extracts the maximum amount of information about the class of the records being tested.

(I do not yet know the deeper theory behind this; if any reader does, please let me know or start a discussion.)

Step 4a: A branch with entropy of 0 is a leaf node.

Step 4b: A branch with entropy more than 0 needs further splitting.

Step 5: The ID3 algorithm is run recursively on the non-leaf branches, until all data is classified.
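Putting Steps 1 to 5 together, here is a compact recursive sketch of ID3 (my own illustration, not code from the referenced page); it reuses the `entropy` and `info_gain` helpers above, takes rows as dicts mapping attribute names to values, and represents the tree as nested dicts:

```python
def id3(rows, target, attributes):
    """Build a decision tree as nested dicts: {attribute: {value: subtree_or_label}}."""
    labels = [row[target] for row in rows]
    if entropy(labels) == 0 or not attributes:
        # Step 4a: a pure branch (or one with no attributes left) becomes a leaf node
        return max(set(labels), key=labels.count)   # majority (or only) class
    base = entropy(labels)

    # Step 3: pick the attribute with the largest information gain.
    def gain(attr):
        values = set(row[attr] for row in rows)
        branches = [[r[target] for r in rows if r[attr] == v] for v in values]
        return info_gain(base, branches)
    best = max(attributes, key=gain)

    # Steps 4b and 5: split on the best attribute and recurse on each branch.
    tree = {best: {}}
    for value in set(row[best] for row in rows):
        subset = [r for r in rows if r[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, target, remaining)
    return tree

# Hypothetical usage, assuming rows use these column names:
# id3(rows, "Play Golf", ["Outlook", "Temperature", "Humidity", "Windy"])
```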
 

 

Decision Tree to Decision Rules

A decision tree can easily be transformed to a set of rules by mapping from the root node to the leaf nodes one by one.
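A small sketch of that mapping, using the nested-dict tree format from the `id3` sketch above; the rule text format is my own choice:

```python
def tree_to_rules(tree, conditions=()):
    """Walk every root-to-leaf path and emit one IF ... THEN ... rule per leaf."""
    if not isinstance(tree, dict):                       # leaf node: a class label
        yield "IF " + " AND ".join(conditions) + " THEN " + str(tree)
        return
    (attribute, branches), = tree.items()                # each internal node has one attribute
    for value, subtree in branches.items():
        yield from tree_to_rules(subtree, conditions + (f"{attribute} = {value}",))

# e.g. yields rules such as: IF Outlook = Overcast THEN Yes
```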

 
 

Decision Trees - Issues

 
Exercise
 
Try to invent a new algorithm to construct a decision tree from data using the Chi2 test.
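As a hint for the exercise (one possible direction, not a full answer): instead of information gain, score each candidate attribute by the chi-square statistic of its attribute-vs-target contingency table and split on the highest-scoring attribute, for example:

```python
import numpy as np
from scipy.stats import chi2_contingency

def chi2_score(rows, attr, target):
    """Chi-square statistic of the attr-vs-target frequency table (higher = stronger split)."""
    attr_values = sorted(set(r[attr] for r in rows))
    classes = sorted(set(r[target] for r in rows))
    table = np.array([[sum(1 for r in rows if r[attr] == v and r[target] == c)
                       for c in classes] for v in attr_values])
    statistic, p_value, dof, expected = chi2_contingency(table)
    return statistic
```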
 