Machine Learning (7): Decision Trees

model 4 — decision tree

1 decision tree

1. components

usage: classification

  1. root node
  2. decision nodes
  3. leaf nodes (where the final prediction is made)

2. choosing a feature at each node

maximize purity (minimize impurity)

3. when to stop splitting (see the sketch after this list)

  1. a node is 100% one class
  2. splitting a node will result in the tree exceeding a maximum depth
  3. the improvement in purity score is below a threshold
  4. number of examples in a node is below a threshold
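These stopping rules map roughly onto the hyperparameters of common library implementations. A minimal sketch with scikit-learn's `DecisionTreeClassifier` (the specific values here are arbitrary examples, not recommendations):

```python
from sklearn.tree import DecisionTreeClassifier

# each argument corresponds to one stopping criterion above (values are arbitrary);
# "a node is 100% one class" needs no parameter: a pure node is never split further
clf = DecisionTreeClassifier(
    criterion="entropy",          # impurity measure, see section 2
    max_depth=4,                  # do not exceed a maximum depth
    min_impurity_decrease=0.01,   # stop when the purity improvement is below a threshold
    min_samples_split=10,         # stop when a node has fewer examples than a threshold
)
clf.fit([[0, 0], [0, 1], [1, 0], [1, 1]], [0, 1, 1, 0])
```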

2 measure of impurity

use entropy ($H$) as a measure of impurity

$$H(p) = -p\log_2(p) - (1-p)\log_2(1-p)$$

note: $0\log 0 = 0$

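A minimal sketch of this entropy function in Python (the function name and the edge-case handling are my own choices):

```python
import numpy as np

def entropy(p):
    """Binary entropy H(p), where p is the fraction of positive examples in a node."""
    if p == 0 or p == 1:
        return 0.0                      # convention: 0*log(0) = 0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

print(entropy(0.5))   # 1.0  -> maximally impure node
print(entropy(1.0))   # 0.0  -> perfectly pure node
```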

3 information gain

1. definition

$$information\_gain = H(p^{root}) - \left(w^{left}H(p^{left}) + w^{right}H(p^{right})\right)$$
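A sketch of this formula for binary (0/1) labels; `information_gain` and the boolean `left_mask` argument are hypothetical names used only for illustration:

```python
import numpy as np

def entropy(p):
    # binary entropy with the 0*log(0) = 0 convention
    return 0.0 if p in (0.0, 1.0) else -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def information_gain(y, left_mask):
    """H(p_root) - (w_left * H(p_left) + w_right * H(p_right)) for binary labels y."""
    y, left_mask = np.asarray(y), np.asarray(left_mask)
    left, right = y[left_mask], y[~left_mask]
    w_left, w_right = len(left) / len(y), len(right) / len(y)
    frac = lambda a: a.mean() if len(a) else 0.0    # fraction of positive labels
    return entropy(frac(y)) - (w_left * entropy(frac(left)) + w_right * entropy(frac(right)))

y = np.array([1, 1, 1, 0, 0])
mask = np.array([True, True, True, False, True])    # four examples go left, one goes right
print(information_gain(y, mask))
```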

2. usage

  1. measure the reduction in entropy
  2. a signal for when to stop splitting

3. continuous features

find the threshold that gives the largest information gain (see the sketch below)

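A sketch of that threshold search for one continuous feature with binary labels; taking midpoints between consecutive sorted values as candidate thresholds is a common choice, and all names here are my own:

```python
import numpy as np

def entropy(p):
    return 0.0 if p in (0.0, 1.0) else -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def best_threshold(x, y):
    """Return the threshold on continuous feature x with the largest information gain."""
    x, y = np.asarray(x, dtype=float), np.asarray(y)
    xs = np.sort(np.unique(x))
    candidates = (xs[:-1] + xs[1:]) / 2.0        # midpoints between consecutive values
    best_t, best_gain = None, -np.inf
    for t in candidates:
        left, right = y[x <= t], y[x > t]
        w_l, w_r = len(left) / len(y), len(right) / len(y)
        gain = entropy(y.mean()) - (w_l * entropy(left.mean()) + w_r * entropy(right.mean()))
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

weights = [3.2, 4.1, 5.0, 6.3, 7.8]              # hypothetical continuous feature
labels  = [1, 1, 1, 0, 0]
print(best_threshold(weights, labels))           # -> (5.65, ~0.97)
```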

4 random forest

  1. generating tree samples (bagging)
given a training set of size m
for b = 1 to B:
	use sampling with replacement to create a new training set of size m
	train a decision tree on that new training set
  2. randomizing the feature choice: at each node, when choosing a feature to split on, if n features are available, pick a random subset of k < n features (usually $k = \sqrt{n}$) and allow the algorithm to choose only from that subset (both steps are sketched in the code below)
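A sketch of both steps using scikit-learn trees as the base learner; `B`, the bootstrap loop, and the majority-vote helper follow the description above, and `max_features="sqrt"` is scikit-learn's way of restricting each split to a random subset of about sqrt(n) features:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_random_forest(X, y, B=100, seed=0):
    """Train B trees, each on a bootstrap sample (sampling with replacement) of size m."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    m = len(X)
    trees = []
    for _ in range(B):
        idx = rng.integers(0, m, size=m)                        # sample m rows with replacement
        tree = DecisionTreeClassifier(criterion="entropy",
                                      max_features="sqrt")      # random feature subset per split
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def forest_predict(trees, X):
    """Majority vote over the individual trees (binary labels assumed)."""
    votes = np.stack([t.predict(X) for t in trees])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```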