An Introduction to Statistical Learning | Reading Notes 16 | Tree Models

Latest recommended article published on 2023-09-16 15:05:26

weixin_39635657

Article tags: An Introduction to Statistical Learning

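ISLR 8.1.1 - Tree Models

Key points:
0. Introduction to tree models
1. Regression trees
-- Motivation
-- Tree splitting
-- Tree pruning
2. Classification trees
-- Gini index
-- Cross-entropy
3. Tree models vs. linear models
4. Pros and cons of tree models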

0. Tree-based methods

Tree-based methods involve stratifying or segmenting the predictor space into a number of simple regions.

To predict for a test observation, use the mean (regression) or mode (classification) of the training observations in the region to which it belongs.

1. Regression Decision Tree

1.1 Motivation

Making Predictions via Stratification of the Feature Space:

Divide the predictor space, that is, the set of possible values for $X_1, X_2, \dots, X_p$, into $J$ distinct and non-overlapping regions $R_1, R_2, \dots, R_J$.
For every test observation that falls into region $R_j$, we make the same prediction, which is simply the mean of the response values for the training observations in $R_j$.

1.2 Tree Splitting

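The goal is to find leaves $R_1, \dots, R_J$ that minimize the RSS, given by:

$$\sum_{j=1}^{J} \sum_{i:\, x_i \in R_j} \left(y_i - \hat{y}_{R_j}\right)^2$$

where $\hat{y}_{R_j}$ is the mean response for the training observations within the $j$th leaf.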

「Problem」: It is computationally infeasible to consider every possible partition of the feature space into $J$ leaves.

「Solution」: Recursive Binary Splitting is a top-down, greedy approach:

top-down because it begins at the top of the tree (all observations in one region) and then successively splits the predictor space;
greedy because the best split is made at each particular step, rather than looking ahead and picking a split that will lead to a better tree in some future step (which is computationally infeasible).

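First, for any feature $X_j$ and cutpoint $s$, we define two regions:

$$R_1(j, s) = \{X \mid X_j < s\} \quad \text{and} \quad R_2(j, s) = \{X \mid X_j \ge s\}$$

and seek the $j$ and $s$ that minimize:

$$\sum_{i:\, x_i \in R_1(j,s)} \left(y_i - \hat{y}_{R_1}\right)^2 + \sum_{i:\, x_i \in R_2(j,s)} \left(y_i - \hat{y}_{R_2}\right)^2$$

where $\hat{y}_{R_1}$ and $\hat{y}_{R_2}$ are the mean responses of the training observations in $R_1(j,s)$ and $R_2(j,s)$.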

Next, repeat the process, looking for the best $j$ and $s$ to split one of the previously identified regions further,

until a stopping criterion is reached:
e.g. no region contains more than 5 observations.

If the number of features $p$ is not too large, this process can be done quickly.

Predict the response for a test observation using the mean of the training observations in the region to which that test observation belongs.
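
The recursive binary splitting procedure described in this section can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in ISLR: the function names (`rss`, `best_split`, `grow`, `predict`), the dictionary tree layout, the choice of midpoints as candidate cutpoints, and the toy data are all assumptions made for the example.

```python
from statistics import mean

def rss(ys):
    """Residual sum of squares of ys around their mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(X, y):
    """Return (j, s, total_rss) for the split X_j < s with the lowest RSS, or None."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate cutpoints: midpoints between consecutive distinct values (an assumption).
        for lo, hi in zip(values, values[1:]):
            s = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] < s]
            right = [yi for row, yi in zip(X, y) if row[j] >= s]
            total = rss(left) + rss(right)
            if best is None or total < best[2]:
                best = (j, s, total)
    return best

def grow(X, y, min_size=5):
    """Greedy, top-down growth: split while a region holds more than min_size observations."""
    split = best_split(X, y) if len(y) > min_size else None
    if split is None:
        # Leaf: the prediction is the mean response of the training observations here.
        return {"leaf": True, "value": mean(y)}
    j, s, _ = split
    left_idx = [i for i, row in enumerate(X) if row[j] < s]
    right_idx = [i for i, row in enumerate(X) if row[j] >= s]
    return {
        "leaf": False, "feature": j, "cutpoint": s,
        "left": grow([X[i] for i in left_idx], [y[i] for i in left_idx], min_size),
        "right": grow([X[i] for i in right_idx], [y[i] for i in right_idx], min_size),
    }

def predict(tree, x):
    """Follow the splits down to a leaf and return its stored mean."""
    while not tree["leaf"]:
        tree = tree["left"] if x[tree["feature"]] < tree["cutpoint"] else tree["right"]
    return tree["value"]

# Toy data (hypothetical): one feature, two clearly separated groups.
X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
tree = grow(X, y, min_size=3)
print(tree["feature"], tree["cutpoint"], predict(tree, [2.5]), predict(tree, [20.0]))
```

Each call to `best_split` scans every feature and every candidate cutpoint, which is why the procedure stays fast only when the number of features is moderate.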

1.3 Tree Pruning

「Problem」: A complex tree is likely to overfit the training data (in the extreme case, each leaf contains a single observation)

「Solution」: A smaller tree with fewer splits (fewer regions $R_1, \dots, R_J$) may lead to lower variance and better interpretation at the cost of a little bias.

「Method 1 - Threshold」
Split a node only as long as the decrease in the RSS due to that split exceeds some (high) threshold.

Problem: this is too short-sighted, since a seemingly worthless split early on may be followed by a very good split, i.e. one that leads to a large reduction in RSS.
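
A small constructed example may make the short-sightedness concrete. The XOR-style data below are hypothetical (they do not come from ISLR): no single first split reduces the RSS at all, so a threshold rule would stop immediately, yet a second level of splits fits the data perfectly.

```python
def rss(ys):
    """Residual sum of squares of ys around their mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def split_rss(X, y, j, s):
    """Total RSS after splitting on feature j at cutpoint s."""
    left = [yi for xi, yi in zip(X, y) if xi[j] < s]
    right = [yi for xi, yi in zip(X, y) if xi[j] >= s]
    return rss(left) + rss(right)

# Hypothetical XOR-style data: y = 1 exactly when the two features differ.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0.0, 1.0, 1.0, 0.0]

root_rss = rss(y)
# Best RSS reduction achievable by any single first split.
first_split_gain = max(root_rss - split_rss(X, y, j, 0.5) for j in (0, 1))

# Split on feature 0 anyway, then split each child on feature 1.
two_level_rss = 0.0
for side in (0, 1):
    child = [(xi, yi) for xi, yi in zip(X, y) if xi[0] == side]
    child_X = [xi for xi, _ in child]
    child_y = [yi for _, yi in child]
    two_level_rss += split_rss(child_X, child_y, 1, 0.5)

print(root_rss, first_split_gain, two_level_rss)
```

Any positive threshold would reject the first split here, which is the motivation for growing a large tree first and pruning afterwards.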

「Method 2 - Pruning」
Grow a very large tree $T_0$, then prune it back in order to obtain a subtree.

Goal: select a subtree that leads to the lowest test error rate.

Rather than cross-validating every possible subtree, we consider a sequence of trees indexed by a non-negative tuning parameter $\alpha$.

Cost Complexity / Weakest Link Pruning

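For each value of $\alpha$ there corresponds a subtree $T$ of the large tree $T_0$ that minimizes:

$$\sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} \left(y_i - \hat{y}_{R_m}\right)^2 + \alpha |T|$$

$|T|$: the number of leaves of the tree $T$; $R_m$ is the region of the $m$th leaf and $\hat{y}_{R_m}$ is the mean training response in it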
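
To see how the penalty works, the sketch below evaluates the cost-complexity criterion (training RSS plus $\alpha$ times the number of leaves) over a small list of candidate subtrees. It is a simplification under stated assumptions: the `(leaves, RSS)` pairs are invented numbers, and the candidates are simply given rather than produced by weakest link pruning of a real tree.

```python
# Hypothetical (number of leaves, training RSS) pairs for nested subtrees of T_0.
subtrees = [(1, 100.0), (2, 60.0), (3, 40.0), (5, 34.0), (8, 31.0)]

def cost(rss, n_leaves, alpha):
    """Cost-complexity criterion: training RSS plus alpha times the number of leaves."""
    return rss + alpha * n_leaves

def best_subtree(candidates, alpha):
    """Return the (n_leaves, rss) pair with the lowest cost; ties go to the smaller tree."""
    return min(candidates, key=lambda t: (cost(t[1], t[0], alpha), t[0]))

for alpha in (0.0, 2.0, 5.0, 25.0, 50.0):
    n_leaves, rss = best_subtree(subtrees, alpha)
    print(f"alpha={alpha:5.1f} -> {n_leaves} leaves, cost={cost(rss, n_leaves, alpha):.1f}")
```

With $\alpha = 0$ the largest tree wins because only the training error counts; as $\alpha$ grows, the selected subtree shrinks step by step down to a single leaf.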

The tuning parameter $\alpha$ controls a trade-off between the subtree's complexity and its fit to the training data.

When $\alpha = 0$, the subtree $T$ simply equals $T_0$, because the criterion then just measures the training error.
As $\alpha$ increases, there is a growing penalty for subtrees with many leaves,
so branches get pruned from the tree in a nested and predictable fashion;
obtaining the whole sequence of subtrees as a function of $\alpha$ is therefore easy.

$\alpha$ is similar to $\lambda$ in the lasso, which controls the complexity of a linear model.

$\alpha$ can also be selected via CV; we then obtain the subtree corresponding to the chosen $\alpha$.

「Example on Baseball Hitters Data」

Perform 6-fold CV to estimate the CV MSE of the trees as a function of $\alpha$.

The CV error attains its minimum at the tree size that corresponds to the best $\alpha$.

2. Classification Trees

For a classification tree, we predict that each test observation belongs to the most commonly occurring class of training observations in the region to which it belongs.

RSS cannot be used as a splitting criterion for a classification tree;
instead, two criteria are used to evaluate the quality of a particular split:

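「Gini Index」:
A measure of total variance across the $K$ classes:

$$G = \sum_{k=1}^{K} \hat{p}_{mk} \left(1 - \hat{p}_{mk}\right)$$

$\hat{p}_{mk}$ represents the proportion of training observations in the $m$th region that are from the $k$th class.
$G$ is small if all $\hat{p}_{mk}$ are close to 1 or 0.
Node purity: $G$ is smaller when a node contains mostly observations from a single class.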

「Cross-Entropy」:

$$D = -\sum_{k=1}^{K} \hat{p}_{mk} \log \hat{p}_{mk}$$

Like the Gini index, the entropy is small if the node is pure;
both are sensitive to node purity.
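
Both purity measures are easy to compute from the class proportions in a node. The sketch below assumes the standard ISLR definitions (Gini index as the sum of $p(1-p)$ over classes, cross-entropy as minus the sum of $p \log p$); the function names and the example proportions are chosen for illustration only.

```python
from math import log

def gini(proportions):
    """Gini index: sum over classes of p * (1 - p)."""
    return sum(p * (1 - p) for p in proportions)

def cross_entropy(proportions):
    """Cross-entropy: -sum of p * log(p), treating 0 * log(0) as 0."""
    return -sum(p * log(p) for p in proportions if p > 0)

pure = [1.0, 0.0]     # every observation in the node belongs to one class
mixed = [0.5, 0.5]    # the two classes are evenly mixed
mostly = [0.9, 0.1]   # nearly pure

for name, props in (("pure", pure), ("mostly", mostly), ("mixed", mixed)):
    print(f"{name:7s} gini={gini(props):.3f} entropy={cross_entropy(props):.3f}")
```

Both measures are zero for the pure node and largest for the evenly mixed node, which is the sense in which they are sensitive to node purity.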

「Heart Disease Example」:

Some splits yield two leaves with the same predicted value. There are still reasons to keep such a split:

it leads to increased node purity;
it improves the Gini index and the entropy.
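
The effect can be checked numerically. The class counts below are made up for illustration (they are not taken from the Heart data): both children predict the same class as the parent, yet the weighted Gini index drops after the split.

```python
CLASSES = ("Yes", "No")

def gini_from_counts(counts):
    """Gini index of a node given its per-class observation counts."""
    n = sum(counts)
    return sum((c / n) * (1 - c / n) for c in counts)

def majority(counts):
    """Predicted class of a node: the most common class among its observations."""
    return CLASSES[counts.index(max(counts))]

# Hypothetical counts in the order (Yes, No).
parent = [8, 2]
left, right = [5, 0], [3, 2]

n = sum(parent)
parent_gini = gini_from_counts(parent)
# Weight each child's Gini index by its share of the parent's observations.
weighted_child_gini = sum(sum(child) / n * gini_from_counts(child) for child in (left, right))

print(majority(left), majority(right))
print(round(parent_gini, 2), round(weighted_child_gini, 2))
```

The left leaf is now perfectly pure, so a test observation falling there can be classified with more certainty than one falling in the right leaf, even though the predicted label is identical.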

3. Trees vs Linear Models

If there is a highly non-linear and complex relationship between the features and the response, CARTs may outperform classical approaches.

However, if the relationship is well approximated by a linear model, linear regression is likely to work well and to outperform a tree.
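
As a rough illustration of this comparison, the sketch below fits a least-squares line and a one-split regression tree (a "stump") to two synthetic data sets, one with a step-shaped response and one with an exactly linear response. The data, the helper names, and the use of training RSS as the yardstick are all assumptions for the example; a real comparison would use test error or cross-validation.

```python
def fit_line(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns a prediction function."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    b0 = my - b1 * mx
    return lambda v: b0 + b1 * v

def fit_stump(x, y):
    """One-split regression tree chosen to minimize RSS; returns a prediction function."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        s = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi < s]
        right = [yi for xi, yi in zip(x, y) if xi >= s]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        total = sum((yi - ml) ** 2 for yi in left) + sum((yi - mr) ** 2 for yi in right)
        if best is None or total < best[0]:
            best = (total, s, ml, mr)
    _, s, ml, mr = best
    return lambda v: ml if v < s else mr

def train_rss(model, x, y):
    """Residual sum of squares of a fitted model on the training data."""
    return sum((yi - model(xi)) ** 2 for xi, yi in zip(x, y))

# Synthetic data (hypothetical): a step-shaped response and an exactly linear one.
x = [float(i) for i in range(10)]
y_step = [0.0 if xi < 5 else 1.0 for xi in x]
y_linear = [2.0 * xi + 1.0 for xi in x]

results = {
    name: (train_rss(fit_line(x, y), x, y), train_rss(fit_stump(x, y), x, y))
    for name, y in (("step", y_step), ("linear", y_linear))
}
for name, (line_rss, stump_rss) in results.items():
    print(f"{name:6s} line RSS={line_rss:.3f}  stump RSS={stump_rss:.3f}")
```

On the step data the stump fits perfectly while the line cannot, and on the linear data the situation is reversed, which mirrors the point that neither approach dominates in general.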

4. Pros & Cons of Trees

「Advantages」:

Easier to interpret than linear regression
More closely mirror human decision-making
Trees can be displayed graphically
Easily handle qualitative predictors without the need to create dummy variables

「Limitations」:

Trees can be very non-robust:

a small change in the data can cause a large change in the final estimated tree

Trees generally do not have the same level of predictive accuracy as some of the other regression and classification methods

Next: By aggregating many decision trees, methods such as bagging, random forests, and boosting can substantially improve predictive accuracy, at the expense of some loss in interpretability

5. Reference

An Introduction to Statistical Learning, with applications in R (Springer, 2013)
