An Introduction to Statistical Learning | Reading Notes 16 | Tree Models

ISLR 8.1 - Tree Models

Key points:
0. Introduction to tree-based methods
1. Regression trees
-- Motivation
-- Tree splitting
-- Tree pruning
2. Classification trees
-- Gini index
-- Cross-entropy
3. Trees vs. linear models
4. Pros & cons of trees

0. Tree-based methods

Tree-based methods involve stratifying or segmenting the predictor space into a number of simple regions.

  • Use the mean/mode of the training observations in each region as the prediction for test observations that fall into it

1. Regression Decision Tree

1.1 Motivation

Making Prediction via Stratification of the Feature Space:

  1. Divide the predictor space -- that is, the set of possible values for $X_1, X_2, \dots, X_p$ -- into $J$ distinct and non-overlapping regions $R_1, R_2, \dots, R_J$
  2. For every test observation that falls into region $R_j$, we make the same prediction: simply the mean of the response values of the training observations in $R_j$ (a minimal sketch follows below)
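Below is a minimal sketch of this two-step rule; the single feature, the cutpoint 4.5, and the toy responses are all made up for illustration.

```python
import numpy as np

# Toy training data: one feature x and a numeric response y (made-up values)
x_train = np.array([1.0, 2.0, 3.0, 5.0, 6.0, 7.0])
y_train = np.array([10., 12., 11., 30., 28., 32.])

# Step 1: divide the predictor space into two non-overlapping regions,
# R1 = {x < 4.5} and R2 = {x >= 4.5} (cutpoint picked by hand here)
cut = 4.5
mean_R1 = y_train[x_train < cut].mean()   # mean response in R1 -> 11.0
mean_R2 = y_train[x_train >= cut].mean()  # mean response in R2 -> 30.0

# Step 2: every test observation is predicted with the mean of its region
x_test = np.array([2.5, 6.5])
y_pred = np.where(x_test < cut, mean_R1, mean_R2)
print(y_pred)  # [11. 30.]
```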

1.2 Tree Splitting

The goal is to find regions (leaves) $R_1, \dots, R_J$ that minimize the RSS, given by:

$$\sum_{j=1}^{J} \sum_{i \in R_j} \left( y_i - \hat{y}_{R_j} \right)^2$$

  • where $\hat{y}_{R_j}$ is the mean response for the training observations within the $j$-th leaf

「Problem」: it is computationally infeasible to consider every possible partition of the feature space into $J$ leaves

「Solution」: Recursive Binary Splitting is a top-down, greedy approach:

  • top-down because it begins at the top of the tree (all in one region) and then successively splits the predictor space;
  • greedy because the best split is made at each step, rather than looking ahead and picking a split that would lead to a better tree in some future step (which is computationally infeasible)

First, for any feature $X_j$ and cutpoint $s$, we define the pair of half-planes

$$R_1(j, s) = \{ X \mid X_j < s \} \quad \text{and} \quad R_2(j, s) = \{ X \mid X_j \ge s \}$$

and seek the $j$ and $s$ that minimize:

$$\sum_{i:\, x_i \in R_1(j, s)} \left( y_i - \hat{y}_{R_1} \right)^2 + \sum_{i:\, x_i \in R_2(j, s)} \left( y_i - \hat{y}_{R_2} \right)^2$$

Next, repeat the process, looking for the best $j$ and $s$ to further split one of the previously identified regions

  • until a stopping criterion is reached,
  • e.g. no region contains more than 5 observations.

If the number of features $p$ is not too large, this process can be done quickly (see the sketch below)

  • predict a test observation using the mean of the training observations in the region to which the test observation belongs

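As a rough illustration of a single greedy step, the sketch below scans every feature and candidate cutpoint and keeps the pair that minimizes the two-region RSS above. The `best_split` helper and the toy data are made up for illustration; a full implementation would recurse into each resulting region until the stopping criterion is met.

```python
import numpy as np

def best_split(X, y):
    """Greedy search for the (feature j, cutpoint s) pair minimizing the RSS
    of the two half-planes R1 = {X_j < s} and R2 = {X_j >= s}."""
    best_j, best_s, best_rss = None, None, np.inf
    n, p = X.shape
    for j in range(p):
        for s in np.unique(X[:, j]):
            left, right = X[:, j] < s, X[:, j] >= s
            if left.sum() == 0 or right.sum() == 0:
                continue  # skip splits that leave a region empty
            rss = (((y[left] - y[left].mean()) ** 2).sum()
                   + ((y[right] - y[right].mean()) ** 2).sum())
            if rss < best_rss:
                best_j, best_s, best_rss = j, s, rss
    return best_j, best_s, best_rss

# Toy data in which y depends mainly on the second feature
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 2))
y = np.where(X[:, 1] > 5, 20.0, 5.0) + rng.normal(0, 1, 50)
print(best_split(X, y))  # expect feature index 1 with a cutpoint near 5
```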

1.3 Tree Pruning

「Problem」: a complex tree will overfit the training data (in the extreme, each leaf contains a single observation)

「Solution」: a smaller tree with fewer splits (fewer regions $R_j$) may lead to lower variance and better interpretation, at the cost of a little bias

「Method 1 - Threshold」
Split only when the decrease in the RSS exceeds some (high) threshold

  • Problem: too short-sighted, since a seemingly worthless split early on may be followed later by a very good split, i.e. one with a large reduction in RSS

「Method 2 - Pruning」
Grow a very large tree $T_0$, then prune it back in order to obtain a subtree

  • Goal: select a subtree that leads to the lowest test error rate

Rather than cross-validating every possible subtree, we consider a sequence of trees indexed by a non-negative tuning parameter $\alpha$

  • Cost Complexity Pruning / Weakest Link Pruning

For each value of $\alpha$ there corresponds a subtree $T \subset T_0$ that minimizes:

$$\sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} \left( y_i - \hat{y}_{R_m} \right)^2 + \alpha |T|$$

  • $|T|$: number of leaves (terminal nodes) of the tree $T$

The tuning parameter $\alpha$ controls a trade-off between the subtree's complexity and its fit to the training data.

  • when $\alpha = 0$, the criterion just measures the training error, so $T = T_0$
  • as $\alpha$ increases, there is a price to pay for a subtree with many leaves
  • so branches get pruned from the tree in a nested and predictable fashion,
  • and obtaining the whole sequence of subtrees (as a function of $\alpha$) is easy (sketched below)
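A minimal sketch of this pruning path, using scikit-learn's cost-complexity pruning as a stand-in (the book's labs use R) and synthetic data; each value of `ccp_alpha` corresponds to one subtree in the nested sequence.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (a stand-in for a real data set)
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.2, 200)

# Grow a large tree T0 and compute its cost-complexity pruning path:
# increasing alpha prunes branches, giving a nested sequence of subtrees
tree = DecisionTreeRegressor(random_state=0)
path = tree.cost_complexity_pruning_path(X, y)
for alpha in path.ccp_alphas[::40]:  # print a few alphas along the path
    subtree = DecisionTreeRegressor(random_state=0, ccp_alpha=alpha).fit(X, y)
    print(f"alpha = {alpha:.4f}  leaves = {subtree.get_n_leaves()}")
```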

$\alpha$ plays a role similar to $\lambda$ in the lasso, which controls the complexity of a linear model

  • $\alpha$ can likewise be selected via CV; we then return to the full data set and obtain the subtree corresponding to the chosen $\alpha$

「Example on Baseball Hitters Data」


Perform 6-fold CV to estimate the cross-validated MSE of the trees as a function of $\alpha$

  • the CV error reaches its minimum at the tree size corresponding to the best $\alpha$ (a cross-validation sketch follows)
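Sketch of the same idea with scikit-learn on synthetic data (not the Hitters set): cross-validate each $\alpha$ on the pruning path and keep the one with the smallest CV error.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.2, 200)

# Candidate alphas from the cost-complexity pruning path of the full tree
alphas = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y).ccp_alphas

# 6-fold CV MSE for each alpha; keep the alpha with the lowest CV error
cv_mse = [
    -cross_val_score(DecisionTreeRegressor(random_state=0, ccp_alpha=a),
                     X, y, cv=6, scoring="neg_mean_squared_error").mean()
    for a in alphas
]
best_alpha = alphas[int(np.argmin(cv_mse))]
final_tree = DecisionTreeRegressor(random_state=0, ccp_alpha=best_alpha).fit(X, y)
print(f"best alpha = {best_alpha:.4f}, leaves = {final_tree.get_n_leaves()}")
```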


2. Classification Trees

For a classification tree, we predict that a test observation belongs to the most commonly occurring class of the training observations in the region to which it belongs

  • RSS cannot be used as the splitting criterion for a classification tree
  • two alternative criteria are used to evaluate the quality of a particular split

「Gini Index」:
A measure of total variance across the $K$ classes:

$$G = \sum_{k=1}^{K} \hat{p}_{mk} \left( 1 - \hat{p}_{mk} \right)$$

  • $\hat{p}_{mk}$ represents the proportion of training observations in the $m$-th region that are from the $k$-th class
  • $G$ is small if all the $\hat{p}_{mk}$ are close to 1 or 0
  • node purity: $G$ is small if a node contains predominantly observations from a single class

「Cross-Entropy」:

$$D = - \sum_{k=1}^{K} \hat{p}_{mk} \log \hat{p}_{mk}$$

  • like the Gini index, the entropy is small if the $m$-th node is pure
  • both measures are sensitive to node purity (a small numeric sketch follows)
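A small sketch computing both measures from a node's class proportions $\hat{p}_{mk}$ (the proportions below are made up), showing that both shrink toward 0 as the node becomes purer.

```python
import numpy as np

def gini(p):
    """Gini index G = sum_k p_k * (1 - p_k) for class proportions p."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p * (1 - p)))

def cross_entropy(p):
    """Cross-entropy D = -sum_k p_k * log(p_k), treating 0 * log(0) as 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # drop zero proportions to avoid log(0)
    return float(-np.sum(p * np.log(p)))

print(gini([0.5, 0.5]), cross_entropy([0.5, 0.5]))  # impure node: 0.50, 0.693
print(gini([0.9, 0.1]), cross_entropy([0.9, 0.1]))  # purer node:  0.18, 0.325
print(gini([1.0, 0.0]), cross_entropy([1.0, 0.0]))  # pure node:   0.00, 0.000
```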

「Heart Disease Example」:


A split may yield two leaves with the same predicted value; there are still reasons to keep such splits:

  • the split leads to increased node purity
  • it therefore improves (decreases) the Gini index and the entropy, as the made-up example below illustrates
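A made-up numerical illustration: suppose a node with class proportions $(0.75,\ 0.25)$ is split into two equal-sized children with proportions $(0.9,\ 0.1)$ and $(0.6,\ 0.4)$. Both children predict the same (majority) class and the classification error rate is unchanged at $0.25$, yet the weighted Gini index still improves:

$$G_{\text{parent}} = 2(0.75)(0.25) = 0.375, \qquad G_{\text{split}} = \tfrac{1}{2}\bigl[2(0.9)(0.1)\bigr] + \tfrac{1}{2}\bigl[2(0.6)(0.4)\bigr] = 0.09 + 0.24 = 0.33 < 0.375$$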

3. Trees vs Linear Models

If there is a highly non-linear, complex relationship between the features and the response, decision trees (CART) may outperform classical approaches

  • however, if the true relationship is close to linear, a linear model will likely perform better (a rough comparison sketch follows)
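A rough scikit-learn sketch of this comparison on synthetic data; the two data-generating functions (one linear, one step-like) and the pruning level `ccp_alpha=0.01` are arbitrary choices for illustration, not a statement about real data sets.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(300, 2))

# Case 1: truly linear response -> linear regression should do better
y_lin = 2 * X[:, 0] - X[:, 1] + rng.normal(0, 0.5, 300)
# Case 2: highly non-linear, step-like response -> a tree should do better
y_step = np.where((X[:, 0] > 0) ^ (X[:, 1] > 0), 5.0, -5.0) + rng.normal(0, 0.5, 300)

for name, y in [("linear truth", y_lin), ("non-linear truth", y_step)]:
    for model in (LinearRegression(),
                  DecisionTreeRegressor(random_state=0, ccp_alpha=0.01)):
        mse = -cross_val_score(model, X, y, cv=5,
                               scoring="neg_mean_squared_error").mean()
        print(f"{name:16s} {type(model).__name__:21s} CV MSE = {mse:.2f}")
```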


4. Pros & Cons of Trees

「Advantages:」

  1. Easier to interpret than Linear Regression
  2. More closely mirror human decision making
  3. Trees can be displayed graphically
  4. Easily handle qualitative predictors without needing to create dummy variables

「Limitations」:

  1. Trees can be very non-robust:
  • a small change in the data can cause a large change in the final estimated tree
  2. Trees generally do not have the same level of predictive accuracy as other regression and classification methods
  • Looking ahead: by aggregating many decision trees, bagging, random forests and boosting can improve accuracy, at the expense of some loss in interpretability

5. Reference

An Introduction to Statistical Learning, with Applications in R (Springer, 2013)
