# Regression Tree

## 1. Introduction

In the AI era, machine learning algorithms have become a focus of both research and application. The two most popular families of algorithms today are arguably neural networks (CNN, RNN, LSTM, etc.) and tree-based methods (random forests, GBDT, XGBoost, etc.), and the foundation of all tree-based methods is the decision tree. Because decision trees are easy to understand, easy to build, and fast, they are widely used in statistics, data mining, and machine learning. Learning decision trees is therefore an essential step on the road to machine learning.

Classification tree analysis applies when the predicted outcome is the class to which the data belongs.

Regression tree analysis applies when the predicted outcome can be considered a real number (e.g. the price of a house, or a patient's length of stay in a hospital).

## 2. Regression Trees

The term Classification And Regression Tree (CART) analysis is an umbrella term used to refer to both of the above procedures, first introduced by Breiman et al. Trees used for regression and trees used for classification have some similarities - but also some differences, such as the procedure used to determine where to split.

### 2.1 Overview of the Principle

A regression tree partitions the input space by repeatedly choosing a splitting variable $j$ and a split point $s$, which divide the data into two regions $R_1(j,s)=\{x \mid x^{(j)} \le s\}$ and $R_2(j,s)=\{x \mid x^{(j)} > s\}$. Under squared-error loss, the optimal pair $(j,s)$ solves

$$\min_{j,s}\left[\min_{c_1}\sum_{x_i \in R_1(j,s)}\left(y_i-c_1\right)^2+\min_{c_2}\sum_{x_i \in R_2(j,s)}\left(y_i-c_2\right)^2\right]$$

where the two inner minima are attained by the region averages, $\hat{c}_m=\operatorname{ave}(y_i \mid x_i \in R_m)$.

### 2.2 A Simple Example

Consider the following training data with a single input variable $x$:

| x | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|------|------|------|------|------|------|------|------|------|------|
| y | 5.56 | 5.70 | 5.91 | 6.40 | 6.80 | 7.05 | 8.90 | 8.70 | 9.00 | 9.05 |

1. Choose the optimal splitting variable $j$ and the optimal split point $s$

Since there is only one input variable here, the splitting variable $j$ is fixed and only the split point $s$ must be chosen:

$$\min_{j,s}\left[\min_{c_1}\sum_{x_i \in R_1(j,s)}\left(y_i-c_1\right)^2+\min_{c_2}\sum_{x_i \in R_2(j,s)}\left(y_i-c_2\right)^2\right]$$

For example, at $s=1.5$: $c_1=5.56$ and $c_2=\frac{1}{9}(5.7+5.91+6.4+6.8+7.05+8.9+8.7+9+9.05)=7.50$. Repeating this for every candidate split point gives the table below:

| s | 1.5 | 2.5 | 3.5 | 4.5 | 5.5 | 6.5 | 7.5 | 8.5 | 9.5 |
|-------|------|------|------|------|------|------|------|------|------|
| $c_1$ | 5.56 | 5.63 | 5.72 | 5.89 | 6.07 | 6.24 | 6.62 | 6.88 | 7.11 |
| $c_2$ | 7.50 | 7.73 | 7.99 | 8.25 | 8.54 | 8.91 | 8.92 | 9.03 | 9.05 |

Substituting the values of $c_1, c_2$ into the criterion above yields $m(s)$, the total squared error at split point $s$; for example, $m(1.5)=0+15.72=15.72$. Similarly, we obtain the table below:

| s | 1.5 | 2.5 | 3.5 | 4.5 | 5.5 | 6.5 | 7.5 | 8.5 | 9.5 |
|------|-------|-------|------|------|------|------|------|-------|-------|
| m(s) | 15.72 | 12.07 | 8.36 | 5.78 | 3.91 | 1.93 | 8.01 | 11.73 | 15.74 |

$m(s)$ is smallest at $s=6.5$, so the first split point is $s=6.5$.
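The $m(s)$ values above can be reproduced with a short script. This is a minimal sketch for this one-dimensional example; the helper names `sse` and `m` are made up here, and the last digit may differ from the table by ±0.01 because the table rounds the intermediate $c_1, c_2$ values first.

```python
# Training data from the example above.
xs = list(range(1, 11))
ys = [5.56, 5.7, 5.91, 6.4, 6.8, 7.05, 8.9, 8.7, 9, 9.05]

def sse(vals):
    """Sum of squared deviations from the mean (the inner minimum
    of the split criterion, attained at c = mean of the region)."""
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals)

def m(s):
    """Total squared error when splitting the data at x = s."""
    left = [y for x, y in zip(xs, ys) if x <= s]
    right = [y for x, y in zip(xs, ys) if x > s]
    return sse(left) + sse(right)

splits = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5]
table = {s: round(m(s), 2) for s in splits}
best = min(splits, key=m)   # the split point with minimal m(s)
```

Printing `table` reproduces the m(s) row, and `best` comes out as 6.5, matching the choice made above.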

2. Partition the regions with the chosen pair $(j,s)$ and determine the output values

The two regions are $R_1=\{1,2,3,4,5,6\}$ and $R_2=\{7,8,9,10\}$, with output values $c_m=\operatorname{ave}(y_i \mid x_i \in R_m)$: $c_1=6.24$, $c_2=8.91$.

3. Repeat steps 1 and 2 on each of the two subregions

Continue splitting $R_1$:

| x | 1 | 2 | 3 | 4 | 5 | 6 |
|---|------|------|------|------|------|------|
| y | 5.56 | 5.70 | 5.91 | 6.40 | 6.80 | 7.05 |

Taking the candidate split points $[1.5, 2.5, 3.5, 4.5, 5.5]$, the output value $c$ for each region is given below:

| s | 1.5 | 2.5 | 3.5 | 4.5 | 5.5 |
|-------|------|------|------|------|------|
| $c_1$ | 5.56 | 5.63 | 5.72 | 5.89 | 6.07 |
| $c_2$ | 6.37 | 6.54 | 6.75 | 6.93 | 7.05 |

Compute $m(s)$:

| s | 1.5 | 2.5 | 3.5 | 4.5 | 5.5 |
|------|--------|-------|--------|--------|--------|
| m(s) | 1.3087 | 0.754 | 0.2771 | 0.4368 | 1.0644 |

$m(s)$ attains its minimum at $s=3.5$, so $R_1$ is split at 3.5.

The subsequent splits proceed in the same way and are not spelled out here.
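The full recursive procedure of steps 1-3 can be written in a few lines. This is an illustrative sketch, not a reference implementation: `min_size` is a hypothetical stopping rule (stop splitting once a region could not leave at least `min_size` points on each side), chosen here so that the recursion stops exactly where the worked example does.

```python
def sse(vals):
    """Sum of squared deviations from the mean of vals."""
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals)

def build_tree(xs, ys, min_size=3):
    """Recursively split (xs, ys); leaves store the region mean.
    min_size is a hypothetical stopping rule for this sketch."""
    if len(ys) < 2 * min_size:                      # region too small: make a leaf
        return {"value": round(sum(ys) / len(ys), 2)}
    best_s, best_err = None, float("inf")
    for a, b in zip(xs, xs[1:]):
        s = (a + b) / 2                             # candidate split: midpoint
        left = [y for x, y in zip(xs, ys) if x <= s]
        right = [y for x, y in zip(xs, ys) if x > s]
        err = sse(left) + sse(right)                # m(s) from the example
        if err < best_err:
            best_s, best_err = s, err
    lpairs = [(x, y) for x, y in zip(xs, ys) if x <= best_s]
    rpairs = [(x, y) for x, y in zip(xs, ys) if x > best_s]
    return {
        "split": best_s,
        "left": build_tree([x for x, _ in lpairs], [y for _, y in lpairs], min_size),
        "right": build_tree([x for x, _ in rpairs], [y for _, y in rpairs], min_size),
    }

xs = list(range(1, 11))
ys = [5.56, 5.7, 5.91, 6.4, 6.8, 7.05, 8.9, 8.7, 9, 9.05]
tree = build_tree(xs, ys)
```

On the data above this recovers the splits $s=6.5$ and then $s=3.5$ from the example, with leaf values 5.72, 6.75, and 8.91.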

4. Generate the regression tree

Suppose splitting stops once 3 regions have been generated; the resulting regression tree is:

$$T(x)=\begin{cases}5.72, & x \le 3.5\\ 6.75, & 3.5 < x \le 6.5\\ 8.91, & x > 6.5\end{cases}$$
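Written out as plain code, this three-leaf tree is just a nested conditional; a sketch:

```python
def predict(x):
    """Evaluate the three-leaf regression tree T from the example."""
    if x <= 3.5:
        return 5.72        # mean of y for x in {1, 2, 3}
    elif x <= 6.5:
        return 6.75        # mean of y for x in {4, 5, 6}
    else:
        return 8.91        # mean of y for x in {7, 8, 9, 10}
```

For instance, `predict(2)` returns 5.72 and `predict(8)` returns 8.91.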

## 3. Summary

CART grows a regression tree greedily: at each step it scans the splitting variables $j$ and split points $s$, picks the pair that minimizes the total squared error of the two resulting regions, sets each region's output to the average of its targets, and recurses on the subregions until a stopping condition is met. The worked example above shows exactly this procedure producing a three-leaf tree.
