# Gradient Boosted Decision Trees (GBDT)

## The Boosting Tree Model

At step $m$, the parameters of the new tree are chosen to minimize the loss of the updated model:

$$\theta_{m}=\arg\min_{\theta}\sum_{i=1}^{N}L\big(y_{i},F_{m-1}(\mathbf{x}_{i})+f(\mathbf{x}_{i};\theta)\big)$$

For squared-error loss, the objective simplifies:

$$L\big(y,F_{m-1}(\mathbf{x})+f(\mathbf{x};\theta)\big)=\big[y-F_{m-1}(\mathbf{x})-f(\mathbf{x};\theta)\big]^{2}=\big[r-f(\mathbf{x};\theta)\big]^{2}$$

where $r=y-F_{m-1}(\mathbf{x})$ is the residual of the current model on the data. Hence, for regression, the boosting tree algorithm only needs to fit the residuals of the current model.

For $m=1,2,...,M$:

1. Compute the residuals $r_{mi}=y_{i}-F_{m-1}(\mathbf{x}_{i}),\ i=1,2,...,N$.
2. Fit a regression tree to the residuals, obtaining $f(\mathbf{x};\theta_{m})$.
3. Update $F_{m}(\mathbf{x})=F_{m-1}(\mathbf{x})+f(\mathbf{x};\theta_{m})$.
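The residual-fitting loop can be sketched in plain Python, using a depth-1 regression tree (a stump found by exhaustive split search) as the base learner. All function names here are illustrative, not from any library:

```python
def fit_stump(x, r):
    """Fit a depth-1 regression tree to targets r by exhaustive split search."""
    best = None
    for s in sorted(set(x))[1:]:  # candidate thresholds between data points
        left = [ri for xi, ri in zip(x, r) if xi < s]
        right = [ri for xi, ri in zip(x, r) if xi >= s]
        c1, c2 = sum(left) / len(left), sum(right) / len(right)
        loss = (sum((ri - c1) ** 2 for ri in left)
                + sum((ri - c2) ** 2 for ri in right))
        if best is None or loss < best[0]:
            best = (loss, s, c1, c2)
    _, s, c1, c2 = best
    return lambda xi, s=s, c1=c1, c2=c2: c1 if xi < s else c2

def boost(x, y, M):
    """Boosting tree for regression: each round fits a stump to the residuals."""
    trees = []
    for _ in range(M):
        # Step 1: residuals of the current model F_{m-1}
        r = [yi - sum(t(xi) for t in trees) for xi, yi in zip(x, y)]
        # Step 2: fit a regression tree to the residuals
        trees.append(fit_stump(x, r))
        # Step 3 is implicit: F_m = F_{m-1} + f_m is the sum over all trees
    return lambda xi: sum(t(xi) for t in trees)
```

On the ten-point dataset below, one round leaves a squared training loss of 1.93, and further rounds drive it down steadily.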

Consider the following training data:

| $x$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| $y$ | 5.56 | 5.70 | 5.91 | 6.40 | 6.80 | 7.05 | 8.90 | 8.70 | 9.00 | 9.05 |

For a candidate split point $s$, the data is partitioned into $R_{1}=\{x \mid x<s\}$ and $R_{2}=\{x \mid x\ge s\}$, with split loss

$$m(s)=\min_{c_{1}}\sum_{x_{i}\in R_{1}}(y_{i}-c_{1})^{2}+\min_{c_{2}}\sum_{x_{i}\in R_{2}}(y_{i}-c_{2})^{2}$$

where the optimal constants are the region means:

$$c_{1}=\frac{1}{N_{1}}\sum_{x_{i}\in R_{1}}y_{i},\qquad c_{2}=\frac{1}{N_{2}}\sum_{x_{i}\in R_{2}}y_{i}$$

| $s$ | 1.5 | 2.5 | 3.5 | 4.5 | 5.5 | 6.5 | 7.5 | 8.5 | 9.5 |
|---|---|---|---|---|---|---|---|---|---|
| $m(s)$ | 15.72 | 12.07 | 8.36 | 5.78 | 3.91 | 1.93 | 8.01 | 11.73 | 15.74 |
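The $m(s)$ values in this table can be reproduced with a short script (variable names are illustrative):

```python
x = list(range(1, 11))
y = [5.56, 5.70, 5.91, 6.40, 6.80, 7.05, 8.90, 8.70, 9.00, 9.05]

def sse(vals):
    """Sum of squared deviations from the mean -- the inner minimum over c."""
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals)

def m(s):
    """Split loss m(s): optimal squared error over the two regions R1, R2."""
    r1 = [yi for xi, yi in zip(x, y) if xi < s]
    r2 = [yi for xi, yi in zip(x, y) if xi >= s]
    return sse(r1) + sse(r2)

for s in (1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5):
    print(f"m({s}) = {m(s):.2f}")
```

The minimum is attained at $s=6.5$ with $m(6.5)=1.93$.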

Since $m(s)$ attains its minimum $1.93$ at $s=6.5$, the first tree is

$$f_{1}(x)=\begin{cases}6.24, & x<6.5\\ 8.91, & x\ge 6.5\end{cases}$$

The residuals of $f_{1}(x)$ on the training data, $r_{2i}=y_{i}-f_{1}(x_{i})$, are shown in the table below:

| $x$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| $r_{2i}$ | -0.68 | -0.54 | -0.33 | 0.16 | 0.56 | 0.81 | -0.01 | -0.21 | 0.09 | 0.14 |

The squared loss of $F_{1}(x)$ on the training data is

$$L(y,F_{1}(x))=\sum_{i=1}^{10}\big(y_{i}-F_{1}(x_{i})\big)^{2}=1.93$$

Continuing in the same way, each subsequent tree is fit to the residuals of the previous model:

$$f_{2}(x)=\begin{cases}-0.52, & x<3.5\\ 0.22, & x\ge 3.5\end{cases},\qquad f_{3}(x)=\begin{cases}0.15, & x<6.5\\ -0.22, & x\ge 6.5\end{cases}$$

$$f_{4}(x)=\begin{cases}-0.16, & x<4.5\\ 0.11, & x\ge 4.5\end{cases},\qquad f_{5}(x)=\begin{cases}0.07, & x<6.5\\ -0.11, & x\ge 6.5\end{cases},\qquad f_{6}(x)=\begin{cases}-0.15, & x<2.5\\ 0.04, & x\ge 2.5\end{cases}$$

$$F_{6}(x)=F_{5}(x)+f_{6}(x)=f_{1}(x)+f_{2}(x)+\cdots+f_{6}(x)=\begin{cases}5.63, & x<2.5\\ 5.82, & 2.5\le x<3.5\\ 6.56, & 3.5\le x<4.5\\ 6.83, & 4.5\le x<6.5\\ 8.95, & x\ge 6.5\end{cases}$$
The squared loss of $F_{6}(x)$ on the training data is

$$L(y,F_{6}(x))=\sum_{i=1}^{10}L\big(y_{i},F_{6}(x_{i})\big)=0.17$$
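This figure is easy to check by evaluating the piecewise-constant $F_{6}$ directly (a quick verification script, not library code):

```python
y = [5.56, 5.70, 5.91, 6.40, 6.80, 7.05, 8.90, 8.70, 9.00, 9.05]

def F6(x):
    """Piecewise-constant F_6 from the worked example."""
    if x < 2.5:
        return 5.63
    if x < 3.5:
        return 5.82
    if x < 4.5:
        return 6.56
    if x < 6.5:
        return 6.83
    return 8.95

# x runs over 1..10, matching the training table
loss = sum((yi - F6(xi)) ** 2 for xi, yi in enumerate(y, start=1))
print(round(loss, 2))  # 0.17
```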

## Gradient Boosting

For a general differentiable loss, the residual is replaced by the negative gradient of the loss with respect to the current model's prediction, denoted $-g_{m}(\mathbf{x}_{i})$:

$$-g_{m}(\mathbf{x}_{i})=-\left[\frac{\partial L(y_{i},F(\mathbf{x}_{i}))}{\partial F(\mathbf{x}_{i})}\right]_{F(\mathbf{x})=F_{m-1}(\mathbf{x})}$$

The base learner $h(\mathbf{x};a)$ is fit to the negative gradient by least squares:

$$a_{m}=\arg\min_{a,\beta}\sum_{i=1}^{N}\big[-g_{m}(\mathbf{x}_{i})-\beta h(\mathbf{x}_{i};a)\big]^{2}$$

There is no need to keep the scale $\beta_{m}$, because the step size $\rho_{m}$ found by the line search below may be better:

$$\rho_{m}=\arg\min_{\rho}\sum_{i=1}^{N}L\big(y_{i},F_{m-1}(\mathbf{x}_{i})+\rho h(\mathbf{x}_{i};a_{m})\big)$$

The model is then updated:

$$F_{m}(\mathbf{x}_{i})=F_{m-1}(\mathbf{x}_{i})+\rho_{m}h(\mathbf{x}_{i};a_{m})$$
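One gradient-boosting round can be sketched as follows. For squared loss the negative gradient is exactly the residual and the line search has the closed form $\rho=\langle r,h\rangle/\langle h,h\rangle$, so this reduces to the residual-fitting algorithm; swapping in another differentiable loss only changes `neg_gradient` and the line search. All names are illustrative:

```python
def neg_gradient(y, F):
    """Negative gradient of squared loss (1/2)(y-F)^2 w.r.t. F: the residual."""
    return [yi - Fi for yi, Fi in zip(y, F)]

def fit_stump(x, target):
    """Least-squares depth-1 tree fit to the negative gradient (illustrative)."""
    best = None
    for s in sorted(set(x))[1:]:
        left = [t for xi, t in zip(x, target) if xi < s]
        right = [t for xi, t in zip(x, target) if xi >= s]
        c1, c2 = sum(left) / len(left), sum(right) / len(right)
        loss = (sum((t - c1) ** 2 for t in left)
                + sum((t - c2) ** 2 for t in right))
        if best is None or loss < best[0]:
            best = (loss, s, c1, c2)
    _, s, c1, c2 = best
    return lambda xi, s=s, c1=c1, c2=c2: c1 if xi < s else c2

def line_search_rho(g, h_vals):
    """Optimal step for squared loss: rho = <g, h> / <h, h>."""
    num = sum(gi * hi for gi, hi in zip(g, h_vals))
    den = sum(hi * hi for hi in h_vals)
    return num / den if den else 0.0

def gradient_boost_step(x, y, F):
    """One round: fit h to -g_m, line-search rho, update F_m = F_{m-1} + rho*h."""
    g = neg_gradient(y, F)
    h = fit_stump(x, g)
    h_vals = [h(xi) for xi in x]
    rho = line_search_rho(g, h_vals)
    return [Fi + rho * hv for Fi, hv in zip(F, h_vals)]
```

Each call returns the updated predictions $F_{m}(\mathbf{x}_{i})$ on the training points; iterating the step drives the training loss down monotonically.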
