Decision Tree Ensemble
The ensemble's final prediction is the sum of every tree's prediction:

$$\hat{y}_i=\sum_{k=1}^n f_k(x_i)$$
Additive training builds the model one tree at a time:

$$\hat{y}_i^{(0)}=0$$
$$\hat{y}_i^{(1)}=f_1(x_i)=\hat{y}_i^{(0)}+f_1(x_i)$$
$$\hat{y}_i^{(2)}=f_1(x_i)+f_2(x_i)=\hat{y}_i^{(1)}+f_2(x_i)$$
$$\cdots$$
$$\hat{y}_i^{(t)}=\sum_{k=1}^t f_k(x_i)=\hat{y}_i^{(t-1)}+f_t(x_i)$$
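The additive scheme can be sketched in a few lines of Python; the `trees` list of constant predictors below is a purely illustrative stand-in for fitted regression trees:

```python
# Additive prediction: the ensemble output is the running sum of each
# tree's prediction, i.e. y_hat^(t) = y_hat^(t-1) + f_t(x).
def ensemble_predict(trees, x):
    y_hat = 0.0                # y_hat^(0) = 0
    for f in trees:
        y_hat += f(x)          # add the t-th tree's prediction
    return y_hat

# Toy "trees": constant predictors standing in for real regression trees.
trees = [lambda x: 0.5, lambda x: 0.25, lambda x: 0.25]
print(ensemble_predict(trees, x=None))  # 1.0
```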
The Tree Boosting Algorithm
Define an objective function (loss plus regularization term) and optimize it. When training the t-th tree, the first t-1 trees are already fixed:

$$\begin{aligned} obj^{(t)} &=\sum_{i=1}^n l(y_i,\hat{y}_i^{(t)})+\sum_{i=1}^t\omega(f_i)\\ &=\sum_{i=1}^n l(y_i,\hat{y}_i^{(t-1)}+f_t(x_i))+\sum_{i=1}^{t-1}\omega(f_i)+\omega(f_t)\\ &=\sum_{i=1}^n l(y_i,\hat{y}_i^{(t-1)}+f_t(x_i))+\omega(f_t)+C_1 \end{aligned}$$

where $C_1=\sum_{i=1}^{t-1}\omega(f_i)$ is constant with respect to $f_t$.
The second-order Taylor expansion:

$$f(x+\Delta x)\approx f(x)+f'(x)\Delta x+\frac{1}{2}f''(x)\Delta x^2$$

Treating $\hat{y}_i^{(t-1)}$ as $x$ and $f_t(x_i)$ as $\Delta x$, the loss expands to:

$$l(y_i,\hat{y}_i^{(t-1)}+f_t(x_i))\approx l(y_i,\hat{y}_i^{(t-1)})+g_i f_t(x_i)+\frac{1}{2}h_i f_t^2(x_i)$$

where $g_i=\dfrac{\partial l(y_i,\hat{y}_i^{(t-1)})}{\partial\hat{y}_i^{(t-1)}}$ and $h_i=\dfrac{\partial^2 l(y_i,\hat{y}_i^{(t-1)})}{\partial(\hat{y}_i^{(t-1)})^2}$.
Substituting the expansion into the objective and simplifying:

$$\begin{aligned} obj^{(t)} &=\sum_{i=1}^n l(y_i,\hat{y}_i^{(t-1)}+f_t(x_i))+\omega(f_t)+C_1\\ &\approx\sum_{i=1}^n l(y_i,\hat{y}_i^{(t-1)})+\sum_{i=1}^n\left[g_i f_t(x_i)+\frac{1}{2}h_i f_t^2(x_i)\right]+\omega(f_t)+C_1\\ &=\sum_{i=1}^n\left[g_i f_t(x_i)+\frac{1}{2}h_i f_t^2(x_i)\right]+\omega(f_t)+C_2 \end{aligned}$$

Dropping the constant $C_2$ (which absorbs $C_1$ and the fixed term $\sum_i l(y_i,\hat{y}_i^{(t-1)})$) leaves:

$$obj^{(t)}=\sum_{i=1}^n\left[g_i f_t(x_i)+\frac{1}{2}h_i f_t^2(x_i)\right]+\omega(f_t)$$
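To make $g_i$ and $h_i$ concrete, take the squared-error loss $l(y,\hat{y})=(y-\hat{y})^2$ (an illustrative choice; any twice-differentiable loss works the same way). Its first and second derivatives with respect to the prediction are $2(\hat{y}-y)$ and $2$:

```python
# Gradient g_i and hessian h_i of the squared-error loss
# l(y, y_hat) = (y - y_hat)^2, evaluated at the previous prediction.
def squared_loss_grad_hess(y, y_hat_prev):
    g = 2.0 * (y_hat_prev - y)   # dl/dy_hat at y_hat^(t-1)
    h = 2.0                      # d2l/dy_hat^2 is constant for this loss
    return g, h

g, h = squared_loss_grad_hess(y=3.0, y_hat_prev=2.5)
print(g, h)  # -1.0 2.0
```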
Model Complexity (the regularization term)
Write the t-th tree as:

$$f_t(x)=\omega_{q(x)},\quad \omega\in\mathbb{R}^T$$

where $\omega$ is the vector of leaf scores, $q(x)$ is the function that maps each sample to a leaf, and $T$ is the number of leaves. Define the regularization term of the t-th tree as:

$$\omega(f_t)=\gamma T+\frac{1}{2}\lambda\sum_{j=1}^T\omega_j^2$$
Let $I_j$ be the set of samples assigned to the j-th leaf:

$$I_j=\{\,i\mid q(x_i)=j\,\}$$
Substituting into the objective (converting the sum over samples into a sum over leaves) and simplifying:

$$\begin{aligned} obj^{(t)} &=\sum_{i=1}^n\left[g_i f_t(x_i)+\frac{1}{2}h_i f_t^2(x_i)\right]+\omega(f_t)\\ &=\sum_{i=1}^n\left[g_i\omega_{q(x_i)}+\frac{1}{2}h_i\omega_{q(x_i)}^2\right]+\gamma T+\frac{1}{2}\lambda\sum_{j=1}^T\omega_j^2\\ &=\sum_{j=1}^T\left[\Bigl(\sum_{i\in I_j}g_i\Bigr)\omega_j+\frac{1}{2}\Bigl(\sum_{i\in I_j}h_i+\lambda\Bigr)\omega_j^2\right]+\gamma T\\ &=\sum_{j=1}^T\left[G_j\omega_j+\frac{1}{2}(H_j+\lambda)\omega_j^2\right]+\gamma T \end{aligned}$$

where $G_j=\sum_{i\in I_j}g_i$ and $H_j=\sum_{i\in I_j}h_i$.
Each term $G_j\omega_j+\frac{1}{2}(H_j+\lambda)\omega_j^2$ is a quadratic in $\omega_j$, so the minimizing leaf score $\omega_j^*$ and the minimum objective $obj^*$ follow in closed form:

$$\omega_j^*=-\frac{G_j}{H_j+\lambda}$$

$$obj^*=-\frac{1}{2}\sum_{j=1}^T\frac{G_j^2}{H_j+\lambda}+\gamma T$$

This yields the score of every leaf of the t-th tree. Moreover, $obj^*$ measures the quality of the tree structure: the smaller $obj^*$, the better the tree.
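A minimal sketch of these closed forms, assuming the per-leaf sums $G_j$, $H_j$ have already been computed (the function names are illustrative):

```python
# Closed-form leaf score and structure score from the derivation above.
def leaf_weight(G, H, lam):
    # w_j* = -G_j / (H_j + lambda)
    return -G / (H + lam)

def structure_score(Gs, Hs, lam, gamma):
    # obj* = -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T
    T = len(Gs)
    return -0.5 * sum(G * G / (H + lam) for G, H in zip(Gs, Hs)) + gamma * T

print(leaf_weight(G=4.0, H=2.0, lam=1.0))            # = -4/3
print(structure_score([4.0, -2.0], [2.0, 2.0], lam=1.0, gamma=0.5))
```

Note how $\lambda$ shrinks every leaf score toward zero and $\gamma$ penalizes each additional leaf, so both act as regularizers on the tree.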
Learning the Tree Structure (building the t-th tree)
Building the tree requires two decisions at each node:

1. which feature to split on;
2. given the feature, which split point to use.

Suppose a node splits its samples into a left (L) and a right (R) child. Before the split there is a single leaf (T=1), and the objective score is:

$$obj_{before}=-\frac{1}{2}\frac{(G_L+G_R)^2}{H_L+H_R+\lambda}+\gamma$$
After the split, T=2 and the score becomes:

$$obj_{after}=-\frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda}+\frac{G_R^2}{H_R+\lambda}\right]+2\gamma$$
The gain (reduction in the objective) from this split is:

$$Gain=obj_{before}-obj_{after}=\frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda}+\frac{G_R^2}{H_R+\lambda}-\frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right]-\gamma$$
Ranking candidate splits by Gain selects the best feature and the best split point.
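The exact greedy search this implies can be sketched for a single feature: sort the samples by feature value, sweep once while maintaining prefix sums $G_L$, $H_L$, and score every candidate threshold with the Gain formula (the midpoint thresholding and the function name are illustrative choices, not part of the derivation):

```python
# Exact greedy split finding on one feature using
# Gain = 1/2 * [G_L^2/(H_L+l) + G_R^2/(H_R+l) - (G_L+G_R)^2/(H_L+H_R+l)] - gamma
def best_split(x, g, h, lam=1.0, gamma=0.0):
    """Return (best_gain, threshold) for one feature given per-sample g, h."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    G, H = sum(g), sum(h)                      # totals: G_L + G_R, H_L + H_R
    GL = HL = 0.0
    best_gain, best_thr = 0.0, None
    for rank, i in enumerate(order[:-1]):      # cannot split after last sample
        GL += g[i]
        HL += h[i]
        GR, HR = G - GL, H - HL
        gain = 0.5 * (GL * GL / (HL + lam)
                      + GR * GR / (HR + lam)
                      - G * G / (H + lam)) - gamma
        if gain > best_gain:
            nxt = order[rank + 1]
            best_gain = gain
            best_thr = (x[i] + x[nxt]) / 2.0   # midway between adjacent values
    return best_gain, best_thr

# Gradients push the first two samples down and the last two up,
# so the best cut falls between x=2 and x=3.
print(best_split([1, 2, 3, 4], [-2, -2, 2, 2], [1, 1, 1, 1]))  # (5.333..., 2.5)
```

A full implementation would also skip thresholds between equal feature values and repeat the sweep over every feature; this sketch keeps only the core scan.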