Boosting Tree
$f(\bold{x}) = f_M(\bold{x}) = \sum_{m=1}^M h(\bold{x};\theta_m)$
where $h(\bold{x};\theta_m)$ is the $m$-th tree, $\theta_m$ its parameters, and $M$ the number of trees.
Forward stagewise algorithm:
- Initialize the boosting tree as $f_0(\bold{x}) = 0$
- The model at step $m$ is $f_m(\bold{x}) = f_{m-1}(\bold{x}) + h_m(\bold{x};\theta_m)$, where $\theta_m$ are the parameters of the $m$-th tree to be solved for
- Given samples $(\bold{x}_i, \bar{y}_i)$, solve for $\theta_m$: $\hat{\theta}_m = \arg\min_{\theta_m} E[L(y, f_m(\bold{x}))] = \arg\min_{\theta_m} \sum_{i=1}^N L(\bar{y}_i, f_m(\bold{x}_i))$
For squared-error loss the derivation is:
$L(\bar{y}, f_m(\bold{x})) = (\bar{y} - f_m(\bold{x}))^2$
$= (\bar{y} - f_{m-1}(\bold{x}) - h_m(\bold{x};\theta_m))^2 = (r - h_m(\bold{x};\theta_m))^2, \quad r = \bar{y} - f_{m-1}(\bold{x})$
where $r$ is the residual left by the previous model: $h_m(\cdot)$ fits the residual of $f_{m-1}(\cdot)$.
Regression boosting tree algorithm
Input: training set $D = \{(\bold{x}_1, \bar{y}_1), \ldots, (\bold{x}_n, \bar{y}_n)\}$
Output: $f_M(\bold{x})$
Algorithm:
- Initialize $f_0(\bold{x}) = 0$
- For $m = 1, 2, \ldots, M$:
- Compute residuals $r_{mi} = \bar{y}_i - f_{m-1}(\bold{x}_i)$ and build the training set $\{(\bold{x}_1, r_{m1}), \ldots, (\bold{x}_n, r_{mn})\}$
- Fit a new regression tree to these residuals, obtaining $h_m(\bold{x};\theta_m)$
- Update the model: $f_m(\bold{x}) = f_{m-1}(\bold{x}) + h_m(\bold{x};\theta_m)$
- Output the model $f_M(\bold{x}) = \sum_{m=1}^M h(\bold{x};\theta_m)$
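The residual-fitting loop above can be sketched in plain Python. As an illustration only, the base learner here is a one-split decision stump (`fit_stump` is a hypothetical helper, not a library function); real boosting trees use deeper regression trees.

```python
def fit_stump(xs, rs):
    """Fit a one-split regression stump to targets rs, minimizing squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, rs) if x <= t]
        right = [r for x, r in zip(xs, rs) if x > t]
        if not left or not right:
            continue
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv

def boost(xs, ys, M):
    """Forward stagewise boosting: each new stump fits the current residuals."""
    trees = []
    preds = [0.0] * len(xs)                               # f_0(x) = 0
    for _ in range(M):
        residuals = [y - p for y, p in zip(ys, preds)]    # r_mi = y_i - f_{m-1}(x_i)
        h = fit_stump(xs, residuals)
        trees.append(h)
        preds = [p + h(x) for p, x in zip(preds, xs)]     # f_m = f_{m-1} + h_m
    return lambda x: sum(h(x) for h in trees)             # f_M(x) = sum of trees

xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 1.0, 3.0, 3.0]
f = boost(xs, ys, M=5)
```

On this toy data the first stump already drives the residuals to zero, so later stumps contribute nothing.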
Gradient Boosting Tree
A first-order Taylor expansion gives an approximation of the loss:
$L(\bar{y}, f_m(\bold{x})) = L(\bar{y}, f_{m-1}(\bold{x}) + h_m(\bold{x};\theta_m))$
$\approx L(\bar{y}, f_{m-1}(\bold{x})) + \frac{\partial L(\bar{y}, f_{m-1}(\bold{x}))}{\partial f_{m-1}(\bold{x})} h_m(\bold{x};\theta_m)$
Taking $h_m(\bold{x};\theta_m) = -\frac{\partial L(\bar{y}, f_{m-1}(\bold{x}))}{\partial f_{m-1}(\bold{x})}$ makes the loss decrease.
Gradient boosting tree algorithm:
Input: training set $D = \{(\bold{x}_1, \bar{y}_1), \ldots, (\bold{x}_n, \bar{y}_n)\}$
Output: $f_M(\bold{x})$
Algorithm:
- Initialize $f_0(\bold{x}) = 0$
- For $m = 1, 2, \ldots, M$:
- Compute the negative gradients $r_{mi} = -\frac{\partial L(\bar{y}_i, f_{m-1}(\bold{x}_i))}{\partial f_{m-1}(\bold{x}_i)}$ and build the set $\{(\bold{x}_1, r_{m1}), \ldots, (\bold{x}_n, r_{mn})\}$
- Fit a regression tree to the $r_{mi}$, obtaining $h_m(\bold{x};\theta_m)$
- Update the model: $f_m(\bold{x}) = f_{m-1}(\bold{x}) + h_m(\bold{x};\theta_m)$
- Output the model $f_M(\bold{x}) = \sum_{m=1}^M h_m(\bold{x};\theta_m)$
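Relative to the residual-fitting algorithm, only the fitting target changes: each step fits the negative gradient of the chosen loss. A minimal sketch (function names are illustrative; the $1/2$ factor on the squared loss is a convention so the negative gradient equals the residual exactly):

```python
# Negative gradients r_mi = -dL/df for two common losses. With
# L = (y - f)^2 / 2 the negative gradient is exactly the residual y - f,
# so the residual-fitting boosting tree is a special case of gradient
# boosting; with L = |y - f| it is the sign of the residual.

def neg_gradient_squared(y, f):
    return y - f

def neg_gradient_absolute(y, f):
    return 1.0 if y > f else (-1.0 if y < f else 0.0)
```

For the absolute loss, each tree then fits targets in $\{-1, 0, +1\}$ instead of raw residuals, which is what makes gradient boosting robust to outliers.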
xgboost
xgboost defines the loss at step $m$ as:
$L(\theta_m) = \sum_{i=1}^N L(\bar{y}_i, f_m(\bold{x}_i)) + \Omega(h_m(\bold{x};\theta_m))$
$= \sum_{i=1}^N L(\bar{y}_i, f_{m-1}(\bold{x}_i) + h_m(\bold{x}_i;\theta_m)) + \gamma T + \frac{1}{2}\lambda\sum_{j=1}^T w_j^2$
where $\Omega(\cdot)$ is the regularization term: $T$ is the number of leaves of the $m$-th tree and $w_j$ is the output value of leaf $j$. The regularizer encourages trees with few leaves whose outputs avoid extreme values.
By the second-order Taylor expansion
$f(x+\Delta x) \simeq f(x) + f'(x)\Delta x + \frac{1}{2} f''(x)\Delta x^2$
we get:
$L(\theta_m) = \sum_{i=1}^N L(\bar{y}_i, f_m(\bold{x}_i)) + \gamma T + \frac{1}{2}\lambda\sum_{j=1}^T w_j^2$
$\approx \sum_{i=1}^N \left[L(\bar{y}_i, f_{m-1}(\bold{x}_i)) + g_i\, h_m(\bold{x}_i;\theta_m) + \frac{1}{2} h_i\, h_m(\bold{x}_i;\theta_m)^2\right] + \gamma T + \frac{1}{2}\lambda\sum_{j=1}^T w_j^2$
where
$\hat{y}_i^{<m-1>} = f_{m-1}(\bold{x}_i), \quad g_i = \frac{\partial L(\bar{y}_i, \hat{y}_i^{<m-1>})}{\partial \hat{y}_i^{<m-1>}}, \quad h_i = \frac{\partial^2 L(\bar{y}_i, \hat{y}_i^{<m-1>})}{\partial (\hat{y}_i^{<m-1>})^2}$
Dropping the constant term simplifies this to:
$L(\theta_m) = \sum_{i=1}^N \left[g_i\, h_m(\bold{x}_i;\theta_m) + \frac{1}{2} h_i\, h_m(\bold{x}_i;\theta_m)^2\right] + \gamma T + \frac{1}{2}\lambda\sum_{j=1}^T w_j^2$
$= \sum_{j=1}^T \left[\sum_{\bold{x}_i \in I_j} \left(g_i w_j + \frac{1}{2} h_i w_j^2\right) + \frac{1}{2}\lambda w_j^2\right] + \gamma T$
$= \sum_{j=1}^T \left[w_j \sum_{\bold{x}_i \in I_j} g_i + \frac{1}{2} w_j^2 \left(\sum_{\bold{x}_i \in I_j} h_i + \lambda\right)\right] + \gamma T$
where $\bold{x}_i \in I_j$ means sample $\bold{x}_i$ falls into leaf $j$.
Setting $\frac{\partial L}{\partial w_j} = 0$ gives:
$w_j^* = -\frac{\sum_{\bold{x}_i \in I_j} g_i}{\lambda + \sum_{\bold{x}_i \in I_j} h_i} \quad (1)$
$L(w_j^*) = -\frac{1}{2}\sum_{j=1}^T \frac{\left(\sum_{\bold{x}_i \in I_j} g_i\right)^2}{\lambda + \sum_{\bold{x}_i \in I_j} h_i} + \gamma T \quad (2)$
The loss function determines how a node splits. Suppose some feature and split value partition a node's samples into two sets $I_L, I_R$; the split gain is defined as:
$Gain = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma \quad (3)$
$G_L = \sum_{\bold{x}_i \in I_L} g_i, \quad G_R = \sum_{\bold{x}_i \in I_R} g_i, \quad H_L = \sum_{\bold{x}_i \in I_L} h_i, \quad H_R = \sum_{\bold{x}_i \in I_R} h_i$
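Equations (1) and (3) translate directly into code. A minimal sketch, with `lam` and `gamma` standing in for $\lambda$ and $\gamma$ (the function names are illustrative):

```python
def leaf_weight(G, H, lam):
    # Equation (1): optimal output of a leaf with gradient sum G, hessian sum H
    return -G / (H + lam)

def score(G, H, lam):
    # One leaf's contribution G^2 / (H + lam) to the quality term in (2)
    return G * G / (H + lam)

def split_gain(GL, HL, GR, HR, lam, gamma):
    # Equation (3): improvement from splitting a node into left/right children,
    # minus the complexity cost gamma of adding one more leaf
    return 0.5 * (score(GL, HL, lam) + score(GR, HR, lam)
                  - score(GL + GR, HL + HR, lam)) - gamma
```

With `lam = 0`, `gamma = 0`, and squared-error loss (so every $h_i = 1$), this reduces to the familiar variance-reduction split criterion.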
xgboost algorithm:
Input: training set $\{(\bold{x}_1, y_1), \ldots, (\bold{x}_n, y_n)\}$
Output: $f_M(\bold{x})$
Algorithm:
- Initialize $f_0(\bold{x}) = 0$
- For $m = 1, \ldots, M$:
- Use the split-gain definition (3) to find the best split feature and split value for each node
- Compute the leaf outputs via (1), obtaining the $m$-th tree $h_m(\bold{x};\theta_m)$
- Update the model: $f_m(\bold{x}) = f_{m-1}(\bold{x}) + h_m(\bold{x};\theta_m)$
- Output the model $f_M(\bold{x})$
LambdaMart
LambdaMart plugs the LambdaRank gradients into gradient-boosted regression trees ($s_i$ below is the current model score of item $i$):
$dcg@T = \sum_{i=1}^T \frac{2^{label(i)} - 1}{\log(i+1)}$
$ndcg@T = \frac{dcg@T}{maxDcg@T}$
$\Delta NDCG(i,j) = |ndcg(originalSequence) - ndcg(sequenceAfterSwap(i,j))|$
$\lambda_{ij} = \frac{-\sigma}{1 + e^{\sigma(s_i - s_j)}} |\Delta NDCG(i,j)|$
$\lambda_i = \sum_{j:\, label(i) > label(j)} \lambda_{ij} - \sum_{j:\, label(j) > label(i)} \lambda_{ij}$
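The $\lambda$ definitions above can be sketched for a single ranked list. This is illustrative only: $\sigma = 1$ and the natural log in DCG follow the formulas as written here, and the function names are assumptions, not any library's API.

```python
import math

def dcg(labels):
    # dcg@T with positions starting at 1: sum of (2^label - 1) / log(i + 1)
    return sum((2 ** l - 1) / math.log(i + 2) for i, l in enumerate(labels))

def delta_ndcg(labels, i, j, max_dcg):
    # |ndcg(original) - ndcg(after swapping positions i and j)|
    swapped = list(labels)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return abs(dcg(labels) - dcg(swapped)) / max_dcg

def lambdas(scores, labels, sigma=1.0):
    # Accumulate lambda_i over all pairs, per the sign convention above
    n = len(scores)
    max_dcg = dcg(sorted(labels, reverse=True))   # ideal ordering
    lam = [0.0] * n
    for i in range(n):
        for j in range(n):
            if labels[i] > labels[j]:             # i is the more relevant item
                l_ij = -sigma / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
                l_ij *= delta_ndcg(labels, i, j, max_dcg)
                lam[i] += l_ij
                lam[j] -= l_ij
    return lam
```

For each misordered pair the two items receive equal and opposite $\lambda$ contributions, scaled by how much swapping them would change NDCG; a production implementation batches these sums per query.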
Input: number of trees $M$, training set $\{(\bold{x}_1, label_1), \ldots, (\bold{x}_n, label_n)\}$, learning rate $\eta$
Output: $f_M(\bold{x})$
Algorithm:
- Initialize $f_0(\bold{x}) = 0$
- For $m = 1, \ldots, M$:
- Compute the first derivatives, i.e., the $\lambda$-gradient of each $\bold{x}_i$, denoted $y_i = \lambda_i$, forming $\{(\bold{x}_1, \lambda_1), \ldots, (\bold{x}_n, \lambda_n)\}$
- Compute the second derivatives $w_i = \frac{\partial y_i}{\partial f_{m-1}(\bold{x}_i)}$ for the Newton step below
- Fit a regression tree to the $\lambda_i$, obtaining the tree $h_m(\bold{x})$
- Compute each leaf value $\gamma_{mk} = \frac{\sum_{\bold{x}_i \in I_k} y_i}{\sum_{\bold{x}_i \in I_k} w_i}$
- Update the model: $f_m(\bold{x}) = f_{m-1}(\bold{x}) + \eta\, h_m(\bold{x})$
- Score the data with $f_m(\bold{x}_i)$ and re-sort the training lists by the new scores
- Output the model $f_M(\bold{x})$
To be updated...