LambdaMART

Boosting Tree

$$f(\mathbf{x}) = f_M(\mathbf{x}) = \sum_{m=1}^M h(\mathbf{x};\theta_m)$$

where $h(\mathbf{x};\theta_m)$ is the $m$-th tree, $\theta_m$ are its parameters, and $M$ is the number of trees.

Forward stagewise algorithm:

  • Initialize the boosting tree with $f_0(\mathbf{x}) = 0$.
  • The boosting tree model at step $m$ is $f_m(\mathbf{x}) = f_{m-1}(\mathbf{x}) + h_m(\mathbf{x};\theta_m)$, where $\theta_m$ are the parameters of the $m$-th tree to be solved for.
  • Given samples $(\mathbf{x}_i, \bar{y}_i)$, the parameters are chosen as $\hat{\theta}_m = \arg\min_{\theta_m} E[L(y, f_m(\mathbf{x}))] = \arg\min_{\theta_m} \sum_{i=1}^{N} L(\bar{y}_i, f_m(\mathbf{x}_i))$.

For squared-error loss, the loss function admits the following derivation:

$$L(\bar{y}, f_m(\mathbf{x})) = (\bar{y} - f_m(\mathbf{x}))^2$$

$$= (\bar{y} - f_{m-1}(\mathbf{x}) - h_m(\mathbf{x};\theta_m))^2 = (r - h_m(\mathbf{x};\theta_m))^2, \quad r = \bar{y} - f_{m-1}(\mathbf{x})$$

where $r$ is the residual of the model from the previous step: $h_m(\cdot)$ fits the residual of $f_{m-1}(\cdot)$.

Regression boosting tree algorithm

Input: training set $D=\{(\mathbf{x}_1,\bar{y}_1),\dots,(\mathbf{x}_n,\bar{y}_n)\}$

Output: $f_M(\mathbf{x})$

Algorithm:

  • Initialize $f_0(\mathbf{x}) = 0$.
  • For $m = 1, 2, \dots, M$:
    • Compute the residuals $r_{mi} = \bar{y}_i - f_{m-1}(\mathbf{x}_i)$ and build the training set $\{(\mathbf{x}_1,r_{m1}),\dots,(\mathbf{x}_n,r_{mn})\}$.
    • Fit a new regression tree to these residuals, obtaining $h_m(\mathbf{x};\theta_m)$.
    • Update the model: $f_m(\mathbf{x}) = f_{m-1}(\mathbf{x}) + h_m(\mathbf{x};\theta_m)$.
  • Return the model $f_M(\mathbf{x}) = \sum_{m=1}^M h_m(\mathbf{x};\theta_m)$ (a runnable sketch follows below).
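
A minimal sketch of this loop, assuming scikit-learn's `DecisionTreeRegressor` as the base tree (the depth and tree count are illustrative choices, not part of the original text):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_boosting_tree(X, y, M=100, max_depth=3):
    """Regression boosting tree: tree m fits the residuals of f_{m-1}."""
    trees = []
    f = np.zeros(len(y))                 # f_0(x) = 0
    for _ in range(M):
        r = y - f                        # r_mi = y_i - f_{m-1}(x_i)
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, r)
        trees.append(tree)
        f += tree.predict(X)             # f_m = f_{m-1} + h_m
    return trees

def predict(trees, X):
    """f_M(x) = sum of all tree outputs."""
    return sum(tree.predict(X) for tree in trees)
```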
Gradient Boosting Tree

A first-order Taylor expansion of the loss function gives an approximate representation:

$$L(\bar{y}, f_m(\mathbf{x})) = L(\bar{y}, f_{m-1}(\mathbf{x}) + h_m(\mathbf{x};\theta_m))$$

$$\approx L(\bar{y}, f_{m-1}(\mathbf{x})) + \frac{\partial L(\bar{y}, f_{m-1}(\mathbf{x}))}{\partial f_{m-1}(\mathbf{x})}\, h_m(\mathbf{x};\theta_m)$$

Taking $h_m(\mathbf{x};\theta_m) = -\frac{\partial L(\bar{y}, f_{m-1}(\mathbf{x}))}{\partial f_{m-1}(\mathbf{x})}$, i.e., fitting the negative gradient, makes the loss decrease.
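
To see why, substitute this choice back into the first-order approximation, writing $g = \frac{\partial L(\bar{y}, f_{m-1}(\mathbf{x}))}{\partial f_{m-1}(\mathbf{x})}$ for brevity:

$$L(\bar{y}, f_m(\mathbf{x})) \approx L(\bar{y}, f_{m-1}(\mathbf{x})) + g\cdot(-g) = L(\bar{y}, f_{m-1}(\mathbf{x})) - g^2 \le L(\bar{y}, f_{m-1}(\mathbf{x}))$$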

Gradient boosting tree algorithm

Input: training set $D=\{(\mathbf{x}_1,\bar{y}_1),\dots,(\mathbf{x}_n,\bar{y}_n)\}$

Output: $f_M(\mathbf{x})$

Algorithm:

  • Initialize $f_0(\mathbf{x}) = 0$.
  • For $m = 1, 2, \dots, M$:
    • Compute the negative gradients $r_{mi} = -\frac{\partial L(\bar{y}_i, f_{m-1}(\mathbf{x}_i))}{\partial f_{m-1}(\mathbf{x}_i)}$ and build the set $\{(\mathbf{x}_1,r_{m1}),\dots,(\mathbf{x}_n,r_{mn})\}$.
    • Fit a regression tree to the $r_{mi}$, obtaining $h_m(\mathbf{x};\theta_m)$.
    • Update the model: $f_m(\mathbf{x}) = f_{m-1}(\mathbf{x}) + h_m(\mathbf{x};\theta_m)$.
  • Return the model $f_M(\mathbf{x}) = \sum_{m=1}^M h_m(\mathbf{x};\theta_m)$ (see the sketch below).
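
The loop is the same as before, except the tree now fits negative gradients of an arbitrary differentiable loss. A sketch, using absolute-error loss as an example (its negative gradient is $\mathrm{sign}(y - f)$; the loss choice and the synthetic data are assumptions for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbdt(X, y, neg_gradient, M=100, max_depth=3):
    """Gradient boosting: tree m fits the negative gradient of the loss."""
    trees = []
    f = np.zeros(len(y))                      # f_0(x) = 0
    for _ in range(M):
        r = neg_gradient(y, f)                # r_mi = -dL/df_{m-1}(x_i)
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, r)
        trees.append(tree)
        f += tree.predict(X)                  # f_m = f_{m-1} + h_m
    return trees

# Example: absolute-error loss L = |y - f|, negative gradient sign(y - f).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))
y_train = X_train[:, 0] + rng.normal(scale=0.1, size=200)
trees = fit_gbdt(X_train, y_train, lambda y, f: np.sign(y - f))
```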
xgboost

The loss function of xgboost at step $m$ is defined as:

$$L(\theta_m) = \sum_{i=1}^{N} L(\bar{y}_i, f_m(\mathbf{x}_i)) + \Omega(h_m(\mathbf{x};\theta_m))$$

$$= \sum_{i=1}^{N} L(\bar{y}_i, f_{m-1}(\mathbf{x}_i) + h_m(\mathbf{x}_i;\theta_m)) + \gamma T + \frac{1}{2}\lambda\sum_{j=1}^T w_j^2$$

where $\Omega(\cdot)$ is the regularization term: $T$ is the number of leaf nodes of the $m$-th tree and $w_j$ is the output value of leaf node $j$. The regularizer prefers trees with few leaves whose leaf outputs avoid extreme values.

By the second-order Taylor expansion

$$f(x+\Delta x) \simeq f(x) + f'(x)\,\Delta x + \frac{1}{2} f''(x)\,\Delta x^2$$

we obtain:

$$L(\theta_m) = \sum_{i=1}^{N} L(\bar{y}_i, f_m(\mathbf{x}_i)) + \gamma T + \frac{1}{2}\lambda\sum_{j=1}^T w_j^2$$

$$\approx \sum_{i=1}^N\left[L(\bar{y}_i, f_{m-1}(\mathbf{x}_i)) + g_i\, h_m(\mathbf{x}_i;\theta_m) + \frac{1}{2} h_i\, h_m(\mathbf{x}_i;\theta_m)^2\right] + \gamma T + \frac{1}{2}\lambda\sum_{j=1}^T w_j^2$$

where

$$\hat{y}_i^{\langle m-1\rangle} = f_{m-1}(\mathbf{x}_i), \quad g_i = \frac{\partial L(\bar{y}_i, \hat{y}_i^{\langle m-1\rangle})}{\partial \hat{y}_i^{\langle m-1\rangle}}, \quad h_i = \frac{\partial^2 L(\bar{y}_i, \hat{y}_i^{\langle m-1\rangle})}{\partial \big(\hat{y}_i^{\langle m-1\rangle}\big)^2}$$
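
For example, with squared-error loss $L(\bar{y}_i, \hat{y}_i) = \frac{1}{2}(\bar{y}_i - \hat{y}_i)^2$ (the $\frac{1}{2}$ factor is a common convention, assumed here, not part of the original text):

$$g_i = \hat{y}_i^{\langle m-1\rangle} - \bar{y}_i, \qquad h_i = 1$$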

Dropping terms that are constant with respect to $\theta_m$ simplifies this to:

$$L(\theta_m) = \sum_{i=1}^N\left[g_i\, h_m(\mathbf{x}_i;\theta_m) + \frac{1}{2} h_i\, h_m(\mathbf{x}_i;\theta_m)^2\right] + \gamma T + \frac{1}{2}\lambda\sum_{j=1}^T w_j^2$$

$$= \sum_{j=1}^T\left[\sum_{\mathbf{x}_i\in I_j}\left(g_i w_j + \frac{1}{2} h_i w_j^2\right) + \frac{1}{2}\lambda w_j^2\right] + \gamma T$$

$$= \sum_{j=1}^T\left[w_j\sum_{\mathbf{x}_i\in I_j} g_i + \frac{1}{2} w_j^2\Big(\sum_{\mathbf{x}_i\in I_j} h_i + \lambda\Big)\right] + \gamma T$$

where $\mathbf{x}_i \in I_j$ means that $\mathbf{x}_i$ falls into the $j$-th leaf node; the sum over samples has been regrouped as a sum over leaves, since $h_m(\mathbf{x}_i;\theta_m) = w_j$ for every $\mathbf{x}_i \in I_j$.

Setting

$$\frac{\partial L}{\partial w_j} = \sum_{\mathbf{x}_i\in I_j} g_i + w_j\Big(\sum_{\mathbf{x}_i\in I_j} h_i + \lambda\Big) = 0$$

gives:

$$w_j^* = -\frac{\sum_{\mathbf{x}_i\in I_j} g_i}{\lambda + \sum_{\mathbf{x}_i\in I_j} h_i} \quad (1)$$

$$L(w^*) = -\frac{1}{2}\sum_{j=1}^T \frac{\big(\sum_{\mathbf{x}_i\in I_j} g_i\big)^2}{\lambda + \sum_{\mathbf{x}_i\in I_j} h_i} + \gamma T \quad (2)$$

The node-splitting rule follows from this loss function. Suppose splitting on some feature and feature value divides a node's samples into two sets $I_L$ and $I_R$; define the gain of the split as:

$$Gain = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma \quad (3)$$

$$G_L = \sum_{\mathbf{x}_i\in I_L} g_i, \quad G_R = \sum_{\mathbf{x}_i\in I_R} g_i, \quad H_L = \sum_{\mathbf{x}_i\in I_L} h_i, \quad H_R = \sum_{\mathbf{x}_i\in I_R} h_i$$
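
A small sketch of formulas (1) and (3) in code; here `g` and `h` are the per-sample gradient and hessian arrays for a node's samples, and `left_mask` is an assumed boolean mask marking the samples sent to $I_L$ (all names are illustrative, not from the original text):

```python
import numpy as np

def leaf_weight(g, h, lam=1.0):
    """Optimal leaf output w* from equation (1)."""
    return -g.sum() / (lam + h.sum())

def split_gain(g, h, left_mask, lam=1.0, gamma=0.0):
    """Gain of splitting a node into left/right children, equation (3)."""
    GL, HL = g[left_mask].sum(), h[left_mask].sum()
    GR, HR = g[~left_mask].sum(), h[~left_mask].sum()
    return 0.5 * (GL**2 / (HL + lam)
                  + GR**2 / (HR + lam)
                  - (GL + GR)**2 / (HL + HR + lam)) - gamma
```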

xgboost algorithm

Input: training set $\{(\mathbf{x}_1,y_1),\dots,(\mathbf{x}_n,y_n)\}$

Output: $f_M(\mathbf{x})$

Algorithm:

  • Initialize $f_0(\mathbf{x}) = 0$.
  • For $m = 1, \dots, M$:
    • Grow the tree by repeatedly choosing the split feature and split value that maximize the split gain defined in (3).
    • Compute the leaf outputs by (1), obtaining the $m$-th tree $h_m(\mathbf{x};\theta_m)$.
    • Update the model: $f_m(\mathbf{x}) = f_{m-1}(\mathbf{x}) + h_m(\mathbf{x};\theta_m)$.
  • Return the model $f_M(\mathbf{x})$ (a library-level usage sketch follows).
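
In practice this loop is provided by the xgboost package itself; a minimal usage sketch, where `reg_lambda` and `gamma` correspond to the $\lambda$ and $\gamma$ above (parameter values and data are illustrative assumptions):

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=500)

# reg_lambda = lambda (L2 penalty on leaf weights w_j),
# gamma = gamma (per-leaf penalty, the gamma * T term).
model = xgb.XGBRegressor(
    n_estimators=100,   # M, the number of trees
    max_depth=3,
    reg_lambda=1.0,
    gamma=0.1,
)
model.fit(X, y)
preds = model.predict(X)
```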
LambdaMART

LambdaMART combines the pairwise λ-gradients of LambdaRank with MART (gradient boosted regression trees). The ranking metric and λ definitions are:

$$dcg@T = \sum_{i=1}^T \frac{2^{label(i)} - 1}{\log(i+1)}$$

$$ndcg@T = \frac{dcg@T}{maxDcg@T}$$

$$\Delta NDCG(i,j) = |ndcg(originalSequence) - ndcg(sequenceAfterSwap(i,j))|$$

$$\lambda_{ij} = \frac{-\sigma}{1 + e^{\sigma(s_i - s_j)}}\,|\Delta NDCG(i,j)|$$

$$\lambda_i = \sum_{j:\, label(i) > label(j)} \lambda_{ij} - \sum_{j:\, label(j) > label(i)} \lambda_{ij}$$

where $s_i$ denotes the model's current score for document $i$ and $label(i)$ its relevance label.
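
A sketch of the λ computation for a single query, implementing the formulas above directly (function names are my own; the $O(n^2)$ pair loop is written for clarity, not efficiency):

```python
import numpy as np

def dcg_at_T(ranked_labels, T):
    """dcg@T of relevance labels listed in ranked order."""
    labels = np.asarray(ranked_labels[:T], dtype=float)
    i = np.arange(1, len(labels) + 1)
    # the base of the log cancels in the ndcg ratio below
    return np.sum((2.0 ** labels - 1) / np.log(i + 1))

def ndcg_at_T(ranked_labels, T):
    max_dcg = dcg_at_T(np.sort(ranked_labels)[::-1], T)   # maxDcg@T
    return dcg_at_T(ranked_labels, T) / max_dcg if max_dcg > 0 else 0.0

def lambdas_for_query(scores, labels, T=10, sigma=1.0):
    """Compute lambda_i for one query's documents."""
    scores, labels = np.asarray(scores, float), np.asarray(labels)
    order = np.argsort(-scores)                 # current ranking by score
    ranked_labels = labels[order]
    base = ndcg_at_T(ranked_labels, T)
    lam = np.zeros(len(scores))
    for a in range(len(scores)):
        for b in range(len(scores)):
            i, j = order[a], order[b]           # docs at ranks a and b
            if labels[i] <= labels[j]:
                continue                        # keep pairs with label(i) > label(j)
            swapped = ranked_labels.copy()
            swapped[a], swapped[b] = swapped[b], swapped[a]
            delta = abs(base - ndcg_at_T(swapped, T))
            lam_ij = -sigma / (1 + np.exp(sigma * (scores[i] - scores[j]))) * delta
            lam[i] += lam_ij                    # first sum in the lambda_i formula
            lam[j] -= lam_ij                    # second sum, with roles reversed
    return lam
```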

Input: number of trees $M$, training set $\{(\mathbf{x}_1,label_1),\dots,(\mathbf{x}_n,label_n)\}$, learning rate $\eta$

Output: $f_M(\mathbf{x})$

Algorithm:

  • Initialize $f_0(\mathbf{x}) = 0$.
  • For $m = 1, \dots, M$:
    • Compute the first derivatives, i.e., the λ-gradient of each $\mathbf{x}_i$, denoted $y_i = \lambda_i$, forming $\{(\mathbf{x}_1,\lambda_1),\dots,(\mathbf{x}_n,\lambda_n)\}$.
    • Compute the second derivatives $w_i = \frac{\partial y_i}{\partial f_{m-1}(\mathbf{x}_i)}$, used below for the Newton-step leaf values.
    • Fit a regression tree to the $\lambda_i$, obtaining the tree $h_m(\mathbf{x})$.
    • Compute each leaf node's value as $\gamma_{mk} = \frac{\sum_{\mathbf{x}_i\in I_k} y_i}{\sum_{\mathbf{x}_i\in I_k} w_i}$.
    • Update the model: $f_m(\mathbf{x}) = f_{m-1}(\mathbf{x}) + \eta\, h_m(\mathbf{x})$.
    • Update each sample's score to $f_m(\mathbf{x}_i)$ and re-rank the training lists by the new scores.
  • Return the model $f_M(\mathbf{x})$ (a library-level sketch follows).
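
Off the shelf, LambdaMART-style training is available in gradient-boosting libraries; a minimal sketch assuming LightGBM's `LGBMRanker` with its `lambdarank` objective (data shapes and hyperparameters are illustrative assumptions):

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
labels = rng.integers(0, 5, size=1000)   # graded relevance labels
group = [100] * 10                       # 10 queries with 100 docs each

ranker = lgb.LGBMRanker(
    objective="lambdarank",
    n_estimators=100,    # M, the number of trees
    learning_rate=0.1,   # eta
)
ranker.fit(X, labels, group=group)
scores = ranker.predict(X)               # f_M(x), used to rank documents
```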

To be updated...
