GBDT: Basic Principles

Why build multiple trees

Gradient descent in function space

Let the samples be $(x^j, y^j),\ j=1,2,\cdots,n$.
For regression, the loss function is

$$L=\sum_{j=1}^n l(x^j,y^j,f^j)=L(f^1,f^2,\cdots,f^n)=L(F)$$

For binary classification, the loss function is

$$L=\sum_{j=1}^n l(x^j,y^j,\sigma(f^j))=L(f^1,f^2,\cdots,f^n)=L(F)$$

Here $F$ denotes the vector of function values $(f^1,f^2,\cdots,f^n)$, where $f^j$ is the model's output at $x^j$.
The goal is to find

$$F=\underset F{\operatorname{argmin}}\,L(F)$$

Gradient descent, starting from an initial $F_0$, gives

$$F_1=F_0-\eta\nabla L|_{F=F_0},\quad\cdots,\quad F_i=F_{i-1}-\eta\nabla L|_{F=F_{i-1}}$$

Unrolling the updates:

$$F_m=F_0+\eta\sum_{i=0}^{m-1}\left(-\nabla L|_{F=F_i}\right)$$
Taking each coordinate separately (only the $j$-th summand of $L$ depends on $f^j$, so $\partial L/\partial f^j=\partial l/\partial f^j$):

$$f^j=f_0^j+\eta\sum_{i=0}^{m-1}-\frac{\partial L}{\partial f^j}\Big|_{f^j=f_i^j}=f_0^j+\eta\sum_{i=0}^{m-1}-\frac{\partial l}{\partial f^j}\Big|_{f^j=f_i^j},\qquad j=1,2,\cdots,n$$
View the coordinates $f^j$ together as a single function $f$, defined (so far only at the training points) by

$$f(x)=\begin{cases}f^1,&x=x^1\\f^2,&x=x^2\\\cdots\\f^n,&x=x^n\end{cases}$$
Then

$$f=f_0+\eta\sum_{i=0}^{m-1}-\frac{\partial l}{\partial f}\Big|_{f=f_i}$$
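On the training set this is just ordinary gradient descent on the vector of predictions. A minimal numeric sketch of that view, with made-up data and a made-up step size:

```python
import numpy as np

# Gradient descent in "function space": on the training set, F is just the
# prediction vector (f^1, ..., f^n), and we descend on L(F) directly.
rng = np.random.default_rng(0)
y = rng.normal(size=8)            # hypothetical regression targets

eta = 0.1
F = np.zeros_like(y)              # F_0: the initial function (all zeros here)
for _ in range(200):
    grad = 2 * (F - y)            # ∇L for the squared loss l = (y^j - f^j)^2
    F = F - eta * grad            # F_i = F_{i-1} - η ∇L|_{F=F_{i-1}}

loss = np.sum((y - F) ** 2)       # shrinks toward zero as F approaches y
```

The point of GBDT is that each correction term in this sum is then approximated by a tree rather than stored per sample.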

Define

$$T_0=f_0,\quad T_1=-\frac{\partial l}{\partial f}\Big|_{f=f_0},\quad\cdots,\quad T_i=-\frac{\partial l}{\partial f}\Big|_{f=f_{i-1}}$$

so that

$$f=T_0+\eta T_1+\cdots+\eta T_m$$

$T_0$ is the initial function and can be chosen freely.
Each base learner $T_i$ then takes the pointwise values

$$T_i(x)=\begin{cases}-\frac{\partial l}{\partial f}\big|_{f=f_{i-1}^1},&x=x^1\\-\frac{\partial l}{\partial f}\big|_{f=f_{i-1}^2},&x=x^2\\\cdots\\-\frac{\partial l}{\partial f}\big|_{f=f_{i-1}^n},&x=x^n\end{cases}$$

This is exactly what a regression tree can be trained to fit.
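Putting the pieces together: each $T_i$ fits the pointwise negative gradient at the training samples. A self-contained sketch using depth-1 trees (stumps) as base learners; the 1-D dataset, learning rate, and round count are illustrative, not from the original:

```python
import numpy as np

def fit_stump(x, r):
    """Fit a depth-1 regression tree: the best single split on x against r."""
    best = (np.inf, x[0], r.mean(), r.mean())
    for s in (x[1:] + x[:-1]) / 2.0:          # candidate split points
        left, right = r[x <= s], r[x > s]
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if err < best[0]:
            best = (err, s, left.mean(), right.mean())
    _, s, cl, cr = best
    return lambda q: np.where(q <= s, cl, cr)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 40))
y = np.where(x < 0.5, 1.0, 3.0) + rng.normal(0.0, 0.05, 40)

eta = 0.5
f = np.full_like(y, y.mean())                 # T_0: the constant initial function
for _ in range(50):
    residual = y - f                          # ∝ -∂l/∂f for the squared loss
    tree = fit_stump(x, residual)             # T_i fits the negative gradient
    f = f + eta * tree(x)                     # f ← f + η T_i

mse = np.mean((y - f) ** 2)                   # drops toward the noise level
```

Real implementations differ mainly in the base learner (deeper trees, more features) and in regularization, not in this loop.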

Deriving the residuals

For regression, take the squared loss $l=(y^j-f)^2$. Then

$$-\frac{\partial l}{\partial f}\Big|_{f=f_i}=2(y^j-f_i^j)$$

so the tree fitted at the current model $f_i$ targets (twice) the residual $y^j-f_i^j$ at each $x^j$.
For binary classification, take the cross-entropy loss $l=-\left[y^j\ln(\sigma(f))+(1-y^j)\ln(1-\sigma(f))\right]$. Then

$$-\frac{\partial l}{\partial f}\Big|_{f=f_i}=y^j-\sigma(f_i^j)$$

so the tree targets the residual between the label and the current predicted probability.
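A finite-difference sanity check of both formulas; the sample values are arbitrary, and the classification loss carries its usual leading minus sign as above:

```python
import numpy as np

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
f, y, eps = 0.7, 1.0, 1e-6

# Squared loss: -dl/df should equal 2(y - f).
l_sq = lambda f: (y - f) ** 2
num = -(l_sq(f + eps) - l_sq(f - eps)) / (2 * eps)
assert abs(num - 2 * (y - f)) < 1e-6

# Cross-entropy loss l = -[y ln σ(f) + (1-y) ln(1-σ(f))]:
# -dl/df should equal y - σ(f).
l_ce = lambda f: -(y * np.log(sigmoid(f)) + (1 - y) * np.log1p(-sigmoid(f)))
num = -(l_ce(f + eps) - l_ce(f - eps)) / (2 * eps)
assert abs(num - (y - sigmoid(f))) < 1e-6
```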

Building the trees

The first tree

The first tree is a single root node: it outputs the same constant $c$ for every $x^j$.
For regression:

$$T_0=c=\underset c{\operatorname{argmin}}\,L(c)=\underset c{\operatorname{argmin}}\sum_{j=1}^n l(x^j,y^j,c)=\underset c{\operatorname{argmin}}\sum_{j=1}^n(c-y^j)^2$$

Setting the derivative to zero,

$$\frac{\partial L}{\partial c}=2\sum_{j=1}^n(c-y^j)=0$$

gives

$$c=\frac1n\sum_{j=1}^n y^j$$
For binary classification:

$$T_0=c=\underset c{\operatorname{argmin}}\,L(c)=\underset c{\operatorname{argmin}}\sum_{j=1}^n l(x^j,y^j,\sigma(c))=\underset c{\operatorname{argmin}}\sum_{j=1}^n-\left[y^j\ln(\sigma(c))+(1-y^j)\ln(1-\sigma(c))\right]$$

Setting the derivative to zero,

$$\frac{\partial L}{\partial c}=\sum_{j=1}^n(\sigma(c)-y^j)=0$$

gives

$$c=\sigma^{-1}\left(\frac1n\sum_{j=1}^n y^j\right)$$

i.e. the logit of the positive-class fraction.
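Both closed forms in a few lines, on made-up labels:

```python
import numpy as np

y_reg = np.array([1.0, 2.0, 4.0, 5.0])
c_reg = y_reg.mean()                       # regression: c = (1/n) Σ y^j

y_bin = np.array([0.0, 1.0, 1.0, 1.0])
p = y_bin.mean()                           # fraction of positive labels
c_bin = np.log(p / (1.0 - p))              # σ^{-1}(p): the logit

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
# sigmoid(c_bin) recovers p, confirming c = σ^{-1}(mean of y)
```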

Subsequent trees

How to split

We use CART regression trees: for each feature, sort the samples by that feature, sweep over the candidate split points, and partition the samples into two groups $L$ and $R$ so as to minimize

$$G=\min_{c_1,c_2}\left(\sum_{y^j\in L}(y^j-c_1)^2+\sum_{y^j\in R}(y^j-c_2)^2\right)$$

The inner minimum is attained at the group means:

$$c_1=\frac1{|L|}\sum_{y^j\in L}y^j,\qquad c_2=\frac1{|R|}\sum_{y^j\in R}y^j$$

Choose the feature and split point that make $G$ smallest.
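The sweep can reuse prefix sums, since at the optimal group mean $\sum_{y\in S}(y-\bar y)^2=\sum y^2-(\sum y)^2/|S|$. A sketch for one feature whose samples are already sorted; the targets are made up:

```python
import numpy as np

# Targets, ordered by the (already sorted) feature values.
y = np.array([1.0, 1.2, 0.9, 3.1, 2.9, 3.0])
n = len(y)

pre = np.cumsum(y)                     # prefix sums Σ_{j<=k} y^j
total, total_sq = pre[-1], np.sum(y ** 2)

best_G, best_k = np.inf, None
for k in range(1, n):                  # split into L = y[:k], R = y[k:]
    sL, sR = pre[k - 1], total - pre[k - 1]
    # G with c1, c2 at the group means, via Σ(y-c)² = Σy² - (Σy)²/|group|
    G = total_sq - sL ** 2 / k - sR ** 2 / (n - k)
    if G < best_G:
        best_G, best_k = G, k
# best_k is the size of the left group at the best split
```

This makes one feature's full sweep O(n) after the O(n log n) sort, which is why CART-style GBDT sorts each feature once up front.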

How to set the leaf values

How is the leaf value $c$ computed?
For regression, for a leaf containing the sample set $R$:

$$c=\underset c{\operatorname{argmin}}\sum_{y^j\in R}l(x^j,y^j,f_i^j+c)=\underset c{\operatorname{argmin}}\sum_{y^j\in R}(f_i^j+c-y^j)^2$$

which gives the mean residual

$$c=\frac1{|R|}\sum_{y^j\in R}(y^j-f_i^j)$$
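In code, with hypothetical leaf contents:

```python
import numpy as np

y = np.array([2.0, 3.0, 5.0])     # targets of the samples falling in leaf R
f = np.array([1.5, 2.5, 4.0])     # current predictions f_i^j for those samples
c = np.mean(y - f)                # leaf value: the mean residual over R
```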
For binary classification:

$$c=\underset c{\operatorname{argmin}}\sum_{y^j\in R}l(x^j,y^j,\sigma(f_i^j+c))=\underset c{\operatorname{argmin}}\sum_{y^j\in R}-\left[y^j\ln(\sigma(f_i^j+c))+(1-y^j)\ln(1-\sigma(f_i^j+c))\right]$$

Setting the derivative to zero:

$$g(c)=\frac{\partial L}{\partial c}=\sum_{j\in R}\left(\sigma(f_i^j+c)-y^j\right)=0$$

There is no closed-form solution, but

$$g'(c)=\frac{\partial^2L}{\partial c^2}=\sum_{j\in R}\sigma(f_i^j+c)(1-\sigma(f_i^j+c))>0$$

so $g$ is strictly increasing and crosses zero at most once.

[Figure: the increasing curve $g(c)$, with its tangent at $c=0$ intersecting the horizontal axis near the root.]

We approximate the root with a single Newton step from $c=0$. The tangent line at $c=0$ is $y=kc+b$ with

$$k=g'(0)=\sum_{j\in R}\sigma(f_i^j)(1-\sigma(f_i^j)),\qquad b=g(0)=\sum_{j\in R}\left(\sigma(f_i^j)-y^j\right)$$

and it crosses zero at

$$\widehat c=-\frac bk=\frac{\sum_{j\in R}\left(y^j-\sigma(f_i^j)\right)}{\sum_{j\in R}\sigma(f_i^j)(1-\sigma(f_i^j))}$$
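The Newton-step leaf value, checked to actually decrease the leaf's loss; the leaf labels and scores are made up:

```python
import numpy as np

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

y = np.array([1.0, 0.0, 1.0])     # labels of the samples in this leaf
f = np.array([0.4, -0.2, 1.1])    # their current scores f_i^j

p = sigmoid(f)
c_hat = np.sum(y - p) / np.sum(p * (1.0 - p))   # one Newton step from c = 0

def leaf_loss(c):
    """Cross-entropy loss of this leaf if we add the constant c."""
    q = sigmoid(f + c)
    return -np.sum(y * np.log(q) + (1.0 - y) * np.log1p(-q))
# leaf_loss(c_hat) < leaf_loss(0.0): the step improves the leaf
```

Because the leaf's loss in $c$ is convex, one Newton step is usually accepted as the leaf value rather than iterating to the exact root.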

Multi-class classification

With $K\geq3$ classes, GBDT builds $K$ sequences of trees, one per class, each growing on its own gradient; the gradients are coupled through the shared softmax.
$$L=\sum_{j=1}^n l(x^j,y^j,s(f^j))=-\sum_{j=1}^n\sum_{k=1}^K y^{jk}\ln\frac{e^{f_i^{jk}}}{\sum_{t=1}^K e^{f_i^{jt}}}$$

where $s$ is the softmax and $y^{jk}$ is the one-hot label. If $y^{jk}=1$ (only the $k$-th term of sample $j$'s inner sum is nonzero), the chain rule gives

$$-\frac{\partial L}{\partial f_i^{jk}}=\frac{\sum_{t=1}^K e^{f_i^{jt}}}{e^{f_i^{jk}}}\cdot\frac{e^{f_i^{jk}}\sum_{t=1}^K e^{f_i^{jt}}-\left(e^{f_i^{jk}}\right)^2}{\left(\sum_{t=1}^K e^{f_i^{jt}}\right)^2}=1-\frac{e^{f_i^{jk}}}{\sum_{t=1}^K e^{f_i^{jt}}}$$

If $y^{jk}=0$ (the nonzero term belongs to some other class $k'$):

$$-\frac{\partial L}{\partial f_i^{jk}}=-\frac{e^{f_i^{jk}}}{\sum_{t=1}^K e^{f_i^{jt}}}$$

Combining the two cases, the negative gradient is

$$-\frac{\partial L}{\partial f_i^{jk}}=y^{jk}-\frac{e^{f_i^{jk}}}{\sum_{t=1}^K e^{f_i^{jt}}}$$
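The combined formula, vectorized; the scores and one-hot labels are made up, and the max-subtraction is the usual numerical stabilization of the softmax:

```python
import numpy as np

f = np.array([[2.0, 0.5, -1.0],
              [0.1, 0.2, 0.3]])            # scores f^{jk}: n = 2 samples, K = 3
y = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])            # one-hot labels y^{jk}

z = f - f.max(axis=1, keepdims=True)       # stabilize before exponentiating
p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)   # softmax probabilities
residual = y - p                           # negative gradient per (j, k)
# Each row of residual sums to zero, which is how the K per-class
# tree sequences end up with coupled targets.
```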
Formulas edited with: http://www.wiris.com/editor/demo/en/developers
