Summary of Boosted Tree and Factorization Machines

Boosted Tree

Definition:

$$\widehat y=\sum_{k=1}^{K}f_k(x)$$

In which $f_k(x)$ is one of $K$ regression trees.
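
As a minimal sketch (assuming a hypothetical `trees` list whose elements expose a scikit-learn-style `predict` method), the ensemble prediction is just the sum of the individual trees' outputs:

```python
import numpy as np

def ensemble_predict(trees, X):
    """y_hat = sum_k f_k(x): sum the outputs of the K regression trees."""
    return np.sum([tree.predict(X) for tree in trees], axis=0)
```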

Loss:

$$Loss=\sum_{i=1}^{n}L(y_i,\widehat y_i)$$

Add some regularization:

$$Loss=\sum_{i=1}^{n}L(y_i,\widehat y_i)+\sum_{k=1}^{K}\Omega(f_k)$$

Additive Training:

$$\widehat y^{(0)} = 0$$

$$\widehat y^{(t)} = \widehat y^{(t-1)} + f_t(x)$$
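
This update is the whole training loop. A sketch under stated assumptions (a hypothetical `fit_tree(X, y, y_hat)` that fits the next tree against the current predictions and returns an object with a `predict` method):

```python
import numpy as np

def boosted_fit(X, y, K, fit_tree):
    """Additive training: start from y_hat^{(0)} = 0 and add one tree per round."""
    y_hat = np.zeros(len(y))
    trees = []
    for t in range(K):
        f_t = fit_tree(X, y, y_hat)     # fit round-t tree from current predictions
        trees.append(f_t)
        y_hat = y_hat + f_t.predict(X)  # y_hat^{(t)} = y_hat^{(t-1)} + f_t(x)
    return trees
```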

$$Loss^{(t)}=\sum_{i=1}^{n}L(y_i,\widehat y_i^{(t)})+\sum_{k=1}^{t}\Omega(f_k)$$

$$=\sum_{i=1}^{n}L(y_i,\widehat y_i^{(t-1)}+f_t(x_i))+\sum_{k=1}^{t-1}\Omega(f_k)+\Omega(f_t)$$

$$=\sum_{i=1}^{n}L(y_i,\widehat y_i^{(t-1)}+f_t(x_i))+\Omega(f_t)+C$$

Taking a second-order Taylor expansion of $L$ around $\widehat y_i^{(t-1)}$:

$$\approx\sum_{i=1}^{n}\left[L(y_i,\widehat y_i^{(t-1)})+f_t(x_i)\frac{\partial L}{\partial \widehat y_i^{(t-1)}}+\frac{1}{2}f_t^{2}(x_i)\frac{\partial^2 L}{\partial (\widehat y_i^{(t-1)})^2}\right]+\Omega(f_t)+C$$

With $G_i=\frac{\partial L}{\partial \widehat y_i^{(t-1)}}$ and $H_i=\frac{\partial^2 L}{\partial (\widehat y_i^{(t-1)})^2}$:

$$=\sum_{i=1}^{n}\left[L(y_i,\widehat y_i^{(t-1)})+f_t(x_i)G_i+\frac{1}{2}f_t^{2}(x_i)H_i\right]+\Omega(f_t)+C$$

Since $L(y_i,\widehat y_i^{(t-1)})$ does not depend on $f_t$, it can be absorbed into a constant $C'$:

$$=\sum_{i=1}^{n}\left[f_t(x_i)G_i+\frac{1}{2}f_t^{2}(x_i)H_i\right]+\Omega(f_t)+C'$$

So the loss at step $t$ is:

$$Loss^{(t)}=\sum_{i=1}^{n}\left[f_t(x_i)G_i+\frac{1}{2}f_t^{2}(x_i)H_i\right]+\Omega(f_t)+C'$$
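
For a concrete example (squared loss, chosen here only for illustration), $G_i$ and $H_i$ have simple closed forms:

```python
import numpy as np

def squared_loss_grad_hess(y, y_hat_prev):
    """G_i and H_i for L = (y - y_hat)^2, evaluated at y_hat^{(t-1)}."""
    G = 2.0 * (y_hat_prev - y)   # dL/dy_hat
    H = np.full_like(y, 2.0)     # d^2L/dy_hat^2 (constant for squared loss)
    return G, H
```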

Parameterize the tree by its leaf weights:

$$f_t(x)=w_{q(x)},\quad q:R^d\rightarrow\{1,2,\dots,M\},\quad w_j \in R$$

$$\Omega(f)=\frac{1}{2}\lambda\sum_{j=1}^{M}w_j^{2}+\gamma M$$

where $q$ maps a sample to one of the tree's $M$ leaves and $w_j$ is the weight of leaf $j$.
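
Concretely (a sketch with a hypothetical leaf-index function `q` returning values in $\{1,\dots,M\}$ and `w` a NumPy array of the $M$ leaf weights):

```python
import numpy as np

def tree_predict(q, w, X):
    """f_t(x) = w[q(x)]: route each sample to a leaf, then look up its weight."""
    leaf = np.array([q(x) for x in X])  # q: R^d -> {1, ..., M}
    return w[leaf - 1]                  # w stores the M leaf weights (0-based)
```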

We get:

$$Loss^{(t)}=\sum_{i=1}^{n}\left[f_t(x_i)G_i+\frac{1}{2}f_t^{2}(x_i)H_i\right]+\Omega(f_t)+C'$$

$$=\sum_{i=1}^{n}\left[w_{q(x_i)}G_i+\frac{1}{2}w_{q(x_i)}^2H_i\right]+\frac{1}{2}\lambda\sum_{j=1}^{M}w_j^{2}+\gamma M+C'$$

With $I_j=\{i \mid q(x_i)=j\}$, the set of samples that fall into leaf $j$:

$$\sum_{i=1}^{n}w_{q(x_i)}G_i=\sum_{j=1}^{M}\left[w_j\sum_{i\in I_j}G_i\right]$$

$$\sum_{i=1}^{n}\frac{1}{2}w_{q(x_i)}^2H_i=\sum_{j=1}^{M}w_j^2\sum_{i\in I_j}\frac{1}{2}H_i$$

So:

$$Loss^{(t)}=\sum_{j=1}^{M}\left[w_j\sum_{i\in I_j}G_i+w_j^2\sum_{i\in I_j}\frac{1}{2}H_i+\frac{1}{2}\lambda w_j^2\right]+\gamma M+C'$$

$$=\sum_{j=1}^{M}\left[w_j\sum_{i\in I_j}G_i+\frac{1}{2}w_j^2\left(\lambda+\sum_{i\in I_j}H_i\right)\right]+\gamma M+C'$$

With $G_j'=\sum_{i\in I_j}G_i$ and $H_j'=\sum_{i\in I_j}H_i$:

$$Loss^{(t)}=\sum_{j=1}^{M}\left[w_jG_j'+\frac{1}{2}w_j^2(\lambda+H_j')\right]+\gamma M+C'$$

Finally:

$$w_j^*=\underset{w_j}{\operatorname{argmin}}\left(w_jG_j'+\frac{1}{2}w_j^2(\lambda+H_j')\right)=-\frac{G_j'}{\lambda+H_j'}$$

$$Obj^{(t)}=\min\left(Loss^{(t)}\right)=-\frac{1}{2}\sum_{j=1}^{M}\frac{G_j'^2}{H_j'+\lambda}+\gamma M+C'$$

So at each iteration $t$ of training, greedily search for a regression tree $f_t(x_i)=w_{q(x_i)}$ with $w_j=-\frac{G_j'}{\lambda+H_j'}$ that minimizes $Obj^{(t)}$, and add it to the model.
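
A sketch tying these pieces together (assuming per-sample arrays `G`, `H` from the current round and 0-based `leaf` indices, i.e. $q(x_i)-1$):

```python
import numpy as np

def leaf_weights_and_obj(G, H, leaf, M, lam, gamma):
    """Optimal leaf weights w_j* = -G'_j/(lambda + H'_j) and Obj (up to C')."""
    Gp = np.bincount(leaf, weights=G, minlength=M)  # G'_j = sum_{i in I_j} G_i
    Hp = np.bincount(leaf, weights=H, minlength=M)  # H'_j = sum_{i in I_j} H_i
    w_star = -Gp / (lam + Hp)
    obj = -0.5 * np.sum(Gp**2 / (Hp + lam)) + gamma * M
    return w_star, obj
```

In practice the tree structure $q$ itself is grown greedily, split by split, using $Obj^{(t)}$ as the score to compare candidate splits.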

Factorization Machines

$$y=w_0+\sum_{i=1}^{n}w_ix_i+\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\langle\boldsymbol{v}_i,\boldsymbol{v}_j\rangle x_ix_j$$

In which

$$w_0\in R,\quad \boldsymbol{w}\in R^{n},\quad \boldsymbol{v}\in R^{n\times k}$$

$$\langle\boldsymbol{v}_i,\boldsymbol{v}_j\rangle=\sum_{l=1}^{k}v_{il}v_{jl}$$
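
A minimal sketch of the prediction for one sample, using the standard $O(nk)$ reformulation of the pairwise term from the FM literature (not stated above): $\sum_{i<j}\langle\boldsymbol{v}_i,\boldsymbol{v}_j\rangle x_ix_j=\frac{1}{2}\sum_{l=1}^{k}\left[\left(\sum_i v_{il}x_i\right)^2-\sum_i v_{il}^2x_i^2\right]$:

```python
import numpy as np

def fm_predict(w0, w, V, x):
    """FM prediction: bias + linear term + factorized pairwise interactions.

    w0: scalar, w: (n,) linear weights, V: (n, k) factor matrix, x: (n,) sample.
    """
    s = V.T @ x                 # (k,): sum_i v_il * x_i
    s2 = (V**2).T @ (x**2)      # (k,): sum_i v_il^2 * x_i^2
    pairwise = 0.5 * np.sum(s**2 - s2)
    return w0 + w @ x + pairwise
```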
