XGBoost Algorithm (Regression Trees)
1. Algorithm Principle
Steps (boosting):
Objective function:
$$obj^{(t)}=\sum_{i=1}^{n}L\left(y_i,\hat{y}_i^{(t-1)}+f_t(x_i)\right)+\Omega(f_t)+C$$
Taylor expansion:
$$f(x+\Delta x)\approx f(x)+f'(x)\Delta x+\frac{1}{2}f''(x)(\Delta x)^2$$
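The expansion can be checked numerically. The logistic loss below is only an illustrative choice of loss function (not taken from the text); the point is that XGBoost needs just the first derivative $g=l'$ and second derivative $h=l''$ of the loss at the previous prediction:

```python
import math

# Second-order Taylor check for the logistic loss l(x) = log(1 + exp(-x)).
# Illustrative loss choice: XGBoost applies this expansion around the previous
# prediction, with g = l' and h = l'' playing the roles of f'(x) and f''(x).
def loss(x):
    return math.log(1.0 + math.exp(-x))

def grad(x):                       # first derivative l'(x)
    return -1.0 / (1.0 + math.exp(x))

def hess(x):                       # second derivative l''(x)
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

x, dx = 0.5, 0.1
exact = loss(x + dx)
approx = loss(x) + grad(x) * dx + 0.5 * hess(x) * dx ** 2
print(abs(exact - approx))         # small: the quadratic approximation is tight
```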
Additive training optimization steps:
$$\begin{cases} \hat{y}_i^{(0)}=0\\ \hat{y}_i^{(1)}=f_1(x_i)=\hat{y}_i^{(0)}+f_1(x_i)\\ \cdots\\ \hat{y}_i^{(t)}=\sum_{k=1}^{t}f_k(x_i)=\hat{y}_i^{(t-1)}+f_t(x_i) \end{cases}$$
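The additive scheme above can be sketched in a few lines. The three stand-in "trees" are hypothetical linear functions rather than real fitted trees, since only the accumulation pattern $\hat{y}^{(t)}=\hat{y}^{(t-1)}+f_t(x)$ matters here:

```python
# Minimal sketch of additive training: predictions accumulate tree by tree.
# The "trees" are hypothetical stand-in functions, not real fitted trees.
trees = [lambda x: 0.5 * x, lambda x: 0.1 * x, lambda x: 0.02 * x]  # f_1..f_t

def predict(x, t):
    """y_hat^(t) = sum_{k=1}^{t} f_k(x), built as y_hat^(t-1) + f_t(x)."""
    y_hat = 0.0                    # y_hat^(0) = 0
    for f in trees[:t]:
        y_hat += f(x)              # each round adds one new tree
    return y_hat

x = 2.0
print(predict(x, 0))               # base case: 0.0
print(predict(x, 3))               # sum of all three trees at x
```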
The objective function can further be expressed as:
$$obj^{(t)}=\sum_{i=1}^{n}l\left(y_i,\hat{y}_i^{(t)}\right)+\sum_{k=1}^{t}\Omega(f_k)$$
$$\sum_{k=1}^{t}\Omega(f_k)=\Omega(f_t)+\sum_{k=1}^{t-1}\Omega(f_k)=\Omega(f_t)+\mathrm{constant}$$
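A small numeric sketch of this decomposition, with made-up penalty values for each $\Omega(f_k)$: at round $t$ the earlier trees are already fixed, so their penalties contribute only a constant and the optimizer effectively sees $\Omega(f_t)$ alone.

```python
# Sketch: at round t, the penalties of earlier trees are fixed, so only
# Omega(f_t) influences the optimization. Penalty values are made up.
omegas = [0.3, 0.25, 0.2, 0.15]            # Omega(f_1)..Omega(f_t), hypothetical

total = sum(omegas)                         # sum_{k=1}^{t} Omega(f_k)
constant = sum(omegas[:-1])                 # sum_{k=1}^{t-1} Omega(f_k): fixed at round t
print(total == omegas[-1] + constant)       # True: Omega(f_t) + constant
```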