XGBoost: A Scalable Tree Boosting System
The paper is organized as follows:
1. Introduction.
2. A review of tree boosting and the authors' series of modifications.
3. Split finding algorithms.
4. System design for acceleration.
5. Related work.
6. Evaluation.
As you can see, the heart of the paper is Section 3; the rest is mostly work on optimization and speed.
Eq. (1) in the paper:

\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F}

K: the number of weak learners (trees).
where \mathcal{F} = \{f(x) = w_{q(x)}\}
The paper notes:
"here q represents the structure of each tree that maps an example to the corresponding leaf index"
In other words, q(x) tells you which leaf the example x lands in.
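To make q and w concrete, here is a minimal sketch (my own illustration, not code from the paper): a tree f is the pair (q, w), where q maps an example to a leaf index and w holds one weight per leaf, and Eq. (1) sums the trees' outputs.

```python
# Illustrative sketch only: a depth-1 tree ("stump") represented as a
# structure function q plus a leaf-weight vector w.
def make_stump(feature, threshold, w):
    """q(x) = 0 if x[feature] < threshold else 1; f(x) = w[q(x)]."""
    def f(x):
        leaf = 0 if x[feature] < threshold else 1  # q(x): which leaf x reaches
        return w[leaf]                             # f(x) = w_{q(x)}
    return f

# Eq. (1): the ensemble prediction is the sum over the K trees.
trees = [make_stump(0, 0.5, [-0.2, 0.3]),
         make_stump(1, 1.0, [0.1, -0.4])]
x = [0.7, 0.2]
y_hat = sum(f(x) for f in trees)   # 0.3 + 0.1
```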
The simplified objective at step t, obtained from Eq. (3) by dropping the constant terms:

\tilde{L}^{(t)} = \sum_{i=1}^{n}\left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)
The paper continues by expanding \Omega(f_t) as follows:
--------------------------
Eq. (2) in the paper:

L(\phi) = \sum_{i} l(\hat{y}_i, y_i) + \sum_{k} \Omega(f_k)
where \Omega(f) = \gamma T + \frac{1}{2}\lambda \|w\|^2
Quoting the paper:
"Here l is a differentiable convex loss function that measures the difference between the prediction \hat{y}_i and the target y_i. The second term Ω penalizes the complexity of the model (i.e., the regression tree functions). The additional regularization term helps to smooth the final learnt weights to avoid over-fitting."
In other words, l is a differentiable convex function, so a minimum can be found.
--------------------------
Eq. (3) in the paper, a second-order Taylor expansion:

L^{(t)} \approx \sum_{i=1}^{n}\left[ l(y_i, \hat{y}^{(t-1)}) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)

Here n is the number of training instances.
Quoting the paper:
"Formally, let \hat{y}_i^{(t)} be the prediction of the i-th instance at the t-th iteration, we will need to add f_t to minimize the following objective."
g_i = \partial_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})

h_i = \partial^2_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})
--------------------------
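For concreteness, here are g_i and h_i worked out for two common losses (a sketch of my own, not code from the paper; the squared-error case takes l = ½(ŷ − y)²):

```python
import numpy as np

def squared_loss_grads(y, y_hat):
    """l = 1/2 (y_hat - y)^2  =>  g = y_hat - y,  h = 1."""
    return y_hat - y, np.ones_like(y_hat)

def logistic_loss_grads(y, y_hat):
    """Log loss with labels y in {0,1} and y_hat a raw score (logit):
    g = sigmoid(y_hat) - y,  h = sigmoid(y_hat) * (1 - sigmoid(y_hat))."""
    p = 1.0 / (1.0 + np.exp(-y_hat))
    return p - y, p * (1.0 - p)
```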
Eq. (4) in the paper:

\tilde{L}^{(t)} = \sum_{i=1}^{n}\left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2
To push this derivation further we need a few identities (not spelled out in the paper):

\sum_{j=1}^{T}\left(\sum_{i\in I_j} g_i\right) w_j = \sum_{i=1}^{n} g_i f_t(x_i)

\frac{1}{2}\sum_{j=1}^{T}\sum_{i\in I_j} h_i w_j^2 = \frac{1}{2}\sum_{i=1}^{n} h_i f_t^2(x_i)

w_j = f_t(x_i) \text{ for every } i \in I_j

T: the number of leaves.
I_j: the set of instances that reach leaf j, i.e. I_j = \{i \mid q(x_i) = j\}.
w_j: the weight of leaf j; in the paper's words, "we use w_i to represent score on i-th leaf".

Substituting these into Eq. (4) and regrouping the sum by leaf:

\tilde{L}^{(t)} = \sum_{j=1}^{T}\left[\left(\sum_{i\in I_j} g_i\right) w_j + \frac{1}{2}\left(\sum_{i\in I_j} h_i + \lambda\right) w_j^2\right] + \gamma T
--------------------------
Eq. (5) in the paper, the optimal weight of leaf j (set the derivative of the per-leaf quadratic to zero):

w_j^{*} = -\frac{\sum_{i\in I_j} g_i}{\sum_{i\in I_j} h_i + \lambda}
--------------------------
Substituting Eq. (5) into Eq. (4) gives Eq. (6) in the paper:

\tilde{L}^{(t)}(q) = -\frac{1}{2}\sum_{j=1}^{T}\frac{\left(\sum_{i\in I_j} g_i\right)^2}{\sum_{i\in I_j} h_i + \lambda} + \gamma T
--------------------------
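Eqs. (5) and (6) are easy to check in code. Below is my own illustration (function names are mine): given the g_i, h_i of the instances grouped by leaf, compute the optimal leaf weight and the structure score used to compare tree structures.

```python
import numpy as np

def leaf_weight(g, h, lam):
    """Eq. (5): w_j* = -sum(g_i) / (sum(h_i) + lambda) for one leaf."""
    return -g.sum() / (h.sum() + lam)

def structure_score(leaves, lam, gamma):
    """Eq. (6), where `leaves` is a list of (g, h) arrays, one pair per leaf.
    Lower is better: it is the minimized objective for this tree structure."""
    total = sum(gj.sum() ** 2 / (hj.sum() + lam) for gj, hj in leaves)
    return -0.5 * total + gamma * len(leaves)
```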
Eq. (7) then gives the criterion for scoring a candidate split:

L_{split} = \frac{1}{2}\left[\frac{\left(\sum_{i\in I_L} g_i\right)^2}{\sum_{i\in I_L} h_i + \lambda} + \frac{\left(\sum_{i\in I_R} g_i\right)^2}{\sum_{i\in I_R} h_i + \lambda} - \frac{\left(\sum_{i\in I} g_i\right)^2}{\sum_{i\in I} h_i + \lambda}\right] - \gamma
--------------------------
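Eq. (7) is exactly what the exact greedy split-finding algorithm (Section 3 of the paper) evaluates while scanning the sorted values of a feature. A minimal single-feature sketch of that scan (my own illustration, not the paper's implementation):

```python
import numpy as np

def best_split(x, g, h, lam, gamma):
    """Scan the sorted values of one feature and return (best_gain, threshold)
    according to Eq. (7). x, g, h are 1-D arrays of the same length."""
    order = np.argsort(x)
    xs, gs, hs = x[order], g[order], h[order]
    G, H = gs.sum(), hs.sum()          # totals over the parent node I
    GL = HL = 0.0                      # running sums for the left child I_L
    best_gain, best_thr = -np.inf, None
    for k in range(len(xs) - 1):
        GL += gs[k]; HL += hs[k]
        if xs[k] == xs[k + 1]:
            continue                   # no valid threshold between equal values
        GR, HR = G - GL, H - HL        # right child I_R by subtraction
        gain = 0.5 * (GL**2 / (HL + lam) + GR**2 / (HR + lam)
                      - G**2 / (H + lam)) - gamma
        if gain > best_gain:
            best_gain, best_thr = gain, 0.5 * (xs[k] + xs[k + 1])
    return best_gain, best_thr
```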
The paper also uses "column subsampling": each tree is grown using only a random subset of the features. The final prediction of the XGBoost ensemble is then the sum of the individual trees' predictions.
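A toy sketch of that idea (my own illustration with made-up helper names, not the paper's implementation): each boosting round fits a stump to the current residuals using only a random subset of the columns, and the stump outputs are summed with shrinkage.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

def fit_stump(X, residual, cols):
    """Fit a depth-1 tree on the residuals, trying only the columns in `cols`
    and a few quantile thresholds; leaf values are residual means."""
    best = None
    for j in cols:
        for thr in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
            left = X[:, j] < thr
            if left.all() or (~left).all():
                continue               # degenerate split, skip
            wl, wr = residual[left].mean(), residual[~left].mean()
            err = ((residual - np.where(left, wl, wr)) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, j, thr, wl, wr)
    _, j, thr, wl, wr = best
    return lambda X: np.where(X[:, j] < thr, wl, wr)

pred = np.zeros_like(y)
for _ in range(20):
    cols = rng.choice(5, size=3, replace=False)  # column subsampling per tree
    stump = fit_stump(X, y - pred, cols)
    pred += 0.5 * stump(X)                       # shrinkage (learning rate) 0.5
```

In the actual library this corresponds to parameters such as `colsample_bytree`; here the subset is simply drawn once per round.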