ml_ensemble_boostedtrees_guide

徐长亮

Original link: https://xgboost.readthedocs.io/en/latest/tutorials/model.html

Extreme Gradient Boosting
Greedy Function Approximation
Gradient Boosted Trees

Supervised Learning

Model and Parameters

Example: prediction with a linear model:

$\hat{y}_i = \sum_j \theta_j x_{ij}$

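Interpretation: the prediction is a linear combination of weighted input features.

Meaning of the predicted value: it depends on the task, i.e., regression or classification.

Parameters: the undetermined part that we need to learn from data; in linear regression these are the coefficients $\theta$.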

Objective function:

Goal of training: find the parameters $\theta$ that best fit the training data $x_i$ and labels $y_i$.

Purpose of the objective function: measure how well the model fits the training data.

Definition of the objective function

It is the sum of a training loss and a regularization term:

$\text{obj}(\theta) = L(\theta) + \Omega(\theta)$

Here $L$ is the training loss and $\Omega$ is the regularization term.

Training loss function

Purpose of the training loss: it measures how predictive our model is with respect to the training data.

Mean squared error

Definition:

$L(\theta) = \sum_i (y_i-\hat{y}_i)^2$

Logistic loss function

$L(\theta) = \sum_i[ y_i\ln (1+e^{-\hat{y}_i}) + (1-y_i)\ln (1+e^{\hat{y}_i})]$
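The two losses above can be written out directly. The following is a minimal sketch in plain Python, assuming labels in {0, 1} and that $\hat{y}_i$ is a raw score rather than a probability; the function names are illustrative, and the code is not hardened against overflow for very large scores.

```python
import math

def squared_error(y, y_hat):
    """Sum of squared errors: sum_i (y_i - y_hat_i)^2."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))

def logistic_loss(y, y_hat):
    """Logistic loss on raw scores y_hat, with labels y_i in {0, 1}."""
    total = 0.0
    for yi, pi in zip(y, y_hat):
        # y * ln(1 + e^{-s}) + (1 - y) * ln(1 + e^{s})
        total += yi * math.log1p(math.exp(-pi)) + (1 - yi) * math.log1p(math.exp(pi))
    return total
```

For a positive label, a large positive score gives a loss close to zero, while a large negative score is penalized heavily.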

Regularization term

Purpose: it controls the complexity of the model, which helps us avoid overfitting.

Balancing the loss and the regularization term: the bias-variance tradeoff

Too many splits: the regularization term is high.
Wrong split point: the training loss is high.
The tradeoff between the two is also called the bias-variance tradeoff in machine learning.

Decision Tree Ensembles

The model is an ensemble of classification and regression trees (CART).

CART example

Input: age, gender, occupation
Output: a prediction score in each leaf

Why use an ensemble model:

A single tree is usually not strong enough to be used in practice. What is actually used is the ensemble model, which sums the predictions of multiple trees together.

Another benefit:

An important fact is that the trees in the ensemble try to complement each other.

Mathematical formulation

Prediction model:

$\hat{y}_i = \sum_{k=1}^K f_k(x_i), f_k \in \mathcal{F}$

$K$ is the number of trees,
$f_k$ is a function in the functional space $\mathcal{F}$,
and $\mathcal{F}$ is the set of all possible CARTs.

Objective function:

$\text{obj}(\theta) = \sum_i^n l(y_i, \hat{y}_i) + \sum_{k=1}^K \Omega(f_k)$
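To make the prediction formula concrete, here is a small sketch with two hand-written trees over the CART example inputs (age, gender, occupation). The split rules and leaf scores are invented for illustration and are not taken from the original tutorial.

```python
def tree_1(x):
    # Hypothetical tree: split on age first, then on gender.
    if x["age"] < 15:
        return 2.0 if x["gender"] == "male" else 0.1
    return -1.0

def tree_2(x):
    # Hypothetical tree: a single split on occupation.
    return 0.9 if x["occupation"] == "student" else -0.9

def predict(trees, x):
    """Ensemble prediction: y_hat = sum over k of f_k(x)."""
    return sum(f(x) for f in trees)

boy = {"age": 10, "gender": "male", "occupation": "student"}
grandpa = {"age": 70, "gender": "male", "occupation": "retired"}
print(predict([tree_1, tree_2], boy), predict([tree_1, tree_2], grandpa))
```

Each tree contributes the score of the leaf the input falls into, and the ensemble simply adds those scores.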

Tree Boosting

As with all supervised learning models, training means: define an objective function and optimize it!

$\text{obj} = \sum_{i=1}^n l(y_i, \hat{y}_i^{(t)}) + \sum_{i=1}^t\Omega(f_i)$

Additive Training

About $f_i$

$\hat{y}_i = \sum_{k=1}^K f_k(x_i), f_k \in \mathcal{F}$

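$f$ is a function in the functional space $\mathcal{F}$,
and $\mathcal{F}$ is the set of all possible CARTs.

Each $f_i$ contains both the structure of a tree and its leaf scores.
Learning all the trees at once is intractable.
Instead we use an additive strategy: fix what we have learned, and add one new tree at a time.

The prediction at step $t$ is $\hat{y}_i^{(t)}$:
$\begin{aligned} \hat{y}_i^{(0)} &= 0 \\ \hat{y}_i^{(1)} &= f_1(x_i) = \hat{y}_i^{(0)} + f_1(x_i) \\ \hat{y}_i^{(2)} &= f_1(x_i) + f_2(x_i) = \hat{y}_i^{(1)} + f_2(x_i) \\ &\dots \\ \hat{y}_i^{(t)} &= \sum_{k=1}^t f_k(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i) \end{aligned}$
Objective function:
$\begin{aligned} \text{obj}^{(t)} &= \sum_{i=1}^n l(y_i, \hat{y}_i^{(t)}) + \sum_{i=1}^t\Omega(f_i) \\ &= \sum_{i=1}^n l(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)) + \Omega(f_t) + \mathrm{constant} \end{aligned}$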
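The additive strategy means the prediction is built up one tree at a time, starting from zero. A minimal sketch of that running sum follows; the three lambda functions are stand-ins for already-learned trees and are purely hypothetical.

```python
def staged_predictions(trees, x):
    """Return [y_hat^(0), y_hat^(1), ..., y_hat^(t)] for a single input x."""
    y_hat = 0.0              # y_hat^(0) = 0
    stages = [y_hat]
    for f_t in trees:
        # y_hat^(t) = y_hat^(t-1) + f_t(x): earlier trees stay fixed.
        y_hat = y_hat + f_t(x)
        stages.append(y_hat)
    return stages

# Stand-in "trees" (assumptions for illustration only).
trees = [lambda x: 0.5 * x, lambda x: -1.0, lambda x: 0.25]
print(staged_predictions(trees, 4.0))
```

Only the newest function changes the prediction at each step, which is why the objective at step $t$ can treat everything learned earlier as a constant.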

If we use mean squared error (MSE) as our loss function, the objective becomes

$\begin{aligned} \text{obj}^{(t)} &= \sum_{i=1}^n (y_i - (\hat{y}_i^{(t-1)} + f_t(x_i)))^2 + \sum_{i=1}^t\Omega(f_i) \\ &= \sum_{i=1}^n [2(\hat{y}_i^{(t-1)} - y_i)f_t(x_i) + f_t(x_i)^2] + \Omega(f_t) + \mathrm{constant} \end{aligned}$

The form of MSE is friendly, with a first order term (usually called the residual) and a quadratic term.
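To see where the first order and quadratic terms come from, the squared error for a single data point can be expanded as follows (a worked step, using only the symbols already defined above):

$\begin{aligned} (y_i - (\hat{y}_i^{(t-1)} + f_t(x_i)))^2 &= ((y_i - \hat{y}_i^{(t-1)}) - f_t(x_i))^2 \\ &= (y_i - \hat{y}_i^{(t-1)})^2 + 2(\hat{y}_i^{(t-1)} - y_i) f_t(x_i) + f_t(x_i)^2 \end{aligned}$

The first term does not depend on $f_t$, so it is absorbed into the constant; the coefficient of $f_t(x_i)$ is twice the negative residual, and the last term is the quadratic term.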

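For other losses of interest (for example, logistic loss) it is not so easy to get such a nice form, so in the general case we take the Taylor expansion of the loss function up to the second order: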
$\text{obj}^{(t)} = \sum_{i=1}^n [l(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)] + \Omega(f_t) + \mathrm{constant}$

where $g_i$ and $h_i$ are defined as

$\begin{aligned} g_i &= \partial_{\hat{y}_i^{(t-1)}} l(y_i, \hat{y}_i^{(t-1)}) \\ h_i &= \partial_{\hat{y}_i^{(t-1)}}^2 l(y_i, \hat{y}_i^{(t-1)}) \end{aligned}$

After we remove all the constants, the specific objective at step $t$ becomes

$\sum_{i=1}^n [g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)] + \Omega(f_t)$

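This becomes our optimization goal for the new tree. One important advantage of this definition is that the value of the objective function depends only on $g_i$ and $h_i$.
This is how XGBoost supports custom loss functions.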
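Since the objective depends on the loss only through $g_i$ and $h_i$, supporting a new loss amounts to supplying its first and second derivatives with respect to the previous prediction. The sketch below does this for the squared error and logistic losses defined earlier; the function names are illustrative, and it assumes raw scores and labels in {0, 1} for the logistic case.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def squared_error_grad_hess(y, y_hat):
    """l = (y - y_hat)^2  ->  g = 2 (y_hat - y), h = 2."""
    g = [2.0 * (p - t) for t, p in zip(y, y_hat)]
    h = [2.0 for _ in y]
    return g, h

def logistic_grad_hess(y, y_hat):
    """l = y ln(1+e^{-s}) + (1-y) ln(1+e^{s})  ->  g = sigmoid(s) - y, h = sigmoid(s)(1 - sigmoid(s))."""
    p = [sigmoid(s) for s in y_hat]
    g = [pi - t for t, pi in zip(y, p)]
    h = [pi * (1.0 - pi) for pi in p]
    return g, h
```

Everything downstream (leaf weights, structure score, split gain) can then be computed from these two lists without knowing which loss produced them.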

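Model Complexity (the regularization term)

We still need to define the complexity of the tree, $\Omega(f)$.
To do so, first refine the definition of the tree $f(x)$ as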
$f_t(x) = w_{q(x)}, w \in R^T, q:R^d\rightarrow \{1,2,\cdots,T\}$

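Here $w$ is the vector of scores on the leaves, $q$ is a function assigning each data point to the corresponding leaf, and $T$ is the number of leaves.

In XGBoost, we define the complexity as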
$\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^T w_j^2$
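The complexity formula is a direct function of the leaf scores. A minimal sketch, with `leaf_weights` standing for the vector $w$ and `gamma` and `lam` for $\gamma$ and $\lambda$ (the names are chosen here for illustration):

```python
def tree_complexity(leaf_weights, gamma, lam):
    """Omega(f) = gamma * T + 0.5 * lambda * sum_j w_j^2, with T = number of leaves."""
    T = len(leaf_weights)
    return gamma * T + 0.5 * lam * sum(w * w for w in leaf_weights)

# Three leaves with scores 2.0, 0.1 and -1.0.
print(tree_complexity([2.0, 0.1, -1.0], gamma=1.0, lam=1.0))
```

A larger $\gamma$ penalizes every extra leaf, while a larger $\lambda$ penalizes leaves with large scores.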

The Structure Score

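After re-formulating the tree model, we can write the objective value with the $t$-th tree as

$\begin{aligned} \text{obj}^{(t)} &\approx \sum_{i=1}^n [g_i w_{q(x_i)} + \frac{1}{2} h_i w_{q(x_i)}^2] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^T w_j^2 \\ &= \sum^T_{j=1} [(\sum_{i\in I_j} g_i) w_j + \frac{1}{2} (\sum_{i\in I_j} h_i + \lambda) w_j^2] + \gamma T \end{aligned}$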

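where $I_j = \{i|q(x_i)=j\}$ is the set of indices of data points assigned to the $j$-th leaf.

The sum over data points can be regrouped as a sum over leaves, because all the data points on the same leaf get the same score.

We can further compress the expression by defining

$G_j = \sum_{i\in I_j} g_i, \quad H_j = \sum_{i\in I_j} h_i$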

$\text{obj}^{(t)} = \sum^T_{j=1} [G_jw_j + \frac{1}{2} (H_j+\lambda) w_j^2] +\gamma T$

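In this equation the $w_j$ are independent of each other, and the form $G_jw_j+\frac{1}{2}(H_j+\lambda)w_j^2$ is quadratic in $w_j$.
So for a given structure $q(x)$, the best $w_j$ and the best objective reduction we can get are

$\begin{aligned} w_j^\ast &= -\frac{G_j}{H_j+\lambda} \\ \text{obj}^\ast &= -\frac{1}{2} \sum_{j=1}^T \frac{G_j^2}{H_j+\lambda} + \gamma T \end{aligned}$

The last equation measures how good a tree structure $q(x)$ is: the smaller the score is, the better the structure is.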
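The structure score can be computed in three small steps: sum $g_i$ and $h_i$ per leaf to get $G_j$ and $H_j$, minimize each quadratic $G_jw_j+\frac{1}{2}(H_j+\lambda)w_j^2$ to get the leaf weight $-G_j/(H_j+\lambda)$, and plug the weights back in. The sketch below assumes leaves are indexed from 0 (the text indexes them from 1) and that the leaf assignment of each data point is already known; all names are illustrative.

```python
def leaf_stats(g, h, leaf_index, num_leaves):
    """Sum gradients and hessians per leaf: G_j and H_j."""
    G = [0.0] * num_leaves
    H = [0.0] * num_leaves
    for gi, hi, j in zip(g, h, leaf_index):
        G[j] += gi
        H[j] += hi
    return G, H

def optimal_weights(G, H, lam):
    """Minimizer of each per-leaf quadratic: w_j = -G_j / (H_j + lambda)."""
    return [-Gj / (Hj + lam) for Gj, Hj in zip(G, H)]

def structure_score(G, H, lam, gamma):
    """Objective at the optimal weights: -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T."""
    return -0.5 * sum(Gj * Gj / (Hj + lam) for Gj, Hj in zip(G, H)) + gamma * len(G)

# Four data points assigned to two leaves (0-based leaf indices).
G, H = leaf_stats([1.0, -2.0, 3.0, -1.0], [1.0, 1.0, 1.0, 1.0], [0, 0, 1, 1], 2)
print(optimal_weights(G, H, lam=1.0), structure_score(G, H, lam=1.0, gamma=0.5))
```

Comparing this score across candidate structures is what drives the choice of splits.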

Learn the tree structure

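Ideally we would enumerate all possible trees and pick the best one. In practice this is intractable, so we try to optimize one level of the tree at a time. Specifically, we try to split a leaf into two leaves, and the score it gains is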

$\frac{1}{2} \left[\frac{G_L^2}{H_L+\lambda}+\frac{G_R^2}{H_R+\lambda}-\frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma$

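This formula can be decomposed as 1) the score on the new left leaf, 2) the score on the new right leaf, 3) the score on the original leaf, and 4) the regularization on the additional leaf.

An important fact follows: if the gain is smaller than $\gamma$, we would do better not to add that branch. This is exactly the pruning technique in tree-based models!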
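The split formula translates directly into a small function. In this sketch the returned value already has $\gamma$ subtracted, so a negative result corresponds to the case where the bracketed gain is smaller than $\gamma$ and the split is not worth adding; the parameter names are illustrative.

```python
def split_gain(G_L, H_L, G_R, H_R, lam, gamma):
    """0.5 * [left score + right score - parent score] - gamma."""
    def leaf_score(G, H):
        return G * G / (H + lam)
    return 0.5 * (leaf_score(G_L, H_L) + leaf_score(G_R, H_R)
                  - leaf_score(G_L + G_R, H_L + H_R)) - gamma

# Gradients of opposite sign on the two sides: the split is useful.
print(split_gain(-3.0, 2.0, 3.0, 2.0, lam=1.0, gamma=0.5))
# Both sides look alike: the split does not pay for the extra leaf.
print(split_gain(1.0, 2.0, 1.0, 2.0, lam=1.0, gamma=0.5))
```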

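For real-valued data, we usually want to search for an optimal split. To do this efficiently, we place all the instances in sorted order.

A left-to-right scan is then sufficient to calculate the structure score of all possible split solutions, so we can find the best split efficiently.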
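The sorted scan can be sketched for a single real-valued feature as follows. Running sums give $G_L$ and $H_L$ for every candidate split in one pass, and $G_R$, $H_R$ follow by subtraction from the totals. Placing the threshold at the midpoint between neighbouring values is an assumption made here for illustration, as are the function and parameter names; this is not XGBoost's actual implementation.

```python
def best_split(x, g, h, lam, gamma):
    """Scan one feature in sorted order; return (threshold, gain) of the best split."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    G_total, H_total = sum(g), sum(h)
    parent = G_total ** 2 / (H_total + lam)

    G_L = H_L = 0.0
    best_gain, best_threshold = float("-inf"), None
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        G_L += g[i]          # move instance i to the left side
        H_L += h[i]
        if x[i] == x[nxt]:
            continue         # cannot split between equal feature values
        G_R, H_R = G_total - G_L, H_total - H_L
        gain = 0.5 * (G_L ** 2 / (H_L + lam) + G_R ** 2 / (H_R + lam) - parent) - gamma
        if gain > best_gain:
            best_gain = gain
            best_threshold = 0.5 * (x[i] + x[nxt])  # midpoint: an illustrative choice
    return best_threshold, best_gain

print(best_split([3.0, 1.0, 4.0, 2.0], [2.0, -2.0, 2.0, -2.0], [1.0] * 4, lam=1.0, gamma=0.0))
```

After the initial sort, each candidate split costs constant time, which is what makes the search efficient.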
