Algorithm: Boosting model with XGBoost

Difference between bagging and boosting:

We call each sub-model in an ensemble a weak learner. In a random forest, the weak learner is a decision tree.

Weak learner: a model that cannot predict the result well on its own.

Overfitting: the model performs well on the training data but poorly on the test data.

Underfitting: the model cannot fit the training data well.

We can think of a bagging model as a committee of "professors": each professor is already strong, and averaging many of them prevents overfitting.

Boosting, in contrast, is a committee of "slacker students": each one is weak, and adding them up step by step reduces underfitting. A minimal comparison of the two is sketched below.
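As a rough illustration (not from the original text), scikit-learn's RandomForestRegressor and GradientBoostingRegressor can be used to contrast the two paradigms on synthetic data:

```python
# A minimal sketch comparing bagging (random forest) and boosting (gradient boosting).
# scikit-learn is assumed here purely for illustration; XGBoost itself is discussed later.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: many deep trees trained independently (in parallel), then averaged.
bagging = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Boosting: many shallow trees trained sequentially, each correcting the previous ones.
boosting = GradientBoostingRegressor(n_estimators=100, max_depth=3, random_state=0).fit(X_train, y_train)

print("bagging  test MSE:", mean_squared_error(y_test, bagging.predict(X_test)))
print("boosting test MSE:", mean_squared_error(y_test, boosting.predict(X_test)))
```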

 

 

How to train a boosting model? 

Residual

We can train a bagging model in parallel, while a boosting model must be trained sequentially, using residuals.

First, we use a weak learner (model 1) to make predictions and compute the residuals.

Next, we train weak learner 2 on those residuals.

Then we compute the new residuals with respect to model 2.

We train model 3 on the new residuals, and repeat the process.

Once we have trained a set of weak learners, we sum the predictions of all of them to get the final prediction.
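Below is a minimal from-scratch sketch of this residual loop for regression; the learning rate, tree depth, and number of rounds are illustrative assumptions, not values from the text:

```python
# A from-scratch sketch of boosting by fitting residuals with shallow trees.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_boosting(X, y, n_rounds=50, learning_rate=0.1, max_depth=2):
    """Train weak learners sequentially; each one fits the current residuals."""
    prediction = np.zeros_like(y, dtype=float)   # start from a zero prediction
    learners = []
    for _ in range(n_rounds):
        residual = y - prediction                # what the ensemble still gets wrong
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        prediction += learning_rate * tree.predict(X)   # add the new weak learner
        learners.append(tree)
    return learners

def predict_boosting(learners, X, learning_rate=0.1):
    """Sum the (scaled) predictions of all weak learners."""
    return sum(learning_rate * tree.predict(X) for tree in learners)
```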

References for XGBoost

https://www.kdd.org/kdd2016/papers/files/rfp0697-chenAemb.pdf

https://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf

XGBoost Model

How to understand XGBoost? 

1) How to build the objective function?

2) How to approximate the objective function? Taylor expansion.

3) How to express the objective function in terms of the tree structure? Parameterizing the tree.

4) How to optimize the objective function? Greedy algorithm.

 

The objective function

Once the model is trained, we sum the predictions of all the sub-trees to get the final prediction.

f_k(x_i) denotes the prediction of the k-th sub-tree on the i-th sample of the input data.
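In formula form, with K trees in the ensemble:

```latex
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in \mathcal{F},
```

where K is the number of trees and F is the space of regression trees.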

The loss (error) function

The objective function combines the loss function with a regularization term.

For regression problems we can use the mean squared error (MSE).

For classification problems we can use the cross-entropy.

For regularization we can use L1, L2, or elastic net.
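Putting these pieces together, the objective has the standard additive form used in the XGBoost paper linked above:

```latex
\mathrm{Obj} = \sum_{i=1}^{n} l\bigl(y_i, \hat{y}_i\bigr) \;+\; \sum_{k=1}^{K} \Omega(f_k)
```

Here l is, for example, the MSE for regression or the cross-entropy for classification, and Ω penalizes the complexity of each tree.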

How do we define the complexity of an XGBoost tree?

By its depth, the number of leaves, and the predicted value (weight) of each leaf node.

If we have already trained k-1 sub-models, how do we obtain the k-th one?

Additive training
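At step k, the first k-1 trees are kept fixed and we only add a new tree f_k:

```latex
\hat{y}_i^{(k)} = \hat{y}_i^{(k-1)} + f_k(x_i), \qquad
\mathrm{Obj}^{(k)} = \sum_{i=1}^{n} l\bigl(y_i,\ \hat{y}_i^{(k-1)} + f_k(x_i)\bigr) + \Omega(f_k) + \mathrm{const}.
```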

Taylor expansion of the objective function

This gives us a new objective function, but we still cannot parameterize f_k(x_i) or the complexity of the tree.
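Concretely, the second-order Taylor expansion of the loss around the previous prediction yields (dropping terms that are constant with respect to f_k):

```latex
\mathrm{Obj}^{(k)} \approx \sum_{i=1}^{n} \Bigl[\, g_i f_k(x_i) + \tfrac{1}{2} h_i f_k^2(x_i) \Bigr] + \Omega(f_k),
\qquad
g_i = \frac{\partial\, l(y_i, \hat{y}_i^{(k-1)})}{\partial\, \hat{y}_i^{(k-1)}}, \quad
h_i = \frac{\partial^2 l(y_i, \hat{y}_i^{(k-1)})}{\partial\, (\hat{y}_i^{(k-1)})^2}.
```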

 

Parameterizing the tree

Parameterizing the complexity of the tree (the regularization term)
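A tree with T leaves is parameterized by the mapping q(x) from a sample to a leaf index and by the vector of leaf weights w, and its complexity is penalized by:

```latex
f_k(x) = w_{q(x)}, \qquad
\Omega(f_k) = \gamma T + \tfrac{1}{2}\,\lambda \sum_{j=1}^{T} w_j^2 .
```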

New objective function

We obtain the final objective function, which we now need to optimize.
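Grouping the samples by the leaf they fall into, with G_j and H_j the sums of g_i and h_i over leaf j, the objective becomes a quadratic in each leaf weight; minimizing it gives the optimal weights and the best objective value for a fixed tree structure:

```latex
\mathrm{Obj}^{(k)} = \sum_{j=1}^{T} \Bigl[\, G_j w_j + \tfrac{1}{2}\,(H_j + \lambda)\, w_j^2 \Bigr] + \gamma T,
\qquad
w_j^{*} = -\frac{G_j}{H_j + \lambda}, \qquad
\mathrm{Obj}^{*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T .
```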

Optimization

It is an optimization problem over the structure of the tree, which is equivalent to a search problem.

We can use brute force, a greedy algorithm, etc.

But brute force over all tree structures has exponential complexity.

Greedy approach, similar to building a decision tree

When we build a decision tree, we use entropy (or standard deviation) to select the best feature to split on.

When we build an XGBoost tree, we use our objective function to choose the best split, as sketched below.
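As a sketch (the function and variable names below are my own, not from any particular library), the exact greedy search over one feature evaluates the gain of every threshold and keeps the best one:

```python
# Sketch of the exact greedy split search used to grow one node of an XGBoost-style tree.
# gain = 1/2 * [ G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - (G_L+G_R)^2/(H_L+H_R+lam) ] - gamma
import numpy as np

def best_split(x, g, h, lam=1.0, gamma=0.0):
    """Find the split on one feature that maximizes the gain of the objective.

    x: feature values; g, h: first- and second-order gradients of the loss per sample.
    """
    x, g, h = np.asarray(x, float), np.asarray(g, float), np.asarray(h, float)
    order = np.argsort(x)
    g, h = g[order], h[order]
    G, H = g.sum(), h.sum()
    G_L = H_L = 0.0
    best_gain, best_threshold = 0.0, None
    for i in range(len(x) - 1):
        G_L += g[i]
        H_L += h[i]
        G_R, H_R = G - G_L, H - H_L
        gain = 0.5 * (G_L**2 / (H_L + lam) + G_R**2 / (H_R + lam)
                      - G**2 / (H + lam)) - gamma
        if gain > best_gain:
            best_gain = gain
            best_threshold = (x[order[i]] + x[order[i + 1]]) / 2.0
    return best_threshold, best_gain
```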

 
