Algorithm: Boosting model with XGBoost

Difference between bagging and boosting:

We call each sub-model in an ensemble a weak learner. In a random forest, the weak learner is a decision tree.

Weak learner: a model that cannot predict the result well on its own.

Overfitting: the model performs well on the training data but poorly on the test data.

Underfitting: the model cannot fit the training data well.

We can think of a bagging model as a committee of "professors": each professor is already strong, and averaging many of them prevents overfitting.

Boosting, in contrast, is a committee of "slacker students": each one is weak, and adding them up step by step reduces underfitting. A minimal comparison of the two is sketched below.
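As a rough illustration (not from the original text), scikit-learn's RandomForestRegressor and GradientBoostingRegressor can be used to contrast the two paradigms on synthetic data:

```python
# A minimal sketch comparing bagging (random forest) and boosting (gradient boosting).
# scikit-learn is assumed here purely for illustration; XGBoost itself is discussed later.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: many deep trees trained independently (in parallel), then averaged.
bagging = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Boosting: many shallow trees trained sequentially, each correcting the previous ones.
boosting = GradientBoostingRegressor(n_estimators=100, max_depth=3, random_state=0).fit(X_train, y_train)

print("bagging  test MSE:", mean_squared_error(y_test, bagging.predict(X_test)))
print("boosting test MSE:", mean_squared_error(y_test, boosting.predict(X_test)))
```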

 

 

How to train a boosting model? 

Residual

We can train a bagging model in parallel, while a boosting model must be trained sequentially, using residuals.

First, we use a weak learner (model 1) to make predictions and compute the residuals.

Next, we train weak learner 2 on those residuals.

Then we compute the new residuals with respect to model 2.

We train model 3 on the new residuals, and repeat the process.

Once we have trained a set of weak learners, we sum the predictions of all of them to get the final prediction.
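Below is a minimal from-scratch sketch of this residual loop for regression; the learning rate, tree depth, and number of rounds are illustrative assumptions, not values from the text:

```python
# A from-scratch sketch of boosting by fitting residuals with shallow trees.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_boosting(X, y, n_rounds=50, learning_rate=0.1, max_depth=2):
    """Train weak learners sequentially; each one fits the current residuals."""
    prediction = np.zeros_like(y, dtype=float)   # start from a zero prediction
    learners = []
    for _ in range(n_rounds):
        residual = y - prediction                # what the ensemble still gets wrong
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        prediction += learning_rate * tree.predict(X)   # add the new weak learner
        learners.append(tree)
    return learners

def predict_boosting(learners, X, learning_rate=0.1):
    """Sum the (scaled) predictions of all weak learners."""
    return sum(learning_rate * tree.predict(X) for tree in learners)
```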

References for XGBoost

https://www.kdd.org/kdd2016/papers/files/rfp0697-chenAemb.pdf

https://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf

XGBoost Model

How to understand XGBoost? 

1) How to build the objective function?

2) How to approximate the objective function? Taylor expansion.

3) How to express the objective function in terms of the tree structure? Parameterizing the tree.

4) How to optimize the objective function? Greedy algorithm.

 

The objective function

Once the model is trained, we sum the predictions of all the sub-trees to get the final prediction.

f_k(x_i) denotes the prediction of the k-th sub-tree on the i-th sample of the input data.
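In formula form, with K trees in the ensemble:

```latex
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in \mathcal{F},
```

where K is the number of trees and F is the space of regression trees.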

The loss (error) function

The objective function combines the loss function with a regularization term.

For regression problems we can use the mean squared error (MSE).

For classification problems we can use the cross-entropy.

For regularization we can use L1, L2, or elastic net.
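Putting these pieces together, the objective has the standard additive form used in the XGBoost paper linked above:

```latex
\mathrm{Obj} = \sum_{i=1}^{n} l\bigl(y_i, \hat{y}_i\bigr) \;+\; \sum_{k=1}^{K} \Omega(f_k)
```

Here l is, for example, the MSE for regression or the cross-entropy for classification, and Ω penalizes the complexity of each tree.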

How do we define the complexity of an XGBoost tree?

By its depth, the number of leaves, and the predicted value (weight) of each leaf node.

If we have already trained k-1 sub-models, how do we obtain the k-th one?

Additive training
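At step k, the first k-1 trees are kept fixed and we only add a new tree f_k:

```latex
\hat{y}_i^{(k)} = \hat{y}_i^{(k-1)} + f_k(x_i), \qquad
\mathrm{Obj}^{(k)} = \sum_{i=1}^{n} l\bigl(y_i,\ \hat{y}_i^{(k-1)} + f_k(x_i)\bigr) + \Omega(f_k) + \mathrm{const}.
```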

Taylor expansion of the objective function

This gives us a new objective function, but we still cannot parameterize f_k(x_i) or the complexity of the tree.
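Concretely, the second-order Taylor expansion of the loss around the previous prediction yields (dropping terms that are constant with respect to f_k):

```latex
\mathrm{Obj}^{(k)} \approx \sum_{i=1}^{n} \Bigl[\, g_i f_k(x_i) + \tfrac{1}{2} h_i f_k^2(x_i) \Bigr] + \Omega(f_k),
\qquad
g_i = \frac{\partial\, l(y_i, \hat{y}_i^{(k-1)})}{\partial\, \hat{y}_i^{(k-1)}}, \quad
h_i = \frac{\partial^2 l(y_i, \hat{y}_i^{(k-1)})}{\partial\, (\hat{y}_i^{(k-1)})^2}.
```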

 

Parameterizing the tree

Parameterizing the complexity of the tree (the regularization term)
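A tree with T leaves is parameterized by the mapping q(x) from a sample to a leaf index and by the vector of leaf weights w, and its complexity is penalized by:

```latex
f_k(x) = w_{q(x)}, \qquad
\Omega(f_k) = \gamma T + \tfrac{1}{2}\,\lambda \sum_{j=1}^{T} w_j^2 .
```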

New objective function

We obtain the final objective function, which we now need to optimize.
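Grouping the samples by the leaf they fall into, with G_j and H_j the sums of g_i and h_i over leaf j, the objective becomes a quadratic in each leaf weight; minimizing it gives the optimal weights and the best objective value for a fixed tree structure:

```latex
\mathrm{Obj}^{(k)} = \sum_{j=1}^{T} \Bigl[\, G_j w_j + \tfrac{1}{2}\,(H_j + \lambda)\, w_j^2 \Bigr] + \gamma T,
\qquad
w_j^{*} = -\frac{G_j}{H_j + \lambda}, \qquad
\mathrm{Obj}^{*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T .
```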

Optimization

It is an optimization problem over the structure of the tree, which is equivalent to a search problem.

We can use brute force, a greedy algorithm, etc.

But brute force over all tree structures has exponential complexity.

Greedy approach, similar to building a decision tree

When we build a decision tree, we use entropy (or standard deviation) to select the best feature to split on.

When we build an XGBoost tree, we use our objective function to choose the best split, as sketched below.
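As a sketch (the function and variable names below are my own, not from any particular library), the exact greedy search over one feature evaluates the gain of every threshold and keeps the best one:

```python
# Sketch of the exact greedy split search used to grow one node of an XGBoost-style tree.
# gain = 1/2 * [ G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - (G_L+G_R)^2/(H_L+H_R+lam) ] - gamma
import numpy as np

def best_split(x, g, h, lam=1.0, gamma=0.0):
    """Find the split on one feature that maximizes the gain of the objective.

    x: feature values; g, h: first- and second-order gradients of the loss per sample.
    """
    x, g, h = np.asarray(x, float), np.asarray(g, float), np.asarray(h, float)
    order = np.argsort(x)
    g, h = g[order], h[order]
    G, H = g.sum(), h.sum()
    G_L = H_L = 0.0
    best_gain, best_threshold = 0.0, None
    for i in range(len(x) - 1):
        G_L += g[i]
        H_L += h[i]
        G_R, H_R = G - G_L, H - H_L
        gain = 0.5 * (G_L**2 / (H_L + lam) + G_R**2 / (H_R + lam)
                      - G**2 / (H + lam)) - gamma
        if gain > best_gain:
            best_gain = gain
            best_threshold = (x[order[i]] + x[order[i + 1]]) / 2.0
    return best_threshold, best_gain
```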

 
