XGBoost Parameters
Source: the official documentation at https://xgboost.readthedocs.io/en/latest/parameter.html
Before running XGBoost, we must set three types of parameters: general parameters, booster parameters and task parameters.
General parameters relate to which booster we are using to do boosting, commonly a tree or linear model
Booster parameters depend on which booster you have chosen
Learning task parameters decide on the learning scenario. For example, regression tasks may use different parameters than ranking tasks.
Command line parameters relate to the behavior of the CLI version of XGBoost.
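As a quick illustration, the Python API takes all of these groups together in one parameter dict; which group a key belongs to is purely conceptual. The sketch below is an assumption-level example (the specific values are placeholders, and the actual training call is shown only as a comment so the snippet stays self-contained):

```python
# One dict carries general, booster, and learning task parameters alike.
params = {
    # general parameters: choose and configure the booster
    "booster": "gbtree",
    "verbosity": 1,
    # (tree) booster parameters
    "eta": 0.3,
    "max_depth": 6,
    # learning task parameters
    "objective": "reg:squarederror",
    "eval_metric": "rmse",
}

# With the real library, this dict would be passed as, e.g.:
#   bst = xgboost.train(params, dtrain, num_boost_round=100)
```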
Note
Parameters in R package
In the R package, you can use . (dot) to replace the underscore in parameter names; for example, you can use max.depth to indicate max_depth. The underscore parameters are also valid in R.
General Parameters
Parameters for Tree Booster
Additional parameters for Dart Booster (booster=dart)
Parameters for Linear Booster (booster=gblinear)
Parameters for Tweedie Regression (objective=reg:tweedie)
Learning Task Parameters
Command Line Parameters
General Parameters
booster [default= gbtree ]
Which booster to use. Can be gbtree, gblinear or dart; gbtree and dart use tree based models while gblinear uses linear functions.
silent [default=0] [Deprecated]
Deprecated. Please use verbosity instead.
verbosity [default=1]
Verbosity of printing messages. Valid values are 0 (silent), 1 (warning), 2 (info), 3 (debug). Sometimes XGBoost tries to change configurations based on heuristics, which is displayed as a warning message. If there is unexpected behaviour, please try increasing the value of verbosity.
nthread [default to maximum number of threads available if not set]
Number of parallel threads used to run XGBoost
disable_default_eval_metric [default=0]
Flag to disable default metric. Set to >0 to disable.
num_pbuffer [set automatically by XGBoost, no need to be set by user]
Size of prediction buffer, normally set to the number of training instances. The buffers are used to save the prediction results of the last boosting step.
num_feature [set automatically by XGBoost, no need to be set by user]
Feature dimension used in boosting, set to the maximum dimension of the features
Parameters for Tree Booster
eta [default=0.3, alias: learning_rate]
Step size shrinkage used in updates to prevent overfitting. After each boosting step, we can directly get the weights of new features, and eta shrinks the feature weights to make the boosting process more conservative.
range: [0,1]
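The shrinkage described above can be sketched in a few lines: the ensemble prediction is a running sum of eta times each new tree's output, so a smaller eta makes every tree contribute less. This is a conceptual sketch, not the library's implementation; the per-tree outputs are made-up numbers for one sample:

```python
# Each boosting step adds eta * (new tree's output) to the prediction,
# so eta directly scales how aggressive each step is.
def boost(tree_outputs, eta):
    pred = 0.0
    for out in tree_outputs:
        pred += eta * out  # shrink the new tree's contribution
    return pred

steps = [1.0, 0.8, 0.5]        # hypothetical per-tree outputs for one sample
full = boost(steps, eta=1.0)   # no shrinkage: raw sum of outputs
shrunk = boost(steps, eta=0.3) # default eta: the same trees, scaled down
```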
gamma [default=0, alias: min_split_loss]
Minimum loss reduction required to make a further partition on a leaf node of the tree. The larger gamma is, the more conservative the algorithm will be.
range: [0,∞]
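One way to picture the gamma gate, as a simplified sketch rather than the library's exact gain formula: a candidate split survives only when the loss reduction it achieves exceeds gamma, so larger gamma values prune more splits and yield a more conservative tree.

```python
# Simplified view of min_split_loss: keep a split only if its
# loss reduction (gain) beats the gamma threshold.
def keep_split(gain, gamma=0.0):
    return gain > gamma

weak_split_kept = keep_split(gain=0.5, gamma=0.0)    # default: any positive gain passes
weak_split_pruned = keep_split(gain=0.5, gamma=1.0)  # higher bar rejects the same split
```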
max_depth [default=6]
Maximum depth of a tree. Increasing this value will make the model more complex and more likely to overfit. 0 is only accepted in lossguided growing policy when tree_method is set as hist and it indicates no limit on depth. Beware that XGBoost aggressively consumes memory when training a deep tree.
range: [0,∞] (0 is only accepted in lossguided growing policy when tree_method is set as hist)
min_child_weight [default=1]
Minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node with a sum of instance weight less than min_child_weight, then the building process will give up further partitioning. In a linear regression task, this simply corresponds to the minimum number of instances needed in each node. The larger min_child_weight is, the more conservative the algorithm will be.
range: [0,∞]
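The check above can be sketched directly; this is an illustrative simplification, not the library's code. For squared-error loss the hessian is 1 per instance, so the sum reduces to the child's instance count, which is why the docs equate the two for regression:

```python
# A split is abandoned if either child's summed hessian falls below
# min_child_weight.
def split_allowed(left_hessians, right_hessians, min_child_weight=1.0):
    return (sum(left_hessians) >= min_child_weight
            and sum(right_hessians) >= min_child_weight)

ok = split_allowed([1.0, 1.0, 1.0], [1.0, 1.0])          # both children heavy enough
too_light = split_allowed([1.0, 1.0], [0.4])             # right child under the threshold
```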
max_delta_step [default=0]
Maximum delta step we allow each leaf output to be. If the value is set to 0, it means there is no constraint. If it is set to a positive value, it can help make the update step more conservative. Usually this parameter is not needed, but it might help in logistic regression when the classes are extremely imbalanced. Setting it to a value of 1-10 might help control the update.
range: [0,∞]
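One simplified way to think of max_delta_step is as a cap on each leaf's raw output (the real update is more involved, so treat this as an assumption-level sketch): 0 leaves the weight unconstrained, while a positive value clips it into a symmetric interval.

```python
# Sketch: interpret max_delta_step as clipping a leaf's raw weight
# into [-max_delta_step, +max_delta_step]; 0 means no constraint.
def cap_leaf(raw_weight, max_delta_step=0.0):
    if max_delta_step == 0.0:
        return raw_weight  # default: unconstrained
    return max(-max_delta_step, min(max_delta_step, raw_weight))
```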
subsample [default=1]
Subsample ratio of the training instances. Setting it to 0.5 means that XGBoost would randomly sample half of the training data prior to growing trees, and this will prevent overfitting. Subsampling will occur once in every boosting iteration.
range: (0,1]
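The per-iteration sampling can be sketched as follows; this is a conceptual illustration (a fresh subset of row indices drawn without replacement each boosting round), not the library's sampler:

```python
import random

# Before each boosting iteration, draw a fresh random subset of training
# rows; subsample=0.5 keeps roughly half of them.
def sample_rows(n_rows, subsample, rng):
    k = int(n_rows * subsample)
    return rng.sample(range(n_rows), k)  # distinct indices, no replacement

rng = random.Random(0)
rows = sample_rows(100, 0.5, rng)  # 50 distinct row indices out of 100
```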
colsample_bytree, colsample_bylevel, colsample_bynode [default=1]
This is a family of paramete