Gradient Boosted Regression

最新推荐文章于 2024-09-07 22:40:55 发布

weixin_34005042

最新推荐文章于 2024-09-07 22:40:55 发布

阅读量920

点赞数

文章标签： python 人工智能

原文链接：http://www.cnblogs.com/chaofn/p/4681394.html

版权

这篇博客详细介绍了如何利用Python的sklearn库实现Gradient Boosting Regression，重点讲解了sklearn.ensemble.GradientBoostingRegressor的用法。

摘要由CSDN通过智能技术生成

3.2.4.3.6. `sklearn.ensemble`.GradientBoostingRegressor

class sklearn.ensemble. GradientBoostingRegressor ( loss='ls', learning_rate=0.1, n_estimators=100, subsample=1.0, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, init=None, random_state=None, max_features=None, alpha=0.9, verbose=0, max_leaf_nodes=None, warm_start=False ) [source]

Gradient Boosting for regression.

GB builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage a regression tree is fit on the negative gradient of the given loss function.

Read more in the User Guide.

P a r a m e t e r s :	loss : {‘ls’, ‘lad’, ‘huber’, ‘quantile’}, optional (default=’ls’) loss function to be optimized. ‘ls’ refers to least squares regression. ‘lad’ (least absolute deviation) is a highly robust loss function solely based on order information of the input variables. ‘huber’ is a combination of the two. ‘quantile’ allows quantile regression (usealpha to specify the quantile). learning_rate : float, optional (default=0.1) learning rate shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators. n_estimators : int (default=100) The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance. max_depth : integer, optional (default=3) maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables. Ignored if `max_leaf_nodes` is not None. min_samples_split : integer, optional (default=2) The minimum number of samples required to split an internal node. min_samples_leaf : integer, optional (default=1) The minimum number of samples required to be at a leaf node. min_weight_fraction_leaf : float, optional (default=0.) The minimum weighted fraction of the input samples required to be at a leaf node. subsample : float, optional (default=1.0) The fraction of samples to be used for fitting the individual base learners. If smaller than 1.0 this results in Stochastic Gradient Boosting. subsample interacts with the parametern_estimators. Choosing subsample < 1.0 leads to a reduction of variance and an increase in bias. max_features : int, float, string or None, optional (default=None) The number of features to consider when looking for the best split: If int, then consider max_features features at each split. If float, then max_features is a percentage and int(max_features * n_features)features are considered at each split. If “auto”, then max_features=n_features. If “sqrt”, then max_features=sqrt(n_features). If “log2”, then max_features=log2(n_features). If None, then max_features=n_features. Choosing max_features < n_features leads to a reduction of variance and an increase in bias. Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than `max_features` features. max_leaf_nodes : int or None, optional (default=None) Grow trees with `max_leaf_nodes` in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes. alpha : float (default=0.9) The alpha-quantile of the huber loss function and the quantile loss function. Only if`loss='huber'` or `loss='quantile'`. init : BaseEstimator, None, optional (default=None) An estimator object that is used to compute the initial predictions. `init` has to provide `fit`and `predict`. If None it uses `loss.init_estimator`. verbose : int, default: 0 Enable verbose output. If 1 then it prints progress and performance once in a while (the more trees the lower the frequency). If greater than 1 then it prints progress and performance for every tree. warm_start : bool, default: False When set to `True`, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just erase the previous solution. random_state : int, RandomState instance or None, optional (default=None) If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
A t t r i b u t e s :

loss : {‘ls’, ‘lad’, ‘huber’, ‘quantile’}, optional (default=’ls’)

loss function to be optimized. ‘ls’ refers to least squares regression. ‘lad’ (least absolute deviation) is a highly robust loss function solely based on order information of the input variables. ‘huber’ is a combination of the two. ‘quantile’ allows quantile regression (usealpha to specify the quantile).

learning_rate : float, optional (default=0.1)

learning rate shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators.

n_estimators : int (default=100)

The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance.

max_depth : integer, optional (default=3)

maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables. Ignored if max_leaf_nodes is not None.

min_samples_split : integer, optional (default=2)

The minimum number of samples required to split an internal node.

min_samples_leaf : integer, optional (default=1)

The minimum number of samples required to be at a leaf node.

min_weight_fraction_leaf : float, optional (default=0.)

The minimum weighted fraction of the input samples required to be at a leaf node.

subsample : float, optional (default=1.0)

The fraction of samples to be used for fitting the individual base learners. If smaller than 1.0 this results in Stochastic Gradient Boosting. subsample interacts with the parametern_estimators. Choosing subsample < 1.0 leads to a reduction of variance and an increase in bias.

max_features : int, float, string or None, optional (default=None)

The number of features to consider when looking for the best split:

If int, then consider max_features features at each split.

If float, then max_features is a percentage and int(max_features * n_features)features are considered at each split.

If “auto”, then max_features=n_features.

If “sqrt”, then max_features=sqrt(n_features).

If “log2”, then max_features=log2(n_features).

If None, then max_features=n_features.

Choosing max_features < n_features leads to a reduction of variance and an increase in bias.

Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features.

max_leaf_nodes : int or None, optional (default=None)

Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.

alpha : float (default=0.9)

The alpha-quantile of the huber loss function and the quantile loss function. Only ifloss='huber' or loss='quantile'.

init : BaseEstimator, None, optional (default=None)

An estimator object that is used to compute the initial predictions. init has to provide fitand predict. If None it uses loss.init_estimator.

verbose : int, default: 0

Enable verbose output. If 1 then it prints progress and performance once in a while (the more trees the lower the frequency). If greater than 1 then it prints progress and performance for every tree.

warm_start : bool, default: False

When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just erase the previous solution.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.