Gradient Boosted Regression Trees 2

Regularization

GBRT provides three knobs to control overfitting: tree structure, shrinkage, and randomization.

Tree Structure

The depth of the individual trees is one aspect of model complexity. The depth of the trees essentially controls the degree of feature interaction that your model can fit. For example, if you want to capture the interaction between a feature latitude and a feature longitude, your trees need a depth of at least two. Unfortunately, the degree of feature interaction is not known in advance, but it is usually fine to assume that it is fairly low -- in practice, a depth of 4-6 usually gives the best results. In scikit-learn you can constrain the depth of the trees using the max_depth argument.
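
As a minimal, self-contained sketch of this effect (the synthetic data and parameter values below are illustrative assumptions, not the toy example used in the rest of this post), depth-1 stumps cannot model a latitude-longitude-style interaction while deeper trees can:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical 2-feature problem whose target depends on the interaction x0 * x1.
rng = np.random.RandomState(0)
X = rng.uniform(size=(1000, 2))
y = X[:, 0] * X[:, 1] + rng.normal(scale=0.05, size=1000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Stumps (max_depth=1) cannot represent the interaction; depth >= 2 trees can.
for depth in (1, 2, 4):
    est = GradientBoostingRegressor(n_estimators=200, max_depth=depth, random_state=0)
    est.fit(X_tr, y_tr)
    print("max_depth=%d  test MSE=%.4f"
          % (depth, mean_squared_error(y_te, est.predict(X_te))))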

Another way to control the depth of the trees is by enforcing a lower bound on the number of samples in a leaf: this avoids imbalanced splits where a leaf is formed for just one extreme data point. In scikit-learn you can do this using the min_samples_leaf argument. This is effectively a means to introduce bias into your model, with the hope of also reducing variance, as shown in the example below:

def fmt_params(params):
    # Format a parameter dict as a readable legend label.
    return ", ".join("{0}={1}".format(key, val) for key, val in params.items())

fig = plt.figure(figsize=(8, 5))
ax = plt.gca()
for params, (test_color, train_color) in [({}, ('#d7191c', '#2c7bb6')),
                                           ({'min_samples_leaf': 3},
                                            ('#fdae61', '#abd9e9'))]:
    est = GradientBoostingRegressor(n_estimators=n_estimators, max_depth=1, learning_rate=1.0)
    est.set_params(**params)
    est.fit(X_train, y_train)
    test_dev, ax = deviance_plot(est, X_test, y_test, ax=ax, label=fmt_params(params),
                                 train_color=train_color, test_color=test_color)

ax.annotate('Higher bias', xy=(900, est.train_score_[899]), xycoords='data',
            xytext=(600, 0.3), textcoords='data',
            arrowprops=dict(arrowstyle="->", connectionstyle="arc"))
ax.annotate('Lower variance', xy=(900, test_dev[899]), xycoords='data',
            xytext=(600, 0.4), textcoords='data',
            arrowprops=dict(arrowstyle="->", connectionstyle="arc"))
plt.legend(loc='upper right')

Shrinkage

The most important regularization technique for GBRT is shrinkage: the idea is to do slow learning by shrinking the prediction of each individual tree by some small scalar, the learning_rate. By doing so, the model has to reinforce concepts over many trees. A lower learning_rate requires a higher number of n_estimators to reach the same level of training error -- so it trades runtime against accuracy.

fig = plt.figure(figsize=(8, 5))
ax = plt.gca()
for params, (test_color, train_color) in [({}, ('#d7191c', '#2c7bb6')),
                                           ({'learning_rate': 0.1},
                                            ('#fdae61', '#abd9e9'))]:
    est = GradientBoostingRegressor(n_estimators=n_estimators, max_depth=1, learning_rate=1.0)
    est.set_params(**params)
    est.fit(X_train, y_train)
    test_dev, ax = deviance_plot(est, X_test, y_test, ax=ax, label=fmt_params(params),
                                 train_color=train_color, test_color=test_color)

ax.annotate('Requires more trees', xy=(200, est.train_score_[199]), xycoords='data',
            xytext=(300, 1.0), textcoords='data',
            arrowprops=dict(arrowstyle="->", connectionstyle="arc"))
ax.annotate('Lower test error', xy=(900, test_dev[899]), xycoords='data',
            xytext=(600, 0.5), textcoords='data',
            arrowprops=dict(arrowstyle="->", connectionstyle="arc"))
plt.legend(loc='upper right')
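
To make the runtime-versus-accuracy trade-off concrete, the sketch below compares a high learning rate with few trees against a low learning rate with many more trees; it assumes the same X_train/y_train split as the snippets above, and the specific n_estimators values are illustrative assumptions:

# With a smaller learning_rate, roughly proportionally more trees are needed to
# reach a comparable training deviance; train_score_ stores the deviance per stage.
fast = GradientBoostingRegressor(n_estimators=100, max_depth=1, learning_rate=1.0)
slow = GradientBoostingRegressor(n_estimators=1000, max_depth=1, learning_rate=0.1)
for est in (fast, slow):
    est.fit(X_train, y_train)
    print("learning_rate=%.1f  n_estimators=%d  final train deviance=%.4f"
          % (est.learning_rate, est.n_estimators, est.train_score_[-1]))
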
Stochastic Gradient Boosting

Similar to RandomForest, introducing randomization into the tree-building process can lead to higher accuracy. Scikit-learn provides two ways to introduce randomization: a) subsampling the training set before growing each tree (subsample) and b) subsampling the features before finding the best split for each node (max_features). Experience shows that the latter works better if there is a sufficiently large number of features (>30). One thing worth noting is that both options reduce runtime.

Below we show the effect of using subsample=0.5, i.e. growing each tree on 50% of the training data, on our toy example:

fig = plt.figure(figsize=(8, 5))
ax = plt.gca()
for params, (test_color, train_color) in [({}, ('#d7191c', '#2c7bb6')),
                                           ({'learning_rate': 0.1, 'subsample': 0.5},
                                            ('#fdae61', '#abd9e9'))]:
    est = GradientBoostingRegressor(n_estimators=n_estimators, max_depth=1,
                                    learning_rate=1.0, random_state=1)
    est.set_params(**params)
    est.fit(X_train, y_train)
    test_dev, ax = deviance_plot(est, X_test, y_test, ax=ax, label=fmt_params(params),
                                 train_color=train_color, test_color=test_color)

ax.annotate('Even lower test error', xy=(400, test_dev[399]), xycoords='data',
            xytext=(500, 0.5), textcoords='data',
            arrowprops=dict(arrowstyle="->", connectionstyle="arc"))

est = GradientBoostingRegressor(n_estimators=n_estimators, max_depth=1,
                                learning_rate=1.0, subsample=0.5)
est.fit(X_train, y_train)
test_dev, ax = deviance_plot(est, X_test, y_test, ax=ax, label=fmt_params({'subsample': 0.5}),
                             train_color='#abd9e9', test_color='#fdae61', alpha=0.5)
ax.annotate('Subsample alone does poorly', xy=(300, test_dev[299]), xycoords='data',
            xytext=(250, 1.3), textcoords='data',  # y-coordinate assumed; snippet truncated here
            arrowprops=dict(arrowstyle="->", connectionstyle="arc"))
plt.legend(loc='upper right')
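
The max_features option is not used in the toy example above; as a rough sketch of how feature subsampling could be combined with the other knobs (the specific values are illustrative assumptions, and X_train/y_train are the splits used in the snippets above):

# Consider only a random 30% of the features when searching for each split;
# max_features accepts an int, a float fraction, or 'sqrt'/'log2'.
est = GradientBoostingRegressor(n_estimators=1000, max_depth=1, learning_rate=0.1,
                                subsample=0.5, max_features=0.3, random_state=1)
est.fit(X_train, y_train)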