Gradient Boosted Regression Trees 2

Regularization

GBRT provides three knobs to control overfitting: tree structure, shrinkage, and randomization.

Tree Structure

The depth of the individual trees is one aspect of model complexity. The depth of the trees essentially controls the degree of feature interaction that your model can fit. For example, if you want to capture the interaction between a feature latitude and a feature longitude, your trees need a depth of at least two. Unfortunately, the degree of feature interaction is not known in advance, but it is usually fine to assume that it is fairly low -- in practice, a depth of 4-6 usually gives the best results. In scikit-learn you can constrain the depth of the trees using the max_depth argument.
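
As a minimal, self-contained sketch of this effect (the synthetic data and parameter values below are illustrative assumptions, not the toy example used in the rest of this post), depth-1 stumps cannot model a latitude-longitude-style interaction while deeper trees can:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical 2-feature problem whose target depends on the interaction x0 * x1.
rng = np.random.RandomState(0)
X = rng.uniform(size=(1000, 2))
y = X[:, 0] * X[:, 1] + rng.normal(scale=0.05, size=1000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Stumps (max_depth=1) cannot represent the interaction; depth >= 2 trees can.
for depth in (1, 2, 4):
    est = GradientBoostingRegressor(n_estimators=200, max_depth=depth, random_state=0)
    est.fit(X_tr, y_tr)
    print("max_depth=%d  test MSE=%.4f"
          % (depth, mean_squared_error(y_te, est.predict(X_te))))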

Another way to control the depth of the trees is by enforcing a lower bound on the number of samples in a leaf: this avoids imbalanced splits where a leaf is formed for just one extreme data point. In scikit-learn you can do this using the min_samples_leaf argument. This is effectively a means to introduce bias into your model, with the hope of also reducing variance, as shown in the example below:

def fmt_params(params):
    # Format a parameter dict as a readable legend label.
    return ", ".join("{0}={1}".format(key, val) for key, val in params.items())

fig = plt.figure(figsize=(8, 5))
ax = plt.gca()
for params, (test_color, train_color) in [({}, ('#d7191c', '#2c7bb6')),
                                           ({'min_samples_leaf': 3},
                                            ('#fdae61', '#abd9e9'))]:
    est = GradientBoostingRegressor(n_estimators=n_estimators, max_depth=1, learning_rate=1.0)
    est.set_params(**params)
    est.fit(X_train, y_train)
    test_dev, ax = deviance_plot(est, X_test, y_test, ax=ax, label=fmt_params(params),
                                 train_color=train_color, test_color=test_color)

ax.annotate('Higher bias', xy=(900, est.train_score_[899]), xycoords='data',
            xytext=(600, 0.3), textcoords='data',
            arrowprops=dict(arrowstyle="->", connectionstyle="arc"))
ax.annotate('Lower variance', xy=(900, test_dev[899]), xycoords='data',
            xytext=(600, 0.4), textcoords='data',
            arrowprops=dict(arrowstyle="->", connectionstyle="arc"))
plt.legend(loc='upper right')

Shrinkage

The most important regularization technique for GBRT is shrinkage: the idea is to do slow learning by shrinking the prediction of each individual tree by some small scalar, the learning_rate. By doing so, the model has to reinforce concepts over many trees. A lower learning_rate requires a higher number of n_estimators to reach the same level of training error -- so it trades runtime against accuracy.

fig = plt.figure(figsize=(8, 5))
ax = plt.gca()
for params, (test_color, train_color) in [({}, ('#d7191c', '#2c7bb6')),
                                           ({'learning_rate': 0.1},
                                            ('#fdae61', '#abd9e9'))]:
    est = GradientBoostingRegressor(n_estimators=n_estimators, max_depth=1, learning_rate=1.0)
    est.set_params(**params)
    est.fit(X_train, y_train)
    test_dev, ax = deviance_plot(est, X_test, y_test, ax=ax, label=fmt_params(params),
                                 train_color=train_color, test_color=test_color)

ax.annotate('Requires more trees', xy=(200, est.train_score_[199]), xycoords='data',
            xytext=(300, 1.0), textcoords='data',
            arrowprops=dict(arrowstyle="->", connectionstyle="arc"))
ax.annotate('Lower test error', xy=(900, test_dev[899]), xycoords='data',
            xytext=(600, 0.5), textcoords='data',
            arrowprops=dict(arrowstyle="->", connectionstyle="arc"))
plt.legend(loc='upper right')
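
To make the runtime-versus-accuracy trade-off concrete, the sketch below compares a high learning rate with few trees against a low learning rate with many more trees; it assumes the same X_train/y_train split as the snippets above, and the specific n_estimators values are illustrative assumptions:

# With a smaller learning_rate, roughly proportionally more trees are needed to
# reach a comparable training deviance; train_score_ stores the deviance per stage.
fast = GradientBoostingRegressor(n_estimators=100, max_depth=1, learning_rate=1.0)
slow = GradientBoostingRegressor(n_estimators=1000, max_depth=1, learning_rate=0.1)
for est in (fast, slow):
    est.fit(X_train, y_train)
    print("learning_rate=%.1f  n_estimators=%d  final train deviance=%.4f"
          % (est.learning_rate, est.n_estimators, est.train_score_[-1]))
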
Stochastic Gradient Boosting

Similar to RandomForest, introducing randomization into the tree-building process can lead to higher accuracy. Scikit-learn provides two ways to introduce randomization: a) subsampling the training set before growing each tree (subsample) and b) subsampling the features before finding the best split for each node (max_features). Experience shows that the latter works better if there is a sufficiently large number of features (>30). One thing worth noting is that both options reduce runtime.

Below we show the effect of using subsample=0.5, i.e. growing each tree on 50% of the training data, on our toy example:

fig = plt.figure(figsize=(8, 5))
ax = plt.gca()
for params, (test_color, train_color) in [({}, ('#d7191c', '#2c7bb6')),
                                           ({'learning_rate': 0.1, 'subsample': 0.5},
                                            ('#fdae61', '#abd9e9'))]:
    est = GradientBoostingRegressor(n_estimators=n_estimators, max_depth=1,
                                    learning_rate=1.0, random_state=1)
    est.set_params(**params)
    est.fit(X_train, y_train)
    test_dev, ax = deviance_plot(est, X_test, y_test, ax=ax, label=fmt_params(params),
                                 train_color=train_color, test_color=test_color)

ax.annotate('Even lower test error', xy=(400, test_dev[399]), xycoords='data',
            xytext=(500, 0.5), textcoords='data',
            arrowprops=dict(arrowstyle="->", connectionstyle="arc"))

est = GradientBoostingRegressor(n_estimators=n_estimators, max_depth=1,
                                learning_rate=1.0, subsample=0.5)
est.fit(X_train, y_train)
test_dev, ax = deviance_plot(est, X_test, y_test, ax=ax, label=fmt_params({'subsample': 0.5}),
                             train_color='#abd9e9', test_color='#fdae61', alpha=0.5)
ax.annotate('Subsample alone does poorly', xy=(300, test_dev[299]), xycoords='data',
            xytext=(250, 1.3), textcoords='data',  # y-coordinate assumed; snippet truncated here
            arrowprops=dict(arrowstyle="->", connectionstyle="arc"))
plt.legend(loc='upper right')
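
The max_features option is not used in the toy example above; as a rough sketch of how feature subsampling could be combined with the other knobs (the specific values are illustrative assumptions, and X_train/y_train are the splits used in the snippets above):

# Consider only a random 30% of the features when searching for each split;
# max_features accepts an int, a float fraction, or 'sqrt'/'log2'.
est = GradientBoostingRegressor(n_estimators=1000, max_depth=1, learning_rate=0.1,
                                subsample=0.5, max_features=0.3, random_state=1)
est.fit(X_train, y_train)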