Gradient Tree Boosting: Gradient Boosted Trees Explained

Latest recommended article published 2024-04-30 10:06:39

weixin_30614587

Tags: artificial intelligence, python, data structures and algorithms

Original link: http://www.cnblogs.com/fonttian/p/9162725.html

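Theory

For the mathematical derivation, see Li Hang's 《统计机器学习》 or the official sklearn documentation. My own partial notes follow and can also serve as a reference.

[Figure: mathematical derivation of gradient boosted trees, part 1]

[Figure: mathematical derivation of gradient boosted trees, part 2]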
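As a rough companion to the derivation, here is a minimal sketch of gradient boosting for the squared-error loss only. With that loss, the negative gradient at each step is simply the residual `y - F(x)`, so each new tree is fit to the current residuals and added with a shrinkage factor. The helper names `fit_gbrt` and `predict_gbrt` are hypothetical and written for illustration; this is not how sklearn implements it internally.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gbrt(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
    """Hypothetical helper: gradient boosting for squared-error loss."""
    f0 = float(np.mean(y))            # initial model: the constant that minimizes squared error
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_estimators):
        residual = y - pred           # negative gradient of 0.5 * (y - F)^2 with respect to F
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residual)         # the weak learner approximates the negative gradient
        pred = pred + learning_rate * tree.predict(X)  # shrunken additive update
        trees.append(tree)
    return f0, learning_rate, trees


def predict_gbrt(model, X):
    """Sum the initial constant and all shrunken tree predictions."""
    f0, learning_rate, trees = model
    pred = np.full(X.shape[0], f0)
    for tree in trees:
        pred = pred + learning_rate * tree.predict(X)
    return pred


# Noise-free sine data, chosen only to keep the illustration simple
rng = np.random.RandomState(1)
X = np.sort(10 * rng.rand(160, 1), axis=0)
y = np.sin(X).ravel()

model = fit_gbrt(X, y)
mse_const = np.mean((y - np.mean(y)) ** 2)
mse_boost = np.mean((y - predict_gbrt(model, X)) ** 2)
print("constant model MSE:", mse_const)
print("boosted model MSE: ", mse_boost)
```

Other differentiable losses change only the line that computes `residual`; a full implementation would also adjust the leaf values per loss, which this sketch omits.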

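Pros and cons

GBRT generalizes boosting to arbitrary differentiable loss functions, so it can be used for both regression and classification.
Advantages:
1. Natural handling of mixed-type data
2. Strong predictive power (mainly in the sense that the algorithm itself is powerful and usually performs well)
3. Robustness to outliers in the output space (achieved through robust loss functions)
Disadvantages:
1. Hard to parallelize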
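The third advantage above can be illustrated with a small sketch that fits the same data twice, once with the default loss and once with `loss="huber"`. The data, the outlier size, and the choice of `alpha=0.9` are assumptions made for this illustration, and which model ends up closer to the clean signal will depend on the data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = np.sort(10 * rng.rand(200, 1), axis=0)
y_clean = np.sin(X).ravel()

# Assumed setup: push every 20th target far from the signal to act as an outlier
y = y_clean.copy()
y[::20] += 8.0

# Same model, two losses: the default (squared error) and the Huber loss
gbr_default = GradientBoostingRegressor(random_state=0).fit(X, y)
gbr_huber = GradientBoostingRegressor(loss="huber", alpha=0.9, random_state=0).fit(X, y)

# Compare each fit against the clean signal rather than the corrupted targets
mae_default = np.mean(np.abs(gbr_default.predict(X) - y_clean))
mae_huber = np.mean(np.abs(gbr_huber.predict(X) - y_clean))
print("MAE vs clean signal, default loss:", mae_default)
print("MAE vs clean signal, huber loss:  ", mae_huber)
```

One would usually expect the Huber fit to be pulled less toward the outliers, but it is worth checking the printed numbers on your own data rather than taking that for granted.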

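Sklearn note:

Classification with more than two classes requires inducing n_classes regression trees at every iteration, so the total number of induced trees equals n_classes * n_estimators. When the number of classes or the amount of data is large, it is therefore advisable to use another algorithm in place of GBRT.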
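The tree count can be checked directly on a fitted classifier. The sketch below uses the three-class iris dataset and a small `n_estimators` (both chosen here for illustration) and inspects the `estimators_` array, which holds one regression tree per class per boosting iteration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_iris(return_X_y=True)   # 3 classes

n_estimators = 10
clf = GradientBoostingClassifier(n_estimators=n_estimators, random_state=0).fit(X, y)

n_classes = len(clf.classes_)
total_trees = clf.estimators_.size  # estimators_ has shape (n_estimators, n_classes) here

print("estimators_ shape:", clf.estimators_.shape)
print("total trees:", total_trees, "=", n_classes, "*", n_estimators)
```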

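The example below reuses the data from the article on AdaBoost's characteristics. It also illustrates two conclusions well:

Subsampling can effectively help avoid overfitting.
Subsampling also increases bias, so more boosting iterations are needed than in the earlier AdaBoost example. (For how to get more out of gradient boosting, see the second-to-last section.)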

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import GradientBoostingRegressor

# Create a random dataset
rng = np.random.RandomState(1)
X = np.sort(10 * rng.rand(160, 1), axis=0)
y = np.sin(X).ravel()
y[::5] += 2 * (0.5 - rng.rand(len(X) // 5))  # add noise to every fifth point

# Fit regression models: two single trees and two boosted ensembles

estimators_num = 500

regr_1 = DecisionTreeRegressor(max_depth=2)
regr_2 = DecisionTreeRegressor(max_depth=5)

# The two boosted models differ only in the number of boosting iterations
params_1 = {
    'n_estimators': 100, 'max_depth': 3, 'subsample': 0.5,
    'learning_rate': 0.01, 'min_samples_leaf': 1, 'random_state': 3}

params_2 = {
    'n_estimators': 500, 'max_depth': 3, 'subsample': 0.5,
    'learning_rate': 0.01, 'min_samples_leaf': 1, 'random_state': 3}
regr_3 = GradientBoostingRegressor(**params_1)
regr_4 = GradientBoostingRegressor(**params_2)
regr_1.fit(X, y)
regr_2.fit(X, y)
regr_3.fit(X, y)
regr_4.fit(X, y)

# Predict on a dense grid over the same range
X_test = np.arange(0.0, 10.0, 0.01)[:, np.newaxis]
y_test = np.sin(X_test).ravel()
y_test[::5] += 2 * (0.5 - rng.rand(len(X_test) // 5))  # add noise to every fifth point
y_1 = regr_1.predict(X_test)
y_2 = regr_2.predict(X_test)
y_3 = regr_3.predict(X_test)
y_4 = regr_4.predict(X_test)

# Plot the training data and the fitted curves
fig = plt.figure()
fig.set_size_inches(18.5, 10.5)
ax = fig.add_subplot(2, 1, 1)
plt.scatter(X, y, s=20, edgecolor="black",
            c="darkorange", label="data")
ax.plot(X_test, y_1, color="cornflowerblue", label="max_depth=2", linewidth=2)
ax.plot(X_test, y_2, color="yellowgreen", label="max_depth=5", linewidth=2)
ax.plot(X_test, y_3, color="r", label="n_e
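The claim that a subsampled model with a small learning rate needs more boosting iterations can also be checked numerically instead of by eye. The sketch below is a separate, self-contained illustration: it refits the 500-tree configuration used above and relies on `staged_predict` to get the prediction after every iteration. Measuring the error against the clean sine curve is an assumption made here for clarity.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Same training data as in the example above
rng = np.random.RandomState(1)
X = np.sort(10 * rng.rand(160, 1), axis=0)
y = np.sin(X).ravel()
y[::5] += 2 * (0.5 - rng.rand(len(X) // 5))

# Evaluate against the noise-free signal on a dense grid (an assumption of this sketch)
X_test = np.arange(0.0, 10.0, 0.01)[:, np.newaxis]
y_true = np.sin(X_test).ravel()

gbr = GradientBoostingRegressor(
    n_estimators=500, max_depth=3, subsample=0.5,
    learning_rate=0.01, min_samples_leaf=1, random_state=3,
).fit(X, y)

# staged_predict yields the ensemble's prediction after each boosting iteration
mse_per_stage = np.array(
    [np.mean((y_true - pred) ** 2) for pred in gbr.staged_predict(X_test)]
)

mse_100 = mse_per_stage[99]    # after 100 iterations
mse_500 = mse_per_stage[499]   # after 500 iterations
print("MSE after 100 iterations:", mse_100)
print("MSE after 500 iterations:", mse_500)
```

Plotting `mse_per_stage` against the iteration index gives a quick way to pick `n_estimators` for a given `learning_rate` and `subsample`.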
