Random forest and GBDT (Gradient Boosting Decision Tree): the "forest" in the former and the "Boosting" in the latter both point to the same idea, namely that each model is a form of ensemble learning, combining many base learners into a stronger one.
A basic framework for random forest:
1. RandomForestRegressor: random forest regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
import numpy as np
import matplotlib.pyplot as plt

max_features = [.1, .3, .5, .7, .9, .99]
test_scores = []
for max_feat in max_features:
    clf = RandomForestRegressor(n_estimators=200, max_features=max_feat)
    # RMSE from 5-fold CV (sklearn returns negated MSE for this scorer)
    test_score = np.sqrt(-cross_val_score(clf, X_train, y_train,
                                          cv=5, scoring='neg_mean_squared_error'))
    test_scores.append(np.mean(test_score))
plt.plot(max_features, test_scores)  # plot the whole list, not the loop variable
plt.title('Max feat vs CV Error')
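The loop above assumes `X_train` and `y_train` already exist. A self-contained sketch of the same search, using a synthetic dataset from `make_regression` purely as a stand-in for the real training data:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic data standing in for X_train / y_train (illustrative only)
X_train, y_train = make_regression(n_samples=200, n_features=20,
                                   noise=10.0, random_state=0)

max_features = [.1, .3, .5, .7, .9, .99]
test_scores = []
for max_feat in max_features:
    clf = RandomForestRegressor(n_estimators=50, max_features=max_feat,
                                random_state=0)
    # Convert negated MSE back to RMSE, then average over the 5 folds
    score = np.sqrt(-cross_val_score(clf, X_train, y_train,
                                     cv=5, scoring='neg_mean_squared_error'))
    test_scores.append(score.mean())

# Pick the max_features value with the lowest cross-validated RMSE
best = max_features[int(np.argmin(test_scores))]
print(best)
```

Instead of eyeballing the plot, `np.argmin` selects the candidate with the lowest CV error directly.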
2. Bagging
from sklearn.ensemble import BaggingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
import numpy as np

ridge = Ridge(alpha=15)
params = [1, 10, 15, 20, 25, 30, 40]
test_scores = []
for param in params:
    # n_estimators = number of bagged ridge models
    # (base_estimator was renamed to estimator in sklearn >= 1.2)
    clf = BaggingRegressor(n_estimators=param, base_estimator=ridge)
    test_score = np.sqrt(-cross_val_score(clf, X_train, y_train,
                                          cv=10, scoring='neg_mean_squared_error'))
    test_scores.append(np.mean(test_score))
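GBDT, mentioned at the top, fits the same cross-validation pattern. A minimal sketch using sklearn's GradientBoostingRegressor; the synthetic data and the parameter grid here are illustrative assumptions, not values from the original notes:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real training data (illustrative only)
X_train, y_train = make_regression(n_samples=200, n_features=20,
                                   noise=10.0, random_state=0)

params = [50, 100, 200]  # number of boosting stages; grid chosen for illustration
test_scores = []
for n in params:
    clf = GradientBoostingRegressor(n_estimators=n, random_state=0)
    # Cross-validated RMSE, same scoring convention as above
    score = np.sqrt(-cross_val_score(clf, X_train, y_train,
                                     cv=5, scoring='neg_mean_squared_error'))
    test_scores.append(score.mean())

best_n = params[int(np.argmin(test_scores))]
print(best_n)
```

Unlike bagging, boosting fits trees sequentially, each correcting the residuals of the current ensemble, so `n_estimators` trades off fit against overfitting rather than just variance reduction.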