1. Grid search is used to find the best hyperparameters for a model
First, import the dependencies
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn import metrics
import numpy as np
import pandas as pd
2. Set the parameters to search
params = {
    'learning_rate': np.linspace(0.05, 0.25, 5),
    'max_depth': list(range(1, 8)),
    'min_samples_leaf': list(range(1, 5)),
    'n_estimators': list(range(50, 100, 10)),
}
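Before launching the search it can help to know how large the grid is, since every combination is trained once per cross-validation fold. The sketch below counts the combinations with ParameterGrid; the variable names n_candidates and n_fits are illustrative, and the fold count of 10 is taken from the cv=10 setting used in the next step.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Same grid as above
params = {
    'learning_rate': np.linspace(0.05, 0.25, 5),
    'max_depth': list(range(1, 8)),
    'min_samples_leaf': list(range(1, 5)),
    'n_estimators': list(range(50, 100, 10)),
}

# ParameterGrid enumerates the Cartesian product of all value lists
n_candidates = len(ParameterGrid(params))  # 5 * 7 * 4 * 5 combinations
n_fits = n_candidates * 10                 # one fit per combination per fold (cv=10)
print(n_candidates, n_fits)
```

With several thousand fits, narrowing the value lists or lowering cv is often worth considering before running the full search.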
3. Set up the model and the evaluation metric, then train the model with each parameter combination
clf = GradientBoostingClassifier()
grid = GridSearchCV(clf, params, cv=10, scoring="f1")
grid.fit(X, y)
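The snippet above assumes X and y already exist. As a minimal runnable sketch, the version below generates a synthetic binary dataset with make_classification and searches a much smaller grid with 3 folds so that it finishes quickly. The dataset, the reduced grid small_params, and random_state=0 are assumptions made for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic binary classification data standing in for the real X, y
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Deliberately small grid: 2 * 2 * 1 = 4 candidates
small_params = {
    'learning_rate': [0.05, 0.25],
    'max_depth': [1, 3],
    'n_estimators': [50],
}

clf = GradientBoostingClassifier(random_state=0)
grid = GridSearchCV(clf, small_params, cv=3, scoring="f1")
grid.fit(X, y)  # trains every candidate on every fold, then refits the best one
print(grid.best_params_)
```

Passing n_jobs=-1 to GridSearchCV runs the fits in parallel on all available cores, which usually helps with larger grids.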
The predefined scoring values include the following:

Classification
scoring                    function                          comment
accuracy                   metrics.accuracy_score
average_precision          metrics.average_precision_score
f1                         metrics.f1_score                  for binary targets
f1_micro                   metrics.f1_score                  micro-averaged
f1_macro                   metrics.f1_score                  macro-averaged
f1_weighted                metrics.f1_score                  weighted average
f1_samples                 metrics.f1_score                  by multilabel sample
neg_log_loss               metrics.log_loss                  requires predict_proba support
precision etc.             metrics.precision_score           suffixes apply as with "f1"
recall etc.                metrics.recall_score              suffixes apply as with "f1"
roc_auc                    metrics.roc_auc_score

Clustering
scoring                    function                          comment
adjusted_rand_score        metrics.adjusted_rand_score

Regression
scoring                    function                          comment
neg_mean_absolute_error    metrics.mean_absolute_error
neg_mean_squared_error     metrics.mean_squared_error
neg_median_absolute_error  metrics.median_absolute_error
r2                         metrics.r2_score
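Each scoring string resolves to a scorer object that is called as scorer(estimator, X, y) and always follows a greater-is-better convention, which is why the loss-based entries carry the neg_ prefix. The sketch below looks up the "f1" scorer with get_scorer and builds a custom one with make_scorer; the synthetic data and the choice of an F2 score are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score, fbeta_score, get_scorer, make_scorer

# Synthetic data and a quickly trained model, for illustration only
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
model = GradientBoostingClassifier(n_estimators=20, random_state=0).fit(X, y)

# The string "f1" maps to a scorer wrapping metrics.f1_score
f1_scorer = get_scorer("f1")
score_from_string = f1_scorer(model, X, y)
score_direct = f1_score(y, model.predict(X))

# A metric without a predefined string can be wrapped with make_scorer
f2_scorer = make_scorer(fbeta_score, beta=2)
f2 = f2_scorer(model, X, y)
print(score_from_string, score_direct, f2)
```

A scorer built this way can be passed directly as the scoring argument of GridSearchCV in place of a string.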
4. View the best score and the best parameters
grid.best_score_
grid.best_params_
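Beyond the single best result, the fitted search also stores per-candidate scores in cv_results_, which loads cleanly into a pandas DataFrame. The sketch below ranks all candidates by mean cross-validated score; the synthetic data and the small grid are assumptions chosen so the example runs quickly.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data and a small grid, for illustration only
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
small_params = {'learning_rate': [0.05, 0.25], 'max_depth': [1, 3], 'n_estimators': [50]}

grid = GridSearchCV(GradientBoostingClassifier(random_state=0),
                    small_params, cv=3, scoring="f1")
grid.fit(X, y)

# One row per parameter combination, sorted so the best candidate comes first
results = pd.DataFrame(grid.cv_results_)
columns = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
top = results.sort_values("rank_test_score")[columns]
print(top.head())
```

Comparing std_test_score across the top rows gives a rough sense of whether the best candidate is clearly better or only marginally ahead.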
5. Get the best model
grid.best_estimator_
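6. Use the best model to make predictions
best_model = grid.best_estimator_
predict_y = best_model.predict(Test_X)
# Score against the test labels (assumed here to be named Test_y), not the training labels y
metrics.f1_score(Test_y, predict_y)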
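The prediction step relies on a held-out test set that the notes do not construct. As a sketch under that assumption, the example below splits synthetic data with train_test_split, runs a small search on the training part, and scores the best model on the test part. The names X_all, y_all, and Test_y, as well as the 25% test size, are hypothetical.

```python
import numpy as np
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data split into a training part (X, y) and a held-out part (Test_X, Test_y)
X_all, y_all = make_classification(n_samples=300, n_features=10, random_state=0)
X, Test_X, y, Test_y = train_test_split(
    X_all, y_all, test_size=0.25, stratify=y_all, random_state=0)

grid = GridSearchCV(GradientBoostingClassifier(random_state=0),
                    {'max_depth': [1, 3], 'n_estimators': [50]},
                    cv=3, scoring="f1")
grid.fit(X, y)  # the search only ever sees the training part

best_model = grid.best_estimator_       # refit on all of (X, y) with the best parameters
predict_y = best_model.predict(Test_X)
test_f1 = metrics.f1_score(Test_y, predict_y)

# With the default refit=True, grid.predict delegates to best_estimator_
same = np.array_equal(predict_y, grid.predict(Test_X))
print(test_f1, same)
```

Keeping the test set out of grid.fit means the reported F1 is not inflated by the parameter selection itself.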