Python Machine Learning (6): XGBoost Parameter Tuning

tantao666

Reposted from: https://mp.weixin.qq.com/s?__biz=MzI2ODY5NjI2Mg==&mid=2247483759&idx=1&sn=69aefa04b283eba638854993e0384cb4&chksm=eaeaeec6dd9d67d0

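Once you have built a model, you usually need to tune its parameters to get the best performance out of it. Only then does your model resemble a carefully seasoned, fragrant bowl of wontons with chili oil. So:

Parameter tuning = seasoning?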

(1) Introduction to XGBoost and parameter tuning

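XGBoost (eXtreme Gradient Boosting) is an optimized implementation of the gradient boosting algorithm, created by Tianqi Chen (a respectful nod to a fellow Shanghai Jiao Tong University alumnus). Anaconda does not ship with this package, so it has to be installed separately (for example with pip install xgboost). Building an XGBoost model is simple and takes four lines of code: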

import xgboost as xgb  # import the XGBoost package
xgbr = xgb.XGBRegressor()  # create an XGBRegressor with default parameters
xgbr.fit(x_train, y_train)  # fit on the training set
xgbr_y_predict = xgbr.predict(x_test)  # predict on the test set

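In the four-line snippet, the second line calls XGBRegressor with its default parameters. Much of the algorithm's appeal is that it exposes many parameters, and tuning them can improve model quality. The main ones are listed here as parameter: [starting value] (typical search range). The bracketed values are the starting points used in this post; some of them differ from the library defaults (for example, the scikit-learn wrapper defaults to n_estimators = 100, not 500).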

n_estimators: [500] (500, 600, 700, 800)

min_child_weight: [2] (1, 3, 5)

max_depth: [6] (3-10)

gamma: [0]

subsample: [1] (0.5-1)

colsample_bytree: [0.7] (0.5-1)

reg_alpha: [0] (0.01, 0.5, 1)

reg_lambda: [1] (0.01-0.1, 1)

learning_rate: [0.05] (0.01-0.3)

(2) Tuning in Python

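If your machine has enough compute (for example, a GPU), you can put every parameter into a single grid and find the best values for all of them in one run. On a less powerful machine that is impractical, so the steps in this section tune the parameters one at a time.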
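To get a feel for why a single exhaustive run is expensive, the sketch here counts model fits for a full grid versus one-parameter-at-a-time tuning. The candidate lists are an assumed discretization of the typical ranges from section (1); the post does not give step sizes. The count also ignores the final refit that GridSearchCV performs on the best candidate.

```python
from math import prod

# Candidate values per parameter. Ranges such as 0.5-1 are discretized
# by assumption; the post itself does not specify step sizes.
grid = {
    'n_estimators': [500, 600, 700, 800],
    'min_child_weight': [1, 3, 5],
    'max_depth': list(range(3, 11)),
    'subsample': [0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
    'colsample_bytree': [0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
    'reg_alpha': [0.01, 0.5, 1],
    'reg_lambda': [0.01, 0.05, 0.1, 1],
    'learning_rate': [0.01, 0.05, 0.1, 0.3],
}
cv_folds = 5

# Full grid: every combination of every value is cross-validated.
full_grid_fits = prod(len(values) for values in grid.values()) * cv_folds

# One at a time: each parameter's candidates are tried once, others held fixed.
one_at_a_time_fits = sum(len(values) for values in grid.values()) * cv_folds

print(full_grid_fits, one_at_a_time_fits)
```

Under these assumptions the full grid needs 829,440 fits against 190 for sequential tuning. The trade-off is that sequential tuning can miss interactions between parameters.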

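#1. Import XGBoost and GridSearchCV. XGBoost provides plot_importance; for other algorithms, read the feature_importances_ attribute instead
import xgboost as xgb
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search was removed in scikit-learn 0.20
from xgboost import plot_importance
import matplotlib.pyplot as plt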

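#2. Tune the number of boosting rounds first. Put the parameter being searched in cv_params and keep all other parameters, at their starting values, in the dictionary other_params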

cv_params={'n_estimators':[500,600,700]}

other_params = {
    'base_score': 0.3,
    'colsample_bylevel': 1,
    'colsample_bytree': 0.7,
    'gamma': 0,
    'learning_rate': 0.05,
    'max_delta_step': 0,
    'max_depth': 6,
    'min_child_weight': 2,
    'n_estimators': 500,
    'reg_alpha': 0.1,
    'reg_lambda': 0.05,
    'subsample': 0.7,
}

model = xgb.XGBRegressor(**other_params)
print(model.get_params())  # show the parameters currently set on the model
# Output in the original post (keys and defaults vary with the XGBoost version):
# {'base_score': 0.3, 'booster': 'gbtree', 'colsample_bylevel': 1, 'colsample_bytree': 0.7, 'gamma': 0, 'learning_rate': 0.05, 'max_delta_step': 0, 'max_depth': 6, 'min_child_weight': 2, 'missing': None, 'n_estimators': 500, 'n_jobs': 1, 'nthread': None, 'objective': 'reg:linear', 'random_state': 0, 'reg_alpha': 0.1, 'reg_lambda': 0.05, 'scale_pos_weight': 1, 'seed': None, 'silent': True, 'subsample': 0.7}
opt = GridSearchCV(model, cv_params, scoring='r2', cv=5)  # 5-fold grid search scored by R-squared
print('optimize is ongoing...')
opt.fit(x_train, y_train)
print('result of each iteration:')
results = opt.cv_results_  # replaces grid_scores_, which was removed in scikit-learn 0.20
for mean, std, params in zip(results['mean_test_score'], results['std_test_score'], results['params']):
    print('mean: %.5f, std: %.5f, params: %r' % (mean, std, params))
# Scores reported in the original post:
# mean: 0.96580, std: 0.00672, params: {'n_estimators': 500}
# mean: 0.96587, std: 0.00676, params: {'n_estimators': 600}
# mean: 0.96590, std: 0.00676, params: {'n_estimators': 700}
print()
print('Best parameters set found on development set:')
print(opt.best_params_)  # {'n_estimators': 700} in the original post
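The three reported scores differ only in the fourth decimal place, so it is worth checking whether the gap is meaningful relative to the fold-to-fold spread. The sketch here applies a common heuristic, often called the one-standard-error rule, to the numbers reported above. The heuristic is not part of the original post; it is shown only as one way to read such results.

```python
# Scores as reported in the post for the n_estimators search.
scores = [
    {'n_estimators': 500, 'mean': 0.96580, 'std': 0.00672},
    {'n_estimators': 600, 'mean': 0.96587, 'std': 0.00676},
    {'n_estimators': 700, 'mean': 0.96590, 'std': 0.00676},
]

# Candidate with the highest mean cross-validated R-squared.
best = max(scores, key=lambda s: s['mean'])

# Any candidate within one standard deviation of the best mean is treated
# as statistically indistinguishable from it.
threshold = best['mean'] - best['std']
close_enough = [s for s in scores if s['mean'] >= threshold]

# Among those, prefer the cheapest model (fewest trees).
simplest = min(close_enough, key=lambda s: s['n_estimators'])

print('best:', best['n_estimators'], 'simplest within one std:', simplest['n_estimators'])
```

By this reading, 500 trees perform as well as 700 within the noise of the cross-validation, at lower training cost.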

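At this point, n_estimators=700 gives the best score. If your machine allows, search a wider grid for finer tuning, for example cv_params={'n_estimators':[500,600,700,800,900,1000]}. Once you know that performance peaks around 700, narrow the search: cv_params={'n_estimators':[680,690,700,710,720]}. Note that the three mean scores reported here differ by only about 0.0001, far less than the standard deviation of about 0.007, so the gain from extra trees is marginal.

Once n_estimators is tuned, tune the other parameters the same way. Put the parameters to be searched in cv_params; in other_params, set the already tuned parameters to their best values (here n_estimators=700) and leave the rest at their starting values: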

cv_params={'max_depth':[3,4,5,6,7,8,9,10], 'min_child_weight':[1,2,3,4,5]}

other_params={'n_estimators':700,...}

Repeat these steps until all parameters have been tuned.

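(3) Building the model with the best parameters

Plug the best parameters into XGBoost to build the final model, use plot_importance to draw the feature-importance ranking, and then evaluate the model.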

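# n_estimators=1040 is the value the original post uses for the final model
xgbr = xgb.XGBRegressor(base_score=0.3, colsample_bylevel=1, colsample_bytree=0.7, gamma=0,
                        learning_rate=0.05, max_depth=6, min_child_weight=2, n_estimators=1040,
                        reg_alpha=0.1, reg_lambda=0.05, subsample=0.7)
xgbr.fit(x_train, y_train)
xgbr_y_predict = xgbr.predict(x_test)  # predictions on the test set
xgbr_y_predict_train = xgbr.predict(x_train)  # predictions on the training set, needed for the comparison in (4)
plot_importance(xgbr)  # bar chart of features ranked by importance
plt.show()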

(4) Comparing model performance on the training and test sets

import matplotlib.pyplot as plt

plt.figure()

plt.scatter(xgbr_y_predict,y_test,marker='X',s=5,c='blue')

plt.scatter(xgbr_y_predict_train,y_train,marker='X',s=5,c='red')

plt.title('XGBRegressor training set & test set prediction vs true')

plt.xlabel('xgbr_y_predict')

plt.ylabel('y_true value')

plt.show()

from sklearn.metrics import r2_score,mean_squared_error,mean_absolute_error

print('R-squared of XGBoostRegressor on test set is: %.4f'%(r2_score(y_test,xgbr_y_predict)))

print('R-squared of XGBoostRegressor on training set is: %.4f'%(r2_score(y_train,xgbr_y_predict_train)))
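The code above imports mean_squared_error and mean_absolute_error but reports only R-squared. As a reminder of what the three metrics measure, here is a small pure-Python sketch on made-up numbers. The helper name regression_metrics is hypothetical; in practice the scikit-learn functions would be used directly.

```python
def regression_metrics(y_true, y_pred):
    """Return R-squared, MSE and MAE for two equal-length sequences.

    Assumes y_true is not constant, otherwise R-squared is undefined.
    """
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(r * r for r in residuals) / n   # mean squared error
    mae = sum(abs(r) for r in residuals) / n  # mean absolute error

    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot

    return {'r2': r2, 'mse': mse, 'mae': mae}


# Made-up values for illustration only.
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.5, 7.0, 10.0])
print(metrics)
```

A large gap between the training and test values of these metrics is a sign of overfitting, which is what the red and blue scatter plots compare visually.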

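XGBoost fits the training set very closely, as the red points in the scatter plot show; the blue points show the fit on the test set. Compared with the linear regression in "Python Machine Learning (4): Linear Regression", model performance improves noticeably.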
