Main references: https://blog.csdn.net/kicilove/article/details/78413112#comments
https://wuhuhu800.github.io/2018/02/28/XGboost_param_share/#xgboost%E7%9A%84%E5%8F%82%E6%95%B0
Getting started:
Import the libraries and load the data.
import pandas as pd
import numpy as np
import xgboost as xgb
from xgboost.sklearn import XGBClassifier
from sklearn import metrics
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize']=(12.0,4.0)
train = pd.read_csv("C:\\Users\\Nihil\\Documents\\pythonlearn\\data\\train_modified.csv")
target = 'Disbursed'  # Disbursed is the binary classification target
IDcol = 'ID'  # kept as a variable so the column lists below are easy to compare
x_columns = [x for x in train.columns if x not in [target, IDcol]]
X_train = train[x_columns]
y_train = train['Disbursed']
For rcParams usage, see "plt.rcParams in Matplotlib (setting figure details)".
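rcParams is a dict-like global configuration, so any Matplotlib default can be set the same way (the second key below is just an illustrative example):
plt.rcParams['figure.figsize'] = (12.0, 4.0)  # default figure width/height in inches, as above
plt.rcParams['font.size'] = 12                # any other default is set the same way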
The code above imports XGBoost in two forms:
- xgb: the native library. Its cv function runs cross-validation at each boosting round and returns the ideal number of trees, but full cross-validation is slow.
- XGBClassifier: XGBoost's sklearn wrapper, imported so the remaining parameters can be tuned with GridSearchCV.
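For example, the sklearn wrapper is what plugs into GridSearchCV; a minimal sketch, with an illustrative grid mirroring the later tuning steps (the specific values are assumptions, not fixed by this post):
param_test = {'max_depth': range(3, 10, 2), 'min_child_weight': range(1, 6, 2)}
gsearch = GridSearchCV(estimator=XGBClassifier(learning_rate=0.1, n_estimators=140, seed=27),
                       param_grid=param_test, scoring='roc_auc', cv=5)
gsearch.fit(X_train, y_train)
print(gsearch.best_params_, gsearch.best_score_)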
Step 1, finding the number of iterations: define a modelfit function that builds an XGBoost model and cross-validates it.
def modelfit(alg, useTrainCV=True, cv_folds=5, early_stopping_rounds=50):
    if useTrainCV:
        xgb_param = alg.get_xgb_params()
        xgtrain = xgb.DMatrix(X_train, label=y_train)
        cvresult = xgb.cv(xgb_param, xgtrain, num_boost_round=alg.get_params()['n_estimators'],
                          nfold=cv_folds, metrics='auc',
                          early_stopping_rounds=early_stopping_rounds, verbose_eval=False)
        alg.set_params(n_estimators=cvresult.shape[0])  # keep only the CV-chosen number of trees
    alg.fit(X_train, y_train, eval_metric='auc')
    dtrain_predictions = alg.predict(X_train)
    dtrain_predprob = alg.predict_proba(X_train)[:, 1]
    acc = metrics.accuracy_score(y_train.values, dtrain_predictions)
    auc = metrics.roc_auc_score(y_train, dtrain_predprob)
    print("Accuracy is {:.4f}".format(acc))
    print('Best number of trees = {}'.format(cvresult.shape[0]))  # number of trees picked by CV
    print("AUC Score (Train) is {:.4f}".format(auc))
    # Plot the feature importances
    print(alg.feature_importances_)
    plt.bar(range(len(alg.feature_importances_)), alg.feature_importances_)
    plt.show()
    # Alternative (recommended): rank features by the booster's f-scores
    feat_imp = pd.Series(alg.get_booster().get_fscore()).sort_values(ascending=False)
    feat_imp.plot(kind='bar', title='Feature Importances')
    plt.ylabel('Feature Importance Score')
    plt.show()
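Calling it with a baseline classifier; the parameter values below are the usual tutorial starting points (assumptions, not something fixed by this post):
xgb1 = XGBClassifier(learning_rate=0.1, n_estimators=1000, max_depth=5,
                     min_child_weight=1, gamma=0, subsample=0.8,
                     colsample_bytree=0.8, objective='binary:logistic',
                     scale_pos_weight=1, seed=27)
modelfit(xgb1)  # cross-validates, refits, and plots feature importances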
A few points worth noting:
- alg.set_params(n_estimators=cvresult.shape[0]): xgb.cv returns one row of results per boosting round kept by early stopping, so cvresult.shape[0] is the CV-chosen number of trees; after this call, alg.get_params()['n_estimators'] holds the same value. (In NumPy, .shape[0] is the number of rows and .shape[1] the number of columns; see the shape attribute of numpy.array.)
- dtrain_predprob = alg.predict_proba(X_train)[:, 1]: column 1 is the predicted probability of the positive (second) class; a small predict vs. predict_proba example follows after this list.
- "{:.4f}".format(...) uses Python's format function for string formatting.
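A quick toy illustration of predict vs. predict_proba, on synthetic data purely for demonstration:
from sklearn.datasets import make_classification
Xd, yd = make_classification(n_samples=100, n_features=5, random_state=0)
toy = XGBClassifier(n_estimators=10).fit(Xd, yd)
print(toy.predict(Xd[:3]))        # hard class labels, e.g. [0 1 0]
print(toy.predict_proba(Xd[:3]))  # shape (3, 2); column [:, 1] is P(class 1)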
Using early stopping:
The code is modified slightly below; this part follows https://www.yuque.com/zhaoshijun/md/mtx7ty
import pandas as pd
import numpy as np
import xgboost as xgb
from xgboost.sklearn import XGBClassifier
from sklearn import metrics
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (12.0, 4.0)
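A minimal sketch of the early-stopping variant on load_breast_cancer, assuming the usual eval_set / early_stopping_rounds pattern (the split size and parameter values are illustrative):
data = load_breast_cancer()
X_tr, X_val, y_tr, y_val = train_test_split(data.data, data.target,
                                            test_size=0.2, random_state=27)
clf = XGBClassifier(learning_rate=0.1, n_estimators=1000)
clf.fit(X_tr, y_tr, eval_set=[(X_val, y_val)], eval_metric='auc',
        early_stopping_rounds=50, verbose=False)
print(clf.best_iteration)  # boosting round with the best validation AUC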