XGB Modeling Workflow Code: Adapted and Organized for Business Use

Latest recommended article published on 2023-11-28 09:51:44

数据厂商小伙

Category column: 菜鸟数据建模. Tags: deep learning, machine learning

Article link: https://blog.csdn.net/qq_42457415/article/details/115240710

The methods module is a helper module I use in my day-to-day work; I will publish it later.

# Feature selection
dt, drop_dt = methods.feature_select(dt, 'y', 'app_date', return_drop=True)

# Split the sample by year
dt2019 = methods.data_by_year(dt, 2019)
dt2020 = methods.data_by_year(dt, 2020)

(13428, 50)
(15438, 50)
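The methods module has not been published, so what data_by_year does internally is unknown. The sketch below shows one way such a split could be written with pandas. The function name data_by_year_sketch, the date_col parameter, and the assumption that the application date is stored in a column named app_date are all illustrative guesses, not the author's implementation.

```python
import pandas as pd


def data_by_year_sketch(df, year, date_col="app_date"):
    """Return the rows whose date column falls in the given year.

    Hypothetical stand-in for methods.data_by_year; assumes the date
    lives in a regular column that pandas can parse.
    """
    dates = pd.to_datetime(df[date_col])
    return df[dates.dt.year == year].copy()


# Tiny made-up frame, only to show the call pattern.
demo = pd.DataFrame(
    {"app_date": ["2019-03-01", "2019-11-15", "2020-01-02"], "y": [0, 1, 0]}
)
dt2019_demo = data_by_year_sketch(demo, 2019)
dt2020_demo = data_by_year_sketch(demo, 2020)
print(dt2019_demo.shape)
print(dt2020_demo.shape)
```

Splitting by application year rather than at random is what makes the 2020 data usable as an out-of-time (OOT) sample later on.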

# Split the 2019 sample into training and test sets
from sklearn.model_selection import train_test_split
Xtr, Xts, Ytr, Yts = train_test_split(dt2019.drop('y', axis=1), dt2019['y'], test_size=0.3, random_state=100)

# Candidate parameter values for tuning (one candidate per parameter here)
a_params = {'learning_rate': [0.02],
            'n_estimators': [400],
            'max_depth': [3],
            'min_child_weight': [3],
            'gamma': [4],
            'colsample_bytree': [0.3],
            'subsample': [0.9],
            'reg_lambda': [1],
            'reg_alpha': [1]
            }
methods.get_best_estimator(Xtr, Ytr, b_params=a_params, cv=3)

AUC_mean: [0.62880929]
XGBClassifier(base_score=0.5, booster=None, colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=0.3, gamma=4, gpu_id=-1,
              importance_type='gain', interaction_constraints=None,
              learning_rate=0.02, max_delta_step=0, max_depth=3,
              min_child_weight=3, missing=nan, monotone_constraints=None,
              n_estimators=400, n_jobs=0, num_parallel_tree=1, random_state=0,
              reg_alpha=1, reg_lambda=1, scale_pos_weight=1, subsample=0.9,
              tree_method=None, validate_parameters=False, verbosity=None)
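Since get_best_estimator is not shown, the following is only a guess at its general shape: a cross-validated grid search scored by AUC, which would be consistent with the one-element AUC_mean array printed above. The name get_best_estimator_sketch is hypothetical, and to keep the example runnable with scikit-learn alone it uses GradientBoostingClassifier as a stand-in; an XGBClassifier could presumably be passed in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV


def get_best_estimator_sketch(estimator, X, y, param_grid, cv=3):
    """Grid-search param_grid with k-fold CV, scoring each candidate by ROC AUC.

    Returns the refitted best estimator and the mean CV AUC per candidate.
    """
    search = GridSearchCV(estimator, param_grid, scoring="roc_auc", cv=cv)
    search.fit(X, y)
    auc_mean = search.cv_results_["mean_test_score"]
    print("AUC_mean:", auc_mean)
    return search.best_estimator_, auc_mean


# Synthetic data, purely for illustration.
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=100)

# One value per parameter, so exactly one candidate is cross-validated.
grid = {"learning_rate": [0.02], "n_estimators": [50],
        "max_depth": [3], "subsample": [0.9]}

best_model, auc_mean = get_best_estimator_sketch(
    GradientBoostingClassifier(random_state=0), X_demo, y_demo, grid, cv=3
)
```

With a single value in every list, the search does no real tuning; it simply reports the cross-validated AUC of that one configuration.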

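This prints the mean AUC from the tuning process. You can compare it with the AUC scored on the test set and on the OOT (out-of-time) sample to judge whether the model is stable. It can serve as one of several considerations.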
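As a rough illustration of that comparison, the sketch below takes the AUC figures reported in this post (CV mean, test, and OOT) and computes the gaps against the CV mean. The function name auc_stability and the 0.02 tolerance are assumptions chosen for the example; the post does not specify a threshold.

```python
def auc_stability(cv_auc, test_auc, oot_auc, tolerance=0.02):
    """Return AUC gaps relative to the CV mean and a simple stability flag.

    tolerance is a hypothetical cut-off; pick one that suits the business case.
    """
    gaps = {"test": cv_auc - test_auc, "OOT": cv_auc - oot_auc}
    stable = all(abs(gap) <= tolerance for gap in gaps.values())
    return gaps, stable


# AUC values as reported in this post.
gaps, stable = auc_stability(
    cv_auc=0.62880929,
    test_auc=0.6259647453877316,
    oot_auc=0.6217593281536178,
)
for name, gap in gaps.items():
    print(f"{name}: AUC gap vs CV mean = {gap:.4f}")
print("stable:", stable)
```

Both gaps here are well under one AUC point, which supports the reading that the model generalizes consistently across the three samples.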

# Train the model on the training set with the selected parameters
params = {'learning_rate': 0.02,
          'n_estimators': 400,
          'max_depth': 3,
          'min_child_weight': 3,
          'gamma': 4,
          'colsample_bytree': 0.3,
          'subsample': 0.9,
          'reg_lambda': 1,
          'reg_alpha': 1}
model = methods.to_model(Xtr, Ytr, params, 'train', score=True)

methods.down_model(model, 'XGB_ModelT')  # export the model

# Test-set scores
methods.cal_ks_auc(model, Xts, Yts, 'test')

test
KS: 0.18959549611913964
AUC: 0.6259647453877316

# OOT scores
methods.cal_ks_auc(model, dt2020.drop('y', axis=1), dt2020['y'], 'OOT')

OOT
KS: 0.18594045740912735
AUC: 0.6217593281536178
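The code of cal_ks_auc is not included in the post. The sketch below shows how KS and AUC are conventionally computed from labels and predicted scores: both come from the same sweep over score thresholds, with KS as the largest gap between the true-positive and false-positive rates and AUC as the area under the resulting ROC curve. The function name ks_auc is hypothetical, and the implementation is pure Python for clarity rather than speed.

```python
def ks_auc(y_true, scores):
    """Compute (KS, AUC) for binary labels (0/1) and predicted scores."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    # Sweep thresholds from the highest score downwards.
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    ks = auc = 0.0
    i = 0
    while i < len(pairs):
        # Handle all samples sharing the same score as one threshold step.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        tpr, fpr = tp / n_pos, fp / n_neg
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2  # trapezoid rule
        ks = max(ks, abs(tpr - fpr))
        prev_tpr, prev_fpr = tpr, fpr
        i = j
    return ks, auc


ks_demo, auc_demo = ks_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print("KS:", ks_demo)
print("AUC:", auc_demo)
```

In practice the scores would be model.predict_proba(X)[:, 1], as used for the score distributions in this post.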

# Score distributions for the training set, test set, and OOT sample
s_tr = model.predict_proba(Xtr)[:, 1]
s_ts = model.predict_proba(Xts)[:, 1]
s_OOT = model.predict_proba(dt2020.drop('y', axis=1))[:, 1]

methods.score_distribution_plots('score_dis', s_tr, s_ts, s_OOT, 'train', 'test', 'OOT', three=True)

[Figure: score distributions for the train, test, and OOT samples]

# PSI of the OOT scores (actual) against the training scores (expected)
print(methods.psi_for_continue_var(s_tr, s_OOT, detail=True))

    acture_score_range  expecteds  expected(%)  actucals  actucal(%)  ac - ex(%)  ln(ac/ex)       psi  max
0      [0.0578,0.1075]     1616.0    17.193318    2065.0       13.38       -3.82  -0.250743  0.009583
1      (0.1075,0.1572]     2914.0    31.003298    5936.0       38.45        7.45   0.215259  0.016033
2      (0.1572,0.2068]     2504.0    26.641132    4599.0       29.79        3.15   0.111712  0.003518
3      (0.2068,0.2565]     1348.0    14.341951    2080.0       13.47       -0.87  -0.062719  0.000543
4      (0.2565,0.3062]      629.0     6.692201     584.0        3.78       -2.91  -0.571104  0.016597
5      (0.3062,0.3558]      267.0     2.840728     142.0        0.92       -1.92  -1.126707  0.021661  <<<<<<<
6      (0.3558,0.4055]       89.0     0.946909      24.0        0.16       -0.79  -1.772854  0.014300
7      (0.4055,0.4552]       23.0     0.244707       8.0        0.05       -0.19  -1.572314  0.002994
8      (0.4552,0.5048]        6.0     0.063837       0.0        0.00       -0.06  -4.171870  0.000162
9      (0.5048,0.5545]        3.0     0.031918       0.0        0.00       -0.03  -3.494028  0.000777
10         >>> summary     9399.0   100.000000   15438.0      100.00         NaN        NaN  0.086168  <<< result
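The per-bin psi column follows the standard formula: the sum over bins of (actual share - expected share) * ln(actual share / expected share). The sketch below applies it to the bin counts from the table. The function name psi_from_counts and the eps floor for empty bins are assumptions; the author's psi_for_continue_var evidently treats the two empty OOT bins differently, so the total from this sketch will not exactly match the 0.086168 reported above.

```python
import math


def psi_from_counts(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index from per-bin counts.

    eps is a hypothetical floor applied to bin shares so that empty bins
    do not produce log(0); other conventions exist.
    """
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    psi = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, eps)
        a_pct = max(a / a_total, eps)
        psi += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return psi


# Bin counts copied from the table: train scores (expected) vs OOT scores (actual).
expected_counts = [1616, 2914, 2504, 1348, 629, 267, 89, 23, 6, 3]
actual_counts = [2065, 5936, 4599, 2080, 584, 142, 24, 8, 0, 0]

psi_table = psi_from_counts(expected_counts, actual_counts)
print("PSI:", round(psi_table, 6))
```

A commonly quoted rule of thumb treats a PSI below 0.1 as a stable score distribution; by that yardstick the reported 0.086168 suggests only a mild shift between the training and OOT scores.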