A Scoring Model End to End: Development, Deployment, Testing, and Documentation
The full code is on GitHub:
https://github.com/chengsong990020186/xgboost_score_model/tree/master
If this helps your learning, please give it a like~
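Main contents:
- 1. Train the model with xgboost and save it.
- 2. Build a real-time API endpoint with the Flask framework and deploy it.
- 3. Test the API endpoint.
- 4. API documentation.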
1. Train the model with xgboost and save it.
The training data has been uploaded to GitHub and can be downloaded from there.
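The script that follows deletes a column named `Unnamed: 0` right after reading the CSV. pandas generates that name when the first header cell of the file is empty, which usually means a row index was saved along with the data. The sketch below reproduces this on a tiny in-memory CSV (made-up rows, used only for illustration) and shows that passing `index_col=0` is one alternative to deleting the column afterwards.

```python
import io

import pandas as pd

# Tiny stand-in for the real file (assumption): the first header cell is empty.
csv_text = ",y,age\n1,1,45\n2,0,40\n"

# Default read: the unnamed first column becomes 'Unnamed: 0'.
plain = pd.read_csv(io.StringIO(csv_text))

# Treating the first column as the index keeps it out of the columns.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(plain.columns))
print(list(indexed.columns))
```

Either approach leaves the same data columns; the original `del` statement works just as well.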
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23; use the standalone joblib package
import logging
data=pd.read_csv('C:\\Users\\HP\\Desktop\\give me some credit\\data\\cs-training.csv')
del data['Unnamed: 0']
# Feature columns, in the same order as in the CSV
feature_names=['RevolvingUtilizationOfUnsecuredLines', 'age',
'NumberOfTime30-59DaysPastDueNotWorse', 'DebtRatio', 'MonthlyIncome',
'NumberOfOpenCreditLinesAndLoans', 'NumberOfTimes90DaysLate',
'NumberRealEstateLoansOrLines', 'NumberOfTime60-89DaysPastDueNotWorse',
'NumberOfDependents']
# Rename the target column to 'y' and keep the feature names
data.columns=['y']+feature_names
data_x=data[feature_names]
data_y=data['y']
train_x, test_x, train_y, test_y = train_test_split(data_x.values, data_y.values, test_size=0.2,random_state=1234)
d_train = xgb.DMatrix(train_x, label=train_y)
d_valid = xgb.DMatrix(test_x, label=test_y)
watchlist = [(d_train, 'train'), (d_valid, 'valid')]
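# Parameter settings
params={
'eta': 0.2, # learning rate (step-size shrinkage), range 0-1; final values are typically 0.01-0.2
'max_depth':3, # maximum tree depth; typical values are 3-10
'min_child_weight':1, # minimum sum of instance weights (hessian) required in a child node; larger values help prevent overfitting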
'gamma':0.3,
'subsam