Building a credit scoring model in one line of code (Python)

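Contents

  • Installation
  • Model training
  • Model results
    • Scorecard
    • Variable types and IV values
    • Cut points of the continuous variables
  • Model performance check
  • Prediction
  • Supplement: model tuning
    • Variable IV values and binning analysis
    • Model performance analysis
  • Supplement: package parameters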

Installation

The package has been published on PyPI; see the CreditScoreModel package page there.

For first-time use, install it with:

pip install CreditScoreModel

Model training

import pandas as pd
from numpy import inf  # used by the manual cut points later in this article
from CreditScoreModel.LogisticScoreCard import *

# Training data and the data to be scored
data=pd.read_csv('C:\\Users\\HP\\Desktop\\give me some credit\\data\\cs-training.csv')
data_predict=pd.read_csv('C:\\Users\\HP\\Desktop\\give me some credit\\data\\cs-test.csv')
del data['Unnamed: 0']  # drop the row-id column
# The target column is renamed to 'y'
data.columns=['y','RevolvingUtilizationOfUnsecuredLines', 'age','NumberOfTime30-59DaysPastDueNotWorse', 'DebtRatio', 'MonthlyIncome','NumberOfOpenCreditLinesAndLoans', 'NumberOfTimes90DaysLate','NumberRealEstateLoansOrLines', 'NumberOfTime60-89DaysPastDueNotWorse','NumberOfDependents']
del data_predict['Unnamed: 0']
ls=logistic_score_card()
data_train, data_test = ls.get_data_train_test(data,test_size=0.25,random_state=1234)  # 75% train / 25% test
ls.fit(data_train)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\utils\fixes.py:313: FutureWarning: numpy not_equal will not check object identity in the future. The comparison did not return the same result as suggested by the identity (`is`)) and will change.
  _nan_object_mask = _nan_object_array != _nan_object_array
2019 16:15:06 INFO Task started...
2019 16:15:06 INFO Splitting continuous and discrete variables...
2019 16:15:06 INFO Splitting continuous and discrete variables: done!
2019 16:15:06 INFO Optimal binning of continuous variables...
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 23.53it/s]
2019 16:15:06 INFO Optimal binning of continuous variables: done!
2019 16:15:06 INFO Discretizing continuous variables by cut points...
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:04<00:00,  2.38it/s]
2019 16:15:11 INFO Discretizing continuous variables by cut points: done!
2019 16:15:11 INFO Computing IV values...
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 37.45it/s]
2019 16:15:11 INFO Computing IV values: done!
2019 16:15:11 INFO WOE transformation...
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 11.36it/s]
2019 16:15:12 INFO WOE transformation: done!
2019 16:15:12 INFO Selecting variables with IV > 0.1 and correlation < 0.6, plus L1 regularization...
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py:432: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
  FutureWarning)
2019 16:15:12 INFO Variable selection done: 10 variables in total, 5 selected
2019 16:15:12 INFO Building scorecard...
2019 16:15:12 INFO Splitting continuous and discrete variables...
2019 16:15:12 INFO Splitting continuous and discrete variables: done!
2019 16:15:12 INFO Discretizing continuous variables by cut points...
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:02<00:00,  2.36it/s]
2019 16:15:14 INFO Discretizing continuous variables by cut points: done!
2019 16:15:14 INFO WOE transformation...
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 11.81it/s]
2019 16:15:15 INFO WOE transformation: done!
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py:432: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
  FutureWarning)
2019 16:15:18 INFO Building scorecard: done!
2019 16:15:18 INFO Task complete!
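The log above lists what fit() does: split the variables, bin the continuous ones, compute IV, apply the WOE transformation, select variables, and build the scorecard. The WOE/IV step can be sketched as follows. This is a minimal illustration assuming the usual definitions WOE = ln(share of y=1 / share of y=0) and IV = Σ (share of y=1 - share of y=0) × WOE, not the package's own code; `woe_iv_from_counts` is a hypothetical helper. The counts are the two NumberOfTimes90DaysLate bins from the scorecard table in the next section.

```python
import numpy as np

def woe_iv_from_counts(n1, n0):
    """Hypothetical helper: WOE and IV for one binned variable.

    n1[i] / n0[i] are the numbers of y=1 / y=0 records in bin i.
    """
    n1 = np.asarray(n1, dtype=float)
    n0 = np.asarray(n0, dtype=float)
    pct1 = n1 / n1.sum()           # share of all y=1 records falling in each bin
    pct0 = n0 / n0.sum()           # share of all y=0 records falling in each bin
    woe = np.log(pct1 / pct0)      # positive WOE: bin is riskier than average
    iv_bins = (pct1 - pct0) * woe  # per-bin IV contribution, never negative
    return woe, iv_bins, iv_bins.sum()

# NumberOfTimes90DaysLate bins (-inf, 0.0] and (0.0, inf] on the training set
woe, iv_bins, total_iv = woe_iv_from_counts([4953, 2600], [101333, 3614])
print(np.round(woe, 6), round(float(total_iv), 6))
```

With these counts the result should agree with the table's woe values (-0.386908, 2.302207) and total_iv (0.833081) to about six decimals.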

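Model results

Scorecard

Meaning of each column of ls.score_card (translated from the author's Chinese labels, in column order):

  • col: variable name
  • type: variable type
  • cuts: cut points of the variable
  • cut_points: the bin (interval)
  • 1_num: number of records with y=1 in the bin
  • 0_num: number of records with y=0 in the bin
  • total_num: total number of records in the bin
  • 1_pct: the bin's share of all y=1 records
  • 0_pct: the bin's share of all y=0 records
  • total_pct: the bin's share of all records
  • 1_rate: proportion of y=1 within the bin
  • woe: weight of evidence of the bin
  • iv: the bin's contribution to the IV
  • total_iv: IV of the variable
  • col_coef: logistic regression coefficient of the variable
  • lr_intercept: logistic regression intercept
  • score: points assigned to the bin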

ls.score_card
    col                                   type        cuts                           cut_points    1_num  0_num   total_num  1_pct     0_pct     total_pct  1_rate    woe        iv        total_iv  col_coef  lr_intercept  score
0   NumberOfTime30-59DaysPastDueNotWorse  continuous  [-inf, 0.0, 1.0, inf]          (-inf, 0.0]   3803   90648   94451      0.503509  0.863750  0.839564   0.040264  -0.539683  0.194416  0.737079  0.538172  -2.598254     20.950981
1   NumberOfTime30-59DaysPastDueNotWorse  continuous  [-inf, 0.0, 1.0, inf]          (0.0, 1.0]    1801   10249   12050      0.238448  0.097659  0.107111   0.149461  0.892673   0.125679  0.737079  0.538172  -2.598254     -34.654352
2   NumberOfTime30-59DaysPastDueNotWorse  continuous  [-inf, 0.0, 1.0, inf]          (1.0, inf]    1949   4050    5999       0.258043  0.038591  0.053324   0.324887  1.900110   0.416983  0.737079  0.538172  -2.598254     -73.763987
3   NumberOfTime60-89DaysPastDueNotWorse  continuous  [-inf, 0.0, inf]               (-inf, 0.0]   5469   101314  106783     0.724083  0.965383  0.949182   0.051216  -0.287618  0.069402  0.570277  0.403517  -2.598254     8.371879
4   NumberOfTime60-89DaysPastDueNotWorse  continuous  [-inf, 0.0, inf]               (0.0, inf]    2084   3633    5717       0.275917  0.034617  0.050818   0.364527  2.075741   0.500875  0.570277  0.403517  -2.598254     -60.419867
5   NumberOfTimes90DaysLate               continuous  [-inf, 0.0, inf]               (-inf, 0.0]   4953   101333  106286     0.655766  0.965564  0.944764   0.046601  -0.386908  0.119863  0.833081  0.528354  -2.598254     14.746114
6   NumberOfTimes90DaysLate               continuous  [-inf, 0.0, inf]               (0.0, inf]    2600   3614    6214       0.344234  0.034436  0.055236   0.418410  2.302207   0.713218  0.833081  0.528354  -2.598254     -87.743345
7   RevolvingUtilizationOfUnsecuredLines  continuous  [-inf, 0.22, 0.49, 0.86, inf]  (-inf, 0.22]  1320   62341   63661      0.174765  0.594024  0.565876   0.020735  -1.223477  0.512953  1.071693  0.643205  -2.598254     56.766152
8   RevolvingUtilizationOfUnsecuredLines  continuous  [-inf, 0.22, 0.49, 0.86, inf]  (0.22, 0.49]  916    16760   17676      0.121276  0.159700  0.157120   0.051822  -0.275223  0.010575  1.071693  0.643205  -2.598254     12.769650
9   RevolvingUtilizationOfUnsecuredLines  continuous  [-inf, 0.22, 0.49, 0.86, inf]  (0.49, 0.86]  1695   13027   14722      0.224414  0.124129  0.130862   0.115134  0.592169   0.059386  1.071693  0.643205  -2.598254     -27.475114
10  RevolvingUtilizationOfUnsecuredLines  continuous  [-inf, 0.22, 0.49, 0.86, inf]  (0.86, inf]   3622   12819   16441      0.479545  0.122147  0.146142   0.220303  1.367609   0.488779  1.071693  0.643205  -2.598254     -63.453483
11  age                                   continuous  [-inf, 35.0, 55.0, 62.0, inf]  (-inf, 35.0]  1801   14288   16089      0.238448  0.136145  0.143013   0.111940  0.560433   0.057334  0.239843  0.462719  -2.598254     -18.706199
12  age                                   continuous  [-inf, 35.0, 55.0, 62.0, inf]  (35.0, 55.0]  4061   45818   49879      0.537667  0.436582  0.443369   0.081417  0.208263   0.021052  0.239843  0.462719  -2.598254     -6.951426
13  age                                   continuous  [-inf, 35.0, 55.0, 62.0, inf]  (55.0, 62.0]  898    17050   17948      0.118893  0.162463  0.159538   0.050033  -0.312225  0.013604  0.239843  0.462719  -2.598254     10.421482
14  age                                   continuous  [-inf, 35.0, 55.0, 62.0, inf]  (62.0, inf]   793    27791   28584      0.104991  0.264810  0.254080   0.027743  -0.925134  0.147853  0.239843  0.462719  -2.598254     30.879240
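Every score value in the table above is consistent with score = -B × col_coef × woe, where B = 50 / ln 2 ≈ 72.13, i.e. a scaling of 50 points per doubling of the odds (PDO = 50). This relationship was inferred by checking the rows against each other, not taken from the package documentation, so treat PDO = 50 as an assumption. A minimal check on three rows:

```python
import math

PDO = 50                    # assumed points-to-double-odds, inferred from the table
B = PDO / math.log(2)       # scaling factor, about 72.13

def bin_score(col_coef, woe):
    """Points for falling into one bin; a higher score means lower risk."""
    return -B * col_coef * woe

# (col_coef, woe, score) copied from three rows of ls.score_card
rows = [
    (0.538172, -0.539683, 20.950981),   # NumberOfTime30-59DaysPastDueNotWorse, (-inf, 0.0]
    (0.528354, 2.302207, -87.743345),   # NumberOfTimes90DaysLate, (0.0, inf]
    (0.462719, -0.925134, 30.879240),   # age, (62.0, inf]
]
for coef, woe, expected in rows:
    print(round(bin_score(coef, woe), 4), expected)
```

Positive WOE (a riskier bin) therefore always yields negative points, because every coefficient in the table is positive.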

Variable types and IV values

ls.col_type_iv
    col                                   type        iv
0   RevolvingUtilizationOfUnsecuredLines  continuous  1.071693
1   age                                   continuous  0.239843
2   NumberOfTime30-59DaysPastDueNotWorse  continuous  0.737079
3   DebtRatio                             continuous  0.069471
4   MonthlyIncome                         continuous  0.076410
5   NumberOfOpenCreditLinesAndLoans       continuous  0.073217
6   NumberOfTimes90DaysLate               continuous  0.833081
7   NumberRealEstateLoansOrLines          continuous  0.055378
8   NumberOfTime60-89DaysPastDueNotWorse  continuous  0.570277
9   NumberOfDependents                    continuous  0.031616
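The training log says variables are kept when IV > 0.1 and correlation < 0.6, followed by L1 regularization. Applying only the IV threshold to the table above already leaves exactly the five variables that appear in the scorecard. The sketch below reproduces just that filter; the correlation and L1 steps are not reproduced here.

```python
# IV values copied from ls.col_type_iv
iv = {
    "RevolvingUtilizationOfUnsecuredLines": 1.071693,
    "age": 0.239843,
    "NumberOfTime30-59DaysPastDueNotWorse": 0.737079,
    "DebtRatio": 0.069471,
    "MonthlyIncome": 0.076410,
    "NumberOfOpenCreditLinesAndLoans": 0.073217,
    "NumberOfTimes90DaysLate": 0.833081,
    "NumberRealEstateLoansOrLines": 0.055378,
    "NumberOfTime60-89DaysPastDueNotWorse": 0.570277,
    "NumberOfDependents": 0.031616,
}

IV_THRESHOLD = 0.1  # threshold reported in the training log

# Keep variables above the threshold, strongest first
selected = sorted((c for c, v in iv.items() if v > IV_THRESHOLD), key=iv.get, reverse=True)
print(selected)
```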

Cut points of the continuous variables

ls.col_continuous_cut_points
[['RevolvingUtilizationOfUnsecuredLines', [-inf, 0.22, 0.49, 0.86, inf]],
 ['age', [-inf, 35.0, 55.0, 62.0, inf]],
 ['NumberOfTime30-59DaysPastDueNotWorse', [-inf, 0.0, 1.0, inf]],
 ['DebtRatio', [-inf, 0.41, 0.67, 2.66, inf]],
 ['MonthlyIncome', [-inf, 1297.0, 4838.0, 6596.0, inf]],
 ['NumberOfOpenCreditLinesAndLoans', [-inf, 2.0, 3.0, 13.0, inf]],
 ['NumberOfTimes90DaysLate', [-inf, 0.0, inf]],
 ['NumberRealEstateLoansOrLines', [-inf, 0.0, 1.0, 2.0, inf]],
 ['NumberOfTime60-89DaysPastDueNotWorse', [-inf, 0.0, inf]],
 ['NumberOfDependents', [-inf, 0.0, 1.0, 2.0, inf]]]
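The cut points are bin edges, and the scorecard bins such as (-inf, 0.22] are right-closed intervals. pandas.cut builds the same kind of intervals by default (right=True), so mapping a raw value to its scorecard bin can be sketched as below. This only illustrates the idea; the package may discretize differently internally.

```python
import numpy as np
import pandas as pd

# Cut points for RevolvingUtilizationOfUnsecuredLines, from ls.col_continuous_cut_points
cuts = [-np.inf, 0.22, 0.49, 0.86, np.inf]

values = pd.Series([0.10, 0.22, 0.30, 0.95])
bins = pd.cut(values, bins=cuts)   # right=True: each bin is (left, right]

print(bins.astype(str).tolist())
print(bins.cat.codes.tolist())     # bin index 0..3 for each value
```

Note that a value exactly on an edge, such as 0.22, falls into the lower bin (-inf, 0.22].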

Model performance check

ls.plot_roc_ks(data_train,ls.score_card)
2019 16:15:18 INFO Predicting user scores...
2019 16:15:18 INFO Discretizing continuous variables by cut points...
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:02<00:00,  2.37it/s]
2019 16:15:20 INFO Discretizing continuous variables by cut points: done!

(Figure: ROC/KS plot on the training set)

ls.plot_roc_ks(data_test,ls.score_card)
2019 16:15:21 INFO Predicting user scores...
2019 16:15:21 INFO Discretizing continuous variables by cut points...
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00,  6.97it/s]
2019 16:15:22 INFO Discretizing continuous variables by cut points: done!

(Figure: ROC/KS plot on the test set)
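Judging by its name, plot_roc_ks plots the ROC curve and the KS statistic. The two summary numbers can be computed with scikit-learn as sketched below, assuming y = 1 marks a default. KS is the largest gap between the true-positive and false-positive rates. Because a higher scorecard score means lower risk, the score is negated before being passed in. The four scores are taken from the prediction output below, but the labels are hypothetical toy values.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def ks_auc(y_true, risk):
    """KS and AUC for a risk measure where larger means more likely y=1."""
    fpr, tpr, _ = roc_curve(y_true, risk)
    ks = float(np.max(tpr - fpr))          # largest separation of the two cumulative rates
    auc = float(roc_auc_score(y_true, risk))
    return ks, auc

# Toy example: four scores with hypothetical labels
y = np.array([0, 0, 1, 1])
score = np.array([731.7, 693.9, 496.3, 382.8])
print(ks_auc(y, -score))   # negate: a higher score means lower risk
```

Passing the predicted probability (the proba column) instead of -score gives the same curves, since the two are monotonically related.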

Prediction

ls.predict_score_proba(data_test,ls.score_card)
2019 16:15:22 INFO Predicting user scores...
2019 16:15:22 INFO Discretizing continuous variables by cut points...
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00,  6.99it/s]
2019 16:15:23 INFO Discretizing continuous variables by cut points: done!
       NumberOfTime30-59DaysPastDueNotWorsescore  NumberOfTime60-89DaysPastDueNotWorsescore  NumberOfTimes90DaysLatescore  RevolvingUtilizationOfUnsecuredLinesscore  agescore  score  proba
0        20.950981   8.371879  14.746114  56.766152  30.879240  731.714366  0.011842
1        20.950981   8.371879  14.746114 -63.453483 -18.706199  561.909292  0.112027
2        20.950981   8.371879  14.746114  56.766152  -6.951426  693.883700  0.019845
3       -34.654352 -60.419867 -87.743345 -27.475114  -6.951426  382.755896  0.601900
4       -73.763987   8.371879  14.746114 -63.453483  10.421482  496.322007  0.238491
5        20.950981   8.371879  14.746114  56.766152  30.879240  731.714366  0.011842
6        20.950981   8.371879 -87.743345 -63.453483  -6.951426  471.174606  0.307389
7        20.950981   8.371879  14.746114  12.769650  -6.951426  649.887198  0.035921
8        20.950981   8.371879  14.746114  56.766152  10.421482  711.256608  0.015664
9        20.950981   8.371879  14.746114 -27.475114  -6.951426  609.642434  0.061116
10       20.950981   8.371879  14.746114 -27.475114  -6.951426  609.642434  0.061116
11       20.950981   8.371879  14.746114  56.766152  -6.951426  693.883700  0.019845
12       20.950981   8.371879  14.746114  56.766152  30.879240  731.714366  0.011842
13       20.950981   8.371879  14.746114  56.766152  -6.951426  693.883700  0.019845
14      -34.654352   8.371879  14.746114  56.766152  10.421482  655.651276  0.033255
15       20.950981   8.371879  14.746114 -63.453483  10.421482  591.036974  0.077701
16       20.950981   8.371879  14.746114 -27.475114  -6.951426  609.642434  0.061116
17       20.950981   8.371879  14.746114  56.766152  30.879240  731.714366  0.011842
18       20.950981   8.371879 -87.743345 -63.453483 -18.706199  459.419833  0.343125
19       20.950981   8.371879  14.746114  56.766152  -6.951426  693.883700  0.019845
20       20.950981   8.371879  14.746114  56.766152  -6.951426  693.883700  0.019845
21       20.950981   8.371879 -87.743345 -63.453483  -6.951426  471.174606  0.307389
22      -34.654352   8.371879 -87.743345  56.766152  30.879240  573.619574  0.096866
23       20.950981   8.371879  14.746114  56.766152  -6.951426  693.883700  0.019845
24       20.950981   8.371879  14.746114  12.769650  -6.951426  649.887198  0.035921
25      -73.763987   8.371879  14.746114  56.766152  30.879240  636.999398  0.042649
26       20.950981   8.371879  14.746114  56.766152  30.879240  731.714366  0.011842
27       20.950981   8.371879  14.746114 -63.453483  30.879240  611.494731  0.059659
28       20.950981   8.371879  14.746114  12.769650  10.421482  667.260107  0.028452
29       20.950981   8.371879  14.746114  56.766152  30.879240  731.714366  0.011842
...            ...        ...        ...        ...        ...         ...       ...
37470    20.950981   8.371879  14.746114  12.769650  -6.951426  649.887198  0.035921
37471   -34.654352   8.371879  14.746114  12.769650  -6.951426  594.281865  0.074538
37472    20.950981   8.371879 -87.743345 -27.475114  -6.951426  507.152975  0.212299
37473   -73.763987 -60.419867 -87.743345  12.769650 -18.706199  372.136252  0.636593
37474    20.950981   8.371879  14.746114  56.766152  10.421482  711.256608  0.015664
37475    20.950981   8.371879  14.746114  56.766152 -18.706199  682.128926  0.023276
37476    20.950981   8.371879  14.746114 -63.453483  -6.951426  573.664065  0.096812
37477    20.950981   8.371879  14.746114  56.766152 -18.706199  682.128926  0.023276
37478    20.950981   8.371879  14.746114  56.766152  10.421482  711.256608  0.015664
37479    20.950981   8.371879  14.746114 -27.475114  -6.951426  609.642434  0.061116
37480    20.950981   8.371879  14.746114  56.766152  -6.951426  693.883700  0.019845
37481   -34.654352   8.371879 -87.743345 -63.453483  -6.951426  415.569274  0.489626
37482    20.950981   8.371879  14.746114  56.766152  -6.951426  693.883700  0.019845
37483   -73.763987   8.371879  14.746114  56.766152 -18.706199  587.413959  0.081378
37484    20.950981   8.371879  14.746114  56.766152  30.879240  731.714366  0.011842
37485    20.950981   8.371879  14.746114 -63.453483 -18.706199  561.909292  0.112027
37486    20.950981   8.371879  14.746114  56.766152 -18.706199  682.128926  0.023276
37487   -73.763987 -60.419867  14.746114 -27.475114  -6.951426  446.135720  0.385743
37488    20.950981   8.371879  14.746114  56.766152 -18.706199  682.128926  0.023276
37489    20.950981   8.371879 -87.743345 -63.453483  -6.951426  471.174606  0.307389
37490    20.950981   8.371879  14.746114  56.766152  30.879240  731.714366  0.011842
37491    20.950981   8.371879  14.746114  56.766152  -6.951426  693.883700  0.019845
37492    20.950981   8.371879  14.746114  56.766152  30.879240  731.714366  0.011842
37493   -34.654352   8.371879  14.746114  56.766152  -6.951426  638.278367  0.041931
37494    20.950981   8.371879  14.746114 -27.475114  10.421482  627.015343  0.048671
37495    20.950981   8.371879  14.746114  12.769650  30.879240  687.717864  0.021578
37496   -34.654352   8.371879  14.746114  56.766152  -6.951426  638.278367  0.041931
37497    20.950981   8.371879  14.746114  12.769650  -6.951426  649.887198  0.035921
37498    20.950981   8.371879  14.746114  56.766152  30.879240  731.714366  0.011842
37499    20.950981   8.371879  14.746114 -27.475114  30.879240  647.473100  0.037099

37500 rows × 7 columns
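Each row's score equals 600 plus the sum of its five bin scores, and proba is consistent with a logistic transform in which a score of 600 corresponds to the intercept's log-odds (-2.598254) and every additional 50 points halves the odds of default (B = 50 / ln 2). These constants were inferred by checking rows of the tables above against each other, not read from the package, so the base score of 600 and PDO of 50 are assumptions in the sketch below.

```python
import math

BASE = 600                     # assumed base score (inferred from the table)
PDO = 50                       # assumed points to double the odds
B = PDO / math.log(2)
INTERCEPT = -2.598254          # lr_intercept from ls.score_card

def total_score(bin_scores):
    """Total score = base score + sum of the per-variable bin scores."""
    return BASE + sum(bin_scores)

def score_to_proba(score):
    """Default probability implied by a total score."""
    log_odds = INTERCEPT - (score - BASE) / B
    return 1.0 / (1.0 + math.exp(-log_odds))

# Row 3 of the prediction output
s = total_score([-34.654352, -60.419867, -87.743345, -27.475114, -6.951426])
print(round(s, 6), round(score_to_proba(s), 6))
```

Under these assumptions, row 3 reproduces the table's score of 382.755896 and proba of about 0.6019.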

Supplement: model tuning

Variable IV values and binning analysis

# Default: decision-tree binning
ls.plot_col_woe_iv(data,'age') 

(Figure: plot_col_woe_iv output for age, default decision-tree bins)

   cut_points    cut_points_id  1_num  0_num   total_num  1_pct     0_pct     total_pct  1_rate    woe        iv        total_iv
1  (-inf, 36.0]  0              2628   21237   23865      0.262118  0.151721  0.159100   0.110119  0.546753   0.060360  0.250005
0  (36.0, 55.0]  1              5177   58953   64130      0.516357  0.421171  0.427533   0.080727  0.203760   0.019395  0.250005
3  (55.0, 63.0]  2              1345   26409   27754      0.134151  0.188671  0.185027   0.048461  -0.341036  0.018593  0.250005
2  (63.0, inf]   3              876    33375   34251      0.087373  0.238437  0.228340   0.025576  -1.003921  0.151657  0.250005
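The default binning is decision-tree binning, and the constructor parameters listed at the end of this article (max_depth, max_leaf_nodes=4, ...) have the same names as those of scikit-learn's DecisionTreeClassifier. The idea can be sketched as fitting a small tree on a single variable and using its split thresholds as cut points. `tree_cut_points` is a hypothetical helper and the min_samples_leaf value is an assumption; the package may round or post-process thresholds differently (scikit-learn thresholds are midpoints between observed values, while the cut points above are whole numbers).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tree_cut_points(x, y, max_leaf_nodes=4, min_samples_leaf=0.05):
    """Hypothetical helper: bin edges for one variable from a shallow decision tree."""
    tree = DecisionTreeClassifier(max_leaf_nodes=max_leaf_nodes,
                                  min_samples_leaf=min_samples_leaf,  # fraction of samples per leaf
                                  random_state=0)
    tree.fit(np.asarray(x, dtype=float).reshape(-1, 1), y)
    is_split = tree.tree_.feature >= 0   # leaves carry a negative feature id
    thresholds = sorted({round(float(t), 2) for t in tree.tree_.threshold[is_split]})
    return [-np.inf] + thresholds + [np.inf]

# Synthetic example: default rate 30% below x = 30 and 5% above it
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, 5000)
y = (rng.uniform(size=5000) < np.where(x < 30, 0.30, 0.05)).astype(int)
cuts = tree_cut_points(x, y)
print(cuts)
```

With max_leaf_nodes=4 the tree makes at most three splits, so each variable gets at most four bins, matching the four age bins above.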
# Manual binning
ls.plot_col_woe_iv(data,'age',[-inf,20,30,40,inf])
C:\ProgramData\Anaconda3\lib\site-packages\CreditScoreModel\LogisticScoreCard.py:152: RuntimeWarning: divide by zero encountered in log
  result['woe'] = np.log(result['1_pct'] / result['0_pct'])  # WOE

(Figure: plot_col_woe_iv output for age, manual bins [-inf, 20, 30, 40, inf])

   cut_points    cut_points_id  1_num  0_num   total_num  1_pct     0_pct     total_pct  1_rate    woe        iv        total_iv
3  (-inf, 20.0]  0              0      1       1          0.000000  0.000007  0.000007   0.000000  0.000000   0.000000  0.0
2  (20.0, 30.0]  1              1244   9513    10757      0.124077  0.067963  0.071713   0.115646  0.601948   0.033778  0.0
1  (30.0, 40.0]  2              2390   21949   24339      0.238380  0.156808  0.162260   0.098196  0.418847   0.034166  0.0
0  (40.0, inf]   3              6392   108511  114903     0.637542  0.775223  0.766020   0.055630  -0.195529  0.026921  0.0
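The divide-by-zero warning comes from the first manual bin, (-inf, 20.0], which holds a single record and no y=1 records: its 1_pct is 0, so np.log(0) returns -inf. The table then shows 0 for that bin's woe and iv and a total_iv of 0.0, even though the other three bins have positive IV, so hand-chosen cut points should avoid bins that lack one of the two classes (for example by merging that bin into its neighbour). Another common workaround, sketched below, adds a small constant to every count before computing WOE; the smoothing constant is an assumption of this sketch, not something the package is known to do.

```python
import numpy as np

def smoothed_woe_iv(n1, n0, alpha=0.5):
    """WOE/IV with additive smoothing so that empty bins stay finite (alpha is an assumed constant)."""
    n1 = np.asarray(n1, dtype=float) + alpha
    n0 = np.asarray(n0, dtype=float) + alpha
    pct1, pct0 = n1 / n1.sum(), n0 / n0.sum()
    woe = np.log(pct1 / pct0)
    return woe, float(((pct1 - pct0) * woe).sum())

# Manual age bins (-inf, 20], (20, 30], (30, 40], (40, inf] on the full data set
n1 = [0, 1244, 2390, 6392]
n0 = [1, 9513, 21949, 108511]
woe, total_iv = smoothed_woe_iv(n1, n0)
print(np.round(woe, 4), round(total_iv, 4))
```

Here the smoothed total IV lands close to the sum of the three non-empty bins' IVs in the table (about 0.095), instead of 0.0.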
# Plot only, without returning the underlying data
ls.plot_col_woe_iv(data,'age',[-inf,20,30,40,inf],return_data=False)
C:\ProgramData\Anaconda3\lib\site-packages\CreditScoreModel\LogisticScoreCard.py:152: RuntimeWarning: divide by zero encountered in log
  result['woe'] = np.log(result['1_pct'] / result['0_pct'])  # WOE

(Figure: plot_col_woe_iv output for age, manual bins, plot only)

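Model performance analysis

# Results with the default parameters
col_result=ls.col_result
# Keep only the cut points of the selected variables
col_continuous_cut_points=[col for col in ls.col_continuous_cut_points if col[0] in ls.col_result]
data_new=data_train[ls.col_result+['y']]
score_card=ls.get_logistic_socre_card(data_new,col_continuous_cut_points)
ls.plot_roc_ks(data_new,score_card)
2019 16:15:35 INFO Building scorecard...
2019 16:15:35 INFO Splitting continuous and discrete variables...
2019 16:15:35 INFO Splitting continuous and discrete variables: done!
2019 16:15:35 INFO Discretizing continuous variables by cut points...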
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:02<00:00,  2.50it/s]
2019 16:15:38 INFO Discretizing continuous variables by cut points: done!
2019 16:15:38 INFO WOE transformation...
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 11.89it/s]
2019 16:15:38 INFO WOE transformation: done!
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py:432: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
  FutureWarning)
2019 16:15:40 INFO Building scorecard: done!
2019 16:15:40 INFO Predicting user scores...
2019 16:15:40 INFO Discretizing continuous variables by cut points...
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:02<00:00,  2.41it/s]
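2019 16:15:43 INFO Discretizing continuous variables by cut points: done!

(Figure: ROC/KS plot of the default five-variable scorecard on the training set)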

# For example: keep all variables and bin them with the cut points below
col_result=['y', 'RevolvingUtilizationOfUnsecuredLines', 'age','NumberOfTime30-59DaysPastDueNotWorse', 'DebtRatio', 'MonthlyIncome','NumberOfOpenCreditLinesAndLoans', 'NumberOfTimes90DaysLate','NumberRealEstateLoansOrLines', 'NumberOfTime60-89DaysPastDueNotWorse','NumberOfDependents']
col_continuous_cut_points=[['RevolvingUtilizationOfUnsecuredLines', [-inf, 0.22, 0.49, 0.86, inf]],
                           ['age', [-inf, 35.0, 55.0, 62.0, inf]],
                           ['NumberOfTime30-59DaysPastDueNotWorse', [-inf, 0.0, 1.0, inf]],
                           ['DebtRatio', [-inf, 0.41, 0.67, 2.66, inf]],
                           ['MonthlyIncome', [-inf, 1297.0, 4838.0, 6596.0, inf]],
                           ['NumberOfOpenCreditLinesAndLoans', [-inf, 2.0, 3.0, 13.0, inf]],
                           ['NumberOfTimes90DaysLate', [-inf, 0.0, inf]],
                           ['NumberRealEstateLoansOrLines', [-inf, 0.0, 1.0, 2.0, inf]],
                           ['NumberOfTime60-89DaysPastDueNotWorse', [-inf, 0.0, inf]],
                           ['NumberOfDependents', [-inf, 0.0, 1.0, 2.0, inf]]]
data_new=data_train[col_result]
score_card=ls.get_logistic_socre_card(data_new,col_continuous_cut_points)
ls.plot_roc_ks(data_new,score_card)
2019 16:15:43 INFO Building scorecard...
2019 16:15:43 INFO Splitting continuous and discrete variables...
2019 16:15:43 INFO Splitting continuous and discrete variables: done!
2019 16:15:43 INFO Discretizing continuous variables by cut points...
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:04<00:00,  2.49it/s]
2019 16:15:48 INFO Discretizing continuous variables by cut points: done!
2019 16:15:48 INFO WOE transformation...
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 12.17it/s]
2019 16:15:49 INFO WOE transformation: done!
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py:432: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
  FutureWarning)
2019 16:15:54 INFO Building scorecard: done!
2019 16:15:54 INFO Predicting user scores...
2019 16:15:54 INFO Discretizing continuous variables by cut points...
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:04<00:00,  2.13it/s]
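2019 16:15:58 INFO Discretizing continuous variables by cut points: done!

(Figure: ROC/KS plot of the ten-variable scorecard on the training set)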

Supplement: package parameters

def __init__(self,
             max_depth=None,  # depth of the decision tree
             max_leaf_nodes=4,  # maximum number of leaf nodes of the decision tree
             min_samples_l

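Source: https://www.rstk.cn/news/857619.html

That concludes this introduction to building a credit scoring model in one line of code (Python). More hands-on cases are published regularly in the course 《python金融风控评分卡模型和数据分析(加强版)》 (Python scorecard models and data analysis for financial risk control, enhanced edition), which is used for bank training.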
