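ML/LoR: Building a general-purpose credit risk scorecard model with logistic regression (LoR) on a credit card dataset, a full walkthrough with the toad framework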


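Contents

Building a general-purpose credit risk scorecard model with logistic regression (LoR) on a credit card dataset, a full walkthrough with the toad framework

# 1. Define the dataset
# 1.1 Load the German credit dataset
# 1.2 Run EDA on each variable
# 1.3 Print mean, std, min, three quantiles, and max for the continuous variables
# 2. Data preprocessing
# 2.1 Map the categorical target variable to a numeric variable
# 2.2 Compute IV, Gini, entropy, and unique count for each feature
# 2.3 Filter features by IV, empty (missing rate), and corr (correlation)
# 2.4 Binning
# 2.5 Refine the bins using bad-rate plots
# 2.5.1 Example of manually adjusting bins
# 2.5.2 Plot each bin's share as a bar chart with the corresponding bad-rate line
# 2.5.3 Adjust the bins so that bad_rate is broadly monotonic
# 2.6 Apply the WOE transform to the binned data
# 2.7 Feature selection
# 3. Model building, training, and evaluation
# 3.1 Split into training and test sets
# 3.2 Train the model
# 3.3 Evaluate the model: F1, KS, AUC
# 4. Pre-deployment evaluation and credit score calculation
# 4.1 Assess variable stability with PSI: training set vs. test set
# 4.2 Equal-frequency bucketing of the training set to compare the groups
# 4.3 Scorecard score transformation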

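Related articles
ML/LoR: Building a general-purpose credit risk scorecard model with logistic regression (LoR) on a credit card dataset, a full walkthrough with the toad framework
ML/LoR: Building a general-purpose credit risk scorecard model with logistic regression (LoR) on a credit card dataset, a full walkthrough with the toad framework (code implementation)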

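Building a general-purpose credit risk scorecard model with logistic regression (LoR) on a credit card dataset, a full walkthrough with the toad framework

# 1. Define the dataset

# 1.1 Load the German credit dataset

Credit data in which each debtor, described by a set of attributes, is classified as a good or bad credit risk. Source: https://archive.ics.uci.edu/ml/datasets/Statlog+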

| | status.of.existing.checking.account | duration.in.month | credit.history | purpose | credit.amount | savings.account.and.bonds | present.employment.since | installment.rate.in.percentage.of.disposable.income | personal.status.and.sex | other.debtors.or.guarantors | present.residence.since | property | age.in.years | other.installment.plans | housing | number.of.existing.credits.at.this.bank | job | number.of.people.being.liable.to.provide.maintenance.for | telephone | foreign.worker | creditability |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ... < 0 DM | 6 | critical account/ other credits existing (not at this bank) | radio/television | 1169 | unknown/ no savings account | ... >= 7 years | 4 | male : divorced/separated | none | 4 | real estate | 67 | none | own | 2 | skilled employee / official | 1 | yes, registered under the customers name | yes | good |
| 1 | 0 <= ... < 200 DM | 48 | existing credits paid back duly till now | radio/television | 5951 | ... < 100 DM | 1 <= ... < 4 years | 2 | male : divorced/separated | none | 2 | real estate | 22 | none | own | 1 | skilled employee / official | 1 | none | yes | bad |
| 2 | no checking account | 12 | critical account/ other credits existing (not at this bank) | education | 2096 | ... < 100 DM | 4 <= ... < 7 years | 2 | male : divorced/separated | none | 3 | real estate | 49 | none | own | 1 | unskilled - resident | 2 | none | yes | good |
| 3 | ... < 0 DM | 42 | existing credits paid back duly till now | furniture/equipment | 7882 | ... < 100 DM | 4 <= ... < 7 years | 2 | male : divorced/separated | guarantor | 4 | building society savings agreement/ life insurance | 45 | none | for free | 1 | skilled employee / official | 2 | none | yes | good |
| 4 | ... < 0 DM | 24 | delay in paying off in the past | car (new) | 4870 | ... < 100 DM | 1 <= ... < 4 years | 3 | male : divorced/separated | none | 4 | unknown / no property | 53 | none | for free | 2 | skilled employee / official | 2 | none | yes | bad |
| 5 | no checking account | 36 | existing credits paid back duly till now | education | 9055 | unknown/ no savings account | 1 <= ... < 4 years | 2 | male : divorced/separated | none | 4 | unknown / no property | 35 | none | for free | 1 | unskilled - resident | 2 | yes, registered under the customers name | yes | good |
| 6 | no checking account | 24 | existing credits paid back duly till now | furniture/equipment | 2835 | 500 <= ... < 1000 DM | ... >= 7 years | 3 | male : divorced/separated | none | 4 | building society savings agreement/ life insurance | 53 | none | own | 1 | skilled employee / official | 1 | none | yes | good |
| 7 | 0 <= ... < 200 DM | 36 | existing credits paid back duly till now | car (used) | 6948 | ... < 100 DM | 1 <= ... < 4 years | 2 | male : divorced/separated | none | 2 | car or other, not in attribute Savings account/bonds | 35 | none | rent | 1 | management/ self-employed/ highly qualified employee/ officer | 1 | yes, registered under the customers name | yes | good |
| 8 | no checking account | 12 | existing credits paid back duly till now | radio/television | 3059 | ... >= 1000 DM | 4 <= ... < 7 years | 2 | male : divorced/separated | none | 4 | real estate | 61 | none | own | 1 | unskilled - resident | 1 | none | yes | good |
| 9 | 0 <= ... < 200 DM | 30 | critical account/ other credits existing (not at this bank) | car (new) | 5234 | ... < 100 DM | unemployed | 4 | male : divorced/separated | none | 2 | car or other, not in attribute Savings account/bonds | 28 | none | own | 2 | management/ self-employed/ highly qualified employee/ officer | 1 | none | yes | bad |
| 10 | 0 <= ... < 200 DM | 12 | existing credits paid back duly till now | car (new) | 1295 | ... < 100 DM | ... < 1 year | 3 | male : divorced/separated | none | 1 | car or other, not in attribute Savings account/bonds | 25 | none | rent | 1 | skilled employee / official | 1 | none | yes | bad |
| 11 | ... < 0 DM | 48 | existing credits paid back duly till now | business | 4308 | ... < 100 DM | ... < 1 year | 3 | male : divorced/separated | none | 4 | building society savings agreement/ life insurance | 24 | none | rent | 1 | skilled employee / official | 1 | none | yes | bad |
| 12 | 0 <= ... < 200 DM | 12 | existing credits paid back duly till now | radio/television | 1567 | ... < 100 DM | 1 <= ... < 4 years | 1 | male : divorced/separated | none | 1 | car or other, not in attribute Savings account/bonds | 22 | none | own | 1 | skilled employee / official | 1 | yes, registered under the customers name | yes | good |
| 13 | ... < 0 DM | 24 | critical account/ other credits existing (not at this bank) | car (new) | 1199 | ... < 100 DM | ... >= 7 years | 4 | male : divorced/separated | none | 4 | car or other, not in attribute Savings account/bonds | 60 | none | own | 2 | unskilled - resident | 1 | none | yes | bad |
| 14 | ... < 0 DM | 15 | existing credits paid back duly till now | car (new) | 1403 | ... < 100 DM | 1 <= ... < 4 years | 2 | male : divorced/separated | none | 4 | car or other, not in attribute Savings account/bonds | 28 | none | rent | 1 | skilled employee / official | 1 | none | yes | good |
| 15 | ... < 0 DM | 24 | existing credits paid back duly till now | radio/television | 1282 | 100 <= ... < 500 DM | 1 <= ... < 4 years | 4 | male : divorced/separated | none | 2 | car or other, not in attribute Savings account/bonds | 32 | none | own | 1 | unskilled - resident | 1 | none | yes | bad |
| 16 | no checking account | 24 | critical account/ other credits existing (not at this bank) | radio/television | 2424 | unknown/ no savings account | ... >= 7 years | 4 | male : divorced/separated | none | 4 | building society savings agreement/ life insurance | 53 | none | own | 2 | skilled employee / official | 1 | none | yes | good |
| 17 | ... < 0 DM | 30 | no credits taken/ all credits paid back duly | business | 8072 | unknown/ no savings account | ... < 1 year | 2 | male : divorced/separated | none | 3 | car or other, not in attribute Savings account/bonds | 25 | bank | own | 3 | skilled employee / official | 1 | none | yes | good |
| 18 | 0 <= ... < 200 DM | 24 | existing credits paid back duly till now | car (used) | 12579 | ... < 100 DM | ... >= 7 years | 4 | male : divorced/separated | none | 2 | unknown / no property | 44 | none | for free | 1 | management/ self-employed/ highly qualified employee/ officer | 1 | yes, registered under the customers name | yes | bad |
| 19 | no checking account | 24 | existing credits paid back duly till now | radio/television | 3430 | 500 <= ... < 1000 DM | ... >= 7 years | 3 | male : divorced/separated | none | 2 | car or other, not in attribute Savings account/bonds | 31 | none | own | 1 | skilled employee / official | 2 | yes, registered under the customers name | yes | good |

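# 1.2 Run EDA on each variable

# Numeric variables: dtype, missing rate, unique count, mean, standard deviation, quantiles, etc.
# Categorical variables: dtype, missing rate, unique count, top1 (the most frequent category), etc.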

| | type | size | missing | unique | mean_or_top1 | std_or_top2 | min_or_top3 | 1%_or_top4 | 10%_or_top5 | 50%_or_bottom5 | 75%_or_bottom4 | 90%_or_bottom3 | 99%_or_bottom2 | max_or_bottom1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| status.of.existing.checking.account | category | 1000 | 0.00% | 4 | no checking account:39.40% | ... < 0 DM:27.40% | 0 <= ... < 200 DM:26.90% | ... >= 200 DM / salary assignments for at least 1 year:6.30% | | | no checking account:39.40% | ... < 0 DM:27.40% | 0 <= ... < 200 DM:26.90% | ... >= 200 DM / salary assignments for at least 1 year:6.30% |
| duration.in.month | int64 | 1000 | 0.00% | 33 | 20.903 | 12.05881445 | 4 | 6 | 9 | 18 | 24 | 36 | 60 | 72 |
| credit.history | category | 1000 | 0.00% | 5 | existing credits paid back duly till now:53.00% | critical account/ other credits existing (not at this bank):29.30% | delay in paying off in the past:8.80% | all credits at this bank paid back duly:4.90% | no credits taken/ all credits paid back duly:4.00% | existing credits paid back duly till now:53.00% | critical account/ other credits existing (not at this bank):29.30% | delay in paying off in the past:8.80% | all credits at this bank paid back duly:4.90% | no credits taken/ all credits paid back duly:4.00% |
| purpose | object | 1000 | 0.00% | 10 | radio/television:28.00% | car (new):23.40% | furniture/equipment:18.10% | car (used):10.30% | business:9.70% | education:5.00% | repairs:2.20% | domestic appliances:1.20% | others:1.20% | retraining:0.90% |
| credit.amount | int64 | 1000 | 0.00% | 921 | 3271.258 | 2822.736876 | 250 | 425.83 | 932 | 2319.5 | 3972.25 | 7179.4 | 14180.39 | 18424 |
| savings.account.and.bonds | category | 1000 | 0.00% | 5 | ... < 100 DM:60.30% | unknown/ no savings account:18.30% | 100 <= ... < 500 DM:10.30% | 500 <= ... < 1000 DM:6.30% | ... >= 1000 DM:4.80% | ... < 100 DM:60.30% | unknown/ no savings account:18.30% | 100 <= ... < 500 DM:10.30% | 500 <= ... < 1000 DM:6.30% | ... >= 1000 DM:4.80% |
| present.employment.since | category | 1000 | 0.00% | 5 | 1 <= ... < 4 years:33.90% | ... >= 7 years:25.30% | 4 <= ... < 7 years:17.40% | ... < 1 year:17.20% | unemployed:6.20% | 1 <= ... < 4 years:33.90% | ... >= 7 years:25.30% | 4 <= ... < 7 years:17.40% | ... < 1 year:17.20% | unemployed:6.20% |
| installment.rate.in.percentage.of.disposable.income | int64 | 1000 | 0.00% | 4 | 2.973 | 1.118714674 | 1 | 1 | 1 | 3 | 4 | 4 | 4 | 4 |
| personal.status.and.sex | category | 1000 | 0.00% | 4 | male : single:54.80% | female : divorced/separated/married:31.00% | male : married/widowed:9.20% | male : divorced/separated:5.00% | female : single:0.00% | male : single:54.80% | female : divorced/separated/married:31.00% | male : married/widowed:9.20% | male : divorced/separated:5.00% | female : single:0.00% |
| other.debtors.or.guarantors | category | 1000 | 0.00% | 3 | none:90.70% | guarantor:5.20% | co-applicant:4.10% | | | | | none:90.70% | guarantor:5.20% | co-applicant:4.10% |
| present.residence.since | int64 | 1000 | 0.00% | 4 | 2.845 | 1.103717896 | 1 | 1 | 1 | 3 | 4 | 4 | 4 | 4 |
| property | category | 1000 | 0.00% | 4 | car or other, not in attribute Savings account/bonds:33.20% | real estate:28.20% | building society savings agreement/ life insurance:23.20% | unknown / no property:15.40% | | | car or other, not in attribute Savings account/bonds:33.20% | real estate:28.20% | building society savings agreement/ life insurance:23.20% | unknown / no property:15.40% |
| age.in.years | int64 | 1000 | 0.00% | 53 | 35.546 | 11.37546857 | 19 | 20 | 23 | 33 | 42 | 52 | 67.01 | 75 |
| other.installment.plans | category | 1000 | 0.00% | 3 | none:81.40% | bank:13.90% | stores:4.70% | | | | | none:81.40% | bank:13.90% | stores:4.70% |
| housing | category | 1000 | 0.00% | 3 | own:71.30% | rent:17.90% | for free:10.80% | | | | | own:71.30% | rent:17.90% | for free:10.80% |
| number.of.existing.credits.at.this.bank | int64 | 1000 | 0.00% | 4 | 1.407 | 0.577654468 | 1 | 1 | 1 | 1 | 2 | 2 | 3 | 4 |
| job | category | 1000 | 0.00% | 4 | skilled employee / official:63.00% | unskilled - resident:20.00% | management/ self-employed/ highly qualified employee/ officer:14.80% | unemployed/ unskilled - non-resident:2.20% | | | skilled employee / official:63.00% | unskilled - resident:20.00% | management/ self-employed/ highly qualified employee/ officer:14.80% | unemployed/ unskilled - non-resident:2.20% |
| number.of.people.being.liable.to.provide.maintenance.for | int64 | 1000 | 0.00% | 2 | 1.155 | 0.362085772 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 |
| telephone | category | 1000 | 0.00% | 2 | none:59.60% | yes, registered under the customers name:40.40% | | | | | | | none:59.60% | yes, registered under the customers name:40.40% |
| foreign.worker | category | 1000 | 0.00% | 2 | yes:96.30% | no:3.70% | | | | | | | yes:96.30% | no:3.70% |
| creditability | object | 1000 | 0.00% | 2 | good:70.00% | bad:30.00% | | | | | | | good:70.00% | bad:30.00% |

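# 1.3 Print mean, std, min, three quantiles, and max for the continuous variables

| | duration.in.month | credit.amount | installment.rate.in.percentage.of.disposable.income | present.residence.since | age.in.years | number.of.existing.credits.at.this.bank | number.of.people.being.liable.to.provide.maintenance.for |
|---|---|---|---|---|---|---|---|
| mean | 20.903 | 3271.258 | 2.973 | 2.845 | 35.546 | 1.407 | 1.155 |
| std | 12.05881445 | 2822.736876 | 1.118714674 | 1.103717896 | 11.37546857 | 0.577654468 | 0.362085772 |
| min | 4 | 250 | 1 | 1 | 19 | 1 | 1 |
| 25% | 12 | 1365.5 | 2 | 2 | 27 | 1 | 1 |
| 50% | 18 | 2319.5 | 3 | 3 | 33 | 1 | 1 |
| 75% | 24 | 3972.25 | 4 | 4 | 42 | 2 | 1 |
| max | 72 | 18424 | 4 | 4 | 75 | 4 | 2 |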

# 2. Data preprocessing

# 2.1 Map the categorical target variable to a numeric variable
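
A minimal sketch of this mapping, assuming the convention visible in the `creditability_map` column shown later in the walkthrough (good maps to 0, bad maps to 1). The names `TARGET_MAP` and `map_target` are hypothetical and not from the original code.

```python
# Map the categorical target to a numeric one.
# ASSUMPTION: "bad" is the positive class (1), as the creditability_map column suggests.
TARGET_MAP = {"good": 0, "bad": 1}

def map_target(labels):
    """Return 0/1 labels; raises KeyError on an unexpected category."""
    return [TARGET_MAP[label] for label in labels]

# creditability of the first five sample rows in section 1.1
labels = ["good", "bad", "good", "good", "bad"]
mapped = map_target(labels)
print(mapped)
```

Failing loudly on an unknown label is a deliberate choice here: a silently unmapped target would distort every downstream IV and WOE value.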

# 2.2 Compute IV, Gini, entropy, and unique count for each feature

| | iv | gini | entropy | unique |
|---|---|---|---|---|
| creditability | 12.22649152 | 0 | 0 | 2 |
| status.of.existing.checking.account | 0.666011503 | 0.368037204 | 0.545196341 | 4 |
| duration.in.month | 0.354783574 | 0.406755043 | 0.609659161 | 33 |
| credit.amount | 0.351454966 | 0.408679834 | 0.610864302 | 921 |
| credit.history | 0.293233547 | 0.394089613 | 0.580630747 | 5 |
| age.in.years | 0.21119662 | 0.41433928 | 0.610863206 | 53 |
| savings.account.and.bonds | 0.196009557 | 0.40483845 | 0.591376694 | 5 |
| purpose | 0.169195066 | 0.405990292 | 0.593609415 | 10 |
| property | 0.112638262 | 0.410037788 | 0.599091068 | 4 |
| present.employment.since | 0.086433631 | 0.412285325 | 0.601782464 | 5 |
| housing | 0.083293434 | 0.412356067 | 0.602024467 | 3 |
| other.installment.plans | 0.057614542 | 0.414607541 | 0.604712572 | 3 |
| foreign.worker | 0.043877412 | 0.417170441 | 0.606828112 | 2 |
| other.debtors.or.guarantors | 0.032019322 | 0.417208946 | 0.607539261 | 3 |
| installment.rate.in.percentage.of.disposable.income | 0.02632209 | 0.417699747 | 0.60811103 | 4 |
| number.of.existing.credits.at.this.bank | 0.013266524 | 0.418878097 | 0.609493027 | 4 |
| personal.status.and.sex | 0.008839919 | 0.419238171 | 0.609944287 | 4 |
| job | 0.008762766 | 0.419208234 | 0.609937317 | 4 |
| telephone | 0.006377605 | 0.419441491 | 0.610196344 | 2 |
| present.residence.since | 0.003588773 | 0.419685295 | 0.610488269 | 4 |
| number.of.people.being.liable.to.provide.maintenance.for | 4.34E-05 | 0.419996182 | 0.61085975 | 2 |
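
To make the `iv` column concrete, here is a hand-rolled sketch of the WOE and IV calculation for one categorical feature. It is not toad's implementation. The per-category bad/good counts are an assumption taken from the public German credit data, since the article does not print them; with those counts the result comes out close to the 0.666 listed for `status.of.existing.checking.account`.

```python
import math

# (bad, good) counts per category of status.of.existing.checking.account.
# ASSUMPTION: counts come from the public German credit data, not from this article.
counts = {
    "no checking account": (46, 348),
    "... < 0 DM": (135, 139),
    "0 <= ... < 200 DM": (105, 164),
    "... >= 200 DM / salary assignments for at least 1 year": (14, 49),
}

def woe_iv(counts):
    """Return ({category: WOE}, IV) using WOE = ln(bad share / good share)."""
    total_bad = sum(bad for bad, _ in counts.values())
    total_good = sum(good for _, good in counts.values())
    woe, iv = {}, 0.0
    for category, (bad, good) in counts.items():
        bad_share = bad / total_bad      # share of all bads that fall in this category
        good_share = good / total_good   # share of all goods that fall in this category
        woe[category] = math.log(bad_share / good_share)
        iv += (bad_share - good_share) * woe[category]
    return woe, iv

woe, iv = woe_iv(counts)
for category, value in woe.items():
    print(f"{category}: {value:.4f}")
print(f"IV = {iv:.4f}")
```

With this sign convention a positive WOE marks a riskier category, which agrees with the WOE values shown for this feature in section 2.6.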

# 2.3 Filter features by IV, empty (missing rate), and corr (correlation)

drop_cols: 
 {'empty': array([], dtype=float64), 'iv': array(['personal.status.and.sex', 'present.residence.since',
       'number.of.existing.credits.at.this.bank', 'job',
       'number.of.people.being.liable.to.provide.maintenance.for',
       'telephone'], dtype=object), 'corr': array([], dtype=object)}
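
The `iv` entry of `drop_cols` can be reproduced with a plain threshold filter over the IV values from section 2.2. This is a sketch, not the library call that produced the output. The threshold 0.02 is an assumption: the article does not state it, but it is consistent with the six dropped columns.

```python
# IV values copied from the table in section 2.2 (target column excluded).
iv_table = {
    "status.of.existing.checking.account": 0.666011503,
    "duration.in.month": 0.354783574,
    "credit.amount": 0.351454966,
    "credit.history": 0.293233547,
    "age.in.years": 0.21119662,
    "savings.account.and.bonds": 0.196009557,
    "purpose": 0.169195066,
    "property": 0.112638262,
    "present.employment.since": 0.086433631,
    "housing": 0.083293434,
    "other.installment.plans": 0.057614542,
    "foreign.worker": 0.043877412,
    "other.debtors.or.guarantors": 0.032019322,
    "installment.rate.in.percentage.of.disposable.income": 0.02632209,
    "number.of.existing.credits.at.this.bank": 0.013266524,
    "personal.status.and.sex": 0.008839919,
    "job": 0.008762766,
    "telephone": 0.006377605,
    "present.residence.since": 0.003588773,
    "number.of.people.being.liable.to.provide.maintenance.for": 4.34e-05,
}

IV_THRESHOLD = 0.02  # ASSUMPTION: not stated in the article

def low_iv_features(iv_by_feature, threshold):
    """Names of features whose IV falls below the threshold."""
    return [name for name, iv in iv_by_feature.items() if iv < threshold]

dropped = low_iv_features(iv_table, IV_THRESHOLD)
kept = [name for name in iv_table if name not in dropped]
print(len(dropped), "dropped;", len(kept), "kept")
```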

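# 2.4 Binning

Both numeric and categorical variables are binned. The supported binning methods are chi-square (chi), decision tree, percentile, equal-frequency, and equal-width.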

data_df_s2bins_dict: 
 {'status.of.existing.checking.account': [['no checking account'], ['... >= 200 DM / salary assignments for at least 1 year'], ['0 <= ... < 200 DM'], ['... < 0 DM']], 'duration.in.month': [9, 12, 13, 16, 36, 45], 'credit.history': [['critical account/ other credits existing (not at this bank)'], ['delay in paying off in the past', 'existing credits paid back duly till now'], ['all credits at this bank paid back duly', 'no credits taken/ all credits paid back duly']], 'purpose': [['retraining', 'car (used)'], ['radio/television'], ['furniture/equipment'], ['domestic appliances', 'business', 'repairs'], ['car (new)'], ['others', 'education']], 'credit.amount': [3556], 'savings.account.and.bonds': [['... >= 1000 DM', '500 <= ... < 1000 DM', 'unknown/ no savings account'], ['100 <= ... < 500 DM'], ['... < 100 DM']], 'present.employment.since': [['4 <= ... < 7 years'], ['... >= 7 years'], ['1 <= ... < 4 years'], ['unemployed'], ['... < 1 year']], 'installment.rate.in.percentage.of.disposable.income': [2, 3, 4], 'other.debtors.or.guarantors': [['guarantor', 'none', 'co-applicant']], 'property': [['real estate'], ['building society savings agreement/ life insurance'], ['car or other, not in attribute Savings account/bonds'], ['unknown / no property']], 'age.in.years': [26, 35, 37, 49], 'other.installment.plans': [['none'], ['stores', 'bank']], 'housing': [['own'], ['rent'], ['for free']], 'foreign.worker': [['no', 'yes']], 'creditability': [['good'], ['bad']]}

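# 2.5 Refine the bins using bad-rate plots

# 2.5.1 Example of manually adjusting bins

# 2.5.2 Plot each bin's share as a bar chart with the corresponding bad-rate line

# 2.5.3 Adjust the bins so that bad_rate is broadly monotonic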
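
The article adjusts bins by hand after inspecting the plot. As one possible automatic heuristic, the sketch below checks whether the bad rate is monotonic across ordered bins and merges adjacent bins until it is non-decreasing. The per-bin counts are hypothetical and the function names are not from the original code.

```python
def bad_rates(bins):
    """Bad rate per bin, where each bin is a (bad_count, total_count) pair."""
    return [bad / total for bad, total in bins]

def is_monotonic(rates):
    """True if the sequence is entirely non-decreasing or entirely non-increasing."""
    pairs = list(zip(rates, rates[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

def merge_until_increasing(bins):
    """Merge adjacent bins until the bad rate is non-decreasing from left to right."""
    merged = [tuple(b) for b in bins]
    i = 0
    while i < len(merged) - 1:
        (bad1, total1), (bad2, total2) = merged[i], merged[i + 1]
        if bad1 / total1 > bad2 / total2:
            # Violation: pool the two bins, then re-check against the previous bin.
            merged[i:i + 2] = [(bad1 + bad2, total1 + total2)]
            i = max(i - 1, 0)
        else:
            i += 1
    return merged

# HYPOTHETICAL (bad, total) counts for four ordered bins of one numeric feature
bins = [(10, 100), (30, 120), (20, 100), (45, 110)]
merged = merge_until_increasing(bins)
print(bad_rates(bins))
print(merged, bad_rates(merged))
```

Merging trades granularity for a trend that is easier to explain, which is the usual reason scorecards prefer monotonic bins.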

# 2.6 Apply the WOE transform to the binned data

| | status.of.existing.checking.account | duration.in.month | credit.history | purpose | credit.amount | savings.account.and.bonds | present.employment.since | installment.rate.in.percentage.of.disposable.income | other.debtors.or.guarantors | property | age.in.years | other.installment.plans | housing | foreign.worker | creditability | creditability_map |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.818098706 | -1.280933845 | -0.733740578 | -0.410062817 | -0.153492135 | -0.762140052 | -0.235566071 | 0.157300289 | 0 | -0.461034959 | -0.194156014 | -0.121178625 | -0.194156014 | 0 | -5.703782475 | 0 |
| 1 | 0.401391783 | 1.134979933 | 0.087868755 | -0.410062817 | 0.31563815 | 0.271357844 | 0.032103245 | -0.155466469 | 0 | -0.461034959 | 0.48083491 | -0.121178625 | -0.194156014 | 0 | 6.551080335 | 1 |
| 2 | -1.176263223 | -0.128416292 | -0.733740578 | 0.587786665 | -0.153492135 | 0.271357844 | -0.394415272 | -0.155466469 | 0 | -0.461034959 | -0.266352306 | -0.121178625 | -0.194156014 | 0 | -5.703782475 | 0 |
| 3 | 0.818098706 | 0.524524468 | 0.087868755 | 0.095556516 | 0.31563815 | 0.271357844 | -0.394415272 | -0.155466469 | 0 | 0.028573372 | -0.266352306 | -0.121178625 | 0.472604411 | 0 | -5.703782475 | 0 |
| 4 | 0.818098706 | 0.108688306 | 0.087868755 | 0.359200488 | 0.31563815 | 0.271357844 | 0.032103245 | -0.064538521 | 0 | 0.586082361 | -0.266352306 | -0.121178625 | 0.472604411 | 0 | 6.551080335 | 1 |
| 5 | -1.176263223 | 0.524524468 | 0.087868755 | 0.587786665 | 0.31563815 | -0.762140052 | 0.032103245 | -0.155466469 | 0 | 0.586082361 | -0.044353168 | -0.121178625 | 0.472604411 | 0 | -5.703782475 | 0 |
| 6 | -1.176263223 | 0.108688306 | 0.087868755 | 0.095556516 | -0.153492135 | -0.762140052 | -0.235566071 | -0.064538521 | 0 | 0.028573372 | -0.266352306 | -0.121178625 | -0.194156014 | 0 | -5.703782475 | 0 |
| 7 | 0.401391783 | 0.524524468 | 0.087868755 | -0.805625164 | 0.31563815 | 0.271357844 | 0.032103245 | -0.155466469 | 0 | 0.034191365 | -0.044353168 | -0.121178625 | 0.40444522 | 0 | -5.703782475 | 0 |
| 8 | -1.176263223 | -0.128416292 | 0.087868755 | -0.410062817 | -0.153492135 | -0.762140052 | -0.394415272 | -0.155466469 | 0 | -0.461034959 | -0.266352306 | -0.121178625 | -0.194156014 | 0 | -5.703782475 | 0 |
| 9 | 0.401391783 | 0.108688306 | -0.733740578 | 0.359200488 | 0.31563815 | 0.271357844 | 0.31923043 | 0.157300289 | 0 | 0.034191365 | -0.044353168 | -0.121178625 | -0.194156014 | 0 | 6.551080335 | 1 |
| 10 | 0.401391783 | -0.128416292 | 0.087868755 | 0.359200488 | -0.153492135 | 0.271357844 | 0.470820289 | -0.064538521 | 0 | 0.034191365 | -0.044353168 | -0.121178625 | 0.40444522 | 0 | 6.551080335 | 1 |
| 11 | 0.818098706 | 1.134979933 | 0.087868755 | 0.233288 | 0.31563815 | 0.271357844 | 0.470820289 | -0.064538521 | 0 | 0.028573372 | 0.48083491 | -0.121178625 | 0.40444522 | 0 | 6.551080335 | 1 |
| 12 | 0.401391783 | -0.128416292 | 0.087868755 | -0.410062817 | -0.153492135 | 0.271357844 | 0.032103245 | -0.251314428 | 0 | 0.034191365 | 0.48083491 | -0.121178625 | -0.194156014 | 0 | -5.703782475 | 0 |
| 13 | 0.818098706 | 0.108688306 | -0.733740578 | 0.359200488 | -0.153492135 | 0.271357844 | -0.235566071 | 0.157300289 | 0 | 0.034191365 | -0.266352306 | -0.121178625 | -0.194156014 | 0 | 6.551080335 | 1 |
| 14 | 0.818098706 | -0.665290226 | 0.087868755 | 0.359200488 | -0.153492135 | 0.271357844 | 0.032103245 | -0.155466469 | 0 | 0.034191365 | -0.044353168 | -0.121178625 | 0.40444522 | 0 | -5.703782475 | 0 |
| 15 | 0.818098706 | 0.108688306 | 0.087868755 | -0.410062817 | -0.153492135 | 0.13955188 | 0.032103245 | 0.157300289 | 0 | 0.034191365 | -0.044353168 | -0.121178625 | -0.194156014 | 0 | 6.551080335 | 1 |
| 16 | -1.176263223 | 0.108688306 | -0.733740578 | -0.410062817 | -0.153492135 | -0.762140052 | -0.235566071 | 0.157300289 | 0 | 0.028573372 | -0.266352306 | -0.121178625 | -0.194156014 | 0 | -5.703782475 | 0 |
| 17 | 0.818098706 | 0.108688306 | 1.234070835 | 0.233288 | 0.31563815 | -0.762140052 | 0.470820289 | -0.155466469 | 0 | 0.034191365 | -0.044353168 | 0.477550835 | -0.194156014 | 0 | -5.703782475 | 0 |
| 18 | 0.401391783 | 0.108688306 | 0.087868755 | -0.805625164 | 0.31563815 | 0.271357844 | -0.235566071 | 0.157300289 | 0 | 0.586082361 | -0.044353168 | -0.121178625 | 0.472604411 | 0 | 6.551080335 | 1 |
| 19 | -1.176263223 | 0.108688306 | 0.087868755 | -0.410062817 | -0.153492135 | -0.762140052 | -0.235566071 | -0.064538521 | 0 | 0.034191365 | -0.044353168 | -0.121178625 | -0.194156014 | 0 | -5.703782475 | 0 |
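
Once each bin has a WOE value, the transform itself is a lookup. The sketch below applies it to `status.of.existing.checking.account`, with the WOE values read off the table above by matching rows against the raw sample in section 1.1 (row 0 is `... < 0 DM` and shows 0.818098706, and so on). The fourth category does not appear in the 20 sample rows, so it is left out. The names are hypothetical.

```python
# WOE per category, inferred by aligning the raw sample rows with the WOE table.
STATUS_WOE = {
    "... < 0 DM": 0.818098706,
    "0 <= ... < 200 DM": 0.401391783,
    "no checking account": -1.176263223,
}

def woe_transform(values, woe_map):
    """Replace each category with its WOE; raises KeyError for an unseen category."""
    return [woe_map[value] for value in values]

# status.of.existing.checking.account for sample rows 0 to 3
raw = ["... < 0 DM", "0 <= ... < 200 DM", "no checking account", "... < 0 DM"]
encoded = woe_transform(raw, STATUS_WOE)
print(encoded)
```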

# 2.7 Feature selection

# Select features by forward, backward, or bidirectional stepwise selection, using aic, bic, ks, or auc as the selection criterion

final_data: 
 (1000, 3)
final_data: 
 Index(['status.of.existing.checking.account', 'creditability',
       'creditability_map'],
      dtype='object')

# 3. Model building, training, and evaluation

# 3.1 Split into training and test sets

# 3.2 Train the model

# 3.3 Evaluate the model: F1, KS, AUC
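
For readers who want to see what the three metrics measure, here is a dependency-free sketch. It is meant for small inputs and is not how toad or scikit-learn compute them internally. The labels and scores are hypothetical, with 1 meaning bad, and the 0.5 cut-off for F1 is an assumption.

```python
def auc(y_true, scores):
    """Chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ks(y_true, scores):
    """Largest gap between the cumulative shares of positives and negatives."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = 0.0
    for threshold in sorted(set(scores)):
        cum_pos = sum(1 for y, s in zip(y_true, scores) if y == 1 and s <= threshold)
        cum_neg = sum(1 for y, s in zip(y_true, scores) if y == 0 and s <= threshold)
        best = max(best, abs(cum_pos / n_pos - cum_neg / n_neg))
    return best

def f1(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# HYPOTHETICAL labels (1 = bad) and predicted probabilities of bad
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.7, 0.9]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # ASSUMED 0.5 cut-off

auc_value = auc(y_true, scores)
ks_value = ks(y_true, scores)
f1_value = f1(y_true, y_pred)
print(auc_value, ks_value, f1_value)
```

AUC and KS depend only on how the scores rank the samples, while F1 also depends on the chosen cut-off.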

# 4. Pre-deployment evaluation and credit score calculation

# 4.1 Assess variable stability with PSI: training set vs. test set

cal PSI 0.012897491574571578
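
A PSI value compares two distributions bin by bin. The sketch below shows the usual formula in plain Python. The base shares are the overall category shares of `status.of.existing.checking.account` from the EDA table, the comparison shares are hypothetical, and the function names are not from the original code.

```python
import math
from collections import Counter

def shares(values, categories):
    """Share of samples falling in each category, in the given order."""
    counts = Counter(values)
    return [counts[c] / len(values) for c in categories]

def psi(base_shares, test_shares, eps=1e-6):
    """Population stability index: sum of (test - base) * ln(test / base)."""
    total = 0.0
    for base, test in zip(base_shares, test_shares):
        base, test = max(base, eps), max(test, eps)  # avoid log(0) on empty bins
        total += (test - base) * math.log(test / base)
    return total

base = [0.394, 0.274, 0.269, 0.063]  # shares from the EDA table in section 1.2
test = [0.40, 0.25, 0.28, 0.07]      # HYPOTHETICAL comparison shares
value = psi(base, test)
print(round(value, 5))
```

A commonly cited rule of thumb treats a PSI below 0.1 as stable, which the 0.0129 printed above would satisfy.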

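# 4.2 Equal-frequency bucketing of the training set to compare the groups

| | min | max | bads | goods | total | bad_rate | good_rate | odds | bad_prop | good_prop | total_prop | cum_bad_rate | cum_bad_rate_rev | cum_bads_prop | cum_bads_prop_rev | cum_goods_prop | cum_goods_prop_rev | cum_total_prop | cum_total_prop_rev | ks | lift |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.000194976 | 0.000204106 | 0 | 292 | 292 | 0 | 1 | 0 | 0 | 0.5583174 | 0.389333333 | 0 | 0.302666667 | 0 | 1 | 0.5583174 | 1 | 0.389333333 | 1 | 0.5583174 | 1 |
| 1 | 0.000214122 | 0.000214122 | 0 | 125 | 125 | 0 | 1 | 0 | 0 | 0.239005736 | 0.166666667 | 0 | 0.495633188 | 0 | 1 | 0.797323136 | 0.4416826 | 0.556 | 0.610666667 | 0.797323136 | 1.637554585 |
| 2 | 0.000219486 | 0.000219486 | 0 | 106 | 106 | 0 | 1 | 0 | 0 | 0.202676864 | 0.141333333 | 0 | 0.681681682 | 0 | 1 | 1 | 0.202676864 | 0.697333333 | 0.444 | 1 | 2.252252252 |
| 3 | 0.999484936 | 0.99950797 | 48 | 0 | 48 | 1 | 0 | inf | 0.211453744 | 0 | 0.064 | 0.084063047 | 1 | 0.211453744 | 1 | 1 | 0 | 0.761333333 | 0.302666667 | 0.788546256 | 3.303964758 |
| 4 | 0.99953098 | 0.99953098 | 78 | 0 | 78 | 1 | 0 | inf | 0.343612335 | 0 | 0.104 | 0.194144838 | 1 | 0.555066079 | 0.788546256 | 1 | 0 | 0.865333333 | 0.238666667 | 0.444933921 | 3.303964758 |
| 5 | 0.999542439 | 0.999542439 | 101 | 0 | 101 | 1 | 0 | inf | 0.444933921 | 0 | 0.134666667 | 0.302666667 | 1 | 1 | 0.444933921 | 1 | 0 | 1 | 0.134666667 | 0 | 3.303964758 |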
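
Several columns of this report can be recomputed from the `bads` and `goods` counts alone. The sketch below does so for `bad_rate`, the cumulative shares, `ks`, and `lift`. The formulas are inferred from the numbers in the table rather than taken from the library source, so treat them as an assumption that happens to match the printed values.

```python
# (bads, goods) per score bucket, copied from the table above.
buckets = [(0, 292), (0, 125), (0, 106), (48, 0), (78, 0), (101, 0)]

def bucket_stats(buckets):
    """Per-bucket bad rate, cumulative shares, KS gap, and lift."""
    total_bad = sum(bad for bad, _ in buckets)
    total_good = sum(good for _, good in buckets)
    overall_bad_rate = total_bad / (total_bad + total_good)
    rows = []
    cum_bad = cum_good = 0
    for bad, good in buckets:
        # Reverse-cumulative counts cover this bucket and every bucket after it.
        rev_bad = total_bad - cum_bad
        rev_good = total_good - cum_good
        cum_bad += bad
        cum_good += good
        cum_bads_prop = cum_bad / total_bad
        cum_goods_prop = cum_good / total_good
        rows.append({
            "bad_rate": bad / (bad + good),
            "cum_bads_prop": cum_bads_prop,
            "cum_goods_prop": cum_goods_prop,
            "ks": abs(cum_bads_prop - cum_goods_prop),
            # ASSUMED definition: reverse-cumulative bad rate over overall bad rate
            "lift": (rev_bad / (rev_bad + rev_good)) / overall_bad_rate,
        })
    return rows

rows = bucket_stats(buckets)
for index, row in enumerate(rows):
    print(index, round(row["ks"], 6), round(row["lift"], 6))
```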

# 4.3 Scorecard score transformation

| | name | value | score |
|---|---|---|---|
| 0 | status.of.existing.checking.account | no checking account | 261.94 |
| 1 | status.of.existing.checking.account | ... >= 200 DM / salary assignments for at least 1 year | 258.87 |
| 2 | status.of.existing.checking.account | 0 <= ... < 200 DM | 255.66 |
| 3 | status.of.existing.checking.account | ... < 0 DM | 254.01 |
| 4 | creditability | good | 744.2 |
| 5 | creditability | bad | -302.01 |
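
The usual way to turn a logistic regression into points is a linear rescaling of the log-odds. The sketch below shows that scaling and how the total splits into per-bin points. The scaling parameters (`pdo`, `base_score`, `base_odds`) are hypothetical, since the article does not state which values produced the table. The last lines check one thing that can be verified from the article alone: the per-bin scores of `status.of.existing.checking.account` are linear in the WOE values shown in section 2.6.

```python
import math

def scaling_constants(pdo, base_score, base_odds):
    """Return (factor, offset) so that score = offset - factor * logit_bad.

    base_odds is the good:bad odds at base_score; doubling the odds adds pdo points.
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset

def total_score(logit_bad, factor, offset):
    """Score for a log-odds-of-bad value; a higher score means lower risk."""
    return offset - factor * logit_bad

def bin_points(woe, coef, intercept, n_features, factor, offset):
    """Points for one bin, with the base points split evenly across features."""
    return (offset - factor * intercept) / n_features - factor * coef * woe

# HYPOTHETICAL scaling parameters, not taken from the article
factor, offset = scaling_constants(pdo=60, base_score=750, base_odds=35)
at_base = total_score(-math.log(35), factor, offset)
at_double = total_score(-math.log(70), factor, offset)
print(round(at_base, 2), round(at_double, 2))

# Check against the article: per-bin scores should be linear in WOE.
woe_no_account, pts_no_account = -1.176263223, 261.94
woe_negative, pts_negative = 0.818098706, 254.01
woe_mid, pts_mid = 0.401391783, 255.66
slope = (pts_negative - pts_no_account) / (woe_negative - woe_no_account)
predicted_mid = pts_no_account + slope * (woe_mid - woe_no_account)
print(round(predicted_mid, 2), pts_mid)
```

Because points are a linear function of WOE, the bin with the lowest WOE (`no checking account`) receives the most points and the bin with the highest WOE (`... < 0 DM`) the fewest, as in the table.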
