Scorecard Model Threshold Table (continued from the previous post)

Preface

Continued from the previous post: Scorecard model case study (GiveMeSomeCredit, Kaggle data) (personal practice version), CSDN blog

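A scorecard assigns a score to each data record. When the model goes into production, a score threshold has to be set, and customers who score below it are rejected.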
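As a minimal sketch of what the threshold does at decision time, the snippet below rejects every score that falls below a cutoff. The function name `apply_score_threshold`, the toy scores, and the cutoff of 580 are illustrative assumptions, not part of the original post.

```python
import pandas as pd

def apply_score_threshold(scores, threshold):
    """Label each score 'reject' if it is below the threshold, else 'accept'."""
    scores = pd.Series(scores)
    return scores.lt(threshold).map({True: "reject", False: "accept"})

# Toy scores (hypothetical); 580 is used only as an example cutoff
decisions = apply_score_threshold([455, 579, 580, 731], threshold=580)
print(decisions.tolist())
```

The comparison is strict, so a score exactly equal to the threshold is accepted, which matches the `<580` style of threshold used in the table later in this post.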

1. Computing the credit score

Score the full sample

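# Map each input value to its bin, then look up the bin's score to compute the final credit score
def str_to_int(s):
    # Convert a bin-edge string to a float; infinite edges become large finite sentinels
    s = s.strip()  # tolerate spaces around the edge, e.g. "(-inf, 30.0]"
    if s == '-inf':
        return -999999999.0
    elif s == 'inf':
        return 999999999.0
    else:
        return float(s)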
    
def map_value_to_bin(feature_value,feature_to_bin):
    for idx, row in feature_to_bin.iterrows():
        bins = str(row['Binning'])
        left_open = bins[0]=="("
        right_open = bins[-1]==")"
        binnings = bins[1:-1].split(',')
        in_range = True
        # check left bound
        if left_open:
            if feature_value<= str_to_int(binnings[0]):
                in_range = False   
        else:
            if feature_value< str_to_int(binnings[0]):
                in_range = False   
        #check right bound
        if right_open:
            if feature_value>= str_to_int(binnings[1]):
                in_range = False 
        else:
            if feature_value> str_to_int(binnings[1]):
                in_range = False   
        if in_range:
            return row['Binning']
    return None  # no bin matched this value

def map_to_score(df,score_card):
    scored_columns = list(score_card['Variable'].unique())
    score = 0
    for col in scored_columns:
        feature_to_bin = score_card[score_card['Variable']==col]
        feature_value = df[col]
        selected_bin = map_value_to_bin(feature_value,feature_to_bin)
        selected_record_in_scorecard = feature_to_bin[feature_to_bin['Binning'] == selected_bin]
        score += selected_record_in_scorecard['Score'].iloc[0]
    return score  

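def calculate_score_with_card(df,score_card,A):
    # Sum the per-feature bin scores for every row, then add the base score A
    df['score'] = df.apply(map_to_score,args=(score_card,),axis=1)
    df['score'] = df['score']+A
    df['score'] = df['score'].astype(int)  # truncates any fractional part
    return df

# df_train, score_card and A are assumed to be defined as in the previous post
df_score001 = calculate_score_with_card(df_train,score_card,A)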
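`score_card` and the base score `A` come from the previous post and are not shown here. The toy example below is a sketch of the layout the scoring functions appear to expect: one row per feature bin, with columns `Variable`, `Binning` and `Score`, and interval strings such as `(30.0,50.0]`. The feature bins, scores, base score and helper names (`toy_card`, `toy_A`, `bin_contains`, `toy_score`) are all made up for illustration, and the helper uses Python's native `float('inf')` rather than the large finite sentinels used above.

```python
import pandas as pd

# Hypothetical scorecard: one row per (feature, bin) with its score contribution
toy_card = pd.DataFrame({
    "Variable": ["age", "age", "age", "DebtRatio", "DebtRatio"],
    "Binning":  ["(-inf,30.0]", "(30.0,50.0]", "(50.0,inf]",
                 "(-inf,0.5]", "(0.5,inf]"],
    "Score":    [-12, 3, 15, 8, -10],
})
toy_A = 600  # hypothetical base score

def bin_contains(binning, value):
    """Return True if value lies in an interval string like '(30.0,50.0]'."""
    lo, hi = (float(part) for part in binning[1:-1].split(","))
    above_lo = value > lo if binning[0] == "(" else value >= lo
    below_hi = value < hi if binning[-1] == ")" else value <= hi
    return above_lo and below_hi

def toy_score(record, card, base):
    """Base score plus the score of the matching bin for each feature."""
    total = base
    for variable, rows in card.groupby("Variable"):
        for _, row in rows.iterrows():
            if bin_contains(row["Binning"], record[variable]):
                total += row["Score"]
                break
    return int(total)

applicant = {"age": 42, "DebtRatio": 0.8}
print(toy_score(applicant, toy_card, toy_A))  # 600 + 3 - 10 = 593
```

An age of 42 falls in `(30.0,50.0]` and a debt ratio of 0.8 falls in `(0.5,inf]`, so the final score is the base score plus those two bin scores.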

2. Computing the threshold table

import math
import pandas as pd

#---- Threshold table function ----
def cal_score_threshold_tb(score_df,bin_step=10,is_bad_col_name='SeriousDlqin2yrs',score_col_name='score'):
    # ----- Start of the first bin and end of the last bin, aligned to multiples of bin_step -----
    bin_start = math.trunc(score_df[score_col_name].min()/bin_step)*bin_step
    bin_end   = math.trunc(score_df[score_col_name].max()/bin_step+1)*bin_step
    # ----- Count good and bad customers in each bin -----
    rows = []
    for cur_bin in range(bin_start,bin_end,bin_step):
        cur_bin_name ='['+str(cur_bin)+'-'+str(cur_bin+bin_step)+')'
        cur_score_df = score_df[(score_df[score_col_name]>=cur_bin)&(score_df[score_col_name]<cur_bin+bin_step)][is_bad_col_name]
        bad_cn       = cur_score_df.sum()
        cn           = cur_score_df.shape[0]
        rows.append([cur_bin_name,cn,cn-bad_cn,bad_cn])
    # Columns: bin name, customers in bin, good customers in bin, bad customers in bin
    score_thd = pd.DataFrame(rows,columns=['分组名称','本组客户','本组好客户','本组坏客户'])
    #------ Remaining threshold-table columns ------
    score_thd['总客户']      = score_thd['本组客户'].sum()      # total customers
    score_thd['总好客户']    = score_thd['本组好客户'].sum()    # total good customers
    score_thd['总坏客户']    = score_thd['本组坏客户'].sum()    # total bad customers
    score_thd['阈值']        = score_thd['分组名称'].apply(lambda x: '<'+x.split('-')[1].replace(')',''))  # threshold, e.g. '<580'
    score_thd['损失客户']    = score_thd['本组客户'].cumsum()   # customers rejected at this threshold
    score_thd['损失客户%']   = score_thd['损失客户']/score_thd['总客户']
    score_thd['损失好客户']  = score_thd['本组好客户'].cumsum() # good customers rejected
    score_thd['损失好客户%'] = score_thd['损失好客户']/score_thd['总好客户']
    score_thd['剔除坏客户']  = score_thd['本组坏客户'].cumsum() # bad customers rejected
    score_thd['剔除坏客户%'] = score_thd['剔除坏客户']/score_thd['总坏客户']
    tmp = score_thd['本组客户'].copy()
    tmp[tmp==0] = 1  # avoid division by zero for empty bins
    score_thd['本组坏客户占比']       = score_thd['本组坏客户']/tmp                    # bad rate within the bin
    score_thd['损失客户中坏客户占比'] = score_thd['剔除坏客户']/score_thd['损失客户']  # bad rate among rejected customers
    return score_thd

# -------- Compute the score threshold table --------
score_df  = df_score001[['score','SeriousDlqin2yrs']]
score_thd = cal_score_threshold_tb(score_df,bin_step=20)

The output is as follows:

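Columns appear in the same order as they are created in the code; the three total columns (总客户, 总好客户, 总坏客户) are not shown.

| | Group | Customers in group | Good in group | Bad in group | Threshold | Customers lost | Customers lost % | Good customers lost | Good customers lost % | Bad customers removed | Bad customers removed % | Bad rate in group | Bad rate among lost customers |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [300-320) | 15 | 4 | 11 | <320 | 15 | 0.01% | 4 | 0.00% | 11 | 0.11% | 73.33% | 73.33% |
| 1 | [320-340) | 163 | 40 | 123 | <340 | 178 | 0.12% | 44 | 0.03% | 134 | 1.36% | 75.46% | 75.28% |
| 2 | [340-360) | 257 | 97 | 160 | <360 | 435 | 0.29% | 141 | 0.10% | 294 | 2.98% | 62.26% | 67.59% |
| 3 | [360-380) | 324 | 117 | 207 | <380 | 759 | 0.51% | 258 | 0.19% | 501 | 5.07% | 63.89% | 66.01% |
| 4 | [380-400) | 357 | 163 | 194 | <400 | 1116 | 0.75% | 421 | 0.30% | 695 | 7.04% | 54.34% | 62.28% |
| 5 | [400-420) | 374 | 160 | 214 | <420 | 1490 | 1.00% | 581 | 0.42% | 909 | 9.21% | 57.22% | 61.01% |
| 6 | [420-440) | 533 | 228 | 305 | <440 | 2023 | 1.36% | 809 | 0.58% | 1214 | 12.30% | 57.22% | 60.01% |
| 7 | [440-460) | 1124 | 585 | 539 | <460 | 3147 | 2.11% | 1394 | 1.00% | 1753 | 17.76% | 47.95% | 55.70% |
| 8 | [460-480) | 932 | 495 | 437 | <480 | 4079 | 2.73% | 1889 | 1.36% | 2190 | 22.18% | 46.89% | 53.69% |
| 9 | [480-500) | 725 | 435 | 290 | <500 | 4804 | 3.22% | 2324 | 1.67% | 2480 | 25.12% | 40.00% | 51.62% |
| 10 | [500-520) | 2051 | 1263 | 788 | <520 | 6855 | 4.60% | 3587 | 2.58% | 3268 | 33.10% | 38.42% | 47.67% |
| 11 | [520-540) | 1486 | 1022 | 464 | <540 | 8341 | 5.59% | 4609 | 3.31% | 3732 | 37.80% | 31.22% | 44.74% |
| 12 | [540-560) | 2454 | 1782 | 672 | <560 | 10795 | 7.24% | 6391 | 4.59% | 4404 | 44.61% | 27.38% | 40.80% |
| 13 | [560-580) | 2507 | 1951 | 556 | <580 | 13302 | 8.92% | 8342 | 5.99% | 4960 | 50.24% | 22.18% | 37.29% |
| 14 | [580-600) | 2238 | 1824 | 414 | <600 | 15540 | 10.42% | 10166 | 7.30% | 5374 | 54.43% | 18.50% | 34.58% |
| 15 | [600-620) | 7252 | 6395 | 857 | <620 | 22792 | 15.28% | 16561 | 11.89% | 6231 | 63.11% | 11.82% | 27.34% |
| 16 | [620-640) | 7231 | 6546 | 685 | <640 | 30023 | 20.13% | 23107 | 16.59% | 6916 | 70.05% | 9.47% | 23.04% |
| 17 | [640-660) | 8417 | 7753 | 664 | <660 | 38440 | 25.77% | 30860 | 22.15% | 7580 | 76.78% | 7.89% | 19.72% |
| 18 | [660-680) | 9919 | 9336 | 583 | <680 | 48359 | 32.42% | 40196 | 28.86% | 8163 | 82.68% | 5.88% | 16.88% |
| 19 | [680-700) | 10473 | 10044 | 429 | <700 | 58832 | 39.44% | 50240 | 36.07% | 8592 | 87.03% | 4.10% | 14.60% |
| 20 | [700-720) | 14987 | 14543 | 444 | <720 | 73819 | 49.49% | 64783 | 46.51% | 9036 | 91.52% | 2.96% | 12.24% |
| 21 | [720-740) | 15149 | 14865 | 284 | <740 | 88968 | 59.64% | 79648 | 57.18% | 9320 | 94.40% | 1.87% | 10.48% |
| 22 | [740-760) | 19626 | 19374 | 252 | <760 | 108594 | 72.80% | 99022 | 71.09% | 9572 | 96.95% | 1.28% | 8.81% |
| 23 | [760-780) | 18659 | 18484 | 175 | <780 | 127253 | 85.31% | 117506 | 84.36% | 9747 | 98.72% | 0.94% | 7.66% |
| 24 | [780-800) | 10970 | 10901 | 69 | <800 | 138223 | 92.66% | 128407 | 92.19% | 9816 | 99.42% | 0.63% | 7.10% |
| 25 | [800-820) | 10942 | 10885 | 57 | <820 | 149165 | 100.00% | 139292 | 100.00% | 9873 | 100.00% | 0.52% | 6.62% |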

Brief explanation:

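Take row 13 as an example: with the threshold set to 580, 13,302 customers are rejected (8.92% of all customers), of whom 8,342 are good customers and 4,960 are bad customers, and so on for the remaining columns.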
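The percentages in that row can be re-derived from the raw counts. The sketch below copies the counts for row 13 and the totals from the last row of the table; the helper name `pct` and the dictionary labels are illustrative and not part of the original code.

```python
# Counts copied from row 13 (threshold <580) and the final row (totals) of the table
rejected_total, rejected_good, rejected_bad = 13302, 8342, 4960
all_total, all_good, all_bad = 149165, 139292, 9873

def pct(numerator, denominator):
    """Ratio as a percentage rounded to two decimals."""
    return round(100 * numerator / denominator, 2)

row_13 = {
    "customers lost %": pct(rejected_total, all_total),
    "good customers lost %": pct(rejected_good, all_good),
    "bad customers removed %": pct(rejected_bad, all_bad),
    "bad rate among lost customers %": pct(rejected_bad, rejected_total),
}
for name, value in row_13.items():
    print(f"{name}: {value}")
```

Read this way, a threshold of 580 removes about half of all bad customers while giving up about 6% of the good ones.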

3. Score distribution plot

import matplotlib.pyplot as plt

# -------- Plot the score distribution --------
plt.figure(figsize=(9, 4))  # set the figure size before drawing so it applies to this figure
x_axis = score_thd['分组名称'].apply(lambda x: x.split('-')[0].replace('[',''))  # lower edge of each bin
plt.bar(x_axis, score_thd['本组好客户'], align="center", color="#66c2a5",label="good")
plt.bar(x_axis, score_thd['本组坏客户'], align="center", bottom=score_thd['本组好客户'], color='r', label="bad")
plt.legend()
plt.show()

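The figure shows that the lower the score, the higher the share of bad customers. Above 580 the share of bad customers is small: in the threshold table, every bin from 580 upward has a bad rate below 20%. Overall customer quality is good, with scores concentrated in the high range.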
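To put rough numbers on that reading, the sketch below compares the bad rate below and at or above 580, using cumulative counts copied from the threshold table. The variable names are illustrative, and 580 is simply the cutoff discussed above, not a recommended threshold.

```python
# Cumulative counts below 580 and overall totals, copied from the threshold table
below_total, below_bad = 13302, 4960
all_total, all_bad = 149165, 9873

above_total = all_total - below_total   # customers scoring 580 or more
above_bad = all_bad - below_bad         # bad customers scoring 580 or more

bad_rate_below = round(100 * below_bad / below_total, 2)
bad_rate_above = round(100 * above_bad / above_total, 2)
share_above = round(100 * above_total / all_total, 2)
print(bad_rate_below, bad_rate_above, share_above)
```

On these counts the bad rate is roughly 37% below 580 and under 4% at or above it, and about 91% of customers score 580 or more.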

Reference

Link: "Scorecard example: threshold table and score threshold", 老饼讲解 (machine learning, explained in plain terms)
