Scorecard Model Threshold Table (continued from the previous post)

Latest recommended article published on 2024-09-06 15:55:40

stayerr

Tags: python, data mining

Article link: https://blog.csdn.net/stayerr/article/details/141607381

Preface

Continued from the previous post: Scorecard Model Case Study (GiveMeSomeCredit, Kaggle data) (personal practice version) - CSDN Blog

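A scorecard assigns a score to every data record. When the model goes into production, a score threshold has to be chosen, and customers whose score falls below that threshold are rejected.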
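
The rejection rule described above can be sketched as follows. This is only an illustration: the helper name `apply_threshold`, the sample scores and the threshold of 580 are made up for the example and do not come from the original code.

```python
import pandas as pd

def apply_threshold(scores: pd.Series, threshold: int) -> pd.Series:
    """Label each score 'reject' if it is strictly below the threshold, else 'accept'."""
    return scores.lt(threshold).map({True: "reject", False: "accept"})

# Made-up scores, purely for illustration
demo_scores = pd.Series([455, 580, 612, 579])
decisions = apply_threshold(demo_scores, threshold=580)
print(decisions.tolist())
```

A score equal to the threshold is accepted here, which matches the `<580` style of threshold used later in the threshold table.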

1. Computing the credit score

Score the full sample:

# This code maps each input value to its bin and sums the matching bin scores to get the final credit score
def str_to_int(s):
    if s == '-inf':
        return -999999999.0
    elif s=='inf':
        return 999999999.0
    else:
        return float(s)
    
def map_value_to_bin(feature_value,feature_to_bin):
    for idx, row in feature_to_bin.iterrows():
        bins = str(row['Binning'])
        left_open = bins[0]=="("
        right_open = bins[-1]==")"
        binnings = bins[1:-1].split(',')
        in_range = True
        # check left bound
        if left_open:
            if feature_value<= str_to_int(binnings[0]):
                in_range = False   
        else:
            if feature_value< str_to_int(binnings[0]):
                in_range = False   
        #check right bound
        if right_open:
            if feature_value>= str_to_int(binnings[1]):
                in_range = False 
        else:
            if feature_value> str_to_int(binnings[1]):
                in_range = False   
        if in_range:
            return row['Binning']
    return None  # no bin matched (Python has no `null`)

def map_to_score(df,score_card):
    scored_columns = list(score_card['Variable'].unique())
    score = 0
    for col in scored_columns:
        feature_to_bin = score_card[score_card['Variable']==col]
        feature_value = df[col]
        selected_bin = map_value_to_bin(feature_value,feature_to_bin)
        selected_record_in_scorecard = feature_to_bin[feature_to_bin['Binning'] == selected_bin]
        score += selected_record_in_scorecard['Score'].iloc[0]
    return score  

def calculate_score_with_card(df,score_card,A):
    df['score'] = df.apply(map_to_score,args=(score_card,),axis=1)
    df['score'] = df['score']+A
    df['score'] = df['score'].astype(int)
    return df

df_score001 = calculate_score_with_card(df_train,score_card,A)
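
To make the bin lookup above easier to follow, here is a small self-contained sketch of the same idea on a toy score card. It assumes bin labels look like `(30.0,50.0]`, which is what the parsing code above implies; the helper names `parse_bin` and `score_record`, the variables `age` and `debt_ratio`, and all point values are hypothetical. It uses `pd.Interval` for the bound checks instead of the hand-written comparisons, so it is an alternative illustration rather than the code from the article.

```python
import pandas as pd

def parse_bin(label: str) -> pd.Interval:
    """Turn a label such as '(30.0,50.0]' into a pandas Interval."""
    left_closed = label[0] == "["
    right_closed = label[-1] == "]"
    # float() already understands 'inf' and '-inf'
    left, right = (float(part) for part in label[1:-1].split(","))
    if left_closed and right_closed:
        closed = "both"
    elif left_closed:
        closed = "left"
    elif right_closed:
        closed = "right"
    else:
        closed = "neither"
    return pd.Interval(left, right, closed=closed)

def score_record(record, score_card):
    """Sum the points of the bin that each variable's value falls into."""
    total = 0
    for variable, rows in score_card.groupby("Variable"):
        for label, points in zip(rows["Binning"], rows["Score"]):
            if record[variable] in parse_bin(label):
                total += points
                break  # bins do not overlap, so the first match is the only one
    return total

# Toy score card with hypothetical variables and points
toy_card = pd.DataFrame({
    "Variable": ["age", "age", "age", "debt_ratio", "debt_ratio"],
    "Binning": ["(-inf,30.0]", "(30.0,50.0]", "(50.0,inf)", "(-inf,0.5]", "(0.5,inf)"],
    "Score": [-10, 5, 12, 8, -6],
})

young_high_debt = score_record({"age": 30, "debt_ratio": 0.7}, toy_card)
mid_low_debt = score_record({"age": 45, "debt_ratio": 0.2}, toy_card)
print(young_high_debt, mid_low_debt)
```

Unlike `map_to_score`, this sketch silently adds nothing when a value matches no bin; whether that or an explicit error is preferable depends on how missing values were handled during binning.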

2. Computing the threshold table

import math
import pandas as pd

# ---- function that computes the threshold table ----
def cal_score_threshold_tb(score_df,bin_step=10,is_bad_col_name='SeriousDlqin2yrs',score_col_name='score'):
    # ----- compute where the score bins start and end -----
    bin_start = math.trunc(score_df[score_col_name].min()/bin_step)*bin_step                     
    bin_end   =  math.trunc(score_df[score_col_name].max()/bin_step+1)*bin_step
    score_thd = pd.DataFrame(columns=['分组名称','本组客户','本组好客户','本组坏客户'])         
    # ----- count the good and bad customers in each bin -----
    for cur_bin in range(bin_start,bin_end,bin_step):
        cur_bin_name ='['+str(cur_bin)+'-'+str(cur_bin+bin_step)+')'
        cur_score_df = score_df[(score_df[score_col_name]>=cur_bin)&(score_df[score_col_name]<cur_bin+bin_step)][is_bad_col_name]
        bad_cn       = cur_score_df.sum()
        cn           = cur_score_df.shape[0]
        score_thd.loc[score_thd.shape[0]]=[cur_bin_name,cn,cn-bad_cn,bad_cn]
    # ----- compute the remaining columns of the threshold table (totals, cumulative counts, ratios) -----
    score_thd['总客户']      = score_thd['本组客户'].sum()
    score_thd['总好客户']    = score_thd['本组好客户'].sum()
    score_thd['总坏客户']    = score_thd['本组坏客户'].sum()
    score_thd['阈值']        = score_thd['分组名称'].apply(lambda x: '<'+x.split('-')[1].replace(')',''))  
    score_thd['损失客户']    = score_thd['本组客户'].cumsum()
    score_thd['损失客户%']   = score_thd['损失客户']/score_thd['总客户']
    score_thd['损失好客户']  = score_thd['本组好客户'].cumsum()
    score_thd['损失好客户%'] = score_thd['损失好客户']/score_thd['总好客户']
    score_thd['剔除坏客户']  = score_thd['本组坏客户'].cumsum()
    score_thd['剔除坏客户%'] = score_thd['剔除坏客户']/score_thd['总坏客户']
    tmp = score_thd['本组客户'].copy()
    tmp[tmp==0] = 1
    score_thd['本组坏客户占比']       = score_thd['本组坏客户']/tmp
    score_thd['损失客户中坏客户占比'] = score_thd['剔除坏客户']/score_thd['损失客户']
    return score_thd

# -------- compute the score threshold table --------
score_df  = df_score001[['score','SeriousDlqin2yrs']]
score_thd = cal_score_threshold_tb(score_df,bin_step=20)
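
Once the table exists, one possible way to read a threshold off it is to fix a budget for the share of good customers that may be lost and take the highest threshold that stays within it. The sketch below assumes exactly that rule; the helper `pick_threshold` and the 5% budget are hypothetical, and the three demo rows reuse the `<540`, `<560` and `<580` values from the article's output, written as fractions.

```python
import pandas as pd

def pick_threshold(score_thd, max_good_loss=0.05):
    """Return the highest threshold whose cumulative good-customer loss stays within budget."""
    # '损失好客户%' is cumulative, so it never decreases from one row to the next
    within_budget = score_thd[score_thd["损失好客户%"].astype(float) <= max_good_loss]
    if within_budget.empty:
        return None
    return within_budget.iloc[-1]["阈值"]

# Three rows taken from the article's output table, as fractions
demo_tb = pd.DataFrame({
    "阈值": ["<540", "<560", "<580"],
    "损失好客户%": [0.0331, 0.0459, 0.0599],
    "剔除坏客户%": [0.3780, 0.4461, 0.5024],
})

chosen = pick_threshold(demo_tb, max_good_loss=0.05)
print(chosen)
```

With a 5% budget this picks `<560`; in practice the budget would come from the business side, and other rules (for example a cap on the overall rejection rate) are equally plausible.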

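The output is as follows. Column names are kept as in the code: 分组名称 = score bin, 本组客户 / 本组好客户 / 本组坏客户 = all / good / bad customers in the bin, 阈值 = threshold, 损失客户 = customers rejected at that threshold, 损失好客户 = good customers rejected, 剔除坏客户 = bad customers rejected, 本组坏客户占比 = bad rate within the bin, 损失客户中坏客户占比 = bad rate among all rejected customers. Columns ending in % are shares of the corresponding total.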

	分组名称	本组客户	本组好客户	本组坏客户	阈值	损失客户	损失客户%	损失好客户	损失好客户%	剔除坏客户	剔除坏客户%	本组坏客户占比	损失客户中坏客户占比
0	[300-320)	15	4	11	<320	15	0.01%	4	0.00%	11	0.11%	73.33%	73.33%
1	[320-340)	163	40	123	<340	178	0.12%	44	0.03%	134	1.36%	75.46%	75.28%
2	[340-360)	257	97	160	<360	435	0.29%	141	0.10%	294	2.98%	62.26%	67.59%
3	[360-380)	324	117	207	<380	759	0.51%	258	0.19%	501	5.07%	63.89%	66.01%
4	[380-400)	357	163	194	<400	1116	0.75%	421	0.30%	695	7.04%	54.34%	62.28%
5	[400-420)	374	160	214	<420	1490	1.00%	581	0.42%	909	9.21%	57.22%	61.01%
6	[420-440)	533	228	305	<440	2023	1.36%	809	0.58%	1214	12.30%	57.22%	60.01%
7	[440-460)	1124	585	539	<460	3147	2.11%	1394	1.00%	1753	17.76%	47.95%	55.70%
8	[460-480)	932	495	437	<480	4079	2.73%	1889	1.36%	2190	22.18%	46.89%	53.69%
9	[480-500)	725	435	290	<500	4804	3.22%	2324	1.67%	2480	25.12%	40.00%	51.62%
10	[500-520)	2051	1263	788	<520	6855	4.60%	3587	2.58%	3268	33.10%	38.42%	47.67%
11	[520-540)	1486	1022	464	<540	8341	5.59%	4609	3.31%	3732	37.80%	31.22%	44.74%
12	[540-560)	2454	1782	672	<560	10795	7.24%	6391	4.59%	4404	44.61%	27.38%	40.80%
13	[560-580)	2507	1951	556	<580	13302	8.92%	8342	5.99%	4960	50.24%	22.18%	37.29%
14	[580-600)	2238	1824	414	<600	15540	10.42%	10166	7.30%	5374	54.43%	18.50%	34.58%
15	[600-620)	7252	6395	857	<620	22792	15.28%	16561	11.89%	6231	63.11%	11.82%	27.34%
16	[620-640)	7231	6546	685	<640	30023	20.13%	23107	16.59%	6916	70.05%	9.47%	23.04%
17	[640-660)	8417	7753	664	<660	38440	25.77%	30860	22.15%	7580	76.78%	7.89%	19.72%
18	[660-680)	9919	9336	583	<680	48359	32.42%	40196	28.86%	8163	82.68%	5.88%	16.88%
19	[680-700)	10473	10044	429	<700	58832	39.44%	50240	36.07%	8592	87.03%	4.10%	14.60%
20	[700-720)	14987	14543	444	<720	73819	49.49%	64783	46.51%	9036	91.52%	2.96%	12.24%
21	[720-740)	15149	14865	284	<740	88968	59.64%	79648	57.18%	9320	94.40%	1.87%	10.48%
22	[740-760)	19626	19374	252	<760	108594	72.80%	99022	71.09%	9572	96.95%	1.28%	8.81%
23	[760-780)	18659	18484	175	<780	127253	85.31%	117506	84.36%	9747	98.72%	0.94%	7.66%
24	[780-800)	10970	10901	69	<800	138223	92.66%	128407	92.19%	9816	99.42%	0.63%	7.10%
25	[800-820)	10942	10885	57	<820	149165	100.00%	139292	100.00%	9873	100.00%	0.52%	6.62%
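
The percentages in the table above are shown with a `%` sign, whereas `cal_score_threshold_tb` returns plain fractions, so some display formatting was presumably applied before printing. The original post does not show that step; the sketch below is one way it could be done, with `format_pct_columns` being a hypothetical helper and the two demo values taken from the first row and the `<580` row of the table.

```python
import pandas as pd

def format_pct_columns(table, pct_cols):
    """Return a copy of the table with the given fraction columns rendered as 'xx.xx%' strings."""
    out = table.copy()
    for col in pct_cols:
        out[col] = out[col].map(lambda v: f"{v:.2%}")
    return out

# Fractions recomputed from the counts in the table: 15/149165 and 13302/149165
demo = pd.DataFrame({"阈值": ["<320", "<580"], "损失客户%": [15 / 149165, 13302 / 149165]})
formatted = format_pct_columns(demo, ["损失客户%"])
print(formatted)
```

Formatting on a copy keeps the numeric columns available for further calculations and for plotting.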

Brief explanation:

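Take the row with index 13 as an example: with the threshold set to 580, 13302 customers would be rejected, which is 8.92% of all customers; 8342 of them are good customers and 4960 are bad customers......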
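
The figures in that row can be re-derived from the counts in the table. The snippet below is just that arithmetic, using the row with index 13 and the totals from the last row; the variable names are chosen for the example.

```python
# Counts from the row with index 13 (threshold <580)
rejected, rejected_good, rejected_bad = 13302, 8342, 4960
# Totals from the last row of the table
total, total_good, total_bad = 149165, 139292, 9873

# Good and bad customers together make up everyone rejected
assert rejected_good + rejected_bad == rejected

reject_rate = f"{rejected / total:.2%}"               # 损失客户%
good_loss_rate = f"{rejected_good / total_good:.2%}"  # 损失好客户%
bad_removed_rate = f"{rejected_bad / total_bad:.2%}"  # 剔除坏客户%
bad_share = f"{rejected_bad / rejected:.2%}"          # 损失客户中坏客户占比

print(reject_rate, good_loss_rate, bad_removed_rate, bad_share)
# prints: 8.92% 5.99% 50.24% 37.29%
```

Read together, these say that a threshold of 580 gives up about 6% of the good customers in exchange for removing about half of the bad ones.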

3. Score distribution plot

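import matplotlib.pyplot as plt

# -------- plot the score distribution --------
plt.rcParams["figure.figsize"] = (9, 4)  # set the figure size before the figure is created, otherwise it has no effect on this plot
x_axis = score_thd['分组名称'].apply(lambda x: x.split('-')[0].replace('[',''))  # lower bound of each bin as the x label
plt.bar(x_axis, score_thd['本组好客户'], align="center", color="#66c2a5",label="good")
# stack the bad customers on top of the good customers
plt.bar(x_axis, score_thd['本组坏客户'], align="center", bottom=score_thd['本组好客户'], color='r', label="bad")
plt.legend()
plt.show()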