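Building a Credit Scorecard with Python

I. Table of Contents

  1. Sample overview and description
    1. Data source
    2. Definition of the target variable
    3. Sample statistics
    4. Data visualization
  2. Feature engineering
    1. Missing-value handling
    2. Single-value (near-constant) variable handling
    3. Business relevance
    4. Variable selection by IV (information value)
    5. Pearson correlation coefficient
  3. Optimal binning
    1. Equal-width binning
    2. Chi-square binning
  4. Model evaluation
    1. Accuracy
    2. Recall
    3. ROC curve
    4. KS statistic
  5. Scorecard
  6. Model results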

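II. Sample Overview and Description

1. Data source
The data used in this article was downloaded from the official Lending Club website.
Development dataset: 2017.01-2017.06
Validation dataset: 2017.07-2017.09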

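The development dataset contains 202,234 records and 145 variables; the validation dataset contains 122,703 records and the same 145 variables. Some of the 145 variables are described below:

[Image: descriptions of selected variables]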
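
The split into development and validation periods can be sketched as follows. This is only an illustration on toy rows: the function name `split_by_issue_month` is hypothetical, and it assumes the issue date is stored in a column named `issue_d` in a format like "Jan-2017", which should be checked against the actual download.

```python
import pandas as pd

def split_by_issue_month(loans, date_col="issue_d"):
    """Split loans into development (2017-01 to 2017-06) and
    validation (2017-07 to 2017-09) sets by issue date."""
    # Assumption: dates look like "Jan-2017"; adjust `format` if the file differs.
    issued = pd.to_datetime(loans[date_col], format="%b-%Y")
    dev_mask = (issued >= pd.Timestamp("2017-01-01")) & (issued <= pd.Timestamp("2017-06-30"))
    val_mask = (issued >= pd.Timestamp("2017-07-01")) & (issued <= pd.Timestamp("2017-09-30"))
    return loans[dev_mask], loans[val_mask]

# Toy rows standing in for the real file
loans = pd.DataFrame({
    "issue_d": ["Jan-2017", "Jun-2017", "Jul-2017", "Sep-2017", "Oct-2017"],
    "loan_amnt": [10000, 5000, 12000, 8000, 3000],
})
dev, val = split_by_issue_month(loans)
print(dev.shape, val.shape)  # (rows, columns) of each set
```

On the real data, the two shapes should match the record and variable counts quoted above.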
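2. Definition of the target variable
In the variable descriptions, the variable marked in green is the target variable. For convenience it is renamed y. Its values and their counts are as follows:
[Image: target variable values and their counts]
The target is defined according to the business scenario. Customers who are more than 30 days past due are generally defined as bad-credit customers. Some customers cannot be classified directly as good or bad, so these uncertain customers are defined as medium-credit (indeterminate) customers. The specific definitions are as follows:
[Image: definitions of the target classes]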

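import pandas as pd

# Helper that maps each loan status to the target value y
# (0 = good, 1 = bad, 2 = indeterminate)
def coding(col, codeDict):
    # Work on a copy so the caller's column is left unchanged
    colCoded = pd.Series(col, copy=True)
    # Replace every status in one pass; values not in codeDict are kept as-is
    return colCoded.replace(codeDict)

codeDict = {'Current': 0, 'Fully Paid': 0,
            'Late (31-120 days)': 1, 'Charged Off': 1,
            'Late (16-30 days)': 2, 'In Grace Period': 2, 'Default': 2}

# Example call (assumes a DataFrame `data` whose status column is named loan_status):
# data['y'] = coding(data['loan_status'], codeDict)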
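
As a quick check of the mapping, the sketch below applies the same dictionary to a handful of made-up status values and tallies the resulting classes. It uses `Series.map` rather than `replace`, an illustrative choice so that any status missing from the dictionary shows up as NaN and can be counted; the sample statuses are toy data, not taken from the dataset.

```python
import pandas as pd

# Same mapping as in the article: 0 = good, 1 = bad, 2 = indeterminate
code_dict = {
    "Current": 0, "Fully Paid": 0,
    "Late (31-120 days)": 1, "Charged Off": 1,
    "Late (16-30 days)": 2, "In Grace Period": 2, "Default": 2,
}

# Toy status column (made-up rows for illustration)
status = pd.Series([
    "Current", "Fully Paid", "Charged Off",
    "Late (16-30 days)", "Late (31-120 days)", "In Grace Period",
])

y = status.map(code_dict)            # unmapped statuses would become NaN
unmapped = int(y.isna().sum())       # how many statuses the dictionary missed
counts = y.value_counts().sort_index()
print(counts)
print("unmapped statuses:", unmapped)
```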

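3. Sample statistics

Performance window: the period obtained by moving forward in time from the observation point. It is used to extract the target variable and to apply performance-based exclusions.
Observation window: the period obtained by moving backward in time from the observation point. It is used to extract the independent variables and to apply observation-window exclusions. The observation window is usually 6 to 12 months long.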
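
The two windows can be expressed as date ranges around an observation point. The sketch below is a minimal illustration: the function name `windows` is hypothetical, the 12-month observation window is one value from the 6 to 12 month range mentioned above, and the 12-month performance window is purely an assumption, since the text does not give its length.

```python
import pandas as pd

def windows(observation_point, observation_months=12, performance_months=12):
    """Return (start, end) date pairs for the observation and performance windows."""
    obs = pd.Timestamp(observation_point)
    return {
        # Look backward from the observation point to collect features
        "observation_window": (obs - pd.DateOffset(months=observation_months), obs),
        # Look forward from the observation point to collect the target
        "performance_window": (obs, obs + pd.DateOffset(months=performance_months)),
    }

w = windows("2017-06-30")
for name, (start, end) in w.items():
    print(name, start.date(), "to", end.date())
```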

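[Image: sample statistics]
The statistics show that there are too many features, so they need to be filtered effectively. The share of bad samples is also low, only 6-7%, so the class imbalance has to be handled in later steps.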
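
To make the imbalance concrete, the sketch below computes the bad rate and inverse-frequency class weights on a toy label vector with 7% bad samples. Class weighting is only one possible way to handle imbalance; the text does not say which method is used, and the labels here are made up.

```python
import pandas as pd

# Toy labels: 93 good (0) and 7 bad (1), roughly matching a 6-7% bad rate
y = pd.Series([0] * 93 + [1] * 7)

bad_rate = (y == 1).mean()

# Inverse-frequency weights: n_samples / (n_classes * class_count),
# so the rarer class receives the larger weight
counts = y.value_counts()
weights = len(y) / (len(counts) * counts)

print(f"bad rate: {bad_rate:.2%}")
print(weights.round(3))
```

Weights of this form can be passed to many classifiers as per-class weights; resampling the training set is a common alternative.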
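4. Data visualization
Visualizing the data gives a more intuitive view and an overall understanding of it. In the development sample, most loans have a risk grade of B or C. In other words, most loan applicants are rated in the upper-middle range.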
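
The grade distribution behind such a chart can be tabulated with pandas before plotting. This is a sketch on made-up grades; on the real data the input would be the `grade` column of the development sample.

```python
import pandas as pd

# Toy grades standing in for the development sample's `grade` column
grades = pd.Series(list("AABBBBCCCDE"))

# Share of loans per grade, ordered A, B, C, ...
share = grades.value_counts(normalize=True).sort_index()

# The two most frequent grades
top_two = share.nlargest(2).index.tolist()

print(share.round(3))
print("most common grades:", top_two)
```

Calling `share.plot(kind="bar")` would draw the corresponding bar chart if matplotlib is installed.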
