Credit Analysis and Modeling Report Based on LendingClub Data

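This article carries out a credit risk analysis on LendingClub data, covering data acquisition, exploration, preprocessing, feature engineering, and modeling. The sample set turns out to be imbalanced. The analysis focuses on feature distributions, missing-value handling, categorization of text features, and time-series feature analysis. A scorecard and a random forest model are then built as risk assessment models to support decision-making.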

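1: Problem Analysis

2: Data Acquisition

3: Data Exploration

3.1 Understanding the Main Features

3.2 Feature Distributions

3.2.1 Target Feature Distribution

3.2.2 Distribution of Categorical Variables

3.2.3 Distribution of Continuous Numeric Features

3.2.4 Distribution of Time-Series Features

3.2.5 Distribution of Text Features

3.2.6 Pairwise Feature Covariance

4: Data Preprocessing

4.1 Dataset Splitting

4.2 Identifying and Handling Missing Feature Values

4.2.1 Handling Severely Missing Features

4.2.2 Missing-Value Imputation

4.3 Identifying and Handling Single-Valued Features

4.4 Feature Format Conversion

4.5 Text Feature Processing

4.5.1 Employer Classification

4.5.2 Borrower State Classification

4.5.2.1 k-means Cluster Analysis

4.5.2.2 Equal-Frequency Binning

4.6 Time-Series Feature Processing

4.7 Feature Encoding

4.8 Normalization

4.9 Guarding Against Data Leakage

4.9.1 Beware of Inappropriate Features

4.9.2 Incorrect Cross-Validation Strategies

5: Feature Engineering

5.1 Feature Derivation

5.2 Variable Selection

Purpose of Feature Selection

5.2.1 Selecting Variables by Collinearity

5.2.2 Selecting Variables by IV

5.3 Feature Binning

5.3.1 Binning Continuous Numeric Variables Without Missing Values

5.3.2 Binning Continuous Numeric Variables With Missing Values

6: Modeling

6.1 Building the Scorecard

6.2 Building the Random Forest Model

7: Summary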

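1: Problem Analysis

Topic:

Study the risk characteristics of loans to small and micro business owners, and propose a practical approach to building a risk assessment model.

Topic analysis:

Credit assessment for small and micro business loans is closely tied to the personal credit of the business owner. This article uses public data from the LendingClub lending platform as a proxy sample for small and micro business owner credit data. It builds a simple, transparent traditional application scorecard (A-card) alongside a less interpretable black-box predictive model, both intended to support decision-making.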

 

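2: Data Acquisition

Experience from the Basel Accords suggests that a borrower's repayment status tends to stabilize after 12 or more periods of normal repayment. I therefore chose data from the LendingClub platform (hereafter LC) for 2017 Q1: every loan in the sample has by now been in repayment for more than 12 months, the data is recent, and the sample can be used effectively.

As the figure above shows, the sample contains 42,538 records and 145 feature variables. The sample is large enough for the analysis to be statistically meaningful.

The sample consists of US borrowers whom the LC platform has already screened on certain criteria (such as FICO score), so the applicability of the results is limited.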

 

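3: Data Exploration

Before processing the data, we first get an overview of it, in preparation for further data exploration, preprocessing, feature engineering, and modeling.

3.1 Understanding the Main Features

Excerpt of the main features:

3.2 Feature Distributions

3.2.1 Target Feature Distribution

fully paid: the loan was repaid in full. charged off: the loan was written off as bad debt.

The table above shows that this sample is an imbalanced dataset, which must be taken into account in the subsequent steps.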
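As a rough illustration of how the class imbalance could be checked, the sketch below counts the target classes with pandas. The column name `loan_status`, the label spellings, and the 85/15 split are assumptions made up for this example; they are not the figures from the actual LC sample.

```python
import pandas as pd

# Toy stand-in for the LC sample. The 85/15 split is invented for
# illustration and does not reflect the real data.
loans = pd.DataFrame(
    {"loan_status": ["Fully Paid"] * 85 + ["Charged Off"] * 15}
)

# Absolute counts and relative shares of each target class.
counts = loans["loan_status"].value_counts()
shares = loans["loan_status"].value_counts(normalize=True)

# Ratio of the majority class to the minority class.
imbalance_ratio = counts.max() / counts.min()

print(pd.DataFrame({"count": counts, "share": shares}))
print(f"majority/minority ratio: {imbalance_ratio:.2f}")
```

A ratio far above 1 is a hint that plain accuracy would be a misleading metric, and that stratified splitting or class weighting may be worth considering later on.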

 

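3.2.2 Distribution of Categorical Variables

Explore how typical categorical variables are distributed across good and bad samples, and visualize the results.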
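One possible way to tabulate a categorical variable against the good/bad label is a row-normalized cross-tabulation. This is only a sketch on toy data: the choice of `grade` as the categorical variable, the column names, and all values are assumptions for illustration, not the author's actual code or results.

```python
import pandas as pd

# Toy data. Column names and values are assumptions for illustration.
loans = pd.DataFrame({
    "grade": ["A", "A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "loan_status": [
        "Fully Paid", "Fully Paid", "Fully Paid", "Charged Off",
        "Fully Paid", "Fully Paid", "Charged Off",
        "Fully Paid", "Charged Off", "Charged Off",
    ],
})

# Row-normalized cross-tabulation: share of each outcome within each grade.
dist = pd.crosstab(loans["grade"], loans["loan_status"], normalize="index")

# Bad rate per category = share of charged-off loans in that category.
bad_rate = dist["Charged Off"]

print(dist.round(2))
```

The resulting table has one row per category and one column per outcome, so it can be handed to a plotting library as a grouped or stacked bar chart.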
