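(Credit Risk Control, Part 11) Applying Random Forests to the Repayment-Rate Model of a Collection Scorecard (with Python implementation)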

Article link: https://blog.csdn.net/LuYi_WeiLin/article/details/88049521
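A collection scorecard differs from an application scorecard or a behavior scorecard. An application or behavior scorecard usually needs only one model, whereas a collection scorecard is made up of three models, each with its own purpose (of the three, the lost-contact prediction model is the more important one):

Repayment-rate model
Roll-rate (delinquency aging) model
Lost-contact prediction model
This post walks through the repayment-rate model. Before doing so, we first need to understand the random forest model.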


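Random forest built on regression trees: the base learners are a large number of regression trees. Each tree runs independently (and can run in parallel) and produces its own prediction; the average of all the trees' predictions is taken as the final prediction.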
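The averaging described above can be checked directly in scikit-learn. The following is a minimal sketch on synthetic data; `X_demo`, `y_demo`, and the chosen hyperparameters are illustrative assumptions, not taken from the post's dataset. It fits a small forest, collects each tree's prediction, and averages them by hand.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (an assumption for illustration only):
# a target in [0, 1] that depends mostly on the first feature.
rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(300, 4))
y_demo = np.clip(0.7 * X_demo[:, 0] + rng.normal(scale=0.05, size=300), 0, 1)

forest = RandomForestRegressor(n_estimators=20, max_depth=4, random_state=0)
forest.fit(X_demo, y_demo)

# Each fitted regression tree produces its own prediction vector.
per_tree = np.stack([tree.predict(X_demo) for tree in forest.estimators_])

# The forest's prediction is the plain average over the trees.
manual_average = per_tree.mean(axis=0)
forest_prediction = forest.predict(X_demo)
```

Comparing `manual_average` with `forest_prediction` (for example with `np.allclose`) shows that the forest adds nothing beyond the per-tree average at prediction time; the randomness lives entirely in how each tree was trained.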


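Training steps of the random forest model
[Figure: training steps; image not preserved in this copy]
How do we build the repayment-rate model?
[Figures: four images not preserved in this copy]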
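As a small worked example of the model's target variable, the sketch below computes the recovery rate the same way the listing in this post does: amount recovered divided by (amount delinquent minus collection fees), capped at 1. The column names come from the listing; the numbers are made up, and the negative sign on the fees is an assumption inferred from the listing's comment that fees are recorded as an expense.

```python
import pandas as pd

# Made-up rows for illustration; fees are assumed to be stored as negative amounts.
demo = pd.DataFrame({
    "LP_NonPrincipalRecoverypayments": [300.0, 1200.0],
    "AmountDelinquent": [1000.0, 1000.0],
    "LP_CollectionFees": [-50.0, -50.0],
})

# Denominator: amount owed plus the magnitude of the collection fees.
denominator = demo["AmountDelinquent"] - demo["LP_CollectionFees"]

# Recovery rate, capped at 1 as in the post.
demo["rec_rate"] = (demo["LP_NonPrincipalRecoverypayments"] / denominator).clip(upper=1)
```

The first row gives 300 / 1050, and the second row would exceed 1 (1200 / 1050), so it is capped at 1.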
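The code is given below; the data can be downloaded from my resources page. Once the repayment-rate model is finished it can also be extended: pick a threshold for the predicted recovery rate (80%, say; you can choose your own), treat customers above the threshold as recoverable and those below it as unrecoverable, and then use a binary logistic regression to predict whether a given customer is recoverable or not: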

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

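'''
Date: 20190309
Author: 小象学院
'''

def MakeupMissingCategorical(x):
    # Replace a missing categorical value with the label 'Unknown'
    if str(x) == 'nan':
        return 'Unknown'
    else:
        return x

def MakeupMissingNumerical(x, replacement):
    # Replace a missing numerical value with the supplied replacement
    if np.isnan(x):
        return replacement
    else:
        return x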

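'''
Step 1: Prepare the data file
'''
foldOfData = 'H:/'
mydata = pd.read_csv(foldOfData + "还款率模型.csv", header=0, engine='python')
# Collection recovery rate = amount recovered / (principal and interest owed + collection fees).
# Collection fees are recorded as an expense, hence the subtraction below.
mydata['rec_rate'] = mydata.apply(lambda x: x.LP_NonPrincipalRecoverypayments / (x.AmountDelinquent - x.LP_CollectionFees), axis=1)
# Cap the recovery rate at 1 when it exceeds 1
mydata['rec_rate'] = mydata['rec_rate'].map(lambda x: min(x, 1))
# Split the development data into two parts: a training set and a test set
trainData, testData = train_test_split(mydata, test_size=0.4)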

'''
Step 2: Data preprocessing
'''
# No data dictionary is available, so only some of the variables are classified here
categoricalFeatures = ['CreditGrade', 'Term', 'BorrowerState', 'Occupation', 'EmploymentStatus', 'IsBorrowerHomeowner', 'CurrentlyInGroup', 'IncomeVerifiable']

numFeatures = ['BorrowerAPR', 'BorrowerRate', 'LenderYield', 'ProsperRating (numeric)', 'ProsperScore', 'ListingCategory (numeric)', 'EmploymentStatusDuration', 'CurrentCreditLines',
               'OpenCreditLines', 'TotalCreditLinespast7years', 'CreditScoreRangeLower', 'OpenRevolvingAccounts', 'OpenRevolvingMonthlyPayment', 'InquiriesLast6Months', 'TotalInquiries',
               'CurrentDelinquencies', 'DelinquenciesLast7Years', 'PublicRecordsLast10Years', 'PublicRecordsLast12Months', 'BankcardUtilization', 'TradesNeverDelinquent (percentage)',
               'TradesOpenedLast6Months', 'DebtToIncomeRatio', 'LoanFirstDefaultedCycleNumber', 'LoanMonthsSinceOrigination', 'PercentFunded', 'Recommendations', 'InvestmentFromFriendsCount',
               'Investors']

'''
Categorical variables are encoded with the mean of the target variable
'''
encodedFeatures = []
encodedDict = {}
for var in categoricalFeatures:
    trainData[var] = trainData[var].map(MakeupMissingCategorical)
    # Mean recovery rate per category, computed on the training set only
    avgTarget = trainData.groupby([var])['rec_rate'].mean()
    avgTarget = avgTarget.to_dict()
    newVar = var + '_encoded'
    trainData[newVar] = trainData[var].map(avgTarget)
    encodedFeatures.append(newVar)
    # Keep the mapping so the test set can be encoded the same way
    encodedDict[var] = avgTarget

# Fill in missing values of the numerical variables
trainData['ProsperRating (numeric)'] = trainData['ProsperRating (numeric)'].map(lambda x: MakeupMissingNumerical(x, 0))
trainData['ProsperScore'] = trainData['ProsperScore'].map(lambda x: MakeupMissingNumerical(x, 0))

avgDebtToIncomeRatio = np.mean(trainData['DebtToIncomeRatio'])
trainData['DebtToIncomeRatio'] = trainData['DebtToIncomeRatio'].map(lambda x: MakeupMissingNumerical(x, avgDebtToIncomeRatio))
numFeatures2 = numFeatures + encodedFeatures

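'''
Step 3: Hyperparameter tuning
For a CART-based random forest, the main parameters to tune are:
1. the number of trees
2. the maximum depth of each tree
3. the minimum number of samples to split an internal node and the minimum number of samples in a leaf
4. the number of features considered at each split
The scoring function used during tuning is (negative) mean squared error, with 5-fold cross-validation
'''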
X, y = trainData[numFeatures2], trainData['rec_rate']

# 1) Number of trees
param_test1 = {'n_estimators': range(60, 91, 5)}
gsearch1 = GridSearchCV(estimator=RandomForestRegressor(min_samples_split=50, min_samples_leaf=10, max_depth=8, max_features='sqrt', random_state=10),
                        param_grid=param_test1, scoring='neg_mean_squared_error', cv=5)
gsearch1.fit(X, y)
print(gsearch1.best_params_, gsearch1.best_score_)
best_n_estimators = gsearch1.best_params_['n_estimators']

# 2) Maximum depth and minimum samples to split an internal node
param_test2 = {'max_depth': range(3, 15), 'min_samples_split': range(10, 101, 10)}
gsearch2 = GridSearchCV(estimator=RandomForestRegressor(n_estimators=best_n_estimators, min_samples_leaf=10, max_features='sqrt', random_state=10, oob_score=True),
                        param_grid=param_test2, scoring='neg_mean_squared_error', cv=5)
gsearch2.fit(X, y)
print(gsearch2.best_params_, gsearch2.best_score_)
best_max_depth = gsearch2.best_params_['max_depth']
best_min_samples_split = gsearch2.best_params_['min_samples_split']

# 3) Minimum samples per leaf
param_test3 = {'min_samples_leaf': range(1, 20, 2)}
gsearch3 = GridSearchCV(estimator=RandomForestRegressor(n_estimators=best_n_estimators, max_depth=best_max_depth, max_features='sqrt', min_samples_split=best_min_samples_split, random_state=10, oob_score=True),
                        param_grid=param_test3, scoring='neg_mean_squared_error', cv=5)
gsearch3.fit(X, y)
print(gsearch3.best_params_, gsearch3.best_score_)
best_min_samples_leaf = gsearch3.best_params_['min_samples_leaf']

# 4) Number of features considered at each split
numOfFeatures = len(numFeatures2)
mostSelectedFeatures = numOfFeatures // 2  # not used by the grid below
param_test4 = {'max_features': range(3, numOfFeatures + 1)}
gsearch4 = GridSearchCV(estimator=RandomForestRegressor(n_estimators=best_n_estimators, max_depth=best_max_depth, min_samples_leaf=best_min_samples_leaf, min_samples_split=best_min_samples_split, random_state=10, oob_score=True),
                        param_grid=param_test4, scoring='neg_mean_squared_error', cv=5)
gsearch4.fit(X, y)
print(gsearch4.best_params_, gsearch4.best_score_)
best_max_features = gsearch4.best_params_['max_features']

# Fit the random forest with all of the best parameters found above
cls = RandomForestRegressor(n_estimators=best_n_estimators, max_depth=best_max_depth, min_samples_leaf=best_min_samples_leaf, min_samples_split=best_min_samples_split, max_features=best_max_features, random_state=10, oob_score=True)
cls.fit(X, y)
trainData['pred'] = cls.predict(trainData[numFeatures2])
# Flag rows where the prediction exceeds the actual recovery rate
trainData['less_rr'] = trainData.apply(lambda x: int(x.pred > x.rec_rate), axis=1)
print(np.mean(trainData['less_rr']))  # share of overestimated cases
# Mean absolute error on the training set
err = trainData.apply(lambda x: np.abs(x.pred - x.rec_rate), axis=1)
print(np.mean(err))

# Variable importance as estimated by the random forest
importance = cls.feature_importances_
featureImportance = dict(zip(numFeatures2, importance))
featureImportance = sorted(featureImportance.items(), key=lambda x: x[1], reverse=True)

'''
Step 4: Evaluate on the test set
'''
# Categorical variables: apply the encoding learned on the training set
for var in categoricalFeatures:
    testData[var] = testData[var].map(MakeupMissingCategorical)
    newVar = var + '_encoded'
    testData[newVar] = testData[var].map(encodedDict[var])
    # Categories not seen in training map to NaN; fill them with the training mean of the encoded variable
    avgnewVar = np.mean(trainData[newVar])
    testData[newVar] = testData[newVar].map(lambda x: MakeupMissingNumerical(x, avgnewVar))

# Continuous variables: same imputation as on the training set
testData['ProsperRating (numeric)'] = testData['ProsperRating (numeric)'].map(lambda x: MakeupMissingNumerical(x, 0))
testData['ProsperScore'] = testData['ProsperScore'].map(lambda x: MakeupMissingNumerical(x, 0))
testData['DebtToIncomeRatio'] = testData['DebtToIncomeRatio'].map(lambda x: MakeupMissingNumerical(x, avgDebtToIncomeRatio))

testData['pred'] = cls.predict(testData[numFeatures2])
# Flag rows where the prediction exceeds the actual recovery rate
testData['less_rr'] = testData.apply(lambda x: int(x.pred > x.rec_rate), axis=1)
print(np.mean(testData['less_rr']))  # share of overestimated cases
# Mean absolute error on the test set
err = testData.apply(lambda x: np.abs(x.pred - x.rec_rate), axis=1)
print(np.mean(err))
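The extension mentioned before the listing (threshold the predicted recovery rate, then fit a binary logistic regression) is not implemented in the post. The following is one possible sketch under stated assumptions: the helper `label_recoverable`, the synthetic features `X_demo`, and the simulated `pred_rate` are all hypothetical stand-ins for the real features and the forest's predictions. Rates strictly above the threshold are labeled recoverable; how to treat a rate exactly equal to the threshold is a choice the post leaves open.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def label_recoverable(pred_rate, threshold=0.8):
    """Hypothetical helper: 1 if the predicted recovery rate is above the threshold, else 0."""
    return (np.asarray(pred_rate) > threshold).astype(int)

# Synthetic stand-ins (assumptions): three features and a predicted
# recovery rate in (0, 1) that depends on the first feature.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
pred_rate = 1.0 / (1.0 + np.exp(-(2.0 * X_demo[:, 0] + 1.0)))

# Turn the continuous rate into a recoverable / unrecoverable label.
y_demo = label_recoverable(pred_rate, threshold=0.8)

# Binary logistic regression on the thresholded label.
clf = LogisticRegression()
clf.fit(X_demo, y_demo)
recoverable_proba = clf.predict_proba(X_demo)[:, 1]
train_accuracy = clf.score(X_demo, y_demo)
```

In a real project the label would more naturally be derived from the observed recovery rate rather than from another model's output, and the classifier would be evaluated on held-out data; the sketch only shows the mechanics of the thresholding step.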
