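After the WoE and IV values have been computed, they are visualized as bar charts. A custom function, replace_woe, is then defined; the training set is read, each value is replaced with the WoE value of its bin, and the result is written to a new file, WoeData. The target column 'SeriousDlqin2yrs' is separated from the predictors, and the variables with no clear influence on the dependent variable ('DebtRatio', 'MonthlyIncome_rf', 'NumberOfOpenCreditLinesAndLoans', 'NumberRealEstateLoansOrLines', 'NumberOfDependents') are dropped. A logistic regression is then fitted to the remaining data with the statsmodels package and the results are printed. Finally, sklearn.metrics is used to evaluate the model's fit with the ROC curve and AUC; the resulting plot indicates good predictive performance with a relatively high rate of correct predictions.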
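For reference, WoE and IV are commonly computed per bin from the counts of good and bad customers. The sketch below assumes the usual definitions, WoE_i = ln(good share_i / bad share_i) and IV = sum over bins of (good share_i - bad share_i) * WoE_i. The sign convention, the function name woe_iv, and the sample counts are assumptions, since the original computation is not shown in this section.

```python
import math

def woe_iv(good_counts, bad_counts):
    """Return (woe_per_bin, iv) for one binned variable.

    good_counts[i] and bad_counts[i] are the numbers of non-defaulting and
    defaulting customers in bin i. Assumes every bin contains at least one
    customer of each kind, so no share is zero.
    """
    good_total = sum(good_counts)
    bad_total = sum(bad_counts)
    woe = []
    iv = 0.0
    for good, bad in zip(good_counts, bad_counts):
        good_share = good / good_total
        bad_share = bad / bad_total
        w = math.log(good_share / bad_share)
        woe.append(w)
        iv += (good_share - bad_share) * w
    return woe, iv

# Hypothetical counts for three bins of one variable.
woe, iv = woe_iv([100, 300, 600], [40, 40, 20])
```

Under this convention a bin with a negative WoE holds a larger share of the bad customers than of the good ones, and a larger IV suggests a more informative variable.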
The replace_woe function
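def replace_woe(series, bins, woe):
    # Replace every value in `series` with the WoE of the bin it falls into.
    # `bins` holds the bin edges in ascending order; woe[k] belongs to the
    # bin that starts at bins[k].
    woe_values = []  # renamed from `list` to avoid shadowing the built-in
    for value in series:
        # Walk down from the last bin to the first left edge <= value.
        m = len(bins) - 2
        while m >= 0:
            if value >= bins[m]:
                break
            m -= 1
        # If value is below bins[0], m ends at -1 and woe[-1] is used,
        # so bins[0] is expected to be a lower bound for the data.
        woe_values.append(woe[m])
    return woe_values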
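The same bin lookup can also be expressed with the standard library's bisect module. The following is a minimal sketch rather than the original implementation: the name replace_woe_bisect and the sample edges and WoE values are hypothetical, and it assumes that bins is sorted in ascending order with bins[0] at or below every value.

```python
from bisect import bisect_right

def replace_woe_bisect(values, bins, woe):
    """Map each value to the WoE of the bin [bins[k], bins[k+1]) containing it."""
    last_bin = len(bins) - 2
    result = []
    for value in values:
        # bisect_right counts the edges <= value, so subtracting 1 gives the
        # index of the rightmost edge that is not above the value.
        k = bisect_right(bins, value) - 1
        # Values at or beyond the last edge belong to the last bin.
        result.append(woe[min(k, last_bin)])
    return result

# Hypothetical bin edges and WoE values, for illustration only.
edges = [0, 30, 50, 100]
woe_per_bin = [-0.5, 0.1, 0.8]
encoded = replace_woe_bisect([10, 30, 49, 75, 100], edges, woe_per_bin)
```

The binary search replaces the linear scan over the edges, which matters little for a handful of bins but keeps the lookup logic in one well-tested library call.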
Example of a WoE replacement statement
# Keep df's index so the new values line up with the original rows.
df['MonthlyIncome_rf'] = pd.Series(
    replace_woe(df['MonthlyIncome_rf'], MonthlyIncome_rf_bins, MonthlyIncome_rf_woe),
    index=df.index)
Building the logistic regression model
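import pandas as pd
import statsmodels.api as sm

# Load the WoE-encoded training data.
data = pd.read_csv('WoeData.csv')
Y = data['SeriousDlqin2yrs']
# Drop the target column and the variables with little influence on it.
X = data.drop(['SeriousDlqin2yrs', 'DebtRatio', 'MonthlyIncome_rf',
               'NumberOfOpenCreditLinesAndLoans', 'NumberRealEstateLoansOrLines',
               'NumberOfDependents'], axis=1)
# statsmodels does not add an intercept automatically.
X1 = sm.add_constant(X)
logit = sm.Logit(Y, X1)
result = logit.fit()
print(result.summary())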
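To make the role of the fitted coefficients concrete, the sketch below shows how a logistic regression turns one row of WoE-encoded inputs into a default probability, which is conceptually what the prediction step computes. The coefficient values and the helper name predict_probability are hypothetical and do not come from the fitted model above.

```python
import math

def predict_probability(intercept, coefficients, row):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, row))
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical intercept, coefficients, and one WoE-encoded row.
p = predict_probability(-2.0, [0.6, 0.9], [1.2, -0.4])
```

A linear term of zero gives a probability of exactly 0.5; more negative values push the predicted probability toward 0.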
Model validation
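import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# `test` is assumed to be the WoE-encoded test DataFrame prepared beforehand.
Y_test = test['SeriousDlqin2yrs']
X_test = test.drop(['SeriousDlqin2yrs', 'DebtRatio', 'MonthlyIncome_rf',
                    'NumberOfOpenCreditLinesAndLoans', 'NumberRealEstateLoansOrLines',
                    'NumberOfDependents'], axis=1)
X3 = sm.add_constant(X_test)
# Predicted default probabilities for the test set.
resu = result.predict(X3)
fpr, tpr, threshold = roc_curve(Y_test, resu)
rocauc = auc(fpr, tpr)
ax = plt.axes()
ax.set_facecolor("white")
plt.plot(fpr, tpr, 'b', label='AUC = %0.2f' % rocauc)
plt.legend(loc='lower right')
# Diagonal reference line: the ROC curve of a random classifier.
plt.plot([0, 1], [0, 1], 'r--')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')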
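As a cross-check on what roc_curve and auc report, the AUC can also be computed directly from labels and scores: it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below uses only the standard library; the name auc_by_pairs and the sample data are hypothetical.

```python
def auc_by_pairs(labels, scores):
    """AUC as the share of (positive, negative) pairs that are ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5  # ties count as half
    return correct / (len(positives) * len(negatives))

# Hypothetical labels and predicted probabilities.
example_auc = auc_by_pairs([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise loop grows with the product of the two class sizes, so it is suitable only for small samples; it is meant to illustrate the metric, not to replace the library call.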