Auto Loan Default Prediction


a useful man


Article link: https://blog.csdn.net/sinat_23971513/article/details/118312488


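This post explores an auto loan default data set and uses logistic regression for modeling and prediction. It first examines how categorical variables, such as the bankruptcy flag, relate to default. It then draws a random sample to create training and test sets, and builds and trains a logistic regression model. The model is evaluated with a confusion matrix, an ROC curve, and related metrics, and is finally tuned to improve prediction accuracy.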


Logistic regression

Data description: the data set is a sample of auto loan default records.

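Name	Description
application_id	Applicant ID
account_number	Account number
bad_ind	Default indicator
vehicle_year	Vehicle purchase year
vehicle_make	Vehicle manufacturer
bankruptcy_ind	Prior bankruptcy flag
tot_derog	Number of derogatory credit events in the past five years (e.g. a mobile phone account closed for non-payment)
tot_tr	Total number of accounts
age_oldest_tr	Age of the oldest account (months)
tot_open_tr	Number of open accounts
tot_rev_tr	Number of open revolving credit accounts (e.g. credit cards)
tot_rev_debt	Balance on open revolving credit accounts (e.g. credit card debt)
tot_rev_line	Revolving credit limit (authorized credit card limit)
rev_util	Revolving credit utilization (balance / limit)
fico_score	FICO score
purch_price	Vehicle purchase price (yuan)
msrp	Manufacturer's suggested retail price
down_pyt	Down payment on the installment purchase
loan_term	Loan term (months)
loan_amt	Loan amount
ltv	Loan amount / MSRP * 100
tot_income	Average monthly income (yuan)
veh_mileage	Vehicle mileage (miles)
used_ind	Used-vehicle indicator
weight	Sample weight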

%matplotlib inline
import os
import numpy as np
from scipy import stats
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt

# os.chdir('E:/data')
pd.set_option('display.max_columns', None)

Importing and cleaning the data

accepts = pd.read_csv('accepts.csv', skipinitialspace=True)
accepts = accepts.dropna(axis=0, how='any')
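The two calls above can be illustrated on a tiny in-memory CSV. This is only a sketch: the three rows and their values are made up, and only the column names are borrowed from the data description. It shows that skipinitialspace=True strips the blank after each comma (so the column is named 'bad_ind', not ' bad_ind') and that how='any' drops every row with at least one missing value.

```python
import io
import pandas as pd

# Hypothetical three-row sample; the values are invented for illustration.
csv_text = """application_id, bad_ind, fico_score, tot_income
1, 0, 710, 5200
2, 1,, 3100
3, 0, 655,
"""

toy = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Count missing values per column before dropping anything.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Same cleaning step as in the post: drop rows with any missing value.
cleaned = toy.dropna(axis=0, how='any')
print(len(toy), '->', len(cleaned))
```

Checking isna().sum() first is a cheap way to see how many rows each column costs before committing to a full drop.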

Association between categorical variables

Is a prior bankruptcy flag related to default?

Cross-tabulation

cross_table = pd.crosstab(accepts.bankruptcy_ind, 
                         accepts.bad_ind, margins=True)
cross_table

bad_ind	0	1	All
bankruptcy_ind
N	3076	719	3795
Y	243	67	310
All	3319	786	4105

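Contingency table (row proportions)

def percConvert(ser):
    # Divide each row by its last entry (the 'All' margin); .iloc keeps the lookup positional
    return ser / float(ser.iloc[-1])

cross_table.apply(percConvert, axis=1)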

bad_ind	0	1	All
bankruptcy_ind
N	0.810540	0.189460	1.0
Y	0.783871	0.216129	1.0
All	0.808526	0.191474	1.0
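The same row proportions can probably be obtained without a helper function, since pd.crosstab accepts a normalize argument. The sketch below rebuilds one record per applicant from the counts printed in the cross table (the series names are reused from the post; the reconstruction itself is only for illustration) and asks for row-wise shares directly.

```python
import pandas as pd

# Rebuild one row per applicant from the counts in the cross table.
bankruptcy = pd.Series(['N'] * 3795 + ['Y'] * 310, name='bankruptcy_ind')
bad = pd.Series([0] * 3076 + [1] * 719 + [0] * 243 + [1] * 67, name='bad_ind')

# normalize='index' divides every row by its row total.
row_share = pd.crosstab(bankruptcy, bad, normalize='index')
print(row_share.round(6))
```

On the real data this would correspond to passing accepts.bankruptcy_ind and accepts.bad_ind with normalize='index'.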

print('''chisq = %6.4f 
p-value = %6.4f
dof = %i 
expected_freq = %s'''  %stats.chi2_contingency(cross_table.iloc[:2, :2]))

chisq = 1.1500 
p-value = 0.2835
dof = 1 
expected_freq = [[3068.35688185  726.64311815]
 [ 250.64311815   59.35688185]]
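With a p-value of about 0.28, the test does not reject independence between the bankruptcy flag and default at the 5% level. To see where the numbers come from, the statistic can be recomputed by hand from the four counts. The sketch below assumes the Yates continuity correction, which scipy's chi2_contingency applies by default to a 2x2 table, and uses the fact that a chi-square variable with one degree of freedom has survival function erfc(sqrt(x / 2)).

```python
import math

# Observed counts from the cross table: rows N/Y, columns bad_ind 0/1.
observed = [[3076, 719], [243, 67]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Yates-corrected statistic: shrink each |observed - expected| by 0.5.
chisq = sum((abs(o - e) - 0.5) ** 2 / e
            for obs_row, exp_row in zip(observed, expected)
            for o, e in zip(obs_row, exp_row))

# Survival function of the chi-square distribution with 1 degree of freedom.
p_value = math.erfc(math.sqrt(chisq / 2))

print('chisq = %6.4f' % chisq)
print('p-value = %6.4f' % p_value)
```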

Logistic regression

accepts.plot(x='fico_score', y='bad_ind', kind='scatter')

[Figure: scatter plot of bad_ind against fico_score]

• Random sampling to build the training and test sets

train = accepts.sample(frac=0.7, random_state=1234).copy()
test = accepts[~ accepts.index.isin(train.index)].copy()
print(' Training set size: %i \n Test set size: %i' %(len(train), len(test)))

 Training set size: 2874 
 Test set size: 1231
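The sample / index.isin idiom used for the split can be checked on a toy frame. The sketch below uses a made-up ten-row DataFrame (the index labels and scores are hypothetical) and confirms that the two parts are disjoint and together cover every row. The idiom relies on the index labels being unique, which holds for a frame read with read_csv and then filtered with dropna.

```python
import pandas as pd

# Hypothetical ten-row frame with non-contiguous, unique index labels.
toy = pd.DataFrame({'fico_score': range(600, 700, 10)},
                   index=range(100, 120, 2))

# 70% of the rows, reproducibly; the rest becomes the test set.
toy_train = toy.sample(frac=0.7, random_state=1234)
toy_test = toy[~toy.index.isin(toy_train.index)]

overlap = set(toy_train.index) & set(toy_test.index)
covered = set(toy_train.index) | set(toy_test.index)
print(len(toy_train), len(toy_test), len(overlap))
```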

lg = smf.glm('bad_ind ~ fico_score', dat
