Loan Prediction

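This post works through the Loan Prediction project and analyzes how individual variables affect loan approval. Data exploration shows an approval rate of about 69%; applicants who are male, married, educated, or have a good credit history are more likely to be approved. In the data cleaning stage, missing values and outliers are handled, with a log transform applied. When building predictive models, logistic regression may rely too heavily on credit history, while the decision tree and random forest models bring out the importance of other variables.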

#Loan Prediction
Project page: https://datahack.analyticsvidhya.com/contest/practice-problem-loan-prediction-iii/

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # pyplot is the plotting interface; the bare matplotlib package is not
%matplotlib inline
df = pd.read_csv("/Users/mac/Desktop/kaggle/P2/train.csv")  # load the training data

##Data Exploration
###Variable Identification

df.head(10)
   Loan_ID   Gender  Married  Dependents  Education     Self_Employed  ApplicantIncome  CoapplicantIncome  LoanAmount  Loan_Amount_Term  Credit_History  Property_Area  Loan_Status
0  LP001002  Male    No       0           Graduate      No             5849             0.0                NaN         360.0             1.0             Urban          Y
1  LP001003  Male    Yes      1           Graduate      No             4583             1508.0             128.0       360.0             1.0             Rural          N
2  LP001005  Male    Yes      0           Graduate      Yes            3000             0.0                66.0        360.0             1.0             Urban          Y
3  LP001006  Male    Yes      0           Not Graduate  No             2583             2358.0             120.0       360.0             1.0             Urban          Y
4  LP001008  Male    No       0           Graduate      No             6000             0.0                141.0       360.0             1.0             Urban          Y
5  LP001011  Male    Yes      2           Graduate      Yes            5417             4196.0             267.0       360.0             1.0             Urban          Y
6  LP001013  Male    Yes      0           Not Graduate  No             2333             1516.0             95.0        360.0             1.0             Urban          Y
7  LP001014  Male    Yes      3+          Graduate      No             3036             2504.0             158.0       360.0             0.0             Semiurban      N
8  LP001018  Male    Yes      2           Graduate      No             4006             1526.0             168.0       360.0             1.0             Urban          Y
9  LP001020  Male    Yes      1           Graduate      No             12841            10968.0            349.0       360.0             1.0             Semiurban      N
df.describe()
       ApplicantIncome  CoapplicantIncome  LoanAmount  Loan_Amount_Term  Credit_History
count       614.000000         614.000000  592.000000         600.00000      564.000000
mean       5403.459283        1621.245798  146.412162         342.00000        0.842199
std        6109.041673        2926.248369   85.587325          65.12041        0.364878
min         150.000000           0.000000    9.000000          12.00000        0.000000
25%        2877.500000           0.000000  100.000000         360.00000        1.000000
50%        3812.500000        1188.500000  128.000000         360.00000        1.000000
75%        5795.000000        2297.250000  168.000000         360.00000        1.000000
max       81000.000000       41667.000000  700.000000         480.00000        1.000000
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 614 entries, 0 to 613
Data columns (total 13 columns):
Loan_ID              614 non-null object
Gender               601 non-null object
Married              611 non-null object
Dependents           599 non-null object
Education            614 non-null object
Self_Employed        582 non-null object
ApplicantIncome      614 non-null int64
CoapplicantIncome    614 non-null float64
LoanAmount           592 non-null float64
Loan_Amount_Term     600 non-null float64
Credit_History       564 non-null float64
Property_Area        614 non-null object
Loan_Status          614 non-null object
dtypes: float64(4), int64(1), object(8)
memory usage: 62.4+ KB

Note that several columns have missing values; these are handled in the data cleaning stage.
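
The missing-value counts can be read straight off the `df.info()` output: each column's missing count is the total row count (614) minus its non-null count. Below is a minimal sketch of that arithmetic, with the counts copied from the output into a plain dictionary. The names `non_null` and `missing` are illustrative only and do not appear in the original notebook.

```python
# Non-null counts copied from the df.info() output; the frame has 614 rows.
total_rows = 614
non_null = {
    "Loan_ID": 614, "Gender": 601, "Married": 611, "Dependents": 599,
    "Education": 614, "Self_Employed": 582, "ApplicantIncome": 614,
    "CoapplicantIncome": 614, "LoanAmount": 592, "Loan_Amount_Term": 600,
    "Credit_History": 564, "Property_Area": 614, "Loan_Status": 614,
}

# Missing count per column, keeping only the columns that have gaps.
missing = {col: total_rows - n for col, n in non_null.items() if n < total_rows}

# Print columns from most to least missing, with the share of rows affected.
for col, n_missing in sorted(missing.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{col:<18} {n_missing:>3}  ({n_missing / total_rows:.1%})")
```

Credit_History has the most gaps (50 rows), followed by Self_Employed (32) and LoanAmount (22). On the loaded frame itself, `df.isnull().sum()` should report the same counts.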

###Univariate Analysis

df['Loan_Status'].value_counts(normalize=True).plot.bar(title= 'Loan_Status')
<matplotlib.axes._subplots.AxesSubplot at 0x111b343c8>

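[Figure: bar chart titled 'Loan_Status', showing the proportion of each loan status]

For the target variable Loan_Status, about 69% of loans were approved.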

df['Gender'].value_counts(normalize=True)
Male      0.813644
Female    0.186356
Name: Gender, dtype: float64

Looking at gender, most loan applicants are male (about 81%).

df['Married'].value_counts(normalize=True)
Yes    0.651391
No     0.348609
Name: Married, dtype: float64

Looking at marital status, about 65% of applicants are married.

df['Dependents'].value_counts().plot.bar(title= 'Dependents')
<matplotlib.axes._subplots.AxesSubplot at 0x111b4fe10>

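[Figure: bar chart titled 'Dependents', showing the number of applicants per dependents category]

Looking at the number of dependents, most applicants have none.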

df['Loan_Amount_Term'].value_counts()
360.0    512
180.0     44
480.0     15
300.0     13
84.0       4
240.0      4
120.0      3
36.0       2
60.0       2
12.0       1
Name: Loan_Amount_Term, dtype: int64

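Looking at the loan term, the vast majority of loans have a term of 360. The dataset description gives Loan_Amount_Term in months rather than days, so this is a 30-year term.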
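
To put a number on "the vast majority", the share of 360 terms can be computed from the `value_counts()` output for Loan_Amount_Term. This is a small sketch with the counts typed in by hand; the variable names are illustrative, and the conversion to years assumes the term is expressed in months, as the dataset description states.

```python
# Counts copied from the Loan_Amount_Term value_counts() output.
term_counts = {360: 512, 180: 44, 480: 15, 300: 13, 84: 4,
               240: 4, 120: 3, 36: 2, 60: 2, 12: 1}

total = sum(term_counts.values())     # rows with a recorded term
share_360 = term_counts[360] / total  # fraction with the most common term
years = 360 / 12                      # assumes the term is given in months

print(f"{total} recorded terms, {share_360:.1%} are 360 ({years:.0f} years)")
```

The total of 600 matches the non-null count that `df.info()` reports for this column, and roughly 85% of those rows carry the 360 term.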

df['Credit_History'].value_counts(normalize=True)
1.0    0.842199
0.0    0.157801
Name: Credit_History, dtype: float64

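Looking at credit history, 84% of applicants with a recorded value have Credit_History = 1, meaning they have repaid earlier debts and have good credit.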
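
One caveat about these proportions: `value_counts(normalize=True)` drops missing values by default, so the shares are taken over the 564 recorded rows, not all 614. The sketch below illustrates the difference on a stand-in Series. The counts (475 ones, 89 zeros, 50 missing) are reconstructed from the outputs in this post and are an assumption, not the raw data.

```python
import numpy as np
import pandas as pd

# Reconstructed counts (an assumption, not the raw data):
# 564 non-null values with mean 0.842199 implies 475 ones and 89 zeros;
# the remaining 614 - 564 = 50 rows are missing.
credit = pd.Series([1.0] * 475 + [0.0] * 89 + [np.nan] * 50,
                   name="Credit_History")

# Default behaviour: NaN rows are dropped before normalising.
share_known = credit.value_counts(normalize=True)

# With dropna=False the missing rows form their own category.
share_all = credit.value_counts(normalize=True, dropna=False)

print(share_known.round(6))
print(share_all.round(4))
```

Counted over all 614 rows, the share with Credit_History = 1 is closer to 77%, with about 8% of rows missing, which matters when deciding how to fill the gaps later.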

df['Education'].value_counts(normalize=True)