银行贷款预测分析(Loan Prediction)

贷款数据的预测分析,通过使用python来分析申请人哪些条件对贷款有影响,并预测哪些客户更容易获得银行贷款。

数据来源 Loan Prediction:https://datahack.analyticsvidhya.com/contest/practice-problem-loan-prediction-iii/

提出问题:哪些客户更容易获得银行贷款?

导入数据

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
%matplotlib inline

# 导入数据
full_data = pd.read_csv('loan_train.csv')
full_data.shape
(614, 13)

数据有614行,13列。

查看前五行数据

full_data.head()
Loan_ID Gender Married Dependents Education Self_Employed ApplicantIncome CoapplicantIncome LoanAmount Loan_Amount_Term Credit_History Property_Area Loan_Status
0 LP001002 Male No 0 Graduate No 5849 0.0 NaN 360.0 1.0 Urban Y
1 LP001003 Male Yes 1 Graduate No 4583 1508.0 128.0 360.0 1.0 Rural N
2 LP001005 Male Yes 0 Graduate Yes 3000 0.0 66.0 360.0 1.0 Urban Y
3 LP001006 Male Yes 0 Not Graduate No 2583 2358.0 120.0 360.0 1.0 Urban Y
4 LP001008 Male No 0 Graduate No 6000 0.0 141.0 360.0 1.0 Urban Y

一、理解数据

Loan_ID 贷款人ID

Gender 性别 (Male, female)

ApplicantIncome 申请人收入

Coapplicant Income 申请收入

Credit_History 信用记录

Dependents 亲属人数

Education 教育程度

LoanAmount 贷款额度

Loan_Amount_Term 贷款时间长

Loan_Status 贷款状态 (Y, N)

Married 婚姻状况(NO,Yes)

Property_Area 所在区域包括:城市地区、半城区和农村地区

Self_Employed 职业状况:自雇还是非自雇

查看描述统计数据

full_data.describe()
ApplicantIncome CoapplicantIncome LoanAmount Loan_Amount_Term Credit_History
count 614.000000 614.000000 592.000000 600.00000 564.000000
mean 5403.459283 1621.245798 146.412162 342.00000 0.842199
std 6109.041673 2926.248369 85.587325 65.12041 0.364878
min 150.000000 0.000000 9.000000 12.00000 0.000000
25% 2877.500000 0.000000 100.000000 360.00000 1.000000
50% 3812.500000 1188.500000 128.000000 360.00000 1.000000
75% 5795.000000 2297.250000 168.000000 360.00000 1.000000
max 81000.000000 41667.000000 700.000000 480.00000 1.000000

查看数据集

full_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 614 entries, 0 to 613
Data columns (total 13 columns):
Loan_ID              614 non-null object
Gender               601 non-null object
Married              611 non-null object
Dependents           599 non-null object
Education            614 non-null object
Self_Employed        582 non-null object
ApplicantIncome      614 non-null int64
CoapplicantIncome    614 non-null float64
LoanAmount           592 non-null float64
Loan_Amount_Term     600 non-null float64
Credit_History       564 non-null float64
Property_Area        614 non-null object
Loan_Status          614 non-null object
dtypes: float64(4), int64(1), object(8)
memory usage: 62.4+ KB

看到数据有缺失值,需要后面进一步处理

二、从单变量进行分析

1. 分析目标变量Loan_Status贷款状态

#目标变量统计
full_data['Loan_Status'].value_counts()
Y    422
N    192
Name: Loan_Status, dtype: int64
#统计百分比
full_data['Loan_Status'].value_counts(normalize=True)
Y    0.687296
N    0.312704
Name: Loan_Status, dtype: float64
sns.countplot(x='Loan_Status', data=full_data, palette = 'Set1')

614个人中有422人(约69%)获得贷款批准

2.Gender 性别特征

full_data['Gender'].value_counts(normalize=True)
Male      0.813644
Female    0.186356
Name: Gender, dtype: float64
sns.countplot(x='Gender', data=full_data, palette = 'Set1')

png

数据集中80%的申请人是男性。

3.Married婚姻特征

full_data['Married'].value_counts(normalize=True).plot.bar(title= 'Married')

png

有65%的申请贷款的人是已经结婚。

4.Dependent亲属特征

Dependents=full_data['Dependents'].value_counts(normalize=True)
Dependents
0     0.575960
1     0.170284
2     0.168614
3+    0.085142
Name: Dependents, dtype: float64
Dependents.plot.bar(tit
  • 13
    点赞
  • 120
    收藏
    觉得还不错? 一键收藏
  • 12
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 12
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值