Machine Learning 9 - Case Study 1: Bank Marketing Strategy Analysis

Latest recommended article published on 2024-06-11 21:29:04

哎呦-_-不错



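Original articles on this blog may not be used for commercial purposes without the author's permission. Please cite the source when reposting.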

Article link: https://blog.csdn.net/weixin_46649052/article/details/108588549


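This post walks through a bank marketing strategy analysis using machine learning, covering data preprocessing, exploratory analysis, feature engineering, model training, and evaluation. Step by step, the author shows how to analyze the data in Python and build a predictive model.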



Data and code link (extraction code: 1234)

1. Data Description and Preprocessing

import pandas as pd
import matplotlib.pyplot as plt

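# Load the data (the file is semicolon-delimited)
bank = pd.read_csv('data/bank-full.csv', delimiter=';')
# Look at the first five rows to get a feel for the dataset
print(bank.head(5))

# Use describe() and info() to inspect how the data is distributed
# describe() is called twice: once for numeric features, once for categorical features
# Distribution of the numeric features
print(bank.describe())
# Distribution of the categorical (object-typed) features
print(bank.describe(include=['O']))

# Use info() to check for missing values; every column has 45211 non-null entries, so there are no NaNs
# info() prints its report itself and returns None, which is why the output ends with "None"
print(bank.info())

# Some values in the categorical features are stored as the string 'unknown' rather than as NaN
# Select the object-typed columns and count the 'unknown' entries in each
for col in bank.select_dtypes(include=['object']).columns:
    print(col, ':', bank[bank[col] == 'unknown'][col].count())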


# Check the class distribution of the target
print('Class distribution of the target:\n', bank['y'].value_counts())
# Plot it
# SimHei is only needed when the plot text contains Chinese characters
plt.rcParams['font.sans-serif'] = ['SimHei']

fig, ax = plt.subplots(1, 1, figsize=(4, 4))
colors = ["#FA5858", "#64FE2E"]
labels = ["no", "yes"]
ax.set_title('Subscribed to a term deposit?', fontsize=16)
# Pie chart, with the "yes" slice pulled out
bank['y'].value_counts().plot.pie(explode=[0, 0.25], autopct='%.2f%%', ax=ax, shadow=True,
                                  colors=colors, labels=labels, fontsize=14, startangle=25)
plt.axis('off')
plt.show()

   age           job  marital  education  ... pdays  previous poutcome   y
0   58    management  married   tertiary  ...    -1         0  unknown  no
1   44    technician   single  secondary  ...    -1         0  unknown  no
2   33  entrepreneur  married  secondary  ...    -1         0  unknown  no
3   47   blue-collar  married    unknown  ...    -1         0  unknown  no
4   33       unknown   single    unknown  ...    -1         0  unknown  no
[5 rows x 17 columns]
                age        balance  ...         pdays      previous
count  45211.000000   45211.000000  ...  45211.000000  45211.000000
mean      40.936210    1362.272058  ...     40.197828      0.580323
std       10.618762    3044.765829  ...    100.128746      2.303441
min       18.000000   -8019.000000  ...     -1.000000      0.000000
25%       33.000000      72.000000  ...     -1.000000      0.000000
50%       39.000000     448.000000  ...     -1.000000      0.000000
75%       48.000000    1428.000000  ...     -1.000000      0.000000
max       95.000000  102127.000000  ...    871.000000    275.000000
[8 rows x 7 columns]
                job  marital  education  ...  month poutcome      y
count         45211    45211      45211  ...  45211    45211  45211
unique           12        3          4  ...     12        4      2
top     blue-collar  married  secondary  ...    may  unknown     no
freq           9732    27214      23202  ...  13766    36959  39922
[4 rows x 10 columns]
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45211 entries, 0 to 45210
Data columns (total 17 columns):
age          45211 non-null int64
job          45211 non-null object
marital      45211 non-null object
education    45211 non-null object
default      45211 non-null object
balance      45211 non-null int64
housing      45211 non-null object
loan         45211 non-null object
contact      45211 non-null object
day          45211 non-null int64
month        45211 non-null object
duration     45211 non-null int64
campaign     45211 non-null int64
pdays        45211 non-null int64
previous     45211 non-null int64
poutcome     45211 non-null object
y            45211 non-null object
dtypes: int64(7), object(10)
memory usage: 5.9+ MB
None
job : 288
marital : 0
education : 1857
default : 0
housing : 0
loan : 0
contact : 13020
month : 0
poutcome : 36959
y : 0
Class distribution of the target:
 no     39922
yes     5289
Name: y, dtype: int64
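The per-column loop that counts 'unknown' can also be written as one vectorized comparison, which makes it easy to report the share of 'unknown' next to the raw count. The following is a minimal sketch, not the author's code: the `toy` frame is a hypothetical stand-in built from the five rows printed by `head()`, and selecting columns with `exclude="number"` is an assumption chosen so the snippet does not depend on how a given pandas version labels string columns.

```python
import pandas as pd

# Hypothetical toy frame; values mirror the five rows printed by head().
toy = pd.DataFrame({
    "age": [58, 44, 33, 47, 33],
    "job": ["management", "technician", "entrepreneur", "blue-collar", "unknown"],
    "education": ["tertiary", "secondary", "secondary", "unknown", "unknown"],
    "y": ["no", "no", "no", "no", "no"],
})

# Keep only the non-numeric (categorical) columns.
categorical = toy.select_dtypes(exclude="number")

# Compare every cell with 'unknown' at once, then sum the booleans per column.
unknown_counts = categorical.eq("unknown").sum()
unknown_share = unknown_counts / len(toy)

print(pd.DataFrame({"count": unknown_counts, "share": unknown_share}))
```

On the real dataset the same two lines would be applied to `bank` instead of `toy`. A column such as poutcome, where 36959 of 45211 values are 'unknown', then stands out immediately by its share.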

[Figure: pie chart of whether clients subscribed to a term deposit (no vs. yes)]
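The class counts printed earlier (39922 "no" versus 5289 "yes") show that the target is imbalanced. As a rough illustration of why that matters for evaluation, the sketch below works out the class shares and the accuracy of a trivial model that always predicts the majority class. The counts are copied from the output in this section; the variable names and the idea of using a majority-class baseline are my own assumptions, not something the original post prescribes.

```python
# Class counts copied from the value_counts() output shown earlier in this section.
counts = {"no": 39922, "yes": 5289}
total = sum(counts.values())

# Share of each class among all samples.
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a trivial model that always predicts the majority class.
majority_label = max(counts, key=counts.get)
baseline_accuracy = shares[majority_label]

# Number of negative samples for each positive one.
imbalance_ratio = counts["no"] / counts["yes"]

for label, share in shares.items():
    print(f"{label}: {share:.2%}")
print(f"majority-class baseline accuracy: {baseline_accuracy:.4f}")
print(f"no:yes ratio: {imbalance_ratio:.2f}")
```

Any classifier trained on this data should be judged against that baseline, so metrics that look at the "yes" class separately (for example precision and recall) are likely to be more informative than plain accuracy.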

2. Exploratory Analysis

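# Exploratory analysis
# 1. Distribution of the numeric features
# DataFrame.hist() draws one histogram per numeric feature. Although it is called on the whole
# table, it only plots numeric columns, so the categorical features (stored as str) are skipped
bank.hist(bins=25, figsize=(14, 10))
plt.show()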
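To see which columns `hist()` will actually draw, it can help to list the numeric columns first. The snippet below is a small self-contained sketch under stated assumptions: the `toy` frame is hypothetical (the `balance` values are invented for illustration and are not taken from the dataset), and the `Agg` backend is chosen only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical toy frame: two numeric columns and one categorical column.
toy = pd.DataFrame({
    "age": [58, 44, 33, 47, 33],
    "balance": [2100, 30, 5, 1500, 1],  # invented values, for illustration only
    "job": ["management", "technician", "entrepreneur", "blue-collar", "unknown"],
})

# Columns that hist() can plot: the numeric ones.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
print("numeric columns:", numeric_cols)

# hist() returns an array of Axes; each used subplot is titled with its column name.
axes = toy.hist(bins=5, figsize=(8, 3))
titles = [ax.get_title() for ax in axes.ravel() if ax.get_title()]
print("plotted columns:", titles)
```

The categorical `job` column does not appear among the plotted titles, which matches the behaviour described in the comments above. Categorical features need a different view, such as `value_counts()` or a bar chart.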
