Day 9 Training

Drawing heatmaps and subplots

Start with routine preprocessing of the data, including encoding the categorical columns.

import pandas as pd
data = pd.read_csv('data.csv')
mapping = {
"Years in current job": {
"10+ years": 10,
"9 years": 9,
"8 years": 8,
"7 years": 7,
"6 years": 6,
"5 years": 5,
"4 years": 4,
"3 years": 3,
"2 years": 2,
"1 year": 1,
    },
"Home Ownership": {
    "Home Mortgage": 0,
    "Rent": 1,
    "Own Home": 2,
    "HaveMortgage": 3
    }
}

# Encode each column with its own sub-dictionary; labels missing from the mapping become NaN
data["Years in current job"] = data["Years in current job"].map(mapping["Years in current job"])
data["Home Ownership"] = data["Home Ownership"].map(mapping["Home Ownership"])
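
Because Series.map() with a dictionary silently turns every unmapped label into NaN, it can be worth checking which labels were missed after encoding. The following is a minimal sketch on a toy DataFrame; the label "< 1 year" and the shortened dictionary years_map are hypothetical and only serve to trigger the unmapped case.

```python
import pandas as pd

# Toy column; "< 1 year" is a hypothetical label that the mapping below does not cover
toy = pd.DataFrame({"Years in current job": ["10+ years", "1 year", "< 1 year", None]})
years_map = {"10+ years": 10, "1 year": 1}

encoded = toy["Years in current job"].map(years_map)

# Labels that were present in the raw column but became NaN after mapping
missed = encoded.isna() & toy["Years in current job"].notna()
unmapped = toy.loc[missed, "Years in current job"].unique()
print(unmapped)
```

If the printed array is not empty, the listed labels need an entry in the mapping dictionary (or a deliberate decision to leave them as missing values).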

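1. How to draw a heatmap

  1. How a heatmap works:
  • Color mapping: numeric values are mapped onto a color gradient (for example, -1 to 1 mapped from blue to red)
  • Matrix layout: each cell holds the correlation coefficient of one pair of variables
  • Color intensity: the deeper the color (red or blue), the stronger the correlation
  • Symmetry about the diagonal: the matrix is symmetric, and every diagonal cell is 1 because a variable is perfectly correlated with itself
  2. Key parameters:
  • annot=True: show the numeric value in each cell
  • cmap: the colormap to use (coolwarm, RdBu, etc.)
  • vmin/vmax: the range of values the colormap spans
  • fmt: the format of the annotated values (for example, '.2f' keeps two decimal places)
  3. Typical uses:
  • Feature correlation analysis
  • Presenting clustering results
  • Spotting patterns in matrix data
  • Outlier detection
    This example shows how to quickly create a polished heatmap with seaborn, which is particularly useful for exploring relationships between features.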
import pandas as pd 
import seaborn as sns
import matplotlib.pyplot as plt 

# Select the continuous features
continuous_features = [
    'Annual Income',  'Tax Liens',
    'Number of Open Accounts', 'Years of Credit History',
    'Maximum Open Credit', 'Number of Credit Problems',
    'Months since last delinquent', 'Bankruptcies',
    'Current Loan Amount', 'Current Credit Balance', 'Monthly Debt',
    'Credit Score'
]

correlation_matrix = data[continuous_features].corr()
plt.rcParams['figure.dpi'] = 300
plt.figure(figsize = (12,10))

sns.heatmap(correlation_matrix,annot = True,cmap='coolwarm',vmin=-1,vmax=1)
plt.title('Correlation Heatmap of Continuous Features')
plt.show()
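
The properties of the correlation matrix described above (values between -1 and 1, a diagonal of 1, symmetry) can be checked numerically before plotting. This is a minimal sketch on synthetic data; the column names x, pos, neg and noise are made up for illustration and do not come from data.csv.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = rng.normal(size=200)

# Synthetic columns: one strongly positive, one strongly negative, one unrelated to x
toy = pd.DataFrame({
    "x": x,
    "pos": 2 * x + rng.normal(scale=0.1, size=200),
    "neg": -x + rng.normal(scale=0.1, size=200),
    "noise": rng.normal(size=200),
})

corr = toy.corr()  # Pearson correlation is the pandas default
print(corr.round(2))

# The matrix is symmetric and each variable correlates perfectly with itself
print(np.allclose(corr, corr.T), np.allclose(np.diag(corr), 1.0))
```

Passing a matrix like corr to sns.heatmap with vmin=-1 and vmax=1 keeps the color scale centered on zero, so the strongly positive and strongly negative pairs end up at opposite ends of the colormap.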

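2. The enumerate() function

enumerate() is a Python built-in that yields both the index and the value while iterating over a sequence. Details follow:

  1. Basic usage
for index, value in enumerate(sequence):
    print(index, value)
  2. Parameters
  • First argument: an iterable (a list, a string, etc.)
  • Second argument start (optional): the starting index, 0 by default
  3. Example
features = ['Annual Income','Years in current job','Tax Liens','Number of Open Accounts']
for i, feature in enumerate(features):
    print(f"Index {i} corresponds to feature {feature}")
  4. Characteristics
  • No need to maintain a counter variable by hand
  • Supports a custom starting index (e.g. enumerate(features, 1))
  • Works with any iterable
  • Returns an iterator that yields (index, value) tuples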
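
The start argument and the tuple structure can be seen directly by materializing the iterator into a list. A small sketch, using a shortened feature list for brevity:

```python
features = ['Annual Income', 'Years in current job', 'Tax Liens']

# enumerate() is lazy; list() materializes the (index, value) tuples
pairs = list(enumerate(features, start=1))
print(pairs)

# Tuple unpacking in the loop header gives one name to each element
for position, feature in pairs:
    print(f"{position}. {feature}")
```

Starting at 1 is handy for human-readable numbering, while the default of 0 matches list and array indexing.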

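In the code of day9.ipynb, enumerate() is used to get the index and the value of each entry in the feature list at the same time, which makes it easy to locate each feature's position inside the loop.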

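3. How to draw subplots

Subplots are created with subplots. Method 1 changes i by hand and draws the panels one at a time.
Method 2 loops over i and computes the row and column of each panel from it.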
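
The row and column arithmetic behind method 2 can be checked without any plotting. In this sketch, n_cols is an assumed name for the number of columns in the grid; divmod(i, n_cols) returns the same pair as (i // n_cols, i % n_cols).

```python
n_cols = 2  # number of columns in the subplot grid (assumed name)

# Map each flat index to its (row, col) position in a 2x2 grid
positions = [divmod(i, n_cols) for i in range(4)]

for i, (row, col) in enumerate(positions):
    print(f"i={i} -> axes[{row}, {col}]")
```

The indices fill the grid row by row, which is the same order in which method 1 assigns the panels by hand.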

Method 1:


features = ['Annual Income','Years in current job','Tax Liens','Number of Open Accounts']
plt.rcParams['figure.dpi'] = 300
fig,axes = plt.subplots(2,2,figsize=(12,8))

i = 0
feature = features[i]
axes[0,0].boxplot(data[feature].dropna())
axes[0,0].set_title(f'Boxplot of {feature}')
axes[0,0].set_ylabel(feature)

i = 1
feature = features[i]
axes[0,1].boxplot(data[feature].dropna())
axes[0,1].set_title(f'Boxplot of {feature}')
axes[0,1].set_ylabel(feature)

i = 2
feature = features[i]
axes[1,0].boxplot(data[feature].dropna())
axes[1,0].set_title(f'Boxplot of {feature}')
axes[1,0].set_ylabel(feature)

i = 3
feature = features[i]
axes[1,1].boxplot(data[feature].dropna())
axes[1,1].set_title(f'Boxplot of {feature}')
axes[1,1].set_ylabel(feature)

plt.tight_layout()
plt.show()

Method 2:

features = ['Annual Income','Years in current job','Tax Liens','Number of Open Accounts']
plt.rcParams['figure.dpi'] = 300

# Check which index enumerate() assigns to each feature
for i, feature in enumerate(features):
    print(f"Index {i} corresponds to feature {feature}")

fig, axes = plt.subplots(2, 2, figsize=(12, 8))

for i, feature in enumerate(features):
    row = i // 2  # integer division gives the row in the 2x2 grid
    col = i % 2   # the remainder gives the column
    axes[row, col].boxplot(data[feature].dropna())
    axes[row, col].set_title(f'Boxplot of {feature}')
    axes[row, col].set_ylabel(feature)

plt.tight_layout()
plt.show()
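
As a possible variation on method 2, the 2D array of axes can be flattened so that no row or column arithmetic is needed, and any leftover panels can be hidden when the number of features does not fill the grid. This is a sketch, not the notebook's code: the helper name plot_boxplots, its n_cols parameter, and the toy DataFrame are all hypothetical.

```python
import math

import matplotlib.pyplot as plt
import pandas as pd


def plot_boxplots(df, features, n_cols=2):
    """Draw one boxplot per feature on a grid with n_cols columns (hypothetical helper)."""
    n_rows = math.ceil(len(features) / n_cols)
    # squeeze=False keeps axes two-dimensional even for a single row or column
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(6 * n_cols, 4 * n_rows), squeeze=False)
    flat_axes = axes.flatten()

    for ax, feature in zip(flat_axes, features):
        ax.boxplot(df[feature].dropna())
        ax.set_title(f'Boxplot of {feature}')
        ax.set_ylabel(feature)

    # Hide panels that have no feature assigned to them
    for ax in flat_axes[len(features):]:
        ax.set_visible(False)

    fig.tight_layout()
    return fig, axes


# Toy data with three columns, so the fourth panel of the 2x2 grid stays empty
toy = pd.DataFrame({"a": [1, 2, 3, 10], "b": [2.0, None, 4.0, 5.0], "c": [5, 5, 6, 7]})
fig, axes = plot_boxplots(toy, ["a", "b", "c"])
```

Calling plt.show() afterwards displays the figure. With the notebook's data, the equivalent call would be plot_boxplots(data, features).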

@浙大疏锦行
