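Some Matplotlib visualization code templates (multiple subplots, scatter plot, multiple lines in one figure, multiple plot types in one figure, box plot, dual y-axis plot, pie chart, ROC curve, frequency histogram)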

老猫iiii

Last modified: 2024-01-07 22:41:50


Tags: matplotlib, information visualization, python, scikit-learn, pandas

First published: 2024-01-07 22:37:34

Article link: https://blog.csdn.net/m0_62052024/article/details/135446141


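Data visualization

1. Basic operations

Fixing Chinese characters that do not display:

import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei']  # use a font that contains Chinese glyphs
plt.rcParams['axes.unicode_minus'] = False    # keep the minus sign renderable with that font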
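
SimHei is not installed on every machine, so the setting above can silently fall back to a font without Chinese glyphs. A minimal sketch of one way to guard against that is to look up which fonts Matplotlib has found and keep only those. The candidate list below is an assumption for illustration, not part of the original article; adjust it to the fonts available on your system.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib import font_manager

# Candidate CJK-capable fonts in order of preference (hypothetical list).
candidates = ["SimHei", "Microsoft YaHei", "PingFang SC",
              "Noto Sans CJK SC", "WenQuanYi Micro Hei"]

# Names of every font Matplotlib has discovered on this machine.
installed = {font.name for font in font_manager.fontManager.ttflist}

# Keep the candidates that exist, then fall back to Matplotlib's default font.
chosen = [name for name in candidates if name in installed] + ["DejaVu Sans"]

plt.rcParams["font.sans-serif"] = chosen
plt.rcParams["axes.unicode_minus"] = False  # draw "-" as an ASCII hyphen-minus
```

If `chosen` contains only "DejaVu Sans", no candidate was found and Chinese labels will still render as empty boxes.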

2. Templates

2.1 Multiple subplots in one figure

Code:

# Plotting
# Assumes train_size_list, accuracyList, recallList and precisionList are equal-length sequences
# Create a 16x9-inch figure with a 2x2 grid of subplots
fig, axs = plt.subplots(figsize=(16, 9), nrows=2, ncols=2)
# Subplot 1: all three metrics
line1, = axs[0, 0].plot(train_size_list, accuracyList, 'r', label='Accuracy')
line2, = axs[0, 0].plot(train_size_list, recallList, 'b', label='Recall')
line3, = axs[0, 0].plot(train_size_list, precisionList, 'g', label='Precision')
axs[0, 0].set_xlabel('Training-set fraction')
axs[0, 0].set_ylabel('Value')
axs[0, 0].set_title('Accuracy, recall and precision vs. training-set fraction')
axs[0, 0].legend(handles=[line1, line2, line3])  # add a legend
# Subplot 2: accuracy
axs[0, 1].plot(train_size_list, accuracyList, 'r', label='Accuracy')
axs[0, 1].set_xlabel('Training-set fraction')
axs[0, 1].set_ylabel('Accuracy')
axs[0, 1].set_title('Accuracy vs. training-set fraction')
# Subplot 3: recall
axs[1, 0].plot(train_size_list, recallList, 'b', label='Recall')
axs[1, 0].set_xlabel('Training-set fraction')
axs[1, 0].set_ylabel('Recall')
axs[1, 0].set_title('Recall vs. training-set fraction')
# Subplot 4: precision
axs[1, 1].plot(train_size_list, precisionList, 'g', label='Precision')
axs[1, 1].set_xlabel('Training-set fraction')
axs[1, 1].set_ylabel('Precision')
axs[1, 1].set_title('Precision vs. training-set fraction')
# Adjust the spacing between subplots
plt.subplots_adjust(wspace=0.2, hspace=0.3)
plt.show()
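
The template above depends on variables from the author's project. As a self-contained sketch of the same 2x2 layout, the version below uses made-up metric curves (illustrative numbers only) and fills the three single-metric panels in a loop instead of repeating the calls.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up metric curves, for illustration only.
train_sizes = np.linspace(0.1, 0.9, 9)
metrics = {
    "accuracy": 0.70 + 0.20 * train_sizes,
    "recall": 0.60 + 0.30 * train_sizes,
    "precision": 0.65 + 0.25 * train_sizes,
}
colors = {"accuracy": "r", "recall": "b", "precision": "g"}

fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(16, 9))

# Top-left panel: all three curves together.
for name, values in metrics.items():
    axs[0, 0].plot(train_sizes, values, colors[name], label=name)
axs[0, 0].set_title("all metrics")
axs[0, 0].legend()

# Remaining panels: one curve each; axs.flat walks the grid row by row.
for ax, (name, values) in zip(axs.flat[1:], metrics.items()):
    ax.plot(train_sizes, values, colors[name])
    ax.set_title(name)

for ax in axs.flat:
    ax.set_xlabel("training-set fraction")

fig.subplots_adjust(wspace=0.2, hspace=0.3)
# plt.show()  # uncomment when using an interactive backend
```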


2.2 Scatter plot

Code:

# Split the rows by class label
positive = dataSet1[dataSet1['Admitted'] == 1]
negative = dataSet1[dataSet1['Admitted'] == 0]
fig, ax = plt.subplots(figsize=(16, 9))
# One scatter call per class so each gets its own color, marker and legend entry
ax.scatter(positive['Exam 1'], positive['Exam 2'], s=50, c='b', marker='o', label='Admitted')
ax.scatter(negative['Exam 1'], negative['Exam 2'], s=50, c='r', marker='x', label='Not Admitted')
ax.legend()
ax.set_xlabel('Exam 1')
ax.set_ylabel('Exam 2')
plt.show()
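
For reference, here is a runnable sketch of the same two-class scatter plot on randomly generated scores. The data and the admission rule are invented for illustration; the grouping with `groupby` is one possible alternative to the two boolean masks above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic exam scores; the "admitted" rule is made up for the example.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Exam 1": rng.uniform(30, 100, 40),
    "Exam 2": rng.uniform(30, 100, 40),
})
df["Admitted"] = (df["Exam 1"] + df["Exam 2"] > 130).astype(int)

# Per-class drawing style, keyed by the label value.
styles = {
    1: dict(c="b", marker="o", label="Admitted"),
    0: dict(c="r", marker="x", label="Not Admitted"),
}

fig, ax = plt.subplots(figsize=(8, 5))
for value, group in df.groupby("Admitted"):
    ax.scatter(group["Exam 1"], group["Exam 2"], s=50, **styles[value])
ax.set_xlabel("Exam 1")
ax.set_ylabel("Exam 2")
ax.legend()
# plt.show()  # uncomment when using an interactive backend
```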


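2.3 Multiple lines in one figure

Code:

import numpy as np

# Draw the line chart
# x axis: 1-based index of each validation sample
codeList = np.arange(1, len(y_real) + 1)

line1, = plt.plot(codeList, y_linear, 'r', label='Linear regression')
line2, = plt.plot(codeList, y_ridge, 'b', label='Ridge regression')
line3, = plt.plot(codeList, y_lasson, 'g', label='Lasso regression')
line4, = plt.plot(codeList, y_real, 'y', label='Actual value')
plt.xlabel('Validation sample index')
plt.ylabel('Value')
plt.title('Performance of the three models on the validation set')
# Add a legend
plt.legend(handles=[line1, line2, line3, line4])
plt.show()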


2.4 Multiple bar-chart subplots in one figure (with named x-axis ticks)

Code:

# Plotting
# Named r2_scores rather than r2_score so it cannot shadow sklearn.metrics.r2_score
r2_scores = [r2_linear, r2_ridge, r2_lasson]
rmse = [rmse_linear, rmse_ridge, rmse_lasson]
# Model names and their x-axis positions
models = ['Linear', 'Ridge', 'Lasso']
x = range(len(models))

# First subplot (bar chart)
plt.subplot(1, 2, 1)  # 1 row, 2 columns; select the first subplot
plt.bar(x, r2_scores, color='b')
plt.title('R2_score')
plt.xlabel('Model')
plt.ylabel('Value')
plt.xticks(x, models, rotation=45)  # set the x-axis tick positions and labels
plt.legend(['R2_score'])  # add a legend

# Second subplot (bar chart)
plt.subplot(1, 2, 2)  # 1 row, 2 columns; select the second subplot
plt.bar(x, rmse, color='r')
plt.title('RMSE')
plt.xlabel('Model')
plt.ylabel('Value')
plt.xticks(x, models, rotation=45)  # set the x-axis tick positions and labels
plt.legend(['RMSE'])  # add a legend

# Adjust the spacing between subplots
plt.tight_layout()
# Show the figure
plt.show()


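2.6 Multiple plot types in one figure (scatter and line)

Code:

# Plot the data together with the line fitted by gradient descent
# X and w_result come from the fitting step (presumably the design matrix and the learned weights)
plt.scatter(dataSet1['人口'], dataSet1['收益'], color='blue')  # columns: 人口 = population, 收益 = profit
plt.xlabel("population")
plt.ylabel("profit")
predicted_profit = np.dot(X, w_result.T)
plt.plot(dataSet1['人口'], predicted_profit, 'r')
plt.show()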
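
The snippet above needs `X` and `w_result` from a gradient-descent step that is not shown. As a stand-in that runs on its own, the sketch below fits the line with `np.polyfit` (ordinary least squares, not gradient descent) on synthetic data and overlays it on the scatter plot. All numbers are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: profit is roughly linear in population, plus noise.
rng = np.random.default_rng(0)
population = rng.uniform(5, 25, 50)
profit = 1.2 * population - 4 + rng.normal(0, 1.5, 50)

# Least-squares straight line (degree-1 polynomial).
slope, intercept = np.polyfit(population, profit, deg=1)

# Sort x so the fitted line is drawn left to right.
order = np.argsort(population)

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(population, profit, color="blue", label="data")
ax.plot(population[order], slope * population[order] + intercept, "r", label="fitted line")
ax.set_xlabel("population")
ax.set_ylabel("profit")
ax.legend()
# plt.show()  # uncomment when using an interactive backend
```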


2.7 Showing an x-axis label only every few ticks

# Create a 16x5-inch figure
plt.figure(figsize=(16, 5))
plt.plot(timeSeries, labels, 'g')
plt.xlabel('Quarter')
# Set up the x-axis
# Get the original x-axis tick positions and labels
x_ticks, x_labels = plt.xticks()
# Keep every 4th tick position and label
new_x_ticks = x_ticks[::4]
new_x_labels = x_labels[::4]
# Apply the new tick positions and labels
plt.xticks(new_x_ticks, new_x_labels, fontsize=10, rotation=30)
plt.ylabel('Value')
plt.title('Change curve of comprehensive indicators in the next 10 years')
plt.show()
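
An alternative that does not depend on reading the tick labels back from the figure is to plot against integer positions and set the thinned ticks directly. The sketch below assumes 40 quarterly labels generated for the example; the label format and values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# 40 made-up quarter labels: 2024Q1 ... 2033Q4.
quarters = [f"{year}Q{q}" for year in range(2024, 2034) for q in range(1, 5)]
positions = np.arange(len(quarters))
values = np.sin(positions / 5.0)  # placeholder series

fig, ax = plt.subplots(figsize=(16, 5))
ax.plot(positions, values, "g")

# Show one label per year: every 4th position with its matching label.
step = 4
ax.set_xticks(positions[::step])
ax.set_xticklabels(quarters[::step], fontsize=10, rotation=30)
ax.set_xlabel("Quarter")
ax.set_ylabel("Value")
# plt.show()  # uncomment when using an interactive backend
```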


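2.7 Box plot with plt

Dataset (dataSet): a pandas DataFrame; boxplot() draws one box per numeric column.

plt.figure(figsize=(10, 6))
dataSet.boxplot()  # draws onto the current axes of the figure created above
plt.title('Boxplot of Features')
plt.xticks(rotation=45)  # rotate the column names so they do not overlap
plt.tight_layout()
plt.show()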
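
A self-contained sketch of the same box plot, using a synthetic DataFrame (column names and distributions are made up) and passing the axes explicitly through the `ax` parameter rather than relying on the current figure:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Three synthetic feature columns with different spreads.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.normal(0.0, 1.0, 100),
    "b": rng.normal(2.0, 0.5, 100),
    "c": rng.exponential(1.0, 100),
})

fig, ax = plt.subplots(figsize=(10, 6))
df.boxplot(ax=ax)  # one box per numeric column, drawn on the given axes
ax.set_title("Boxplot of Features")
ax.tick_params(axis="x", rotation=45)
fig.tight_layout()
# plt.show()  # uncomment when using an interactive backend
```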


2.8 Two bars per x position

# Weekday data (day_of_week < 5): number of logins per hour
y1 = login[login['day_of_week'] < 5].groupby('day_of_hour').size().reset_index(name='login_count')['login_count']
# Non-working-day data (day_of_week > 4)
y2 = login[login['day_of_week'] > 4].groupby('day_of_hour').size().reset_index(name='login_count')['login_count']
plt.figure(figsize=(16,9))
# Width of each bar
bar_width = 0.35
# x coordinates of the weekday bars and the non-working-day bars
x1 = np.arange(len(y1))
x2 = x1 + bar_width
# Weekdays: bars, a dashed trend line, and a value label on each bar
color1 = 'dodgerblue'
plt.bar(x1, y1, width=bar_width, color=color1, label='Weekday')
plt.plot(x1, y1, marker='o' , linestyle='--', color='r')
for i, j in zip(x1, y1):
    plt.text(i, j, str(j))
# Non-working days: same, with the value labels lifted by 100 to avoid overlap
color2 = 'orange'
plt.bar(x2, y2, width=bar_width, color=color2, label='Non-working day')
plt.plot(x2,  y2, marker='o', linestyle='--', color='g')
for i, j in zip(x2, y2):
    plt.text(i, j+100, str(j))
# Put the x-axis tick labels midway between each pair of bars
plt.xticks(x1 + bar_width / 2, x1)
# Set the title and axis labels
plt.title('Logins by hour')
plt.xlabel('Hour')
plt.ylabel('Number of logins')
# Add a legend
plt.legend()
# Show the figure
plt.show()
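
As a runnable sketch of the grouped-bar layout, the version below uses invented login counts and labels the bars with `Axes.bar_label` instead of the `plt.text` loops. This assumes Matplotlib 3.4 or newer, where `bar_label` was introduced.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up login counts for six hours of the day.
hours = np.arange(6)
weekday = np.array([120, 80, 60, 90, 150, 200])
weekend = np.array([100, 95, 70, 60, 110, 130])

width = 0.35
fig, ax = plt.subplots(figsize=(10, 5))

# Shift each series half a bar width so the pair is centered on the hour.
bars1 = ax.bar(hours - width / 2, weekday, width, color="dodgerblue", label="weekday")
bars2 = ax.bar(hours + width / 2, weekend, width, color="orange", label="non-working day")

# Write each bar's height above it (requires Matplotlib >= 3.4).
ax.bar_label(bars1)
ax.bar_label(bars2)

ax.set_xticks(hours)
ax.set_xlabel("hour")
ax.set_ylabel("number of logins")
ax.legend()
# plt.show()  # uncomment when using an interactive backend
```

Centering the pair on the tick (minus and plus half a width) avoids the separate tick offset used in the template above.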


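2.9 Pie chart

# Note: `all` is the DataFrame here, and the name shadows Python's built-in all()
m = all.shape[0]  # total number of rows in the dataset
# Upper bounds use <= so that values of exactly 60 or 90 days are not dropped
values = [all[all['logged_now_time'] <= 30],  # users who logged in within the last 30 days
          all[(all['logged_now_time'] > 30) & (all['logged_now_time'] <= 60)],  # last login 30 to 60 days ago
          all[(all['logged_now_time'] > 60) & (all['logged_now_time'] <= 90)],  # last login 60 to 90 days ago
          all[all['logged_now_time'] > 90]]  # no login for more than 90 days (churned users)
# Labels
labels = ['Logged in within 30 days', 'Logged in 30 to 60 days ago', 'Logged in 60 to 90 days ago', 'No login for over 90 days']
# Percentage of users in each group
sizes = [df.shape[0] / m * 100 for df in values]
# Colors
colors = ['red', 'green', 'dodgerblue', 'orange']
# Pull one wedge out to highlight it (here the churned users)
explode = (0, 0, 0, 0.1)
# Draw the pie chart
plt.pie(sizes, explode=explode, labels=labels, colors=colors, autopct='%1.1f%%', shadow=True)
# Add a title
plt.title('User login status')
# Add a text annotation
plt.text(0.8, -1.5, f'Churn rate: {values[-1].shape[0]/m*100:.2f}%', fontsize=20, weight='bold', color='red', ha='right')
# Show the figure
plt.show()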


2.10 ROC curve

from sklearn.metrics import roc_curve, auc

# pred_* are each model's outputs on the test set; scores or positive-class
# probabilities give a full curve, while hard 0/1 labels give only a single corner point

# Logistic Regression
fpr_lr, tpr_lr, thresholds_lr = roc_curve(y_test, pred_lr)
roc_auc_lr = auc(fpr_lr, tpr_lr)

# KNN
fpr_knn, tpr_knn, thresholds_knn = roc_curve(y_test, pred_knn)
roc_auc_knn = auc(fpr_knn, tpr_knn)

# Naive Bayes
fpr_bayes, tpr_bayes, thresholds_bayes = roc_curve(y_test, pred_bayes)
roc_auc_bayes = auc(fpr_bayes, tpr_bayes)

# Decision Tree
fpr_tree, tpr_tree, thresholds_tree = roc_curve(y_test, pred_tree)
roc_auc_tree = auc(fpr_tree, tpr_tree)

plt.figure(figsize=(8, 5))

# Plot the ROC curves: the first argument is the false positive rate (x), the second the true positive rate (y)
plt.plot(fpr_lr, tpr_lr, linewidth=2, linestyle='--', label='Logistic Regression (area = %0.2f)' % roc_auc_lr)
plt.plot(fpr_knn, tpr_knn, linewidth=2, linestyle='--', label='KNN (area = %0.2f)' % roc_auc_knn)
plt.plot(fpr_bayes, tpr_bayes, linewidth=2, linestyle='--', label='Naive Bayes (area = %0.2f)' % roc_auc_bayes)
plt.plot(fpr_tree, tpr_tree, linewidth=2, linestyle='--', label='Decision Tree (area = %0.2f)' % roc_auc_tree)

# Diagonal reference line: a random-guess classifier
plt.plot([0, 1], [0, 1], 'k--', label='Guess')
plt.title('ROC Curves for Multiple Classifiers', fontsize=15)
plt.xlabel('False Positive Rate', fontsize=12)
plt.ylabel('True Positive Rate', fontsize=12)
plt.xlim(0, 1)
plt.ylim(0, 1)
plt.legend(loc='lower right', fontsize=10)
plt.show()
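
To see what `roc_curve` and `auc` return without training four models, here is a minimal sketch on a four-sample toy problem. The labels and scores are invented; in a real pipeline the scores would typically come from something like a classifier's `predict_proba(X_test)[:, 1]`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import roc_curve, auc

# Toy ground truth and predicted scores (higher = more likely positive).
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# roc_curve sweeps the decision threshold and returns one (fpr, tpr) point per step.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
roc_auc = auc(fpr, tpr)  # area under the curve by the trapezoidal rule

fig, ax = plt.subplots(figsize=(6, 5))
ax.plot(fpr, tpr, linewidth=2, label="toy model (area = %0.2f)" % roc_auc)
ax.plot([0, 1], [0, 1], "k--", label="Guess")
ax.set_xlabel("False Positive Rate")
ax.set_ylabel("True Positive Rate")
ax.legend(loc="lower right")
# plt.show()  # uncomment when using an interactive backend
```

Three of the four positive/negative pairs are ranked correctly here, so the area is 0.75.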


2.11 Subplots in one figure (built in a loop)

# Plotting
fig, axs = plt.subplots(figsize=(16, 13), nrows=3, ncols=1)
dfs = [course_user_num, price_course_user_num, free_course_user_num]
colors = ['#2295A2', '#E3A031', '#DD7C83']
titles = ['Enrollments, all courses', 'Enrollments, paid courses', 'Enrollments, free courses']
number = 20  # number of courses to show
for i in range(len(dfs)):
    df = dfs[i]
    df = df.sort_values('user_count', ascending=False)
    x, y = df['course_id'][:number], df['user_count'][:number]
    positions = range(len(x))  # explicit bar positions, so the ticks line up even if course_id is numeric
    axs[i].bar(positions, y, color=colors[i], label=titles[i])
    axs[i].set_xticks(positions)  # set the tick positions
    axs[i].set_xticklabels(x, rotation=0)  # set the tick labels and their rotation
    axs[i].set_ylabel('Enrollments')
    axs[i].set_title(titles[i], color='r')
    axs[i].legend()

plt.subplots_adjust(hspace=0.3)
plt.show()


2.12 Dual y-axis plot

# Group by course and compute the enrollment count and the mean price
course_info = study_information.groupby('course_id').agg({'user_id': 'count', 'price': 'mean'}).reset_index()
course_info.columns = ['course_id', 'user_count', 'avg_price']
# Sort by enrollment count in descending order and keep the top 60 courses
course_info = course_info.sort_values('user_count', ascending=False).head(60)
# Explicit x positions, so bars, line and ticks line up even if course_id is numeric
positions = range(len(course_info))
# Create the figure
fig, ax1 = plt.subplots(figsize=(16, 9))
# Left y-axis: enrollments (bar chart)
color = '#2295A2'
ax1.set_xlabel('Course ID')
ax1.set_ylabel('Enrollments', color=color)
ax1.bar(positions, course_info['user_count'], color=color)
ax1.tick_params(axis='y', labelcolor=color)
# Right y-axis: course price (line chart)
ax2 = ax1.twinx()
color = '#F8766E'
ax2.set_ylabel('Course price', color=color)
ax2.plot(positions, course_info['avg_price'], color=color, marker='o')
ax2.tick_params(axis='y', labelcolor=color)
# x-axis tick labels and their rotation
ax1.set_xticks(positions)
ax1.set_xticklabels(course_info['course_id'], rotation=90)
# Add legends
ax1.legend(['Enrollments'], loc='upper left', fontsize=10)
ax2.legend(['Course price'], loc='upper right', fontsize=10)
# Add an overall title
plt.suptitle('Course enrollments vs. price', fontsize=16)
# Adjust the layout
plt.tight_layout()
plt.show()
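
With `twinx()`, each axes keeps its own legend entries, which is why the template above draws two legends. A sketch of one common alternative is to merge the handles from both axes into a single legend. The data below are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up enrollments and prices for five courses.
x = np.arange(5)
counts = [50, 40, 30, 20, 10]
prices = [9.9, 19.9, 0.0, 29.9, 4.9]

fig, ax1 = plt.subplots(figsize=(8, 5))
ax1.bar(x, counts, color="#2295A2", label="enrollments")
ax1.set_ylabel("enrollments")

# Second y-axis sharing the same x-axis.
ax2 = ax1.twinx()
ax2.plot(x, prices, color="#F8766E", marker="o", label="price")
ax2.set_ylabel("price")

# Collect the legend entries of both axes and draw them as one legend.
h1, l1 = ax1.get_legend_handles_labels()
h2, l2 = ax2.get_legend_handles_labels()
ax2.legend(h1 + h2, l1 + l2, loc="upper right")
# plt.show()  # uncomment when using an interactive backend
```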


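2.14 Frequency histogram

# Frequency histograms of the insertion, deletion and substitution rates
fig, axs = plt.subplots(3, 4, figsize=(16, 10))  # create a 3x4 grid of subplots
bins = 20
alpha = 0.5
# One color per base
colors = ['red', 'blue', 'green', 'orange']
for i, rate in enumerate(['插入率', '删除率', '替换率']):  # column-name parts: insertion, deletion, substitution rate
    for j, jianji in enumerate(['A', 'T', 'G', 'C']):  # jianji = base
        color = colors[j]  # pick this base's color from the list
        title = jianji + '_' + rate + '(%)'  # column name in data, e.g. 'A_插入率(%)'
        temp = data[title]
        axs[i, j].hist(temp, bins=bins, alpha=alpha, color=color, label=jianji)
        axs[i, j].set_title(jianji + '-' + rate)
        axs[i, j].set_xlabel(rate + '(%)')
        axs[i, j].set_ylabel('Frequency')

# Add a legend: collect one handle per base from the first row
# (taking them from axs[0, 0] alone would list only 'A')
handles, labels = [], []
for ax in axs[0]:
    h, l = ax.get_legend_handles_labels()
    handles += h
    labels += l
fig.legend(handles, labels, loc='upper right')
# Add an overall title
plt.suptitle('Frequency histograms of insertion, deletion and substitution rates', fontsize=16, color='blue')
# Adjust the layout
plt.tight_layout()
plt.show()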
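
By default `hist` plots raw counts per bin. If relative frequencies are wanted (bar heights that sum to 1), one option is to weight every sample by 1/n, as in the sketch below. The sample is synthetic and stands in for one rate column.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for one rate column, in percent.
rng = np.random.default_rng(0)
sample = rng.normal(1.0, 0.3, 500)

# Each sample contributes 1/n, so the bar heights are fractions of the total.
weights = np.full(sample.size, 1.0 / sample.size)

fig, ax = plt.subplots(figsize=(6, 4))
heights, edges, _ = ax.hist(sample, bins=20, weights=weights, alpha=0.5, color="red")
ax.set_xlabel("rate (%)")
ax.set_ylabel("relative frequency")
# plt.show()  # uncomment when using an interactive backend
```

This differs from `density=True`, which scales the bars so that their total area, not the sum of their heights, equals 1.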