Chapter 3: Colorful Python Data Visualization

赵孝正

Last modified: 2023-05-20 21:19:58

Category: Python Data Analysis: Mastering Excel with pandas. Tags: python, information visualization, pandas

First published: 2023-04-01 21:55:12

Article link: https://blog.csdn.net/weixin_46713695/article/details/129903416

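9. Bar charts (bar)
- 9.1 Plotting with pandas
- 9.2 Plotting with matplotlib.pyplot
10. Grouped bar charts, polished
- 10.1 Plotting with pandas
11. Stacked and horizontal bar charts
- 11.1 Plotting in Excel
- 11.2 Plotting with pandas
12. Pie charts (pie)
13. Line trend charts and stacked area charts
- 12.1 Plotting in Excel
- 12.2 Plotting in Python
14. Scatter plots and histograms
15. Density plots and data correlation
- 14.3 A brief look at pandas' powerful data analysis features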

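A table beats text, and a chart beats a table: with good charts, a data analyst can influence the boss's decisions. pandas can produce more than ten kinds of charts through its own plot interface, which draws with matplotlib under the hood, so no other plotting library has to be called directly.

When a dataset is too large for Excel to open, pandas is the tool to use.

9. Bar charts (bar)

9.1 Plotting with pandas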

import pandas as pd

students = pd.read_excel('C:/Temp/Students.xlsx')
print(students)

import pandas as pd
import matplotlib.pyplot as plt

students = pd.read_excel('C:/Temp/Students.xlsx')
students.plot.bar(x='Field', y='Number')
print(students)
plt.show()

Sort from highest to lowest (descending order):

import pandas as pd
import matplotlib.pyplot as plt

students = pd.read_excel('C:/Temp/Students.xlsx')
students.sort_values(by='Number', inplace=True, ascending=False)
students.plot.bar(x='Field', y='Number')
print(students)
plt.show()
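
To check what the descending sort does before any plotting, here is a minimal sketch on a small in-memory table. The field names and numbers are invented for illustration; only the column names Field and Number come from the example above.

```python
import pandas as pd

# Hypothetical stand-in for Students.xlsx (values invented for illustration)
students = pd.DataFrame({
    'Field': ['Math', 'Physics', 'Biology', 'History'],
    'Number': [120, 340, 90, 210],
})

# ascending=False puts the largest Number first; a bar chart drawn
# from this frame shows its bars in the same row order, left to right
ordered = students.sort_values(by='Number', ascending=False)
print(ordered['Field'].tolist())
```

Because inplace is not used here, students keeps its original row order and ordered holds the sorted copy.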

The bars come out in different colors, which looks messy, so set a single color:

import pandas as pd
import matplotlib.pyplot as plt

students = pd.read_excel('C:/Temp/Students.xlsx')
students.sort_values(by='Number', inplace=True, ascending=False)
students.plot.bar(x='Field', y='Number', color='orange')  # changed: one color for all bars
print(students)
plt.show()

Show the labels in full:

import pandas as pd
import matplotlib.pyplot as plt

students = pd.read_excel('C:/Temp/Students.xlsx')
students.sort_values(by='Number', inplace=True, ascending=False)
students.plot.bar(x='Field', y='Number', color='orange')
plt.tight_layout()  # new: tighten the layout so the x-axis labels are shown in full
plt.show()

Add a title:

import pandas as pd
import matplotlib.pyplot as plt

students = pd.read_excel('C:/Temp/Students.xlsx')
students.sort_values(by='Number', inplace=True, ascending=False)
students.plot.bar(x='Field', y='Number', color='orange',
                  title='International Students by Field')  # new: title
plt.tight_layout()  # tighten the layout so the x-axis labels are shown in full
plt.show()

9.2 Plotting with matplotlib.pyplot

import pandas as pd
import matplotlib.pyplot as plt

students = pd.read_excel('C:/Temp/Students.xlsx')
students.sort_values(by='Number', inplace=True, ascending=False)
# pandas version: students.plot.bar(x='Field', y='Number', color='orange', title='International Students by Field')
plt.bar(students.Field, students.Number)
plt.tight_layout()  # tighten the layout so the x-axis labels are shown in full
plt.show()

In this chart the x-axis labels overlap; spread them out with the rotation parameter:

import pandas as pd
import matplotlib.pyplot as plt

students = pd.read_excel('C:/Temp/Students.xlsx')
students.sort_values(by='Number', inplace=True, ascending=False)
# pandas version: students.plot.bar(x='Field', y='Number', color='orange', title='International Students by Field')
plt.bar(students.Field, students.Number, color='orange')
plt.xticks(students.Field, rotation=90)  # new: rotate the x tick labels by 90 degrees (a number, not the string '90')
plt.tight_layout()  # tighten the layout so the x-axis labels are shown in full
plt.show()

Add axis labels and a title:

import pandas as pd
import matplotlib.pyplot as plt

students = pd.read_excel('C:/Temp/Students.xlsx')
students.sort_values(by='Number', inplace=True, ascending=False)
# pandas version: students.plot.bar(x='Field', y='Number', color='orange', title='International Students by Field')
plt.bar(students.Field, students.Number, color='orange')
plt.xticks(students.Field, rotation=90)  # rotate the x tick labels by 90 degrees

plt.xlabel('Field')
plt.ylabel('Number')
plt.title('International Students by Field', fontsize=16)

plt.tight_layout()  # tighten the layout so the x-axis labels are shown in full
plt.show()

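10. Grouped bar charts, polished

In an age that judges by appearance, even a chart is no exception.
Target chart for this section: a grouped comparison.
Key points:

Slanted x-axis labels
Color adjustments
Font sizes of the title and axis labels

10.1 Plotting with pandas

Comparing two columns of data: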

import pandas as pd

students = pd.read_excel('C:/Temp/Students.xlsx')
print(students)

Draw the chart:

import pandas as pd
import matplotlib.pyplot as plt

students = pd.read_excel('C:/Temp/Students.xlsx')
students.plot.bar(x='Field', y=['2016', '2017'], color=['orange', 'red'])  
plt.show()

Sort:
inplace=True sorts the DataFrame in place instead of creating a new one; ascending=False orders the rows from largest to smallest.

import pandas as pd
import matplotlib.pyplot as plt

students = pd.read_excel('C:/Temp/Students.xlsx')
students.sort_values(by='2017', inplace=True, ascending=False)
students.plot.bar(x='Field', y=['2016', '2017'], color=['orange', 'red'])  
plt.show()
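
The difference between sorting in place and getting a sorted copy can be seen without drawing anything. A minimal sketch with invented numbers; only the column names mirror the example above.

```python
import pandas as pd

# Hypothetical data; values invented for illustration
students = pd.DataFrame({
    'Field': ['Math', 'Physics', 'Biology'],
    '2016': [100, 300, 80],
    '2017': [150, 280, 95],
})

# Without inplace, sort_values returns a new, sorted DataFrame
# and leaves the original row order untouched
sorted_copy = students.sort_values(by='2017', ascending=False)
original_order = students['Field'].tolist()

# With inplace=True the call returns None and reorders students itself
result = students.sort_values(by='2017', ascending=False, inplace=True)
print(original_order, sorted_copy['Field'].tolist(), result, students['Field'].tolist())
```

Either style works for plotting; the in-place form simply avoids keeping a second variable around.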

Loosen the layout so the labels fit:

import pandas as pd
import matplotlib.pyplot as plt

students = pd.read_excel('C:/Temp/Students.xlsx')
students.sort_values(by='2017', inplace=True, ascending=False)
students.plot.bar(x='Field', y=['2016', '2017'], color=['orange', 'red'])  
plt.tight_layout()  # tighten the layout so the x-axis labels are shown in full
plt.show()

Add a title:

import pandas as pd
import matplotlib.pyplot as plt

students = pd.read_excel('C:/Temp/Students.xlsx')
students.sort_values(by='2017', inplace=True, ascending=False)
students.plot.bar(x='Field', y=['2016', '2017'], color=['orange', 'red'], title='International Students by Field')  
plt.tight_layout()  # tighten the layout so the x-axis labels are shown in full
plt.show()

To make the title 16 pt and bold, set it with plt.title instead, since the title argument of plot.bar only takes the text:

import pandas as pd
import matplotlib.pyplot as plt

students = pd.read_excel('C:/Temp/Students.xlsx')
students.sort_values(by='2017', inplace=True, ascending=False)
students.plot.bar(x='Field', y=['2016', '2017'], color=['orange', 'red'])  
plt.title('International Students by Field', fontsize=16, fontweight='bold')
plt.tight_layout()  # tighten the layout so the x-axis labels are shown in full
plt.show()

The x-axis already has a default label; now set the x-axis and y-axis labels explicitly:

import pandas as pd
import matplotlib.pyplot as plt

students = pd.read_excel('C:/Temp/Students.xlsx')
students.sort_values(by='2017', inplace=True, ascending=False)
students.plot.bar(x='Field', y=['2016', '2017'], color=['orange', 'red'])  
plt.title('International Students by Field', fontsize=16, fontweight='bold')
plt.xlabel('Field', fontweight='bold')  # bold
plt.ylabel('Number', fontweight='bold')  # bold
plt.tight_layout()  # tighten the layout so the x-axis labels are shown in full
plt.show()

Slant the labels by 45 degrees.
A chart drawn with matplotlib is basically made of two parts: the figure and the axes.

To adjust the axes, first get hold of the Axes object by calling plt.gca().

import pandas as pd
import matplotlib.pyplot as plt

students = pd.read_excel('C:/Temp/Students.xlsx')
students.sort_values(by='2017', inplace=True, ascending=False)
students.plot.bar(x='Field', y=['2016', '2017'], color=['orange', 'red'])  
plt.title('International Students by Field', fontsize=16, fontweight='bold')
plt.xlabel('Field', fontweight='bold')  # bold
plt.ylabel('Number', fontweight='bold')  # bold
ax = plt.gca()  # gca is short for 'get current axes'
ax.set_xticklabels(students['Field'], rotation=45)  # lay out the x tick labels again, rotated 45 degrees
plt.tight_layout()  # tighten the layout so the x-axis labels are shown in full
plt.show()
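
pandas plotting methods return the matplotlib Axes they drew on, so the Axes can also be captured directly rather than fetched with plt.gca(). The sketch below illustrates the figure/axes relationship with invented data; the Agg backend is selected only so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch; drop this line to view the chart with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; column names mirror the example above
students = pd.DataFrame({
    'Field': ['Math', 'Physics', 'Biology'],
    '2016': [100, 300, 80],
    '2017': [150, 280, 95],
})

# plot.bar returns the Axes object it drew on
ax = students.plot.bar(x='Field', y=['2016', '2017'], color=['orange', 'red'])

same_axes = ax is plt.gca()        # the returned Axes is also the current Axes
same_figure = ax.figure is plt.gcf()  # and it lives inside the current Figure

# Rotate the existing tick labels without re-supplying their text
for label in ax.get_xticklabels():
    label.set_rotation(45)

rotations = [label.get_rotation() for label in ax.get_xticklabels()]
print(same_axes, same_figure, rotations)
```

Rotating the existing labels this way avoids passing the label text a second time, which keeps the labels in step with the data if the sort order changes.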

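The result looks a little off: the x tick labels are not aligned with their ticks, because each label is rotated about its midpoint. Keep adjusting.
ha stands for horizontal alignment. With ha='right', each label is anchored at its right end, so the 45-degree rotation pivots there and the end of the label meets its tick.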

import pandas as pd
import matplotlib.pyplot as plt

students = pd.read_excel('C:/Temp/Students.xlsx')
students.sort_values(by='2017', inplace=True, ascending=False)
students.plot.bar(x='Field', y=['2016', '2017'], color=['orange', 'red'])  
plt.title('International Students by Field', fontsize=16, fontweight='bold')
plt.xlabel('Field', fontweight='bold')  # bold
plt.ylabel('Number', fontweight='bold')  # bold
ax = plt.gca()  # gca is short for 'get current axes'
ax.set_xticklabels(students['Field'], rotation=45, ha='right')  # lay out the x tick labels again; ha = horizontal alignment
plt.tight_layout()  # tighten the layout so the x-axis labels are shown in full
plt.show()

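A new problem: too much blank space is left.

Get the current figure with plt.gcf() and adjust the subplot margins with subplots_adjust(): leave 20% of the width on the left and 42% of the height at the bottom, and comment out plt.tight_layout().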

import pandas as pd
import matplotlib.pyplot as plt

students = pd.read_excel('C:/Temp/Students.xlsx')
students.sort_values(by='2017', inplace=True, ascending=False)
students.plot.bar(x='Field', y=['2016', '2017'], color=['orange', 'red'])  
plt.title('International Students by Field', fontsize=16, fontweight='bold')
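plt.xlabel('Field', fontweight='bold')  # bold
plt.ylabel('Number', fontweight='bold')  # bold
ax = plt.gca()  # gca is short for 'get current axes'
ax.set_xticklabels(students['Field'], rotation=45, ha='right')  # lay out the x tick labels again; ha = horizontal alignment
f = plt.gcf()  # gcf is short for 'get current figure'
f.subplots_adjust(left=0.2, bottom=0.42)  # margins as fractions of the figure size
# plt.tight_layout()  # commented out in favor of subplots_adjust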
plt.show()
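
The margins given to subplots_adjust are fractions of the figure size, which can be confirmed by reading them back from the figure. A minimal sketch with invented labels, using the Agg backend only so that it runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch only
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar(['a fairly long label', 'another long label'], [3, 5], color='orange')
plt.setp(ax.get_xticklabels(), rotation=45, ha='right')

# The plotting area starts 20% in from the left edge of the figure
# and 42% up from its bottom edge
fig.subplots_adjust(left=0.2, bottom=0.42)

left, bottom = fig.subplotpars.left, fig.subplotpars.bottom
print(left, bottom)
```

Unlike tight_layout, which computes the margins from the label sizes, subplots_adjust uses exactly the fractions it is given, so the values may need tuning by eye for other label lengths.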

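11. Stacked and horizontal bar charts

Stacked bar chart:
Rotated by 90°, it becomes a horizontal bar chart.

11.1 Plotting in Excel

About the data: user ID, user name, and each user's usage count in October, November, and December. The goal is a stacked bar chart of each user's usage over those three months.
Select the data range, then insert the chart.

Then switch it to the horizontal layout.

11.2 Plotting with pandas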

import pandas as pd

users= pd.read_excel('C:/Temp/Users.xlsx')
print(users)

Draw a grouped bar chart (as in the previous section):

import pandas as pd
import matplotlib.pyplot as plt

users= pd.read_excel('C:/Temp/Users.xlsx')
users.plot.bar(x='Name', y=['Oct', 'Nov', 'Dec'])
plt.show()

Turn it into a stacked bar chart by adding the parameter stacked=True:

import pandas as pd
import matplotlib.pyplot as plt

users= pd.read_excel('C:/Temp/Users.xlsx')
users.plot.bar(x='Name', y=['Oct', 'Nov', 'Dec'], stacked=True)
plt.show()

The labels are cut off in that chart; fix this by adding plt.tight_layout():

import pandas as pd
import matplotlib.pyplot as plt

users= pd.read_excel('C:/Temp/Users.xlsx')
users.plot.bar(x='Name', y=['Oct', 'Nov', 'Dec'], stacked=True)
plt.tight_layout()
plt.show()

Sort the users by their total usage (a title can be added with the title argument, as in section 10.1):

import pandas as pd
import matplotlib.pyplot as plt

users= pd.read_excel('C:/Temp/Users.xlsx')
users['Total'] = users['Oct'] + users['Nov'] + users['Dec']
users.sort_values(by='Total', inplace=True, ascending=False)
users.plot.bar(x='Name', y=['Oct', 'Nov', 'Dec'], stacked=True)
plt.tight_layout()
plt.show()
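
In a stacked bar chart the height of each full stack is the row total, which is why sorting by Total orders the stacks by height. The sketch below checks this on invented data; the Agg backend is used only so that it runs without a display, and sum(axis=1) is shown as an alternative to adding the three columns by hand.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch only
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for Users.xlsx (values invented for illustration)
users = pd.DataFrame({
    'Name': ['Ann', 'Bob', 'Cid'],
    'Oct': [5, 9, 2],
    'Nov': [7, 1, 4],
    'Dec': [3, 8, 6],
})

months = ['Oct', 'Nov', 'Dec']
# Row-wise sum; equivalent to users['Oct'] + users['Nov'] + users['Dec']
users['Total'] = users[months].sum(axis=1)
users = users.sort_values(by='Total', ascending=False)

ax = users.plot.bar(x='Name', y=months, stacked=True)

# Each segment starts where the previous one ended, so the top of the
# last (Dec) segment equals the user's total
dec_bars = ax.containers[-1]
stack_tops = [float(bar.get_y() + bar.get_height()) for bar in dec_bars]
print(users['Name'].tolist(), stack_tops)
```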

To make the chart horizontal, change plot.bar to plot.barh (h for horizontal):

import pandas as pd
import matplotlib.pyplot as plt

users= pd.read_excel('C:/Temp/Users.xlsx')
users['Total'] = users['Oct'] + users['Nov'] + users['Dec']
users.sort_values(by='Total', inplace=True, ascending=False)
users.plot.barh(x='Name', y=['Oct', 'Nov', 'Dec'], stacked=True)
plt.tight_layout()
plt.show()

barh draws the first row at the bottom, so sort with ascending=True to put the largest total at the top:

import pandas as pd
import matplotlib.pyplot as plt

users= pd.read_excel('C:/Temp/Users.xlsx')
users['Total'] = users['Oct'] + users['Nov'] + users['Dec']
users.sort_values(by='Total', inplace=True, ascending=True)
users.plot.barh(x='Name', y=['Oct', 'Nov', 'Dec'], stacked=True)
plt.tight_layout()
plt.show()
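
The bottom-to-top drawing order of barh can be verified by reading the y tick labels back from the Axes. A minimal sketch with invented totals, using the Agg backend only so that it runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch only
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical totals (values invented for illustration)
users = pd.DataFrame({
    'Name': ['Ann', 'Bob', 'Cid'],
    'Total': [15, 18, 12],
})

# Ascending sort: smallest total first, largest last
users = users.sort_values(by='Total', ascending=True)
ax = users.plot.barh(x='Name', y='Total')
ax.figure.canvas.draw()  # make sure the tick labels are populated

# Tick labels are listed from the bottom of the axis upward, so the
# last row of the DataFrame (the largest total) ends up at the top
bottom_to_top = [label.get_text() for label in ax.get_yticklabels()]
print(bottom_to_top)
```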

12. Pie charts (pie)

The data: the number of people who came to Beijing from different countries in 2016 and 2017:

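Clicking into the header cells shows a leading single quote, which tells Excel that 2016 and 2017 are strings rather than numbers. Without that marker, pandas would read the two column names as numbers, and a lookup such as students['2017'] would not find the column.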
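
If the header cells were stored as plain numbers instead, one way to cope on the pandas side is to convert the column labels to strings after loading. A minimal sketch on an invented frame that mimics integer year headers (the data and the country codes are hypothetical):

```python
import pandas as pd

# Hypothetical frame whose year headers came in as integers,
# as happens when the Excel header cells hold numbers rather than text
students = pd.DataFrame({'From': ['A', 'B'], 2016: [10, 20], 2017: [15, 25]})

has_string_label = '2017' in students.columns   # the string label is absent
has_integer_label = 2017 in students.columns    # the integer label is present

# Converting every column label to str makes students['2017'] work
students.columns = students.columns.astype(str)
values_2017 = students['2017'].tolist()
print(has_string_label, has_integer_label, values_2017)
```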

import pandas as pd
import matplotlib.pyplot as plt

students = pd.read_excel('C:/Temp/Students.xlsx')
# a pie chart needs only one column of data
students['2017'].plot.pie()
plt.show()

Use the From column as the index so that the slices are labeled with it:

students = pd.read_excel('C:/Temp/Students.xlsx', index_col='From')
# a pie chart needs only one column of data
students['2017'].plot.pie()
plt.show()

Shrink the slice labels and add a title and a y-axis label:

students = pd.read_excel('C:/Temp/Students.xlsx', index_col='From')
# a pie chart needs only one column of data
students['2017'].plot.pie(fontsize=8)
plt.title('Source of International Students', fontsize=16, fontweight='bold')
plt.ylabel('2017', fontsize=12, fontweight='bold')
plt.show()

The slices are laid out counterclockwise. To make the order read clockwise, reverse it with sort_values:

students = pd.read_excel('C:/Temp/Students.xlsx', index_col='From')
# a pie chart needs only one column of data
students['2017'].sort_values(ascending=True).plot.pie(fontsize=8)
plt.title('Source of International Students', fontsize=16, fontweight='bold')
plt.ylabel('2017', fontsize=12, fontweight='bold')
plt.show()

The starting point is still not right. To have the largest slice start from the top, set startangle:

students = pd.read_excel('C:/Temp/Students.xlsx', index_col='From')
# a pie chart needs only one column of data
students['2017'].sort_values(ascending=True).plot.pie(fontsize=8, startangle=-270)
plt.title('Source of International Students', fontsize=16, fontweight='bold')
plt.ylabel('2017', fontsize=12, fontweight='bold')
plt.show()

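Enlarging the plot window by hand gives a clearer view.

A simpler way: counterclock defaults to True. Instead of using sort_values, which reorders the data from smallest to largest, just pass counterclock=False: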

students = pd.read_excel('C:/Temp/Students.xlsx', index_col='From')
# a pie chart needs only one column of data
students['2017'].plot.pie(fontsize=8, counterclock=False, startangle=-270)
plt.title('Source of International Students', fontsize=16, fontweight='bold')
plt.ylabel('2017', fontsize=12, fontweight='bold')
plt.show()
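
How startangle and counterclock place the slices can be inspected from the wedge objects that matplotlib creates. The sketch below calls matplotlib's own pie function directly on invented values; startangle=90 is used here as an assumption-free equivalent of -270, since the two angles point in the same direction. The Agg backend is used only so that it runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch only
import matplotlib.pyplot as plt

values = [50, 30, 20]        # hypothetical counts, largest first
labels = ['A', 'B', 'C']

fig, ax = plt.subplots()
# startangle=90 points the starting edge straight up;
# counterclock=False lays the wedges out clockwise from that edge
wedges, texts = ax.pie(values, labels=labels, startangle=90, counterclock=False)

# Each wedge stores the two angles it spans, in degrees,
# measured counterclockwise from the positive x-axis
first = wedges[0]
upper_edge = max(first.theta1, first.theta2)
extent = abs(first.theta2 - first.theta1)
print(upper_edge, extent)
```

The first wedge covers half of the circle because its value is half of the total, and one of its edges sits at 90 degrees, which is the top of the pie.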

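13. Line trend charts and stacked area charts

A line chart shows a trend, most often a trend over time.
A stacked area chart shows the trend too, and it also shows what all the values add up to at any given point, much like a stacked bar chart.

12.1 Plotting in Excel

Sales performance data. We mainly use four columns: Accesseries, Bikes, Clothing, and Components.

12.2 Plotting in Python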

import pandas as pd
import matplotlib.pyplot as plt

weeks = pd.read_excel('C:/Temp/Weeks.xlsx', index_col='Week')
weeks.plot(y=['Accesseries'])
plt.show()

import pandas as pd
import matplotlib.pyplot as plt

weeks = pd.read_excel('C:/Temp/Weeks.xlsx', index_col='Week')
weeks.plot(y=['Accesseries', 'Bikes', 'Clothing', 'Components'])
plt.show()

The chart lacks a title, and the x-axis ticks are too sparse (0, 10, 20, ...). Fix both:

weeks = pd.read_excel('C:/Temp/Weeks.xlsx', index_col='Week')
weeks.plot(y=['Accesseries', 'Bikes', 'Clothing', 'Components'])
plt.title('Sales Weekly Trend', fontsize=16, fontweight='bold')
plt.ylabel('Total', fontsize=12, fontweight='bold')
plt.xticks(weeks.index, fontsize=8)
plt.show()

For a stacked area chart, simply use weeks.plot.area():

weeks = pd.read_excel('C:/Temp/Weeks.xlsx', index_col='Week')
weeks.plot.area(y=['Accesseries', 'Bikes', 'Clothing', 'Components'])
plt.title('Sales Weekly Trend', fontsize=16, fontweight='bold')
plt.ylabel('Total', fontsize=12, fontweight='bold')
plt.xticks(weeks.index, fontsize=8)
plt.show()

Stacked area charts and stacked bar charts are much alike. How would the stacked bar version be drawn?
weeks.plot.bar(y=[...], stacked=True)

weeks = pd.read_excel('C:/Temp/Weeks.xlsx', index_col='Week')
weeks.plot.bar(y=['Accesseries', 'Bikes', 'Clothing', 'Components'], stacked=True)
plt.title('Sales Weekly Trend', fontsize=16, fontweight='bold')
plt.ylabel('Total', fontsize=12, fontweight='bold')
plt.xticks(fontsize=8)  # bars sit at positions 0..n-1, so only restyle the existing week labels
plt.show()

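A stacked bar chart emphasizes how high the values stack up at one particular point.
A stacked area chart, by contrast, brings out the trend: it tells decision makers in which weeks of the year the numbers rise and in which weeks they fall.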
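
Both stacked chart types draw the same quantities: the upper edge of each layer is the running sum across the columns, and the top of the last layer is the total for that week. A minimal sketch on invented weekly figures (only the column naming follows the workbook used above):

```python
import pandas as pd

# Hypothetical weekly sales; values invented for illustration
weeks = pd.DataFrame(
    {'Accesseries': [3, 4, 6], 'Bikes': [10, 8, 12], 'Clothing': [2, 5, 3]},
    index=pd.Index([1, 2, 3], name='Week'),
)

# Running sum across the columns of each row: the upper edge of each layer
layer_tops = weeks.cumsum(axis=1)
# The top of the last layer is the weekly total, i.e. the height of the stack
totals = weeks.sum(axis=1)
print(layer_tops)
print(totals.tolist())
```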

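14. Scatter plots and histograms

14.1 Target chart styles

Scatter plot
Histogram
Density plot

14.2 About the data

House prices in Seattle from a few years ago, with columns for sale price, bedrooms, bathrooms, living area, basement, lot, floors, year built, and more. The data is fairly complex.
The job of data analysis is to take tables that look chaotic, combine them, turn them into charts, and pass those on to decision makers.

First, look at how living area relates to price.
In Excel, select those two columns and insert a scatter chart.
To see what price range most houses fall in, draw a histogram of the price column.
The histogram shows that prices are concentrated mainly between 200,000 and 300,000.
To see how large the houses are, select the living-area column and insert a histogram.

Now suppose someone asks: how likely is it that a house in Seattle costs about 700,000? That question leads to the density plot.

Limitations of Excel:

Excel comes under heavy strain once the data exceeds about 20,000 rows.
A density plot requires extra computation, which is inconvenient in Excel.

14.2 Scatter plots, histograms, and density plots with pandas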

import pandas as pd
import matplotlib.pyplot as plt

homes = pd.read_excel('C:/Temp/home_data.xlsx')
print(homes.head())

Some columns are hidden in the printed output. How can all of them be shown?
Add one line of configuration: pd.options.display.max_columns = 777

import pandas as pd
import matplotlib.pyplot as plt

pd.options.display.max_columns = 777
homes = pd.read_excel('C:/Temp/home_data.xlsx')
print(homes.head())
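
As a variation on the setting above, the column limit can also be lifted only for the duration of a block with pd.option_context, where None means no limit. A minimal sketch on an invented 30-column frame (the column names c0 to c29 are hypothetical):

```python
import pandas as pd

# Hypothetical wide frame: one row, 30 columns named c0 ... c29
wide = pd.DataFrame([range(30)], columns=[f'c{i}' for i in range(30)])

# Inside the block every column is printed; the previous setting
# is restored automatically when the block ends
with pd.option_context('display.max_columns', None):
    inside = pd.get_option('display.max_columns')
    shown = repr(wide)

after = pd.get_option('display.max_columns')
print(inside, after, 'c15' in shown)
```

This keeps the wider display from leaking into the rest of a script or notebook.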

Draw the scatter plot:

homes.plot.scatter(x='sqft_living', y='price')
plt.show()

Next, histograms. Before the price histogram, first look at how the living area of Seattle houses is distributed:

homes.sqft_living.plot.hist(bins=100)  # more bins give narrower bars
plt.show()
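
What the bins argument controls can be seen from the bin edges themselves. The sketch below uses numpy.histogram, which applies the same kind of equal-width binning, on a handful of invented living areas:

```python
import numpy as np

# Hypothetical living areas in square feet (values invented for illustration)
areas = np.array([800, 950, 1200, 1250, 1300, 1800, 2400, 3100])

# bins=4 cuts the range [min, max] into 4 equal-width intervals
counts4, edges4 = np.histogram(areas, bins=4)
# bins=8 halves each interval: narrower bars, finer detail
counts8, edges8 = np.histogram(areas, bins=8)

width4 = float(edges4[1] - edges4[0])
width8 = float(edges8[1] - edges8[0])
print(counts4.tolist(), width4)
print(counts8.tolist(), width8)
```

Every observation lands in exactly one bin, so the counts always add up to the number of rows regardless of how many bins are chosen.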

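What living area does the tallest bar correspond to? The default ticks are too coarse to tell, so lay out the x-axis ticks again with plt.xticks:

homes.sqft_living.plot.hist(bins=100)  # more bins give narrower bars
plt.xticks(range(0, int(homes.sqft_living.max()), 500), fontsize=8, rotation=90)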
plt.show()

The chart shows that the peak lies roughly between 1000 and 1500 on the x-axis.

Now the price histogram:

homes.price.plot.hist(bins=100)  # more bins give narrower bars
plt.xticks(range(0, int(homes.price.max()), 100000), fontsize=8, rotation=90)
plt.show()

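The chart shows that the tallest bar falls roughly in the 350,000-400,000 range.

15. Density plots and data correlation

Density plot: kde stands for kernel density estimation.
Suppose I have some money and want to buy a house in Seattle. How likely am I to find a house of about 1,250 square feet there? This is the question a density plot addresses.

homes.sqft_living.plot.kde()  # kernel density estimate; pandas needs scipy for this
plt.xticks(range(0, int(homes.sqft_living.max()), 500), fontsize=8, rotation=90)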
plt.show()

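The curve reads about 0.0004-0.0005 at 1,250 square feet. That value is a probability density rather than a probability: the probability that a house falls in a given size range is the area under the curve over that range.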
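
To turn a density curve into a probability, integrate it over a range of sizes. The sketch below is not the estimator pandas uses (pandas relies on scipy); it hand-rolls a Gaussian kernel density estimate with a bandwidth picked by hand, on synthetic data, purely to show the relation between the curve's height and the area under it. The function names kde_curve and area_under are hypothetical helpers.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic living areas in square feet, not the Seattle data
areas = rng.normal(loc=2000, scale=600, size=500)

def kde_curve(sample, grid, bandwidth):
    """Average of Gaussian bumps centred on each observation."""
    z = (grid[:, None] - sample[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)

def area_under(x, y):
    """Trapezoidal rule."""
    return float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(x)))

grid = np.linspace(areas.min() - 2000, areas.max() + 2000, 4001)
density = kde_curve(areas, grid, bandwidth=150.0)  # bandwidth chosen by hand

total = area_under(grid, density)                 # whole curve: about 1
mask = (grid >= 1200) & (grid <= 1300)
p_range = area_under(grid[mask], density[mask])   # probability of roughly 1200-1300 sq ft
height = float(np.interp(1250, grid, density))    # curve height at 1250: a density

print(round(total, 3), round(p_range, 4), round(height, 6))
```

Over a narrow range the probability is close to the curve height times the width of the range, which is why a height of a few ten-thousandths per square foot can still correspond to a probability of a few percent for a 100-square-foot window.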

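14.3 A brief look at pandas' powerful data analysis features

What information does your data hold? Above, we pieced it together column by column from our own hunches, but pandas can reveal it with a single line of code:

print(homes.corr(numeric_only=True))  # pairwise correlation between numeric columns (numeric_only needs pandas >= 1.5)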

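The output shows that the correlation between living area (sqft_living) and price is 0.7, which is high. It still falls short of 1, so factors other than area must also influence the price.

The column with the second-highest correlation with price is bathrooms, at 0.52, and so on.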
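
Rather than scanning the whole matrix by eye, the price column of the correlation matrix can be pulled out and ranked. A minimal sketch on an invented miniature of the housing table; the values are made up, and sorting with key=abs assumes pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical miniature of the housing table (values invented for illustration)
homes = pd.DataFrame({
    'price': [200000, 310000, 420000, 390000, 650000, 280000],
    'sqft_living': [900, 1400, 2000, 1700, 3100, 1300],
    'bathrooms': [1.0, 1.5, 2.0, 2.5, 3.0, 1.0],
    'yr_built': [1990, 1955, 2004, 1978, 1962, 2011],
})

# Take the price column of the correlation matrix, drop price itself
# (its correlation with itself is always 1), and rank the remaining
# columns by the strength of the relationship
by_price = (
    homes.corr()['price']
    .drop('price')
    .sort_values(key=abs, ascending=False)
)
print(by_price.round(2))
```

Ranking by absolute value keeps strongly negative correlations near the top as well, since they are just as informative as strongly positive ones.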
