Python库Pandas数据可视化实战案例

最新推荐文章于 2025-04-17 08:28:07 发布

大数据技术派

最新推荐文章于 2025-04-17 08:28:07 发布

阅读量4.4k

点赞数 3

文章标签： python 数据分析数据挖掘

原文链接：https://study.163.com/course/courseMain.htm?courseId=1006418002\x26amp;share=2\x26amp;shareId=400000000197013

版权

>关注公众号：大数据技术派，回复`资料`，领取`1024G`资料。

数据可视化可以让我们很直观的发现数据中隐藏的规律，察觉到变量之间的互动关系，可以帮助我们更好的给他人解释现象，做到一图胜千文的说明效果。

常见的数据可视化库有：

matplotlib 是最常见的2维库，可以算作可视化的必备技能库，由于matplotlib是比较底层的库，api很多，代码学起来不太容易。
seaborn 是建构于matplotlib基础上，能满足绝大多数可视化需求。更特殊的需求还是需要学习matplotlib。
pyecharts 上面的两个库都是静态的可视化库，而pyecharts有很好的web兼容性，可以做到可视化的动态效果。

但是在数据科学中，几乎都离不开pandas数据分析库，而pandas可以做：

数据采集：如何批量采集网页表格数据？
数据读取：pd.read_csv/pd.read_excel
数据清洗（预处理）：理解pandas中的apply和map的作用和异同
可视化，兼容matplotlib语法(今天重点)

准备工作

如果你之前没有学过pandas和matpltolib,我们先安装好这几个库

!pip3 install numpy
!pip3 install pandas
!pip3 install matplotlib

已经安装好，现在我们导入这几个要用到的库。使用的是伦敦天气数据，一开始我们只有12个月的小数据作为例子

#jupyter notebook中需要加这行代码
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
#读取天气数据
df = pd.read_csv('data/london2018.csv')
df

plot最简单的图

选择Month作为横坐标，Tmax作为纵坐标，绘图。

大家注意下面两种写法

#写法1
df.plot(x='Month', y='Tmax')
plt.show()

横坐标轴参数x传入的是df中的列名Month
纵坐标轴参数y传入的是df中的列名Tmax

折线图

上面的图就是折线图，折线图语法有三种

df.plot(x='Month', y='Tmax')
df.plot(x='Month', y='Tmax', kind='line')
df.plot.line(x='Month', y='Tmax')

df.plot.line(x='Month', y='Tmax')
plt.show()

#grid绘制格线
df.plot(x='Month', y='Tmax', kind='line', grid=True)
plt.show()

多个y值

上面的折线图中只有一条线，如何将多个y绘制到一个图中，比如Tmax， Tmin。

df.plot(x='Month', y=['Tmax', 'Tmin'])
plt.show()

条形图

df.plot(x='Month',
        y='Rain',
        kind='bar')
#同样还可以这样画
#df.plot.bar(x='Month', y='Rain')
plt.show()

水平条形图

bar环卫barh，就可以将条形图变为水平条形图

df.plot(x='Month',
        y='Rain',
        kind='barh')
#同样还可以这样画
#df.plot.bar(x='Month', y='Rain')
plt.show()

多个变量的条形图

df.plot(kind='bar',
        x = 'Month',
       y=['Tmax', 'Tmin'])
plt.show()

散点图

df.plot(kind='scatter',
        x = 'Month',
        y = 'Sun')
plt.show()

饼形图

df.plot(kind='pie', y='Sun')
plt.show()

上图绘制有两个小问题：

legend图例不应该显示
月份的显示用数字不太正规

df.index = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
df.plot(kind='pie', y = 'Sun', legend=False)
plt.show()

更多数据

一开头的数据只有12条记录(12个月)的数据，现在我们用更大的伦敦天气数据

import pandas as pd
df2 = pd.read_csv('data/londonweather.csv')
df2.head()

df2.Rain.describe()
count    748.000000
mean      50.408957
std       29.721493
min        0.300000
25%       27.800000
50%       46.100000
75%       68.800000
max      174.800000
Name: Rain, dtype: float64

上面一共有748条记录, 即62年的记录。

箱型图

df2.plot.box(y='Rain')
#df2.plot(y='Rain', kind='box')
plt.show()

直方图

df2.plot(y='Rain', kind='hist')
#df2.plot.hist(y='Rain')
plt.show()

纵坐标的刻度可以通过bins设置

df2.plot(y='Rain', kind='hist', bins=[0,25,50,75,100,125,150,175, 200])
#df2.plot.hist(y='Rain')
plt.show()

多图并存

df.plot(kind='line',
         y=['Tmax', 'Tmin', 'Rain', 'Sun'], #4个变量可视化
         subplots=True,   #多子图并存
         layout=(2, 2),   #子图排列2行2列
         figsize=(20, 10)) #图布的尺寸
plt.show()

df.plot(kind='bar',
         y=['Tmax', 'Tmin', 'Rain', 'Sun'], #4个变量可视化
         subplots=True,   #多子图并存
         layout=(2, 2),   #子图排列2行2列
         figsize=(20, 10)) #图布的尺寸
plt.show()

加标题

给可视化起个标题

df.plot(kind='bar',
         y=['Tmax', 'Tmin'], #2个变量可视化
         subplots=True,   #多子图并存
         layout=(1, 2),   #子图排列1行2列
         figsize=(20, 5),#图布的尺寸
         title='The Weather of London')  #标题
plt.show()

保存结果

可视化的结果可以存储为图片文件

df.plot(kind='pie', y='Rain', legend=False, figsize=(10, 5), title='Pie of Weather in London')
plt.savefig('img/pie.png')
plt.show()

df.plot更多参数

df.plot(x, y, kind, figsize, title, grid, legend, style)

x 只有dataframe对象时，x可用。横坐标
y 同上，纵坐标变量
kind 可视化图的种类，如line,hist, bar, barh, pie, kde, scatter
figsize 画布尺寸
title 标题
grid 是否显示格子线条
legend 是否显示图例
style 图的风格

查看plot参数可以使用help

import pandas as pd
help(pd.DataFrame.plot)

End.

来源：大邓和他的Python

猜你喜欢