Plotting with pandas
Both Series and DataFrame have a plot method that can produce many kinds of charts. By default it draws a line plot.
Line plots
A simple Series example with plot()
import numpy as np
import pandas as pd
from pandas import Series,DataFrame
import matplotlib.pyplot as plt
%matplotlib inline
plot(): a line plot shows the trend in the data.
s = Series(data=np.random.randint(0,10, size=10))
s
0 4
1 6
2 5
3 8
4 7
5 7
6 5
7 6
8 3
9 0
dtype: int32
# The index supplies the x-axis values and the values supply the y-axis values
s.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x110ee518>
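To make the role of the index concrete, here is a minimal sketch that plots a Series and then reads the drawn coordinates back from the Axes. The data and the name `s_demo` are made up for illustration. A numeric index other than 0..n-1 is used so the x values are easy to tell apart from positions.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: a numeric index that differs from the default 0..n-1
s_demo = pd.Series([4, 6, 5, 8], index=[10, 20, 30, 40])

fig, ax = plt.subplots()
s_demo.plot(ax=ax)

# Read back the coordinates of the single line that was drawn
line = ax.get_lines()[0]
x_drawn = list(line.get_xdata())
y_drawn = list(line.get_ydata())
print(x_drawn)
print(y_drawn)

plt.close(fig)
```

The x coordinates should match the index and the y coordinates should match the values.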
A simple DataFrame example with plot()
- The position of the legend may change depending on the data
data = np.random.randint(0,150, size=(4,3))
index = ['张三', '李四', '王五', '赵六']
columns = ['语文', '数学', '英语']
df = DataFrame(index=index, data=data, columns=columns)
df
| | 语文 | 数学 | 英语 |
|---|---|---|---|
| 张三 | 32 | 72 | 142 |
| 李四 | 15 | 127 | 67 |
| 王五 | 66 | 9 | 99 |
| 赵六 | 29 | 93 | 95 |
df.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x4a21208>
d:\1903\.venv\lib\site-packages\matplotlib\backends\backend_agg.py:211: RuntimeWarning: Glyph 24352 missing from current font.
font.set_text(s, 0.0, flags=flags)
(The same RuntimeWarning repeats for every remaining Chinese character in the index and column labels.)
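# Use a font that contains Chinese glyphs (SimHei) so the labels render
plt.rcParams['font.sans-serif'] = ['SimHei']
# The DataFrame index supplies the x-axis values; each column is drawn as one line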
df.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x161b4d30>
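The "one line per column" behaviour can be checked by inspecting the Axes after plotting. This is a small sketch with hypothetical data. The ASCII column names stand in for the Chinese ones so that it runs without any font setup.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical scores; ASCII labels are used here to avoid font issues
rng = np.random.default_rng(0)
scores = pd.DataFrame(
    rng.integers(0, 150, size=(4, 3)),
    index=["a", "b", "c", "d"],
    columns=["chinese", "math", "english"],
)

fig, ax = plt.subplots()
scores.plot(ax=ax)

# Each column becomes one Line2D object labelled with the column name
labels = [line.get_label() for line in ax.get_lines()]
print(labels)

plt.close(fig)
```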
df.index = [-1, -2, 0, 1]
df
| | 语文 | 数学 | 英语 |
|---|---|---|---|
| -1 | 47 | 104 | 39 |
| -2 | 46 | 10 | 130 |
| 0 | 78 | 86 | 27 |
| 1 | 125 | 90 | 140 |
df.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x148a6438>
d:\1903\.venv\lib\site-packages\matplotlib\backends\backend_agg.py:211: RuntimeWarning: Glyph 8722 missing from current font.
font.set_text(s, 0.0, flags=flags)
d:\1903\.venv\lib\site-packages\matplotlib\backends\backend_agg.py:180: RuntimeWarning: Glyph 8722 missing from current font.
font.set_text(s, 0, flags=flags)
# Fix the minus sign failing to render when a Chinese font is in use
plt.rcParams['axes.unicode_minus'] = False
df.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x160f7c88>
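SimHei is not installed on every system, so it can help to check which fonts matplotlib can actually see before changing rcParams. The sketch below is one possible way to do that. The list of candidate font names is an assumption and should be adjusted to your machine. If none of them is found, the settings are left untouched.

```python
import matplotlib.pyplot as plt
from matplotlib import font_manager

# Candidate CJK-capable fonts in order of preference (an assumed list)
candidates = ["SimHei", "Microsoft YaHei", "PingFang SC", "Noto Sans CJK SC"]

# Names of all fonts that matplotlib has discovered on this system
installed = {font.name for font in font_manager.fontManager.ttflist}
chosen = next((name for name in candidates if name in installed), None)

if chosen is not None:
    # Put the chosen font first and keep the existing fallbacks after it
    plt.rcParams["font.sans-serif"] = [chosen] + list(plt.rcParams["font.sans-serif"])
    # Draw the minus sign as an ASCII hyphen, which every font contains
    plt.rcParams["axes.unicode_minus"] = False

print(chosen)
```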
Bar charts
Series bar chart example, kind='bar' / 'barh'
Bar charts are generally used to compare magnitudes.
s.plot(kind='bar')
<matplotlib.axes._subplots.AxesSubplot at 0x1604f208>
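The two kinds differ only in orientation. As a rough illustration with made-up data, the sketch below draws the same Series both ways and reads the bar sizes back. With kind='bar' the values become bar heights, and with kind='barh' they become bar widths.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data for illustration
s_demo = pd.Series([3, 1, 2], index=["x", "y", "z"])

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(8, 3))
s_demo.plot(kind="bar", ax=ax_v)    # vertical bars
s_demo.plot(kind="barh", ax=ax_h)   # horizontal bars

# Each bar is a Rectangle patch on its Axes
heights = [patch.get_height() for patch in ax_v.patches]
widths = [patch.get_width() for patch in ax_h.patches]
print(heights, widths)

plt.close(fig)
```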
DataFrame bar chart example
# The values in each row are grouped side by side for comparison
df.plot(kind='bar')
<matplotlib.axes._subplots.AxesSubplot at 0x13bd60f0>
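Read tips.csv and look at the party sizes for each day
Then plot the proportion of each party size per day
Sum with df.sum(), taking care to choose the right axis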
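Since the axis argument is easy to mix up, here is a tiny sketch on a made-up frame (`counts` and its labels are hypothetical). axis=0 collapses the rows and gives one total per column, while axis=1 collapses the columns and gives one total per row.

```python
import pandas as pd

# Hypothetical 2x2 table of counts
counts = pd.DataFrame(
    {"small": [1, 2], "large": [3, 4]},
    index=["Fri", "Sun"],
)

col_totals = counts.sum(axis=0)  # one total per column: small, large
row_totals = counts.sum(axis=1)  # one total per row: Fri, Sun

print(col_totals)
print(row_totals)
```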
tips = pd.read_csv('../data/tips.csv')
tips
| | day | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| 0 | Fri | 1 | 16 | 1 | 1 | 0 | 0 |
| 1 | Stat | 2 | 53 | 18 | 13 | 1 | 0 |
| 2 | Sun | 0 | 39 | 15 | 18 | 3 | 1 |
| 3 | Thur | 1 | 48 | 4 | 5 | 1 | 3 |
tips.set_index(keys='day', inplace=True)
tips.plot(kind='bar')
<matplotlib.axes._subplots.AxesSubplot at 0x168473c8>
tips.sum(axis=1)
day
Fri 19
Stat 87
Sun 76
Thur 62
dtype: int64
# First compute each row's sum, then divide every row by its own sum
result = tips.div(tips.sum(axis=1), axis=0)
16 / 19
0.8421052631578947
result.plot(kind='bar')
<matplotlib.axes._subplots.AxesSubplot at 0x16a575f8>
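Because every row of the normalised table sums to 1, a stacked bar chart is another reasonable way to show it, with each day drawn as one full-height bar. The sketch below rebuilds the table from the values printed above rather than reading tips.csv. The integer column labels and the name `tips_demo` are assumptions made for this example.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Values copied from the table shown above; column labels assumed to be ints
tips_demo = pd.DataFrame(
    [
        [1, 16, 1, 1, 0, 0],
        [2, 53, 18, 13, 1, 0],
        [0, 39, 15, 18, 3, 1],
        [1, 48, 4, 5, 1, 3],
    ],
    index=pd.Index(["Fri", "Stat", "Sun", "Thur"], name="day"),
    columns=[1, 2, 3, 4, 5, 6],
)

# Divide each row by its own total (axis=0 aligns the totals with the row index)
shares = tips_demo.div(tips_demo.sum(axis=1), axis=0)

fig, ax = plt.subplots()
shares.plot(kind="bar", stacked=True, ax=ax)  # each bar stacks to a height of 1
plt.close(fig)

print(shares.sum(axis=1))
```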
Histograms
A histogram is generally used to show how data are distributed.
n = np.array([1,1,2,3,4,4,5,6,6,8])
s = Series(data=n)
# Histogram
s.plot(kind='hist', bins=5, density=True)
# With density=True, the y-axis shows probability divided by bin width
# If bins is poorly chosen, a KDE plot can be overlaid to show the distribution
s.plot(kind='kde')
# kde = kernel density estimate
<matplotlib.axes._subplots.AxesSubplot at 0x18c95400>
# kind='kde' relies on SciPy; install it if it is missing
!pip install scipy -i https://pypi.douban.com/simple
Looking in indexes: https://pypi.douban.com/simple
Collecting scipy
Using cached https://pypi.doubanio.com/packages/9e/fd/9a995b7fc18c6c17ce570b3cfdabffbd2718e4f1830e94777c4fd66e1179/scipy-1.3.0-cp36-cp36m-win_amd64.whl
Requirement already satisfied: numpy>=1.13.3 in d:\1903\.venv\lib\site-packages (from scipy) (1.17.0)
Installing collected packages: scipy
Successfully installed scipy-1.3.0
# Compute the histogram data for an array
np.histogram(n, bins=100, density=True)
(array([2.85714286, 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 1.42857143,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 1.42857143, 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 2.85714286, 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 1.42857143, 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 2.85714286, 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 1.42857143]),
array([1. , 1.07, 1.14, 1.21, 1.28, 1.35, 1.42, 1.49, 1.56, 1.63, 1.7 ,
1.77, 1.84, 1.91, 1.98, 2.05, 2.12, 2.19, 2.26, 2.33, 2.4 , 2.47,
2.54, 2.61, 2.68, 2.75, 2.82, 2.89, 2.96, 3.03, 3.1 , 3.17, 3.24,
3.31, 3.38, 3.45, 3.52, 3.59, 3.66, 3.73, 3.8 , 3.87, 3.94, 4.01,
4.08, 4.15, 4.22, 4.29, 4.36, 4.43, 4.5 , 4.57, 4.64, 4.71, 4.78,
4.85, 4.92, 4.99, 5.06, 5.13, 5.2 , 5.27, 5.34, 5.41, 5.48, 5.55,
5.62, 5.69, 5.76, 5.83, 5.9 , 5.97, 6.04, 6.11, 6.18, 6.25, 6.32,
6.39, 6.46, 6.53, 6.6 , 6.67, 6.74, 6.81, 6.88, 6.95, 7.02, 7.09,
7.16, 7.23, 7.3 , 7.37, 7.44, 7.51, 7.58, 7.65, 7.72, 7.79, 7.86,
7.93, 8. ]))
0.2 / 0.07
2.857142857142857
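The calculation above (2 of the 10 values fall in the first bin, so 0.2, divided by the bin width 0.07) generalises to every bin. As a small check, the sketch below recomputes the density by hand for the same array with bins=5 and compares it with what np.histogram returns. It also confirms that the bar areas add up to 1.

```python
import numpy as np

n = np.array([1, 1, 2, 3, 4, 4, 5, 6, 6, 8])

# Raw counts per bin and the bin edges
counts, edges = np.histogram(n, bins=5)
# The same bins, normalised by NumPy
density, _ = np.histogram(n, bins=5, density=True)

widths = np.diff(edges)
# density = (count / total count) / bin width
manual = counts / (counts.sum() * widths)
# Area of all bars: sum of height times width
area = (density * widths).sum()

print(counts)
print(np.allclose(manual, density), area)
```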
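Histogram of randomly generated numbers shown as proportions, using the hist method
- Bar height is the frequency (count) of the data in each bin; bar width is the bin width
- The bins parameter sets the number of equal-width bins; the larger it is, the narrower the bars and the finer the grouping
- Setting density=True converts the counts into a probability density (probability divided by bin width), so the bar areas sum to 1
KDE plot: kernel density estimation, which compensates for the loss of precision a histogram suffers when bins is poorly chosen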
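To give a feel for what a kernel density estimate is, here is a hand-written sketch in NumPy: one Gaussian bump is centred on every sample and the bumps are averaged. The function name `gaussian_kde_sketch` is hypothetical, and the bandwidth uses Scott's rule of thumb as an assumption for this example. It is not meant to reproduce the exact curve that `kind='kde'` draws.

```python
import numpy as np

def gaussian_kde_sketch(samples, grid, bandwidth):
    """Average of Gaussian kernels centred on each sample, evaluated on grid."""
    # Standardised distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

samples = np.array([1, 1, 2, 3, 4, 4, 5, 6, 6, 8], dtype=float)

# Scott's rule of thumb for the bandwidth (an assumption for this sketch)
bandwidth = samples.std(ddof=1) * len(samples) ** (-1 / 5)

# Evaluate well beyond the data range so the tails are covered
grid = np.linspace(samples.min() - 5 * bandwidth,
                   samples.max() + 5 * bandwidth, 400)
density = gaussian_kde_sketch(samples, grid, bandwidth)

# A density should integrate to roughly 1 (simple Riemann sum)
area = (density * (grid[1] - grid[0])).sum()
print(area)
```

Unlike a histogram, the result does not depend on where bin edges fall. The choice moves to the bandwidth instead: a larger bandwidth gives a smoother curve.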
Exercise
Plot a bimodal distribution made up of two different normal distributions.
n1 = np.random.normal(loc=5, scale=5, size=10000)
n2 = np.random.normal(loc=30, scale=8, size=10000)
n = np.hstack((n1,n2))
s = Series(data=n)
s.plot(kind='hist', bins=500, density=True)
s.plot(kind='kde')
<matplotlib.axes._subplots.AxesSubplot at 0x1ada0f60>
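Scatter plots
A scatter plot is an effective way to examine the relationship between two one-dimensional data series. It is available on DataFrame objects.
Usage:
Set kind='scatter' and pass the column labels to use for the x and y axes.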
df = DataFrame({'A': np.random.randn(1000), 'B': np.random.randn(1000), 'C': np.random.randn(1000), 'D': np.random.randn(1000)})
df.head()
| | A | B | C | D |
|---|---|---|---|---|
| 0 | 1.350469 | -0.757953 | -0.130467 | -0.558658 |
| 1 | 1.703911 | -0.053533 | -1.290569 | 0.619652 |
| 2 | -0.132011 | 1.871678 | -0.383147 | -0.774807 |
| 3 | -0.632711 | 1.487892 | 0.447683 | -2.492515 |
| 4 | 0.937552 | -0.200019 | -0.449218 | -0.776772 |
df.plot(x='A', y='B', kind='scatter')
<matplotlib.axes._subplots.AxesSubplot at 0x1ce364a8>
x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x)
df2 = DataFrame({'x': x, 'y':y})
df2.plot(x='x', y='y', kind='scatter')
<matplotlib.axes._subplots.AxesSubplot at 0x1d021908>
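What a scatter plot shows visually can be backed up with a number. The sketch below uses made-up data with a fixed random seed. It plots one pair of columns that are linearly related and one pair that is not, then computes the Pearson correlation for each with Series.corr. The column names and noise levels are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.normal(size=500)
frame = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.5, size=500),  # linear in x plus noise
    "noise": rng.normal(size=500),                 # unrelated to x
})

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
frame.plot(x="x", y="y", kind="scatter", ax=axes[0])
frame.plot(x="x", y="noise", kind="scatter", ax=axes[1])
plt.close(fig)

r_linear = frame["x"].corr(frame["y"])
r_noise = frame["x"].corr(frame["noise"])
print(r_linear, r_noise)
```

The first panel should look like a narrow tilted band and the second like a round cloud. The correlation is close to 1 for the first pair and close to 0 for the second.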
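Scatter matrix: when there are several variables, it shows the relationship between every pair of them
Function: pd.plotting.scatter_matrix()
- Parameter diagonal: sets the kind of plot drawn on the diagonal ('hist' or 'kde')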
_ = pd.plotting.scatter_matrix(df, figsize=(16, 16), alpha=0.6, diagonal='kde')
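The return value of scatter_matrix is worth a look: it is a NumPy array of Axes with one row and one column per variable, so individual panels can be adjusted afterwards. A minimal sketch with made-up data follows. It uses diagonal='hist' here so that SciPy is not needed.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: three independent standard-normal columns
rng = np.random.default_rng(0)
frame = pd.DataFrame(rng.normal(size=(200, 3)), columns=["A", "B", "C"])

axes = pd.plotting.scatter_matrix(frame, figsize=(6, 6), alpha=0.6, diagonal="hist")

# One panel per pair of columns; the diagonal holds the histograms
shape = axes.shape
print(shape)

plt.close("all")
```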