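An Awesome pandas Data Analysis Tutorial [Complete Edition] (Part 7)

Plotting with pandas

Previous post: An Awesome pandas Data Analysis Tutorial [Complete Edition] (Part 6)

Both Series and DataFrame have a plot method for generating many kinds of charts. By default, it draws a line chart.

Line charts

A simple Series plot example: plot()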

import numpy as np
import pandas as pd
from pandas import Series,DataFrame

import matplotlib.pyplot as plt
%matplotlib inline
A line chart shows the trend in the data.
s = Series(data=np.random.randint(0,10, size=10))
s
0    4
1    6
2    5
3    8
4    7
5    7
6    5
7    6
8    3
9    0
dtype: int32
# The index supplies the x-axis values and the values supply the y-axis values
s.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x110ee518>

[Figure: output_8_1.png]
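To make the x/y mapping concrete, here is a minimal sketch. The Series values, the index labels and the axis label text are made up for illustration. It reads the plotted data back from the Axes object that plot() returns, which suggests that a numeric index ends up on the x-axis and the values on the y-axis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs outside a notebook
import pandas as pd

# Hypothetical data: a numeric index that is clearly different from 0..n-1
s = pd.Series([4, 6, 5, 8, 7], index=[10, 20, 30, 40, 50])

ax = s.plot()                     # returns a matplotlib Axes
line = ax.lines[0]                # the single line that was drawn
x_data = list(line.get_xdata())   # taken from s.index
y_data = list(line.get_ydata())   # taken from s.values

# The returned Axes can be customised like any other matplotlib Axes
ax.set_xlabel("index")
ax.set_ylabel("value")
print(x_data, y_data)
```

Keeping the returned Axes in a variable is a convenient way to adjust titles, labels or limits after pandas has drawn the chart.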

A simple DataFrame plot example: plot()

  • The position of the legend may change depending on the data
data = np.random.randint(0,150, size=(4,3))
index = ['张三', '李四', '王五', '赵六']
columns = ['语文', '数学', '英语']
df = DataFrame(index=index, data=data, columns=columns)
df
语文数学英语
张三3272142
李四1512767
王五66999
赵六299395
df.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x4a21208>



d:\1903\.venv\lib\site-packages\matplotlib\backends\backend_agg.py:211: RuntimeWarning: Glyph 24352 missing from current font.
  font.set_text(s, 0.0, flags=flags)

[Figure: output_11_2.png]

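# The "Glyph ... missing" warnings appear because the default font has no Chinese glyphs; switch to a font that has them, such as SimHei
plt.rcParams['font.sans-serif'] = ['SimHei']
# The DataFrame's index supplies the x-axis values; each column is drawn as one line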
df.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x161b4d30>

[Figure: output_13_1.png]
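The "one line per column" behaviour, and the note about the legend moving around, can be checked with a small sketch. The column names a, b, c and the numbers are hypothetical. The legend is pinned with the standard matplotlib loc argument so that it no longer depends on where the data leaves free space.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs outside a notebook
import pandas as pd

# Hypothetical data with ASCII column names, so no special font is needed
df_demo = pd.DataFrame(
    {"a": [1, 3, 2], "b": [2, 2, 5], "c": [4, 1, 3]},
    index=[0, 1, 2],
)

ax = df_demo.plot()
n_lines = len(ax.lines)          # one line per column

# Pin the legend instead of letting matplotlib choose the "best" spot
ax.legend(loc="upper left")
labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(n_lines, labels)
```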

df.index = [-1, -2, 0, 1]
df
语文数学英语
-14710439
-24610130
0788627
112590140
df.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x148a6438>



d:\1903\.venv\lib\site-packages\matplotlib\backends\backend_agg.py:211: RuntimeWarning: Glyph 8722 missing from current font.
  font.set_text(s, 0.0, flags=flags)
d:\1903\.venv\lib\site-packages\matplotlib\backends\backend_agg.py:180: RuntimeWarning: Glyph 8722 missing from current font.
  font.set_text(s, 0, flags=flags)

[Figure: output_15_2.png]

# Glyph 8722 is the Unicode minus sign (U+2212), which the Chinese font lacks
# Fix the minus sign not rendering correctly under a Chinese font
plt.rcParams['axes.unicode_minus'] = False
df.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x160f7c88>

[Figure: output_17_1.png]
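SimHei is not installed on every system, so hard-coding it may simply move the warning elsewhere. The sketch below is one possible way to pick a font that is actually present. The candidate list is an assumption (common CJK-capable font names, not an exhaustive or authoritative list); the lookup relies on matplotlib's font_manager.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs outside a notebook
import matplotlib.pyplot as plt
from matplotlib import font_manager

# Assumed candidates: font names that commonly include Chinese glyphs
candidates = ["SimHei", "Microsoft YaHei", "PingFang SC",
              "Noto Sans CJK SC", "WenQuanYi Micro Hei"]

# Names of all fonts matplotlib has found on this machine
installed = {font.name for font in font_manager.fontManager.ttflist}
available = [name for name in candidates if name in installed]

if available:
    # Put the fonts that exist first and keep the previous list as a fallback
    plt.rcParams["font.sans-serif"] = available + list(plt.rcParams["font.sans-serif"])
else:
    print("No candidate CJK font found; Chinese labels may still trigger glyph warnings")

# Draw negative numbers with the ASCII hyphen instead of U+2212
plt.rcParams["axes.unicode_minus"] = False
```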

Bar charts

Series bar chart example: kind='bar' / 'barh'

Bar charts are generally used to compare magnitudes.
s.plot(kind='bar')
<matplotlib.axes._subplots.AxesSubplot at 0x1604f208>

[Figure: output_21_1.png]
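The heading mentions 'barh' but only 'bar' is demonstrated. A minimal sketch of the horizontal variant follows, with made-up values and labels. Reading the rectangles back from the Axes shows that each bar's length corresponds to one value of the Series.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values and labels
s_demo = pd.Series([3, 7, 5], index=["a", "b", "c"])

fig, ax = plt.subplots()
s_demo.plot(kind="barh", ax=ax)   # horizontal bars: the index runs along the y-axis

# Each bar is a Rectangle; for barh its width is the plotted value
bar_lengths = [float(patch.get_width()) for patch in ax.patches]
print(bar_lengths)
```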

DataFrame bar chart example

# The values of each row are grouped together for comparison
df.plot(kind='bar')
<matplotlib.axes._subplots.AxesSubplot at 0x13bd60f0>

[Figure: output_23_1.png]

Read the file tips.csv and look at the party sizes on each day, then work out the proportion of each party size per day.

Sum with df.sum(), and take care to pick the right axis.

tips = pd.read_csv('../data/tips.csv')
tips
    day  1   2   3   4  5  6
0   Fri  1  16   1   1  0  0
1  Stat  2  53  18  13  1  0
2   Sun  0  39  15  18  3  1
3  Thur  1  48   4   5  1  3
tips.set_index(keys='day', inplace=True)
tips.plot(kind='bar')
<matplotlib.axes._subplots.AxesSubplot at 0x168473c8>

[Figure: output_27_1.png]

tips.sum(axis=1)
day
Fri     19
Stat    87
Sun     76
Thur    62
dtype: int64
# First compute the sum of each row, then divide every row by its own sum
result = tips.div(tips.sum(axis=1), axis=0)
16 / 19
0.8421052631578947
result.plot(kind='bar')
<matplotlib.axes._subplots.AxesSubplot at 0x16a575f8>

[Figure: output_31_1.png]
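The two axis arguments in the normalisation step are easy to mix up, so here is a small self-contained sketch. It reuses the Fri and Stat rows of the party-size table; the stacked=True option is an addition for illustration, not something the original notebook uses. sum(axis=1) collapses the columns and yields one total per row, while div(..., axis=0) aligns those totals with the row index.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs outside a notebook
import numpy as np
import pandas as pd

# First two rows of the party-size table from tips.csv
counts = pd.DataFrame(
    [[1, 16, 1, 1, 0, 0], [2, 53, 18, 13, 1, 0]],
    index=pd.Index(["Fri", "Stat"], name="day"),
    columns=["1", "2", "3", "4", "5", "6"],
)

row_totals = counts.sum(axis=1)                # one total per day
proportions = counts.div(row_totals, axis=0)   # divide each row by its own total

# Stacked bars make it visible that every day now adds up to 1
ax = proportions.plot(kind="bar", stacked=True)
print(proportions.round(3))
```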

Histograms

Histograms are generally used to show how data is distributed.
n = np.array([1,1,2,3,4,4,5,6,6,8])
s = Series(data=n)
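# Histogram
s.plot(kind='hist', bins=5,  density=True)
# With density=True, the y-axis value becomes probability / bin width, i.e. a probability density
# If bins is poorly chosen, a KDE plot can be overlaid to show the distribution
s.plot(kind='kde')
# kde = kernel density estimate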
<matplotlib.axes._subplots.AxesSubplot at 0x18c95400>

[Figure: output_34_1.png]

# kind='kde' relies on SciPy; install it if it is missing
!pip install scipy -i https://pypi.douban.com/simple
Looking in indexes: https://pypi.douban.com/simple
Collecting scipy
  Using cached https://pypi.doubanio.com/packages/9e/fd/9a995b7fc18c6c17ce570b3cfdabffbd2718e4f1830e94777c4fd66e1179/scipy-1.3.0-cp36-cp36m-win_amd64.whl
Requirement already satisfied: numpy>=1.13.3 in d:\1903\.venv\lib\site-packages (from scipy) (1.17.0)
Installing collected packages: scipy
Successfully installed scipy-1.3.0
# Compute the histogram data for a set of values
np.histogram(n,  bins=100, density=True)
(array([2.85714286, 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 1.42857143,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 1.42857143, 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 2.85714286, 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 1.42857143, 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 2.85714286, 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 1.42857143]),
 array([1.  , 1.07, 1.14, 1.21, 1.28, 1.35, 1.42, 1.49, 1.56, 1.63, 1.7 ,
        1.77, 1.84, 1.91, 1.98, 2.05, 2.12, 2.19, 2.26, 2.33, 2.4 , 2.47,
        2.54, 2.61, 2.68, 2.75, 2.82, 2.89, 2.96, 3.03, 3.1 , 3.17, 3.24,
        3.31, 3.38, 3.45, 3.52, 3.59, 3.66, 3.73, 3.8 , 3.87, 3.94, 4.01,
        4.08, 4.15, 4.22, 4.29, 4.36, 4.43, 4.5 , 4.57, 4.64, 4.71, 4.78,
        4.85, 4.92, 4.99, 5.06, 5.13, 5.2 , 5.27, 5.34, 5.41, 5.48, 5.55,
        5.62, 5.69, 5.76, 5.83, 5.9 , 5.97, 6.04, 6.11, 6.18, 6.25, 6.32,
        6.39, 6.46, 6.53, 6.6 , 6.67, 6.74, 6.81, 6.88, 6.95, 7.02, 7.09,
        7.16, 7.23, 7.3 , 7.37, 7.44, 7.51, 7.58, 7.65, 7.72, 7.79, 7.86,
        7.93, 8.  ]))
# Check the first bin: 2 of the 10 values fall in it (probability 0.2) and the bin width is 0.07
0.2 / 0.07
2.857142857142857
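The same check can be written so that it does not depend on reading numbers off the printed arrays. This is a small sketch using only NumPy: it recomputes the first bin's density as probability divided by bin width and confirms that the densities, weighted by the bin widths, add up to 1.

```python
import numpy as np

n = np.array([1, 1, 2, 3, 4, 4, 5, 6, 6, 8])
density, edges = np.histogram(n, bins=100, density=True)
widths = np.diff(edges)                      # every bin is (8 - 1) / 100 = 0.07 wide

first_bin_probability = 2 / len(n)           # the two 1s fall in the first bin
expected_first = first_bin_probability / widths[0]

# Density times width is the probability of a bin; over all bins it sums to 1
total_area = float((density * widths).sum())
print(density[0], expected_first, total_area)
```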

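Percentage histogram of randomly generated numbers, drawn with the hist method

  • Bar height represents the frequency of the data; bar width represents the bin width of each group
  • The bins parameter sets the number of bins, which is the upper limit on the number of visible bars; the larger it is, the narrower the bars and the finer the grouping
  • Setting density=True converts frequencies into probability densities (probability divided by bin width)

KDE plot: kernel density estimation, which compensates for the loss of precision a histogram suffers when bins is poorly chosen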
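To give a feel for what a kernel density estimate computes, here is a from-scratch sketch in NumPy. It is not the code pandas or SciPy runs; the function name gaussian_kde_sketch, the Gaussian kernel and the bandwidth (Scott's rule of thumb) are all choices made for this illustration. Each sample contributes a small bell curve, and the estimate is their average, so no bin count has to be chosen.

```python
import numpy as np

def gaussian_kde_sketch(samples, grid, bandwidth):
    """Average of Gaussian bumps centred on the samples, evaluated on grid."""
    samples = np.asarray(samples, dtype=float)
    # Standardised distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (len(samples) * bandwidth)

n = np.array([1, 1, 2, 3, 4, 4, 5, 6, 6, 8])
h = n.std(ddof=1) * len(n) ** (-1 / 5)        # assumed bandwidth: Scott's rule of thumb
grid = np.linspace(n.min() - 5 * h, n.max() + 5 * h, 2000)
kde_values = gaussian_kde_sketch(n, grid, h)

# Like a density histogram, the curve should enclose an area close to 1
dx = grid[1] - grid[0]
area = float((kde_values * dx).sum())
print(round(area, 3))
```

A smaller bandwidth gives a bumpier curve and a larger one a smoother curve, which plays a role similar to the choice of bins in a histogram.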

Exercise

Plot a bimodal distribution made up of two different normal distributions

n1 = np.random.normal(loc=5, scale=5, size=10000)
n2 = np.random.normal(loc=30, scale=8, size=10000)

n = np.hstack((n1,n2))
s = Series(data=n)
s.plot(kind='hist', bins=500, density=True)
s.plot(kind='kde')
<matplotlib.axes._subplots.AxesSubplot at 0x1ada0f60>

[Figure: output_42_1.png]

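Scatter plots

A scatter plot is an effective way to examine the relationship between two one-dimensional data series. It is available on DataFrame objects.

Usage:
Set kind='scatter' and name the columns to use for x and y.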
df = DataFrame({'A': np.random.randn(1000), 'B': np.random.randn(1000), 'C': np.random.randn(1000), 'D': np.random.randn(1000)})
df.head()
          A         B         C         D
0  1.350469 -0.757953 -0.130467 -0.558658
1  1.703911 -0.053533 -1.290569  0.619652
2 -0.132011  1.871678 -0.383147 -0.774807
3 -0.632711  1.487892  0.447683 -2.492515
4  0.937552 -0.200019 -0.449218 -0.776772
df.plot(x='A', y='B', kind='scatter')
<matplotlib.axes._subplots.AxesSubplot at 0x1ce364a8>

[Figure: output_47_1.png]

x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x)
df2 = DataFrame({'x': x, 'y':y})
df2.plot(x='x', y='y', kind='scatter')
<matplotlib.axes._subplots.AxesSubplot at 0x1d021908>

[Figure: output_48_1.png]
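A scatter plot shows a relationship visually; a correlation coefficient puts a number on it. The sketch below pairs the two. The column names linear and noise, the seed and the noise level are hypothetical choices. One column is built from x plus some noise and another is independent of x, so the first should correlate strongly with x and the second only weakly.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs outside a notebook
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)                 # fixed seed (arbitrary choice)
x = rng.normal(size=200)
df_demo = pd.DataFrame({
    "x": x,
    "linear": 2 * x + rng.normal(scale=0.5, size=200),  # depends on x
    "noise": rng.normal(size=200),                      # independent of x
})

ax = df_demo.plot(x="x", y="linear", kind="scatter")
n_points = len(ax.collections[0].get_offsets())  # one marker per row

# Pearson correlation as a numeric companion to the picture
corr_linear = df_demo["x"].corr(df_demo["linear"])
corr_noise = df_demo["x"].corr(df_demo["noise"])
print(n_points, round(corr_linear, 2), round(corr_noise, 2))
```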

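Scatter matrix: when there are several variables, it shows the relationship between every pair of them

Function: pd.plotting.scatter_matrix()

  • Parameter diagonal: sets the type of plot drawn on the diagonal ('hist' or 'kde')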

_ = pd.plotting.scatter_matrix(df, figsize=(16,  16), alpha=0.6, diagonal='kde')

[Figure: output_50_0.png]
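The call above discards its return value with `_ =`. As a sketch of what is being discarded, the example below keeps it: scatter_matrix returns a 2-D array of Axes, one per pair of columns. The three-column random DataFrame is made up for illustration, and diagonal='hist' is used here only because it does not need SciPy.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs outside a notebook
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)                 # fixed seed (arbitrary choice)
df_demo = pd.DataFrame(rng.normal(size=(100, 3)), columns=["A", "B", "C"])

# diagonal='hist' avoids the SciPy dependency that diagonal='kde' has
axes = pd.plotting.scatter_matrix(df_demo, figsize=(6, 6), alpha=0.6, diagonal="hist")

print(axes.shape)
# Row i, column j plots column j on the x-axis against column i on the y-axis
print(axes[2, 0].get_xlabel(), axes[2, 0].get_ylabel())
```

Individual panels can then be adjusted through this array, for example to change the limits of a single pair.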
