Python-Matplotlib {Data Visualization}

樊鴻燁

Article link: https://blog.csdn.net/qq_43524475/article/details/118982299

Plotting with Matplotlib is fairly complex, but in exchange it offers a high degree of freedom.

Basic setup

Importing the Matplotlib library

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # Matplotlib has many subpackages; here we import only the commonly used pyplot.

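Drawing plots

Line charts

Line charts are drawn with the plot function, which accepts many parameters:

x: x-axis data
y: y-axis data
color: line color
linewidth: line width
markersize: marker size
fontsize: font size (used by text functions such as plt.title, not by plot itself)
marker: marker shape
linestyle: line style
label: legend label
alpha: transparency

Apart from x and y, each of these parameters accepts many different values. (Reference: "python plot parameters: matplotlib.pyplot.plot() parameters explained in detail".)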

x = [1,2,3,4] 
y = [2,4,1,6]
plt.plot(x, y)  # each x value is paired with the y value at the same position

plt.plot(x,y , color = 'yellow');


plt.plot(x,y , color = '#8CEA00',linewidth=5,marker='D',
        markersize = 20,linestyle='-',alpha=0.3);


# Shorthand form
plt.plot(x, y, 'go--', linewidth=5)  # third argument: one format string combining color, marker shape, and line style

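To make the shorthand concrete, here is a minimal sketch that draws the same line twice, once with the format string and once with explicit keyword arguments, and then reads the settings back from the line objects. The variable names short_line and long_line are illustrative, and the Agg backend line is only an assumption so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [2, 4, 1, 6]

fig, ax = plt.subplots()
# Shorthand: 'g' = green, 'o' = circle marker, '--' = dashed line
(short_line,) = ax.plot(x, y, 'go--', linewidth=5)
# The same settings spelled out as keyword arguments
(long_line,) = ax.plot(x, y, color='g', marker='o', linestyle='--', linewidth=5)

print(short_line.get_marker(), short_line.get_linestyle())
print(long_line.get_marker(), long_line.get_linestyle())
```

The keyword form is longer but easier to read when only one of the three properties needs to change.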

Checking the available styles

plt.style provides the plotting-style features.

plt.style.available  # list the styles that ship with the installed Matplotlib
'''
['Solarize_Light2',
 '_classic_test_patch',
 'bmh',
 'classic',
 'dark_background',
 'fast',
 'fivethirtyeight',
 'ggplot',
 'grayscale',
 'seaborn',
 'seaborn-bright',
 'seaborn-colorblind',
 'seaborn-dark',
 'seaborn-dark-palette',
 'seaborn-darkgrid',
 'seaborn-deep',
 'seaborn-muted',
 'seaborn-notebook',
 'seaborn-paper',
 'seaborn-pastel',
 'seaborn-poster',
 'seaborn-talk',
 'seaborn-ticks',
 'seaborn-white',
 'seaborn-whitegrid',
 'tableau-colorblind10']
'''

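plt.plot([1, 2, 3])  # a single list is used as the y values; x defaults to [0, 1, 2]
plt.title('a picture')  # set the chart title

plt.style.use('seaborn')  # apply a plotting style (this style is named 'seaborn-v0_8' in Matplotlib 3.6 and later)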
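
Because the list of style names differs between Matplotlib versions, a script can pick a style from plt.style.available instead of hard-coding one. The following is a minimal sketch under that assumption; the candidates list and the 'ggplot' fallback are illustrative choices, not something the original post prescribes.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt

# Prefer a seaborn-like style if this Matplotlib version ships one, else fall back to 'ggplot'
candidates = ['seaborn-v0_8', 'seaborn', 'ggplot']
style_name = next(name for name in candidates if name in plt.style.available)

# plt.style.context applies the style only inside the with-block
with plt.style.context(style_name):
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3])
    ax.set_title('a picture')

print('Style used:', style_name)
```

Unlike plt.style.use, the context manager leaves the global settings untouched once the block ends.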

Enabling Chinese font display

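# You only need to know that this exists; copy and paste it when needed.

# Windows: make Matplotlib display Chinese text
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False  # keep the minus sign rendering correctly with this font
# macOS: make Matplotlib display Chinese text
plt.rcParams['font.family'] = ['Arial Unicode MS']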

plt.plot( [1, 2, 3])
plt.title('这是一张图')

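Which font names work depends on what is installed on the machine. As a hedged sketch, the snippet below asks Matplotlib's font manager which fonts it has found and picks the first match from a list of candidates. The candidates list is an assumption for illustration; any of these fonts may be missing on a given system, in which case chosen is None and the settings are left alone.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
from matplotlib import font_manager

# Candidate fonts with Chinese glyphs; availability depends on the machine (assumption)
candidates = ['SimHei', 'Microsoft YaHei', 'Arial Unicode MS', 'PingFang SC', 'Noto Sans CJK SC']

# Names of all fonts Matplotlib has discovered on this system
installed = {font.name for font in font_manager.fontManager.ttflist}
chosen = next((name for name in candidates if name in installed), None)

if chosen is not None:
    plt.rcParams['font.sans-serif'] = [chosen]
    plt.rcParams['axes.unicode_minus'] = False

print('Using font:', chosen)
```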

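Using line charts

Line charts are commonly used to show how a quantity trends or fluctuates over time, for example a stock price.

flights = pd.read_csv('flights.csv')
year_group = flights.groupby('year').sum(numeric_only=True)  # sum only the numeric columns for each year
plt.plot(year_group)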

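File download: pandas study files used in this blog series (.zip)

pandas has more convenient plotting built in

The same charts can be drawn without calling Matplotlib directly: pandas wraps it in a more convenient plot method. Its kind parameter accepts the following values: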

kind : str

‘line’ : line plot (default)
‘bar’ : vertical bar plot
‘barh’ : horizontal bar plot
‘hist’ : histogram
‘box’ : boxplot
‘kde’ : Kernel Density Estimation plot
‘density’ : same as ‘kde’
‘area’ : area plot
‘pie’ : pie plot
‘scatter’ : scatter plot
‘hexbin’ : hexbin plot

# pandas built-in plotting: line chart
year_group.plot(kind='line')

# pandas built-in plotting: horizontal bar chart
year_group.plot(kind='barh');

# pandas built-in plotting: pie chart of the passengers column
year_group.passengers.plot(kind='pie')

# DataFrame
grade = pd.read_csv('student_grade.txt',sep='\t')
grade.plot(kind='line',y=['数学','语文','英语'])

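Since flights.csv is not reproduced here, the following sketch uses a small made-up table with the same kind of columns (an assumption) to show the whole path from grouping to the pandas plot call. It also shows that DataFrame.plot returns a Matplotlib Axes, so the pyplot-style options can still be applied afterwards.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import pandas as pd

# Made-up passenger counts standing in for flights.csv (assumption)
flights = pd.DataFrame({
    'year': [1949, 1949, 1950, 1950, 1951, 1951],
    'month': ['Jan', 'Feb', 'Jan', 'Feb', 'Jan', 'Feb'],
    'passengers': [112, 118, 115, 126, 145, 150],
})

# numeric_only=True keeps the text column 'month' out of the sum
year_group = flights.groupby('year').sum(numeric_only=True)

# DataFrame.plot returns the Matplotlib Axes it drew on
ax = year_group.plot(kind='line', title='Passengers per year')
ax.set_ylabel('passengers')

print(year_group)
```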

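Exercise

A general recipe for plotting a function:

First generate the x coordinates, for example with a function that produces an evenly spaced sequence.
Evaluate the function at those points to get the corresponding y coordinates.
Plot the generated x and y coordinates.

# Plot a sine curve
# x coordinates
x = np.linspace(0, 10, 100)  # evenly spaced numbers over the given interval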
y = np.sin(x)
plt.plot(x,y);

Plotting the sigmoid function

$S(x)=\frac{1}{1+e^{-x}}$

def sigmoid(x):  # the formula has a single unknown x, so the function takes a single parameter
    r = 1 / (1 + np.exp(-x) )
    return r
x = np.linspace(-10, 10, 100)
y = sigmoid(x)
plt.plot(x,y);

Plotting the standard normal distribution

$f(x)=\frac{1}{\sqrt{2 \pi}} e^{\left(-\frac{x^{2}}{2}\right)}$

def normal_distribution(x):
    r = 1 / np.sqrt(2*np.pi) * np.exp(-x**2/2)
    return r
x = np.linspace(-10, 10, 100)
y = normal_distribution(x)
plt.plot(x,y);

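As a quick sanity check on the formula above, the area under a probability density curve should be 1 and the peak of the standard normal density should be $1/\sqrt{2\pi}$. The sketch below checks both numerically with a simple Riemann sum; the grid of 2001 points is an arbitrary choice for illustration.

```python
import numpy as np

def normal_distribution(x):
    # Standard normal probability density function
    return 1 / np.sqrt(2 * np.pi) * np.exp(-x**2 / 2)

x = np.linspace(-10, 10, 2001)  # finer grid than the plot, to make the sum accurate
dx = x[1] - x[0]
y = normal_distribution(x)

area = np.sum(y) * dx  # Riemann-sum approximation of the integral
peak = y.max()         # value at x = 0

print(area, peak)
```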

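Polishing the figure

Adding common figure elements

plt.figure: basic figure settings
- figsize = (x, y): figure size
- dpi: image resolution
plt.title: add a title
plt.xlabel('') and plt.ylabel(''): add labels to the x and y axes
plt.grid(True): add a grid
plt.xlim(-1, 20) and plt.ylim(-1, 1): set the axis ranges
plt.text: add text to the figure
plt.legend: add a legend
rotation=0: rotation angle of a label
labelpad=10: distance between a label and its axis
xticks: x-axis ticks
yticks: y-axis ticks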


plt.figure(figsize=(3, 2), dpi=200)  # figsize: (float, float), the figure width and height
plt.title('Airline passengers by year', fontsize=6, color='green')
plt.xlabel('Year', color='red', fontsize=8)
plt.ylabel('Count', color='red', fontsize=8, rotation=0, labelpad=10)
plt.plot(year_group, label='Passenger count curve')
plt.grid(True)  # pass False to remove the grid
plt.xlim(1946, 1964)
plt.ylim(1000, 6000)
plt.xticks(fontsize=5, ticks=[1946, 1948, 1950, 1952, 1954, 1956, 1958, 1960, 1962, 1964])
plt.yticks(fontsize=5)
plt.text(1958, 4200, 'Growing quickly here!', fontsize=6, color='purple')  # the first two arguments are the x and y coordinates
plt.legend(fontsize=5, loc=4);  # the legend uses the label passed to plt.plot; loc=4 places it in the lower right

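The same settings are also available as methods on the Axes object, which some readers find easier to follow once a script contains more than one plot. Below is a minimal sketch of that style; the years and passengers values are made up to stand in for year_group (an assumption).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt

# Made-up yearly totals standing in for year_group (assumption)
years = [1949, 1952, 1955, 1958, 1960]
passengers = [1500, 2400, 3400, 4600, 5700]

fig, ax = plt.subplots(figsize=(3, 2), dpi=200)
ax.plot(years, passengers, label='Passenger count curve')
ax.set_title('Airline passengers by year', fontsize=6, color='green')
ax.set_xlabel('Year', fontsize=8)
ax.set_ylabel('Count', fontsize=8, rotation=0, labelpad=10)
ax.set_xlim(1946, 1964)
ax.set_ylim(1000, 6000)
ax.grid(True)
ax.legend(fontsize=5, loc='lower right')  # 'lower right' is the named form of loc=4
```

Each plt.xxx call in the pyplot version maps to an ax.set_xxx or ax.xxx method here.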

Exercise

Polish the standard normal distribution plot
$f(x)=\frac{1}{\sqrt{2 \pi}} e^{\left(-\frac{x^{2}}{2}\right)}$

def normal_distribution(x):
    r = 1 / np.sqrt(2*np.pi) * np.exp(-x**2/2)
    return r
x = np.linspace(-10, 10, 100)
y = normal_distribution(x)

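Figure size (3, 2)

plt.figure(figsize = (3,2))

Figure resolution 150 dpi (set in the same plt.figure call as the size)

plt.figure(figsize = (3,2),dpi = 150)

Title 'Standard normal distribution', green, font size 10

plt.title('Standard normal distribution',fontsize = 10, color = 'green')

x-axis label: Value

plt.xlabel('Value')

y-axis label: Probability

plt.ylabel('Probability')

x-axis range (-8, 8)

plt.xlim(-8, 8)

y-axis range (-0.1, 0.5)

plt.ylim(-0.1,0.5)

x-axis ticks (-8, -6, ..., 6, 8)

plt.xticks(ticks=range(-8,9,2))

y-axis ticks (-0.1, 0, 0.1, 0.2, 0.3, 0.4, 0.5)

plt.yticks(ticks=np.linspace(-0.1, 0.5, 7))  # linspace includes the 0.5 endpoint; np.arange excludes its stop value

plt.plot(x,y,label='Normal distribution curve')
plt.legend(fontsize= 5);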

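Tick positions are a common place for off-by-one mistakes, because range and np.arange stop before their end value. The sketch below builds the two tick lists of the exercise so that both endpoints are included; x_ticks and y_ticks are illustrative names.

```python
import numpy as np

# Integer ticks: range excludes the stop value, so use 9 to include 8
x_ticks = list(range(-8, 9, 2))

# Float ticks: np.linspace takes the number of points and includes both endpoints
y_ticks = np.linspace(-0.1, 0.5, 7)

print(x_ticks)
print(np.round(y_ticks, 1))
```

With a float step, np.linspace is usually the safer choice, since the number of ticks is stated explicitly.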

Drawing several lines in one figure

x = np.linspace(0.1, 10, 100)
y1 = np.sin(x)
y2 = np.log(x)
y3 = np.cos(x)
y4 = np.tan(x)
plt.plot(x, y1, label='sin curve')  # sin
plt.plot(x, y2, label='log curve')  # log
plt.plot(x, y3, label='cos curve')  # cos
# plt.plot(x, y4, label='tan curve')  # tan
plt.legend(fontsize=15);

Other chart types

Bar charts

s1 = [1, 2, 3, 4, 5]  # bar positions
s2 = [10, 13, 6, 3, 12]  # bar heights
plt.bar(s1, s2, align='edge');  # align the edge of each bar with its position

plt.bar(s1, s2, width=0.2, color=['red', 'green', 'yellow', 'blue', 'purple']);  # width sets the bar width, color the bar colors

plt.bar(s1, s2, width=0.5, color=['r', 'y', 'b', 'g', 'm'],
        tick_label=['Java', 'C', 'C++', 'PHP', 'Python'])
plt.title('Programming language usage', fontsize=15);

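Suppose two branch stores each record their revenue from Monday to Friday; draw both as bar charts so they can be compared.

# Sample values for illustration only (the original post does not show this data)
a_money = [120, 150, 90, 180, 200]
b_money = [100, 170, 110, 160, 220]
tick_label = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']
# Offset the bars of the two stores so they do not overlap
plt.bar([1,2,3,4,5],a_money,width=0.3, label='Store A')
plt.bar([1.3,2.3,3.3,4.3,5.3],b_money,width=0.3, tick_label=tick_label,label='Store B')
# Add a legend
plt.legend(); # with default settings Matplotlib picks the location automatically (loc='best')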
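
Hand-written positions such as 1.3 and 2.3 get tedious with more groups. A hedged alternative is to compute the offsets with NumPy so that each pair of bars is centered on its tick. The revenue numbers and the names days, positions and width below are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up revenue figures for illustration (assumption)
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']
a_money = [120, 150, 90, 180, 200]
b_money = [100, 170, 110, 160, 220]

width = 0.3
positions = np.arange(len(days))  # one slot per day: 0, 1, 2, 3, 4

fig, ax = plt.subplots()
# Shift each series by half a bar width so the pair is centered on the tick
bars_a = ax.bar(positions - width / 2, a_money, width=width, label='Store A')
bars_b = ax.bar(positions + width / 2, b_money, width=width, label='Store B')
ax.set_xticks(positions)
ax.set_xticklabels(days)
ax.legend()
```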

Bar charts from DataFrame data

drinks = pd.read_csv('drinks.csv')
drinks.info()
'''
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 193 entries, 0 to 192
Data columns (total 6 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   country                       193 non-null    object 
 1   beer_servings                 193 non-null    int64  
 2   spirit_servings               193 non-null    int64  
 3   wine_servings                 193 non-null    int64  
 4   total_litres_of_pure_alcohol  193 non-null    float64
 5   continent                     170 non-null    object 
dtypes: float64(1), int64(3), object(2)
memory usage: 9.2+ KB
'''
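# The last column has only 170 non-null values: the continent code 'NA' (North America) is read as a missing value
drinks = pd.read_csv('drinks.csv',
    keep_default_na=False,)
# By default (keep_default_na=True), strings such as 'NA' and empty fields are read as NaN
# With keep_default_na=False the text is kept as written: 'NA' stays the string 'NA', and an empty field is read as an empty string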
drinks.info()
'''
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 193 entries, 0 to 192
Data columns (total 6 columns):
country                         193 non-null object
beer_servings                   193 non-null int64
spirit_servings                 193 non-null int64
wine_servings                   193 non-null int64
total_litres_of_pure_alcohol    193 non-null float64
continent                       193 non-null object
dtypes: float64(1), int64(3), object(2)
memory usage: 9.2+ KB
'''
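# Total alcohol per continent, sorted in descending order
alcohol = drinks.groupby(by='continent')['total_litres_of_pure_alcohol'].sum().sort_values(ascending=False)
# Overlay a line on the bars
plt.bar(alcohol.index, alcohol, width=0.5, )
plt.plot(alcohol.index, alcohol, color='green')  # same x values as the bars, so both share the categorical axis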

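The effect of keep_default_na used when reading drinks.csv can be reproduced on a tiny in-memory CSV. The sketch below is self-contained: csv_text is made-up data in the same shape as the real file (an assumption), read once with the default settings and once with keep_default_na=False.

```python
import io
import pandas as pd

# A tiny CSV in the shape of drinks.csv; the rows are made up (assumption)
csv_text = "country,continent\nCanada,NA\nFrance,EU\nJapan,AS\n"

default_df = pd.read_csv(io.StringIO(csv_text))
raw_df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

print(default_df['continent'].isna().sum())  # number of values read as missing
print(raw_df['continent'].tolist())          # the text exactly as written in the file
```

If only some markers should count as missing, read_csv also accepts an explicit na_values list alongside keep_default_na=False.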

t = drinks.groupby('continent').sum(numeric_only=True)  # sum only the numeric columns
# A DataFrame can be plotted directly as well
t.plot(kind='bar')

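Histograms

bar draws data that you have already summarized, which makes it suitable for comparing groups.

hist builds a frequency distribution: it splits the data into, say, 10 intervals and counts how many values fall into each one, which gives a rough view of how the data is distributed.

# first argument: the data
plt.hist(drinks.total_litres_of_pure_alcohol, bins=20)
# bins=20 splits the data range into 20 equal-width intervals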
'''
(array([37., 16., 12., 18.,  4., 11., 10.,  5., 15., 15.,  7.,  6.,  5.,
         7., 11.,  8.,  3.,  2.,  0.,  1.]),
 array([ 0.  ,  0.72,  1.44,  2.16,  2.88,  3.6 ,  4.32,  5.04,  5.76,
         6.48,  7.2 ,  7.92,  8.64,  9.36, 10.08, 10.8 , 11.52, 12.24,
        12.96, 13.68, 14.4 ]),
 <a list of 20 Patch objects>)
'''


tips = pd.read_csv('tips.csv')
tips.time.value_counts()
'''
Dinner    176
Lunch      68
Name: time, dtype: int64
'''
plt.hist(tips.total_bill ,bins = 15)
'''
(array([ 2., 10., 37., 42., 49., 28., 24., 14., 12., 10.,  4.,  5.,  2.,
         1.,  4.]),
 array([ 3.07      ,  6.25266667,  9.43533333, 12.618     , 15.80066667,
        18.98333333, 22.166     , 25.34866667, 28.53133333, 31.714     ,
        34.89666667, 38.07933333, 41.262     , 44.44466667, 47.62733333,
        50.81      ]),
 <a list of 15 Patch objects>)
'''


# density=True returns probability densities instead of counts, so the bar areas sum to 1
plt.hist(tips.total_bill ,bins = [0,5,10,15,20,25,30,35,40,55],density=True)
'''
(array([0.00081967, 0.01311475, 0.05163934, 0.05491803, 0.03360656,
        0.01967213, 0.01311475, 0.00491803, 0.00273224]),
 array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 55]),
 <a list of 9 Patch objects>)
'''

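The numbers returned with density=True are densities, not probabilities: each value is the bin count divided by the total count and by the bin width. The sketch below checks this with np.histogram, which computes the same kind of binning without drawing anything. The bills array is random made-up data standing in for tips.total_bill (an assumption).

```python
import numpy as np

rng = np.random.default_rng(0)        # seeded so the sketch is repeatable
bills = rng.uniform(3, 50, size=244)  # made-up stand-in for tips.total_bill (assumption)
edges = [0, 5, 10, 15, 20, 25, 30, 35, 40, 55]  # unequal bins: the last one is 15 wide

counts, _ = np.histogram(bills, bins=edges)
density, _ = np.histogram(bills, bins=edges, density=True)

widths = np.diff(edges)
# density = count / (total count * bin width), so the areas density * width sum to 1
total_area = np.sum(density * widths)

print(counts.sum(), total_area)
```

This is why the last bin in the original output looks small: it is three times as wide as the others, so the same count yields a lower density.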

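Scatter plots

plt.scatter

s: marker size
c: marker color
cmap: colormap used to turn numeric c values into colors

# s as a single number: every marker has the same size
plt.scatter([1,2,3], [1,2,3] , c = 'red' ,s = 300);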

# s as a list: one size per marker; c as a list of numbers: each value is mapped to a color
plt.scatter([1,2,3,4], [1,2,3,4] , s=[100,200,300,400] ,c =[0,0,1,1]);

plt.scatter([1,2,3,4,5,6,7], [1,2,3,4,5,6,7] , s= 300,cmap='rainbow',
         c =[1,2,3,4,5,6,7]);


# Encode sex as 0/1 so that men and women can be drawn in different colors
(tips.sex == 'Female').astype('int')
'''
0      1
1      0
2      0
3      0
4      1
      ..
239    0
240    1
241    0
242    0
243    1
Name: sex, Length: 244, dtype: int64
'''

# Color the points by sex
plt.scatter(tips.total_bill, tips.tip,s=100 ,c=(tips.sex == 'Female').astype('int'),cmap='rainbow');

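The 0/1 trick works for two groups; for any number of groups, pd.factorize produces the integer codes, and the scatter object can generate matching legend entries. This is a sketch on a few made-up rows in the shape of tips.csv (an assumption), not the author's code.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows in the shape of tips.csv (assumption)
tips = pd.DataFrame({
    'total_bill': [16.99, 10.34, 21.01, 23.68, 24.59],
    'tip': [1.01, 1.66, 3.50, 3.31, 3.61],
    'sex': ['Female', 'Male', 'Male', 'Male', 'Female'],
})

# factorize assigns an integer code to each distinct label, in order of first appearance
codes, categories = pd.factorize(tips['sex'])

fig, ax = plt.subplots()
points = ax.scatter(tips['total_bill'], tips['tip'], s=100, c=codes, cmap='rainbow')
# One legend handle per distinct code, colored through the same colormap
handles, _ = points.legend_elements()
ax.legend(handles, list(categories), title='sex')
```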

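Pie charts

plt.pie draws a pie chart

x: the data
explode=None: which wedges to pull out, e.g. (0, 0, 0.3, 0)
labels=None: wedge labels
colors=None: wedge colors
autopct=None: format of the percentage text
pctdistance=0.6: distance of the percentage text from the center, as a fraction of the radius
shadow=False: whether to draw a shadow
labeldistance=1.1: distance of the labels from the center, as a fraction of the radius
startangle=None: angle at which the first wedge starts
radius=None: radius of the pie
counterclock=True: draw the wedges counterclockwise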

plt.pie([1,2,3,4],labels=['a','b','c','d']);


labels = ['Food', 'Transport', 'Games', 'Clothes']
data = [1000, 100, 500, 2000]

plt.figure(figsize=(3,3),dpi=200)
plt.pie(data,labels=labels,explode= (0, 0, 0.3, 0),
       shadow=True,labeldistance=1.2,autopct='%1.0f%%');

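The percentages that autopct prints are each value's share of the total. The sketch below computes those shares by hand next to the plt.pie call, using the same spending figures as above; the English labels and the names percent and autotexts are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

labels = ['Food', 'Transport', 'Games', 'Clothes']
data = np.array([1000, 100, 500, 2000])

# Each wedge's share of the total, in percent
percent = data / data.sum() * 100

fig, ax = plt.subplots(figsize=(3, 3))
# With autopct set, pie returns the wedges, the label texts, and the percentage texts
wedges, texts, autotexts = ax.pie(data, labels=labels, explode=(0, 0, 0.3, 0),
                                  autopct='%1.1f%%')

for label, share in zip(labels, percent):
    print(f'{label}: {share:.1f}%')
```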

Plotting a Series

alcohol
'''
continent
EU    387.8
AF    159.4
NA    137.9
AS     95.5
SA     75.7
OC     54.1
Name: total_litres_of_pure_alcohol, dtype: float64
'''

plt.figure(figsize=(1.5,1.5),dpi=300)
plt.pie(alcohol,labels=alcohol.index,shadow=True);


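Box plots

Box plots are mainly used to analyze how data is distributed and how widely it is spread. A box plot shows the overall spread of a dataset, including its upper and lower limits, its quartiles, and its outliers.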
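
The quantities a box plot displays can be computed directly. The sketch below assumes the common 1.5 × IQR rule for the whisker limits, which is also Matplotlib's default (whis=1.5); the values array is made up so that one point lands outside the limits.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])  # made-up data with one extreme value

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1                 # height of the box
lower_fence = q1 - 1.5 * iqr  # points below this are drawn as outliers
upper_fence = q3 + 1.5 * iqr  # points above this are drawn as outliers
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(q1, median, q3, outliers)
```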

drinks[['beer_servings','spirit_servings','wine_servings']]


plt.boxplot([drinks.beer_servings,drinks.spirit_servings,drinks.wine_servings],
           labels=['Beer','Spirits','Wine']);  # Matplotlib 3.9 and later name this parameter tick_labels

# sym sets the outlier marker; patch_artist=True lets boxprops fill the boxes with a color
plt.boxplot([drinks.beer_servings,drinks.spirit_servings,drinks.wine_servings],
           labels=['Beer','Spirits','Wine'],sym='*',
           patch_artist = True, boxprops = {'color':'g','facecolor':'yellow'});

DataFrame has its own boxplot method

DataFrame.boxplot(column=None, by=None, ax=None,showmeans=False, fontsize=None, rot=0, grid=True, figsize=None, layout=None, return_type=None, ...)

drinks.boxplot(column=['beer_servings','spirit_servings','wine_servings'],vert=False)


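Drawing several plots on one canvas (subplots)

subplot() takes three arguments:

First: the number of rows
Second: the number of columns
Third: the position of the plot in the grid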

x = np.linspace(0,10,100)
y = np.sin(x)
y2 = np.exp(x)
y3 = np.sin(x)
y4 = np.cos(x)
plt.figure(figsize = (16, 14))

a1 = plt.subplot(221)  # 221 is shorthand for subplot(2, 2, 1)
plt.plot(x,y)
plt.title('One of four subplots', fontsize=16)
plt.xlabel('x axis')
plt.ylabel('y axis')

a2 = plt.subplot(222)
plt.title('Exponential curve', fontsize=16)
plt.xlabel('x axis')
plt.ylabel('y axis')
plt.plot(x, y2)

a3 = plt.subplot(223)
plt.title('sin curve', fontsize=16)
plt.plot(x, np.sin(x), label='sinx')
plt.xlabel('x axis')
plt.ylabel('y axis')
plt.legend(fontsize=26)

a4 = plt.subplot(224)
plt.plot(x, np.cos(x))
plt.title('cos curve', fontsize=16);

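An alternative to calling plt.subplot once per panel is plt.subplots, which creates the whole grid at once and returns the Axes as an array. The sketch below is one possible arrangement; the curves dictionary is an illustrative choice of four functions, not taken from the original post.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
curves = {'sin': np.sin(x), 'exp': np.exp(x), 'cos': np.cos(x), 'sqrt': np.sqrt(x)}

# plt.subplots creates the figure and a 2x2 array of Axes in one call
fig, axes = plt.subplots(2, 2, figsize=(8, 7))

# axes.flat walks the grid row by row, matching subplot positions 1 to 4
for ax, (name, values) in zip(axes.flat, curves.items()):
    ax.plot(x, values)
    ax.set_title(name)
    ax.set_xlabel('x axis')

fig.tight_layout()  # reduce overlap between titles and axis labels
```

A single panel can then be addressed by row and column, for example axes[0, 1] for the top-right plot.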

Saving figures

plt.figure(figsize = (16, 14))
plt.subplot(2, 2, 1)
s1 = [1, 2, 3, 4, 5]
s2 = [10, 13, 6, 3, 12]
plt.bar(s1, s2, width=0.5, color=['r', 'y', 'b', 'g', 'm'], 
       edgecolor ='k',linewidth =5,tick_label=['Java', 'C', 'C++', 'PHP', 'Python']   )
plt.title('Programming language usage', fontsize=15)
plt.subplot(2, 2, 2)
plt.pie(data,labels=labels,explode=(0, 0, 0.3, 0),autopct='%1.1f%%', 
        shadow=True,   colors=['r', 'k', 'g', 'b'] );
plt.subplot(2, 2, 3)
plt.scatter(grade.语文, grade.数学)


plt.subplot(2, 2, 4)
x = np.linspace(1, 10, 100)
y = np.sin(x)
plt.plot(x,y)


# Save the figure as an image
plt.savefig('hello.jpg',dpi=200)

# Save the figure as a PDF
plt.savefig('hello.pdf',dpi=200)
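
The sketch below saves one small figure in two formats and confirms that the files were written. It is a minimal example under a few assumptions: it writes to a temporary folder rather than the working directory, and it uses PNG instead of JPG because PNG needs no extra image library.

```python
import os
import tempfile
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 1])

out_dir = tempfile.mkdtemp()  # write into a temporary folder rather than the working directory
png_path = os.path.join(out_dir, 'hello.png')
pdf_path = os.path.join(out_dir, 'hello.pdf')

# The file extension selects the output format; bbox_inches='tight' trims the surrounding whitespace
fig.savefig(png_path, dpi=200, bbox_inches='tight')
fig.savefig(pdf_path)
plt.close(fig)  # release the figure once it has been saved

print(os.path.getsize(png_path), os.path.getsize(pdf_path))
```

When a script also calls plt.show(), saving first avoids writing an empty image, since the figure may be cleared once the window is closed.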

Exercise

Plot the Yahoo stock price trend

stock_data = pd.read_csv('yahoo_stock.csv')
stock_data.info()
'''
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4419 entries, 0 to 4418
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Date            4419 non-null   object 
 1   Open            4419 non-null   float64
 2   High            4419 non-null   float64
 3   Low             4419 non-null   float64
 4   Close           4419 non-null   float64
 5   Adjusted_close  4419 non-null   float64
 6   Volume          4419 non-null   float64
dtypes: float64(6), object(1)
memory usage: 241.8+ KB
'''
# The Date column is stored as text (object), not as a datetime type
# Convert the Date column to datetime
stock_data.Date = pd.to_datetime(stock_data.Date)

# Use the date column as the index
stock_data.set_index('Date',inplace=True)

stock_data.sort_index(inplace=True) # sort by date

first_open = stock_data.Open.iloc[0]  # opening price on the first day, selected by position

plt.figure(figsize=(16, 14))
plt.title('Yahoo stock price over the years', fontsize=22)
plt.plot(stock_data.Open , label='Daily opening price')
# Alternative: stock_data.Open.plot(title='yahoo', legend=True)
plt.xlabel('Date', fontsize=22)
plt.ylabel('Price', fontsize=22)
plt.axhline(first_open,color='black',linewidth = 3,label='Listing price')

# Fill the area between the price curve and the listing price
plt.fill_between(x=stock_data.index, y1=stock_data.Open, y2=first_open,
             where=stock_data.Open>first_open,
                 color='red',alpha=0.5,label='Above listing price'   )  # x: horizontal coordinates; where: fill only where the condition holds

plt.fill_between(x=stock_data.index, y1=stock_data.Open, y2=first_open,
             where=stock_data.Open<first_open,color='green',alpha=0.5,
                label='Below listing price')  # x: horizontal coordinates; where: fill only where the condition holds

plt.annotate('Highest price: {}!'.format(stock_data.Open.max()), (pd.Timestamp('2012-09-21'), stock_data.Open.max()), fontsize=24,  xytext=(380, 700),
         textcoords='figure points' ,   arrowprops ={"color":'red'})


plt.legend(fontsize=22, loc=2)
# Save the figure
plt.savefig('yahoo_stock.jpg',dpi=200);

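The data preparation steps of this exercise can be tried on a tiny in-memory table before running them on the real file. The sketch below uses three made-up rows in the shape of yahoo_stock.csv (an assumption) and shows the date conversion, the sort, and the values that the plot is built from.

```python
import io
import pandas as pd

# Made-up prices in the shape of yahoo_stock.csv, deliberately out of order (assumption)
csv_text = "Date,Open\n2012-09-21,15.8\n1996-04-12,1.05\n2000-01-03,118.7\n"
stock_data = pd.read_csv(io.StringIO(csv_text))

stock_data['Date'] = pd.to_datetime(stock_data['Date'])  # text -> datetime64
stock_data = stock_data.set_index('Date').sort_index()   # oldest row first

first_open = stock_data['Open'].iloc[0]  # iloc selects by position, whatever the index type
peak_date = stock_data['Open'].idxmax()  # index label (a Timestamp) of the highest opening price
above = stock_data['Open'] > first_open  # boolean mask of the kind fill_between takes as `where`

print(first_open, peak_date, above.tolist())
```

Using idxmax to find the date of the highest price avoids hard-coding it in the annotate call.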
