【Python Data Science】Advanced Data Visualization with matplotlib.pyplot

Mercy92

Article link: https://blog.csdn.net/weixin_40844116/article/details/97694570

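# Import pandas, NumPy and matplotlib.pyplot
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Jupyter magic: render figures inline below each cell
%matplotlib inline

# Load the job-posting dataset (the file is GBK-encoded, hence encoding='gbk')
df=pd.read_csv('DataAnalyst.csv',encoding='gbk')
df.head()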

	city	companyFullName	companyId	companyLabelList	companyShortName	companySize	businessZones	firstType	secondType	education	industryField	positionId	positionAdvantage	positionName	positionLables	bottom	top	avg	workYear
0	上海	纽海信息技术(上海)有限公司	8581	['技能培训', '节日礼物', '带薪年假', '岗位晋升']	1号店	2000人以上	['张江']	技术	数据开发	硕士	移动互联网	2537336	知名平台	数据分析师	['分析师', '数据分析', '数据挖掘', '数据']	7	9	8.0	应届毕业生
1	上海	上海点荣金融信息服务有限责任公司	23177	['节日礼物', '带薪年假', '岗位晋升', '扁平管理']	点融网	500-2000人	['五里桥', '打浦桥', '制造局路']	技术	数据开发	本科	金融	2427485	挑战机会,团队好,与大牛合作,工作环境好	数据分析师-CR2017-SH2909	['分析师', '数据分析', '数据挖掘', '数据']	10	15	12.5	应届毕业生
2	上海	上海晶樵网络信息技术有限公司	57561	['技能培训', '绩效奖金', '岗位晋升', '管理规范']	SPD	50-150人	['打浦桥']	设计	数据分析	本科	移动互联网	2511252	时间自由,领导nic	数据分析师	['分析师', '数据分析', '数据']	4	6	5.0	应届毕业生
3	上海	杭州数云信息技术有限公司上海分公司	7502	['绩效奖金', '股票期权', '五险一金', '通讯津贴']	数云	150-500人	['龙华', '上海体育场', '万体馆']	市场与销售	数据分析	本科	企业服务,数据服务	2427530	五险一金绩效奖金带薪年假节日福利	大数据业务分析师【数云校招】	['商业', '分析师', '大数据', '数据']	6	8	7.0	应届毕业生
4	上海	上海银基富力信息技术有限公司	130876	['年底双薪', '通讯津贴', '定期体检', '绩效奖金']	银基富力	15-50人	['上海影城', '新华路', '虹桥']	技术	软件开发	本科	其他	2245819	在大牛下指导	BI开发/数据分析师	['分析师', '数据分析', '数据', 'BI']	2	3	2.5	应届毕业生

1. Fixing common display problems

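# Fix 1: Chinese text is not displayed; switch the sans-serif font to SimHei (黑体)
plt.rcParams['font.sans-serif']=['SimHei']
# Fix 2: with SimHei, minus signs on the axes show up as boxes; False draws them with an ASCII hyphen instead
plt.rcParams['axes.unicode_minus']=False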
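SimHei is usually available on Chinese-language Windows but is often missing on macOS and Linux, in which case Fix 1 has no effect and Chinese labels still render as boxes. A minimal sketch, assuming the candidate family names below (common CJK font names, not guaranteed to be installed on any given machine), picks the first one that matplotlib's `font_manager` has registered:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs as a plain script; omit in Jupyter
import matplotlib.pyplot as plt
from matplotlib import font_manager

# Hypothetical candidate list: common CJK font family names, not guaranteed to exist
CANDIDATES = ["SimHei", "Microsoft YaHei", "PingFang SC", "Noto Sans CJK SC"]

def pick_cjk_font(candidates=CANDIDATES):
    """Return the first candidate family that matplotlib has registered, or None."""
    installed = {f.name for f in font_manager.fontManager.ttflist}
    for name in candidates:
        if name in installed:
            return name
    return None

font = pick_cjk_font()
if font is not None:
    # Put the CJK font first and keep the existing sans-serif list as fallback
    plt.rcParams["font.sans-serif"] = [font] + plt.rcParams["font.sans-serif"]
else:
    print("No CJK font found; Chinese labels may render as empty boxes")

# Many CJK fonts lack the Unicode minus sign (U+2212), so draw ASCII '-' instead
plt.rcParams["axes.unicode_minus"] = False
```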

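# Number of postings per education level (non-null 'top' values in each group)
counts=df.groupby('education').top.count()
# Pie chart of those counts; labels= names each wedge
plt.pie(counts,labels=counts.index)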

([<matplotlib.patches.Wedge at 0x23703a1b1d0>,
  <matplotlib.patches.Wedge at 0x23703a1b6a0>,
  <matplotlib.patches.Wedge at 0x23703a1bbe0>,
  <matplotlib.patches.Wedge at 0x23703a23160>,
  <matplotlib.patches.Wedge at 0x23703a236a0>],
 [Text(1.08382,0.187992,'不限'),
  Text(1.03437,0.374265,'博士'),
  Text(0.835846,0.715095,'大专'),
  Text(-1.03506,-0.372359,'本科'),
  Text(1.08084,-0.204438,'硕士')])

[Figure: pie chart of postings by education level]
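A variant of the pie chart, sketched on a small hypothetical DataFrame in place of `DataAnalyst.csv`: it computes the counts once with `value_counts()` and adds percentage labels via `autopct`. Note that `value_counts()` counts rows and sorts by frequency, while the code above counts non-null `top` values and sorts by education level; the wedge sizes agree when `top` has no missing values.

```python
import matplotlib
matplotlib.use("Agg")  # plain-script backend; omit in Jupyter with %matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy stand-in for the 'education' column of DataAnalyst.csv
df = pd.DataFrame({"education": ["Bachelor", "Bachelor", "Master", "Bachelor", "PhD", "Master"]})

# Count postings per education level once, then reuse the result for sizes and labels
counts = df["education"].value_counts()

fig, ax = plt.subplots()
# autopct formats each wedge's share of the total as a percentage label
wedges, texts, autotexts = ax.pie(counts, labels=counts.index, autopct="%1.1f%%")
ax.set_title("Postings by education level")
```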

# Plot 20 random integers in [-20, 20); the negative y tick labels check that Fix 2 works
plt.plot(np.random.randint(-20,20,20))

[<matplotlib.lines.Line2D at 0x21ccd1f02e8>]

[Figure: line chart of 20 random integers, with negative y tick labels]

2. Elements of a figure

Basic parameters

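# figsize sets the figure (canvas) size in inches: 10 wide, 4 high; 1 is the figure number
plt.figure(1,figsize=(10,4))
# Plot one line: np.random.randint(-20,20,20) returns 20 random integers in [-20, 20), i.e. from -20 to 19
plt.plot(np.random.randint(-20,20,20))
# Add a title (no assignment needed); calling plt.title() again replaces the previous title,
# so only '这是一条折线' ("this is a line chart") is shown
plt.title('cccc')
plt.title('这是一条折线')
# Place x ticks at 0, 10 and 30; the x range is extended so that the tick at 30 is visible
plt.xticks([0,10,30])
# Name the x axis ('我是x轴' means "I am the x axis")
plt.xlabel('我是x轴')
# Render the figure; as the last call in the cell it also suppresses the printed object repr
# (e.g. [<matplotlib.lines.Line2D at 0x...>])
plt.show()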

[Figure: line chart with a title, x ticks at 0, 10 and 30, and an x-axis label]

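figure: the canvas, sized with figsize
title: the chart title
data: the plotted line
x axis: tick positions via xticks, axis label via xlabel
y axis: likewise via yticks and ylabel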
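Each pyplot call in the basic-parameters example has an object-oriented counterpart on an `Axes` object, which makes explicit which figure element each call changes. A sketch, assuming a seeded NumPy `Generator` (`np.random.default_rng`) in place of `np.random.randint` so reruns give the same line, and English labels for brevity:

```python
import matplotlib
matplotlib.use("Agg")  # plain-script backend; omit in Jupyter with %matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)           # seeded generator (assumption) for reproducible data
values = rng.integers(-20, 20, size=20)  # 20 integers in [-20, 20), like np.random.randint(-20, 20, 20)

fig, ax = plt.subplots(figsize=(10, 4))  # figure (canvas) plus one Axes
ax.plot(values)                          # data
ax.set_title("A line chart")             # title
ax.set_xticks([0, 10, 30])               # x tick positions; the view widens to show 30
ax.set_xlabel("x axis")                  # x-axis label
ax.set_ylabel("y axis")                  # y-axis label
```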

Layering multiple plots

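# Layer several plots on the same axes
plt.plot(np.random.randint(-20,20,20))
plt.plot(np.random.randint(-20,20,20))
# .legend() adds a legend; the names in the tuple are matched to the lines in plotting order
plt.legend(('no1','no2'))
# Finish this figure so the next two lines start on a new one
plt.show()

## Alternatively, name each line with label= when plotting (color= sets its color)
plt.plot(np.random.randint(-20,20,20),label='no1',color='g')
plt.plot(np.random.randint(-20,20,20),label='no2')
# .legend() with no arguments collects the labels set above
plt.legend()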

<matplotlib.legend.Legend at 0x21cce3c5f98>

[Figure: two random line series with a legend (no1, no2)]
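The two legend styles side by side, as a sketch with seeded random data (assumption: `np.random.default_rng` instead of `np.random.randint`). Passing names to `legend()` relies on the order in which lines were drawn, while `label=` keeps each name attached to its own line, which is more robust once a plot has several calls:

```python
import matplotlib
matplotlib.use("Agg")  # plain-script backend; omit in Jupyter with %matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))

# Style 1: names passed to legend() are matched to the lines in plotting order
left.plot(rng.integers(-20, 20, 20))
left.plot(rng.integers(-20, 20, 20))
legend_left = left.legend(("no1", "no2"))

# Style 2: label= is stored on each line; legend() with no arguments collects the labels
right.plot(rng.integers(-20, 20, 20), label="no1", color="g")
right.plot(rng.integers(-20, 20, 20), label="no2")
legend_right = right.legend(loc="upper left")

names_left = [t.get_text() for t in legend_left.get_texts()]
names_right = [t.get_text() for t in legend_right.get_texts()]
```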

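# Group df by education and city, then compute the mean and the count of the avg column for each group;
# reset_index() turns the two group keys back into ordinary columns, giving a new flat table.
# Selecting ['avg'] before .agg(['mean','count']) aggregates only that column; aggregating every column
# (df.groupby(...).agg(['mean','count']).avg) also tries to average the text columns, which pandas 2.x rejects with a TypeError
data=df.groupby(['education','city'])['avg'].agg(['mean','count']).reset_index()
data.head()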

	education	city	mean	count
0	不限	上海	14.051471	68
1	不限	北京	15.495238	210
2	不限	南京	7.000000	5
3	不限	厦门	12.500000	3
4	不限	天津	3.500000	1
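To see what this aggregation produces without the original CSV, here is a sketch on a few hypothetical rows (made-up cities and salaries). It groups and aggregates only the `avg` column, so no mean is attempted on text columns:

```python
import pandas as pd

# Hypothetical toy rows standing in for DataAnalyst.csv
df = pd.DataFrame({
    "education": ["Bachelor", "Bachelor", "Master", "Bachelor"],
    "city": ["Shanghai", "Shanghai", "Beijing", "Beijing"],
    "avg": [8.0, 12.0, 20.0, 15.0],
})

# One output row per (education, city) pair, with the mean and count of avg
data = df.groupby(["education", "city"])["avg"].agg(["mean", "count"]).reset_index()
print(data)
```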

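# Group by education and draw one scatter series per group
# a is the group key (one value of 'education'); b is the sub-DataFrame holding that group's rows
for a,b in data.groupby('education'):
    #print(b)
    # Take this group's mean and count columns (both are Series)
    x=b['mean']
    y=b['count']
    # label=a gives this series its legend entry
    plt.scatter(x,y,label=a)
# Show the legend in the upper-left corner
plt.legend(loc='upper left')
# Name the axes: '平均薪资' = average salary, '职位总数' = number of postings
plt.xlabel('平均薪资')
plt.ylabel('职位总数')
# Display the figure
plt.show()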

# For example, 博士 (PhD) has only three points, meaning that only three cities have postings for that education level

[Figure: scatter of average salary vs. number of postings, one colour per education level]
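Iterating over a `GroupBy` object is what drives the per-level scatter: each step yields the group key and the sub-DataFrame for that key. A sketch on hypothetical toy rows shaped like `data`; the number of rows in each sub-DataFrame is the number of points drawn in that colour:

```python
import pandas as pd

# Hypothetical aggregated rows shaped like `data` (toy values, not the real dataset)
data = pd.DataFrame({
    "education": ["Bachelor", "Bachelor", "PhD"],
    "city": ["Shanghai", "Beijing", "Beijing"],
    "mean": [14.0, 15.5, 30.0],
    "count": [68, 210, 3],
})

group_sizes = {}
# Each iteration yields (key, sub-DataFrame); keys come out in sorted order
for level, group in data.groupby("education"):
    group_sizes[level] = len(group)       # one row per city in this group
    print(level, list(group["city"]))
```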

Drawing subplots

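# Draw subplots; set the figure size
plt.figure(figsize=(12,4))

# Split the canvas into a grid of 1 row x 2 columns (like splitting a table cell) and draw in position 1 (left)
plt.subplot(1,2,1)
plt.plot(np.random.randint(-20,20,20),label='no1',color='g')
plt.legend()

# Same 1x2 grid, position 2 (right); the commas can be dropped when all three numbers are single digits
plt.subplot(122)
# Layer two lines in the same subplot
plt.plot(np.random.randint(-20,20,20),label='no2',color='r')
plt.plot(np.random.randint(-20,20,20),label='no3',color='b')
plt.legend()

# Show both subplots
plt.show()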

[Figure: two side-by-side subplots]
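`plt.subplots()` creates the figure and the whole grid of `Axes` in one call and returns them, so each plot targets an explicit `Axes` instead of whichever subplot is current. A sketch of the 1x2 example in that style (the seeded random data is an assumption):

```python
import matplotlib
matplotlib.use("Agg")  # plain-script backend; omit in Jupyter with %matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)

# One call returns the figure and a 1-D array of two Axes: axes[0] (left), axes[1] (right)
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

axes[0].plot(rng.integers(-20, 20, 20), label="no1", color="g")
axes[0].legend()

# Several plot calls on the same Axes are layered, as with plt.subplot(122)
axes[1].plot(rng.integers(-20, 20, 20), label="no2", color="r")
axes[1].plot(rng.integers(-20, 20, 20), label="no3", color="b")
axes[1].legend()
```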

# Draw subplots; set the figure size
plt.figure(figsize=(12,4))

# Split the canvas into a grid of 2 rows x 1 column and draw in position 1 (top)
plt.subplot(2,1,1)
plt.plot(np.random.randint(-20,20,20),label='no1',color='g')
plt.legend()

# Same 2x1 grid, position 2 (bottom); the commas can be dropped
plt.subplot(212)
plt.plot(np.random.randint(-20,20,20),label='no2',color='r')
plt.plot(np.random.randint(-20,20,20),label='no3',color='b')
plt.legend()
# Show both subplots
plt.show()

[Figure: two stacked subplots]
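When stacked panels share the same x variable, `sharex=True` links their x ranges, and `plt.subplots` then creates x tick labels only on the bottom panel. A sketch, with a longer third series (an assumption for illustration) so the shared range is visible on both panels:

```python
import matplotlib
matplotlib.use("Agg")  # plain-script backend; omit in Jupyter with %matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)

# 2 rows x 1 column; sharex=True links the x-axes of the two panels
fig, (top, bottom) = plt.subplots(2, 1, figsize=(12, 4), sharex=True)
top.plot(rng.integers(-20, 20, 20), label="no1", color="g")
top.legend()
bottom.plot(rng.integers(-20, 20, 20), label="no2", color="r")
# 30 points instead of 20: the wider range applies to the top panel as well
bottom.plot(rng.integers(-20, 20, 30), label="no3", color="b")
bottom.legend()
```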

# Draw subplots; set the figure size
plt.figure(figsize=(12,8))

# Likewise, split the canvas into a 2x2 grid; the four positions are 221, 222, 223 and 224, numbered row by row

plt.subplot(2,2,1)
plt.plot(np.random.randint(-20,20,20),label='no1',color='g')



plt.subplot(222)
plt.plot(np.random.randint(-20,20,20),label='no2',color='r')
plt.plot(np.random.randint(-20,20,20),label='no3',color='b')
plt.legend()


plt.subplot(2,2,3)
plt.plot(np.random.randint(-20,20,20),label='no4',color='y')

plt.subplot(2,2,4)
plt.plot(np.random.randint(-20,20,20),label='no5',color='black')

# Show the figure
plt.show()

[Figure: 2x2 grid of line charts]
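For a regular 2x2 grid, looping over `axes.flat` avoids repeating one block per position; `flat` visits the subplots row by row, the same order as 221 to 224. A sketch with hypothetical titles naming each position:

```python
import matplotlib
matplotlib.use("Agg")  # plain-script backend; omit in Jupyter with %matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(4)

fig, axes = plt.subplots(2, 2, figsize=(12, 8))
# axes is a 2x2 array; axes.flat yields top-left, top-right, bottom-left, bottom-right
for i, ax in enumerate(axes.flat, start=1):
    ax.plot(rng.integers(-20, 20, 20), label=f"no{i}")
    ax.set_title(f"position 22{i}")
    ax.legend()
fig.tight_layout()  # reduce overlap between titles and tick labels
```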

# Draw subplots; set the figure size
plt.figure(figsize=(12,8))

# Likewise, split the canvas into a 2x2 grid; the four positions are 221, 222, 223 and 224, numbered row by row

plt.subplot(2,2,1)
plt.plot(np.random.randint(-20,20,20),label='no1',color='g')
plt.legend()


plt.subplot(222)
plt.plot(np.random.randint(-20,20,20),label='no2',color='r')
plt.plot(np.random.randint(-20,20,20),label='no3',color='b')
plt.legend()

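# The third plot is wide: ignore the 2x2 grid above and split the canvas independently into 2 rows x 1 column, drawing in position 2 (bottom)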
plt.subplot(212)
plt.plot(np.random.randint(-20,20,20),label='no4',color='y')


# Show the figure
plt.show()

[Figure: two line charts on top, one full-width line chart below]
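Mixing `subplot(221)`/`subplot(222)` with `subplot(212)` works because each call places an Axes independently of earlier grids, but the layout is easier to read when a single `GridSpec` describes it and a slice selects the wide cell. A sketch of the same layout with `fig.add_gridspec` (available in matplotlib 3.0 and later; seeded data is an assumption):

```python
import matplotlib
matplotlib.use("Agg")  # plain-script backend; omit in Jupyter with %matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(5)

fig = plt.figure(figsize=(12, 8))
gs = fig.add_gridspec(2, 2)        # one 2x2 grid describes the whole layout
ax1 = fig.add_subplot(gs[0, 0])    # top left
ax2 = fig.add_subplot(gs[0, 1])    # top right
ax3 = fig.add_subplot(gs[1, :])    # bottom row, spanning both columns

ax1.plot(rng.integers(-20, 20, 20), label="no1", color="g")
ax2.plot(rng.integers(-20, 20, 20), label="no2", color="r")
ax2.plot(rng.integers(-20, 20, 20), label="no3", color="b")
ax3.plot(rng.integers(-20, 20, 20), label="no4", color="y")
for ax in (ax1, ax2, ax3):
    ax.legend()
```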

# Draw subplots; set the figure size
plt.figure(figsize=(12,8))

# Likewise, split the canvas into a 2x2 grid; the four positions are 221, 222, 223 and 224, numbered row by row

plt.subplot(2,2,1)
plt.plot(np.random.randint(-20,20,20),label='no1',color='g')



plt.subplot(223)
plt.plot(np.random.randint(-20,20,20),label='no2',color='r')
plt.plot(np.random.randint(-20,20,20),label='no3',color='b')
plt.legend()

# The third plot is tall: split the canvas independently into 1 row x 2 columns and draw in position 2 (right)
plt.subplot(122)
plt.plot(np.random.randint(-20,20,20),label='no4',color='y')


# Show the figure
plt.show()

[Figure: two line charts stacked on the left, one full-height line chart on the right]
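The mirrored layout (two cells on the left, one tall cell on the right) can be written as a picture of the grid with `plt.subplot_mosaic`, where repeating a name makes that `Axes` span several cells. A sketch, assuming matplotlib 3.3 or later and hypothetical panel names:

```python
import matplotlib
matplotlib.use("Agg")  # plain-script backend; omit in Jupyter with %matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(6)

# "right" appears in both rows, so that Axes spans the whole right column
fig, axd = plt.subplot_mosaic(
    [["top_left", "right"],
     ["bottom_left", "right"]],
    figsize=(12, 8),
)
axd["top_left"].plot(rng.integers(-20, 20, 20), label="no1", color="g")
axd["bottom_left"].plot(rng.integers(-20, 20, 20), label="no2", color="r")
axd["bottom_left"].plot(rng.integers(-20, 20, 20), label="no3", color="b")
axd["bottom_left"].legend()
axd["right"].plot(rng.integers(-20, 20, 20), label="no4", color="y")
```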

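plt.figure(figsize=(12,4))
# Left subplot: a line chart
plt.subplot(121)
plt.plot(np.random.randint(-20,20,20),label='no1',color='g')

# Right subplot: select it once, before the loop, so every scatter lands on it
plt.subplot(122)
# a is the group key (one value of 'education'); b is the sub-DataFrame holding that group's rows
for a,b in data.groupby('education'):
    #print(b)
    # Take this group's mean and count columns
    x=b['mean']
    y=b['count']
    # Overlay one scatter series per group; label=a adds its legend entry
    plt.scatter(x,y,label=a)
# Show the legend in the upper-left corner of the current (right) subplot
plt.legend(loc='upper left')
# Name the axes of the right subplot: average salary, number of postings
plt.xlabel('平均薪资')
plt.ylabel('职位总数')
# Display the figure
plt.show()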


[Figure: line chart on the left, salary vs. posting-count scatter by education level on the right]
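The last example selects the right subplot through pyplot's current-axes state. A sketch of the same figure with explicit `Axes` objects, using hypothetical toy rows in place of `data`, so the loop and the axis labels cannot land on the wrong subplot:

```python
import matplotlib
matplotlib.use("Agg")  # plain-script backend; omit in Jupyter with %matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
# Hypothetical aggregated rows shaped like `data` (toy values)
data = pd.DataFrame({
    "education": ["Bachelor", "Bachelor", "Master", "PhD"],
    "city": ["Shanghai", "Beijing", "Beijing", "Beijing"],
    "mean": [14.0, 15.5, 18.0, 30.0],
    "count": [68, 210, 40, 3],
})

fig, (ax_line, ax_scatter) = plt.subplots(1, 2, figsize=(12, 4))
ax_line.plot(rng.integers(-20, 20, 20), label="no1", color="g")

# The target Axes is fixed outside the loop; one scatter series per education level
for level, group in data.groupby("education"):
    ax_scatter.scatter(group["mean"], group["count"], label=level)
ax_scatter.legend(loc="upper left")
ax_scatter.set_xlabel("Average salary")
ax_scatter.set_ylabel("Number of postings")
```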
