Analyzing an Excel Spreadsheet with Python, pandas, and matplotlib

liangchen_first

Last modified: 2022-08-06 19:15:32

First published: 2022-08-06 18:48:22

Article link: https://blog.csdn.net/weixin_42453317/article/details/126196141

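The Excel spreadsheet to analyze is the list of award winners for Problem E of the 2021 Huawei Cup mathematical modeling contest. The goal is to work out how each award level is distributed across schools. Part of the spreadsheet is shown here.
[Image: excerpt of the spreadsheet]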

Step 1: Read the Excel spreadsheet

sheet = pd.read_excel("2021E.xls", sheet_name=0)

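This function relies on the xlrd package, which pandas imports internally to read .xls files, so xlrd must be installed. The first argument is the file path; I used a relative path, but an absolute path works too. A few other important parameters: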

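Parameter	Meaning
sheet_name	Which sheet to read; defaults to sheet 0 (the first sheet)
header	Which row to use as the column names; defaults to row 0
names	Custom column names, e.g. names=['xxx', 'xxx', 'xxx']
index_col	Which column to use as the row index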

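Next, check the shape and type of sheet. The two print statements that follow output:
(2704, 10)
<class 'pandas.core.frame.DataFrame'>

So the sheet holds 2704 data rows and 10 columns, and it is loaded as a DataFrame object.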

print(sheet.shape)
print(type(sheet))

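Step 2: Split the data

The first, second, and third prizes need to be separated out. In the code, '奖项' is the award column, and '一等奖', '二等奖', and '三等奖' are the values for first, second, and third prize.
Select the first-prize rows. The printed shape shows there are 31 of them.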

sheet1 = sheet.loc[sheet['奖项'] == '一等奖']
print(sheet1.shape)

Select the second-prize rows: there are 336.

sheet2 = sheet.loc[sheet['奖项'] == '二等奖']
print(sheet2.shape)

Select the third-prize rows: there are 537.

sheet3 = sheet.loc[sheet['奖项'] == '三等奖']
print(sheet3.shape)

Select all prize winners: there are 904 (31 + 336 + 537).

sheet4 = sheet.loc[(sheet['奖项'] == '一等奖')|(sheet['奖项'] == '二等奖')|(sheet['奖项'] == '三等奖')]
print(sheet4.shape)

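Note: loc indexes by row and column labels (and also accepts a boolean mask, as used here), while iloc indexes by integer position.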
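
To make the loc/iloc distinction concrete, here is a minimal sketch on a made-up five-row table. The rows, the index labels t1 to t5, the school names, and the non-prize value '其他' are all hypothetical stand-ins rather than data from the real file. It also shows isin, which should select the same rows as chaining several == comparisons with |.

```python
import pandas as pd

# Hypothetical miniature of the award table; the real file has 2704 rows.
df = pd.DataFrame(
    {
        "奖项": ["一等奖", "二等奖", "三等奖", "其他", "二等奖"],
        "队长所在单位": ["School A", "School B", "School A", "School C", "School B"],
    },
    index=["t1", "t2", "t3", "t4", "t5"],
)

# loc with a boolean mask keeps the rows where the condition is True.
second = df.loc[df["奖项"] == "二等奖"]

# isin is a shorter way to write several == comparisons joined with |.
winners = df.loc[df["奖项"].isin(["一等奖", "二等奖", "三等奖"])]

# loc selects by label, iloc by integer position; both address the same cell here.
by_label = df.loc["t3", "队长所在单位"]
by_position = df.iloc[2, 1]

print(second.shape, winners.shape, by_label, by_position)
```

With the real data, the isin form would be a drop-in alternative for the sheet4 selection, assuming the award column contains exactly those three strings for prize winners.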

Step 3: Count the data

output1 = sheet1['队长所在单位'].value_counts()
print(type(output1))

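value_counts is the pandas function for counting how often each value occurs. It is available on both Series and DataFrame. Here sheet1['队长所在单位'] (the team leader's institution column) is a Series, and the result output1 is also a Series, indexed by school name and sorted by count in descending order.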
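
Since the stated goal is the distribution of every award level across schools, one possible shortcut is pd.crosstab, which counts all (school, award) pairs in a single call. This is a sketch on hypothetical rows, not the approach used in the original post; the school names are invented.

```python
import pandas as pd

# Hypothetical rows standing in for the real award list.
df = pd.DataFrame({
    "奖项": ["一等奖", "二等奖", "二等奖", "三等奖", "二等奖", "三等奖"],
    "队长所在单位": ["School A", "School B", "School A", "School B", "School B", "School C"],
})

# value_counts on one column: second prizes per school, sorted in descending order.
second_counts = df.loc[df["奖项"] == "二等奖", "队长所在单位"].value_counts()

# crosstab counts every (school, award) pair: rows are schools, columns are awards.
distribution = pd.crosstab(df["队长所在单位"], df["奖项"])

print(second_counts)
print(distribution)
```

Each column of distribution holds the same counts that value_counts would give for that award level, with 0 filled in for schools that did not receive it.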

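Step 4: Plot with matplotlib

The program below plots the ten schools with the most second prizes and labels each bar with its count. Here output2 is the result of sheet2['队长所在单位'].value_counts(), computed the same way as output1 in Step 3.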

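# Keep the minus sign on axis ticks from rendering as a garbled glyph
plt.rcParams['axes.unicode_minus'] = False
# Use a font with Chinese glyphs so Chinese labels render correctly
plt.rcParams['font.sans-serif'] = ['SimHei']

# value_counts() sorts in descending order, so the first ten entries are the top ten
x = output2.index[:10]
y = output2.values[:10]
for i in range(len(x)):
    plt.bar(x[i], y[i])
    # Write the count just above the top of each bar
    plt.text(x[i], y[i], str(y[i]), ha="center", va="bottom")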
plt.title("二等奖获奖情况")
plt.xlabel("大学")
plt.ylabel("数量")
plt.show()
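
The same chart can also be drawn with a single bar call and wrapped in a reusable function. The sketch below is one way to do it, not the original author's code: plot_top_counts, its parameters, the English axis labels, and the sample counts are all hypothetical, and Axes.bar_label needs matplotlib 3.4 or newer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_top_counts(counts, title, top_n=10):
    """Draw the first top_n entries of a value_counts() Series as a labelled bar chart."""
    top = counts.head(top_n)  # value_counts() output is already sorted in descending order
    fig, ax = plt.subplots()
    bars = ax.bar(list(top.index), top.values)
    ax.bar_label(bars)  # writes each bar's height above it (matplotlib 3.4 or newer)
    ax.set_title(title)
    ax.set_xlabel("School")
    ax.set_ylabel("Count")
    ax.tick_params(axis="x", labelrotation=45)  # long school names overlap less when rotated
    return fig, ax


# Hypothetical counts; with the real data this would be output2.
counts = pd.Series([5, 3, 2], index=["School A", "School B", "School C"])
fig, ax = plot_top_counts(counts, "Second prizes by school", top_n=2)
fig.savefig("top_schools.png")
```

Using head(top_n) instead of a fixed range(10) means the function still works when fewer than ten schools appear in the data.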

[Image: bar chart of the ten schools with the most second prizes]

Complete Python code

import pandas as pd
import matplotlib.pyplot as plt

sheet = pd.read_excel("2021E.xls", sheet_name=0)
print(sheet.shape)
print(type(sheet))

sheet1 = sheet.loc[sheet['奖项'] == '一等奖']
print(sheet1.shape)


sheet2 = sheet.loc[sheet['奖项'] == '二等奖']
print(sheet2.shape)

sheet3 = sheet.loc[sheet['奖项'] == '三等奖']
print(sheet3.shape)

sheet4 = sheet.loc[(sheet['奖项'] == '一等奖')|(sheet['奖项'] == '二等奖')|(sheet['奖项'] == '三等奖')]
print(sheet4.shape)



output2 = sheet2['队长所在单位'].value_counts()
print(output2)
print(type(output2))

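# Keep the minus sign on axis ticks from rendering as a garbled glyph
plt.rcParams['axes.unicode_minus'] = False
# Use a font with Chinese glyphs so Chinese labels render correctly
plt.rcParams['font.sans-serif'] = ['SimHei']

# value_counts() sorts in descending order, so the first ten entries are the top ten
x = output2.index[:10]
y = output2.values[:10]
for i in range(len(x)):
    plt.bar(x[i], y[i])
    # Write the count just above the top of each bar
    plt.text(x[i], y[i], str(y[i]), ha="center", va="bottom")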


plt.title("二等奖获奖情况")
plt.xlabel("大学")
plt.ylabel("数量")
plt.show()