Chapter 4. Advanced Pandas: Grouped Statistics

归途^ω^

Column: Python数据分析从入门到实践--明日科技
Tags: pandas, python, data analysis

Article link: https://blog.csdn.net/weixin_45116749/article/details/127900066

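Chapter 4. Advanced Pandas

4.3 Grouped Statistics

1. Grouped statistics (the groupby function)

1). What it does:

Splits the data into groups according to the given criteria
Applies a function (sum, mean, min, and so on) to each group independently
Combines the results into a single data structure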
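
The three steps listed above can be spelled out by hand and compared with a single groupby call. This is a minimal sketch on made-up data; the column names `category` and `price` are assumptions for illustration and do not come from the book-purchase spreadsheet used later.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "category": ["A", "B", "A", "B", "A"],
    "price": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Split: collect the rows of each category into separate pieces.
pieces = {key: part for key, part in df.groupby("category")}

# Apply: run a function on each piece independently.
sums = {key: part["price"].sum() for key, part in pieces.items()}

# Combine: gather the per-group results into a single Series.
manual = pd.Series(sums, name="price").rename_axis("category")

# groupby performs the same three steps in one expression.
auto = df.groupby("category")["price"].sum()

print(manual)
print(auto)
```

Both Series should hold the same per-category totals; the one-line form is what the rest of this section uses.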

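2). Syntax:

DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, **kwargs)

Parameters:
by: a mapping, dict, Series, array, label, or list of labels that determines the groups
axis: 0 splits along rows (the index), 1 splits along columns; default 0
level: for a MultiIndex, the index level(s) to group by
as_index: if True (default), the aggregated result uses the group labels as its index
sort: if True (default), sort the group keys

Note: this signature matches older pandas releases. squeeze was removed in pandas 2.0, and axis has been deprecated since pandas 2.1.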
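
To make the effect of `as_index` and `sort` concrete, here is a small sketch on made-up data (the column names `category` and `price` are assumptions, not taken from the spreadsheet). It groups the same frame three ways so the outputs can be compared side by side.

```python
import pandas as pd

# Hypothetical sample data; "B" deliberately appears before "A".
df = pd.DataFrame({
    "category": ["B", "A", "B", "A"],
    "price": [20.0, 10.0, 40.0, 30.0],
})

# Default: group labels become the index, and groups are sorted by key.
by_index = df.groupby("category").sum()

# as_index=False keeps the group labels as an ordinary column.
as_column = df.groupby("category", as_index=False).sum()

# sort=False keeps the groups in order of first appearance.
unsorted = df.groupby("category", sort=False).sum()

print(by_index)
print(as_column)
print(unsorted)
```

`as_index=False` is convenient when the result is going to be merged or exported, since the keys stay available as a regular column.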

3). Example:

import pandas as pd

# Fix misalignment between column headers and data in the printed output
pd.set_option('display.unicode.ambiguous_as_wide', True)
# The misalignment arises mainly because the column headers are Chinese
pd.set_option('display.unicode.east_asian_width', True)
df = pd.read_excel('F:\\Note\\图书采购清单.xlsx')
print(df)

# Group by a single column and sum
df1 = df[['类名', '折扣价']]
df2 = df1.groupby(['类名']).sum()
print(df2)
print('*' * 50)

# Group by multiple columns and sum
df1 = df[['类名', '折扣价', '入库日期']]
df2 = df1.groupby(['类名', '入库日期']).sum()
print(df2)

Results:

图书采购清单.xlsx
[Image: output screenshot]
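
Because the Excel file is not included here, the multi-column case can be tried on a small inline frame instead. This is a sketch under assumed data: the columns `category`, `date`, and `price` stand in for 类名, 入库日期, and 折扣价, and the values are invented. It shows that grouping by two keys yields a two-level index, how to look up one group, and how to flatten the result.

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet; values are made up.
df = pd.DataFrame({
    "category": ["A", "A", "B", "B", "A"],
    "date": ["2020-01-01", "2020-01-01", "2020-01-01",
             "2020-01-02", "2020-01-02"],
    "price": [10.0, 30.0, 20.0, 40.0, 50.0],
})

# Grouping by two keys produces a result indexed by (category, date).
totals = df.groupby(["category", "date"]).sum()
print(totals)

# A single group is addressed with a tuple of both keys.
one_total = totals.loc[("A", "2020-01-01"), "price"]
print(one_total)

# reset_index turns the two index levels back into ordinary columns.
flat = totals.reset_index()
print(flat)
```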

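2. Iterating over grouped data (the groupby function):

1). Example:

import pandas as pd

# Fix misalignment between column headers and data in the printed output
pd.set_option('display.unicode.ambiguous_as_wide', True)
# The misalignment arises mainly because the column headers are Chinese
pd.set_option('display.unicode.east_asian_width', True)
df = pd.read_excel('F:\\Note\\图书采购清单.xlsx')

# Select one column of the grouped data and sum it per group
df1 = df[['类名', '折扣价', '入库日期']]
df2 = df1.groupby('类名')['折扣价'].sum()
print(df2)
print('*' * 50)

# Iterate over the groups: each step yields the group key(s) and that group's rows
for (name1, name2), group in df1.groupby(['类名', '入库日期']):
    print(name1, name2)
    print(group)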

Results:
[Image: output screenshot]
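
Besides looping, a grouped object can also be inspected directly. The sketch below uses made-up data (the column names `category` and `price` are assumptions) to show iteration with a single key, the `groups` mapping, and `get_group` for pulling out one group without a loop.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "category": ["A", "B", "A", "B", "A"],
    "price": [10.0, 20.0, 30.0, 40.0, 50.0],
})

grouped = df.groupby("category")

# With a single string key, each iteration yields (key, sub-DataFrame).
names = []
for name, group in grouped:
    names.append(name)
    print(name, len(group))

# .groups maps each key to the row labels that belong to it.
print(grouped.groups)

# get_group returns the rows of one group directly.
a_rows = grouped.get_group("A")
print(a_rows)
```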

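3. Aggregating a column of the grouped data (groupby + agg):

1). Example:

import pandas as pd

# Fix misalignment between column headers and data in the printed output
pd.set_option('display.unicode.ambiguous_as_wide', True)
# The misalignment arises mainly because the column headers are Chinese
pd.set_option('display.unicode.east_asian_width', True)
df = pd.read_excel('F:\\Note\\图书采购清单.xlsx')

# Compute both the mean and the sum of 折扣价 for each 类名
df1 = df[['类名', '折扣价']]
df2 = df1.groupby(['类名']).agg(['mean', 'sum'])
print(df2)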