What the as_index parameter of pandas.DataFrame.groupby means

coasxu

Category: Python. Tags: pandas

Article link: https://blog.csdn.net/weixin_44633882/article/details/102659055

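This article looks at the as_index parameter of the DataFrame.groupby method in pandas. Using a small example, it compares as_index=True with as_index=False and shows how the setting changes the structure and the display of the grouped result.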

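Meaning: as_index decides whether the columns used for grouping become the index of the resulting table. The default is as_index=True; in my own code I usually write as_index=False.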

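Using the grouping keys as the index mainly affects how quickly rows can be looked up by key, and that is a need I rarely have.
as_index=True gives the commonly seen form of the result. With as_index=False the table itself changes (the keys stay as ordinary columns and the rows get a default integer index), and so does the way it is displayed. The parameter only matters for aggregated output such as mean() or sum().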

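Documentation: pandas.DataFrame.groupby (the signature below is quoted as it stood when this was written; later pandas releases removed some of these parameters, such as squeeze)

DataFrame.groupby(self, by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, observed=False, **kwargs)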
  Group DataFrame or Series using a mapper or by a Series of columns.
  A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
Parameters:
  sort : bool, default True
         Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.
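
The sort parameter quoted above can be illustrated with a minimal sketch. The column names key and value and the data are made up for this illustration and do not come from the pandas documentation.

```python
import pandas as pd

# Hypothetical toy data: the keys first appear in the order b, a, c.
toy = pd.DataFrame({"key": ["b", "a", "b", "c", "a"],
                    "value": [1, 2, 3, 4, 5]})

# Default sort=True: group keys in the result are sorted.
sorted_keys = toy.groupby("key").sum().index.tolist()

# sort=False: group keys keep their order of first appearance.
first_seen_keys = toy.groupby("key", sort=False).sum().index.tolist()

print(sorted_keys)      # ['a', 'b', 'c']
print(first_seen_keys)  # ['b', 'a', 'c']
```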

Following the Stack Overflow answer to "what-is-as-index-in-groupby-in-pandas", here is an example.
Create a table with three columns: group_id, age, and status.

import pandas as pd
test = {"group_id":[1,1,2,3,3,3,4,4],"age":[22,15,27,35,28,17,45,29],
        "status":[1,2,3,4,5,6,7,8]}
df = pd.DataFrame(test)
df

   group_id  age  status
0         1   22       1
1         1   15       2
2         2   27       3
3         3   35       4
4         3   28       5
5         3   17       6
6         4   45       7
7         4   29       8

df.groupby(['group_id']).mean()

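With as_index=True (the default), the result is a DataFrame indexed by group_id. As I understand it, the display separates the index name from the column names, which is why group_id is printed one row lower than age and status.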
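
As a rough check of the description above, the sketch below rebuilds the same table and inspects the result of the default call. The variable name by_index is only chosen for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "group_id": [1, 1, 2, 3, 3, 3, 4, 4],
    "age": [22, 15, 27, 35, 28, 17, 45, 29],
    "status": [1, 2, 3, 4, 5, 6, 7, 8],
})

# Default as_index=True: the grouping column becomes the index.
by_index = df.groupby(["group_id"]).mean()

print(by_index.index.name)     # group_id
print(list(by_index.columns))  # ['age', 'status']
print(by_index)                # group_id is shown as the index, not as a column
```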

df.groupby(['group_id'], as_index=False).mean()

With as_index=False, the resulting table does not use group_id as its index; group_id stays an ordinary column.
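
One way to picture as_index=False is as the default result followed by reset_index(). The sketch below compares the two on the same data; the names flat and via_reset are only for this illustration, and the comparison is a quick check rather than a statement about how pandas implements the option.

```python
import pandas as pd

df = pd.DataFrame({
    "group_id": [1, 1, 2, 3, 3, 3, 4, 4],
    "age": [22, 15, 27, 35, 28, 17, 45, 29],
    "status": [1, 2, 3, 4, 5, 6, 7, 8],
})

# as_index=False: group_id is kept as a regular column.
flat = df.groupby(["group_id"], as_index=False).mean()

# Default result, then move the index back into a column.
via_reset = df.groupby(["group_id"]).mean().reset_index()

print(list(flat.columns))       # ['group_id', 'age', 'status']
print(list(via_reset.columns))  # ['group_id', 'age', 'status']
print(flat)
```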

Addendum: grouping by two columns

df.groupby(['group_id','age']).mean()

With as_index=True, the result is a DataFrame indexed by both group_id and age, that is, by a two-level MultiIndex.
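
A small sketch of the two-column case, assuming the same table as above. Because every (group_id, age) pair occurs only once in this data, each group mean is just that row's status as a float. The name multi is only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "group_id": [1, 1, 2, 3, 3, 3, 4, 4],
    "age": [22, 15, 27, 35, 28, 17, 45, 29],
    "status": [1, 2, 3, 4, 5, 6, 7, 8],
})

# Both grouping columns become levels of a MultiIndex.
multi = df.groupby(["group_id", "age"]).mean()

print(list(multi.index.names))  # ['group_id', 'age']
print(list(multi.columns))      # ['status']

# A row is addressed by a (group_id, age) tuple.
print(multi.loc[(3, 28), "status"])  # 5.0
```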

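My understanding of why as_index=True is the default: the grouping columns become the index, which can make later lookups by key more direct and faster.
The same holds when two columns A and B are used together for grouping.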
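
To make the lookup point concrete, here is a sketch of how fetching one group's row differs between the two forms. It only shows the difference in access style and does not measure speed. The names by_index, flat, row_from_index and row_from_mask are chosen for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "group_id": [1, 1, 2, 3, 3, 3, 4, 4],
    "age": [22, 15, 27, 35, 28, 17, 45, 29],
    "status": [1, 2, 3, 4, 5, 6, 7, 8],
})

by_index = df.groupby("group_id").mean()
flat = df.groupby("group_id", as_index=False).mean()

# Keys in the index: label-based lookup with .loc returns a Series.
row_from_index = by_index.loc[3]

# Keys as a column: filter with a boolean mask, which returns a one-row DataFrame.
row_from_mask = flat[flat["group_id"] == 3]

print(row_from_index["status"])         # 5.0
print(row_from_mask["status"].iloc[0])  # 5.0
```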

References:

https://stackoverflow.com/questions/41236370/what-is-as-index-in-groupby-in-pandas
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html?highlight=groupby#pandas.DataFrame.groupby