On the as_index parameter (True vs. False) of groupby in pandas

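I ran into some trouble while working on an assignment. Looking at a classmate's code, I noticed that he had one extra argument, as_index=False, which keeps the grouping key as an ordinary column (so its name sits in the header row alongside the other column names) instead of turning it into the index. That made the later steps of the work possible. This description is rather abstract, so a concrete example follows.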

First, look at the groupby signature given in the official pandas documentation. The default is as_index=True:

groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, **kwargs)
The rest of this post is taken from https://stackoverflow.com/questions/41236370/what-is-as-index-in-groupby-in-pandas


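import pandas as pd

# Six rows: three books, each with a repeated price
df = pd.DataFrame(data={'books': ['bk1', 'bk1', 'bk1', 'bk2', 'bk2', 'bk3'], 'price': [12, 12, 12, 15, 15, 17]})
print(df)
print()
# as_index=True (the default): 'books' becomes the index of the result
print(df.groupby('books', as_index=True).sum())
print()
# as_index=False: 'books' stays a regular column and the index is 0, 1, 2
print(df.groupby('books', as_index=False).sum())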
Output:

Note where 'books' and 'price' appear in the two groupby outputs.

  books  price
0   bk1     12
1   bk1     12
2   bk1     12
3   bk2     15
4   bk2     15
5   bk3     17

       price
books       
bk1       36
bk2       30
bk3       17

  books  price
0   bk1     36
1   bk2     30
2   bk3     17

When as_index=True, the key(s) you use in groupby become the index of the new DataFrame.
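To make the difference concrete, here is a minimal sketch that reuses the example data above. The variable names `indexed` and `flat` are my own and are not from the original answer. It shows where the group keys end up in each case, and that for this simple example moving the index back into a column with reset_index() gives the same rows as as_index=False.

```python
import pandas as pd

df = pd.DataFrame({
    'books': ['bk1', 'bk1', 'bk1', 'bk2', 'bk2', 'bk3'],
    'price': [12, 12, 12, 15, 15, 17],
})

indexed = df.groupby('books', as_index=True).sum()   # keys become the index
flat = df.groupby('books', as_index=False).sum()     # keys stay a column

print(indexed.index.tolist())   # the group keys live in the index
print(flat.columns.tolist())    # 'books' is an ordinary column here

# Turning the index back into a column yields the same records as as_index=False
same_rows = indexed.reset_index().to_dict('records') == flat.to_dict('records')
print(same_rows)
```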

The benefit of as_index=True is that you can pull out the rows you want by key name. For example, if you want 'bk1', you can get it with df.loc['bk1'] (where df here stands for the grouped result), whereas with as_index=False you have to write df.loc[df.books=='bk1'].

The other main benefit of as_index=True, raised by @ayhan in the comments: df.loc['bk1'] is faster because, when 'books' is the index, it doesn't have to traverse the entire books column to find 'bk1'. It just computes the hash value of 'bk1' and finds it in one step.
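The two lookup styles can be sketched as follows, again on the example data. The names `by_index` and `by_column` are hypothetical and chosen only for illustration. This shows the syntax only and does not measure the speed difference.

```python
import pandas as pd

df = pd.DataFrame({
    'books': ['bk1', 'bk1', 'bk1', 'bk2', 'bk2', 'bk3'],
    'price': [12, 12, 12, 15, 15, 17],
})

by_index = df.groupby('books', as_index=True).sum()
by_column = df.groupby('books', as_index=False).sum()

# as_index=True: look the row up directly by its key in the index
row_from_index = by_index.loc['bk1']

# as_index=False: build a boolean mask over the 'books' column instead
rows_from_mask = by_column.loc[by_column['books'] == 'bk1']

print(row_from_index['price'])
print(rows_from_mask['price'].tolist())
```

Note that the label lookup returns a single row as a Series, while the boolean mask returns a DataFrame that may contain any number of matching rows.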








