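While working on an assignment I ran into some difficulties. Comparing my code with a classmate's, I noticed he passed one extra argument, as_index=False. It moves the group key's label up from the index into the column header row, next to the other columns, and that made the later steps possible. This description is rather abstract, so a concrete example follows below.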
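First, here is the groupby signature from the official pandas documentation. Note that the default is as_index=True (this is the signature from older pandas releases; the squeeze argument was removed in pandas 2.0):
DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, **kwargs)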
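A quick check of the default, as a minimal sketch assuming a recent pandas version. It reuses the same books data as the Stack Overflow example: leaving as_index out should give exactly the same result as passing as_index=True.

```python
import pandas as pd

df = pd.DataFrame({'books': ['bk1', 'bk1', 'bk1', 'bk2', 'bk2', 'bk3'],
                   'price': [12, 12, 12, 15, 15, 17]})

# Omitting as_index falls back to the default, as_index=True
default_result = df.groupby('books').sum()
explicit_result = df.groupby('books', as_index=True).sum()

print(default_result.equals(explicit_result))  # True
print(default_result.index.name)               # books
```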
The following part is taken from https://stackoverflow.com/questions/41236370/what-is-as-index-in-groupby-in-pandas
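import pandas as pd

df = pd.DataFrame(data={'books': ['bk1', 'bk1', 'bk1', 'bk2', 'bk2', 'bk3'],
                        'price': [12, 12, 12, 15, 15, 17]})
print(df)
print()
# as_index=True (the default): 'books' becomes the index of the result
print(df.groupby('books', as_index=True).sum())
print()
# as_index=False: 'books' stays an ordinary column and a new 0..n-1 index is created
print(df.groupby('books', as_index=False).sum())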
Output:
Note where 'books' and 'price' appear in the two grouped outputs.
  books  price
0   bk1     12
1   bk1     12
2   bk1     12
3   bk2     15
4   bk2     15
5   bk3     17

       price
books
bk1       36
bk2       30
bk3       17

  books  price
0   bk1     36
1   bk2     30
2   bk3     17
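For this particular aggregation, the as_index=False table should match what you get by grouping with the default and then calling reset_index() to move 'books' back into a column. The sketch below checks that. It is an illustration for this sum, not a claim that the two are interchangeable for every groupby operation.

```python
import pandas as pd

df = pd.DataFrame({'books': ['bk1', 'bk1', 'bk1', 'bk2', 'bk2', 'bk3'],
                   'price': [12, 12, 12, 15, 15, 17]})

# as_index=False: 'books' is a regular column, index is a fresh RangeIndex
flat = df.groupby('books', as_index=False).sum()
# Default grouping, then move the 'books' index back into a column
via_reset = df.groupby('books').sum().reset_index()

print(flat.equals(via_reset))  # True
print(list(flat.columns))      # ['books', 'price']
```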
When as_index=True, the key(s) you use in groupby become the index of the new DataFrame.
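With more than one key, as_index=True should produce a MultiIndex, while as_index=False keeps every key as a column. In this sketch the 'store' column is hypothetical, added only to have a second grouping key.

```python
import pandas as pd

# 'store' is a hypothetical column, added only to group by two keys
df = pd.DataFrame({'books': ['bk1', 'bk1', 'bk2', 'bk2'],
                   'store': ['s1', 's2', 's1', 's1'],
                   'price': [12, 12, 15, 15]})

indexed = df.groupby(['books', 'store']).sum()              # keys form a MultiIndex
flat = df.groupby(['books', 'store'], as_index=False).sum()  # keys stay columns

print(list(indexed.index.names))            # ['books', 'store']
print(list(flat.columns))                   # ['books', 'store', 'price']
print(indexed.loc[('bk2', 's1'), 'price'])  # 30
```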
The benefit of as_index=True is that you can pull out the rows you want by key name. For example, if df is the grouped result and you want 'bk1', you can get it with df.loc['bk1'], whereas with as_index=False you would have to write df.loc[df.books == 'bk1'].
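Here are the two lookups side by side on the grouped results, as a minimal sketch. The variable names by_index and by_column are chosen for illustration. The label lookup returns the row as a Series. The boolean mask returns a one-row DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'books': ['bk1', 'bk1', 'bk1', 'bk2', 'bk2', 'bk3'],
                   'price': [12, 12, 12, 15, 15, 17]})

by_index = df.groupby('books', as_index=True).sum()
by_column = df.groupby('books', as_index=False).sum()

# as_index=True: 'bk1' is an index label, so .loc finds the row directly
row_from_index = by_index.loc['bk1']  # a Series holding the 'bk1' row
# as_index=False: 'books' is a plain column, so filter it with a boolean mask
rows_from_mask = by_column.loc[by_column['books'] == 'bk1']  # a one-row DataFrame

print(row_from_index['price'])         # 36
print(rows_from_mask['price'].item())  # 36
```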
Another main benefit of as_index=True, raised by @ayhan in the comments, is that df.loc['bk1'] is faster. When books is the index, pandas does not have to traverse the entire books column to find 'bk1'. It just calculates the hash value of 'bk1' and finds it in one go.
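The following is a rough timing sketch of that claim, assuming NumPy is installed. The 100,000 synthetic book keys are made up for illustration. Timings depend on the machine and pandas version, so no expected numbers are given. pandas typically builds an index's hash table lazily, so the sketch does one label lookup before timing.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data, made up for this sketch: 100,000 unique book keys
n = 100_000
by_column = pd.DataFrame({'books': [f'bk{i}' for i in range(n)],
                          'price': np.arange(n)})
by_index = by_column.set_index('books')  # same data, 'books' as the index
target = f'bk{n - 1}'

# Warm-up: the first label lookup may build the index's internal hash table
by_index.loc[target, 'price']

# Label lookup through the index vs. a boolean mask over the whole column
t_index = timeit.timeit(lambda: by_index.loc[target, 'price'], number=20)
t_mask = timeit.timeit(
    lambda: by_column.loc[by_column['books'] == target, 'price'], number=20)

print(f'index lookup: {t_index:.4f} s, boolean mask: {t_mask:.4f} s')
```

The mask version compares every element of the books column on each call, while the index version resolves a single label, which is where the speed difference described above comes from.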