Pandas groupby

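We often need to aggregate over subsets of the data that share a label or index value, and that is what the groupby function is for.
The as_index parameter of groupby is documented as follows: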

as_index : boolean, default True

For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output

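In other words, as_index defaults to True, in which case the aggregated result uses the group labels as its index. With as_index=False the group labels stay in an ordinary column, as in the result of a SQL GROUP BY. The parameter only matters for DataFrame input.
The following example shows the difference: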

import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'D'],
                   'data': range(6)}, columns=['key', 'data'])
df

Output:

  key  data
0   A     0
1   B     1
2   C     2
3   A     3
4   B     4
5   D     5

1. Summing the data values for each key

a = df.groupby('key').sum()
a

Output:

     data
key
A       3
B       5
C       2
D       5

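With as_index=True (the default), the result has no integer row index. The group labels from the key column become the index instead, so a.loc[0] fails while a.loc['A'] works: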

a.loc[0]

Output (from the pandas version used here; recent releases raise KeyError: 0 for the same lookup):

TypeError                                 Traceback (most recent call last)
<ipython-input-48-1ba88c28627a> in <module>()
----> 1 a.loc[0]

~\Anaconda3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
   1371 
   1372             maybe_callable = com._apply_if_callable(key, self.obj)
-> 1373             return self._getitem_axis(maybe_callable, axis=axis)
   1374 
   1375     def _is_scalar_access(self, key):

~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
   1624 
   1625         # fall thru to straight lookup
-> 1626         self._has_valid_type(key, axis)
   1627         return self._get_label(key, axis=axis)
   1628 

~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _has_valid_type(self, key, axis)
   1502 
   1503             try:
-> 1504                 key = self._convert_scalar_indexer(key, axis)
   1505                 if not ax.contains(key):
   1506                     error()

~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _convert_scalar_indexer(self, key, axis)
    254         ax = self.obj._get_axis(min(axis, self.ndim - 1))
    255         # a scalar
--> 256         return ax._convert_scalar_indexer(key, kind=self.name)
    257 
    258     def _convert_slice_indexer(self, key, axis):

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in _convert_scalar_indexer(self, key, kind)
   1390             elif kind in ['loc'] and is_integer(key):
   1391                 if not self.holds_integer():
-> 1392                     return self._invalid_indexer('label', key)
   1393 
   1394         return key

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in _invalid_indexer(self, form, key)
   1574                         "indexers [{key}] of {kind}".format(
   1575                             form=form, klass=type(self), key=key,
-> 1576                             kind=type(key)))
   1577 
   1578     def get_duplicates(self):

TypeError: cannot do label indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [0] of <class 'int'>

a.loc['A'], on the other hand, works:

a.loc['A']

Output:

data    3
Name: A, dtype: int64
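
If a row of the label-indexed result is needed by position rather than by label, .iloc can be used instead of .loc. The following is a minimal sketch that rebuilds the same df and a as above; the names by_label and by_position are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'D'],
                   'data': range(6)})

# Default as_index=True: the group labels become the index.
a = df.groupby('key').sum()

# .loc selects by label, so it needs one of the group labels.
by_label = a.loc['A', 'data']

# .iloc selects by position, whatever the index contains.
by_position = a.iloc[0, 0]

print(by_label, by_position)
```

Both lookups address the same cell here, because 'A' is the first group in the sorted result.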

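With as_index=False, the result keeps a default integer index and key remains a regular column, so b.loc[0] returns a row. In short, as_index controls whether the aggregated output uses the group labels as its index.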

b = df.groupby('key', as_index=False).sum()
b

Output:

  key  data
0   A     3
1   B     5
2   C     2
3   D     5

b.loc[0] now works as well:

b.loc[0]

Output:

key     A
data    3
Name: 0, dtype: object
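
The two forms are closely related: calling reset_index() on the label-indexed result moves the group labels back into a column, which should give the same table as as_index=False. The sketch below checks this on the example data; a_reset is an illustrative name, not something from the original post.

```python
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'D'],
                   'data': range(6)})

a = df.groupby('key').sum()                   # group labels as the index
b = df.groupby('key', as_index=False).sum()   # group labels as a column

# Move the 'key' index of a back into an ordinary column.
a_reset = a.reset_index()

print(a_reset)
print(b)
```

Which form to prefer is a matter of taste: as_index=False is shorter, while reset_index() is handy when the grouped result already exists.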

2. Counting the rows for each key

c = df.groupby('key', as_index=False).count()
c

Output:

  key  data
0   A     2
1   B     2
2   C     1
3   D     1

Rename the data column to count:

c.columns = ['key', 'count']
c

Output:

  key  count
0   A      2
1   B      2
2   C      1
3   D      1
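
As an alternative to renaming the columns afterwards, the count column can be named in the aggregation step itself. This is a sketch using named aggregation, which assumes pandas 0.25 or later; the variable name c2 is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'D'],
                   'data': range(6)})

# Named aggregation: output column 'count' = count of non-null 'data' per key.
c2 = df.groupby('key', as_index=False).agg(count=('data', 'count'))

print(c2)
```

This avoids assigning to c.columns, which depends on the columns being in the expected order.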