Pandas groupby

AndrewTeng

Categories: Python, Pandas. Tags: data processing

Article link: https://blog.csdn.net/qq_30982323/article/details/97935138

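We often need to aggregate data over subsets defined by a label or index, which is what the groupby function is for.
The documentation describes its as_index parameter as follows: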

as_index : boolean, default True

For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output

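In other words, as_index defaults to True: for aggregated output, the returned object uses the group labels as its index. The parameter only matters for DataFrame input. With as_index=False the result is effectively "SQL-style" grouped output, where the group labels stay in an ordinary column.
The example below makes this concrete: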

import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'D'],
                   'data': range(6)}, columns=['key', 'data'])
df

Output:

  key  data
0   A     0
1   B     1
2   C     2
3   A     3
4   B     4
5   D     5

1. Sum the data values for each key

a = df.groupby('key').sum()
a

Output:

     data
key
A       3
B       5
C       2
D       5

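With as_index=True (the default), the result has no integer row index; the group labels from the key column become the index instead. As a result, a.loc[0] fails, while a.loc['A'] works.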

a.loc[0]

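Output (from the pandas version in use when this was written; newer pandas releases raise KeyError: 0 here instead):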

TypeError                                 Traceback (most recent call last)
<ipython-input-48-1ba88c28627a> in <module>()
----> 1 a.loc[0]

~\Anaconda3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
   1371 
   1372             maybe_callable = com._apply_if_callable(key, self.obj)
-> 1373             return self._getitem_axis(maybe_callable, axis=axis)
   1374 
   1375     def _is_scalar_access(self, key):

~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
   1624 
   1625         # fall thru to straight lookup
-> 1626         self._has_valid_type(key, axis)
   1627         return self._get_label(key, axis=axis)
   1628 

~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _has_valid_type(self, key, axis)
   1502 
   1503             try:
-> 1504                 key = self._convert_scalar_indexer(key, axis)
   1505                 if not ax.contains(key):
   1506                     error()

~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _convert_scalar_indexer(self, key, axis)
    254         ax = self.obj._get_axis(min(axis, self.ndim - 1))
    255         # a scalar
--> 256         return ax._convert_scalar_indexer(key, kind=self.name)
    257 
    258     def _convert_slice_indexer(self, key, axis):

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in _convert_scalar_indexer(self, key, kind)
   1390             elif kind in ['loc'] and is_integer(key):
   1391                 if not self.holds_integer():
-> 1392                     return self._invalid_indexer('label', key)
   1393 
   1394         return key

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in _invalid_indexer(self, form, key)
   1574                         "indexers [{key}] of {kind}".format(
   1575                             form=form, klass=type(self), key=key,
-> 1576                             kind=type(key)))
   1577 
   1578     def get_duplicates(self):

TypeError: cannot do label indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [0] of <class 'int'>

a.loc['A'], on the other hand, works:

a.loc['A']

Output:

data    3
Name: A, dtype: int64
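
If positional access is still wanted on a label-indexed result, iloc is the usual tool. The snippet below is a minimal sketch that rebuilds the same df as above and assumes the default group ordering (groups sorted by key), so position 0 corresponds to 'A'.

```python
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'D'],
                   'data': range(6)})

# Group labels become the index (as_index=True is the default).
a = df.groupby('key').sum()

# Label-based lookup uses the group label.
by_label = a.loc['A']

# Position-based lookup uses iloc; position 0 is the first group, 'A'.
by_position = a.iloc[0]

print(by_label['data'], by_position['data'])  # 3 3
```

Both lookups return the same row, so the choice between loc and iloc only depends on whether the label or the position is known.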

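With as_index=False, the result keeps a default integer index and the group labels appear as an ordinary column, so b.loc[0] returns a row. In short, as_index controls whether the aggregated output uses the group labels as its index.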

b = df.groupby('key', as_index=False).sum()
b

Output:

  key  data
0   A     3
1   B     5
2   C     2
3   D     5

b.loc[0] now works:

b.loc[0]

Output:

key     A
data    3
Name: 0, dtype: object
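
One way to see what as_index=False does is to compare it with calling reset_index() on the default, label-indexed result. The sketch below assumes the same df as above; for this simple sum, the two frames should come out the same.

```python
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'D'],
                   'data': range(6)})

# Grouped sum with the group labels kept as a regular column.
b = df.groupby('key', as_index=False).sum()

# Same result, starting from the label-indexed output and moving
# the index back into a column.
b2 = df.groupby('key').sum().reset_index()

print(b.equals(b2))  # True
```

as_index=False is simply the shorter spelling when the flat, SQL-style layout is what is wanted from the start.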

2. Count the number of rows for each key

c = df.groupby('key', as_index=False).count()
c

Output:

  key  data
0   A     2
1   B     2
2   C     1
3   D     1

Rename the data column to count:

c.columns = ['key', 'count']
c

Output:

  key  count
0   A      2
1   B      2
2   C      1
3   D      1
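
The counting and renaming can also be done in one step. The sketch below is an alternative, not the approach used above: it relies on named aggregation (available in reasonably recent pandas versions) and, as a second option, on size(). Note that count() counts non-null values in each column, whereas size() counts rows; with no missing values in df, both give the same numbers.

```python
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'D'],
                   'data': range(6)})

# Named aggregation: the output column 'count' holds the number of
# non-null 'data' values in each group.
c = df.groupby('key', as_index=False).agg(count=('data', 'count'))

# size() counts rows per group (missing values included);
# reset_index(name=...) turns the resulting Series into a DataFrame.
s = df.groupby('key').size().reset_index(name='count')

print(c['count'].tolist())  # [2, 2, 1, 1]
print(s['count'].tolist())  # [2, 2, 1, 1]
```

Either form avoids assigning to c.columns, which has to list every column in the right order.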
