Python pandas groupby object: apply method duplicates the first group

Latest recommended article published on 2023-11-11 19:58:26

CD-小C


My first SO question:

I am confused about the behavior of the apply method of groupby in pandas (0.12.0-4): it appears to apply the function TWICE to the first row of a data frame. For example:

>>> from pandas import Series, DataFrame
>>> import pandas as pd
>>> df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count':[1,0,2]})
>>> print(df)
  class  count
0     A      1
1     B      0
2     C      2

I first check that the groupby function works ok, and it seems to be fine:

>>> for group in df.groupby('class', group_keys = True):
...     print(group)
...
('A',   class  count
0     A      1)
('B',   class  count
1     B      0)
('C',   class  count
2     C      2)

Then I try to do something similar using apply on the groupby object and I get the first row output twice:

>>> def checkit(group):
...     print(group)
...
>>> df.groupby('class', group_keys = True).apply(checkit)
  class  count
0     A      1
  class  count
0     A      1
  class  count
1     B      0
  class  count
2     C      2

Any help would be appreciated! Thanks.

Edit: @Jeff provides the answer below. I am dense and did not understand it immediately, so here is a simple example to show that despite the double printout of the first group in the example above, the apply method operates only once on the first group and does not mutate the original data frame:

>>> def addone(group):
...     group['count'] += 1
...     return group
...
>>> df.groupby('class', group_keys = True).apply(addone)
>>> print(df)
  class  count
0     A      1
1     B      0
2     C      2

But by assigning the return of the method to a new object, we see that it works as expected:

>>> df2 = df.groupby('class', group_keys = True).apply(addone)
>>> print(df2)
  class  count
0     A      2
1     B      1
2     C      3

Solution

This is by design, as described here and here

The apply function needs to know the shape of the returned data to figure out how the results will be combined. To do this, it calls the function (checkit in your case) twice on the first group.
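One way to see how often the function really runs is to record each call instead of printing. The snippet below is only a sketch: the `seen` list is a hypothetical helper that is not part of the question, and the groupby selects `[['count']]` so that each group holds just the non-grouping column. As far as I know, pandas 0.25 and later evaluate the first group only once, so the doubled call may not show up on a recent install.

```python
import pandas as pd

df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count': [1, 0, 2]})

# Hypothetical helper: collects the row labels of every group the function receives.
seen = []

def checkit(group):
    # Side effect: remember which rows were passed in on this call.
    seen.append(list(group.index))
    # Return a scalar so apply has a simple result to combine.
    return len(group)

result = df.groupby('class')[['count']].apply(checkit)

# On an old pandas such as 0.12 the first group's rows would be expected
# to be listed twice; on newer versions each group is listed once.
print(seen)
```

If the first entry of `seen` is repeated, the function was invoked twice on the first group, even though the combined result contains it only once.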

Depending on your actual use case, you can replace the call to apply with aggregate, transform or filter, as described in detail here. These functions require the return value to be a particular shape, and so don't call the function twice.
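As a rough illustration of those alternatives on the data frame from the question (the lambdas and the threshold are made up for this sketch, not taken from the original post):

```python
import pandas as pd

df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count': [1, 0, 2]})

grouped = df.groupby('class')

# transform: returns a result aligned with the original rows
# (here the same effect as addone in the question).
plus_one = grouped['count'].transform(lambda s: s + 1)

# aggregate: returns one value per group.
totals = grouped['count'].agg('sum')

# filter: keeps or drops whole groups based on a boolean condition.
non_empty = grouped.filter(lambda g: g['count'].sum() > 0)

print(plus_one)
print(totals)
print(non_empty)
```

Which of the three fits depends on whether the result should have one row per original row, one row per group, or a subset of the original rows.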

However - if the function you are calling does not have side-effects, it most likely does not matter that the function is being called twice on the first value.
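If you want to be sure the function has no side-effects, one option is to work on a copy of each group. This is a sketch of that idea, not something from the linked discussion; it uses `group_keys=False` and selects `[['count']]` only to keep the result shape simple.

```python
import pandas as pd

df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count': [1, 0, 2]})

def addone(group):
    # Work on a copy so repeated calls on the same group cannot change
    # anything outside this function.
    group = group.copy()
    group['count'] += 1
    return group

df2 = df.groupby('class', group_keys=False)[['count']].apply(addone)

print(df)   # the original frame is left untouched
print(df2)  # the incremented counts live in the new object
```

Because the function only touches its own copy, an extra call on the first group produces the same return value and leaves `df` unchanged.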
