Python pandas: groupby object's apply method duplicates the first group

My first SO question:

I am confused about the behavior of the apply method of groupby in pandas (0.12.0-4): it appears to apply the function TWICE to the first row of a data frame. For example:

>>> from pandas import Series, DataFrame
>>> import pandas as pd
>>> df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count': [1, 0, 2]})
>>> print(df)
  class  count
0     A      1
1     B      0
2     C      2

I first check that the groupby function works ok, and it seems to be fine:

>>> for group in df.groupby('class', group_keys=True):
...     print(group)
...
('A',   class  count
0     A      1)
('B',   class  count
1     B      0)
('C',   class  count
2     C      2)

Then I try to do something similar using apply on the groupby object and I get the first row output twice:

>>> def checkit(group):
...     print(group)
...
>>> df.groupby('class', group_keys=True).apply(checkit)
  class  count
0     A      1
  class  count
0     A      1
  class  count
1     B      0
  class  count
2     C      2

Any help would be appreciated! Thanks.

Edit: @Jeff provides the answer below. I am dense and did not understand it immediately, so here is a simple example to show that, despite the double printout of the first group in the example above, the first group's result appears only once in the output of apply, and the original data frame is not mutated:

>>> def addone(group):
...     group['count'] += 1
...     return group
...
>>> df.groupby('class', group_keys=True).apply(addone)
>>> print(df)
  class  count
0     A      1
1     B      0
2     C      2

But by assigning the return of the method to a new object, we see that it works as expected:

>>> df2 = df.groupby('class', group_keys=True).apply(addone)
>>> print(df2)
  class  count
0     A      2
1     B      1
2     C      3
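One way to make it explicit that the original frame stays untouched is to have the function build a new frame instead of modifying the group in place. The following is only a minimal sketch under that assumption; the helper name `addone_copy` is hypothetical and not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count': [1, 0, 2]})

def addone_copy(group):
    # assign() returns a new DataFrame, so the group passed in is not modified
    return group.assign(count=group['count'] + 1)

# The incremented values live only in the object returned by apply
df2 = df.groupby('class', group_keys=True).apply(addone_copy)

print(df['count'].tolist())   # original counts
print(df2['count'].tolist())  # counts after adding one per group
```

Depending on the pandas version, the index of `df2` may carry the group key as an extra level, which is why the sketch compares column values rather than whole frames.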

Solution

This is by design, as described here and here

The apply function needs to know the shape of the returned data to figure out how the results will be combined. To do this, it calls the function (checkit in your case) twice on the first group.

Depending on your actual use case, you can replace the call to apply with aggregate, transform or filter, as described in detail here. These functions require the return value to be a particular shape, and so don't call the function twice.
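As a rough illustration of those three alternatives, here is a small sketch using the question's data frame. The variable names (`totals`, `shifted`, `nonzero`) and the specific lambdas are illustrative assumptions, not something taken from the question.

```python
import pandas as pd

df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count': [1, 0, 2]})
counts = df.groupby('class')['count']

# aggregate: the function must reduce each group to a single value
totals = counts.agg('sum')

# transform: the function must return values aligned with the input rows
shifted = counts.transform(lambda s: s + 1)

# filter: the function must return one boolean per group; groups that
# evaluate to False are dropped from the result
nonzero = df.groupby('class').filter(lambda g: g['count'].sum() > 0)

print(totals)
print(shifted)
print(nonzero)
```

Which of the three fits depends on whether you want one row per group, one row per input row, or a subset of the original rows.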

However, if the function you are calling has no side effects, it most likely does not matter that it is called twice on the first group.
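To see where a side effect would matter, the sketch below records every call in a list. The names `seen` and `record` are hypothetical. The number of calls is printed rather than stated here, because it depends on the pandas version: with the version in the question the first group is passed in twice, and other releases may behave differently.

```python
import pandas as pd

df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count': [1, 0, 2]})

seen = []  # side effect: one entry is appended per call to record()

def record(group):
    # Remember the first row label of the group we were called with
    seen.append(group.index[0])
    return group['count'].sum()

result = df.groupby('class').apply(record)

print(seen)       # row labels, in call order
print(len(seen))  # number of times record() was actually called
print(result)     # one combined value per group, regardless of call count
```

The combined `result` contains each group once either way; only the side effect (the contents of `seen`) exposes an extra call.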

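The `groupby` function is a commonly used pandas function for grouping data by one or more columns. Once the data is grouped, you can apply operations to each group, such as aggregation, transformation, or filtering.

Some common usage examples:

1. Group by a single column and apply an aggregation function:

```python
import pandas as pd

# Create sample data
data = {'group': ['A', 'A', 'B', 'B', 'C', 'C'],
        'value': [10, 20, 30, 40, 50, 60]}
df = pd.DataFrame(data)

# Group by the group column and compute the mean of each group
grouped = df.groupby('group')
result = grouped.mean()
print(result)
```

Output:

```
       value
group
A       15.0
B       35.0
C       55.0
```

2. Group by multiple columns and apply an aggregation function:

```python
import pandas as pd

# Create sample data
data = {'group': ['A', 'A', 'B', 'B', 'C', 'C'],
        'category': ['X', 'Y', 'X', 'Y', 'X', 'Y'],
        'value': [10, 20, 30, 40, 50, 60]}
df = pd.DataFrame(data)

# Group by the group and category columns and compute the sum of each group
grouped = df.groupby(['group', 'category'])
result = grouped.sum()
print(result)
```

Output:

```
                value
group category
A     X            10
      Y            20
B     X            30
      Y            40
C     X            50
      Y            60
```

3. Apply a custom function to transform the data:

```python
import pandas as pd

# Create sample data
data = {'group': ['A', 'A', 'B', 'B', 'C', 'C'],
        'value': [10, 20, 30, 40, 50, 60]}
df = pd.DataFrame(data)

# Custom function: subtract the group mean from each value in the group
def subtract_mean(group):
    group = group.copy()  # work on a copy rather than the object passed in
    group['value'] = group['value'] - group['value'].mean()
    return group

# Group by the group column and apply the custom function;
# group_keys=False keeps the original row index in the result
result = df.groupby('group', group_keys=False).apply(subtract_mean)
print(result)
```

Output (the exact columns shown can vary with the pandas version):

```
  group  value
0     A   -5.0
1     A    5.0
2     B   -5.0
3     B    5.0
4     C   -5.0
5     C    5.0
```

These are some common uses of `groupby`; choose the aggregation or transformation function that fits your needs.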