Python Pandas: Groupby and Apply with Multi-Column Operations

weixin_39830012

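Tags: python groupby apply

Copyright notice: This is an original article by the blogger, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/weixin_39830012/article/details/111787447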

df1 is a DataFrame with 4 columns.

I want to create a new DataFrame (df2) by grouping df1 on column 'A' and applying multi-column operations to columns 'C' and 'D':

Column 'AA' = mean(C) + mean(D)

Column 'BB' = std(D)

import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': np.random.randn(8),
    'D': np.random.randn(8)})

     A      B         C         D
0  foo    one  1.652675 -1.983378
1  bar    one  0.926656 -0.598756
2  foo    two  0.131381  0.604803
3  bar  three -0.436376 -1.186363
4  foo    two  0.487161 -0.650876
5  bar    two  0.358007  0.249967
6  foo    one -1.150428  2.275528
7  foo  three  0.202677 -1.408699

def fun1(gg):  # this does not work
    return pd.DataFrame({'AA': C.mean() + gg.C.std(), 'BB': gg.C.std()})

dg1 = df1.groupby('A')
df2 = dg1.apply(fun1)

This does not work. It seems that aggregate() only operates on one Series at a time, so multi-column operations are not possible,

and apply() only produces Series output from a multi-column operation.

Is there any other way to produce multi-column output (a DataFrame) from a multi-column operation?

Solution

Do you have a typo in your fun1 function? Should AA be C.mean() + C.std() or C.mean() + D.mean()?

In the first case, AA = C.mean() + C.std():

In [91]: df = df1.groupby('A').agg({'C': lambda x: x.mean() + x.std(),
   ....:                            'D': lambda x: x.std()})

In [92]: df
Out[92]:
            C         D
A
bar  1.255506  0.588981
foo  1.775945  0.442724
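The result above keeps the input column names 'C' and 'D'. If the goal is to get columns named 'AA' and 'BB' directly, one option not mentioned in the original answer is pandas named aggregation (available since pandas 0.25). The sketch below assumes a seeded random generator, added here only so the example is reproducible:

```python
import numpy as np
import pandas as pd

# Seeded generator: an assumption for reproducibility, not part of the original post.
rng = np.random.default_rng(0)
df1 = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': rng.standard_normal(8),
    'D': rng.standard_normal(8),
})

# Named aggregation: output_name=(input_column, function).
# Each function still sees only a single column.
named = df1.groupby('A').agg(
    AA=('C', lambda x: x.mean() + x.std()),
    BB=('D', 'std'),
)
print(named)
```

Each output column is still computed from one input column, so this only covers the first interpretation (AA = C.mean() + C.std()).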

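The second case, C.mean() + D.mean(), isn't quite as nice. When you pass a dict to .agg on a groupby object, each function only sees its own column, so I don't think there's a way to combine values from two columns. Instead, compute the group statistics first and combine them afterwards: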

In [108]: g = df1.groupby('A')[['C', 'D']]  # numeric columns only; recent pandas versions raise on the string column 'B'

In [109]: df = pd.DataFrame({"AA": g.mean()['C'] + g.mean()['D'], "BB": g.std()['D']})

In [110]: df
Out[110]:
           AA        BB
A
bar  0.532263  0.721351
foo  0.427608  0.494980

You may want to assign g.mean() to a temporary variable to avoid calculating it twice.
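As a minimal sketch of that suggestion, the snippet below caches the group means and standard deviations, then compares the result with an apply()-based alternative that is not part of the original answer. The helper name summarize and the seeded generator are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

# Seeded generator: an assumption for reproducibility, not part of the original post.
rng = np.random.default_rng(0)
df1 = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': rng.standard_normal(8),
    'D': rng.standard_normal(8),
})

# Option 1: compute each group statistic once and reuse it.
g = df1.groupby('A')[['C', 'D']]
means = g.mean()
stds = g.std()
cached = pd.DataFrame({'AA': means['C'] + means['D'], 'BB': stds['D']})

# Option 2: apply() with a function that returns a Series per group.
# pandas stacks the per-group Series into a DataFrame indexed by 'A'.
def summarize(group):  # hypothetical helper name
    return pd.Series({
        'AA': group['C'].mean() + group['D'].mean(),
        'BB': group['D'].std(),
    })

applied = df1.groupby('A')[['C', 'D']].apply(summarize)

print(cached)
print(applied)
```

Returning a Series (rather than a DataFrame built from scalars, as in the question's fun1) is what lets apply() produce one row per group. The cached version is usually faster because it relies on pandas' built-in group reductions instead of calling a Python function for every group.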
