Python col function: GroupBy functions in Python pandas, such as SUM(col_1 * col_2), weighted average, etc.

Is it possible to directly compute a per-group aggregate of an expression over two columns (for example, the sum of their product) without using

grouped.apply(lambda x: (x.a*x.b).sum())

It is much faster (less than half the time on my machine) to use

df['helper'] = df.a*df.b
grouped = df.groupby(something)
result = grouped['helper'].sum()
df = df.drop('helper', axis=1)  # drop() returns a new frame, so reassign

But I don't really like having to do this.
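One way to keep the speed of the helper-column approach without leaving a temporary column on `df` is to build the helper on a throwaway copy with `DataFrame.assign`. This is only a minimal sketch: the sample data and the grouping column name `key` are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical example data; 'key' stands in for the grouping column.
df = pd.DataFrame({
    "key": ["x", "x", "y", "y"],
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [10.0, 20.0, 30.0, 40.0],
})

# assign() returns a new frame, so df itself never gains a 'helper' column.
sum_ab = df.assign(helper=df["a"] * df["b"]).groupby("key")["helper"].sum()
print(sum_ab)
```

The product is still computed once, vectorized over the whole frame, and only the built-in grouped `sum` runs per group, which is presumably why the helper-column variant beats the lambda.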

It is for example useful to compute the weighted average per group. Here the lambda approach would be

grouped.apply(lambda x: (x.a*x.b).sum()/x.b.sum())

and again it is much slower than dividing the grouped sum of the helper column by the grouped b.sum().
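The weighted average can be sketched the same way, as a ratio of two grouped sums. The example below assumes a hypothetical frame where `a` holds the values, `b` the weights, and `key` the group labels. None of the data comes from the question.

```python
import pandas as pd

# Hypothetical example data: 'a' are values, 'b' are weights.
df = pd.DataFrame({
    "key": ["x", "x", "y", "y"],
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [1.0, 3.0, 2.0, 2.0],
})

keys = df["key"]

# Vectorized product first, then the built-in grouped sum.
weighted_sum = (df["a"] * df["b"]).groupby(keys).sum()
weight_total = df["b"].groupby(keys).sum()

# Both results are indexed by group label, so the division aligns per group.
weighted_avg = weighted_sum / weight_total

# The lambda version, kept only as a (slower) cross-check.
reference = df.groupby("key")[["a", "b"]].apply(
    lambda g: (g["a"] * g["b"]).sum() / g["b"].sum()
)
print(weighted_avg)
```

Grouping a Series by another aligned Series avoids adding any helper column to `df`.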

Solution

I want to eventually build an embedded array expression evaluator (Numexpr on steroids) to do things like this. Right now we're working within the limitations of Python: if you implemented a Cython aggregator to do (x * y).sum(), it could be connected with groupby, but ideally you could write the Python expression as a function:

def weight_sum(x, y):
    return (x * y).sum()

and that would get "JIT-compiled" and be about as fast as groupby(...).sum(). What I'm describing is a pretty significant (many-month) project. If there were a BSD-compatible APL implementation, I might be able to do something like the above quite a bit sooner (just thinking out loud).
