Input:
df=pd.DataFrame({
'BusId':['abc1','abc2','abc3','abc1','abc2','abc4'],
"Fair":[5,6,7,10,5,4]
})
Need to group by BusId and need the following output
Output:
BusId Count of Fair>=5 Count of Fair>=10
abc1 2 1
abc2 1 0
abc3 1 0
abc4 0 0
Thanks for the help.
解决方案
Using agg on your series with two helper functions to count the values above each of your thresholds.
However, aggregation on a Series as I am doing here will be deprecated in a future version of pandas.
df.groupby('BusId').Fair.agg({
'gt5': lambda x: (x>=5).sum(),
'gt10': lambda x: (x>=10).sum()
})
gt5 gt10
BusId
abc1 2 1
abc2 2 0
abc3 1 0
abc4 0 0
You could also remove the use of lambda:
out = df.assign(gt5=df.Fair.ge(5), gt10=df.Fair.ge(10))
out.groupby('BusId').agg({'gt5': 'sum', 'gt10': 'sum'}).astype(int)
gt5 gt10
BusId
abc1 2 1
abc2 2 0
abc3 1 0
abc4 0 0
The second approach will be slightly faster:
%%timeit
df.groupby('BusId').Fair.agg({
'gt5': lambda x: (x>=5).sum(),
'gt10': lambda x: (x>=10).sum()
})
5.05 ms ± 69 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
out = df.assign(gt5=df.Fair.ge(5), gt10=df.Fair.ge(10))
out.groupby('BusId').agg({'gt5': 'sum', 'gt10': 'sum'}).astype(int)
3.76 ms ± 44.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)