Build the sample data:
import numpy as np
import pandas as pd
np.random.seed(1)
# 编号 = ID number, 金额 = amount
df = pd.DataFrame({'编号': [1,2,3,4,5,7,8,9,11,12,14,15,16,19],
                   '金额': np.random.randint(40, 200, size=14)})
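Method 1: pd.cut()
(
    df
    # Bins are right-closed, so each edge is the last ID of a run:
    # (0,5], (5,9], (9,12], (12,16], (16,19]
    .assign(编号=lambda d: pd.cut(d['编号'], [0, 5, 9, 12, 16, 19],
            labels=['1-5', '7-9', '11-12', '14-16', '19']))
    .groupby('编号')
    .sum()
)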
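Because pd.cut() uses right-closed intervals by default, every bin edge has to be the last ID of a run. As a small sketch on the IDs alone (the names ids, binned, and counts are only for illustration), counting how many IDs land under each label is a quick way to check the edges:

```python
import pandas as pd

# The same IDs as in the sample data, without the random amounts.
ids = pd.Series([1, 2, 3, 4, 5, 7, 8, 9, 11, 12, 14, 15, 16, 19])

# Right-closed bins: (0,5], (5,9], (9,12], (12,16], (16,19]
binned = pd.cut(ids, [0, 5, 9, 12, 16, 19],
                labels=['1-5', '7-9', '11-12', '14-16', '19'])

# Number of IDs that fall under each label.
counts = binned.value_counts(sort=False)
print(counts)
```

If an edge is off, for example 14 instead of 16, the counts for the last two labels no longer match the lengths of the runs.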
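Method 2: add a helper column
(
    df
    # Row position minus ID stays constant within a run of consecutive IDs
    .assign(flag=lambda d: np.arange(1, len(d) + 1) - d['编号'])
    .groupby('flag')
    .agg({'编号': lambda s: '%d-%d' % (min(s), max(s)), '金额': 'sum'})
    .set_index('编号')
)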
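To see why the helper column works, it may help to look at its values directly. The sketch below computes flag for the sample IDs. For comparison it also builds run_id with diff() and cumsum(), a common alternative that is not part of the original answer. The names ids, flag, and run_id are only illustrative.

```python
import numpy as np
import pandas as pd

ids = pd.Series([1, 2, 3, 4, 5, 7, 8, 9, 11, 12, 14, 15, 16, 19])

# Helper column from method 2: row position minus ID.
# The value only changes where a gap in the IDs occurs.
flag = np.arange(1, len(ids) + 1) - ids
print(flag.tolist())

# Alternative (an assumption, not from the original): start a new run
# whenever the step from the previous ID is not exactly 1.
run_id = ids.diff().ne(1).cumsum()
print(run_id.tolist())
```

Both columns are constant inside each run, so either can serve as the groupby key. The difference is that flag decreases from run to run while run_id increases, which affects the order of the grouped result.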
Method 3: replace()
(
    df
    # Map every ID in the 编号 column to the label of its run
    .replace({'编号': {n: label
                       for label, ids in [('1-5', range(1, 6)),
                                          ('7-9', range(7, 10)),
                                          ('11-12', range(11, 13)),
                                          ('14-16', range(14, 17)),
                                          ('19', range(19, 20))]
                       for n in ids}})
    .groupby('编号')
    .sum()
)
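All three methods produce the same totals; only the row order differs. Method 1 follows the category order (1-5, 7-9, 11-12, 14-16, 19), method 2 is sorted by the helper column, so the runs come out in reverse, and method 3 sorts the labels as strings (1-5, 11-12, 14-16, 19, 7-9). Method 2 also labels the single-ID run as 19-19 rather than 19.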
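One way to check that the totals agree is to reduce each result to a Series of sums keyed by label and compare them as dictionaries, which ignores row order. This is only a sketch. The names m1, m2, m3, labels, and mapping are illustrative, the cut edges assume each bin ends on the last ID of a run, and method 3 is written here with an explicit ID-to-label dictionary.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame({'编号': [1, 2, 3, 4, 5, 7, 8, 9, 11, 12, 14, 15, 16, 19],
                   '金额': np.random.randint(40, 200, size=14)})
labels = ['1-5', '7-9', '11-12', '14-16', '19']

# Method 1: pd.cut() with right-closed bins ending on the last ID of each run.
m1 = (df.assign(编号=lambda d: pd.cut(d['编号'], [0, 5, 9, 12, 16, 19], labels=labels))
        .groupby('编号', observed=False)['金额'].sum())

# Method 2: helper column; the single-ID run is labelled '19-19', so rename it.
m2 = (df.assign(flag=lambda d: np.arange(1, len(d) + 1) - d['编号'])
        .groupby('flag')
        .agg({'编号': lambda s: '%d-%d' % (min(s), max(s)), '金额': 'sum'})
        .set_index('编号')['金额']
        .rename(index={'19-19': '19'}))

# Method 3: replace() with an explicit ID -> label mapping for the 编号 column.
runs = [range(1, 6), range(7, 10), range(11, 13), range(14, 17), range(19, 20)]
mapping = {n: label for label, ids in zip(labels, runs) for n in ids}
m3 = df.replace({'编号': mapping}).groupby('编号')['金额'].sum()

# Same totals per label, regardless of row order.
print(m1.to_dict() == m2.to_dict() == m3.to_dict())
# Row order of each result.
print(list(m1.index), list(m2.index), list(m3.index))
```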
深入浅出Pandas is a great book!