Import modules
import pandas as pd
import numpy as np
Advanced data aggregation
frame = pd.DataFrame({'color': ['yellow', 'red', 'green', 'red', 'green'],
                      'price1': [5.56, 4.2, 1.3, 0.56, 2.75],
                      'price2': [4.75, 4.12, 1.6, 0.75, 3.15]})
sums = frame.groupby('color').sum().add_prefix('tot_')
sums
Use merge() to add the aggregated results to the original DataFrame
pd.merge(frame,sums,left_on='color',right_index=True)
Aggregation with transform()
frame.groupby('color').transform('sum').add_prefix('tot_')
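Because transform() returns a result aligned row by row with the original DataFrame, the group totals can be attached without going through merge(). The following is a minimal sketch of that idea, assuming the same price data as above; the names totals and combined are illustrative and do not come from the source.

```python
import pandas as pd

frame = pd.DataFrame({'color': ['yellow', 'red', 'green', 'red', 'green'],
                      'price1': [5.56, 4.2, 1.3, 0.56, 2.75],
                      'price2': [4.75, 4.12, 1.6, 0.75, 3.15]})

# transform('sum') broadcasts each group's total back to every row of
# that group, so the result keeps the index and length of frame.
totals = (frame.groupby('color')[['price1', 'price2']]
               .transform('sum')
               .add_prefix('tot_'))

# The shared index lets join() attach the totals directly.
combined = frame.join(totals)
print(combined)
```

The merge() approach needs the aggregated table as an intermediate step; with transform() the row alignment is already done, which is why a plain join() on the index is enough here.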
Aggregation with apply()
frame = pd.DataFrame({'color': ['white', 'black', 'white', 'white', 'black', 'black'],
                      'status': ['up', 'up', 'down', 'down', 'down', 'up'],
                      'value1': [12.33, 14.55, 22.34, 17.84, 23.40, 18.33],
                      'value2': [11.23, 31.80, 29.99, 31.18, 18.25, 22.44]})
frame
frame.groupby(['color', 'status']).apply(lambda x: x.max())
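For a plain per-group maximum of the numeric columns, the built-in aggregation methods are a possible alternative to apply() with a lambda. The sketch below assumes the same color/status data as above; the names maxima and summary are illustrative and not part of the source.

```python
import pandas as pd

frame = pd.DataFrame({'color': ['white', 'black', 'white', 'white', 'black', 'black'],
                      'status': ['up', 'up', 'down', 'down', 'down', 'up'],
                      'value1': [12.33, 14.55, 22.34, 17.84, 23.40, 18.33],
                      'value2': [11.23, 31.80, 29.99, 31.18, 18.25, 22.44]})

grouped = frame.groupby(['color', 'status'])[['value1', 'value2']]

# Built-in max: one row per (color, status) pair, indexed by a MultiIndex.
maxima = grouped.max()
print(maxima)

# agg() takes a list of function names when several statistics are wanted;
# the resulting columns are a MultiIndex of (column, statistic).
summary = grouped.agg(['min', 'max'])
print(summary)
```

Selecting value1 and value2 explicitly keeps the grouping keys out of the computed columns; they appear only in the index of the result.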
Reference:
Fabio Nelli. Python Data Analytics (Chinese edition: Python数据分析实战), 2nd ed. Beijing: Posts & Telecom Press, November 2019.