Import the module
import pandas as pd
1. Data aggregation with pandas
Create a DataFrame
frame = pd.DataFrame({'color': ['yellow', 'red', 'green', 'red', 'green'],
                      'object': ['pen', 'pencil', 'pencil', 'ashtray', 'pen'],
                      'price1': [5.56, 4.2, 1.3, 0.56, 2.75],
                      'price2': [4.75, 4.12, 1.6, 0.75, 3.15]})
frame
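Q: Using the group labels in the color column, compute the mean of the price1 column.
First select the price1 column, then call groupby() on it, passing the color column as the grouping key.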
group = frame['price1'].groupby(frame['color'])
group
'''
<pandas.core.groupby.generic.SeriesGroupBy object at 0x000001B0E1704F88>
'''
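The same grouping can also be written by grouping the whole DataFrame by column name and then selecting price1. The sketch below rebuilds the frame from above and checks that both forms give the same means; the variable names by_series, by_name and same are only illustrative.

```python
import pandas as pd

frame = pd.DataFrame({'color': ['yellow', 'red', 'green', 'red', 'green'],
                      'object': ['pen', 'pencil', 'pencil', 'ashtray', 'pen'],
                      'price1': [5.56, 4.2, 1.3, 0.56, 2.75],
                      'price2': [4.75, 4.12, 1.6, 0.75, 3.15]})

# Form used above: group one Series by the values of another Series.
by_series = frame['price1'].groupby(frame['color'])

# Alternative form: group the DataFrame by column name, then select price1.
by_name = frame.groupby('color')['price1']

# Compare the per-group means produced by the two forms.
same = by_series.mean().equals(by_name.mean())
print(same)
```

The column-name form is usually shorter when the key and the values live in the same DataFrame; the Series form is handy when the key comes from somewhere else.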
Use the groups attribute of the GroupBy object to see how the rows were grouped
group.groups
# returns {'green': [2, 4], 'red': [1, 3], 'yellow': [0]}
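Besides groups, the contents of each group can be inspected by iterating over the GroupBy object or by calling get_group(). A minimal sketch, reusing the frame from above (red_prices and sizes are illustrative names):

```python
import pandas as pd

frame = pd.DataFrame({'color': ['yellow', 'red', 'green', 'red', 'green'],
                      'object': ['pen', 'pencil', 'pencil', 'ashtray', 'pen'],
                      'price1': [5.56, 4.2, 1.3, 0.56, 2.75],
                      'price2': [4.75, 4.12, 1.6, 0.75, 3.15]})
group = frame['price1'].groupby(frame['color'])

# Iterating yields (label, sub-Series) pairs, one per group.
for label, prices in group:
    print(label, prices.tolist())

# get_group() returns the price1 values that belong to a single label.
red_prices = group.get_group('red')

# size() counts the rows in each group.
sizes = group.size()
print(sizes)
```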
Apply an operation to each group (mean, median, etc.)
group.mean()
'''
color
green 2.025
red 2.380
yellow 5.560
'''
group.sum()
'''
color
green 4.05
red 4.76
yellow 5.56
'''
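Several statistics can also be computed in one call with agg(), which accepts a list of function names. The sketch below is one possible way to do it on the same frame; stats is an illustrative name.

```python
import pandas as pd

frame = pd.DataFrame({'color': ['yellow', 'red', 'green', 'red', 'green'],
                      'object': ['pen', 'pencil', 'pencil', 'ashtray', 'pen'],
                      'price1': [5.56, 4.2, 1.3, 0.56, 2.75],
                      'price2': [4.75, 4.12, 1.6, 0.75, 3.15]})
group = frame['price1'].groupby(frame['color'])

# agg() with a list of names returns a DataFrame:
# one row per group, one column per statistic.
stats = group.agg(['mean', 'median', 'max'])
print(stats)
```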
2. Hierarchical grouping
Group using the values of multiple columns as keys
ggroup = frame['price1'].groupby([frame['color'],frame['object']])
ggroup.groups
#{('green', 'pen'): [4], ('green', 'pencil'): [2], ('red', 'ashtray'): [3], ('red', 'pencil'): [1], ('yellow', 'pen'): [0]}
ggroup.sum()
'''
color   object
green   pen        2.75
        pencil     1.30
red     ashtray    0.56
        pencil     4.20
yellow  pen        5.56
'''
Specify the grouping keys and the aggregation in a single expression
frame['price1'].groupby([frame['color'],frame['object']]).sum()
'''
color   object
green   pen        2.75
        pencil     1.30
red     ashtray    0.56
        pencil     4.20
yellow  pen        5.56
'''
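The result of a multi-key grouping is a Series with a two-level index (color, object). As a sketch of how it might be used further, the example below passes the keys as a list of column names and then calls unstack() to turn the inner level into columns; totals and table are illustrative names.

```python
import pandas as pd

frame = pd.DataFrame({'color': ['yellow', 'red', 'green', 'red', 'green'],
                      'object': ['pen', 'pencil', 'pencil', 'ashtray', 'pen'],
                      'price1': [5.56, 4.2, 1.3, 0.56, 2.75],
                      'price2': [4.75, 4.12, 1.6, 0.75, 3.15]})

# Group by two column names and sum price1 within each (color, object) pair.
totals = frame.groupby(['color', 'object'])['price1'].sum()

# unstack() moves the inner index level (object) into columns.
# Combinations that do not occur in the data become NaN.
table = totals.unstack()
print(table)
```

The unstacked table has one row per color and one column per object, which can be easier to read than the hierarchical Series.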
Reference:
Fabio Nelli. Python数据分析实战 (Python Data Analytics), 2nd ed. Beijing: 人民邮电出版社 (Posts & Telecom Press), 2019.11.