joyful pandas task4 - Grouping

qq_41768189

Original source: https://github.com/datawhalechina/team-learning

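1. The general pattern of grouping
A grouping operation needs three things to be specified: the grouping key, the data to operate on, and the operation (together with the result it returns).
code: df.groupby(grouping key)[data columns].operation()
eg: df.groupby('Gender')['Longevity'].mean()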
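A minimal sketch of the three elements on a small hypothetical DataFrame. The column names follow the example above, but the values are made up for illustration:

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the example above
df = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male"],
    "Longevity": [80, 70, 84, 76],
})

# grouping key: 'Gender'; data column: 'Longevity'; operation: mean
result = df.groupby("Gender")["Longevity"].mean()
print(result)
```

The result is a Series indexed by the group names (here Female and Male), one value per group.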

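2. The GroupBy object
1. The ngroups attribute gives the number of groups:
gb.ngroups
2. The get_group method returns the rows of a single group directly. You must know the group's exact name, which is a tuple when grouping by several keys:
gb.get_group(('Fudan University', 'Freshman'))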
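A small sketch of both calls, assuming that gb groups by two hypothetical columns, School and Grade (the text does not show how gb was built); the data is invented:

```python
import pandas as pd

# Hypothetical data; 'School' and 'Grade' are assumed grouping columns
df = pd.DataFrame({
    "School": ["Fudan University", "Fudan University",
               "Peking University", "Fudan University"],
    "Grade": ["Freshman", "Senior", "Freshman", "Freshman"],
    "Height": [158.9, 166.5, 188.9, 174.0],
})

gb = df.groupby(["School", "Grade"])

# Number of distinct (School, Grade) combinations
print(gb.ngroups)

# With two grouping keys, the group name is a tuple of both key values
fudan_freshmen = gb.get_group(("Fudan University", "Freshman"))
print(fudan_freshmen)
```

Here there are three distinct (School, Grade) pairs, and the Fudan freshman group contains the two rows with index 0 and 3.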

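3. Aggregation functions
1. Built-in aggregation methods available on a GroupBy object: max/min/mean/median/count/all/any/idxmax/idxmin/mad/nunique/skew/quantile/sum/std/var/sem/size/prod (note: mad was deprecated in pandas 1.5 and removed in pandas 2.0)
2. To apply several aggregations at once, pass agg a list of the built-in functions' names as strings; any of the names listed above can be used.
gb.agg(['sum', 'idxmax', 'skew'])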
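A sketch of a list-based agg call, assuming gb selects two numeric columns after grouping by a hypothetical Gender column; all values are invented:

```python
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male", "Female"],
    "Height": [158.0, 175.0, 162.0, 181.0, 170.0],
    "Weight": [46.0, 70.0, 50.0, 75.0, 55.0],
})

gb = df.groupby("Gender")[["Height", "Weight"]]

# Several built-in aggregations at once, passed as a list of strings.
# The result has MultiIndex columns: (column, aggregation name).
summary = gb.agg(["sum", "idxmax", "skew"])
print(summary)
```

idxmax returns the original row label of each group's maximum, not a position within the group. skew needs at least three values per group, so the two-row Male group gets NaN for it.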

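4. Transformation and filtering
1. Built-in transformation methods, whose results are aligned with the original rows: cumcount/cumsum/cumprod/cummax/cummin
2. Custom transformations with transform, e.g. standardizing each value within its own group: gb.transform(lambda x: (x-x.mean())/x.std()).head()
3. Filtering with filter, which keeps or drops whole groups; here only groups with more than 100 rows are kept: gb.filter(lambda x: x.shape[0] > 100).head()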
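The three operations side by side on hypothetical data. The group-size threshold is lowered from 100 to 2 only because the toy frame is tiny:

```python
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male", "Female"],
    "Height": [158.0, 175.0, 162.0, 181.0, 170.0],
})

gb = df.groupby("Gender")["Height"]

# Built-in transformation: running total within each group,
# returned in the original row order
running = gb.cumsum()

# Custom transformation: z-score of each value relative to its own group
# (x.std() is the sample standard deviation, ddof=1)
z = gb.transform(lambda x: (x - x.mean()) / x.std())

# Filter on the DataFrame groupby so that whole rows are returned;
# keep only groups with more than 2 rows (threshold chosen for the toy data)
big = df.groupby("Gender").filter(lambda g: g.shape[0] > 2)

print(running, z, big, sep="\n\n")
```

Both transformations keep one value per original row. filter, by contrast, drops entire groups: here the two-row Male group disappears.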

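5. Cross-column grouping
The apply method can process several columns of each group at once, for example to compute each group's mean BMI from height and weight:
def BMI(x):
    # Height is stored in centimetres; convert it to metres
    Height = x['Height']/100
    Weight = x['Weight']
    BMI_value = Weight/Height**2
    # Return one scalar per group: the group's average BMI
    return BMI_value.mean()
gb.apply(BMI)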
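A self-contained version of the BMI example. It assumes gb was built as df.groupby('Gender')[['Height', 'Weight']] (the text does not show this), with height in centimetres and weight in kilograms; the data is hypothetical:

```python
import pandas as pd

# Hypothetical data; heights in centimetres, weights in kilograms
df = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male"],
    "Height": [160.0, 180.0, 170.0, 175.0],
    "Weight": [51.2, 81.0, 57.8, 61.25],
})

# Selecting both columns means each group reaches apply as a DataFrame
gb = df.groupby("Gender")[["Height", "Weight"]]

def BMI(x):
    height_m = x["Height"] / 100          # centimetres -> metres
    bmi = x["Weight"] / height_m ** 2     # per-row BMI within the group
    return bmi.mean()                     # one scalar per group

result = gb.apply(BMI)
print(result)
```

Because BMI returns a scalar, apply combines the per-group results into a Series indexed by Gender.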