Task04: Grouping

一棵二叉树

Tags: python, data analysis

Article link: https://blog.csdn.net/qq_46576562/article/details/111710721

Datawhale team study check-in 04
Grouping: https://datawhalechina.github.io/joyful-pandas/build/html/%E7%9B%AE%E5%BD%95/ch4.html

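I. Grouping Patterns and Their Objects

1. The general pattern of grouping

Grouping operations come up constantly in everyday data analysis, for example:

Group by gender and compute the average life expectancy of the national population
Group by season and standardize the temperatures within each season
Group by class and keep only the classes whose average math score exceeds 80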

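Each of these grouping operations requires three elements: the grouping key, the data source, and the operation together with the result it returns.
The general pattern is:

df.groupby(grouping key)[data source].operation

eg1: df.groupby('Gender')['Longevity'].mean()
eg2: df.groupby('Gender')['Height'].median()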

Gender
Female 159.6
Male 173.4
Name: Height, dtype: float64
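
The three scenarios listed at the start of this subsection correspond to three kinds of operation in pandas: aggregation (one value per group), transformation (one value per row, computed within the group), and filtration (keeping or dropping whole groups). The following is a minimal sketch on a made-up four-row table; the column names `Class`, `Gender` and `Math` and all values are assumptions for illustration only, not the dataset used in this post.

```python
import pandas as pd

# Hypothetical toy data, not the dataset used in the post
df = pd.DataFrame({
    "Class": ["A", "A", "B", "B"],
    "Gender": ["Female", "Male", "Female", "Male"],
    "Math": [90.0, 80.0, 70.0, 60.0],
})

# Aggregation: one mean per gender
mean_by_gender = df.groupby("Gender")["Math"].mean()

# Transformation: standardize scores within each class;
# the result has the same length and index as df
standardized = df.groupby("Class")["Math"].transform(
    lambda s: (s - s.mean()) / s.std()
)

# Filtration: keep only the rows of classes whose mean score exceeds 80
high_classes = df.groupby("Class").filter(lambda g: g["Math"].mean() > 80)

print(mean_by_gender)
print(standardized)
print(high_classes)
```

In all three cases the call still follows the same shape: a grouping key, a data source, and an operation.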

2. The nature of the grouping key

2.1 Passing a list of column names to groupby groups the data along several dimensions at once.
eg: df.groupby(['School', 'Gender'])['Height'].mean()

Out: 
School                         Gender
Fudan University               Female    158.776923
                               Male      174.212500
Peking University              Female    158.666667
                               Male      172.030000
Shanghai Jiao Tong University  Female    159.122500
                               Male      176.760000
Tsinghua University            Female    159.753333
                               Male      171.638889
Name: Height, dtype: float64
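
To see what the grouping key amounts to, it can help to compare the number of groups with the number of distinct key combinations in the data. The sketch below uses a small hypothetical table (shortened school names and invented heights, not the post's dataset) and checks that each unique (School, Gender) pair yields exactly one group.

```python
import pandas as pd

# Hypothetical toy data, not the dataset used in the post
df = pd.DataFrame({
    "School": ["Fudan", "Fudan", "Fudan", "Peking", "Peking"],
    "Gender": ["Female", "Female", "Male", "Female", "Male"],
    "Height": [158.0, 160.0, 174.0, 159.0, 172.0],
})

grouped = df.groupby(["School", "Gender"])
mean_height = grouped["Height"].mean()  # indexed by a two-level MultiIndex

# Each distinct (School, Gender) combination forms one group
unique_pairs = df[["School", "Gender"]].drop_duplicates()

print(mean_height)
print(grouped.ngroups, len(unique_pairs))
```

The result is indexed by a MultiIndex with one level per key column, which is why the output above shows School and Gender side by side.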

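2.2 The grouping key passed to groupby can also encode more complex logic. For example, group students by whether their weight exceeds the overall mean, and again compute the mean height.

# Grouping condition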
condition = df.Weight > df.Weight.mean()

df.groupby(condition)['Height'].mean()

Out: 
Weight
False    159.034646
True     172.705357
Name: Height, dtype: float64
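
The result's index takes its name from the Series used as the key, which is why the output above is labelled Weight even though the groups are True and False. A small sketch, again on invented data, shows that renaming the condition (the name `AboveMeanWeight` is an arbitrary choice for illustration) gives a more readable result.

```python
import pandas as pd

# Hypothetical toy data, not the dataset used in the post
df = pd.DataFrame({
    "Weight": [45.0, 50.0, 70.0, 75.0],
    "Height": [155.0, 160.0, 172.0, 178.0],
})

# Boolean Series aligned with df; the mean weight here is 60
condition = (df["Weight"] > df["Weight"].mean()).rename("AboveMeanWeight")

result = df.groupby(condition)["Height"].mean()
print(result)
```

Any Series that aligns with the DataFrame's index can serve as a grouping key in this way; it does not have to be a column of the DataFrame.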

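Exercise: Using the lower and upper quartiles as cut points, split weight into three groups (high, normal, low) and compute the mean height of each group.

# Compute the lower and upper quartiles of Weight once
q25, q75 = df.Weight.quantile(0.25), df.Weight.quantile(0.75)

# Above the upper quartile is 'high', below the lower quartile is 'low',
# anything in between (inclusive) is 'normal'; missing weights stay NaN
condition = df.Weight.mask(df.Weight > q75, 'high') \
                     .mask(df.Weight < q25, 'low') \
                     .mask((df.Weight >= q25) & (df.Weight <= q75), 'normal')

df.groupby(condition)['Height'].mean()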

Weight
high      174.935714
low       153.753659
normal    161.883516
Name: Height, dtype: float64
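
The same labelling can also be written with `numpy.select`, which may read more clearly than a chain of `mask` calls. This is an alternative sketch on invented data, not the approach used in the exercise above. One difference to keep in mind: with `default="normal"`, a missing weight would be labelled normal, whereas the `mask` chain leaves it as NaN, so this version assumes the Weight column has no missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with no missing weights
df = pd.DataFrame({
    "Weight": [40.0, 50.0, 60.0, 70.0, 80.0],
    "Height": [150.0, 155.0, 160.0, 170.0, 180.0],
})

q25 = df["Weight"].quantile(0.25)  # 50.0 for this data
q75 = df["Weight"].quantile(0.75)  # 70.0 for this data

# The first matching condition wins; everything else is 'normal'
labels = np.select(
    [df["Weight"] > q75, df["Weight"] < q25],
    ["high", "low"],
    default="normal",
)
level = pd.Series(labels, index=df.index, name="WeightLevel")

result = df.groupby(level)["Height"].mean()
print(result)
```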
