Chapter 3: Grouping

Latest recommended article published on 2022-12-10 17:30:28

supermanwasd

Article link: https://blog.csdn.net/supermanwasd/article/details/105780207

The groupby function

Basics of grouping:

Grouping by a single column
Grouping by several columns
Group sizes and the number of groups
Iterating over groups

for name,group in grouped_single:
    print(name)
    display(group.head())
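
The four basics listed above can be sketched on a small table. The data below is made up for illustration (only the column names follow the ones used in this chapter), and `print` stands in for the notebook-only `display`.

```python
import pandas as pd

# Made-up toy data; only the column names mirror those used in this chapter.
df = pd.DataFrame({
    "School": ["S_1", "S_1", "S_1", "S_2", "S_2", "S_2"],
    "Class":  ["C_1", "C_1", "C_2", "C_1", "C_2", "C_2"],
    "Gender": ["M", "F", "M", "F", "M", "F"],
    "Math":   [34.0, 32.5, 87.2, 80.4, 84.8, 58.8],
})

grouped_single = df.groupby("School")           # group by one column
grouped_mul = df.groupby(["School", "Class"])   # group by several columns

group_sizes = grouped_single.size()   # number of rows in each group
n_single = grouped_single.ngroups     # number of groups for one key
n_mul = grouped_mul.ngroups           # number of groups for two keys

# Iterating yields (group key, sub-DataFrame) pairs.
group_names = []
for name, group in grouped_single:
    group_names.append(name)
    print(name)
    print(group.head())

# get_group pulls out a single group by its key.
s1 = grouped_single.get_group("S_1")
```

Creating the groupby object does no computation by itself; the work happens when a method such as `size` is called or when the groups are iterated.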

The level parameter (for a MultiIndex) and the axis parameter

df.set_index(['Gender','School']).groupby(level=1,axis=0).get_group('S_1').head()
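
A minimal sketch of grouping on an index level, using made-up data. `axis=0` is the default, so the sketch omits it; newer pandas releases deprecate passing `axis` to `groupby`, which is another reason to leave it out.

```python
import pandas as pd

# Made-up toy data for illustration.
df = pd.DataFrame({
    "School": ["S_1", "S_1", "S_2", "S_2"],
    "Gender": ["M", "F", "M", "F"],
    "Height": [173, 192, 186, 167],
})

# Move two columns into a two-level row index.
indexed = df.set_index(["Gender", "School"])

# level=1 is the second index level ("School"); the level name works as well.
by_position = indexed.groupby(level=1).get_group("S_1")
by_name = indexed.groupby(level="School").get_group("S_1")

print(by_position)
```

Referring to the level by name tends to be more robust than the position, since it keeps working if the order of the index levels changes.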

Attributes and methods available on a groupby object:

['Address', 'Class', 'Gender', 'Height', 'Math', 'Physics', 'School', 'Weight', 'agg', 'aggregate', 'all', 'any', 'apply', 'backfill', 'bfill', 'boxplot', 'corr', 'corrwith', 'count', 'cov', 'cumcount', 'cummax', 'cummin', 'cumprod', 'cumsum', 'describe', 'diff', 'dtypes', 'expanding', 'ffill', 'fillna', 'filter', 'first', 'get_group', 'groups', 'head', 'hist', 'idxmax', 'idxmin', 'indices', 'last', 'mad', 'max', 'mean', 'median', 'min', 'ndim', 'ngroup', 'ngroups', 'nth', 'nunique', 'ohlc', 'pad', 'pct_change', 'pipe', 'plot', 'prod', 'quantile', 'rank', 'resample', 'rolling', 'sem', 'shift', 'size', 'skew', 'std', 'sum', 'tail', 'take', 'transform', 'tshift', 'var']

Aggregation, filtering, and transformation

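Common aggregation functions
# sem is the standard error of the mean, i.e. std / sqrt(count) within each group
group_m = grouped_single['Math']
group_m.std().values/np.sqrt(group_m.count().values) == group_m.sem().values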
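
The check above rests on the identity sem = std / sqrt(n), with both sides using the sample standard deviation. A self-contained sketch on made-up data follows; it compares with `np.allclose` rather than `==`, since exact equality between floats computed along different paths can fail on rounding.

```python
import numpy as np
import pandas as pd

# Made-up toy data for illustration.
df = pd.DataFrame({
    "School": ["S_1", "S_1", "S_1", "S_2", "S_2", "S_2"],
    "Math":   [34.0, 32.5, 87.2, 80.4, 84.8, 58.8],
})
group_m = df.groupby("School")["Math"]

# Standard error of the mean computed by hand for each group.
manual_sem = group_m.std() / np.sqrt(group_m.count())

# Compare with the built-in aggregation, tolerating floating-point rounding.
matches = np.allclose(manual_sem.values, group_m.sem().values)
print(matches)
```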

Using several aggregation functions at once
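
Two common ways to do this are sketched below on made-up data: passing a list of function names to `agg`, and named aggregation, where each output column is given as `new_name=(column, function)`. The output names `math_total` and `height_mean` are illustrative choices.

```python
import pandas as pd

# Made-up toy data for illustration.
df = pd.DataFrame({
    "School": ["S_1", "S_1", "S_1", "S_2", "S_2"],
    "Math":   [30, 40, 80, 60, 90],
    "Height": [170, 180, 190, 160, 170],
})
grouped = df.groupby("School")

# A list of function names gives one output column per function.
multi = grouped["Math"].agg(["sum", "mean", "max"])

# Named aggregation: pick the source column, the function, and the output name.
named = grouped.agg(
    math_total=("Math", "sum"),
    height_mean=("Height", "mean"),
)

print(multi)
print(named)
```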

Filtering
Transformation
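
The difference between the two can be sketched on made-up data: `filter` keeps or drops whole groups according to a condition on each group, while `transform` returns a result with the same length as the input, aligned row by row. The threshold of 60 is an arbitrary choice for the example.

```python
import pandas as pd

# Made-up toy data for illustration.
df = pd.DataFrame({
    "School": ["S_1", "S_1", "S_1", "S_2", "S_2"],
    "Math":   [30.0, 40.0, 80.0, 60.0, 90.0],
})
grouped = df.groupby("School")

# Filtering: keep every row of the groups whose mean Math score exceeds 60.
kept = grouped.filter(lambda g: g["Math"].mean() > 60)

# Transformation: subtract each group's own mean from its rows.
centered = grouped["Math"].transform(lambda s: s - s.mean())

print(kept)
print(centered)
```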

The apply function

Flexibility of apply
Computing several statistics at once with apply

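```python
from collections import OrderedDict

import pandas as pd

def f(df):
    # Collect several per-group statistics, keeping insertion order
    data = OrderedDict()
    data['M_sum'] = df['Math'].sum()      # total Math score of the group
    data['W_var'] = df['Weight'].var()    # sample variance of Weight
    data['H_mean'] = df['Height'].mean()  # mean Height
    return pd.Series(data)
```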
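
A function like `f` is meant to be passed to `apply` on a groupby object. The sketch below shows one way to do that on made-up data; it uses a plain dict, which preserves insertion order in current Python versions, and selects the value columns explicitly so that the function never receives the grouping column.

```python
import pandas as pd

# Made-up toy data for illustration.
df = pd.DataFrame({
    "School": ["S_1", "S_1", "S_2", "S_2"],
    "Math":   [30.0, 50.0, 60.0, 90.0],
    "Weight": [60.0, 70.0, 55.0, 65.0],
    "Height": [170.0, 180.0, 160.0, 170.0],
})

def f(group):
    # One Series per group; its index becomes the columns of the result.
    return pd.Series({
        "M_sum": group["Math"].sum(),
        "W_var": group["Weight"].var(),
        "H_mean": group["Height"].mean(),
    })

# Each group is passed to f as a DataFrame holding the selected columns.
stats = df.groupby("School")[["Math", "Weight", "Height"]].apply(f)
print(stats)
```

Because `f` returns a Series for each group, the results are stacked into a DataFrame with one row per group and one column per statistic.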