The groupby function
Basics of the grouping function:
Grouping by a single column
Grouping by several columns
Group sizes and number of groups
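The three topics above can be sketched as follows. The small DataFrame is made-up illustration data that only borrows the column names used in these notes; grouped_single matches the name used later in the notes, while grouped_mul is a name assumed here.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the notes.
df = pd.DataFrame({
    "School": ["S_1", "S_1", "S_1", "S_2", "S_2", "S_2"],
    "Class": ["C_1", "C_1", "C_2", "C_1", "C_2", "C_2"],
    "Math": [34.0, 32.5, 87.2, 80.4, 84.8, 97.0],
})

# Group by a single column: one group per distinct School value.
grouped_single = df.groupby("School")
print(grouped_single.get_group("S_1"))

# Group by several columns: one group per (School, Class) pair.
grouped_mul = df.groupby(["School", "Class"])
print(grouped_mul.get_group(("S_2", "C_2")))

# size() gives the number of rows in each group; ngroups the number of groups.
print(grouped_single.size())
print(grouped_mul.size())
print(grouped_single.ngroups, grouped_mul.ngroups)
```

Nothing is computed when groupby is called; the object only records how rows map to groups, and the work happens when a method such as size() or get_group() is invoked.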
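Iterating over groups
for name, group in grouped_single:
    # name is the group key; group is the sub-DataFrame for that key
    print(name)
    display(group.head())  # display is available in Jupyter/IPython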
The level parameter (for multi-level indexes) and the axis parameter
df.set_index(['Gender','School']).groupby(level=1,axis=0).get_group('S_1').head()
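As a minimal self-contained sketch of grouping on an index level, the example below builds a tiny made-up DataFrame (the values are illustrative only) and groups on the second level of its MultiIndex. The axis argument is left out on purpose: axis=0 is the default, and recent pandas releases appear to deprecate the keyword, so omitting it is the safer assumption.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the notes.
df = pd.DataFrame({
    "Gender": ["M", "F", "M", "F"],
    "School": ["S_1", "S_1", "S_2", "S_2"],
    "Math": [34.0, 32.5, 84.8, 97.0],
})

# Move Gender and School into a two-level index.
indexed = df.set_index(["Gender", "School"])

# level=1 refers to the second index level, here School.
s1_rows = indexed.groupby(level=1).get_group("S_1")
print(s1_rows)

# The level can also be given by name instead of position.
mean_by_school = indexed.groupby(level="School")["Math"].mean()
print(mean_by_school)
```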
The groupby object exposes the following attributes and methods:
['Address', 'Class', 'Gender', 'Height', 'Math', 'Physics', 'School', 'Weight', 'agg', 'aggregate', 'all', 'any', 'apply', 'backfill', 'bfill', 'boxplot', 'corr', 'corrwith', 'count', 'cov', 'cumcount', 'cummax', 'cummin', 'cumprod', 'cumsum', 'describe', 'diff', 'dtypes', 'expanding', 'ffill', 'fillna', 'filter', 'first', 'get_group', 'groups', 'head', 'hist', 'idxmax', 'idxmin', 'indices', 'last', 'mad', 'max', 'mean', 'median', 'min', 'ndim', 'ngroup', 'ngroups', 'nth', 'nunique', 'ohlc', 'pad', 'pct_change', 'pipe', 'plot', 'prod', 'quantile', 'rank', 'resample', 'rolling', 'sem', 'shift', 'size', 'skew', 'std', 'sum', 'tail', 'take', 'transform', 'tshift', 'var']
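Aggregation, filtering, and transformation
Common aggregation functions
group_m = grouped_single['Math']
# sem (standard error of the mean) equals std / sqrt(count) within each group
group_m.std().values/np.sqrt(group_m.count().values) == group_m.sem().values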
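The identity checked above can be reproduced on a small made-up dataset, as in this sketch. It swaps the exact == comparison for np.allclose, on the assumption that a tolerance-based check is more robust to floating-point rounding; the data values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names follow the notes.
df = pd.DataFrame({
    "School": ["S_1", "S_1", "S_1", "S_2", "S_2", "S_2"],
    "Math": [34.0, 32.5, 87.2, 80.4, 84.8, 97.0],
})

group_m = df.groupby("School")["Math"]

# Standard error of the mean computed by hand: sample std divided by sqrt(n).
manual_sem = group_m.std().values / np.sqrt(group_m.count().values)

# Compare against the built-in sem() with a floating-point tolerance.
matches = np.allclose(manual_sem, group_m.sem().values)
print(matches)
```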
Using several aggregation functions at once
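A minimal sketch of the common ways to request several aggregations in one call, using made-up data. The result names total and average are chosen here for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the notes.
df = pd.DataFrame({
    "School": ["S_1", "S_1", "S_1", "S_2", "S_2", "S_2"],
    "Height": [173, 192, 186, 167, 159, 188],
    "Math": [34.0, 32.5, 87.2, 80.4, 84.8, 97.0],
})
grouped = df.groupby("School")

# A list of function names yields one output column per function.
multi = grouped["Math"].agg(["sum", "mean", "std"])
print(multi)

# Keyword arguments rename the output columns (named aggregation).
named = grouped["Math"].agg(total="sum", average="mean")
print(named)

# A dict applies different functions to different columns.
per_column = grouped.agg({"Math": ["mean", "max"], "Height": "var"})
print(per_column)
```

The dict form returns two-level column labels such as ('Math', 'mean'), which is worth keeping in mind when selecting columns from the result.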
Filtering (Filtration)
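Filtering keeps or drops whole groups according to a function that returns a single boolean per group. The sketch below uses made-up data and an arbitrary threshold of 60, both assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the notes.
df = pd.DataFrame({
    "School": ["S_1", "S_1", "S_1", "S_2", "S_2", "S_2"],
    "Math": [34.0, 32.5, 87.2, 80.4, 84.8, 97.0],
})

# Keep only the schools whose mean Math score exceeds 60.
# The function receives each group as a DataFrame and must return True or False.
kept = df.groupby("School").filter(lambda g: g["Math"].mean() > 60)
print(kept)
```

The result contains the original rows of the surviving groups, not one row per group, so it has the same columns as df.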
Transformation
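A transformation returns a result with the same length and index as the input, computed group by group. This sketch, on made-up data, centers Math scores within each school and also broadcasts each group's mean back to its rows.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the notes.
df = pd.DataFrame({
    "School": ["S_1", "S_1", "S_1", "S_2", "S_2", "S_2"],
    "Math": [34.0, 32.5, 87.2, 80.4, 84.8, 97.0],
})
math_by_school = df.groupby("School")["Math"]

# Subtract each group's own mean from its members.
centered = math_by_school.transform(lambda s: s - s.mean())
print(centered)

# A function name broadcasts the per-group value to every row of the group.
group_mean = math_by_school.transform("mean")
print(group_mean)
```

Because the output is aligned with df, it can be assigned straight back as a new column, for example df["Math_centered"] = centered.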
The apply function
Flexibility of apply
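One way to illustrate this flexibility is that the function passed to apply may return a scalar, a Series, or a DataFrame, and the shape of the combined result changes accordingly. The sketch below uses made-up data and selects the value columns explicitly, an assumption made so that the grouping column is not passed to the function.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the notes.
df = pd.DataFrame({
    "School": ["S_1", "S_1", "S_1", "S_2", "S_2", "S_2"],
    "Height": [173, 192, 186, 167, 159, 188],
    "Math": [34.0, 32.5, 87.2, 80.4, 84.8, 97.0],
})
cols = df.groupby("School")[["Math", "Height"]]

# Returning a scalar gives a Series indexed by group key.
scalar_result = cols.apply(lambda g: g["Math"].max() - g["Math"].min())
print(scalar_result)

# Returning a Series gives a DataFrame with one row per group.
series_result = cols.apply(lambda g: g.max() - g.min())
print(series_result)

# Returning a DataFrame gives the per-group frames stacked together.
frame_result = cols.apply(lambda g: g - g.mean())
print(frame_result)
```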
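Computing several statistics at once with apply
'''
from collections import OrderedDict
import pandas as pd

def f(df):
    # Collect several per-group statistics into one labeled Series
    data = OrderedDict()
    data['M_sum'] = df['Math'].sum()
    data['W_var'] = df['Weight'].var()
    data['H_mean'] = df['Height'].mean()
    return pd.Series(data)
'''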
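A function like f is meant to be passed to apply on a grouped object. The sketch below shows one possible usage on made-up data; it redefines f with a plain dict (which preserves insertion order in current Python versions) so that the example is self-contained, and compares the result with named aggregation as an alternative.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the notes.
df = pd.DataFrame({
    "School": ["S_1", "S_1", "S_1", "S_2", "S_2", "S_2"],
    "Height": [173, 192, 186, 167, 159, 188],
    "Weight": [63, 73, 82, 81, 64, 68],
    "Math": [34.0, 32.5, 87.2, 80.4, 84.8, 97.0],
})

def f(group):
    # Same statistics as in the notes, collected into one labeled Series.
    return pd.Series({
        "M_sum": group["Math"].sum(),
        "W_var": group["Weight"].var(),
        "H_mean": group["Height"].mean(),
    })

# Each returned Series becomes one row of the result, indexed by School.
stats = df.groupby("School")[["Math", "Weight", "Height"]].apply(f)
print(stats)

# The same table via named aggregation, without a custom function.
named = df.groupby("School").agg(
    M_sum=("Math", "sum"),
    W_var=("Weight", "var"),
    H_mean=("Height", "mean"),
)
print(named)
```

The apply version is the more general of the two, since f can compute statistics that combine several columns, while named aggregation is limited to one column per output.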