Filtering
grouped_single[['Math','Physics']].filter(lambda x:(x['Math']>32).all()).head()
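The sketch below makes the filter step self-contained. It shows that filter calls the lambda once per group with that group's sub-DataFrame, then keeps or drops all of the group's rows based on the single True/False the lambda returns. The toy DataFrame is hypothetical, and grouped_single is assumed to be df.groupby('School'), which the column names suggest but this section does not show.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the example above
df = pd.DataFrame({
    "School":  ["S_1", "S_1", "S_2", "S_2", "S_3"],
    "Math":    [85.0, 40.0, 30.0, 90.0, 70.0],
    "Physics": ["A", "B", "C", "A", "B"],
})
# Assumption: grouped_single groups by School
grouped_single = df.groupby("School")

# A group is kept only if every Math score in it exceeds 32:
# S_1 (85, 40) -> kept, S_2 (30, 90) -> dropped, S_3 (70) -> kept
kept = grouped_single[["Math", "Physics"]].filter(lambda x: (x["Math"] > 32).all())
print(kept)
```

Because the decision is made per group, the row with Math = 90 in S_2 is dropped along with the failing row, and the original row index is preserved.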
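Transformation
(a) What the function receives
In transform, the function is applied to the columns within each group, and the value it returns must have exactly the same length as that column (pandas may also try calling it once on a group's whole sub-DataFrame as a faster path, so the function should behave the same either way).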
grouped_single[['Math','Height']].transform(lambda x:x-x.min()).head()
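A minimal self-contained sketch of the same-length case, on hypothetical toy data with grouped_single assumed to be df.groupby('School'). Each row ends up holding its distance from its own group's minimum, and the result is aligned to the original row index rather than collapsed to one row per group.

```python
import pandas as pd

# Hypothetical toy data; grouped_single is assumed to group by School
df = pd.DataFrame({
    "School": ["S_1", "S_1", "S_2", "S_2", "S_2"],
    "Math":   [80.0, 60.0, 50.0, 70.0, 90.0],
    "Height": [170.0, 160.0, 150.0, 155.0, 165.0],
})
grouped_single = df.groupby("School")

# The lambda returns one value per input row, so the output keeps df's shape
shifted = grouped_single[["Math", "Height"]].transform(lambda x: x - x.min())
print(shifted)
```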
If the function returns a scalar instead, that value is broadcast to every element of the group.
grouped_single[['Math','Height']].transform(lambda x:x.mean()).head()
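A small sketch of the broadcasting behaviour, again on hypothetical data with grouped_single assumed to be df.groupby('School'). Every row receives its group's mean, so the output has as many rows as df. It is compared against the built-in string alias "mean", which produces the same values through pandas' optimized path.

```python
import pandas as pd

# Hypothetical toy data with round group means
df = pd.DataFrame({
    "School": ["S_1", "S_1", "S_2", "S_2", "S_2"],
    "Math":   [80.0, 60.0, 40.0, 50.0, 90.0],
    "Height": [170.0, 160.0, 150.0, 160.0, 170.0],
})
grouped_single = df.groupby("School")

# The lambda reduces each group to one value per column; transform repeats it per row
group_means = grouped_single[["Math", "Height"]].transform(lambda x: x.mean())
print(group_means)

# Same numbers via the built-in aggregation name
print(grouped_single["Math"].transform("mean").tolist())
```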
(b) Standardizing within groups with transform
grouped_single[['Math','Height']].transform(lambda x:(x-x.mean())/x.std()).head()
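The lambda computes z = (x - group mean) / group std for every value. The check below is a sketch on hypothetical data that uses a single column for brevity. After the transform, each group should have mean 0 and standard deviation 1. Note that Series.std() defaults to ddof=1, the sample standard deviation.

```python
import pandas as pd

# Hypothetical toy data; grouped_single is assumed to group by School
df = pd.DataFrame({
    "School": ["S_1", "S_1", "S_1", "S_2", "S_2", "S_2"],
    "Math":   [60.0, 70.0, 80.0, 40.0, 50.0, 90.0],
})
grouped_single = df.groupby("School")

# Per-group z-score; x.std() uses ddof=1 by default
z = grouped_single["Math"].transform(lambda x: (x - x.mean()) / x.std())

# Verify: each group now has (numerically) mean 0 and sample std 1
summary = z.groupby(df["School"]).agg(["mean", "std"])
print(summary.round(6))
```

For S_1 the mean is 70 and the sample standard deviation is 10, so its scores map to -1, 0 and 1.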
(c) Filling missing values with the group mean using transform
df_nan = df[['Math','School']].copy().reset_index()
df_nan.loc[np.random.randint(0,df.shape[0],25),['Math']]=np.nan
df_nan.head()
     ID  Math School
0  1101   NaN    S_1
1  1102   NaN    S_1
2  1103  87.2    S_1
3  1104  80.4    S_1
4  1105   NaN    S_1
df_nan.groupby('School').transform(lambda x: x.fillna(x.mean())).join(df.reset_index()['School']).head()
     ID       Math School
0  1101  68.214286    S_1
1  1102  68.214286    S_1
2  1103  87.200000    S_1
3  1104  80.400000    S_1
4  1105  68.214286    S_1
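A slightly more direct variant, sketched here on a hypothetical df_nan: select only the Math column before calling transform. This avoids running the lambda over ID and makes the join to restore School unnecessary. Because mean() skips NaN by default, each gap is filled with the mean of the non-missing scores in the same school.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two missing Math scores
df_nan = pd.DataFrame({
    "ID":     [1101, 1102, 1103, 2101, 2102, 2103],
    "Math":   [np.nan, 60.0, 80.0, 40.0, np.nan, 50.0],
    "School": ["S_1", "S_1", "S_1", "S_2", "S_2", "S_2"],
})

# Fill only the Math column; School stays in place, so no join is needed
filled = df_nan.copy()
filled["Math"] = df_nan.groupby("School")["Math"].transform(lambda x: x.fillna(x.mean()))
print(filled)
```

Here the S_1 gap becomes 70 (the mean of 60 and 80) and the S_2 gap becomes 45 (the mean of 40 and 50).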