Of all the groupby functions, apply is probably the most widely used, thanks to its flexibility:
As for the input, the printed output below shows that apply receives each group as a sub-table (a DataFrame):
df.groupby('School').apply(lambda x:print(x.head(1)))
     School Class Gender   Address  Height  Weight  Math Physics
ID
1101    S_1   C_1      M  street_1     173      63  34.0      A+
     School Class Gender   Address  Height  Weight  Math Physics
ID
2101    S_2   C_1      M  street_7     174      84  83.3       C
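The same point can be checked without printing. The sketch below uses a small made-up table (the values and the helper name `record` are assumptions for illustration, not the tutorial's data) and records the type and shape of each object that apply hands to the function. The columns are selected after `groupby` so that the behaviour does not depend on how a given pandas version treats the grouping column.

```python
import pandas as pd

# Hypothetical toy data; the values are made up for illustration.
df = pd.DataFrame({
    'School': ['S_1', 'S_1', 'S_2'],
    'Math': [34.0, 32.5, 83.5],
    'Height': [173, 192, 174],
})

seen = []  # (type name, shape) of every object passed to the function

def record(group):
    # `group` is the sub-table for one school
    seen.append((type(group).__name__, group.shape))
    return len(group)

# One scalar per group comes back as a Series indexed by School
sizes = df.groupby('School')[['Math', 'Height']].apply(record)
```

Every entry in `seen` should be a `DataFrame`, one per school. Some older pandas versions may call the function an extra time on the first group, so the length of `seen` is not a reliable count of groups.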
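Much of apply's flexibility comes from the variety of values the applied function may return.
① Scalar return values (one value per column for each group)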
In [37]:
df[['School','Math','Height']].groupby('School').apply(lambda x:x.max())
Out[37]:
       School  Math  Height
School
S_1       S_1  97.0     195
S_2       S_2  95.5     194
② List-like return values (one value per row of each group)
In [38]:
df[['School','Math','Height']].groupby('School').apply(lambda x:x-x.min()).head()
Out[38]:
      Math  Height
ID
1101   2.5    14.0
1102   1.0    33.0
1103  55.7    27.0
1104  48.9     8.0
1105  53.3     0.0
③ DataFrame return values
In [39]:
df[['School','Math','Height']].groupby('School')\
    .apply(lambda x:pd.DataFrame({'col1':x['Math']-x['Math'].max(),
                                  'col2':x['Math']-x['Math'].min(),
                                  'col3':x['Height']-x['Height'].max(),
                                  'col4':x['Height']-x['Height'].min()})).head()
Out[39]:
      col1  col2  col3  col4
ID
1101 -63.0   2.5   -22    14
1102 -64.5   1.0    -3    33
1103  -9.8  55.7    -9    27
1104 -16.6  48.9   -28     8
1105 -12.2  53.3   -36     0
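To compare the result shapes side by side, here is a minimal sketch on a made-up four-row table (the data and variable names are assumptions for illustration). It selects the numeric column after `groupby`, and passes `group_keys=False` for the row-preserving case so that the result keeps the original `ID` index rather than gaining a group level in newer pandas versions.

```python
import pandas as pd

# Hypothetical toy data; the values are made up for illustration.
df = pd.DataFrame(
    {'School': ['S_1', 'S_1', 'S_2', 'S_2'],
     'Math': [34.0, 32.5, 83.5, 58.5]},
    index=pd.Index([1101, 1102, 2101, 2102], name='ID'),
)
g = df.groupby('School')[['Math']]
g_flat = df.groupby('School', group_keys=False)[['Math']]

# A single scalar per group: Series indexed by School
scalar_out = g.apply(lambda x: x['Math'].max())

# A Series per group (one scalar per column): DataFrame with one row per group
series_out = g.apply(lambda x: x.max())

# A DataFrame per group with the group's own rows: one row per original row
frame_out = g_flat.apply(lambda x: x - x.min())
```

Here `scalar_out` has two entries, `series_out` has shape (2, 1), and `frame_out` has shape (4, 1), matching the three cases above.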
- Computing several statistics at once with apply
Here, an OrderedDict offers a quick way to collect the statistics:
In [40]:
from collections import OrderedDict
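def f(df):
    # Collect the statistics in insertion order; the keys become the output columns
    data = OrderedDict()
    data['M_sum'] = df['Math'].sum()
    data['W_var'] = df['Weight'].var()
    data['H_mean'] = df['Height'].mean()
    return pd.Series(data)
# grouped_single is the groupby object created earlier (not shown in this section)
grouped_single.apply(f)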
Out[40]:
         M_sum       W_var      H_mean
School
S_1      956.2  117.428571  175.733333
S_2     1191.1  181.081579  172.950000
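As a self-contained variant of the pattern above, the sketch below runs on made-up data (the values and the name `summarize` are assumptions for illustration). It uses a plain dict, which preserves insertion order in Python 3.7 and later, and compares the result with pandas named aggregation, which is an alternative technique and not the one used in this section.

```python
import pandas as pd

# Hypothetical toy data; the values are made up for illustration.
df = pd.DataFrame({
    'School': ['S_1', 'S_1', 'S_2', 'S_2'],
    'Math': [30.0, 40.0, 80.0, 60.0],
    'Weight': [60, 64, 80, 70],
    'Height': [170, 180, 174, 166],
})

def summarize(group):
    # Each key becomes a column in the result; one row per group
    return pd.Series({
        'M_sum': group['Math'].sum(),
        'W_var': group['Weight'].var(),
        'H_mean': group['Height'].mean(),
    })

via_apply = df.groupby('School')[['Math', 'Weight', 'Height']].apply(summarize)

# Alternative: named aggregation, output_name=(column, function)
via_agg = df.groupby('School').agg(
    M_sum=('Math', 'sum'),
    W_var=('Weight', 'var'),
    H_mean=('Height', 'mean'),
)
```

Both results have one row per school and the columns `M_sum`, `W_var`, `H_mean`. The apply version is more flexible (a statistic may combine several columns), while named aggregation is usually the more direct choice when each statistic depends on a single column.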