key1  key2
a     one     2
      two     1
d     one     1
      two     1
dtype: int64
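These counts give the number of rows in each `(key1, key2)` group. The call that produced them is not shown here, but `GroupBy.size()` returns output of this shape. The minimal sketch below rebuilds `df` from the values printed later in this section, which are rounded to six decimals:

```python
import pandas as pd

# Rebuild the example frame from its printed (rounded) values
df = pd.DataFrame({
    'data1': [0.398171, 1.406440, 0.842236, -0.377231, -0.525386],
    'data2': [0.618838, 0.007411, 0.090966, 0.431523, -1.980548],
    'key1': ['a', 'a', 'd', 'd', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
})

# size() counts rows per group and returns a Series
# indexed by a (key1, key2) MultiIndex
sizes = df.groupby(['key1', 'key2']).size()
print(sizes)
```

Unlike `count()`, which counts non-null values column by column, `size()` counts rows, so missing values do not lower it.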
Iterating over groups
for name, group in df.groupby('key1'):
    print(name)   # the group key, e.g. 'a'
    print(group)  # the rows of df belonging to that key
a
data1 data2 key1 key2
0 0.398171 0.618838 a one
1 1.406440 0.007411 a two
4 -0.525386 -1.980548 a one
d
data1 data2 key1 key2
2 0.842236 0.090966 d one
3 -0.377231 0.431523 d two
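Iteration yields `(name, group)` pairs, so the pieces can be collected into a dict and looked up by key later. The short sketch below uses the same `df`, rebuilt from the printed values:

```python
import pandas as pd

df = pd.DataFrame({
    'data1': [0.398171, 1.406440, 0.842236, -0.377231, -0.525386],
    'data2': [0.618838, 0.007411, 0.090966, 0.431523, -1.980548],
    'key1': ['a', 'a', 'd', 'd', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
})

# Materialize each group as its own DataFrame, keyed by the group name
pieces = {name: group for name, group in df.groupby('key1')}
print(pieces['d'])
```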
for name, group in df.groupby(['key1', 'key2']):
    print(name)   # with several keys the name is a tuple, e.g. ('a', 'one')
    print(group)
('a', 'one')
data1 data2 key1 key2
0 0.398171 0.618838 a one
4 -0.525386 -1.980548 a one
('a', 'two')
data1 data2 key1 key2
1 1.40644 0.007411 a two
('d', 'one')
data1 data2 key1 key2
2 0.842236 0.090966 d one
('d', 'two')
data1 data2 key1 key2
3 -0.377231 0.431523 d two
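With several keys each group name is a tuple, so it can be unpacked directly in the `for` statement. The sketch below uses the same rebuilt `df`:

```python
import pandas as pd

df = pd.DataFrame({
    'data1': [0.398171, 1.406440, 0.842236, -0.377231, -0.525386],
    'data2': [0.618838, 0.007411, 0.090966, 0.431523, -1.980548],
    'key1': ['a', 'a', 'd', 'd', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
})

summary = []
for (k1, k2), group in df.groupby(['key1', 'key2']):
    # k1 and k2 are the two parts of the tuple key, e.g. 'a' and 'one'
    print(k1, k2, len(group))
    summary.append((k1, k2, len(group)))
```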
float64
data1 data2
0 0.398171 0.618838
1 1.406440 0.007411
2 0.842236 0.090966
3 -0.377231 0.431523
4 -0.525386 -1.980548
object
key1 key2
0 a one
1 a two
2 d one
3 d two
4 a one
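The `float64` / `object` listing above splits the columns, not the rows, by dtype; the code that produced it is not shown. A common way to get it is `df.groupby(df.dtypes, axis=1)`, but pandas 2.1 deprecated `axis=1` in `groupby`. The sketch below avoids it: it groups the `df.dtypes` Series (column label to dtype) by dtype name and then selects each set of columns. The exact name of the string group may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    'data1': [0.398171, 1.406440, 0.842236, -0.377231, -0.525386],
    'data2': [0.618838, 0.007411, 0.090966, 0.431523, -1.980548],
    'key1': ['a', 'a', 'd', 'd', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
})

col_groups = {}
# df.dtypes maps each column label to its dtype; group it by the dtype's name
for dtype_name, col_dtypes in df.dtypes.groupby(df.dtypes.astype(str)):
    cols = list(col_dtypes.index)  # column labels sharing this dtype
    col_groups[dtype_name] = cols
    print(dtype_name)
    print(df[cols])
```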
Selecting a column or a subset of columns
df.groupby(['key1','key2'])[['data2']].mean()
           data2
key1 key2
a    one  -0.680855
     two   0.007411
d    one   0.090966
     two   0.431523
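A list of several column names keeps those columns and drops the others. The grouping keys move into the index. The sketch below uses the same rebuilt `df` and selects both data columns:

```python
import pandas as pd

df = pd.DataFrame({
    'data1': [0.398171, 1.406440, 0.842236, -0.377231, -0.525386],
    'data2': [0.618838, 0.007411, 0.090966, 0.431523, -1.980548],
    'key1': ['a', 'a', 'd', 'd', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
})

# A list of labels returns a DataFrame with one column per selected label
means = df.groupby('key1')[['data1', 'data2']].mean()
print(means)
```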
s_grouped = df.groupby(['key1','key2'])['data2']
s_grouped
<pandas.core.groupby.generic.SeriesGroupBy object at 0x7f3f1974de10>
s_grouped.mean()
key1  key2
a     one    -0.680855
      two     0.007411
d     one     0.090966
      two     0.431523
Name: data2, dtype: float64
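Indexing a GroupBy with a single column name is shorthand for grouping that column directly by the key columns. The sketch below, on the same rebuilt `df`, checks that the two spellings give the same result:

```python
import pandas as pd

df = pd.DataFrame({
    'data1': [0.398171, 1.406440, 0.842236, -0.377231, -0.525386],
    'data2': [0.618838, 0.007411, 0.090966, 0.431523, -1.980548],
    'key1': ['a', 'a', 'd', 'd', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
})

# Short form: select the column from the GroupBy
s1 = df.groupby(['key1', 'key2'])['data2'].mean()
# Long form: group the column itself by the key Series
s2 = df['data2'].groupby([df['key1'], df['key2']]).mean()
print(s1)
```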
Grouping with dicts and Series
people = pd.DataFrame(np.random.randn(5, 5),
                      columns=['a', 'b', 'c', 'd', 'e'],
                      index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
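A dict or Series that maps labels to group names can serve as the grouping key. The sketch below makes a few assumptions. `mapping` is a hypothetical column-to-colour dict. `people` is replaced by a deterministic frame so the sums can be checked. The frame is transposed so its column labels become the row index that `groupby` maps over:

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for `people` (the original uses random data)
people = pd.DataFrame(np.arange(25, dtype=float).reshape(5, 5),
                      columns=['a', 'b', 'c', 'd', 'e'],
                      index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])

# Hypothetical mapping from column label to group name
mapping = {'a': 'red', 'b': 'red', 'c': 'blue', 'd': 'blue', 'e': 'red'}

# Transpose, group the column labels through the dict, sum each group,
# then transpose back so there is one column per colour
by_column = people.T.groupby(mapping).sum().T
print(by_column)

# A Series holding the same label -> group mapping works the same way
by_series = people.T.groupby(pd.Series(mapping)).sum().T
```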