1 Grouping data - groupby()
from pandas import DataFrame, Series
import pandas as pd
import numpy as np
from numpy import nan as NA
df = DataFrame({
    "key1": list("aabba"),
    "key2": ["one", "two", "one", "two", "one"],
    "data1": np.random.randn(5),
    "data2": np.random.randn(5)})
df
      data1     data2 key1 key2
0  0.008908  0.652712    a  one
1  0.438874  0.423774    a  two
2  0.299105 -1.279888    b  one
3 -0.191032  0.429504    b  two
4 -0.395208  0.523417    a  one
grouped = df["data1"].groupby(df["key1"])
grouped.count()
key1
a 3
b 2
Name: data1, dtype: int64
grouped.max()
key1
a 0.438874
b 0.299105
Name: data1, dtype: float64
grouped.size()
key1
a 3
b 2
Name: data1, dtype: int64
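count() and size() return the same numbers above only because data1 has no missing values. The following is a minimal sketch of how they differ, using a hypothetical frame `demo` (not the df above) that contains one NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical frame for illustration: one missing value in group "a".
demo = pd.DataFrame({
    "key1": list("aabba"),
    "data1": [1.0, np.nan, 3.0, 4.0, 5.0],
})
demo_grouped = demo["data1"].groupby(demo["key1"])

counts = demo_grouped.count()  # non-NaN values per group
sizes = demo_grouped.size()    # all rows per group, NaN included

print(counts)
print(sizes)
```

Here group "a" has three rows but only two non-missing values, so the two results differ for that group.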
grouped.mean()
key1
a 0.017525
b 0.054036
Name: data1, dtype: float64
grouped.describe()
      count      mean       std       min       25%       50%       75%       max
key1
a       3.0  0.017525  0.417108 -0.395208 -0.193150  0.008908  0.223891  0.438874
b       2.0  0.054036  0.346579 -0.191032 -0.068498  0.054036  0.176571  0.299105
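When only a few of these statistics are needed, the separate count(), max() and mean() calls can be combined into one agg() call. This is a small sketch on a hypothetical frame `demo` with fixed values, so the numbers differ from the random df above:

```python
import pandas as pd

# Hypothetical frame for illustration, with fixed values.
demo = pd.DataFrame({
    "key1": list("aabba"),
    "data1": [1.0, 2.0, 3.0, 4.0, 6.0],
})
demo_grouped = demo["data1"].groupby(demo["key1"])

# One call computes several statistics; each becomes a column.
summary = demo_grouped.agg(["count", "max", "mean"])
print(summary)
```

The result is a DataFrame indexed by key1 with one column per requested statistic.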
1.1 Two ways to group by multiple columns
grouped1 = df["data1"].groupby([df["key1"], df["key2"]])
grouped1.mean()
key1  key2
a     one    -0.193150
      two     0.438874
b     one     0.299105
      two    -0.191032
Name: data1, dtype: float64
group_all = df.groupby(["key1", "key2"])
group_all["data1"].mean()
key1  key2
a     one    -0.193150
      two     0.438874
b     one     0.299105
      two    -0.191032
Name: data1, dtype: float64
group_all["data2"].mean()
key1  key2
a     one     0.588065
      two     0.423774
b     one    -1.279888
      two     0.429504
Name: data2, dtype: float64
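The two methods give the same result for data1. The sketch below checks this on a hypothetical seeded frame `demo`, so it can be rerun with identical values. The names `by_series` and `by_names` are illustrative only:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the sketch is repeatable
demo = pd.DataFrame({
    "key1": list("aabba"),
    "key2": ["one", "two", "one", "two", "one"],
    "data1": rng.standard_normal(5),
    "data2": rng.standard_normal(5),
})

# Method 1: select the column first, then pass the key Series.
by_series = demo["data1"].groupby([demo["key1"], demo["key2"]]).mean()

# Method 2: group the whole frame by column names, then select the column.
by_names = demo.groupby(["key1", "key2"])["data1"].mean()

same = by_series.equals(by_names)
print(same)
```

Method 2 is usually more convenient when several columns need to be aggregated, because the grouped object can be reused for each column.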
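# Pass the column name as a plain string: with a one-element list,
# pandas 2.0+ yields tuple keys such as ('a',) when iterating.
group_key1 = df.groupby("key1")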
1.2 Iterating over groups
for name, group in group_key1:
    print("Group:", name)
    print("Data:\n", group)
Group: a
Data:
       data1     data2 key1 key2
0  0.008908  0.652712    a  one
1  0.438874  0.423774    a  two
4 -0.395208  0.523417    a  one
Group: b
Data:
       data1     data2 key1 key2
2  0.299105 -1.279888    b  one
3 -0.191032  0.429504    b  two
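Because iteration yields (name, sub-DataFrame) pairs, the groups can also be collected into a dict, or a single group can be fetched with get_group(). This is a minimal sketch on a hypothetical frame `demo`:

```python
import pandas as pd

# Hypothetical frame for illustration, with fixed values.
demo = pd.DataFrame({
    "key1": list("aabba"),
    "data1": [1.0, 2.0, 3.0, 4.0, 5.0],
})
demo_groups = demo.groupby("key1")

# Materialize all groups as {group name: sub-DataFrame}.
pieces = {name: group for name, group in demo_groups}
print(list(pieces))

# Fetch one group by its name without iterating.
group_b = demo_groups.get_group("b")
print(group_b)
```

Each sub-DataFrame keeps the original row labels, so rows can still be traced back to the source frame.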
1.3 Grouping columns by data type
df.dtypes
data1 float64
data2 float64
key1 object
key2 object
dtype: object
# Note: the axis=1 argument of DataFrame.groupby is deprecated as of pandas 2.1.
group_types = df.groupby(df.dtypes, axis=1)
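On pandas versions where axis=1 is deprecated or unavailable, one possible alternative is to group the column names by their dtype and then select the columns of each group. This is a sketch under that assumption, not the approach used in this notebook. The frame `demo` and the names `dtype_labels` and `by_dtype` are hypothetical:

```python
import pandas as pd

# Hypothetical frame for illustration, with fixed values.
demo = pd.DataFrame({
    "key1": list("aabba"),
    "key2": ["one", "two", "one", "two", "one"],
    "data1": [0.1, 0.2, 0.3, 0.4, 0.5],
    "data2": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Series mapping each column name to its dtype, written as a string label.
dtype_labels = demo.dtypes.astype(str)

# Group the column names by dtype label, then select those columns.
by_dtype = {
    label: demo[list(cols.index)]
    for label, cols in dtype_labels.groupby(dtype_labels)
}

for label, part in by_dtype.items():
    print(label, list(part.columns))
```

The label of the string columns depends on the pandas version, so the sketch only relies on the numeric columns sharing the "float64" label.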