Data Aggregation and Group Operations

GroupBy Mechanics

import numpy as np
import pandas as pd
df = pd.DataFrame({
    'key1': ['a', 'a', 'd', 'd', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
    'data1': np.random.randn(5),
    'data2': np.random.randn(5),
})
df
      data1     data2 key1 key2
0  0.398171  0.618838    a  one
1  1.406440  0.007411    a  two
2  0.842236  0.090966    d  one
3 -0.377231  0.431523    d  two
4 -0.525386 -1.980548    a  one
# Compute the mean of the data1 column, grouped by the key1 labels
grouped = df['data1'].groupby(df['key1'])
grouped
<pandas.core.groupby.generic.SeriesGroupBy object at 0x7f3f19fbc438>
grouped.mean()
key1
a    0.426408
d    0.232502
Name: data1, dtype: float64
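The call above follows the split-apply-combine pattern: the values are split by key, a function is applied to each piece, and the results are combined into one object. The sketch below spells those steps out by hand and compares them with groupby. It uses small fixed values (an assumption for illustration, not the random data shown above) so the result is reproducible.

```python
import pandas as pd

# Small fixed frame (illustrative values, not the random data above)
df = pd.DataFrame({
    'key1': ['a', 'a', 'd', 'd', 'a'],
    'data1': [1.0, 2.0, 3.0, 5.0, 6.0],
})

# Split: select the data1 values for each distinct key1 label.
# Apply: take the mean of each piece.
# Combine: collect the per-key results in a dict.
manual = {
    key: df.loc[df['key1'] == key, 'data1'].mean()
    for key in df['key1'].unique()
}

# groupby performs the same three steps in a single call
via_groupby = df['data1'].groupby(df['key1']).mean()

print(manual)
print(via_groupby)
```

The manual version scans the frame once per key, so it is only meant to show the idea; groupby is the practical choice.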
means = df['data1'].groupby([df['key1'], df['key2']]).mean()
means
key1  key2
a     one    -0.063608
      two     1.406440
d     one     0.842236
      two    -0.377231
Name: data1, dtype: float64
means.unstack()
key2       one       two
key1
a    -0.063608  1.406440
d     0.842236 -0.377231
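Grouping by two keys yields a Series with a two-level index, and unstack moves the inner level into the columns. A minimal sketch with fixed values (assumed for illustration) makes the reshaping easy to check by hand:

```python
import pandas as pd

# Fixed values chosen for illustration
df = pd.DataFrame({
    'key1': ['a', 'a', 'd', 'd', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
    'data1': [1.0, 2.0, 3.0, 5.0, 6.0],
})

# Series indexed by (key1, key2)
means = df['data1'].groupby([df['key1'], df['key2']]).mean()

# key2 becomes the column axis; key1 stays as the row index
wide = means.unstack()

print(means)
print(wide)
```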
states = np.array(['Ohio','California','California','Ohio','Ohio'])
years = np.array([2005,2005,2006,2005,2006])
df['data1'].groupby([states,years]).mean()
California  2005    1.406440
            2006    0.842236
Ohio        2005    0.010470
            2006   -0.525386
Name: data1, dtype: float64
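The group keys do not have to come from the DataFrame itself; any arrays of the right length will do. The sketch below repeats the idea with fixed values (an assumption for illustration) so each group mean can be verified by hand.

```python
import numpy as np
import pandas as pd

# Fixed values chosen for illustration
values = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

# External key arrays; each must have one entry per row of `values`
states = np.array(['Ohio', 'California', 'California', 'Ohio', 'Ohio'])
years = np.array([2005, 2005, 2006, 2005, 2006])

result = values.groupby([states, years]).mean()
print(result)

# Look up one group by its (state, year) pair
ohio_2005 = result.loc[('Ohio', 2005)]
print(ohio_2005)
```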
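# key2 is not numeric; numeric_only=True excludes it explicitly
# (older pandas versions dropped such columns silently)
df.groupby('key1').mean(numeric_only=True)
         data1     data2
key1
a     0.426408 -0.451433
d     0.232502  0.261245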
df.groupby(['key1','key2']).mean()
              data1     data2
key1 key2
a    one  -0.063608 -0.680855
     two   1.406440  0.007411
d    one   0.842236  0.090966
     two  -0.377231  0.431523
df.groupby(['key1','key2']).size()
key1  key2
a     one     2
      two     1
d     one     1
      two     1
dtype: int64
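size returns the number of rows in each group. It is easy to confuse with count, which counts only non-missing values. The sketch below contrasts the two on a small frame containing one NaN (values assumed for illustration).

```python
import numpy as np
import pandas as pd

# One missing value in group 'a' (illustrative data)
df = pd.DataFrame({
    'key1': ['a', 'a', 'd', 'd', 'a'],
    'data1': [1.0, np.nan, 3.0, 5.0, 6.0],
})

# size: rows per group, missing data values included
sizes = df.groupby('key1').size()

# count: non-missing data1 values per group
counts = df.groupby('key1')['data1'].count()

print(sizes)
print(counts)
```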
Iterating over groups
for name,group in df.groupby('key1'):
    print(name)
    print(group)
a
      data1     data2 key1 key2
0  0.398171  0.618838    a  one
1  1.406440  0.007411    a  two
4 -0.525386 -1.980548    a  one
d
      data1     data2 key1 key2
2  0.842236  0.090966    d  one
3 -0.377231  0.431523    d  two
for name,group in df.groupby(['key1','key2']):
    print(name)
    print(group)
('a', 'one')
      data1     data2 key1 key2
0  0.398171  0.618838    a  one
4 -0.525386 -1.980548    a  one
('a', 'two')
     data1     data2 key1 key2
1  1.40644  0.007411    a  two
('d', 'one')
      data1     data2 key1 key2
2  0.842236  0.090966    d  one
('d', 'two')
      data1     data2 key1 key2
3 -0.377231  0.431523    d  two
pieces = dict(list(df.groupby('key1')))
pieces['d']
      data1     data2 key1 key2
2  0.842236  0.090966    d  one
3 -0.377231  0.431523    d  two
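Building a dict works because iterating over a GroupBy object yields (name, group) pairs. When only one piece is needed, get_group fetches it directly. A minimal sketch with fixed values (assumed for illustration):

```python
import pandas as pd

# Fixed values chosen for illustration
df = pd.DataFrame({
    'key1': ['a', 'a', 'd', 'd', 'a'],
    'data1': [1.0, 2.0, 3.0, 5.0, 6.0],
})

grouped = df.groupby('key1')

# Each (name, group) pair becomes one dict entry
pieces = dict(list(grouped))

# get_group returns the rows of a single group without building the dict
d_rows = grouped.get_group('d')

print(sorted(pieces))
print(pieces['d'])
print(d_rows)
```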
df.dtypes
data1    float64
data2    float64
key1      object
key2      object
dtype: object
grouped = df.groupby(df.dtypes, axis=1)  # axis=1 groups the columns instead of the rows
for name,group in grouped:
    print(name)
    print(group)
float64
      data1     data2
0  0.398171  0.618838
1  1.406440  0.007411
2  0.842236  0.090966
3 -0.377231  0.431523
4 -0.525386 -1.980548
object
  key1 key2
0    a  one
1    a  two
2    d  one
3    d  two
4    a  one
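As far as I know, recent pandas releases deprecate the axis=1 argument of groupby, so the call above may emit a warning or fail depending on the installed version. The sketch below shows one possible way to get the same per-dtype pieces without it, using a plain loop over df.dtypes. The names cols_by_dtype and pieces are made up for this example, and the values are fixed for illustration.

```python
import pandas as pd

# Fixed values chosen for illustration
df = pd.DataFrame({
    'data1': [1.0, 2.0],
    'data2': [3.0, 4.0],
    'key1': ['a', 'd'],
    'key2': ['one', 'two'],
})

# Collect the column labels that share a dtype (hypothetical helper dict)
cols_by_dtype = {}
for col, dtype in df.dtypes.items():
    cols_by_dtype.setdefault(str(dtype), []).append(col)

# One sub-frame per dtype, analogous to iterating over the axis=1 groups
pieces = {name: df[cols] for name, cols in cols_by_dtype.items()}

for name, group in pieces.items():
    print(name)
    print(group)
```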
Selecting a column or a subset of columns
df.groupby(['key1','key2'])[['data2']].mean()
              data2
key1 key2
a    one  -0.680855
     two   0.007411
d    one   0.090966
     two   0.431523
s_grouped = df.groupby(['key1','key2'])['data2']
s_grouped
<pandas.core.groupby.generic.SeriesGroupBy object at 0x7f3f1974de10>
s_grouped.mean()
key1  key2
a     one    -0.680855
      two     0.007411
d     one     0.090966
      two     0.431523
Name: data2, dtype: float64
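Indexing a GroupBy object with a column name is shorthand for selecting the column first and then grouping it by the key columns. Indexing with a list of names returns a grouped DataFrame instead of a grouped Series. The sketch below checks both points on fixed values (assumed for illustration).

```python
import pandas as pd

# Fixed values chosen for illustration
df = pd.DataFrame({
    'key1': ['a', 'a', 'd', 'd', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
    'data2': [1.0, 2.0, 3.0, 5.0, 6.0],
})

# Shorthand: select the column on the GroupBy object
short = df.groupby(['key1', 'key2'])['data2'].mean()

# Long form: select the column, then group by the key Series
long_form = df['data2'].groupby([df['key1'], df['key2']]).mean()

# A list of names keeps the result two-dimensional
as_frame = df.groupby(['key1', 'key2'])[['data2']].mean()

print(short.equals(long_form))
print(type(short).__name__, type(as_frame).__name__)
```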
Grouping with dicts and Series
people = pd.DataFrame(np.random.randn(5,5),
                     columns = ['a','b','c','d','e'],
                     index = ['Joe','Steve','Wes','Jim','Travis'])
people
               a         b         c         d         e
Joe    -1.650753 -1.182232 -0.534644  0.344981 -0.747273
Steve   1.481961  1.266301 -0.758866  0.931459 -0.512757
Wes    -1.755591 -0.003535  0.910192 -0.187150 -0.603618
Jim     1.520320 -1.055722 -1.221894  0.741607  1.282918
Travis -0.271283  0.343674 -0.210378 -0.503580 -0.816606
people.iloc[2:3,[1,2]] = np.nan
people
               a         b         c         d         e
Joe    -1.650753 -1.182232 -0.534644  0.344981 -0.747273
Steve   1.481961  1.266301 -0.758866  0.931459 -0.512757
Wes    -1.755591       NaN       NaN -0.187150 -0.603618
Jim     1.520320 -1.055722 -1.221894  0.741607  1.282918
Travis -0.271283  0.343674 -0.210378 -0.503580 -0.816606
mapping = {
   'a':