Python数据处理——Pandas模块使用(五)

本文主要介绍pandas数据转换操作

1.分组
In [42]: df = DataFrame({'key1':['a','a','b','b','a'],'key2':['one','two','one','two','one'],'data1':np.random.randn(5)
    ...: ,'data2':np.random.randn(5)})

In [43]: df
Out[43]:
      data1     data2 key1 key2
0  0.762448  0.816634    a  one
1  1.412613  0.867923    a  two
2  0.899297 -1.049657    b  one
3  0.912080  0.628012    b  two
4 -0.549258 -1.327614    a  one

In [44]: grouped = df['data1'].groupby(df['key1'])    #按key1列分组,计算data1列的平均值
In [45]: grouped
Out[45]: <pandas.core.groupby.SeriesGroupBy object at 0x00000000073C97F0>

In [46]: grouped.mean()
Out[46]:
key1
a    0.541935
b    0.905688
Name: data1, dtype: float64


In [48]: df['data1'].groupby([df['key1'],df['key2']]).mean()
Out[48]:
key1  key2
a     one     0.106595
      two     1.412613
b     one     0.899297
      two     0.912080
Name: data1, dtype: float64

In [49]: df.groupby('key1').mean()    #根据列名分组
Out[49]:
         data1     data2
key1
a     0.541935  0.118981
b     0.905688 -0.210822

In [50]: df.groupby(['key1','key2']).mean()
Out[50]:
              data1     data2
key1 key2
a    one   0.106595 -0.255490
     two   1.412613  0.867923
b    one   0.899297 -1.049657
     two   0.912080  0.628012

In [51]: df.groupby('key1')['data1'].mean()    #选取部分列进行聚合
Out[51]:
key1
a    0.541935
b    0.905688
Name: data1, dtype: float64

In [52]: df.groupby(['key1','key2'])['data1'].mean()
Out[52]:
key1  key2
a     one     0.106595
      two     1.412613
b     one     0.899297
      two     0.912080
Name: data1, dtype: float64

In [53]: people = DataFrame(np.random.randn(5,5),columns = ['a','b','c','d','e'],index = ['Joe','Steve','Wes','Jim','Tr
    ...: avis'])

In [54]: people
Out[54]:
               a         b         c         d         e
Joe     0.223628 -0.282831  0.368583  0.246665 -0.815742
Steve   0.662181  0.187961  0.515883 -2.021429 -0.624596
Wes    -1.009086  0.450082 -0.819855 -1.626971  0.632064
Jim     1.593881  0.803760 -0.209345 -1.295325 -0.553693
Travis -0.041911  1.115285 -1.648207  0.521751 -0.414183

In [55]: mapping = {'a':'red','b':'red','c':'blue','d':'blue','e':'red','f':'orange'}
In [56]: map_series = Series(mapping)
In [57]: map_series
Out[57]:
a       red
b       red
c      blue
d      blue
e       red
f    orange
dtype: object

In [58]: people.groupby(map_series,axis = 1).count()  #根据series分组
Out[58]:
        blue  red
Joe        2    3
Steve      2    3
Wes        2    3
Jim        2    3
Travis     2    3

In [59]: by_columns = people.groupby(mapping,axis =1)  #根据字典分组
In [60]: by_columns.sum()
Out[60]:
            blue       red
Joe     0.615248 -0.874945
Steve  -1.505546  0.225546
Wes    -2.446826  0.073060
Jim    -1.504670  1.843948
Travis -1.126456  0.659191

In [61]: people.groupby(len).sum()  #根据函数分组
Out[61]:
          a         b         c         d         e
3  0.808423  0.971012 -0.660617 -2.675632 -0.737371
5  0.662181  0.187961  0.515883 -2.021429 -0.624596
6 -0.041911  1.115285 -1.648207  0.521751 -0.414183

In [63]: columns = pd.MultiIndex.from_arrays([['US','US','US','JP','JP'],[1,3,5,1,3]],names = ['city','tennor'])
In [65]: df1 = DataFrame(np.random.randn(4,5),columns = columns)
In [66]: df1
Out[66]:
city          US                            JP
tennor         1         3         5         1         3
0       1.103548  1.087425  0.717741 -0.354419  1.294512
1      -0.247544 -1.247665  1.340309  1.337957  0.528693
2       2.168903 -0.124958  0.367158  0.478355 -0.828126
3      -0.078540 -3.062132 -2.095675 -0.879590 -0.020314

In [67]: df1.groupby(level = 'city',axis = 1).count()   #根据索引级别分组
Out[67]:
city  JP  US
0      2   3
1      2   3
2      2   3
2.透视表

pandas为我们提供了实现数据透视表功能的函数pivot_table(),该函数参数说明如下:

参数说明
data需要进行透视的数据
value指定需要聚合的字段
index指定值为行索引
columns指定值为列索引
aggfunc聚合函数
fill_value常量替换缺失值,默认不替换
margins是否合并,默认否
dropna是否观测缺失值,默认是

举例:

In [68]: dic = {'Name':['LiuShunxiang','Zhangshan','ryan'],
    ...:    ...: 'Sex':['M','F','F'],
    ...:    ...: 'Age':[27,23,24],
    ...:    ...: 'Height':[165.7,167.2,154],
    ...:    ...: 'Weight':[61,63,41]}
    ...:
In [69]: student = pd.DataFrame(dic)
In [70]: student
Out[70]:
   Age  Height          Name Sex  Weight
0   27   165.7  LiuShunxiang   M      61
1   23   167.2     Zhangshan   F      63
2   24   154.0          ryan   F      41

In [71]: pd.pivot_table(student,values = ['Height'],columns = ['Sex'])   #'Height'作为数值变量,'Sex'作为分组变量
Out[71]:
Sex         F      M
Height  160.6  165.7

In [72]: pd.pivot_table(student,values = ['Height','Weight'],columns = ['Sex','Age'])
Out[72]:
        Sex  Age
Height  F    23     167.2
             24     154.0
        M    27     165.7
Weight  F    23      63.0
             24      41.0
        M    27      61.0
dtype: float64

In [73]: pd.pivot_table(student,values = ['Height','Weight'],columns = ['Sex','Age']).unstack()
Out[73]:
Age            23     24     27
       Sex
Height F    167.2  154.0    NaN
       M      NaN    NaN  165.7
Weight F     63.0   41.0    NaN
       M      NaN    NaN   61.0

In [74]: pd.pivot_table(student,values = ['Height','Weight'],columns = ['Sex'],aggfunc = [np.mean,np.median,np.std])
Out[74]:
         mean        median               std
Sex         F      M      F      M          F   M
Height  160.6  165.7  160.6  165.7   9.333810 NaN
Weight   52.0   61.0   52.0   61.0  15.556349 NaN
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值