pandas_cookbook学习(三)

多重索引

> df = pd.DataFrame({'row' : [0,1,2],
   ....:                    'One_X' : [1.1,1.1,1.1],
   ....:                    'One_Y' : [1.2,1.2,1.2],
   ....:                    'Two_X' : [1.11,1.11,1.11],
   ....:                    'Two_Y' : [1.22,1.22,1.22]}); df

row	One_X	One_Y	Two_X	Two_Y
0	0	1.1	1.2	1.11	1.22
1	1	1.1	1.2	1.11	1.22
2	2	1.1	1.2	1.11	1.22
> df = df.set_index('row');df
	One_X	One_Y	Two_X	Two_Y
row				
0	1.1	1.2	1.11	1.22
1	1.1	1.2	1.11	1.22
2	1.1	1.2	1.11	1.22
> df.columns = pd.MultiIndex.from_tuples([tuple(c.split('_')) for c in df.columns]);df
	One	Two
X	Y	X	Y
row				
0	1.1	1.2	1.11	1.22
1	1.1	1.2	1.11	1.22
2	1.1	1.2	1.11	1.22
> df = df.stack(0).reset_index(1);df
	level_1	X	Y
row			
0	One	1.10	1.20
0	Two	1.11	1.22
1	One	1.10	1.20
1	Two	1.11	1.22
2	One	1.10	1.20
2	Two	1.11	1.22
> df.columns = ['Sample','All_X','All_Y'];df
	Sample	All_X	All_Y
row			
0	One	1.10	1.20
0	Two	1.11	1.22
1	One	1.10	1.20
1	Two	1.11	1.22
2	One	1.10	1.20
2	Two	1.11	1.22

算法

使用MultiIndex.from_tuples创建多索引:

In [58]: cols = pd.MultiIndex.from_tuples([ (x,y) for x in ['A','B','C'] for y in ['O','I']])

In [59]: df = pd.DataFrame(np.random.randn(2,6),index=['n','m'],columns=cols); df
Out[59]: 
          A                   B                   C          
          O         I         O         I         O         I
n  1.920906 -0.388231 -2.314394  0.665508  0.402562  0.399555
m -1.765956  0.850423  0.388054  0.992312  0.744086 -0.739776

In [60]: df = df.div(df['C'],level=1); df
Out[60]: 
          A                   B              C     
          O         I         O         I    O    I
n  4.771702 -0.971660 -5.749162  1.665625  1.0  1.0
m -2.373321 -1.149568  0.521518 -1.341367  1.0  1.0

切片

使用xs切分多索引:

In [61]: coords = [('AA','one'),('AA','six'),('BB','one'),('BB','two'),('BB','six')]

In [62]: index = pd.MultiIndex.from_tuples(coords)

In [63]: df = pd.DataFrame([11,22,33,44,55],index,['MyData']); df
Out[63]: 
        MyData
AA one      11
   six      22
BB one      33
   two      44
   six      55

访问BB:

In [64]: df.xs('BB',level=0,axis=0)  #Note : level and axis are optional, and default to zero
Out[64]: 
     MyData
one      33
two      44
six      55

访问six:

In [65]: df.xs('six',level=1,axis=0)
Out[65]: 
    MyData
AA      22
BB      55

另一个方法:

In [66]: index = list(itertools.product(['Ada','Quinn','Violet'],['Comp','Math','Sci']))

In [67]: headr = list(itertools.product(['Exams','Labs'],['I','II']))

In [68]: indx = pd.MultiIndex.from_tuples(index,names=['Student','Course'])

In [69]: cols = pd.MultiIndex.from_tuples(headr) #Notice these are un-named

In [70]: data = [[70+x+y+(x*y)%3 for x in range(4)] for y in range(9)]

In [71]: df = pd.DataFrame(data,indx,cols); df
Out[71]: 
               Exams     Labs    
                   I  II    I  II
Student Course                   
Ada     Comp      70  71   72  73
        Math      71  73   75  74
        Sci       72  75   75  75
Quinn   Comp      73  74   75  76
        Math      74  76   78  77
        Sci       75  78   78  78
Violet  Comp      76  77   78  79
        Math      77  79   81  80
        Sci       78  81   81  81

In [72]: All = slice(None)

In [73]: df.loc['Violet']
Out[73]: 
       Exams     Labs    
           I  II    I  II
Course                   
Comp      76  77   78  79
Math      77  79   81  80
Sci       78  81   81  81

In [74]: df.loc[(All,'Math'),All]
Out[74]: 
               Exams     Labs    
                   I  II    I  II
Student Course                   
Ada     Math      71  73   75  74
Quinn   Math      74  76   78  77
Violet  Math      77  79   81  80

In [75]: df.loc[(slice('Ada','Quinn'),'Math'),All]
Out[75]: 
               Exams     Labs    
                   I  II    I  II
Student Course                   
Ada     Math      71  73   75  74
Quinn   Math      74  76   78  77

In [76]: df.loc[(All,'Math'),('Exams')]
Out[76]: 
                 I  II
Student Course        
Ada     Math    71  73
Quinn   Math    74  76
Violet  Math    77  79

In [77]: df.loc[(All,'Math'),(All,'II')]
Out[77]: 
               Exams Labs
                  II   II
Student Course           
Ada     Math      73   74
Quinn   Math      76   77
Violet  Math      79   80

排序

按某一特定的列或行排序:

In [78]: df.sort_values(by=('Labs', 'II'), ascending=False)
Out[78]: 
               Exams     Labs    
                   I  II    I  II
Student Course                   
Violet  Sci       78  81   81  81
        Math      77  79   81  80
        Comp      76  77   78  79
Quinn   Sci       75  78   78  78
        Math      74  76   78  77
        Comp      73  74   75  76
Ada     Sci       72  75   75  75
        Math      71  73   75  74
        Comp      70  71   72  73
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值