多重索引
> df = pd.DataFrame({'row' : [0,1,2],
....: 'One_X' : [1.1,1.1,1.1],
....: 'One_Y' : [1.2,1.2,1.2],
....: 'Two_X' : [1.11,1.11,1.11],
....: 'Two_Y' : [1.22,1.22,1.22]}); df
row One_X One_Y Two_X Two_Y
0 0 1.1 1.2 1.11 1.22
1 1 1.1 1.2 1.11 1.22
2 2 1.1 1.2 1.11 1.22
> df = df.set_index('row');df
One_X One_Y Two_X Two_Y
row
0 1.1 1.2 1.11 1.22
1 1.1 1.2 1.11 1.22
2 1.1 1.2 1.11 1.22
> df.columns = pd.MultiIndex.from_tuples([tuple(c.split('_')) for c in df.columns]);df
One Two
X Y X Y
row
0 1.1 1.2 1.11 1.22
1 1.1 1.2 1.11 1.22
2 1.1 1.2 1.11 1.22
> df = df.stack(0).reset_index(1);df
level_1 X Y
row
0 One 1.10 1.20
0 Two 1.11 1.22
1 One 1.10 1.20
1 Two 1.11 1.22
2 One 1.10 1.20
2 Two 1.11 1.22
> df.columns = ['Sample','All_X','All_Y'];df
Sample All_X All_Y
row
0 One 1.10 1.20
0 Two 1.11 1.22
1 One 1.10 1.20
1 Two 1.11 1.22
2 One 1.10 1.20
2 Two 1.11 1.22
算法
使用MultiIndex.from_tuples创建多索引:
In [58]: cols = pd.MultiIndex.from_tuples([ (x,y) for x in ['A','B','C'] for y in ['O','I']])
In [59]: df = pd.DataFrame(np.random.randn(2,6),index=['n','m'],columns=cols); df
Out[59]:
A B C
O I O I O I
n 1.920906 -0.388231 -2.314394 0.665508 0.402562 0.399555
m -1.765956 0.850423 0.388054 0.992312 0.744086 -0.739776
In [60]: df = df.div(df['C'],level=1); df
Out[60]:
A B C
O I O I O I
n 4.771702 -0.971660 -5.749162 1.665625 1.0 1.0
m -2.373321 -1.149568 0.521518 -1.341367 1.0 1.0
切片
使用xs切分多索引:
In [61]: coords = [('AA','one'),('AA','six'),('BB','one'),('BB','two'),('BB','six')]
In [62]: index = pd.MultiIndex.from_tuples(coords)
In [63]: df = pd.DataFrame([11,22,33,44,55],index,['MyData']); df
Out[63]:
MyData
AA one 11
six 22
BB one 33
two 44
six 55
访问BB:
In [64]: df.xs('BB',level=0,axis=0) #Note : level and axis are optional, and default to zero
Out[64]:
MyData
one 33
two 44
six 55
访问six:
In [65]: df.xs('six',level=1,axis=0)
Out[65]:
MyData
AA 22
BB 55
另一个方法:
In [66]: index = list(itertools.product(['Ada','Quinn','Violet'],['Comp','Math','Sci']))
In [67]: headr = list(itertools.product(['Exams','Labs'],['I','II']))
In [68]: indx = pd.MultiIndex.from_tuples(index,names=['Student','Course'])
In [69]: cols = pd.MultiIndex.from_tuples(headr) #Notice these are un-named
In [70]: data = [[70+x+y+(x*y)%3 for x in range(4)] for y in range(9)]
In [71]: df = pd.DataFrame(data,indx,cols); df
Out[71]:
Exams Labs
I II I II
Student Course
Ada Comp 70 71 72 73
Math 71 73 75 74
Sci 72 75 75 75
Quinn Comp 73 74 75 76
Math 74 76 78 77
Sci 75 78 78 78
Violet Comp 76 77 78 79
Math 77 79 81 80
Sci 78 81 81 81
In [72]: All = slice(None)
In [73]: df.loc['Violet']
Out[73]:
Exams Labs
I II I II
Course
Comp 76 77 78 79
Math 77 79 81 80
Sci 78 81 81 81
In [74]: df.loc[(All,'Math'),All]
Out[74]:
Exams Labs
I II I II
Student Course
Ada Math 71 73 75 74
Quinn Math 74 76 78 77
Violet Math 77 79 81 80
In [75]: df.loc[(slice('Ada','Quinn'),'Math'),All]
Out[75]:
Exams Labs
I II I II
Student Course
Ada Math 71 73 75 74
Quinn Math 74 76 78 77
In [76]: df.loc[(All,'Math'),('Exams')]
Out[76]:
I II
Student Course
Ada Math 71 73
Quinn Math 74 76
Violet Math 77 79
In [77]: df.loc[(All,'Math'),(All,'II')]
Out[77]:
Exams Labs
II II
Student Course
Ada Math 73 74
Quinn Math 76 77
Violet Math 79 80
排序
按某一特定的列或行排序:
In [78]: df.sort_values(by=('Labs', 'II'), ascending=False)
Out[78]:
Exams Labs
I II I II
Student Course
Violet Sci 78 81 81 81
Math 77 79 81 80
Comp 76 77 78 79
Quinn Sci 75 78 78 78
Math 74 76 78 77
Comp 73 74 75 76
Ada Sci 72 75 75 75
Math 71 73 75 74
Comp 70 71 72 73