In Pandas 0.17 I try to sort by a specific column while maintaining the hierarchical index (A and B). B is a running number created when setting up the dataframe through concatenation. My data looks like this:
C D
A B
bar one shiny 10
two dull 5
three glossy 8
foo one dull 3
two shiny 9
three matt 12
This is what I need:
C D
A B
bar two dull 5
three glossy 8
one shiny 10
foo one dull 3
three matt 12
two shiny 9
Below is the code I am using and the result. Note: Pandas 0.17 alerts that dataframe.sort will be deprecated.
df.sort_values(by="C", ascending=True)
C D
A B
bar two dull 5
foo one dull 3
bar three glossy 8
foo three matt 12
bar one shiny 10
foo two shiny 9
Adding .groupby produces the same result:
df.sort_values(by="C", ascending=True).groupby(axis=0, level=0, as_index=True)
Similarly, switching to sorting indices first, and then groupby the column is not fruitful:
df.sort_index(axis=0, level=0, as_index=True).groupby(C, as_index=True)
I am not certain about reindexing I need to keep the first index A, second index B can be reassigned, but does not have to. It would surprise me if there is not an easy solution; I guess I just don't find it. Any suggestions are appreciated.
Edit: In the meantime I dropped the second index B, reassigned first index A to be a column instead of an index sorted multiple columns, then re-indexed it:
df.index = df.index.droplevel(1)
df.reset_index(level=0, inplace=True)
df_sorted = df.sort_values(["A", "C"], ascending=[1,1]) #A is a column here, not an index.
df_reindexed = df_sorted.set_index("A")
Still very verbose.
解决方案
Feels like there could be a better way, but here's one approach:
In [163]: def sorter(sub_df):
...: sub_df = sub_df.sort_values('C')
...: sub_df.index = sub_df.index.droplevel(0)
...: return sub_df
In [164]: df.groupby(level='A').apply(sorter)
Out[164]:
C D
A B
bar two dull 5
three glossy 8
one shiny 10
foo one dull 3
three matt 12
two shiny 9