I have the following pandas data frame which I want to sort by 'test_type'
test_type tps mtt mem cpu 90th
0 sso_1000 205.263559 4139.031090 24.175933 34.817701 4897.4766
1 sso_1500 201.127133 5740.741266 24.599400 34.634209 6864.9820
2 sso_2000 203.204082 6610.437558 24.466267 34.831947 8005.9054
3 sso_500 189.566836 2431.867002 23.559557 35.787484 2869.7670
My code to load the dataframe and sort it is, the first print line prints the data frame above.
df = pd.read_csv(file) #reads from a csv file
print df
df = df.sort_values(by=['test_type'], ascending=True)
print '\nAfter sort...'
print df
After doing the sort and printing the dataframe content, the data frame still looks like below.
Program output:
After sort...
test_type tps mtt mem cpu 90th
0 sso_1000 205.263559 4139.031090 24.175933 34.817701 4897.4766
1 sso_1500 201.127133 5740.741266 24.599400 34.634209 6864.9820
2 sso_2000 203.204082 6610.437558 24.466267 34.831947 8005.9054
3 sso_500 189.566836 2431.867002 23.559557 35.787484 2869.7670
I expect row 3 (test type: sso_500 row) to be on top after sorting. Can someone help me figure why it's not working as it should?
解决方案
Presumbaly, what you're trying to do is sort by the numerical value after sso_. You can do this as follows:
import numpy as np
df.ix[np.argsort(df.test_type.str.split('_').str[-1].astype(int).values)
This
splits the strings at _
converts what's after this character to the numerical value
Finds the indices sorted according to the numerical values
Reorders the DataFrame according to these indices
Example
In [15]: df = pd.DataFrame({'test_type': ['sso_1000', 'sso_500']})
In [16]: df.sort_values(by=['test_type'], ascending=True)
Out[16]:
test_type
0 sso_1000
1 sso_500
In [17]: df.ix[np.argsort(df.test_type.str.split('_').str[-1].astype(int).values)]
Out[17]:
test_type
1 sso_500
0 sso_1000