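Sorting in pandas: sort_values
NumPy provides argsort(), which returns the indices that would sort an array, while pandas provides sort_values(), which returns the data itself sorted by the values of one or more labels.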
DataFrame.sort_values(self, by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')
The signature above takes six parameters: by, axis, ascending, inplace, kind and na_position.
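To make the contrast between argsort() and sort_values() concrete, here is a minimal sketch. The array contents and the column name "value" are made up for illustration.

```python
import numpy as np
import pandas as pd

arr = np.array([30, 10, 20])

# argsort() returns the positions that would sort the array,
# so the sorted values are obtained by indexing with them.
order = np.argsort(arr)
sorted_arr = arr[order]

# sort_values() returns the sorted rows directly; the original
# row labels travel with the data.
df = pd.DataFrame({"value": arr})
sorted_df = df.sort_values("value")

print(order)       # positions into arr
print(sorted_df)   # rows reordered, index labels preserved
```

In this sketch the index of sorted_df carries the same information as the argsort() result, because the default index of df is simply the row position.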
The by parameter
by : str or list of str
Name or list of names to sort by.
if axis is 0 or ‘index’ then by may contain index levels and/or column labels
if axis is 1 or ‘columns’ then by may contain column levels and/or index labels
Changed in version 0.23.0: Allow specifying index or column level names.
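If axis=0, by names column labels and the sort is vertical: the rows are reordered.
If axis=1, by names row (index) labels and the sort is horizontal: the columns are reordered.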
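The documentation quoted above says that, with axis=0, by may name an index level as well as a column. The following is a small sketch of both cases, assuming pandas 0.23.0 or later; the frame, the column name "score" and the index name "key" are invented for illustration.

```python
import pandas as pd

# A frame whose index has a name, so it can be referred to in `by`.
df = pd.DataFrame(
    {"score": [3, 1, 2]},
    index=pd.Index(["b", "c", "a"], name="key"),
)

# Sort rows by a column label.
by_column = df.sort_values(by="score")

# Sort rows by the index level name.
by_level = df.sort_values(by="key")

print(by_column)
print(by_level)
```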
The axis parameter
axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Axis to be sorted.
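Chooses whether the rows or the columns are reordered.
axis defaults to 0, which sorts vertically: the rows are reordered by the values in the given column(s).
axis=1 sorts horizontally: the columns are reordered by the values in the given row(s).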
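The difference between the two axes can be sketched on a small all-numeric frame. The labels "a", "b", "c", "r0" and "r1" are hypothetical, and numeric data is used so that the values in a row can always be compared with each other.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [3, 1], "b": [1, 2], "c": [2, 3]},
    index=["r0", "r1"],
)

# axis=0 (default): reorder the rows by the values in column "a".
rows_sorted = df.sort_values(by="a", axis=0)

# axis=1: reorder the columns by the values in row "r0".
cols_sorted = df.sort_values(by="r0", axis=1)

print(rows_sorted)
print(cols_sorted)
```

Note that axis=1 only reorders the columns according to the named row; the other rows follow along and are not sorted individually.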
The ascending parameter
ascending : bool or list of bool, default True
Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, must match the length of the by.
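The default, True, sorts in ascending order; False sorts in descending order. If by is a list, ascending may be a list of the same length that gives the sort direction for each label.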
The inplace parameter
inplace : bool, default False
If True, perform operation in-place.
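inplace defaults to False, in which case a new sorted object is returned and the original is left unchanged. If it is True, the original data is replaced by the sorted data.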
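A short sketch of the two behaviours, using a made-up single-column frame. The point to notice is that the in-place call returns None, so its result should not be assigned back to the variable.

```python
import pandas as pd

df = pd.DataFrame({"x": [2, 1, 3]})

# Default (inplace=False): a new frame is returned, df is untouched.
result = df.sort_values("x")
before = df["x"].tolist()

# inplace=True: df itself is reordered and the call returns None.
ret = df.sort_values("x", inplace=True)
after = df["x"].tolist()

print(before, after, ret)
```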
The kind parameter
kind : {‘quicksort’, ‘mergesort’, ‘heapsort’}, default ‘quicksort’
Choice of sorting algorithm. See also ndarray.np.sort for more information. mergesort is the only stable algorithm. For DataFrames, this option is only applied when sorting on a single column or label.
Selects the sorting algorithm; the default is quicksort.
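Stability matters when rows tie on the sort key: a stable sort keeps tied rows in their original relative order. The sketch below requests mergesort for a single-column sort; the column names "group" and "tag" are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"group": [2, 1, 2, 1], "tag": ["a", "b", "c", "d"]}
)

# mergesort is stable, so within each group the rows keep the
# order in which they appeared in df.
stable = df.sort_values("group", kind="mergesort")

print(stable)
```

With the default quicksort the order of tied rows is not guaranteed, so code that depends on it should ask for a stable algorithm explicitly.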
The na_position parameter
na_position : {‘first’, ‘last’}, default ‘last’
Puts NaNs at the beginning if first; last puts NaNs at the end.
Controls where missing values are placed. The default, last, puts them at the end; first puts them at the beginning.
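The effect of na_position can be seen on a column containing one NaN. This is a minimal sketch with a made-up column "v".

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"v": [2.0, np.nan, 1.0]})

# Default: NaN goes to the end.
nan_last = df.sort_values("v")

# na_position='first': NaN goes to the beginning.
nan_first = df.sort_values("v", na_position="first")

# The placement of NaN follows na_position, not the sort direction.
desc = df.sort_values("v", ascending=False)

print(nan_last)
print(nan_first)
print(desc)
```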
Example
import pandas as pd
data = pd.DataFrame([[1, 'Wang', 20], [2, 'Li', 20], [1, 'Wang', 21], [1, 'Wang', 20]], columns=['id', 'name', 'age'])
The data is:
   id  name  age
0   1  Wang   20
1   2    Li   20
2   1  Wang   21
3   1  Wang   20
Sort by id and age, with id ascending and age descending:
data = data.sort_values(['id', 'age'], ascending=[True, False])
The result is:
   id  name  age
2   1  Wang   21
0   1  Wang   20
3   1  Wang   20
1   2    Li   20
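Sort horizontally: reorder the columns so that the values in the row labelled 0 appear in ascending order. Only that row determines the column order; the other rows are not sorted individually. Row 0 here mixes integers with a string, so the call relies on those values being comparable. Under Python 3, comparing an int with a str raises a TypeError, so the output shown below presumably comes from an environment such as Python 2, where integers sort before strings.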
data = data.sort_values(0, axis=1)
The result is:
   id  age  name
2   1   21  Wang
0   1   20  Wang
3   1   20  Wang
1   2   20    Li