Python Basics: Pandas Sorting Function sort_values()

Article link: https://blog.csdn.net/weixin_42414714/article/details/116084779

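This article covers the Pandas DataFrame sort_values method in detail: its parameters, worked examples, and sorting strategies. The method sorts a DataFrame by the values of one or more columns (or rows) in ascending or descending order, and can also control where missing values are placed and apply a custom key function. The examples cover single-column, multi-column, descending, and key-function sorting.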


DataFrame.sort_values(by,
                      axis=0,
                      ascending=True,
                      inplace=False,
                      kind='quicksort',
                      na_position='last',
                      ignore_index=False,
                      key=None)

Overview

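The method works like ORDER BY in SQL: it sorts the dataset by the values in a given field. It can sort the rows by the values of specified columns, or sort the columns by the values of specified rows.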
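Sorting by a row's values is the less common case, so a minimal sketch may help. The `scores` frame and its labels are made up for illustration. It is all-numeric because values within a row need to be mutually comparable.

```python
import pandas as pd

# Hypothetical all-numeric frame: sorting along axis=1 compares the values
# inside one row, so that row should hold mutually comparable values.
scores = pd.DataFrame(
    {"x": [3, 10], "y": [1, 30], "z": [2, 20]},
    index=["r1", "r2"],
)

# Reorder the columns by the values found in row "r1" (x=3, y=1, z=2).
by_row = scores.sort_values(by="r1", axis=1)
print(by_row)
```

With axis=1, `by` names an index label rather than a column name, and the columns are reordered, not the rows.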

Parameter descriptions

Parameters

by

str or list of str

Column name(s) to sort by (when axis=0 or 'index'), or index label(s) to sort by (when axis=1 or 'columns').
axis

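{0 or 'index', 1 or 'columns'}, default 0

If axis=0 or 'index', the rows are sorted by the values in the specified column(s); if axis=1 or 'columns', the columns are sorted by the values in the specified row(s). The default is axis=0.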
ascending

bool or list of bool, default True

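Whether to sort in ascending order. The default is True (ascending). If a list of bools is given, it must match the length of by, with one flag per sort key.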
inplace

bool, default False

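Whether to replace the original data with the sorted result. The default is False (the original is left unchanged and a new DataFrame is returned).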
kind

{'quicksort', 'mergesort', 'heapsort'}, default 'quicksort'

Choice of sorting algorithm. See also numpy.sort for more information. Of the options listed here, mergesort is the only stable algorithm. For DataFrames, this option is only applied when sorting on a single column or label.
na_position

{'first', 'last'}, default 'last'

Where to place missing values (NaN): 'first' puts them at the beginning, 'last' puts them at the end.
ignore_index

bool, default False

If True, the resulting axis will be labeled 0, 1, …, n - 1. New in version 1.0.0.
key

callable, optional

Apply the key function to the values before sorting. This is similar to the key argument in the builtin sorted() function, with the notable difference that this key function should be vectorized. It should expect a Series and return a Series with the same shape as the input. It will be applied to each column in by independently. New in version 1.1.0.
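The stability note under kind matters when the sort column contains ties. Here is a small sketch using a made-up `df_ties` frame, in which `seq` records the original row order so that the effect can be checked.

```python
import pandas as pd

# Hypothetical frame with repeated keys; "seq" records the original order.
df_ties = pd.DataFrame({
    "grp": [2, 1, 2, 1, 2],
    "seq": [0, 1, 2, 3, 4],
})

# A stable sort keeps rows with equal "grp" in their original relative order.
stable = df_ties.sort_values(by="grp", kind="mergesort")
print(stable)
```

With the default quicksort, the relative order of tied rows is not guaranteed, even if it happens to be preserved on small inputs.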

Returns

DataFrame or None

DataFrame with sorted values or None if inplace=True.
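The return value and the ignore_index flag are easy to mix up in practice, so the following sketch contrasts them. The `frame` variable is a made-up example.

```python
import pandas as pd

frame = pd.DataFrame({"n": [3, 1, 2]})

# Default behaviour: a new sorted DataFrame is returned and the original is
# left untouched. ignore_index=True relabels the result 0, 1, ..., n - 1.
sorted_copy = frame.sort_values(by="n", ignore_index=True)
order_before_inplace = list(frame["n"])

# inplace=True sorts the original object and returns None,
# so the result should not be assigned back to a variable.
result = frame.sort_values(by="n", inplace=True)

print(sorted_copy)
print(result)
print(frame)
```

A common mistake is writing `frame = frame.sort_values(..., inplace=True)`, which replaces `frame` with None.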

Usage examples

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({
...     'col1': ['A', 'A', 'B', np.nan, 'D', 'C'],
...     'col2': [2, 1, 9, 8, 7, 4],
...     'col3': [0, 1, 9, 4, 2, 3],
...     'col4': ['a', 'B', 'c', 'D', 'e', 'F']
... })
>>> df
  col1  col2  col3 col4
0    A     2     0    a
1    A     1     1    B
2    B     9     9    c
3  NaN     8     4    D
4    D     7     2    e
5    C     4     3    F

Sort by a single column

>>> df.sort_values(by=['col1'])
  col1  col2  col3 col4
0    A     2     0    a
1    A     1     1    B
2    B     9     9    c
5    C     4     3    F
4    D     7     2    e
3  NaN     8     4    D

Sort by multiple columns

>>> df.sort_values(by=['col1', 'col2'])
  col1  col2  col3 col4
1    A     1     1    B
0    A     2     0    a
2    B     9     9    c
5    C     4     3    F
4    D     7     2    e
3  NaN     8     4    D
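When sorting by several columns, ascending can also be a list with one flag per column. The sketch below rebuilds the first two columns of the example frame so that it runs on its own. The name `mixed` is just an illustrative variable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": ["A", "A", "B", np.nan, "D", "C"],
    "col2": [2, 1, 9, 8, 7, 4],
})

# col1 ascending, then col2 descending within equal col1 values.
# The list passed to ascending must have the same length as by.
mixed = df.sort_values(by=["col1", "col2"], ascending=[True, False])
print(mixed)
```

Compared with the output above, the two 'A' rows swap places because col2 is now sorted in descending order within each col1 group.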

Sort in descending order

>>> df.sort_values(by='col1', ascending=False)
  col1  col2  col3 col4
4    D     7     2    e
5    C     4     3    F
2    B     9     9    c
0    A     2     0    a
1    A     1     1    B
3  NaN     8     4    D

Put missing values first

>>> df.sort_values(by='col1', ascending=False, na_position='first')
  col1  col2  col3 col4
3  NaN     8     4    D
4    D     7     2    e
5    C     4     3    F
2    B     9     9    c
0    A     2     0    a
1    A     1     1    B

Sort with a custom key function

>>> df.sort_values(by='col4', key=lambda col: col.str.lower())
  col1  col2  col3 col4
0    A     2     0    a
1    A     1     1    B
2    B     9     9    c
3  NaN     8     4    D
4    D     7     2    e
5    C     4     3    F
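Because the key function is applied to each column in by independently, a key used with several columns has to cope with every dtype it receives. A minimal sketch follows. The `people` frame and the `fold_case` helper are hypothetical names introduced for illustration.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

people = pd.DataFrame({
    "name": ["bob", "Alice", "alice", "Bob"],
    "age": [30, 25, 35, 20],
})

def fold_case(col):
    # Called once per column in `by`: lowercase string columns,
    # pass every other column through unchanged.
    if is_string_dtype(col):
        return col.str.lower()
    return col

# Case-insensitive sort on name, with ties broken by age.
ordered = people.sort_values(by=["name", "age"], key=fold_case)
print(ordered)
```

The key only affects the comparison: the values shown in the result keep their original casing.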

Natural sort with the key argument (using the natsort package)

>>> df = pd.DataFrame({
...    "time": ['0hr', '128hr', '72hr', '48hr', '96hr'],
...    "value": [10, 20, 30, 40, 50]
... })
>>> df
    time  value
0    0hr     10
1  128hr     20
2   72hr     30
3   48hr     40
4   96hr     50
>>> from natsort import index_natsorted
>>> df.sort_values(
...    by="time",
...    key=lambda x: np.argsort(index_natsorted(df["time"]))
... )
    time  value
0    0hr     10
3   48hr     40
2   72hr     30
4   96hr     50
1  128hr     20
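Note that the lambda above ignores its argument and reads df["time"] directly. If installing natsort is not an option, a similar result can be sketched with plain pandas for this particular data, assuming every value starts with an integer. The `leading_number` helper is a hypothetical name, and this is not a general natural-sort implementation.

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["0hr", "128hr", "72hr", "48hr", "96hr"],
    "value": [10, 20, 30, 40, 50],
})

def leading_number(col):
    # Assumes every entry begins with digits: extract them and compare as
    # integers, so that '128hr' sorts after '96hr' instead of before it.
    return col.str.extract(r"^(\d+)", expand=False).astype(int)

by_hours = df.sort_values(by="time", key=leading_number)
print(by_hours)
```

Unlike the natsort version, this key uses the Series it is given, so it still works if the frame is renamed or the call is moved into a function.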