Pandas Module - Manipulating Data (5) - Sorting Data - .sort_values()

江南野栀子

Last modified 2024-04-17 16:08:59

First published 2021-12-07 19:59:05

Article link: https://blog.csdn.net/u010701274/article/details/121772068

5. Using .sort_values() to view data sorted by value

5.1 .sort_values() syntax

Syntax: .sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)

Compared with .sort_index(), this method takes one additional parameter: by.

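by: a string (a column label or a row label) or a list of strings (several columns or several rows). It works together with axis. With axis=0 or "index", by names the column(s) whose values are compared; with axis=1 or "columns", by names the row(s). Note: by is required, so you must say which rows or columns to sort on. To sort by the row or column labels themselves rather than by values, use .sort_index(); per the help text, by may also contain the names of index levels.
axis: {0 or 'index', 1 or 'columns'}, default 0. With axis=0 the rows are reordered according to the values in the column(s) named in by; with axis=1 the columns are reordered according to the values in the row(s) named in by.
ascending: a bool or a list of bools, default True (ascending); False sorts in descending order. A list must have the same length as by.
inplace: bool, default False. If True, the sort is performed in place on the original data and the method returns None.
kind: one of 'quicksort', 'mergesort', 'heapsort'; the default is 'quicksort'. Of these three, only 'mergesort' is stable. For a DataFrame the option applies only when sorting on a single column or label. On large data the choice of algorithm can affect sorting performance.
na_position: where missing values go, {"first", "last"}, default "last". "first" puts NaN at the beginning, "last" puts NaN at the end.
ignore_index: bool, default False. If True, the resulting axis is relabeled 0, 1, ..., n - 1. Added in pandas 1.0.0.
key: a callable applied to the values before sorting. It is similar to the key argument of the built-in sorted(), except that it must be vectorized: it receives a Series and must return a Series of the same shape. It is applied to each column in by independently. Added in pandas 1.1.0.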
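To make the interaction between by, axis and ignore_index concrete, here is a minimal sketch. The DataFrame, its labels ("r1", "a", ...) and the index name "row_id" are made up for illustration and are not part of the original article.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame(
    {"a": [3, 1, 2], "b": [9, 7, 8], "c": [5, 6, 4]},
    index=["r1", "r2", "r3"],
)

# axis=0 (default): reorder the rows by the values in column "a".
rows_sorted = df.sort_values(by="a")
print(rows_sorted.index.tolist())      # ['r2', 'r3', 'r1']

# axis=1: reorder the columns by the values in row "r2".
cols_sorted = df.sort_values(by="r2", axis=1)
print(cols_sorted.columns.tolist())    # ['a', 'c', 'b']

# ignore_index=True relabels the sorted axis 0, 1, ..., n - 1.
relabelled = df.sort_values(by="a", ignore_index=True)
print(relabelled.index.tolist())       # [0, 1, 2]

# A named index level can also be passed in `by`.
named = df.rename_axis("row_id")
by_level = named.sort_values(by="row_id", ascending=False)
print(by_level.index.tolist())         # ['r3', 'r2', 'r1']
```

In every call the original df is left untouched, because inplace keeps its default of False.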

help(df.sort_values)

Help on method sort_values in module pandas.core.frame:

sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key: 'ValueKeyFunc' = None) method of pandas.core.frame.DataFrame instance
    Sort by the values along either axis.
    
    Parameters
    ----------
            by : str or list of str
                Name or list of names to sort by.
    
                - if `axis` is 0 or `'index'` then `by` may contain index
                  levels and/or column labels.
                - if `axis` is 1 or `'columns'` then `by` may contain column
                  levels and/or index labels.
    axis : {0 or 'index', 1 or 'columns'}, default 0
         Axis to be sorted.
    ascending : bool or list of bool, default True
         Sort ascending vs. descending. Specify list for multiple sort
         orders.  If this is a list of bools, must match the length of
         the by.
    inplace : bool, default False
         If True, perform operation in-place.
    kind : {'quicksort', 'mergesort', 'heapsort'}, default 'quicksort'
         Choice of sorting algorithm. See also ndarray.np.sort for more
         information.  `mergesort` is the only stable algorithm. For
         DataFrames, this option is only applied when sorting on a single
         column or label.
    na_position : {'first', 'last'}, default 'last'
         Puts NaNs at the beginning if `first`; `last` puts NaNs at the
         end.
    ignore_index : bool, default False
         If True, the resulting axis will be labeled 0, 1, …, n - 1.
    
         .. versionadded:: 1.0.0
    
    key : callable, optional
        Apply the key function to the values
        before sorting. This is similar to the `key` argument in the
        builtin :meth:`sorted` function, with the notable difference that
        this `key` function should be *vectorized*. It should expect a
        ``Series`` and return a Series with the same shape as the input.
        It will be applied to each column in `by` independently.
    
        .. versionadded:: 1.1.0
    
    Returns
    -------
    DataFrame or None
        DataFrame with sorted values or None if ``inplace=True``.
    
    See Also
    --------
    DataFrame.sort_index : Sort a DataFrame by the index.
    Series.sort_values : Similar method for a Series.
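Two points in the help text are easy to miss: 'mergesort' is the only stable choice among the three listed, and inplace=True makes the method return None. The sketch below illustrates both on a small made-up DataFrame (the column names "group" and "value" are assumptions for the example).

```python
import pandas as pd

# Hypothetical data: "group" contains ties, "value" records the original order.
df = pd.DataFrame({"group": ["b", "a", "b", "a"], "value": [1, 2, 3, 4]})

# Without inplace, a new sorted DataFrame is returned and df is unchanged.
# With a stable sort, rows that tie on "group" keep their original relative order.
sorted_copy = df.sort_values(by="group", kind="mergesort")
print(sorted_copy["value"].tolist())   # [2, 4, 1, 3]
print(df["value"].tolist())            # [1, 2, 3, 4]

# With inplace=True the method returns None and df itself is reordered.
result = df.sort_values(by="group", kind="mergesort", inplace=True)
print(result)                          # None
print(df.index.tolist())               # [1, 3, 0, 2]
```

Because of the None return value, writing df = df.sort_values(..., inplace=True) would replace df with None, which is a common slip.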

5.2 .sort_values() examples

The examples printed by help() are worth reading through first.

    Examples
    --------
    >>> df = pd.DataFrame({
    ...     'col1': ['A', 'A', 'B', np.nan, 'D', 'C'],
    ...     'col2': [2, 1, 9, 8, 7, 4],
    ...     'col3': [0, 1, 9, 4, 2, 3],
    ...     'col4': ['a', 'B', 'c', 'D', 'e', 'F']
    ... })
    >>> df
      col1  col2  col3 col4
    0    A     2     0    a
    1    A     1     1    B
    2    B     9     9    c
    3  NaN     8     4    D
    4    D     7     2    e
    5    C     4     3    F
    
    Sort by col1
    
    >>> df.sort_values(by=['col1'])
      col1  col2  col3 col4
    0    A     2     0    a
    1    A     1     1    B
    2    B     9     9    c
    5    C     4     3    F
    4    D     7     2    e
    3  NaN     8     4    D
    
    Sort by multiple columns
    
    >>> df.sort_values(by=['col1', 'col2'])
      col1  col2  col3 col4
    1    A     1     1    B
    0    A     2     0    a
    2    B     9     9    c
    5    C     4     3    F
    4    D     7     2    e
    3  NaN     8     4    D
    
    Sort Descending
    
    >>> df.sort_values(by='col1', ascending=False)
      col1  col2  col3 col4
    4    D     7     2    e
    5    C     4     3    F
    2    B     9     9    c
    0    A     2     0    a
    1    A     1     1    B
    3  NaN     8     4    D
    
    Putting NAs first
    
    >>> df.sort_values(by='col1', ascending=False, na_position='first')
      col1  col2  col3 col4
    3  NaN     8     4    D
    4    D     7     2    e
    5    C     4     3    F
    2    B     9     9    c
    0    A     2     0    a
    1    A     1     1    B
    
    Sorting with a key function
    
    >>> df.sort_values(by='col4', key=lambda col: col.str.lower())
       col1  col2  col3 col4
    0    A     2     0    a
    1    A     1     1    B
    2    B     9     9    c
    3  NaN     8     4    D
    4    D     7     2    e
    5    C     4     3    F
    
    Natural sort with the key argument,
    using the `natsort <https://github.com/SethMMorton/natsort>` package.
    
    >>> df = pd.DataFrame({
    ...    "time": ['0hr', '128hr', '72hr', '48hr', '96hr'],
    ...    "value": [10, 20, 30, 40, 50]
    ... })
    >>> df
        time  value
    0    0hr     10
    1  128hr     20
    2   72hr     30
    3   48hr     40
    4   96hr     50
    >>> from natsort import index_natsorted
    >>> df.sort_values(
    ...    by="time",
    ...    key=lambda x: np.argsort(index_natsorted(df["time"]))
    ... )
        time  value
    0    0hr     10
    3   48hr     40
    2   72hr     30
    4   96hr     50
    1  128hr     20

Running some code of our own makes this clearer.

import pandas as pd

dict_data = {
    "X": ["a", "d", "g", "f", "i", "n"],
    "Y": ["M", "N", "D", "A", "C", "Y"],
    "Z": ["X", "DSF", "DST", "XX", "FGDDSFG", "B"],
}
df = pd.DataFrame.from_dict(dict_data)
df.index = ["01", "002", "03", "004", "005", "006"]
print("*" * 20 + " Initial DataFrame " + "*" * 20)
print(df)

# Sort by the values in column Y, descending
print("*" * 15 + " sort_values(by='Y', ascending=False) " + "*" * 15)
print(df.sort_values(by="Y", ascending=False))

# Sort by column X ascending first, then by column Z descending
print("*" * 20 + " sort_values(by=['X', 'Z'], ascending=[True, False]) " + "*" * 20)
print(df.sort_values(by=["X", "Z"], ascending=[True, False]))

# Sort by the length of the strings in column Z
print("*" * 15 + " sort_values(by='Z', key=lambda x: x.str.len()) " + "*" * 15)
print(df.sort_values(by="Z", key=lambda x: x.str.len()))

The output is as expected:

******************** Initial DataFrame ********************
     X  Y        Z
01   a  M        X
002  d  N      DSF
03   g  D      DST
004  f  A       XX
005  i  C  FGDDSFG
006  n  Y        B
*************** sort_values(by='Y', ascending=False) ***************
     X  Y        Z
006  n  Y        B
002  d  N      DSF
01   a  M        X
03   g  D      DST
005  i  C  FGDDSFG
004  f  A       XX
******************** sort_values(by=['X', 'Z'], ascending=[True, False]) ********************
     X  Y        Z
01   a  M        X
002  d  N      DSF
004  f  A       XX
03   g  D      DST
005  i  C  FGDDSFG
006  n  Y        B
*************** sort_values(by='Z', key=lambda x: x.str.len()) ***************
     X  Y        Z
01   a  M        X
006  n  Y        B
004  f  A       XX
002  d  N      DSF
03   g  D      DST
005  i  C  FGDDSFG
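In the data above every value in column X is distinct, so the second sort key never gets a chance to break a tie. The sketch below uses a small made-up DataFrame with repeated X values (not from the original example) to show a per-column ascending list taking effect.

```python
import pandas as pd

# Hypothetical data with repeated values in "X" so that the second key matters.
df = pd.DataFrame(
    {"X": ["a", "b", "a", "b"], "Z": ["p", "q", "r", "s"]},
    index=["01", "02", "03", "04"],
)

# X ascending first; within equal X values, Z descending.
result = df.sort_values(by=["X", "Z"], ascending=[True, False])
print(result.index.tolist())   # ['03', '01', '04', '02']
print(result["Z"].tolist())    # ['r', 'p', 's', 'q']
```

Within the "a" group, "r" now precedes "p", and within the "b" group, "s" precedes "q", which is the descending order requested for Z.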