5. Using .sort_values() to view data sorted by value
5.1 .sort_values() syntax
Syntax: .sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)
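The exact signature depends on your pandas version. Per the docstring, ignore_index was added in 1.0.0 and key in 1.1.0, and recent versions make every parameter after by keyword-only. A quick way to check the defaults in your own installation is the standard-library inspect module. This is a small sketch of mine, not part of the original walkthrough:

```python
import inspect

import pandas as pd

# Read the signature of the installed pandas version instead of relying on memory
sig = inspect.signature(pd.DataFrame.sort_values)
for name, param in sig.parameters.items():
    if name == "self":
        continue
    # param.default is inspect.Parameter.empty for a required argument such as `by`
    if param.default is inspect.Parameter.empty:
        default = "<required>"
    else:
        default = repr(param.default)
    print(f"{name:>12} = {default}")
```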
Compared with .sort_index(), this method has one extra parameter: by.
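To make the difference concrete, here is a minimal sketch on a hypothetical three-row DataFrame whose index labels and values are deliberately in different orders: .sort_index() orders rows by their labels, while .sort_values(by=...) orders them by the values in a column.

```python
import pandas as pd

# Hypothetical toy data: label order and value order disagree on purpose
df = pd.DataFrame({"score": [75, 90, 82]}, index=["c", "a", "b"])

by_index = df.sort_index()             # rows ordered by index label: a, b, c
by_value = df.sort_values(by="score")  # rows ordered by "score": 75, 82, 90 -> c, b, a

print(by_index)
print(by_value)
```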
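- by: a string (a column name or a row label) or a list of strings (several columns or rows); its meaning depends on axis. If axis=0 or "index", by names column(s); if axis=1 or "columns", by names row label(s). by is required, so you always have to say which rows or columns to sort on. Sorting by the axis labels themselves is the job of .sort_index(); note, however, that per the pandas docstring, with axis=0 by may also contain index level names.
- axis: {0 or 'index', 1 or 'columns'}, default 0. With axis=0 the rows are reordered according to the values in the column(s) named by by; with axis=1 the columns are reordered according to the values in the row(s) named by by.
- ascending: a bool or a list of bools, default True (ascending); False sorts in descending order. A list must have the same length as by and gives one direction per sort key.
- inplace: bool, default False. If True, the original DataFrame is sorted in place and the method returns None.
- kind: 'quicksort', 'mergesort' or 'heapsort', default 'quicksort'. It can affect performance on large data, and it also matters when ties must keep their original order: per the docstring, mergesort is the only stable algorithm, and for DataFrames this option only takes effect when sorting on a single column or label.
- na_position: where missing values go, {"first", "last"}, default "last". "first" puts NaN at the beginning, "last" puts NaN at the end.
- ignore_index: bool, default False. If True, the resulting axis is relabeled 0, 1, …, n - 1. This parameter was added in pandas 1.0.0.
- key: a callable applied to the values before sorting, similar to the key argument of the built-in sorted(). Unlike sorted(), it must be vectorized: it receives a whole Series and must return a Series of the same shape, and it is applied to each column in by independently. This parameter was added in pandas 1.1.0.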
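The parameters above can be exercised together on a small hypothetical DataFrame. This sketch assumes pandas 1.1 or later; the index is given the name "row" only so that it can be used in by, and the column names, labels and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the index is named "row" so it can be referenced in `by`
df = pd.DataFrame(
    {"b": [30, 1, np.nan], "a": [10, 30, 20]},
    index=pd.Index(["r1", "r2", "r3"], name="row"),
)

# axis=0 (default): reorder ROWS by the values of column "b"; NaN placed first
rows_sorted = df.sort_values(by="b", na_position="first")

# axis=1: reorder COLUMNS by the values of row "r1" (a=10 < b=30, so "a" comes first)
cols_sorted = df.sort_values(by="r1", axis=1)

# With axis=0, `by` may also name an index level (here the level called "row")
by_level = df.sort_values(by="row", ascending=False)

# ignore_index=True relabels the result 0, 1, ..., n-1
relabeled = df.sort_values(by="a", ignore_index=True)

# inplace=True sorts df itself and returns None
ret = df.sort_values(by="a", inplace=True)

print(rows_sorted, cols_sorted, by_level, relabeled, sep="\n\n")
print(ret, list(df.index))
```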
help(df.sort_values)
Help on method sort_values in module pandas.core.frame:

sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key: 'ValueKeyFunc' = None) method of pandas.core.frame.DataFrame instance
    Sort by the values along either axis.

    Parameters
    ----------
    by : str or list of str
        Name or list of names to sort by.

        - if `axis` is 0 or `'index'` then `by` may contain index
          levels and/or column labels.
        - if `axis` is 1 or `'columns'` then `by` may contain column
          levels and/or index labels.
    axis : {0 or 'index', 1 or 'columns'}, default 0
         Axis to be sorted.
    ascending : bool or list of bool, default True
         Sort ascending vs. descending. Specify list for multiple sort
         orders.  If this is a list of bools, must match the length of
         the by.
    inplace : bool, default False
         If True, perform operation in-place.
    kind : {'quicksort', 'mergesort', 'heapsort'}, default 'quicksort'
         Choice of sorting algorithm. See also ndarray.np.sort for more
         information.  `mergesort` is the only stable algorithm. For
         DataFrames, this option is only applied when sorting on a single
         column or label.
    na_position : {'first', 'last'}, default 'last'
         Puts NaNs at the beginning if `first`; `last` puts NaNs at the
         end.
    ignore_index : bool, default False
         If True, the resulting axis will be labeled 0, 1, …, n - 1.

         .. versionadded:: 1.0.0

    key : callable, optional
        Apply the key function to the values
        before sorting. This is similar to the `key` argument in the
        builtin :meth:`sorted` function, with the notable difference that
        this `key` function should be *vectorized*. It should expect a
        ``Series`` and return a Series with the same shape as the input.
        It will be applied to each column in `by` independently.

        .. versionadded:: 1.1.0

    Returns
    -------
    DataFrame or None
        DataFrame with sorted values or None if ``inplace=True``.

    See Also
    --------
    DataFrame.sort_index : Sort a DataFrame by the index.
    Series.sort_values : Similar method for a Series.
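The docstring's remark that mergesort is the only stable algorithm is easy to check. In this sketch (hypothetical data: "grp" has many ties, "seq" records each row's original position), a stable sort must leave "seq" increasing within every group of tied rows. With the default quicksort that order is not guaranteed.

```python
import pandas as pd

# Hypothetical data: "grp" has many ties, "seq" records each row's original position
n = 1_000
df = pd.DataFrame({"grp": [i % 3 for i in range(n)], "seq": range(n)})

# mergesort is stable: rows that tie on "grp" keep their original relative order
stable = df.sort_values(by="grp", kind="mergesort")

# Check stability: within each group, "seq" must still be increasing
ties_in_order = all(
    block["seq"].is_monotonic_increasing for _, block in stable.groupby("grp")
)
print(ties_in_order)  # True
```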
5.2 .sort_values() examples
The Examples section of the same help output is worth reading through first:
    Examples
    --------
    >>> df = pd.DataFrame({
    ...     'col1': ['A', 'A', 'B', np.nan, 'D', 'C'],
    ...     'col2': [2, 1, 9, 8, 7, 4],
    ...     'col3': [0, 1, 9, 4, 2, 3],
    ...     'col4': ['a', 'B', 'c', 'D', 'e', 'F']
    ... })
    >>> df
      col1  col2  col3 col4
    0    A     2     0    a
    1    A     1     1    B
    2    B     9     9    c
    3  NaN     8     4    D
    4    D     7     2    e
    5    C     4     3    F

    Sort by col1

    >>> df.sort_values(by=['col1'])
      col1  col2  col3 col4
    0    A     2     0    a
    1    A     1     1    B
    2    B     9     9    c
    5    C     4     3    F
    4    D     7     2    e
    3  NaN     8     4    D

    Sort by multiple columns

    >>> df.sort_values(by=['col1', 'col2'])
      col1  col2  col3 col4
    1    A     1     1    B
    0    A     2     0    a
    2    B     9     9    c
    5    C     4     3    F
    4    D     7     2    e
    3  NaN     8     4    D

    Sort Descending

    >>> df.sort_values(by='col1', ascending=False)
      col1  col2  col3 col4
    4    D     7     2    e
    5    C     4     3    F
    2    B     9     9    c
    0    A     2     0    a
    1    A     1     1    B
    3  NaN     8     4    D

    Putting NAs first

    >>> df.sort_values(by='col1', ascending=False, na_position='first')
      col1  col2  col3 col4
    3  NaN     8     4    D
    4    D     7     2    e
    5    C     4     3    F
    2    B     9     9    c
    0    A     2     0    a
    1    A     1     1    B

    Sorting with a key function

    >>> df.sort_values(by='col4', key=lambda col: col.str.lower())
      col1  col2  col3 col4
    0    A     2     0    a
    1    A     1     1    B
    2    B     9     9    c
    3  NaN     8     4    D
    4    D     7     2    e
    5    C     4     3    F

    Natural sort with the key argument,
    using the `natsort <https://github.com/SethMMorton/natsort>` package.

    >>> df = pd.DataFrame({
    ...    "time": ['0hr', '128hr', '72hr', '48hr', '96hr'],
    ...    "value": [10, 20, 30, 40, 50]
    ... })
    >>> df
        time  value
    0    0hr     10
    1  128hr     20
    2   72hr     30
    3   48hr     40
    4   96hr     50
    >>> from natsort import index_natsorted
    >>> df.sort_values(
    ...    by="time",
    ...    key=lambda x: np.argsort(index_natsorted(df["time"]))
    ... )
        time  value
    0    0hr     10
    3   48hr     40
    2   72hr     30
    4   96hr     50
    1  128hr     20
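The natural-sort example depends on the third-party natsort package. If, as assumed here, every value has the form "&lt;digits&gt;hr", a vectorized key built from Series.str.extract produces the same order without any extra dependency. This is a swapped-in technique of mine, not the docstring's, and it would not handle arbitrary mixed strings the way natsort does.

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["0hr", "128hr", "72hr", "48hr", "96hr"],
    "value": [10, 20, 30, 40, 50],
})

# Plain string sort compares character by character: "128hr" < "48hr"
plain = df.sort_values(by="time")

# Assumption: every value is "<digits>hr", so the numeric prefix decides the order.
# The key is vectorized: it receives the whole "time" Series and returns a
# same-length Series of ints, which pandas sorts instead of the strings.
result = df.sort_values(
    by="time",
    key=lambda s: s.str.extract(r"(\d+)", expand=False).astype(int),
)
print(plain)
print(result)
```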
A small piece of code of my own makes all of this clear:
import pandas as pd

dict_data = {"X": ["a", "d", "g", "f", "i", "n"],
             "Y": ["M", "N", "D", "A", "C", "Y"],
             "Z": ["X", "DSF", "DST", "XX", "FGDDSFG", "B"]}
df = pd.DataFrame.from_dict(dict_data)
df.index = ["01", "002", "03", "004", "005", "006"]
print(("*" * 20 + " Initial DataFrame " + "*" * 20).ljust(80))
print(df)
# Sort by the values of column Y, descending
print(("*" * 15 + " Result of sort_values(by='Y', ascending=False) " + "*" * 15).ljust(80))
print(df.sort_values(by="Y", ascending=False))
# Sort by column X ascending first, then by column Z descending
print(("*" * 20 + " Result of sort_values(by=['X', 'Z'], ascending=[True, False]) " + "*" * 20).ljust(80))
print(df.sort_values(by=["X", "Z"], ascending=[True, False]))
# Sort by the length of the strings in column Z (key requires pandas >= 1.1.0)
print(("*" * 15 + " Result of sort_values(by='Z', key=lambda x: x.str.len()) " + "*" * 15).ljust(80))
print(df.sort_values(by="Z", key=lambda x: x.str.len()))
The output is exactly as expected:
******************** Initial DataFrame ********************
     X  Y        Z
01   a  M        X
002  d  N      DSF
03   g  D      DST
004  f  A       XX
005  i  C  FGDDSFG
006  n  Y        B
*************** Result of sort_values(by='Y', ascending=False) ***************
     X  Y        Z
006  n  Y        B
002  d  N      DSF
01   a  M        X
03   g  D      DST
005  i  C  FGDDSFG
004  f  A       XX
******************** Result of sort_values(by=['X', 'Z'], ascending=[True, False]) ********************
     X  Y        Z
01   a  M        X
002  d  N      DSF
004  f  A       XX
03   g  D      DST
005  i  C  FGDDSFG
006  n  Y        B
*************** Result of sort_values(by='Z', key=lambda x: x.str.len()) ***************
     X  Y        Z
01   a  M        X
006  n  Y        B
004  f  A       XX
002  d  N      DSF
03   g  D      DST
005  i  C  FGDDSFG
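In the example above, column X contains no repeated values, so the descending order requested for Z as the second sort key never affects the result. The sketch below uses hypothetical data in which X does repeat, so the second key and its direction actually come into play.

```python
import pandas as pd

# Hypothetical data: "X" repeats, so ties on X are broken by "Z"
df = pd.DataFrame(
    {"X": ["b", "a", "b", "a"], "Z": ["XX", "B", "DSF", "X"]},
    index=["r1", "r2", "r3", "r4"],
)

# X ascending, then Z descending inside each group of equal X values
result = df.sort_values(by=["X", "Z"], ascending=[True, False])

# For comparison: both keys ascending
both_asc = df.sort_values(by=["X", "Z"])

print(result)
print(both_asc)
```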