Pandas module - Manipulating data (4) - Sorting data - .sort_index()

江南野栀子

Last modified 2024-05-22 14:45:31

Category: Python data analysis. Tags: python, pandas, data analysis

First published 2021-12-07 17:04:26

Article link: https://blog.csdn.net/u010701274/article/details/121771916

4. Using .sort_index() to sort data along an axis

4.2.5 Using na_position

4.2.6 Using ignore_index

4.2.6 Using key

4.2.7 More on key: problem code 1) and its solution

4.2.8 More on key: problem code 2), solution unknown

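4. Using .sort_index() to sort data along an axis

df.sort_index() sorts a DataFrame by its labels: the row labels with axis=0, the column labels with axis=1. In older pandas versions it could also sort by column values through the by parameter, which overlapped with df.sort_values(); that parameter has since been removed. Use df.sort_index() only for sorting by row labels or column labels, and df.sort_values() for every other kind of sorting.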
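To make the distinction concrete, here is a minimal sketch on a small made-up DataFrame (the name df_demo, the column score and the labels are illustrative and do not come from the examples later in this article). sort_index() reorders the rows by their labels, while sort_values() reorders them by the contents of a column.

```python
import pandas as pd

# Hypothetical toy data: labels and values are deliberately out of order.
df_demo = pd.DataFrame({"score": [3, 1, 2]}, index=["b", "c", "a"])

by_label = df_demo.sort_index()          # rows ordered by index label: a, b, c
by_value = df_demo.sort_values("score")  # rows ordered by column value: 1, 2, 3

print(by_label)
print(by_value)
```

The same rows come back in both cases; only the sort criterion differs.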

4.1 .sort_index() syntax

Syntax. The current signature is:

DataFrame.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)

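You may still come across the older signature below in many places. Note the by parameter in particular, which has since been removed.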

sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, by=None)

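Parameters:
axis: defaults to 0, which sorts by the row index; set axis=1 to sort by the column labels.
level: defaults to None; otherwise the sort uses the values of the given index level(s).
ascending: defaults to True (ascending order); set it to False for descending order.
inplace: defaults to False; if True, the sorted data replaces the original DataFrame and the method returns None.
kind: sorting algorithm, {'quicksort', 'mergesort', 'heapsort'}, default 'quicksort'. The user can pick one.
na_position: {'first', 'last'}, default 'last'. 'first' puts NaN at the beginning, 'last' puts NaN at the end.
sort_remaining: bool, default True. When sorting a multilevel index by level, the other levels are also sorted (in order) after the specified level.
~~by: sorts by the values of one or more columns; its use was discouraged and it has been removed.~~

ignore_index: bool, default False. If True, the resulting axis is relabeled 0, 1, …, n - 1. Added in pandas 1.0.0.

key: a callable that is applied to the index values before sorting. It is similar to the key argument of the built-in sorted() function, except that it must be vectorized: it receives an Index and should return an Index of the same shape. Added in pandas 1.1.0.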
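The level and sort_remaining parameters only matter for a MultiIndex, which the examples in this article do not use. The following is a minimal sketch under that assumption; the level names outer and inner and the DataFrame df_multi are made up for illustration.

```python
import pandas as pd

# Hypothetical two-level index; the level names "outer" and "inner" are made up.
idx = pd.MultiIndex.from_tuples(
    [("b", 2), ("a", 2), ("b", 1), ("a", 1)], names=["outer", "inner"]
)
df_multi = pd.DataFrame({"v": [0, 1, 2, 3]}, index=idx)

# Sort on "inner" first; sort_remaining=True (the default) then sorts "outer".
by_inner = df_multi.sort_index(level="inner")

# With sort_remaining=False only the "inner" level is guaranteed to be ordered.
inner_only = df_multi.sort_index(level="inner", sort_remaining=False)

print(by_inner)
print(inner_only)
```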

Help on method sort_index in module pandas.core.frame:

sort_index(axis=0, level=None, ascending: 'Union[Union[bool, int], Sequence[Union[bool, int]]]' = True, inplace: 'bool' = False, kind: 'str' = 'quicksort', na_position: 'str' = 'last', sort_remaining: 'bool' = True, ignore_index: 'bool' = False, key: 'IndexKeyFunc' = None) method of pandas.core.frame.DataFrame instance
    Sort object by labels (along an axis).
    
    Returns a new DataFrame sorted by label if `inplace` argument is
    ``False``, otherwise updates the original DataFrame and returns None.
    
    Parameters
    ----------
    axis : {0 or 'index', 1 or 'columns'}, default 0
        The axis along which to sort.  The value 0 identifies the rows,
        and 1 identifies the columns.
    level : int or level name or list of ints or list of level names
        If not None, sort on values in specified index level(s).
    ascending : bool or list-like of bools, default True
        Sort ascending vs. descending. When the index is a MultiIndex the
        sort direction can be controlled for each level individually.
    inplace : bool, default False
        If True, perform operation in-place.
    kind : {'quicksort', 'mergesort', 'heapsort'}, default 'quicksort'
        Choice of sorting algorithm. See also ndarray.np.sort for more
        information.  `mergesort` is the only stable algorithm. For
        DataFrames, this option is only applied when sorting on a single
        column or label.
    na_position : {'first', 'last'}, default 'last'
        Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.
        Not implemented for MultiIndex.
    sort_remaining : bool, default True
        If True and sorting by level and index is multilevel, sort by other
        levels too (in order) after sorting by specified level.
    ignore_index : bool, default False
        If True, the resulting axis will be labeled 0, 1, …, n - 1.
    
        .. versionadded:: 1.0.0
    
    key : callable, optional
        If not None, apply the key function to the index values
        before sorting. This is similar to the `key` argument in the
        builtin :meth:`sorted` function, with the notable difference that
        this `key` function should be *vectorized*. It should expect an
        ``Index`` and return an ``Index`` of the same shape. For MultiIndex
        inputs, the key is applied *per level*.
    
        .. versionadded:: 1.1.0
    
    Returns
    -------
    DataFrame or None
        The original DataFrame sorted by the labels or None if ``inplace=True``.
    
    See Also
    --------
    Series.sort_index : Sort Series by the index.
    DataFrame.sort_values : Sort DataFrame by the value.
    Series.sort_values : Sort Series by the value.
    
    Examples
    --------
    >>> df = pd.DataFrame([1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150],
    ...                   columns=['A'])
    >>> df.sort_index()
         A
    1    4
    29   2
    100  1
    150  5
    234  3
    
    By default, it sorts in ascending order, to sort in descending order,
    use ``ascending=False``
    
    >>> df.sort_index(ascending=False)
         A
    234  3
    150  5
    100  1
    29   2
    1    4
    
    A key function can be specified which is applied to the index before
    sorting. For a ``MultiIndex`` this is applied to each level separately.
    
    >>> df = pd.DataFrame({"a": [1, 2, 3, 4]}, index=['A', 'b', 'C', 'd'])
    >>> df.sort_index(key=lambda x: x.str.lower())
       a
    A  1
    b  2
    C  3
    d  4
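
The help text stresses that key must be vectorized: it receives the whole Index rather than one label at a time. A small sketch of what that means in practice, assuming pandas 1.1.0 or later and a made-up DataFrame df_str whose index holds numbers stored as strings:

```python
import pandas as pd

# Hypothetical data: the index holds numbers stored as strings.
df_str = pd.DataFrame({"v": [1, 2, 3]}, index=["10", "9", "100"])

# Without a key, string labels are compared lexicographically.
plain = df_str.sort_index()

# The key receives the whole Index and returns an Index of the same length.
numeric = df_str.sort_index(key=lambda idx: idx.astype(int))

print(plain)
print(numeric)
```

The key only affects the ordering; the labels in the result are still the original strings.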

4.2 .sort_index() examples

Prepare the data first

Before trying out the API, create some test data:

Code:

import numpy as np
import pandas as pd
dict_data={"a":list("abcdef"),"b":list("defghi"),"c":list("ghijkl")}
df=pd.DataFrame.from_dict(dict_data)
df

Output:

Out[1]:

	a	b	c
0	a	d	g
1	b	e	h
2	c	f	i
3	d	g	j
4	e	h	k
5	f	i	l

4.2.1 Using axis

axis=0 sorts by the row labels; axis=1 sorts by the column labels.

In [27]: df.sort_index(axis=0,ascending=False)

Out[27]:

   a  b  c
5  f  i  l
4  e  h  k
3  d  g  j
2  c  f  i
1  b  e  h
0  a  d  g

In [28]: df.sort_index(axis=1,ascending=False)

Out[28]:

   c  b  a
0  g  d  a
1  h  e  b
2  i  f  c
3  j  g  d
4  k  h  e
5  l  i  f

4.2.2 Using ascending

In [25]: df.sort_index(ascending=False)

Out[25]:

   a  b  c
5  f  i  l
4  e  h  k
3  d  g  j
2  c  f  i
1  b  e  h
0  a  d  g

In [26]: df.sort_index(ascending=True)

Out[26]:

   a  b  c
0  a  d  g
1  b  e  h
2  c  f  i
3  d  g  j
4  e  h  k
5  f  i  l

4.2.3 Using inplace

inplace defaults to False, which leaves the original DataFrame unchanged; with inplace=True the original DataFrame itself is modified. For example:

df.sort_index(axis=1,ascending=False,inplace=True)
print(df)

The output is shown below; df itself has been changed:

   c  b  a
0  g  d  a
1  h  e  b
2  i  f  c
3  j  g  d
4  k  h  e
5  l  i  f
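
As the help text in 4.1 says, the method returns None when inplace=True. The sketch below contrasts the two modes on a made-up DataFrame (df_inp is an illustrative name, not part of the example above).

```python
import pandas as pd

# Hypothetical data with an unsorted integer index.
df_inp = pd.DataFrame({"v": [1, 2, 3]}, index=[3, 1, 2])

sorted_copy = df_inp.sort_index()          # returns a new, sorted DataFrame
order_after_copy = df_inp.index.tolist()   # original is untouched: [3, 1, 2]

ret = df_inp.sort_index(inplace=True)      # sorts df_inp itself
print(ret)                                 # None
print(df_inp.index.tolist())               # [1, 2, 3]
```

Because the in-place call returns None, writing df = df.sort_index(inplace=True) would replace df with None.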

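4.2.4 Using kind

kind: sorting algorithm, {'quicksort', 'mergesort', 'heapsort'}, default 'quicksort'. The user can pick one.

The choice mostly matters when sorting large amounts of data, and it also depends on the characteristics of the data. According to the help text above, mergesort is the only stable algorithm of the three. Picking an algorithm requires some background in sorting, which is not covered here.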
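
One property of kind that can be observed even on tiny data is stability. The sketch below assumes a made-up DataFrame df_dup with duplicate index labels; with kind='mergesort', rows that share a label keep their original relative order.

```python
import pandas as pd

# Hypothetical data with duplicate index labels.
df_dup = pd.DataFrame(
    {"v": ["first 2", "first 1", "second 2", "second 1"]},
    index=[2, 1, 2, 1],
)

# mergesort is stable: rows with equal labels keep their original relative order.
stable = df_dup.sort_index(kind="mergesort")
print(stable)
```

With the default quicksort the relative order of rows with equal labels is not guaranteed.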

4.2.5 Using na_position

na_position: {'first', 'last'}, default 'last'

'first' puts NaNs at the beginning; 'last' puts NaNs at the end.

Not implemented for MultiIndex.
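
A minimal sketch of both settings, assuming a made-up DataFrame df_nan whose row index contains one NaN label:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one index label is missing (NaN).
df_nan = pd.DataFrame({"v": [10, 20, 30]}, index=[2.0, np.nan, 1.0])

nan_last = df_nan.sort_index()                      # default na_position='last'
nan_first = df_nan.sort_index(na_position="first")  # NaN label moved to the top

print(nan_last)
print(nan_first)
```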

4.2.6 Using ignore_index

ignore_index: bool, default False. If True, the resulting axis is relabeled 0, 1, …, n - 1.

Here is the code:

dict_data={"a":list("abcdef"),"b":list("defghi"),"c":list("ghijkl")}
df=pd.DataFrame.from_dict(dict_data)
print(("*"*20+" Initial DataFrame "+"*"*20).ljust(80))
print(df)
print(("*"*20+" DataFrame after changing the index "+"*"*20).ljust(80))
df.index=["01","002","03","004","005","006"]
print(df)
print(("*"*20+" DataFrame after also changing the columns "+"*"*20).ljust(80))
df.columns=["X","Y","Z"]
print(df)
print(("*"*20+" sort_index, axis=0, ignore_index=True "+"*"*20).ljust(80))
print(df.sort_index(ignore_index=True,axis=0))
print(("*"*20+" sort_index, axis=1, ignore_index=True "+"*"*20).ljust(80))
print(df.sort_index(ignore_index=True,axis=1)) # axis appears to have no effect here
print(("*"*20+" Current DataFrame "+"*"*20).ljust(80))
print(df)

Output:

******************** Initial DataFrame ********************
   a  b  c
0  a  d  g
1  b  e  h
2  c  f  i
3  d  g  j
4  e  h  k
5  f  i  l
******************** DataFrame after changing the index ********************
     a  b  c
01   a  d  g
002  b  e  h
03   c  f  i
004  d  g  j
005  e  h  k
006  f  i  l
******************** DataFrame after also changing the columns ********************
     X  Y  Z
01   a  d  g
002  b  e  h
03   c  f  i
004  d  g  j
005  e  h  k
006  f  i  l
******************** sort_index, axis=0, ign
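
A smaller sketch of the axis=0 case, using made-up data (df_ign is an illustrative name) rather than the DataFrame above and assuming pandas 1.0.0 or later: the rows are sorted by label first, and with ignore_index=True the labels are then replaced by 0, 1, …, n - 1.

```python
import pandas as pd

# Hypothetical data with string row labels.
df_ign = pd.DataFrame({"v": [1, 2, 3]}, index=["b", "c", "a"])

kept = df_ign.sort_index()                    # labels preserved: a, b, c
reset = df_ign.sort_index(ignore_index=True)  # labels replaced by 0, 1, 2

print(kept)
print(reset)
```

The row order is the same in both results; only the labels differ.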
