Pandas Library Study Notes (6): Sorting Series and DataFrame Data in Pandas


敲代码的小风


Column: Pandas库学习笔记 (Pandas library study notes) · Tags: python, data analysis, pandas

Article link: https://blog.csdn.net/m0_46653437/article/details/110451095


Reference: Python数据分析与展示 (Python Data Analysis and Presentation)
Reference: pandas official website
Reference: User Guide
Reference: Getting started tutorials

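Sorting by index:

.sort_index() sorts the data along the specified axis by its index labels. The default order is ascending. With axis=0 it reorders the rows by the row labels. With axis=1 it reorders the columns by the column labels.
.sort_index(axis=0, ascending=True)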

Demo 1:

Microsoft Windows [Version 10.0.18363.1198]
(c) 2019 Microsoft Corporation. All rights reserved.

C:\Users\chenxuqi>python
Python 3.7.4 (tags/v3.7.4:e09359112e, Jul  8 2019, 20:34:20) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import numpy as np
>>> b = pd.DataFrame(np.arange(20).reshape(4,5),index=["c","a","d","b"])
>>> b
    0   1   2   3   4
c   0   1   2   3   4
a   5   6   7   8   9
d  10  11  12  13  14
b  15  16  17  18  19
>>> # Sort by the row index in ascending order, ignoring the data values
... b.sort_index()
    0   1   2   3   4
a   5   6   7   8   9
b  15  16  17  18  19
c   0   1   2   3   4
d  10  11  12  13  14
>>> # Sort by the row index in descending order, ignoring the data values
... b.sort_index(ascending=False)
    0   1   2   3   4
d  10  11  12  13  14
c   0   1   2   3   4
b  15  16  17  18  19
a   5   6   7   8   9
>>> b
    0   1   2   3   4
c   0   1   2   3   4
a   5   6   7   8   9
d  10  11  12  13  14
b  15  16  17  18  19
>>> c = b.sort_index(axis=1,ascending=False)
>>> c
    4   3   2   1   0
c   4   3   2   1   0
a   9   8   7   6   5
d  14  13  12  11  10
b  19  18  17  16  15
>>> c = c.sort_index()
>>> c
    4   3   2   1   0
a   9   8   7   6   5
b  19  18  17  16  15
c   4   3   2   1   0
d  14  13  12  11  10
>>>
>>>
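The demo above uses a DataFrame. The title also covers Series, and sort_index behaves the same way there, except that a Series has only axis 0. Below is a minimal standalone script with made-up data. It also confirms that the call returns a new object and leaves the original untouched.

```python
import pandas as pd

# Made-up Series whose labels are deliberately out of order
s = pd.Series([3, 1, 4, 1], index=["c", "a", "d", "b"])

asc = s.sort_index()                  # labels sorted a, b, c, d; values travel with their labels
desc = s.sort_index(ascending=False)  # labels sorted d, c, b, a

print(asc.index.tolist())   # ['a', 'b', 'c', 'd']
print(asc.tolist())         # [1, 1, 3, 4]
print(desc.index.tolist())  # ['d', 'c', 'b', 'a']
print(s.index.tolist())     # ['c', 'a', 'd', 'b']  (original order is unchanged)
```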

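Sorting by values:

.sort_values() sorts the data along the specified axis by the values themselves. The default order is ascending.
Series.sort_values(axis=0, ascending=True)
DataFrame.sort_values(by, axis=0, ascending=True)
by: the label, or list of labels, whose values determine the order. With axis=0, these are column labels and the rows are reordered. With axis=1, these are row (index) labels and the columns are reordered.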
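`by` also accepts a list of labels. The rows are ordered by the first label, and ties are broken by the next one. `ascending` can likewise be a list, with one flag per label. Below is a small sketch with made-up data. The frame and its labels are purely illustrative.

```python
import pandas as pd

# Made-up frame: column "x" has ties (p/r share 2, q/s share 1)
df = pd.DataFrame({"x": [2, 1, 2, 1], "y": [9, 8, 7, 6]},
                  index=["p", "q", "r", "s"])

# Sort by "x" first, then break ties with "y" (both ascending)
by_two = df.sort_values(by=["x", "y"])
print(by_two.index.tolist())  # ['s', 'q', 'r', 'p']

# "x" ascending, but ties broken by "y" descending
mixed = df.sort_values(by=["x", "y"], ascending=[True, False])
print(mixed.index.tolist())   # ['q', 's', 'p', 'r']
```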

Demo 2, sorting a Series:

>>> # Sort by the data values themselves
... bSeries = pd.Series([99,44,203,34,56],index=['a','b','c','d','e'])
>>> bSeries
a     99
b     44
c    203
d     34
e     56
dtype: int64
>>> bSeries.sort_values(axis=0,ascending=True)
d     34
b     44
e     56
a     99
c    203
dtype: int64
>>> bSeries.sort_values(axis=0,ascending=False)
c    203
a     99
e     56
b     44
d     34
dtype: int64
>>>
>>>
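One way to picture what Series.sort_values does is to reorder the (label, value) pairs by the argsort of the values. Each label travels with its value, as the demo output shows. The sketch below uses the demo's data. It is only a conceptual check, not necessarily the code path pandas follows internally.

```python
import numpy as np
import pandas as pd

s = pd.Series([99, 44, 203, 34, 56], index=["a", "b", "c", "d", "e"])

# Positions of the values in ascending order
order = np.argsort(s.to_numpy())
manual = s.iloc[order]  # reorder labels and values together

print(manual.index.tolist())  # ['d', 'b', 'e', 'a', 'c']
pd.testing.assert_series_equal(manual, s.sort_values())

# Descending: reversing the ascending positions is safe here because all values are distinct
pd.testing.assert_series_equal(s.iloc[order[::-1]], s.sort_values(ascending=False))
```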

Demo 3, sorting a DataFrame:

Microsoft Windows [Version 10.0.18363.1198]
(c) 2019 Microsoft Corporation. All rights reserved.

C:\Users\chenxuqi>python
Python 3.7.4 (tags/v3.7.4:e09359112e, Jul  8 2019, 20:34:20) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import numpy as np
>>> b = pd.DataFrame(np.arange(20).reshape(4,5),index=['c','a','d','b'])
>>> b
    0   1   2   3   4
c   0   1   2   3   4
a   5   6   7   8   9
d  10  11  12  13  14
b  15  16  17  18  19
>>> c = b.sort_values(2,ascending=False)
>>> c
    0   1   2   3   4
b  15  16  17  18  19
d  10  11  12  13  14
a   5   6   7   8   9
c   0   1   2   3   4
>>> c = c.sort_values("a",axis=1,ascending=False)
>>> c
    4   3   2   1   0
b  19  18  17  16  15
d  14  13  12  11  10
a   9   8   7   6   5
c   4   3   2   1   0
>>>
>>>
>>> np.random.seed(20200910)
>>> b = pd.DataFrame(np.random.randint(0,100,(4,5)),index=['c','a','d','b'])
>>> b
    0   1   2   3   4
c  48  10  29  84  20
a  48   9  22  12   6
d  11  35  24   7  85
b  99  88  84  42  42
>>> b.sort_values(2,ascending=True)
    0   1   2   3   4
a  48   9  22  12   6
d  11  35  24   7  85
c  48  10  29  84  20
b  99  88  84  42  42
>>> c = b.sort_values(2,ascending=False)
>>> c
    0   1   2   3   4
b  99  88  84  42  42
c  48  10  29  84  20
d  11  35  24   7  85
a  48   9  22  12   6
>>> c = c.sort_values("a",axis=1,ascending=False)
>>> c
    0   2   3   1   4
b  99  84  42  88  42
c  48  29  84  10  20
d  11  24   7  35  85
a  48  22  12   9   6
>>> c.sort_values("a",axis=1,ascending=True)
    4   1   3   2   0
b  42  88  42  84  99
c  20  10  84  29  48
d  85  35   7  24  11
a   6   9  12  22  48
>>>
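The calls like c.sort_values("a", axis=1, ...) in the demo reorder the columns by the values in row "a". A way to check this reading is to transpose, sort the rows by "a", and transpose back. The sketch below uses a made-up frame with distinct values in row "a", so tie-breaking order plays no role.

```python
import pandas as pd

# Made-up frame; row "a" has distinct values so the order is unambiguous
df = pd.DataFrame([[3, 1, 2], [30, 10, 20]],
                  index=["a", "b"], columns=["x", "y", "z"])

by_row = df.sort_values("a", axis=1)  # reorder columns by row "a"
via_t = df.T.sort_values("a").T       # same result via transpose

print(by_row.columns.tolist())  # ['y', 'z', 'x']
print(by_row.loc["b"].tolist()) # [10, 20, 30]  (row "b" follows the column order)
pd.testing.assert_frame_equal(by_row, via_t)
```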

When sorting, NaN values are placed at the end by default, whether the order is ascending or descending:

>>>
... # By default, NaN is placed at the end when sorting
... np.random.seed(20200910)
>>> a = pd.DataFrame(np.random.randint(0,100,(3,4)),index=['a','b','c'])
>>> a
    0   1   2   3
a  48  10  29  84
b  20  48   9  22
c  12   6  11  35
>>> b = pd.DataFrame(np.random.randint(0,100,(4,5)),index=['c','a','d','b'])
>>> b
    0   1   2   3   4
c  24   7  85  99  88
a  84  42  42  48  12
d  58  15  42  36  53
b   8  83  19  48  28
>>> c = a + b
>>> c
       0      1     2      3   4
a  132.0   52.0  71.0  132.0 NaN
b   28.0  131.0  28.0   70.0 NaN
c   36.0   13.0  96.0  134.0 NaN
d    NaN    NaN   NaN    NaN NaN
>>> c.sort_values(2,ascending=False)
       0      1     2      3   4
c   36.0   13.0  96.0  134.0 NaN
a  132.0   52.0  71.0  132.0 NaN
b   28.0  131.0  28.0   70.0 NaN
d    NaN    NaN   NaN    NaN NaN
>>> c.sort_values(2,ascending=True)
       0      1     2      3   4
b   28.0  131.0  28.0   70.0 NaN
a  132.0   52.0  71.0  132.0 NaN
c   36.0   13.0  96.0  134.0 NaN
d    NaN    NaN   NaN    NaN NaN
>>>
>>>
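In the demo above, the all-NaN row "d" stays last in both directions. This behavior is controlled by the `na_position` parameter of `sort_values`, whose default is "last". The sketch below uses a small Series with made-up values. It also shows `na_position="first"`.

```python
import numpy as np
import pandas as pd

# Made-up Series with one missing value under label "b"
s = pd.Series([3.0, np.nan, 1.0, 2.0], index=["a", "b", "c", "d"])

asc = s.sort_values()                          # NaN last (default)
desc = s.sort_values(ascending=False)          # NaN still last
first = s.sort_values(na_position="first")     # NaN moved to the front

print(asc.index.tolist())    # ['c', 'd', 'a', 'b']
print(desc.index.tolist())   # ['a', 'd', 'c', 'b']
print(first.index.tolist())  # ['b', 'c', 'd', 'a']
```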