Sorting in pandas

wyc-

Article link: https://blog.csdn.net/qq_28120673/article/details/103473332

pandas supports three kinds of sorting:

sorting by index labels
sorting by column values
sorting by a combination of both

By index

The Series.sort_index() and DataFrame.sort_index() methods sort a pandas object by its index labels.

import numpy as np
import pandas as pd

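df = pd.DataFrame({
    'one': pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
    'two': pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
    'three': pd.Series(np.random.randn(3), index=['b', 'c', 'd']),
})

# Shuffle the rows and columns so there is something to sort.
# Note: 'teo' (apparently a typo for 'two') matches no existing column,
# so reindex creates it as an all-NaN column, as the output below shows.
unsorted_df = df.reindex(index=['a', 'd', 'c', 'b'],
                         columns=['three', 'teo', 'one'])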

unsorted_df

	three	teo	one
a	NaN	NaN	1.474084
d	2.021092	NaN	NaN
c	0.057446	NaN	-0.085553
b	0.705659	NaN	-0.101295

unsorted_df.sort_index()

	three	teo	one
a	NaN	NaN	1.474084
b	0.705659	NaN	-0.101295
c	0.057446	NaN	-0.085553
d	2.021092	NaN	NaN

unsorted_df.sort_index(ascending=False)

	three	teo	one
d	2.021092	NaN	NaN
c	0.057446	NaN	-0.085553
b	0.705659	NaN	-0.101295
a	NaN	NaN	1.474084

unsorted_df.sort_index(axis=1)  # axis=1 sorts by column labels instead of row labels

	one	teo	three
a	1.474084	NaN	NaN
d	NaN	NaN	2.021092
c	-0.085553	NaN	0.057446
b	-0.101295	NaN	0.705659
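Because the frame above is built from random numbers, its values differ on every run. The following is a small deterministic sketch of the same calls. It also shows Series.sort_index(), which the text mentions but does not demonstrate. The names `prices` and `frame` and all of their values are invented for illustration.

```python
import pandas as pd

# Hypothetical data with a deliberately unordered index.
prices = pd.Series([3.0, 1.0, 2.0], index=["c", "a", "b"])

# Sort by index label: ascending is the default, descending on request.
asc = prices.sort_index()
desc = prices.sort_index(ascending=False)

frame = pd.DataFrame(
    {"two": [1, 2], "one": [3, 4], "three": [5, 6]},
    index=["y", "x"],
)

# axis=0 (the default) reorders rows by row label;
# axis=1 reorders columns by column label.
rows_sorted = frame.sort_index()
cols_sorted = frame.sort_index(axis=1)

print(asc.index.tolist())            # ['a', 'b', 'c']
print(desc.index.tolist())           # ['c', 'b', 'a']
print(rows_sorted.index.tolist())    # ['x', 'y']
print(cols_sorted.columns.tolist())  # ['one', 'three', 'two']
```

In every case the values travel with their labels; only the order changes.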

By values

The Series.sort_values() method sorts a Series by its values.

The DataFrame.sort_values() method sorts a DataFrame by the values in one or more of its columns or rows.

df1 = pd.DataFrame({
    'one':[2,1,1,1],
    'two':[1,2,3,4],
    'three':[5,4,3,2]
})

df1.sort_values(by='two')

	one	two	three
0	2	1	5
1	1	2	4
2	1	3	3
3	1	4	2

df1.sort_values(by=['one','two'])

	one	two	three
1	1	2	4
2	1	3	3
3	1	4	2
0	2	1	5
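As a sketch of two options the examples above do not cover: `ascending` accepts one flag per sort key, and `axis=1` sorts the columns by the values in a given row. The frame is the same `df1` as above, rebuilt here so the snippet runs on its own; the variable names `mixed` and `by_row` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({
    "one": [2, 1, 1, 1],
    "two": [1, 2, 3, 4],
    "three": [5, 4, 3, 2],
})

# Sort by 'one' ascending, then break ties by 'two' descending.
mixed = df1.sort_values(by=["one", "two"], ascending=[True, False])
print(mixed.index.tolist())     # [3, 2, 1, 0]

# With axis=1, `by` names a row label, and the columns are reordered
# according to the values found in that row (row 0 holds 2, 1, 5).
by_row = df1.sort_values(by=0, axis=1)
print(by_row.columns.tolist())  # ['two', 'one', 'three']
```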

These methods give NA values special treatment through the na_position parameter:

s = pd.Series(['a', 'a', 'b', 'b', 'a', 'a', np.nan, 'c', 'd', 'a'])

s[2] = np.nan

s.sort_values()

0      a
1      a
4      a
5      a
9      a
3      b
7      c
8      d
2    NaN
6    NaN
dtype: object

s.sort_values(na_position='first')

2    NaN
6    NaN
0      a
1      a
4      a
5      a
9      a
3      b
7      c
8      d
dtype: object
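The same parameter is available on DataFrame.sort_values(). A minimal sketch, using a made-up single-column frame named `scores`: missing values go last by default, even when sorting in descending order, and `na_position='first'` moves them to the top.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing values.
scores = pd.DataFrame({"score": [2.0, np.nan, 1.0, np.nan, 3.0]})

last = scores.sort_values(by="score")                         # NaN rows at the end (default)
first = scores.sort_values(by="score", na_position="first")   # NaN rows at the start
desc = scores.sort_values(by="score", ascending=False)        # descending, NaN still at the end

print(last.index.tolist())
print(first.index.tolist())
print(desc.index.tolist())
```

Note that `ascending` only affects the non-missing values; where the NaN rows end up is controlled by `na_position` alone.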

By indexes and values

Strings passed as the by argument to DataFrame.sort_values() may refer to either column names or index level names.

idx = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('a', 2), ('b', 2), ('b', 1), ('b', 1)])

idx.names = ['first', 'second']

idx

MultiIndex([('a', 1),
            ('a', 2),
            ('a', 2),
            ('b', 2),
            ('b', 1),
            ('b', 1)],
           names=['first', 'second'])

df_multi = pd.DataFrame({'A': np.arange(6, 0, -1)},index=idx)

df_multi

		A
first	second
a	1	6
	2	5
	2	4
b	2	3
	1	2
	1	1

df_multi.sort_values(by=['second', 'A'])

		A
first	second
b	1	1
b	1	2
a	1	6
b	2	3
a	2	4
a	2	5

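If a string matches both a column name and an index level name, older pandas versions issue a warning and the column takes precedence; more recent versions raise a ValueError because the label is ambiguous.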
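The name clash can be sketched as follows. The frame and the name `key` are hypothetical, and what the first call does depends on the installed pandas version, so the snippet only records the outcome rather than assuming it. The two calls after it are one possible way to remove the ambiguity, not the only one.

```python
import pandas as pd

# Hypothetical frame whose index level and column share the name 'key'.
df = pd.DataFrame(
    {"key": [10, 30, 20]},
    index=pd.Index([2, 1, 3], name="key"),
)

try:
    df.sort_values(by="key")
    outcome = "sorted: the column took precedence"
except ValueError:
    outcome = "ValueError: the label is ambiguous"
print(outcome)

# Drop the index name to sort by the column ...
by_column = df.rename_axis(None).sort_values(by="key")
# ... or rename the index level to sort by the index.
by_level = df.rename_axis("key_level").sort_values(by="key_level")

print(by_column["key"].tolist())  # [10, 20, 30]
print(by_level["key"].tolist())   # [30, 10, 20]
```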

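searchsorted

Series has a searchsorted() method, which works similarly to numpy.ndarray.searchsorted().

For each element of the array passed in, it returns the position at which that element would have to be inserted into the Series to keep the Series sorted. (Nothing is actually inserted into the Series; the method only computes where each insertion would go.)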

ser = pd.Series([1, 2, 3])

ser.searchsorted([0,3])

array([0, 2], dtype=int64)

ser.searchsorted([0, 4])

array([0, 3], dtype=int64)

ser.searchsorted([1, 3], side='right')

array([1, 3], dtype=int64)

ser.searchsorted([1, 3], side='left')

array([0, 2], dtype=int64)

ser = pd.Series([3, 1, 2])
ser.searchsorted([0, 3], sorter=np.argsort(ser))

array([0, 2], dtype=int64)
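One common use of these insertion positions is bucketing values against sorted thresholds. The sketch below assumes a made-up grading scale (`thresholds`, `labels`) purely for illustration. It also shows how `side` decides which bucket a value gets when it equals a threshold.

```python
import numpy as np
import pandas as pd

# Hypothetical grade boundaries; they must already be sorted.
thresholds = pd.Series([60, 70, 80, 90])
labels = np.array(["F", "D", "C", "B", "A"])
scores = [55, 70, 85, 95]

# side='left' inserts before an equal element, side='right' after it,
# so the two only differ for the score that equals a threshold (70).
left = thresholds.searchsorted(scores, side="left")
right = thresholds.searchsorted(scores, side="right")
print(left)   # [0 1 3 4]
print(right)  # [0 2 3 4]

# With side='right', a score equal to a boundary falls into the higher bucket.
grades = labels[right]
print(grades.tolist())  # ['F', 'C', 'B', 'A']
```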

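smallest / largest values

Series has the nsmallest() and nlargest() methods, which return the n smallest or n largest values. For a large Series, this can be much faster than sorting the entire Series and taking the first n entries of the result.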

s = pd.Series(np.random.permutation(10))

s

0    9
1    2
2    8
3    6
4    0
5    7
6    3
7    4
8    1
9    5
dtype: int32

s.sort_values()

4    0
8    1
1    2
6    3
7    4
9    5
3    6
5    7
2    8
0    9
dtype: int32

s.nsmallest(4)

4    0
8    1
1    2
6    3
dtype: int32

s.nlargest(3)

0    9
2    8
5    7
dtype: int32
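A quick way to check the claim that nsmallest() is a shortcut for a full sort followed by head(n) is to compare the two. This sketch uses a seeded permutation so the result is reproducible; the seed and the size are arbitrary choices, and no timing is shown because it depends on the machine.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)           # fixed seed, arbitrary choice
s = pd.Series(rng.permutation(1000))     # the values 0..999 in random order

smallest = s.nsmallest(4)
via_sort = s.sort_values().head(4)       # same answer, but sorts all 1000 values

largest = s.nlargest(3)

print(smallest.tolist())  # [0, 1, 2, 3]
print(largest.tolist())   # [999, 998, 997]
print(smallest.index.tolist() == via_sort.index.tolist())  # True
```

To measure the difference on your own data, you could time both expressions with `timeit`.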

DataFrame also has the nlargest() and nsmallest() methods.

df = pd.DataFrame({
    'a':[-2,-1,1,10,8,11,-1],
    'b':list('abdceff'),
    'c':[1.,2.,4.,3.2,np.nan,3.,4.0]
})

df.nlargest(3,'a')

	a	b	c
5	11	f	3.0
3	10	c	3.2
4	8	e	NaN

df.nlargest(5,['a','c'])

	a	b	c
5	11	f	3.0
3	10	c	3.2
4	8	e	NaN
2	1	d	4.0
6	-1	f	4.0
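In the frame above, two rows share the value -1 in column 'a', so the fifth-largest row is a tie. The sketch below shows how the `keep` parameter handles that case when sorting by 'a' alone: the default keeps the first occurrence, while `keep='all'` returns every tied row, so the result may contain more than n rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [-2, -1, 1, 10, 8, 11, -1],
    "b": list("abdceff"),
    "c": [1.0, 2.0, 4.0, 3.2, np.nan, 3.0, 4.0],
})

# Default keep='first': of the two rows with a == -1, the earlier one (row 1) wins.
top5 = df.nlargest(5, "a")
print(top5.index.tolist())  # [5, 3, 4, 2, 1]

# keep='all': both tied rows are returned, so 6 rows come back for n=5.
top5_all = df.nlargest(5, "a", keep="all")
print(len(top5_all))        # 6
```

Passing `['a', 'c']` as in the example above is another way to resolve the tie, using column 'c' as a tiebreaker.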

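Sorting by a MultiIndex column

When the columns are a MultiIndex, you must be explicit about what to sort by, fully specifying every level of each column in by.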

df1

	one	two	three
0	2	1	5
1	1	2	4
2	1	3	3
3	1	4	2

df1.columns = pd.MultiIndex.from_tuples([
    ('a','one'),
    ('a','two'),
    ('b','three')
])

df1

	a		b
	one	two	three
0	2	1	5
1	1	2	4
2	1	3	3
3	1	4	2

df1.sort_values(by=('a','two'))

	a		b
	one	two	three
0	2	1	5
1	1	2	4
2	1	3	3
3	1	4	2
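Since the frame is already ordered by ('a', 'two'), the call above returns it unchanged. The sketch below rebuilds the same frame and sorts it in ways that do change the row order. It also tries a partial key; the exact exception type for that case is an assumption, so both likely types are caught.

```python
import pandas as pd

df1 = pd.DataFrame({
    "one": [2, 1, 1, 1],
    "two": [1, 2, 3, 4],
    "three": [5, 4, 3, 2],
})
df1.columns = pd.MultiIndex.from_tuples([
    ("a", "one"),
    ("a", "two"),
    ("b", "three"),
])

# Each sort key is a full tuple naming every level of the column.
descending = df1.sort_values(by=("a", "two"), ascending=False)
print(descending.index.tolist())  # [3, 2, 1, 0]

# Several keys are passed as a list of such tuples.
two_keys = df1.sort_values(by=[("a", "one"), ("b", "three")])
print(two_keys.index.tolist())    # [3, 2, 1, 0]

# A partial key such as 'one' does not identify a column on its own.
try:
    df1.sort_values(by="one")
    partial_key_failed = False
except (KeyError, ValueError):
    partial_key_failed = True
print(partial_key_failed)
```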
