pandas reindex: how to use pandas object methods

weixin_39582480

Tags: pandas reindex

Learn together, grow together!

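1. Reindexing

An important method on pandas objects is reindex, which creates a new object conformed to a new index. Calling reindex on a Series rearranges the data according to the new index. If an index value is not already present, a missing value (NaN) is introduced for it: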

1.1 reindex

In [69]: import numpy as np; from pandas import Series, DataFrame

In [70]: obj=Series([4.5,7.2,-5.3,3.6],index=['d','b','a','c'])

In [72]: obj2=obj.reindex(['a','b','c','d','e'])

In [73]: obj2
Out[73]:
a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64

1.2 Filling missing values with fill_value

In [75]: obj.reindex(['a','b','c','d','e'],fill_value=0)
Out[75]:
a   -5.3
b    7.2
c    3.6
d    4.5
e    0.0
dtype: float64

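1.3 Interpolation (filling)

For ordered data such as time series, you may want to fill in values when reindexing. The method option does this. For example, ffill forward-fills values: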

In [76]: obj3=Series(['blue','purple','yellow'],index=[0,2,4])

In [77]: obj3.reindex(range(6),method='ffill')
Out[77]:
0      blue
1      blue
2    purple
3    purple
4    yellow
5    yellow
dtype: object

Table 5-2-1: method (filling) options for reindex

ffill or pad: fill (or carry) values forward
bfill or backfill: fill (or carry) values backward
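The transcript above only shows ffill. As a minimal sketch of the other option in the table, the snippet below reindexes the same three-color Series with method='bfill'. The variable names colors and filled are made up for this illustration.

```python
import pandas as pd

# Same data as obj3 above; the names here are illustrative only.
colors = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])

# Backward fill: each new label takes the value of the next existing label.
# Label 5 has no later label to borrow from, so it stays missing.
filled = colors.reindex(range(6), method='bfill')
print(filled)
```

Note that the last label stays NaN under bfill, just as the first label would under ffill if it came before all existing labels.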

For a DataFrame, reindex can change the (row) index, the columns, or both. If only a sequence is passed, the rows are reindexed:

In [78]: frame=DataFrame(np.arange(9).reshape((3,3)),index=['a','c','d'],columns=['Ohio','Texas','California'])

In [79]: frame
Out[79]:
   Ohio  Texas  California
a     0      1           2
c     3      4           5
d     6      7           8
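The session above builds frame but does not show the reindex calls themselves. The following is a small sketch of reindexing rows, columns, and both at once. The new labels 'b' and 'Utah' are chosen only for illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])

# A single sequence reindexes the rows; 'b' is new, so its row is all NaN.
rows = frame.reindex(['a', 'b', 'c', 'd'])

# The columns keyword reindexes the columns; 'Utah' is new, so it is all NaN.
cols = frame.reindex(columns=['Texas', 'Utah', 'California'])

# Both axes can be reindexed in one call.
both = frame.reindex(index=['a', 'b', 'c', 'd'],
                     columns=['Texas', 'Utah', 'California'])

print(rows)
print(cols)
print(both)
```

Because NaN is a float, integer columns that gain missing values are converted to float64 in the result.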

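Table 5.2.1.1: arguments of the reindex function

index: new sequence to use as the index. Can be an Index instance or any other sequence-like Python data structure. An Index is used exactly as is, without any copying.
method: interpolation (fill) method.
fill_value: substitute value to use when reindexing introduces missing data.
limit: maximum number of consecutive elements to fill when forward- or backward-filling.
level: match a simple Index on the given level of a MultiIndex; otherwise select a subset.
copy: defaults to True, which always copies the data; if False, the data is not copied when the new index equals the old one.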
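To make the limit argument from the table more concrete, here is a minimal sketch. The Series s and its values are invented for the example; only the reindex call itself reflects the arguments described above.

```python
import pandas as pd

# Two known points with a gap of four labels between them (made-up data).
s = pd.Series([1.0, 2.0], index=[0, 5])

# Forward-fill, but carry each value across at most 2 consecutive new labels.
limited = s.reindex(range(6), method='ffill', limit=2)
print(limited)
```

Labels 1 and 2 receive the value from label 0, while labels 3 and 4 stay NaN because the fill limit has been reached.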

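2. Dropping entries from an axis

Dropping one or more entries from an axis is easy if you have an index array or list. Because it involves some data munging and set logic, the drop method returns a new object with the specified values removed from the specified axis: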

In [80]: obj=Series(np.arange(5),index=['a','b','c','d','e'])

In [82]: obj.drop(['d','c'])
Out[82]:
a    0
b    1
e    4
dtype: int32

For a DataFrame, index values can be dropped from either axis:

In [83]: data=DataFrame(np.arange(16).reshape((4,4)),index=['Ohio','Colorado','Utah','New York'],columns=['one','two','three','four'])

2.1 Dropping rows

In [84]: data.drop(['Colorado','Ohio'])
Out[84]:
          one  two  three  four
Utah        8    9     10    11
New York   12   13     14    15

2.2 Dropping columns

In [85]: data.drop(['two','four'],axis=1)
Out[85]:
          one  three
Ohio        0      2
Colorado    4      6
Utah        8     10
New York   12     14
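Since drop returns a new object, the original DataFrame is left untouched. The sketch below checks that, and also uses the index= and columns= keywords, which are an alternative to the axis argument in reasonably recent pandas versions. The name trimmed is illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Drop two rows and two columns in a single call.
trimmed = data.drop(index=['Colorado', 'Ohio'], columns=['two', 'four'])

print(trimmed)      # 2 rows x 2 columns
print(data.shape)   # the original still has all 4 rows and 4 columns
```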

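3. Indexing, selection, and filtering

3.1 Series indexing

Series indexing (obj[…]) works much like NumPy array indexing, except that the index values of a Series do not have to be integers.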

In [86]: obj=Series(np.arange(4.0),index=['a','b','c','d'])

In [87]: obj
Out[87]:
a    0.0
b    1.0
c    2.0
d    3.0
dtype: float64

In [88]: obj['b']
Out[88]: 1.0

In [89]: obj[1]
Out[89]: 1.0

In [90]: obj[2:4]
Out[90]:
c    2.0
d    3.0
dtype: float64

In [91]: obj[['b','a','d']]
Out[91]:
b    1.0
a    0.0
d    3.0
dtype: float64

In [92]: obj[[1,3]]
Out[92]:
b    1.0
d    3.0
dtype: float64

In [93]: obj[obj<2]
Out[93]:
a    0.0
b    1.0
dtype: float64
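One caveat about the session above: recent pandas releases deprecate treating integer keys such as obj[1] or obj[[1,3]] as positions when the index holds string labels. A sketch that states the intent explicitly with loc (labels) and iloc (positions), using illustrative variable names, could look like this:

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.0), index=['a', 'b', 'c', 'd'])

by_label = obj.loc['b']        # select by label
by_position = obj.iloc[1]      # select by integer position
several = obj.iloc[[1, 3]]     # several positions at once
filtered = obj[obj < 2]        # boolean filtering is unchanged

print(by_label, by_position)
print(several)
print(filtered)
```

Being explicit also avoids ambiguity when a Series has an integer index, where an integer key could otherwise be read as either a label or a position.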

Slicing with labels differs from ordinary Python slicing in that the endpoint is included:

In [94]: obj['b':'c']
Out[94]:
b    1.0
c    2.0
dtype: float64

Assigning values through a label slice works as follows:

In [96]: obj['b':'c']=5

In [97]: obj
Out[97]:
a    0.0
b    5.0
c    5.0
d    3.0
dtype: float64
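To contrast the two slicing behaviors side by side, here is a short sketch. The Series is rebuilt from scratch so the snippet runs on its own, and the names label_slice and pos_slice are illustrative.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.0), index=['a', 'b', 'c', 'd'])

# Label slice: both endpoints 'b' and 'c' are included.
label_slice = obj.loc['b':'c']

# Position slice: follows Python rules, so position 3 is excluded.
pos_slice = obj.iloc[1:3]

# Both select the same two elements here.
print(label_slice)
print(pos_slice)

# Assignment through a label slice also includes the endpoint.
obj.loc['b':'c'] = 5
print(obj)
```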

3.2 DataFrame indexing

Indexing into a DataFrame retrieves one or more columns:

In [98]: data=DataFrame(np.arange(16).reshape((4,4)),index=['Ohio','Colorado','Utah','New York'],columns=['one','two','three','four'])

In [99]: data
Out[99]:
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15

In [100]: data['two']
Out[100]:
Ohio         1
Colorado     5
Utah         9
New York    13
Name: two, dtype: int32

In [101]: data[['three','one']]
Out[101]:
          three  one
Ohio          2    0
Colorado      6    4
Utah         10    8
New York     14   12

A few special cases of indexing

First, rows can be selected with a slice or a boolean array.

In [102]: data[:2]
Out[102]:
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7

In [103]: data[data['three']>5]
Out[103]:
          one  two  three  four
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15

Another option is to index with a boolean DataFrame.

In [104]: data<5
Out[104]:
            one    two  three   four
Ohio       True   True   True   True
Colorado   True  False  False  False
Utah      False  False  False  False
New York  False  False  False  False

In [105]: data[data<5]=0

In [106]: data
Out[106]:
          one  two  three  four
Ohio        0    0      0     0
Colorado    0    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
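The assignment above changes data in place. If you would rather keep the original and get a modified copy, one possible approach (not used in the session above) is the mask method, sketched here with a fresh copy of the data:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# mask replaces the values where the condition is True and returns a new object.
masked = data.mask(data < 5, 0)

print(masked)
print(data)   # the original values are unchanged
```

The where method is the mirror image: it keeps values where the condition is True and replaces the rest.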

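To select rows of a DataFrame by label, pandas provides dedicated indexing fields. Older versions used ix for this, which accepted both axis labels and integer positions in NumPy-like notation. ix was deprecated in pandas 0.20 and removed in pandas 1.0, so the examples below use loc (labels) and iloc (integer positions) to select subsets of rows and columns.

In [107]: data.loc['Colorado',['two','three']]
Out[107]:
two      5
three    6
Name: Colorado, dtype: int32

In [108]: data.loc[['Colorado','Utah'],data.columns[[3,0,1]]]
Out[108]:
          four  one  two
Colorado     7    0    5
Utah        11    8    9

In [109]: data.iloc[2]
Out[109]:
one       8
two       9
three    10
four     11
Name: Utah, dtype: int32

In [110]: data.loc[:'Utah','two']
Out[110]:
Ohio        0
Colorado    5
Utah        9
Name: two, dtype: int32

In [111]: data.loc[data.three>5,data.columns[:3]]
Out[111]:
          one  two  three
Colorado    0    5      6
Utah        8    9     10
New York   12   13     14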

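Table 5-2-3-2: Indexing options for DataFrame

obj[val]: selects a single column or a set of columns from the DataFrame. Convenient in a few special cases: boolean array (filter rows), slice (slice rows), boolean DataFrame (set values based on a condition).

obj.loc[val] (obj.ix[val] before pandas 1.0): selects a single row or a set of rows

obj.loc[:,val] (obj.ix[:,val] before pandas 1.0): selects a single column or a subset of columns

obj.loc[val1,val2] (obj.ix[val1,val2] before pandas 1.0): selects rows and columns at the same time

reindex method: conforms one or more axes to new indexes

xs method: selects a single row or column by label and returns a Series

iloc (the icol and irow methods in old pandas versions): selects a single column or row by integer position and returns a Series

at and iat (the get_value and set_value methods in old pandas versions): select or set a single value by row and column label (at) or by integer position (iat)

In the old API, get_value selects a value and set_value sets one. Both were removed in pandas 1.0.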

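"If these notes helped you, please bookmark them, and remember to leave a like and a follow. Thank you!"

"All code in this article was tested by the author. If you find any mistakes, corrections are welcome. Learn together, grow together!"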
