Reindexing in Python

Latest recommended article published on 2024-06-12 15:27:44

hsc_1

Category: python

Article link: https://blog.csdn.net/hsc_1/article/details/79604260

Reindexing

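Calling reindex on a Series returns a new object whose data is rearranged to match the new index. Any label in the new index that was not in the original gets the value NaN. Conversely, any original label that is missing from the new index is dropped, together with its value. The sessions below assume that pandas is imported as pd and NumPy as np.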

In [49]: obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

In [50]: obj
Out[50]:
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64

In [51]: obj.reindex(['a','b','c','d','e'])
Out[51]:
a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64
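The session above only adds a label. As a minimal sketch of the other half of the rule (labels left out of the new index are dropped), the snippet below reuses the same obj; the variable name subset is only illustrative.

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# Keep 'a' and 'b', leave out 'c' and 'd', and introduce the new label 'e'.
subset = obj.reindex(['a', 'b', 'e'])
print(subset)

# 'e' has no value in obj, so it is the only NaN entry.
print(subset.isna().tolist())

# reindex returns a new Series; obj itself still has its four original labels.
print(list(obj.index))
```

Because the original object is left untouched, the result has to be assigned to a name if it is needed later.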

If you do not want NaN for the labels introduced by reindex, pass the fill_value argument to supply the value to use for them instead.

In [57]: obj.reindex(list('abcde'),fill_value=0)
Out[57]:
a   -5.3
b    7.2
c    3.6
d    4.5
e    0.0
dtype: float64
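One detail worth checking is that fill_value is used only for labels that reindex introduces. The sketch below uses a small made-up Series (s, with labels 'x' and 'y', is not from the article) that already contains a NaN, to show that the existing NaN is left alone.

```python
import numpy as np
import pandas as pd

# 'y' already holds NaN before reindexing.
s = pd.Series([1.0, np.nan], index=['x', 'y'])

# 'z' is new, so it receives fill_value; the existing NaN at 'y' is kept.
filled = s.reindex(['x', 'y', 'z'], fill_value=0)
print(filled)
```

To replace NaN values that were already in the data, fillna is the tool to reach for rather than fill_value.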

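If you would rather fill the new labels from neighboring values in the data, use the method argument. It only works when the original index is monotonically increasing or decreasing.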

method='ffill' (forward fill) carries the previous value forward into each new label.
method='bfill' (backward fill) carries the next value backward into each new label.

In [53]: obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])

In [54]: obj3.reindex([0,1,2,3,4,5])
Out[54]:
0      blue
1       NaN
2    purple
3       NaN
4    yellow
5       NaN
dtype: object

In [55]: obj3.reindex([0,1,2,3,4,5],method='ffill')
Out[55]:
0      blue
1      blue
2    purple
3    purple
4    yellow
5    yellow
dtype: object

In [56]: obj3.reindex([0,1,2,3,4,5],method='bfill')
Out[56]:
0      blue
1    purple
2    purple
3    yellow
4    yellow
5       NaN
dtype: object
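The method argument matches labels, not values: each new label takes whatever is stored at the nearest earlier (or later) original label, even if that value is NaN. The sketch below contrasts this with calling ffill() on the reindexed result; the Series s is a made-up example, not from the article.

```python
import numpy as np
import pandas as pd

# The value stored at label 2 is NaN in the original data.
s = pd.Series([1.0, np.nan, 3.0], index=[0, 2, 4])

# Fill while reindexing: label 3 copies from label 2, which is NaN.
by_label = s.reindex(range(6), method='ffill')

# Reindex first, then fill by value: every NaN takes the last valid value.
by_value = s.reindex(range(6)).ffill()

print(by_label.tolist())
print(by_value.tolist())
```

Which of the two is appropriate depends on whether a NaN in the original data should be treated as a real observation or as a gap to be filled.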

For a DataFrame, reindex can change the row index, the columns, or both at once. When it is passed only a sequence, it reindexes the rows.

In [58]: frame = pd.DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'],
    ...:                      columns=['Ohio', 'Texas', 'California'])

In [59]: frame
Out[59]:
   Ohio  Texas  California
a     0      1           2
c     3      4           5
d     6      7           8

In [60]: frame.reindex(['a','b','c','d'])
Out[60]:
   Ohio  Texas  California
a   0.0    1.0         2.0
b   NaN    NaN         NaN
c   3.0    4.0         5.0
d   6.0    7.0         8.0

In [61]: frame.reindex(['a','b','c','d'],fill_value=0)
Out[61]:
   Ohio  Texas  California
a     0      1           2
b     0      0           0
c     3      4           5
d     6      7           8
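The two outputs above differ in more than row b: the version with NaN prints floats, while the version with fill_value=0 keeps integers. A short sketch that checks the dtypes directly (the names with_nan and with_zero are only illustrative):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])

# NaN cannot be stored in an integer column, so the columns become float.
with_nan = frame.reindex(['a', 'b', 'c', 'd'])

# An integer fill value lets the columns stay integer.
with_zero = frame.reindex(['a', 'b', 'c', 'd'], fill_value=0)

print(with_nan.dtypes)
print(with_zero.dtypes)
```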

In [62]: states = ['Texas', 'Utah', 'California']

In [63]: frame.reindex(columns=states)
Out[63]:
   Texas  Utah  California
a      1   NaN           2
c      4   NaN           5
d      7   NaN           8
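The sessions above change rows and columns separately. As a sketch of changing both in one call, the index and columns keywords of reindex can be passed together; the variable name both is only illustrative.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])
states = ['Texas', 'Utah', 'California']

# Add row 'b', drop column 'Ohio', and add column 'Utah' in a single call.
both = frame.reindex(index=['a', 'b', 'c', 'd'], columns=states)
print(both)

# The new row and the new column contain only NaN.
print(both.loc['b'].isna().all(), both['Utah'].isna().all())
```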