reindex in Python: How to Use pandas.DataFrame.reindex


Aurora曙光


Article link: https://blog.csdn.net/weixin_42163404/article/details/113968963

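Summary (generated automatically by CSDN): This article introduces the reindex method of pandas DataFrame, which conforms the data to a specified index. reindex can fill missing values, for example with the 'pad' or 'backfill' method. The examples show how to change the structure of a DataFrame by resetting the index and the columns and by using the fill_value parameter. It also demonstrates the method parameter on an ordered index, for example using 'backfill' to fill missing values in time-series data.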

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reindex.html#pandas.DataFrame.reindex

DataFrame.reindex(labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None)

Conform Series/DataFrame to new index with optional filling logic.

Places NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False.

Parameters

keywords for axes (array-like, optional): New labels / index to conform to, should be specified using keywords. Preferably an Index object to avoid duplicating data.

method ({None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}): Method to use for filling holes in reindexed DataFrame. Please note: this is only applicable to DataFrames/Series with a monotonically increasing/decreasing index.

None (default): don't fill gaps

pad / ffill: Propagate last valid observation forward to next valid.

backfill / bfill: Use next valid observation to fill gap.

nearest: Use nearest valid observations to fill gap.

copy (bool, default True): Return a new object, even if the passed indexes are the same.

level (int or name): Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value (scalar, default np.NaN): Value to use for missing values. Defaults to NaN, but can be any "compatible" value.

limit (int, default None): Maximum number of consecutive elements to forward or backward fill.

tolerance (optional): Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance.

Tolerance may be a scalar value, which applies the same tolerance to all values, or list-like, which applies variable tolerance per element. List-like includes list, tuple, array, Series, and must be the same size as the index and its dtype must exactly match the index’s type.
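To make method='nearest' and tolerance concrete, here is a minimal sketch on a hypothetical Series with integer labels 0, 10 and 20 (the data and the names s, targets, nearest and limited are made up for illustration). A target that is farther than tolerance from every existing label gets NaN instead of a match.

```python
import pandas as pd

# Hypothetical Series with sorted integer labels 0, 10, 20.
s = pd.Series([10.0, 20.0, 30.0], index=[0, 10, 20])
targets = [1, 9, 14]

# method="nearest": each target takes the value of the closest existing label.
# 1 -> label 0, 9 -> label 10, 14 -> label 10 (distance 4 vs. 6 to label 20).
nearest = s.reindex(targets, method="nearest")

# tolerance=2: a match is only accepted if |label - target| <= 2,
# so 14 (4 away from label 10) gets NaN.
limited = s.reindex(targets, method="nearest", tolerance=2)

print(nearest.tolist())  # [10.0, 20.0, 20.0]
print(limited.tolist())  # [10.0, 20.0, nan]
```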

DataFrame.reindex supports two calling conventions:

(index=index_labels, columns=column_labels, ...)

(labels, axis={'index', 'columns'}, ...)

We highly recommend using keyword arguments to clarify your intent.
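A small sketch showing that the two calling conventions give the same result on the row axis; the single-column frame and its labels are hypothetical.

```python
import pandas as pd

# Hypothetical single-column frame.
df = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "y", "z"])
labels = ["z", "x", "w"]  # "w" is not in the original index

# Convention 1: keyword argument (the recommended form).
by_keyword = df.reindex(index=labels)

# Convention 2: positional labels plus an explicit axis.
by_axis = df.reindex(labels, axis="index")

print(by_keyword)
print(by_keyword.equals(by_axis))  # True
```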

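From what I looked up, reindex conforms a DataFrame to an externally defined index and returns a new DataFrame object; for labels in the new index that have no corresponding data, you can specify a default value.

Arguments can be passed in either of the two ways above; the first (keyword) form is recommended.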

In the version I was debugging with, the parameter col_level had already been renamed to level.
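As a sketch of what level does, the following adapts the broadcasting pattern described in the pandas MultiIndex documentation (the data is hypothetical): a frame indexed by a single level is reindexed onto a two-level MultiIndex, and level=0 repeats each value on every row that shares the same outer label.

```python
import pandas as pd

# Hypothetical two-level index: outer labels "one"/"two", inner labels "x"/"y".
midx = pd.MultiIndex.from_product([["one", "two"], ["x", "y"]])
df = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0]}, index=midx)

# One row per outer label: one -> 1.5, two -> 3.5.
means = df.groupby(level=0).mean()

# level=0 matches the labels of `means` against the outer level of midx,
# so each group mean is repeated on every row of its group.
broadcast = means.reindex(df.index, level=0)
print(broadcast)
```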

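The example code below comes from the book. The method is mainly used to reset the index and to supply default values for entries in the new index. (The sessions assume that import numpy as np and import pandas as pd have already been run.)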

In [123]: index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
     ...: df = pd.DataFrame({'http_status': [200, 200, 404, 404, 301],
     ...:                    'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
     ...:                   index=index)

In [124]: df
Out[124]:
           http_status  response_time
Firefox            200           0.04
Chrome             200           0.02
Safari             404           0.07
IE10               404           0.08
Konqueror          301           1.00

This defines a DataFrame with a custom index.

Next, we define a new index and call reindex with the default arguments.

In [130]: new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10',
     ...:              'Chrome']

In [131]: df
Out[131]:
           http_status  response_time
Firefox            200           0.04
Chrome             200           0.02
Safari             404           0.07
IE10               404           0.08
Konqueror          301           1.00

In [132]: df.reindex(index=new_index)
Out[132]:
               http_status  response_time
Safari               404.0           0.07
Iceweasel              NaN            NaN
Comodo Dragon          NaN            NaN
IE10                 404.0           0.08
Chrome               200.0           0.02

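This produces a new DataFrame indexed by new_index. Labels that already existed keep their data, while the newly added labels (Iceweasel, Comodo Dragon) get NaN. Because an integer column cannot hold NaN, http_status is upcast to float (404.0 instead of 404).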
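The following script is a self-contained sketch that reruns the session above outside IPython and checks the effects just described: new labels become all-NaN rows, http_status is upcast to float64, and the original df is left unchanged.

```python
import pandas as pd

index = ["Firefox", "Chrome", "Safari", "IE10", "Konqueror"]
df = pd.DataFrame({"http_status": [200, 200, 404, 404, 301],
                   "response_time": [0.04, 0.02, 0.07, 0.08, 1.0]},
                  index=index)
new_index = ["Safari", "Iceweasel", "Comodo Dragon", "IE10", "Chrome"]

reindexed = df.reindex(index=new_index)

# Labels that were not in the old index become all-NaN rows.
print(reindexed.loc[["Iceweasel", "Comodo Dragon"]].isna().all().all())  # True

# NaN does not fit in an integer column, so http_status is upcast to float64.
print(df["http_status"].dtype, "->", reindexed["http_status"].dtype)

# reindex returned a new object; the original frame still has its old index.
print(list(df.index) == index)  # True
```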

We can also set the default value with the fill_value option.

In [133]: df.reindex(index=new_index, fill_value='missing')
Out[133]:
               http_status  response_time
Safari                 404           0.07
Iceweasel          missing        missing
Comodo Dragon      missing        missing
IE10                   404           0.08
Chrome                 200           0.02
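With fill_value='missing' both columns end up with object dtype. If the default value is numeric instead, no NaN is introduced and the integer column does not need to be upcast. A minimal sketch, assuming a fill value of 0 (our choice for illustration, not the article's):

```python
import pandas as pd

index = ["Firefox", "Chrome", "Safari", "IE10", "Konqueror"]
df = pd.DataFrame({"http_status": [200, 200, 404, 404, 301],
                   "response_time": [0.04, 0.02, 0.07, 0.08, 1.0]},
                  index=index)
new_index = ["Safari", "Iceweasel", "Comodo Dragon", "IE10", "Chrome"]

# A numeric fill value is "compatible" with both columns, so no NaN is
# introduced and http_status can stay an integer column.
filled = df.reindex(index=new_index, fill_value=0)
print(filled)
print(filled.dtypes)
```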

The column labels can be reset in the same way, using either of the following two forms.

In [134]: df.reindex(columns=['http_status', 'user_agent'])
Out[134]:
           http_status  user_agent
Firefox            200         NaN
Chrome             200         NaN
Safari             404         NaN
IE10               404         NaN
Konqueror          301         NaN

In [135]: df.reindex(['http_status', 'user_agent'], axis="columns")
Out[135]:
           http_status  user_agent
Firefox            200         NaN
Chrome             200         NaN
Safari             404         NaN
IE10               404         NaN
Konqueror          301         NaN
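Reindexing the columns also drops unlisted columns and reorders the rest, and index= and columns= can be combined in one call. A short sketch on a cut-down, hypothetical version of the frame above:

```python
import pandas as pd

# Cut-down version of the article's frame.
df = pd.DataFrame({"http_status": [200, 404], "response_time": [0.04, 0.07]},
                  index=["Firefox", "Safari"])

# Columns come back in the requested order, unlisted columns are dropped,
# and unknown column labels are added as all-NaN columns.
cols = df.reindex(columns=["user_agent", "http_status"])
print(list(cols.columns))  # ['user_agent', 'http_status']

# index= and columns= can be passed together in one call.
both = df.reindex(index=["Safari", "Chrome"], columns=["response_time"])
print(both)
```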

To illustrate reindex further: on an ordered (monotonic) index, the method parameter can be used to fill in the missing values.
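The ordering matters: on an index that is neither increasing nor decreasing, "previous" and "next" are undefined and pandas raises a ValueError. A minimal sketch with a hypothetical, deliberately unsorted Series:

```python
import pandas as pd

# Hypothetical Series whose index is deliberately unsorted: 3, 1, 2.
s = pd.Series([1.0, 2.0, 3.0], index=[3, 1, 2])

raised = False
try:
    s.reindex([0, 4], method="ffill")
except ValueError as exc:
    # Forward fill is undefined on a non-monotonic index.
    raised = True
    print("ValueError:", exc)

# After sorting (labels 1, 2, 3 -> values 2.0, 3.0, 1.0), forward fill works:
# 0 has no earlier label (NaN); 4 takes the value at label 3, i.e. 1.0.
filled = s.sort_index().reindex([0, 4], method="ffill")
print(filled.tolist())
```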

First, create a DataFrame with a date index.

In [137]: date_index = pd.date_range('1/1/2010', periods=6, freq='D')
     ...: df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
     ...:                    index=date_index)

In [138]: df2
Out[138]:
            prices
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0

Then reindex it onto a longer date range, first without and then with the method parameter.

In [139]: date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')

In [140]: df2.reindex(index=date_index2)
Out[140]:
            prices
2009-12-29     NaN
2009-12-30     NaN
2009-12-31     NaN
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0
2010-01-07     NaN

In [141]: df2.reindex(index=date_index2, method='bfill')
Out[141]:
            prices
2009-12-29   100.0
2009-12-30   100.0
2009-12-31   100.0
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0
2010-01-07     NaN

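The output shows that without method the new dates are simply NaN. With method='bfill', the newly added dates (2009-12-29 to 2009-12-31) are filled with the next valid value, 100.0, but the NaN that already existed in the original data (2010-01-03) is left unchanged: method only fills labels that were absent from the old index. 2010-01-07 also stays NaN, because there is no later value to backfill from.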
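The limit parameter from the reference above caps how many consecutive new labels a fill may cover. As a sketch, the backfill from In [141] can be rerun with limit=2 (the value 2 is an arbitrary choice), which fills only the two dates closest to 2010-01-01:

```python
import numpy as np
import pandas as pd

# Rebuild df2 and date_index2 from the session above.
date_index = pd.date_range("1/1/2010", periods=6, freq="D")
df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]}, index=date_index)
date_index2 = pd.date_range("12/29/2009", periods=10, freq="D")

# limit=2: backfill at most two consecutive new labels.
# 2009-12-31 and 2009-12-30 are filled from 2010-01-01; 2009-12-29 stays NaN.
limited = df2.reindex(index=date_index2, method="bfill", limit=2)
print(limited)
```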

If the pre-existing NaN values also need to be filled, use the fillna method.
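A sketch of this two-step approach, rebuilding df2 from the session above. Filling with the mean of the original prices is only one possible choice, assumed here for illustration; any value or strategy that fillna accepts would work.

```python
import numpy as np
import pandas as pd

date_index = pd.date_range("1/1/2010", periods=6, freq="D")
df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]}, index=date_index)
date_index2 = pd.date_range("12/29/2009", periods=10, freq="D")

# Step 1: method="bfill" only fills dates that were missing from df2's index.
step1 = df2.reindex(index=date_index2, method="bfill")

# Step 2: fillna handles the remaining NaN values (2010-01-03 and 2010-01-07).
# The mean of the original prices, (100 + 101 + 100 + 89 + 88) / 5 = 95.6,
# is an arbitrary choice of fill value.
step2 = step1.fillna(df2["prices"].mean())
print(step2)
```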
