An introduction to pandas.DataFrame.reindex in Python

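This article introduces the reindex method of the pandas DataFrame, which conforms the data to a specified index. reindex can fill missing values, for example with the 'pad' or 'backfill' methods. The examples show how to change the structure of a DataFrame by resetting index and columns and by using the fill_value parameter. They also show the method parameter on an ordered index, for example filling missing values in time-series data with 'backfill'.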

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reindex.html#pandas.DataFrame.reindex

DataFrame.reindex(labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None)

Conform Series/DataFrame to new index with optional filling logic.

Places NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False.

Parameters

keywords for axes : array-like, optional
    New labels / index to conform to, should be specified using keywords. Preferably an Index object to avoid duplicating data.

method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}
    Method to use for filling holes in reindexed DataFrame. Please note: this is only applicable to DataFrames/Series with a monotonically increasing/decreasing index.

    None (default): don't fill gaps
    pad / ffill: Propagate last valid observation forward to next valid.
    backfill / bfill: Use next valid observation to fill gap.
    nearest: Use nearest valid observations to fill gap.

copy : bool, default True
    Return a new object, even if the passed indexes are the same.

level : int or name
    Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value : scalar, default np.NaN
    Value to use for missing values. Defaults to NaN, but can be any "compatible" value.

limit : int, default None
    Maximum number of consecutive elements to forward or backward fill.

tolerance : optional
    Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance.

    Tolerance may be a scalar value, which applies the same tolerance to all values, or list-like, which applies variable tolerance per element. List-like includes list, tuple, array, Series, and must be the same size as the index and its dtype must exactly match the index's type.
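The interaction between method and tolerance is easier to see on a small numeric index. The sketch below is illustrative only: the frame readings, its column value, and the labels are made up for this example and do not come from the pandas documentation.

```python
import pandas as pd

# Hypothetical data: three observations on a monotonically increasing index.
readings = pd.DataFrame({'value': [10, 20, 30]}, index=[0, 10, 20])

# Match each new label to the nearest existing label, but only if it is
# at most 2 units away; anything farther is left as NaN.
matched = readings.reindex([1, 6, 19, 26], method='nearest', tolerance=2)
print(matched)
# Label 1 is 1 unit from 0, so it takes 10.
# Label 6 is 4 units from 10, so it stays NaN.
# Label 19 is 1 unit from 20, so it takes 30.
# Label 26 is 6 units from 20, so it stays NaN.
```

Without tolerance, method='nearest' would fill all four labels, which may hide the fact that some matches are far from any real observation.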

DataFrame.reindex supports two calling conventions

(index=index_labels, columns=column_labels, ...)

(labels, axis={'index', 'columns'}, ...)

We highly recommend using keyword arguments to clarify your intent.

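In short, you define an index externally and reindex returns a new DataFrame conformed to it. Labels in the new index that do not exist in the original can be given a default value.

Arguments can be passed in two ways; the first is recommended.

In the version I tested, the parameter col_level has been renamed to level.

The example code below is from the book. The method is mainly used to reset the index and to supply default values for the labels that are new in it.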

In [122]: import numpy as np
     ...: import pandas as pd

In [123]: index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
     ...: df = pd.DataFrame({'http_status': [200, 200, 404, 404, 301],
     ...:                    'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
     ...:                   index=index)

In [124]: df
Out[124]:
           http_status  response_time
Firefox            200           0.04
Chrome             200           0.02
Safari             404           0.07
IE10               404           0.08
Konqueror          301           1.00

This defines an index and a DataFrame df that uses it.

Next, define a new index and call reindex with the default parameters.

In [130]: new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10',
     ...:              'Chrome']

In [131]: df
Out[131]:
           http_status  response_time
Firefox            200           0.04
Chrome             200           0.02
Safari             404           0.07
IE10               404           0.08
Konqueror          301           1.00

In [132]: df.reindex(index=new_index)
Out[132]:
               http_status  response_time
Safari               404.0           0.07
Iceweasel              NaN            NaN
Comodo Dragon          NaN            NaN
IE10                 404.0           0.08
Chrome               200.0           0.02

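reindex returns a new DataFrame. Labels that were not in the original index ('Iceweasel' and 'Comodo Dragon') are added with NaN values, and labels missing from the new index ('Firefox' and 'Konqueror') are dropped.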
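One detail in the output above is that http_status is shown as 404.0 rather than 404. A minimal sketch of why, rebuilding the same df and new_index so it runs on its own: NaN is a floating-point value, so the integer column is upcast when missing labels are introduced, while the original frame is left as it was.

```python
import pandas as pd

index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
df = pd.DataFrame({'http_status': [200, 200, 404, 404, 301],
                   'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
                  index=index)
new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10', 'Chrome']

result = df.reindex(index=new_index)

# The original frame is untouched: same labels, still an integer column.
print(list(df.index))
print(df['http_status'].dtype)

# The reindexed copy holds NaN for the two new labels, so the column
# has been upcast to a floating-point dtype.
print(result['http_status'].dtype)
print(result['http_status'].isna().sum())
```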

The fill_value option sets the default value for the new labels.

In [133]: df.reindex(index=new_index, fill_value='missing')
Out[133]:
              http_status response_time
Safari                404          0.07
Iceweasel         missing       missing
Comodo Dragon     missing       missing
IE10                  404          0.08
Chrome                200          0.02
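A string fill value such as 'missing' mixes text with numbers, so the affected columns end up with object dtype. If the columns need to stay numeric, a numeric fill value is one option. The sketch below rebuilds the same df and compares the two; using 0 as the default is an assumption made for illustration, not something the example above prescribes.

```python
import pandas as pd

index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
df = pd.DataFrame({'http_status': [200, 200, 404, 404, 301],
                   'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
                  index=index)
new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10', 'Chrome']

# A string default forces the columns to hold mixed Python objects.
as_text = df.reindex(index=new_index, fill_value='missing')
print(as_text.dtypes)

# A numeric default keeps the integer and float columns numeric.
as_zero = df.reindex(index=new_index, fill_value=0)
print(as_zero.dtypes)
print(as_zero.loc['Iceweasel'])
```

Whether 0 is a sensible default depends on the data; for a status code it may be misleading, in which case leaving NaN in place is often clearer.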

The columns can be reindexed in either of the following two ways.

In [134]: df.reindex(columns=['http_status', 'user_agent'])
Out[134]:
           http_status  user_agent
Firefox            200         NaN
Chrome             200         NaN
Safari             404         NaN
IE10               404         NaN
Konqueror          301         NaN

In [135]: df.reindex(['http_status', 'user_agent'], axis="columns")
Out[135]:
           http_status  user_agent
Firefox            200         NaN
Chrome             200         NaN
Safari             404         NaN
IE10               404         NaN
Konqueror          301         NaN
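The two calls above print the same table. As a quick check that the two calling conventions really are interchangeable here, the following sketch rebuilds df and compares the results with DataFrame.equals, which treats NaN values in the same position as equal.

```python
import pandas as pd

index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
df = pd.DataFrame({'http_status': [200, 200, 404, 404, 301],
                   'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
                  index=index)

# Convention 1: name the axis with a keyword.
by_keyword = df.reindex(columns=['http_status', 'user_agent'])

# Convention 2: pass the labels positionally and select the axis.
by_axis = df.reindex(['http_status', 'user_agent'], axis='columns')

print(by_keyword.equals(by_axis))
print(list(by_keyword.columns))
```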

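To further illustrate reindex, the next example uses the method parameter, which only applies to an ordered (monotonic) index, to fill in values.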
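Why an ordered index is required can be seen by trying method on the browser DataFrame from earlier, whose string index is neither increasing nor decreasing. This is a small sketch of the failure, rebuilt here so it runs on its own; the exact error message may differ between pandas versions, so only the exception type is relied on.

```python
import pandas as pd

index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
df = pd.DataFrame({'http_status': [200, 200, 404, 404, 301],
                   'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
                  index=index)
new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10', 'Chrome']

print(df.index.is_monotonic_increasing)  # False: the labels are unsorted

try:
    # Forward fill needs a notion of "previous label", which an
    # unsorted index cannot provide.
    df.reindex(index=new_index, method='ffill')
    raised = False
except ValueError as exc:
    raised = True
    print('reindex failed:', exc)
```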

First, create a DataFrame with a datetime index.

In [137]: date_index = pd.date_range('1/1/2010', periods=6, freq='D')
     ...: df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
     ...:                    index=date_index)

In [138]: df2
Out[138]:
            prices
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0

Then reindex it to a longer date range, first with the defaults and then with the method parameter.

In [139]: date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')

In [140]: df2.reindex(index=date_index2)
Out[140]:
            prices
2009-12-29     NaN
2009-12-30     NaN
2009-12-31     NaN
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0
2010-01-07     NaN

In [141]: df2.reindex(index=date_index2, method='bfill')
Out[141]:
            prices
2009-12-29   100.0
2009-12-30   100.0
2009-12-31   100.0
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0
2010-01-07     NaN

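The output shows that without method the new labels default to NaN. With method='bfill', the new labels at the start are filled from the next available observation, while 2010-01-07 stays NaN because no later observation exists. The NaN that was already present in the original data (2010-01-03) is not modified, because reindex only fills labels that are new.

To fill those values as well, use the fillna method.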
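A minimal sketch of that last step, rebuilding df2 and date_index2 so it runs on its own. The choice of 0 as a constant, and of forward fill as an alternative, are assumptions made for illustration; the right strategy depends on what the prices mean.

```python
import numpy as np
import pandas as pd

date_index = pd.date_range('1/1/2010', periods=6, freq='D')
df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
                   index=date_index)
date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')

# Step 1: reindex; this fills only the labels that are new.
reindexed = df2.reindex(index=date_index2, method='bfill')
print(reindexed['prices'].isna().sum())  # two gaps remain

# Step 2a: replace the remaining NaN values with a constant.
constant = reindexed.fillna(0)

# Step 2b: or carry the last valid price forward into each gap.
carried = reindexed.ffill()
print(carried)
```

With forward fill, 2010-01-03 takes the price from 2010-01-02 and 2010-01-07 takes the price from 2010-01-06.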
