pandas filtering: pandas reindex can filter data too? And there is more to it

Latest recommended article published on 2023-07-14 18:02:47

weixin_39732018

Tags: pandas filtering

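A while ago I wrote about filtering data in pandas with loc and iloc. Today let's talk about reindex.

What is reindex in pandas actually for?

The official documentation explains it like this:

Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False.

My understanding:

In other words, it was designed to rebuild the index: labels that already exist keep their data, labels that did not exist are added, and the added entries are filled with NA (other fill methods are also available).

What is the point of this method?

If you pass a list containing the same row or column labels in a different order, the data is reordered to match the list.
If you need to add a new row or column for a calculation, reindex can do that too, although there are simpler ways.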
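
The two uses above can be sketched as follows. This is a minimal illustration on a small made-up frame; the label "Opera" is hypothetical and only serves to show what happens with a label that does not exist yet.

```python
import pandas as pd

df = pd.DataFrame(
    {"http_status": [200, 404, 301], "response_time": [0.04, 0.07, 1.0]},
    index=["Firefox", "Safari", "Konqueror"],
)

# Same labels in a different order: the rows are simply reordered.
reordered = df.reindex(["Konqueror", "Firefox", "Safari"])

# One extra label ("Opera" is hypothetical): a new all-NaN row appears.
extended = df.reindex(["Firefox", "Safari", "Konqueror", "Opera"])

print(reordered)
print(extended)
```

The original df is left untouched in both cases; each call returns a new DataFrame.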

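Key points to master:

reindex was designed to add or drop data by building a new index
It can also reorder the data
There are two calling conventions (labels together with axis, or the index/columns keywords); pick one
By default it returns a new object
There are several ways to fill the newly created values
By default the axis being reindexed is index (the rows)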
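
As a rough check of the points about the two calling conventions, the default axis, and the returned object, here is a small sketch. The frame and the labels "c" and "z" are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

# Convention 1: positional labels plus axis.
by_axis = df.reindex(["b", "c"], axis="columns")

# Convention 2: keyword arguments.
by_keyword = df.reindex(columns=["b", "c"])

# With no axis given, the labels are matched against the row index.
rows = df.reindex(["y", "z"])

print(by_axis.equals(by_keyword))
print(list(df.columns))  # the original frame still has its own columns
print(rows)
```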

This article covers:

A detailed explanation of the parameters
Examples and code for each parameter
How to use reindex to filter data

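Parameters in detail:

labels : list, optional

New labels / index, used together with 'axis'.

index, columns : list, optional

New labels / index to conform to; they should be passed as keyword arguments.

axis : int or str, optional

The axis to target. Can be an axis name ('index', 'columns') or a number (0, 1).

method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}

Method used to fill holes in the reindexed DataFrame. Note: this only applies to DataFrames / Series with a monotonically increasing or decreasing index.

None (default): do not fill gaps
pad / ffill: propagate the previous valid value forward
backfill / bfill: fill with the next valid value
nearest: fill with the nearest valid value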
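
The monotonic-index requirement is easy to trip over, so here is a small sketch of what I would expect to happen when it is violated. The Series and its labels are made up, and the exact error message may differ between pandas versions.

```python
import pandas as pd

# Labels deliberately out of order, so the index is not monotonic.
s = pd.Series([1, 2, 3], index=[10, 30, 20])

try:
    s.reindex([10, 15, 20], method="ffill")
    error = None
except ValueError as exc:
    error = exc
print(error)

# Sorting first makes the index monotonic, and the fill works.
filled = s.sort_index().reindex([10, 15, 20], method="ffill")
print(filled)
```

After sorting, label 15 has no exact match and takes the value of the previous label, 10.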

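copy : bool, default True

Return a new object even if the passed index is the same as the current one.

level : int or name

Broadcast across a level, matching index values on the passed MultiIndex level.

fill_value : scalar, default np.NaN

Value to use for missing values. Defaults to NaN, but can be any "compatible" value.

limit : int, default None

Maximum number of consecutive elements to fill forward or backward.

tolerance : optional (I have never used it and am not sure what it is good for)

Maximum distance between original and new labels for inexact matches. The index values at the matching locations must satisfy abs(index[indexer] - target) <= tolerance.

Tolerance may be a scalar, which applies the same tolerance to all values, or list-like, which applies a variable tolerance per element. List-like includes list, tuple, array, and Series; it must be the same size as the index, and its dtype must exactly match the index's type.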
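
One plausible use for tolerance is matching slightly inexact numeric or time labels while refusing matches that are too far away. The following is only a sketch with made-up numbers, combining method='nearest' with a scalar tolerance.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=[0.0, 1.0, 2.0])

# Only labels within 0.2 of an existing label are matched;
# everything else becomes NaN.
matched = s.reindex([0.1, 0.9, 1.5, 2.6], method="nearest", tolerance=0.2)
print(matched)
```

Here 0.1 and 0.9 are close enough to 0.0 and 1.0, while 1.5 and 2.6 are more than 0.2 away from every existing label and stay empty.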

Examples:

Create a DataFrame:

import pandas as pd

index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
df = pd.DataFrame({
    'http_status': [200, 200, 404, 404, 301],
    'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
    index=index)
df

The most basic operation: change the structure of the data by passing a new index

new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10', 'Chrome']
df.reindex(new_index)

Reindex the columns with the columns keyword:

df.reindex(columns=['http_status', 'user_agent'])

The second way to reindex the columns, with labels plus axis:

df.reindex(['http_status', 'user_agent'], axis="columns")

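Adding rows or columns by reindexing

New labels are added (as NaN by default), and any original label that does not appear in the list is dropped:

new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10', 'Chrome']
df.reindex(new_index)

Filling the data for the added labels with fill_value:

The fill value can be an integer or a float
It can also be a str

df.reindex(new_index, fill_value=0)

df.reindex(new_index, fill_value='missing')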
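
The choice of fill_value also affects the column dtype, which is worth checking before doing arithmetic on the result. A minimal sketch follows; the small frame and the label "Opera" are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"http_status": [200, 404]}, index=["Firefox", "Safari"])
new_index = ["Safari", "Opera"]  # "Opera" is a hypothetical new label

default = df.reindex(new_index)                     # NaN forces a float column
zero = df.reindex(new_index, fill_value=0)          # integers can stay integers
text = df.reindex(new_index, fill_value="missing")  # mixed values become object

print(default.dtypes)
print(zero.dtypes)
print(text.dtypes)
```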

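Filling added rows or columns with a fill method: method

Create a new dataset:

import numpy as np

date_index = pd.date_range('1/1/2010', periods=6, freq='D')
df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]}, index=date_index)

Reindex the usual way:

date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')
df2.reindex(date_index2)

Fill with a method:

bfill: fill with the next valid value (here the three dates before 2010-01-01 receive 100)

df2.reindex(date_index2, method='bfill')

ffill: fill with the previous valid value (here 2010-01-07 receives 88)

df2.reindex(date_index2, method='ffill')

nearest: fill with the nearest value:

df2.reindex(date_index2, method='nearest')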
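
One detail that is easy to miss: the method argument only fills labels that are new in the index, not NaN values that were already in the data. The sketch below rebuilds the same prices data to show this, and uses a separate bfill() call as one possible way to fill the remaining gap.

```python
import numpy as np
import pandas as pd

date_index = pd.date_range("2010-01-01", periods=6, freq="D")
df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]}, index=date_index)
date_index2 = pd.date_range("2009-12-29", periods=10, freq="D")

day = pd.Timestamp("2010-01-03")  # this date holds NaN in the original data

reindexed = df2.reindex(date_index2, method="bfill")
# The pre-existing NaN is not touched by the reindex fill method.
print(reindexed.loc[day, "prices"])

# A separate fill step also covers NaN values that were already in the data.
fully_filled = reindexed.bfill()
print(fully_filled.loc[day, "prices"])
```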

limit: the maximum number of consecutive elements to fill forward or backward:

Personally I have not found much use for it:

df2.reindex(date_index2, method='bfill', limit=2)
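
A possible use for limit is to avoid stretching a single known value across a long gap. The sketch below uses a made-up two-day Series and compares the number of values left empty with and without the limit.

```python
import pandas as pd

s = pd.Series(
    [100.0, 101.0],
    index=pd.date_range("2010-01-01", periods=2, freq="D"),
)
# Three extra days before the data starts.
wider = pd.date_range("2009-12-29", periods=5, freq="D")

unlimited = s.reindex(wider, method="bfill")
limited = s.reindex(wider, method="bfill", limit=2)

# Without a limit every leading gap is filled; with limit=2 one stays NaN.
print(unlimited.isna().sum(), limited.isna().sum())
```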

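Extension: how to filter data?

Since reindexing can add and drop data, passing only the labels we want is enough to filter the data:

Filter rows:

new_index = ['Safari']
df.reindex(new_index)

Filter rows and columns:

new_index = ['Safari']
new_columns = ['http_status']
df.reindex(index=new_index, columns=new_columns)

Can we pass in a filter condition?

Of course we can: build the index from a boolean condition and pass it to reindex

new_columns = ['http_status']
df.reindex(index=df[df["http_status"] > 200].index, columns=new_columns)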
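
When filtering this way, keep in mind that reindex behaves differently from loc for labels that do not exist. The sketch below assumes pandas 1.0 or later, where loc raises KeyError for a list containing a missing label; the frame and the label "Opera" are made up.

```python
import pandas as pd

df = pd.DataFrame(
    {"http_status": [200, 404], "response_time": [0.04, 0.07]},
    index=["Firefox", "Safari"],
)
wanted = ["Safari", "Opera"]  # "Opera" is a hypothetical label that is not in df

# reindex keeps the missing label as an all-NaN row.
via_reindex = df.reindex(wanted)

# loc refuses a list containing a missing label.
try:
    df.loc[wanted]
    loc_error = None
except KeyError as exc:
    loc_error = exc

# Dropping the all-NaN rows afterwards leaves only labels that really exist.
existing_only = via_reindex.dropna(how="all")
print(existing_only)
```

Note that dropna(how="all") would also drop a genuine row whose values are all NaN, so this clean-up step is only safe when the original data has no such rows.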

Official documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reindex.html#pandas.DataFrame.reindex

Parameter descriptions in English (from the official documentation):

DataFrame.reindex(self, labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None)

Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False.

Parameters:

labels : array-like, optional

New labels / index to conform the axis specified by ‘axis’ to.

index, columns : array-like, optional

New labels / index to conform to, should be specified using keywords. Preferably an Index object to avoid duplicating data

axis : int or str, optional

Axis to target. Can be either the axis name (‘index’, ‘columns’) or number (0, 1).

method : {None, ‘backfill’/’bfill’, ‘pad’/’ffill’, ‘nearest’}

Method to use for filling holes in reindexed DataFrame. Please note: this is only applicable to DataFrames/Series with a monotonically increasing/decreasing index.

None (default): don’t fill gaps

pad / ffill: propagate last valid observation forward to next valid

backfill / bfill: use next valid observation to fill gap

nearest: use nearest valid observations to fill gap

copy : bool, default True

Return a new object, even if the passed indexes are the same.

level : int or name

Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value : scalar, default np.NaN

Value to use for missing values. Defaults to NaN, but can be any “compatible” value.

limit : int, default None

Maximum number of consecutive elements to forward or backward fill.

tolerance : optional

Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance.

Tolerance may be a scalar value, which applies the same tolerance to all values, or list-like, which applies variable tolerance per element. List-like includes list, tuple, array, Series, and must be the same size as the index and its dtype must exactly match the index’s type.

New in version 0.21.0: (list-like tolerance)
