Python for Data Analysis (Part 2): pandas 2.2 Basic Functionality

Article link: https://blog.csdn.net/qq_47343046/article/details/137786415

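1. Reindexing with `reindex()`

In pandas, the reindex() method performs reindexing: it rearranges the data to conform to a specified new index and returns a new object aligned to that index. The original object is not modified.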

new_index = ['a', 'b', 'c']
s_reindexed = s.reindex(new_index)

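1.1 Reindexing in a specified order

Passing a new list of index labels rearranges the data in that order. If the new index contains labels that are absent from the original index, the new object holds missing values for those labels, shown as NaN. Because NaN is a floating-point value, an integer Series is converted to float64 when this happens.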

import pandas as pd

# Create a sample Series
data = {'a': 1, 'b': 2}
s = pd.Series(data)

# Specify the new index order and create a new object
new_index = ['a', 'b', 'c']
s_reindexed = s.reindex(new_index)
print(s_reindexed)

a    1.0
b    2.0
c    NaN
dtype: float64
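
The example above only appends a label. As a small supplementary sketch (the Series and its values are hypothetical, chosen only for illustration), the following shows that reindex() also reorders existing labels and leaves the original object untouched:

```python
import pandas as pd

# Hypothetical Series whose index is deliberately out of order
s = pd.Series([10, 20, 30], index=['c', 'a', 'b'])

# Reorder the existing labels and request a label ('d') that s does not have
s_reordered = s.reindex(['a', 'b', 'c', 'd'])
print(s_reordered)

# reindex() returned a new object; the original index is unchanged
print(s.index.tolist())
```

Here 'a', 'b' and 'c' keep their original values in the new order, and 'd' is filled with NaN.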

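1.2 Filling in missing index values

Reindexing may introduce labels that have no data. The optional method parameter fills those new labels during reindexing; it supports forward fill (ffill), backward fill (bfill), and nearest-label fill (nearest), and it requires the original index to be monotonically increasing or decreasing. These fills compare index labels only, so NaN values already present in the data are not filled. Linear interpolation is not a reindex() option; call interpolate() on the reindexed result instead.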

s.reindex(new_index, method='ffill')

method: 'ffill', 'bfill', 'nearest'

fill_value=0

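Forward fill (ffill): each new label takes the value of the closest preceding label in the original index.
s_reindexed_ffill = s.reindex(new_index, method='ffill')
Backward fill (bfill): each new label takes the value of the closest following label in the original index.
s_reindexed_bfill = s.reindex(new_index, method='bfill')
Linear interpolation: reindex() has no 'linear' method (passing method='linear' raises an error). Reindex first, then call interpolate() to fill each gap from the valid values on either side.
s_reindexed_linear = s.reindex(new_index).interpolate(method='linear')
Constant fill: new labels are filled with the specified constant.
s_reindexed_fill_value = s.reindex(new_index, fill_value=0)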
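
To make the fill options concrete, here is a minimal sketch that applies each of them to the same data. The Series, its integer index, and the variable names are assumptions made for illustration; a sorted numeric index is used because method= needs a monotonic index, and linear interpolation goes through interpolate() rather than reindex():

```python
import pandas as pd

# Hypothetical Series with a sorted integer index
s = pd.Series([1.0, 4.0, 7.0], index=[0, 3, 6])
new_index = list(range(8))  # 0..7; labels 1, 2, 4, 5, 7 are new

# New labels take the value of the closest preceding original label
ffilled = s.reindex(new_index, method='ffill')

# New labels take the value of the closest following original label
# (label 7 has no following label, so it stays NaN)
bfilled = s.reindex(new_index, method='bfill')

# New labels take the value of the closest original label
nearest = s.reindex(new_index, method='nearest')

# New labels take a constant
constant = s.reindex(new_index, fill_value=0)

# Linear interpolation: reindex first, then interpolate the gaps
linear = s.reindex(new_index).interpolate(method='linear')

print(pd.DataFrame({
    'ffill': ffilled, 'bfill': bfilled, 'nearest': nearest,
    'fill_value': constant, 'linear': linear,
}))
```

Printing the results side by side makes it easy to see which labels each option fills and with what.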

1.3 Changing the row index and the column index

df.reindex(index=new_index, columns=new_columns)

1.3.1 Changing only the row index

Form 1: df_reindexed_rows = df.reindex(index=new_index)

Form 2: df_reindexed_rows = df.reindex(new_index)

1.3.2 Changing only the column index

df_reindexed_columns = df.reindex(columns=new_columns)
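
The snippets in 1.3.1 and 1.3.2 assume that df, new_index and new_columns already exist. A self-contained sketch, with a hypothetical DataFrame and labels chosen only for illustration, might look like this:

```python
import pandas as pd

# Hypothetical DataFrame used only for illustration
df = pd.DataFrame({'x': [1, 2], 'y': [3, 4]}, index=['a', 'b'])

# Rows only: the keyword form and the positional form give the same result
rows_kw = df.reindex(index=['b', 'a', 'c'])
rows_pos = df.reindex(['b', 'a', 'c'])
print(rows_kw.equals(rows_pos))

# Columns only: 'z' does not exist in df, so it becomes an all-NaN column,
# and 'x' is dropped because it is not in the requested list
cols = df.reindex(columns=['y', 'z'])
print(cols)
```

Note that labels left out of the new index are dropped from the result, not just reordered.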

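1.3.3 Changing both the row index and the column index

Form 1: reindex() rebuilds the index and can change the row index and the column index in a single call.

df_reindexed_both = df.reindex(index=new_index, columns=new_columns)

Form 2: the .loc[] indexer selects by label. Given row labels and column labels in square brackets, it returns a new DataFrame. Unlike reindex(), it raises KeyError if any requested label is missing, so the two forms are interchangeable only when every label already exists.

df_reindexed_both = df.loc[new_index, new_columns]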
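
The two forms differ once a requested label is missing. The following sketch, using a hypothetical DataFrame and label lists, compares them in both situations; note that .loc is an indexer and takes square brackets, not parentheses:

```python
import pandas as pd

# Hypothetical DataFrame used only for illustration
df = pd.DataFrame({'x': [1, 2], 'y': [3, 4]}, index=['a', 'b'])

# Case 1: every requested label exists, so both forms return the same data
rows, cols = ['b', 'a'], ['y', 'x']
via_reindex = df.reindex(index=rows, columns=cols)
via_loc = df.loc[rows, cols]
print(via_reindex.equals(via_loc))

# Case 2: 'c' and 'z' do not exist; reindex() fills them with NaN...
with_missing = df.reindex(index=['a', 'c'], columns=['x', 'z'])
print(with_missing)

# ...while .loc[] raises KeyError for the missing labels
try:
    df.loc[['a', 'c'], ['x', 'z']]
    loc_raised = False
except KeyError:
    loc_raised = True
print(loc_raised)
```

In practice this suggests reindex() when new labels may need to be created, and .loc[] when a missing label should be treated as an error.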

1.4 Common parameters of `reindex()`

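index (row index): the new row index; a list of labels, an Index object, or anything else that can be converted to an index.
columns (column index): the new column index; a list of column labels, an Index object, or anything else that can be converted to an index.
fill_value (fill value): the value used for labels that have no data after reindexing; defaults to NaN.
method (fill method): how to fill newly introduced labels during reindexing: forward fill (ffill), backward fill (bfill), or nearest label (nearest). Linear interpolation is not available here; use interpolate() on the result. Requires a monotonic index.
copy (copy): whether to return a new object even when the new index is identical to the current one; defaults to True. reindex() does not modify the original object in place, whatever this is set to.
level (level): when the object has a MultiIndex, the level on which the new labels are matched.
limit (limit): when forward or backward filling, the maximum number of consecutive new labels to fill. Labels beyond that count in a run stay missing.
tolerance (tolerance): used together with method (ffill, bfill, or nearest); the maximum allowed distance between a new label and the original label it is matched to. New labels whose match is farther away than the tolerance stay missing. It has nothing to do with linear interpolation.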
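
As a rough illustration of limit and tolerance, the sketch below uses a hypothetical Series with a gap of three new labels between two original ones; the data and variable names are assumptions made for this example:

```python
import pandas as pd

# Hypothetical Series: original labels 0 and 4, new labels 1, 2, 3 in between
s = pd.Series([1.0, 5.0], index=[0, 4])
new_index = [0, 1, 2, 3, 4]

# limit=1: forward-fill at most one consecutive new label after each original label,
# so label 1 is filled and labels 2 and 3 stay NaN
limited = s.reindex(new_index, method='ffill', limit=1)
print(limited)

# tolerance=1: a new label is filled only if its nearest original label
# is at most 1 away, so label 2 (distance 2 from both 0 and 4) stays NaN
tolerant = s.reindex(new_index, method='nearest', tolerance=1)
print(tolerant)
```

Both parameters only take effect together with method; without it there is no fill to restrict.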