[pandas Notes] rename, reindex, set_index


杨jun坚


Article link: https://blog.csdn.net/yangjjuan/article/details/104704407


Differences between rename, reindex, and set_index

rename, reindex, and set_index are the main pandas methods for working with an index. They differ as follows:

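Method	Use case	Call	Notes
rename	Pass a dict or function to change index names, i.e. axis labels. It only renames existing labels; it cannot add or remove index entries.	df.rename(dict1)	When dict1 contains a label that does not exist, the errors parameter decides whether it is ignored ('ignore', the default) or raises a KeyError ('raise').
reindex	Returns a DataFrame conformed to the new index; labels that did not exist before are filled with NaN.	df.reindex(new_index)	With copy=True, a new object is returned even if the new index is identical to the old one.
set_index	Sets one or more columns as the index.	df.set_index('col_name')	With drop=True (the default), the column used as the index is removed from the columns.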
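
The contrast in the table can be seen on one small frame. This is only a minimal sketch: the frame `df` and its column names `key` and `val` are made up for illustration and do not come from the demos in this article.

```python
import pandas as pd

# Hypothetical data, used only to contrast the three methods.
df = pd.DataFrame({'key': ['x', 'y', 'z'], 'val': [10, 20, 30]},
                  index=['a', 'b', 'c'])

# rename: same rows and same data, only the labels change.
renamed = df.rename(index={'a': 'A'})

# reindex: rows are selected and reordered by label; 'd' does not exist, so its row is NaN.
reindexed = df.reindex(['c', 'a', 'd'])

# set_index: the values of column 'key' become the index; the column is dropped by default.
keyed = df.set_index('key')

print(renamed.index.tolist())    # ['A', 'b', 'c']
print(reindexed.index.tolist())  # ['c', 'a', 'd']
print(keyed.index.tolist())      # ['x', 'y', 'z']
```

In short, rename relabels, reindex realigns the data to a new set of labels, and set_index promotes column values to labels.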

1. rename in detail

(1) Syntax

DataFrame.rename(self, mapper=None, index=None, columns=None, axis=None, copy=True, inplace=False, level=None, errors='ignore')

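(2) Parameters
mapper: dict or function mapping old axis labels to new ones; used together with axis to determine which axis is renamed.
index: dict or function mapping old axis labels to new ones; equivalent to (mapper, axis=0).
columns: dict or function mapping old axis labels to new ones; equivalent to (mapper, axis=1).
axis: the axis that mapper applies to (0 or 'index', 1 or 'columns').
inplace: bool, default False; if True, the axis labels of the original object are modified directly.
level: for a hierarchical (multi-level) index, the level whose labels are renamed.
errors: what to do when the dict contains a label that does not exist: 'ignore' (the default) skips it silently, 'raise' raises a KeyError.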
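
The level and errors parameters are easier to follow on a concrete case. The sketch below assumes a made-up two-level index with level names `outer` and `inner`; none of these names appear in the original demos.

```python
import pandas as pd

# Hypothetical two-level index, built only for this illustration.
idx = pd.MultiIndex.from_tuples([('x', 'a'), ('x', 'b'), ('y', 'a')],
                                names=['outer', 'inner'])
df = pd.DataFrame({'val': [1, 2, 3]}, index=idx)

# level restricts the mapping to one level: only 'inner' labels are renamed.
by_level = df.rename(index={'a': 'A'}, level='inner')
print(by_level.index.tolist())  # [('x', 'A'), ('x', 'b'), ('y', 'A')]

# With the default errors='ignore', a key that matches no label is skipped silently.
ignored = df.rename(index={'missing': 'M'}, level='inner')
print(ignored.index.equals(df.index))  # True
```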

(3) demo

import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, -1)),
                     index=list('bce'),
                     columns=['Ohio', 'Texas', 'California'])
frame.rename(index={'b': 'B', 'c': 'C', 'e': 'E'}, inplace=True)  # modifies the original object in place
print(frame)
"""
   Ohio  Texas  California
B     0      1           2
C     3      4           5
E     6      7           8
"""
frame1 = frame.rename(mapper={'B': 'Bb', 'C': 'Cc', 'E': 'Ee'}, axis=0)  # mapper + axis form

"""
    Ohio  Texas  California
Bb     0      1           2
Cc     3      4           5
Ee     6      7           8
"""
frame2 = frame.rename(columns=str.upper)  # pass a function: upper-case the column labels
"""
   OHIO  TEXAS  CALIFORNIA
B     0      1           2
C     3      4           5
E     6      7           8
"""
try:
    # 'D' is not an existing label, so errors='raise' raises a KeyError
    frame.rename(index={'B': 'Bb', 'C': 'Cc', 'D': 'De', 'E': 'Ee'}, errors='raise')
except KeyError as exc:
    print(exc)
"""
"['D'] not found in axis"
"""

2. reindex in detail

(1) Syntax

DataFrame.reindex(self, labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None)

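(2) Parameters
labels: the new index labels; used together with axis to determine which axis is reindexed.
index: the new labels for axis 0 (rows).
columns: the new labels for axis 1 (columns).
axis: the axis that labels applies to.
method: how to fill gaps created by reindexing. Because it takes the data of the previous or next label, it only applies to a monotonically increasing or decreasing index. 'backfill'/'bfill' fills backward, using the data of the next label; 'pad'/'ffill' fills forward, using the data of the previous label.
copy: whether to return a copy even when the new index is identical to the current one.
level: for a hierarchical (multi-level) index, the level that the new labels belong to.
fill_value: the value used for data that is missing after reindexing; defaults to NaN.
limit: the maximum number of consecutive elements to fill when filling forward or backward.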
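
The columns and method parameters, and the monotonic-index requirement, can be sketched as follows. The frame, its column `v`, and the integer labels are assumptions chosen for illustration only.

```python
import pandas as pd

# Hypothetical frame with a monotonically increasing integer index.
df = pd.DataFrame({'v': [10.0, 20.0, 30.0]}, index=[0, 5, 10])

# Backward fill: each new label takes the value of the next existing label.
bfilled = df.reindex([0, 3, 7, 12], method='bfill')
print(bfilled['v'].tolist())  # [10.0, 20.0, 30.0, nan]

# columns= reindexes along axis 1; the unknown column 'w' is filled with fill_value.
widened = df.reindex(columns=['v', 'w'], fill_value=0)
print(widened.columns.tolist())  # ['v', 'w']

# On a non-monotonic index, method-based filling is rejected with a ValueError.
shuffled = df.loc[[5, 0, 10]]
rejected = False
try:
    shuffled.reindex([0, 3], method='ffill')
except ValueError as exc:
    rejected = True
    print('ValueError:', exc)
```

Label 12 stays NaN under 'bfill' because no later label exists to fill from, which mirrors how the first label stays NaN under 'pad' when nothing precedes it.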

(3) demo

import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, -1)),
                     index=list('bce'),
                     columns=['Ohio', 'Texas', 'California'])
new_index = ['a', 'b', 'd']
frame1 = frame.reindex(index=new_index)  # the index parameter targets the row index
"""
   Ohio  Texas  California
a   NaN    NaN         NaN
b   0.0    1.0         2.0
d   NaN    NaN         NaN
"""
frame2 = frame.reindex(labels=new_index, axis=0)  # labels used together with axis
"""
   Ohio  Texas  California
a   NaN    NaN         NaN
b   0.0    1.0         2.0
d   NaN    NaN         NaN
"""
# method only works when the index is monotonically increasing/decreasing,
# because missing values are taken from neighbouring labels in index order
new_index = ['a', 'b', 'd', 'e', 'f', 'g']
frame3 = frame.reindex(index=new_index, method='pad')
"""  'a' has no preceding row, so it stays NaN
   Ohio  Texas  California
a   NaN    NaN         NaN
b   0.0    1.0         2.0
d   3.0    4.0         5.0
e   6.0    7.0         8.0
f   6.0    7.0         8.0
g   6.0    7.0         8.0
"""
frame4 = frame.reindex(index=new_index, method='pad', limit=1)  # limit, used with method, caps the number of consecutive rows/columns filled
"""
   Ohio  Texas  California
a   NaN    NaN         NaN
b   0.0    1.0         2.0
d   3.0    4.0         5.0
e   6.0    7.0         8.0
f   6.0    7.0         8.0
g   NaN    NaN         NaN
"""
frame5 = frame.reindex(index=new_index, fill_value=0)  # fill missing data with 0
"""
   Ohio  Texas  California
a     0      0           0
b     0      1           2
d     0      0           0
e     6      7           8
f     0      0           0
g     0      0           0
"""

3. set_index in detail

(1) Syntax

DataFrame.set_index(self, keys, drop=True, append=False, inplace=False, verify_integrity=False)

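(2) Parameters
keys: a single column label, a list of column labels, or arrays; accepted array types include Series, Index, np.ndarray, and instances of Iterator.
drop: bool, whether to delete the columns that are used as the new index; default True.
append: bool, whether to append the keys to the existing index instead of replacing it; default False.
inplace: bool, whether to modify the original object directly; default False, in which case a new object is returned.
verify_integrity: bool, whether to check the new index for duplicates; default False.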
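
The effect of drop and verify_integrity can be sketched on a tiny frame. The data below is hypothetical, and `year` deliberately contains a duplicate so that the integrity check has something to reject.

```python
import pandas as pd

# Hypothetical data; 'year' deliberately contains a duplicate value.
df = pd.DataFrame({'month': [1, 4, 7], 'year': [2012, 2014, 2014]})

# drop=False keeps 'month' as a regular column as well as using it as the index.
kept = df.set_index('month', drop=False)
print(kept.columns.tolist())  # ['month', 'year']

# verify_integrity=True rejects an index that would contain duplicates.
duplicate_rejected = False
try:
    df.set_index('year', verify_integrity=True)
except ValueError as exc:
    duplicate_rejected = True
    print('ValueError:', exc)
```

With the default verify_integrity=False, the same call succeeds and simply produces an index with repeated labels.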

(3) demo

import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]})
df1 = df.set_index('month')
"""
       year  sale
month            
1      2012    55
4      2014    40
7      2013    84
10     2014    31
"""
df2 = df.set_index(['year', 'month'])  # two columns together become a hierarchical index
"""
            sale
year month      
2012 1        55
2014 4        40
2013 7        84
2014 10       31
"""
df3 = df.set_index([pd.Index([1, 2, 3, 4]), 'year'])  # combine an Index object with a column name
"""
        month  sale
  year             
1 2012      1    55
2 2014      4    40
3 2013      7    84
4 2014     10    31
"""
s = pd.Series(['a', 'b', 'c', 'd'])
df4 = df.set_index(s, append=True)  # append the Series to the index, keeping the existing index
"""
     month  year  sale
0 a      1  2012    55
1 b      4  2014    40
2 c      7  2013    84
3 d     10  2014    31
"""
