1. set_index
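DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
- keys: the column label, or list of column labels, to use as the index. A single label gives a single-level index; a list gives a MultiIndex.
- Sets one or more existing columns as the index.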
In [307]: data
Out[307]:
     a    b  c    d
0  bar  one  z  1.0
1  bar  two  y  2.0
2  foo  one  x  3.0
3  foo  two  w  4.0
In [308]: indexed1 = data.set_index('c')
In [309]: indexed1
Out[309]:
     a    b    d
c
z  bar  one  1.0
y  bar  two  2.0
x  foo  one  3.0
w  foo  two  4.0
In [310]: indexed2 = data.set_index(['a', 'b'])
In [311]: indexed2
Out[311]:
         c    d
a   b
bar one  z  1.0
    two  y  2.0
foo one  x  3.0
    two  w  4.0
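The remaining parameters in the signature are not exercised by the session above. The following is a minimal sketch of what drop, append and verify_integrity do, assuming the same four-row frame; the variable names kept, appended and duplicate_error are chosen here for illustration only.

```python
import pandas as pd

# Same frame as in the session above.
data = pd.DataFrame({
    "a": ["bar", "bar", "foo", "foo"],
    "b": ["one", "two", "one", "two"],
    "c": ["z", "y", "x", "w"],
    "d": [1.0, 2.0, 3.0, 4.0],
})

# drop=False keeps 'c' as a regular column in addition to using it as the index.
kept = data.set_index("c", drop=False)

# append=True adds 'a' as an extra level after the existing integer index
# instead of replacing that index.
appended = data.set_index("a", append=True)

# verify_integrity=True raises ValueError when the new index contains
# duplicates; column 'a' holds 'bar' and 'foo' twice each.
try:
    data.set_index("a", verify_integrity=True)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = exc

print(kept.columns.tolist())
print(appended.index.names)
print(type(duplicate_error).__name__)
```

set_index returns a new DataFrame in each call here, so data itself keeps its original integer index throughout.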
2. reset_index
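DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')
- Moves the index back into regular columns and replaces it with the default integer index.
- Works on both a single-level index and a MultiIndex.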
In [318]: data
Out[318]:
         c    d
a   b
bar one  z  1.0
    two  y  2.0
foo one  x  3.0
    two  w  4.0
In [319]: data.reset_index()
Out[319]:
     a    b  c    d
0  bar  one  z  1.0
1  bar  two  y  2.0
2  foo  one  x  3.0
3  foo  two  w  4.0
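The level and drop parameters control which index levels are restored and whether their values are kept. A short sketch, assuming the same data as above rebuilt from scratch; the names indexed, partial, dropped and restored are illustrative only.

```python
import pandas as pd

data = pd.DataFrame({
    "a": ["bar", "bar", "foo", "foo"],
    "b": ["one", "two", "one", "two"],
    "c": ["z", "y", "x", "w"],
    "d": [1.0, 2.0, 3.0, 4.0],
})
indexed = data.set_index(["a", "b"])

# level='b' moves only that level back into a column; 'a' stays as the index.
partial = indexed.reset_index(level="b")

# drop=True discards the index values instead of turning them into columns.
dropped = indexed.reset_index(drop=True)

# With no arguments, every level becomes a column again, which undoes
# the set_index call above for this frame.
restored = indexed.reset_index()

print(partial)
print(dropped)
print(restored.equals(data))
```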
3. reindex
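DataFrame.reindex(labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None)
- columns: the column labels the result should have, in the desired order.
- Conforms the data to the new labels. Each column moves together with its label, and a label that does not exist in the original produces a column filled with fill_value (NaN by default). It does not rename columns.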
>>> p2
   col1  col2  col3
0     1     6     2
1     2     1     8
2     3     0     1
>>> p2.reindex(columns=['col2', 'col3', 'col1'])
   col2  col3  col1
0     6     2     1
1     1     8     2
2     0     1     3
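Reordering is only one use of reindex. It can also request labels that are not in the frame. The sketch below assumes the same p2 values as above; 'col4' is a hypothetical label picked to show the fill behaviour.

```python
import pandas as pd

p2 = pd.DataFrame({"col1": [1, 2, 3], "col2": [6, 1, 0], "col3": [2, 8, 1]})

# 'col4' does not exist in p2, so reindex creates it and fills it with NaN.
# 'col1' and 'col3' are not requested, so they are left out of the result.
with_nan = p2.reindex(columns=["col2", "col4"])

# fill_value replaces the default NaN for newly created labels.
with_zero = p2.reindex(columns=["col2", "col4"], fill_value=0)

print(with_nan)
print(with_zero)
print(p2.columns.tolist())  # reindex returned new objects; p2 is unchanged
```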
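# Note the difference from assigning to df.columns: reindex rearranges the data along with the labels,
# whereas assigning to df.columns only relabels the columns and does not move any data.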
>>> p2
   col1  col2  col3
0     1     6     2
1     2     1     8
2     3     0     1
>>> p2.columns = ['col2', 'col3', 'col1']
>>> p2
   col2  col3  col1
0     1     6     2
1     2     1     8
2     3     0     1
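The two sessions above can be checked side by side. This is a small sketch, assuming the same p2 values; relabeled works on a copy so that p2 itself is left untouched, which differs from the in-place assignment shown above.

```python
import pandas as pd

p2 = pd.DataFrame({"col1": [1, 2, 3], "col2": [6, 1, 0], "col3": [2, 8, 1]})

# reindex: the values travel with their label, so 'col1' still holds 1, 2, 3.
reordered = p2.reindex(columns=["col2", "col3", "col1"])

# Assigning to .columns: the values stay in place and only the labels change,
# so the label 'col1' now sits on the values that used to belong to 'col3'.
relabeled = p2.copy()
relabeled.columns = ["col2", "col3", "col1"]

print(reordered["col1"].tolist())
print(relabeled["col1"].tolist())
```

Use reindex to change the order or selection of columns. Assign to .columns only when the labels themselves are wrong.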