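Multi-level / hierarchical indexing (MultiIndex)
1. Creating a MultiIndex (multi-level row index)
1) From a list of arrays: MultiIndex.from_arrays()
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(4), columns=['v'])  # random values: the numbers in the outputs vary from run to run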
print(df.to_string())
'''
v
0 -1.066128
1 0.393179
2 1.595051
3 -1.492583
'''
arrays = [np.array(['bar', 'bar', 'foo', 'foo']), np.array(['one', 'two', 'one', 'two'])]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
df.index = index
print(df.to_string())
'''
v
first second
bar one 1.270260
two -0.539714
foo one -0.284528
two 0.682699
'''
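One way to check what from_arrays() built is to inspect the index's attributes. The sketch below uses small hand-written values instead of random ones (an assumption, so the output is reproducible); `nlevels` and `names` are standard MultiIndex attributes, and each row label of a MultiIndex is a tuple.

```python
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'foo', 'foo']), np.array(['one', 'two', 'one', 'two'])]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])

# Fixed values instead of np.random.randn, so the result is reproducible
df = pd.DataFrame({'v': [1.0, 2.0, 3.0, 4.0]}, index=index)

print(index.nlevels)       # 2
print(list(index.names))   # ['first', 'second']
print(list(index))         # one tuple per row: ('bar', 'one'), ('bar', 'two'), ...
print(df.loc[('foo', 'one'), 'v'])  # 3.0
```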
2) From a list of tuples: MultiIndex.from_tuples()
df = pd.DataFrame(np.random.randn(4), columns=['v'])
arrays = [['bar', 'bar', 'foo', 'foo'], ['one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))  # transpose the arrays into one (first, second) tuple per row
print(tuples) # [('bar', 'one'), ('bar', 'two'), ('foo', 'one'), ('foo', 'two')]
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df.index = index
print(df.to_string())
'''
v
first second
bar one -1.066128
two 0.393179
foo one 1.595051
two -1.492583
'''
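from_tuples() and from_arrays() are two spellings of the same index: the tuples are just the arrays transposed with zip(). A small sketch to confirm this, using the same labels as above; note that `Index.equals()` compares labels only, so the level names are checked separately.

```python
import pandas as pd

arrays = [['bar', 'bar', 'foo', 'foo'], ['one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))  # one tuple per row

idx_from_tuples = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
idx_from_arrays = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])

# Same labels in the same order
print(idx_from_tuples.equals(idx_from_arrays))  # True
# equals() ignores names, so compare them explicitly
print(list(idx_from_tuples.names) == list(idx_from_arrays.names))  # True
```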
3) From the Cartesian product of several iterables (every combination): MultiIndex.from_product()
df = pd.DataFrame(np.random.randn(4), columns=['v'])
iterables = [['bar', 'foo'], ['one', 'two']]
index = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df.index = index
print(df.to_string())
'''
v
first second
bar one -0.201214
two -1.097793
foo one 0.033961
two -0.023977
'''
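"Every combination" here means the Cartesian product, in the same order as Python's itertools.product (last level varying fastest). The sketch below uses a three-label second level (an illustrative choice, not from the example above) so that the count 2 × 3 = 6 is visible.

```python
import itertools
import pandas as pd

iterables = [['bar', 'foo'], ['one', 'two', 'three']]
index = pd.MultiIndex.from_product(iterables, names=['first', 'second'])

# Each combination appears exactly once, last level varying fastest
expected = list(itertools.product(*iterables))
print(len(index))               # 6 (= 2 * 3)
print(list(index) == expected)  # True
```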
4) From a DataFrame: MultiIndex.from_frame()
df = pd.DataFrame(np.random.randn(4), columns=['v'])
df_index = pd.DataFrame([['bar', 'one'], ['bar', 'two'], ['foo', 'one'], ['foo', 'two']], columns=['first', 'second'])
index = pd.MultiIndex.from_frame(df_index)
df.index = index
print(df.to_string())
'''
v
first second
bar one -0.669057
two 0.749700
foo one -1.458682
two -0.795187
'''
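With from_frame(), the DataFrame's column names become the level names. `MultiIndex.to_frame()` goes the other way; the sketch below round-trips the same labels and compares the values (index=False makes to_frame return a plain RangeIndex, like df_index).

```python
import pandas as pd

df_index = pd.DataFrame([['bar', 'one'], ['bar', 'two'], ['foo', 'one'], ['foo', 'two']],
                        columns=['first', 'second'])
index = pd.MultiIndex.from_frame(df_index)

# Column names of the source frame become the level names
print(list(index.names))  # ['first', 'second']

# to_frame(index=False) turns the levels back into ordinary columns
back = index.to_frame(index=False)
print(list(back.columns))                                   # ['first', 'second']
print(back.values.tolist() == df_index.values.tolist())     # True
```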
2. Getting the labels of a level, and data alignment
1) Getting the labels of a given level: get_level_values()
print(df.index.get_level_values(-1)) # Index(['one', 'two', 'one', 'two'], dtype='object', name='second')
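A level can be addressed by position (0, 1, or -1 for the last one) or by its name. The sketch also contrasts get_level_values(), which returns one label per row (with repeats), with the `levels` attribute, which holds only the unique labels of each level.

```python
import pandas as pd

index = pd.MultiIndex.from_product([['bar', 'foo'], ['one', 'two']], names=['first', 'second'])

# Position -1 and the name 'second' refer to the same level
by_pos = index.get_level_values(-1)
by_name = index.get_level_values('second')
print(by_pos.equals(by_name))  # True

# One label per row vs. the unique labels of the level
print(list(index.get_level_values(0)))  # ['bar', 'bar', 'foo', 'foo']
print(list(index.levels[0]))            # ['bar', 'foo']
```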
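2) Data alignment: reindex()
# Adjust the columns (reindex works on rows by default, hence axis=1): columns can be added or reordered, and new columns are filled with fill_value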
df_reindex = df.reindex(['k', 'v'], axis=1, fill_value=0)
print(df_reindex.to_string())
'''
k v
first second
bar one 0 0.778402
two 0 -0.773582
foo one 0 0.102783
two 0 -0.849725
'''
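Without axis=, reindex() works on the rows. On a MultiIndex the new row labels are given as tuples; existing rows are reordered and missing ones are filled with fill_value. A minimal sketch with fixed values, where ('baz', 'one') is a made-up label chosen because it does not exist in the index:

```python
import pandas as pd

index = pd.MultiIndex.from_product([['bar', 'foo'], ['one', 'two']], names=['first', 'second'])
df = pd.DataFrame({'v': [1.0, 2.0, 3.0, 4.0]}, index=index)

# Reorder two existing rows and add one missing row (hypothetical label 'baz')
new_rows = [('foo', 'two'), ('bar', 'one'), ('baz', 'one')]
df_rows = df.reindex(new_rows, fill_value=0)

print(df_rows['v'].tolist())  # [4.0, 1.0, 0.0]
```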
3. Advanced MultiIndex usage
1) Get the values of all columns for a given index label
print(df_reindex.loc[('bar', 'two')])
'''
k 0.000000
v -0.115918
Name: (bar, two), dtype: float64
'''
2) Get the value of a specific column for a given index label
print(df_reindex.loc[('bar', 'two'), 'v'])
'''
1.1971089038835172
'''
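The two lookups above differ in what they return: a full row tuple gives a Series, a row tuple plus a column label gives a scalar, and a list of tuples selects several rows. A sketch with fixed values (an assumption, for reproducibility):

```python
import pandas as pd

index = pd.MultiIndex.from_product([['bar', 'foo'], ['one', 'two']], names=['first', 'second'])
df = pd.DataFrame({'k': [0, 0, 0, 0], 'v': [1.0, 2.0, 3.0, 4.0]}, index=index)

row = df.loc[('bar', 'two')]                           # full tuple -> that row as a Series
value = df.loc[('bar', 'two'), 'v']                    # full tuple + column -> a scalar
rows = df.loc[[('bar', 'two'), ('foo', 'one')], 'v']   # list of tuples -> several rows

print(row.tolist())   # [0.0, 2.0]  (the int column k is upcast to float in the row)
print(value)          # 2.0
print(rows.tolist())  # [2.0, 3.0]
```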
3) Partial slicing
print(df_reindex.loc['bar'])
'''
k v
second
one 0 -0.141858
two 0 -0.786239
'''
print(df_reindex.loc(axis=0)[:, 'two'])
'''
k v
first second
bar two 0 -0.615808
foo two 0 0.238987
'''
print(df_reindex.loc[('bar', 'two'):('foo', 'one')])
'''
k v
first second
bar two 0 1.055614
foo one 0 0.899269
'''
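The `loc(axis=0)[:, 'two']` form can also be written with `pd.IndexSlice`, which lets the row and column selections sit side by side in one `.loc[...]`. Range slicing between tuples expects a sorted index, so the sketch calls sort_index() first as a precaution (the example index is already sorted).

```python
import pandas as pd

index = pd.MultiIndex.from_product([['bar', 'foo'], ['one', 'two']], names=['first', 'second'])
df = pd.DataFrame({'k': [0, 0, 0, 0], 'v': [1.0, 2.0, 3.0, 4.0]}, index=index)

idx = pd.IndexSlice
# Every row whose second level is 'two'; the trailing ':' keeps all columns
via_slice = df.loc[idx[:, 'two'], :]
via_axis = df.loc(axis=0)[:, 'two']
print(via_slice.equals(via_axis))  # True
print(via_slice['v'].tolist())     # [2.0, 4.0]

# Slicing between two tuples on a sorted index
df_sorted = df.sort_index()
print(df_sorted.loc[('bar', 'two'):('foo', 'one'), 'v'].tolist())  # [2.0, 3.0]
```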
4) Cross-section selection: xs(key, axis, level, drop_level)
print(df_reindex.xs(key='two', level='second', drop_level=False))
'''
k v
first second
bar two 0 -0.615808
foo two 0 0.238987
'''
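The example above passes drop_level=False. With the default drop_level=True, the level that was matched is removed from the result, leaving a single-level index. A short sketch with fixed values:

```python
import pandas as pd

index = pd.MultiIndex.from_product([['bar', 'foo'], ['one', 'two']], names=['first', 'second'])
df = pd.DataFrame({'v': [1.0, 2.0, 3.0, 4.0]}, index=index)

# Default drop_level=True: the 'second' level disappears
dropped = df.xs('two', level='second')
print(list(dropped.index))  # ['bar', 'foo']
print(dropped.index.name)   # first

# drop_level=False keeps both levels
kept = df.xs('two', level='second', drop_level=False)
print(kept.index.nlevels)   # 2
```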
5) Working with a multi-level column index
df = df_reindex.T  # transpose: the row MultiIndex becomes a column MultiIndex
print(df)
'''
first bar foo
second one two one two
k 0.000000 0.000000 0.000000 0.000000
v -0.062292 0.342238 -0.520897 -0.565263
'''
print(df.loc(axis=1)[:, 'two'])  # same result as the xs() call below
print(df.xs('two', level=-1, axis=1, drop_level=False))
'''
first bar foo
second two two
k 0.000000 0.000000
v -0.711593 1.917493
'''
print(df.xs(('bar', 'two'), level=(0, -1), axis=1))
'''
first bar
second two
k 0.00000
v 0.38138
'''
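With a column MultiIndex, plain `[]` accepts a first-level label (returning a sub-frame) or a full tuple (returning one column), and `pd.IndexSlice` can be used in the column position of `.loc`. A sketch with a hand-built frame of the same shape as the transposed one above (values are assumptions):

```python
import pandas as pd

columns = pd.MultiIndex.from_product([['bar', 'foo'], ['one', 'two']], names=['first', 'second'])
df = pd.DataFrame([[0.0, 0.0, 0.0, 0.0], [1.0, 2.0, 3.0, 4.0]],
                  index=['k', 'v'], columns=columns)

# First-level label -> sub-frame whose columns are the 'second' level
print(list(df['bar'].columns))       # ['one', 'two']

# Full tuple -> exactly one column
print(df[('foo', 'two')].tolist())   # [0.0, 4.0]

# IndexSlice in the column position: every 'two' column, all rows
idx = pd.IndexSlice
by_slice = df.loc[:, idx[:, 'two']]
by_xs = df.xs('two', level=-1, axis=1, drop_level=False)
print(by_slice.equals(by_xs))        # True
```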
6) Swapping index levels: swaplevel()
print(df.swaplevel(axis=1))
'''
second one two one two
first bar bar foo foo
k 0.000000 0.000000 0.000000 0.000000
v 0.398076 -0.346376 0.555577 -1.454103
'''
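As the output above shows, swaplevel() only swaps the order of the levels; the columns keep their original order. Following it with sort_index(axis=1) groups the columns by the new outer level. A sketch with fixed values:

```python
import pandas as pd

columns = pd.MultiIndex.from_product([['bar', 'foo'], ['one', 'two']], names=['first', 'second'])
df = pd.DataFrame([[1.0, 2.0, 3.0, 4.0]], index=['v'], columns=columns)

# Levels swapped, column order unchanged
swapped = df.swaplevel(axis=1)
print(list(swapped.columns))  # [('one', 'bar'), ('two', 'bar'), ('one', 'foo'), ('two', 'foo')]

# Sorting regroups the columns under the new outer level
regrouped = swapped.sort_index(axis=1)
print(list(regrouped.columns))        # [('one', 'bar'), ('one', 'foo'), ('two', 'bar'), ('two', 'foo')]
print(list(regrouped.columns.names))  # ['second', 'first']
print(regrouped.loc['v'].tolist())    # [1.0, 3.0, 2.0, 4.0]
```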
7) Renaming index labels: rename(index)
print(df_reindex.rename(index={'bar': "bar_x", 'one': "one_x"}))
'''
k v
first second
bar_x one_x 0 0.686636
two 0 0.352534
foo one_x 0 0.489657
two 0 0.067204
'''
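In the example above the mapping is applied to the labels of every level. rename() also takes a `level=` argument (a level position or name) that restricts the renaming to one level. A sketch with fixed values:

```python
import pandas as pd

index = pd.MultiIndex.from_product([['bar', 'foo'], ['one', 'two']], names=['first', 'second'])
df = pd.DataFrame({'v': [1.0, 2.0, 3.0, 4.0]}, index=index)

mapping = {'bar': 'bar_x', 'one': 'one_x'}

# No level=: labels in every level are renamed
renamed_all = df.rename(index=mapping)
# level='second': only labels in that level are renamed, so 'bar' stays
renamed_second = df.rename(index=mapping, level='second')

print(list(renamed_all.index))
# [('bar_x', 'one_x'), ('bar_x', 'two'), ('foo', 'one_x'), ('foo', 'two')]
print(list(renamed_second.index))
# [('bar', 'one_x'), ('bar', 'two'), ('foo', 'one_x'), ('foo', 'two')]
```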
8) Renaming the axis (level) names: rename_axis(index)
print(df_reindex.rename_axis(index=['first_x', 'second_x']))
'''
k v
first_x second_x
bar one 0 0.792857
two 0 -1.092279
foo one 0 -1.822016
two 0 0.661191
'''
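Instead of a full list of names, rename_axis() also accepts a dict that renames only the levels it mentions. Like rename(), it returns a new object. `Index.set_names()` is another way to get a renamed index that can be assigned back. A sketch with fixed values:

```python
import pandas as pd

index = pd.MultiIndex.from_product([['bar', 'foo'], ['one', 'two']], names=['first', 'second'])
df = pd.DataFrame({'v': [1.0, 2.0, 3.0, 4.0]}, index=index)

# Dict: only 'first' is renamed, 'second' is kept
partial = df.rename_axis(index={'first': 'first_x'})
print(list(partial.index.names))  # ['first_x', 'second']

# The original frame is untouched
print(list(df.index.names))       # ['first', 'second']

# set_names() returns a renamed index; assign it to a copy
df2 = df.copy()
df2.index = df2.index.set_names(['a', 'b'])
print(list(df2.index.names))      # ['a', 'b']
```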