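1. Duplicate index labels
Check whether the index contains duplicate labels
import numpy as np
import pandas as pd
# Random data, so the printed values change from run to run.
s = pd.Series(np.random.rand(6), index=list('abcbda'))
print(s)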
a 0.848267
b 0.704451
c 0.326481
b 0.897240
d 0.220018
a 0.565038
print(s.index.is_unique)
False
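is_unique only reports that duplicates exist. A minimal sketch of how the repeated labels themselves could be located with Index.duplicated; the seed and the names dup_mask and dup_labels are illustrative and not part of the original notes.

```python
import numpy as np
import pandas as pd

# Same label layout as the series above; the seed is only for reproducibility.
rng = np.random.default_rng(0)
s = pd.Series(rng.random(6), index=list('abcbda'))

# Boolean mask: True for every label that already appeared earlier.
dup_mask = s.index.duplicated(keep='first')

# Distinct labels that occur more than once, in order of their first repeat.
dup_labels = s.index[dup_mask].unique().tolist()

print(s.index.is_unique)
print(dup_labels)  # ['b', 'a']
```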
Collapse duplicate labels by summing the values that share a label
print(s.groupby(s.index).sum())
a 1.413305
b 1.601691
c 0.326481
d 0.220018
dtype: float64
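Summing is only one way to clean duplicated labels. The sketch below, with hypothetical fixed values chosen so the result can be checked by hand, shows two alternatives: averaging per label, and keeping only the first occurrence of each label.

```python
import pandas as pd

# Hypothetical fixed values (not from the original notes).
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=list('abcbda'))

# Option 1: aggregate all values that share a label (here: the mean).
mean_per_label = s.groupby(level=0).mean()

# Option 2: keep only the first occurrence of each label.
first_only = s[~s.index.duplicated(keep='first')]

print(mean_per_label.to_dict())  # {'a': 3.5, 'b': 3.0, 'c': 3.0, 'd': 5.0}
print(first_only.to_dict())      # {'a': 1.0, 'b': 2.0, 'c': 3.0, 'd': 5.0}
```

Which option fits depends on whether the repeated rows are separate measurements (aggregate them) or accidental copies (drop them).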
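2. MultiIndex (hierarchical index)
# Assumed definition: the original omits how `index` was built.
# Level-2 labels are strings so that s[:, '2'] further down matches them.
index = pd.MultiIndex.from_arrays(
    [['a', 'a', 'a', 'b', 'b', 'c', 'c'],
     ['1', '2', '3', '1', '2', '2', '3']],
    names=['level1', 'level2'])
s = pd.Series(np.random.rand(7), index=index)
print(s)
level1  level2
a       1         0.137450
        2         0.968562
        3         0.622390
b       1         0.035829
        2         0.874913
c       2         0.043371
        3         0.226650
dtype: float64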
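A MultiIndex like the one printed above can be built in several ways. A short sketch of two common constructors; the labels here are illustrative only.

```python
import pandas as pd

# from_arrays: one list per level, aligned position by position.
idx_arrays = pd.MultiIndex.from_arrays(
    [['a', 'a', 'b'], [1, 2, 1]], names=['level1', 'level2'])

# from_product: every combination of the given level values.
idx_product = pd.MultiIndex.from_product(
    [['a', 'b'], [1, 2]], names=['level1', 'level2'])

print(idx_arrays.tolist())   # [('a', 1), ('a', 2), ('b', 1)]
print(idx_product.tolist())  # [('a', 1), ('a', 2), ('b', 1), ('b', 2)]
```

from_arrays suits irregular combinations such as the series above (where 'b' has no level-2 value 3), while from_product is shorter when every combination is present.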
Selecting on the first level
print(s['c'])
level2
2 0.695136
3 0.933268
dtype: float64
Selecting on the second level
print(s[:, '2'])
level1
a 0.227801
b 0.777973
c 0.032813
dtype: float64
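Selection on one level can also be written with Series.xs, which addresses the level by name instead of by position. A small sketch with hypothetical integer data:

```python
import pandas as pd

# Hypothetical data; integer level-2 labels are used here.
index = pd.MultiIndex.from_tuples(
    [('a', 1), ('a', 2), ('b', 1), ('b', 2), ('c', 2)],
    names=['level1', 'level2'])
s = pd.Series([10, 20, 30, 40, 50], index=index)

# Cross-section on the second level; that level is dropped from the result.
second = s.xs(2, level='level2')

# drop_level=False keeps both levels in the result's index.
second_full = s.xs(2, level='level2', drop_level=False)

print(second.to_dict())            # {'a': 20, 'b': 40, 'c': 50}
print(second_full.index.tolist())  # [('a', 2), ('b', 2), ('c', 2)]
```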
MultiIndex on a DataFrame
df = pd.DataFrame(np.random.randint(1, 10, (4, 3)),
                  index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                  columns=[['one', 'one', 'two'], ['blue', 'red', 'blue']])
df.index.names = ['row-1', 'row-2']
df.columns.names = ['col-1', 'col-2']
print(df)
col-1        one     two
col-2       blue red blue
row-1 row-2
a     1        2   4    1
      2        3   1    4
b     1        3   9    4
      2        7   9    8
Selecting on the first row level
print(df.loc['a'])
col-1  one     two
col-2 blue red blue
row-2
1        8   5    1
2        8   5    6
Selecting on both row levels
print(df.loc['a', 1])
col-1  col-2
one    blue     2
       red      4
two    blue     1
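When both axes carry a MultiIndex, writing each key as a tuple makes it explicit which part addresses rows and which addresses columns. A sketch using fixed values in place of the random ones; the variable names row, cell, and blue are illustrative.

```python
import pandas as pd

# Fixed values instead of random ones, so results can be checked by hand.
df = pd.DataFrame(
    [[2, 4, 1], [3, 1, 4], [3, 9, 4], [7, 9, 8]],
    index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
    columns=[['one', 'one', 'two'], ['blue', 'red', 'blue']])
df.index.names = ['row-1', 'row-2']
df.columns.names = ['col-1', 'col-2']

# A full row key as a tuple returns a Series over the columns.
row = df.loc[('a', 1)]

# Row tuple plus column tuple addresses a single cell.
cell = df.loc[('a', 1), ('one', 'red')]

# Cross-section on the inner column level, addressed by name.
blue = df.xs('blue', axis=1, level='col-2')

print(row.tolist())           # [2, 4, 1]
print(cell)                   # 4
print(blue.columns.tolist())  # ['one', 'two']
```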
Swapping index levels
df2 = df.swaplevel('row-1', 'row-2')
print(df2)
col-1        one     two
col-2       blue red blue
row-2 row-1
1     a        5   1    9
2     a        9   4    1
1     b        3   8    3
2     b        7   7    8
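swaplevel only exchanges the levels; it does not reorder the rows, which is why the outer level in the output above reads 1, 2, 1, 2. A sketch of following it with sort_index to regroup the rows, using a hypothetical single-column frame:

```python
import pandas as pd

# Hypothetical one-column frame with the same row index as above.
index = pd.MultiIndex.from_arrays(
    [['a', 'a', 'b', 'b'], [1, 2, 1, 2]], names=['row-1', 'row-2'])
df = pd.DataFrame({'x': [1, 2, 3, 4]}, index=index)

# Exchange the two levels; row order is unchanged.
swapped = df.swaplevel('row-1', 'row-2')

# Sort on the new outer level so equal labels become adjacent.
regrouped = swapped.sort_index()

print(swapped.index.tolist())    # [(1, 'a'), (2, 'a'), (1, 'b'), (2, 'b')]
print(regrouped.index.tolist())  # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
```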
Aggregating by index level
# df.sum(level=0) was removed in pandas 2.0; group on the level instead.
print(df.groupby(level=0).sum())
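Level-wise aggregation can target either row level by name. For column levels, one option that avoids the deprecated axis=1 grouping is to transpose, group, and transpose back. A sketch with fixed values so the sums can be verified by hand:

```python
import pandas as pd

# Fixed values instead of random ones.
df = pd.DataFrame(
    [[2, 4, 1], [3, 1, 4], [3, 9, 4], [7, 9, 8]],
    index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
    columns=[['one', 'one', 'two'], ['blue', 'red', 'blue']])
df.index.names = ['row-1', 'row-2']
df.columns.names = ['col-1', 'col-2']

# Sum over rows that share the outer row label.
by_outer = df.groupby(level='row-1').sum()

# Sum over rows that share the inner row label.
by_inner = df.groupby(level='row-2').sum()

# Sum over columns that share the outer column label (via transpose).
by_col = df.T.groupby(level='col-1').sum().T

print(by_outer.values.tolist())  # [[5, 5, 5], [10, 18, 12]]
print(by_inner.values.tolist())  # [[5, 13, 5], [10, 10, 12]]
print(by_col.values.tolist())    # [[6, 1], [4, 4], [12, 4], [16, 8]]
```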
df = pd.DataFrame({
    'a': range(7),
    'b': range(7, 0, -1),
    'c': ['one', 'one', 'one', 'one', 'two', 'two', 'two'],
    'd': [0, 1, 2, 0, 1, 2, 3]
})
print(df)
a b c d
0 0 7 one 0
1 1 6 one 1
2 2 5 one 2
3 3 4 one 0
4 4 3 two 1
5 5 2 two 2
6 6 1 two 3
Setting a column as the index
print(df.set_index('c'))
a b d
c
one 0 7 0
one 1 6 1
one 2 5 2
one 3 4 0
two 4 3 1
two 5 2 2
two 6 1 3
print(df.set_index(['c', 'd']))
       a  b
c   d
one 0  0  7
    1  1  6
    2  2  5
    0  3  4
two 1  4  3
    2  5  2
    3  6  1
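In the output above the key ('one', 0) appears twice, so the new index is not unique. A sketch of two set_index options that may be useful here: drop=False keeps the column alongside the index, and verify_integrity=True raises instead of silently creating duplicate keys.

```python
import pandas as pd

df = pd.DataFrame({
    'a': range(7),
    'b': range(7, 0, -1),
    'c': ['one', 'one', 'one', 'one', 'two', 'two', 'two'],
    'd': [0, 1, 2, 0, 1, 2, 3],
})

# Keep 'c' as a regular column as well as using it for the index.
kept = df.set_index('c', drop=False)

# The two-column key is not unique: ('one', 0) occurs in rows 0 and 3.
indexed = df.set_index(['c', 'd'])
print(indexed.index.is_unique)

# verify_integrity=True turns that situation into an error.
raised = False
try:
    df.set_index(['c', 'd'], verify_integrity=True)
except ValueError:
    raised = True
print(raised)
```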
Converting the index back to columns
df2 = df.set_index(['c', 'd'])
print(df2.reset_index())
c d a b
0 one 0 0 7
1 one 1 1 6
2 one 2 2 5
3 one 0 3 4
4 two 1 4 3
5 two 2 5 2
6 two 3 6 1
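reset_index does not have to restore every level. A sketch of moving only one level back to a column, and of discarding the index entirely with drop=True:

```python
import pandas as pd

df = pd.DataFrame({
    'a': range(7),
    'b': range(7, 0, -1),
    'c': ['one', 'one', 'one', 'one', 'two', 'two', 'two'],
    'd': [0, 1, 2, 0, 1, 2, 3],
})
df2 = df.set_index(['c', 'd'])

# Move only level 'd' back to a column; 'c' stays as the index.
partial = df2.reset_index(level='d')

# Throw the index away and fall back to the default 0..n-1 index.
dropped = df2.reset_index(drop=True)

print(partial.columns.tolist())  # ['d', 'a', 'b']
print(dropped.columns.tolist())  # ['a', 'b']
```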
Sorting columns by label
# axis must be passed by keyword in pandas 2.x.
print(df2.reset_index().sort_index(axis='columns'))
a b c d
0 0 7 one 0
1 1 6 one 1
2 2 5 one 2
3 3 4 one 0
4 4 3 two 1
5 5 2 two 2
6 6 1 two 3
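The same call also sorts column labels in descending order. A small sketch with a hypothetical frame whose columns start out unordered:

```python
import pandas as pd

# Hypothetical frame with columns in arbitrary order.
df = pd.DataFrame({'c': [1, 2], 'a': [3, 4], 'b': [5, 6]})

# Ascending by column label (axis=1 and axis='columns' are equivalent).
by_cols = df.sort_index(axis='columns')

# Descending by column label.
by_cols_desc = df.sort_index(axis=1, ascending=False)

print(by_cols.columns.tolist())       # ['a', 'b', 'c']
print(by_cols_desc.columns.tolist())  # ['c', 'b', 'a']
```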