创建多重索引
In [16]: df = pd.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=index)
In [17]: df
Out[17]:
first bar baz foo qux \
second one two one two one two one
A 0.895717 0.805244 -1.206412 2.565646 1.431256 1.340309 -1.170299
B 0.410835 0.813850 0.132003 -0.827317 -0.076467 -1.187678 1.130127
C -1.413681 1.607920 1.024180 0.569605 0.875906 -2.211372 0.974466
first
second two
A -0.226169
B -1.436737
C -2.006747
获得索引信息
get_level_values
In [23]: index.get_level_values(0)
Out[23]: Index(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'], dtype='object', name='first')
In [24]: index.get_level_values('second')
Out[24]: Index(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'], dtype='object', name='second')
基本索引
In [25]: df['bar']
Out[25]:
second one two
A 0.895717 0.805244
B 0.410835 0.813850
C -1.413681 1.607920
In [26]: df['bar', 'one']
Out[26]:
A 0.895717
B 0.410835
C -1.413681
Name: (bar, one), dtype: float64
In [27]: df['bar']['one']
Out[27]:
A 0.895717
B 0.410835
C -1.413681
Name: one, dtype: float64
使用reindex对齐数据
数据准备
In [11]: s = pd.Series(np.random.randn(8), index=arrays)
In [12]: s
Out[12]:
bar one -0.861849
two -2.104569
baz one -0.494929
two 1.071804
foo one 0.721555
two -0.706771
qux one -1.039575
two 0.271860
dtype: float64
s序列加(0~-2)索引的值,因为s[:-2]没有最后两个的索引,所以为NaN.s[::2]意思是步长为1.
In [34]: s + s[:-2]
Out[34]:
bar one -1.723698
two -4.209138
baz one -0.989859
two 2.143608
foo one 1.443110
two -1.413542
qux one NaN
two NaN
dtype: float64
In [35]: s + s[::2]
Out[35]:
bar one -1.723698
two NaN
baz one -0.989859
two NaN
foo one 1.443110
two NaN
qux one -2.079150
two NaN
dtype: float64