pandas MultiIndex

wangquannuaa

Category: python

Article link: https://blog.csdn.net/wangquannuaa/article/details/47065229


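# Imports used throughout this session
import numpy as np
import pandas as pd
from numpy.random import randn
from pandas import DataFrame, MultiIndex, Series

In [1]: arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
   ...:           ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
In [2]: tuples = list(zip(*arrays))
In [4]: index = MultiIndex.from_tuples(tuples, names=['first', 'second'])
In [6]: s = Series(randn(8), index=index)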
In [7]: s
Out[7]:
first  second
bar    one       0.469112
       two      -0.282863
baz    one      -1.509059
       two      -1.135632
foo    one       1.212112
       two      -0.173215
qux    one       0.119209
       two      -1.044236
dtype: float64
In [16]: df = DataFrame(randn(3, 8), index=['A', 'B', 'C'], columns=index)
In [17]: df
Out[17]:
first        bar                 baz                 foo                 qux \
second       one       two       one       two       one       two       one
A       0.895717  0.805244 -1.206412  2.565646  1.431256  1.340309 -1.170299
B       0.410835  0.813850  0.132003 -0.827317 -0.076467 -1.187678  1.130127
C      -1.413681  1.607920  1.024180  0.569605  0.875906 -2.211372  0.974466

first
second       two
A      -0.226169
B      -1.436737
C      -2.006747
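
The session above can be condensed into a standalone script. This is a minimal sketch, assuming a reasonably recent pandas and NumPy; the seeded generator `rng` and its seed are not in the original and are only there to make the random values reproducible.

```python
import numpy as np
import pandas as pd

# One list per index level; zipping them yields one tuple per entry.
arrays = [
    ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
    ["one", "two", "one", "two", "one", "two", "one", "two"],
]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])

# Assumption: a seeded generator stands in for the unseeded randn() calls.
rng = np.random.default_rng(0)

# The same MultiIndex labels the rows of a Series ...
s = pd.Series(rng.standard_normal(8), index=index)

# ... and the columns of a DataFrame.
df = pd.DataFrame(rng.standard_normal((3, 8)), index=["A", "B", "C"], columns=index)

print(index.nlevels, list(index.names))
print(s)
print(df)
```

The printed values will differ from the ones shown in the session, since those came from an unseeded `randn`.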

In [41]: def mklbl(prefix, n):
   ....:     return ["%s%s" % (prefix, i) for i in range(n)]
In [42]: miindex = MultiIndex.from_product([mklbl('A', 4),
   ....:                                    mklbl('B', 2),
   ....:                                    mklbl('C', 4),
   ....:                                    mklbl('D', 2)])
In [43]: micolumns = MultiIndex.from_tuples([('a', 'foo'), ('a', 'bar'),
   ....:                                     ('b', 'foo'), ('b', 'bah')],
   ....:                                    names=['lvl0', 'lvl1'])
# Sort both axes so that the MultiIndex can be sliced with IndexSlice
In [44]: dfmi = DataFrame(np.arange(len(miindex) * len(micolumns))
   ....:                    .reshape((len(miindex), len(micolumns))),
   ....:                  index=miindex,
   ....:                  columns=micolumns).sort_index().sort_index(axis=1)
In [45]: dfmi
Out[45]:
lvl0           a         b
lvl1         bar  foo  bah  foo
A0 B0 C0 D0    1    0    3    2
         D1    5    4    7    6
      C1 D0    9    8   11   10
         D1   13   12   15   14
      C2 D0   17   16   19   18
         D1   21   20   23   22
      C3 D0   25   24   27   26
...          ...  ...  ...  ...
A3 B1 C0 D1  229  228  231  230
      C1 D0  233  232  235  234
         D1  237  236  239  238
      C2 D0  241  240  243  242
         D1  245  244  247  246
      C3 D0  249  248  251  250
         D1  253  252  255  254
In [47]: idx = pd.IndexSlice
In [51]: mask = dfmi[('a', 'foo')] > 200
In [52]: dfmi.loc[idx[mask, :, ['C1', 'C3']], idx[:, 'foo']]
Out[52]:
lvl0           a    b
lvl1         foo  foo
A3 B0 C1 D1  204  206
      C3 D0  216  218
         D1  220  222
   B1 C1 D0  232  234
         D1  236  238
      C3 D0  248  250
         D1  252  254
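
The `IndexSlice` selection can also be run as a standalone script. This is a sketch under the assumption of a pandas version in which `sort_index()` is the way to sort a MultiIndex (the older `sortlevel()` method is no longer available on DataFrames); the names `values` and `result` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd


def mklbl(prefix, n):
    """Return labels such as ['A0', 'A1', ...] for one index level."""
    return [f"{prefix}{i}" for i in range(n)]


# 4 * 2 * 4 * 2 = 64 rows, labelled by the product of four levels.
miindex = pd.MultiIndex.from_product(
    [mklbl("A", 4), mklbl("B", 2), mklbl("C", 4), mklbl("D", 2)]
)
micolumns = pd.MultiIndex.from_tuples(
    [("a", "foo"), ("a", "bar"), ("b", "foo"), ("b", "bah")],
    names=["lvl0", "lvl1"],
)

values = np.arange(len(miindex) * len(micolumns)).reshape(
    len(miindex), len(micolumns)
)
# Sort both axes before slicing the MultiIndex.
dfmi = (
    pd.DataFrame(values, index=miindex, columns=micolumns)
    .sort_index()
    .sort_index(axis=1)
)

idx = pd.IndexSlice
mask = dfmi[("a", "foo")] > 200  # boolean Series aligned with the rows

# Rows: where mask is True, any second level, third level C1 or C3
# (the unspecified fourth level is kept whole).
# Columns: any lvl0, lvl1 equal to 'foo'.
result = dfmi.loc[idx[mask, :, ["C1", "C3"]], idx[:, "foo"]]
print(result)
```

Each position inside `idx[...]` addresses one index level in order, so the bare `:` in the second position is what keeps both `B0` and `B1`.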