How to index a MultiIndex in Python: python – MultiIndex DataFr

一起来读英文原版

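Article tags: how to index a MultiIndex in Python

Copyright notice: This is an original article by the blogger, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/weixin_30191159/article/details/114409968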


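In Pandas, is there a way to efficiently pull out all of the MultiIndex index values present in an HDFStore stored in table format?

I can select() efficiently using where=, but I want all of the index values and none of the columns. I can also select() with iterator=True to save RAM, but that still means reading almost the whole table from disk, so it is still slow.

I have been hunting through the store.root..table.* stuff, hoping to get a list of index values from it. Am I on the right track?

Plan B would be to keep a shorter MultiIndex DataFrame that contains only empty DataFrames, appended each time I append to the main one. I could retrieve that and get the index much more cheaply than from the main table. It is not very elegant, though.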

Solution:

Create a multi-index df

In [34]: import numpy as np; import pandas as pd

In [35]: df = pd.DataFrame(np.random.randn(100000,3),columns=list('ABC'))

In [36]: df['one'] = 'foo'

In [37]: df['two'] = 'bar'

In [38]: df.loc[50000:,'two'] = 'bah'

In [40]: mi = df.set_index(['one','two'])

In [41]: mi

Out[41]:

MultiIndex: 100000 entries, (foo, bar) to (foo, bah)

Data columns (total 3 columns):

A 100000 non-null values

B 100000 non-null values

C 100000 non-null values

dtypes: float64(3)
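For readers on a current pandas release, the construction above can be sketched as follows. This is a minimal sketch rather than the original author's exact session: the row count is cut to 10 (an assumption, purely so it runs instantly), the random generator is seeded, and `.loc` stands in for the removed `.ix` indexer.

```python
import numpy as np
import pandas as pd

# Assumed small size for illustration; the original session used 100000 rows.
n_rows = 10
rng = np.random.default_rng(0)

df = pd.DataFrame(rng.standard_normal((n_rows, 3)), columns=list("ABC"))
df["one"] = "foo"
df["two"] = "bar"

# .loc slices by label and includes the start label, so rows 5..9 become 'bah'.
df.loc[n_rows // 2:, "two"] = "bah"

# Move the two string columns into a two-level MultiIndex.
mi = df.set_index(["one", "two"])

print(list(mi.index.names))
print(mi.index.nlevels)
```

After `set_index`, only A, B and C remain as data columns, and `one` and `two` live in the index.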

Store it as a table

In [42]: store = pd.HDFStore('test.h5',mode='w')

In [43]: store.append('df',mi)

get_storer returns the stored object (without retrieving the data)

In [44]: store.get_storer('df').levels

Out[44]: ['one', 'two']

In [2]: store

Out[2]:

File path: test.h5

/df frame_table (typ->appendable_multi,nrows->100000,ncols->5,indexers->[index],dc->[two,one])

The index levels are created as data_columns, which means you can use them in selections

This is how to select only the index

In [48]: store.select('df',columns=['one'])

Out[48]:

MultiIndex: 100000 entries, (foo, bar) to (foo, bah)

Empty DataFrame

Select a single column and return it as a mi-frame

In [49]: store.select('df',columns=['A'])

Out[49]:

MultiIndex: 100000 entries, (foo, bar) to (foo, bah)

Data columns (total 1 columns):

A 100000 non-null values

dtypes: float64(1)

To select a single column as a Series (this can also be an index level, since the levels are stored as columns). This will be quite fast.

In [2]: store.select_column('df','one')

Out[2]:

0 foo

1 foo

2 foo

3 foo

4 foo

5 foo

6 foo

7 foo

8 foo

9 foo

10 foo

11 foo

12 foo

13 foo

14 foo

...

99985 foo

99986 foo

99987 foo

99988 foo

99989 foo

99990 foo

99991 foo

99992 foo

99993 foo

99994 foo

99995 foo

99996 foo

99997 foo

99998 foo

99999 foo

Length: 100000, dtype: object

If you really want the fastest possible selection of only the index

In [4]: %timeit store.select_column('df','one')

100 loops, best of 3: 8.71 ms per loop

In [5]: %timeit store.select('df',columns=['one'])

10 loops, best of 3: 43 ms per loop
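The `select_column` approach timed above can be wrapped in a small helper. The sketch below is illustrative only: `index_levels_from_store` is a hypothetical helper name, the six-row frame and temporary file are made up for the example, and the whole round trip is skipped when PyTables (which `HDFStore` depends on) is not installed.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def index_levels_from_store(path, key, level_names):
    """Hypothetical helper: read only the index-level columns of a table node."""
    with pd.HDFStore(path, mode="r") as store:
        return {name: store.select_column(key, name) for name in level_names}


try:
    import tables  # noqa: F401  PyTables, required by HDFStore
    have_pytables = True
except ImportError:
    have_pytables = False

result = None
if have_pytables:
    # Tiny made-up frame with the same two index levels as the example above.
    df = pd.DataFrame(np.zeros((6, 2)), columns=["A", "B"])
    df["one"] = "foo"
    df["two"] = ["bar"] * 3 + ["bah"] * 3
    mi = df.set_index(["one", "two"])

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.h5")
        with pd.HDFStore(path, mode="w") as store:
            store.append("df", mi)  # append() writes the table format
        result = index_levels_from_store(path, "df", ["one", "two"])

    print(result["two"].tolist())
```

Each level comes back as its own Series, so the data columns A and B are never loaded into the result.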

Or get the complete index

In [6]: def f():

...:     level_1 = store.select_column('df','one')

...:     level_2 = store.select_column('df','two')

...:     return pd.MultiIndex.from_arrays([ level_1, level_2 ])

...:

In [17]: %timeit f()

10 loops, best of 3: 28.1 ms per loop
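The reassembly step inside f() can be tried without a store at all. In this sketch the two Series are made-up stand-ins for what `store.select_column` would return, and passing `names=` explicitly is an addition not present in the original function, included so the level names are set on the rebuilt index.

```python
import pandas as pd

# Stand-ins (hypothetical data) for the Series returned by
# store.select_column('df', 'one') and store.select_column('df', 'two').
level_1 = pd.Series(["foo", "foo", "foo", "foo"], name="one")
level_2 = pd.Series(["bar", "bar", "bah", "bah"], name="two")

# Combine the per-level values position by position into one MultiIndex.
full_index = pd.MultiIndex.from_arrays(
    [level_1.to_numpy(), level_2.to_numpy()],
    names=["one", "two"],
)

print(full_index)
```

The result has one entry per stored row, in the same order as the rows in the table.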

If you want the values of each level, this is a pretty fast way of getting them

In [2]: store.select_column('df','one').unique()

Out[2]: array(['foo'], dtype=object)

In [3]: store.select_column('df','two').unique()

Out[3]: array(['bar', 'bah'], dtype=object)
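The same distinct-value lookup can be illustrated on in-memory data. This is a small sketch with made-up values: the Series stands in for the result of `store.select_column('df', 'two')`, and the second half shows the equivalent call on a MultiIndex that is already loaded.

```python
import pandas as pd

# Stand-in (hypothetical data) for store.select_column('df', 'two').
two = pd.Series(["bar", "bar", "bah", "bah"], name="two")

# unique() keeps the order of first appearance and does not sort.
unique_two = two.unique()
print(list(unique_two))

# On an index already in memory, get_level_values gives the same information.
idx = pd.MultiIndex.from_arrays(
    [["foo"] * 4, ["bar", "bar", "bah", "bah"]], names=["one", "two"]
)
unique_from_index = idx.get_level_values("two").unique()
print(list(unique_from_index))
```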

Tags: python, pandas, hdfstore

Source: https://codeday.me/bug/20190517/1121943.html
