Reading and writing HDF5 files with h5py in Python (repost)

Link to the original article. Please read the original; this copy is kept here purely as a personal archive.

 

1. Creating HDF5 files

We first load the numpy and h5py modules.

import numpy as np
import h5py

Now mock up some simple dummy data to save to our file.

d1 = np.random.random(size = (1000,20))
d2 = np.random.random(size = (1000,200))
print(d1.shape, d2.shape)

Output: (1000, 20) (1000, 200)

The first step to creating an HDF5 file is to initialise it. It uses a very similar syntax to initialising a typical text file in numpy. The first argument provides the filename and location, the second the mode. We’re writing the file, so we provide w for write access.

hf = h5py.File('data.h5', 'w')

This creates a file object, hf, which has a bunch of associated methods. One is create_dataset, which does what it says on the tin. Just provide a name for the dataset, and the numpy array.

hf.create_dataset('dataset_1', data=d1)
hf.create_dataset('dataset_2', data=d2)
<HDF5 dataset "dataset_2": shape (1000, 200), type "<f8">

All we need to do now is close the file, which will write all of our work to disk.

hf.close()
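
As an aside, opening the file in a with block does the same thing and guarantees the file is closed even if an error occurs part-way through; a minimal sketch of the equivalent write:

with h5py.File('data.h5', 'w') as hf:
    # The file is closed automatically when the block exits.
    hf.create_dataset('dataset_1', data=d1)
    hf.create_dataset('dataset_2', data=d2)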

2.  Reading HDF5 files

To open and read data we use the same File method in read mode, r.

      

hf = h5py.File('data.h5', 'r')

To see what data is in this file, we can call the keys() method on the file object.

hf.keys()
<KeysViewHDF5 ['dataset_1', 'dataset_2']>

 We can then grab each dataset we created above using the get method, specifying the name.

n1 = hf.get('dataset_1')
n1

This returns an HDF5 dataset object. To convert this to an array, just call numpy’s array method.

n1 = np.array(n1)
n1.shape
(1000, 20)
hf.close()
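
The dataset object also supports numpy-style slicing, which reads only the requested region from disk rather than loading the whole array; a small sketch against the file written above:

hf = h5py.File('data.h5', 'r')
# Slicing the dataset reads just these rows from disk and returns a numpy array.
first_rows = hf['dataset_1'][:10]
print(first_rows.shape)   # (10, 20)
hf.close()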

3. Groups

Groups are the basic container mechanism in an HDF5 file, allowing hierarchical organisation of the data. Groups are created similarly to datasets, and datasets are then added using the group object.

d1 = np.random.random(size = (100,33))
d2 = np.random.random(size = (100,333))
d3 = np.random.random(size = (100,3333))
hf = h5py.File('data.h5', 'w')
g1 = hf.create_group('group1')
g1.create_dataset('data1',data=d1)
g1.create_dataset('data2',data=d2)
<HDF5 dataset "data2": shape (100, 333), type "<f8">

We can also create subfolders: just specify the group name as a directory-style path.

g2 = hf.create_group('group2/subfolder')
g2.create_dataset('data3',data=d3)
<HDF5 dataset "data3": shape (100, 3333), type "<f8">

As before, to read data in directories and subdirectories, use the get method with the full subdirectory path.

group2 = hf.get('group2/subfolder')
list(group2.items())
[('data3', <HDF5 dataset "data3": shape (100, 3333), type "<f8">)]
group1 = hf.get('group1')
list(group1.items())
[('data1', <HDF5 dataset "data1": shape (100, 33), type "<f8">),
 ('data2', <HDF5 dataset "data2": shape (100, 333), type "<f8">)]
n1 = group1.get('data1')
np.array(n1).shape
(100, 33)
hf.close()
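
Files and groups can also be indexed dictionary-style with a full path, and visititems will walk the whole hierarchy; a short sketch, reopening the file from above:

hf = h5py.File('data.h5', 'r')
# Index with a path instead of calling get().
n3 = hf['group2/subfolder/data3']
print(n3.shape)   # (100, 3333)
# visititems calls the function once for every group and dataset in the file.
hf.visititems(lambda name, obj: print(name, obj))
hf.close()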

4.  Compression

To save on disk space, at the cost of read speed, you can compress the data. Just add the compression argument, which can be gzip, lzf, or szip. gzip is the most portable, as it’s available with every HDF5 install; lzf is the fastest, but doesn’t compress as effectively as gzip; szip is a NASA format that is patented, so if you don’t know about it, chances are your organisation doesn’t hold the patent, and you should avoid it.

For gzip you can also specify the additional compression_opts argument, which sets the compression level. The default is 4, but it can be an integer between 0 and 9.

hf = h5py.File('data.h5', 'w')

hf.create_dataset('dataset_1', data=d1, compression="gzip", compression_opts=9)
hf.create_dataset('dataset_2', data=d2, compression="gzip", compression_opts=9)

hf.close()
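
For comparison, here is a quick sketch that writes the same arrays with lzf instead and prints both file sizes; lzf takes no compression_opts, and the filename data_lzf.h5 is just an arbitrary choice for this example:

import os

with h5py.File('data_lzf.h5', 'w') as hf:
    hf.create_dataset('dataset_1', data=d1, compression='lzf')
    hf.create_dataset('dataset_2', data=d2, compression='lzf')

# Compare on-disk sizes of the gzip and lzf versions.
print(os.path.getsize('data.h5'), os.path.getsize('data_lzf.h5'))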

 
