【无标题】

[转载]Quick Start Guide

把h5py官网的入门页贴过来,需要时比较方便参阅

https://docs.h5py.org/en/latest/quick.html

认为其是一个文件系统使用即可

h5py文件
一个h5py文件是 “dataset” 和 “group” 二合一的容器。

  1. dataset ——> 可以类比成ndarray,包含了一些数据
  2. group ——>可以类比成字典, 包含了其它 dataset 和 其它 group

我们可以把h5py文件类比成“文件夹”,以树形结构存储group和dataset

Install

With Anaconda or Miniconda:

conda install h5py

If there are wheels for your platform (mac, linux, windows on x86) and you do not need MPI you can install h5py via pip:

pip install h5py

With Enthought Canopy, use the GUI package manager or:

enpkg h5py

To install from source see Installation.

Core concepts

An HDF5 file is a container for two kinds of objects: datasets, which are array-like collections of data, and groups, which are folder-like containers that hold datasets and other groups. The most fundamental thing to remember when using h5py is:

Groups work like dictionaries, and datasets work like NumPy arrays

Suppose someone has sent you a HDF5 file, mytestfile.hdf5. (To create this file, read Appendix: Creating a file.) The very first thing you’ll need to do is to open the file for reading:

>>> import h5py
>>> f = h5py.File('mytestfile.hdf5', 'r')

The File object is your starting point. What is stored in this file? Remember h5py.File acts like a Python dictionary, thus we can check the keys,

>>> list(f.keys())
['mydataset']

Based on our observation, there is one data set, mydataset in the file. Let us examine the data set as a Dataset object

>>> dset = f['mydataset']

The object we obtained isn’t an array, but an HDF5 dataset. Like NumPy arrays, datasets have both a shape and a data type:

>>> dset.shape
(100,)
>>> dset.dtype
dtype('int32')

They also support array-style slicing. This is how you read and write data from a dataset in the file:

>>> dset[...] = np.arange(100)
>>> dset[0]
0
>>> dset[10]
10
>>> dset[0:100:10]
array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

For more, see File Objects and Datasets.

Appendix: Creating a file

At this point, you may wonder how mytestdata.hdf5 is created. We can create a file by setting the mode to w when the File object is initialized. Some other modes are a (for read/write/create access), and r+ (for read/write access). A full list of file access modes and their meanings is at File Objects.

>>> import h5py
>>> import numpy as np
>>> f = h5py.File("mytestfile.hdf5", "w")

The File object has a couple of methods which look interesting. One of them is create_dataset, which as the name suggests, creates a data set of given shape and dtype

>>> dset = f.create_dataset("mydataset", (100,), dtype='i')

The File object is a context manager; so the following code works too

>>> import h5py
>>> import numpy as np
>>> with h5py.File("mytestfile.hdf5", "w") as f:
>>>     dset = f.create_dataset("mydataset", (100,), dtype='i')

Groups and hierarchical organization

“HDF” stands for “Hierarchical Data Format”. Every object in an HDF5 file has a name, and they’re arranged in a POSIX-style hierarchy with /-separators:

>>> dset.name
'/mydataset'

The “folders” in this system are called groups. The File object we created is itself a group, in this case the root group, named /:

>>> f.name
'/'

Creating a subgroup is accomplished via the aptly-named create_group. But we need to open the file in the “append” mode first (Read/write if exists, create otherwise)

>>> f = h5py.File('mydataset.hdf5', 'a')
>>> grp = f.create_group("subgroup")

All Group objects also have the create_* methods like File:

>>> dset2 = grp.create_dataset("another_dataset", (50,), dtype='f')
>>> dset2.name
'/subgroup/another_dataset'

By the way, you don’t have to create all the intermediate groups manually. Specifying a full path works just fine:

>>> dset3 = f.create_dataset('subgroup2/dataset_three', (10,), dtype='i')
>>> dset3.name
'/subgroup2/dataset_three'

Groups support most of the Python dictionary-style interface. You retrieve objects in the file using the item-retrieval syntax:

>>> dataset_three = f['subgroup2/dataset_three']

Iterating over a group provides the names of its members:

>>> for name in f:
...     print(name)
mydataset
subgroup
subgroup2

Membership testing also uses names:

>>> "mydataset" in f
True
>>> "somethingelse" in f
False

You can even use full path names:

>>> "subgroup/another_dataset" in f
True

There are also the familiar keys(), values(), items() and iter() methods, as well as get().

Since iterating over a group only yields its directly-attached members, iterating over an entire file is accomplished with the Group methods visit() and visititems(), which take a callable:

>>> def printname(name):
...     print(name)
>>> f.visit(printname)
mydataset
subgroup
subgroup/another_dataset
subgroup2
subgroup2/dataset_three

For more, see Groups.

Attributes

One of the best features of HDF5 is that you can store metadata right next to the data it describes. All groups and datasets support attached named bits of data called attributes.

Attributes are accessed through the attrs proxy object, which again implements the dictionary interface:

>>> dset.attrs['temperature'] = 99.5
>>> dset.attrs['temperature']
99.5
>>> 'temperature' in dset.attrs
True

For more, see Attributes.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值