I. Introduction
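- Hierarchical Data Format Version 5 (HDF5):
  - A mechanism for storing large arrays of values of the same type, suited to data models that can be organized hierarchically and whose datasets need to be tagged with metadata
  - The commonly used Python interface module is h5py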
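- The three core elements of HDF5:
  - HDF5 files: containers that can hold two kinds of data objects, datasets and groups. They are operated much like standard Python file objects. A File instance is itself a group, named /, and is the entry point for traversing the file
  - dataset (array-like): comparable to a NumPy array; every dataset has a name, a shape and a dtype, and supports slicing
  - group (folder-like): comparable to a dictionary; a folder-like container that can hold datasets or other groups. The keys are the members' names and the values are the member objects themselves (groups or datasets)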
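- HDFView, a visualization tool for HDF5 data:
  - Cross-platform, and lets you inspect the data in detail
  - Note: the path of the file you open should not contain Chinese characters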
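The three elements listed above (file, group, dataset) can be sketched in a few lines of h5py. This is only a minimal illustration: the file name `demo.h5`, the group name `images` and the dataset name `batch0` are made up for the example, and the file is written to a temporary directory.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name, placed in a temporary directory for illustration.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")

with h5py.File(path, "w") as f:
    # The File object is itself the root group, named "/".
    root_name = f.name

    # A group is a folder-like container.
    grp = f.create_group("images")

    # A dataset is array-like and has a name, a shape and a dtype.
    dset = grp.create_dataset("batch0", data=np.zeros((4, 3), dtype=np.float32))
    dset_name, dset_shape, dset_dtype = dset.name, dset.shape, dset.dtype

    # Metadata can be attached to a dataset through its attrs mapping.
    dset.attrs["description"] = "toy example"
    description = dset.attrs["description"]

print(root_name, dset_name, dset_shape, dset_dtype, description)
```

The dataset's full name is its path from the root group, which is why it starts with the name of the enclosing group.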
II. HDF5 Files
HDF5 Files work generally like standard Python file objects. The File instance we created is itself a group, in this case the root group, named /, so a File instance acts like a Python dictionary.
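As a rough sketch of the dictionary-like behaviour described above, the snippet below opens a file with a `with` block (as one would a regular Python file) and then uses it like a mapping. The file name `root_demo.h5` and the member names `data` and `sub` are assumptions made for the example.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "root_demo.h5")

with h5py.File(path, "w") as f:
    # Dictionary-style assignment of an array creates a dataset.
    f["data"] = np.arange(6)
    f.create_group("sub")

    is_group = isinstance(f, h5py.Group)   # File is a subclass of Group
    root_name = f.name                     # name of the root group
    has_data = "data" in f                 # membership test, as with a dict
    member_names = sorted(f.keys())        # names of the root group's members

    # Absolute and relative names resolve to the same object from the root.
    absolute_name = f["/data"].name
    relative_name = f["data"].name

print(is_group, root_name, has_data, member_names)
```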
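1. Attributes and methods of the File object f:
- Attributes: f.name, f.filename, f.mode
- Methods:
  - f.keys(): the names of the members of the root group, e.g. f[list(f.keys())[i]][()]. On Python 3, keys() returns a view that cannot be indexed directly, so wrap it in list() first; the older form f[f.keys()[i]].value only works on Python 2 with old h5py releases
  - f.values():
    - What you store is a NumPy array, but what you get back is an instance of the h5py.Dataset class. This is a proxy object that forwards your read and write requests to the HDF5 data on disk; slicing a Dataset object returns a NumPy array: list(f.values())[0][:] or f[list(f.keys())[0]][:]
    - Attributes available when key refers to a dataset:
      - list(f.values())[i].name or f[key].name: the path starting from the root, e.g. '/data', '/label'
      - list(f.values())[i].shape or f[key].shape
      - list(f.values())[i].dtype or f[key].dtype
      - list(f.values())[i][()] or f[key][()]: the whole dataset as a NumPy array (older h5py spelled this f[key].value; the .value attribute was removed in h5py 3.0)
  - f.items()
  - f.create_dataset()
  - f.create_group()
  - Notes:
    - On py2, the iterator versions are f.iterkeys(), f.itervalues() and f.iteritems(); on py3 the methods without the iter prefix already iterate lazily
    - Names of all objects in the file are text strings (unicode on py2, str on py3)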
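The attributes and methods above can be tried out on a small throwaway file. The following is a sketch for Python 3 with a recent h5py; the file name `attrs_demo.h5` and the dataset names `data` and `label` are chosen only for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "attrs_demo.h5")

with h5py.File(path, "w") as f:
    f.create_dataset("data", data=np.arange(12, dtype=np.float32).reshape(3, 4))
    f.create_dataset("label", data=np.array([0, 1, 2], dtype=np.int64))

with h5py.File(path, "r") as f:
    file_mode = f.mode                 # mode the file was opened with
    keys = sorted(f.keys())            # keys() is a view on Python 3

    dset = f["data"]
    is_proxy = isinstance(dset, h5py.Dataset)   # a proxy object, not an ndarray

    sliced = dset[1:, :2]              # slicing reads from disk into an ndarray
    whole = dset[()]                   # reads the entire dataset

    # Collect name, shape and dtype for every member of the root group.
    info = {name: (obj.name, obj.shape, obj.dtype) for name, obj in f.items()}

print(file_mode, keys, is_proxy, type(sliced).__name__, info)
```

Everything that should outlive the file is read into NumPy arrays or plain Python values inside the `with` block, because Dataset objects can no longer be read once the file is closed.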
2. Reading and writing files
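import h5py
import numpy as np

# imgs, labels, imgsMean, start and end are assumed to be defined elsewhere.

# Use gzip compression; level 1 is the fastest but gives the poorest compression ratio
comp_kwargs = {'compression': 'gzip', 'compression_opts': 1}

# Write data to an h5 file
with h5py.File('xxx.h5', 'w') as f:
    # The compression settings must be unpacked as keyword arguments
    f.create_dataset('data', data=np.array((imgs[start: end] - imgsMean) / 255.0).astype(np.float32), **comp_kwargs)
    f.create_dataset('label', data=np.array(labels[start: end]).astype(np.float32), **comp_kwargs)

# Read data from the h5 file
with h5py.File('xxx.h5', 'r') as f:
    for key in f.keys():
        print(f[key].name)
        # [()] reads the whole dataset; the old .value attribute was removed in h5py 3.0
        print(f[key][()])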
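To check that the compression settings were actually applied and that the data survives a round trip, something along the following lines can be used. It is a self-contained sketch: the random `imgs` array and the `labels` vector are stand-ins for real data, and the file name `compressed_demo.h5` is made up.

```python
import os
import tempfile

import h5py
import numpy as np

# Stand-in data; a real pipeline would load images and labels here.
rng = np.random.default_rng(0)
imgs = rng.integers(0, 256, size=(8, 16, 16)).astype(np.float32)
labels = np.arange(8)

comp_kwargs = {"compression": "gzip", "compression_opts": 1}
path = os.path.join(tempfile.mkdtemp(), "compressed_demo.h5")

with h5py.File(path, "w") as f:
    f.create_dataset("data", data=imgs / 255.0, **comp_kwargs)
    f.create_dataset("label", data=labels.astype(np.float32), **comp_kwargs)

with h5py.File(path, "r") as f:
    compression = f["data"].compression      # name of the compression filter
    level = f["data"].compression_opts       # gzip level that was requested
    restored = f["data"][()]                 # decompression is transparent

# gzip is lossless, so the values read back match what was written.
roundtrip_ok = np.array_equal(restored, imgs / 255.0)
print(compression, level, roundtrip_ok)
```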
III. Datasets
Datasets work like NumPy arrays!
1. Creating a dataset: f.create_dataset() or dictionary-style assignment
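- To create an empty dataset, you only need to specify its name and shape. The dtype defaults to np.float32 and the default fill value is 0, which can be changed with the fillvalue keyword argument
- To create a non-empty dataset, you only need to specify the name and the actual data; shape and dtype are taken from data automatically. You can also specify the storage type explicitly to save space (single-precision floats take half the space of double-precision floats)
- Changing the shape: when creating a dataset you may specify a shape that differs from that of the input array, as long as both shapes contain the same number of elements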
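The creation rules above can be illustrated with a short sketch. The dataset names (`empty`, `filled`, `auto`, `small`, `reshaped`, `by_assignment`) and the file name are invented for the example.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "create_demo.h5")
doubles = np.linspace(0, 1, 6)   # NumPy default dtype: float64

with h5py.File(path, "w") as f:
    # Empty dataset: only name and shape; dtype falls back to float32.
    empty = f.create_dataset("empty", shape=(2, 3))
    empty_dtype, empty_values = empty.dtype, empty[()]

    # Same, but with a custom fill value for uninitialized elements.
    filled_values = f.create_dataset("filled", shape=(2, 3), fillvalue=7)[()]

    # Non-empty dataset: shape and dtype are inferred from the data.
    auto = f.create_dataset("auto", data=doubles)
    auto_shape, auto_dtype = auto.shape, auto.dtype

    # Explicit storage dtype: store the float64 input as float32.
    small_dtype = f.create_dataset("small", data=doubles, dtype=np.float32).dtype

    # Different shape with the same number of elements (6 == 2 * 3).
    reshaped_shape = f.create_dataset("reshaped", data=doubles, shape=(2, 3)).shape

    # Dictionary-style assignment also creates a dataset.
    f["by_assignment"] = doubles
    assigned_dtype = f["by_assignment"].dtype

print(empty_dtype, auto_dtype, small_dtype, reshaped_shape)
```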
2. Usage of the h5py.File.create_dataset function
h5py.File.create_dataset(self, name, shape=None, dtype=None, data=None, **kwds):

name: Name of the dataset (absolute or relative). Provide None to make an anonymous dataset.
shape: Dataset shape. Use "()" for scalar datasets. Required if "data" isn't provided.
dtype: Numpy dtype or string. If omitted, dtype('f') will be used. Required if "data" isn't provided; otherwise, overrides data array's dtype.
data: Provide data to initialize the dataset. If used, you can omit shape and dtype arguments.

Keyword-only arguments:

chunks: (Tuple) Chunk shape, or True to enable auto-chunking.
maxshape: (Tuple) Make the dataset resizable up to this shape. Use None for axes you want to be unlimited.
compression: (String or int) Compression strategy. Legal values are 'gzip', 'szip', 'lzf'. If an integer in range(10), this indicates gzip compression level. Otherwise, an integer indicates the number of a dynamically loaded compression filter.
compression_opts: Compression settings. This is an integer for gzip, 2-tuple for szip, etc. If specifying a dynamically loaded compression filter number, this must be a tuple of values.
shuffle: (T/F) Enable shuffle filter.
fillvalue: (Scalar) Use this value for uninitialized parts of the dataset.
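Several of the keyword arguments of create_dataset are easier to understand in combination. The sketch below creates a chunked, compressed, resizable dataset and then grows it along its first axis; the dataset name `log`, the shapes and the fill value are arbitrary choices for the illustration.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "kwargs_demo.h5")

with h5py.File(path, "w") as f:
    dset = f.create_dataset(
        "log",
        shape=(4, 3),
        dtype="f",
        maxshape=(None, 3),   # first axis unlimited, second axis fixed at 3
        chunks=(2, 3),        # explicit chunk shape
        compression="gzip",
        shuffle=True,         # byte-shuffle filter, often helps compression
        fillvalue=-1,         # value reported for elements never written
    )
    dset[:] = np.ones((4, 3), dtype=np.float32)

    # Grow the dataset; the new rows have not been written yet.
    dset.resize((6, 3))

    new_shape = dset.shape
    chunk_shape = dset.chunks
    max_shape = dset.maxshape
    shuffle_on = dset.shuffle
    head = dset[:4]
    tail = dset[4:]

print(new_shape, chunk_shape, max_shape, shuffle_on)
```

Reading the rows added by `resize` returns the fill value, since nothing has been written to them yet.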
IV. Groups
Groups work like dictionaries! To create a group:
f.create_group(name)
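A brief sketch of the dictionary-like group interface follows. The group names `train` and `a/b/c` and the dataset name `x` are made up for the example.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "group_demo.h5")

with h5py.File(path, "w") as f:
    # Create a group and put a dataset inside it.
    train = f.create_group("train")
    train.create_dataset("x", data=np.arange(4))

    # A path-like name creates the intermediate groups as well.
    nested = f.create_group("a/b/c")
    nested_name = nested.name

    # Dictionary-style access: keys, membership tests and chained lookups.
    train_members = list(train.keys())
    has_x = "train/x" in f
    x_values = f["train"]["x"][()]

    # visit() walks the whole hierarchy and passes each member's name
    # (relative to the starting group) to the callback.
    visited = []
    f.visit(visited.append)

print(nested_name, train_members, has_x, sorted(visited))
```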
</div><div><div></div></div>
<link href="https://csdnimg.cn/release/phoenix/mdeditor/markdown_views-60ecaf1f42.css" rel="stylesheet">
</div>
</article>