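The code I downloaded from GitHub stores all of its data in files with the .h5 extension. So what kind of file is that? Looking through the code, every script imports the h5py package, so let's see how to read this kind of file.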
An HDF5 file is a container for two kinds of objects: datasets, which are array-like collections of data, and groups, which are folder-like containers that hold datasets and other groups. Groups work like dictionaries, and datasets work like NumPy arrays.
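To make the dictionary and array analogy concrete, here is a minimal sketch. The file name demo_groups.hdf5 and the names experiment and values are made up for illustration, and the file is written to a temporary directory so nothing on disk is touched.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name, placed in a temporary directory for this demo.
path = os.path.join(tempfile.mkdtemp(), "demo_groups.hdf5")

with h5py.File(path, "w") as f:
    # Groups nest like folders; create_group returns the new group.
    grp = f.create_group("experiment")
    # Datasets hold array data; here a 1-D array of ten integers.
    grp.create_dataset("values", data=np.arange(10))

with h5py.File(path, "r") as f:
    # Dictionary-style access: keys, membership tests, path-like indexing.
    top_level = list(f.keys())
    has_values = "values" in f["experiment"]
    # NumPy-style access: slicing a dataset returns a NumPy array.
    evens = f["experiment/values"][::2]

print(top_level)   # ['experiment']
print(has_values)  # True
print(evens)       # [0 2 4 6 8]
```

The slice is evaluated while the file is still open, so evens is an ordinary NumPy array that stays usable after the with block ends.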
Suppose someone hands you a file named mytestfile.hdf5. How do you read it? See the code below:
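# Import the package
import h5py
# Open the file; 'r' means read-only
f = h5py.File('mytestfile.hdf5', 'r')
# An h5py.File acts like a Python dictionary, so we can list its keys
list(f.keys())  # ['mydataset']
# Fetch the dataset stored under that key
data = f['mydataset']
# The result is an HDF5 dataset object; shape and dtype give its size and element type
data.shape
data.dtype
# Datasets also support array-style slicing. The file is open read-only, so an
# assignment such as data[...] = np.arange(100) would fail here; writing needs
# a writable mode such as 'r+' (and import numpy as np).
data[0:100:10]  # array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90]) if the dataset holds 0..99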
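The read steps can also be written with a context manager, which closes the file automatically. The sketch below is not the original author's file: it first builds a stand-in mytestfile.hdf5 in a temporary directory, assuming the dataset mydataset holds the integers 0 to 99, and then reads it back.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a stand-in for mytestfile.hdf5 (assumed contents: integers 0..99).
path = os.path.join(tempfile.mkdtemp(), "mytestfile.hdf5")
with h5py.File(path, "w") as f:
    f.create_dataset("mydataset", data=np.arange(100))

# Open read-only; the file is closed when the with block exits.
with h5py.File(path, "r") as f:
    dset = f["mydataset"]
    shape = dset.shape
    every_tenth = dset[0:100:10]  # reads only the selected elements
    full = dset[()]               # reads the whole dataset into a NumPy array

print(shape)        # (100,)
print(every_tenth)  # [ 0 10 20 30 40 50 60 70 80 90]
print(full.sum())   # 4950
```

Anything needed after the file is closed should be copied out first, as full is here; the dataset object itself is no longer readable once the file has been closed.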
So, can you create an HDF5 file yourself?
import h5py
# Create a file; 'w' means write (an existing file with this name is overwritten)
f = h5py.File("myh5py.hdf5", "w")
# Create a dataset
dataset = f.create_dataset("d1",(100,),dtype=