HDF5: Comparing read performance of Python's h5py and Julia's HDF5 library: roughly on par

songroom

Article link: https://blog.csdn.net/wowotuo/article/details/102144114

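First, wrap h5py's array reading and writing in two small helpers (all-numeric data):

import h5py

def h5py_write_arr(save_path, data):
    # Store the array as a dataset named "data"; the with-block closes the file.
    with h5py.File(save_path, 'w') as f:
        f['data'] = data

def h5py_read_arr(save_path):
    # [()] reads the entire dataset into memory as a NumPy array.
    with h5py.File(save_path, 'r') as f:
        return f['data'][()]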

Use Python to persist an array to an HDF5 file.

path = r"C:\Users\rustr\Desktop\my.h5"  # raw string, so "\U" is not parsed as an escape
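
The post does not show the array that was written, so the following is only a sketch of the write step under assumed data. A hypothetical 1000 x 1000 float64 array and a temporary file stand in for the real array and path; the dataset name "data" is the one used throughout the post.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical stand-in for the array used in the post, which is not shown.
arr = np.arange(1_000_000, dtype=np.float64).reshape(1000, 1000)

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "my.h5")  # temporary file, not the author's path

    # Write the array as a dataset named "data", the name both readers expect.
    with h5py.File(demo_path, "w") as f:
        f["data"] = arr

    # Read it back in full; [()] loads the whole dataset as a NumPy array.
    with h5py.File(demo_path, "r") as f:
        restored = f["data"][()]

round_trip_ok = np.array_equal(arr, restored)
print(restored.shape, restored.dtype, round_trip_ok)
```

When the same file is read from Julia, expect size(data) to list the dimensions in reverse order relative to h5py's shape, because Julia arrays are column-major.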

Now compare the read speeds.

1. Using h5py_read_arr:

import time as t

t1 = t.time()
data = h5py_read_arr(path)
print("read h5 cost time:", t.time() - t1)

Four consecutive runs printed:

read h5 cost time: 0.14444708824157715
read h5 cost time: 0.2049427032470703
read h5 cost time: 0.27478766441345215
read h5 cost time: 0.18096709251403809
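
Single time.time() measurements vary a lot from run to run, as the four results above show. A small helper like the one below repeats a call and reports the best and median time using time.perf_counter. It is an illustrative sketch and not part of the original post; the name time_calls and the repeat count are assumptions.

```python
import statistics
import time


def time_calls(func, *args, repeats=5):
    """Call func(*args) `repeats` times and return (best, median) in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        func(*args)
        durations.append(time.perf_counter() - start)
    return min(durations), statistics.median(durations)


# Stand-in workload so the sketch runs on its own; in the post's setting
# the call would be time_calls(h5py_read_arr, path).
best, median = time_calls(sum, range(100_000))
print(f"best: {best:.6f} s, median: {median:.6f} s")
```

Reporting the minimum alongside the median helps separate the cost of the read itself from noise such as disk caching or other processes.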

2. Using Julia:

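using HDF5

function read_h5(file)
    # do-block form: the file is closed even if read throws,
    # and the value of read is returned from the function.
    h5open(file, "r") do fid
        read(fid, "data")
    end
end

file = raw"C:\Users\rustr\Desktop\my.h5"  # raw string keeps the backslashes literal
@time data = read_h5(file);
size(data)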

Or do it in a single line:

@time data = h5read(file, "data")  # simple: one line does it all

The results:

julia> @time data =read_h5(file);
  0.178845 seconds (2.56 k allocations: 397.228 MiB, 17.72% gc time)

julia> @time data =read_h5(file);
  0.181711 seconds (59 allocations: 397.107 MiB, 20.64% gc time)

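3. Conclusions

(1) Files written by Python's h5py library can be read by Julia's HDF5 library, so plain HDF5 could serve as a common file standard between the two languages. Files generated by other libraries are not necessarily readable this way, for example those written by the pdstore function (presumably pandas' HDFStore).

(2) Julia and Python are about equally fast here: the two Julia runs took about 0.18 s each, while the four Python runs ranged from 0.14 s to 0.27 s.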
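
To make the comparison a little more concrete, the timings reported in this post can be summarized with the standard library. This is only a sketch over the handful of numbers shown above, not a new benchmark.

```python
import statistics

# Timings copied from the runs reported in this post, in seconds.
python_runs = [0.14444708824157715, 0.2049427032470703,
               0.27478766441345215, 0.18096709251403809]
julia_runs = [0.178845, 0.181711]

for name, runs in (("h5py", python_runs), ("HDF5.jl", julia_runs)):
    print(f"{name}: min {min(runs):.3f} s, mean {statistics.mean(runs):.3f} s")

python_mean = statistics.mean(python_runs)
julia_mean = statistics.mean(julia_runs)
```

With so few runs, the gap between the two means is smaller than the spread of the Python runs alone, which is consistent with calling the two libraries roughly equal.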