Here is a comparison with PyTables.
I cannot get up to the (int(1e3), int(1e6)) size because of memory restrictions. Therefore, I used a smaller array:
data = np.random.random((int(1e3), int(1e5)))
NumPy save:
%timeit np.save('array.npy', data)
1 loops, best of 3: 4.26 s per loop
NumPy load:
%timeit data2 = np.load('array.npy')
1 loops, best of 3: 3.43 s per loop
PyTables write:
%%timeit
with tables.open_file('array.tbl', 'w') as h5_file:
    h5_file.create_array('/', 'data', data)
1 loops, best of 3: 4.16 s per loop
PyTables read:
%%timeit
with tables.open_file('array.tbl', 'r') as h5_file:
    data2 = h5_file.root.data.read()
1 loops, best of 3: 3.51 s per loop
The numbers are very similar, so there is no real benefit from PyTables here. But we are pretty close to the maximum write and read rates of my SSD.
Writing:
Maximum write speed: 241.6 MB/s
PyTables write speed: 183.4 MB/s
Reading:
Maximum read speed: 250.2 MB/s
PyTables read speed: 217.4 MB/s
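The PyTables numbers follow directly from the array size and the timings above: 1e8 float64 values are about 763 MiB, so for example 763 MiB / 4.16 s ≈ 183 MB/s. A minimal sketch of that calculation (not the original benchmark script; the write and read times are simply the values reported above):
import numpy as np
data = np.random.random((int(1e3), int(1e5)))   # ~763 MiB of float64
size_mib = data.nbytes / 2**20                  # 1e8 values * 8 bytes
write_time = 4.16                               # s, from the %%timeit above
read_time = 3.51                                # s, from the %%timeit above
print('PyTables write speed: %.1f MB/s' % (size_mib / write_time))  # ~183.4
print('PyTables read speed:  %.1f MB/s' % (size_mib / read_time))   # ~217.4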
Compression does not really help because the data is random:
%%timeit
FILTERS = tables.Filters(complib='blosc', complevel=5)
with tables.open_file('array.tbl', mode='w', filters=FILTERS) as h5_file:
    h5_file.create_carray('/', 'data', obj=data)
1 loops, best of 3: 4.08 s per loop
Reading the compressed data becomes a bit slower:
%%timeit
with tables.open_file('array.tbl', 'r') as h5_file:
    data2 = h5_file.root.data.read()
1 loops, best of 3: 4.01 s per loop
This is different for regular data:
reg_data = np.ones((int(1e3), int(1e5)))
Writing is significantly faster:
%%timeit
FILTERS = tables.Filters(complib='blosc', complevel=5)
with tables.open_file('array.tbl', mode='w', filters=FILTERS) as h5_file:
    h5_file.create_carray('/', 'reg_data', obj=reg_data)
1 loops, best of 3: 849 ms per loop
The same holds true for reading:
%%timeit
with tables.open_file('array.tbl', 'r') as h5_file:
    reg_data2 = h5_file.root.reg_data.read()
1 loops, best of 3: 1.7 s per loop
Conclusion: The more regular your data, the faster it gets using PyTables.
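To make the effect behind this conclusion visible, here is a minimal sketch (not part of the original benchmark; the file names are made up) that writes both arrays with the same blosc filter and compares their in-memory and on-disk sizes. The regular array shrinks dramatically, while the random one barely does, which is why it is so much faster to write and read:
import os
import numpy as np
import tables

FILTERS = tables.Filters(complib='blosc', complevel=5)
arrays = [('random', np.random.random((int(1e3), int(1e5)))),
          ('regular', np.ones((int(1e3), int(1e5))))]
for name, arr in arrays:
    fname = name + '.tbl'
    with tables.open_file(fname, mode='w', filters=FILTERS) as h5_file:
        h5_file.create_carray('/', 'data', obj=arr)
    print('%s: %.1f MiB in memory, %.1f MiB on disk'
          % (name, arr.nbytes / 2**20, os.path.getsize(fname) / 2**20))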