Collecting Reinforcement Learning Expert Trajectory Data with HDF5
For the past couple of days I have been collecting the data my project needs. The volume is not large and the types are not varied, but I still hit quite a few pitfalls. For example, an array stored with Pandas and saved to a csv file comes back as strings when the csv is read, which makes further processing difficult. (I have only just started with pandas, so perhaps I simply have not found the solution yet.)
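To illustrate the pitfall, here is a minimal sketch with a made-up one-row DataFrame whose cell holds a NumPy array; the round trip through csv turns the array into its printed string:

```python
import io

import numpy as np
import pandas as pd

# a DataFrame whose single cell holds a NumPy array
df = pd.DataFrame({"state": [np.array([0.1, 0.2, 0.3])]})

# round-trip through csv in memory
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
df2 = pd.read_csv(buf)

# the array comes back as its printed representation: a plain string
print(type(df2["state"][0]))  # <class 'str'>
```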
HDF5 is a file format organized like a directory tree, which makes it convenient to store large amounts of data.
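A quick sketch of that directory-like layout (the file, group, and dataset names here are made up for illustration): groups behave like directories and datasets like the files inside them.

```python
import h5py
import numpy as np

# a throwaway file to show the directory-like layout
with h5py.File("layout_demo.hdf5", "w") as f:
    grp = f.create_group("episode_0")                  # like a directory
    grp.create_dataset("obs", data=np.zeros((4, 3)))   # like files inside it
    grp.create_dataset("act", data=np.ones((4, 2)))

    # walk the tree, collecting every path
    names = []
    f.visit(names.append)
    print(names)
```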
My project runs in a Python environment and needs to save one-dimensional and two-dimensional arrays. These data are generated in real time and the volume is large, so although saving them with NumPy as .npy files works, it slows down data collection.
One group of the data I need to save looks like the following, and n such groups must be stored:
joint_pos_list = np.zeros((1,3))
image_list = np.zeros((1,1,48,48))
action_list = np.zeros((1,3))
reward_list = np.zeros((1,1))
next_joint_pos_list = np.zeros((1,3))
next_image_list = np.zeros((1,1,48,48))
done_list = np.zeros((1,1))
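For reference, the per-step .npy route mentioned above would look roughly like this (the file name is made up for illustration); it works, but it means opening and writing many small files while collecting:

```python
import numpy as np

# one transition's joint positions, shaped as in the list above
joint_pos = np.zeros((1, 3))

# one .npy file per array per step
np.save("joint_pos_step0.npy", joint_pos)
loaded = np.load("joint_pos_step0.npy")
print(loaded.shape)  # (1, 3)
```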
I initially used Pandas to store the data in csv format, but reading the saved csv file yields strings. I therefore switched to HDF5 to save the data I need; one of its advantages is that data written as numpy arrays are read back as numpy arrays.
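A minimal check of that numpy-in, numpy-out behavior (the file name is made up): slicing a dataset with `[:]` returns a plain numpy array with the dtype preserved.

```python
import h5py
import numpy as np

arr = np.random.randn(5, 3).astype(np.float32)

with h5py.File("roundtrip_demo.hdf5", "w") as f:
    f.create_dataset("arr", data=arr)

with h5py.File("roundtrip_demo.hdf5", "r") as f:
    loaded = f["arr"][:]   # slicing a dataset returns a numpy array

print(type(loaded).__name__, loaded.dtype)  # ndarray float32
```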
Here is my code, using the collection of 100 samples as an example:
import h5py
import numpy as np
# create an HDF5 file; mode is one of {'w', 'r', 'a'}
expert_filename = "expert_data.hdf5"
f = h5py.File(expert_filename, 'w')
# create a group under the root '/'
g = f.create_group('expert_data')
cnt_datas = 0
total_datas = 100
# create one dataset per field under '/expert_data/'
joint_pos_expert = g.create_dataset("joint_pos", data=np.zeros((total_datas, 3)))
image_expert = g.create_dataset("image", data=np.zeros((total_datas, 1, 48, 48)))
action_expert = g.create_dataset("action", data=np.zeros((total_datas, 3)))
reward_expert = g.create_dataset("reward", data=np.zeros((total_datas, 1)))
done_expert = g.create_dataset("done", data=np.zeros((total_datas, 1)))
next_joint_pos_expert = g.create_dataset("next_joint_pos", data=np.zeros((total_datas, 3)))
next_image_expert = g.create_dataset("next_image", data=np.zeros((total_datas, 1, 48, 48)))
for i in range(total_datas):
    # dummy data standing in for one environment step;
    # in a real run these would come from the environment
    joint_pos = np.random.randn(1, 3)
    image = np.random.randn(1, 48, 48)
    action = np.random.randn(1, 3)
    reward = np.random.randn(1, 1)
    done = np.random.randn(1, 1)
    next_joint_pos = np.random.randn(1, 3)
    next_image = np.random.randn(1, 48, 48)
    # write one row into each dataset
    g['joint_pos'][cnt_datas] = joint_pos
    g['image'][cnt_datas] = image
    g['action'][cnt_datas] = action
    g['reward'][cnt_datas] = reward
    g['done'][cnt_datas] = done
    g['next_joint_pos'][cnt_datas] = next_joint_pos
    g['next_image'][cnt_datas] = next_image
    cnt_datas += 1
# read data back from the datasets
# first row of joint_pos
joint_pos = g['joint_pos'][0]
# last row of joint_pos
joint_pos = g['joint_pos'][total_datas - 1]
# close file
f.close()
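Once collection is done, the file can be reopened read-only. Here is a self-contained sketch of loading everything back as numpy arrays (it first writes a small stand-in file with the same layout, so the example runs on its own):

```python
import h5py
import numpy as np

# write a small stand-in file with the same layout as above
with h5py.File("expert_data_demo.hdf5", "w") as f:
    g = f.create_group("expert_data")
    g.create_dataset("joint_pos", data=np.random.randn(100, 3))
    g.create_dataset("image", data=np.random.randn(100, 1, 48, 48))

# reopen read-only; [:] pulls each dataset back as a numpy array
with h5py.File("expert_data_demo.hdf5", "r") as f:
    g = f["expert_data"]
    joint_pos = g["joint_pos"][:]
    image = g["image"][:]

print(joint_pos.shape, image.shape)  # (100, 3) (100, 1, 48, 48)
```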