Collecting Reinforcement Learning Expert Trajectory Data with HDF5
For the past couple of days I have been collecting the data my project needs. The volume is not large and the types are not varied, but I still hit quite a few pitfalls. For example, an array stored with Pandas and saved to a csv file comes back as strings when the csv is read, which makes further processing difficult. (I have only just started with pandas, so perhaps I simply have not found the solution yet.)
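To illustrate the pitfall, here is a minimal sketch with a made-up one-row DataFrame whose cell holds a NumPy array; the round trip through csv turns the array into its printed string:

```python
import io

import numpy as np
import pandas as pd

# a DataFrame whose single cell holds a NumPy array
df = pd.DataFrame({"state": [np.array([0.1, 0.2, 0.3])]})

# round-trip through csv in memory
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
df2 = pd.read_csv(buf)

# the array comes back as its printed representation: a plain string
print(type(df2["state"][0]))  # <class 'str'>
```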
HDF5 is a file format organized like a directory tree, which makes it convenient to store large amounts of data.
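A quick sketch of that directory-like layout (the file, group, and dataset names here are made up for illustration): groups behave like directories and datasets like the files inside them.

```python
import h5py
import numpy as np

# a throwaway file to show the directory-like layout
with h5py.File("layout_demo.hdf5", "w") as f:
    grp = f.create_group("episode_0")                  # like a directory
    grp.create_dataset("obs", data=np.zeros((4, 3)))   # like files inside it
    grp.create_dataset("act", data=np.ones((4, 2)))

    # walk the tree, collecting every path
    names = []
    f.visit(names.append)
    print(names)
```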
My project runs in a Python environment and needs to save one-dimensional and two-dimensional arrays. These data are generated in real time and the volume is large, so although saving them with NumPy as .npy files works, it slows down data collection.
One group of the data I need to save looks like the following, and n such groups must be stored:
joint_pos_list = np.zeros((1,3))
image_list = np.zeros((1,1,48,48))
action_list = np.zeros((1,3))
reward_list = np.zeros((1,1))
next_joint_pos_list = np.zeros((1,3))
next_image_list = np.zeros((1,1,48,48))
done_list = np.zeros((1,1))
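For reference, the per-step .npy route mentioned above would look roughly like this (the file name is made up for illustration); it works, but it means opening and writing many small files while collecting:

```python
import numpy as np

# one transition's joint positions, shaped as in the list above
joint_pos = np.zeros((1, 3))

# one .npy file per array per step
np.save("joint_pos_step0.npy", joint_pos)
loaded = np.load("joint_pos_step0.npy")
print(loaded.shape)  # (1, 3)
```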
I initially used Pandas to store the data in csv format, but reading the saved csv file yields strings. I therefore switched to HDF5 to save the data I need; one of its advantages is that data written as numpy arrays are read back as numpy arrays.
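A minimal check of that numpy-in, numpy-out behavior (the file name is made up): slicing a dataset with `[:]` returns a plain numpy array with the dtype preserved.

```python
import h5py
import numpy as np

arr = np.random.randn(5, 3).astype(np.float32)

with h5py.File("roundtrip_demo.hdf5", "w") as f:
    f.create_dataset("arr", data=arr)

with h5py.File("roundtrip_demo.hdf5", "r") as f:
    loaded = f["arr"][:]   # slicing a dataset returns a numpy array

print(type(loaded).__name__, loaded.dtype)  # ndarray float32
```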
Here is my code, using the collection of 100 samples as an example:
import h5py
import numpy as np
# create an HDF5 file; mode is one of {'w', 'r', 'a'}
expert_filename = "expert_data.hdf5"
f = h5py.File(expert_filename, 'w')
# create a group under the root '/'
g = f.create_group('expert_data')
cnt_datas = 0
total_datas = 100
# create one dataset per field under '/expert_data/'
joint_pos_expert = g.create_dataset("joint_pos", data=np.zeros((total_datas, 3)))
image_expert = g.create_dataset("image", data=np.zeros((total_datas, 1, 48, 48)))
action_expert = g.create_dataset("action", data=np.zeros((total_datas, 3)))
reward_expert = g.create_dataset("reward", data=np.zeros((total_datas, 1)))
done_expert = g.create_dataset("done", data=np.zeros((total_datas, 1)))
next_joint_pos_expert = g.create_dataset("next_joint_pos", data=np.zeros((total_datas, 3)))
next_image_expert = g.create_dataset("next_image", data=np.zeros((total_datas, 1, 48, 48)))
for i in range(total_datas):
    # dummy data standing in for one environment step;
    # in a real run these would come from the environment
    joint_pos = np.random.randn(1, 3)
    image = np.random.randn(1, 48, 48)
    action = np.random.randn(1, 3)
    reward = np.random.randn(1, 1)
    done = np.random.randn(1, 1)
    next_joint_pos = np.random.randn(1, 3)
    next_image = np.random.randn(1, 48, 48)
    # write one row into each dataset
    g['joint_pos'][cnt_datas] = joint_pos
    g['image'][cnt_datas] = image
    g['action'][cnt_datas] = action
    g['reward'][cnt_datas] = reward
    g['done'][cnt_datas] = done
    g['next_joint_pos'][cnt_datas] = next_joint_pos
    g['next_image'][cnt_datas] = next_image
    cnt_datas += 1
# read data back from the datasets
# first row of joint_pos
joint_pos = g['joint_pos'][0]
# last row of joint_pos
joint_pos = g['joint_pos'][total_datas - 1]
# close file
f.close()
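Once collection is done, the file can be reopened read-only. Here is a self-contained sketch of loading everything back as numpy arrays (it first writes a small stand-in file with the same layout, so the example runs on its own):

```python
import h5py
import numpy as np

# write a small stand-in file with the same layout as above
with h5py.File("expert_data_demo.hdf5", "w") as f:
    g = f.create_group("expert_data")
    g.create_dataset("joint_pos", data=np.random.randn(100, 3))
    g.create_dataset("image", data=np.random.randn(100, 1, 48, 48))

# reopen read-only; [:] pulls each dataset back as a numpy array
with h5py.File("expert_data_demo.hdf5", "r") as f:
    g = f["expert_data"]
    joint_pos = g["joint_pos"][:]
    image = g["image"][:]

print(joint_pos.shape, image.shape)  # (100, 3) (100, 1, 48, 48)
```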