This is a model-based reinforcement learning algorithm that combines a neural-network dynamics model with model predictive control (MPC).
The nodes (pipeline stages) communicate with each other through files.
For example, in main:
#save the rollouts for later MBMF usage
pathname_savedMPCrollouts = save_dir + '/savedRollouts_avg'+ str(int(avg_rew)) +'.save'
pathname2_savedMPCrollouts = save_dir + '/savedRollouts.save'
f = open(pathname_savedMPCrollouts, 'wb')
cPickle.dump(all_rollouts_to_save, f, protocol=cPickle.HIGHEST_PROTOCOL)
f.close()
f = open(pathname2_savedMPCrollouts, 'wb')
cPickle.dump(all_rollouts_to_save, f, protocol=cPickle.HIGHEST_PROTOCOL)
f.close()
#save the starting states of these rollouts, in case want to visualize them later
f = open(save_dir + '/savedRollouts_startingStates.save', 'wb')
cPickle.dump(starting_states, f, protocol=cPickle.HIGHEST_PROTOCOL)
f.close()
cPickle is a fast C implementation of Python's pickle serialization library. Its dump function takes two main arguments: the Python object to serialize, and a file object, which must first be created with open in binary write mode ('wb'). The complementary load function reads a serialized object back from a file opened for reading ('rb').
Saved MPC rollouts for later mbmf TRPO usage. (The file is saved for the next node to use.)
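The save/load handoff above can be sketched in a minimal, self-contained form. Note that cPickle only exists in Python 2; in Python 3 it was merged into the standard pickle module, so the try/except import below works under both. The rollouts dictionary and path here are illustrative placeholders, not the repo's actual data.

```python
# Minimal sketch of the file-based handoff between two nodes.
# cPickle is Python 2 only; Python 3 folds it into pickle.
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle             # Python 3
import os
import tempfile

# Placeholder standing in for all_rollouts_to_save.
rollouts = {'observations': [[0.0, 1.0], [2.0, 3.0]], 'avg_rew': 42}

path = os.path.join(tempfile.mkdtemp(), 'savedRollouts.save')

# Writer node: serialize the object with the highest protocol.
with open(path, 'wb') as f:
    pickle.dump(rollouts, f, protocol=pickle.HIGHEST_PROTOCOL)

# Reader node: deserialize the object back from disk.
with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored == rollouts)  # True: the round trip preserves the object
```

Using `with open(...)` instead of explicit close() calls guarantees the file is closed even if dump or load raises an exception.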
Then, in the next node, mbmf:
f = open(save_dir + '/policy_tf_values.save', 'wb')
cPickle.dump(values, f, protocol=cPickle.HIGHEST_PROTOCOL)
f.close()
f = open(save_dir + '/policy_tf_values.save', 'rb')
values = cPickle.load(f)
f.close()
f = open(save_dir + '/policy_mlp.save', 'wb')
cPickle.dump(policy, f, protocol=cPickle.HIGHEST_PROTOCOL)
f.close()
f = open('run_'+ str(run_num)+'/savedRollouts.save', 'rb')
allData = cPickle.load(f)
f.close()
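Since the open/dump/close and open/load/close pattern repeats for every file in both nodes, it can be factored into two small helpers. The names save_obj and load_obj are hypothetical, not from the repo; the policy-values dictionary below is likewise only illustrative.

```python
# Hypothetical helpers wrapping the repeated pickle pattern; save_obj and
# load_obj are not names from the repo, just an illustration.
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle             # Python 3
import tempfile

def save_obj(obj, pathname):
    """Serialize obj to pathname with the highest pickle protocol."""
    with open(pathname, 'wb') as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_obj(pathname):
    """Deserialize and return the object stored at pathname."""
    with open(pathname, 'rb') as f:
        return pickle.load(f)

# Illustrative usage mirroring the mbmf node's policy_tf_values handoff.
save_dir = tempfile.mkdtemp()
save_obj({'w': [0.1, 0.2]}, save_dir + '/policy_tf_values.save')
values = load_obj(save_dir + '/policy_tf_values.save')
```

With these helpers, each save or load in the nodes collapses to a single line, which also removes the risk of forgetting an f.close().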