Original post: Reinforcement learning, Q-learning on CartPole-v0 with a single network
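Training here is more alchemy than science: some runs produce good results and others do not.
In the best run, the agent converged quickly and performed very well.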
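Update
By default the episode ends at 200 steps whether or not the agent has failed. To avoid treating that cutoff as a failure, the reward is now decided by my own check: a step counts as a failure only when the state goes out of range.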
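import math  # needed for math.pi below

def refresh_memory(self):
    # Collect memory_size transitions using the current action-selection rule.
    observation = self.env.reset()
    step = 0
    for i in range(self.memory_size):
        action = self.choose_action(observation)
        observation_new, reward, done, info = self.env.step(action)
        step += 1
        # Failure limits: pole angle beyond 12 degrees, or cart beyond +/-2.4.
        theta_threshold_radians = 12 * 2 * math.pi / 360
        x_threshold = 2.4
        x, x_dot, theta, theta_dot = observation_new
        # done2 is True only on a real failure, not on the 200-step cutoff.
        done2 = any([
            x < -x_threshold,
            x > x_threshold,
            theta < -theta_threshold_radians,
            theta > theta_threshold_radians
        ])
        # Previous version: reward = -10 if done and step < 200 else reward
        reward = -10 if done2 else reward
        self.store_memory(observation, action, reward, observation_new)
        observation = observation_new
        if done:
            # The environment still has to be reset whenever it reports done.
            observation = self.env.reset()
            step = 0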
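
The out-of-range check above can also be pulled out into a small standalone helper, which makes it easy to test without running the environment. This is only a sketch: the names `is_failure` and `shaped_reward` are hypothetical and not part of the original class, and the thresholds are simply the ones used in the snippet above.

```python
import math

# Limits copied from the snippet above (assumed to match the environment's own bounds).
X_THRESHOLD = 2.4
THETA_THRESHOLD_RADIANS = 12 * 2 * math.pi / 360  # 12 degrees in radians


def is_failure(observation):
    """Return True if the cart position or the pole angle is out of range."""
    x, _x_dot, theta, _theta_dot = observation
    return abs(x) > X_THRESHOLD or abs(theta) > THETA_THRESHOLD_RADIANS


def shaped_reward(observation_new, env_reward, penalty=-10):
    """Replace the environment reward with a penalty only on a real failure."""
    return penalty if is_failure(observation_new) else env_reward
```

With a helper like this, the loop body would reduce to `reward = shaped_reward(observation_new, reward)`, and a 200-step cutoff with the pole still upright keeps the normal reward.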