Reinforcement Learning: Q-learning on CartPole-v0 with a Single Network

Latest recommended article published on 2023-02-08 12:44:38

阿豪boy

Article link: https://blog.csdn.net/qq_35516360/article/details/122066038

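Training this is something of a black art: some runs produce good results and some do not.

In the best run, training converged quickly and the agent performed well.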

Update

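CartPole-v0 ends every episode at 200 steps (done = True) whether or not the pole has actually fallen. To keep that time-limit stop from being treated as a failure, the reward is now judged manually: only leaving the allowed range (cart position beyond ±2.4 or pole angle beyond ±12°) counts as a failure and receives a reward of -10.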
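The failure rule above can be isolated into a small pure function, which makes it easy to check without running the environment. This is a minimal sketch, not the author's code: the names is_failure, shape_reward, X_THRESHOLD, THETA_THRESHOLD_RADIANS and the failure_reward parameter are hypothetical; only the threshold values and the -10 penalty come from the post.

```python
import math

# Same failure bounds CartPole-v0 uses: cart position beyond +/-2.4,
# pole angle beyond +/-12 degrees (expressed in radians).
X_THRESHOLD = 2.4
THETA_THRESHOLD_RADIANS = 12 * 2 * math.pi / 360


def is_failure(observation):
    """True only if the cart or the pole has left the allowed range."""
    x, _x_dot, theta, _theta_dot = observation
    return abs(x) > X_THRESHOLD or abs(theta) > THETA_THRESHOLD_RADIANS


def shape_reward(observation_new, reward, failure_reward=-10.0):
    """Penalize only real failures (hypothetical helper).

    Hitting the 200-step limit sets done=True in gym, but the state is
    still in range, so the environment's reward is kept in that case.
    """
    return failure_reward if is_failure(observation_new) else reward


# Usage: a state near the center keeps its reward, a fallen pole gets -10.
print(shape_reward((0.1, 0.0, 0.05, 0.0), 1.0))  # 1.0
print(shape_reward((0.1, 0.0, 0.30, 0.0), 1.0))  # -10.0
```

Keeping the check separate from the replay-memory loop means the same function could be reused wherever transitions are collected.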


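    import math

    def refresh_memory(self):
        """Fill the replay memory with memory_size fresh transitions."""
        # Failure bounds, the same values CartPole-v0 uses internally:
        # pole angle beyond +/-12 degrees or cart position beyond +/-2.4.
        theta_threshold_radians = 12 * 2 * math.pi / 360
        x_threshold = 2.4

        observation = self.env.reset()
        step = 0
        for _ in range(self.memory_size):
            action = self.choose_action(observation)
            # Old gym API (before 0.26): step() returns a 4-tuple.
            observation_new, reward, done, info = self.env.step(action)
            step += 1
            x, x_dot, theta, theta_dot = observation_new
            # done is also True at the 200-step time limit, so decide
            # "failure" ourselves: only leaving the bounds counts.
            failed = any([
                x < -x_threshold,
                x > x_threshold,
                theta < -theta_threshold_radians,
                theta > theta_threshold_radians,
            ])
            # Previous version: any episode end before step 200 was a failure.
            # reward = -10 if done and step < 200 else reward
            reward = -10 if failed else reward
            self.store_memory(observation, action, reward, observation_new)
            observation = observation_new
            if done:
                observation = self.env.reset()
                step = 0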
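To show where the shaped reward ends up, here is a sketch of how Q-learning regression targets might be built with a single network, where the same network supplies both Q(s, ·) and Q(s', ·) and there is no separate target network. The post does not show its learning step, so this is an assumption throughout: the function name q_learning_targets, the discount gamma = 0.9, and the per-transition failed flag (which the store_memory call above does not actually save) are all hypothetical.

```python
import numpy as np


def q_learning_targets(q_current, q_next, actions, rewards, failed, gamma=0.9):
    """Hypothetical target builder for one Q-network (no target network).

    q_current: Q(s, .) for the batch, shape (batch, n_actions)
    q_next:    Q(s', .) from the SAME network, shape (batch, n_actions)
    failed:    True where the transition left the allowed range
    """
    targets = np.array(q_current, dtype=float, copy=True)
    actions = np.asarray(actions)
    # Bootstrap from the next state unless the pole fell or the cart left
    # the track; a time-limit stop at 200 steps still bootstraps.
    bootstrap = np.where(failed, 0.0, gamma * np.max(q_next, axis=1))
    # Only the taken action's entry changes: r + gamma * max_a' Q(s', a').
    targets[np.arange(len(actions)), actions] = np.asarray(rewards) + bootstrap
    return targets


q_current = np.zeros((2, 2))
q_next = np.array([[1.0, 2.0], [3.0, 0.0]])
targets = q_learning_targets(q_current, q_next, actions=[0, 1],
                             rewards=[1.0, -10.0], failed=[False, True])
# Row 0, action 0: 1 + 0.9 * 2 = 2.8; row 1, action 1: -10 (no bootstrap).
print(targets)
```

The network would then be fitted toward targets with a mean-squared-error loss; entries for actions that were not taken keep their current values, so they contribute zero error.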