Problems encountered when using the gym library for reinforcement learning (continuously updated...)

1. env = gym.make('CartPole-v0')

env = gym.make('CartPole-v0')
env.render()

Error: gym.error.ResetNeeded: Cannot call `env.render()` before calling `env.reset()`, if this is a intended action, set `disable_render_order_enforcing=True` on the OrderEnforcer wrapper.

Analysis: this is caused by newer versions of the gym library. The render() function acts as the rendering engine, but in newer versions the render mode is fixed when the environment is created, so a bare call to env.render() 【or env.render(mode='human')】 is simply ignored; in addition, the OrderEnforcing wrapper requires env.reset() to be called before anything is rendered. Either downgrade gym, or change the code above to:

env = gym.make('CartPole-v0', render_mode='human')
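
Putting both requirements together, a minimal working sketch (assuming gym >= 0.26, where reset() also returns an (observation, info) pair):

    import gym

    # The render mode is fixed at creation time in newer gym versions,
    # and reset() must be called before anything is rendered.
    env = gym.make('CartPole-v0', render_mode='human')
    observation, info = env.reset()
    env.render()   # a window opens; frames also refresh automatically on each step()
    env.close()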

2. observation_, reward, done, info = env.step(action)

observation_, reward, done, info = env.step(action)

Error: ValueError: too many values to unpack (expected 4).

Analysis: this is also caused by newer gym versions. env.step() now returns five values rather than four, but only four variables are defined on the left-hand side, hence the error. Change it to:

observation_, reward, terminated, truncated, info = env.step(action)
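
Note that the fourth return value is the new truncated flag, not part of info, so the info dictionary must be the last variable. If downstream code still expects the old single done flag, the two booleans can be merged; a minimal loop sketch under that assumption (the random policy stands in for a real one):

    import gym

    env = gym.make('CartPole-v0')
    observation, info = env.reset(seed=0)

    for _ in range(200):
        action = env.action_space.sample()  # stand-in for a real policy
        observation_, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated      # recover the old-style `done` flag
        if done:
            observation, info = env.reset()

    env.close()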

The step() function, for reference (the grid-world example from the gym documentation, which shows where the five return values come from):

    def step(self, action):
        # Map the action to the direction we walk in
        direction = self._action_to_direction[action]
        # We use np.clip to make sure we don't leave the grid
        self._agent_location = np.clip(self._agent_location + direction, 0, self.size - 1)
        terminated = np.array_equal(self._agent_location, self._target_location)
        reward = 1 if terminated else 0
        observation = self._get_obs()
        info = self._get_info()

        if self.render_mode == "human":
            self._render_frame()

        return observation, reward, terminated, False, info

【1】observation (object): this will be an element of the environment's :attr:`observation_space`. This may, for instance, be a numpy array containing the positions and velocities of certain objects. The environment's state information.
【2】reward (float): The amount of reward returned as a result of taking the action.
【3】terminated (bool): whether a `terminal state` (as defined under the MDP of the task) is reached. In this case further step() calls could return undefined results.
【4】truncated (bool): whether a truncation condition outside the scope of the MDP is satisfied, typically a time limit; it ends the episode before a terminal state is reached. In the grid-world example above it is simply hard-coded to False.
【5】info (dictionary): `info` contains auxiliary diagnostic information (helpful for debugging, learning, and logging). This might, for instance, contain: metrics that describe the agent's performance state, variables that are hidden from observations, or individual reward terms that are combined to produce the total reward. It also can contain information that distinguishes truncation and termination, however this is deprecated in favour of returning two booleans, and will be removed in a future version. See the sketch after this list for how the two booleans behave in practice.
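
To see the two flags separately in practice, a small sketch (CartPole-v0 is wrapped in a TimeLimit of 200 steps, so an episode ends either because the pole falls, terminated=True, or because the step limit fires, truncated=True):

    import gym

    env = gym.make('CartPole-v0')  # registered with max_episode_steps=200
    observation, info = env.reset(seed=0)

    while True:
        observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
        if terminated or truncated:
            # terminated: the pole fell over (an MDP terminal state)
            # truncated:  the TimeLimit wrapper cut the episode off
            print(f"terminated={terminated}, truncated={truncated}")
            break

    env.close()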
