OpenAI Gym Study Notes

  1. Environment definition: the Env class
    class Env(Generic[ObsType, ActType]):
    It declares the following abstract methods: step, reset, render, close, seed
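
    As a quick orientation (this loop is not in the original excerpt), this is the standard way these methods are used together; CartPole-v1 is just a stand-in for any registered environment:

    import gym

    env = gym.make("CartPole-v1")
    obs = env.reset(seed=42)        # seed once, right after creation
    done = False
    while not done:
        env.render()                                # optional visualization
        action = env.action_space.sample()          # random policy, for illustration
        obs, reward, done, info = env.step(action)  # the 4-tuple contract shown below
    env.close()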

    step

    @abstractmethod
    def step(self, action: ActType) -> Tuple[ObsType, float, bool, dict]:
        """Run one timestep of the environment's dynamics. When end of
        episode is reached, you are responsible for calling `reset()`
        to reset this environment's state.
        Accepts an action and returns a tuple (observation, reward, done, info).
        Args:
            action (object): an action provided by the agent
        Returns:
            observation (object): agent's observation of the current environment
            reward (float) : amount of reward returned after previous action
            done (bool): whether the episode has ended, in which case further step() calls will return undefined results
            info (dict): contains auxiliary diagnostic information (helpful for debugging, logging, and sometimes learning)
        """
        raise NotImplementedError
    

    reset

    @abstractmethod
    def reset(
        self,
        *,
        seed: Optional[int] = None,
        return_info: bool = False,
        options: Optional[dict] = None,
    ) -> Union[ObsType, tuple[ObsType, dict]]:
        """Resets the environment to an initial state and returns an initial
        observation.
        This method should also reset the environment's random number
        generator(s) if `seed` is an integer or if the environment has not
        yet initialized a random number generator. If the environment already
        has a random number generator and `reset` is called with `seed=None`,
        the RNG should not be reset.
        Moreover, `reset` should (in the typical use case) be called with an
        integer seed right after initialization and then never again.
        Returns:
            observation (object): the initial observation.
            info (optional dictionary): a dictionary containing extra information, this is only returned if return_info is set to true
        """
        # Initialize the RNG if the seed is manually passed
        if seed is not None:
            self._np_random, seed = seeding.np_random(seed)
    
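
    A short usage sketch of the seeding contract described above (env is an already-constructed environment):

    obs = env.reset(seed=123)                   # seeds the RNG, returns the initial observation
    obs = env.reset()                           # seed=None: the existing RNG stream is kept
    obs, info = env.reset(return_info=True)     # additionally returns the info dict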

    render

     @abstractmethod
     def render(self, mode="human"):
         """Renders the environment.
         The set of supported modes varies per environment. (And some
         third-party environments may not support rendering at all.)
         By convention, if mode is:
         - human: render to the current display or terminal and
           return nothing. Usually for human consumption.
         - rgb_array: Return a numpy.ndarray with shape (x, y, 3),
           representing RGB values for an x-by-y pixel image, suitable
           for turning into a video.
         - ansi: Return a string (str) or StringIO.StringIO containing a
           terminal-style text representation. The text can include newlines
           and ANSI escape sequences (e.g. for colors).
         Note:
             Make sure that your class's metadata 'render_modes' key includes
               the list of supported modes. It's recommended to call super()
               in implementations to use the functionality of this method.
         Args:
             mode (str): the mode to render with
         Example:
         class MyEnv(Env):
             metadata = {'render_modes': ['human', 'rgb_array']}
             def render(self, mode='human'):
                 if mode == 'rgb_array':
                     return np.array(...) # return RGB frame suitable for video
                 elif mode == 'human':
                     ... # pop up a window and render
                 else:
                     super(MyEnv, self).render(mode=mode) # just raise an exception
         """
         raise NotImplementedError
    

    close

    def close(self):
        """Override close in your subclass to perform any necessary cleanup.
        Environments will automatically close() themselves when
        garbage collected or when the program exits.
        """
        pass
    
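
    An example of the kind of cleanup close is meant for (a hypothetical sketch; viewer is an assumed attribute, not part of the Gym API):

    def close(self):
        # Hypothetical: release a render window opened lazily in render()
        if getattr(self, "viewer", None) is not None:
            self.viewer.close()
            self.viewer = None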

    seed

    def seed(self, seed=None):
        """Sets the seed for this env's random number generator(s).
        Note:
            Some environments use multiple pseudorandom number generators.
            We want to capture all such seeds used in order to ensure that
            there aren't accidental correlations between multiple generators.
        Returns:
            list<bigint>: Returns the list of seeds used in this env's random
              number generators. The first value in the list should be the
              "main" seed, or the value which a reproducer should pass to
              'seed'. Often, the main seed equals the provided 'seed', but
              this won't be true if seed=None, for example.
        """
        deprecation(
            "Function `env.seed(seed)` is marked as deprecated and will be removed in the future. "
            "Please use `env.reset(seed=seed) instead."
        )
        self._np_random, seed = seeding.np_random(seed)
        return [seed]
    
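
    The migration implied by the deprecation warning above:

    env.seed(42)                # deprecated style
    obs = env.reset()

    obs = env.reset(seed=42)    # current style, as the warning suggests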

    It has the following attributes:
    action_space: The Space object corresponding to valid actions
    observation_space: The Space object corresponding to valid observations
    reward_range: A tuple corresponding to the min and max possible rewards
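
    Putting the methods and attributes together, a minimal custom environment might look like the following sketch (CorridorEnv and all of its details are illustrative, not from the original article):

    import gym
    from gym import spaces

    class CorridorEnv(gym.Env):
        """Hypothetical 1-D corridor: walk right from cell 0 to cell `goal`."""
        metadata = {"render_modes": ["human"]}

        def __init__(self, goal=5):
            self.goal = goal
            self.pos = 0
            self.action_space = spaces.Discrete(2)            # 0: step left, 1: step right
            self.observation_space = spaces.Discrete(goal + 1)
            self.reward_range = (-1.0, 0.0)

        def step(self, action):
            self.pos = min(max(self.pos + (1 if action == 1 else -1), 0), self.goal)
            done = self.pos == self.goal
            reward = 0.0 if done else -1.0                    # -1 per step until the goal
            return self.pos, reward, done, {}

        def reset(self, *, seed=None, return_info=False, options=None):
            super().reset(seed=seed)                          # seeds the base-class RNG if needed
            self.pos = 0
            return (self.pos, {}) if return_info else self.pos

        def render(self, mode="human"):
            print("".join("A" if i == self.pos else "." for i in range(self.goal + 1)))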

  2. The Wrapper class: class Wrapper(Env[ObsType, ActType]):
    This is the decorator pattern: functionality is added or changed without touching the original code, by overriding the wrapped environment's methods (reset, step, render, etc.) or attributes (action_space, observation_space, reward_range). For example, CliffWalkingWrapper inherits from Wrapper and overrides the render function used for visualization (a sketch follows the base-class excerpt below). Part of the code:

    class Wrapper(Env[ObsType, ActType]):
        """Wraps the environment to allow a modular transformation.
        This class is the base class for all wrappers. The subclass could override
        some methods to change the behavior of the original environment without touching the
        original code.
        .. note::
            Don't forget to call ``super().__init__(env)`` if the subclass overrides :meth:`__init__`.
        """
    
        def __init__(self, env: Env):
            self.env = env
    
            self._action_space: spaces.Space | None = None
            self._observation_space: spaces.Space | None = None
            self._reward_range: tuple[SupportsFloat, SupportsFloat] | None = None
            self._metadata: dict | None = None
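
    The CliffWalkingWrapper itself is not reproduced in the excerpt above; a minimal sketch of what such a render-overriding wrapper could look like (the 4x12 grid layout and the use of unwrapped.s follow CliffWalking-v0's toy-text conventions, but treat these details as assumptions):

    import gym

    class CliffWalkingWrapper(gym.Wrapper):
        """Sketch: replace the default render with a compact text map."""

        def render(self, mode="human"):
            nrow, ncol = 4, 12                # CliffWalking-v0 grid size
            pos = self.unwrapped.s            # assumed: current state index of the wrapped env
            for r in range(nrow):
                line = ""
                for c in range(ncol):
                    idx = r * ncol + c
                    if idx == pos:
                        line += "x"           # agent
                    elif r == nrow - 1 and 0 < c < ncol - 1:
                        line += "C"           # cliff cells
                    elif idx == nrow * ncol - 1:
                        line += "T"           # goal
                    else:
                        line += "."
                print(line)

    env = CliffWalkingWrapper(gym.make("CliffWalking-v0"))
    env.reset()
    env.render()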