Gym Module Study Notes (I), by Rita_Aloha

Article link: https://blog.csdn.net/Rita_Aloha/article/details/124696221

I. Environment Creation

1. Subclassing gym.Env

The example package is laid out as follows:

gym-examples/
  README.md
  setup.py
  gym_examples/
    __init__.py
    envs/
      __init__.py
      grid_world.py
    wrappers/
      __init__.py
      relative_position.py

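Link: https://github.com/Farama-Foundation/gym-examples

A few notes on the example:

Observations provide the locations of the target and the agent.
There are four actions: up, down, left, and right.
A "done" signal is issued as soon as the agent reaches the grid cell containing the target.
Rewards are binary and sparse: the immediate reward is 0 unless the agent has reached the target, in which case it is 1.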

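2. Declaration and Initialization

The custom environment inherits from the abstract class gym.Env. It needs a metadata attribute that specifies the supported render_modes (e.g. "human", "rgb_array", "ansi") and the framerate at which the environment should be rendered. The __init__ method accepts the size of the grid, sets up some variables used for rendering, and defines self.observation_space and self.action_space.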

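3. Constructing Observations from Environment States

Because observations have to be computed in both reset and step, it is convenient to have a private method _get_obs that translates the environment's state into an observation.

A similar method, _get_info, returns the auxiliary information as a dict. Sometimes info also contains data that is only available inside step; in that case the dict returned by _get_info is updated in step.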
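As a minimal sketch of these two helpers, the plain-Python class here derives both the observation and the info dict from the stored agent and target locations. It deliberately does not import gym. The class name GridState and the "distance" key (a Manhattan distance) are assumptions made for illustration.

```python
class GridState:
    """Plain-Python stand-in for the state held by a grid-world environment."""

    def __init__(self, agent, target):
        self._agent_location = agent    # (x, y) of the agent
        self._target_location = target  # (x, y) of the target

    def _get_obs(self):
        # Translate the internal state into the observation handed to the agent.
        return {"agent": self._agent_location, "target": self._target_location}

    def _get_info(self):
        # Auxiliary data: Manhattan distance between agent and target
        # (the key name "distance" is an assumption).
        ax, ay = self._agent_location
        tx, ty = self._target_location
        return {"distance": abs(ax - tx) + abs(ay - ty)}


state = GridState(agent=(0, 0), target=(2, 3))
print(state._get_obs())
print(state._get_info())
```

Keeping both helpers free of side effects means reset and step can call them at any point without changing the state.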

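4. Reset

The reset method is called to start a new episode after a done signal has been issued, and it must be called before step. A seed keyword can be passed to reset to re-initialize any random number generator used by the environment, so that the environment starts from the same deterministic state. It is recommended to use the generator provided by the base class, self.np_random. If that is the only generator used, seeding needs little further attention, but super().reset(seed=seed) must still be called so that gym.Env seeds the generator correctly. The reset method returns either the initial observation alone, or a tuple of the initial observation and some auxiliary information, depending on whether return_info is True. Both can be obtained with _get_obs and _get_info.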
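The seeding behaviour can be illustrated with a small plain-Python sketch. Here random.Random stands in for self.np_random, and re-seeding inside reset stands in for the call to super().reset(seed=seed). The class name SeededGrid and the rule that the target is redrawn until it differs from the agent are assumptions for illustration.

```python
import random


class SeededGrid:
    """Plain-Python sketch of reset(); random.Random stands in for self.np_random."""

    def __init__(self, size=5):
        self.size = size
        self.rng = random.Random()  # stand-in for the RNG owned by gym.Env

    def _random_cell(self):
        return (self.rng.randrange(self.size), self.rng.randrange(self.size))

    def reset(self, seed=None, return_info=False):
        # Re-seed only when a seed is given, as super().reset(seed=seed) would.
        if seed is not None:
            self.rng = random.Random(seed)
        # Draw the agent cell, then redraw the target until it differs (assumed rule).
        self.agent = self._random_cell()
        self.target = self.agent
        while self.target == self.agent:
            self.target = self._random_cell()
        obs = {"agent": self.agent, "target": self.target}
        distance = abs(self.agent[0] - self.target[0]) + abs(self.agent[1] - self.target[1])
        info = {"distance": distance}
        return (obs, info) if return_info else obs


env = SeededGrid()
first = env.reset(seed=42)
second = env.reset(seed=42)  # same seed, same initial state
obs, info = env.reset(seed=42, return_info=True)
```

Calling reset without a seed continues the existing random stream, so consecutive episodes differ while the whole run stays reproducible from the first seed.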

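5. Step

The step method usually contains most of the environment's logic. It accepts an action, computes the state of the environment after applying that action, and returns the 4-tuple (observation, reward, done, info). Once the new state has been computed, we can check whether it is a terminal state and set done accordingly. The observation and info can be obtained with _get_obs and _get_info.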
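A dependency-free sketch of this logic for the grid world described earlier might look as follows. The mapping from action numbers to directions, the clipping at the grid border, and the "distance" entry in info are assumptions for illustration. The binary sparse reward and the done signal follow the notes above.

```python
# Action index -> (dx, dy); this particular numbering is an assumption.
ACTION_TO_DIRECTION = {0: (1, 0), 1: (0, 1), 2: (-1, 0), 3: (0, -1)}


def step(agent, target, action, size):
    """Apply one action and return (observation, reward, done, info)."""
    dx, dy = ACTION_TO_DIRECTION[action]
    # Clip the move so the agent cannot leave the size x size grid.
    x = min(max(agent[0] + dx, 0), size - 1)
    y = min(max(agent[1] + dy, 0), size - 1)
    new_agent = (x, y)
    done = new_agent == target   # terminal state: the agent is on the target
    reward = 1 if done else 0    # binary, sparse reward
    observation = {"agent": new_agent, "target": target}
    info = {"distance": abs(x - target[0]) + abs(y - target[1])}
    return observation, reward, done, info


obs, reward, done, info = step(agent=(0, 0), target=(1, 0), action=0, size=5)
print(obs, reward, done, info)
```

A real environment would keep the agent location on self and update it in place. The pure function is used here only to keep the sketch short.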

6. Rendering

The example uses PyGame for rendering.

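7. Close

The close method releases any open resources used by the environment. In many cases it does not need to be implemented. However, when rendering in "human" mode, the window that was opened may need to be closed. In other environments, close might also close files that were opened or release other resources. After close has been called, the environment should no longer be interacted with.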
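The pattern can be sketched without PyGame: a resource is opened lazily on the first "human" render and released in close. In this sketch an io.StringIO object stands in for the window, and the class name RenderingSketch is hypothetical.

```python
import io


class RenderingSketch:
    """Plain-Python sketch; an io.StringIO stands in for a render window."""

    def __init__(self):
        self.window = None  # opened lazily on the first "human" render

    def render(self, mode="human"):
        if mode == "human" and self.window is None:
            self.window = io.StringIO()  # stand-in for opening a window
        if self.window is not None:
            self.window.write("frame\n")

    def close(self):
        # Release the window if one was opened; safe to call more than once.
        if self.window is not None:
            self.window.close()
            self.window = None


env = RenderingSketch()
env.render()
handle = env.window
env.close()
env.close()  # the second call is a no-op
```

Making close idempotent avoids errors when both user code and cleanup code call it.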

8. Registering Envs

Custom environments must be registered so that Gym can detect them.

from gym.envs.registration import register

register(
    id='gym_examples/GridWorld-v0',
    entry_point='gym_examples.envs:GridWorldEnv',
    max_episode_steps=300,
)

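An environment ID consists of three components: a namespace (optional), a name (mandatory), and a version (optional but recommended). The full ID should be used when creating the environment. The keyword argument max_episode_steps=300 ensures that environments created through gym.make are wrapped in a TimeLimit wrapper (see the wrapper documentation for more information). A done signal is then produced either when the agent reaches the target or when 300 steps have been taken in the current episode. To distinguish truncation from termination, check info["TimeLimit.truncated"]. Besides id and entry_point, the following additional keyword arguments can be passed to register: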

Name	Type	Default	Description
`reward_threshold`	`float`	`None`	The reward threshold before the task is considered solved
`nondeterministic`	`bool`	`False`	Whether this environment is non-deterministic even after seeding
`max_episode_steps`	`int`	`None`	The maximum number of steps that an episode can consist of. If not `None`, a `TimeLimit` wrapper is added
`order_enforce`	`bool`	`True`	Whether to wrap the environment in an `OrderEnforcing` wrapper
`autoreset`	`bool`	`False`	Whether to wrap the environment in an `AutoResetWrapper`
`kwargs`	`dict`	`{}`	The default kwargs to pass to the environment class

After registration, the custom environment can be created with env = gym.make('gym_examples/GridWorld-v0').
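The time-limit behaviour described in this section can be pictured with a simplified stand-in. This is a sketch of the idea, not Gym's actual TimeLimit class: CountingEnv and TimeLimitSketch are hypothetical names, and only the info["TimeLimit.truncated"] key comes from the notes above.

```python
class CountingEnv:
    """Hypothetical toy environment that never terminates on its own."""

    def reset(self):
        return 0

    def step(self, action):
        return 0, 0.0, False, {}


class TimeLimitSketch:
    """Simplified stand-in for a time-limit wrapper, not Gym's actual class."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self.elapsed = 0

    def reset(self):
        self.elapsed = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.elapsed += 1
        if self.elapsed >= self.max_episode_steps:
            # Flag truncation only if the episode did not also terminate.
            info["TimeLimit.truncated"] = not done
            done = True
        return obs, reward, done, info


env = TimeLimitSketch(CountingEnv(), max_episode_steps=3)
env.reset()
results = [env.step(0) for _ in range(3)]
```

With this convention, training code can treat a done caused by the step budget differently from a done caused by reaching the target.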

9. Creating a Package

The last step is to structure the code as a Python package.

from setuptools import setup

setup(name='gym_examples',
    version='0.0.1',
    install_requires=['gym==0.23.1', 'pygame==2.1.0']
)

10. Creating Environment Instances

After installing the package locally with pip install -e gym-examples, create an instance of the environment:

import gym
import gym_examples
env = gym.make('gym_examples/GridWorld-v0')

or, passing a keyword argument through to the environment constructor:

env = gym.make('gym_examples/GridWorld-v0', size=10)

11. Using Wrappers

Wrappers make it possible to modify environments provided by Gym or other modules without changing the environment implementation or adding any boilerplate code. See the wrapper documentation for details on how to use wrappers and how to implement your own.
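The idea can be sketched in plain Python. The wrapper here replaces the dict observation with the position of the target relative to the agent and leaves the wrapped environment untouched. TwoPointEnv and RelativePositionSketch are hypothetical names, and the sketch does not use Gym's own wrapper base classes.

```python
class TwoPointEnv:
    """Hypothetical toy environment with fixed agent and target positions."""

    def reset(self):
        return {"agent": (1, 1), "target": (3, 4)}

    def step(self, action):
        return {"agent": (2, 1), "target": (3, 4)}, 0, False, {}


class RelativePositionSketch:
    """Wraps an environment and replaces its observation with target - agent."""

    def __init__(self, env):
        self.env = env

    def observation(self, obs):
        (ax, ay), (tx, ty) = obs["agent"], obs["target"]
        return (tx - ax, ty - ay)

    def reset(self):
        return self.observation(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self.observation(obs), reward, done, info


env = RelativePositionSketch(TwoPointEnv())
start = env.reset()
nxt, reward, done, info = env.step(0)
```

Because only the observation is transformed, reward, done, and info pass through unchanged.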

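II. Spaces

Spaces define the observation and action spaces, so you can write generic code that applies to any environment. For example, you can sample a random action.

WARNING - Custom observation and action spaces can inherit from the Space class. However, most use cases should be covered by the existing space classes (e.g. Box, Discrete) and container classes (Tuple and Dict). Parametrized probability distributions (through the sample() method) and batching functions (in gym.vector.VectorEnv) are only well-defined for the spaces that Gym provides by default. In addition, some implementations of reinforcement learning algorithms may not handle custom spaces properly.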

1. Common Methods

gym.spaces.Space.sample(self) → gym.spaces.space.T_cov

gym.spaces.Space.contains(self, x) → bool

property Space.shape: Optional[tuple[int, ...]]

property gym.spaces.Space.dtype

gym.spaces.Space.seed(self, seed: Optional[int] = None) → list

gym.spaces.Space.to_jsonable(self, sample_n: Sequence[gym.spaces.space.T_cov]) → list

gym.spaces.Space.from_jsonable(self, sample_n: list) → list[+T_cov]

2. Box

class gym.spaces.Box(low: typing.Union[typing.SupportsFloat, numpy.ndarray], 
                     high: typing.Union[typing.SupportsFloat, numpy.ndarray],
                     shape: typing.Optional[typing.Sequence[int]] = None, 
                     dtype: typing.Type = <class 'numpy.float32'>, 
                     seed: typing.Optional[typing.Union[int,gym.utils.seeding.RandomNumberGenerator]] = None)

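A Box represents the Cartesian product of n closed intervals. Each interval has one of the forms [a, b], (-∞, b], [a, ∞), or (-∞, ∞). There are two common use cases:

Identical bounds for every dimension, e.g.: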

Box(low=-1.0, high=2.0, shape=(3, 4), dtype=np.float32)

Independent bounds for each dimension, e.g.:

Box(low=np.array([-1.0, -2.0]), high=np.array([2.0, 4.0]), dtype=np.float32)

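sample() generates a single random sample inside the Box. Each coordinate is sampled according to the form of its interval:

[a, b]: uniform distribution
[a, ∞): shifted exponential distribution
(-∞, b]: shifted negative exponential distribution
(-∞, ∞): normal distribution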
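The four cases can be sketched with the standard library alone. The function name sample_coordinate is hypothetical, and the unit rate of the exponential and the standard normal parameters are assumptions for illustration. This mirrors the rule listed above rather than reproducing Gym's implementation.

```python
import math
import random


def sample_coordinate(low, high, rng):
    """Sample one coordinate according to the form of its interval."""
    low_bounded = low > -math.inf
    high_bounded = high < math.inf
    if low_bounded and high_bounded:
        return rng.uniform(low, high)       # [a, b]: uniform
    if low_bounded:
        return low + rng.expovariate(1.0)   # [a, inf): shifted exponential
    if high_bounded:
        return high - rng.expovariate(1.0)  # (-inf, b]: shifted negative exponential
    return rng.gauss(0.0, 1.0)              # (-inf, inf): normal


rng = random.Random(0)
bounds = [(-1.0, 2.0), (0.0, math.inf), (-math.inf, 5.0), (-math.inf, math.inf)]
samples = [sample_coordinate(lo, hi, rng) for lo, hi in bounds]
print(samples)
```

Each coordinate is drawn independently, so a Box with mixed bounds combines these distributions dimension by dimension.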

3. Discrete

class gym.spaces.Discrete(n: int, 
                          seed: Optional[Union[int,gym.utils.seeding.RandomNumberGenerator]] = None, 
                          start: int = 0)
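A Discrete space with parameters n and start represents the integers start, ..., start + n - 1. As a rough plain-Python sketch of that contract, the hypothetical DiscreteSketch class here uses random.Random as a stand-in for Gym's own random number generator.

```python
import random


class DiscreteSketch:
    """Plain-Python sketch of the set {start, ..., start + n - 1}."""

    def __init__(self, n, seed=None, start=0):
        assert n > 0, "n must be positive"
        self.n = n
        self.start = start
        self.rng = random.Random(seed)

    def sample(self):
        # Uniform draw over the n admissible integers.
        return self.start + self.rng.randrange(self.n)

    def contains(self, x):
        return isinstance(x, int) and self.start <= x < self.start + self.n


space = DiscreteSketch(4, seed=0, start=-1)  # elements: -1, 0, 1, 2
draws = [space.sample() for _ in range(100)]
```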

4. MultiBinary

class gym.spaces.MultiBinary(n: Union[numpy.ndarray, Sequence[int], int], 
                             seed: Optional[Union[int,gym.utils.seeding.RandomNumberGenerator]] = None)

The argument to MultiBinary defines n, which can be a single number or a sequence of numbers.

5. MultiDiscrete

class gym.spaces.MultiDiscrete(nvec: list[int], 
                               dtype=<class 'numpy.int64'>, 
                               seed: typing.Optional[typing.Union[int,gym.utils.seeding.RandomNumberGenerator]] = None)

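A multi-discrete action space consists of a series of discrete action spaces, each with its own number of actions. This is useful for representing game controllers or keyboards, where each key can be represented as a discrete action space. It is parametrized by passing an array of positive integers that specify the number of actions for each discrete action space. Some environment wrappers assume that a value of 0 always represents the NOOP action.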
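As an illustration of this parametrization, the plain-Python sketch here draws one value per sub-space, with component i lying in 0..nvec[i]-1. The class name MultiDiscreteSketch and the controller layout are hypothetical.

```python
import random


class MultiDiscreteSketch:
    """Plain-Python sketch of a product of discrete spaces of sizes nvec."""

    def __init__(self, nvec, seed=None):
        assert all(n > 0 for n in nvec), "every entry of nvec must be positive"
        self.nvec = list(nvec)
        self.rng = random.Random(seed)

    def sample(self):
        # One independent draw per sub-space.
        return [self.rng.randrange(n) for n in self.nvec]

    def contains(self, x):
        return len(x) == len(self.nvec) and all(
            0 <= v < n for v, n in zip(x, self.nvec)
        )


# Hypothetical controller: 5 arrow-key states, 2 states each for buttons A and B.
controller = MultiDiscreteSketch([5, 2, 2], seed=0)
action = controller.sample()
print(action)
```

Under the NOOP convention mentioned above, the action [0, 0, 0] would mean that nothing is pressed.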

6. Dict

class gym.spaces.Dict(spaces: Optional[dict[str, gym.spaces.space.Space]] = None, 
                      seed: Optional[Union[dict, int, 
                      gym.utils.seeding.RandomNumberGenerator]] = None, 
                      **spaces_kwargs: gym.spaces.space.Space)

7. Tuple

class gym.spaces.Tuple(spaces: Iterable[gym.spaces.space.Space], 
                       seed: Optional[Union[int,list[int],gym.utils.seeding.RandomNumberGenerator]] = None)

8. Utility Functions

gym.spaces.utils.flatdim(space: gym.spaces.space.Space) → int
#Raises NotImplementedError if the space is not defined in gym.spaces.
gym.spaces.utils.flatten_space(space: gym.spaces.space.Space) → gym.spaces.box.Box
#Raises NotImplementedError if the space is not defined in gym.spaces.
gym.spaces.utils.flatten(space: gym.spaces.space.Space[gym.spaces.utils.T], x: gym.spaces.utils.T) → numpy.ndarray
gym.spaces.utils.flatten(space: gym.spaces.multi_binary.MultiBinary, x) → numpy.ndarray
gym.spaces.utils.flatten(space: gym.spaces.box.Box, x) → numpy.ndarray
gym.spaces.utils.flatten(space: gym.spaces.discrete.Discrete, x) → numpy.ndarray
gym.spaces.utils.flatten(space: gym.spaces.multi_discrete.MultiDiscrete, x) → numpy.ndarray
gym.spaces.utils.flatten(space: gym.spaces.tuple.Tuple, x) → numpy.ndarray
gym.spaces.utils.flatten(space: gym.spaces.dict.Dict, x) → numpy.ndarray
#Raises NotImplementedError if the space is not defined in gym.spaces.
gym.spaces.utils.unflatten(space: gym.spaces.space.Space[gym.spaces.utils.T], x: numpy.ndarray) → gym.spaces.utils.T
#Unflatten a data point from a space.
#Raises NotImplementedError if the space is not defined in gym.spaces.
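To make the flattening idea concrete, here is a dependency-free sketch in which spaces are described by simple tagged tuples, a made-up representation used only for this example. A discrete value becomes a one-hot vector, a box value is unrolled into a flat list, and a tuple of spaces is flattened by concatenation. Unknown kinds raise NotImplementedError, in line with the comments above.

```python
def _ravel(x):
    """Unroll an arbitrarily nested list or tuple into a flat list."""
    if isinstance(x, (list, tuple)):
        return [v for item in x for v in _ravel(item)]
    return [x]


def flatdim(space):
    """Number of entries in the flattened representation of a space."""
    kind = space[0]
    if kind == "discrete":
        return space[1]  # one-hot vector of length n
    if kind == "box":
        size = 1
        for d in space[1]:
            size *= d
        return size
    if kind == "tuple":
        return sum(flatdim(s) for s in space[1])
    raise NotImplementedError(kind)


def flatten(space, x):
    """Flatten a data point x that belongs to the given space."""
    kind = space[0]
    if kind == "discrete":
        return [1 if i == x else 0 for i in range(space[1])]
    if kind == "box":
        return _ravel(x)
    if kind == "tuple":
        return [v for s, xi in zip(space[1], x) for v in flatten(s, xi)]
    raise NotImplementedError(kind)


space = ("tuple", [("discrete", 3), ("box", (2, 2))])
point = (1, [[0.5, 1.5], [2.5, 3.5]])
flat = flatten(space, point)
print(flat, flatdim(space))
```

The length of the flattened point always equals flatdim(space), which is what lets generic code allocate fixed-size buffers for any supported space.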

Reference: Environment Creation - Gym Documentation, https://www.gymlibrary.ml/content/environment_creation/
