[Gym] Reinforcement Learning Environments

Contents

1 Official Website

2 Atari

3 Classic Control

4 Box2D

5 MuJoCo


1 Official Website

https://www.gymlibrary.dev/

Gym is a standard API for reinforcement learning and a diverse collection of reference environments.

The Gym interface is simple, Pythonic, and capable of representing general RL problems:

import gym

env = gym.make("LunarLander-v2", render_mode="human")

def policy(observation):
    # Placeholder for a user-defined policy: here it samples a random action.
    return env.action_space.sample()

observation, info = env.reset(seed=42)
for _ in range(1000):
    action = policy(observation)
    observation, reward, terminated, truncated, info = env.step(action)

    # Start a new episode once the current one has ended.
    if terminated or truncated:
        observation, info = env.reset()
env.close()
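To make the shape of the interface concrete without installing anything, here is a minimal sketch of a toy environment that follows the same reset/step conventions as the loop above. CountdownEnv and run_episode are hypothetical names made up for this illustration; they are not part of Gym, and the reward values are arbitrary.

```python
class CountdownEnv:
    """Hypothetical toy environment that mimics the Gym reset/step signatures."""

    def __init__(self, start=5, max_steps=20):
        self.start = start
        self.max_steps = max_steps

    def reset(self, seed=None):
        # seed is accepted only for signature compatibility; this toy is deterministic.
        self.counter = self.start
        self.steps = 0
        return self.counter, {}  # (observation, info)

    def step(self, action):
        # action 1 decrements the counter; action 0 leaves it unchanged.
        self.counter -= action
        self.steps += 1
        terminated = self.counter <= 0            # the task itself is finished
        truncated = self.steps >= self.max_steps  # the time limit was hit
        reward = 1.0 if terminated else -0.1
        return self.counter, reward, terminated, truncated, {}


def run_episode(env, policy, seed=None):
    """Roll out one episode; return (total_reward, steps, terminated, truncated)."""
    observation, info = env.reset(seed=seed)
    total = 0.0
    while True:
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total += reward
        if terminated or truncated:
            return total, env.steps, terminated, truncated
```

Calling run_episode(CountdownEnv(), lambda obs: 1) ends through terminated, while a policy that always returns 0 ends through truncated. That is the distinction the two flags in the step return value are meant to capture.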

 

2 Atari

A set of Atari 2600 environments simulated through Stella and the Arcade Learning Environment.

Complete List - Atari - Gym Documentation

 

 

3 Classic Control

The unique dependencies for this set of environments can be installed via:

pip install gym[classic_control]

There are five classic control environments: Acrobot, CartPole, Mountain Car, Continuous Mountain Car, and Pendulum. In all of them the initial state is sampled at random from a given range. In addition, Acrobot applies noise to the chosen action. In both Mountain Car environments the car is underpowered and cannot drive straight up the hill, so it has to build momentum to reach the top.
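The remark about the underpowered car can be illustrated with a small stand-alone simulation. This is only a sketch: the constants and the update rule are written from memory of Gym's MountainCar-v0 source and should be treated as assumptions, the start position is fixed at -0.5 instead of being sampled, and steps_to_goal, always_right and follow_velocity are hypothetical helper names.

```python
import math

# Constants as recalled from Gym's MountainCar-v0 source; treat them as assumptions.
FORCE, GRAVITY = 0.001, 0.0025
MIN_POS, MAX_POS, MAX_SPEED, GOAL_POS = -1.2, 0.6, 0.07, 0.5


def step(position, velocity, action):
    """Advance one step; action is 0 (push left), 1 (no push) or 2 (push right)."""
    velocity += (action - 1) * FORCE - GRAVITY * math.cos(3 * position)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position = max(MIN_POS, min(MAX_POS, position + velocity))
    if position == MIN_POS and velocity < 0:
        velocity = 0.0  # the left wall stops the car
    return position, velocity


def steps_to_goal(policy, position=-0.5, max_steps=1000):
    """Return the number of steps needed to reach the goal, or None on failure."""
    velocity = 0.0
    for t in range(1, max_steps + 1):
        position, velocity = step(position, velocity, policy(position, velocity))
        if position >= GOAL_POS:
            return t
    return None


def always_right(position, velocity):
    # Naive policy: keep pushing toward the goal.
    return 2


def follow_velocity(position, velocity):
    # Push in the direction of travel so that each swing adds energy.
    return 2 if velocity >= 0 else 0
```

Under these assumed dynamics, steps_to_goal(always_right) returns None because the engine alone cannot overcome gravity on the slope, whereas steps_to_goal(follow_velocity) returns a step count after the car has rocked back and forth a few times.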

Among Gym environments, these are some of the easier ones for a policy to solve.

All environments are highly configurable via arguments specified in each environment's documentation.

4 Box2D 

These environments are all toy games built around physics control, using Box2D-based physics and PyGame-based rendering. They were contributed in the early days of Gym by Oleg Klimov and have been popular toy benchmarks ever since. All environments are highly configurable via arguments specified in each environment's documentation.

The unique dependencies for this set of environments can be installed via:

pip install gym[box2d]

5 MuJoCo

 

MuJoCo stands for Multi-Joint dynamics with Contact. It is a physics engine for facilitating research and development in robotics, biomechanics, graphics and animation, and other areas where fast and accurate simulation is needed.

The unique dependencies for this set of environments can be installed via:

pip install gym[mujoco]

These environments also require the MuJoCo engine to be installed. DeepMind acquired MuJoCo in October 2021 and open-sourced it in 2022, making it free for everyone. Instructions for installing the MuJoCo engine can be found on its website and GitHub repository. Using MuJoCo with OpenAI Gym also requires the mujoco-py framework, which is available from its GitHub repository (this dependency is installed by the command above).

There are ten MuJoCo environments: Ant, HalfCheetah, Hopper, Humanoid, HumanoidStandup, InvertedDoublePendulum, InvertedPendulum, Reacher, Swimmer, and Walker. All of them have a stochastic initial state: Gaussian noise is added to a fixed initial state. The state spaces of the MuJoCo environments in Gym consist of two parts that are flattened and concatenated: the positions of the body parts or joints ('mujoco-py.mjsim.qpos') and their corresponding velocities ('mujoco-py.mjsim.qvel'). Often, some of the first positional elements are omitted from the state space because the reward is calculated from their values, leaving it to the algorithm to infer those hidden values indirectly.
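The structure described above (a noisy initial state, plus an observation made of positions followed by velocities with the leading positions hidden) can be sketched with plain Python lists. The functions reset_state and make_observation, the noise scale, and the number of skipped elements are all hypothetical and chosen only for illustration; each real environment defines its own values.

```python
import random


def reset_state(init_qpos, init_qvel, noise_scale=0.1, seed=None):
    """Return (qpos, qvel): a fixed initial state perturbed by Gaussian noise."""
    rng = random.Random(seed)
    qpos = [q + rng.gauss(0.0, noise_scale) for q in init_qpos]
    qvel = [v + rng.gauss(0.0, noise_scale) for v in init_qvel]
    return qpos, qvel


def make_observation(qpos, qvel, skip=1):
    """Concatenate positions and velocities, hiding the first `skip` positions."""
    return list(qpos[skip:]) + list(qvel)


# Example: three positions and three velocities, with the first position hidden.
qpos, qvel = reset_state([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], seed=0)
observation = make_observation(qpos, qvel, skip=1)
```

Here the observation has five entries rather than six. The hidden first position is exactly the kind of value a reward might be computed from, so the agent can only infer it indirectly, for example from the velocities.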

Among Gym environments, these are some of the more difficult ones for a policy to solve.

Environments can be configured by changing the XML files or by tweaking the parameters of their classes.
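As a rough illustration of the XML route, the sketch below edits a model description with Python's standard xml.etree.ElementTree module. The model text is a hypothetical, heavily trimmed example rather than one of the files shipped with Gym, and set_option is a helper invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed model; real MuJoCo XML files are much larger.
MODEL_XML = """
<mujoco model="toy">
  <option timestep="0.01" gravity="0 0 -9.81"/>
  <worldbody>
    <body name="torso">
      <geom type="sphere" size="0.1"/>
    </body>
  </worldbody>
</mujoco>
"""


def set_option(xml_text, **attributes):
    """Return a copy of the XML text with attributes of <option> overridden."""
    root = ET.fromstring(xml_text)
    option = root.find("option")
    for name, value in attributes.items():
        option.set(name, value)
    return ET.tostring(root, encoding="unicode")


# Example: the same model under weaker gravity.
low_gravity_xml = set_option(MODEL_XML, gravity="0 0 -1.62")
```

The modified text could then be written to a file and used in place of the original model. How an environment is pointed at a custom file depends on the environment class, so check its documentation.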
