[Gym] [Reinforcement Learning Environments]

资源存储库

Published 2024-07-30 17:33:41

Article link: https://blog.csdn.net/wq6qeg88/article/details/140801865

1 Official website

2 Atari

3 Classic Control

4 Box2D

5 MuJoCo

1 Official website

https://www.gymlibrary.dev/

Gym is a standard API for reinforcement learning and a diverse collection of reference environments.

The Gym interface is simple, Pythonic, and capable of representing general RL problems:

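import gym

env = gym.make("LunarLander-v2", render_mode="human")

def policy(observation):
    # Placeholder for a user-defined policy: sample a random valid action.
    return env.action_space.sample()

observation, info = env.reset(seed=42)
for _ in range(1000):
    action = policy(observation)
    observation, reward, terminated, truncated, info = env.step(action)

    # Start a new episode once the current one ends or hits its time limit.
    if terminated or truncated:
        observation, info = env.reset()
env.close()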
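
To make the reset/step contract above concrete, here is a minimal sketch of a toy environment that returns the same tuples. CorridorEnv and run_episode are hypothetical names invented for this illustration; the class does not subclass gym.Env and is not part of Gym, it only mirrors the signatures used in the loop above.

```python
import random


class CorridorEnv:
    """Toy 1-D corridor that mimics the reset/step signatures shown above.

    Hypothetical example: it does not subclass gym.Env and is not part of Gym.
    """

    def __init__(self, length=5, max_steps=20):
        self.length = length        # the goal is the right-most cell (index length - 1)
        self.max_steps = max_steps  # step budget before the episode is truncated
        self.rng = random.Random()
        self.position = 0
        self.steps = 0

    def reset(self, seed=None):
        if seed is not None:
            self.rng = random.Random(seed)
        # Random start in the left part of the corridor, never on the goal cell.
        self.position = self.rng.randrange(self.length // 2 + 1)
        self.steps = 0
        return self.position, {}

    def step(self, action):
        # Action 1 moves right, any other action moves left; walls clamp the position.
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        self.steps += 1
        terminated = self.position == self.length - 1  # reached the goal
        truncated = not terminated and self.steps >= self.max_steps  # ran out of steps
        reward = 1.0 if terminated else 0.0
        return self.position, reward, terminated, truncated, {}


def run_episode(env, policy, seed=0):
    """Roll out one episode and report the total reward and how the episode ended."""
    observation, info = env.reset(seed=seed)
    total_reward = 0.0
    while True:
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            return total_reward, terminated, truncated
```

In this sketch, terminated means the task itself ended (the goal was reached), while truncated means the step budget ran out first. A policy that always moves right ends with terminated set, and one that always moves left ends with truncated set.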

2 Atari

A set of Atari 2600 environments simulated through Stella and the Arcade Learning Environment.

Adventure
Air Raid
Alien
Amidar
Assault
Asterix
Asteroids
Atlantis
Bank Heist
Battle Zone
Beam Rider
Berzerk
Bowling
Boxing
Breakout
Carnival
Centipede
Chopper Command
Crazy Climber
Defender
Demon Attack
Double Dunk
Elevator Action
Enduro
Fishing Derby
Freeway
Frostbite
Gopher
Gravitar
Hero
Ice Hockey
James Bond
Journey Escape
Kangaroo
Krull
Kung Fu Master
Montezuma Revenge
Ms Pacman
Name This Game
Phoenix
Pitfall
Pong
Pooyan
Private Eye
Qbert
Riverraid
Road Runner
Robot Tank
Seaquest
Skiing
Solaris
Space Invaders
Star Gunner
Tennis
Time Pilot
Tutankham
Up n’ Down
Venture
Video Pinball
Wizard of Wor
Zaxxon

For the complete list, see the Atari page of the Gym documentation.

3 Classic Control

Acrobot
Cart Pole
Mountain Car Continuous
Mountain Car
Pendulum

The unique dependencies for this set of environments can be installed via:

pip install gym[classic_control]

There are five classic control environments: Acrobot, CartPole, Mountain Car, Continuous Mountain Car, and Pendulum. All of them are stochastic in their initial state, which is drawn from a given range. In addition, Acrobot applies noise to the action taken. In both Mountain Car environments the car is too underpowered to climb the mountain directly, so it takes some effort to reach the top.
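
The underpowered car can be illustrated with a small standalone simulation. This is a sketch, not Gym's code: the update rule and constants are written from memory of the discrete Mountain Car source and should be treated as assumptions, and steps_to_goal, always_right, and follow_momentum are hypothetical helper names. The start state is fixed at position -0.5 with zero velocity instead of being sampled.

```python
import math

# Constants as recalled from Gym's discrete Mountain Car; treat them as assumptions.
FORCE = 0.001
GRAVITY = 0.0025
MIN_POSITION, MAX_POSITION = -1.2, 0.6
MAX_SPEED = 0.07
GOAL_POSITION = 0.5


def step(position, velocity, action):
    """Advance the car one step; action is 0 (push left), 1 (coast), or 2 (push right)."""
    velocity += (action - 1) * FORCE - math.cos(3 * position) * GRAVITY
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position = max(MIN_POSITION, min(MAX_POSITION, position + velocity))
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0  # the left wall stops the car
    return position, velocity


def steps_to_goal(policy, position=-0.5, velocity=0.0, max_steps=1000):
    """Return the number of steps needed to reach the goal, or None if it is never reached."""
    for t in range(1, max_steps + 1):
        position, velocity = step(position, velocity, policy(position, velocity))
        if position >= GOAL_POSITION:
            return t
    return None


def always_right(position, velocity):
    # Naive policy: push toward the goal on every step.
    return 2


def follow_momentum(position, velocity):
    # Push in the direction the car is already moving to build up speed.
    return 2 if velocity >= 0 else 0
```

Under these assumed constants, steps_to_goal(always_right) returns None because the engine alone cannot overcome the slope, while steps_to_goal(follow_momentum) returns a step count: the car first rocks back and forth to gain momentum and then clears the hill.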

Among Gym environments, this set is generally considered one of the easier ones for a policy to solve.

All environments are highly configurable via arguments specified in each environment's documentation.

4 Box2D

These environments all involve toy games based around physics control, using box2d-based physics and PyGame-based rendering. They were contributed in the early days of Gym by Oleg Klimov and have been popular toy benchmarks ever since. All environments are highly configurable via arguments specified in each environment's documentation.

The unique dependencies for this set of environments can be installed via:

pip install gym[box2d]

5 MuJoCo

MuJoCo stands for Multi-Joint dynamics with Contact. It is a physics engine for facilitating research and development in robotics, biomechanics, graphics and animation, and other areas where fast and accurate simulation is needed.

The unique dependencies for this set of environments can be installed via:

pip install gym[mujoco]

These environments also require the MuJoCo engine to be installed. DeepMind acquired MuJoCo in October 2021 and open-sourced it in 2022, making it free for everyone. Instructions for installing the MuJoCo engine can be found on its website and in its GitHub repository. Using MuJoCo with OpenAI Gym also requires the mujoco-py framework, which is available from its GitHub repository (this dependency is installed by the command above).

There are ten MuJoCo environments: Ant, HalfCheetah, Hopper, Humanoid, HumanoidStandup, InvertedDoublePendulum, InvertedPendulum, Reacher, Swimmer, and Walker. All of them are stochastic in their initial state: Gaussian noise is added to a fixed initial state. The state spaces for MuJoCo environments in Gym consist of two parts that are flattened and concatenated together: the positions of the body parts or joints ('mujoco-py.mjsim.qpos') and their corresponding velocities ('mujoco-py.mjsim.qvel'). Often, some of the first positional elements are omitted from the state space because the reward is calculated from their values, leaving it up to the algorithm to infer those hidden values indirectly.
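
The state layout described above can be sketched without MuJoCo installed. The helpers below are hypothetical and use plain Python lists in place of the simulator's arrays; the noise scale and the number of hidden positional elements are illustrative assumptions, not values taken from any particular environment.

```python
import random


def noisy_initial_state(init_qpos, init_qvel, noise_scale=0.1, seed=None):
    """Add Gaussian noise to a fixed initial state (noise_scale is an assumed value)."""
    rng = random.Random(seed)
    qpos = [p + rng.gauss(0.0, noise_scale) for p in init_qpos]
    qvel = [v + rng.gauss(0.0, noise_scale) for v in init_qvel]
    return qpos, qvel


def build_observation(qpos, qvel, hidden_positions=1):
    """Concatenate positions and velocities, omitting the first positional elements."""
    # The omitted positions are the ones the reward would be computed from.
    return list(qpos[hidden_positions:]) + list(qvel)


# Example: three position coordinates and three velocities give a 5-element observation.
qpos, qvel = noisy_initial_state([0.0, 1.25, 0.0], [0.0, 0.0, 0.0], seed=0)
observation = build_observation(qpos, qvel)
```

Here the first position coordinate is hidden from the agent, so the observation has one element fewer than qpos and qvel combined.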

Among Gym environments, this set is generally considered one of the more difficult ones for a policy to solve.

Environments can be configured by changing their XML files or by tweaking the parameters of their classes.
