Gymnasium provides a variety of standard environments for reinforcement-learning (RL) research and development, such as classic control (CartPole, MountainCar), 2D/3D games, and physics simulation.

Gymnasium is the community continuation of OpenAI Gym and the standard environment-interface library in reinforcement learning (RL). It emphasizes openness, community maintenance, and sustainable development, and has become the base environment standard for countless RL papers, tutorials, and projects.


1. What is Gymnasium?

Gymnasium inherits the classic interface and philosophy of OpenAI Gym and is a general-purpose reinforcement-learning environment library. It gives RL algorithms a standard environment interface, letting you:

  • easily load and run different environments (games, simulations, tasks, etc.)
  • let an RL agent interact with an environment
  • reproduce and run a large body of RL papers and models

Gymnasium is the "official community-maintained successor" of OpenAI Gym (OpenAI stopped maintaining Gym, and the Farama Foundation carries development forward). It keeps the original Gym API largely intact, though not unchanged: most notably, reset() now returns (observation, info) and step() returns five values instead of four.


2. Key features

  • Highly compatible: keeps the familiar gym interface (env.step, env.reset, env.render), so most Gym code ports over with minimal changes.
  • Rich environment catalog: includes many classic and state-of-the-art RL environments such as CartPole, MountainCar, LunarLander, Atari, Box2D, and MuJoCo.
  • Active community maintenance: organized by the Farama Foundation, with ongoing integration, bug fixes, and new features.
  • Unified, standardized API: integrates seamlessly with RL algorithm libraries such as Stable-Baselines3, RLlib, CleanRL, and PettingZoo.
  • Supports registering and extending custom environments, suitable for both research and production use; see the sketch below.
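
A minimal sketch of that extension point, assuming a recent Gymnasium release; the GridWorldEnv class and the "GridWorld-v0" id are made up for illustration:

import gymnasium as gym
from gymnasium import spaces

class GridWorldEnv(gym.Env):
    """Toy 1-D grid: walk right until you reach the last cell."""

    def __init__(self, size=5):
        self.size = size
        self.observation_space = spaces.Discrete(size)
        self.action_space = spaces.Discrete(2)  # 0 = left, 1 = right
        self.pos = 0

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)  # seeds self.np_random
        self.pos = 0
        return self.pos, {}  # Gymnasium contract: (observation, info)

    def step(self, action):
        self.pos = min(max(self.pos + (1 if action == 1 else -1), 0), self.size - 1)
        terminated = self.pos == self.size - 1
        reward = 1.0 if terminated else 0.0
        # Gymnasium contract: (obs, reward, terminated, truncated, info)
        return self.pos, reward, terminated, False, {}

gym.register(id="GridWorld-v0", entry_point=GridWorldEnv)
env = gym.make("GridWorld-v0")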

3. Common environment categories

  1. Classic control
    • e.g. CartPole, MountainCar, Acrobot, Pendulum
  2. Atari games (require an extra dependency such as gymnasium[atari])
    • e.g. Pong, Breakout, Space Invaders
  3. Box2D physics simulation
    • LunarLander, BipedalWalker
  4. MuJoCo robotics/physics simulation
    • Hopper, HalfCheetah, Walker2d
  5. Toy Text / tabular environments (FrozenLake, Taxi, etc.)
  6. Custom and third-party environments: many packages built on Gymnasium can register their own environments. A quick way to inspect any of these is shown below.
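
To sanity-check what a family looks like, instantiate an environment and print its spaces; the ids below all ship with the base install, so no extras are needed:

import gymnasium as gym

# Classic control and Toy Text environments come with the base install
for env_id in ("CartPole-v1", "Pendulum-v1", "FrozenLake-v1", "Taxi-v3"):
    env = gym.make(env_id)
    print(env_id, "| obs:", env.observation_space, "| act:", env.action_space)
    env.close()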

4. Basic usage example

Installation

pip install gymnasium
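
To verify the install, print the package version (a quick sanity check from the shell):

python -c "import gymnasium as gym; print(gym.__version__)"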

A minimal RL loop

import gymnasium as gym

# Create the environment; render_mode="human" opens a live window
env = gym.make("CartPole-v1", render_mode="human")
# Reset the environment and get the initial observation
observation, info = env.reset(seed=42)
terminated = truncated = False

while not (terminated or truncated):
    # The agent picks an action; here we simply sample a random one
    action = env.action_space.sample()
    # Step the environment; Gymnasium returns five values
    observation, reward, terminated, truncated, info = env.step(action)
    # In "human" render mode the window updates automatically on each step
env.close()
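
Note that Gymnasium splits the old done flag in two: terminated means the MDP reached a terminal state, while truncated means an external cutoff (such as a time limit) ended the episode. An episode is over when either is true.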

5. Ecosystem and extensions

Gymnasium is the foundation of the RL ecosystem:

  • works seamlessly with algorithm libraries such as Stable-Baselines3, RLlib/Ray, and CleanRL (see the sketch after this list)
  • pairs with specialized environment libraries such as PettingZoo (multi-agent), Minigrid (grid-world exploration), and Procgen (procedurally generated environments)
  • supports custom environments: implement the standard interface and the environment plugs into any training pipeline
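
As an example of that interoperability, here is a minimal sketch of training PPO on CartPole, assuming stable-baselines3 >= 2.0 (the release line that targets Gymnasium natively):

import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10_000)  # short run, just to show the wiring

# Roll out one episode with the trained policy
obs, info = env.reset(seed=0)
terminated = truncated = False
total_reward = 0.0
while not (terminated or truncated):
    action, _state = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
print("episode reward:", total_reward)
env.close()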

6. History and community

  • OpenAI Gym, originally released by OpenAI, became the de facto standard in RL
  • after OpenAI paused maintenance of the repository, the Farama Foundation relaunched it as Gymnasium
  • the website, documentation, and community are active, with ongoing bug fixes, new environments, and compatibility work
  • beyond the Python community, many C++/Java/Julia environments target the Gym interface for compatibility

7. Highlights of the official site

At https://gymnasium.farama.org/ you will find:

  • an environment overview with screenshots (to quickly find the task you want to test or train on)
  • API documentation, quick-start guides, and advanced topics
  • how-to guides, e.g. creating custom environments or training with multiple processes
  • tutorials and worked examples (Python code plus explanations)
  • the registry and parameter reference for every environment

8. Common use cases

  • RL courses and tutorials (including classic deep-RL textbooks and online MOOCs)
  • reproducing experiments from new papers
  • academic research and benchmarking
  • simulation and intelligent control in industrial applications
  • developing and testing your own RL algorithms

Summary


Gymnasium is the most widely adopted and most standardized environment-interface library in reinforcement learning. Whether for learning, research, or engineering, it gets you up and running quickly.

If there is a specific environment or application you want to try (robotics, Atari games, mazes, control systems, etc.), let me know and I can recommend resources or code templates!

Hands-on practice

Install gymnasium

pip install gymnasium
# or
uv pip install gymnasium

Some environments also need Box2D (2D physics) support:

pip install swig
pip install gymnasium[box2d]
# or
# uv pip install swig
# uv pip install gymnasium[box2d]
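
A quick smoke test that the Box2D extra works (note: on Gymnasium versions before 1.0 the id is LunarLander-v2; on 1.0 and later it is LunarLander-v3):

python -c "import gymnasium as gym; gym.make('LunarLander-v2').close()"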

Run the demo

Create a test.py file with the following content:

import gymnasium as gym

# Create the environment; render_mode="human" opens a live window
env = gym.make("CartPole-v1", render_mode="human")
# Reset the environment and get the initial observation
observation, info = env.reset(seed=42)
terminated = truncated = False

while not (terminated or truncated):
    # The agent picks an action; here we simply sample a random one
    action = env.action_space.sample()
    # Step the environment; Gymnasium returns five values
    observation, reward, terminated, truncated, info = env.step(action)
    # In "human" render mode the window updates automatically on each step
env.close()

Run python test.py and a CartPole rendering window appears.

Training with the PARL reinforcement-learning framework

Install the PARL toolkit

Reference: "Learning PARL in the 星河 community: using reinforcement learning to train AI" (CSDN blog).

Modify the PARL training code to use the gymnasium library instead of gym. That is, in PARL/examples/QuickStart/train.py, change:

# import gym
import gymnasium as gym
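
Depending on the PARL version, swapping the import alone may not be enough: Gymnasium's reset() returns (observation, info) and step() returns five values, whereas old gym code expects one and four. If train.py fails on unpacking, the adjustment looks like this (a hedged sketch, not PARL's actual code):

# old gym:  obs = env.reset()
obs, info = env.reset()
# old gym:  obs, reward, done, info = env.step(action)
obs, reward, terminated, truncated, info = env.step(action)
done = terminated or truncated  # reconstruct the old-style "done" flag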

Then run:

cd PARL/examples/QuickStart/

python train.py

You should see training output like:

home/aistudio/work/PARL/examples/QuickStart
[05-14 21:31:25 MainThread @logger.py:242] Argv: train.py
/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/utils/cpp_extension/extension_utils.py:711: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
  warnings.warn(warning_message)
/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/utils/cpp_extension/extension_utils.py:711: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
  warnings.warn(warning_message)
[05-14 21:31:28 MainThread @train.py:85] obs_dim 4, act_dim 2
[05-14 21:31:28 MainThread @train.py:101] Episode 0, Reward Sum 26.0.
[05-14 21:31:28 MainThread @train.py:101] Episode 10, Reward Sum 13.0.
[05-14 21:31:28 MainThread @train.py:101] Episode 20, Reward Sum 13.0.
[05-14 21:31:28 MainThread @train.py:101] Episode 30, Reward Sum 20.0.
[05-14 21:31:28 MainThread @train.py:101] Episode 40, Reward Sum 10.0.
[05-14 21:31:28 MainThread @train.py:101] Episode 50, Reward Sum 17.0.
[05-14 21:31:29 MainThread @train.py:101] Episode 60, Reward Sum 45.0.
[05-14 21:31:29 MainThread @train.py:101] Episode 70, Reward Sum 13.0.
[05-14 21:31:29 MainThread @train.py:101] Episode 80, Reward Sum 27.0.
[05-14 21:31:29 MainThread @train.py:101] Episode 90, Reward Sum 10.0.
[05-14 21:31:29 MainThread @train.py:111] Test reward: 33.4
[05-14 21:31:29 MainThread @train.py:101] Episode 100, Reward Sum 13.0.
[05-14 21:31:29 MainThread @train.py:101] Episode 110, Reward Sum 29.0.
[05-14 21:31:29 MainThread @train.py:101] Episode 120, Reward Sum 23.0.
[05-14 21:31:29 MainThread @train.py:101] Episode 130, Reward Sum 45.0.
[05-14 21:31:29 MainThread @train.py:101] Episode 140, Reward Sum 12.0.
[05-14 21:31:29 MainThread @train.py:101] Episode 150, Reward Sum 25.0.
[05-14 21:31:29 MainThread @train.py:101] Episode 160, Reward Sum 35.0.
[05-14 21:31:30 MainThread @train.py:101] Episode 170, Reward Sum 31.0.
[05-14 21:31:30 MainThread @train.py:101] Episode 180, Reward Sum 63.0.
[05-14 21:31:30 MainThread @train.py:101] Episode 190, Reward Sum 19.0.
[05-14 21:31:30 MainThread @train.py:111] Test reward: 39.0
[05-14 21:31:30 MainThread @train.py:101] Episode 200, Reward Sum 29.0.
[05-14 21:31:30 MainThread @train.py:101] Episode 210, Reward Sum 57.0.
[05-14 21:31:30 MainThread @train.py:101] Episode 220, Reward Sum 32.0.
[05-14 21:31:30 MainThread @train.py:101] Episode 230, Reward Sum 33.0.
[05-14 21:31:30 MainThread @train.py:101] Episode 240, Reward Sum 35.0.
[05-14 21:31:30 MainThread @train.py:101] Episode 250, Reward Sum 19.0.
[05-14 21:31:31 MainThread @train.py:101] Episode 260, Reward Sum 42.0.
[05-14 21:31:31 MainThread @train.py:101] Episode 270, Reward Sum 51.0.
[05-14 21:31:31 MainThread @train.py:101] Episode 280, Reward Sum 51.0.
[05-14 21:31:31 MainThread @train.py:101] Episode 290, Reward Sum 30.0.
[05-14 21:31:31 MainThread @train.py:111] Test reward: 55.2
[05-14 21:31:31 MainThread @train.py:101] Episode 300, Reward Sum 53.0.
[05-14 21:31:31 MainThread @train.py:101] Episode 310, Reward Sum 29.0.
[05-14 21:31:31 MainThread @train.py:101] Episode 320, Reward Sum 32.0.
[05-14 21:31:31 MainThread @train.py:101] Episode 330, Reward Sum 18.0.
[05-14 21:31:32 MainThread @train.py:101] Episode 340, Reward Sum 31.0.
[05-14 21:31:32 MainThread @train.py:101] Episode 350, Reward Sum 19.0.
[05-14 21:31:32 MainThread @train.py:101] Episode 360, Reward Sum 33.0.
[05-14 21:31:32 MainThread @train.py:101] Episode 370, Reward Sum 46.0.
[05-14 21:31:32 MainThread @train.py:101] Episode 380, Reward Sum 53.0.
[05-14 21:31:32 MainThread @train.py:101] Episode 390, Reward Sum 56.0.
[05-14 21:31:32 MainThread @train.py:111] Test reward: 42.2
[05-14 21:31:32 MainThread @train.py:101] Episode 400, Reward Sum 47.0.
[05-14 21:31:32 MainThread @train.py:101] Episode 410, Reward Sum 16.0.
[05-14 21:31:33 MainThread @train.py:101] Episode 420, Reward Sum 50.0.
[05-14 21:31:33 MainThread @train.py:101] Episode 430, Reward Sum 44.0.
[05-14 21:31:33 MainThread @train.py:101] Episode 440, Reward Sum 34.0.
[05-14 21:31:33 MainThread @train.py:101] Episode 450, Reward Sum 36.0.
[05-14 21:31:33 MainThread @train.py:101] Episode 460, Reward Sum 7.0.
[05-14 21:31:33 MainThread @train.py:101] Episode 470, Reward Sum 14.0.
[05-14 21:31:33 MainThread @train.py:101] Episode 480, Reward Sum 27.0.
[05-14 21:31:34 MainThread @train.py:101] Episode 490, Reward Sum 35.0.
[05-14 21:31:34 MainThread @train.py:111] Test reward: 56.4
[05-14 21:31:34 MainThread @train.py:101] Episode 500, Reward Sum 33.0.
[05-14 21:31:34 MainThread @train.py:101] Episode 510, Reward Sum 40.0.
[05-14 21:31:34 MainThread @train.py:101] Episode 520, Reward Sum 53.0.
[05-14 21:31:34 MainThread @train.py:101] Episode 530, Reward Sum 27.0.
[05-14 21:31:35 MainThread @train.py:101] Episode 540, Reward Sum 42.0.
[05-14 21:31:35 MainThread @train.py:101] Episode 550, Reward Sum 3.0.
[05-14 21:31:35 MainThread @train.py:101] Episode 560, Reward Sum 84.0.
[05-14 21:31:35 MainThread @train.py:101] Episode 570, Reward Sum 76.0.
[05-14 21:31:35 MainThread @train.py:101] Episode 580, Reward Sum 132.0.
[05-14 21:31:36 MainThread @train.py:101] Episode 590, Reward Sum 80.0.
[05-14 21:31:36 MainThread @train.py:111] Test reward: 170.2
[05-14 21:31:36 MainThread @train.py:101] Episode 600, Reward Sum 47.0.
[05-14 21:31:36 MainThread @train.py:101] Episode 610, Reward Sum 72.0.
[05-14 21:31:37 MainThread @train.py:101] Episode 620, Reward Sum 4.0.
[05-14 21:31:37 MainThread @train.py:101] Episode 630, Reward Sum 86.0.
[05-14 21:31:37 MainThread @train.py:101] Episode 640, Reward Sum 32.0.
[05-14 21:31:37 MainThread @train.py:101] Episode 650, Reward Sum 38.0.
[05-14 21:31:38 MainThread @train.py:101] Episode 660, Reward Sum 55.0.
[05-14 21:31:38 MainThread @train.py:101] Episode 670, Reward Sum 49.0.
[05-14 21:31:38 MainThread @train.py:101] Episode 680, Reward Sum 60.0.
[05-14 21:31:38 MainThread @train.py:101] Episode 690, Reward Sum 8.0.
[05-14 21:31:39 MainThread @train.py:111] Test reward: 200.0
[05-14 21:31:39 MainThread @train.py:101] Episode 700, Reward Sum 95.0.
[05-14 21:31:39 MainThread @train.py:101] Episode 710, Reward Sum 149.0.
[05-14 21:31:40 MainThread @train.py:101] Episode 720, Reward Sum 28.0.
[05-14 21:31:40 MainThread @train.py:101] Episode 730, Reward Sum 68.0.
[05-14 21:31:40 MainThread @train.py:101] Episode 740, Reward Sum 29.0.
[05-14 21:31:41 MainThread @train.py:101] Episode 750, Reward Sum 57.0.
[05-14 21:31:41 MainThread @train.py:101] Episode 760, Reward Sum 17.0.
[05-14 21:31:41 MainThread @train.py:101] Episode 770, Reward Sum 107.0.
[05-14 21:31:42 MainThread @train.py:101] Episode 780, Reward Sum 58.0.
[05-14 21:31:42 MainThread @train.py:101] Episode 790, Reward Sum 144.0.
[05-14 21:31:43 MainThread @train.py:111] Test reward: 300.0
[05-14 21:31:43 MainThread @train.py:101] Episode 800, Reward Sum 83.0.
[05-14 21:31:43 MainThread @train.py:101] Episode 810, Reward Sum 28.0.
[05-14 21:31:43 MainThread @train.py:101] Episode 820, Reward Sum 84.0.
[05-14 21:31:44 MainThread @train.py:101] Episode 830, Reward Sum 162.0.
[05-14 21:31:44 MainThread @train.py:101] Episode 840, Reward Sum 142.0.
[05-14 21:31:44 MainThread @train.py:101] Episode 850, Reward Sum 112.0.
[05-14 21:31:45 MainThread @train.py:101] Episode 860, Reward Sum 48.0.
[05-14 21:31:45 MainThread @train.py:101] Episode 870, Reward Sum 179.0.
[05-14 21:31:45 MainThread @train.py:101] Episode 880, Reward Sum 126.0.
[05-14 21:31:46 MainThread @train.py:101] Episode 890, Reward Sum 71.0.
[05-14 21:31:46 MainThread @train.py:111] Test reward: 200.0
[05-14 21:31:46 MainThread @train.py:101] Episode 900, Reward Sum 139.0.
[05-14 21:31:47 MainThread @train.py:101] Episode 910, Reward Sum 67.0.
[05-14 21:31:47 MainThread @train.py:101] Episode 920, Reward Sum 166.0.
[05-14 21:31:48 MainThread @train.py:101] Episode 930, Reward Sum 262.0.
[05-14 21:31:48 MainThread @train.py:101] Episode 940, Reward Sum 38.0.
[05-14 21:31:49 MainThread @train.py:101] Episode 950, Reward Sum 276.0.
[05-14 21:31:49 MainThread @train.py:101] Episode 960, Reward Sum 78.0.
[05-14 21:31:50 MainThread @train.py:101] Episode 970, Reward Sum 218.0.
[05-14 21:31:50 MainThread @train.py:101] Episode 980, Reward Sum 128.0.
[05-14 21:31:51 MainThread @train.py:101] Episode 990, Reward Sum 226.0.
[05-14 21:31:52 MainThread @train.py:111] Test reward: 500.0


Troubleshooting

Error: Box2D is not installed

gymnasium.error.DependencyNotInstalled: Box2D is not installed, you can install it by run `pip install swig` followed by `pip install "gymnasium[box2d]"`

Run:

pip install swig
pip install gymnasium[box2d]
# or
# uv pip install swig
# uv pip install gymnasium[box2d]
