How to use a custom OpenAI Gym environment with OpenAI Stable-Baselines RL algorithms?


Background:

I've been trying to use a custom OpenAI Gym environment for a fixed-wing UAV, from GitHub - eivindeb/fixed-wing-gym, by testing it with the OpenAI Stable-Baselines algorithms, but I have been running into issues for several days now. My baseline is the CartPole example "Multiprocessing: Unleashing the Power of Vectorized Environments" from Examples — Stable Baselines 2.10.3a0 documentation, since I need to supply constructor arguments and I am trying to use multiprocessing; I believe this example covers everything I need.


I have modified the baseline example as follows:


import gym
import numpy as np

from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import SubprocVecEnv
from stable_baselines.common import set_global_seeds
from stable_baselines import ACKTR, PPO2
from gym_fixed_wing.fixed_wing import FixedWingAircraft


def make_env(env_id, rank, seed=0):
    """
    Utility function for multiprocessed env.

    :param env_id: (str) the environment ID
    :param rank: (int) index of the subprocess
    :param seed: (int) the initial seed for the RNG
    """

    def _init():
        env = FixedWingAircraft("fixed_wing_config.json")
        #env = gym.make(env_id)
        env.seed(seed + rank)
        return env

    set_global_seeds(seed)
    return _init

if __name__ == '__main__':
    env_id = "fixed_wing"
    #env_id = "CartPole-v1"
    num_cpu = 4  # Number of processes to use
    # Create the vectorized environment
    env = SubprocVecEnv([lambda: FixedWingAircraft for i in range(num_cpu)])
    #env = SubprocVecEnv([make_env(env_id, i) for i in range(num_cpu)])

    model = PPO2(MlpPolicy, env, verbose=1)
    model.learn(total_timesteps=25000)

    obs = env.reset()
    for _ in range(1000):
        action, _states = model.predict(obs)
        obs, rewards, dones, info = env.step(action)
        env.render()

and the error I keep getting is the following:


Traceback (most recent call last):
  File "/home/bonie/PycharmProjects/deepRL_fixedwing/fixed-wing-gym/gym_fixed_wing/ACKTR_fixedwing.py", line 38, in <module>
    model = PPO2(MlpPolicy, env, verbose=1)
  File "/home/bonie/PycharmProjects/deepRL_fixedwing/stable-baselines/stable_baselines/ppo2/ppo2.py", line 104, in __init__
    self.setup_model()
  File "/home/bonie/PycharmProjects/deepRL_fixedwing/stable-baselines/stable_baselines/ppo2/ppo2.py", line 134, in setup_model
    n_batch_step, reuse=False, **self.policy_kwargs)
  File "/home/bonie/PycharmProjects/deepRL_fixedwing/stable-baselines/stable_baselines/common/policies.py", line 660, in __init__
    feature_extraction="mlp", **_kwargs)
  File "/home/bonie/PycharmProjects/deepRL_fixedwing/stable-baselines/stable_baselines/common/policies.py", line 540, in __init__
    scale=(feature_extraction == "cnn"))
  File "/home/bonie/PycharmProjects/deepRL_fixedwing/stable-baselines/stable_baselines/common/policies.py", line 221, in __init__
    scale=scale)
  File "/home/bonie/PycharmProjects/deepRL_fixedwing/stable-baselines/stable_baselines/common/policies.py", line 117, in __init__
    self._obs_ph, self._processed_obs = observation_input(ob_space, n_batch, scale=scale)
  File "/home/bonie/PycharmProjects/deepRL_fixedwing/stable-baselines/stable_baselines/common/input.py", line 51, in observation_input
    type(ob_space).__name__))
NotImplementedError: Error: the model does not support input space of type NoneType

I am not sure what to pass as the env_id, or how the def make_env(env_id, rank, seed=0) factory should be set up. I also suspect that the SubprocVecEnv call that creates the parallel environments is not set up properly.


I am coding in Python 3.6, using the PyCharm IDE, on Ubuntu 18.04.


Any suggestions would really help at this point!


Thank you.

Solution:

You created a custom environment all right, but you didn't register it with the OpenAI Gym interface. That is what the env_id refers to: every environment in Gym can be instantiated through gym.make by its registered name.

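As a minimal sketch of what registration looks like (the id string, entry-point path, and config kwarg below are illustrative assumptions, not taken from the fixed-wing-gym repository), the package's __init__.py would contain something like:

```python
# gym_fixed_wing/__init__.py -- hypothetical registration sketch.
# The id, entry_point, and kwargs below are assumptions for illustration.
from gym.envs.registration import register

register(
    id="FixedWing-v0",  # this string is the env_id you pass to gym.make
    entry_point="gym_fixed_wing.fixed_wing:FixedWingAircraft",
    # Constructor arguments can be baked in here, assuming the class
    # accepts its config file through a keyword argument like this:
    kwargs={"config_path": "fixed_wing_config.json"},
)
```

After this runs (it is executed on `import gym_fixed_wing`), `gym.make("FixedWing-v0")` builds a ready-to-use instance, so the env_id in make_env finally has something to refer to.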

So basically what you need to do is follow the setup instructions here, create the appropriate __init__.py and setup.py scripts, and follow the same file structure.

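A minimal sketch of that layout and setup.py (the package and file names mirror the imports in the question, but are otherwise assumptions):

```python
# setup.py -- minimal packaging sketch; names mirror the question's imports.
# Assumed layout, following the standard Gym custom-environment structure:
#
#   fixed-wing-gym/
#   ├── setup.py
#   └── gym_fixed_wing/
#       ├── __init__.py      # calls gym.envs.registration.register(...)
#       └── fixed_wing.py    # defines the FixedWingAircraft class
from setuptools import setup

setup(
    name="gym_fixed_wing",
    version="0.0.1",
    packages=["gym_fixed_wing"],
    install_requires=["gym"],
)
```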

Finally, install your package locally by running pip install -e . from within your environment directory.

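With the package registered and installed, the make_env factory from the question works as intended, provided each worker callable returns an environment instance. A sketch, using a placeholder id:

```python
# Corrected factory sketch: each callable passed to SubprocVecEnv must
# return an env *instance*.  The question's `lambda: FixedWingAircraft`
# returned the class object itself, which has no observation_space --
# the source of the "input space of type NoneType" error.
import gym

def make_env(env_id, rank, seed=0):
    """Return a thunk that builds and seeds one environment instance."""
    def _init():
        env = gym.make(env_id)      # look up the registered name
        if hasattr(env, "seed"):    # env.seed() exists in gym <= 0.25
            env.seed(seed + rank)
        return env
    return _init

# Hypothetical usage, with "FixedWing-v0" standing in for whatever
# id you registered:
# env = SubprocVecEnv([make_env("FixedWing-v0", i) for i in range(4)])
```

With that vectorized env, PPO2(MlpPolicy, env) sees a proper observation space and the NotImplementedError goes away.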
