Why unwrap an OpenAI Gym?

Question:

I'm trying to gain some insight into reinforcement learning, using OpenAI Gym as a learning environment. I do this by reading the book Hands-On Reinforcement Learning with Python. The book provides some code, but it often doesn't work as written because I have to unwrap the environment first, as shown in: "openai gym env.P, AttributeError 'TimeLimit' object has no attribute 'P'".

However, I personally am still interested in the WHY of this unwrapping. Why do you need to unwrap? What does this do exactly? And why isn't it coded like that in the book? Is it outdated software as Giuliov assumed?

Thanks in advance.

Answer:

OpenAI Gym offers many different environments, each with its own set of parameters and methods. Nevertheless, they are generally wrapped by a single class called Env (like an interface in true OOP languages). This class exposes the most common and essential methods of any environment, such as step, reset, and seed. Having this "interface" class is great because it allows your code to be environment-agnostic. It also makes things easier if you want to test a single agent on different environments.
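To illustrate the environment-agnostic idea, here is a minimal sketch that does not use Gym itself. `CountdownEnv`, `CoinFlipEnv`, and `run_episode` are hypothetical names invented for this example, and `step` is assumed to return the older four-value tuple `(observation, reward, done, info)`.

```python
import random


class CountdownEnv:
    """Hypothetical toy environment: the episode ends after `start` steps."""

    def __init__(self, start=3):
        self.start = start
        self.remaining = start

    def seed(self, seed=None):
        pass  # deterministic, nothing to seed

    def reset(self):
        self.remaining = self.start
        return self.remaining

    def step(self, action):
        self.remaining -= 1
        done = self.remaining == 0
        return self.remaining, 1.0, done, {}


class CoinFlipEnv:
    """Hypothetical toy environment: guess a coin flip, one step per episode."""

    def __init__(self):
        self._rng = random.Random()

    def seed(self, seed=None):
        self._rng = random.Random(seed)

    def reset(self):
        return 0

    def step(self, action):
        reward = 1.0 if action == self._rng.randint(0, 1) else 0.0
        return 0, reward, True, {}


def run_episode(env, policy):
    """Run one episode using only the shared reset/step methods."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total


# The same loop works for both environments without knowing their internals.
for env in (CountdownEnv(start=3), CoinFlipEnv()):
    env.seed(0)
    print(type(env).__name__, run_episode(env, policy=lambda obs: 0))
```

Because `run_episode` touches only `reset` and `step`, swapping one environment for another requires no change to the agent code.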

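However, if you want to access the behind-the-scenes dynamics of a specific environment, you use the unwrapped property. In the case from the question, the object you hold is a TimeLimit wrapper, which has no P attribute of its own; env.unwrapped returns the underlying environment, which does.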
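The effect can be sketched with simplified stand-in classes. This is not Gym's actual implementation: `Env`, `TimeLimit`, and `ToyLakeEnv` below are hypothetical, stripped-down versions written only to show why the wrapper lacks `P` and what `unwrapped` hands back.

```python
class Env:
    """Simplified stand-in for the common environment interface."""

    @property
    def unwrapped(self):
        # A bare environment is already unwrapped.
        return self


class ToyLakeEnv(Env):
    """Hypothetical environment that exposes its transition table as P."""

    def __init__(self):
        # state -> action -> [(probability, next_state, reward, done)]
        self.P = {0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 1.0, True)]}}

    def reset(self):
        return 0

    def step(self, action):
        _, next_state, reward, done = self.P[0][action][0]
        return next_state, reward, done, {}


class TimeLimit(Env):
    """Simplified wrapper: forwards step/reset and caps the episode length."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self._elapsed = 0

    def reset(self):
        self._elapsed = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._elapsed += 1
        if self._elapsed >= self.max_episode_steps:
            done = True
        return obs, reward, done, info

    @property
    def unwrapped(self):
        # Peel off every wrapper layer until the bare environment is reached.
        return self.env.unwrapped


env = TimeLimit(ToyLakeEnv(), max_episode_steps=100)

try:
    env.P  # the wrapper only knows the common interface
except AttributeError as err:
    print(err)

print(env.unwrapped.P[0][1])  # the inner environment still has P
```

The wrapper adds behaviour (here, a step limit) on top of the common interface, so environment-specific attributes such as `P` live one layer down, which is why book code written against the bare environment fails on the wrapped one.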
