ML-Agents Toolkit v0.5, new resources for AI researchers available now

We are committed to helping make Unity the go-to platform for Artificial Intelligence (AI) research. In the past few weeks, we’ve seen research groups taking notice, with OpenAI using Unity to help train a robot hand to perform a grasping task, and a group at UC Berkeley using it to test a new Curiosity-based Learning approach. Today we are happy to share the next wave of improvements and resources to help fulfill our mission of supporting the AI research community.

This release includes a new version of the ML-Agents toolkit (v0.5) with more flexible action specification and curricula, a research paper we’ve written on ML-Agents and the Unity platform, a Gym interface for researchers to more easily integrate ML-Agents environments into their training workflows, and a new suite of learning environments which replicate some of the Continuous Control benchmarks used by many Deep Reinforcement Learning researchers.

A research paper on Unity as an AI platform

With the growing adoption of Unity and the ML-Agents toolkit as a research platform, we have received numerous requests for a citable paper outlining the platform. We are happy to finally release a preprint of our paper “Unity: A General Platform for Intelligent Agents,” which is now available on arXiv. In this reference paper, we describe our vision for Unity as a simulation platform that builds on and extends the capabilities of other similar platforms, and we outline the ML-Agents toolkit as a research tool, discussing both its fundamental design and its additional features. We also provide benchmark results on our example environments using the Proximal Policy Optimization algorithm, along with a human benchmark collected from a small cohort of Unity employees for comparison. With these baselines in place, we look forward to seeing research groups outperform our results and achieve “superhuman” performance on as many of the example environments as possible.

Gym interface support

When we first released the ML-Agents toolkit, we provided a custom Python API for interacting with learning environments. We did this because we wanted to provide a powerful and flexible way of interacting with our environments which wasn’t limited by pre-existing conventions. This allowed us to enable scenarios involving multi-agent and multi-brain learning with complex mixed observation spaces. Many in the research community have asked about the availability of a gym wrapper for these environments as well. For those unfamiliar, gym is a standardized and popular means of interacting with simulation environments. As such, we are happy to share that we’ve created a gym interface which can be used to interact with Unity environments. If you are a researcher who has built an experimentation pipeline around using gyms, this means you will be able to easily swap out other gym environments for Unity ones. To learn more about the gym interface, see our package page.

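To make this concrete, here is a minimal sketch of driving a Unity environment through the gym wrapper. It assumes you have a local environment build and the gym-unity package installed; the build path is a placeholder, and the `UnityEnv` constructor arguments reflect the package as released alongside v0.5, so check the package page for the exact signature.

```python
# Minimal sketch: treat a compiled Unity environment as a gym environment.
# "path/to/UnityEnvBuild" is a placeholder for your own environment build.
from gym_unity.envs import UnityEnv

env = UnityEnv("path/to/UnityEnvBuild", worker_id=0, use_visual=False)

obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()          # stand-in for your policy
    obs, reward, done, info = env.step(action)  # standard gym step tuple
    total_reward += reward

print("Episode reward:", total_reward)
env.close()
```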

Introducing Marathon Environments

For the past year, one of our community members, Joe Booth, has been working on re-implementing the classic set of Continuous Control benchmarks typically seen in Deep Reinforcement Learning literature as Unity environments using the ML-Agents toolkit. The environments include the Walker, Hopper, Humanoid, and Ant based on the environments available in the DeepMind Control Suite and OpenAI Gym. Collectively we are calling these the Marathon Environments since in all cases the goal is for the agents to learn to run forward as quickly and consistently as possible. We are providing these for the research community to use as an easy way to get started with benchmarking algorithms against these classic tasks.

These environments were made possible by the contributions of Joe Booth, a games-industry veteran turned Machine Learning researcher. Click here to download and get started with the environments yourself, and read below to hear from Joe himself on how he made this possible.

In Joe’s own words…

“I wanted to see if the research on continuous control and locomotion from OpenAI, DeepMind and others would transfer to a modern day game engine such as Unity and PhysX. Imagine a future where a game designer could input a YouTube URL of a desired animation and the AI would mimic this while dynamically reacting to a changing environment – super cool! By creating a framework to reproduce these benchmarks within Unity one is able to take incremental steps.

When implementing a paper or novel idea, one can first test on a simple model such as Hopper and have confidence that results will scale up to more complex models such as Walker or Humanoid. You can see how I use this incremental approach in my research into mastering dynamic environments, controllers, and style transfer.

I’m excited to see how others will use Marathon Environments. With the addition of Gym we have the opportunity to bring many new bleeding edge algorithms to ML-Agents such as HER or MAML and I would gladly support or partner in these efforts.”

Additional new features

Expanded Discrete Action Space – We have changed the way discrete action spaces work to allow agents using this space type to make multiple action selections at once. While previous versions of ML-Agents only allowed agents to select a single discrete action at a time, v0.5 allows you to create action branches for your agent. Each branch contains its own fixed number of possible actions to select from. At runtime, the agent chooses one action for each branch whenever a decision is requested. Concretely, this means that an agent can now both move in a chosen direction and jump, as is now the case in WallJump. We have also modified the BananaCollector environment, making it possible for the agents in that environment to move, turn, and potentially fire their laser within a single decision.

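As an illustration of how branched actions surface on the Python side, the hedged sketch below steps an environment whose agents use two hypothetical branches, a five-option movement branch and a two-option jump branch. The build path is a placeholder and the branch layout is illustrative rather than the shipped WallJump configuration; the `UnityEnvironment` calls follow the mlagents Python package of this release.

```python
# Hedged sketch: supply one action index per branch, per agent.
import numpy as np
from mlagents.envs import UnityEnvironment

env = UnityEnvironment(file_name="path/to/EnvironmentBuild")  # placeholder path
brain_name = env.external_brain_names[0]        # assumes a single learning Brain
info = env.reset(train_mode=True)[brain_name]

# Two branches: movement (say, 5 options) and jump (2 options).
# Every agent picks index 3 on the movement branch and index 1 (jump) on the other.
n_agents = len(info.agents)
actions = np.tile([3, 1], (n_agents, 1))        # shape: (n_agents, n_branches)

info = env.step({brain_name: actions})[brain_name]
print(info.rewards)
env.close()
```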

Discrete Action Masking – Under certain circumstances, an agent should be disallowed from performing a specific action. For example, if an agent is in the air, it does not make sense for it to jump. In v0.5, you can now tell your agents which actions are impossible at their next decision. When collecting observations, you can optionally specify one or more impossible actions for each action branch of the agent, and the agent will not attempt any of those actions at the next decision step. This makes it easy to prevent agents from taking impossible actions without adding extra code to the action methods.

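Conceptually, masking an action amounts to forcing its selection probability to zero before an action is sampled. The snippet below is a self-contained illustration of that idea only; it is not the toolkit’s internal implementation, and in the toolkit the masks themselves are declared on the agent side when observations are collected.

```python
# Illustrative only: sample a discrete action with disallowed actions masked out.
import numpy as np

def sample_masked(logits, mask):
    """logits: scores per action; mask: 1 = allowed, 0 = disallowed."""
    allowed = np.asarray(mask, dtype=bool)
    masked_logits = np.where(allowed, logits, -np.inf)   # kill disallowed actions
    probs = np.exp(masked_logits - masked_logits.max())  # stable softmax
    probs /= probs.sum()
    return np.random.choice(len(logits), p=probs)

# Example: a two-action "jump" branch where jumping (index 1) is impossible
# because the agent is already in the air.
logits = np.array([0.2, 1.5])
print(sample_masked(logits, mask=[1, 0]))  # always returns action 0
```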

We modified the GridWorld environment to mask the actions that would involve the agent attempting to walk into a wall. By doing this, the agent does not lose time when exploring the grid and learns a lot faster. See the figure below for a comparison of the learning process with and without action masking.

Meta-Curriculum – Curriculum learning is a great feature that allows you to create environments that get progressively harder as the agent improves. We introduced this feature in v0.2 and have improved it for v0.5. You can now use meta-curriculums, which let you define curriculum scenarios in environments that use multiple Brains by specifying a curriculum for each Brain independently. This makes it possible to build a learning environment where multiple kinds of agents each learn at their own pace. We have included curriculum examples for the WallJump environment.

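To make the setup concrete, the sketch below writes one curriculum file per Brain into a shared folder, mirroring the JSON layout the curriculum trainer reads (measure, thresholds, per-lesson parameters). The Brain names, parameter keys, and values are illustrative rather than the exact shipped WallJump configuration, so compare against the example curricula included with the release.

```python
# Hedged sketch: one curriculum JSON file per Brain makes up a meta-curriculum.
import json
import os

curricula = {
    "BigWallBrain": {                      # illustrative Brain name
        "measure": "progress",             # advance lessons by training progress
        "thresholds": [0.1, 0.3, 0.5],     # progress required to reach each next lesson
        "min_lesson_length": 100,
        "signal_smoothing": True,
        "parameters": {
            "big_wall_height": [2.0, 4.0, 6.0, 8.0],   # one value per lesson
        },
    },
    "SmallWallBrain": {
        "measure": "progress",
        "thresholds": [0.1, 0.3, 0.5],
        "min_lesson_length": 100,
        "signal_smoothing": True,
        "parameters": {
            "small_wall_height": [1.5, 2.0, 2.5, 4.0],
        },
    },
}

os.makedirs("curricula/wall_jump", exist_ok=True)
for brain_name, curriculum in curricula.items():
    with open(os.path.join("curricula/wall_jump", brain_name + ".json"), "w") as f:
        json.dump(curriculum, f, indent=2)
```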

Conclusion

We look forward to seeing how the research community takes advantage of this collection of new resources and improved features. Going forward, we plan to continue to support both the research and game developer communities with our work and releases. If you have comments, feedback, or questions, feel free to reach out to us on our GitHub issues page, or email us directly at ml-agents@unity3d.com. Happy training!

Source: https://blogs.unity3d.com/2018/09/11/ml-agents-toolkit-v0-5-new-resources-for-ai-researchers-available-now/
