Training intelligent adversaries using self-play with ML-Agents

In the latest release of the ML-Agents Toolkit (v0.14), we have added a self-play feature that provides the capability to train competitive agents in adversarial games (as in zero-sum games, where one agent’s gain is exactly the other agent’s loss). In this blog post, we provide an overview of self-play and demonstrate how it enables stable and effective training on the Soccer demo environment in the ML-Agents Toolkit.
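
For concreteness, the zero-sum property in the parenthetical above can be written as a constraint on the two agents' per-step rewards. The notation here (agents A and B with rewards r_A and r_B) is our own shorthand, not something defined by the toolkit:

    r_A(s, a_A, a_B) = -r_B(s, a_A, a_B),   so   r_A + r_B = 0 at every step,

and any return gained by one agent is exactly the other agent's loss.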

The Tennis and Soccer example environments of the Unity ML-Agents Toolkit pit agents against one another as adversaries. Training agents in this type of adversarial scenario can be quite challenging. In fact, in previous releases of the ML-Agents Toolkit, reliably training agents in these environments required significant reward engineering. In version 0.14, we have enabled users to train agents in games via reinforcement learning (RL) from self-play, a mechanism fundamental to a number of the most high-profile results in RL, such as OpenAI Five and DeepMind’s AlphaStar. Self-play uses the agent’s current and past ‘selves’ as opponents. This provides a naturally improving adversary against which our agent can gradually improve using traditional RL algorithms. The fully trained agent can be used as competition for advanced human players.
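
To make the mechanism concrete, the sketch below shows the core idea of self-play opponent sampling: keep a window of frozen snapshots of the learning policy, and at the start of each episode pit the learner against either its current self or a randomly chosen past self. This is an illustrative Python sketch, not the ML-Agents implementation; the class, parameter names (window, play_against_current_ratio, save_steps), and the helpers in the commented usage loop (play_episode, learner.update) are hypothetical.

    import random
    import copy

    class SelfPlayOpponentPool:
        """Minimal sketch of the opponent sampling behind self-play.

        A fixed-size window of past policy snapshots serves as the pool of
        opponents; with some probability the learner plays its current self.
        """

        def __init__(self, window=10, play_against_current_ratio=0.5):
            self.window = window                     # max number of snapshots kept
            self.ratio = play_against_current_ratio  # chance of facing the live policy
            self.snapshots = []                      # frozen past "selves"

        def save_snapshot(self, policy):
            """Freeze a copy of the current policy and add it to the pool."""
            self.snapshots.append(copy.deepcopy(policy))
            if len(self.snapshots) > self.window:
                self.snapshots.pop(0)                # drop the oldest snapshot

        def sample_opponent(self, current_policy):
            """Pick an opponent: the current self or a random past self."""
            if not self.snapshots or random.random() < self.ratio:
                return current_policy
            return random.choice(self.snapshots)

    # Illustrative training loop (environment and learner are placeholders):
    #
    # pool = SelfPlayOpponentPool(window=10, play_against_current_ratio=0.5)
    # for step in range(total_steps):
    #     opponent = pool.sample_opponent(learner.policy)
    #     trajectory = play_episode(env, learner.policy, opponent)  # hypothetical helper
    #     learner.update(trajectory)                                # any standard RL update
    #     if step % save_steps == 0:
    #         pool.save_snapshot(learner.policy)

Because the pool of opponents improves whenever a new snapshot is saved, the learner always faces adversaries of roughly its own strength, which is what lets standard RL updates make gradual, stable progress in these adversarial environments.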

Self-play provides a learning environment analogous to how humans structure competition. For example, a human learning to play tennis would train against opponents of similar skill level because an opponent that is too strong or too weak is not as conducive to learning the game. From the standpoint of improving one’s skills, it would be far more valuable for a beginner-level tennis player to compete against other beginners than, say, against a newborn child or Novak Djokovic. The former couldn’t return the ball, and the latter wouldn’t serve them a ball they could return. When the beginner has achieved sufficient strength, they move on to the next tier of tournament play to compete with stronger opponents.  

In this blog post, we give some technical insight into the dynamics of self-play as well as provide an overview of our Tennis and Soccer example environments that have been refactored to showcase self-play.

History of self-play in games

The notion of self-play has a long history in the practice of building artificial agents to solve and compete with humans in games. One of the earliest uses of this technique was Arthur Samuel’s checkers-playing system, developed in the 1950s.
