Game Level Design with Reinforcement Learning

This article explores how deep learning and reinforcement learning can be combined to design game levels, summarizing recent research and developments in this area.


Procedural Content Generation (or PCG) is a method of using a computer algorithm to generate large amounts of content within a game, such as huge open-world environments, game levels, and many other assets that go into creating a game.


Today, I want to share with you a paper titled "PCGRL: Procedural Content Generation via Reinforcement Learning," which shows how we can use self-learning AI algorithms for the procedural generation of 2D game environments.


Usually, we are familiar with the use of the AI technique called Reinforcement Learning to train AI agents to play games, but this paper trains an AI agent to design levels of that game. According to the authors, this is the first time RL has been used for the task of PCG.


Sokoban Game Environment

Let’s look at the central idea of the paper. Consider a simple game environment like in the game called Sokoban.


[Figure: Sokoban game level.]

We can look at this map or game level as a 2D array of integers that represents the state of the game. This state is observed by the Reinforcement Learning agent, which can edit the game environment. By taking actions like adding or removing certain elements of the game (like a solid block, crate, player, or target), it can edit this environment to give us a new state.

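The state-and-edit idea described above can be sketched in a few lines of code. This is a minimal illustration, not the paper's implementation; the integer tile codes and the 5x5 grid are hypothetical choices made here for clarity.

```python
import numpy as np

# Hypothetical integer codes for the tile types mentioned above.
EMPTY, WALL, CRATE, TARGET, PLAYER = range(5)

# A tiny Sokoban-like level as a 2D array of integers (the RL state).
state = np.full((5, 5), EMPTY, dtype=np.int64)
state[0, :] = WALL      # a solid top border
state[2, 2] = CRATE
state[3, 3] = TARGET
state[1, 1] = PLAYER

def apply_edit(state, row, col, tile):
    """One editing action: place (or remove) a tile, yielding a new state."""
    new_state = state.copy()
    new_state[row, col] = tile
    return new_state

# The agent takes an action: add a solid wall at (2, 3).
next_state = apply_edit(state, 2, 3, WALL)
```

Each action produces a new state, which the agent observes before choosing its next edit; setting a cell back to `EMPTY` is how an element would be removed.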

[Figure: The PCGRL Framework.]

Now, in order to ensure that the environment generated by this agent is of good quality, we need some sort of feedback mechanism. This mechanism is constructed in this paper by comparing the previous state and the updated state using a hand-crafted reward calculator for this particular game. By adding appropriate rewards for rules that make the level more fun to play, we can train the RL agent to generate certain types of maps or levels. The biggest advantage of this framework is that after training is complete, we can generate practically infinite unique game levels at the click of a button, without having to design anything manually.

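The feedback mechanism above boils down to scoring the level before and after an edit and rewarding the difference. Here is a hedged sketch of that idea; the `quality` heuristic below (one player, matched crate/target counts) is an illustrative stand-in, not the paper's actual reward calculator.

```python
import numpy as np

EMPTY, WALL, CRATE, TARGET, PLAYER = range(5)

def quality(state):
    """Toy hand-crafted level-quality score."""
    score = 0.0
    if np.count_nonzero(state == PLAYER) == 1:
        score += 1.0  # exactly one player on the map
    n_crates = np.count_nonzero(state == CRATE)
    n_targets = np.count_nonzero(state == TARGET)
    score -= abs(n_crates - n_targets)  # penalize unsolvable mismatches
    return score

def reward(prev_state, new_state):
    """Feedback signal: did the edit make the level better or worse?"""
    return quality(new_state) - quality(prev_state)
```

Training then maximizes the cumulative reward over a sequence of edits, so the agent learns to make changes that push the level toward higher-quality configurations.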

[Figure: The three proposed methods for traversing and editing the game environment by the RL agent.]

The paper also contains comparisons between different approaches that the RL agent can use to traverse and edit the environment. If you’d like to get more details on the performance comparison between these methods, here is the full text of the research results.

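If I recall the paper correctly, the three approaches ("narrow", "turtle", and "wide" representations) differ mainly in how much of the edit the agent chooses per step. A rough sketch of how their action spaces scale, assuming an H x W grid with T tile types (the exact action spaces in the paper may include extras such as a no-op):

```python
H, W, T = 5, 5, 5  # illustrative grid size and tile count

# narrow: the environment picks the location; the agent only picks the tile.
narrow_actions = T

# turtle: the agent either moves one step (4 directions) or
# changes the tile under its current position.
turtle_actions = 4 + T

# wide: the agent picks both the location and the tile in one action.
wide_actions = H * W * T
```

The trade-off is between a small, easy-to-learn action space (narrow) and full per-step control over the whole map (wide).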

[Figure: Different games tested for level design via the trained RL agent. (Source)]

General Research Direction

While the games that were used in this paper’s experiments are simple 2D games, this research direction excites me because we can build upon this work to create large open-world 3D game environments.



This has the potential to change the online multiplayer gaming experience. Imagine if, at the start of every multiplayer open-world game, we could generate a new and unique tactical map every single time. This means we would not need to wait for the game developers to release new maps every few months or years; we could do so right within the game with AI, which is really cool!


Thank you for reading. If you liked this article, you may follow more of my work on Medium, GitHub, or subscribe to my YouTube channel.


Translated from: https://medium.com/deepgamingai/game-level-design-with-reinforcement-learning-52b02bb94954
