Puppo, The Corgi: Cuteness Overload with the Unity ML-Agents Toolkit

Building a game is a creative process that involves many challenging steps, including defining the game concept and logic, building assets and animations, specifying NPC behaviors, tuning difficulty and balance and, finally, testing the game with real players before launch. We believe machine learning can be used across the entire creative process, and in today’s blog post we will focus on one of these challenges: specifying the behavior of an NPC.

Traditionally, the behavior of an NPC is hard-coded using scripting and behavior trees. These (typically long) lists of rules process information about the surroundings of the NPC (called observations) to dictate its next action. These rules can be time-consuming to write and maintain as the game evolves. Reinforcement learning provides a promising alternative framework for defining the behavior of an NPC. More specifically, instead of defining the observation-to-action mapping by hand, you can simply train your NPC by providing it with rewards when it achieves the desired goal.
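
To make the contrast concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: the `Observation` fields, the action names, and the 0.5 distance threshold do not come from any Unity or ML-Agents API. The first function is the kind of hand-written observation-to-action rule list described above; the second is the only thing a reinforcement learning setup would ask you to write.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    # Hypothetical observation fields, chosen only for this illustration.
    distance_to_stick: float
    holding_stick: bool
    distance_to_owner: float


def scripted_policy(obs: Observation) -> str:
    """Hand-written rules: every situation needs its own branch."""
    if not obs.holding_stick:
        if obs.distance_to_stick > 0.5:
            return "move_to_stick"
        return "pick_up_stick"
    if obs.distance_to_owner > 0.5:
        return "move_to_owner"
    return "drop_stick"


def reward(obs: Observation, action: str) -> float:
    """Reward-based alternative: only the goal is described, not the steps."""
    delivered = (
        obs.holding_stick
        and obs.distance_to_owner <= 0.5
        and action == "drop_stick"
    )
    return 1.0 if delivered else 0.0
```

The scripted version grows with every new situation the NPC can encounter, while the reward function stays the same size as long as the goal itself does not change.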

The good puppy, bad puppy method

Training an NPC using reinforcement learning is quite similar to how we train a puppy to play fetch. We present the puppy with a treat and then throw the stick. At first, the puppy wanders around not sure what to do, until it eventually picks up the stick and brings it back, promptly getting a treat. After a few sessions, the puppy learns that retrieving a stick is the best way to get a treat and continues to do so.

That is precisely how reinforcement learning works in training the behavior of an NPC. We provide our NPC with a reward whenever it completes a task correctly. Through multiple simulations of the game (the equivalent of many fetch sessions), the NPC builds an internal model of what action it needs to perform at each instance to maximize its reward, which results in the ideal, desired behavior. Thus, instead of creating and maintaining low-level actions for each observation of the NPC, we only need to provide a high-level reward when a task is completed correctly and the NPC learns the appropriate low-level behavior.
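
The loop described above can be sketched on a toy problem. The example below is an assumption-laden illustration, not a depiction of how the ML-Agents toolkit trains an NPC: it uses plain tabular Q-learning (picked for brevity) on a made-up one-dimensional "fetch" world where the owner stands at position 0 and the stick lies at position `N - 1`. The world, the function names, and the hyperparameters are all hypothetical. The only reward is given when the stick is brought back to the owner.

```python
import random

N = 6                # hypothetical world size: owner at 0, stick at N - 1
ACTIONS = (-1, +1)   # action 0 = step left, action 1 = step right


def step(pos, has_stick, action):
    """Advance the toy world by one move; reward only on delivery."""
    pos = min(max(pos + ACTIONS[action], 0), N - 1)
    if pos == N - 1:
        has_stick = True          # the stick is picked up automatically
    done = has_stick and pos == 0
    return pos, has_stick, (1.0 if done else 0.0), done


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(p, h): [0.0, 0.0] for p in range(N) for h in (False, True)}
    for _ in range(episodes):          # each episode is one "fetch session"
        pos, has_stick = 0, False
        for _ in range(100):           # cap the length of a session
            state = (pos, has_stick)
            left, right = q[state]
            if rng.random() < epsilon or left == right:
                action = rng.randrange(2)      # explore, or break a tie
            else:
                action = 0 if left > right else 1
            pos, has_stick, reward, done = step(pos, has_stick, action)
            future = 0.0 if done else gamma * max(q[(pos, has_stick)])
            q[state][action] += alpha * (reward + future - q[state][action])
            if done:
                break
    return q


def greedy_rollout(q, max_steps=20):
    """Follow the learned values without exploration; return visited positions."""
    pos, has_stick, path = 0, False, [0]
    for _ in range(max_steps):
        left, right = q[(pos, has_stick)]
        pos, has_stick, _, done = step(pos, has_stick, 0 if left > right else 1)
        path.append(pos)
        if done:
            break
    return path
```

Calling `greedy_rollout(train())` returns the positions the trained agent visits. Note that no rule in the code says "walk to the stick, then walk back": that low-level behavior has to emerge from the single high-level reward.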

Puppo, The Corgi
