The Obstacle Tower Challenge is Live!


Three weeks ago we announced the release of the Obstacle Tower Environment, a new benchmark for Artificial Intelligence (AI) research built using our ML-Agents toolkit. One week ago we followed that up with the launch of the Obstacle Tower Challenge, a contest that offers researchers and developers the chance to compete to train the best-performing agents on this new task. The reception so far from the community has been great. I wanted to take the time to talk a little more about our motivation for the challenge, and what we hope it will promote.

The idea for the Obstacle Tower came from looking at the current field of benchmarks being used in Artificial Intelligence research today. Despite the great theoretical and engineering work being put into developing new algorithms, many researchers were still focused on using decades-old home console games such as Pong, Breakout, or Ms. Pac-Man. Aside from containing crude graphics and gameplay mechanics, these games are also completely deterministic, meaning that a player (or computer) could memorize a series of button presses and even solve them blindfolded. Given these drawbacks, we wanted to start from scratch and build a procedurally generated environment that we believe can be a benchmark that pushes modern AI algorithms to their limits. Specifically, we wanted to focus on AI agents' vision, control, planning, and generalization abilities.
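The contrast between a memorizable, deterministic game and a procedurally generated one can be sketched in a few lines. The generator below is purely illustrative (it is not the actual Obstacle Tower floor generator): the point is that a single fixed seed yields a benchmark an agent can memorize, while drawing a fresh seed per episode forces genuine generalization.

```python
import random

def generate_floor(seed, size=5):
    """Toy procedural floor generator: the seed fully determines the layout,
    in the spirit of Obstacle Tower's per-episode seeds. Illustrative only.
    Each cell is a room ('R'), a hazard ('X'), or empty ('.')."""
    rng = random.Random(seed)
    return [[rng.choice("RX.") for _ in range(size)] for _ in range(size)]

# The same seed always reproduces the same floor (a deterministic benchmark
# an agent could memorize), while different seeds give different floors.
assert generate_floor(42) == generate_floor(42)
assert generate_floor(42) != generate_floor(43)
```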

We believe that the Obstacle Tower has the potential to contribute to research into AI, specifically a sub-field called Deep Reinforcement Learning (Deep RL), which focuses on agents that learn from trial-and-error experience. Our own internal tests have shown that even the current state-of-the-art algorithms in Deep RL are only able to solve, on average, a few test floors of Obstacle Tower. The graph below is taken from our paper, and shows that the top Deep RL algorithms (PPO and Rainbow) are still nowhere near the average human player when it comes to learning to play a deterministic version of the game (No Generalization), let alone a version where things look and play differently than what they were trained on (Weak and Strong Generalization).
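A generalization evaluation of this kind can be set up by splitting environment seeds into training and held-out sets. The sketch below is a minimal illustration of that idea; the seed ranges, the `evaluate` helper, and the dummy policy are all hypothetical, not the actual configuration used in the paper.

```python
import random

# Illustrative seed split: agents train on TRAIN_SEEDS, then are scored on
# seeds they have never seen, mirroring the no/weak/strong generalization
# conditions described above. Ranges here are made up for the sketch.
TRAIN_SEEDS = list(range(0, 100))     # seen during training
TEST_SEEDS = list(range(100, 105))    # held out for evaluation

def evaluate(policy, seeds, episodes_per_seed=5):
    """Average floors reached by `policy` across the given seeds."""
    scores = []
    for seed in seeds:
        for _ in range(episodes_per_seed):
            scores.append(policy(seed))
    return sum(scores) / len(scores)

# A dummy "policy" that reaches a seed-dependent number of floors (0-5),
# standing in for a trained agent purely for illustration.
dummy_policy = lambda seed: random.Random(seed).randint(0, 5)

no_generalization = evaluate(dummy_policy, TRAIN_SEEDS[:1])  # one fixed seed
weak_generalization = evaluate(dummy_policy, TEST_SEEDS)     # unseen seeds
assert 0.0 <= no_generalization <= 5.0
assert 0.0 <= weak_generalization <= 5.0
```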

At Unity, we think that the research being conducted on AI has benefits not only for the broader technology community but also for game developers and players. Smarter AI means better NPCs, more thorough playtesting, and ultimately more engaging player experiences. That is why we decided to launch the Obstacle Tower Challenge: to invite the best minds in Deep RL research and beyond to take on the tower, and to have those insights contribute to the wider world.

To help us evaluate entries, we have teamed up with AICrowd, a platform for hosting Machine Learning challenges. The challenge is now underway, with a Round 1 submission deadline of March 31st. Participants will submit trained agents, which will be evaluated on a special test set of Obstacle Tower levels. To enter the contest, learn more about the process, and get started, go here.

We are happy to share that Google Cloud Platform (GCP) is a prize sponsor of the contest, and on top of the cash prizes and travel grants provided by Unity, winning participants will also receive GCP credits. These prizes are collectively valued at over $100K! Using GCP, it is possible to train agents on the cloud remotely rather than using desktop resources. This can both speed up training time, as well as make it simpler to run multiple concurrent experiments. Users who sign up for a new GCP account get $300 in free credit. On top of this, the first 50 participants who pass Round 1 of the Obstacle Tower Challenge will receive an additional $1100 in credits. The top three winners from Round 2 will receive an additional $5000 in credits.

For those new to training agents, or those wanting an easy way to get started, we have written a guide on training an agent on Google Cloud Platform. The guide walks through setting up a cloud computing instance and using a state-of-the-art algorithm provided by Google Dopamine to train an agent to progress in the Obstacle Tower. You can read the guide here.

If you have any questions about the contest, including support on submitting entries, please see the discussion forum here. For general issues or discussion of the environment itself, see our GitHub repo here. To learn more about the environment, read our research paper. We look forward to seeing the creative solutions the community comes up with to the challenge!

Translated from: https://blogs.unity3d.com/2019/02/18/the-obstacle-tower-challenge-is-live/
