The Next Step Towards Artificial General Intelligence — StarCraft II

by Daniel Bourke

I’m working through my own self-created Artificial Intelligence Master’s Degree. The creations which come out of DeepMind fascinate me. When they drop a mixtape with one of the biggest names in gaming in order to push Artificial Intelligence (AI) research forward, I listen to it.

Before we get into the specifics of this collaboration, a quick backstory of the history of AI and gaming.

AlphaGo shocked the world of Go by introducing moves which went against hundreds of years of game-playing strategy while defeating several world champions. Deep Blue did the same for chess in 1997, defeating then-world champion Garry Kasparov.

A computer beat a world champion at chess in 1997, so why did it take until 2016 to conquer the game of Go? And why StarCraft II now?

Let me shed a little light on the situation.

After 4 moves (2 moves for white and 2 moves for black) in chess, the number of possible board combinations is 8,902. In total, there are more possible games of chess than there are atoms in the observable universe. But the total number of sensible moves (such as not needlessly sacrificing a queen to a pawn) in chess is a little lower, on the order of ten duodecillion, or a 1 followed by 40 zeros:

40,000,000,000,000,000,000,000,000,000,000,000,000,000

By the time 1997’s fastest supercomputer had calculated the exact winning set of moves for every possible chess board layout (end game), the sun would have engulfed Earth several times over. Obviously, a brute force approach like this wasn’t feasible.

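To get a feel for how quickly these numbers grow, here is a small sketch that counts legal move sequences from the chess starting position. It assumes the third-party python-chess package is installed (an assumption on my part; it isn't mentioned in the original article), and it counts move sequences rather than distinct board layouts, so it illustrates the explosion rather than reproducing the exact figures above.

```python
# Rough sketch: count legal move sequences from the starting position.
# Requires the third-party python-chess package: pip install python-chess
import chess

def perft(board: chess.Board, depth: int) -> int:
    """Count legal move sequences of length `depth` from `board`."""
    if depth == 0:
        return 1
    total = 0
    for move in list(board.legal_moves):
        board.push(move)
        total += perft(board, depth - 1)
        board.pop()
    return total

board = chess.Board()
for depth in range(1, 5):  # 1 to 4 half-moves (plies)
    print(f"{depth} plies: {perft(board, depth):,} move sequences")
```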

How did Deep Blue do it?

Deep Blue used a selective system which would assess the state of the board before choosing a certain sequence of moves to explore. Moves which didn’t maximize the probability of success were eliminated.

This selection strategy combined with parallel processing allowed Deep Blue to calculate 60 billion possible moves within three minutes, the time allowed for each player’s move in classical chess.

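To make "eliminating moves which don't maximize the probability of success" a little more concrete, here is a minimal sketch of minimax search with alpha-beta pruning, the classic technique this kind of selective search builds on. It is purely illustrative and not Deep Blue's actual implementation; `legal_moves`, `apply_move` and `evaluate` are hypothetical placeholders you would supply for a real game engine.

```python
# Minimal sketch of selective game-tree search: minimax with alpha-beta
# pruning. Branches the opponent would never allow are cut off ("pruned")
# instead of being searched further.
def alphabeta(state, depth, alpha, beta, maximizing,
              legal_moves, apply_move, evaluate):
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return evaluate(state)            # how good does this position look?
    if maximizing:
        best = float("-inf")
        for move in moves:
            child = apply_move(state, move)
            best = max(best, alphabeta(child, depth - 1, alpha, beta, False,
                                       legal_moves, apply_move, evaluate))
            alpha = max(alpha, best)
            if alpha >= beta:             # no need to explore this branch further
                break
        return best
    best = float("inf")
    for move in moves:
        child = apply_move(state, move)
        best = min(best, alphabeta(child, depth - 1, alpha, beta, True,
                                   legal_moves, apply_move, evaluate))
        beta = min(beta, best)
        if beta <= alpha:
            break
    return best
```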

This kind of power led Kasparov to accuse IBM of cheating after his dethroning.

Why did Go take almost another two decades to conquer?

Setting the rules aside for a moment, let’s compare the two game boards. As you can see, the chess board looks fancy with its colored squares, but the Go board has more than five times as many squares.

Remember how there were 8,902 possible moves after the first four moves in chess? Go has 46,655,640 possible moves after the first three moves. The number of legal Go positions on a 19x19 board has been calculated to be:

208,168,199,381,979,984,699,478,633,344,862,770,286,522,453,884,530,548,425,639,456,820,927,419,612,738,015,378,525,648,451,698,519,643,907,259,916,015,628,128,546,089,888,314,427,129,715,319,317,557,736,620,397,247,064,840,935.
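
As a quick check on the 46,655,640 figure quoted above, here is the back-of-the-envelope arithmetic: an empty 19x19 board has 361 intersections, and (ignoring captures) each of the first three stones lands on a previously empty point.

```python
# Back-of-the-envelope check of the opening branching factor quoted above.
go_points = 19 * 19                                       # 361 intersections
three_move_openings = go_points * (go_points - 1) * (go_points - 2)
print(f"{three_move_openings:,}")                         # 46,655,640
```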

Okay, so these are really big numbers. But computing power has increased a bunch since 1997, so Go must be easy to take on.

Not entirely.

Go becomes more complex when you consider that the goal is to incrementally build influence across the board and capture an undefined amount of territory, rather than to capture another player’s king.

Even with the full power of Moore’s law, brute force wasn’t an option for conquering Go.

How did AlphaGo do it?

A combination of three techniques: advanced tree search, deep neural networks and reinforcement learning.

Tree search is a popular technique used in AI to find the optimal path to a goal. Imagine you’re at the top of a Christmas tree and your goal is to find a blue ornament a few branches down; however, you have no idea which branch it’s on. In order to find the ornament, you have to search the branches of the tree.

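Here is a minimal sketch of that idea as code: a depth-first search over a toy tree of branches, stopping when the blue ornament is found. The tree itself is a made-up example for illustration, not a real game tree.

```python
# A toy "Christmas tree" where each branch leads to more branches or ornaments.
tree = {
    "top": ["branch_a", "branch_b"],
    "branch_a": ["branch_c", "red ornament"],
    "branch_b": ["blue ornament", "branch_d"],
    "branch_c": [],
    "branch_d": [],
}

def find(node, goal, path=()):
    """Return the path from `node` to `goal`, or None if it isn't below this branch."""
    path = path + (node,)
    if node == goal:
        return path
    for child in tree.get(node, []):
        result = find(child, goal, path)
        if result is not None:
            return result
    return None

print(find("top", "blue ornament"))   # ('top', 'branch_b', 'blue ornament')
```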

Deep neural networks involve taking a large input data source and performing several mathematical transformations on it. This results in an output data source which is smaller but still within the same probability distribution as the input data.

For example, say you have 1 million examples of how you went about finding the blue ornament in the past; this would be your input data source. The output might be a set of the best and most efficient patterns for finding the blue ornament.

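As a rough sketch of what those "mathematical transformations" look like in practice, here is a tiny two-layer forward pass in NumPy. The layer sizes and random weights are purely illustrative; a real network like AlphaGo's is far larger and is trained rather than random.

```python
# A tiny two-layer neural network forward pass: large input -> smaller output.
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=(1, 1_000))        # a large input (e.g. 1,000 features)
W1 = rng.normal(size=(1_000, 64))      # first transformation: 1,000 -> 64
W2 = rng.normal(size=(64, 8))          # second transformation: 64 -> 8

hidden = np.maximum(0, x @ W1)         # linear map followed by a ReLU non-linearity
output = hidden @ W2                   # a much smaller output representation

print(x.shape, "->", hidden.shape, "->", output.shape)   # (1, 1000) -> (1, 64) -> (1, 8)
```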

For AlphaGo, replace the top of the Christmas tree with your current position on a Go board: the branches are your different move options, and the blue ornament is the optimal next move to take.

If you’ve ever given your dog a treat for sitting on command, you’ve practiced a form of reinforcement learning. In the beginning, AlphaGo was shown millions of examples of how humans play Go so it could establish a foundation level of play. When AlphaGo was training to play, it was rewarded for making good moves.

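For a minimal sketch of the "rewarded for making good moves" idea, here is a single tabular Q-learning update. This stands in for the general reinforcement learning principle; AlphaGo itself used policy and value networks rather than a simple lookup table, and the states and actions below are made up.

```python
# One Q-learning update: a move that earned a reward becomes more valuable.
from collections import defaultdict

Q = defaultdict(float)        # estimated value of taking `action` in `state`
alpha, gamma = 0.1, 0.99      # learning rate and discount factor

def update(state, action, reward, next_state, next_actions):
    """Nudge Q[state, action] towards the reward plus the best future value."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# Example: the move "good_move" from "board_1" earned a reward of +1.
update("board_1", "good_move", 1.0, "board_2", ["a", "b"])
print(Q[("board_1", "good_move")])    # 0.1 -- its estimated value has increased
```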

Combining these techniques and plenty of compute power leads to a very good Go player, the best in the world.

Go is conquered, what’s next?

StarCraft II.

StarCraft II is a real-time strategy game. Players build armies to go head to head in hopes of taking over the battlefield. But don’t let the simplicity of this description fool you.

If you thought Go was a step up from Chess, StarCraft II turns it up to 11.

Why is StarCraft II such a big step?

To start with, armies can contain a variety of different characters, and the game interface is in color. Chess and Go only have two colors of pieces, black and white.

The battlefield isn’t entirely visible, portions of the map are hidden unless a player has explored that territory. Imagine trying to plan a move in Chess if you can only see your side of the board.

Delayed credit assignment — some moves aren’t rewarded until later in the game. Chess and Go both have this but nowhere near the level of StarCraft II.

Opponents can be one or many. Chess and Go are both played one on one. Imagine trying to take on three people at once in chess, except all three are playing by a rule set different from yours. This is the equivalent of taking on different armies in StarCraft II.

These factors make StarCraft II a worthy endeavour indeed. But what’s the point?

Why create intelligent systems to play games?

DeepMind’s goal is to solve intelligence and use it to make the world a better place. Creating systems which can learn to solve complex problems is a fundamental step towards completing this goal.

Enter games.

Games have very repeatable states. This means I can play the same game you’re playing, and we can both understand what it takes to win and what it means to be a good player. Games are also becoming increasingly complex as game development improves alongside technology.

Even with increasingly complex game development, one fundamental aspect of games will always remain: the requirement to solve problems.

A game provides a rich opportunity for repeatable problem-solving. Go was considered a challenge because of the big numbers you saw above. What these big numbers don’t convey is that all of them are solutions to problems.

Systems which learn to play games might seem like a waste of time. But these systems aren’t playing games, they’re learning to solve problems.

Creating intelligent systems which learn to play games such as Go and StarCraft II is a crucial step towards creating systems which are adaptable in the ultimate game, real life.

The outside world is far more complex than any game, but it still comes down to a series of problems to solve. Each day you wake up, you have to solve the problem of how you’re going to get to the bathroom, then the problem of deciding what to have for breakfast. We’ve gotten used to these things because we’ve done them thousands of times. When we face a problem we haven’t solved before, the difficulty increases.

Once an intelligent system learns to solve a problem over and over again, it slowly loses its image of being intelligent. This is becoming the case for AlphaGo.

Humans have the capability to transfer their problem solving abilities from one domain to another. So far, intelligent systems fall down in this area.

We know AlphaGo can play Go better than any other human, but can it learn to ride a bike? A human can easily go from playing Go to riding a bike. AlphaGo cannot.

In order to achieve this ability of transfer learning or what some may refer to as Artificial General Intelligence (AGI), intelligent systems must learn to solve new and more complex problems.

Enter the StarCraft II Learning Environment (SC2LE).

DeepMind, in collaboration with Blizzard (the makers of StarCraft II), has released SC2LE with the goal of catalyzing AI research in a game not specifically designed for this purpose.

You can imagine SC2LE as a gym where intelligent systems can go and train in the hopes of being able to defeat a professional human player.

Tools one can find in SC2LE include a machine learning API developed by Blizzard that allows researchers to dig deeper into the game mechanics, an initial dataset of 60,000+ game replays, and PySC2, an open source Python library created by DeepMind to take advantage of Blizzard’s feature-layer API.

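To give a sense of what working with SC2LE looks like, here is a minimal agent sketch using PySC2. It is based on the PySC2 documentation rather than anything in this article, so treat the exact module paths and parameters as assumptions that may differ between PySC2 versions; StarCraft II and the minigame maps also need to be installed separately.

```python
# Minimal PySC2 sketch: an agent that watches a minigame and does nothing.
# Module paths and parameters follow the PySC2 docs and may vary by version.
from pysc2.agents import base_agent
from pysc2.env import sc2_env
from pysc2.lib import actions, features

class NoOpAgent(base_agent.BaseAgent):
    """The simplest possible agent: it observes the game and takes no action."""
    def step(self, obs):
        super().step(obs)
        return actions.FUNCTIONS.no_op()

def main():
    with sc2_env.SC2Env(
        map_name="MoveToBeacon",                        # one of the SC2LE minigames
        players=[sc2_env.Agent(sc2_env.Race.terran)],
        agent_interface_format=features.AgentInterfaceFormat(
            feature_dimensions=features.Dimensions(screen=84, minimap=64)),
        step_mul=8,                                     # game steps per agent action
    ) as env:
        agent = NoOpAgent()
        timesteps = env.reset()
        while not timesteps[0].last():
            timesteps = env.step([agent.step(timesteps[0])])

if __name__ == "__main__":
    main()
```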

A joint paper from Blizzard and DeepMind showed some surprising results. Even the best problem-solving systems out of the DeepMind lab failed to complete even one full game of StarCraft II. This includes the deep reinforcement learning algorithm DeepMind crafted, which achieved superhuman scores on 49 different Atari games in 2015.

Even in the StarCraft II minigames (released in SC2LE), a simplified version of the full game, none of the intelligent systems in the original paper achieved anywhere near the scores of a human professional playing the same game. Some of the agents did, however, achieve comparable results to a novice player in simpler minigames.

These initial findings are exciting. The fact that current intelligent systems fail to produce optimal results on even a simplified version of StarCraft II means there is plenty of room to improve.

The release of SC2LE and the joint paper provides a baseline performance level for AI researchers to challenge in the future.

Where to next?

With the open access to SC2LE, DeepMind and Blizzard hope the community will contribute to building intelligent systems which humans can consider to be worthwhile StarCraft II opponents.

Future updates promise the removal of simplifications to the game, making it more like how a human would play, as well as access to more human game replays to help train reinforcement learning agents.

I’ve always been a gamer. I played RuneScape relentlessly as a kid. This kind of game-playing research fascinates me. However, building the best game players in the world is not what excites me the most.

The real value will be gained when an intelligent system is able to adapt the principles it has learned from one game to another, or even to a completely different environment, without having to start over again.

If an intelligent system can learn how to play StarCraft II, what other problems could it learn to solve?

For those looking to learn more about SC2LE, you can read about the full release on DeepMind’s blog, and Siraj Raval has a great introductory video on how to get started with it on his YouTube channel.

DeepMind is taking on challenges which make me want to get out of bed in the morning. As I write this article, they have released a paper on AlphaGo Zero, the most advanced version of AlphaGo yet, which learned to play Go with zero human intervention.

I’ll be deconstructing AlphaGo Zero in the coming weeks, so be sure to follow me if you’re interested in learning more.

If you would like to join me on my mission of deconstructing intelligence, I post a weekly video on YouTube documenting my journey through my self-created AI Master’s Degree.

Have advice for me or learning about AI? I’d love to hear from you!

Say Hi on: YouTube | Twitter | Email | GitHub | Patreon

Originally published at: https://www.freecodecamp.org/news/the-next-step-towards-artificial-general-intelligence-starcraft-ii-f562d5607e2/
