by Ahmad Abdolsaheb

How to make your Tic Tac Toe game unbeatable by using the minimax algorithm

I struggled for hours scrolling through tutorials, watching videos, and banging my head on the desk trying to build an unbeatable Tic Tac Toe game with a reliable Artificial Intelligence. So if you are going through a similar journey, I would like to introduce you to the Minimax algorithm.

Like a professional chess player, this algorithm sees a few steps ahead and puts itself in the shoes of its opponent. It keeps playing ahead until it reaches a terminal arrangement of the board (terminal state) resulting in a tie, a win, or a loss. Once in a terminal state, the AI will assign an arbitrary positive score (+10) for a win, a negative score (-10) for a loss, or a neutral score (0) for a tie.

At the same time, the algorithm evaluates the moves that lead to a terminal state based on whose turn it is. It chooses the move with the maximum score when it is the AI’s turn and the move with the minimum score when it is the human player’s turn. Using this strategy, Minimax avoids losing to the human player.
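
The max/min rule above can be sketched in a few lines. This is only an illustration: pickScore is a hypothetical helper name that does not appear in the tutorial, and the score lists are example inputs.

```javascript
// Hypothetical helper (not part of the tutorial's code): given the scores of
// all moves available to a player, return the score that player would pick.
// The AI maximizes the score, the human player minimizes it.
function pickScore(scores, isAiTurn) {
  return isAiTurn ? Math.max(...scores) : Math.min(...scores);
}

console.log(pickScore([-10, 10, -10], true)); // 10  (AI takes the win)
console.log(pickScore([10, -10], false));     // -10 (human takes their win)
```

The full algorithm applies this choice at every level of the game tree, alternating between the two players.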

Try it for yourself in the following game.

A Minimax algorithm can be best defined as a recursive function that does the following things:

  1. return a value if a terminal state is found (+10, 0, -10)
  2. go through available spots on the board
  3. call the minimax function on each available spot (recursion)
  4. evaluate returning values from function calls
  5. and return the best value

If you are new to the concept of recursion, I recommend watching this video from Harvard’s CS50.

To completely grasp Minimax’s thought process, let’s implement it in code and see it in action in the following two sections.

Minimax in Code

For this tutorial you will be working on a near end state of the game, which is shown in figure 2 below. Since minimax evaluates every state of the game (hundreds of thousands of them from an empty board), a near end state makes it much easier to follow minimax’s recursive calls (there are only 9).
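
To get a feel for that difference in scale, here is a rough sketch that counts how many positions a full-depth search visits. The names WIN_LINES, hasWon and countCalls are made up for this illustration, and nearEndBoard is assumed to be the near end state used in this tutorial.

```javascript
// All eight winning lines of a 3x3 board, as index triples.
const WIN_LINES = [
  [0, 1, 2], [3, 4, 5], [6, 7, 8], // rows
  [0, 3, 6], [1, 4, 7], [2, 5, 8], // columns
  [0, 4, 8], [2, 4, 6],            // diagonals
];

function hasWon(board, player) {
  return WIN_LINES.some((line) => line.every((i) => board[i] === player));
}

// Counts the positions visited from `board`, stopping at wins and full
// boards the same way a minimax search would.
function countCalls(board, player) {
  if (hasWon(board, "X") || hasWon(board, "O")) return 1;
  const other = player === "X" ? "O" : "X";
  let calls = 1; // this position
  for (let i = 0; i < 9; i++) {
    if (board[i] === "X" || board[i] === "O") continue;
    const saved = board[i];
    board[i] = player;
    calls += countCalls(board, other);
    board[i] = saved; // undo the move
  }
  return calls;
}

const emptyBoard = [0, 1, 2, 3, 4, 5, 6, 7, 8];
const nearEndBoard = ["O", 1, "X", "X", 4, "X", 6, "O", "O"];
console.log(countCalls(emptyBoard, "X"));   // hundreds of thousands
console.log(countCalls(nearEndBoard, "X")); // 9
```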

For the following figure, assume the AI is X and the human player is O.

To make the Tic Tac Toe board easier to work with, you should define it as an array with 9 items. Each item will have its index as a value. This will come in handy later on. Because the above board is already populated with some X and O moves, let us define the board with the X and O moves already in it (origBoard).

var origBoard = ["O",1,"X","X",4,"X",6,"O","O"];

Then declare aiPlayer and huPlayer variables and set them to “X” and “O” respectively.

Additionally, you need a function that looks for winning combinations and returns true if it finds one, and a function that lists the indexes of available spots on the board.
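
A minimal sketch of these pieces might look like the following. The names aiPlayer, huPlayer and origBoard come from the text; WIN_COMBOS, winning and emptySpots are assumed names chosen for this illustration.

```javascript
const huPlayer = "O"; // human
const aiPlayer = "X"; // AI
const origBoard = ["O", 1, "X", "X", 4, "X", 6, "O", "O"];

// The eight index triples that make a winning line.
const WIN_COMBOS = [
  [0, 1, 2], [3, 4, 5], [6, 7, 8], // rows
  [0, 3, 6], [1, 4, 7], [2, 5, 8], // columns
  [0, 4, 8], [2, 4, 6],            // diagonals
];

// Returns true if `player` occupies every cell of some winning line.
function winning(board, player) {
  return WIN_COMBOS.some((combo) => combo.every((i) => board[i] === player));
}

// Empty cells still hold their own index, so filtering out the marks
// leaves exactly the indexes of the available spots.
function emptySpots(board) {
  return board.filter((cell) => cell !== "O" && cell !== "X");
}

console.log(emptySpots(origBoard));        // [ 1, 4, 6 ]
console.log(winning(origBoard, aiPlayer)); // false
```

Storing each index as the value of its empty cell is what makes emptySpots a one-line filter.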

Now let’s dive into the good parts by defining the Minimax function with two arguments newBoard and player. Then, you need to find the indexes of the available spots in the board and set them to a variable called availSpots.

Also, you need to check for terminal states and return a value accordingly. If O wins you should return -10, and if X wins you should return +10. In addition, if the length of the availSpots array is zero, there is no more room to play, the game has resulted in a tie, and you should return zero.
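
The terminal check could be sketched as a standalone function like this. terminalScore is a hypothetical name used only here (in the tutorial the check lives inside the minimax function), and winning is the assumed win-checking helper described earlier.

```javascript
const huPlayer = "O";
const aiPlayer = "X";
const WIN_COMBOS = [
  [0, 1, 2], [3, 4, 5], [6, 7, 8],
  [0, 3, 6], [1, 4, 7], [2, 5, 8],
  [0, 4, 8], [2, 4, 6],
];

function winning(board, player) {
  return WIN_COMBOS.some((combo) => combo.every((i) => board[i] === player));
}

// Returns a score object for a finished game, or null if play continues.
function terminalScore(board) {
  const availSpots = board.filter((cell) => cell !== "O" && cell !== "X");
  if (winning(board, huPlayer)) return { score: -10 };
  if (winning(board, aiPlayer)) return { score: 10 };
  if (availSpots.length === 0) return { score: 0 }; // full board, no winner
  return null;
}

console.log(terminalScore(["O", "X", "X", "X", "O", "X", 6, "O", "O"]));       // { score: -10 }
console.log(terminalScore(["O", 1, "X", "X", "X", "X", 6, "O", "O"]));         // { score: 10 }
console.log(terminalScore(["X", "O", "X", "X", "O", "O", "O", "X", "X"]));     // { score: 0 }
console.log(terminalScore(["O", 1, "X", "X", 4, "X", 6, "O", "O"]));           // null
```

The wins are checked before the tie so that a full board that also contains a winning line is scored as a win, not a draw.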

Next, you need to collect the scores from each of the empty spots to evaluate later. Therefore, make an array called moves and loop through empty spots while collecting each move’s index and score in an object called move.

Then, set the index number of the empty spot, which was stored as a number in origBoard, as the index property of the move object. Next, set the empty spot on newBoard to the current player and call the minimax function with the other player and the newly changed newBoard. Then store the score property of the object returned by that minimax call in the score property of the move object.

If the minimax function does not find a terminal state, it keeps recursively going level by level deeper into the game. This recursion happens until it reaches a terminal state and returns a score one level up.

Finally, Minimax resets newBoard to what it was before and pushes the move object to the moves array.

Then, the minimax algorithm needs to evaluate the best move in the moves array. It should choose the move with the highest score when the AI is playing and the move with the lowest score when the human is playing. Therefore, if the player is aiPlayer, it sets a variable called bestScore to a very low number and loops through the moves array. If a move has a higher score than bestScore, the algorithm stores that move. If several moves share the same score, only the first one is stored.

The same evaluation process happens when player is huPlayer, but this time bestScore would be set to a high number and Minimax looks for a move with the lowest score to store.

In the end, Minimax returns the object stored in bestMove.
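
Putting the steps described above together, the whole function might look roughly like the sketch below. It follows the description in this section rather than reproducing the author's published code: winning and emptySpots are assumed helper names, and -Infinity and Infinity stand in for the "very low" and "high" starting values of bestScore.

```javascript
const huPlayer = "O";
const aiPlayer = "X";
const origBoard = ["O", 1, "X", "X", 4, "X", 6, "O", "O"];
const WIN_COMBOS = [
  [0, 1, 2], [3, 4, 5], [6, 7, 8],
  [0, 3, 6], [1, 4, 7], [2, 5, 8],
  [0, 4, 8], [2, 4, 6],
];

function winning(board, player) {
  return WIN_COMBOS.some((combo) => combo.every((i) => board[i] === player));
}

function emptySpots(board) {
  return board.filter((cell) => cell !== "O" && cell !== "X");
}

function minimax(newBoard, player) {
  const availSpots = emptySpots(newBoard);

  // Terminal states: human win, AI win, or tie.
  if (winning(newBoard, huPlayer)) return { score: -10 };
  if (winning(newBoard, aiPlayer)) return { score: 10 };
  if (availSpots.length === 0) return { score: 0 };

  // Collect the index and score of every possible move.
  const moves = [];
  for (const spot of availSpots) {
    const move = { index: newBoard[spot] };
    newBoard[spot] = player; // try the move
    const otherPlayer = player === aiPlayer ? huPlayer : aiPlayer;
    move.score = minimax(newBoard, otherPlayer).score;
    newBoard[spot] = move.index; // reset the spot
    moves.push(move);
  }

  // AI keeps the highest score, human the lowest; ties keep the first move.
  let bestMove;
  let bestScore = player === aiPlayer ? -Infinity : Infinity;
  for (const move of moves) {
    const better = player === aiPlayer ? move.score > bestScore : move.score < bestScore;
    if (better) {
      bestScore = move.score;
      bestMove = move;
    }
  }
  return bestMove;
}

console.log(minimax(origBoard, aiPlayer)); // { index: 4, score: 10 }
```

Because each spot is reset right after its recursive call returns, origBoard is unchanged once the search finishes.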

That is it for the minimax function. :) You can find the above algorithm on GitHub and CodePen. Play around with different boards and check the results in the console.

In the next section, let’s go over the code line by line to better understand how the minimax function behaves given the board shown in figure 2.

Minimax in action

Using the following figure, let’s follow the algorithm’s function calls (FC) one by one.

Note: In figure 3, large numbers represent each function call and levels refer to how many steps ahead of the game the algorithm is playing.

1. origBoard and aiPlayer are fed to the algorithm. The algorithm makes a list of the three empty spots it finds, checks for terminal states, and loops through every empty spot starting from the first one. Then, it changes the newBoard by placing the aiPlayer in the first empty spot. After that, it calls itself with newBoard and the huPlayer and waits for the FC to return a value.

2. While the first FC is still running, the second one starts by making a list of the two empty spots it finds, checks for terminal states, and loops through the empty spots starting from the first one. Then, it changes the newBoard by placing the huPlayer in the first empty spot. After that, it calls itself with newBoard and the aiPlayer and waits for the FC to return a value.

3. Finally, the algorithm makes a list of the empty spots and finds a win for the human player after checking for terminal states. Therefore, it returns an object with a score property and value of -10.

Since the second FC listed two empty spots, Minimax changes the newBoard by placing huPlayer in the second empty spot. Then, it calls itself with the new board and the aiPlayer.

4. The algorithm makes a list of the empty spots, and finds a win for the human player after checking for terminal states. Therefore, it returns an object with a score property and value of -10.

On the second FC, the algorithm collects the values coming from lower levels (3rd and 4th FC). Since huPlayer’s turn resulted in the two values, the algorithm chooses the lowest of the two values. Because both values are equal, it chooses the first one and returns it up to the first FC.

At this point the first FC has evaluated the score of moving aiPlayer in the first empty spot. Next, it changes the newBoard by placing aiPlayer in the second empty spot. Then, it calls itself with the newBoard and the huPlayer.

5. On the fifth FC, the algorithm makes a list of the empty spots and finds a win for the AI player after checking for terminal states. Therefore, it returns an object with a score property and value of +10.

After that, the first FC moves on by changing the newBoard and placing aiPlayer in the third empty spot. Then, it calls itself with the new board and the huPlayer.

6. The 6th FC starts by making a list of two empty spots it finds, checks for terminal states, and loops through the two empty spots starting from the first one. Then, it changes the newBoard by placing the huPlayer in the first empty spot. After that, it calls itself with newBoard and the aiPlayer and waits for the FC to return a score.

7. Now the algorithm is two levels deep into the recursion. It makes a list of the one empty spot it finds, checks for terminal states, and changes the newBoard by placing the aiPlayer in the empty spot. After that, it calls itself with newBoard and the huPlayer and waits for the FC to return a score so it can evaluate it.

8. On the 8th FC, the algorithm finds no empty spots left and, after checking for terminal states, finds a win for the aiPlayer. Therefore, it returns an object with a score property and value of +10 one level up (to the 7th FC).

The 7th FC only received one positive value from lower levels (8th FC). Because aiPlayer’s turn resulted in that value, the algorithm needs to return the highest value it has received from lower levels. Therefore, it returns its only positive value (+10) one level up (6th FC).

Since the 6th FC listed two empty spots, Minimax changes newBoard by placing huPlayer in the second empty spot. Then, it calls itself with the new board and the aiPlayer.

9. Next, the algorithm makes a list of the empty spots and finds a win for the human player after checking for terminal states. Therefore, it returns an object with a score property and value of -10.

At this point, the 6th FC has to choose between the score (+10) that was sent up from the 7th FC (returned originally from the 8th FC) and the score (-10) returned from the 9th FC. Since huPlayer’s turn resulted in those two returned values, the algorithm finds the minimum score (-10) and returns it upwards as an object containing score and index properties.

Finally, all three branches of the first FC have been evaluated (-10, +10, -10). Because aiPlayer’s turn resulted in those values, the algorithm returns an object containing the highest score (+10) and its index (4).
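
As a cross-check of those three branch values, here is a compact sketch that scores each of the AI's possible first moves on the board from this walkthrough. It is a different formulation from the one described in this tutorial: it copies the board instead of resetting it and returns plain numbers instead of move objects. The names winning, score and branches are assumptions made for this illustration.

```javascript
const WIN_COMBOS = [
  [0, 1, 2], [3, 4, 5], [6, 7, 8],
  [0, 3, 6], [1, 4, 7], [2, 5, 8],
  [0, 4, 8], [2, 4, 6],
];

function winning(board, player) {
  return WIN_COMBOS.some((combo) => combo.every((i) => board[i] === player));
}

// Score of `board` when it is `player`'s turn ("X" is the AI, "O" the human).
function score(board, player) {
  if (winning(board, "O")) return -10;
  if (winning(board, "X")) return 10;
  const spots = board.filter((cell) => cell !== "X" && cell !== "O");
  if (spots.length === 0) return 0;
  const scores = spots.map((spot) => {
    const next = board.slice(); // copy, so the caller's board is untouched
    next[spot] = player;
    return score(next, player === "X" ? "O" : "X");
  });
  return player === "X" ? Math.max(...scores) : Math.min(...scores);
}

const origBoard = ["O", 1, "X", "X", 4, "X", 6, "O", "O"];
const branches = origBoard
  .filter((cell) => cell !== "X" && cell !== "O")
  .map((spot) => {
    const next = origBoard.slice();
    next[spot] = "X"; // the AI tries this spot
    return { index: spot, score: score(next, "O") };
  });

console.log(branches);
// [ { index: 1, score: -10 }, { index: 4, score: 10 }, { index: 6, score: -10 } ]
```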

In the above scenario, Minimax concludes that moving the X to the middle of the board results in the best outcome. :)

The End!

By now you should be able to understand the logic behind the Minimax algorithm. Using this logic, try to implement a Minimax algorithm yourself, or find the above sample on GitHub or CodePen and optimize it.

Thanks for reading! If you liked this story, please recommend it by clicking the ❤ button on the side and sharing it on social media.

Special thanks to Tuba Yilmaz, Rick McGavin, and Javid Askerov for reviewing this article.

Translated from: https://www.freecodecamp.org/news/how-to-make-your-tic-tac-toe-game-unbeatable-by-using-the-minimax-algorithm-9d690bad4b37/
