CS 188 | Spring 2021 Project 2: Multi-Agent Search


&露从今夜白


Article link: https://blog.csdn.net/m0_48134027/article/details/120340931


Lab 2: Pacman (Adversarial Search)

I. Project Description

Assignment page

Blank project code skeleton

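In this project, you will design agents for the classic version of Pacman, including ghosts. Along the way, you will implement minimax and expectimax search and try your hand at evaluation function design.

The assignment consists of five questions. Work through them in the order given in the project description; most of the work is filling in code in multiAgents.py.

Project files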

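Files to edit	Notes
multiAgents.py	Where all of the multi-agent search agents live
Files to look at
pacman.py	The main file that runs Pacman games. It defines the GameState class, which is used extensively in this project
game.py	The logic behind how the Pacman world works. It defines several supporting classes such as AgentState, Agent, Directions, and Grid
util.py	Data structures needed to implement the search algorithms
Supporting files you can ignore
ghostAgents.py	Agents that control the ghosts
graphicsDisplay.py	Graphical interface for Pacman
graphicsUtils.py	Support code for the Pacman graphics
keyboardAgents.py	Keyboard control for Pacman
layout.py	Code for reading layout files and storing their contents
textDisplay.py	ASCII rendering of the Pacman game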

II. Implementation

The code for Question 1 is omitted. Questions 2 and 3 are commented in detail; I hope you find them helpful.

Question 2: Minimax


class MinimaxAgent(MultiAgentSearchAgent):
    """
    Your minimax agent (question 2)
    """

    def getAction(self, gameState):
        """
        Returns the minimax action from the current gameState using self.depth
        and self.evaluationFunction.

        Here are some method calls that might be useful when implementing minimax.

        gameState.getLegalActions(agentIndex):
        Returns a list of legal actions for an agent
        agentIndex=0 means Pacman, ghosts are >= 1

        gameState.getNextState(agentIndex, action):
        Returns the child game state after an agent takes an action

        gameState.getNumAgents():
        Returns the total number of agents in the game

        gameState.isWin():
        Returns whether or not the game state is a winning state

        gameState.isLose():
        Returns whether or not the game state is a losing state
        """

        "*** YOUR CODE HERE ***"
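        # Indices of the ghost agents (Pacman is agent 0)
        GhostIndex = [i for i in range(1, gameState.getNumAgents())]
        # Terminal test: stop recursing on a win, a loss, or at the depth limit;
        # the evaluation function then scores the state
        def term(state, d):
            return state.isWin() or state.isLose() or d == self.depth

        # Min node for the ghost with the given index
        def min_value(state, d, ghost):  # minimizer
            # At a terminal state, return its evaluation
            if term(state, d):
                return self.evaluationFunction(state)

            "Value for Min node. May have multiple ghosts"
            # A min node starts at +infinity
            v = float("inf")
            # Recurse over this ghost's legal actions
            for action in state.getLegalActions(ghost):
                # Once the last ghost has moved, the round is complete:
                # hand control back to Pacman one level deeper
                if ghost == GhostIndex[-1]:
                    v = min(v, max_value(state.getNextState(ghost, action), d + 1))
                else:
                    v = min(v, min_value(state.getNextState(ghost, action), d, ghost + 1))
            # Return the smallest value found
            return v

        # Max node (Pacman), symmetric to min_value
        def max_value(state, d):  # maximizer
            # At a terminal state, return its evaluation
            if term(state, d):
                return self.evaluationFunction(state)

            "Value for Max node"
            # A max node starts at -infinity
            v = float("-inf")
            for action in state.getLegalActions(0):
                # There is only one Pacman, so the next agent is always ghost 1.
                # Note: min_value already increments the depth, so d is not incremented here;
                #     incrementing it here as well would count every round twice, which is
                #     why some implementations found online multiply the depth by 2 in term()
                v = max(v, min_value(state.getNextState(0, action), d, 1))
            # print(v)
            return v

        "Select action for Max node"
        # Entry point.
        # Note: although we want Pacman's best move, the root calls min_value rather than
        # max_value. The root itself plays the role of the max node: getAction() must
        # return an action, not a value, so we score every successor and keep the pair.
        # Sorting the (action, value) pairs by value puts the best action last.
        res = [(action, min_value(gameState.getNextState(0, action), 0, 1)) for action in
               gameState.getLegalActions(0)]
        res.sort(key=lambda k: k[1])

        return res[-1][0]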
    # util.raiseNotDefined()
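
To see the depth bookkeeping in isolation, here is a minimal sketch on a hand-built tree, independent of the Pacman framework. The function `minimax`, its parameters, and the nested-list tree encoding are assumptions made for this illustration only; they are not part of the project code. As in the agent above, the depth counter advances only after the last ghost has moved.

```python
def minimax(node, depth, agent, num_agents, max_depth, evaluate):
    """Minimax value of `node`; agent 0 maximizes, all other agents minimize.

    A node is either a number (terminal state) or a list of child nodes.
    One unit of depth is one full round: agent 0 plus every other agent.
    """
    # Terminal test: a leaf was reached or the depth limit was hit.
    if not isinstance(node, list) or depth == max_depth:
        return evaluate(node)

    last = agent == num_agents - 1
    next_agent = 0 if last else agent + 1
    next_depth = depth + 1 if last else depth  # only the last agent closes a round

    values = [minimax(child, next_depth, next_agent, num_agents, max_depth, evaluate)
              for child in node]
    return max(values) if agent == 0 else min(values)


def evaluate(node):
    """Hypothetical static evaluation: leaves score themselves, cut-off nodes score 0."""
    return node if not isinstance(node, list) else 0


# Three plies per round: Pacman, ghost 1, ghost 2.
tree = [
    [[3, 5], [2, 9]],  # Pacman's first action
    [[4, 6], [7, 8]],  # Pacman's second action
]
root_value = minimax(tree, 0, 0, 3, 1, evaluate)
# Pick the action at the root by scoring each successor, as getAction() does.
best_action = max(range(len(tree)),
                  key=lambda i: minimax(tree[i], 0, 1, 3, 1, evaluate))
```

With three agents and a depth of 1, the search covers exactly one move by each agent before the leaves are evaluated.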

Question 3: Alpha-Beta Pruning


class AlphaBetaAgent(MultiAgentSearchAgent):
    """
    Your minimax agent with alpha-beta pruning (question 3)
    """

    def getAction(self, gameState):
        """
        Returns the minimax action using self.depth and self.evaluationFunction
        """
        "*** YOUR CODE HERE ***"
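        # Same skeleton as minimax with pruning added: threading A (alpha) and B (beta)
        # through the recursion is enough for a noticeable speed-up.
        # Because max and min layers alternate, A and B are updated in alternation:
        # max nodes raise A, min nodes lower B.
        # Pruning is a constraint that an ancestor imposes on a subtree, and that
        # constraint itself comes from nodes explored earlier.
        GhostIndex = [i for i in range(1, gameState.getNumAgents())]

        def term(state, d):
            return state.isWin() or state.isLose() or d == self.depth

        def min_value(state, d, ghost, A, B):  # minimizer

            if term(state, d):
                return self.evaluationFunction(state)

            v = float("inf")
            # Note: action here is an action of the ghost
            for action in state.getLegalActions(ghost):
                if ghost == GhostIndex[-1]:  # next is maximizer with pacman
                    v = min(v, max_value(state.getNextState(ghost, action), d + 1, A, B))
                else:  # next is minimizer with next-ghost
                    v = min(v, min_value(state.getNextState(ghost, action), d, ghost + 1, A, B))

                # The key difference from minimax:
                # this check runs after every child, and v can only decrease from here on.
                # Once v < A, a max ancestor already has a better option and will never
                # choose this node, so the remaining children are skipped.
                # The comparison is strict: there is no pruning on equality.
                if v < A:
                    return v
                # Runs on every iteration: lower B to the best value the minimizer can
                # guarantee so far, so that nodes explored later can prune sooner
                B = min(B, v)

            return v

        def max_value(state, d, A, B):  # maximizer

            if term(state, d):
                return self.evaluationFunction(state)

            v = float("-inf")
            for action in state.getLegalActions(0):
                v = max(v, min_value(state.getNextState(0, action), d, 1, A, B))

                # Same idea as above: prune once v exceeds the bound
                # (strict comparison, no pruning on equality)
                if v > B:
                    return v
                A = max(A, v)

            return v

        def alphabeta(state):

            v = float("-inf")
            act = None
            A = float("-inf")
            B = float("inf")

            for action in state.getLegalActions(0):  # maximizing
                tmp = min_value(state.getNextState(0, action), 0, 1, A, B)

                # Track the best value so far and remember the action that produced it
                if v < tmp:  # same as v = max(v, tmp)
                    v = tmp
                    act = action

                # B is never lowered at the root, so this test cannot fire;
                # if it did, the action (not the value) would have to be returned
                if v > B:  # pruning
                    return act
                A = max(A, tmp)

            return act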

        return alphabeta(gameState)
        # util.raiseNotDefined()
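
The effect of the two bounds is easier to follow on a small two-player tree. The sketch below is a standalone illustration, not the project code: the function `alphabeta`, the `visited` list, and the example tree are assumptions introduced here. It records every leaf that is actually evaluated, so the pruned leaves are the ones missing from the record.

```python
import math


def alphabeta(node, alpha, beta, maximizing, visited):
    """Alpha-beta value of a tree given as nested lists of numbers."""
    if not isinstance(node, list):
        visited.append(node)  # record every leaf that is actually evaluated
        return node

    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False, visited))
            if v > beta:  # strict comparison: no pruning on equality
                return v
            alpha = max(alpha, v)
        return v

    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True, visited))
        if v < alpha:  # strict comparison: no pruning on equality
            return v
        beta = min(beta, v)
    return v


# Root is a max node with three min children.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited = []
value = alphabeta(tree, -math.inf, math.inf, True, visited)
```

After the first min node returns 3, alpha is 3 at the root. In the second min node the first leaf is 2, which is already below alpha, so the leaves 4 and 6 are never evaluated.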

Question 4: Expectimax

class ExpectimaxAgent(MultiAgentSearchAgent):
    """
      Your expectimax agent (question 4)
    """

    def getAction(self, gameState):
        """
        Returns the expectimax action using self.depth and self.evaluationFunction

        All ghosts should be modeled as choosing uniformly at random from their
        legal moves.
        """
        "*** YOUR CODE HERE ***"
        GhostIndex = [i for i in range(1, gameState.getNumAgents())]

        def term(state, d):
            return state.isWin() or state.isLose() or d == self.depth

        def exp_value(state, d, ghost):  # chance node

            if term(state, d):
                return self.evaluationFunction(state)

            v = 0
            # Each ghost action has probability 1/count, where count is the number of
            # legal actions the ghost has after Pacman's move.
            # Pacman's move may end the game, leaving count = 0; the terminal test
            # above returns first in that case, so the division below is not reached
            prob = 1.0 / len(state.getLegalActions(ghost))
            # Accumulate the expected value over the ghost's actions
            for action in state.getLegalActions(ghost):
                if ghost == GhostIndex[-1]:
                    v += prob * max_value(state.getNextState(ghost, action), d + 1)
                else:
                    v += prob * exp_value(state.getNextState(ghost, action), d, ghost + 1)
            # print(v)
            return v

        def max_value(state, d):  # maximizer

            if term(state, d):
                return self.evaluationFunction(state)

            v = float("-inf")
            for action in state.getLegalActions(0):
                v = max(v, exp_value(state.getNextState(0, action), d, 1))
            # print(v)
            return v

        # As in minimax, the root scores every successor and returns the best action
        res = [(action, exp_value(gameState.getNextState(0, action), 0, 1)) for action in
               gameState.getLegalActions(0)]
        res.sort(key=lambda k: k[1])

        return res[-1][0]
        # util.raiseNotDefined()
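
The difference between a chance node and a min node shows up clearly on a tiny tree. The following is a minimal sketch with made-up numbers; the functions `expectimax` and `minimax_value` and the tree are assumptions for illustration and do not come from the project code.

```python
def expectimax(node, maximizing):
    """Expectimax value of a nested-list tree; chance nodes are uniform."""
    if not isinstance(node, list):
        return float(node)
    if maximizing:
        return max(expectimax(child, False) for child in node)
    prob = 1.0 / len(node)  # every child of a chance node is equally likely
    return sum(prob * expectimax(child, True) for child in node)


def minimax_value(node, maximizing):
    """Plain minimax on the same tree, for comparison."""
    if not isinstance(node, list):
        return float(node)
    values = [minimax_value(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


# Root is a max node; each child is the opponent's node.
tree = [[20, 0], [5, 6]]
exp_root = expectimax(tree, True)
mm_root = minimax_value(tree, True)
```

Against a uniformly random opponent, the first action is worth 10 on average and beats the second action's 5.5, so expectimax takes the risk. Minimax assumes the worst case (0 versus 5) and picks the second action instead.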

Question 5: Evaluation Function

def betterEvaluationFunction(currentGameState):
    """
    Your extreme ghost-hunting, pellet-nabbing, food-gobbling, unstoppable
    evaluation function (question 5).

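    DESCRIPTION: the current game score plus the reciprocal of the Manhattan
    distance to the nearest food pellet (no bonus when no pellets remain).
    """
    "*** YOUR CODE HERE ***"
    newPos = currentGameState.getPacmanPosition()
    newFood = currentGameState.getFood().asList()
    newGhostStates = currentGameState.getGhostStates()
    newScaredTimes = [ghostState.scaredTimer for ghostState in newGhostStates]

    # Start from the current game score and add the reciprocal of the distance to the
    # nearest food pellet. With no pellets left, foodDist stays infinite and the bonus is 0.
    # The name `score` avoids shadowing the built-in eval().
    score = currentGameState.getScore()
    foodDist = float("inf")
    for food in newFood:
        foodDist = min(foodDist, util.manhattanDistance(food, newPos))
    score += 1.0 / foodDist

    return score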
    # util.raiseNotDefined()
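
The food term can be checked on its own with plain coordinate tuples. This is a sketch under assumptions: `manhattan` and `food_bonus` are hypothetical helpers written for this example, not functions from the project's util.py, and the guard against a zero distance is an addition that the evaluation function above does not need.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def food_bonus(pacman_pos, food_positions):
    """Reciprocal of the distance to the nearest pellet, or 0.0 if none remain."""
    if not food_positions:
        return 0.0
    nearest = min(manhattan(pacman_pos, food) for food in food_positions)
    # Guard against a zero distance; in the game Pacman eats a pellet on contact.
    return 1.0 / max(nearest, 1)


near = food_bonus((1, 1), [(1, 3), (4, 1)])  # nearest pellet is 2 steps away
none_left = food_bonus((1, 1), [])
```

Because the bonus never exceeds 1, it only breaks ties between states with the same game score, nudging Pacman toward the closest pellet.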

III. Test Screenshots

The test framework (the autograder) ships with the project; just run it.

Minimax Testing:

Test command:

python autograder.py -q q2

Screenshot:

Alpha-Beta Pruning Testing:

Test command:

python autograder.py -q q3 --no-graphics

Screenshot:

Expectimax Testing:

Test command:

python autograder.py -q q4 --no-graphics

Screenshot:

Evaluation Function Testing:

Test command:

python autograder.py -q q5 --no-graphics

Screenshot: