Lab 2: Pacman (Adversarial Search)
1. Project Description

In this project, we will design agents for the classic version of Pacman, including the ghosts. Along the way, you will implement minimax and expectimax search and experiment with evaluation function design.

The assignment consists of 5 questions; follow the steps in the project introduction. The code to fill in goes mainly in the multiAgents.py file.
Project Files
| Files to edit | Notes |
|---|---|
| multiAgents.py | All of the search agents will be written in this file |
| **Files to read** | |
| pacman.py | The main file that runs the Pacman game. It defines the GameState class, which is used extensively throughout the project |
| game.py | Implements the logic of the Pacman world; contains supporting classes such as AgentState, Agent, Directions, and Grid |
| util.py | Contains the data structures needed to implement the search algorithms |
| **Supporting files you can ignore** | |
| ghostAgents.py | Agents that control the ghosts |
| graphicsDisplay.py | Graphics for the Pacman game |
| graphicsUtils.py | Support code for the Pacman graphics |
| keyboardAgents.py | Keyboard interface for controlling Pacman |
| layout.py | Code for reading layout files and storing their contents |
| textDisplay.py | ASCII graphics for the Pacman game |
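For context, the game and the agents are run from the command line. The autograder commands below also appear in the test section at the end of this report; the layout name and depth argument in the MinimaxAgent example follow the standard project conventions and are shown for illustration.

```shell
# Play a game yourself with the keyboard
python pacman.py

# Run the minimax agent on a small layout with a depth-4 search
python pacman.py -p MinimaxAgent -l minimaxClassic -a depth=4

# Grade a single question
python autograder.py -q q2
```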
2. Experiment Code

The code for Question 1 is omitted. Questions 2 and 3 are commented in detail; I hope the notes are helpful.

Question 2: Minimax
```python
class MinimaxAgent(MultiAgentSearchAgent):
    """
    Your minimax agent (question 2)
    """

    def getAction(self, gameState):
        """
        Returns the minimax action from the current gameState using self.depth
        and self.evaluationFunction.

        Here are some method calls that might be useful when implementing minimax.

        gameState.getLegalActions(agentIndex):
        Returns a list of legal actions for an agent
        agentIndex=0 means Pacman, ghosts are >= 1

        gameState.getNextState(agentIndex, action):
        Returns the child game state after an agent takes an action

        gameState.getNumAgents():
        Returns the total number of agents in the game

        gameState.isWin():
        Returns whether or not the game state is a winning state

        gameState.isLose():
        Returns whether or not the game state is a losing state
        """
        "*** YOUR CODE HERE ***"
        # Indices of the ghost agents (Pacman is agent 0)
        GhostIndex = [i for i in range(1, gameState.getNumAgents())]

        # Terminal test: the recursion stops on a win, a loss, or when the
        # depth limit is reached; the evaluation function then scores the state
        def term(state, d):
            return state.isWin() or state.isLose() or d == self.depth

        # Minimizer: returns the minimax value for the ghost with the given index
        def min_value(state, d, ghost):
            # At a terminal state, return the evaluation directly
            if term(state, d):
                return self.evaluationFunction(state)

            "Value for Min node. May have multiple ghosts"
            # A min node starts at positive infinity
            v = float("inf")
            # Recurse over every legal ghost action
            for action in state.getLegalActions(ghost):
                if ghost == GhostIndex[-1]:
                    # Last ghost of this ply: the next agent is Pacman, one level deeper
                    v = min(v, max_value(state.getNextState(ghost, action), d + 1))
                else:
                    # Otherwise hand over to the next ghost at the same depth
                    v = min(v, min_value(state.getNextState(ghost, action), d, ghost + 1))
            return v

        # Maximizer: the same idea, but for Pacman (agent 0)
        def max_value(state, d):
            if term(state, d):
                return self.evaluationFunction(state)

            "Value for Max node"
            # A max node starts at negative infinity
            v = float("-inf")
            for action in state.getLegalActions(0):
                # There is only one Pacman, so the next agent is always ghost 1.
                # Note: min_value advances the depth counter, so d must not be
                # incremented here as well -- doing so would double the depth.
                # This is also why some implementations found online multiply
                # by 2 in their term() function instead.
                v = max(v, min_value(state.getNextState(0, action), d, 1))
            return v

        "Select action for Max node"
        # Root call: even though we want Pacman's best (max) move, we call
        # min_value here, because this loop itself plays the role of the root
        # max node -- getAction() must ultimately return an action, not a value.
        # Sorting the (action, value) pairs by value makes res[-1][0] the
        # action with the highest minimax value.
        res = [(action, min_value(gameState.getNextState(0, action), 0, 1)) for action in
               gameState.getLegalActions(0)]
        res.sort(key=lambda k: k[1])
        return res[-1][0]
```
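To see the max/min alternation in isolation, here is a minimal minimax sketch on a hand-built game tree, independent of the Pacman code; the tree shape and leaf values are made up for illustration.

```python
# Minimal minimax on a nested-list game tree: leaves are scores, and inner
# nodes alternate between a maximizing and a minimizing player.
def minimax(node, maximizing):
    if isinstance(node, (int, float)):  # leaf: return its score directly
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Root is a max node over two min nodes:
#   left  min node over leaves (3, 12) -> 3
#   right min node over leaves (8, 2)  -> 2
tree = [[3, 12], [8, 2]]
print(minimax(tree, True))  # -> 3
```

This mirrors the structure above: max_value and min_value call each other, and the recursion bottoms out at an evaluated leaf.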
Question 3: Alpha-Beta Pruning

```python
class AlphaBetaAgent(MultiAgentSearchAgent):
    """
    Your minimax agent with alpha-beta pruning (question 3)
    """

    def getAction(self, gameState):
        """
        Returns the minimax action using self.depth and self.evaluationFunction
        """
        "*** YOUR CODE HERE ***"
        # Add pruning to the minimax framework above. The overall structure is
        # unchanged: threading alpha (A) and beta (B) through the recursion is
        # enough to improve efficiency noticeably.
        # Because min and max nodes alternate, A and B take turns being
        # updated: pruning is the upper layers constraining the lower ones,
        # and those constraints themselves come from branches explored earlier.
        GhostIndex = [i for i in range(1, gameState.getNumAgents())]

        def term(state, d):
            return state.isWin() or state.isLose() or d == self.depth

        def min_value(state, d, ghost, A, B):  # minimizer
            if term(state, d):
                return self.evaluationFunction(state)

            v = float("inf")
            # Note: these are the ghost's legal actions
            for action in state.getLegalActions(ghost):
                if ghost == GhostIndex[-1]:  # next is maximizer with pacman
                    v = min(v, max_value(state.getNextState(ghost, action), d + 1, A, B))
                else:  # next is minimizer with next-ghost
                    v = min(v, min_value(state.getNextState(ghost, action), d, ghost + 1, A, B))

                # The key difference from plain minimax: v only ever shrinks
                # at a min node, so once v < A the max player above will never
                # choose this branch -- the remaining actions can only make v
                # smaller. Stop exploring and return immediately.
                if v < A:
                    return v
                # Runs on every iteration: keep tightening the upper bound
                # that the remaining siblings must beat
                B = min(B, v)
            return v

        def max_value(state, d, A, B):  # maximizer
            if term(state, d):
                return self.evaluationFunction(state)

            v = float("-inf")
            for action in state.getLegalActions(0):
                v = max(v, min_value(state.getNextState(0, action), d, 1, A, B))
                # Symmetric to the min node: prune once v exceeds B
                if v > B:
                    return v
                A = max(A, v)
            return v

        def alphabeta(state):
            v = float("-inf")
            act = None
            A = float("-inf")
            B = float("inf")

            for action in state.getLegalActions(0):  # maximizing
                tmp = min_value(gameState.getNextState(0, action), 0, 1, A, B)

                # Track the best value seen so far and the action that gave it
                if v < tmp:  # same as v = max(v, tmp)
                    v = tmp
                    act = action

                # B stays +inf at the root, so this check never actually
                # prunes; it is kept only for symmetry with max_value
                if v > B:
                    return v
                A = max(A, v)
            return act

        return alphabeta(gameState)
```
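The effect of the pruning can be made visible on a toy tree by counting how many leaves get evaluated. This standalone sketch uses the same strict-inequality pruning convention as the agent above; the tree and its values are the classic three-branch textbook example, not part of the project code.

```python
# Alpha-beta on a nested-list game tree, counting leaf evaluations
# so the pruning is visible.
def alphabeta(node, maximizing, A=float("-inf"), B=float("inf"), stats=None):
    if isinstance(node, (int, float)):  # leaf
        if stats is not None:
            stats["leaves"] += 1
        return node
    if maximizing:
        v = float("-inf")
        for child in node:
            v = max(v, alphabeta(child, False, A, B, stats))
            if v > B:        # the min player above will never allow this branch
                return v
            A = max(A, v)
    else:
        v = float("inf")
        for child in node:
            v = min(v, alphabeta(child, True, A, B, stats))
            if v < A:        # the max player above will never choose this branch
                return v
            B = min(B, v)
    return v

# Max root over three min nodes; pruning cuts the middle branch short
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
stats = {"leaves": 0}
print(alphabeta(tree, True, stats=stats))  # -> 3
print(stats["leaves"])                     # -> 7 (full minimax visits all 9)
```

The value returned is identical to plain minimax; only the amount of work changes.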
Question 4: Expectimax

```python
class ExpectimaxAgent(MultiAgentSearchAgent):
    """
    Your expectimax agent (question 4)
    """

    def getAction(self, gameState):
        """
        Returns the expectimax action using self.depth and self.evaluationFunction

        All ghosts should be modeled as choosing uniformly at random from their
        legal moves.
        """
        "*** YOUR CODE HERE ***"
        GhostIndex = [i for i in range(1, gameState.getNumAgents())]

        def term(state, d):
            return state.isWin() or state.isLose() or d == self.depth

        # Chance node: replaces the minimizer from minimax
        def exp_value(state, d, ghost):
            if term(state, d):
                return self.evaluationFunction(state)

            v = 0
            # Each ghost action is taken with probability 1/n, where n is the
            # number of legal moves the ghost has. If Pacman's move ended the
            # game, the state is terminal and was already caught by term()
            # above, so n is never zero here.
            prob = 1 / len(state.getLegalActions(ghost))

            for action in state.getLegalActions(ghost):
                if ghost == GhostIndex[-1]:
                    v += prob * max_value(state.getNextState(ghost, action), d + 1)
                else:
                    v += prob * exp_value(state.getNextState(ghost, action), d, ghost + 1)
            return v

        def max_value(state, d):  # maximizer
            if term(state, d):
                return self.evaluationFunction(state)

            v = float("-inf")
            for action in state.getLegalActions(0):
                v = max(v, exp_value(state.getNextState(0, action), d, 1))
            return v

        # Root: pick the action whose chance-node value is largest
        res = [(action, exp_value(gameState.getNextState(0, action), 0, 1)) for action in
               gameState.getLegalActions(0)]
        res.sort(key=lambda k: k[1])
        return res[-1][0]
```
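The only change relative to minimax is at the opponent's nodes: instead of taking the minimum, a chance node averages its children. A standalone sketch on the same kind of toy tree as before (values made up for illustration):

```python
# Expectimax on a nested-list game tree: chance nodes take the average of
# their children, modeling an opponent that moves uniformly at random.
def expectimax(node, maximizing):
    if isinstance(node, (int, float)):  # leaf
        return node
    values = [expectimax(child, not maximizing) for child in node]
    if maximizing:
        return max(values)
    return sum(values) / len(values)  # uniform chance node

# A tree that minimax values at 3 is worth more here, because the opponent
# no longer plays optimally:
tree = [[3, 12], [8, 2]]
print(expectimax(tree, True))  # -> 7.5, i.e. max(avg(3, 12), avg(8, 2))
```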
Question 5: Evaluation Function

```python
def betterEvaluationFunction(currentGameState):
    """
    Your extreme ghost-hunting, pellet-nabbing, food-gobbling, unstoppable
    evaluation function (question 5).

    DESCRIPTION: start from the current game score and add the reciprocal of
    the Manhattan distance to the closest food pellet, so that states closer
    to food score higher. If no food is left, foodDist stays at infinity and
    the bonus term is 1/inf = 0.
    """
    "*** YOUR CODE HERE ***"
    pos = currentGameState.getPacmanPosition()
    foodList = currentGameState.getFood().asList()

    # Base score plus a bonus that grows as the closest pellet gets nearer
    score = currentGameState.getScore()
    foodDist = float("inf")
    for food in foodList:
        foodDist = min(foodDist, util.manhattanDistance(food, pos))
    score += 1.0 / foodDist
    return score
```
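A natural extension would be to use the ghost information as well. The sketch below is a standalone illustration, not tied to the GameState API: it scores a state from plain tuples, penalizing proximity to active ghosts and rewarding proximity to scared ones. The function name and all the feature weights are made-up assumptions, not part of the project code.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

# Illustrative evaluation: base score, plus reciprocal distance to the nearest
# food, plus a bonus for being near a scared (edible) ghost, minus a heavy
# penalty for standing next to an active ghost. All weights are arbitrary.
def sketchEvaluation(score, pos, foodList, ghosts):
    # ghosts: list of (position, scaredTimer) pairs
    value = float(score)
    if foodList:
        value += 1.0 / min(manhattan(pos, f) for f in foodList)
    for gpos, scared in ghosts:
        d = manhattan(pos, gpos)
        if scared > 0:
            value += 2.0 / max(d, 1)      # chase scared ghosts
        elif d < 2:
            value -= 100.0                # never step next to an active ghost
    return value

# Being next to an active ghost dominates the score:
print(sketchEvaluation(10, (1, 1), [(3, 1)], [((1, 2), 0)]))  # -> -89.5
```

Tuning such weights against the autograder's average-score thresholds is the main work of question 5.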
3. Test Screenshots

The test framework is already provided; just run the autograder.

Minimax testing:

Test command:

```shell
python autograder.py -q q2
```

Screenshot:

Alpha-Beta Pruning testing:

Test command:

```shell
python autograder.py -q q3 --no-graphics
```

Screenshot:

Expectimax testing:

Test command:

```shell
python autograder.py -q q4 --no-graphics
```

Screenshot:

Evaluation Function testing:

Test command:

```shell
python autograder.py -q q5 --no-graphics
```

Screenshot: