AI Fundamentals: Adversarial Search II

Non-deterministic Transitions

AND-OR Search Trees

• In deterministic environments, branching only occurs due to the agent's choice (OR nodes)
• In non-deterministic environments, the environment's choice (its randomness) also causes branching and must be taken into account (AND nodes)
• Solution is a subtree of the AND-OR tree that:
— Has a goal node at every leaf
— Specifies an action at each OR node
— Includes every outcome branch of its AND nodes

AND-OR Graph Search
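
The solution conditions listed above can be sketched as a recursive search that alternates between OR nodes (the agent picks one action) and AND nodes (every outcome must be covered). This is a minimal sketch, not a definitive implementation: the function names, the nested-list plan format, and the toy model `TOY_MODEL` are all hypothetical and chosen only for illustration.

```python
def and_or_search(initial, is_goal, actions, results):
    """Return a conditional plan that reaches a goal on every outcome, or None."""
    return or_search(initial, is_goal, actions, results, path=())


def or_search(state, is_goal, actions, results, path):
    """OR node: the agent needs only one action that works."""
    if is_goal(state):
        return []                      # empty plan: already at a goal
    if state in path:
        return None                    # loop on the current path: give up here
    for action in actions(state):
        outcomes = results(state, action)
        subplans = and_search(outcomes, is_goal, actions, results, path + (state,))
        if subplans is not None:
            return [action, subplans]  # action plus one sub-plan per outcome
    return None


def and_search(states, is_goal, actions, results, path):
    """AND node: every possible outcome must have a sub-plan."""
    subplans = {}
    for state in states:
        plan = or_search(state, is_goal, actions, results, path)
        if plan is None:
            return None
        subplans[state] = plan
    return subplans


# Hypothetical toy model: (state, action) -> set of possible outcome states.
TOY_MODEL = {
    ("start", "risky"): {"goal", "stuck"},
    ("start", "safe"): {"mid"},
    ("mid", "step"): {"goal"},
}


def toy_actions(state):
    return sorted(a for (s, a) in TOY_MODEL if s == state)


def toy_results(state, action):
    return TOY_MODEL[(state, action)]


plan = and_or_search("start", lambda s: s == "goal", toy_actions, toy_results)
print(plan)
```

In this toy model the "risky" action is rejected because one of its outcomes ("stuck") has no way to reach the goal, so the plan falls back to "safe".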

Adversarial Optimal Decisions

• Time complexity: O(b^m)
• Space complexity: O(bm)
• Chess, on average: b = 30, m = 40

Reducing Complexity

• Reducing complexity of b^m
— Reduce branching factor (b)?
— Reduce maximum search depth (m)?
— Searching in a graph rather than a tree? In a tree, connections between states are hierarchical, whereas in a graph states can be connected in arbitrary ways.

Reducing Branching Factor

• Alpha-Beta Pruning
— Evaluate which nodes/branches would not affect MIN/MAX’s decision
— Based on keeping track of two parameters:
◦ α - value of the best (highest) choice found so far along MAX's path
◦ β - value of the best (lowest) choice found so far along MIN's path
• Update these values while traversing the tree
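
The α/β bookkeeping described above can be sketched as follows. This is a minimal illustration on a hypothetical nested-list tree (leaves are utilities for MAX); the `visited` list is an added assumption, used only to show which leaves are actually evaluated.

```python
import math


def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, visited=None):
    """Minimax value with alpha-beta pruning; records evaluated leaves in `visited`."""
    if visited is None:
        visited = []
    if not isinstance(node, list):
        visited.append(node)              # leaf actually evaluated
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, value)     # best option for MAX so far
            if alpha >= beta:
                break                     # MIN would never allow this branch
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, value)           # best option for MIN so far
        if alpha >= beta:
            break                         # MAX already has something better
    return value


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
leaves = []
value = alphabeta(tree, visited=leaves)
print(value, leaves)
```

On this tree the second MIN node is abandoned after its first leaf (2), because MAX already has a guaranteed 3 from the first branch.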

Move Ordering

• Pruning is strongly affected by the ordering of the moves in the tree
— A good ordering would enable us to prune many nodes
• Move ordering is often game-dependent knowledge (heuristic)
• Dynamic move-ordering (killer-move heuristic) can exploit effective pruning information already found in the search tree
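
To make the effect of ordering concrete, the sketch below counts how many leaves alpha-beta evaluates on the same hypothetical game under two different move orders. The function name `count_leaves` and both trees are illustrative assumptions.

```python
import math


def count_leaves(node, alpha=-math.inf, beta=math.inf, maximizing=True):
    """Return (value, number of leaves evaluated) under alpha-beta pruning."""
    if not isinstance(node, list):
        return node, 1
    best = -math.inf if maximizing else math.inf
    total = 0
    for child in node:
        value, n = count_leaves(child, alpha, beta, not maximizing)
        total += n
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:
            break                         # remaining siblings are pruned
    return best, total


# Same positions, different ordering of moves and replies.
bad_order = [[2, 4, 6], [14, 5, 2], [3, 12, 8]]
good_order = [[3, 12, 8], [2, 4, 6], [2, 5, 14]]

print(count_leaves(bad_order))
print(count_leaves(good_order))
```

Both orderings return the same value, but trying the strongest move first lets the search cut off the other two branches after a single reply each.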

Reducing Depth - Killer Move

• Dynamic heuristic to determine a “good” ordering
• Search two plies ahead until Max (alt. Min) causes a beta (alt. alpha) cutoff
• The move that caused the cutoff is the killer move

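During the search, the algorithm looks two plies ahead until the MAX (or MIN) player causes a cutoff. A move that causes a cutoff is called a killer move; it is then tried first at other nodes of the same depth, where it is likely to cause a cutoff again.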
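
A minimal sketch of the killer-move idea, assuming a hypothetical game tree given as nested dicts (move name to child) and one killer slot per depth. Storing killers by depth is a common convention but is an assumption here, as are all names in the code.

```python
import math


def alphabeta_killer(node, depth=0, alpha=-math.inf, beta=math.inf,
                     maximizing=True, killers=None, stats=None):
    """Alpha-beta that tries the last cutoff move at this depth first."""
    killers = {} if killers is None else killers
    stats = {"leaves": 0} if stats is None else stats
    if not isinstance(node, dict):
        stats["leaves"] += 1
        return node
    moves = list(node)
    killer = killers.get(depth)
    if killer in node:                    # killer is legal here: try it first
        moves.remove(killer)
        moves.insert(0, killer)
    best = -math.inf if maximizing else math.inf
    for move in moves:
        value = alphabeta_killer(node[move], depth + 1, alpha, beta,
                                 not maximizing, killers, stats)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:
            killers[depth] = move         # remember the move that caused the cutoff
            break
    return best


tree = {
    "a": {"x": 3, "y": 12, "z": 8},
    "b": {"x": 9, "y": 7, "z": 2},
    "c": {"x": 14, "y": 5, "z": 1},
}
killers, stats = {}, {"leaves": 0}
value = alphabeta_killer(tree, killers=killers, stats=stats)
print(value, killers, stats)
```

Here reply "z" causes the cutoff under move "b", so it is tried first under move "c" and prunes that branch after one leaf instead of three.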

Reducing m - Evaluation Function

Weighted linear function over features of a state

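Example: chess. Current state: pieces and their positions (structure)

Example: Magic: The Gathering (card game). Current state: life totals, cards in play, and cards in hand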
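
A weighted linear evaluation has the form Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s). The sketch below assumes material-count features for chess, with each feature defined as own piece count minus the opponent's. The weights are the conventional textbook piece values, used here purely as an illustration and not taken from the source.

```python
def linear_eval(features, weights):
    """Weighted sum of state features: sum of w_i * f_i(s)."""
    return sum(weights[name] * value for name, value in features.items())


# Illustrative weights (assumption): conventional piece values.
MATERIAL_WEIGHTS = {"pawn": 1.0, "knight": 3.0, "bishop": 3.0, "rook": 5.0, "queen": 9.0}

# Hypothetical position: one pawn up, one bishop down.
features = {"pawn": 1, "knight": 0, "bishop": -1, "rook": 0, "queen": 0}

score = linear_eval(features, MATERIAL_WEIGHTS)
print(score)
```

Such a function lets the search stop at a fixed depth and score the position there instead of playing every line out to the end of the game, which is how it reduces m.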

Graph Search

• As in non-adversarial search, many states will be revisited, since the search may reach the same state along different paths
• However, only recording visited states is not enough (since MIN can deviate in the future)
• Need to store actual loop paths (memory intensive)
— Requires “caching” strategy
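
One possible reading of the "caching" strategy is a bounded table of already-evaluated positions that discards the least recently used entries when memory runs out. The sketch below is an assumption about what such a cache could look like, not the method the notes prescribe; the class name and key format are hypothetical.

```python
from collections import OrderedDict


class BoundedCache:
    """Least-recently-used cache for search results, capped at `capacity` entries."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)       # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


# Hypothetical keys: (state, player to move).
cache = BoundedCache(capacity=2)
cache.put(("s1", "MAX"), 3)
cache.put(("s2", "MIN"), -1)
cache.get(("s1", "MAX"))                  # touch s1 so s2 becomes the oldest
cache.put(("s3", "MAX"), 5)               # evicts s2
print(cache.get(("s2", "MIN")))
```

Including the player to move in the key is one simple way to avoid confusing the same board reached on different turns.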

Stochastic Games

• Outcome of agent choices is not deterministic
— Games must take into account multiple outcomes for the player
• Solution: weight outcomes by their probability
— Expected value

Expectiminimax
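
Expectiminimax extends minimax with chance nodes, whose value is the probability-weighted average (expected value) of their children. This is a minimal sketch on a hypothetical tree encoding: a leaf is a number, and an inner node is a `(kind, children)` pair where chance nodes list `(probability, child)` pairs.

```python
def expectiminimax(node):
    """Value of a game tree with MAX, MIN, and chance nodes."""
    if isinstance(node, (int, float)):
        return node                       # leaf utility for MAX
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":
        # Expected value: weight each outcome by its probability.
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind!r}")


# Hypothetical game: MAX picks a move, chance decides, MIN replies.
tree = ("max", [
    ("chance", [(0.5, ("min", [2, 4])), (0.5, ("min", [6, 8]))]),
    ("chance", [(0.9, ("min", [1, 3])), (0.1, ("min", [20, 30]))]),
])

value = expectiminimax(tree)
print(value)
```

The second move has the larger best case (20) but it is unlikely, so its expected value is lower than that of the first move, which MAX therefore prefers.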
