Games
adversarial search problems/games: our agents have one or more adversaries who attempt to keep them from reaching their goals
Ordinary search yields a plan; a CSP yields a configuration/assignment; a game yields a strategy/policy.
Types of games
- actions: deterministic or stochastic outcomes
- number of players
- zero-sum games: the agents' utilities sum to a constant, so the agents have opposite utilities (adversarial, pure competition)
- general games: agents have independent utilities; cooperation, competition, indifference, … are all possible
- perfect information: whether every player knows all relevant information (e.g., in Texas hold'em some information is hidden)
Deterministic zero-sum problems are defined by:
- s: states
- player(s): the player who moves at state s
- actions(s): the set of legal actions at s
- result(s, a): the transition model, returning the state that results from taking action a in s
- terminal-test(s): whether s is a terminal state
- terminal-utilities(s, p): the utility of terminal state s for player p
Normal search produces a complete plan; a game produces a policy/strategy $s \rightarrow a$.
Minimax
We assume our opponent behaves optimally and tries to minimize our value.
- terminal utilities: the values of terminal states are known, $V(s) = \text{known}$
- state value for agent-controlled states: $V(s) = \max_{s' \in \text{successors}(s)} V(s')$
- state value for opponent-controlled states: $V(s) = \min_{s' \in \text{successors}(s)} V(s')$

Computing minimax values is just a depth-first search / post-order traversal of the game tree. In the finite case it yields the optimal solution against an optimal opponent, and its cost is the same as a complete DFS:
- time complexity: $O(b^m)$
- space complexity: $O(bm)$
We can use a depth-limited search instead, but we then need an evaluation function for the non-terminal states:
- optimality is no longer guaranteed
- evaluation itself costs time, so there is a trade-off between computational cost and accuracy
evaluation function
- utility for a win state should be higher than for a tie
- efficient: computation should be quick
- consistent: correlated with the actual chance of winning the game
Types of evaluation functions:
- a linear combination of features
- table-based evaluation function
- machine-learning-based evaluation function
alpha-beta pruning
- best-case time complexity (with good move ordering): $O(b^{m/2})$
- pruning can make the computed values of internal nodes incorrect, but the minimax value of the root is unchanged
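A sketch of minimax with alpha-beta pruning, using the same hypothetical tree representation (numbers are terminal utilities, lists are internal nodes):

```python
def alphabeta(node, alpha, beta, maximizing):
    if isinstance(node, (int, float)):    # terminal state: utility is known
        return node
    if maximizing:
        value = float('-inf')
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:             # beta cutoff: Min will never allow this branch
                break
        return value
    else:
        value = float('inf')
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:             # alpha cutoff: Max will never allow this branch
                break
        return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, float('-inf'), float('inf'), True))  # → 3
```

The root value matches plain minimax, even though pruned internal nodes may return bounds rather than exact values.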
evaluation functions
- input: a state
- output: an estimate of the minimax value of the node
- frequently used in depth-limited minimax
- the most common evaluation function is a linear combination of features:
$\text{Eval}(s) = w^T f(s)$
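A sketch of $\text{Eval}(s) = w^T f(s)$; the feature values and weights below are made-up numbers for illustration:

```python
def eval_linear(weights, features):
    # Dot product of the weight vector w with the feature vector f(s).
    return sum(w * f for w, f in zip(weights, features))

weights  = [9.0, 5.0, 3.0]   # e.g., hypothetical piece values: queen, rook, bishop
features = [1, 2, 1]         # e.g., material counts extracted from the state
print(eval_linear(weights, features))  # → 22.0
```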
expectimax
Introduce chance nodes into the game tree: consider the average case, i.e., expected utility.
- rules:
  - for agent-controlled states: $V(s) = \max_{s' \in \text{successors}(s)} V(s')$
  - for terminal states: $V(s) = \text{known}$
  - for chance states: $V(s) = \sum_{s' \in \text{successors}(s)} p(s' \mid s)\, V(s')$
- can't prune in general
- we can still run depth-limited search with an evaluation function as the estimate
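An expectimax sketch following the three rules above. The tagged-tuple tree representation (chance nodes carry `(probability, child)` pairs) is an assumption for illustration:

```python
def expectimax(node):
    kind = node[0]
    if kind == 'leaf':       # terminal state: utility is known
        return node[1]
    if kind == 'max':        # agent-controlled state: take the best child
        return max(expectimax(child) for child in node[1])
    if kind == 'chance':     # chance state: probability-weighted average of children
        return sum(p * expectimax(child) for p, child in node[1])
    raise ValueError(f"unknown node kind: {kind}")

tree = ('max', [
    ('chance', [(0.5, ('leaf', 8)), (0.5, ('leaf', 2))]),   # expectation 5.0
    ('chance', [(0.9, ('leaf', 6)), (0.1, ('leaf', 0))]),   # expectation 5.4
])
print(expectimax(tree))  # → 5.4
```

Note that the second branch wins on expectation even though the first branch has the better worst case, which is exactly where expectimax and minimax disagree.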
Mixed layer types
Layers of different types (max, min, chance) can be combined in the search tree as the situation requires.
General Games
multi-agent utilities:
- utility: a tuple in which each element represents one agent's utility
- at each layer, the moving agent maximizes its own utility component, ignoring the others' utilities
- cooperation and competition can then emerge dynamically
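A sketch of this generalization: each internal node is labeled with the agent to move, and that agent picks the child whose utility tuple maximizes its own component. The node format here is a hypothetical choice for illustration:

```python
def multi_value(node):
    if node[0] == 'leaf':
        return node[1]                     # a tuple: one utility per agent
    agent, children = node[1], node[2]
    # Agent i maximizes component i of the tuple, ignoring the other agents.
    return max((multi_value(c) for c in children), key=lambda u: u[agent])

tree = ('node', 0, [                                          # agent 0 to move
    ('node', 1, [('leaf', (1, 6, 6)), ('leaf', (7, 1, 2))]),  # agent 1 picks (1, 6, 6)
    ('node', 1, [('leaf', (5, 2, 5)), ('leaf', (3, 4, 4))]),  # agent 1 picks (3, 4, 4)
])
print(multi_value(tree))  # → (3, 4, 4)
```

In this example agent 0's best outcome (7) never survives, because agent 1 would deviate first; that interplay is where implicit cooperation or competition arises.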
Utility
We need to generate a viable utility function. Principle of maximum expected utility: a rational agent must select the action that maximizes its expected utility.
Rational preferences must satisfy the axioms of rationality. Given preferences satisfying these constraints, there exists a real-valued utility function that represents them. An agent's preference between a flat payment and a lottery determines its risk attitude:
risk-neutral
risk-averse
risk-seeking
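A numeric illustration of the three risk attitudes: compare a guaranteed 500 against a 50/50 lottery over 0 and 1000, under three hypothetical utility functions over money:

```python
import math

def expected_utility(u, lottery):
    # Expected utility of a lottery given as (probability, outcome) pairs.
    return sum(p * u(x) for p, x in lottery)

lottery = [(0.5, 0.0), (0.5, 1000.0)]

risk_neutral = lambda x: x             # linear utility: indifferent
risk_averse  = lambda x: math.sqrt(x)  # concave utility: prefers the sure payment
risk_seeking = lambda x: x ** 2        # convex utility: prefers the lottery

print(expected_utility(risk_neutral, lottery), risk_neutral(500.0))  # 500.0 vs 500.0
print(expected_utility(risk_averse,  lottery), risk_averse(500.0))   # ~15.8 vs ~22.4
print(expected_utility(risk_seeking, lottery), risk_seeking(500.0))  # 500000.0 vs 250000.0
```

The lottery and the flat payment have the same expected monetary value, so only the curvature of the utility function drives the preference.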
Exercises
- for expectimax problems whose utilities have known upper and lower bounds, pruning may still be possible
- multi-player (non-zero-sum) games cannot be pruned, since cooperation is possible