Games
adversarial search problems/games: our agents have one or more adversaries who attempt to keep them from reaching their goals
Ordinary search yields a plan; a CSP yields a configuration/assignment; a game yields a strategy/policy.
Types of games
- actions: deterministic or stochastic outcomes
- number of players
- zero-sum games: the agents' utilities sum to a constant, so the agents have opposite utilities (adversarial, pure competition)
- general games: agents have independent utilities; cooperation, competition, indifference, … are all possible
- perfect information: whether every player knows all relevant information (e.g., in Texas hold'em some information is hidden)
Deterministic zero-sum problems are defined by:
- s: states
- player(s): the player who moves at state s
- actions(s): the set of legal actions at s
- result(s, a): the transition model, returning the state that results from taking action a in s
- terminal-test(s): whether s is a terminal state
- terminal-utilities(s, p): the utility of terminal state s for player p
Normal search produces a complete plan; a game produces a policy/strategy $s \rightarrow a$.
Minimax
We assume our opponent behaves optimally and tries to minimize our value.
- terminal utilities: the values of terminal states are known, $V(s) = \text{known}$
- state value for agent-controlled states: $V(s) = \max_{s' \in \text{successors}(s)} V(s')$
- state value for opponent-controlled states: $V(s) = \min_{s' \in \text{successors}(s)} V(s')$

Computing minimax values is just a depth-first search / post-order traversal of the game tree. In the finite case it yields the optimal solution against an optimal opponent, and its cost is the same as a complete DFS:
- time complexity: $O(b^m)$
- space complexity: $O(bm)$
We can use a depth-limited search instead, but we then need an evaluation function for the non-terminal states:
- optimality is no longer guaranteed
- evaluation itself costs time, so there is a trade-off between computational cost and accuracy
evaluation function
- utility for a win state should be higher than for a tie
- efficient: computation should be quick
- consistent: correlated with the actual chance of winning the game
Types of evaluation functions:
- a linear combination of features
- table-based evaluation function
- machine-learning-based evaluation function
alpha-beta pruning
- best-case time complexity (with good move ordering): $O(b^{m/2})$
- pruning can make the computed values of internal nodes incorrect, but the minimax value of the root is unchanged
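A sketch of minimax with alpha-beta pruning, using the same hypothetical tree representation (numbers are terminal utilities, lists are internal nodes):

```python
def alphabeta(node, alpha, beta, maximizing):
    if isinstance(node, (int, float)):    # terminal state: utility is known
        return node
    if maximizing:
        value = float('-inf')
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:             # beta cutoff: Min will never allow this branch
                break
        return value
    else:
        value = float('inf')
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:             # alpha cutoff: Max will never allow this branch
                break
        return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, float('-inf'), float('inf'), True))  # → 3
```

The root value matches plain minimax, even though pruned internal nodes may return bounds rather than exact values.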
evaluation functions
- input: a state
- output: an estimate of the minimax value of the node
- frequently used in depth-limited minimax
- the most common evaluation function is a linear combination of features:
$\text{Eval}(s) = w^T f(s)$
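A sketch of $\text{Eval}(s) = w^T f(s)$; the feature values and weights below are made-up numbers for illustration:

```python
def eval_linear(weights, features):
    # Dot product of the weight vector w with the feature vector f(s).
    return sum(w * f for w, f in zip(weights, features))

weights  = [9.0, 5.0, 3.0]   # e.g., hypothetical piece values: queen, rook, bishop
features = [1, 2, 1]         # e.g., material counts extracted from the state
print(eval_linear(weights, features))  # → 22.0
```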
expectimax
Introduce chance nodes into the game tree: consider the average case, i.e., expected utility.
- rules:
  - for agent-controlled states: $V(s) = \max_{s' \in \text{successors}(s)} V(s')$
  - for terminal states: $V(s) = \text{known}$
  - for chance states: $V(s) = \sum_{s' \in \text{successors}(s)} p(s' \mid s)\, V(s')$
- can't prune in general
- we can still run depth-limited search with an evaluation function as the estimate
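An expectimax sketch following the three rules above. The tagged-tuple tree representation (chance nodes carry `(probability, child)` pairs) is an assumption for illustration:

```python
def expectimax(node):
    kind = node[0]
    if kind == 'leaf':       # terminal state: utility is known
        return node[1]
    if kind == 'max':        # agent-controlled state: take the best child
        return max(expectimax(child) for child in node[1])
    if kind == 'chance':     # chance state: probability-weighted average of children
        return sum(p * expectimax(child) for p, child in node[1])
    raise ValueError(f"unknown node kind: {kind}")

tree = ('max', [
    ('chance', [(0.5, ('leaf', 8)), (0.5, ('leaf', 2))]),   # expectation 5.0
    ('chance', [(0.9, ('leaf', 6)), (0.1, ('leaf', 0))]),   # expectation 5.4
])
print(expectimax(tree))  # → 5.4
```

Note that the second branch wins on expectation even though the first branch has the better worst case, which is exactly where expectimax and minimax disagree.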
Mixed layer types
Layers of different types (max, min, chance) can be combined in the search tree as the situation requires.
General Games
multi-agent utilities:
- utility: a tuple in which each element represents one agent's utility
- at each layer, the moving agent maximizes its own utility component, ignoring the others' utilities
- cooperation and competition can then emerge dynamically
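A sketch of this generalization: each internal node is labeled with the agent to move, and that agent picks the child whose utility tuple maximizes its own component. The node format here is a hypothetical choice for illustration:

```python
def multi_value(node):
    if node[0] == 'leaf':
        return node[1]                     # a tuple: one utility per agent
    agent, children = node[1], node[2]
    # Agent i maximizes component i of the tuple, ignoring the other agents.
    return max((multi_value(c) for c in children), key=lambda u: u[agent])

tree = ('node', 0, [                                          # agent 0 to move
    ('node', 1, [('leaf', (1, 6, 6)), ('leaf', (7, 1, 2))]),  # agent 1 picks (1, 6, 6)
    ('node', 1, [('leaf', (5, 2, 5)), ('leaf', (3, 4, 4))]),  # agent 1 picks (3, 4, 4)
])
print(multi_value(tree))  # → (3, 4, 4)
```

In this example agent 0's best outcome (7) never survives, because agent 1 would deviate first; that interplay is where implicit cooperation or competition arises.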
Utility
We need to generate a viable utility function. Principle of maximum expected utility: a rational agent must select the action that maximizes its expected utility.
Rational preferences must satisfy the axioms of rationality. Given preferences satisfying these constraints, there exists a real-valued utility function that represents them. An agent's preference between a flat payment and a lottery determines its risk attitude:
risk-neutral
risk-averse
risk-seeking
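A numeric illustration of the three risk attitudes: compare a guaranteed 500 against a 50/50 lottery over 0 and 1000, under three hypothetical utility functions over money:

```python
import math

def expected_utility(u, lottery):
    # Expected utility of a lottery given as (probability, outcome) pairs.
    return sum(p * u(x) for p, x in lottery)

lottery = [(0.5, 0.0), (0.5, 1000.0)]

risk_neutral = lambda x: x             # linear utility: indifferent
risk_averse  = lambda x: math.sqrt(x)  # concave utility: prefers the sure payment
risk_seeking = lambda x: x ** 2        # convex utility: prefers the lottery

print(expected_utility(risk_neutral, lottery), risk_neutral(500.0))  # 500.0 vs 500.0
print(expected_utility(risk_averse,  lottery), risk_averse(500.0))   # ~15.8 vs ~22.4
print(expected_utility(risk_seeking, lottery), risk_seeking(500.0))  # 500000.0 vs 250000.0
```

The lottery and the flat payment have the same expected monetary value, so only the curvature of the utility function drives the preference.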
Exercises
- for expectimax problems whose utilities have known upper and lower bounds, pruning may still be possible
- multi-player (non-zero-sum) games cannot be pruned, since cooperation is possible