CS188 Games

Also known as Adversarial Search Problems

Deterministic zero-sum games

Components

  • Initial state
  • Players
  • Actions
  • Transition model
  • Terminal test
  • Terminal values (utilities)
  • State Value

The value of a state is the best possible outcome (utility) an agent can achieve from that state.

∀ non-terminal states: $V(s) = \max_{s' \in \text{successors}(s)} V(s')$
∀ terminal states: $V(s) = \text{known}$

Minimax

∀ agent-controlled states: $V(s) = \max_{s' \in \text{successors}(s)} V(s')$
∀ opponent-controlled states: $V(s) = \min_{s' \in \text{successors}(s)} V(s')$
∀ terminal states: $V(s) = \text{known}$

In implementation, minimax is essentially a post-order depth-first traversal of the game tree: each node's value is computed only after all of its children have been evaluated.
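That traversal can be sketched as follows, assuming a toy encoding where internal nodes are lists of children and leaves are terminal utilities (the encoding and names are illustrative, not from the course code):

```python
# Minimax as post-order DFS over an explicit game tree.
# Internal nodes are lists of children; leaves are terminal utilities.

def minimax_value(node, maximizing):
    """Return the minimax value of `node`; children are evaluated before parents."""
    if isinstance(node, (int, float)):   # terminal state: value is known
        return node
    values = [minimax_value(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Root is a max node; its two min-node children lead to terminal values.
tree = [[3, 12, 8], [2, 4, 6]]
print(minimax_value(tree, maximizing=True))  # -> 3
```

The min children evaluate to 3 and 2, so the maximizer at the root picks 3.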

Alpha-Beta Pruning

Let $x$ be the value of a node still being evaluated, $\alpha$ the best value found so far for the maximizer along the current path, and $\beta$ the best found so far for the minimizer. The node can only affect the final decision if $\beta \ge x \ge \alpha$; once that window closes, the remaining successors can be pruned.

Even with pruning, the search must still reach the bottom of the tree along the unpruned paths.
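A sketch of the pruning rule, reusing the same toy tree encoding (lists for internal nodes, numbers for terminal values; names are illustrative):

```python
# Alpha-beta pruning: stop expanding a node's children once the
# [alpha, beta] window closes, since the rest cannot change the answer.

def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    if isinstance(node, (int, float)):   # terminal state
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:            # remaining children cannot matter
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value

tree = [[3, 12, 8], [2, 4, 6]]
print(alphabeta(tree, maximizing=True))  # -> 3, same as plain minimax
```

On this tree, the second min node is abandoned after seeing the leaf 2, because the maximizer already has 3 guaranteed; the root value is unchanged.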

Evaluation Functions

An evaluation function takes a state and outputs an estimate of that state's value.

Most common form:

$Eval(s) = \sum_i w_i f_i(s)$

where $w_i$ are the weights and $f_i$ the features.
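A minimal sketch of such a weighted linear evaluation; the particular state fields and features below (food count, ghost distance) are made-up illustrations, not actual course project code:

```python
# Linear evaluation function: Eval(s) = sum_i w_i * f_i(s).

def linear_eval(state, weights, features):
    return sum(w * f(state) for w, f in zip(weights, features))

# Hypothetical Pacman-style state and features:
state = {"food_left": 3, "ghost_dist": 5}
features = [lambda s: -s["food_left"],   # fewer remaining pellets is better
            lambda s: s["ghost_dist"]]   # farther from ghosts is better
weights = [10.0, 1.0]
print(linear_eval(state, weights, features))  # -> -25.0
```

Tuning the weights trades off the features against each other; here eating food dominates keeping distance.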

Expectimax

Chance nodes:

Instead of considering the worst case, as minimizer nodes do, chance nodes consider the average (expected) case.

∀ agent-controlled states: $V(s) = \max_{s' \in \text{successors}(s)} V(s')$
∀ chance states: $V(s) = \sum_{s' \in \text{successors}(s)} P(s' \mid s)\, V(s')$
∀ terminal states: $V(s) = \text{known}$
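A sketch of the recursion, with an illustrative encoding: max nodes are `("max", children)`, chance nodes are `("chance", [(prob, child), ...])`, and leaves are numbers:

```python
# Expectimax: chance nodes take a probability-weighted average of
# successor values rather than a min.

def expectimax_value(node):
    if isinstance(node, (int, float)):   # terminal state
        return node
    kind, children = node
    if kind == "max":
        return max(expectimax_value(c) for c in children)
    # chance node: sum over s' of P(s'|s) * V(s')
    return sum(p * expectimax_value(c) for p, c in children)

tree = ("max", [("chance", [(0.5, 8), (0.5, 0)]),     # expectation 4.0
                ("chance", [(0.9, 3), (0.1, 13)])])   # expectation 4.0
print(expectimax_value(tree))  # ≈ 4.0
```

Note that a minimizer would value the first chance node at 0 and the second at 3; averaging can rank moves very differently from assuming a worst-case opponent.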

Mixed Layer Types

Many Players

Monte Carlo Tree Search

For games with a large branching factor

  • Evaluation by rollouts
  • Selective search

UCB Algorithm:
$UCB_1(n) = \dfrac{U(n)}{N(n)} + C \times \sqrt{\dfrac{\log N(\text{parent}(n))}{N(n)}}$

As $N(n) \to +\infty$, MCTS approaches minimax.
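The selection step can be sketched as follows, with node statistics kept in plain dicts ($U(n)$ = total utility, $N(n)$ = visit count); this is only the UCB1 piece of MCTS, and the representation is illustrative:

```python
# UCB1 child selection: balance exploitation (average utility U/N)
# against exploration (the square-root bonus for rarely visited nodes).
import math

def ucb1(node, parent_visits, C=1.414):
    if node["N"] == 0:
        return float("inf")          # always try unvisited children first
    exploit = node["U"] / node["N"]
    explore = C * math.sqrt(math.log(parent_visits) / node["N"])
    return exploit + explore

children = [{"U": 7, "N": 10}, {"U": 3, "N": 3}, {"U": 0, "N": 0}]
parent_N = sum(c["N"] for c in children)
best = max(children, key=lambda c: ucb1(c, parent_N))
print(best)  # -> {'U': 0, 'N': 0}: the unvisited child scores infinity
```

Larger $C$ pushes the search toward exploration; $C = 0$ degenerates into pure greedy exploitation of the current averages.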

General Games

Multi-agent utilities: use a tuple of utility values, one component per player; at each node, the player to move maximizes its own component.
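A minimal sketch of tuple-valued backup, assuming an illustrative encoding where internal nodes are child lists, leaves are utility tuples, and players alternate by level:

```python
# General-game value backup: each player maximizes its own tuple component.

def multi_value(node, player, num_players):
    if isinstance(node, tuple):      # terminal: tuple of per-player utilities
        return node
    nxt = (player + 1) % num_players
    values = [multi_value(child, nxt, num_players) for child in node]
    return max(values, key=lambda v: v[player])

# Two players: player 0 moves at the root, player 1 at the next level.
tree = [[(1, 2), (4, 3)], [(6, 1), (7, 4)]]
print(multi_value(tree, player=0, num_players=2))  # -> (7, 4)
```

Minimax is the special zero-sum case of this scheme, where maximizing your own component is the same as minimizing the opponent's.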


Summary

  • Minimax: when opponents behave optimally
  • Expectimax: when opponents behave sub-optimally
  • Monte Carlo Tree Search: when the branching factor is large
  • General games: use utility tuples