Also known as Adversarial Search Problems
Deterministic zero-sum games
Components
- Initial state
- Players
- Actions
- Transition model
- Terminal test
- Terminal values (utility)
- State Value
The state value is the best possible outcome (utility) the agent can achieve from that state.
$\forall$ non-terminal states, $V(s) = \max_{s' \in \text{successors}(s)} V(s')$
$\forall$ terminal states, $V(s) = \text{known}$
Minimax
$\forall$ agent-controlled states, $V(s) = \max_{s' \in \text{successors}(s)} V(s')$
$\forall$ opponent-controlled states, $V(s) = \min_{s' \in \text{successors}(s)} V(s')$
$\forall$ terminal states, $V(s) = \text{known}$
In implementation, minimax behaves like a postorder depth-first traversal: each node's value is computed only after all of its children's values are known.
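The postorder-DFS behavior can be sketched as follows. This is a minimal sketch, not a full implementation: the game is represented by hypothetical helpers (`successors`, `is_terminal`, `utility`, `agent_to_move`) defined here over a toy tree.

```python
def minimax(state):
    """Return the minimax value of `state` via postorder DFS."""
    if is_terminal(state):
        return utility(state)
    # Children are fully evaluated before this node's value is chosen.
    values = [minimax(s) for s in successors(state)]
    return max(values) if agent_to_move(state) else min(values)

# Toy game tree: root A is agent-controlled (max); B, C are
# opponent-controlled (min); lowercase nodes are terminal.
TREE = {'A': ['B', 'C'], 'B': ['b1', 'b2'], 'C': ['c1', 'c2']}
UTIL = {'b1': 3, 'b2': 12, 'c1': 2, 'c2': 8}

def successors(s): return TREE.get(s, [])
def is_terminal(s): return s in UTIL
def utility(s): return UTIL[s]
def agent_to_move(s): return s == 'A'

root_value = minimax('A')  # B -> min(3, 12) = 3; C -> min(2, 8) = 2; A -> max(3, 2) = 3
```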
Alpha-Beta Pruning
Let $x$ be the value of a node being looked up. A branch is explored only while its value can still satisfy $\alpha \le x \le \beta$; branches outside this window are pruned.
Alpha-beta pruning still needs to reach the bottom of the tree along the unpruned paths.
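A sketch of the pruning logic, under the same assumptions as before (hypothetical `successors`/`is_terminal`/`utility`/`agent_to_move` helpers over a toy tree). It returns the same value as plain minimax but skips branches that fall outside the $[\alpha, \beta]$ window.

```python
def alphabeta(state, alpha=float('-inf'), beta=float('inf')):
    """Minimax value of `state` with alpha-beta pruning."""
    if is_terminal(state):
        return utility(state)
    if agent_to_move(state):
        v = float('-inf')
        for s in successors(state):
            v = max(v, alphabeta(s, alpha, beta))
            if v >= beta:           # opponent would never let play reach here
                return v
            alpha = max(alpha, v)
        return v
    else:
        v = float('inf')
        for s in successors(state):
            v = min(v, alphabeta(s, alpha, beta))
            if v <= alpha:          # agent already has a better option elsewhere
                return v
            beta = min(beta, v)
        return v

# Same toy tree as the minimax sketch; at node C, c2 is pruned
# because c1's value (2) is already below alpha (3).
TREE = {'A': ['B', 'C'], 'B': ['b1', 'b2'], 'C': ['c1', 'c2']}
UTIL = {'b1': 3, 'b2': 12, 'c1': 2, 'c2': 8}

def successors(s): return TREE.get(s, [])
def is_terminal(s): return s in UTIL
def utility(s): return UTIL[s]
def agent_to_move(s): return s == 'A'

root_value = alphabeta('A')  # same answer as minimax: 3
```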
Evaluation Functions
An evaluation function takes a state and outputs an estimate of that state's value.
Most common form:
$Eval(s) = \sum_i w_i f_i(s)$
where $w_i$ are the weights and $f_i$ are the features.
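A minimal sketch of a linear evaluation function. The two features and their weights are made up for illustration; a real evaluator would use game-specific features.

```python
def evaluate(state, weights, features):
    """Linear evaluation: Eval(s) = sum_i w_i * f_i(s)."""
    return sum(w * f(state) for w, f in zip(weights, features))

# Hypothetical features for an abstract board game.
features = [
    lambda s: s['my_pieces'] - s['opp_pieces'],  # material advantage
    lambda s: s['mobility'],                     # number of legal moves
]
weights = [1.0, 0.5]

state = {'my_pieces': 5, 'opp_pieces': 3, 'mobility': 4}
score = evaluate(state, weights, features)  # (5 - 3) * 1.0 + 4 * 0.5 = 3.0... wait: 2.0 + 2.0 = 4.0
```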
Expectimax
Chance nodes:
Instead of considering the worst case, as minimizer nodes do, chance nodes consider the average (expected) case.
$\forall$ agent-controlled states, $V(s) = \max_{s' \in \text{successors}(s)} V(s')$
$\forall$ chance states, $V(s) = \sum_{s' \in \text{successors}(s)} P(s' \mid s)\, V(s')$
$\forall$ terminal states, $V(s) = \text{known}$
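The recursion above can be sketched like this. As before, `successors`, `is_terminal`, `utility`, `agent_to_move`, and the transition probabilities `prob` are hypothetical helpers defined over a toy tree.

```python
def expectimax(state):
    """Expectimax value: max at agent nodes, expectation at chance nodes."""
    if is_terminal(state):
        return utility(state)
    if agent_to_move(state):
        return max(expectimax(s) for s in successors(state))
    # Chance node: successor values weighted by P(s' | s).
    return sum(prob(state, s) * expectimax(s) for s in successors(state))

# Toy tree: A is agent-controlled; B and C are chance nodes.
TREE = {'A': ['B', 'C'], 'B': ['b1', 'b2'], 'C': ['c1']}
UTIL = {'b1': 0, 'b2': 10, 'c1': 4}
PROB = {('B', 'b1'): 0.5, ('B', 'b2'): 0.5, ('C', 'c1'): 1.0}

def successors(s): return TREE.get(s, [])
def is_terminal(s): return s in UTIL
def utility(s): return UTIL[s]
def agent_to_move(s): return s == 'A'
def prob(s, sp): return PROB[(s, sp)]

value = expectimax('A')  # B -> 0.5*0 + 0.5*10 = 5.0; C -> 4.0; A -> max = 5.0
```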
Mixed Layers Types
Many Players
Monte Carlo Tree Search
For games with a large branching factor
- Evaluation by rollouts
- Selective search
UCB Algorithm:
$UCB1(n) = \frac{U(n)}{N(n)} + C \times \sqrt{\frac{\log N(\text{parent}(n))}{N(n)}}$
As $N(n) \to +\infty$, MCTS approaches minimax.
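The UCB1 formula can be sketched directly; here $U(n)$ is the node's total utility, $N(n)$ its visit count, and the exploration constant `C = 1.41` (roughly $\sqrt{2}$) is an assumed default.

```python
import math

def ucb1(total_utility, visits, parent_visits, C=1.41):
    """UCB1 score: exploitation (average utility) + exploration bonus."""
    if visits == 0:
        return float('inf')  # unvisited children are always tried first
    exploitation = total_utility / visits
    exploration = C * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration
```

During selection, MCTS descends from the root by repeatedly picking the child with the highest UCB1 score; the exploration term shrinks as a node is visited more, shifting effort toward promising branches.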
General Games
Multi-agent utilities: use a tuple to represent each player's utility value.
Summary
- Minimax: when opponents behave optimally
- Expectimax: when opponents behave sub-optimally (randomly)
- Monte Carlo Tree Search: when a large branching factor
- General game: using tuples.