Adversarial search

Games

adversarial search problems (games): our agents have one or more adversaries who try to prevent them from reaching their goals
what a solution is: ordinary search → a plan; CSP → a configuration (assignment); games → a strategy/policy

Types of games

  1. actions: deterministic or stochastic outcomes
  2. number of players
  3. zero-sum games: the agents' utilities sum to a constant
    agents have opposite utilities
    adversarial, pure competition
    general games:
    agents have independent utilities
    cooperation, competition, indifference, …
  4. perfect information: whether all information is visible to every player; in Texas Hold'em, some information (the opponents' cards) is hidden

deterministic zero-sum problems

s: states
player(s): which player moves in state s
actions(s): the set of legal actions in s
result(s,a): the transition model; returns the state that results from taking action a in s
terminal-test(s): whether s is a terminal state
terminal-utilities(s,p): the utility of terminal state s for player p

normal search: a comprehensive plan
game: a policy/strategy $s \rightarrow a$

Minimax

Assume our opponent behaves optimally and tries to minimize our value.

  1. terminal utilities: the values of terminal states are known
    state value (MAX nodes): $v(s) = \max_{s' \in \text{successors}(s)} v(s')$
    state value (MIN nodes): $v(s) = \min_{s' \in \text{successors}(s)} v(s')$
    for terminal states: $v(s) = \text{known}$
  2. just like a depth-first search / post-order traversal of the game tree
    on a finite tree, minimax yields the optimal solution (against an optimal opponent)
    same complexity as an exhaustive DFS:
    time complexity: $O(b^m)$
    space complexity: $O(bm)$
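As a minimal sketch of this recursion (assuming a hypothetical game interface with `actions(s)`, `result(s, a)`, `is_terminal(s)`, and `utility(s)`; the names are illustrative, not from the course):

```python
def minimax_value(state, game, maximizing):
    """Post-order (DFS) evaluation of the game tree: returns v(state)."""
    if game.is_terminal(state):
        return game.utility(state)      # terminal utility is known
    values = [minimax_value(game.result(state, a), game, not maximizing)
              for a in game.actions(state)]
    # MAX nodes take the max over successors, MIN nodes take the min
    return max(values) if maximizing else min(values)
```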

we can use a depth-limited search instead, but then we need an evaluation function for the non-terminal states at the cutoff
the result is no longer guaranteed to be optimal
this trades computational cost against accuracy
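A depth-limited variant of the sketch above, falling back to an assumed `evaluate(state)` function at the cutoff:

```python
def depth_limited_minimax(state, game, depth, maximizing, evaluate):
    if game.is_terminal(state):
        return game.utility(state)
    if depth == 0:
        return evaluate(state)          # estimate instead of the exact value
    values = [depth_limited_minimax(game.result(state, a), game,
                                    depth - 1, not maximizing, evaluate)
              for a in game.actions(state)]
    return max(values) if maximizing else min(values)
```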
evaluation function

  1. utility for a winning state should be higher than for a tie
  2. efficient: computation should be quick
  3. consistent: correlated with the actual chance of winning the game

types of evaluation functions:
a linear combination of features
a table-based evaluation function
a machine-learning-based evaluation function

alpha-beta pruning

time complexity (with ideal move ordering): $O(b^{m/2})$
pruning can leave the computed values of internal nodes inexact, but the value and the decision at the root remain correct
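A sketch over the same hypothetical interface as before; `alpha` is the best value MAX can already guarantee on the path, `beta` the best for MIN, and subtrees that cannot affect the result are skipped:

```python
def alphabeta(state, game, alpha, beta, maximizing):
    if game.is_terminal(state):
        return game.utility(state)
    if maximizing:
        v = float('-inf')
        for a in game.actions(state):
            v = max(v, alphabeta(game.result(state, a), game, alpha, beta, False))
            if v >= beta:               # the MIN ancestor would never allow this
                return v                # prune the remaining siblings
            alpha = max(alpha, v)
        return v
    else:
        v = float('inf')
        for a in game.actions(state):
            v = min(v, alphabeta(game.result(state, a), game, alpha, beta, True))
            if v <= alpha:              # the MAX ancestor would never allow this
                return v
            beta = min(beta, v)
        return v
```

Call it as `alphabeta(root, game, float('-inf'), float('inf'), True)`.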

evaluation functions

input: state
output: an estimate of the minimax value of the node
frequently used in depth-limited minimax
a commonly used evaluation function is a linear combination of features:
$\mathrm{Eval}(s) = w^{\top} f(s)$
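For instance (the feature and weight names here are made up for illustration):

```python
def linear_eval(state, weights, features):
    """Eval(s) = w^T f(s): a weighted sum of hand-designed features."""
    return sum(w * f(state) for w, f in zip(weights, features))

# hypothetical chess-like usage:
#   features = [material_difference, mobility]
#   weights  = [9.0, 0.1]
#   linear_eval(state, weights, features)
```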

expectimax

introduce chance nodes into the game tree: consider the average case, i.e. the expected utility

  1. rules:
    for agent-controlled states: $v(s) = \max_{s' \in \text{successors}(s)} v(s')$
    for terminal states: $v(s) = \text{known}$
    for chance states: $v(s) = \sum_{s' \in \text{successors}(s)} p(s' \mid s)\, v(s')$
  2. in general, chance nodes cannot be pruned
  3. depth-limited search with an estimated evaluation function can still be used
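A sketch of this recursion, assuming hypothetical `game.node_type(s)` and `game.chance_outcomes(s)` methods (the latter returning `(probability, successor)` pairs):

```python
def expectimax(state, game):
    if game.is_terminal(state):
        return game.utility(state)
    kind = game.node_type(state)        # 'max', 'min', or 'chance'
    if kind == 'chance':
        # v(s) = sum over successors of p(s'|s) * v(s')
        return sum(p * expectimax(succ, game)
                   for p, succ in game.chance_outcomes(state))
    values = [expectimax(game.result(state, a), game)
              for a in game.actions(state)]
    return max(values) if kind == 'max' else min(values)
```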

Mixed layer types

layers of different node types (max, min, chance) can be combined in one search tree as the game requires

General Games

multi-agent utilities:
utility: a tuple in which each element is the utility of one agent
each layer tries to maximize its own agent's component of the utility tuple, ignoring the others
cooperation and competition can therefore emerge dynamically
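A sketch of this rule, again over an assumed interface (`game.player(s)` giving the index of the agent to move, `game.utilities(s)` returning the utility tuple):

```python
def multi_agent_value(state, game):
    """Returns a utility tuple; the moving agent maximizes its own entry."""
    if game.is_terminal(state):
        return game.utilities(state)    # one entry per agent
    i = game.player(state)              # index of the agent to move
    return max((multi_agent_value(game.result(state, a), game)
                for a in game.actions(state)),
               key=lambda u: u[i])      # agent i only looks at u[i]
```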

Utility

we need to come up with a viable utility function
principle of maximum expected utility: a rational agent must select the action that maximizes its expected utility
rational preferences: $A \succ B$ (A preferred to B), $A \sim B$ (indifference), $A \succeq B$ (B not preferred to A)
Axioms of rationality: orderability, transitivity, continuity, substitutability, monotonicity, decomposability
given preferences satisfying these constraints, there exists a real-valued utility function $U$ such that $U(A) \ge U(B) \Leftrightarrow A \succeq B$, and the utility of a lottery is its expected utility: $U([p_1, S_1; \dots; p_n, S_n]) = \sum_i p_i\, U(S_i)$
the preference between a guaranteed payment and a lottery of equal expected value determines the agent's risk attitude:
risk-neutral
risk-averse
risk-seeking
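As a worked example (the numbers are mine, not from the notes): compare a sure payment of 50 with a lottery paying 0 or 100 with equal probability; a concave utility prefers the sure payment, a linear one is indifferent, a convex one prefers the lottery:

```python
import math

lottery = [(0.5, 0.0), (0.5, 100.0)]    # (probability, payoff) pairs
sure = 50.0

for name, U in [("risk-averse (sqrt)",    math.sqrt),
                ("risk-neutral (linear)", lambda x: x),
                ("risk-seeking (square)", lambda x: x ** 2)]:
    eu = sum(p * U(x) for p, x in lottery)   # expected utility of the lottery
    verdict = ("lottery" if eu > U(sure)
               else "sure payment" if eu < U(sure) else "either")
    print(f"{name}: prefers the {verdict}")
```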

Exercises

for expectimax problems with known upper and lower bounds on the utilities, pruning may be possible
multi-player (general) games cannot be pruned, since agents may cooperate
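A sketch of the first point (my own formulation, not from the notes): if all utilities are at most some known `hi`, the unexplored probability mass at a chance node bounds its value from above, and the node can be abandoned once that bound falls below the MAX parent's `alpha`:

```python
def chance_value_with_pruning(outcomes, value_of, alpha, hi):
    """outcomes: (probability, state) pairs; utilities assumed <= hi."""
    total, remaining = 0.0, 1.0
    for p, s in outcomes:
        total += p * value_of(s)
        remaining -= p
        # optimistic bound: all remaining mass achieves the maximum hi
        if total + remaining * hi <= alpha:
            return total + remaining * hi   # MAX parent will not choose this node
    return total
```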
