A Reading of "Superhuman AI for multiplayer poker"

Introduction

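Why has poker become a challenge problem for AI and game theory? Because it captures the challenge of hidden information in an elegant and tractable form. In addition, AI for multiplayer games is regarded as the next key milestone.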

Theoretical and practical challenges of multiplayer games

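So far, AI milestones in games have all come in two-player zero-sum games, where the total payoff is fixed: whatever one player wins, the other loses. AlphaGo, for example, plays a two-player zero-sum game. In such games, a Nash equilibrium strategy guarantees at least the value of the game no matter what the opponent does, so in a game that is fair to both sides it is guaranteed not to lose in expectation. This is why AI algorithms for two-player zero-sum games aim to find a Nash equilibrium. In zero-sum games with three or more players, however, finding a Nash equilibrium is very difficult (even approximating one is hard in theory). Moreover, even if every player independently computes a Nash equilibrium, the players' combined strategy may not be a Nash equilibrium. The Lemonade Stand Game shown below illustrates this.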
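To make the two-player claim concrete, here is a minimal sketch of one standard way to approximate a Nash equilibrium in a two-player zero-sum game: regret matching in self-play, whose time-averaged strategies converge to an equilibrium. Regret matching is the update rule inside counterfactual regret minimization (CFR), but this sketch is a generic illustration and not the algorithm described in the paper. The game is a hypothetical variant of rock-paper-scissors in which Rock beating Scissors pays 2, chosen so that the equilibrium, (1/4, 1/2, 1/4), differs from the uniform starting strategy; `PAYOFF`, `solve` and `worst_case_payoff` are illustrative names.

```python
# Hypothetical zero-sum game: rock-paper-scissors where Rock beating Scissors pays 2.
# PAYOFF[a][b] is player 1's payoff when player 1 plays a and player 2 plays b;
# player 2 receives -PAYOFF[a][b] (zero-sum).
PAYOFF = [
    [0, -1, 2],   # Rock     vs (Rock, Paper, Scissors)
    [1, 0, -1],   # Paper
    [-2, 1, 0],   # Scissors
]
N = len(PAYOFF)


def regret_matching(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / N] * N


def action_values(strategy_opp, as_player):
    """Expected payoff of each pure action against the opponent's mixed strategy."""
    if as_player == 1:
        return [sum(PAYOFF[a][b] * strategy_opp[b] for b in range(N)) for a in range(N)]
    return [sum(-PAYOFF[a][b] * strategy_opp[a] for a in range(N)) for b in range(N)]


def solve(iterations):
    """Self-play regret matching; returns both players' average strategies."""
    regret = [[0.0] * N, [0.0] * N]
    strategy_sum = [[0.0] * N, [0.0] * N]
    for _ in range(iterations):
        s1 = regret_matching(regret[0])
        s2 = regret_matching(regret[1])
        # Both players update simultaneously against the other's current strategy.
        for p, (mine, theirs) in enumerate(((s1, s2), (s2, s1))):
            values = action_values(theirs, as_player=p + 1)
            expected = sum(v * q for v, q in zip(values, mine))
            for a in range(N):
                regret[p][a] += values[a] - expected
                strategy_sum[p][a] += mine[a]
    return [[x / iterations for x in s] for s in strategy_sum]


def worst_case_payoff(strategy):
    """Player 1's expected payoff when player 2 best-responds to `strategy`."""
    return min(sum(PAYOFF[a][b] * strategy[a] for a in range(N)) for b in range(N))


if __name__ == "__main__":
    avg1, avg2 = solve(50_000)
    print([round(x, 3) for x in avg1])            # approaches (0.25, 0.5, 0.25)
    print(worst_case_payoff(avg1))                # close to the game value 0
    print(worst_case_payoff([0.25, 0.5, 0.25]))   # → 0.0
```

`worst_case_payoff` scores a strategy against the opponent's best response. For the exact equilibrium it equals the game value 0, which is the "does not lose in expectation" guarantee; the uniform strategy, by contrast, loses 1/3 per game against an opponent who always plays Rock.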
Lemonade Stand Game
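In this game, each player picks a point on the ring and tries to be as far as possible from the other players. The left panel shows four players, with each color representing one of their Nash equilibria: at a Nash equilibrium, the players are spaced evenly around the ring. The right panel shows that if they each search for a Nash equilibrium independently, the players'…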
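The claim that independently computed equilibria need not combine into an equilibrium can be checked with a small simulation. This is a minimal sketch under simplifying assumptions of our own, not the exact game from the paper: a ring with 12 discrete spots, four players, and a payoff equal to the ring distance to the nearest rival. The constants `RING` and `PLAYERS` and the helpers `payoff` and `is_nash` are hypothetical. The three evenly spaced configurations all pass the equilibrium check, yet when each player follows its own choice of equilibrium, some of the resulting profiles let a player gain by moving.

```python
from itertools import product

RING = 12      # assumption: number of discrete spots on the ring
PLAYERS = 4    # assumption: four players, as in the figure description
STEP = RING // PLAYERS  # spacing between players in an evenly spaced profile


def ring_dist(a, b):
    """Shortest distance between two spots on the ring."""
    d = abs(a - b) % RING
    return min(d, RING - d)


def payoff(i, positions):
    """Simplified payoff: distance from player i to the nearest rival."""
    return min(ring_dist(positions[i], p)
               for j, p in enumerate(positions) if j != i)


def gain_from_deviating(i, positions):
    """How much player i could gain by moving alone to its best spot."""
    current = payoff(i, positions)
    best = current
    for spot in range(RING):
        trial = list(positions)
        trial[i] = spot
        best = max(best, payoff(i, trial))
    return best - current


def is_nash(positions):
    """A profile is a Nash equilibrium if no player gains by moving alone."""
    return all(gain_from_deviating(i, positions) == 0 for i in range(len(positions)))


# Evenly spaced equilibria: rotations of (0, 3, 6, 9).
equilibria = [tuple((STEP * i + r) % RING for i in range(PLAYERS)) for r in range(STEP)]

# Each player independently picks one equilibrium and plays its own spot in it.
mixed = [tuple(equilibria[choice[i]][i] for i in range(PLAYERS))
         for choice in product(range(len(equilibria)), repeat=PLAYERS)]
broken = [m for m in mixed if not is_nash(m)]

if __name__ == "__main__":
    print(all(is_nash(e) for e in equilibria))  # → True
    print(is_nash((2, 3, 6, 9)))                # → False
    print(len(broken), "of", len(mixed), "independent combinations are not equilibria")
```

For example, if player 0 follows the equilibrium (2, 5, 8, 11) while the others follow (0, 3, 6, 9), the profile is (2, 3, 6, 9): player 0 sits one spot from a rival and could move to spot 0 to be three spots from everyone.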
