Inverse Reinforcement Learning (IRL)

David Silver, a professor at UCL, lead programmer at Google DeepMind, and one of the creators of AlphaGo, describes IRL as follows:


Recently, a new set of approaches have been developed for learning from demonstration based on the concept of Inverse Optimal Control.

Rather than learn a mapping from perceptual features to actions, these approaches seek to learn a mapping from perceptual features to costs, such that a planner minimizing said costs will achieve the expert demonstrated behavior. 

These methods take advantage of the fact that while it is difficult for an expert to define an ordering of preferences, it is easy for an expert to demonstrate the desired behavior.


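In other words, it is hard for a human expert to spell out an ordering of preferences, but easy to demonstrate the desired behavior. That is the logic behind inverse reinforcement learning.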
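
To make the idea of "learning a mapping from features to costs" concrete, here is a minimal sketch. It is not the method from the quoted text. It assumes a linear cost over two made-up path features (length and roughness), a trivial stand-in planner that picks the cheapest of three hypothetical candidate paths, and a simple perceptron-style update in place of a full IRL algorithm. All names and numbers are illustrative assumptions.

```python
# Hypothetical toy data: each candidate path is summarized by two features,
# (length, roughness). None of these names or numbers come from the quote.
PATHS = {
    "short_rough": (2.0, 5.0),
    "medium_mixed": (3.0, 2.0),
    "long_smooth": (5.0, 0.0),
}
EXPERT_CHOICE = "long_smooth"  # the demonstrated behavior


def cost(weights, features):
    """Linear cost: weighted sum of the path's features."""
    return sum(w * f for w, f in zip(weights, features))


def plan(weights, paths=PATHS):
    """Stand-in planner: return the name of the cheapest candidate path."""
    return min(paths, key=lambda name: cost(weights, paths[name]))


def learn_costs(paths=PATHS, expert=EXPERT_CHOICE, lr=0.1, max_iters=100):
    """Adjust cost weights until the planner reproduces the expert's choice."""
    weights = [1.0, 0.0]  # assumed starting point: only length matters
    for _ in range(max_iters):
        planned = plan(weights, paths)
        if planned == expert:
            break
        # Perceptron-style step: raise the cost of the path the planner
        # chose and lower the cost of the path the expert demonstrated.
        weights = [
            w + lr * (fp - fe)
            for w, fp, fe in zip(weights, paths[planned], paths[expert])
        ]
    return weights


learned = learn_costs()
print(plan([1.0, 0.0]), "->", plan(learned))
```

The expert never states how much roughness matters relative to length. The weights are recovered only from the fact that the expert picked the long, smooth path, which is the point of the passage above: a demonstration is easier to give than an explicit preference ordering.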
