Summary of *Reinforcement Learning: An Introduction*, Chapter 8: "Planning and Learning with Tabular Methods"

This post summarizes Chapter 8 of *Reinforcement Learning: An Introduction*, covering models and planning, Dyna-Q, the effects of an incorrect model, and prioritized sweeping. The focus is on how to build a model from limited experience, how to improve the policy through simulation, and how to handle the exploration-exploitation trade-off when the model is inaccurate.

A new colleague has joined our group and needs an introduction to RL, so I am starting him on Silver's course.

For myself, I am adding the requirement of a careful read through *Reinforcement Learning: An Introduction*.

My previous reading was not very thorough; this time I want to be more careful and write a short summary of each topic as I go.





8.1 Models and Planning

By a model of the environment we mean anything that an agent can use to predict how the environment will respond to its actions.


The word planning is used in several different ways in different fields. We use the term to refer to any computational process that takes a model as input and produces or improves a policy for interacting with the modeled environment.

The difference is that whereas planning uses simulated experience generated by a model, learning methods use real experience generated by the environment. Of course this difference leads to a number of other differences.
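The idea that planning is just "learning from simulated experience" can be made concrete with random-sample one-step tabular Q-planning, which Section 8.1 uses as its example: repeatedly sample a state-action pair, query the model for a simulated next state and reward, and apply an ordinary Q-learning update to that simulated transition. Below is a minimal sketch on a toy deterministic chain MDP; the environment, its size, and the hyperparameters are illustrative assumptions, not from the book.

```python
import random
from collections import defaultdict

N_STATES = 5          # chain states 0..4; reaching state 4 yields reward 1
ACTIONS = [-1, +1]    # move left or right along the chain
ALPHA, GAMMA = 0.5, 0.9

def model(s, a):
    """Sample model of the environment: returns (next_state, reward)."""
    s2 = max(0, min(N_STATES - 1, s + a))
    return s2, 1.0 if s2 == N_STATES - 1 else 0.0

Q = defaultdict(float)
random.seed(0)

for _ in range(2000):
    # 1. Select a state-action pair at random (state 4 is terminal).
    s = random.randrange(N_STATES - 1)
    a = random.choice(ACTIONS)
    # 2. Ask the model for a simulated next state and reward.
    s2, r = model(s, a)
    # 3. Apply one-step tabular Q-learning to the simulated transition.
    Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
                          - Q[(s, a)])

# The greedy policy should move right, toward the rewarding terminal state.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

Note that the agent never touches the real environment inside the loop: every update is driven by the model's simulated experience, which is exactly what distinguishes this planning method from direct Q-learning on real transitions.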
