Reinforcement Learning Exercise 3.29

This exercise reformulates the Bellman equations for the four value functions in reinforcement learning ($v_\pi$, $v_*$, $q_\pi$, and $q_*$) in terms of the state-transition probability function $p$ and the expected-reward function $r$. The derivation is shown step by step, illustrating how each equation can be expressed using these two functions.

Exercise 3.29 Rewrite the four Bellman equations for the four value functions ($v_\pi$, $v_*$, $q_\pi$, and $q_*$) in terms of the three-argument function $p$ (3.4) and the two-argument function $r$ (3.5).

For $v_\pi$:

$$
\begin{aligned}
v_\pi(s) &= \sum_a \pi(a|s) \sum_{s', r} p(s', r \mid s, a) \bigl[ r + \gamma v_\pi(s') \bigr] \\
&= \sum_a \pi(a|s) \Bigl[ \sum_{s', r} r\, p(s', r \mid s, a) + \gamma \sum_{s', r} v_\pi(s')\, p(s', r \mid s, a) \Bigr] \\
&= \sum_a \pi(a|s) \Bigl[ \sum_{r} r\, p(r \mid s, a) + \gamma \sum_{s'} v_\pi(s')\, p(s' \mid s, a) \Bigr] \\
&= \sum_a \pi(a|s) \Bigl[ r(s,a) + \gamma \sum_{s'} v_\pi(s')\, p(s' \mid s, a) \Bigr]
\end{aligned}
$$
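The remaining three functions follow from the same marginalization of $p(s', r \mid s, a)$ into $r(s,a)$ and $p(s' \mid s, a)$; a sketch of the resulting forms:

$$
\begin{aligned}
q_\pi(s,a) &= r(s,a) + \gamma \sum_{s'} p(s' \mid s, a) \sum_{a'} \pi(a'|s')\, q_\pi(s',a') \\
v_*(s) &= \max_a \Bigl[ r(s,a) + \gamma \sum_{s'} p(s' \mid s, a)\, v_*(s') \Bigr] \\
q_*(s,a) &= r(s,a) + \gamma \sum_{s'} p(s' \mid s, a) \max_{a'} q_*(s',a')
\end{aligned}
$$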
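The key identity in the derivation above is that the four-argument form $\sum_{s',r} p(s',r|s,a)[r + \gamma v(s')]$ equals the two-argument form $r(s,a) + \gamma \sum_{s'} p(s'|s,a)\,v(s')$. A minimal numerical check on a hypothetical toy MDP (random transitions, deterministic reward per $(s,a,s')$ triple, arbitrary policy and value estimate; all names here are illustrative):

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP with rewards determined by (s, a, s'),
# so the joint p(s', r | s, a) is just p(s' | s, a) with reward R[s, a, s'].
rng = np.random.default_rng(0)
nS, nA, gamma = 2, 2, 0.9
P = rng.random((nS, nA, nS))
P /= P.sum(axis=2, keepdims=True)          # p(s' | s, a), rows sum to 1
R = rng.random((nS, nA, nS))               # reward for transition (s, a, s')
pi = rng.random((nS, nA))
pi /= pi.sum(axis=1, keepdims=True)        # policy π(a | s)
v = rng.random(nS)                         # arbitrary value estimate to plug in

# Four-argument form: Σ_a π(a|s) Σ_{s',r} p(s',r|s,a) [r + γ v(s')]
lhs = np.einsum('sa,saz,saz->s', pi, P, R + gamma * v[None, None, :])

# Two-argument form: Σ_a π(a|s) [ r(s,a) + γ Σ_{s'} p(s'|s,a) v(s') ]
r_sa = np.einsum('saz,saz->sa', P, R)      # r(s,a) = E[R | s, a]
rhs = np.einsum('sa,sa->s', pi, r_sa + gamma * P @ v)

assert np.allclose(lhs, rhs)
```

Because the rewrite is pure marginalization, the two forms agree for any value estimate $v$, not only at the fixed point $v_\pi$.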
