Exercise 3.24 Figure 3.5 gives the optimal value of the best state of the gridworld as 24.4, to one decimal place. Use your knowledge of the optimal policy and (3.8) to express this value symbolically, and then to compute it to three decimal places.
The rules of this gridworld are: if the agent is in state A, then regardless of which action it takes, the reward is +10 and it is moved to A' on the next step. Similarly, if the agent is in state B, then regardless of the action, the reward is +5 and it is moved to B'. If the agent is at the edge of the gridworld and its action would take it off the grid, the reward is -1 and it stays in place.
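These dynamics can be sketched as a small step function. This is only a sketch: the cell coordinates (A, A', B, B') are assumed from the book's standard 5×5 layout, which the text above does not restate.

```python
# Sketch of the gridworld dynamics described above. Coordinates are
# (row, col) with row 0 at the top; positions of A, A', B, B' are
# assumptions taken from the book's standard 5x5 gridworld figure.
N = 5
A, A_PRIME = (0, 1), (4, 1)
B, B_PRIME = (0, 3), (2, 3)
ACTIONS = {'north': (-1, 0), 'south': (1, 0), 'east': (0, 1), 'west': (0, -1)}

def step(state, action):
    """Return (next_state, reward) for one move in the gridworld."""
    if state == A:            # any action from A: reward +10, jump to A'
        return A_PRIME, 10.0
    if state == B:            # any action from B: reward +5, jump to B'
        return B_PRIME, 5.0
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if not (0 <= r < N and 0 <= c < N):  # would leave the grid: stay, -1
        return state, -1.0
    return (r, c), 0.0
```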
According to the definition,
$$
\begin{aligned}
v_*(s) &= \max_{a \in \mathcal A(s)} q_{\pi_*}(s,a) \\
&= \max_{a \in \mathcal A(s)} \mathbb E_{\pi_*}\!\left[G_t \mid S_t=s, A_t=a\right] \\
&= \max_{a \in \mathcal A(s)} \mathbb E_{\pi_*}\!\left[\sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \,\middle|\, S_t=s, A_t=a\right]
\end{aligned}
$$
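For the best state A, this expectation can be evaluated in closed form. Under the optimal policy the agent in A receives +10, is teleported to A', and then takes four steps (each with reward 0) back to A, so the reward sequence repeats with period 5 and the return is a geometric series:

$$
v_*(A) = 10 + 10\gamma^5 + 10\gamma^{10} + \cdots = \sum_{k=0}^{\infty} 10\,\gamma^{5k} = \frac{10}{1-\gamma^5}
$$

With $\gamma = 0.9$ this gives $10 / (1 - 0.9^5) = 10 / 0.40951 \approx 24.419$, matching the 24.4 shown in Figure 3.5 to one decimal place.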
Reinforcement Learning Exercise 3.24
Summary: This post works through a reinforcement learning exercise set in a particular gridworld environment, in which states A and B yield rewards of +10 and +5 and teleport the agent to A' and B' respectively. By analyzing the optimal policy, the post shows that the optimal value of state A is the return of an infinitely repeating reward sequence and, using the geometric series formula with γ = 0.9, computes it to three decimal places as 24.419.
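As a quick numeric check, the closed form for the value of state A (the geometric series over the 5-step loop, with γ = 0.9) can be evaluated directly:

```python
# Evaluate v*(A) = 10 / (1 - gamma^5) for gamma = 0.9 and round
# to three decimal places, as asked by the exercise.
gamma = 0.9
v_star_A = 10 / (1 - gamma ** 5)
print(round(v_star_A, 3))  # 24.419
```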