Reinforcement Learning: An Introduction, n-step Bootstrapping (personal notes)

MrTriste

Article link: https://blog.csdn.net/wjc1182511338/article/details/80939946

Chapter 7 n-step Bootstrapping

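What is bootstrapping?

The summary of Chapter 4 puts it this way: "That is, they update estimates on the basis of other estimates. We call this general idea bootstrapping." In other words, an estimate is updated on the basis of other estimates, for example updating one state's value estimate from the value estimates of other states.

This chapter is mainly about how multi-step bootstrapping methods combine the advantages of MC methods with those of TD methods.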

1 n-step TD prediction

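TD methods -> intermediate methods -> Monte Carlo methods:

MC methods update a value from the sum of all rewards observed over the rest of the episode, while one-step TD uses only the next reward and bootstraps from the value of the next state. These are the two extremes; general n-step TD sits in between.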

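MC's complete return:

$$G_t \doteq R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots + \gamma^{T-t-1} R_T$$

one-step return:

$$G_{t:t+1} \doteq R_{t+1} + \gamma V_t(S_{t+1})$$

two-step return:

$$G_{t:t+2} \doteq R_{t+1} + \gamma R_{t+2} + \gamma^2 V_{t+1}(S_{t+2})$$

n-step return, for $n \ge 1$ and $0 \le t < T-n$ (if $t+n \ge T$, then $G_{t:t+n} \doteq G_t$):

$$G_{t:t+n} \doteq R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{n-1} R_{t+n} + \gamma^n V_{t+n-1}(S_{t+n})$$

state-value learning algorithm using n-step returns, for $0 \le t < T$:

$$V_{t+n}(S_t) \doteq V_{t+n-1}(S_t) + \alpha \left[ G_{t:t+n} - V_{t+n-1}(S_t) \right]$$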

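As we can see, the n-step return differs from the MC return in that it uses the value function $V_{t+n-1}$ to correct for the missing rewards beyond $R_{t+n}$: the term $\gamma^n V_{t+n-1}(S_{t+n})$ stands in for $\gamma^n R_{t+n+1} + \gamma^{n+1} R_{t+n+2} + \cdots$.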
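The n-step return can be computed directly from one recorded episode. The following is a minimal sketch, not code from the book: the function name `n_step_return` and the list layout (`rewards[k]` holding $R_{k+1}$, `values[k]` holding the current estimate of $V(S_k)$) are assumptions chosen for illustration.

```python
def n_step_return(rewards, values, t, n, gamma):
    """Return G_{t:t+n} for one recorded episode.

    Assumed layout (hypothetical, for illustration only):
      rewards[k] holds R_{k+1}, the reward received after leaving S_k,
                 so the episode length is T = len(rewards);
      values[k]  holds the current estimate V(S_k) for k = 0..T,
                 with values[T] = 0 for the terminal state.
    """
    T = len(rewards)
    end = min(t + n, T)  # stop summing rewards at the end of the episode
    g = 0.0
    for k in range(t, end):
        g += gamma ** (k - t) * rewards[k]
    if t + n < T:
        # Bootstrap: V(S_{t+n}) replaces the rewards beyond R_{t+n}.
        g += gamma ** n * values[t + n]
    return g
```

With `n = 1` this is the one-step TD target, and once `t + n` reaches the episode length it reduces to the complete MC return, which matches the two extremes described above.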

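n-step TD:

[Figure: pseudocode for n-step TD]

To explain: when the episode reaches time $t$, the algorithm updates the value of the state visited at time $\tau = t-n+1$, that is, an earlier state's value rather than the value of the state at time $t$. The delay is needed because the n-step return for $S_\tau$ depends on rewards up to $R_{\tau+n} = R_{t+1}$, which are not available any sooner.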
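The delayed update at $\tau = t-n+1$ is easier to follow in running code. Below is a minimal sketch of n-step TD prediction under a uniformly random policy. The environment is an assumption made for illustration: a small random walk with `n_states` non-terminal states, a reward of 1 on exiting to the right and 0 otherwise. The function name and all parameters are hypothetical.

```python
import random


def n_step_td_random_walk(n, alpha, gamma=1.0, episodes=100, n_states=5, seed=0):
    """Estimate state values of a random walk with n-step TD.

    Assumed environment (illustrative only): states 1..n_states are
    non-terminal, states 0 and n_states + 1 are terminal, each step moves
    left or right with equal probability, and only the right exit pays 1.
    """
    rng = random.Random(seed)
    V = [0.0] * (n_states + 2)  # terminal entries stay at 0
    for _ in range(episodes):
        states = [(n_states + 1) // 2]  # states[k] = S_k, start in the middle
        rewards = [0.0]                 # rewards[k] = R_k, index 0 is unused
        T = float("inf")
        t = 0
        while True:
            if t < T:
                s_next = states[t] + rng.choice((-1, 1))
                states.append(s_next)
                rewards.append(1.0 if s_next == n_states + 1 else 0.0)
                if s_next == 0 or s_next == n_states + 1:
                    T = t + 1
            tau = t - n + 1  # time step whose state value is updated now
            if tau >= 0:
                last = min(tau + n, T)
                G = sum(gamma ** (i - tau - 1) * rewards[i]
                        for i in range(tau + 1, last + 1))
                if tau + n < T:
                    G += gamma ** n * V[states[tau + n]]
                V[states[tau]] += alpha * (G - V[states[tau]])
            if tau == T - 1:
                break
            t += 1
    return V[1:-1]  # estimates for the non-terminal states only
```

Note that the loop keeps running after the episode terminates (when `t >= T`) so that the last `n - 1` visited states still receive their updates from truncated returns.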

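2 n-step Sarsa (on-policy)

n-step Sarsa return, for $n \ge 1$ and $0 \le t < T-n$ (if $t+n \ge T$, then $G_{t:t+n} \doteq G_t$):

$$G_{t:t+n} \doteq R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{n-1} R_{t+n} + \gamma^n Q_{t+n-1}(S_{t+n}, A_{t+n})$$

update rule, for $0 \le t < T$:

$$Q_{t+n}(S_t, A_t) \doteq Q_{t+n-1}(S_t, A_t) + \alpha \left[ G_{t:t+n} - Q_{t+n-1}(S_t, A_t) \right]$$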

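on-policy n-step Sarsa:

[Figure: pseudocode for on-policy n-step Sarsa]

It can speed up learning compared to one-step methods: when a reward finally arrives, the n-step update strengthens the action values of the last $n$ state-action pairs on the trajectory, whereas a one-step method strengthens only the last one.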
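A single n-step Sarsa update can be sketched as follows. This is an illustration under assumptions, not the book's pseudocode: the function name `n_step_sarsa_update`, the trajectory layout (`states[k]`, `actions[k]`, `rewards[k]` for $S_k$, $A_k$, $R_k$) and the use of a dictionary keyed by `(state, action)` are all hypothetical choices.

```python
from collections import defaultdict


def n_step_sarsa_update(Q, states, actions, rewards, tau, n, T, alpha, gamma):
    """Apply one n-step Sarsa update to Q[(S_tau, A_tau)] in place.

    Assumed layout (hypothetical): states[k] = S_k, actions[k] = A_k,
    rewards[k] = R_k with rewards[0] unused. T is the terminal time step,
    or float('inf') while the episode is still running.
    Returns the n-step return G that was used as the target.
    """
    last = min(tau + n, T)
    G = sum(gamma ** (i - tau - 1) * rewards[i] for i in range(tau + 1, last + 1))
    if tau + n < T:
        # Bootstrap from the action actually taken at time tau + n.
        G += gamma ** n * Q[(states[tau + n], actions[tau + n])]
    key = (states[tau], actions[tau])
    Q[key] += alpha * (G - Q[key])
    return G


# Example table: unseen state-action pairs default to 0.
Q_example = defaultdict(float)
```

In a full control loop the actions would be chosen by a policy derived from `Q` (for example $\varepsilon$-greedy), and this update would be called at every step once `tau = t - n + 1` is non-negative.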

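3 n-step Sarsa (off-policy)

Compared with on-policy Sarsa, the only addition is importance sampling: the update is weighted by the ratio of the probabilities of the taken actions under the target policy $\pi$ and the behavior policy $b$,

$$\rho_{t:h} \doteq \prod_{k=t}^{\min(h, T-1)} \frac{\pi(A_k|S_k)}{b(A_k|S_k)}$$

n-step Sarsa update, for $0 \le t < T$:

$$Q_{t+n}(S_t, A_t) \doteq Q_{t+n-1}(S_t, A_t) + \alpha \rho_{t+1:t+n} \left[ G_{t:t+n} - Q_{t+n-1}(S_t, A_t) \right]$$

off-policy n-step Sarsa:

[Figure: pseudocode for off-policy n-step Sarsa]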
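The importance sampling ratio is the only new ingredient, so it is worth seeing on its own. The sketch below assumes both policies are given as nested dictionaries of action probabilities (`pi[s][a]` for the target policy, `b[s][a]` for the behavior policy); these names and the function name `importance_ratio` are hypothetical.

```python
def importance_ratio(pi, b, states, actions, start, end, T):
    """Return the product of pi(A_k|S_k) / b(A_k|S_k) for k = start..min(end, T-1).

    Assumed layout (hypothetical): pi[s][a] and b[s][a] are action
    probabilities under the target and behavior policies, states[k] = S_k,
    actions[k] = A_k, and T is the terminal time step (or float('inf')).
    The behavior policy must give non-zero probability to every action taken.
    """
    rho = 1.0
    for k in range(start, min(end, T - 1) + 1):
        s, a = states[k], actions[k]
        rho *= pi[s][a] / b[s][a]
    return rho
```

The off-policy update then scales the usual n-step Sarsa error by this ratio, i.e. `Q[key] += alpha * rho * (G - Q[key])`. If the target policy would never have taken one of the actions on the trajectory, the ratio is 0 and that trajectory segment contributes nothing to the update.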

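4 n-step Tree Backup Algorithm (off-policy)

Off-policy methods generally rely on importance sampling. Can off-policy learning be done without it?

For one-step methods, Q-learning and Expected Sarsa do this;

for multi-step methods, the answer is the tree-backup algorithm covered next.

The core idea of the tree-backup algorithm is easiest to explain with its backup diagram.

[Figure: backup diagram of the tree-backup update]

To update $q(S_t,A_t)$, we need the return $G_t$ for this state-action pair (the discounted sum of rewards). First we add $R_{t+1}$. The next level of the tree has three state-action pairs: for each action $a\ne A_{t+1}$ that was not taken, we multiply its current value estimate by the probability $\pi(a|S_{t+1})$ of taking it; for the action actually taken, $a=A_{t+1}$, we multiply $G_{t+1}$ by the probability $\pi(A_{t+1}|S_{t+1})$, where $G_{t+1}$ is computed in the same way as $G_t$.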

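Concretely:

one-step tree-backup return, for $t < T-1$:

$$G_{t:t+1} \doteq R_{t+1} + \gamma \sum_a \pi(a|S_{t+1}) Q_t(S_{t+1}, a)$$

two-step tree-backup return, for $t < T-2$:

$$G_{t:t+2} \doteq R_{t+1} + \gamma \sum_{a \ne A_{t+1}} \pi(a|S_{t+1}) Q_{t+1}(S_{t+1}, a) + \gamma \pi(A_{t+1}|S_{t+1}) G_{t+1:t+2}$$

n-step tree-backup return, for $t < T-1$ and $n \ge 2$ (with $G_{T-1:t+n} \doteq R_T$):

$$G_{t:t+n} \doteq R_{t+1} + \gamma \sum_{a \ne A_{t+1}} \pi(a|S_{t+1}) Q_{t+n-1}(S_{t+1}, a) + \gamma \pi(A_{t+1}|S_{t+1}) G_{t+1:t+n}$$

action-value update rule, for $0 \le t < T$:

$$Q_{t+n}(S_t, A_t) \doteq Q_{t+n-1}(S_t, A_t) + \alpha \left[ G_{t:t+n} - Q_{t+n-1}(S_t, A_t) \right]$$

n-step Tree Backup Algorithm:

[Figure: pseudocode for the n-step tree-backup algorithm]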
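The recursive structure of the tree-backup return maps naturally onto a recursive function. The following is a minimal sketch rather than the book's pseudocode (which computes the same quantity iteratively): the function name `tree_backup_return` and the nested-dictionary layout of `Q` and `pi` are assumptions made for illustration.

```python
def tree_backup_return(Q, pi, states, actions, rewards, t, n, T, gamma):
    """Recursively compute the n-step tree-backup return G_{t:t+n}.

    Assumed layout (hypothetical): states[k] = S_k, actions[k] = A_k,
    rewards[k] = R_k with rewards[0] unused. Q[s][a] is the current
    action-value estimate, pi[s][a] is the target policy's probability of
    action a in state s, and T is the terminal time step.
    """
    r = rewards[t + 1]
    if t + 1 >= T:
        # S_{t+1} is terminal, so there is nothing to bootstrap from.
        return r
    s_next = states[t + 1]
    if n == 1:
        # Leaf level: expected value over all actions under pi.
        return r + gamma * sum(pi[s_next][a] * Q[s_next][a] for a in Q[s_next])
    a_taken = actions[t + 1]
    # Actions not taken contribute their current estimates ...
    others = sum(pi[s_next][a] * Q[s_next][a] for a in Q[s_next] if a != a_taken)
    # ... while the taken action contributes the deeper return.
    deeper = tree_backup_return(Q, pi, states, actions, rewards,
                                t + 1, n - 1, T, gamma)
    return r + gamma * others + gamma * pi[s_next][a_taken] * deeper
```

No behavior-policy probabilities appear anywhere in the function, which is the point of the method: the actions on the trajectory only decide which branch of the tree is expanded further, while all weights come from the target policy $\pi$.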
