Reinforcement Learning Exercise 7.1

Exercise 7.1 In Chapter 6 we noted that the Monte Carlo error can be written as the sum of TD errors (6.6) if the value estimates don’t change from step to step. Show that the n-step error used in (7.2) can also be written as a sum of TD errors (again if the value estimates don’t change) generalizing the earlier result.

Here, the n-step error used in the update (7.2) is

$$G_{t:t+n} - V_{t+n-1}(S_t).$$
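For reference, the one-step TD error (6.5), with the value estimates held fixed and written as a single $V$, is

$$\delta_k = R_{k+1} + \gamma V(S_{k+1}) - V(S_k),$$

and the goal is to express the n-step error above as a discounted sum of these.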
The n-step return $G_{t:t+n}$ is defined as

$$G_{t:t+n} =
\begin{cases}
R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{n-1}R_{t+n} + \gamma^n V_{t+n-1}(S_{t+n}) & n \geq 1 \text{ and } 0 \leq t < T-n \\
R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{T-t-1}R_T & t+n \geq T
\end{cases}$$
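As a quick illustration of this definition, here is a minimal Python sketch of computing $G_{t:t+n}$; the function name, the reward indexing (`rewards[k]` holding $R_{k+1}$), and the value-function interface are assumptions for the example, not from the text.

```python
def n_step_return(rewards, states, V, t, n, gamma, T):
    """n-step return G_{t:t+n}: discounted rewards for up to n steps,
    bootstrapping from V only if the episode has not ended by t+n.

    Illustrative indexing: rewards[k] is R_{k+1}, states[k] is S_k,
    V maps a state to its (fixed) estimated value, T is the terminal step.
    """
    horizon = min(t + n, T)                       # truncate at the terminal step
    G = sum(gamma ** (k - t) * rewards[k] for k in range(t, horizon))
    if t + n < T:                                 # bootstrap term gamma^n * V(S_{t+n})
        G += gamma ** n * V(states[t + n])
    return G
```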
Then, for $t+n \geq T$, the Monte Carlo error is

$$\begin{aligned}
G_t - V_{t+n-1}(S_t) &= R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{T-t-1}R_T - V_{t+n-1}(S_t) \\
&= G_{t:t+n} - V_{t+n-1}(S_t),
\end{aligned}$$

so in this case the n-step error coincides with the Monte Carlo error, which by (6.6) is already a sum of TD errors.
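For the remaining case $0 \leq t < T-n$, here is a sketch of the telescoping argument, assuming the value estimates do not change (so every $V_{t+i}$ can be written as a single $V$), with $\delta_k$ as in (6.5) and using the recursion $G_{t:t+n} = R_{t+1} + \gamma G_{t+1:t+n}$ read off from the definition above:

$$\begin{aligned}
G_{t:t+n} - V(S_t)
&= R_{t+1} + \gamma V(S_{t+1}) - V(S_t) + \gamma\bigl(G_{t+1:t+n} - V(S_{t+1})\bigr) \\
&= \delta_t + \gamma\bigl(G_{t+1:t+n} - V(S_{t+1})\bigr) \\
&= \delta_t + \gamma\delta_{t+1} + \gamma^2\bigl(G_{t+2:t+n} - V(S_{t+2})\bigr) \\
&\;\;\vdots \\
&= \delta_t + \gamma\delta_{t+1} + \cdots + \gamma^{n-1}\bigl(G_{t+n-1:t+n} - V(S_{t+n-1})\bigr) \\
&= \sum_{k=t}^{t+n-1}\gamma^{k-t}\,\delta_k,
\end{aligned}$$

where the last step uses $G_{t+n-1:t+n} - V(S_{t+n-1}) = R_{t+n} + \gamma V(S_{t+n}) - V(S_{t+n-1}) = \delta_{t+n-1}$. Together with the terminal case above, the n-step error is a sum of TD errors, generalizing (6.6).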
