Exercise 7.1 In Chapter 6 we noted that the Monte Carlo error can be written as the sum of TD errors (6.6) if the value estimates don’t change from step to step. Show that the n-step error used in (7.2) can also be written as a sum of TD errors (again if the value estimates don’t change), generalizing the earlier result.
Here, according to equation (7.2), the n-step error is:
$$
\delta_t = G_{t:t+n} - V_{t+n-1}(S_t)
$$
The n-step return $G_{t:t+n}$ is defined as:
$$
G_{t:t+n} = \begin{cases} R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{n-1}R_{t+n} + \gamma^n V_{t+n-1}(S_{t+n}) & (n \geq 1 \text{ and } 0 \leq t < T-n) \\ R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{T-t-1}R_T & (t+n \geq T) \end{cases}
$$
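The two cases of this definition can be sketched in a few lines of Python. This is a minimal illustration, assuming a single episode stored as plain lists and a value table that does not change during the episode (as the exercise assumes); the function name `n_step_return` and the sample numbers are hypothetical and not taken from the book.

```python
def n_step_return(rewards, values, t, n, gamma):
    """Compute the n-step return G_{t:t+n} for one episode.

    Assumed layout (illustrative only):
      rewards[k] holds R_{k+1}, so the episode length is T = len(rewards).
      values[k] holds V(S_k) for k = 0..T, with values[T] == 0 (terminal).
    """
    T = len(rewards)
    end = min(t + n, T)  # index one past the last reward that is included
    # Discounted rewards R_{t+1}, ..., R_{end}
    g = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
    # Bootstrap from V(S_{t+n}) only when the episode has not ended by then
    if t + n < T:
        g += gamma ** n * values[t + n]
    return g


# Hypothetical three-step episode (T = 3)
rewards = [1.0, 2.0, 3.0]
values = [0.5, 1.0, 1.5, 0.0]

print(n_step_return(rewards, values, t=0, n=2, gamma=0.5))  # first case, bootstraps
print(n_step_return(rewards, values, t=1, n=5, gamma=0.5))  # second case, t + n >= T
```

In the second call the horizon runs past the end of the episode, so the result is just the discounted sum of the remaining rewards.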
Then, for $t+n \geq T$, the n-step return contains no bootstrapped value and equals the full return $G_t$, so the Monte Carlo error is:
$$
\begin{aligned} G_t - V_{t+n}(S_t) &= R_{t+1} + \gamma R_{t+2} +\cdots + \gamma^{T-t-1}R_T - V_{t+n}(S_t) \\ &=G_{t:t+n}-V_{t+n}(S_t) \end{aligned}
$$
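As a numerical sanity check (not a proof), the claim of the exercise can be tested on a small example: with a fixed value table, the n-step error should equal the discounted sum of the one-step TD errors $R_{k+1} + \gamma V(S_{k+1}) - V(S_k)$ over the steps covered by the return. The sketch below assumes the same fixed-value setting as the exercise; the helper names and the sample episode are hypothetical.

```python
def n_step_error(rewards, values, t, n, gamma):
    """n-step error G_{t:t+n} - V(S_t) with a fixed value table.

    rewards[k] holds R_{k+1}; values[k] holds V(S_k), with values[T] == 0.
    """
    T = len(rewards)
    end = min(t + n, T)
    g = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
    if t + n < T:
        g += gamma ** n * values[t + n]  # bootstrap term
    return g - values[t]


def discounted_td_sum(rewards, values, t, n, gamma):
    """Sum of gamma^(k-t) times the one-step TD error, for k = t..min(t+n, T)-1."""
    T = len(rewards)
    end = min(t + n, T)
    return sum(
        gamma ** (k - t) * (rewards[k] + gamma * values[k + 1] - values[k])
        for k in range(t, end)
    )


# Hypothetical episode with T = 3 and V(terminal) = 0
rewards = [1.0, 2.0, 3.0]
values = [0.5, 1.0, 1.5, 0.0]
gamma = 0.5

# Compare both quantities for every start time t and every horizon n
for t in range(len(rewards)):
    for n in range(1, 6):
        lhs = n_step_error(rewards, values, t, n, gamma)
        rhs = discounted_td_sum(rewards, values, t, n, gamma)
        assert abs(lhs - rhs) < 1e-12, (t, n, lhs, rhs)
```

The loop covers both $t+n < T$ and $t+n \geq T$; in the latter case the sum stops at the terminal step, where the value is zero.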