Exercise 7.4 Prove that the n-step return of Sarsa (7.4) can be written exactly in terms of a novel TD error, as
$$
G_{t:t+n}=Q_{t-1}(S_t,A_t)+\sum_{k=t}^{\min(t+n,T)-1} \gamma^{k-t}\bigl[R_{k+1} + \gamma Q_k(S_{k+1}, A_{k+1}) - Q_{k-1}(S_k,A_k)\bigr]
$$
Proof:
First, $G_{t:t+n}$ can be written as a telescoping sum of differences:
$$
\begin{aligned}
G_{t:t+n} &= G_{t:t+1} - G_{t:t+1} + G_{t:t+2} - G_{t:t+2} + \cdots + G_{t:t+n-2} - G_{t:t+n-2} + G_{t:t+n-1} - G_{t:t+n-1} + G_{t:t+n}\\
&= G_{t:t+1} + (G_{t:t+2} - G_{t:t+1}) + \cdots + (G_{t:t+n} - G_{t:t+n-1})\\
&= G_{t:t+1} + \sum_{i=2}^{n}(G_{t:t+i} - G_{t:t+i-1}) \tag{1}
\end{aligned}
$$
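Identity (1) is pure telescoping, so it holds for any sequence of numbers standing in for $G_{t:t+1}, \dots, G_{t:t+n}$. A minimal Python check, assuming arbitrary random values (the dictionary `G` is a hypothetical stand-in, not real returns):

```python
import math
import random

random.seed(0)
n = 6
# Hypothetical stand-ins for G_{t:t+1}, ..., G_{t:t+n}; any numbers work.
G = {i: random.uniform(-5.0, 5.0) for i in range(1, n + 1)}

# Right-hand side of (1): the first return plus the successive differences.
rhs = G[1] + sum(G[i] - G[i - 1] for i in range(2, n + 1))
assert math.isclose(G[n], rhs, abs_tol=1e-9)
```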
By the definition of the Sarsa $n$-step return (7.4),
$$
G_{t:t+n} \doteq R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{n-1}R_{t+n} + \gamma^n Q_{t+n-1}(S_{t+n}, A_{t+n}), \qquad n \geq 1,\ 0 \leq t < T-n \tag{7.4}
$$
for $n \geq 2$ the rewards $R_{t+1}, \dots, R_{t+n-1}$ appear in both $G_{t:t+n}$ and $G_{t:t+n-1}$ and cancel, so:
$$
\begin{aligned}
G_{t:t+n} - G_{t:t+n-1} &= \gamma^{n-1}R_{t+n} + \gamma^n Q_{t+n-1}(S_{t+n}, A_{t+n}) - \gamma^{n-1} Q_{t+n-2}(S_{t+n-1}, A_{t+n-1}) \\
&= \gamma^{n-1}\bigl[R_{t+n} + \gamma Q_{t+n-1}(S_{t+n}, A_{t+n}) - Q_{t+n-2}(S_{t+n-1}, A_{t+n-1})\bigr] \tag{2}
\end{aligned}
$$
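As a numerical sanity check of (2), not a replacement for the algebra, the sketch below builds the Sarsa returns (7.4) from random rewards and random time-indexed estimates. The dictionary `Qv[(j, k)]`, meaning $Q_j(S_k, A_k)$, and the helper `n_step_return` are hypothetical names introduced only for this illustration, and the episode is assumed not to terminate within the window.

```python
import math
import random

random.seed(1)
gamma = 0.9
t, n_max = 3, 6  # assume the episode does not end before time t + n_max

# Hypothetical rewards R[k] = R_k for k = t+1, ..., t+n_max.
R = {k: random.uniform(-1.0, 1.0) for k in range(t + 1, t + n_max + 1)}
# Hypothetical estimates Qv[(j, k)] = Q_j(S_k, A_k): the estimate held at
# time j, evaluated at the state-action pair visited at time k.
Qv = {(j, k): random.uniform(-1.0, 1.0)
      for j in range(t - 1, t + n_max) for k in range(t, t + n_max + 1)}


def n_step_return(t, n):
    """Sarsa n-step return (7.4), valid while t + n < T."""
    rewards = sum(gamma ** (i - 1) * R[t + i] for i in range(1, n + 1))
    return rewards + gamma ** n * Qv[(t + n - 1, t + n)]


# Check (2) for every n >= 2 in the window.
for n in range(2, n_max + 1):
    lhs = n_step_return(t, n) - n_step_return(t, n - 1)
    rhs = gamma ** (n - 1) * (R[t + n] + gamma * Qv[(t + n - 1, t + n)]
                              - Qv[(t + n - 2, t + n - 1)])
    assert math.isclose(lhs, rhs, abs_tol=1e-12)
```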
For $n = 1$, adding and subtracting $Q_{t-1}(S_t, A_t)$ gives:
$$
\begin{aligned}
G_{t:t+1} &= \gamma^0 R_{t+1} + \gamma^1 Q_{t+1-1}(S_{t+1}, A_{t+1}) \\
&= \gamma^0 R_{t+1} + \gamma^1 Q_{t+1-1}(S_{t+1}, A_{t+1}) - Q_{t-1}(S_t, A_t) + Q_{t-1}(S_t, A_t) \\
&= Q_{t-1}(S_t, A_t) + \gamma^0 \bigl[R_{t+1} + \gamma Q_t(S_{t+1}, A_{t+1}) - Q_{t-1}(S_t, A_t)\bigr] \tag{3}
\end{aligned}
$$
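Step (3) only adds and subtracts $Q_{t-1}(S_t, A_t)$, so it holds whatever that baseline value is. A tiny Python check with arbitrary random numbers (all variable names are hypothetical):

```python
import math
import random

random.seed(2)
gamma = 0.9
r_next = random.uniform(-1.0, 1.0)      # R_{t+1}
q_t_next = random.uniform(-1.0, 1.0)    # Q_t(S_{t+1}, A_{t+1})
q_prev_now = random.uniform(-1.0, 1.0)  # Q_{t-1}(S_t, A_t), an arbitrary baseline

g_one_step = r_next + gamma * q_t_next              # (7.4) with n = 1
delta_t = r_next + gamma * q_t_next - q_prev_now    # novel TD error at k = t
assert math.isclose(g_one_step, q_prev_now + delta_t, abs_tol=1e-12)
```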
Substituting equation (2) (with $n$ replaced by $i$) and equation (3) into (1), we get:
$$
\begin{aligned}
G_{t:t+n} &= Q_{t-1}(S_t,A_t) + \gamma^0 \bigl[R_{t+1} + \gamma Q_t(S_{t+1}, A_{t+1}) - Q_{t-1}(S_t, A_t)\bigr] \\
&\quad + \sum_{i=2}^{n} \gamma^{i-1}\bigl[R_{t+i} + \gamma Q_{t+i-1}(S_{t+i}, A_{t+i}) - Q_{t+i-2}(S_{t+i-1}, A_{t+i-1})\bigr] \\
&= Q_{t-1}(S_t,A_t) + \sum_{i=1}^{n} \gamma^{i-1}\bigl[R_{t+i} + \gamma Q_{t+i-1}(S_{t+i}, A_{t+i}) - Q_{t+i-2}(S_{t+i-1}, A_{t+i-1})\bigr] \tag{4}
\end{aligned}
$$
Let $k = i + t - 1$, so that $i = k - t + 1$; then equation (4) can be written as:
$$
G_{t:t+n} = Q_{t-1}(S_t, A_t) + \sum_{k=t}^{t+n-1}\gamma^{k-t}\bigl[R_{k+1} + \gamma Q_{k}(S_{k+1}, A_{k+1}) - Q_{k-1}(S_{k}, A_{k})\bigr] \tag{5}
$$
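Equation (5) can also be checked end to end: compute the left side directly from (7.4) and the right side as $Q_{t-1}(S_t, A_t)$ plus the discounted sum of the novel TD errors. The sketch assumes random rewards and Q-values, $t + n < T$, and hypothetical names `Qv` and `novel_td_error`.

```python
import math
import random

random.seed(3)
gamma, t, n = 0.95, 2, 5  # assume t + n < T

# Hypothetical rewards R[k] = R_k and estimates Qv[(j, k)] = Q_j(S_k, A_k).
R = {k: random.uniform(-1.0, 1.0) for k in range(t + 1, t + n + 1)}
Qv = {(j, k): random.uniform(-1.0, 1.0)
      for j in range(t - 1, t + n) for k in range(t, t + n + 1)}

# Left side: the Sarsa n-step return (7.4).
lhs = (sum(gamma ** (i - 1) * R[t + i] for i in range(1, n + 1))
       + gamma ** n * Qv[(t + n - 1, t + n)])


def novel_td_error(k):
    """R_{k+1} + gamma * Q_k(S_{k+1}, A_{k+1}) - Q_{k-1}(S_k, A_k)."""
    return R[k + 1] + gamma * Qv[(k, k + 1)] - Qv[(k - 1, k)]


# Right side of (5): baseline plus discounted novel TD errors, k = t..t+n-1.
rhs = Qv[(t - 1, t)] + sum(gamma ** (k - t) * novel_td_error(k)
                           for k in range(t, t + n))
assert math.isclose(lhs, rhs, abs_tol=1e-12)
```

Changing the seed, `gamma`, `t` or `n` should leave the assertion passing, because the Q-terms telescope regardless of their values; only $\gamma^n Q_{t+n-1}(S_{t+n}, A_{t+n})$ survives.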
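Equation (5) was derived under the condition $t + n < T$ required by (7.4). If $t + n \geq T$, the episode ends within the $n$ steps and $G_{t:t+n} \doteq G_t$; repeating the same argument with the sum stopped at $k = T - 1$, and with the action value at the terminal time $T$ taken to be $0$, gives the same form. Hence the upper limit of the sum is $\min(t+n, T) - 1$, and equation (5) can be written in general as: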
$$
G_{t:t+n}=Q_{t-1}(S_t,A_t)+\sum_{k=t}^{\min(t+n,T)-1} \gamma^{k-t}\bigl[R_{k+1} + \gamma Q_k(S_{k+1}, A_{k+1}) - Q_{k-1}(S_k,A_k)\bigr]
$$
PROVED.
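Finally, a sketch of the terminal case $t + n \geq T$, assuming $G_{t:t+n} = G_t$ in that case and an action value of $0$ at the terminal time $T$. The sum of novel TD errors, stopped at $k = \min(t+n, T) - 1$, should reproduce the Monte Carlo return. All names and numbers are hypothetical.

```python
import math
import random

random.seed(4)
gamma, t, n, T = 0.9, 1, 10, 6  # t + n >= T: the episode ends inside the window

# Hypothetical rewards R[k] = R_k and estimates Qv[(j, k)] = Q_j(S_k, A_k).
R = {k: random.uniform(-1.0, 1.0) for k in range(t + 1, T + 1)}
Qv = {(j, k): random.uniform(-1.0, 1.0)
      for j in range(t - 1, T) for k in range(t, T)}
for j in range(t - 1, T):
    Qv[(j, T)] = 0.0  # assumed: the value at the terminal time T is zero

# Left side: with t + n >= T the n-step return is the full return G_t.
lhs = sum(gamma ** (k - t - 1) * R[k] for k in range(t + 1, T + 1))


def novel_td_error(k):
    """R_{k+1} + gamma * Q_k(S_{k+1}, A_{k+1}) - Q_{k-1}(S_k, A_k)."""
    return R[k + 1] + gamma * Qv[(k, k + 1)] - Qv[(k - 1, k)]


upper = min(t + n, T)  # the sum runs over k = t, ..., min(t+n, T) - 1
rhs = Qv[(t - 1, t)] + sum(gamma ** (k - t) * novel_td_error(k)
                           for k in range(t, upper))
assert math.isclose(lhs, rhs, abs_tol=1e-12)
```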