GCRL Workshop / NIPS 2023
paper
Intro
Goal-conditioned RL is combined with a Transformer and applied in the offline RL setting.
Method
The Transformer network parameters are optimized by minimizing an MSE loss:
$$
\arg\min_\phi\sum_{\tau\in\mathcal{D}}L_\phi\left(W_\phi(s_t,\omega),\,s_{t+K}\right)
$$
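To make the objective above concrete, here is a minimal sketch in plain Python. It assumes each trajectory is stored as a list of state vectors paired with its conditioning variable ω, and that `predict` stands in for W_ϕ; the names `waypoint_mse`, `predict`, and `dataset` are hypothetical and do not come from the paper. The sketch averages the squared error over all elements, whereas the formula sums over trajectories; the two differ only by a constant factor.

```python
from typing import Callable, List, Sequence, Tuple

State = Sequence[float]


def waypoint_mse(
    dataset: List[Tuple[List[State], State]],
    predict: Callable[[State, State], State],
    K: int,
) -> float:
    """Mean squared error between predict(s_t, omega) and s_{t+K}.

    dataset: list of (states, omega) pairs, one per trajectory (assumed layout).
    predict: hypothetical stand-in for W_phi(s_t, omega).
    K: how many steps ahead the target state lies.
    """
    total, count = 0.0, 0
    for states, omega in dataset:
        # Only time steps that have a state K steps ahead contribute.
        for t in range(len(states) - K):
            pred = predict(states[t], omega)
            target = states[t + K]
            total += sum((p - y) ** 2 for p, y in zip(pred, target))
            count += len(target)
    # An empty dataset (or K longer than every trajectory) yields 0.0.
    return total / count if count else 0.0


# Toy usage: a 1-D trajectory and a predictor that simply copies s_t.
toy_dataset = [([[0.0], [1.0], [2.0], [3.0]], [3.0])]
toy_loss = waypoint_mse(toy_dataset, lambda s, omega: list(s), K=2)
```

In a real training loop the predictor would be a parameterized network and this quantity would be minimized with respect to ϕ by gradient descent; the sketch only evaluates the loss.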
As for $W_\phi$, it is optimized by minimizing the MSE against the average and cumulative rewards:
$$
\arg\min_\phi\sum_{\tau\in\mathcal{D}}\left(\left[\frac{1}{T-t}\sum_{t'=t}^{T}\gamma^{t}r_{t}\quad\sum_{t'=t}^{T}\gamma^{t}r_{t}\right]^\top-W_\phi(s_t,\omega)\right)^2.
$$
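The two-component regression target in this objective can be computed per time step as sketched below. This is an illustration under stated assumptions, not the paper's implementation: the summand is read as γ^{t'} r_{t'} (indexed by the summation variable t'), T is taken to be the index of the last step, and at t = T, where 1/(T−t) is undefined, the average falls back to the cumulative value. The function names `return_targets` and `target_squared_error` are hypothetical.

```python
from typing import List, Sequence, Tuple


def return_targets(rewards: List[float], gamma: float) -> List[Tuple[float, float]]:
    """For every t, return (average, cumulative) discounted reward from t to T.

    Assumption: the summand is gamma**t' * r_{t'}, with t' running from t to T.
    """
    T = len(rewards) - 1  # index of the last time step (assumed convention)
    targets = []
    for t in range(len(rewards)):
        cumulative = sum(gamma ** tp * rewards[tp] for tp in range(t, T + 1))
        horizon = T - t
        # Assumption: avoid division by zero at the final step.
        average = cumulative / horizon if horizon > 0 else cumulative
        targets.append((average, cumulative))
    return targets


def target_squared_error(target: Sequence[float], prediction: Sequence[float]) -> float:
    """Squared error between the 2-D target and a prediction W_phi(s_t, omega)."""
    return sum((y - p) ** 2 for y, p in zip(target, prediction))


# Toy usage: three steps of reward 1 with gamma = 0.5.
toy_targets = return_targets([1.0, 1.0, 1.0], gamma=0.5)
toy_error = target_squared_error(toy_targets[0], (0.875, 1.75))
```

Summing `target_squared_error` over all time steps and trajectories gives the quantity that the objective minimizes with respect to ϕ.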