motivation
Model-based approaches enjoy 1) sample efficiency (they learn from few interactions) and 2) a reward-independent dynamics model (whereas model-free approaches need the reward function to update), but they lag behind model-free approaches in asymptotic performance (they converge to sub-optimal solutions).
This work is based on two observations:
- model capacity matters: GPs are sample-efficient but lack expressiveness, while NNs are expressive but learn slowly
- the above issue can be mitigated by incorporating uncertainty
(I didn't actually find any reasoning for this claim in the paper.)
Regarding related work, the paper claims that the deterministic NNs used in many prior works suffer from overfitting in the early stages of learning.
The author mentions a major challenge in model-based RL: the model should perform well in both low- and high-data regimes.
What causes this? Is it specific to the model-based RL setting?
pipeline
probabilistic ensemble dynamics model
dynamics model
- probabilistic NN
a parametrized conditional distribution model $f_\theta(s_{t+1}\mid s_t, a_t)$, optimized by maximizing the likelihood of environment-produced trajectories. A typical choice of distribution is a diagonal multivariate Gaussian, similar to models that predict actions given states in continuous action spaces. The model outputs a state mean vector and a state variance vector, and the next state is produced by sampling from this Gaussian.
- deterministic NN
a mapping $f(s_t, a_t)$ that outputs a single next-state prediction rather than a distribution
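As a minimal sketch of the probabilistic NN above (the network size, weight initialization, and function names are my own illustration, not from the paper), a tiny numpy model that outputs a state mean and a per-dimension variance, samples the next state from the resulting diagonal Gaussian, and computes the negative log-likelihood that maximum-likelihood training would minimize:

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM, HIDDEN = 3, 1, 16  # toy sizes, not from the paper

# Randomly initialized weights stand in for trained parameters theta.
W1 = rng.normal(scale=0.1, size=(STATE_DIM + ACTION_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(HIDDEN, 2 * STATE_DIM))  # mean + log-variance heads
b2 = np.zeros(2 * STATE_DIM)

def predict(s, a):
    """Return mean and variance of the diagonal Gaussian over s_{t+1}."""
    x = np.concatenate([s, a])
    h = np.tanh(x @ W1 + b1)
    out = h @ W2 + b2
    mean, log_var = out[:STATE_DIM], out[STATE_DIM:]
    # Predicting log-variance keeps the variance strictly positive.
    return mean, np.exp(log_var)

def sample_next_state(s, a):
    """Sample s_{t+1} ~ N(mean, diag(var)), as described in the notes."""
    mean, var = predict(s, a)
    return mean + np.sqrt(var) * rng.normal(size=STATE_DIM)

def nll(s, a, s_next):
    """Per-transition negative log-likelihood of the diagonal Gaussian;
    training minimizes its sum over environment-produced trajectories."""
    mean, var = predict(s, a)
    return 0.5 * np.sum((s_next - mean) ** 2 / var + np.log(var) + np.log(2 * np.pi))
```

A deterministic NN would instead return only the mean as a point prediction; the variance head is what lets the probabilistic model express its uncertainty in the low-data regime.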