RLlib/Ray: Pitfall Notes

喵呜嘻嘻嘻

Last modified 2023-07-14 18:25:21

First published 2023-07-13 11:52:22

Article link: https://blog.csdn.net/z3w97/article/details/131698782

RLlib Tutorial

Reference: RLlib Tutorial

Error: Observation outside given space / Observation outside expected value range

Below are the error messages produced by two different Ray versions. They are essentially the same error:

ValueError: Observation ([ 0  1  1  1  1  0  0  0  0 51 48 47 33 10  2  4  9  4  1  2  2  0  0  0
  0  0  0  1  0  0  0] dtype=None) outside given space (MultiDiscrete([ 2  2  2  2  2 51 51 51 51 51 51 51 51 11 11 11 11 11 11 11 11  2  2  2
  2  2  2  2  2  2  2]))!

ValueError: ('Observation outside expected value range', MultiDiscrete([ 2  2  2  2  2 51 51 51 51 51 51 51 51 11 11 11 11 11 11 11 11  2  2  2
2  2  2  2  2  2  2]), array([ 0,  1,  1,  1,  1,  0,  0,  0,  0, 32, 51, 32, 27, 10,  5,  0,  1,
      6,  1,  2,  2,  0,  0,  0,  0,  0,  0,  1,  0,  0,  0],
    dtype=int64))

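This error occurs because each dimension of a MultiDiscrete space covers a half-open interval: closed on the left, open on the right. Take the tenth dimension in the first error as an example. Its size is 51, so the valid range is [0, 51), i.e. the integers 0 through 50, and an observation value of 51 in that dimension raises the error. If 51 is a legitimate value, the size of that dimension has to be declared as 52.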
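To find which dimension is at fault, the half-open range check can be reproduced by hand. The following is a minimal pure-Python sketch, not RLlib's or Gym's actual implementation: `find_out_of_range` is a hypothetical helper, `NVEC` and `OBS` are copied from the first error message above, and `FIXED_NVEC` assumes that 51 really is a valid value for the 51-sized dimensions.

```python
# Dimension sizes copied from the MultiDiscrete space in the first error.
NVEC = [2] * 5 + [51] * 8 + [11] * 8 + [2] * 10

# The offending observation copied from the first error message.
OBS = [0, 1, 1, 1, 1, 0, 0, 0, 0, 51, 48, 47, 33, 10, 2, 4, 9, 4, 1, 2, 2,
       0, 0, 0, 0, 0, 0, 1, 0, 0, 0]


def find_out_of_range(nvec, obs):
    """Return (index, value, size) for every entry outside [0, size)."""
    if len(nvec) != len(obs):
        raise ValueError("observation length does not match the space")
    return [(i, v, n) for i, (v, n) in enumerate(zip(obs, nvec))
            if not 0 <= v < n]


# Index 9 is the tenth dimension (indices are 0-based).
print(find_out_of_range(NVEC, OBS))  # [(9, 51, 51)]

# Assumption: 51 is a legal value, so those dimensions need size 52.
FIXED_NVEC = [2] * 5 + [52] * 8 + [11] * 8 + [2] * 10
print(find_out_of_range(FIXED_NVEC, OBS))  # []
```

Whether to widen the space or clip the observation depends on the environment. If the other sizes are also meant to include their upper bound, they would need the same +1 adjustment.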

How episodes, iterations, trials, and experiments relate to each other

Reference: [Tune] [RLlib] Episodes vs iterations vs trials vs experiments

Episode: In an RL environment, the episode starts when you call env.reset() and it finishes (after n timesteps for each of which you call env.step([some action])) when the env returns the done=True flag from the step() method.

Iteration: A single training iteration for an RLlib Trainer (calling Trainer.train() once). An iteration may contain one or more episodes (collecting data for the train batch or for a replay buffer), and one or more SGD update steps, depending on the particular Trainer being used.

Trial: When you use RLlib in combination with Tune and e.g. do a tune.grid_search over 2 learning rates, e.g. tune.grid_search([0.0001, 0.0005]), Tune will then run two “trials” using these two different learning rates.

Experiment: A (e.g. yaml) defined RLlib config (maybe containing grid_searches that cause n trials). You can store more than one experiment in a yaml file under different top-level experiment names (e.g. see ray.release.rllib_tests.learning_tests.major_algos_learning_tests.yaml).
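The nesting of the four terms can be illustrated with a toy loop. This sketch deliberately avoids the real Ray/RLlib API: `CountdownEnv`, `run_episode`, `train_iteration`, `run_experiment`, and the `lr_grid` config key are all hypothetical names, the environment uses the older 4-tuple `step()` return with a `done` flag as in the quote above, and an iteration here collects whole episodes for simplicity.

```python
import random


class CountdownEnv:
    """Toy environment (hypothetical): an episode ends after `horizon` steps."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # initial observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 1.0, done, {}  # obs, reward, done, info


def run_episode(env, rng):
    """One episode: from env.reset() until step() returns done=True."""
    env.reset()
    done, steps = False, 0
    while not done:
        _, _, done, _ = env.step(rng.choice([0, 1]))
        steps += 1
    return steps


def train_iteration(env, rng, train_batch_size=12):
    """One iteration: collect episodes until the batch has enough timesteps."""
    episodes, timesteps = 0, 0
    while timesteps < train_batch_size:
        timesteps += run_episode(env, rng)
        episodes += 1
    # A real trainer would run its SGD update step(s) on the batch here.
    return {"episodes": episodes, "timesteps": timesteps}


def run_experiment(config, num_iterations=3):
    """One experiment: every grid-search value becomes its own trial."""
    results = {}
    for lr in config["lr_grid"]:  # one trial per learning rate
        env, rng = CountdownEnv(), random.Random(0)
        results[lr] = [train_iteration(env, rng) for _ in range(num_iterations)]
    return results


results = run_experiment({"lr_grid": [0.0001, 0.0005]})
for lr, iterations in results.items():
    print(lr, iterations)
```

With a horizon of 5 and a batch size of 12, each iteration spans three episodes (15 timesteps), each trial spans three iterations, and the experiment spans two trials.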