RLlib Tutorial
Error: Observation outside given space / Observation outside expected value range
Below are the error messages produced by two different versions of Ray. They are essentially the same error:
ValueError: Observation ([ 0 1 1 1 1 0 0 0 0 51 48 47 33 10 2 4 9 4 1 2 2 0 0 0
0 0 0 1 0 0 0] dtype=None) outside given space (MultiDiscrete([ 2 2 2 2 2 51 51 51 51 51 51 51 51 11 11 11 11 11 11 11 11 2 2 2
2 2 2 2 2 2 2]))!
ValueError: ('Observation outside expected value range', MultiDiscrete([ 2 2 2 2 2 51 51 51 51 51 51 51 51 11 11 11 11 11 11 11 11 2 2 2
2 2 2 2 2 2 2]), array([ 0, 1, 1, 1, 1, 0, 0, 0, 0, 32, 51, 32, 27, 10, 5, 0, 1,
6, 1, 2, 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
dtype=int64))
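This error occurs because each dimension of a MultiDiscrete space takes values in a left-closed, right-open range: a dimension declared with n categories accepts only the integers 0, 1, ..., n-1. Take the 10th dimension in the first error message as an example. Its range is [0, 51), so an observation value of 51 in that dimension triggers the error. In the second message, the offending value 51 is in the 11th dimension, whose range is also [0, 51).
How to understand the relationship between episode, iteration, trial, and experiment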
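A minimal sketch of the bounds rule, written in plain Python so it runs without Gym installed. `out_of_range_dims` is a hypothetical helper (not part of Gym or RLlib) that applies the same 0 <= value < n test to every dimension, using the observation and nvec copied from the first error message. The last lines assume that 51 is a legitimate value for those dimensions, in which case one possible fix is to declare them with 52 categories; if 51 should never occur, the environment itself is producing a bad observation instead.

```python
def out_of_range_dims(obs, nvec):
    """Return (index, value, n) for every dimension where obs violates 0 <= value < n.

    Hypothetical helper mirroring the MultiDiscrete bounds rule: dimension i
    accepts the integers 0, 1, ..., nvec[i] - 1, so nvec[i] itself is invalid.
    """
    if len(obs) != len(nvec):
        raise ValueError(f"length mismatch: {len(obs)} vs {len(nvec)}")
    return [(i, v, n) for i, (v, n) in enumerate(zip(obs, nvec)) if not 0 <= v < n]


# Observation and nvec copied from the first error message.
obs = [0, 1, 1, 1, 1, 0, 0, 0, 0, 51, 48, 47, 33, 10, 2, 4, 9, 4, 1, 2, 2,
       0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
nvec = [2] * 5 + [51] * 8 + [11] * 8 + [2] * 10

for i, v, n in out_of_range_dims(obs, nvec):
    # 0-based index 9 is the "10th dimension" discussed in the text
    print(f"dim {i + 1}: value {v} not in [0, {n})")  # prints: dim 10: value 51 not in [0, 51)

# Assumed fix when 51 is a valid value: use max value + 1 = 52 categories.
fixed_nvec = [n + 1 if n == 51 else n for n in nvec]
print(out_of_range_dims(obs, fixed_nvec))  # prints: []
```

With Gym or Gymnasium installed, `MultiDiscrete(nvec).contains(obs)` performs this check for you; the helper only makes the offending dimension explicit.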
Reference: [Tune] [RLlib] Episodes vs iterations vs trials vs experiments
Episode: In an RL environment, an episode starts when you call env.reset() and finishes when env.step(action) returns the done=True flag, typically after n timesteps, each of which is one env.step() call.
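A minimal sketch of one episode, assuming a hypothetical toy environment `CountdownEnv` (not part of Gym or RLlib) that follows the old Gym-style API the text refers to: reset() returns an observation and step() returns (obs, reward, done, info). Newer Gymnasium environments instead return (obs, info) from reset() and (obs, reward, terminated, truncated, info) from step(), where the episode ends when either flag is true.

```python
class CountdownEnv:
    """Hypothetical toy env: every episode lasts exactly `length` timesteps."""

    def __init__(self, length=5):
        self.length = length
        self.remaining = 0

    def reset(self):
        # Starts a new episode
        self.remaining = self.length
        return self.remaining

    def step(self, action):
        # One timestep; done becomes True once the countdown reaches zero
        self.remaining -= 1
        done = self.remaining == 0
        return self.remaining, 1.0, done, {}


def run_episode(env, policy):
    """One episode: call reset() once, then step() until done=True."""
    obs = env.reset()
    done = False
    timesteps, total_reward = 0, 0.0
    while not done:
        obs, reward, done, info = env.step(policy(obs))
        timesteps += 1
        total_reward += reward
    return timesteps, total_reward


steps, ret = run_episode(CountdownEnv(length=5), policy=lambda obs: 0)
print(steps, ret)  # prints: 5 5.0
```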
Iteration: A single training iteration of an RLlib Trainer (renamed Algorithm in Ray 2.x), i.e. one call to Trainer.train(). Depending on the particular Trainer, an iteration may include one or more episodes, or parts of episodes, collected for the train batch or for a replay buffer, plus one or more SGD update steps.
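The following toy simulation (plain Python, not RLlib code) illustrates that an iteration's train batch is measured in timesteps rather than whole episodes. `Sampler` and `train` are hypothetical stand-ins, assuming every episode lasts exactly 30 timesteps and each iteration samples a batch of 100 timesteps before running 4 placeholder update steps.

```python
class Sampler:
    """Hypothetical rollout worker that remembers where it stopped in the
    current episode between iterations."""

    def __init__(self, episode_len):
        self.episode_len = episode_len
        self.t = 0  # timestep within the current episode

    def collect(self, num_steps):
        """Step the env num_steps times; return how many episodes finished."""
        finished = 0
        for _ in range(num_steps):
            self.t += 1
            if self.t == self.episode_len:  # done=True, so the env is reset
                finished += 1
                self.t = 0
        return finished


def train(sampler, train_batch_size, num_sgd_steps):
    """One simplified training iteration: sample a batch, then update."""
    episodes_completed = sampler.collect(train_batch_size)
    updates = 0
    for _ in range(num_sgd_steps):
        updates += 1  # placeholder for one SGD update on the sampled batch
    return {"episodes_completed": episodes_completed, "sgd_steps": updates}


sampler = Sampler(episode_len=30)
results = [train(sampler, train_batch_size=100, num_sgd_steps=4) for _ in range(3)]
print([r["episodes_completed"] for r in results])  # prints: [3, 3, 4]
```

Because the sampler keeps its position between calls, an episode cut off at the end of one iteration is finished during the next, so the number of episodes per iteration need not be constant.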
Trial: When you use RLlib together with Tune and, for example, run a grid search over two learning rates with tune.grid_search([0.0001, 0.0005]), Tune runs two “trials”, one for each learning rate.
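A simplified sketch of how grid searches expand into trials, in plain Python so it runs without Ray. `tune.grid_search(values)` returns a dict of the form {"grid_search": values}; the local `grid_search` below reproduces that shape, and `expand_trials` is a hypothetical helper that produces one config per combination (the cartesian product over all grid-searched keys). Unlike Tune, it only looks at top-level keys and ignores random sampling and `num_samples`.

```python
import itertools


def grid_search(values):
    # Same shape as the dict returned by tune.grid_search()
    return {"grid_search": list(values)}


def expand_trials(config):
    """Return one concrete config per combination of grid-searched values."""
    grid_keys = [k for k, v in config.items()
                 if isinstance(v, dict) and "grid_search" in v]
    grids = [config[k]["grid_search"] for k in grid_keys]
    trials = []
    for combo in itertools.product(*grids):  # a single empty combo if no grids
        trial = dict(config)
        trial.update(zip(grid_keys, combo))
        trials.append(trial)
    return trials


config = {"env": "CartPole-v1", "lr": grid_search([0.0001, 0.0005])}
trials = expand_trials(config)
for t in trials:
    print(t)
# prints:
# {'env': 'CartPole-v1', 'lr': 0.0001}
# {'env': 'CartPole-v1', 'lr': 0.0005}
```

Adding a second grid search, e.g. over two discount factors, would multiply the count to 2 x 2 = 4 trials.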
Experiment: An RLlib config, for example one defined in a YAML file, possibly containing grid searches that expand into n trials. A single YAML file can hold more than one experiment, each under its own top-level experiment name (for example, see ray.release.rllib_tests.learning_tests.major_algos_learning_tests.yaml).
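To make the nesting concrete, the sketch below mirrors in Python what such a YAML file holds: two experiments under hypothetical top-level names, each in the run / env / stop / config layout used by RLlib's tuned-example YAML files, with grid searches written as {"grid_search": [...]}, the dict that tune.grid_search() produces. `num_trials` is a hypothetical helper that counts trials as the product of the grid sizes, ignoring `num_samples`.

```python
import math

# Hypothetical names and values; the same structure as a YAML file with
# two top-level experiment entries.
experiments = {
    "cartpole-ppo": {
        "run": "PPO",
        "env": "CartPole-v1",
        "stop": {"training_iteration": 20},
        "config": {"lr": {"grid_search": [0.0001, 0.0005]},
                   "gamma": {"grid_search": [0.9, 0.99]}},
    },
    "cartpole-dqn": {
        "run": "DQN",
        "env": "CartPole-v1",
        "stop": {"training_iteration": 20},
        "config": {"lr": 0.0005},
    },
}


def num_trials(experiment):
    """Trials in one experiment = product of grid_search sizes (1 if none)."""
    sizes = [len(v["grid_search"]) for v in experiment["config"].values()
             if isinstance(v, dict) and "grid_search" in v]
    return math.prod(sizes)  # math.prod([]) == 1


for name, exp in experiments.items():
    print(f"{name}: {num_trials(exp)} trial(s)")
# prints:
# cartpole-ppo: 4 trial(s)
# cartpole-dqn: 1 trial(s)
```

Each trial in turn runs many iterations (here, until `training_iteration` reaches 20), and each iteration samples timesteps from one or more episodes.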