RLlib/Ray: Pitfalls and Troubleshooting Notes

RLlib Tutorial

Reference: RLlib Tutorial

Error: Observation outside given space / Observation outside expected value range

The two errors below come from two different versions of Ray, but they are essentially the same problem:

ValueError: Observation ([ 0  1  1  1  1  0  0  0  0 51 48 47 33 10  2  4  9  4  1  2  2  0  0  0
  0  0  0  1  0  0  0] dtype=None) outside given space (MultiDiscrete([ 2  2  2  2  2 51 51 51 51 51 51 51 51 11 11 11 11 11 11 11 11  2  2  2
  2  2  2  2  2  2  2]))!
ValueError: ('Observation outside expected value range', MultiDiscrete([ 2  2  2  2  2 51 51 51 51 51 51 51 51 11 11 11 11 11 11 11 11  2  2  2
2  2  2  2  2  2  2]), array([ 0,  1,  1,  1,  1,  0,  0,  0,  0, 32, 51, 32, 27, 10,  5,  0,  1,
      6,  1,  2,  2,  0,  0,  0,  0,  0,  0,  1,  0,  0,  0],
    dtype=int64))

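The cause is that every dimension of a MultiDiscrete space is a half-open interval, closed on the left and open on the right. Take the tenth dimension of the first error as an example: the space declares 51 for that dimension, so the valid values are [0, 51), i.e. the integers 0 through 50. An observation of 51 in that dimension is therefore out of range and raises the error. In the second error, it is the eleventh dimension that equals 51.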
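To locate the offending dimension quickly, the half-open rule can be checked by hand. The following is a minimal sketch in plain Python: `out_of_range_dims` is a hypothetical helper written for this note, not part of gym or RLlib, and the widened space at the end assumes that the true maximum of those features is 51.

```python
def out_of_range_dims(nvec, obs):
    """Return (index, value, n) for each dimension whose value is not in [0, n).

    Hypothetical helper: it only mirrors the half-open rule described above.
    """
    if len(nvec) != len(obs):
        raise ValueError("observation and space have different lengths")
    return [(i, v, n) for i, (v, n) in enumerate(zip(obs, nvec)) if not 0 <= v < n]


# Space and observation copied from the first error message above.
nvec = [2] * 5 + [51] * 8 + [11] * 8 + [2] * 10
obs = [0, 1, 1, 1, 1, 0, 0, 0, 0, 51, 48, 47, 33, 10, 2, 4, 9, 4, 1, 2, 2,
       0, 0, 0, 0, 0, 0, 1, 0, 0, 0]

bad = out_of_range_dims(nvec, obs)
print(bad)  # [(9, 51, 51)] -> index 9 (the tenth dimension) holds 51, limit is 51

# One possible fix (assumption: these features really range over 0..51 inclusive):
# declare max_value + 1 categories for them, i.e. 52 instead of 51.
widened_nvec = [2] * 5 + [52] * 8 + [11] * 8 + [2] * 10
still_bad = out_of_range_dims(widened_nvec, obs)
print(still_bad)  # []
```

If 51 is not actually a legal value for the feature, the alternative is to fix or clip the value inside the environment before returning the observation, rather than widening the space.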

How to understand the relationship between episode, iteration, trial and experiment

Reference: [Tune] [RLlib] Episodes vs iterations vs trials vs experiments

Episode: In an RL environment, the episode starts when you call env.reset() and it finishes (after n timesteps for each of which you call env.step([some action])) when the env returns the done=True flag from the step() method.

Iteration: A single training iteration for an RLlib Trainer (calling Trainer.train() once). An iteration may contain one or more episodes (collecting data for the train batch or for a replay buffer), and one or more SGD update steps, depending on the particular Trainer being used.

Trial: When you use RLlib in combination with Tune and e.g. do a tune.grid_search over 2 learning rates, e.g. tune.grid_search([0.0001, 0.0005]), Tune will then run two “trials” using these two different learning rates.

Experiment: An RLlib config, defined for example in a yaml file, which may contain grid_searches that cause n trials. You can store more than one experiment in a yaml file under different top-level experiment names (e.g. see ray.release.rllib_tests.learning_tests.major_algos_learning_tests.yaml).
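The nesting of the four terms can be illustrated with a toy loop. This is only a conceptual sketch in plain Python that does not call Ray, Tune or RLlib: `CountdownEnv`, `train_iteration` and `run_experiment` are hypothetical names, the episode length of 5 and batch size of 20 are arbitrary, and no learning actually happens.

```python
class CountdownEnv:
    """Toy environment (hypothetical): every episode lasts exactly `horizon` steps."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # dummy observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 1.0, done, {}  # obs, reward, done, info


def train_iteration(env, train_batch_size=20):
    """One iteration: collect a batch of timesteps, which may span several episodes."""
    timesteps = episodes = 0
    env.reset()  # an episode starts at reset() ...
    while timesteps < train_batch_size:
        _, _, done, _ = env.step(0)
        timesteps += 1
        if done:  # ... and ends when step() reports done=True
            episodes += 1
            env.reset()
    # A real trainer would run its SGD update step(s) on the batch here.
    return {"timesteps": timesteps, "episodes": episodes}


def run_experiment(learning_rates, num_iterations=3):
    """One experiment: one trial per grid-search value, each running several iterations."""
    results = {}
    for lr in learning_rates:  # lr only labels the trial in this toy sketch
        env = CountdownEnv()
        results[lr] = [train_iteration(env) for _ in range(num_iterations)]
    return results


results = run_experiment([0.0001, 0.0005])
for lr, iterations in results.items():
    print(lr, iterations)
```

With these numbers, the experiment yields two trials, each trial runs three iterations, and each iteration contains four complete episodes of five timesteps.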
