PPO configuration parameters: num_rollout_workers & train_batch_size

Question

Hello!

I use RLlib (Ray 2.6.3), specifically PPO, for my task. One aspect of the PPO configuration is still not clear to me.

Is there a connection between the two training parameters "num_rollout_workers" and "train_batch_size"? For example, if I set num_rollout_workers to 2, do I have to multiply train_batch_size by the number of rollout workers in the configuration?

Many thanks for your support in advance!

Greetings,
MRMarlies

Answer

Hasnain_Fareed
Nov 2023
Yes, there is a relationship between num_rollout_workers and train_batch_size in the configuration of PPO in RLlib.

The num_rollout_workers parameter specifies the number of workers that are used for environment sampling. Each of these workers collects samples from the environment in parallel, which can significantly speed up the data collection process.

On the other hand, train_batch_size is the number of samples collected by all rollout workers combined that the algorithm will use for each training iteration.

So, if you have num_rollout_workers=2, it doesn’t mean you have to multiply the train_batch_size by 2. However, you should ensure that train_batch_size is large enough to accommodate the samples collected by all the workers.

In other words, train_batch_size should be greater than or equal to num_rollout_workers * rollout_fragment_length * num_envs_per_worker. This is because each worker collects rollout_fragment_length * num_envs_per_worker samples before sending them to the learner.
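
The inequality above can be checked with a few lines of plain Python. This is a minimal sketch that does not call RLlib: the helper names `samples_per_round` and `batch_is_large_enough` are hypothetical, and the numbers (two workers, 200-step fragments, one environment per worker) are illustrative values chosen to match the question.

```python
def samples_per_round(num_rollout_workers: int,
                      rollout_fragment_length: int,
                      num_envs_per_worker: int = 1) -> int:
    """Samples gathered when every worker returns one fragment per environment."""
    return num_rollout_workers * rollout_fragment_length * num_envs_per_worker


def batch_is_large_enough(train_batch_size: int, **sampling: int) -> bool:
    """True if train_batch_size covers one full round of samples from all workers."""
    return train_batch_size >= samples_per_round(**sampling)


# Illustrative values: the two workers from the question, 200-step fragments.
per_round = samples_per_round(num_rollout_workers=2, rollout_fragment_length=200)
print(per_round)  # 400

print(batch_is_large_enough(400, num_rollout_workers=2, rollout_fragment_length=200))  # True
print(batch_is_large_enough(200, num_rollout_workers=2, rollout_fragment_length=200))  # False
```

With two workers, a train_batch_size of 200 would be smaller than what a single round of sampling produces, which is the situation the inequality is meant to rule out.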

Here’s an example from a forum post:

num_gpus = 0
num_gpus_per_worker = 0
num_cpus_for_local_worker = 1
num_cpus_per_worker = 1
num_rollout_workers = 1
rollout_fragment_length = 200
train_batch_size = 200 # must be = rollout_fragment_length * num_rollout_workers * num_envs_per_worker
sgd_minibatch_size = 32

In this example, train_batch_size is set to 200, which is equal to rollout_fragment_length * num_rollout_workers * num_envs_per_worker (200 * 1 * 1, with num_envs_per_worker left at its default of 1).

Remember, the train_batch_size is a hyperparameter that you can tune based on your specific problem and computational resources. It doesn’t have to be exactly equal to num_rollout_workers * rollout_fragment_length * num_envs_per_worker, but it should be large enough to accommodate the samples collected by all the workers.
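
One way to see why train_batch_size does not need to be multiplied by the worker count is to count how many rounds of sampling it takes to fill one batch. The sketch below is plain Python, not RLlib code. It assumes that sampling repeats in rounds until at least train_batch_size samples have been gathered, which is an assumption about the sampling loop rather than something stated in the answer. The helper name `rounds_to_fill_batch` is hypothetical and the batch size of 4000 is only an illustrative value.

```python
import math


def rounds_to_fill_batch(train_batch_size: int,
                         num_rollout_workers: int,
                         rollout_fragment_length: int,
                         num_envs_per_worker: int = 1) -> int:
    """Sampling rounds needed to gather at least train_batch_size samples."""
    per_round = num_rollout_workers * rollout_fragment_length * num_envs_per_worker
    return math.ceil(train_batch_size / per_round)


# Same batch size, different worker counts: only the number of rounds changes.
for workers in (1, 2, 4):
    rounds = rounds_to_fill_batch(4000, workers, rollout_fragment_length=200)
    print(f"{workers} worker(s): {rounds} round(s)")
```

Under that assumption, adding workers shortens the time needed to collect a batch, while the amount of data used per training iteration stays the same.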

Source: https://discuss.ray.io/t/ppo-configuration-parameters-num-rollout-workers-train-batch-size/12552/2
