https://arxiv.org/abs/1712.09381
Introduction
Many of the challenges in reinforcement learning stem from the need to scale learning and simulation while also integrating a rapidly growing range of algorithms and models.
Many of the frameworks used by these libraries, such as MPI, Distributed TensorFlow, and parameter servers, rely on communication between long-running program replicas for distributed execution. This style does not naturally encapsulate parallelism and resource requirements within individual components.
We believe that the ability to build scalable RL algorithms by composing and reusing existing components and implementations is essential for the rapid development and progress of the field. Toward this end, we argue for structuring distributed RL components around the principles of logically centralized program control and parallelism encapsulation.
Logically Centralized Control & Hierarchical Control
- Distributed Control. Most RL algorithms today are written in a fully distributed style where replicated processes independently compute and coordinate with each other according to their roles (if any).
- Logically Centralized Control. A single driver program delegates algorithm sub-tasks to other processes, which execute them in parallel.
- Hierarchical Control. To support nested computations, we propose extending the centralized control model with hierarchical delegation of control, which allows the worker processes to further delegate work to sub-workers of their own when executing tasks.
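The distinction can be sketched with standard-library thread pools standing in for remote processes (all names here are illustrative; a real system such as Ray would use actors and remote tasks): a single driver delegates sub-tasks to workers, and each worker may further delegate to sub-workers of its own.

```python
from concurrent.futures import ThreadPoolExecutor

def sub_task(x):
    # Leaf-level work delegated by a worker to a sub-worker.
    return x * x

def worker_task(chunk):
    # A worker may itself delegate to sub-workers (hierarchical control).
    with ThreadPoolExecutor(max_workers=2) as sub_pool:
        return sum(sub_pool.map(sub_task, chunk))

def driver(data, num_workers=2):
    # The single driver program delegates sub-tasks to workers in
    # parallel (logically centralized control).
    chunks = [data[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return sum(pool.map(worker_task, chunks))

print(driver([1, 2, 3, 4]))  # 1 + 4 + 9 + 16 = 30
```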
Hierarchical Parallel Task Model
- Distributed Control. Parallelizing entire programs with frameworks like MPI and Distributed TensorFlow typically requires explicit algorithm modifications to insert points of coordination when composing two programs or components.
- Hierarchical Control based on Ray. Ray meets this requirement with Ray actors, which are Python classes that may be created in the cluster and accept remote method calls. Ray permits these actors to in turn launch more actors and schedule tasks on those actors as part of a method call, satisfying our need for hierarchical delegation as well.
Abstractions for Reinforcement Learning
Policy Graph
To interface with RLlib, an algorithm's policy-specific logic is defined in a policy graph class with the following methods:
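A sketch of such a policy graph class, with method names paraphrased from the RLlib paper (signatures are approximate, not the shipped API):

```python
class PolicyGraph:
    """Sketch of the policy graph interface described in the paper."""

    def act(self, batch_obs, batch_prev_rewards, batch_prev_actions):
        # Map a batch of observations to a batch of actions.
        raise NotImplementedError

    def postprocess(self, batch, other_agent_batches=None):
        # Transform a sampled trajectory batch, e.g. advantage estimation.
        raise NotImplementedError

    def gradients(self, batch):
        # Compute loss gradients for a batch of experiences.
        raise NotImplementedError

    def apply_gradients(self, grads):
        # Apply gradients to the policy's model.
        raise NotImplementedError

    def get_weights(self):
        raise NotImplementedError

    def set_weights(self, weights):
        raise NotImplementedError
```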
Policy Evaluation
RLlib provides a PolicyEvaluator class that wraps a policy graph and environment to add a method to sample() experience batches. Policy evaluator instances can be created as Ray remote actors and replicated across a cluster for parallelism.
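A toy sketch of the evaluator idea (the environment and policy below are illustrative stand-ins; in RLlib, evaluator instances are created as Ray remote actors rather than plain objects):

```python
class PolicyEvaluator:
    # Wraps a policy and an environment, adding a sample() method that
    # returns a batch of (obs, action, reward) experience tuples.
    def __init__(self, env, policy):
        self.env = env
        self.policy = policy

    def sample(self, horizon=3):
        obs = self.env.reset()
        batch = []
        for _ in range(horizon):
            action = self.policy(obs)
            next_obs, reward, done = self.env.step(action)
            batch.append((obs, action, reward))
            obs = self.env.reset() if done else next_obs
        return batch

class CountEnv:
    # Toy environment: the observation counts steps; episodes end at 2.
    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 2

ev = PolicyEvaluator(CountEnv(), policy=lambda obs: 0)
print(len(ev.sample()))  # 3 experience tuples
```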
Policy Optimization
The policy optimizer is responsible for the performance-critical tasks of distributed sampling, parameter updates, and managing replay buffers. To distribute the computation, the optimizer operates over a set of policy evaluator replicas.
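As an illustration, here is a sequential stand-in for one synchronous-sampling optimizer step over evaluator replicas (all classes are toy stand-ins invented for the example; a real optimizer would issue the sample() calls as remote tasks on Ray actors):

```python
class AveragingGraph:
    # Toy "policy graph": the weight is a running mean of sampled values.
    def __init__(self):
        self.weights = 0.0

    def gradients(self, batch):
        return sum(batch) / len(batch) - self.weights

    def apply_gradients(self, grad):
        self.weights += grad

    def get_weights(self):
        return self.weights

class Evaluator:
    # Toy stand-in for a remote policy evaluator replica.
    def __init__(self, data):
        self.data = data
        self.weights = None

    def sample(self):
        return list(self.data)

    def set_weights(self, w):
        self.weights = w

def sync_sample_step(local_graph, evaluators):
    # One synchronous step: gather sample batches from every evaluator
    # replica, update the local graph, then broadcast the new weights.
    batch = [x for ev in evaluators for x in ev.sample()]
    local_graph.apply_gradients(local_graph.gradients(batch))
    weights = local_graph.get_weights()
    for ev in evaluators:
        ev.set_weights(weights)
    return weights

evs = [Evaluator([1, 2]), Evaluator([3, 4])]
print(sync_sample_step(AveragingGraph(), evs))  # mean of 1..4 = 2.5
```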
Pseudocode for four RLlib policy optimizer step methods. Each step() operates over a local policy graph and array of remote evaluator replicas.
For details on (c), see https://docs.ray.io/en/master/auto_examples/plot_parameter_server.html
# From the linked Ray parameter-server example; assumes `ps` (a
# parameter-server actor), `workers` (data-worker actors), `model`,
# `evaluate`, `test_loader`, `iterations`, and `num_workers` are
# defined as in that example.
current_weights = ps.get_weights.remote()

gradients = {}
for worker in workers:
    gradients[worker.compute_gradients.remote(current_weights)] = worker

for i in range(iterations * num_workers):
    ready_gradient_list, _ = ray.wait(list(gradients))
    ready_gradient_id = ready_gradient_list[0]
    worker = gradients.pop(ready_gradient_id)

    # Compute and apply gradients.
    current_weights = ps.apply_gradients.remote(*[ready_gradient_id])
    gradients[worker.compute_gradients.remote(current_weights)] = worker

    if i % 10 == 0:
        # Evaluate the current model after every 10 updates.
        model.set_weights(ray.get(current_weights))
        accuracy = evaluate(model, test_loader)
        print("Iter {}: \taccuracy is {:.1f}".format(i, accuracy))

print("Final accuracy is {:.1f}.".format(accuracy))
All optimizers in RLlib.
__all__ = [
"PolicyOptimizer",
"AsyncReplayOptimizer",
"AsyncSamplesOptimizer",
"AsyncGradientsOptimizer",
"SyncSamplesOptimizer",
"SyncReplayOptimizer",
"LocalMultiGPUOptimizer",
"SyncBatchReplayOptimizer",
]
Framework Performance
Fault tolerance and straggler mitigation
Failure events become significant at scale. RLlib leverages Ray’s built-in fault tolerance mechanisms, which also cuts cost by making it practical to run on preemptible cloud compute instances. Similarly, stragglers can significantly impact the performance of distributed algorithms at scale. RLlib supports straggler mitigation in a generic way via the ray.wait() primitive. For example, in PPO it is used to drop the slowest evaluator tasks, at the cost of some bias.
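The idea can be illustrated with standard-library futures (`as_completed` serves here as a stdlib analogue of ray.wait, and the delays are illustrative): keep the first k results and drop the straggler.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def sample(delay):
    # Stand-in for a remote evaluator's sample task; the delay models a
    # fast or slow (straggling) replica.
    time.sleep(delay)
    return delay

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(sample, d) for d in (0.01, 0.02, 0.03, 0.5)]
    fastest = []
    for fut in as_completed(futures):
        fastest.append(fut.result())
        if len(fastest) == 3:  # like ray.wait(futures, num_returns=3)
            break  # proceed without the straggler's result

print(sorted(fastest))  # the three fastest tasks; the 0.5 s task is dropped
```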
Data compression
RLlib uses the LZ4 algorithm to compress experience batches. For image observations, LZ4 reduces network traffic and memory usage by more than an order of magnitude, at a compression rate of ~1 GB/s per core.
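A portable sketch of the effect using the stdlib zlib codec (RLlib's actual choice is LZ4, picked for throughput; the frame below is a synthetic stand-in for an image observation):

```python
import zlib

# Synthetic 84x84 grayscale frame: a uniform background, the kind of
# redundancy that makes image observations highly compressible.
frame = bytes(84 * 84)
compressed = zlib.compress(frame)
print(len(frame) / len(compressed) > 10)  # order-of-magnitude reduction
```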
Evaluation
AWS m4.16xl CPU instances, see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/cpu-options-supported-instances-values.html
p2.16xl GPU instance, see https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-gpu.html
x1.16xl, see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/cpu-options-supported-instances-values.html