Ray Cluster Overview

小那(yelanta)

Tags: distributed systems

Article link: https://blog.csdn.net/xiaoming_ha/article/details/112555302

One of Ray's strengths is that a single program can run across multiple machines; Ray shows its real power only on a multi-machine cluster.

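Key concepts

Ray node: a Ray cluster consists of one head node and multiple worker nodes. The head node must be started first; the worker nodes are then started with the head node's address to form the cluster. A Ray cluster can autoscale on its own: it talks to the cloud provider and requests or releases instances according to the application's workload.
Ports: Ray processes communicate over TCP ports. Whether on a cloud or on any other platform, opening the right ports is essential.
Ray cluster launcher: a tool that provisions machines and launches a multi-node cluster. It can be used on Kubernetes.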

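Overview

A cluster can be started with the Ray cluster launcher, created manually by following the manual setup instructions, or created with a standard cluster manager such as Kubernetes (k8s).
Once the cluster is up, connect your program to it by starting the driver program on a node where ray start has been run.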
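
As a minimal sketch of what connecting a driver to the cluster can look like, the function below is meant to be called on a node where ray start has already been run. It assumes Ray is installed on that node; the names run_on_cluster and square are made up for illustration.

```python
def run_on_cluster(values):
    """Connect to the running Ray cluster and square each value in a remote task."""
    # Imported lazily so this module can still be loaded on a machine without Ray.
    import ray

    # address="auto" attaches to the cluster already running on this node
    # instead of starting a new local Ray instance.
    ray.init(address="auto")

    @ray.remote
    def square(x):
        return x * x

    # Submit one task per value; Ray schedules them across the cluster's nodes.
    futures = [square.remote(v) for v in values]
    results = ray.get(futures)

    # Show the total resources the cluster currently offers.
    print("cluster resources:", ray.cluster_resources())

    ray.shutdown()
    return results
```

Calling run_on_cluster([1, 2, 3]) from a Python session on the head node is one way to check that the driver can reach the cluster.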

Launching Cloud Clusters with ray

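Ray ships with a built-in cluster launcher for deploying a simple Ray cluster. On k8s, for example, the cluster launcher provisions resources on the nodes that k8s provides and then starts a Ray cluster on those resources. The Ray cluster launcher can be configured to work with k8s.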

launching cloud cluster

This subsection describes how to configure the cluster launcher for use with k8s.
The cluster config file ray/python/ray/autoscaler/kubernetes/example-full.yaml creates a small cluster with one head node that autoscales up to two worker node pods.

# Create or update the cluster. When the command finishes, it will print
# out the command that can be used to get a remote shell into the head node.
$ ray up ray/python/ray/autoscaler/kubernetes/example-full.yaml

# List the pods running in the cluster. You should only see one head node
# until you start running an application, at which point worker nodes
# should be started. Don't forget to include the Ray namespace in your
# 'kubectl' commands ('ray' by default).
$ kubectl -n ray get pods

# Get a remote screen on the head node.
$ ray attach ray/python/ray/autoscaler/kubernetes/example-full.yaml
$ # Try running a Ray program with 'ray.init(address="auto")'.

# Tear down the cluster
$ ray down ray/python/ray/autoscaler/kubernetes/example-full.yaml

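In example-full.yaml, the head node's pod spec sets restartPolicy to Never. The explanation given in the file is that automatically restarting the head node is not currently supported; if the head node goes down, the Ray cluster has to be restarted with ray up.
The worker node's pod spec also sets restartPolicy to Never. In other words, a Ray cluster on k8s does not rely on k8s to maintain the number of pod replicas; starting and stopping pods is managed entirely by the autoscaler running on the Ray head node.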

configuring your cluster

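A config file describes the Ray cluster itself. It must specify at least the following:

The name of the cluster
The number of workers in the cluster
The cloud provider
The commands to run on a node after it starts

A simple cluster config file looks like this: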

# A unique name for the cluster
cluster_name: basic-ray

# The maximum number of worker nodes to launch in addition to the head node
max_workers: 0 # this means zero workers

# Cloud-provider specific configuration.
provider:
   type: kubernetes

# How Ray will authenticate with newly launched nodes.
auth:
   ssh_user: ubuntu

setup_commands:
  - pip install ray[all]
  # Any commands of your own can be specified below
  - touch /tmp/some_file.txt
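
The checklist of required information can be expressed as a small sanity check. The sketch below works on the config as a plain Python dict; REQUIRED_KEYS and missing_config_keys are hypothetical names that mirror the four items listed in this article, not Ray's own schema validation.

```python
# Keys mirroring the four items this article lists as required
# (an illustration only, not Ray's actual config schema).
REQUIRED_KEYS = ("cluster_name", "max_workers", "provider", "setup_commands")


def missing_config_keys(config):
    """Return the required keys that are absent from the config dict."""
    return [key for key in REQUIRED_KEYS if key not in config]


# The simple config shown above, written as a Python dict.
basic_config = {
    "cluster_name": "basic-ray",
    "max_workers": 0,
    "provider": {"type": "kubernetes"},
    "auth": {"ssh_user": "ubuntu"},
    "setup_commands": ["pip install ray[all]", "touch /tmp/some_file.txt"],
}

print(missing_config_keys(basic_config))  # []
```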

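Cluster autoscaling

The Ray cluster launcher automatically starts a load-based autoscaler. The autoscaler looks at the resource demands of the cluster's tasks and actors and tries to add the minimum number of nodes that can satisfy them. A node is removed once it has been idle for longer than the configured timeout. The upscaling_speed setting limits the number of nodes allowed to be pending, so that too many nodes are not created at once. It defaults to 1.0, meaning the cluster can grow by at most 100% of its current size at any time, that is, it can at most double. An upper bound on the number of workers can also be set.
The autoscaler implements the following control loop:

It calculates the number of nodes required to satisfy the pending task, actor, and placement group resource requests.
If the total number of required nodes divided by the current number of nodes exceeds 1 + upscaling_speed, the number of nodes launched is limited by upscaling_speed.
If a node stays idle past the timeout (5 minutes by default), it is removed from the cluster.

A basic autoscaling configuration looks like this: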

# A unique identifier for the head node and workers of this cluster.
cluster_name: default

# The minimum number of workers nodes to launch in addition to the head
# node. This number should be >= 0.
min_workers: 0

# The autoscaler will scale up the cluster faster with higher upscaling speed.
# E.g., if the task requires adding more nodes then autoscaler will gradually
# scale up the cluster in chunks of upscaling_speed*currently_running_nodes.
# This number should be > 0.
upscaling_speed: 1.0

# If a node is idle for this many minutes, it will be removed. A node is
# considered idle if there are no tasks or actors running on it.
idle_timeout_minutes: 5
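
The upscaling limit and the idle timeout described in the control loop can be sketched in a few lines. This is a simplified model under the assumptions stated in this article, not the autoscaler's actual code: the real autoscaler also accounts for per-type worker limits and other constraints. The function names are hypothetical.

```python
import math


def nodes_to_launch(current_nodes, required_nodes, upscaling_speed=1.0):
    """Number of new nodes to start in one round, capped by upscaling_speed."""
    if upscaling_speed <= 0:
        raise ValueError("upscaling_speed must be > 0")
    # Nodes still missing to satisfy the pending demand.
    missing = max(0, required_nodes - current_nodes)
    # At most upscaling_speed * current_nodes nodes may be pending at once.
    cap = math.floor(upscaling_speed * current_nodes)
    return min(missing, cap)


def is_idle_too_long(idle_minutes, idle_timeout_minutes=5):
    """True if a node has been idle past the timeout and should be removed."""
    return idle_minutes > idle_timeout_minutes


# 20 nodes required with 4 running: 20 / 4 exceeds 1 + 1.0, so growth is capped.
print(nodes_to_launch(current_nodes=4, required_nodes=20))  # 4
# 6 nodes required with 4 running: 6 / 4 is within the limit, so both are launched.
print(nodes_to_launch(current_nodes=4, required_nodes=6))   # 2
```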

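Multi-node-type autoscaling

A Ray cluster supports multiple node types. In this mode, the scheduler picks the type of node to add based on the resource demands, instead of always adding nodes of the same type.
The concept of a cluster node type covers not only the physical instance type (for example, GPU vs. CPU) but other attributes as well. Custom resources can be specified for each node type so that Ray knows, at the application level, when a particular node type is needed, for example to place a task on a node with a special role or machine image via a custom resource.
Here is an example configuration with multiple node types: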

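# The allowed node types and the resources they provide. The key is the name of the node type, which is used only for debugging. node_config specifies the launch configuration and the physical instance type.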
available_node_types:
    cpu_4_ondemand:
        node_config:
            InstanceType: m4.xlarge
        min_workers: 1
        max_workers: 5
    cpu_16_spot:
        node_config:
            InstanceType: m4.4xlarge
            InstanceMarketOptions:
                MarketType: spot
        # Autoscaler will auto fill the CPU resources below.
        resources: {"Custom1": 1, "is_spot": 1}
        max_workers: 10
    gpu_1_ondemand:
        node_config:
            InstanceType: p2.xlarge
        # Autoscaler will auto fill the CPU/GPU resources below.
        resources: {"Custom2": 2}
        max_workers: 4
        worker_setup_commands:
            - pip install tensorflow-gpu  # Example command.
    gpu_8_ondemand:
        node_config:
            InstanceType: p3.8xlarge
        # Autoscaler autofills the "resources" below.
        # resources: {"CPU": 32, "GPU": 4, "accelerator_type:V100": 1}
        max_workers: 2
        worker_setup_commands:
            - pip install tensorflow-gpu  # Example command.

# Specify the node type of the head node (as configured above).
head_node_type: cpu_4_ondemand

# Specify the default type of the worker node (as configured above).
worker_default_node_type: cpu_16_spot

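The config above defines two CPU node types (cpu_4_ondemand and cpu_16_spot) and two GPU node types (gpu_1_ondemand and gpu_8_ondemand). Each node type has a name; the name has no semantic meaning and is used only for debugging. Let's look at the fields of the gpu_1_ondemand node type.
node_config tells the underlying cloud provider how to launch a node of this type. It is merged with the top-level node config in the YAML file and can override fields in it; in this example it specifies the p2.xlarge instance type.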

node_config:
    InstanceType: p2.xlarge
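
The override behaviour can be pictured as a dictionary merge in which the node type's values win. The sketch below assumes a shallow merge, and the top-level values (m5.large, ami-example) are invented for illustration; how Ray merges nested fields is not covered here.

```python
def merge_node_config(top_level, node_type_config):
    """Start from the top-level config, then let the node type override fields."""
    merged = dict(top_level)
    merged.update(node_type_config)
    return merged


# Hypothetical top-level worker config (values invented for illustration).
top_level_worker_config = {"InstanceType": "m5.large", "ImageId": "ami-example"}
# node_config of the gpu_1_ondemand node type from the example above.
gpu_1_node_config = {"InstanceType": "p2.xlarge"}

merged = merge_node_config(top_level_worker_config, gpu_1_node_config)
print(merged)  # {'InstanceType': 'p2.xlarge', 'ImageId': 'ami-example'}
```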

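The resources field tells the autoscaler what kinds of resources this node provides, including custom resources such as Custom2. It lets the autoscaler choose the right node type to launch based on the application's resource demands. The resources specified here are passed automatically to the node's ray start command through an environment variable.
The min_workers and max_workers fields give the minimum and maximum number of nodes of this type to launch.
The worker_setup_commands field overrides the setup commands for the node type. Note that only the setup commands for worker nodes can be overridden.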
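
To illustrate how the resources field drives node type selection, the sketch below lists which node types could satisfy a single resource demand. It is a deliberately simplified model: the real autoscaler packs many demands onto nodes and weighs the candidates against each other. The CPU and GPU counts for the first three types are assumed from the instance types, since the autoscaler fills them in automatically; feasible_node_types is a hypothetical helper.

```python
# Resources per node type. Custom resources come from the config above;
# CPU/GPU counts for the first three types are assumptions for this sketch.
NODE_TYPES = {
    "cpu_4_ondemand": {"CPU": 4},
    "cpu_16_spot": {"CPU": 16, "Custom1": 1, "is_spot": 1},
    "gpu_1_ondemand": {"CPU": 4, "GPU": 1, "Custom2": 2},
    "gpu_8_ondemand": {"CPU": 32, "GPU": 4, "accelerator_type:V100": 1},
}


def feasible_node_types(demand, node_types=NODE_TYPES):
    """Return the names of node types whose resources cover the whole demand."""
    return [
        name
        for name, resources in node_types.items()
        if all(resources.get(key, 0) >= amount for key, amount in demand.items())
    ]


# Only gpu_1_ondemand offers the custom resource Custom2.
print(feasible_node_types({"Custom2": 1}))  # ['gpu_1_ondemand']
# Two node types have at least 8 CPUs.
print(feasible_node_types({"CPU": 8}))      # ['cpu_16_spot', 'gpu_8_ondemand']
```

On the application side, a task can ask for such a custom resource with Ray's resources argument, for example @ray.remote(resources={"Custom2": 1}), which is what creates a demand that only gpu_1_ondemand can serve.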

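Docker support for multi-type clusters

Each node type can specify the worker_image and pull_before_run fields, which override the values in the top-level docker section. The worker_run_options field is combined with the top-level docker: run_options field to produce the docker run command. Ray automatically selects the NVIDIA docker runtime if it is available.
The following configuration is for a node type that uses a GPU: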

available_node_types:
    gpu_1_ondemand:
        max_workers: 2
        worker_setup_commands:
            - pip install tensorflow-gpu  # Example command.
        # Docker specific commands for gpu_1_ondemand
        pull_before_run: True
        worker_image:
            - rayproject/ray-ml:latest-gpu
        worker_run_options:  # Appended to top-level docker field.
            - "-v /home:/home"
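
How the two option lists could combine into one docker run command can be sketched as follows. The helper build_docker_run and the top-level option value are hypothetical, and the command Ray actually generates contains more arguments than shown here; the sketch only illustrates that the node type's options are appended to the top-level ones.

```python
import shlex


def build_docker_run(image, top_level_run_options, worker_run_options):
    """Build a docker run argument list: top-level options first, then worker options."""
    args = ["docker", "run"]
    for option in list(top_level_run_options) + list(worker_run_options):
        # Each option string may hold a flag and its value, e.g. "-v /home:/home".
        args.extend(shlex.split(option))
    args.append(image)
    return args


command = build_docker_run(
    image="rayproject/ray-ml:latest-gpu",
    top_level_run_options=["--ulimit nofile=65536:65536"],  # hypothetical value
    worker_run_options=["-v /home:/home"],
)
print(" ".join(command))
```
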

ray with k8s cluster manager

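Deploying with Ray's own cluster launcher is the easiest way to run Ray on an existing cluster, but a Ray cluster can also be deployed with k8s's own cluster management; the cluster launcher was integrated with k8s only later. The ray/doc/kubernetes/ directory on GitHub holds the YAML files needed for all the examples, including the namespace and the deployment YAML files for the cluster's head node and worker nodes.
The key point is what happens when a node crashes: because the nodes are deployed as a Deployment, k8s restarts them to maintain the correct number of replicas.

When a worker node fails, a replacement pod is created automatically and joins the cluster.
When the head node fails, a new pod is created as well, but this creates a brand-new Ray cluster. The worker nodes fail when they can no longer reach the old head node, then restart and connect to the new head node.

To use GPUs, the GPUs must be set up in the k8s cluster, and the resources must be declared to the Ray cluster through configuration.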

the ray kubernetes operator

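Ray provides a Kubernetes Operator for managing autoscaling Ray clusters. The operator offers functionality similar to the Ray cluster launcher, but with the operator there is no need to run Ray locally; all interaction goes through k8s.
The operator uses a k8s custom resource called RayCluster. A RayCluster takes its configuration from a YAML file similar to the one used by the Ray cluster launcher. Internally, the operator uses Ray's autoscaler to manage the Ray cluster, but the autoscaler runs in a separate operator pod rather than on the Ray head node. Each RayCluster resource specifies one Ray cluster, so multiple Ray clusters can be defined and managed by the operator.

The custom resource definition is a configuration file of more than four thousand lines that defines its own types, API, node types, and so on. An operator pod is then started; it contains the autoscaler, which can manage multiple Ray clusters at the same time.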