Submitting a RayJob with KubeRay

1. Preparation

Concept: there are at least three ways to run a Ray job; this post tries out the second one.

Prerequisites:

  • A Kubernetes cluster. This post runs on Huawei Cloud CCE, which already provides Kubernetes and supports Helm add-ons.
  • kubectl, helm and related tools installed locally.

Plan:

  • Install the kuberay-operator Helm chart 0.5.1 (image version v0.5.0)
  • Install the kuberay-apiserver Helm chart 0.5.1 (image version v0.5.0)
  • Run the job on the rayproject/ray:2.4.0 image

2. Download

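Search for the KubeRay charts (this assumes the chart repository has already been added, e.g. with helm repo add kuberay https://ray-project.github.io/kuberay-helm/):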

root@DESKTOP-3813A3M:/mnt/d/all/app/Ray# helm search repo kuberay
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /root/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /root/.kube/config
NAME                            CHART VERSION   APP VERSION     DESCRIPTION
kuberay/kuberay-apiserver       0.5.1                           A Helm chart for kuberay-apiserver
kuberay/kuberay-operator        0.5.1                           A Helm chart for Kubernetes
kuberay/ray-cluster             0.5.1                           A Helm chart for Kubernetes

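The two WARNING lines only mean that ~/.kube/config is readable by group and others; chmod 600 ~/.kube/config silences them.

Download the operator and apiserver charts: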

root@DESKTOP-3813A3M:/mnt/d/all/app/Ray# helm fetch kuberay/kuberay-apiserver
root@DESKTOP-3813A3M:/mnt/d/all/app/Ray# helm fetch kuberay/kuberay-operator

Extract:

root@DESKTOP-3813A3M:/mnt/d/all/app/Ray# tar -xvf kuberay-operator-0.5.1.tgz
kuberay-operator/Chart.yaml
kuberay-operator/values.yaml
kuberay-operator/templates/_helpers.tpl
kuberay-operator/templates/deployment.yaml
kuberay-operator/templates/leader_election_role.yaml
kuberay-operator/templates/leader_election_role_binding.yaml
kuberay-operator/templates/ray_rayjob_editor_role.yaml
kuberay-operator/templates/ray_rayjob_viewer_role.yaml
kuberay-operator/templates/ray_rayservice_editor_role.yaml
kuberay-operator/templates/ray_rayservice_viewer_role.yaml
kuberay-operator/templates/role.yaml
kuberay-operator/templates/rolebinding.yaml
kuberay-operator/templates/service.yaml
kuberay-operator/templates/serviceaccount.yaml
kuberay-operator/.helmignore
kuberay-operator/README.md
kuberay-operator/crds/ray.io_rayclusters.yaml
kuberay-operator/crds/ray.io_rayjobs.yaml
kuberay-operator/crds/ray.io_rayservices.yaml
root@DESKTOP-3813A3M:/mnt/d/all/app/Ray# tar -xvf kuberay-apiserver-0.5.1.tgz
kuberay-apiserver/Chart.yaml
kuberay-apiserver/values.yaml
kuberay-apiserver/templates/_helpers.tpl
kuberay-apiserver/templates/deployment.yaml
kuberay-apiserver/templates/ingress.yaml
kuberay-apiserver/templates/role.yaml
kuberay-apiserver/templates/rolebinding.yaml
kuberay-apiserver/templates/service.yaml
kuberay-apiserver/templates/serviceaccount.yaml
kuberay-apiserver/.helmignore
kuberay-apiserver/README.md

3. Pull the images

Pull the images and push them to Huawei Cloud SWR (run docker login against SWR first):

docker pull kuberay/operator:v0.5.0
docker tag kuberay/operator:v0.5.0 swr.cn-north-7.myhuaweicloud.com/modelarts-idm-auto/kuberay/operator:v0.5.0
docker push swr.cn-north-7.myhuaweicloud.com/modelarts-idm-auto/kuberay/operator:v0.5.0

docker pull kuberay/apiserver:v0.5.0
docker tag kuberay/apiserver:v0.5.0 swr.cn-north-7.myhuaweicloud.com/modelarts-idm-auto/kuberay/apiserver:v0.5.0
docker push swr.cn-north-7.myhuaweicloud.com/modelarts-idm-auto/kuberay/apiserver:v0.5.0

docker pull rayproject/ray:2.4.0
docker tag rayproject/ray:2.4.0 swr.cn-north-7.myhuaweicloud.com/modelarts-idm-auto/rayproject/ray:2.4.0
docker push swr.cn-north-7.myhuaweicloud.com/modelarts-idm-auto/rayproject/ray:2.4.0
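Since all three images go through the same pull, tag, push sequence, the mapping can be scripted. The sketch below is a hypothetical helper (SWR_PREFIX, swr_target and mirror_commands are names made up for this example) that prints the commands and only runs them when called with --run; it assumes docker is installed and already logged in to SWR.

```python
import shlex
import subprocess
import sys
from typing import List

# Registry prefix taken from this post; replace it with your own SWR organization.
SWR_PREFIX = "swr.cn-north-7.myhuaweicloud.com/modelarts-idm-auto"

# Source images used in this walkthrough.
IMAGES = [
    "kuberay/operator:v0.5.0",
    "kuberay/apiserver:v0.5.0",
    "rayproject/ray:2.4.0",
]


def swr_target(image: str, prefix: str = SWR_PREFIX) -> str:
    """Map a Docker Hub image to its SWR mirror, keeping repository path and tag."""
    return f"{prefix}/{image}"


def mirror_commands(images: List[str], prefix: str = SWR_PREFIX) -> List[List[str]]:
    """Build docker pull/tag/push commands (as argv lists) for each image."""
    cmds = []
    for image in images:
        target = swr_target(image, prefix)
        cmds.append(["docker", "pull", image])
        cmds.append(["docker", "tag", image, target])
        cmds.append(["docker", "push", target])
    return cmds


if __name__ == "__main__":
    # Dry run by default: only print the commands. Pass --run to execute them.
    run = "--run" in sys.argv[1:]
    for argv in mirror_commands(IMAGES):
        print(shlex.join(argv))
        if run:
            subprocess.run(argv, check=True)
```

Printing first makes it easy to compare the generated names with the manual commands before anything is pushed.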

4. Modify the Helm charts

Edit kuberay-operator/values.yaml so that the image points to the SWR copy.
Edit kuberay-apiserver/values.yaml in the same way.
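Instead of editing values.yaml by hand, the same override can be passed on the command line with helm install --set. This sketch only builds the command; it assumes the charts expose image.repository and image.tag, which should be checked against the extracted values.yaml before use, and helm_install_args is a hypothetical name.

```python
import shlex
from typing import List

SWR_PREFIX = "swr.cn-north-7.myhuaweicloud.com/modelarts-idm-auto"


def helm_install_args(release: str, chart_dir: str, image_repo: str, tag: str) -> List[str]:
    """Build a helm install command that overrides the chart image via --set.

    Assumption: the chart's values.yaml has image.repository and image.tag keys.
    """
    return [
        "helm", "install", release, chart_dir,
        "--set", f"image.repository={image_repo}",
        "--set", f"image.tag={tag}",
    ]


if __name__ == "__main__":
    for name, repo in [
        ("kuberay-operator", f"{SWR_PREFIX}/kuberay/operator"),
        ("kuberay-apiserver", f"{SWR_PREFIX}/kuberay/apiserver"),
    ]:
        # Release name and local chart directory share the same name, as in this post.
        print(shlex.join(helm_install_args(name, name, repo, "v0.5.0")))
```

Keeping the override on the command line leaves the extracted chart untouched, at the cost of having to repeat the flags on every helm upgrade.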

5. Install the KubeRay components

root@DESKTOP-3813A3M:/mnt/d/all/app/Ray# helm install kuberay-operator  kuberay-operator
NAME: kuberay-operator
LAST DEPLOYED: Wed May 10 20:46:00 2023
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
root@DESKTOP-3813A3M:/mnt/d/all/app/Ray# helm install kuberay-apiserver kuberay-apiserver
NAME: kuberay-apiserver
LAST DEPLOYED: Wed May 10 20:46:46 2023
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None

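Log in to the Huawei Cloud CCE console and check that the KubeRay components are running.

If an image cannot be pulled, configure the default secret for the workload:
Path: CCE => Workloads => kuberay-operator or kuberay-apiserver => Container Settings => Image Access Credential => default-secret => Submit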
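The console steps can also be done with kubectl patch. The sketch builds a strategic merge patch that adds default-secret to a Deployment's pod template; it assumes the Deployments are named kuberay-operator and kuberay-apiserver (the workload names shown in CCE) and live in the default namespace. The helper names are hypothetical.

```python
import json
import shlex
from typing import List


def pull_secret_patch(secret_name: str) -> str:
    """Strategic merge patch adding an image pull secret to a Deployment's pod template."""
    patch = {"spec": {"template": {"spec": {"imagePullSecrets": [{"name": secret_name}]}}}}
    return json.dumps(patch, separators=(",", ":"))


def patch_command(deployment: str, secret_name: str = "default-secret",
                  namespace: str = "default") -> List[str]:
    """kubectl patch uses a strategic merge patch by default for Deployments."""
    return ["kubectl", "-n", namespace, "patch", "deployment", deployment,
            "-p", pull_secret_patch(secret_name)]


if __name__ == "__main__":
    for dep in ("kuberay-operator", "kuberay-apiserver"):
        print(shlex.join(patch_command(dep)))
```

Changing the pod template triggers a rollout, so the pods are recreated and retry the pull with the secret.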

6. Download the RayJob manifest

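Source: https://github.com/ray-project/kuberay/blob/master/ray-operator/config/samples/ray_v1alpha1_rayjob.yaml
Because master keeps changing, the copy of this file under the release tag that matches the installed operator (v0.5.0) is the safer choice.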

Change 1: edit the runtime environment to drop the pip package installation. runtimeEnv holds base64-encoded JSON; the plain text is:

{
    "pip": [ ],
    "env_vars": {"counter_name": "test_counter"}
}

Base64-encoded:
ewogICAgInBpcCI6IFsgXSwKICAgICJlbnZfdmFycyI6IHsiY291bnRlcl9uYW1lIjogInRlc3RfY291bnRlciJ9Cn0K
Put this string in the runtimeEnv field of the RayJob spec.
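The encoding step can be checked in Python. Encoding exactly the JSON text above, with its four-space indentation and trailing newline, gives the string shown; on Linux, plain base64 wraps its output at 76 columns, so base64 -w0 is needed there to get a single line. The helper names below are illustrative.

```python
import base64
import json

# Plain-text runtime environment from this post (pip list emptied, one env var kept).
RUNTIME_ENV_TEXT = '{\n    "pip": [ ],\n    "env_vars": {"counter_name": "test_counter"}\n}\n'


def encode_runtime_env(text: str) -> str:
    """Base64-encode runtime_env JSON text for the RayJob runtimeEnv field."""
    # Parse first so a JSON typo fails here instead of when the job starts.
    json.loads(text)
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_runtime_env(encoded: str) -> dict:
    """Decode a runtimeEnv value back to a dict, e.g. to inspect an existing manifest."""
    return json.loads(base64.b64decode(encoded))


if __name__ == "__main__":
    print(encode_runtime_env(RUNTIME_ENV_TEXT))
```

decode_runtime_env is handy for reading what a sample manifest actually installs before removing it.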

Change 2: add serviceType: "ClusterIP" to headGroupSpec.

Change 3: add imagePullSecrets to the pod templates so the Ray pods can pull images from SWR.

Change 4: change the image addresses to the SWR copy of rayproject/ray:2.4.0 pushed in step 3.
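The four changes can also be applied to the parsed manifest instead of by hand. This is a sketch under assumptions: it takes the RayJob document as a dict (for example loaded with PyYAML's yaml.safe_load_all, skipping the ConfigMap that shares the file), it assumes the v1alpha1 sample layout (spec.rayClusterSpec with headGroupSpec and workerGroupSpecs, each holding a pod template), and apply_post_changes is a hypothetical name.

```python
import copy
from typing import Any, Dict, Iterator

SWR_PREFIX = "swr.cn-north-7.myhuaweicloud.com/modelarts-idm-auto"


def _pod_specs(ray_cluster_spec: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield the pod spec of the head group and of every worker group."""
    yield ray_cluster_spec["headGroupSpec"]["template"]["spec"]
    for group in ray_cluster_spec.get("workerGroupSpecs", []):
        yield group["template"]["spec"]


def apply_post_changes(rayjob: Dict[str, Any], runtime_env_b64: str,
                       pull_secret: str = "default-secret",
                       registry_prefix: str = SWR_PREFIX) -> Dict[str, Any]:
    """Return a copy of the RayJob with changes 1 to 4 applied."""
    job = copy.deepcopy(rayjob)
    spec = job["spec"]
    spec["runtimeEnv"] = runtime_env_b64                           # change 1
    cluster = spec["rayClusterSpec"]
    cluster["headGroupSpec"]["serviceType"] = "ClusterIP"          # change 2
    for pod_spec in _pod_specs(cluster):
        pod_spec["imagePullSecrets"] = [{"name": pull_secret}]     # change 3
        for container in pod_spec.get("containers", []):
            image = container.get("image", "")
            # Prefix only once, so running the function twice is harmless.
            if image and not image.startswith(registry_prefix + "/"):
                container["image"] = f"{registry_prefix}/{image}"  # change 4
    return job
```

kubectl apply -f accepts JSON as well as YAML, so the result of json.dumps can be piped to kubectl apply -f - together with the unchanged ConfigMap.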

7. Start the RayJob

1) Submit the job

After the RayJob is submitted, KubeRay first starts a Ray cluster and then submits the Ray job to it:

root@DESKTOP-3813A3M:/mnt/d/all/app/Ray/rayjob# kubectl apply -f ray_v1alpha1_rayjob.yaml
rayjob.ray.io/rayjob-sample created
configmap/ray-job-code-sample created
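Instead of re-running kubectl describe, a script can poll the RayJob until the Ray job reaches a terminal state. The sketch assumes the v1alpha1 status fields jobStatus and jobDeploymentStatus (verify with kubectl get rayjob rayjob-sample -o yaml) and Ray's terminal job states SUCCEEDED, FAILED and STOPPED; the status getter is passed in so the loop can be exercised without a cluster, and both function names are hypothetical.

```python
import json
import subprocess
import time
from typing import Any, Callable, Dict

# Ray job states after which the job will not change any more.
TERMINAL = {"SUCCEEDED", "FAILED", "STOPPED"}


def kubectl_rayjob_status(name: str, namespace: str = "default") -> Dict[str, Any]:
    """Read .status of a RayJob through kubectl."""
    out = subprocess.run(
        ["kubectl", "-n", namespace, "get", "rayjob", name, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out).get("status", {})


def wait_for_rayjob(get_status: Callable[[], Dict[str, Any]],
                    timeout_s: float = 600, interval_s: float = 5) -> str:
    """Poll until jobStatus is terminal; raise TimeoutError after timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        job_status = status.get("jobStatus", "")
        print(status.get("jobDeploymentStatus", ""), job_status)
        if job_status in TERMINAL:
            return job_status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"RayJob not finished after {timeout_s}s, last status: {status}")
        time.sleep(interval_s)
```

Against the real cluster it would be called as wait_for_rayjob(lambda: kubectl_rayjob_status("rayjob-sample")).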

2) Check the cluster

root@DESKTOP-3813A3M:/mnt/d/all/app/Ray# kubectl get rayclusters -o wide
NAME                             AGE
ray2-kuberay                     49d
rayjob-sample-raycluster-rd84d   26m

3) Inspect the RayJob

 kubectl describe rayjobs rayjob-sample


4) Open the Ray dashboard

Find the name or address of the head service:

root@DESKTOP-3813A3M:/mnt/d/all/app/Ray# kubectl get svc
NAME                                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                         AGE
kuberay-apiserver-service                 NodePort    10.247.162.162   <none>        8888:31888/TCP,8887:31887/TCP                   36m
kuberay-operator                          ClusterIP   10.247.236.242   <none>        8080/TCP                                        37m
kubernetes                                ClusterIP   10.247.0.1       <none>        443/TCP                                         56d
notebook-proxy                            NodePort    10.247.154.164   <none>        80:30528/TCP                                    49d
ray2-kuberay-head-svc                     ClusterIP   10.247.206.89    <none>        10001/TCP,6379/TCP,8265/TCP,8080/TCP,8000/TCP   49d
rayjob-sample-raycluster-rd84d-head-svc   ClusterIP   10.247.255.69    <none>        8080/TCP,6379/TCP,8265/TCP,10001/TCP,8000/TCP   6s
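The head service name carries the random suffix of the generated cluster (rd84d here), so it changes on every run. A small sketch with hypothetical helper names picks it out of the kubectl get svc output above and builds the matching port-forward command; it relies only on the cluster-name-plus-head-svc naming visible in the listing.

```python
from typing import List, Optional


def find_head_service(svc_table: str, cluster_prefix: str) -> Optional[str]:
    """Return the first service whose name starts with cluster_prefix and ends with -head-svc."""
    for line in svc_table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields and fields[0].startswith(cluster_prefix) and fields[0].endswith("-head-svc"):
            return fields[0]
    return None


def port_forward_args(service: str, port: int = 8265) -> List[str]:
    """kubectl port-forward command for the Ray dashboard port."""
    return ["kubectl", "port-forward", f"service/{service}", f"{port}:{port}"]
```

With the listing above, find_head_service(output, "rayjob-sample-raycluster") skips the older ray2-kuberay-head-svc and returns the service of the job's own cluster.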

Forward the port so the dashboard can be reached locally:

root@DESKTOP-3813A3M:/mnt/d/all/app/Ray# kubectl port-forward service/rayjob-sample-raycluster-rd84d-head-svc 8265:8265
Forwarding from 127.0.0.1:8265 -> 8265
Forwarding from [::1]:8265 -> 8265
Handling connection for 8265
Handling connection for 8265
Handling connection for 8265

Then open http://127.0.0.1:8265 in a browser.
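Besides the browser, the forwarded port also serves Ray's Jobs REST API. This sketch lists jobs through GET /api/jobs/ using only the standard library; the endpoint and the submission_id, status and entrypoint fields are taken from Ray's REST API documentation for Ray 2.x as assumed here, so check them against the installed Ray version. list_jobs and summarize are hypothetical names.

```python
import json
import urllib.request
from typing import Any, Dict, List

DASHBOARD = "http://127.0.0.1:8265"  # address of the port-forward above


def list_jobs(base_url: str = DASHBOARD, timeout: float = 10) -> List[Dict[str, Any]]:
    """Fetch the job list from the Ray dashboard (assumed endpoint: GET /api/jobs/)."""
    with urllib.request.urlopen(f"{base_url}/api/jobs/", timeout=timeout) as resp:
        return json.load(resp)


def summarize(jobs: List[Dict[str, Any]]) -> List[str]:
    """One line per job: submission id, status, entrypoint."""
    return [f"{j.get('submission_id')}  {j.get('status')}  {j.get('entrypoint')}" for j in jobs]
```

While the port-forward is running, print("\n".join(summarize(list_jobs()))) gives a text view of what the dashboard's job page shows.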

The dashboard shows the cluster, the job, and the job's run logs.

5) Delete the RayJob (optional):

root@DESKTOP-3813A3M:/mnt/d/all/app/Ray/rayjob# kubectl delete -f ray_v1alpha1_rayjob.yaml
rayjob.ray.io "rayjob-sample" deleted
configmap "ray-job-code-sample" deleted

8. Notes

1) A Ray cluster really is started for the job (rayjob-sample-raycluster-rd84d in the listing above).

2) Tear the cluster down automatically when the job finishes

Set this in the RayJob spec:

  shutdownAfterJobFinishes: true


3) Timing

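Judging from the events in kubectl describe rayjobs rayjob-sample, starting the Ray cluster took about 10 seconds, the Ray job ran for about 9 seconds, and deleting the cluster took about 6 seconds.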

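These timings can be computed rather than read off by eye. A minimal sketch, assuming the events are fetched with kubectl get events --field-selector involvedObject.name=rayjob-sample -o json and carry firstTimestamp (events that only set eventTime would need extra handling): it sorts the items and reports the gap before each event. Which event reasons KubeRay emits depends on the operator version, and event_gaps is a hypothetical name.

```python
from datetime import datetime
from typing import Any, Dict, List, Tuple


def _ts(value: str) -> datetime:
    # Event timestamps are RFC 3339 in UTC with second precision, e.g. 2023-05-10T12:46:00Z.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def event_gaps(events: List[Dict[str, Any]]) -> List[Tuple[str, float]]:
    """Seconds between consecutive events, labelled with the later event's reason.

    `events` is the "items" list from kubectl get events -o json.
    """
    ordered = sorted(events, key=lambda e: _ts(e["firstTimestamp"]))
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        delta = (_ts(cur["firstTimestamp"]) - _ts(prev["firstTimestamp"])).total_seconds()
        gaps.append((cur.get("reason", ""), delta))
    return gaps
```

Timestamps only have one-second resolution, so the results are as coarse as the figures above.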
