From Zero to One: Quickly Building a Local Kubeflow AI/ML Platform on K3s

Background

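Kubeflow is an open-source, Kubernetes-native framework for developing, managing, and running machine learning workloads. It supports many popular machine learning frameworks, such as PyTorch and TensorFlow. This article describes how to set up a local Kubeflow machine learning platform on a Mac.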

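Note: this article uses the deployKF distribution as the installation target. The local environment is suitable only for development and testing and must not be used in production!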

For other Kubeflow distributions, see the official documentation: https://www.kubeflow.org/docs/started/installing-kubeflow/

Base environment:

OS: macOS 13.1 (amd64)
Docker Desktop: v4.15.0

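Although K3s itself needs few resources, the Kubeflow suite contains many components, so Docker's resource allocation must be raised to avoid Pods getting stuck in Pending during installation.
Recommended Docker resource settings: 8 CPU cores, 10 GB memory, 40 GB disk.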
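
Before creating the cluster, it can help to confirm that Docker actually sees the resources suggested above. The following is a minimal sketch, not part of deployKF: `check_docker_resources` is a hypothetical helper name, and the defaults of 8 CPUs and 10 GiB simply mirror the recommendation in this article. Disk size is not checked.

```shell
# Hypothetical helper: compare Docker's reported CPUs and memory
# against the minimums suggested in this article.
# Args: <cpus> <mem_bytes> [min_cpus] [min_mem_gib]
check_docker_resources() {
  local cpus="$1" mem_bytes="$2"
  local min_cpus="${3:-8}" min_mem_gib="${4:-10}"
  # Round to the nearest GiB: Docker may report slightly less
  # memory than the value configured in Docker Desktop.
  local mem_gib=$(( (mem_bytes + 536870912) / 1073741824 ))
  if [ "$cpus" -lt "$min_cpus" ]; then
    echo "CPU too low: ${cpus} < ${min_cpus}"
    return 1
  fi
  if [ "$mem_gib" -lt "$min_mem_gib" ]; then
    echo "Memory too low: ~${mem_gib} GiB < ${min_mem_gib} GiB"
    return 1
  fi
  echo "OK: ${cpus} CPUs, ~${mem_gib} GiB memory"
}

# Usage (requires a running Docker daemon):
#   check_docker_resources \
#     "$(docker info --format '{{.NCPU}}')" \
#     "$(docker info --format '{{.MemTotal}}')"
```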

Installation and deployment steps

1. Install the required CLI tools

brew install bash argocd jq k3d kubectl kustomize

2. Create the Kubernetes cluster

To keep resource consumption as low as possible, we run the local cluster on K3s (k3d runs K3s inside Docker):

k3d cluster create "kubeflow" --image "rancher/k3s:v1.27.10-k3s2"

Check whether the cluster is ready with the following command:

kubectl get -A pods

A healthy cluster produces output similar to this:

NAMESPACE     NAME                                     READY   STATUS      RESTARTS   AGE
kube-system   local-path-provisioner-957fdf8bc-cj9l5   1/1     Running     0          2m30s
kube-system   coredns-77ccd57875-xzzz4                 1/1     Running     0          2m30s
kube-system   metrics-server-648b5df564-gwnhq          1/1     Running     0          2m30s
kube-system   helm-install-traefik-crd-49l4k           0/1     Completed   0          2m31s
kube-system   helm-install-traefik-xrjtd               0/1     Completed   2          2m31s
kube-system   svclb-traefik-a79cf0ef-lj4td             2/2     Running     0          89s
kube-system   traefik-768bdcdcdd-mr8z8                 1/1     Running     0          89s
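
Reading the pod table by eye gets harder as more components are installed. As a rough sketch, the filter below prints only the pods that are neither fully ready nor Completed. `not_ready_pods` is a hypothetical helper name; it assumes the default column order shown above (NAMESPACE, NAME, READY, STATUS) and input without a header row.

```shell
# Hypothetical helper: read `kubectl get pods -A --no-headers` output on stdin
# and print pods that are not fully ready. Completed pods (finished jobs)
# are skipped because 0/1 is their expected final state.
not_ready_pods() {
  awk '$4 != "Completed" {
    split($3, r, "/")                      # READY column, e.g. "1/2"
    if ($4 != "Running" || r[1] != r[2])
      print $1 "/" $2 " (" $3 ", " $4 ")"
  }'
}

# Usage (requires access to the cluster):
#   kubectl get pods -A --no-headers | not_ready_pods
```

Empty output means every pod is either Running with all containers ready, or Completed.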

3. Deploy ArgoCD

ArgoCD is a GitOps continuous delivery tool for Kubernetes; it helps us automate the deployment of Kubeflow.

git clone -b main https://github.com/deployKF/deployKF.git
cd deployKF/argocd-plugin
chmod +x ./install_argocd.sh
bash ./install_argocd.sh

Check whether ArgoCD is ready with the following command:

kubectl get pod -n argocd

A healthy installation produces output similar to this:

NAME                                                READY   STATUS    RESTARTS   AGE
argocd-redis-69f8795dbd-7v4nn                       1/1     Running   0          106s
argocd-applicationset-controller-7b9c4dfb77-7gsf2   1/1     Running   0          106s
argocd-notifications-controller-756764ddd5-jw92c    1/1     Running   0          106s
argocd-server-86f64667bc-7nt7d                      1/1     Running   0          105s
argocd-application-controller-0                     1/1     Running   0          105s
argocd-dex-server-9b5c6dccd-2p779                   1/1     Running   0          106s
argocd-repo-server-5b55578f7c-sfzf4                 2/2     Running   0          105s

4. Install the Kubeflow suite

Prepare the following file: deploykf-app-of-apps.yaml

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: deploykf-app-of-apps
  namespace: argocd
  labels:
    app.kubernetes.io/name: deploykf-app-of-apps
    app.kubernetes.io/part-of: deploykf
spec:
  project: "default"
  source:
    ## source git repo configuration
    ##  - we use the 'deploykf/deploykf' repo so we can read its 'sample-values.yaml'
    ##    file, but you may use any repo (even one with no files)
    ##
    repoURL: "https://github.com/deployKF/deployKF.git"
    targetRevision: "v0.1.4"
    path: "."

    ## plugin configuration
    ##
    plugin:
      name: "deploykf"
      parameters:

        ## the deployKF generator version
        ##  - available versions: https://github.com/deployKF/deployKF/releases
        ##
        - name: "source_version"
          string: "0.1.4"

        ## paths to values files within the `repoURL` repository
        ##  - the values in these files are merged, with later files taking precedence
        ##  - we strongly recommend using 'sample-values.yaml' as the base of your values
        ##    so you can easily upgrade to newer versions of deployKF
        ##
        - name: "values_files"
          array:
            - "./sample-values.yaml"

        ## a string containing the contents of a values file
        ##  - this parameter allows defining values without needing to create a file in the repo
        ##  - these values are merged with higher precedence than those defined in `values_files`
        ##
        - name: "values"
          string: |
            ##
            ## This demonstrates how you might structure overrides for the 'sample-values.yaml' file.
            ## For a more comprehensive example, see the 'sample-values-overrides.yaml' in the main repo.
            ##
            ## Notes:
            ##  - YAML maps are RECURSIVELY merged across values files
            ##  - YAML lists are REPLACED in their entirety across values files
            ##  - Do NOT include empty/null sections, as this will remove ALL values from that section.
            ##    To include a section without overriding any values, set it to an empty map: `{}`
            ##

            ## --------------------------------------------------------------------------------
            ##                                      argocd
            ## --------------------------------------------------------------------------------
            argocd:
              namespace: argocd
              project: default

            ## --------------------------------------------------------------------------------
            ##                                    kubernetes
            ## --------------------------------------------------------------------------------
            kubernetes:
              {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!

            ## --------------------------------------------------------------------------------
            ##                              deploykf-dependencies
            ## --------------------------------------------------------------------------------
            deploykf_dependencies:

              ## --------------------------------------
              ##             cert-manager
              ## --------------------------------------
              cert_manager:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!

              ## --------------------------------------
              ##                 istio
              ## --------------------------------------
              istio:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!

              ## --------------------------------------
              ##                kyverno
              ## --------------------------------------
              kyverno:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!

            ## --------------------------------------------------------------------------------
            ##                                  deploykf-core
            ## --------------------------------------------------------------------------------
            deploykf_core:

              ## --------------------------------------
              ##             deploykf-auth
              ## --------------------------------------
              deploykf_auth:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!

              ## --------------------------------------
              ##        deploykf-istio-gateway
              ## --------------------------------------
              deploykf_istio_gateway:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!

              ## --------------------------------------
              ##      deploykf-profiles-generator
              ## --------------------------------------
              deploykf_profiles_generator:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!

            ## --------------------------------------------------------------------------------
            ##                                   deploykf-opt
            ## --------------------------------------------------------------------------------
            deploykf_opt:

              ## --------------------------------------
              ##            deploykf-minio
              ## --------------------------------------
              deploykf_minio:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!

              ## --------------------------------------
              ##            deploykf-mysql
              ## --------------------------------------
              deploykf_mysql:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!

            ## --------------------------------------------------------------------------------
            ##                                  kubeflow-tools
            ## --------------------------------------------------------------------------------
            kubeflow_tools:

              ## --------------------------------------
              ##                 katib
              ## --------------------------------------
              katib:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!

              ## --------------------------------------
              ##               notebooks
              ## --------------------------------------
              notebooks:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!

              ## --------------------------------------
              ##               pipelines
              ## --------------------------------------
              pipelines:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!

  destination:
    server: "https://kubernetes.default.svc"
    namespace: "argocd"

Run the following command to apply the application:

kubectl apply -f ./deploykf-app-of-apps.yaml

Check the ArgoCD status through its UI:

kubectl port-forward --namespace "argocd" svc/argocd-server 8090:https

Open https://localhost:8090/ in a browser. The username is admin; the password can be retrieved with the following command:

echo $(kubectl -n argocd get secret/argocd-initial-admin-secret \
 -o jsonpath="{.data.password}" | base64 -d)
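
The `base64 -d` step is needed because Kubernetes stores Secret values base64-encoded under `.data`. As a small illustration (the encoded string below is made-up sample data, not a real password, and `decode_b64` is a hypothetical helper name):

```shell
# Hypothetical helper: decode a base64 value read from stdin.
# The long option --decode is used here on the assumption that it is
# accepted by both GNU coreutils and the macOS base64 tool.
decode_b64() {
  base64 --decode
}

# Sample data only: "YWRtaW4=" is the base64 encoding of "admin".
printf 'YWRtaW4=' | decode_b64
echo
```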

Because the applications depend on one another, use the following script to run the Sync operations in order:

git clone -b main https://github.com/deployKF/deployKF.git
cd deployKF/scripts
chmod +x ./sync_argocd_apps.sh
bash ./sync_argocd_apps.sh

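The script is idempotent: if it fails, rerun it until the deployment succeeds. After a successful deployment, the list of running Pods looks similar to this: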

NAMESPACE                 NAME                                                 READY   STATUS    RESTARTS       AGE
argocd                    argocd-redis-69f8795dbd-x5wtv                        1/1     Running   5 (17m ago)    105m
argocd                    argocd-server-86f64667bc-zfm7m                       1/1     Running   4 (17m ago)    73m
argocd                    argocd-repo-server-5b55578f7c-x26zz                  2/2     Running   10 (17m ago)   91m
argocd                    argocd-notifications-controller-756764ddd5-2fqbr     1/1     Running   5 (17m ago)    89m
argocd                    argocd-dex-server-9b5c6dccd-bl86m                    1/1     Running   5 (17m ago)    91m
argocd                    argocd-application-controller-0                      1/1     Running   5 (17m ago)    91m
argocd                    argocd-applicationset-controller-7b9c4dfb77-hph2r    1/1     Running   5 (17m ago)    105m
cert-manager              cert-manager-c688c56f-w4jts                          1/1     Running   5 (17m ago)    109m
cert-manager              trust-manager-78766fd9bd-zd5zf                       1/1     Running   5 (17m ago)    90m
cert-manager              cert-manager-webhook-d45447457-q6cf8                 1/1     Running   6 (17m ago)    109m
cert-manager              cert-manager-cainjector-59d694bcc7-mrcvg             1/1     Running   6 (17m ago)    109m
deploykf-auth             oauth2-proxy-5fd9888b79-tpnrt                        2/2     Running   11 (16m ago)   73m
deploykf-auth             dex-68c8bf56b9-78d5g                                 2/2     Running   8 (17m ago)    73m
deploykf-dashboard        profile-controller-5575767c76-vshp2                  2/2     Running   8 (17m ago)    73m
deploykf-dashboard        kfam-api-75b64c9645-sjfcq                            2/2     Running   10 (17m ago)   98m
deploykf-dashboard        central-dashboard-6b5d9574dc-fmlt4                   2/2     Running   10 (17m ago)   98m
deploykf-istio-gateway    deploykf-gateway-6ddf8947cc-qz55g                    1/1     Running   5 (17m ago)    98m
deploykf-minio            deploykf-minio-568b877668-w2wct                      2/2     Running   5 (17m ago)    52m
deploykf-mysql            deploykf-mysql-0                                     1/1     Running   5 (17m ago)    109m
istio-system              istiod-7b9b6df595-jbztw                              1/1     Running   5 (17m ago)    91m
kube-system               svclb-deploykf-gateway-7f7cba3a-kkskn                3/3     Running   15 (17m ago)   100m
kube-system               metrics-server-648b5df564-gwnhq                      1/1     Running   9 (17m ago)    5h43m
kube-system               local-path-provisioner-957fdf8bc-cj9l5               1/1     Running   7 (17m ago)    5h43m
kube-system               coredns-77ccd57875-xzzz4                             1/1     Running   7 (17m ago)    5h43m
kube-system               traefik-768bdcdcdd-mr8z8                             1/1     Running   7 (17m ago)    5h42m
kube-system               svclb-traefik-a79cf0ef-6ksjm                         2/2     Running   10 (17m ago)   100m
kubeflow                  katib-controller-75858c4ddf-hwvkx                    1/1     Running   8 (17m ago)    95m
kubeflow                  ml-pipeline-ui-68b7f6586d-qtjp5                      2/2     Running   15 (17m ago)   94m
kubeflow                  ml-pipeline-persistenceagent-68bbd65f98-tsnqn        2/2     Running   10 (17m ago)   94m
kubeflow                  katib-ui-d4df8bdb6-2x75p                             2/2     Running   10 (17m ago)   95m
kubeflow                  ml-pipeline-6445d9fb77-dxgv4                         2/2     Running   24 (16m ago)   94m
kubeflow                  admission-webhook-deployment-789dc56fbf-z7cj8        1/1     Running   5 (17m ago)    94m
kubeflow                  metadata-writer-6f95b9588c-fmx4s                     2/2     Running   8 (17m ago)    73m
kubeflow                  notebook-controller-deployment-649cf9b976-vnvwd      2/2     Running   10 (17m ago)   95m
kubeflow                  training-operator-7cf5c66858-jf5sr                   1/1     Running   3 (17m ago)    43m
kubeflow                  tensorboards-web-app-deployment-778466f5f6-dmrks     2/2     Running   2 (17m ago)    43m
kubeflow                  tensorboard-controller-deployment-644f57dd7c-zlxnw   3/3     Running   24 (17m ago)   92m
kubeflow                  ml-pipeline-scheduledworkflow-578475988-kwz27        2/2     Running   10 (17m ago)   94m
kubeflow                  volumes-web-app-deployment-588d46bb75-95g6b          2/2     Running   2 (17m ago)    42m
kubeflow                  ml-pipeline-viewer-crd-6857ccc85c-zl895              2/2     Running   10 (17m ago)   94m
kubeflow                  metadata-grpc-deployment-566d54d578-wwj9n            2/2     Running   23 (16m ago)   94m
kubeflow                  ml-pipeline-visualizationserver-7b45b7fd56-s4pxh     2/2     Running   15 (17m ago)   94m
kubeflow                  cache-server-66d7586749-prmkq                        2/2     Running   10 (17m ago)   94m
kubeflow                  jupyter-web-app-deployment-9c8c779c-hcqvr            2/2     Running   15 (17m ago)   91m
kubeflow                  katib-db-manager-6998f5bdd8-lrs77                    1/1     Running   5 (17m ago)    95m
kubeflow                  metadata-envoy-deployment-b48db5966-542nh            1/1     Running   5 (17m ago)    94m
kubeflow-argo-workflows   argo-workflow-controller-79fc5c6895-2g26t            2/2     Running   10 (17m ago)   98m
kubeflow-argo-workflows   argo-server-6d97fb7649-lsfdw                         2/2     Running   5 (16m ago)    73m
kyverno                   kyverno-cleanup-controller-6cb4d5848-hh8nm           1/1     Running   5 (17m ago)    109m
kyverno                   kyverno-admission-controller-964c74c7d-frknb         1/1     Running   5 (17m ago)    109m
kyverno                   kyverno-background-controller-796f77c79f-nwhrs       1/1     Running   5 (17m ago)    109m
kyverno                   kyverno-reports-controller-6d6d98fc96-z7qjv          1/1     Running   5 (17m ago)    109m
kyverno                   kyverno-admission-controller-964c74c7d-hgtc2         1/1     Running   4 (17m ago)    109m
kyverno                   kyverno-admission-controller-964c74c7d-x744h         1/1     Running   5 (17m ago)    109m
team-1                    ml-pipeline-visualizationserver-677c86b748-nbrr5     2/2     Running   2 (17m ago)    73m
team-1                    ml-pipeline-ui-artifact-7749b4f5f6-ld7kl             2/2     Running   10 (17m ago)   94m
team-1-prod               ml-pipeline-visualizationserver-677c86b748-hqwsh     2/2     Running   2 (17m ago)    73m
team-1-prod               ml-pipeline-ui-artifact-7749b4f5f6-hl6gk             2/2     Running   10 (17m ago)   94m
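
Since the sync script is idempotent and may need several runs, the reruns can be wrapped in a small retry loop. This is only a sketch: `retry` is a hypothetical helper, and the attempt count and delay in the usage line are arbitrary values, not recommendations from deployKF.

```shell
# Hypothetical helper: run a command until it succeeds,
# up to <max_attempts> times, sleeping <delay> seconds between attempts.
# Args: <max_attempts> <delay_seconds> <command> [args...]
retry() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
    attempt=$(( attempt + 1 ))
    sleep "$delay"
  done
}

# Usage (from the deployKF/scripts directory):
#   retry 5 30 bash ./sync_argocd_apps.sh
```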

The ArgoCD UI after synchronization (20 applications synced):

5. Access the dashboard

Set up port forwarding:

kubectl port-forward \
  --namespace "deploykf-istio-gateway" \
  svc/deploykf-gateway 8080:http 8443:https

Because the Istio Gateway routes requests to target services based on the Host header, the local /etc/hosts file must be configured. Append the following lines:

127.0.0.1 deploykf.example.com
127.0.0.1 argo-server.deploykf.example.com
127.0.0.1 minio-api.deploykf.example.com
127.0.0.1 minio-console.deploykf.example.com
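
Appending these lines by hand more than once leaves duplicates in the file. One way to avoid that, sketched below, is to add an entry only when it is missing. `add_host_entry` is a hypothetical helper; writing to the real /etc/hosts requires root privileges, so the usage example works on a copy first.

```shell
# Hypothetical helper: append "<ip> <hostname>" to a hosts file
# only if that IP/hostname pair is not already present.
# Args: <hosts_file> <ip> <hostname>
add_host_entry() {
  local hosts_file="$1" ip="$2" name="$3"
  # Look for a line whose first field is the IP and which lists the hostname.
  if awk -v ip="$ip" -v name="$name" '
      $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
      END { exit !found }' "$hosts_file"; then
    return 0
  fi
  printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}

# Usage: edit a copy, review it, then install it with sudo.
#   cp /etc/hosts ./hosts.new
#   for h in deploykf.example.com argo-server.deploykf.example.com \
#            minio-api.deploykf.example.com minio-console.deploykf.example.com; do
#     add_host_entry ./hosts.new 127.0.0.1 "$h"
#   done
#   sudo cp ./hosts.new /etc/hosts
```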

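Open https://deploykf.example.com:8443/ in a browser and log in with one of the following accounts:

Administrator: username admin@example.com, password admin
User 1: username user1@example.com, password user1
User 2: username user2@example.com, password user2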


6. Run Jupyter


More features are still being explored…

References

https://www.deploykf.org/guides/local-quickstart/
