1. Software Introduction
Program and source code downloads are provided at the end of this article.
Kaito is an operator that automates AI/ML model inference or tuning workloads in a Kubernetes cluster. The target models are popular open-source large models such as falcon and phi-3.
2. Key differentiations compared to mainstream model deployment methods built on virtual machine infrastructures
- Manage large model files using container images. An OpenAI-compatible server is provided to perform inference calls.
- Provide preset configurations to avoid adjusting workload parameters based on GPU hardware.
- Provide support for popular open-sourced inference runtimes: vLLM and transformers.
- Auto-provision GPU nodes based on model requirements.
- Host large model images in the public Microsoft Container Registry (MCR) if the license allows.
Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
3. Architecture
Kaito follows the classic Kubernetes Custom Resource Definition (CRD)/controller design pattern. Users manage a workspace custom resource that describes the GPU requirements and the inference or tuning specification. The Kaito controllers automate the deployment by reconciling the workspace custom resource.
The architecture overview figure in the original repository shows Kaito's major components:
- Workspace controller: It reconciles the workspace custom resource, creates machine (explained below) custom resources to trigger node auto provisioning, and creates the inference or tuning workload (deployment, statefulset, or job) based on the model preset configurations.
- Node provisioner controller: The controller's name is gpu-provisioner in the gpu-provisioner helm chart. It uses the machine CRD that originated from Karpenter to interact with the workspace controller. It integrates with Azure Resource Manager REST APIs to add new GPU nodes to the AKS or AKS Arc cluster.
Note: The gpu-provisioner is an open sourced component. It can be replaced by other controllers if they support Karpenter-core APIs.
4. Installation
Before you begin, ensure the following tools are installed:
- Azure CLI to provision Azure resources
- Helm to install this operator
- kubectl to view Kubernetes resources
- git to clone this repo locally
- yq to process YAML files
- jq to process JSON files
Important Note: Ensure you use a release branch of the repository for a stable version of the installation.
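For example, cloning and switching to a release branch looks like this (the branch name below is illustrative; list the repository's branches to find the actual release names):
# Clone the repo and check out a release branch (branch name is illustrative)
git clone https://github.com/kaito-project/kaito.git
cd kaito
git checkout release-0.4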
If you do not already have an AKS cluster, run the following Azure CLI commands to create one:
export RESOURCE_GROUP="myResourceGroup"
export MY_CLUSTER="myCluster"
export LOCATION="eastus"
az group create --name $RESOURCE_GROUP --location $LOCATION
az aks create --resource-group $RESOURCE_GROUP --name $MY_CLUSTER --enable-oidc-issuer --enable-workload-identity --enable-managed-identity --generate-ssh-keys
Connect to the AKS cluster.
az aks get-credentials --resource-group $RESOURCE_GROUP --name $MY_CLUSTER
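To confirm that the credentials work, list the cluster nodes:
kubectl get nodes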
If you do not have kubectl installed locally, you can install it using the following Azure CLI command.
az aks install-cli
Install workspace controller
Be sure you've cloned this repo and connected to your AKS cluster before attempting to install the Helm charts.
Install the Workspace controller.
export KAITO_WORKSPACE_VERSION=0.4.5
helm install kaito-workspace --set clusterName=$MY_CLUSTER --wait \
https://github.com/kaito-project/kaito/raw/gh-pages/charts/kaito/workspace-$KAITO_WORKSPACE_VERSION.tgz --namespace kaito-workspace --create-namespace
Note that if you have installed another node provisioning controller that supports Karpenter-core APIs, the following steps for installing gpu-provisioner can be skipped.
Install gpu-provisioner controller
Enable Workload Identity and OIDC Issuer features
The gpu-provisioner controller requires the workload identity feature to acquire the access token to the AKS cluster.
Run the following commands only if your AKS cluster does not already have the Workload Identity and OIDC issuer features enabled.
export RESOURCE_GROUP="myResourceGroup"
export MY_CLUSTER="myCluster"
az aks update -g $RESOURCE_GROUP -n $MY_CLUSTER --enable-oidc-issuer --enable-workload-identity --enable-managed-identity
Create an identity and assign permissions
The identity kaitoprovisioner is created for the gpu-provisioner controller. It is assigned the Contributor role for the managed cluster resource to allow changing $MY_CLUSTER (e.g., provisioning new nodes in it).
export SUBSCRIPTION=$(az account show --query id -o tsv)
export IDENTITY_NAME="kaitoprovisioner"
az identity create --name $IDENTITY_NAME -g $RESOURCE_GROUP
export IDENTITY_PRINCIPAL_ID=$(az identity show --name $IDENTITY_NAME -g $RESOURCE_GROUP --subscription $SUBSCRIPTION --query 'principalId' -o tsv)
az role assignment create --assignee $IDENTITY_PRINCIPAL_ID --scope /subscriptions/$SUBSCRIPTION/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.ContainerService/managedClusters/$MY_CLUSTER --role "Contributor"
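To double-check that the role assignment took effect, list the assignments for the identity:
az role assignment list --assignee $IDENTITY_PRINCIPAL_ID -o table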
Install helm charts
Important Note: For kaito 0.4.2 and above, please use gpu-provisioner 0.3.2 or higher. For versions below kaito 0.4.2, please use gpu-provisioner 0.2.1.
Install the Node provisioner controller.
# get additional values for helm chart install
export GPU_PROVISIONER_VERSION=0.3.3
curl -sO https://raw.githubusercontent.com/Azure/gpu-provisioner/main/hack/deploy/configure-helm-values.sh
chmod +x ./configure-helm-values.sh && ./configure-helm-values.sh $MY_CLUSTER $RESOURCE_GROUP $IDENTITY_NAME
helm install gpu-provisioner --values gpu-provisioner-values.yaml --set settings.azure.clusterName=$MY_CLUSTER --wait \
https://github.com/Azure/gpu-provisioner/raw/gh-pages/charts/gpu-provisioner-$GPU_PROVISIONER_VERSION.tgz --namespace gpu-provisioner --create-namespace
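If the installation fails, it can help to inspect the values file generated by configure-helm-values.sh (the same file passed to helm above):
cat gpu-provisioner-values.yaml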
Create the federated credential
The federated identity credential between the managed identity kaitoprovisioner and the service account used by the gpu-provisioner controller is created.
export AKS_OIDC_ISSUER=$(az aks show -n $MY_CLUSTER -g $RESOURCE_GROUP --subscription $SUBSCRIPTION --query "oidcIssuerProfile.issuerUrl" -o tsv)
az identity federated-credential create --name kaito-federatedcredential --identity-name $IDENTITY_NAME -g $RESOURCE_GROUP --issuer $AKS_OIDC_ISSUER --subject system:serviceaccount:"gpu-provisioner:gpu-provisioner" --audience api://AzureADTokenExchange --subscription $SUBSCRIPTION
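You can verify that the credential was created by listing the federated credentials on the identity:
az identity federated-credential list --identity-name $IDENTITY_NAME -g $RESOURCE_GROUP -o table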
Then the gpu-provisioner can access the managed cluster using a trust token with the same permissions as the kaitoprovisioner identity. Note that before finishing this step, the gpu-provisioner controller pod will constantly fail with the following message in the log:
panic: Configure azure client fails. Please ensure federatedcredential has been created for identity XXXX.
The pod will reach running state once the federated credential is created.
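You can watch the pod until it recovers:
kubectl get pods -n gpu-provisioner -w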
Verify installation
You can run the following commands to verify that the installation of the controllers was successful.
Check status of the Helm chart installations.
helm list -n kaito-workspace
helm list -n gpu-provisioner
Check status of the workspace.
kubectl describe deploy kaito-workspace -n kaito-workspace
Check status of the gpu-provisioner.
kubectl describe deploy gpu-provisioner -n gpu-provisioner
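You can also confirm that the CRDs used by the two controllers are registered. The grep pattern below is deliberately loose, since the exact CRD names depend on the chart versions:
kubectl get crd | grep -Ei 'kaito|machine'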
Troubleshooting
If you see that the gpu-provisioner deployment is not running after some time, it's possible that some values are incorrect in your values override file (gpu-provisioner-values.yaml).
Run the following command to check the gpu-provisioner pod logs for additional details.
kubectl logs --selector=app.kubernetes.io/name=gpu-provisioner -n gpu-provisioner
Clean up
helm uninstall gpu-provisioner -n gpu-provisioner
helm uninstall kaito-workspace -n kaito-workspace
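Optionally, remove the Azure identity created earlier if nothing else uses it:
az identity delete --name $IDENTITY_NAME -g $RESOURCE_GROUP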
5. Quick start
After installing Kaito, you can try the following commands to start a phi-3.5-mini-instruct inference service.
$ cat examples/inference/kaito_workspace_phi_3.5-instruct.yaml
apiVersion: kaito.sh/v1beta1
kind: Workspace
metadata:
  name: workspace-phi-3-5-mini
resource:
  instanceType: "Standard_NC24ads_A100_v4"
  labelSelector:
    matchLabels:
      apps: phi-3-5
inference:
  preset:
    name: phi-3.5-mini-instruct
$ kubectl apply -f examples/inference/kaito_workspace_phi_3.5-instruct.yaml
The workspace status can be tracked by running the following command. When the WORKSPACESUCCEEDED column becomes True, the model has been deployed successfully.
$ kubectl get workspace workspace-phi-3-5-mini
NAME                     INSTANCE                   RESOURCEREADY   INFERENCEREADY   JOBSTARTED   WORKSPACESUCCEEDED   AGE
workspace-phi-3-5-mini   Standard_NC24ads_A100_v4   True            True                          True                 4h15m
Next, one can find the inference service's cluster IP and use a temporary curl pod to test the service endpoint in the cluster.
# find service endpoint
$ kubectl get svc workspace-phi-3-5-mini
NAME                     TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)            AGE
workspace-phi-3-5-mini   ClusterIP   <CLUSTERIP>   <none>        80/TCP,29500/TCP   10m
$ export CLUSTERIP=$(kubectl get svc workspace-phi-3-5-mini -o jsonpath="{.spec.clusterIPs[0]}")
# find available models
$ kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -s http://$CLUSTERIP/v1/models | jq
{
  "object": "list",
  "data": [
    {
      "id": "phi-3.5-mini-instruct",
      "object": "model",
      "created": 1733370094,
      "owned_by": "vllm",
      "root": "/workspace/vllm/weights",
      "parent": null,
      "max_model_len": 16384
    }
  ]
}
# make an inference call using the model id (phi-3.5-mini-instruct) from previous step
$ kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -X POST http://$CLUSTERIP/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "phi-3.5-mini-instruct",
        "prompt": "What is kubernetes?",
        "max_tokens": 7,
        "temperature": 0
      }'
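Because the endpoint is OpenAI-compatible and backed by vLLM, a chat-style call should also work. This is a sketch: the /v1/chat/completions route comes from the vLLM OpenAI-compatible server, not from the Kaito documentation itself.
$ kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -X POST http://$CLUSTERIP/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "phi-3.5-mini-instruct",
        "messages": [{"role": "user", "content": "What is kubernetes?"}],
        "max_tokens": 50
      }'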
Usage
The detailed usage for Kaito supported models can be found HERE. In case users want to deploy their own containerized models, they can provide the pod template in the inference field of the workspace custom resource (please see API definitions for details; a sketch follows below). The controller will create a deployment workload using all provisioned GPU nodes. Note that currently the controller does NOT handle automatic model upgrade. It only creates inference workloads based on the preset configurations if the workloads do not exist.
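A minimal sketch of such a workspace is shown below. The inference.template field and everything inside it are illustrative assumptions; consult the API definitions for the authoritative schema:
apiVersion: kaito.sh/v1beta1
kind: Workspace
metadata:
  name: workspace-custom-model          # hypothetical name
resource:
  instanceType: "Standard_NC24ads_A100_v4"
  labelSelector:
    matchLabels:
      apps: custom-model                # hypothetical label
inference:
  template:                             # user-provided pod template (sketch)
    spec:
      containers:
      - name: model-server              # hypothetical container name
        image: registry.example.com/my-model:latest   # hypothetical image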
The number of the supported models in Kaito is growing! Please check this document to see how to add a new supported model.
Starting with version v0.3.0, Kaito supports model fine-tuning and using fine-tuned adapters in the inference service. Refer to the tuning document and inference document for more information.
6. Software Download
The information in this article comes from the original authors' GitHub repository: GitHub - kaito-project/kaito: Kubernetes AI Toolchain Operator