1. Software Introduction
Program and source code downloads are provided at the end of this article.
Kaito is an operator that automates AI/ML model inference or tuning workloads in a Kubernetes cluster. The target models are popular open-source large models such as falcon and phi-3.
2. Key differentiations compared to mainstream model deployment methods built on virtual machine infrastructures
- Manage large model files using container images. An OpenAI-compatible server is provided to perform inference calls.
- Provide preset configurations to avoid adjusting workload parameters based on GPU hardware.
- Provide support for popular open-sourced inference runtimes: vLLM and transformers.
- Auto-provision GPU nodes based on model requirements.
- Host large model images in the public Microsoft Container Registry (MCR) if the license allows.
Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
3. Architecture
Kaito follows the classic Kubernetes Custom Resource Definition (CRD)/controller design pattern. Users manage a workspace custom resource that describes the GPU requirements and the inference or tuning specification. The Kaito controllers automate the deployment by reconciling the workspace custom resource.
The architecture overview figure in the original repository shows Kaito's major components:
- Workspace controller: It reconciles the workspace custom resource, creates machine (explained below) custom resources to trigger node auto provisioning, and creates the inference or tuning workload (deployment, statefulset, or job) based on the model preset configurations.
- Node provisioner controller: The controller's name is gpu-provisioner in the gpu-provisioner helm chart. It uses the machine CRD that originated from Karpenter to interact with the workspace controller. It integrates with Azure Resource Manager REST APIs to add new GPU nodes to the AKS or AKS Arc cluster.
Note: The gpu-provisioner is an open sourced component. It can be replaced by other controllers if they support Karpenter-core APIs.
4. Installation
Before you begin, ensure the following tools are installed:
- Azure CLI to provision Azure resources
- Helm to install this operator
- kubectl to view Kubernetes resources
- git to clone this repo locally
- yq to process YAML files
- jq to process JSON files
Important Note: Ensure you use a release branch of the repository for a stable version of the installation.
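For example, cloning and switching to a release branch looks like this (the branch name below is illustrative; list the repository's branches to find the actual release names):
# Clone the repo and check out a release branch (branch name is illustrative)
git clone https://github.com/kaito-project/kaito.git
cd kaito
git checkout release-0.4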
If you do not already have an AKS cluster, run the following Azure CLI commands to create one:
export RESOURCE_GROUP="myResourceGroup"
export MY_CLUSTER="myCluster"
export LOCATION="eastus"
az group create --name $RESOURCE_GROUP --location $LOCATION
az aks create --resource-group $RESOURCE_GROUP --name $MY_CLUSTER --enable-oidc-issuer --enable-workload-identity --enable-managed-identity --generate-ssh-keys
Connect to the AKS cluster.
az aks get-credentials --resource-group $RESOURCE_GROUP --name $MY_CLUSTER
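To confirm that the credentials work, list the cluster nodes:
kubectl get nodes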
If you do not have kubectl installed locally, you can install it using the following Azure CLI command.
az aks install-cli
Install workspace controller
Be sure you've cloned this repo and connected to your AKS cluster before attempting to install the Helm charts.
Install the Workspace controller.
export KAITO_WORKSPACE_VERSION=0.4.5
helm install kaito-workspace --set clusterName=$MY_CLUSTER --wait \
https://github.com/kaito-project/kaito/raw/gh-pages/charts/kaito/workspace-$KAITO_WORKSPACE_VERSION.tgz --namespace kaito-workspace --create-namespace
Note that if you have installed another node provisioning controller that supports Karpenter-core APIs, the following steps for installing gpu-provisioner can be skipped.
Install gpu-provisioner controller
Enable Workload Identity and OIDC Issuer features
The gpu-provisioner controller requires the workload identity feature to acquire the access token to the AKS cluster.
Run the following commands only if your AKS cluster does not already have the Workload Identity and OIDC issuer features enabled.
export RESOURCE_GROUP="myResourceGroup"
export MY_CLUSTER="myCluster"
az aks update -g $RESOURCE_GROUP -n $MY_CLUSTER --enable-oidc-issuer --enable-workload-identity --enable-managed-identity
Create an identity and assign permissions
The identity kaitoprovisioner is created for the gpu-provisioner controller. It is assigned the Contributor role for the managed cluster resource to allow changing $MY_CLUSTER (e.g., provisioning new nodes in it).
export SUBSCRIPTION=$(az account show --query id -o tsv)
export IDENTITY_NAME="kaitoprovisioner"
az identity create --name $IDENTITY_NAME -g $RESOURCE_GROUP
export IDENTITY_PRINCIPAL_ID=$(az identity show --name $IDENTITY_NAME -g $RESOURCE_GROUP --subscription $SUBSCRIPTION --query 'principalId' -o tsv)
az role assignment create --assignee $IDENTITY_PRINCIPAL_ID --scope /subscriptions/$SUBSCRIPTION/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.ContainerService/managedClusters/$MY_CLUSTER --role "Contributor"
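To double-check that the role assignment took effect, list the assignments for the identity:
az role assignment list --assignee $IDENTITY_PRINCIPAL_ID -o table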
Install helm charts
Important Note: For kaito 0.4.2 and above, please use gpu-provisioner 0.3.2 or higher. For versions below kaito 0.4.2, please use gpu-provisioner 0.2.1.
Install the Node provisioner controller.
# get additional values for helm chart install
export GPU_PROVISIONER_VERSION=0.3.3
curl -sO https://raw.githubusercontent.com/Azure/gpu-provisioner/main/hack/deploy/configure-helm-values.sh
chmod +x ./configure-helm-values.sh && ./configure-helm-values.sh $MY_CLUSTER $RESOURCE_GROUP $IDENTITY_NAME
helm install gpu-provisioner --values gpu-provisioner-values.yaml --set settings.azure.clusterName=$MY_CLUSTER --wait \
https://github.com/Azure/gpu-provisioner/raw/gh-pages/charts/gpu-provisioner-$GPU_PROVISIONER_VERSION.tgz --namespace gpu-provisioner --create-namespace
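If the installation fails, it can help to inspect the values file generated by configure-helm-values.sh (the same file passed to helm above):
cat gpu-provisioner-values.yaml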
Create the federated credential
The federated identity credential between the managed identity kaitoprovisioner and the service account used by the gpu-provisioner controller is created.
export AKS_OIDC_ISSUER=$(az aks show -n $MY_CLUSTER -g $RESOURCE_GROUP --subscription $SUBSCRIPTION --query "oidcIssuerProfile.issuerUrl" -o tsv)
az identity federated-credential create --name kaito-federatedcredential --identity-name $IDENTITY_NAME -g $RESOURCE_GROUP --issuer $AKS_OIDC_ISSUER --subject system:serviceaccount:"gpu-provisioner:gpu-provisioner" --audience api://AzureADTokenExchange --subscription $SUBSCRIPTION
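You can verify that the credential was created by listing the federated credentials on the identity:
az identity federated-credential list --identity-name $IDENTITY_NAME -g $RESOURCE_GROUP -o table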
Then the gpu-provisioner can access the managed cluster using a trust token with the same permissions as the kaitoprovisioner identity. Note that before finishing this step, the gpu-provisioner controller pod will constantly fail with the following message in the log:
panic: Configure azure client fails. Please ensure federatedcredential has been created for identity XXXX.
The pod will reach running state once the federated credential is created.
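You can watch the pod until it recovers:
kubectl get pods -n gpu-provisioner -w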
Verify installation
You can run the following commands to verify that the installation of the controllers was successful.
Check status of the Helm chart installations.
helm list -n kaito-workspace
helm list -n gpu-provisioner
Check status of the workspace.
kubectl describe deploy kaito-workspace -n kaito-workspace
Check status of the gpu-provisioner.
kubectl describe deploy gpu-provisioner -n gpu-provisioner
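You can also confirm that the CRDs used by the two controllers are registered. The grep pattern below is deliberately loose, since the exact CRD names depend on the chart versions:
kubectl get crd | grep -Ei 'kaito|machine'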
Troubleshooting
If you see that the gpu-provisioner deployment is not running after some time, it's possible that some values are incorrect in your values override file (gpu-provisioner-values.yaml).
Run the following command to check the gpu-provisioner pod logs for additional details.
kubectl logs --selector=app.kubernetes.io/name=gpu-provisioner -n gpu-provisioner
Clean up
helm uninstall gpu-provisioner -n gpu-provisioner
helm uninstall kaito-workspace -n kaito-workspace
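Optionally, remove the Azure identity created earlier if nothing else uses it:
az identity delete --name $IDENTITY_NAME -g $RESOURCE_GROUP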
5. Quick start
After installing Kaito, you can try the following commands to start a phi-3.5-mini-instruct inference service.
$ cat examples/inference/kaito_workspace_phi_3.5-instruct.yaml
apiVersion: kaito.sh/v1beta1
kind: Workspace
metadata:
  name: workspace-phi-3-5-mini
resource:
  instanceType: "Standard_NC24ads_A100_v4"
  labelSelector:
    matchLabels:
      apps: phi-3-5
inference:
  preset:
    name: phi-3.5-mini-instruct
$ kubectl apply -f examples/inference/kaito_workspace_phi_3.5-instruct.yaml
The workspace status can be tracked by running the following command. When the WORKSPACESUCCEEDED column becomes True, the model has been deployed successfully.
$ kubectl get workspace workspace-phi-3-5-mini
NAME                     INSTANCE                   RESOURCEREADY   INFERENCEREADY   JOBSTARTED   WORKSPACESUCCEEDED   AGE
workspace-phi-3-5-mini   Standard_NC24ads_A100_v4   True            True                          True                 4h15m
Next, one can find the inference service's cluster IP and use a temporary curl pod to test the service endpoint in the cluster.
# find service endpoint
$ kubectl get svc workspace-phi-3-5-mini
NAME                     TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)            AGE
workspace-phi-3-5-mini   ClusterIP   <CLUSTERIP>   <none>        80/TCP,29500/TCP   10m
$ export CLUSTERIP=$(kubectl get svc workspace-phi-3-5-mini -o jsonpath="{.spec.clusterIPs[0]}")
# find available models
$ kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -s http://$CLUSTERIP/v1/models | jq
{
  "object": "list",
  "data": [
    {
      "id": "phi-3.5-mini-instruct",
      "object": "model",
      "created": 1733370094,
      "owned_by": "vllm",
      "root": "/workspace/vllm/weights",
      "parent": null,
      "max_model_len": 16384
    }
  ]
}
# make an inference call using the model id (phi-3.5-mini-instruct) from previous step
$ kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -X POST http://$CLUSTERIP/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "phi-3.5-mini-instruct",
        "prompt": "What is kubernetes?",
        "max_tokens": 7,
        "temperature": 0
      }'
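Because the endpoint is OpenAI-compatible and backed by vLLM, a chat-style call should also work. This is a sketch: the /v1/chat/completions route comes from the vLLM OpenAI-compatible server, not from the Kaito documentation itself.
$ kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -X POST http://$CLUSTERIP/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "phi-3.5-mini-instruct",
        "messages": [{"role": "user", "content": "What is kubernetes?"}],
        "max_tokens": 50
      }'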
Usage
The detailed usage for Kaito supported models can be found HERE. In case users want to deploy their own containerized models, they can provide the pod template in the inference field of the workspace custom resource (please see API definitions for details; a sketch follows below). The controller will create a deployment workload using all provisioned GPU nodes. Note that currently the controller does NOT handle automatic model upgrade. It only creates inference workloads based on the preset configurations if the workloads do not exist.
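A minimal sketch of such a workspace is shown below. The inference.template field and everything inside it are illustrative assumptions; consult the API definitions for the authoritative schema:
apiVersion: kaito.sh/v1beta1
kind: Workspace
metadata:
  name: workspace-custom-model          # hypothetical name
resource:
  instanceType: "Standard_NC24ads_A100_v4"
  labelSelector:
    matchLabels:
      apps: custom-model                # hypothetical label
inference:
  template:                             # user-provided pod template (sketch)
    spec:
      containers:
      - name: model-server              # hypothetical container name
        image: registry.example.com/my-model:latest   # hypothetical image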
The number of the supported models in Kaito is growing! Please check this document to see how to add a new supported model.
Starting with version v0.3.0, Kaito supports model fine-tuning and using fine-tuned adapters in the inference service. Refer to the tuning document and inference document for more information.
6. Software Download
The information in this article comes from the original authors' GitHub repository: GitHub - kaito-project/kaito: Kubernetes AI Toolchain Operator