Dragonfly GitHub:
https://github.com/dragonflyoss/Dragonfly2
This article addresses the bandwidth bottleneck of a centralized model repository when Triton Server pulls models. Model files are relatively large and are often downloaded concurrently, which can easily saturate the storage bandwidth, slow down downloads, and degrade the inference service.
Triton Server link:
https://github.com/triton-inference-server/server
A good solution to this problem is to use Dragonfly's P2P technology to relieve the bandwidth pressure on the model repository with the idle bandwidth of each node, thereby accelerating downloads. In the ideal case, Dragonfly lets only one node in the entire P2P cluster download the model from the source, while all other nodes use the cluster's internal P2P bandwidth.
Part.1 Deployment
By integrating the Dragonfly Repository Agent into Triton Server, download traffic goes through Dragonfly to pull model files stored in S3, GCS, or ABS, and the models are then registered in Triton Server. The Triton Server plugin is maintained in the dragonfly-repository-agent repository.
1. Prerequisites
Software | Version | Link |
Kubernetes | 1.20+ | https://kubernetes.io/ |
Helm | 3.8.0+ | https://helm.sh/ |
Triton Server | 23.08-py3 | https://github.com/triton-inference-server/server |
Note: If you don't have a Kubernetes cluster available for testing, Kind is recommended.
Kind link: https://kind.sigs.k8s.io/
2. Set Up a Dragonfly Kubernetes Cluster
For detailed installation on a Kubernetes cluster, refer to quick-start-kubernetes.
| Prepare a Kubernetes cluster
Create the Kind multi-node cluster configuration file kind-config.yaml as follows:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
Create the Kind cluster with the configuration file:
kind create cluster --config kind-config.yaml
Switch the kubectl context to the Kind cluster:
kubectl config use-context kind-kind
| Load Dragonfly images into Kind
Pull the latest Dragonfly images:
docker pull dragonflyoss/scheduler:latest
docker pull dragonflyoss/manager:latest
docker pull dragonflyoss/dfdaemon:latest
Load the latest Dragonfly images into the Kind cluster:
kind load docker-image dragonflyoss/scheduler:latest
kind load docker-image dragonflyoss/manager:latest
kind load docker-image dragonflyoss/dfdaemon:latest
| Create a Dragonfly cluster with Helm Charts
Create the Helm Charts configuration file charts-config.yaml. You can adjust the routing rules by modifying dfdaemon.config.proxies.regx to match the download path of your object storage. The example below matches AWS S3 requests by default and adds regx: .*models.* to match requests for the models bucket. The configuration is as follows:
scheduler:
image: dragonflyoss/scheduler
tag: latest
replicas: 1
metrics:
enable: true
config:
verbose: true
pprofPort: 18066
seedPeer:
image: dragonflyoss/dfdaemon
tag: latest
replicas: 1
metrics:
enable: true
config:
verbose: true
pprofPort: 18066
dfdaemon:
image: dragonflyoss/dfdaemon
tag: latest
metrics:
enable: true
config:
verbose: true
pprofPort: 18066
proxy:
defaultFilter: 'Expires&Signature&ns'
security:
insecure: true
cacert: ''
cert: ''
key: ''
tcpListen:
namespace: ''
port: 65001
registryMirror:
url: https://index.docker.io
insecure: true
certs: []
direct: false
proxies:
- regx: blobs/sha256.*
# Proxy all HTTP download requests for the model bucket
- regx: .*models.*
manager:
image: dragonflyoss/manager
tag: latest
replicas: 1
metrics:
enable: true
config:
verbose: true
pprofPort: 18066
jaeger:
enable: true
Deploy the Dragonfly Helm Charts with the configuration file:
$ helm repo add dragonfly https://dragonflyoss.github.io/helm-charts/
$ helm install --wait --create-namespace --namespace dragonfly-system dragonfly dragonfly/dragonfly -f charts-config.yaml
LAST DEPLOYED: Wed Nov 29 21:23:48 2023
NAMESPACE: dragonfly-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. Get the scheduler address by running these commands:
export SCHEDULER_POD_NAME=$(kubectl get pods --namespace dragonfly-system -l "app=dragonfly,release=dragonfly,component=scheduler" -o jsonpath={.items[0].metadata.name})
export SCHEDULER_CONTAINER_PORT=$(kubectl get pod --namespace dragonfly-system $SCHEDULER_POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
kubectl --namespace dragonfly-system port-forward $SCHEDULER_POD_NAME 8002:$SCHEDULER_CONTAINER_PORT
echo "Visit http://127.0.0.1:8002 to use your scheduler"
2. Get the dfdaemon port by running these commands:
export DFDAEMON_POD_NAME=$(kubectl get pods --namespace dragonfly-system -l "app=dragonfly,release=dragonfly,component=dfdaemon" -o jsonpath={.items[0].metadata.name})
export DFDAEMON_CONTAINER_PORT=$(kubectl get pod --namespace dragonfly-system $DFDAEMON_POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
You can use $DFDAEMON_CONTAINER_PORT as a proxy port in Node.
3. Configure runtime to use dragonfly:
https://d7y.io/docs/getting-started/quick-start/kubernetes/
4. Get Jaeger query URL by running these commands:
export JAEGER_QUERY_PORT=$(kubectl --namespace dragonfly-system get services dragonfly-jaeger-query -o jsonpath="{.spec.ports[0].port}")
kubectl --namespace dragonfly-system port-forward service/dragonfly-jaeger-query 16686:$JAEGER_QUERY_PORT
echo "Visit http://127.0.0.1:16686/search?limit=20&lookback=1h&maxDuration&minDuration&service=dragonfly to query download events"
Check that Dragonfly is deployed successfully:
$ kubectl get pod -n dragonfly-system
NAME READY STATUS RESTARTS AGE
dragonfly-dfdaemon-8qcpd 1/1 Running 0 2m45s
dragonfly-dfdaemon-qhkn8 1/1 Running 0 2m45s
dragonfly-jaeger-6c44dc44b9-dfjfv 1/1 Running 0 2m45s
dragonfly-manager-549cd546b9-ps5tf 1/1 Running 0 2m45s
dragonfly-mysql-0 1/1 Running 0 2m45s
dragonfly-redis-master-0 1/1 Running 0 2m45s
dragonfly-redis-replicas-0 1/1 Running 0 2m45s
dragonfly-redis-replicas-1 1/1 Running 0 2m7s
dragonfly-redis-replicas-2 1/1 Running 0 101s
dragonfly-scheduler-0 1/1 Running 0 2m45s
dragonfly-seed-peer-0 1/1 Running 0 2m45s
| Expose the Proxy service port
Create the dfstore.yaml configuration file to expose the port that the Dragonfly Peer's HTTP Proxy listens on, which Triton Server will talk to. If not modified in charts-config.yaml, targetPort defaults to 65001; port can be set as needed, though 65001 is also recommended.
kind: Service
apiVersion: v1
metadata:
name: dfstore
spec:
selector:
app: dragonfly
component: dfdaemon
release: dragonfly
ports:
- protocol: TCP
port: 65001
targetPort: 65001
type: NodePort
Create the Service:
kubectl --namespace dragonfly-system apply -f dfstore.yaml
Forward traffic from local port 65001 to Dragonfly's Proxy service:
kubectl --namespace dragonfly-system port-forward service/dfstore 65001:65001
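To quickly verify that the proxy is reachable and that the routing rule matches, you can request a model file through it with curl. This is a minimal sketch: the MinIO endpoint and object path are assumptions and should be replaced with your actual object storage address. Any URL containing models matches the regx: .*models.* rule configured above:
# Request an object through the Dragonfly HTTP proxy (-x) and discard the body
curl -v -o /dev/null -x http://127.0.0.1:65001 http://192.168.36.128:9000/models/densenet_onnx/config.pbtxt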
3. Install the Dragonfly Repository Agent Plugin
| Configure the Dragonfly Repository Agent
Create the dragonfly_config.json configuration file. An example:
{
"proxy": "http://127.0.0.1:65001",
"header": {
},
"filter": [
"X-Amz-Algorithm",
"X-Amz-Credential&X-Amz-Date",
"X-Amz-Expires",
"X-Amz-SignedHeaders",
"X-Amz-Signature"
]
}
proxy: the address of the Dragonfly Peer's HTTP Proxy.
header: additional request headers to attach to requests.
filter: used to generate a unique task ID by filtering unnecessary query parameters out of the URL.
Set the filter field in the configuration according to the object storage type:
Type | Value |
OSS | ["Expires","Signature","ns"] |
S3 | ["X-Amz-Algorithm", "X-Amz-Credential", "X-Amz-Date", "X-Amz-Expires", "X-Amz-SignedHeaders", "X-Amz-Signature"] |
OBS | ["X-Amz-Algorithm", "X-Amz-Credential", "X-Amz-Date", "X-Obs-Date", "X-Amz-Expires", "X-Amz-SignedHeaders", "X-Amz-Signature"] |
| Configure the model repository
Create the cloud_credential.json cloud storage credential file. An example:
{
"gs": {
"": "PATH_TO_GOOGLE_APPLICATION_CREDENTIALS",
"gs://gcs-bucket-002": "PATH_TO_GOOGLE_APPLICATION_CREDENTIALS_2"
},
"s3": {
"": {
"secret_key": "AWS_SECRET_ACCESS_KEY",
"key_id": "AWS_ACCESS_KEY_ID",
"region": "AWS_DEFAULT_REGION",
"session_token": "",
"profile": ""
},
"s3://s3-bucket-002": {
"secret_key": "AWS_SECRET_ACCESS_KEY_2",
"key_id": "AWS_ACCESS_KEY_ID_2",
"region": "AWS_DEFAULT_REGION_2",
"session_token": "AWS_SESSION_TOKEN_2",
"profile": "AWS_PROFILE_2"
}
},
"as": {
"": {
"account_str": "AZURE_STORAGE_ACCOUNT",
"account_key": "AZURE_STORAGE_KEY"
},
"as://Account-002/Container": {
"account_str": "",
"account_key": ""
}
}
}
In addition, to pull models through Dragonfly, the model's config.pbtxt configuration file must be modified to add the following:
model_repository_agents
{
agents [
{
name: "dragonfly",
}
]
}
The densenet_onnx example provides the already-modified configuration and model files. Its densenet_onnx/config.pbtxt is modified as follows:
name: "densenet_onnx"
platform: "onnxruntime_onnx"
max_batch_size : 0
input [
{
name: "data_0"
data_type: TYPE_FP32
format: FORMAT_NCHW
dims: [ 3, 224, 224 ]
reshape { shape: [ 1, 3, 224, 224 ] }
}
]
output [
{
name: "fc6_1"
data_type: TYPE_FP32
dims: [ 1000 ]
reshape { shape: [ 1, 1000, 1, 1 ] }
label_filename: "densenet_labels.txt"
}
]
model_repository_agents
{
agents [
{
name: "dragonfly",
}
]
}
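Before starting Triton Server, the model files need to be uploaded to the bucket in Triton's standard repository layout. A sketch using the AWS CLI against a MinIO endpoint (the endpoint address and local path are assumptions):
# Expected layout inside the models bucket:
#   densenet_onnx/config.pbtxt
#   densenet_onnx/densenet_labels.txt
#   densenet_onnx/1/model.onnx
aws s3 cp ./densenet_onnx s3://models/densenet_onnx --recursive --endpoint-url http://192.168.36.128:9000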
4. Integrate the Dragonfly Repository Agent Plugin into Triton Server
| Deploy Triton Server with Docker
Pull the dragonflyoss/dragonfly-repository-agent image, which ships with the plugin. See the Dockerfile for build details.
docker pull dragonflyoss/dragonfly-repository-agent:latest
Run the container, mounting the model repository and the dragonfly-repository-agent configuration directory:
docker run --network host --rm \
-v ${path-to-config-dir}:/home/triton/ \
dragonflyoss/dragonfly-repository-agent:latest tritonserver \
--model-repository=${model-repository-path}
path-to-config-dir: the directory containing the dragonfly_config.json and cloud_credential.json files.
model-repository-path: the address of the remote model repository.
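For example, with both configuration files placed in /tmp/triton-config and the MinIO bucket from this walkthrough as the repository (both paths are assumptions to adapt), the command becomes:
docker run --network host --rm \
  -v /tmp/triton-config:/home/triton/ \
  dragonflyoss/dragonfly-repository-agent:latest tritonserver \
  --model-repository=s3://192.168.36.128:9000/models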
A successful start includes the following output:
=============================
== Triton Inference Server ==
=============================
successfully loaded 'densenet_onnx'
I1130 09:43:22.595672 1 server.cc:604]
+------------------+------------------------------------------------------------------------+
| Repository Agent | Path |
+------------------+------------------------------------------------------------------------+
| dragonfly | /opt/tritonserver/repoagents/dragonfly/libtritonrepoagent_dragonfly.so |
+------------------+------------------------------------------------------------------------+
I1130 09:43:22.596011 1 server.cc:631]
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| pytorch | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so | {} |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
I1130 09:43:22.596112 1 server.cc:674]
+---------------+---------+--------+
| Model | Version | Status |
+---------------+---------+--------+
| densenet_onnx | 1 | READY |
+---------------+---------+--------+
I1130 09:43:22.598318 1 metrics.cc:703] Collecting CPU metrics
I1130 09:43:22.599373 1 tritonserver.cc:2435]
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.37.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0] | s3://192.168.36.128:9000/models |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I1130 09:43:22.610334 1 grpc_server.cc:2451] Started GRPCInferenceService at 0.0.0.0:8001
I1130 09:43:22.612623 1 http_server.cc:3558] Started HTTPService at 0.0.0.0:8000
I1130 09:43:22.695843 1 http_server.cc:187] Started Metrics Service at 0.0.0.0:8002
Run this command to view the Dragonfly logs:
kubectl exec -it -n dragonfly-system dragonfly-dfdaemon-<id> -- tail -f /var/log/dragonfly/daemon/core.log
Check that the logs contain entries like the following:
{
"level":"info","ts":"2024-02-02 05:28:02.631",
"caller":"peer/peertask_conductor.go:1349",
"msg":"peer task done, cost: 352ms",
"peer":"10.244.2.3-1-4398a429-d780-423a-a630-57d765f1ccfc",
"task":"974aaf56d4877cc65888a4736340fb1d8fecc93eadf7507f531f9fae650f1b4d",
"component":"PeerTask",
"trace":"4cca9ce80dbf5a445d321cec593aee65"
}
| Verify functionality
Verify the densenet_onnx model uploaded above by running the following command:
docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:23.08-py3-sdk /workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg
A successful inference response:
Request 0, batch size 1
Image '/workspace/images/mug.jpg':
15.349563 (504) = COFFEE MUG
13.227461 (968) = CUP
10.424893 (505) = COFFEEPOT
Part.2 Performance Testing
This test compares model download performance on a single machine before and after integrating Dragonfly, using MinIO as the object storage. Because the machine's network environment and configuration affect the results, the absolute download times are not meaningful by themselves; what matters is the ratio between the download speeds in the following scenarios:
Triton API:
Download model files from object storage directly via URL.
Triton API & Dragonfly Cold Boot:
Download through Dragonfly in proxy mode, going back to the source without hitting any cache.
Hit Remote Peer:
Download through Dragonfly in proxy mode, hitting the cache of a remote Dragonfly Peer.
Hit Local Peer:
Download through Dragonfly in proxy mode, hitting the cache of the local Dragonfly Peer.
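A simple way to reproduce this comparison is to time the same model download in each scenario; the object URL and proxy address below are assumptions based on the setup above:
# Triton API: direct download from object storage
time curl -so /dev/null http://192.168.36.128:9000/models/densenet_onnx/1/model.onnx
# Via the Dragonfly proxy: cold boot on the first run, cache hit on repeats
time curl -so /dev/null -x http://127.0.0.1:65001 http://192.168.36.128:9000/models/densenet_onnx/1/model.onnx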
The results show that integrating Dragonfly into Triton Server effectively reduces download time, with significant gains when a cache is hit, especially the local cache. Even back-to-source downloads perform nearly as well as direct downloads. Note that this is a single-machine test, which means that on a cache hit the bottleneck is mainly the disk. With Dragonfly deployed across multiple machines for P2P downloads, model downloads would be even faster.
Part.3 Related Links
Dragonfly community:
Website:
https://d7y.io/
GitHub Repo:
https://github.com/dragonflyoss/Dragonfly2
Dragonfly Repository Agent Github Repo:
https://github.com/dragonflyoss/dragonfly-repository-agent
Slack Channel:
#dragonfly on CNCF Slack
Discussion Group:
dragonfly-discuss@googlegroups.com
NVIDIA Triton Inference Server:
Website:
https://developer.nvidia.com/triton-inference-server
GitHub Repo:
https://github.com/triton-inference-server/server