- https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
- https://github.com/triton-inference-server/server
- API: https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/protocol/README.html
- Backend (model execution wrapper): https://github.com/triton-inference-server/backend
- PyTorch backend: https://github.com/triton-inference-server/pytorch_backend
- Container: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch
Installation
Install for a CPU-only or GPU platform as described in the installation docs.
GPU command:
docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/full/path/to/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:<xx.yy>-py3 tritonserver --model-repository=/models
CPU command:
docker run --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/full/path/to/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:<xx.yy>-py3 tritonserver --model-repository=/models
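Once a container is running, you can verify the server from Python. A minimal health-check sketch, assuming `pip install tritonclient[http]` and the default HTTP port 8000:

```python
# Liveness/readiness check against a local Triton server (port 8000 assumed).
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
print("live:", client.is_server_live())    # server process is up
print("ready:", client.is_server_ready())  # all models loaded and servable
```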
QuickStart
CPU-only
# Step 1: Create the example model repository
git clone -b r22.12 https://ghproxy.com/https://github.com/triton-inference-server/server.git  # cloned through the ghproxy mirror for faster downloads
cd server/docs/examples
./fetch_models.sh
# Step 2: Launch triton from the NGC Triton container
docker run --name triton -d -p8000:8000 -p8001:8001 -p8002:8002 -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:22.12-py3 tritonserver --model-repository=/models
# Step 3: Sending an Inference Request
# In a separate console, launch the image_client example from the NGC Triton SDK container
docker run -it --name triton-client --rm --net=host nvcr.io/nvidia/tritonserver:22.12-py3-sdk
# Step 4: Inference
/workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg
# Inference should return the following
Image '/workspace/images/mug.jpg':
15.346230 (504) = COFFEE MUG
13.224326 (968) = CUP
10.422965 (505) = COFFEEPOT
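The same call can be made from Python instead of the prebuilt image_client binary. A rough sketch, assuming `tritonclient[http]`, numpy, and Pillow are installed; the tensor names `data_0`/`fc6_1` are taken from the quickstart densenet_onnx model config, so verify them with `get_model_metadata` if your copy differs:

```python
# Approximate Python equivalent of the image_client call above.
# Assumptions: quickstart densenet_onnx model on localhost:8000, whose config
# names the input "data_0" (3x224x224 FP32) and the output "fc6_1".
import numpy as np
import tritonclient.http as httpclient
from PIL import Image

client = httpclient.InferenceServerClient(url="localhost:8000")

# INCEPTION-style scaling (-s INCEPTION): resize, map pixels to [-1, 1], CHW.
img = Image.open("mug.jpg").convert("RGB").resize((224, 224))
x = (np.asarray(img, dtype=np.float32) / 127.5) - 1.0
x = np.transpose(x, (2, 0, 1))  # HWC -> CHW

inp = httpclient.InferInput("data_0", list(x.shape), "FP32")
inp.set_data_from_numpy(x)
out = httpclient.InferRequestedOutput("fc6_1", class_count=3)  # top-3, like -c 3

result = client.infer(model_name="densenet_onnx", inputs=[inp], outputs=[out])
print(result.as_numpy("fc6_1"))  # entries formatted as "score:index:LABEL"
```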
Step 2:
![[Pasted image 20230203171629.png]]
Step 3:
![[Pasted image 20230203171643.png]]
![[Pasted image 20230203171702.png]]
Inference:
![[Pasted image 20230203171807.png]]
Notes
- ![[Pasted image 20230209202207.png]]
- Single-node inference serving, including on heterogeneous Kubernetes (k8s) clusters
- Supports models from multiple frameworks
- ![[Pasted image 20230209202255.png]]
- Supports concurrent inference (client-side sketch after the screenshots below)
    - a single model served by multiple execution instances/threads
    - ![[Pasted image 20230209203611.png]]
    - multiple models, each with multiple execution instances
    - ![[Pasted image 20230209203624.png]]
- ![[Pasted image 20230209215358.png]]
- ![[Pasted image 20230210092031.png]]
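Server-side concurrency is configured per model in config.pbtxt via `instance_group` (e.g. `instance_group [{ count: 2, kind: KIND_GPU }]` runs two execution instances of one model). A client-side sketch that exercises it with overlapping requests, again assuming the quickstart densenet_onnx model:

```python
# Issue several requests concurrently; with instance_group count > 1,
# Triton can execute them in parallel on the server.
# Assumes tritonclient[http] and the quickstart densenet_onnx model.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000", concurrency=4)

def make_input():
    x = np.random.rand(3, 224, 224).astype(np.float32)
    inp = httpclient.InferInput("data_0", list(x.shape), "FP32")
    inp.set_data_from_numpy(x)
    return [inp]

# async_infer returns a handle immediately; get_result() blocks per request.
handles = [client.async_infer("densenet_onnx", inputs=make_input()) for _ in range(4)]
results = [h.get_result().as_numpy("fc6_1") for h in handles]
print(len(results), "responses received")
```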
- Alternative serving frameworks: TorchServe, Ray Serve
- https://zhuanlan.zhihu.com/p/598468847
- Python client examples: https://github.com/triton-inference-server/client/tree/main/src/python/examples
- KServe inference API: https://kserve.github.io/website/modelserving/inference_api/
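Triton implements the KServe v2 predict protocol linked above, so you can also POST JSON directly without any client library. A hand-rolled sketch with `requests`, reusing the densenet_onnx assumptions from the quickstart:

```python
# Raw KServe/Triton v2 REST inference call, no client library.
import numpy as np
import requests

x = np.random.rand(3, 224, 224).astype(np.float32)
payload = {
    "inputs": [{
        "name": "data_0",
        "shape": list(x.shape),
        "datatype": "FP32",
        "data": x.flatten().tolist(),
    }]
}
r = requests.post("http://localhost:8000/v2/models/densenet_onnx/infer", json=payload)
r.raise_for_status()
out = r.json()["outputs"][0]
print(out["name"], out["shape"])
```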
Port layout
- HTTP: 8000
- gRPC: 8001
- Metrics: 8002
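Port 8002 serves Prometheus-format metrics over plain HTTP; a quick sketch to scrape the inference counters (metric names like `nv_inference_request_success` are Triton's defaults):

```python
# Scrape Triton's Prometheus metrics endpoint (default port 8002).
import requests

text = requests.get("http://localhost:8002/metrics").text
for line in text.splitlines():
    if line.startswith("nv_inference_request"):  # e.g. nv_inference_request_success
        print(line)
```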
![[Pasted image 20230221151448.png]]
![[Pasted image 20230221152230.png]]
Python client package: https://pypi.org/project/tritonclient/
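The package also ships a gRPC client for port 8001 (`pip install tritonclient[grpc]`); a minimal sketch mirroring the HTTP calls above:

```python
# gRPC flavor of the client, hitting port 8001 instead of 8000.
# Assumes the quickstart densenet_onnx model is loaded.
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

x = np.random.rand(3, 224, 224).astype(np.float32)
inp = grpcclient.InferInput("data_0", list(x.shape), "FP32")
inp.set_data_from_numpy(x)

result = client.infer(model_name="densenet_onnx", inputs=[inp])
print(result.as_numpy("fc6_1").shape)
```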
image_client example:
![[Pasted image 20230221145251.png]]