triton-inference-server service deployment
A hands-on guide to deploying models with triton-inference-server: how to avoid the common pitfalls of model deployment and how to optimize model inference speed.
修炼之路
Main research areas include image classification, object detection, OCR, and face recognition; has built cross-platform deployment solutions for deep learning models and designed distributed serving architectures for them.
Using perf_analyzer and model-analyzer to benchmark tritonserver model performance: a complete, detailed guide
After deploying a model with triton-server, we need to test the capabilities of the model service. This article explains in detail how to measure the service's throughput with the two benchmarking tools perf_analyzer and model-analyzer. Original · 2023-09-04 11:49:00 · 1419 views · 6 comments
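As a taste of what the article covers, a minimal throughput-measurement run with perf_analyzer might look like the sketch below; the model name resnet152 and the endpoint are assumptions, and exact flags can vary across tritonserver releases.

```bash
# Sweep concurrency from 1 to 8 and report throughput/latency at each level.
# Assumes a tritonserver instance is already serving "resnet152" locally.
perf_analyzer \
    -m resnet152 \
    -u 127.0.0.1:8001 \
    -i grpc \
    --concurrency-range 1:8 \
    --measurement-interval 5000
```

model-analyzer can then search over instance counts and dynamic-batching settings automatically on top of these measurements.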
perf_analyzer reports: input INPUT contains dynamic shape, provide shapes to send along with the request
Using perf_analyzer to resolve the dynamic-shape problem. Original · 2023-08-31 10:59:35 · 377 views · 0 comments
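The usual fix for this message is to pin every dynamic input to a concrete shape with --shape; a minimal sketch, assuming an input tensor named INPUT carrying a 3x224x224 image (batch dimension excluded):

```bash
# perf_analyzer cannot invent a shape for a dynamic dimension,
# so each dynamic input must be given explicitly.
perf_analyzer \
    -m resnet152 \
    -i grpc \
    --shape INPUT:3,224,224 \
    --concurrency-range 1:4
```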
Model Analyzer encountered an error: Failed to set the value for field "triton_server_path"
An error when using model-analyzer to search for a model's optimal parameters: triton_server_path cannot be found. Original · 2023-08-30 15:30:23 · 147 views · 0 comments
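One common way this surfaces is when model-analyzer runs in local launch mode but cannot locate the tritonserver binary; pointing it at the binary explicitly is a typical workaround. A sketch under that assumption (all paths and model names here are illustrative):

```bash
# Tell model-analyzer exactly where the tritonserver binary lives
# when profiling with --triton-launch-mode local.
model-analyzer profile \
    --model-repository /models \
    --profile-models resnet152 \
    --triton-launch-mode local \
    --triton-server-path /opt/tritonserver/bin/tritonserver
```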
TensorRT reports Cuda initialization failure with error
Cause: while using TensorRT to convert an onnx model to an engine, it reported [TRT] Cuda initialization failure with error. The detailed error message: [TensorRT] ERROR: CUDA initialization failure with error 222. Please check your CUDA installation: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html … Original · 2021-09-03 16:07:57 · 10259 views · 2 comments
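Failures of this class usually come down to a mismatch between the CUDA toolkit that the TensorRT build expects and the driver actually installed; before digging deeper, it is worth comparing the versions. A quick diagnostic sketch, assuming a standard CUDA installation:

```bash
# Driver version plus the maximum CUDA version that driver supports
nvidia-smi

# CUDA toolkit version actually installed
nvcc --version

# TensorRT version, if installed via the Python package
python3 -c "import tensorrt; print(tensorrt.__version__)"
```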
TensorRT onnx-to-engine conversion reports Assertion failed: dims.nbDims == 4 || dims.nbDims == 5
Error message: converting onnx to engine with TensorRT fails; the output begins: [08/27/2021-15:27:08] [I] === Model Options === [08/27/2021-15:27:08] [I] Format: ONNX [08/27/2021-15:27:08] [I] Model: glintr100.onnx [08/27/2021-15:27:08] [I] Output: [08/27/2021-15:27:08] [I] === Build Options = … Original · 2021-08-27 15:38:54 · 2142 views · 0 comments
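One common trigger for this assertion in TensorRT 7-era builds is converting in implicit-batch mode, so a layer sees a 3D tensor where it expects a 4D or 5D one; building with an explicit batch dimension is the usual remedy. A sketch, where the input name and shape are assumptions for the glintr100 face-recognition model:

```bash
# Build in explicit-batch mode and pin the input shape so every layer
# sees the full NCHW rank it expects (input name/shape are assumptions).
trtexec \
    --onnx=glintr100.onnx \
    --explicitBatch \
    --shapes=input:1x3x112x112 \
    --saveEngine=glintr100.engine
```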
triton server reports The engine plan file is generated on an incompatible device
Error message: when starting triton inference server it reports: I0701 02:42:42.028366 1 cuda_memory_manager.cc:103] CUDA memory pool is created on device 0 with size 67108864 I0701 02:42:42.031240 1 model_repository_manager.cc:1065] loading: resnet152:1 E0701 02:43:00.935893 1 loggin… Original · 2021-07-01 11:03:40 · 3789 views · 0 comments
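A TensorRT engine plan is tied to the GPU architecture it was serialized on, so this error typically means the engine was built on a different GPU than the one tritonserver is running on. A sketch of checking and rebuilding (file names are illustrative):

```bash
# Confirm which GPU the serving machine actually has...
nvidia-smi --query-gpu=name --format=csv

# ...then rebuild the plan file on that same machine/GPU.
trtexec --onnx=resnet152.onnx --saveEngine=model.plan
```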
triton-inference-server reports Error details: model expected the shape of dimension 0 to be between
Error description: while benchmarking the model's performance with perf_client, the following error occurred: ./perf_client -m resnet152 -u 127.0.0.1:8001 -i grpc system --concurrency-range 4 *** Measurement Settings *** Batch size: 1 Measurement window: 5000 msec Using synchronous calls for inference Stabilizing u… Original · 2021-06-30 10:56:22 · 1306 views · 0 comments
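Dimension 0 is the batch dimension, so this error generally means the request's batch size falls outside what the deployed config allows; checking max_batch_size in config.pbtxt and keeping the client's -b within it is the usual fix. A sketch, with the repository path assumed:

```bash
# See what batch range the deployed model actually accepts
grep max_batch_size /models/resnet152/config.pbtxt

# Keep the request batch size (-b) at or below max_batch_size
./perf_client -m resnet152 -u 127.0.0.1:8001 -i grpc -b 1 --concurrency-range 1:4
```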
triton-inference-server startup reports Internal - failed to load all models
Starting the nvidia server: docker run --gpus=1 --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /full_path/deploy/models/:/models nvcr.io/nvidia/tritonserver:21.03-py3 tritonserver --model-repository=/models, then trtexec --loadEngine=resnet152.engine # output: [06/25/2021-22.… Original · 2021-06-26 22:24:40 · 6364 views · 3 comments
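"failed to load all models" is frequently a model-repository layout problem: tritonserver expects each model in its own directory with numbered version subdirectories, and the TensorRT backend expects the engine to be named model.plan. A sketch of the layout assumed by the docker command above:

```bash
# Expected repository layout for a TensorRT model named "resnet152":
#
# /full_path/deploy/models/
# └── resnet152/
#     ├── config.pbtxt
#     └── 1/
#         └── model.plan

# Verify the layout from the host before starting the container:
find /full_path/deploy/models/ -maxdepth 3
```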
triton-inference-server startup reports Invalid argument: unexpected inference
Error message: starting tritonserver fails with: I0625 14:41:46.915214 1 cuda_memory_manager.cc:103] CUDA memory pool is created on device 0 with size 67108864 I0625 14:41:46.978097 1 model_repository_manager.cc:1065] loading: resnet152:1 I0625 14:42:16.968665 1 plan_bac… Original · 2021-06-25 23:42:12 · 1849 views · 0 comments
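"Invalid argument: unexpected inference input/output" style errors usually mean the tensor names or dims in config.pbtxt do not match the bindings baked into the engine. A minimal config sketch for a TensorRT classification model; the names, dims, and max_batch_size below are assumptions and must match your engine's actual bindings:

```bash
# Write a minimal config whose input/output names match the engine bindings
# (every value below is an illustrative assumption).
cat > /models/resnet152/config.pbtxt <<'EOF'
name: "resnet152"
platform: "tensorrt_plan"
max_batch_size: 8
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
EOF
```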