Test objects and platform
Test objects: GPT and C-Dial GPT
Test platform: Triton Inference Server
Performance test comparison
ONNX format
What is ONNX?
ONNX (Open Neural Network Exchange) is an open, framework-neutral format for representing machine-learning models, so a model trained in one framework (e.g. PyTorch) can be served by a different runtime (e.g. ONNX Runtime).
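For context, a model usually reaches this form through an exporter. Below is a minimal sketch of exporting a GPT-style Hugging Face model to ONNX; the "gpt2" checkpoint, output name, dtypes, and opset are illustrative stand-ins, not details from the original test. The input names match the --shape arguments used with perf_analyzer below.

```python
# Minimal ONNX export sketch, assuming a Hugging Face GPT-style model.
# "gpt2" is a stand-in checkpoint, NOT the model actually tested here.
import torch
from transformers import AutoModel

# return_dict=False / use_cache=False make the traced forward return a
# plain tensor tuple, which torch.onnx.export can handle directly.
model = AutoModel.from_pretrained("gpt2", return_dict=False, use_cache=False)
model.eval()

batch, seq_len = 1, 32  # matches the --shape ...:32 used with perf_analyzer
input_ids = torch.zeros(batch, seq_len, dtype=torch.long)
attention_mask = torch.ones(batch, seq_len, dtype=torch.long)
token_type_ids = torch.zeros(batch, seq_len, dtype=torch.long)

torch.onnx.export(
    model,
    # a tuple ending in a dict passes the dict entries as keyword arguments
    (input_ids, {"attention_mask": attention_mask,
                 "token_type_ids": token_type_ids}),
    "model.onnx",
    input_names=["input_ids", "attention_mask", "token_type_ids"],
    output_names=["output"],  # illustrative name
    dynamic_axes={            # let Triton batch and vary sequence length
        "input_ids": {0: "batch", 1: "seq"},
        "attention_mask": {0: "batch", 1: "seq"},
        "token_type_ids": {0: "batch", 1: "seq"},
        "output": {0: "batch", 1: "seq"},
    },
    opset_version=13,
)
```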
Command (perf_analyzer from the Triton SDK image: batch size 50, all three inputs of length 32, zero-valued input data, 95th-percentile latency reporting):
docker run --rm --net=host hub.yun.paic.com.cn/pib-core/ibudda-triton:tritonserver-21.06-py3-sdk perf_analyzer -m ibuddha_chitchat_onnx --percentile=95 -u localhost:8010 -b 50 --shape input_ids:32 --shape attention_mask:32 --shape token_type_ids:32 --input-data zero
Performance of plain ONNX vs. ONNX converted to TensorRT inside Triton
Throughput reported by perf_analyzer (infer/sec):

| Configuration | batch 1 | batch 50 |
| --- | ---: | ---: |
| `dynamic_batching { }` | 136 | 1500 |
| `dynamic_batching { }` + `optimization { execution_accelerators { gpu_execution_accelerator : [ { name : "tensorrt" } ] } }` | 264 | 1430 |
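For reference, the second row corresponds to a config.pbtxt along the following lines. This is a sketch, not the original file: the model name and input names are taken from the perf_analyzer command above, while max_batch_size, data types, dims, and the output name are assumptions.

```
name: "ibuddha_chitchat_onnx"
platform: "onnxruntime_onnx"
max_batch_size: 64  # assumption; must cover the tested -b values
input [
  {
    name: "input_ids"
    data_type: TYPE_INT64  # assumption
    dims: [ -1 ]
  },
  {
    name: "attention_mask"
    data_type: TYPE_INT64
    dims: [ -1 ]
  },
  {
    name: "token_type_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
output [
  {
    name: "output"  # illustrative
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]
dynamic_batching { }
optimization {
  execution_accelerators {
    gpu_execution_accelerator : [ { name : "tensorrt" } ]
  }
}
```

Deleting the optimization block yields the first row's configuration.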
Comparison with the PyTorch (TorchScript) form of the same model
Note that the TorchScript model exposes positional input names (INPUT__0, INPUT__1, INPUT__2) rather than the named inputs of the ONNX model, which is why the --shape arguments differ:
docker run --rm --net=host hub.yun.paic.com.cn/pib-core/ibudda-triton:tritonserver-21.06-py3-sdk perf_analyzer -m ibuddha_chitchat --percentile=95 -u localhost:8010 -b 1 --shape INPUT__0:32 --shape INPUT__1:32 --shape INPUT__2:32 --input-data zero
Throughput reported by perf_analyzer (infer/sec); the full parameter syntax is shown in the config sketch after this table:

| Configuration | batch 1 | batch 50 |
| --- | ---: | ---: |
| `dynamic_batching` | 64 | 1330 |
| `dynamic_batching` + `INFERENCE_MODE=true` | 99 | 1370 |
| `dynamic_batching` + `INFERENCE_MODE=true` + `ENABLE_NVFUSER=true` | 91 | 1300 |
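Written out as a config.pbtxt, the third row looks roughly like this. Same caveats as the ONNX sketch above: only the dynamic_batching and parameters lines come from the table, the rest is assumed.

```
name: "ibuddha_chitchat"
platform: "pytorch_libtorch"
max_batch_size: 64  # assumption
input [
  {
    name: "INPUT__0"  # TorchScript positional input naming
    data_type: TYPE_INT64
    dims: [ -1 ]
  },
  {
    name: "INPUT__1"
    data_type: TYPE_INT64
    dims: [ -1 ]
  },
  {
    name: "INPUT__2"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
output [
  {
    name: "OUTPUT__0"  # illustrative
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]
dynamic_batching { }
parameters: { key: "INFERENCE_MODE" value: { string_value: "true" } }
parameters: { key: "ENABLE_NVFUSER" value: { string_value: "true" } }
```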
Options already enabled by default in the PyTorch backend (see the override example after this list):
ENABLE_JIT_EXECUTOR
ENABLE_JIT_PROFILING
ENABLE_TENSOR_FUSER
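To isolate the effect of any one of these while profiling, it can be overridden explicitly in config.pbtxt; a hypothetical example:

```
# Explicitly turn off one of the default-on PyTorch backend options
parameters: { key: "ENABLE_TENSOR_FUSER" value: { string_value: "false" } }
```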
Test conclusions
At batch 1, the ONNX model is roughly 35% faster than the best optimized PyTorch configuration (136 vs. 99 infer/sec).
Converting the ONNX model to TensorRT inside Triton improves throughput by more than 1.5× over the optimized PyTorch model, reaching about 2.7× its batch-1 throughput (264 vs. 99 infer/sec).
The advantage is concentrated at small batch sizes: at batch 50 every configuration lands in the 1300-1500 infer/sec range, and enabling ENABLE_NVFUSER actually lowered batch-1 throughput slightly (91 vs. 99 infer/sec).