Getting Started with TensorRT: Using the Polygraphy Model Debugger

郑小路

Category: Model Deployment. Tags: artificial intelligence, neural networks, deep learning, edge computing

Article link: https://blog.csdn.net/yitiaoxiaolu/article/details/136413877

Table of Contents

Preface
I. What is Polygraphy?
II. Polygraphy Features
III. Polygraphy Usage
IV. Polygraphy Modes
Summary

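Preface

After a model has been migrated to TensorRT, three questions remain. How do we verify the correctness and numerical accuracy of the results computed by TensorRT? How do we find the layers that compute wrong results or lose precision? How do we apply simple optimizations to the computation graph? Polygraphy, a deep learning model debugger provided by NVIDIA, addresses all three.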

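I. What is Polygraphy?

Polygraphy is a toolkit from NVIDIA for debugging and optimizing deep learning models. It provides both a Python API and a command-line interface. With Polygraphy, users can evaluate a model's inference performance and accuracy, optimize and compress models, and compare and verify the same network across different formats before and after conversion, which makes model debugging and optimization easier.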

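II. Polygraphy Features

As a model debugging tool, Polygraphy's main job is to inspect and compare the per-layer information of a network, so that engineers can check whether a model conversion is correct. Polygraphy can also split out the subgraphs that cannot be converted from ONNX to TensorRT, which helps us optimize the network structure. Knowing exactly where the unconvertible subgraphs are also makes it easier to write the corresponding plugins.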

III. Polygraphy Usage

1. Simple usage (trtexec can be used instead)

Test the model with ONNX Runtime (TensorRT is not used)

# 01-Run polygraphy from ONNX file in onnxruntime without any more option
polygraphy run modelA.onnx \
    --onnxrt \
    > result-01.log 2>&1

Build and save the engine, then test inference performance

# 02-Parse ONNX file, build and save TensorRT engine with more options (see Help.txt to get more information)
# Notice:
# + For the shape option, use "," to separate dimensions and use " " to separate the tensors (which is different from trtexec)
# + For example options of a model with 3 input tensors named "tensorX" and "tensorY" and "tensorZ" should be like "--trt-min-shapes 'tensorX:[16,320,256]' 'tensorY:[8,4]' tensorZ:[]"
polygraphy run modelA.onnx \
    --trt \
    --save-engine model-02.plan \
    --save-timing-cache model-02.cache \
    --save-tactics model-02-tactics.json \
    --trt-min-shapes 'tensorX:[1,1,28,28]' \
    --trt-opt-shapes 'tensorX:[4,1,28,28]' \
    --trt-max-shapes 'tensorX:[16,1,28,28]' \
    --fp16 \
    --pool-limit workspace:1G \
    --builder-optimization-level 5 \
    --max-aux-streams 4 \
    --input-shapes   'tensorX:[4,1,28,28]' \
    --verbose \
    > result-02.log 2>&1
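
The shape syntax described in the comments above ("," between dimensions, " " between tensors) is easy to get wrong when commands are assembled by a script. The following is a minimal Python sketch of a formatter for such arguments. It is not part of Polygraphy: the helper names format_shape and shape_args are hypothetical, and the code only builds the argument strings without invoking polygraphy.

```python
from typing import Dict, List, Sequence


def format_shape(name: str, dims: Sequence[int]) -> str:
    """Format one tensor as name:[d0,d1,...] with no spaces inside the brackets."""
    return f"{name}:[{','.join(str(d) for d in dims)}]"


def shape_args(flag: str, shapes: Dict[str, Sequence[int]]) -> List[str]:
    """Return the flag followed by one separate argument per tensor."""
    return [flag] + [format_shape(name, dims) for name, dims in shapes.items()]


if __name__ == "__main__":
    # Hypothetical usage: the tensor name is taken from the example above.
    print(shape_args("--trt-min-shapes", {"tensorX": [1, 1, 28, 28]}))
```

Returning a list of separate arguments (rather than one joined string) means the result can be passed to a process launcher without extra shell quoting.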

Load the engine and run it for debugging

# 03-Run TensorRT engine
polygraphy run model-02.plan \
    --trt \
    --input-shapes 'tensorX:[4,1,28,28]' \
    --verbose \
    > result-03.log 2>&1

2. Important usage

Compare the output of every layer between ONNX Runtime and TensorRT

# 04-Compare the output of each layer between Onnxruntime and TensorRT
polygraphy run modelA.onnx \
    --onnxrt --trt \
    --save-engine=model-04.plan \
    --onnx-outputs mark all \
    --trt-outputs mark all \
    --trt-min-shapes 'tensorX:[1,1,28,28]' \
    --trt-opt-shapes 'tensorX:[4,1,28,28]' \
    --trt-max-shapes 'tensorX:[16,1,28,28]' \
    --input-shapes   'tensorX:[4,1,28,28]' \
    --atol 1e-3 --rtol 1e-3 \
    --verbose \
    > result-04.log 2>&1

Note: the --atol 1e-3 --rtol 1e-3 options set the absolute and relative tolerances used when comparing outputs.
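
To make the two tolerances concrete, here is a minimal plain-Python sketch of an element-wise check. It is an illustration under stated assumptions, not Polygraphy's implementation: it assumes an element passes when its absolute error is within atol or its relative error (measured against the reference value) is within rtol. The function name within_tolerance is hypothetical.

```python
from typing import Sequence


def within_tolerance(out: Sequence[float], ref: Sequence[float],
                     atol: float = 1e-3, rtol: float = 1e-3) -> bool:
    """Return True if every element is within atol OR within rtol of the reference."""
    if len(out) != len(ref):
        raise ValueError("outputs must have the same number of elements")
    for o, r in zip(out, ref):
        abs_err = abs(o - r)
        if r != 0:
            rel_err = abs_err / abs(r)
        else:
            # Assumption: a zero reference with a nonzero error has infinite relative error.
            rel_err = 0.0 if abs_err == 0 else float("inf")
        # An element fails only when it exceeds both tolerances.
        if abs_err > atol and rel_err > rtol:
            return False
    return True
```

Under this rule a large activation such as 1000.5 against a reference of 1000.0 passes through the relative tolerance even though its absolute error is 0.5, while 1.1 against 1.0 fails both checks.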

Compare the output of specific layers between ONNX Runtime and TensorRT

# 05-Compare the output of certain layer(s) between Onnxruntime and TensorRT
# Notice:
# + Use " " to separate names of the tensors need to be compared
polygraphy run modelA.onnx \
    --onnxrt --trt \
    --save-engine=model-04.plan \
    --onnx-outputs A-V-2-MaxPool A-V-5-MaxPool \
    --trt-outputs A-V-2-MaxPool A-V-5-MaxPool \
    --trt-min-shapes 'tensorX:[1,1,28,28]' \
    --trt-opt-shapes 'tensorX:[4,1,28,28]' \
    --trt-max-shapes 'tensorX:[16,1,28,28]' \
    --input-shapes   'tensorX:[4,1,28,28]' \
    --atol 1e-3 --rtol 1e-3 \
    --verbose \
    > result-05.log 2>&1
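
Once per-layer outputs from both backends are available, locating the faulty layer amounts to scanning them in graph order and stopping at the first mismatch. The sketch below illustrates that idea in plain Python. It assumes the outputs have already been loaded into dictionaries keyed by tensor name and ordered from the input side to the output side; the function names and the dictionary layout are hypothetical and do not reflect Polygraphy's file format.

```python
from typing import Dict, List, Optional


def max_abs_error(a: List[float], b: List[float]) -> float:
    """Largest element-wise absolute difference between two flattened tensors."""
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def first_divergent_layer(ref_outputs: Dict[str, List[float]],
                          test_outputs: Dict[str, List[float]],
                          atol: float = 1e-3) -> Optional[str]:
    """Return the first tensor (in reference order) whose error exceeds atol."""
    for name, ref in ref_outputs.items():
        if name not in test_outputs:
            continue  # tensor not present in the other backend's results
        if max_abs_error(ref, test_outputs[name]) > atol:
            return name
    return None
```

The first tensor reported is a good starting point for a closer look, since errors in later layers may simply be inherited from it.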

Check whether an ONNX file is natively supported by TensorRT, and save the supported and unsupported subgraphs separately

# 06-Judge whether an ONNX file is supported by TensorRT natively
# Notice:
# + The modelB is not fully supported by TensorRT, so the output directory "polygraphy_capability_dumps" is created, which contains information of the subgraphs supported / unsupported by TensorRT natively
polygraphy inspect capability modelB.onnx \
    > result-06-B.log 2>&1

Simplify the model with Polygraphy (constant folding)

# 01-Simplify the graph using polygraphy
polygraphy surgeon sanitize modelA.onnx \
    --fold-constants \
    -o modelA-FoldConstant.onnx \
    > result-01.log
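
Constant folding replaces any node whose inputs are all known at build time with its precomputed result. The toy Python sketch below shows the idea on a tiny scalar graph. It is only a conceptual illustration: the node representation, the two supported operators, and the name fold_constants are hypothetical and unrelated to how Polygraphy or ONNX represent graphs.

```python
from typing import Dict, List, Tuple

# A node is (output_name, op, input_names); only binary "Add" and "Mul" are modeled.
Node = Tuple[str, str, List[str]]

OPS = {"Add": lambda a, b: a + b, "Mul": lambda a, b: a * b}


def fold_constants(nodes: List[Node],
                   constants: Dict[str, float]) -> Tuple[List[Node], Dict[str, float]]:
    """Evaluate nodes whose inputs are all constants; keep the remaining nodes."""
    known = dict(constants)
    remaining: List[Node] = []
    for out, op, inputs in nodes:  # nodes are assumed to be in topological order
        if all(name in known for name in inputs):
            a, b = (known[name] for name in inputs)
            known[out] = OPS[op](a, b)  # the node disappears, its result becomes a constant
        else:
            remaining.append((out, op, inputs))
    return remaining, known
```

For a graph c = a + b, d = c * x with constants a and b, only the multiplication remains after folding, and c becomes a new constant.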

3. Advanced usage

Generate a script that performs the inference test; it can then be run with the command "python polygraphyRun.py"

# 06-Generate a script to do the same work as part 02, afterward we can use command "python polygraphyRun.py" to actually run it
polygraphy run modelA.onnx \
    --trt \
    --save-engine model-02.plan \
    --save-timing-cache model-02.cache \
    --save-tactics model-02-tactics.json \
    --trt-min-shapes 'tensorX:[1,1,28,28]' \
    --trt-opt-shapes 'tensorX:[4,1,28,28]' \
    --trt-max-shapes 'tensorX:[16,1,28,28]' \
    --fp16 \
    --pool-limit workspace:1G \
    --builder-optimization-level 5 \
    --max-aux-streams 4 \
    --input-shapes   'tensorX:[4,1,28,28]' \
    --silent \
    --gen-script=./polygraphyRun.py \
    > result-06.log 2>&1

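Notes:
--trt: enables TensorRT as the inference backend.

--save-engine model-02.plan: sets the path and file name under which the TensorRT engine built from the model is saved.

--save-timing-cache model-02.cache: sets the path and file name under which the engine's timing cache is saved.

--save-tactics model-02-tactics.json: sets the path and file name under which the tactics selected for the engine are saved.

--trt-min-shapes 'tensorX:[1,1,28,28]': sets the minimum input shape of the TensorRT engine.

--trt-opt-shapes 'tensorX:[4,1,28,28]': sets the input shape the TensorRT engine is optimized for.

--trt-max-shapes 'tensorX:[16,1,28,28]': sets the maximum input shape of the TensorRT engine.

--fp16: enables FP16 precision for the TensorRT engine.

--pool-limit workspace:1G: limits the workspace of the TensorRT engine to 1 GB.

--builder-optimization-level 5: sets the optimization level of the TensorRT builder to 5.

--max-aux-streams 4: sets the maximum number of auxiliary streams of the TensorRT engine to 4.

--input-shapes 'tensorX:[4,1,28,28]': sets the shape of the input tensor used for inference.

--silent: suppresses Polygraphy's output at run time.

--gen-script=./polygraphyRun.py: generates a Polygraphy run script, polygraphyRun.py, that can be rerun later.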

Test a model that uses a plugin

# 07-Build and run TensorRT engine with plugins
make
polygraphy run modelB.onnx \
    --trt \
    --plugins ./AddScalarPlugin.so \
    > result-07.log 2>&1

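Note: the plugin's shared library (AddScalarPlugin.so in this example) must be compiled beforehand, which is what the make step is for.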

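Validate the output of an ONNX file (a quick check that the ONNX model is valid; --validate checks the outputs for NaN and Inf values, and no TensorRT inference is run)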

# 08-Validate the output of ONNX files
polygraphy run modelC.onnx \
    --onnxrt \
    --validate \
    --fail-fast \
    --verbose \
    >> result-08.log 2>&1

Run inference and save the inputs and outputs / run inference on the saved inputs and compare against the saved outputs

# 09-Save and load input/output data
polygraphy run model-02.plan \
    --trt \
    --input-shapes 'tensorX:[4,1,28,28]' \
    --save-inputs "input.json" \
    --save-outputs "output.json" \
    --verbose

polygraphy run model-02.plan \
    --trt \
    --input-shapes 'tensorX:[4,1,28,28]' \
    --load-inputs "input.json" \
    --load-outputs "output.json" \
    --verbose

IV. Polygraphy Modes

Polygraphy supports 7 modes in total: run, convert, inspect, surgeon, template, debug, data.

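Summary

This article introduced Polygraphy and how it is used with TensorRT. Starting from its basic concepts, we saw how it helps debug and optimize deep learning models, and we covered its main features: comparing per-layer information between backends, simplifying the computation graph, and splitting out unsupported subgraphs. The usage examples showed how to apply Polygraphy to different model debugging and optimization problems.

Beyond the basic usage, the article also covered several advanced techniques. Hopefully this helps you use Polygraphy more effectively and get better performance and results from TensorRT.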