TensorRT: The ONNX Parser and onnx-graphsurgeon in Practice


Preface

Welcome to this blog post, "TensorRT: The ONNX Parser and onnx-graphsurgeon in Practice"! With the rapid development of deep learning, model deployment and optimization are becoming more and more important. TensorRT, a high-performance inference engine, gives us the ability to optimize and accelerate deep learning models. Within TensorRT, the ONNX parser and onnx-graphsurgeon are two powerful tools that help us parse and optimize ONNX models.

This post is organized into three chapters. First, we take a deep look at the ONNX parser, the TensorRT component responsible for parsing models in ONNX format. We cover how the parser works and how to use it, so readers can get started quickly.

Next, we focus on onnx-graphsurgeon, a powerful tool for performing graph operations on ONNX models and optimizing them. With onnx-graphsurgeon, we can easily prune models, fuse layers, replace layers, and so on, further optimizing our models.

The final chapter provides a large set of onnx-graphsurgeon usage examples. They cover a variety of common optimization operations and will help readers understand and apply onnx-graphsurgeon more flexibly and efficiently.


I. ONNX Parser

1. What is the ONNX parser?

The ONNX parser is a tool or library for parsing and loading ONNX models. It parses an ONNX model file into in-memory data structures so the model can then be used by a particular inference engine or framework (such as TensorRT).

Most inference engines and frameworks provide an ONNX parser component (for example, TensorRT's trt.OnnxParser()). It creates a parser that reads the ONNX model file and converts it into the data structures and representations the engine or framework needs. These data structures may be in the engine's or framework's own format and represent the model's network structure, weight parameters, and other metadata.

In short, the ONNX parser's main job is to parse the ONNX model file, extract the network structure and parameter information, and perform any preprocessing and conversion needed to load the model into an inference engine or framework for inference. With an ONNX parser, developers can easily integrate ONNX models with a specific inference engine or framework to deploy and run them.
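As a minimal sketch of the typical parsing flow (assuming TensorRT 8.x; "model.onnx" is a placeholder path, and the full walkthrough follows in Section II):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)  # the parser fills `network` from the ONNX file

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):        # parse() returns False on failure
        for i in range(parser.num_errors):
            print(parser.get_error(i))    # inspect why parsing failed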

2. Three ways to build a network in TensorRT

Given an already trained model, how do we migrate it from its original framework into TensorRT for deployment? There are three main ways to build a network in TensorRT:

1. Use the TensorRT interface built into the original framework.
2. Use a Parser to convert the model into TensorRT and then use it there.
3. Rebuild the entire model with the native TensorRT API.


The first method is simple and flexible, and deployment stays inside the original framework: when an unsupported operator is encountered, execution can automatically fall back to the original framework, so there is no need to write plugins, which makes it very easy to use and fast to develop, at some cost in performance and compatibility. The second method has been maturing in recent TensorRT releases: an intermediate representation such as ONNX is highly portable and convenient for adjusting the network, so this route balances efficiency and performance; when an unsupported operator turns up, we may need to modify the network, patch the Parser, or write a plugin to work around it. The third method gives the best performance, the finest control over the network, and the best GPU compatibility, but usability and development efficiency are low: whenever an unsupported operator appears, we must write our own CUDA C++ plugin to perform the computation.

Using the Parser is currently the recommended approach. The framework-integrated TensorRT interface is simple but brings limited performance gains, while rebuilding the model with the native TensorRT API performs best but is impractical as models keep growing: with billions of parameters and networks thousands of layers deep, rebuilding by hand is an enormous amount of work. At the same time, Parser technology has matured to the point where, in practice, we rarely need to rebuild a network layer by layer.
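For contrast, here is a rough sketch of what the third method looks like with the native TensorRT Python API, rebuilding a single Conv + ReLU block by hand (a sketch assuming TensorRT 8.x; the shapes and random weights are illustrative, and real weights would be copied from the trained model):

import numpy as np
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

x = network.add_input("x", trt.float32, (1, 1, 28, 28))
w = np.ascontiguousarray(np.random.rand(32, 1, 5, 5).astype(np.float32))  # illustrative weights
b = np.ascontiguousarray(np.random.rand(32).astype(np.float32))           # illustrative bias
conv = network.add_convolution_nd(x, 32, (5, 5), trt.Weights(w), trt.Weights(b))
conv.padding_nd = (2, 2)
relu = network.add_activation(conv.get_output(0), trt.ActivationType.RELU)
network.mark_output(relu.get_output(0))

config = builder.create_builder_config()
engineString = builder.build_serialized_network(network, config)  # serialized engine, as in the Parser flow

Every layer, attribute, and weight of the model must be declared this way, which is why rebuilding a huge network by hand is rarely realistic.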

3. Pain points of the ONNX parser

Although the ONNX Parser makes model parsing much easier, it still has some pain points, including:

  • Incomplete operator support: ONNX is an open intermediate representation that supports many deep learning operators, but different frameworks and inference engines support those operators to different degrees. When using the ONNX Parser you may therefore run into operators that are not fully supported; they may need to be implemented by hand or emulated with other techniques before the model can be loaded and parsed successfully.

  • ONNX operator lowering fragments the model and adds redundancy: ONNX lowering is the process of converting high-level operators into low-level ones. During this conversion the model structure can become fragmented and redundant, which hurts inference speed.

  • ONNX nodes can block automatic fusion by TensorRT Myelin: Myelin is an optimization technique in TensorRT that automatically fuses multiple operators to improve inference performance, but certain ONNX nodes prevent this automatic fusion. In such cases the model structure may need to be optimized by hand, or other techniques applied, to achieve better fusion.

  • No recourse when model conversion fails: converting some models to ONNX can fail, for instance because the model contains unsupported features or structures. You then have to inspect the errors reported during conversion and consider other tools or methods (onnx-graphsurgeon) to convert the model to ONNX successfully.

  • No support for manually fusing operators for deeper optimization: in some scenarios, manually fusing operators enables deeper optimization and better performance, but the ONNX Parser offers no direct support for this. Here too, another tool (onnx-graphsurgeon) or custom code may be needed to fuse operators by hand.

II. Using the Parser

1. Parsing an ONNX model with the Parser

We use MNIST handwritten-digit recognition as the example; the TensorRT version used is 8.6.1.

The key code is as follows:

logger = trt.Logger(trt.Logger.VERBOSE)  # create the logger
builder = trt.Builder(logger)            # create the builder
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))  # create the network in explicit-batch mode (required for dynamic batch sizes)
profile = builder.create_optimization_profile()  # create the optimization profile
config = builder.create_builder_config()         # create the builder config
if bUseFP16Mode:                                 # enable FP16
    config.set_flag(trt.BuilderFlag.FP16)
if bUseINT8Mode:                                 # enable INT8 (requires a calibrator)
    config.set_flag(trt.BuilderFlag.INT8)
    config.int8_calibrator = calibrator.MyCalibrator(calibrationDataPath, nCalibration, (1, 1, nHeight, nWidth), cacheFile)

parser = trt.OnnxParser(network, logger) # create the parser
if not os.path.exists(onnxFile):
    print("Failed finding ONNX file!")
    exit()
print("Succeeded finding ONNX file!")
with open(onnxFile, "rb") as model:      # parse the ONNX model file
    if not parser.parse(model.read()):
        print("Failed parsing .onnx file!")
        for error in range(parser.num_errors):
            print(parser.get_error(error))
        exit()
    print("Succeeded parsing .onnx file!")

The complete code is as follows:


import os
from datetime import datetime as dt
from glob import glob

import calibrator
import cv2
import numpy as np
import tensorrt as trt
import torch as t
import torch.nn.functional as F
from cuda import cudart
from torch.autograd import Variable

np.random.seed(31193)
t.manual_seed(97)
t.cuda.manual_seed_all(97)
t.backends.cudnn.deterministic = True
nTrainBatchSize = 128
nHeight = 28
nWidth = 28
onnxFile = "./model.onnx"
trtFile = "./model.plan"
dataPath = os.path.dirname(os.path.realpath(__file__)) + "/../../00-MNISTData/"
trainFileList = sorted(glob(dataPath + "train/*.jpg"))
testFileList = sorted(glob(dataPath + "test/*.jpg"))
inferenceImage = dataPath + "8.png"

# for FP16 mode
bUseFP16Mode = False
# for INT8 model
bUseINT8Mode = False
nCalibration = 1
cacheFile = "./int8.cache"
calibrationDataPath = dataPath + "test/"

os.system("rm -rf ./*.onnx ./*.plan ./*.cache")
np.set_printoptions(precision=3, linewidth=200, suppress=True)
cudart.cudaDeviceSynchronize()

# Create network and train model in pyTorch ------------------------------------
class Net(t.nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = t.nn.Conv2d(1, 32, (5, 5), padding=(2, 2), bias=True)
        self.conv2 = t.nn.Conv2d(32, 64, (5, 5), padding=(2, 2), bias=True)
        self.fc1 = t.nn.Linear(64 * 7 * 7, 1024, bias=True)
        self.fc2 = t.nn.Linear(1024, 10, bias=True)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2))
        x = x.reshape(-1, 64 * 7 * 7)
        x = F.relu(self.fc1(x))
        y = self.fc2(x)
        z = F.softmax(y, dim=1)
        z = t.argmax(z, dim=1)
        return y, z

class MyData(t.utils.data.Dataset):

    def __init__(self, isTrain=True):
        if isTrain:
            self.data = trainFileList
        else:
            self.data = testFileList

    def __getitem__(self, index):
        imageName = self.data[index]
        data = cv2.imread(imageName, cv2.IMREAD_GRAYSCALE)
        label = np.zeros(10, dtype=np.float32)
        index = int(imageName[-7])
        label[index] = 1
        return t.from_numpy(data.reshape(1, nHeight, nWidth).astype(np.float32)), t.from_numpy(label)

    def __len__(self):
        return len(self.data)

model = Net().cuda()
ceLoss = t.nn.CrossEntropyLoss()
opt = t.optim.Adam(model.parameters(), lr=0.001)
trainDataset = MyData(True)
testDataset = MyData(False)
trainLoader = t.utils.data.DataLoader(dataset=trainDataset, batch_size=nTrainBatchSize, shuffle=True)
testLoader = t.utils.data.DataLoader(dataset=testDataset, batch_size=nTrainBatchSize, shuffle=True)

for epoch in range(10):
    for xTrain, yTrain in trainLoader:
        xTrain = Variable(xTrain).cuda()
        yTrain = Variable(yTrain).cuda()
        opt.zero_grad()
        y_, z = model(xTrain)
        loss = ceLoss(y_, yTrain)
        loss.backward()
        opt.step()

    with t.no_grad():
        acc = 0
        n = 0
        for xTest, yTest in testLoader:
            xTest = Variable(xTest).cuda()
            yTest = Variable(yTest).cuda()
            y_, z = model(xTest)
            acc += t.sum(z == t.matmul(yTest, t.Tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]).to("cuda:0"))).cpu().numpy()
            n += xTest.shape[0]
        print("%s, epoch %2d, loss = %f, test acc = %f" % (dt.now(), epoch + 1, loss.data, acc / n))

print("Succeeded building model in pyTorch!")

# Export model as ONNX file ----------------------------------------------------
t.onnx.export(model, t.randn(1, 1, nHeight, nWidth, device="cuda"), onnxFile, input_names=["x"], output_names=["y", "z"], do_constant_folding=True, verbose=True, keep_initializers_as_inputs=True, opset_version=12, dynamic_axes={"x": {0: "nBatchSize"}, "z": {0: "nBatchSize"}})
'''
t.randn(1, 1, nHeight, nWidth, device="cuda"):
Generates a random example input tensor of shape (1, 1, nHeight, nWidth). torch.randn draws samples from a
normal distribution, and the tensor is placed on the CUDA device.
dynamic_axes={"x": {0: "nBatchSize"}, "z": {0: "nBatchSize"}}: When the model has dynamic shapes, the names
and indices of the dynamic axes can be specified. This parameter is a dict whose keys are input/output tensor
names and whose values are dicts mapping an axis index to an axis name. Here, axis 0 of the input tensor "x"
is made dynamic and named "nBatchSize", and likewise axis 0 of the output tensor "z" is made dynamic and
named "nBatchSize".
'''
print("Succeeded converting model into ONNX!")

# Parse network, rebuild network and do inference in TensorRT ------------------
logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
profile = builder.create_optimization_profile()
config = builder.create_builder_config()
if bUseFP16Mode:
    config.set_flag(trt.BuilderFlag.FP16)
if bUseINT8Mode:
    config.set_flag(trt.BuilderFlag.INT8)
    config.int8_calibrator = calibrator.MyCalibrator(calibrationDataPath, nCalibration, (1, 1, nHeight, nWidth), cacheFile)

parser = trt.OnnxParser(network, logger)
if not os.path.exists(onnxFile):
    print("Failed finding ONNX file!")
    exit()
print("Succeeded finding ONNX file!")
with open(onnxFile, "rb") as model:
    if not parser.parse(model.read()):
        print("Failed parsing .onnx file!")
        for error in range(parser.num_errors):
            print(parser.get_error(error))
        exit()
    print("Succeeded parsing .onnx file!")

inputTensor = network.get_input(0)
profile.set_shape(inputTensor.name, [1, 1, nHeight, nWidth], [4, 1, nHeight, nWidth], [8, 1, nHeight, nWidth])
config.add_optimization_profile(profile)

network.unmark_output(network.get_output(0))  # remove output tensor "y"
engineString = builder.build_serialized_network(network, config)
if engineString is None:
    print("Failed building engine!")
    exit()
print("Succeeded building engine!")
with open(trtFile, "wb") as f:
    f.write(engineString)
engine = trt.Runtime(logger).deserialize_cuda_engine(engineString)
nIO = engine.num_io_tensors
lTensorName = [engine.get_tensor_name(i) for i in range(nIO)]
nInput = [engine.get_tensor_mode(lTensorName[i]) for i in range(nIO)].count(trt.TensorIOMode.INPUT)

context = engine.create_execution_context()
context.set_input_shape(lTensorName[0], [1, 1, nHeight, nWidth])
for i in range(nIO):
    print("[%2d]%s->" % (i, "Input " if i < nInput else "Output"), engine.get_tensor_dtype(lTensorName[i]), engine.get_tensor_shape(lTensorName[i]), context.get_tensor_shape(lTensorName[i]), lTensorName[i])

bufferH = []
data = cv2.imread(inferenceImage, cv2.IMREAD_GRAYSCALE).astype(np.float32).reshape(1, 1, nHeight, nWidth)
bufferH.append(np.ascontiguousarray(data))
for i in range(nInput, nIO):
    bufferH.append(np.empty(context.get_tensor_shape(lTensorName[i]), dtype=trt.nptype(engine.get_tensor_dtype(lTensorName[i]))))
bufferD = []
for i in range(nIO):
    bufferD.append(cudart.cudaMalloc(bufferH[i].nbytes)[1])

for i in range(nInput):
    cudart.cudaMemcpy(bufferD[i], bufferH[i].ctypes.data, bufferH[i].nbytes, cudart.cudaMemcpyKind.cudaMemcpyHostToDevice)

for i in range(nIO):
    context.set_tensor_address(lTensorName[i], int(bufferD[i]))

context.execute_async_v3(0)

for i in range(nInput, nIO):
    cudart.cudaMemcpy(bufferH[i].ctypes.data, bufferD[i], bufferH[i].nbytes, cudart.cudaMemcpyKind.cudaMemcpyDeviceToHost)

for i in range(nIO):
    print(lTensorName[i])
    print(bufferH[i])

for b in bufferD:
    cudart.cudaFree(b)

print("Succeeded running model in TensorRT!")

III. onnx-graphsurgeon

1. Introduction to onnx-graphsurgeon

onnx-graphsurgeon is a TensorRT development aid from NVIDIA for editing and optimizing ONNX models. It provides a simple yet powerful way to modify and transform the graph representation of an ONNX model so it is better suited to TensorRT's optimization and inference.
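onnx-graphsurgeon is distributed as a Python package; in most environments it can be installed with pip (the exact channel may vary with your TensorRT installation):

pip install onnx-graphsurgeon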

2. Using onnx-graphsurgeon

What onnx-graphsurgeon can do:

1. Modify the computation graph: graph attributes / nodes / tensors / connections between nodes and tensors / weights
2. Modify subgraphs: add / delete / replace / isolate
3. Optimize the computation graph: constant folding / topological sorting / removal of useless layers (see the sketch below)
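As a small taste of the third point, a typical optimization pass chains these operations (a sketch; the file names are placeholders):

import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("model.onnx"))     # placeholder input file
graph.fold_constants().cleanup().toposort()         # constant folding + useless-layer removal + topological sort
onnx.save(gs.export_onnx(graph), "model-opt.onnx")  # placeholder output file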

The code below introduces onnx-graphsurgeon's commonly used APIs, covering operations at both the Graph and the Node level.

The markGraphOutput function demonstrates the APIs used to mark graph outputs;
the addNode function demonstrates the APIs used to add nodes.


from collections import OrderedDict
from copy import deepcopy

import numpy as np
import onnx
import onnx_graphsurgeon as gs


def markGraphOutput(graph, lNode, bMarkOutput=True, bMarkInput=False, lMarkOutput=None, lMarkInput=None, bRemoveOldOutput=True):
    # graph:            The ONNX graph for edition
    # lNode:            The list of nodes we want to mark as output
    # bMarkOutput:      Whether to mark the output tensor(s) of the nodes in the lNode
    # bMarkInput:       Whether to mark the input tensor(s) of the nodes in the lNode
    # lMarkOutput:      The index of output tensor(s) of the node are marked as output, only available when len(lNode) == 1
    # lMarkInput:       The index of input tensor(s) of the node are marked as output, only available when len(lNode) == 1
    # bRemoveOldOutput: Whether to remove the original output of the network (cutting the graph to the node we want to mark to save time of building)

    # In most cases, using the first 4 parameters is enough, for example:
    #markGraphOutput(graph, ["/Conv"])                          # mark output tensor of the node "/Conv" as output
    #markGraphOutput(graph, ["/Conv"], False, True)             # mark input tensors of the node "/Conv" (input tensor + weight + bias) as output
    #markGraphOutput(graph, ["/TopK"], lMarkOutput=[1])         # mark the second output tensor of the node "/TopK" as output
    #markGraphOutput(graph, ["/Conv"], bRemoveOldOutput=False)  # mark output tensor of the node "/Conv" as output, and keep the original output of the network
    

    if bRemoveOldOutput:
        graph.outputs = []
    for node in graph.nodes:
        if node.name in lNode:
            if bMarkOutput:
                if lMarkOutput is None or len(lNode) > 1:
                    lMarkOutput = range(len(node.outputs))
                for index in lMarkOutput:
                    graph.outputs.append(node.outputs[index])
                    print("Mark node [%s] output tensor [%s]" % (node.name, node.outputs[index].name))
            if bMarkInput:
                if lMarkInput is None or len(lNode) > 1:
                    lMarkInput = range(len(node.inputs))
                for index in lMarkInput:
                    graph.outputs.append(node.inputs[index])
                    print("Mark node [%s] input  tensor [%s]" % (node.name, node.inputs[index].name))

    graph.cleanup().toposort()
    return len(lNode)

def addNode(graph, nodeType, prefix, number, inputList, attribution=None, suffix="", dtype=None, shape=None):
    # ONLY for the node with one output tensor!!

    # graph:        The ONNX graph for edition
    # nodeType:     The type of the node to add, for example, "Concat"
    # prefix:       Optimization type, for example "RemoveLoop"
    # number:       An incremental number to prevent duplicate names
    # inputlist:    The list of input tensors for the node
    # attribution:  The attribution dictionary of the node, for example, OrderedDict([('axis',0)])
    # suffix:       Extra name for marking the tensor, for example "bTensor"
    # dtype:        The data type of the output tensor (optional)
    # shape:        The shape of the output tensor (optional)


    tensorName = prefix + "-V-" + str(number) + "-" + nodeType
    nodeName = prefix + "-N-" + str(number) + "-" + nodeType
    if attribution == None:
        attribution = OrderedDict()
    if len(suffix) > 0:
        tensorName += "-" + suffix

    tensor = gs.Variable(tensorName, dtype, shape)
    node = gs.Node(nodeType, nodeName, inputs=inputList, outputs=[tensor], attrs=attribution)
    graph.nodes.append(node)
    return tensor, number + 1
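A hedged usage sketch of the two helpers above (the file names, the node name "/Conv", and the tensor names "tensorA" / "tensorB" are hypothetical):

graph = gs.import_onnx(onnx.load("model.onnx"))  # hypothetical model

# Example 1: mark the output tensor of node "/Conv" as a graph output, e.g. to inspect intermediate results
markGraphOutput(graph, ["/Conv"])

# Example 2: append a Concat node that joins two existing tensors along axis 0
tensorMap = graph.tensors()
outTensor, n = addNode(graph, "Concat", "Demo", 0, [tensorMap["tensorA"], tensorMap["tensorB"]], OrderedDict([("axis", 0)]), dtype=np.float32)
graph.outputs.append(outTensor)  # keep the new node alive through cleanup

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "model-edited.onnx")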


1. Creating a model

Create an ONNX graph with ONNX GraphSurgeon:

from collections import OrderedDict

import numpy as np
import onnx
import onnx_graphsurgeon as gs
# define the variable tensors used as node inputs
tensor0 = gs.Variable("tensor0", np.float32, ["B", 3, 64, 64])  # define tensor (variable in ONNX)
tensor1 = gs.Variable("tensor1", np.float32, None)  # type or shape of the intermediate tensors can be None
tensor2 = gs.Variable("tensor2", np.float32, None)
tensor3 = gs.Variable("tensor3", np.float32, None)
# define the constant tensors used as node inputs
constant0 = gs.Constant(name="constant0", values=np.ones(shape=[1, 3, 3, 3], dtype=np.float32))  # define constant tensor
constant1 = gs.Constant(name="constant1", values=np.ones(shape=[1], dtype=np.float32))
# define the nodes
node0 = gs.Node("Conv", "myConv", inputs=[tensor0, constant0], outputs=[tensor1])  # define node
node0.attrs = OrderedDict([["dilations", [1, 1]], ["kernel_shape", [3, 3]], ["pads", [1, 1, 1, 1]], ["strides", [1, 1]]])  # attributes of the node
node1 = gs.Node("Add", "myAdd", inputs=[tensor1, constant1], outputs=[tensor2])
node2 = gs.Node("Relu", "myRelu", inputs=[tensor2], outputs=[tensor3])
# define the graph
graph = gs.Graph(nodes=[node0, node1, node2], inputs=[tensor0], outputs=[tensor3])  # define graph
# toposort: topological sort of the directed graph
graph.cleanup().toposort()  # clean the graph before saving as ONNX file
onnx.save(gs.export_onnx(graph), "model-01.onnx")
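Optionally, the saved file can be validated with onnx's own checker before visualizing it:

import onnx

model = onnx.load("model-01.onnx")
onnx.checker.check_model(model)                  # raises an exception if the graph is malformed
print(onnx.helper.printable_graph(model.graph))  # human-readable dump of the graph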

(Figure: the generated ONNX file, model-01.onnx.)

2. Adding a node

from collections import OrderedDict

import numpy as np
import onnx
import onnx_graphsurgeon as gs

tensor0 = gs.Variable("tensor0", np.float32, ["B", 3, 64, 64])
tensor1 = gs.Variable("tensor1", np.float32, None)
tensor2 = gs.Variable("tensor2", np.float32, None)
# gs.Node(op type, node name, inputs, outputs)
node0 = gs.Node("Identity", "myIdentity0", inputs=[tensor0], outputs=[tensor1])
node1 = gs.Node("Identity", "myIdentity1", inputs=[tensor1], outputs=[tensor2])

graph = gs.Graph(nodes=[node0, node1], inputs=[tensor0], outputs=[tensor2])
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "model-02-01.onnx")

del graph

graph = gs.import_onnx(onnx.load("model-02-01.onnx"))  # load the graph from ONNX file
for node in graph.nodes:
    if node.op == "Identity" and node.name == "myIdentity0":  # find the place we want to add ndoe
        constant0 = gs.Constant(name="constant0", values=np.ones(shape=[1, 1, 1, 1], dtype=np.float32))  # construct the new variable and node
        tensor3 = gs.Variable("tensor3", np.float32, None)
        newNode = gs.Node("Add", "myAdd", inputs=[node.outputs[0], constant0], outputs=[tensor3])

        graph.nodes.append(newNode)  # REMEMBER to add the new node into the graph
        index = node.o().inputs.index(node.outputs[0])  # find the next node
        node.o().inputs[index] = tensor3  # replace the input tensor of next node as the new tensor
# Note: node.o() returns the node's downstream (consumer) node, while node.outputs[] holds the node's output tensors
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "model-02-02.onnx")


3. Deleting a node


from collections import OrderedDict

import numpy as np
import onnx
import onnx_graphsurgeon as gs

tensor0 = gs.Variable("tensor0", np.float32, ["B", 3, 64, 64])
tensor1 = gs.Variable("tensor1", np.float32, None)
tensor2 = gs.Variable("tensor2", np.float32, None)
tensor3 = gs.Variable("tensor3", np.float32, None)
tensor4 = gs.Variable("tensor4", np.float32, None)

node0 = gs.Node("Identity", "Node0", inputs=[tensor0], outputs=[tensor1])
node1 = gs.Node("TrashNode", "Node1", inputs=[tensor1], outputs=[tensor2])
node2 = gs.Node("Identity", "Node2", inputs=[tensor2], outputs=[tensor3])
node3 = gs.Node("Identity", "Node3", inputs=[tensor2], outputs=[tensor4])

graph = gs.Graph(nodes=[node0, node1, node2, node3], inputs=[tensor0], outputs=[tensor3, tensor4])
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "model-03-01.onnx")

del graph

graph = gs.import_onnx(onnx.load("model-03-01.onnx"))
for node in graph.nodes:
    if node.op == "TrashNode" and node.name == "Node1":
        inputTensor = node.inputs[0]
        outputTensor = node.outputs[0]
        for subNode in graph.nodes:  # search all nodes, in case the output tensor is used by multiple nodes
            if outputTensor in subNode.inputs:
                index = subNode.inputs.index(outputTensor)
                subNode.inputs[index] = inputTensor

graph.cleanup().toposort()  # the TrashNode node is removed during graph cleanup
onnx.save(gs.export_onnx(graph), "model-03-02.onnx")


4. Replacing a node


from collections import OrderedDict

import numpy as np
import onnx
import onnx_graphsurgeon as gs

tensor0 = gs.Variable("tensor0", np.float32, ["B", 3, 64, 64])
tensor1 = gs.Variable("tensor1", np.float32, None)
tensor2 = gs.Variable("tensor2", np.float32, None)
tensor3 = gs.Variable("tensor3", np.float32, None)
constant0 = gs.Constant(name="constant0", values=np.ones(shape=[1, 1, 1, 1], dtype=np.float32))

node0 = gs.Node("Identity", "myIdentity0", inputs=[tensor0], outputs=[tensor1])
node1 = gs.Node("Add", "myAdd", inputs=[tensor1, constant0], outputs=[tensor2])
node2 = gs.Node("Identity", "myIdentity1", inputs=[tensor2], outputs=[tensor3])

graph = gs.Graph(nodes=[node0, node1, node2], inputs=[tensor0], outputs=[tensor3])
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "model-04-01.onnx")

del graph

# replace the node by editing its operator type
graph = gs.import_onnx(onnx.load("model-04-01.onnx"))  # load the graph from ONNX file
for node in graph.nodes:
    if node.op == "Add" and node.name == "myAdd":
        node.op = "Sub"
        node.name = "mySub"  # it's OK to change the name of the node or not

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "model-04-02.onnx")

del graph

# replace the node by inserting a new node
graph = gs.import_onnx(onnx.load("model-04-01.onnx"))  # load the graph from ONNX file
for node in graph.nodes:
    if node.op == "Add" and node.name == "myAdd":
        newNode = gs.Node("Sub", "mySub", inputs=node.inputs, outputs=node.outputs)
        graph.nodes.append(newNode)
        node.outputs = []  # Without this line, the resulting graph would be wrong: Add and Sub would both output tensor2. Clearing Add's outputs leaves the Add node dangling, so gs removes it automatically during cleanup.

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "model-04-03.onnx")


5. Printing graph information

This code can be used as a utility script; run it as:

python3 05-PrintGraphInformation.py > result-05.log

from collections import OrderedDict

import numpy as np
import onnx
import onnx_graphsurgeon as gs

onnxFile = "./model-05.onnx"
nMaxAdjustNode = 256

# Create a ONNX graph with Onnx Graphsurgeon -----------------------------------
tensor0 = gs.Variable("tensor-0", np.float32, ["B", 1, 28, 28])

constant32x1 = gs.Constant("constant32x1", np.ascontiguousarray(np.random.rand(32, 1, 5, 5).reshape(32, 1, 5, 5).astype(np.float32) * 2 - 1))
constant32 = gs.Constant("constant32", np.ascontiguousarray(np.random.rand(32).reshape(32).astype(np.float32) * 2 - 1))
constant64x32 = gs.Constant("constant64x32", np.ascontiguousarray(np.random.rand(64, 32, 5, 5).reshape(64, 32, 5, 5).astype(np.float32) * 2 - 1))
constant64 = gs.Constant("constant64", np.ascontiguousarray(np.random.rand(64).reshape(64).astype(np.float32) * 2 - 1))
constantM1Comma3136 = gs.Constant("constantM1Comma3136", np.ascontiguousarray(np.array([-1, 7 * 7 * 64], dtype=np.int64)))
constant3136x1024 = gs.Constant("constant3136x1024", np.ascontiguousarray(np.random.rand(3136, 1024).reshape(3136, 1024).astype(np.float32) * 2 - 1))
constant1024 = gs.Constant("constant1024", np.ascontiguousarray(np.random.rand(1024).reshape(1024).astype(np.float32) * 2 - 1))
constant1024x10 = gs.Constant("constant1024x10", np.ascontiguousarray(np.random.rand(1024, 10).reshape(1024, 10).astype(np.float32) * 2 - 1))
constant10 = gs.Constant("constant10", np.ascontiguousarray(np.random.rand(10).reshape(10).astype(np.float32) * 2 - 1))

graphNodeList = []

tensor1 = gs.Variable("tensor-1", np.float32, None)
node1 = gs.Node("Conv", "Conv-1", inputs=[tensor0, constant32x1, constant32], outputs=[tensor1])
node1.attrs = OrderedDict([["kernel_shape", [5, 5]], ["pads", [2, 2, 2, 2]]])
graphNodeList.append(node1)

tensor2 = gs.Variable("tensor-2", np.float32, None)
node2 = gs.Node("Relu", "ReLU-2", inputs=[tensor1], outputs=[tensor2])
graphNodeList.append(node2)

tensor3 = gs.Variable("tensor-3", np.float32, None)
node3 = gs.Node("MaxPool", "MaxPool-3", inputs=[tensor2], outputs=[tensor3])
node3.attrs = OrderedDict([["kernel_shape", [2, 2]], ["pads", [0, 0, 0, 0]], ["strides", [2, 2]]])
graphNodeList.append(node3)

tensor4 = gs.Variable("tensor-4", np.float32, None)
node1 = gs.Node("Conv", "Conv-4", inputs=[tensor3, constant64x32, constant64], outputs=[tensor4])
node1.attrs = OrderedDict([["kernel_shape", [5, 5]], ["pads", [2, 2, 2, 2]]])
graphNodeList.append(node1)

tensor5 = gs.Variable("tensor-5", np.float32, None)
node5 = gs.Node("Relu", "ReLU-5", inputs=[tensor4], outputs=[tensor5])
graphNodeList.append(node5)

tensor6 = gs.Variable("tensor-6", np.float32, None)
node6 = gs.Node("MaxPool", "MaxPool-6", inputs=[tensor5], outputs=[tensor6])
node6.attrs = OrderedDict([["kernel_shape", [2, 2]], ["pads", [0, 0, 0, 0]], ["strides", [2, 2]]])
graphNodeList.append(node6)

tensor7 = gs.Variable("tensor-7", np.float32, None)
node7 = gs.Node("Transpose", "Transpose-7", inputs=[tensor6], outputs=[tensor7], attrs=OrderedDict([("perm", [0, 2, 3, 1])]))
graphNodeList.append(node7)

tensor8 = gs.Variable("tensor-8", np.float32, None)
node8 = gs.Node("Reshape", "Reshape-7", inputs=[tensor7, constantM1Comma3136], outputs=[tensor8])
graphNodeList.append(node8)

tensor9 = gs.Variable("tensor-9", np.float32, None)
node9 = gs.Node("MatMul", "MatMul-9", inputs=[tensor8, constant3136x1024], outputs=[tensor9])
graphNodeList.append(node9)

tensor10 = gs.Variable("tensor-10", np.float32, None)
node10 = gs.Node("Add", "Add-10", inputs=[tensor9, constant1024], outputs=[tensor10])
graphNodeList.append(node10)

tensor11 = gs.Variable("tensor-11", np.float32, None)
node11 = gs.Node("Relu", "ReLU-11", inputs=[tensor10], outputs=[tensor11])
graphNodeList.append(node11)

tensor12 = gs.Variable("tensor-12", np.float32, None)
node12 = gs.Node("MatMul", "MatMul-12", inputs=[tensor11, constant1024x10], outputs=[tensor12])
graphNodeList.append(node12)

tensor13 = gs.Variable("tensor-13", np.float32, None)
node13 = gs.Node("Add", "Add-13", inputs=[tensor12, constant10], outputs=[tensor13])
graphNodeList.append(node13)

tensor14 = gs.Variable("tensor-14", np.float32, None)
node14 = gs.Node("Softmax", "Softmax-14", inputs=[tensor13], outputs=[tensor14], attrs=OrderedDict([("axis", 1)]))
graphNodeList.append(node14)

tensor15 = gs.Variable("tensor-15", np.int32, None)
node15 = gs.Node("ArgMax", "ArgMax-15", inputs=[tensor14], outputs=[tensor15], attrs=OrderedDict([("axis", 1), ("keepdims", 0)]))
graphNodeList.append(node15)

graph = gs.Graph(nodes=graphNodeList, inputs=[tensor0], outputs=[tensor15])

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), onnxFile)

print("# Traverse the node: ----------------------------------------------------")  # traverser by nodes and print information of node, input / output tensor, name of father / son node
for index, node in enumerate(graph.nodes):
    print("Node%4d: op=%s, name=%s, attrs=%s" % (index, node.op, node.name, "".join(["{"] + [str(key) + ":" + str(value) + ", " for key, value in node.attrs.items()] + ["}"])))
    for jndex, inputTensor in enumerate(node.inputs):
        print("\tInTensor  %d: %s" % (jndex, inputTensor))
    for jndex, outputTensor in enumerate(node.outputs):
        print("\tOutTensor %d: %s" % (jndex, outputTensor))

    fatherNodeList = []
    for i in range(nMaxAdjustNode):
        try:
            newNode = node.i(i)
            fatherNodeList.append(newNode)
        except:
            break
    for jndex, newNode in enumerate(fatherNodeList):
        print("\tFatherNode%d: %s" % (jndex, newNode.name))

    sonNodeList = []
    for i in range(nMaxAdjustNode):
        try:
            newNode = node.o(i)
            sonNodeList.append(newNode)
        except:
            break
    for jndex, newNode in enumerate(sonNodeList):
        print("\tSonNode   %d: %s" % (jndex, newNode.name))

print("# Traverse the tensor: --------------------------------------------------")  # traverser by tensors and print information of tensor, name of producer / consumer node, name of father / son tensor
for index, (name, tensor) in enumerate(graph.tensors().items()):
    print("Tensor%4d: name=%s, desc=%s" % (index, name, tensor))
    for jndex, inputNode in enumerate(tensor.inputs):
        print("\tInNode      %d: %s" % (jndex, inputNode.name))
    for jndex, outputNode in enumerate(tensor.outputs):
        print("\tOutNode     %d: %s" % (jndex, outputNode.name))

    fatherTensorList = []
    for i in range(nMaxAdjustNode):
        try:
            newTensor = tensor.i(i)
            fatherTensorList.append(newTensor)
        except:
            break
    for jndex, newTensor in enumerate(fatherTensorList):
        print("\tFatherTensor%d: %s" % (jndex, newTensor))

    sonTensorList = []
    for i in range(nMaxAdjustNode):
        try:
            newTensor = tensor.o(i)
            sonTensorList.append(newTensor)
        except:
            break
    for jndex, newTensor in enumerate(sonTensorList):
        print("\tSonTensor   %d: %s" % (jndex, newTensor))

The resulting log (excerpt):

# Traverse the node: ----------------------------------------------------
Node   0: op=Conv, name=Conv-1, attrs={kernel_shape:[5, 5], pads:[2, 2, 2, 2], }
	InTensor  0: Variable (tensor-0): (shape=['B', 1, 28, 28], dtype=<class 'numpy.float32'>)
	InTensor  1: Constant (constant32x1): (shape=(32, 1, 5, 5), dtype=float32)
	InTensor  2: Constant (constant32): (shape=(32,), dtype=float32)
	OutTensor 0: Variable (tensor-1): (shape=None, dtype=<class 'numpy.float32'>)
	SonNode   0: ReLU-2
Node   1: op=Relu, name=ReLU-2, attrs={}
	InTensor  0: Variable (tensor-1): (shape=None, dtype=<class 'numpy.float32'>)
	OutTensor 0: Variable (tensor-2): (shape=None, dtype=<class 'numpy.float32'>)
	FatherNode0: Conv-1
	SonNode   0: MaxPool-3
Node   2: op=MaxPool, name=MaxPool-3, attrs={kernel_shape:[2, 2], pads:[0, 0, 0, 0], strides:[2, 2], }
	InTensor  0: Variable (tensor-2): (shape=None, dtype=<class 'numpy.float32'>)
	OutTensor 0: Variable (tensor-3): (shape=None, dtype=<class 'numpy.float32'>)
	FatherNode0: ReLU-2
	SonNode   0: Conv-4
Node   3: op=Conv, name=Conv-4, attrs={kernel_shape:[5, 5], pads:[2, 2, 2, 2], }
	InTensor  0: Variable (tensor-3): (shape=None, dtype=<class 'numpy.float32'>)
	InTensor  1: Constant (constant64x32): (shape=(64, 32, 5, 5), dtype=float32)
	InTensor  2: Constant (constant64): (shape=(64,), dtype=float32)
	OutTensor 0: Variable (tensor-4): (shape=None, dtype=<class 'numpy.float32'>)
	FatherNode0: MaxPool-3
	SonNode   0: ReLU-5
Node   4: op=Relu, name=ReLU-5, attrs={}
	InTensor  0: Variable (tensor-4): (shape=None, dtype=<class 'numpy.float32'>)
	OutTensor 0: Variable (tensor-5): (shape=None, dtype=<class 'numpy.float32'>)
	FatherNode0: Conv-4
	SonNode   0: MaxPool-6
Node   5: op=MaxPool, name=MaxPool-6, attrs={kernel_shape:[2, 2], pads:[0, 0, 0, 0], strides:[2, 2], }
	InTensor  0: Variable (tensor-5): (shape=None, dtype=<class 'numpy.float32'>)
	OutTensor 0: Variable (tensor-6): (shape=None, dtype=<class 'numpy.float32'>)
	FatherNode0: ReLU-5
	SonNode   0: Transpose-7
# Traverse the tensor: --------------------------------------------------
Tensor   0: name=tensor-0, desc=Variable (tensor-0): (shape=['B', 1, 28, 28], dtype=<class 'numpy.float32'>)
	OutNode     0: Conv-1
	SonTensor   0: Variable (tensor-1): (shape=None, dtype=<class 'numpy.float32'>)
Tensor   1: name=constant32x1, desc=Constant (constant32x1): (shape=(32, 1, 5, 5), dtype=float32)
	OutNode     0: Conv-1
	SonTensor   0: Variable (tensor-1): (shape=None, dtype=<class 'numpy.float32'>)
Tensor   2: name=constant32, desc=Constant (constant32): (shape=(32,), dtype=float32)
	OutNode     0: Conv-1
	SonTensor   0: Variable (tensor-1): (shape=None, dtype=<class 'numpy.float32'>)
Tensor   3: name=tensor-1, desc=Variable (tensor-1): (shape=None, dtype=<class 'numpy.float32'>)
	InNode      0: Conv-1
	OutNode     0: ReLU-2
	FatherTensor0: Variable (tensor-0): (shape=['B', 1, 28, 28], dtype=<class 'numpy.float32'>)
	FatherTensor1: Constant (constant32x1): (shape=(32, 1, 5, 5), dtype=float32)
	FatherTensor2: Constant (constant32): (shape=(32,), dtype=float32)
	SonTensor   0: Variable (tensor-2): (shape=None, dtype=<class 'numpy.float32'>)



6. Constant folding

Constant folding, together with graph cleanup and topological sorting.


import numpy as np
import onnx
import onnx_graphsurgeon as gs

tensor0 = gs.Variable("tensor0", np.float32, ["B", 3, 64, 64])  # 3 necessary tensors
tensor1 = gs.Variable("tensor1", np.float32, ["B", 3, 64, 64])
tensor2 = gs.Variable("tensor2", np.float32, ["B", 3, 64, 64])
tensor3 = gs.Variable("tensor3", np.float32, ["B", 3, 64, 64])  # 1 fake input tensor
tensor4 = gs.Variable("tensor4", np.float32, ["B", 1, 64, 64])  # 1 fake output tensor
tensor5 = gs.Variable("tensor5", np.float32, ["B", 1, 64, 64])  # 2 useless tensors
tensor6 = gs.Variable("tensor6", np.float32, ["B", 1, 64, 64])
tensor7 = gs.Variable("tensor7", np.float32, None)  # 2 intermediate tensors
tensor8 = gs.Variable("tensor8", np.float32, None)
constant0 = gs.Constant(name="w", values=np.ones(shape=[1, 1, 1, 1], dtype=np.float32))

node0 = gs.Node("Add", "myAdd0", inputs=[constant0, constant0], outputs=[tensor7])
node1 = gs.Node("Add", "myAdd1", inputs=[tensor7, constant0], outputs=[tensor8])
node2 = gs.Node("Add", "myAdd2", inputs=[tensor0, tensor8], outputs=[tensor1])  # necessary node
node3 = gs.Node("Add", "myAdd3", inputs=[tensor1, constant0], outputs=[tensor2])  # necessary node
node4 = gs.Node("Add", "myAdd4", inputs=[tensor5, constant0], outputs=[tensor6])  # useless node

graph = gs.Graph(nodes=[node4, node3, node2, node1, node0], inputs=[tensor0, tensor3], outputs=[tensor2, tensor4])  # reverse the order of the node on purpose

onnx.save(gs.export_onnx(graph), "model-06-01.onnx")
# original graph, containing 4 tensors without a node, 1 node without edges, and 1 chain subgraph that is a constant expression

onnx.save(gs.export_onnx(graph.fold_constants()), "model-06-02.onnx")
# graph after constant folding, the subgraph with the constant expression is fused, but two more Add nodes are left without edges
# notice that constant folding will not remove any nodes

onnx.save(gs.export_onnx(graph.fold_constants().cleanup()), "model-06-03.onnx")
# graph after cleanup, the 3 Add nodes without edges are removed

print("Before toposort:")  # The order of the original graph
for index, node in enumerate(graph.nodes):
    print("No.%d->%s" % (index, node.name))

print("After toposort:")  # The order of the last graph
graph.toposort()
for index, node in enumerate(graph.nodes):
    print("No.%d->%s" % (index, node.name))
'''
Expected output:
Before toposort:
No.0->myAdd3
No.1->myAdd2
After toposort:
No.0->myAdd2
No.1->myAdd3
'''
graph.inputs = [tensor0]  # remove redundant input / output manually
graph.outputs = [tensor2]
onnx.save(gs.export_onnx(graph), "model-06-04.onnx")
# Notice
# + In TensorRT<8.0, redundant input / output tensors in the network (maybe from the ONNX file) will be removed, so the count of input / output tensors of a TensorRT engine may differ from that in the ONNX file
# + In TensorRT>=8.0, redundant input / output tensors in network will be retained as useless placeholders in the network


7. Shape operations and simplification

Shape-related operations:

from collections import OrderedDict

import numpy as np
import onnx
import onnx_graphsurgeon as gs

tensor0 = gs.Variable("tensor0", np.float32, ["A", 3, "B", 5])
tensor1 = gs.Variable("tensor1", np.int64, None)
tensor2 = gs.Variable("tensor2", np.int64, None)
tensor3 = gs.Variable("tensor3", np.float32, None)
tensor4 = gs.Variable("tensor4", np.int64, None)
tensor5 = gs.Variable("tensor5", np.int64, None)
tensor6 = gs.Variable("tensor6", np.int64, None)
tensor7 = gs.Variable("tensor7", np.int64, None)
tensor8 = gs.Variable("tensor8", np.float32, None)
constant0 = gs.Constant("constant0", values=np.array([0, 1], dtype=np.int32))
constant1 = gs.Constant("constant1", values=np.array([2, 3], dtype=np.int32))

node0 = gs.Node("Shape", "myShape", inputs=[tensor0], outputs=[tensor1])  # value=(A,3,B,5), shape=(4,)
node1 = gs.Node("ReduceProd", "myReduceProd0", inputs=[tensor1], attrs={"axes": [0], "keepdims": int(True)}, outputs=[tensor2])  # value=(A*3*B*5), shape=()
node2 = gs.Node("Reshape", "myReshape0", inputs=[tensor0, tensor2], outputs=[tensor3])  # shape=(A*3*B*5,)
node3 = gs.Node("Gather", "myGather0", inputs=[tensor1, constant0], outputs=[tensor4])  # value=(A,3), shape=(2,)
node4 = gs.Node("Gather", "myGather1", inputs=[tensor1, constant1], outputs=[tensor5])  # value=(B,5), shape=(2,)
node5 = gs.Node("ReduceProd", "myReduceProd1", inputs=[tensor5], attrs={"axes": [0], "keepdims": int(True)}, outputs=[tensor6])  # value=(B*5), shape=()
node6 = gs.Node("Concat", "myConcat", inputs=[tensor4, tensor6], attrs={"axis": 0}, outputs=[tensor7])  # value=(A,3,B*5), shape=()
node7 = gs.Node("Reshape", "myReshape1", inputs=[tensor0, tensor7], outputs=[tensor8])  # shape=(A,3,B*5,)

graph = gs.Graph(nodes=[node0, node1, node2, node3, node4, node5, node6, node7], inputs=[tensor0], outputs=[tensor3, tensor8])

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "model-07-01.onnx")  # using Dynamic Shape mode, there are many shape operators in the graph

graph.inputs[0].shape = [2, 3, 4, 5]  # shape operators can be simplified if the shape is static
graph.fold_constants().cleanup().toposort()
onnx.save(gs.export_onnx(graph), "model-07-02.onnx")


8. Isolating a subgraph

Isolate part of the graph as a subgraph.


from collections import OrderedDict

import numpy as np
import onnx
import onnx_graphsurgeon as gs

tensor0 = gs.Variable("tensor0", np.float32, ["B", 3, 64, 64])
tensor1 = gs.Variable("tensor1", np.float32, ["B", 3, 64, 64])
tensor2 = gs.Variable("tensor2", np.float32, ["B", 3, 64, 64])
tensor3 = gs.Variable("tensor3", np.float32, ["B", 3, 64, 64])
constant0 = gs.Constant(name="constant0", values=np.ones(shape=[1, 1, 1, 1], dtype=np.float32))

node0 = gs.Node("Identity", "myIdentity0", inputs=[tensor0], outputs=[tensor1])
node1 = gs.Node("Add", "myAdd", inputs=[tensor1, constant0], outputs=[tensor2])
node2 = gs.Node("Identity", "myIdentity1", inputs=[tensor2], outputs=[tensor3])

graph = gs.Graph(nodes=[node0, node1, node2], inputs=[tensor0], outputs=[tensor3])
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "model-08-01.onnx")

del graph

# mark some tensors of the original graph as input / output tensors in order to isolate a subgraph
graph = gs.import_onnx(onnx.load("model-08-01.onnx"))
for node in graph.nodes:
    if node.op == "Add" and node.name == "myAdd":
        graph.inputs = [node.inputs[0]]
        graph.outputs = [node.outputs[0]]

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "model-08-02.onnx")


9. BuildModelWithAPI

Use gs.Graph.register() to register functions as node builders, then create the graph.


import numpy as np
import onnx
import onnx_graphsurgeon as gs


# Create nodes
# use onnx_graphsurgeon.Graph.register() to register a function as a node
@gs.Graph.register()
def add(self, a, b):
    return self.layer(op="Add", inputs=[a, b], outputs=["myAdd"])

@gs.Graph.register()
def mul(self, a, b):
    return self.layer(op="Mul", inputs=[a, b], outputs=["myMul"])
# General Matrix-Matrix Multiplication (GEMM)
# a: input matrix a; b: input matrix b; isTransposeA / isTransposeB: whether matrix a / matrix b should be transposed
@gs.Graph.register()
def gemm(self, a, b, isTransposeA=False, isTransposeB=False):
    attrs = {"transA": int(isTransposeA), "transB": int(isTransposeB)}
    return self.layer(op="Gemm", inputs=[a, b], outputs=["myGgemm"], attrs=attrs)

@gs.Graph.register()
def min(self, *args):
    return self.layer(op="Min", inputs=args, outputs=["myMin"])

@gs.Graph.register()
def max(self, *args):
    return self.layer(op="Max", inputs=args, outputs=["myMax"])

# mark the opset version when registering a function as a node
@gs.Graph.register(opsets=[11])
def relu(self, a):
    return self.layer(op="Relu", inputs=[a], outputs=["myReLU"])

# register a node with the same name but a different opset version
@gs.Graph.register(opsets=[1])
def relu(self, a):
    raise NotImplementedError("This function has not been implemented!")

# Create graph
graph = gs.Graph(opset=11)
tensor0 = gs.Variable(name="tensor0", shape=[64, 64], dtype=np.float32)
tensor1 = gs.Constant(name="tensor1", values=np.ones(shape=(64, 64), dtype=np.float32))
#tensor1 = np.ones(shape=(64, 64), dtype=np.float32) # np.array can also be used but the name of the tensor will be created automatically by ONNX if so
tensor2 = gs.Constant(name="tensor2", values=np.ones((64, 64), dtype=np.float32) * 0.5)
tensor3 = gs.Constant(name="tensor3", values=np.ones(shape=[64, 64], dtype=np.float32))
tensor4 = gs.Constant(name="tensor4", values=np.array([3], dtype=np.float32))
tensor5 = gs.Constant(name="tensor5", values=np.array([-3], dtype=np.float32))

node0 = graph.gemm(tensor1, tensor0, isTransposeB=True)
node1 = graph.add(*node0, tensor2)
node2 = graph.relu(*node1)
node3 = graph.mul(*node2, tensor3)
node4 = graph.min(*node3, tensor4)
node5 = graph.max(*node4, tensor5)

graph.inputs = [tensor0]
graph.outputs = [node5[0]]

graph.inputs[0].dtype = np.dtype(np.float32)
graph.outputs[0].dtype = np.dtype(np.float32)

onnx.save(gs.export_onnx(graph), "model-09-01.onnx")

@gs.Graph.register()
def replaceWithClip(self, inputs, outputs):
    # disconnect the old subgraph: clear the consumer edges of the input tensors and the producer edges of the output tensors
    for inp in inputs:
        inp.outputs.clear()
    for out in outputs:
        out.inputs.clear()
    # insert the new node
    return self.layer(op="Clip", inputs=inputs, outputs=outputs)

temp = graph.tensors()

# find the input / output tensors that we want to rewire, and pass them to the function replaceWithClip
inputs = [temp["myMul_6"], temp["tensor5"], temp["tensor4"]]
outputs = [temp["myMax_10"]]
graph.replaceWithClip(inputs, outputs)

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "model-09-02.onnx")


10. AdvanceAPI

This example works on an ONNX file with more advanced APIs, using the ONNX toolbox (polygraphy) as well.


import os
import sys
from collections import OrderedDict
from copy import deepcopy

import numpy as np
import onnx
import onnx_graphsurgeon as gs
from polygraphy.backend.onnx.loader import fold_constants

np.random.seed(31193)

onnxFile0 = "model-10.onnx"
onnxFile1 = "model-10-01.onnx"
onnxFile2 = "model-10-02.onnx"

# Constant of 0 dimension
constantS0 = gs.Constant("constantS0", np.array(0, dtype=np.int64))
constantS2 = gs.Constant("constantS2", np.array(2, dtype=np.int64))
constantS3 = gs.Constant("constantS3", np.array(3, dtype=np.int64))
# Constant of 1 dimension integer value, MUST use np.ascontiguousarray, or TensorRT will regard the shape of this Constant as (0) !!!
constant0 = gs.Constant("constant0", np.ascontiguousarray(np.array([0], dtype=np.int64)))
constant1 = gs.Constant("constant1", np.ascontiguousarray(np.array([1], dtype=np.int64)))
# Constant of >1 dimension
constantWeight = gs.Constant("constantWeight", np.ascontiguousarray(np.random.rand(1 * 1 * 3 * 3).astype(np.float32).reshape([1, 1, 3, 3])))
constantBias = gs.Constant("constantBias", np.ascontiguousarray(np.random.rand(1 * 1 * 1 * 1).astype(np.float32).reshape([1, 1, 1, 1])))
constant307200x64 = gs.Constant("constant307200x64", np.ascontiguousarray(np.random.rand(307200 * 64).astype(np.float32).reshape([307200, 64])))

# Tool function
def markGraphOutput(graph, lNode, bMarkOutput=True, bMarkInput=False, lMarkOutput=None, lMarkInput=None, bRemoveOldOutput=True):
    # graph:            The ONNX graph for edition
    # lNode:            The list of nodes we want to mark as output
    # bMarkOutput:      Whether to mark the output tensor(s) of the nodes in the lNode
    # bMarkInput:       Whether to mark the input tensor(s) of the nodes in the lNode
    # lMarkOutput:      The index of output tensor(s) of the node are marked as output, only available when len(lNode) == 1
    # lMarkInput:       The index of input tensor(s) of the node are marked as output, only available when len(lNode) == 1
    # bRemoveOldOutput: Whether to remove the original output of the network (cutting the graph to the node we want to mark to save time of building)

    # In most cases, using the first 4 parameters is enough, for example:
    #markGraphOutput(graph, ["/Conv"])                          # mark output tensor of the node "/Conv" as output
    #markGraphOutput(graph, ["/Conv"], False, True)             # mark input tensors of the node "/Conv" (input tensor + weight + bias) as output
    #markGraphOutput(graph, ["/TopK"], lMarkOutput=[1])         # mark the second output tensor of the node "/TopK" as output
    #markGraphOutput(graph, ["/Conv"], bRemoveOldOutput=False)  # mark output tensor of the node "/Conv" as output, and keep the original output of the network

    if bRemoveOldOutput:
        graph.outputs = []
    for node in graph.nodes:
        if node.name in lNode:
            if bMarkOutput:
                if lMarkOutput is None or len(lNode) > 1:
                    lMarkOutput = range(len(node.outputs))
                for index in lMarkOutput:
                    graph.outputs.append(node.outputs[index])
                    node.outputs[index].dtype = np.dtype(np.float32)
                    print("[M] Mark node [%s] output tensor [%s]" % (node.name, node.outputs[index].name))
            if bMarkInput:
                if lMarkInput is None or len(lNode) > 1:
                    lMarkInput = range(len(node.inputs))
                for index in lMarkInput:
                    graph.outputs.append(node.inputs[index])
                    print("[M] Mark node [%s] input  tensor [%s]" % (node.name, node.inputs[index].name))

    graph.cleanup().toposort()
    return len(lNode)

def addNode(graph, nodeType, prefix, number, inputList, attribution=None, suffix="", dtype=None, shape=None):
    # ONLY for the node with one output tensor!!
    # graph:        The ONNX graph for edition
    # nodeType:     The type of the node to add, for example, "Concat"
    # prefix:       Optimization type, for example "RemoveLoop"
    # number:       An incremental number to prevent duplicate names
    # inputlist:    The list of input tensors for the node
    # attribution:  The attribution dictionary of the node, for example, OrderedDict([('axis',0)])
    # suffix:       Extra name for marking the tensor, for example "bTensor"
    # dtype:        The data type of the output tensor (optional)
    # shape:        The shape of the output tensor (optional)

    nodeName = prefix + "-N-" + str(number) + "-" + nodeType
    tensorName = prefix + "-V-" + str(number) + "-" + nodeType + (("-" + suffix) if len(suffix) > 0 else "")
    tensor = gs.Variable(tensorName, dtype, shape)
    node = gs.Node(nodeType, nodeName, inputs=inputList, outputs=[tensor], attrs=(OrderedDict() if attribution is None else attribution))
    graph.nodes.append(node)
    return tensor, number + 1

def addNodeMultipleOutput(graph, nodeType, prefix, number, inputList, attribution=None, suffix="", dtypeList=None, shapeList=None):
    # ONLY for the node with multiple tensor!!
    # graph:        The ONNX graph for edition
    # nodeType:     The type of the node to add, for example, "Concat"
    # prefix:       Optimization type, for example "RemoveLoop"
    # number:       An incremental number to prevent duplicate names
    # inputlist:    The list of input tensors for the node
    # attribution:  The attribution dictionary of the node, for example, OrderedDict([('axis',0)])
    # suffix:       Extra name for marking the tensor, for example "bTensor"
    # dtypeList:    The list of the data type of the output tensor (optional)
    # shapeList:    The list of the shape of the output tensor (optional)

    nodeName = prefix + "-N-" + str(number) + "-" + nodeType

    assert len(dtypeList) == len(shapeList)
    outputList = []
    for i in range(len(dtypeList)):
        tensorName = prefix + "-V-" + str(number) + "-" + nodeType + "-" + str(i) + (("-" + suffix) if len(suffix) > 0 else "")
        tensor = gs.Variable(tensorName, dtypeList[i], shapeList[i])
        outputList.append(tensor)

    node = gs.Node(nodeType, nodeName, inputs=inputList, outputs=outputList, attrs=(OrderedDict() if attribution is None else attribution))
    graph.nodes.append(node)
    return outputList, number + 1

# A example of surgery function
def removeAddSub(graph):
    scopeName = sys._getframe().f_code.co_name
    n = 0

    for node in graph.nodes:
        if node.op == "Add" and node.o().op == "Sub" and node.inputs[1].values == node.o().inputs[1].values:
            index = node.o().o().inputs.index(node.o().outputs[0])
            tensor, n = addNode(graph, "Identity", scopeName, n, [node.inputs[0]], None, "", np.dtype(np.float32), node.inputs[0].shape)
            node.o().o().inputs[index] = tensor
            n += 1
    return n

# Create a ONNX file as beginning
graph = gs.Graph()
scopeName = "WILI"  # whatever something
n = 0  # a counter to differentiate the names of the nodes

tensorInput = gs.Variable("tensorInput", np.dtype(np.float32), ["B", 1, 480, 640])

attribution = OrderedDict([["dilations", [1, 1]], ["kernel_shape", [3, 3]], ["pads", [1, 1, 1, 1]], ["strides", [1, 1]]])
tensor1, n = addNode(graph, "Conv", scopeName, n, [tensorInput, constantWeight], attribution, "", np.dtype(np.float32), ['B', 1, 480, 640])

tensor2, n = addNode(graph, "Add", scopeName, n, [tensor1, constantBias], None, "", np.dtype(np.float32), ['B', 1, 480, 640])

tensor3, n = addNode(graph, "Relu", scopeName, n, [tensor2], None, "", np.dtype(np.float32), ['B', 1, 480, 640])

tensor4, n = addNode(graph, "Add", scopeName, n, [tensor3, constant1], None, "", np.dtype(np.float32), ['B', 1, 480, 640])

tensor5, n = addNode(graph, "Sub", scopeName, n, [tensor4, constant1], None, "", np.dtype(np.float32), ['B', 1, 480, 640])

tensor6, n = addNode(graph, "Shape", scopeName, n, [tensorInput], None, "", np.dtype(np.int64), [4])  # value:(B,1,480, 640)

tensorBScalar, n = addNode(graph, "Gather", scopeName, n, [tensor6, constantS0], OrderedDict([('axis', 0)]), "tensorBScalar", np.dtype(np.int64), [])  # value: (B)

tensorB, n = addNode(graph, "Unsqueeze", scopeName, n, [tensorBScalar, constant0], None, "tensorB", np.dtype(np.int64), [1])  # value: (B)

tensorHScalar, n = addNode(graph, "Gather", scopeName, n, [tensor6, constantS2], OrderedDict([('axis', 0)]), "tensorHScalar", np.dtype(np.int64), [])  # value: (480)

tensorH, n = addNode(graph, "Unsqueeze", scopeName, n, [tensorHScalar, constant0], None, "tensorH", np.dtype(np.int64), [1])  # value: (480)

tensorWScalar, n = addNode(graph, "Gather", scopeName, n, [tensor6, constantS3], OrderedDict([('axis', 0)]), "tensorWScalar", np.dtype(np.int64), [])  # value: (640)

tensorW, n = addNode(graph, "Unsqueeze", scopeName, n, [tensorWScalar, constant0], None, "tensorW", np.dtype(np.int64), [1])  # value: (640)

tensorHW, n = addNode(graph, "Mul", scopeName, n, [tensorH, tensorW], None, "tensorHW", np.dtype(np.int64), [1])  # value: (480*640)

tensorBC1CHW, n = addNode(graph, "Concat", scopeName, n, [tensorB, constant1, tensorHW], OrderedDict([('axis', 0)]), "tensorBC1CHW", np.dtype(np.int64), [3])  # value: (B, 1, 480*640)

tensor7, n = addNode(graph, "Reshape", scopeName, n, [tensor5, tensorBC1CHW], None, "", np.dtype(np.float32), ["B", 1, 480 * 640])

tensor8, n = addNode(graph, "Squeeze", scopeName, n, [tensor7, constant1], None, "", np.dtype(np.float32), ["B", 480 * 640])

tensor9, n = addNode(graph, "MatMul", scopeName, n, [tensor8, constant307200x64], None, "", np.dtype(np.float32), ["B", 64])

graph.inputs = [tensorInput]
graph.outputs = [tensor9]
graph.cleanup().toposort()
graph.opset = 17  # node might not be supported by some old opset. For example, the shape inference of onnxruntime in polygraphy will fail if we use opset==11

# Save the model as ONNX file
# + "save_as_external_data" is used to seperate the weight and structure of the model, making it easier to copy if we are just interested in the structure. The saving process will fail without the switch if size of the model is larger than 2GiB.
# + If the model is small, "onnx.save(gs.export_onnx(graph), onnxFile0)" is enough
# + "all_tensors_to_one_file" is used to reduce the number of weight files
# + There must no directory prefix in "location" parameter
# + Clean the target weight files before saving, or the weight files will be appended to the old ones
# 保存第一个ONNX文件
os.system("rm -rf " + onnxFile0 + ".weight")
onnx.save(gs.export_onnx(graph), onnxFile0, save_as_external_data=True, all_tensors_to_one_file=True, location=onnxFile0.split('/')[-1] + ".weight")

# Load the model
# + If the size of the model is larger than 2 GiB, the loading process must be divided into two steps: load the structure first and then the weights
# + If the model is small, "onnxModel = onnx.load(onnxFile0)" is enough
onnxModel = onnx.load(onnxFile0, load_external_data=False)
onnx.load_external_data_for_model(onnxModel, ".")

# Do constant folding with polygraphy (and save the result for visualization in this example)
# Sometimes this step should be skipped, because some graphs are not natively supported by polygraphy and TensorRT; in such cases some manual graph surgery must be done before handing the model to polygraphy
# Save the second ONNX file
onnxModel = fold_constants(onnxModel, allow_onnxruntime_shape_inference=True)
onnx.save(onnxModel, onnxFile1, save_as_external_data=True, all_tensors_to_one_file=True, location=onnxFile1.split('/')[-1] + ".weight")

# Continue to do graph surgery by onnx-graphsurgeon
graph = gs.import_onnx(onnxModel)
#graph = gs.import_onnx(onnx.shape_inference.infer_shapes(onnxModel))  # This API can be used to infer the shape of each tensor if size of the model is less than 2GiB and polygraphy is not used before

# Print information of ONNX file before graph surgery
print("[M] %-16s: %5d Nodes, %5d tensors" % (onnxFile0, len(graph.nodes), len(graph.tensors().keys())))

# Do graph surgery and print how many subgraph is edited
print("[M] %4d RemoveAddSub" % removeAddSub(graph))

graph.cleanup().toposort()

# Print information of ONNX file after graph surgery
print("[M] %-16s: %5d Nodes, %5d tensors" % (onnxFile2, len(graph.nodes), len(graph.tensors().keys())))

# Print information of input / output tensor
for i, tensor in enumerate(graph.inputs):
    print("[M] Input [%2d]: %s, %s, %s" % (i, tensor.shape, tensor.dtype, tensor.name))
for i, tensor in enumerate(graph.outputs):
    print("[M] Output[%2d]: %s, %s, %s" % (i, tensor.shape, tensor.dtype, tensor.name))

# Do another constant folding by polygraphy and save it to ensure the model is supported by TensorRT
# Save the third ONNX file
onnxModel = fold_constants(gs.export_onnx(graph), allow_onnxruntime_shape_inference=True)
onnx.save(onnxModel, onnxFile2, save_as_external_data=True, all_tensors_to_one_file=True, location=onnxFile2.split('/')[-1] + ".weight")

print("Finish graph surgery!")

[I] Folding Constants | Pass 1
2024-03-01 06:23:45.564797453 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node WILI-N-11-Unsqueeze
2024-03-01 06:23:45.565153288 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node WILI-N-9-Unsqueeze
[I] Inferring shapes in the model with `onnxruntime.tools.symbolic_shape_infer`.
    Note: To force Polygraphy to use `onnx.shape_inference` instead, set `allow_onnxruntime=False` or use the `--no-onnxruntime-shape-inference` command-line option.
[I]     Total Nodes | Original:    17, After Folding:    12 |     5 Nodes Folded
[I] Folding Constants | Pass 2
[I]     Total Nodes | Original:    12, After Folding:    12 |     0 Nodes Folded
[M] model-10.onnx   :    12 Nodes,    20 tensors
[M]    2 RemoveAddSub
[M] model-10-02.onnx:    11 Nodes,    19 tensors
[M] Input [ 0]: ['B', 1, 480, 640], float32, tensorInput
[M] Output[ 0]: ['B', 64], float32, WILI-V-16-MatMul
[I] Folding Constants | Pass 1
[I]     Total Nodes | Original:    11, After Folding:    11 |     0 Nodes Folded
Finish graph surgery!


3. Advantages of onnx-graphsurgeon

As a graph editor from NVIDIA, onnx-graphsurgeon is an indispensable helper for deploying models on TensorRT. Its main advantages are summarized below:

  • A powerful graph editor: onnx-graphsurgeon lets you edit the graph representation of an ONNX model intuitively. It provides a set of APIs and command-line tools for adding, deleting, connecting, and modifying nodes in the model. With this editor you can easily reshape the model structure to match specific optimization needs, such as fusing operators or removing unnecessary nodes.

  • Powerful optimization features: onnx-graphsurgeon offers strong optimization capabilities that help improve a model's performance and efficiency. It can automatically detect and remove unnecessary nodes, merge adjacent operators to reduce computation, and perform other optimizations. It also supports custom optimization rules, so you can write your own strategies to optimize the model further.

  • Tight integration with TensorRT: onnx-graphsurgeon integrates closely with TensorRT, and the optimized model can be exported directly for TensorRT to consume. We can edit and optimize a model with onnx-graphsurgeon and then export the modified model in a format TensorRT accepts, achieving more efficient inference (see the trtexec example after this list).

  • Flexible extensibility: onnx-graphsurgeon can be extended and customized as needed. We can write our own plugins and rules to implement specific model operations and optimizations, which makes onnx-graphsurgeon suitable for a wide range of complex model scenarios and optimization tasks.
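For instance, once the edited model has been saved, a typical way to hand it to TensorRT is through the trtexec command-line tool (flags vary by version; the file names here are placeholders):

trtexec --onnx=model-edited.onnx --saveEngine=model.plan --fp16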


Summary

In this post we took a close look at two important tools around TensorRT: the ONNX Parser and onnx-graphsurgeon. We first covered how the ONNX Parser works and how to use it, showing how a model in ONNX format is parsed and turned into a form that TensorRT can optimize and accelerate.

We then focused on onnx-graphsurgeon, a powerful tool for performing graph operations on ONNX models and optimizing them: model pruning, layer fusion, layer replacement, and more. We provided a large set of onnx-graphsurgeon examples covering common optimization operations to help readers understand and apply the tool.

We hope this post gives readers a comprehensive understanding of TensorRT's ONNX Parser and onnx-graphsurgeon, and the ability to use them flexibly to optimize and accelerate their own deep learning models.

Thank you for reading. We hope this post helps with your work and study. If you have any questions or feedback, feel free to get in touch; suggestions are always welcome so we can learn together.
