The ONNX standard & the ONNX Runtime accelerated inference engine

Reference blogs and official documentation are linked throughout the article.

I. ONNX overview

When training models we can use many different frameworks: some people prefer PyTorch, some prefer TensorFlow, others MXNet, or Caffe, which was popular in the early days of deep learning. Each training framework produces its own kind of model artifact, so deployment and inference require different dependency libraries, and even within one framework (TensorFlow, for example) different versions can differ considerably. To resolve this fragmentation, the LF AI foundation, together with Facebook, Microsoft and other companies, defined a standard for machine learning models called ONNX (Open Neural Network Exchange). Model files produced by the various frameworks (.pth, .pb, ...) can be converted into this standard format, after which they can be deployed uniformly with tools such as ONNX Runtime. (Much like Java's intermediate bytecode runs on any JVM, the ONNX Runtime engine provides inference for exported ONNX model files.)

ONNX Runtime homepage: onnxruntime.ai

Developer docs and tutorials: onnxruntime.ai/docs


Operators supported by ONNX: https://github.com/onnx/onnx/blob/main/docs/Operators.md

Which platforms ONNX Runtime supports, which languages it provides APIs for, which instruction set architectures it targets, and which hardware accelerators it can use:

[Figure: ONNX Runtime platform / language / architecture / hardware acceleration support matrix]

Note

  • ONNX defines a standard for machine learning models, sitting alongside mainstream native formats such as TensorFlow's and Caffe's, and it is accompanied by the ONNX Runtime parallel/accelerated inference package, which parses an ONNX model, optimizes it (for example fusing conv-bn), and runs it, making AI models easier to port and deploy. While working with SCRFD I found that loading the ONNX model with native ONNX Runtime and running inference is indeed much faster than cv2.dnn.readNet(onnx), roughly twice as fast; this blogger reaches the same conclusion: https://blog.csdn.net/woshicver/article/details/113764970. (A rough timing sketch is given after the references below.)

  • ONNX Runtime also provides training acceleration for mainstream machine learning frameworks, chiefly PyTorch; the official announcement claims roughly 1.4x faster training than plain PyTorch. You only need to wrap the model when loading it:

     from torch_ort import ORTModule
     model = ORTModule(model)
    

    If you are interested, see microsoft/onnxruntime-training-examples and https://cloudblogs.microsoft.com/opensource/2021/07/13/accelerate-pytorch-training-with-torch-ort/

  • Besides the ONNX Runtime engine provided for ONNX, there are other inference engines, for example Alibaba's MNN lightweight high-performance inference engine, Tencent's NCNN, and Nvidia's TensorRT.

Reference: What are the mainstream deep learning inference frameworks?

Reference: Reading the ONNX source code
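
As a rough illustration of the ONNX Runtime vs cv2.dnn speed comparison in the note above, the sketch below times the same ONNX model under both runtimes. The model path, input shape and iteration count are placeholder assumptions and the absolute numbers depend on hardware; this is a sketch, not a rigorous benchmark.

import time

import cv2
import numpy as np
import onnxruntime

MODEL_PATH = "model.onnx"                               # placeholder: path to any ONNX model
x = np.random.rand(1, 3, 640, 640).astype(np.float32)   # dummy NCHW input
N = 50                                                  # timed iterations

# ONNX Runtime
sess = onnxruntime.InferenceSession(MODEL_PATH)
input_name = sess.get_inputs()[0].name
t0 = time.time()
for _ in range(N):
    sess.run(None, {input_name: x})
print(f"onnxruntime: {(time.time() - t0) / N * 1000:.2f} ms/iter")

# OpenCV DNN
net = cv2.dnn.readNetFromONNX(MODEL_PATH)
out_names = net.getUnconnectedOutLayersNames()
t0 = time.time()
for _ in range(N):
    net.setInput(x)
    net.forward(out_names)
print(f"cv2.dnn:     {(time.time() - t0) / N * 1000:.2f} ms/iter")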

II. Converting PyTorch models to ONNX

In the inference stage, passing the model and an input tensor to torch.onnx.export converts a .pth checkpoint into an .onnx file:

import torch
import torchvision

# Function to convert a PyTorch model to ONNX
def Convert_ONNX(model, input):

    # set the model to inference mode
    model.eval()

    # Let's create a dummy input tensor
    dummy_input = input

    # Export the model
    torch.onnx.export(model,               # model being run
         dummy_input,                      # model input (or a tuple for multiple inputs)
         "ImageClassifier.onnx",           # where to save the model
         export_params=True,               # store the trained parameter weights inside the model file
         opset_version=13,                 # the ONNX version to export the model to
         do_constant_folding=True,         # whether to execute constant folding for optimization
         input_names=['modelInput'],       # the model's input names
         output_names=['modelOutput'],     # the model's output names
         dynamic_axes={'modelInput': {0: 'batch_size'},    # variable length axes
                       'modelOutput': {0: 'batch_size'}})
    print(" ")
    print('Model has been converted to ONNX')

if __name__ == '__main__':
    # the network definition must match the checkpoint; a torchvision MobileNetV3 is assumed here
    model = torchvision.models.mobilenet_v3_large()
    checkpoints = torch.load("./pth/mobileNetV3_SMART_CE.pth", map_location="cpu")  # load all weights into the model
    model.load_state_dict(checkpoints)

    # device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    device = torch.device("cpu")
    model = model.to(device)

    # dummy input with the shape the network expects (1x3x224x224 assumed here)
    sample = torch.randn(1, 3, 224, 224, device=device)
    Convert_ONNX(model, sample)
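
After export, it is worth validating the ONNX file and checking that ONNX Runtime reproduces the PyTorch outputs. A minimal sketch (the file name, input name and input shape follow the example above and are assumptions; `model` is the PyTorch model exported there):

import numpy as np
import onnx
import onnxruntime
import torch

onnx_model = onnx.load("ImageClassifier.onnx")
onnx.checker.check_model(onnx_model)        # structural validity check

model.eval()
sample = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    torch_out = model(sample).numpy()       # reference output from the PyTorch model

sess = onnxruntime.InferenceSession("ImageClassifier.onnx")
ort_out = sess.run(None, {'modelInput': sample.numpy()})[0]

# outputs should agree up to small numerical error
np.testing.assert_allclose(torch_out, ort_out, rtol=1e-3, atol=1e-5)
print("PyTorch and ONNX Runtime outputs match")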

However, converting a MobileNetV3 .pth file to ONNX runs into a problem:

RuntimeError: Exporting the operator relu6 to ONNX opset version 11 is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub.

The reason is that MobileNetV3's hardswish/hardsigmoid activations are implemented with relu6, which the exporter cannot map to an ONNX operator at this opset; see https://github.com/pytorch/vision/issues/3463 and https://github.com/onnx/onnx/blob/main/docs/Operators.md

Solutions:

  • Native support exporting F.hardsigmoid to onnx.
  • Replace F.hardsigmoid with F.hardtanh, which is export-friendly and numerically equivalent, as in the code below.
import torch.nn as nn
import torch.nn.functional as F


class hswish(nn.Module):
    def forward(self, x):
        # hardswish: x * relu6(x + 3) / 6, written with hardtanh(min=0, max=6) for ONNX export
        out = x * F.hardtanh(x + 3, 0., 6., inplace=True) / 6
        return out


class hsigmoid(nn.Module):
    def forward(self, x):
        # hardsigmoid: relu6(x + 3) / 6
        out = F.hardtanh(x + 3, 0., 6., inplace=True) / 6
        return out
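
For the first bullet above: ONNX itself defines a HardSigmoid operator, and newer PyTorch exporters can emit F.hardsigmoid natively; on older versions a custom symbolic can be registered. A hedged sketch (an assumption, not the article's original approach):

import torch.onnx
from torch.onnx import register_custom_op_symbolic

# PyTorch hardsigmoid(x) = clamp(x/6 + 0.5, 0, 1), i.e. ONNX HardSigmoid with alpha = 1/6, beta = 0.5
def hardsigmoid_symbolic(g, self):
    return g.op("HardSigmoid", self, alpha_f=1.0 / 6.0, beta_f=0.5)

register_custom_op_symbolic("aten::hardsigmoid", hardsigmoid_symbolic, 11)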

III. Converting TF 1.x / 2.x checkpoints to ONNX

Reference: "Converting TensorFlow 1.x & 2.x models into ONNX files". A small conversion sketch follows.
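
The linked article is based on the tf2onnx package. Below is an assumed minimal sketch for a TF2 Keras model (the model here is a placeholder; for TF1 checkpoints tf2onnx also ships a command-line converter, python -m tf2onnx.convert):

import tensorflow as tf
import tf2onnx

# placeholder model; in practice load your own Keras model or SavedModel
model = tf.keras.applications.MobileNetV2(weights=None)

spec = (tf.TensorSpec((None, 224, 224, 3), tf.float32, name="input"),)
model_proto, _ = tf2onnx.convert.from_keras(
    model, input_signature=spec, opset=13, output_path="model.onnx")

print([out.name for out in model_proto.graph.output])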

IV. Using ONNX Runtime from Python

1. Installing the environment

If using pip, run pip install --upgrade pip prior to downloading.

Artifact        | Description           | Supported Platforms
onnxruntime     | CPU (Release, stable) | Windows (x64), Linux (x64, ARM64), Mac (x64)
ort-nightly     | CPU (Dev, nightly)    | Same as above
onnxruntime-gpu | GPU (Release)         | Windows (x64), Linux (x64, ARM64)
ort-nightly-gpu | GPU (Dev)             | Same as above

Install with pip:

pip install onnxruntime

1) CPU version

self.ort_sess = onnxruntime.InferenceSession(rootPath + landmark_model_path)  # Create inference session using ort.InferenceSession
2) GPU version

  • As on CPU, you still import the onnxruntime package; there is no '-gpu' suffix in the import.

  • You only need to add a provider; see "ONNX Runtime needs a provider to be specified".

self.ort_sess = onnxruntime.InferenceSession(rootPath + landmark_model_path, providers=['CUDAExecutionProvider'])  # Create inference session using ort.InferenceSession
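
To check which execution providers are actually available at runtime, and to fall back to CPU when CUDA cannot be loaded, something like the following can be used (the model path is a placeholder; provider fallback assumes a reasonably recent onnxruntime):

import onnxruntime

print(onnxruntime.get_device())                # e.g. 'GPU' or 'CPU'
print(onnxruntime.get_available_providers())   # e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider']

# list providers in order of preference; onnxruntime falls back to the next one if CUDA is unavailable
sess = onnxruntime.InferenceSession(
    "model.onnx",                              # placeholder path
    providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
print(sess.get_providers())                    # the providers this session actually uses
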
3) Problem 1

The installed onnxruntime-gpu does not match the CUDA version:

RuntimeError: D:\a\_work\1\s\onnxruntime\python\onnxruntime_pybind_state.cc:531 onnxruntime::python::CreateExecutionProviderInstance CUDA_PATH is set but CUDA wasn't able to be loaded. Please install the correct version of CUDA and cuDNN as mentioned in the GPU requirements page (https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements), make sure they're in the PATH, and that your GPU is supported.

Solution: see https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html

  • Check the CUDA version:

    nvcc --version
    
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2018 NVIDIA Corporation
    Built on Sat_Aug_25_21:08:04_Central_Daylight_Time_2018
    Cuda compilation tools, release 10.0, V10.0.130
    
  • Check the cuDNN version: see "Checking CUDA and cuDNN versions on Linux and Windows"

    # Check the CUDA and cuDNN versions via PyTorch
    import torch
    print(torch.__version__)
    
    print(torch.version.cuda)
    print(torch.backends.cudnn.version())
    ---
    1.9.0+cu102
    10.2
    7605
    
  • Check against the table below: install onnxruntime-gpu==1.2

    ONNX Runtime | CUDA   | cuDNN                             | Notes
    1.10         | 11.4   | 8.2.4 (Linux), 8.2.2.26 (Windows) | libcudart 11.4.43, libcufft 10.5.2.100, libcurand 10.2.5.120, libcublasLt 11.6.1.51, libcublas 11.6.1.51, libcudnn 8.2.4
    1.9          | 11.4   | 8.2.4 (Linux), 8.2.2.26 (Windows) | libcudart 11.4.43, libcufft 10.5.2.100, libcurand 10.2.5.120, libcublasLt 11.6.1.51, libcublas 11.6.1.51, libcudnn 8.2.4
    1.8          | 11.0.3 | 8.0.4 (Linux), 8.0.2.39 (Windows) | libcudart 11.0.221, libcufft 10.2.1.245, libcurand 10.2.1.245, libcublasLt 11.2.0.252, libcublas 11.2.0.252, libcudnn 8.0.4
    1.7          | 11.0.3 | 8.0.4 (Linux), 8.0.2.39 (Windows) | libcudart 11.0.221, libcufft 10.2.1.245, libcurand 10.2.1.245, libcublasLt 11.2.0.252, libcublas 11.2.0.252, libcudnn 8.0.4
    1.5-1.6      | 10.2   | 8.0.3                             | CUDA 11 can be built from source
    1.2-1.4      | 10.1   | 7.6.5                             | Requires cublas10-10.2.1.243; cublas 10.1.x will not work
    1.0-1.1      | 10.0   | 7.6.4                             | CUDA versions from 9.1 up to 10.1, and cuDNN versions from 7.1 up to 7.4 should also work with Visual Studio 2017
  • Since my CUDA is 10.0, onnxruntime also has to be downgraded to version 1.2 (if that does not work, install onnxruntime-gpu == 1.1); otherwise the following error is raised:

    from onnxruntime.capi._pybind_state import get_all_providers, get_available_providers, get_device, set_seed, RunOptions, SessionOptions, set_default_logger_severity, NodeArg, ModelMetadata, GraphOptimizationLevel, ExecutionMode, OrtDevice, SessionIOBinding
    ImportError: cannot import name 'get_all_providers'
    

    You will also find that the onnxruntime and onnxruntime-gpu package versions need to be kept consistent with each other.

    One more thing to note:

    onnxruntime needs to be installed first and onnxruntime-gpu afterwards; only then will the GPU be used, otherwise the check below

    print(onnxruntime.get_device())    # check which device onnxruntime will run on
    
    self.ort_sess = onnxruntime.InferenceSession(rootPath + landmark_model_path,providers=['CUDAExecutionProvider'])  # Create inference session using ort.InferenceSession
    

    print(onnxruntime.get_device()) keeps printing CPU.

4) Problem 2

The exported ONNX file is not compatible with this version of onnxruntime and has to be re-exported:

onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Error in Node:Conv_0 : No Op registered for Conv with domain_version of 13

See https://blog.csdn.net/qq_279033270/article/details/109583514

Re-export the .pth file to ONNX (targeting a lower opset):

'''Convert .pth to ONNX, this time with a lower opset'''
# landmark_detector is the loaded PyTorch model and input a dummy tensor, as in section II
# Export the model
torch.onnx.export(landmark_detector,  # model being run
                  input,  # model input (or a tuple for multiple inputs)
                  "ImageClassifier.onnx",  # where to save the model
                  export_params=True,  # store the trained parameter weights inside the model file
                  opset_version=9,  # the ONNX version to export the model to
                  do_constant_folding=True,  # whether to execute constant folding for optimization
                  input_names=['modelInput'],  # the model's input names
                  output_names=['modelOutput'],  # the model's output names
                  dynamic_axes={'modelInput': {0: 'batch_size'},  # variable length axes
                                'modelOutput': {0: 'batch_size'}})
print(" ")
print('Model has been converted to ONNX')

2. Inspecting the ONNX model's weights and structure (visualization)

The SCRFD model is used as the example here.

import cv2
import onnx
from onnx import helper
import onnxruntime
import numpy as np

'''Part 1: get the ONNX model's input/output layers and the number of nodes in the graph'''
if __name__ == '__main__':

    # References: <https://blog.csdn.net/CFH1021/article/details/108732114>, https://onnxruntime.ai/docs/get-started/with-python.html

    # Load the model
    # model = onnx.load('./weights/mbv2_ID_recognition.onnx')  # Load the onnx model with onnx.load
    model = onnx.load('./weights/scrfd_500m_kps.onnx')
    # model = onnx.load('./weights/mbv3_fire_classifier.onnx')
    # Check that the model structure is complete and valid
    onnx.checker.check_model(model)

    '''
    ref: https://github.com/onnx/onnx/blob/main/docs/IR.md
    Graphs have the following properties:
        name:	string	The name of the model graph.
        node:	Node[]	A list of nodes, forming a partially ordered computation graph based on input/output data dependencies. It is in topological order.
        initializer:	Tensor[]	A list of named tensor values. When an initializer has the same name as a graph input, it specifies a default value for that input. When an initializer has a name different from all graph inputs, it specifies a constant value. The order of the list is unspecified.
        doc_string:	string	Human-readable documentation for this model. Markdown is allowed.
        input:	ValueInfo[]	The input parameters of the graph, possibly initialized by a default value found in ‘initializer.’
        output:	ValueInfo[]	The output parameters of the graph. Once all output parameters have been written to by a graph execution, the execution is complete.
        value_info:	ValueInfo[]	Used to store the type and shape information of values that are not inputs or outputs.
    '''
    input = model.graph.input    # graph inputs, including layer names and shape information
    output = model.graph.output  # graph outputs, including layer names and shape information
    depth = len(model.graph.node)   # number of nodes in the graph
    doc_string = model.graph.doc_string  # documentation about the onnx model, e.g. where it was converted from

    print(f"input = {input}")
    print(f"output = {output}")
    print(f"depth = {depth}")
    print(f"doc_string = {doc_string}")

    # Reference: https://www.jianshu.com/p/476478c17b8e
    # Print a human readable representation of the graph
    print(onnx.helper.printable_graph(model.graph))

Output:

input = [name: "images"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_value: 1
      }
      dim {
        dim_value: 3
      }
      dim {
        dim_value: 640
      }
      dim {
        dim_value: 640
      }
    }
  }
}
]
output = [name: "out0"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_value: 1
      }
      dim {
        dim_value: 12800
      }
      dim {
        dim_value: 1
      }
    }
  }
}
, name: "out1"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_value: 1
      }
      dim {
        dim_value: 3200
      }
      dim {
        dim_value: 1
      }
    }
  }
}
, name: "out2"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_value: 1
      }
      dim {
        dim_value: 800
      }
      dim {
        dim_value: 1
      }
    }
  }
}
, name: "out3"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_value: 1
      }
      dim {
        dim_value: 12800
      }
      dim {
        dim_value: 4
      }
    }
  }
}
, name: "out4"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_value: 1
      }
      dim {
        dim_value: 3200
      }
      dim {
        dim_value: 4
      }
    }
  }
}
, name: "out5"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_value: 1
      }
      dim {
        dim_value: 800
      }
      dim {
        dim_value: 4
      }
    }
  }
}
, name: "out6"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_value: 1
      }
      dim {
        dim_value: 12800
      }
      dim {
        dim_value: 10
      }
    }
  }
}
, name: "out7"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_value: 1
      }
      dim {
        dim_value: 3200
      }
      dim {
        dim_value: 10
      }
    }
  }
}
, name: "out8"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_value: 1
      }
      dim {
        dim_value: 800
      }
      dim {
        dim_value: 10
      }
    }
  }
}
]
depth = 191
doc_string = 

graph torch-jit-export (
  %images[FLOAT, 1x3x640x640]
) initializers (
  %neck.lateral_convs.0.conv.weight[FLOAT, 16x72x1x1]
  %neck.lateral_convs.0.conv.bias[FLOAT, 16]
  %neck.lateral_convs.1.conv.weight[FLOAT, 16x152x1x1]
  %neck.lateral_convs.1.conv.bias[FLOAT, 16]
  %neck.lateral_convs.2.conv.weight[FLOAT, 16x288x1x1]
  %neck.lateral_convs.2.conv.bias[FLOAT, 16]
  %neck.fpn_convs.0.conv.weight[FLOAT, 16x16x3x3]
  %neck.fpn_convs.0.conv.bias[FLOAT, 16]
  %neck.fpn_convs.1.conv.weight[FLOAT, 16x16x3x3]
  %neck.fpn_convs.1.conv.bias[FLOAT, 16]
  %neck.fpn_convs.2.conv.weight[FLOAT, 16x16x3x3]
  %neck.fpn_convs.2.conv.bias[FLOAT, 16]
  %neck.downsample_convs.0.conv.weight[FLOAT, 16x16x3x3]
  %neck.downsample_convs.0.conv.bias[FLOAT, 16]
  %neck.downsample_convs.1.conv.weight[FLOAT, 16x16x3x3]
  %neck.downsample_convs.1.conv.bias[FLOAT, 16]
  %neck.pafpn_convs.0.conv.weight[FLOAT, 16x16x3x3]
  %neck.pafpn_convs.0.conv.bias[FLOAT, 16]
  %neck.pafpn_convs.1.conv.weight[FLOAT, 16x16x3x3]
  %neck.pafpn_convs.1.conv.bias[FLOAT, 16]
  %bbox_head.stride_cls.(8, 8).weight[FLOAT, 2x64x3x3]
  %bbox_head.stride_cls.(8, 8).bias[FLOAT, 2]
  %bbox_head.stride_cls.(16, 16).weight[FLOAT, 2x64x3x3]
  %bbox_head.stride_cls.(16, 16).bias[FLOAT, 2]
  %bbox_head.stride_cls.(32, 32).weight[FLOAT, 2x64x3x3]
  %bbox_head.stride_cls.(32, 32).bias[FLOAT, 2]
  %bbox_head.stride_reg.(8, 8).weight[FLOAT, 8x64x3x3]
  %bbox_head.stride_reg.(8, 8).bias[FLOAT, 8]
  %bbox_head.stride_reg.(16, 16).weight[FLOAT, 8x64x3x3]
  %bbox_head.stride_reg.(16, 16).bias[FLOAT, 8]
  %bbox_head.stride_reg.(32, 32).weight[FLOAT, 8x64x3x3]
  %bbox_head.stride_reg.(32, 32).bias[FLOAT, 8]
  %bbox_head.stride_kps.(8, 8).weight[FLOAT, 20x64x3x3]
  %bbox_head.stride_kps.(8, 8).bias[FLOAT, 20]
  %bbox_head.stride_kps.(16, 16).weight[FLOAT, 20x64x3x3]
  %bbox_head.stride_kps.(16, 16).bias[FLOAT, 20]
  %bbox_head.stride_kps.(32, 32).weight[FLOAT, 20x64x3x3]
  %bbox_head.stride_kps.(32, 32).bias[FLOAT, 20]
  %555[FLOAT, 16x3x3x3]
  %556[FLOAT, 16]
  %558[FLOAT, 16x1x3x3]
  %559[FLOAT, 16]
  %561[FLOAT, 16x16x1x1]
  %562[FLOAT, 16]
  %564[FLOAT, 16x1x3x3]
  %565[FLOAT, 16]
  %567[FLOAT, 40x16x1x1]
  %568[FLOAT, 40]
  %570[FLOAT, 40x1x3x3]
  %571[FLOAT, 40]
  %573[FLOAT, 40x40x1x1]
  %574[FLOAT, 40]
  %576[FLOAT, 40x1x3x3]
  %577[FLOAT, 40]
  %579[FLOAT, 72x40x1x1]
  %580[FLOAT, 72]
  %582[FLOAT, 72x1x3x3]
  %583[FLOAT, 72]
  %585[FLOAT, 72x72x1x1]
  %586[FLOAT, 72]
  %588[FLOAT, 72x1x3x3]
  %589[FLOAT, 72]
  %591[FLOAT, 72x72x1x1]
  %592[FLOAT, 72]
  %594[FLOAT, 72x1x3x3]
  %595[FLOAT, 72]
  %597[FLOAT, 152x72x1x1]
  %598[FLOAT, 152]
  %600[FLOAT, 152x1x3x3]
  %601[FLOAT, 152]
  %603[FLOAT, 152x152x1x1]
  %604[FLOAT, 152]
  %606[FLOAT, 152x1x3x3]
  %607[FLOAT, 152]
  %609[FLOAT, 288x152x1x1]
  %610[FLOAT, 288]
  %612[FLOAT, 288x1x3x3]
  %613[FLOAT, 288]
  %615[FLOAT, 288x288x1x1]
  %616[FLOAT, 288]
  %618[FLOAT, 288x1x3x3]
  %619[FLOAT, 288]
  %621[FLOAT, 288x288x1x1]
  %622[FLOAT, 288]
  %624[FLOAT, 288x1x3x3]
  %625[FLOAT, 288]
  %627[FLOAT, 288x288x1x1]
  %628[FLOAT, 288]
  %630[FLOAT, 288x1x3x3]
  %631[FLOAT, 288]
  %633[FLOAT, 288x288x1x1]
  %634[FLOAT, 288]
  %636[FLOAT, 288x1x3x3]
  %637[FLOAT, 288]
  %639[FLOAT, 288x288x1x1]
  %640[FLOAT, 288]
  %642[FLOAT, 16x1x3x3]
  %643[FLOAT, 16]
  %645[FLOAT, 64x16x1x1]
  %646[FLOAT, 64]
  %648[FLOAT, 64x1x3x3]
  %649[FLOAT, 64]
  %651[FLOAT, 64x64x1x1]
  %652[FLOAT, 64]
  %654[FLOAT, 16x1x3x3]
  %655[FLOAT, 16]
  %657[FLOAT, 64x16x1x1]
  %658[FLOAT, 64]
  %660[FLOAT, 64x1x3x3]
  %661[FLOAT, 64]
  %663[FLOAT, 64x64x1x1]
  %664[FLOAT, 64]
  %666[FLOAT, 16x1x3x3]
  %667[FLOAT, 16]
  %669[FLOAT, 64x16x1x1]
  %670[FLOAT, 64]
  %672[FLOAT, 64x1x3x3]
  %673[FLOAT, 64]
  %675[FLOAT, 64x64x1x1]
  %676[FLOAT, 64]
  %677[INT64, 1]
  %678[INT64, 1]
  %679[INT64, 1]
  %680[INT64, 1]
  %681[INT64, 1]
  %682[INT64, 1]
  %683[INT64, 1]
  %684[INT64, 1]
  %685[INT64, 1]
  %686[INT64, 1]
  %687[INT64, 1]
  %688[INT64, 1]
  %689[INT64, 1]
  %690[INT64, 1]
  %691[INT64, 1]
  %692[INT64, 1]
  %693[INT64, 1]
  %694[INT64, 1]
) {
  %554 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [2, 2]](%images, %555, %556)
  %288 = Relu(%554)
  %557 = Conv[dilations = [1, 1], group = 16, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%288, %558, %559)
  %291 = Relu(%557)
  %560 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%291, %561, %562)
  %294 = Relu(%560)
  %563 = Conv[dilations = [1, 1], group = 16, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [2, 2]](%294, %564, %565)
  %297 = Relu(%563)
  %566 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%297, %567, %568)
  %300 = Relu(%566)
  %569 = Conv[dilations = [1, 1], group = 40, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%300, %570, %571)
  %303 = Relu(%569)
  %572 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%303, %573, %574)
  %306 = Relu(%572)
  %575 = Conv[dilations = [1, 1], group = 40, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [2, 2]](%306, %576, %577)
  %309 = Relu(%575)
  %578 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%309, %579, %580)
  %312 = Relu(%578)
  %581 = Conv[dilations = [1, 1], group = 72, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%312, %582, %583)
  %315 = Relu(%581)
  %584 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%315, %585, %586)
  %318 = Relu(%584)
  %587 = Conv[dilations = [1, 1], group = 72, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%318, %588, %589)
  %321 = Relu(%587)
  %590 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%321, %591, %592)
  %324 = Relu(%590)
  %593 = Conv[dilations = [1, 1], group = 72, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [2, 2]](%324, %594, %595)
  %327 = Relu(%593)
  %596 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%327, %597, %598)
  %330 = Relu(%596)
  %599 = Conv[dilations = [1, 1], group = 152, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%330, %600, %601)
  %333 = Relu(%599)
  %602 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%333, %603, %604)
  %336 = Relu(%602)
  %605 = Conv[dilations = [1, 1], group = 152, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [2, 2]](%336, %606, %607)
  %339 = Relu(%605)
  %608 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%339, %609, %610)
  %342 = Relu(%608)
  %611 = Conv[dilations = [1, 1], group = 288, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%342, %612, %613)
  %345 = Relu(%611)
  %614 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%345, %615, %616)
  %348 = Relu(%614)
  %617 = Conv[dilations = [1, 1], group = 288, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%348, %618, %619)
  %351 = Relu(%617)
  %620 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%351, %621, %622)
  %354 = Relu(%620)
  %623 = Conv[dilations = [1, 1], group = 288, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%354, %624, %625)
  %357 = Relu(%623)
  %626 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%357, %627, %628)
  %360 = Relu(%626)
  %629 = Conv[dilations = [1, 1], group = 288, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%360, %630, %631)
  %363 = Relu(%629)
  %632 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%363, %633, %634)
  %366 = Relu(%632)
  %635 = Conv[dilations = [1, 1], group = 288, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%366, %636, %637)
  %369 = Relu(%635)
  %638 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%369, %639, %640)
  %372 = Relu(%638)
  %373 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%324, %neck.lateral_convs.0.conv.weight, %neck.lateral_convs.0.conv.bias)
  %374 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%336, %neck.lateral_convs.1.conv.weight, %neck.lateral_convs.1.conv.bias)
  %375 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%372, %neck.lateral_convs.2.conv.weight, %neck.lateral_convs.2.conv.bias)
  %376 = Shape(%374)
  %377 = Constant[value = <Scalar Tensor []>]()
  %378 = Gather[axis = 0](%376, %377)
  %379 = Shape(%374)
  %380 = Constant[value = <Scalar Tensor []>]()
  %381 = Gather[axis = 0](%379, %380)
  %382 = Unsqueeze[axes = [0]](%378)
  %383 = Unsqueeze[axes = [0]](%381)
  %384 = Concat[axis = 0](%382, %383)
  %385 = Shape(%375)
  %386 = Constant[value = <Tensor>]()
  %387 = Constant[value = <Tensor>]()
  %388 = Constant[value = <Tensor>]()
  %389 = Slice(%385, %387, %388, %386)
  %390 = Cast[to = 7](%384)
  %391 = Concat[axis = 0](%389, %390)
  %392 = Constant[value = <Tensor>]()
  %393 = Constant[value = <Tensor>]()
  %394 = Resize[coordinate_transformation_mode = 'asymmetric', cubic_coeff_a = -0.75, mode = 'nearest', nearest_mode = 'floor'](%375, %392, %393, %391)
  %395 = Add(%374, %394)
  %396 = Shape(%373)
  %397 = Constant[value = <Scalar Tensor []>]()
  %398 = Gather[axis = 0](%396, %397)
  %399 = Shape(%373)
  %400 = Constant[value = <Scalar Tensor []>]()
  %401 = Gather[axis = 0](%399, %400)
  %402 = Unsqueeze[axes = [0]](%398)
  %403 = Unsqueeze[axes = [0]](%401)
  %404 = Concat[axis = 0](%402, %403)
  %405 = Shape(%395)
  %406 = Constant[value = <Tensor>]()
  %407 = Constant[value = <Tensor>]()
  %408 = Constant[value = <Tensor>]()
  %409 = Slice(%405, %407, %408, %406)
  %410 = Cast[to = 7](%404)
  %411 = Concat[axis = 0](%409, %410)
  %412 = Constant[value = <Tensor>]()
  %413 = Constant[value = <Tensor>]()
  %414 = Resize[coordinate_transformation_mode = 'asymmetric', cubic_coeff_a = -0.75, mode = 'nearest', nearest_mode = 'floor'](%395, %412, %413, %411)
  %415 = Add(%373, %414)
  %416 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%415, %neck.fpn_convs.0.conv.weight, %neck.fpn_convs.0.conv.bias)
  %417 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%395, %neck.fpn_convs.1.conv.weight, %neck.fpn_convs.1.conv.bias)
  %418 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%375, %neck.fpn_convs.2.conv.weight, %neck.fpn_convs.2.conv.bias)
  %419 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [2, 2]](%416, %neck.downsample_convs.0.conv.weight, %neck.downsample_convs.0.conv.bias)
  %420 = Add(%417, %419)
  %421 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [2, 2]](%420, %neck.downsample_convs.1.conv.weight, %neck.downsample_convs.1.conv.bias)
  %422 = Add(%418, %421)
  %423 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%420, %neck.pafpn_convs.0.conv.weight, %neck.pafpn_convs.0.conv.bias)
  %424 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%422, %neck.pafpn_convs.1.conv.weight, %neck.pafpn_convs.1.conv.bias)
  %641 = Conv[dilations = [1, 1], group = 16, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%416, %642, %643)
  %427 = Relu(%641)
  %644 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%427, %645, %646)
  %430 = Relu(%644)
  %647 = Conv[dilations = [1, 1], group = 64, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%430, %648, %649)
  %433 = Relu(%647)
  %650 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%433, %651, %652)
  %436 = Relu(%650)
  %437 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%436, %bbox_head.stride_cls.(8, 8).weight, %bbox_head.stride_cls.(8, 8).bias)
  %438 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%436, %bbox_head.stride_reg.(8, 8).weight, %bbox_head.stride_reg.(8, 8).bias)
  %439 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%436, %bbox_head.stride_kps.(8, 8).weight, %bbox_head.stride_kps.(8, 8).bias)
  %440 = Shape(%437)
  %441 = Constant[value = <Scalar Tensor []>]()
  %442 = Gather[axis = 0](%440, %441)
  %443 = Transpose[perm = [0, 2, 3, 1]](%437)
  %446 = Unsqueeze[axes = [0]](%442)
  %449 = Concat[axis = 0](%446, %677, %678)
  %450 = Reshape(%443, %449)
  %out0 = Sigmoid(%450)
  %452 = Transpose[perm = [0, 2, 3, 1]](%438)
  %455 = Unsqueeze[axes = [0]](%442)
  %458 = Concat[axis = 0](%455, %679, %680)
  %out3 = Reshape(%452, %458)
  %460 = Transpose[perm = [0, 2, 3, 1]](%439)
  %463 = Unsqueeze[axes = [0]](%442)
  %466 = Concat[axis = 0](%463, %681, %682)
  %out6 = Reshape(%460, %466)
  %653 = Conv[dilations = [1, 1], group = 16, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%423, %654, %655)
  %470 = Relu(%653)
  %656 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%470, %657, %658)
  %473 = Relu(%656)
  %659 = Conv[dilations = [1, 1], group = 64, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%473, %660, %661)
  %476 = Relu(%659)
  %662 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%476, %663, %664)
  %479 = Relu(%662)
  %480 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%479, %bbox_head.stride_cls.(16, 16).weight, %bbox_head.stride_cls.(16, 16).bias)
  %481 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%479, %bbox_head.stride_reg.(16, 16).weight, %bbox_head.stride_reg.(16, 16).bias)
  %482 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%479, %bbox_head.stride_kps.(16, 16).weight, %bbox_head.stride_kps.(16, 16).bias)
  %483 = Shape(%480)
  %484 = Constant[value = <Scalar Tensor []>]()
  %485 = Gather[axis = 0](%483, %484)
  %486 = Transpose[perm = [0, 2, 3, 1]](%480)
  %489 = Unsqueeze[axes = [0]](%485)
  %492 = Concat[axis = 0](%489, %683, %684)
  %493 = Reshape(%486, %492)
  %out1 = Sigmoid(%493)
  %495 = Transpose[perm = [0, 2, 3, 1]](%481)
  %498 = Unsqueeze[axes = [0]](%485)
  %501 = Concat[axis = 0](%498, %685, %686)
  %out4 = Reshape(%495, %501)
  %503 = Transpose[perm = [0, 2, 3, 1]](%482)
  %506 = Unsqueeze[axes = [0]](%485)
  %509 = Concat[axis = 0](%506, %687, %688)
  %out7 = Reshape(%503, %509)
  %665 = Conv[dilations = [1, 1], group = 16, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%424, %666, %667)
  %513 = Relu(%665)
  %668 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%513, %669, %670)
  %516 = Relu(%668)
  %671 = Conv[dilations = [1, 1], group = 64, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%516, %672, %673)
  %519 = Relu(%671)
  %674 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%519, %675, %676)
  %522 = Relu(%674)
  %523 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%522, %bbox_head.stride_cls.(32, 32).weight, %bbox_head.stride_cls.(32, 32).bias)
  %524 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%522, %bbox_head.stride_reg.(32, 32).weight, %bbox_head.stride_reg.(32, 32).bias)
  %525 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%522, %bbox_head.stride_kps.(32, 32).weight, %bbox_head.stride_kps.(32, 32).bias)
  %526 = Shape(%523)
  %527 = Constant[value = <Scalar Tensor []>]()
  %528 = Gather[axis = 0](%526, %527)
  %529 = Transpose[perm = [0, 2, 3, 1]](%523)
  %532 = Unsqueeze[axes = [0]](%528)
  %535 = Concat[axis = 0](%532, %689, %690)
  %536 = Reshape(%529, %535)
  %out2 = Sigmoid(%536)
  %538 = Transpose[perm = [0, 2, 3, 1]](%524)
  %541 = Unsqueeze[axes = [0]](%528)
  %544 = Concat[axis = 0](%541, %691, %692)
  %out5 = Reshape(%538, %544)
  %546 = Transpose[perm = [0, 2, 3, 1]](%525)
  %549 = Unsqueeze[axes = [0]](%528)
  %552 = Concat[axis = 0](%549, %693, %694)
  %out8 = Reshape(%546, %552)
  return %out0, %out1, %out2, %out3, %out4, %out5, %out6, %out7, %out8
}

You can see that the printed graph has three parts: the input, the initializers (the trained weights), and the list of operator nodes that apply a forward-like computation to the input and the initializers. At the end of the operator list the graph returns %out0 through %out8, the nine outputs of the detection heads.
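
To walk the operator list programmatically instead of reading the printable graph, you can iterate over model.graph.node and model.graph.initializer; a small sketch:

import onnx

model = onnx.load('./weights/scrfd_500m_kps.onnx')

# each node records its operator type and the tensor names it consumes and produces
for node in model.graph.node[:5]:
    print(node.op_type, list(node.input), '->', list(node.output))

# the initializers are the trained weights referenced by those nodes
for init in model.graph.initializer[:5]:
    print(init.name, list(init.dims))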

You can also use the online Netron tool, https://netron.app/, to view the SCRFD model structure (in SCRFD each FPN level has 3 corresponding heads).

[Figure: Netron visualization of the SCRFD graph]

3. ONNX model inference

Here the SCRFD model is used for inference.

import cv2
import onnx
from onnx import helper
import onnxruntime
import numpy as np

'''Part 3: ONNX model inference'''
if __name__ == '__main__':

    ort_sess = onnxruntime.InferenceSession('./weights/scrfd_500m_kps.onnx')  # Create inference session using ort.InferenceSession
    
    # Load and preprocess the image (HWC BGR uint8 -> NCHW RGB float32)
    img = cv2.imread("./img/calibrate_glasses.jpg")
    img = cv2.resize(img, (640, 640))
    img = img.astype(np.float32) / 255.
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = img.transpose(2, 0, 1)   # HWC -> CHW (a plain reshape((3,640,640)) would scramble the pixel layout)
    if len(img.shape) == 3:
        img = np.expand_dims(img, 0)

    outputs = ort_sess.run(None, {'images': img})  # run inference via the session's run method
    
    print(f"length of outputs = {len(outputs)}")
---
length of outputs = 9

The model's outputs are consistent with the visualized graph above: the FPN has 3 levels and each level has 3 heads, for 9 outputs in total.
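
A quick way to confirm this correspondence is to print each output's name and shape right after running the session; the shapes should match the out0...out8 shapes shown in the graph dump above:

for meta, out in zip(ort_sess.get_outputs(), outputs):
    print(meta.name, out.shape)
# expected, per the graph above:
#   out0 (1, 12800, 1), out1 (1, 3200, 1), out2 (1, 800, 1)     - classification scores (stride_cls + Sigmoid)
#   out3 (1, 12800, 4), out4 (1, 3200, 4), out5 (1, 800, 4)     - bbox regression (stride_reg)
#   out6 (1, 12800, 10), out7 (1, 3200, 10), out8 (1, 800, 10)  - 5-point landmarks (stride_kps)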

V. ONNX inference efficiency: comparison with Module & DataParallel

See the separate post: ONNX efficiency: comparison with Module & DataParallel.
