onnxruntime调用AI模型的python和C++编程

最新推荐文章于 2024-05-29 01:10:08 发布

Arnold-FY-Chen

最新推荐文章于 2024-05-29 01:10:08 发布

阅读量1.5w

点赞数 12

分类专栏：深度学习文章标签：深度学习 onnxruntime

本文链接：https://blog.csdn.net/XCCCCZ/article/details/110356437

版权

深度学习专栏收录该内容

38 篇文章 13 订阅

订阅专栏

python版的onnxruntime是比较容易使用的，先保证pip更新到最新再安装onnxruntime:

pip install --upgrade pip
#安装cpu版
pip install onnxruntime
#或者安装gpu版
#pip install onnxruntime-gpu

只是用来验证模型的话，用cpu版的就很好了,比较简单易用。注意两种版本不要同时安装，否则调用时怎么弄都出错，说参数不对：

incompatible constructor arguments. The following argument types are supported

卸载掉onnxruntime-gpu版后,onnxruntime InferenceSession(...)报错:

module 'onnxruntime' has no attribute 'InferenceSession'

解决办法就是把onnxruntime cpu版也卸载掉后重新安装onnxruntime cpu版即可。

参照这里的API文档和sample代码写自己的代码即可:

https://www.onnxruntime.ai/python/api_summary.html

https://www.onnxruntime.ai/python/tutorial.html

当模型的输入输出都只有一个参数时，一般仿照sample里代码写就行了，如果有多个参数时可以仿照API文档里的例子使用iobinding来指定每个输入输出参数。

我自己代码，看起来调用部分代码很简单吧:

ort_session = onnxruntime.InferenceSession("models/efficientdet-d0-s.onnx")
...
#preprocess input data to get x
...
#
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(x)}
ort_outs = ort_session.run(None, ort_inputs)
regression = torch.from_numpy(ort_outs[0])
classification = torch.from_numpy(ort_outs[1])
anchors = torch.from_numpy(ort_outs[2])
#postprocess regression and classification and anchors to get bboxes
...

使用C++调用就麻烦些了，一般做AI模型的实验和工程应用落地的开发都是在linux上，但是，MS一贯的毛病就是只优先自家的宝贝Windows .NET:

所以，首先你得针对你的Linux操作系统和CPU芯片编译出so库(python版的whl安装包也会顺带编译出来),如何编译，可以参考前一篇文章。

编译完后在onnxruntime/build/Linux/MinSizeRel/下会生成一堆文件，其中有几个so库文件，其中libonnxruntime.so就是需要在编译你的调用程序时需要加上-lonnxruntime进行链接的。

另外，头文件在onnxruntime/include/onnxruntime/core/下，编译时是需要include的，为了方便可以把整个onnxruntime/include/下的onnxruntime整个目录全部拷贝或者链接到项目的include路径里去：

CFLAGS+= -DUSE_TENSORRT -Wno-deprecated-declarations -fPIC -DDS_VERSION=\"5.0.0\" -DBOOST_ALL_DYN_LINK \
         -Wno-deprecated-declarations -fPIC -DDS_VERSION=\"5.0.0\" -DBOOST_ALL_DYN_LINK \
         -I /usr/local/cuda-$(CUDA_VER)/include \
         -I /opt/nvidia/deepstream/deepstream/sources/includes/ \
         -I $(PROJECT_ROOT)/plugin/common \
         -I $(PROJECT_ROOT)/ext/inc \
         -I $(PROJECT_ROOT)/ext/inc/onnxruntime \
         -I $(PROJECT_ROOT)/ext/inc/onnxruntime/core/common \
         -I $(PROJECT_ROOT)/ext/inc/onnxruntime/core/common/logging \
         -I $(PROJECT_ROOT)/ext/inc/onnxruntime/core/framework \
         -I $(PROJECT_ROOT)/ext/inc/onnxruntime/core/graph \
         -I $(PROJECT_ROOT)/ext/inc/onnxruntime/core/optimizer \
         -I $(PROJECT_ROOT)/ext/inc/onnxruntime/core/platform \
         -I $(PROJECT_ROOT)/ext/inc/onnxruntime/core/session \
         -I /usr/local/cuda-10.2/targets/aarch64-linux/include/ \
         -I /usr/include/opencv4

C++ API的文档是 https://github.com/microsoft/onnxruntime/blob/master/include/onnxruntime/core/session/onnxruntime_cxx_api.h

C++ samples里可以看看这两个简单的例子:

https://github.com/microsoft/onnxruntime/blob/master/samples/c_cxx/MNIST/MNIST.cpp

https://github.com/microsoft/onnxruntime/blob/master/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests.Capi/CXX_Api_Sample.cpp

这些C++代码调用onnxruntime的例子在调用模型时都属于很简单的情况，AI模型只有一个input和一个output，实际项目中我们自己的模型很可能有多个output，这些怎么弄呢，API文档是没有说清楚的，我也是琢磨了一阵，翻看了onnxruntime的靠下层的源码onnxruntime/include/onnxruntime/core/session/onnxruntime_cxx_inline.h 才弄明白:

这里贴出我的与onnxruntime调用相关的代码作为示例，更多的代码涉及到项目的商业秘密就不好列出来了:

#include <core/session/onnxruntime_cxx_api.h>
#include <cstring>
#include <array>
#include <cmath>
#include <algorithm>
#include <iostream>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include "providers.h"

class EfficientDetOnnxRT {
 public:
 
  EfficientDetOnnxRT(std::string onnx_file,unsigned int numClasses); 
  
  ~EfficientDetOnnxRT(); 

  ...
  
  bool processInput(float* buffer,cv::Mat& cv_img_origin); 

  std::vector<std::vector<float>> processOutput(float* regressionOutputBuffer,float* classificationOutputBuffer,
                 float* anchorsOutputBuffer,float nms_threshold, float score_thresholdi); 

  std::vector<std::vector<float>> infer(cv::Mat& cv_img, float nms_threshold, float score_threshold ); 


 private:
  ...

  char* input_names[1]{nullptr};
  float* input_image_{nullptr};
  Ort::Value input_tensor_{(Ort::Value)nullptr};
  std::array<int64_t, 4> input_shape_;
 
  Ort::Env env_ {ORT_LOGGING_LEVEL_ERROR, "EfficientDetOnnxEnv"};  
  Ort::Session session_ {nullptr};
 
  char* output_names[3]{nullptr,nullptr,nullptr};
  Ort::Value output_tensor_[3]{(Ort::Value)nullptr,(Ort::Value)nullptr,(Ort::Value)nullptr};
  std::array<int64_t,3> output_shape_regression;     
  std::array<int64_t,3> output_shape_classification; 
  std::array<int64_t,3> output_shape_anchors;        
  float* results_regression{nullptr};    
  float* results_classification{nullptr}; 
  float* results_anchors{nullptr};       
 
  ...

};

EfficientDetOnnxRT::EfficientDetOnnxRT(std::string onnx_file,unsigned int numClasses) {

    Ort::SessionOptions op;
    op.SetLogSeverityLevel(ORT_LOGGING_LEVEL_ERROR);
    int device_id = 0;
    
    std::cout <<"onnxruntime loading onnx model..." <<std::endl;
#if defined(USE_TENSORRT)
      std::cout <<"onnxruntime running with TensorRT..." <<std::endl;
      Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_Tensorrt(op, device_id));
      
#elif defined(USE_CUDA)
      OrtCUDAProviderOptions cuda_options{
          0,
          OrtCudnnConvAlgoSearch::EXHAUSTIVE,
          std::numeric_limits<size_t>::max(),
          0,
          true
      };
      Ort::ThrowOnError(op.OrtSessionOptionsAppendExecutionProvider_CUDA(op, &cuda_options));
#endif
    session_ = Ort::Session(env_, onnx_file.c_str(), op);

    Ort::AllocatorWithDefaultOptions ort_alloc;
    char* tmp = session_.GetInputName(0, ort_alloc);
    input_names[0] = strdup(tmp);
    ort_alloc.Free(tmp);
    tmp = session_.GetOutputName(0, ort_alloc);
    output_names[0] = strdup(tmp);
    ort_alloc.Free(tmp);
    tmp = session_.GetOutputName(1, ort_alloc);
    output_names[1] = strdup(tmp);
    ort_alloc.Free(tmp);
    tmp = session_.GetOutputName(2, ort_alloc);
    output_names[2] = strdup(tmp);
    ort_alloc.Free(tmp);
   
    Ort::TypeInfo info = session_.GetInputTypeInfo(0); 
    auto tensor_info = info.GetTensorTypeAndShapeInfo();
    size_t dim_count = tensor_info.GetDimensionsCount();
    std::vector<int64_t> dims(dim_count);
    tensor_info.GetDimensions(dims.data(), dims.size());
    channels_ = dims[1];
    height_ = dims[2];
    width_  = dims[3];
    input_shape_[0]= dims[0];
    input_shape_[1]= channels_;
    input_shape_[2]= height_; 
    input_shape_[3]= width_;


    info = session_.GetOutputTypeInfo(0); 
    auto tensor_info2 = info.GetTensorTypeAndShapeInfo();
    dim_count = tensor_info.GetDimensionsCount();
    dims.clear();
    dims.resize(dim_count);
    tensor_info2.GetDimensions(dims.data(), dims.size());
    for (int i=0; i< dims.size();i++)
      output_shape_regression[i] = dims[i];

  
    info = session_.GetOutputTypeInfo(1); 
    auto tensor_info3 = info.GetTensorTypeAndShapeInfo();
    dim_count = tensor_info3.GetDimensionsCount();
    dims.clear();
    dims.resize(dim_count);
    tensor_info3.GetDimensions(dims.data(), dims.size());
    for (int i=0; i< dims.size();i++)
      output_shape_classification[i] = dims[i];

    numClassScores_ = static_cast<unsigned int>(dims[2]);
    output_shape_classification[3] = numClassScores_;
    ...

    info = session_.GetOutputTypeInfo(2); 
    auto tensor_info4 = info.GetTensorTypeAndShapeInfo();
    dim_count = tensor_info4.GetDimensionsCount();
    dims.clear();
    dims.resize(dim_count);
    tensor_info4.GetDimensions(dims.data(), dims.size());
    for (int i=0; i< dims.size();i++)
      output_shape_anchors[i] = dims[i];
    ... 
    
    int size_anchors = dims[1] * dims[2];
    int size_classification = dims[1];
    results_regression = new float[size_anchors];
    results_classification = new float[size_classification];
    results_anchors = new float[size_anchors];  
 
    int size_image_data = channels_ * width_ * height_;
    input_image_ = new float[size_image_data];
    auto memory_info = Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU);
    input_tensor_ = Ort::Value::CreateTensor<float>(memory_info, input_image_, size_image_data, input_shape_.data(), input_shape_.size());
    output_tensor_[0] = Ort::Value::CreateTensor<float>(memory_info, results_regression, size_anchors, 
                                       output_shape_regression.data(), output_shape_regression.size());
    output_tensor_[1] = Ort::Value::CreateTensor<float>(memory_info, results_classification, size_classification,
                                       output_shape_classification.data(), output_shape_classification.size());
    output_tensor_[2] = Ort::Value::CreateTensor<float>(memory_info, results_anchors, size_anchors,
                                       output_shape_anchors.data(), output_shape_anchors.size());
  }

...

std::vector<std::vector<float>> EfficientDetOnnxRT::infer(cv::Mat& cv_img, float nms_threshold, float score_threshold ) {
    memset(input_image_,0, channels_ * height_ * width_ * sizeof(float));
    processInput(input_image_,cv_img);   

    session_.Run(Ort::RunOptions{nullptr}, &input_names[0], &input_tensor_, 1, &output_names[0], &output_tensor_[0], 3); 
    return 
 processOutput(results_regression,results_classification,results_anchors,nms_threshold,score_threshold);
  }

...

C++代码的头文件需要包含onnxruntime_cxx_api.h这个onnxruntime的C++头文件

#include <core/session/onnxruntime_cxx_api.h>
另外，用到指定provider的话，还需要包含相关provider的头文件:

#include "providers.h"

可以把samples/c_cxx/include/providers.h这个文件拷贝到自己的项目中予以包含，因为这个文件是sample代码，并不在onnxruntime/include/onnxruntime/core/下,它的内容是：

#include "onnxruntime/core/providers/cpu/cpu_provider_factory.h"

#ifdef USE_CUDA
#include "onnxruntime/core/providers/cuda/cuda_provider_factory.h"
#endif
#ifdef USE_DNNL
#include "onnxruntime/core/providers/dnnl/dnnl_provider_factory.h"
#endif
#ifdef USE_NGRAPH
#include "onnxruntime/core/providers/ngraph/ngraph_provider_factory.h"
#endif
#ifdef USE_NUPHAR
#include "onnxruntime/core/providers/nuphar/nuphar_provider_factory.h"
#endif
#ifdef USE_TENSORRT
#include "onnxruntime/core/providers/tensorrt/tensorrt_provider_factory.h"
#include "onnxruntime/core/providers/cuda/cuda_provider_factory.h"
#endif
#ifdef USE_DML
#include "onnxruntime/core/providers/dml/dml_provider_factory.h"
#endif
#ifdef USE_MIGRAPHX
#include "onnxruntime/core/providers/migraphx/migraphx_provider_factory.h"
#endif

包含了provider.h后，在Makefile里CFLAGS记得加上 -DUSE_TENSORRT

上面的代码中input/output的传入/出仍然采用老式的简单办法，嫌麻烦没有采用iobinding来写，多个output的names和tensors传入Run()时注意取地址的方式，弄错了是编译不过的，为何要这样取地址，而不能像C++ sample代码里那样写呢，翻看Session类的Run()的底层实现代码就明白了。

C++版的iobinding传input/output的用法可以参考API说明和python版示例代码里的写法翻译成C++的写法就是了。

关于env和session的定义也是有个坑，不要定义在类的构造函数内部，而是要定义成类成员，这样才能全局可用，否则infer()用到session时就会报错:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000007f810c4240 in onnxruntime::logging::LoggingManager::Log(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, onnxruntime::logging::Capture const&) const () from /home/dev/app-ext/lib/libonnxruntime.so.1.5.2
[Current thread is 1 (Thread 0x7f52036010 (LWP 31523))]
(gdb) bt
#0  0x0000007f810c4240 in onnxruntime::logging::LoggingManager::Log(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, onnxruntime::logging::Capture const&) const () from /home/dev/app-ext/lib/libonnxruntime.so.1.5.2
#1  0x0000007f810c3c3c in onnxruntime::logging::Capture::~Capture() () from /home/dev/app-ext/lib/libonnxruntime.so.1.5.2
#2  0x0000007f8104c6b0 in onnxruntime::SequentialExecutor::Execute(onnxruntime::SessionState const&, std::vector<int, std::allocator<int> > const&, std::vector<OrtValue, std::allocator<OrtValue> > const&, std::vector<int, std::allocator<int> > const&, std::vector<OrtValue, std::allocator<OrtValue> >&, std::unordered_map<unsigned long, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)>, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)> > > > const&, onnxruntime::logging::Logger const&) ()
   from /home/dev/app-ext/lib/libonnxruntime.so.1.5.2
#3  0x0000007f81039044 in onnxruntime::utils::ExecuteGraphImpl(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager const&, std::vector<OrtValue, std::allocator<OrtValue> > const&, std::vector<OrtValue, std::allocator<OrtValue> >&, std::unordered_map<unsigned long, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)>, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)> > > > const&, ExecutionMode, bool const&, onnxruntime::logging::Logger const&, bool) ()
   from /home/dev/app-ext/lib/libonnxruntime.so.1.5.2
#4  0x0000007f8103a6ec in onnxruntime::utils::ExecuteGraph(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager&, std::vector<OrtValue, std::allocator<OrtValue> > const&, std::vector<OrtValue, std::allocator<OrtValue> >&, ExecutionMode, bool const&, onnxruntime::logging::Logger const&, bool) () from /home/dev/app-ext/lib/libonnxruntime.so.1.5.2
#5  0x0000007f808084ac in onnxruntime::InferenceSession::Run(OrtRunOptions const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<OrtValue, std::allocator<OrtValue> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<OrtValue, std::allocator<OrtValue> >*, std::vector<OrtDevice, std::allocator<OrtDevice> > const*) () from /home/dev/app-ext/lib/libonnxruntime.so.1.5.2
#6  0x0000007f807d1e0c in OrtApis::Run(OrtSession*, OrtRunOptions const*, char const* const*, OrtValue const* const*, unsigned long, char const* const*, unsigned long, OrtValue**) ()
   from /home/dev/app-ext/lib/libonnxruntime.so.1.5.2
#7  0x0000005562bbc0a0 in Ort::Session::Run (this=0x558186df88, run_options=..., input_names=0x558186df50, input_values=0x558186df60, input_count=1, output_names=0x558186df90,
    output_values=0x558186dfa8, output_count=3) at /home/dev/laundry-app/ext/inc/onnxruntime/core/session/onnxruntime_cxx_inline.h:507
#8  0x0000005562bb9cd4 in EfficientDetOnnxRT::infer (this=0x558186df40, cv_img=..., nms_threshold=0.100000001, score_threshold=0.100000001) at effonnxrt.cpp:375
#9  0x0000005562bb56d8 in main (argc=1, argv=0x7fe2742d38) at test.cpp:21

把调用onnxruntime运行模型进行图像识别的应用程序编译出来后，如果是使用了TensorRT这个Provider，在第一次启动时会非常慢，我在Nano上是需要10分钟左右，原因就是TensorRT在解析和构建网络序列化生成engine文件非常慢，这个序列化engine文件默认是没有保存的，需要在运行程序前设置一下环境变量:

export ORT_TENSORRT_ENGINE_CACHE_ENABLE=1
export ORT_TENSORRT_ENGINE_CACHE_PATH="/home/dev/model"

然后TensorRT在启动时会在上面设置的路径/home/dev/model/下生成一个序列化了的engine文件，名字类似这样：

TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_0_0_706245698641377.engine

下次启动时只要设置了上面两个环境变量，那么TensorRT会自动读取这个engine文件，这样就略过了onnx解析等步骤，很快就启动完毕。

关于使用TensorRT Provider时，有哪些环境变量可设置及其意义，参见https://github.com/microsoft/onnxruntime/blob/master/docs/execution_providers/TensorRT-ExecutionProvider.md，环境变量用得比较多点的除了上面的两个外，恐怕就是下面的几个了，可根据实际情况设置:

#override default max workspace size to 2GB
export ORT_TENSORRT_MAX_WORKSPACE_SIZE=2147483648  
#Enable FP16 mode in TensorRT
export ORT_TENSORRT_FP16_ENABLE=1
#Enable INT8 mode in TensorRT
#export ORT_TENSORRT_INT8_ENABLE=1

我测试了一下，使用了TensorRT的Provider时，做图像识别的推理速度会比CPU版快了不少，使用CUDA Provider时推理的速度居然比使用默认的CPU Provider(不设置任何Provider时默认使用它)时推理速度还慢，搞不懂为何，不过反正只是用来验证和对比模型精度，不是实际项目应用也就无所谓了。

Arnold-FY-Chen

关注

12
点赞
踩
53

收藏

觉得还不错? 一键收藏
打赏
25
评论
onnxruntime调用AI模型的python和C++编程

python版的onnxruntime是比较容易使用的，先保证pip更新到最新再安装onnxruntime:pip install --upgrade pip#安装cpu版pip install onnxruntime#或者安装gpu版#pip install onnxruntime-gpu只是用来验证模型的话，用cpu版的就很好了,比较简单易用。注意两种版本不要同时安装，否则调用时怎么弄都出错，说参数不对：incompatible constructor arguments. .
复制链接

扫一扫