Using onnxruntime with C++ (Part 1)

SongpingWang

Last modified 2022-03-04 11:19:17

First published 2022-03-04 11:05:51

Original link: https://onnxruntime.ai/docs/reference/ort-model-format.html

1. Introduction

Official website: https://onnxruntime.ai/

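What is the ORT model format?

The ORT model format is a reduced-size model format for constrained environments such as mobile and web applications. ONNX Runtime provides tools to convert ONNX models to ORT format.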

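Converting ONNX models to ORT format

Use the convert_onnx_models_to_ort script to convert ONNX models to ORT format.
The conversion script does two things:

Loads and optimizes ONNX format models, and saves them in ORT format
Determines the operators, and optionally the data types, required by the optimized models, and saves them in a configuration file for use in a reduced operator build if needed
The conversion script can run on a single ONNX model or on a directory. If run against a directory, the directory is searched recursively for '.onnx' files to convert.
Each '.onnx' file is loaded, optimized, and saved in ORT format as a file with the '.ort' extension, in the same location as the original '.onnx' file.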

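ONNX models can be obtained from the ONNX Model Zoo, among many other places.
Once a model has been obtained in, or converted to, ONNX format, a further step is needed to optimize it for mobile deployment. Converting the model to ORT format reduces model binary size, speeds up initialization, and lowers peak memory usage.
ONNX Runtime Mobile uses the ORT model format, which makes it possible to create custom ORT builds that minimize binary size and reduce memory usage for client-side inference. ORT format model files are generated from regular ONNX models using the onnxruntime python package.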

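Outputs of the script

One ORT format model for each ONNX model
A build configuration file ('required_operators.config') listing the operators required by the optimized ONNX models.
If type reduction is enabled (ONNX Runtime version 1.7 or later), the configuration file also includes the types required by each operator and is called 'required_operators_and_types.config'.
If you are using a pre-built ONNX Runtime iOS, Android, or web package, the build configuration file is not used and can be ignored.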

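Script location

The ORT model format is supported by ONNX Runtime version 1.5.2 or later.
Converting an ONNX format model to ORT format uses the ONNX Runtime python package, because the model is loaded into ONNX Runtime and optimized as part of the conversion process.
For ONNX Runtime version 1.8 and later, the conversion script is run directly from the ONNX Runtime python package.
For earlier versions, the conversion script is run from a local ONNX Runtime repository.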

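2. Installing ONNX Runtime

Install the onnxruntime python package from https://pypi.org/project/onnxruntime/ in order to convert models from ONNX format to the internal ORT format. Version 1.5.3 or higher is required.
Install the latest release
pip install onnxruntime
Install a previous release
If you are building ONNX Runtime from source (custom, reduced, or minimal builds), the python package version must match the branch of the ONNX Runtime repository you checked out.
For example, to use the 1.7 release: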

git checkout rel-1.7.2
pip install onnxruntime==1.7.2

If you are using the master branch of the git repository, you should use the nightly ONNX Runtime python package:

pip install -U -i https://test.pypi.org/simple/ ort-nightly

Usage of the ONNX to ORT conversion script

ONNX Runtime version 1.8 or later:

python -m onnxruntime.tools.convert_onnx_models_to_ort <onnx model file or dir>

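where:

onnx model file or dir is the path to a .onnx file, or to a directory containing one or more .onnx models
The current optional arguments can be listed by running the script with the --help argument. Supported arguments and their defaults differ slightly between ONNX Runtime versions.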

python -m onnxruntime.tools.convert_onnx_models_to_ort --help
usage: convert_onnx_models_to_ort.py [-h] [--use_nnapi]
      [--optimization_level {disable,basic,extended,all}]
      [--enable_type_reduction]
      [--custom_op_library CUSTOM_OP_LIBRARY] [--save_optimized_onnx_model]
      model_path_or_dir
Convert the ONNX model/s in the provided directory to ORT format models. 
All files with a `.onnx` extension will be processed. For each one, an ORT format model will be created in the same directory. 
A configuration file will also be created called `required_operators.config`, and will contain the list of required operators for all converted models. 
This configuration file should be used as input to the minimal build via the `--include_ops_by_config` parameter.
positional arguments:
  model_path_or_dir     Provide path to ONNX model or directory containing ONNX model/s to convert. All files with a .onnx extension, including in subdirectories, will be processed.
optional arguments:
  -h, --help            show this help message and exit
  --optimization_level {disable,basic,extended,all}
                        Level to optimize ONNX model with, prior to converting to ORT format model. 
                        These map to the onnxruntime.GraphOptimizationLevel values. 
                        If the level is 'all' the NCHWc transformer is manually disabled as it contains device specific logic, 
                        so the ORT format model must be generated on the device it will run on. 
                        Additionally, the NCHWc optimizations are not applicable to ARM devices.
  --enable_type_reduction
                        Add operator specific type information to the configuration file to potentially 
                        reduce the types supported by individual operator implementations.
  --custom_op_library CUSTOM_OP_LIBRARY
                        Provide path to shared library containing custom operator kernels to register.
  --save_optimized_onnx_model
                        Save the optimized version of each ONNX model. This will have the same optimizations 
                        applied as the ORT format model.

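Optional script arguments

Optimization level

Sets the optimization level that ONNX Runtime uses to optimize the model before saving it in ORT format.
For ONNX Runtime version 1.8 and later, all is recommended if the model will be run with the CPU execution provider (EP).
For earlier versions, extended is recommended, because the all level previously included device-specific optimizations that limit the portability of the model.
If the model will be run with the NNAPI EP or the CoreML EP, it is recommended to create the ORT format model with the basic optimization level. Performance testing should be done to compare running this model with the NNAPI or CoreML EP enabled against running a model optimized to a higher level with the CPU EP, in order to determine the best setup.
For more information, see the documentation on performance tuning for mobile scenarios.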
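
The recommendations above amount to a small decision table, which can be sketched as a C++ function. The enum and function names here are hypothetical and only illustrate the guidance; they are not part of ONNX Runtime, and the returned string is the value one would pass to --optimization_level.

```cpp
#include <string>

// Hypothetical enum for this sketch; not an ONNX Runtime type.
enum class ExecutionProvider { kCpu, kNnapi, kCoreML };

// Returns the --optimization_level value suggested by the guidance above
// for a given execution provider and ONNX Runtime version.
std::string SuggestOptimizationLevel(ExecutionProvider ep, int ort_major, int ort_minor) {
    // NNAPI and CoreML: start from 'basic', then compare against a
    // higher-level CPU EP model through performance testing.
    if (ep == ExecutionProvider::kNnapi || ep == ExecutionProvider::kCoreML) {
        return "basic";
    }
    // CPU EP: 'all' from 1.8 onward; 'extended' for earlier versions, where
    // 'all' included device-specific optimizations.
    const bool at_least_1_8 = ort_major > 1 || (ort_major == 1 && ort_minor >= 8);
    return at_least_1_8 ? "all" : "extended";
}
```

The result is only a starting point; the text above still calls for measuring both configurations on the target device.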

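Enable type reduction

With ONNX Runtime version 1.7 and later, the data types supported by the required operators can be limited to further reduce the build size. This pruning is referred to in this document as "operator type reduction". As the ONNX models are converted, the input and output data types required by each operator are accumulated and included in the configuration file.
If you wish to enable operator type reduction, the Flatbuffers python package must be installed.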

pip install flatbuffers

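For example, the ONNX Runtime kernel for Softmax supports both float and double. If your model uses Softmax but only with float data, the implementation that supports double can be excluded to reduce the binary size of the kernel.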

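Custom operator support

If your ONNX model uses custom operators, you must provide the path to the library containing the custom operator kernels so that the ONNX model can be loaded successfully. The custom operators are preserved in the ORT format model.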

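Save optimized ONNX model

Add this flag to save the optimized ONNX model. The optimized ONNX model contains the same nodes and initializers as the ORT format model, and can be viewed in Netron for debugging and performance tuning.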

3. Earlier versions of ONNX Runtime

Prior to ONNX Runtime version 1.8, the model conversion script must be run from the cloned source repository:

python <ONNX Runtime repository root>/tools/python/convert_onnx_models_to_ort.py <onnx model file or dir>

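Loading and executing an ORT format model

The API for executing an ORT format model is the same as for an ONNX model.
See the ONNX Runtime API documentation for details on using the individual APIs.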

APIs by platform

Platform	Available APIs
Android	C, C++, Java, Kotlin
iOS	C, C++, Objective-C (Swift via bridging)
Web	JavaScript

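ORT format model loading

If you provide a filename for the model, a '.ort' file extension is taken to mean that it is an ORT format model.
If you provide in-memory bytes for the model, a marker in those bytes is checked to infer whether it is an ORT format model.
If you wish to state explicitly that the InferenceSession input is an ORT format model, you can do so through SessionOptions, although this is generally not necessary.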
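
As a rough illustration of the two inference rules above, the checks might look like the following sketch. Both function names are hypothetical and this is not ONNX Runtime's own implementation. The byte check additionally assumes that the ORT format is a FlatBuffers file whose 4-character file identifier is "ORTM", stored at byte offsets 4 to 7; treat that marker as an assumption to verify against the ONNX Runtime source.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <filesystem>
#include <string>

// Hypothetical helper: true when the path ends in the '.ort' extension
// (case-sensitive in this sketch).
bool HasOrtExtension(const std::string& path) {
    return std::filesystem::path(path).extension().string() == ".ort";
}

// Hypothetical helper. ASSUMPTION: the marker is the FlatBuffers file
// identifier "ORTM", which FlatBuffers places right after the 4-byte
// root offset, i.e. at bytes 4..7 of the buffer.
bool LooksLikeOrtBytes(const uint8_t* data, std::size_t size) {
    return data != nullptr && size >= 8 && std::memcmp(data + 4, "ORTM", 4) == 0;
}
```

When neither check is reliable for a given input, setting session.load_model_format explicitly through SessionOptions avoids depending on inference at all.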

Loading an ORT format model from a file path

C++ API

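#include <onnxruntime_cxx_api.h>

// Explicitly mark the input as an ORT format model (usually optional).
Ort::SessionOptions session_options;
session_options.AddConfigEntry("session.load_model_format", "ORT");
Ort::Env env;
Ort::Session session(env, <path to model>, session_options);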

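Loading an ORT format model from an in-memory byte array

If a session is created from an input byte array containing the ORT format model data, by default the model bytes are copied at session creation time to ensure that the model bytes buffer remains valid.
You can also enable the option to use the model bytes directly by setting the Session Options config session.use_ort_model_bytes_directly to 1. This may reduce the peak memory usage of ONNX Runtime Mobile, but you must guarantee that the model bytes remain valid for the entire lifetime of the ORT session. For ONNX Runtime Web, this option is set by default.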
C++ API

#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>
#include <onnxruntime_cxx_api.h>

Ort::SessionOptions session_options;
session_options.AddConfigEntry("session.load_model_format", "ORT");
// Use the caller's buffer directly; model_bytes must outlive the session.
session_options.AddConfigEntry("session.use_ort_model_bytes_directly", "1");
std::ifstream stream(<path to model>, std::ios::in | std::ios::binary);
std::vector<uint8_t> model_bytes((std::istreambuf_iterator<char>(stream)), std::istreambuf_iterator<char>());
Ort::Env env;
Ort::Session session(env, model_bytes.data(), model_bytes.size(), session_options);

C/C++ examples: https://github.com/microsoft/onnxruntime-inference-examples/tree/main/c_cxx