Getting Started with Qualcomm AI Hub, Part 1: Compiling Models

Article link: https://blog.csdn.net/weixin_62707802/article/details/142360579

This is a translation of the official documentation. Original: Device — qai-hub documentation (qualcomm.com)

Qualcomm AI Hub supports compiling models produced by the following:

PyTorch
ONNX
AI Model Efficiency Toolkit (AIMET) quantized models.
TensorFlow (via ONNX)
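In the examples that follow, each of these inputs shows up as a particular kind of path: a `.pt` file for TorchScript, a `.onnx` file for ONNX, and a `.aimet` directory for AIMET models. The minimal sketch below encodes that convention. The helper name `detect_model_format` and its format labels are hypothetical (they are not part of qai_hub), and TensorFlow models are assumed to have been exported to ONNX first.

```python
from pathlib import Path

# Hypothetical mapping from path suffix to input format, following the
# file conventions used in this guide (not a qai_hub API).
_SUFFIX_TO_FORMAT = {
    ".pt": "torchscript",  # traced/scripted PyTorch model saved with torch.jit.save
    ".onnx": "onnx",       # ONNX model (TensorFlow is assumed to be exported to ONNX first)
    ".aimet": "aimet",     # directory holding an AIMET model plus its .encodings file
}


def detect_model_format(path: str) -> str:
    """Return a format label for `path`, or raise ValueError if it is unrecognized."""
    suffix = Path(path).suffix.lower()
    try:
        return _SUFFIX_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f"unsupported model file: {path!r}") from None


if __name__ == "__main__":
    for name in ["mobilenet_v2.pt", "mobilenet_v2.onnx", "mobilenet_v2_onnx.aimet"]:
        print(name, "->", detect_model_format(name))
```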

Any of the above models can be compiled for the following target runtimes:

TensorFlow Lite (recommended for Android developers)
ONNX Runtime (recommended for Windows developers)
Qualcomm® AI Engine Direct context binary (SOC-specific)
Qualcomm® AI Engine Direct model library (operating system-specific)
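In the examples below, the target runtime is chosen through the `options` string passed to `submit_compile_job()`, and the TensorFlow Lite examples simply omit it. The sketch below collects the runtime values that appear in this guide and assembles an `options` string from them. `build_compile_options` and `KNOWN_TARGET_RUNTIMES` are hypothetical conveniences, not part of qai_hub, and the sketch assumes that an empty `options` string behaves the same as leaving `options` out.

```python
from typing import Dict, Optional

# Target runtime values used in this guide. `None` means "omit --target_runtime",
# which is how the guide's TensorFlow Lite examples are written.
KNOWN_TARGET_RUNTIMES: Dict[Optional[str], str] = {
    None: "TensorFlow Lite",
    "onnx": "ONNX Runtime",
    "qnn_context_binary": "QNN context binary (SoC-specific)",
    "qnn_lib_aarch64_android": "QNN model library for ARM64 Android (OS-specific)",
    "precompiled_qnn_onnx": "ONNX model wrapping a precompiled QNN context binary",
}


def build_compile_options(target_runtime: Optional[str] = None, *extra_flags: str) -> str:
    """Assemble an `options` string for hub.submit_compile_job (hypothetical helper)."""
    if target_runtime not in KNOWN_TARGET_RUNTIMES:
        raise ValueError(f"runtime not covered by this guide: {target_runtime!r}")
    parts = []
    if target_runtime is not None:
        parts += ["--target_runtime", target_runtime]
    parts += extra_flags  # e.g. "--quantize_full_type int8", as in the AIMET example
    return " ".join(parts)
```

For example, `build_compile_options("qnn_context_binary")` yields the same string that section 3 passes as `options`.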

1. Compiling PyTorch to TensorFlow Lite

To compile a PyTorch model, we must first generate a TorchScript model in memory using PyTorch's jit.trace method. Once traced, the model can be compiled with the submit_compile_job() API.

from typing import Tuple

import torch
import torchvision

import qai_hub as hub

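# Using pre-trained MobileNet (`weights=` replaces the deprecated `pretrained=True` in torchvision >= 0.13)
torch_model = torchvision.models.mobilenet_v2(
    weights=torchvision.models.MobileNet_V2_Weights.DEFAULT
)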
torch_model.eval()

# Trace model
input_shape: Tuple[int, ...] = (1, 3, 224, 224)
example_input = torch.rand(input_shape)
pt_model = torch.jit.trace(torch_model, example_input)

# Compile model on a specific device
compile_job = hub.submit_compile_job(
    pt_model,
    name="MyMobileNet",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    input_specs=dict(image=input_shape),
)

assert isinstance(compile_job, hub.CompileJob)

Compiling a TorchScript model

If you have already saved a traced or scripted torch model (with torch.jit.save), you can submit it directly. We will use mobilenet_v2.pt as an example.

import qai_hub as hub

# Compile a model
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    input_specs=dict(image=(1, 3, 224, 224)),
)
assert isinstance(compile_job, hub.CompileJob)

# Profile the compiled model
profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
assert isinstance(profile_job, hub.ProfileJob)

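2. Compiling a PyTorch model to a QNN model library

Qualcomm AI Hub supports compiling and profiling TorchScript models to a QNN model library. In this example, we use mobilenet_v2.pt and compile it to a QNN model library (a .so file) for the ARM64 Android platform (aarch64_android).

A model library is an operating-system-specific deployment mechanism that is SoC-agnostic. Note that the Qualcomm® AI Engine Direct SDK does not guarantee that model libraries are ABI-compatible with every version of the SDK. See Qualcomm® AI Engine Direct options.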
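Because a model library is OS-specific, the target platform is encoded in the runtime value itself, while no SoC appears in it. The sketch below splits a model-library runtime value into architecture and OS. It assumes a `qnn_lib_<arch>_<os>` naming pattern inferred from `qnn_lib_aarch64_android`; that pattern, the `QnnLibTarget` type, and the `parse_qnn_lib_target` helper are illustrative assumptions, not documented qai_hub behavior.

```python
from typing import NamedTuple


class QnnLibTarget(NamedTuple):
    arch: str  # CPU architecture / ABI the .so is built for, e.g. "aarch64"
    os: str    # operating system, e.g. "android"


def parse_qnn_lib_target(target_runtime: str) -> QnnLibTarget:
    """Split 'qnn_lib_<arch>_<os>' into its parts (assumed naming pattern)."""
    prefix = "qnn_lib_"
    if not target_runtime.startswith(prefix):
        raise ValueError(f"not a QNN model-library runtime: {target_runtime!r}")
    # rpartition splits on the last underscore, so an architecture name that
    # itself contains underscores (hypothetically "x86_64") stays intact.
    arch, sep, os_name = target_runtime[len(prefix):].rpartition("_")
    if not sep or not arch or not os_name:
        raise ValueError(f"unexpected runtime format: {target_runtime!r}")
    return QnnLibTarget(arch=arch, os=os_name)
```

`parse_qnn_lib_target("qnn_lib_aarch64_android")` gives `QnnLibTarget(arch='aarch64', os='android')`; note that nothing in the result identifies a chipset, which mirrors the SoC-agnostic nature of model libraries.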

import qai_hub as hub

# Compile a model to a QNN Model Library
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Samsung Galaxy S23"),
    options="--target_runtime qnn_lib_aarch64_android",
    input_specs=dict(image=(1, 3, 224, 224)),
)
assert isinstance(compile_job, hub.CompileJob)

The return value is an instance of CompileJob. See the example here for how to compile this model for the Snapdragon® Neural Processing Unit (NPU).

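3. Compiling a PyTorch model to a QNN context binary

Qualcomm® AI Hub supports compiling and profiling PyTorch models to QNN context binaries. In this example, we use mobilenet_v2.pt and compile it to a QNN context binary optimized to run on a specific device. Because context binaries are optimized specifically for the target hardware, each one can be compiled for only a single device.

A context binary is an SoC-specific deployment mechanism. When a model is compiled for a device, it is expected to be deployed on that same device. The format is OS-agnostic, so the same model can be deployed on Android, Linux, or Windows. Context binaries are designed for the NPU only.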
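The portability rules for the two QNN artifact types can be summarized as a small check: a context binary is tied to the SoC it was compiled for but not to the OS, and runs only on the NPU, while a model library is tied to the OS but not to the SoC. This is a sketch of those rules as stated in this guide only; the `Target` type, the `can_deploy` function, and the SoC names in the usage are hypothetical, and real deployments may add constraints such as SDK/ABI versions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    soc: str           # chipset the artifact was compiled for / will run on
    os: str            # e.g. "android", "linux", "windows"
    compute_unit: str  # e.g. "npu", "gpu", "cpu"


def can_deploy(runtime: str, compiled_for: Target, deploy_on: Target) -> bool:
    """Apply this guide's portability rules for the two QNN artifact types."""
    if runtime == "qnn_context_binary":
        # SoC-specific, OS-agnostic, NPU-only.
        return deploy_on.soc == compiled_for.soc and deploy_on.compute_unit == "npu"
    if runtime.startswith("qnn_lib_"):
        # OS-specific, SoC-agnostic.
        return deploy_on.os == compiled_for.os
    raise ValueError(f"no rule for runtime {runtime!r}")


if __name__ == "__main__":
    built = Target("soc_a", "android", "npu")       # hypothetical SoC names
    print(can_deploy("qnn_context_binary", built, Target("soc_a", "windows", "npu")))  # True
    print(can_deploy("qnn_context_binary", built, Target("soc_b", "android", "npu")))  # False
```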

import qai_hub as hub

# Compile a model to QNN context binary
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Samsung Galaxy S23"),
    options="--target_runtime qnn_context_binary",
    input_specs=dict(image=(1, 3, 224, 224)),
)
assert isinstance(compile_job, hub.CompileJob)

The return value is an instance of CompileJob. See the example here for how to compile this model for the Snapdragon® Neural Processing Unit (NPU).

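4. Compiling to precompiled QNN ONNX

Qualcomm® AI Hub supports compiling and profiling precompiled QNN ONNX models. This is an ONNX Runtime-compatible model that contains a precompiled QNN binary, which can be run on Snapdragon devices with ONNX Runtime. More details are documented here.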

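Advantages of using precompiled QNN ONNX:

Easy deployment: works on Android, Linux, or Windows.

Performance gain: equivalent to a QNN context binary.

Simple inference code: ONNX Runtime runs inference on the compiled model through the QNN Execution Provider.

Large models: suitable for large models (>2 GB) such as LLMs, Stable Diffusion, and so on.

Note that QNN context binaries are OS-agnostic but device-specific. In addition, context binaries are designed for the NPU only.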

import qai_hub as hub

# Compile a model to a precompiled QNN ONNX model
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Samsung Galaxy S23"),
    options="--target_runtime precompiled_qnn_onnx",
    input_specs=dict(image=(1, 3, 224, 224)),
)
assert isinstance(compile_job, hub.CompileJob)

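The compiled model is a zipped directory (with the extension .onnx) containing the ONNX model and the QNN context binary. When profiling the ONNX model, the ONNX model and the QNN context binary must be in the same directory. In addition, because a relative file path is embedded in the ONNX model, changing this layout (for example, renaming the files) will lead to unexpected failures.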

<model>.onnx
   |-- <model>.onnx
   +-- <model>.bin
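Since the ONNX model locates its context binary through a relative path, it can help to check the unpacked layout before profiling or deploying it. The minimal sketch below uses only the standard library. It assumes the downloaded artifact has already been unzipped into a `<model>.onnx` directory as in the diagram above, and it only checks for exactly one `.onnx` file with at least one `.bin` beside it rather than assuming exact file names; `check_precompiled_qnn_onnx` is a hypothetical helper.

```python
from pathlib import Path


def check_precompiled_qnn_onnx(model_dir: str) -> Path:
    """Verify that an unpacked precompiled QNN ONNX directory keeps the .onnx
    and .bin files side by side, and return the path of the inner .onnx file."""
    root = Path(model_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"expected an unpacked directory: {root}")
    onnx_files = sorted(p for p in root.glob("*.onnx") if p.is_file())
    bin_files = sorted(p for p in root.glob("*.bin") if p.is_file())
    if len(onnx_files) != 1:
        raise FileNotFoundError(
            f"expected exactly one .onnx file in {root}, found {len(onnx_files)}"
        )
    if not bin_files:
        # The relative path inside the ONNX model would not resolve.
        raise FileNotFoundError(f"no QNN context binary (.bin) next to {onnx_files[0].name}")
    return onnx_files[0]
```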

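5. Compiling a PyTorch model for ONNX Runtime

Qualcomm® AI Hub supports compiling PyTorch models for ONNX Runtime. In this example, we use mobilenet_v2.pt and compile it to an ONNX model. This model can then be profiled with ONNX Runtime.

ONNX Runtime supports running on the CPU, the GPU (with the DML execution provider), or the NPU (with the QNN execution provider).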
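On the device, the compute unit is picked by the execution providers passed when the ONNX Runtime session is created. The sketch below maps the three units mentioned above to ONNX Runtime's provider names (`CPUExecutionProvider`, `DmlExecutionProvider`, `QNNExecutionProvider`) and keeps the CPU as a fallback. `providers_for` is a hypothetical helper; which providers actually exist depends on the installed onnxruntime build, and the QNN provider may also need provider options (such as `backend_path`) that are not shown here.

```python
from typing import Iterable, List, Optional

# ONNX Runtime execution provider names for each compute unit named in this guide.
PROVIDER_BY_UNIT = {
    "cpu": "CPUExecutionProvider",
    "gpu": "DmlExecutionProvider",  # DirectML
    "npu": "QNNExecutionProvider",  # Qualcomm QNN
}


def providers_for(unit: str, available: Optional[Iterable[str]] = None) -> List[str]:
    """Return a provider priority list for `unit`, ending with the CPU fallback.

    If `available` is given (for example onnxruntime.get_available_providers()),
    providers missing from the installed build are dropped.
    """
    preferred = [PROVIDER_BY_UNIT[unit]]
    if unit != "cpu":
        preferred.append(PROVIDER_BY_UNIT["cpu"])
    if available is not None:
        installed = set(available)
        preferred = [p for p in preferred if p in installed]
    return preferred


# Possible usage on a device with onnxruntime installed:
#   import onnxruntime as ort
#   session = ort.InferenceSession(
#       "model.onnx", providers=providers_for("npu", ort.get_available_providers()))
```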

import qai_hub as hub

# Compile a model to an ONNX model
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Samsung Galaxy S23"),
    options="--target_runtime onnx",
    input_specs=dict(image=(1, 3, 224, 224)),
)
assert isinstance(compile_job, hub.CompileJob)

6. Compiling an ONNX model to TensorFlow Lite or QNN

Qualcomm® AI Hub also supports compiling ONNX models to TensorFlow Lite or a QNN model library. We will use mobilenet_v2.onnx as an example.

import qai_hub as hub

# Compile a model to TensorFlow Lite
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.onnx",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
assert isinstance(compile_job, hub.CompileJob)

# Compile a model to a QNN Model Library
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.onnx",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    options="--target_runtime qnn_lib_aarch64_android",
)
assert isinstance(compile_job, hub.CompileJob)
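The two calls above differ only in `options`. When one ONNX model needs several targets, a small loop keeps that difference explicit. This sketch takes the `qai_hub` module as a parameter so it can be exercised with a stand-in object; `compile_for_targets` is a hypothetical helper that uses only the `hub.Device(...)` and `hub.submit_compile_job(model=..., device=..., options=...)` calls already shown in this guide, and it assumes an empty `options` string behaves like omitting `options` (the TensorFlow Lite case).

```python
from typing import Dict, Iterable, Optional


def compile_for_targets(hub, model: str, device_name: str,
                        runtimes: Iterable[Optional[str]]) -> Dict[Optional[str], object]:
    """Submit one compile job per target runtime and return {runtime: job}.

    `hub` is expected to be the qai_hub module. `None` in `runtimes` means
    "no --target_runtime flag", which this guide uses for TensorFlow Lite.
    """
    device = hub.Device(device_name)
    jobs = {}
    for runtime in runtimes:
        options = "" if runtime is None else f"--target_runtime {runtime}"
        jobs[runtime] = hub.submit_compile_job(model=model, device=device, options=options)
    return jobs


# Usage with the real client (requires a configured Qualcomm AI Hub account):
#   import qai_hub as hub
#   jobs = compile_for_targets(hub, "mobilenet_v2.onnx",
#                              "Samsung Galaxy S23 (Family)",
#                              [None, "qnn_lib_aarch64_android"])
```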

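7. Compiling AIMET-quantized models to TensorFlow Lite or QNN

The AI Model Efficiency Toolkit (AIMET) is an open-source library that provides advanced model quantization and compression techniques for trained neural network models. AIMET's QuantizationSimModel can be exported to one of the following:

An ONNX model (.onnx) plus an encodings file with quantization parameters (.encodings). (Recommended)
A TorchScript model (.pt) plus an encodings file with quantization parameters (.encodings).

To use these models, create a directory with the .aimet extension. It should contain a .pt or .onnx model and the corresponding encodings file.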

<model>.aimet
   |-- <model>.onnx  or .pt file
   +-- <model>.encodings

where <model> can be any name.
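If the AIMET export left the model and its encodings as two loose files, the `.aimet` directory can be assembled with the standard library. This is a minimal sketch: `make_aimet_dir` and the example file names are hypothetical, naming the directory after the model file is an arbitrary choice (any name works, as noted above), and it only enforces the layout described here, one `.onnx` or `.pt` model plus one `.encodings` file.

```python
import shutil
from pathlib import Path


def make_aimet_dir(model_file: str, encodings_file: str, out_dir: str) -> Path:
    """Copy an exported AIMET model and its encodings into <out_dir>/<name>.aimet/."""
    model = Path(model_file)
    encodings = Path(encodings_file)
    if model.suffix not in (".onnx", ".pt"):
        raise ValueError(f"model must be .onnx or .pt, got {model.name}")
    if encodings.suffix != ".encodings":
        raise ValueError(f"encodings file must end in .encodings, got {encodings.name}")
    target = Path(out_dir) / f"{model.stem}.aimet"
    target.mkdir(parents=True, exist_ok=True)
    shutil.copy2(model, target / model.name)
    shutil.copy2(encodings, target / encodings.name)
    return target
```

The returned directory path (as a string) is what gets passed as `model=` to `submit_compile_job()`, in the same way as `mobilenet_v2_onnx.aimet` in the example.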

Let's take mobilenet_v2_onnx.aimet.zip as an example. After unzipping it into the mobilenet_v2_onnx.aimet directory, we can submit a compile job with the code below:

import qai_hub as hub

# Compile to TensorFlow Lite
compile_job = hub.submit_compile_job(
    model="mobilenet_v2_onnx.aimet",
    device=hub.Device("Samsung Galaxy S23"),
)
assert isinstance(compile_job, hub.CompileJob)

# Compile to a QNN Model Library
compile_job = hub.submit_compile_job(
    model="mobilenet_v2_onnx.aimet",
    device=hub.Device("Samsung Galaxy S23"),
    options="--target_runtime qnn_lib_aarch64_android --quantize_full_type int8",
)
assert isinstance(compile_job, hub.CompileJob)