Qualcomm® AI Engine Direct User Manual (26)

weixin_38498942

Article link: https://blog.csdn.net/weixin_38498942/article/details/135678495

- 8.2 Advanced
  - 8.2.1 QNN HTP Shared Buffer Tutorial
  - 8.2.2 Executing with a DLC

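8.2 Advanced

8.2.1 QNN HTP Shared Buffer Tutorial

Introduction
This tutorial describes how to use data buffers for shared access between processing domains in the QNN HTP backend. Using shared buffers eliminates data copies between client code on the host CPU and the HTP accelerator.

The HTP backend supports two types of shared memory: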

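Qnn_MemDescriptor_t type	QnnMemHtp_Descriptor_t type	Description
QNN_MEM_TYPE_ION	N/A	1. Each tensor is mapped to its own shared buffer; 2. One-to-one relationship between file descriptors and memory handles
QNN_MEM_TYPE_CUSTOM	QNN_HTP_MEM_SHARED_BUFFER	1. Multiple tensors are mapped to a single shared buffer; 2. One-to-many relationship between file descriptors and memory handles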

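Note
This tutorial focuses only on the use of shared buffers. The SDK example code has some prerequisites that are not discussed in detail here; refer to the corresponding sections of the QNN documentation or to the SampleApp.

SampleApp documentation: Sample App Tutorial

Sample app code: ${QNN_SDK_ROOT}/examples/QNN/SampleApp

Loading the prerequisite shared library
Hardware devices with Qualcomm chipsets include a shared library that provides the functions for shared buffer operations.

Loading the shared library
The libcdsprpc.so shared library is available on most mainstream devices with Qualcomm chipsets (SD888 and later).

It can be loaded dynamically as follows: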

#include <dlfcn.h>

void* libCdspHandle = dlopen("libcdsprpc.so", RTLD_NOW | RTLD_LOCAL);

if (nullptr == libCdspHandle) {
  // handle errors; dlerror() returns a description of the failure
}

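Resolving symbols
After the shared library has loaded successfully, we can proceed to resolve all required symbols.

The following snippet shows a template for resolving the symbols from the shared library: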

#include <cstdint>
#include <dlfcn.h>

/**
* Definition: void* rpcmem_alloc(int heapid, uint32 flags, int size);
* Allocate a buffer via ION and register it with the FastRPC framework.
* @param[in] heapid  Heap ID to use for memory allocation.
* @param[in] flags   ION flags to use for memory allocation.
* @param[in] size    Buffer size to allocate.
* @return            Pointer to the buffer on success; NULL on failure.
*/
typedef void *(*RpcMemAllocFn_t)(int, uint32_t, int);

/**
* Definition: void rpcmem_free(void* po);
* Free a buffer and ignore invalid buffers.
*/
typedef void (*RpcMemFreeFn_t)(void *);

/**
* Definition: int rpcmem_to_fd(void* po);
* Return an associated file descriptor.
* @param[in] po  Data pointer for an RPCMEM-allocated buffer.
* @return        Buffer file descriptor.
*/
typedef int (*RpcMemToFdFn_t)(void *);

// Resolve each entry point with its matching function-pointer type
RpcMemAllocFn_t rpcmem_alloc = (RpcMemAllocFn_t)dlsym(libCdspHandle, "rpcmem_alloc");
RpcMemFreeFn_t rpcmem_free = (RpcMemFreeFn_t)dlsym(libCdspHandle, "rpcmem_free");
RpcMemToFdFn_t rpcmem_to_fd = (RpcMemToFdFn_t)dlsym(libCdspHandle, "rpcmem_to_fd");
if (nullptr == rpcmem_alloc || nullptr == rpcmem_free || nullptr == rpcmem_to_fd) {
    dlclose(libCdspHandle);
    libCdspHandle = nullptr;
    // handle errors
}
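Because every error path above has to remember to call dlclose(), it can help to tie the handle's lifetime to a scope. The following is a minimal sketch of that idea using only the POSIX dlopen()/dlsym()/dlclose() calls; the SharedLib class is a hypothetical helper, not part of the QNN SDK or of libcdsprpc.so.

```cpp
#include <dlfcn.h>

// Hypothetical RAII wrapper (not part of the QNN SDK): owns a dlopen() handle
// and calls dlclose() automatically when the object goes out of scope.
class SharedLib {
 public:
  explicit SharedLib(const char* path)
      : handle_(dlopen(path, RTLD_NOW | RTLD_LOCAL)) {}

  ~SharedLib() {
    if (handle_ != nullptr) dlclose(handle_);
  }

  // A handle has exactly one owner, so copying is disabled.
  SharedLib(const SharedLib&) = delete;
  SharedLib& operator=(const SharedLib&) = delete;

  bool ok() const { return handle_ != nullptr; }

  // Resolves `name` and casts it to the function-pointer type Fn.
  // Returns nullptr if the library failed to load or the symbol is missing.
  template <typename Fn>
  Fn symbol(const char* name) const {
    if (handle_ == nullptr) return nullptr;
    return reinterpret_cast<Fn>(dlsym(handle_, name));
  }

 private:
  void* handle_;
};
```

With it, `SharedLib cdsp("libcdsprpc.so");` followed by `cdsp.symbol<RpcMemAllocFn_t>("rpcmem_alloc")` resolves the same entry points as the template above. The resolved pointers stay valid only while the SharedLib object is alive, so it should outlive every rpcmem buffer.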

Using QNN_MEM_TYPE_ION with the QNN API
With ION shared buffers, each tensor has its own shared buffer, with its own unique memory pointer, file descriptor, and memory handle.
An example is shown below:

HTP shared buffer example

// QnnInterface_t is defined in ${QNN_SDK_ROOT}/include/QNN/QnnInterface.h.
// Its QNN_INTERFACE_VER_NAME member holds the function table used below.
QNN_INTERFACE_VER_TYPE qnnInterface;
// Init qnn interface ......
// See ${QNN_SDK_ROOT}/examples/QNN/SampleApp code

// Qnn_Tensor_t is defined in ${QNN_SDK_ROOT}/include/QNN/QnnTypes.h
Qnn_Tensor_t inputTensor;
// Set up common settings for inputTensor ......
/* There are 2 specific settings for shared buffers (both applied just before memRegister() below):
*  1. memType should be QNN_TENSORMEMTYPE_MEMHANDLE;
*  2. union member memHandle should be used instead of clientBuf, and it
*     should be set to nullptr before registration.
*/

size_t bufSize;
// Calculate the bufSize based on tensor dimensions and data type ......

#define RPCMEM_HEAP_ID_SYSTEM 25
#define RPCMEM_DEFAULT_FLAGS 1

// Allocate the shared buffer (rpcmem_alloc() takes an int size)
uint8_t* memPointer = (uint8_t*)rpcmem_alloc(RPCMEM_HEAP_ID_SYSTEM, RPCMEM_DEFAULT_FLAGS, (int)bufSize);
if (nullptr == memPointer) {
    // handle errors
}

// Get the file descriptor associated with the buffer
int memFd = rpcmem_to_fd(memPointer);
if (-1 == memFd) {
    rpcmem_free(memPointer);
    // handle errors
}

// Fill the info of Qnn_MemDescriptor_t and register the buffer with QNN
// Qnn_MemDescriptor_t is defined in ${QNN_SDK_ROOT}/include/QNN/QnnMem.h
Qnn_MemDescriptor_t memDescriptor = QNN_MEM_DESCRIPTOR_INIT;
memDescriptor.memShape = {inputTensor.rank, inputTensor.dimensions, nullptr};
memDescriptor.dataType = inputTensor.dataType;
memDescriptor.memType = QNN_MEM_TYPE_ION;
memDescriptor.ionInfo.fd = memFd;
inputTensor.memType = QNN_TENSORMEMTYPE_MEMHANDLE;
inputTensor.memHandle = nullptr;
Qnn_ContextHandle_t context; // Must obtain a QNN context handle before memRegister()
// To obtain QNN context handle:
// For online prepare, refer to ${QNN_SDK_ROOT}/docs/general/sample_app.html#create-context
// For offline prepare, refer to ${QNN_SDK_ROOT}/docs/general/sample_app.html#load-context-from-a-cached-binary
Qnn_ErrorHandle_t registerRet = qnnInterface.memRegister(context, &memDescriptor, 1u, &(inputTensor.memHandle));
if (QNN_SUCCESS != registerRet) {
    rpcmem_free(memPointer);
    // handle errors
}

/**
* At this point, the allocation and registration of the shared buffer are complete.
* On the QNN side, the buffer is bound through memFd.
* On the user side, the buffer can be manipulated through memPointer.
*/

/**
* Optionally, the user can also allocate and register a shared buffer for the output
* in the same way as above. If so, the output buffer must also be deregistered and
* freed as shown at the end of this example.
*/

// Load the input data to memPointer ......

// Execute QNN graph with input tensor and output tensor ......

// Get output data ......

// Deregister and free all buffers once they are no longer used
Qnn_ErrorHandle_t deregisterRet = qnnInterface.memDeRegister(&(inputTensor.memHandle), 1);
if (QNN_SUCCESS != deregisterRet) {
    // handle errors
}
rpcmem_free(memPointer);
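The example leaves the computation of bufSize from the tensor dimensions and data type elided. Its usual form is the product of all dimensions times the width of one element. A minimal sketch, assuming a hypothetical ElemType enum in place of the real Qnn_DataType_t values; the range check against INT_MAX reflects the int size parameter of rpcmem_alloc() shown earlier.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical element types for illustration only; real code would switch on
// the tensor's Qnn_DataType_t instead.
enum class ElemType { UInt8, Int16, Float16, Float32 };

// Width in bytes of one element of each hypothetical type.
std::size_t bytesPerElement(ElemType t) {
  switch (t) {
    case ElemType::UInt8:   return 1;
    case ElemType::Int16:   return 2;
    case ElemType::Float16: return 2;
    case ElemType::Float32: return 4;
  }
  throw std::invalid_argument("unknown element type");
}

// bufSize = (product of all dimensions) * (bytes per element).
// Throws std::overflow_error if the product overflows size_t or exceeds
// INT_MAX, because rpcmem_alloc() takes its size as an int.
std::size_t tensorBufSize(const std::vector<std::uint32_t>& dims, ElemType t) {
  std::size_t size = bytesPerElement(t);
  for (std::uint32_t d : dims) {
    if (d != 0 && size > SIZE_MAX / d) {
      throw std::overflow_error("tensor size overflows size_t");
    }
    size *= d;
  }
  if (size > static_cast<std::size_t>(INT_MAX)) {
    throw std::overflow_error("tensor too large for rpcmem_alloc()");
  }
  return size;
}
```

For a 1×299×299×3 float32 tensor, tensorBufSize({1, 299, 299, 3}, ElemType::Float32) returns 1072812 bytes.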

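Using QNN_HTP_MEM_SHARED_BUFFER with the QNN API

With a multi-tensor shared buffer, a group of tensors is mapped to a single shared buffer. This buffer has one memory pointer and one file descriptor, but each tensor has its own offset from the memory pointer and its own memory handle.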
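The per-tensor offsets are a running sum of the tensor sizes. A minimal sketch, assuming the tensors are packed back to back with no alignment padding (the same assumption the QNN example makes; whether the HTP backend prefers aligned offsets is not covered here). SharedBufferLayout and packTensors are hypothetical names; tensor i then lives at memPointer + offsets[i].

```cpp
#include <cstdint>
#include <vector>

// Layout of several tensors packed back to back in one shared buffer.
struct SharedBufferLayout {
  std::vector<std::uint64_t> offsets;  // byte offset of each tensor from memPointer
  std::uint64_t totalSize = 0;         // size to pass to rpcmem_alloc()
};

// Packs tensors contiguously in the given order, with no padding between them.
SharedBufferLayout packTensors(const std::vector<std::uint64_t>& tensorSizes) {
  SharedBufferLayout layout;
  layout.offsets.reserve(tensorSizes.size());
  for (std::uint64_t size : tensorSizes) {
    layout.offsets.push_back(layout.totalSize);  // starts where the previous tensor ended
    layout.totalSize += size;
  }
  return layout;
}
```

Computing the whole layout up front also gives the total size needed by rpcmem_alloc() before any registration happens.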

An example is shown below:

HTP multi-tensor shared buffer example

#include <cstdint>
#include <vector>

// QnnInterface_t is defined in ${QNN_SDK_ROOT}/include/QNN/QnnInterface.h.
// Its QNN_INTERFACE_VER_NAME member holds the function table used below.
QNN_INTERFACE_VER_TYPE qnnInterface;
// Init qnn interface ......
// See ${QNN_SDK_ROOT}/examples/QNN/SampleApp code

// Total number of input tensors
size_t numTensors; // Set from the graph's input count ......

// Qnn_Tensor_t is defined in ${QNN_SDK_ROOT}/include/QNN/QnnTypes.h
std::vector<Qnn_Tensor_t> inputTensors(numTensors);
// Set up common settings for inputTensors ......
/* There are 2 specific settings for shared buffers (both applied in the registration loop below):
*  1. memType should be QNN_TENSORMEMTYPE_MEMHANDLE;
*  2. union member memHandle should be used instead of clientBuf, and it
*     should be set to nullptr before registration.
*/

// Calculate the size of each tensor and the total shared buffer size
std::vector<uint64_t> tensorSizes(numTensors);
uint64_t totalBufferSize = 0;
for (size_t tensorIdx = 0; tensorIdx < numTensors; tensorIdx++) {
   // Calculate tensorSizes[tensorIdx] based on tensor dimensions and data type ......
   totalBufferSize += tensorSizes[tensorIdx];
}

#define RPCMEM_HEAP_ID_SYSTEM 25
#define RPCMEM_DEFAULT_FLAGS 1

// Allocate the shared buffer (rpcmem_alloc() takes an int size)
uint8_t* memPointer = (uint8_t*)rpcmem_alloc(RPCMEM_HEAP_ID_SYSTEM, RPCMEM_DEFAULT_FLAGS, (int)totalBufferSize);
if (nullptr == memPointer) {
    // handle errors
}

// Get a file descriptor for the buffer
int memFd = rpcmem_to_fd(memPointer);
if (-1 == memFd) {
    rpcmem_free(memPointer);
    // handle errors
}

Qnn_ContextHandle_t context; // Must obtain a QNN context handle before memRegister()
// To obtain QNN context handle:
// For online prepare, refer to ${QNN_SDK_ROOT}/docs/general/sample_app.html#create-context
// For offline prepare, refer to ${QNN_SDK_ROOT}/docs/general/sample_app.html#load-context-from-a-cached-binary

// Register one memory handle per tensor using memory descriptors
// offset is the location of the current tensor within the shared buffer
uint64_t offset = 0;
for (size_t tensorIdx = 0; tensorIdx < numTensors; tensorIdx++) {
   // Fill the info of Qnn_MemDescriptor_t and register the descriptor with QNN
   // Qnn_MemDescriptor_t is defined in ${QNN_SDK_ROOT}/include/QNN/QnnMem.h
   Qnn_MemDescriptor_t memDescriptor = QNN_MEM_DESCRIPTOR_INIT;
   memDescriptor.memShape = {inputTensors[tensorIdx].rank, inputTensors[tensorIdx].dimensions, nullptr};
   memDescriptor.dataType = inputTensors[tensorIdx].dataType;
   memDescriptor.memType = QNN_MEM_TYPE_CUSTOM;
   inputTensors[tensorIdx].memType = QNN_TENSORMEMTYPE_MEMHANDLE;
   inputTensors[tensorIdx].memHandle = nullptr;

   // Fill the info of QnnMemHtp_Descriptor_t and set it as custom info
   // QnnMemHtp_Descriptor_t is defined in ${QNN_SDK_ROOT}/include/QNN/HTP/QnnHtpMem.h
   QnnMemHtp_Descriptor_t htpMemDescriptor;
   htpMemDescriptor.type = QNN_HTP_MEM_SHARED_BUFFER;
   htpMemDescriptor.size = totalBufferSize; // Note: this is the total buffer size, not the tensor size

   QnnHtpMem_SharedBufferConfig_t htpSharedBuffConfig = {memFd, offset};
   htpMemDescriptor.sharedBufferConfig = htpSharedBuffConfig;

   memDescriptor.customInfo = &htpMemDescriptor;

   Qnn_ErrorHandle_t registerRet = qnnInterface.memRegister(context, &memDescriptor, 1u, &(inputTensors[tensorIdx].memHandle));
   if (QNN_SUCCESS != registerRet) {
      // Deregister the memory handles already created before freeing the buffer
      for (size_t doneIdx = 0; doneIdx < tensorIdx; doneIdx++) {
         qnnInterface.memDeRegister(&(inputTensors[doneIdx].memHandle), 1);
      }
      rpcmem_free(memPointer);
      // handle errors
   }

   // Move the offset past this tensor
   offset += tensorSizes[tensorIdx];
}

/**
* At this point, the allocation and registration of the shared buffer are complete.
* On the QNN side, the buffer is bound through memFd.
* On the user side, the buffer can be accessed through memPointer plus each tensor's offset.
*/

/**
* Optionally, the user can also allocate and register a shared buffer for the outputs
* in the same way as above. If so, the output buffer must also be deregistered and
* freed as shown at the end of this example.
*/

// Load the input data to memPointer at the respective offsets ......

// Execute QNN graph with input tensors and output tensors ......

// Get output data from the memPointer and offset combination ......

// Deregister all memory handles and free the buffer once it is no longer used
for (size_t tensorIdx = 0; tensorIdx < numTensors; tensorIdx++) {
   Qnn_ErrorHandle_t deregisterRet = qnnInterface.memDeRegister(&(inputTensors[tensorIdx].memHandle), 1);
   if (QNN_SUCCESS != deregisterRet) {
      // handle errors
   }
}
rpcmem_free(memPointer);
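The registration loop above notes that handles already created must be deregistered when a later memRegister() call fails, before the shared buffer is freed. The rollback rule in isolation, as a minimal sketch: registerAllOrRollBack is a hypothetical helper, and its two callbacks are stand-ins for qnnInterface.memRegister()/memDeRegister() on a single tensor so the logic can run without the SDK.

```cpp
#include <cstddef>
#include <functional>

// Registers `count` handles in order. If registration i fails, every handle
// registered before it (0 .. i-1) is deregistered in reverse order, so no
// handle is left pointing into a buffer the caller is about to free.
// Returns true only if every registration succeeded.
bool registerAllOrRollBack(std::size_t count,
                           const std::function<bool(std::size_t)>& registerOne,
                           const std::function<void(std::size_t)>& deregisterOne) {
  for (std::size_t i = 0; i < count; ++i) {
    if (!registerOne(i)) {
      for (std::size_t j = i; j > 0; --j) {
        deregisterOne(j - 1);  // undo the registrations that already succeeded
      }
      return false;
    }
  }
  return true;
}
```

Deregistering in reverse order is not required by anything shown here; it simply mirrors the order of acquisition, which keeps the cleanup easy to reason about.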

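8.2.2 Executing with a DLC

Tutorial setup
This tutorial assumes that the general setup instructions for QNN and SNPE have been followed. In particular, converting to a DLC with the tools requires PYTHONPATH and SNPE_ROOT to be set appropriately.

In addition, this tutorial needs the Inception V3 TensorFlow model file and sample images. These are fetched by the provided setup script, setup_inceptionv3.py, located at:

${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/setup_inceptionv3.py

Usage: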

usage: setup_inceptionv3.py [-h] -a ASSETS_DIR [-d] [-c] [-q]

Prepares the inception_v3 assets for tutorial examples.

required arguments:
  -a ASSETS_DIR, --assets_dir ASSETS_DIR
                        directory containing the inception_v3 assets

optional arguments:
  -d, --download        Download inception_v3 assets to inception_v3 example
                        directory
  -c, --convert_model   Convert and compile model once acquired.
  -q, --quantize_model  Quantize the model during conversion. Only available
                        if --c or --convert_model option is chosen

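Before using the script, set the environment variable TENSORFLOW_HOME to the installation location of the TensorFlow package. The script uses TensorFlow utilities such as optimize_for_inference.py, which are located in the TensorFlow installation directory.

Find the location of the TensorFlow package:

$ python3 -m pip show tensorflow

Set the TENSORFLOW_HOME environment variable using the installation location of the TensorFlow package (the Location field in the output of the previous command):

$ export TENSORFLOW_HOME=<tensorflow-location>/tensorflow_core

Install the Inception V3 TensorFlow model and sample images with the setup_inceptionv3.py script:

$ python3 ${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/setup_inceptionv3.py -a ~/tmpdir -d

The model file should now be located at: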

${QNN_SDK_ROOT}/examples/Models/InceptionV3/tensorflow/inception_v3_2016_08_28_frozen.pb

The raw images should now be located at:

${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped
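A quick sanity check before conversion is to confirm that every file in the input list exists and has the expected byte size. A minimal sketch, assuming the list holds one raw file path per line (as in data/cropped/raw_list.txt) and that each raw file stores a float32 tensor of shape 1×299×299×3, matching the --input_dim used for conversion below, i.e. 299 × 299 × 3 × 4 = 1072812 bytes. The float32 assumption and the invalidRawInputs name are not taken from the SDK.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Reads an input list (one raw file path per line) and returns the entries whose
// file is missing or whose size differs from expectedBytes.
// Relative paths are resolved against baseDir (the directory the tool runs in).
std::vector<std::string> invalidRawInputs(const fs::path& listFile,
                                          const fs::path& baseDir,
                                          std::uintmax_t expectedBytes) {
  std::vector<std::string> bad;
  std::ifstream in(listFile);
  std::string line;
  while (std::getline(in, line)) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF lists
    if (line.empty()) continue;
    fs::path p(line);
    if (p.is_relative()) p = baseDir / p;
    std::error_code ec;
    std::uintmax_t size = fs::file_size(p, ec);  // sets ec if the file is missing
    if (ec || size != expectedBytes) bad.push_back(line);
  }
  return bad;
}
```

An empty result means every listed input has the expected size; any returned entry is worth inspecting before running the converter or quantizer.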

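Model conversion

Once the model assets have been obtained, the model can be converted to a DLC with the conversion tools in the Qualcomm® Neural Processing SDK.

Note
The HTP and DSP backends require a quantized model. See Model quantization to generate a quantized DLC.

Convert the Inception V3 model with the snpe-tensorflow-to-dlc tool.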

$ ${SNPE_ROOT}/bin/x86_64-linux-clang/snpe-tensorflow-to-dlc \
  --input_network ${QNN_SDK_ROOT}/examples/Models/InceptionV3/tensorflow/inception_v3_2016_08_28_frozen.pb \
  --input_dim input 1,299,299,3 \
  --out_node InceptionV3/Predictions/Reshape_1 \
  --output_path ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3.dlc

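This produces the DLC file ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3.dlc.

The DLC contains the serialized model, the network topology, and the associated model data.

Model quantization
A DLC can be quantized with the snpe-dlc-quantize tool. Example usage: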

$ ${SNPE_ROOT}/bin/x86_64-linux-clang/snpe-dlc-quantize \
  --input_dlc ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3.dlc \
  --input_list ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped/raw_list.txt \
  --output_dlc ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3_quantized.dlc

This produces the following artifact:

${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3_quantized.dlc

Note
When quantizing a model, the input list must contain absolute paths to the input data.
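If an existing input list uses relative paths, it can be rewritten with absolute ones before quantization. A minimal sketch, assuming one path per line with no comment lines or multi-input syntax; writeAbsoluteInputList is a hypothetical helper, and the output file must be different from the input file.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Writes a copy of inList to outList in which every entry is an absolute path.
// Relative entries are resolved against baseDir; absolute entries are kept.
// Returns the number of entries written.
std::size_t writeAbsoluteInputList(const fs::path& inList, const fs::path& baseDir,
                                   const fs::path& outList) {
  std::ifstream in(inList);
  std::ofstream out(outList);
  std::string line;
  std::size_t count = 0;
  while (std::getline(in, line)) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF lists
    if (line.empty()) continue;
    fs::path p(line);
    if (p.is_relative()) p = fs::absolute(baseDir / p);
    out << p.lexically_normal().string() << '\n';  // drop "." and ".." components
    ++count;
  }
  return count;
}
```

For example, writeAbsoluteInputList("raw_list.txt", fs::current_path(), "raw_list_abs.txt") resolves the entries against the current directory, and the resulting file can then be passed to --input_list.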

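Execution requires the generated DLC and the provided utility library libQnnModelDlc.so. This library extends the QNN model API to compose QNN graphs from the provided DLC path and return their handles.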

ModelError_t QnnModel_composeGraphsFromDlc(Qnn_BackendHandle_t backendHandle,
                                        QNN_INTERFACE_VER_TYPE interface,
                                        Qnn_ContextHandle_t contextHandle,
                                        const GraphConfigInfo_t **graphsConfigInfo,
                                        const char *dlcPath,
                                        const uint32_t numGraphsConfigInfo,
                                        GraphInfoPtr_t **graphsInfo,
                                        uint32_t *numGraphsInfo,
                                        bool debug,
                                        QnnLog_Callback_t logCallback,
                                        QnnLog_Level_t maxLogLevel)

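This is the same as the QnnGraph_ComposeGraphs API, with an added dlcPath input parameter. The returned QNN graph handles can then be finalized and executed.

The following sections demonstrate execution of the DLC.

CPU backend execution
Executing on a Linux host

Run qnn-net-run with the libQnnModelDlc.so utility library as the --model argument and Inception_v3.dlc as the --dlc_path argument.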

$ cd ${QNN_SDK_ROOT}/examples/Models/InceptionV3
$ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-net-run \
              --backend ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnCpu.so \
              --model ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnModelDlc.so \
              --dlc_path ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3.dlc \
              --input_list data/cropped/raw_list.txt

The results will be in ${QNN_SDK_ROOT}/examples/Models/InceptionV3/output.

View the results:

$ python3 ${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/show_inceptionv3_classifications.py -i data/cropped/raw_list.txt \
                                                                               -o output/ \
                                                                               -l data/imagenet_slim_labels.txt
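show_inceptionv3_classifications.py performs the post-processing in Python. For a C++ application, a minimal sketch of the same idea, assuming each output file is a flat array of native-endian float32 class scores whose index i matches line i of imagenet_slim_labels.txt; this output format is an assumption not confirmed by the text above, and readRawFloats and topK are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <numeric>
#include <string>
#include <vector>

// Reads a raw file of native-endian float32 values into a vector.
// Returns an empty vector if the file cannot be opened.
std::vector<float> readRawFloats(const std::string& path) {
  std::ifstream in(path, std::ios::binary | std::ios::ate);  // open positioned at the end
  if (!in) return {};
  std::streamsize bytes = in.tellg();  // position at end == file size
  std::vector<float> values(static_cast<std::size_t>(bytes) / sizeof(float));
  in.seekg(0);
  in.read(reinterpret_cast<char*>(values.data()),
          static_cast<std::streamsize>(values.size() * sizeof(float)));
  return values;
}

// Returns the indices of the k largest scores, highest score first.
std::vector<std::size_t> topK(const std::vector<float>& scores, std::size_t k) {
  std::vector<std::size_t> idx(scores.size());
  std::iota(idx.begin(), idx.end(), std::size_t{0});
  k = std::min(k, idx.size());
  std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                    [&](std::size_t a, std::size_t b) { return scores[a] > scores[b]; });
  idx.resize(k);
  return idx;
}
```

std::partial_sort only orders the first k indices, which is cheaper than a full sort when k is much smaller than the number of classes.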

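Executing on Android
Running the CPU backend on an Android target is similar to running it on a Linux x86 target.

Create a directory for the example on the Android device.

$ adb shell "mkdir /data/local/tmp/inception_v3"

Push the necessary libraries and the DLC to the device.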

$ adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnCpu.so /data/local/tmp/inception_v3
$ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3.dlc /data/local/tmp/inception_v3
$ adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnModelDlc.so /data/local/tmp/inception_v3

Push the input data and input list to the device.

$ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped /data/local/tmp/inception_v3
$ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/target_raw_list.txt /data/local/tmp/inception_v3
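target_raw_list.txt ships with the example. To build an equivalent list for your own inputs, the host paths can be mapped to paths relative to the device directory in which qnn-net-run is started. A minimal sketch, assuming the inputs are pushed into a cropped/ subdirectory of /data/local/tmp/inception_v3 as above and that only the file name needs to be kept; toTargetPaths is a hypothetical helper and the file names in the comment are illustrative.

```cpp
#include <filesystem>
#include <string>
#include <vector>

// Maps host-side input paths to device-side paths relative to the directory in
// which qnn-net-run runs on the device. Only the file name is kept, so a host
// path such as "data/cropped/a.raw" becomes "cropped/a.raw".
std::vector<std::string> toTargetPaths(const std::vector<std::string>& hostPaths,
                                       const std::string& deviceSubdir) {
  std::vector<std::string> out;
  out.reserve(hostPaths.size());
  for (const std::string& host : hostPaths) {
    std::filesystem::path name = std::filesystem::path(host).filename();
    // generic_string() always uses '/', which is what the Android device expects.
    out.push_back((std::filesystem::path(deviceSubdir) / name).generic_string());
  }
  return out;
}
```

Writing the result one entry per line gives a list that can be pushed alongside the data and passed to --input_list on the device.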

Push the qnn-net-run tool to the device.

$ adb push ${QNN_SDK_ROOT}/bin/aarch64-android/qnn-net-run /data/local/tmp/inception_v3

Set up the device environment.

$ adb shell
$ cd /data/local/tmp/inception_v3
$ export LD_LIBRARY_PATH=/data/local/tmp/inception_v3

Run qnn-net-run with the following arguments.

$ ./qnn-net-run --backend libQnnCpu.so --model libQnnModelDlc.so --dlc_path Inception_v3.dlc --input_list target_raw_list.txt

The output of the run will be in the default ./output directory.

Exit the device shell and view the results.

$ exit
$ cd ${QNN_SDK_ROOT}/examples/Models/InceptionV3
$ adb pull /data/local/tmp/inception_v3/output output_android
$ python3 ${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/show_inceptionv3_classifications.py -i data/cropped/raw_list.txt \
                                                                               -o output_android/ \
                                                                               -l data/imagenet_slim_labels.txt

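GPU backend execution

Note
Running the GPU backend on Windows devices is not supported.

Executing on Android
Running the GPU backend on an Android target is similar to running the CPU backend on an Android target.

Create a directory for the example on the Android device.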

$ adb shell "mkdir /data/local/tmp/inception_v3"

Push the necessary libraries and the DLC to the device.

$ adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnGpu.so /data/local/tmp/inception_v3
$ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3.dlc /data/local/tmp/inception_v3
$ adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnModelDlc.so /data/local/tmp/inception_v3

Push the input data and input list to the device.

$ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped /data/local/tmp/inception_v3
$ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/target_raw_list.txt /data/local/tmp/inception_v3

Push the qnn-net-run tool to the device.

$ adb push ${QNN_SDK_ROOT}/bin/aarch64-android/qnn-net-run /data/local/tmp/inception_v3

Set up the device environment.

$ adb shell
$ cd /data/local/tmp/inception_v3
$ export LD_LIBRARY_PATH=/data/local/tmp/inception_v3

Run qnn-net-run with the following arguments.

$ ./qnn-net-run --backend libQnnGpu.so --model libQnnModelDlc.so --dlc_path Inception_v3.dlc --input_list target_raw_list.txt

The output of the run will be in the default ./output directory.

Exit the device shell and view the results.

$ exit
$ cd ${QNN_SDK_ROOT}/examples/Models/InceptionV3
$ adb pull /data/local/tmp/inception_v3/output output_android
$ python3 ${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/show_inceptionv3_classifications.py -i data/cropped/raw_list.txt \
                                                                              -o output_android/ \
                                                                              -l data/imagenet_slim_labels.txt

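HTP backend execution
Executing on a Linux host

Note
The HTP backend can be run on a Linux host using the HTP emulation backend.

Run qnn-net-run with the libQnnModelDlc.so utility library as the --model argument and Inception_v3_quantized.dlc as the --dlc_path argument.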

$ cd ${QNN_SDK_ROOT}/examples/Models/InceptionV3
$ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-net-run \
              --backend ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnHtp.so \
              --model ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnModelDlc.so \
              --dlc_path ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3_quantized.dlc \
              --input_list data/cropped/raw_list.txt

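Note
The HTP emulation backend requires a quantized model. For more information on quantization, see Model quantization.

The results will be in ${QNN_SDK_ROOT}/examples/Models/InceptionV3/output.

View the results:

$ python3 ${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/show_inceptionv3_classifications.py -i data/cropped/raw_list.txt \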
                                                                               -o output/ \
                                                                               -l data/imagenet_slim_labels.txt

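Executing on Android
Running the HTP backend on an Android target is similar to running the CPU and GPU backends on an Android target, except that the HTP backend requires a quantized model and a user-generated serialized context. For more information on quantization, see Model quantization.

Generate a serialized context from the DLC by running qnn-context-binary-generator with libQnnModelDlc.so as the --model argument and the quantized DLC as the --dlc_path argument.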

$ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-context-binary-generator \
              --backend ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnHtp.so \
              --model ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnModelDlc.so \
              --dlc_path ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3_quantized.dlc \
              --binary_file Inception_v3_quantized.serialized

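The context will be created at ./output/Inception_v3_quantized.serialized.bin.

Create a directory for the example on the Android device.

$ adb shell "mkdir /data/local/tmp/inception_v3"

Push the necessary libraries and the serialized context to the device.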

$ adb push ${QNN_SDK_ROOT}/lib/hexagon-v68/unsigned/libQnnHtpV68Skel.so /data/local/tmp/inception_v3
$ adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnHtpV68Stub.so /data/local/tmp/inception_v3
$ adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnHtp.so /data/local/tmp/inception_v3
$ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/output/Inception_v3_quantized.serialized.bin /data/local/tmp/inception_v3

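Note
This section demonstrates HTP execution on Android with a graph prepared offline. To execute a graph prepared on the device (online), also push the on-device prepare library and the quantized DLC: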

$ adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnHtpPrepare.so /data/local/tmp/inception_v3
$ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3_quantized.dlc /data/local/tmp/inception_v3

Push the input data and input list to the device.

$ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped /data/local/tmp/inception_v3
$ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/target_raw_list.txt /data/local/tmp/inception_v3

Push the qnn-net-run tool to the device.

$ adb push ${QNN_SDK_ROOT}/bin/aarch64-android/qnn-net-run /data/local/tmp/inception_v3

Set up the device environment.

$ adb shell
$ cd /data/local/tmp/inception_v3
$ export LD_LIBRARY_PATH=/data/local/tmp/inception_v3
$ export ADSP_LIBRARY_PATH="/data/local/tmp/inception_v3"

Run qnn-net-run with the following arguments.

$ ./qnn-net-run --backend libQnnHtp.so --input_list target_raw_list.txt --retrieve_context Inception_v3_quantized.serialized.bin

The output of the run will be in the default ./output directory.

Exit the device shell and view the results.

$ exit
$ cd ${QNN_SDK_ROOT}/examples/Models/InceptionV3
$ adb pull /data/local/tmp/inception_v3/output output_android
$ python3 ${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/show_inceptionv3_classifications.py -i data/cropped/raw_list.txt \
                                                                              -o output_android/ \
                                                                              -l data/imagenet_slim_labels.txt