Qualcomm® AI Engine Direct User Manual (5)

weixin_38498942

Article link: https://blog.csdn.net/weixin_38498942/article/details/135266565

4.1.2 HTP - QNN HTP Backend Extensions

QNN HTP Backend Extensions

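The qnn-net-run utility is backend agnostic, meaning it can only use the generic QNN API. The backend extensions feature makes backend-specific APIs, namely custom configurations, available to it. More documentation on backend extensions can be found under qnn-net-run. Note that the scope of QNN backend extensions is limited to qnn-net-run.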

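HTP backend extensions is an interface that provides custom options for the HTP backend. It is also required for enabling the different performance modes. These options and performance modes are exercised by supplying the extensions shared library libQnnHtpNetRunExtensions.so and, if necessary, a configuration file.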

To use backend-extension parameters with qnn-net-run, pass the --config_file argument with the path to a JSON file.

$ qnn-net-run --model <qnn_model_name.so> \
              --backend <path_to_model_library>/libQnnHtp.so \
              --output_dir <output_dir_for_result> \
              --input_list <path_to_input_list.txt> \
              --config_file <path to JSON of backend extensions>
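
When the tool is driven from a script, the same invocation can be assembled as an argument list. The following is a minimal sketch, not part of the SDK: the helper name build_qnn_net_run_command and all file names are hypothetical, and only the flags shown in the command above are used.

```python
import shlex


def build_qnn_net_run_command(model, backend, output_dir, input_list, config_file=None):
    """Assemble a qnn-net-run argument list from the flags shown above."""
    argv = [
        "qnn-net-run",
        "--model", model,
        "--backend", backend,
        "--output_dir", output_dir,
        "--input_list", input_list,
    ]
    # --config_file is only added when a backend extensions JSON is supplied.
    if config_file is not None:
        argv += ["--config_file", config_file]
    return argv


# Placeholder file names (hypothetical); substitute real paths.
cmd = build_qnn_net_run_command(
    model="libqnn_model.so",
    backend="libQnnHtp.so",
    output_dir="output",
    input_list="input_list.txt",
    config_file="htp_extensions.json",
)
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) on a host where qnn-net-run is on the PATH.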

The config file passed above, with the minimum parameters (such as the backend extensions config) specified in JSON, is shown below:

{
    "backend_extensions" :
    {
        "shared_library_path" : "path_to_shared_library",  // give path to shared extensions library (.so)
        "config_file_path" : "path_to_config_file"         // give path to backend config
    }
}
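
A file of this shape can also be generated rather than written by hand. The sketch below is illustrative only: the function name write_extensions_config is hypothetical, and leaving out config_file_path when no backend config is needed is an assumption based on the "if necessary" wording above. Python's json module writes plain JSON, so the // annotations from the listing are not reproduced.

```python
import json
import tempfile
from pathlib import Path


def write_extensions_config(out_path, shared_library_path, config_file_path=None):
    """Write the minimal backend_extensions JSON and return it as a dict."""
    backend_extensions = {"shared_library_path": shared_library_path}
    # Assumption: the backend config path is only needed when custom
    # options or performance modes are set.
    if config_file_path is not None:
        backend_extensions["config_file_path"] = config_file_path
    document = {"backend_extensions": backend_extensions}
    Path(out_path).write_text(json.dumps(document, indent=4))
    return document


# Usage with the placeholder values from the listing, in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "htp_extensions.json"
    write_extensions_config(target, "path_to_shared_library", "path_to_config_file")
    print(target.read_text())
```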

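Users can set custom options and different performance modes for the HTP backend through the backend config. The options available in the config are described by the following schema: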

{
   "type": "object", "properties": {
     "graphs": {
         "type": "object", "properties": {

           // Corresponds to the graph name provided to QnnGraph_create
           // Used by qnn-net-run during online prepare and qnn-context-binary-generator uses it during offline preparation
           "graph_names": {"type": "array", "items": {"type": "string"}},

           // Provides performance infrastructure configuration options that are memory specific [optional]
           // Used by qnn-net-run during online prepare and qnn-context-binary-generator uses it during offline preparation
           "vtcm_mb": {"type": "integer"},

           // Used to perform computation with half precision i.e. 16 bits [optional] [default: 0]
           // Used by qnn-net-run during online prepare and qnn-context-binary-generator uses it during offline preparation
           "fp16_relaxed_precision": {"type": "integer"},

           // Corresponds to the number of HVX threads to use for a particular graph during an inference.
           // Used by qnn-net-run during online prepare and qnn-context-binary-generator uses it during offline preparation
           "hvx_threads": {"type": "integer"},

           // Set Graph optimization value in range 1 to 3 [optional] [default: 2]
           // 1 = Faster preparation time, less optimal graph, 2 = Longer preparation time, more optimal graph
           // 3 = Longest preparation time, most likely even more optimal graph
           // Used by qnn-net-run during online prepare and qnn-context-binary-generator uses it during offline preparation
           "O": {"type": "number", "multipleOf": 1},

           // Provide deep learning bandwidth compression value 0 or 1 [optional] [default: 0]
           // Used by qnn-net-run during online prepare and qnn-context-binary-generator uses it during offline preparation
           "dlbc": {"type": "number", "multipleOf": 1}
         }
     },
     "devices": {
       "type": "array", "items": {
         "type": "object", "properties": {

           // Selection of the device [optional] [default: 0]
           // Used by qnn-net-run
           "device_id": {"type": "integer"},

           // Selection of the SoC [optional] [default: 0]
           // Used by qnn-net-run and qnn-context-binary-generator
           "soc_id": {"type": "integer"},

           // Set dsp architecture value [optional] [default: NONE]
           // Used by qnn-net-run and qnn-context-binary-generator
           "dsp_arch": {"type": "string"},

           // Specifies the user pd attribute [optional] [default: "unsigned"]
           // Used by qnn-net-run and qnn-context-binary-generator
           "pd_session": {"type": "string"},

           // Used for linting profiling level [optional] [default: not set]
           // Used by qnn-net-run and qnn-context-binary-generator
           "profiling_level": {"type": "string"},

           // Specifies whether to use null context or not. true means using a unique power context id, and false means using null context.
           // NOTE: This parameter is not supported for v68 onwards
           // Used by qnn-net-run
           "use_client_context": {"type": "boolean"},
           "cores": {
             "type": "array", "items": {
               "type": "object", "properties": {

                 // Select the core [optional] [default: 0]
                 // Used by qnn-net-run
                 "core_id": {"type": "integer"},

                 // Provide performance profile [optional] [default: "high_performance"]
                 // Used by qnn-net-run
                 // NOTE: command line perf profile option is now deprecated.
                 "perf_profile": {"type": "string"},

                 // Rpc control latency value in micro second [optional] [default: 100us]
                 // Used by qnn-net-run
                 "rpc_control_latency": {"type": "integer"},

                 // Rpc polling time value in micro second [optional]
                 // [default: 9999 us for burst, high_performance & sustained_high_performance, 0 us for other perf profiles]
                 // Used by qnn-net-run
                 "rpc_polling_time": {"type": "integer"},

                 // Hmx timeout value in micro second [optional] [default: 300000us]
                 // Used by qnn-net-run
                 "hmx_timeout_us": {"type": "integer"}
               }
             }
           }
         }
       }
     },
     "context": {
       "type": "object", "properties": {

         // Used for enabling Weight Sharing [optional] [default: false]
         // Used by qnn-context-binary-generator during offline preparation
         "weight_sharing_enabled": {"type": "boolean"},

         // Used to associate max spill-fill buffer size across multiple contexts within a group [optional] [default: Not Set]
         // Used by qnn-net-run and throughput-net-run during offline preparation. group_id value must be set to 0 for this option to be used.
         "max_spill_fill_buffer_for_group": {"type": "integer"},

         // Specifies the group id to which contexts can be associated [optional] [default: None]
         // Used by qnn-net-run and throughput-net-run during offline preparation.
         "group_id": {"type": "integer"}
       }
     }
   }
}
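
The value constraints stated in the schema comments for the graph options can be checked before a config is handed to the tools. This is a rough sketch under stated assumptions: check_graph_options is a hypothetical helper, it covers only the "graphs" properties listed above, and it treats "O" and "dlbc" as plain integers even though the schema declares them as numbers that are multiples of 1.

```python
def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def check_graph_options(graph):
    """Return a list of problems found in one graph options object."""
    problems = []
    names = graph.get("graph_names")
    if names is not None and not (
        isinstance(names, list) and all(isinstance(n, str) for n in names)
    ):
        problems.append("graph_names must be an array of strings")
    for key in ("vtcm_mb", "fp16_relaxed_precision", "hvx_threads"):
        if key in graph and not _is_int(graph[key]):
            problems.append(f"{key} must be an integer")
    # Optimization level: 1 to 3 according to the schema comment.
    if "O" in graph and not (_is_int(graph["O"]) and 1 <= graph["O"] <= 3):
        problems.append("O must be 1, 2 or 3")
    # Deep learning bandwidth compression: 0 or 1.
    if "dlbc" in graph and not (_is_int(graph["dlbc"]) and graph["dlbc"] in (0, 1)):
        problems.append("dlbc must be 0 or 1")
    return problems


# Hypothetical graph name and values, for illustration only.
print(check_graph_options({"graph_names": ["my_graph"], "vtcm_mb": 8, "O": 3}))
print(check_graph_options({"O": 4, "dlbc": 2}))
```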

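Refer to Qnn_SocModel_t for the values to use for the soc_id parameter. Note that the graphs object shown here is deprecated starting with SDK 2.20 in favor of a graphs array, as shown below: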

{
   "graphs": [
     {
        .....
     },
        .....
   ]
}
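
An existing config that uses the deprecated object form can be migrated mechanically. The sketch below assumes that each element of the new array carries the same fields as the old graphs object; the listing elides the element contents, so this is an assumption, and graphs_object_to_array is a hypothetical helper name.

```python
import copy


def graphs_object_to_array(config):
    """Return a copy of config whose 'graphs' object is wrapped in an array."""
    result = copy.deepcopy(config)
    graphs = result.get("graphs")
    # Only the deprecated object form is rewritten; an array is left as it is.
    if isinstance(graphs, dict):
        result["graphs"] = [graphs]
    return result


# Hypothetical pre-2.20 style config.
old_config = {"graphs": {"graph_names": ["my_graph"], "O": 3}, "devices": [{}]}
new_config = graphs_object_to_array(old_config)
print(new_config["graphs"])
```

The helper copies its input, so the original dict is left unchanged, and applying it to an already migrated config has no effect.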

Performance Modes with HTP Backend Extensions
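Performance modes can be enabled through the backend config with the perf_profile parameter, as shown above. Valid settings are low_balanced, balanced, default, high_performance, sustained_high_performance, burst, low_power_saver, power_saver, high_power_saver, extreme_power_saver, and system_settings. These performance modes use different configurations of core clocks, bus clocks, DCVS, and sleep latency. Three voltage corners are defined, TURBO, NOM, and SVS, each with different minimum and maximum frequency thresholds. In addition to the maximum and minimum voltage corners, the maximum and minimum frequencies supported by the target can be set. For more details on performance mode configuration and its parameters, refer to the Hexagon SDK documentation. The settings used by the different performance modes are shown in the table below: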

[Image: table of the settings used by each performance mode]

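The table above is sorted from highest performance (BURST) to lowest performance (EXTREME_POWER_SAVER). BURST and SUSTAINED_HIGH_PERFORMANCE use a timer during execution, which keeps the vote high across all inferences and avoids repeatedly voting performance up and down until the timeout expires. They have lower sleep latency and disable DCVS during execution. DCVS can both raise and lower core and bus clock speeds, with the min_corner and max_corner votes serving as lower and upper guidelines. BURST has the highest frequencies and the highest votes, giving the best performance. POWER_SAVER, LOW_POWER_SAVER, and HIGH_POWER_SAVER have lower frequencies and do not support voting. They have higher sleep latency and enable DCVS during execution. EXTREME_POWER_SAVER is the lowest-performing mode but offers the greatest power savings. For the voltage corners used by these performance modes, refer to the program listing for the file QnnHtpPerfInfrastructure.h.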
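
The list of valid perf_profile names, together with the rpc_polling_time defaults given in the schema comments earlier (9999 us for burst, high_performance, and sustained_high_performance, 0 us for the other profiles), can be captured in a small lookup. This is a sketch for illustration; default_rpc_polling_time_us is a hypothetical helper and not an SDK function.

```python
VALID_PERF_PROFILES = (
    "low_balanced", "balanced", "default", "high_performance",
    "sustained_high_performance", "burst", "low_power_saver",
    "power_saver", "high_power_saver", "extreme_power_saver",
    "system_settings",
)

# Profiles whose default RPC polling time is 9999 us per the schema comment.
_POLLING_PROFILES = frozenset({"burst", "high_performance", "sustained_high_performance"})


def default_rpc_polling_time_us(perf_profile):
    """Return the documented default rpc_polling_time for a perf profile."""
    if perf_profile not in VALID_PERF_PROFILES:
        raise ValueError(f"unknown perf_profile: {perf_profile!r}")
    return 9999 if perf_profile in _POLLING_PROFILES else 0


for name in ("burst", "balanced", "extreme_power_saver"):
    print(name, default_rpc_polling_time_us(name))
```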

The following config can be used to set the performance profile and the rpc polling time:

{
   "graphs": {
       ...
       ...
   },
   "devices": [
      {
         ...
         "cores":[
            {
                "perf_profile": "burst",     // use this to set any of the above performance profile
                "rpc_polling_time": 9999,    // use this to set rpc polling, ranges 0-9999 us
                "rpc_control_latency": 100   // use to set rpc control latency
            }
         ]
      }
   ]
}

Note that the config structure above is deprecated starting with SDK 2.20. The new supported config is shown below:

{
   "graphs": [
     {
       ...
       ...
     }
     ....
   ],
   "devices": [
      {
         ...
         "cores":[
            {
                "perf_profile": "burst",     // use this to set any of the above performance profile
                "rpc_polling_time": 9999,    // use this to set rpc polling, ranges 0-9999 us
                "rpc_control_latency": 100   // use to set rpc control latency
            }
         ]
      }
   ]
}
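
A config in the new layout can be produced programmatically, with the 0-9999 us range noted in the listing enforced for rpc_polling_time. This is a minimal sketch: make_htp_backend_config is a hypothetical helper, and omitting the "graphs" key when no graph options are given is an assumption rather than documented behavior.

```python
import json


def make_htp_backend_config(perf_profile, rpc_polling_time=None,
                            rpc_control_latency=None, graphs=None):
    """Build a backend config dict using the graphs-array layout."""
    core = {"perf_profile": perf_profile}
    if rpc_polling_time is not None:
        # Range taken from the comment in the listing above.
        if not 0 <= rpc_polling_time <= 9999:
            raise ValueError("rpc_polling_time must be within 0-9999 us")
        core["rpc_polling_time"] = rpc_polling_time
    if rpc_control_latency is not None:
        core["rpc_control_latency"] = rpc_control_latency
    config = {"devices": [{"cores": [core]}]}
    if graphs:
        config["graphs"] = list(graphs)
    return config


config = make_htp_backend_config("burst", rpc_polling_time=9999, rpc_control_latency=100)
print(json.dumps(config, indent=3))
```

The printed JSON can be saved to the file referenced by config_file_path in the backend extensions config.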
