Qualcomm® AI Engine Direct User Manual (5)

4.1.2 HTP - QNN HTP Backend Extensions

QNN HTP Backend Extensions

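The qnn-net-run utility is backend agnostic, meaning it can only use the generic QNN APIs. The backend extension feature makes it possible to use backend-specific APIs, namely custom configurations. More documentation on backend extensions can be found under qnn-net-run. Note that the scope of QNN backend extensions is limited to qnn-net-run.

The HTP backend extension is an interface that provides custom options for the HTP backend. It is also required to enable the different performance modes. These options and performance modes are exercised by providing the extension shared library libQnnHtpNetRunExtensions.so and, if necessary, a config file.

To use backend-extension parameters with qnn-net-run, pass the --config_file argument with the path to a JSON file.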

$ qnn-net-run --model <qnn_model_name.so> \
              --backend <path_to_model_library>/libQnnHtp.so \
              --output_dir <output_dir_for_result> \
              --input_list <path_to_input_list.txt> \
              --config_file <path to JSON of backend extensions>

The config file above contains the minimum parameters, specified as JSON (here, the backend extensions config), as shown below:

{
    "backend_extensions" :
    {
        "shared_library_path" : "path_to_shared_library",  // give path to shared extensions library (.so)
        "config_file_path" : "path_to_config_file"         // give path to backend config
    }
}
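
As a minimal sketch of the two steps above, the following Python helpers write the top-level JSON file and assemble the matching qnn-net-run argument list. The helper names (`write_extension_config`, `build_command`) and every path passed to them are hypothetical placeholders; only the JSON keys and the command-line flags come from the text above. A file written with `json.dumps` contains no `//` comments.

```python
import json
from pathlib import Path


def write_extension_config(out_path, shared_library_path, config_file_path=None):
    """Write the top-level backend-extensions JSON and return its path."""
    ext = {"shared_library_path": str(shared_library_path)}
    # The backend config file is only included when one is supplied.
    if config_file_path is not None:
        ext["config_file_path"] = str(config_file_path)
    out_path = Path(out_path)
    out_path.write_text(json.dumps({"backend_extensions": ext}, indent=4))
    return out_path


def build_command(model, backend, output_dir, input_list, config_file):
    """Return the qnn-net-run invocation as an argument list (not executed)."""
    return [
        "qnn-net-run",
        "--model", str(model),
        "--backend", str(backend),
        "--output_dir", str(output_dir),
        "--input_list", str(input_list),
        "--config_file", str(config_file),
    ]
```

On a machine where the SDK tools are on the PATH, the returned list could be handed to `subprocess.run`; building it as a list avoids shell quoting issues with paths that contain spaces.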

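Users can set custom options and different performance modes for the HTP backend through the backend config. The options available in the config are shown below: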

{
   "type": "object", "properties": {
     "graphs": {
         "type": "object", "properties": {

           // Corresponds to the graph name provided to QnnGraph_create
           // Used by qnn-net-run during online prepare and qnn-context-binary-generator uses it during offline preparation
           "graph_names": {"type": "array", "items": {"type": "string"}},

           // Provides performance infrastructure configuration options that are memory specific [optional]
           // Used by qnn-net-run during online prepare and qnn-context-binary-generator uses it during offline preparation
           "vtcm_mb": {"type": "integer"},

           // Used to perform computation with half precision i.e. 16 bits [optional] [default: 0]
           // Used by qnn-net-run during online prepare and qnn-context-binary-generator uses it during offline preparation
           "fp16_relaxed_precision": {"type": "integer"},

           // Corresponds to the number of HVX threads to use for a particular graph during an inference.
           // Used by qnn-net-run during online prepare and qnn-context-binary-generator uses it during offline preparation
           "hvx_threads": {"type": "integer"},

           // Set Graph optimization value in range 1 to 3 [optional] [default: 2]
           // 1 = Faster preparation time, less optimal graph, 2 = Longer preparation time, more optimal graph
           // 3 = Longest preparation time, most likely even more optimal graph
           // Used by qnn-net-run during online prepare and qnn-context-binary-generator uses it during offline preparation
           "O": {"type": "number", "multipleOf": 1},

           // Provide deep learning bandwidth compression value 0 or 1 [optional] [default: 0]
           // Used by qnn-net-run during online prepare and qnn-context-binary-generator uses it during offline preparation
           "dlbc": {"type": "number", "multipleOf": 1}
         }
     },
     "devices": {
       "type": "array", "items": {
         "type": "object", "properties": {

           // Selection of the device [optional] [default: 0]
           // Used by qnn-net-run
           "device_id": {"type": "integer"},

           // Selection of the SoC [optional] [default: 0]
           // Used by qnn-net-run and qnn-context-binary-generator
           "soc_id": {"type": "integer"},

           // Set dsp architecture value [optional] [default: NONE]
           // Used by qnn-net-run and qnn-context-binary-generator
           "dsp_arch": {"type": "string"},

           // Specifies the user pd attribute [optional] [default: "unsigned"]
           // Used by qnn-net-run and qnn-context-binary-generator
           "pd_session": {"type": "string"},

           // Used for linting profiling level [optional] [default: not set]
           // Used by qnn-net-run and qnn-context-binary-generator
           "profiling_level": {"type": "string"},

           // Specifies whether to use null context or not. true means using a unique power context id, and false means using null context.
           // NOTE: This parameter is not supported for v68 onwards
           // Used by qnn-net-run
           "use_client_context": {"type": "boolean"},
           "cores": {
             "type": "array", "items": {
               "type": "object", "properties": {

                 // Select the core [optional] [default: 0]
                 // Used by qnn-net-run
                 "core_id": {"type": "integer"},

                 // Provide performance profile [optional] [default: "high_performance"]
                 // Used by qnn-net-run
                 // NOTE: command line perf profile option is now deprecated.
                 "perf_profile": {"type": "string"},

                 // Rpc control latency value in micro second [optional] [default: 100us]
                 // Used by qnn-net-run
                 "rpc_control_latency": {"type": "integer"},

                 // Rpc polling time value in micro second [optional]
                 // [default: 9999 us for burst, high_performance & sustained_high_performance, 0 us for other perf profiles]
                 // Used by qnn-net-run
                 "rpc_polling_time": {"type": "integer"},

                 // Hmx timeout value in micro second [optional] [default: 300000us]
                 // Used by qnn-net-run
                 "hmx_timeout_us": {"type": "integer"}
               }
             }
           }
         }
       }
     },
     "context": {
       "type": "object", "properties": {

         // Used for enabling Weight Sharing [optional] [default: false]
         // Used by qnn-context-binary-generator during offline preparation
         "weight_sharing_enabled": {"type": "boolean"},

         // Used to associate max spill-fill buffer size across multiple contexts within a group [optional] [default: Not Set]
         // Used by qnn-net-run and throughput-net-run during offline preparation. group_id value must be set to 0 for this option to be used.
         "max_spill_fill_buffer_for_group": {"type": "integer"},

         // Specifies the group id to which contexts can be associated [optional] [default: None]
         // Used by qnn-net-run and throughput-net-run during offline preparation.
         "group_id": {"type": "integer"}
       }
     }
   }
}
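
The ranges stated in the schema comments can be checked before a config is handed to the tools. The sketch below is a hypothetical helper (`check_graph_options` is not part of the SDK) and checks only what the comments above state: `graph_names` is an array of strings, `vtcm_mb`, `fp16_relaxed_precision` and `hvx_threads` are integers, `O` is 1 to 3, and `dlbc` is 0 or 1.

```python
def check_graph_options(graph):
    """Return a list of problems found in one graph-options dict."""
    problems = []
    names = graph.get("graph_names", [])
    if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
        problems.append("graph_names must be an array of strings")
    for key in ("vtcm_mb", "fp16_relaxed_precision", "hvx_threads"):
        value = graph.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if value is not None and (isinstance(value, bool) or not isinstance(value, int)):
            problems.append(f"{key} must be an integer")
    o_level = graph.get("O", 2)   # documented default: 2
    if o_level not in (1, 2, 3):
        problems.append("O must be 1, 2 or 3")
    dlbc = graph.get("dlbc", 0)   # documented default: 0
    if dlbc not in (0, 1):
        problems.append("dlbc must be 0 or 1")
    return problems
```

An empty list means no problem was found by these checks; it does not guarantee that the tools will accept the config.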

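Check Qnn_SocModel_t when setting the soc_id parameter. Note that the graphs object shown here is deprecated starting with SDK version 2.20 and is replaced by a graphs array, as shown below: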

{
   "graphs": [
     {
        .....
     },
        .....
   ]
}
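
A config written in the older object form could be migrated mechanically. The sketch below assumes that each array element carries the same keys as the former `graphs` object; the listing above elides the element contents, so this is an assumption and not something the text confirms. `graphs_to_array` is a hypothetical helper name.

```python
import copy


def graphs_to_array(config):
    """Return a copy of config whose "graphs" value is an array."""
    migrated = copy.deepcopy(config)
    graphs = migrated.get("graphs")
    # Wrap the legacy single object in a list; leave arrays untouched.
    if isinstance(graphs, dict):
        migrated["graphs"] = [graphs]
    return migrated
```

The function works on a deep copy, so the original config stays unchanged, and calling it on an already-migrated config returns an equal config.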

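Performance Modes with HTP Backend Extensions
Backend extension performance modes can be enabled through the backend config using the perf_profile parameter, as shown above. Valid settings are low_balanced, balanced, default, high_performance, sustained_high_performance, burst, low_power_saver, power_saver, high_power_saver, extreme_power_saver, and system_settings. These performance modes use different configurations of core clock, bus clock, DCVS, and sleep latency. Three voltage corners are defined, TURBO, NOM, and SVS, each with different minimum and maximum frequency thresholds. In addition, the maximum and minimum voltage corners set the maximum and minimum frequencies supported by the target. For more details on performance mode configuration and its parameters, refer to the Hexagon SDK documentation. The settings used by the different performance modes are shown in the table below: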

[Table image not preserved: settings used by each performance mode]

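The table above is ordered from the highest-performing mode (BURST) to the lowest-performing mode (EXTREME_POWER_SAVER). BURST and SUSTAINED_HIGH_PERFORMANCE use a timer during execution, which keeps a high vote in place across all inferences and avoids the up-and-down performance votes that would otherwise follow, until the timeout expires. They have low sleep latency and disable DCVS during execution. DCVS can both increase and decrease core/bus clock speeds, with the min_corner and max_corner votes serving as lower and upper limit guidelines. BURST has the highest frequencies and the highest vote, giving the best performance. POWER_SAVER, LOW_POWER_SAVER, and HIGH_POWER_SAVER have lower frequencies and do not support voting. They have high sleep latency and enable DCVS during execution. EXTREME_POWER_SAVER is the lowest-performing mode but offers the greatest power savings. For more details on the performance modes and the voltage corners they use, refer to the program listing for the file QnnHtpPerfInfrastructure.h.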

The following config can be used to set the performance profile and RPC polling time:

{
   "graphs": {
       ...
       ...
   },
   "devices": [
      {
         ...
         "cores":[
            {
                "perf_profile": "burst",    // use this to set any of the above performance profile
                "rpc_polling_time": 9999,   // use this to set rpc polling, ranges 0-9999 us
                "rpc_control_latency": 100  // use to set rpc control latency
            }
         ]
      }
   ]
}

Note that the config structure above is deprecated starting with SDK version 2.20; the new supported config is shown below:

{
   "graphs": [
     {
       ...
       ...
     }
     ....
   ],
   "devices": [
      {
         ...
         "cores":[
            {
                "perf_profile": "burst",    // use this to set any of the above performance profile
                "rpc_polling_time": 9999,   // use this to set rpc polling, ranges 0-9999 us
                "rpc_control_latency": 100  // use to set rpc control latency
            }
         ]
      }
   ]
}
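
The core settings can also be generated programmatically. The sketch below builds one `cores` entry, checks the profile name against the valid settings listed earlier, and fills in the default `rpc_polling_time` described in the schema comments (9999 us for burst, high_performance and sustained_high_performance, 0 us for the other profiles). `make_core_settings` is a hypothetical helper, not an SDK function.

```python
VALID_PROFILES = {
    "low_balanced", "balanced", "default", "high_performance",
    "sustained_high_performance", "burst", "low_power_saver",
    "power_saver", "high_power_saver", "extreme_power_saver",
    "system_settings",
}
# Profiles whose documented default polling time is 9999 us.
POLLING_PROFILES = {"burst", "high_performance", "sustained_high_performance"}


def make_core_settings(perf_profile="high_performance",
                       rpc_polling_time=None, rpc_control_latency=100):
    """Build one entry of the "cores" array as a plain dict."""
    if perf_profile not in VALID_PROFILES:
        raise ValueError(f"unknown perf_profile: {perf_profile}")
    if rpc_polling_time is None:
        rpc_polling_time = 9999 if perf_profile in POLLING_PROFILES else 0
    if not 0 <= rpc_polling_time <= 9999:
        raise ValueError("rpc_polling_time must be within 0-9999 us")
    return {
        "perf_profile": perf_profile,
        "rpc_polling_time": rpc_polling_time,
        "rpc_control_latency": rpc_control_latency,
    }
```

The resulting dict can be placed inside `"devices": [{"cores": [...]}]` and serialized with `json.dumps`, which always emits the commas between keys.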