7.3 Inference Accuracy
The SNPE SDK's classification inference accuracy is measured against several popular public models. According to our measurements, the accuracy scores do not vary across chipsets.
Qualcomm classification accuracy metrics
The following classification accuracy scores are computed by comparing SNPE inference results against the ground truth:
- mAP: mean average precision
- Top-1 error rate: the chance that the predicted class with the highest probability is not the ground-truth class
- Top-5 error rate: the chance that the ground-truth class is not among the 5 classes with the highest probabilities
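A minimal sketch of how these error rates can be computed (not part of the SNPE SDK), assuming the model's outputs are available as a numpy array of per-sample class scores alongside integer ground-truth labels:

import numpy as np

def topk_error(scores, labels, k):
    # indices of the k highest-scoring classes for each sample
    topk = np.argsort(-scores, axis=1)[:, :k]
    # a sample is a hit when its ground-truth label is among those k classes
    hits = (topk == labels[:, None]).any(axis=1)
    return 1.0 - hits.mean()

# illustrative only: random scores over 1000 classes for 8 samples
scores = np.random.randn(8, 1000)
labels = np.random.randint(0, 1000, size=8)
print(topk_error(scores, labels, 1), topk_error(scores, labels, 5))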
Mean average precision calculation
mAP (mean average precision) is the mean of the average precision over all classes. Each AveP (average precision) is computed as:

$$\mathrm{AveP} = \frac{\sum_{k=1}^{n} P(k)\,\mathrm{rel}(k)}{\text{number of relevant documents}}$$

where:
- k is the rank in the sequence of retrieved documents
- n is the number of retrieved documents
- P(k) is the precision at cutoff k in the list, computed as tp/(tp+fp), where tp is the number of true positives and fp is the number of false positives
- rel(k) is an indicator function that equals 1 if the item at rank k is a relevant document, and zero otherwise
- If no relevant documents are retrieved, the precision score is zero
Example python code for computing AP:

def average_precision(img_sorted, anno_imgs):
    """AP for a ranked list of retrieved images (img_sorted)
    against the set of relevant, annotated images (anno_imgs)."""
    AP = 0.0
    count = 0.0  # relevant images retrieved so far (true positives)
    rank = 1.0   # current rank k in the retrieved list
    for img in img_sorted:
        if img in anno_imgs:
            count += 1.0
            AP += count / rank  # P(k), accumulated only where rel(k) == 1
        rank += 1.0
    return 0.0 if count == 0 else AP / count
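mAP is then simply the mean of AveP over all classes. A minimal sketch building on the function above (rankings_by_class and relevant_by_class are hypothetical dicts mapping each class to its ranked retrievals and its set of relevant images):

aps = [average_precision(rankings_by_class[c], relevant_by_class[c])
       for c in rankings_by_class]
mAP = sum(aps) / len(aps)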
8 Tools
This chapter describes the various SDK tools and features.
- snpe-net-run
- snpe-parallel-run
- snpe_bench.py
- snpe-caffe-to-dlc
- snpe-diagview
- snpe-dlc-info
- snpe-dlc-diff
- snpe-dlc-viewer
- snpe-dlc-quantize
- snpe-dlc-quant
- snpe-dlc-graph-prepare
- snpe-tensorflow-to-dlc
- snpe-tflite-to-dlc
- snpe-onnx-to-dlc
- snpe-pytorch-to-dlc
- snpe-platform-validator
- snpe-platform-validator-py
- snpe-throughput-net-run
- snpe-udo-package-generator
snpe-net-run
snpe-net-run loads a DLC file, loads the data for the input tensor(s), and executes the network on the specified runtime.
DESCRIPTION:
------------
Example application demonstrating how to load and execute a neural network
using the SNPE C/C++ API.
REQUIRED ARGUMENTS:
-------------------
--container <FILE> Path to the DL container containing the network.
--input_list <FILE> Path to a file listing the inputs for the network.
OPTIONAL ARGUMENTS:
-------------------
--use_gpu Use the GPU runtime for SNPE.
--use_dsp Use the DSP fixed point runtime for SNPE.
--debug Specifies that output from all layers of the network
will be saved.
--output_dir=<val>
The directory to save output to. Defaults to ./output
--storage_dir=<val>
The directory to store SNPE metadata files
--encoding_type=<val>
Specifies the encoding type of input file. Valid settings are "nv21".
Cannot be combined with --userbuffer*.
--use_native_input_files
Specifies to consume the input file(s) in their native data type(s).
Must be used with --userbuffer_xxx.
--use_native_output_files
Specifies to write the output file(s) in their native data type(s).
Must be used with --userbuffer_xxx.
--userbuffer_auto
Specifies to use userbuffer for input and output, with auto detection of types enabled.
Must be used with user specified buffer. Cannot be combined with --encoding_type.
--userbuffer_float
Specifies to use userbuffer for inference, and the input type is float.
Cannot be combined with --encoding_type.
--userbuffer_floatN=<val>
Specifies to use userbuffer for inference, and the input type is float 16 or float 32.
Cannot be combined with --encoding_type.
--userbuffer_tf8 Specifies to use userbuffer for inference, and the input type is tf8exact0.
Cannot be combined with --encoding_type.
--userbuffer_tfN=<val>
Overrides the userbuffer output used for inference, and the output type is tf8exact0 or tf16exact0.
Must be used with user specified buffer.
--userbuffer_float_output
Overrides the userbuffer output used for inference, and the output type is float. Must be used with user
specified buffer.
--userbuffer_floatN_output=<val>
Overrides the userbuffer output used for inference, and the output type is float 16 or float 32. Must be used with user
specified buffer.
--userbuffer_tfN_output=<val>
Overrides the userbuffer output used for inference, and the output type is tf8exact0 or tf16exact0.
Must be used with user specified buffer.
--userbuffer_tf8_output
Overrides the userbuffer output used for inference, and the output type is tf8exact0.
--userbuffer_uintN_output=<val>
Overrides the userbuffer output used for inference, and the output type is Uint N. Must be used with user
specified buffer.
--static_min_max Specifies to use quantization parameters from the model instead of
input specific quantization. Used in conjunction with --userbuffer_tf8.
--resizable_dim=<val>
Specifies the maximum number that resizable dimensions can grow into.
Used as a hint to create UserBuffers for models with dynamic sized outputs. Should be a
positive integer and is not applicable when using ITensor.
--userbuffer_glbuffer
[EXPERIMENTAL] Specifies to use userbuffer for inference, and the input source is OpenGL buffer.
Cannot be combined with --encoding_type.
GL buffer mode is only supported on Android OS.
--data_type_map=<val>
Sets data type of IO buffers during prepare.
Arguments should be provided in the following format:
--data_type_map buffer_name1=buffer_name1_data_type --data_type_map buffer_name2=buffer_name2_data_type
Data Type can have the following values: float32, fixedPoint8, fixedPoint16
--tensor_mode=<val>
Sets type of tensor to use.
Arguments should be provided in the following format:
--tensor_mode itensor
Data Type can have the following values: userBuffer, itensor
--perf_profile=<val>
Specifies perf profile to set. Valid settings are "low_balanced" , "balanced" , "default",
"high_performance" ,"sustained_high_performance", "burst", "low_power_saver", "power_saver",
"high_power_saver" and "system_settings".
--profiling_level=<val>
Specifies the profiling level. Valid settings are "off", "basic", "moderate" and "detailed".
Default is detailed.
--enable_cpu_fallback
Enables cpu fallback functionality. Defaults to disable mode.
--input_name=<val>
Specifies the name of input for which dimensions are specified.
--input_dimensions=<val>
Specifies new dimensions for input whose name is specified in input_name. e.g. "1,224,224,3".
For multiple inputs, specify --input_name and --input_dimensions multiple times.
--gpu_mode=<val> Specifies gpu operation mode. Valid settings are "default", "float16".
default = float32 math and float16 storage (equiv. use_gpu arg).
float16 = float16 math and float16 storage.
--enable_init_cache
Enable init caching mode to accelerate the network building process. Defaults to disable.
--platform_options=<val>
Specifies value to pass as platform options.
--priority_hint=<val>
Specifies hint for priority level. Valid settings are "low", "normal", "normal_high", "high". Defaults to normal.
Note: "normal_high" is only available on DSP.
--inferences_per_duration=<val>
Specifies the number of inferences in specific duration (in seconds). e.g. "10,20".
--runtime_order=<val>
Specifies the order of precedence for runtime e.g cpu_float32, dsp_fixed8_tf etc
Valid values are:-
cpu_float32 (Snapdragon CPU) = Data & Math: float 32bit
gpu_float32_16_hybrid (Adreno GPU) = Data: float 16bit Math: float 32bit
dsp_fixed8_tf (Hexagon DSP) = Data & Math: 8bit fixed point Tensorflow style format
gpu_float16 (Adreno GPU) = Data: float 16bit Math: float 16bit
                      aip_fixed8_tf (Snapdragon HTA+HVX) = Data & Math: 8bit fixed point Tensorflow style format
                      cpu (Snapdragon CPU) = Same as cpu_float32
                      gpu (Adreno GPU) = Same as gpu_float32_16_hybrid
                      dsp (Hexagon DSP) = Same as dsp_fixed8_tf
                      aip (Snapdragon HTA+HVX) = Same as aip_fixed8_tf
--set_output_tensors=<val>
                      Specifies a comma separated list of tensors to be output after execution.
--set_unconsumed_as_output
                      Sets all unconsumed tensors as outputs.
--udo_package_path=<val>
Path to the registration library for UDO package(s).
Optionally, user can provide multiple packages as a comma-separated list.
--duration=<val>    Specifies the duration of the run in seconds. Loops over the input_list until this amount of time has transpired.
--dbglogs
--timeout=<val> Execution terminated when exceeding time limit. Only valid for dsp runtime currently.
--userlogs=<val> Specifies the user level logging as level,<optional logPath>.
--help Show this help message.
--version Show SNPE Version Number.
By default, this binary outputs the raw output tensors into the output folder. An example of using snpe-net-run can be found in the Running AlexNet tutorial.
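A minimal invocation sketch (the file names are hypothetical):

snpe-net-run --container alexnet.dlc --input_list input_list.txt --use_dsp --output_dir=output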
Additional details:
- Running with batched inputs:
  - snpe-net-run can automatically batch the input data. The batch size is indicated in the model container (DLC file), but can also be set using the "input_dimensions" argument passed to snpe-net-run. Users do not need to batch their input data; if the input data is not batched, the input size needs to be a multiple of the size of the input data files. snpe-net-run groups the provided inputs into batches and pads the incomplete batch, if present, with zeros.
  In the example below, the model is set to accept batches of three inputs, so the inputs are automatically grouped together by snpe-net-run to form batches and the final batch is padded. Note that snpe-net-run generated five output files:
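A hypothetical illustration (the file names are made up), assuming a batch dimension of 3 and an input list of 13 single-sample raw files:

# input_list.txt lists sample_00.raw ... sample_12.raw, one file per line.
# snpe-net-run forms 5 batches: four full batches of three samples, plus a
# final batch containing sample_12.raw padded with zeros, and writes
# output/Result_0 ... output/Result_4.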
- Input list argument:
  - snpe-net-run can take multiple input files as the input data per iteration, and multiple output names can be specified in the input list file, in the following format:
#<output_name>[<space><output_name>]
<input_layer_name>:=<input_layer_path>[<space><input_layer_name>:=<input_layer_path>]
…
The first line, which starts with "#", specifies the names of the output layers. If there is more than one output, a whitespace should be used as the delimiter. After the first line, you can use multiple lines to supply input files, one line per iteration, and each line provides only one layer. If there are multiple inputs per line, a whitespace should be used as the delimiter.
Here is an example in which the layer names are "Input_1" and "Input_2", and the inputs are located at the path "Placeholder_1/real_input_inputs_1/". Its input list file should look like this:
#Output_1 Output_2
Input_1:=Placeholder_1/real_input_inputs_1/0-0#e6fb51.rawtensor Input_2:=Placeholder_1/real_input_inputs_1/0-1#8a171b.rawtensor
Input_1:=Placeholder_1/real_input_inputs_1/1-0#67c965.rawtensor Input_2:=Placeholder_1/real_input_inputs_1/1-1#54f1ff.rawtensor
Input_1:=Placeholder_1/real_input_inputs_1/2-0#b42dc6.rawtensor Input_2:=Placeholder_1/real_input_inputs_1/2-1#346a0e.rawtensor
Note: if the model's batch dimension is greater than 1, the number of batch elements in the input files must either match the batch dimension specified in the DLC or be 1. In the latter case, snpe-net-run combines multiple lines into a single input tensor.
- Running on the AIP runtime:
  - The AIP runtime requires a DLC that has been quantized and whose HTA sections were generated offline. See Adding HTA sections.
  - The AIP runtime does not support debug_mode.
  - The AIP runtime requires a DLC with all layers partitioned to HTA in order to support batched inputs.
snpe-parallel-run
snpe-parallel-run loads a DLC file, loads the data for the input tensor(s), and executes the network on the specified runtime. This application is similar to snpe-net-run, but is capable of running multiple inference threads over the same network for benchmarking purposes.
DESCRIPTION:
------------
Example application demonstrating how to use SNPE
using the PSNPE and SNPE C/C++ API.
REQUIRED ARGUMENTS:
-------------------
--container <FILE> Path to the DL container containing the network.
--input_list <FILE> Path to a file listing the inputs for the network.
--perf_profile <VAL>
Specifies perf profile to set. Valid settings are "balanced" , "default" , "high_performance" , "sustained_high_performance" , "burst" , "power_saver" and "system_settings".
NOTE: "balanced" and "default" are the same. "default" is being deprecated in the future.
--cpu_fallback Enables cpu fallback functionality. Valid settings are "false", "true".
--runtime_order <VAL,VAL,VAL,..>
Specifies the order of precedence for runtime e.g cpu,gpu etc. Valid values are:-
cpu_float32 (Snapdragon CPU) = Data & Math: float 32bit
gpu_float32_16_hybrid (Adreno GPU) = Data: float 16bit Math: float 32bit
dsp_fixed8_tf (Hexagon DSP) = Data & Math: 8bit fixed point Tensorflow style format
gpu_float16 (Adreno GPU) = Data: float 16bit Math: float 16bit
aip_fixed8_tf (Snapdragon HTA+HVX) = Data & Math: 8bit fixed point Tensorflow style format
cpu (Snapdragon CPU) = Same as cpu_float32
gpu (Adreno GPU) = Same as gpu_float32_16_hybrid
dsp (Hexagon DSP) = Same as dsp_fixed8_tf
aip (Snapdragon HTA+HVX) = Same as aip_fixed8_tf
--use_cpu Use the CPU runtime for SNPE.
--use_gpu Use the GPU float32 runtime for SNPE.
--use_gpu_fp16 Use the GPU float16 runtime for SNPE.
--use_dsp Use the DSP fixed point runtime for SNPE.
--use_aip Use the AIP fixed point runtime for SNPE.
OPTIONAL ARGUMENTS:
-------------------
--userbuffer_float Specifies to use userbuffer for inference, and the input type is float.
--userbuffer_tf8 Specifies to use userbuffer for inference, and the input type is tf8exact0.
--userbuffer_auto Specifies to use userbuffer with automatic input and output type detection for inference.
--use_native_input_files
Specifies to consume the input file(s) in their native data type(s).
Must be used with --userbuffer_xxx.
--use_native_output_files
Specifies to write the output file(s) in their native data type(s).
Must be used with --userbuffer_xxx.
--input_name <INPUT_NAME>
Specifies the name of input for which dimensions are specified.
--input_dimensions <INPUT_DIM>
Specifies new dimensions for input whose name is specified in input_name. e.g. "1,224,224,3".
--output_dir <DIR> The directory to save result files
--static_min_max Specifies to use quantization parameters from the model instead of
input specific quantization. Used in conjunction with --userbuffer_tf8.
--userbuffer_float_output
Overrides the userbuffer output used for inference, and the output type is float.
Must be used with user specified buffer.
--userbuffer_tf8_output
Overrides the userbuffer output used for inference, and the output type is tf8exact0.
Must be used with user specified buffer.
--enable_init_cache Enable init caching mode to accelerate the network building process. Defaults to disable.
--profiling_level Specifies the profiling level. Valid settings are "off", "basic", "moderate" and "detailed". Default is off.
--platform_options Specifies value to pass as platform options. Valid settings: "HtaDLBC:ON/OFF", "unsignedPD:ON/OFF".
--set_output_tensors Specifies a comma separated list of tensors to be output after execution.
--userlogs <VAL> Specifies the user level logging as level,<optional logPath>.
--version Show SNPE Version Number.
--help Show this help message.
Additional details:
- Required runtime arguments:
  - For the required arguments related to runtime specification, either --runtime_order or one of the --use_cpu/--use_gpu/etc. flags must be specified. The following examples demonstrate equivalent commands using each of the two options.
snpe-parallel-run --container container.dlc --input_list input_list.txt
--perf_profile burst --cpu_fallback true --use_dsp --use_gpu --userbuffer_auto
is equivalent to
snpe-parallel-run --container container.dlc --input_list input_list.txt
--perf_profile burst --cpu_fallback true --runtime_order dsp,gpu --userbuffer_auto
- Spawning multiple threads:
  - snpe-parallel-run is capable of creating multiple threads that execute the same inference process.
  In the example below, the command begins with the required container and input list arguments. After these two options, the remaining options form a repeating sequence corresponding to each thread. In this example, we vary the runtime specified for each thread (one for dsp, another for gpu, and a final one for dsp).
snpe-parallel-run --container container.dlc --input_list input_list.txt
--perf_profile burst --cpu_fallback true --use_dsp --userbuffer_auto
--perf_profile burst --cpu_fallback true --use_gpu --userbuffer_auto
--perf_profile burst --cpu_fallback true --use_dsp --userbuffer_auto
When this command is executed, the following section of output is observed:
...
Processing DNN input(s):
input.raw
PSNPE start executing...
runtimes: dsp_fixed8_tf gpu_float32_16_hybrid dsp_fixed8_tf - Mode :0- Number of images processed: x
Build time: x seconds.
...
Note that the number of runtimes listed corresponds to the number of threads specified, as well as the order in which those threads were specified.
snpe_bench.py
The python script snpe_bench.py runs a DLC neural network and collects benchmark performance information.
usage: snpe_bench.py [-h] -c CONFIG_FILE [-o OUTPUT_BASE_DIR_OVERRIDE]
[-v DEVICE_ID_OVERRIDE] [-r HOST_NAME] [-a]
[-t DEVICE_OS_TYPE_OVERRIDE] [-d] [-s SLEEP]
[-b USERBUFFER_MODE] [-p PERFPROFILE] [-l PROFILINGLEVEL]
[-json] [-cache]
Run the snpe_bench
required arguments:
-c CONFIG_FILE, --config_file CONFIG_FILE
Path to a valid config file
Refer to sample config file config_help.json for more
detail on how to fill params in config file
optional arguments:
-o OUTPUT_BASE_DIR_OVERRIDE, --output_base_dir_override OUTPUT_BASE_DIR_OVERRIDE
Sets the output base directory.
-v DEVICE_ID_OVERRIDE, --device_id_override DEVICE_ID_OVERRIDE
Use this device ID instead of the one supplied in config
file. Cannot be used with -a
-r HOST_NAME, --host_name HOST_NAME
Hostname/IP of remote machine to which devices are
connected.
-a, --run_on_all_connected_devices_override
Runs on all connected devices, currently only support 1.
Cannot be used with -v
-t DEVICE_OS_TYPE_OVERRIDE, --device_os_type_override DEVICE_OS_TYPE_OVERRIDE
Specify the target OS type, valid options are
['android', 'android-aarch64', 'le', 'le64_gcc4.9',
'le_oe_gcc6.4', 'le64_oe_gcc6.4']
-d, --debug Set to turn on debug log
-s SLEEP, --sleep SLEEP
Set number of seconds to sleep between runs e.g. 20
seconds
-b USERBUFFER_MODE, --userbuffer_mode USERBUFFER_MODE
[EXPERIMENTAL] Enable user buffer mode, default to
float, can be tf8exact0
-p PERFPROFILE, --perfprofile PERFPROFILE
Set the benchmark operating mode (balanced, default,
sustained_high_performance, high_performance,
power_saver, system_settings)
-l PROFILINGLEVEL, --profilinglevel PROFILINGLEVEL
Set the profiling level mode (off, basic, moderate, detailed).
Default is basic.
-json, --generate_json
Set to produce json output.
-cache, --enable_init_cache
Enable init caching mode to accelerate the network
building process. Defaults to disable.
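A minimal invocation sketch (the config file name is hypothetical; see config_help.json in the SDK for the expected fields):

python snpe_bench.py -c alexnet_config.json -a -json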
snpe-caffe-to-dlc
snpe-caffe-to-dlc converts a Caffe model into an SNPE DLC file.
usage: snpe-caffe-to-dlc [-h] [--input_network INPUT_NETWORK] [-o OUTPUT_PATH]
[--out_node OUT_NAMES]
[--copyright_file COPYRIGHT_FILE]
[--model_version MODEL_VERSION]
[--disable_batchnorm_folding]
[--input_type INPUT_NAME INPUT_TYPE]
[--input_dtype INPUT_NAME INPUT_DTYPE]
[--input_encoding INPUT_NAME INPUT_ENCODING]
[--input_layout INPUT_NAME INPUT_LAYOUT]
[--udl UDL_MODULE FACTORY_FUNCTION]
[--enable_preprocessing]
[--quantization_overrides QUANTIZATION_OVERRIDES]
[--keep_quant_nodes]
[--keep_disconnected_nodes]
[--validation_target RUNTIME_TARGET PROCESSOR_TARGET]
[--strict] [--debug [DEBUG]]
[-b CAFFE_BIN]
[--udo_config_paths CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]]
Script to convert caffemodel into a DLC file.
optional arguments:
-h, --help show this help message and exit
required arguments:
--input_network INPUT_NETWORK, -i INPUT_NETWORK
Path to the source framework model.
optional arguments:
--out_node OUT_NAMES, --out_name OUT_NAMES
Name of the graph's output Tensor Names. Multiple output names should be
provided separately like:
--out_name out_1 --out_name out_2
-o OUTPUT_PATH, --output_path OUTPUT_PATH
Path where the converted Output model should be
saved. If not specified, the converted model will be
written to a file with same name as the input model
--copyright_file COPYRIGHT_FILE
Path to copyright file. If provided, the content of
the file will be added to the output model.
--model_version MODEL_VERSION
User-defined ASCII string to identify the model, only
first 64 bytes will be stored
--disable_batchnorm_folding
If not specified, converter will try to fold batchnorm
into previous convolution layer
--input_type INPUT_NAME INPUT_TYPE, -t INPUT_NAME INPUT_TYPE
Type of data expected by each input op/layer. Type for
each input is |default| if not specified. For example:
"data" image.Note that the quotes should always be
included in order to handle special characters,
spaces,etc. For multiple inputs specify multiple
--input_type on the command line. Eg: --input_type
"data1" image --input_type "data2" opaque These
options get used by DSP runtime and following
descriptions state how input will be handled for each
option. Image: input is float between 0-255 and the
input's mean is 0.0f and the input's max is 255.0f. We
will cast the float to uint8ts and pass the uint8ts to
the DSP. Default: pass the input as floats to the dsp
directly and the DSP will quantize it. Opaque: assumes
input is float because the consumer layer(i.e next
layer) requires it as float, therefore it won't be
quantized.Choices supported:['image', 'default',
'opaque']
--input_dtype INPUT_NAME INPUT_DTYPE
The names and datatype of the network input layers
specified in the format [input_name datatype], for
example: 'data' 'float32'. Default is float32 if not
specified. Note that the quotes should always be
included in order to handle special characters, spaces,
etc. For multiple inputs specify multiple
--input_dtype on the command line like: --input_dtype
'data1' 'float32' --input_dtype 'data2' 'float32'
--input_encoding INPUT_NAME INPUT_ENCODING, -e INPUT_NAME INPUT_ENCODING
Image encoding of the source images. Default is bgr.
Eg usage: "data" rgba Note the quotes should always be
included in order to handle special characters,
spaces, etc. For multiple inputs specify
--input_encoding for each on the command line. Eg:
--input_encoding "data1" rgba --input_encoding "data2"
other. Use options: color encodings(bgr,rgb, nv21...)
if input is image; time_series: for inputs of rnn
models; other: if input doesn't follow above
categories or is unknown. Choices supported:['bgr',
'rgb', 'rgba', 'argb32', 'nv21', 'time_series',
'other']
--input_layout INPUT_NAME INPUT_LAYOUT, -l INPUT_NAME INPUT_LAYOUT
Layout of each input tensor. If not specified, it will use the default
based on the Source Framework, shape of input and input encoding.
Accepted values are-
NCDHW, NDHWC, NCHW, NHWC, NFC, NCF, NTF, TNF, NF, NC, F, NONTRIVIAL
N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T = Time
NDHWC/NCDHW used for 5d inputs
NHWC/NCHW used for 4d image-like inputs
NFC/NCF used for inputs to Conv1D or other 1D ops
NTF/TNF used for inputs with time steps like the ones used for LSTM op
NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
F used for 1D inputs, e.g. Bias tensor
NONTRIVIAL for everything else. For multiple inputs specify multiple
--input_layout on the command line.
Eg:
--input_layout "data1" NCHW --input_layout "data2" NCHW
--udl UDL_MODULE FACTORY_FUNCTION
Option to add User Defined Layers. Provide Filename, Function
name.1.Filename: Name of python module to load for registering custom
udl(note: must be in PYTHONPATH). If file part of package list the
package.filename as you would when doing a python import.2.Function name:
Name of the udl factory function that return a dictionary of key layer type
and value function callback.
--enable_preprocessing
                      If specified, converter will enable preprocessing specified by a data layer
                      transform_param; subtract_mean is supported.
--keep_disconnected_nodes
Disable Optimization that removes Ops not connected to the main graph.
This optimization uses output names provided over commandline OR
inputs/outputs extracted from the Source model to determine the main graph
--validation_target RUNTIME_TARGET PROCESSOR_TARGET
A combination of processor and runtime target against
which model will be validated. Choices for
RUNTIME_TARGET: {cpu, gpu, dsp}. Choices for
PROCESSOR_TARGET: {snapdragon_801, snapdragon_820,
snapdragon_835}.If not specified, will validate model
against {snapdragon_820, snapdragon_835} across all
runtime targets.
--strict If specified, will validate in strict mode whereby
model will not be produced if it violates constraints
of the specified validation target. If not specified,
will validate model in permissive mode against the
specified validation target.
--debug [DEBUG] Run the converter in debug mode.
-b CAFFE_BIN, --caffe_bin CAFFE_BIN
Input caffe binary file containing the weight data
--udo_config_paths CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...], -udo CUSTOM_OP_CONFIG_PATHS
[CUSTOM_OP_CONFIG_PATHS ...]
Path to the UDO configs (space separated, if multiple)
Quantizer Options:
--quantization_overrides QUANTIZATION_OVERRIDES
Use this option to specify a json file with parameters to use for
quantization. These will override any quantization data carried from
conversion (eg TF fake quantization) or calculated during the normal
quantization process. Format defined as per AIMET specification.
--keep_quant_nodes Use this option to keep activation quantization nodes in the graph rather
than stripping them.
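For reference, a minimal sketch of what such an override file can look like (the tensor names and ranges are hypothetical; see the AIMET specification for the authoritative format):

{
    "activation_encodings": {
        "conv1_out": [{"bitwidth": 8, "min": -6.0, "max": 6.0}]
    },
    "param_encodings": {
        "conv1_w": [{"bitwidth": 8, "min": -1.0, "max": 1.0}]
    }
}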
An example of using this script can be found in Converting Models from Caffe to SNPE.
Additional details:
- Input encoding argument:
  - Specifies the encoding type of the input images.
  - A preprocessing layer is added to the network to convert the input image from the specified encoding to BGR, the encoding used by Caffe.
  - The encoding preprocessing layer can be seen when using snpe-dlc-info.
  - The allowed options are:
    - argb32: The ARGB32 format consists of 4 bytes per pixel: one byte for red, one for green, one for blue, and one for the alpha channel. The alpha channel is ignored. For little-endian CPUs the byte order is BGRA; for big-endian CPUs it is ARGB.
    - rgba: The RGBA format consists of 4 bytes per pixel: one byte for red, one for green, one for blue, and one for the alpha channel. The alpha channel is ignored. The byte ordering is endian-independent and is always RGBA.
    - nv21: NV21 is the Android version of YUV. The chroma is downsampled with a subsampling rate of 4:2:0. Note that this image format has 3 channels, but the U and V channels are subsampled: for every four Y pixels there is one U pixel and one V pixel.
    - bgr: The BGR format consists of 3 bytes per pixel: one byte for red, one for green, and one for blue. The byte ordering is endian-independent and is always BGR.
  - This argument is optional. If omitted, the input images are assumed to be encoded as BGR and no preprocessing layer is added.
  - See input_preprocessing for more details.
- The disable_batchnorm_folding argument:
  - The disable_batchnorm_folding argument allows the user to turn off the optimization that folds batchnorm and batchnorm + scaling layers into the preceding convolution layer, when possible.
  - This argument is optional. If omitted, the converter will, as an optimization, fold batchnorm and batchnorm + scaling layers into the preceding convolution layer wherever possible. When this happens, the names of the folded batchnorm and scale layers are concatenated onto the name of the convolution layer they were folded into.
  - For example: if a batchnorm layer named "bn" and a scale layer named "scale" are folded into a convolution layer named "conv", the resulting DLC will show a convolution layer named "conv.bn.scale".
- Input type argument:
  - Specifies the expected data type for a given input layer name.
  - This argument can be passed multiple times if you want to specify the expected data type for two or more input layers.
  - The input_type argument takes an INPUT_NAME followed by an INPUT_TYPE.
  - This argument is optional. If it is omitted for an input layer, the expected data type will be type: default.
  - The allowed options are:
    - default: specifies that the input contains floating-point values.
    - image: specifies that the input contains floating-point values that are all integers in the range 0..255.
    - opaque: specifies that the input contains floating-point values that should be passed to the selected runtime without modification. For example, an opaque tensor is passed directly to the DSP without quantization.
  - For example: --input_type "data" image --input_type "roi" opaque
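A minimal conversion sketch combining these options (the file and layer names are hypothetical):

snpe-caffe-to-dlc --input_network deploy.prototxt -b model.caffemodel -o model.dlc --input_encoding "data" nv21 --input_type "data" image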