7.3 Inference Accuracy
The SNPE SDK's classification inference accuracy is measured against several popular public models. According to our measurements, the accuracy scores do not vary across chipsets.
Qualcomm classification accuracy metrics
The following classification accuracy scores are computed by comparing SNPE inference results against the ground truth:
- mAP: mean average precision
- Top-1 error rate: the chance that the predicted class with the highest probability is not the ground-truth class
- Top-5 error rate: the chance that the ground-truth class is not among the 5 classes with the highest probabilities
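A minimal sketch of how these error rates can be computed (not part of the SNPE SDK), assuming the model's outputs are available as a numpy array of per-sample class scores alongside integer ground-truth labels:

import numpy as np

def topk_error(scores, labels, k):
    # indices of the k highest-scoring classes for each sample
    topk = np.argsort(-scores, axis=1)[:, :k]
    # a sample is a hit when its ground-truth label is among those k classes
    hits = (topk == labels[:, None]).any(axis=1)
    return 1.0 - hits.mean()

# illustrative only: random scores over 1000 classes for 8 samples
scores = np.random.randn(8, 1000)
labels = np.random.randint(0, 1000, size=8)
print(topk_error(scores, labels, 1), topk_error(scores, labels, 5))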
Mean average precision calculation
mAP (mean average precision) is the mean of the average precision over all classes. Each AveP (average precision) is computed as:

$$\mathrm{AveP} = \frac{\sum_{k=1}^{n} P(k)\,\mathrm{rel}(k)}{\text{number of relevant documents}}$$

where:
- k is the rank in the sequence of retrieved documents
- n is the number of retrieved documents
- P(k) is the precision at cutoff k in the list, computed as tp/(tp+fp), where tp is the number of true positives and fp is the number of false positives
- rel(k) is an indicator function that equals 1 if the item at rank k is a relevant document, and zero otherwise
- If no relevant documents are retrieved, the precision score is zero
Example python code for computing AP:

def average_precision(img_sorted, anno_imgs):
    """AP for a ranked list of retrieved images (img_sorted)
    against the set of relevant, annotated images (anno_imgs)."""
    AP = 0.0
    count = 0.0  # relevant images retrieved so far (true positives)
    rank = 1.0   # current rank k in the retrieved list
    for img in img_sorted:
        if img in anno_imgs:
            count += 1.0
            AP += count / rank  # P(k), accumulated only where rel(k) == 1
        rank += 1.0
    return 0.0 if count == 0 else AP / count
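mAP is then simply the mean of AveP over all classes. A minimal sketch building on the function above (rankings_by_class and relevant_by_class are hypothetical dicts mapping each class to its ranked retrievals and its set of relevant images):

aps = [average_precision(rankings_by_class[c], relevant_by_class[c])
       for c in rankings_by_class]
mAP = sum(aps) / len(aps)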
8 Tools
This chapter describes the various SDK tools and features.
- snpe-net-run
- snpe-parallel-run
- snpe_bench.py
- snpe-caffe-to-dlc
- snpe-diagview
- snpe-dlc-info
- snpe-dlc-diff
- snpe-dlc-viewer
- snpe-dlc-quantize
- snpe-dlc-quant
- snpe-dlc-graph-prepare
- snpe-tensorflow-to-dlc
- snpe-tflite-to-dlc
- snpe-onnx-to-dlc
- snpe-pytorch-to-dlc
- snpe-platform-validator
- snpe-platform-validator-py
- snpe-throughput-net-run
- snpe-udo-package-generator
snpe-net-run
snpe-net-run loads a DLC file, loads the data for the input tensor(s), and executes the network on the specified runtime.
DESCRIPTION:
------------
Example application demonstrating how to load and execute a neural network
using the SNPE C/C++ API.
REQUIRED ARGUMENTS:
-------------------
--container <FILE> Path to the DL container containing the network.
--input_list <FILE> Path to a file listing the inputs for the network.
OPTIONAL ARGUMENTS:
-------------------
--use_gpu Use the GPU runtime for SNPE.
--use_dsp Use the DSP fixed point runtime for SNPE.
--debug Specifies that output from all layers of the network
will be saved.
--output_dir=<val>
The directory to save output to. Defaults to ./output
--storage_dir=<val>
The directory to store SNPE metadata files
--encoding_type=<val>
Specifies the encoding type of input file. Valid settings are "nv21".
Cannot be combined with --userbuffer*.
--use_native_input_files
Specifies to consume the input file(s) in their native data type(s).
Must be used with --userbuffer_xxx.
--use_native_output_files
Specifies to write the output file(s) in their native data type(s).
Must be used with --userbuffer_xxx.
--userbuffer_auto
Specifies to use userbuffer for input and output, with auto detection of types enabled.
Must be used with user specified buffer. Cannot be combined with --encoding_type.
--userbuffer_float
Specifies to use userbuffer for inference, and the input type is float.
Cannot be combined with --encoding_type.
--userbuffer_floatN=<val>
Specifies to use userbuffer for inference, and the input type is float 16 or float 32.
Cannot be combined with --encoding_type.
--userbuffer_tf8 Specifies to use userbuffer for inference, and the input type is tf8exact0.
Cannot be combined with --encoding_type.
--userbuffer_tfN=<val>
Overrides the userbuffer output used for inference, and the output type is tf8exact0 or tf16exact0.
Must be used with user specified buffer.
--userbuffer_float_output
Overrides the userbuffer output used for inference, and the output type is float. Must be used with user
specified buffer.
--userbuffer_floatN_output=<val>
Overrides the userbuffer output used for inference, and the output type is float 16 or float 32. Must be used with user
specified buffer.
--userbuffer_tfN_output=<val>
Overrides the userbuffer output used for inference, and the output type is tf8exact0 or tf16exact0.
Must be used with user specified buffer.
--userbuffer_tf8_output
Overrides the userbuffer output used for inference, and the output type is tf8exact0.
--userbuffer_uintN_output=<val>
Overrides the userbuffer output used for inference, and the output type is Uint N. Must be used with user
specified buffer.
--static_min_max Specifies to use quantization parameters from the model instead of
input specific quantization. Used in conjunction with --userbuffer_tf8.
--resizable_dim=<val>
Specifies the maximum number that resizable dimensions can grow into.
Used as a hint to create UserBuffers for models with dynamic sized outputs. Should be a
positive integer and is not applicable when using ITensor.
--userbuffer_glbuffer
[EXPERIMENTAL] Specifies to use userbuffer for inference, and the input source is OpenGL buffer.
Cannot be combined with --encoding_type.
GL buffer mode is only supported on Android OS.
--data_type_map=<val>
Sets data type of IO buffers during prepare.
Arguments should be provided in the following format:
--data_type_map buffer_name1=buffer_name1_data_type --data_type_map buffer_name2=buffer_name2_data_type
Data Type can have the following values: float32, fixedPoint8, fixedPoint16
--tensor_mode=<val>
Sets type of tensor to use.
Arguments should be provided in the following format:
--tensor_mode itensor
Data Type can have the following values: userBuffer, itensor
--perf_profile=<val>
Specifies perf profile to set. Valid settings are "low_balanced" , "balanced" , "default",
"high_performance" ,"sustained_high_performance", "burst", "low_power_saver", "power_saver",
"high_power_saver" and "system_settings".
--profiling_level=<val>
Specifies the profiling level. Valid settings are "off", "basic", "moderate" and "detailed".
Default is detailed.
--enable_cpu_fallback
Enables cpu fallback functionality. Defaults to disable mode.
--input_name=<val>
Specifies the name of input for which dimensions are specified.
--input_dimensions=<val>
Specifies new dimensions for input whose name is specified in input_name. e.g. "1,224,224,3".
For multiple inputs, specify --input_name and --input_dimensions multiple times.
--gpu_mode=<val> Specifies gpu operation mode. Valid settings are "default", "float16".
default = float32 math and float16 storage (equiv. use_gpu arg).
float16 = float16 math and float16 storage.
--enable_init_cache
Enable init caching mode to accelerate the network building process. Defaults to disable.
--platform_options=<val>
Specifies value to pass as platform options.
--priority_hint=<val>
Specifies hint for priority level. Valid settings are "low", "normal", "normal_high", "high". Defaults to normal.
Note: "normal_high" is only available on DSP.
--inferences_per_duration=<val>
Specifies the number of inferences in specific duration (in seconds). e.g. "10,20".
--runtime_order=<val>
Specifies the order of precedence for runtime e.g cpu_float32, dsp_fixed8_tf etc
Valid values are:-
cpu_float32 (Snapdragon CPU) = Data & Math: float 32bit
gpu_float32_16_hybrid (Adreno GPU) = Data: float 16bit Math: float 32bit
dsp_fixed8_tf (Hexagon DSP) = Data & Math: 8bit fixed point Tensorflow style format
gpu_float16 (Adreno GPU) = Data: float 16bit Math: float 16bit
                      aip_fixed8_tf (Snapdragon HTA+HVX) = Data & Math: 8bit fixed point Tensorflow style format
                      cpu (Snapdragon CPU) = Same as cpu_float32
                      gpu (Adreno GPU) = Same as gpu_float32_16_hybrid
                      dsp (Hexagon DSP) = Same as dsp_fixed8_tf
                      aip (Snapdragon HTA+HVX) = Same as aip_fixed8_tf
--set_output_tensors=<val>
                      Specifies a comma separated list of tensors to be output after execution.
--set_unconsumed_as_output
                      Sets all unconsumed tensors as outputs.
--udo_package_path=<val>
Path to the registration library for UDO package(s).
Optionally, user can provide multiple packages as a comma-separated list.
--duration=<val>    Specifies the duration of the run in seconds. Loops over the input_list until this amount of time has transpired.
--dbglogs
--timeout=<val> Execution terminated when exceeding time limit. Only valid for dsp runtime currently.
--userlogs=<val> Specifies the user level logging as level,<optional logPath>.
--help Show this help message.
--version Show SNPE Version Number.
By default, this binary outputs the raw output tensors into the output folder. An example of using snpe-net-run can be found in the Running AlexNet tutorial.
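A minimal invocation sketch (the file names are hypothetical):

snpe-net-run --container alexnet.dlc --input_list input_list.txt --use_dsp --output_dir=output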
Additional details:
- Running with batched inputs:
  - snpe-net-run can automatically batch the input data. The batch size is indicated in the model container (DLC file), but can also be set using the "input_dimensions" argument passed to snpe-net-run. Users do not need to batch their input data; if the input data is not batched, the input size needs to be a multiple of the size of the input data files. snpe-net-run groups the provided inputs into batches and pads the incomplete batch, if present, with zeros.
  In the example below, the model is set to accept batches of three inputs, so the inputs are automatically grouped together by snpe-net-run to form batches and the final batch is padded. Note that snpe-net-run generated five output files:
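A hypothetical illustration (the file names are made up), assuming a batch dimension of 3 and an input list of 13 single-sample raw files:

# input_list.txt lists sample_00.raw ... sample_12.raw, one file per line.
# snpe-net-run forms 5 batches: four full batches of three samples, plus a
# final batch containing sample_12.raw padded with zeros, and writes
# output/Result_0 ... output/Result_4.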
- Input list argument:
  - snpe-net-run can take multiple input files as the input data per iteration, and multiple output names can be specified in the input list file, in the following format:
#<output_name>[<space><output_name>]
<input_layer_name>:=<input_layer_path>[<space><input_layer_name>:=<input_layer_path>]
…
The first line, which starts with "#", specifies the names of the output layers. If there is more than one output, a whitespace should be used as the delimiter. After the first line, you can use multiple lines to supply input files, one line per iteration, and each line provides only one layer. If there are multiple inputs per line, a whitespace should be used as the delimiter.
Here is an example in which the layer names are "Input_1" and "Input_2", and the inputs are located at the path "Placeholder_1/real_input_inputs_1/". Its input list file should look like this:
#Output_1 Output_2
Input_1:=Placeholder_1/real_input_inputs_1/0-0#e6fb51.rawtensor Input_2:=Placeholder_1/real_input_inputs_1/0-1#8a171b.rawtensor
Input_1:=Placeholder_1/real_input_inputs_1/1-0#67c965.rawtensor Input_2:=Placeholder_1/real_input_inputs_1/1-1#54f1ff.rawtensor
Input_1:=Placeholder_1/real_input_inputs_1/2-0#b42dc6.rawtensor Input_2:=Placeholder_1/real_input_inputs_1/2-1#346a0e.rawtensor
Note: if the model's batch dimension is greater than 1, the number of batch elements in the input files must either match the batch dimension specified in the DLC or be 1. In the latter case, snpe-net-run combines multiple lines into a single input tensor.
- Running on the AIP runtime:
  - The AIP runtime requires a DLC that has been quantized and whose HTA sections were generated offline. See Adding HTA sections.
  - The AIP runtime does not support debug_mode.
  - The AIP runtime requires a DLC with all layers partitioned to HTA in order to support batched inputs.
snpe-parallel-run
snpe-parallel-run loads a DLC file, loads the data for the input tensor(s), and executes the network on the specified runtime. This application is similar to snpe-net-run, but is capable of running multiple inference threads over the same network for benchmarking purposes.
DESCRIPTION:
------------
Example application demonstrating how to use SNPE
using the PSNPE and SNPE C/C++ API.
REQUIRED ARGUMENTS:
-------------------
--container <FILE> Path to the DL container containing the network.
--input_list <FILE> Path to a file listing the inputs for the network.
--perf_profile <VAL>
Specifies perf profile to set. Valid settings are "balanced" , "default" , "high_performance" , "sustained_high_performance" , "burst" , "power_saver" and "system_settings".
NOTE: "balanced" and "default" are the same. "default" is being deprecated in the future.
--cpu_fallback Enables cpu fallback functionality. Valid settings are "false", "true".
--runtime_order <VAL,VAL,VAL,..>
Specifies the order of precedence for runtime e.g cpu,gpu etc. Valid values are:-
cpu_float32 (Snapdragon CPU) = Data & Math: float 32bit
gpu_float32_16_hybrid (Adreno GPU) = Data: float 16bit Math: float 32bit
dsp_fixed8_tf (Hexagon DSP) = Data & Math: 8bit fixed point Tensorflow style format
gpu_float16 (Adreno GPU) = Data: float 16bit Math: float 16bit
aip_fixed8_tf (Snapdragon HTA+HVX) = Data & Math: 8bit fixed point Tensorflow style format
cpu (Snapdragon CPU) = Same as cpu_float32
gpu (Adreno GPU) = Same as gpu_float32_16_hybrid
dsp (Hexagon DSP) = Same as dsp_fixed8_tf
aip (Snapdragon HTA+HVX) = Same as aip_fixed8_tf
--use_cpu Use the CPU runtime for SNPE.
--use_gpu Use the GPU float32 runtime for SNPE.
--use_gpu_fp16 Use the GPU float16 runtime for SNPE.
--use_dsp Use the DSP fixed point runtime for SNPE.
--use_aip Use the AIP fixed point runtime for SNPE.
OPTIONAL ARGUMENTS:
-------------------
--userbuffer_float Specifies to use userbuffer for inference, and the input type is float.
--userbuffer_tf8 Specifies to use userbuffer for inference, and the input type is tf8exact0.
--userbuffer_auto Specifies to use userbuffer with automatic input and output type detection for inference.
--use_native_input_files
Specifies to consume the input file(s) in their native data type(s).
Must be used with --userbuffer_xxx.
--use_native_output_files
Specifies to write the output file(s) in their native data type(s).
Must be used with --userbuffer_xxx.
--input_name <INPUT_NAME>
Specifies the name of input for which dimensions are specified.
--input_dimensions <INPUT_DIM>
Specifies new dimensions for input whose name is specified in input_name. e.g. "1,224,224,3".
--output_dir <DIR> The directory to save result files
--static_min_max Specifies to use quantization parameters from the model instead of
input specific quantization. Used in conjunction with --userbuffer_tf8.
--userbuffer_float_output
Overrides the userbuffer output used for inference, and the output type is float.
Must be used with user specified buffer.
--userbuffer_tf8_output
Overrides the userbuffer output used for inference, and the output type is tf8exact0.
Must be used with user specified buffer.
--enable_init_cache Enable init caching mode to accelerate the network building process. Defaults to disable.
--profiling_level Specifies the profiling level. Valid settings are "off", "basic", "moderate" and "detailed". Default is off.
--platform_options Specifies value to pass as platform options. Valid settings: "HtaDLBC:ON/OFF", "unsignedPD:ON/OFF".
--set_output_tensors Specifies a comma separated list of tensors to be output after execution.
--userlogs <VAL> Specifies the user level logging as level,<optional logPath>.
--version Show SNPE Version Number.
--help Show this help message.
Additional details:
- Required runtime arguments:
  - For the required arguments related to runtime specification, either --runtime_order or one of the --use_cpu/--use_gpu/etc. flags must be specified. The following examples demonstrate equivalent commands using each of the two options.
snpe-parallel-run --container container.dlc --input_list input_list.txt
--perf_profile burst --cpu_fallback true --use_dsp --use_gpu --userbuffer_auto
is equivalent to
snpe-parallel-run --container container.dlc --input_list input_list.txt
--perf_profile burst --cpu_fallback true --runtime_order dsp,gpu --userbuffer_auto
- Spawning multiple threads:
  - snpe-parallel-run is capable of creating multiple threads that execute the same inference process.
  In the example below, the command begins with the required container and input list arguments. After these two options, the remaining options form a repeating sequence corresponding to each thread. In this example, we vary the runtime specified for each thread (one for dsp, another for gpu, and a final one for dsp).
snpe-parallel-run --container container.dlc --input_list input_list.txt
--perf_profile burst --cpu_fallback true --use_dsp --userbuffer_auto
--perf_profile burst --cpu_fallback true --use_gpu --userbuffer_auto
--perf_profile burst --cpu_fallback true --use_dsp --userbuffer_auto
When this command is executed, the following section of output is observed:
...
Processing DNN input(s):
input.raw
PSNPE start executing...
runtimes: dsp_fixed8_tf gpu_float32_16_hybrid dsp_fixed8_tf - Mode :0- Number of images processed: x
Build time: x seconds.
...
Note that the number of runtimes listed corresponds to the number of threads specified, as well as the order in which those threads were specified.
snpe_bench.py
The python script snpe_bench.py runs a DLC neural network and collects benchmark performance information.
usage: snpe_bench.py [-h] -c CONFIG_FILE [-o OUTPUT_BASE_DIR_OVERRIDE]
[-v DEVICE_ID_OVERRIDE] [-r HOST_NAME] [-a]
[-t DEVICE_OS_TYPE_OVERRIDE] [-d] [-s SLEEP]
[-b USERBUFFER_MODE] [-p PERFPROFILE] [-l PROFILINGLEVEL]
[-json] [-cache]
Run the snpe_bench
required arguments:
-c CONFIG_FILE, --config_file CONFIG_FILE
Path to a valid config file
Refer to sample config file config_help.json for more
detail on how to fill params in config file
optional arguments:
-o OUTPUT_BASE_DIR_OVERRIDE, --output_base_dir_override OUTPUT_BASE_DIR_OVERRIDE
Sets the output base directory.
-v DEVICE_ID_OVERRIDE, --device_id_override DEVICE_ID_OVERRIDE
Use this device ID instead of the one supplied in config
file. Cannot be used with -a
-r HOST_NAME, --host_name HOST_NAME
Hostname/IP of remote machine to which devices are
connected.
-a, --run_on_all_connected_devices_override
Runs on all connected devices, currently only support 1.
Cannot be used with -v
-t DEVICE_OS_TYPE_OVERRIDE, --device_os_type_override DEVICE_OS_TYPE_OVERRIDE
Specify the target OS type, valid options are
['android', 'android-aarch64', 'le', 'le64_gcc4.9',
'le_oe_gcc6.4', 'le64_oe_gcc6.4']
-d, --debug Set to turn on debug log
-s SLEEP, --sleep SLEEP
Set number of seconds to sleep between runs e.g. 20
seconds
-b USERBUFFER_MODE, --userbuffer_mode USERBUFFER_MODE
[EXPERIMENTAL] Enable user buffer mode, default to
float, can be tf8exact0
-p PERFPROFILE, --perfprofile PERFPROFILE
Set the benchmark operating mode (balanced, default,
sustained_high_performance, high_performance,
power_saver, system_settings)
-l PROFILINGLEVEL, --profilinglevel PROFILINGLEVEL
Set the profiling level mode (off, basic, moderate, detailed).
Default is basic.
-json, --generate_json
Set to produce json output.
-cache, --enable_init_cache
Enable init caching mode to accelerate the network
building process. Defaults to disable.
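A minimal invocation sketch (the config file name is hypothetical; see config_help.json in the SDK for the expected fields):

python snpe_bench.py -c alexnet_config.json -a -json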
snpe-caffe-to-dlc
snpe-caffe-to-dlc converts a Caffe model into an SNPE DLC file.
usage: snpe-caffe-to-dlc [-h] [--input_network INPUT_NETWORK] [-o OUTPUT_PATH]
[--out_node OUT_NAMES]
[--copyright_file COPYRIGHT_FILE]
[--model_version MODEL_VERSION]
[--disable_batchnorm_folding]
[--input_type INPUT_NAME INPUT_TYPE]
[--input_dtype INPUT_NAME INPUT_DTYPE]
[--input_encoding INPUT_NAME INPUT_ENCODING]
[--input_layout INPUT_NAME INPUT_LAYOUT]
[--udl UDL_MODULE FACTORY_FUNCTION]
[--enable_preprocessing]
[--quantization_overrides QUANTIZATION_OVERRIDES]
[--keep_quant_nodes]
[--keep_disconnected_nodes]
[--validation_target RUNTIME_TARGET PROCESSOR_TARGET]
[--strict] [--debug [DEBUG]]
[-b CAFFE_BIN]
[--udo_config_paths CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]]
Script to convert caffemodel into a DLC file.
optional arguments:
-h, --help show this help message and exit
required arguments:
--input_network INPUT_NETWORK, -i INPUT_NETWORK
Path to the source framework model.
optional arguments:
--out_node OUT_NAMES, --out_name OUT_NAMES
Name of the graph's output Tensor Names. Multiple output names should be
provided separately like:
--out_name out_1 --out_name out_2
-o OUTPUT_PATH, --output_path OUTPUT_PATH
Path where the converted Output model should be
saved. If not specified, the converted model will be
written to a file with same name as the input model
--copyright_file COPYRIGHT_FILE
Path to copyright file. If provided, the content of
the file will be added to the output model.
--model_version MODEL_VERSION
User-defined ASCII string to identify the model, only
first 64 bytes will be stored
--disable_batchnorm_folding
If not specified, converter will try to fold batchnorm
into previous convolution layer
--input_type INPUT_NAME INPUT_TYPE, -t INPUT_NAME INPUT_TYPE
Type of data expected by each input op/layer. Type for
each input is |default| if not specified. For example:
"data" image.Note that the quotes should always be
included in order to handle special characters,
spaces,etc. For multiple inputs specify multiple
--input_type on the command line. Eg: --input_type
"data1" image --input_type "data2" opaque These
options get used by DSP runtime and following
descriptions state how input will be handled for each
option. Image: input is float between 0-255 and the
input's mean is 0.0f and the input's max is 255.0f. We
will cast the float to uint8ts and pass the uint8ts to
the DSP. Default: pass the input as floats to the dsp
directly and the DSP will quantize it. Opaque: assumes
input is float because the consumer layer(i.e next
layer) requires it as float, therefore it won't be
quantized.Choices supported:['image', 'default',
'opaque']
--input_dtype INPUT_NAME INPUT_DTYPE
The names and datatype of the network input layers
specified in the format [input_name datatype], for
example: 'data' 'float32'. Default is float32 if not
specified. Note that the quotes should always be
included in order to handle special characters, spaces,
etc. For multiple inputs specify multiple
--input_dtype on the command line like: --input_dtype
'data1' 'float32' --input_dtype 'data2' 'float32'
--input_encoding INPUT_NAME INPUT_ENCODING, -e INPUT_NAME INPUT_ENCODING
Image encoding of the source images. Default is bgr.
Eg usage: "data" rgba Note the quotes should always be
included in order to handle special characters,
spaces, etc. For multiple inputs specify
--input_encoding for each on the command line. Eg:
--input_encoding "data1" rgba --input_encoding "data2"
other. Use options: color encodings(bgr,rgb, nv21...)
if input is image; time_series: for inputs of rnn
models; other: if input doesn't follow above
categories or is unknown. Choices supported:['bgr',
'rgb', 'rgba', 'argb32', 'nv21', 'time_series',
'other']
--input_layout INPUT_NAME INPUT_LAYOUT, -l INPUT_NAME INPUT_LAYOUT
Layout of each input tensor. If not specified, it will use the default
based on the Source Framework, shape of input and input encoding.
Accepted values are-
NCDHW, NDHWC, NCHW, NHWC, NFC, NCF, NTF, TNF, NF, NC, F, NONTRIVIAL
N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T = Time
NDHWC/NCDHW used for 5d inputs
NHWC/NCHW used for 4d image-like inputs
NFC/NCF used for inputs to Conv1D or other 1D ops
NTF/TNF used for inputs with time steps like the ones used for LSTM op
NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
F used for 1D inputs, e.g. Bias tensor
NONTRIVIAL for everything else. For multiple inputs specify multiple
--input_layout on the command line.
Eg:
--input_layout "data1" NCHW --input_layout "data2" NCHW
--udl UDL_MODULE FACTORY_FUNCTION
Option to add User Defined Layers. Provide Filename, Function
name.1.Filename: Name of python module to load for registering custom
udl(note: must be in PYTHONPATH). If file part of package list the
package.filename as you would when doing a python import.2.Function name:
Name of the udl factory function that return a dictionary of key layer type
and value function callback.
--enable_preprocessing
                      If specified, converter will enable preprocessing specified by a data layer
                      transform_param; subtract_mean is supported.
--keep_disconnected_nodes
Disable Optimization that removes Ops not connected to the main graph.
This optimization uses output names provided over commandline OR
inputs/outputs extracted from the Source model to determine the main graph
--validation_target RUNTIME_TARGET PROCESSOR_TARGET
A combination of processor and runtime target against
which model will be validated. Choices for
RUNTIME_TARGET: {cpu, gpu, dsp}. Choices for
PROCESSOR_TARGET: {snapdragon_801, snapdragon_820,
snapdragon_835}.If not specified, will validate model
against {snapdragon_820, snapdragon_835} across all
runtime targets.
--strict If specified, will validate in strict mode whereby
model will not be produced if it violates constraints
of the specified validation target. If not specified,
will validate model in permissive mode against the
specified validation target.
--debug [DEBUG] Run the converter in debug mode.
-b CAFFE_BIN, --caffe_bin CAFFE_BIN
Input caffe binary file containing the weight data
--udo_config_paths CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...], -udo CUSTOM_OP_CONFIG_PATHS
[CUSTOM_OP_CONFIG_PATHS ...]
Path to the UDO configs (space separated, if multiple)
Quantizer Options:
--quantization_overrides QUANTIZATION_OVERRIDES
Use this option to specify a json file with parameters to use for
quantization. These will override any quantization data carried from
conversion (eg TF fake quantization) or calculated during the normal
quantization process. Format defined as per AIMET specification.
--keep_quant_nodes Use this option to keep activation quantization nodes in the graph rather
than stripping them.
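For reference, a minimal sketch of what such an override file can look like (the tensor names and ranges are hypothetical; see the AIMET specification for the authoritative format):

{
    "activation_encodings": {
        "conv1_out": [{"bitwidth": 8, "min": -6.0, "max": 6.0}]
    },
    "param_encodings": {
        "conv1_w": [{"bitwidth": 8, "min": -1.0, "max": 1.0}]
    }
}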
An example of using this script can be found in Converting Models from Caffe to SNPE.
Additional details:
- Input encoding argument:
  - Specifies the encoding type of the input images.
  - A preprocessing layer is added to the network to convert the input image from the specified encoding to BGR, the encoding used by Caffe.
  - The encoding preprocessing layer can be seen when using snpe-dlc-info.
  - The allowed options are:
    - argb32: The ARGB32 format consists of 4 bytes per pixel: one byte for red, one for green, one for blue, and one for the alpha channel. The alpha channel is ignored. For little-endian CPUs the byte order is BGRA; for big-endian CPUs it is ARGB.
    - rgba: The RGBA format consists of 4 bytes per pixel: one byte for red, one for green, one for blue, and one for the alpha channel. The alpha channel is ignored. The byte ordering is endian-independent and is always RGBA.
    - nv21: NV21 is the Android version of YUV. The chroma is downsampled with a subsampling rate of 4:2:0. Note that this image format has 3 channels, but the U and V channels are subsampled: for every four Y pixels there is one U pixel and one V pixel.
    - bgr: The BGR format consists of 3 bytes per pixel: one byte for red, one for green, and one for blue. The byte ordering is endian-independent and is always BGR.
  - This argument is optional. If omitted, the input images are assumed to be encoded as BGR and no preprocessing layer is added.
  - See input_preprocessing for more details.
- The disable_batchnorm_folding argument:
  - The disable_batchnorm_folding argument allows the user to turn off the optimization that folds batchnorm and batchnorm + scaling layers into the preceding convolution layer, when possible.
  - This argument is optional. If omitted, the converter will, as an optimization, fold batchnorm and batchnorm + scaling layers into the preceding convolution layer wherever possible. When this happens, the names of the folded batchnorm and scale layers are concatenated onto the name of the convolution layer they were folded into.
  - For example: if a batchnorm layer named "bn" and a scale layer named "scale" are folded into a convolution layer named "conv", the resulting DLC will show a convolution layer named "conv.bn.scale".
- Input type argument:
  - Specifies the expected data type for a given input layer name.
  - This argument can be passed multiple times if you want to specify the expected data type for two or more input layers.
  - The input_type argument takes an INPUT_NAME followed by an INPUT_TYPE.
  - This argument is optional. If it is omitted for an input layer, the expected data type will be type: default.
  - The allowed options are:
    - default: specifies that the input contains floating-point values.
    - image: specifies that the input contains floating-point values that are all integers in the range 0..255.
    - opaque: specifies that the input contains floating-point values that should be passed to the selected runtime without modification. For example, an opaque tensor is passed directly to the DSP without quantization.
  - For example: --input_type "data" image --input_type "roi" opaque
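A minimal conversion sketch combining these options (the file and layer names are hypothetical):

snpe-caffe-to-dlc --input_network deploy.prototxt -b model.caffemodel -o model.dlc --input_encoding "data" nv21 --input_type "data" image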