SNPE DSP vs AIP

Background

Qualcomm officially claims that the HTA (i.e., the AIP, Artificial Intelligence Processor) on the 865 board delivers up to 8 TOPS of compute, about 2.7x the DSP's 3 TOPS. In practice, however, a classification model we measured ran about as fast on the HTA as on the DSP, and sometimes even a little slower.

DSP vs AIP tests

Simple convolution-layer test 1

ONNX generation code

import argparse

import torch


class SampleModel(torch.nn.Module):
    def __init__(self):
        super(SampleModel, self).__init__()
        self.conv3x3_0 = torch.nn.Conv2d(3, 16, (3, 3))
        self.conv3x3_1 = torch.nn.Conv2d(16, 32, (3, 3))
        self.conv3x3_2 = torch.nn.Conv2d(32, 64, (3, 3))
        self.conv3x3_3 = torch.nn.Conv2d(64, 128, (3, 3))

    def forward(self, x):
        x = self.conv3x3_0(x)
        x = self.conv3x3_1(x)
        x = self.conv3x3_2(x)
        return self.conv3x3_3(x)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-o', '--output_onnx', type=str, default='Conv_4.onnx', help='onnx save path')
    args = parser.parse_args()

    net = SampleModel()
    model_name = args.output_onnx
    print(model_name)
    dummy_input = torch.randn(1, 3, 320, 320)
    torch.onnx.export(model=net,
                      args=dummy_input,
                      f=model_name,
                      input_names=['input'],
                      output_names=['output'])
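As a sanity check on the exported model: each 3x3 convolution without padding shrinks the spatial dimensions by 2, so the 320x320 input comes out as 312x312 with 128 channels. The arithmetic, in a quick sketch (pure Python, no torch needed):

```python
def conv2d_out_size(size, kernel=3, stride=1, padding=0):
    """Standard conv output-size formula: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

size = 320
for _ in range(4):          # four stacked 3x3 convs, no padding
    size = conv2d_out_size(size)

print(size)  # 312 -> the exported model outputs 1x128x312x312
```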

SNPE quantization

The calibration dataset can simply be randomly generated, with each sample sized 320x320x3. The --enable_hta flag is added at quantization time so that the resulting DLC can run on the AIP: SNPE-hta_support
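A minimal sketch of generating such a random calibration set. The file names, directory, and sample count here are arbitrary; the assumption is the usual SNPE convention that each .raw file holds float32 values in NHWC order and the input list contains one path per line:

```python
import os

import numpy as np

os.makedirs('calib', exist_ok=True)
with open('raw_list.txt', 'w') as f:
    for i in range(16):  # sample count is arbitrary
        # one 320x320x3 random sample, float32, NHWC, dumped as a raw blob
        sample = np.random.randn(320, 320, 3).astype(np.float32)
        path = f'calib/sample_{i}.raw'
        sample.tofile(path)
        f.write(path + '\n')
```

The resulting raw_list.txt is what gets passed to snpe-dlc-quantize via --input_list.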

> snpe-dlc-quantize --input_dlc=./Conv_4.dlc \
--input_list=../face_det/raw_list_769M.txt \
--output_dlc=./Conv_4_int8_hta.dlc \
--enable_hta
[INFO] InitializeStderr: DebugLog initialized.
[INFO] Writing intermediate model
item already exists: 1124
[AIP_TF8_HTA : 0 1 2 3 4 ] ::1
item already exists: 1124
Starting assembler.
[INFO] Setting activation for layer: input and buffer: input
[INFO] bw: 8, min: -2.127063, max: 2.630841, delta: 0.018658, offset: -114.000000
[INFO] Setting activation for layer: Conv_0 and buffer: input.1
[INFO] bw: 8, min: -3.182152, max: 3.813096, delta: 0.027432, offset: -116.000000
[INFO] Setting activation for layer: Conv_1 and buffer: input.4
[INFO] bw: 8, min: -2.350746, max: 2.260332, delta: 0.018083, offset: -130.000000
[INFO] Setting activation for layer: Conv_2 and buffer: input.8
[INFO] bw: 8, min: -1.348760, max: 1.317393, delta: 0.010456, offset: -129.000000
[INFO] Setting activation for layer: Conv_3 and buffer: output
[INFO] bw: 8, min: -0.795907, max: 0.777397, delta: 0.006170, offset: -129.000000
[INFO] Running Graph Partitioner for SDM865
[INFO] Blob ID:2
[INFO] Writing quantized model to: ./Conv_4_int8_hta.dlc
[INFO] Compiling HTA metadata into DLC.
[INFO] Creating new AIP record aip.metadata0
[INFO] Record Version:: 1.2.0.0
[INFO] Compiler Version:: 1.6.2.1
[INFO] HTA Blob ID:: 1
NO ERRORS;
NO WARNINGS;
Please contact Qualcomm NPU team for potential further performance optimization for your model
item already exists: 1124
[INFO] Creating new AIP record aip.metadata1
[INFO] Record Version:: 1.2.0.0
[INFO] Compiler Version:: 1.6.2.1
[INFO] HTA Blob ID:: 2
[INFO] Successfully compiled HTA metadata into DLC.
[INFO] DebugLog shutting down.
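The min/max/delta/offset lines in the log above follow SNPE's TF-style 8-bit scheme: delta is the quantization step size (max - min) / 255, and offset is the zero point expressed as a negative integer. A simplified reconstruction (SNPE additionally nudges min/max so that zero is exactly representable, which these logged values already satisfy), checked against the input buffer line from the log:

```python
def tf8_params(vmin, vmax, bw=8):
    """Simplified TF-style quantization params: step size and zero-point offset."""
    levels = 2 ** bw - 1                # 255 steps for 8-bit
    delta = (vmax - vmin) / levels      # quantization step size
    offset = round(vmin / delta)        # zero point, as a (negative) offset
    return delta, offset

delta, offset = tf8_params(-2.127063, 2.630841)
print(f'delta={delta:.6f} offset={offset}')  # delta=0.018658 offset=-114
```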

Running DSP and AIP on a Qualcomm board (QCS8250)

Demo notes: this demo is written with the SNPE C++ API. The input is an image, which is resized to the model's input size; -d=2 selects the DSP runtime and -d=3 selects the AIP runtime. The -e flag can be ignored.

  • DSP: 397 ms

255|kona:/data/local/tmp/yeruihuan/snpe_test $ ./snpe_test -i=./lena.jpg \
-m=./Conv_4_int8_hta.dlc -e=7 -d=2
[08-18 10:42:33.517]  15069 15069 V snpe_test.cc:0131:  INFO: input img size: 320 x 320
[08-18 10:42:33.521]  15069 15069 V snpe_test.cc:0147:  INFO: use SNPE AI engine
[08-18 10:42:33.521]  15069 15069 V snpe_util.cc:0008:  SNPE runtime: DSP_FIXED8_TF
[08-18 10:42:33.918]  15069 15069 V snpe_test.cc:0105:  RunSNPESample cost: 397.628000 ms
  • AIP: 445 ms

kona:/data/local/tmp/yeruihuan/snpe_test $ ./snpe_test -i=./lena.jpg \
-m=./Conv_4_int8_hta.dlc -e=7 -d=3
[08-18 10:42:59.496]  15183 15183 V snpe_test.cc:0131:  INFO: input img size: 320 x 320
[08-18 10:42:59.500]  15183 15183 V snpe_test.cc:0147:  INFO: use SNPE AI engine
[08-18 10:42:59.500]  15183 15183 V snpe_util.cc:0008:  SNPE runtime: AIP_FIXED8_TF
npu_get_property status: 0
npu_get_property status: 0
FW CAPS [0] = 0x2007
FW CAPS [1] = 0x5
FW CAPS [2] = 0x0
FW CAPS [3] = 0x0
FW CAPS [4] = 0x0
FW CAPS [5] = 0x0
FW CAPS [6] = 0x0
FW CAPS [7] = 0x0
npu_get_property status: 0
NPU User Driver: npu_read_info 0
npu_get_property status: 0
npu_get_property status: 0
FW CAPS [0] = 0x2007
FW CAPS [1] = 0x5
FW CAPS [2] = 0x0
FW CAPS [3] = 0x0
FW CAPS [4] = 0x0
FW CAPS [5] = 0x0
FW CAPS [6] = 0x0
FW CAPS [7] = 0x0
npu_get_property status: 0
NPU driver built on: Nov 15 2021 16:31:41
npu_get_property status: 0
npu_get_property status: 0
FW CAPS [0] = 0x2007
FW CAPS [1] = 0x5
FW CAPS [2] = 0x0
FW CAPS [3] = 0x0
FW CAPS [4] = 0x0
FW CAPS [5] = 0x0
FW CAPS [6] = 0x0
FW CAPS [7] = 0x0
npu_get_property status: 0
DLBC compression enabled
item already exists: 1124
NET size 106496 off 0 id=ffffffff
INTERMEDIATE size 19384320 off 0 id=fffffffe
ACO buffer size 7688 fd 15 off 0
* NPU_Stats: npu_compile_get_objs(): 17.21 ms
DUAL ACO VA = 0fddf4000 Network VA = 0xff1a0000 Intermediate VA = 0xfde00000 Intermediate 1 VA= 0xfca00000
npu_load_network_v2: perf mode = 4 priority = 3f flags = 0x44 num layers = 5
* NPU_Stats: npu_load_network_v2: NPU + kernel : 16.71 ms
npu_load_network_v2: network handle = 0x10401
* NPU_Stats: npu_load_network(): 36.02 ms
* NPU_Stats: npu_alloc_buffer_v2(): 0.74 ms sts=0
* NPU_Stats: npu_alloc_buffer_v2(): 0.41 ms sts=0
npu_set_property status: 0
[08-18 10:42:59.945]  15183 15183 V snpe_test.cc:0105:  RunSNPESample cost: 445.387000 ms
* NPU_Stats: npu_free_buffer_v2(): 0.00 ms
* NPU_Stats: npu_free_buffer_v2(): 0.00 ms
* NPU_Stats: npu_unload_network(): NPU + kernel : 12.72 ms
free delayed buffer fbc00000
free delayed buffer fc980000
* NPU_Stats: npu_unload_network(): 31.72 ms
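When comparing many runs, the RunSNPESample cost lines can be pulled out of such logs with a small parser; the log format below is copied from the output above:

```python
import re

LOG = '''
[08-18 10:42:33.918]  15069 15069 V snpe_test.cc:0105:  RunSNPESample cost: 397.628000 ms
[08-18 10:42:59.945]  15183 15183 V snpe_test.cc:0105:  RunSNPESample cost: 445.387000 ms
'''

# Grab every "RunSNPESample cost: <float> ms" occurrence
costs = [float(m) for m in re.findall(r'RunSNPESample cost: ([\d.]+) ms', LOG)]
print(costs)  # [397.628, 445.387]
```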

SNPE SDK tool tests

Official link: SNPE-SDK-Tools

snpe-platform-validator

Checks SNPE compatibility on a device, i.e., whether the device supports SNPE's GPU, DSP, AIP, and other runtimes.
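To check several runtimes in one pass, the validator can be driven from a small wrapper. This sketch only builds the per-runtime command line, using the same flags as the invocation below; actually executing it would of course require the binary on the device:

```python
def validator_cmd(runtime):
    """Command line for snpe-platform-validator, one runtime at a time."""
    return ['./snpe-platform-validator',
            '--runtime', runtime,
            '--coreVersion', '--libVersion', '--debug']

for rt in ('gpu', 'dsp', 'aip'):
    print(' '.join(validator_cmd(rt)))
```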

1|kona:/data/local/tmp/SNPE_SDK_TOOL/bin/ # ./snpe-platform-validator \
--runtime aip \
--coreVersion --libVersion --debug

PF_VALIDATOR: DEBUG: starting calculator test
PF_VALIDATOR: DEBUG: Loading DSP stub: libcalculator.so
PF_VALIDATOR: DEBUG: Successfully loaded DSP library - 'libcalculator.so'.  Setting up pointers.
PF_VALIDATOR: DEBUG: Success in executing the sum function
npu_get_property status: 0
npu_get_property status: 0
FW CAPS [0] = 0x2007
FW CAPS [1] = 0x5
FW CAPS [2] = 0x0
FW CAPS [3] = 0x0
FW CAPS [4] = 0x0
FW CAPS [5] = 0x0
FW CAPS [6] = 0x0
FW CAPS [7] = 0x0
npu_get_property status: 0
NPU User Driver: npu_read_info 0
PF_VALIDATOR: DEBUG: Calling PlatformValidator->RuntimeCheck
PF_VALIDATOR: DEBUG: Testing for the support of AIP runtime.
npu_get_property status: 0
npu_get_property status: 0
FW CAPS [0] = 0x2007
FW CAPS [1] = 0x5
FW CAPS [2] = 0x0
FW CAPS [3] = 0x0
FW CAPS [4] = 0x0
FW CAPS [5] = 0x0
FW CAPS [6] = 0x0
FW CAPS [7] = 0x0
npu_get_property status: 0
NPU User Driver: npu_read_info 0
npu_get_property status: 0
npu_get_property status: 0
FW CAPS [0] = 0x2007
FW CAPS [1] = 0x5
FW CAPS [2] = 0x0
FW CAPS [3] = 0x0
FW CAPS [4] = 0x0
FW CAPS [5] = 0x0
FW CAPS [6] = 0x0
FW CAPS [7] = 0x0
npu_get_property status: 0
NPU driver built on: Nov 15 2021 16:31:41
npu_get_property status: 0
npu_get_property status: 0
FW CAPS [0] = 0x2007
FW CAPS [1] = 0x5
FW CAPS [2] = 0x0
FW CAPS [3] = 0x0
FW CAPS [4] = 0x0
FW CAPS [5] = 0x0
FW CAPS [6] = 0x0
FW CAPS [7] = 0x0
npu_get_property status: 0
DLBC compression enabled
item already exists: 1124
NET size 4096 off 0 id=ffffffff
INTERMEDIATE size 8192 off 0 id=fffffffe
ACO buffer size 2728 fd 13 off 0
* NPU_Stats: npu_compile_get_objs(): 15.23 ms
DUAL ACO VA = 0fffee000 Network VA = 0xffff3000 Intermediate VA = 0xffff0000 Intermediate 1 VA= 0xfffec000

npu_load_network_v2: perf mode = 0 priority = 0 flags = 0x44 num layers = 2
* NPU_Stats: npu_load_network_v2: NPU + kernel : 4.44 ms
npu_load_network_v2: network handle = 0x10101
* NPU_Stats: npu_load_network(): 20.81 ms
Unit Test on the runtime AIP: Passed.
SNPE is supported for runtime AIP on the device.
PF_VALIDATOR: DEBUG: Calling PlatformValidator->IsRuntimeAvailable
Runtime AIP Prerequisites: Present.
PF_VALIDATOR: DEBUG: Calling PlatformValidator->GetLibVersion
Library Version of the runtime AIP: Npu Lib v2

PF_VALIDATOR: DEBUG: Calling PlatformValidator->GetCoreVersion
Core Version of the runtime AIP: 135936

Running the validator for each runtime on the Qualcomm board shows that GPU, DSP, and AIP are all supported.
