Sophgo Edge Box Performance Test


This article benchmarks two Sophgo edge boxes, the SE50221 and the SE9 16-BP1-11. Their specifications are as follows:

1. SE50221

TPU compute: 17.6 TOPS@INT8; 2.2 TFLOPS@FP32. FP16 is not supported.

1.1. Product photo

[Image: SE50221 product photo]

1.2. Specifications:

1.3. Performance tests

We ran image tests with the FP32 and INT8 versions of the YOLOv5s v6.1 model (the SE50221 does not support FP16). The results are as follows:

1.3.1. FP32 model inference

End-to-end time per image (decode + preprocess + inference + postprocess): 4.645 + 2.320 + 22.299 + 16.525 = 45.789 ms

linaro@sophon:/data/sophon-demo_v0.1.8_dbb4632_20231116/sample/YOLOv5/cpp/yolov5_bmcv$ ./yolov5_bmcv.soc --input=../../datasets/test --bmodel=../../models/BM1684/yolov5s_v6.1_3output_fp32_1b.bmodel  --dev_id=0 --conf_thresh=0.5 --mns_thresh=0.5 --obj_thresh=0.5 --classname=../../datasets/coco.names
set device id: 0
[BMRT][bmcpu_setup:406] INFO:cpu_lib 'libcpuop.so' is loaded.
bmcpu init: skip cpu_user_defined
open usercpu.so, init user_cpu_init
[BMRT][load_bmodel:1084] INFO:Loading bmodel from [../../models/BM1688/yolov5s_v6.1_3output_fp32_1b.bmodel]. Thanks for your patience...
[BMRT][load_bmodel:1023] INFO:pre net num: 0, load net num: 1
YoloV5 ctor ..
*** Run in SOC mode ***

########################
---- stage 0 ----
  Input 0) 'images' shape=[ 1 3 640 640 ] dtype=FLOAT32 scale=1
  Output 0) 'output_Transpose' shape=[ 1 3 80 80 85 ] dtype=FLOAT32 scale=1
  Output 1) '365_Transpose' shape=[ 1 3 40 40 85 ] dtype=FLOAT32 scale=1
  Output 2) '385_Transpose' shape=[ 1 3 20 20 85 ] dtype=FLOAT32 scale=1
########################

1/4, img_file: ../../datasets/test/000000547383.jpg
2/4, img_file: ../../datasets/test/3.jpg
3/4, img_file: ../../datasets/test/dog.jpg
4/4, img_file: ../../datasets/test/zidane.jpg
================
result saved in results/yolov5s_v6.1_3output_fp32_1b.bmodel_test_bmcv_cpp_result.json

############################
SUMMARY: yolov5 test
############################
[                  decode time]  loops:    4 avg: 4.645 ms
[            yolov5 preprocess]  loops:    4 avg: 2.320 ms
[             yolov5 inference]  loops:    4 avg: 22.299 ms
[           yolov5 postprocess]  loops:    4 avg: 16.525 ms
[post 1: get output and decode]  loops:    4 avg: 12.342 ms
[         post 2: filter boxes]  loops:    4 avg: 3.229 ms
[                  post 3: nms]  loops:    4 avg: 0.012 ms
YoloV5 dtor ...
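The end-to-end figure is the sum of the four top-level stages in the SUMMARY block: decode, preprocess, inference and postprocess. The "post 1" to "post 3" lines appear to be sub-steps of postprocess, so adding them again would double-count. Here they sum to 15.583 ms, a little under the 16.525 ms postprocess total. The minimal Python sketch below parses a SUMMARY block in the format printed above and applies that rule. The regex, `parse_summary` and `end_to_end_ms` are hypothetical helpers written for this article and are not part of sophon-demo.

```python
from __future__ import annotations

import re

# Hypothetical parser for lines such as "[ yolov5 inference]  loops:    4 avg: 22.299 ms"
LINE_RE = re.compile(
    r"\[\s*(?P<name>[^\]]+?)\s*\]\s+loops:\s+(?P<loops>\d+)\s+avg:\s+(?P<avg>[\d.]+)\s+ms"
)

# Top-level stages that together make up one image's end-to-end latency.
TOP_LEVEL = ("decode time", "yolov5 preprocess", "yolov5 inference", "yolov5 postprocess")


def parse_summary(text: str) -> dict[str, float]:
    """Return {stage name: average ms} for every timing line in a SUMMARY block."""
    return {m["name"]: float(m["avg"]) for m in LINE_RE.finditer(text)}


def end_to_end_ms(stages: dict[str, float]) -> float:
    """Sum only the top-level stages that are present; post 1-3 are not added again."""
    return sum(stages[name] for name in TOP_LEVEL if name in stages)


SE5_FP32 = """
[                  decode time]  loops:    4 avg: 4.645 ms
[            yolov5 preprocess]  loops:    4 avg: 2.320 ms
[             yolov5 inference]  loops:    4 avg: 22.299 ms
[           yolov5 postprocess]  loops:    4 avg: 16.525 ms
[post 1: get output and decode]  loops:    4 avg: 12.342 ms
[         post 2: filter boxes]  loops:    4 avg: 3.229 ms
[                  post 3: nms]  loops:    4 avg: 0.012 ms
"""

stages = parse_summary(SE5_FP32)
print(f"end-to-end: {end_to_end_ms(stages):.3f} ms")
# Sum of the sub-steps, compared against the postprocess total they belong to.
sub = sum(v for k, v in stages.items() if k.startswith("post "))
print(f"post 1-3 sum: {sub:.3f} ms vs postprocess {stages['yolov5 postprocess']:.3f} ms")
```

The same functions also work on the SE9-16 logs, whose averages carry six decimal places. For video runs the SUMMARY block has no decode-time line, so only the three stages that are present get summed.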

1.3.2. INT8 model inference

End-to-end time per image (decode + preprocess + inference + postprocess): 4.645 + 1.802 + 11.202 + 16.562 = 34.211 ms

linaro@sophon:/data/sophon-demo_v0.1.8_dbb4632_20231116/sample/YOLOv5/cpp/yolov5_bmcv$ ./yolov5_bmcv.soc --input=../../datasets/test --bmodel=../../models/BM1684/yolov5s_v6.1_3output_fp32_1b.bmodel  --dev_id=0 --conf_thresh=0.5 --mns_thresh=0.5 --obj_thresh=0.5 --classname=../../datasets/coco.names
set device id: 0
[BMRT][bmcpu_setup:406] INFO:cpu_lib 'libcpuop.so' is loaded.
bmcpu init: skip cpu_user_defined
open usercpu.so, init user_cpu_init
[BMRT][load_bmodel:1084] INFO:Loading bmodel from [../../models/BM1688/yolov5s_v6.1_3output_fp32_1b.bmodel]. Thanks for your patience...
[BMRT][load_bmodel:1023] INFO:pre net num: 0, load net num: 1
YoloV5 ctor ..
*** Run in SOC mode ***

########################
---- stage 0 ----
  Input 0) 'images' shape=[ 1 3 640 640 ] dtype=FLOAT32 scale=1
  Output 0) 'output_Transpose' shape=[ 1 3 80 80 85 ] dtype=FLOAT32 scale=1
  Output 1) '365_Transpose' shape=[ 1 3 40 40 85 ] dtype=FLOAT32 scale=1
  Output 2) '385_Transpose' shape=[ 1 3 20 20 85 ] dtype=FLOAT32 scale=1
########################

1/4, img_file: ../../datasets/test/000000547383.jpg
2/4, img_file: ../../datasets/test/3.jpg
3/4, img_file: ../../datasets/test/dog.jpg
4/4, img_file: ../../datasets/test/zidane.jpg
================
result saved in results/yolov5s_v6.1_3output_fp32_1b.bmodel_test_bmcv_cpp_result.json

############################
SUMMARY: yolov5 test
############################
[                  decode time]  loops:    4 avg: 4.645 ms
[            yolov5 preprocess]  loops:    4 avg: 1.802 ms
[             yolov5 inference]  loops:    4 avg: 11.202 ms
[           yolov5 postprocess]  loops:    4 avg: 16.562 ms
[post 1: get output and decode]  loops:    4 avg: 12.340 ms
[         post 2: filter boxes]  loops:    4 avg: 3.249 ms
[                  post 3: nms]  loops:    4 avg: 0.012 ms
YoloV5 dtor ...
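Comparing the two SE50221 runs: INT8 roughly halves TPU inference time. Decode, preprocess and postprocess barely change, so the end-to-end gain is much smaller. The sketch below derives both speedups and an approximate single-stream frame rate. It assumes the four stages run strictly one after another for each image, with no pipelining. The stage values are copied from the two SUMMARY blocks above.

```python
from __future__ import annotations

# Average stage times (ms) from the two SE50221 SUMMARY blocks above.
SE5 = {
    "FP32": {"decode": 4.645, "preprocess": 2.320, "inference": 22.299, "postprocess": 16.525},
    "INT8": {"decode": 4.645, "preprocess": 1.802, "inference": 11.202, "postprocess": 16.562},
}


def total_ms(stages: dict[str, float]) -> float:
    """End-to-end latency as the plain sum of the stage averages."""
    return sum(stages.values())


def fps(latency_ms: float) -> float:
    # Assumes each image is fully processed before the next starts (no pipelining).
    return 1000.0 / latency_ms


fp32, int8 = SE5["FP32"], SE5["INT8"]
inference_speedup = fp32["inference"] / int8["inference"]
e2e_speedup = total_ms(fp32) / total_ms(int8)
print(f"inference speedup:  {inference_speedup:.2f}x")
print(f"end-to-end speedup: {e2e_speedup:.2f}x")
for name, stages in SE5.items():
    t = total_ms(stages)
    print(f"{name}: {t:.3f} ms/image, ~{fps(t):.1f} FPS, postprocess share {stages['postprocess'] / t:.0%}")
```

With INT8 on this box, postprocess (16.562 ms) takes longer than TPU inference (11.202 ms). Postprocess is therefore the larger remaining per-image cost.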


2. SE9 16-BP1-11

TPU compute of the SE9-16: 32 TOPS@INT4; 16 TOPS@INT8; 4 TFLOPS@FP16/BF16; 0.25 TFLOPS@FP32.

2.1. Product photo:

[Image: SE9 16-BP1-11 product photo]

2.2. Specifications:

[Image: specification table]

2.3. Performance tests

We tested the FP32, FP16 and INT8 versions of the YOLOv5s v6.1 model. The results are as follows:

2.3.1. FP32 model inference

End-to-end time per image (decode + preprocess + inference + postprocess): 6.567 + 2.429 + 98.383 + 22.943 = 130.322 ms

linaro@sophon:/data/sophon-demo_v0.1.9_91d8161e_20231227/sample/YOLOv5/cpp/yolov5_bmcv$ ./yolov5_bmcv.soc --input=../../datasets/test --bmodel=../../models/BM1688/yolov5s_v6.1_3output_fp32_1b.bmodel  --dev_id=0 --conf_thresh=0.5 --mns_thresh=0.5 --obj_thresh=0.5 --classname=../../datasets/coco.names
set device id: 0
[BMRT][bmcpu_setup:435] INFO:cpu_lib 'libcpuop.so' is loaded.
bmcpu init: skip cpu_user_defined
open usercpu.so, init user_cpu_init
[BMRT][BMProfile:59] INFO:Profile For arch=4
[BMRT][BMProfileDeviceBase:190] INFO:gdma=0, tiu=0, mcu=0
[BMRT][load_bmodel:1573] INFO:Loading bmodel from [../../models/BM1688/yolov5s_v6.1_3output_fp32_1b.bmodel]. Thanks for your patience...
[BMRT][load_bmodel:1501] INFO:pre net num: 0, load net num: 1
YoloV5 ctor ..
*** Run in SOC mode ***

########################
NetName: yolov5s_v6.1_3output
---- stage 0 ----
  Input 0) 'images' shape=[ 1 3 640 640 ] dtype=FLOAT32 scale=1
  Output 0) 'output_Transpose' shape=[ 1 3 80 80 85 ] dtype=FLOAT32 scale=1
  Output 1) '365_Transpose' shape=[ 1 3 40 40 85 ] dtype=FLOAT32 scale=1
  Output 2) '385_Transpose' shape=[ 1 3 20 20 85 ] dtype=FLOAT32 scale=1
########################

1/4, img_file: ../../datasets/test/000000547383.jpg
2/4, img_file: ../../datasets/test/3.jpg
3/4, img_file: ../../datasets/test/dog.jpg
4/4, img_file: ../../datasets/test/zidane.jpg
================
result saved in results/yolov5s_v6.1_3output_fp32_1b.bmodel_test_bmcv_cpp_result.json

############################
SUMMARY: yolov5 test
############################
[                  decode time]  loops:    4 avg: 6.567000 ms
[            yolov5 preprocess]  loops:    4 avg: 2.429000 ms
[             yolov5 inference]  loops:    4 avg: 98.383000 ms
[           yolov5 postprocess]  loops:    4 avg: 22.943000 ms
[post 1: get output and decode]  loops:    4 avg: 17.845000 ms
[         post 2: filter boxes]  loops:    4 avg: 4.896000 ms
[                  post 3: nms]  loops:    4 avg: 0.016000 ms
YoloV5 dtor ...
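FP32 inference on the SE9-16 (98.383 ms) is much slower than on the SE50221 (22.299 ms). That direction matches the quoted FP32 peaks of 0.25 and 2.2 TFLOPS. The arithmetic below compares the measured ratio with the ratio of those peak figures. The peaks are the vendor numbers from the spec lines, and the measured ratio covers only this one model at batch size 1.

```python
# Quoted FP32 peak compute (TFLOPS) and measured YOLOv5s FP32 inference time (ms).
peak_fp32_tflops = {"SE50221": 2.2, "SE9-16": 0.25}
fp32_inference_ms = {"SE50221": 22.299, "SE9-16": 98.383}

# How much more FP32 peak the SE50221 has, versus how much slower the SE9-16 measured.
peak_ratio = peak_fp32_tflops["SE50221"] / peak_fp32_tflops["SE9-16"]
measured_ratio = fp32_inference_ms["SE9-16"] / fp32_inference_ms["SE50221"]
print(f"peak FP32 ratio (SE50221 / SE9-16):       {peak_ratio:.1f}x")
print(f"measured FP32 inference slowdown (SE9-16): {measured_ratio:.2f}x")
```

The measured gap (about 4.4x) is roughly half of what the peak figures alone (8.8x) would suggest. In practice, FP32 on the SE9-16 mainly serves as a reference point for the FP16 and INT8 runs that follow.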

2.3.2. FP16 model inference

End-to-end time per image (decode + preprocess + inference + postprocess): 6.592 + 2.422 + 27.805 + 24.594 = 61.413 ms

linaro@sophon:/data/sophon-demo_v0.1.9_91d8161e_20231227/sample/YOLOv5/cpp/yolov5_bmcv$ ./yolov5_bmcv.soc --input=../../datasets/test --bmodel=../../models/BM1688/yolov5s_v6.1_3output_fp16_1b.bmodel  --dev_id=0 --conf_thresh=0.5 --mns_thresh=0.5 --obj_thresh=0.5 --classname=../../datasets/coco.names
set device id: 0
[BMRT][bmcpu_setup:435] INFO:cpu_lib 'libcpuop.so' is loaded.
bmcpu init: skip cpu_user_defined
open usercpu.so, init user_cpu_init
[BMRT][BMProfile:59] INFO:Profile For arch=4
[BMRT][BMProfileDeviceBase:190] INFO:gdma=0, tiu=0, mcu=0
[BMRT][load_bmodel:1573] INFO:Loading bmodel from [../../models/BM1688/yolov5s_v6.1_3output_fp16_1b.bmodel]. Thanks for your patience...
[BMRT][load_bmodel:1501] INFO:pre net num: 0, load net num: 1
YoloV5 ctor ..
*** Run in SOC mode ***

########################
NetName: yolov5s_v6.1_3output
---- stage 0 ----
  Input 0) 'images' shape=[ 1 3 640 640 ] dtype=FLOAT32 scale=1
  Output 0) 'output_Transpose_f32' shape=[ 1 3 80 80 85 ] dtype=FLOAT32 scale=1
  Output 1) '365_Transpose_f32' shape=[ 1 3 40 40 85 ] dtype=FLOAT32 scale=1
  Output 2) '385_Transpose_f32' shape=[ 1 3 20 20 85 ] dtype=FLOAT32 scale=1
########################

1/4, img_file: ../../datasets/test/000000547383.jpg
2/4, img_file: ../../datasets/test/3.jpg
3/4, img_file: ../../datasets/test/dog.jpg
4/4, img_file: ../../datasets/test/zidane.jpg
================
result saved in results/yolov5s_v6.1_3output_fp16_1b.bmodel_test_bmcv_cpp_result.json

############################
SUMMARY: yolov5 test
############################
[                  decode time]  loops:    4 avg: 6.592000 ms
[            yolov5 preprocess]  loops:    4 avg: 2.422000 ms
[             yolov5 inference]  loops:    4 avg: 27.805000 ms
[           yolov5 postprocess]  loops:    4 avg: 24.594000 ms
[post 1: get output and decode]  loops:    4 avg: 19.502000 ms
[         post 2: filter boxes]  loops:    4 avg: 4.887000 ms
[                  post 3: nms]  loops:    4 avg: 0.017000 ms
YoloV5 dtor ...

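Video inference, average end-to-end time per frame (preprocess + inference + postprocess; the video SUMMARY has no decode entry): 4.784 + 27.601 + 22.611 = 54.996 ms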

linaro@sophon:/data/sophon-demo_v0.1.9_91d8161e_20231227/sample/YOLOv5/cpp/yolov5_bmcv$ ./yolov5_bmcv.soc --input=../../datasets/test_car_person_1080P.mp4 --bmodel=../../models/BM1688/yolov5s_v6.1_3output_fp16_1b.bmodel  --dev_id=0 --conf_thresh=0.5 --mns_thresh=0.5 --obj_thresh=0.5 --classname=../../datasets/coco.names
set device id: 0
[BMRT][bmcpu_setup:435] INFO:cpu_lib 'libcpuop.so' is loaded.
bmcpu init: skip cpu_user_defined
open usercpu.so, init user_cpu_init
[BMRT][BMProfile:59] INFO:Profile For arch=4
[BMRT][BMProfileDeviceBase:190] INFO:gdma=0, tiu=0, mcu=0
[BMRT][load_bmodel:1573] INFO:Loading bmodel from [../../models/BM1688/yolov5s_v6.1_3output_fp16_1b.bmodel]. Thanks for your patience...
[BMRT][load_bmodel:1501] INFO:pre net num: 0, load net num: 1
YoloV5 ctor ..
*** Run in SOC mode ***

########################
NetName: yolov5s_v6.1_3output
---- stage 0 ----
  Input 0) 'images' shape=[ 1 3 640 640 ] dtype=FLOAT32 scale=1
  Output 0) 'output_Transpose_f32' shape=[ 1 3 80 80 85 ] dtype=FLOAT32 scale=1
  Output 1) '365_Transpose_f32' shape=[ 1 3 40 40 85 ] dtype=FLOAT32 scale=1
  Output 2) '385_Transpose_f32' shape=[ 1 3 20 20 85 ] dtype=FLOAT32 scale=1
########################

[h264_bm @ 0x5589d07480] bm decoder id: 0
[h264_bm @ 0x5589d07480] bm output format: 0
[h264_bm @ 0x5589d07480] mode bitstream: 2, frame delay: -1
[h264_bm @ 0x5589d07480] openDec video_stream_idx = 0, pix_fmt = 23
1, det_nums: 6
2, det_nums: 6
3, det_nums: 6
4, det_nums: 6
5, det_nums: 6
6, det_nums: 6
.......
587, det_nums: 7
[h264_bm @ 0x5589d07480] av_read_frame ret(-541478725) maybe eof...
588, det_nums: 6
589, det_nums: 6
590, det_nums: 7
591, det_nums: 8
592, det_nums: 7
#VideoDecFFM exit

############################
SUMMARY: yolov5 test
############################
[            yolov5 preprocess]  loops:  592 avg: 4.784000 ms
[             yolov5 inference]  loops:  592 avg: 27.601000 ms
[           yolov5 postprocess]  loops:  592 avg: 22.611000 ms
[post 1: get output and decode]  loops:  592 avg: 17.452000 ms
[         post 2: filter boxes]  loops:  592 avg: 4.946000 ms
[                  post 3: nms]  loops:  592 avg: 0.031000 ms
YoloV5 dtor ...

One video stream, TPU utilization: 17%

Tue Mar 12 16:50:37 2024
+--------------------------------------------------------------------------------------------------+
| SDK Version:    0.4.9             Driver Version:  0.4.9                                         |
+---------------------------------------+----------------------------------------------------------+
|card  Name      Mode        SN         |TPU  boardT  chipT   TPU_P  TPU_V  ECC  CorrectN  Tpu-Util|
|12V_ATX  MaxP boardP Minclk Maxclk  Fan|Bus-ID      Status   Currclk   TPU_C   Memory-Usage       |
|=======================================+==========================================================|
| 0  1688-SOC     SOC      N/A          | 0    N/A     N/A     N/A     N/A  N/A    N/A         17% |
|   N/A   N/A   N/A  450M     900M   N/A| N/A        Active    900M       N/A   352MB/ 6144MB      |
+=======================================+==========================================================+

+--------------------------------------------------------------------------------------------------+
| Processes:                                                                            TPU Memory |
|  TPU-ID       PID   Process name                                                      Usage      |
|==================================================================================================|
        0   3617144  ./yolov5_bmcv.soc                                                   0MB
        0   3617144  ./yolov5_bmcv.soc                                                  65MB

2.3.3. INT8 model inference

End-to-end time per image (decode + preprocess + inference + postprocess): 6.702 + 2.452 + 7.593 + 24.163 = 40.910 ms

linaro@sophon:/data/sophon-demo_v0.1.9_91d8161e_20231227/sample/YOLOv5/cpp/yolov5_bmcv$ ./yolov5_bmcv.soc --input=../../datasets/test --bmodel=../../models/BM1688/yolov5s_v6.1_3output_int8_1b.bmodel  --dev_id=0 --conf_thresh=0.5 --mns_thresh=0.5 --obj_thresh=0.5 --classname=../../datasets/coco.names
set device id: 0
[BMRT][bmcpu_setup:435] INFO:cpu_lib 'libcpuop.so' is loaded.
bmcpu init: skip cpu_user_defined
open usercpu.so, init user_cpu_init
[BMRT][BMProfile:59] INFO:Profile For arch=4
[BMRT][BMProfileDeviceBase:190] INFO:gdma=0, tiu=0, mcu=0
[BMRT][load_bmodel:1573] INFO:Loading bmodel from [../../models/BM1688/yolov5s_v6.1_3output_int8_1b.bmodel]. Thanks for your patience...
[BMRT][load_bmodel:1501] INFO:pre net num: 0, load net num: 1
YoloV5 ctor ..
*** Run in SOC mode ***

########################
NetName: yolov5s_v6.1_3output
---- stage 0 ----
  Input 0) 'images' shape=[ 1 3 640 640 ] dtype=FLOAT32 scale=1
  Output 0) 'output_Transpose_f32' shape=[ 1 3 80 80 85 ] dtype=FLOAT32 scale=1
  Output 1) '365_Transpose_f32' shape=[ 1 3 40 40 85 ] dtype=FLOAT32 scale=1
  Output 2) '385_Transpose_f32' shape=[ 1 3 20 20 85 ] dtype=FLOAT32 scale=1
########################

1/4, img_file: ../../datasets/test/000000547383.jpg
2/4, img_file: ../../datasets/test/3.jpg
3/4, img_file: ../../datasets/test/dog.jpg
4/4, img_file: ../../datasets/test/zidane.jpg
================
result saved in results/yolov5s_v6.1_3output_int8_1b.bmodel_test_bmcv_cpp_result.json

############################
SUMMARY: yolov5 test
############################
[                  decode time]  loops:    4 avg: 6.702000 ms
[            yolov5 preprocess]  loops:    4 avg: 2.452000 ms
[             yolov5 inference]  loops:    4 avg: 7.593000 ms
[           yolov5 postprocess]  loops:    4 avg: 24.163000 ms
[post 1: get output and decode]  loops:    4 avg: 18.327000 ms
[         post 2: filter boxes]  loops:    4 avg: 5.641000 ms
[                  post 3: nms]  loops:    4 avg: 0.017000 ms
YoloV5 dtor ...
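Across the three SE9-16 image runs, lower precision cuts TPU inference time sharply. The measured gains still stay well below the ratios of the quoted peak figures. End-to-end gains are smaller again, because decode, preprocess and postprocess change little between runs. The sketch below puts the numbers side by side. The peak values are the vendor figures quoted at the start of section 2, and the timings come from the three SUMMARY blocks.

```python
# SE9-16 image runs: quoted peak compute and measured times (ms) from the logs above.
peak = {"FP32": 0.25, "FP16": 4.0, "INT8": 16.0}  # TFLOPS for FP32/FP16, TOPS for INT8
inference_ms = {"FP32": 98.383, "FP16": 27.805, "INT8": 7.593}
e2e_ms = {"FP32": 130.322, "FP16": 61.413, "INT8": 40.910}

# gains[precision] = (peak ratio, measured inference speedup, measured end-to-end speedup),
# each relative to FP32.
gains = {}
for precision in ("FP16", "INT8"):
    gains[precision] = (
        peak[precision] / peak["FP32"],
        inference_ms["FP32"] / inference_ms[precision],
        e2e_ms["FP32"] / e2e_ms[precision],
    )
    p, i, e = gains[precision]
    print(f"{precision}: peak {p:.0f}x, inference {i:.2f}x, end-to-end {e:.2f}x")
```

With INT8, postprocess (24.163 ms) takes more than three times as long as inference (7.593 ms), so it dominates per-image latency.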

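Video inference, average end-to-end time per frame (preprocess + inference + postprocess; the video SUMMARY has no decode entry): 4.815 + 7.434 + 22.668 = 34.917 ms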

linaro@sophon:/data/sophon-demo_v0.1.9_91d8161e_20231227/sample/YOLOv5/cpp/yolov5_bmcv$ ./yolov5_bmcv.soc --input=../../datasets/test_car_person_1080P.mp4 --bmodel=../../models/BM1688/yolov5s_v6.1_3output_int8_1b.bmodel  --dev_id=0 --conf_thresh=0.5 --mns_thresh=0.5 --obj_thresh=0.5 --classname=../../datasets/coco.names
set device id: 0
[BMRT][bmcpu_setup:435] INFO:cpu_lib 'libcpuop.so' is loaded.
bmcpu init: skip cpu_user_defined
open usercpu.so, init user_cpu_init
[BMRT][BMProfile:59] INFO:Profile For arch=4
[BMRT][BMProfileDeviceBase:190] INFO:gdma=0, tiu=0, mcu=0
[BMRT][load_bmodel:1573] INFO:Loading bmodel from [../../models/BM1688/yolov5s_v6.1_3output_int8_1b.bmodel]. Thanks for your patience...
[BMRT][load_bmodel:1501] INFO:pre net num: 0, load net num: 1
YoloV5 ctor ..
*** Run in SOC mode ***

########################
NetName: yolov5s_v6.1_3output
---- stage 0 ----
  Input 0) 'images' shape=[ 1 3 640 640 ] dtype=FLOAT32 scale=1
  Output 0) 'output_Transpose_f32' shape=[ 1 3 80 80 85 ] dtype=FLOAT32 scale=1
  Output 1) '365_Transpose_f32' shape=[ 1 3 40 40 85 ] dtype=FLOAT32 scale=1
  Output 2) '385_Transpose_f32' shape=[ 1 3 20 20 85 ] dtype=FLOAT32 scale=1
########################

[h264_bm @ 0x5579297350] bm decoder id: 0
[h264_bm @ 0x5579297350] bm output format: 0
[h264_bm @ 0x5579297350] mode bitstream: 2, frame delay: -1
[h264_bm @ 0x5579297350] openDec video_stream_idx = 0, pix_fmt = 23
1, det_nums: 6
2, det_nums: 6
3, det_nums: 6
.......
588, det_nums: 5
589, det_nums: 5
590, det_nums: 7
591, det_nums: 7
592, det_nums: 6
#VideoDecFFM exit

############################
SUMMARY: yolov5 test
############################
[            yolov5 preprocess]  loops:  592 avg: 4.815000 ms
[             yolov5 inference]  loops:  592 avg: 7.434000 ms
[           yolov5 postprocess]  loops:  592 avg: 22.668000 ms
[post 1: get output and decode]  loops:  592 avg: 17.498000 ms
[         post 2: filter boxes]  loops:  592 avg: 4.960000 ms
[                  post 3: nms]  loops:  592 avg: 0.028000 ms
YoloV5 dtor ...

One video stream, TPU utilization: about 7% (the snapshot below shows 6%)

Tue Mar 12 16:43:46 2024
+--------------------------------------------------------------------------------------------------+
| SDK Version:    0.4.9             Driver Version:  0.4.9                                         |
+---------------------------------------+----------------------------------------------------------+
|card  Name      Mode        SN         |TPU  boardT  chipT   TPU_P  TPU_V  ECC  CorrectN  Tpu-Util|
|12V_ATX  MaxP boardP Minclk Maxclk  Fan|Bus-ID      Status   Currclk   TPU_C   Memory-Usage       |
|=======================================+==========================================================|
| 0  1688-SOC     SOC      N/A          | 0    N/A     N/A     N/A     N/A  N/A    N/A          6% |
|   N/A   N/A   N/A  450M     900M   N/A| N/A        Active    900M       N/A   340MB/ 6144MB      |
+=======================================+==========================================================+

+--------------------------------------------------------------------------------------------------+
| Processes:                                                                            TPU Memory |
|  TPU-ID       PID   Process name                                                      Usage      |
|==================================================================================================|
        0   3613086  ./yolov5_bmcv.soc                                                   0MB
        0   3613086  ./yolov5_bmcv.soc                                                  59MB
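Comparing the two SE9-16 video runs: moving from FP16 to INT8 cuts per-frame inference from about 27.6 ms to 7.4 ms. Postprocess, which barely changes, then takes about two thirds of each frame. The sketch below converts the per-frame sums into an approximate frame rate. It assumes each frame is fully processed before the next one starts, with no overlap between stages. The stage values are copied from the two video SUMMARY blocks.

```python
# SE9-16 video runs (592 frames each): average stage times in ms, from the SUMMARY blocks.
video = {
    "FP16": {"preprocess": 4.784, "inference": 27.601, "postprocess": 22.611},
    "INT8": {"preprocess": 4.815, "inference": 7.434, "postprocess": 22.668},
}

report = {}
for precision, stages in video.items():
    per_frame = sum(stages.values())
    report[precision] = {
        "ms_per_frame": per_frame,
        "fps": 1000.0 / per_frame,  # assumes no overlap between stages or frames
        "postprocess_share": stages["postprocess"] / per_frame,
    }
    r = report[precision]
    print(f"{precision}: {r['ms_per_frame']:.3f} ms/frame, ~{r['fps']:.1f} FPS, "
          f"postprocess {r['postprocess_share']:.0%}")
```

These figures cover a single stream only. The low TPU utilization readings suggest headroom for more streams, but that was not tested here.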
