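Sophgo Edge Box Performance Test
This report covers performance tests of two Sophgo edge boxes, the SE50221 and the SE9 16-BP1-11. Their specifications are as follows: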
1. SE50221
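TPU compute: 17.6 TOPS@INT8; 2.2 TFLOPS@FP32. FP16 is not supported.
1.1. Product image
1.2. Specifications:
1.3. Performance test
Image tests were run with the FP32 and INT8 models of YOLOv5s v6.1. The results are as follows:
1.3.1. FP32 model inference
End-to-end inference time (decode + preprocess + inference + postprocess): 4.645 + 2.320 + 22.299 + 16.525 = 45.789 ms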
linaro@sophon:/data/sophon-demo_v0.1.8_dbb4632_20231116/sample/YOLOv5/cpp/yolov5_bmcv$ ./yolov5_bmcv.soc --input=../../datasets/test --bmodel=../../models/BM1684/yolov5s_v6.1_3output_fp32_1b.bmodel --dev_id=0 --conf_thresh=0.5 --mns_thresh=0.5 --obj_thresh=0.5 --classname=../../datasets/coco.names
set device id: 0
[BMRT][bmcpu_setup:406] INFO:cpu_lib 'libcpuop.so' is loaded.
bmcpu init: skip cpu_user_defined
open usercpu.so, init user_cpu_init
[BMRT][load_bmodel:1084] INFO:Loading bmodel from [../../models/BM1688/yolov5s_v6.1_3output_fp32_1b.bmodel]. Thanks for your patience...
[BMRT][load_bmodel:1023] INFO:pre net num: 0, load net num: 1
YoloV5 ctor ..
*** Run in SOC mode ***
########################
---- stage 0 ----
Input 0) 'images' shape=[ 1 3 640 640 ] dtype=FLOAT32 scale=1
Output 0) 'output_Transpose' shape=[ 1 3 80 80 85 ] dtype=FLOAT32 scale=1
Output 1) '365_Transpose' shape=[ 1 3 40 40 85 ] dtype=FLOAT32 scale=1
Output 2) '385_Transpose' shape=[ 1 3 20 20 85 ] dtype=FLOAT32 scale=1
########################
1/4, img_file: ../../datasets/test/000000547383.jpg
2/4, img_file: ../../datasets/test/3.jpg
3/4, img_file: ../../datasets/test/dog.jpg
4/4, img_file: ../../datasets/test/zidane.jpg
================
result saved in results/yolov5s_v6.1_3output_fp32_1b.bmodel_test_bmcv_cpp_result.json
############################
SUMMARY: yolov5 test
############################
[ decode time] loops: 4 avg: 4.645 ms
[ yolov5 preprocess] loops: 4 avg: 2.320 ms
[ yolov5 inference] loops: 4 avg: 22.299 ms
[ yolov5 postprocess] loops: 4 avg: 16.525 ms
[post 1: get output and decode] loops: 4 avg: 12.342 ms
[ post 2: filter boxes] loops: 4 avg: 3.229 ms
[ post 3: nms] loops: 4 avg: 0.012 ms
YoloV5 dtor ...
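The end-to-end figure for this run is the sum of the four top-level averages in the SUMMARY block. The three `post N` lines are treated here as sub-steps of postprocess and are not added a second time. A minimal Python sketch of that arithmetic follows; the regular expression and the helper names `parse_summary` and `end_to_end_ms` are illustrative assumptions, not part of sophon-demo.

```python
import re

# Summary lines copied from the FP32 run above.
LOG = """\
[ decode time] loops: 4 avg: 4.645 ms
[ yolov5 preprocess] loops: 4 avg: 2.320 ms
[ yolov5 inference] loops: 4 avg: 22.299 ms
[ yolov5 postprocess] loops: 4 avg: 16.525 ms
[post 1: get output and decode] loops: 4 avg: 12.342 ms
[ post 2: filter boxes] loops: 4 avg: 3.229 ms
[ post 3: nms] loops: 4 avg: 0.012 ms
"""

# Hypothetical pattern for one summary line: "[ name ] loops: N avg: X ms".
LINE_RE = re.compile(
    r"\[\s*(?P<stage>[^\]]+?)\s*\]\s+loops:\s*(?P<loops>\d+)\s+avg:\s*(?P<avg>[\d.]+)\s*ms"
)


def parse_summary(text):
    """Return {stage name: average milliseconds} for every summary line."""
    return {m["stage"]: float(m["avg"]) for m in LINE_RE.finditer(text)}


def end_to_end_ms(stages):
    """Sum the top-level stages only, skipping the 'post N: ...' sub-steps."""
    return sum(ms for name, ms in stages.items() if not name.startswith("post "))


stages = parse_summary(LOG)
print(f"end-to-end: {end_to_end_ms(stages):.3f} ms")
```

The same two helpers can be pointed at any of the other SUMMARY blocks in this report, since they all share this line layout.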
1.3.2. INT8 model inference
End-to-end inference time (decode + preprocess + inference + postprocess): 4.645 + 1.802 + 11.202 + 16.562 = 34.211 ms
linaro@sophon:/data/sophon-demo_v0.1.8_dbb4632_20231116/sample/YOLOv5/cpp/yolov5_bmcv$ ./yolov5_bmcv.soc --input=../../datasets/test --bmodel=../../models/BM1684/yolov5s_v6.1_3output_fp32_1b.bmodel --dev_id=0 --conf_thresh=0.5 --mns_thresh=0.5 --obj_thresh=0.5 --classname=../../datasets/coco.names
set device id: 0
[BMRT][bmcpu_setup:406] INFO:cpu_lib 'libcpuop.so' is loaded.
bmcpu init: skip cpu_user_defined
open usercpu.so, init user_cpu_init
[BMRT][load_bmodel:1084] INFO:Loading bmodel from [../../models/BM1688/yolov5s_v6.1_3output_fp32_1b.bmodel]. Thanks for your patience...
[BMRT][load_bmodel:1023] INFO:pre net num: 0, load net num: 1
YoloV5 ctor ..
*** Run in SOC mode ***
########################
---- stage 0 ----
Input 0) 'images' shape=[ 1 3 640 640 ] dtype=FLOAT32 scale=1
Output 0) 'output_Transpose' shape=[ 1 3 80 80 85 ] dtype=FLOAT32 scale=1
Output 1) '365_Transpose' shape=[ 1 3 40 40 85 ] dtype=FLOAT32 scale=1
Output 2) '385_Transpose' shape=[ 1 3 20 20 85 ] dtype=FLOAT32 scale=1
########################
1/4, img_file: ../../datasets/test/000000547383.jpg
2/4, img_file: ../../datasets/test/3.jpg
3/4, img_file: ../../datasets/test/dog.jpg
4/4, img_file: ../../datasets/test/zidane.jpg
================
result saved in results/yolov5s_v6.1_3output_fp32_1b.bmodel_test_bmcv_cpp_result.json
############################
SUMMARY: yolov5 test
############################
[ decode time] loops: 4 avg: 4.645 ms
[ yolov5 preprocess] loops: 4 avg: 1.802 ms
[ yolov5 inference] loops: 4 avg: 11.202 ms
[ yolov5 postprocess] loops: 4 avg: 16.562 ms
[post 1: get output and decode] loops: 4 avg: 12.340 ms
[ post 2: filter boxes] loops: 4 avg: 3.249 ms
[ post 3: nms] loops: 4 avg: 0.012 ms
YoloV5 dtor ...
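To put the two SE50221 runs side by side, the speedup can be computed separately for the inference stage and for the whole pipeline. The sketch below is only an illustration of that comparison; the dictionary layout and the `speedup` helper are assumptions made for this example, and the numbers are the averages reported in the two SUMMARY blocks of section 1.3.

```python
# Average stage times in ms, copied from the FP32 and INT8 summaries in section 1.3.
FP32 = {"decode": 4.645, "preprocess": 2.320, "inference": 22.299, "postprocess": 16.525}
INT8 = {"decode": 4.645, "preprocess": 1.802, "inference": 11.202, "postprocess": 16.562}


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


inference_gain = speedup(FP32["inference"], INT8["inference"])
end_to_end_gain = speedup(sum(FP32.values()), sum(INT8.values()))

print(f"inference speedup:  {inference_gain:.2f}x")
print(f"end-to-end speedup: {end_to_end_gain:.2f}x")
```

The end-to-end gain is smaller than the inference gain because decode and postprocess take about the same time in both runs.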
2. SE9 16-BP1-11
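TPU compute of the SE9-16: 32 TOPS@INT4; 16 TOPS@INT8; 4 TFLOPS@FP16/BF16; 0.25 TFLOPS@FP32
2.1. Product image:
2.2. Specifications:
2.3. Performance test
Tests were run with the FP32, FP16, and INT8 models of YOLOv5s v6.1. The results are as follows:
2.3.1. FP32 model inference
End-to-end inference time (decode + preprocess + inference + postprocess): 6.567 + 2.429 + 98.383 + 22.943 = 130.322 ms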
linaro@sophon:/data/sophon-demo_v0.1.9_91d8161e_20231227/sample/YOLOv5/cpp/yolov5_bmcv$ ./yolov5_bmcv.soc --input=../../datasets/test --bmodel=../../models/BM1688/yolov5s_v6.1_3output_fp32_1b.bmodel --dev_id=0 --conf_thresh=0.5 --mns_thresh=0.5 --obj_thresh=0.5 --classname=../../datasets/coco.names
set device id: 0
[BMRT][bmcpu_setup:435] INFO:cpu_lib 'libcpuop.so' is loaded.
bmcpu init: skip cpu_user_defined
open usercpu.so, init user_cpu_init
[BMRT][BMProfile:59] INFO:Profile For arch=4
[BMRT][BMProfileDeviceBase:190] INFO:gdma=0, tiu=0, mcu=0
[BMRT][load_bmodel:1573] INFO:Loading bmodel from [../../models/BM1688/yolov5s_v6.1_3output_fp32_1b.bmodel]. Thanks for your patience...
[BMRT][load_bmodel:1501] INFO:pre net num: 0, load net num: 1
YoloV5 ctor ..
*** Run in SOC mode ***
########################
NetName: yolov5s_v6.1_3output
---- stage 0 ----
Input 0) 'images' shape=[ 1 3 640 640 ] dtype=FLOAT32 scale=1
Output 0) 'output_Transpose' shape=[ 1 3 80 80 85 ] dtype=FLOAT32 scale=1
Output 1) '365_Transpose' shape=[ 1 3 40 40 85 ] dtype=FLOAT32 scale=1
Output 2) '385_Transpose' shape=[ 1 3 20 20 85 ] dtype=FLOAT32 scale=1
########################
1/4, img_file: ../../datasets/test/000000547383.jpg
2/4, img_file: ../../datasets/test/3.jpg
3/4, img_file: ../../datasets/test/dog.jpg
4/4, img_file: ../../datasets/test/zidane.jpg
================
result saved in results/yolov5s_v6.1_3output_fp32_1b.bmodel_test_bmcv_cpp_result.json
############################
SUMMARY: yolov5 test
############################
[ decode time] loops: 4 avg: 6.567000 ms
[ yolov5 preprocess] loops: 4 avg: 2.429000 ms
[ yolov5 inference] loops: 4 avg: 98.383000 ms
[ yolov5 postprocess] loops: 4 avg: 22.943000 ms
[post 1: get output and decode] loops: 4 avg: 17.845000 ms
[ post 2: filter boxes] loops: 4 avg: 4.896000 ms
[ post 3: nms] loops: 4 avg: 0.016000 ms
YoloV5 dtor ...
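A per-stage share of the end-to-end time makes it easier to see which stage dominates this FP32 run. The following Python sketch computes those shares from the four top-level averages reported above; the `stage_shares` helper is a name chosen for this example only.

```python
# Top-level stage averages in ms from the SE9 FP32 summary above.
STAGES_MS = {
    "decode": 6.567,
    "preprocess": 2.429,
    "inference": 98.383,
    "postprocess": 22.943,
}


def stage_shares(stages):
    """Return each stage's fraction of the summed end-to-end time."""
    total = sum(stages.values())
    return {name: ms / total for name, ms in stages.items()}


shares = stage_shares(STAGES_MS)
# Print the stages from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<12} {share:6.1%}")
```

With these numbers, inference accounts for roughly three quarters of the total, so this run is bound by TPU compute rather than by decoding or postprocessing.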
2.3.2. FP16 model inference
End-to-end inference time (decode + preprocess + inference + postprocess): 6.592 + 2.422 + 27.805 + 24.594 = 61.413 ms
linaro@sophon:/data/sophon-demo_v0.1.9_91d8161e_20231227/sample/YOLOv5/cpp/yolov5_bmcv$ ./yolov5_bmcv.soc --input=../../datasets/test --bmodel=../../models/BM1688/yolov5s_v6.1_3output_fp16_1b.bmodel --dev_id=0 --conf_thresh=0.5 --mns_thresh=0.5 --obj_thresh=0.5 --classname=../../datasets/coco.names
set device id: 0
[BMRT][bmcpu_setup:435] INFO:cpu_lib 'libcpuop.so' is loaded.
bmcpu init: skip cpu_user_defined
open usercpu.so, init user_cpu_init
[BMRT][BMProfile:59] INFO:Profile For arch=4
[BMRT][BMProfileDeviceBase:190] INFO:gdma=0, tiu=0, mcu=0
[BMRT][load_bmodel:1573] INFO:Loading bmodel from [../../models/BM1688/yolov5s_v6.1_3output_fp16_1b.bmodel]. Thanks for your patience...
[BMRT][load_bmodel:1501] INFO:pre net num: 0, load net num: 1
YoloV5 ctor ..
*** Run in SOC mode ***
########################
NetName: yolov5s_v6.1_3output
---- stage 0 ----
Input 0) 'images' shape=[ 1 3 640 640 ] dtype=FLOAT32 scale=1
Output 0) 'output_Transpose_f32' shape=[ 1 3 80 80 85 ] dtype=FLOAT32 scale=1
Output 1) '365_Transpose_f32' shape=[ 1 3 40 40 85 ] dtype=FLOAT32 scale=1
Output 2) '385_Transpose_f32' shape=[ 1 3 20 20 85 ] dtype=FLOAT32 scale=1
########################
1/4, img_file: ../../datasets/test/000000547383.jpg
2/4, img_file: ../../datasets/test/3.jpg
3/4, img_file: ../../datasets/test/dog.jpg
4/4, img_file: ../../datasets/test/zidane.jpg
================
result saved in results/yolov5s_v6.1_3output_fp16_1b.bmodel_test_bmcv_cpp_result.json
############################
SUMMARY: yolov5 test
############################
[ decode time] loops: 4 avg: 6.592000 ms
[ yolov5 preprocess] loops: 4 avg: 2.422000 ms
[ yolov5 inference] loops: 4 avg: 27.805000 ms
[ yolov5 postprocess] loops: 4 avg: 24.594000 ms
[post 1: get output and decode] loops: 4 avg: 19.502000 ms
[ post 2: filter boxes] loops: 4 avg: 4.887000 ms
[ post 3: nms] loops: 4 avg: 0.017000 ms
YoloV5 dtor ...
Video inference, average end-to-end time per frame (preprocess + inference + postprocess): 4.784 + 27.601 + 22.611 = 54.996 ms
linaro@sophon:/data/sophon-demo_v0.1.9_91d8161e_20231227/sample/YOLOv5/cpp/yolov5_bmcv$ ./yolov5_bmcv.soc --input=../../datasets/test_car_person_1080P.mp4 --bmodel=../../models/BM1688/yolov5s_v6.1_3output_fp16_1b.bmodel --dev_id=0 --conf_thresh=0.5 --mns_thresh=0.5 --obj_thresh=0.5 --classname=../../datasets/coco.names
set device id: 0
[BMRT][bmcpu_setup:435] INFO:cpu_lib 'libcpuop.so' is loaded.
bmcpu init: skip cpu_user_defined
open usercpu.so, init user_cpu_init
[BMRT][BMProfile:59] INFO:Profile For arch=4
[BMRT][BMProfileDeviceBase:190] INFO:gdma=0, tiu=0, mcu=0
[BMRT][load_bmodel:1573] INFO:Loading bmodel from [../../models/BM1688/yolov5s_v6.1_3output_fp16_1b.bmodel]. Thanks for your patience...
[BMRT][load_bmodel:1501] INFO:pre net num: 0, load net num: 1
YoloV5 ctor ..
*** Run in SOC mode ***
########################
NetName: yolov5s_v6.1_3output
---- stage 0 ----
Input 0) 'images' shape=[ 1 3 640 640 ] dtype=FLOAT32 scale=1
Output 0) 'output_Transpose_f32' shape=[ 1 3 80 80 85 ] dtype=FLOAT32 scale=1
Output 1) '365_Transpose_f32' shape=[ 1 3 40 40 85 ] dtype=FLOAT32 scale=1
Output 2) '385_Transpose_f32' shape=[ 1 3 20 20 85 ] dtype=FLOAT32 scale=1
########################
[h264_bm @ 0x5589d07480] bm decoder id: 0
[h264_bm @ 0x5589d07480] bm output format: 0
[h264_bm @ 0x5589d07480] mode bitstream: 2, frame delay: -1
[h264_bm @ 0x5589d07480] openDec video_stream_idx = 0, pix_fmt = 23
1, det_nums: 6
2, det_nums: 6
3, det_nums: 6
4, det_nums: 6
5, det_nums: 6
6, det_nums: 6
.......
587, det_nums: 7
[h264_bm @ 0x5589d07480] av_read_frame ret(-541478725) maybe eof...
588, det_nums: 6
589, det_nums: 6
590, det_nums: 7
591, det_nums: 8
592, det_nums: 7
#VideoDecFFM exit
############################
SUMMARY: yolov5 test
############################
[ yolov5 preprocess] loops: 592 avg: 4.784000 ms
[ yolov5 inference] loops: 592 avg: 27.601000 ms
[ yolov5 postprocess] loops: 592 avg: 22.611000 ms
[post 1: get output and decode] loops: 592 avg: 17.452000 ms
[ post 2: filter boxes] loops: 592 avg: 4.946000 ms
[ post 3: nms] loops: 592 avg: 0.031000 ms
YoloV5 dtor ...
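The per-frame time from this video run can be turned into a rough frame rate. The sketch below shows two estimates under stated assumptions: a serial figure, where each frame passes through all three stages before the next one starts, and a hypothetical upper bound for a perfectly pipelined setup, where throughput is limited only by the slowest stage. The video summary reports no decode time, so both estimates exclude decoding. The function names are illustrative.

```python
# Per-frame stage averages in ms from the FP16 video summary above (592 frames).
STAGES_MS = {"preprocess": 4.784, "inference": 27.601, "postprocess": 22.611}


def serial_fps(stages):
    """Frames per second if the stages run one after another for each frame."""
    return 1000.0 / sum(stages.values())


def pipelined_fps_bound(stages):
    """Upper bound on frames per second if the stages overlapped perfectly."""
    return 1000.0 / max(stages.values())


print(f"serial estimate:       {serial_fps(STAGES_MS):.2f} fps")
print(f"pipelined upper bound: {pipelined_fps_bound(STAGES_MS):.2f} fps")
```

The second number is an idealized ceiling, not a measurement; the run above was not measured in a pipelined configuration.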
1 video stream, TPU utilization: 17%
Tue Mar 12 16:50:37 2024
+--------------------------------------------------------------------------------------------------+
| SDK Version: 0.4.9 Driver Version: 0.4.9 |
+---------------------------------------+----------------------------------------------------------+
|card Name Mode SN |TPU boardT chipT TPU_P TPU_V ECC CorrectN Tpu-Util|
|12V_ATX MaxP boardP Minclk Maxclk Fan|Bus-ID Status Currclk TPU_C Memory-Usage |
|=======================================+==========================================================|
| 0 1688-SOC SOC N/A | 0 N/A N/A N/A N/A N/A N/A 17% |
| N/A N/A N/A 450M 900M N/A| N/A Active 900M N/A 352MB/ 6144MB |
+=======================================+==========================================================+
+--------------------------------------------------------------------------------------------------+
| Processes: TPU Memory |
| TPU-ID PID Process name Usage |
|==================================================================================================|
0 3617144 ./yolov5_bmcv.soc 0MB
0 3617144 ./yolov5_bmcv.soc 65MB
2.3.3. INT8 model inference
End-to-end inference time (decode + preprocess + inference + postprocess): 6.702 + 2.452 + 7.593 + 24.163 = 40.910 ms
linaro@sophon:/data/sophon-demo_v0.1.9_91d8161e_20231227/sample/YOLOv5/cpp/yolov5_bmcv$ ./yolov5_bmcv.soc --input=../../datasets/test --bmodel=../../models/BM1688/yolov5s_v6.1_3output_int8_1b.bmodel --dev_id=0 --conf_thresh=0.5 --mns_thresh=0.5 --obj_thresh=0.5 --classname=../../datasets/coco.names
set device id: 0
[BMRT][bmcpu_setup:435] INFO:cpu_lib 'libcpuop.so' is loaded.
bmcpu init: skip cpu_user_defined
open usercpu.so, init user_cpu_init
[BMRT][BMProfile:59] INFO:Profile For arch=4
[BMRT][BMProfileDeviceBase:190] INFO:gdma=0, tiu=0, mcu=0
[BMRT][load_bmodel:1573] INFO:Loading bmodel from [../../models/BM1688/yolov5s_v6.1_3output_int8_1b.bmodel]. Thanks for your patience...
[BMRT][load_bmodel:1501] INFO:pre net num: 0, load net num: 1
YoloV5 ctor ..
*** Run in SOC mode ***
########################
NetName: yolov5s_v6.1_3output
---- stage 0 ----
Input 0) 'images' shape=[ 1 3 640 640 ] dtype=FLOAT32 scale=1
Output 0) 'output_Transpose_f32' shape=[ 1 3 80 80 85 ] dtype=FLOAT32 scale=1
Output 1) '365_Transpose_f32' shape=[ 1 3 40 40 85 ] dtype=FLOAT32 scale=1
Output 2) '385_Transpose_f32' shape=[ 1 3 20 20 85 ] dtype=FLOAT32 scale=1
########################
1/4, img_file: ../../datasets/test/000000547383.jpg
2/4, img_file: ../../datasets/test/3.jpg
3/4, img_file: ../../datasets/test/dog.jpg
4/4, img_file: ../../datasets/test/zidane.jpg
================
result saved in results/yolov5s_v6.1_3output_int8_1b.bmodel_test_bmcv_cpp_result.json
############################
SUMMARY: yolov5 test
############################
[ decode time] loops: 4 avg: 6.702000 ms
[ yolov5 preprocess] loops: 4 avg: 2.452000 ms
[ yolov5 inference] loops: 4 avg: 7.593000 ms
[ yolov5 postprocess] loops: 4 avg: 24.163000 ms
[post 1: get output and decode] loops: 4 avg: 18.327000 ms
[ post 2: filter boxes] loops: 4 avg: 5.641000 ms
[ post 3: nms] loops: 4 avg: 0.017000 ms
YoloV5 dtor ...
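With all three image runs on the SE9 available, the measured inference speedups can be compared against the ratios implied by the rated peak compute in section 2 (0.25 TFLOPS FP32, 4 TFLOPS FP16, 16 TOPS INT8). The sketch below does that comparison; the `compare` helper is a name invented for this example, and rated peak figures are upper bounds that a real model is not expected to reach.

```python
# Rated peak compute of the SE9-16 (TFLOPS or TOPS), from the spec line in section 2.
PEAK = {"fp32": 0.25, "fp16": 4.0, "int8": 16.0}
# Measured average inference time in ms, from the three image-test summaries in 2.3.
INFERENCE_MS = {"fp32": 98.383, "fp16": 27.805, "int8": 7.593}


def compare(precision):
    """Return (rated ratio, measured speedup) of a precision relative to FP32."""
    rated = PEAK[precision] / PEAK["fp32"]
    measured = INFERENCE_MS["fp32"] / INFERENCE_MS[precision]
    return rated, measured


for precision in ("fp16", "int8"):
    rated, measured = compare(precision)
    print(f"{precision}: rated {rated:.0f}x, measured {measured:.2f}x "
          f"({measured / rated:.0%} of the rated ratio)")
```

Both measured speedups come out at roughly a fifth of the rated ratio, which suggests that inference time on this model is not determined by peak arithmetic throughput alone.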
Video inference, average end-to-end time per frame (preprocess + inference + postprocess): 4.815 + 7.434 + 22.668 = 34.917 ms
linaro@sophon:/data/sophon-demo_v0.1.9_91d8161e_20231227/sample/YOLOv5/cpp/yolov5_bmcv$ ./yolov5_bmcv.soc --input=../../datasets/test_car_person_1080P.mp4 --bmodel=../../models/BM1688/yolov5s_v6.1_3output_int8_1b.bmodel --dev_id=0 --conf_thresh=0.5 --mns_thresh=0.5 --obj_thresh=0.5 --classname=../../datasets/coco.names
set device id: 0
[BMRT][bmcpu_setup:435] INFO:cpu_lib 'libcpuop.so' is loaded.
bmcpu init: skip cpu_user_defined
open usercpu.so, init user_cpu_init
[BMRT][BMProfile:59] INFO:Profile For arch=4
[BMRT][BMProfileDeviceBase:190] INFO:gdma=0, tiu=0, mcu=0
[BMRT][load_bmodel:1573] INFO:Loading bmodel from [../../models/BM1688/yolov5s_v6.1_3output_int8_1b.bmodel]. Thanks for your patience...
[BMRT][load_bmodel:1501] INFO:pre net num: 0, load net num: 1
YoloV5 ctor ..
*** Run in SOC mode ***
########################
NetName: yolov5s_v6.1_3output
---- stage 0 ----
Input 0) 'images' shape=[ 1 3 640 640 ] dtype=FLOAT32 scale=1
Output 0) 'output_Transpose_f32' shape=[ 1 3 80 80 85 ] dtype=FLOAT32 scale=1
Output 1) '365_Transpose_f32' shape=[ 1 3 40 40 85 ] dtype=FLOAT32 scale=1
Output 2) '385_Transpose_f32' shape=[ 1 3 20 20 85 ] dtype=FLOAT32 scale=1
########################
[h264_bm @ 0x5579297350] bm decoder id: 0
[h264_bm @ 0x5579297350] bm output format: 0
[h264_bm @ 0x5579297350] mode bitstream: 2, frame delay: -1
[h264_bm @ 0x5579297350] openDec video_stream_idx = 0, pix_fmt = 23
1, det_nums: 6
2, det_nums: 6
3, det_nums: 6
.......
588, det_nums: 5
589, det_nums: 5
590, det_nums: 7
591, det_nums: 7
592, det_nums: 6
#VideoDecFFM exit
############################
SUMMARY: yolov5 test
############################
[ yolov5 preprocess] loops: 592 avg: 4.815000 ms
[ yolov5 inference] loops: 592 avg: 7.434000 ms
[ yolov5 postprocess] loops: 592 avg: 22.668000 ms
[post 1: get output and decode] loops: 592 avg: 17.498000 ms
[ post 2: filter boxes] loops: 592 avg: 4.960000 ms
[ post 3: nms] loops: 592 avg: 0.028000 ms
YoloV5 dtor ...
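Comparing this INT8 video run with the FP16 video run in 2.3.2 shows why a much faster inference stage yields only a moderate per-frame gain. Amdahl's law, speedup = 1 / ((1 - p) + p / s), predicts the overall gain when only a fraction p of the time is accelerated by a factor s. The sketch below applies it to the two video summaries; it assumes the preprocess and postprocess stages are unchanged between the runs, which holds only approximately here.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the time is made s times faster."""
    return 1.0 / ((1.0 - p) + p / s)


# Per-frame stage averages in ms from the FP16 and INT8 video summaries.
FP16_MS = {"preprocess": 4.784, "inference": 27.601, "postprocess": 22.611}
INT8_MS = {"preprocess": 4.815, "inference": 7.434, "postprocess": 22.668}

# Fraction of the FP16 per-frame time spent in inference.
p = FP16_MS["inference"] / sum(FP16_MS.values())
# Speedup of the inference stage alone.
s = FP16_MS["inference"] / INT8_MS["inference"]

predicted = amdahl_speedup(p, s)
measured = sum(FP16_MS.values()) / sum(INT8_MS.values())

print(f"inference share p = {p:.3f}, inference speedup s = {s:.2f}x")
print(f"predicted overall speedup: {predicted:.3f}x")
print(f"measured overall speedup:  {measured:.3f}x")
```

The prediction and the measurement agree closely. The small gap comes from the slightly different preprocess and postprocess averages in the two runs.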
1 video stream, TPU utilization: 7% (the bm-smi snapshot below reads 6%)
Tue Mar 12 16:43:46 2024
+--------------------------------------------------------------------------------------------------+
| SDK Version: 0.4.9 Driver Version: 0.4.9 |
+---------------------------------------+----------------------------------------------------------+
|card Name Mode SN |TPU boardT chipT TPU_P TPU_V ECC CorrectN Tpu-Util|
|12V_ATX MaxP boardP Minclk Maxclk Fan|Bus-ID Status Currclk TPU_C Memory-Usage |
|=======================================+==========================================================|
| 0 1688-SOC SOC N/A | 0 N/A N/A N/A N/A N/A N/A 6% |
| N/A N/A N/A 450M 900M N/A| N/A Active 900M N/A 340MB/ 6144MB |
+=======================================+==========================================================+
+--------------------------------------------------------------------------------------------------+
| Processes: TPU Memory |
| TPU-ID PID Process name Usage |
|==================================================================================================|
0 3613086 ./yolov5_bmcv.soc 0MB
0 3613086 ./yolov5_bmcv.soc 59MB