Model download
Click the model name to download it; since the weights are hosted on GitHub, the download can be fairly slow.
Model conversion
Reference link: Ascend CANN YOLOV8 and YOLOV9 adaptation (Huawei Cloud community)
pt->onnx
Install ultralytics on the PC; this takes roughly half an hour.
pip install ultralytics
Plus the runtime dependencies:
pip install onnx==1.16.1
pip install onnxslim onnxruntime
Note in particular that onnx must be pinned to the version specified above; otherwise the following error is raised:
ImportError: DLL load failed while importing onnx_cpp2py_export: A dynamic link library (DLL) initialization routine failed.
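As a quick sanity check after installation, the following sketch simply imports onnx and verifies the pinned version; if the DLL error above is present, the import itself will fail.

# Sanity check: confirm the pinned onnx version imports cleanly.
# If the DLL initialization error above occurs, this import raises ImportError.
import onnx

print(onnx.__version__)  # expected: 1.16.1
assert onnx.__version__ == "1.16.1", "onnx is not pinned to 1.16.1"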
Conversion process
H:\310p>python
Python 3.9.2 (tags/v3.9.2:1a79785, Feb 19 2021, 13:44:55) [MSC v.1928 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import ultralytics
>>> from ultralytics import YOLO
>>> model = YOLO('yolov8l.pt')
>>> model.export(format='onnx', dynamic=False, simplify=True, opset=11)
Ultralytics 8.3.88 🚀 Python-3.9.2 torch-2.6.0+cpu CPU (11th Gen Intel Core(TM) i7-1165G7 2.80GHz)
YOLOv8l summary (fused): 112 layers, 43,668,288 parameters, 0 gradients, 165.2 GFLOPs
PyTorch: starting from 'yolov8l.pt' with input shape (1, 3, 640, 640) BCHW and output shape(s) (1, 84, 8400) (83.7 MB)
ONNX: starting export with onnx 1.16.1 opset 11...
ONNX: slimming with onnxslim 0.1.48...
ONNX: export success ✅ 5.6s, saved as 'yolov8l.onnx' (166.8 MB)
Export complete (8.0s)
Results saved to H:\310p
Predict: yolo predict task=detect model=yolov8l.onnx imgsz=640
Validate: yolo val task=detect model=yolov8l.onnx imgsz=640 data=coco.yaml
Visualize: https://netron.app
'yolov8l.onnx'
Note in particular: in model.export(format='onnx', dynamic=False, simplify=True, opset=11), the opset parameter must be set to 11; otherwise the subsequent conversion to the OM model fails with error E19010:
atc --model=yolov8l.onnx --framework=5 --output=yolov8l --input_shape="images:1,3,640,640" --soc_version=Ascend310P3
ATC start working now, please wait for a moment.
...
ATC run failed, Please check the detail log, Try 'atc --help' for more information
E19010: 2025-03-12-12:00:19.102.020 No parser is registered for Op [/model.0/conv/Conv, optype [ai.onnx::19::Conv]].
Solution: Check the version of the installation package and reinstall the package. For details, see the operator specifications.
TraceBack (most recent call last):
Reference link: ATC Conversion Guide: Common Troubleshooting and Solutions (continuously updated), CANN, Ascend forum
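Before re-running atc, it is easy to confirm which opset the exported model actually declares (the failing model above was exported at opset 19, as shown by ai.onnx::19::Conv in the message). A minimal sketch using the onnx package, with the filename from the export above:

# Print the opset(s) declared by the exported model; ATC here expects opset 11.
import onnx

model = onnx.load("yolov8l.onnx")
for entry in model.opset_import:
    # The default ONNX operator set is reported with an empty domain string.
    print(entry.domain or "ai.onnx", entry.version)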
onnx->om
atc --model=yolov8l_11.onnx --framework=5 --output=yolov8l_11 --input_shape="images:1,3,640,640" --soc_version=Ascend310P3 --insert_op_conf=aipp.cfg
ATC start working now, please wait for a moment.
...
ATC run success, welcome to the next use.
samples: CANN Samples - Gitee.com
The aipp.cfg file comes from the samples link above.
Model performance testing
tools: Ascend tools - Gitee.com. Following this link, ais_bench is used to measure model performance.
Installing dependencies
pip3 install tqdm
Collecting tqdm
Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)
|████████████████████████████████| 78 kB 241 kB/s
Installing collected packages: tqdm
Successfully installed tqdm-4.67.1
Configuring the pip package index
Without a mirror configured, pip defaults to the overseas index, which is often slow or fails outright. With a mirror configured, building and installing the two tools below is very straightforward.
This of course applies to the online-install scenario.
pip config list
# pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
Writing to /root/.config/pip/pip.conf
Reference link: pypi | mirror usage help | Tsinghua Open Source Mirror
Building the ais tool
Build the aclruntime package
tools/ais-bench_workload/tool/ais_bench# pip3 wheel ./backend/ -v
............................
[INFO] adding 'aclruntime.cpython-38-aarch64-linux-gnu.so'
[INFO] adding 'aclruntime-0.0.2.dist-info/METADATA'
[INFO] adding 'aclruntime-0.0.2.dist-info/WHEEL'
[INFO] adding 'aclruntime-0.0.2.dist-info/top_level.txt'
[INFO] adding 'aclruntime-0.0.2.dist-info/RECORD'
[INFO] removing build/bdist.linux-aarch64/wheel
/tmp/pip-build-env-3i0s7qsj/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py:261: UserWarning: Unknown distribution option: 'release_date'
warnings.warn(msg)
Building wheel for aclruntime (PEP 517) ... done
Created wheel for aclruntime: filename=aclruntime-0.0.2-cp38-cp38-linux_aarch64.whl size=447424 sha256=c1745441b5ab86adb46b9642e423fe26abe3f2ce9174c1e80cf38fe3ac3324a9
Stored in directory: /tmp/pip-ephem-wheel-cache-40oahjoz/wheels/b8/1c/65/914d76d5961c346e16b03144c00090aaa6e02f92a848d5f037
Successfully built aclruntime
Cleaning up...
Removing source in /tmp/pip-req-build-cl5cou1g
Removed build tracker: '/tmp/pip-req-tracker-_olq8rvv'
Build the ais_bench inference package
tools/ais-bench_workload/tool/ais_bench# pip3 wheel ./ -v
removing build/bdist.linux-aarch64/wheel
done
Created wheel for ais-bench: filename=ais_bench-0.0.2-py3-none-any.whl size=101486 sha256=b779d61de6d4f83c7433dd95ee6043fbe5e3313f47110d91ca85da082bbb83c9
Stored in directory: /tmp/pip-ephem-wheel-cache-_fkriu7f/wheels/bb/ce/e5/26a02c79276bcb10bd2a8fc0506b630f7f6c7b2a2a3a24bd08
Successfully built ais-bench
Cleaning up...
Removing source in /tmp/pip-req-build-rtpo1whw
Removing source in /tmp/pip-wheel-26dw94g8/attrs
Removing source in /tmp/pip-wheel-26dw94g8/numpy
Removing source in /tmp/pip-wheel-26dw94g8/tqdm
Removed build tracker: '/tmp/pip-req-tracker-yh1jzgpb'
Installing the ais tool
/tools/ais-bench_workload/tool/ais_bench# pip3 install ./aclruntime-0.0.2-cp38-cp38-linux_aarch64.whl
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Processing ./aclruntime-0.0.2-cp38-cp38-linux_aarch64.whl
Installing collected packages: aclruntime
Successfully installed aclruntime-0.0.2
/tools/ais-bench_workload/tool/ais_bench# pip3 install ./ais_bench-0.0.2-py3-none-any.whl
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Processing ./ais_bench-0.0.2-py3-none-any.whl
Requirement already satisfied: tqdm in /usr/local/lib/python3.8/dist-packages (from ais-bench==0.0.2) (4.67.1)
Requirement already satisfied: numpy in /usr/local/lib/python3.8/dist-packages (from ais-bench==0.0.2) (1.24.0)
Requirement already satisfied: attrs>=21.3.0 in /usr/local/lib/python3.8/dist-packages (from ais-bench==0.0.2) (24.2.0)
Installing collected packages: ais-bench
Successfully installed ais-bench-0.0.2
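Besides the command-line invocation used in the next section, ais_bench also provides a Python interface. A minimal sketch, assuming the InferSession API described in the ais_bench README (class names and arguments may differ between versions), using the OM model built earlier:

# Assumed ais_bench Python API; adjust to the installed version if it differs.
import numpy as np
from ais_bench.infer.interface import InferSession

# Load the compiled OM model on NPU device 0.
session = InferSession(device_id=0, model_path="./yolov8l_11.om")

# Dummy input; dtype and layout must match what was compiled into the OM model
# (with AIPP inserted, the input is typically uint8 image data).
dummy = np.zeros((1, 3, 640, 640), dtype=np.uint8)
outputs = session.infer([dummy])
print([out.shape for out in outputs])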
Model inference performance test
Frame rate and latency
time python3 -m ais_bench --model ./yolov8l_11.om --output /home/zh/310poutput/ --outfmt BIN --loop 2000
[INFO] acl init success
[INFO] open device 0 success
[INFO] create new context
[INFO] load model ./yolov8l_11.om success
[INFO] create model description success
[INFO] try get model batchsize:1
[INFO] output path:/home/zh/310poutput/2025_03_12-15_50_05
[INFO] warm up 1 done
loop inference exec: (2000/2000)| | 0/1 [00:00<?, ?it/s]
Inference array Processing: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:19<00:00, 19.83s/it]
[INFO] -----------------Performance Summary------------------
[INFO] NPU_compute_time (ms): min = 9.8603515625, max = 9.99896240234375, mean = 9.8857497215271, median = 9.884765625, percentile(99%) = 9.925010833740235
[INFO] throughput 1000*batchsize.mean(1)/NPU_compute_time.mean(9.8857497215271): 101.1557067667221
[INFO] ------------------------------------------------------
[INFO] unload model success, model Id is 1
[INFO] end to reset device 0
[INFO] end to finalize acl
real 0m23.306s
user 0m4.432s
sys 0m0.391s
The measured throughput is about 101 frames/s, with a mean latency of 9.89 ms per inference.
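The throughput reported in the summary is simply batch size over mean NPU compute time; a one-line check reproduces the number:

# Throughput = 1000 * batch_size / mean NPU compute time (ms)
print(1000 * 1 / 9.8857497215271)  # ~101.16 inferences per second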
Resource consumption
Monitored via npu-smi info watch -i 16.
Power consumption
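To keep a record of the watch output instead of reading it interactively, a small sketch (assuming npu-smi is on the PATH and the same device index as above) that samples the command for a fixed duration:

# Run the same `npu-smi info watch -i 16` command for a while and save the
# output, so power and utilization during the benchmark can be reviewed later.
import subprocess
import time

proc = subprocess.Popen(
    ["npu-smi", "info", "watch", "-i", "16"],
    stdout=subprocess.PIPE,
    text=True,
)
time.sleep(30)                    # sample for roughly the benchmark duration
proc.terminate()
samples, _ = proc.communicate()   # collect whatever was printed so far
with open("npu_watch.log", "w") as f:
    f.write(samples)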
Summary
Although these are all YOLOv8 models, the different model variants differ in resource consumption and processing speed, and therefore also in power consumption. This is an important reference when choosing a model during the early requirements phase.
Going back to the chart at the original download link: there, the CPU ONNX latency of 8l is roughly 3x that of 8s, and that ratio is consistent with what we measured on the Ascend chip. The A100 latency is only 2.39 ms, while the Ascend measures 9.89 ms.