Testing YOLOv8 on an Ascend Chip

Table of Contents

Model Download

Model Conversion

pt->onnx

Installing ultralytics on a PC

Conversion Process

onnx->om

Model Performance Testing

Installing Dependencies

Configuring the Python Package Index

Building the ais Tools

Building the aclruntime Package

Building the ais_bench Inference Package

Installing the ais Tools

Model Inference Performance Testing

Frame Rate and Latency

Resource Consumption

Power Consumption

Summary


Model Download

YOLOv8 - Ultralytics YOLO documentation

Click a model name on that page to download the weights. Since they are hosted on GitHub, the download can be slow.

Model Conversion

Reference: Ascend CANN YOLOv8 and YOLOv9 Adaptation (Huawei Cloud community)

pt->onnx

Installing ultralytics on a PC

This takes roughly half an hour:

pip install ultralytics

along with its runtime dependencies:

pip install onnx==1.16.1    

pip install onnxslim onnxruntime

Note that onnx must be pinned to exactly this version; otherwise you will hit the following error:

ImportError: DLL load failed while importing onnx_cpp2py_export: The dynamic link library (DLL) initialization routine failed

Conversion Process

H:\310p>python
Python 3.9.2 (tags/v3.9.2:1a79785, Feb 19 2021, 13:44:55) [MSC v.1928 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import ultralytics
>>> from ultralytics import YOLO
>>> model = YOLO('yolov8l.pt')
>>> model.export(format='onnx', dynamic=False, simplify=True, opset=11)
Ultralytics 8.3.88 🚀 Python-3.9.2 torch-2.6.0+cpu CPU (11th Gen Intel Core(TM) i7-1165G7 2.80GHz)
YOLOv8l summary (fused): 112 layers, 43,668,288 parameters, 0 gradients, 165.2 GFLOPs

PyTorch: starting from 'yolov8l.pt' with input shape (1, 3, 640, 640) BCHW and output shape(s) (1, 84, 8400) (83.7 MB)

ONNX: starting export with onnx 1.16.1 opset 11...
ONNX: slimming with onnxslim 0.1.48...
ONNX: export success ✅ 5.6s, saved as 'yolov8l.onnx' (166.8 MB)

Export complete (8.0s)
Results saved to H:\310p
Predict:         yolo predict task=detect model=yolov8l.onnx imgsz=640
Validate:        yolo val task=detect model=yolov8l.onnx imgsz=640 data=coco.yaml
Visualize:       https://netron.app
'yolov8l.onnx'
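As a side note on the shapes in the log above, the (1, 84, 8400) output decodes as 4 box coordinates plus 80 COCO class scores per candidate, with the 8400 candidates coming from the three detection scales (strides 8, 16, and 32 on a 640×640 input). A quick sanity check:

```python
# Sanity-check the (1, 84, 8400) output shape reported by the exporter:
# 8400 candidates = 80*80 + 40*40 + 20*20 grid cells over strides 8/16/32,
# and 84 channels = 4 box coordinates (cx, cy, w, h) + 80 class scores.
strides = [8, 16, 32]
cells = sum((640 // s) ** 2 for s in strides)
channels = 4 + 80
print(cells, channels)  # 8400 84
```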

Note in particular that the opset parameter in model.export(format='onnx', dynamic=False, simplify=True, opset=11) must be set to 11; otherwise the subsequent conversion to an OM model fails with error E19010:

 atc --model=yolov8l.onnx --framework=5 --output=yolov8l --input_shape="images:1,3,640,640"  --soc_version=Ascend310P3
ATC start working now, please wait for a moment.
...
ATC run failed, Please check the detail log, Try 'atc --help' for more information
E19010: 2025-03-12-12:00:19.102.020 No parser is registered for Op [/model.0/conv/Conv, optype [ai.onnx::19::Conv]].
        Solution: Check the version of the installation package and reinstall the package. For details, see the operator specifications.
        TraceBack (most recent call last):

Reference: ATC Conversion Guide: Common Problem Troubleshooting and Solutions (continuously updated), CANN, Ascend forum

onnx->om 

 atc --model=yolov8l_11.onnx --framework=5 --output=yolov8l_11 --input_shape="images:1,3,640,640"  --soc_version=Ascend310P3  --insert_op_conf=aipp.cfg
ATC start working now, please wait for a moment.
...
ATC run success, welcome to the next use.

samples: CANN Samples - Gitee.com

The aipp.cfg file comes from the samples link above.
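For orientation, a static AIPP configuration for a 640×640 RGB input typically looks like the sketch below. This is an illustrative assumption written in the CANN AIPP prototxt style, not the actual file; use the aipp.cfg from the samples repository for real runs.

```text
aipp_op {
    aipp_mode : static
    input_format : RGB888_U8      # assumption: 8-bit RGB host input
    src_image_size_w : 640
    src_image_size_h : 640
    csc_switch : false            # no color-space conversion
    rbuv_swap_switch : false
    # normalize to [0, 1]: out = (in - min_chn) * var_reci_chn
    min_chn_0 : 0
    min_chn_1 : 0
    min_chn_2 : 0
    var_reci_chn_0 : 0.0039216   # 1/255
    var_reci_chn_1 : 0.0039216
    var_reci_chn_2 : 0.0039216
}
```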

Model Performance Testing

tools: Ascend tools - Gitee.com. Following this repository, ais_bench is used to measure the model's performance.

Installing Dependencies

 pip3 install tqdm

Collecting tqdm
Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)
|████████████████████████████████| 78 kB 241 kB/s
Installing collected packages: tqdm
Successfully installed tqdm-4.67.1

Configuring the Python Package Index

Without a configured package index, pip defaults to the overseas servers, which are often slow or fail outright. With a domestic mirror configured, compiling and linking the two tools below becomes trivial.

This applies to the online-installation scenario, of course.

pip config list

# pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

Writing to /root/.config/pip/pip.conf

Reference: pypi | Mirror Usage Help | Tsinghua Open Source Software Mirror

Building the ais Tools

Building the aclruntime Package

tools/ais-bench_workload/tool/ais_bench# pip3 wheel ./backend/ -v

............................
  [INFO] adding 'aclruntime.cpython-38-aarch64-linux-gnu.so'
  [INFO] adding 'aclruntime-0.0.2.dist-info/METADATA'
  [INFO] adding 'aclruntime-0.0.2.dist-info/WHEEL'
  [INFO] adding 'aclruntime-0.0.2.dist-info/top_level.txt'
  [INFO] adding 'aclruntime-0.0.2.dist-info/RECORD'
  [INFO] removing build/bdist.linux-aarch64/wheel
  /tmp/pip-build-env-3i0s7qsj/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py:261: UserWarning: Unknown distribution option: 'release_date'
    warnings.warn(msg)
  Building wheel for aclruntime (PEP 517) ... done
  Created wheel for aclruntime: filename=aclruntime-0.0.2-cp38-cp38-linux_aarch64.whl size=447424 sha256=c1745441b5ab86adb46b9642e423fe26abe3f2ce9174c1e80cf38fe3ac3324a9
  Stored in directory: /tmp/pip-ephem-wheel-cache-40oahjoz/wheels/b8/1c/65/914d76d5961c346e16b03144c00090aaa6e02f92a848d5f037
Successfully built aclruntime
Cleaning up...
  Removing source in /tmp/pip-req-build-cl5cou1g
Removed build tracker: '/tmp/pip-req-tracker-_olq8rvv'

Building the ais_bench Inference Package

tools/ais-bench_workload/tool/ais_bench# pip3 wheel ./ -v

  removing build/bdist.linux-aarch64/wheel
done
  Created wheel for ais-bench: filename=ais_bench-0.0.2-py3-none-any.whl size=101486 sha256=b779d61de6d4f83c7433dd95ee6043fbe5e3313f47110d91ca85da082bbb83c9
  Stored in directory: /tmp/pip-ephem-wheel-cache-_fkriu7f/wheels/bb/ce/e5/26a02c79276bcb10bd2a8fc0506b630f7f6c7b2a2a3a24bd08
Successfully built ais-bench
Cleaning up...
  Removing source in /tmp/pip-req-build-rtpo1whw
  Removing source in /tmp/pip-wheel-26dw94g8/attrs
  Removing source in /tmp/pip-wheel-26dw94g8/numpy
  Removing source in /tmp/pip-wheel-26dw94g8/tqdm
Removed build tracker: '/tmp/pip-req-tracker-yh1jzgpb'

Installing the ais Tools

/tools/ais-bench_workload/tool/ais_bench# pip3 install ./aclruntime-0.0.2-cp38-cp38-linux_aarch64.whl
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Processing ./aclruntime-0.0.2-cp38-cp38-linux_aarch64.whl
Installing collected packages: aclruntime
Successfully installed aclruntime-0.0.2



/tools/ais-bench_workload/tool/ais_bench# pip3 install ./ais_bench-0.0.2-py3-none-any.whl
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Processing ./ais_bench-0.0.2-py3-none-any.whl
Requirement already satisfied: tqdm in /usr/local/lib/python3.8/dist-packages (from ais-bench==0.0.2) (4.67.1)
Requirement already satisfied: numpy in /usr/local/lib/python3.8/dist-packages (from ais-bench==0.0.2) (1.24.0)
Requirement already satisfied: attrs>=21.3.0 in /usr/local/lib/python3.8/dist-packages (from ais-bench==0.0.2) (24.2.0)
Installing collected packages: ais-bench
Successfully installed ais-bench-0.0.2

Model Inference Performance Testing

Frame Rate and Latency

time python3 -m ais_bench --model  ./yolov8l_11.om --output /home/zh/310poutput/ --outfmt BIN --loop 2000
[INFO] acl init success
[INFO] open device 0 success
[INFO] create new context
[INFO] load model ./yolov8l_11.om success
[INFO] create model description success
[INFO] try get model batchsize:1
[INFO] output path:/home/zh/310poutput/2025_03_12-15_50_05
[INFO] warm up 1 done
loop inference exec: (2000/2000)|                                                                                                  | 0/1 [00:00<?, ?it/s]
Inference array Processing: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:19<00:00, 19.83s/it]
[INFO] -----------------Performance Summary------------------
[INFO] NPU_compute_time (ms): min = 9.8603515625, max = 9.99896240234375, mean = 9.8857497215271, median = 9.884765625, percentile(99%) = 9.925010833740235
[INFO] throughput 1000*batchsize.mean(1)/NPU_compute_time.mean(9.8857497215271): 101.1557067667221
[INFO] ------------------------------------------------------
[INFO] unload model success, model Id is 1
[INFO] end to reset device 0
[INFO] end to finalize acl

real    0m23.306s
user    0m4.432s
sys     0m0.391s

The throughput is about 101 frames/s, with a mean compute time of 9.89 ms.
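The throughput line in the log is derived directly from the mean compute time; the relationship can be reproduced as follows:

```python
# Reproduce ais_bench's throughput figure from its latency summary:
# throughput = 1000 * batch_size / mean NPU compute time (ms).
mean_npu_ms = 9.8857497215271  # NPU_compute_time mean from the log above
batch_size = 1
throughput_fps = 1000 * batch_size / mean_npu_ms
print(round(throughput_fps, 2))  # ~101.16
```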

Resource Consumption

Resource usage can be monitored with npu-smi info watch -i 16.

Power Consumption

Summary

Although these are all YOLOv8, the individual model variants differ in resource consumption and processing speed, and consequently in power consumption. This is an important reference when choosing a model during the early requirements phase.

Returning to the chart at the original download link: there, the CPU ONNX latency of YOLOv8l is about 3x that of YOLOv8s, and that ratio is consistent with what we measured on the Ascend chip. The A100 latency is only 2.39 ms, versus 9.89 ms on the Ascend.
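As a rough sketch of the gap quoted above (taking the latency figures as given, which assumes comparable batch size and precision between the two measurements):

```python
# Latency gap between the A100 figure from the Ultralytics chart and
# the Ascend 310P3 measurement above (both in milliseconds).
a100_ms = 2.39
ascend_ms = 9.89
ratio = round(ascend_ms / a100_ms, 1)
print(ratio)  # roughly 4x slower on this test
```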
